Indexing Policy & Robots.txt
Sunny
Posts: 9296
Reputation: 456

Message # 1 | 10:41 AM
Website's Indexing Status


All uCoz websites have an indexing status that is displayed at the top of the Control Panel's main page (/panel/?a=cp). It shows whether search engines are allowed to index the website or not (that is, whether the website is in quarantine).
The indexing status shows one of two options: "indexing is allowed (quarantine is removed)":



Or "indexing is prohibited (the website is in quarantine)":



The status "indexing is prohibited (the website is in quarantine)" is assigned by default to all newly created websites.

Quarantine Removal Policy


A website can become available for indexing either automatically (if a premium plan is purchased) or upon the website owner's request. If the website does not have a premium plan and the user wants the quarantine to be removed, a request should be submitted from the website's Control Panel:



There will be a pop-up window with the info on the quarantine policy:



After the request has been submitted, the website is checked automatically against a number of criteria: the website's age, presence of a custom domain name, content, verified phone number, etc. Based on these criteria the system decides whether the quarantine should be removed. We cannot provide a more detailed description of the algorithm.

Note! If the quarantine removal was denied, the next request can be submitted no sooner than 7 days later.


Robots.txt


A website's robots.txt file is located at http://your_website_address/robots.txt. A website with the default robots.txt is indexed in the best possible way – we set up the file in such a way that only pages with content are indexed, and not all existing pages (e.g. login or registration page). Therefore uCoz websites are indexed better and get higher priority in comparison with other sites where all unnecessary pages are indexed.

That's why we strongly recommend that you do not replace the default robots.txt with your own.


If you still want to replace the file with your own, create a text file using Notepad or any other text editor and name it "robots.txt". Then upload it to the root folder of your website via File Manager or FTP. Note: while website indexing is prohibited, the robots.txt file cannot be modified.

The default robots.txt looks as follows:
Quote

User-agent: *
Allow: /*?page
Allow: /*?ref=
Allow: /stat/dspixel
Disallow: /*?
Disallow: /stat/
Disallow: /index/1
Disallow: /index/3
Disallow: /register
Disallow: /index/5
Disallow: /index/7
Disallow: /index/8
Disallow: /index/9
Disallow: /index/sub/
Disallow: /panel/
Disallow: /admin/
Disallow: /informer/
Disallow: /secure/
Disallow: /poll/
Disallow: /search/
Disallow: /abnl/
Disallow: /*_escaped_fragment_=
Disallow: /*-*-*-*-987$
Disallow: /shop/order/
Disallow: /shop/printorder/
Disallow: /shop/checkout/
Disallow: /shop/user/
Disallow: /*0-*-0-17$
Disallow: /*-0-0-

Sitemap: http://forum.ucoz.com/sitemap.xml
Sitemap: http://forum.ucoz.com/sitemap-forum.xml
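
How the Allow and Disallow lines in the default file interact can be sketched in Python. This is only an illustration under assumptions: it follows the matching convention used by the major search engines ("*" matches any run of characters, a trailing "$" anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie), and it copies just a few rules from the file above. The names SAMPLE_RULES, rule_to_regex and is_allowed are made up for this example, and other crawlers may treat wildcards differently.

```python
import re

# A few rules copied from the default robots.txt (not the full list).
SAMPLE_RULES = [
    ("allow", "/*?page"),
    ("allow", "/*?ref="),
    ("disallow", "/*?"),
    ("disallow", "/register"),
    ("disallow", "/search/"),
    ("disallow", "/*0-*-0-17$"),
]


def rule_to_regex(pattern: str) -> re.Pattern:
    """Turn a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let every '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules=SAMPLE_RULES) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if rule_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed
```

Under these assumptions, is_allowed("/news/?page2") is True because "/*?page" is longer than "/*?", while is_allowed("/news/?sort=1") and is_allowed("/register") are False.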



Robots.txt during the quarantine looks as follows:

Quote

User-agent: *
Disallow: /




Robots.txt FAQ


Informers are not indexed because they display information that ALREADY exists. As a rule this information is already indexed on the corresponding pages.


Question: I have accidentally messed up robots.txt. What should I do?

Answer: Delete it. The default robots.txt file will be added back automatically (the system checks whether a website has it, and if not – adds back the default file).


Question: Is there any use in submitting a website to search engines if the quarantine hasn't been removed yet?

Answer: No, your website won't be indexed while in quarantine.


Question: Will the robots.txt file be replaced automatically after the quarantine has been removed? Or should I update it manually?

Answer: It will be updated automatically.


Question: Is it possible to delete the default robots.txt?

Answer: You can't delete it because it's a system file, but you can add your own file. However, as stated above, we don't recommend doing this. During the quarantine it is impossible to upload a custom robots.txt.


Question: What should I do to forbid indexing of the following pages?
_http://site.ucoz.com/index/0-4
_http://site.ucoz.com/index/0-5

Answer: Add the following lines to the robots.txt file (in the "User-agent: *" section):
Disallow: /index/0-4
Disallow: /index/0-5
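
A rule like this can be checked before uploading with Python's standard urllib.robotparser module. The sketch below is only an illustration and assumes the two paths are written as full "Disallow:" directives inside a "User-agent: *" group; "Googlebot" is just a sample crawler name. Note that robots.txt rules match by prefix, so /index/0-4 also covers /index/0-45.

```python
from urllib.robotparser import RobotFileParser

# The rules from the answer, written as complete directives.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /index/0-4",
    "Disallow: /index/0-5",
])

# Collect the verdict for a few sample addresses.
checks = {
    url: parser.can_fetch("Googlebot", url)
    for url in (
        "http://site.ucoz.com/index/0-4",   # blocked
        "http://site.ucoz.com/index/0-5",   # blocked
        "http://site.ucoz.com/index/0-3",   # still crawlable
        "http://site.ucoz.com/index/0-45",  # also blocked: prefix match
    )
}

for url, allowed in checks.items():
    print(url, "allowed" if allowed else "blocked")
```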


Question: I have forbidden indexing of some links by means of robots.txt but they are still displayed. Why is it so?

Answer: By means of robots.txt you can forbid indexing of pages, not links.


Question: I want to make some changes in my robots.txt file. How can I do this?

Answer: Download it to your PC, edit it and then upload it back via File Manager or FTP.
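
The download-and-edit part can also be scripted. The Python sketch below is one possible approach, not an official uCoz tool: fetch_robots and add_disallow are hypothetical helper names, site_name is a placeholder for your own address, and the upload itself is still done through File Manager or FTP as described above. It assumes the file has a single "User-agent: *" line and places the new rule right after it.

```python
from urllib.request import urlopen


def fetch_robots(site_name: str) -> str:
    """Download the current robots.txt of the given site as text."""
    with urlopen(f"http://{site_name}/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def add_disallow(robots_text: str, path: str) -> str:
    """Return robots_text with 'Disallow: <path>' added to the '*' group."""
    rule = f"Disallow: {path}"
    lines = robots_text.splitlines()
    # Do nothing if the rule is already present.
    if any(line.strip() == rule for line in lines):
        return robots_text
    for i, line in enumerate(lines):
        if line.replace(" ", "").lower() == "user-agent:*":
            lines.insert(i + 1, rule)
            break
    else:
        # No '*' group found: start one at the top of the file.
        lines = ["User-agent: *", rule, ""] + lines
    return "\n".join(lines) + "\n"
```

For example, add_disallow(fetch_robots("site_name"), "/index/0-4") returns the edited text, which can then be saved as robots.txt and uploaded.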

I'm not active on the forum anymore. Please contact other forum staff.
Tia
Posts: 27
Reputation: -1

Message # 16 | 12:03 PM
Sunny (or Darts), my site is 28 days old, but my robots.txt looks this way:

User-agent: *
Disallow: /a/
Disallow: /stat/
Disallow: /index/1
Disallow: /index/2
Disallow: /index/3
Disallow: /index/5
Disallow: /index/7
Disallow: /index/8
Disallow: /index/9
Disallow: /panel/
Disallow: /admin/
Disallow: /secure/
Disallow: /informer/
Disallow: /mchat
Disallow: /search

So...does it mean that my site is not in quarantine any longer?
If not, then what is the problem? I'm sorry...for being so annoying.

P.S. When I type www.mysitename.com/index/1 (2 or 3) it takes me straight to my main page. Does it mean that my main page is blocked (index/1, 2 and 3 are in the robots.txt)? Or do I just not have those pages, and that's why it takes me to the main page?

Sunny
Posts: 9296
Reputation: 456

Message # 17 | 12:48 PM
Yes, it looks like your site is not in quarantine any more. As for the problem... try contacting Tech Support via your CP -> Help tab.

Quote (Tia)
Or do I just not have those pages, and that's why it takes me to the main page?

Yes, you don't have those pages.


Tia
Posts: 27
Reputation: -1

Message # 18 | 3:48 AM
Sunny! I did... :

Answer: You should download your existing http://site_name/robots.txt, change it and then upload to the root folder of your site.
Answered by: Sergio | Received: 2009-09-26, 5:43 Pm | Answered: 2009-09-27, 5:19 Pm

Hmmm... I thought I shouldn't touch robots.txt at all (I read your post about robots.txt, Sunny), and now he tells me to change it?
I don't understand!

Tia
Posts: 27
Reputation: -1

Message # 19 | 11:47 AM
Sunny, you closed my thread but I still didn't get the answer I'm looking for...
My original thread - http://forum.ucoz.com/forum/6-8045-1
So what am I supposed to do with robots.txt?

Quote (Sunny)
WE STRONGLY RECOMMEND NOT TO REPLACE THE DEFAULT robots.txt BY YOUR OWN. You may be sure, we do all possible for uCoz sites to develop better. Otherwise, what’s the use of uCoz?!

Quote (Tech Support)
Answer: You should download your existing http://site_name/robots.txt, change it and then upload to the root folder of your site.
Answered by: Sergio | Received: 2009-09-26, 5:43 Pm | Answered: 2009-09-27, 5:19 Pm

I don't get it...replace or not... If i need to replace it - where do i find this file (i can't find it in my File manager!)???

Quote (Sergio)
...change it and then upload to the root folder of your site.
...change what?

I also asked Tech Support if robots.txt was the reason why google was blocked by the server and he didn't answer...i thought indexing is prohibited only when your robots.txt is in quarantine???!!! Mine is not!

Please... explain!

Sunny
Posts: 9296
Reputation: 456

Message # 20 | 1:44 PM
Tia, I closed the thread because you created three threads about one and the same matter which is against forum rules.

Quote (Tia)
So what am I supposed to do with robots.txt?

Yes, it is not desirable to change the default robots.txt, but if you need to change it you can do so. I don't think a change is necessary in your case.

Quote (Tia)
If i need to replace it - where do i find this file (i can't find it in my File manager!)???

To replace it open your robots.txt (by the address _http://site_name/robots.txt), click File -> Save as (in your browser), save, edit the file, then upload it to File Manager.

Quote
I also asked Tech Support if robots.txt was the reason why google was blocked by the server and he didn't answer...i thought indexing is prohibited only when your robots.txt is in quarantine???!!! Mine is not!

Yes, indexing is prohibited while a website is in quarantine. Maybe at the time when Google was blocked your website was still in quarantine. Now it is not, and your website is indexed by Google, Yahoo!, Bing... If some of your pages are not indexed by Google, there can be many reasons, and the default robots.txt is hardly one of them. I advise you to read articles on website promotion and search engine optimization.


jaz
Posts: 35
Reputation: 1

Message # 21 | 8:34 AM
Friends, this is the robots.txt on my old domain (www.gospeldownloads.do.am/robots.txt):

Code

User-agent: uBot
Disallow: /a/
Disallow: /stat/
Disallow: /index/1
Disallow: /index/2
Disallow: /index/3
Disallow: /index/5
Disallow: /index/7
Disallow: /index/8
Disallow: /index/9
Disallow: /panel/
Disallow: /admin/
Disallow: /secure/
Disallow: /informer/
Disallow: /mchat
Disallow: /search

User-agent: *
Disallow: /

and on the new domain it looks like this (www.christian-downloads.co.cc/robots.txt):

Code
user-agent: *
Disallow: /a/
Disallow: /stat/
Disallow: /index/1
Disallow: /index/2
Disallow: /index/3
Disallow: /index/5
Disallow: /index/7
Disallow: /index/8
Disallow: /index/9
Disallow: /panel/
Disallow: /admin/
Disallow: /secure/
Disallow: /informer/
Disallow: /mchat
Disallow: /search

Sitemap: http://christian-downloads.co.cc/sitemap.xml
Sitemap: http://christian-downloads.co.cc/sitemap-forum.xml

I am facing a few problems in Webmaster Tools due to this. Is the robots.txt correct on both domains?


X-Zoner
Sunny
Posts: 9296
Reputation: 456

Message # 22 | 2:12 PM
jaz, the first website is in quarantine.
edyvlad
Posts: 5
Reputation: 0

Message # 23 | 3:29 AM
Hi, I need a little help if possible.
I have enabled the "friendly url" option for my site, but now I see in Google Webmaster Tools that Google has found pages on my site with duplicate titles and descriptions (and, I suppose, duplicate content too), for example:

http://mysite/news/2010-01-05-3
http://mysite/news/bla_bla_bla/2010-01-05-3

I guess this is not good for my site. I want to know whether I can stop Google from indexing the first URL by adding the following line to my robots.txt:

Code
Disallow: /news/2010-01-05-3

Will the other URL remain indexed, and will this solve my duplicate content problem?
If this is not the solution, please advise me what I should do.

Thanks in advance.

Sunny
Posts: 9296
Reputation: 456

Message # 24 | 10:22 AM
edyvlad, don't worry. Search engines usually see the links you show them, i.e. the links they find on your website and in the sitemap, and by now those should all be the friendly ones. At first search engines will of course show both old and new links, but the old links will gradually be filtered out and only the new ones will remain.
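
For reference, the effect of the Disallow line asked about above can be checked with Python's standard urllib.robotparser module. This is only a sketch under the assumption that the rule sits in a "User-agent: *" group and is matched as a plain prefix; "Googlebot" is used as a sample crawler name. It shows that the rule would block the old address only, because the friendly address does not start with the same path.

```python
from urllib.robotparser import RobotFileParser

# The single rule from the question, placed in a catch-all group.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /news/2010-01-05-3",
])

old_url = "http://mysite/news/2010-01-05-3"
friendly_url = "http://mysite/news/bla_bla_bla/2010-01-05-3"

# can_fetch returns True when crawling is permitted.
old_blocked = not rules.can_fetch("Googlebot", old_url)
friendly_blocked = not rules.can_fetch("Googlebot", friendly_url)

print("old URL blocked:", old_blocked)
print("friendly URL blocked:", friendly_blocked)
```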
Hilti26
Posts: 11
Reputation: 0

Message # 25 | 2:03 PM
Hi, how do I find out whether my website is still in quarantine? I'm trying to add a sitemap to Google.
Please help.
Thanks
Sunny
Posts: 9296
Reputation: 456

Message # 26 | 2:13 PM
Hilti26, you can see examples of both robots.txt files in the first message. If you see "User-agent: uBot" in yours, then your website is still in quarantine.
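
The same check can be sketched in Python with the standard urllib.robotparser module. This is an illustration, not an official uCoz check: looks_quarantined is a made-up name, and the test is simply whether an ordinary crawler ("Googlebot" is used as a sample) is forbidden from fetching the site root, which is what the quarantine files shown in this thread do.

```python
from urllib.robotparser import RobotFileParser


def looks_quarantined(robots_text: str) -> bool:
    """True if a generic crawler is not allowed to fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch("Googlebot", "/")


# Quarantine file from the first message.
quarantine = "User-agent: *\nDisallow: /\n"

# Shortened version of the uBot variant posted later in the thread.
ubot_variant = (
    "User-agent: uBot\n"
    "Disallow: /panel/\n"
    "Disallow: /admin/\n"
    "\n"
    "User-agent: *\n"
    "Disallow: /\n"
)

# Shortened version of a file served after quarantine removal.
released = "User-agent: *\nDisallow: /panel/\nDisallow: /admin/\n"

for name, text in [("quarantine", quarantine),
                   ("uBot variant", ubot_variant),
                   ("released", released)]:
    print(name, looks_quarantined(text))
```

To test a live site, the text of http://your_website_address/robots.txt would be passed to looks_quarantined instead of the sample strings.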
Hilti26
Posts: 11
Reputation: 0

Message # 27 | 5:12 PM
Okay, thank you for your help and fast reply.

Added (2010-02-06, 11:12 Am)
---------------------------------------------
My robots.txt page (www.appulous.eu/robots.txt) is blank.

It showed this a minute ago, and now it's a blank page:
User-agent: *
Disallow: /a/
Disallow: /stat/
Disallow: /index/1
Disallow: /index/2
Disallow: /index/3
Disallow: /index/5
Disallow: /index/7
Disallow: /index/8
Disallow: /index/9
Disallow: /main/
Disallow: /admin/
Disallow: /secure/
Disallow: /informer/
Disallow: /mchat

Sunny
Posts: 9296
Reputation: 456

Message # 28 | 11:29 AM
Hilti26, it works fine atm.
Mike7501
Posts: 1
Reputation: 0

Message # 29 | 10:10 AM
Hi Sunny, I also sent you this message, but I'm not sure if you got it, so I'm posting it here too:

I was reading your post about the robots.txt file. My website is over a year old, and yesterday it suddenly started getting blocked by robots.txt. Do you know how to stop or fix this?

thanks


Sunny
Posts: 9296
Reputation: 456

Message # 30 | 11:07 AM
Mike7501, provide the website url.