
There are other questions here on SO about what happens if you have both a meta robots tag and a robots.txt block, and I thought I understood what was happening until I came across this answer on the Google Webmasters site: https://support.google.com/webmasters/answer/93710

Here's what it says:

Important! For the noindex meta tag to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex tag, and the page can still appear in search results, for example if other pages link to it.

This is saying that if another site links to my page, then my page can still be indexed even if I have that page blocked by robots.txt.

The implication is that the only way to stop my page from being indexed is to allow crawling of it in robots.txt and use a meta robots tag to stop it being indexed (see the sketch below). This seems to completely defeat the purpose of robots.txt.
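If I've understood that correctly, the setup would have to look something like this (the page path is just an example): robots.txt must not block the page, and the page itself carries the noindex directive:

    # robots.txt -- an empty Disallow blocks nothing, so crawlers may fetch the page
    User-agent: *
    Disallow:

    <!-- in the <head> of /secret-page.html -->
    <meta name="robots" content="noindex">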


1 Answer


Disallow in robots.txt is for preventing crawling (a bot visiting your page), not for preventing indexing (the URL of your page, possibly with metadata, getting added to a search engine's database).
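For example, a rule like the following (the path is hypothetical) only tells well-behaved bots not to fetch anything under /private/; it says nothing about whether those URLs may appear in a search index:

    # robots.txt -- forbids fetching, not indexing
    User-agent: *
    Disallow: /private/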

If you block crawling of a page in robots.txt, you convey that bots should not visit the page (e.g., because there's nothing interesting to see, or because it would waste your resources), not that the URL of that page should be considered a secret.

The original specification of robots.txt doesn’t define a way to prevent indexing. Google seems to support a Noindex field in robots.txt, but just as an "experimental feature" that’s not documented yet.
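If Google does honor it, the field presumably mirrors the Disallow syntax, roughly as sketched below; since it is experimental and undocumented, don't rely on it, and prefer the meta robots noindex tag instead:

    # Assumed syntax only -- not part of the robots.txt specification
    User-agent: Googlebot
    Noindex: /private/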