Block bots from crawling your website using robots.txt
Before search engines can index your website, they need to crawl it. To do that, they use programs called bots (also known as crawlers or spiders). Sometimes you might not want all of your pages to appear in search results. In that case, create a robots.txt file in the root directory of your website.
In this file you can block all bots, or only bots from specific search engines. See also: All options for user-agent in robots.txt file.
Each section in the robots.txt file is separate and does not build upon previous sections. For example:
User-agent: *
Disallow: /folder1/

User-Agent: Googlebot
Disallow: /folder2/
In this example, only the URLs matching /folder2/ would be disallowed for Googlebot, because a bot follows only the most specific section that names it, not a combination of all sections.
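As a rough sketch of this group-selection rule (a simplification, not Google's actual parser), the logic looks like this: a crawler picks the single group that best matches its name, falling back to the * group only if no named group applies.

```python
def rules_for(user_agent, groups):
    """Pick the one robots.txt group a crawler obeys.

    groups maps a User-agent name (or "*") to its list of Disallow paths.
    A crawler follows only the most specific matching group; "*" is the
    fallback. Groups are never combined.
    """
    specific = [g for g in groups
                if g != "*" and g.lower() in user_agent.lower()]
    if specific:
        return max(specific, key=len)  # longest name = most specific
    return "*" if "*" in groups else None

groups = {"*": ["/folder1/"], "Googlebot": ["/folder2/"]}
print(rules_for("Googlebot", groups))  # Googlebot -> only /folder2/ applies
print(rules_for("Bingbot", groups))    # no named group, so "*" applies
```

So Googlebot ignores the * section entirely and is only blocked from /folder2/, while any other bot falls back to the * section and is blocked from /folder1/.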
To block a specific image from Google Images:

User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

To block all images on your site from Google Images:

User-agent: Googlebot-Image
Disallow: /

To block all files of a specific file type (for example, .gif):

User-agent: Googlebot
Disallow: /*.gif$

To block any directory whose name begins with "private":

User-agent: Googlebot
Disallow: /private*/

To block all URLs that contain a question mark (?):

User-agent: Googlebot
Disallow: /*?

To block all URLs ending in .xls:

User-agent: Googlebot
Disallow: /*.xls$
You can use this pattern matching in combination with the Allow directive. For instance, if a ? indicates a session ID, you may want to exclude all URLs that contain one so that Googlebot doesn't crawl duplicate pages. But URLs that end with a ? may be the version of the page that you do want indexed. In that case, set up your robots.txt file as follows:
User-agent: *
Allow: /*?$
Disallow: /*?
The Disallow: /*? directive will block any URL that includes a ? (more specifically, it will block any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).
The Allow: /*?$ directive will allow any URL that ends in a ? (more specifically, it will allow any URL that begins with your domain name, followed by any string, followed by a ?, with no characters after the ?).
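To make the interaction between these two rules concrete, here is a minimal sketch of the wildcard matching (again a simplification, not Google's actual implementation): * matches any run of characters, a trailing $ anchors the pattern to the end of the path, and when both an Allow and a Disallow rule match, the longer (more specific) rule wins, with Allow winning a tie.

```python
import re

def matches(pattern, path):
    """Return True if a robots.txt path pattern matches the URL path.

    '*' matches any run of characters; a trailing '$' anchors the match
    to the end of the path. Patterns always match from the start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

def is_allowed(path, allow=(), disallow=()):
    """Longest matching rule wins; Allow wins a tie (and no match = allowed)."""
    best_allow = max((len(p) for p in allow if matches(p, path)), default=-1)
    best_disallow = max((len(p) for p in disallow if matches(p, path)), default=-1)
    return best_allow >= best_disallow

# /page? ends in ?, so the more specific Allow: /*?$ wins:
print(is_allowed("/page?", allow=["/*?$"], disallow=["/*?"]))        # True
# /page?sid=123 has characters after the ?, so only Disallow: /*? matches:
print(is_allowed("/page?sid=123", allow=["/*?$"], disallow=["/*?"]))  # False
```

With this pair of rules, /page? stays crawlable while /page?sid=123 is blocked, which is exactly the session-ID scenario described above.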