Robots Exclusion Standard

The robots exclusion standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention for asking cooperating web spiders and other web robots not to access all or part of a website that is otherwise publicly viewable. Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code. A robots.txt file on a website acts as a request that specified robots ignore specified files or directories when they crawl the site. A site owner might want this, for example, to keep certain pages out of search engine results, because the content of the selected directories could be misleading or irrelevant to the categorization of the site as a whole, or because an application should only operate on certain data.
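
Rules can also be addressed to a single robot rather than to all of them. As a minimal sketch (using Googlebot purely as an illustrative crawler name and /private/ as a hypothetical directory), the following file asks that one robot to skip a directory while leaving robots not named in the file unrestricted:

User-agent: Googlebot
Disallow: /private/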

The protocol, however, is purely advisory. It relies on the cooperation of the web robot, so marking an area of a site out of bounds with robots.txt does not guarantee privacy. Some web site administrators have tried to use the robots file to hide private parts of a website from the rest of the world, but the file is necessarily publicly available, and its content is easily checked by anyone with a web browser.
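
Because enforcement is left entirely to the client, a well-behaved crawler checks the file itself before fetching a URL. The following minimal Python sketch, using the standard library's urllib.robotparser and a hypothetical example.com site, shows what such a check looks like:

from urllib.robotparser import RobotFileParser

# Point the parser at the site's robots.txt (example.com is a placeholder)
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the file

# can_fetch() reports whether the given user agent may request the URL;
# with a "Disallow: /images/" rule this would print False
print(rp.can_fetch("*", "https://example.com/images/photo.jpg"))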

Example

Create a plain-text file named robots.txt containing the following and upload it to the top-level (root) directory of your site:

User-agent: *
Disallow: /images/

This stops all conforming web spiders and bots from indexing your /images/ directory. Disallowing the root path (/) keeps conforming spiders from indexing anything on the site, as shown below.
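
For example, this file blocks every conforming robot from the entire site:

User-agent: *
Disallow: /

Conversely, an empty Disallow line (Disallow: with no value) places no restrictions on the robots it applies to.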

External link