Selectively Excluding Pages from Being Indexed

There are many times you may wish to exclude certain pages from being indexed by certain engines. One way to do this is to create a robots.txt file and upload it to the root directory of your Web site.

Basically, you just create a text file named robots.txt with Windows Notepad or any other editor that can save plain ASCII text.

Use the following syntax, giving each file name as a path that starts at the root of your site:

User-Agent: {SpiderNameHere}
Disallow: /{FilenameHere}

For example, to tell Inktomi’s spider, called Slurp, not to index the files orderform.html and junk.html in your root directory, create a robots.txt file as follows:

User-Agent: Slurp
Disallow: /orderform.html
Disallow: /junk.html

You would then upload this robots.txt file to the root directory of your Web site. Although this is a voluntary protocol, most major search engines will honor it.
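
One way to check a file like this before you upload it is to run it through the robots.txt parser in Python's standard library. The following is only a sketch under a few assumptions: the rules name Slurp and write each path from the site root, www.example.com is a placeholder host, and SomeOtherSpider is a made-up spider name. Python's parser will not necessarily make the same decision as every real spider.

```python
from urllib.robotparser import RobotFileParser

# Rules in the style of the example above (assumed: Slurp as the spider
# name, and each path written from the site root).
ROBOTS_TXT = """\
User-Agent: Slurp
Disallow: /orderform.html
Disallow: /junk.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# www.example.com is a placeholder host; SomeOtherSpider is hypothetical.
blocked = parser.can_fetch("Slurp", "http://www.example.com/orderform.html")
allowed = parser.can_fetch("Slurp", "http://www.example.com/index.html")
other_spider = parser.can_fetch(
    "SomeOtherSpider", "http://www.example.com/orderform.html"
)

print("Slurp may fetch orderform.html:", blocked)
print("Slurp may fetch index.html:", allowed)
print("SomeOtherSpider may fetch orderform.html:", other_spider)
```

Because the file names only Slurp, the rules say nothing to other spiders, which is why the last check comes back as permitted.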

You can add more lines to exclude pages from other engines by adding another User-Agent line to the same file, followed by more Disallow lines. Each Disallow line applies to the most recent User-Agent line above it. Leave a blank line between one spider's group of lines and the next. If you want to exclude an entire directory, use this syntax (shown here for Excite's spider, ArchitextSpider):

User-Agent: ArchitextSpider
Disallow: /mydirectory/

To exclude a directory from all spiders rather than a single one, use the wildcard (*) in the User-Agent line:

User-Agent: *
Disallow: /mydirectory/
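
To see how several groups in one file interact, you can again lean on Python's standard-library parser. This is a sketch, not a statement of how any particular engine behaves: the /private/ directory and the spider name AnyOtherSpider are made up for illustration, and the file mixes the directory and wildcard forms shown above.

```python
from urllib.robotparser import RobotFileParser

# Three groups separated by blank lines. /private/ is a hypothetical
# directory used only for this illustration.
ROBOTS_TXT = """\
User-Agent: ArchitextSpider
Disallow: /mydirectory/

User-Agent: Slurp
Disallow: /junk.html

User-Agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each check asks: may this spider fetch this path?
checks = {
    ("ArchitextSpider", "/mydirectory/page.html"): None,
    ("ArchitextSpider", "/junk.html"): None,
    ("Slurp", "/junk.html"): None,
    ("Slurp", "/mydirectory/page.html"): None,
    ("AnyOtherSpider", "/private/notes.html"): None,
}
for spider, path in checks:
    checks[(spider, path)] = parser.can_fetch(spider, path)
    print(f"{spider:16} {path:26} allowed={checks[(spider, path)]}")
```

The Disallow line for /junk.html binds only to Slurp, so ArchitextSpider is still free to fetch that file, while a spider with no group of its own falls under the * group.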

Do NOT use the wildcard (*) character in the Disallow line. The original robots.txt standard supports it only in the User-Agent line.

Make sure you use the proper syntax. If you misspell a field name or a path, the spider will not apply that rule.
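
Since a misspelled line fails silently, it can help to run a quick check before uploading. The function below is a hypothetical helper written for this article, not part of any standard tool. It knows only the two fields covered here (User-Agent and Disallow) and flags the mistakes mentioned above: unknown field names, a wildcard in a Disallow line, and a path that does not start at the site root.

```python
KNOWN_FIELDS = {"user-agent", "disallow"}  # only the fields covered here


def check_robots_txt(text):
    """Return a list of (line_number, message) for lines that look wrong."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue  # blank lines only separate groups
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        name = field.lower()
        if name not in KNOWN_FIELDS:
            problems.append((number, f"unrecognized field '{field}'"))
        elif name == "user-agent":
            seen_user_agent = True
        else:  # a Disallow line
            if not seen_user_agent:
                problems.append((number, "Disallow before any User-Agent"))
            if "*" in value:
                problems.append((number, "wildcard (*) in Disallow line"))
            if value and not value.startswith("/"):
                problems.append((number, "path does not start with '/'"))
    return problems


# A deliberately broken sample: a misspelled field, a wildcard path,
# and a path without a leading slash.
sample = """\
User-Agent: *
Dissallow: /mydirectory/
Disallow: /*.html
Disallow: junk.html
"""

problems = check_robots_txt(sample)
for number, message in problems:
    print(f"line {number}: {message}")
```

A clean file yields an empty list, so you could refuse to upload whenever the list is non-empty.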

