Site Monitoring
Web Site Monitoring

Controlling Visits by Robots

Tip 3 : Robot Visits

Monitoring the activity of robots is an important part of web site administration. Robots used by search engines (e.g. Google) continually scan web sites to keep their indices up to date. Once search engines are visiting a web site regularly, you may want to control which areas of the site they visit. This is done with a file named robots.txt located in the root folder of the domain.

The file contains a set of directives that a robot should read before scanning a site; they state which pages are included in and excluded from such scans. Well-behaved robots honour these directives, but compliance is voluntary. You can use this facility to keep pages out of search engine scans. For example, you may have a set of test pages that you don't want to be seen yet, or you might want to keep images used on the site out of public search.

Example /robots.txt :

User-agent: *
Disallow: /newversion/
Disallow: /directory.htm

This excludes the newversion folder and the specific file /directory.htm from scans. The rule applies to all robots (you can make it apply to specific agents if you wish).
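To check how a robot would interpret the example rules above, they can be fed to Python's standard urllib.robotparser module. This is only a minimal sketch: the agent name "ExampleBot", the helper is_allowed and the test paths are made up for illustration, and the rules are supplied as text rather than fetched from a live site.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this page, supplied as text instead of being
# downloaded from a real /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /newversion/
Disallow: /directory.htm
"""


def is_allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules permit `agent` to scan `path`.

    `agent` is a hypothetical robot name; because the rules use
    "User-agent: *", any name gets the same answer.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.htm", "/newversion/test.htm", "/directory.htm"):
        print(path, "allowed" if is_allowed(path) else "excluded")
```

Disallow rules are matched as path prefixes, so everything beneath /newversion/ is excluded, not just the folder itself.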

You can also indicate that you do not want robots to index a page by using a META tag within the page itself :

<META NAME="ROBOTS" CONTENT="NOINDEX">
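When auditing a site it can be useful to confirm which pages actually carry this tag. The sketch below uses Python's standard html.parser module to look for a ROBOTS META tag containing NOINDEX; the class RobotsMetaParser and the function has_noindex are illustrative names, not part of any product, and the page is assumed to be available as a string.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <META NAME="ROBOTS"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are handled alike.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # CONTENT may hold several comma-separated directives.
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX"></head></html>'
    print(has_noindex(page))
```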

For further details please visit : www.RobotsTxt.org.

Site Vigil takes notice of the robots.txt directives when it scans web sites.