Robot visits to a Web Site

On the Internet a robot is not some mechanical human-like servant but just a special type of computer program. Ever wondered how search engines build up their indices of web sites? Search engines, amongst other programs, use robots to continually trawl web sites, analyzing the contents as if they were human visitors using a browser. Like browsers, robots use HTTP to access information. A server can state which pages robots should and should not inspect; these instructions are stored in the robots.txt file. Over time, search engines have grown much more sophisticated, and the way they scan sites is complex. Many first scan the site's index page, then come back weeks or months later to drill down into the rest of the web site.

The server log usually includes a browser (User-Agent) field in each log record, which gives the name of the robot. A well-behaved robot will not flood a server with requests, as this would affect the server's performance; instead it spreads its site scan over hours or days. Robots should include a contact URL or email address in the browser information sent in the HTTP request header so that a webmaster can identify the robot and analyze its activity.
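As a rough illustration of reading the browser field from a log record, the sketch below assumes the server writes the widely used "combined" log format, in which the User-Agent is the last quoted field. The sample log lines, the robot name ExampleBot and the keyword list are all hypothetical; real robots do not always identify themselves this way.

```python
import re

# Assumption: "combined" log format, where the final quoted field is the User-Agent.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical keyword list; many robots include one of these words in their name.
ROBOT_HINTS = ("bot", "crawler", "spider")


def robot_name(log_line):
    """Return the User-Agent string if the record looks like a robot visit, else None."""
    match = LOG_PATTERN.match(log_line)
    if match is None:
        return None
    agent = match.group("agent")
    if any(hint in agent.lower() for hint in ROBOT_HINTS):
        return agent
    return None


# Illustrative records only; the addresses and agents are made up.
ROBOT_LINE = (
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "ExampleBot/1.0 (+http://www.example.com/bot.html)"'
)
BROWSER_LINE = (
    '192.0.2.20 - - [10/Oct/2023:13:56:02 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"'
)

print(robot_name(ROBOT_LINE))
print(robot_name(BROWSER_LINE))
```

The contact URL in the hypothetical ExampleBot string shows the kind of information a well-behaved robot is expected to supply.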

Site Vigil supports the monitoring of robot activity as an important function of web site administration. Robots used by the search engines (e.g. Google) continually scan web sites to keep their indices up to date. Once search engines are regularly visiting a web site, you may want to control which areas of the site are visited. This is controlled by a file named robots.txt that is located in the root folder of a domain.

The file contains a set of directives that a robot should read before scanning a site; it states which pages are to be included in and excluded from such scans. You can use this facility to prevent pages being indexed by search engines. For example, you may have a set of test pages that you don't want to be seen yet, or you might want to keep images used on the site out of public search.
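To make the directives concrete, here is a minimal sketch that uses Python's standard urllib.robotparser module to interpret a small robots.txt file the way a well-behaved robot would. The file contents, the /test/ and /images/ folders, the robot name ExampleBot and the example.com URLs are all assumptions chosen to match the test-page and image examples above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the folder names are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
Disallow: /images/
"""


def build_parser(robots_txt):
    """Parse robots.txt text supplied as a string rather than fetched over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A robot asks before each request whether it may fetch the URL.
print(parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))
print(parser.can_fetch("ExampleBot", "http://www.example.com/test/draft.html"))
print(parser.can_fetch("ExampleBot", "http://www.example.com/images/logo.png"))
```

Note that robots.txt is advisory: it only keeps out robots that choose to honor it.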

You can also indicate that you do not want robots to index a page by using a META tag within the page itself:

<meta name="robots" content="noindex"/>
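If you want to confirm that a page really carries this tag, a small check along the following lines may help. It is a sketch using Python's standard html.parser module; the class name RobotsMetaParser and the function is_indexable are hypothetical names introduced here, and the check also treats the value "none" as blocking indexing, which is an assumption about how search engines interpret that value.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # The content attribute may hold several comma-separated directives.
        self.directives.extend(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def is_indexable(html):
    """Return False if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not ({"noindex", "none"} & set(parser.directives))


blocked_page = '<html><head><meta name="robots" content="noindex"/></head></html>'
open_page = "<html><head><title>Home</title></head></html>"

print(is_indexable(blocked_page))
print(is_indexable(open_page))
```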

Site Vigil monitors the visits made by robots by periodically scanning the log file and keeping track of all the information. It reports how many visits have been made and how long since the last visit by a particular robot.
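The kind of report described above can be sketched in a few lines. This is not Site Vigil's own implementation, only an illustration of the idea: it assumes the robot visits have already been extracted from the log as (robot name, timestamp) pairs, and the function name summarize_visits and the sample data are hypothetical.

```python
from datetime import datetime


def summarize_visits(visits, now):
    """Map each robot name to (number of visits, time elapsed since its last visit).

    visits: iterable of (robot_name, datetime) pairs taken from the server log.
    now:    the datetime to measure "time since last visit" against.
    """
    latest = {}
    counts = {}
    for robot, when in visits:
        counts[robot] = counts.get(robot, 0) + 1
        if robot not in latest or when > latest[robot]:
            latest[robot] = when
    return {robot: (counts[robot], now - latest[robot]) for robot in counts}


# Made-up sample data for two hypothetical robots.
sample_visits = [
    ("ExampleBot", datetime(2024, 1, 8, 12, 0)),
    ("ExampleBot", datetime(2024, 1, 9, 12, 0)),
    ("OtherBot", datetime(2024, 1, 1, 12, 0)),
]
report = summarize_visits(sample_visits, now=datetime(2024, 1, 10, 12, 0))

for robot, (count, since_last) in sorted(report.items()):
    print(f"{robot}: {count} visits, last seen {since_last.days} days ago")
```

A long gap since the last visit by a major search engine's robot is worth investigating, since it may mean the site is no longer being kept up to date in that index.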

Site Vigil™ is a sophisticated web site monitoring tool that lets you thoroughly check your site. Improve the effectiveness of your web site by tracking the flow of visitors. Make sure your site is operational and your web host is giving a fast, trouble-free service. Automatically check the position of your site on search engines and the validity of all the links.

It runs on an ordinary Windows® PC, using your normal Internet connection to gather data quietly in the background. It checks your site periodically and alerts you as soon as a problem is spotted.

Site Vigil is available for a free 40 day trial; give it a try by going to the download page.
To look at the program features in more detail, take an online tour of the product.
We have a large reference section and a set of case studies to help you with all your monitoring needs.