Glossary S to Z
Scripts
Sockets
Search Engine
SecureFTP
Site Availability
Speed
Spider
TCP/IP
Traffic Analysis
URL
Virus
Watch
Web hosting
Who is
Scripts

HTML Script Files

Originally HTML did not have any scripting capability; what you saw on the page was fixed once it had been loaded. Scripts are run by the browser on the client computer and allow various animated effects to be generated simply. Typically they are used to make graphics change when the user hovers over or clicks on buttons. This would be very difficult and slow to do if the web server had to send back information each time the user moved the mouse.

When scripts were first introduced, support for script languages was patchy, with Microsoft® still pushing their own VBScript® in competition with the more widely supported JavaScript. JavaScript has now won and is compatible with most browsers in widespread use.

Any animated effects generated by a script should be used with restraint as these can distract and put people off visiting a site.

See also : Javascript Programming Resource

Sockets

Socket based Communication

Making the Internet Connection

Socket based communication is the principal mechanism that underpins the whole Internet. Sockets can also be used to communicate between processes on the same computer, and that is how they were first developed on early UNIX systems.

The client-server model is an appropriate scheme for the Internet as there are many clients making connection requests for information from one place (the server). A web server does not normally need to initiate communication with a client. However, HTTP is a one-off request-response protocol, which means a connection is only made momentarily; this is not an efficient use of resources if a client is going to request a whole set of information from the same place. HTTP has a Keep-Alive request header setting that asks for the socket connection to be kept open because further requests are expected.

See also : An introduction to socket programming on UNIX

Search Engines

Position on Search Engines

A search engine continuously monitors the Internet to build up massive databases that categorise the content of web sites according to keywords. Because a good position in the search engine results is such an important way of bringing visitors to a web site, a whole industry has been created to help web sites achieve a higher placement on the result list: Search Engine Optimization.

Originally a search engine used the META keywords tag in the HTML page header to determine the keywords for a page. This was soon abused by people who put in common search phrases and words just to fool the search engine into thinking that pages had relevant content. Now search engines tend to disregard individual elements in a page and look for consistent usage of keywords in the text, headers, title and tags of a page. They discriminate against sites that look as though they are trying to subvert the process by over-using keywords.

End users want to see the sites they are looking for at the top of the list. If a search engine shows sites of little relevance then the end user will go to another search engine that does give more appropriate results.
It is not in a web site owner's interest to attract traffic that is not appropriate to the web site, as users will just feel frustrated and go elsewhere. For much more on search engines please refer to our companion search engine reference pages.

Site Vigil monitors the position of sites in the results list from search engines and can raise an alert when the position of a web site changes.

Secure File Transfer

Transferring files securely over the Internet

FTP was designed in the early 1970s, before malicious software became a threat. FTP has a number of security weaknesses, especially the fact that the FTP connection command sends the user/account name and password in plain text over the Internet, so they can easily be intercepted. SecureFTP uses strong encryption so that neither the commands nor the data can be easily eavesdropped. It is based around the open source Secure Shell (ssh).

For more information see Secure FTP transfers via Secure Shell Tunneling

You can tell Site Vigil to access a web server's log files using normal FTP or SecureFTP.

Site Availability

The Internet is a lot more stable than it used to be. Five years ago it was quite common to find web sites unavailable for one reason or another. Sometimes the web site server itself was offline; more frequently it was a failure in the network communication infrastructure. Better reliability has been achieved by connecting web hosts to several network service providers, so if one route to the site goes down an alternative route is still available. Nowadays a web site should be available to all users over 99% of the time.

The most frequent reason for a web server to appear to go down nowadays is a denial of service attack. In this case a web site (usually co-located on the same server) is flooded by requests from multiple places (a distributed denial of service or DDoS attack). These may be launched by someone aiming to maliciously harm a web site and bring it down.

If a web host is running a flaky server or has an unreliable connection to the Internet then this will show up in measures of the availability of the web site. The worst scenario is that your site is down when a search engine is scanning it for inclusion in its search database; your site may then not be listed for several more months.
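The 99% figure above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming availability is measured as the share of periodic checks in which the site responded; the function name and the sample data are hypothetical and are not taken from Site Vigil.

```python
def availability_percent(check_results):
    """Return the percentage of checks in which the site responded."""
    if not check_results:
        raise ValueError("no checks recorded")
    return 100.0 * sum(1 for ok in check_results if ok) / len(check_results)


# Hypothetical sample: one check every 5 minutes for a day (288 checks), 2 failed.
checks = [True] * 286 + [False] * 2
print(f"{availability_percent(checks):.2f}%")  # 99.31%
```

Two failed checks in a day already bring the figure close to the 99% mark, which is why availability is best judged over a long period.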
Speed

Speed of Access to Web Sites

Now that web hosting companies have become more competitive at seeking out new business, there is less difference in speed than there used to be. It is often the case that the design of a web page will have a more dramatic effect on perceived speed than a slightly faster web host. Many sites use graphics that can be substantially reduced in file size, or redesigned so that they are smaller and simpler. Speed is also a measure of the communication system; it will often be the case (especially without a broadband connection) that the slowest step is getting the data from the Internet to the client computer over a 56K modem line.
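As a rough illustration of why graphic size matters on a modem line, the sketch below estimates the transfer time from file size and line speed. The two file sizes are hypothetical, and the calculation ignores protocol overhead and the fact that a 56K modem rarely reaches its nominal rate, so real times would be longer.

```python
def download_seconds(size_bytes, line_bits_per_second):
    """Idealised transfer time: 8 bits per byte, no overhead or latency."""
    return size_bytes * 8 / line_bits_per_second


MODEM_BITS_PER_SECOND = 56_000  # nominal rate of a 56K modem

# Hypothetical graphic before and after reducing its file size.
for label, size in (("original graphic", 200_000), ("optimised graphic", 40_000)):
    print(f"{label}: {download_seconds(size, MODEM_BITS_PER_SECOND):.1f} s")
```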
Site Vigil lets you measure the speed of access over time and build up a profile of the basic site speed.

Spidering a web site

Checking all pages on a web site

If a web site has more than a handful of pages it is very difficult to keep track of which page links to other pages and which pages use a particular graphics image. Some web designer tools will let you check pages before they are uploaded but this may not reflect the live content of the web site. As well as checking all the pages on a web site, a spider monitoring scan can establish that all the links to other web sites are working correctly too.

To perform this spider scan, a robot is used that reads each page on a web site in turn. It then analyses the HTML making up each page and adds any new links to pages or graphics that it finds to the list of pages to scan next. The spider monitor robot continues to scan the site until it has scanned every page it has found a reference to.

A well behaved spider must take account of the special directives put on the web site or individual pages to control what can be scanned. This is explained in our Controlling Robot Visit tip.
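The scan loop described above can be sketched in a few lines of Python. This is an illustration only, not Site Vigil's implementation: the `spider` function, the `fetch` callback and the in-memory `SITE` dictionary are all hypothetical, chosen so that the example runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects href and src attribute values (links to pages and graphics)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def spider(start_url, fetch):
    """Scan every same-host URL reachable from start_url.

    fetch(url) must return the page text, or None if the URL is missing.
    Returns (scanned, broken): URLs that were read and URLs that failed.
    """
    host = urlsplit(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    scanned, broken = [], []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            broken.append(url)
            continue
        scanned.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for link in parser.links:
            target = urljoin(url, link)  # resolve relative links
            # Only follow links that stay on the site being scanned.
            if urlsplit(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return scanned, broken


# Hypothetical site held in memory; b.htm is linked to but does not exist.
SITE = {
    "http://example.test/": '<a href="/a.htm">A</a> <img src="logo.gif">',
    "http://example.test/a.htm": '<a href="/">Home</a> <a href="b.htm">B</a>',
    "http://example.test/logo.gif": "",
}
scanned, broken = spider("http://example.test/", SITE.get)
print(scanned)
print(broken)
```

The sketch leaves out the robot directives mentioned above; a real spider would check them before fetching each page, for instance with the standard library's `urllib.robotparser`.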
TCP/IP Communication

Communicating over the Web

All the commonly used communications protocols (HTTP, HTTPS, FTP, Secure FTP, Gopher, DHCP, USENET) ultimately need to send out digital signals over a physical wire connection or with radio waves. Rather than each program using its own implementation of communication services, all of these protocols make use of a common underlying set of communication services called TCP/IP.

In the late 1960s the U.S. military research network pioneered a network of computers that remained remarkably stable considering the unreliability of the equipment in those days. It achieved this by using a set of communication protocols that are resilient to failure and loss. Each network node computer works independently; there is no 'master' node controlling the whole network. Each node dynamically maintains its own routing data as a 'map' of how to get information to a particular destination node. It exchanges routing information with only its immediate neighbours. This mechanism allows the network to 'self heal' when a network link or node becomes unavailable, and to re-adjust automatically when it becomes available again.

Network architecture is traditionally split into layers, starting at the top application layer and going progressively down towards the hardware. The Transmission Control Protocol (TCP) forms the Transport layer and beneath it the Internet Protocol (IP) forms the Network layer. The Internet may also use UDP (User Datagram Protocol) as an alternative to TCP in some circumstances.

In rough terms the Transport layer looks after assembling whole messages from individual small packets of data, whatever route they may take. The Network layer looks after getting individual packets across the network. If data packets are lost then TCP automatically retries the operation. It uses a simple acknowledgement interchange to ensure this.

Access to the communication stack is usually made by sockets.
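To show what access by sockets looks like in practice, here is a minimal sketch of a TCP exchange between a server and a client running on the same computer. The function names are hypothetical, and the "protocol" (the server replies with the message in upper case) is invented purely for the demonstration.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one client, reply with its message in upper case, then close."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def echo_roundtrip(message):
    # Server side: SOCK_STREAM selects TCP; port 0 lets the OS pick a free port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(("127.0.0.1", 0))
        server_sock.listen(1)
        port = server_sock.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server_sock,))
        worker.start()

        # Client side: the client, not the server, initiates the connection.
        with socket.create_connection(("127.0.0.1", port)) as client_sock:
            client_sock.sendall(message)
            reply = client_sock.recv(1024)
        worker.join()
    return reply


print(echo_roundtrip(b"hello"))
```

Neither side deals with packets, retries or acknowledgements; TCP handles those beneath the socket interface, which is the layering described above.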
See also : RFC1180 : A TCP/IP Tutorial

Traffic Monitoring

Monitoring visits to a Web Site

It's very important to keep an accurate measure of the level of interest in a web site. The number of hits is a very crude estimate of activity, as accesses to graphics images on pages are often treated as hits as well as the HTML page itself. Page impressions are a better measure as they ignore references to graphics. Similarly, scans by automated programs (robots) rather than real users are often counted as hits. More sophisticated analysis requires a group of requests from the same client to be treated as a single session; the number of sessions or user visits is a more useful measure.

The profile of activity during a week is important, as some sites are busy during the working day while other sites are busiest at weekends when users browse from home. It will also indicate where geographically the main source of visitors is located. If the traffic peak coincides with the peak of the Pacific timezone then that can quickly identify the main audience. A brief, high peak may indicate a scan by a robot, perhaps a search engine building an index for the web site. The traffic profile will indicate when to do site maintenance or when to increase available bandwidth.

Web site traffic analysis is an increasingly important tool for the efficient management of web sites. The most significant reason why traffic might suddenly rise is that some other web site or an advertisement promotes your site. These days that will normally be a mention in an online forum or blog. Even if it is not possible to use the Referral tracking information to work out where the new users are coming from, the date and time that it happens can indicate, for example, when a magazine advertisement campaign hits the streets. Similarly, if the traffic drops off unexpectedly this may indicate a major network routing problem that is blocking out a large number of users. Just because you can access a site does not guarantee that everyone else is equally fortunate.

See the Internet Traffic Report
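The difference between hits, page impressions and sessions can be illustrated with a small sketch. It assumes the log has already been reduced to (client, time, path) entries, treats a fixed list of file suffixes as graphics, and ends a session after 30 minutes of inactivity; all of these are assumptions made for the example rather than Site Vigil's rules, and robot traffic is not filtered out.

```python
from collections import defaultdict

GRAPHIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")  # assumed list of graphics
SESSION_GAP = 30 * 60  # assumed: 30 minutes of inactivity ends a session


def summarise(requests):
    """requests: iterable of (client, time_in_seconds, path) tuples."""
    hits = 0
    page_impressions = 0
    times_by_client = defaultdict(list)
    for client, when, path in requests:
        hits += 1  # every request counts as a hit
        if not path.lower().endswith(GRAPHIC_SUFFIXES):
            page_impressions += 1  # graphics are ignored here
        times_by_client[client].append(when)

    sessions = 0
    for times in times_by_client.values():
        times.sort()
        sessions += 1  # a client's first request always opens a session
        for earlier, later in zip(times, times[1:]):
            if later - earlier > SESSION_GAP:
                sessions += 1
    return {"hits": hits, "page_impressions": page_impressions, "sessions": sessions}


# Hypothetical log entries for two clients.
sample = [
    ("10.0.0.1", 0, "/index.htm"),
    ("10.0.0.1", 1, "/logo.gif"),
    ("10.0.0.1", 40, "/tour.htm"),
    ("10.0.0.1", 7200, "/index.htm"),
    ("10.0.0.2", 100, "/index.htm"),
    ("10.0.0.2", 101, "/logo.gif"),
]
print(summarise(sample))  # {'hits': 6, 'page_impressions': 4, 'sessions': 3}
```

Six hits shrink to four page impressions and three visits, which shows how far a raw hit count can overstate interest.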
URL

Specifying Which pages you Want with URLs

A Uniform Resource Locator (URL) is the way to specify what you want and how it should be provided. It has the following format :

<protocol>://<domain name>/<object name>?<query string>

The protocol specifies how to access the information; the same information may be available through several protocols (typically HTTP, HTTPS and FTP). HTTP is just one example of a protocol handler. Because web browsers usually use http: they normally accept www.mine.com as shorthand for the full http://www.mine.com

The domain name is the multi-level DNS name of the server e.g. www.silurian.com. More strictly www is a sub-domain name and silurian.com is the actual domain name. It is used in DNS to locate the web server that can provide the information.

The object name is the resource on the web site to access e.g. /win32/chart.htm. This need not be a file; it can be a program or any other named entity that the server wishes to make accessible.

The query string is an optional part that provides additional information for the server to give a tailored response. It is frequently used by HTML Forms (as in a query to a search engine) to specify extra information. A search engine query string usually has a whole set of keywords. For example http://www.google.com/search?sourceid=navclient&query=develop+XP+style is a query with keywords sourceid as navclient, and query as 'develop XP style'.

The port is an optional entry, written as :<port> after the domain name, that determines which port will be accessed on the server to retrieve the information. Normally this is implied by the protocol, so for HTTP the port defaults to 80.

You can gain access to authenticated areas of a web site by specifying a user name and password. If no user name is provided then the default public access permission control is used.
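The parts of a URL described above can be pulled apart with Python's standard `urllib.parse` module. The sketch below is illustrative: the `describe` helper, the `DEFAULT_PORTS` table and the second URL (with a user name, password and explicit port) are assumptions made for the example.

```python
from urllib.parse import parse_qs, urlsplit

# Ports implied by the protocol when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def describe(url):
    """Split a URL into the parts named in the text above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "user": parts.username,  # None when no user name is given
        "domain": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "object": parts.path,
        "query": parse_qs(parts.query),  # decodes '+' as a space
    }


print(describe("http://www.google.com/search?sourceid=navclient&query=develop+XP+style"))
# Hypothetical URL showing the optional user name, password and port.
print(describe("ftp://user:secret@ftp.example.test:2121/pub/file.txt"))
```

For the search engine URL the query comes back as `{'sourceid': ['navclient'], 'query': ['develop XP style']}` and the port as 80, matching the description above.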
Viruses

Server Virus Attacks

Although it's ordinary PCs which normally succumb to computer viruses, it's all too easy for web servers to become infected too. All that is needed is a security weakness to be exploited and arbitrary code can be executed on the server. This can often be because access to a port has not been blocked to external access. Some pernicious viruses will not only infect executable files held on the server but also insert extra script code into any HTML, PHP or ASP files they find on a server.
See also : Nimda

Site Vigil helps detect virus attacks by maintaining a checksum for web pages and alerts you when a page has been unexpectedly changed.

Watch WebSite Pages

When you want to know the pattern of accesses to individual pages, graphics or other web site resources, standard site statistics fail to deliver the level of detail you need. This facility is useful if you want to count the build up of accesses to a new area of a web site, or perhaps accesses to a download file or to a contact page, which may indicate that users are having problems. Of particular use is the ability to keep a watch on accesses to a 404 error handler page, so that you are quickly alerted when a page is missing anywhere on a web site.

Site Vigil lets you monitor the level of accesses to resources on a web site. It will then alert you when the number of accesses rises above or falls below a programmable threshold of hits per day. A detailed history of accesses gives you the information needed to check the historical pattern of access to a resource.

In order to monitor web site accesses Site Vigil needs access to the web server log files so that it can analyse the data.

Web Hosting

Choosing a good Web Host

A web hosting company looks after the web pages for a domain. The company has a high capacity connection to the Internet to allow anyone to access the web pages on their computer. A good web host will provide several independent connections to the Internet in case one of them fails.

The web service will normally dictate how much web space is available in total and how much bandwidth is allocated. Small sites may need only 10MB of web space and the actual bandwidth used may be as low as 10MB a month, but packages will often allow a much greater level of web traffic. For a large or very busy web site it will be necessary to have a dedicated server to host the site. The very largest web sites will have multiple server computers with complex failover and load balancing techniques to ensure fast access times.

Most small domains are hosted together in groups on a single server computer. Each server is accessed by a single unique IP address. If a web site experiences sporadic slow access it may be that some of the web sites with which it is co-located are overloaded.

You can get Site Vigil to inform you when the IP address for any domain name has changed. This is particularly useful when you want to track the transfer of a domain from one web hosting company to another.

WhoIs

Finding out who owns what

To find out information about a domain you can use a Who is service like Whois Source. A second way is to trace the owner of a domain's IP address. The Internet is split up into geographical areas,
each with its own controlling authority for IP addresses; for North America this is ARIN.

There follow two examples (they do not reflect real contact information) copied from Site Vigil screens.

Example Whois for Detron.com

Lookup for information about domain name 'www.detron.com'
Using information read from 'whois.internic.net,whois.networksolutions.com'

Registrant:
   Retro Aerospace (RETRO2-DOM)
   7600 Belfast Ave
   Oakland, CA 94719
   US

   Domain Name: RETRO.COM

   Administrative Contact:
      Rupert Bushell (38456655P) rui300@hotmail.com
      24672 Santa Clara St
      Hayward, CA 94544
      US
      510-521-3650

   Technical Contact:
      Hamish McCall (3823255P) rji340@hotmail.com
      24672 Santa Clara St
      Hayward, CA 94544
      US
      510-521-3650

   Record expires on 21-Mar-2006.
   Record created on 20-Mar-1994.
   Database last updated on 29-Dec-2004 11:47:28 EST.

   Domain servers in listed order:
      KW.RETRO.COM          205.179.156.5
      GATEWAY.RETRO.COM     64.81.61.130
      NS1.DSL.NET           209.87.64.70
      NS2.DSL.NET           209.87.79.232

Example Whois for 205.179.166.15

Lookup for information about IP address '205.179.166.15'
Using information read from 'whois.arin.net'

OrgName: DSN.net, Inc.
OrgID: FTCI
Address: 541 Long Wharf Drive
City: New Haven
StateProv: CT
PostalCode: 06612
Country: US

NetRange: 205.179.0.0 - 205.179.255.255
CIDR: 205.179.0.0/16
NetName: DSL-NET-21
NetHandle: NET-205-179-0-0-1
Parent: NET-205-0-0-0-0
NetType: Direct Allocation
NameServer: NS1.DSN.NET
NameServer: NS2.DSN.NET
Comment: ADDRESSES WITHIN THIS BLOCK ARE NON-PORTABLE
Comment: rwhois.scruz.net 4321
RegDate: 1995-03-20
Updated: 2004-07-29

OrgAbuseHandle: ABUSE177-ARIN
OrgAbuseName: Abuse
OrgAbusePhone: +1-203-778-1000
OrgAbuseEmail: abuse@dsn.net

OrgNOCHandle: NOC291-ARIN
OrgNOCName: Network Operations Center
OrgNOCPhone: +1-203-778-1000
OrgNOCEmail: noc@dsn.net

OrgTechHandle: IPADM54-ARIN
OrgTechName: IP Administration
OrgTechPhone: +1-203-772-1000
OrgTechEmail: ipadmin@dsn.net
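The NetRange and CIDR entries in the second example describe the same block of addresses, and the address that was looked up falls inside it. A short sketch using Python's standard `ipaddress` module shows the relationship; the values are the ones from the example record.

```python
import ipaddress

network = ipaddress.ip_network("205.179.0.0/16")   # the CIDR entry
address = ipaddress.ip_address("205.179.166.15")   # the address looked up

print(address in network)             # True
print(network[0], "-", network[-1])   # 205.179.0.0 - 205.179.255.255
print(network.num_addresses)          # 65536
```

The /16 suffix means the first 16 bits are fixed, leaving 16 bits (65536 addresses) within the block.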