Using search keywords to find information on the Web

Everyone is now familiar with the term 'keyword': it is what you type into a search engine to find relevant web pages. However, like most Internet terms, its meaning has evolved over the last few years.

It originally referred only to the META keywords placed in the HEAD of a web page. These are explored in our META keywords page.

As search engines became more sophisticated, the meaning of 'keyword' widened: it could be not just one word but a phrase, or keyphrase. Before long you could use double quotes (") to ask for an exact sequence of words, so that the search engine gives more focused results. For example, "poll tax" gives results about the specific tax measure, while poll tax without the quotes also brings in matches for sticks of wood and for all types of tax.

Roughly speaking, the search engine has a huge index that is looked up by word. The results for a phrase depend on the order of the words: tax poll will give different results from poll tax, and so will "tax poll". Simple linking words such as "a", "the" and "of" are ignored (unless they appear inside double quotes).
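The idea of an index looked up by word can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real search engine is built: the stop-word list, the function names and the two sample pages are all invented here. The sketch returns the set of matching pages and does no ranking, so it does not model the way word order changes the results of an unquoted search.

```python
from collections import defaultdict

# Assumed stop-word list for illustration; real engines use their own.
STOP_WORDS = {"a", "the", "of"}
PUNCTUATION = '.,;:!?"'


def tokenize(text):
    """Lower-case the text and split it into words, dropping punctuation."""
    words = (raw.strip(PUNCTUATION).lower() for raw in text.split())
    return [w for w in words if w]


def build_index(pages):
    """Map each word to {page_id: [positions of the word in that page]}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(page_id, []).append(position)
    return index


def search_words(index, query):
    """Unquoted search: pages containing every non-stop word, in any order."""
    words = [w for w in tokenize(query) if w not in STOP_WORDS]
    if not words:
        return set()
    return set.intersection(*(set(index.get(w, {})) for w in words))


def search_phrase(index, phrase):
    """Quoted search: pages where the words appear adjacent, in this order."""
    words = tokenize(phrase)  # stop words are kept inside quotes
    if not words:
        return set()
    hits = set()
    for page_id, starts in index.get(words[0], {}).items():
        for start in starts:
            if all(start + i in index.get(w, {}).get(page_id, [])
                   for i, w in enumerate(words)):
                hits.add(page_id)
                break
    return hits


# Hypothetical sample pages.
pages = {
    "history": "The poll tax was a tax measure",
    "woodwork": "Cut a poll of wood and pay no tax",
}
index = build_index(pages)
print(sorted(search_words(index, "poll tax")))   # ['history', 'woodwork']
print(sorted(search_phrase(index, "poll tax")))  # ['history']
print(sorted(search_phrase(index, "tax poll")))  # []
```

Storing word positions, rather than just page names, is what makes the quoted search possible: the phrase matches only where each word sits directly after the previous one.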

The Ask Jeeves search engine was one of the first to claim to answer a user's question as if the software understood it, e.g. "How do I find cheap electrical products". In fact all it did was strip out the significant words from the question typed in and use them as a keyphrase, just like all the other engines. It did not understand the semantics of the keyphrase being used. You can test this with a question like "How do I find non expensive electrical products?": at present it will not give you cheap ones.
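Stripping a question down to its significant words might look something like the following sketch. The list of filler words and the function name are assumptions made for this example; the actual Ask Jeeves processing is not documented here.

```python
# Assumed list of filler words to discard from a typed question.
FILLER_WORDS = {"how", "do", "i", "find", "what", "where", "is", "can",
                "a", "an", "the", "of"}


def extract_keyphrase(question):
    """Return the words of the question that are not filler words."""
    words = (raw.strip('.,;:!?"').lower() for raw in question.split())
    return [w for w in words if w and w not in FILLER_WORDS]


print(extract_keyphrase("How do I find cheap electrical products"))
# ['cheap', 'electrical', 'products']
print(extract_keyphrase("How do I find non expensive electrical products?"))
# ['non', 'expensive', 'electrical', 'products']
```

The second keyphrase never contains the word "cheap", so a purely word-based lookup has no reason to return pages about cheap products.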

In early 2005 Google started to use more sophisticated keyword finding algorithms in its results. At present this is limited to the semantics of individual words, so that it knows that, say, "wood" and "timber" are related terms. If you put "wood" in a keyphrase you may get some "timber" results. This requires the search engine to be cleverer in the way that it scans web sites. It must not handle words in isolation but build up associations by context. If a web site mentions both "wood" and "timber" it gives a stronger hint about the main topic of the web site than if either word were used on its own. You can see what Google considers to be synonyms by using the special ~ character. So ~wood will give wood, timber, furniture and forest results; ~car will give not only general automobile results but also specific car makes (BMW, Chrysler, VW etc.).
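The effect of the ~ character can be imitated with a small table of related terms. The table below is hand-written from the examples in the text, and the function names are invented; how Google actually derives its associations is not described here, so treat this purely as a sketch of the query side.

```python
# Hand-written table of related terms, taken from the examples above.
RELATED_TERMS = {
    "wood": {"timber", "furniture", "forest"},
    "car": {"automobile", "bmw", "chrysler", "vw"},
}


def expand_query(query):
    """Turn each query word into a set of acceptable words.

    A word prefixed with ~ also accepts its related terms.
    """
    groups = []
    for word in query.lower().split():
        if word.startswith("~"):
            base = word[1:]
            groups.append({base} | RELATED_TERMS.get(base, set()))
        else:
            groups.append({word})
    return groups


def page_matches(page_text, query):
    """True if the page contains at least one word from every group."""
    page_words = set(page_text.lower().split())
    groups = expand_query(query)
    return bool(groups) and all(group & page_words for group in groups)


print(page_matches("seasoned timber for sale", "wood"))   # False
print(page_matches("seasoned timber for sale", "~wood"))  # True
```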

There is still a long way to go on this semantic analysis journey. To keep search engines happy many web sites use a rather stilted style, constantly repeating keywords and their synonyms. It will take more time before a search engine understands web content in a more human-like way. Currently the pressure is still on to keep the text more to the taste of search engines than of humans. Consider the complexity of working out what to make of text like "This is the best ever software, I think not !". At present the keyphrase "best ever software" will be given a high association when in fact the reverse should be the case. Such negations of meaning are very common and subtle in English; consider "All except IXWebHosting won the best web site host award" and "It is certainly not the case that IXWebHosting is the best ever web hosting company".
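The negation problem is easy to reproduce with a naive keyphrase counter. The function below is a hypothetical stand-in for whatever association score a search engine computes; it simply counts how often the keyphrase occurs as a run of consecutive words.

```python
def keyphrase_count(text, keyphrase):
    """Count occurrences of the keyphrase as consecutive words in the text."""
    def words_of(s):
        cleaned = (raw.strip('.,;:!?"').lower() for raw in s.split())
        return [w for w in cleaned if w]

    words = words_of(text)
    phrase = words_of(keyphrase)
    if not phrase:
        return 0
    return sum(
        1
        for i in range(len(words) - len(phrase) + 1)
        if words[i:i + len(phrase)] == phrase
    )


praise = "This is the best ever software"
sarcasm = "This is the best ever software, I think not !"
print(keyphrase_count(praise, "best ever software"))   # 1
print(keyphrase_count(sarcasm, "best ever software"))  # 1
```

Both sentences get the same count, although the second one means the opposite of the first. Word matching alone cannot tell them apart.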

Keyword analysis of web pages

