You probably use a search engine (SE) like Google or Yahoo every time you use the internet, but do you know how they work? If you have a website or intend to create one, it is important to have at least a basic knowledge of them.
The most common type of SE is a crawling search engine. This uses a computer program called a 'spider' to build up a database of the contents of websites, which can then be searched by users. To do this it must first find web pages and store their content, and then analyse each page's content.
The spider starts from a list of known web addresses. It visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled." The spider returns to the site on a regular basis, such as every month or two, to look for changes. How often it returns depends on how important it considers the page or site to be. Every piece of text that the spider finds is indexed. This means sorting the data into a form that can be searched easily later, like a telephone directory.
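The crawl-then-index idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real spider is built: the "web" is a hypothetical in-memory dictionary called PAGES, the addresses under example.test are made up, and the function names crawl and build_index are invented for this example. A real spider would fetch pages over the network and parse their HTML.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": page address -> (page text, links found on it).
PAGES = {
    "example.test/": ("welcome to our garden shop",
                      ["example.test/tools", "example.test/seeds"]),
    "example.test/tools": ("garden tools and spades", ["example.test/"]),
    "example.test/seeds": ("flower seeds and vegetable seeds",
                           ["example.test/tools"]),
}


def crawl(start, pages):
    """Visit pages breadth-first by following links; return them in visit order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # a link to a page we cannot read is skipped
        order.append(url)
        _text, links = pages[url]
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def build_index(urls, pages):
    """Map each word to the set of pages containing it (an inverted index)."""
    index = defaultdict(set)
    for url in urls:
        text, _links = pages[url]
        for word in text.lower().split():
            index[word].add(url)
    return index


visited = crawl("example.test/", PAGES)
index = build_index(visited, PAGES)
print(visited)
print(sorted(index["garden"]))
```

Like the telephone directory, the index is organised by the thing you look up (a word) rather than by where it was found (a page), which is what makes later searches fast.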
The next two stages are the most interesting and the most shrouded in secrecy. A program ranks the importance of each website and each page according to a set of criteria, such as how many other sites link to it, how relevant these other sites are, and how these 'inward linking' sites are ranked themselves.
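To see how a page's rank can depend on the rank of the pages linking to it, here is a simplified link-counting iteration in the style of the publicly described PageRank idea. It is a sketch for illustration only: the real criteria used by search engines are not public, and the function name link_rank, the damping value of 0.85, the iteration count, and the four-page LINKS graph are all assumptions made for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that links from highly scored pages count for more.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            targets = [t for t in outgoing if t in links]
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: a, b and d all link to c; c links back to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = link_rank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page c scores highest because three pages link to it, and page a scores well above b and d even though it has only one inward link, because that link comes from the highly ranked c.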
When a search engine user searches for a word or series of words, the SE looks through its index for matching words. The all-important order in which it brings back the results determines where a page ranks in the search engine results page (SERP), and is calculated by a complex method involving many factors. Search engines such as Google, Yahoo and MSN keep their methods of doing this secret so that web developers cannot easily influence the results.
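A minimal sketch of the lookup step might look like the following. It assumes a tiny hand-written word index (INDEX) and a single importance score per page (SCORES); both, along with the page names and the search function, are hypothetical. Real engines combine many undisclosed factors rather than one number.

```python
# Hypothetical index: word -> set of pages that contain it.
INDEX = {
    "garden": {"shop/", "shop/tools", "blog/garden-tips"},
    "tools": {"shop/tools", "blog/garden-tips"},
    "seeds": {"shop/seeds"},
}

# Hypothetical importance scores, standing in for the secret ranking factors.
SCORES = {
    "shop/": 0.40,
    "shop/tools": 0.30,
    "blog/garden-tips": 0.20,
    "shop/seeds": 0.10,
}


def search(query, index, scores):
    """Return pages containing every query word, highest score first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    # Sort by descending score; ties are broken alphabetically.
    return sorted(matches, key=lambda url: (-scores.get(url, 0.0), url))


print(search("garden tools", INDEX, SCORES))
print(search("spades", INDEX, SCORES))
```

The first query returns only the pages that contain both words, ordered by score; the second returns an empty list because the word is not in the index.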
What we do know about the process is that the mission of the SE is to find the best results for the person searching, and that it sometimes penalises sites that try to influence this by purely artificial means.
Search Engine Optimization aims to present a site to the SE spiders in such a way that they can recognise the site as a useful reference on its particular subject. There are many aspects to this, and the techniques are far too numerous and complex to discuss here.