
Search engine indexing

Introduction
When you need to find something in a book, you usually go to the index at the back or sometimes the chapter headings at the front. Without an index, you would have to go through each page in turn, scanning all the words from top to bottom. You would have to do this every time you wanted to find something, and it would take a long time. The same is true when you search for information on the Internet or in any other digital source. Without an index, a search engine would have to go through each document in turn to see whether it was relevant to your search. Each search would take a long time and a lot of computing power.

Search engine indexing
Search engine indexing, also called web indexing, is the process of examining web pages and other digital information, breaking them down to see what they contain, and then storing that data so that information can be searched for and retrieved as quickly and accurately as possible. An index takes up additional storage space, and it has to be rebuilt regularly to stay up to date, which takes extra processing time. Even so, the benefits in terms of how quickly and accurately information can be found far outweigh these drawbacks.
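One common way to store this kind of data is an inverted index, which maps each word to the documents that contain it. The sketch below is a minimal illustration of the idea, not how any particular search engine works. The function names (tokenize, build_index, search) and the sample documents are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample documents, made up for illustration.
documents = {
    "page1": "Spiders crawl the web to find pages",
    "page2": "An index makes searching pages fast",
    "page3": "Garden spiders spin a web",
}

index = build_index(documents)
print(sorted(search(index, "web spiders")))
```

The index is built once, which costs storage and processing time. After that, a search only looks up the query words instead of scanning every document, which is the trade-off described above.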

Meta information (metadata)
Digital documents can have information embedded into them by whoever produced them, such as the author's name, keywords, a description and so on. In the early days of the Internet, when hardware was not as powerful as it is today, whole documents were not examined and indexed. Only meta information was used to create indexes. This meant that people and organisations spent a lot of time trying to get their metadata entries perfect, so that their documents were indexed effectively and appeared high up in the results when somebody did a search. The problem was that it was easy to insert unrelated metadata into documents, which could lead to spam results (results where the metadata had nothing to do with the documents themselves). As technology developed, whole-text indexing became the norm. The whole document was examined and indexed, which made the results far more focussed and accurate.
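To make the idea of metadata concrete, the following sketch reads the title and the named meta tags from an HTML page using Python's standard html.parser module. The class name MetadataParser, the helper extract_metadata and the sample page are assumptions made for this example. The sample page deliberately has metadata that does not match its body text, which is the weakness of metadata-only indexing described above.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect the <title> text and named <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attrs.get("name")
            content = attrs.get("content")
            if name and content is not None:
                self.metadata[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_metadata(html):
    """Return a dict of the metadata found in the given HTML text."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


# Hypothetical page: the metadata claims one topic, the body covers another.
page = """
<html>
  <head>
    <title>Garden spiders</title>
    <meta name="description" content="A guide to common garden spiders.">
    <meta name="keywords" content="spiders, garden, wildlife">
    <meta name="author" content="A. N. Author">
  </head>
  <body><p>Cheap holidays! Nothing about spiders here.</p></body>
</html>
"""

metadata = extract_metadata(page)
for name, value in metadata.items():
    print(f"{name}: {value}")
```

An index built only from this metadata would file the page under "spiders", even though the body text is about something else. A whole-text index would also see the body.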

How indexing works
Companies that provide search engines, such as Yahoo or Google, use software robots known as 'web crawlers' or 'spiders' to find web pages to index. The information gathered about these pages is stored in data centres around the world.
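The basic behaviour of a crawler can be sketched as follows: fetch a page, store it, collect the links on it, and repeat for every link not yet visited. This is a simplified illustration only. To keep it runnable without a network connection, it crawls a tiny made-up "web" held in a dictionary (FAKE_WEB). The names LinkParser and crawl and the example URLs are assumptions for this sketch, and a real crawler would also have to deal with relative links, politeness rules and failures.

```python
from collections import deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: fetch a page, store it, then queue its links."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # the page could not be retrieved
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical miniature "web": absolute URLs mapped to HTML (no network).
FAKE_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> '
                         '<a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://c.example/": '<a href="http://d.example/">D (missing)</a>',
}

pages = crawl("http://a.example/", FAKE_WEB.get)
print(list(pages))
```

The seen set stops the crawler from visiting the same page twice when pages link back to each other, and the pages it returns are what would then be passed on for indexing.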

When you search for something, the algorithms developed by these companies start searching their data centres for relevant matches. The algorithms take into account a whole range of criteria, including how recently a web page was added or updated, how many words are on the page, the metadata used, what sort of information it holds (the text, the language, types of pictures, videos, links, etc.), recommendations of the site, links to the site from other sites, the website title and address, and so on. As results are collected together, they are ranked and then displayed, so that you get the most relevant results at the top of your search results. All of this happens in a fraction of a second!
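The ranking step can be pictured as giving each matching page a score built from several criteria and then sorting by that score. The sketch below is purely illustrative: the Page fields, the three signals chosen and the weights are invented for this example, and the ranking algorithms real search engines use are far more complex and are not public.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    inbound_links: int      # links to this page from other sites
    days_since_update: int  # how recently the page was changed


def score(page, query_words):
    """Combine a few signals into one number (the weights are arbitrary)."""
    words = page.text.lower().split()
    matches = sum(words.count(w) for w in query_words)
    if matches == 0:
        return 0.0  # the page does not mention the query at all
    relevance = matches / len(words)
    popularity = page.inbound_links
    freshness = 1.0 / (1 + page.days_since_update)
    return 10.0 * relevance + 0.1 * popularity + 1.0 * freshness


def rank(pages, query):
    """Return the URLs of matching pages, highest score first."""
    query_words = query.lower().split()
    scored = [(score(p, query_words), p.url) for p in pages]
    scored = [(s, url) for s, url in scored if s > 0]
    return [url for s, url in sorted(scored, reverse=True)]


# Hypothetical pages, made up for illustration.
pages = [
    Page("old.example", "spiders spiders spiders", 0, 1000),
    Page("popular.example", "garden spiders and their webs", 90, 30),
    Page("unrelated.example", "cheap holidays in the sun", 500, 1),
]

print(rank(pages, "spiders"))
```

With these made-up weights, the page with many inbound links outranks the page that simply repeats the search word, and the page that never mentions the word is left out however popular it is.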

Companies spend a lot of time, effort and money trying to ensure that when someone searches for a product they sell, they appear on the first page of results and not the tenth, where they are unlikely to be noticed. Anyone who wants a high ranking has to be proactive.

    • The site should be informative, relevant and updated regularly.
    • Metadata should be added using the Title, Description and Keywords tags in HTML pages.
    • Your website should be actively submitted to search engines, to tell them that it exists and should be indexed. For example, you can submit your website to Google here: https://www.google.com/webmasters/tools/submit-url?pli=1
    • Your site should be linked to from as many other sites as possible, as search engines view this as a good indication that a website is recommended. You can encourage this by submitting your website to online directories, by giving other people a clickable tool that makes it easy to link to you, by being active on social media sites and providing links to your website there, and by offering to exchange links with other website owners.
