An Introduction To Search Engines
In the last unit, we explained why search engine visibility is important. In this unit we will take a closer look at search engines. Because SEO is about improving the visibility of your web pages in search engine results, we have to understand a bit about how search engines work. By the end of this unit you should be able to:
- understand what search engines do
- understand which search engines to concentrate on when optimising your site
- understand how search engines rank results
- measure the PageRank of individual web pages
- understand how to perform advanced searches
This unit assumes that you have read and understood the last part of the course and that you are comfortable with the terms: keyword, keyphrase and search engine optimisation.
2.1 What is a search engine?
Wikipedia defines a search engine as: ‘a program designed to help find information stored on a computer system such as the World Wide Web, or a personal computer. The search engine allows one to ask for content meeting specific criteria (typically those containing a given word or phrase) and retrieving a list of references that match those criteria. Search engines use regularly updated indexes to operate quickly and efficiently.’
In other words, a search engine is a sophisticated piece of software, accessed through a page on a website, that allows you to search the web by entering search queries into a search box. The search engine then attempts to match your search query with the content of web pages that it has stored, or cached, and indexed on its powerful servers in advance of your search.
Note: many search engines allow you to search for things other than text: for example, images. However, for the purpose of this course, we will focus on text-based searches. As we pointed out in the last unit, SEO methods are largely (but not exclusively) centred upon text as they involve matching key parts of the text in your web pages with the keywords or keyphrases that people actually type into search engines when looking for something on the internet.
There are two main types of search indexes we access when searching the web:
- directories
- crawler-based search engines
Directories
Unlike search engines, which use special software to locate and index sites, directories are compiled and maintained by humans. Directories often consist of a categorised list of links to other sites to which you can add your own site. Editors sometimes review your site to see if it is fit for inclusion in the directory.
Crawler-based search engines
Crawler-based search engines differ from directories in that they are not compiled and maintained by humans. Instead, crawler-based search engines use sophisticated pieces of software called spiders or robots to search and index web pages.
These spiders are constantly at work, crawling around the web, locating pages, and taking snapshots of those pages to be cached or stored on the search engine’s servers. They are so sophisticated that they can follow links from one page to another and from one site to another.
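The link-following behaviour described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any real search engine's spider is built: the `TINY_WEB` dictionary stands in for the real web, and the page addresses and the `crawl` function are hypothetical names invented for this example.

```python
from collections import deque

# A hypothetical, in-memory stand-in for the web: each address maps to
# the page text and the links found on that page.
TINY_WEB = {
    "a.example/home": {"text": "Welcome to A", "links": ["a.example/about", "b.example/home"]},
    "a.example/about": {"text": "About A", "links": ["a.example/home"]},
    "b.example/home": {"text": "Welcome to B", "links": ["b.example/offers"]},
    "b.example/offers": {"text": "Cheap web hosting offers", "links": []},
    "c.example/home": {"text": "Nobody links here", "links": []},
}


def crawl(start, web):
    """Follow links breadth-first from `start` and return a cache of page snapshots."""
    cache = {}
    queue = deque([start])
    while queue:
        address = queue.popleft()
        if address in cache or address not in web:
            continue  # already visited, or a dead link
        page = web[address]
        cache[address] = page["text"]  # take a snapshot of the page
        queue.extend(page["links"])    # follow links to other pages and sites
    return cache


cache = crawl("a.example/home", TINY_WEB)
print(sorted(cache))
```

In this sketch the page `c.example/home` is never cached, because no other page links to it. This is one reason why links pointing to your pages matter if you want spiders to find them.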
Google is a prominent example of a crawler-based search engine.
Note: Some search systems are ‘hybrid’ systems as they combine both forms of index. Yahoo, for example, features both directories and search engines.
As we will see later in this course, the SEO process often involves optimising your site in such a way that it allows search engine spiders to locate every page on your site quickly and easily.
Spidering vs submitting your site manually
If you browse the web, you will notice that many companies will offer to submit your site to search engines for inclusion in their listings. The services these companies offer are largely unnecessary and can prove to be a waste of time and money.
It is important to remember that search engine spiders are constantly crawling the web, following links and indexing pages. Because spiders automatically index your pages when they find them, there is absolutely no need to submit your site manually to the major search engines.
Note, however, that the process of being found can take some time, and it can be weeks before the major search engines index your site. SEO is a cost-effective way of making your site visible, but it can take time, especially for new sites. However, there are ways to accelerate the indexing process, which include XML sitemaps and RSS. Both of these topics will be covered in the next tutorial.
2.2 Which search engines to target?
In the last unit, we suggested that the vast majority of Internet users use search engines to locate products or services. This free system of listings is a more popular method of locating sites than paid-for advertising such as PPC and is thus a better way of improving the visibility of your website. But which search engines do you want to be found by and which search engines should you target?
Although the majority of Internet users rely on search engines to find what they are looking for, they do not all use the same search engines. There are, in fact, numerous search engines out there, all vying for a share in the lucrative search engine market. Here are just a few of the search engines that we use when looking for something on the Internet:
As you can see, then, there are numerous companies we can turn to when searching the Internet. Note, however, that not all of these search engines use truly distinct search technology. AOL, for example, bases part of its search results on Google. Teoma uses Ask Jeeves technology. Dogpile is a meta-crawler, which means that it searches all the major search engines for you and compiles results from places like Google, Yahoo, and Ask Jeeves.
This may seem like a bewildering array of search options and a formidable number of search engines to optimise your site for. However, we only have to concentrate on the largest players in the search engine market, as they have the most people using their search technology, and because they also act as search providers, leasing out their search technology to other search engines.
Let’s look at who the leading players are in the search engine market. The following chart, compiled from data provided by Hitwise, shows the search engine market share for December 2007, November 2007 and December 2006.
As we can see, Google, Yahoo, and MSN are the big players in the search engine market, accounting for just over 90% of the total market. This means that more people use their search technology to search for products or services on the web than any other search engine. For this reason, these are the search engines you should primarily focus on when analysing and optimising your site. Consequently, these are the search engines we will focus on throughout this course:
- Google – www.google.com
- Yahoo – www.yahoo.com
- MSN – search.msn.com
There are some important things to note about these search engines.
- each uses a different system to rank pages
- because different systems are used, a high ranking for specific keywords in one search engine does not automatically mean that your page will rank highly for the same keywords in another search engine
- nevertheless, each uses similar principles to determine the relevancy and importance of web pages in relation to search queries
2.3 Anatomy of a search
In the last unit of the course we began to show you how search engines work. For the sake of simplicity, we can consider the search process to work something like the following:
- Search engine spiders the web
- Search engine caches the pages that its spiders find on its servers
- User enters a search query
- Search engine checks the search query against its index
- Search engine returns what it believes to be the most relevant results for that query
Although the process is actually more complex than this, the above outline is useful in helping us to visualise how searches work. More importantly, it reminds us that when we enter a search term, the search engine does not actually rush off and check every page on the web. This would take far too long. Instead, it checks your search term against an index that is stored on its servers. Spiders working their way around the web constantly update this index.
Note: because pages are indexed in advance of searches, the results returned might be out of date. When you click on the link for one of the results, for example, you may find that the page has been updated since the search engine last spidered it, or even that the page you want has moved.
If I carry out a search for cheap web-hosting, the search engine checks its index to see which pages carry the terms ‘cheap’, ‘web’ and ‘hosting’. It then returns a results page containing what it believes are the most relevant pages for these particular keywords.
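To make the idea of checking a query against a pre-built index more concrete, here is a minimal Python sketch. It assumes a very simple index that maps each word to the pages containing it, and it only returns pages that contain every word in the query. Real search engines are far more sophisticated and do not publish their methods, so treat the page addresses, `CACHED_PAGES`, `build_index` and `search` as hypothetical names made up for this illustration.

```python
import re
from collections import defaultdict

# Hypothetical cached pages; in a real engine these come from the spider.
CACHED_PAGES = {
    "host-a.example": "Cheap web hosting for small businesses",
    "host-b.example": "Premium web hosting with support",
    "shop.example": "Cheap flights and hotels",
}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of pages that contain it (built in advance)."""
    index = defaultdict(set)
    for address, text in pages.items():
        for word in tokenize(text):
            index[word].add(address)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


index = build_index(CACHED_PAGES)
print(search(index, "cheap web-hosting"))
```

The query is split into ‘cheap’, ‘web’ and ‘hosting’, and only `host-a.example` carries all three terms. Notice that the search itself never reads the pages; it only looks words up in the index that was built beforehand.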
Let’s look at a typical search result page. This page shows the results for the above search in Google (Illustration 1). The results page is set out as follows:
- Search box with our search query
- The number of results Google returned for our search query (plus the time the search took)
- Sponsored links. This is paid-for advertising. For this results page, Google has selected adverts that are relevant to our search query.
- Search results. This section shows the pages that Google thinks are most relevant to our particular search terms. These listings are free.
- Link/Page title. The text is the exact text that appears between the title tags (<title></title>) on the page that the search result links to. Notice how keywords from our search query have been highlighted.
- Page description. This text is commonly the actual text that appears in the meta description of the page that the search result links to. This is the text between the quotation marks in the HTML tag <META NAME="description" content="YOUR TEXT HERE">. Again, Google has matched this text with our search query.
- Domain. This is the address of the page linked to.
- Cached page link. Unlike the above link, which links to the domain that the page is on, this link takes us to the cached version of the page that Google has stored on its server.
- More results. Links to further pages of results
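The link title and page description mentioned in the list above come from the page's own HTML, so it can be useful to see how they might be read out of a page. The following is a rough sketch using Python's standard `html.parser` module. The `SnippetParser` class and the `SAMPLE_PAGE` markup are made up for illustration, and this is not a description of how Google itself processes pages.

```python
from html.parser import HTMLParser


class SnippetParser(HTMLParser):
    """Collect the <title> text and the meta description from an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        # The parser passes tag and attribute names in lower case.
        attributes = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


# A made-up page used only for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Cheap Web Hosting | Example Host</title>
<META NAME="description" content="Affordable web hosting plans.">
</head><body><p>Hello</p></body></html>
"""

parser = SnippetParser()
parser.feed(SAMPLE_PAGE)
print(parser.title)
print(parser.description)
```

Running a check like this against your own pages is a quick way to see the title and description text that a search engine has available to show in its listings.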
We will now look at some of the ways in which search engines rank pages when determining search results.
Search providers use complex mathematical equations called algorithms to rank web pages. These algorithms make calculations about the relevance of words on web pages in relation to search queries or the perceived importance and link popularity of websites. They may also take other factors into account when ranking results, such as the age of the domain your site is on, or whether the terms used in a search query appear in the URLs of sites in the search engine’s index.
You may be surprised to learn that SEO professionals are not entirely sure how these algorithms work. In fact, search algorithms are a closely guarded trade secret. If they were made available to the public, we would see a lot more websites trying to find ways to exploit them in order to gain better search engine rankings.
Algorithms tend to be patented, and these patents can sometimes give SEO professionals a clue as to how search engines rank the relevance and importance of web pages. Otherwise, SEO involves a fair degree of trial and error, and most of the SEO process falls back upon tried and tested methods that circulate amongst the SEO community and that have been shown to be effective in improving search engine visibility (SEO websites and forums can be a good place to visit to see SEO professionals discussing these methods and exchanging ideas).
2.4.1 Page Importance
There are two main factors that search engines use to determine the position that pages will gain in search results:
- Keyword relevancy
- Page importance or link popularity
As we noted above, when you carry out a search query, the search engine tries to return relevant pages for that query by returning pages that contain the keywords in your search query.
However, search engines also take the importance of the page into account when ranking pages. This importance is based on the number of external links pointing to a page. The more links pointing to your pages, the more important they are deemed to be by the search engine.
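As a simple illustration of counting external links, the Python sketch below tallies, for each page, how many links point to it from pages on other sites. The `LINKS` data and the helper names are hypothetical, and the sketch assumes that the site is simply the part of the address before the first slash.

```python
from collections import Counter

# Hypothetical links as (source page, target page) pairs.
LINKS = [
    ("a.example/home", "b.example/home"),
    ("c.example/blog", "b.example/home"),
    ("b.example/about", "b.example/home"),  # internal link, same site
    ("a.example/home", "c.example/blog"),
]


def site_of(address):
    """Return the site part of an address such as 'a.example/home'."""
    return address.split("/", 1)[0]


def count_external_links(links):
    """Count links pointing to each page from pages on other sites."""
    counts = Counter()
    for source, target in links:
        if site_of(source) != site_of(target):
            counts[target] += 1
    return counts


print(count_external_links(LINKS).most_common())
```

Here `b.example/home` has three links pointing to it, but only two of them count as external because the third comes from the same site.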
The best example of this system of ranking pages is Google’s patented PageRank.
2.5 Google PageRank
Google’s PageRank is a system that rates the importance of pages in direct proportion to the number of external links pointing to that page.
PageRank exploits the network of links on the web in order to determine the relative value of individual web pages. It does this by counting the number of links pointing to one page from other sites. As Google puts it, a link to one of your pages from another site is considered a ‘vote’ in favour of that page. The more votes a page receives, the greater its value or perceived importance.
However, Google also takes the importance of the page that links to your page into account when determining the value of your page. If the page that links to you is already seen to have a high importance (in other words, if it already has a high PageRank), then the link it provides is ‘weighted’ higher than a link coming from a page with a lower PageRank or lesser importance.
Google then combines PageRank with page relevance to ensure that the pages returned in results are not only important in themselves but are also relevant to your search.
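The idea of weighted votes can be illustrated with a small Python sketch. Google does not publish how PageRank is calculated in practice, so this is not Google's implementation. It is a simplified version of the commonly described iterative approach, in which each page repeatedly shares its score among the pages it links to. The damping value of 0.85, the number of iterations, the function name `pagerank` and the `LINK_GRAPH` data are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate importance scores for a small link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its own score equally among the pages it links to.
                share = ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += damping * share
            else:
                # A page with no outgoing links spreads its score over every page.
                for other in pages:
                    new_ranks[other] += damping * ranks[page] / n
        ranks = new_ranks
    return ranks


# Hypothetical link graph: "hub" is linked to by several pages,
# and "hub" in turn links to "partner".
LINK_GRAPH = {
    "hub": ["partner"],
    "partner": ["hub"],
    "blog1": ["hub"],
    "blog2": ["hub", "lonely"],
    "blog3": ["hub"],
    "lonely": [],
}

ranks = pagerank(LINK_GRAPH)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this example `hub` comes out on top because it receives the most votes. Both `partner` and `lonely` have a single incoming link, but `partner` scores higher because its link comes from the highly rated `hub`, which is the weighting effect described above.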
You can find Google’s own explanation of PageRank here:
Pages Vs Websites
Google PageRank applies to individual pages and not websites as a whole. Pages on the same site will often have a different PageRank.
It is important to note this emphasis on individual pages rather than sites as a whole. Similarly, when we carry out a search in a search engine, the results returned refer to individual pages rather than whole sites. This makes absolute sense from the point of view of both the search engine and the user. Some pages within a website will usually be more important than others, e.g. the homepage. Also, individual pages within websites are not always relevant to the same things and may cover topics that are unrelated to a user’s search query.
From an SEO point of view, you will be looking to optimise individual pages so that they rank for different keywords. We will show you effective methods of achieving this later in the course.
Although PageRank is specific to Google, most of the major search engines now use a similar system to determine the position of pages in search results.
2.5.1 How to check PageRank
It is particularly important that you learn how to understand and measure PageRank, as it will play a significant part in your future SEO efforts. The ability to measure PageRank will help you to analyse competitors’ web pages and to keep track of how well your own web pages are faring when they are optimised and online.
To measure Google PageRank you must first install the free Google Toolbar into your browser.
2.5.2 Installing the Google Toolbar
To get the toolbar, navigate your browser to the following URL:
Different versions of the toolbar are available for different browsers like Internet Explorer and Firefox. Google should automatically detect which browser you are using and offer a download for the appropriate version.
2.5.3 Measuring PageRank
With the toolbar installed, try browsing the web. As you navigate from page to page, the little bar next to ‘PageRank’ will fill up green as the PageRank for the current page increases and go down as the PageRank for the current page decreases.
To get a more accurate numeric measure of PageRank, hover your mouse over the part of the Toolbar that reads PageRank. A small dialogue box should appear with the following text:
‘PageRank is Google’s measure of the importance of this page (x/10)’
where x is the actual value of the page out of a total of 10. The higher the number, the higher the PageRank for that page.
You can now measure the PageRank of web pages. This has many SEO applications, including:
- The ability to measure the PageRank of your own pages
- The ability to measure the PageRank of competitors’ web pages
- The ability to measure the PageRank of potential link partners. Remember that the more important the site that links to you is, the more weight is given to that link, hence the greater your perceived importance.
TASK 1: MEASURING PAGE RANK
Let’s try measuring the PageRank of some web pages:
- Install the Google Toolbar into your browser, making sure that you enable advanced options.
- Once installed, try carrying out a search for the kind of products or services that your website offers.
- Visit all the pages returned on the first page of search engine results, and note down their PageRank.
- Compare the PageRank of these pages. Which pages have the highest measure of importance and which the lowest?
- Search engines allow us to search the web by entering search queries that the search engine compares against its index of web pages.
- The leading search engines are currently Google, Yahoo, and MSN.
- Crawler-based search engines use software called spiders to crawl the web and index web pages.
- Search engines use complex mathematical algorithms to rank web pages.
- Search engine ranking is based on a combination of page relevance and page importance.
- Page importance (or PageRank) is based on the link popularity of a web page and the quantity and quality of external links pointing to that page.
- PageRank is calculated on a per-page basis and does not apply to websites as a whole.
Search Engines are sophisticated engines that allow users to quickly locate products and services on the Internet. Since SEO is aimed at improving your visibility in search engine results, it is essential that you understand the criteria they use to rank web pages. In the next units of this course we will show how to use search engines to help locate the right keywords for your products and help analyse the competition you will face in search engine listings.
What do you understand by the following terms?
- Search Engine
- Search query
- Page Importance
- Link Popularity
Once you feel that you can satisfactorily explain these terms, move on to the next unit of the course.