Generating a Good Sitemap
If the internet were a three-dimensional object, it would be a massive one. The world of “0s and 1s” cannot be seen, or even imagined by most people, yet it is a very large space. A few numbers give an idea of how big it is [1]:
- As of May 2011, there were approximately 39 billion web pages online.
- As of May 2011, Google had indexed around 36.5 billion web pages. Bing had indexed 11.5 billion web pages and Yahoo! had 10.6 billion pages in its search index.
Map Needed
With all these pages floating around on the internet, it would be almost impossible to find a single page without the help of search engines. These engines constantly crawl the internet, discovering and indexing the web pages they come across. Any webmaster who wants his or her site to surface in search results should give the web crawlers all the help they can get. One way of making sure that a website is properly crawled, and hence indexed, is to submit a Sitemap.
What is it?
A Sitemap, spelt with a capital ‘S’, is an XML file that lists the pages in a website [2]. It is created with the intention of serving as a map for search engine crawlers. This is especially useful if the site has dynamic content that changes frequently. It also helps if the site contains pages that are not linked to from other pages: because search engines find pages by following links, crawlers might otherwise never find and index them.
How to create it
To understand how to create a Sitemap, it is necessary to understand its structure and the tags it contains. As mentioned earlier, a Sitemap is an XML file; therefore the first line will read:
<?xml version="1.0" encoding="UTF-8"?>
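Next, the root <urlset> element is opened. Its xmlns attribute names the Sitemap protocol namespace; it is not the URL of the website and is the same for every site:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">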
Now the individual pages can be added. Along with each page's address, more information can be passed to the search engine crawlers:
· The exact location of the page: <loc>http://www.example.com/?id=who</loc>
· The day it was last modified: <lastmod>2009-09-22</lastmod>
· The frequency of update in the page: <changefreq>monthly</changefreq> Here it should be noted that the frequency can be set to ‘always’, ‘hourly’, ‘daily’, ‘weekly’, ‘monthly’, ‘yearly’ or ‘never’.
· The importance of the page with respect to other pages on the same site: <priority>0.8</priority> The range here is from 0.0 to 1.0 (0.0 = lowest, 1.0 = highest importance), with 0.5 being the default.
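The rules for these tags are easy to get wrong by hand, so a small check can help. The following is a minimal sketch in Python, not part of any official tool: the function name `check_entry` and its arguments are made up for illustration, and it only accepts the plain YYYY-MM-DD date form used in this article.

```python
from datetime import date

# Values the Sitemap protocol allows inside <changefreq>.
CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def check_entry(loc, lastmod, changefreq, priority):
    """Return a list of problems found in one page entry (empty if none).

    Hypothetical helper: the name and argument layout are assumptions
    made for this example.
    """
    problems = []
    if not loc.startswith(("http://", "https://")):
        problems.append("loc must be a full URL")
    try:
        # Rejects malformed dates, including ones with stray spaces.
        date.fromisoformat(lastmod)
    except ValueError:
        problems.append("lastmod must look like YYYY-MM-DD")
    if changefreq not in CHANGEFREQ_VALUES:
        problems.append("unknown changefreq: " + changefreq)
    if not 0.0 <= priority <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems


# The example entry from the list above passes every check.
print(check_entry("http://www.example.com/?id=who", "2009-09-22", "monthly", 0.8))
```

An entry with a stray space in the date, a misspelt frequency or a priority above 1.0 would come back with one message per problem.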
Putting it all together we would get:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/?id=who</loc>
    <lastmod>2011-01-21</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.1</priority>
  </url>
  <url>
    <loc>http://www.example.com/?id=what</loc>
    <lastmod>2011-02-23</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.9</priority>
  </url>
  <!-- further <url> entries, one per page -->
</urlset>
Once every page has been added as its own <url> </url> entry, the file ends with a closing </urlset> tag, and the result is a neatly created Sitemap.
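For a site with more than a handful of pages, the file can be generated rather than typed. The sketch below uses Python's standard xml.etree.ElementTree module and the namespace defined by the sitemaps.org protocol. The function name `build_sitemap`, the layout of the `pages` list and the output file name `sitemap.xml` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return the Sitemap for `pages` as UTF-8 encoded bytes.

    `pages` is a list of dicts with a "loc" key and, optionally,
    "lastmod", "changefreq" and "priority" keys (hypothetical layout).
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                # Strip stray spaces; ElementTree escapes characters
                # such as & when the text is serialized.
                ET.SubElement(url, tag).text = str(page[tag]).strip()
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# The two example pages used in this article.
pages = [
    {"loc": "http://www.example.com/?id=who", "lastmod": "2011-01-21",
     "changefreq": "monthly", "priority": 0.1},
    {"loc": "http://www.example.com/?id=what", "lastmod": "2011-02-23",
     "changefreq": "monthly", "priority": 0.9},
]

if __name__ == "__main__":
    # Write the result next to the script; the file name is an assumption.
    with open("sitemap.xml", "wb") as f:
        f.write(build_sitemap(pages))
```

Letting the library serialize the text also takes care of URLs that contain an ampersand, which must appear as &amp; inside an XML file.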
After saving it as an XML file, all that remains is to upload it to the root folder of the website and then submit its URL to the search engines.
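Before uploading, it can be worth confirming that the file is well-formed and lists the pages that were intended. The following is a rough sketch of such a check, assuming the file uses the sitemaps.org namespace; the function name `list_locations` and the sample data are made up for this example.

```python
import xml.etree.ElementTree as ET

# Prefix map for the Sitemap protocol namespace (sitemaps.org).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_locations(xml_bytes):
    """Parse a Sitemap and return the text of every <loc> element.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(xml_bytes)
    return [(loc.text or "").strip() for loc in root.findall("sm:url/sm:loc", NS)]


# A small sample with the two example pages used in this article.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/?id=who</loc></url>
  <url><loc>http://www.example.com/?id=what</loc></url>
</urlset>"""

for location in list_locations(SAMPLE):
    print(location)
```

If the list comes back empty for a file that clearly contains <url> entries, a likely cause is a wrong namespace on the <urlset> element.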
1 – World Wide Web Size: http://www.worldwidewebsize.com/
2 – Google Webmaster Central: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156184&from=40318&rd=1