Basic SEO for HTML - Creating a Robots.txt File for Search Engine Crawlers
Every Website Should Use a Robots.txt File
How Does Using robots.txt Help a Website's SEO?
The robots.txt file is used by all search-engine-friendly websites.
The robots.txt tells a crawler which URLs should be indexed and which URLs should not. It can also contain the exact URL of the sitemap.
The robots.txt is the tool used to declare what you want indexed and what you don't. The sitemap is an XML file that lists the locations of all URLs on your website and how frequently they are updated.
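As a quick illustration, pointing crawlers at your sitemap takes a single line in the robots.txt (the domain below is a placeholder; use your own):

```
Sitemap: http://yoursite.com/sitemap.xml
```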
It's best to break this file's functions and attributes up into sections so you aren't confronted with a tonne of information at once. In this article we will discuss only the robots.txt.
Let's create a scenario to express how beneficial the script is and what it is used for.
You're at a newly built shopping mall and your child needs to use the bathroom... Urgently!
You have taken (what you think) must be a wrong turn as you have absolutely no idea where you are.
There are no signs showing the toilets' location, you don't have a map of the mall, and you can't find a familiar area where toilets might be located.
What do you do?
You can ask a shop for directions (robots.txt)
The shopkeeper may not know the intricate workings and layout of the entire mall, but he has a good enough general idea to tell you which toilets are in good condition and where you can find a map of the mall (sitemap).
The map of the mall (sitemap)
The map of the mall tells you pretty much everything: where the toilets are, as well as every other shop's location. By looking for a date on the map, you can also see how up to date it is, and the layout shows which shops are given priority.
To sum everything up, the robots.txt file helps a crawler know what you want indexed, and the sitemap shows where everything is.
Both of these files are used by any website wanting to increase its search engine potential.
These files (along with others) are the core fundamentals of any website's SEO plan.
Creating The Script
Notepad - Notepad (or any plain text editor) must be open to type the directives into.
Allow: attribute - This is where the URLs and data you want displayed are typed.
Disallow: attribute - Disallows a specific URL or file.
The filename must be saved as robots.txt. No capitals.
What do I need to create a robots.txt file?
Creating the robots.txt file is a straightforward process that requires no extra programs or paid services.
All you need, if you're running Windows, is Notepad.
To open Notepad, click on Start and select the Notepad icon.
If it's not in the list, type notepad in the search bar and it will locate it for you. For a video tutorial see DIY SEO - Robots.txt
When you are finished writing your robots.txt, it has to be saved with the exact filename robots.txt.
Once the file has been saved, it has to be uploaded to the root directory of that website (so it is reachable at http://yoursite.com/robots.txt) so a search engine crawler can access it.
Basic Functions and Attributes
To better comprehend the file, a basic example is needed.
What is written here tells every search engine's crawler that everything on the website it is attached to is OK to be indexed.
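A minimal robots.txt of this permissive kind, matching the sections described below, looks like:

```
User-agent: *
Allow: /
```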
The User-agent: section is where the search engine crawler you want to affect is specified, e.g. Googlebot, Bingbot, etc...
Placing the * means every search engine's crawler will be affected.
The Allow: section is where what is allowed to be indexed is placed. Placing the / means everything will be indexed.
Alternatively, this can also be Disallow:, which blocks a specific URL or item.
How to Allow or Disallow Specific Images and URLs
It may sound tricky to add or remove URLs and images in your robots.txt, but it truly isn't.
Once you have a grasp of what the specific attributes are referencing, it becomes a lot easier to understand.
The following example shows how to disallow a specific page. An explanation can be found after the example.
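A sketch of such a file, using the page path /apps/games.html discussed below on the hypothetical domain yoursite.com:

```
User-agent: *
Disallow: /apps/games.html
```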
As described in a previous example, the asterisk (*) indicates that all search engine crawlers should adhere to the lines that follow.
As you may have noticed, in the Disallow: section (or Allow:, for that matter) there is no need to include the homepage URL.
The first forward slash (/) stands for everything after your homepage URL. All you need to type is the path after the /, i.e. the URL in the Disallow: section wasn't written as http://yoursite.com/apps/games.html; all that was written was /apps/games.html.
Now that you know how to allow or block a URL, adding or removing pictures is very much the same concept.
All you do is add the folder and the image filename as well.
The following example shows how a picture is blocked.
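A sketch, using a hypothetical folder and image name (/images/logo.jpg) on your own domain:

```
User-agent: *
Disallow: /images/logo.jpg
```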
Now you have the basics sorted, you're ready to create your own robots.txt.