Basic SEO for HTML - Creating a Robots.txt Script for a Search Engine Crawler

Updated on August 31, 2013

Every Website Should Use The Robots.txt

Bleep Bleep, the robots are coming! The robots.txt file tells crawlers what is and isn't to be displayed in search results. This is basic SEO.

How Does Using robots.txt Help a Website's SEO?

Most search-engine-friendly websites use a robots.txt file.

The robots.txt file tells a crawler which URLs it may crawl and which it should leave alone, which in turn shapes what ends up indexed. It can also contain the exact URL of the sitemap.

In short, robots.txt is the tool for saying what you do and don't want indexed. The sitemap is an XML file that lists the URLs on your website and can say how frequently each one is updated.

It's best to break these scripts' functions and attributes up into sections so you are not confronted with a tonne of information at once. In this article we will discuss only the robots.txt.

Let's create a scenario to express how beneficial the script is and what it is used for.

You're at a newly built shopping mall and your child needs to use the bathroom... Urgently!

You have taken what you think must be a wrong turn, as you have absolutely no idea where you are.

There are no signs showing where the toilets are, you don't have a map of the mall, and you can't find a familiar area where toilets may be located.

What do you do?

You can ask a shop for directions (robots.txt)

The shopkeeper may not know the intricate workings and layout of the entire mall, but he has a good enough idea to tell you which toilets are in good condition and where you can find a map of the mall (sitemap).

The map of the mall (sitemap)

The map of the mall tells you pretty much everything. It tells you where the toilets are, as well as every other shop's location. By looking for a date on the map, you can also see how up to date it is, and the layout shows which shops have priority.

To sum everything up, the robots.txt script helps a crawler know what you want indexed and the sitemap shows where everything is.

Both of these scripts are used on any website wanting to increase its search engine potential.

These scripts (along with others) are core fundamentals of any website's SEO plan.

Creating The Script

Tools Needed
Notepad - Used to type the script into.
Allow: attribute - This is where the URLs and data you want displayed are typed.
Disallow: attribute - Blocks a specific URL or piece of data.
Filename - The file has to be saved as robots.txt. No capitals.

What do I need to create a robots.txt script?

Creating the robots.txt script is a straightforward process that requires no extra programs or paid services.

All you need, if you're running Windows, is Notepad.

To open Notepad, click on Start and select the Notepad icon.

If it's not in the list, type notepad in the search bar and Windows will locate it for you. For a video tutorial see DIY SEO - Robots.txt

When you have finished writing your robots.txt, it has to be saved with this exact filename: robots.txt.

Once the script has been saved, it has to be uploaded to the top-level (root) folder of that website so a search engine crawler can find it at an address such as http://www.yoursite.com/robots.txt.
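
If you would rather not type the file by hand, the saving step can also be scripted. The following is only a minimal sketch in Python: the function name write_robots_txt and the RULES text are made up for illustration, and the folder you pass in is assumed to be a local copy of your website's root folder.

```python
from pathlib import Path

# Example rules only (an assumption for this sketch); swap in your own.
RULES = """\
User-agent: *
Allow: /

Sitemap: http://www.yoursite.com/sitemap.xml
"""


def write_robots_txt(site_root, rules=RULES):
    """Write the rules to <site_root>/robots.txt and return the file path."""
    target = Path(site_root) / "robots.txt"  # exact filename, no capitals
    target.write_text(rules, encoding="utf-8")
    return target
```

Calling write_robots_txt("my-site") would create my-site/robots.txt, provided a folder called my-site already exists. The file still has to be uploaded to the website afterwards.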

Basic Functions and Attributes

A basic example makes the script easier to understand.

User-agent: *
Allow: /

Sitemap: http://www.yoursite.com/sitemap.xml

What is written here tells every search engine's crawler that everything on the website it is attached to is OK to be indexed.
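
One way to see how a crawler might read these lines is to run them through the robots.txt parser in Python's standard library. This is a rough sketch, not how any particular search engine works internally; the crawler names and the page URL passed to can_fetch are just sample values.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: http://www.yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Every crawler name falls under the * group, so both checks come back True.
home_allowed = parser.can_fetch("Googlebot", "http://www.yoursite.com/")
page_allowed = parser.can_fetch("Bingbot", "http://www.yoursite.com/apps/games.html")

# site_maps() needs Python 3.8 or newer; it returns the Sitemap lines as a list.
sitemaps = parser.site_maps()
```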

The User-agent:

This is where you name the search engine crawler the rules should affect, e.g. Googlebot, Bingbot etc.

Placing the * means every search engine will be affected.
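
To show the difference between naming one crawler and using the *, here is a small sketch that again uses Python's built-in parser. The /private/ folder and the notes.html page are hypothetical and only there to give the rules something to act on.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: Googlebot is kept out of /private/, everyone else is not.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.yoursite.com/private/notes.html"
googlebot_ok = parser.can_fetch("Googlebot", url)  # matched by the Googlebot group
bingbot_ok = parser.can_fetch("Bingbot", url)      # falls back to the * group
```

Here googlebot_ok comes back False and bingbot_ok comes back True, because a crawler follows the group that names it and only falls back to * when no group does.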

The Allow:

This section is where you place what is allowed to be indexed. Placing the / means everything will be indexed.

Alternatively, this can be Disallow:, which blocks a specific URL or item.
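
The three parts covered so far (User-agent, Allow or Disallow, and Sitemap) can be pieced together by a few lines of code. The helper below is purely illustrative; build_robots_txt and its parameter names are invented for this sketch and are not part of any library.

```python
def build_robots_txt(user_agent="*", allow=(), disallow=(), sitemap=None):
    """Return robots.txt text for one group of rules (illustrative helper)."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if sitemap:
        # A blank line keeps the Sitemap line visually apart from the rules.
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    allow=["/"],
    sitemap="http://www.yoursite.com/sitemap.xml",
)
```

Printing robots_txt gives the same basic example shown earlier in this section.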

Robots.txt Guidelines

Read all About it! Robots.txt Functions and Attributes

How to Add or Remove Specific Images and URLs

It may sound tricky to add or remove URLs and images in your robots.txt, but it truly isn't.

Once you have a grasp of what the specific attributes refer to, it becomes a lot easier to understand.

The following example shows how to disallow a specific page. An explanation can be found after the example.

User-agent: *
Disallow: /apps/games.html

Sitemap: http://www.yoursite.com/sitemap.xml

As described in a previous example, the asterisk (*) indicates that all search engines should adhere to the lines that follow.

As you may have noticed, the Disallow line (or Allow, for that matter) does not need the homepage URL.

The first forward slash (/) marks where your homepage address ends, so all you need to type is the part of the URL from that / onward. For example, the URL in the Disallow line wasn't written as http://yoursite.com/apps/games.html. All that was written was /apps/games.html.
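
If you are ever unsure which part of an address belongs in the rule, Python can split it for you. This is a small sketch under a couple of assumptions: rule_path is a made-up helper name, and /apps/index.html is a hypothetical second page used only to show that other pages stay unblocked.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def rule_path(full_url):
    """Return the part of a URL that goes in an Allow or Disallow line."""
    return urlparse(full_url).path or "/"


path = rule_path("http://yoursite.com/apps/games.html")

parser = RobotFileParser()
parser.parse(["User-agent: *", f"Disallow: {path}"])

# The named page is blocked; a different page in the same folder is not.
games_blocked = not parser.can_fetch("*", "http://yoursite.com/apps/games.html")
index_allowed = parser.can_fetch("*", "http://yoursite.com/apps/index.html")
```

In this sketch, path ends up as /apps/games.html, which is exactly what was typed in the Disallow line.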

Now that you know how to allow or block a URL, adding or removing pictures is very much the same concept.

All you do is give the folder and the image name as well.

The following example shows how a picture is removed:

User-agent: *
Disallow: /images/golf-cart.jpg

Sitemap: http://www.yoursite.com/sitemap.xml
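
A rule matches any URL that starts with the path you typed, so the same idea stretches from one picture to a whole folder. The sketch below checks both cases with Python's built-in parser; the blocked helper, the logo.png image and the folder-wide rule are assumptions added for illustration and do not appear in the example above.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules, url, agent="*"):
    """Return True if the given robots.txt rules stop the agent fetching the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, url)


ONE_IMAGE = "User-agent: *\nDisallow: /images/golf-cart.jpg\n"
WHOLE_FOLDER = "User-agent: *\nDisallow: /images/\n"  # hypothetical variation
BASE = "http://www.yoursite.com"

cart_blocked = blocked(ONE_IMAGE, BASE + "/images/golf-cart.jpg")
logo_blocked = blocked(ONE_IMAGE, BASE + "/images/logo.png")
logo_blocked_by_folder = blocked(WHOLE_FOLDER, BASE + "/images/logo.png")
```

With the single-image rule only golf-cart.jpg is blocked, while the folder rule blocks every picture under /images/.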

Now that you have the basics sorted, you're ready to create your own robots.txt.
