On Page SEO Part 2: An Introduction To Signals of Quality

Updated on July 17, 2010

Peter Hoggan

In the previous tutorial we looked at some basic on-page factors, including the alt attribute. It was suggested that every img tag should have an alt attribute, even if the image is entirely decorative. These changes might at first seem a bit pedantic; however, they make for better accessibility and standards-compliant HTML.

Ensuring pages are accessible and standards compliant can cause a lot of work for webmasters trying to rectify things after a site has gone live, especially if every page contains multiple HTML errors. So is it worth all the bother? The simple fact is that accessible sites are generally more search engine friendly and can be viewed on a wider selection of devices and browsers.

Making sure that every piece of HTML code on every page validates and meets current accessibility standards is a signal that a business cares about every single visitor to its website. Spammers using ‘throwaway domains’ are more likely to shy away from this type of work because of the labor, time and expense involved.

Signals of quality are rarely about relevance. For example, it’s easy to understand why allowing a page to go live as an ‘untitled document’ would harm relevancy; it’s not so obvious why including a telephone number would increase search engine rankings.

There is a distinct difference between quality and relevance, and search engines must balance both in order to deliver the best results. The task of identifying quality is becoming increasingly important because of the amount of low-quality content uploaded to the web every day.

Bayesian Filters

Bayesian filtering is used by most modern mail clients to separate spam from legitimate email. Search engines use it to categorize documents and Google uses it to deliver relevant AdSense ads. How do Bayesian filters work? The process starts with a list of sites that have been classified as high quality and another list that has been classified as low quality. The filter looks at both and analyzes the characteristics common to each type of site.

Once the filter has been seeded and the initial analysis completed, it can be used to analyze every page on the web. The clever thing about Bayesian filters is that they continue to spot new characteristics and get smarter over time. Before we delve into any great detail on how Bayesian filters work, here are a couple of quotes from Matt Cutts regarding signals of quality that clearly show Google is addressing the problems caused by low-quality, mass-generated content.

“Within Google, we have seen a lot of feedback from people saying, Yeah, there’s not as much web spam, but there is this sort of low-quality, mass-generated content . . . where it’s a bunch of people being paid a very small amount of money. So we have started projects within the search quality group to sort of spot stuff that’s higher quality and rank it higher, you know, and that’s the flip side of having stuff that’s lower-quality not rank as high.”

“You definitely want to write algorithms that will find the signals of good sites. You know, the sorts of things like original content rather than just scraping someone, or rephrasing what someone else has said. And if you can find enough of those signals—and there are definitely a lot of them out there—then you can say, OK, find the people who break the story, or who produce the original content, or who produce the impact on the Web, and try to rank those a little higher. . . .”

Signals of quality have been mentioned in Google patents and some specifics have been discussed by Google engineers, so hopefully the days of article mills and article spinners are numbered.

How Bayesian Filtering Works

Although it is known that search engines use Bayesian filtering, the exact algorithm is of course proprietary and unlikely to be made public. However, the actions of Bayesian filters are well understood, so let’s start by looking at how Bayesian filtering works.

To begin, a large sample or white list of known good documents (authoritative, highly trusted pages) and a large sample of known bad documents (pages from splogs, scraper sites, etc.) are analyzed and the characteristics of each page compared. When a large corpus of documents is compared programmatically, patterns or ‘signals’ emerge that were hitherto invisible. These signals can then be used to provide a numeric value (or percentage likelihood) indicating whether the characteristics of other pages lean towards those of the original sample of good documents or those of the original sample of bad documents.

A simple example would be to compare the words in the good documents with those in the bad documents. If it is discovered that many low-quality pages use terms like ‘buy cheap Viagra’ or have a section on each page for ‘sponsored links’, then other pages that do the same might also be of low quality. Conversely, if it is discovered that high-quality pages often contain a link to a Privacy Policy or display a contact telephone number, then other pages that do the same might also be high quality.
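
The word comparison described here can be sketched as a tiny naive Bayes classifier. This is only an illustration of the general technique, not the algorithm any search engine actually uses: the seed pages, the function names and the choice of add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical seed sets; a real filter would be seeded with a very large corpus.
GOOD_PAGES = [
    "contact us by telephone read our privacy policy",
    "original research article with privacy policy and contact telephone",
]
BAD_PAGES = [
    "buy cheap viagra sponsored links",
    "sponsored links buy cheap pills buy now",
]


def word_counts(pages):
    """Count how often each word appears across a list of pages."""
    counts = Counter()
    for page in pages:
        counts.update(page.lower().split())
    return counts


GOOD_COUNTS = word_counts(GOOD_PAGES)
BAD_COUNTS = word_counts(BAD_PAGES)
VOCABULARY = set(GOOD_COUNTS) | set(BAD_COUNTS)


def log_likelihood(words, counts):
    """Log-probability of the words under one class, using add-one smoothing."""
    total = sum(counts.values()) + len(VOCABULARY)
    return sum(math.log((counts[w] + 1) / total) for w in words)


def probability_low_quality(text):
    """Estimate (0 to 1) how strongly a page resembles the bad sample."""
    # Words never seen in either seed set carry no evidence, so they are skipped.
    words = [w for w in text.lower().split() if w in VOCABULARY]
    # Equal priors are assumed because both seed sets are the same size.
    log_good = log_likelihood(words, GOOD_COUNTS)
    log_bad = log_likelihood(words, BAD_COUNTS)
    # Bayes' rule with equal priors, computed from the two log scores.
    return 1 / (1 + math.exp(log_good - log_bad))
```

Calling `probability_low_quality("buy cheap viagra")` returns a value well above 0.5, while a page mentioning a privacy policy and a telephone number scores below 0.5. A page made up entirely of unseen words returns exactly 0.5, meaning the filter has no evidence either way.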

As the process continues, more signals are uncovered. In this way the filter learns to recognize other traits and whether they are good or bad. There are likely to be many signals of quality measured, each one adding to or subtracting from an overall score of a page’s quality.

This means that SEOs, web designers and webmasters need to adopt a holistic approach that takes into account information architecture, relevancy, accessibility, usability, quality, hosting and user experience.
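
The idea of many signals each adding to or subtracting from an overall score can be sketched as a simple weighted sum. The signal names are taken from examples mentioned in this tutorial, but the weights are invented purely for illustration; real weights would be learned from the seed documents and are not public.

```python
# Hypothetical signals and weights, chosen only for illustration.
SIGNAL_WEIGHTS = {
    "has_privacy_policy_link": 1.0,
    "shows_telephone_number": 1.0,
    "valid_html": 0.5,
    "untitled_document": -1.5,
    "sponsored_links_section": -1.0,
}


def quality_score(page_signals):
    """Sum the weights of every known signal observed on a page."""
    return sum(
        SIGNAL_WEIGHTS[name] for name in page_signals if name in SIGNAL_WEIGHTS
    )
```

Under these assumed weights, a page with a privacy policy link and a telephone number scores 2.0, while an untitled page with otherwise valid HTML scores -1.0. No single signal decides the outcome, which is why a holistic approach matters.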

The Link Structure of The Web

Although links will be covered in future tutorials, it makes sense to discuss some of the implications of recent changes in the link structure of the web now. Once upon a time, reciprocal links were all that was needed to achieve top search engine rankings. Because reciprocal links were easy to acquire and made it easy to promote sites of lesser quality so that they outranked quality sites, search engines stepped in and devalued reciprocal links along with PageRank.

One-way links were now the way to go, so a new market in selling one-way links emerged. Search engines again viewed this as a way to game the system, and paid links, if detected, were devalued so that they passed no value whatsoever. The nofollow attribute was implemented so that, amongst other reasons, links could be sold without penalty. It has also been adopted for other purposes and is used on millions of blogs and some of the most popular social sites.

URL shortening is also popular and is used by some of the most popular sites on the web. The upshot of all this is that, although the web continues to grow, the ability of many millions of pages to link out and cast a vote for other pages has been removed. Of course you still get the traffic, which can be substantial if you make the front page of Digg. Because the link graph of the entire web is essentially in recession, search engines are again reevaluating the way they calculate rankings, and quality has many discernible signals.
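
To see how nofollow shrinks the number of votes in the link graph, consider the simplified sketch below. It treats every followed link as one vote and every nofollow link as no vote at all, which is an assumption made for clarity; the domains are placeholders and real link analysis is far more involved.

```python
from collections import Counter

# Hypothetical links: (source page, target page, carries rel="nofollow").
LINKS = [
    ("blog-a.example", "site-x.example", False),
    ("blog-b.example", "site-x.example", True),
    ("social.example", "site-x.example", True),
    ("blog-a.example", "site-y.example", False),
]


def count_votes(links):
    """Count inbound links per target page, skipping nofollow links."""
    votes = Counter()
    for _source, target, nofollow in links:
        if not nofollow:
            votes[target] += 1
    return votes
```

Here site-x.example is linked from three pages but receives only one vote, the same as site-y.example with its single followed link. The traffic from all three links still arrives, but two of the three votes have disappeared from the graph.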

The Need To Discern Quality

According to a study carried out by WebmasterWorld, the top 15 doorway domains are a haven for spam. The study analyzed popular search terms and discovered that more than 50% of the results were spam. 77% of the results from blogspot.com were found to be spam. The following list shows the level of spam found on the top 15 doorway domains:

Doorway Domain	Spam %
sitegr.com	100%
blog.hix.com	100%
blogstudio.com	99%
torospace.com	95%
home.aol.com	95%
blogsharing.com	93%
hometown.aol.de	91%
usaid.gov	85%
hometown.aol.com	84%
maxpages.com	81%
oas.org	78%
blogspot.com	77%
xoomer.alice.it	77%
netscape.com	74%
freewebs.com	52%

The study shows that, for the keywords tested, some of these domains were used exclusively by spammers, while others hosted a very high percentage of spam. The reason is that these sites provide free blog space, which is a magnet for spammers who need to generate links to low-quality splogs or scraper sites quickly.

The next list compares the percentage of spam sites by top-level domain (TLD):

TLD	Spam %
.info	68%
.biz	53%
.net	12%
.org	11%
.com	4%

This research highlights the incredible amount of spam that exists on the web, but it would be unfair to penalize every .info domain, for example, just because a high percentage of .info domains are used by spammers. Conversely, it would be unwise to trust every .com even though in general they seem to be comparatively spam free. To discern quality, many signals have to be considered, covering every aspect of a website.

The next tutorial in this series will look at on-page signals of quality and why quality score is the new PageRank.

Course Index

01: A Free SEO Training Course For Hubbers

02: SEO Course Outline

03: An Introduction to SEO

04: An Introduction to Search Engines

05: Search Engines and Latent Semantic Indexing

06: Search Engine Users

07: Keyword Research

08: Competitor Research

09: A Guide to PageRank

10: On Page SEO Part 1

11: On Page SEO Part 2 - Introduction To Quality Signals (You Are Here)
