
On Page SEO Part 2: An Introduction To Signals of Quality

Updated on July 17, 2010

In the previous tutorial we looked at some basic on-page factors, including the alt attribute. It was suggested that every img tag should have an alt attribute, even if the image it refers to is entirely decorative (in which case an empty alt="" is appropriate). These changes might at first seem a bit pedantic, but they make for better accessibility and standards-compliant HTML.

Ensuring pages are accessible and standards compliant can mean a lot of work for webmasters rectifying things after a site has gone live, especially if every page contains multiple HTML errors. So is it worth the bother? The simple fact is that accessible sites are generally more search engine friendly and can be viewed on a wider range of devices and browsers.

Making sure that every piece of HTML on every page validates and meets current accessibility standards is a signal that a business cares about every single visitor to its website. Spammers using 'throwaway domains' are more likely to shy away from this type of work because of the labor, time and expense involved.
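For a site of any size, checking validation by hand is impractical, so it is worth automating. The sketch below shows one possible approach in Python, assuming the W3C Nu HTML Checker's JSON endpoint at validator.w3.org/nu and the requests library; the page URLs are placeholders for your own.

```python
import requests

# Hypothetical list of pages to audit; swap in your own URLs or a sitemap.
PAGES = [
    "http://www.example.com/",
    "http://www.example.com/contact.html",
]

# The W3C Nu HTML Checker accepts a raw HTML document via POST and,
# with out=json set, returns its findings as a JSON list of messages.
VALIDATOR = "https://validator.w3.org/nu/?out=json"

for url in PAGES:
    html = requests.get(url).text
    response = requests.post(
        VALIDATOR,
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html; charset=utf-8"},
    )
    errors = [m for m in response.json().get("messages", [])
              if m.get("type") == "error"]
    print(f"{url}: {len(errors)} validation error(s)")
```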

Signals of quality are rarely about relevance. For example, it is easy to understand why allowing a page to go live as an 'untitled document' would harm relevancy; it is far less obvious why including a telephone number would improve search engine rankings.

There is a distinct difference between quality and relevance, and search engines must balance both aspects in order to deliver the best results. The task of identifying quality is becoming increasingly important because of the amount of low-quality content being uploaded to the web every day.

Bayesian Filters

Bayesian filtering is used by most modern mail clients to weed spam out from legitimate email. Search engines use it to categorize documents, and Google uses it to deliver relevant AdSense ads. How do Bayesian filters work? The process starts with a list of sites that have been classified as high quality and another list classified as low quality. The filter looks at both and analyzes the characteristics common to each type of site.

Once the filter has been seeded and the initial analysis completed, it can be used to analyze every page on the web. The clever thing about Bayesian filters is that they continue to spot new characteristics and get smarter over time. Before we delve into any great detail on how Bayesian filters work, here are a couple of quotes from Matt Cutts regarding signals of quality that clearly show Google is addressing the problems caused by low-quality, mass-generated content.

“Within Google, we have seen a lot of feedback from people saying, Yeah, there’s not as much web spam, but there is this sort of low-quality, mass-generated content . . . where it’s a bunch of people being paid a very small amount of money. So we have started projects within the search quality group to sort of spot stuff that’s higher quality and rank it higher, you know, and that’s the flip side of having stuff that’s lower-quality not rank as high.”

“You definitely want to write algorithms that will find the signals of good sites. You know, the sorts of things like original content rather than just scraping someone, or rephrasing what someone else has said. And if you can find enough of those signals—and there are definitely a lot of them out there—then you can say, OK, find the people who break the story, or who produce the original content, or who produce the impact on the Web, and try to rank those a little higher. . . .”

Signals of quality have been mentioned in Google patents, and some specifics have been discussed by Google engineers, so hopefully the days of article mills and article spinners are numbered.

How Bayesian Filtering Works

Although it is known that search engines use Bayesian filtering, the exact algorithms are proprietary and unlikely to be made public. The behavior of Bayesian filters, however, is well understood, so let's start by looking at how the technique works.

To begin, a large sample or whitelist of known good documents (authoritative, highly trusted pages) and a large sample of known bad documents (pages from splogs, scraper sites and the like) are analyzed and the characteristics of each page compared. When a large corpus of documents is compared programmatically, patterns or 'signals' emerge that were hitherto invisible. These signals can then be used to produce a numeric value (or percentage likelihood) indicating whether the characteristics of any other page lean towards those of the original sample of good documents or those of the bad.
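In its simplest form, the numeric value for a single signal S comes straight from Bayes' theorem (the standard textbook formulation; how search engines actually weight signals is proprietary):

$$P(L \mid S) = \frac{P(S \mid L)\,P(L)}{P(S \mid L)\,P(L) + P(S \mid H)\,P(H)}$$

where L and H denote the low-quality and high-quality classes, and P(S|L) and P(S|H) are estimated from how often the signal appears in the bad and good seed samples respectively.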

A simple example would be to compare the words in the good documents to those in the bad documents. If it is discovered that many low-quality pages use terms like 'buy cheap Viagra' or have a section on each page for 'sponsored links', then other pages that do the same might be of low quality too. Conversely, if it is discovered that high-quality pages often contain a link to a privacy policy or display a contact telephone number, then other pages that do the same might also be high quality.
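To make that concrete, here is a toy word-based filter in Python. The seed 'corpora' are invented one-line stand-ins for the good and bad samples, and the combination step is the classic naive Bayes formula popularized by email spam filters, not anything a search engine has published:

```python
from collections import Counter
from math import prod  # Python 3.8+

# Invented seed corpora standing in for the good and bad samples described
# above; a real filter would be seeded with many thousands of pages.
GOOD_PAGES = [
    "contact us privacy policy telephone original research",
    "about the author privacy policy original content",
]
BAD_PAGES = [
    "buy cheap viagra sponsored links buy cheap pills",
    "sponsored links cheap viagra casino buy now",
]

def counts(pages):
    c = Counter()
    for page in pages:
        c.update(page.lower().split())
    return c

good, bad = counts(GOOD_PAGES), counts(BAD_PAGES)

def spamminess(word):
    # Smoothed estimate comparing P(word | bad) with P(word | good);
    # words unseen in either sample come out near 0.5, i.e. neutral.
    p_bad = (bad[word] + 1) / (sum(bad.values()) + 2)
    p_good = (good[word] + 1) / (sum(good.values()) + 2)
    return p_bad / (p_bad + p_good)

def page_score(text):
    # Classic naive Bayes combination (equal priors assumed): scores above
    # 0.5 lean towards the bad sample, below 0.5 towards the good one.
    s = [spamminess(w) for w in text.lower().split()]
    return prod(s) / (prod(s) + prod(1 - x for x in s))

print(page_score("buy cheap viagra now"))          # near 1.0: low quality
print(page_score("privacy policy and telephone"))  # near 0.0: high quality
```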

As the process continues, more signals are uncovered; in this way the filter learns to recognize other traits as good or bad. There are likely to be many signals of quality measured, each one adding to or subtracting from an overall score of a page's quality.

This means that SEOs, web designers and webmasters need to adopt a holistic approach that takes into account information architecture, relevancy, accessibility, usability, quality, hosting and user experience.
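As a rough sketch of how such a score might accumulate (the signals and weights below are invented purely for illustration; no search engine publishes such a list):

```python
# Hypothetical on-page signals and hand-picked weights; a real engine
# would learn both the signals and their weights from its seed data.
SIGNAL_WEIGHTS = {
    "has_title": 2.0,
    "html_validates": 1.0,
    "has_privacy_policy": 1.5,
    "has_phone_number": 1.0,
    "scraped_content": -4.0,
    "sponsored_links_block": -2.0,
}

def quality_score(signals):
    """Sum the weight of every signal detected on a page."""
    return sum(SIGNAL_WEIGHTS.get(s, 0.0) for s in signals)

print(quality_score({"has_title", "html_validates", "sponsored_links_block"}))
# 1.0: positive and negative signals pull the total in opposite directions
```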

The Link Structure of The Web

Although links will be covered in future tutorials, it makes sense to discuss some of the implications of recent changes in the link structure of the web now. Once upon a time, reciprocal links were all that was needed to achieve top search engine rankings. Because reciprocal links were easy to acquire, and made it easy for sites of lesser quality to outrank quality sites, search engines stepped in and devalued reciprocal links and the PageRank they passed.

One-way links were now the way to go, so a new market in selling one-way links emerged. Search engines again viewed this as a way to game the system, and paid links, if detected, were devalued so that they passed no value whatsoever. The nofollow attribute (rel="nofollow" on a link) was introduced so that, among other reasons, links could be sold without penalty. It has since been adopted for other reasons and is used on millions of blogs and some of the most popular social sites.

URL shortening is also popular, again on some of the most popular sites on the web. The upshot of all this is that although the web continues to grow, the ability of many millions of pages to link out and cast a vote for other pages has been removed. Of course you still get the traffic, which can be substantial if you make the front page of Digg. Because the link graph of the web is essentially in recession, search engines are again reevaluating the way they calculate rankings, and quality has many discernible signals.

The Need To Discern Quality

According to a study carried out by WebmasterWorld, the top 15 doorway domains are a haven for spam. The study analyzed popular search terms and discovered that more than 50% of the results were spam; 77% of the results from blogspot.com were found to be spam. The following table shows the level of spam found on the top 15 doorway domains:

Doorway Domain      Spam %
sitegr.com           100%
blog.hix.com         100%
blogstudio.com        99%
torospace.com         95%
home.aol.com          95%
blogsharing.com       93%
hometown.aol.de       91%
usaid.gov             85%
hometown.aol.com      84%
maxpages.com          81%
oas.org               78%
blogspot.com          77%
xoomer.alice.it       77%
netscape.com          74%
freewebs.com          52%

The study shows that, on the keywords tested, some of these services were used exclusively by spammers, while others had a very high percentage of spam. The reason is that these sites provide free blog space, which is a magnet for spammers who need to generate links to low-quality splogs or scraper sites quickly.

The next table compares the percentage of spam sites by top-level domain (TLD):

TLD      Spam %
.info     68%
.biz      53%
.net      12%
.org      11%
.com       4%

This research highlights the incredible amount of spam that exists on the web, but it would be unfair to penalize every .info domain, for example, just because a high percentage of .info domains are used by spammers. Conversely, it would be unwise to trust every .com even though, in general, they seem to be comparatively spam free. To discern quality, many signals have to be considered, covering every aspect of a website.

The next tutorial in this series will look at on-page signals of quality and why quality score is the new PageRank.
