Can Article Spinning Beat Google’s Duplicate Content Filters?
Article Spinning Software provides authors with the ability to create variations of existing articles and documents for publication and syndication on the web. Words, phrases, sentences or entire paragraphs can be rewritten using alternate words and synonyms in order to put a new spin on the original, hence the term spinning.
- Most spinners have the option to completely automate the rewriting process; however, this invariably produces unreadable garbage.
- A huge amount of human input is required if you want the finished article to be readable.
- Although it is claimed that spinning can produce unique content, a more accurate description would be that the content is simply paraphrased.
Are Article Spinners Capable Of Delivering On Their Promise?
The purpose of spinning articles is to reposition existing content so that it can be syndicated through article directories like ezinearticles.com and thousands of others. It is claimed that syndicating spun content makes it possible to beat Google’s duplicate content filters. So the burning question is: are article spinners capable of delivering on their promise?
But first, the article has to be indexed. Prior to indexing, search engines have no idea what the page is about, so unique content is given no advantage over duplicate content when a page is indexed for the first time. How long initial indexing takes depends on multiple factors, for example how well the page is linked from other powerful pages within the site. However, none of these factors have anything to do with how unique, how well written, or how complete the article is.
Article Spinning and Host Crowding
Host crowding is a filter that limits the number of results returned from any one website in response to a search query. In effect, only two pages from the same site can be returned; these may be grouped together in the search results, with the second result slightly indented, or they could be pages apart. The problem host crowding presents to authors is that there may already be tens or even hundreds of pages on the same site targeting the term they want to chase.
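Host crowding can be thought of as a filter applied after ranking. The sketch below assumes the "at most two results per host" rule described above; the function name `host_crowd` and the example URLs are hypothetical, and Google's real grouping and indentation logic is not public.

```python
from collections import defaultdict
from urllib.parse import urlparse


def host_crowd(ranked_urls, per_host=2):
    """Keep at most `per_host` results from any one host, preserving rank order.

    A simplified, hypothetical model of host crowding, not Google's actual code.
    """
    shown = defaultdict(int)  # host -> number of results already kept
    kept = []
    for url in ranked_urls:
        host = urlparse(url).netloc.lower()
        if shown[host] < per_host:
            kept.append(url)
            shown[host] += 1
    return kept


results = [  # made-up ranked results, best first
    "https://ezinearticles.com/a",
    "https://ezinearticles.com/b",
    "https://example.org/x",
    "https://ezinearticles.com/c",
    "https://ezinearticles.com/d",
]
print(host_crowd(results))
# ['https://ezinearticles.com/a', 'https://ezinearticles.com/b', 'https://example.org/x']
```

Only the two best-ranked pages from the directory survive, so a new article on the same host has to outrank those pages before it can be shown at all.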
You can check how many pages on ezinearticles.com target ‘Article Spinning’ by using Google’s site: search operator:
“Article Spinning” site:ezinearticles.com
This search produces 8 results, so before any new page on that site targeting ‘Article Spinning’ will appear in Google’s SERPs, you need to outrank these pages locally. Eight is not a threatening number, but keep in mind that some of these pages may already have inbound links as part of a sustained SEO campaign. Article spinning gives no advantage in this area.
Article Spinning and Duplicate Content
Although Google has publicly declared that there is no duplicate content penalty, it does filter overly similar or duplicate content. Content deemed overly similar can only be seen if you navigate to the last page of results and click the link ‘repeat the search with the omitted results included’. While it’s true that article spinners can create articles that are unique, readable to humans and pass Copyscape’s plagiarism test with flying colors, it all becomes academic unless they can sneak under Google’s radar too.
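Google does not publish how its similarity filter works. One well-known technique for spotting near-duplicates is to break each document into overlapping k-word "shingles" and compare the resulting sets with Jaccard similarity. The sketch below uses that technique as a stand-in; `k=3` and the 0.5 threshold are illustrative assumptions, not Google's settings.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles (overlapping runs of k consecutive words)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_duplicate(doc_a, doc_b, k=3, threshold=0.5):
    """Hypothetical near-duplicate test; k and threshold are illustrative guesses."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold


original = "The search engine moves through the page from top to bottom."
spun = "The search engine travels through the page from top to bottom."
print(jaccard(shingles(original), shingles(spun)))  # 0.5
print(looks_duplicate(original, spun))              # True
```

Swapping a single word changes every shingle that contains it (three here), yet the spun sentence still shares half of its shingles with the original.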
It may seem that search engines read and understand words and decide which pages are relevant to a given search. Search engines are contextual, and they do base their rankings on the words that make up pages and the links that point to those pages, but they don’t read words in the conventional sense.
Document Normalization is an essential step in every search engine’s algorithm; it lets the engine compare pages on a level playing field by removing the noise and concentrating on the words that carry real meaning. As a result, many words are simply ignored. These are called function words, and they account for approximately 40% of a typical article. Although search engines ignore function words or give them little value, human readers need them; they include words like ‘the’ and ‘and’. Because function words are necessary if you want readers to enjoy and understand your writing, article spinning is effectively limited to working with the remaining 60% or so of the words on a page.
Content Words and Function Words
In every language, you have two different kinds of word:
- content words - e.g. car, phone, liberty, celebrity, etc.
- function words - e.g. and, but, to, the, etc.
Content words hold some kind of meaning; we can visualize what a car is or understand the concept of liberty. Function words hold no meaning on their own; ask yourself, what is the meaning of ‘the’? Search engines strip documents of function words in order to focus on words with meaning. This is useful to know, because it is what a search engine will do to the words in every article you write and syndicate.
Search engines employ a list of stop words in order to strip web pages down to a skeleton of content words. The stop list contains commonly used words (function words, common verbs, prepositions, etc.), which the search engine removes from the page to help it determine what the page is about. This is all part of the Document Normalization process that search engines perform on web pages in order to determine the relevance of each page objectively.
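Stop-list filtering itself is simple. Here is a minimal sketch, assuming a tiny hand-picked stop list (real search engines' lists are longer and not published) and plain English text; `strip_stop_words` is a hypothetical name.

```python
import re

# Hypothetical stop list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "as", "but", "from", "it", "of",
              "the", "this", "to", "very"}


def strip_stop_words(text):
    """Lowercase the text, split it into words and drop anything on the stop list."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


print(strip_stop_words("The car and the phone are from a celebrity."))
# ['car', 'phone', 'celebrity']
```

What remains is the skeleton of content words that the rest of the pipeline works with.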
The complete document normalization process that search engines perform when indexing a page is as follows:
Linearization and Tokenization
Markup tags (html code), punctuation and capitalization are removed from a page, the search engine moves through the page systematically, working from top to bottom and left to right, removing content from tags as it finds it. This action leaves the page as very basic text file containing one continuous block of words.
Regardless of whether the above paragraph was original or the result of article spinning, it would be reduced to:
markup tags html code punctuation and capitalization are removed from a page the search engine moves through the page systematically working from top to bottom and left to right removing content from tags as it finds it this action leaves the page as very basic text file containing one continuous block of words
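Linearization and tokenization can be sketched with Python's standard `html.parser` module. This is an illustration under simple assumptions (ASCII English text, and `<script>`/`<style>` contents treated as invisible), not how any particular search engine does it; `TextCollector` and `linearize` are hypothetical names. Fed the example paragraph wrapped in HTML, it produces the same block of words shown above.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text in document order (top to bottom, left to right)."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def linearize(html):
    """Strip markup, punctuation and capitalization; return one block of words."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(collector.chunks).lower()
    # Runs of ASCII letters/digits become words; everything else is a separator.
    return " ".join(re.findall(r"[a-z0-9]+", text))


page = (
    "<script>var tracking = 1;</script>"
    "<p>Markup tags (<b>html</b> code), punctuation and capitalization are "
    "removed from a page, the search engine moves through the page "
    "systematically, working from top to bottom and left to right, removing "
    "content from tags as it finds it. This action leaves the page as very "
    "basic text file containing one continuous block of words.</p>"
)
print(linearize(page))
```

A spun version of the paragraph goes through exactly the same steps, so any rewording survives only if it changes the words themselves.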
Filtration and Stemming
The search engine applies a stop list to remove commonly used words from the document. This leaves us with only content words. The remaining content words are then ‘stemmed’. That is to say that the remaining terms are reduced to common word roots (e.g. ‘techno’ for ‘technology’, ‘technologies’, ‘technological’).
Again, whether the example paragraph was original or the result of article spinning, after filtration and stemming it would be reduced to:
markup tag html code punctuat capit remov page search engine move page systemat work top bottom left right remov content tag find action leav page basic text file contain continu block word
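Filtration and stemming can be sketched as follows. The stop list is an assumption: it is exactly the set of words dropped between the two example outputs above. The stemmer is a deliberately crude suffix stripper written for illustration, not the Porter stemmer or any search engine's real algorithm; `crude_stem` and `filter_and_stem` are hypothetical names.

```python
import re

# Hypothetical stop list: the words dropped between the two example outputs.
STOP_WORDS = {"a", "and", "are", "as", "from", "it", "of", "one",
              "the", "this", "through", "to", "very"}

# Crude suffix stripper, NOT the Porter stemmer; longer suffixes are tried first.
SUFFIXES = ("ations", "ically", "ation", "ical", "ing", "ies",
            "es", "ed", "ly", "s", "y", "e")


def crude_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def filter_and_stem(text):
    """Apply the stop list, then reduce each remaining word to a crude root."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


print([crude_stem(w) for w in ("technology", "technologies", "technological")])
# ['technolog', 'technolog', 'technolog']
print(filter_and_stem("Removing the tags from a page"))
# ['remov', 'tag', 'pag']
```

A real stemmer such as Porter's handles many more cases, but the principle is the same: different forms of a word collapse to one root.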
Just under 40% of the words in the original text (21 of 53) were stop words, which is about the norm for any web page or article. In practice, this means that a 300-word article is reduced to around 180 content words. There is also the target keyphrase to take into account. Say, for example, the article targets ‘Article Spinning Software Review’ and uses it five times. None of these words are stop words, and the keyphrase can’t be spun without losing the target, so those 20 words (four words used five times) stay fixed. Our 300-word document therefore offers only 160 words in which to make it unique enough to pass Google’s duplicate content filter.
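The arithmetic can be checked with a few lines of Python. The stop list is an assumption reconstructed from the words dropped between the two example outputs; the 40% share, the four-word keyphrase and its five uses are the figures from the text; `spinnable_words` is a hypothetical helper.

```python
# Hypothetical stop list reconstructed from the article's two example outputs.
STOP_WORDS = {"a", "and", "are", "as", "from", "it", "of", "one",
              "the", "this", "through", "to", "very"}

ORIGINAL = ("markup tags html code punctuation and capitalization are removed "
            "from a page the search engine moves through the page systematically "
            "working from top to bottom and left to right removing content from "
            "tags as it finds it this action leaves the page as very basic text "
            "file containing one continuous block of words")

words = ORIGINAL.split()
stop_count = sum(w in STOP_WORDS for w in words)
print(stop_count, len(words), f"{stop_count / len(words):.1%}")  # 21 53 39.6%


def spinnable_words(total_words, stop_share=0.40, keyphrase_words=4, keyphrase_uses=5):
    """Content words left to vary once stop words and a fixed keyphrase are excluded."""
    content_words = round(total_words * (1 - stop_share))
    return content_words - keyphrase_words * keyphrase_uses


print(spinnable_words(300))  # 160
```

Changing any of the assumed parameters (a longer keyphrase, more uses of it) shrinks the spinnable budget further.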
This limited budget is where the real headache starts for anyone using article spinning software: you can’t replace words with words that share the same root, because after stemming that makes no difference to the way search engines see the page. As I said earlier, search engines don’t read words in the conventional sense, and while it is possible to spin articles that are grammatically correct, readable and able to pass Copyscape’s plagiarism test, it is a lot harder to fly under Google’s radar.
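To illustrate why same-root swaps don't help, the sketch below normalizes two versions of a sentence and measures how many content-word positions differ. It assumes a small hypothetical stop list and a crude suffix-stripping stemmer rather than a production stemmer such as Porter's; `changed_share` is a hypothetical helper that only handles word-for-word swaps.

```python
import re

# Hypothetical stop list and crude suffix stripper (not a real search engine's).
STOP_WORDS = {"a", "and", "are", "as", "from", "it", "of", "one",
              "the", "this", "through", "to", "very"}
SUFFIXES = ("ations", "ically", "ation", "ical", "ing", "ies",
            "es", "ed", "ly", "s", "y", "e")


def crude_stem(word, min_stem=3):
    for suffix in SUFFIXES:  # longest suffixes first
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Tokenize, drop stop words and stem what remains."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def changed_share(original, spun):
    """Fraction of normalized tokens that differ, for word-for-word swaps only."""
    a, b = normalize(original), normalize(spun)
    if len(a) != len(b):
        raise ValueError("this sketch only compares word-for-word swaps")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


original = "The search engine removes the content from tags"
same_root = "The search engine removed content from the tags"
synonym = "The search engine strips the content from tags"
print(changed_share(original, same_root))  # 0.0
print(changed_share(original, synonym))    # 0.2
```

Moving ‘the’ and switching ‘removes’ to ‘removed’ leaves the normalized text untouched; only a genuinely different content word registers as a change.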