How To Deal With Stolen Hubs & Content Theft - A Best Practice Solution
There’s nothing more irritating than finding out that original content you spent hours on has been added to someone else’s website or ebook without your permission. Many Hubpages writers find themselves drowning in thousands of plagiarised copies, worried and overwhelmed at having an unwanted second career getting rid of the duplicates.
With only 50+ hubs, I found myself spending days filling out DMCA notices and reading forums on content theft. With the help of a few expert hubbers and other writers, I eventually formulated a plan of action as my best practice routine, so I didn’t have to dread checking all of my hubs every month. (With the workload only set to grow over time, I was debating whether to continue with Hubpages at all.)
Luckily, my planned routine turned out to be much easier to work with than doing the monthly hub checks and it takes only a few minutes a day to maintain. I decided to keep writing at Hubpages and I use this solution to deal with different levels of content duplication quickly and efficiently. In this hub, I'll explain it in detail, so you can use it too!
Should I Establish A Routine?
I am going to talk a lot about Google, which is the biggest search engine in the world and therefore has the most impact on your traffic. Currently, Google generally ranks duplicate copies below the original article in search results.
It is unclear to me whether this happens for everyone or only for writers who have claimed Google Authorship. I have Google Authorship, and in thousands of duplications there has only been one instance where the infringer ranked higher. There is a place to report this, though, so it doesn't have to be a big deal.
If you decide not to bother chasing duplications, it will only hurt your traffic a little, and only if the infringer’s website ranks higher than yours in search results. If you build up a large subdomain with lots of hubs, Google is going to give you some juice because you have lots of content (I’ve heard that if you go over the 100 hub mark, your traffic explodes – same with Etsy shops, same with any website that has over 100 pages of decent content). Hence, if you are a prolific writer and plan to output regularly, you might want to claim Google Authorship and just check once a year whether any site has completely copied your subdomain.
If you are a sporadic writer, or you spend a lot of time making quality hubs, you may need this routine. If it’s going to help you sleep better at night knowing you’re getting the maximum amount of traffic, then do it. Setting up the routine takes time, but you only need to set it up once, then maintain it as you add hubs.
Why Doesn’t Google Remove Duplicate Content?
Google doesn’t want to remove duplicate content for a few reasons. Firstly, if all of the summary text in backlinks (especially on social sites) showed up as duplicate content, then there would be a lot of problems with sharing anything.
Secondly, there are some odd cases where two people independently write the same thing and are both original authors (though usually not at article length). Thirdly, regardless of the date of publication on the web, Google cannot always be certain who the original author is (think of someone publishing text from a hard copy book on their website).
Therefore, I expect that for many years to come, Google will not be removing duplicate content, except by DMCA request.
Why Do People Steal Content?
There are many reasons. From the lazy amateur web designer who steals a couple of sentences for their source code to artificially gain traffic, to outright article theft by major newspapers and pretend “authors” who like to plump out their ebooks, stolen content is rife on the internet.
Way back in the old days of the internet, stuffing keywords into website source code and stealing your competitor’s meta descriptions was all the rage and really worked in getting websites ranked much higher. Many amateurs who have not kept up with developments still believe this is the case.
Because so many people were copying each other, and it added so much spam to search results, Google decided to put measures in place to prevent it. These measures include ranking websites with duplicate text lower in search results than the original content.
Occasionally, you might notice that an infringer's website appears ABOVE your original hub in search results. If this happens, report it to the Google Scraper Report. Google hasn't perfected its methods of knowing which content is original yet, so it uses this Scraper Report to rectify any odd search results where the original is overlooked.
The Impact Of Content Theft
Since Google places duplicated content further down the page in search results than the original article (in the majority of cases), content theft usually sends offenders less traffic than your original gets. People tend to click on the first few results, so anything lower down is likely to get far less traffic.
Hence, the main concern is duplicate content on big sites with over 100 articles, such as copies of the whole Hubpages website and major news publishers. Small blogs get minuscule traffic compared to your original hub. But add up all those small bits and they can take a chunk of your traffic over time, so it is ideal to get rid of them when you can.
If you are planning on a mini career on Hubpages, or planning to write for many years, I would highly recommend putting a routine in place right now, before you grow your work further and it becomes too big to deal with.
Step 1: Claim Google Authorship
If you haven’t got it already, and regardless of whether you’re planning an overall best practice routine to combat content theft, claim Google Authorship. To date, Google has put some emphasis on Google Authorship to establish whether the publisher is a writer, whether they have a body of work, and the original date of publication.
Going forward, I expect Google to be placing more importance on this and to be tightening the “rules” of applying and keeping Authorship (eg it might become more like Adsense), so that it is hard to get and you have to be an original content producer to keep it. I hope that it becomes a benchmark of original content publication on the web!
Currently, it’s easy to get Google Authorship and it requires no maintenance whatsoever. If you haven’t got it, you’re doing yourself a disservice. Don't forget to use a good looking profile picture, in case the author photos come back again into search results.
***UPDATE*** August 31, 2014
Google has announced that it will stop showing Authorship results in Google Search, and Google Authorship will soon be discontinued. More news will be added as this progresses. Hence, give Step #1 a miss at this time.
Step 2: Deal With Existing Copied Content & Set Up Google Alerts
I am going to assume you have a number of hubs already published and that you haven’t delved into finding every single bit of duplicated content yet. In this, the biggest step, you’ll need to track down EVERY existing infringing copy and deal with it in some way, as well as enter hub text into Google Alerts so that you can be alerted immediately when new copies get published online.
Firstly, make a spreadsheet or list called “Google Alerts” and place all of your existing hub URLs in the first column, calling the column “URLs”. Have patience and do this; it makes things easier later. Make a second column for “Duplication Check” and a third column for “Google Alerts”. As you complete the duplication check and alerts, add a “y” into the appropriate columns.
Google Alerts Spreadsheet
URLs | Duplication Check | Google Alerts
(your hub url here) | put a "y" when done | put a "y" when done
(your hub url here) | put a "y" when done | put a "y" when done
(your hub url here) | put a "y" when done | put a "y" when done
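If you keep this spreadsheet as a CSV file (exported from Excel or Google Sheets, for example), a few lines of Python can list the hubs you still need to work through. This is a minimal sketch, assuming the three column names described above; `hubs_still_to_do` is a hypothetical helper and the example URLs are placeholders.

```python
import csv
import io

# Column names taken from the "Google Alerts" spreadsheet described above.
COLUMNS = ["URLs", "Duplication Check", "Google Alerts"]

def hubs_still_to_do(csv_text):
    """Return the hub URLs whose row does not yet have a "y" in both check columns."""
    pending = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A missing or empty cell counts as "not done yet".
        done = all((row.get(col) or "").strip().lower() == "y" for col in COLUMNS[1:])
        if not done:
            pending.append(row["URLs"])
    return pending

sheet = (
    "URLs,Duplication Check,Google Alerts\n"
    "https://example.hubpages.com/hub/one,y,y\n"
    "https://example.hubpages.com/hub/two,y,\n"
)
print(hubs_still_to_do(sheet))
```

Run on the sample sheet, only the second hub is listed, because its Google Alerts column is still empty.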
You’re probably going to be working on this for a long while (I did 5 hubs a week until it was all done), so sit back, relax, get a coffee, put some music on, and let’s get started.
- For very long webpages, press Control + F on your keyboard, type in the original phrase (e.g. “Cacti thieves are complete idiots who deserve what's coming to them”) and search to find the copied text.
- If your text is not visible on a webpage, right click and choose “View Page Source”. Then press Control + F and search for “Cacti thieves are complete idiots who deserve what's coming to them” to see if it comes up in the source code. If it does, you can report the page to Google Spam.
- If your text is in neither the source code nor the visible webpage, ignore it, as it might be archived in an ancient vault on the site that the public cannot easily access.
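If you would rather script the “View Page Source” check, Python's standard library can tell text that is displayed on the page apart from text that only appears in the HTML source. This is a minimal sketch, assuming you have already saved or downloaded the page's HTML; `where_is_phrase` is a hypothetical helper name, and the "visible" test is approximate (it only skips `<script>` and `<style>` blocks, not text hidden with CSS).

```python
import html
import re
from html.parser import HTMLParser

class _VisibleText(HTMLParser):
    """Collects text a browser would normally display (skips <script> and <style>)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def _normalise(text):
    # Collapse whitespace and lower-case so line breaks and tags don't hide a match.
    return re.sub(r"\s+", " ", text).strip().lower()

def where_is_phrase(page_html, phrase):
    """Return "visible", "source only" or "not found" for a copied phrase."""
    target = _normalise(phrase)
    parser = _VisibleText()
    parser.feed(page_html)
    parser.close()
    if target in _normalise(" ".join(parser.parts)):
        return "visible"
    if target in _normalise(html.unescape(page_html)):
        return "source only"  # e.g. stuffed into a meta description or a script
    return "not found"
```

A "source only" result corresponds to the Google Spam case in the list above; "not found" is the case you can ignore.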
Go to the first hub that is listed in your “Google Alerts” spreadsheet and open up the hub on Hubpages. Find a sentence that is specific, original and short (e.g. “Cacti thieves are complete idiots who deserve what's coming to them”). Do not use a sentence from the first 200 words of text on the hub, as searching for it will bring up every social share in existence for your hub and confuse the results.
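Picking sentences can be partly automated. The sketch below assumes the hub text is pasted in as plain text; it skips the first 200 words, then suggests up to four sentences (the number I use per hub) of a moderate length. The 6 to 15 word limits and the `pick_phrases` name are my own assumptions, and you still need to judge by eye which suggestions are specific and original.

```python
import re

def pick_phrases(text, skip_words=200, count=4, min_words=6, max_words=15):
    """Suggest short sentences that start after the first `skip_words` words."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    picks = []
    words_so_far = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Only consider sentences that begin after the opening words.
        if words_so_far >= skip_words and min_words <= n <= max_words:
            picks.append(sentence)
            if len(picks) == count:
                break
        words_so_far += n
    return picks
```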
Copy and paste the sentence into a Google search box in inverted commas, just like this: “Cacti thieves are complete idiots who deserve what's coming to them”
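The same exact-phrase search can be built as a link, which is handy when you are working through a long list of phrases. A minimal sketch, assuming the standard `https://www.google.com/search?q=` address; it uses straight double quotes, which is what you would type into the search box.

```python
from urllib.parse import quote_plus

def exact_phrase_search_url(phrase):
    """Build a Google search URL for an exact-phrase (quoted) search."""
    # Wrapping the phrase in double quotes asks Google for that exact wording;
    # quote_plus encodes the quotes as %22 and spaces as +.
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')

print(exact_phrase_search_url("Cacti thieves are complete idiots"))
```

You could pass the result to Python's `webbrowser.open()` to open each search in a new browser tab.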
Up should come a number of websites with copied text bolded in the results. Ignore anything with the Hubpages domain or any social media sites such as Pinterest etc. What you are looking for are the other websites that are not just social media shares or Hubpages. When you find one, hold Control on your keyboard (PC) and left click the main blue link. It will open a new tab or window so you can examine the offending website.
Note: There are lots of sites out there (like Woorank, for example) devoted to reporting the rankings of your subdomain and they usually have lists of all your titles or URLs and have statistics and graphs on their pages. Don’t worry about these. There are also sites with pages of links and the site may be helping you by backlinking to your hub - don't report these either!
If you see a website with your entire hub text on it OR bits of your hub (with no backlink to your Hubpages subdomain), use one of the reporting methods below to deal with it.
Reporting Duplicate Content
There are many ways to report copyright violation (content theft). I recommend visiting all of the different links in this section and bookmarking them for future reference. The idea behind reporting is to remove potential gains for the thief and to remove the stolen content from their website. Some large community websites have their own pages for reporting, such as Blogger and Wordpress, but non-community websites don’t usually have this feature.
Create a second spreadsheet to keep track of how you have reported offenders. It's a good idea to do this because you might need to check the URLs again at a later date to see if websites have been removed.
Duplication Reporting Spreadsheet
URL | Type Of Infringement | Reporting Method
(URL of stolen content) | eg. Whole Hubpages Site Copied | eg. DMCA Takedown
(URL of stolen content) | eg. Spun Text | eg. Google Spam
(URL of stolen content) | eg. Copied Hub | eg. Google DMCA
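Since the point of this second spreadsheet is to recheck URLs later, a small script can tell you which reports are old enough to look at again. This is a sketch under stated assumptions: the sheet is exported as CSV with columns named URL, Type Of Infringement and Reporting Method, plus a Date Reported column in YYYY-MM-DD format (my own addition, not part of the spreadsheet described above), and the 30 day wait is an arbitrary choice.

```python
import csv
import io
from datetime import date, timedelta

def due_for_recheck(csv_text, today, after_days=30):
    """List (URL, method) pairs whose report is at least `after_days` old."""
    due = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumes an ISO date such as 2014-06-01 in the "Date Reported" column.
        reported = date.fromisoformat(row["Date Reported"])
        if today - reported >= timedelta(days=after_days):
            due.append((row["URL"], row["Reporting Method"]))
    return due
```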
Why Report To Google Spam?
According to Google:
“Google’s search quality team uses spam reports as a basis for further improving the quality of the results that we show you… Spam reports are prioritized by looking at how much visibility a potentially spammy site has in our search results, in order to help us focus on high-impact sites in a timely manner… We generally use spam reports to help improve our algorithms so that we can not only recognize and handle this particular site, but also cover any similar sites. In a few cases, we may additionally choose to immediately remove or otherwise take action on a site.”
Reporting To Google Spam
Use this to report spun article text (nonsense text with a few sentences of your hub thrown in), text in website source code, webpages that appear in search engine results where you can’t see any stolen hub and other stuff that isn’t an entire duplication of your hub.
Install the Chrome web browser and get the Google Spam extension. It will put a little red flag on the top right of the browser. When you find a webpage to report, while on the page, click the red flag. Fill out the Additional Details box with a brief explanation to assist Google employees in determining if the website is a problem or not. If you don’t fill in this box, they’ve got nothing to work with.
Eg. “This website is about sunglasses, yet it has ‘Cacti thieves are complete idiots who deserve what's coming to them’ in its source code.”
It is important with this explanation that you do not mention that it is stolen text or anything like that, as you should do a DMCA if it has significant stolen text. Best to just point out that the website is spammy in some way.
Eg. “This website has spun articles and doesn’t make any sense”.
It can take a long time for Google to act on reported websites, but it’s a very fast and effective reporting method for all those thousands of Chanel websites with copied text in their source code. It’s also far easier to report spun articles this way than to bother with chasing DMCAs, especially when they’re small sites getting small amounts of traffic anyway.
Getting Content Thieves Banned From Adsense
Determine if a content thief has Adsense ads on their website. If they do, report them to Adsense (on the grounds that the website's content violates the Adsense terms and conditions). This can get them banned from Adsense, so they won’t be able to make any money from your work. Plus, it will be a hard journey for them to get another Adsense account.
Removing The Website From Google With Google DMCA
This is for small blogs that have stolen one hub of yours. They are usually spammy looking and have not bothered copying your images. They don’t have many pages on their website and they may even be in a foreign language. It’s also suitable for forum posts that have a copied hub in them and for sites that have bigger portions of your hub in them with spun article text.
Note: Google DMCA does not remove the website from the internet. It only removes it from Google's search results. Hence, use this method for smaller infringements only, as they will still remain visible in other search engines.
Fill in a Google DMCA form (choose Web Search, then “I have a legal issue that is not mentioned above”, then “I have found content that may violate my copyright”). Fill out the DMCA form in detail.
You can view the progress of your request by visiting your DMCA Removal Dashboard when you are logged in to your Google account (search for "DMCA Removal Dashboard" to find it). DMCA removals usually take a few days, and once a request is approved by Google, the infringing page will be removed from its search results.
What To Do If Your Hub Is Copied Offline
- Has someone published/printed your hub off the internet?
If someone has printed a hard copy of your hub, you can seek compensation. Learn how to send a legal Cease & Desist Letter with a compensation request.
Removing The Website From The Internet With A DMCA Takedown Notice
Use this for reporting copies of your hubs that are likely to get a bit of traffic from your stolen hub. For example, copies of the Hubpages website, sites with lots of articles, ebooks and prominent websites. A DMCA Takedown will remove the website from the internet completely, or will force the webmaster to remove your hub. It also puts an official black mark against the website, as your complaint will be registered against them online in the Chilling Effects project.
Look up the URL of the website on Who Is Hosting This? (using only the main url – eg. http://www.domain.com) and find out who the hosting provider is (not the domain name provider). Search for the name of the hosting company’s website and locate a way to contact them and email them a DMCA Takedown Notice, with your correct details filled in. By law, the hosting company is supposed to get the person to clean up the content, or they remove the website from the internet. I can say it has been 100% effective every time I have used it!
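Choosing between the reporting methods comes down to a few questions about each page you find. The sketch below encodes my reading of those rules as a small Python function; the `Finding` fields and `reporting_methods` name are hypothetical labels you would fill in by eye, not data any tool gives you, and borderline cases still need your judgment.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Finding:
    """What you observed on an infringing page (field names are hypothetical)."""
    links_back_to_hub: bool = False  # link lists or stats sites pointing at your hub
    source_code_only: bool = False   # phrase only found via View Page Source
    spun_text: bool = False          # nonsense text with a few of your sentences
    copied_hub: bool = False         # whole hub or a large part of it copied
    big_site: bool = False           # Hubpages clone, many articles, ebook, prominent site
    shows_adsense: bool = False      # Adsense ads on the page

def reporting_methods(f: Finding) -> List[str]:
    """Map a finding to the reporting routes described in this hub."""
    if f.links_back_to_hub and not f.copied_hub:
        return []  # sites that backlink to your hub are helping you
    methods = []
    if f.copied_hub:
        # Big or prominent copies get a takedown; small blogs and forums get Google DMCA.
        methods.append("DMCA Takedown Notice" if f.big_site else "Google DMCA")
    elif f.source_code_only or f.spun_text:
        methods.append("Google Spam")
    if methods and f.shows_adsense:
        methods.append("Adsense report")
    return methods
```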
Sample DMCA Takedown Notice
Fill in the capitalized and bolded bits with your details.
My name is YOUR REAL NAME HERE and I am the original content writer of HUBPAGES SUBDOMAIN URL HERE. A website that your company hosts (according to WHOIS information) is infringing on at least one copyright owned by me.
An article was copied onto your servers without permission. The original article, to which I own the exclusive copyrights, can be found at:
URL OF YOUR HUB HERE
The unauthorized and infringing copy can be found at:
URL OF COPY HERE
This letter is official notification under Section 512(c) of the Digital Millennium Copyright Act (“DMCA”), and I seek the removal of the aforementioned infringing material from your servers. I request that you immediately notify the infringer of this notice and inform them of their duty to remove the infringing material immediately, and notify them to cease any further posting of infringing material to your server in the future.
Please also be advised that the law requires you, as a service provider, to remove or disable access to the infringing materials upon receiving this notice. Under US law a service provider, such as yourself, enjoys immunity from a copyright lawsuit provided that you act with deliberate speed to investigate and rectify ongoing copyright infringement. If service providers do not investigate and remove or disable the infringing material this immunity is lost. Therefore, in order for you to remain immune from a copyright infringement action you will need to investigate and ultimately remove or otherwise disable the infringing material from your servers with all due speed should the direct infringer, your client, not comply immediately.
I am providing this notice in good faith and with the reasonable belief that rights I own are being infringed. Under penalty of perjury I certify that the information contained in the notification is both true and accurate, and I have the authority to act on behalf of the owner of the copyright(s) involved.
Should you wish to discuss this with me please contact me directly.
YOUR REAL NAME HERE
YOUR ADDRESS HERE
YOUR EMAIL HERE
YOUR PHONE NUMBER HERE
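If you send several takedown notices, you could keep the sample notice in a text file and fill it in with a short script instead of editing it by hand each time. A minimal sketch: the placeholder strings are the capitalized bits from the sample notice, `fill_notice` is a hypothetical helper, and it refuses to produce a notice that still has a placeholder left unfilled.

```python
# The capitalized placeholders used in the sample DMCA Takedown Notice.
PLACEHOLDERS = (
    "YOUR REAL NAME HERE",
    "HUBPAGES SUBDOMAIN URL HERE",
    "URL OF YOUR HUB HERE",
    "URL OF COPY HERE",
    "YOUR ADDRESS HERE",
    "YOUR EMAIL HERE",
    "YOUR PHONE NUMBER HERE",
)

def fill_notice(template: str, details: dict) -> str:
    """Replace each capitalized placeholder in the notice with your details."""
    # Stop early rather than send a notice with a placeholder still in it.
    missing = [p for p in PLACEHOLDERS if p in template and not details.get(p)]
    if missing:
        raise ValueError("No value given for: " + ", ".join(missing))
    for placeholder in PLACEHOLDERS:
        if placeholder in details:
            template = template.replace(placeholder, details[placeholder])
    return template
```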
KEEP GOING: Continue Searching For Duplications To Report
When you have dealt with ALL of the pages of search results returned for your search phrase (eg “Cacti thieves are complete idiots who deserve what's coming to them”), then move on to the next part, which is to enter the same phrase into Google Alerts, as you’ve now checked it, dealt with it and want to be told if anything for this phrase gets published online again.
What To Do When You Receive Google Alerts
When Google emails you an Alert, visit the URL of the offending page. Determine what action is needed and deal with it on the spot using one of the reporting methods above. Then delete the Alert from your email inbox.
Setting Up Google Alerts
Go to Google Alerts and in the box for entering a new alert, put in your phrase (eg. “Cacti thieves are complete idiots who deserve what's coming to them”) with the inverted commas. Click the “Show options” link and set it up the following way:
How often: As-it-happens
Language: (your language)
Region: Any region
How many: All results
Deliver to: (enter your email address here)
Then click “Create Alert”. From now on, Google will send you an email as soon as your phrase in inverted commas is published on a website. This will not catch everyone stealing your content, as it only alerts you to a single sentence and lots of people steal just one sentence. But it should catch most people who post an entire copy or a significant portion of your hub text. Make sure your phrase is as original or unusual sounding as possible, otherwise you will get too many emails.
Finishing Step 2
OK, now that you have completed the reporting and Google Alerts for one phrase (eg. “Cacti thieves are complete idiots who deserve what's coming to them”), it’s time to move on and do the rest of them. I like to pick about four sentences per hub to report and Alert. Then, when I have completed all four sentences, I put a “y” in both columns on my first spreadsheet.
Do all of the hubs on your spreadsheet, then your system is set up! It can take a long time, but you get faster at it as you go and you only have to do this procedure once. Google Alerts will email you whenever any new copies get published.
Step 3: Protecting New Hubs
Create a short list of things to do before pressing the “Publish” button on a new hub.
I write my hubs in Word documents and transfer them to Hubpages. The reason for this is that if there is ever any doubt as to who owns the content, I have a time and date stamped file in Word to support my claim. When you save a Word document, you can see in the File Properties when the document was created, and many people treat this as evidence of when the content was written. Just be careful not to save the Word document again after publishing your hub, so you preserve the original date and time.
Statement Of Copyright
Some people like to put another copyright symbol on their work. I use the built-in Hubpages copyright symbol (click Edit Hub, then in the right column, Display Options and Copyright). I also use an image box saying “Do not copy this article” at the bottom of my hubs. I give everyone on Hubpages permission to copy this box and use it too if they like.
“Save Page As”
Sometimes it takes a lot of work to get a hub looking just right. Before publishing, I like to right click on the page, choose “Save As” and save it as a complete webpage, so that the hub formatting, text, captions, images etc. are saved automatically into a neat little file. It is also a secondary form of evidence, as it is date and time stamped like the Word file and can help support a claim of content ownership. To view it at a later date, click on the html file and it will open in your web browser.
Get four sentences that are further than 200 words into the hub and add them to Google Alerts in inverted commas. This means you’ll be emailed about any copies immediately.
Relax & Write!
Now that you’ve cleaned up all of your existing hubs and have a process to protect new hubs, you’ll be able to deal with copyright infringement as it arrives in your inbox - there's no need to hunt down duplications online ever again. Learn the different reporting procedures by heart and use them as you need to. I find it takes about two minutes a day dealing with copyright infringement, though now that lots of people seem to be getting the message, I get Alerted far less than in the beginning. Best of luck!
© 2014 Suzanne Day