I’ve had a hub scraped by two different sites in the window between a hub going from ‘pending’ to ‘featured’ and Google indexing it. The copies are ranking above me for a couple of terms – the scraped articles were indexed 18 hours before mine was [EDIT: I've just checked – my original article actually still isn't indexed; it was a different article showing up, linked to the newer one, that made me think it had finally been indexed]. I’ve filed DMCAs, but I’m wondering if this is going to become a major headache given the thousands of automatic scraper sites and the lengthy indexing delays. Both of the sites that scraped me are doing the same to all the hubs on HP (or a lot of them – it’s hard to say). Both are Blogspot blogs: one is called soft mania 4 u (all one word), the other promote your hubpages (all one word). The scraped articles aren’t even full articles – just the title, summary text and first photo – but they’re still ranking above my original for some terms.
Is there any way to get our hubs indexed quicker once they go to ‘featured’ status? Will Google see our hubs as copies?
I know Squidoo has a similar problem getting lenses crawled quickly (I read a couple of blog posts by writers there). If anyone writes on there, have you found the indexing delays and copying to be a real problem, and is there any way around it?
(just to note that before the new 'pending' feature came in, my Hubs were getting indexed within an hour or two, sometimes within minutes, and scraping/ranking wasn't a problem).
One of mine, which I believe still hasn't been indexed, I have also found on Softmania, although it only offers the summary and appears to give a link to the article at the bottom. Odd.
EDIT: When you copy and paste the link at the bottom it just brings you to the HP feed, not even the article.
On the bottom of mine (also only a summary - they're scraping from HP's 'Latest' feed so they just get the title, summary, and first photo) there's a (non-live) URL to HP's 'Latest' Hubs category, but no link to my original Hub. Some of their little copies of bits and pieces seem to be ranking relatively highly (considering how short and stolen they are). As you say - it's all a bit odd.
EDIT: ha! you beat me to it! (slow typing ... must try harder)
I'm actually quite furious about this. DMCAs to be completed, methinks!
Me too, I'm absolutely fuming. I'm wondering how to get hubs indexed quicker – it wouldn't be a problem if ours were indexed before the scrapers' copies (I'd still be irritated, but as long as we were ranking higher it wouldn't be so much of a problem). I've seen suggestions like blogging links to hubs, tweeting them etc., but I wonder if that might go against us post-Penguin.
I'm wondering the same thing, particularly as my scraped hub has still not been indexed. Recently, I've been manually submitting new hubs to Bing and Yahoo, I'm hoping that if nothing else this provides more evidence of date of publication.
This particular person, Priya Duta – who, by the way, is also the culprit behind the scraped hubs on promoteyourhubs.com – is clearly just copying and pasting from the feed, and could be stopped if HP used one of those copy-and-paste blocker thingies. I have no idea of the correct term.
Maybe this is just wishful thinking, but I'm also wondering if G, who have been quite vocal about targeting plagiarists and spammers, is setting some kind of trap for the pesky thieves by allowing them to publish partial articles which offer absolutely no value to the reader. What use is a summary and a picture with a dead link that doesn't even take the reader to the article anyway?
I just checked those sites, and soft mania 4u looks like it's been taken down – it says the page is unavailable or has moved. The other one is just links; I clicked on a homemade pizza hub and was sent to the entire hub on HP. Just letting you know!
EDIT: I just went to the sites and they're still there – the one with the links is still full of scraped stuff as well as links. Looks like they're both still up and active (and they're still appearing in the search results). I don't know if crappy little sites like those two are really going to be any kind of problem once the Hubs in question *do* finally get indexed; Google might ignore or bury them anyway. Will have to wait and see ...
There was discussion about this possible issue on the official thread about Idle hubs. I guess it alerted the scrapers and off they went. Alarming. I hope something gets figured out about this new unpopular feature. Good luck.
@Redberrysky, have you managed to find an e-mail address for the culprit? I've tried Whois with no success.
Hollie - no, it's a Blogger site so there won't be a useful whois record for them. I just went and filed a DMCA with Google and ticked the 'Blogger' option on this form: http://support.google.com/bin/static.py … page=ts.cs
and then filled in the details (it's a couple of pages long, but quick to fill in)
I did notice that somewhere on that form it mentioned that if the website is a repeat offender, you can contact Google's DMCA agent, but I couldn't see the link for that. If you spot it, it might be worth posting here so we could bring down these sorts of sites in one fell swoop.
@rebekahELLE - I know what you mean, and I hesitated before posting this, but feed-scrapers have been around for donkey's years – I had several stealing from my lame personal blog in 2007 within a month of setting it up. They go after huge, huge numbers of sites and don't care about the vulnerabilities, so I think it's worth sharing how to stop them even if it seems like we're alerting them.
I've just used Google Fetch in Webmaster Tools (under 'Health' and 'Fetch') to see if that indexes quicker. If you use it, don't forget to click on 'Submit to index' when the fetch status gets a green tick under it – I forgot and sat there like a muffin for ten minutes waiting for something to happen. Doh. I don't know how long it takes; still waiting and searching. This is what Google recommends to get new pages indexed quicker: http://googlewebmastercentral.blogspot. … ch-as.html
I've been using WMT for some time now, and have never had it take more than 24 hours to index a new hub.
Prior to that I've had hubs take as much as 6 weeks (!) to index.
Maybe this thread could be moved to the "Report a Problem" forums as it appears that HP has given up monitoring the thread they started which announced the idled hubs feature. They need to know about this. It is quite problematic.
Can't find the 'Report a Problem' forum, only the 'Report an Ad Problem' forum.
Maybe it would be better off in the 'Report a Technical Problem or Bug' forum (2nd forum down on the left-hand side under 'Official Announcements'). I don't know if you want to move this thread or start a new one, so I didn't do it – I don't know how to move a thread. I'm also not sure if it's really an HP problem, as they seem to leave copyright violations up to us, but maybe it's best they're made aware of it.
Yes, I'm sure they'll just say it's up to us to file our DMCAs etc. etc. but IMHO the indexing delay is a problem they need to take care of.
I've created a new thread. Even though I've notified HP by email, the issue may get picked up faster in the report a problem thread.
I'm not sure that the delay is fixable; I think it may be a consequence of every new Hub automatically getting a 'NoIndex' tag by default for 24 hours. Squidoo does the same, and I've been reading blog posts by a couple of their writers this morning – long indexing delays seem to be very common there. I think (but I don't know, I haven't looked into it that deeply) that long-time, prolific, trusted members of Squidoo skip the pending status and get featured status (or the equivalent) straight away. I don't know if this means their stuff gets indexed quicker, but it would make sense if the 'NoIndex' tag is what's to blame for the delay. I'm just conjecturing – I'm thinking that if the spiders think a page doesn't want to be indexed, they won't return as often. Does anyone who knows the inner workings of Google have a more technical explanation (one which might help us find a solution!)?
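If anyone wants to check for themselves whether a Hub is still carrying the tag, a quick view-source will show it, or a little script like the one below. This is just a rough sketch of my own – I'm assuming the tag appears as a standard robots meta element, and the attribute order on real pages may differ:

```python
import re
import urllib.request

# Matches <meta name="robots" content="...noindex..."> with name before
# content; real pages may order attributes differently, so treat this
# as a rough check rather than a definitive one.
ROBOTS_NOINDEX = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]*content=["\'][^"\']*noindex',
    re.IGNORECASE,
)

def html_has_noindex(html):
    """Return True if the page source carries a robots noindex meta tag."""
    return bool(ROBOTS_NOINDEX.search(html))

def page_has_noindex(url):
    """Fetch a live page and check it for the noindex tag."""
    with urllib.request.urlopen(url) as resp:
        return html_has_noindex(resp.read().decode("utf-8", errors="replace"))
```

If the tag is there while a Hub is 'pending', that would explain why the bot sees the page but doesn't index it.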
This is true. It's one of the "rewards" for being a Giant Squid.
Tried to add my latest hub using WMT, but Google cannot find the page – a 404 error. WTFudge does that mean? Is HubPages somehow blocking Google from crawling my hub?
TEN DAYS NOW!!!
I've been reading all the posts here and elsewhere. I am so glad I got my last hub published just under the Pending/ZZZ wire. Hopefully, this whole mess will be cleaned up by the time I get around to publishing my next hub.
Redberry, the first one is working now and appears to tell the same story as the other. It's just a useless feed of the latest hubs. I'm not sure what the benefit is for them. The promote HP blog has working links to HP. Note that I'm viewing all of this via Kindle, so maybe that changes things. Is there recourse?
I was also wondering what the benefit was. There appear to be no ads on the blog, so the financial angle for the scrapers is pretty lame, to say the least.
One of them - 'promoteyour' - does have AdSense, but it's not showing up on their site. Possibly with such small snippets, whatever triggers AdSense isn't triggering for them (I have a browser add-on that tells me what's on a site, like AdSense, Analytics etc.). I've seen other sites that have AdSense code but don't show ads – I think the content is just deemed unworthy or something. Possibly this is a tactic by G to cut down on MFA sites.
The other site doesn't have adverts at all, so I don't know what that's about – maybe it's someone playing around with code and tech to learn it, or maybe they're trying to use it for an AdSense application.
Just checked both the sites, and both are down. I checked them from a US as well as a UK IP address. It's always irritating to see our original content and work copied by cheap scrapers, and the copy outranking the source article in both ranking and indexing. The methods keep changing, too. Nowadays, creating an autoblog and scraping content from the RSS feeds of numerous niche-related blogs is the easiest way to make a couple of $$ by displaying some sort of AdSense ads. Google is working hard to clear out these niche sites, especially autoblogs.
Once, a guy scraped my blog's content A to Z onto his autoblog and I got a trackback notification. I went to his site and found there was no way to contact him. Then I dug up his information from who.is, whorush.com and some domain tools and warned him about the possible consequences he might suffer, and the next day he removed that post.
We can protect a WordPress blog with the WordPress SEO plugin by Yoast. We can customise the feed signature and heading of every post, like Copyright© yourblog[dot]com, or put "feed on plagiarism (postlink) by yourblog[dot]com (Bloglink)". Now, when these scrapers copy content from your feed, this message gets copied too and displayed on their blog. This identifies you as the original author to visitors of that site as well as to search engines. But I don't see any way to do this for HubPages. We could put "Copyright © - yourblog'surl" at the end of the hub, if HP allows it.
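To illustrate the idea (this is not Yoast's or HP's actual code – the post data and URLs below are made up): the plugin simply tacks an attribution line onto every item before the feed goes out, so anything that republishes the feed republishes the credit too.

```python
def sign_feed_item(title, summary, post_url, blog_url):
    """Append an attribution line to a feed item's summary, so any
    scraper that republishes the feed also republishes the credit."""
    signature = (
        f'Originally published as <a href="{post_url}">{title}</a> '
        f'on <a href="{blog_url}">{blog_url}</a>.'
    )
    return f"{summary} {signature}"

# Hypothetical post data for illustration:
item = sign_feed_item(
    "My Hub Title",
    "First paragraph of the article...",
    "http://example.com/my-hub",
    "http://example.com",
)
```

On a scraper's autoblog, the pasted summary would then carry a live link back to the original, which is exactly what the HP feed snippets are missing.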
@posts - I just cleared my cache and cookies and everything, then searched for a term I know both sites show up for, and they're both still appearing in search results; if I enter the homepage URLs I get to their sites, which still seem active. But Denisemai said earlier that she found them to be down as well. I'm wondering if it might be a propagation thing: the sites have been taken down but that hasn't propagated through the Internet fully yet (I think full propagation takes a few days). It might be worth keeping an eye on to see if they do get taken down, or even just de-indexed.
For copyright - I think HP allows us to put something in our Hubs (manually; I don't think there's an automated process), but there might be problems. E.g. if the copyright statement is a long-ish one, we could end up with duplicate-content flags, and I saw on another thread an HP staffer saying that enthusiastic copyright statements might advertise to scrapers that the work is worth stealing. For myself, I think these sorts of feed-scrapers aren't a problem as long as the original article is ranking above the copied snippet – they're more of an irritation – but I'd DMCA them whenever I come across them copying my stuff, because the real problems come when they're allowed to grow unabated and start to rank for keywords.
Copyright © 2017 HubPages Inc. and respective owners.