I’ve had a hub scraped by two different sites in the window between a hub going from ‘pending’ to ‘featured’ and Google indexing it. The copies are ranking above me for a couple of terms – the scraped articles were indexed 18 hours before mine was [EDIT: I've just checked – my original article actually still isn't indexed; it was a different article showing up, linked to the newer one, that made me think it had finally been indexed]. I’ve filed DMCAs, but I’m wondering if this is going to become a major headache given the thousands of automatic scraper sites and the lengthy indexing delays. Both of the sites that scraped me are doing the same to all the hubs on HP (or a lot of them – it’s hard to say). Both are Blogspot blogs: one is called soft mania 4 u (all one word), the other promote your hubpages (all one word). The scraped articles aren’t even full articles – just the title, summary text and first photo – but they’re still ranking above my original for some terms.
Is there any way to get our hubs indexed quicker once they go to ‘featured’ status? Will Google see our hubs as copies?
I know Squidoo has a similar problem getting lenses crawled quickly (I read a couple of blog posts by writers there). If anyone writes on there, have you found the indexing delays and copying to be a real problem, and is there any way around it?
(just to note that before the new 'pending' feature came in, my Hubs were getting indexed within an hour or two, sometimes within minutes, and scraping/ranking wasn't a problem).
One of mine, which I believe still hasn't been indexed, I have also found on Softmania, although it only offers the summary and appears to give a link to the article at the bottom. Odd.
EDIT: When you copy and paste the link at the bottom it just brings you to the HP feed, not even the article.
On the bottom of mine (also only a summary - they're scraping from HP's 'Latest' feed so they just get the title, summary, and first photo) there's a (non-live) URL to HP's 'Latest' Hubs category, but no link to my original Hub. Some of their little copies of bits and pieces seem to be ranking relatively highly (considering how short and stolen they are). As you say - it's all a bit odd.
EDIT: ha! you beat me to it! (slow typing ... must try harder)
I'm actually quite furious about this. DMCAs to be completed, methinks!
Me too, I'm absolutely fuming. I'm wondering how to get hubs indexed quicker – it wouldn't be a problem if ours were indexed before the scrapers' copies (I'd still be irritated, but as long as we were ranking higher it wouldn't be so much of a problem). I've seen suggestions like blogging links to hubs, tweeting them etc., but I wonder if that might go against us post-Penguin.
I'm wondering the same thing, particularly as my scraped hub has still not been indexed. Recently, I've been manually submitting new hubs to Bing and Yahoo, I'm hoping that if nothing else this provides more evidence of date of publication.
This particular person, Priya Duta – who, by the way, is also the culprit behind the scraped hubs on promoteyourhubs.com – is clearly just copying and pasting from the feed, and could be stopped if HP used one of those copy-and-paste blocker thingies. I have no idea of the correct term.
Maybe this is just wishful thinking, but I'm also wondering if G, who have been quite vocal about targeting plagiarists and spammers, is setting some kind of trap for the pesky thieves by allowing them to publish partial articles which offer absolutely no value to the reader. What use is a summary and a picture with a dead link that doesn't even take the reader to the article anyway?
I just checked those sites, and soft mania 4u looks like it's been taken down – it says the page is unavailable or has moved. The other one is just links; I clicked on a homemade pizza hub and was sent to the entire hub on HP. Just letting you know!
EDIT: I just went to the sites and they're still there – the one with the links is still full of scraped stuff as well as links. Looks like they're both still up and active (and they're still appearing in the search results). I don't know if crappy little sites like those two are really going to be any kind of problem once the Hubs in question *do* finally get indexed; Google might ignore or bury them anyway. Will have to wait and see ...
There was discussion about this possible issue on the official thread about Idle hubs. I guess it alerted the scrapers and off they went. Alarming. I hope something gets figured out about this new unpopular feature. Good luck.
@Redberrysky, have you managed to find an e-mail address for the culprit? I've tried Whois with no success.
Hollie - no, it's a Blogger site so there won't be a useful whois record for them. I just went and filed a DMCA with Google and ticked the 'Blogger' option on this form: http://support.google.com/bin/static.py … page=ts.cs
and then filled in the details (it's a couple of pages long, but quick to fill in)
I did notice that somewhere on that form it mentioned that if the website is a repeat offender, you can contact Google's DMCA agent, but I couldn't see the link for that. If you spot it, it might be worth posting here so we could bring down these sorts of sites in one fell swoop.
@rebekahELLE - I know what you mean, and I hesitated before posting this, but feed-scrapers have been around for donkey's years – I had several stealing from my lame personal blog in 2007 within a month of setting it up. They go after huge, huge numbers of sites and don't care about the vulnerabilities, so I think it's worth sharing how to stop them even if it seems like we're alerting them.
I've just used Google Fetch in Webmaster Tools (under 'Health' and 'Fetch') to see if that indexes quicker. If you use it, don't forget to click on 'Submit to index' when the fetch status gets a green tick under it – I forgot and sat there like a muffin for ten minutes waiting for something to happen. Doh. I don't know how long it takes; still waiting and searching. This is what Google recommends to get new pages indexed quicker: http://googlewebmastercentral.blogspot. … ch-as.html
I've been using WMT for some time now, and have never had it take more than 24 hours to index a new hub.
Prior to that I've had hubs take as much as 6 weeks (!) to index.
Maybe this thread could be moved to the "Report a Problem" forums as it appears that HP has given up monitoring the thread they started which announced the idled hubs feature. They need to know about this. It is quite problematic.
Can't find the 'Report a Problem' forum, only the 'Report an Ad Problem' forum.
Maybe it would be better off in the 'Report a Technical Problem or Bug' forum (2nd forum down on the left-hand side under 'Official Announcements'). I don't know if you want to move this thread or start a new one, so I didn't do it – I don't know how to move a thread. I'm also not sure if it's really an HP problem, as they seem to leave copyright violations up to us, but maybe it's best they're made aware of it.
Yes, I'm sure they'll just say it's up to us to file our DMCAs etc. etc. but IMHO the indexing delay is a problem they need to take care of.
I've created a new thread. Even though I've notified HP by email, the issue may get picked up faster in the report a problem thread.
I'm not sure that the delay is fixable; I think it may be a consequence of every new Hub automatically getting a 'NoIndex' tag by default for 24 hours. Squidoo does the same, and I've been reading blog posts by a couple of their writers this morning – long indexing delays seem to be very common there. I think (but I don't know, I haven't looked into it that deeply) that long-time, prolific, trusted members of Squidoo skip the pending status and get featured status (or the equivalent) straight away. I don't know if this means their stuff gets indexed quicker, but it would make sense if the 'NoIndex' tag is what's to blame for the delay. I'm just conjecturing – I'm thinking that if the spiders think a page doesn't want to be indexed, they won't return as often. Does anyone who knows the inner workings of Google have a more technical explanation (one which might help us find a solution!)?
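If anyone wants to check for themselves whether a Hub is still carrying the tag, a quick view-source will show it, or a little script like the one below. This is just a rough sketch of my own – I'm assuming the tag appears as a standard robots meta element, and the attribute order on real pages may differ:

```python
import re
import urllib.request

# Matches <meta name="robots" content="...noindex..."> with name before
# content; real pages may order attributes differently, so treat this
# as a rough check rather than a definitive one.
ROBOTS_NOINDEX = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]*content=["\'][^"\']*noindex',
    re.IGNORECASE,
)

def html_has_noindex(html):
    """Return True if the page source carries a robots noindex meta tag."""
    return bool(ROBOTS_NOINDEX.search(html))

def page_has_noindex(url):
    """Fetch a live page and check it for the noindex tag."""
    with urllib.request.urlopen(url) as resp:
        return html_has_noindex(resp.read().decode("utf-8", errors="replace"))
```

If the tag is there while a Hub is 'pending', that would explain why the bot sees the page but doesn't index it.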
This is true. It's one of the "rewards" for being a Giant Squid.
Tried to add my latest hub using WMT, but Google cannot find the page – a 404 error. WTFudge does that mean? Is HubPages somehow blocking Google from crawling my hub?
TEN DAYS NOW!!!
I've been reading all the posts here and elsewhere. I am so glad I got my last hub published just under the Pending/ZZZ wire. Hopefully, this whole mess will be cleaned up by the time I get around to publishing my next hub.
Redberry, the first one is working now and appears to tell the same story as the other. It's just a useless feed of the latest hubs. I'm not sure what the benefit is for them. The promote HP blog has working links to HP. Note that I'm viewing all of this via Kindle, so maybe that changes things. Is there recourse?
I was also wondering what the benefit was. There appear to be no ads on the blog, so the financial angle for the scrapers is pretty lame, to say the least.
One of them - 'promoteyour' - does have AdSense, but it's not showing up on their site. Possibly with such small snippets, whatever triggers AdSense isn't triggering for them (I have a browser add-on that tells me what's on a site, like AdSense, Analytics etc.). I've seen other sites that have AdSense code but don't show ads – I think the content is just deemed unworthy or something. Possibly this is a tactic by G to cut down on MFA sites.
The other site doesn't have adverts at all, so I don't know what that's about – maybe it's someone playing around with code and tech to learn it, or maybe they're trying to use it for an AdSense application.
Just checked both the sites, and both are down. I checked them from a US as well as a UK IP address. It's always irritating to see our original content and work copied by cheap scrapers, and the copy outranking the source article in both ranking and indexing. The methods keep changing, too. Nowadays, creating an autoblog and scraping content from the RSS feeds of numerous niche-related blogs is the easiest way to make a couple of $$ by displaying some sort of AdSense ads. Google is working hard to clear out these niche sites, especially autoblogs.
Once, a guy scraped my blog's content A to Z onto his autoblog and I got a trackback notification. I went to his site and found there was no way to contact him. Then I dug up his information from who.is, whorush.com and some domain tools and warned him about the possible consequences he might suffer, and the next day he removed that post.
We can protect a WordPress blog with the WordPress SEO plugin by Yoast. We can customise the feed signature and heading of every post, like Copyright© yourblog[dot]com, or put "feed on plagiarism (postlink) by yourblog[dot]com (Bloglink)". Now, when these scrapers copy content from your feed, this message gets copied too and displayed on their blog. This identifies you as the original author to visitors of that site as well as to search engines. But I don't see any way to do this for HubPages. We could put "Copyright © - yourblog'surl" at the end of the hub, if HP allows it.
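To illustrate the idea (this is not Yoast's or HP's actual code – the post data and URLs below are made up): the plugin simply tacks an attribution line onto every item before the feed goes out, so anything that republishes the feed republishes the credit too.

```python
def sign_feed_item(title, summary, post_url, blog_url):
    """Append an attribution line to a feed item's summary, so any
    scraper that republishes the feed also republishes the credit."""
    signature = (
        f'Originally published as <a href="{post_url}">{title}</a> '
        f'on <a href="{blog_url}">{blog_url}</a>.'
    )
    return f"{summary} {signature}"

# Hypothetical post data for illustration:
item = sign_feed_item(
    "My Hub Title",
    "First paragraph of the article...",
    "http://example.com/my-hub",
    "http://example.com",
)
```

On a scraper's autoblog, the pasted summary would then carry a live link back to the original, which is exactly what the HP feed snippets are missing.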
@posts - I just cleared my cache and cookies and everything, then searched for a term I know both sites show up for, and they're both still appearing in search results; if I enter the homepage URLs I get to their sites, which still seem active. But Denisemai said earlier that she found them to be down as well. I'm wondering if it might be a propagation thing: the sites have been taken down but that hasn't propagated through the Internet fully yet (I think full propagation takes a few days). It might be worth keeping an eye on to see if they do get taken down, or even just de-indexed.
For copyright - I think HP allows us to put something in our Hubs (manually; I don't think there's an automated process), but there might be problems. E.g. if the copyright statement is a long-ish one, we could end up with duplicate-content flags, and I saw on another thread an HP staffer saying that enthusiastic copyright statements might advertise to scrapers that the work is worth stealing. For myself, I think these sorts of feed-scrapers aren't a problem as long as the original article is ranking above the copied snippet – they're more of an irritation – but I'd DMCA them whenever I come across them copying my stuff, because the real problems come when they're allowed to grow unabated and start to rank for keywords.
Copyright © 2017 HubPages Inc. and respective owners.