Google indexing delays and copied content problem

  1. Redberry Sky profile image89
    Redberry Skyposted 11 years ago

    I’ve had a hub scraped by 2 different sites in the time between when a hub goes out of ‘pending’ into ‘featured’ and when Google indexes that hub.  The copies are ranking above me for a couple of terms – the scraped articles were indexed 18 hours before mine was [EDIT: I've just checked - my original article actually still isn't indexed, it was a different article showing up that was linked to the newer one that made me think it had finally been indexed].  I’ve filed DMCAs, but I’m wondering if this is going to become a major headache because of the thousands of automatic scraper sites and lengthy indexing delays.  Both the sites that scraped me are doing the same to all the hubs on HP (or a lot, it’s hard to say) – both are blogspot blogs, one is called soft mania 4 u (all one word), the other is promote your hubpages (all one word).  The scraped articles aren’t even full articles – just the title, summary text and first photo – but they're still ranking above my original for some terms.

    Is there any way to get our hubs indexed quicker once they go to ‘featured’ status?  Will Google see our hubs as copies?

    I know Squidoo has a similar problem getting lenses crawled quickly (I read a couple of blog posts by writers there) – if anyone writes on there, have you found the indexing delays and copying thing to be a real problem and is there any way around it?

    (just to note that before the new 'pending' feature came in, my Hubs were getting indexed within an hour or two, sometimes within minutes, and scraping/ranking wasn't a problem).

    1. Hollie Thomas profile image61
      Hollie Thomasposted 11 years agoin reply to this

      One of mine which I believe is still yet to be indexed I have also found on Softmania, although it's only offering the summary and appears to give the link at the bottom to the article. Odd.

      1. Hollie Thomas profile image61
        Hollie Thomasposted 11 years agoin reply to this

        EDIT: When you copy and paste the link at the bottom it just brings you to the HP feed, not even the article.

      2. Redberry Sky profile image89
        Redberry Skyposted 11 years agoin reply to this

        On the bottom of mine (also only a summary - they're scraping from HP's 'Latest' feed so they just get the title, summary, and first photo) there's a (non-live) URL to HP's 'Latest' Hubs category, but no link to my original Hub.  Some of their little copies of bits and pieces seem to be ranking relatively highly (considering how short and stolen they are).  As you say - it's all a bit odd.

        1. Redberry Sky profile image89
          Redberry Skyposted 11 years agoin reply to this

          EDIT: ha! you beat me to it! (slow typing ... must try harder) smile

          1. Hollie Thomas profile image61
            Hollie Thomasposted 11 years agoin reply to this

            smile I'm actually quite furious about this. DMCAs to be completed methinks!

            1. Redberry Sky profile image89
              Redberry Skyposted 11 years agoin reply to this

              Me too, I'm absolutely fuming.  I'm wondering how to get hubs indexed quicker - it wouldn't be a problem if ours were indexed before the scrapers (I'd still be irritated, but as long as we were ranking higher it wouldn't be so much of a problem). I've seen suggestions like blogging links to hubs and tweeting them etc, but I wonder if that might go against us, post-penguin.

              1. Hollie Thomas profile image61
                Hollie Thomasposted 11 years agoin reply to this

                I'm wondering the same thing, particularly as my scraped hub has still not been indexed. Recently, I've been manually submitting new hubs to Bing and Yahoo, I'm hoping that if nothing else this provides more evidence of date of publication.

                This particular person, Priya Duta, who by the way is also the culprit for the scraped hubs on promoteyourhubs.com and is clearly just copying and pasting from the feed, could be stopped if HP used one of those copy and paste blocker thingys. I have no idea of the correct term.

                Maybe this is just wishful thinking, but I'm also wondering if G, who have been quite vocal about targeting plagiarists and spammers, are setting some kind of trap for the pesky thieves by allowing them to publish partial articles which offer absolutely no value to the reader. What use is a summary and a picture with a dead link which doesn't even take the reader to the article anyway?

    2. denisemai profile image67
      denisemaiposted 11 years agoin reply to this

      I just checked those sites and soft mania 4u looks like it's been taken down. It says the page is unavailable or has moved. The other one is just links. I clicked on a homemade pizza hub and was sent to the entire hub on HP. Just letting you know!

      1. Redberry Sky profile image89
        Redberry Skyposted 11 years agoin reply to this

        Cool smile cheers denisemai smile

        1. Redberry Sky profile image89
          Redberry Skyposted 11 years agoin reply to this

          EDIT: I just went to the sites, they're still there - the one with the links still full of scraped stuff as well as links.  Looks like they're both still up and active (and they're still appearing in the search results).  I don't know if crappy little sites like those two are really going to be any kind of problem when the Hubs in question *do* finally get indexed, google might ignore or bury them anyway.  Will have to wait and see ...

  2. rebekahELLE profile image85
    rebekahELLEposted 11 years ago

    There was discussion about this possible issue on the official thread about Idle hubs. I guess it alerted the scrapers and off they went.  Alarming.  I hope something gets figured out about this new unpopular feature. Good luck.

    1. Hollie Thomas profile image61
      Hollie Thomasposted 11 years agoin reply to this

      Thanks RebekahElle,

      @Redberrysky, have you managed to find an e-mail address for the culprit? I've tried Whois with no success.

      1. Redberry Sky profile image89
        Redberry Skyposted 11 years agoin reply to this

        Hollie - no, it's a blogger site so there won't be a whois for them - I just went and filed a DMCA with Google and ticked the 'Blogger' option on this form: http://support.google.com/bin/static.py … page=ts.cs

        and then filled in the details (it's a couple of pages long, but quick to fill in)

        I did notice that somewhere on that form it mentioned that if the website is a repeat offender, you can contact google's DMCA agent, but I couldn't see the link for that - if you spot it, it might be worth posting here so we could just bring down these sort of sites in one fell swoop smile

        @rebekahELLE - I know what you mean, and I hesitated before posting this, but feed-scrapers have been around for dick-docks - I had several stealing from my lame personal blog in 2007 within a month of setting it up - they go for huge, huge numbers of sites and don't care about the vulnerabilities, so I think it's worth sharing how to stop them even if it seems like we're alerting them.

        I've just used Google Fetch in Webmaster Tools (under 'Health' and 'Fetch') to see if that indexes quicker - if you use it don't forget to click on 'submit to index' when the fetch status gets a green tick under it - I forgot and sat there like a muffin for ten minutes waiting for something to happen.  Doh. I don't know how long it takes, still waiting and searching.  This is what google recommends to get new pages indexed quicker: http://googlewebmastercentral.blogspot. … ch-as.html

        1. Hollie Thomas profile image61
          Hollie Thomasposted 11 years agoin reply to this

          Thanks Redberrysky,

          I'm on to it now. smile

        2. wilderness profile image95
          wildernessposted 11 years agoin reply to this

          I've been using WMT for some time now, and have never had it take more than 24 hours to index a new hub.

          Prior to that I've had hubs take as much as 6 weeks (!) to index.

          1. Hollie Thomas profile image61
            Hollie Thomasposted 11 years agoin reply to this

            What is WMT, Wilderness?

            1. wordscribe43 profile image91
              wordscribe43posted 11 years agoin reply to this

              Webmaster Tools

              1. Hollie Thomas profile image61
                Hollie Thomasposted 11 years agoin reply to this

                I realised some time after I'd asked the question. Feel daft now. smile

  3. SmartAndFun profile image94
    SmartAndFunposted 11 years ago

    Maybe this thread could be moved to the "Report a Problem" forums as it appears that HP has given up monitoring the thread they started which announced the idled hubs feature. They need to know about this. It is quite problematic.

    1. Hollie Thomas profile image61
      Hollie Thomasposted 11 years agoin reply to this

      Good idea, I'll start the thread.

    2. Hollie Thomas profile image61
      Hollie Thomasposted 11 years agoin reply to this

      Can't find the report a problem forum, only the report an ad problem forum.

      1. Redberry Sky profile image89
        Redberry Skyposted 11 years agoin reply to this

        Maybe it would be better off in the 'Report a technical Problem or Bug' forum (2nd forum down on the left hand side under 'Official Announcements') - don't know if you want to move this thread or start a new one so I didn't do it cos I don't know how to move a thread smile don't know if it's really a HP prob, they seem to leave copyright violations up to us, but maybe it's best they're made aware of it.

        1. Hollie Thomas profile image61
          Hollie Thomasposted 11 years agoin reply to this

          I'll try it there. I contacted HP this morning about this issue. Also, more hubbers may want to check whether the content has also been stolen.

  4. SmartAndFun profile image94
    SmartAndFunposted 11 years ago

    Yes, I'm sure they'll just say it's up to us to file our DMCAs etc. etc. but IMHO the indexing delay is a problem they need to take care of.

    1. Hollie Thomas profile image61
      Hollie Thomasposted 11 years agoin reply to this

      I've created a new thread. Even though I've notified HP by email, the issue may get picked up faster in the report a problem thread.

    2. Redberry Sky profile image89
      Redberry Skyposted 11 years agoin reply to this

      I'm not sure that the delay is fixable, I think it may be a consequence of automatically having a 'NoIndex' tag as a default for 24 hours on every new Hub - Squidoo does the same and I've been reading blog posts by a couple of their writers this morning and long indexing delays seem to be very common.  I think (but I don't know, I haven't looked into it that deeply) that long-time prolific trusted members of Squidoo don't go into the pending status, they get featured status (or equivalent) straight away - I don't know if this means their stuff gets indexed quicker, but it would make sense if the 'NoIndex' tag is what's to blame for the delay.  I'm just conjecturing - I'm thinking that if the spiders think a page doesn't want to be indexed, they won't return as often.  Anyone know the inner workings of Google who has a more technical explanation (and which might help us find a solution!)?
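
      For anyone who wants to check whether a page is carrying a 'NoIndex' tag, here is a minimal sketch in Python. It assumes the tag is delivered as a robots meta tag in the page's HTML (it could also be sent as an HTTP header, which this does not check), and the two sample pages and the function name has_noindex are made up for illustration, not taken from HP.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr_map.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html_text):
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives


# Hypothetical sample pages, standing in for a 'pending' and a 'featured' hub.
pending_page = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Hub</body></html>"
)
featured_page = "<html><head><title>Hub</title></head><body>Hub</body></html>"

print(has_noindex(pending_page))
print(has_noindex(featured_page))
```

      To try it on a real hub, save the page source from the browser ('View source') and pass that text to has_noindex.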

      1. relache profile image72
        relacheposted 11 years agoin reply to this

        This is true.  It's one of the "rewards" for being a Giant Squid.

        1. Redberry Sky profile image89
          Redberry Skyposted 11 years agoin reply to this

          Thanks Relache.  Do Giant Squids get indexed quicker? - if they do it would definitely suggest that Google doesn't recrawl previously NoIndexed content as much.  I wonder if HP would consider bringing something like that in.

  5. Reality Bytes profile image74
    Reality Bytesposted 11 years ago

    Tried to add my latest hub using WMT, Google cannot find the page?  A 404 error, WTFudge does that mean? Is it that Hubpages is somehow blocking Google from crawling my hub?

    TEN DAYS NOW!!!

  6. paradigmsearch profile image60
    paradigmsearchposted 11 years ago

    I've been reading all the posts here and elsewhere. I am so glad I got my last hub published just under the Pending/ZZZ wire. Hopefully, this whole mess will be cleaned up by the time I get around to publishing my next hub.

    1. Reality Bytes profile image74
      Reality Bytesposted 11 years agoin reply to this

      I will not publish a hub, in fact I will not produce one word here until I can be confident that my work is not going to sit in limbo indefinitely.  My patience has struck an impasse!

  7. denisemai profile image67
    denisemaiposted 11 years ago

    Redberry, the first one is working now and appears to have the same lice story as the other. It's just useless feed to latest hubs. I'm not sure what the benefit for them is. The promote Hp blog has working links to HP. Note that I'm viewing all of this via kindle so maybe that changes things. Is there recourse?

    1. Hollie Thomas profile image61
      Hollie Thomasposted 11 years agoin reply to this

      I was also wondering what the benefit was. There appear to be no ads on the blog so the financial aspect for the scrapers is pretty lame to say the least.

      1. Redberry Sky profile image89
        Redberry Skyposted 11 years agoin reply to this

        One of them - 'promoteyour' does have adsense but it's not showing up on their site - possibly with the small snippets whatever triggers adsense isn't triggering for this site (I have a browser add-on that tells me what's on a site, like adsense, analytics etc) I've seen other sites that have adsense on them but don't have ads - I think the content is just deemed unworthy or something - possibly this is a tactic by G to cut down MFA sites. 

        The other site doesn't have adverts at all, so I don't know what that's about, maybe it's someone playing around with code and tech to learn it, or maybe they're trying to use it for an adsense application.

  8. posts profile image71
    postsposted 11 years ago

    Just checked both the sites, and both are down. I checked them from US as well as UK IP addresses. It's always irritating to see our original content and work copied by cheap scrapers, and to have the copy outrank the source article in ranking as well as indexing. The trend is changing too. Nowadays, creating an autoblog and scraping content from the RSS feeds of numerous niche-related blogs is the easiest way to make a couple of $$ by displaying some sort of AdSense ads. Google is working hard to clear out these niche sites, especially autoblogs.
    Once, a guy scraped my blog's content A-Z into his autoblog and I got a trackback notification. I went to his site and found there was no way to contact him. Then I found his information from who.is, whorush.com and some domain tools and warned him about the possible consequences he might suffer, and the next day he removed that post.
    We can protect a WordPress blog with a simple plugin, WordPress SEO by Yoast. We can customize the feed signature and heading of every post, like Copyright© yourblog[dot]com, or we can put "feed on plagiarism (postlink) by yourblog[dot]com (Bloglink)". Now, when these scrapers copy content from your feed, this message will also be copied and displayed on their blog. This identifies you as the original author to visitors of that site as well as to search engines. But I do not see any way to do this on HubPages. We could put "Copyright © - yourblog'surl" at the end of the hub, if HP allows it.
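
    To show the idea behind a feed signature, here is a rough sketch in Python. This is not the Yoast plugin's code, just an illustration of the same technique: append a copyright line and a link back to the original post to every item in an RSS feed, so that anything copied from the feed carries the attribution with it. The function name add_feed_signature, the sample feed and the yourblog.example address are all made up for the example.

```python
import xml.etree.ElementTree as ET


def add_feed_signature(rss_xml, site_name):
    """Append a copyright line and source link to each <item> description."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        desc = item.find("description")
        if desc is None:
            # Items without a description get one holding only the signature.
            desc = ET.SubElement(item, "description")
        signature = f" Copyright (c) {site_name}. Original post: {link}"
        desc.text = (desc.text or "") + signature
    return ET.tostring(root, encoding="unicode")


# Hypothetical one-item feed for illustration.
sample_feed = (
    '<rss version="2.0"><channel><title>Example</title>'
    "<item><title>Homemade pizza</title>"
    "<link>https://yourblog.example/pizza</link>"
    "<description>Summary text.</description></item>"
    "</channel></rss>"
)

signed_feed = add_feed_signature(sample_feed, "yourblog.example")
print(signed_feed)
```

    A scraper that republishes the description field would then also republish the copyright line and the link to the original.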

    1. Redberry Sky profile image89
      Redberry Skyposted 11 years agoin reply to this

      @posts - I just cleared my cache and cookies and everything, then searched for a term I know both sites show up for, and they're both still appearing in search results, and if I enter the homepage URL I get to their sites, which still seem active - but Denisemai said earlier that she found them to be down as well.  I'm wondering if it might be a propagation thing; that the sites have been taken down but that's not propagated through the Internet fully yet (I think full propagation takes a few days).  It might be worth keeping an eye on to see if they do get taken down, or even just de-indexed. 

      For copyright - I think HP allows us to put something in our Hubs (manually, I don't think there's an automated process), but there might be problems, e.g. if the copyright statement is a long-ish one, we could end up with duplicate flags; and I saw on another thread an HP staffer saying that enthusiastic copyright statements might be an advertisement to scrapers that the work is worth stealing.  For myself, I think that these sort of feed-scrapers aren't a problem as long as the original article is ranking above the copied snippet, they're more of an irritation, but I'd DMCA them whenever I come across them copying my stuff because the real problems come when they're allowed to grow unabated and start to rank for keywords.

 