The wonders of GooHackle

Updated on October 21, 2010

After stumbling onto an absolutely fascinating exposé of the most-used Internet words, courtesy of GooHackle.com, we certainly had to know more. GooHackle possibly represents the epitome of Google parsing and scraping services available to modern man.

What is Google Scraping?

Scraping Google involves the extraction and consolidation of useful information from Google.com web pages for supplemental applications and decision support. At least we envision that it does. Also encompassed in the process of scraping might be the intentional avoidance of special 'roadblocks' such as captcha technology and IP address screening. Google, along with many other internet-based web sites, implements mechanisms that some might describe as extreme. The captcha form, for example, attempts to limit access to some Google information by obligating users to visually process a specially engineered text string and correctly type that string back to the Google server before access is granted. Should a user prove incapable of accurately echoing the information, access is denied. Extreme data acquisition requires that captcha processing be automated.

GooHackle claims to possess technology to evade Google's captcha security. The home page of GooHackle.com provides a link to another GooHackle page explaining their breakthrough. Evidently the programmers at GooHackle realized that they could get any human to enter the captcha code, not just the human who actually wanted the information. Every time they need to avoid a captcha, they capture that captcha from the Google page and present it to an internet user who obligingly translates it.

What can Goo do for you?

GooHackle also offers an extremely cool Perl script that processes a Google search results page into a simple list of URLs (Uniform Resource Locators). This URL list can be funneled into other information processing tools for expedited search engine optimization and other useful applications. Google only offers the first 1000 URL results for any search, spread across 100 pages of 10 links each. Instead of enduring the tedium of manually processing 100 pages of information, consider this script. Try a demo of the script output on the GooHackle site: simply enter a search string, click a button, and sit back in amazement as a raw list of URLs is almost immediately returned.
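We have not seen the source of the GooHackle script, so what follows is only a minimal sketch of the general idea in Python rather than Perl: walk the anchor tags of an HTML page and keep the absolute links. The names LinkCollector, extract_urls, and SAMPLE_PAGE are our own inventions, and the sample markup is a stand-in that does not reflect Google's actual page layout.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from anchor tags, in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Keep only absolute web links; skip relative and empty hrefs.
            if name == "href" and value and urlparse(value).scheme in ("http", "https"):
                if value not in self.urls:  # drop repeated links
                    self.urls.append(value)


def extract_urls(html):
    """Return the de-duplicated list of absolute links found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


# Hypothetical stand-in for a saved results page (not real Google markup).
SAMPLE_PAGE = """
<html><body>
  <a href="http://example.com/cars">Cars</a>
  <a href="/preferences">Settings</a>
  <a href="https://example.org/autos?id=7">Autos</a>
  <a href="http://example.com/cars">Cars again</a>
</body></html>
"""

if __name__ == "__main__":
    for url in extract_urls(SAMPLE_PAGE):
        print(url)
```

The sketch works on a page that has already been saved as text; it deliberately does no fetching, so it says nothing about how the real tool retrieves its pages.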

Expect some resistance from Google. The type of scraping exploited by this tool is certainly frowned upon by the Google engineers and lawyers. Google possesses an extreme affinity for the information it provides, despite claiming as its company motto the self-incriminating phrase "Don't be evil."

Should you get involved with Goo?

Enraptured with GooHackle yet? Some folks find themselves enthralled with the concept of 'defeating' Google in order to obtain useful Internet metrics. Everyone needs a hobby. We enlivened our otherwise dull day by testing the GooHackle Keyword Popularity Tool. In order to exercise the tool, we entered the single keyword 'cars' and clicked a button on the form. In a few moments we observed that this keyword appears on over 509,000,000 different web pages.

That's a lotta web pages.

GooHackle obviously didn't scan 509,000,000 web pages in order to extract this information. Evidently an efficient little Perl script extracted the number from a Google results page. Regardless of the provenance of the number, it's an interesting parlor trick.
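For the curious, the parlor trick might look something like the sketch below. This is a guess at the technique, written in Python rather than Perl, and it assumes the results page contains text of the form "About 509,000,000 results"; both that wording and the function name extract_result_count are our assumptions, not anything GooHackle has published.

```python
import re

# Assumed wording: a number with optional thousands separators,
# followed by the word "results".
COUNT_PATTERN = re.compile(r"(\d[\d,]*)\s+results", re.IGNORECASE)


def extract_result_count(page_text):
    """Return the first result count found in the text, or None if absent."""
    match = COUNT_PATTERN.search(page_text)
    if match is None:
        return None
    # Strip the thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    print(extract_result_count("About 509,000,000 results (0.21 seconds)"))
```

One regular expression and one integer conversion: no scanning of half a billion pages required.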

Comments

    • nicomp profile image
      Author

      nicomp really 7 years ago from Ohio, USA

      @drbj: You got it! yay! It was fun for me, let's see if anyone else notices.

    • drbj profile image

      drbj and sherry 7 years ago from south Florida

      GooHackle, huh? I am enamored of that name.

    • nicomp profile image
      Author

      nicomp really 7 years ago from Ohio, USA

      @Stan Fletcher: Nice glasses.

    • Stan Fletcher profile image

      Stan Fletcher 7 years ago from Nashville, TN

      This was fascinating, enthralling, enlightening and well-written. Definitely the best info on GooHackle I've heard. And all this time I thought it was a new kind of fishing bait. "Jimmy Don, pass me some more of that GooHackle. That last fish that got away managed to get all the GooHackle off my hook before I lost him." As you can see, I'm on the cutting edge of all things technical.

    • Tom Whitworth profile image

      Tom Whitworth 7 years ago from Moundsville, WV

      nicomp,

      The only possible drawback I can see to a gob of Goo from GooHackle is that it turns out to be like Forrest Gump's box of chocolates. You never know what you're getting until you bite into it.

    • Wayne Brown profile image

      Wayne Brown 7 years ago from Texas

      I'm one of those guys who believes there is no reason to go to the back of the caves to see the bats if there are a few hanging around the entry. So far that has served me well so I imagine Goohackle will do little for me but thanks for serving it up in an informative and understandable way! WB
