The wonders of GooHackle

Updated on October 21, 2010

After stumbling onto an absolutely fascinating exposé of the most-used Internet words, courtesy of GooHackle.com, we certainly had to know more. GooHackle may well represent the epitome of the Google parsing and scraping services available to modern man.

What is Google Scraping?

Scraping Google involves extracting and consolidating useful information from Google.com web pages for use in other applications and in decision support. At least, we envision that it does. The process of scraping may also encompass the deliberate circumvention of special 'roadblocks' such as CAPTCHA technology and IP address screening. Google, along with many other websites, implements mechanisms that some might describe as extreme. The CAPTCHA form, for example, limits access to some Google information by requiring users to visually decode a specially engineered text string and type that string back to the Google server before access is granted. Should a user fail to echo the string accurately, access is denied. Extreme data acquisition therefore requires that CAPTCHA processing be automated.

GooHackle claims to possess technology that evades Google's CAPTCHA security. The GooHackle.com home page links to another GooHackle page explaining the breakthrough. Evidently the programmers at GooHackle realized that any human could enter the CAPTCHA code, not just the human who actually wanted the information. Every time they need to get past a CAPTCHA, they capture it from the Google page and present it to an Internet user who obligingly transcribes it.

What can Goo do for you?

GooHackle also offers an extremely cool Perl script that processes a Google search results page into a simple list of URLs (Uniform Resource Locators). This URL list can be funneled into other information processing tools for expedited search engine optimization (SEO) and other useful applications. Google only offers the first 1,000 results for any search, spread across 100 pages of 10 links each. Instead of enduring the tedium of manually processing 100 pages of results, consider this script. A demo of the script's output is available on the GooHackle site: simply enter a search string, click a button, and sit back in amazement as a raw list of URLs is returned almost immediately.
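
GooHackle's Perl source isn't reproduced here, but the core idea, turning one results page into a plain list of URLs, can be sketched in a few lines. The following is a minimal Python sketch, assuming the results page has already been saved as HTML; the `LinkCollector` class, the `extract_urls` function, and the sample markup are all hypothetical, and real Google markup differs and changes over time.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags, in page order, without duplicates."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Keep only absolute web links; relative links such as "/search?..." are
        # usually navigation (e.g. a "Next" page link), not results.
        if urlparse(href).scheme in ("http", "https") and href not in self._seen:
            self._seen.add(href)
            self.urls.append(href)


def extract_urls(html: str) -> List[str]:
    """Return the unique absolute URLs linked from one saved results page."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    # Hypothetical, heavily simplified markup; real results pages look different.
    sample = """
    <div class="result"><a href="https://example.com/cars">Cars</a></div>
    <div class="result"><a href="https://example.org/autos">Autos</a></div>
    <a href="/search?q=cars&amp;start=10">Next</a>
    <div class="result"><a href="https://example.com/cars">Cars again</a></div>
    """
    for url in extract_urls(sample):
        print(url)
```

To cover all 1,000 results, the same function would be run over each of the 100 saved pages (10 links apiece) and the lists concatenated. Filtering on the URL scheme is only a crude stand-in for recognizing which links are genuine results; a real script would also need to drop Google's own absolute links.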

Expect some resistance from Google. The type of scraping this tool exploits is certainly frowned upon by Google's engineers and lawyers. Google possesses an extreme affinity for the information it provides, despite having adopted as its company motto the self-incriminating phrase "Don't be evil."

Should you get involved with Goo?

Enraptured with GooHackle yet? Some folks find themselves enthralled with the concept of 'defeating' Google in order to obtain useful Internet metrics. Everyone needs a hobby. We enlivened an otherwise dull day by testing the GooHackle Keyword Popularity Tool. To exercise the tool, we entered the single keyword 'cars' and clicked a button on the form. Within a few moments we learned that this keyword appears on over 509,000,000 different web pages.

That's a lotta web pages.

GooHackle obviously didn't scan 509,000,000 web pages to extract this information. Evidently an efficient little Perl script pulled the number from a Google results page. Regardless of the number's origin, it's an interesting parlor trick.
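
How GooHackle actually reads that figure isn't documented here, but pulling a result count out of a page's text takes little more than one regular expression. A minimal Python sketch, assuming the page contains wording like "About 509,000,000 results"; the `RESULT_COUNT` pattern and the `result_count` function are hypothetical names, and Google's real phrasing varies by locale and has changed over time.

```python
import re
from typing import Optional

# Hypothetical pattern: assumes the count directly follows the word "about",
# as in "About 509,000,000 results". Real wording varies by locale and over time.
RESULT_COUNT = re.compile(r"about\s+(\d(?:[\d,]*\d)?)", re.IGNORECASE)


def result_count(page_text: str) -> Optional[int]:
    """Return the estimated number of matching pages, or None if no count is found."""
    match = RESULT_COUNT.search(page_text)
    if match is None:
        return None
    # Drop the thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    print(result_count("About 509,000,000 results (0.21 seconds)"))  # prints 509000000
```

The number is whatever estimate the results page reports, which is exactly why no one has to crawl half a billion pages to produce it.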
