The wonders of GooHackle
After stumbling onto an absolutely fascinating exposé of the most-used Internet words, courtesy of GooHackle.com, we simply had to know more. GooHackle may well represent the epitome of the Google parsing and scraping services available to modern man.
What is Google Scraping?
Scraping Google involves extracting and consolidating useful information from Google.com web pages for use in supplemental applications and decision support. At least, we imagine that it does. The process may also involve deliberately getting around special 'roadblocks' such as captcha technology and IP address screening. Google, like many other web sites, implements mechanisms that some might describe as extreme. The captcha form, for example, limits access to some Google information by obliging users to visually decode a specially engineered text string and type that string back to the Google server before access is granted. A user who cannot echo the string accurately is denied access. Extreme data acquisition therefore requires that captcha processing be automated.
GooHackle claims to possess technology to evade Google's captcha security. The GooHackle.com home page links to another GooHackle page explaining their breakthrough. Evidently the programmers at GooHackle realized that any human could enter the captcha code, not just the human who actually wanted the information. Every time they need to get past a captcha, they capture it from the Google page and present it to an internet user, who obligingly transcribes it.
What can Goo do for you?
GooHackle also offers an extremely cool Perl script that turns a Google search results page into a simple list of URLs (Uniform Resource Locators). This URL list can be fed into other information processing tools for faster search engine optimization (SEO) work and other useful applications. Google only offers the first 1000 results for any search, spread across 100 pages of 10 links each. Instead of enduring the tedium of manually processing 100 pages of information, consider this script. You can try a demo of the script's output on the GooHackle site: simply enter a search string, click a button, and sit back in amazement as a raw list of URLs is returned almost immediately.
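GooHackle's Perl script is not reproduced here, so the following is only a rough Python sketch of the same idea, not GooHackle's code. It assumes you already have the HTML of a results page saved as a string (it does no fetching of its own), keeps absolute http/https links from <a href> tags, drops duplicates while preserving order, and skips links that point back to google.com. Google's result markup is not documented here and changes over time, so that filtering rule is an assumption, and the names ResultLinkParser and extract_result_urls are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ResultLinkParser(HTMLParser):
    """Hypothetical helper: collects outbound http(s) links from <a href> tags."""

    def __init__(self, skip_host="google.com"):
        super().__init__()
        self.skip_host = skip_host
        self.urls = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parsed = urlparse(href)
        # Keep only absolute web links; relative links are page navigation.
        if parsed.scheme not in ("http", "https"):
            return
        host = parsed.hostname or ""
        # Assumed filter rule: drop links back to the search engine itself.
        if host == self.skip_host or host.endswith("." + self.skip_host):
            return
        # De-duplicate while keeping the order in which links appear.
        if href not in self._seen:
            self._seen.add(href)
            self.urls.append(href)


def extract_result_urls(html):
    """Return the de-duplicated list of outbound URLs found in a results page."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.urls


# Made-up sample markup standing in for a saved results page.
sample = """
<a href="https://www.example.com/cars">Cars</a>
<a href="/search?q=cars&amp;start=10">Next</a>
<a href="https://www.google.com/preferences">Settings</a>
<a href="https://www.example.com/cars">Cars again</a>
<a href="http://example.org/">Other</a>
"""
print(extract_result_urls(sample))
```

Running it over the sample keeps the two outbound links and discards the relative "Next" link, the link back to Google, and the repeated URL. Processing all 100 pages of a search would simply mean concatenating the lists from each page.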
Expect some resistance from Google. The type of scraping performed by this tool is certainly frowned upon by Google's engineers and lawyers. Google has an extreme attachment to the information it provides, despite having adopted as its informal company motto the self-incriminating phrase "Don't be evil."
Should you get involved with Goo?
Enraptured with GooHackle yet? Some folks find themselves enthralled with the concept of 'defeating' Google in order to obtain useful Internet metrics. Everyone needs a hobby. We enlivened our otherwise dull day by testing the GooHackle Keyword Popularity Tool. To exercise the tool, we entered the single keyword 'cars' and clicked a button on the form. In a few moments we learned that this keyword appears on over 509,000,000 different web pages.
That's a lotta web pages.
GooHackle obviously didn't scan 509,000,000 web pages to produce this figure. Presumably an efficient little Perl script extracted the number from a Google results page. Whatever the provenance of the number, it's an interesting parlor trick.
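The parlor trick is easy to approximate. The sketch below is a hypothetical Python illustration, not GooHackle's Perl code: it assumes the results page has already been reduced to plain text and that the hit count follows the word "about" (the two sample phrasings are assumed wordings, not captured Google output), and it pulls that number out with a regular expression. The function name extract_result_count is made up for this example.

```python
import re

# Assumed pattern: the hit count follows the word "about", as in
# "About 509,000,000 results" or "Results 1 - 10 of about 509,000,000".
_COUNT_RE = re.compile(r"\babout\s+(\d[\d,]*)", re.IGNORECASE)


def extract_result_count(page_text):
    """Return the hit count found in results-page text, or None if absent."""
    match = _COUNT_RE.search(page_text)
    if match is None:
        return None
    # Strip thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


print(extract_result_count("About 509,000,000 results"))
print(extract_result_count("Results 1 - 10 of about 509,000,000 for cars."))
```

Both calls print 509000000. The script reads Google's own reported figure, which is exactly why no 509,000,000 pages need to be scanned.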