reCAPTCHA! What it is, and what it does

By wyanjen


In the beginning, there were books.

Lots & lots of books.

Before the internet, a book was the best way to share information. Permanent, portable, printed pages. (Full disclosure: I am a printer. Who loves books. Try relaxing in a hot bath with a Kindle.) Important and interesting information has been rolling off the presses world-wide for hundreds of years.

Our generation has a different medium. While some of us do carry library cards, many more of us are carrying WiFi cards. How do we get that inconceivable amount of printed data transferred to the web?

If you are picturing a warehouse full of monkeys typing away, well... actually that would be kinda cool. But wrong. Nope, the magic happens with scanners.


Is the phrase supposed to make sense?


Books are scanned or photographed one page at a time. Once a page has been converted to a digital image, it is analyzed with OCR. Optical Character Recognition software identifies letters based on their shape.

If the quality of the scan is poor, the software may misinterpret a letter. Is that an i or an l? Add into the mix faded ink, yellowed paper and funky fonts, and you’ll get a fair amount of OCR failure.

Webmasters, bloggers, and retailers protect themselves and their users from spam and fraud by using a security system called CAPTCHA. The phrase, coined in 2000, is a contrived acronym: Completely Automated Public Turing test to tell Computers and Humans Apart.

A Captcha image requires visual perception, not simply recognition, to translate.

People can read it.

Bots can’t.


Should we be looking for a hidden meaning?


Luis von Ahn, an assistant professor of computer science at Carnegie Mellon University, was involved in the original development of Captcha. Mr. von Ahn was seeing excellent results from the security system. But, as he stated, “It takes about 10 seconds to type each Captcha. I realized that humanity as a whole is wasting 500,000 hours every day typing Captchas.”
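For a quick back-of-the-envelope check, the two figures in that quote can be combined to see how many Captchas they imply per day. This is only arithmetic on the quoted numbers; the function and variable names are made up for illustration.

```python
# Figures taken from the von Ahn quote above.
SECONDS_PER_CAPTCHA = 10
HOURS_WASTED_PER_DAY = 500_000

SECONDS_PER_HOUR = 60 * 60


def captchas_per_day(hours_per_day, seconds_each):
    """Return how many Captchas fit into the given number of hours."""
    return hours_per_day * SECONDS_PER_HOUR // seconds_each


implied = captchas_per_day(HOURS_WASTED_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{implied:,} Captchas per day")  # prints "180,000,000 Captchas per day"
```

In other words, the quote works out to roughly 180 million Captchas typed every day.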

OCR systems are accurate only up to 80% of the time when scanning older books and newspapers. Some pages have degraded. Some words are blurry. Over time, paper yellows and ink bleeds. Even the most sophisticated software is still not capable of reading these difficult images.


This is more fun than listening to vinyl records backwards


Combining the successful Captcha security system with indecipherable words from scanned books, von Ahn developed reCaptcha in 2007.

reCaptcha serves up OCR failures to the best translators around: us. When you type in those two security words, you are doing more than proving yourself human. You are translating a blurry, distorted image into a word that the best OCR software could not decipher. It’s estimated that the dual-purpose reCaptchas are correcting more than 10 million words each day.
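Why two words? In the original reCaptcha, one word is a control whose answer is already known, and the other is the word OCR could not read. Here is a minimal sketch of that idea in Python. It is not reCaptcha's actual code: the names `UnknownWord` and `check_answer` are hypothetical, and the three-vote threshold is an assumption picked only to keep the example short.

```python
from collections import Counter


class UnknownWord:
    """A scanned word that OCR could not read, plus the guesses people typed."""

    def __init__(self, image_id):
        self.image_id = image_id
        self.votes = Counter()

    def agreed_reading(self, min_votes=3):
        """Return the most common guess once enough people agree, else None.

        min_votes is an assumed threshold, not a documented reCaptcha value.
        """
        if not self.votes:
            return None
        guess, count = self.votes.most_common(1)[0]
        return guess if count >= min_votes else None


def check_answer(control_answer, typed_control, unknown, typed_unknown):
    """Pass or fail the user on the control word; record a vote only on a pass."""
    passed = typed_control.strip().lower() == control_answer.lower()
    if passed:
        unknown.votes[typed_unknown.strip().lower()] += 1
    return passed


word = UnknownWord("scan-0001")
check_answer("morning", "morning", word, "upon")
check_answer("morning", "mourning", word, "open")  # fails, so "open" is ignored
check_answer("morning", "Morning", word, "upon")
check_answer("morning", "morning ", word, "upon")
print(word.agreed_reading())  # prints "upon"
```

Only people who get the control word right get a say on the unknown word, so a bot typing gibberish never pollutes the results.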


Right on!


In September 2009, reCaptcha was purchased by Google. The master of search engine technology is using reCaptcha’s translating power to improve Google Books. Prior to the purchase, the translated scans were used by the Open Content Alliance, a nonprofit group, to build up the Internet Archive’s book collection. The Internet Archive provides free access to over 1.25 million books. They are limited to public domain works, meaning books whose copyrights have expired.

While continuing to catalog public domain titles, Google Books has also expanded access to include some in-copyright and out-of-print books. Previews will be provided and the books will be available for purchase. Additionally, Google Books is working with libraries at Cornell, Harvard, and Oxford Universities, among many others, to allow full online access to their collections.

Ten seconds at a time, we are building our own digital library. Generations of works are being preserved and offered up for all people to access.

What a legacy.


The reCaptchas are communicating with us...

Comments

jacobkuttyta  says:
2 months ago

Nice hub, well written.

wyanjen  says:
2 months ago

Thanks for checking it out jacob!
