The Best OCR Software Tool To Convert Scanned Handwritten, Typewritten or Printed Text Into Documents
Optical Character Recognition (OCR)
OCR is a system for converting scanned images of printed or handwritten text into machine-readable text. Early OCR systems had to be calibrated to read a specific font: they were programmed with images of each character and worked on one font at a time. "Intelligent" systems with a high degree of recognition accuracy for most fonts are now common. Some systems can even reproduce formatted output that closely approximates the original scanned page, including images, columns and other non-textual components.
OCR software works by analyzing a document and comparing it with fonts stored in its database and/or by looking for features typical of particular characters. Some OCR software also runs the result through a spell checker to “guess” unrecognized words. OCR tools have their limitations, and the results depend heavily on the resolution and contrast of the scan and on the clarity of the fonts. From an average user’s standpoint, 100% accuracy is difficult to achieve, but a close approximation is what most software strives for.
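To make the two ideas in the paragraph above concrete, here is a minimal Python sketch that assumes a toy "font database" of three hypothetical 3x5 bitmaps and a hypothetical four-word dictionary. It matches a scanned glyph to the stored template with the fewest differing pixels, then uses the standard library's `difflib.get_close_matches` as a stand-in for the spell-check step. Real engines such as Tesseract use far more sophisticated feature extraction and language data; the names `TEMPLATES`, `DICTIONARY`, `recognize_glyph` and `guess_word` are illustrative only.

```python
import difflib

# Hypothetical "font database": 3x5 bitmaps where "1" is ink and "0" is paper.
TEMPLATES = {
    "I": ("010", "010", "010", "010", "010"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}

# Hypothetical word list for the spell-check step.
DICTIONARY = ["optical", "character", "recognition", "document"]


def pixel_difference(a, b):
    """Count the pixels that differ between two bitmaps of the same size."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize_glyph(glyph):
    """Return the stored character whose template differs least from the glyph."""
    return min(TEMPLATES, key=lambda ch: pixel_difference(glyph, TEMPLATES[ch]))


def guess_word(word, cutoff=0.8):
    """Replace a doubtful word with its closest dictionary entry, or keep it as-is."""
    matches = difflib.get_close_matches(word, DICTIONARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word
```

For example, `recognize_glyph(("111", "010", "010", "010", "011"))`, a "T" with one stray pixel, returns `"T"`, and `guess_word("recogmtion")` returns `"recognition"`. The `cutoff=0.8` similarity threshold is an arbitrary choice: lowering it makes the spell checker "guess" more aggressively, at the risk of replacing correct but unusual words.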
We will be looking at two OCR programs: Microsoft OneNote, which is often overlooked and is probably already installed on your system, and FreeOCR, which uses tesseract-ocr, considered one of the most accurate free OCR engines currently available.
Microsoft OneNote
For occasional basic OCR jobs, Microsoft OneNote’s Optical Character Recognition feature is a time-saver. You might have missed it: it’s called "Copy Text from Picture".
Drag a scan or a saved picture into Microsoft OneNote. You can also use OneNote to clip part of the screen or an image directly into a note.
Right-click the inserted picture and select "Copy Text from Picture". The recognized text is copied to the clipboard, and you can paste it into any program, such as Microsoft Word or Notepad.
OneNote is simplicity personified. It is not great with handwritten characters, or even with fuzzy printed ones, but for a quick job I am all for Microsoft OneNote’s clip and paste.
FreeOCR
This free OCR software uses tesseract-ocr, an OCR engine that was developed at HP Labs between 1985 and 1995 and later continued at Google. The Tesseract OCR engine was one of the top three engines in the 1995 UNLV Accuracy test. Little work was done on it between 1995 and 2006, but it is probably one of the most accurate open-source OCR engines available. The engine reads a binary, grey or color image and outputs text. It has a built-in TIFF reader for uncompressed TIFF images; libtiff can be added to read compressed ones.
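Whether a TIFF file is compressed is recorded in its Compression tag (tag 259): a value of 1 means uncompressed, while other values, such as 5 for LZW, mean the pixel data must be decoded first. The Python sketch below is not part of Tesseract or FreeOCR; it is a hypothetical helper, `tiff_compression`, that reads that tag from the first image of a classic (non-BigTIFF) TIFF file. Assuming, as stated above, that the bare engine only reads uncompressed images, it gives a quick way to check a file in advance.

```python
import struct

COMPRESSION_TAG = 259  # TIFF tag number for "Compression"
NO_COMPRESSION = 1     # Compression value meaning "uncompressed"


def tiff_compression(data: bytes) -> int:
    """Return the Compression value of the first image in a classic TIFF file."""
    # Bytes 0-1 give the byte order: "II" = little-endian, "MM" = big-endian.
    if data[:2] == b"II":
        endian = "<"
    elif data[:2] == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF file")
    # Bytes 2-3 hold the magic number 42, bytes 4-7 the offset of the first IFD.
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        # Each IFD entry is 12 bytes: tag, field type, value count, value/offset.
        start = ifd_offset + 2 + 12 * i
        tag, _field_type, _count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == COMPRESSION_TAG:
            # Compression is a single SHORT, stored left-justified in the value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    # When the tag is absent, the TIFF default is "no compression".
    return NO_COMPRESSION
```

On a real file you would pass the bytes from `open(path, "rb").read()`. A result of 1 suggests the built-in reader can handle the file; any other value suggests libtiff, or a front end such as FreeOCR, is needed.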
FreeOCR is a simple Windows interface to that underlying engine. It supports most image formats and multi-page TIFF files, can handle PDF files, and is compatible with TWAIN devices such as scanners. FreeOCR also has the familiar double-window interface with easy-to-understand settings. Before starting the one-click conversion, you can adjust the image contrast for better readability.
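The article does not say how FreeOCR's contrast control works internally, so the following is only a generic Python sketch of why contrast matters: a linear contrast stretch followed by a fixed black/white threshold, applied to a hypothetical greyscale image stored as lists of 0 to 255 values. The function names and the threshold of 128 are assumptions for illustration. On a faded scan where every pixel is lighter than the threshold, binarizing directly yields a blank page, while stretching first recovers the dark strokes.

```python
def stretch_contrast(pixels):
    """Linearly rescale grey levels so the darkest pixel becomes 0 and the lightest 255."""
    lo = min(min(row) for row in pixels)
    hi = max(max(row) for row in pixels)
    if hi == lo:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in pixels]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in pixels]


def binarize(pixels, threshold=128):
    """Map each pixel to 0 (ink) or 255 (paper) around a fixed threshold."""
    return [[0 if p < threshold else 255 for p in row] for row in pixels]
```

With `faded = [[150, 170], [190, 210]]`, `binarize(faded)` returns all 255s (blank paper), whereas `binarize(stretch_contrast(faded))` returns `[[0, 0], [255, 255]]`, keeping the darker half as ink.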
FreeOCR is a complete scanning and OCR program that bundles a Windows build of the free Tesseract OCR engine. It is small, simple and easy to use, comes with a Windows installer, and includes TWAIN scanning. It supports multi-page TIFFs, fax documents and most image types, including compressed TIFFs, which the Tesseract engine on its own cannot read. Best of all, it is totally free!
FreeOCR has been completely rewritten for Microsoft's .NET Framework 2.0. This was mainly because of problems displaying Unicode text properly, which most older development environments sadly do not support. Unicode matters because the OCR engine supports many languages and outputs text in UTF-8 encoding.
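A short Python illustration of the Unicode problem described above: the same UTF-8 bytes decode correctly as UTF-8 but turn into garbage ("mojibake") when a program assumes a legacy single-byte Windows code page. Code page 1252 is used here only as a representative example of such an encoding, and both helper names are hypothetical.

```python
def decode_ocr_output(raw: bytes) -> str:
    """Decode engine output as UTF-8, the encoding the OCR engine emits."""
    return raw.decode("utf-8")


def decode_as_legacy(raw: bytes) -> str:
    """Misread the same bytes as Windows code page 1252 (Western European)."""
    # errors="replace" substitutes U+FFFD for the few bytes cp1252 leaves undefined.
    return raw.decode("cp1252", errors="replace")
```

The word "café" is stored by the engine as `b"caf\xc3\xa9"`; `decode_ocr_output` turns it back into `"café"`, while `decode_as_legacy` shows `"cafÃ©"`, because each byte of the two-byte "é" is read as a separate character.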
Requirements:
- Pentium Processor - 200MHz
- 256 MB Memory (RAM)
- 10MB Free Disk Space
- SVGA Resolution Display
- .NET Framework 2.0 or higher