Scanning and text recognition tips
Text recognition software "talks" to scanners through the TWAIN interface. TWAIN is a universal standard adopted in 1992 to unify the interaction between devices that provide image input to a computer (such as scanners) and external applications.
Optical Character Recognition (OCR) means "reading" scanned images and converting them into editable text, while also retaining the original formatting and layout.
Recognition quality depends greatly on the quality of the scanned image. Image quality can be adjusted through the main scanning parameters: image type (scan mode), resolution and brightness.
The main scanning parameters are:
Image type - a scanning parameter that determines whether an image is scanned in black and white, grayscale or color. For text recognition, the typical image type is grayscale with 256 shades of gray.
Scan mode - gray. Scanning in grayscale mode will yield the best recognition results. If you scan your images in grayscale, the application tunes the brightness automatically.
Scan mode - black and white. The black and white scan mode lets the system scan faster, but some character information is lost. This may reduce recognition quality for documents of medium or low print quality.
Scan mode - color. Select the color scan mode to scan color documents and keep their colors. It scans and recognizes color documents with pictures, color text and color backgrounds, and retains the colors in the resulting electronic documents.
Resolution - a scanning parameter that determines how many dots per inch (dpi) are used during scanning. Use 300 dpi for regular texts (font size 10 pt or greater) and 400-600 dpi for texts set in smaller font sizes (9 pt or less).
Brightness - a scanning parameter that reflects the contrast between black and white image areas. Setting the brightness correctly increases recognition quality. In most cases the medium brightness value (50%) will do.
Some documents scanned in black and white mode may require additional brightness tuning.
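The resolution guideline above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, sizes between 9 pt and 10 pt are treated as "small" by assumption, and the only outside fact used is that one point equals 1/72 inch. The second function shows why small fonts benefit from a higher resolution: it computes how many pixels tall the font's nominal size becomes at a given dpi.

```python
def recommended_dpi_range(font_size_pt):
    """Return (min_dpi, max_dpi) following the guideline in the text.

    Assumption: any size below 10 pt counts as a "smaller font size".
    """
    if font_size_pt <= 0:
        raise ValueError("font size must be positive")
    if font_size_pt >= 10:
        return (300, 300)   # regular text: 300 dpi
    return (400, 600)       # small text: 400-600 dpi


def font_height_px(font_size_pt, dpi):
    """Nominal font height in pixels (1 pt = 1/72 inch)."""
    return font_size_pt * dpi / 72


if __name__ == "__main__":
    for size in (12, 10, 8, 6):
        low, high = recommended_dpi_range(size)
        print(f"{size} pt: scan at {low}-{high} dpi, "
              f"about {font_height_px(size, low):.0f} px tall at {low} dpi")
```

A 6 pt font scanned at 300 dpi is only 25 pixels tall, while at 600 dpi it reaches 50 pixels, which is more than a 10 pt font gets at 300 dpi.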
Tips on Brightness Tuning
The scanned image has to be legible. The three scanned samples in the image (upper-right corner of this hub) represent three states of an image: good, very light (or "torn"), and very dark (distorted, glued, or filled).
- Brightness Image(1) - an example of a good image (from the OCR point of view)
- Brightness Image(2) - characters are "torn" or very light. Try decreasing the brightness (this makes the image darker), or scan in gray mode (brightness autotuning is used in this case).
- Brightness Image(3) - characters are distorted, glued, or filled. Try increasing the brightness (this makes the image brighter), or scan in gray mode (brightness autotuning is used in this case).
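The effect of brightness in black and white mode can be illustrated with a toy model. The sketch below is not how any particular scanner or OCR program works: it assumes that the brightness percentage maps linearly to a cut-off on 8-bit gray values (0 = black, 255 = white), and the function names and the `low`/`high` ink thresholds are hypothetical, uncalibrated values chosen only for the demonstration.

```python
def binarize(gray_rows, brightness_percent=50):
    """Turn 8-bit grayscale rows into 1-bit rows (1 = ink, 0 = paper).

    Assumed model: higher brightness lowers the cut-off, so fewer
    pixels are dark enough to be kept as ink.
    """
    if not 0 <= brightness_percent <= 100:
        raise ValueError("brightness_percent must be between 0 and 100")
    cutoff = 255 * (100 - brightness_percent) / 100
    return [[1 if px < cutoff else 0 for px in row] for row in gray_rows]


def ink_ratio(bw_rows):
    """Fraction of pixels that ended up as ink."""
    total = sum(len(row) for row in bw_rows)
    return sum(sum(row) for row in bw_rows) / total if total else 0.0


def suggest_adjustment(bw_rows, low=0.03, high=0.30):
    """Map the ink ratio to the tips above (thresholds are assumptions)."""
    ratio = ink_ratio(bw_rows)
    if ratio < low:
        return "decrease brightness"   # characters too light or "torn"
    if ratio > high:
        return "increase brightness"   # characters glued or filled
    return "keep brightness"


if __name__ == "__main__":
    # One row crossing a faint stroke (140) and a solid stroke (60).
    row = [[250, 140, 250, 60, 250]]
    print(binarize(row, 50))  # the faint stroke is lost
    print(binarize(row, 40))  # a darker setting keeps it
```

In this model the faint stroke disappears at 50% brightness and comes back at 40%, which mirrors the advice to decrease the brightness when characters look "torn".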