Optimizing OCR Accuracy



OCR Accuracy

 

OCR accuracy depends upon the scan quality of the original document.

 

Points to keep in mind to increase the accuracy of OCR processing and indexing:

 

Use a good-quality scanner

The higher the quality of the scanner, the higher the quality of the images it produces.  Accurate images lead to fewer recognition errors and therefore faster and more accurate results.

 

Always check the images for scanning problems        

If you're processing a small number of documents, it's always worth having a quick look at them to check for anything that might cause a problem.

 

Use 300 DPI          

This is the optimum resolution for representing a normal-sized character.  It provides enough detail for accurate recognition while keeping processing efficient.  If the resolution is too low, the characters will be difficult to recognize.  If it is too high, the image is slower to process and uses more storage.
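As a rough illustration of the 300 DPI guideline, the sketch below derives the effective resolution of a scan from its pixel dimensions and the physical page size, then classifies it against the target.  The function names and the 5% tolerance are assumptions made for this example; they are not part of the OCR add-on.

```python
from typing import Tuple


def effective_dpi(pixels_wide: int, pixels_high: int,
                  page_width_in: float, page_height_in: float) -> Tuple[float, float]:
    """Return (horizontal, vertical) DPI for a scan of a page of known size."""
    return pixels_wide / page_width_in, pixels_high / page_height_in


def check_resolution(dpi: float, target: float = 300.0,
                     tolerance: float = 0.05) -> str:
    """Classify a resolution against the target (tolerance is an assumed 5%)."""
    if dpi < target * (1 - tolerance):
        return "too low"    # characters may be hard to recognize
    if dpi > target * (1 + tolerance):
        return "too high"   # slower to process, more storage
    return "ok"


# Example: a US Letter page (8.5 x 11 in) scanned to 2550 x 3300 pixels.
h_dpi, v_dpi = effective_dpi(2550, 3300, 8.5, 11.0)
print(h_dpi, v_dpi, check_resolution(min(h_dpi, v_dpi)))
```

Checking the lower of the two values is a conservative choice, since recognition is limited by the axis with the least detail.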

 

Scan in black and white        

Using color or greyscale can increase the image file size by a factor of 10 to 50.  To keep the amount of data being processed and stored to a minimum, always scan in black and white where possible.
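To show where the size difference comes from, the sketch below estimates the raw, uncompressed size of one page in each scan mode.  It assumes 1 bit per pixel for black and white, 8 for greyscale and 24 for color, and a US Letter page at 300 DPI; the function names are illustrative only.  Sizes on disk depend on the file format and compression, so the actual ratio will differ from the raw figures computed here.

```python
# Assumed bit depths for each scan mode.
MODES = {"black and white": 1, "greyscale": 8, "color": 24}


def raw_image_bytes(width_in: float, height_in: float,
                    dpi: int, bits_per_pixel: int) -> int:
    """Estimate the uncompressed size in bytes of a scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


def compare_modes(width_in: float = 8.5, height_in: float = 11.0,
                  dpi: int = 300) -> dict:
    """Return the raw size in bytes for each scan mode."""
    return {name: raw_image_bytes(width_in, height_in, dpi, bpp)
            for name, bpp in MODES.items()}


sizes = compare_modes()
baseline = sizes["black and white"]
for name, size in sizes.items():
    print(f"{name}: {size / 1_000_000:.1f} MB ({size / baseline:.0f}x)")
```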

 

Character Accuracy

Factors that can affect character recognition include creative typefaces, shading, broken or touching characters, skewed or curved baselines, spacing errors and underlined text.  All of these can also slow down OCR performance.

 

Optimization for poor backgrounds

The quality of a document's background can also affect character recognition.  Photocopied, faxed and crumpled documents can deform and distort character images, making them difficult to recognize.