
Optimizing OCR Accuracy with scanning


OCR Data Accuracy

 

OCR accuracy depends on the quality of the original document and of its printed text.  After processing, the document contains accurately digitized text while retaining the original's logical structure, layout and formatting.

Text inside tables is recognized as well.  When processing completes, the OCR document looks the same as the original, and you can edit it or search it by its contents.

 

You can open and use this document in any publishing software, word processor, Docsvault PDF Editor or other text editor.

 

Keep the following points in mind to increase the accuracy of OCR processing and indexing:

 

Use a Good Quality Scanner        

The better the scanner, the better the images it produces.  Accurate images lead to fewer errors and therefore faster, more accurate results.

 

Always check the images for scanning problems        

If you're processing a small number of documents, it's always worth having a quick look at them to check for anything that might cause a problem, such as badly distorted images or correction fluid.  If you're processing large batches, it's essential that you check the scanner too: a small amount of correction fluid on the glass will cause an error on every single page that you process.
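A defect on the scanner glass shows up at the same position on every page, which makes it possible to spot in a batch. The following is a minimal sketch of that idea, not part of Docsvault: it assumes each page has already been loaded as a grid of 0/1 values (1 = dark pixel), and the function name recurring_dark_pixels is hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: a page is a grid of pixels, 1 = dark, 0 = light.
Page = List[List[int]]


def dark_positions(page: Page) -> Set[Tuple[int, int]]:
    """Collect the (row, col) position of every dark pixel on one page."""
    return {
        (r, c)
        for r, row in enumerate(page)
        for c, value in enumerate(row)
        if value
    }


def recurring_dark_pixels(pages: List[Page]) -> Set[Tuple[int, int]]:
    """Return positions that are dark on every page in the batch.

    Real text rarely lands on exactly the same pixels on every page,
    so positions that survive the intersection are candidates for
    dirt or correction fluid on the scanner glass.
    """
    if not pages:
        return set()
    common = dark_positions(pages[0])
    for page in pages[1:]:
        common &= dark_positions(page)
    return common


# Three tiny 3x4 "pages"; only position (1, 3) is dark on all of them.
batch = [
    [[0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]],
    [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]],
    [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]],
]
print(recurring_dark_pixels(batch))  # {(1, 3)}
```

On real scans, fixed elements such as borders or letterheads would also appear on every page, so any position reported this way is only a prompt to inspect the glass, not proof of a defect.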

 

Use 300 or 400 DPI        

This is the optimum resolution for representing a normal-sized character: it provides just the right amount of detail for accuracy and efficiency.  If the resolution is too low, the characters will be difficult to recognize.  If it is too high, processing is slower and the images use more storage.
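The trade-off between detail and size can be illustrated with some simple arithmetic. The sketch below is only an illustration: the 10-point font, the 8.5 x 11 inch page and the helper names are assumptions, not values taken from Docsvault.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def char_height_px(font_pt: float, dpi: int) -> float:
    """Approximate height in pixels of a font's em box at a scan resolution."""
    return font_pt / POINTS_PER_INCH * dpi


def page_pixels(width_in: float, height_in: float, dpi: int) -> int:
    """Total number of pixels for a page scanned at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


# Assumed example: 10 pt text on an 8.5 x 11 inch page.
for dpi in (150, 300, 400, 600):
    height = char_height_px(10, dpi)
    megapixels = page_pixels(8.5, 11, dpi) / 1e6
    print(f"{dpi} DPI: 10 pt text is about {height:.0f} px tall, "
          f"page is {megapixels:.1f} megapixels")
```

Character height grows linearly with the resolution, but the pixel count grows with its square, so doubling the DPI quadruples the data to process and store.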

 

Scan in black and white        

Using color or greyscale can increase the image file size by 10 to 50 times.  To keep the amount of data being processed and stored to a minimum, always scan in black and white where possible.
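As a rough illustration of where the size difference comes from, the sketch below compares uncompressed image sizes for one page. It assumes an 8.5 x 11 inch page at 300 DPI and the common bit depths of 1, 8 and 24 bits per pixel; the helper name raw_size_bytes is hypothetical.

```python
# Assumed bit depths: 1-bit bilevel, 8-bit greyscale, 24-bit RGB color.
BITS_PER_PIXEL = {"black and white": 1, "greyscale": 8, "color": 24}


def raw_size_bytes(width_in: float, height_in: float, dpi: int,
                   bits_per_pixel: int) -> int:
    """Uncompressed image size in bytes for a scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


baseline = raw_size_bytes(8.5, 11, 300, BITS_PER_PIXEL["black and white"])
for mode, bits in BITS_PER_PIXEL.items():
    size = raw_size_bytes(8.5, 11, 300, bits)
    print(f"{mode}: {size / 1e6:.1f} MB uncompressed, "
          f"{size // baseline}x black and white")
```

These are uncompressed figures. Actual file sizes depend on the file format and compression used, and black and white images usually compress very well, so the ratio between saved files can differ from the raw 8x and 24x.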

 

 

Character Accuracy

Factors that can affect character recognition include creative typefaces, shading, broken or touching characters, skewed and curved baselines, insert errors, space errors and underlined text, all of which can slow down OCR performance.

 

 

Optimization for poor backgrounds

The quality of a document's background can also affect character recognition.  Photocopied, faxed and crumpled documents can deform and distort character images, making them difficult to recognize.

 

Docsvault OCR is tested and enhanced using extremes of light and dark backgrounds, deformation and speckle.

 

Automatic Orientation detection

Docsvault OCR automatically detects which way up the image or page was scanned and delivers the recognized text the right way up.

 

 


Page url: http://www.docsvault.com/online-help/professional/index.html?optimizing_ocr_accuracy.html