Optical Character Recognition

OCR (Optical Character Recognition) is the process of turning an image of text, such as a scanned letter or invoice, or a PDF document that is essentially an image of a text document, into text that can be added to a searchable database. This allows scanned documents (PDF or TIFF) to be retrieved based on their content.

For example:

When creating research reports and proposals, you often need to compile information from a variety of sources, which may include other reports saved as PDF, printed documents, and magazine or newspaper articles. Rather than retyping that information into your new document, simply OCR it while scanning in Docsvault. You can then extract text from the OCR'd PDF using any PDF editor, such as Adobe Acrobat, and paste it into Notepad, an existing PDF file, or any other editor.

OCR conversion to searchable PDF

 

You can easily OCR any black & white, greyscale or color image produced by a scanner. Docsvault can convert your documents into editable text (Notepad) and into fully searchable OCR'd PDF files, an ideal format for storing and sharing text-based information within and beyond your organization.

 

In a searchable PDF, the original scanned image is retained so that anyone can read the document. The text extracted via OCR is placed behind the image, so search indexers can see it and you can select it as text in any PDF editor.
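
The layering described above can be pictured with a small data model. The following Python sketch is illustrative only: the class and field names (`TextSpan`, `SearchablePage`, `text_layer`) are hypothetical and reflect neither Docsvault's internals nor the PDF file format. It simply shows that the scanned image stays untouched for human readers, while text extraction and search operate on the separate OCR text layer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TextSpan:
    """One word recognized by OCR, with its position on the scanned image (hypothetical)."""
    text: str
    x: int  # left edge, in image pixels
    y: int  # top edge, in image pixels
    width: int
    height: int


@dataclass
class SearchablePage:
    """Hypothetical model of one page of a searchable PDF."""
    image: bytes  # original scan, shown unchanged to human readers
    text_layer: list[TextSpan] = field(default_factory=list)  # OCR text kept behind the image

    def extract_text(self) -> str:
        # What a search indexer or a "select text" action would see.
        return " ".join(span.text for span in self.text_layer)

    def find(self, word: str) -> list[TextSpan]:
        # Case-insensitive match against the text layer; the image itself is never parsed.
        target = word.lower()
        return [span for span in self.text_layer if span.text.lower() == target]


page = SearchablePage(
    image=b"<scanned image bytes>",
    text_layer=[TextSpan("Invoice", 40, 30, 120, 24), TextSpan("Total", 40, 400, 80, 24)],
)
print(page.extract_text())      # Invoice Total
print(page.find("total")[0].y)  # 400
```

Keeping word positions alongside the text is what lets a PDF editor highlight the matching area of the image when you select or search for a word.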

A searchable PDF is useful when you want to keep your documents in PDF format and still be able to search them by their content.

Searchable PDF files provide a reliable and easy way to search PDF documents. You can retrieve OCR'd documents either by browsing through Docsvault or by using the Search option. Indexing performed by the Docsvault indexer makes retrieval fast. [Indexing is a system service that helps you quickly find files on your computer using text searches. When you perform OCR on Tagged Image File Format (TIFF) or Portable Document Format (PDF) files, the recognized text is available to the index, making it possible to find relevant TIFF and PDF files when you search.]
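
To illustrate why indexing speeds up retrieval, here is a minimal inverted-index sketch in Python. It is a generic textbook technique, not the Docsvault indexer or the Windows indexing service, and the names (`InvertedIndex`, `add`, `search`) and sample file names are hypothetical. The OCR'd text of each document is split into terms once, at indexing time, and each term maps to the set of documents that contain it, so a search becomes a few dictionary lookups instead of a scan through every stored file.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lower-case the OCR output and split it into alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Generic term -> document-ID index (hypothetical, not Docsvault's indexer)."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, ocr_text: str) -> None:
        # Done once, when a document is OCR'd, so searches never re-read the files.
        for term in tokenize(ocr_text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        # Return the documents that contain every term in the query (AND search).
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = InvertedIndex()
index.add("invoice_0042.pdf", "INVOICE No. 42  Total due: $310")
index.add("letter_smith.tif", "Dear Mr. Smith, thank you for your invoice.")
print(sorted(index.search("invoice")))        # ['invoice_0042.pdf', 'letter_smith.tif']
print(sorted(index.search("invoice smith")))  # ['letter_smith.tif']
```

Because the term-to-document map is built when documents are OCR'd, a search only touches the entries for the query terms rather than re-reading every stored PDF or TIFF file.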

 

Three simple steps to convert a paper document into a searchable PDF file:

 

1. Scan your paper document. An image of the scanned page appears in the Preview window.
2. Perform OCR.
3. Save the document in a searchable format.

While performing OCR, the program analyzes the image and detects areas that contain text.

 

Note:

To enable full-text indexing of PDF files, the corresponding PDF IFilter component must be installed on your system.

 

You can turn off automatic analysis and OCR of newly scanned PDF or TIFF files from the Others tab of the Docsvault Server Manager.

Information:

OCR conversion of scanned documents is handled centrally on the Docsvault Server.

Note:

Docsvault OCR supports only English characters.

Page url: http://www.docsvault.com/Online_Help/SB_Help/index.html?ocr_scanning.html