OCR to Searchable PDF file

OCR conversion to searchable PDF

 

The OCR process identifies the text on each page with near-perfect accuracy and creates a hidden text layer within the same PDF file, positioning each recognized word exactly behind its appearance in the image. The recognized text can then be used in three ways:

 

When the PDF file is open, you can search for any word or phrase using the search feature of the PDF application, which highlights each occurrence of the matched text.

 

The text in PDF files is indexed by the indexing engine and can be used to search for content within these files from the main Docsvault search interface.

 

The recognized text can be exported to a text file.

 

For example:

When creating research reports and proposals, you often need to compile information from a variety of sources, which may include other reports saved as PDF, printed documents, and magazine or newspaper articles. Rather than retyping the information into your new document, simply OCR it while scanning in Docsvault. You can then extract the text from the Docsvault PDF Editor and edit it in Notepad, or cut and paste it into an existing PDF file or any other editor.

 

 

You can OCR any image-based PDF file or any PDF file created by scanning. Docsvault can convert your documents into editable text (Notepad) and fully searchable OCR’d PDF files, a convenient format for storing and sharing text-based information within or beyond your organization.

 

In a searchable PDF, the original scanned image is retained, so anyone can read the document as it was scanned. The text extracted via OCR is placed behind the image, so search indexers can see it and you can select it as text in any PDF editor.
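The idea of a text layer positioned behind the page image can be pictured with a small model. The sketch below is illustrative only: the `Word` class, `find_phrase`, and `export_text` are hypothetical names, not part of Docsvault or of any PDF library, and the coordinates are made-up sample values. It shows how recognized words that carry their positions allow both phrase highlighting and plain-text export.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word and the box it occupies on the page image (hypothetical model)."""
    text: str
    x: float       # left edge, in page units
    y: float       # top edge, in page units
    width: float
    height: float


def find_phrase(words, phrase):
    """Return one list of Word boxes per case-insensitive match of `phrase`."""
    target = phrase.lower().split()
    if not target:
        return []
    lowered = [w.text.lower() for w in words]
    matches = []
    for start in range(len(words) - len(target) + 1):
        if lowered[start:start + len(target)] == target:
            # These boxes are what a viewer would highlight over the image.
            matches.append(words[start:start + len(target)])
    return matches


def export_text(words):
    """Join the recognized words into plain text, as for a text-file export."""
    return " ".join(w.text for w in words)


# Made-up sample page: three words on one line.
page = [
    Word("Quarterly", 72, 90, 80, 14),
    Word("sales", 156, 90, 40, 14),
    Word("report", 200, 90, 52, 14),
]
hits = find_phrase(page, "sales report")
```

Here `hits` holds the boxes for "sales" and "report", which is the information a PDF viewer needs to draw a highlight in the right place over the scanned image.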

Searchable PDF is useful when you want to keep your documents in PDF format and also be able to search them by their contents.

Searchable PDF files provide a reliable and easy way of searching PDF documents. You can retrieve OCR'd documents either by browsing through Docsvault or by searching for a document using the Search option. The indexing provided by the Docsvault indexer gives fast retrieval. [Indexing is a system service that helps you quickly find files on your computer using text searches. When you perform OCR on a Portable Document Format (PDF) file, the recognized text becomes available to the index, making it possible to find relevant PDF files when you search.]
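To see why indexed text makes retrieval fast, consider a minimal inverted index. This is a generic sketch, not the Docsvault indexer or the Windows indexing service; the function names and the sample file names are assumptions made for illustration. Each word maps to the set of files that contain it, so a search is a few dictionary lookups rather than a scan of every file.

```python
import re
from collections import defaultdict

WORD_PATTERN = re.compile(r"[a-z0-9']+")


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in WORD_PATTERN.findall(text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = WORD_PATTERN.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical OCR output for two scanned files.
documents = {
    "invoice_0412.pdf": "Invoice for scanner maintenance",
    "proposal.pdf": "Research proposal on scanner accuracy",
}
index = build_index(documents)
```

With this index, `search(index, "scanner proposal")` returns only `proposal.pdf`, while `search(index, "scanner")` returns both files.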

 

While performing OCR, the program analyzes the image and detects areas that contain text.
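How the program detects text areas is not described here. As a rough illustration only, one classic technique is a horizontal projection profile: scan a black-and-white image row by row and group consecutive rows that contain ink into bands, each of which is a candidate line of text. The function name and the tiny sample image below are assumptions for this sketch, not Docsvault's method.

```python
def text_row_bands(image):
    """Return (first_row, last_row) pairs for runs of consecutive rows containing ink.

    `image` is a list of rows; each row is a list of 0 (background) or 1 (ink).
    """
    bands = []
    start = None
    for i, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = i                      # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, i - 1))   # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(image) - 1))
    return bands


# Tiny made-up image: two bands of ink separated by a blank row.
image = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
bands = text_row_bands(image)
```

For the sample image, the bands cover rows 1 to 2 and row 4. A real OCR engine would go further, splitting each band into words and characters before recognition.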

 

Note:

For full-text indexing of PDF files, the corresponding IFilter component must be installed on your system. On the PC where the Docsvault software is running, install Adobe's free iFilter 9.0 on a 32-bit platform, or Adobe iFilter 9 for 64-bit on a 64-bit platform.

 

Docsvault OCR supports only English characters.

 

 


Page url: http://www.docsvault.com/online-help/professional/index.html?ocr_to_searchable_pdf_file.html