OCR Conversion to Searchable PDF file


The OCR process recognizes the text on each page with high accuracy and adds a hidden text layer to the same PDF file, with each word positioned exactly behind its appearance in the image. The recognized text can then be used in three ways:


When the PDF file is open, you can search for any word or phrase using the search feature of your PDF application, which highlights each occurrence of the matched text.


The text in PDF files is indexed by the indexing engine, so you can search for content within these files from the main Docsvault search interface.


The recognized text can be exported to a text file.


For example:

When creating research reports and proposals, you often need to compile information from a variety of sources, which may include other reports saved as PDF, printed documents, and magazine or newspaper articles. Rather than retyping the information into your new document, simply OCR it while scanning in Docsvault. You can then extract the text from Docsvault PDF Editor and edit it in Notepad, or copy and paste it into any other editor.



In a PDF file that has been made searchable using OCR, the original scanned image is retained, so the page looks exactly as it was scanned. The text extracted via OCR is placed behind the image in a hidden text layer, so search indexers can read it and you can select it as text in any PDF reader or editor.
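
For readers curious about how a hidden text layer can line up with the scanned image, the following is a minimal Python sketch of the general technique, not Docsvault's actual implementation. It assumes the OCR engine reports each word with a bounding box in image pixels, converts that box to PDF points (1/72 inch, origin at the bottom-left of the page), and writes the word with PDF text rendering mode 3 (the `3 Tr` operator), which makes the text invisible but still searchable. The names `OcrWord` and `hidden_text_ops`, and the font resource `/F1`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OcrWord:
    """One recognized word with its bounding box in image pixels (hypothetical type)."""
    text: str
    left: int    # pixels from the left edge of the scanned image
    top: int     # pixels from the top edge of the scanned image
    width: int
    height: int


def pixels_to_points(value_px: float, dpi: int) -> float:
    """Convert a pixel measurement to PDF points (72 points per inch)."""
    return value_px * 72.0 / dpi


def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_ops(words: List[OcrWord], page_height_px: int, dpi: int) -> str:
    """Build a PDF content-stream fragment that places each word invisibly.

    Assumes a font named /F1 is declared in the page resources.
    """
    ops = ["BT", "3 Tr"]  # begin text object; rendering mode 3 = invisible text
    for word in words:
        x = pixels_to_points(word.left, dpi)
        # Image rows count down from the top; PDF y counts up from the bottom.
        # The bottom edge of the box is used as an approximate baseline.
        y = pixels_to_points(page_height_px - (word.top + word.height), dpi)
        size = pixels_to_points(word.height, dpi)
        ops.append(f"/F1 {size:.2f} Tf")            # font size ~ box height
        ops.append(f"1 0 0 1 {x:.2f} {y:.2f} Tm")   # move to the word position
        ops.append(f"({escape_pdf_string(word.text)}) Tj")
    ops.append("ET")  # end text object
    return "\n".join(ops)
```

Calling `hidden_text_ops` with the words from one scanned page, the page height in pixels, and the scan resolution returns the text operators for that page. A real converter would also scale each word horizontally to match the box width and would draw the scanned image on the same page so that only the image is visible.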


Note:

To enable full-text indexing of PDF files, the corresponding iFilter component must be installed on the Docsvault server system. Follow the "Indexing Help Page" link under the "Full Text Search" node in the Docsvault Server Manager dialog for further help with iFilters.