Working with PDF Documents

This section explains what happens when you import different types of PDF files into Docsvault.

PDF files can be imported into Docsvault in one of the following ways:

•Importing electronic PDF files (text-based and image-based).

•Scanning paper documents with or without optical character recognition (OCR).

•Printing an electronic document to PDF format. For details on printing to PDF, see the Creating PDFs with Docsvault PDF section.

•Creating a PDF version of an existing document directly within Docsvault.

•Creating a PDF copy of an existing file for distribution or archiving.

PDF files can be divided into two types:

Text-based PDF: A series of text elements and images

Image-only PDF: A single scanned image per page
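
The distinction between the two types can be sketched in a few lines of Python. This is only an illustration, not how Docsvault itself detects the type: the function name classify_pdf and its input (one string of extracted text per page, produced by some PDF library of your choice) are assumptions made for this example.

```python
def classify_pdf(page_texts):
    """Classify a PDF from the text extracted from each of its pages.

    page_texts is a hypothetical input: one string per page, holding
    whatever text a PDF library extracted (empty for a scanned image).
    """
    # A single page with real (non-whitespace) text makes the file text-based.
    has_text = any(text.strip() for text in page_texts)
    return "text-based" if has_text else "image-only"


print(classify_pdf(["Invoice 1001", ""]))  # one page carries text
print(classify_pdf(["", "   "]))           # no page carries text
```

In this sketch a file counts as text-based as soon as one page carries text, which matches the way the OCR process described later treats files that contain any text content.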

Importing image-only PDF files into Docsvault

An image-based PDF contains images of the pages instead of text. If you try to select text with the mouse, nothing is selected. You can only read the file; you can neither edit nor delete its text.

If you import such image-based PDF files into Docsvault, you can still work with them. Docsvault attempts to OCR all imported PDFs using its Optical Character Recognition (OCR) add-on tool.

Note: This feature is available only if it has been enabled in Docsvault Server Manager. For more information, see OCR Configuration in the Server Manual.

However, if the OCR process finds any text content in an imported PDF file, it skips that file and keeps it in its original form. This protects text-based PDF files that do not need OCR. If you still want to OCR such a file, you can force OCR by opening the Properties dialog of the file in Docsvault Client and marking it for Re-OCR/Force OCR. This converts the entire PDF file into an image-based PDF and then runs OCR on all pages. For more information on Re-OCR, see Re-OCR PDF File.

You can check the current OCR status of a file on the File Properties > General tab. For more information, see OCR Status.

To correct image-based PDFs, the Docsvault Image Correct and Redact feature lets you redact, erase, clean up edges, straighten crooked pages, adjust clarity, and more. For more information, see PDF Correction Tools.
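
The skip and force rules described above can be summarized as a small decision function. This is a minimal sketch of the documented behavior, not Docsvault's actual code: the function name ocr_action, its parameters, and the returned labels are hypothetical and chosen for this example only.

```python
def ocr_action(page_texts, force_ocr=False):
    """Decide what an OCR pass does with an imported PDF.

    page_texts: hypothetical input, one extracted-text string per page.
    force_ocr:  True when the file was marked for Re-OCR/Force OCR.
    """
    if force_ocr:
        # Forced OCR: convert the whole file to images, then OCR every page.
        return "convert-to-image-and-ocr"
    if any(text.strip() for text in page_texts):
        # Existing text content found: leave the file in its original form.
        return "skip"
    # Image-only file: run OCR normally.
    return "ocr"


print(ocr_action(["", ""]))                        # image-only file
print(ocr_action(["Some text", ""]))               # file already has text
print(ocr_action(["Some text"], force_ocr=True))   # forced by the user
```

The force flag is checked first because, as described above, marking a file for Re-OCR overrides the protection that normally keeps text-based files untouched.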

Notes:

•The OCR add-on module is available only if it has been purchased and activated.

•An administrator can monitor the OCR process from the OCR node in Docsvault Server Manager.

Scanning documents and optical character recognition (OCR)

You can create a PDF directly from a paper document using Docsvault and your scanner.

•Click the Scan icon on the toolbar.

•Enter the required parameters, such as the file name, description, and location to save in Docsvault.

•Select the scanner.

•In the Scan Preferences settings, select OCR and create searchable PDF, then click Scan.

•Select Scan More to scan more pages, or Import to save the file in PDF format in Docsvault.

For more information on how to scan in Docsvault, see Scanning and OCR.