Working with PDF Documents

This section explains what happens when you import different types of PDF files into Docsvault.

PDF files can be imported into Docsvault in one of the following ways:

•	Importing electronic PDF files (text-based and image-based).

•	Scanning paper documents with or without optical character recognition (OCR).

•	Printing an electronic document to PDF format. Printing to PDF format is covered in the Creating PDFs with Docsvault PDF section.

We can therefore divide PDF files into two types:

Text-based PDF: Contains text elements and, optionally, images.

Image-only PDF: Contains a single scanned image per page, with no selectable text.

Importing image-only PDF files into Docsvault

An image-based PDF contains images instead of text. If you try to select the text with the mouse, nothing is selected, because each page is a picture of the text. You can read such a PDF file, but you cannot edit or delete its text.

If you import an image-based PDF file that was created or scanned in another application, you can still work with it in Docsvault: Docsvault runs it through its optical character recognition (OCR) add-on while importing the file. Note that this feature is available only if it has been enabled in Tools > Advance Settings. For more information, see OCR in the Advance Settings.

Docsvault attempts to OCR every imported PDF. However, if the OCR process finds any text content in an imported PDF file, it skips that file and keeps it in its original form. This protects text-based PDF files, which do not need OCR. If you still wish to OCR such a file, you can force OCR by opening the file's Properties dialog and marking it for Re-OCR/Force OCR. This converts the entire PDF file into an image-based PDF and then OCRs all pages. For more information, see Re-OCR PDF File.
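The import-time rule above can be summarized as a small decision function. The following Python sketch is illustrative only: Docsvault does not expose this logic, and every name in it (`Page`, `ImportedPdf`, `decide_ocr`, `OcrAction`, and the `has_text` and `force_ocr` fields) is hypothetical. It assumes each page can be classified as either containing text or being a scanned image, and it models the Re-OCR/Force OCR marking as a simple flag.

```python
from dataclasses import dataclass, field
from enum import Enum


class OcrAction(Enum):
    SKIP = "skip"            # keep the file in its original form
    OCR = "ocr"              # OCR the image-only file as imported
    FORCE_OCR = "force_ocr"  # convert every page to an image, then OCR all pages


@dataclass
class Page:
    # Hypothetical model: True if the page carries text elements,
    # False if it is a single scanned image.
    has_text: bool


@dataclass
class ImportedPdf:
    name: str
    pages: list[Page] = field(default_factory=list)
    # Hypothetical stand-in for the Re-OCR/Force OCR marking in Properties
    force_ocr: bool = False


def decide_ocr(pdf: ImportedPdf, ocr_enabled: bool) -> OcrAction:
    """Hypothetical sketch of the OCR rule described above."""
    if not ocr_enabled:
        # OCR is not enabled in the settings: nothing is OCRed
        return OcrAction.SKIP
    if pdf.force_ocr:
        # The user explicitly asked for Re-OCR/Force OCR
        return OcrAction.FORCE_OCR
    if any(page.has_text for page in pdf.pages):
        # Any text content at all: treat the whole file as text-based
        return OcrAction.SKIP
    return OcrAction.OCR


scanned = ImportedPdf("scan.pdf", [Page(has_text=False), Page(has_text=False)])
print(decide_ocr(scanned, ocr_enabled=True))  # prints OcrAction.OCR
```

Note that the text check is made on the file as a whole: a single page with text is enough to skip OCR for every page, which is why the force option exists.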

You can check the current OCR status of a file in File Properties > General tab. For more information, see OCR Status.

Notes:

•	The OCR add-on module is available only if it has been purchased and activated.

•	You can monitor the OCR process from the OCR node in Tools > Advance Settings.

Scanning documents and optical character recognition (OCR)

You can create a PDF directly from a paper document using Docsvault and your scanner.

•	Click the Scan icon on the toolbar.

•	Enter the required parameters, such as the file name, description, and the location in Docsvault where the file will be saved.

•	Select the Scanner.

•	Select the OCR and create searchable PDF option in Scan Preferences Setting, and then click Scan.

•	Select Scan More to scan more pages or Import to save the file in PDF format in Docsvault.

For more information on how to scan in Docsvault, see Scanning and OCR.