OCR features of PDF solutions: how to choose a PDF tool?

Yes, you should!

“What is OCR?” you may ask. “I do not need OCR, I just want to open my PDFs.”

According to statistics presented at the PDF Days 2018 conference, organized by the PDF Association, one quarter of PDF documents are “image-only” (non-searchable) PDF documents. Image-only PDF documents are simply photographs of documents: they do not contain any machine-readable text. Machine-readable text is what allows the variety of actions that are key to working with digital documents or maintaining a paperless workplace environment.

This is exactly where OCR, or Optical Character Recognition, is of crucial importance: it makes documents truly digital and the text in them accessible for a number of actions. OCR is ultimately what makes working with digital documents more productive and efficient for an employee than working with paper.

All PDFs are the same! …or are they?

Even though all PDFs may look the same at first glance, they are not. Depending on their origin, they may or may not allow access to the text in the document. When you are dealing with a scanned PDF, you cannot select text to add mark-up and annotations, copy and paste it into another document, edit it, or search through it for keywords. All you can do is read it on the screen, which makes it about as useful as a paper document. It is kind of digital, but it does not offer all the benefits that a truly digital document can have.

	Digitally-born PDF	Image-only PDF	Searchable PDF
Description	May have multiple layers, containing text, illustrations, and other objects. Has never “hit” paper.	Has only an image of the document in a layer (image in a PDF “envelope”).	Has the document image layer and a text layer (normally under the image).
Origin	Created from another application via code or using a printer driver.	Scanner, images converted to PDF, or a “digitally-born” PDF saved as an image-only PDF.	A text layer has been added to an image-only PDF by applying OCR.
Searchable	yes (in most cases)	no	yes

And what does OCR have to do with all this?

Optical Character Recognition will “read” and recognize the document and convert it to machine-readable text, allowing you to work with this document in the same way as with any other PDF document.
So far so good. But if the quality and accuracy of the OCR integrated in your PDF tool is insufficient, you may be surprised by what comes out when you copy and paste text into another document, or when you need to convert a document to Microsoft Office formats for further editing.

Highly accurate OCR technology, such as the one at the core of ABBYY FineReader, which preserves the original layout and formatting of the document and supports various combinations of languages within one document, is the key to document productivity without frustration.

In fact, if you use ABBYY FineReader as your default PDF viewing and editing tool, you would not even need to bother understanding what exactly OCR is or does: you can simply do your work.

ABBYY FineReader automatically detects image-only PDF files and applies OCR to them while it is opening them. For you this means immediate access to their content: to mark-up and annotate, to search, to extract data for re-use, to redact and even to edit.
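To make the idea of detecting an image-only PDF a little more concrete, here is a minimal sketch in Python. It is not FineReader's actual method, which this article does not describe. It assumes that some PDF library has already extracted the text of each page into a list of strings; the function names and the `min_chars` threshold are hypothetical choices made for illustration only.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is too short
    to count as a real text layer (assumed threshold: min_chars)."""
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        # Count only visible characters; whitespace alone is not a text layer.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            flagged.append(number)
    return flagged


def classify_pdf(page_texts, min_chars=20):
    """Label a document as 'searchable', 'image-only', 'mixed' or 'empty'."""
    if not page_texts:
        return "empty"
    flagged = pages_needing_ocr(page_texts, min_chars)
    if not flagged:
        return "searchable"
    if len(flagged) == len(page_texts):
        return "image-only"
    return "mixed"
```

A tool following this approach would then run OCR only on the flagged pages. A real implementation would probably also look at whether a page contains a full-page image, which this sketch ignores.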

In case you need to do some more serious editing of the document, ABBYY FineReader provides you with a set of advanced document conversion settings within its OCR Editor. Here you have almost endless possibilities to adjust the conversion settings to your project and get away without any re-typing, correction or re-building of a document.

5 everyday examples of when good OCR saves the PDF

Here are a few examples of everyday situations in which you would be thankful for having a PDF tool powered by high-quality OCR:

1. Search & information retrieval

Finding relevant information quickly in the information flood is key to efficiency. According to IDC, knowledge workers spend an average of 136 minutes per week searching for documents. OCR is instrumental for being able to find a document itself and search for information within this document, especially within scans. The quality of the applied OCR, the languages it supports and its intelligence determine the quality of search results. If you are dealing with a multi-page document, you will be happy to have some AI help on your side.
ABBYY FineReader supports 192 languages and delivers up to 99.8% recognition accuracy. It can not only automatically detect the language of the document that you are working on, but also handle documents that include text in multiple languages.
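Why does OCR quality matter so much for search? The following Python sketch illustrates it under simple assumptions: the recognized text is available as a list of words, and a typical OCR confusion has turned “Invoice” into “lnvoice” (a lowercase L instead of a capital I). An exact keyword search misses the word, while a tolerant comparison based on the standard `difflib` module still finds it. The function name `find_keyword` and the 0.8 threshold are hypothetical, and this is not a description of how FineReader searches.

```python
from difflib import SequenceMatcher


def find_keyword(words, keyword, threshold=0.8):
    """Return (index, word) pairs whose similarity to the keyword
    is at least the threshold (1.0 means identical)."""
    keyword = keyword.lower()
    hits = []
    for index, word in enumerate(words):
        score = SequenceMatcher(None, word.lower(), keyword).ratio()
        if score >= threshold:
            hits.append((index, word))
    return hits


# Assumed OCR output containing a recognition error in the first word.
ocr_words = "lnvoice 4711 due in March".split()

exact_match = "invoice" in [w.lower() for w in ocr_words]
fuzzy_matches = find_keyword(ocr_words, "invoice")
```

Here `exact_match` is `False`, while `fuzzy_matches` contains the misrecognized word. Tolerant matching is only a workaround, though, and it can return false positives; the fewer recognition errors there are in the first place, the more reliable plain search becomes.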

2. Update and republish

For many years, PDF was the format of choice when the document wasn’t going to be changed. Now that PDF is the de facto standard for digital documents and is used as the digital representative of paper, the need to edit PDF documents seems quite reasonable. Why not correct a typo, update a number or a word, or change a name directly in a PDF document?

First, you need a PDF tool that is capable of editing PDF documents. Free PDF readers normally do not provide this capability. Advanced PDF tools such as ABBYY FineReader will let you easily edit the content of a digital PDF. Still, PDFs were not meant to be edited. So if you want to make more significant changes than fixing a typo or a number, you can convert the existing document to Microsoft Word and edit it there. FineReader will preserve the original layout and formatting, and you will save yourself hours of retyping or rebuilding an existing document.
Another challenge for PDF editing is scanned and image-only PDF documents. OCR is crucially important to get access to the text “locked” in the document image and edit it. Visually you may not recognize that you are working with an image-based PDF, and here is where FineReader provides key value – it will intelligently detect the document type and will apply OCR in the background. The benefit for you: you simply edit your documents – no matter what type they are.
OCR also makes it possible for you to copy & paste not only text but also tables with data from a PDF document you are viewing into another document, which can be Microsoft Word or Excel. In ABBYY FineReader you can select just the snippet that you need and continue reading.

3. Collaborating and exchanging feedback

Everyone knows how useful and fast adding comments and annotations to a PDF document is. Compared to the past, when we used to exchange feedback on paper, it also brings the advantage of not having to decipher the handwriting of your colleague. Even if you are handed a piece of paper to provide your feedback on, you can still insist on being digital. Just snap a photo with your phone or scan the document and open it in ABBYY FineReader. The image will be automatically converted to a PDF, and the OCR powering FineReader will add a text layer in the background. This allows you not only to add the usual “sticky-note” comments, but also to select text, highlight, strike through or underline it, and then add your comment, which makes your feedback much clearer for everybody.

4. Protect and redact

Have you ever tried redacting a multi-page PDF document? You probably spent hours, if not days, going page-by-page, line-by-line, looking for names, social security numbers, phrases, etc. Additionally, if you were dealing with a “flattened” document or a document scan, you did not have any other choice, as you could not even search through it for specific information.
With ABBYY FineReader, you could have saved yourself all the pain. As soon as you open a document image, a text layer that your computer can identify as text is added automatically for you, making it possible to search, see a list of all occurrences of the phrase, select the ones you must redact (or all of them) and apply redaction with just one click – easy!
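Once a text layer exists, finding everything that needs to be redacted becomes a search problem. The Python sketch below shows the idea on plain text, assuming the recognized text of each page is available as a string. The pattern for US-style social security numbers, the function names and the sample data are all hypothetical and for illustration only. Note that the sketch works on extracted text only: truly redacting a PDF also means removing the information from the page image and the file itself, which is the job of the PDF tool.

```python
import re

# Assumed pattern: US-style social security numbers such as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_occurrences(page_texts, pattern=SSN_PATTERN):
    """List every match as (1-based page number, character offset, matched text)."""
    hits = []
    for page_number, text in enumerate(page_texts, start=1):
        for match in pattern.finditer(text):
            hits.append((page_number, match.start(), match.group()))
    return hits


def redact(text, pattern=SSN_PATTERN, mask="X"):
    """Replace each match with mask characters of the same length."""
    return pattern.sub(lambda m: mask * len(m.group()), text)


# Made-up sample pages standing in for OCR output.
pages = [
    "Name: Jane Roe, SSN 123-45-6789.",
    "No sensitive data on this page.",
    "SSN 987-65-4321 on file.",
]
occurrences = find_occurrences(pages)
redacted_pages = [redact(page) for page in pages]
```

The list in `occurrences` corresponds to the “list of all occurrences” a user would review before choosing which ones to redact.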

See more details about redaction in this article

5. Identify differences and changes

Most of us have the experience of printing two documents and going line-by-line comparing their content: to make sure that no undesired or fraudulent changes made their way into the final version, to double-check that the requested edits did, or to detect duplicates. What is wrong with this process is that it wastes valuable resources (your time and paper), it is error-prone, and it is exhausting for your eyes and your brain. But with FineReader you can solve all of these issues at once! It can help you compare two documents in any file format within seconds, et voilà, you see all the differences highlighted for your review. So imagine you received a contract back that your counterpart printed, manually signed and then either scanned to send to you as a PDF or mailed to you in an envelope. Now you would like to make sure that the conditions remained unchanged. Thanks to OCR, you can just open the scanned PDF version of the contract with FineReader and compare it to the final version in Word that you have, and the rest will happen automatically.
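The principle behind such a comparison can be sketched in a few lines of Python with the standard `difflib` module. The sketch assumes that both versions are already available as plain text (one taken from the Word file, one recognized from the scan); the function name and the sample contract lines are made up for illustration, and this is not a description of FineReader's own comparison feature.

```python
import difflib


def changed_lines(original_text, received_text):
    """Return only the lines that differ, prefixed with '- ' (original)
    or '+ ' (received), in document order."""
    diff = difflib.ndiff(original_text.splitlines(), received_text.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Made-up sample: the fee was altered in the version that came back.
final_version = "Payment is due within 30 days.\nThe fee is 1,000 EUR."
scanned_version = "Payment is due within 30 days.\nThe fee is 7,000 EUR."

differences = changed_lines(final_version, scanned_version)
```

Here `differences` holds the original and the altered fee line, while the unchanged payment line is left out. With a scanned document, every recognition error would also show up as a difference, which is one more reason why OCR accuracy matters for this task.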

Learn more about ABBYY FineReader

See for yourself: test ABBYY FineReader 14 in your day-to-day tasks with PDF documents and experience the difference that AI-powered OCR technology makes in a PDF tool.

This article is part of an upcoming series with recommendations and tips for organisations and professionals looking for a new PDF tool, whether as an alternative solution to a tool currently in use, or as the next step after the free PDF reader, for more possibilities and higher productivity.

After all, PDF is the new paper!

If you have questions about choosing a PDF tool suitable for your needs, do not hesitate to leave a comment and we will make sure to address them in upcoming articles.