It’s ironic: users of PDF solutions automatically assume that when they have a problem with a PDF file, it’s the software’s fault. Developers, conversely, typically say it’s the file’s fault. But ask a tester what’s at fault, the file or the software, and they’ll probably say “It’s 50/50.” Why? Because those of us who have tested PDF solutions over and over again realize that the PDF format is so complicated that there is always a chance something is wrong with the PDF itself. On the other hand, there’s also the possibility of a software bug, or even a problem caused by user error.
However, there is a middle ground: a PDF may have been badly encoded by the software that created it. Or, again because of the complexity of the PDF format, a user may have chosen the wrong settings when preparing for a task: for example, choosing a right-to-left language like Arabic, then forgetting to switch back to left-to-right for an English document, which results in damaged encoding. Either way, the person who receives the file may have problems opening and working with it even though there is nothing wrong with their PDF software! In such a situation, a user might experience something like the following: you open an editable PDF in Adobe Acrobat Pro with no evidence of any problem. But when you try to copy and paste its text, the pasted text looks like either a series of square boxes or weird hieroglyphs! It’s not your software’s fault, even if it looks that way. The software copied the text faithfully, but the text layer of that PDF was damaged.
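To make the “square boxes” symptom concrete, here is a minimal sketch of how a tester might flag text that was copied or extracted from a PDF with a damaged text layer. It is only an illustration, not how ABBYY or Adobe software works: the function names and the 0.3 threshold are assumptions made up for this example. It measures the share of characters that typically render as boxes, namely private-use, unassigned, and control code points plus the Unicode replacement character.

```python
import unicodedata

# Unicode general categories that usually render as empty boxes:
# Co = private use, Cn = unassigned, Cc = control characters.
SUSPECT_CATEGORIES = {"Co", "Cn", "Cc"}
REPLACEMENT_CHAR = "\ufffd"  # what decoders emit for undecodable input


def suspect_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like encoding damage."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    suspect = sum(
        1
        for ch in visible
        if ch == REPLACEMENT_CHAR or unicodedata.category(ch) in SUSPECT_CATEGORIES
    )
    return suspect / len(visible)


def looks_damaged(text: str, threshold: float = 0.3) -> bool:
    """Hypothetical rule of thumb: flag text when suspect characters reach the threshold."""
    return suspect_ratio(text) >= threshold


if __name__ == "__main__":
    pasted = "\ue000\ue001\ue002 \ufffd\ufffd"  # stand-in for text pasted from a damaged PDF
    print(looks_damaged(pasted))
    print(looks_damaged("A perfectly ordinary sentence."))
```

Note that a check like this only catches the “boxes” case. The “weird hieroglyphs” case, where glyphs are mapped to valid but wrong characters, passes straight through, and catching it would need something like a dictionary or language-model check on the extracted text.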
With that in mind, I’m proud to say that the ABBYY team is truly user-oriented. To that end, we’re working on technical enhancements that will let users extract text accurately from PDF files with bad encoding.