What do PDFs have to do with the GDPR?

With the General Data Protection Regulation (GDPR) applicable in all member states of the European Union since 25 May 2018, companies of all sizes and industries are bound to define processes and systems to manage personal data in a more structured and reliable way than ever before. The first things that come to mind are customer databases, CRM systems and marketing automation tools. But what about documents?
Personal data and confidential information are contained not only in databases and internal systems, but also in various business documents. Moreover, businesses, institutions and organizations have a lot of them. Whether you store or share such documents, it is crucial to be aware of the specifics, have the right tools and take the right actions.

The PDF Association refers to the PDF file format as “digital paper” and the “de facto standard for electronic documents”. In order to make sure PDF documents are included in your internal data protection compliance processes, there are a few things you should take care of.

5 Tips for Preparing PDF Documents for GDPR Compliance

#1 | Raise awareness among staff that not all PDFs are the same

Not all PDFs are searchable; scans, for example, are not. To retrieve data from such documents, they must first be converted to searchable PDFs using text recognition (OCR).

So-called “digital-born” PDFs are generally searchable; however, there are a variety of reasons why information in them cannot be found through a full-text search, for example vector graphics that look like text, or screenshots and other images that contain text.
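As a rough illustration of how such documents could be spotted, the sketch below flags pages whose extracted text is suspiciously short. It assumes the page texts have already been pulled out by whatever extraction tool you use; the function name, the `PageReport` type and the `min_chars` threshold are hypothetical choices for this example, not part of any specific product.

```python
from dataclasses import dataclass


@dataclass
class PageReport:
    page_number: int  # 1-based page index
    char_count: int   # non-whitespace characters found by text extraction
    needs_ocr: bool   # True when the page looks like an image-only page


def flag_pages_for_ocr(page_texts, min_chars=20):
    """Flag pages whose extracted text is too short to be searchable.

    page_texts: list of strings, one per page (assumed input; the text
    extraction itself is out of scope for this sketch).
    min_chars: assumed threshold; tune it for your own documents.
    """
    reports = []
    for number, text in enumerate(page_texts, start=1):
        count = sum(1 for ch in text if not ch.isspace())
        reports.append(PageReport(number, count, count < min_chars))
    return reports


# Page 1 has a text layer; pages 2 and 3 yield no usable text.
pages = ["Invoice for Jane Doe, 12 Example Street, total due 100 EUR", "", "  \n "]
for report in flag_pages_for_ocr(pages):
    print(report)
```

A check like this only catches pages with little or no text. A digital-born page that mixes real text with a screenshot would pass, so it complements rather than replaces running OCR on embedded images.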

Fundamentally, the ambition of GDPR is to strengthen individuals' control over the use of their personal data. This means that your organization must have the systems and best practices in place to comply with subject access requests (Art. 15 of GDPR), the “right to be forgotten” (Art. 17 of GDPR) or objections to data processing (Art. 21 of GDPR). To comply, you have to be sure that you can find all personal data of a data subject.

Moreover, GDPR imposes rigorous obligations on data controllers and processors to safeguard privacy rights, including providing data subjects with a copy of their processed personal data upon request (Art. 15 of GDPR). To do so, data controllers must implement appropriate technical and organizational measures (Art. 24 of GDPR) to ensure that processing of personally identifiable information is consistent with the Regulation, and must provide information to data subjects in a concise, transparent, intelligible and easily accessible form (Art. 12 of GDPR). This also covers the right to data portability (Art. 20 of GDPR).

#2 | Make data in PDF documents “discoverable”

The primary challenge faced by organizations is a lack of sufficient insight into information holdings that may span email, file shares, content repositories and, yes, in many cases paper-based files. Not surprisingly, the PwC Pulse Survey found that initial investments are expected to be in data discovery best practices and tools.

Digitizing paper documents and converting previously scanned and non-searchable PDFs into searchable documents will ensure that documents in digital archives and repositories are quickly retrievable by simply searching for specific names or other personal information. The conversion can be automated for individual needs or for organization-wide handling of large amounts of documents. The process has to be set up just once and will run automatically afterwards.
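Once documents are searchable, looking up a data subject can be sketched roughly as follows. The example assumes the text of each document has already been extracted (by OCR or otherwise) into a dictionary; the function name and the sample file names are made up for illustration.

```python
import re


def find_subject_mentions(documents, identifiers):
    """Return {document name: sorted list of matched identifiers}.

    documents: dict mapping a document name to its extracted text
    (assumed input; OCR/text extraction is expected to have run already).
    identifiers: strings that identify the data subject, e.g. name or email.
    """
    # One case-insensitive pattern per identifier. Flexible whitespace lets
    # "Jane  Doe" or a line-broken "Jane\nDoe" still match.
    patterns = {
        ident: re.compile(r"\s+".join(map(re.escape, ident.split())), re.IGNORECASE)
        for ident in identifiers
        if ident.strip()
    }
    hits = {}
    for name, text in documents.items():
        matched = sorted(i for i, p in patterns.items() if p.search(text))
        if matched:
            hits[name] = matched
    return hits


docs = {
    "contract.pdf": "Agreement between ACME and Jane\nDoe",
    "invoice.pdf": "Contact: JANE.DOE@example.com",
    "memo.pdf": "Quarterly figures",
}
print(find_subject_mentions(docs, ["Jane Doe", "jane.doe@example.com"]))
# {'contract.pdf': ['Jane Doe'], 'invoice.pdf': ['jane.doe@example.com']}
```

Exact matching like this misses misspellings, nicknames and OCR errors, so a production search would likely need fuzzier matching and a review step.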

#3 | Minimize the sensitive data you store and share

GDPR imposes specific requirements related to data minimization (Art. 5(1)(c) and Art. 25 of GDPR). It is therefore prudent either to anonymize data so that data subjects are no longer identifiable, or to redact documents that may contain confidential, sensitive or personally identifiable information.

Since PDF documents often contain confidential and personally identifiable information, it is recommended that any personal, sensitive and confidential information in them be made unrecognizable, whether the documents are stored in a digital archive or shared with third parties. True redaction permanently removes the information from the document and makes it irretrievable.
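Before anything can be redacted, it has to be located. The sketch below shows one possible way to find redaction candidates in extracted text using deliberately simple, assumed patterns for email addresses and phone numbers. Note that masking a copy of the text, as done here for review, is not true redaction: the information must also be removed from the PDF file itself with a tool built for that purpose.

```python
import re

# Deliberately simple, assumed patterns. Real documents need patterns tuned
# to the data you hold (names, customer IDs, national number formats).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_redaction_candidates(text):
    """Return (kind, start, end, matched_text) tuples, ordered by position."""
    found = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((kind, m.start(), m.end(), m.group()))
    return sorted(found, key=lambda item: item[1])


def mask(text, candidates, char="X"):
    """Overwrite each candidate span in a copy of the text (review aid only)."""
    chars = list(text)
    for _, start, end, _ in candidates:
        chars[start:end] = char * (end - start)
    return "".join(chars)


sample = "Call +49 30 1234567 or mail jane.doe@example.com today."
candidates = find_redaction_candidates(sample)
print(candidates)
print(mask(sample, candidates))
```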

#4 | Beware of the “hidden” data

In many cases sensitive data is in plain sight within the document, but sometimes it is “hidden”, for example in metadata, attached files or comments. Instead of going through all these areas manually and redacting or removing sensitive information piece by piece, you can remove it quickly and easily with the appropriate software tools. PDF documents can be "sanitized" by removing "hidden" data with just a few clicks, which helps keep the administrative burden and compliance cost within reason.
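To get a first impression of which files may carry such hidden data, a crude byte-level scan can help. The sketch below looks for a handful of PDF name tokens in the raw file content; the marker list and function name are choices made for this example. It is only a heuristic: in newer PDFs these entries can sit inside compressed object streams, so an empty result does not prove a file is clean, and the scan detects but does not remove anything.

```python
# Name tokens that commonly signal non-visible content in a PDF file.
HIDDEN_DATA_MARKERS = {
    b"/Info": "document information dictionary (author, title, ...)",
    b"/Metadata": "XMP metadata stream",
    b"/EmbeddedFile": "attached (embedded) files",
    b"/Annots": "annotations such as comments",
    b"/JavaScript": "embedded scripts",
}


def scan_for_hidden_data(pdf_bytes):
    """Return descriptions of the hidden-data markers found in the raw bytes."""
    return [
        description
        for marker, description in HIDDEN_DATA_MARKERS.items()
        if marker in pdf_bytes
    ]


# Tiny hand-written fragment standing in for a real file's content.
sample = (
    b"%PDF-1.4\n1 0 obj\n<< /Type /Page /Annots [4 0 R] >>\nendobj\n"
    b"trailer\n<< /Root 2 0 R /Info 3 0 R >>\n%%EOF"
)
for finding in scan_for_hidden_data(sample):
    print(finding)
```

For a real file, the bytes could be read with `pathlib.Path("some.pdf").read_bytes()`, where the file name is a placeholder.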

#5 | Protect information in documents from unauthorized access

If certain personal information needs to be retained, appropriate measures must be taken (Art. 25 of GDPR) to limit access to the processed personal data to authorized personnel only. One way to comply is to protect these documents with a password, which makes them accessible only to those who know it. At the same time, be aware that these documents then become inaccessible to keyword search. When it comes to handling requests such as subject access (Art. 15 of GDPR), the “right to be forgotten” (Art. 17 of GDPR) or objection to data processing (Art. 21 of GDPR), where finding all personal data of a data subject is crucial for compliance, password-protected documents will need to be processed separately.
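Keeping track of which documents fall outside the regular search can be sketched as below. The example relies on the fact that an encrypted PDF references an `/Encrypt` dictionary in its trailer, and treats any file containing that token as needing separate handling. The function names are hypothetical and the check is a heuristic: a file protected only against printing or editing also contains `/Encrypt` but may still be readable, so a hit means "check manually", not "unreadable".

```python
def looks_encrypted(pdf_bytes):
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary in the trailer."""
    return b"/Encrypt" in pdf_bytes


def split_for_search(documents):
    """Split {name: raw bytes} into two sorted lists of names:
    files that can go to the regular search index, and files that need
    separate handling (e.g. opening by an authorized person)."""
    searchable, separate = [], []
    for name, data in documents.items():
        (separate if looks_encrypted(data) else searchable).append(name)
    return sorted(searchable), sorted(separate)


# Hand-written fragments standing in for real file contents.
docs = {
    "payroll.pdf": b"%PDF-1.6\ntrailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n%%EOF",
    "brochure.pdf": b"%PDF-1.4\ntrailer\n<< /Root 1 0 R >>\n%%EOF",
}
searchable, separate = split_for_search(docs)
print("index normally:", searchable)
print("process separately:", separate)
```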