Button Text
Home
arrow
Blog
arrow
Data Privacy
Aug 16, 2022
Min Read

Redacting Sensitive Content in PDFs: Adobe Acrobat vs. Document.Redact

Share on TwitterShare on Twitter
Share on TwitterShare on Twitter
Share on TwitterShare on Twitter
Share on TwitterShare on Twitter
Sina Youn
Privacy Tech Lead
SUMMARY

What is PDF redaction?

PDF redaction is the process of removing sensitive or private information from a document before it is shared. This can be done using special software that is designed to identify and remove sensitive data, or by manually editing the document to remove sensitive information. Redacting a PDF can be important for privacy reasons, as it ensures that only the information that is intended to be shared is accessible. It can also be useful for security purposes, as it helps to prevent sensitive data from being leaked. When done properly, PDF redaction can help to protect the privacy of individuals and organizations.

Why redact PDFs at all?

There are many reasons not to disclose the full information or content in a document, be it because of privacy regulations, intellectual property, or confidentiality. For example, government documents that contain confidential and/or classified details must be redacted before being made public. Simultaneously, there are many reasons why organizations may want or have to store and/or share documents that contain sensitive information. This may be to satisfy legal obligations, like Freedom of Information Act (FOIA) or data subject (under the General Data Protection Regulation) requests, for business purposes (e.g. during merger and acquisition (M&A) processes, when appointing third-party service providers), and many other reasons.

Redaction has proven to be a straightforward solution that enables the secure sharing and storing of documents containing personal, sensitive or confidential information. Redaction is a data masking technique that removes information or parts of information irreversibly. Two commonly used synonyms for redaction, used in the legal space, are anonymization and de-identification.

Adobe Acrobat offers manual or semi-automated redaction tools only

Before computers became ubiquitous, people made paper copies of documents and manually redacted information on them with a pen. In today’s digital age, various solutions and web services offer users the digital equivalent of manual redaction. One of the most popular is Adobe Acrobat. It includes a redact tool for removing sensitive images and text in pdfs, which can be done manually or through a semi-automated (find and redact) process.

Manual redaction in Adobe acrobat involves a human screening the entire document while identifying and marking texts and objects for redaction. The semi-automated “find and redact” approach allows users to search for specific words, phrases, or patterns, (e.g., credit card numbers, email addresses, Social Security numbers, dates, etc.), and then remove all instances where the query appears in a PDF.

Both of the redaction options available in Adobe Acrobat require that a human worker first read through at least part of each document and give the tool instructions about what to redact from it. This is a tedious, time-consuming process that is neither scalable or accurate. This process might work at low volumes, but falls apart when redaction needs to be applied across larger document stores and accommodate different access permissions when sharing PDFs. Manual and even semi-automated redaction at scale requires a large human workforce to spend hundreds of hours screening and redacting documents.

A study by SpringCM found that almost 75% of companies say that human error negatively impacts document processing very often or often. Incorrectly redacted documents lead to additional redaction work and, worst case scenario, negative legal and business consequences. While the acquisition costs of manual and even semi-automated redaction tools like Adobe Acrobat may be relatively low, the consequences of relying on this tool to safeguard sensitive information at scale can be very high.

Scale sensitive information protection with AI-powered automation

Newer redaction solutions that leverage artificial intelligence (AI) to fully automate the redaction process save time, mitigate risk, and provide a scalable approach to data privacy and regulatory compliance. AI-driven approaches to redaction don’t rely on manual or rules-based “find and redact,” instead using an intelligent understanding of the types of information a user is seeking to redact, making it possible to identify sensitive data in a wider variety of contexts.

Natural language processing (NLP) is typically used to give AI redaction solutions the ability to identify information like names, addresses, credit card numbers, dates, and more automatically. Optical character recognition (OCR) and intelligent character recognitions (ICR) are used to help AI-automated redaction tools understand handwritten text. Additionally, computer vision (CV) is used to interpret images and graphics embedded in PDFs so information other than text can be redacted. This means personally identifiable information (PII) like faces and license plates can be removed automatically as well.

Using these technologies, AI-based redaction software is able to offer robust support for various document input types and formats. The software can also “understand” the document content in order to reliably redact specified information. This keeps human involvement near zero, while the redaction outcome is improved in terms of both speed, accuracy, and cost compared to manual or semi-automated alternatives.

Document.Redact by Super.AI: A powerful alternative to Adobe Acrobat

Meet Document.Redact, super.AI’s fully automated software solution to remove personal, sensitive and confidential information from any document instantly, irreversibly and reliably - with guaranteed results. You only need an internet connection!

Up to 99% of the redaction process can be automated - and to reach the highest redaction accuracy and anonymization quality, super.AI has multiple built-in redaction features for manual Quality Assurance (QA). You can do it yourself or use the “all-round carefree package” where super.AI takes care of the QA.

The underlying NLP- and CV-models of Document.Redact are based on machine learning, which means that super.AI’s program learns and improves with usage by (the program) exploring data and identifying patterns (itself). That way, our solution is more robust to non-standard patterns, regional specifics, different language inputs,  and scanned as well as machine-readable pdfs.

  • Document.Redact is 5x times faster than Adobe Acrobat, when redacting personal identifiers from a 26 page annual report, using both manual and semi-automated redaction in Adobe Acrobat.
  • Document.Redact is 20% more accurate than Adobe Acrobat, when it comes to identifying and redacting personal information from a 26 page annual report, using both manual and semi-automated redaction in Adobe Acrobat.
  • Document.Redact is ∞ more intelligent than Adobe Acrobat, for example it can detect both British and American English words in a single redaction process, and phone numbers from different countries.

Don’t let redaction requirements slow you down!

Document.Redact has full legal admissibility and is designed for business and technical users to address the growing imperative for organizations to effectively manage data privacy compliance and minimize IT- and cybersecurity risks. Data privacy is fast becoming the rule rather than the exception.

Applications of Document.Redact include:

  • All businesses in all industries: Fulfillment of Data Subject Requests  (DSR) under the Freedom of Information Act (FOIA) and GDPR.
  • Finance and Banking: Redaction of credit card statements, purchase orders, merger and acquisition (M&A) documents, and more.
  • Law Enforcement: Anonymization of court records, land records, legal discoveries, Uniform Commercial Code (UCC) Filings, and more.
  • Medical and Healthcare: HIPAA-compliant de-identification of medical records and patient data in clinical research and clinical trials.
Other Tags:
Data Privacy
Document Automation
Share on TwitterShare on Twitter
Share on FacebookShare on Facebook
Share on GithubShare on Github
Share on LinkedinShare on Linkedin

WHITE PAPER

Protecting Sensitive Information at Scale with AI

Get a customized demo with your documents

Book a free consultation with our experts.

You might also like