How super.AI Outperforms Google AI, Amazon Textract, and Microsoft Azure

Manish Rai

VP of Marketing

SUMMARY

Enterprises today have a confusing array of options for processing documents. Optical Character Recognition (OCR) software from companies like Abbyy was first used to digitize documents nearly 30 years ago. Next, companies like IBM built data capture solutions that used templates to automatically extract data from structured and semi-structured (e.g., invoices) documents. However, these solutions required creating a template for each vendor to account for variability across invoice formats. More recently, cloud vendors like Google, Microsoft, and Amazon have introduced document AI solutions that have two key advantages over previous solutions:

The vast troves of data that modern tech giants possess are incredibly useful when building and training artificial intelligence (AI)
Recent advances in AI mean data can be extracted from documents with greater accuracy and without using templates for semi-structured information

However, these cloud AI solutions were designed with developers in mind, not business users. For instance, uploading documents for processing and downloading the results in JSON format requires API calls. Using these solutions effectively can be complex: users must parse the returned data to extract the desired fields, add post-processing to correct OCR errors, and create a human-in-the-loop (HITL) interface for quality assurance and for processing low-confidence outliers. Building a complete application that satisfies security, privacy, access control, and scalability requirements takes additional development. Additionally, cloud AI solutions from large providers require ongoing maintenance and updates to take advantage of new capabilities and improvements.
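To make the parsing and post-processing work described above concrete, here is a minimal Python sketch. The JSON shape, the field names, and the helper functions are all hypothetical, chosen only for illustration; each cloud provider returns its own schema, and a real integration would follow that provider's documentation.

```python
import json
import re

# Hypothetical response shape. Real document AI services each use their own schema.
SAMPLE_RESPONSE = """
{
  "fields": [
    {"name": "invoice_number", "value": "INV-2O21-0045", "confidence": 0.97},
    {"name": "total_amount", "value": "1,2O5.5O", "confidence": 0.91},
    {"name": "vendor_name", "value": "Acme Corp", "confidence": 0.99}
  ]
}
"""

# Letters that OCR commonly confuses with digits (an illustrative, incomplete list).
DIGIT_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})


def extract_fields(response_text: str, wanted: set[str]) -> dict[str, str]:
    """Parse the JSON payload and keep only the fields the business user needs."""
    payload = json.loads(response_text)
    return {f["name"]: f["value"] for f in payload["fields"] if f["name"] in wanted}


def clean_amount(raw: str) -> float:
    """Replace digit look-alikes in a numeric field, then parse it as a number."""
    fixed = raw.translate(DIGIT_LOOKALIKES)
    fixed = re.sub(r"[^0-9.\-]", "", fixed)  # drop thousands separators and currency symbols
    return float(fixed)


fields = extract_fields(SAMPLE_RESPONSE, {"invoice_number", "total_amount"})
total = clean_amount(fields["total_amount"])
print(fields["invoice_number"], total)
```

Note that the look-alike correction is applied only to the numeric field. Applying it to free-text fields such as vendor names would corrupt legitimate letters, which is one reason this post-processing has to be written field by field.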

For a more detailed comparison, see our blog post on super.AI and Microsoft Azure for automated invoice processing: Automating Invoice Processing: Super.AI vs. Microsoft Azure

The challenges that come with building and maintaining these document processing solutions have given rise to 75+ companies chasing the Intelligent Document Processing (IDP) market, which is expected to reach $6.38B by 2027. Most IDP vendors have done a great job simplifying setup and adding a human-in-the-loop interface, allowing shared services and global business services departments to automate document-centric processes such as procure-to-pay (P2P) and order-to-cash (O2C). IDP solutions allow users to:

Automatically classify documents at scale
Use any OCR or cloud document AI solution for automated data extraction
Tune field-specific confidence levels with thresholds that trigger HITL routing
Apply role-based access control to safeguard data security
Encrypt data to satisfy privacy regulations
Use analytics and reporting to measure automation rates and other important metrics
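The field-specific confidence thresholds mentioned in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's implementation: the `ExtractedField` type, the threshold values, and the `route` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed to be a score between 0 and 1


# Hypothetical thresholds: stricter for fields where an error is costly.
THRESHOLDS = {"total_amount": 0.98, "invoice_number": 0.95}
DEFAULT_THRESHOLD = 0.90


def needs_review(field: ExtractedField) -> bool:
    """A field goes to a human when its confidence is below that field's threshold."""
    return field.confidence < THRESHOLDS.get(field.name, DEFAULT_THRESHOLD)


def route(fields: list[ExtractedField]) -> tuple[list[ExtractedField], list[ExtractedField]]:
    """Split fields into those accepted automatically and those sent to HITL review."""
    auto, review = [], []
    for field in fields:
        (review if needs_review(field) else auto).append(field)
    return auto, review


extracted = [
    ExtractedField("total_amount", "1205.50", 0.96),
    ExtractedField("invoice_number", "INV-2021-0045", 0.97),
    ExtractedField("vendor_name", "Acme Corp", 0.92),
]
auto, review = route(extracted)
print("auto:", [f.name for f in auto])
print("review:", [f.name for f in review])
```

Here the total amount is routed to review even though its score (0.96) is higher than the vendor name's (0.92), because its threshold is stricter. Tuning these values per field is what lets a team trade automation rate against error risk.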

However, even the best IDP solutions available today struggle to process more than 80% of semi-structured (e.g., invoice) and complex documents (which may include handwritten notes, approval stamps, and/or signatures). They’re also unable to process unstructured documents such as contracts.

A unique approach to unstructured data and document processing

Super.AI takes a unique approach to unstructured data and document processing that allows our platform to process 100% of even the most complex documents. There are a few specific reasons our technology is able to outperform other IDP offerings:

Decomposition: Each document is broken down into smaller components (core document, handwritten notes, stamps, signature, etc.) using computer vision. Our solution selects the best available AI for a given component, then combines the results into a unified output.
Multi-level AI: Several levels of AI are used to generate the best results. At the first level, the documents are sent to multiple OCR engines (Google, Azure, and AWS), then the best results from each are combined into a unified output. Each OCR engine has strengths and weaknesses, with performance varying based on factors such as language, document layout, and image quality. At the next level, a semantic layer is used to detect and extract the desired fields from the composite data. Computer vision models are used to detect signatures and stamps, language models to identify fields with various labels, and fuzzy matching to correct common OCR errors. Finally, custom models are created and trained using customer-specific data. This multi-level AI approach allows us to deliver industry-leading extraction accuracy.
Data Processing Crowd: After IBM Deep Blue defeated Kasparov at chess, it became clear that the best chess players are not AI or humans, but AI and humans working together. It's in this spirit that super.AI invested in building a crowd-sourced data processing team that can help us train customer-specific AI models during the setup phase, as well as process exceptions and validate extracted data when AI confidence is low. This workforce can be deployed on demand to ensure 100% processing for even the most complex documents. Of course, customers always have the option to use their in-house team of BPOs for model training and validation.
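As a rough illustration of two ideas from the multi-level AI description above, combining results from several OCR engines and using fuzzy matching to correct common OCR errors, consider the following Python sketch. It is not super.AI's implementation. The engine names, the sample data, and the "highest confidence wins" rule are simplifying assumptions, and confidence scores from different engines are not necessarily comparable in practice. The fuzzy matching uses `difflib.get_close_matches` from the Python standard library.

```python
from difflib import get_close_matches

# Hypothetical per-engine output: field name -> (value, confidence).
EngineOutput = dict[str, tuple[str, float]]


def merge_engine_outputs(outputs: dict[str, EngineOutput]) -> dict[str, tuple[str, float, str]]:
    """For each field, keep the value from the engine reporting the highest confidence."""
    merged: dict[str, tuple[str, float, str]] = {}
    for engine, fields in outputs.items():
        for name, (value, confidence) in fields.items():
            if name not in merged or confidence > merged[name][1]:
                merged[name] = (value, confidence, engine)
    return merged


def fuzzy_correct(value: str, known_values: list[str], cutoff: float = 0.8) -> str:
    """Snap a noisy string to the closest known value, or leave it unchanged."""
    matches = get_close_matches(value, known_values, n=1, cutoff=cutoff)
    return matches[0] if matches else value


# Illustrative data only: two engines disagree on both fields.
outputs = {
    "engine_a": {"vendor_name": ("Acrne Corp", 0.88), "total_amount": ("1205.50", 0.99)},
    "engine_b": {"vendor_name": ("Acme Corp", 0.93), "total_amount": ("1205.5O", 0.72)},
}
merged = merge_engine_outputs(outputs)

# A hypothetical vendor master list used as the reference for fuzzy matching.
KNOWN_VENDORS = ["Acme Corp", "Globex Inc"]
corrected = fuzzy_correct("Acrne Corp", KNOWN_VENDORS)
print(merged["vendor_name"], merged["total_amount"], corrected)
```

In this example each engine contributes the field it read best, and the misread vendor name "Acrne Corp" is matched back to "Acme Corp" from the reference list. A value with no close match is returned unchanged, which makes it a natural candidate for human validation.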