What is OCR and how can I use it?
Optical character recognition (OCR) boils down to converting images of text, whether printed or handwritten, into machine-readable digital text.
OCR is a form of computer vision, the field of study concerned with how machines see. Computer vision is one of the most established and immediately useful applications of machine learning (ML): it uses AI to let computers process images and video and extract meaning from them.
OCR has a more storied history than most computer vision tasks, and it is often viewed as one of the more straightforward computer vision techniques, perhaps one that has already been perfected. The reality is more complicated. The real-world applications of OCR extend far beyond the familiar digitization of dusty documents, and the advent of machine learning means OCR has a long and lucrative road ahead of it. In this article, we go over the history of OCR and how it developed with the help of ML, and explore some of its most recent use cases.
A brief history of OCR
OCR has a longer story to tell than its computer vision compatriots. It dates back to 1913 and the invention of the optophone. This device converted printed text into sounds for interpretation by the blind. Sadly, the optophone’s slow reading speed and exhausting manual operating mechanism prevented it from going to market.
It wasn’t until the 1970s that OCR staked its enduring claim, in the form of software that worked alongside scanners to convert physical documents into a digital format. It was originally intended for text-to-speech processing for the blind and visually impaired. The technology found much wider adoption, however, thanks to the invention of omni-font OCR, which could recognize text set in virtually any typeface and so allowed almost any printed document to be scanned and converted into digital text.
Until recently, the standard OCR technique remained largely unchanged:
- Create a digital image of the text
- Filter the image to enhance the contrast between light and dark areas
- Identify text through contour detection
- Compare each character or word against a library of examples to find the closest match — essentially a form of image classification
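The four classic steps above can be sketched in plain Python. This is a minimal, illustrative sketch under stated assumptions, not how any particular OCR engine is implemented: the 3x5 bitmap "font" in `TEMPLATES`, the `render` helper that draws synthetic test images, and the threshold of 128 are all hypothetical, and a simple 8-connected component search stands in for true contour detection.

```python
from collections import deque

# Hypothetical 3x5 bitmap "font" acting as the library of examples (1 = ink).
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "O": ["111", "101", "101", "101", "111"],
    "T": ["111", "010", "010", "010", "010"],
}

def render(text, ink=30, paper=220):
    """Step 1 (simulated): draw text as a grayscale image (0 = black, 255 = white)."""
    width = 1 + 4 * len(text)  # 1-pixel left margin; each 3-wide glyph is followed by a 1-pixel gap
    image = [[paper] * width for _ in range(7)]  # 5 glyph rows plus top and bottom margins
    for i, ch in enumerate(text):
        for r, row in enumerate(TEMPLATES[ch]):
            for c, bit in enumerate(row):
                if bit == "1":
                    image[r + 1][1 + 4 * i + c] = ink
    return image

def binarize(image, threshold=128):
    """Step 2: map every pixel to 1 (ink) or 0 (paper), maximising contrast."""
    return [[1 if px < threshold else 0 for px in row] for row in image]

def find_glyphs(binary):
    """Step 3 (simplified): bounding boxes of 8-connected ink regions, left to right."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:  # breadth-first flood fill over one connected region
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return sorted(boxes, key=lambda box: box[1])

def classify(binary, box):
    """Step 4: compare the cropped glyph with every template and return the closest match."""
    top, left, bottom, right = box
    crop = [row[left:right + 1] for row in binary[top:bottom + 1]]
    gh, gw = len(crop), len(crop[0])

    def score(template):
        th, tw = len(template), len(template[0])
        # Nearest-neighbour resize of the crop to the template size, then count agreeing pixels.
        agree = sum(crop[r * gh // th][c * gw // tw] == int(template[r][c])
                    for r in range(th) for c in range(tw))
        return agree / (th * tw)

    return max(TEMPLATES, key=lambda ch: score(TEMPLATES[ch]))

def recognize(image):
    binary = binarize(image)
    return "".join(classify(binary, box) for box in find_glyphs(binary))
```

For example, `recognize(render("TOIL"))` should return `"TOIL"`. Because the templates are the system's only knowledge, a glyph in an unfamiliar font, or a dark smudge that survives thresholding, is simply forced onto whichever template it most resembles.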
Alternatively, feature detection can be used, whereby a character is identified by its consistent features: a lowercase `b`, for example, is a vertical line with a semicircle attached to the right side of its base.
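Feature detection can be sketched in the same spirit. In this hypothetical example, each glyph is a small bitmap drawn in a shared line box (ascender to baseline), and two hand-picked features, the side of any full-height vertical stem and the number of enclosed holes, are mapped to letters by a small rule table. The bitmaps, the features, and the rules are illustrative assumptions only.

```python
# Hypothetical 4x6 glyphs drawn in a shared line box (1 = ink).
B = ["1000", "1000", "1111", "1001", "1001", "1111"]
D = ["0001", "0001", "1111", "1001", "1001", "1111"]
O = ["0000", "0000", "1111", "1001", "1001", "1111"]
L = ["0100"] * 6

def stem_side(glyph):
    """Return 'left' or 'right' if some column is inked over the full line height, else None."""
    w = len(glyph[0])
    full = [c for c in range(w) if all(row[c] == "1" for row in glyph)]
    if not full:
        return None
    return "left" if full[0] < w / 2 else "right"

def count_holes(glyph):
    """Count 4-connected background regions that never touch the glyph's border."""
    h, w = len(glyph), len(glyph[0])
    seen, holes = set(), 0
    for y in range(h):
        for x in range(w):
            if glyph[y][x] == "0" and (y, x) not in seen:
                seen.add((y, x))
                stack, touches_edge = [(y, x)], False
                while stack:  # depth-first fill of one background region
                    cy, cx = stack.pop()
                    if cy in (0, h - 1) or cx in (0, w - 1):
                        touches_edge = True
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and glyph[ny][nx] == "0" and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                if not touches_edge:
                    holes += 1
    return holes

# Hand-written rules: (stem side, number of holes) -> letter.
RULES = {("left", 1): "b", ("right", 1): "d", (None, 1): "o", ("left", 0): "l"}

def classify_by_features(glyph):
    return RULES.get((stem_side(glyph), count_holes(glyph)), "?")
```

Every new letter needs new features and rules designed by hand (a `p`, say, would need a notion of a stem descending below the baseline), which is one reason the approach scales poorly.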
This rule-based approach is effective at the task it was designed for, but it is also restrictive: it can only recognize characters from the fonts and character sets it has examples to match against. The process is largely confined to the conversion of structured text, e.g., text formatted in paragraphs or columns.
The accuracy of the technique also drops off rapidly once artefacts enter the mix, whether from second-generation documents (e.g., scans of scans) or coffee mug stains.
Furthermore, the rule-based approach has kept OCR from being applied to a huge range of real-world problems where text does not come as neatly arranged documents. In the real world, text on the side of a bus may be oriented strangely and obscured by graffiti and mud, characters may overlap in low-resolution images of street signs, and there are innumerable other possibilities.
The future of OCR
Machine learning has opened a new chapter in OCR, and it’s a page turner. OCR has progressed further in the past few years than in the previous hundred.
The key change is that OCR is no longer limited to scans of documents. Now it can be applied to any image of any text, which is why at super.ai we call our OCR data program image text transcription.
So long as you have enough accurate training data, an ML-based OCR model can be applied to any imaginable real-world scenario that requires identifying and converting text. The enormous breadth of possible specialised use cases means OCR has a bright future ahead of it, as illustrated by the recent use of machine learning to begin transcribing Kuzushiji, an ancient Japanese cursive script.
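To contrast learning from data with hand-written rules, here is a deliberately tiny sketch in which the character prototypes are learned from labelled examples rather than specified by hand. It is a stand-in only, assuming a nearest-centroid classifier trained on synthetic, noisy 3x5 glyphs; modern ML-based OCR typically uses deep neural networks trained on far larger, real datasets. The glyph set, noise level, and sample counts are hypothetical.

```python
import random

# Hypothetical clean glyphs, flattened row by row (1 = ink); used only to synthesise training data.
GLYPHS = {
    "I": "111010010010111",
    "L": "100100100100111",
    "O": "111101101101111",
    "T": "111010010010010",
}

def noisy_sample(label, flip_prob, rng):
    """Simulate a degraded scan: each pixel is flipped with probability flip_prob."""
    return [1 - int(b) if rng.random() < flip_prob else int(b) for b in GLYPHS[label]]

def make_dataset(n_per_class, flip_prob=0.1, seed=0):
    """Build a labelled dataset of (pixels, label) pairs."""
    rng = random.Random(seed)
    return [(noisy_sample(label, flip_prob, rng), label)
            for label in GLYPHS for _ in range(n_per_class)]

def train(dataset):
    """Learn one prototype per class: the mean (centroid) of its training images."""
    sums, counts = {}, {}
    for pixels, label in dataset:
        acc = sums.setdefault(label, [0.0] * len(pixels))
        for i, p in enumerate(pixels):
            acc[i] += p
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, pixels):
    """Pick the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda label: sum((p - c) ** 2
                                                for p, c in zip(pixels, centroids[label])))

def accuracy(centroids, dataset):
    return sum(predict(centroids, x) == y for x, y in dataset) / len(dataset)
```

Usage: `model = train(make_dataset(200, seed=1))` learns the prototypes, and `accuracy(model, make_dataset(50, seed=2))` measures them on fresh noisy samples. Supporting a new character or a new kind of degradation means supplying more labelled examples rather than writing new rules, which is the shift this section describes.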
The power of ML usually resides in its ability to provide generalized solutions, models that work across a wide variety of instances so long as they are trained on enough accurate, relevant data. OCR is unusual in this sense: the text to be identified comes in such a breadth of forms, arrangements, and types that a more specialised approach often provides better results. More exciting problems mean more research, more experimentation, and more incentive to invest in creative solutions.
What’s more, with deep learning, producing a digitized version of text is only the first step: extracting meaning from it is where the real fun begins, with a wealth of insight to be gained from identifying hitherto hidden patterns.
OCR use cases
Anyone dealing with large quantities of text in images, in whatever form, stands to benefit from OCR. Frequently, these are companies established before digitization was the norm that are looking to digitize vast physical archives. A prominent example of OCR on a massive scale is Google’s ill-fated endeavour to digitize every book on Earth. But the more complex and unusual your text, the more exciting your opportunities.
Here are some examples from industries that are already making widespread use of OCR:
Banking and finance
- Automated cheque deposits
- Annual report collation, translation, and information extraction
Transport and logistics
- Licence plate recognition
- Mail and shipment sorting
- Automated street sign reading
Healthcare
- Searchable, exchangeable hospital records
- Automated remittance filing
- Automated processing of notes into HIPAA-compliant formats
These markets are already being reshaped by OCR, with no signs of slowing. Academics in every field also stand to benefit from widespread OCR and the wealth of data it produces. As always, more data brings more potential for identifying previously undiscovered patterns and generating fresh insights across business and academia.
Do you have text you’re interested in transcribing? Try out our image text transcription product for free. If your use case is specific or niche, reach out to us and together we can create an innovative solution.