What Does Named Entity Recognition Do?

Christopher Marshall

Data science technical writer

SUMMARY

How NER works
NER use cases
How can I use NER?

Named entity recognition (NER)—sometimes referred to as entity chunking, extraction, or identification—is the task of identifying and categorizing key information (entities) in text. An entity can be any word or series of words that consistently refers to the same thing. Every detected entity is classified into a predetermined category. For example, an NER machine learning (ML) model might detect the word “super.AI” in a text and classify it as a “Company”.

NER is a form of natural language processing (NLP), a subfield of artificial intelligence. NLP is concerned with how computers process and analyze natural language, i.e., any language that has developed naturally, as opposed to artificial languages such as programming languages.

This post explores the basics of how NER works, along with some high-level use cases and how you can apply it in your business or project.

How NER works

At the heart of any NER model is a two-step process:

Detect a named entity
Categorize the entity

Each of these steps needs some unpacking.

Step one involves detecting a word or string of words that form an entity. Each word represents a token: “The Great Lakes” is a string of three tokens that represents one entity. Inside-outside-beginning tagging is a common way of indicating where entities begin and end. We’ll explore this further in a future blog post.
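To make the tagging idea concrete, here is a minimal sketch of inside-outside-beginning (IOB) tagging in plain Python. The function name `iob_tags`, the short labels `LOC` and `TIME`, and the example sentence are assumptions made for illustration; a real model predicts these tags rather than being handed the spans.

```python
def iob_tags(tokens, entity_spans):
    """Return one IOB tag per token.

    entity_spans is a list of (start, end, label) token indices,
    with end exclusive. Tokens outside every span are tagged "O".
    """
    tags = ["O"] * len(tokens)
    for start, end, label in entity_spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


tokens = ["We", "sailed", "across", "The", "Great", "Lakes", "in", "2006"]
spans = [(3, 6, "LOC"), (7, 8, "TIME")]     # hypothetical gold annotations

for token, tag in zip(tokens, iob_tags(tokens, spans)):
    print(f"{token}\t{tag}")
```

Here “The Great Lakes” receives one `B-` tag followed by two `I-` tags, which is how three tokens are marked as a single entity.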

The second step requires the creation of entity categories. Here are some common entity categories:

Person
E.g., Elvis Presley, Audrey Hepburn, David Beckham
Organization
E.g., Google, Mastercard, University of Oxford
Time
E.g., 2006, 16:34, 2am
Location
E.g., Trafalgar Square, MoMA, Machu Picchu
Work of art
E.g., Hamlet, Guernica, Exile on Main St.

These are just a few examples. You can create your own entity categories to suit your task, and you can define granular rules that decide which category an entity belongs to when it is ambiguous or when your task uses its own ontology.
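As a rough illustration of category rules, the sketch below resolves an ambiguous entity with a task-specific priority order. The lookup table `GAZETTEER`, the list `CATEGORY_PRIORITY`, and the function `categorize` are all hypothetical names invented for this example. A trained NER model would normally decide from context rather than from a lookup table.

```python
# Hypothetical lookup table: lowercase surface form -> candidate categories.
GAZETTEER = {
    "hamlet": ["Work of art", "Location"],   # the play, or a small village
    "google": ["Organization"],
    "trafalgar square": ["Location"],
}

# Hypothetical task-specific rule: earlier categories win when several apply.
CATEGORY_PRIORITY = ["Work of art", "Organization", "Location"]


def categorize(entity_text, priority=CATEGORY_PRIORITY):
    """Return the preferred category for an entity, or None if it is unknown."""
    candidates = GAZETTEER.get(entity_text.lower(), [])
    if not candidates:
        return None
    # Pick the candidate that appears earliest in the priority list.
    return min(candidates, key=priority.index)


print(categorize("Hamlet"))
print(categorize("Hamlet", ["Location", "Work of art", "Organization"]))
```

Changing the priority list is enough to adapt the same entities to a different task, for example a literary corpus versus a travel corpus.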

To learn what is and is not a relevant entity, and how to categorize each one, a model requires training data. The more relevant the training data is to the task, the more accurately the model will complete that task. Train your model on Victorian gothic literature, and it will probably struggle to navigate Twitter.

Once you have defined your entities and your categories, you can use these to label data and create a training dataset (our named entity recognition data program can do this for you automatically). You then use this training dataset to train an algorithm to label your text predictively.
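A labeled training example can be as simple as a text paired with the character offsets and category of each entity. The sketch below builds such a record from known entity strings. The function `label_text` and the record layout are assumptions for illustration, not the format of any particular library or of the data program mentioned above.

```python
def label_text(text, entities):
    """Pair a text with (start, end, category) character spans.

    entities is a list of (surface_string, category) pairs. Every
    occurrence of each surface string in the text is annotated.
    """
    annotations = []
    for surface, category in entities:
        start = text.find(surface)
        while start != -1:
            end = start + len(surface)
            annotations.append((start, end, category))
            start = text.find(surface, end)   # look for further occurrences
    return text, sorted(annotations)


raw = "Audrey Hepburn visited Google in 2006."
entities = [("Audrey Hepburn", "Person"), ("Google", "Organization"), ("2006", "Time")]

text, spans = label_text(raw, entities)
for start, end, category in spans:
    print(text[start:end], "->", category)
```

Exact string matching like this is only a starting point: it cannot handle ambiguity or unseen entities, which is why human-labeled data is usually needed.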

NER use cases

NER is suited to any situation in which a high-level overview of a large quantity of text is helpful. With NER, you can, at a glance, understand the subject or theme of a body of text and quickly group texts based on their relevancy or similarity.
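One simple way to group texts by similarity, once entities have been extracted, is to compare the sets of entities they mention. The sketch below uses Jaccard similarity for this. The document names and entity sets are made up, and `jaccard` and `most_similar` are hypothetical helper names.

```python
def jaccard(a, b):
    """Share of entities common to both sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(name, docs):
    """Return the name of the document whose entities overlap most with `name`."""
    target = docs[name]
    scored = [
        (jaccard(target, ents), other)
        for other, ents in docs.items()
        if other != name
    ]
    return max(scored)[1]


# Hypothetical output of an NER model: document name -> detected entities.
docs = {
    "doc1": {"Google", "Mastercard"},
    "doc2": {"Google", "University of Oxford"},
    "doc3": {"Machu Picchu"},
}

print(most_similar("doc1", docs))
```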

Some notable use cases include:

Human resources
Speed up the hiring process by summarizing applicants’ CVs; improve internal workflows by categorizing employee complaints and questions
Customer support
Improve response times by categorizing user requests, complaints, and questions and filtering by priority keywords
Search and recommendation engines
Improve the speed and relevance of search results and recommendations by summarizing descriptive text, reviews, and discussions
Booking.com is a notable success story here
Content classification
Surface content more easily and gain insights into trends by identifying the subjects and themes of blog posts and news articles
Health care
Improve patient care standards and reduce workloads by extracting essential information from lab reports
Roche is doing this with pathology and radiology reports
Academia
Enable students and researchers to find relevant material faster by summarizing papers and archive material and highlighting key terms, topics, and themes
The EU’s digital platform for cultural heritage, Europeana, is using NER to make historical newspapers searchable

How can I use NER?

If you think that your business or project could benefit from NER, it’s pretty easy to get started. There are a number of excellent open-source libraries that can get you going, including NLTK, spaCy, and Stanford NER. Each has its own pros and cons, which we’ll be exploring in more detail soon.

But before you begin using one of these libraries to build a model, you will need to produce a relevant labeled dataset to train the model on. That’s where Canotic can help. With our named entity recognition data program, you provide us with your raw text and your desired entities and categories. We’ll label the text you send and return a high-quality training dataset that you can use to train and tailor your NER model.

If you’re interested in learning more or have a specialized use case, reach out to us. You can also stay tuned to our blog, where we’ll be running a series of posts covering different aspects of NLP over the coming months.
