What Does Named Entity Recognition Do?


Named entity recognition (NER)—sometimes referred to as entity chunking, extraction, or identification—is the task of identifying and categorizing key information (entities) in text. An entity can be any word or series of words that consistently refers to the same thing. Every detected entity is classified into a predetermined category. For example, an NER machine learning (ML) model might detect the word “super.AI” in a text and classify it as a “Company”.

NER is a form of natural language processing (NLP), a subfield of artificial intelligence. NLP is concerned with computers processing and analyzing natural language, i.e., any language that has developed naturally rather than being designed artificially, as programming languages are.

This post explores the basics of how NER works, along with some high-level use cases and how you can apply it in your business or project.

How NER works

At the heart of any NER model is a two-step process:

  1. Detect a named entity
  2. Categorize the entity

Each step involves a little more detail.

Step one involves detecting a word or string of words that forms an entity. Each word represents a token: “The Great Lakes” is a string of three tokens that represents one entity. Inside-outside-beginning (IOB) tagging is a common way of indicating where entities begin and end. We’ll explore this further in a future blog post.
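
As a rough illustration of inside-outside-beginning tagging, here is a minimal Python sketch. The function name `iob_tags`, the `LOCATION` label, and the example sentence are assumptions made for illustration; they are not taken from any particular library. The first token of an entity gets a `B-` tag, the remaining tokens of that entity get `I-` tags, and everything else is tagged `O`.

```python
from typing import List, Tuple


def iob_tags(tokens: List[str], entities: List[Tuple[int, int, str]]) -> List[str]:
    """Return one IOB tag per token.

    Each entity is (start, end, label), using token indices with an
    exclusive end. This layout is an assumption for the sketch.
    """
    tags = ["O"] * len(tokens)          # every token starts outside any entity
    for start, end, label in entities:
        tags[start] = f"B-{label}"      # first token: beginning of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # remaining tokens: inside the entity
    return tags


tokens = ["The", "Great", "Lakes", "border", "Canada", "."]
entities = [(0, 3, "LOCATION"), (4, 5, "LOCATION")]
tags = iob_tags(tokens, entities)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Here “The Great Lakes” is marked as one three-token entity, while “Canada” is a separate single-token entity.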

The second step requires the creation of entity categories. Here are some common entity categories:

  • Person
      • E.g., Elvis Presley, Audrey Hepburn, David Beckham
  • Organization
      • E.g., Google, Mastercard, University of Oxford
  • Time
      • E.g., 2006, 16:34, 2am
  • Location
      • E.g., Trafalgar Square, MoMA, Machu Picchu
  • Work of art
      • E.g., Hamlet, Guernica, Exile on Main St.

These are just a few examples. You can create your own entity categories to suit your task. You can also provide granular rules that decide which category an entity belongs to when it is ambiguous or when your task uses its own ontology.
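
One simple way to picture such rules is a lookup table plus a priority order. The sketch below is purely illustrative: the `GAZETTEER` contents, the `PRIORITY` order, and the `categorize` function are hypothetical, and a trained model would learn these decisions from data rather than from a hand-written table.

```python
from typing import Dict, List, Optional

# Hypothetical lookup table: surface form -> categories it could belong to.
GAZETTEER: Dict[str, List[str]] = {
    "Hamlet": ["Work of art", "Person"],
    "MoMA": ["Location", "Organization"],
    "Google": ["Organization"],
}

# Hypothetical task-specific rule: when a term is ambiguous,
# prefer the category that appears earliest in this list.
PRIORITY = ["Work of art", "Location", "Organization", "Person", "Time"]


def categorize(term: str) -> Optional[str]:
    """Return the preferred category for a term, or None if it is unknown."""
    candidates = GAZETTEER.get(term)
    if not candidates:
        return None
    # Pick the candidate with the lowest index in PRIORITY.
    return min(candidates, key=PRIORITY.index)


for term in ["Hamlet", "MoMA", "Google", "Paris"]:
    print(term, "->", categorize(term))
```

A different task might reverse the priority, for example treating “MoMA” as an organization when the goal is to track institutions rather than places.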

[Image: super.AI’s interface allows you to decide your entities]

To learn what is and is not a relevant entity and how to categorize them, a model requires training data. The more relevant that training data is to the task, the more accurate the model will be at completing said task. Train your model on Victorian gothic literature, and it will probably struggle to navigate Twitter.

Once you have defined your entities and your categories, you can use these to label data and create a training dataset (our named entity recognition data program can do this for you automatically). You then use this training dataset to train an algorithm to label your text predictively.
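
To make the idea of a labeled training dataset more concrete, here is a minimal sketch of what one labeled example might look like and how it could be sanity-checked before training. The `Span` and `Example` classes, the `span_for` helper, and the character-offset format are assumptions chosen for illustration; real tools and data programs each define their own formats.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Span:
    start: int   # character offset where the entity begins (inclusive)
    end: int     # character offset where the entity ends (exclusive)
    label: str   # entity category


@dataclass
class Example:
    text: str
    spans: List[Span]


def span_for(text: str, phrase: str, label: str) -> Span:
    """Build a span by locating the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Span(start, start + len(phrase), label)


def validate(example: Example, categories: Set[str]) -> List[str]:
    """Return a list of problems found in one labeled example."""
    problems = []
    previous_end = 0
    for span in sorted(example.spans, key=lambda s: s.start):
        if span.label not in categories:
            problems.append(f"unknown label: {span.label}")
        if not 0 <= span.start < span.end <= len(example.text):
            problems.append(f"span out of range: {span.start}-{span.end}")
        elif span.start < previous_end:
            problems.append(f"overlapping span: {span.start}-{span.end}")
        previous_end = max(previous_end, span.end)
    return problems


CATEGORIES = {"Person", "Organization", "Time", "Location", "Work of art"}

text = "Elvis Presley visited Trafalgar Square in 2006."
example = Example(text, [
    span_for(text, "Elvis Presley", "Person"),
    span_for(text, "Trafalgar Square", "Location"),
    span_for(text, "2006", "Time"),
])

print(validate(example, CATEGORIES))  # an empty list means no problems found
```

Checks like these catch labeling mistakes early, before they affect the accuracy of the trained model.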

NER use cases

NER is suited to any situation in which a high-level overview of a large quantity of text is helpful. With NER, you can, at a glance, understand the subject or theme of a body of text and quickly group texts based on their relevancy or similarity.
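
As a small sketch of grouping texts by similarity, the example below compares documents by the entities they share. The `doc_entities` data stands in for the output of an NER model and is entirely hypothetical, and Jaccard similarity is just one reasonable choice of measure, not a method prescribed here.

```python
from itertools import combinations
from typing import Set, Tuple

Entity = Tuple[str, str]  # (entity text, category)

# Hypothetical NER output: document id -> set of detected entities.
doc_entities = {
    "a": {("Google", "Organization"), ("University of Oxford", "Organization")},
    "b": {("Google", "Organization"), ("Mastercard", "Organization")},
    "c": {("Machu Picchu", "Location")},
}


def jaccard(first: Set[Entity], second: Set[Entity]) -> float:
    """Shared entities divided by all distinct entities in the two sets."""
    if not first and not second:
        return 0.0
    return len(first & second) / len(first | second)


# Score every pair of documents by entity overlap.
scores = {
    (x, y): jaccard(doc_entities[x], doc_entities[y])
    for x, y in combinations(sorted(doc_entities), 2)
}

for pair, score in scores.items():
    print(pair, round(score, 2))
```

Documents “a” and “b” share one of three distinct entities, so they score higher than either does against “c”, which shares none.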

Some notable use cases include:

  • Human resources
      • Speed up the hiring process by summarizing applicants’ CVs; improve internal workflows by categorizing employee complaints and questions
  • Customer support
      • Improve response times by categorizing user requests, complaints, and questions and filtering by priority keywords
  • Search and recommendation engines
      • Improve the speed and relevance of search results and recommendations by summarizing descriptive text, reviews, and discussions
      • Booking.com is a notable success story here
  • Content classification
      • Surface content more easily and gain insights into trends by identifying the subjects and themes of blog posts and news articles
  • Health care
      • Improve patient care standards and reduce workloads by extracting essential information from lab reports
      • Roche is doing this with pathology and radiology reports
  • Academia
      • Enable students and researchers to find relevant material faster by summarizing papers and archive material and highlighting key terms, topics, and themes
      • The EU’s digital platform for cultural heritage, Europeana, is using NER to make historical newspapers searchable

[Image: Wherever there are large quantities of text, NER can make life easier]

How can I use NER?

If you think that your business or project could benefit from NER, it’s fairly easy to get started. There are a number of excellent open-source libraries that can get you going, including NLTK, spaCy, and Stanford NER. Each has its own pros and cons, which we’ll be exploring in more detail soon.

But before you begin using one of these libraries to build a model, you will need to produce a relevant labeled dataset to train the model on. That’s where Canotic can help. With our named entity recognition data program, you provide us with your raw text and your desired entities and categories. We’ll label the text you send and return a high-quality training dataset that you can use to train and tailor your NER model.

If you’re interested in learning more or have a specialized use case, reach out to us. You can also stay tuned to our blog, where we’ll be running a series of posts covering different aspects of NLP over the coming months.

Christopher Marshall
Data science technical writer