Named Entity Recognition (NER)

Learn how to use the NER project type on the super.AI platform.

Named entity recognition (NER) is the task of identifying and categorizing named entities in text. Every detected entity is classified into a predetermined category. For example, an NER machine learning (ML) model might detect the word “super.AI” in a text and classify it as a “Company”.

The super.AI named entity recognition (NER) project type features granular design options for defining your entities, explaining how they are labeled, and deciding what is excluded from labeling. On this page, you will find details on each of the following:

  • Entity classes
    • Entity pre-labeling
  • Custom entity recognition
  • Parts of speech to exclude
  • Task granularity

Entity classes

When you define an entity, you have to provide instructions to our human labelers on how to identify it within your text. There are four parts to creating entity classes:

  1. Entity name
  2. Explanation and examples
    • Provide clear, simple information that will help our human labelers identify the entity in your text. For example, “A dog is any domesticated canine descended from the gray wolf, e.g., labrador, chihuahua, or rottweiler.”
  3. Parent entity
    • If you select Yes on Has a parent entity?, you can then define the parent entity name. For example, a DOG entity might have an ANIMAL parent entity.
  4. Model class your entity class maps to
    • If you want to use a class supported by our entity pre-labeling but under a custom class name, you can map your custom name to the entity name supported by the pre-labeling model. For example, if you want to use the PERSON entity but have Human as the class name, enter Human as the Entity name and select PERSON from the Model class your entity class maps to dropdown, and we'll map it for you.
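The four parts above can be pictured as one record per entity class. The following is a minimal sketch only: the `EntityClass` name and its field names are hypothetical and are not part of the super.AI platform or its API. The explanation text for `Human` is also made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntityClass:
    """Hypothetical container for one entity class definition."""
    name: str                          # 1. Entity name
    explanation: str                   # 2. Explanation and examples for labelers
    parent: Optional[str] = None       # 3. Parent entity, if there is one
    model_class: Optional[str] = None  # 4. Pre-labeling model class, if mapped


# A custom class with a parent entity and no pre-labeling.
dog = EntityClass(
    name="DOG",
    explanation=(
        "A dog is any domesticated canine descended from the gray wolf, "
        "e.g., labrador, chihuahua, or rottweiler."
    ),
    parent="ANIMAL",
)

# A custom class name mapped to the PERSON class of the pre-labeling model.
human = EntityClass(
    name="Human",
    explanation="Any person, including fictional characters.",
    model_class="PERSON",
)
```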

Entity pre-labeling

The super.AI NER project type supports pre-labeling of entities. The entities in the table below are recognized by our pre-labeling model. You can set any of these as the model class that your entity maps to when designing your project and they will automatically be labeled before your text is sent to our human labelers. The names and descriptions come from OntoNotes 5.0.

Entity        Description
PERSON        People, including fictional characters
NORP          Nationalities or religious or political groups
FAC           Buildings, airports, highways, bridges, etc.
ORG           Companies, agencies, institutions, etc.
GPE           Countries, cities, states, etc.
LOC           Non-GPE locations, e.g., mountain ranges and bodies of water
PRODUCT       Objects, vehicles, foods, etc. (not services)
EVENT         Named battles, wars, sports events, hurricanes, etc.
WORK_OF_ART   Titles of books, songs, etc.
LAW           Named documents made into laws
LANGUAGE      Any named language
DATE          Absolute (e.g., July 4, 2010) or relative (e.g., two weeks ago) dates or periods
TIME          Times shorter than a day
PERCENT       Percentage, including %
MONEY         Monetary values, including unit
QUANTITY      Measurements, as of weight or distance
ORDINAL       Numbers of order, e.g., first, second, etc.
CARDINAL      Numerals that do not fall under another type
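To illustrate how a class mapping could interact with pre-labels, here is a rough sketch. It assumes the pre-labeling model returns `(start, end, label)` character spans and that labels without a mapped entity class are simply left out. Both points are assumptions made for this example, as are the function and variable names; the platform's actual behavior and data format may differ.

```python
# The model classes listed in the table above (OntoNotes 5.0 names).
SUPPORTED_MODEL_CLASSES = {
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY",
    "QUANTITY", "ORDINAL", "CARDINAL",
}


def apply_class_mapping(prelabels, class_to_model):
    """Rename model labels to project class names and drop unmapped labels.

    prelabels: list of (start, end, model_label) spans (hypothetical format).
    class_to_model: dict of project class name -> model class.
    """
    model_to_class = {}
    for class_name, model_class in class_to_model.items():
        if model_class not in SUPPORTED_MODEL_CLASSES:
            raise ValueError(f"Unsupported model class: {model_class}")
        model_to_class[model_class] = class_name
    return [
        (start, end, model_to_class[label])
        for start, end, label in prelabels
        if label in model_to_class
    ]


text = "George Orwell wrote 1984 in London."
# Hypothetical model output for the text above.
prelabels = [(0, 13, "PERSON"), (28, 34, "GPE")]
mapped = apply_class_mapping(prelabels, {"Human": "PERSON"})
```

In this sketch, only the PERSON span survives, relabeled as Human, because no entity class was mapped to GPE.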

Custom entity recognition

Define a search pattern (a text string or regex pattern) that you want consistently labeled with a specified entity name throughout any text you submit to us for labeling. You must also state whether the search pattern is an exact string or a regex pattern.

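For example, if your text frequently features the name George Orwell, you can set that as a search pattern along with the entity name PERSON, as in this screenshot:

[Screenshot: a custom entity recognition rule with the search pattern George Orwell and the entity name PERSON]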

If you don’t have any custom rules to apply, click the ‘x’ in the top right to remove the field.
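The difference between an exact string and a regex pattern can be sketched with Python's standard `re` module. This is an illustration only: the `apply_custom_rule` function and its parameters are hypothetical, and the platform's own matching (for example, its regex dialect or case sensitivity) may behave differently.

```python
import re


def apply_custom_rule(text, pattern, entity_name, is_regex):
    """Return (start, end, entity_name) for every match of the pattern."""
    # Exact strings are escaped so characters such as "." match literally.
    compiled = re.compile(pattern if is_regex else re.escape(pattern))
    return [
        (m.start(), m.end(), entity_name)
        for m in compiled.finditer(text)
        if m.end() > m.start()  # skip zero-width matches
    ]


text = "George Orwell wrote Animal Farm. Orwell was born Eric Blair."

# Exact string: only the full name is labeled.
exact = apply_custom_rule(text, "George Orwell", "PERSON", is_regex=False)

# Regex: the surname on its own is labeled too.
regex = apply_custom_rule(text, r"(?:George )?Orwell", "PERSON", is_regex=True)
```

Here the exact string finds one span, while the regex pattern also picks up the second, surname-only mention.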

Parts of speech to exclude

You can exclude any of the part-of-speech (POS) tags in the table below from labeling. You can find detailed explanations of each on the Universal Dependencies website.

Entities that contain masked parts of speech can still be labeled when at least one of the words that comprise the entity is not a masked part of speech. For example, if you enable POS tag masking for ADJ, you can still label Del the Funky Homosapien as PERSON, even though Funky is an adjective.

Part of speech             Examples
Adjective                  big, old, green, African, incomprehensible, first
Adposition                 in, to, during
Adverb                     very, well, exactly, tomorrow, up, down, how, now
Auxiliary                  has (done), will (do), was (done), should (do), is (a teacher)
Coordinating conjunction   and, or, but
Determiner                 a, an, the, this
Interjection               psst, ouch, bravo, hello
Noun                       girl, cat, tree, air, beauty
Numeral                    1, 2020, one, seventy-seven, II, MMXIV
Particle                   ’s, not, let’s, may you
Pronoun                    I, you, he, myself, themselves, who, nobody
Proper noun                Mary, John, London, NATO, HBO
Punctuation                ., (, ), ?
Subordinating conjunction  that, if, while
Symbol                     $, %, §, ©, +, −, ×, ÷, =, <, >, :), ♥‿♥, 😝
Verb                       run, eat, runs, ate, running, eating
Other                      xfgh pdl jklw
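The masking rule described above (an entity can still be labeled if at least one of its words is not a masked part of speech) can be expressed as a short check. This is a sketch under assumptions: the `can_label` function is hypothetical, the tags are Universal Dependencies abbreviations, and the tag assigned to each word below is chosen by hand for illustration rather than produced by a real tagger.

```python
def can_label(entity_tokens, masked_tags):
    """Return True if at least one token's POS tag is not masked.

    entity_tokens: list of (word, pos_tag) pairs.
    masked_tags: set of POS tags excluded from labeling.
    """
    return any(tag not in masked_tags for _, tag in entity_tokens)


# Hand-assigned tags for the example entity from the text above.
entity = [
    ("Del", "PROPN"),
    ("the", "DET"),
    ("Funky", "ADJ"),
    ("Homosapien", "PROPN"),
]

labelable = can_label(entity, {"ADJ"})                  # other tags remain
only_adjective = can_label([("Funky", "ADJ")], {"ADJ"})  # every tag is masked
```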

Task granularity

These two settings let you define how labelers receive your input data and how they label it.

Entity classes per task

Define whether labelers label all entities in the text as a single task or whether each task covers only one top-level entity class. If your text input is dense with entities, it's better to select Separate tasks for each top-level entity.
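One way to picture this setting is as a choice between one task covering every class and one task per top-level entity. The sketch below assumes that child classes are grouped under their top-level parent and that a task is a plain dictionary. Both are assumptions made for illustration, and the function names are hypothetical rather than taken from the platform.

```python
def top_level(name, parents):
    """Follow parent links until reaching a class that has no parent."""
    seen = set()
    while parents.get(name) is not None and name not in seen:
        seen.add(name)  # guard against accidental parent cycles
        name = parents[name]
    return name


def build_tasks(text, parents, separate):
    """Build labeling tasks for one text.

    parents: dict of entity class name -> parent name (None if top-level).
    separate: True for one task per top-level entity, False for one task.
    """
    if not separate:
        return [{"text": text, "classes": sorted(parents)}]
    groups = {}
    for name in parents:
        groups.setdefault(top_level(name, parents), []).append(name)
    return [
        {"text": text, "classes": sorted(members)}
        for _, members in sorted(groups.items())
    ]


parents = {"ANIMAL": None, "DOG": "ANIMAL", "PERSON": None}
single = build_tasks("Mary walked her labrador.", parents, separate=False)
split = build_tasks("Mary walked her labrador.", parents, separate=True)
```

With these inputs, `single` holds one task listing all three classes, while `split` holds two tasks: one for ANIMAL together with DOG, and one for PERSON.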

Utterances per task

This lets you set the length of the dialogue that labelers will label in each task by limiting it to a certain number of utterances. We will break the input text up into chunks of this size to send to labelers.
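The chunking described above can be sketched as follows. The example assumes the input has already been split into a list of utterances and that a final, shorter chunk is kept as its own task. How the platform actually segments text into utterances is not covered here, and the function name is hypothetical.

```python
def chunk_utterances(utterances, per_task):
    """Split a list of utterances into consecutive chunks of a fixed size."""
    if per_task < 1:
        raise ValueError("per_task must be at least 1")
    return [
        utterances[i:i + per_task]
        for i in range(0, len(utterances), per_task)
    ]


dialogue = [
    "Hi, I'd like to book a flight.",
    "Sure, where are you flying to?",
    "London, on July 4.",
    "One way or return?",
    "Return, please.",
]
chunks = chunk_utterances(dialogue, per_task=2)
```

With five utterances and two utterances per task, this yields three chunks, the last containing a single utterance.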