Going from raw data to labeled data is not a magic black box. In fact, we’ve described our system in detail on the AI compiler page. Within super.AI’s service, we often need real people to label data by hand to ensure accuracy. And that means they need to understand how to label your data, which makes instructional clarity essential.
A set of well-written instructions is the largest factor under your control in determining the success of the data labeling process.
We’ve put together 10 rules that help create instructions that any human labeler anywhere in the world can understand.
Great instructions are often the product of many iterations of tweaks and adjustments. Getting them perfect the first time is hard: for example, there are often edge cases that are hard to anticipate without actually labeling a lot of data first.
Fortunately, your first set of instructions doesn’t need to be perfect. With super.AI, you can create a draft and quickly improve it based on feedback from the labelers and analysis of where they make mistakes.
The best strategy is to get started quickly and refine. It’s worth being patient when trying to find an instruction set that provides the results you’re after.
Example: If you see images of wolves coming back labeled as dogs, expand your instructions to specify that you only want domestic dogs labeled as dogs.
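One way to find where labelers make mistakes is to tally which labels get confused with which. The sketch below assumes you have a small sample of outputs that you have checked by hand; the record format and the `top_confusions` helper are hypothetical and are not part of any super.AI API.

```python
from collections import Counter

# Hypothetical review sample: (correct label, label the labeler assigned).
# These records are made up for illustration only.
reviewed = [
    ("wolf", "dog"),
    ("dog", "dog"),
    ("wolf", "dog"),
    ("cat", "cat"),
    ("fox", "dog"),
    ("wolf", "wolf"),
]


def top_confusions(pairs, n=3):
    """Return the n most frequent (correct, assigned) mismatches."""
    mistakes = Counter(
        (truth, given) for truth, given in pairs if truth != given
    )
    return mistakes.most_common(n)


for (truth, given), count in top_confusions(reviewed):
    print(f"{truth} labeled as {given}: {count}")
```

The most frequent mismatch is usually the best candidate for the next clarification in your instructions.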
While this seems obvious, bearing it in mind at all times will help focus your writing. It’s like speaking to an interviewer rather than rambling to yourself in the shower.
Do not assume that a labeler knows the difference between a Gibson ES-335 and an ES-355TD (if you happen to be working for Gibson). Be detailed, be specific, and provide everything necessary for someone below your level of expertise to get the job done.
Example: The ES-355TD model features a split-diamond inlay in the headstock and mother-of-pearl fingerboard inlays.
While your instructions need to be detailed and specific, they also need to be understood in under 15 minutes. If a labeler can’t do that, there’s too much information there for them to bear in mind throughout the task.
Provide an introductory overview, then zoom in on the fine details. This will help the labeler place the information you provide in the right context and make the task more meaningful.
Use bullet points, numbered lists, bolding, and other formatting to structure your content in the most digestible fashion.
It's also possible to use HTML formatting in your instructions.
Example: This list that you’re reading right now.
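If you prefer to generate HTML-formatted instructions rather than write the markup by hand, a short script can assemble it. This is a minimal sketch: the `render_instructions` helper and the overview text are hypothetical, and it assumes that basic tags such as `<p>`, `<strong>`, `<ol>`, and `<li>` are accepted, which you should confirm for your project.

```python
from html import escape


def render_instructions(overview, steps):
    """Build an HTML fragment: an overview paragraph plus a numbered list."""
    # escape() keeps characters like < and & in the text from breaking the markup.
    items = "\n".join(f"  <li>{escape(step)}</li>" for step in steps)
    return (
        f"<p><strong>Overview:</strong> {escape(overview)}</p>\n"
        f"<ol>\n{items}\n</ol>"
    )


html_text = render_instructions(
    "Label road scenes for a driving dataset.",  # hypothetical overview
    [
        "Include any bicycles attached to a car within the car bounding box.",
        "If a speed limit sign shows two values, enter the lower value.",
    ],
)
print(html_text)
```

Keeping the overview first and the detailed rules in a numbered list mirrors the structure recommended above.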
Whenever there’s room for ambiguity, it’s important to be specific. Remember, though, that you don’t need to include every nuance in your initial instruction set; you can add clarity as you go based on where labelers get confused. Any definitions you create do not need to be scientific; they just need to be consistent.
Example: The chair label should not be applied to a sofa.
Example: A city is any settlement with over 100,000 inhabitants.
This is one area where iteration is essential, as there will be things you cannot predict.
Example: Include any bicycles attached to a car within the car bounding box.
Example: If a speed limit sign shows two values, enter the lower value.
Illustrations of what good and bad labels look like are perhaps the most powerful tool for improving labeling. In an instant, a labeler can grasp something that might be difficult to describe or understand.
If your task requires more than just the super.AI labeling UI to complete, you need to provide access to additional tools in your instructions.
Example: Use a URL shortener (https://bitly.com/) for any links.