How Image Annotation Teaches Machines to See

26 Oct 2023


Computer vision holds vast potential, enabling everything from detecting cancer cells to authorizing smartphone payments with facial recognition. These capabilities depend heavily on accurate image annotation: carefully labeling visual data so that models can learn from it. Just as a child who is first taught that a banana is an orange will go on to misidentify bananas, a machine learning model trained on inaccurate annotations will learn the wrong lessons.

This post explores why precise image annotation is critical to training robust computer vision models and what that means for the field more broadly.

What Are Computer Vision and Image Annotation?

Computer vision is a field of artificial intelligence that typically uses deep learning to process images and video, letting machines see the world around them and react appropriately. Computer vision models can make sense of visual data because of one crucial data annotation process: image annotation.

Image annotation is the process of identifying individual elements (objects, faces, etc.) in images by attaching labels to them. Data scientists and other AI professionals then use the annotated data to train AI models to accurately identify and track different elements within images and video and to predict the behavior of those elements.

With image annotation, computer vision models can learn to see the world around them and, ideally, react much as a human would (think of a self-driving car making on-the-fly decisions in response to unexpected events around it). In this post, we'll take a closer look at:

types of image annotation,
how image annotation is performed,
use cases.

Types of Image Annotation

There are several different techniques for annotating images for deep learning. They include:

Bounding Boxes – Rectangles are drawn tightly around each object to be identified. This helps models detect and recognize different classes of objects.
2D and 3D Cuboid Annotations – Cuboids extend bounding boxes into three dimensions, capturing an object's length, width, and depth. They can be drawn on 2D images or placed in 3D data such as point clouds, giving a more precise view of an object's size, position, and orientation than a flat rectangle.
Image Classification – Each image as a whole is assigned to one or more predefined categories, rather than having individual objects within it labeled.
Polygons – Annotators trace an object's outline with a series of points, so the label covers only the pixels that belong to the object. This makes polygons more precise than bounding boxes for irregularly shaped objects.
Semantic Annotations – These attach meaning to image content at a finer level, often down to individual pixels:
- Semantic Segmentation – A pixel-wise annotation in which every pixel in the image is assigned to a class (for example, road, car, or sky).
- Pose Estimation – Annotators mark key points, such as a person's joints, so that a model can learn to predict and track the position and orientation of a person or object, often across a series of images.
- Object Detection, Tracking and Identification – Object annotation allows machines to detect objects on a production line and determine whether each one is positioned correctly. This is useful for quality control in food packaging.
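To make the contrast between bounding boxes and polygons concrete, here is a minimal Python sketch. It assumes a polygon annotation is stored as a list of (x, y) vertices in pixel coordinates and a box as (x_min, y_min, x_max, y_max); these conventions, the function names, and the sample outline are illustrative assumptions, not the format of any particular annotation tool. It derives the tightest bounding box around a traced outline and compares the two areas, using the shoelace formula for the polygon.

```python
from typing import List, Tuple

Point = Tuple[float, float]              # (x, y) in pixel coordinates (assumed convention)
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) (assumed convention)


def tight_bounding_box(polygon: List[Point]) -> Box:
    """Return the smallest axis-aligned rectangle containing every vertex."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def box_area(box: Box) -> float:
    """Area of an axis-aligned box."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min) * (y_max - y_min)


def polygon_area(polygon: List[Point]) -> float:
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(polygon):
        x2, y2 = polygon[(i + 1) % len(polygon)]  # wrap around to close the shape
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical triangular outline (e.g. a yield sign) traced by an annotator.
outline = [(10.0, 10.0), (50.0, 10.0), (30.0, 40.0)]
box = tight_bounding_box(outline)
print(box)                    # (10.0, 10.0, 50.0, 40.0)
print(box_area(box))          # 1200.0
print(polygon_area(outline))  # 600.0
```

In this example half of the box's area is background, which is exactly the irrelevant content a polygon annotation leaves out.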

How to Perform Image Annotation

With a crowd of annotators, large volumes of images can be annotated quickly and accurately. The first step is to identify the use case and the annotation technique that will be most effective for it.

Once the technique is chosen, contributors are shown images and asked to label the relevant elements in each one. As with any type of annotation, having several annotators label the same data and then reconciling their answers generally improves the quality of the annotated dataset.
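One common way to reconcile several annotators' answers is majority voting. The sketch below assumes each image has received a list of class labels from different contributors; the function name, the min_agreement threshold, and the sample data are hypothetical, and production pipelines may instead weight annotators by their track record.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def aggregate_labels(
    votes: Dict[str, List[str]], min_agreement: float = 0.5
) -> Dict[str, Tuple[Optional[str], float]]:
    """Majority-vote the labels that several annotators gave each image.

    Returns (winning_label, agreement) per image, where agreement is the share
    of annotators who chose the winner. If agreement is not above
    min_agreement, the label is None so the image can be sent for review.
    """
    results: Dict[str, Tuple[Optional[str], float]] = {}
    for image_id, labels in votes.items():
        if not labels:  # no annotations collected yet
            results[image_id] = (None, 0.0)
            continue
        # Counter.most_common breaks ties by first occurrence; with the default
        # threshold a tie can never exceed 50% agreement, so ties come back as None.
        label, count = Counter(labels).most_common(1)[0]
        agreement = count / len(labels)
        results[image_id] = (label if agreement > min_agreement else None, agreement)
    return results


votes = {
    "img_001": ["banana", "banana", "orange"],
    "img_002": ["car", "truck"],  # a 1-1 split: no majority
}
for image_id, (label, agreement) in aggregate_labels(votes).items():
    print(image_id, label, round(agreement, 2))
# img_001 banana 0.67
# img_002 None 0.5
```

Images that come back as None are natural candidates for an extra round of review or additional annotators.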

Real-World Applications of Image Annotation

Giving machines the ability to see through computer vision has many exciting applications in the real world. Here are a few examples of image annotation in use:

Autonomous vehicles: To make sense of the world around them, the technology used in self-driving cars needs context about what it is looking at. Autonomous vehicles must be able to identify traffic lights (and which color is lit), pedestrians, road signs, driving lanes, and numerous other objects on the road.

Facial recognition: Landmark annotation, which places key points at specific locations such as the corners of the eyes and mouth, is the most useful type of image annotation for facial recognition technology. Facial recognition models, used in security settings, social media, photo applications, and elsewhere, are among the more controversial models in AI. However, their potential usefulness may outweigh the concerns.
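Landmark annotations are typically stored as named or numbered key points. A common way to check how consistently two annotators (or an annotator and a model) placed them is the normalized mean error: the average point-to-point distance divided by a reference length such as the distance between the eyes, so the score does not depend on how large the face appears in the image. The five-point schema, function name, and coordinates below are hypothetical; real landmark schemes often use many more points.

```python
import math
from typing import Dict, Tuple

# Hypothetical landmark schema: name -> (x, y) position in pixels.
Landmarks = Dict[str, Tuple[float, float]]


def normalized_mean_error(reference: Landmarks, other: Landmarks) -> float:
    """Mean distance between matching landmarks, divided by the distance
    between the eyes in `reference` (which must contain "left_eye" and
    "right_eye") so the score does not depend on face size."""
    shared = sorted(set(reference) & set(other))
    if not shared:
        raise ValueError("the two annotations share no landmarks")
    mean_dist = sum(math.dist(reference[k], other[k]) for k in shared) / len(shared)
    inter_ocular = math.dist(reference["left_eye"], reference["right_eye"])
    return mean_dist / inter_ocular


annotator_1 = {
    "left_eye": (30.0, 40.0), "right_eye": (70.0, 40.0), "nose_tip": (50.0, 60.0),
    "mouth_left": (35.0, 80.0), "mouth_right": (65.0, 80.0),
}
annotator_2 = {
    "left_eye": (31.0, 40.0), "right_eye": (70.0, 42.0), "nose_tip": (50.0, 63.0),
    "mouth_left": (35.0, 80.0), "mouth_right": (64.0, 80.0),
}
print(round(normalized_mean_error(annotator_1, annotator_2), 3))  # 0.035
```

Here the two annotators disagree by 1.4 pixels on average on a face whose eyes are 40 pixels apart; a project would choose its own acceptable threshold.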

Manufacturing: Particularly in large-scale production, computer vision can save hours of time and significantly cut costs when used for predictive maintenance, package inspection, and defect identification. Semantic segmentation is an ideal annotation technique in manufacturing: for example, a model trained on segmented images can pinpoint tire defects on a production line.
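A semantic segmentation annotation can be represented as a grid holding one class ID per pixel. The minimal sketch below assumes that representation, uses hypothetical class IDs for a tire-inspection task, and scores a model's predicted mask against the annotated ground truth with per-class intersection over union (IoU), a standard segmentation metric.

```python
from typing import List

Mask = List[List[int]]  # one class ID per pixel, row by row (assumed representation)

# Hypothetical class IDs for a tire-inspection dataset.
BACKGROUND, TIRE, DEFECT = 0, 1, 2


def class_iou(truth: Mask, pred: Mask, class_id: int) -> float:
    """Intersection over union of the pixels labeled class_id in two
    same-shaped masks."""
    inter = union = 0
    for row_t, row_p in zip(truth, pred):
        for t, p in zip(row_t, row_p):
            in_t, in_p = t == class_id, p == class_id
            inter += int(in_t and in_p)
            union += int(in_t or in_p)
    # If the class appears in neither mask, treat the masks as agreeing.
    return inter / union if union else 1.0


# Annotated ground truth: a strip of tire with a two-pixel defect.
truth = [
    [0, 1, 1, 0],
    [0, 1, 2, 0],
    [0, 1, 2, 0],
]
# A model's prediction that spills the defect onto one extra tire pixel.
pred = [
    [0, 1, 1, 0],
    [0, 1, 2, 0],
    [0, 2, 2, 0],
]
print(f"defect IoU: {class_iou(truth, pred, DEFECT):.2f}")  # defect IoU: 0.67
print(f"tire IoU: {class_iou(truth, pred, TIRE):.2f}")      # tire IoU: 0.75
```

Because the annotated mask defines what counts as a defect pixel, errors in the annotation flow directly into both the training signal and the evaluation score.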

Agriculture: Within agriculture, computer vision is used for crop maintenance, monitoring environmental conditions, and checking the condition of specific crop yields, to name a few uses. In these systems, image annotation helps models identify very specific objects within large-scale images.

Image Annotation at Defined.ai

At Defined.ai, we have worked with many companies to provide high-quality, crowdsourced training data for computer vision models. Here are just two examples:

Global Electronics Maker Using Facial Recognition Technology

In this first case study, our client needed to detect individual people in family portraits and understand each person's relationship to the others in the image: for example, recognizing not only a "man" but also his role as "father."

Using 1,000 verified images, annotators from our crowd identified the family members in each picture and provided details on their ages, relationships, and countries of origin, creating a highly customized dataset in just six weeks. Our client used this dataset to train its facial recognition model to be more accurate and useful for its application.

Automation in Utilities Inspection

In this case study, EDP, an electric utility company in Portugal, aimed to use computer vision models to improve its asset performance management processes and identify damage more effectively. Using 12,500 images and multiple annotations from our crowd, the model learned to identify utility poles.

An additional 900 annotated images were used to train the model to identify damage to the poles. As a result, EDP no longer needs to hire helicopters and crews to fly over its network photographing poles. Instead, drones paired with models trained on our data do the work, saving EDP time and money. The models can also predict which poles will need maintenance, allowing EDP to solve problems before they occur. It was a major improvement to EDP's maintenance capabilities.

The Gift of Sight

In the same way that natural language processing is helping machines understand human language more naturally, computer vision is helping machines perceive the world around them through sight. Image annotation is fundamental to this process: the more accurate the annotations, the more accurately models behave. Computer vision is an extremely exciting field, and it will undoubtedly change our lives for the better.