(+84) 931 939 453


What is data labeling?

In machine learning, a branch of artificial intelligence, data labeling is the process of identifying raw data (images, text files, videos, etc.) and assigning one or more meaningful labels that give the data context, so that a machine learning model can learn from it. For example, labels can indicate whether a picture contains a particular object, what words are spoken in an audio recording, or whether an X-ray image contains a tumor. Data labeling is used in computer vision, natural language processing, and speech recognition.

How does data labeling work?

Today, most practical machine learning models use supervised learning, which applies an algorithm to map an input to an output. For supervised learning to work effectively, you need a labeled dataset that the model can learn from to make accurate decisions. Data labeling typically begins by asking a human to make a judgment about a piece of unlabeled data. For example, a labeler might be asked to tag all images in a dataset where “photo contains vehicle” is true. Tagging can be as coarse as a simple yes or no, or as detailed as identifying the specific pixels in an image that belong to a vehicle. Machine learning models use these human-supplied labels to learn underlying patterns in a process known as “model training”. The result is a trained model that can be used to make predictions about new data.
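To make the idea of learning from labels concrete, here is a minimal sketch in Python. It is an illustration only: the two numeric features, the label names, and the nearest-centroid method are assumptions chosen for brevity, not a technique this article prescribes.

```python
from collections import defaultdict

# Hypothetical labeled dataset: each item pairs a feature vector with a
# human-supplied label. The two features are invented for illustration.
labeled_data = [
    ((0.9, 0.8), "vehicle"),
    ((0.8, 0.9), "vehicle"),
    ((0.1, 0.2), "no_vehicle"),
    ((0.2, 0.1), "no_vehicle"),
]


def train(examples):
    """Model training: compute the mean feature vector (centroid) per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: tuple(total / counts[label] for total in totals)
        for label, totals in sums.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the new data point."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))

    return min(model, key=lambda label: squared_distance(model[label]))


model = train(labeled_data)
print(predict(model, (0.85, 0.75)))  # prints "vehicle"
```

The trained model here is just one centroid per label. Real systems use far richer models, but the flow is the same: labeled examples go in, a predictor for new data comes out.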

In machine learning, a properly labeled dataset that you use as the standard to train and evaluate a model is often called “ground truth”. The accuracy of your trained model depends on the accuracy of your ground truth, so dedicating time and resources to highly accurate data labeling is essential.
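The dependence on ground truth can be sketched with a small accuracy calculation. The label lists below are made up for illustration. The point is that the same predictions receive a different score once a single ground-truth label is wrong.

```python
def accuracy(predictions, ground_truth):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)


# Hypothetical labels for five images.
ground_truth = ["vehicle", "vehicle", "no_vehicle", "no_vehicle", "vehicle"]
predictions = ["vehicle", "no_vehicle", "no_vehicle", "no_vehicle", "vehicle"]

# The same ground truth with the first image mislabeled by an annotator.
noisy_ground_truth = ["no_vehicle"] + ground_truth[1:]

print(accuracy(predictions, ground_truth))        # 0.8
print(accuracy(predictions, noisy_ground_truth))  # 0.6
```

One labeling error is enough to move the measured accuracy from 0.8 to 0.6 here, even though the model's predictions did not change.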

Some common types of data labeling

Computer vision

When building a computer vision system, you first need to label images, pixels, or key points, or draw a border that fully encloses an object in the image, called a bounding box, to create your training dataset. For example, you can classify images by quality or content type, or you can segment images at the pixel level. You can then use this training data to build a computer vision model that automatically classifies an image, detects the position of an object, identifies key points in an image, or segments an image.
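As an illustration, a bounding-box label can be stored as a class name plus the pixel coordinates of two corners. The sketch below also computes intersection over union (IoU), a commonly used measure of how well two boxes overlap. The field names and coordinates are assumptions for this example, and IoU is not something the text above requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """A labeled rectangle in pixel coordinates (hypothetical field names)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def iou(a: BoundingBox, b: BoundingBox) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for no overlap."""
    inter_w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    inter_h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    intersection = inter_w * inter_h
    union = a.area() + b.area() - intersection
    return intersection / union


# A human-drawn box and a slightly shifted box, e.g. from a model.
human_box = BoundingBox("vehicle", 10, 10, 110, 60)
model_box = BoundingBox("vehicle", 20, 10, 120, 60)

print(round(iou(human_box, model_box), 3))  # 0.818
```

The same comparison can be used to check two annotators against each other before their boxes are accepted into the training dataset.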

Natural language processing

Natural language processing first requires you to manually identify important parts of the text, or tag the text with specific labels, to create your training dataset. For example, you might want to identify the sentiment or intent of a piece of text, identify parts of speech, classify proper nouns such as places and people, or identify text in images, PDFs, or other files. For text embedded in images or documents, you can draw bounding boxes around the text and then manually transcribe it into your training dataset. Natural language processing models are used for sentiment analysis, named entity recognition, and optical character recognition.
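One common way to record such text labels is a document-level tag plus character-offset spans for entities. The sketch below assumes that layout. The sentence, the label names, and the `extract_entities` helper are hypothetical.

```python
# A hypothetical labeled record: one sentiment label for the whole text and
# entity spans given as (start, end, label) with the end offset exclusive.
record = {
    "text": "Alice flew to Hanoi on Monday.",
    "sentiment": "neutral",
    "entities": [(0, 5, "PERSON"), (14, 19, "PLACE")],
}


def extract_entities(record):
    """Return (surface_text, label) pairs after validating the spans."""
    text = record["text"]
    results = []
    previous_end = 0
    for start, end, label in sorted(record["entities"]):
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span ({start}, {end}) is outside the text")
        if start < previous_end:
            raise ValueError(f"span ({start}, {end}) overlaps an earlier span")
        results.append((text[start:end], label))
        previous_end = end
    return results


print(extract_entities(record))
# [('Alice', 'PERSON'), ('Hanoi', 'PLACE')]
```

Validating offsets at labeling time catches off-by-one spans before they reach the training dataset.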

Audio processing

Audio processing converts all kinds of sounds, such as speech, wildlife noises, and other sounds, into a structured format that can be used in machine learning. It often requires you to first transcribe the audio manually into written text. From there, you can capture more detailed information about the audio by adding tags and sound classifications. This classified audio becomes your training dataset.
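A structured format for labeled audio might look like the following sketch: time-stamped segments, each with a manual transcript and a list of tags. The field names, timings, and tags are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AudioSegment:
    """One labeled stretch of a recording (hypothetical schema)."""
    start_s: float
    end_s: float
    transcript: str
    tags: list = field(default_factory=list)


# Hypothetical annotations for a short recording.
segments = [
    AudioSegment(0.0, 2.5, "hello, how can I help you", ["speech", "speaker_1"]),
    AudioSegment(2.5, 4.0, "", ["bird_song"]),
    AudioSegment(4.0, 6.5, "I would like to book a room", ["speech", "speaker_2"]),
]


def seconds_per_tag(segments):
    """Total labeled duration in seconds for each tag."""
    totals = {}
    for segment in segments:
        duration = segment.end_s - segment.start_s
        for tag in segment.tags:
            totals[tag] = totals.get(tag, 0.0) + duration
    return totals


print(seconds_per_tag(segments)["speech"])     # 5.0
print(seconds_per_tag(segments)["bird_song"])  # 1.5
```

A per-tag summary like this is a quick way to see whether the training dataset is balanced across sound classes.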


With a team of professional, skilled, and experienced staff, BPO.MP Co., Ltd is proud to be a reputable data entry company and strives to become the best provider of top-quality online data entry services at competitive prices, satisfying all customer needs. BPO (Business Process Outsourcing) is essential for every business.

For further information, please contact the Hotline: 0931 939 453 or email info@mpbpo.com.vn.