Instance segmentation - IoU, Precision, Recall, AP, #FP

Improve your understanding of quantitative instance segmentation metrics to gain deep insight into your results.


  • IoU: The closer the value is to 1,

  • Precision [%]: The closer the value is to 100%,

  • Recall [%]: The closer the value is to 100%,

  • Average Precision: The closer the value is to 1,

  • False Positives: The closer the value is to 0,

the better your trained application performs.

What to expect from this page?

Basic theory

Instance segmentation metrics are calculated per label category or per instance, depending on the metric. An instance match is determined by its overlap with an annotation (ground truth) provided by the user. The numeric metric for this overlap is the IoU or “Intersection over Union”.

Based on this value, the predictions of the trained application can be classified as true-positives (representing annotated areas of a label) or false-positives (not representing annotated areas of a label). Based on these classifications, performance metrics like Precision, Recall, and Average Precision can be calculated for the whole image or for ROIs.

Demo validation visualization

The demo validation visualization presents instance classifications and semantic segmentation. The “Annotation” and “Prediction” visualizations show whole instances as annotated and as predicted by the application, respectively. The “Semantic Correctness” visualization shows the overlap of annotated and predicted areas in detail. For all representations, the following color legend is applied.

True-positive areas are displayed in green.

Areas where your app's prediction matches your annotations. In other words, the predicted labelled area overlaps the annotation by more than 50%.

False-positive areas are displayed in red.

Areas that have not been annotated but were predicted by the application. In other words, the application predicts and labels areas of the image that should not be predicted and labeled. Areas in this category either do not overlap an annotation at all or overlap it by less than 50%.

True-negative areas are displayed in grey.

Areas that have been neither annotated nor predicted. In other words, any area that is not a true-positive, false-positive, or false-negative area.

False-negative areas are displayed in blue.

Areas that have been annotated, yet not predicted and labeled by your application.
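The legend above can be sketched as pixel masks in Python. This is an illustrative, simplified pixel-wise view (the helper name `color_legend_masks` is hypothetical, and the product applies the instance-wise 50% rule described below), but it shows how the four categories partition the image:

```python
import numpy as np

def color_legend_masks(annotation: np.ndarray, prediction: np.ndarray):
    """Classify each pixel of a binary annotation/prediction pair.

    Returns four boolean masks corresponding to the color legend:
    green (TP), red (FP), blue (FN), grey (TN).
    """
    tp = annotation & prediction        # annotated and predicted  -> green
    fp = ~annotation & prediction       # predicted only           -> red
    fn = annotation & ~prediction       # annotated only           -> blue
    tn = ~annotation & ~prediction      # neither                  -> grey
    return tp, fp, fn, tn

# Toy 3x3 example: annotation and prediction overlap in one column
ann  = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]], dtype=bool)
pred = np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]], dtype=bool)
tp, fp, fn, tn = color_legend_masks(ann, pred)
print(tp.sum(), fp.sum(), fn.sum(), tn.sum())  # 2 2 2 3
```

Every pixel falls into exactly one of the four masks, so their pixel counts always sum to the image size.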

Performance metrics for images with annotations

Section 2 of the downloaded PDF report, Training Results, provides tables with quantitative results for all validation images and ROI(s) (if they were used) including all available labels.

These tables include performance metrics such as Intersection over Union (IoU), Precision, Recall, and Average Precision (AP) for all images in total as well as for every single image. These measures, except IoU, are based on instance-wise classification as either true-positives or false-positives.

The sections below will describe how to understand and interpret these metrics.

Intersection over Union (IoU)

What is it?

IoU is calculated, for each predicted instance, by dividing the area of intersection between the annotated instance and the predicted instance by the area of their union.

Intersection over Union, IoU

Each predicted instance can only intersect with one annotated instance. If a prediction intersects with more than one annotated instance, the annotation with the highest IoU value is assigned to this prediction. Intersecting areas with IoU values lower than 0.5 (less than 50% overlap of the areas) are counted as non-corresponding. False-positive predictions generally have an IoU value of 0.
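The matching rule above can be sketched in Python. This is an illustrative implementation, not the product's actual code; the helper names (`iou`, `match_annotation`) are assumptions, and the 0.5 threshold follows the text:

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over Union of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / union if union else 0.0

def match_annotation(pred, annotations, threshold=0.5):
    """Assign this prediction the annotation with the highest IoU.

    Returns (annotation index, IoU), or (None, 0.0) when every overlap
    is below the threshold -- i.e. the prediction is a false positive.
    """
    ious = [iou(pred, a) for a in annotations]
    if not ious or max(ious) < threshold:
        return None, 0.0
    best = int(np.argmax(ious))
    return best, ious[best]

# Toy example: a prediction overlapping the first of two annotations
grid = np.zeros((4, 4), dtype=bool)
ann_a = grid.copy(); ann_a[0:2, 0:2] = True   # top-left 2x2 square
ann_b = grid.copy(); ann_b[2:4, 2:4] = True   # bottom-right 2x2 square
pred  = grid.copy(); pred[0:2, 0:3] = True    # overlaps ann_a by 4/6
print(match_annotation(pred, [ann_a, ann_b]))  # (0, 0.666...)
```

A prediction overlapping no annotation (or overlapping only below the threshold) comes back as `(None, 0.0)`, i.e. a false positive with IoU 0.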

The IoU values of all predicted areas are collected and averaged to arrive at a value between 0 and 1 for the entire image (and all validation images). The values at the end of this spectrum signify the following:

  • an IoU of 0 means there is no overlap between predicted and annotated instances, or that every detected overlap has an IoU below 0.5 (less than 50% overlap)

  • an IoU of 1 means that all predictions perfectly match the corresponding annotated instances

Please note: An IoU of 1 would be the “perfect” value, which rarely occurs in practice. However, the closer the value is to 1, the better your trained application performs.

The image-level IoU value is calculated as an average over predicted-instance IoUs, not annotated-object IoUs. This means that if your image contains one large annotated object but there are four predicted objects, with one of them overlapping the annotation by more than 50%, your resulting image IoU will be about 25% (one prediction is a true-positive; the three others are false-positives with an IoU of 0).

This value remains unchanged even if the three false-positive predictions partly overlap the annotated object, because the IoU value is calculated instance-wise, not area-wise.
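The four-prediction example can be reproduced numerically. Assuming, for illustration, that the single matching prediction overlaps its annotation perfectly (IoU = 1.0) while the three false positives score 0:

```python
# Instance-wise image IoU: average over predicted-instance IoUs.
per_prediction_ious = [1.0, 0.0, 0.0, 0.0]  # 1 perfect TP, 3 FPs
image_iou = sum(per_prediction_ious) / len(per_prediction_ious)
print(image_iou)  # 0.25
```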

Why can it be undefined?

Important: If the image does not contain any prediction areas at all, the IoU will be reported as an undefined value (marked in the table with a dash “-”).


Precision [%]

What is it?

Precision is a measure of how well the trained model arrives at correct results with its predictions. It is calculated by dividing the number of correctly identified instances by the number of the total predicted instances.
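As a minimal sketch (the helper name is hypothetical), the calculation and its undefined case might look like this, with `None` standing in for the dash used in the report:

```python
def precision(num_tp: int, num_fp: int):
    """Precision = TP / (TP + FP), as a fraction between 0 and 1.

    Returns None when there are no predictions at all, mirroring the
    dash ("-") used for undefined values in the report.
    """
    total = num_tp + num_fp
    return num_tp / total if total else None

print(precision(8, 2))   # 0.8, i.e. 80%
print(precision(0, 0))   # None (undefined)
```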


Why can it be undefined?

Important: If the application does not make any predictions at all, the denominator of the Precision calculation is zero, so the value is reported as undefined (marked in the table with a dash “-”).


Recall [%]

What is it?

Recall is a measure of how well the app can identify annotated instances. It is calculated by dividing the number of correctly identified instances by the total number of annotated instances.
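The same kind of sketch applies here (again with a hypothetical helper name and `None` standing in for the report's dash):

```python
def recall(num_tp: int, num_annotated: int):
    """Recall = TP / annotated instances, as a fraction between 0 and 1.

    Returns None when the image contains no annotated instances,
    mirroring the dash ("-") used for undefined values in the report.
    """
    return num_tp / num_annotated if num_annotated else None

print(recall(8, 10))   # 0.8, i.e. 80%
print(recall(0, 0))    # None (undefined)
```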


Why can it be undefined?

Important: If the image does not contain any annotated instances, the denominator of the Recall calculation is zero, so the value is reported as undefined (marked in the table with a dash “-”).

Average Precision (AP)

What is it?

Average Precision is an overall measure of the app's performance. Its calculation is somewhat more involved than those of Precision and Recall. First, all predictions are sorted (typically by confidence) and labelled as either true-positive (TP) or false-positive (FP). During this process, the running counts of TPs and FPs are recorded.

Based on this list, Precision and Recall values are calculated at each position in the ranking. The Precision denominator at each position is the number of predictions considered up to that position on the list, not the total number of predictions.

The values for Precision and Recall are used to define a Precision-Recall Curve.

To estimate the Average Precision (AP), the Precision-Recall Curve is transformed into a multi-step function that takes, at each Recall level, the maximum Precision achieved at that Recall level or beyond. This results in a new, monotonically decreasing curve on the chart (the red dashed line).

The Average Precision (AP) is defined as the area under this red dashed curve. Its values can range from 0 to 1, with higher values signifying a better-performing model.
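A compact sketch of this procedure in Python (illustrative only; the function name and signature are assumptions based on the description above: predictions sorted by confidence, precision interpolated with a running maximum, area summed over recall increments):

```python
import numpy as np

def average_precision(scores, is_tp, num_annotated):
    """Area under the max-interpolated Precision-Recall curve.

    scores: confidence per prediction; is_tp: True if the prediction
    matched an annotation (IoU >= 0.5). Assumes num_annotated > 0.
    """
    order = np.argsort(scores)[::-1]            # sort by confidence, descending
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    prec = cum_tp / (cum_tp + cum_fp)           # precision at each list position
    rec = cum_tp / num_annotated                # recall at each list position
    # Step function: max precision at each recall level or beyond
    prec_interp = np.maximum.accumulate(prec[::-1])[::-1]
    # Area under the step curve, summed over recall increments
    rec_prev = np.concatenate(([0.0], rec[:-1]))
    return float(np.sum((rec - rec_prev) * prec_interp))

scores = [0.9, 0.8, 0.7, 0.6]
is_tp  = [True, False, True, True]
print(average_precision(scores, is_tp, num_annotated=3))  # ≈ 0.833
```

Note how the single false positive in second place lowers the interpolated precision for the later recall levels, which is exactly what the area under the red dashed curve captures.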

Why can it be undefined?

Important: If the image does not contain any annotated instances, Recall, and with it the Precision-Recall Curve, cannot be computed, so the AP is reported as undefined (marked in the table with a dash “-”).

No. of false positives (#FP)

What is it?

To properly assess how well the application identifies the image background as such, the number of False Positives is supplied as a substitute for Specificity. (Specificity itself cannot be calculated meaningfully here, because the number of true-negative instances is not well defined for segmentation tasks.)
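As a one-line sketch (hypothetical helper; it assumes each prediction's best-overlap IoU has already been computed as described in the IoU section):

```python
def count_false_positives(prediction_ious, threshold=0.5):
    """Count predictions whose best annotation overlap is below the threshold."""
    return sum(1 for v in prediction_ious if v < threshold)

print(count_false_positives([0.9, 0.3, 0.0, 0.7]))  # 2
```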

Performance metrics for images without annotations

For images without any annotations, metrics that rely on annotated instances (IoU, Recall, and Average Precision) cannot be computed and are reported as undefined. The number of False Positives (#FP) remains informative: on an image without annotations, every predicted instance is a false positive.


Now that you have an understanding of the most common quantitative metrics used in instance segmentation, you are perfectly equipped to start interpreting your analysis outputs!

Share this article if you have found it helpful.

If you still have questions regarding your application training, feel free to send us an email. Copy-paste your training ID in the subject line of your email.


Copyright 2016-2021 KML Vision GmbH. IKOSA® is a registered EU trademark.