ANYVERSE

Synthetic data to develop a trustworthy autonomous driving system | Chapter 5


CHAPTER 5

Author
Hamid Serry, WMG Graduate Trainee, University of Warwick

Last week we looked at performance metrics and how the base definitions can be combined into more intricate and detailed metric systems. This week we build on that with a relatively new metric: the Spatial Recall Index.

Spatial Variation

Classic metrics focus on an overall quantification of performance across a dataset of varied images.

What this focus does not capture, however, is the spatial variation within individual images and videos, and how that spatial variation relates to detection performance.

A 2019 study attempted to quantify the effect of augmenting virtual training datasets for deep neural networks (specifically a Faster R-CNN network with a ResNet50 feature extractor) with image distortion, namely vignetting.

The re-trained networks were used to evaluate the real KITTI dataset, demonstrating an increase in mAP performance when using an augmented virtual training dataset [1].

Establishing this link is an important step: it invites new methods that measure image quality from a performance perspective and deepens our understanding of the effects of spatial variation.

Spatial Recall Index (SRI)

The Spatial Recall Index (SRI) fills this gap by quantifying "spatially resolved performance": how a detection model's performance varies from pixel to pixel.

SRI can be properly computed on datasets containing images of the same size (which is typically the case for automated driving datasets).

It works by applying the Recall metric at every pixel of the input image, creating a map of how each pixel is covered by True Positive (TP) detections intersecting the Ground Truth (GT) bounding boxes (as defined last week).

This outputs an image with the same dimensions as the input, each pixel holding a value given by the number of TP bounding boxes overlapping a GT bounding box at that pixel.

As this is computed for a single frame, it is then repeated over all frames within the dataset. The output is normalised by dividing by the corresponding count of GT bounding boxes, giving each pixel a value between 0 and 1 (Equation 1): an ideal detector holds a value of 1, while a pixel containing zero detections holds a 0 [2].

Equation 1: SRI(x, y) = TP(x, y) / GT(x, y)

where TP(x, y) is the number of TP bounding boxes covering pixel (x, y) and GT(x, y) the number of GT bounding boxes covering it, accumulated over the frames considered.
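As a concrete sketch, the per-frame counting and normalisation steps might look like the following. The box format (x1, y1, x2, y2) in pixel coordinates and the helper name `sri_map` are assumptions for illustration, not taken from [2]:

```python
import numpy as np

def sri_map(gt_boxes, tp_boxes, shape):
    """Per-pixel recall map for one frame.

    gt_boxes: ground-truth boxes as (x1, y1, x2, y2) pixel coordinates.
    tp_boxes: the subset of GT boxes matched by a true-positive detection.
    shape:    (height, width) of the image.
    """
    gt_count = np.zeros(shape)
    tp_count = np.zeros(shape)
    for x1, y1, x2, y2 in gt_boxes:
        gt_count[y1:y2, x1:x2] += 1.0   # GT coverage per pixel
    for x1, y1, x2, y2 in tp_boxes:
        tp_count[y1:y2, x1:x2] += 1.0   # TP coverage per pixel
    # Normalise; pixels never covered by a GT box stay at 0.
    return np.divide(tp_count, gt_count,
                     out=np.zeros(shape), where=gt_count > 0)

# Two GT boxes, one detected: the detected region scores 1, the missed region 0.
sri = sri_map([(0, 0, 4, 4), (4, 4, 8, 8)], [(0, 0, 4, 4)], (8, 8))
```

Accumulating the two count maps over all frames before the final division extends the same idea to a whole dataset.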

SRI Comparisons

One of the greatest advantages of SRI is its visual nature: at a quick glance, much of the information in an SRI map can be taken in instantly. This is further enhanced by taking the difference between two SRI maps and observing the performance difference between them.

This is illustrated in Equation 2, where (x,y) are the coordinates of the pixels within the image.

Equation 2: ΔSRI(x, y) = SRI1(x, y) − SRI2(x, y)

Where SRI1(x,y) and SRI2(x,y) represent the SRI computed on two different models or variations of the dataset.
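Under the same assumptions as above, this is simply an element-wise difference of two equally sized SRI maps; the values below are synthetic placeholders for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
sri_1 = rng.random((4, 4))               # e.g. SRI map of the unmodified dataset
sri_2 = np.clip(sri_1 - 0.2, 0.0, 1.0)   # pretend a degradation lowers recall
delta_sri = sri_1 - sri_2                # Equation 2, pixel-wise
```

Plotting `delta_sri` as a heat map is what makes degraded regions, such as the blurred arcs in Figure 1, stand out.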

As illustrated in Figure 1, applying the equation to two individual SRI maps yields a single ΔSRI map that clearly shows the stark spatial differences between the pair, corresponding directly to the two arcs of blur added.


Figure 1 – SRI Comparison between a raw image and a variation with a blur filter applied to selected regions [3]

This example illustrates use in a whole-dataset comparison, where the SRI is averaged over the dataset, but the metric can equally compare single images. It provides a useful way to examine, within a trained model, how detections change with spatial variation.
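For dataset-level use, one straightforward approach (a sketch under the same assumed box format; the implementation in [2] may differ) is to accumulate the TP and GT coverage counts over every frame and divide once at the end:

```python
import numpy as np

def dataset_sri(frames, shape):
    """frames: iterable of (gt_boxes, tp_boxes) pairs, one per image,
    with boxes as (x1, y1, x2, y2); all images share `shape` = (height, width)."""
    gt_count = np.zeros(shape)
    tp_count = np.zeros(shape)
    for gt_boxes, tp_boxes in frames:
        for x1, y1, x2, y2 in gt_boxes:
            gt_count[y1:y2, x1:x2] += 1.0
        for x1, y1, x2, y2 in tp_boxes:
            tp_count[y1:y2, x1:x2] += 1.0
    # Pixels never covered by any GT box stay at 0.
    return np.divide(tp_count, gt_count,
                     out=np.zeros(shape), where=gt_count > 0)

# Two frames share the same GT box; it is detected in only one frame,
# so the covered pixels end up with a recall of 0.5.
frames = [([(0, 0, 2, 2)], [(0, 0, 2, 2)]),
          ([(0, 0, 2, 2)], [])]
sri = dataset_sri(frames, (4, 4))
```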

Conclusion

SRI is a promising metric for visually comparing the spatial performance differences of a deep neural network when evaluating a specific dataset.

This aspect also opens avenues to test how altering generated camera parameters affects spatial variation within the image, and the detection performance linked to it; this will be of particular interest with regard to how parameters affect generated datasets.

References

[1] K. Saad and S. -A. Schneider, “Camera Vignetting Model and its Effects on Deep Neural Networks for Object Detection,” 2019 IEEE International Conference on Connected Vehicles and Expo (ICCVE), 2019, pp. 1-5, doi: 10.1109/ICCVE45908.2019.8965233.
[2] P. Müller, M. Brummel and A. Braun, "Spatial recall index for machine learning algorithms", London Imaging Meeting, vol. 2, no. 1, pp. 58-62, 2021, doi: 10.2352/issn.2694-118x.2021.lim-58.

[3] A. Braun, "Spatial Recall Index – A novel metric for AI-algorithms", AutoSens Brussels, 2022.
