Hamid Serry, WMG Graduate Trainee, University of Warwick
Classic metrics focus on quantifying overall performance across a dataset containing a variety of images.
What this focus does not capture, however, is the spatial variation within individual images and videos, and how that spatial variation relates to detection performance.
A 2019 study attempted to quantify the effect of augmenting virtual datasets for training deep neural networks (specifically a Faster R-CNN network with a ResNet50 feature extractor) with image distortions, namely vignetting.
The re-trained networks were evaluated on the real KITTI dataset, demonstrating an increase in mAP when an augmented virtual training dataset was used.
Establishing this link is an important step, as it invites new methods for measuring image quality from a performance perspective and deepens our understanding of the effects of spatial variations.
Spatial Recall Index (SRI)
Spatial Recall Index (SRI) fills this gap by quantifying "spatially resolved performance": how a detection model's performance varies from pixel to pixel.
SRI can be properly computed on datasets whose images share the same dimensions, which is typically the case for automated driving datasets.
It works by applying the recall metric at every pixel of the input image, mapping how pixels vary with the True Positive (TP) detections intersecting the Ground Truth (GT) bounding boxes (as defined last week).
This outputs an image of the same dimensions as the input, with each pixel holding a value equal to the number of TP bounding boxes overlapping a GT bounding box at that location.
As this is computed per frame, it is repeated over all frames in the dataset. The output is then normalised by dividing by the total number of GT bounding boxes, resulting in a scale from 0 to 1 (Equation 1): an ideal detector holds a value of 1, while a pixel containing zero detections holds a 0.
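The accumulation described above can be sketched as follows. This is a minimal illustration, not the reference implementation: it assumes axis-aligned boxes given as `(x1, y1, x2, y2)` pixel coordinates, that TP/GT matching has already been performed upstream, and the function names are my own.

```python
import numpy as np

def frame_tp_coverage(image_shape, tp_boxes):
    """Count, for one frame, how many TP bounding boxes cover each pixel."""
    h, w = image_shape
    count = np.zeros((h, w), dtype=np.float64)
    for x1, y1, x2, y2 in tp_boxes:
        count[y1:y2, x1:x2] += 1.0
    return count

def sri(frames):
    """Accumulate TP coverage over all frames, then normalise by the
    total number of GT boxes, yielding per-pixel values in [0, 1].

    frames: iterable of (image_shape, gt_boxes, tp_boxes) tuples,
    where all frames share the same image_shape.
    """
    total, n_gt = None, 0
    for image_shape, gt_boxes, tp_boxes in frames:
        coverage = frame_tp_coverage(image_shape, tp_boxes)
        total = coverage if total is None else total + coverage
        n_gt += len(gt_boxes)
    return total / max(n_gt, 1)
```

For example, a dataset of two frames with one GT box each, where only the first frame's box is detected, yields an SRI of 0.5 inside that box region and 0 elsewhere.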
One of SRI's greatest advantages is its visual nature: at a glance, much of the information gained from comparing two SRI maps can be seen instantly. This is further enhanced by taking the difference between two SRI maps and observing the performance difference between them.
This is illustrated in Equation 2, ΔSRI(x,y) = SRI1(x,y) − SRI2(x,y), where (x,y) are the pixel coordinates within the image and SRI1(x,y), SRI2(x,y) are the SRI maps computed on two different models or variations of the dataset.
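The per-pixel difference of Equation 2 is a direct elementwise subtraction of two SRI maps. A minimal sketch, assuming both maps are arrays of the same image dimensions (the function name is illustrative):

```python
import numpy as np

def delta_sri(sri_1, sri_2):
    """Pixelwise difference between two SRI maps of equal shape.

    Positive values mark regions where the first model or dataset
    variant recalls more objects; negative values, where it recalls
    fewer. Zero means identical spatial performance at that pixel.
    """
    sri_1 = np.asarray(sri_1, dtype=np.float64)
    sri_2 = np.asarray(sri_2, dtype=np.float64)
    if sri_1.shape != sri_2.shape:
        raise ValueError("SRI maps must share the same image dimensions")
    return sri_1 - sri_2
```

Visualising the result with a diverging colour map makes regions of gained and lost recall immediately apparent, as in the ΔSRI map of Figure 1.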
As illustrated in Figure 1, applying the equation to two individual SRI maps consolidates them into one ΔSRI map, clearly showing the stark spatial differences between the pair, which correspond directly to the two arcs of blur added.
Figure 1 – SRI Comparison between a raw image and a variation with a blur filter applied to selected regions 
This example illustrates SRI's use both for comparing entire datasets, where the SRI is averaged over the dataset, and for comparing even a single image. It provides a useful way to examine, within a trained model, how detections change with spatial variation.
SRI is a promising metric for visually comparing the spatial performance differences of a deep neural network evaluated on a specific dataset.
It also opens avenues for testing how altering generated camera parameters affects the spatial variations within an image, along with the detection performance linked to them; this will be of particular interest in regard to how parameters affect generated datasets.
A. Braun, "Spatial Recall Index – A novel metric for AI-algorithms", AutoSens Brussels, 2022.