Overview of popular datasets in computer vision and autonomous driving

We can find several popular datasets for computer vision and autonomous driving tasks in the visual perception space. They differ in size, diversity, and performance on detection tasks. Still, several common criteria allow us to compare them and get an overview of how these datasets perform when training machine learning models.

Table 1 – extracted from Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art (2021) – gives an overview of these datasets, most of which are autonomous driving datasets.

Analyzing popular datasets in computer vision tasks

Let’s start by detailing the tasks and characteristics of the datasets that are going to be analyzed and compared:

  1. Object Detection 
  2. Semantic Segmentation
  3. Traffic Sign Detection
  4. Optical Flow
  5. Reconstruction
  6. Lane Detection
  7. Road Detection 
  8. Stereo
  9. Tracking

Dataset size will be taken into account for each of these tasks on a scale of XS to XL (XS: on the order of tens of examples/scenes for training, S: on the order of hundreds, M: on the order of thousands, L: more than 10 thousand, and XL: more than 100 thousand).

Other factors such as dataset realism and diversity (subjectively evaluated by the researchers) will also be taken into account on a scale of 1 (low) to 5 (high).
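As an illustration, the rating scheme described above could be encoded along these lines. This is a minimal sketch in Python, not code from the survey; the dataset name and numbers are hypothetical placeholders rather than values taken from Table 1.

```python
from dataclasses import dataclass

def size_category(num_training_examples: int) -> str:
    """Map a training-set size to the XS-XL scale
    (tens, hundreds, thousands, >10 thousand, >100 thousand)."""
    if num_training_examples >= 100_000:
        return "XL"
    if num_training_examples >= 10_000:
        return "L"
    if num_training_examples >= 1_000:
        return "M"
    if num_training_examples >= 100:
        return "S"
    return "XS"

@dataclass
class DatasetRating:
    name: str
    training_examples: int
    realism: int    # subjective rating, 1 (low) to 5 (high)
    diversity: int  # subjective rating, 1 (low) to 5 (high)

    @property
    def size(self) -> str:
        return size_category(self.training_examples)

# Hypothetical entry for illustration only (the numbers are placeholders):
example = DatasetRating("SomeDrivingDataset", 7_500, realism=5, diversity=3)
print(example.size)  # "M"
```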

Table 1: Popular Datasets in Computer Vision and Self-Driving

Each dataset is focused on specific tasks, such as object detection or semantic segmentation, and is not equally suitable for training the others.

As Table 1 shows, each dataset has its strengths and weaknesses, and most are focused on specific tasks within specific use cases. For instance, ImageNet, PASCAL VOC, and Microsoft COCO are the largest and most diverse datasets for object classification, detection, and segmentation. For its part, the Middlebury stereo dataset was built for stereo vision and multi-view reconstruction; however, it lacks size and diversity compared to other datasets.

Another example is the KITTI dataset, which has firmly established itself as a standard benchmark for all the tasks we’ve discussed, notably within the domain of autonomous driving applications. While KITTI provides annotated data and an evaluation server for all the problems addressed earlier, this dataset is relatively limited in size.

You have probably already noticed several patterns; for example, the datasets analyzed for traffic sign detection are not appropriate for tasks such as optical flow, reconstruction, or tracking.

As we said at the beginning of this article, each dataset has been designed to train a few specific tasks, so it lacks data for other tasks that are equally important for developing an autonomous system. Therefore, it seems legitimate to explore and compare the performance of these datasets across all the typical tasks, not just the ones they were designed for.

Overall performance of analyzed computer vision datasets

Below you can see a series of graphs that represent the overall performance of the datasets available to train each of the tasks analyzed in Table 1. The goal is to see at a glance how these datasets perform when training the tasks they were designed for, as well as other tasks they were not specifically designed for.

To calculate the value of each axis, we took the highest value presented by any of those datasets for that specific task.
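As a rough sketch of that aggregation, assuming each dataset is scored per task on a 0 to 5 scale (0 meaning no usable data), the per-axis value is simply the maximum score over the datasets in the group. The dataset names and scores below are hypothetical placeholders, not values from Table 1.

```python
from typing import Dict

# The nine tasks analyzed in Table 1, one per radar-chart axis.
TASKS = [
    "object detection", "semantic segmentation", "traffic sign detection",
    "optical flow", "reconstruction", "lane detection",
    "road detection", "stereo", "tracking",
]

def axis_values(group_scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """For each task, take the highest score among the datasets in the group."""
    return {
        task: max(scores.get(task, 0.0) for scores in group_scores.values())
        for task in TASKS
    }

# Hypothetical group for a stereo-oriented figure (illustrative scores only):
stereo_group = {
    "Middlebury": {"stereo": 5, "reconstruction": 2},
    "KITTI":      {"stereo": 4, "object detection": 4, "tracking": 4},
}
print(axis_values(stereo_group))
```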

Performance of datasets available to train stereo tasks

Figure 1 represents the overall performance of the datasets available to train stereo tasks: Middlebury, ETH3D, MPI Sintel, Flying Things, KITTI, and VirtualKITTI.


Figure 1: Datasets for Stereo - Overall performance

We can see that the datasets available to train stereo have no data for traffic sign detection and only limited data for reconstruction, lane detection, and road detection, but they are good for object detection, semantic segmentation, optical flow, and tracking tasks.

Performance of datasets available to train reconstruction tasks

Figure 2 represents the overall performance of the datasets available to train reconstruction tasks: Middlebury, EPFL Multi-View, DTU MVS, ETH3D, Tanks and Temples, KITTI, and VirtualKITTI.


Figure 2: Datasets for reconstruction - Overall performance

Performance of datasets available to train optical flow

Figure 3 represents the overall performance of the datasets available to train optical flow: Middlebury, SlowFlow, HCI Benchmark, MPI Sintel, Flying Chairs, Flying Things, Playing for Benchmarks, KITTI, and VirtualKITTI. In both cases (Figures 2 and 3), the scores are highly influenced by KITTI; the rest of the datasets score lower.


Figure 3: Datasets for optical flow - Overall performance

Performance of datasets available to train object detection

Figure 4 represents the overall performance of the datasets available to train object detection: ImageNet, PASCAL VOC, Microsoft COCO, Cityscapes, EuroCity Persons Dataset, ApolloScape, NuScenes, Berkeley DeepDrive, German Traffic Sign Recognition Benchmark, German Traffic Sign Detection Benchmark, Tsinghua-Tencent 100K, Playing for Benchmarks, Waymo Open Dataset, KITTI, and VirtualKITTI.


Figure 4: Datasets for object detection - Overall performance

Performance of datasets available to train traffic sign detection

Figure 5 represents the overall performance of the datasets available to train traffic sign detection: German Traffic Sign Recognition Benchmark, German Traffic Sign Detection Benchmark, and Tsinghua-Tencent 100K.


Figure 5: Datasets for traffic sign detection - Overall performance

Performance of datasets available to train semantic segmentation

Figure 6 represents the overall performance of the datasets available to train semantic segmentation: ImageNet, PASCAL VOC, Microsoft COCO, Cityscapes, Mapillary, ApolloScape, NuScenes, Berkeley DeepDrive, German Traffic Sign Recognition Benchmark, German Traffic Sign Detection Benchmark, Tsinghua-Tencent 100K, SYNTHIA, Playing for Data, Playing for Benchmarks, Waymo Open Dataset, KITTI, and VirtualKITTI.


Figure 6: Datasets for semantic segmentation - Overall performance

Performance of datasets available to train road detection

Figure 7 represents the overall performance of the datasets available to train road detection: German Traffic Sign Recognition Benchmark, German Traffic Sign Detection Benchmark, Tsinghua-Tencent 100K, and KITTI.


Figure 7: Datasets for road detection - Overall performance

Performance of datasets available to train lane detection tasks

Figure 8 represents the overall performance of the datasets available to train lane detection tasks: ApolloScape, German Traffic Sign Recognition Benchmark, German Traffic Sign Detection Benchmark, Tsinghua-Tencent 100K, Caltech Lanes Dataset, VPGNet Dataset, and KITTI.


Figure 8: Datasets for lane detection - Overall performance

Performance of datasets available to train tracking tasks

Figure 9 represents the overall performance of the datasets available to train tracking tasks: ApolloScape, Playing for Benchmarks, MOTChallenge, Caltech Pedestrian Detection, Argoverse, Waymo Open Dataset, KITTI, and VirtualKITTI.


Figure 9: Datasets for tracking - Overall performance

After having taken a look at all these graphs of popular datasets in computer vision, we can draw several conclusions. The average scores reflect the fact that each dataset was designed for one or a few specific tasks. Therefore, to develop a self-driving algorithm in all its variables and achieve a robust autonomous system's AI, many different datasets would be required.

Moreover, since each project is different, with different requirements and therefore different data needs, the case for generating customized datasets is clear. It begs the question: how are you going to get all the quality data you need to train and improve your specific deep learning model?

Stay tuned to explore how synthetic datasets are unlocking new possibilities in computer vision

Can synthetic data help address some of the shortcomings observed in datasets based on real data? Can it help improve diversity and reduce bias in datasets? Can it be built or customized to be more robust for a project's specific tasks? What volume of data is needed to achieve training efficiency equal to or greater than that of real-world data?

Stay tuned to get the answer to all these questions and learn more on how high-quality synthetic datasets for computer vision are accelerating progress in the visual perception field.

Learn why Anyverse's synthetic datasets are more accurate and comprehensive than other available datasets

About Anyverse

Anyverse™ is the hyperspectral synthetic data generation platform for advanced perception that accelerates the development of autonomous systems and state-of-the-art sensors, covering all data needs throughout the entire development cycle: from the initial stages of design or prototyping, through training and testing, to the final fine-tuning of the system to maximize its capabilities and performance.

Anyverse™ brings you different modules for scene generation, rendering, and sensor simulation, whether you are:
– Designing an advanced perception system
– Training, validating, and testing autonomous systems AI, or
– Enhancing and fine-tuning your perception system,

Anyverse™ is the right solution for you.
