The objective of our first joint research project is to compare the performance and results of an autonomous driving AI model when it is trained and validated with real-world data versus highly accurate synthetic data. We will also explore the impact of faithful sensor simulation on overall system performance.
Can you use synthetic data to develop a trustworthy autonomous driving system? - Project’s logbook
Every week, we will share a new chapter of this project: a logbook written by the researchers, so that readers can enjoy this journey just as we do.
Enjoy the read!
CHAPTER 1
Author
Gabriele Baris, PhD student at WMG, University of Warwick
Perception Tasks
We are interested in perception tasks based on camera data for assisted and automated driving. Based on that, we have identified a list of fundamental tasks:
- Object detection
- Segmentation
- Instance segmentation
- Tracking
- Optical flow
- Dynamic occupancy grid mapping
Given the expertise of the WMG sensor group, we suggest starting with object detection. Object detection is a key perception task, and many other tasks and functions build on it.
Depending on the outcomes for object detection, and on the project's development and timing, we will evaluate the possibility of investigating other tasks, such as instance or panoptic segmentation.
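To make the starting point concrete, below is a minimal sketch of running a pretrained 2D object detector on a single camera frame. It assumes a Python environment with PyTorch and torchvision available; the Faster R-CNN model and the file name are illustrative placeholders, not the detector(s) that will actually be selected for the project.

```python
# Minimal sketch: off-the-shelf 2D object detection on one camera frame.
# Assumes PyTorch + torchvision; model and file name are illustrative only.
import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Load an image as a float tensor in [0, 1]; "frame_000000.png" is a placeholder path.
image = convert_image_dtype(read_image("frame_000000.png"), torch.float)

with torch.no_grad():
    predictions = model([image])[0]  # dict with 'boxes', 'labels', 'scores'

keep = predictions["scores"] > 0.5  # simple confidence threshold for illustration
print(predictions["boxes"][keep], predictions["labels"][keep])
```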
Datasets
One of the objectives of the project is to compare Anyverse virtual data to real data. For this reason, we shortlisted and analyzed some of the most famous automated driving datasets, listed below. An extensive list of available datasets can be found in [1].
- Audi Autonomous Driving Dataset (A2D2)
- BDD100K
- CityScapes
- KITTI
- KITTI-360
- nuScenes
Since we aim to realistically simulate virtual camera data and compare the results to a real-world dataset, we need to know the camera parameters (e.g., horizontal and vertical fields of view, sensor specs, lens specs, …), which are not always available. The datasets providing the most complete information are A2D2 [2] and KITTI [3].
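As a small illustration of why these parameters matter when configuring a virtual camera, the sketch below derives horizontal and vertical fields of view from a focal length and sensor dimensions using the pinhole camera model. The numbers are nominal placeholders (a generic 1/2-inch sensor with a 4 mm lens), not values taken from any of the datasets above.

```python
# Sketch: pinhole-camera field of view from sensor size and focal length.
# Sensor dimensions and focal length below are illustrative placeholders.
import math

def field_of_view_deg(sensor_size_mm: float, focal_length_mm: float) -> float:
    """FOV along one axis of a pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(sensor_size_mm / (2.0 * focal_length_mm)))

h_fov = field_of_view_deg(sensor_size_mm=6.4, focal_length_mm=4.0)  # nominal sensor width
v_fov = field_of_view_deg(sensor_size_mm=4.8, focal_length_mm=4.0)  # nominal sensor height
print(f"HFOV ≈ {h_fov:.1f}°, VFOV ≈ {v_fov:.1f}°")
```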
Thus, we suggest starting with KITTI (one of the most famous datasets available, with more than 8.5K citations to date) and then exploring other datasets in the future, if needed. This decision is also based on the facts presented in the following section.
KITTI
The whole dataset contains 6 hours of recorded traffic in different scenarios (from freeways to rural areas). Data is calibrated, synchronized and timestamped, providing both rectified and raw image sequences.
The sensor payload consists of:
- 2 × Point Grey Flea 2 grayscale cameras (FL2-14S3M-C), 1.4 Megapixels, 1/2” Sony ICX267 CCD, global shutter
- 2 × Point Grey Flea 2 color cameras (FL2-14S3C-C), 1.4 Megapixels, 1/2” Sony ICX267 CCD, global shutter
- 4 × Edmund Optics lenses, 4 mm, opening angle ∼ 90°, vertical opening angle of region of interest (ROI) ∼ 35°
- 1 × Velodyne HDL-64E rotating 3D laser scanner, 10 Hz, 64 beams, 0.09° angular resolution, 2 cm distance accuracy, collecting ∼ 1.3 million points/second, field of view: 360° horizontal, 26.8° vertical, range: 120 m
- 1 × OXTS RT3003 inertial and GPS navigation system, 6-axis, 100 Hz, L1/L2 RTK, resolution: 0.02 m / 0.1°
Considering the camera frames, the KITTI dataset for object detection consists of 7481 training images (labelled) and 7518 testing images (not labelled). The testing set is used only to compute performance for the KITTI leaderboard, since its ground truth is not provided. The training set labels are distributed as follows:

Here, ‘DontCare’ labels denote regions in which objects have not been labelled, for example because they were too far away from the laser scanner. To prevent such objects from being counted as false positives, the KITTI evaluation script ignores objects detected in don’t care regions of the test set.
You can also use the don’t care labels in the training set to keep your object detector from harvesting hard negatives from those areas, in case you treat non-object regions of the training images as negative examples.
[…]
The goal in the object detection task is to train object detectors for the classes ‘Car’, ‘Pedestrian’, and ‘Cyclist’.
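For the later evaluation, detections are matched to ground-truth boxes by intersection-over-union (IoU); the KITTI benchmark [3] requires roughly 70% overlap for cars and 50% for pedestrians and cyclists. Below is a minimal, self-contained sketch of the standard axis-aligned 2D IoU computation, using an (x1, y1, x2, y2) pixel convention chosen here for illustration.

```python
# Sketch: axis-aligned 2D bounding-box IoU; boxes are (x1, y1, x2, y2) in pixels.
def iou(box_a, box_b) -> float:
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # width of the overlap, if any
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))  # height of the overlap, if any
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((100, 100, 200, 200), (150, 120, 260, 220)))  # toy example
```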
We suggest that, as part of the project, the number of virtual images should be similar to that of KITTI, with a roughly similar distribution of object classes. This will allow us to assess the performance of synthetic vs. real data for training the selected object detector(s).
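As a first step towards matching that distribution, one could simply count class occurrences in the KITTI training labels. The sketch below assumes the usual devkit folder layout (one text file per image under a label_2 directory, with the object class as the first field of each line); the directory path is a placeholder.

```python
# Sketch: class frequencies in KITTI-style training labels (one .txt per image,
# first whitespace-separated field of each line is the class, e.g. 'Car', 'DontCare').
from collections import Counter
from pathlib import Path

def count_kitti_labels(label_dir: str) -> Counter:
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            if line.strip():
                counts[line.split()[0]] += 1
    return counts

print(count_kitti_labels("training/label_2"))  # placeholder path to the KITTI labels
```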
References
[1] Laflamme CÉ, Pomerleau F, Giguere P. Driving datasets literature review. arXiv preprint arXiv:1910.11968. 2019 Oct 26.
[2] Geyer J, Kassahun Y, Mahmudi M, Ricou X, Durgesh R, Chung AS, Hauswald L, Pham VH, Mühlegg M, Dorn S, Fernandez T. A2D2: Audi Autonomous Driving Dataset. arXiv preprint arXiv:2004.06320. 2020 Apr 14.
[3] Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012 (pp. 3354-3361). IEEE.
Read chapter 2 >>>