Hamid Serry, WMG Graduate Trainee, University of Warwick
The largest class is Car, with 28742 labeled objects, followed by the DontCare and Pedestrian classes in second and third place respectively.
We are currently evaluating which classes to use for training and inference with the selected deep neural network. One idea is to merge the Pedestrian and Person_sitting classes, as the difference between the two is not significant for our current purposes. Another is to exclude the DontCare category, which denotes regions of the frames where objects have not been labeled and could therefore confuse the model training process.
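The class merging and exclusion described above could be sketched as follows. This is only an illustration and not the project's actual code: the function names `remap_class` and `remap_label_lines` are hypothetical, and the only property of the KITTI label files it relies on is that the class name is the first whitespace-separated field of each line.

```python
from typing import List, Optional

# Assumed policy, taken from the text above: merge Person_sitting into
# Pedestrian and drop DontCare regions entirely.
MERGE = {"Person_sitting": "Pedestrian"}
EXCLUDE = {"DontCare"}


def remap_class(name: str) -> Optional[str]:
    """Return the class name to train on, or None if the object is dropped."""
    if name in EXCLUDE:
        return None
    return MERGE.get(name, name)


def remap_label_lines(lines: List[str]) -> List[str]:
    """Apply the remapping to label lines whose first field is the class name."""
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        new_name = remap_class(fields[0])
        if new_name is not None:
            kept.append(" ".join([new_name] + fields[1:]))
    return kept
```

Keeping the policy in two small lookup tables makes it easy to revisit the decision later, for example to keep Person_sitting as its own class.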
Figure 1: Distribution of object classes in the KITTI original training set
We then split the original KITTI dataset into three subsets: 70% of the images were reserved for training the neural network(s) (NNs), 20% for validating the training, and the final 10% for testing the resulting NN model.
The split was computed with a small piece of code that makes each subset resemble the original statistical distribution of the 9 classes, as illustrated in Figure 2 below. For example, the testing dataset (748 images) will contain circa 2874 Cars, 291 Vans, 22 Person_sitting, etc. Hereafter we refer to these three subsets as the Anyverse training, validation, and testing datasets.
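The code used for the split is not shown here, so the following is only one possible way to obtain subsets whose class distributions resemble the original. It assumes a hypothetical input `counts` mapping each image id to its per-class object counts, tries several random 70/20/10 shuffles, and keeps the one whose per-class totals are closest to the expected share. All names and the error measure are illustrative choices.

```python
import random
from collections import Counter


def class_totals(image_ids, counts):
    """Sum the per-class object counts over a set of images."""
    total = Counter()
    for image_id in image_ids:
        total.update(counts[image_id])
    return total


def split_error(subsets, counts, overall, n_images):
    """Sum of relative gaps between each subset's class totals and its expected share."""
    err = 0.0
    for ids in subsets:
        share = len(ids) / n_images
        totals = class_totals(ids, counts)
        for cls, n in overall.items():
            if n == 0:
                continue  # class never occurs, nothing to match
            err += abs(totals[cls] - share * n) / n
    return err


def best_random_split(counts, fractions=(0.7, 0.2, 0.1), trials=200, seed=0):
    """Return ((train, val, test), error) for the best of `trials` random shuffles."""
    rng = random.Random(seed)
    ids = sorted(counts)
    n = len(ids)
    overall = class_totals(ids, counts)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    best, best_err = None, float("inf")
    for _ in range(trials):
        rng.shuffle(ids)
        subsets = (ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:])
        err = split_error(subsets, counts, overall, n)
        if err < best_err:
            best, best_err = subsets, err
    return best, best_err
```

Rare classes such as Person_sitting dominate the relative error, which is intentional in this sketch: they are the ones most likely to end up badly distributed in a plain random split.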
Once split, the Anyverse training set (5236 images) was investigated further as this is the set of data that we aim to mimic first with generated data using Anyverse tools.
Figure 2: Distribution of the relative frequency of object classes used to generate the Anyverse training, validation and testing sub-sets from KITTI
Moreover, the frequency distribution of each individual class can be obtained from the Anyverse training dataset, as shown in Figure 3. This knowledge should allow the synthetic data to be composed so that it matches each specific class distribution.
Notably, every class has its highest frequency at 0. The dataset aims to cover a wide enough variety of objects, so most images contain some of the object classes, but usually not all of the classes are captured in any given frame.
Figure 3: Individual frequency distribution within object classes versus number of objects on x-axis
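A per-class distribution of this kind can be computed by counting, for each image, how many objects of a given class it contains, and then counting how many images share each value. The sketch below assumes a hypothetical `image_counts` list holding one dictionary of per-class counts per image, where a class missing from an image counts as 0.

```python
from collections import Counter


def per_image_histogram(image_counts, cls):
    """Map 'number of objects of cls in an image' to 'number of such images'."""
    return Counter(c.get(cls, 0) for c in image_counts)
```

For example, `per_image_histogram(image_counts, "Car")[0]` would give the number of images that contain no Car at all.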
The mean and standard deviation of the number of objects per image were then calculated for each class, to aid the generation of comparable images (Table 1).
Some classes, such as Person_sitting, have a very low mean and standard deviation (0.02998 and 0.304 respectively, as can be seen from Table 1), and it might be difficult to reproduce these values in a virtual dataset. The Car class has a higher mean frequency of 3.87, but its standard deviation is also large, at circa 3.
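As a rough illustration of how such per-class statistics can be computed, the sketch below again assumes a hypothetical `image_counts` list with one dictionary of per-class counts per image. It uses the population standard deviation; whether the values in Table 1 use the population or the sample form is not stated, so that choice is an assumption.

```python
import statistics


def class_mean_std(image_counts, cls):
    """Mean and population standard deviation of objects of cls per image."""
    # Images without the class contribute a 0, which is what pulls the
    # mean of rare classes such as Person_sitting close to zero.
    values = [c.get(cls, 0) for c in image_counts]
    return statistics.mean(values), statistics.pstdev(values)
```

Swapping `statistics.pstdev` for `statistics.stdev` gives the sample form; with several thousand images the difference is negligible.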
As already mentioned, the aim of the project is to assess if it is possible to use images coming from a high fidelity simulator to train a neural network. To correctly simulate the camera data and compare the results with the real-world dataset, namely in our case KITTI, it is important to know all the available details for the camera sensors used for acquiring the data.
We further investigated the sensor used to collect the images and the pre-processing steps applied to them. The images are captured by a Point Grey FL2-14S3C-C camera, using a Sony ICX267 imager. According to the KITTI website: “The camera images are cropped to a size of 1382 x 512 pixels using libdc’s format 7 mode. After rectification, the images get slightly smaller”.
Deep Neural Networks for object detection
This week, research into different potential Neural Network architectures for object detection has also taken place. Three broad categories, listed below, have been identified and they will be used in further research in the upcoming week to give an in-depth analysis into which machine learning model will fit with our intended use case.