The Issue of AV Training
With Anyverse, we’re helping our customers tackle some of the more challenging machine learning problems in enabling autonomous driving. Many teams collect real-life images at a very high cost in both money and time. Even datasets collected by driving millions of miles suffer from biases and statistical imbalances, which lead to less than optimal performance in the resulting models.
The Example of Traffic Lights
Let’s take the example of traffic signals: real-life datasets rarely capture a balanced mix of signal states across the desired range of distances, times of day, and color variations. Added to this is the problem of capturing difficult angles, obstructions, and sun positions. In many cases, traffic signal detection and classification suffer from unacceptably low confidence levels and low accuracy, with false positives and misses.
It’s been recognized in the literature that adding synthetic images when training machine learning models can improve accuracy. Many of the early studies achieved some improvement using images with low graphics quality (from game engines), which unfortunately introduce their own compromises and artifacts that carry over into the resulting model.
The Anyverse Approach
The Anyverse approach is to use high fidelity physics-based modeling and rendering to create true photorealistic images.
Our approach allows us to generate critical metadata (subject boxing, instance segmentation, and/or semantic segmentation) at the same time as the images themselves. This eliminates the slow, expensive, and error-prone post-capture tagging that real-world data requires. We can also model camera specifications, locations, and configurations.
In the case of traffic signals, we can generate a wide range of traffic signal models, varying the number of lights, arrows, and orientation, in a variety of settings to simulate the environment anywhere in the world.
To illustrate the potential power of this approach, we have been testing with two disparate pre-made machine learning models. One, ResNet50, has a wide range of classes that includes traffic signals. The other is an ImageNet-based model that included the LISA real-life traffic signal datasets. For the first test of Anyverse, we generated a sample dataset of three-stack traffic signals with 500 images for each of four classes: red, yellow, green, and off. The images vary in distance from the car, angle, sun position, signal color variation, and other environmental factors.
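As a rough illustration of what a class-balanced dataset specification might look like, the sketch below draws the same number of scene descriptions for each signal state while varying the other factors at random. The parameter names (`distance_m`, `view_angle_deg`, `sun_elevation_deg`) and their ranges are hypothetical placeholders chosen for this example, not actual Anyverse settings.

```python
import random
from collections import Counter

SIGNAL_STATES = ["red", "yellow", "green", "off"]

def build_sampling_plan(images_per_class, seed=0):
    """Return one scene description per image, balanced across signal states."""
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    plan = []
    for state in SIGNAL_STATES:
        for _ in range(images_per_class):
            plan.append({
                "state": state,
                # Hypothetical ranges, for illustration only.
                "distance_m": rng.uniform(5.0, 100.0),
                "view_angle_deg": rng.uniform(-45.0, 45.0),
                "sun_elevation_deg": rng.uniform(0.0, 90.0),
            })
    rng.shuffle(plan)  # avoid long runs of a single class
    return plan

plan = build_sampling_plan(images_per_class=500)
class_counts = Counter(scene["state"] for scene in plan)
```

Here `class_counts` holds 500 entries for each of the four states, which is the kind of balance that is hard to guarantee when collecting images on the road.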
For transfer learning, we first removed the last fully connected layer from each model and then trained the model as a softmax classifier on our sample dataset. From there we fine-tuned via backpropagation through the full network. We think this mirrors the approach that customers will take to prove the value of our machine-generated synthetic datasets. For the second test, we created a larger dataset of 5000 images of each class, this time at a higher resolution (4K). This dataset also included more variations than the first.
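The first step of that procedure, training a new softmax classifier on top of an existing network, can be sketched in plain Python. This is a simplified stand-in rather than our actual training code: the `toy_features` vectors are made-up substitutes for the features a frozen backbone would produce, the learning rate and epoch count are arbitrary, and the later fine-tuning pass through the full network is not shown.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(weights, biases, feature):
    """Linear layer: one score per class for a single feature vector."""
    return [sum(w * x for w, x in zip(row, feature)) + b
            for row, b in zip(weights, biases)]

def train_softmax_head(features, labels, num_classes, lr=0.5, epochs=200):
    """Train only the new classifier head with stochastic gradient descent."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for feature, label in zip(features, labels):
            probs = softmax(class_scores(weights, biases, feature))
            for c in range(num_classes):
                # Gradient of the cross-entropy loss w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * feature[j]
                biases[c] -= lr * grad
    return weights, biases

def predict(weights, biases, feature):
    """Return the index of the highest-scoring class."""
    scores = class_scores(weights, biases, feature)
    return max(range(len(scores)), key=scores.__getitem__)

# Made-up features standing in for frozen backbone outputs (illustration only).
CLASS_NAMES = ["red", "yellow", "green", "off"]
toy_features = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
toy_labels = [0, 1, 2, 3]
weights, biases = train_softmax_head(toy_features, toy_labels, len(CLASS_NAMES))
```

In a deep learning framework the same idea usually amounts to swapping the final layer for one with four outputs, freezing the remaining layers while the new layer trains, and then unfreezing them for the fine-tuning pass.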
To measure how performance improved, we tested against the Bosch traffic signal dataset after cleaning up some bad data in the set. We ran the base model, ResNet50, and our updated model, ResNet50+, on a subset of the Bosch traffic signal dataset using only red, yellow, and green lights in three-stack signals, since that’s what we generated initially. For each sample, we compared what the model identified as a traffic signal, along with its region of interest, to the ground truth data from the testing samples. We considered the match accurate if the regions overlapped by at least 50%, meaning the traffic signal counted as identified even if the model did not see 100% of it. We also required at least 50% confidence in our tests.
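The matching rule can be sketched as a short function. One assumption to flag: the sketch measures overlap as intersection over union (IoU), a common convention for comparing boxes, whereas the description above only specifies an overlap of at least 50%. The function names and the `(x1, y1, x2, y2)` box layout are likewise choices made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0

def is_match(pred_box, pred_confidence, truth_box, min_overlap=0.5, min_confidence=0.5):
    """A detection counts only if it is both confident enough and well aligned."""
    return pred_confidence >= min_confidence and iou(pred_box, truth_box) >= min_overlap
```

For example, `is_match((0, 0, 10, 10), 0.9, (0, 0, 10, 8))` passes, because the boxes overlap with an IoU of 0.8, while the same boxes with a confidence of 0.4 would be rejected.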
The base model, ResNet50, accurately matched traffic signals in the test data 78% of the time. The improved model, ResNet50+, on the other hand accurately matched traffic signals in the test data 89% of the time.
For the second test, we also used the cleaned-up Bosch traffic signal dataset for testing. The base model that included the LISA real traffic light dataset had an accuracy of 89%. Our transfer-learned model, based on our larger synthetic dataset, improved accuracy to 97%. In both tests, accuracy was extremely high for green lights, high for red lights, and just OK for yellow lights. All three, however, showed improvement over the base models.
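Breaking accuracy down by signal color, as in the comparison above, takes only a few lines. The helper below is a generic sketch and not our evaluation code; its name and the label strings are chosen for this example.

```python
from collections import Counter

def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}
```

Calling it with the ground truth labels and the model's predictions returns one accuracy value per color, which makes a weak class such as yellow visible even when the overall number looks good.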
These tests were made without fine-tuning or other additional approaches that others have found to improve accuracy, because we wanted a simple and clean comparison. We did start to run into some overfitting issues, which we avoided by limiting the number of training iterations.
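One common way to choose such a limit is to watch the loss on a held-out validation set and stop once it no longer improves. The sketch below shows that idea only; it is an assumption about how an iteration cap could be picked, not a description of how we set ours, and the `patience` parameter is hypothetical.

```python
def best_iteration(validation_losses, patience=3):
    """Return the 1-based iteration with the lowest validation loss,
    stopping the scan once `patience` iterations pass without improvement."""
    best_loss = float("inf")
    best_iter = 0
    for iteration, loss in enumerate(validation_losses, start=1):
        if loss < best_loss:
            best_loss, best_iter = loss, iteration
        elif iteration - best_iter >= patience:
            break  # validation loss has stalled, so further training risks overfitting
    return best_iter
```

Training would then be capped at the returned iteration, where the model still generalizes to data it has not seen.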
More to Come
Stay tuned: we plan to provide more details in a white paper and a future blog post regarding this and other demonstrations of the potential of our approach. We’re confident and excited to bring dramatic improvement to machine learning models with our approach. We will also continue to add more variations to our datasets, showing both the value of the approach and the improved accuracy. To deal with any overfitting issues, we will also add other classes from road-based scenes, including traffic signs, cars, people, lane lines, crosswalks, curbs, and all sorts of minor elements found in such scenes.