The Issue of AV Training
The Example of Traffic Lights
The literature recognizes that augmenting machine learning models with synthetic images improves accuracy. Many early studies achieved some improvement using low-fidelity images from game engines, which unfortunately introduce their own compromises and artifacts that can plague the resulting model.
The Anyverse Approach
The Anyverse approach is to use high-fidelity, physics-based modeling and rendering to create truly photorealistic images.
Our approach generates critical metadata (bounding boxes, instance segmentation, and/or semantic segmentation) concurrently with the images.
This eliminates the slow, expensive, and error-prone post-capture tagging that plagues the use of real-world data. We can also model camera specifications, locations, and configurations.
In the case of traffic signals, we can generate a wide range of traffic signal models, varying the number of lights, arrows, and orientation, in a variety of settings that simulate environments anywhere in the world.
To illustrate the potential power of this approach, we have been testing with two different pretrained machine learning models. One, ResNet50, has a wide range of classes that includes traffic signals. The other is an ImageNet-based model that also included the LISA real-world traffic signal datasets. For the first test of Anyverse, we generated a sample dataset of three-stack traffic signals with 500 images for each of four classes: red, yellow, green, and off.
The images vary in distance from the car, viewing angle, sun position, signal color, and other environmental factors.
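To make the idea of a balanced, varied dataset concrete, here is a minimal sketch of a scene manifest with the same number of entries per class. It is not the Anyverse pipeline or API: the `SceneSpec` fields, the value ranges, and the `build_manifest` helper are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass
from typing import List

# The four classes named in the text.
SIGNAL_STATES = ("red", "yellow", "green", "off")


@dataclass(frozen=True)
class SceneSpec:
    """One image to render. Every field and range here is an assumption."""
    state: str
    distance_m: float         # distance from the car to the signal
    view_angle_deg: float     # horizontal angle between car heading and signal
    sun_elevation_deg: float  # sun height above the horizon
    sun_azimuth_deg: float    # sun compass direction


def build_manifest(images_per_class: int = 500, seed: int = 0) -> List[SceneSpec]:
    """Sample the same number of scenes for every class so the set is balanced."""
    rng = random.Random(seed)
    manifest = []
    for state in SIGNAL_STATES:
        for _ in range(images_per_class):
            manifest.append(SceneSpec(
                state=state,
                distance_m=rng.uniform(5.0, 80.0),         # made-up range
                view_angle_deg=rng.uniform(-30.0, 30.0),   # made-up range
                sun_elevation_deg=rng.uniform(0.0, 90.0),
                sun_azimuth_deg=rng.uniform(0.0, 360.0),
            ))
    rng.shuffle(manifest)  # avoid class-ordered rendering and training
    return manifest


if __name__ == "__main__":
    specs = build_manifest()
    print(len(specs), "scenes;", "first:", specs[0])
```

Sampling each class separately, rather than drawing the class at random, keeps the class counts exactly equal regardless of the seed.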
For transfer learning, we first removed the last fully connected layer from each model and trained a softmax classifier in its place with our sample dataset. From there we fine-tuned via backpropagation through the full network. We think this mirrors the approach that customers will take to prove the value of our machine-generated synthetic datasets.
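The first step described above, fitting a new softmax layer on top of a frozen network, can be sketched without any framework. This is a toy illustration under stated assumptions, not the code used in the tests: the feature vectors are made-up stand-ins for the activations a network such as ResNet50 would produce once its last layer is removed, all function names are hypothetical, and the later fine-tuning pass through the full network is not shown. In practice this would be done in a deep learning framework.

```python
import math
import random
from typing import List, Sequence, Tuple

CLASSES = ("red", "yellow", "green", "off")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head_logits(weights, biases, x) -> List[float]:
    """Linear layer: one logit per class for a single feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def train_softmax_head(features, labels, num_classes,
                       max_iterations=200, learning_rate=0.5):
    """Fit a new softmax layer by batch gradient descent on cross-entropy.

    The network that produced `features` is treated as frozen, so only the
    new layer's weights and biases change.
    """
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    n = len(features)
    for _ in range(max_iterations):  # fixed cap on training iterations
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(head_logits(weights, biases, x))
            for k in range(num_classes):
                err = probs[k] - (1.0 if k == y else 0.0)  # dLoss/dLogit_k
                grad_b[k] += err
                for j in range(dim):
                    grad_w[k][j] += err * x[j]
        for k in range(num_classes):
            biases[k] -= learning_rate * grad_b[k] / n
            for j in range(dim):
                weights[k][j] -= learning_rate * grad_w[k][j] / n
    return weights, biases


def predict(weights, biases, x) -> int:
    """Index of the class with the highest logit."""
    logits = head_logits(weights, biases, x)
    return max(range(len(logits)), key=lambda k: logits[k])


def toy_backbone_features(per_class=20, seed=0) -> Tuple[list, list]:
    """Made-up stand-in for frozen activations: one noisy cluster per class."""
    rng = random.Random(seed)
    features, labels = [], []
    for label in range(len(CLASSES)):
        for _ in range(per_class):
            features.append([(1.0 if j == label else 0.0) + rng.gauss(0.0, 0.1)
                             for j in range(len(CLASSES))])
            labels.append(label)
    return features, labels


if __name__ == "__main__":
    feats, labs = toy_backbone_features()
    w, b = train_softmax_head(feats, labs, num_classes=len(CLASSES))
    hits = sum(predict(w, b, x) == y for x, y in zip(feats, labs))
    print("training accuracy:", hits / len(labs))
```

The `max_iterations` argument is the only regularization in this sketch; a real run would also hold out validation data.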
For the second test, we created a larger dataset of 5,000 images per class, this time at 4K resolution. This dataset also included more variations than the first.
To measure how performance improved, we tested against the Bosch traffic signal dataset after cleaning up some bad data in the set. We ran the base model, ResNet50, and our updated model, ResNet50+, on a subset of the Bosch dataset containing only red, yellow, and green lights in three-stack signals, since that's what we generated initially. For each sample, we compared what the model identified as a traffic signal, and its region of interest, to the ground truth data from the testing samples.
We considered a match accurate if the matching region overlapped the ground truth by at least 50%, meaning the traffic signal was identified even if the region did not cover 100% of it. We also required a confidence of at least 50% in our tests.
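The matching rule above can be sketched as a small function. The text does not say exactly how overlap was measured, so this sketch assumes intersection-over-union (IoU), a common choice for comparing boxes; the box format and the function names are likewise assumptions, and class labels are left out for simplicity.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Width and height clamp to zero when the boxes do not intersect.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_accurate_match(pred_box: Box, confidence: float, truth_box: Box,
                      min_overlap: float = 0.5,
                      min_confidence: float = 0.5) -> bool:
    """True when the detection is confident enough and overlaps the truth."""
    return confidence >= min_confidence and iou(pred_box, truth_box) >= min_overlap


if __name__ == "__main__":
    truth = (100.0, 50.0, 120.0, 110.0)
    detection = (102.0, 55.0, 122.0, 112.0)
    print(iou(detection, truth), is_accurate_match(detection, 0.8, truth))
```

Both thresholds default to the 50% values quoted in the text and can be tightened to see how sensitive the reported accuracy is to the matching rule.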
The base model, ResNet50, accurately matched traffic signals in the test data 78% of the time. The improved model, ResNet50+, did so 89% of the time.
For the second test, we also used the cleaned-up Bosch traffic signal dataset for testing. The base model that included the LISA real traffic light dataset had an accuracy of 89%. Our transfer-learned model, based on our larger synthetic dataset, improved accuracy to 97%. In both tests, the accuracy of green lights was extremely high, red lights were high, and yellow lights were just OK.
All three classes, however, showed improvement over the base models.
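One way to read these figures is as a reduction in errors rather than a gain in accuracy points. The short calculation below assumes the error rate is simply one minus the reported accuracy, which the text does not state explicitly; the function name is illustrative.

```python
def relative_error_reduction(base_accuracy: float, new_accuracy: float) -> float:
    """Fraction of the base model's errors removed by the new model."""
    base_error = 1.0 - base_accuracy
    new_error = 1.0 - new_accuracy
    return (base_error - new_error) / base_error


if __name__ == "__main__":
    # Accuracy figures quoted in the text for the two tests.
    first = relative_error_reduction(0.78, 0.89)   # 22% errors down to 11%
    second = relative_error_reduction(0.89, 0.97)  # 11% errors down to 3%
    print(f"first test:  {first:.0%} fewer errors")
    print(f"second test: {second:.0%} fewer errors")
```

Under that assumption, the first test halves the error rate and the second removes roughly three quarters of the remaining errors, though the two tests start from different base models and are not directly comparable.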
We ran these tests without fine-tuning or other additional approaches that others have found to improve accuracy, because we wanted a simple and clean comparison. We did start to run into some overfitting, which we avoided by limiting the number of training iterations.
More to Come
Stay tuned: we plan to provide more details in a white paper and a future blog post on this and other demonstrations of the potential of our approach. We're confident and excited to bring dramatic improvement to machine learning models. We will also continue to add more variations to our datasets, showing both the value of the approach and the improved accuracy. To deal with any overfitting issues, we will also add other classes from road-based scenes, including traffic signs, cars, people, lane lines, crosswalks, curbs, and all sorts of minor elements found in such scenes.
Anyverse™ helps you continuously improve your deep learning perception models and reduce your system's time to market by applying new software 2.0 processes. Our synthetic data production platform allows us to provide high-fidelity, accurate, and balanced datasets. Along with a data-driven iterative process, we can help you reach the required model performance.
With Anyverse™ you can accurately simulate any camera sensor and decide which one will perform better with your perception system. No more complex and expensive experiments with real devices, thanks to our state-of-the-art photometric pipeline.