You can’t afford to use inaccurate data to train autonomous driving AI


As if training and validating autonomous vehicle systems weren’t already hard enough… inaccurate data can make your life even more difficult.

Synthetic data is not a game

No one will think you’re crazy anymore when you talk about your synthetic data generation pipeline for autonomous driving development. Physically correct synthetic data, alone or in combination with real data, is being used to enhance real-world datasets, create unlimited scene variations, generate automatic annotations and ground truth data, and, ultimately, reproduce the real world.

But why is there still a lack of trust in synthetic data? Or, put another way, why is there a lack of trust in synthetic data that is not completely accurate?

It’s all about accuracy - Inaccurate synthetic data may make your AI weak

Not all synthetic data is created equal, and not all synthetic images can be considered “correct” or accurate from a photometric point of view.

Data generation solutions and real-time graphics engines are often designed to generate as many images as possible, as quickly as possible… leaving aside many important aspects of reality, such as the physics of light, lenses, and materials. All of these matter if an AI is to understand and interpret the real world and act accordingly.

In addition, these images are configured to be displayed on a low dynamic range device, such as a computer screen or TV, and to be viewed by humans. It’s fair to say that, at the end of the day, they are just “pretty” images: qualitatively valid, but not an accurate simulation of the behavior of a specific optical system or a specific camera sensor…
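
To make this concrete, here is a minimal Python sketch of what a display-oriented encoding does to scene radiance. The function name, the white point, and the gamma value are illustrative assumptions, not taken from any specific engine. Every value above the display white point collapses to the same 8-bit code, even though a sensor with a wider dynamic range might still tell those values apart.

```python
def encode_for_display(radiance, white_point=1.0, gamma=2.2):
    """Map a linear scene radiance to an 8-bit display code.

    Hypothetical, simplified display pipeline: clip, gamma-encode, quantize.
    """
    # Anything brighter than the white point is clipped to 1.0.
    normalized = min(max(radiance / white_point, 0.0), 1.0)
    # Gamma encoding followed by quantization to 8 bits.
    return round(255 * normalized ** (1.0 / gamma))


# Relative linear radiances (made-up values): shadow, mid-grey,
# white wall, headlight, sun glint.
scene = [0.02, 0.18, 1.0, 40.0, 5000.0]
codes = [encode_for_display(r) for r in scene]
print(codes)
```

In this sketch the white wall, the headlight, and the sun glint all end up with the same code, although their radiances differ by orders of magnitude. That difference is exactly the kind of information a perception model never sees when it is trained on display-ready images.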

This means that treating these images as appropriate (from a photometric point of view) for training an autonomous transport perception system, one that demands the highest accuracy standards and has minimal risk tolerance, would be (if you’ll forgive the repetition) too risky…

It's also about sensors

The next-generation sensors being implemented in autonomous vehicles produce, and will continue to produce, data that increasingly differs from a human-based visual model. Examples include operation in very low or very bright lighting conditions, sensors that adapt to the lighting environment, new spectral filters that enhance object and material detection, and infrared vision.

So it doesn’t seem unreasonable to believe that data that is not highly accurate, designed for qualitative human experiences, will be less and less useful for autonomous driving development and advanced sensors.

You want to train your autonomous driving AI models with data coming from the same sensors your system is going to use. That’s why, when you use synthetic data for training, you want to simulate those sensors as faithfully as possible and produce data that is close to what the model is going to consume once deployed. This reduces the domain shift and helps your system generalize better to the real world.
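
As a rough illustration of what simulating a sensor involves, the Python sketch below models a single pixel with shot noise, read noise, saturation, and analog-to-digital quantization. It is a toy model under stated assumptions: the function name and every parameter value (quantum efficiency, read noise, full-well capacity, bit depth) are hypothetical, and shot noise is approximated with a Gaussian rather than a true Poisson draw. It is not a description of the Anyverse™ pipeline.

```python
import random


def simulate_pixel(photons, quantum_efficiency=0.6, read_noise_e=2.0,
                   full_well_e=10_000, bit_depth=12, rng=random):
    """Toy camera pixel model; all parameter values are illustrative assumptions."""
    mean_electrons = photons * quantum_efficiency
    # Shot noise: Gaussian approximation of Poisson statistics (variance = mean).
    electrons = rng.gauss(mean_electrons, mean_electrons ** 0.5)
    # Read noise added by the sensor electronics, in electrons.
    electrons += rng.gauss(0.0, read_noise_e)
    # The pixel cannot hold fewer than zero or more than full-well electrons.
    electrons = min(max(electrons, 0.0), full_well_e)
    # The ADC maps the electron count to an integer digital number.
    max_code = 2 ** bit_depth - 1
    return round(electrons / full_well_e * max_code)


rng = random.Random(0)  # fixed seed so the run is repeatable
dark = [simulate_pixel(50, rng=rng) for _ in range(5)]
bright = [simulate_pixel(1_000_000, rng=rng) for _ in range(5)]
print("dark pixel samples:  ", dark)
print("bright pixel samples:", bright)
```

The dark samples fluctuate from draw to draw, while the bright ones all saturate at the maximum code. A model trained on noise-free, display-encoded images sees neither effect, which is one simple way to picture the domain shift described above.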

Safety is not an option

We are not arguing that no system can be trained without photometric, hyperspectral, pixel-accurate synthetic data; less accurate data may be sufficient for less demanding autonomous applications. But training AV and ADAS with that type of data increases the risk of overfitting, of introducing information bias, and of something that autonomous human transportation can’t afford… a system that may not understand or be able to interpret the images coming from the real sensors and optical systems.

About Anyverse™

Anyverse™ helps you continuously improve your deep learning perception models and reduce your system’s time to market by applying new Software 2.0 processes. Our synthetic data production platform allows us to provide high-fidelity, accurate, and balanced datasets. Combined with a data-driven iterative process, this helps you reach the required model performance.

With Anyverse™, you can accurately simulate any camera sensor and decide which one will perform better with your perception system. No more complex and expensive experiments with real devices, thanks to our state-of-the-art photometric pipeline.

Need to know more?

Visit our website anytime, or our LinkedIn, Instagram, and Twitter profiles.

