Accurate data for autonomous driving: Everything you need to know


Insights series

What exactly does data accuracy mean in the context of training, testing, and validating an autonomous driving system? What does it take for one dataset to be more accurate than another? And why could a lack of data accuracy kill and bury your self-driving project?

Feeding your deep learning model with highly accurate synthetic data becomes imperative for those developers who want to stand out from the crowd and develop an autonomous vehicle capable of recognizing, analyzing, and interacting safely with the beautiful and unpredictable real world.

These and more questions are going to be cleared up soon in our new insight series about data accuracy:

Real-world data is probably what first came to mind when you tried to answer what accurate data should be. And you are right, real-world data is perfect for our good ol’ seasoned human eyes. However, that doesn’t mean real-world data is always the most convenient source of information to train artificial intelligence. Three variables are absolutely key: “who” is interpreting reality (perception systems), “how” they visualize it (the sensor stack), and “what” they need to learn (the AI behind it).

1. The need for pixel-accurate synthetic data for autonomous driving perception, development & validation

The advanced perception and AV/ADAS industry is implementing a new generation of sensors and optical systems that perceive the world in very different ways than humans do. With even more sensing approaches expected in the coming years, synthetic data is going to be unequivocally needed for designing, training, calibrating, validating, and, ultimately, upgrading sensors and perception systems alike. But not just any synthetic data… Synthetic data capable of faithfully and accurately simulating this new generation of sensors.

Now that we are beginning to understand why data accuracy is key to successfully combining perception systems with new-generation sensors, it’s time to emphasize another important matter. Data accuracy has nothing to do with the concept of photorealism that we commonly attach to the images generated by off-the-shelf, real-time computer graphics engines.

Don’t get me wrong, these engines can produce beautiful, flashy images that perfectly fit the requirements of other, less complex applications, but they don’t provide the precision needed to achieve the data accuracy required for developing, for instance, human-safe and trustworthy, fully autonomous transport based on artificial perception.

Safety is a sine qua non that AV/ADAS developers must commit to if they don’t want to end up working in a quicksand paradigm…

2. You can’t afford to use inaccurate data to train autonomous driving AI

Data generation solutions and real-time graphics engines are often designed to generate as many images as possible, as quickly as possible… leaving aside many important aspects of reality such as the physics of light, lenses, and materials, all of which matter for AI to understand and interpret the real world and act accordingly.

This means that regarding these images as appropriate (from a photometric point of view) for training an autonomous transport perception system that requires the highest accuracy standards and minimal risk tolerance would be (if you’ll forgive the repetition) too risky…

In addition to the above, the images and data generated by real-time engines are also configured to be displayed on a low dynamic range device such as a computer screen or TV and, ultimately, to be viewed by humans. It’s fair to say that, at the end of the day, these images are just “pretty” images: images that are qualitatively valid but don’t accurately simulate the behavior of a specific optical system or a specific camera sensor…
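To make the low dynamic range point concrete, here is a minimal sketch in Python. It is not any engine's or vendor's actual pipeline: the gamma value, the clipping behavior, the 12-bit linear sensor, its `full_scale` value, and the scene radiances are all hypothetical numbers chosen for illustration. It only shows how two very different brightness levels can collapse into the same 8-bit display value while a wider-range linear encoding keeps them apart.

```python
def display_encode(radiance, exposure=1.0, gamma=2.2):
    """Display-oriented encoding (assumed): expose, clip to [0, 1],
    gamma-encode, then quantize to 8 bits."""
    v = min(max(radiance * exposure, 0.0), 1.0)
    return round((v ** (1.0 / gamma)) * 255)


def linear_sensor_encode(radiance, full_scale=16.0, bits=12):
    """Hypothetical linear sensor: response proportional to radiance
    up to `full_scale`, quantized to `bits` bits."""
    max_code = (1 << bits) - 1
    v = min(max(radiance / full_scale, 0.0), 1.0)
    return round(v * max_code)


# Relative scene radiances in arbitrary units (illustrative values only).
scene = {"shadow": 0.02, "midtone": 0.5, "sky": 2.0, "headlight": 8.0}

for name, radiance in scene.items():
    print(f"{name:10s} display={display_encode(radiance):4d} "
          f"sensor={linear_sensor_encode(radiance):5d}")
```

In this toy setup, the sky and the headlight both saturate the display encoding, so an image made for a screen can no longer tell them apart, whereas the assumed sensor model still separates them. A real sensor simulation would also have to model noise, the color filter array, optics, and the ISP, none of which is attempted here.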

We are not arguing that some systems couldn’t be trained without photometric, hyperspectral, pixel-accurate synthetic data. Less accurate data may be sufficient for less demanding autonomous applications. But training AV and ADAS systems with that type of data only increases the risk of overfitting, of introducing information bias, and of something that autonomous human transportation can’t afford… a system that may not understand or be able to interpret the images coming from the real sensors and optical systems.

3. Seeking ground truth data generation… not going to happen using human annotators…

Human annotators have been a widely used resource to annotate real-world datasets, but why do they represent a serious risk and an obstacle to generating ground truth data?

To answer this question, let’s make a direct comparison between real-world data and synthetic data:

The most sophisticated AI algorithms, such as the ones recently used in autonomous driving applications, and especially those dealing with long-range detection, need to reach pixel-level accuracy. There is no margin for error at this point and human annotators (for obvious reasons) can’t (and shouldn’t) take on this task.

Leaving aside the fact that it is impossible to annotate real-world data at pixel level, why apply an error-prone human methodology to train systems that are more accurate than humans?
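A bit of arithmetic shows why long-range detection pushes annotation toward pixel-level accuracy. The sketch below assumes an ideal pinhole camera with no lens distortion; the image width (1920 px), horizontal field of view (60°), object width (0.5 m), and the one-pixel edge error are hypothetical values picked for illustration, not figures from any specific sensor.

```python
import math


def focal_length_px(image_width_px, hfov_deg):
    """Focal length in pixels for an ideal pinhole camera."""
    return (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)


def projected_width_px(object_width_m, distance_m, focal_px):
    """Width in pixels of a fronto-parallel object at a given distance."""
    return focal_px * object_width_m / distance_m


def relative_width_error(width_px, edge_error_px=1.0):
    """Worst-case relative width error when each of the two edges
    is misplaced by `edge_error_px` pixels."""
    return 2.0 * edge_error_px / width_px


f_px = focal_length_px(image_width_px=1920, hfov_deg=60.0)

for distance_m in (20.0, 200.0):
    width = projected_width_px(0.5, distance_m, f_px)
    error = relative_width_error(width)
    print(f"{distance_m:6.0f} m: {width:6.2f} px wide, "
          f"1 px edge error = {error:.0%} of the width")
```

Under these assumptions, a 0.5 m wide object covers roughly 42 pixels at 20 m but only about 4 pixels at 200 m, so a one-pixel slip on each edge goes from a few percent of the object's width to around half of it.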

4. How to generate accurate long-range detection data for AV - Facing the challenge

Most advanced perception systems applied to autonomous motion applications (self-driving cars, drones, …), whether Lidar-based, camera-based, or time-of-flight camera-based, require highly accurate, long-range detection data. The reason is that these autonomous devices must strictly meet the highest safety standards.

As we have seen before, real-time graphics engines are able to provide labeled data, but since that data hasn’t been processed through an accurate optical and sensor simulation, it still can’t guarantee the pixel accuracy or the physically correct metadata that the perception system may need for training and validation…

Training and validating AV deep learning models that face this technical challenge requires a level of data generation and automatic labeling accuracy that only pixel-accurate synthetic data can offer today.
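As an illustration of what automatic labeling can look like, the following sketch derives bounding boxes and pixel counts directly from a per-pixel instance ID buffer. The buffer format (a 2D grid of integers with 0 as background) and the function name `boxes_from_instance_mask` are assumptions made for this example; they do not describe the output format of any particular renderer or platform.

```python
def boxes_from_instance_mask(mask):
    """Return {instance_id: (x_min, y_min, x_max, y_max, pixel_count)}.

    `mask` is a list of rows of integer instance IDs; ID 0 is treated
    as background (an assumption of this sketch).
    """
    boxes = {}
    for y, row in enumerate(mask):
        for x, instance_id in enumerate(row):
            if instance_id == 0:
                continue
            if instance_id not in boxes:
                boxes[instance_id] = [x, y, x, y, 1]
            else:
                box = boxes[instance_id]
                box[0] = min(box[0], x)
                box[1] = min(box[1], y)
                box[2] = max(box[2], x)
                box[3] = max(box[3], y)
                box[4] += 1
    return {key: tuple(value) for key, value in boxes.items()}


# Tiny hypothetical instance buffer: two objects on a background of zeros.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
]

labels = boxes_from_instance_mask(mask)
print(labels)
```

Because every pixel already carries its instance ID, the labels follow deterministically from the rendered data, with no manual tracing step involved. How faithful those labels are to what a real sensor would see still depends on the accuracy of the optical and sensor simulation that produced the buffer.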

5. A look into the future of data for training and validating autonomous vehicles

If we look back, it’s amazing how many AV and ADAS systems have been developed with real-world data (which will probably continue to be used in combination with more accurate synthetic data), but the limitations of real-world data are more and more evident, to the point of questioning… Will future perception systems need it for training at all?

Most probably they will, but it won’t be enough. We are already starting to see this: even Tesla, with the immense amount of real data captured by its fleet of sold cars, still has a growing synthetic data practice to complement it.

What happens when perception system developers upgrade their cameras? Will they use their “legacy” real data to retrain the system? Will they capture new data? Or will they generate data simulating the new sensors? It’s a complex question and there is no simple answer, but accurate synthetic data is gaining more and more weight and is something you have to explore and keep an eye on.

We may not know for certain what the limits of real-world data are, but we can be sure that there are no limits to pixel-accurate synthetic data.

About Anyverse™

Anyverse™ helps you continuously improve your deep learning perception models and reduce your system’s time to market by applying new software 2.0 processes. Our synthetic data production platform allows us to provide high-fidelity, accurate, and balanced datasets. Along with a data-driven iterative process, we can help you reach the required model performance.

With Anyverse™, you can accurately simulate any camera sensor and decide which one will perform better with your perception system. No more complex and expensive experiments with real devices, thanks to our state-of-the-art photometric pipeline.

Need to know more?

Visit our website anytime, or our LinkedIn, Instagram, and Twitter profiles.
