The Trillion Miles Problem

Proper and robust AI training is essential for developing reliable and safe autonomous vehicles. This, in turn, requires rich, high-quality and unbiased training datasets.


Today, most autonomous vehicle developers train and validate their models in the real world. Driving data is collected and laboriously tagged to produce training datasets. However, there is a large body of edge cases that cannot easily be reproduced by driving test miles in the real world. These cases are rare and hard to capture, yet they represent the most challenging and unpredictable scenarios, and a vehicle must handle them to be safe.


Additionally, systems trained on real-world datasets are vulnerable to statistical bias, because it is practically impossible to collect a statistically balanced range of environmental elements (e.g. changing weather and lighting conditions, ambiguous lane layouts, unconventional vehicles, confusing signaling, pedestrians and animals).


Synthetic datasets can provide unlimited variations of digitally generated scenarios, covering lighting (traffic, street and building lights, sun position, night conditions) and scenery features such as atmospheric effects, object damage, other vehicles, road layout and pedestrians. Models can be trained and tested on millions of virtual miles in a fraction of the time and cost, offering a competitive advantage over teams that rely exclusively on real-world datasets.

Let's talk about synthetic data!