How simulating light and sensors helps build better perception systems
Developing computer vision systems is not an easy task. These are systems that need to understand what they see in the real world and react accordingly. But how do they see the world? How do you teach a machine to perceive the real world and interpret it?
Simply put, vision is the perception of light. The human eye, coupled with the brain, forms the most advanced perception system that exists to date. On the other hand, computer vision systems use optical cameras to perceive light (mimicking the eye), then use deep neural networks (mimicking the brain) to understand what they see.
However, that “understanding” is limited today to specific problems like object detection, object segmentation, or depth estimation. We are still far from neural networks that can provide a full understanding of an image captured by a camera.
Because of this limitation, some systems complement cameras with other kinds of sensors, such as lidar and radar, that work with parts of the electromagnetic spectrum beyond visible light (infrared and radio).
When it comes to self-driving cars ...
In the case of autonomous vehicles, there is still a heated debate about whether optical cameras alone are enough for self-driving cars, or whether other types of sensors are necessary.
Everybody wants to solve the same problem: engineer vehicles that understand the world around them and can react accordingly, in any situation, for safe autonomous driving. Simplifying a lot, at the end of the day, solving the problem boils down to getting the right data and training perception models with it.
Easier said than done. Getting data for a perception system is hard: you have to take thousands of pictures and curate them, which requires infrastructure and organization and can become a separate project in itself. And even that is not enough, because raw images alone are not enough for neural networks to learn.
During training, you need to tell the neural network what it is seeing, and for that you need to tag and annotate every single image with the ground-truth information the specific problem requires. This is very time-consuming and often inaccurate. And just when you think you are done, it turns out your system is not performing well, so you need more training and, yes, more data.
Synthetic data as a “real” alternative
Synthetic data can fill this gap, but to be useful it must faithfully simulate the behavior of real cameras. The closer the images you use for training are to the images the system will see when making decisions, the more accurate those decisions and the smaller the domain-shift effects. This brings us back to the beginning of the article: vision is the perception of light.
To faithfully simulate the behavior of cameras, you first need to simulate light, and then follow its physical behavior throughout a scene as it is reflected, refracted, diffracted, and scattered by the objects and particles it encounters on its way to the camera.
If you correctly characterize the light sources, including the sun and the sky, and every material in a 3D scene, you know exactly how much energy per wavelength reaches the camera sensor. With this spectral information, you can then simulate the physics of the sensor itself: how it transforms that energy into electrons and then into voltage which, after some digital processing, finally yields an image as if it had been taken with the real camera.
Add a procedural engine to generate thousands of variations of the 3D scene, changing camera position, lighting, and weather conditions. Leverage the processing power of the cloud to run everything in parallel, and you have the Anyverse™ synthetic data platform. It features a proprietary physics-based synthetic image render engine that uses an accurate light transport model and provides a physical description of lights, cameras, and materials. This allows a very detailed simulation of the amount of light reaching the camera sensor, and an equally detailed simulation of the sensor itself to produce the final color image.
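A procedural variation engine like the one described is, at its core, a sampler over scene parameters. Here is a minimal sketch, assuming a hypothetical set of parameters (camera pose, sun position, weather); the names and ranges are illustrative, not Anyverse's actual configuration:

```python
import random

def sample_scene(seed=None):
    """Sample one scene configuration that a renderer could turn
    into a single synthetic image."""
    rng = random.Random(seed)
    return {
        "camera_height_m": rng.uniform(1.2, 1.8),     # bumper vs roof mount
        "camera_yaw_deg": rng.uniform(-10.0, 10.0),
        "sun_elevation_deg": rng.uniform(5.0, 85.0),  # dawn to midday
        "sun_azimuth_deg": rng.uniform(0.0, 360.0),
        "weather": rng.choice(["clear", "overcast", "rain", "fog"]),
        "fog_density": rng.uniform(0.0, 0.2),
    }

# Thousands of variations are just a loop; in practice each configuration
# is rendered as an independent cloud job, so the whole set runs in parallel.
dataset_configs = [sample_scene(seed=i) for i in range(1000)]
```

Seeding each configuration makes every image in the dataset reproducible, which matters when you need to re-render a subset with a different sensor.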
No light, no perception, is that simple...
Why is this important? No light, no perception; it is that simple. For us, no light, no simulation. And without simulation, your synthetic data may not be that useful for training and testing deep learning-based perception systems: it may be harder for the neural networks to generalize to real-world images. Because, at the end of the day, that is every perception system's goal: to understand and interpret the real world.
Different academic papers demonstrate that a machine learning model based on deep neural networks, trained on a synthetic dataset generated considering camera sensor effects, generally performs better than one trained without those effects.
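To make "camera sensor effects" concrete, here is a small sketch of post-processing a clean rendered image before training. The specific effects (vignetting, read noise, quantization) and their parameters are assumptions for illustration, not a claim about any particular paper's method:

```python
import numpy as np

def apply_sensor_effects(image, vignette_strength=0.3, read_noise=0.01,
                         bit_depth=8, rng=None):
    """image: float array in [0, 1], shape (H, W). Returns a uint8 image
    with simple sensor effects applied."""
    rng = rng or np.random.default_rng()
    h, w = image.shape
    # Vignetting: brightness falls off toward the corners (r = 1 at corners)
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy / h - 0.5, xx / w - 0.5) / np.hypot(0.5, 0.5)
    out = image * (1.0 - vignette_strength * r ** 2)
    # Read noise: additive Gaussian noise from the sensor electronics
    out = out + rng.normal(0.0, read_noise, size=out.shape)
    # Quantization: clip and round to the sensor's bit depth
    levels = 2 ** bit_depth - 1
    return np.clip(np.round(out * levels), 0, levels).astype(np.uint8)
```

Training on images degraded this way narrows the gap between a pristine render and what a real camera actually delivers.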
Sensor simulation goes beyond data
Anyverse™ helps you continuously improve your deep learning perception models and reduce your system's time to market by applying new software 2.0 processes. Our synthetic data production platform provides high-fidelity, accurate, and balanced datasets. Combined with a data-driven iterative process, it can help you reach the required model performance.
With Anyverse™ you can accurately simulate any camera sensor and decide which one will perform best with your perception system, with no more complex and expensive experiments with real devices, thanks to our state-of-the-art photometric pipeline.