Example workspace: aerial use case

Introduction

The main purpose of this example workspace is to get you started with what we call the aerial use case. With this workspace you will be able to generate a basic dataset with a bird's-eye view of an outdoor scene. This point of view is typically what you need for a perception system mounted on a drone of some sort. These perception systems are AI-based and used for object detection and tracking.

The workspace is ready to generate a dataset with some level of variability on the same base scene. It is designed to help you understand some other basic concepts to get you started working with the Anyverse platform, such as base scenes, the variations system, cameras, sensors, scripting and more.

The aerial use case example workspace

Components

The base scene

To generate data for an aerial object detection and tracking system, the first thing you need is a base scene that you will populate with different assets to recreate scenarios, render them, and generate a varied dataset.

Anyverse Platform provides a good number of such base scenes. For the aerial use case you want to use base scenes that cover wide areas of terrain; bear in mind that, depending on the altitude of the drone and the camera spec, every image may cover several hundred square meters or more. In this tutorial workspace we have used scene_blocks_5_curves. It covers an area of approximately 200 m by 400 m, enough for the camera and altitude we are using in this tutorial.

Resource – Base scenes
scene_blocks_5_curves

In the above image you can see the different locators that base scenes typically have. They are used to populate the scene with assets of different kinds to create diverse scenes to render.

Locators are a very important concept in Anyverse. They are used to position other elements with respect to them: any entity placed in a locator will have a (0, 0, 0) position and rotation (transform for short) with respect to it. That means that the “effective” transform of the placed element will be the same as the locator’s. Changing the transform of the element with respect to the locator will change its “effective” transform, which is calculated by adding the locator’s transform to it.
In fact, this nesting of transforms applies to all elements in the simulation. The “effective” transform of an element is the sum of all its predecessors’ transforms and its own transform relative to its immediate parent.
This “effective” transform is what gets annotated as the position and rotation of an element in the generated data.
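To make the idea concrete, here is a minimal, standalone Python sketch of how an “effective” transform could be composed from a locator’s transform and an element’s relative transform. It is an illustration only, not the Anyverse API, and it simplifies rotations to additive Euler angles.

# Illustrative only: a simplified model of how an "effective" transform is
# composed, treating positions as offsets and rotations as additive Euler
# angles (real engines compose rotations with matrices or quaternions).
from dataclasses import dataclass

@dataclass
class Transform:
    position: tuple  # (x, y, z) in meters
    rotation: tuple  # (yaw, pitch, roll) in degrees

def compose(parent: Transform, child_relative: Transform) -> Transform:
    """Effective transform of a child = parent's transform + child's relative transform."""
    pos = tuple(p + c for p, c in zip(parent.position, child_relative.position))
    rot = tuple(p + c for p, c in zip(parent.rotation, child_relative.rotation))
    return Transform(pos, rot)

# A locator placed somewhere in the scene...
locator = Transform(position=(12.0, -5.0, 0.0), rotation=(90.0, 0.0, 0.0))
# ...and an asset placed on that locator with a (0, 0, 0) relative transform
asset_relative = Transform(position=(0.0, 0.0, 0.0), rotation=(0.0, 0.0, 0.0))

effective = compose(locator, asset_relative)
print(effective)  # same as the locator's transform, which is what gets annotated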

You can explore the different locators in the scene. There are block locators, tree locators, sign locators, etc. Vehicles and pedestrians are placed using a different base scene concept: the city map, or splines, represented by the blue, green and red lines in the 2D viewer and the colored surfaces in the 3D viewer. The Anyverse Studio populate API uses this city map; this is an advanced scripting topic that we will cover in a different tutorial. For this one we have pre-populated the base scene with city blocks, trees and cars using ad-hoc scripts outside the scope of this tutorial.

Camera

You will notice that in the workspace Simulation there is an entity named Ego. That is a special entity we use to rig the cameras, and you can think of it as a special locator: the drone you rig your cameras to, flying over the base scene. In our workspace we have defined one camera pointing straight down by default, but you can create and rig as many cameras as you need.

View from above with the global view to see the camera placement and the DronCamera PoV in that position

You can change the 3D viewer point of view by selecting a camera in the top left pulldown or by pressing P to toggle between cameras if you define more than one camera.

If you select a camera in the workspace you can see its properties in the properties panel and understand how to define a camera in Anyverse. Configuring a camera basically means assigning three modules to it: Sensor, ISP and Lens. Each module is defined separately, so you can create any camera by combining them.

Camera modules references
Available camera modules in the workspace

Camera modules

Let’s see the details of the most important concepts and parameters you need to define for all three camera modules. The final color look & feel, image distortion, sharpness and artifacts depend on these configurations. Remember you want to simulate your real camera as faithfully as possible, so you will need to get all these details from your camera engineers and camera vendor.

Sensor resolution definition

One of the most defining parameters of a sensor is its resolution. This is the first and most important parameter you need to define. Together with the pixel size it determines the sensor size, which is important to know the resulting FoV for a given lens focal length. Other important parameters are the shutter and exposure time, the CFA, the QE curves, the different noise sources, the well capacity and the filters sensors usually have to cut off undesired wavelengths.

ISP definition

ISPs are the camera modules that reconstruct the color of the image from the raw image the sensor module generates. They are the “secret sauce” that gives the final image its unique look & feel. The ISP is based on a demosaicing algorithm using the sensor CFA (Color Filter Array); it then applies white balance, adjusts the different color maps and applies a gamma correction. Our ISP implementation ensures a color image output, but you could use your own ISP implementation on the raw image output.

Lens definition

There are several camera lens types supported, from a perfect PIN_HOLE and an equidistant FISHEYE model to an empirical OPENCV distortion model. When selecting OPENCV, you need to input the intrinsics block with the center of the lens in pixels and the coefficients for the barrel (K1 to K6) and tangential (P0 and P1) distortions. These values, along with the focal length in pixels, are the result of the manual calibration process you can do with a real camera. After translating the focal length to meters (using the camera pixel size) you will have the characterization of the distortion of any real camera.
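As a quick sanity check of how these parameters relate, the following Python sketch derives the pixel size, sensor size and field of view for the camera used later in this tutorial (6.624 mm sensor width, 1920×1080 resolution, 27.89 mm focal length), and shows the focal length conversion between pixels and millimeters mentioned above. The variable names are ours, not part of the Anyverse API.

import math

# Tutorial camera values (see the sensor and lens configurations in the workspace)
image_width_px = 1920
image_height_px = 1080
sensor_width_mm = 6.624
focal_length_mm = 27.89

# Pixel size follows from sensor width and horizontal resolution
pixel_size_mm = sensor_width_mm / image_width_px      # ~0.00345 mm (3.45 um)
sensor_height_mm = pixel_size_mm * image_height_px    # ~3.73 mm

# Horizontal and vertical FoV for the given focal length
hfov_deg = math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))
vfov_deg = math.degrees(2 * math.atan(sensor_height_mm / (2 * focal_length_mm)))
print(f"HFoV: {hfov_deg:.2f} deg, VFoV: {vfov_deg:.2f} deg")

# Converting a calibrated focal length in pixels (e.g. from an OpenCV
# calibration) to millimeters using the pixel size, and back
focal_length_px = focal_length_mm / pixel_size_mm
print(f"f = {focal_length_px:.1f} px  ->  {focal_length_px * pixel_size_mm:.2f} mm")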

Camera position. Ground Sampling Distance

For an aerial use case the camera (or cameras) is typically attached to a flying vehicle (a drone, an aircraft, etc.) flying over the scene, taking pictures or recording video. There is a concept in aerial photography that can be very helpful to define the camera position and focal length: the Ground Sampling Distance (GSD).

The GSD is the distance between two consecutive pixel centers measured on the ground, expressed in cm/pixel. The bigger the value of the image GSD, the lower the spatial resolution of the image and the fewer visible details. Its value depends on the size of the camera sensor, the sensor resolution, the position of the camera above the ground (height) and the camera focal length. So if you have a maximum GSD that guarantees your perception system will be able to detect objects, you can play with the height and the focal length to get what you need.

Ground Sampling Distance (GSD). Credit: Pix4D

This is important when generating data to train an aerial perception system: make sure you have good variability within a reasonable range of GSD, since it is very difficult to keep the drone’s height constant in the real world, and those fluctuations need to be present in the data you use for training.

You can calculate the GSD as follows: GSD = 100 * (Sw*H) / (F*ImW), with Sw the sensor width in mm, H the height above the ground in m, F the focal length in mm and ImW the image width in pixels (the units specified in the graph above). So if you have a maximum and a minimum GSD, you can decide which sensor you need and the focal length of the optics, and work out the minimum and maximum heights for the camera. With Anyverse you can easily implement that variation in height when generating your datasets.

Additionally, you can calculate the area covered by one picture from the ground distances spanned by its width and height: Dw = GSD*ImW / 100; Dh = GSD*ImH / 100. The area in m² is then Area = Dw * Dh.

In this tutorial’s workspace we start with a GSD = 1.24 cm/px, with the camera placed at 100 m above the ground, a 6.624 mm sensor width, a 1920×1080 sensor resolution and a focal length of 27.89 mm. Go to the sensor configuration and the lens configuration in the workspace to check all these values and play around with them. Using the 3D viewer with the camera PoV can help you understand the GSD concept.
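To make the GSD formulas concrete, here is a short Python sketch that reproduces the numbers used in this tutorial (the function names are just for illustration):

def gsd_cm_per_px(sensor_width_mm, height_m, focal_length_mm, image_width_px):
    """GSD = 100 * (Sw * H) / (F * ImW), in cm/pixel."""
    return 100 * (sensor_width_mm * height_m) / (focal_length_mm * image_width_px)

def covered_area(gsd, image_width_px, image_height_px):
    """Ground footprint of one image: width and height in m, area in m2."""
    dw = gsd * image_width_px / 100
    dh = gsd * image_height_px / 100
    return dw, dh, dw * dh

# Tutorial camera: 6.624 mm sensor width, 1920x1080 resolution, 27.89 mm focal length
for height in (100, 200):
    gsd = gsd_cm_per_px(6.624, height, 27.89, 1920)
    dw, dh, area = covered_area(gsd, 1920, 1080)
    print(f"H = {height} m -> GSD = {gsd:.2f} cm/px, footprint {dw:.0f} x {dh:.0f} m = {area:.0f} m2")

Running it gives GSD = 1.24 cm/px at 100 m and 2.47 cm/px at 200 m, matching the images below.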

GSD = 1.24 cm/px with the camera at 100 m above the ground
GSD = 2.47 cm/px with the camera at 200 m above the ground

Variability

When we want to generate a dataset to train a deep learning model, one of its most important aspects is variability. To avoid overfitting, you need to make sure that the model is exposed to as much variability as possible in the characteristics it needs to differentiate. Once you have identified those variable characteristics, you need to define some kind of logic to vary them across the whole dataset. This means that every sample of the dataset needs to have different values for those characteristics. Anyverse Platform provides ways to define that logic and automatically apply it when generating each sample in the dataset.

In the workspace we are using for this aerial use case tutorial, we have a base scene populated with buildings, vegetation and traffic. All these elements are static. The purpose of the dataset we can generate with this workspace could be to train a perception system to detect cars in images taken from a drone flying over a city at some given height. We consider the variable characteristics of the dataset to be the point of view of the camera and the number of objects of interest in every sample.

Concepts

Every entity in an Anyverse workspace has different properties depending on the entity type and the 3D asset associated to it. Some properties are common to all entities, like the position and orientation in the world reference system, or the visibility (you can hide and show entities at will), to name a couple of useful ones. Other properties are specific to the associated 3D asset, like poses for people characters or thresholds to snap vehicles to the floor or to traffic lines. You can see all properties of an entity in the properties panel by clicking on it in the workspace:

Car entity properties panel
Camera entity properties panel
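Besides the properties panel, you can also read entity properties programmatically from the script console. The snippet below is a small sketch that uses only the API calls that appear in the variation script later in this tutorial; component and property names for other entity types may differ.

# Run this in the Anyverse Studio script console to inspect a couple of
# entity properties (same API calls used by the variation script below).
cam_id = workspace.get_camera_entities()[0]
cam_rot = workspace.get_entity_property_value(cam_id, 'RelativeTransformToComponent', 'rotation')
print('Camera rotation -> yaw: {:.2f}, pitch: {:.2f}, roll: {:.2f}'.format(cam_rot.x, cam_rot.y, cam_rot.z))

region_id = workspace.get_entities_by_name('loop_region')[0]
width = workspace.get_entity_property_value(region_id, 'RegionComponent', 'width')
depth = workspace.get_entity_property_value(region_id, 'RegionComponent', 'depth')
print('loop_region -> width: {} m, depth: {} m'.format(width, depth))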

Understanding properties is very important before introducing Anyverse Studio’s automatic variations system, because with it you can vary basically any entity property in the workspace.

In general, given the variable characteristics you have defined for your dataset, you need to “map” them to entity properties in the workspace and use the variations system to apply the logic you want to vary them. In our case the characteristics to vary are the point of view and the number of objects of interest per sample. Since we have a large city scene pre-populated with several cars, just changing the X, Y position of the Ego (the entity holding the camera) and the camera roll, pitch and yaw angles (starting from the default camera orientation pointing straight down to the ground) will change the point of view and the number of cars in the camera frustum (effectively, what you get in the sample image).

We can easily use the Anyverse Studio variations system to apply some logic to the Ego and Camera properties above:

Select the Generator and click the + in the variations list to add a new variation

For the camera angles, we set a random variation for all of them: ±20º for yaw and pitch and any angle for roll:

Find the camera entity in the workspace, select its Rotation property and choose a random variation

This creates a new automatic variation for the camera rotation angles. Now we just have to set the minimum and maximum values each angle will take (X = yaw, Y = pitch and Z = roll). Since the default is yaw = 180º, pitch = 0º and roll = 90º, to get the random variation defined above we do:

During the dataset generation, in every iteration, the automatic variations system will randomly pick a value between the minimum and maximum for each angle in the Transform – Rotation property in the DronCamera entity.

Scripting

Besides a random variation, you can select a Script variation for any property, allowing you to implement any logic for your variation, even making it depend on other property values or on the iteration the generation is at. This gives you full control over the variation of a single property. The scripting language in Anyverse is Python, a very familiar programming language for data scientists and engineers.
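As a first taste, the camera rotation variation defined above could also be expressed as a Script variation. The sketch below is just an illustration of the idea (the tutorial workspace keeps the built-in random variation); like the Ego script further down, it is the body of a function that returns the new property value.

# Script-variation alternative to the random min/max ranges above (sketch only).
import random

yaw = random.uniform(160.0, 200.0)   # default 180 deg, +/- 20
pitch = random.uniform(-20.0, 20.0)  # default 0 deg, +/- 20
roll = random.uniform(0.0, 360.0)    # any angle

# X = yaw, Y = pitch, Z = roll, as in the Transform - Rotation property
return anyverse_platform.Vector3D(yaw, pitch, roll)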

Ego position variation

For the Ego position we are going to script some logic. When you use a script for a variation, you are basically programming a Python function that has to return the calculated value of the property you are varying.

Editing the variation script

Our function for the Ego position randomly picks an X, Y position for the Ego within a rectangular region we have defined in the workspace (loop_region), allowing a margin. Then, since the region is rotated 45º to match the loop in the scene, we rotate the X, Y point by 45º as well. Applying the rotation is easier in polar coordinates, so the script converts the point to polar coordinates, adds 45º, converts back to cartesian coordinates, creates a 3D vector with X, Y and Z (this last one is unchanged) and returns it. This is the code:

import random
import math

cam_id = workspace.get_camera_entities()[0]
cam_rot = workspace.get_entity_property_value(cam_id, 'RelativeTransformToComponent','rotation')
print('Camera orientation: yaw: {:.2f}, pitch:{:.2f}, roll: {:.2f}'.format(cam_rot.x, cam_rot.y, cam_rot.z))

region_id = workspace.get_entities_by_name('loop_region')[0]
width = workspace.get_entity_property_value(region_id, 'RegionComponent','width')
depth = workspace.get_entity_property_value(region_id, 'RegionComponent','depth')
x = random.uniform(-depth/4, depth/4)
y = random.uniform(-width/4, width/4)
z = current_value.z
print('x: {}, y: {}'.format(x,y))

r = math.sqrt(x**2+y**2)
theta = math.atan2(y,x)
print('theta: {}'.format(math.degrees(theta)))
theta += math.radians(45)
print('theta: {}'.format(math.degrees(theta)))

x = r * math.cos(theta)
y = r * math.sin(theta)
print('x: {}, y: {}'.format(x,y))

new_pos = anyverse_platform.Vector3D(x, y, z)

return new_pos

You can see that we use the workspace and anyverse_platform objects from the Anyverse Platform API, which provide methods and utilities to directly access entities and their properties. When the variations system runs the above script in every iteration, it will apply the resulting vector to the Ego position property.

There is another way to apply variations using scripting, and it is complementary to the automatic variations system: the OnBeginIteration script. In this script you can do whatever you want to anything in the workspace using the Anyverse Platform API; you could even completely reconstruct the workspace in every iteration. We will talk about the OnBeginIteration script, the API and advanced scripting in other tutorials.

Results

Run the simulation

To make sure the variations work and to debug the variation scripts, you can run the simulation in Anyverse Studio. Click the rocket icon under the 2D/3D viewers.

Simulation run

This runs all variations and scripts and displays the results in the viewer. If, like in our case, you want to see what the camera will see in an iteration, make sure you select the camera PoV in the 3D viewer.

Simulation result

As shown above, we recommend keeping the script console open in a separate panel where you can see it during the simulation run. In the console output window you can see the output of every script (all your print() statements) and, if an error occurs, the error stack trace, so you can debug your code.

Stop and run as many times as you need to make sure you will get the results you want. Another way to debug and see run results is to do what we call a “dry run” of your dataset generation. That means running all the iterations you specify in the Generator, without generating the jobs to render in the cloud.

This executes all variations and scripts per iteration and shows the output in the dry run execution window and in the script console output. You will be able to spot coding errors and wrong values from your calculations if you print them out.

Dry run execution

Generate a dataset

To generate a dataset from your workspace, you first have to create a dataset with the output channels you require. Go to the generator inspector space (1), click the + icon to the right of the Dataset node of the tree (2), give it a name and select the channels you need for your dataset (you won’t be able to change them after creation) (3), and click the Create button (4).

Create dataset

If you specify the depth channel for this dataset, make sure you set the max depth to be at least the minimum height of your Ego. In the sample workspace that’ll be 100 m.

Now you can run a dataset generation. Select the dataset you just created from the pulldown selector on top of the 2D/3D viewers and click the launch icon. The batch summary window will pop up, where you can see, amongst other things, the number of samples that will be generated, the overall pixel consumption, the available pixels in the account, the number of cameras and which ones are visible (that gives you the number of images per sample), and other relevant settings you may want to check because they affect the pixel consumption.

You can launch the generation synchronously or asynchronously. The difference is that when launched asynchronously, Anyverse Studio will run the generation in a separate process in an OS console window. This allows you, for example, to generate several batches in parallel. The output will go to the console window where the generation is running.

You can go to the dataset inspector space to see the progress of your batch executions, and when they are finished you can explore the results right there as well.

Dataset explorer

Clicking the different frames will show you the results, and for each frame you can click on the different channels to the right to see them in more detail and zoom in on them with the mouse. Additionally, you can browse the JSON ground truth metadata all the way to the right in this space.

Download to local

Finally, you can download all the frames with their channels and ground truth to your local machine. Right-click on the generated batch name and select the Download to local option in the contextual menu. Then you can stitch all frames together to create a video in any format you want, as sketched below.
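As an example, here is a minimal Python sketch that stitches downloaded frames into an MP4 using OpenCV. The folder layout and file names are assumptions; adapt them to how your batch is organized after download.

# Sketch: stitch downloaded RGB frames into an MP4 with OpenCV.
# 'my_batch/*.png' and the 10 fps frame rate are assumptions; adjust as needed.
import glob
import cv2

frame_paths = sorted(glob.glob('my_batch/*.png'))
first = cv2.imread(frame_paths[0])
height, width = first.shape[:2]

fourcc = cv2.VideoWriter_fourcc(*'mp4v')
writer = cv2.VideoWriter('my_batch.mp4', fourcc, 10.0, (width, height))

for path in frame_paths:
    writer.write(cv2.imread(path))
writer.release()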

Here you have a couple of image results, similar to the ones you will get if you run a batch on the sample workspace.
