Birds Eye View - Getting Started - Birds Eye View and Libpanorama

From RidgeRun Developer Wiki







Throughout this guide, or in technical conversations with our support team, you've probably heard the name Libpanorama thrown around. This wiki page clarifies the concepts of Libpanorama and Birds Eye View, and the relationship between them.

What are Birds Eye View Images?

The easiest way to explain what a Birds Eye View image is, is with a picture.


Birds Eye View Image
A simulated Birds Eye View image generated from 4 perspective cameras.


As you can imagine by now, the image on the right is the BEV. It is an aerial view of the scene: what you would see if you were a bird flying over it, looking down. The BEV is a virtual composition formed from a collection of images that capture the surroundings. Interestingly, none of these cameras face down. The aerial view is generated from the limited floor information each camera provides, after applying a carefully crafted perspective transformation.

The cameras must have some overlap between them. This allows the system to generate a full BEV image without patches of missing information. The center of a BEV image, where the object of interest lies, is typically a black region: since no camera faces the object itself, there is effectively a void of pixels there. Commercial applications typically overlay an avatar of the object on top of it; in this case, the black car.

Birds Eye View images have become very popular, especially among terrestrial vehicles. It is common to see them in modern cars, where they help the driver better judge the car's dimensions and the obstacles around it. Autonomous robots also use them to simplify collision avoidance and path planning algorithms. Lately, heavy machinery vehicles use them to obtain a complete view of their surroundings and avoid accidents that would otherwise be unavoidable given the size of the system.

Generating Birds Eye View Scenes

The general process of creating a Birds Eye View image is easy to understand. In practice, there are several more details to take care of, but we'll cover some of them later. The main idea behind classical BEV systems is the Inverse Perspective Mapping (or IPM, for short). The following figure exemplifies this concept.


Inverse Perspective Mapping
Illustration of the Inverse Perspective Transformation.


We'll spare you the math, since it's out of the scope of this article. The following figure shows the IPM process as an image processing chain.


BEV Process
The processing pipeline of a classical IPM-based BEV.


The process is self-explanatory, but it's worth focusing on the third step: lens distortion removal. The IPM needs to be performed on rectilinear images. This means that, in order to use fisheye lens cameras, you need to remove this distortion first. Even if the image was not captured with a fisheye lens, perspective cameras still introduce a slightly noticeable curvature that may be corrected. Rectifying these images results in higher quality BEV images.

The next interesting step is the fourth one: the perspective transform. The following image shows this sub-process.


IPM Subprocess
The perspective transformation sub-process.


Notice how, after the perspective transform, the image is enlarged. This typically corrects the BEV aspect ratio so that the resulting image has the natural dimensions of the scene. Finally, the resulting image is cropped to a sub-section. This serves two purposes: on the one hand, it removes sections where there is no information available; on the other hand, it removes the parts of the image where stretching is too noticeable due to extreme pixel interpolation. Both of these effects are caused by the IPM process.
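At the heart of this step is a planar homography. The backward-mapping idea can be sketched in Python with NumPy (a rough illustration of the concept, not Libpanorama's actual implementation): for every output pixel, the inverse homography tells us which input pixel to sample.

```python
import numpy as np

def warp_homography(src, H, out_shape):
    """Warp a single-channel image through the 3x3 homography H.

    Backward mapping: for every destination pixel, the inverse
    homography gives the source coordinate to sample. Nearest-neighbor
    sampling for brevity; real systems use bilinear interpolation.
    Pixels that map outside the source stay black, which is exactly
    the missing-information region the final crop removes.
    """
    h, w = out_shape
    Hinv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs, ys, np.ones_like(xs)]).astype(float)  # (3, h, w)
    s = np.tensordot(Hinv, coords, axes=1)                       # map dst -> src
    s /= s[2]                                                    # dehomogenize
    sx = np.round(s[0]).astype(int)
    sy = np.round(s[1]).astype(int)
    valid = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    out = np.zeros((h, w), dtype=src.dtype)
    out[valid] = src[sy[valid], sx[valid]]
    return out
```

With the identity matrix the image passes through unchanged; with the IPM homography of the pipeline above, the same sampling loop produces the top-down view.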

Finally, you may be wondering how this perspective transformation is found. Don't worry, we'll cover that in a moment.

Introducing Libpanorama

You may have noticed how, from the original capture to the final BEV, the image undergoes several transformations. It would be great if these transformations could be combined instead of being performed one by one. Moreover, you may also have noticed that the resulting BEV image consists of only a small, internal portion of the original capture. It would be ideal if we could process only the pixels that are actually needed by the BEV, ignoring the rest and saving precious processing time. That is precisely what Libpanorama was designed for.
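To illustrate why combining transformations pays off, consider a toy sketch (the matrices below are made up for illustration, not taken from Libpanorama): when each stage is a 3x3 homography, the whole chain collapses into a single matrix product, so each output pixel needs one lookup instead of one per stage.

```python
import numpy as np

# Hypothetical per-stage homographies: a rectification, an IPM-style
# perspective change, and a scale. The values are illustrative only.
rectify = np.array([[1.0, 0.02, -3.0],
                    [0.0, 1.00,  0.0],
                    [0.0, 0.00,  1.0]])
ipm     = np.array([[1.0,  0.000, 0.0],
                    [0.0,  1.000, 0.0],
                    [0.0, -0.002, 1.0]])
scale   = np.diag([2.0, 2.0, 1.0])

# One matrix multiply fuses the chain (applied right-to-left).
combined = scale @ ipm @ rectify

# Mapping a point through the fused matrix matches chaining the stages.
p = np.array([100.0, 50.0, 1.0])
chained = scale @ (ipm @ (rectify @ p))
direct = combined @ p
assert np.allclose(chained / chained[2], direct / direct[2])
```

This is the same principle Libpanorama exploits: fuse the geometry once, then evaluate a single backward map per output pixel.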

Libpanorama is written in C++ and relies heavily on template meta-programming and static polymorphism to delegate as much processing as possible to the compiler, rather than performing it at runtime.

Extending Libpanorama

As said before, Birds Eye View is just one of the many use cases for Libpanorama. In practice, almost any geometric transform can be applied to the video feed in real time using the library. While the technical details of how to do so are out of the scope of this document, here is a list of some other use cases Libpanorama was built for.

Fisheye Undistort

Fisheye lenses allow cameras to capture a wider field of view than typical rectilinear ones. While "normal" lenses can capture 110 degrees at most, fisheye ones can go up to 210 degrees or more! They achieve this by introducing radial distortion in the image, resulting in the famous barrel effect.

The process of taking a fisheye image and removing its distortion to convert it back to a rectilinear projection (with some information loss) is what we define as "undistort". This process is typically applied to "normal" lenses as well, which, due to manufacturing imperfections, can introduce radial distortion to the image.
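As a rough sketch of the math involved (assuming the common equidistant fisheye model, not necessarily the model Libpanorama uses), undistortion boils down to mapping each radius in the rectilinear output back to the radius to sample in the fisheye input:

```python
import numpy as np

def fisheye_source_radius(r_rect, f):
    """Radius to sample in the fisheye image for an undistorted pixel.

    Assumed models: the rectilinear output obeys r_rect = f * tan(theta),
    while the equidistant fisheye input obeys r_fisheye = f * theta,
    where theta is the ray's angle of incidence and f the focal length
    in pixels. Inverting the first and applying the second gives the
    backward map used to fill each output pixel.
    """
    theta = np.arctan2(r_rect, f)  # recover the incidence angle
    return f * theta               # where that ray landed in the fisheye
```

Note that the fisheye radius is always smaller than the rectilinear one (theta < tan(theta)), which is why undistorted fisheye images look stretched toward their edges.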

Libpanorama can perform the fisheye undistort in real-time.


Fisheye Undistort
A fisheye image and two different undistort algorithms.


Panoramas

A panoramic image is another projection that conveys a greater field of view than a normal lens would. Like fisheye lenses, panoramas introduce unnatural distortions to the resulting image, but unlike fisheye, these are far less evident and unpleasant to the viewer. Panoramic images are typically generated by combining multiple individual images, which makes it possible to capture a full 360 degree view of the scene in a single image.

Libpanorama is capable of taking an array of individual images and blending them into a single panorama.
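Blending is a topic of its own; as a minimal illustration (not Libpanorama's algorithm), two horizontally overlapping images can be feathered together with a linear alpha ramp across the overlap:

```python
import numpy as np

def feather_blend(left, right, overlap):
    """Blend two horizontally adjacent images into one strip.

    Assumes the last `overlap` columns of `left` and the first `overlap`
    columns of `right` cover the same scene region (i.e. the images are
    already aligned). A linear alpha ramp crossfades between them; real
    stitchers add seam finding and multi-band blending on top.
    """
    alpha = np.linspace(1.0, 0.0, overlap)  # weight of the left image
    blended = left[:, -overlap:] * alpha + right[:, :overlap] * (1.0 - alpha)
    return np.hstack([left[:, :-overlap], blended, right[:, overlap:]])
```

Chaining this across all cameras, with the last image wrapping around to the first, yields a full 360 degree strip.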


360 degrees panoramic image
Example of 360 degree generated panorama from three individual cameras.

Surround 360 Video

One of the benefits of panoramic images (or, more precisely, equirectangular images) is that they map the full 360 degree environment onto a plane (the image). It is possible, then, to take a region of this equirectangular image and project it back to the rectilinear space. Even better, we can move the selected region and give the impression of panning and tilting around, producing an immersive experience.
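The re-projection can be sketched as follows (a simplified model that ignores roll and proper rotation composition; the function and its parameters are illustrative, not Libpanorama's API). For each pixel of a virtual rectilinear camera, we compute the longitude and latitude at which to sample the equirectangular image; changing the yaw and pitch pans and tilts the virtual view.

```python
import numpy as np

def rectilinear_view_lonlat(width, height, fov_deg, yaw=0.0, pitch=0.0):
    """Per-pixel sampling angles for a virtual rectilinear camera.

    Returns (lon, lat) arrays of shape (height, width): the spherical
    coordinates in the equirectangular source that each output pixel
    should sample. Small-rotation approximation: yaw/pitch are added to
    the angles directly, which is accurate near the equator.
    """
    f = (width / 2) / np.tan(np.radians(fov_deg) / 2)  # focal length, pixels
    xs = np.arange(width) - width / 2
    ys = np.arange(height) - height / 2
    x, y = np.meshgrid(xs, ys)
    lon = np.arctan2(x, f) + yaw                 # horizontal angle of the ray
    lat = np.arctan2(y, np.hypot(x, f)) + pitch  # vertical angle of the ray
    return lon, lat
```

Converting (lon, lat) to equirectangular pixel coordinates is then a simple linear scaling, and animating yaw over time produces the panning effect.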

Libpanorama provides the tools to produce these immersive experiences in real time.



Equirectangular to rectilinear
Left: Equirectangular image (taken from Paul Bourke's site). Right: Rectilinear view taken from a certain perspective.


Little World

Just for the fun of it (and because we can 😉), we configured Libpanorama to generate the little world projection. This is the stereographic projection taken to the extreme. Regardless of the potential use cases, it does look pretty cool.
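For the curious, the underlying math is compact. A sketch of the stereographic mapping (with an assumed focal-length parameter `f`; the function is illustrative, not Libpanorama's interface): a pixel at radius r in the output corresponds to an angle theta from the downward axis via r = 2 f tan(theta / 2).

```python
import numpy as np

def little_world_sample_angles(x, y, f):
    """Spherical angles to sample for one pixel of a little world image.

    Inverts the stereographic relation r = 2 f tan(theta / 2):
    theta is the angle measured from the nadir (straight down), and
    phi is the azimuth around the vertical axis. Sampling the
    equirectangular source at (theta, phi) for every output pixel
    wraps the scene into the tiny-planet look.
    """
    r = np.hypot(x, y)
    theta = 2.0 * np.arctan2(r, 2.0 * f)  # angle from the nadir
    phi = np.arctan2(y, x)                # azimuth around the vertical axis
    return theta, phi
```

The center of the output (r = 0) samples straight down, and r = 2f reaches the horizon (theta = 90 degrees), which is why the ground wraps into a small sphere with the sky surrounding it.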

Libpanorama is capable of producing little world video streams in real time.


Little world projection
Left: Equirectangular image. Right: Little world projection from an aerial perspective.


Custom Projections

Interested in developing a custom projection? Don't hesitate to contact us and we'll be happy to assist in the development.


