Basics and Foundation - Video Stabilization process with IMU

RidgeRun Video Stabilization Library

Video stabilization process using IMU

IMU

An Inertial Measurement Unit (IMU) is an electronic device that measures and reports a body's specific force (acceleration) and angular rate. It uses a combination of accelerometers, gyroscopes and, sometimes, magnetometers.

Accelerometer

It detects linear acceleration along each axis (x, y, z).

Gyroscope

It measures the angular rate ( $\omega$ ), or rotational speed, around each axis; that is, the rate of change of the body's orientation.

Magnetometers

They measure the surrounding magnetic field. This gives an absolute heading reference that can be used to correct gyroscope drift and to compensate for magnetic influences.

IMUs are commonly found in many modern devices that capture video, including smartphones, action cameras, drones, and some professional video recording equipment.

Sensor Noise

All sensor measurements carry some degree of error. This error can be mitigated by integrating the measurements into quaternions and applying filters.
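As one hedged illustration of such filtering, the sketch below applies an exponential moving average (a simple low-pass filter) to raw gyroscope samples. The function name `low_pass`, the smoothing factor `alpha`, and the sample values are assumptions made for this example; they are not part of the RidgeRun library.

```python
def low_pass(samples, alpha=0.2):
    """Exponential moving average over a sequence of (x, y, z) samples.

    alpha close to 0 filters heavily; alpha close to 1 follows the input.
    This is a generic filter chosen for illustration only.
    """
    filtered = []
    state = None
    for sample in samples:
        if state is None:
            # The first sample initializes the filter state.
            state = tuple(sample)
        else:
            state = tuple(alpha * s + (1.0 - alpha) * p
                          for s, p in zip(sample, state))
        filtered.append(state)
    return filtered


# Hypothetical noisy angular rates around z, in rad/s.
noisy = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.4), (0.0, 0.0, 0.6), (0.0, 0.0, 1.0)]
smoothed = low_pass(noisy, alpha=0.5)
```

A lower `alpha` removes more noise but adds more lag, which is the usual trade-off when choosing a filter for this step.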

Spatial Rotations and Orientations

An axis and an angle can define the rotation of an object in 3D space. For example, consider the unit sphere in this figure, with the rotation axis ${\hat {e}}$ and the angle $\theta$ .

Rotations can be described in different ways, such as Euler angles. Euler angles are a good starting point for building intuition, but they suffer from gimbal lock. Check out this video on the topic. Quaternions can describe rotations while avoiding this problem, and they are also computationally efficient.

An orientation in space is different from a rotation. An orientation can be defined with a single three-dimensional vector with $\mathbf {i} ,\mathbf {j} ,\mathbf {k}$ components, whereas a rotation is a transformation of the object's orientation. In the quaternion domain, a rotation is described as a quaternion with the same three components plus a fourth, real component (which is zero for an orientation). This video describes some concepts related to quaternions, such as how they compare to Euler angles and an initial idea of slerp, among others. Although the video is centered on game development and we work with video stabilization, it helps to understand the ideas in a broad way; quaternions have a wide range of applications. More mathematical concepts and intuitions are introduced in the next section.

Quaternions

Quaternions are four-dimensional numbers that can describe the attitude of an object relative to some reference frame through an axis of rotation. A quaternion has three imaginary components, one for each axis in the frame, and one extra real component.

Quaternions are generally written as:

$\mathbf {q} =a+b\mathbf {i} +c\mathbf {j} +d\mathbf {k}$

The key behind quaternions is that they can be understood as the projection of a four-dimensional object, and their operations describe the movement and rotation of that object. These operations can be adapted to our needs in three dimensions. The imaginary part represents the axis of rotation. 3blue1brown and Ben Eater provide a great explanation of the behavior of quaternions in this interactive lesson (check out the videos on the right before starting the interactive lesson).

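In summary, expanding the previous expression with the Taylor series of Euler's formula gives the quaternion for a rotation by an angle $\alpha$ about the unit axis $b\mathbf {i} +c\mathbf {j} +d\mathbf {k}$ (with $b^{2}+c^{2}+d^{2}=1$ ):

$\mathbf {q} (\alpha )=\cos({\frac {\alpha }{2}})+\sin({\frac {\alpha }{2}})(b\mathbf {i} +c\mathbf {j} +d\mathbf {k} )$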

and any point $\mathbf {p}$ , written as a quaternion with a zero real part, can be rotated by the unit quaternion $\mathbf {q}$ with the formula:

$\mathbf {y} =\mathbf {q} \otimes \mathbf {p} \otimes \mathbf {q} ^{-1}$
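The rotation formula can be checked numerically with a minimal sketch. Quaternions are stored here as `(a, b, c, d)` tuples, and the helper names (`quat_mul`, `quat_from_axis_angle`, `rotate_point`) are assumptions made for this example rather than library functions. For a unit quaternion, the inverse equals the conjugate, which is what the sketch uses.

```python
import math


def quat_mul(q, r):
    """Hamilton product of two quaternions given as (a, b, c, d)."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = r
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quat_from_axis_angle(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about `axis`."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    s = math.sin(angle / 2.0) / norm  # normalizes the axis as well
    return (math.cos(angle / 2.0), x * s, y * s, z * s)


def rotate_point(q, p):
    """Rotate the 3D point p with y = q * p * q^-1 (q must be unit)."""
    q_inv = (q[0], -q[1], -q[2], -q[3])  # conjugate == inverse for unit q
    y = quat_mul(quat_mul(q, (0.0,) + tuple(p)), q_inv)
    return y[1:]  # the real part is zero; keep the vector part


# Rotating the x axis by 90 degrees about z should give the y axis.
q = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)
rotated = rotate_point(q, (1.0, 0.0, 0.0))
```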

Sensor-Aided Video Stabilization

The information provided by the IMU can be used to estimate the rotational motion. This estimation allows for the removal of unwanted noisy movements and the application of motion compensation in the frame’s plane. The next diagram shows the general process. Using the filtered motion measurements, it is possible to scale and crop the video frames to obtain a more stable video.

Motion estimation

An accelerometer measures linear acceleration. Integrating it once gives an estimate of the body's velocity (the movement per unit of time), and integrating it twice gives an estimate of its displacement. Gyroscopes, on the other hand, provide angular velocity, which is the rate of change of the angle around three possible axes (yaw, pitch and roll). The Integrator is in charge of estimating the video motion.
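To make the integration step concrete, here is a minimal sketch that accumulates gyroscope angular rates into orientation quaternions. It assumes a fixed sampling period `dt`, rates in rad/s expressed in the body frame, and quaternions stored as `(a, b, c, d)`. The names `delta_quat` and `integrate_gyro` are hypothetical, and this is not necessarily the scheme used by the library's Integrator.

```python
import math


def quat_mul(q, r):
    """Hamilton product of two quaternions given as (a, b, c, d)."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = r
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def delta_quat(omega, dt):
    """Small rotation produced by angular rate omega (rad/s) during dt."""
    wx, wy, wz = omega
    rate = math.sqrt(wx * wx + wy * wy + wz * wz)
    angle = rate * dt
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)  # no measurable rotation
    s = math.sin(angle / 2.0) / rate
    return (math.cos(angle / 2.0), wx * s, wy * s, wz * s)


def integrate_gyro(samples, dt, q0=(1.0, 0.0, 0.0, 0.0)):
    """Return the orientation after each gyroscope sample."""
    q = q0
    orientations = []
    for omega in samples:
        q = quat_mul(q, delta_quat(omega, dt))
        # Renormalize to keep rounding errors from accumulating.
        n = math.sqrt(sum(c * c for c in q))
        q = tuple(c / n for c in q)
        orientations.append(q)
    return orientations


# Hypothetical input: a constant yaw rate of 90 deg/s for one second.
samples = [(0.0, 0.0, math.pi / 2.0)] * 100
orientations = integrate_gyro(samples, dt=0.01)
final = orientations[-1]
```

After one second, `final` is close to the quaternion of a 90 degree rotation about the z axis, which matches the constant rate fed in.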

Synchronization

For sensor-aided video stabilization, it is crucial to match the IMU measurements to the video frames. A failure in this matching process leads to suboptimal stabilization, or even to no stabilization at all. Usually, IMU sensors provide a monotonic counter associated with their measurements, and cameras set the timestamp when the imager starts capturing the raw frame. These two times must share the same time reference, and the IMU measurements are then interpolated at the frame timestamps.

You can learn more about this process in Interpolation.
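As a rough sketch of that matching, the code below interpolates between the two IMU orientations that surround a frame timestamp using spherical linear interpolation (slerp). It assumes both timestamp sources are already in the same time base and in seconds; the function names and sample values are hypothetical and do not reflect the library's API.

```python
import math
from bisect import bisect_right


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; take the shorter path.
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical orientations: linear interpolation is enough.
        out = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        out = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    n = math.sqrt(sum(c * c for c in out))
    return tuple(c / n for c in out)


def orientation_at(timestamps, quats, t_frame):
    """Orientation at t_frame from sorted IMU timestamps and quaternions."""
    if t_frame <= timestamps[0]:
        return quats[0]
    if t_frame >= timestamps[-1]:
        return quats[-1]
    i = bisect_right(timestamps, t_frame)  # first sample after t_frame
    t0, t1 = timestamps[i - 1], timestamps[i]
    return slerp(quats[i - 1], quats[i], (t_frame - t0) / (t1 - t0))


# Two hypothetical IMU samples: identity and a 90 degree turn about z.
imu_times = [0.000, 0.010]
imu_quats = [
    (1.0, 0.0, 0.0, 0.0),
    (math.cos(math.pi / 4.0), 0.0, 0.0, math.sin(math.pi / 4.0)),
]
frame_quat = orientation_at(imu_times, imu_quats, 0.005)
```

A frame captured halfway between the two samples gets the halfway orientation, here a 45 degree rotation about z.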

Smooth motion

The unwanted movements are filtered out of the orientations so that they can be compensated for when correcting the image frames. In this context, we can think of noise in two ways. The first is the inherent noise from the sensors, which comes from electrical noise or quantization error; this noise is reduced right after the sensor readings are captured. The second is the actual unwanted camera movement, which we want to filter out to obtain a smooth motion estimation.

The Stabilizer is in charge of the motion smoothing.
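One simple way to picture motion smoothing is an exponential filter over the orientation sequence, sketched below with normalized linear interpolation (nlerp) between the previous smoothed orientation and the new measurement. The filter choice, the `alpha` value, and the function names are assumptions for illustration; the Stabilizer may use a different algorithm.

```python
import math


def nlerp(q0, q1, t):
    """Normalized linear interpolation between unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        q1 = tuple(-c for c in q1)  # same rotation, shorter path
    mixed = tuple((1.0 - t) * a + t * b for a, b in zip(q0, q1))
    n = math.sqrt(sum(c * c for c in mixed))
    return tuple(c / n for c in mixed)


def smooth_orientations(quats, alpha=0.1):
    """Move each smoothed orientation a fraction alpha toward the input."""
    smoothed = [quats[0]]
    for q in quats[1:]:
        smoothed.append(nlerp(smoothed[-1], q, alpha))
    return smoothed


def yaw_quat(angle):
    """Unit quaternion for a rotation of `angle` radians about z."""
    return (math.cos(angle / 2.0), 0.0, 0.0, math.sin(angle / 2.0))


def yaw_of(q):
    """Recover the z rotation angle from a pure-yaw quaternion."""
    return 2.0 * math.atan2(q[3], q[0])


# Hypothetical shaky camera: yaw jitters between +0.1 and -0.1 rad.
raw = [yaw_quat(a) for a in (0.0, 0.1, -0.1, 0.1, -0.1)]
smooth = smooth_orientations(raw, alpha=0.2)
smooth_yaw = [yaw_of(q) for q in smooth]
```

The smoothed yaw stays much closer to zero than the raw jitter. The difference between each raw and smoothed orientation is what the warping step would compensate for.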

Image warping (undistortion)

The image warping step uses the body's original orientations together with the filtered orientations, which are more stable and have smoother transitions. It applies a transformation to each frame that shifts pixels, scales, and crops the image, effectively creating smoother footage.

For the RidgeRun Video Stabilization, the process estimates the motion as rotations. This implies that image warping is not perspective warping. Instead, it uses the pinhole model and the intrinsic matrix to apply the rotations. At the mathematical level, the final output maps are generated based on an undistortion procedure.

More information can be found in Video Undistortion.
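To give an idea of how a rotation can be applied through the pinhole model, the following sketch maps a single pixel: it back-projects the pixel to a ray with the camera intrinsics, rotates the ray, and projects it again. The intrinsic values (`fx`, `fy`, `cx`, `cy`) and the function names are assumptions for this example, lens distortion is ignored, and this is not necessarily how the library builds its output maps.

```python
import math


def quat_to_matrix(q):
    """3x3 rotation matrix of a unit quaternion (a, b, c, d)."""
    a, b, c, d = q
    return [
        [1 - 2 * (c * c + d * d), 2 * (b * c - a * d), 2 * (b * d + a * c)],
        [2 * (b * c + a * d), 1 - 2 * (b * b + d * d), 2 * (c * d - a * b)],
        [2 * (b * d - a * c), 2 * (c * d + a * b), 1 - 2 * (b * b + c * c)],
    ]


def warp_pixel(u, v, q, fx, fy, cx, cy):
    """Map pixel (u, v) through the rotation q using the pinhole model."""
    # Back-project the pixel to a ray in camera coordinates (K^-1).
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the ray with the correction rotation.
    rot = quat_to_matrix(q)
    x, y, z = (sum(rot[i][j] * ray[j] for j in range(3)) for i in range(3))
    # Project back to pixel coordinates (K). Assumes z > 0, which holds
    # for the small corrective rotations used in stabilization.
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical 1920x1080 camera with a focal length of 1000 pixels.
fx = fy = 1000.0
cx, cy = 960.0, 540.0

# A 90 degree roll about the optical axis moves a pixel located 100 px
# to the right of the principal point to 100 px below it.
roll = (math.cos(math.pi / 4.0), 0.0, 0.0, math.sin(math.pi / 4.0))
warped = warp_pixel(1060.0, 540.0, roll, fx, fy, cx, cy)
unchanged = warp_pixel(1060.0, 540.0, (1.0, 0.0, 0.0, 0.0), fx, fy, cx, cy)
```

Evaluating this mapping for every output pixel would yield a pair of lookup maps, which is one common way to implement a remapping step.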
