Basics and Foundation - Video Stabilization Process with IMU

From RidgeRun Developer Wiki


Video stabilization process using IMU

IMU

An Inertial Measurement Unit (IMU) is an electronic device that measures and reports a body's specific force (linear acceleration) and angular rate. It uses a combination of accelerometers, gyroscopes, and sometimes magnetometers.

Figure: Degrees of freedom (DOF) in motion simulators
  • Accelerometer

It detects linear acceleration along each axis (x, y, z).

  • Gyroscope

It measures the angular rate (ω), or rotational speed, around each axis, that is, the rate of change of the body's orientation.

  • Magnetometers

It measures magnetic fields, which provide additional information (such as a heading reference) to compensate for magnetic influences.

IMUs are commonly found in many modern devices that capture video, including smartphones, action cameras, drones, and some professional video recording equipment.

Sensor Noise

All measurements provided by sensors carry some degree of error. This error can be mitigated by integrating the measurements into quaternions and applying different filters.

Spatial Rotations and Orientations

An axis and an angle can define the rotation of an object in 3D space. For example, consider the unit sphere in this figure, with the rotation axis $\hat{\mathbf{e}}$ and the angle $\theta$.

Rotations can be described in different ways, such as Euler angles. Euler angles are useful for an initial understanding, but they suffer from gimbal lock: at certain angles, two of the three rotation axes align and one degree of freedom is lost. Check out this video on the topic. Quaternions can also describe rotations, and they avoid this problem. They are also more compact than rotation matrices and cheaper to compose.
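To make gimbal lock concrete, the following sketch (plain Python, standard library only) composes a rotation matrix from yaw, pitch, and roll. The Z-Y-X composition order is an assumption; other Euler conventions lock at different angles. With the pitch at 90°, two different yaw/roll pairs produce the same rotation matrix, because only their difference matters.

```python
import math

def rot_x(a):
    """Rotation matrix about the x axis (roll), angle in radians."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    """Rotation matrix about the y axis (pitch), angle in radians."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    """Rotation matrix about the z axis (yaw), angle in radians."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_zyx(yaw, pitch, roll):
    """Compose R = Rz(yaw) * Ry(pitch) * Rx(roll) (assumed convention)."""
    return matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))

def max_abs_diff(A, B):
    """Largest element-wise difference between two 3x3 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

# At pitch = 90 degrees, yaw and roll rotate about the same axis:
# only (roll - yaw) matters, so one degree of freedom is lost.
locked = math.radians(90)
R1 = euler_zyx(math.radians(10), locked, math.radians(30))
R2 = euler_zyx(math.radians(0), locked, math.radians(20))
print(max_abs_diff(R1, R2))  # close to 0: both pairs give the same rotation
```

Away from 90° of pitch, the same two yaw/roll pairs give clearly different matrices, which is why the problem only appears near the locked configuration.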

An orientation in space is different from a rotation. Here, an orientation (for example, the direction an object points to) is defined with a single three-dimensional vector with $\mathbf{i}, \mathbf{j}, \mathbf{k}$ components. A rotation is a transformation of the object's orientation. In the quaternion domain, both are described as quaternions with the same three imaginary components plus a fourth, real component, which is zero for the orientation. This video describes some concepts related to quaternions, such as how they compare to Euler angles and an initial idea of slerp, among others. Even though the video focuses on game development rather than video stabilization, it helps build a broad intuition; quaternions have a wide range of applications. More mathematical concepts and intuitions are introduced in the next section.

Quaternions

Quaternions are 4-D numbers that can represent a rotation about an axis and, therefore, the attitude of an object relative to some reference frame. A quaternion has three imaginary components, one for each axis of the frame, and one real component.

The following expression generally represents them:

$$\mathbf{q} = a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}$$

One way to understand quaternions is as the projection of a four-dimensional object (the unit hypersphere) into three dimensions: their operations describe the movement and rotation of that object, and they can be adapted to describe rotations in three dimensions. For a rotation quaternion, the imaginary part represents the axis of rotation. 3Blue1Brown and Ben Eater provide a great explanation of the behavior of quaternions in this interactive lesson (check out the videos on the right before starting the interactive lesson).

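In summary, applying Euler's formula $e^{\mathbf{u}\theta} = \cos\theta + \mathbf{u}\sin\theta$ (which follows from its Taylor series expansion) to a unit pure quaternion $\mathbf{u} = b\mathbf{i} + c\mathbf{j} + d\mathbf{k}$ with $\theta = \alpha/2$ gives the quaternion that rotates by an angle $\alpha$ about the unit axis $(b, c, d)$:

$$\mathbf{q}(\alpha) = \cos\left(\frac{\alpha}{2}\right) + \sin\left(\frac{\alpha}{2}\right)\left(b\mathbf{i} + c\mathbf{j} + d\mathbf{k}\right)$$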

and any point $\mathbf{p}$, written as a pure quaternion (zero real part), can be rotated by the unit quaternion $\mathbf{q}$ with the formula:

$$\mathbf{y} = \mathbf{q}\,\mathbf{p}\,\mathbf{q}^{-1}$$
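The rotation formula translates directly into code. This minimal sketch (plain Python) stores quaternions as `(w, x, y, z)` tuples, which is an assumed convention since libraries differ on component order. It builds $\mathbf{q}(\alpha)$ from an axis and an angle and rotates a point with $\mathbf{y} = \mathbf{q}\mathbf{p}\mathbf{q}^{-1}$, using the conjugate as the inverse because $\mathbf{q}$ has unit norm. The function names are illustrative only.

```python
import math

def quat_mul(q, r):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )

def quat_conj(q):
    """Conjugate; equal to the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def axis_angle_to_quat(axis, angle):
    """q = cos(angle/2) + sin(angle/2) * (b i + c j + d k), axis normalized first."""
    n = math.sqrt(sum(v * v for v in axis))
    b, c, d = (v / n for v in axis)
    s = math.sin(angle / 2)
    return (math.cos(angle / 2), s * b, s * c, s * d)

def rotate(point, q):
    """Rotate a 3-D point with y = q p q^-1, where p = 0 + x i + y j + z k."""
    p = (0.0, *point)
    y = quat_mul(quat_mul(q, p), quat_conj(q))
    return y[1:]  # the real part of y is (numerically) zero

q = axis_angle_to_quat((0, 0, 1), math.pi / 2)  # 90 degrees about z
print(rotate((1.0, 0.0, 0.0), q))  # approximately (0.0, 1.0, 0.0)
```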


Sensor-Aided Video Stabilization

The information provided by the IMU can be used to estimate the rotational motion of the camera. This estimate allows unwanted, noisy movements to be removed and motion compensation to be applied in the frame's plane. The next diagram shows the general process. Using the filtered motion measurements, it is possible to scale and crop the video frames to obtain a more stable video.

Video stabilization with IMU

Motion estimation

The information provided by an accelerometer can, in principle, be integrated once to estimate the body's velocity and twice to estimate its displacement (the change in position). On the other hand, gyroscopes provide angular velocity, which is the rate of change of the angle about each of the three axes (yaw, pitch, and roll).
The Integrator is in charge of estimating the video motion.

Synchronization

For sensor-aided video stabilization, it is crucial to match the IMU measurements to the video frames. A failure in this matching process leads to suboptimal stabilization, or even to no stabilization at all. Usually, the IMU sensors provide a monotonic counter associated with their measurements, and the cameras set the timestamp when the imager starts capturing the raw frame. Both times must be expressed in the same time reference so that the IMU measurements can be matched to each frame through interpolation.
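A minimal sketch of the matching step, assuming the IMU counter can be converted to the camera clock with a known tick period and offset. Both calibration values, and the function names `imu_ticks_to_seconds` and `bracket`, are hypothetical and are not part of any RidgeRun API. For each frame timestamp it finds the two surrounding IMU samples and the normalized position `t` between them, which an interpolation such as Slerp can then use.

```python
import bisect

def imu_ticks_to_seconds(ticks, tick_period_s, offset_s):
    """Map a raw IMU counter value to the camera clock.

    tick_period_s and offset_s are hypothetical calibration values: the
    counter resolution and the offset between the two clocks.
    """
    return ticks * tick_period_s + offset_s

def bracket(imu_times, frame_time):
    """Return (i, t) so that frame_time lies between imu_times[i] and
    imu_times[i + 1], with t in [0, 1] the normalized position between them.

    imu_times must be sorted and hold at least two samples; frame times
    outside the IMU range are clamped to the first or last interval.
    """
    j = bisect.bisect_right(imu_times, frame_time)
    i = min(max(j - 1, 0), len(imu_times) - 2)
    t0, t1 = imu_times[i], imu_times[i + 1]
    t = (frame_time - t0) / (t1 - t0)
    return i, min(max(t, 0.0), 1.0)

# IMU sampled every 5 ms (200 Hz) on its own counter; 1.0 s clock offset.
imu_times = [imu_ticks_to_seconds(k, 0.005, 1.0) for k in range(10)]
print(bracket(imu_times, 1.0125))  # sample 2, t close to 0.5
```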

You can learn more about this process in Interpolation.

Smooth motion

Unwanted movements are filtered out of the orientations so that they can be compensated for in the image frame correction. In this context, we can think of noise in two ways. First, there is the inherent sensor noise, which comes from electrical noise or quantization error; this noise is reduced right after the sensor readings are captured. Second, there are the actual unwanted camera movements, which we want to filter out to obtain a smooth motion estimate.

The Stabilizer is in charge of the motion smoothing.

Image warping (undistortion)

Using the original orientations of the body and the filtered orientations, which have more stable orientations and movement transitions, the image warping step applies a transformation to each frame. This transformation shifts pixels, scales, and crops the image, effectively creating smoother footage.

For the RidgeRun Video Stabilization, the process estimates the motion as rotations. This implies that the image warping is not a perspective warping. Instead, it uses the pinhole camera model and the camera intrinsic matrix to apply the rotations. At the mathematical level, the final output maps are generated with an undistortion procedure.

More information can be found in Video Undistortion.

Algorithms with IMU

Here we introduce some of the algorithms that are essential parts of the overall video stabilization process.

Integration

In this context, integration is used to determine the orientation of an object from its angular velocity, which is the rate of change of its orientation over time. Essentially, starting from an initial orientation, it transforms the angular velocity (rotation rate) into the actual orientation.

However, this approach does not incorporate data from accelerometers, which measure linear acceleration. Accelerometer data can be fused with gyroscope data (which provides the angular velocity) in a process known as sensor fusion. This fusion can yield a more accurate estimate of the object’s orientation.

Simple integration

The simple integration algorithm uses only the gyroscope measurements, which give the rate of change of rotation, together with an initial orientation to obtain the orientation at each sample.
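A minimal sketch of simple gyroscope integration, assuming body-frame angular rates in rad/s, a constant sampling period `dt`, and quaternions stored as `(w, x, y, z)`. For each sample it applies the exact rotation the rate would produce if it stayed constant over the period. The function names are illustrative, not RidgeRun's implementation.

```python
import math

def quat_mul(q, r):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )

def integrate_gyro(q0, gyro_samples, dt):
    """Integrate (wx, wy, wz) rates in rad/s into a list of orientations.

    Each step multiplies by the rotation of angle |w| * dt about w / |w|,
    assuming the rate is constant during the sample period.
    """
    q = q0
    orientations = [q]
    for wx, wy, wz in gyro_samples:
        rate = math.sqrt(wx * wx + wy * wy + wz * wz)
        if rate > 0.0:
            half = rate * dt / 2
            s = math.sin(half) / rate
            dq = (math.cos(half), wx * s, wy * s, wz * s)
        else:
            dq = (1.0, 0.0, 0.0, 0.0)  # no rotation during this sample
        q = quat_mul(q, dq)  # right-multiply: rates are measured in the body frame
        n = math.sqrt(sum(v * v for v in q))
        q = tuple(v / n for v in q)  # renormalize to limit floating-point drift
        orientations.append(q)
    return orientations

# 1 rad/s about z for 100 samples of 10 ms: about 1 rad of yaw.
qs = integrate_gyro((1.0, 0.0, 0.0, 0.0), [(0.0, 0.0, 1.0)] * 100, 0.01)
print(2 * math.atan2(qs[-1][3], qs[-1][0]))  # approximately 1.0
```

Any constant gyroscope bias is integrated as well, so the estimate drifts over time; this is the limitation that the fusion filters below address.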


VQF

The Versatile Quaternion-based Filter (VQF) is proposed in the paper Highly Accurate IMU Orientation Estimation with Bias Estimation and Magnetic Disturbance Rejection. It uses a gyroscope bias estimation algorithm and an algorithm for magnetic disturbance detection and rejection.

The full version of the algorithm includes additional features such as rest detection, gyroscope bias estimation, and magnetic disturbance rejection. Notably, the gyroscope bias estimation method in VQF avoids reliance on magnetometer corrections, which enhances its resilience against magnetic disturbances. Instead, the bias estimation is based solely on the disagreement between strapdown integration and accelerometer measurements during motion. This design choice helps maintain accuracy and robustness in challenging environments.

Madgwick

This algorithm uses a quaternion representation, allowing accelerometer and magnetometer data to be used in an analytically derived and optimised gradient-descent algorithm. It is proposed in the paper An Efficient Orientation Filter for Inertial and Inertial/Magnetic Sensor Arrays.

The paper's performance evaluation compared this filter with a proprietary Kalman-based algorithm used in orientation sensors. The results show that the filter achieves higher accuracy than the Kalman-based method. Notably, the filter's low computational load and its ability to operate at low sampling rates open new opportunities for real-time applications with IMU and MARG (magnetic, angular rate, and gravity) sensor arrays.
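The following is a simplified sketch of the gyroscope-plus-accelerometer (IMU) form of the Madgwick update, written from the gradient-descent formulation: the gyroscope rate of change is corrected by a step of size `beta` against the normalized gradient of the gravity-alignment error. It omits the magnetometer (MARG) branch. The frame conventions (Earth z axis up, accelerometer reading +1 g on z when level) and the function name are assumptions, and this is not RidgeRun's implementation.

```python
import math

def madgwick_imu_update(q, gyro, acc, beta, dt):
    """One gradient-descent orientation update from gyroscope + accelerometer.

    q: current unit quaternion (w, x, y, z) mapping sensor-frame vectors
       to an Earth frame whose z axis points up (assumed convention).
    gyro: (gx, gy, gz) body-frame angular rate in rad/s.
    acc: (ax, ay, az) accelerometer reading; only its direction is used.
    beta: correction gain; dt: sample period in seconds.
    """
    q0, q1, q2, q3 = q
    gx, gy, gz = gyro
    # Gyroscope rate of change: q_dot = 0.5 * q * (0, gx, gy, gz) (Hamilton product)
    d = [0.5 * (-q1 * gx - q2 * gy - q3 * gz),
         0.5 * (q0 * gx + q2 * gz - q3 * gy),
         0.5 * (q0 * gy - q1 * gz + q3 * gx),
         0.5 * (q0 * gz + q1 * gy - q2 * gx)]

    a_norm = math.sqrt(sum(v * v for v in acc))
    if a_norm > 0.0:  # skip the correction for an invalid (zero) reading
        ax, ay, az = (v / a_norm for v in acc)
        # Objective f: Earth's vertical expressed in the sensor frame
        # minus the measured gravity direction.
        f1 = 2.0 * (q1 * q3 - q0 * q2) - ax
        f2 = 2.0 * (q0 * q1 + q2 * q3) - ay
        f3 = 1.0 - 2.0 * (q1 * q1 + q2 * q2) - az
        # Gradient J^T f, where J is the Jacobian of f with respect to q.
        s = [-2.0 * q2 * f1 + 2.0 * q1 * f2,
             2.0 * q3 * f1 + 2.0 * q0 * f2 - 4.0 * q1 * f3,
             -2.0 * q0 * f1 + 2.0 * q3 * f2 - 4.0 * q2 * f3,
             2.0 * q1 * f1 + 2.0 * q2 * f2]
        s_norm = math.sqrt(sum(v * v for v in s))
        if s_norm > 0.0:  # zero gradient: already consistent with gravity
            d = [di - beta * si / s_norm for di, si in zip(d, s)]

    # Euler integration followed by renormalization to unit length.
    q_new = [qi + di * dt for qi, di in zip(q, d)]
    n = math.sqrt(sum(v * v for v in q_new))
    return tuple(v / n for v in q_new)

# Start tilted 0.3 rad about x while the accelerometer reports "level".
q = (math.cos(0.15), math.sin(0.15), 0.0, 0.0)
for _ in range(3000):
    q = madgwick_imu_update(q, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), beta=0.1, dt=0.01)
print(q)  # close to the identity (1, 0, 0, 0)
```

The accelerometer only constrains roll and pitch: a pure rotation about the vertical axis produces a zero gradient, so yaw comes from the gyroscope alone unless a magnetometer is added.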

Complementary

The complementary algorithm is proposed in the paper A Quaternion-Based Orientation Filter for IMUs and MARGs. It uses an algebraic solution of a system of equations to obtain a quaternion estimate.

A complementary filter combines the signals according to their frequency content. By applying high-pass filtering to the gyroscope data (affected by low-frequency noise such as drift) and low-pass filtering to the accelerometer data (affected by high-frequency noise), the filter aims to achieve an all-pass, noise-free attitude estimate. This complementary filtering process is crucial for accurate attitude estimation from IMU readings.
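The frequency-splitting idea is easiest to see in a single-axis form. The sketch below is a classic scalar complementary filter for the roll angle, a deliberate simplification rather than the quaternion-based algorithm from the paper. The axis convention (roll = atan2(ay, az)), the function name, and the default `alpha` are assumptions.

```python
import math

def complementary_roll(gyro_x, acc, dt, alpha=0.98, roll0=0.0):
    """Single-axis complementary filter for the roll angle, in radians.

    gyro_x: roll rates in rad/s; acc: matching (ay, az) accelerometer pairs.
    alpha close to 1 trusts the integrated gyroscope at high frequencies
    (high-pass) and the accelerometer angle at low frequencies (low-pass).
    """
    roll = roll0
    history = []
    for rate, (ay, az) in zip(gyro_x, acc):
        acc_roll = math.atan2(ay, az)  # roll implied by the gravity direction
        roll = alpha * (roll + rate * dt) + (1.0 - alpha) * acc_roll
        history.append(roll)
    return history

# Device held still at 0.2 rad of roll; the gyroscope has a 0.01 rad/s bias.
true_roll = 0.2
acc = [(math.sin(true_roll), math.cos(true_roll))] * 1000
biased = complementary_roll([0.01] * 1000, acc, dt=0.01)
print(biased[-1])  # settles near 0.2 plus a small bounded offset
```

With a constant bias `b`, the estimate settles at the accelerometer angle plus `alpha * b * dt / (1 - alpha)` (about 0.0049 rad here) instead of drifting without bound as pure integration would.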


Visual comparison of the methods

The following examples compare the visual result of applying different orientation-estimation methods to the same unstabilized video. The comparison includes the original input, the results without horizon lock, and the results with horizon lock enabled.

Unstabilized Demo

Integrators with horizon lock disabled

Integrators with horizon lock enabled

Interpolation

Sensor samples and video frames are generally not captured at the same instants. To smooth the orientation at each frame, we need an orientation value for each frame's timestamp, so the sensor orientations are interpolated at those timestamps.

Spherical linear interpolation (Slerp)

Slerp is a method of interpolation on the surface of a unit sphere. Given two points on the sphere, Slerp provides a smooth curve that follows the shortest path on the sphere's surface between these two points. The speed along this curve is constant, which makes Slerp particularly useful for creating smooth transitions.

Slerp is commonly used with quaternions to interpolate between two orientations or rotations, creating smooth rotational motion between them.
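A minimal Slerp sketch for unit quaternions stored as `(w, x, y, z)` (an assumed convention). Because `q` and `-q` represent the same rotation, it flips one input when their dot product is negative so that the shortest path is taken, and it falls back to normalized linear interpolation when the inputs are almost identical, to avoid dividing by a tiny `sin(theta)`. The 0.9995 threshold is an arbitrary choice.

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z).

    t = 0 returns q0; t = 1 returns q1 (or -q1, which is the same rotation).
    """
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q are the same rotation; flip to take the shortest path.
        q1 = tuple(-v for v in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical inputs: normalized linear interpolation.
        r = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        n = math.sqrt(sum(v * v for v in r))
        return tuple(v / n for v in r)
    theta = math.acos(dot)  # angle between the quaternions on the 4-D unit sphere
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))

identity = (1.0, 0.0, 0.0, 0.0)
yaw90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
print(slerp(identity, yaw90, 0.5))  # 45 degrees about z: (cos(pi/8), 0, 0, sin(pi/8))
```

Combined with the normalized position `t` of a frame between two IMU samples, this gives the orientation at the frame's timestamp.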

Smooth orientation

Spherical Exponential Smoothing

Exponential smoothing is a time series forecasting method that uses weighted averages of past observations. The weights decrease exponentially as the observations get older, hence the name "exponential smoothing". Extended variants (such as Holt's and Holt-Winters' methods) can also handle data with a systematic trend or seasonal component. Here, the weighted average is computed with Slerp instead of a linear blend, so the smoothed values remain valid orientations and are more stable.
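A minimal sketch of spherical exponential smoothing, assuming the recurrence $s_0 = q_0$, $s_k = \mathrm{slerp}(s_{k-1}, q_k, \alpha)$: each new orientation pulls the smoothed one a fraction `alpha` of the way along the sphere. The recurrence, the parameter name `alpha`, and the helper names are assumptions for illustration, not the exact RidgeRun Stabilizer. A compact Slerp is repeated so the block runs on its own.

```python
import math

def slerp(q0, q1, t):
    """Compact Slerp for unit quaternions (w, x, y, z), shortest path."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        q1, dot = tuple(-v for v in q1), -dot
    if dot > 0.9995:
        r = [a + t * (b - a) for a, b in zip(q0, q1)]  # nearly equal inputs
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        r = [s0 * a + s1 * b for a, b in zip(q0, q1)]
    n = math.sqrt(sum(v * v for v in r))
    return tuple(v / n for v in r)

def spherical_exp_smoothing(orientations, alpha):
    """s_0 = q_0; s_k = slerp(s_{k-1}, q_k, alpha).

    alpha in (0, 1]: smaller values weigh the past more and smooth more.
    """
    smoothed = [orientations[0]]
    for q in orientations[1:]:
        smoothed.append(slerp(smoothed[-1], q, alpha))
    return smoothed

def yaw_quat(angle):
    """Rotation by `angle` radians about z (helper to build test input)."""
    return (math.cos(angle / 2), 0.0, 0.0, math.sin(angle / 2))

# Jittery input: yaw alternates between +0.1 and -0.1 rad every frame.
raw = [yaw_quat(0.1 if k % 2 == 0 else -0.1) for k in range(40)]
smooth = spherical_exp_smoothing(raw, alpha=0.2)
print([round(2 * math.atan2(q[3], q[0]), 4) for q in smooth[-4:]])
```

With `alpha = 0.2`, the frame-to-frame jitter of ±0.1 rad settles to roughly ±0.011 rad; lowering `alpha` smooths more at the cost of lagging behind intentional camera motion.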

Horizon lock

The Horizon Lock technique leverages data from both the gyroscope and the accelerometer. Specifically, it uses accelerometer data to determine the direction of gravity, which is used to keep the horizon level in the stabilized footage.
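How the gravity direction is combined with the gyroscope-based orientation is not detailed here; the sketch below only shows the underlying idea of recovering roll and pitch from a gravity-dominated accelerometer reading. It assumes the accelerometer reads about +1 g on z when the device is level and that the device is not accelerating. The function name and axis convention are assumptions.

```python
import math

def gravity_roll_pitch(acc):
    """Roll and pitch (radians) implied by a static accelerometer reading.

    Assumes a reading of about +1 g on z when level and no acceleration
    other than gravity. Yaw cannot be recovered from gravity alone.
    """
    ax, ay, az = acc
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.sqrt(ay * ay + az * az))
    return roll, pitch

# Device rolled 0.3 rad, reading in m/s^2 (the scale does not matter).
print(gravity_roll_pitch((0.0, 9.81 * math.sin(0.3), 9.81 * math.cos(0.3))))
# approximately (0.3, 0.0)
```

Counter-rotating each frame by the estimated roll keeps the horizon level; in practice, the accelerometer estimate is noisy during motion, which is why it is fused with the gyroscope rather than used directly.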

Undistortion

To apply the required transformation to the original frames, a rotation is computed between the unstable and stable orientations. This rotation is then converted into a rotation matrix, which serves as a rectification transformation. This transformation rectifies the footage from the unstable space (original orientation) to the stable space (desired orientation). The process is analogous to the undistortion performed on raw images captured by a camera, which corrects for lens distortions. Following this, a set of mapping functions is derived for the image. These functions account for transformations such as translations, rotations, scaling, and cropping, ultimately resulting in stabilized footage.
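A minimal NumPy sketch of how such mapping functions can be built for a pure camera rotation under the pinhole model, ignoring lens distortion, scaling, and cropping. Each output (stable) pixel is back-projected to a ray with $K^{-1}$, rotated into the original camera frame, and projected again with $K$, which gives the source coordinates to sample. Here `R` is assumed to take stable-camera rays into the original camera frame, and the same intrinsic matrix `K` is assumed for input and output; the function name is hypothetical.

```python
import numpy as np

def rotation_remap(K, R, width, height):
    """Per-pixel source coordinates for warping an image by a pure rotation.

    K: 3x3 pinhole intrinsic matrix (shared by input and output).
    R: 3x3 rotation taking rays of the stable (output) camera into the
       original (input) camera frame.
    Returns float32 maps (map_x, map_y) of shape (height, width).
    """
    u, v = np.meshgrid(np.arange(width, dtype=np.float64),
                       np.arange(height, dtype=np.float64))
    pixels = np.stack([u.ravel(), v.ravel(), np.ones(u.size)])  # 3 x N homogeneous
    rays = np.linalg.inv(K) @ pixels  # back-project output pixels to rays
    src = K @ (R @ rays)              # rotate into the original camera, project
    map_x = (src[0] / src[2]).reshape(height, width)
    map_y = (src[1] / src[2]).reshape(height, width)
    return map_x.astype(np.float32), map_y.astype(np.float32)
```

The float32 maps have the layout expected by OpenCV's `cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)`. OpenCV's `cv2.initUndistortRectifyMap` computes comparable maps while also modeling lens distortion; its `R` argument is the rectification rotation (original to stable), that is, the inverse of the `R` used in this sketch.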



