GstCUDA - Features and Limitations

GstCUDA project general characteristics

GstCUDA characteristics:

Easy CUDA algorithm integration into GStreamer pipelines.
Complexity abstraction of both CUDA and GStreamer - allowing the developer to focus on the CUDA algorithm.
Optimal performance assurance for GStreamer/CUDA applications on Jetson platforms.
Support for PC systems (x86 architecture) that have NVIDIA GPUs.

Key Features

GstCUDA key features:

Offers a framework allowing users to develop custom GStreamer elements that can execute any CUDA algorithm. The framework consists of a series of base classes that abstract the complexity of GStreamer and CUDA integration.
Zero memory copy interface between CUDA and GStreamer on Jetson family platforms (TX1, TX2, Xavier, Nano and Orin).
GstCUDA supports two modes of memory handling:
- NVMM direct mapping mode: uses the GstCUDA APIs to directly handle NVMM memory buffers. This method provides the best possible performance on the Tegra platforms.
- Unified memory allocator mode: avoids the use of NVMM memory buffers by providing a memory allocator that passes the buffer directly to the GPU, which gives zero memory copies and maintains excellent performance. This mode has lower performance than the "NVMM direct mapping mode". The Unified memory allocator is used in conjunction with V4L2 and user-space buffers.
- The two memory handling modes allow GstCUDA to support NVMM buffers, V4L2 buffers and user-space buffers.
Supports heavy CUDA algorithms and large amounts of data processed on the GPU without performance loss from copies or memory conversions. Handles up to two simultaneous 4K 60 fps streams with "NVMM direct mapping mode" and two simultaneous 4K 40 fps streams with "Unified memory allocator mode".
Provides a set of GStreamer video filter elements for quick prototyping, with different input/output combinations, that allow video frames to be processed by the GPU using a custom CUDA library algorithm. These elements execute the CUDA algorithm from a custom CUDA library that is loaded dynamically at run time and is passed to the GstCUDA element by setting an element property value. The user can choose among the provided elements to find the one that best matches the project requirements. This is ideal for quick prototyping because the CUDA algorithm is separated from the GStreamer element: the user can modify the CUDA algorithm, recompile the custom CUDA library, and run the GStreamer pipeline again to test it. Run-time linking allows the CUDA algorithm to be swapped out or updated without rebuilding any of the GStreamer source.
Provides integrated add-on elements, each consisting of a complete shared library that executes a specific CUDA algorithm. These add-on elements are based on the GstCUDA framework and show the potential of the framework for building a final product.
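
The run-time linking idea behind the quick prototyping elements can be illustrated with a minimal sketch. It is not GstCUDA code: the GstCUDA elements are native GStreamer plugins, and the element property and the symbol names they expect are not shown on this page. As a stand-in, the sketch assumes the standard C math library plays the role of the custom CUDA library and its `cos` function plays the role of the algorithm entry point; `load_algorithm` is a hypothetical helper name.

```python
import ctypes
import ctypes.util


def load_algorithm(library_name, symbol_name):
    """Look up a C function by name in a shared library at run time.

    Hypothetical helper for illustration only; GstCUDA's own loader and
    the function signature it expects are not described in this page.
    """
    # Resolve a short name such as "m" to the platform's library file.
    path = ctypes.util.find_library(library_name)
    # If the lookup returned None, CDLL(None) falls back to the symbols
    # already loaded into the current process (POSIX systems only).
    library = ctypes.CDLL(path)
    # Raises AttributeError if the library does not export the symbol.
    function = getattr(library, symbol_name)
    # Assumed signature for this stand-in: double f(double).
    function.argtypes = [ctypes.c_double]
    function.restype = ctypes.c_double
    return function


# The library and symbol are chosen by name at run time, much like a
# library location handed to an element through a property.
process = load_algorithm("m", "cos")
print(process(0.0))
```

Because the caller only knows the library and symbol by name, a different library can be substituted without rebuilding the calling code, which is the property the quick prototyping elements rely on.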

Limitations

The current release has the following limitations and known bugs:

~~It only supports the whole NVIDIA Jetson family platforms (TX1, TX2, Xavier and Nano). There are plans in the future to extend support for PC systems that have NVIDIA GPUs.~~

EGL memory type buffers are not yet accepted directly as inputs. There are plans to add this support in the future.
