NVIDIA Jetson AGX Thor - GstCUDA GStreamer Plug-in for CUDA algorithm integration
The NVIDIA Jetson AGX Thor documentation from RidgeRun is presently being developed.
GstCUDA Project General Characteristics
GstCUDA characteristics:
- Easy CUDA algorithm integration into GStreamer pipelines.
- Abstraction of the complexity of both CUDA and GStreamer, allowing the developer to focus on the CUDA algorithm.
- Designed for optimal performance of GStreamer/CUDA applications on Jetson platforms.
Features
- Offers a framework allowing users to develop custom GStreamer elements that can execute any CUDA algorithm.
- Zero memory copy interface between CUDA and GStreamer.
- GstCUDA supports two modes of memory handling:
  - NVMM direct mapping mode: uses the GstCUDA APIs to handle NVMM memory buffers directly. This mode provides the best possible performance on Tegra platforms.
  - Unified memory allocator mode: avoids NVMM memory buffers by providing a memory allocator that passes the buffer directly to the GPU, which keeps the path free of memory copies and maintains excellent performance.
- Supports heavy CUDA algorithms and large amounts of data to be processed on the GPU without performance being affected due to copies or memory conversions.
- Provides a set of GStreamer video filter elements for quick prototyping, with different input/output combinations, that allow video frames to be processed by the GPU using a custom CUDA library algorithm.
- Provides integrated add-on elements, which consist of a complete shared library that executes a specific CUDA algorithm.
Examples
GstCUDA offers a GStreamer plugin that contains a set of elements ideal for quick GStreamer/CUDA prototyping. The following elements are used to showcase GstCUDA capabilities on the Jetson AGX Thor.
- cudafilter
- cudamux
All the examples use the NVMM memory type and run in performance mode, enabled with jetson_clocks.sh. The pipelines rely on the following resolutions and shell variables:
- HD: 1920x1080, with WIDTH=1920 and HEIGHT=1080.
- 4K: 3840x2160, with WIDTH=3840 and HEIGHT=2160.
- FILESRC: the path of the input video.
- FILESINK: the name of the output video.
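The variables above can be set once in the shell before launching any of the pipelines. The following is a minimal sketch for the HD case; the file names are placeholders chosen for illustration and do not come from the original page, and the two extra inputs are only needed by the cudamux example further down.

```shell
# Resolution under test (HD). For 4K use WIDTH=3840 and HEIGHT=2160.
WIDTH=1920
HEIGHT=1080

# Placeholder file names (assumptions): an H.264 video in an MP4 container
# as input, and the MP4 file that the pipeline will write.
FILESRC=input_1080p.mp4
FILESINK=output.mp4

# Inputs for the two-source cudamux example (also placeholders).
FILESRC1=input_a_1080p.mp4
FILESRC2=input_b_1080p.mp4

# Performance mode is enabled separately on the board. On recent Jetson
# Linux releases the tool is typically invoked as shown below; older
# releases ship it as jetson_clocks.sh.
#   sudo jetson_clocks

echo "Testing ${WIDTH}x${HEIGHT}: $FILESRC -> $FILESINK"
```

Because the shell expands the variables before gst-launch-1.0 sees them, they do not need to be exported.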
cudafilter
This element takes the input stream and generates an output stream using one of the following filters:
- memcpy
- median filter
- grayscale
- pinhole
As an example, the grayscale filter converts the incoming color video into a grayscale output. The next image shows the result after and before filtering.

memcpy
This filter copies the specified number of bytes from one memory location to another memory location.
gst-launch-1.0 filesrc location=$FILESRC ! qtdemux ! h264parse ! nvv4l2decoder ! nvvidconv ! "video/x-raw(memory:NVMM),width=$WIDTH,height=$HEIGHT,framerate=30/1" ! cudafilter in-place=true location=./memcpy.so ! perf ! nvvidconv ! nvv4l2h264enc ! h264parse ! qtmux ! filesink location=$FILESINK -ev
median filter
This filter removes noise from images, signals, and videos by replacing each pixel's value with the median of its surrounding pixels.
gst-launch-1.0 filesrc location=$FILESRC ! qtdemux ! h264parse ! nvv4l2decoder ! nvvidconv ! "video/x-raw(memory:NVMM),width=$WIDTH,height=$HEIGHT,framerate=30/1" ! cudafilter in-place=true location=./median_filter.so ! perf ! nvvidconv ! nvv4l2h264enc ! h264parse ! qtmux ! filesink location=$FILESINK -ev
grayscale
This filter provides a grayscale output stream.
gst-launch-1.0 filesrc location=$FILESRC ! qtdemux ! h264parse ! nvv4l2decoder ! nvvidconv ! "video/x-raw(memory:NVMM),width=$WIDTH,height=$HEIGHT,framerate=30/1" ! cudafilter in-place=true location=./grayscale.so ! perf ! nvvidconv ! nvv4l2h264enc ! h264parse ! qtmux ! filesink location=$FILESINK -ev
pinhole
This filter creates an outer edge blur with subtle ghosting / refractions, while maintaining a strong unaffected focal point in the center.
gst-launch-1.0 filesrc location=$FILESRC ! qtdemux ! h264parse ! nvv4l2decoder ! nvvidconv ! "video/x-raw(memory:NVMM),width=$WIDTH,height=$HEIGHT,framerate=30/1" ! cudafilter in-place=true location=./pinhole.so ! perf ! nvvidconv ! nvv4l2h264enc ! h264parse ! qtmux ! filesink location=$FILESINK -ev
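The four cudafilter pipelines above differ only in the library passed to the location property, so they can be wrapped in a small helper. This is a sketch under stated assumptions: the function names, the GST_LAUNCH override used for dry runs, and the out_<filter>.mp4 naming are all hypothetical and not part of GstCUDA. It expects FILESRC, WIDTH and HEIGHT to be set as described earlier.

```shell
# Hypothetical helper: runs the cudafilter pipeline from this page with one
# of the example libraries (memcpy, median_filter, grayscale, pinhole).
# Set GST_LAUNCH=echo to print the arguments instead of running them.
run_cudafilter() {
    lib="$1"   # library name without the .so extension
    ${GST_LAUNCH:-gst-launch-1.0} \
        filesrc location="$FILESRC" ! qtdemux ! h264parse ! nvv4l2decoder ! nvvidconv ! \
        "video/x-raw(memory:NVMM),width=$WIDTH,height=$HEIGHT,framerate=30/1" ! \
        cudafilter in-place=true location="./$lib.so" ! perf ! \
        nvvidconv ! nvv4l2h264enc ! h264parse ! qtmux ! \
        filesink location="$FILESINK" -ev
}

# Hypothetical driver: runs the four filters in sequence, writing one output
# file per filter. Note that it overwrites FILESINK on each iteration.
run_all_cudafilters() {
    for f in memcpy median_filter grayscale pinhole; do
        FILESINK="out_$f.mp4"
        run_cudafilter "$f"
    done
}
```

Calling run_all_cudafilters on the board runs the four pipelines back to back. With GST_LAUNCH=echo the same call only prints the pipeline arguments, which is handy for checking the variables first; the printed text loses the quotes around the caps string, so it is meant for inspection rather than copy and paste.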
cudamux
cudamux is a GStreamer video filter element with multiple input pads and a single output pad that allows video frames to be processed by the GPU using a custom CUDA library algorithm.
The expected output for two videotestsrc inputs mixed with GstCUDA looks like the following image.

The mixer.so CUDA algorithm library consists of a very basic algorithm that receives two YUV I420 images as inputs and mixes them on the GPU, generating an output image that is the average of the two input images.
GST_DEBUG=2 gst-launch-1.0 cudamux name=cuda location=./mixer.so filesrc location=$FILESRC1 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1,format=I420" ! cuda.sink_0 filesrc location=$FILESRC2 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvidconv ! "video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1,format=I420" ! queue ! cuda.sink_1 cuda. ! "video/x-raw(memory:NVMM),width=1920,height=1080,format=I420,framerate=30/1" ! perf ! queue ! identity ! nvvidconv ! nvv4l2h264enc ! h264parse ! qtmux ! filesink location=test.mp4 -ev
Thor Performance
The performance obtained with these elements is listed in the following table for different resolutions, where you can compare the FPS and the GPU and CPU usage.
Resolution | GstCUDA element | CPU (%) | GPU (%) | FPS |
---|---|---|---|---|
HD | cudafilter memcpy | 90.73 | 3.30 | 449.396 |
HD | cudafilter median filter | 88.73 | 13.05 | 212.151 |
HD | cudafilter grayscale | 88.79 | 5.47 | 282.900 |
HD | cudafilter pinhole | 87.78 | 5.09 | 320.965 |
HD | cudamux mixer | 86.4 | 6.55 | 186.873 |
4K | cudafilter memcpy | 89.22 | 12.00 | 175.730 |
4K | cudafilter median filter | 90.34 | 22.24 | 94.330 |
4K | cudafilter grayscale | 89.28 | 12.67 | 170.825 |
4K | cudafilter pinhole | 87.34 | 10.33 | 193.633 |
4K | cudamux mixer | 81.02 | 8.45 | 119.682 |
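When comparing rows across resolutions, keep in mind that a 4K frame holds four times as many pixels as an HD frame, so FPS alone understates the work done at 4K. One way to put both on the same scale is pixel throughput (width x height x FPS). This metric is not reported in the table; the sketch below derives it from the memcpy rows purely as an illustration, and the throughput function name is hypothetical.

```shell
# Pixel throughput in megapixels per second: width * height * fps / 1,000,000.
# LC_ALL=C keeps the decimal separator a dot regardless of the system locale.
throughput() {
    LC_ALL=C awk -v w="$1" -v h="$2" -v fps="$3" \
        'BEGIN { printf "%.1f\n", w * h * fps / 1000000 }'
}

throughput 1920 1080 449.396   # HD cudafilter memcpy, prints 931.9
throughput 3840 2160 175.730   # 4K cudafilter memcpy, prints 1457.6
```

By this measure the 4K memcpy run moves more pixels per second than the HD run even though its frame rate is lower.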
Getting Started
To learn more about these elements, please refer to the Features and Limitations wiki page.
How to Purchase
The NVIDIA Jetson AGX Thor/Contact_Us page has the RidgeRun contact details for purchasing GstCUDA or requesting the evaluation version.