GstQtOverlay plugin performance on NVIDIA Platforms

From RidgeRun Developer Wiki

All tests were run under the following conditions:

  • Maximum performance mode enabled: all cores active and Jetson clocks enabled
  • Base installation
  • The GstQtOverlay is surrounded by queues to measure the actual capability of GstQtOverlay
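
As a minimal sketch of the first condition, a Jetson board is usually put in maximum performance mode with the nvpmodel and jetson_clocks tools. The helper names `run` and `enable_max_performance` and the `DRY_RUN` and `MAX_MODE` variables are hypothetical, and mode 0 is only an assumption: power mode numbering differs between boards, so check it with `sudo nvpmodel -q` first.

```shell
#!/bin/sh
# Sketch: put a Jetson board in maximum performance mode before benchmarking.
# Assumption: power mode 0 is the maximum-performance profile on this board.
MAX_MODE="${MAX_MODE:-0}"

run() {
  # With DRY_RUN=1 the command is only printed, not executed.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

enable_max_performance() {
  run sudo nvpmodel -m "$MAX_MODE"   # select the power profile
  run sudo jetson_clocks             # set the clocks to their maximum
}
```

Calling `DRY_RUN=1 enable_max_performance` prints the two commands without running them, which is handy for checking the mode number before applying it.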



Thor AGX Platform

The Thor AGX was tested with JetPack 7.1.

Format Performance

The following table compares the performance for every format when using NVMM with nvvidconv and NVMM with native formats.

Measurement               RGBA         UYVY         NV12
NVMM with nvvidconv       405.327 fps  467.725 fps  462.701 fps
NVMM with native formats  405.335 fps  402.133 fps  406.673 fps

CPU Usage

The CPU usage results are the following.

Measurement       No NVMM  NVMM
GstQtOverlay      1%       1%
Rest of pipeline  7.7%     10%
Total             8.7%     11%

Xavier NX Platform

The Xavier NX was tested with JetPack 4.4 (4.2.1 or earlier is not recommended).

Formats Performance

The following table shows the formats performance with and without NVMM.

Measurement               RGBA     UYVY     NV12
No NVMM                   190 fps  N/A      N/A
NVMM with nvvidconv       227 fps  200 fps  177 fps
NVMM with native formats  265 fps  172 fps  177 fps

CPU usage

The following table shows the CPU usage of the GstQtOverlay.

Measurement       No NVMM  NVMM
GstQtOverlay      16.5%    7.3%
Rest of pipeline  14%      27.8%
Total             30.5%    35.1%


The total CPU consumption of the pipeline is higher in the NVMM case (35.1% versus 30.5%) because the pipeline contains more elements. GstQtOverlay itself consumes less CPU in NVMM mode (7.3% versus 16.5%).

Nano Platform

The Jetson Nano was tested with JetPack 4.5 (4.2.1 or earlier is not recommended).

Formats Performance

The following table shows the performance with and without NVMM. Without NVMM, only the RGBA format is supported, so no results are available for the other formats.

Measurement               RGBA     UYVY    NV12
No NVMM                   57 fps   N/A     N/A
NVMM with nvvidconv       121 fps  67 fps  61 fps
NVMM with native formats  123 fps  67 fps  61 fps

Using the native formats may not provide large performance gains here, since nvvidconv is still used to upload to NVMM memory. However, it allows connecting directly to cameras that output NV12 in NVMM memory, for example through the nvarguscamerasrc element.

CPU usage

The following table shows the results of CPU usage with and without NVMM.

Measurement       No NVMM  NVMM
GstQtOverlay      2%       2%
Rest of pipeline  17%      13%
Total             19%      15%

Tests on multiple platforms at different resolutions

The following results cover several resolutions. The average-behavior table uses a live source at 30 fps, which reflects a typical end-user scenario. The limit-behavior table shows the maximum framerate reached at each resolution; compare it with the average table to see the limit of each resolution. Keep in mind that the limit is only theoretical, since the imagefreeze element is used to push the hardware to its limit.

Maximum framerate, CPU usage, and RAM usage for each platform (average behavior).
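Platform   Mode     Resolution  Max Framerate (fps)  CPU (%)  RAM (MiB)
Thor AGX   No NVMM  720p        30                   0.51     271.33
Thor AGX   No NVMM  1080p       30                   0.73     293.12
Thor AGX   No NVMM  4K          30                   2.30     412.97
Thor AGX   NVMM     720p        30                   0.54     286.22
Thor AGX   NVMM     1080p       30                   0.84     315.99
Thor AGX   NVMM     4K          30                   2.29     474.94
Orin Nano  No NVMM  720p        30                   4        96
Orin Nano  No NVMM  1080p       30                   6        128
Orin Nano  No NVMM  4K          11.96                7        160
Orin Nano  NVMM     720p        30                   7        128
Orin Nano  NVMM     1080p       30                   9        144
Orin Nano  NVMM     4K          25.409               22       160
Xavier NX  No NVMM  720p        30                   6        91
Xavier NX  No NVMM  1080p       30                   7        98
Xavier NX  No NVMM  4K          11.574               8        119
Xavier NX  NVMM     720p        30                   10       98
Xavier NX  NVMM     1080p       30                   16       105
Xavier NX  NVMM     4K          25.409               23       126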


Maximum framerate, CPU usage, and RAM usage for each platform (limit behavior).
Platform   Mode     Resolution  Max Framerate (fps)  CPU (%)  RAM (MiB)
Thor AGX   No NVMM  720p        312.529              2.60     270.07
Thor AGX   No NVMM  1080p       250.146              2.47     292.64
Thor AGX   No NVMM  4K          120.175              2.77     409.79
Thor AGX   NVMM     720p        406.038              3.46     271.73
Thor AGX   NVMM     1080p       403.607              3.85     291.19
Thor AGX   NVMM     4K          193.832              2.29     411.6
Orin Nano  No NVMM  720p        118.978              6        110.04
Orin Nano  No NVMM  1080p       89.702               6        111.06
Orin Nano  No NVMM  4K          35.474               6        117.376
Orin Nano  NVMM     720p        301.708              14       128
Orin Nano  NVMM     1080p       185.239              13       128
Orin Nano  NVMM     4K          53.395               8        136
Xavier NX  No NVMM  720p        254                  7        89.141
Xavier NX  No NVMM  1080p       202                  14.1     89.141
Xavier NX  No NVMM  4K          84.5                 15.6     110.16
Xavier NX  NVMM     720p        390                  6        117
Xavier NX  NVMM     1080p       228                  11       151
Xavier NX  NVMM     4K          73.3                 13       157
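
One way to relate the limit table to the 30 fps target is to divide the limit framerate by the target framerate. The sketch below does that with awk. The function name `headroom` and the idea of reporting a ratio are illustrative assumptions, not part of the original measurements.

```shell
#!/bin/sh
# Sketch: ratio between a measured limit framerate and a target framerate.
# $1 = limit framerate in fps (from the limit table)
# $2 = target framerate in fps (30 in the average table)
headroom() {
  awk -v limit="$1" -v target="$2" 'BEGIN { printf "%.2f\n", limit / target }'
}

# Example with the Thor AGX, No NVMM, 4K row of the limit table.
headroom 120.175 30
```

Since the limit numbers come from a frozen image, this ratio only describes the overlay path. A live source may still fall short of 30 fps even when the ratio is above 1.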

Reproducing the results

The following pipelines were used for each respective section.

Formats Performance

With the addition of native UYVY and NV12 support for NVMM memory, we measured the performance of each format using either nvvidconv or the native support. The following pipelines were used:

NVMM using nvvidconv:

gst-launch-1.0 videotestsrc pattern=black ! 'video/x-raw, width=1920, height=1080, format=NV12' ! queue ! nvvidconv ! queue ! 'video/x-raw(memory:NVMM), format=RGBA' ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf ! queue ! nvvidconv ! 'video/x-raw' ! fakesink sync=false

NVMM with native UYVY and NV12 support:

gst-launch-1.0 videotestsrc pattern=black ! 'video/x-raw, width=1920, height=1080, format=NV12' ! queue ! nvvidconv ! queue ! 'video/x-raw(memory:NVMM), format=NV12' ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf ! queue ! nvvidconv ! 'video/x-raw' ! fakesink sync=false

No NVMM:

gst-launch-1.0 videotestsrc pattern=black ! 'video/x-raw, width=1920, height=1080' ! queue ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf ! queue ! fakesink sync=false
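
The framerate in these pipelines is reported by the perf element. The sketch below pulls the last reported mean framerate out of saved pipeline output. It assumes that perf prints lines containing a `mean_fps: <number>` field, and the function name `last_mean_fps` is hypothetical; adjust the pattern if your perf version prints a different format.

```shell
#!/bin/sh
# Sketch: extract the last mean framerate reported by the perf element.
# Assumption: each perf line contains a field of the form "mean_fps: 30.000".
last_mean_fps() {
  grep -o 'mean_fps: [0-9.]*' | tail -n 1 | awk '{ print $2 }'
}
```

For example, redirect the gst-launch-1.0 output to a log file and run `last_mean_fps < perf.log` once the pipeline has run long enough for the mean to settle.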

CPU Usage

No NVMM:

gst-launch-1.0 videotestsrc is-live=true ! 'video/x-raw, width=1920, height=1080'  ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf print-cpu-load=true ! 'video/x-raw, width=1920, height=1080' ! fakesink

NVMM:

gst-launch-1.0 videotestsrc is-live=true ! 'video/x-raw, width=1920, height=1080' ! nvvidconv ! 'video/x-raw(memory:NVMM), width=1920, height=1080' ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf print-cpu-load=true ! nvvidconv ! 'video/x-raw, width=1920, height=1080'  ! fakesink sync=false

Performance for different resolutions

No NVMM:

For average behavior:

gst-launch-1.0 videotestsrc is-live=1 ! "video/x-raw,width=${W},height=${H},framerate=30/1" ! nvvidconv ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf print-cpu-load=1 ! fakesink

For limit behavior:

gst-launch-1.0 videotestsrc ! "video/x-raw, width=${W},height=${H}" ! imagefreeze ! nvvidconv ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf print-cpu-load=1 ! fakesink

NVMM:

For average behavior:

gst-launch-1.0 videotestsrc  is-live=1 ! "video/x-raw, width=${W}, height=${H}, framerate=30/1" ! queue ! nvvidconv ! queue ! 'video/x-raw(memory:NVMM)' ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf print-cpu-load=1  ! queue ! fakesink sync=false

For limit behavior:

gst-launch-1.0 videotestsrc  ! "video/x-raw, width=${W}, height=${H}, framerate=30/1" ! queue ! nvvidconv ! queue ! 'video/x-raw(memory:NVMM)' ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf ! queue ! fakesink sync=false
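
The pipelines in this section leave ${W} and ${H} to be set for each resolution. The sketch below shows one way to fill them in for the NVMM average-behavior pipeline. The function names are hypothetical, and the pixel dimensions are an assumption: the tables only name 720p, 1080p, and 4K, and 4K is taken here as 3840x2160.

```shell
#!/bin/sh
# Sketch: map a resolution label to "width height".
# Assumption: 4K means 3840x2160 (UHD); the source only says "4K".
resolution_dims() {
  case "$1" in
    720p)  echo "1280 720" ;;
    1080p) echo "1920 1080" ;;
    4K)    echo "3840 2160" ;;
    *)     echo "unknown resolution: $1" >&2; return 1 ;;
  esac
}

# Build the NVMM average-behavior pipeline description for one resolution.
nvmm_average_pipeline() {
  dims="$(resolution_dims "$1")" || return 1
  W="${dims% *}"   # text before the space: width
  H="${dims#* }"   # text after the space: height
  echo "videotestsrc is-live=1 ! video/x-raw,width=${W},height=${H},framerate=30/1 ! queue ! nvvidconv ! queue ! video/x-raw(memory:NVMM) ! qtoverlay qml=gst-libs/gst/qt/main.qml ! perf print-cpu-load=1 ! queue ! fakesink sync=false"
}

# Print (not run) the command for every resolution in the tables.
for res in 720p 1080p 4K; do
  echo "gst-launch-1.0 $(nvmm_average_pipeline "$res")"
done
```

The loop only prints the commands so they can be reviewed first. To run one, pass the description to gst-launch-1.0 unquoted, as in `gst-launch-1.0 $(nvmm_average_pipeline 1080p)`, so that the shell splits it into separate arguments.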

