OpenGL Accelerated HTML Overlay: Performance - NVIDIA Jetson

From RidgeRun Developer Wiki







Library performance

The library has two major components: hardware-accelerated graphical rendering, done by OpenGL, and web rendering, done by WebKitGTK. The performance results below are reported separately for each component.

Graphical Rendering by OpenGL


This section presents the performance of HTML Overlay measured on the following setup:

  • Board: NVIDIA Jetson Xavier NX
  • Jetpack: 5.1

All the packages and dependencies are retrieved from the default APT repositories.


The following table shows the CPU usage, GPU usage, processing time and FPS.

Board: NVIDIA Jetson Xavier NX

Power configuration: 10 W Desktop Mode

Operation   Resolution   CPU usage (%)   GPU usage (%)   Processing time (ms)   FPS
Upload      4K           6.34            36.99           11.46                  87.24
Upload      1080p        7.25            19.64           14                     71.4
Upload      720p         4.44            15.72           7.293                  137
Draw        4K           0.16            11.96           4.329                  231
Draw        1080p        0.36            7.71            2.5                    395
Draw        720p         0.40            5.67            2.077                  481
Download    4K           7.70            29.08           15.053                 66.43
Download    1080p        5               14.73           8.4                    118
Download    720p         3.14            6.53            4.552                  220

Power configuration: 20 W + Jetson Clocks (Max Power)

Operation   Resolution   CPU usage (%)   GPU usage (%)   Processing time (ms)   FPS
Upload      4K           4.25            6.40            11.357                 88
Upload      1080p        1.45            2.21            2.997                  334
Upload      720p         0.77            1.30            1.492                  670
Draw        4K           0.09            1.59            0.773                  1294
Draw        1080p        0.09            0.82            0.478                  2092
Draw        720p         0.10            0.66            0.443                  2309
Download    4K           2.73            6.43            6.489                  154.1
Download    1080p        0.94            2.13            1.947                  514
Download    720p         0.57            0.84            1.071                  933.7
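
The FPS values in the table are consistent with the reciprocal of the per-frame processing time, that is, FPS = 1000 / processing time in ms. The following is a minimal sketch of that conversion, useful when comparing your own timings against the table. The function name fps_from_ms is hypothetical and is not part of the library.

```shell
#!/bin/sh
# Convert a per-frame processing time (in milliseconds) to frames per second.
# fps_from_ms is a hypothetical helper name used only for illustration.
fps_from_ms() {
    awk -v ms="$1" 'BEGIN { printf "%.2f\n", 1000 / ms }'
}

# Processing times taken from the 4K column, 10 W Desktop Mode
echo "Draw:     $(fps_from_ms 4.329) fps"
echo "Download: $(fps_from_ms 15.053) fps"
```

For these two inputs the computed values agree with the 231 and 66.43 FPS reported in the table. Small differences in other cells are expected, since the published numbers come from measurements.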

Web Rendering by WebKitGTK

The following table shows the CPU usage, GPU usage, processing time and FPS.


Board: NVIDIA Jetson Xavier NX

Power configuration: 10 W Desktop Mode

Operation   Resolution   CPU usage (%)   GPU usage (%)   Processing time (ms)   FPS
Draw        4K           9.2             0               215.425                4.64
Draw        1080p        9.2             0               52.525                 19
Draw        720p         6.77            0               23.095                 43.3

Power configuration: 20 W + Jetson Clocks (Max Power)[1]

Operation   Resolution   CPU usage (%)   GPU usage (%)   Processing time (ms)   FPS
Draw        4K           6.1             0               281.714                3.55
Draw        1080p        6.3             0               69.137                 14.46
Draw        720p         5.6             0               28.515                 35.1

  • Note: In the 20 W + Jetson Clocks mode the CPU runs at 1.4 GHz, while in the 10 W Desktop mode it runs at 1.9 GHz. This accounts for the longer processing times measured in the 20 W mode.
  • Note: GPU usage is zero because the following flag disables GPU use in WebKitGTK:
export WEBKIT_DISABLE_COMPOSITING_MODE=1

GStreamer plugin performance

The plugin was tested with an example overlay and a camera on a Jetson Xavier NX running JetPack 5.1.1. The measurements were taken with gst-perf, using the following pipeline:

gst-launch-1.0 -ve nvarguscamerasrc num-buffers=300 ! "video/x-raw(memory:NVMM),height=$H,width=$W,framerate=30/1" ! nvvidconv flip-method=2 ! queue ! htmloverlay url="http://0.0.0.0:8000/overlay.html" enable-js=true web-refresh-rate=10 ! perf ! queue ! nvvidconv ! xvimagesink
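
The pipeline reads the frame size from the shell variables $W and $H, which must be set before launching it. The following is a minimal sketch of how these could be set for each tested resolution. The helper name resolution_size is hypothetical, and the 3840x2160 size for 4K is an assumption, since the page does not state the exact dimensions used.

```shell
#!/bin/sh
# Map a resolution label to "WIDTH HEIGHT".
# resolution_size is a hypothetical helper; the 4K size is an assumption (UHD).
resolution_size() {
    case "$1" in
        720p)  echo "1280 720" ;;
        1080p) echo "1920 1080" ;;
        4k|4K) echo "3840 2160" ;;
        *)     echo "unknown resolution: $1" >&2; return 1 ;;
    esac
}

for res in 720p 1080p 4k; do
    size=$(resolution_size "$res") || continue
    W=${size% *}   # part before the space
    H=${size#* }   # part after the space
    export W H
    echo "$res: width=$W height=$H"
    # Launch the gst-launch-1.0 pipeline here; it picks up $W and $H.
done
```

The loop only prints the sizes, so it can be checked without a camera attached before the pipeline call is added.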


Board: Jetson Xavier NX

Resolution                              720p       1080p     4K
FPS (10 W, 4 cores)                     166.6167   54.7143   13.4852
FPS (20 W, 6 cores + jetson-clocks)[1]  120.5662   56.2988   12.7323

Used overlay

The overlay used in the test renders the following text (click View Source on the wiki to see the HTML):

REC
Montreal City, Canada


The following results cover several resolutions. The average-behavior tests run a live source at 30 fps, while the limit-behavior tests use the imagefreeze element to push the hardware to its limit. Comparing the framerates in the limit tables against the average tables shows how much headroom each resolution has. Keep in mind that the limit is only theoretical, since imagefreeze repeats a single frame instead of delivering real video.

Orin Nano Platform

CPU usage

The measurements use the following pipelines as reference:

No GL Memory

For average behavior:

gst-launch-1.0 videotestsrc is-live=1 ! "video/x-raw,framerate=30/1,height=${H},width=${W}" ! queue ! nvvidconv ! videoconvert ! "video/x-raw" ! queue ! htmloverlay url="https://www.clocktab.com/" enable-js=true web-refresh-rate=5 overlay-x=100 ! queue ! "video/x-raw" ! queue ! perf print-cpu-load=true ! fakesink

For limit behavior:

gst-launch-1.0 videotestsrc num-buffers=1 pattern=ball ! "video/x-raw,format=RGBA,height=${H},width=${W}" ! imagefreeze ! queue ! htmloverlay url="https://www.clocktab.com/" enable-js=true web-refresh-rate=5 overlay-x=100 ! queue ! perf print-cpu-load=true ! fakesink

Results for average behavior

Resolution            720p   1080p   4K
Max framerate (fps)   30     30      30
CPU (%)               21     23      47
RAM (MiB)             568    640     1040

Results for limit behavior

Resolution            720p      1080p     4K
Max framerate (fps)   375.123   187.907   51.722
CPU (%)               24        24        31
RAM (MiB)             332.564   359.464   359.464

GL Memory

For average behavior:

gst-launch-1.0 videotestsrc is-live=1 ! "video/x-raw,framerate=30/1,height=${H},width=${W}" ! queue ! nvvidconv ! videoconvert ! glupload ! "video/x-raw(memory:GLMemory)" ! queue ! glhtmloverlay url="https://www.clocktab.com/" enable-js=true web-refresh-rate=5 overlay-x=100 ! queue ! "video/x-raw(memory:GLMemory)"  ! gldownload ! queue ! perf print-cpu-load=true ! fakesink

For limit behavior:

gst-launch-1.0 videotestsrc num-buffers=1 pattern=ball ! "video/x-raw,format=RGBA,height=${H},width=${W}" ! imagefreeze ! queue ! glupload ! "video/x-raw(memory:GLMemory)" ! queue ! glhtmloverlay url="https://www.clocktab.com/" enable-js=true web-refresh-rate=5 overlay-x=100 ! queue ! "video/x-raw(memory:GLMemory)"  ! gldownload ! queue ! perf print-cpu-load=true ! fakesink

Results for average behavior

Resolution            720p   1080p   4K
Max framerate (fps)   30     30      24
CPU (%)               9      22      32
RAM (MiB)             424    616     848

Results for limit behavior

Resolution            720p      1080p     4K
Max framerate (fps)   793.2     462.677   146.421
CPU (%)               25        25        36
RAM (MiB)             301.245   337.456   337.456

Xavier NX Platform

CPU usage

The measurements use the following pipelines as reference:

No GL Memory

For average behavior:

gst-launch-1.0 videotestsrc is-live=1 ! "video/x-raw,framerate=30/1,height=${H},width=${W}" ! queue ! nvvidconv ! videoconvert ! "video/x-raw" ! queue ! htmloverlay url="https://www.clocktab.com/" enable-js=true web-refresh-rate=5 overlay-x=100 ! queue ! "video/x-raw" ! queue ! perf print-cpu-load=true ! fakesink

For limit behavior:

gst-launch-1.0 videotestsrc num-buffers=1 pattern=ball ! "video/x-raw,format=RGBA,height=${H},width=${W}" ! imagefreeze ! queue ! htmloverlay url="https://www.clocktab.com/" enable-js=true web-refresh-rate=5 overlay-x=100 ! queue ! perf print-cpu-load=true ! fakesink

Results for average behavior

Resolution            720p   1080p   4K
Max framerate (fps)   30     30      30
CPU (%)               14     17      40
RAM (MiB)             189    231     469

Results for limit behavior

Resolution            720p    1080p     4K
Max framerate (fps)   302.5   159.685   48.624
CPU (%)               25      29        39
RAM (MiB)             406     455       630

GL Memory

For average behavior:

gst-launch-1.0 videotestsrc is-live=1 ! "video/x-raw,framerate=30/1,height=${H},width=${W}" ! queue ! nvvidconv ! videoconvert ! glupload ! "video/x-raw(memory:GLMemory)" ! queue ! glhtmloverlay url="https://www.clocktab.com/" enable-js=true web-refresh-rate=5 overlay-x=100 ! queue ! "video/x-raw(memory:GLMemory)"  ! gldownload ! queue ! perf print-cpu-load=true ! fakesink

For limit behavior:

gst-launch-1.0 videotestsrc num-buffers=1 pattern=ball ! "video/x-raw,format=RGBA,height=${H},width=${W}" ! imagefreeze ! queue ! glupload ! "video/x-raw(memory:GLMemory)" ! queue ! glhtmloverlay url="https://www.clocktab.com/" enable-js=true web-refresh-rate=5 overlay-x=100 ! queue ! "video/x-raw(memory:GLMemory)"  ! gldownload ! queue ! perf print-cpu-load=true ! fakesink

Results for average behavior

Resolution            720p   1080p   4K
Max framerate (fps)   30     30      23
CPU (%)               20     26      39
RAM (MiB)             147    168     301

Results for limit behavior

Resolution            720p      1080p     4K
Max framerate (fps)   450.771   311.757   125.448
CPU (%)               27        28        37
RAM (MiB)             378       448       679
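
To compare the two memory paths, the limit framerates above can be expressed as a ratio of the GL Memory result over the No GL Memory result at the same resolution. The following is a minimal sketch of that arithmetic using the published limit values. The function name speedup is hypothetical, and the ratio only describes these synthetic limit tests, not real camera workloads.

```shell
#!/bin/sh
# Ratio of the GL Memory limit framerate to the No GL Memory limit framerate.
# speedup is a hypothetical helper name used only for illustration.
# Usage: speedup GL_FPS NO_GL_FPS
speedup() {
    awk -v gl="$1" -v nogl="$2" 'BEGIN { printf "%.2f\n", gl / nogl }'
}

# Limit-behavior framerates copied from the tables on this page
echo "Orin Nano 720p:  $(speedup 793.2   375.123)x"
echo "Orin Nano 1080p: $(speedup 462.677 187.907)x"
echo "Orin Nano 4K:    $(speedup 146.421 51.722)x"
echo "Xavier NX 720p:  $(speedup 450.771 302.5)x"
echo "Xavier NX 1080p: $(speedup 311.757 159.685)x"
echo "Xavier NX 4K:    $(speedup 125.448 48.624)x"
```

On both platforms the ratio grows with resolution, which suggests that the GL Memory path pays off most at the larger frame sizes in these limit tests.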






  1. The lower results in the 20 W power mode are expected, since the CPU runs at a reduced clock of 1.4 GHz, compared to 1.9 GHz in the 10 W mode.