GstCUDA - Performance Profiling: Difference between revisions

Latest revision as of 18:58, 15 September 2020

This page shows GstCUDA performance profiling.

Problems running the pipelines shown on this page? Please see our GStreamer Debugging guide for help.

Glass to glass latency

This wiki contains the glass to glass latency measurement results of GstCUDA simple capture and display pipelines on a TX2. It contains the results for all the possible GstCUDA (cudafilter and cudamux) configurations and uses cases.

All the measurements were taken using the TX2 on the high-performance mode by running the following commands:

sudo nvpmodel -m 0 #Reboot after running it, so changes can take effect.
reboot
sudo ~/jetson_clocks

Jetpack 3.3 - IMX274 camera 4K@60fps glass to glass latency

Simple Capture to Display pipeline (without GstCUDA)

This measurement should be used as a reference to compare the glass to glass latency of the below pipelines with GstCUDA.

Glass to Glass latency: 112.2042693 ms ---> (59.9252609 ms with tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v  nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

Cudafilter

NVMM Direct Handling

In-place:True

Glass to Glass latency: 178.9331237 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

In-place:False

Glass to Glass latency: 230.3850304 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

Unified Memory Allocator

In-place:True

Glass to Glass latency: 188.1192285 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

In-place:False

Glass to Glass latency: 306.2578894 ms

Test pipeline:

gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! nvvidconv ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

Cudamux

NVMM Direct Handling

In-place:True

Glass to Glass latency: 145.5713375 ms ---> (75.4149314 ms with tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

In-place:False

Glass to Glass latency: 332.9231919 ms ---> (112.3744414 ms with tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

Unified Memory Allocator

In-place:True

Glass to Glass latency: 136.4211149 ms ---> (118.3355796 ms with tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

In-place:False

Glass to Glass latency: 197.1957698 ms ---> (197.1957698 ms with tuned/optimized pipeline)

Test pipeline:

gst-launch-1.0 -v cudamux name=cuda in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false

Previous: Example - opencvfilter

Index

Next: Contact Us

@@ Line 1: / Line 1: @@
-{{GstCUDA/Head|previous=Example 2: cudadebayer|next=Contact Us|keywords=GstCUDA add-ons,GstCUDA framework}}
+{{GstCUDA/Head|previous=Example - opencvfilter|next=Contact Us|keywords=GstCUDA add-ons,GstCUDA framework}}
 This page shows GstCUDA performance profiling.
@@ Line 13: / Line 13: @@
 == Glass to glass latency ==
-This wiki contains the glass to glass latency measurements results of GstCUDA simple capture and display pipelines on a TX2. It contains the results for all the possible GstCUDA (cudafilter and cudamux) configurations and uses cases.
+This wiki contains the glass to glass latency measurement results of GstCUDA simple capture and display pipelines on a TX2. It contains the results for all the possible GstCUDA (cudafilter and cudamux) configurations and uses cases.
 All the measurements were taken using the TX2 on the high-performance mode by running the following commands:
@@ Line 88: / Line 88: @@
-{{GstCUDA/Foot|previous=Example 2: cudadebayer|next=Contact Us}}
+{{GstCUDA/Foot|previous=Example - opencvfilter|next=Contact Us}}