GstCUDA - Performance Profiling
Latest revision as of 18:58, 15 September 2020
This page presents GstCUDA performance profiling results.
Glass to glass latency
This page contains the glass to glass latency measurement results of simple GstCUDA capture and display pipelines on a TX2. It covers all the possible GstCUDA (cudafilter and cudamux) configurations and use cases.
All the measurements were taken with the TX2 in high-performance mode, which was set by running the following commands:
sudo nvpmodel -m 0
# Reboot after running it, so changes can take effect.
reboot
sudo ~/jetson_clocks
Jetpack 3.3 - IMX274 camera 4K@60fps glass to glass latency
Simple Capture to Display pipeline (without GstCUDA)
This measurement serves as the reference against which the glass to glass latency of the GstCUDA pipelines below should be compared.
- Glass to Glass latency: 112.2042693 ms ---> (59.9252609 ms with tuned/optimized pipeline)
Test pipeline:
gst-launch-1.0 -v nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
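To put the reference figure in context, a latency can be expressed as a number of frame periods at the 60 fps capture rate used throughout this page. The following is a minimal sketch; the helper name `latency_in_frames` is illustrative and not part of GstCUDA.

```python
FRAME_RATE_FPS = 60  # capture rate used by every pipeline on this page


def latency_in_frames(latency_ms, fps=FRAME_RATE_FPS):
    """Convert a latency in milliseconds into a number of frame periods."""
    frame_period_ms = 1000.0 / fps  # about 16.67 ms at 60 fps
    return latency_ms / frame_period_ms


# Reference latencies (ms) copied from the measurement above.
reference_ms = 112.2042693
tuned_reference_ms = 59.9252609

print(f"reference: {latency_in_frames(reference_ms):.2f} frame periods")
print(f"tuned:     {latency_in_frames(tuned_reference_ms):.2f} frame periods")
```

Expressing latency in frame periods makes it easier to see how many frames are in flight between the sensor and the display.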
Cudafilter
NVMM Direct Handling
In-place:True
- Glass to Glass latency: 178.9331237 ms
Test pipeline:
gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
In-place:False
- Glass to Glass latency: 230.3850304 ms
Test pipeline:
gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
Unified Memory Allocator
In-place:True
- Glass to Glass latency: 188.1192285 ms
Test pipeline:
gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
In-place:False
- Glass to Glass latency: 306.2578894 ms
Test pipeline:
gst-launch-1.0 nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! cudafilter in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.so ! nvvidconv ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
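The cudafilter results can be compared against the reference pipeline by subtracting the reference latency. The sketch below does this with the figures listed on this page; the dictionary layout and the helper `overhead_ms` are assumptions made for illustration only.

```python
# Latencies (ms) copied from the measurements on this page.
REFERENCE_MS = 112.2042693  # capture to display pipeline without GstCUDA

# Keys are (memory handling, in-place flag).
CUDAFILTER_MS = {
    ("NVMM direct handling", True): 178.9331237,
    ("NVMM direct handling", False): 230.3850304,
    ("Unified memory allocator", True): 188.1192285,
    ("Unified memory allocator", False): 306.2578894,
}


def overhead_ms(latency_ms, reference_ms=REFERENCE_MS):
    """Latency added on top of the reference pipeline, in milliseconds."""
    return latency_ms - reference_ms


for (memory, in_place), latency in CUDAFILTER_MS.items():
    print(f"{memory}, in-place={in_place}: +{overhead_ms(latency):.1f} ms")
```

With these numbers, the in-place configurations add the least latency for both memory handling modes.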
Cudamux
NVMM Direct Handling
In-place:True
- Glass to Glass latency: 145.5713375 ms ---> (75.4149314 ms with tuned/optimized pipeline)
Test pipeline:
gst-launch-1.0 -v cudamux name=cuda in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
In-place:False
- Glass to Glass latency: 332.9231919 ms ---> (112.3744414 ms with tuned/optimized pipeline)
Test pipeline:
gst-launch-1.0 -v cudamux name=cuda in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=NV12,framerate=60/1" ! nvvidconv ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
Unified Memory Allocator
In-place:True
- Glass to Glass latency: 136.4211149 ms ---> (118.3355796 ms with tuned/optimized pipeline)
Test pipeline:
gst-launch-1.0 -v cudamux name=cuda in-place=true location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
In-place:False
- Glass to Glass latency: 197.1957698 ms ---> (197.1957698 ms with tuned/optimized pipeline)
Test pipeline:
gst-launch-1.0 -v cudamux name=cuda in-place=false location=/home/nvidia/gst-cuda/tests/examples/cudamux_algorithms/mixer/mixer.so nvcamerasrc queue-size=10 sensor-id=1 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_0 nvcamerasrc queue-size=10 sensor-id=2 fpsRange='60 60' ! "video/x-raw(memory:NVMM),width=3840,height=2160,format=I420,framerate=60/1" ! nvvidconv ! "video/x-raw,width=3840,height=2160,format=I420,framerate=60/1" ! queue max-size-buffers=3 leaky=2 ! cuda.sink_1 cuda. ! perf print-arm-load=true ! nvoverlaysink enable-last-sample=false
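For the cudamux results, the effect of the tuned/optimized pipeline can be summarized as a percentage reduction of the listed latency. The following sketch uses only the figures reported above; this page does not describe what the tuning consists of, and the helper `tuning_reduction_pct` is a hypothetical name.

```python
# (measured latency, tuned latency) in ms, copied from this page.
# Keys are (memory handling, in-place flag).
CUDAMUX_MS = {
    ("NVMM direct handling", True): (145.5713375, 75.4149314),
    ("NVMM direct handling", False): (332.9231919, 112.3744414),
    ("Unified memory allocator", True): (136.4211149, 118.3355796),
    ("Unified memory allocator", False): (197.1957698, 197.1957698),
}


def tuning_reduction_pct(measured_ms, tuned_ms):
    """Percentage by which the tuned pipeline lowers the measured latency."""
    return 100.0 * (measured_ms - tuned_ms) / measured_ms


for (memory, in_place), (measured, tuned) in CUDAMUX_MS.items():
    reduction = tuning_reduction_pct(measured, tuned)
    print(f"{memory}, in-place={in_place}: {reduction:.1f}% lower when tuned")
```

Note that the last configuration reports the same value for both measurements, so its computed reduction is zero.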