CUDA ISP for NVIDIA Jetson/Performance/Library: Difference between revisions
(27 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
<noinclude> | <noinclude> | ||
{{CUDA ISP for NVIDIA Jetson/Head|previous=|next=| | {{CUDA ISP for NVIDIA Jetson/Head|previous=Examples/GStreamer usage|next=Performance/GStreamer}} | ||
{{#seo: | |||
|title= | |||
|title_mode=replace | |||
|description=Review CUDA ISP API performance results for Jetson devices. Explore frame rate, CPU usage, GPU usage & RAM performance data. | |||
}} | |||
</noinclude> | </noinclude> | ||
{{DISPLAYTITLE:CUDA ISP | {{DISPLAYTITLE:CUDA ISP API performance|noerror}} | ||
== Library API performance == | |||
= Library API performance = | |||
To measure the CUDA ISP API performance, we built a simple example (provided upon request) that iterates over the <code>Apply</code> methods for each algorithm and records performance metrics for each iteration. We measured the duration of each algorithm's <code>Apply</code> method. We also measured CPU, CPU RAM, GPU, and GPU RAM usage for the complete processing pipeline iterating at 30fps. We ran the experiments on both 1080p and 4K buffers. We also ran the experiments on the Jetson Nano, Jetson Xavier NX, Jetson Xavier AGX, and Jetson AGX Orin. | To measure the CUDA ISP API performance, we built a simple example (provided upon request) that iterates over the <code>Apply</code> methods for each algorithm and records performance metrics for each iteration. We measured the duration of each algorithm's <code>Apply</code> method. We also measured CPU, CPU RAM, GPU, and GPU RAM usage for the complete processing pipeline iterating at 30fps. We ran the experiments on both 1080p and 4K buffers. We also ran the experiments on the Jetson Nano, Jetson Xavier NX, Jetson Xavier AGX, and Jetson AGX Orin. | ||
=== Software performance measurement tools === | |||
* We measured the duration of each <code>Apply</code> method separately using the <code>chrono</code> library. | * We measured the duration of each <code>Apply</code> method separately using the <code>chrono</code> library. | ||
* We used | * We mainly used <code>sys/times.h</code> library to obtain the CPU usage. However, we used the <code>proc/status</code> file to obtain a secondary verification measure. | ||
* We read the <code>/proc/self/status</code> file to obtain the CPU RAM usage. | * We read the <code>/proc/self/status</code> file to obtain the CPU RAM usage. | ||
* We used | * We used <code>tegrastats</code> to obtain the GPU usage. | ||
* We used <code>cudaMemGetInfo</code> from CUDA to measure GPU RAM usage. | * We used <code>cudaMemGetInfo</code> from CUDA to measure GPU RAM usage. | ||
Every measurement is averaged over 100 iterations. The iterations are timed to run at 30 iterations per second. | |||
* On the Jetson Nano, we used Jetpack 4.5.3 and MAXN Power Mode (NVP model 0) | === Hardware setup === | ||
* On the Jetson Nano, we used Jetpack 4.5.3 and 10W 4 Core MAXN Power Mode (NVP model 0) | |||
* On the Jetson Xavier NX, we used Jetpack 4.5.3 and 20W 6 Core Power Mode (NVP model 8) | * On the Jetson Xavier NX, we used Jetpack 4.5.3 and 20W 6 Core Power Mode (NVP model 8) | ||
* On the Jetson Xavier AGX, we used Jetpack 4.5.1 and 30W 8 Core Power Mode (NVP model 3) | * On the Jetson Xavier AGX, we used Jetpack 4.5.1 and 30W 8 Core Power Mode (NVP model 3) | ||
* On the Jetson AGX Orin, we used Jetpack 5.0.2 and | * On the Jetson AGX Orin, we used Jetpack 5.0.2 and 50W 12 Core Power Mode (NVP model 3) | ||
For each system, we also used <code>jetson_clocks</code> to maximise the device clock frequency and thus the performance. | |||
=== Results === | |||
The following table summarises CUDA ISP's performance results. | |||
<center> | <center> | ||
{| class="wikitable" style="text-align:center;" | {| class="wikitable" style="text-align:center;" | ||
|- | |- | ||
! style="text-align:left;" | | ! style="text-align:left;" | Algorithm | ||
! colspan="2" style="background-color:#ffd6a5;" | Jetson AGX Orin | ! colspan="2" style="background-color:#ffd6a5;" | Jetson AGX Orin | ||
! colspan="2" style="background-color:#ffadad;" | Jetson Xavier AGX | ! colspan="2" style="background-color:#ffadad;" | Jetson Xavier AGX | ||
Line 52: | Line 60: | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaShift | | style="text-align:left; font-weight:bold;" | CudaShift | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 0.82 | ||
| style="background-color:#ffd6a5;" | 1.52 | | style="background-color:#ffd6a5;" | 1.52 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 1.56 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 4.18 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.71 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 1.82 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 2.10 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 7.52 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaDebayer | | style="text-align:left; font-weight:bold;" | CudaDebayer | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 0.68 | ||
| style="background-color:#ffd6a5;" | 1.30 | | style="background-color:#ffd6a5;" | 1.30 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 1.93 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 5.74 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.79 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 2.13 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 2.40 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 8.75 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Gray World Algorithm) | | style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Gray World Algorithm) | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 0.84 | ||
| style="background-color:#ffd6a5;" | 1.66 | | style="background-color:#ffd6a5;" | 1.66 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 1.94 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 5.21 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.99 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 2.22 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 2.51 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 8.51 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Histogram Stretch Algorithm) | | style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Histogram Stretch Algorithm) | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 1.24 | ||
| style="background-color:#ffd6a5;" | 1.91 | | style="background-color:#ffd6a5;" | 1.91 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 2.33 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 6.87 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 1.11 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 2.71 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 3.18 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 11.06 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaColorSpaceConverter | | style="text-align:left; font-weight:bold;" | CudaColorSpaceConverter | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 0.91 | ||
| style="background-color:#ffd6a5;" | 1.60 | | style="background-color:#ffd6a5;" | 1.60 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 1.23 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 3.13 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.59 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 1.31 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 2.05 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 7.70 | ||
|- | |- | ||
Line 110: | Line 118: | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaShift | | style="text-align:left; font-weight:bold;" | CudaShift | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 1216 | ||
| style="background-color:#ffd6a5;" | 660 | | style="background-color:#ffd6a5;" | 660 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 641 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 239 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 1408 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 550 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 475 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 132 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaDebayer | | style="text-align:left; font-weight:bold;" | CudaDebayer | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 1479 | ||
| style="background-color:#ffd6a5;" | 771 | | style="background-color:#ffd6a5;" | 771 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 519 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 174 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 1259 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 469 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 415 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 114 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Gray World Algorithm) | | style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Gray World Algorithm) | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 1197 | ||
| style="background-color:#ffd6a5;" | 603 | | style="background-color:#ffd6a5;" | 603 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 515 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 191 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 1011 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 451 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 398 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 117 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Histogram Stretch Algorithm) | | style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Histogram Stretch Algorithm) | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 807 | ||
| style="background-color:#ffd6a5;" | 522 | | style="background-color:#ffd6a5;" | 522 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 429 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 145 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 902 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 368 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 314 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 90 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaColorSpaceConverter | | style="text-align:left; font-weight:bold;" | CudaColorSpaceConverter | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 1104 | ||
| style="background-color:#ffd6a5;" | 623 | | style="background-color:#ffd6a5;" | 623 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 814 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 319 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 1697 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 761 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 487 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 129 | ||
|- | |- | ||
Line 168: | Line 176: | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaShift | | style="text-align:left; font-weight:bold;" | CudaShift | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 0.198 | ||
| style="background-color:#ffd6a5;" | 0. | | style="background-color:#ffd6a5;" | 0.218 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 0.285 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 0.255 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.287 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.356 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 0.800 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 0.817 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaDebayer | | style="text-align:left; font-weight:bold;" | CudaDebayer | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 0.121 | ||
| style="background-color:#ffd6a5;" | 0. | | style="background-color:#ffd6a5;" | 0.161 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 0.238 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 0.237 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.263 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.280 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 0.873 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 0.665 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Gray World Algorithm) | | style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Gray World Algorithm) | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 0.201 | ||
| style="background-color:#ffd6a5;" | 0. | | style="background-color:#ffd6a5;" | 0.277 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 0.338 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 0.316 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.443 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.471 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 1.286 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 1.299 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Histogram Stretch Algorithm) | | style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Histogram Stretch Algorithm) | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 0.260 | ||
| style="background-color:#ffd6a5;" | 0. | | style="background-color:#ffd6a5;" | 0.280 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 0.351 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 0.341 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.527 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.442 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 1.569 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 1.454 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaColorSpaceConverter | | style="text-align:left; font-weight:bold;" | CudaColorSpaceConverter | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 0.172 | ||
| style="background-color:#ffd6a5;" | 0. | | style="background-color:#ffd6a5;" | 0.201 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 0.246 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 0.237 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.278 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 0.251 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 0.647 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 0.680 | ||
|- | |- | ||
Line 226: | Line 234: | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaShift | | style="text-align:left; font-weight:bold;" | CudaShift | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 89.7 | ||
| style="background-color:#ffd6a5;" | 90.8 | | style="background-color:#ffd6a5;" | 90.8 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 90.0 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 91.5 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 89.9 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 90.0 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 77.5 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 88.7 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaDebayer | | style="text-align:left; font-weight:bold;" | CudaDebayer | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 44.4 | ||
| style="background-color:#ffd6a5;" | 44.5 | | style="background-color:#ffd6a5;" | 44.5 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 41.8 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 42.2 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 41.7 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 42.9 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 33.0 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 33.4 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Gray World Algorithm) | | style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Gray World Algorithm) | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 99.8 | ||
| style="background-color:#ffd6a5;" | 99.1 | | style="background-color:#ffd6a5;" | 99.1 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 108.5 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 107.4 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 107.6 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 108.1 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 40.6 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 77.5 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Histogram Stretch Algorithm) | | style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Histogram Stretch Algorithm) | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 99.2 | ||
| style="background-color:#ffd6a5;" | 99.3 | | style="background-color:#ffd6a5;" | 99.3 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 107.7 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 108.6 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 108.1 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 107.5 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 52.1 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 33.0 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaColorSpaceConverter | | style="text-align:left; font-weight:bold;" | CudaColorSpaceConverter | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 44.1 | ||
| style="background-color:#ffd6a5;" | 44.3 | | style="background-color:#ffd6a5;" | 44.3 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 42.0 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 42.1 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 42.1 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 42.0 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 33.2 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 89.1 | ||
|- | |- | ||
Line 284: | Line 292: | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaShift | | style="text-align:left; font-weight:bold;" | CudaShift | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 12.29 | ||
| style="background-color:#ffd6a5;" | 7.81 | | style="background-color:#ffd6a5;" | 7.81 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 18.03 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 13.00 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 11.59 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 4.23 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 85.32 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 25.96 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaDebayer | | style="text-align:left; font-weight:bold;" | CudaDebayer | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 11.60 | ||
| style="background-color:#ffd6a5;" | 13.04 | | style="background-color:#ffd6a5;" | 13.04 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 28.42 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 29.00 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 7.28 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 10.38 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 67.27 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 42.53 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Gray World Algorithm) | | style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Gray World Algorithm) | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 9.75 | ||
| style="background-color:#ffd6a5;" | 19.27 | | style="background-color:#ffd6a5;" | 19.27 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 17.22 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 25.54 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 4.70 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 15.15 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 20.24 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 75.14 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Histogram Stretch Algorithm) | | style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Histogram Stretch Algorithm) | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 8.89 | ||
| style="background-color:#ffd6a5;" | 24.42 | | style="background-color:#ffd6a5;" | 24.42 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 17.00 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 24.56 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 5.36 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 17.11 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 26.84 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 83.35 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaColorSpaceConverter | | style="text-align:left; font-weight:bold;" | CudaColorSpaceConverter | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 4.85 | ||
| style="background-color:#ffd6a5;" | 9.14 | | style="background-color:#ffd6a5;" | 9.14 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 13.47 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 24.68 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 4.64 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 15.36 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 20.11 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 81.19 | ||
|- | |- | ||
Line 342: | Line 350: | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaShift | | style="text-align:left; font-weight:bold;" | CudaShift | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 26.8 | ||
| style="background-color:#ffd6a5;" | 28.4 | | style="background-color:#ffd6a5;" | 28.4 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 43.0 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 85.7 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 42.9 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 63.7 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 41.6 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 43.3 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaDebayer | | style="text-align:left; font-weight:bold;" | CudaDebayer | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 6.0 | ||
| style="background-color:#ffd6a5;" | 6.1 | | style="background-color:#ffd6a5;" | 6.1 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 10.7 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 10.3 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 9.8 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 11.2 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 11.9 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 11.8 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Gray World Algorithm) | | style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Gray World Algorithm) | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 33.7 | ||
| style="background-color:#ffd6a5;" | 32.7 | | style="background-color:#ffd6a5;" | 32.7 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 57.5 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 61.6 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 56.8 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 59.6 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 40.6 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 31.5 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Histogram Stretch Algorithm) | | style="text-align:left; font-weight:bold;" | CudaWhiteBalancer (Histogram Stretch Algorithm) | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 33.2 | ||
| style="background-color:#ffd6a5;" | 33.4 | | style="background-color:#ffd6a5;" | 33.4 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 56.8 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 66.3 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 57.1 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 62.2 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 52.1 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 52.4 | ||
|- style="text-align:right;" | |- style="text-align:right;" | ||
| style="text-align:left; font-weight:bold;" | CudaColorSpaceConverter | | style="text-align:left; font-weight:bold;" | CudaColorSpaceConverter | ||
| style="background-color:#ffd6a5;" | | | style="background-color:#ffd6a5;" | 5.4 | ||
| style="background-color:#ffd6a5;" | 6.0 | | style="background-color:#ffd6a5;" | 6.0 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 9.9 | ||
| style="background-color:#ffadad;" | | | style="background-color:#ffadad;" | 10.4 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 10.0 | ||
| style="background-color:#c8ffc6;" | | | style="background-color:#c8ffc6;" | 10.3 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 11.2 | ||
| style="background-color:#a0c4ff;" | | | style="background-color:#a0c4ff;" | 11.8 | ||
|} | |} | ||
Line 399: | Line 407: | ||
<noinclude> | <noinclude> | ||
{{CUDA ISP for NVIDIA Jetson/Foot||}} | {{CUDA ISP for NVIDIA Jetson/Foot|Examples/GStreamer usage|Performance/GStreamer}} | ||
</noinclude> | </noinclude> |
Latest revision as of 16:49, 30 September 2024
Library API performance
To measure the CUDA ISP API performance, we built a simple example (provided upon request) that iterates over the Apply methods for each algorithm and records performance metrics for each iteration. We measured the duration of each algorithm's Apply method. We also measured CPU, CPU RAM, GPU, and GPU RAM usage for the complete processing pipeline iterating at 30 fps. We ran the experiments on both 1080p and 4K buffers. We also ran the experiments on the Jetson Nano, Jetson Xavier NX, Jetson Xavier AGX, and Jetson AGX Orin.
Software performance measurement tools
- We measured the duration of each Apply method separately using the chrono library.
- We mainly used the sys/times.h library to obtain the CPU usage. However, we used the proc/status file to obtain a secondary verification measure.
- We read the /proc/self/status file to obtain the CPU RAM usage.
- We used tegrastats to obtain the GPU usage.
- We used cudaMemGetInfo from CUDA to measure GPU RAM usage.
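The CPU-side measurements listed above can be sketched in C++ roughly as follows. This is a minimal sketch, not the benchmark we ran: FakeApply is a hypothetical stand-in for an algorithm's Apply method, all helper names are illustrative, and reading the VmRSS field of /proc/self/status is an assumption about which field represents CPU RAM usage. The CPU percentage here is relative to a single core; whether the reported figures are normalised by core count is not stated.

```cpp
#include <chrono>
#include <fstream>
#include <sstream>
#include <string>
#include <sys/times.h>

// Hypothetical stand-in for one algorithm's Apply call.
inline void FakeApply() {
  volatile double acc = 0.0;
  for (int i = 0; i < 100000; ++i) acc = acc + i * 0.5;
}

// Wall-clock duration of a single call, in milliseconds (chrono).
template <typename F>
double TimeMs(F&& fn) {
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count();
}

// One times() sample: elapsed wall ticks and this process's CPU ticks.
struct CpuSnapshot {
  clock_t wall;  // return value of times()
  clock_t busy;  // user + system ticks of this process
};

inline CpuSnapshot TakeCpuSnapshot() {
  struct tms t;
  const clock_t wall = times(&t);
  return {wall, t.tms_utime + t.tms_stime};
}

// CPU usage (%) of this process between two snapshots (sys/times.h).
inline double CpuUsagePercent(const CpuSnapshot& a, const CpuSnapshot& b) {
  const double wall = static_cast<double>(b.wall - a.wall);
  if (wall <= 0.0) return 0.0;
  return 100.0 * static_cast<double>(b.busy - a.busy) / wall;
}

// Extracts the VmRSS value (kB) from /proc/self/status-style text.
// Returns -1 if the field is missing. VmRSS is an assumed choice of field.
inline long ParseVmRssKb(std::istream& in) {
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("VmRSS:", 0) == 0) {
      std::istringstream fields(line.substr(6));
      long kb = -1;
      fields >> kb;
      return kb;
    }
  }
  return -1;
}

// Resident memory of the current process in kB, or -1 if unavailable.
inline long ReadVmRssKb() {
  std::ifstream status("/proc/self/status");
  return ParseVmRssKb(status);
}
```

A typical use would take one CpuSnapshot before and one after the measurement window, call TimeMs(FakeApply) inside it, and read ReadVmRssKb() at the end. Both snapshot fields are in the same clock-tick unit, so the ratio needs no conversion to seconds.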
Every measurement is averaged over 100 iterations. The iterations are timed to run at 30 iterations per second.
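One possible way to pace and average the iterations is sketched below. It assumes a fixed-period schedule driven by std::this_thread::sleep_until; the function name AveragePacedMs and its parameters are illustrative and are not part of the CUDA ISP API or of our benchmark code.

```cpp
#include <chrono>
#include <thread>

// Runs `work` `iterations` times, starting one iteration every 1/rate_hz
// seconds, and returns the mean duration of `work` in milliseconds.
template <typename F>
double AveragePacedMs(F&& work, int iterations, double rate_hz) {
  using Clock = std::chrono::steady_clock;
  const auto period = std::chrono::duration_cast<Clock::duration>(
      std::chrono::duration<double>(1.0 / rate_hz));
  auto next_start = Clock::now();
  double total_ms = 0.0;
  for (int i = 0; i < iterations; ++i) {
    const auto start = Clock::now();
    work();
    total_ms +=
        std::chrono::duration<double, std::milli>(Clock::now() - start).count();
    next_start += period;
    // Idle until the next slot; returns immediately if work overran it.
    std::this_thread::sleep_until(next_start);
  }
  return iterations > 0 ? total_ms / iterations : 0.0;
}
```

With the settings described above, the call would be AveragePacedMs(work, 100, 30.0), which takes a little over three seconds per measurement. Scheduling against absolute deadlines rather than sleeping for a fixed interval keeps the rate from drifting when an iteration runs long.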
Hardware setup
- On the Jetson Nano, we used Jetpack 4.5.3 and 10W 4 Core MAXN Power Mode (NVP model 0)
- On the Jetson Xavier NX, we used Jetpack 4.5.3 and 20W 6 Core Power Mode (NVP model 8)
- On the Jetson Xavier AGX, we used Jetpack 4.5.1 and 30W 8 Core Power Mode (NVP model 3)
- On the Jetson AGX Orin, we used Jetpack 5.0.2 and 50W 12 Core Power Mode (NVP model 3)
For each system, we also used jetson_clocks to maximise the device clock frequency and thus the performance.
Results
The following table summarises CUDA ISP's performance results.
Algorithm | Jetson AGX Orin (1080p) | Jetson AGX Orin (4K) | Jetson Xavier AGX (1080p) | Jetson Xavier AGX (4K) | Jetson Xavier NX (1080p) | Jetson Xavier NX (4K) | Jetson Nano (1080p) | Jetson Nano (4K) |
---|---|---|---|---|---|---|---|---|
Duration (ms) | ||||||||
CudaShift | 0.82 | 1.52 | 1.56 | 4.18 | 0.71 | 1.82 | 2.10 | 7.52 |
CudaDebayer | 0.68 | 1.30 | 1.93 | 5.74 | 0.79 | 2.13 | 2.40 | 8.75 |
CudaWhiteBalancer (Gray World Algorithm) | 0.84 | 1.66 | 1.94 | 5.21 | 0.99 | 2.22 | 2.51 | 8.51 |
CudaWhiteBalancer (Histogram Stretch Algorithm) | 1.24 | 1.91 | 2.33 | 6.87 | 1.11 | 2.71 | 3.18 | 11.06 |
CudaColorSpaceConverter | 0.91 | 1.60 | 1.23 | 3.13 | 0.59 | 1.31 | 2.05 | 7.70 |
Framerate (fps) | ||||||||
CudaShift | 1216 | 660 | 641 | 239 | 1408 | 550 | 475 | 132 |
CudaDebayer | 1479 | 771 | 519 | 174 | 1259 | 469 | 415 | 114 |
CudaWhiteBalancer (Gray World Algorithm) | 1197 | 603 | 515 | 191 | 1011 | 451 | 398 | 117 |
CudaWhiteBalancer (Histogram Stretch Algorithm) | 807 | 522 | 429 | 145 | 902 | 368 | 314 | 90 |
CudaColorSpaceConverter | 1104 | 623 | 814 | 319 | 1697 | 761 | 487 | 129 |
CPU usage (%) | ||||||||
CudaShift | 0.198 | 0.218 | 0.285 | 0.255 | 0.287 | 0.356 | 0.800 | 0.817 |
CudaDebayer | 0.121 | 0.161 | 0.238 | 0.237 | 0.263 | 0.280 | 0.873 | 0.665 |
CudaWhiteBalancer (Gray World Algorithm) | 0.201 | 0.277 | 0.338 | 0.316 | 0.443 | 0.471 | 1.286 | 1.299 |
CudaWhiteBalancer (Histogram Stretch Algorithm) | 0.260 | 0.280 | 0.351 | 0.341 | 0.527 | 0.442 | 1.569 | 1.454 |
CudaColorSpaceConverter | 0.172 | 0.201 | 0.246 | 0.237 | 0.278 | 0.251 | 0.647 | 0.680 |
CPU RAM (MB) | ||||||||
CudaShift | 89.7 | 90.8 | 90.0 | 91.5 | 89.9 | 90.0 | 77.5 | 88.7 |
CudaDebayer | 44.4 | 44.5 | 41.8 | 42.2 | 41.7 | 42.9 | 33.0 | 33.4 |
CudaWhiteBalancer (Gray World Algorithm) | 99.8 | 99.1 | 108.5 | 107.4 | 107.6 | 108.1 | 40.6 | 77.5 |
CudaWhiteBalancer (Histogram Stretch Algorithm) | 99.2 | 99.3 | 107.7 | 108.6 | 108.1 | 107.5 | 52.1 | 33.0 |
CudaColorSpaceConverter | 44.1 | 44.3 | 42.0 | 42.1 | 42.1 | 42.0 | 33.2 | 89.1 |
GPU usage (%) | ||||||||
CudaShift | 12.29 | 7.81 | 18.03 | 13.00 | 11.59 | 4.23 | 85.32 | 25.96 |
CudaDebayer | 11.60 | 13.04 | 28.42 | 29.00 | 7.28 | 10.38 | 67.27 | 42.53 |
CudaWhiteBalancer (Gray World Algorithm) | 9.75 | 19.27 | 17.22 | 25.54 | 4.70 | 15.15 | 20.24 | 75.14 |
CudaWhiteBalancer (Histogram Stretch Algorithm) | 8.89 | 24.42 | 17.00 | 24.56 | 5.36 | 17.11 | 26.84 | 83.35 |
CudaColorSpaceConverter | 4.85 | 9.14 | 13.47 | 24.68 | 4.64 | 15.36 | 20.11 | 81.19 |
GPU RAM (MB) | ||||||||
CudaShift | 26.8 | 28.4 | 43.0 | 85.7 | 42.9 | 63.7 | 41.6 | 43.3 |
CudaDebayer | 6.0 | 6.1 | 10.7 | 10.3 | 9.8 | 11.2 | 11.9 | 11.8 |
CudaWhiteBalancer (Gray World Algorithm) | 33.7 | 32.7 | 57.5 | 61.6 | 56.8 | 59.6 | 40.6 | 31.5 |
CudaWhiteBalancer (Histogram Stretch Algorithm) | 33.2 | 33.4 | 56.8 | 66.3 | 57.1 | 62.2 | 52.1 | 52.4 |
CudaColorSpaceConverter | 5.4 | 6.0 | 9.9 | 10.4 | 10.0 | 10.3 | 11.2 | 11.8 |
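The framerate rows appear to be close to the reciprocal of the corresponding durations. For example, 0.82 ms corresponds to about 1220 fps, against the 1216 fps reported for CudaShift on the Jetson AGX Orin at 1080p. The small differences are presumably due to averaging. The sketch below shows that conversion, along with the share of a 30 fps frame period that a single Apply call would occupy. Both helper names are illustrative, and the frame-budget figure is a derived quantity, not a measured one.

```cpp
// Frames per second implied by a per-frame duration in milliseconds.
constexpr double FpsFromMs(double duration_ms) {
  return 1000.0 / duration_ms;
}

// Share (%) of one frame period used by a call of `duration_ms`
// when the pipeline runs at `rate_hz` frames per second.
constexpr double FrameBudgetPercent(double duration_ms, double rate_hz) {
  return 100.0 * duration_ms * rate_hz / 1000.0;
}
```

For instance, the slowest entry in the table (11.06 ms, histogram stretch white balance at 4K on the Jetson Nano) would take roughly a third of the 33.3 ms available per frame at 30 fps.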