CUDA ISP for NVIDIA Jetson/Performance

From RidgeRun Developer Wiki
Revision as of 13:53, 24 March 2023








Library API performance

To measure the CUDA ISP API performance, we built a simple example that iterates over the apply methods and records performance metrics on each iteration. We recorded the duration of each apply method, the CPU and GPU usage while the code ran, and the CPU RAM and GPU RAM usage. The measurements were taken on a Jetson Nano, Jetson Xavier NX, Jetson Xavier AGX, and Jetson Orin, over three buffer sizes:

  • A minimum 2x2 case, to test the maximum speeds that the apply methods could achieve
  • A medium 1920x1080 case, to illustrate the changes in performance as the buffer size increases
  • A maximum 3840x2160 case, to test performance on large buffers
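One possible way to collect GPU and RAM usage figures of this kind on a Jetson is to log tegrastats output while the example runs and average it afterwards. The sketch below is an assumption, not the method used for the tables on this page: the helper name avg_gpu_ram is hypothetical, and it assumes tegrastats-style lines containing "RAM used/totalMB" and "GR3D_FREQ N%" fields (tegrastats reports RAM in MB, not kB).

```shell
# Average GPU load (GR3D_FREQ) and used RAM from tegrastats-style lines on stdin.
# Assumed line layout: "... RAM 1900/3956MB ... GR3D_FREQ 40% ..."
avg_gpu_ram() {
  awk '{
    for (i = 1; i < NF; i++) {
      # The field after "RAM" looks like "1900/3956MB"; keep the used part.
      if ($i == "RAM")       { split($(i + 1), r, "/"); ram += r[1]; nr++ }
      # The field after "GR3D_FREQ" looks like "40%" or "40%@921"; keep the load.
      if ($i == "GR3D_FREQ") { split($(i + 1), g, "%"); gpu += g[1]; ng++ }
    }
  }
  END { if (nr && ng) printf "gpu=%.1f%% ram=%.1fMB\n", gpu / ng, ram / nr }'
}

# Hypothetical usage: avg_gpu_ram < tegrastats.log
```

The log file name is illustrative; any capture of tegrastats lines taken during the 100 iterations would serve as input.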

Jetson Nano

Processing Time

Processing time (in microseconds, averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers
cudashift | 396 | |
cudadebayer | 0.002522 | |
cudawhitebalancer | | |
cudacolorspaceconverter | | |

CPU and CPU RAM usage

Measurement (averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers
CPU usage (%) | 396 | |
CPU RAM usage (kB) | 0.002522 | |

GPU and GPU RAM usage

Measurement (averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers
GPU usage (%) | 396 | |
GPU RAM usage (kB) | 0.002522 | |

Jetson Xavier NX

Processing Time

Processing time (in microseconds, averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers
cudashift | 396 | |
cudadebayer | 0.002522 | |
cudawhitebalancer | | |
cudacolorspaceconverter | | |

CPU and CPU RAM usage

Measurement (averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers
CPU usage (%) | 396 | |
CPU RAM usage (kB) | 0.002522 | |

GPU and GPU RAM usage

Measurement (averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers
GPU usage (%) | 396 | |
GPU RAM usage (kB) | 0.002522 | |

Jetson Xavier AGX

Processing Time

Processing time (in microseconds, averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers
cudashift | 396 | |
cudadebayer | 0.002522 | |
cudawhitebalancer | | |
cudacolorspaceconverter | | |

CPU and CPU RAM usage

Measurement (averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers
CPU usage (%) | 396 | |
CPU RAM usage (kB) | 0.002522 | |

GPU and GPU RAM usage

Measurement (averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers
GPU usage (%) | 396 | |
GPU RAM usage (kB) | 0.002522 | |

Jetson Orin

Processing Time

Processing time (in microseconds, averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers
cudashift | 396 | |
cudadebayer | 0.002522 | |
cudawhitebalancer | | |
cudacolorspaceconverter | | |

CPU and CPU RAM usage

Measurement (averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers
CPU usage (%) | 396 | |
CPU RAM usage (kB) | 0.002522 | |

GPU and GPU RAM usage

Measurement (averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers
GPU usage (%) | 396 | |
GPU RAM usage (kB) | 0.002522 | |

GStreamer elements performance

To measure performance, we used two of our GStreamer tools: GstShark and GstPerf.

The tests were run under the following conditions:

  • Maximum performance mode enabled: all cores and Jetson clocks enabled.
  • JetPack 4.6.
  • FPS is computed as 1 / processing time, with the processing time in seconds.
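The relation between FPS and processing time can be checked with a one-line calculation. The helper below is only an illustration (the function name fps_from_proctime is hypothetical) and takes the average processing time in seconds.

```shell
# Convert an average processing time in seconds to frames per second.
fps_from_proctime() {
  awk -v t="$1" 'BEGIN { if (t > 0) printf "%.1f\n", 1 / t }'
}

# Processing time reported for cudashift on the Jetson Xavier NX.
fps_from_proctime 0.002522   # prints 396.5
```

The results tables on this page report whole-number FPS values, so 396.5 appears there as 396.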

Jetson Xavier NX

In the following sections you will see the performance for each of the elements.

cudashift element

The following pipeline was used to measure the processing time and FPS for the cudashift element with an input image with 4K resolution coming from a camera sensor.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, format=rggb' ! cudashift shift=5 ! fakesink

The results obtained:

Measurement (Average) | Jetson Xavier NX
FPS | 396
Processing time (seconds) | 0.002522
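An average like the one above could be derived from a saved tracer log along the following lines. This is a sketch under assumptions: the helper name avg_proctime is hypothetical, and it assumes the proctime tracer writes lines of the form "proctime, element=(string)NAME, time=(string)H:MM:SS.NNNNNNNNN;" to the GStreamer debug log. Setting the standard GStreamer variables GST_DEBUG_FILE and GST_DEBUG_NO_COLOR=1 when running the pipeline is one way to capture such a log in a file.

```shell
# Average the proctime entries of one element from a GStreamer debug log on stdin.
# $1: element name without its numeric suffix (for example cudashift).
avg_proctime() {
  # Keep only the H:MM:SS.NNNNNNNNN time value of matching proctime lines.
  sed -n "s/.*proctime, element=(string)$1[0-9]*, time=(string)\([0-9:.]*\);.*/\1/p" |
    # Convert each time to seconds and print the mean with microsecond precision.
    awk -F: '{ sum += $1 * 3600 + $2 * 60 + $3; n++ }
             END { if (n) printf "%.6f\n", sum / n }'
}

# Hypothetical usage: avg_proctime cudashift < proctime.log
```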








cudadebayer element

RGB Output

The following pipeline was used to measure the processing time and FPS for the cudadebayer element with an input image with 4K resolution coming from a camera sensor to an RGB output image.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! fakesink

The results obtained:

Measurement (Average) | Jetson Xavier NX
FPS | 228
Processing time (seconds) | 0.004389








I420 Output

The following pipeline was used to measure the processing time and FPS for the cudadebayer element with an input image with 4K resolution coming from a camera sensor to an I420 output image.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! 'video/x-raw, format=I420' ! fakesink

The results obtained:

Measurement (Average) | Jetson Xavier NX
FPS | 187
Processing time (seconds) | 0.005353








cudaawb element

RGB Output

The following pipeline was used to measure the processing time and FPS for the cudaawb element with an input image with 4K resolution coming from a camera sensor to an RGB output image.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! fakesink

The results obtained:

Measurement (Average) | Jetson Xavier NX
FPS | 370
Processing time (seconds) | 0.002698








I420 Output

The following pipeline was used to measure the processing time and FPS for the cudaawb element with an input image with 4K resolution coming from a camera sensor to an I420 output image.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink

The results obtained:

Measurement (Average) | Jetson Xavier NX
FPS | 202
Processing time (seconds) | 0.004952








Jetson Xavier AGX

In the following sections you will see the performance for each of the elements.

cudadebayer

RGB Output

The following pipeline was used to measure the processing time and FPS for the cudadebayer element with an input image with 1920x1200 resolution coming from a camera sensor to an RGB output image.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve rrv4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! 'video/x-raw, format=RGB' ! fakesink

The results obtained:

Measurement (Average) | Jetson Xavier AGX
FPS | 539
Processing time (seconds) | 0.001854








I420 Output

The following pipeline was used to measure the processing time and FPS for the cudadebayer element with an input image with 1920x1200 resolution coming from a camera sensor to an I420 output image.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve rrv4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! 'video/x-raw, format=I420' ! fakesink

The results obtained:

Measurement (Average) | Jetson Xavier AGX
FPS | 458
Processing time (seconds) | 0.002183








cudaawb

RGB Output

The following pipeline was used to measure the processing time and FPS for the cudaawb element with an input image with 1920x1200 resolution coming from a camera sensor to an RGB output image.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve rrv4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=RGB' ! fakesink

The results obtained:

Measurement (Average) | Jetson Xavier AGX
FPS | 752
Processing time (seconds) | 0.001329








I420 Output

The following pipeline was used to measure the processing time and FPS for the cudaawb element with an input image with 1920x1200 resolution coming from a camera sensor to an I420 output image.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve rrv4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink

The results obtained:

Measurement (Average) | Jetson Xavier AGX
FPS | 473
Processing time (seconds) | 0.002111








More cameras

This section shows the performance of the elements when running simultaneously on more than one camera on a Jetson Xavier AGX. For all tests with an RGB output image, the following pipeline was used to measure the processing time and FPS of the cudadebayer and cudaawb elements, with 1920x1200 input images coming from multiple camera sensors.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve rrv4l2src device=/dev/video0 io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=RGB' ! fakesink

Likewise, for all tests with an I420 output image, the following pipeline was used to measure the processing time and FPS of the cudadebayer and cudaawb elements, with 1920x1200 input images coming from multiple camera sensors.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve rrv4l2src device=/dev/video1 io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink
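Running the same pipeline on several cameras at once could be scripted roughly as follows. This is a sketch, not the procedure used for the results below: the helper names pipeline_for and run_cameras are hypothetical, the cameras are assumed to be exposed as /dev/video0, /dev/video1 and so on, and each pipeline writes its tracer output to its own log file through the standard GST_DEBUG_FILE variable.

```shell
# Build the pipeline description for one camera.
# $1: video device index, $2: output format (RGB or I420).
pipeline_for() {
  printf "%s" "rrv4l2src device=/dev/video$1 io-mode=userptr ! video/x-bayer,bpp=10,width=1920,height=1200,format=grbg ! cudadebayer ! cudaawb ! video/x-raw,format=$2 ! fakesink"
}

# Launch one pipeline per camera in the background, then wait for all of them.
# $1: number of cameras, $2: output format.
run_cameras() {
  i=0
  while [ "$i" -lt "$1" ]; do
    # The description is left unquoted on purpose so that it splits into arguments.
    GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" GST_DEBUG_NO_COLOR=1 \
      GST_DEBUG_FILE="proctime_cam$i.log" \
      gst-launch-1.0 -ve $(pipeline_for "$i" "$2") &
    i=$((i + 1))
  done
  wait
}

# Hypothetical usage: run_cameras 2 RGB
```

Keeping one log per camera makes it possible to average the proctime entries of each pipeline separately.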

Two cameras

RGB Output

The results obtained:

Measurement (Average) | cudadebayer | cudaawb
FPS | 412 | 397
Processing time (seconds) | 0.002426 | 0.002521

I420 Output

The results obtained:

Measurement (Average) | cudadebayer | cudaawb
FPS | 464 | 429
Processing time (seconds) | 0.002154 | 0.002330

Three cameras

RGB Output

The results obtained:

Measurement (Average) | cudadebayer | cudaawb
FPS | 429 | 374
Processing time (seconds) | 0.002354 | 0.002672

I420 Output

The results obtained:

Measurement (Average) | cudadebayer | cudaawb
FPS | 402 | 450
Processing time (seconds) | 0.002486 | 0.002220

Four cameras

RGB Output

The results obtained:

Measurement (Average) | cudadebayer | cudaawb
FPS | 385 | 689
Processing time (seconds) | 0.002597 | 0.001450

I420 Output

The results obtained:

Measurement (Average) | cudadebayer | cudaawb
FPS | 320 | 289
Processing time (seconds) | 0.003128 | 0.003459

Five cameras

RGB Output

The results obtained:

Measurement (Average) | cudadebayer | cudaawb
FPS | 494 | 347
Processing time (seconds) | 0.002025 | 0.002883

I420 Output

The results obtained:

Measurement (Average) | cudadebayer | cudaawb
FPS | 332 | 296
Processing time (seconds) | 0.003011 | 0.003375










