CUDA ISP for NVIDIA Jetson/Performance
Revision as of 18:19, 24 March 2023
Library API performance
To measure the performance of the CUDA ISP API, we built a simple example that calls each apply method repeatedly and records performance metrics on every iteration: the duration of each apply call, the CPU and GPU usage while the example runs, and the CPU RAM and GPU RAM usage. We ran the measurements on a Jetson Nano, Jetson Xavier NX, Jetson Xavier AGX, and Jetson Orin, using three buffer sizes:
- A minimum 2x2 case, to test the maximum speeds that the apply methods could achieve
- A medium 1920x1080 case, to illustrate the changes in performance as the buffer size increases
- A maximum 3840x2160 case, to test performance on large buffers
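The measurement loop described above can be sketched roughly as follows. This is a minimal illustration, not the example program used for the published numbers: `average_apply_time_us` and `fake_apply` are hypothetical names, and `fake_apply` is a CPU-only stand-in for a real apply call.

```python
import time


def average_apply_time_us(apply, iterations=100):
    """Return the mean wall-clock duration of apply() in microseconds."""
    total_ns = 0
    for _ in range(iterations):
        start = time.perf_counter_ns()
        apply()
        total_ns += time.perf_counter_ns() - start
    return total_ns / iterations / 1000.0


def fake_apply():
    """Hypothetical stand-in for a CUDA ISP apply method (CPU work only)."""
    sum(range(1000))


print(f"average: {average_apply_time_us(fake_apply):.1f} us over 100 iterations")
```

When timing real GPU work this way, keep in mind that CUDA kernel launches are asynchronous, so the timed call has to wait for the device to finish before the clock is stopped; otherwise the loop mostly measures launch overhead.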
Jetson Nano
Processing Time
Processing time (in microseconds, averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers |
---|---|---|---|
cudashift | 136 | 135 | 147 |
cudadebayer | 68 | 53 | 55 |
cudawhitebalancer | 317 | 5071 | 18903 |
cudacolorspaceconverter | 55 | 55 | 57 |
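In this table, cudawhitebalancer is the only element whose time grows noticeably with the buffer size. A quick way to see how it scales is to convert the per-buffer time into a per-pixel cost. The sketch below uses only the cudawhitebalancer values from the table; the helper name `ns_per_pixel` is made up for illustration.

```python
# Processing times in microseconds, copied from the Jetson Nano table above.
NANO_WHITEBALANCER_US = {
    (2, 2): 317,
    (1920, 1080): 5071,
    (3840, 2160): 18903,
}


def ns_per_pixel(width, height, microseconds):
    """Convert a per-buffer time in microseconds to nanoseconds per pixel."""
    return microseconds * 1000.0 / (width * height)


for (width, height), microseconds in NANO_WHITEBALANCER_US.items():
    cost = ns_per_pixel(width, height, microseconds)
    print(f"{width}x{height}: {cost:.2f} ns/pixel")
```

For the two large buffers the cost lands between roughly 2.2 and 2.5 ns per pixel, which suggests close to linear scaling with pixel count. The 2x2 case is presumably dominated by fixed per-call overhead.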
CPU and CPU RAM usage
Measurement (Averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers |
---|---|---|---|
CPU usage (%) | 0.797500 | 0.836478 | 0.819940 |
CPU RAM usage (kB) | 147071 | 146295 | 147580 |
GPU and GPU RAM usage
Measurement (Averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers |
---|---|---|---|
GPU usage (%) | 0.0 | 25.12 | 94.6 |
GPU RAM usage (kB) | 91967 | 91733 | 116833 |
Jetson Xavier NX
Processing Time
Processing time (in microseconds, averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers |
---|---|---|---|
cudashift | 93 | 93 | 93 |
cudadebayer | 39 | 39 | 31 |
cudawhitebalancer | 375 | 1360 | 4249 |
cudacolorspaceconverter | 33 | 35 | 34 |
CPU and CPU RAM usage
Measurement (Averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers |
---|---|---|---|
CPU usage (%) | 0.482488 | 0.523657 | 0.477216 |
CPU RAM usage (kB) | 171679 | 173539 | 171987 |
GPU and GPU RAM usage
Measurement (Averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers |
---|---|---|---|
GPU usage (%) | 0.85 | 5.48 | 17.91 |
GPU RAM usage (kB) | 98719 | 100387 | 106288 |
Jetson Xavier AGX
Processing Time
Processing time (in microseconds, averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers |
---|---|---|---|
cudashift | |||
cudadebayer | |||
cudawhitebalancer | |||
cudacolorspaceconverter |
CPU and CPU RAM usage
Measurement (Averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers |
---|---|---|---|
CPU usage (%) | |||
CPU RAM usage (kB) |
GPU and GPU RAM usage
Measurement (Averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers |
---|---|---|---|
GPU usage (%) | |||
GPU RAM usage (kB) |
Jetson Orin
Processing Time
Processing time (in microseconds, averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers |
---|---|---|---|
cudashift | |||
cudadebayer | |||
cudawhitebalancer | |||
cudacolorspaceconverter |
CPU and CPU RAM usage
Measurement (Averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers |
---|---|---|---|
CPU usage (%) | |||
CPU RAM usage (kB) |
GPU and GPU RAM usage
Measurement (Averaged over 100 iterations) | 2x2 Buffers | 1920x1080 Buffers | 3840x2160 Buffers |
---|---|---|---|
GPU usage (%) | |||
GPU RAM usage (kB) |
GStreamer elements performance
To measure the performance, we have used two of our GStreamer tools: GstShark and GstPerf.
The tests were run under the following conditions:
- Maximum performance mode enabled: all cores and Jetson clocks enabled.
- JetPack 4.6
Jetson Xavier AGX
The results obtained:
Element (Xavier AGX) | cudadebayer | cudadebayer | cudaawb | cudaawb |
---|---|---|---|---|
Output | RGB | I420 | RGB | I420 |
FPS | 539 | 458 | 752 | 473 |
Processing time (seconds) | 0.001854 | 0.002183 | 0.001329 | 0.002111 |
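The FPS figures on this page appear to track the reciprocal of the average processing time closely, which gives a simple sanity check for any row. The sketch below applies that relation to the Xavier AGX processing times; `fps_from_processing_time` is an illustrative helper, and treating FPS as exactly 1/time is an assumption, since the two values may have been measured independently.

```python
def fps_from_processing_time(seconds):
    """Upper bound on frames per second for a given per-frame processing time."""
    return 1.0 / seconds


# Average processing times in seconds, copied from the Xavier AGX table above.
XAVIER_AGX_PROCTIME_S = {
    ("cudadebayer", "RGB"): 0.001854,
    ("cudadebayer", "I420"): 0.002183,
    ("cudaawb", "RGB"): 0.001329,
    ("cudaawb", "I420"): 0.002111,
}

for (element, output), seconds in XAVIER_AGX_PROCTIME_S.items():
    print(f"{element} {output}: {fps_from_processing_time(seconds):.1f} FPS")
```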
cudadebayer element
The results obtained (Nano and Xavier NX with a 3840x2160 input, Xavier AGX with a 1920x1200 input):
Platform (cudadebayer) | Nano | Nano | Xavier NX | Xavier NX | Xavier AGX | Xavier AGX |
---|---|---|---|---|---|---|
Output | RGB | I420 | RGB | I420 | RGB | I420 |
FPS | 51 | 36 | 228 | 187 | 539 | 458 |
Processing time (seconds) | 0.01948 | 0.02769 | 0.004389 | 0.005353 | 0.001854 | 0.002183 |
cudaawb element
The results obtained (Nano and Xavier NX with a 3840x2160 input, Xavier AGX with a 1920x1200 input):
Platform (cudaawb) | Nano | Nano | Xavier NX | Xavier NX | Xavier AGX | Xavier AGX |
---|---|---|---|---|---|---|
Output | RGB | I420 | RGB | I420 | RGB | I420 |
FPS | 91 | 38 | 370 | 202 | 752 | 473 |
Processing time (seconds) | 0.01096 | 0.02605 | 0.002698 | 0.004952 | 0.001329 | 0.002111 |
Jetson Nano
In the following sections you will see the performance for each of the elements.
cudashift element
The following pipeline was used to measure the processing time and FPS for the cudashift element with an input image with 4K resolution coming from a camera sensor.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, format=rggb' ! cudashift shift=0 ! fakesink
The results obtained:
Measurement (Average) | Jetson Nano |
---|---|
FPS | 92 |
Processing time (seconds) | 0.01088 |
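An average like the one in this table can be derived from the tracer output by collecting the per-buffer processing times for each element. The sketch below assumes that each proctime log line contains a fragment of the form `proctime, element=(string)NAME, time=(string)H:MM:SS.NNNNNNNNN;`. That line shape is an assumption, so check it against your own log before relying on the pattern. The sample lines are made up for illustration and are not measured results.

```python
import re

# Assumed shape of a proctime tracer entry; adjust if your log differs.
PROCTIME_RE = re.compile(
    r"proctime, element=\(string\)(?P<element>[^,]+), "
    r"time=\(string\)(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2}\.\d+)"
)


def average_proctime(lines):
    """Return {element name: mean processing time in seconds}."""
    totals = {}
    for line in lines:
        match = PROCTIME_RE.search(line)
        if match is None:
            continue  # not a proctime entry
        seconds = (
            int(match["h"]) * 3600 + int(match["m"]) * 60 + float(match["s"])
        )
        total, count = totals.get(match["element"], (0.0, 0))
        totals[match["element"]] = (total + seconds, count + 1)
    return {name: total / count for name, (total, count) in totals.items()}


# Illustrative input only, not real measurements.
sample = [
    "proctime, element=(string)cudashift0, time=(string)0:00:00.010000000;",
    "proctime, element=(string)cudashift0, time=(string)0:00:00.012000000;",
    "some unrelated log line",
]
print(average_proctime(sample))
```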
cudadebayer element
RGB Output
The following pipeline was used to measure the processing time and FPS for the cudadebayer element with an input image with 4K resolution coming from a camera sensor to an RGB output image.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! fakesink
The results obtained:
Measurement (Average) | Jetson Nano |
---|---|
FPS | 51 |
Processing time (seconds) | 0.01948 |
I420 Output
The following pipeline was used to measure the processing time and FPS for the cudadebayer element with an input image with 4K resolution coming from a camera sensor to an I420 output image.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! 'video/x-raw, format=I420' ! fakesink
The results obtained:
Measurement (Average) | Jetson Nano |
---|---|
FPS | 36 |
Processing time (seconds) | 0.02769 |
cudaawb element
RGB Output
The following pipeline was used to measure the processing time and FPS for the cudaawb element with an input image with 4K resolution coming from a camera sensor to an RGB output image.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! fakesink
The results obtained:
Measurement (Average) | Jetson Nano |
---|---|
FPS | 91 |
Processing time (seconds) | 0.01096 |
I420 Output
The following pipeline was used to measure the processing time and FPS for the cudaawb element with an input image with 4K resolution coming from a camera sensor to an I420 output image.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink
The results obtained:
Measurement (Average) | Jetson Nano |
---|---|
FPS | 38 |
Processing time (seconds) | 0.02605 |
Jetson Xavier NX
In the following sections you will see the performance for each of the elements.
cudashift element
The following pipeline was used to measure the processing time and FPS for the cudashift element with an input image with 4K resolution coming from a camera sensor.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, format=rggb' ! cudashift shift=5 ! fakesink
The results obtained:
Measurement (Average) | Jetson Xavier NX |
---|---|
FPS | 396 |
Processing time (seconds) | 0.002522 |
cudadebayer element
RGB Output
The following pipeline was used to measure the processing time and FPS for the cudadebayer element with an input image with 4K resolution coming from a camera sensor to an RGB output image.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! fakesink
The results obtained:
Measurement (Average) | Jetson Xavier NX |
---|---|
FPS | 228 |
Processing time (seconds) | 0.004389 |
I420 Output
The following pipeline was used to measure the processing time and FPS for the cudadebayer element with an input image with 4K resolution coming from a camera sensor to an I420 output image.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! 'video/x-raw, format=I420' ! fakesink
The results obtained:
Measurement (Average) | Jetson Xavier NX |
---|---|
FPS | 187 |
Processing time (seconds) | 0.005353 |
cudaawb element
RGB Output
The following pipeline was used to measure the processing time and FPS for the cudaawb element with an input image with 4K resolution coming from a camera sensor to an RGB output image.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! fakesink
The results obtained:
Measurement (Average) | Jetson Xavier NX |
---|---|
FPS | 370 |
Processing time (seconds) | 0.002698 |
I420 Output
The following pipeline was used to measure the processing time and FPS for the cudaawb element with an input image with 4K resolution coming from a camera sensor to an I420 output image.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink
The results obtained:
Measurement (Average) | Jetson Xavier NX |
---|---|
FPS | 202 |
Processing time (seconds) | 0.004952 |
Jetson Xavier AGX
In the following sections you will see the performance for each of the elements.
cudadebayer element
RGB Output
The following pipeline was used to measure the processing time and FPS for the cudadebayer element with an input image with 1920x1200 resolution coming from a camera sensor to an RGB output image.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve rrv4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! 'video/x-raw, format=RGB' ! fakesink
The results obtained:
Measurement (Average) | Jetson Xavier AGX |
---|---|
FPS | 539 |
Processing time (seconds) | 0.001854 |
I420 Output
The following pipeline was used to measure the processing time and FPS for the cudadebayer element with an input image with 1920x1200 resolution coming from a camera sensor to an I420 output image.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve rrv4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! 'video/x-raw, format=I420' ! fakesink
The results obtained:
Measurement (Average) | Jetson Xavier AGX |
---|---|
FPS | 458 |
Processing time (seconds) | 0.002183 |
cudaawb element
RGB Output
The following pipeline was used to measure the processing time and FPS for the cudaawb element with an input image with 1920x1200 resolution coming from a camera sensor to an RGB output image.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve rrv4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=RGB' ! fakesink
The results obtained:
Measurement (Average) | Jetson Xavier AGX |
---|---|
FPS | 752 |
Processing time (seconds) | 0.001329 |
I420 Output
The following pipeline was used to measure the processing time and FPS for the cudaawb element with an input image with 1920x1200 resolution coming from a camera sensor to an I420 output image.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve rrv4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink
The results obtained:
Measurement (Average) | Jetson Xavier AGX |
---|---|
FPS | 473 |
Processing time (seconds) | 0.002111 |
More cameras
This section shows the performance results for the elements running at the same time on more than one camera on a Jetson Xavier AGX. For all the tests with an RGB output image, the following pipeline was used to measure the processing time and FPS for the cudaawb and cudadebayer elements, with 1920x1200 input images coming from multiple camera sensors.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve rrv4l2src device=/dev/video0 io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=RGB' ! fakesink
In the same way, for all the tests with an I420 output image, the following pipeline was used to measure the processing time and FPS for the cudaawb and cudadebayer elements, with 1920x1200 input images coming from multiple camera sensors.
GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve rrv4l2src device=/dev/video1 io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink
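Running the test on several cameras means starting one such pipeline per capture device at the same time. The sketch below builds the per-camera command lines from the pipelines above. It assumes the cameras enumerate as consecutive `/dev/videoN` nodes starting at 0, which may not match your setup, and the function names are hypothetical. `launch_all` is defined but not called here, since it needs the cameras and the CUDA ISP elements to be present.

```python
import os
import shlex
import subprocess

# Tracer settings taken from the pipelines above.
TRACER_ENV = {"GST_DEBUG": "GST_TRACER:7", "GST_TRACERS": "proctime"}


def build_pipeline(device, output_format):
    """Return the gst-launch-1.0 argument list for one camera."""
    caps_in = "video/x-bayer, bpp=10, width=1920, height=1200, format=grbg"
    caps_out = f"video/x-raw, format={output_format}"
    return [
        "gst-launch-1.0", "-ve",
        "rrv4l2src", f"device={device}", "io-mode=userptr", "!",
        caps_in, "!", "cudadebayer", "!", "cudaawb", "!",
        caps_out, "!", "fakesink",
    ]


def build_all(num_cameras, output_format):
    """Assumes cameras are exposed as /dev/video0 .. /dev/video{N-1}."""
    return [
        build_pipeline(f"/dev/video{index}", output_format)
        for index in range(num_cameras)
    ]


def launch_all(commands):
    """Start every pipeline concurrently and return the process handles."""
    env = {**os.environ, **TRACER_ENV}
    return [subprocess.Popen(command, env=env) for command in commands]


for command in build_all(2, "RGB"):
    print(shlex.join(command))
```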
Two cameras
RGB Output
The results obtained:
Measurement (Average) | cudadebayer | cudaawb |
---|---|---|
FPS | 412 | 397 |
Processing time (seconds) | 0.002426 | 0.002521 |
I420 Output
The results obtained:
Measurement (Average) | cudadebayer | cudaawb |
---|---|---|
FPS | 464 | 429 |
Processing time (seconds) | 0.002154 | 0.002330 |
Three cameras
RGB Output
The results obtained:
Measurement (Average) | cudadebayer | cudaawb |
---|---|---|
FPS | 429 | 374 |
Processing time (seconds) | 0.002354 | 0.002672 |
I420 Output
The results obtained:
Measurement (Average) | cudadebayer | cudaawb |
---|---|---|
FPS | 402 | 450 |
Processing time (seconds) | 0.002486 | 0.002220 |
Four cameras
RGB Output
The results obtained:
Measurement (Average) | cudadebayer | cudaawb |
---|---|---|
FPS | 385 | 689 |
Processing time (seconds) | 0.002597 | 0.001450 |
I420 Output
The results obtained:
Measurement (Average) | cudadebayer | cudaawb |
---|---|---|
FPS | 320 | 289 |
Processing time (seconds) | 0.003128 | 0.003459 |
Five cameras
RGB Output
The results obtained:
Measurement (Average) | cudadebayer | cudaawb |
---|---|---|
FPS | 494 | 347 |
Processing time (seconds) | 0.002025 | 0.002883 |
I420 Output
The results obtained:
Measurement (Average) | cudadebayer | cudaawb |
---|---|---|
FPS | 332 | 296 |
Processing time (seconds) | 0.003011 | 0.003375 |