Measuring CUDA ISP API performance using GstShark and GstPerf

From RidgeRun Developer Wiki


GStreamer elements performance

To measure the performance, we have used two of our GStreamer tools: GstShark and GstPerf.

For testing purposes, take into account the following points:

  • Maximum performance mode enabled: all cores and Jetson clocks enabled.
  • JetPack 4.6
  • A patch was applied to v4l2src to enable bayer10 captures. You can see how to apply the patch in this link: Apply patch to v4l2src
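
The first condition is usually set with NVIDIA's nvpmodel and jetson_clocks utilities. The sketch below is an assumption about how a board could be prepared, not a record of the exact commands used for these measurements. The helper name enable_max_performance is hypothetical, and the power-mode number that corresponds to maximum performance differs between Jetson modules, so query the current mode with nvpmodel -q before relying on the default of 0.

```shell
# Sketch: put a Jetson board in a maximum performance mode before benchmarking.
# ASSUMPTION: power mode 0 is the maximum-performance profile on this board.
# Mode numbering differs between modules; pass another mode as $1 if needed.
enable_max_performance() {
    mode=${1:-0}
    # Refuse to run on systems without the Jetson power utilities.
    if ! command -v nvpmodel >/dev/null 2>&1; then
        echo "nvpmodel not found: is this a Jetson system?" >&2
        return 1
    fi
    # Select the power mode, then lock the clocks at their maximum.
    sudo nvpmodel -m "$mode" && sudo jetson_clocks
}
```

Calling enable_max_performance with no argument selects mode 0; enable_max_performance 8, for example, would select mode 8 on a module that defines it.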

In summary:

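Platform (input resolution)    Element      Output   FPS   Processing time (ms)
Jetson Orin (2472x2064)        cudashift    bayer 8  1382  0.7237
Jetson Orin (2472x2064)        cudadebayer  RGB      1063  0.9411
Jetson Orin (2472x2064)        cudadebayer  I420     861   1.161
Jetson Orin (2472x2064)        cudaawb      RGB      1800  0.5557
Jetson Orin (2472x2064)        cudaawb      I420     1022  0.9781
Jetson Xavier AGX (+1080p)     cudadebayer  RGB      539   1.854
Jetson Xavier AGX (+1080p)     cudadebayer  I420     458   2.183
Jetson Xavier AGX (+1080p)     cudaawb      RGB      752   1.329
Jetson Xavier AGX (+1080p)     cudaawb      I420     473   2.111
Jetson Xavier NX (4K)          cudashift    bayer 8  396   2.522
Jetson Xavier NX (4K)          cudadebayer  RGB      228   4.389
Jetson Xavier NX (4K)          cudadebayer  I420     187   5.353
Jetson Xavier NX (4K)          cudaawb      RGB      370   2.698
Jetson Xavier NX (4K)          cudaawb      I420     202   4.952
Jetson Nano (4K)               cudashift    bayer 8  92    10.886
Jetson Nano (4K)               cudadebayer  RGB      51    19.484
Jetson Nano (4K)               cudadebayer  I420     36    27.696
Jetson Nano (4K)               cudaawb      RGB      91    10.961
Jetson Nano (4K)               cudaawb      I420     38    26.022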
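
The FPS figures in the summary are approximately the reciprocal of the per-buffer processing time (1000 divided by the time in milliseconds). A few entries differ from that value by one frame, so treat the relation as a sanity check and not as the exact method used to produce the numbers. A minimal sketch of the conversion, with a hypothetical helper name ms_to_fps:

```shell
# Convert a per-buffer processing time in milliseconds to frames per second,
# rounded to the nearest integer.
ms_to_fps() {
    awk -v ms="$1" 'BEGIN { printf "%.0f\n", 1000 / ms }'
}

ms_to_fps 0.7237    # cudashift on Jetson Orin, prints 1382
ms_to_fps 27.696    # cudadebayer (I420) on Jetson Nano, prints 36
```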

Jetson Orin

In this case, the patch for v4l2src was modified to enable captures in bayer 12. The processing time and FPS were measured with an input image with 2472x2064 resolution from a camera sensor for all the elements.

The following pipeline measured the processing time and FPS for the cudashift element.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=12, width=2472, height=2064, format=rggb' ! cudashift shift=5 ! fakesink

The following pipeline was used to test the cudadebayer and cudaawb elements with an RGB image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=12, width=2472, height=2064, format=rggb' ! cudadebayer ! cudaawb ! 'video/x-raw, format=RGB' ! fakesink

The following pipeline was used to test the cudadebayer and cudaawb elements with an I420 image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=12, width=2472, height=2064, format=rggb' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink
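
The proctime tracer prints one log line per processed buffer, so the raw output needs to be reduced to one value per element. The sketch below averages those lines. It assumes that each tracer line has the form "proctime, element=(string)NAME, time=(string)H:MM:SS.NNNNNNNNN;", which should be checked against the output of the installed GstShark version. The function name proctime_avg is hypothetical.

```shell
# Average the per-buffer processing time reported by the proctime tracer.
# Reads a GStreamer debug log on stdin and prints one line per element:
#   <element> <average time in ms> <number of buffers>
# ASSUMPTION: tracer lines look like
#   proctime, element=(string)NAME, time=(string)H:MM:SS.NNNNNNNNN;
proctime_avg() {
    awk '
    /proctime,/ {
        # Extract the element name (skip the 16 chars of "element=(string)").
        if (!match($0, /element=\(string\)[^,;]+/)) next
        name = substr($0, RSTART + 16, RLENGTH - 16)
        # Extract the timestamp (skip the 13 chars of "time=(string)").
        if (!match($0, /time=\(string\)[0-9:.]+/)) next
        stamp = substr($0, RSTART + 13, RLENGTH - 13)
        # Convert H:MM:SS.NNNNNNNNN to milliseconds.
        split(stamp, t, ":")
        ms = (t[1] * 3600 + t[2] * 60 + t[3]) * 1000
        sum[name] += ms
        count[name]++
    }
    END {
        for (name in sum)
            printf "%s %.4f %d\n", name, sum[name] / count[name], count[name]
    }' | sort
}
```

GStreamer writes its debug log to stderr, so one way to use the function is to append "2>&1 | proctime_avg" to any of the pipelines above. Setting GST_DEBUG_NO_COLOR=1 keeps color escape codes out of the log.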

The following image shows a plot with the processing time for all elements:




Figure: CUDA ISP library, processing time per element (Jetson Orin)




Jetson Xavier AGX

The processing time and FPS were measured for all the elements with a 1920x1200 input image (slightly above 1080p) from a camera sensor.

The following pipeline was used to test the cudadebayer and cudaawb elements with an RGB image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=RGB' ! fakesink

The following pipeline was used to test the cudadebayer and cudaawb elements with an I420 image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink

The following image shows a plot with the processing time for all elements:




Figure: CUDA ISP library, processing time per element (Jetson Xavier AGX)




Jetson Xavier NX

The processing time and FPS were measured for all the elements with a 4K (3840x2160) input image from a camera sensor.

The following pipeline measured the processing time and FPS for the cudashift element.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, format=rggb' ! cudashift shift=5 ! fakesink

The following pipeline was used to test the cudadebayer and cudaawb elements with an RGB image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! fakesink

The following pipeline was used to test the cudadebayer and cudaawb elements with an I420 image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink


The following image shows a plot with the processing time for all elements:




Figure: CUDA ISP library, processing time per element (Jetson Xavier NX)




Jetson Nano

The processing time and FPS were measured for all the elements with a 4K (3840x2160) input image from a camera sensor.

The following pipeline measured the processing time and FPS for the cudashift element.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, format=rggb' ! cudashift shift=0 ! fakesink

The following pipeline was used to test the cudadebayer and cudaawb elements with an RGB image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! fakesink

The following pipeline was used to test the cudadebayer and cudaawb elements with an I420 image as output.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src io-mode=userptr ! 'video/x-bayer, bpp=10, width=3840, height=2160' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink

The following image shows a plot with the processing time for all elements:




Figure: CUDA ISP library, processing time per element (Jetson Nano)




More cameras

This section shows the performance results for the elements running simultaneously on multiple cameras on a Jetson Xavier AGX. For all the tests with an RGB output image, the following pipeline was used to measure the processing time and FPS of the cudadebayer and cudaawb elements, with a 1920x1200 input image coming from multiple camera sensors.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src device=/dev/video0 io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=RGB' ! fakesink

In the same way, for all the tests with an I420 output image, the following pipeline was used to measure the processing time and FPS of the cudadebayer and cudaawb elements, with a 1920x1200 input image coming from multiple camera sensors.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve v4l2src device=/dev/video1 io-mode=userptr ! 'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! cudadebayer ! cudaawb ! 'video/x-raw, format=I420' ! fakesink
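
The two pipelines above differ only in the device property and the output format, which suggests one pipeline instance per camera. The sketch below starts the instances in parallel. It assumes that the cameras are enumerated as /dev/video0, /dev/video1 and so on, and that each instance writes its own log file; neither detail is stated in the test description. The function names camera_devices, run_camera and launch_cameras are hypothetical.

```shell
# Print the first N capture devices, one per line.
# ASSUMPTION: cameras are enumerated as /dev/video0 .. /dev/video(N-1).
camera_devices() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "/dev/video$i"
        i=$((i + 1))
    done
}

# Run the measurement pipeline for one device ($1) and output format ($2).
run_camera() {
    GST_DEBUG="GST_TRACER:7" GST_TRACERS="proctime" gst-launch-1.0 -ve \
        v4l2src device="$1" io-mode=userptr ! \
        'video/x-bayer, bpp=10, width=1920, height=1200, format=grbg' ! \
        cudadebayer ! cudaawb ! "video/x-raw, format=$2" ! fakesink
}

# Start one pipeline per camera in the background and wait for all of them.
# Each instance logs to proctime_videoN.log in the current directory.
launch_cameras() {
    count=$1
    fmt=$2
    for dev in $(camera_devices "$count"); do
        run_camera "$dev" "$fmt" > "proctime_$(basename "$dev").log" 2>&1 &
    done
    wait
}
```

For example, launch_cameras 4 RGB would start four instances with RGB output. The pipelines run until they are interrupted or the cameras stop delivering frames.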

The results obtained:

Element      Output  Number of cameras  FPS  Processing time (ms)
cudadebayer  RGB     Two                412  2.426
cudadebayer  RGB     Three              429  2.354
cudadebayer  RGB     Four               385  2.597
cudadebayer  RGB     Five               494  2.025
cudadebayer  I420    Two                464  2.154
cudadebayer  I420    Three              402  2.486
cudadebayer  I420    Four               320  3.128
cudadebayer  I420    Five               332  3.011
cudaawb      RGB     Two                397  2.521
cudaawb      RGB     Three              374  2.672
cudaawb      RGB     Four               689  1.450
cudaawb      RGB     Five               347  2.883
cudaawb      I420    Two                429  2.330
cudaawb      I420    Three              450  2.20
cudaawb      I420    Four               289  3.459
cudaawb      I420    Five               296  3.375



