Performance of the Stitcher element on NVIDIA Jetson Nano
The performance of the cudastitcher element depends on many factors; the most significant are those that directly influence the output resolution.
The following sections show the cudastitcher measurements (FPS and latency) for multiple image resolutions, as well as the impact of changing parameters such as the blending width and the homography list.
Platform Setup
The performance measurements were done on a Jetson Nano board in 10W mode, which is the default mode and can be activated with:
sudo nvpmodel -m 0
Some sections compare running the platform at maximum frequency (with jetson_clocks) against the base mode. The maximum-frequency mode can be enabled as follows:
sudo /usr/bin/jetson_clocks
Framerate
The following charts show the average frames per second (FPS) while varying the number of inputs, the image resolution, the image overlap, and the blending width. Some of the results also show the impact of running the jetson_clocks script.
Comparing number of stitched images
These measurements were done with a BORDER_WIDTH of 20 and a 10% overlap between images.
Comparing homographies
The homographies define how each image is transformed before stitching. Depending on their values, the overlap between images increases or decreases, yielding a smaller or larger output image and therefore impacting performance.
These measurements show the effect of transformations with different overlap percentages. The results were obtained by stitching 2 images of 1920x1080 with a BORDER_WIDTH of 20 and running the jetson_clocks script.
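As a rough illustration of how the overlap changes the output size, the sketch below estimates the stitched width under a simplifying assumption: every homography is a pure horizontal translation, and all neighbouring images share the same overlap percentage. The function name `stitched_width` is hypothetical and is not part of the cudastitcher element; real homographies can also rotate or scale the images.

```bash
# Hypothetical helper: rough output width for N side-by-side inputs, assuming
# pure horizontal-translation homographies and equal overlap at every seam.
stitched_width() {
  local n_images=$1 width=$2 overlap_pct=$3
  local overlap_px=$(( width * overlap_pct / 100 ))
  # Each of the (N - 1) seams shares overlap_px columns between two images.
  echo $(( n_images * width - (n_images - 1) * overlap_px ))
}

stitched_width 2 1920 10   # prints 3648
stitched_width 2 1920 30   # prints 3264
```

More overlap gives a narrower output, so there are fewer output pixels to produce per frame.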
Comparing blending widths
This parameter is set with the border-width=BORDER_WIDTH option. It sets the width of the overlapping region that is blended from the input images, so the larger its value, the slower the processing. These measurements were taken from stitching 2 images of 1920x1080 with an overlap of 10% and running the jetson_clocks script.
The blender operates only on the overlapping parts of the images. In this case the overlap is 10%, which is why a BORDER_WIDTH of 300 performs the same as 200: the blender operates on at most 192 pixels (10% of the 1920-pixel width).
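The clamping described above can be sketched as follows. This hypothetical helper assumes the blended region is limited by both BORDER_WIDTH and the horizontal overlap in pixels; it only models the reasoning and does not call the cudastitcher element.

```bash
# Hypothetical helper: columns the blender works on, assuming it is capped by
# both border-width and the horizontal overlap between the two images.
effective_blend_width() {
  local border_width=$1 width=$2 overlap_pct=$3
  local overlap_px=$(( width * overlap_pct / 100 ))   # 10% of 1920 -> 192
  if [ "$border_width" -lt "$overlap_px" ]; then
    echo "$border_width"
  else
    echo "$overlap_px"
  fi
}

for bw in 20 100 200 300; do
  echo "border-width=$bw -> $(effective_blend_width "$bw" 1920 10) px blended"
done
```

Both 200 and 300 collapse to 192 pixels under this model, which matches the observation that they perform alike.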
Comparing input resolutions
The resolution of the input images plays a big role in performance: the bigger the input, the slower the algorithm. The data below shows the results of stitching images of different resolutions, all with a 16:9 aspect ratio, a BORDER_WIDTH of 20, an overlap of 10%, and running the jetson_clocks script.
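To put the resolutions in perspective, the sketch below computes each frame's pixel count relative to 1920x1080. It assumes, as a simplification, that per-frame work grows roughly with the number of pixels; the measured results are what actually characterize the element. The resolution list and the `pixel_ratio_pct` name are illustrative.

```bash
# Pixel count of a W x H frame as a percentage of 1920x1080 (integer math, truncated).
pixel_ratio_pct() {
  echo $(( $1 * $2 * 100 / (1920 * 1080) ))
}

for res in 1280x720 1920x1080 3840x2160; do
  w=${res%x*}   # part before the 'x'
  h=${res#*x}   # part after the 'x'
  echo "$res: $(pixel_ratio_pct "$w" "$h")% of the 1920x1080 pixel count"
done
```

A 3840x2160 input carries four times the pixels of a 1920x1080 input.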
Pipeline Structure
The general structure of the pipeline used for the FPS measurements above is the following:
BORDER_WIDTH=20

gst-launch-1.0 -e cudastitcher name=stitcher \
  homography-list="`cat homographies.json | tr -d "\n" | tr -d " "`" \
  border-width=$BORDER_WIDTH \
  filesrc location=images/ImageA.jpg ! nvjpegdec ! imagefreeze ! nvvidconv ! queue ! stitcher.sink_0 \
  filesrc location=images/ImageB.jpg ! nvjpegdec ! imagefreeze ! nvvidconv ! queue ! stitcher.sink_1 \
  stitcher. ! perf print-cpu-load=true ! fakesink
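Since the measurements vary the number of inputs, it can help to generate the pipeline instead of writing each branch by hand. The bash sketch below builds the same argument list for N inputs and stores it in an array named `args`. The function name `build_fps_args` and the `images/Image<i>.jpg` naming scheme are hypothetical, and homographies.json must describe all N images.

```bash
# Hypothetical helper: fill the bash array "args" with the FPS pipeline for N inputs.
# Assumes inputs named images/Image0.jpg ... images/Image<N-1>.jpg.
build_fps_args() {
  local n=$1 border_width=$2 homographies=$3 i
  args=(-e cudastitcher name=stitcher
        "homography-list=$homographies"
        "border-width=$border_width")
  # One decode branch per input, each linked to its own stitcher sink pad.
  for (( i = 0; i < n; i++ )); do
    args+=(filesrc "location=images/Image$i.jpg" ! nvjpegdec ! imagefreeze
           ! nvvidconv ! queue ! "stitcher.sink_$i")
  done
  # Output branch: report framerate and CPU load, then discard the frames.
  args+=(stitcher. ! perf print-cpu-load=true ! fakesink)
}

# Usage (homographies.json flattened to one line, as in the pipeline above):
# build_fps_args 3 20 "$(tr -d ' \n' < homographies.json)"
# gst-launch-1.0 "${args[@]}"
```

Passing each token as a separate array entry avoids the nested quoting that the single-line command needs around the homography list.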
Latency
For this evaluation, latency is measured as the time difference between the src pad of the element before the stitcher and the src pad of the stitcher itself, effectively measuring the time between the stitcher's input and output pads. For multiple inputs, the largest time difference is taken.
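The definition above reduces to a maximum over the inputs. The sketch below applies it to illustrative timestamps in nanoseconds (placeholder values, not measurements). `stitch_latency_ns` is a hypothetical helper; in the actual evaluation the per-pad timings come from the GstShark interlatency tracer.

```bash
# Hypothetical helper: stitcher latency for one output buffer.
# $1  = time the stitched buffer left the stitcher src pad
# $2.. = time the matching buffer left each upstream src pad (one per input)
stitch_latency_ns() {
  local out_ts=$1 max=0 in_ts diff
  shift
  for in_ts in "$@"; do
    diff=$(( out_ts - in_ts ))   # input-to-output time for this input
    if [ "$diff" -gt "$max" ]; then
      max=$diff                  # keep the slowest input
    fi
  done
  echo "$max"
}

# Placeholder timestamps (ns), not measured values.
stitch_latency_ns 1000000 400000 250000   # prints 750000
```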
These latency measurements were taken using the GstShark interlatency tracer.
The pictures below show the latency of the cudastitcher element for different numbers of input images and multiple resolutions, both with and without the jetson_clocks script.
Pipeline structure
The general structure of the pipeline used for the latency measurements is shown below, for the case of 2 images of 3840x2160 resolution.
BORDER_WIDTH=20
INPUT_1=image_1.jpg
INPUT_2=image_2.jpg

GST_DEBUG="3,GST_TRACER:7" GST_TRACERS="interlatency" GST_SHARK_CTF_DISABLE=1 \
gst-launch-1.0 -e cudastitcher name=stitcher \
  homography-list="`cat homographies.json | tr -d "\n" | tr -d " "`" \
  border-width=$BORDER_WIDTH \
  multifilesrc loop=true location=$INPUT_1 ! nvjpegdec ! 'video/x-raw, width=3840, height=2160' \
  ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12, width=3840, height=2160' \
  ! nvvidconv ! queue ! stitcher.sink_0 \
  multifilesrc loop=true location=$INPUT_2 ! nvjpegdec ! 'video/x-raw, width=3840, height=2160' \
  ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12, width=3840, height=2160' \
  ! nvvidconv ! queue ! stitcher.sink_1 \
  stitcher. ! perf print-arm-load=true ! queue ! fakesink