Performance of the Stitcher element on NVIDIA Orin
The performance of the cudastitcher element depends on many factors, the most significant being those that directly affect the output resolution.
The following sections show cudastitcher measurements (frame rate and latency) for multiple image resolutions, as well as the impact of changing parameters such as the blending width and the homography-list.
For reference, each section includes the homography calibration files used for testing. Results can vary with the homographies, but the figures reported here are representative of the element's overall performance.
AGX Orin
Platform Setup
Testing on the AGX Orin was done with JetPack (JP) 6, and testing on the Orin NX with JetPack 5.1.2.
To reproduce the measurements labeled Jetson Clocks, enable that mode as follows:
sudo /usr/bin/jetson_clocks
Framerate
1920x1080
For 1920x1080 resolution, the homography files for 2, 3, and 6 inputs are the following, in that order:
{ "homographies": [ { "images": { "target": 1, "reference": 0 }, "matrix": { "h00": 0.7490261895239074, "h01": 0.04467113632580552, "h02": 1018.9828151317821, "h10": -0.05577820485200396, "h11": 0.9590844935041531, "h12": 61.08068248533324, "h20": -0.00014412069693060743, "h21": 1.7581178118418628e-05, "h22": 1.0 } } ] }
{ "homographies": [ { "images": { "target": 1, "reference": 0 }, "matrix": { "h00": 0.7490261895239074, "h01": 0.04467113632580552, "h02": 1018.9828151317821, "h10": -0.05577820485200396, "h11": 0.9590844935041531, "h12": 61.08068248533324, "h20": -0.00014412069693060743, "h21": 1.7581178118418628e-05, "h22": 1.0 } }, { "images": { "target": 2, "reference": 0 }, "matrix": { "h00": 1.3197060186315637, "h01": -0.10518566348433173, "h02": -1264.5768270277113, "h10": 0.1467783274677278, "h11": 1.1524649023229194, "h12": -227.0179395401691, "h20": 0.00019864314625771476, "h21": -0.00010857278904972765, "h22": 1.0 } } ] }
{ "homographies": [ { "images": { "target": 1, "reference": 0 }, "matrix": { "h00": 0.7490261895239074, "h01": 0.04467113632580552, "h02": 1018.9828151317821, "h10": -0.05577820485200396, "h11": 0.9590844935041531, "h12": 61.08068248533324, "h20": -0.00014412069693060743, "h21": 1.7581178118418628e-05, "h22": 1.0 } }, { "images": { "target": 2, "reference": 0 }, "matrix": { "h00": 0.8203959419915308, "h01": 0.19134092629013782, "h02": 927.0948457177544, "h10": -0.0179915273643625, "h11": 1.034698464498257, "h12": -680.8085473782533, "h20": -0.00014834561743950648, "h21": 8.652704821052748e-05, "h22": 1.0 } }, { "images": { "target": 3, "reference": 0 }, "matrix": { "h00": 1.0378054016500433, "h01": 0.03676895233913665, "h02": -5.558535987201656, "h10": -0.006201947768703567, "h11": 1.0395215780726415, "h12": -621.329917462604, "h20": -1.2684476455313587e-05, "h21": 5.7273947504607006e-05, "h22": 1.0 } }, { "images": { "target": 4, "reference": 0 }, "matrix": { "h00": 1.3394576098442972, "h01": -0.15576318123486257, "h02": -1230.4161628261922, "h10": 0.21296156399303987, "h11": 1.25689163996496, "h12": -1130.7835925176462, "h20": 0.00014172905668812799, "h21": 4.9304913992162066e-05, "h22": 1.0 } }, { "images": { "target": 5, "reference": 0 }, "matrix": { "h00": 1.3197060186315637, "h01": -0.10518566348433173, "h02": -1264.5768270277113, "h10": 0.1467783274677278, "h11": 1.1524649023229194, "h12": -227.0179395401691, "h20": 0.00019864314625771476, "h21": -0.00010857278904972765, "h22": 1.0 } } ] }
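As a reading aid for the files above, the nine entries h00 to h22 can be read as a 3x3 projective transform. The sketch below assumes the common convention that the matrix maps a pixel (x, y) of the target image to a pixel (x', y') in the frame of the reference image. This page does not state the direction, so treat it as an assumption and check it against the User Guide.

```latex
H =
\begin{pmatrix}
h_{00} & h_{01} & h_{02} \\
h_{10} & h_{11} & h_{12} \\
h_{20} & h_{21} & h_{22}
\end{pmatrix},
\qquad
x' = \frac{h_{00}\,x + h_{01}\,y + h_{02}}{h_{20}\,x + h_{21}\,y + h_{22}},
\qquad
y' = \frac{h_{10}\,x + h_{11}\,y + h_{12}}{h_{20}\,x + h_{21}\,y + h_{22}}
```

For example, with the 2-input file above, the pixel (0, 0) gives x' = h02 / h22 ≈ 1018.98 and y' = h12 / h22 ≈ 61.08. Under the assumed convention, the top-left corner of image 1 lands about 1019 pixels to the right of and 61 pixels below the top-left corner of image 0.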
The next graph shows the frame rate (fps) for each input configuration, with and without Jetson Clocks.
4K
For 4K resolution, the homography files for 2, 3, and 6 inputs are the following, in that order:
{ "homographies": [ { "images": { "target": 1, "reference": 0 }, "matrix": { "h00": 0.7014208032457997, "h01": 0.0, "h02": 3180.223613728557, "h10": -0.044936941010614385, "h11": 0.9201121048700188, "h12": 86.2789267403796, "h20": -4.160827871353184e-05, "h21": 0.0, "h22": 1.0 } } ] }
{ "homographies": [ { "images": { "target": 1, "reference": 0 }, "matrix": { "h00": 0.7014208032457997, "h01": 0.0, "h02": 3180.223613728557, "h10": -0.044936941010614385, "h11": 0.9201121048700188, "h12": 86.2789267403796, "h20": -4.160827871353184e-05, "h21": 0.0, "h22": 1.0 } }, { "images": { "target": 2, "reference": 0 }, "matrix": { "h00": 1.0059596024427553, "h01": 0.0, "h02": -3065.535683195869, "h10": 0.02812139261826276, "h11": 1.0500863393572026, "h12": -56.09324650577886, "h20": 2.60866350818764e-05, "h21": 0.0, "h22": 1.0 } } ] }
{ "homographies": [ { "images": { "target": 1, "reference": 0 }, "matrix": { "h00": 1.069728304504107, "h01": 0.003976349281680208, "h02": 2486.8360288823615, "h10": 0.04509436469518921, "h11": 1.036853809177049, "h12": -90.69673725525423, "h20": 9.660223055111373e-06, "h21": 9.013807989685773e-06, "h22": 1.0 } }, { "images": { "target": 2, "reference": 0 }, "matrix": { "h00": 1.1029118727474485, "h01": 0.043952634510905024, "h02": 2444.964028928727, "h10": 0.024629835881337415, "h11": 1.0529792960117825, "h12": -146.73010297262417, "h20": 1.4490325579825004e-05, "h21": 1.404662537183237e-05, "h22": 1.0 } }, { "images": { "target": 3, "reference": 0 }, "matrix": { "h00": 0.8848587011291174, "h01": -0.20469564243712468, "h02": 221.0712938320948, "h10": 0.0, "h11": 0.3625879945792258, "h12": 1319.7216938381491, "h20": 0.0, "h21": -0.00010661231376933578, "h22": 1.0 } }, { "images": { "target": 4, "reference": 0 }, "matrix": { "h00": 0.8892810916856179, "h01": -0.4749636131671599, "h02": 2925.5803039636144, "h10": 0.0, "h11": 0.37335000737513163, "h12": 1316.6516233548427, "h20": 0.0, "h21": -0.00010251750769850202, "h22": 1.0 } }, { "images": { "target": 5, "reference": 0 }, "matrix": { "h00": 1.004086737535137, "h01": -0.012649881236723413, "h02": 9.93271028892309, "h10": 0.003748731449662241, "h11": 0.9950228623682942, "h12": -4.0743393651017925, "h20": 1.6341487799222761e-06, "h21": -3.0483164302875094e-06, "h22": 1.0 } } ] }
Latency
Latency was measured with the same setup used for the framerate tests. For this evaluation, latency is defined as the time difference between the src pad of the element immediately upstream of the stitcher and the src pad of the stitcher itself, which effectively measures the time between the stitcher's input and output pads. For multiple inputs, the largest time difference is taken.
These latency measurements were taken using the GstShark interlatency tracer.
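The "largest time difference" rule can be sketched as follows. This is only an illustration: the function name and the three-column input format (input index, timestamp at the upstream src pad, timestamp at the stitcher src pad, both in milliseconds) are hypothetical and are not something GstShark produces directly.

```shell
# stitcher_latency_ms: hypothetical helper, not part of GstShark.
# Reads lines of the form "<input> <t_in_ms> <t_out_ms>" from stdin, where
# t_in_ms is the timestamp at the src pad upstream of the stitcher and
# t_out_ms is the timestamp at the stitcher src pad for the same frame.
# Prints the largest (t_out_ms - t_in_ms) with three decimals.
stitcher_latency_ms() {
    awk '
        NF >= 3 {
            diff = $3 - $2
            # Keep the largest difference seen across all inputs.
            if (!seen || diff > max) { max = diff; seen = 1 }
        }
        END { if (seen) printf "%.3f\n", max }
    '
}
```

For instance, feeding it the three lines `0 10.0 22.5`, `1 11.0 22.5`, and `2 9.5 22.5` reports the difference of input 2, the slowest path. GstShark tracers are typically enabled through environment variables such as `GST_TRACERS="interlatency"` together with `GST_DEBUG="GST_TRACER:7"`. Extracting per-pad times from that log is left out of this sketch.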
The pictures below show the latency of the cudastitcher element for multiple inputs and resolutions, with and without the jetson_clocks script.
These results were obtained with the same calibration files used for the framerate measurements.
Pipeline structure
To replicate the results with your own images, videos, or cameras, use the following pipeline as a base. It covers the 2-camera case; add more inputs for the other cases and adjust the resolution if needed.
INPUT_0=<VIDEO_INPUT_0>
INPUT_1=<VIDEO_INPUT_1>
gst-launch-1.0 -e cudastitcher name=stitcher \
  homography-list="`cat homographies.json | tr -d "\n" | tr -d " "`" \
  filesrc location=$INPUT_0 ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_0 \
  filesrc location=$INPUT_1 ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_1 \
  stitcher. ! perf print-cpu-load=true ! fakesink -v
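One way to extend the 2-input pipeline above to 3 or 6 inputs is to generate the repeated input branches instead of typing them by hand. The helper below is a minimal sketch: `stitcher_inputs` is a hypothetical name, and it only reuses the branch elements already shown above. It assumes that file paths contain no spaces or glob characters, and that the homographies file has one entry per extra input (1 for 2 inputs, 2 for 3, 5 for 6), as in the files listed earlier.

```shell
# stitcher_inputs FILE...: hypothetical helper that prints one decode branch
# per input file, linked to stitcher.sink_0, stitcher.sink_1, and so on.
stitcher_inputs() {
    i=0
    for src in "$@"; do
        printf 'filesrc location=%s ! qtdemux ! h264parse ! nvv4l2decoder ' "$src"
        printf '! queue ! nvvidconv ! stitcher.sink_%d ' "$i"
        i=$((i + 1))
    done
}

# Example for three inputs (commented out, since it needs the Jetson plugins).
# The command substitution is left unquoted on purpose so that the shell
# splits the generated branches into separate arguments.
#
# gst-launch-1.0 -e cudastitcher name=stitcher \
#   homography-list="$(tr -d ' \n' < homographies.json)" \
#   $(stitcher_inputs cam0.mp4 cam1.mp4 cam2.mp4) \
#   stitcher. ! perf print-cpu-load=true ! fakesink -v
```

The `tr -d ' \n'` call removes spaces and newlines in one pass, which is equivalent to the two `tr` calls in the original pipeline.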
Jetson Orin Platforms CPU Usage
The following table shows CPU usage, as an average and per core, with and without Jetson Clocks on different platforms of the Orin family, for 2 and 6 input video sources at 1920x1080 and 60 fps. A dash (-) marks a value that is not reported.
Platform | Mode | Cameras | CPU Avg | CPU 1 | CPU 2 | CPU 3 | CPU 4 | CPU 5 | CPU 6 | CPU 7 | CPU 8 | CPU 9 | CPU 10 | CPU 11 | CPU 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Orin Nano | Normal | 2 | - | - | - | - | - | - | - | - | - | - | - | - | - |
Orin Nano | Normal | 6 | - | - | - | - | - | - | - | - | - | - | - | - | - |
Orin Nano | Jetson Clocks | 2 | - | - | - | - | - | - | - | - | - | - | - | - | - |
Orin Nano | Jetson Clocks | 6 | - | - | - | - | - | - | - | - | - | - | - | - | - |
Orin NX | Normal | 2 | - | - | - | - | - | - | - | - | - | - | - | - | - |
Orin NX | Normal | 6 | - | - | - | - | - | - | - | - | - | - | - | - | - |
Orin NX | Jetson Clocks | 2 | - | - | - | - | - | - | - | - | - | - | - | - | - |
Orin NX | Jetson Clocks | 6 | - | - | - | - | - | - | - | - | - | - | - | - | - |
AGX Orin | Normal | 2 | 16% | 17% | 12% | 9% | 8% | 19% | 7% | 29% | 28% | - | - | - | - |
AGX Orin | Normal | 6 | 13% | 19% | 11% | 8% | 8% | 16% | 15% | 11% | 12% | - | - | - | - |
AGX Orin | Jetson Clocks | 2 | 17% | 13% | 21% | 3% | 1% | 29% | 29% | 22% | 17% | - | - | - | - |
AGX Orin | Jetson Clocks | 6 | 9% | 13% | 14% | 9% | 8% | 7% | 7% | 6% | 5% | - | - | - | - |
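In the AGX Orin rows, the Avg value is consistent with the rounded mean of the eight per-core values that are reported. The helper below is a small sketch for checking that on your own measurements. `cpu_average` is a hypothetical name, and the rounding rule (half rounds up) is an assumption chosen because it matches the table.

```shell
# cpu_average: hypothetical helper. Reads per-core CPU percentages from stdin
# (whitespace-separated, trailing "%" allowed), skips non-numeric entries such
# as "-", and prints their mean rounded to the nearest integer percent.
cpu_average() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                v = $i
                sub(/%$/, "", v)                 # drop a trailing percent sign
                if (v ~ /^[0-9]+(\.[0-9]+)?$/) { # ignore "-" and other text
                    sum += v
                    n++
                }
            }
        }
        END { if (n) printf "%d%%\n", int(sum / n + 0.5) }
    '
}
```

For example, `echo '17% 12% 9% 8% 19% 7% 29% 28% - - - -' | cpu_average` reproduces the 16% reported for the AGX Orin in normal mode with 2 cameras.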
Jetson Orin Platforms GPU and RAM Usage
The following table shows GPU and RAM usage with and without Jetson Clocks on different platforms of the Orin family, for 2 and 6 input video sources at 1920x1080 and 60 fps. A dash (-) marks a value that is not reported.
Platform | Mode | Cameras | GPU | RAM |
---|---|---|---|---|
Orin Nano | Normal | 2 | - | - |
Orin Nano | Normal | 6 | - | - |
Orin Nano | Jetson Clocks | 2 | - | - |
Orin Nano | Jetson Clocks | 6 | - | - |
Orin NX | Normal | 2 | - | - |
Orin NX | Normal | 6 | - | - |
Orin NX | Jetson Clocks | 2 | - | - |
Orin NX | Jetson Clocks | 6 | - | - |
AGX Orin | Normal | 2 | 58.63% | 5.19% |
AGX Orin | Normal | 6 | 76.16% | 5.17% |
AGX Orin | Jetson Clocks | 2 | 59.32% | 5.33% |
AGX Orin | Jetson Clocks | 6 | 81.68% | 5.28% |