Performance of the Stitcher element on NVIDIA Thor

From RidgeRun Developer Wiki











The performance of the cudastitcher element depends on many factors, the most significant being those that directly influence the output resolution.

The following sections show measurements of the cudastitcher element (FPS and latency) for multiple image resolutions, as well as the impact of changing parameters such as the blending width and the homography-list.

For reference, each section includes the homography calibration file used for testing. Results can vary depending on the homographies, but on average the following results represent the overall performance of the element.

Platforms Setup

Testing on the AGX Thor was done in MAXN power mode on JetPack (JP) 7.1, both with and without Jetson Clocks enabled.

To activate Jetson Clocks, run:

sudo jetson_clocks

Framerate

The following chart shows the maximum framerate for stitching at 1920x1080 (HD) and 4K with multiple inputs.

Latency

Using the same setup as for the framerate measurements, latency is measured as the time difference between the src pad of the element immediately before the stitcher and the src pad of the stitcher itself, effectively measuring the time between the stitcher's input and output pads. For multiple inputs, the largest time difference is taken.

These latency measurements were taken using the GstShark interlatency tracer.
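As a rough illustration of the "largest time difference" rule, the sketch below scans a GstShark log for interlatency entries that end at a given pad and prints the largest value in milliseconds. It assumes the log lines follow the usual interlatency tracer layout (`from_pad=(string)..., to_pad=(string)..., time=(string)H:MM:SS.NNNNNNNNN;`) and that the stitcher's output pad is reported as `stitcher_src`. The helper name `max_interlatency_ms` and the pad name are illustrative and are not taken from the measurements on this page.

```shell
# Sketch: print the largest interlatency (in ms) that ends at a given pad.
# max_interlatency_ms is a hypothetical helper; the log layout is an assumption.
max_interlatency_ms() {
    # $1: pad name to filter on (for example stitcher_src)
    # $2: path to the tracer log
    awk -v pad="$1" '
        /interlatency,/ && index($0, "to_pad=(string)" pad ",") {
            # Keep only the H:MM:SS.NNNNNNNNN value after "time=(string)"
            t = $0
            sub(/.*time=\(string\)/, "", t)
            sub(/;.*/, "", t)
            split(t, p, ":")
            ms = (p[1] * 3600 + p[2] * 60 + p[3]) * 1000
            if (ms > max) max = ms
            n++
        }
        END { if (n) printf "%.3f\n", max; else print "no samples" }
    ' "$2"
}
```

One way to produce such a log is to run the pipeline with `GST_DEBUG="GST_TRACER:7" GST_TRACERS="interlatency" GST_DEBUG_FILE=trace.log`, then call `max_interlatency_ms stitcher_src trace.log`. The tracer reports times relative to the source element, so to isolate the stitcher as defined above you would also run the helper for the src pad of the element before the stitcher and compare the two values.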

The chart below shows the latency of the cudastitcher element for multiple inputs and resolutions, with and without the jetson_clocks script.

With the same calibration files used for the framerate measurements, you can achieve the following latency results.

CPU, GPU and RAM Usage

CPU, RAM, and GPU usage for common resolutions and setups, with and without Jetson Clocks.

Resolution  Inputs  CPU (avg)               RAM (MB)                GPU (avg)
                    Jetson Clocks  Normal   Jetson Clocks  Normal   Jetson Clocks  Normal
1920x1080   2       6.27%          7.57%    391.96         389.79   18.94%         31.95%
1920x1080   3       6.61%          8.35%    471.18         502.07   23.60%         33.84%
1920x1080   6       6.01%          10.40%   813.19         811.17   47.45%         36.55%
4K          2       6.27%          7.81%    427.98         461.44   39.71%         40.09%
4K          3       6.71%          8.14%    544.81         573.72   48.23%         36.58%
4K          6       6.36%          7.11%    944.94         968.22   56.24%         35.55%

Reproducing the Results

The files and pipelines used for the performance tests are listed below.

Calibration Files

1920x1080

For 1920x1080 resolution, the homography files for 2, 3, and 6 inputs are the following:

2 Inputs
{
    "homographies": [
        {
            "images": {
                "target": 1,
                "reference": 0
            },
            "matrix": {
                "h00": 0.7490261895239074,
                "h01": 0.04467113632580552,
                "h02": 1018.9828151317821,
                "h10": -0.05577820485200396,
                "h11": 0.9590844935041531,
                "h12": 61.08068248533324,
                "h20": -0.00014412069693060743,
                "h21": 1.7581178118418628e-05,
                "h22": 1.0
            }
        }
    ]
}
3 Inputs
{
    "homographies": [
        {
            "images": {
                "target": 1,
                "reference": 0
            },
            "matrix": {
                "h00": 0.7490261895239074,
                "h01": 0.04467113632580552,
                "h02": 1018.9828151317821,
                "h10": -0.05577820485200396,
                "h11": 0.9590844935041531,
                "h12": 61.08068248533324,
                "h20": -0.00014412069693060743,
                "h21": 1.7581178118418628e-05,
                "h22": 1.0
            }
        },
        {
            "images": {
                "target": 2,
                "reference": 0
            },
            "matrix": {
                "h00": 1.3197060186315637,
                "h01": -0.10518566348433173,
                "h02": -1264.5768270277113,
                "h10": 0.1467783274677278,
                "h11": 1.1524649023229194,
                "h12": -227.0179395401691,
                "h20": 0.00019864314625771476,
                "h21": -0.00010857278904972765,
                "h22": 1.0
            }
        }
    ]
}
6 Inputs
{
    "homographies": [
        {
            "images": {
                "target": 1,
                "reference": 0
            },
            "matrix": {
                "h00": 0.7490261895239074,
                "h01": 0.04467113632580552,
                "h02": 1018.9828151317821,
                "h10": -0.05577820485200396,
                "h11": 0.9590844935041531,
                "h12": 61.08068248533324,
                "h20": -0.00014412069693060743,
                "h21": 1.7581178118418628e-05,
                "h22": 1.0
            }
        },
        {
            "images": {
                "target": 2,
                "reference": 0
            },
            "matrix": {
                "h00": 0.8203959419915308,
                "h01": 0.19134092629013782,
                "h02": 927.0948457177544,
                "h10": -0.0179915273643625,
                "h11": 1.034698464498257,
                "h12": -680.8085473782533,
                "h20": -0.00014834561743950648,
                "h21": 8.652704821052748e-05,
                "h22": 1.0
            }
        },
        {
            "images": {
                "target": 3,
                "reference": 0
            },
            "matrix": {
                "h00": 1.0378054016500433,
                "h01": 0.03676895233913665,
                "h02": -5.558535987201656,
                "h10": -0.006201947768703567,
                "h11": 1.0395215780726415,
                "h12": -621.329917462604,
                "h20": -1.2684476455313587e-05,
                "h21": 5.7273947504607006e-05,
                "h22": 1.0
            }
        },
        {
            "images": {
                "target": 4,
                "reference": 0
            },
            "matrix": {
                "h00": 1.3394576098442972,
                "h01": -0.15576318123486257,
                "h02": -1230.4161628261922,
                "h10": 0.21296156399303987,
                "h11": 1.25689163996496,
                "h12": -1130.7835925176462,
                "h20": 0.00014172905668812799,
                "h21": 4.9304913992162066e-05,
                "h22": 1.0
            }
        },
        {
            "images": {
                "target": 5,
                "reference": 0
            },
            "matrix": {
                "h00": 1.3197060186315637,
                "h01": -0.10518566348433173,
                "h02": -1264.5768270277113,
                "h10": 0.1467783274677278,
                "h11": 1.1524649023229194,
                "h12": -227.0179395401691,
                "h20": 0.00019864314625771476,
                "h21": -0.00010857278904972765,
                "h22": 1.0
            }
        }
    ]
}
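
To get a feel for what these matrices encode, the sketch below applies a 3x3 homography to a pixel coordinate using the standard projective formula: x' = (h00*x + h01*y + h02) / w and y' = (h10*x + h11*y + h12) / w, with w = h20*x + h21*y + h22. It assumes that `hRC` means row R, column C, and that the matrix maps target-image pixels into the reference frame. Neither convention is stated on this page, and `apply_homography` is a hypothetical helper.

```shell
# Sketch: map pixel (x, y) through a homography given in row-major order.
# apply_homography is a hypothetical helper; the mapping direction is an assumption.
# Arguments: h00 h01 h02 h10 h11 h12 h20 h21 h22 x y
apply_homography() {
    awk -v h00="$1" -v h01="$2" -v h02="$3" \
        -v h10="$4" -v h11="$5" -v h12="$6" \
        -v h20="$7" -v h21="$8" -v h22="$9" \
        -v x="${10}" -v y="${11}" 'BEGIN {
            # Projective divide: w rescales the transformed point
            w = h20 * x + h21 * y + h22
            printf "%.2f %.2f\n", (h00 * x + h01 * y + h02) / w, (h10 * x + h11 * y + h12) / w
        }'
}

# Top-left corner (0, 0) of image 1, using the 2-input 1920x1080 matrix above
apply_homography 0.7490261895239074 0.04467113632580552 1018.9828151317821 \
    -0.05577820485200396 0.9590844935041531 61.08068248533324 \
    -0.00014412069693060743 1.7581178118418628e-05 1.0 0 0
# prints: 1018.98 61.08
```

Under those assumptions, the top-left corner of image 1 lands roughly 1019 pixels to the right of and 61 pixels below the origin of image 0.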

4K

For 4K resolution, the homography files for 2, 3, and 6 inputs are the following:

2 Inputs
{
    "homographies": [
        {
            "images": {
                "target": 1,
                "reference": 0
            },
            "matrix": {
                "h00": 0.7014208032457997,
                "h01": 0.0,
                "h02": 3180.223613728557,
                "h10": -0.044936941010614385,
                "h11": 0.9201121048700188,
                "h12": 86.2789267403796,
                "h20": -4.160827871353184e-05,
                "h21": 0.0,
                "h22": 1.0
            }
        }
    ]
}
3 Inputs
{
    "homographies": [
        {
            "images": {
                "target": 1,
                "reference": 0
            },
            "matrix": {
                "h00": 0.7014208032457997,
                "h01": 0.0,
                "h02": 3180.223613728557,
                "h10": -0.044936941010614385,
                "h11": 0.9201121048700188,
                "h12": 86.2789267403796,
                "h20": -4.160827871353184e-05,
                "h21": 0.0,
                "h22": 1.0
            }
        },
        {
            "images": {
                "target": 2,
                "reference": 0
            },
            "matrix": {
                "h00": 1.0059596024427553,
                "h01": 0.0,
                "h02": -3065.535683195869,
                "h10": 0.02812139261826276,
                "h11": 1.0500863393572026,
                "h12": -56.09324650577886,
                "h20": 2.60866350818764e-05,
                "h21": 0.0,
                "h22": 1.0
            }
        }
    ]
}
6 Inputs
{
    "homographies": [
        {
            "images": {
                "target": 1,
                "reference": 0
            },
            "matrix": {
                "h00": 1.069728304504107,
                "h01": 0.003976349281680208,
                "h02": 2486.8360288823615,
                "h10": 0.04509436469518921,
                "h11": 1.036853809177049,
                "h12": -90.69673725525423,
                "h20": 9.660223055111373e-06,
                "h21": 9.013807989685773e-06,
                "h22": 1.0
            }
        },
        {
            "images": {
                "target": 2,
                "reference": 0
            },
            "matrix": {
                "h00": 1.1029118727474485,
                "h01": 0.043952634510905024,
                "h02": 2444.964028928727,
                "h10": 0.024629835881337415,
                "h11": 1.0529792960117825,
                "h12": -146.73010297262417,
                "h20": 1.4490325579825004e-05,
                "h21": 1.404662537183237e-05,
                "h22": 1.0
            }
        },
        {
            "images": {
                "target": 3,
                "reference": 0
            },
            "matrix": {
                "h00": 0.8848587011291174,
                "h01": -0.20469564243712468,
                "h02": 221.0712938320948,
                "h10": 0.0,
                "h11": 0.3625879945792258,
                "h12": 1319.7216938381491,
                "h20": 0.0,
                "h21": -0.00010661231376933578,
                "h22": 1.0
            }
        },
        {
            "images": {
                "target": 4,
                "reference": 0
            },
            "matrix": {
                "h00": 0.8892810916856179,
                "h01": -0.4749636131671599,
                "h02": 2925.5803039636144,
                "h10": 0.0,
                "h11": 0.37335000737513163,
                "h12": 1316.6516233548427,
                "h20": 0.0,
                "h21": -0.00010251750769850202,
                "h22": 1.0
            }
        },
        {
            "images": {
                "target": 5,
                "reference": 0
            },
            "matrix": {
                "h00": 1.004086737535137,
                "h01": -0.012649881236723413,
                "h02": 9.93271028892309,
                "h10": 0.003748731449662241,
                "h11": 0.9950228623682942,
                "h12": -4.0743393651017925,
                "h20": 1.6341487799222761e-06,
                "h21": -3.0483164302875094e-06,
                "h22": 1.0
            }
        }
    ]
}
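
Each file above holds one homography per non-reference image (targets 1 to N-1, all with reference 0), so a file for N inputs contains N-1 entries. A quick sanity check before launching a pipeline could look like the sketch below. `expected_inputs` is a hypothetical helper that simply counts `"target"` keys and assumes the key appears exactly once per homography.

```shell
# Sketch: infer how many stitcher inputs a homography file describes.
# expected_inputs is a hypothetical helper, not part of the stitcher.
expected_inputs() {
    local n
    # Count every occurrence of the "target" key, even in minified JSON
    n=$(grep -o '"target"' "$1" | wc -l)
    # One extra input for the reference image (index 0)
    echo $((n + 1))
}
```

For example, `expected_inputs homographies.json` should print 6 for the 6-input files, which should match the number of `stitcher.sink_N` branches in the pipeline.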

Pipeline structure

To replicate the results using your own images, videos, or cameras, use the following pipelines as a base. They cover the 2, 3, and 6 input cases, in that order. Adjust the inputs and the resolution as needed.

2 Inputs
INPUT_0=<VIDEO_INPUT_0>
INPUT_1=<VIDEO_INPUT_1>
gst-launch-1.0 -e cudastitcher name=stitcher \
homography-list="$(cat homographies.json | tr -d "\n" | tr -d " ")" \
filesrc location=$INPUT_0 ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_0 \
filesrc location=$INPUT_1 ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_1 \
stitcher. ! perf print-cpu-load=true ! fakesink -v
3 Inputs
INPUT_0=<VIDEO_INPUT_0>
INPUT_1=<VIDEO_INPUT_1>
INPUT_2=<VIDEO_INPUT_2>
gst-launch-1.0 -e cudastitcher name=stitcher \
homography-list="$(cat homographies.json | tr -d "\n" | tr -d " ")" \
filesrc location=$INPUT_0 ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_0 \
filesrc location=$INPUT_1 ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_1 \
filesrc location=$INPUT_2 ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_2 \
stitcher. ! perf print-cpu-load=true ! fakesink -v
6 Inputs
INPUT_0=<VIDEO_INPUT_0>
INPUT_1=<VIDEO_INPUT_1>
INPUT_2=<VIDEO_INPUT_2>
INPUT_3=<VIDEO_INPUT_3>
INPUT_4=<VIDEO_INPUT_4>
INPUT_5=<VIDEO_INPUT_5>
gst-launch-1.0 -e cudastitcher name=stitcher \
homography-list="$(cat homographies.json | tr -d "\n" | tr -d " ")" \
filesrc location=$INPUT_0 ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_0 \
filesrc location=$INPUT_1 ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_1 \
filesrc location=$INPUT_2 ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_2 \
filesrc location=$INPUT_3 ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_3 \
filesrc location=$INPUT_4 ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_4 \
filesrc location=$INPUT_5 ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_5 \
stitcher. ! perf print-cpu-load=true ! fakesink -v
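
The pipelines above differ only in the number of input branches, so for other input counts the branches can be generated rather than typed by hand. The sketch below is one way to do that. `stitcher_branches` is a hypothetical helper, and it assumes every input is an MP4/MOV file with H.264 video (as the `qtdemux ! h264parse` chain implies) and that file paths contain no spaces.

```shell
# Sketch: print one decode branch per input file, linked to stitcher.sink_N.
# stitcher_branches is a hypothetical helper, not part of the stitcher.
stitcher_branches() {
    local i=0 input
    for input in "$@"; do
        printf 'filesrc location=%s ! qtdemux ! h264parse ! nvv4l2decoder ! queue ! nvvidconv ! stitcher.sink_%d ' \
            "$input" "$i"
        i=$((i + 1))
    done
}
```

A possible invocation, leaving the substitution unquoted on purpose so the shell splits it into separate gst-launch-1.0 arguments, is `gst-launch-1.0 -e cudastitcher name=stitcher homography-list="$(cat homographies.json | tr -d "\n" | tr -d " ")" $(stitcher_branches a.mp4 b.mp4 c.mp4) stitcher. ! perf print-cpu-load=true ! fakesink -v`. The homography file still has to describe the same number of inputs as the generated branches.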

