Performance of the Stitcher element on NVIDIA AGX Orin

From RidgeRun Developer Wiki










The performance of the Image Projector element depends not only on the resolution but also on the type of transformation.

The following sections show measurements of the rrprojector (FPS, processing time, latency, and CPU/GPU usage) for each transformation. They also include the stitcher, to show how it performs at 4K and 1920x1080 resolutions.

Platforms Setup

The AGX Orin was tested with and without Jetson Clocks, in the 30W power mode, on JetPack (JP) 6.0. Jetson Clocks is enabled with:

sudo jetson_clocks

AGX Orin

Framerate

FishEye to Equirectangular

The following graph shows the framerate when using 4k and 1920x1080 resolutions with and without Jetson Clocks.

FPS for fisheye to equirectangular input with and without jetson_clocks.sh

Rectilinear to Equirectangular

The following graph shows the framerate when using 4k and 1920x1080 resolutions with and without Jetson Clocks.

FPS for Rectilinear to equirectangular input with and without jetson_clocks.sh

Projection with FishEye alongside the stitcher

The following graphs show the expected performance when the projector and the stitcher are used together. Two fisheye inputs, each with a field of view greater than 180 degrees, are combined to generate a 360-degree field of view at 4K and HD resolutions, with and without Jetson Clocks.

FPS on 1920x1080 inputs with fisheye, with and without jetson_clocks.sh


FPS on 3840x2160 inputs with fisheye, with and without jetson_clocks.sh

Latency

Using the same setup as for the framerate measurements, latency is measured here as the time difference between the src pad of the element before the projector and the src pad of the projector itself, which effectively measures the time between the projector's input and output pads. The reported value is the average of these times.

These latency measurements were taken using the GstShark interlatency tracer.

FishEye to Equirectangular

The following graph shows the expected latency added by projecting a fisheye input to equirectangular at 4K and 1920x1080 resolutions, with and without Jetson Clocks.

Latency on fisheye to equirectangular with and without jetson_clocks.sh

Rectilinear to Equirectangular

The following graph shows the expected latency added by projecting a rectilinear input to equirectangular at 4K and 1920x1080 resolutions, with and without Jetson Clocks.

Latency on rectilinear to equirectangular with and without jetson_clocks.sh

Projection with Fisheye alongside the stitcher

The following graphs show the expected latency added by projecting a fisheye input to equirectangular alongside the stitcher at 4K and 1920x1080 resolutions, with and without Jetson Clocks.

Latency on fisheye to equirectangular projection stitching HD with and without jetson_clocks.sh


Latency on fisheye to equirectangular projection stitching 4K with and without jetson_clocks.sh

Processing time

Using the same setup as for the framerate and latency measurements, processing time is reported here in milliseconds to make it easier to read.

These processing time measurements were taken using the GstShark proctime tracer.

The following graph summarizes the average processing time of the projector for the fisheye, rectilinear, and stitcher cases.

Processing time for each projection type with HD and 4K resolutions with and without jetson_clocks.sh
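
The averages reported for the interlatency and proctime tracers can be recomputed from a saved debug log. The following is a minimal sketch, not the script used for these measurements. It assumes the tracer lines carry a field of the form time=(string)H:MM:SS.NNNNNNNNN; and that the log was captured from stderr (for example by appending 2> trace.log to the pipeline). The helper name avg_tracer_ms, the sample lines, and the pad name proj0_src are made up for illustration.

```shell
#!/bin/sh
# avg_tracer_ms TRACER FILTER   (hypothetical helper, reads a log on stdin)
# Prints the mean of the time= field, in milliseconds, over the lines that
# belong to TRACER and also contain the literal string FILTER.
avg_tracer_ms() {
    awk -v tracer="$1" -v filter="$2" '
        index($0, tracer ", ") && index($0, filter) &&
        match($0, /time=\(string\)[0-9]+:[0-9]+:[0-9.]+/) {
            # Skip the 13-character "time=(string)" prefix, keep H:MM:SS.NNNNNNNNN.
            split(substr($0, RSTART + 13, RLENGTH - 13), t, ":")
            sum += (t[1] * 3600 + t[2] * 60 + t[3]) * 1000
            n++
        }
        END {
            if (n) printf "%.3f\n", sum / n
            else   print "no samples"
        }
    '
}

# Two made-up sample lines in the assumed log format (2 ms and 4 ms).
printf '%s\n' \
    '0:00:02.1 100 0x1 TRACE GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)proj0_src, time=(string)0:00:00.002000000;' \
    '0:00:02.2 100 0x1 TRACE GST_TRACER :0:: interlatency, from_pad=(string)videotestsrc0_src, to_pad=(string)proj0_src, time=(string)0:00:00.004000000;' |
    avg_tracer_ms interlatency 'to_pad=(string)proj0_src'
# prints 3.000
```

With a real log, the same helper could be called as avg_tracer_ms proctime 'element=(string)proj0' < trace.log, assuming the proctime lines name the element that way.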

Jetson Orin Platforms GPU usage

The following table shows GPU and RAM usage on the AGX Orin, with and without Jetson Clocks, for 1920x1080 and 3840x2160 inputs at 60 fps.

GPU and RAM usage percentage for each projection

Type of projection      Mode           Resolution  GPU     RAM
Fisheye                 Normal         HD          23.22%  6.30%
Fisheye                 Normal         4K          23.03%  6.44%
Fisheye                 Jetson Clocks  HD          10.98%  4.05%
Fisheye                 Jetson Clocks  4K          17.32%  4.81%
Rectilinear             Normal         HD          17.4%   6.02%
Rectilinear             Normal         4K          23.15%  6.17%
Rectilinear             Jetson Clocks  HD          8.54%   6.02%
Rectilinear             Jetson Clocks  4K          15.54%  6.26%
Fisheye with stitching  Normal         HD          65.28%  6.37%
Fisheye with stitching  Normal         4K          72.33%  6.42%
Fisheye with stitching  Jetson Clocks  HD          10.98%  4.55%
Fisheye with stitching  Jetson Clocks  4K          82.06%  6.42%

Jetson Orin Platforms CPU Usage

The following table shows CPU usage on the AGX Orin, with and without Jetson Clocks, for 1920x1080 and 3840x2160 inputs at 60 fps.

CPU usage percentage for each projection (Avg, then CPU 1 to 12)

Type of projection      Mode           Resolution  Avg     1       2       3       4       5       6       7       8       9   10  11  12
Fisheye                 Normal         HD          3.71%   3.11%   11.10%  2.59%   2.90%   1.91%   2.76%   3.50%   1.80%   -   -   -   -
Fisheye                 Normal         4K          3.87%   2.75%   1.97%   1.48%   7.68%   5.52%   5.36%   6.14%   0.06%   -   -   -   -
Fisheye                 Jetson Clocks  HD          5.21%   7.02%   9.70%   2.12%   0.07%   5.49%   9.88%   7.33%   0.05%   -   -   -   -
Fisheye                 Jetson Clocks  4K          3.88%   8.67%   4.80%   1.26%   1.64%   2.14%   7.84%   4.61%   0.06%   -   -   -   -
Rectilinear             Normal         HD          4.68%   3.26%   16.04%  4.49%   1.91%   3.19%   5.74%   0.09%   2.75%   -   -   -   -
Rectilinear             Normal         4K          4.33%   2.46%   1.47%   1.10%   0.05%   7.88%   0.19%   4.44%   17.07%  -   -   -   -
Rectilinear             Jetson Clocks  HD          5.21%   7.02%   9.70%   2.12%   0.07%   5.49%   9.88%   7.33%   0.05%   -   -   -   -
Rectilinear             Jetson Clocks  4K          4.19%   8.47%   3.26%   0.46%   1.43%   2.15%   9.05%   7.60%   1.14%   -   -   -   -
Fisheye with stitching  Normal         HD          16.01%  18.33%  19.97%  11.17%  13.42%  16.08%  17.62%  16.86%  14.60%  -   -   -   -
Fisheye with stitching  Normal         4K          11.51%  16.65%  7.18%   8.62%   8.84%   13.54%  10.11%  12.56%  14.56%  -   -   -   -
Fisheye with stitching  Jetson Clocks  HD          12.83%  12.67%  14.84%  11.30%  6.02%   14.95%  12.65%  16.12%  14.09%  -   -   -   -
Fisheye with stitching  Jetson Clocks  4K          6.82%   11.40%  10.33%  4.65%   4.63%   3.73%   6.97%   6.75%   6.10%   -   -   -   -
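
The Avg column appears to be the arithmetic mean over the eight CPUs that report a value (columns 9 to 12 are empty in every row). This is an inference from the numbers, not something the page states. The sketch below checks it for the Fisheye / Normal / HD row; mean_pct is a hypothetical helper name.

```shell
#!/bin/sh
# mean_pct VALUE...   (hypothetical helper)
# Prints the arithmetic mean of its arguments as a percentage with two decimals.
mean_pct() {
    printf '%s\n' "$@" |
        awk '{ sum += $1; n++ } END { if (n) printf "%.2f%%\n", sum / n }'
}

# Per-CPU values for Fisheye / Normal / HD, copied from the table above.
cores="3.11 11.10 2.59 2.90 1.91 2.76 3.50 1.80"

# $cores is left unquoted on purpose so each value becomes its own argument.
mean_pct $cores
# prints 3.71%, matching the Avg column for that row
```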

Reproducing the Results

The pipelines used for the performance tests are listed below.

Fisheye to equirectangular testing pipeline

The following variables were used for the fisheye testing.

HD
R0=975
L0=195
CX0=948
CY0=512
RX0=0.0
RY0=0.0
RZ0=-90

R1=974
L1=195
CX1=953
CY1=521
RX1=0.0
RY1=0.0
RZ1=90
CAPS="video/x-raw(memory:NVMM),width=1920,height=1080,framerate=1/30"


4K
R0=975
L0=195
CX0=948
CY0=512
RX0=0.0
RY0=0.0
RZ0=-90

R1=974
L1=195
CX1=953
CY1=521
RX1=0.0
RY1=0.0
RZ1=90
CAPS="video/x-raw(memory:NVMM),width=3840,height=2160,framerate=1/30"

You can use the respective variables with the base pipeline.

GST_DEBUG=GST_TRACER:7 GST_TRACERS="interlatency;proctime"  gst-launch-1.0  videotestsrc is-live=true ! nvvidconv ! $CAPS  ! rrfisheyetoeqr radius=$R0 lens=$L0 center-x=$CX0 center-y=$CY0 rot-x=$RX0 rot-y=$RY0 rot-z=$RZ0 name=proj0 ! nvvidconv ! autovideosink
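
In this section the HD and 4K variable sets differ only in the caps string, so the resolution can be selected with a small helper before launching the pipeline. This is a minimal sketch; caps_for is a hypothetical helper name and is not part of the projector. The caps values, including the framerate field, are copied unchanged from the variables above.

```shell
#!/bin/sh
# caps_for LABEL   (hypothetical helper)
# Prints the caps string for the resolution labels used on this page.
caps_for() {
    case "$1" in
        HD) w=1920 h=1080 ;;
        4K) w=3840 h=2160 ;;
        *)  echo "unknown resolution: $1" >&2; return 1 ;;
    esac
    printf 'video/x-raw(memory:NVMM),width=%s,height=%s,framerate=1/30\n' "$w" "$h"
}

# Select the HD caps and show what the pipeline would receive as $CAPS.
CAPS=$(caps_for HD)
echo "$CAPS"
# prints video/x-raw(memory:NVMM),width=1920,height=1080,framerate=1/30
```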

Rectilinear to equirectangular testing pipeline

The following variables were used for the rectilinear case. For this test, the values were chosen from the ones that provide the best performance. The performance of this type of projection can vary depending on the calibration.

HD
 
CAPS="video/x-raw(memory:NVMM),width=1920,height=1080,framerate=1/30"


4K
CAPS="video/x-raw(memory:NVMM),width=3840,height=2160,framerate=1/30"

You can use the respective variables with the base pipeline.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="interlatency;proctime" gst-launch-1.0 videotestsrc is-live=true ! nvvidconv ! $CAPS ! rrrectilineartoeqr fov-h=150 fov-v=90 viewpoint-lat=45 viewpoint-lon=90 name=proj0 ! perf name=p0 ! nvvidconv ! queue ! x265enc ! h265parse ! rtph265pay ! udpsink port=6000 host=10.42.0.1

Fisheye to equirectangular with stitching testing pipeline

The following variables and calibration files were used for the most common use case of the projector with the stitcher.

HD
 
R0=975
L0=195
CX0=948
CY0=512
RX0=0.0
RY0=0.0
RZ0=-90

R1=974
L1=195
CX1=953
CY1=521
RX1=0.0
RY1=0.0
RZ1=90

CAPS="video/x-raw(memory:NVMM),width=1920,height=1080,framerate=1/30"

Calibration file (homographies.json):
{
    "projections": [
        {
            "0": {
                "radius": 975.0,
                "lens": 195.0,
                "center_x": 948.0,
                "center_y": 512.0,
                "rot_x": 0,
                "rot_y": 0,
                "rot_z": -90.0,
                "fisheye": true
            }
        },
        {
            "1": {
                "radius": 974.0,
                "lens": 195.0,
                "center_x": 953.0,
                "center_y": 521.0,
                "rot_x": 0,
                "rot_y": 0,
                "rot_z": 90.0,
                "fisheye": true
            }
        }
    ],
    "homographies": [
        {
            "images": {
                "target": 1,
                "reference": 0
            },
            "matrix": {
                "h00": 1,
                "h01": 0,
                "h02": -6.666666666666629,
                "h10": 0,
                "h11": 1,
                "h12": 0,
                "h20": 0,
                "h21": 0,
                "h22": 1
            }
        }
    ]
}


4K
R0=1952
L0=195
CX0=1898
CY0=1024
RX0=0.0
RY0=0.0
RZ0=-90

R1=1949
L1=195
CX1=1906
CY1=1020
RX1=0.0
RY1=0.0
RZ1=90

CAPS="video/x-raw(memory:NVMM),width=3840,height=2160,framerate=1/30"

Calibration file (homographies.json):
{
    "projections": [
        {
            "0": {
                "radius": 1952.0,
                "lens": 195.0,
                "center_x": 1898.0,
                "center_y": 1024.0,
                "rot_x": 0,
                "rot_y": 0,
                "rot_z": -90.0,
                "fisheye": true
            }
        },
        {
            "1": {
                "radius": 1949.0,
                "lens": 195.0,
                "center_x": 1906.0,
                "center_y": 1020.0,
                "rot_x": 0,
                "rot_y": 0,
                "rot_z": 90.0,
                "fisheye": true
            }
        }
    ],
    "homographies": [
        {
            "images": {
                "target": 1,
                "reference": 0
            },
            "matrix": {
                "h00": 1,
                "h01": 0,
                "h02": 0.0,
                "h10": 0,
                "h11": 1,
                "h12": 0,
                "h20": 0,
                "h21": 0,
                "h22": 1
            }
        }
    ]
}

You can use the respective variables with the base pipeline.

GST_DEBUG="GST_TRACER:7" GST_TRACERS="interlatency;proctime" gst-launch-1.0 cudastitcher name=stitcher \
    homography-list="$(tr -d '\n\t ' < homographies.json)" \
    videotestsrc num-buffers=10000 ! nvvidconv ! $CAPS ! rrfisheyetoeqr radius=$R0 lens=$L0 center-x=$CX0 center-y=$CY0 rot-x=$RX0 rot-y=$RY0 rot-z=$RZ0 name=proj0 ! perf name=p0 ! queue ! stitcher.sink_0 \
    videotestsrc num-buffers=10000 ! nvvidconv ! $CAPS ! rrfisheyetoeqr radius=$R1 lens=$L1 center-x=$CX1 center-y=$CY1 rot-x=$RX1 rot-y=$RY1 rot-z=$RZ1 name=proj1 ! perf name=p1 ! queue ! stitcher.sink_1 \
    stitcher. ! nvvidconv ! x264enc ! filesink location=test.mp4
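
The homography-list argument in the pipeline above is the content of homographies.json with every newline, tab, and space removed, so that the whole file fits in a single property value. The sketch below shows that flattening step on its own, using a throwaway file; flatten_json is a hypothetical helper name. Deleting all spaces is only safe because none of the string values in these calibration files contain spaces.

```shell
#!/bin/sh
# flatten_json FILE   (hypothetical helper)
# Prints FILE with all newlines, tabs, and spaces removed.
flatten_json() {
    tr -d '\n\t ' < "$1"
}

# Demonstrate on a small temporary file shaped like part of the calibration file.
tmp=$(mktemp)
printf '{\n\t"homographies": [\n\t\t{ "images": { "target": 1, "reference": 0 } }\n\t]\n}\n' > "$tmp"
flatten_json "$tmp"
echo
rm -f "$tmp"
# prints {"homographies":[{"images":{"target":1,"reference":0}}]}
```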

