NVIDIA Jetson Orin - JetPack 5.0.2 - Performance Tuning - Software Encoders For Jetson Orin Nano

From RidgeRun Developer Wiki









The Jetson Orin Nano does not include hardware units for video encoding (NVENC), unlike the other members of the NVIDIA Orin family. This means that users must encode their video without the hardware-accelerated NVENC module, for example with CPU-based encoding. CPU-based encoding solutions leave less CPU power for additional tasks and may not achieve the same performance as NVENC-based encoding. Our goal for this page is to evaluate some video encoding alternatives that work on the Jetson Orin Nano. In this section, you will find the results of several video encoding tests that will help you assess the Orin Nano encoding capabilities and select the solution that best suits your product.

Summary

We tested two options to encode video with H.264: FFmpeg and GStreamer. The tests were made with three different video resolutions, three different encoder presets, and different bitrate configurations. The idea was to obtain the maximum frame rate possible for each configuration and also compare the performance of the encoding tools in terms of CPU usage.
The results showed that the preset significantly changed the number of frames that could be processed each second. For instance, at 1080p with 10 Mb/s the difference between the slowest and fastest preset is close to 40 frames per second. The bitrate also affected the maximum frame rate considerably: the higher the bitrate, the fewer frames that can be encoded each second.
Regarding CPU usage of the encoding tools (FFmpeg and GStreamer), at 1080p both tools show very close results, but in most cases GStreamer shows a slightly lower CPU load. The graphs in Figure 1 summarize the results explained in this section. The graph on the left shows the difference in maximum frame rate between the tested presets at 10 Mb/s, the graph in the middle summarizes the CPU load at 30 and 60 FPS with the configurations used at 10 Mb/s, and the graph on the right compares the impact of the bitrate on the CPU load for the 1080p 30 FPS settings. The graphs below show the results from GStreamer with different 1080p configurations.

Figure 1: General results for video encoding tests.

Experimental Setup

The results presented were obtained on the following setup:

  • NVIDIA Jetson Orin Nano (Emulated on a Jetson AGX Orin Developer Kit).
  • JetPack 5.0.2.
  • 8 GB RAM.
  • 6 CPUs.

To learn more about AGX Orin Emulation, please visit the emulation features of the developer kit wiki page.

The videos used for testing have the following specifications (download the test video):

  • Resolutions tested: 1920x1080, 1280x720, and 640x480.
  • Duration of 15 seconds.
  • 30 FPS.
  • Pixel format YUV420P.

Our tests were designed to find:

  • The maximum frame rate achievable given a resolution.
  • The CPU usage with a fixed frame rate and resolution to emulate real-time processing.

Maximum Frame Rate

For this test, the goal is to obtain the maximum frame rate possible at a certain resolution and with different encoding configurations. The resolutions tested are: 1920x1080, 1280x720, and 640x480. For each resolution we tested three presets of the H.264 encoder: veryslow, medium, and ultrafast, which will affect the quality of encoded video. Finally, we tested the encoder with a variable and fixed bitrate. The variable bitrate average is between 3 Mb/s and 5 Mb/s for most cases.
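
The combinations described above can be enumerated with a short shell loop. This is only a sketch for planning a test run, not the script used for the original measurements: the function name list_cases is hypothetical, and the 10000 kBits/s case is listed for 1080p only because that is how the cases appear in Table 1 and Table 2.

```shell
#!/bin/sh
# Hypothetical helper: print one line per test case as
# "<resolution> <preset> <bitrate setting>".
list_cases() {
    for res in 1920x1080 1280x720 640x480; do
        # Every resolution is tested with variable bitrate and 1000 kBits/s.
        bitrates="variable 1000"
        # Assumption taken from the result tables: 10000 kBits/s only at 1080p.
        if [ "$res" = "1920x1080" ]; then
            bitrates="$bitrates 10000"
        fi
        for bitrate in $bitrates; do
            for preset in veryslow medium ultrafast; do
                echo "$res $preset $bitrate"
            done
        done
    done
}

list_cases
```

Each printed line corresponds to one row of the maximum frame rate tables.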

FFmpeg

These tests show the maximum frame rate that can be obtained for different resolutions, presets, and bitrates. The preset affects the compression quality of the output video, so a slower preset provides better quality at the expense of higher CPU utilization. The bitrate also affects the quality of the video: the higher the bitrate, the better the quality, but also the more bandwidth you will need to stream the video or the more space you will need to store it. Table 1 summarizes the tests run and the results for each resolution.


Table 1. FFmpeg maximum frame rate tests summary.

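Resolution | Preset | Bitrate (kBits/s) | Max Frame Rate (FPS) | Max RAM Used (MB) | Average CPU Load (%)
1920x1080 | veryslow | Variable | 5 | 248 | 79.0
1920x1080 | medium | Variable | 24 | 108 | 62.7
1920x1080 | ultrafast | Variable | 43 | 69 | 50.7
1920x1080 | veryslow | 1000 | 8 | 247 | 71.6
1920x1080 | medium | 1000 | 46 | 106 | 55.6
1920x1080 | ultrafast | 1000 | 105 | 68 | 33.6
1920x1080 | veryslow | 10000 | 5 | 247 | 79.1
1920x1080 | medium | 10000 | 24 | 110 | 67.7
1920x1080 | ultrafast | 10000 | 71 | 74 | 47.4
1280x720 | veryslow | Variable | 12 | 136 | 71.7
1280x720 | medium | Variable | 54 | 71 | 63.4
1280x720 | ultrafast | Variable | 117 | 54 | 44.3
1280x720 | veryslow | 1000 | 19 | 136 | 79.6
1280x720 | medium | 1000 | 90 | 72 | 56.1
1280x720 | ultrafast | 1000 | 199 | 54 | 47.0
640x480 | veryslow | Variable | 28 | 88 | 69.2
640x480 | medium | Variable | 121 | 51 | 50.4
640x480 | ultrafast | Variable | 222 | 49 | 38.3
640x480 | veryslow | 1000 | 52 | 76 | 75.8
640x480 | medium | 1000 | 190 | 52 | 55.3
640x480 | ultrafast | 1000 | 400 | 45 | 42.6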


To give you an idea of how presets affect the quality of the image, Figures 2, 3, and 4 show a frame of the test video for each preset tested on the 1080p video at 1 Mb/s bitrate. The ultrafast preset shows a considerable reduction in quality compared to the veryslow preset, so the right preset and bitrate depend on your use case. If you want fast encoding but do not care as much about image quality, a faster preset or a lower bitrate may be useful. If good video quality is needed, a slower preset or a higher bitrate is the way to go. Regarding CPU usage, slower presets load the CPU more because of the increased compression quality. If you want to dive deeper into what these presets configure internally, refer to the Encoding presets for x264 documentation.

Figure 2: Result with very slow preset using FFmpeg.
Figure 3: Result with medium preset using FFmpeg.
Figure 4: Result with ultra fast preset using FFmpeg.


For reference, the command needed to run the tests with variable bitrate is shown below. The -crf (constant rate factor) flag value is chosen to produce a video with average quality; the lower the value, the better the quality. You might want to change the preset too; all available presets are listed in the H.264 Video Encoding Guide.

ffmpeg -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -i input1080.yuv -c:v libx264 -crf 22 -preset ultrafast -tune zerolatency output.mp4

The command to specify a fixed bitrate is presented below. The -b:v flag sets the target bitrate, and the -minrate and -maxrate flags make sure the encoder hits that target. The buffer size (-bufsize) must also be specified when the min and max rates are defined; in general, it is set to twice the bitrate.

ffmpeg -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -i input1080.yuv -c:v libx264 -x264-params "nal-hrd=cbr" -b:v 10M -minrate 10M -maxrate 10M -bufsize 20M -preset ultrafast -tune zerolatency output.mp4
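
The fixed-bitrate flags above follow a simple pattern, which could be captured in a small shell helper. The following is a minimal sketch and not part of the original tests: the function name cbr_args and the output_<preset>.mp4 file names are hypothetical, and the helper assumes the bitrate is given as a whole number of Mb/s.

```shell
#!/bin/sh
# Hypothetical helper: print the FFmpeg rate-control flags for a constant
# bitrate given in Mb/s, with the buffer size set to twice the bitrate.
cbr_args() {
    rate_mbps="$1"
    bufsize_mbps=$((rate_mbps * 2))
    printf '%s' "-x264-params nal-hrd=cbr -b:v ${rate_mbps}M -minrate ${rate_mbps}M -maxrate ${rate_mbps}M -bufsize ${bufsize_mbps}M"
}

# Dry run: print (do not execute) the 1080p, 10 Mb/s command for each preset.
for preset in veryslow medium ultrafast; do
    echo "ffmpeg -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -i input1080.yuv -c:v libx264 $(cbr_args 10) -preset $preset -tune zerolatency output_${preset}.mp4"
done
```

Printing the commands first makes it easy to check the flags before starting a long encoding run.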

GStreamer

The tests made in this section are similar to those in the FFmpeg section. The idea is to see the difference in performance as well as in the quality of the output video. The encoder used is the x264enc element. The results are summarized in Table 2.

Table 2. GStreamer maximum frame rate tests summary.

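Resolution | Preset | Bitrate (kBits/s) | Max Frame Rate (FPS) | Max RAM Used (MB) | Average CPU Load (%)
1920x1080 | veryslow | Variable | 4.75 | 208 | 76.6
1920x1080 | medium | Variable | 43 | 70 | 57.4
1920x1080 | ultrafast | Variable | 95 | 29 | 52
1920x1080 | veryslow | 1000 | 11.6 | 207 | 69.48
1920x1080 | medium | 1000 | 43 | 73 | 52.4
1920x1080 | ultrafast | 1000 | 95 | 34 | 45.2
1920x1080 | veryslow | 10000 | 4.6 | 210 | 82.4
1920x1080 | medium | 10000 | 22.5 | 71 | 70.1
1920x1080 | ultrafast | 10000 | 64.7 | 36 | 51.3
1280x720 | veryslow | Variable | 12 | 101 | 81.2
1280x720 | medium | Variable | 49 | 34 | 65.9
1280x720 | ultrafast | Variable | 142 | 20 | 54.3
1280x720 | veryslow | 1000 | 18.5 | 102 | 80.5
1280x720 | medium | 1000 | 76 | 38 | 52.1
1280x720 | ultrafast | 1000 | 185 | 20 | 37.0
640x480 | veryslow | Variable | 26.8 | 40 | 76.9
640x480 | medium | Variable | 102 | 19 | 53.4
640x480 | ultrafast | Variable | 245 | 13 | 31.9
640x480 | veryslow | 1000 | 33.8 | 41 | 73.6
640x480 | medium | 1000 | 136 | 18 | 54.2
640x480 | ultrafast | 1000 | 360 | 14 | 33.7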


The results, in terms of video quality, are very similar to FFmpeg. Video encoded with the fastest preset has slightly worse quality than video encoded with the veryslow and medium presets. This is expected and can be managed by changing the bitrate: the higher the bitrate, the better the quality. A sample of what the video looks like is shown in Figures 5, 6, and 7. The images were taken from the 1920x1080 videos at only 1 Mb/s bitrate. At 10 Mb/s the difference is much less noticeable, and the ultrafast preset does not show a significant quality difference from a slower preset.

Figure 5: Result with very slow preset using GStreamer.
Figure 6: Result with medium preset using GStreamer.
Figure 7: Result with ultra fast preset using GStreamer.


For this test, the pipeline used for variable bitrate is shown below.

gst-launch-1.0 filesrc location=input1080.yuv ! videoparse width=1920 height=1080 framerate=30/1 format=i420 ! x264enc tune=zerolatency insert-vui=true pass=quant quantizer=22 speed-preset=veryslow ! qtmux ! filesink location=output.mp4

The pipeline to encode with a fixed bitrate is the following. The pass property of the encoder is set to cbr, which means constant bitrate, and then we set the target bitrate with the corresponding property, which is in kBits/s.

gst-launch-1.0 filesrc location=input1080.yuv ! videoparse width=1920 height=1080 framerate=30/1 format=i420 ! x264enc tune=zerolatency insert-vui=true pass=cbr bitrate=10000 speed-preset=veryslow ! qtmux ! filesink location=output.mp4
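
Since only a few values change between test cases, the fixed-bitrate pipeline can be generated from its parameters. The helper below is a sketch under stated assumptions: the function name cbr_pipeline and the input720.yuv file name are hypothetical, and the output is always written to output.mp4 as in the pipeline above.

```shell
#!/bin/sh
# Hypothetical helper: print an x264enc fixed-bitrate pipeline for raw I420 input.
# Arguments: input file, width, height, frame rate, bitrate in kBits/s, speed preset.
cbr_pipeline() {
    printf '%s' "filesrc location=$1 ! videoparse width=$2 height=$3 framerate=$4/1 format=i420 ! x264enc tune=zerolatency insert-vui=true pass=cbr bitrate=$5 speed-preset=$6 ! qtmux ! filesink location=output.mp4"
}

# Dry run: print (do not execute) the 720p pipeline at 1000 kBits/s, medium preset.
# input720.yuv is an assumed file name for the 720p test video.
echo "gst-launch-1.0 $(cbr_pipeline input720.yuv 1280 720 30 1000 medium)"
```

Calling cbr_pipeline input1080.yuv 1920 1080 30 10000 veryslow reproduces the element chain of the fixed-bitrate pipeline shown above.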

Encoding CPU Usage

For these tests, the goal is to take video streams of 30 and 60 FPS as a live source (like a camera), and evaluate whether FFmpeg and GStreamer are able to fully encode the stream in real time and also evaluate resource usage with different configurations, mainly the CPU usage. Similar to previous tests, we evaluated the encoding with three different presets: veryslow, medium, and ultrafast. Also, two resolutions were tested: 1920x1080 and 1280x720. The variable bitrate average is between 3 Mb/s and 5 Mb/s for most cases.

FFmpeg

Results for a 30 FPS stream are shown in Table 3. We can see that the veryslow and medium presets, although they provide the best quality, are not able to encode the stream in real time at 1080p with variable bitrate or a 10 Mb/s bitrate. If the bitrate is lowered to 1 Mb/s, the medium preset is able to encode at 30 FPS. The ultrafast preset shows no problem and is able to encode at a constant 30 FPS with any configuration.

Table 3. FFmpeg real time encoding performance results for a 30 FPS stream.

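Resolution | Preset | Bitrate (kBits/s) | Encoding Frame Rate (FPS) | Max RAM Used (MB) | Average CPU Load (%)
1920x1080 | veryslow | Variable | 4 | 246 | 63.9
1920x1080 | medium | Variable | 20 | 109 | 52.8
1920x1080 | ultrafast | Variable | 30 | 69 | 38.0
1920x1080 | veryslow | 1000 | 7 | 246 | 59.8
1920x1080 | medium | 1000 | 30 | 107 | 47.3
1920x1080 | ultrafast | 1000 | 30 | 68 | 30.6
1920x1080 | veryslow | 10000 | 3 | 257 | 66.5
1920x1080 | medium | 10000 | 16 | 111 | 52.9
1920x1080 | ultrafast | 10000 | 30 | 91 | 39.8
1280x720 | veryslow | Variable | 9 | 243 | 63.8
1280x720 | medium | Variable | 30 | 71 | 49.6
1280x720 | ultrafast | Variable | 30 | 54 | 29.2
1280x720 | veryslow | 1000 | 11 | 136 | 62.5
1280x720 | medium | 1000 | 30 | 73 | 42.0
1280x720 | ultrafast | 1000 | 30 | 53 | 14.1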

The 60 FPS results in Table 4 show the same behavior as before: neither the veryslow nor the medium preset is able to reach the desired frame rate at 1080p, regardless of configuration. The ultrafast preset is able to encode the 60 FPS stream in real time only if the bitrate is 1 Mb/s.

Table 4. FFmpeg real time encoding performance results for a 60 FPS stream.

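Resolution | Preset | Bitrate (kBits/s) | Encoding Frame Rate (FPS) | Max RAM Used (MB) | Average CPU Load (%)
1920x1080 | veryslow | Variable | 5 | 250 | 78.1
1920x1080 | medium | Variable | 22 | 113 | 60.9
1920x1080 | ultrafast | Variable | 40 | 89 | 50.7
1920x1080 | veryslow | 1000 | 8 | 250 | 71.5
1920x1080 | medium | 1000 | 42 | 123 | 55.8
1920x1080 | ultrafast | 1000 | 60 | 74 | 36.6
1920x1080 | veryslow | 10000 | 3 | 273 | 78.8
1920x1080 | medium | 10000 | 19 | 116 | 64.6
1920x1080 | ultrafast | 10000 | 57 | 71 | 42.4
1280x720 | veryslow | Variable | 12 | 256 | 77.0
1280x720 | medium | Variable | 52 | 78 | 64.7
1280x720 | ultrafast | Variable | 60 | 56 | 39.9
1280x720 | veryslow | 1000 | 16 | 137 | 70.3
1280x720 | medium | 1000 | 60 | 90 | 54.6
1280x720 | ultrafast | 1000 | 60 | 59 | 26.6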

The FFmpeg commands used to get the 30 FPS results are shown below. The first one contains the parameters needed to maintain a fixed bitrate of 10000 kBits/s. The -r flag indicates the frame rate of the input stream, and the -re flag makes FFmpeg read the input at its native frame rate, which emulates a live source. The second command is used for the variable bitrate tests, where the -crf (constant rate factor) flag tells FFmpeg that we want a variable bitrate with average quality.

ffmpeg -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -re -i input1080.yuv -c:v libx264 -x264-params "nal-hrd=cbr" -b:v 10M -minrate 10M -maxrate 10M -bufsize 20M -preset ultrafast -tune zerolatency output.mp4
ffmpeg -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -re -i input1080.yuv -c:v libx264 -crf 22 -preset ultrafast -tune zerolatency output.mp4
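
The page does not state how the encoding frame rate was read, so the following is only one possible approach: FFmpeg prints progress lines containing an fps= field on its standard error, and the last one can be compared with the target rate. The helper names last_fps and is_real_time are hypothetical, and the sample progress line is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read an FFmpeg log on stdin and print the last
# reported fps value. Progress updates are separated by carriage returns,
# so they are split into lines first.
last_fps() {
    tr '\r' '\n' | sed -n 's/.*fps= *\([0-9.]*\).*/\1/p' | tail -n 1
}

# Hypothetical helper: succeed when the measured rate reaches the target rate.
is_real_time() {
    awk -v got="$1" -v want="$2" 'BEGIN { exit !(got >= want) }'
}

# Example with a made-up progress line in FFmpeg's usual format.
sample='frame=  450 fps= 30 q=-1.0 Lsize=    1831kB time=00:00:15.00 bitrate=1000.0kbits/s speed=1x'
fps="$(printf '%s\n' "$sample" | last_fps)"
if is_real_time "$fps" 30; then
    echo "real time: $fps FPS"
else
    echo "too slow: $fps FPS"
fi
```

With a real run, the log could be captured by redirecting the standard error of the ffmpeg command to a file and feeding that file to last_fps.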

GStreamer

The same test cases that were applied to FFmpeg were run with GStreamer. Table 5 shows the results for a 30 FPS stream. As before, the veryslow preset is not able to encode the 1080p stream at 30 FPS with any configuration, and the medium preset reaches 30 FPS only when the bitrate is lowered to 1 Mb/s. The ultrafast preset is always able to reach 30 FPS.

Table 5. GStreamer real time encoding performance results for a 30 FPS stream.

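Resolution | Preset | Bitrate (kBits/s) | Encoding Frame Rate (FPS) | Max RAM Used (MB) | Average CPU Load (%)
1920x1080 | veryslow | Variable | 5 | 206 | 66.0
1920x1080 | medium | Variable | 20 | 73 | 48.8
1920x1080 | ultrafast | Variable | 30 | 29 | 29.7
1920x1080 | veryslow | 1000 | 6 | 209 | 57.8
1920x1080 | medium | 1000 | 30 | 71 | 47.1
1920x1080 | ultrafast | 1000 | 30 | 28 | 31.4
1920x1080 | veryslow | 10000 | 3 | 208 | 67.9
1920x1080 | medium | 10000 | 16 | 75 | 54.4
1920x1080 | ultrafast | 10000 | 30 | 33 | 42
1280x720 | veryslow | Variable | 10 | 98 | 64.7
1280x720 | medium | Variable | 30 | 36 | 46.3
1280x720 | ultrafast | Variable | 30 | 19 | 30.4
1280x720 | veryslow | 1000 | 11 | 105 | 66.0
1280x720 | medium | 1000 | 30 | 35 | 43.1
1280x720 | ultrafast | 1000 | 30 | 21 | 13.0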

A similar result was obtained for a 60 FPS stream, as shown in Table 6, except that in this case the ultrafast preset was not able to encode in real time when the bitrate was set to 10 Mb/s.

Table 6. GStreamer real time encoding performance results for a 60 FPS stream.

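Resolution | Preset | Bitrate (kBits/s) | Encoding Frame Rate (FPS) | Max RAM Used (MB) | Average CPU Load (%)
1920x1080 | veryslow | Variable | 10 | 260 | 76.9
1920x1080 | medium | Variable | 22 | 93 | 62.9
1920x1080 | ultrafast | Variable | 60 | 68 | 40.3
1920x1080 | veryslow | 1000 | 10 | 238 | 74.4
1920x1080 | medium | 1000 | 39 | 89 | 56.1
1920x1080 | ultrafast | 1000 | 60 | 55 | 45.5
1920x1080 | veryslow | 10000 | 5 | 286 | 81.7
1920x1080 | medium | 10000 | 19 | 100 | 69.5
1920x1080 | ultrafast | 10000 | 56 | 77 | 53.9
1280x720 | veryslow | Variable | 13 | 111 | 75.9
1280x720 | medium | Variable | 48 | 47 | 57.4
1280x720 | ultrafast | Variable | 60 | 28 | 35.2
1280x720 | veryslow | 1000 | 16 | 119 | 68.5
1280x720 | medium | 1000 | 60 | 46 | 48.0
1280x720 | ultrafast | 1000 | 60 | 31 | 24.8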

The pipelines used for these tests are the following. The first one is used for a fixed bitrate of 10000 kBits/s and the second one is used for variable bitrate.

gst-launch-1.0 filesrc location=input1080.yuv ! videoparse width=1920 height=1080 framerate=60/1 format=i420 ! identity sync=true ! x264enc pass=cbr bitrate=10000 insert-vui=true tune=zerolatency speed-preset=veryslow ! qtmux ! filesink location=output.mp4
gst-launch-1.0 filesrc location=input1080.yuv ! videoparse width=1920 height=1080 framerate=60/1 format=i420 ! identity sync=true ! x264enc pass=quant quantizer=22 insert-vui=true tune=zerolatency speed-preset=veryslow ! qtmux ! filesink location=output.mp4

Results Analysis

In this section, we compare the results from FFmpeg and GStreamer to outline the main differences between the two options, mainly from a resource usage perspective.

Maximum Frame Rate

For the 1080p videos, Table 1 and Table 2 show that GStreamer encoded faster with variable bitrate for the medium and ultrafast presets, while the two tools were close at fixed bitrates. Similar output is expected from both tools because they are configured almost identically. Depending on the bitrate configuration, the veryslow preset encoded between 4.6 and 11.6 frames per second with GStreamer and between 5 and 8 frames per second with FFmpeg. The maximum frame rate for the medium preset is in the range of 22.5 to 43 FPS for GStreamer and 24 to 46 FPS for FFmpeg. Lastly, the ultrafast preset shows a frame rate between 64.7 and 95 FPS for GStreamer and between 43 and 105 FPS for FFmpeg. The graphs below show the difference between variable bitrate and a fixed 10 Mb/s bitrate.

Figure 8: FFmpeg maximum frame rate summary results
Figure 9: GStreamer maximum frame rate summary results.


In terms of CPU usage during the encoding, at 1080p with variable bitrate, FFmpeg and GStreamer show a slightly lower CPU usage than with a fixed bitrate of 10 Mb/s. This behavior can be seen in the graphs below.

Figure 10: FFmpeg CPU load summary results.
Figure 11: GStreamer CPU load summary results.


Figure 12: FFmpeg memory usage summary results
Figure 13: GStreamer memory usage summary results.

Encoding CPU Usage

The results shown in this section focus on the 1080p resolution, since it is the most demanding one and the other resolution tested follows a similar pattern. For the 30 FPS and 60 FPS streams, Tables 3 through 6 show that the CPU usage with variable bitrate and with a fixed bitrate of 10 Mb/s is too close to conclude which one uses more resources. The CPU usage differs noticeably only at the 1 Mb/s bitrate.

Figure 14: FFmpeg CPU load summary results for real-time encoding
Figure 15: GStreamer CPU load summary results for real-time encoding.

The memory usage results show that the values stayed very close between both tools for each preset and configuration. Overall, for the veryslow and ultrafast presets, FFmpeg tends to require more memory. However, GStreamer shows higher memory usage with the medium preset. The following graphs summarize the memory usage data.

Figure 16: FFmpeg memory usage summary results for real-time encoding
Figure 17: GStreamer memory usage summary results for real-time encoding.

Finally, the impact of the bitrate configuration is shown in Figure 18 and Figure 19, where we can see that the CPU usage varies significantly depending on the bitrate. A higher bitrate translates to better quality, but it also requires much more CPU resources. In the end, you will need to consider the needs of your specific use case to select the appropriate settings. If good quality is needed, a higher bitrate can be used at the expense of much higher CPU usage. Otherwise, lower bitrates can also provide decent quality while needing fewer CPU resources.

Figure 18: Bitrate impact based on bitrate setting for FFmpeg.
Figure 19: Bitrate impact based on bitrate setting for GStreamer.

Conclusion

According to the tables and graphs explained above, we can conclude that:

  • It would be possible to encode approximately four 1080p@30 FPS streams in parallel without maxing out the CPU, as long as a lower bitrate is used and a faster preset is selected. It also helps to set the tune property to zerolatency.
  • It would be possible to encode approximately two 1080p@60 FPS streams in parallel if a low bitrate is selected with a faster preset.
  • Using GStreamer to encode a 1080p 30 FPS stream with the medium preset, about 51% of the CPU is left for other tasks if the bitrate is variable, about 53% if the bitrate is 1 Mb/s, and about 46% for a 10 Mb/s bitrate (see Table 5). FFmpeg is close to these results as well.
  • The maximum frame rate depends on the bitrate configuration: the higher the bitrate, the fewer frames that can be encoded per second, but the higher the quality.
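  • At 1080p, the encoders configured with the veryslow preset were not able to encode the 30 or 60 FPS streams in real time at any bitrate tested, and the medium preset reached 30 FPS only at 1 Mb/s. The ultrafast preset encoded the 30 FPS stream in real time at every bitrate tested, but it fell short of 60 FPS at 10 Mb/s with both tools and with variable bitrate in FFmpeg.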
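
The CPU share left for other tasks can be derived directly from the Average CPU Load column of the result tables. The sketch below uses the GStreamer 1080p, 30 FPS, medium preset rows of Table 5; the function name cpu_headroom is hypothetical, and awk is used only because the shell has no floating-point arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: CPU share (%) left for other tasks, given an
# average CPU load (%).
cpu_headroom() {
    awk -v load="$1" 'BEGIN { printf "%.1f", 100 - load }'
}

# Average CPU loads from Table 5 (GStreamer, 1080p at 30 FPS, medium preset),
# written as "<bitrate setting>:<load>".
for entry in variable:48.8 1000:47.1 10000:54.4; do
    bitrate="${entry%%:*}"
    load="${entry##*:}"
    echo "bitrate=$bitrate headroom=$(cpu_headroom "$load")%"
done
```

The same calculation applies to any other row, which makes it easy to compare presets and bitrates against the CPU budget of a specific product.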


