Performance Tests - Video(H264)/Audio(AAC) RTSP streaming pipelines

From RidgeRun Developer Wiki

Introduction

This document presents performance measurements for several H264 RTSP streaming pipelines. A 1080p60 camera was used as the source; the crop-area property of the v4l2src plugin made it possible to work with 1024x768 frames. Tests were run at 60fps, and at 30fps by means of the decimate property.
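The crop-area value used throughout this page, "448,156@1024x768", matches a 1024x768 window centered in a 1920x1080 frame. The sketch below reproduces that arithmetic. It assumes the property takes the form left,top@widthxheight, which is inferred from the value itself rather than from the plugin documentation; the helper name center_crop is hypothetical.

```shell
#!/bin/sh
# Compute the crop-area string that centers a crop window in a source frame.
# Assumption: crop-area has the form "<left>,<top>@<width>x<height>",
# inferred from the value used in the pipelines on this page.
center_crop() {
    src_w=$1; src_h=$2; crop_w=$3; crop_h=$4
    left=$(( (src_w - crop_w) / 2 ))
    top=$(( (src_h - crop_h) / 2 ))
    echo "${left},${top}@${crop_w}x${crop_h}"
}

center_crop 1920 1080 1024 768   # prints 448,156@1024x768
```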

Hardware setup

  • 1080p60 source video camera
  • DM8168 Z3 RPS
      ARM clk: 987MHz
      DDR clk: 675MHz
      DRAM:  1 GiB
      NAND:  256 MiB

Maximum bitrate according to each H264 profile and ARM load associated

The following pipeline was used for this test:

gst-launch v4l2src device=/dev/video5 always-copy=false queue-size=8 crop-area="448,156@1024x768" ! 
           'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)60/1' ! 
           omxbufferalloc numBuffers=12 ! 
           gstperf ! 
           omx_h264enc output-buffers=16 input-buffers=10 force-idr-period=30 i-period=30 bitrate=6000000  profile=base ! 
           queue !
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream1 ! 
           rtspsink name=sink service=5001

The results shown in Table 1 were obtained by changing the bitrate and profile properties of the omx_h264enc plugin and averaging the ARM load reported by gstperf each second.
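The averaging step can be automated once the gstperf output has been captured to a file. The following is a minimal sketch, not the procedure used for these measurements. It assumes that each gstperf report line contains a token "CPU:" followed by a number; the exact output format depends on the gstperf version, so the pattern may need adjusting. The function name average_cpu is hypothetical.

```shell
#!/bin/sh
# Average the CPU readings found in a captured log (file arguments or stdin).
# Assumption: every report line carries a "CPU:" token followed by the load
# value, possibly with a trailing ";" or "%". This format is NOT confirmed
# by this page; check the output of your gstperf version.
average_cpu() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                if (tolower($i) == "cpu:") {
                    v = $(i + 1)
                    gsub(/[^0-9.]/, "", v)   # strip trailing ";" or "%"
                    sum += v
                    n++
                }
            }
        }
        END {
            if (n > 0) printf "%.2f\n", sum / n
            else print "no samples"
        }
    ' "$@"
}

# Possible usage:
#   gst-launch <pipeline> 2>&1 | tee perf.log
#   average_cpu perf.log
```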

Table 1. ARM load average for several bitrate values according to each H264 profile

Bitrate   Base Profile   Main Profile   High Profile
5Mbps     14.29%         13.96%         13.91%
10Mbps    19.63%         19.23%         18.15%
15Mbps    23.96%         23.83%         23.50%
20Mbps    28.99%         28.80%         28.69%
25Mbps    34.16%         33.45%         33.30%
30Mbps    38.32%         38.13%         37.83%
35Mbps    42.13%         42.45%         43.19%

In conclusion, at any given bitrate the difference in ARM load between H264 profiles is not significant; the bitrate itself is what determines how much ARM load the pipeline consumes.
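The conclusion can be checked against the Table 1 figures with a short script. The sketch below prints, for each bitrate, the spread between the highest and lowest profile load, and the average load increase per Mbps for the Base profile between the first and last rows. The function names are hypothetical; the numbers are copied from Table 1.

```shell
#!/bin/sh
# Table 1 as data: bitrate (Mbps), then ARM load (%) for Base, Main, High.
table1() {
    cat <<'EOF'
5 14.29 13.96 13.91
10 19.63 19.23 18.15
15 23.96 23.83 23.50
20 28.99 28.80 28.69
25 34.16 33.45 33.30
30 38.32 38.13 37.83
35 42.13 42.45 43.19
EOF
}

# Spread between the highest and lowest profile load at each bitrate.
profile_spread() {
    table1 | awk '{
        min = $2 + 0; max = $2 + 0
        for (i = 3; i <= 4; i++) {
            if ($i + 0 < min) min = $i + 0
            if ($i + 0 > max) max = $i + 0
        }
        printf "%d Mbps: spread %.2f\n", $1, max - min
    }'
}

# Average Base-profile load increase per Mbps between first and last rows.
load_per_mbps() {
    table1 | awk '
        NR == 1 { b0 = $1; l0 = $2 }
                { b1 = $1; l1 = $2 }
        END     { printf "%.2f\n", (l1 - l0) / (b1 - b0) }'
}

profile_spread
load_per_mbps   # prints 0.93
```

With these figures the spread between profiles stays below 1.5 percentage points at every bitrate, while the load grows by roughly 0.93 points per additional Mbps.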

Four channels for RTSP and one single Ethernet port using rtspsink

For this test, tee elements were used to send the H264 encoder output to four RTSP channels through a single Ethernet port.

gst-launch v4l2src device=/dev/video5 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
           'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)60/1' ! 
           omxbufferalloc numBuffers=16 ! 
           gstperf name=stream1 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           tee name=t1 t1. ! 
           queue ! 
           video/x-h264, mapping=/stream1 ! 
           rtspsink name=sink service=5001 
           t1. ! queue ! 
           video/x-h264, mapping=/stream2 ! 
           sink. 
           v4l2src device=/dev/video0 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
           'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)60/1' ! 
           omxbufferalloc numBuffers=16 ! 
           gstperf name=stream2 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000  profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           tee name=t2 t2. ! 
           queue ! 
           video/x-h264, mapping=/stream3 ! 
           sink. 
           t2. ! queue ! 
           video/x-h264, mapping=/stream4 ! 
           sink.

The average ARM load consumed by this test was 42.00%.

Note that the decimate property of the v4l2src plugin can be used to lower the pipeline framerate. For example, with decimate=2, as in the following pipeline, the ARM load drops to 28.00%.

gst-launch v4l2src decimate=2 device=/dev/video5 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
           'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)30/1' ! 
           omxbufferalloc numBuffers=16 ! 
           gstperf name=stream1 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           tee name=t1 t1. ! 
           queue ! 
           video/x-h264, mapping=/stream1 ! 
           rtspsink name=sink service=5001 
           t1. ! queue ! 
           video/x-h264, mapping=/stream2 ! 
           sink. 
           v4l2src decimate=2 device=/dev/video0 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
           'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)30/1' ! 
           omxbufferalloc numBuffers=16 ! 
           gstperf name=stream2 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000  profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           tee name=t2 t2. ! 
           queue ! 
           video/x-h264, mapping=/stream3 ! 
           sink. 
           t2. ! queue ! 
           video/x-h264, mapping=/stream4 ! 
           sink.
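The effect of decimate=2 on the figures reported for this test can be worked out as follows. This is a sketch under one assumption: decimate=N keeps one frame out of every N, which is consistent with the caps changing from 60/1 to 30/1 in the pipeline above. The helper names are hypothetical.

```shell
#!/bin/sh
# Frame rate after decimation.
# Assumption: decimate=N keeps one frame out of every N.
decimated_fps() { echo $(( $1 / $2 )); }

# Relative reduction (%) between two load measurements.
reduction_pct() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

decimated_fps 60 2          # prints 30
reduction_pct 42.00 28.00   # prints 33.3
```

Halving the frame rate therefore lowered the ARM load by about one third rather than one half, which suggests that part of the load does not scale with frame rate.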

Four instances of the H264 encoder

The following pipeline has four instances of the H264 encoder and, as in the previous test, each encoder output is sent through an RTSP channel.

gst-launch v4l2src device=/dev/video5 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
           'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)60/1' ! 
           omxbufferalloc numBuffers=16 ! 
           tee name=t ! 
           queue ! 
           gstperf name=stream1 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream1 ! 
           rtspsink name=sink service=5001 
           t. ! queue ! 
           gstperf name=stream2 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream2 ! 
           sink. 
           v4l2src device=/dev/video0 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
           'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)60/1' ! 
           omxbufferalloc numBuffers=16 ! 
           tee name=t2 ! 
           queue ! 
           gstperf name=stream3 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream3 ! 
           sink. 
           t2. ! queue ! 
           gstperf name=stream4 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream4 ! 
           sink.

The average ARM load consumed by this test was 75.77%.
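The four encoder branches in the pipeline above differ only in the stream number and in whether they create the rtspsink or refer to it as sink., so the pipeline text can be generated instead of typed by hand. The following is a sketch: every element name and property value is copied from the pipeline above, while the helper functions (capture, encoder_branch, build_pipeline) are hypothetical.

```shell
#!/bin/sh
# Generate the four-encoder pipeline description from two reusable pieces.
# Element names and property values are taken from the pipeline above;
# the function names are illustrative only.

CAPS="'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)60/1'"

# One capture source feeding a named tee. $1 = video device, $2 = tee name
capture() {
    printf 'v4l2src device=%s always-copy=false queue-size=12 crop-area="448,156@1024x768" ! %s ! omxbufferalloc numBuffers=16 ! tee name=%s' \
        "$1" "$CAPS" "$2"
}

# One encoder branch. $1 = stream number, $2 = sink element or reference
encoder_branch() {
    printf 'queue ! gstperf name=stream%s ! omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! queue ! rr_h264parser singleNalu=true ! video/x-h264, mapping=/stream%s ! %s' \
        "$1" "$1" "$2"
}

# Assemble the two cameras with two encoder branches each.
build_pipeline() {
    printf '%s ! %s ' "$(capture /dev/video5 t)" \
        "$(encoder_branch 1 'rtspsink name=sink service=5001')"
    printf 't. ! %s ' "$(encoder_branch 2 sink.)"
    printf '%s ! %s ' "$(capture /dev/video0 t2)" "$(encoder_branch 3 sink.)"
    printf 't2. ! %s\n' "$(encoder_branch 4 sink.)"
}

build_pipeline
```

The script only prints the pipeline description so that it can be inspected first. One way to launch it would be eval "gst-launch $(build_pipeline)", which lets the shell process the quotes embedded in the caps and crop-area strings.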

Decreasing the pipeline framerate to 30fps with decimate=2, as shown in the following pipeline, reduces the ARM load to 50.00%.

gst-launch v4l2src decimate=2 device=/dev/video5 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
           'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)30/1' ! 
           omxbufferalloc numBuffers=16 ! 
           tee name=t ! 
           queue ! 
           gstperf name=stream1 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream1 ! 
           rtspsink name=sink service=5001 
           t. ! queue ! 
           gstperf name=stream2 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream2 ! 
           sink. 
           v4l2src decimate=2 device=/dev/video0 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
           'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)30/1' ! 
           omxbufferalloc numBuffers=16 ! 
           tee name=t2 ! 
           queue ! 
           gstperf name=stream3 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream3 ! 
           sink. 
           t2. ! queue ! 
           gstperf name=stream4 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream4 ! 
           sink.

RTSP - Video (H264) + Audio (AAC)

The following pipeline captures video at 60fps and adds AAC-encoded audio to the RTSP stream.

gst-launch v4l2src device=/dev/video5 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
           'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)60/1' ! 
           omxbufferalloc numBuffers=16 ! 
           gstperf name=stream1 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream1 ! 
           rtspsink name=sink service=5001 
           alsasrc  latency-time=20000 buffer-time=800000 ! 
           "audio/x-raw-int, endianness=(int)1234, signed=(boolean)true, width=(int)16, depth=(int)16, rate=(int)44100, channels=(int)2" ! 
           omx_aacenc output-format=4 ! 
           queue ! 
           aacparse ! 
           audio/mpeg, mapping=/stream1 ! 
           sink.

The average ARM load consumed by this test was 24.42%.

Now capturing at 30fps:

gst-launch v4l2src decimate=2 device=/dev/video5 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
           'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)30/1' ! 
           omxbufferalloc numBuffers=16 ! 
           gstperf name=stream1 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream1 ! 
           rtspsink name=sink service=5001 
           alsasrc  latency-time=20000 buffer-time=800000 ! 
           "audio/x-raw-int, endianness=(int)1234, signed=(boolean)true, width=(int)16, depth=(int)16, rate=(int)44100, channels=(int)2" ! 
           omx_aacenc output-format=4 ! 
           queue ! 
           aacparse ! 
           audio/mpeg, mapping=/stream1 ! 
           sink.

The average ARM load consumed by this test was 19.74%.

Encoding: Video (H264) + Audio (AAC)

This test shows the ARM load consumed by a pipeline that records H264 video at 60fps together with AAC audio.

gst-launch -e v4l2src device=/dev/video5 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
              'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)60/1' ! 
              omxbufferalloc numBuffers=16 ! 
              gstperf name=stream1 ! 
              omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
              queue ! 
              rr_h264parser singleNalu=true ! 
              mux.video_00 
              alsasrc  latency-time=20000 buffer-time=800000 ! 
              "audio/x-raw-int, endianness=(int)1234, signed=(boolean)true, width=(int)16, depth=(int)16, rate=(int)44100, channels=(int)2" ! 
              omx_aacenc output-format=4 ! 
              queue ! 
              aacparse ! 
              mux.audio_00 mp4mux dts-method=0 name=mux ! 
              filesink location=audioVideo.mp4

The average ARM load consumed by this pipeline was 16.67%.

Now capturing at 30fps:

gst-launch -e v4l2src decimate=2 device=/dev/video5 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
              'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)30/1' ! 
              omxbufferalloc numBuffers=16 ! 
              gstperf name=stream1 ! 
              omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
              queue ! 
              rr_h264parser singleNalu=true ! 
              mux.video_00 
              alsasrc  latency-time=20000 buffer-time=800000 ! 
              "audio/x-raw-int, endianness=(int)1234, signed=(boolean)true, width=(int)16, depth=(int)16, rate=(int)44100, channels=(int)2" ! 
              omx_aacenc output-format=4 ! 
              queue ! 
              aacparse ! 
              mux.audio_00 mp4mux dts-method=0 name=mux ! 
              filesink location=audioVideo.mp4

The average ARM load consumed by this pipeline was 13.04%.

Four instances of the H264 encoder + AAC encoding

This test captures video at 30fps and uses four instances of the H264 encoder to send video and audio through four RTSP channels.

gst-launch v4l2src decimate=2 device=/dev/video5 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
           'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)30/1' ! 
           omxbufferalloc numBuffers=16 ! 
           tee name=t ! 
           queue ! 
           gstperf name=stream1 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream1 ! 
           rtspsink name=sink service=5001 
           t. ! queue ! 
           gstperf name=stream2 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream2 ! 
           sink. 
           v4l2src decimate=2 device=/dev/video0 always-copy=false queue-size=12 crop-area="448,156@1024x768" ! 
           'video/x-raw-yuv-strided,format=(fourcc)NV12,width=1024,height=768,framerate=(fraction)30/1' ! 
           omxbufferalloc numBuffers=16 ! 
           tee name=t2 ! 
           queue ! 
           gstperf name=stream3 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream3 ! 
           sink. 
           t2. ! queue ! 
           gstperf name=stream4 ! 
           omx_h264enc output-buffers=4 input-buffers=4 force-idr-period=30 i-period=30 bitrate=6000000 profile=high ! 
           queue ! 
           rr_h264parser singleNalu=true ! 
           video/x-h264, mapping=/stream4 ! 
           sink. 
           alsasrc latency-time=20000 buffer-time=800000 ! 
           "audio/x-raw-int, endianness=(int)1234, signed=(boolean)true, width=(int)16, depth=(int)16, rate=(int)44100, channels=(int)2" ! 
           omx_aacenc output-format=4 ! 
           queue ! 
           aacparse ! 
           tee name=a ! 
           audio/mpeg, mapping=/stream1 ! 
           sink. 
           a. ! audio/mpeg, mapping=/stream2 ! 
           sink. 
           a. ! audio/mpeg, mapping=/stream3 ! 
           sink. 
           a. ! audio/mpeg, mapping=/stream4 ! 
           sink.

The average ARM load consumed by this pipeline was 62.42%.
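The measurements on this page can be compared pairwise to estimate what each addition costs. The sketch below subtracts loads reported in the sections above; the helper name delta is hypothetical. These differences are rough indications only, since each figure is a single average and the compared pipelines may differ in more than one respect.

```shell
#!/bin/sh
# Difference in ARM load (percentage points) between two measurements.
delta() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a - b }'
}

# Four encoders at 30fps: with AAC audio (62.42%) vs. video only (50.00%).
echo "AAC audio on four RTSP channels, 30fps: $(delta 62.42 50.00) points"

# One encoder + AAC: RTSP streaming vs. MP4 recording.
echo "RTSP vs. MP4 file, 60fps: $(delta 24.42 16.67) points"
echo "RTSP vs. MP4 file, 30fps: $(delta 19.74 13.04) points"
```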