IMX6 RAW video performance and tuning using GStreamer

From RidgeRun Developer Wiki

Description

This wiki page presents the latency and CPU usage results of raw video streaming after improving UDP reception on the PC and the IMX6.

Conclusions

  • Currently, the whole system has about 130 ms of latency at 30 fps. This is roughly 3 frames more than the ideal latency of one frame period (~33.3 ms), and the additional latency is split between the capture system and the network. The display system does not add noticeable latency, since the interlatency of a single buffer (~36 µs) is far below the 33.3 ms frame period at 30 fps.
  • In the capture system, apart from the camera itself (43 ms, which cannot be optimized because the real deployment uses an RTP camera), the largest contribution (~22 ms) comes from the RAW RTP payloading process, since this is where the video data is packetized into RTP. The RTP packetization algorithm could be analyzed and optimized in the future.
  • In the network system, the latency is added between the udpsink and udpsrc elements. This segment includes several contributors, such as GStreamer element processing, the network driver, and external network devices. GStreamer and the network driver could be optimized to reduce this latency.
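As a quick check of the frame counts above, the short Python sketch below converts the latencies reported later on this page (Tables 4, 6, 7 and 8) into frame periods at 30 fps. It only restates the published numbers; the helper name `to_frames` is illustrative and not part of any tool used in these tests.

```python
# Express the latencies reported on this page as frame periods at 30 fps.
FPS = 30
FRAME_PERIOD_MS = 1000 / FPS  # ideal one-frame latency, ~33.3 ms


def to_frames(latency_ms: float, fps: float = FPS) -> float:
    """Return how many frame periods a latency in milliseconds spans."""
    return latency_ms * fps / 1000


latencies_ms = {
    "whole system": 130.0,                 # Table 7
    "capture system": 65.27757662,         # Table 4
    "network system": 64.686260679,        # Table 8
    "display system": 36.16270062 / 1000,  # Table 6, converted from us to ms
}

print(f"frame period at {FPS} fps: {FRAME_PERIOD_MS:.1f} ms")
for part, ms in latencies_ms.items():
    print(f"{part:15s} {ms:9.3f} ms = {to_frames(ms):5.2f} frames")

# Frames beyond the ideal single-frame latency
extra_frames = to_frames(latencies_ms["whole system"]) - 1
print(f"additional frames over one frame period: {extra_frames:.2f}")
```

The capture and network parts each span close to 2 frame periods, while the display part is a small fraction of one.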

Test Environment

The following tests measure CPU usage and latency when UDP reception on the PC and the IMX6 is improved by using an optimized version of the udpsrc element.

Setup 1: IMX6 => PC

Normal Test

GStreamer Pipeline

  • iMX6
gst-launch-1.0 imxv4l2videosrc imx-capture-mode=3 ! rtpvrawpay ! udpsink host=<PC-IP> port=5001 sync=false async=false -v
  • Host Machine
gst-launch-1.0 udpsrc port=5001 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, depth=(string)8,width=(string)720, height=(string)576, colorimetry=(string)BT601-5, payload=(int)96, ssrc=(uint)155528026, timestamp-offset=(uint)2270520902, seqnum-offset=(uint)27437,a-framerate=(string)30" ! rtpvrawdepay ! videoconvert ! queue ! xvimagesink sync=false

Results

Results: Optimized udpsrc test
udpsrc | iMX6 CPU (%) | Host PC CPU (%) | Latency (ms)
Optimized | ~89 | ~43 | ~130
Not optimized | ~89 | ~30 | ~130

Jumbo frame Test

GStreamer Pipeline

  • iMX6
gst-launch-1.0 imxv4l2videosrc imx-capture-mode=3 ! rtpvrawpay mtu=9000 ! udpsink host=<PC-IP> port=5001 sync=false async=false -v
  • Host Machine
gst-launch-1.0 udpsrc mtu=9000 port=5001 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, depth=(string)8,width=(string)720, height=(string)576, colorimetry=(string)BT601-5, payload=(int)96, ssrc=(uint)155528026, timestamp-offset=(uint)2270520902, seqnum-offset=(uint)27437,a-framerate=(string)30" ! rtpvrawdepay ! videoconvert ! queue ! xvimagesink sync=false

Note: Only the optimized udpsrc requires the mtu property to be set to 9000. The original udpsrc does not have this property.

Results

Results: Optimized udpsrc test
udpsrc | iMX6 CPU (%) | Host PC CPU (%) | Latency (ms)
Optimized | ~69 | ~34 | ~130
Not optimized | ~69 | ~17 | ~130
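The jumbo frame test lowers the iMX6 CPU usage from ~89% to ~69%. One plausible contributor is that a larger payloader MTU produces far fewer RTP packets per frame. The rough Python estimate below assumes a 720x576 YCbCr 4:2:2 8-bit frame (the caps used above), the rtpvrawpay default MTU of 1400 bytes when `mtu` is not set, a 12-byte RTP header, and the RFC 4175 payload header (a 2-byte extended sequence number plus at least one 6-byte line header per packet). Packets that span several lines carry extra line headers, so the results are lower bounds, not exact counts.

```python
import math

RTP_HEADER = 12   # fixed RTP header, no CSRCs or header extensions
EXT_SEQ = 2       # RFC 4175 extended sequence number
LINE_HEADER = 6   # RFC 4175 line header (at least one per packet)


def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of one raw frame in bytes (16 bits per pixel for 8-bit YCbCr 4:2:2)."""
    return width * height * bits_per_pixel // 8


def min_packets_per_frame(frame_size: int, mtu: int) -> int:
    """Lower bound on RTP packets per frame for a given payloader MTU.

    Assumes a single line header per packet; packets spanning several
    lines carry more headers and therefore hold less video data.
    """
    usable = mtu - RTP_HEADER - EXT_SEQ - LINE_HEADER
    return math.ceil(frame_size / usable)


size = frame_bytes(720, 576, 16)  # 829440 bytes per frame
for mtu in (1400, 9000):
    pkts = min_packets_per_frame(size, mtu)
    print(f"mtu={mtu}: >= {pkts} packets/frame, >= {pkts * 30} packets/s at 30 fps")
```

With these assumptions, the 9000-byte MTU needs roughly 6 times fewer packets per frame than the default.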


Setup 2: PC => IMX6

Normal Test

First, on the iMX6, increase the maximum socket buffer sizes so that more UDP traffic can be buffered on the receive socket. This improves the quality of the received stream.

sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608

GStreamer Pipeline

  • iMX6
gst-launch-1.0 udpsrc max-packet-size=9000 buffer-size=622080 port=5001 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=YCbCr-4:2:0,depth=(string)8,width=(string)720, height=(string)576,colorimetry=(string)BT601-5, payload=(int)96, a-framerate=25" ! rtpvrawdepay !  imxg2dvideosink window-width=720 window-height=576 sync=true -v
  • Host Machine
gst-launch-1.0 v4l2src ! videoconvert ! videoscale ! videorate ! "video/x-raw,width=720,height=576,format=I420,framerate=25/1"  ! rtpvrawpay mtu=9000 ! udpsink host=<IMX6-IP> port=5001 sync=false async=false -v 

Note: Only the optimized udpsrc requires the max-packet-size property to be set to 9000. To match it, the mtu of the rtpvrawpay element in the host pipeline should also be set to 9000. The original udpsrc can only receive packets of up to 1500 bytes, so when testing it, do not change the mtu of rtpvrawpay.
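The `buffer-size=622080` value in the iMX6 pipeline is exactly the size of one 720x576 I420 frame (720 x 576 x 1.5 bytes). On Linux, socket(7) documents that a receive buffer requested with SO_RCVBUF is limited by `net.core.rmem_max` and doubled by the kernel for bookkeeping overhead, which is why `rmem_max` is raised before running the pipeline. The Python sketch below models that rule, assuming the udpsrc `buffer-size` ends up in a plain SO_RCVBUF request; the 212992-byte `rmem_max` used for comparison is a common default, but the actual default depends on the kernel and distribution.

```python
# Model of how Linux sizes a receive buffer requested via SO_RCVBUF,
# following socket(7): the request is capped at net.core.rmem_max and then
# doubled by the kernel; the doubled value is never below 256 bytes.


def i420_frame_size(width: int, height: int) -> int:
    """Bytes in one 8-bit I420 frame: full-size Y plane plus quarter-size U and V planes."""
    return width * height * 3 // 2


def effective_rcvbuf(requested: int, rmem_max: int) -> int:
    """Receive buffer the kernel actually allocates for a SO_RCVBUF request."""
    return max(2 * min(requested, rmem_max), 256)


requested = i420_frame_size(720, 576)  # the buffer-size used in the iMX6 pipeline
for rmem_max in (212992, 8388608):  # assumed common default vs. the value set with sysctl
    print(f"rmem_max={rmem_max}: effective receive buffer = "
          f"{effective_rcvbuf(requested, rmem_max)} bytes")
```

Under the smaller limit, the request would be silently reduced to less than one frame of buffering.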

Results

Results: Optimized udpsrc test
udpsrc | iMX6 CPU (%) | Host PC CPU (%) | Latency (ms)
Optimized | ~45 | ~35 | ~130
Not optimized | ~74 | ~35 | ~130



Different sinks

First, on the iMX6, increase the maximum socket buffer sizes so that more UDP traffic can be buffered on the receive socket. This improves the quality of the received stream.

sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608

GStreamer Pipeline

This test uses three different sinks, each substituted for <SINK> in the iMX6 pipeline:

SINK = rtpvrawdepay ! imxg2dvideosink window-width=720 window-height=576 sync=true
SINK = rtpvrawdepay ! fakesink
SINK = fakesink
  • iMX6
gst-launch-1.0 udpsrc max-packet-size=9000 buffer-size=622080 port=5001 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=YCbCr-4:2:0,depth=(string)8,width=(string)720, height=(string)576,colorimetry=(string)BT601-5, payload=(int)96, a-framerate=25" ! <SINK> -v
  • Host Machine
gst-launch-1.0 v4l2src ! videoconvert ! videoscale ! videorate ! "video/x-raw,width=720,height=576,format=I420,framerate=25/1"  ! rtpvrawpay mtu=9000 ! udpsink host=<IMX6-IP> port=5001 sync=false async=false -v 

Note: Only the optimized udpsrc requires the max-packet-size property to be set to 9000. To match it, the mtu of the rtpvrawpay element in the host pipeline should also be set to 9000. The original udpsrc can only receive packets of up to 1500 bytes, so when testing it, do not change the mtu of rtpvrawpay.

Results

Results: iMX6 CPU usage (%)
udpsrc | imxg2dvideosink | rtpvrawdepay + fakesink | fakesink
Optimized | ~45 | ~41 | ~35
Not optimized | ~74 | ~72 | ~65
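Reading the table column by column suggests where the savings come from. The Python sketch below computes the relative CPU saving for each sink and, assuming the cost of each pipeline stage is roughly additive, the extra cost of adding the depayloader and the display sink. Since all values are approximate (~), the derived numbers are only indicative.

```python
# Approximate iMX6 CPU usage (%) from the "Different sinks" results table.
cpu = {
    "optimized":     {"fakesink": 35, "depay + fakesink": 41, "imxg2dvideosink": 45},
    "not optimized": {"fakesink": 65, "depay + fakesink": 72, "imxg2dvideosink": 74},
}


def relative_saving(before: float, after: float) -> float:
    """Fraction of CPU usage removed when going from `before` to `after`."""
    return (before - after) / before


for sink in cpu["optimized"]:
    saving = relative_saving(cpu["not optimized"][sink], cpu["optimized"][sink])
    print(f"{sink:18s} saving: {saving:.0%}")

# Incremental cost of each stage, obtained by subtracting adjacent columns
for variant, row in cpu.items():
    receive = row["fakesink"]                                 # udpsrc + fakesink
    depay = row["depay + fakesink"] - row["fakesink"]         # adding rtpvrawdepay
    display = row["imxg2dvideosink"] - row["depay + fakesink"]  # fakesink -> display sink
    print(f"{variant:13s} receive ~{receive}%  depay ~{depay}%  display ~{display}%")
```

With these numbers, the optimized udpsrc saves about 30 percentage points in every column, while the depayloader and display stages cost only a few points each, which is consistent with the improvement being concentrated in UDP reception.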


Latency

This report includes latency measurements for RAW RTP streaming on both the iMX6 platform and the host machine. The goal is to identify the latency that each element adds to the whole system.

Test Environment

Table 1. Environment elements
Property | Value
Capture platform | iMX6 Variscite
Display platform | x86 i7 7th Gen
Network bandwidth | Gigabit switch
Resolution | 1280x720
Frame rate | 30 fps

Assumptions

The following assumptions were made for the measurements and the final analysis.

  • The capture side has a measurable latency, but it cannot be optimized because the real deployment uses an RTP camera.
  • The iMX6 Variscite might perform differently from the RTP camera.
  • The host machine is expected to perform similarly to the Atom platform.
  • There is enough bandwidth to stream RAW RTP video.
  • The interlatency represents the latency of a single buffer between different points of the pipeline.
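The bandwidth assumption can be checked with simple arithmetic. The Python sketch below computes the uncompressed video bitrate for the formats used on this page. It ignores RTP, UDP, IP and Ethernet header overhead, and it assumes 8-bit YCbCr 4:2:2 for the 1280x720 resolution in Table 1, since the table does not state the pixel format.

```python
GIGABIT = 1_000_000_000  # nominal Gigabit Ethernet line rate, bits/s


def raw_bitrate(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Uncompressed video bitrate in bits/s, excluding protocol header overhead."""
    return width * height * bits_per_pixel * fps


cases = {
    "720x576 YCbCr 4:2:2 @ 30 fps (Setup 1 caps)": raw_bitrate(720, 576, 16, 30),
    "720x576 I420 @ 25 fps (Setup 2 caps)": raw_bitrate(720, 576, 12, 25),
    "1280x720 YCbCr 4:2:2 @ 30 fps (Table 1, assumed format)": raw_bitrate(1280, 720, 16, 30),
}

for name, bps in cases.items():
    print(f"{name}: {bps / 1e6:.1f} Mbit/s ({bps / GIGABIT:.0%} of Gigabit)")
```

Even the largest case, roughly 442 Mbit/s, stays below half of a Gigabit link before header overhead.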

Capture System

This section measures the latency on the capture platform (iMX6) to obtain an approximate value to take into account for the whole system.

Camera Latency

This measurement obtains the latency of only capturing and displaying directly on the iMX6, without streaming.

gst-launch-1.0 imxv4l2videosrc imx-capture-mode=3 ! autovideosink
Table 2. Latency values for the camera.
Element | Latency (ms)
Display | 43

RAW RTP payload latency

These measurements show the latency of streaming RAW RTP video.

  • iMX6 Variscite
gst-launch-1.0 imxv4l2videosrc imx-capture-mode=3 ! rtpvrawpay ! udpsink host=10.251.101.15 port=5001 sync=false async=false

The following values are interlatency averages over 1 minute.

Table 3. Interlatency values on the RAW RTP payload pipeline.
Element | Accumulated latency (ms)
rtpvrawpay | 22.27757662
udpsink (sink pad) | 22.27757662

Capture System Latency

Table 4. Interlatency values for the Capture System.
Part | Latency (ms)
Camera | 43
RAW RTP process | 22.27757662
Capture System (total) | 65.27757662

Display Platform

This section measures the latency on the display platform (x86) to obtain an approximate value to take into account for the whole system.

RAW RTP depayload latency

  • x86 i7
gst-launch-1.0 udpsrc port=5001 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, depth=(string)8,width=(string)720, height=(string)576, colorimetry=(string)BT601-5, payload=(int)96, ssrc=(uint)155528026, timestamp-offset=(uint)2270520902, seqnum-offset=(uint)27437,a-framerate=(string)30" ! rtpvrawdepay ! videoconvert ! queue ! xvimagesink sync=false

These are average latency values over 1 minute, measured at the output side of each element. Each value accumulates the latency from udpsrc up to that element, which is why the last value is used as the total in Table 6.

Table 5. Interlatency values on the RAW RTP depayload pipeline.
Element | Accumulated latency (µs)
rtpvrawdepay | 8.707433315
videoconvert | 22.71966235
queue | 36.16270062
xvimagesink | 36.16270062
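Because Table 6 takes the last value of Table 5 as the total for the display platform, the values in Table 5 can be read as latency accumulated from udpsrc up to each element. Under that assumption, the Python sketch below derives how much each element adds on its own; `per_element_increments` is an illustrative helper, not part of any measurement tool.

```python
from typing import List, Tuple

# Values from Table 5 (microseconds), in pipeline order.
table5_us: List[Tuple[str, float]] = [
    ("rtpvrawdepay", 8.707433315),
    ("videoconvert", 22.71966235),
    ("queue", 36.16270062),
    ("xvimagesink", 36.16270062),
]


def per_element_increments(cumulative: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Turn latencies accumulated from the source into per-element deltas."""
    previous = 0.0
    deltas = []
    for name, value in cumulative:
        deltas.append((name, value - previous))
        previous = value
    return deltas


for name, delta in per_element_increments(table5_us):
    print(f"{name:13s} +{delta:.2f} us")
```

Under this reading, videoconvert and queue each add about 13 to 14 µs, rtpvrawdepay adds about 9 µs, and xvimagesink adds nothing measurable, all far below the 33.3 ms frame period at 30 fps.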

Display Platform Latency

Table 6. Interlatency values for the Display Platform.
Part | Latency (µs)
RAW RTP process | 36.16270062
Display Platform (total) | 36.16270062

System Latency

This measurement covers the whole system latency, from capture to display.

  • iMX6 Variscite
gst-launch-1.0 imxv4l2videosrc imx-capture-mode=3 ! rtpvrawpay ! udpsink host=10.251.101.15 port=5001 sync=false async=false
  • x86 i7
gst-launch-1.0 udpsrc port=5001 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, depth=(string)8,width=(string)720, height=(string)576, colorimetry=(string)BT601-5, payload=(int)96, ssrc=(uint)155528026, timestamp-offset=(uint)2270520902, seqnum-offset=(uint)27437,a-framerate=(string)30" ! rtpvrawdepay ! videoconvert ! queue ! xvimagesink sync=false
Table 7. Latency values for the whole system.
Part | Latency (ms)
System | 130

Network

This is an indirect measurement that uses the previous latency values to approximate the network latency during streaming.

The network latency is obtained with the following equation:

network_latency = whole_latency - (capture_latency + display_latency)
Table 8. Indirect latency values for the network stack.
System part | Latency (ms)
Whole System | 130
Capture System | 65.27757662
Display System | 0.036162701
Network System | 64.686260679
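The arithmetic behind Table 8 can be reproduced with the Python sketch below. The step that is easy to miss is the unit conversion: Tables 5 and 6 are in microseconds, so the display latency is divided by 1000 before it is combined with the millisecond values. Because the network latency is obtained by subtraction, any error in the whole-system measurement carries over to it directly.

```python
def network_latency_ms(whole_ms: float, capture_ms: float, display_ms: float) -> float:
    """network_latency = whole_latency - (capture_latency + display_latency)"""
    return whole_ms - (capture_ms + display_ms)


camera_ms = 43.0                     # Table 2
rtp_pay_ms = 22.27757662             # Table 3
capture_ms = camera_ms + rtp_pay_ms  # Table 4: 65.27757662 ms
display_ms = 36.16270062 / 1000      # Table 6 is in microseconds; convert to ms
whole_ms = 130.0                     # Table 7

net = network_latency_ms(whole_ms, capture_ms, display_ms)
print(f"network latency ~ {net:.3f} ms ({net / whole_ms:.0%} of the total)")
```

The result matches the Network System row of Table 8, about half of the total latency.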