IMX6 RAW video performance and tuning using GStreamer
Description
This wiki page presents the latency and CPU usage results for raw video streaming after improving UDP reception on the PC and the IMX6.
Conclusions
- Currently, the entire system has ~3 additional frames of latency at 30fps, distributed between the capture system and the network. The display system adds no significant latency, since the interlatency of a single buffer is lower than the ideal latency for 30fps.
- In the capture system, the highest latency is added by the RAW RTP payload process, since this process performs the RTP packetization of the video data. The RTP packetization algorithm could eventually be analyzed for optimization.
- In the network system, the added latency lies between the udpsink and udpsrc elements. Many components along this path add latency, such as GStreamer algorithms, the network driver and external network devices. GStreamer and the network driver might be optimized to reduce it.
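To relate the millisecond figures on this page to frames, a latency can be divided by the frame period. The following is a minimal sketch, not part of the original tests; the helper names `frame_period_ms` and `latency_in_frames` are hypothetical, and the 130 ms input is the whole-system latency reported later on this page.

```shell
#!/bin/sh
# Frame period in milliseconds: 1000 / fps
frame_period_ms() {
    awk -v fps="$1" 'BEGIN { printf "%.2f\n", 1000 / fps }'
}

# Convert a latency in milliseconds to frame periods:
# frames = latency_ms * fps / 1000
latency_in_frames() {
    awk -v ms="$1" -v fps="$2" 'BEGIN { printf "%.2f\n", ms * fps / 1000 }'
}

echo "frame period at 30fps: $(frame_period_ms 30) ms"
echo "whole system (130 ms): $(latency_in_frames 130 30) frame periods"
```

Note that this converts total latency; the page does not state how the "additional frames" figure above was derived, so the two numbers are not expected to match exactly.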
Test Environment
The following tests measure CPU usage and latency when UDP reception is improved on the PC and the IMX6 by using optimized versions of the udpsrc element.
- iMx6 variscite: var-som-solo
- Host Machine: Ubuntu 16.04 using GStreamer Alternative environment. See: https://developer.ridgerun.com/wiki/index.php?title=Setting_a_GStreamer_Alternative_Environment
- GStreamer 1.12.2 for iMx6
- GStreamer 1.12.4 for PC
- Local Network
- Resolution: 720x576
- Framerate: 30fps
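As a rough guide to the load this setup puts on the network, the payload bit rate of the uncompressed stream can be estimated from the resolution and frame rate. This sketch assumes 8-bit YCbCr 4:2:2 (2 bytes per pixel on average), as in the Setup 1 caps, and ignores RTP, UDP and IP header overhead; `raw_bitrate_bps` is a hypothetical helper name.

```shell
#!/bin/sh
# Payload bit rate of uncompressed video, in bits per second.
# Arguments: width height bytes_per_pixel fps
raw_bitrate_bps() {
    echo $(( $1 * $2 * $3 * $4 * 8 ))
}

# 8-bit YCbCr 4:2:2 averages 2 bytes per pixel.
BPS=$(raw_bitrate_bps 720 576 2 30)
echo "payload bit rate: $BPS bit/s (~$(( BPS / 1000000 )) Mbit/s)"
```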
Setup 1: IMX6 => PC
Normal Test
GStreamer Pipeline
- iMX6
gst-launch-1.0 imxv4l2videosrc imx-capture-mode=3 ! rtpvrawpay ! udpsink host=<PC-IP> port=5001 sync=false async=false -v
- Host Machine
gst-launch-1.0 udpsrc port=5001 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, depth=(string)8,width=(string)720, height=(string)576, colorimetry=(string)BT601-5, payload=(int)96, ssrc=(uint)155528026, timestamp-offset=(uint)2270520902, seqnum-offset=(uint)27437,a-framerate=(string)30" ! rtpvrawdepay ! videoconvert ! queue ! xvimagesink sync=false
Results
udpsrc | IMX6 CPU (%) | Host PC CPU (%) | Latency (ms) |
---|---|---|---|
Optimized | ~89 | ~43 | ~130 |
Not optimized | ~89 | ~30 | ~130 |
Jumbo frame Test
GStreamer Pipeline
- iMX6
gst-launch-1.0 imxv4l2videosrc imx-capture-mode=3 ! rtpvrawpay mtu=9000 ! udpsink host=<PC-IP> port=5001 sync=false async=false -v
- Host Machine
gst-launch-1.0 udpsrc mtu=9000 port=5001 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, depth=(string)8,width=(string)720, height=(string)576, colorimetry=(string)BT601-5, payload=(int)96, ssrc=(uint)155528026, timestamp-offset=(uint)2270520902, seqnum-offset=(uint)27437,a-framerate=(string)30" ! rtpvrawdepay ! videoconvert ! queue ! xvimagesink sync=false
Note: Setting the mtu property to 9000 is required only with the optimized udpsrc. The original udpsrc does not have this property.
Results
udpsrc | IMX6 CPU (%) | Host PC CPU (%) | Latency (ms) |
---|---|---|---|
Optimized | ~69 | ~34 | ~130 |
Not optimized | ~69 | ~17 | ~130 |
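One likely reason jumbo frames lower CPU usage is that each video frame is split into far fewer packets. The sketch below estimates packets per frame for two MTU values. It is only an approximation: the 1400-byte value for the non-jumbo case and the 20-byte per-packet header overhead are assumptions, and `packets_per_frame` is a hypothetical helper name.

```shell
#!/bin/sh
# Rough number of RTP packets needed for one frame.
# Arguments: frame_bytes mtu header_bytes
# header_bytes is an ASSUMED per-packet overhead (RTP header plus the
# raw-video payload header); the real value varies with lines per packet.
packets_per_frame() {
    payload=$(( $2 - $3 ))
    # Integer ceiling division.
    echo $(( ($1 + payload - 1) / payload ))
}

FRAME_BYTES=$(( 720 * 576 * 2 ))   # one 8-bit 4:2:2 frame at 720x576
echo "mtu=1400: $(packets_per_frame "$FRAME_BYTES" 1400 20) packets/frame"
echo "mtu=9000: $(packets_per_frame "$FRAME_BYTES" 9000 20) packets/frame"
```

Under these assumptions, the packet count per frame drops by roughly a factor of six with mtu=9000.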
Setup 2: PC => IMX6
Normal Test
First, improve the quality of the received stream by raising the maximum amount of UDP traffic that can be buffered on the receive socket (on the IMX6).
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
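To confirm that the new limit is in effect before starting the pipeline, the live value can be read back and compared with the requested one. This is a minimal sketch, assuming a Linux system that exposes the setting under /proc; `limit_covers` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the current limit covers the wanted size.
limit_covers() {
    current=$1
    wanted=$2
    [ "$current" -ge "$wanted" ]
}

WANTED=8388608   # value passed to sysctl -w above
# Read the live limit; fall back to 0 if the file is not readable.
CURRENT=$(cat /proc/sys/net/core/rmem_max 2>/dev/null || echo 0)
if limit_covers "$CURRENT" "$WANTED"; then
    echo "rmem_max=$CURRENT is large enough"
else
    echo "rmem_max=$CURRENT is below $WANTED; run sysctl -w as root"
fi
```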
GStreamer Pipeline
- iMX6
gst-launch-1.0 udpsrc max-packet-size=9000 buffer-size=622080 port=5001 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=YCbCr-4:2:0,depth=(string)8,width=(string)720, height=(string)576,colorimetry=(string)BT601-5, payload=(int)96, a-framerate=25" ! rtpvrawdepay ! imxg2dvideosink window-width=720 window-height=576 sync=true -v
- Host Machine
gst-launch-1.0 v4l2src ! videoconvert ! videoscale ! videorate ! "video/x-raw,width=720,height=576,format=I420,framerate=25/1" ! rtpvrawpay mtu=9000 ! udpsink host=<IMX6-IP> port=5001 sync=false async=false -v
Note: Setting the max-packet-size property to 9000 is required only with the optimized udpsrc. In that case, the mtu of the rtpvrawpay element in the host pipeline should also be set to 9000 to match. The original udpsrc can only receive packets of up to 1500 bytes, so in that case do not change the mtu of the rtpvrawpay element.
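The buffer-size=622080 value in the iMX6 pipeline equals the size of one 720x576 I420 frame. The page does not say that this is how the value was chosen, so treat the connection as an assumption; the sketch below only shows the arithmetic, with `i420_frame_bytes` as a hypothetical helper name.

```shell
#!/bin/sh
# Size in bytes of one I420 (YCbCr 4:2:0, 8-bit) frame:
# a full-resolution Y plane plus two quarter-resolution chroma planes,
# which is width * height * 3 / 2.
i420_frame_bytes() {
    echo $(( $1 * $2 * 3 / 2 ))
}

echo "720x576 I420 frame: $(i420_frame_bytes 720 576) bytes"
```

For another resolution, the same helper gives a starting point for buffer-size, keeping in mind that the kernel caps the socket receive buffer at net.core.rmem_max.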
Results
udpsrc | IMX6 CPU (%) | Host PC CPU (%) | Latency (ms) |
---|---|---|---|
Optimized | ~45 | ~35 | ~130 |
Not optimized | ~74 | ~35 | ~130 |
Note: Setting the mtu property to 9000 is required only with the optimized udpsrc. The original udpsrc does not have this property.
Different sinks
First, improve the quality of the received stream by raising the maximum amount of UDP traffic that can be buffered on the receive socket (on the IMX6).
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
GStreamer Pipeline
For this example we test with 3 different sinks:
SINK = rtpvrawdepay ! imxg2dvideosink window-width=720 window-height=576 sync=true
SINK = rtpvrawdepay ! fakesink
SINK = fakesink
- iMX6
gst-launch-1.0 udpsrc max-packet-size=9000 buffer-size=622080 port=5001 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=YCbCr-4:2:0,depth=(string)8,width=(string)720, height=(string)576,colorimetry=(string)BT601-5, payload=(int)96, a-framerate=25" ! <SINK> -v
- Host Machine
gst-launch-1.0 v4l2src ! videoconvert ! videoscale ! videorate ! "video/x-raw,width=720,height=576,format=I420,framerate=25/1" ! rtpvrawpay mtu=9000 ! udpsink host=<IMX6-IP> port=5001 sync=false async=false -v
Note: Setting the max-packet-size property to 9000 is required only with the optimized udpsrc. In that case, the mtu of the rtpvrawpay element in the host pipeline should also be set to 9000 to match. The original udpsrc can only receive packets of up to 1500 bytes, so in that case do not change the mtu of the rtpvrawpay element.
Results
udpsrc | imxg2dvideosink (IMX6 CPU %) | depay + fakesink (IMX6 CPU %) | fakesink (IMX6 CPU %) |
---|---|---|---|
Optimized | ~45 | ~41 | ~35 |
Not optimized | ~74 | ~72 | ~65 |
Note: Setting the mtu property to 9000 is required only with the optimized udpsrc. The original udpsrc does not have this property.
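Subtracting the columns of the results table gives a rough per-stage split of the iMX6 CPU load. This is an illustrative sketch only: the inputs are approximate ("~") values, the "udpsrc" figure also includes everything else running in the process, and `cpu_breakdown` is a hypothetical helper name.

```shell
#!/bin/sh
# Split the iMX6 CPU load into per-stage estimates from the table columns:
#   udpsrc = fakesink only
#   depay  = (depay + fakesink) - (fakesink only)
#   sink   = (imxg2dvideosink pipeline) - (depay + fakesink)
# Arguments: label full depay_fakesink fakesink_only
cpu_breakdown() {
    echo "$1: udpsrc=$4 depay=$(( $3 - $4 )) sink=$(( $2 - $3 ))"
}

cpu_breakdown "optimized" 45 41 35
cpu_breakdown "not optimized" 74 72 65
```

The split suggests that most of the CPU time on the iMX6 is spent receiving packets, which is where the optimized udpsrc saves about 30 percentage points.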
Latency
This report includes latency measurements for RAW RTP streaming on both the iMX6 platform and the host machine. The goal is to identify the latency that each element adds to the whole system.
Test Environment
Property | Value |
---|---|
Capture platform | iMX6 Variscite |
Display platform | x86 i7 7th Gen |
Network Bandwidth | Gigabit Switch |
Resolution | 1280x720 |
Frame rate | 30 fps |
Assumptions
These are the assumptions made during the measurements and the final analysis.
- The capture side has a measurable latency, but it cannot be optimized because the real use case uses an RTP camera.
- The iMX6 Variscite may perform differently from the RTP camera.
- The host machine should behave similarly to the Atom platform.
- There is enough bandwidth to stream RAW RTP video.
- The interlatency is the latency of a single buffer between different points in the pipeline.
Capture System
This section presents the latency measured on the capture platform (iMX6), to obtain an approximate value to keep in mind for the whole system.
Camera Latency
This measurement gives the latency of capturing and displaying directly, with no streaming involved.
gst-launch-1.0 imxv4l2videosrc imx-capture-mode=3 ! autovideosink
Element | Latency (ms) |
---|---|
Display | 43 |
RAW RTP payload latency
These measurements show the latency added when streaming RAW RTP video.
- iMX6 Variscite
gst-launch-1.0 imxv4l2videosrc imx-capture-mode=3 ! rtpvrawpay ! udpsink host=10.251.101.15 port=5001 sync=false async=false
These are average interlatency values over 1 minute.
Element | Added Latency (ms) |
---|---|
rtpvrawpay | 22.27757662 |
udpsink(sinkpad) | 22.27757662 |
Capture System Latency
Part | Latency (ms) |
---|---|
Camera | 43 |
RAW RTP process | 22.27757662 |
Capture System | 65.27757662 |
Display Platform
This section presents the latency measured on the display platform (x86), to obtain an approximate value to keep in mind for the whole system.
RAW RTP depayload latency
- x86 i7
gst-launch-1.0 udpsrc port=5001 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, depth=(string)8,width=(string)720, height=(string)576, colorimetry=(string)BT601-5, payload=(int)96, ssrc=(uint)155528026, timestamp-offset=(uint)2270520902, seqnum-offset=(uint)27437,a-framerate=(string)30" ! rtpvrawdepay ! videoconvert ! queue ! xvimagesink sync=false
These are average latency values over 1 minute. Each value is measured at the output side of the element and indicates how much latency each element adds.
Element | Added Latency (us) |
---|---|
rtpvrawdepay | 8.707433315 |
videoconvert | 22.71966235 |
queue | 36.16270062 |
xvimagesink | 36.16270062 |
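The values in this table grow along the pipeline and the last one equals the display platform total, which suggests they are cumulative. Under that assumption (not stated explicitly in the page), the contribution of each individual element can be obtained by subtracting the previous value. `added_latency` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the latency each element adds, given cumulative interlatency
# values on stdin (one "name value" pair per line, in pipeline order).
# prev starts at 0, so the first element keeps its own value.
added_latency() {
    awk '{ printf "%s %.3f\n", $1, $2 - prev; prev = $2 }'
}

# Values in microseconds, copied from the table above.
added_latency <<EOF
rtpvrawdepay 8.707433315
videoconvert 22.71966235
queue 36.16270062
xvimagesink 36.16270062
EOF
```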
Display Platform Latency
Part | Latency (us) |
---|---|
RAW RTP process | 36.16270062 |
Display Platform | 36.16270062 |
System Latency
This measurement is the latency of the whole system, from capture to display.
- iMX6 Variscite
gst-launch-1.0 imxv4l2videosrc imx-capture-mode=3 ! rtpvrawpay ! udpsink host=10.251.101.15 port=5001 sync=false async=false
- x86 i7
gst-launch-1.0 udpsrc port=5001 caps = "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, depth=(string)8,width=(string)720, height=(string)576, colorimetry=(string)BT601-5, payload=(int)96, ssrc=(uint)155528026, timestamp-offset=(uint)2270520902, seqnum-offset=(uint)27437,a-framerate=(string)30" ! rtpvrawdepay ! videoconvert ! queue ! xvimagesink sync=false
Part | Latency (ms) |
---|---|
System | 130 |
Network
This is an indirect measurement that uses the previous latency values to approximate the network latency during streaming.
Equation to get the network latency:
network_latency = whole_latency - (capture_latency + display_latency)
System Part | Latency (ms) |
---|---|
Whole System | 130 |
Capture System | 65.27757662 |
Display System | 0.036162701 |
Network System | 64.686260679 |
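The network figure in the table can be reproduced from the equation above. The sketch below uses the measured values from this page and converts the display latency from microseconds to milliseconds first; `network_latency_ms` is a hypothetical helper name.

```shell
#!/bin/sh
# network_latency = whole_latency - (capture_latency + display_latency)
# All arguments are in milliseconds.
network_latency_ms() {
    awk -v whole="$1" -v capture="$2" -v display="$3" \
        'BEGIN { printf "%.3f\n", whole - (capture + display) }'
}

# The display latency was measured in microseconds; convert it to ms.
DISPLAY_MS=$(awk 'BEGIN { printf "%.9f\n", 36.16270062 / 1000 }')
echo "display: $DISPLAY_MS ms"
echo "network: $(network_latency_ms 130 65.27757662 "$DISPLAY_MS") ms"
```

Since the whole-system latency is itself an approximate value, the result is best read as roughly 65 ms rather than to the full printed precision.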