Streaming RAW Video with GStreamer
IMX6 RAW Streaming Performance
This test measures the latency when capturing from a sensor at 720x576 resolution on an IMX6 platform and streaming the RAW video to an x86 platform.
Glass to glass test
A VAR-SOM-MX6 board was used for this test.
IMX6 Transmitter to X86 Receiver
IMX6 Transmitter Pipeline
gst-launch-1.0 imxv4l2videosrc imx-capture-mode=3 ! rtpvrawpay ! udpsink host=10.251.101.212 port=5001 sync=false async=false -v
x86 Receiver Pipeline
gst-launch-1.0 udpsrc port=5001 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, depth=(string)8, width=(string)720, height=(string)576, colorimetry=(string)BT601-5, payload=(int)96, ssrc=(uint)155528026, timestamp-offset=(uint)2270520902, seqnum-offset=(uint)27437, a-framerate=(string)30" ! rtpvrawdepay ! videoconvert ! queue ! xvimagesink sync=false
Results
Captured time (s) | Received time (s) | Latency (ms) |
---|---|---|
0.788 | 0.346 | 442 |
3.537 | 3.062 | 475 |
4.978 | 4.541 | 437 |
5.867 | 5.377 | 490 |
7.559 | 7.106 | 453 |
8.938 | 8.487 | 451 |
9.913 | 9.428 | 485 |
10.660 | 10.172 | 488 |
11.623 | 11.186 | 437 |
13.328 | 12.886 | 442 |
Captured time (s) | Received time (s) | Latency (ms) |
---|---|---|
2.545 | 2.069 | 476 |
3.107 | 2.632 | 475 |
4.934 | 4.499 | 435 |
6.595 | 6.160 | 435 |
7.332 | 6.811 | 521 |
8.328 | 7.808 | 520 |
9.410 | 8.934 | 476 |
10.059 | 9.584 | 475 |
11.239 | 10.758 | 481 |
11.673 | 11.194 | 479 |
14.619 | 14.186 | 433 |
From these tests, the average latency is approximately 470 ms and the CPU usage on the IMX6 is about 43%. On the x86 platform, the CPU usage is between 5% and 15% while receiving the stream.
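The average can be re-derived from the latency columns of the two tables above. The following is a minimal sketch; the helper name `mean_ms` is hypothetical, and the sample values are copied from the tables.

```shell
# Mean of a list of latency samples (ms), printed with one decimal place.
mean_ms() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR > 0) printf "%.1f\n", sum / NR }'
}

# Latency columns copied from the two result tables above.
run1="442 475 437 490 453 451 485 488 437 442"
run2="476 475 435 435 521 520 476 475 481 479 433"

mean_ms $run1          # 460.0
mean_ms $run2          # 473.3
mean_ms $run1 $run2    # 467.0
```

Taken together, the samples average about 467 ms, which is consistent with the rounded figure of approximately 470 ms.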
x86 Transmitter to IMX6 Receiver
x86 Transmitter Pipeline
- 720x576
gst-launch-1.0 v4l2src ! videoconvert ! videoscale ! videorate ! "video/x-raw,width=720,height=576,format=I420,framerate=25/1" ! rtpvrawpay ! udpsink host=10.251.101.92 port=5001 sync=false async=false -v
- 176x144
gst-launch-1.0 v4l2src ! videoconvert ! videoscale ! "video/x-raw,width=176,height=144,format=UYVY" ! rtpvrawpay ! udpsink host=10.251.101.92 port=5001 sync=false async=false -v
IMX6 Receiver Pipeline
- 720x576
gst-launch-1.0 udpsrc buffer-size=622080 port=5001 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=YCbCr-4:2:0,depth=(string)8,width=(string)720, height=(string)576,colorimetry=(string)BT601-5, payload=(int)96, a-framerate=25" ! rtpvrawdepay ! imxg2dvideosink window-width=720 window-height=576 sync=true -v
- 176x144
gst-launch-1.0 udpsrc buffer-size=38016 port=5001 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=YCbCr-4:2:0,depth=(string)8,width=(string)176, height=(string)144,colorimetry=(string)BT601-5, payload=(int)96, a-framerate=25" ! rtpvrawdepay ! imxg2dvideosink window-width=720 window-height=576 sync=true -v
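The `buffer-size` values used in the receiver pipelines above appear to correspond to the size of one I420 (4:2:0, 12 bits per pixel) frame at each resolution. A minimal sketch of that arithmetic, where the helper name `frame_bytes` is hypothetical:

```shell
# Bytes in one raw video frame.
# Arguments: width, height, bits per pixel (12 for I420, 16 for UYVY).
frame_bytes() {
    echo $(( $1 * $2 * $3 / 8 ))
}

frame_bytes 720 576 12   # 622080
frame_bytes 176 144 12   # 38016
```

If a different resolution or pixel format is streamed, the same calculation gives a starting value for `buffer-size`.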
Results
Captured time (s) | Received time (s) | Latency (ms) |
---|---|---|
51.730 | 51.648 | 82 |
50.050 | 49.926 | 124 |
48.118 | 48.039 | 79 |
46.225 | 46.139 | 86 |
44.549 | 44.420 | 129 |
42.635 | 42.542 | 93 |
40.593 | 40.568 | 25 |
38.142 | 38.056 | 86 |
35.218 | 35.132 | 86 |
09.632 | 09.547 | 85 |
Captured time (s) | Received time (s) | Latency (ms) |
---|---|---|
1.936 | 1.764 | 172 |
2.227 | 2.056 | 171 |
3.727 | 3.648 | 79 |
4.452 | 4.240 | 212 |
5.660 | 5.487 | 173 |
6.123 | 5.994 | 126 |
7.016 | 6.844 | 172 |
8.662 | 8.533 | 129 |
11.107 | 10.938 | 169 |
11.925 | 11.756 | 169 |
Build udpsrc for IMX6
sudo apt-get install gawk wget git-core diffstat unzip texinfo gcc-multilib \
build-essential chrpath socat cpio python python3 python3-pip python3-pexpect \
xz-utils debianutils iputils-ping libsdl1.2-dev xterm
sudo apt-get install autoconf libtool libglib2.0-dev libarchive-dev python-git \
sed cvs subversion coreutils texi2html docbook-utils python-pysqlite2 \
help2man make gcc g++ desktop-file-utils libgl1-mesa-dev libglu1-mesa-dev \
mercurial automake groff curl lzop asciidoc u-boot-tools dos2unix mtd-utils pv \
libncurses5 libncurses5-dev libncursesw5-dev libelf-dev zlib1g-dev
mkdir ~/var-fslc-yocto
cd ~/var-fslc-yocto
repo init -u https://github.com/varigit/variscite-bsp-platform.git -b rocko
repo sync -j4
MACHINE=var-som-mx6 DISTRO=fslc-framebuffer . setup-environment build_fb
bitbake gstreamer1.0-plugins-imx gstreamer1.0-plugins-good
UDP traffic tuning
By default, Linux restricts UDP performance by limiting how much incoming UDP traffic can be buffered on the receive socket. Since we have high bitrate requirements, we need to raise the socket limits. First check the current UDP/IP receive buffer default and limit on your IMX6:
root@var-som-mx6:~# sysctl net.core.rmem_max
net.core.rmem_max = 163840
root@var-som-mx6:~# sysctl net.core.rmem_default
net.core.rmem_default = 163840
That's around 160 kB, while one buffer at 720x576 resolution is around 607 kB. We recommend increasing the limit to at least 8 MB (since we are planning to run several streams in the future, it may need to be increased even further). Run the following commands with root privileges before executing the pipelines:
$ sysctl -w net.core.rmem_max=8388608
$ sysctl -w net.core.wmem_max=8388608
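To put these limits in perspective, the sketch below compares them with the size of one 720x576 I420 frame (622080 bytes, the `buffer-size` used in the pipelines). The helper name `frames_in_buffer` is hypothetical, and the payload rate shown excludes RTP, UDP, and IP overhead.

```shell
# How many whole frames fit in a receive buffer of a given size.
# Arguments: buffer size in bytes, frame size in bytes.
frames_in_buffer() {
    echo $(( $1 / $2 ))
}

frame=622080   # one 720x576 I420 frame, in bytes

echo "default limit: $(frames_in_buffer 163840 "$frame") full frames"
echo "8 MB limit:    $(frames_in_buffer 8388608 "$frame") full frames"

# Raw video payload rate at 25 fps, in bits per second.
echo "payload rate:  $(( frame * 25 * 8 )) bit/s"
```

With the default limit not even one full frame fits in the socket buffer, while the 8 MB limit leaves room for about 13 frames of a stream that carries roughly 124 Mbit/s of raw video.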
Please check that the "quality" issues no longer appear in the received stream when executing the following pipelines:
- Pipeline on Host-PC
gst-launch-1.0 v4l2src ! videoconvert ! videoscale ! videorate ! "video/x-raw,width=720,height=576,format=I420,framerate=25/1" ! rtpvrawpay ! udpsink host=<IMX6-IP> port=5001 sync=false async=false -v
- Pipeline on IMX6
gst-launch-1.0 udpsrc buffer-size=622080 port=5001 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=YCbCr-4:2:0,depth=(string)8,width=(string)720, height=(string)576,colorimetry=(string)BT601-5, payload=(int)96, a-framerate=25" ! rtpvrawdepay ! imxg2dvideosink window-width=720 window-height=576 sync=true -v
After this change, the CPU usage rises to around 74% on the IMX6 for a single pipeline. That is acceptable for now; the next section describes a way to decrease that value.
UDP pipeline tuning
After the previous change, the CPU usage on the IMX6 jumps from 54% to around 74%. This is because the socket limits are now large enough that packets are no longer dropped: the quality no longer degrades, but the CPU load is higher since more packets are received. A possible way out is to increase the MTU of the RTP packets created by rtpvrawpay. Each large packet is then split by the IP layer of the network stack on the PC, and its reassembly on the IMX6 is likewise performed by the IP layer.
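The effect of the `mtu` property on the number of RTP packets the receiver must handle can be estimated roughly. The sketch below is an approximation under stated assumptions, not the exact packetization performed by rtpvrawpay: it assumes 20 header bytes per packet (12-byte RTP header, 2-byte extended sequence number, and a single 6-byte line header as described in RFC 4175), and it assumes 1400 bytes is the payloader's default `mtu`. The helper name `packets_per_frame` is hypothetical.

```shell
# Rough number of RTP packets needed to carry one raw frame.
# Arguments: frame size in bytes, RTP mtu in bytes.
# Assumption: 20 bytes of every packet are headers.
packets_per_frame() {
    payload=$(( $2 - 20 ))
    echo $(( ($1 + payload - 1) / payload ))   # ceiling division
}

frame=622080   # one 720x576 I420 frame, in bytes
fps=25

for mtu in 1400 6000 60000; do
    per_frame=$(packets_per_frame "$frame" "$mtu")
    echo "mtu=$mtu: ~$per_frame packets/frame, ~$(( per_frame * fps )) packets/s"
done
```

Under these assumptions the pipeline goes from roughly 11000 RTP packets per second at mtu=1400 to a few hundred at mtu=60000, which is the reduction in per-packet work that the tuning aims for.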
UDP receiving only on IMX6
Reference pipelines without MTU tuning
- Pipeline on Host-PC
gst-launch-1.0 v4l2src device=/dev/video0 ! videoconvert ! videoscale ! videorate ! "video/x-raw,width=720,height=576,format=I420,framerate=25/1" ! rtpvrawpay ! udpsink host=<IMX6-IP> port=5001 sync=false async=false -v
- Pipeline on IMX6
gst-launch-1.0 udpsrc buffer-size=622080 port=5001 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=YCbCr-4:2:0,depth=(string)8,width=(string)720, height=(string)576,colorimetry=(string)BT601-5, payload=(int)96, a-framerate=25" ! fakesink -v
On the IMX6 the pipeline has a CPU usage of around 66%.
Reference pipelines after MTU tuning
- Pipeline on Host-PC
gst-launch-1.0 v4l2src device=/dev/video0 ! videoconvert ! videoscale ! videorate ! "video/x-raw,width=720,height=576,format=I420,framerate=25/1" ! rtpvrawpay mtu=60000 ! udpsink host=<IMX6-IP> port=5001 sync=false async=false -v
- Pipeline on IMX6
gst-launch-1.0 udpsrc buffer-size=622080 port=5001 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=YCbCr-4:2:0,depth=(string)8,width=(string)720, height=(string)576,colorimetry=(string)BT601-5, payload=(int)96, a-framerate=25" ! fakesink -v
A report generated by Wireshark on the PC shows that the network traffic is still fragmented, now by the IP layer.
On the IMX6 the pipeline has a CPU usage of around 15% to 20%.
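The fragmentation seen on the wire can be estimated from the packet size. The following sketch assumes IPv4 over Ethernet with a 1500-byte link MTU, a 20-byte IPv4 header without options, and an 8-byte UDP header; none of these values are stated in the tests above. The helper name `ip_fragments` is hypothetical.

```shell
# Number of IPv4 fragments needed for one UDP datagram carrying an RTP packet.
# Arguments: RTP packet size in bytes, optional link MTU (default 1500).
ip_fragments() {
    link_mtu=${2:-1500}
    # Fragment data must be a multiple of 8 bytes, after the 20-byte IPv4 header.
    per_fragment=$(( (link_mtu - 20) / 8 * 8 ))
    echo $(( ($1 + 8 + per_fragment - 1) / per_fragment ))
}

ip_fragments 60000   # 41
ip_fragments 6000    # 5
ip_fragments 1400    # 1
```

The amount of data on the wire stays about the same, but the work of splitting and reassembling it moves from the GStreamer elements into the kernel network stack.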
UDP receiving on IMX6 + depayloader
- Pipeline on Host-PC
gst-launch-1.0 v4l2src device=/dev/video0 ! videoconvert ! videoscale ! videorate ! "video/x-raw,width=720,height=576,format=I420,framerate=25/1" ! rtpvrawpay mtu=6000 ! udpsink host=<IMX6-IP> port=5001 sync=false async=false -v
- Pipeline on IMX6
gst-launch-1.0 udpsrc buffer-size=622080 port=5001 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=YCbCr-4:2:0,depth=(string)8,width=(string)720, height=(string)576,colorimetry=(string)BT601-5, payload=(int)96, a-framerate=25" ! rtpvrawdepay ! fakesink -v
On the IMX6 the pipeline has a CPU usage of around 29% to 37%.
UDP receiving on IMX6 + depayloader + display
- Pipeline on Host-PC
gst-launch-1.0 v4l2src device=/dev/video0 ! videoconvert ! videoscale ! videorate ! "video/x-raw,width=720,height=576,format=I420,framerate=25/1" ! rtpvrawpay mtu=6000 ! udpsink host=<IMX6-IP> port=5001 sync=false async=false -v
- Pipeline on IMX6
gst-launch-1.0 udpsrc buffer-size=622080 port=5001 caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=YCbCr-4:2:0,depth=(string)8,width=(string)720, height=(string)576,colorimetry=(string)BT601-5, payload=(int)96, a-framerate=25" ! rtpvrawdepay ! imxg2dvideosink window-width=720 window-height=576 sync=true -v
On the IMX6 the pipeline has a CPU usage of around 32% to 40%.
Captured time (s) | Received time (s) | Latency (ms) |
---|---|---|
12.733 | 12.606 | 127 |
15.038 | 14.907 | 131 |
16.335 | 16.205 | 130 |
17.629 | 17.466 | 163 |
19.414 | 19.282 | 132 |
21.659 | 21.525 | 134 |
28.816 | 28.686 | 143 |
32.593 | 32.450 | 143 |
34.057 | 33.938 | 119 |
35.656 | 35.544 | 112 |
In this preliminary test the average latency is around 133.4 ms. Please confirm these results on your side. Also, most of the CPU-bound work currently appears to be spent in rtpvrawdepay: in our previous tests, just adding the depayloader added around 15% of CPU usage. We recommend examining this element as the next step.
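The per-stage CPU cost can be approximated from the ranges reported for each receiver pipeline (15% to 20% for udpsrc alone, 29% to 37% with the depayloader, 32% to 40% with the display sink). The sketch below compares range midpoints, which is only a rough estimate; the helper name `stage_cost` is hypothetical.

```shell
# Difference between the midpoints of two CPU usage ranges, in percentage points.
# Arguments: low and high of the first range, low and high of the second range.
stage_cost() {
    awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" \
        'BEGIN { printf "%.1f\n", (c + d) / 2 - (a + b) / 2 }'
}

stage_cost 15 20 29 37   # 15.5  (adding rtpvrawdepay)
stage_cost 29 37 32 40   # 3.0   (adding imxg2dvideosink)
```

By this estimate the depayloader accounts for about five times the CPU cost of the display sink.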