IMX6 GStreamer Video Pipeline Tuning
Introduction
This page uses a series of pipelines to show the steps in tuning video capture performance. The goal is 1080p capture and encoding at 30 fps with high image quality and no dropped frames. The tests were performed on an NXP-Freescale Quick Start Board for SCM-i.MX 6DQ running:
- Kernel 3.14.52
- GStreamer 1.8.3
- GStreamer-imx 0.12.2
- OV5640 MIPI camera
The graphs were generated using RidgeRun's GstShark GStreamer pipeline analysis tool.
Video pipeline analysis
Capture only
- Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! perf print-arm-load=true \
! fakesink -v
Pipeline output with fps and arm load values
INFO: Timestamp: 0:02:03.925626013; Bps: 0; fps: 0.0; CPU: 4;
INFO: Timestamp: 0:02:04.958788347; Bps: 93342110; fps: 30.0; CPU: 0;
INFO: Timestamp: 0:02:05.958789680; Bps: 93312000; fps: 30.0; CPU: 0;
INFO: Timestamp: 0:02:06.993728680; Bps: 93251837; fps: 29.98; CPU: 5;
INFO: Timestamp: 0:02:08.025446014; Bps: 93523181; fps: 30.6; CPU: 1;
INFO: Timestamp: 0:02:09.025451014; Bps: 93312000; fps: 30.0; CPU: 0;
INFO: Timestamp: 0:02:10.058777347; Bps: 93342110; fps: 30.0; CPU: 0;
INFO: Timestamp: 0:02:11.058780347; Bps: 93312000; fps: 30.0; CPU: 0;
INFO: Timestamp: 0:02:12.092110681; Bps: 93342110; fps: 30.0; CPU: 0;
INFO: Timestamp: 0:02:13.092114347; Bps: 93312000; fps: 30.0; CPU: 0;
Results
Capture alone holds a steady 1080p at 30 fps, with ARM load at or below 5%.
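The Bps column in the log can be cross-checked against the expected raw frame size. The following is a minimal sketch, assuming the capture delivers a 12-bit-per-pixel YUV 4:2:0 format such as I420 or NV12. The page does not state the pixel format, so this is an assumption, although the numbers agree with it.

```shell
#!/bin/sh
# Expected raw throughput for 1080p30.
# Assumption: 12 bits (1.5 bytes) per pixel, as in I420 or NV12.
WIDTH=1920
HEIGHT=1080
FPS=30

FRAME_BYTES=$((WIDTH * HEIGHT * 3 / 2))   # bytes in one raw frame
EXPECTED_BPS=$((FRAME_BYTES * FPS))       # bytes per second at 30 fps

echo "frame size:   $FRAME_BYTES bytes"
echo "expected Bps: $EXPECTED_BPS"
```

The result is 93312000 bytes per second, the same value perf reports for the intervals measured at exactly 30.0 fps.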
Capture with display
Enable HDMI output:
echo 0 > /sys/devices/soc0/fb.20/graphics/fb2/blank
fbset -fb /dev/fb2 -geometry 1920 1080 1920 1080 32
- Pipeline
FB_MULTI_BUFFER=2 GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
perf print-arm-load=true ! queue ! imxeglvivsink native-display=2 -v
Pipeline output with fps and arm load values
INFO: Timestamp: 0:07:20.618023384; Bps: 93342110; fps: 30.0; CPU: 1;
INFO: Timestamp: 0:07:21.651356384; Bps: 93342110; fps: 30.0; CPU: 0;
INFO: Timestamp: 0:07:22.651358718; Bps: 93312000; fps: 30.0; CPU: 0;
INFO: Timestamp: 0:07:23.651359051; Bps: 93312000; fps: 30.0; CPU: 1;
INFO: Timestamp: 0:07:24.684693051; Bps: 93342110; fps: 30.0; CPU: 0;
INFO: Timestamp: 0:07:25.718023718; Bps: 93342110; fps: 30.0; CPU: 0;
INFO: Timestamp: 0:07:26.751354718; Bps: 93342110; fps: 30.0; CPU: 0;
INFO: Timestamp: 0:07:27.751356052; Bps: 93312000; fps: 30.0; CPU: 0;
Results
As can be seen, the system is perfectly capable of capturing and displaying 1080p at 30 fps.
Video capture and H264 encode using GStreamer IMX element
- Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
imxvpuenc_h264 gop-size=120 idr-interval=120 bitrate=1000 ! perf print-arm-load=true ! fakesink sync=false -v
Pipeline output with fps and arm load values
Timestamp: 0:38:37.161013941; Bps: 131793; fps: 29.29; CPU: 3;
INFO: Timestamp: 0:38:38.179769275; Bps: 121533; fps: 29.46; CPU: 2;
INFO: Timestamp: 0:38:39.200159608; Bps: 120641; fps: 29.41; CPU: 4;
INFO: Timestamp: 0:38:40.209696942; Bps: 120624; fps: 29.73; CPU: 3;
INFO: Timestamp: 0:38:41.228654275; Bps: 177109; fps: 29.46; CPU: 5;
INFO: Timestamp: 0:38:42.247093942; Bps: 105061; fps: 29.46; CPU: 2;
INFO: Timestamp: 0:38:43.265472275; Bps: 105857; fps: 29.46; CPU: 3;
INFO: Timestamp: 0:38:44.274336275; Bps: 104111; fps: 29.76; CPU: 3;
INFO: Timestamp: 0:38:45.299389942; Bps: 175484; fps: 29.26; CPU: 5;
If we increase the bitrate to 5 Mbps:
- Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
imxvpuenc_h264 gop-size=120 idr-interval=120 bitrate=5000 ! perf print-arm-load=true ! fakesink sync=false -v
Pipeline output with fps and arm load values
We get a lower framerate:
INFO: Timestamp: 0:39:26.616703947; Bps: 607716; fps: 28.82; CPU: 1;
INFO: Timestamp: 0:39:27.630372281; Bps: 595790; fps: 28.62; CPU: 1;
INFO: Timestamp: 0:39:28.633810281; Bps: 735368; fps: 28.91; CPU: 1;
INFO: Timestamp: 0:39:29.645368947; Bps: 556646; fps: 28.68; CPU: 0;
INFO: Timestamp: 0:39:30.653676614; Bps: 561556; fps: 28.76; CPU: 3;
INFO: Timestamp: 0:39:31.665020948; Bps: 557725; fps: 28.68; CPU: 2;
INFO: Timestamp: 0:39:32.673943614; Bps: 746591; fps: 28.76; CPU: 6;
INFO: Timestamp: 0:39:33.683462615; Bps: 552546; fps: 28.74; CPU: 0;
INFO: Timestamp: 0:39:34.692886281; Bps: 550875; fps: 28.74; CPU: 0;
And if we let the bitrate go as high as possible by setting bitrate=0 (constant quality):
- Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
imxvpuenc_h264 gop-size=120 idr-interval=120 bitrate=0 ! perf print-arm-load=true ! fakesink sync=false -v
Pipeline output with fps and arm load values
The framerate is even lower:
INFO: Timestamp: 0:40:47.386894957; Bps: 8403929; fps: 18.48; CPU: 4;
INFO: Timestamp: 0:40:48.415934290; Bps: 8418616; fps: 18.46; CPU: 4;
INFO: Timestamp: 0:40:49.441442624; Bps: 8446533; fps: 18.53; CPU: 4;
INFO: Timestamp: 0:40:50.468978957; Bps: 8421129; fps: 18.50; CPU: 4;
INFO: Timestamp: 0:40:51.492867957; Bps: 8436219; fps: 18.57; CPU: 3;
INFO: Timestamp: 0:40:52.520492291; Bps: 8399315; fps: 18.50; CPU: 4;
The per-element processing time, measured with GstShark, shows that the bottleneck is the encoder, so let's try to optimize it.
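Per-element processing times like the ones referred to above can be collected with GstShark's proctime tracer. The following is a sketch only: it assumes GstShark is installed on the target, it reuses the 5 Mbps pipeline from this page, and the RUN variable is a hypothetical switch added here so that the command is printed rather than executed by default.

```shell
#!/bin/sh
# Sketch: trace per-element processing time with GstShark (assumed installed).
PIPELINE='imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! imxvpuenc_h264 gop-size=120 idr-interval=120 bitrate=5000 ! fakesink sync=false'

# proctime reports how long each element takes to process a buffer;
# GST_TRACER:7 makes the tracer records visible in the debug output.
TRACE_ENV='GST_DEBUG=GST_TRACER:7 GST_TRACERS=proctime'

# RUN is a hypothetical switch: print the command unless RUN=1 is set.
if [ "${RUN:-0}" = "1" ]; then
    env $TRACE_ENV gst-launch-1.0 $PIPELINE
else
    echo "env $TRACE_ENV gst-launch-1.0 $PIPELINE"
fi
```

Running the script with RUN=1 on the board launches the pipeline with tracing enabled; the proctime records then appear alongside the regular GStreamer debug messages.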
If we decrease the search range for motion estimation (me-search-range property) from 256x128 (the default value) to 32x32 (the minimum value, set with me-search-range=3), we get an improvement in the framerate when encoding at 5 Mbps:
- Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
imxvpuenc_h264 gop-size=120 idr-interval=120 bitrate=5000 me-search-range=3 ! perf print-arm-load=true ! \
fakesink sync=false -v
Pipeline output with fps and arm load values
INFO: Timestamp: 0:49:53.423869355; Bps: 623678; fps: 29.23; CPU: 1;
INFO: Timestamp: 0:49:54.447292689; Bps: 611960; fps: 29.32; CPU: 0;
INFO: Timestamp: 0:49:55.471428022; Bps: 608625; fps: 29.29; CPU: 0;
INFO: Timestamp: 0:49:56.487081022; Bps: 614823; fps: 29.55; CPU: 1;
INFO: Timestamp: 0:49:57.508946022; Bps: 729395; fps: 29.38; CPU: 0;
INFO: Timestamp: 0:49:58.531116689; Bps: 570587; fps: 29.35; CPU: 0;
INFO: Timestamp: 0:49:59.554301356; Bps: 574347; fps: 29.32; CPU: 1;
INFO: Timestamp: 0:50:00.570718689; Bps: 569524; fps: 29.52; CPU: 1;
This improves the framerate by about 0.6 fps, from roughly 28.7 fps to roughly 29.4 fps.
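Comparing runs is easier with an average than by reading individual log lines. The following is a minimal sketch of a helper that averages the fps field of perf output; avg_fps is a name introduced here for illustration, and the sample lines are the first three entries of each 5 Mbps log on this page.

```shell
#!/bin/sh
# avg_fps: read perf log text on stdin and print the mean of the "fps:" fields.
# Works whether the log has one entry per line or several entries on a line.
avg_fps() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "fps:") {
                v = $(i + 1)
                sub(/;$/, "", v)      # drop the trailing semicolon
                sum += v
                n++
            }
    }
    END { if (n) printf "%.2f\n", sum / n }'
}

# Default search range (first three samples of the bitrate=5000 run).
BEFORE=$(printf '%s\n' \
    'INFO: Timestamp: 0:39:26.616703947; Bps: 607716; fps: 28.82; CPU: 1;' \
    'INFO: Timestamp: 0:39:27.630372281; Bps: 595790; fps: 28.62; CPU: 1;' \
    'INFO: Timestamp: 0:39:28.633810281; Bps: 735368; fps: 28.91; CPU: 1;' | avg_fps)

# me-search-range=3 (first three samples of that run).
AFTER=$(printf '%s\n' \
    'INFO: Timestamp: 0:49:53.423869355; Bps: 623678; fps: 29.23; CPU: 1;' \
    'INFO: Timestamp: 0:49:54.447292689; Bps: 611960; fps: 29.32; CPU: 0;' \
    'INFO: Timestamp: 0:49:55.471428022; Bps: 608625; fps: 29.29; CPU: 0;' | avg_fps)

echo "default search range: $BEFORE fps"
echo "me-search-range=3:    $AFTER fps"
```

With a complete log saved to a file, the same function can be fed with input redirection instead of the inline samples.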
Results
- The bitrate affects the framerate, indicating that the encoder is the bottleneck.
- Decreasing the search range for motion estimation increases the final framerate.
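The Bps values reported by perf are in bytes per second, so they need converting before they can be compared with the encoder's bitrate setting. The following is a small sketch of that conversion; bps_to_mbps is a helper name introduced here, and the inputs are single samples taken from the logs above.

```shell
#!/bin/sh
# bps_to_mbps: convert bytes per second (perf's Bps) to megabits per second.
bps_to_mbps() {
    awk -v bps="$1" 'BEGIN { printf "%.2f\n", bps * 8 / 1000000 }'
}

RATE_1000=$(bps_to_mbps 121533)    # sample from the bitrate=1000 run
RATE_5000=$(bps_to_mbps 607716)    # sample from the bitrate=5000 run
RATE_CQ=$(bps_to_mbps 8403929)     # sample from the bitrate=0 run

echo "bitrate=1000: $RATE_1000 Mbps"
echo "bitrate=5000: $RATE_5000 Mbps"
echo "bitrate=0:    $RATE_CQ Mbps"
```

The two rate-controlled samples land close to their 1 Mbps and 5 Mbps targets, while the constant quality sample is above 67 Mbps, which is consistent with the much lower framerate measured for that run.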
Capture with H264 encode and UDP streaming
- Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
imxvpuenc_h264 gop-size=120 idr-interval=120 bitrate=5000 me-search-range=3 ! \
perf print-arm-load=true ! queue ! mpegtsmux ! queue ! udpsink sync=false -v
Pipeline output with fps and arm load values
INFO: Timestamp: 1:24:37.232159604; Bps: 618132; fps: 29.4; CPU: 11;
INFO: Timestamp: 1:24:38.254458604; Bps: 610727; fps: 29.35; CPU: 9;
INFO: Timestamp: 1:24:39.276701270; Bps: 610045; fps: 29.35; CPU: 10;
INFO: Timestamp: 1:24:40.291613604; Bps: 614996; fps: 29.58; CPU: 9;
INFO: Timestamp: 1:24:41.314474604; Bps: 761279; fps: 29.35; CPU: 10;
INFO: Timestamp: 1:24:42.332315604; Bps: 564174; fps: 29.49; CPU: 8;
INFO: Timestamp: 1:24:43.352075271; Bps: 559038; fps: 29.44; CPU: 8;
INFO: Timestamp: 1:24:44.364384604; Bps: 566705; fps: 29.64; CPU: 8;
We can also see the per-element processing time:
Results
- The streaming part does not affect the framerate, which stays at about 29.4 fps, although ARM load rises to roughly 8-11%.
Reducing the distance between key frames
This is accomplished by using a smaller GOP size and IDR interval (the gop-size and idr-interval properties), in this case 15 frames.
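Expressed in time, the change in GOP size is easier to judge. The following is a short sketch of the arithmetic, assuming the stream runs at the nominal 30 fps; keyframe_interval is a helper name introduced here.

```shell
#!/bin/sh
# keyframe_interval: seconds between key frames for a given GOP size and framerate.
keyframe_interval() {
    awk -v gop="$1" -v fps="$2" 'BEGIN { printf "%.1f\n", gop / fps }'
}

OLD_INTERVAL=$(keyframe_interval 120 30)   # gop-size=120 at 30 fps
NEW_INTERVAL=$(keyframe_interval 15 30)    # gop-size=15 at 30 fps

echo "gop-size=120: one key frame every $OLD_INTERVAL s"
echo "gop-size=15:  one key frame every $NEW_INTERVAL s"
```

The interval drops from 4.0 seconds to 0.5 seconds, which also shortens the time a receiver joining the stream has to wait for the first decodable frame.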
- Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
imxvpuenc_h264 gop-size=15 idr-interval=15 bitrate=5000 me-search-range=3 ! \
perf print-arm-load=true ! queue ! mpegtsmux ! queue ! udpsink sync=false -v
Pipeline output with fps and arm load values
INFO: Timestamp: 0:13:08.406659759; Bps: 610497; fps: 30.0; CPU: 7;
INFO: Timestamp: 0:13:09.406992426; Bps: 624329; fps: 30.0; CPU: 9;
INFO: Timestamp: 0:13:10.409770760; Bps: 654854; fps: 29.94; CPU: 9;
INFO: Timestamp: 0:13:11.410689760; Bps: 597911; fps: 30.0; CPU: 8;
INFO: Timestamp: 0:13:12.417094760; Bps: 600857; fps: 29.82; CPU: 14;
INFO: Timestamp: 0:13:13.451022093; Bps: 623654; fps: 30.0; CPU: 11;
INFO: Timestamp: 0:13:14.453816760; Bps: 620975; fps: 29.94; CPU: 9;
INFO: Timestamp: 0:13:15.456148427; Bps: 637886; fps: 29.94; CPU: 9;
INFO: Timestamp: 0:13:16.456602427; Bps: 608855; fps: 30.0; CPU: 7;
INFO: Timestamp: 0:13:17.488896427; Bps: 652532; fps: 30.3; CPU: 9;
As can be seen, the result is much better: the framerate now holds at about 30 fps.
Results
- Sending key frames more frequently improved the framerate from about 29.4 fps to about 30 fps.