IMX6 GStreamer Video Pipeline Tuning

From RidgeRun Developer Wiki

Introduction

This page uses a series of pipelines to show the steps in tuning video capture performance. The goal is to capture and H.264-encode 1080p video at 30 fps with high image quality and without dropping frames. The tests were performed on an NXP (Freescale) Quick Start Board for SCM-i.MX 6DQ running:

  • Kernel 3.14.52
  • GStreamer 1.8.3
  • GStreamer-imx 0.12.2
  • OV5640 MIPI camera

The graphs were generated using RidgeRun's GstShark GStreamer pipeline analysis tool.

Video pipeline analysis

Capture only

  • Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! perf print-arm-load=true \
! fakesink -v

Pipeline output with fps and ARM load values

INFO:
Timestamp: 0:02:03.925626013; Bps: 0; fps: 0.0; CPU: 4; 
INFO:
Timestamp: 0:02:04.958788347; Bps: 93342110; fps: 30.0; CPU: 0; 
INFO:
Timestamp: 0:02:05.958789680; Bps: 93312000; fps: 30.0; CPU: 0; 
INFO:
Timestamp: 0:02:06.993728680; Bps: 93251837; fps: 29.98; CPU: 5; 
INFO:
Timestamp: 0:02:08.025446014; Bps: 93523181; fps: 30.6; CPU: 1; 
INFO:
Timestamp: 0:02:09.025451014; Bps: 93312000; fps: 30.0; CPU: 0; 
INFO:
Timestamp: 0:02:10.058777347; Bps: 93342110; fps: 30.0; CPU: 0; 
INFO:
Timestamp: 0:02:11.058780347; Bps: 93312000; fps: 30.0; CPU: 0; 
INFO:
Timestamp: 0:02:12.092110681; Bps: 93342110; fps: 30.0; CPU: 0; 
INFO:
Timestamp: 0:02:13.092114347; Bps: 93312000; fps: 30.0; CPU: 0; 
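To compare runs without reading every line, the perf samples can be summarized. The POSIX shell function below is a minimal sketch, not part of the perf element or GstShark; the name perf_summary is hypothetical, and it assumes the "Timestamp: ...; Bps: ...; fps: ...; CPU: ...;" line format shown above. Samples reporting fps 0.0 (the pipeline still starting up, as in the first sample above) are skipped.

```shell
#!/bin/sh
# perf_summary (hypothetical helper): read perf element output on stdin and
# print the sample count, average and minimum fps, and average ARM load.
# Assumes lines shaped like:
#   Timestamp: 0:02:04.958788347; Bps: 93342110; fps: 30.0; CPU: 0;
# "INFO:" lines are ignored; samples reporting fps 0.0 are skipped.
perf_summary() {
  awk -F'; *' '
    /^Timestamp:/ {
      fps = 0; cpu = 0
      for (i = 1; i <= NF; i++) {
        split($i, kv, ": ")
        if (kv[1] == "fps") fps = kv[2] + 0
        if (kv[1] == "CPU") cpu = kv[2] + 0
      }
      if (fps > 0) {
        n++; fsum += fps; csum += cpu
        if (n == 1 || fps < fmin) fmin = fps
      }
    }
    END {
      if (n == 0) { print "no samples"; exit 1 }
      printf "samples=%d avg_fps=%.2f min_fps=%.2f avg_cpu=%.1f\n", n, fsum / n, fmin, csum / n
    }'
}
```

For example, redirect a run's console output to a file with `> perf.log 2>&1` (both streams, since this sketch makes no assumption about which one perf writes to) and then run `perf_summary < perf.log`.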

Results

The capture runs at a steady 1080p@30 fps while the ARM load stays at or below 5%.
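The Bps values above are consistent with the expected frame size: 93,312,000 bytes per second at 30 fps is 3,110,400 bytes per frame, which is exactly 1920 x 1080 x 1.5, the size of a 4:2:0 frame with 12 bits per pixel (for example I420 or NV12). The exact pixel format negotiated by imxv4l2videosrc is an assumption here; the caps printed by `-v` show the real one. A small sketch of the arithmetic, using a hypothetical helper name:

```shell
#!/bin/sh
# raw_bps (hypothetical helper): expected byte rate of uncompressed 4:2:0 video
# (12 bits = 1.5 bytes per pixel, e.g. I420 or NV12).
# Usage: raw_bps WIDTH HEIGHT FPS
raw_bps() {
  # width * height * 3 / 2 bytes per frame, times frames per second
  echo $(( $1 * $2 * 3 / 2 * $3 ))
}

raw_bps 1920 1080 30   # 1080p30 capture, prints 93312000
```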

Capture with display

Enable HDMI output:

echo 0 > /sys/devices/soc0/fb.20/graphics/fb2/blank
fbset -fb /dev/fb2 -geometry 1920 1080 1920 1080 32
  • Pipeline
FB_MULTI_BUFFER=2 GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
perf print-arm-load=true ! queue ! imxeglvivsink native-display=2 -v

Pipeline output with fps and ARM load values

INFO:
Timestamp: 0:07:20.618023384; Bps: 93342110; fps: 30.0; CPU: 1; 
INFO:
Timestamp: 0:07:21.651356384; Bps: 93342110; fps: 30.0; CPU: 0; 
INFO:
Timestamp: 0:07:22.651358718; Bps: 93312000; fps: 30.0; CPU: 0; 
INFO:
Timestamp: 0:07:23.651359051; Bps: 93312000; fps: 30.0; CPU: 1; 
INFO:
Timestamp: 0:07:24.684693051; Bps: 93342110; fps: 30.0; CPU: 0; 
INFO:
Timestamp: 0:07:25.718023718; Bps: 93342110; fps: 30.0; CPU: 0; 
INFO:
Timestamp: 0:07:26.751354718; Bps: 93342110; fps: 30.0; CPU: 0; 
INFO:
Timestamp: 0:07:27.751356052; Bps: 93312000; fps: 30.0; CPU: 0; 

Results

As the output shows, the system can capture and display 1080p@30 fps with an ARM load of at most 1%.

Video capture and H.264 encoding using the gstreamer-imx element

  • Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
imxvpuenc_h264 gop-size=120 idr-interval=120 bitrate=1000 ! perf print-arm-load=true ! fakesink sync=false -v

Pipeline output with fps and ARM load values

Timestamp: 0:38:37.161013941; Bps: 131793; fps: 29.29; CPU: 3; 
INFO:
Timestamp: 0:38:38.179769275; Bps: 121533; fps: 29.46; CPU: 2; 
INFO:
Timestamp: 0:38:39.200159608; Bps: 120641; fps: 29.41; CPU: 4; 
INFO:
Timestamp: 0:38:40.209696942; Bps: 120624; fps: 29.73; CPU: 3; 
INFO:
Timestamp: 0:38:41.228654275; Bps: 177109; fps: 29.46; CPU: 5; 
INFO:
Timestamp: 0:38:42.247093942; Bps: 105061; fps: 29.46; CPU: 2; 
INFO:
Timestamp: 0:38:43.265472275; Bps: 105857; fps: 29.46; CPU: 3; 
INFO:
Timestamp: 0:38:44.274336275; Bps: 104111; fps: 29.76; CPU: 3; 
INFO:
Timestamp: 0:38:45.299389942; Bps: 175484; fps: 29.26; CPU: 5; 
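The imxvpuenc_h264 bitrate property is given in kbit/s (bitrate=5000 corresponds to 5 Mbps in the next test), while perf reports bytes per second, so the target in perf units is bitrate x 1000 / 8, assuming 1 kbit = 1000 bits. For bitrate=1000 that is 125,000 B/s; the nine samples above average roughly 129,000 B/s, so rate control is holding close to the target. A minimal sketch of the conversion, with a hypothetical helper name:

```shell
#!/bin/sh
# kbps_to_Bps (hypothetical helper): convert an encoder bitrate in kbit/s
# into the bytes-per-second unit reported by perf.
# Assumes 1 kbit = 1000 bits.
kbps_to_Bps() {
  echo $(( $1 * 1000 / 8 ))
}

kbps_to_Bps 1000   # bitrate=1000
kbps_to_Bps 5000   # bitrate=5000
```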

If we increase the bitrate to 5 Mbps:

  • Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
imxvpuenc_h264 gop-size=120 idr-interval=120 bitrate=5000 ! perf print-arm-load=true ! fakesink sync=false -v

Pipeline output with fps and ARM load values

The framerate drops to about 28.7 fps:

INFO:
Timestamp: 0:39:26.616703947; Bps: 607716; fps: 28.82; CPU: 1; 
INFO:
Timestamp: 0:39:27.630372281; Bps: 595790; fps: 28.62; CPU: 1; 
INFO:
Timestamp: 0:39:28.633810281; Bps: 735368; fps: 28.91; CPU: 1; 
INFO:
Timestamp: 0:39:29.645368947; Bps: 556646; fps: 28.68; CPU: 0; 
INFO:
Timestamp: 0:39:30.653676614; Bps: 561556; fps: 28.76; CPU: 3; 
INFO:
Timestamp: 0:39:31.665020948; Bps: 557725; fps: 28.68; CPU: 2; 
INFO:
Timestamp: 0:39:32.673943614; Bps: 746591; fps: 28.76; CPU: 6; 
INFO:
Timestamp: 0:39:33.683462615; Bps: 552546; fps: 28.74; CPU: 0; 
INFO:
Timestamp: 0:39:34.692886281; Bps: 550875; fps: 28.74; CPU: 0; 
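In both encoded runs the Bps value jumps roughly every 4 seconds (735,368 at 0:39:28 and 746,591 at 0:39:32 here; 177,109 at 0:38:41 and 175,484 at 0:38:45 at 1 Mbps). That spacing is close to gop-size=120 frames at about 29 fps, so the jumps are likely the periodic IDR frames, which are larger than the frames between them. The sketch below flags such samples; bps_spikes is a hypothetical helper, it assumes the "Timestamp; Bps" field order shown above, and the 1.2 x mean threshold is an arbitrary choice.

```shell
#!/bin/sh
# bps_spikes (hypothetical helper): print the timestamp and Bps of every perf
# sample whose Bps exceeds RATIO times the mean Bps of the whole run.
# Usage: bps_spikes [RATIO] < perf.log   (RATIO defaults to 1.2)
# Assumes the first two fields of each line are "Timestamp: ..." and "Bps: ...".
bps_spikes() {
  awk -F'; *' -v ratio="${1:-1.2}" '
    /^Timestamp:/ {
      split($1, ts, ": ")
      split($2, b, ": ")
      n++
      t[n] = ts[2]
      bps[n] = b[2] + 0
      sum += bps[n]
    }
    END {
      if (n == 0) exit 1
      mean = sum / n
      for (i = 1; i <= n; i++)
        if (bps[i] > ratio * mean) printf "%s %d\n", t[i], bps[i]
    }'
}
```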

And if we set bitrate=0, which disables rate control and encodes at constant quality:

  • Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
imxvpuenc_h264 gop-size=120 idr-interval=120 bitrate=0 ! perf print-arm-load=true ! fakesink sync=false -v

Pipeline output with fps and ARM load values

The framerate drops even further, to about 18.5 fps:

INFO:
Timestamp: 0:40:47.386894957; Bps: 8403929; fps: 18.48; CPU: 4; 
INFO:
Timestamp: 0:40:48.415934290; Bps: 8418616; fps: 18.46; CPU: 4; 
INFO:
Timestamp: 0:40:49.441442624; Bps: 8446533; fps: 18.53; CPU: 4; 
INFO:
Timestamp: 0:40:50.468978957; Bps: 8421129; fps: 18.50; CPU: 4; 
INFO:
Timestamp: 0:40:51.492867957; Bps: 8436219; fps: 18.57; CPU: 3; 
INFO:
Timestamp: 0:40:52.520492291; Bps: 8399315; fps: 18.50; CPU: 4; 
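Dividing Bps by fps gives the average encoded frame size. With bitrate=0 the encoder emits about 8.42 MB/s (roughly 67 Mbit/s) at 18.5 fps, about 455,000 bytes per frame, versus about 21,000 bytes per frame in the 5 Mbps run (about 607,000 B/s at 28.75 fps). The sketch below, a hypothetical helper, computes that figure and the compression ratio, assuming a raw 1080p 4:2:0 frame of 3,110,400 bytes.

```shell
#!/bin/sh
# frame_stats (hypothetical helper): average encoded frame size and compression
# ratio from perf Bps and fps values. Assumes a raw 1080p 4:2:0 frame of
# 1920 * 1080 * 1.5 = 3110400 bytes.
# Usage: frame_stats BPS FPS
frame_stats() {
  awk -v bps="$1" -v fps="$2" 'BEGIN {
    raw = 1920 * 1080 * 1.5          # bytes per uncompressed frame
    frame = bps / fps                # average encoded bytes per frame
    printf "bytes_per_frame=%.0f ratio=%.1f\n", frame, raw / frame
  }'
}

frame_stats 8420957 18.5    # bitrate=0 run, averaged over the samples above
frame_stats 607201 28.75    # 5 Mbps run, averaged
```

Under these assumptions the constant-quality stream compresses each frame only about 6.8:1, against roughly 147:1 at 5 Mbps.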

Looking at the processing time of each element:

Processing Time

As the graph shows, the encoder is the bottleneck, so let's try to optimize it.
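To sustain 30 fps, the slowest stage of the pipeline has to process each frame in roughly 1000 / 30 = 33.3 ms on average; at the 18.5 fps measured with bitrate=0, each frame is taking about 54 ms. A minimal sketch of that calculation, with a hypothetical helper name:

```shell
#!/bin/sh
# frame_budget_ms (hypothetical helper): time available per frame, in
# milliseconds, at a given framerate.
# Usage: frame_budget_ms FPS
frame_budget_ms() {
  awk -v fps="$1" 'BEGIN { printf "%.1f\n", 1000 / fps }'
}

frame_budget_ms 30     # target framerate
frame_budget_ms 18.5   # measured with bitrate=0
```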

If we decrease the search range for motion estimation (the me-search-range property) from 256x128 (the default value) to 32x32 (the minimum value, me-search-range=3), the framerate improves when encoding at 5 Mbps:

  • Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
imxvpuenc_h264 gop-size=120 idr-interval=120 bitrate=5000 me-search-range=3 ! perf print-arm-load=true ! \
fakesink sync=false -v

Pipeline output with fps and ARM load values

INFO:
Timestamp: 0:49:53.423869355; Bps: 623678; fps: 29.23; CPU: 1; 
INFO:
Timestamp: 0:49:54.447292689; Bps: 611960; fps: 29.32; CPU: 0; 
INFO:
Timestamp: 0:49:55.471428022; Bps: 608625; fps: 29.29; CPU: 0; 
INFO:
Timestamp: 0:49:56.487081022; Bps: 614823; fps: 29.55; CPU: 1; 
INFO:
Timestamp: 0:49:57.508946022; Bps: 729395; fps: 29.38; CPU: 0; 
INFO:
Timestamp: 0:49:58.531116689; Bps: 570587; fps: 29.35; CPU: 0; 
INFO:
Timestamp: 0:49:59.554301356; Bps: 574347; fps: 29.32; CPU: 1; 
INFO:
Timestamp: 0:50:00.570718689; Bps: 569524; fps: 29.52; CPU: 1; 

This raises the average framerate from about 28.7 fps to about 29.4 fps, an improvement of roughly 0.6 fps.

Results

  • A higher bitrate lowers the framerate, indicating that the encoder is the bottleneck.
  • Decreasing the search range for motion estimation increases the final framerate.

Capture with H.264 encoding and UDP streaming

  • Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
imxvpuenc_h264 gop-size=120 idr-interval=120 bitrate=5000 me-search-range=3 ! \
perf print-arm-load=true ! queue ! mpegtsmux ! queue ! udpsink sync=false -v

Pipeline output with fps and ARM load values

INFO:
Timestamp: 1:24:37.232159604; Bps: 618132; fps: 29.4; CPU: 11; 
INFO:
Timestamp: 1:24:38.254458604; Bps: 610727; fps: 29.35; CPU: 9; 
INFO:
Timestamp: 1:24:39.276701270; Bps: 610045; fps: 29.35; CPU: 10; 
INFO:
Timestamp: 1:24:40.291613604; Bps: 614996; fps: 29.58; CPU: 9; 
INFO:
Timestamp: 1:24:41.314474604; Bps: 761279; fps: 29.35; CPU: 10; 
INFO:
Timestamp: 1:24:42.332315604; Bps: 564174; fps: 29.49; CPU: 8; 
INFO:
Timestamp: 1:24:43.352075271; Bps: 559038; fps: 29.44; CPU: 8; 
INFO:
Timestamp: 1:24:44.364384604; Bps: 566705; fps: 29.64; CPU: 8; 

We can also look at the per-element processing time:

Processing Time

Results

  • Muxing and UDP streaming do not noticeably affect the framerate, which stays at about 29.4 fps, although the ARM load rises to 8-11%.

Reducing the distance between keyframes

This is done by setting smaller gop-size and idr-interval values, in this case 15 frames.
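At 30 fps, gop-size and idr-interval translate directly into time between keyframes: 120 frames is one keyframe every 4 seconds, while 15 frames is one every half second. A hypothetical helper for the conversion:

```shell
#!/bin/sh
# keyframe_interval_s (hypothetical helper): seconds between keyframes for a
# given GOP length in frames and a given framerate.
# Usage: keyframe_interval_s GOP_FRAMES FPS
keyframe_interval_s() {
  awk -v gop="$1" -v fps="$2" 'BEGIN { printf "%.2f\n", gop / fps }'
}

keyframe_interval_s 120 30   # previous pipelines
keyframe_interval_s 15 30    # this pipeline
```

A shorter interval also lets a receiver that joins the UDP stream mid-way start decoding sooner, at the cost of spending more of the bitrate on keyframes.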

  • Pipeline
GST_DEBUG=WARNING gst-launch-1.0 imxv4l2videosrc imx-capture-mode=5 queue-size=8 ! queue ! \
imxvpuenc_h264 gop-size=15 idr-interval=15 bitrate=5000 me-search-range=3 ! \
perf print-arm-load=true ! queue ! mpegtsmux ! queue ! udpsink sync=false -v

Pipeline output with fps and ARM load values

INFO:
Timestamp: 0:13:08.406659759; Bps: 610497; fps: 30.0; CPU: 7; 
INFO:
Timestamp: 0:13:09.406992426; Bps: 624329; fps: 30.0; CPU: 9; 
INFO:
Timestamp: 0:13:10.409770760; Bps: 654854; fps: 29.94; CPU: 9; 
INFO:
Timestamp: 0:13:11.410689760; Bps: 597911; fps: 30.0; CPU: 8; 
INFO:
Timestamp: 0:13:12.417094760; Bps: 600857; fps: 29.82; CPU: 14; 
INFO:
Timestamp: 0:13:13.451022093; Bps: 623654; fps: 30.0; CPU: 11; 
INFO:
Timestamp: 0:13:14.453816760; Bps: 620975; fps: 29.94; CPU: 9; 
INFO:
Timestamp: 0:13:15.456148427; Bps: 637886; fps: 29.94; CPU: 9; 
INFO:
Timestamp: 0:13:16.456602427; Bps: 608855; fps: 30.0; CPU: 7; 
INFO:
Timestamp: 0:13:17.488896427; Bps: 652532; fps: 30.3; CPU: 9; 

The framerate now averages about 30 fps, up from about 29.4 fps with a 120-frame GOP.

Processing Time

Results

  • Sending keyframes more frequently (every 15 frames instead of 120) raised the framerate to a steady 30 fps.