IMX6 Memory Bandwidth usage

From RidgeRun Developer Wiki
Revision as of 11:58, 12 February 2017 by Spalli

Introduction

This wiki page summarizes the memory bandwidth usage measured on an iMX6-based board. The tests used a Boundary Devices Nitrogen6X board with an iMX6Q SoC, running the RidgeRun Professional SDK Irazu (see the RidgeRun Professional SDK Irazu for BoundaryDevices Nitrogen6X iMX6 board - Release Notes), and a Toshiba TC358743 module for the 1080p60 captures.

Here you'll find the memory bandwidth utilization for some common multimedia tasks. These figures can help you understand what happens behind the scenes when the platform is pushed to its limits.

The first section briefly explains the hardware and software profiling tools that can help the developer understand platform performance. The experimental results follow.

Memory Bandwidth measurement

The iMX6 MMDC (Multi-Mode DDR Controller) has a hardware debugging mechanism that monitors every access driven to the MMDC from channel 0.

This profiling mechanism makes it possible to calculate DDR utilization over a given period of time. It also provides read and write access statistics for that period.

MMDC supports the following profiling counters:

  • MADPSR0 (Total cycles count) - Indicates the total number of cycles in the profiling period (up to 2^32 cycles)
  • MADPSR1 (Busy cycles count) - Indicates the total busy cycles during the profiling period
  • MADPSR2 (Total read accesses count) - Indicates the total read accesses towards MMDC during the profiling period
  • MADPSR3 (Total write accesses count) - Indicates the total write accesses towards MMDC during the profiling period
  • MADPSR4 (Total read bytes count) - Indicates total bytes that were read from MMDC during the profiling period
  • MADPSR5 (Total write bytes count) - Indicates total bytes that were written to MMDC during the profiling period

All the profiling items described above are disabled by default. The profiling mechanism is controlled as follows:

  • MADPCR0[DBG_EN] enables profiling.
  • MADPCR0[PRF_FRZ] freezes the profiling counters. This is useful, for example, when the user wishes to profile DDR accesses for a specific task. To resume profiling, clear MADPCR0[PRF_FRZ].
  • MADPCR0[DBG_RST] clears all profiling counters
  • MADPCR0[CYC_OVF] indicates whether an overflow occurred in the total cycles counter (i.e. the total number of cycles exceeded 2^32). This field can only be cleared by writing '0'.
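The control sequence above can be sketched in C as follows. This is a minimal sketch and not the Freescale tool's code. The register offsets and bit positions are assumptions taken from the Linux mmdc perf driver, so check them against the Reference Manual:

  • MADPCR0 at 0x410, MADPCR1 at 0x414, and MADPSR0-MADPSR5 at 0x418-0x42C
  • DBG_EN is bit 0, DBG_RST bit 1, PRF_FRZ bit 2 and CYC_OVF bit 3

The helper names are hypothetical. The functions take a pointer to the register block, so they can be exercised against plain memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed MMDC profiling register byte offsets (from the Linux mmdc
 * perf driver); verify against the iMX6 Reference Manual. */
#define MMDC_MADPCR0 0x410u /* profiling control 0 */
#define MMDC_MADPCR1 0x414u /* profiling control 1 (AXI ID filter) */
#define MMDC_MADPSR0 0x418u /* first of six consecutive counters */

#define MADPCR0_DBG_EN  (1u << 0) /* enable profiling */
#define MADPCR0_DBG_RST (1u << 1) /* clear all counters */
#define MADPCR0_PRF_FRZ (1u << 2) /* freeze counters */
#define MADPCR0_CYC_OVF (1u << 3) /* total cycles counter overflowed */

struct mmdc_counters {
    uint32_t total_cycles, busy_cycles;     /* MADPSR0, MADPSR1 */
    uint32_t read_accesses, write_accesses; /* MADPSR2, MADPSR3 */
    uint32_t read_bytes, write_bytes;       /* MADPSR4, MADPSR5 */
};

/* Address of the 32-bit register at a byte offset from the base. */
static volatile uint32_t *mmdc_reg(volatile uint32_t *base, size_t offset)
{
    return base + offset / sizeof(uint32_t);
}

/* Clear the counters, program the AXI ID filter, then start profiling. */
void mmdc_profile_start(volatile uint32_t *base, uint32_t madpcr1)
{
    *mmdc_reg(base, MMDC_MADPCR0) = MADPCR0_DBG_RST;
    *mmdc_reg(base, MMDC_MADPCR1) = madpcr1;
    *mmdc_reg(base, MMDC_MADPCR0) = MADPCR0_DBG_EN;
}

/* Freeze the counters so a consistent snapshot can be read. */
void mmdc_profile_freeze(volatile uint32_t *base)
{
    *mmdc_reg(base, MMDC_MADPCR0) |= MADPCR0_PRF_FRZ;
}

/* Non-zero if the window exceeded 2^32 cycles. */
int mmdc_cycles_overflowed(volatile uint32_t *base)
{
    return (*mmdc_reg(base, MMDC_MADPCR0) & MADPCR0_CYC_OVF) != 0;
}

/* Copy MADPSR0..MADPSR5 into a counters structure. */
void mmdc_read_counters(volatile uint32_t *base, struct mmdc_counters *c)
{
    volatile uint32_t *sr = mmdc_reg(base, MMDC_MADPSR0);
    c->total_cycles   = sr[0];
    c->busy_cycles    = sr[1];
    c->read_accesses  = sr[2];
    c->write_accesses = sr[3];
    c->read_bytes     = sr[4];
    c->write_bytes    = sr[5];
}
```

On the target, base would typically come from mmap() of /dev/mem at the MMDC channel 0 base address, assumed here to be 0x021B0000 on the iMX6Q. A profiling run would then proceed as follows:

  • call mmdc_profile_start(base, 0); a MADPCR1 value of 0 keeps the default mask, so all IDs are counted
  • sleep for the measurement window
  • call mmdc_profile_freeze(base)
  • read the counters with mmdc_read_counters()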

Read/write statistics can be collected for a specific AXI ID (16 bits). The following fields in the MADPCR1 register determine which AXI ID or IDs are monitored:

  • PRF_AXI_ID defines which AXI IDs are taken for profiling. The default value is 16'h0.
  • PRF_AXI_ID_MASK defines which bits of PRF_AXI_ID are compared with the AXI ID of each read/write access. "1" means the associated bit is monitored, and "0" means don't care. The default value is 16'h0000, meaning all IDs are monitored. The monitored AXI IDs are therefore those for which every bit of the following expression is 1:
(AXI-ID & PRF_AXI_ID_MASK) Xnor (PRF_AXI_ID & PRF_AXI_ID_MASK)

For example, to monitor AXI IDs from A100 to A1FF, configure:

  • PRF_AXI_ID = A100
  • PRF_AXI_ID_MASK = FF00

Note: See Table 44-10 of the iMX6 Reference Manual for a complete list of AXI IDs. The register information in this section is taken from the iMX6 Reference Manual.
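The XNOR condition reduces to "no masked bit differs". The following C sketch checks that condition for a single AXI ID. The helper names are hypothetical. The MADPCR1 packing assumes that PRF_AXI_ID occupies bits 15:0 and PRF_AXI_ID_MASK bits 31:16, so verify that layout in the Reference Manual before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an access with this AXI ID would be counted: every bit
 * selected by the mask must be equal in axi_id and prf_axi_id, which
 * is when the masked XNOR is all ones. */
bool mmdc_axi_id_monitored(uint16_t axi_id, uint16_t prf_axi_id,
                           uint16_t prf_axi_id_mask)
{
    return ((axi_id ^ prf_axi_id) & prf_axi_id_mask) == 0;
}

/* Build a MADPCR1 value, ASSUMING PRF_AXI_ID is in bits 15:0 and
 * PRF_AXI_ID_MASK in bits 31:16. */
uint32_t mmdc_madpcr1(uint16_t prf_axi_id, uint16_t prf_axi_id_mask)
{
    return ((uint32_t)prf_axi_id_mask << 16) | prf_axi_id;
}
```

With the example configuration (PRF_AXI_ID = A100, PRF_AXI_ID_MASK = FF00), IDs A100 through A1FF match, while A0FF and A200 do not. The packed value 0xFF00A100 is the kind of value MMDC_CUST_MADPCR1 would carry, assuming the tool writes that variable to MADPCR1 unchanged.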

Freescale Profiling tool

Where to get it?

Freescale mmdc profiling tool patch

Usage

For profiling, the Freescale MMDC profiling tool was used. It is a C application that uses the registers described above to collect memory access statistics. Its basic usage is:

mmdc [ARM:IPU1:IPU2:GPU2D:GPU2D1:GPU2D2:GPU3D:GPUVG:VPU:SUM] [...]

Each argument (ARM, IPU1, IPU2, ...) selects the AXI master to be monitored.

The application also uses environment variables to control its behavior:

  • export MMDC_SLEEPTIME to define the profiling duration in seconds (default: 1, i.e. 1 s).
  • export MMDC_LOOPCOUNT to define how many times profiling is repeated (default: 1; -1 means an infinite loop).
  • export MMDC_CUST_MADPCR1 to set a custom MADPCR1 value. It is ignored if a master is specified.

Note 1: More than one master can be given. They are profiled one by one.

Note 2: On the MX6DL, the GPU2D master cannot be profiled; GPU2D1 and GPU2D2 are used instead.

Here is an example of the output of the tool:

MMDC new Profiling results:
***********************
Measure time: 1000ms 
Total cycles count: 528083072
Busy cycles count: 146043821
Read accesses count: 7777907
Write accesses count: 808
Read bytes count: 497745652
Write bytes count: 16424
Avg. Read burst size: 63
Avg. Write burst size: 20
Read: 474.69 MB/s /  Write: 0.02 MB/s  Total: 474.70 MB/s 
Utilization: 21%
Overall Bus Load: 27%
Bytes Access: 63

The meaning of some of the results is as follows:

  • Read, Write, Total: average throughput in MB/s during the measurement window.
  • Utilization: percentage of the data actually transferred, compared with the data that could have been transferred if every busy cycle moved data. It is calculated as follows (16 bytes per cycle corresponds to a 64-bit DDR bus transferring on both clock edges):
(read_bytes + write_bytes) / (busy_cycles * 16) * 100
  • Overall Bus Load: percentage of busy cycles relative to the total number of cycles in the time window. It is calculated as follows:
busy_cycles / total_cycles * 100
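The following C sketch derives the printed figures from the six raw counters. The Utilization and Overall Bus Load formulas are the ones given above. The remaining conventions are inferred from the sample output rather than from the tool's source:

  • MB = 2^20 bytes
  • throughput is divided by the measured time
  • percentages and burst sizes are truncated to integers
  • Bytes Access is total bytes per access

The function and structure names are hypothetical.

```c
#include <stdint.h>

struct mmdc_stats {
    double read_mbps, write_mbps, total_mbps; /* MB = 2^20 bytes */
    unsigned utilization_pct; /* bytes moved vs. 16 bytes per busy cycle */
    unsigned bus_load_pct;    /* busy cycles vs. total cycles */
    unsigned avg_read_burst, avg_write_burst, bytes_per_access;
};

/* Truncating integer quotient, or 0 when the divisor is 0. */
static unsigned safe_div(uint64_t num, uint64_t den)
{
    return den ? (unsigned)(num / den) : 0;
}

void mmdc_compute_stats(uint32_t total_cycles, uint32_t busy_cycles,
                        uint32_t read_accesses, uint32_t write_accesses,
                        uint32_t read_bytes, uint32_t write_bytes,
                        unsigned measure_ms, struct mmdc_stats *s)
{
    const double seconds = measure_ms / 1000.0;
    const uint64_t bytes = (uint64_t)read_bytes + write_bytes;

    s->read_mbps  = read_bytes / 1048576.0 / seconds;
    s->write_mbps = write_bytes / 1048576.0 / seconds;
    s->total_mbps = bytes / 1048576.0 / seconds;

    /* (read_bytes + write_bytes) / (busy_cycles * 16) * 100 */
    s->utilization_pct = safe_div(bytes * 100, (uint64_t)busy_cycles * 16);
    /* busy_cycles / total_cycles * 100 */
    s->bus_load_pct = safe_div((uint64_t)busy_cycles * 100, total_cycles);

    s->avg_read_burst   = safe_div(read_bytes, read_accesses);
    s->avg_write_burst  = safe_div(write_bytes, write_accesses);
    s->bytes_per_access = safe_div(bytes,
                                   (uint64_t)read_accesses + write_accesses);
}
```

With the sample counters over the 1000 ms window, this gives the same figures as the output above:

  • about 474.69 MB/s read and 474.70 MB/s total
  • 21% utilization and 27% bus load
  • average bursts of 63 (read) and 20 (write) bytes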

Results

The following are the memory bandwidth results for the iMX6 board. For these tests, the board drove an HDMI monitor at 1080p60 or 720p60. It captured video from the Toshiba TC358743 HDMI-to-CSI2 module and from a USB camera.

Performance was evaluated in two conditions. First, with the board idle, measuring only the memory bandwidth used for screen refresh. Second, while capturing and displaying video from USB and/or HDMI.

Idle state

To perform this test, follow these steps:

  • Boot the board.
  • Run the mmdc profiling tool.
MMDC_LOOPCOUNT=-1 ./mmdc_profile 

Results

  • For 1080p60 display
MMDC new Profiling results:
***********************
Measure time: 1000ms 
Total cycles count: 528083072
Busy cycles count: 146043821
Read accesses count: 7777907
Write accesses count: 808
Read bytes count: 497745652
Write bytes count: 16424
Avg. Read burst size: 63
Avg. Write burst size: 20
Read: 474.69 MB/s /  Write: 0.02 MB/s  Total: 474.70 MB/s 
Utilization: 21%
Overall Bus Load: 27%
Bytes Access: 63
  • For 720p60 display
MMDC new Profiling results:
***********************
Measure time: 1000ms 
Total cycles count: 528049216
Busy cycles count: 71949619
Read accesses count: 3457197
Write accesses count: 848
Read bytes count: 221228972
Write bytes count: 16976
Avg. Read burst size: 63
Avg. Write burst size: 20
Read: 210.98 MB/s /  Write: 0.02 MB/s  Total: 211.00 MB/s 
Utilization: 19%
Overall Bus Load: 13%
Bytes Access: 63
  • Notes
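  • The display bandwidth can be calculated as Width x Height x 60 fps x 4 bytes/pixel. For a 1080p60 display, this is 1920 x 1080 x 60 x 4 = 497,664,000 bytes/s = 474.609375 MB/s (MB = 2^20 bytes), which matches the measured 474.70 MB/s. For 720p60, it is 210.94 MB/s, against 211.00 MB/s measured.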
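The same arithmetic can be written as a small C helper. It is useful for checking other resolutions, frame rates or frame buffer depths before measuring them. This is a sketch with hypothetical names. It models only the display refresh reads, not capture or scaling traffic.

```c
#include <stdint.h>

/* Bytes per second the display controller reads to refresh a
 * width x height frame buffer fps times per second. */
uint64_t display_bytes_per_sec(uint32_t width, uint32_t height,
                               uint32_t fps, uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * fps * bytes_per_pixel;
}

/* Same figure in MB/s, with MB = 2^20 bytes as in the mmdc output. */
double display_mbps(uint32_t width, uint32_t height,
                    uint32_t fps, uint32_t bytes_per_pixel)
{
    return display_bytes_per_sec(width, height, fps, bytes_per_pixel)
           / 1048576.0;
}
```

display_mbps(1920, 1080, 60, 4) returns 474.609375, and display_mbps(1280, 720, 60, 4) returns 210.9375. Halving either the bytes per pixel (a 16-bit frame buffer) or the frame rate halves the refresh traffic. For example, display_mbps(1920, 1080, 60, 2) returns 237.3046875.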

1080p60 loopback

To perform this test, follow these steps:

  • Start the board
  • Run the 1080p60 loopback pipeline
gst-launch mfw_v4lsrc device=/dev/video0 capture-mode=5 fps-n=60 queue-size=7 ! queue max-size-buffers=3 ! mfw_isink &
  • Run the mmdc profiling tool.
MMDC_LOOPCOUNT=-1 ./mmdc_profile 

Results

MMDC new Profiling results:
***********************
Measure time: 1001ms 
Total cycles count: 528076616
Busy cycles count: 404778945
Read accesses count: 21624860
Write accesses count: 10023434
Read bytes count: 1189680802
Write bytes count: 641347184
Avg. Read burst size: 55
Avg. Write burst size: 63
Read: 1133.43 MB/s /  Write: 611.03 MB/s  Total: 1744.46 MB/s 
Utilization: 28%
Overall Bus Load: 76%
Bytes Access: 57
  • Notes
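  • Compared with the idle 1080p60 display (27% bus load, 474.70 MB/s), the 1080p60 loopback alone increases the bus load by about 50 percentage points (76% vs. 27%). This is about 1270 MB/s (1744.46 - 474.70 MB/s).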
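The note above subtracts the idle 1080p60 measurement from the loopback measurement. The following C sketch performs that comparison from the raw counters, so that rounding in the printed percentages does not accumulate. The structure and function names are hypothetical.

```c
#include <stdint.h>

/* Raw counters from one mmdc profiling window. */
struct mmdc_sample {
    uint32_t total_cycles, busy_cycles;
    uint32_t read_bytes, write_bytes;
    unsigned measure_ms;
};

/* Total throughput in MB/s (MB = 2^20 bytes). */
static double sample_mbps(const struct mmdc_sample *s)
{
    return ((double)s->read_bytes + s->write_bytes) / 1048576.0
           / (s->measure_ms / 1000.0);
}

/* Bus load in percent, without rounding. */
static double sample_load_pct(const struct mmdc_sample *s)
{
    return 100.0 * s->busy_cycles / s->total_cycles;
}

/* Extra throughput and bus load of a use case over the idle baseline
 * measured at the same display resolution. */
void mmdc_delta(const struct mmdc_sample *use_case,
                const struct mmdc_sample *idle,
                double *extra_mbps, double *extra_load_pct)
{
    *extra_mbps = sample_mbps(use_case) - sample_mbps(idle);
    *extra_load_pct = sample_load_pct(use_case) - sample_load_pct(idle);
}
```

With the idle 1080p60 and 1080p60 loopback counters, this gives about 1269.76 MB/s and 49.0 percentage points. For the 720p60-display results, use the 720p60 idle counters as the baseline. Display refresh alone differs by about 264 MB/s between the two resolutions.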

USB 640x480p30 loopback

To perform this test follow this steps:

  • Start the board
  • Run the USB loopback pipeline
gst-launch v4l2src device=/dev/video1 ! capsfilter caps=video/x-raw-yuv,width=640,height=480 ! queue max-size-buffers=3 ! \ 
mfw_isink disp-width=640 disp-height=480 axis-top=0 axis-left=0 &
  • Run the mmd profiling tool.
MMDC_LOOPCOUNT=-1 ./mmdc_profile 

Results

MMDC new Profiling results:
***********************
Measure time: 1000ms 
Total cycles count: 528097848
Busy cycles count: 344161593
Read accesses count: 19909024
Write accesses count: 3039038
Read bytes count: 1116222891
Write bytes count: 113826616
Avg. Read burst size: 56
Avg. Write burst size: 37
Read: 1064.51 MB/s /  Write: 108.55 MB/s  Total: 1173.07 MB/s 
Utilization: 22%
Overall Bus Load: 65%
Bytes Access: 53
  • Notes
  • Relative to the idle 1080p60 baseline, the USB loopback alone adds roughly 37% bus load (65% vs. 27%) and about 700 MB/s (1173.07 - 474.70 MB/s).

HDMI 1080p60 + USB loopback

To perform this test, follow these steps:

  • Start the board
  • Run the HDMI loopback pipeline
gst-launch mfw_v4lsrc device=/dev/video0 capture-mode=5 fps-n=60 queue-size=7 ! queue max-size-buffers=3 ! mfw_isink &
  • Run the USB loopback pipeline
gst-launch v4l2src device=/dev/video1 ! capsfilter caps=video/x-raw-yuv,width=640,height=480 ! queue max-size-buffers=3 ! \
mfw_isink disp-width=640 disp-height=480 axis-top=0 axis-left=0 &
  • Run the mmdc profiling tool.
MMDC_LOOPCOUNT=-1 ./mmdc_profile 

Results

MMDC new Profiling results:
***********************
Measure time: 1001ms 
Total cycles count: 528083568
Busy cycles count: 454432109
Read accesses count: 24926364
Write accesses count: 11549391
Read bytes count: 1275774249
Write bytes count: 655829612
Avg. Read burst size: 51
Avg. Write burst size: 56
Read: 1215.46 MB/s /  Write: 624.82 MB/s  Total: 1840.28 MB/s 
Utilization: 26%
Overall Bus Load: 86%
Bytes Access: 52
  • Notes
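  • Relative to the idle 1080p60 baseline, running both loopbacks adds about 60% bus load (86% vs. 27%). It also adds roughly 1365 MB/s of memory bandwidth (1840.28 - 474.70 MB/s).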

720p60 loopback

To perform this test, follow these steps:

  • Start the board
  • Run the 720p60 loopback pipeline
gst-launch mfw_v4lsrc device=/dev/video0 capture-mode=3 fps-n=60 queue-size=7 ! queue max-size-buffers=3 ! \
mfw_isink  disp-width=1280 disp-height=720 &
  • Run the mmdc profiling tool.
MMDC_LOOPCOUNT=-1 ./mmdc_profile 

Results

  • For 720p60 display
MMDC new Profiling results:
***********************
Measure time: 1000ms 
Total cycles count: 528075928
Busy cycles count: 248097991
Read accesses count: 10464109
Write accesses count: 5272139
Read bytes count: 556042573
Write bytes count: 337293336
Avg. Read burst size: 53
Avg. Write burst size: 63
Read: 530.28 MB/s /  Write: 321.67 MB/s  Total: 851.95 MB/s 
Utilization: 22%
Overall Bus Load: 46%
Bytes Access: 56
  • For 1080p60 display
MMDC new Profiling results:
***********************
Measure time: 1003ms 
Total cycles count: 530015744
Busy cycles count: 341786995
Read accesses count: 17489708
Write accesses count: 6896141
Read bytes count: 1059274006
Write bytes count: 441232712
Avg. Read burst size: 60
Avg. Write burst size: 63
Read: 1007.18 MB/s /  Write: 419.53 MB/s  Total: 1426.71 MB/s 
Utilization: 27%
Overall Bus Load: 64%
Bytes Access: 61
  • Notes
  • When capturing and displaying at 720p60, the capture adds about 33% bus load (46% vs. 13% for the idle 720p60 display). This is about 640 MB/s (851.95 - 211.00 MB/s).
  • When capturing at 720p60 and displaying on a 1080p60 monitor, the capture adds about 37% bus load (64% vs. 27%). This is about 952 MB/s (1426.71 - 474.70 MB/s). The cost is higher because, besides the capture, each frame needs to be resized.

Comments

  • When the system runs out of memory bandwidth, the HDMI monitor may start showing problems, together with the warning message imx-ipuv3 imx-ipuv3.0: IPU Warning - IPU_INT_STAT_10 = 0x00100000. This message indicates a synchronous display error interrupt, raised when an error occurs while accessing a synchronous display. One solution is to scale down the use case. For example, reduce the display resolution, the required frame rate or the frame buffer bits per pixel, or take any other measure that lowers memory bandwidth utilization.