IMX6 Memory Bandwidth usage
Introduction
This wiki page summarizes the results of memory bandwidth usage measurements on an iMX6-based board. For this test, a Boundary Devices Nitrogen6X board with an iMX6Q SoC was used together with the RidgeRun Professional SDK Irazu for the Nitrogen6X, and a Toshiba TC358743 module was used for 1080p60 captures.
Here you'll find the memory bandwidth utilization for some common multimedia tasks, which can help you understand what is happening behind the scenes when the platform is pushed to its limits.
The first section gives a short explanation of the available hardware and software profiling tools that can help the developer understand the platform performance. The experimental results follow.
Memory Bandwidth measurement
The iMX6 MMDC (multi-mode DDR controller) has a hardware debugging mechanism that monitors every access driven to the MMDC from channel 0.
The profiling mechanism makes it possible to calculate the DDR utilization, together with read and write access statistics towards the DDR, over a given period of time.
MMDC supports the following profiling counters:
- MADPSR0 (Total cycles count) - Indicates the total amount of cycles of the profiling period (up to 2^32 cycles)
- MADPSR1 (Busy cycles count) - Indicates the total busy cycles during the profiling period
- MADPSR2 (Total read accesses count) - Indicates the total read accesses towards MMDC during the profiling period
- MADPSR3 (Total write accesses count) - Indicates the total write accesses towards MMDC during the profiling period
- MADPSR4 (Total read bytes count) - Indicates total bytes that were read from MMDC during the profiling period
- MADPSR5 (Total write bytes count) - Indicates total bytes that were written to MMDC during the profiling period
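Since the total cycles counter is limited to 2^32 cycles, the profiling window has an upper bound that depends on the MMDC clock. The following sketch estimates that bound. It assumes the cycle rate seen in the 1000 ms sample report on this page (528083072 cycles per second); a board clocked differently would give a different limit.

```python
# Rough upper bound on the profiling window before MADPSR0 (32-bit) wraps.
# Assumption: the cycle rate is taken from the 1000 ms sample on this page.
COUNTER_BITS = 32
CYCLES_PER_SECOND = 528_083_072  # "Total cycles count" for a 1000 ms window


def max_window_seconds(cycles_per_second, bits=COUNTER_BITS):
    """Longest measurement window before the total-cycles counter overflows."""
    return (2 ** bits) / cycles_per_second


if __name__ == "__main__":
    print(f"max window: {max_window_seconds(CYCLES_PER_SECOND):.2f} s")
```

Under this assumption the window should stay below roughly 8 seconds, otherwise the overflow flag described below would need to be checked.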
All profiling items described above are disabled by default. The following fields control the profiling mechanism:
- MADPCR0[DBG_EN] enables profiling.
- MADPCR0[PRF_FRZ] stops (freezes) the profiling, for example when the user wishes to profile the DDR for one specific task. To resume profiling, MADPCR0[PRF_FRZ] must be cleared.
- MADPCR0[DBG_RST] clears all profiling counters.
- MADPCR0[CYC_OVF] indicates whether an overflow occurred in the total cycles counter (i.e. the total number of cycles exceeded 2^32). This field can only be cleared by writing '0'.
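The control fields above suggest a reset, run, freeze, read sequence. The sketch below models that sequence against a fake register file so that it can be run off-target. The bit positions of DBG_EN, DBG_RST, PRF_FRZ and CYC_OVF are assumptions (this page does not list them) and should be checked against the reference manual; `FakeRegisters` and `profile_window` are hypothetical names used only for illustration, and this is not the Freescale tool's implementation.

```python
import time

# Bit positions within MADPCR0. These values are ASSUMPTIONS (not stated on
# this page); verify them against the i.MX6 reference manual before use.
DBG_EN = 1 << 0
DBG_RST = 1 << 1
PRF_FRZ = 1 << 2
CYC_OVF = 1 << 3

COUNTERS = ["MADPSR0", "MADPSR1", "MADPSR2", "MADPSR3", "MADPSR4", "MADPSR5"]


class FakeRegisters:
    """Hypothetical stand-in for the memory-mapped MMDC registers."""

    def __init__(self):
        self.values = {name: 0 for name in ["MADPCR0"] + COUNTERS}
        self.writes = []  # history of (register, value), for inspection

    def write(self, name, value):
        self.writes.append((name, value))
        self.values[name] = value

    def read(self, name):
        return self.values[name]


def profile_window(regs, seconds):
    """Reset the counters, run for `seconds`, freeze, then read them."""
    regs.write("MADPCR0", DBG_EN | DBG_RST)  # enable and clear all counters
    regs.write("MADPCR0", DBG_EN)            # counters run
    time.sleep(seconds)
    regs.write("MADPCR0", DBG_EN | PRF_FRZ)  # freeze so the values are stable
    overflowed = bool(regs.read("MADPCR0") & CYC_OVF)
    snapshot = {name: regs.read(name) for name in COUNTERS}
    return snapshot, overflowed


if __name__ == "__main__":
    regs = FakeRegisters()
    print(profile_window(regs, 0.01))
```

On real hardware the `read` and `write` methods would have to access the MMDC register block instead of a dictionary.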
Read/write statistics can be collected per specific AXI ID (16 bits). The following fields in the MADPCR1 register determine which AXI ID or AXI IDs to monitor:
- PRF_AXI_ID defines which AXI IDs are taken for profiling. The default value is 16'h0.
- PRF_AXI_ID_MASK defines which bits of PRF_AXI_ID are compared with the AXI ID of a read/write access. "1" means the associated bit is compared and "0" means don't care. The default value is 16'h0000, meaning all IDs are monitored. The AXI IDs to be monitored are therefore determined by the following equation:
(AXI-ID & PRF_AXI_ID_MASK) Xnor (PRF_AXI_ID & PRF_AXI_ID_MASK)
For example, to monitor the AXI IDs from A100 to A1FF, configure the following:
- PRF_AXI_ID = A100
- PRF_AXI_ID_MASK = FF00
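The matching rule can be checked with a few lines of Python. This sketch reads the Xnor in the equation as "every compared bit is equal", which is an interpretation of the text above; the function name `axi_id_monitored` is hypothetical.

```python
def axi_id_monitored(axi_id, prf_axi_id, prf_axi_id_mask):
    """True when every masked bit of axi_id equals the same bit of prf_axi_id."""
    return (axi_id & prf_axi_id_mask) == (prf_axi_id & prf_axi_id_mask)


# Values from the example above: monitor A100 through A1FF.
PRF_AXI_ID = 0xA100
PRF_AXI_ID_MASK = 0xFF00

if __name__ == "__main__":
    for axi_id in (0xA100, 0xA1FF, 0xA200, 0x0100):
        state = axi_id_monitored(axi_id, PRF_AXI_ID, PRF_AXI_ID_MASK)
        print(f"{axi_id:04X}: {'monitored' if state else 'ignored'}")
```

With the default mask of 0 both sides of the comparison are 0, so every ID matches, which agrees with "all IDs are monitored".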
Note: See Table 44-10 of the iMX6 Reference Manual for a complete list of AXI IDs. The information in this section is taken from the iMX6 Reference Manual.
Freescale Profiling tool
Where to get?
Freescale mmdc profiling tool patch
Usage
The Freescale profiling tool was used for the profiling task. It is a C application that uses the set of registers described above to collect memory access statistics. This is the basic usage of the tool:
mmdc [ARM:IPU1:IPU2:GPU2D:GPU2D1:GPU2D2:GPU3D:GPUVG:VPU:SUM] [...]
Here ARM, IPU1, IPU2, ... are the AXI masters to be monitored.
The application also uses environment variables to control its behavior:
- MMDC_SLEEPTIME defines the profiling duration in seconds. The default is 1 (1 s).
- MMDC_LOOPCOUNT defines how many times the profiling is repeated. The default is 1; -1 means an infinite loop.
- MMDC_CUST_MADPCR1 sets a custom MADPCR1 value. It is ignored if a master is given on the command line.
Note 1: More than one master can be given. They are profiled one by one.
Note 2: The MX6DL cannot profile the GPU2D master; GPU2D1 and GPU2D2 are used instead.
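As an illustration of how the arguments and environment variables fit together, the sketch below builds the command line and environment for one run without executing anything. The binary path `./mmdc_profile` is the one used in the test steps on this page; passing several masters as separate arguments is an assumption based on the usage line, and `build_mmdc_command` is a hypothetical helper.

```python
import os

# Master names copied from the usage line above.
KNOWN_MASTERS = {"ARM", "IPU1", "IPU2", "GPU2D", "GPU2D1", "GPU2D2",
                 "GPU3D", "GPUVG", "VPU", "SUM"}


def build_mmdc_command(masters, sleeptime=1, loopcount=1, tool="./mmdc_profile"):
    """Return (argv, env) for one profiling run; nothing is executed here."""
    unknown = [m for m in masters if m not in KNOWN_MASTERS]
    if unknown:
        raise ValueError(f"unknown master(s): {unknown}")
    env = dict(os.environ)
    env["MMDC_SLEEPTIME"] = str(sleeptime)   # seconds per measurement
    env["MMDC_LOOPCOUNT"] = str(loopcount)   # -1 would loop forever
    # Assumption: several masters are passed as separate arguments.
    return [tool, *masters], env


if __name__ == "__main__":
    argv, env = build_mmdc_command(["IPU1", "VPU"], sleeptime=2, loopcount=5)
    print(argv, env["MMDC_SLEEPTIME"], env["MMDC_LOOPCOUNT"])
```

On the target, the returned pair could be handed to `subprocess.run(argv, env=env)`.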
Here is an example of the output of the tool:
MMDC new Profiling results:
***********************
Measure time: 1000ms
Total cycles count: 528083072
Busy cycles count: 146043821
Read accesses count: 7777907
Write accesses count: 808
Read bytes count: 497745652
Write bytes count: 16424
Avg. Read burst size: 63
Avg. Write burst size: 20
Read: 474.69 MB/s / Write: 0.02 MB/s Total: 474.70 MB/s
Utilization: 21%
Overall Bus Load: 27%
Bytes Access: 63
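When several reports need to be compared, the raw counters can be pulled out of the text with regular expressions. This is a minimal sketch; the field labels are taken from the report above, while `parse_mmdc_report` and the dictionary keys are hypothetical names. Only the counters are parsed, since the derived figures can be recomputed from them.

```python
import re

# Shortened copy of the report above; whitespace and line breaks do not matter.
SAMPLE = (
    "MMDC new Profiling results: *********************** "
    "Measure time: 1000ms Total cycles count: 528083072 "
    "Busy cycles count: 146043821 Read accesses count: 7777907 "
    "Write accesses count: 808 Read bytes count: 497745652 "
    "Write bytes count: 16424"
)

# Key in the result dict -> pattern capturing the integer value.
FIELDS = {
    "measure_ms": r"Measure time:\s*(\d+)ms",
    "total_cycles": r"Total cycles count:\s*(\d+)",
    "busy_cycles": r"Busy cycles count:\s*(\d+)",
    "read_accesses": r"Read accesses count:\s*(\d+)",
    "write_accesses": r"Write accesses count:\s*(\d+)",
    "read_bytes": r"Read bytes count:\s*(\d+)",
    "write_bytes": r"Write bytes count:\s*(\d+)",
}


def parse_mmdc_report(text):
    """Extract the raw counters from one report; missing fields are skipped."""
    result = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if match:
            result[key] = int(match.group(1))
    return result


if __name__ == "__main__":
    print(parse_mmdc_report(SAMPLE))
```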
The meaning of some of the results is as follows:
- Read, Write, Total: average throughput in MB/s over the configured time window.
- Utilization: percentage of data transferred compared to the data that could be transferred if all the busy cycles are used to transfer data. It is calculated as follows:
(read_bytes + write_bytes) / (busy_cycles * 16) * 100
- Overall Bus Load: the number of busy cycles compared to the total number of cycles in the time window. It is calculated as follows:
busy_cycles / total_cycles * 100
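The two formulas can be checked against the sample report. The sketch below recomputes the derived figures from the raw counters. It assumes that 1 MB means 2^20 bytes and that the tool truncates percentages to integers; both assumptions are inferred from the sample numbers rather than stated by the tool. `mmdc_metrics` is a hypothetical helper name.

```python
BYTES_PER_BUSY_CYCLE = 16   # the constant used in the utilization formula
MB = 1024 * 1024            # assumption: the tool reports MB as 2**20 bytes


def mmdc_metrics(read_bytes, write_bytes, busy_cycles, total_cycles,
                 measure_ms=1000):
    """Recompute throughput, utilization and bus load from the raw counters."""
    seconds = measure_ms / 1000
    total_bytes = read_bytes + write_bytes
    return {
        "read_mb_s": read_bytes / MB / seconds,
        "write_mb_s": write_bytes / MB / seconds,
        "total_mb_s": total_bytes / MB / seconds,
        "utilization_pct": total_bytes / (busy_cycles * BYTES_PER_BUSY_CYCLE) * 100,
        "bus_load_pct": busy_cycles / total_cycles * 100,
    }


if __name__ == "__main__":
    # Counters from the sample report above.
    m = mmdc_metrics(497745652, 16424, 146043821, 528083072, 1000)
    for key, value in m.items():
        print(f"{key}: {value:.2f}")
```

For the sample counters this gives about 21.3% utilization and 27.7% bus load, which the report shows as 21% and 27%.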
Results
The following are the memory bandwidth utilization results for the iMX6 board. For this test, the board was configured for a 1080p60 and a 720p60 HDMI monitor, with the Toshiba TC358743 HDMI to CSI2 module as well as a USB camera used for capturing.
The performance was evaluated with the board in an idle state (to measure the memory bandwidth used just for the screen refresh) and while capturing and displaying from USB and/or HDMI.
Idle state
To perform this test follow these steps:
- Boot the board.
- Run the MMDC profiling tool.
MMDC_LOOPCOUNT=-1 ./mmdc_profile
Results
- For 1080p60 display
MMDC new Profiling results:
***********************
Measure time: 1000ms
Total cycles count: 528083072
Busy cycles count: 146043821
Read accesses count: 7777907
Write accesses count: 808
Read bytes count: 497745652
Write bytes count: 16424
Avg. Read burst size: 63
Avg. Write burst size: 20
Read: 474.69 MB/s / Write: 0.02 MB/s Total: 474.70 MB/s
Utilization: 21%
Overall Bus Load: 27%
Bytes Access: 63
- For 720p60 display
MMDC new Profiling results:
***********************
Measure time: 1000ms
Total cycles count: 528049216
Busy cycles count: 71949619
Read accesses count: 3457197
Write accesses count: 848
Read bytes count: 221228972
Write bytes count: 16976
Avg. Read burst size: 63
Avg. Write burst size: 20
Read: 210.98 MB/s / Write: 0.02 MB/s Total: 211.00 MB/s
Utilization: 19%
Overall Bus Load: 13%
Bytes Access: 63
- Notes
- The bandwidth needed for the display refresh can be calculated as: Width x Height x 60 fps x 4 bytes/pixel. For the 1080p60 display this gives 474.609375 MB/s, which closely matches the measured read rate.
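The display refresh estimate is easy to reproduce for both display modes. This sketch applies the formula from the note above and assumes 1 MB = 2^20 bytes, which is what makes the 1080p60 figure come out at 474.609375; `scanout_mb_per_s` is a hypothetical helper name.

```python
def scanout_mb_per_s(width, height, fps=60, bytes_per_pixel=4):
    """Bandwidth needed to read one frame buffer `fps` times per second."""
    return width * height * fps * bytes_per_pixel / (1024 * 1024)


if __name__ == "__main__":
    print("1080p60:", scanout_mb_per_s(1920, 1080))  # measured read: 474.69 MB/s
    print(" 720p60:", scanout_mb_per_s(1280, 720))   # measured read: 210.98 MB/s
```

The 720p60 estimate works out to 210.9375 MB/s, again close to the measured idle read rate.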
1080p60 loopback
To perform this test follow these steps:
- Start the board
- Run the 1080p60 loopback pipeline
gst-launch mfw_v4lsrc device=/dev/video0 capture-mode=5 fps-n=60 queue-size=7 ! queue max-size-buffers=3 ! mfw_isink &
- Run the MMDC profiling tool.
MMDC_LOOPCOUNT=-1 ./mmdc_profile
Results
MMDC new Profiling results:
***********************
Measure time: 1001ms
Total cycles count: 528076616
Busy cycles count: 404778945
Read accesses count: 21624860
Write accesses count: 10023434
Read bytes count: 1189680802
Write bytes count: 641347184
Avg. Read burst size: 55
Avg. Write burst size: 63
Read: 1133.43 MB/s / Write: 611.03 MB/s Total: 1744.46 MB/s
Utilization: 28%
Overall Bus Load: 76%
Bytes Access: 57
- Notes
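- Compared with the idle state, the memory bus load increased by about 50 percentage points (from 27% to 76%) with just the 1080p60 loopback. This corresponds to about 1270 MB/s of additional traffic.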
USB 640x480p30 loopback
To perform this test follow these steps:
- Start the board
- Run the USB loopback pipeline
gst-launch v4l2src device=/dev/video1 ! capsfilter caps=video/x-raw-yuv,width=640,height=480 ! queue max-size-buffers=3 ! \
mfw_isink disp-width=640 disp-height=480 axis-top=0 axis-left=0 &
- Run the MMDC profiling tool.
MMDC_LOOPCOUNT=-1 ./mmdc_profile
Results
MMDC new Profiling results:
***********************
Measure time: 1000ms
Total cycles count: 528097848
Busy cycles count: 344161593
Read accesses count: 19909024
Write accesses count: 3039038
Read bytes count: 1116222891
Write bytes count: 113826616
Avg. Read burst size: 56
Avg. Write burst size: 37
Read: 1064.51 MB/s / Write: 108.55 MB/s Total: 1173.07 MB/s
Utilization: 22%
Overall Bus Load: 65%
Bytes Access: 53
- Notes
- The bus load attributable to the USB loopback alone (after subtracting the idle state) is ~37%, which is ~700 MB/s.
HDMI 1080p60 + USB loopback
To perform this test follow these steps:
- Start the board
- Run the HDMI loopback pipeline
gst-launch mfw_v4lsrc device=/dev/video0 capture-mode=5 fps-n=60 queue-size=7 ! queue max-size-buffers=3 ! mfw_isink &
- Run the USB loopback pipeline
gst-launch v4l2src device=/dev/video1 ! capsfilter caps=video/x-raw-yuv,width=640,height=480 ! queue max-size-buffers=3 ! \
mfw_isink disp-width=640 disp-height=480 axis-top=0 axis-left=0 &
- Run the MMDC profiling tool.
MMDC_LOOPCOUNT=-1 ./mmdc_profile
Results
MMDC new Profiling results:
***********************
Measure time: 1001ms
Total cycles count: 528083568
Busy cycles count: 454432109
Read accesses count: 24926364
Write accesses count: 11549391
Read bytes count: 1275774249
Write bytes count: 655829612
Avg. Read burst size: 51
Avg. Write burst size: 56
Read: 1215.46 MB/s / Write: 624.82 MB/s Total: 1840.28 MB/s
Utilization: 26%
Overall Bus Load: 86%
Bytes Access: 52
- Notes
- After subtracting the idle state, the bus load of both loopbacks together is about 60%, with a memory bandwidth of ~1360 MB/s.
720p60 loopback
To perform this test follow these steps:
- Start the board
- Run the 720p60 loopback pipeline
gst-launch mfw_v4lsrc device=/dev/video0 capture-mode=3 fps-n=60 queue-size=7 ! queue max-size-buffers=3 ! \
mfw_isink disp-width=1280 disp-height=720 &
- Run the MMDC profiling tool.
MMDC_LOOPCOUNT=-1 ./mmdc_profile
Results
- For 720p60 display
MMDC new Profiling results:
***********************
Measure time: 1000ms
Total cycles count: 528075928
Busy cycles count: 248097991
Read accesses count: 10464109
Write accesses count: 5272139
Read bytes count: 556042573
Write bytes count: 337293336
Avg. Read burst size: 53
Avg. Write burst size: 63
Read: 530.28 MB/s / Write: 321.67 MB/s Total: 851.95 MB/s
Utilization: 22%
Overall Bus Load: 46%
Bytes Access: 56
- For 1080p60 display
MMDC new Profiling results:
***********************
Measure time: 1003ms
Total cycles count: 530015744
Busy cycles count: 341786995
Read accesses count: 17489708
Write accesses count: 6896141
Read bytes count: 1059274006
Write bytes count: 441232712
Avg. Read burst size: 60
Avg. Write burst size: 63
Read: 1007.18 MB/s / Write: 419.53 MB/s Total: 1426.71 MB/s
Utilization: 27%
Overall Bus Load: 64%
Bytes Access: 61
- Notes
- When capturing and displaying at 720p60, the bus load for the capture (after subtracting the idle state) is about 33%, which is 640 MB/s.
- When capturing at 720p60 and displaying at 1080p60, the bus load for the capture is ~37%, which is 952 MB/s. The load is higher because, besides being captured, each frame also needs to be resized.
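The figures quoted in the notes of each test are the difference between the loaded report and the idle report for the same display mode. The sketch below reproduces them from the totals on this page. It assumes that the tests which do not name a display mode ran on the 1080p60 display, which is consistent with the quoted numbers but not stated explicitly.

```python
# (total MB/s, overall bus load %) copied from the idle reports on this page.
IDLE = {"1080p60": (474.70, 27), "720p60": (211.00, 13)}

# (test, display mode, total MB/s, overall bus load %) from the loaded reports.
# Assumption: tests that do not name a display mode used the 1080p60 display.
RUNS = [
    ("HDMI 1080p60 loopback", "1080p60", 1744.46, 76),
    ("USB 640x480 loopback", "1080p60", 1173.07, 65),
    ("HDMI 1080p60 + USB loopback", "1080p60", 1840.28, 86),
    ("720p60 capture, 720p60 display", "720p60", 851.95, 46),
    ("720p60 capture, 1080p60 display", "1080p60", 1426.71, 64),
]


def increase_over_idle(display, total_mb_s, bus_load):
    """Traffic and bus load added on top of the idle display refresh."""
    idle_mb_s, idle_load = IDLE[display]
    return round(total_mb_s - idle_mb_s, 2), bus_load - idle_load


if __name__ == "__main__":
    for name, display, mb_s, load in RUNS:
        extra_mb_s, extra_load = increase_over_idle(display, mb_s, load)
        print(f"{name:34s} +{extra_mb_s:8.2f} MB/s  +{extra_load}% bus load")
```

Because the tool reports bus load as a whole number, a difference computed this way can be off by one point from the values quoted in the notes.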
Comments
- When the system is running out of memory bandwidth, you can start experiencing problems with the HDMI monitor, accompanied by the warning message imx-ipuv3 imx-ipuv3.0: IPU Warning - IPU_INT_STAT_10 = 0x00100000. This indicates a synchronous display error interrupt, raised as a result of an error during access to a synchronous display. One solution to this issue is to scale back the use case by reducing the display resolution, the required framerate, the frame buffer bits per pixel, or by any other measure that helps to reduce the memory bandwidth utilization.