NXP i.MX95 - Tools - Arm Performance Studio

From RidgeRun Developer Wiki








Arm Performance Studio is a free suite of profiling tools for optimizing the performance of applications running on devices with Arm CPUs and Arm GPUs. Arm Performance Studio is the new name for Arm Mobile Studio. The suite includes:

  • Streamline: Captures a performance profile using the CPU, GPU, and memory system performance data available in the system
  • Performance Advisor: Generates an easy-to-read performance report from an annotated Streamline profile
  • Frame Advisor: Captures the API calls and rendering of a frame and provides comprehensive geometry metrics
  • Mali Offline Compiler: Compiles your shader programs and reports how they will perform on any supported Mali GPU
  • RenderDoc for Arm GPUs: Debugs Vulkan graphics applications

Performance Advisor and Frame Advisor are only available for Android devices.


Info
Find all the documentation related to Arm Performance Studio here.


Notation

This section provides commands to be executed either on the target board or the host computer. For clarity, please note the following notation:

Commands in this color are to be executed on the host computer
Commands in this color are to be executed on the target board

Installation

1. Download the software from this page.

2. Install Arm Performance Studio with the following command:

tar xvzf Arm_Performance_Studio_<version>_linux.tgz

That is all you need to do. The command creates the directory Arm_Performance_Studio_<version>, which contains all the tools and their binaries. You should see the following contents in the directory:

frame_advisor
java
license_terms
mali_offline_compiler
readme.txt
renderdoc_for_arm_gpus
streamline

The tool binaries are located in the following paths:

  • Streamline: <Install directory>/Arm_Performance_Studio_<version>/streamline/Streamline
  • Performance Advisor: <Install directory>/Arm_Performance_Studio_<version>/streamline/Streamline-cli
  • Frame Advisor: <Install directory>/Arm_Performance_Studio_<version>/frame_advisor/FrameAdvisor-gui
  • Mali Offline Compiler: <Install directory>/Arm_Performance_Studio_<version>/mali_offline_compiler/malioc
  • RenderDoc for Arm GPUs: <Install directory>/Arm_Performance_Studio_<version>/renderdoc_for_arm_gpus/bin/renderdoccmd, <Install directory>/Arm_Performance_Studio_<version>/renderdoc_for_arm_gpus/bin/qrenderdoc
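
As a quick sanity check after extracting the archive, the paths listed above can be verified with a small script. This is only a sketch: check_aps_tools is an illustrative name, not part of Arm Performance Studio, and the relative paths are simply the ones listed above, so they may differ in other releases.

```shell
#!/bin/sh
# Check that every tool binary listed above exists and is executable.
# check_aps_tools is an illustrative helper, not part of Arm Performance Studio.
check_aps_tools() {
    root="$1"      # the extracted Arm_Performance_Studio_<version> directory
    status=0
    for rel in \
        streamline/Streamline \
        streamline/Streamline-cli \
        frame_advisor/FrameAdvisor-gui \
        mali_offline_compiler/malioc \
        renderdoc_for_arm_gpus/bin/renderdoccmd \
        renderdoc_for_arm_gpus/bin/qrenderdoc
    do
        if [ -x "$root/$rel" ]; then
            echo "ok       $rel"
        else
            echo "missing  $rel"
            status=1
        fi
    done
    # Non-zero exit status if at least one binary was not found
    return "$status"
}
```

Call it on the host computer with the extracted directory as the only argument, for example check_aps_tools "<Install directory>/Arm_Performance_Studio_<version>".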

Streamline

Target preparation

Some kernel options are required. The kernel configuration is usually available at /proc/config.gz. If the file is not visible, you can create it by loading the configs module with this command:

sudo modprobe configs

The options you need in your kernel configuration are:

  • General Setup -> Profiling Support (CONFIG_PROFILING)
  • General Setup -> Kernel Performance Events And Counters -> Kernel performance events and counters (CONFIG_PERF_EVENTS)
  • General Setup -> Timers subsystem -> High Resolution Timer Support (CONFIG_HIGH_RES_TIMERS)
  • Kernel Features -> Enable hardware performance counter support for perf events (CONFIG_HW_PERF_EVENTS)


Info
If you can't find this option in menuconfig, verify instead that the option Device Drivers -> Performance monitor support -> ARM PMU framework (CONFIG_ARM_PMU) is enabled. CONFIG_HW_PERF_EVENTS is enabled by default, but it depends on CONFIG_ARM_PMU.


You can verify if the options are enabled with this command:

zcat /proc/config.gz | grep <OPTION>

# For example

zcat /proc/config.gz | grep CONFIG_PROFILING

You should see an output like the following:

CONFIG_PROFILING=y
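
To check all four required options in one pass instead of one grep per option, a loop like the following can be run on the target board. It is a minimal sketch: check_streamline_config and its optional path argument are illustrative names, and it only treats an option as enabled when it is built in (=y), which is how the expected output above looks.

```shell
#!/bin/sh
# Report whether each kernel option required by Streamline is enabled.
# Usage: check_streamline_config [path-to-config]   (defaults to /proc/config.gz)
# The function name and the optional argument are illustrative.
check_streamline_config() {
    config="${1:-/proc/config.gz}"
    # Compressed configs are read with zcat, plain-text ones with cat
    case "$config" in
        *.gz) reader="zcat" ;;
        *)    reader="cat" ;;
    esac
    missing=0
    for opt in CONFIG_PROFILING CONFIG_PERF_EVENTS \
               CONFIG_HIGH_RES_TIMERS CONFIG_HW_PERF_EVENTS; do
        if $reader "$config" | grep -q "^${opt}=y\$"; then
            echo "$opt: enabled"
        else
            echo "$opt: MISSING"
            missing=1
        fi
    done
    # Non-zero exit status if at least one option is not enabled
    return "$missing"
}
```

Running check_streamline_config without arguments reads /proc/config.gz. Passing a path lets you check a plain-text .config file from a kernel build directory instead.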

If any option is not enabled, activate it using menuconfig. If you are using Yocto, follow these steps:

# Within your Yocto environment
bitbake -c menuconfig virtual/kernel

You should see a user interface like the following:

Fig. 1. First screen using menuconfig.

Navigate through the menu options until you find the one you want to enable. For example, to enable CONFIG_PROFILING, go to General Setup, locate Profiling Support, and select the option by marking it with Y. Ensure the option is marked with an asterisk (*). Once you're finished, choose Save and then Exit.

Fig. 2. Activation of the kernel option CONFIG_PROFILING.

You can now rebuild your kernel and update the image on your target board. Use the following command to build the kernel:

bitbake virtual/kernel

Install gatord

Arm Streamline requires a target agent running on the Arm Linux target. This agent is gator, and gatord is the daemon you need to run on your target board.

The pre-built gatord binaries are available in the Arm Performance Studio installation under the path <Install directory>/streamline/bin/linux. Copy the gatord binary to your target board and make sure it has execute permission. Use this command to add the permission:

chmod +x gatord

Now you can execute the binary with the following command:

./gatord -a


Info
The -a (or --allow-command) flag allows a command to be run on the target from Streamline on the host computer.


(Optional) By default, gatord uses port 8080, but you can specify a different port by using the -p flag.

./gatord -a -p 5050
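
If the board is reachable over SSH, the copy, chmod, and start steps above can be driven from the host computer. The following is a sketch under assumptions that the steps above do not state: the board runs an SSH server, the login (for example root@192.168.100.79) is valid, and gatord is copied to that user's home directory. deploy_gatord and the RUN variable are illustrative names.

```shell
#!/bin/sh
# Copy gatord to the board over SSH and start it (run on the host computer).
# Assumptions: the board runs an SSH server and $1 is a valid login.
# deploy_gatord and RUN are illustrative names, not part of any Arm tool.
deploy_gatord() {
    target="$1"            # user@host of the target board
    gatord="$2"            # local path to the gatord binary
    port="${3:-8080}"      # gatord default port, as stated above
    # Set RUN=echo to print the commands instead of executing them.
    ${RUN:-} scp "$gatord" "$target:gatord"
    ${RUN:-} ssh "$target" "chmod +x gatord && ./gatord -a -p $port"
}
```

With RUN=echo set, deploy_gatord root@192.168.100.79 ./gatord 5050 only prints the scp and ssh commands, which is a safe way to review them first. Without RUN, the ssh session stays open while gatord runs in the foreground.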

Capture a Streamline profile

You are now ready to capture CPU and GPU metrics with Streamline. Run these commands to start Streamline on the host computer:

cd <Install directory>/streamline/
./Streamline

The Streamline window opens and should look similar to the following picture.


Fig. 3. Out of the box view of Streamline.


In the Start tab, under Select device type, select TCP. Just below, under Select Target, check the box Enter target details: and enter the target IP followed by the port used by gatord, in the form <Target IP>:<Port>.


Attention
The target board and host computer must be on the same network.


You can check the target IP address using this command:

ifconfig

And the result should be similar to this:

...

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.100.79  netmask 255.255.255.0  broadcast 192.168.100.255
        inet6 fe80::622:4478:31d9:c881  prefixlen 64  scopeid 0x20<link>
        ether f8:dc:7a:e4:45:e2  txqueuelen 1000  (Ethernet)
        RX packets 35053  bytes 4051181 (3.8 MiB)
        RX errors 0  dropped 340  overruns 0  frame 0
        TX packets 19008  bytes 254895043 (243.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

...

If gatord uses its default port, enter 192.168.100.79:8080 for this example. You can optionally specify the command to run on the board in the Configure application section. If no command is specified, you can run the application directly on the target.
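
The <Target IP>:<Port> string can also be built directly from the ifconfig output on the target board. This is a minimal sketch: streamline_targets is an illustrative name, and the awk filter assumes the ifconfig layout shown above, where the address is the second field of the inet line. Older ifconfig versions that print inet addr:<IP> are not handled.

```shell
#!/bin/sh
# Read ifconfig output on stdin and print "<IPv4>:<port>" for every
# non-loopback IPv4 address. streamline_targets is an illustrative name.
streamline_targets() {
    port="${1:-8080}"      # gatord default port unless another one is given
    awk -v port="$port" \
        '$1 == "inet" && $2 != "127.0.0.1" { print $2 ":" port }'
}
```

On the target board, ifconfig | streamline_targets prints one candidate per network interface. Pass the port as an argument if gatord was started with -p, for example ifconfig | streamline_targets 5050.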


Fig. 4. TCP capture configuration.


Streamline gathers numerous events from the target. Before initiating the capture, choose the appropriate template to select and organize the necessary events effectively. This is shown in the next picture.


Fig. 5. Counters template selection.


Click the Select counters button at the bottom left. In the window displayed, click the button at the upper right corner, marked in the figure above. Select the template that matches what you want to profile, either the CPU or the GPU workload. For the i.MX95 GPU, select Mali-G310-Valhall and click Save. You can now start the capture by clicking the Start capture button at the bottom right corner, marked in the next picture.


Fig. 6. Start capture.


When the capture starts, you should see a result similar to the following picture, depending on the template selected.


Fig. 7. Capture screen.


The metric graphs are displayed in the central part of the window, while the lower part shows the CPU usage of the processes running on the target. The capture control buttons are located in the upper left corner, marked in the picture below. From left to right, these buttons are:

  • Save the current capture and restart
  • Restart capturing, discarding the current capture
  • Stop capture from target and analyze collected data
  • Stop capture from target and discard collected data

Mali Offline Compiler

The Mali Offline Compiler is a command-line tool designed to compile shaders and kernels from OpenGL ES, Vulkan, and OpenCL. It generates performance reports that offer clear insights into the expected performance and potential bottlenecks of your shader programs. As a static analysis tool, the Mali Offline Compiler doesn't require a connected device. You can produce reports for any supported Mali GPU, enabling you to anticipate shader performance on devices you may not have on hand. Check the available GPU list here.

To use the tool, run these commands:

cd <Install directory>/mali_offline_compiler

./malioc [--opengles|--vulkan|--opencl] -c Mali-G310 <input file>

Depending on the shader type, you must specify whether it is an OpenGL ES, Vulkan, or OpenCL shader. The tool also comes with some samples located in <Install directory>/mali_offline_compiler/samples/, for example:

./malioc --opengles -c Mali-G310 samples/opengles/shader.comp

You should see an output like the following:

Mali Offline Compiler v8.4.1 (Build 215478)
Copyright (c) 2007-2024 Arm Limited. All rights reserved.

Configuration
=============

Hardware: Mali-G310 r0p0
Architecture: Valhall
Driver: r48p0-00rel0
Shader type: OpenGL ES Compute

Compiler messages
=================

INFO: Mali-G310 reports assume a 32 FMA/cycle and 4 Texture op/cycle core configuration.
      Other core configurations might have 16-64 FMA/cycle and 2-8 Texture op/cycle.

Main shader
===========

Work registers: 64 (100% used at 50% occupancy)
Uniform registers: 54 (42% used)
Shared storage: 0 bytes
Stack spilling: false
16-bit arithmetic: 0%

                                A      LS       T    Bound
Total instruction cycles:   13.95    2.00    0.00        A
Shortest path cycles:        0.08    0.00    0.00        A
Longest path cycles:        13.95    2.00    0.00        A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: true
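
When comparing several shaders, it can help to reduce a report like the one above to a few key values. The following sketch reads a report on stdin and prints the work register usage, the stack spilling flag, and the unit that bounds the total instruction cycles. malioc_summary is an illustrative name, and the parsing assumes the text layout shown above, which may change between malioc versions.

```shell
#!/bin/sh
# Summarize a malioc text report read from stdin.
# malioc_summary is an illustrative helper; it assumes the report layout
# shown above ("key: value" lines and a Bound unit as the last column).
malioc_summary() {
    awk -F': *' '
        /^Work registers:/           { print "work_registers=" $2 }
        /^Stack spilling:/           { print "stack_spilling=" $2 }
        /^Total instruction cycles:/ {
            # The last whitespace-separated column is the bounding unit
            n = split($2, cols, " +")
            print "bound=" cols[n]
        }
    '
}
```

For example, ./malioc --opengles -c Mali-G310 samples/opengles/shader.comp | malioc_summary would reduce the report above to three lines: the work register usage (64 at 50% occupancy), stack_spilling=false, and bound=A.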


Info
Watch this Arm Training video for a detailed explanation on how to interpret this information.



