NXP i.MX95 - Tools - Arm Performance Studio
Arm Performance Studio is a free suite of profiling tools for optimizing the performance of applications running on devices with Arm CPUs and Arm GPUs. Arm Performance Studio is the new name for Arm Mobile Studio. The suite includes the following tools:
- Streamline: Capture a performance profile, using all the CPU, GPU, and memory system performance data in the system
- Performance Advisor: Generates an easy-to-read performance report from an annotated Streamline profile
- Frame Advisor: Capture the API calls and rendering and get comprehensive geometry metrics
- Mali Offline Compiler: Compile your shader programs and check how they will perform on any supported Mali GPU
- RenderDoc for Arm GPUs: Tool for debugging Vulkan graphics applications
Performance Advisor and Frame Advisor are only available for Android devices.
Notation
This section provides commands to be executed either on the target board or the host computer. For clarity, please note the following notation:
Commands in this color are to be executed on the host computer
Commands in this color are to be executed on the target board
Installation
1. Download the software from this page.
2. Install Arm Performance Studio with the following command:
tar xvzf Arm_Performance_Studio_<version>_linux.tgz
No further steps are needed. The command creates the directory Arm_Performance_Studio_<version>, which already contains all the tools and their binaries. You should see the following contents in the directory:
frame_advisor java license_terms mali_offline_compiler readme.txt renderdoc_for_arm_gpus streamline
The tool binaries are located at the following paths:
- Streamline:
<Install directory>/Arm_Performance_Studio_<version>/streamline/Streamline
- Performance Advisor:
<Install directory>/Arm_Performance_Studio_<version>/streamline/Streamline-cli
- Frame Advisor:
<Install directory>/Arm_Performance_Studio_<version>/frame_advisor/FrameAdvisor-gui
- Mali Offline Compiler:
<Install directory>/Arm_Performance_Studio_<version>/mali_offline_compiler/malioc
- RenderDoc for Arm GPUs:
<Install directory>/Arm_Performance_Studio_<version>/renderdoc_for_arm_gpus/bin/renderdoccmd
<Install directory>/Arm_Performance_Studio_<version>/renderdoc_for_arm_gpus/bin/qrenderdoc
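As a quick sanity check after extraction, the paths listed above can be tested in one pass. The following is only a sketch: the function name check_studio_install is made up for this example, and it assumes the directory layout listed above, which may differ between releases.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Arm Performance Studio): report which
# of the tool binaries listed above exist and are executable under the given
# Arm_Performance_Studio_<version> directory.
check_studio_install() {
    studio_dir=$1
    status=0
    for rel in streamline/Streamline \
               streamline/Streamline-cli \
               frame_advisor/FrameAdvisor-gui \
               mali_offline_compiler/malioc \
               renderdoc_for_arm_gpus/bin/renderdoccmd \
               renderdoc_for_arm_gpus/bin/qrenderdoc; do
        if [ -x "$studio_dir/$rel" ]; then
            echo "found   $rel"
        else
            echo "missing $rel"
            status=1
        fi
    done
    # Non-zero return value when at least one binary is missing.
    return "$status"
}

# Usage on the host computer:
#   check_studio_install "<Install directory>/Arm_Performance_Studio_<version>"
```

The function returns a non-zero status when any binary is missing, so it can also be used in a script condition.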
Streamline
Target preparation
Some kernel options are required. The kernel configuration is usually available at /proc/config.gz. If the file is not present, you can create it by loading the configs module with this command:
sudo modprobe configs
The options you need in your kernel configuration are:
- General Setup -> Profiling Support (CONFIG_PROFILING)
- General Setup -> Kernel Performance Events And Counters -> Kernel performance events and counters (CONFIG_PERF_EVENTS)
- General Setup -> Timers subsystem -> High Resolution Timer Support (CONFIG_HIGH_RES_TIMERS)
- Kernel Features -> Enable hardware performance counter support for perf events (CONFIG_HW_PERF_EVENTS).
You can verify if the options are enabled with this command:
zcat /proc/config.gz | grep <OPTION> # For example zcat /proc/config.gz | grep CONFIG_PROFILING
You should see an output like the following:
CONFIG_PROFILING=y
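To check all four options in one pass instead of running grep once per option, a small helper along these lines may be convenient. It is a sketch, not part of Arm Performance Studio: the function name check_streamline_kconfig is made up for this example, and it reads the configuration text from standard input so that it works with zcat /proc/config.gz as well as with a saved .config file.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel configuration on stdin and report
# whether each option required by Streamline is set to "y".
check_streamline_kconfig() {
    config_text=$(cat)
    missing=0
    for opt in CONFIG_PROFILING CONFIG_PERF_EVENTS \
               CONFIG_HIGH_RES_TIMERS CONFIG_HW_PERF_EVENTS; do
        # -x matches the whole line, so "# CONFIG_X is not set" comments and
        # longer option names that share the same prefix are not counted.
        if printf '%s\n' "$config_text" | grep -qx "${opt}=y"; then
            echo "OK      $opt"
        else
            echo "MISSING $opt"
            missing=1
        fi
    done
    # Non-zero return value when at least one option is not enabled.
    return "$missing"
}

# Usage on the target board:
#   zcat /proc/config.gz | check_streamline_kconfig
```

Any option reported as MISSING needs to be enabled in the kernel configuration before Streamline can collect the corresponding data.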
If any option is missing or not set to y, enable it using menuconfig. If you are using Yocto, follow these steps:
# Within your Yocto environment
bitbake -c menuconfig virtual/kernel
You should see a user interface like the following:
Navigate through the menu options until you find the one you want to enable. For example, to enable CONFIG_PROFILING, go to General Setup, locate Profiling Support, and select it by pressing Y. Ensure the option is marked with an asterisk (*). Once you're finished, choose Save and then Exit.
You can now rebuild your kernel and update the image on your target board. Use the following command to build the kernel:
bitbake virtual/kernel
Install gatord
A target agent must run on the Arm Linux target for Arm Streamline to operate. This agent is gator, and gatord is the daemon you need to run on your target board.
The pre-built gatord binaries are available in Arm Performance Studio under the path <Install directory>/streamline/bin/linux. Copy the gatord binary to your target board and make sure it has execute permission. Use this command to add the permission:
chmod +x gatord
Now you can execute the binary with the following command:
./gatord -a
(Optional) By default, gatord uses port 8080, but you can specify a different port with the -p flag:
./gatord -a -p 5050
Capture a Streamline profile
You are now ready to capture CPU and GPU metrics with Streamline. To start Streamline on the host computer, execute these commands:
cd <Install directory>/streamline/
./Streamline
The Streamline window opens and should look similar to the following picture.
In the Start tab, under Select device type, select TCP. Just below, under Select Target, check the box Enter target details: and enter the target IP address followed by the port used by gatord, in the form <Target IP>:<Port>.
You can check the target IP address using this command:
ifconfig
And the result should be similar to this:
...
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.100.79  netmask 255.255.255.0  broadcast 192.168.100.255
        inet6 fe80::622:4478:31d9:c881  prefixlen 64  scopeid 0x20<link>
        ether f8:dc:7a:e4:45:e2  txqueuelen 1000  (Ethernet)
        RX packets 35053  bytes 4051181 (3.8 MiB)
        RX errors 0  dropped 340  overruns 0  frame 0
        TX packets 19008  bytes 254895043 (243.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
...
If gatord is using its default port, for this example you would enter 192.168.100.79:8080. Optionally, you can specify the command to execute on the board in the Configure application section. If no command is specified, you can launch the application directly on the target.
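The <Target IP>:<Port> string can also be assembled on the board rather than read off by eye. The sketch below assumes the ifconfig output layout shown above (an "inet <address>" line per interface); other ifconfig implementations may print the address differently. The function name gator_target_address is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read `ifconfig` output on stdin and print
# "<Target IP>:<Port>" for the first non-loopback IPv4 address found.
gator_target_address() {
    port=${1:-8080}   # 8080 is the gatord default port mentioned above
    awk -v port="$port" '
        $1 == "inet" && $2 != "127.0.0.1" {
            print $2 ":" port
            exit
        }'
}

# Usage on the target board:
#   ifconfig | gator_target_address        # gatord started with the default port
#   ifconfig | gator_target_address 5050   # gatord started with -p 5050
```

On a board with several network interfaces, the first address printed may not be the one reachable from the host, so check the result against your network setup.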
Streamline gathers numerous events from the target. Before initiating the capture, choose the appropriate template to select and organize the necessary events effectively. This is shown in the next picture.
Click the Select counters button at the bottom left. In the window displayed, click the button at the upper right corner, marked in the figure above. Select the template that matches the workload you want to profile, CPU or GPU. For the i.MX95 GPU, select Mali-G310-Valhall, then click Save. You can now start the capture by clicking the Start capture button at the bottom right corner, marked in the next picture.
When the capture starts, you should see a result similar to the following picture, depending on the template selected.
The metric graphs are displayed in the central part of the window, and the CPU usage of the processes running on the target is displayed in the lower part. The capture control buttons are located in the upper left corner, marked in the picture below. From left to right, these buttons are:
- Save the current capture and restart
- Restart capturing, discarding the current capture
- Stop capture from target and analyze collected data
- Stop capture from target and discard collected data
Mali Offline Compiler
The Mali Offline Compiler is a command-line tool designed to compile shaders and kernels from OpenGL ES, Vulkan, and OpenCL. It generates performance reports that offer clear insights into the expected performance and potential bottlenecks of your shader programs. As a static analysis tool, the Mali Offline Compiler does not require a connected device. You can produce reports for any supported Mali GPU, enabling you to anticipate shader performance on devices you may not have on hand. Check the available GPU list here.
To use the tool, execute these commands:
cd <Install directory>/mali_offline_compiler
./malioc [--opengles|--vulkan|--opencl] -c Mali-G310 <input file>
Depending on the shader type, specify whether it is OpenGL ES, Vulkan, or OpenCL. The tool also ships with samples located in <Install directory>/mali_offline_compiler/samples/, for example:
./malioc --opengles -c Mali-G310 samples/opengles/shader.comp
You should see an output like the following:
Mali Offline Compiler v8.4.1 (Build 215478)
Copyright (c) 2007-2024 Arm Limited. All rights reserved.

Configuration
=============

Hardware: Mali-G310 r0p0
Architecture: Valhall
Driver: r48p0-00rel0
Shader type: OpenGL ES Compute

Compiler messages
=================

INFO: Mali-G310 reports assume a 32 FMA/cycle and 4 Texture op/cycle core configuration. Other core configurations might have 16-64 FMA/cycle and 2-8 Texture op/cycle.

Main shader
===========

Work registers: 64 (100% used at 50% occupancy)
Uniform registers: 54 (42% used)
Shared storage: 0 bytes
Stack spilling: false
16-bit arithmetic: 0%

                                A      LS       T    Bound
Total instruction cycles:   13.95    2.00    0.00        A
Shortest path cycles:        0.08    0.00    0.00        A
Longest path cycles:        13.95    2.00    0.00        A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: true
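When comparing several shaders, it can help to reduce each report to its "Total instruction cycles" row. The sketch below parses the text report shown above and assumes the Mali-G310 column set (A, LS, T, Bound); reports for other GPUs may contain different columns, in which case the field positions would need adjusting. The function name malioc_total_cycles is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read a malioc text report on stdin and print the cycle
# counts and the bound unit from the "Total instruction cycles" row.
malioc_total_cycles() {
    awk '/^Total instruction cycles:/ {
        # Fields 1-3 are the row label; fields 4-6 are the A, LS and T cycle
        # counts, and field 7 is the unit the shader is bound by.
        printf "A=%s LS=%s T=%s bound=%s\n", $4, $5, $6, $7
    }'
}

# Usage on the host computer, from <Install directory>/mali_offline_compiler:
#   ./malioc --opengles -c Mali-G310 samples/opengles/shader.comp | malioc_total_cycles
```

For the report above, the row shows the arithmetic unit (A) as the bound unit, with 13.95 arithmetic cycles against 2.00 load/store cycles.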