Testing ML demos
The NXP i.MX95 Technical Guide documentation from RidgeRun is presently being developed.
Introduction
This guide shows how to run a basic Machine Learning demo using the NXP eIQ stack on the Verdin iMX95.
The goal is to validate that the Machine Learning environment is correctly set up and that hardware acceleration (NPU) is working as expected.
About NXP eIQ
The NXP eIQ stack provides a complete environment for developing and deploying Machine Learning applications on i.MX processors. It includes inference engines, compilers, optimized libraries, and tooling required to run neural networks efficiently on embedded systems.
On supported platforms like the Verdin iMX95, inference can be offloaded to dedicated hardware such as the eIQ Neutron NPU through its TensorFlow Lite external delegate. If needed, workloads can also fall back to CPU execution.
Prerequisites
- A Yocto image with eIQ already integrated
Running the Demo
Once the system is up and running with eIQ support, use the built-in TensorFlow Lite example applications to quickly verify inference on the NPU.
Navigate to the examples directory on the target:
cd /usr/bin/tensorflow-lite-2.16.2/examples/
Note: The TensorFlow Lite version may differ depending on your image. Adjust the path if needed.
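Because the versioned directory name changes between images, a small helper can select the newest matching path instead of hard-coding the version. This is an illustrative sketch, not part of the eIQ package; `pick_examples_dir` is a hypothetical name, and on the target its input would come from a `glob` over `/usr/bin`.

```python
import re

def pick_examples_dir(candidates):
    """Return the candidate path with the highest tensorflow-lite-<x.y.z> version.

    Illustrative helper (not part of eIQ). On the target, candidates would
    come from: glob.glob("/usr/bin/tensorflow-lite-*/examples")
    """
    def version_key(path):
        m = re.search(r"tensorflow-lite-(\d+)\.(\d+)\.(\d+)", path)
        # Unversioned paths sort last.
        return tuple(map(int, m.groups())) if m else (0, 0, 0)
    return max(candidates, key=version_key, default=None)
```

For example, given both a 2.14.0 and a 2.16.2 install, the helper picks the 2.16.2 examples directory.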
The demo uses a pre-trained MobileNet V1 model to classify the sample image grace_hopper.bmp.
To run inference on the NPU (recommended for Verdin iMX95), execute:
./label_image \
  -m mobilenet_v1_1.0_224_quant.tflite \
  -i grace_hopper.bmp \
  -l labels.txt \
  --external_delegate_path=/usr/lib/libneutron_delegate.so
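When scripting this validation, it can be handy to parse the tool's output rather than read it by eye. The exact output format of label_image can vary between TensorFlow Lite versions, so the sample text below is an assumption used only to illustrate the parsing; adjust the patterns to match what your build actually prints.

```python
import re

# Assumed label_image-style output; verify against your build's real output.
sample_output = """\
INFO: invoked
INFO: average time: 0.16 ms
INFO: 0.780392: 653 military uniform
"""

def parse_label_image(text):
    """Extract the average inference time (ms) and the top classification."""
    time_ms = None
    top = None
    for line in text.splitlines():
        m = re.search(r"average time:\s*([\d.]+)\s*ms", line)
        if m:
            time_ms = float(m.group(1))
        m = re.search(r"(\d\.\d+):\s*(\d+)\s+(.*)", line)
        if m and top is None:
            # (score, label); group(2) is the class index, dropped here.
            top = (float(m.group(1)), m.group(3))
    return time_ms, top
```

With the sample above, the parser returns the 0.16 ms average time and the "military uniform" top label.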
Performance (Verdin iMX95)
Running this demo on the Verdin iMX95 with NPU acceleration yields the following results:
| Backend | Inference Time | FPS |
|---|---|---|
| NPU | 0.16 ms | 6172.84 |
| CPU | 13.20 ms | 75.75 |
These results show roughly an 80x speedup when using the NPU compared to CPU-based inference.
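The relationship between the two columns is simply that FPS is the reciprocal of the per-inference latency. A quick sanity check on the table's rounded latencies (note that the reported FPS figures were presumably computed from unrounded measurements, so fps(0.16) does not land exactly on 6172.84):

```python
def fps(latency_ms):
    """Frames per second from a per-inference latency in milliseconds."""
    return 1000.0 / latency_ms

# Rounded latencies from the table above.
npu_ms, cpu_ms = 0.16, 13.20
speedup = cpu_ms / npu_ms  # about 82.5x with these rounded values
```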
Notes
- The provided demo is part of the eIQ package and is useful for quick validation of the ML stack.
- The MobileNet V1 model is commonly used for benchmarking due to its balance between accuracy and performance.