Setting the board in debug mode to profile the NPU on the NXP i.MX8M Plus

i.MX8M Plus debug mode

Before starting an application that uses the NPU, you can set a few flags that put the board in verbose mode. The NPU driver then prints extra output that tells you how long a given model operation takes and when it falls back to the CPU because of an incompatible model operation.

To enable this mode, export the following flags:

export CNN_PERF=1 NN_EXT_SHOW_PERF=1 VIV_VX_DEBUG_LEVEL=1 VIV_VX_PROFILE=1
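If you would rather not leave the flags exported for the whole shell session, one option is to set them only for the command being profiled. The following is a minimal sketch: the helper name `run_with_npu_debug` is an assumption introduced here for illustration, and it relies only on the standard `env` utility and the four flags listed above.

```shell
#!/bin/sh
# Hypothetical helper: run one command with the NPU debug flags set
# only in that command's environment, leaving the current shell untouched.
run_with_npu_debug() {
    # The four variables are the same ones exported above.
    env CNN_PERF=1 NN_EXT_SHOW_PERF=1 VIV_VX_DEBUG_LEVEL=1 VIV_VX_PROFILE=1 "$@"
}
```

For example, `run_with_npu_debug ./my_npu_app 2>&1 | tee npu_profile.log` would run a placeholder application called `./my_npu_app` and keep a copy of the verbose output in `npu_profile.log`. Both names are illustrative and not part of the board's software.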

After this, run your accelerated application. Depending on the model, it prints output like the following:

...

layer id: 12 layer name:TensorAdd operation[0]:VXNNE_OPERATOR_TENSOR_ADD target:VXNNE_OPERATION_TARGET_SH.
uid: 4
op_abs_id: 10
layer id: 15 layer name:TensorCopy operation[0]:unkown operation type target:VXNNE_OPERATION_TARGET_SH.
uid: 15
op_abs_id: 19
shader kernel name: tensorCopy_F16toF32_2D
execution time:               120 us

prev_ptrs = 0xaaaafdaf9a40
prev_ptrs = 0xaaab003adc80
prev_ptrs = 0xaaab001cae00
prev_ptrs = 0xaaab001cee00
prev_ptrs = 0xaaab001d0e40
prev_ptrs = 0xaaab001d2e80
prev_ptrs = 0xaaab001d4280
prev_ptrs = 0xaaab001d5680
prev_ptrs = 0xaaab001d6480
Releasing object array 0xaaab001f3740
Releasing object array 0xaaab004a17c0
Releasing object array 0xaaab004b1890
Releasing object array 0xaaab004c1960
prev_ptrs = 0xaaab007f0400
prev_ptrs = 0xaaab007f3140
prev_ptrs = 0xaaab007f5f40
prev_ptrs = 0xaaab007f8d00
prev_ptrs = 0xaaab007fbac0
Releasing object array 0xaaab0080e3b0
Releasing object array 0xaaab0081f120
Exit VX Thread: 0x83695120

...

The output reports the execution time of individual operations, flags unknown operations that imply a CPU fallback, and lists the modified registers, all of which help you understand in more depth how your model performs on the NPU.
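Because the verbose output can get long, it may help to condense a saved log into one line per timed operation. The sketch below assumes the log was captured to a file and that its lines follow the layout of the excerpt above; other driver versions may print different fields. The function names `summarize_npu_log` and `count_unknown_ops` are assumptions introduced here, not part of any NXP tool.

```shell
#!/bin/sh
# Hypothetical helper: print "<layer name> <time> us" for every
# "execution time" line in the log, followed by the total.
summarize_npu_log() {
    awk '
        /^layer id:/ {
            # Remember the name of the most recent layer header.
            layer = $0
            sub(/^.*layer name:/, "", layer)
            sub(/ .*$/, "", layer)
        }
        /^execution time:/ {
            # Line layout: "execution time:   120 us"
            us = $(NF - 1)
            printf "%s %d us\n", layer, us
            total += us
        }
        END { printf "total %d us\n", total }
    ' "$1"
}

# Hypothetical helper: count the lines the driver tags as
# "unkown operation type" (the spelling used in the log itself).
count_unknown_ops() {
    grep -c 'unkown operation type' "$1"
}
```

Calling `summarize_npu_log npu_profile.log` on a saved log (the file name is a placeholder) gives a quick view of where the time goes, and `count_unknown_ops npu_profile.log` shows how many operations are worth checking for a possible CPU fallback.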
