Exploring TensorFlow Lite delegates for prototyping
NNAPI Delegate
With this delegate, we are able to offload the inference to the NPU accelerator. You need to ensure your model supports 8- or 16-bit quantization; otherwise, NNAPI will send the unsupported operations back to the CPU, executing a CPU fallback that degrades the overall inference time, as we will see in later sections. Also, for this step you must have built TensorFlow Lite with NNAPI enabled via the -DTFLITE_ENABLE_NNAPI=on flag.
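As a quick sanity check before delegating, you can inspect the model's input tensors and warn when they are not 8- or 16-bit quantized, so CPU fallbacks do not come as a surprise. The following is only a minimal sketch: the helper name InputsAreQuantized is illustrative, and it assumes the interpreter has already been built from your model as in the minimal example.

// Minimal sketch (assumption: `interpreter` is a pointer to an already-built
// tflite::Interpreter). Float32 inputs indicate operations that NNAPI will
// likely fall back to the CPU for.
#include <cstdio>
#include "tensorflow/lite/interpreter.h"

bool InputsAreQuantized(tflite::Interpreter *interpreter) {
  bool quantized = true;
  for (int idx : interpreter->inputs()) {
    const TfLiteTensor *tensor = interpreter->tensor(idx);
    if (tensor->type != kTfLiteUInt8 && tensor->type != kTfLiteInt8 &&
        tensor->type != kTfLiteInt16) {
      printf("Input tensor %s is not 8/16-bit quantized; expect a CPU fallback under NNAPI\n",
             tensor->name);
      quantized = false;
    }
  }
  return quantized;
}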
As we saw in the minimal TensorFlow Lite example from Cross-compiling apps for GStreamer, TensorFlow_Lite, and OpenCV, delegating amounts to adding the following lines before allocating the input tensors:
// <Your includes>
// The required includes for NNAPI:
#include "tensorflow/lite/delegates/nnapi/nnapi_delegate.h"
#include "tensorflow/lite/tools/delegates/delegate_provider.h"

void inference() {
  // <Interpreter construction>

  // NNAPI delegate construction:
  tflite::StatefulNnApiDelegate::Options options;
  options.allow_fp16 = true;
  options.allow_dynamic_dimensions = true;
  options.disallow_nnapi_cpu = false;
  options.accelerator_name = "vsi-npu";

  auto delegate = tflite::evaluation::CreateNNAPIDelegate(options);
  if (!delegate) {
    std::cout << "NNAPI delegate was not created correctly" << std::endl;
    return;
  } else {
    // Modifying the graph to support NNAPI operations:
    interpreter->ModifyGraphWithDelegate(std::move(delegate));

    // Allocating the input tensors:
    TFLITE_MINIMAL_CHECK2(interpreter->AllocateTensors() == kTfLiteOk);

    // <Feed your tensors>
  }
}
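Once the delegate has been attached and the tensors allocated, the remaining steps are the usual ones from the minimal example: copy your input data, invoke the interpreter, and read the outputs. The snippet below is only a sketch of that last part; it assumes a single uint8 input and output tensor (typical of a fully quantized model), and the names my_input_data and my_input_size_in_bytes are placeholders for your own pre-processed data.

// Sketch only: replace the <Feed your tensors> step with something like this.
// Assumes one uint8 input and one uint8 output; adjust types/indices to your model.
#include <cstring>  // for memcpy

uint8_t *input = interpreter->typed_input_tensor<uint8_t>(0);
memcpy(input, my_input_data, my_input_size_in_bytes);

// Run the inference: delegated subgraphs execute on the NPU,
// any unsupported operations fall back to the CPU.
TFLITE_MINIMAL_CHECK2(interpreter->Invoke() == kTfLiteOk);

// Read back the results:
uint8_t *output = interpreter->typed_output_tensor<uint8_t>(0);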
Bonus: XNNPACK Delegate