Exploring TensorFlow Lite delegates for prototyping
NNAPI Delegate
With this delegate, we are able to offload the inference to the NPU accelerator. You need to ensure your model supports 8- or 16-bit quantization; otherwise, NNAPI will send the unsupported operations back to the CPU, executing a CPU fallback that degrades the overall inference time, as we will see in later sections. Also, for this step you must have built TensorFlow Lite with NNAPI enabled via the -DTFLITE_ENABLE_NNAPI=on flag.
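As a quick sanity check before delegating, you can inspect the model's input tensors and warn when they are not 8- or 16-bit quantized, so CPU fallbacks do not come as a surprise. The following is only a minimal sketch: the helper name InputsAreQuantized is illustrative, and it assumes the interpreter has already been built from your model as in the minimal example.

// Minimal sketch (assumption: `interpreter` is a pointer to an already-built
// tflite::Interpreter). Float32 inputs indicate operations that NNAPI will
// likely fall back to the CPU for.
#include <cstdio>
#include "tensorflow/lite/interpreter.h"

bool InputsAreQuantized(tflite::Interpreter *interpreter) {
  bool quantized = true;
  for (int idx : interpreter->inputs()) {
    const TfLiteTensor *tensor = interpreter->tensor(idx);
    if (tensor->type != kTfLiteUInt8 && tensor->type != kTfLiteInt8 &&
        tensor->type != kTfLiteInt16) {
      printf("Input tensor %s is not 8/16-bit quantized; expect a CPU fallback under NNAPI\n",
             tensor->name);
      quantized = false;
    }
  }
  return quantized;
}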
As we saw in the minimal TensorFlow Lite example from Cross-compiling apps for GStreamer, TensorFlow_Lite, and OpenCV, delegating amounts to adding the following lines before allocating the input tensors:
// <Your includes>
// The required includes for NNAPI:
#include "tensorflow/lite/delegates/nnapi/nnapi_delegate.h"
#include "tensorflow/lite/tools/delegates/delegate_provider.h"

void inference() {
  // <Interpreter construction>

  // NNAPI delegate construction:
  tflite::StatefulNnApiDelegate::Options options;
  options.allow_fp16 = true;
  options.allow_dynamic_dimensions = true;
  options.disallow_nnapi_cpu = false;
  options.accelerator_name = "vsi-npu";

  auto delegate = tflite::evaluation::CreateNNAPIDelegate(options);
  if (!delegate) {
    std::cout << "NNAPI delegate was not created correctly" << std::endl;
    return;
  } else {
    // Modifying the graph to support NNAPI operations:
    interpreter->ModifyGraphWithDelegate(std::move(delegate));

    // Allocating the input tensors:
    TFLITE_MINIMAL_CHECK2(interpreter->AllocateTensors() == kTfLiteOk);

    // <Feed your tensors>
  }
}
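Once the delegate has been attached and the tensors allocated, the remaining steps are the usual ones from the minimal example: copy your input data, invoke the interpreter, and read the outputs. The snippet below is only a sketch of that last part; it assumes a single uint8 input and output tensor (typical of a fully quantized model), and the names my_input_data and my_input_size_in_bytes are placeholders for your own pre-processed data.

// Sketch only: replace the <Feed your tensors> step with something like this.
// Assumes one uint8 input and one uint8 output; adjust types/indices to your model.
#include <cstring>  // for memcpy

uint8_t *input = interpreter->typed_input_tensor<uint8_t>(0);
memcpy(input, my_input_data, my_input_size_in_bytes);

// Run the inference: delegated subgraphs execute on the NPU,
// any unsupported operations fall back to the CPU.
TFLITE_MINIMAL_CHECK2(interpreter->Invoke() == kTfLiteOk);

// Read back the results:
uint8_t *output = interpreter->typed_output_tensor<uint8_t>(0);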
Bonus: XNNPACK Delegate