Qualcomm Robotics RB5/RB6 - AI Hardware Acceleration

From RidgeRun Developer Wiki


The Qualcomm Robotics RB5/RB6 (QRB5165) platform offers SDKs and software tools for executing Machine Learning (ML) applications. ML applications can run on the board's CPU, but they can also be offloaded to dedicated hardware processing units such as the Adreno GPU or the Hexagon DSP with the Hexagon Tensor Accelerator (HTA). To learn more about these hardware components, check our SoM Overview section. This section provides an overview of the tools available for running ML applications on the board.

Qualcomm Neural Processing SDK for AI[1]

The RB5/RB6 platform offers a software accelerated, inference-only runtime engine for running deep neural networks, which can be used to:

  • Run a deep neural network on the hardware the user chooses, such as the CPU, the GPU, or the Hexagon DSP with the Hexagon Tensor Accelerator (HTA).
  • Debug the execution of the network.
  • Convert models from multiple frameworks such as Caffe, Caffe2, ONNX, and TensorFlow to Deep Learning Container (DLC), the SDK's native format.
  • Analyze the performance of the network.
  • Integrate a network into C++ or Java applications.
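As an illustration of the conversion step, the SDK ships command-line converter tools that are run on the host. A minimal sketch, assuming a TensorFlow frozen graph and an ONNX model (file names, input dimensions, and the output node name are placeholders):

```shell
# Sketch of DLC conversion with the Neural Processing SDK converters.
# Model files, input dimensions, and node names below are placeholders.

# TensorFlow frozen graph -> DLC
snpe-tensorflow-to-dlc --input_network model.pb \
                       --input_dim input "1,224,224,3" \
                       --out_node softmax \
                       --output_path model.dlc

# ONNX model -> DLC
snpe-onnx-to-dlc --input_network model.onnx \
                 --output_path model.dlc
```

Run each tool with --help to see the exact options available in your SDK version.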

As a developer, you can create and train your own model. Once the model is ready with static weights, it can be converted to DLC so that it works with the Neural Processing SDK. The DLC files can be optimized with quantization or compression techniques and then either used to develop an application with the SDK's C++ or Java API or executed in a GStreamer pipeline using the qtimlesnpe plugin.
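As a sketch of the optimization step, the SDK provides a quantization tool; the file names and the list of representative inputs below are placeholders:

```shell
# Sketch: quantize a DLC to 8-bit fixed point so it can run on
# fixed-point hardware such as the DSP. Paths are placeholders;
# input_list.txt lists representative raw input files used for
# calibrating the quantization ranges.
snpe-dlc-quantize --input_dlc model.dlc \
                  --input_list input_list.txt \
                  --output_dlc model_quantized.dlc
```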

qtimlesnpe[2]

This element from the QTI plugins exposes the capabilities of the Qualcomm Neural Processing SDK to GStreamer. The qtimlesnpe plugin can load and execute AI models and also supports preprocessing operations such as downscaling, color conversion, mean subtraction, and padding. On postprocessing, it supports the most popular use cases, like classification, detection, and segmentation. The result of the postprocessing is attached as machine learning metadata (MLMeta) to the GST buffer of the video frame. The element can be configured with a file in JSON format or with GST properties; when both are provided, the GST properties take priority. It can also configure the Qualcomm Neural Processing SDK to offload model computation to the DSP, CPU, GPU, or HTA.
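A hypothetical pipeline sketch of how the element could be used on the target; the source element, config path, and property names/values are assumptions, so check `gst-inspect-1.0 qtimlesnpe` on the board for the actual properties:

```shell
# Sketch only: camera frames are fed to qtimlesnpe for inference and
# the MLMeta results are drawn by an overlay element. The config path
# and the property names/values are placeholders.
gst-launch-1.0 qtiqmmfsrc ! \
  "video/x-raw,format=NV12,width=1280,height=720,framerate=30/1" ! \
  qtimlesnpe config=/data/misc/camera/mle_snpe.config \
             postprocessing=detection ! \
  qtioverlay ! waylandsink
```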

TensorFlow[3]

The RB5/RB6 platform also supports acceleration of TFLite models on the Hexagon DSP, GPU, and CPU via NNAPI (Android Neural Networks API). This Android-specific API has been ported to run on the RB5/RB6. You can train a model in TensorFlow and then convert it to TFLite using the TFLite converter. Finally, you can provide the model to the NNAPI runtime, which can offload the model to hardware-specific units. QTI also provides its own GStreamer plugin element, qtimletflite, to exercise TFLite use cases.
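The train-then-convert step can be sketched with TensorFlow's converter CLI; the SavedModel path and output file name are placeholders:

```shell
# Sketch: convert a trained TensorFlow SavedModel to a .tflite file
# that can then be handed to the NNAPI runtime. Paths are placeholders.
tflite_convert --saved_model_dir=/path/to/saved_model \
               --output_file=model.tflite
```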

qtimletflite[4]

This element from the QTI plugins exposes the capabilities of TensorFlow Lite to GStreamer. The qtimletflite plugin can load and execute TFLite models and also supports preprocessing operations such as downscaling, color conversion, mean subtraction, and padding. On postprocessing, it supports the most popular model types, like classification, detection, and segmentation. The result of the postprocessing is attached as machine learning metadata (MLMeta) to the GST buffer of the video frame. The element can be configured with a file in JSON format or with GST properties; when both are provided, the GST properties take priority.
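A hypothetical pipeline sketch analogous to the qtimlesnpe case; the element properties and the delegate value are assumptions, so check `gst-inspect-1.0 qtimletflite` on the target:

```shell
# Sketch only: runs a TFLite model on camera frames, asking the
# element to delegate computation through NNAPI. The config path and
# the property names/values are placeholders.
gst-launch-1.0 qtiqmmfsrc ! \
  "video/x-raw,format=NV12,width=1280,height=720,framerate=30/1" ! \
  qtimletflite config=/data/misc/camera/mle_tflite.config \
               delegate=nnapi ! \
  qtioverlay ! waylandsink
```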

References

  1. Qualcomm Neural Processing SDK. Retrieved February 28, 2023, from [1]
  2. qtimlesnpe. Retrieved February 28, 2023, from [2]
  3. TensorFlow. Retrieved February 8, 2023, from [3]
  4. qtimletflite. Retrieved February 28, 2023, from [4]

