Hailo Dataflow Compiler

The Hailo Dataflow Compiler toolchain enables users to generate a Hailo Executable Format (HEF) binary file from various inputs, including TensorFlow checkpoints, frozen TensorFlow graphs, TFLite files, and ONNX files. The build process consists of several stages: translating the original model into a Hailo-compatible format, optimizing model parameters, and compiling the model into the final binary file.

The diagram illustrates the model-building process, starting with a TensorFlow or ONNX model and culminating in the Hailo binary (HEF).

Figure: Model building process using the Hailo Dataflow Compiler. Source: Hailo Dataflow Compiler User Guide

As illustrated in the figure, the model-building process consists of several key steps:

1. TensorFlow and ONNX Translation

The process begins by converting the user’s original model into a Hailo-compatible format. The translation API takes the model and generates Hailo’s internal representation, a compressed Hailo Archive (HAR) file that bundles an HN model file (in JSON format) and a NumPy NPZ file containing the weights.
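
As a concrete sketch, the translation step can be scripted with the Dataflow Compiler's Python API; the pattern below follows Hailo's parsing tutorials, but the file names, model name, and node names are placeholders, and exact signatures may vary between DFC versions.

```python
# Sketch of the translation step (placeholders: file names, model name, nodes).
from hailo_sdk_client import ClientRunner

runner = ClientRunner(hw_arch="hailo8")

# Parse the ONNX graph into Hailo's internal representation.
runner.translate_onnx_model(
    "my_model.onnx",             # original model (placeholder path)
    "my_model",                  # network name inside the HAR
    start_node_names=["input"],  # parse boundaries (placeholders)
    end_node_names=["output"],
)

# Save the HN graph plus NPZ weights bundled as a HAR file.
runner.save_har("my_model.har")
```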

2. Profiler

The Profiler tool uses the HAR file to estimate the model’s expected performance on the hardware. The profiling report covers the number of devices required, hardware resource utilization, and throughput (frames per second), and provides a detailed per-layer breakdown of the model.
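
For scripting, the profiler can also be driven from its command-line entry point; a minimal sketch, assuming the HAR produced in the previous step (the file name is a placeholder, and report options vary by DFC version):

```python
# Sketch: invoke the DFC's `hailo profiler` CLI on a HAR file.
import subprocess

# Produces a report with per-layer resource usage and FPS estimates.
subprocess.run(["hailo", "profiler", "my_model.har"], check=True)
```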

3. Emulator

The emulator allows users to perform inference on their model without needing actual hardware. It operates in two modes (a usage sketch follows the mode descriptions):

Native Mode

Runs the model with float32 parameters; used to validate the translation step and to collect the statistics needed for calibration.

Quantized Mode

Simulates hardware implementation to analyze the accuracy of the optimized model.
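
A minimal sketch of emulated inference, assuming the HAR from the translation step; the `InferenceContext` names follow the pattern in Hailo's tutorials and may differ across DFC versions, and the random input stands in for real data:

```python
# Sketch: run the emulator in native (float32) mode on placeholder data.
import numpy as np
from hailo_sdk_client import ClientRunner, InferenceContext

runner = ClientRunner(har="my_model.har")
images = np.random.rand(8, 224, 224, 3).astype(np.float32)  # placeholder batch

with runner.infer_context(InferenceContext.SDK_NATIVE) as ctx:
    native_out = runner.infer(ctx, images)

# After optimization, the quantized context estimates int8 accuracy:
# with runner.infer_context(InferenceContext.SDK_QUANTIZED) as ctx:
#     quant_out = runner.infer(ctx, images)
```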

4. Model Optimization

During model optimization, parameters are converted from float32 to int8. This is achieved by running the model in native mode on a small set of images to collect activation statistics. The calibration module then generates a new network configuration for the 8-bit representation, including int8 weights, biases, scaling factors, and the hardware configuration.
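
A sketch of the optimization call, again following the pattern in Hailo's tutorials; the random calibration batch is a placeholder for the small image set mentioned above:

```python
# Sketch: quantize float32 parameters to int8 using a calibration set.
import numpy as np
from hailo_sdk_client import ClientRunner

runner = ClientRunner(har="my_model.har")
calib_data = np.random.rand(64, 224, 224, 3).astype(np.float32)  # placeholder

runner.optimize(calib_data)  # collects stats, emits the int8 configuration
runner.save_har("my_model_quantized.har")
```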

5. Compiling the Model into a Binary Image

The model is compiled into a hardware-compatible binary format (HEF). The Dataflow Compiler allocates hardware resources to maximize frames per second while keeping utilization balanced across the device. The compilation process, including microcode generation, is fully automated and can be initiated with a single API call.
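
The single-call compilation can be sketched as follows (file names are placeholders):

```python
# Sketch: compile the optimized HAR into a HEF binary.
from hailo_sdk_client import ClientRunner

runner = ClientRunner(har="my_model_quantized.har")
hef_bytes = runner.compile()  # resource allocation + microcode generation

with open("my_model.hef", "wb") as f:
    f.write(hef_bytes)
```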

6. Dataflow Compiler Studio (Preview - Parsing Stage Only)

The Dataflow Compiler Studio allows users to parse and visualize neural network graphs. Users can upload ONNX or TFLite files, and the tool suggests start and end nodes for parsing. The GUI offers a side-by-side comparison of Hailo’s parsed graph and the original graph, enabling users to adjust and re-parse as necessary to meet specific requirements.

7. Deployment Process

After compilation, the model is ready for inference on the target device. The HailoRT library, accessible via C/C++, Python APIs, and command-line tools, provides the necessary interface to load and run the model. Depending on the device and connection type (e.g., PCIe or Ethernet), the library employs various communication methods to interact with the device. The HailoRT library can be installed on the same machine as the Dataflow Compiler or on a separate machine, with a Yocto layer provided for easy integration into embedded environments.
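
A minimal PCIe inference sketch with HailoRT's Python API, modeled on Hailo's published examples; the HEF path and the zero-filled input are placeholders, and API details may differ between HailoRT versions:

```python
# Sketch: load a HEF and run inference over PCIe with HailoRT.
import numpy as np
from hailo_platform import (HEF, VDevice, ConfigureParams, HailoStreamInterface,
                            InferVStreams, InputVStreamParams, OutputVStreamParams)

hef = HEF("my_model.hef")

with VDevice() as target:
    params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
    network_group = target.configure(hef, params)[0]

    input_params = InputVStreamParams.make(network_group)
    output_params = OutputVStreamParams.make(network_group)

    # Feed a placeholder frame keyed by the model's input stream name.
    input_name = hef.get_input_vstream_infos()[0].name
    frames = {input_name: np.zeros((1, 224, 224, 3), dtype=np.float32)}

    with network_group.activate(network_group.create_params()):
        with InferVStreams(network_group, input_params, output_params) as pipeline:
            results = pipeline.infer(frames)
```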