Hailo Dataflow Compiler
The Hailo Dataflow Compiler toolchain enables users to generate a Hailo Executable Format (HEF) binary file from various inputs, including TensorFlow checkpoints, frozen TensorFlow graphs, TFLite files, and ONNX files. The build process consists of several stages: translating the original model into a Hailo-compatible format, optimizing model parameters, and compiling the model into the final binary file.
The accompanying diagram illustrates the model-building process, starting with a TensorFlow or ONNX model and culminating in the Hailo binary (HEF). As illustrated, the process consists of several key steps:
1. TensorFlow and ONNX Translation
The process begins by converting the user’s original model into a Hailo-compatible format. The translation API takes the model and generates an internal Hailo representation, saved as a Hailo Archive (HAR) file, which bundles an HN model file (a JSON description of the graph) and a NumPy NPZ file containing the weights.
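As a minimal sketch of this step using the DFC's Python API (the file names, network name, and target architecture below are placeholder assumptions):

    from hailo_sdk_client import ClientRunner

    # Target architecture is an assumption; set it to match your device.
    runner = ClientRunner(hw_arch="hailo8")

    # "model.onnx" and "my_model" are hypothetical names. Start/end node
    # names and input shapes can also be passed when the defaults are not
    # sufficient.
    hn, npz = runner.translate_onnx_model("model.onnx", "my_model")

    # Persist the internal representation: HN graph (JSON) plus NPZ weights.
    runner.save_har("model.har")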
2. Profiler
The Profiler tool uses the HAR file to estimate the model’s expected performance on the hardware. The profiling report covers the required devices, hardware resource utilization, and throughput (frames per second), along with a detailed breakdown for each layer of the model.
3. Emulator
The emulator allows users to perform inference on their model without needing actual hardware. It operates in two modes, illustrated by the sketch after this list:
Native Mode
Runs the model with float32 parameters; used to validate the translation process and to collect calibration statistics.
Quantized Mode
Simulates hardware implementation to analyze the accuracy of the optimized model.
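A minimal sketch of driving both modes, assuming the Python API's InferenceContext and a previously saved HAR named model.har (quantized mode additionally assumes the HAR has already been optimized):

    import numpy as np
    from hailo_sdk_client import ClientRunner, InferenceContext

    runner = ClientRunner(har="model.har")  # hypothetical HAR from the translation step
    batch = np.zeros((8, 224, 224, 3), dtype=np.float32)  # placeholder input batch

    # Native mode: float32 execution, no hardware required.
    with runner.infer_context(InferenceContext.SDK_NATIVE) as ctx:
        native_out = runner.infer(ctx, batch)

    # Quantized mode: emulates the numerics of the optimized model
    # (requires the model optimization step to have run on this HAR).
    with runner.infer_context(InferenceContext.SDK_QUANTIZED) as ctx:
        quant_out = runner.infer(ctx, batch)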
4. Model Optimization
During model optimization, parameters are converted from float32 to int8. This is achieved by running the model in native mode on a small set of images to collect activation statistics. The calibration module then generates a new network configuration for the 8-bit representation, including int8 weights, biases, scaling factors, and hardware configuration.
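A hedged sketch of this step with the Python API; the calibration file name is a placeholder, and the set should contain a small number of representative images preprocessed the same way as at inference time:

    import numpy as np
    from hailo_sdk_client import ClientRunner

    runner = ClientRunner(har="model.har")

    # "calib_set.npy" is hypothetical: a small array of representative images.
    calib_set = np.load("calib_set.npy")

    # Runs the model in native mode to collect activation statistics, then
    # emits the int8 weights, biases, and scaling for the quantized model.
    runner.optimize(calib_set)
    runner.save_har("model_optimized.har")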
5. Compiling the Model into a Binary Image
The model is compiled into a hardware-compatible binary format (HEF). The Dataflow Compiler allocates hardware resources so as to maximize frames per second while balancing utilization across the device. The compilation process, including microcode generation, is automated and can be initiated with a single API call.
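Continuing the sketch above with the same ClientRunner object (now holding an optimized model), compilation is a single call; writing the returned bytes to disk yields the HEF:

    # compile() runs resource allocation and microcode generation,
    # returning the serialized HEF.
    hef = runner.compile()

    with open("model.hef", "wb") as f:
        f.write(hef)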
6. Dataflow Compiler Studio (Preview - Parsing Stage Only)
The Dataflow Compiler Studio allows users to parse and visualize neural network graphs. Users can upload ONNX or TFLite files, and the tool suggests start and end nodes for parsing. The GUI offers a side-by-side comparison of Hailo’s parsed graph and the original graph, enabling users to adjust and re-parse as necessary to meet specific requirements.
7. Deployment Process
After compilation, the model is ready for inference on the target device. The HailoRT library, accessible via C/C++ and Python APIs as well as command-line tools, provides the interface to load and run the model. Depending on the connection type (e.g., PCIe or Ethernet), the library employs the appropriate communication method to interact with the device. HailoRT can be installed on the same machine as the Dataflow Compiler or on a separate machine, with a Yocto layer provided for easy integration into embedded environments.
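A minimal sketch of PCIe inference with HailoRT's Python API (the hailo_platform package); the HEF name and input shape are placeholder assumptions:

    import numpy as np
    from hailo_platform import (HEF, VDevice, ConfigureParams,
                                HailoStreamInterface, InferVStreams,
                                InputVStreamParams, OutputVStreamParams)

    hef = HEF("model.hef")  # hypothetical compiled model

    with VDevice() as target:
        # Configure the device for this HEF over PCIe.
        params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
        network_group = target.configure(hef, params)[0]

        in_params = InputVStreamParams.make(network_group)
        out_params = OutputVStreamParams.make(network_group)

        input_name = hef.get_input_vstream_infos()[0].name
        frame = np.zeros((1, 224, 224, 3), dtype=np.float32)  # placeholder input

        # Activate the network group and run a single inference.
        with InferVStreams(network_group, in_params, out_params) as pipeline:
            with network_group.activate():
                results = pipeline.infer({input_name: frame})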