Model Scripts
Model Scripts allow you to customize the behavior of the Dataflow Compiler and make adjustments to your model. While it’s generally recommended to optimize and compile using the default configuration—whether through the CLI tools or Python APIs—Model Scripts provide additional flexibility by enabling changes to default settings.
Usage Examples
Command Line Interface (CLI)
To use a model script with the CLI, the `hailo optimize` command includes an optional argument for specifying a custom model script file:
hailo optimize <HAR_PATH> --model-script <MODEL_SCRIPT_PATH>
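The optimized HAR is then passed to the compiler to produce the HEF binary (this mirrors the compiler invocation shown later in this document):
hailo compiler <HAR_PATH>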
Python API
Using the Python API, you can load and apply a model script like this:
client_runner.load_model_script('model_script.alls')
client_runner.optimize(calib_dataset)
compiled_hef = client_runner.compile()
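A fuller end-to-end sketch of the same flow (hedged: the ClientRunner construction from a HAR file and the HEF-saving step are assumptions about the hailo_sdk_client API; adjust to your environment):

import numpy as np
from hailo_sdk_client import ClientRunner

# Assumption: ClientRunner can be constructed from a parsed HAR path
runner = ClientRunner(har='model.har')
calib_dataset = np.load('calib_set.npy')  # calibration images, NHWC layout

runner.load_model_script('model_script.alls')  # apply the model script
runner.optimize(calib_dataset)                 # quantize with the script applied
hef = runner.compile()                         # compile to a HEF binary

with open('model.hef', 'wb') as f:
    f.write(hef)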
About Model Scripts
A model script is a text file that can be used with the Optimize or Compile functions to tailor the compilation process. This file includes a set of commands designed for various purposes, allowing you to fine-tune model behavior.
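For instance, a minimal model script is just a few such commands in a plain-text .alls file (the normalization values below are illustrative, using the common ImageNet mean/std):
normalization1 = normalization([123.675, 116.28, 103.53], [58.395, 57.12, 57.375])
model_optimization_flavor(optimization_level=2)
performance_param(compiler_optimization_level=max)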
Full-Precision Optimization Stage
Overview
The Model Modification commands modify a parsed model to add transformations that were not part of the original ONNX/TF model. These modifications aim to reduce the load on the host CPU by moving pre- and post-processing onto the device.
Example Modifications
- Apply normalization at the inputs.
- Convert formats or colors at the input stage.
- Resize the input from source resolution to the model's resolution.
- Add post-processing to your model (on supported architectures only).
Model Modification Commands
The following model modification commands are supported. Each command inserts a layer directly after the input layer; commands are applied in script order, so the layer added by the last-executed command ends up immediately after the input layer.
- input_conversion
- transpose
- normalization
- nms_postprocess
- change_output_activation
- logits_layer
- set_seed
- resize
Example Command Sequence
For instance:
input_layer1 -> reshape_yuy2 -> norm_layer1
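A hedged sketch of a script that could produce this chain (per the insertion rule above, the conversion command runs last so its layer lands next to the input; mean_array and std_array are placeholders):
norm_layer1 = normalization(mean_array, std_array, input_layer1)
reshape_yuy2 = input_conversion(input_layer1, yuy2_to_hailo_yuv)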
Command Explanations
Input Conversion
Purpose: Adds on-chip conversion of the input tensor, such as color and format conversions.
- Types of Conversions
- Color Conversions
- yuv_full_range_to_rgb - Converts YUV to RGB with full range.
- yuv_to_rgb / yuv601_to_rgb - Complies with ITU-R BT.601 standard.
- yuv709_to_rgb - Complies with ITU-R BT.709 standard.
- yuv_full_range_to_bgr - Converts YUV to BGR with full range.
- yuv_to_bgr / yuv601_to_bgr - Complies with ITU-R BT.601 standard.
- yuv709_to_bgr - Complies with ITU-R BT.709 standard.
- bgr_to_rgb - Swaps the R and B channels.
- rgb_to_bgr - Swaps the R and B channels.
- Format Conversions
- yuy2_to_hailo_yuv - Converts YUY2 to YUV.
- nv12_to_hailo_yuv - Converts NV12 to YUV.
- nv21_to_hailo_yuv - Converts NV21 to YUV.
- i420_to_hailo_yuv - Converts i420 to YUV.
- tf_rgbx_to_hailo_rgb - Converts RGBX to Hailo RGB format.
- Examples
- Basic Conversion:
rgb_layer = input_conversion(input_layer1, yuv_to_rgb)
- Conversion with Optimization Inclusion:
yuv_layer = input_conversion(input_layer2, yuy2_to_hailo_yuv, emulator_support=True)
Transpose
Purpose: Transposes connected components of selected input layers to improve performance.
- Usage:
- Specific Layer:
transpose(input_layer1)
- All Layers:
transpose()
Note: Not supported with SpaceToDepth or DepthToSpace layers.
Normalization
Purpose: Adds normalization layers with specified mean and standard deviation values.
- Example:
# Adding a normalization layer with the parameters mean & std after the specified input layer.
# Multiple commands can be used to apply a different normalization to each input layer.
norm_layer1 = normalization(mean_array, std_array, input_layer)
# Adding normalization layers after all input layers. The number of return values should
# match the number of inputs in the network.
norm_layer1, norm_layer2, ... = normalization(mean_array, std_array)
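A concrete single-input example (the values are the common ImageNet mean/std, shown here for illustration):
norm_layer1 = normalization([123.675, 116.28, 103.53], [58.395, 57.12, 57.375], input_layer1)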
NMS Post-Processing
The NMS (Non-Maximum Suppression) post-processing can be configured using the nms_postprocess command. This tool helps in filtering and processing object detection results.
Basic Usage
- Example command:
nms_postprocess('nms_config_file.json', meta_arch=ssd)
Configuration Options
- Option 1: Basic Architecture Specification
- Simply specify the architecture name using the meta_arch argument
- System will either:
- Use auto-generated config from detected NMS structure
- OR use default configuration if no structure is detected
- Example:
nms_postprocess(meta_arch=ssd)
- Option 2: Custom Configuration
- Specify architecture name plus configuration arguments
- Configurable parameters:
- nms_scores_th - Score threshold
- nms_iou_th - IoU threshold
- image_dims - Image dimensions
- classes - Number of classes
- Example:
nms_postprocess(meta_arch=yolov5, image_dims=[512, 512], classes=70)
- Option 3: Custom Config File
- Provide both config file path and architecture name
- Important: When using a config file, all parameters must be specified in the file
- Example:
nms_postprocess('config_file_path', meta_arch=centernet)
Default Configuration Files
- Located in site-packages/hailo_sdk_client/tools/core_postprocess/core_postprocess:
default_nms_config_yolov5.json
default_nms_config_yolov6.json
default_nms_config_yolox.json
default_nms_config_yolo8.json
default_nms_config_centernet.json
default_nms_config_ssd.json
default_nms_config_yolov5_seg.json
Processing Modes
- Neural Core Mode (nn_core)
- Runs NMS post-processing on neural core
- Supported architectures: YOLOv5, SSD, Centernet
- Example:
nms_postprocess(meta_arch=ssd, engine=nn_core)
- CPU Mode (cpu)
- Runs NMS post-processing on CPU
- Supported architectures: YOLOv5, YOLOv5 SEG, YOLOv8, SSD, YOLOX
- Example:
nms_postprocess(meta_arch=yolov5_seg, engine=cpu, image_dims=[512, 512])
- Auto Mode (auto)
- Currently only supported for YOLOv5
- Performs:
- Bounding box decoding on neural core
- Score threshold filtering on neural core
- IoU filtering on CPU
- Example:
nms_postprocess('config_file_path', meta_arch=yolov5, engine=auto)
Important Notes
- Output Formats
  - Object Detection Models (a parsing sketch follows this list):
    Format: [batch_size, num_classes, 5, num_proposals_per_class]
    Axis 2 format: [y_min, x_min, y_max, x_max, score]
  - Instance Segmentation Models:
    Format: [N, 1, num_max_proposals, 6 + image_dims[0] * image_dims[1]]
    Last axis format: [y_min, x_min, y_max, x_max, score, class, flattened masks]
- Default Settings
- Default nms_scores_th: 0.3
- Default nms_iou_th for CPU mode: 0.6
- Bounding Box Decoding
- Can run without NMS using bbox_decoding_only=True
- Example:
nms_postprocess(meta_arch=yolov5, engine=cpu, bbox_decoding_only=True)
- Warning: CPU-based bbox decoding may impact performance
- All decoded bounding boxes are normalized between 0 and 1
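A minimal sketch of decoding the object-detection output described above (hypothetical helper, not part of the SDK; assumes the tensor is a NumPy array in the [batch, num_classes, 5, proposals] layout, with boxes normalized to [0, 1]):

import numpy as np

def extract_detections(nms_out, score_th=0.3):
    # Collect (class, score, box) tuples from the NMS output of image 0.
    detections = []
    _, num_classes, _, num_props = nms_out.shape
    for cls in range(num_classes):
        for p in range(num_props):
            y_min, x_min, y_max, x_max, score = nms_out[0, cls, :, p]
            if score >= score_th:  # default nms_scores_th is 0.3
                detections.append((cls, float(score), (x_min, y_min, x_max, y_max)))
    return detections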
Change Output Activation
Purpose: Changes the activation function of output layers.
- Example:
change_output_activation(output_layer, activation)
Logits Layer
Purpose: Adds a logits layer (e.g., Softmax, Argmax) after an output layer.
- Example:
logits_layer1 = logits_layer(output_layer, softmax, 1)
Set Seed
Purpose: Sets the global random seed for reproducible results.
- Example:
set_seed(seed=5)
Resize
Purpose: Resizes input/output tensors, either on-chip or on CPU, using bilinear interpolation by default.
- Example:
- Resize with default settings:
resize1 = resize(conv1, resize_shapes=[256,256], resize_method=bilinear, engine=nn_core)
Numerical Optimization Stage
Overview
Important:
- The Optimization Level determines how aggressively algorithms are applied to enhance the accuracy of a quantized model. Higher optimization levels increase accuracy but require more time and system resources.
- Optimization levels (might change every version):
  - -100 - nothing is applied; all default algorithms are switched off
  - 0 - Equalization
  - 1 - Equalization + Iterative Bias Correction
  - 2 - Equalization + Finetune with 4 epochs & 1024 images
  - 3 - Equalization + Adaround with 320 epochs & 256 images on all layers
  - 4 - Equalization + Adaround with 320 epochs & 1024 images on all layers
- The Compression Level sets the proportion of 4-bit layers in the model. Increasing the number of 4-bit layers improves the model's performance (FPS) but requires a high optimization level to compensate for any accuracy loss.
- Compression levels (might change every version):
  - 0 - nothing is applied
  - 1 - auto 4-bit is set to 0.2 if the network is large enough (20% of the weights)
  - 2 - auto 4-bit is set to 0.4 if the network is large enough (40% of the weights)
  - 3 - auto 4-bit is set to 0.6 if the network is large enough (60% of the weights)
  - 4 - auto 4-bit is set to 0.8 if the network is large enough (80% of the weights)
  - 5 - auto 4-bit is set to 1.0 if the network is large enough (100% of the weights)
Example commands:
model_optimization_flavor(optimization_level=4)
model_optimization_flavor(compression_level=2)
model_optimization_flavor(optimization_level=2, compression_level=1)
model_optimization_flavor(optimization_level=2, batch_size=4)
- Using Resolution Reduction in the Optimization stage allows the model to run at a lower spatial resolution, significantly reducing processing time.
- resolution_reduction
- Reduces the model resolution at all input layers in order to optimize the model more efficiently. Marginally affects accuracy. Not supported on models that contain Fully-connected, Matmul, or Cross-correlation layers, or when the resolution is too small.
- Example commands:
# This will enable the algorithm, optimizing over an input shape of [128, 128]
pre_quantization_optimization(resolution_reduction, shape=[128, 128])
- Note: This operation doesn't modify the structure of the model's graph.
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
shape | [int, int] | None | False | The shape to reduce the model resolution to. |
interpolation | {disabled, bilinear} | bilinear | False | bilinear (the default) requires a dataset at the original model resolution; disabled requires a dataset at the reduced resolution. |
- resolution_reduction per-layer
- Sub-command for configuring resolution reduction per input layer, affecting its connected component. Reduces the resolution in order to optimize more efficiently. Marginally affects accuracy. Not supported when the component contains Fully-connected, Matmul, or Cross-correlation layers, or when the resolution is too small.
- Example commands
# This will enable the algorithm for input_layer1's connected component, optimizing over an input shape of [128, 128]
pre_quantization_optimization(resolution_reduction, layers=input_layer1, shape=[128, 128])
- Note: This operation doesn't modify the structure of the model's graph.
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
shape | [int, int] | None | False | The shape to reduce the component resolution to. |
interpolation | {disabled, bilinear} | None | False | bilinear requires a dataset at the original model resolution; disabled requires a dataset at the reduced resolution. |
Advanced Commands
Precision Mode
The precision_mode field within the quantization_param command enables selective 16-bit precision for specific layers or outputs, which can improve model accuracy.
- precision_mode
  - Precision mode sets the number of bits used to represent the layers' weights and activations. Three precision modes can be set on the model layers using a model script command:
    - a8_w8 - 8-bit activations and 8-bit weights. (This is the default.)
    - a8_w4 - 8-bit activations and 4-bit weights. Can be used to reduce memory consumption. Supported on all layers that have weights; compression levels automatically assign 4-bit to layers in the model, according to the level.
    - a16_w16 - 16-bit activations and weights, to improve accuracy results. Supported in three cases:
      - On any output node (output_layer_X)
      - On any supported node(s), see the list below
      - On the full model, in case all its layers are supported (Hailo-8 family only)
- Example commands
quantization_param(conv3, precision_mode=a8_w4)  # A specific 4-bit layer
quantization_param(output_layer1, precision_mode=a16_w16)  # A specific 16-bit output layer
quantization_param([conv1, maxpool2], precision_mode=a16_w16)  # Multiple 16-bit layers
model_optimization_config(compression_params, auto_16bit_weights_ratio=1)  # Full 16-bit network, in case all layers are supported
- 16-bit precision supported layers
- Activations
- Average Pooling
- Concat
- Const Input
- Convolution
- Deconvolution
- Depth to Space
- Depthwise Convolution
- Elementwise Add / Sub*
- External Padding
- Feature Shuffle
- Feature Split
- Fully Connected (dense) [its output(s) must also be 16-bit, or be model output layers]
- Max Pooling
- Normalization
- Output Layer
- Reduce Max*
- Reduce Sum*
- Resize*
- Reshape
- Shortcut
- Slice
- Space to Depth
- Notes
- Layers with (*) are supported as long as they are not part of a Softmax chain.
- It is recommended to use Finetune when using 4-bit weights.
- Example:
quantization_param(layer_name, precision_mode=a16_w16)
Weights Clipping
This command allows modification of the weight clipping behavior for selected layers during quantization. It can help reduce quantization-related degradation, especially when dealing with outlier weight values. The command is applicable only to layers that have weights.
- Modes
- disabled: Disables weight clipping and ignores any previously set clipping values for the layer.
- manual: Uses the specified clipping values as given.
- percentile: Computes layer-wise percentiles (clipping values range from 0 to 100).
- mmse: Ignores clipping values and applies Minimum Mean Square Estimators to clip the layer's weights.
- mmse_if4b: Functions like mmse, but only applies clipping when the layer uses 4-bit weights. Clipping is disabled for 8-bit weights. This is the default behavior.
- Example Commands
pre_quantization_optimization(weights_clipping, layers=[conv2], mode=manual, clipping_values=[-0.1, 0.8])
pre_quantization_optimization(weights_clipping, layers=[conv3], mode=percentile, clipping_values=[1.0, 99.0])
pre_quantization_optimization(weights_clipping, layers={conv*}, mode=mmse)
pre_quantization_optimization(weights_clipping, layers=[conv3, conv4], mode=mmse_if4b)
pre_quantization_optimization(weights_clipping, layers={conv*}, mode=disabled)
- Note
- The dynamic range of the weights remains symmetric, even when the clipping values are not symmetric.
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
mode | {disabled, manual, percentile, mmse, mmse_if4b} | mmse_if4b | True | Mode of operation, as described above |
clipping_values | [float, float] | None | False | Clipping values, required when mode is manual or percentile |
Activation Clipping
By default, model optimization does not apply activation clipping during quantization. This command allows you to modify this behavior for selected layers, enabling activation clipping when running the quantization API. Activation clipping can be particularly useful to reduce quantization-related degradation, especially when dealing with outlier activation values.
- Modes
- disabled: Disables activation clipping and ignores any previously set clipping values for the layer. This is the default mode.
- manual: Uses the specified clipping values exactly as given.
- percentile: Calculates the activation clipping values based on layer-wise percentiles (values range from 0 to 100).
- Note
- Activation clipping using percentiles requires multiple iterations to collect statistics, so quantization may take longer to complete when this mode is used.
- Example Commands
pre_quantization_optimization(activation_clipping, layers=[conv1], mode=manual, clipping_values=[0.188, 1.3332])
pre_quantization_optimization(activation_clipping, layers=[conv1, conv2], mode=percentile, clipping_values=[0.5, 99.5])
pre_quantization_optimization(activation_clipping, layers={conv*}, mode=disabled)
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
mode | {disabled, manual, percentile} | disabled | True | Mode of operation, as described above |
clipping_values | [float, float] | None | False | Clipping values, required when mode is manual or percentile |
recollect_stats | bool | False | False | Indicates whether statistics should be recollected after applying the clipping |
Global Average Pool Reduction
This command allows you to reduce the spatial dimensions of global average pooling layers by adding an additional average pooling layer. The kernel size of the added average pooling layer will be [1, h // division_factors[0], w // division_factors[1], 1], where h and w refer to the height and width of the input tensor, and `division_factors` are the scaling factors for these dimensions. For example, a 14x14 input with division_factors=[7, 7] yields an added kernel of [1, 2, 2, 1].
- Example Commands
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[4, 4])
# This will disable the reduction of avgpool1
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[1, 1])
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
division_factors | [int, int] | None | False | Specifies the scaling factors for the kernel height and width |
Post-Quantization Commands
- post_quantization_optimization
- All the features of this command optimize the model after the quantization process.
- Syntax:
post_quantization_optimization(<feature>, <**kwargs>)
- Features
  - The following features are available with this command:
- bias_correction
- bias_correction per-layer
- train_encoding
- finetune
- adaround
- adaround per-layer
- mix_precision_search
- 1. bias_correction
- This sub-command allows configuring the global bias correction behavior during the post-quantization process. This command replaces the old ibc parameter from the quantize() API.
- Example Command
# This will enable the IBC during the post-quantization
post_quantization_optimization(bias_correction, policy=enabled)
- Notes
- An in-depth explanation of the IBC algorithm can be found in the following paper: IBC Algorithm (PDF).
- Bias correction is recommended when the model contains small kernels or depth-wise layers.
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {enabled, disabled} | disabled | False | Enable or disable the bias correction algorithm. When Optimization Level ≥ 1, it could be enabled by the default policy. |
cache_compression | {enabled, disabled} | disabled | False | Enable or disable the compression of layer results when cached to disk. |
- 2. bias_correction per-layer
- This sub-command allows enabling or disabling the Iterative Bias Correction (IBC) algorithm on a per-layer basis. The allowed policy means the behavior is derived from the algorithm's configuration.
- Example Commands
# This will enable IBC for a specific layer
post_quantization_optimization(bias_correction, layers=[conv1], policy=enabled)
# This will disable IBC for conv layers and enable it for the other layers
post_quantization_optimization(bias_correction, policy=enabled)
post_quantization_optimization(bias_correction, layers={conv*}, policy=disabled)
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {allowed, enabled, disabled} | allowed | False | Sets bias correction behavior for a given layer. (default is allowed) |
- 3. train_encoding
- The train_encoding sub-command allows fine-tuning the model during the post-quantization process.
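- Example Command (a hedged sketch by analogy with the other sub-commands; exact argument support may differ)
post_quantization_optimization(train_encoding, policy=enabled)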
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {enabled, disabled} | disabled | True | Enable or disable fine-tune training. When Optimization Level ≥ 1, this can be enabled by the default policy. |
dataset_size | int; 0<x | 1024 | False | Number of images used for training. An exception is thrown if the supplied calibration set data stream falls short of this value. |
batch_size | int; 0<x | None | False | Number of images used together in each training step. By default, it uses the calibration batch size. The value is determined by GPU memory constraints and algorithmic considerations. |
epochs | int; 0≤x | 8 | False | Number of training epochs. |
learning_rate | float | None | False | The base learning rate used for schedule calculation. Default value: `0.0002 / 8 * batch_size`. This parameter is key for experimentation to ensure convergence, especially for architectures differing from well-performing zoo examples. |
def_loss_type | {ce, l2, l2rel, cosine} | l2rel | False | Default loss type to use if `loss_types` is not specified. |
loss_layer_names | List of {str} | None | False | Names of layers for teacher-student losses. By default, these are the output nodes of the network. |
loss_types | List of {ce, l2, l2rel, cosine} | None | False | Loss function types to apply to layers specified in `loss_layer_names`. Default: `def_loss_type`. |
loss_factors | List of {float} | None | False | Weights for loss functions applied to respective layers in `loss_layer_names`. Default: 1 for all members. |
native_layers | List of {str} | [] | False | Layers not quantized during training. |
native_activations | {allowed, enabled, disabled} | enabled | False | Keep activations native during training. |
val_images | int; 0≤x | 4096 | False | Number of validation images for evaluation between epochs. |
val_batch_size | int; 0≤x | 128 | False | Batch size for validation steps. |
stop_gradient_at_loss | bool | False | False | Stops gradient propagation after each loss layer. |
force_pruning | bool | True | False | If true, forces zero weights to remain zero during training. |
- Advanced Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
layers_to_freeze | List of {str} | [] | False | Freezes (prevents modification of weights and biases) any layer whose name includes an entry from this list. |
lr_schedule_type | {cosine_restarts, exponential, constant} | cosine_restarts | False | Learning rate decay schedule type. Default is cosine decay. |
decay_rate | float | 0.5 | False | Decay factor of the learning rate at the start of each "decay period." |
decay_epochs | int; 0≤x | 1 | False | Duration of the "decay period" in epochs. |
warmup_epochs | int; 0≤x | 1 | False | Duration of the warm-up period in epochs. |
warmup_lr | float | None | False | Learning rate during the warm-up period. Defaults to 1/4 of the base learning rate. |
optimizer | {adam, sgd, momentum, rmsprop} | adam | False | Optimizer to use. Default is Adam. For SGD, use `sgd`. |
bias_only | bool | False | False | Trains only biases while freezing weights. |
warmup_strategy | {constant, gradual} | gradual | False | Strategy for the learning rate warm-up stage. |
wraparound_factor | float; 0≤x | 0.1 | False | Factor for wraparound loss. |
shuffle_buffer_size | int; 0≤x | 1 | False | Buffer size for shuffling the dataset. A value of 0 uses the entire dataset size. |
- 4. finetune
- This sub-command enables knowledge distillation-based fine-tuning of the quantized graph.
- Example Commands
# Enable fine-tune with default configuration
post_quantization_optimization(finetune)
# Enable fine-tune with a larger dataset
post_quantization_optimization(finetune, dataset_size=4096)
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {enabled, disabled} | disabled | True | Enable or disable fine-tune training. When Optimization Level ≥ 1, this can be enabled by the default policy. |
dataset_size | int; 0<x | 1024 | False | Number of images used for training. An exception is thrown if the supplied calibration set data stream falls short of this value. |
batch_size | int; 0<x | None | False | Uses the calibration batch size by default. Number of images processed together in each training step. This value is influenced by GPU memory constraints and the algorithmic impact, which can oppose the effect of learning_rate. |
epochs | int; 0≤x | 4 | False | Number of training epochs. |
learning_rate | float | None | False | Base learning rate for the schedule calculation. Default value: `0.0002 / 8 * batch_size`. This is a key parameter for experimentation, especially for architectures differing from well-performing zoo examples. |
def_loss_type | {ce, l2, l2rel, cosine} | l2rel | False | Default loss type used if `loss_types` is not specified. |
loss_layer_names | List of {str} | None | False | Names of layers for teacher-student losses, given in Hailo HN notation (e.g., conv20, fc1). Default: the network's output nodes. |
loss_types | List of {ce, l2, l2rel, cosine} | None | False | Loss function types applied to the respective layers specified in `loss_layer_names`. Default: `def_loss_type`. |
loss_factors | List of {float} | None | False | Weights for loss functions on layers specified in `loss_layer_names`. Default: 1 for all entries. |
native_layers | List of {str} | [] | False | Layers not quantized during training. |
native_activations | {allowed, enabled, disabled} | disabled | False | Keep activations native during training. |
val_images | int; 0≤x | 4096 | False | Number of validation images used for evaluation between epochs. |
val_batch_size | int; 0≤x | 128 | False | Batch size for validation steps. |
stop_gradient_at_loss | bool | False | False | Stops gradient propagation after each loss layer. |
force_pruning | bool | True | False | Forces zero weights to remain zero during training. |
- Advanced Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
layers_to_freeze | List of {str} | [] | False | Freezes (prevents modification of weights and biases) any layer whose name includes an entry from this list. |
lr_schedule_type | {cosine_restarts, exponential, constant} | cosine_restarts | False | Learning rate decay schedule type. Default: cosine decay. |
decay_rate | float | 0.5 | False | Factor by which the learning rate is decayed at the beginning of each "decay period." |
decay_epochs | int; 0≤x | 1 | False | Duration of the "decay period" in epochs. |
warmup_epochs | int; 0≤x | 1 | False | Duration of the warm-up period in epochs, applied before the main schedule begins. |
warmup_lr | float | None | False | Learning rate during the warm-up period. Defaults to 1/4 of the base learning rate. |
optimizer | {adam, sgd, momentum, rmsprop} | adam | False | Optimizer to use. Default is Adam. For SGD, set to `sgd`. |
bias_only | bool | False | False | Trains only biases while freezing weights. |
warmup_strategy | {constant, gradual} | constant | False | Strategy for learning rate warm-up. |
wraparound_factor | float; 0≤x | 0 | False | Factor for wraparound loss. |
shuffle_buffer_size | int; 0≤x | 1 | False | Buffer size for shuffling the dataset. A value of 0 uses the entire dataset size. |
- 5. adaround
- The Adaround algorithm optimizes layers' quantization by training the rounding of kernel weights layer-by-layer.
- Enabling Adaround
- To enable Adaround, use a high optimization level (≥3) or the explicit command:
post_quantization_optimization(adaround, policy=enabled)
- Adaround is primarily used at the highest optimization levels to mitigate quantization degradation. It is resource-intensive and requires a robust system to run effectively.
- Recommendations for Reducing Resource Usage
- Install the DALI Package
- DALI accelerates the algorithm. Example installation:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110 nvidia-dali-tf-plugin-cuda110
- Lower Batch Size
- Reduces memory usage but increases runtime. Example:
post_quantization_optimization(adaround, policy=enabled, batch_size=8)
- Enable Cache Compression
- Reduces disk usage at the cost of increased runtime. Example:
post_quantization_optimization(adaround, cache_compression=enabled, policy=enabled)
- Use Smaller Dataset
- Reduces memory consumption but might affect accuracy. Example:
post_quantization_optimization(adaround, policy=enabled, dataset_size=256)
- Disable Bias Training
- Reduces runtime but may affect accuracy. Example:
post_quantization_optimization(adaround, policy=enabled, train_bias=False)
- Reduce Epochs
- Lowers runtime but might impact accuracy. Example:
post_quantization_optimization(adaround, policy=enabled, epochs=100)
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {enabled, disabled} | disabled | False | Enable or disable the Adaround algorithm. Enabled by default at Optimization Level ≥ 1. |
learning_rate | float; 0<x | 0.001 | False | Learning rate for gradient descent. |
batch_size | int; 0<x | 32 | False | Batch size for the Adaround algorithm. |
dataset_size | int; 0<x | 1024 | False | Number of data samples for Adaround. |
epochs | int; 0<x | 320 | False | Number of training epochs. |
warmup | float; 0≤x≤1 | 0.2 | False | Ratio of warmup epochs to total epochs. |
weight | float; 0<x | 0.01 | False | Regularization weight. Higher values emphasize rounding cost over reconstruction loss (MSE). |
train_bias | bool | True | False | Whether to train biases (applies bias correction if bias is not trained). |
bias_correction_count | int | 64 | False | Number of samples used for bias correction. |
mode | {train_4bit, train_all} | train_4bit | False | Defines the training mode. Default is `train_4bit`. |
cache_compression | {enabled, disabled} | disabled | False | Enable or disable caching compression on disk. |
- Advanced Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
b_range | [float, float] | [20, 2] | False | Defines the max and min values for temperature decay. |
decay_start | float; 0≤x≤1 | 0 | False | Ratio of training time without round regularization decay (`b`). |
- 6. adaround per-layer
- This sub-command allows toggling specific layers in the Adaround algorithm individually.
- Example Commands
  - Enable or disable Adaround for specific layers:
# Disable Adaround for a specific layer
post_quantization_optimization(adaround, layers=[conv1], policy=disabled)
# Enable Adaround for specific layers
post_quantization_optimization(adaround, layers=[conv17, conv18], policy=enabled)
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {allowed, enabled, disabled} | allowed | False | Toggles Adaround behavior for the specified layer(s). |
epochs | int | None | False | Number of training epochs for the specified layer(s). |
weight | float; 0<x | None | False | Regularization weight for round regularization. |
b_range | [float, float] | None | False | Temperature decay range for the specified layer(s). |
decay_start | float; 0≤x≤1 | None | False | Ratio of round training time without regularization decay (`b`). |
train_bias | bool | None | False | Toggles bias training for the specified layer(s). |
warmup | float; 0≤x≤1 | None | False | Ratio of warmup epochs out of total epochs for the specified layer(s). |
dataset_size | int; 0<x | None | False | Number of data samples used during training for the specified layer(s). |
batch_size | int; 0<x | None | False | Batch size for training or inference for the specified layer(s). |
- 7. mix_precision_search
- This algorithm identifies the optimal precision configuration for a model using the Signal-to-Noise Ratio (SNR). SNR quantifies how much a signal is corrupted by noise and aids in balancing the compression applied to operations against the error introduced by this compression.
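- Example Command (a hedged sketch following the pattern of the other features; exact argument support may differ)
post_quantization_optimization(mix_precision_search, policy=enabled)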
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {enabled, disabled} | disabled | False | Enables or disables the mix precision search. |
dataset_size | int; 0<x | 16 | False | Number of images used for profiling. |
batch_size | int; 0<x | 8 | False | Number of images processed together in each inference step. |
snr_cap | int; 0<x | 140 | False | Maximum SNR value to be considered during the search. |
compresions_markers | List of {float} | [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2] | False | Compression markers to guide the algorithm. |
optimizer | {linear, pareto} | linear | False | Optimization strategy for precision configuration. |
output_regulizer | {harmony} | harmony | False | Regulation function applied to the output. |
comprecision_metric | {macs, bops, weighs} | bops | False | Metric used to evaluate the precision configuration. |
Checker Configuration
When Optimization Level < 2, the checker_cfg can be manually enabled to gather activation statistics. This data can be analyzed with the profiler for detailed insights.
- The checker configuration is enabled by default when the Optimization Level is set to 2 or higher.
- checker_cfg
- The Checker Config generates information about the quantization process using the layer analysis tool.
- Example commands
  - This will disable the algorithm:
model_optimization_config(checker_cfg, policy=disabled)
- Note: This operation does not modify the structure of the model’s graph.
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {enabled, disabled} | enabled | False | Enables or disables the checker algorithm during the quantization process. |
dataset_size | int; 0<x | 16 | False | Number of images used for profiling. |
batch_size | int; 0<x | None | False | Number of images used together in each inference step; uses the calibration batch size by default. |
analyze_mode | {simple, advanced} | simple | False | The analysis mode used during execution. 'simple' analyzes the fully quantized model, while 'advanced' analyzes layer by layer. Default is simple. |
batch_norm_checker | bool | True | False | Whether the algorithm should display a warning message when the gathered layer statistics differ from the expected distribution in batch normalization. Default is True. |
Compilation Stage
- Performance Mode
- The Performance Mode can be used to compile the model for the highest possible resource utilization, aiming to maximize performance (FPS).
- Note: Expect the compilation time to increase dramatically when using this mode.
- Performance Param
- Definition
performance_param(compiler_optimization_level=max)
- Description
- Setting this parameter enters performance mode, in which the compiler tries as hard as it can to find a solution that fits in a single context, with the highest performance. This method of compilation takes significantly longer, because the compiler attempts very high utilization levels that might not allocate successfully. If allocation fails, it automatically retries with lower utilization until it finds the highest utilization that allocates.
- compiler_optimization_level - supports 0, 1 (default), 2, and max.
- 0 - returns the first feasible solution found.
- 1 - returns the best solution under default utilization.
- 2 (or max) - exhaustively searches for the best utilization.
- This command requires the compiler to meet the specified FPS.
- The compiler will ignore this command if the model is Multi-Context.
- Remove Node
- Definition
remove_node(layer_name)
- Example
remove_node(conv1)
- Description
- This command removes a layer from the network. It is useful for removing layers provided by the HN that are not necessary. It should be used internally only, and with caution.
- layer_name – the name of the layer to remove.
- Suggestions
- Suggestions for the compilation can be supplied (for example: compile for platforms with low PCIe bandwidth).
- Platform Param
- Definition
platform_param(param=value)
- Examples
platform_param(targets=[ethernet])
platform_param(hints=[low_pcie_bandwidth])
- Description
- This sets several parameters regarding the platform hosting Hailo as described below:
- targets – a list or a single value of hosting target restrictions such as Ethernet which requires disabling a set of features.
- Current supported targets:
- Ethernet, which disables the following features:
- DDR portals, since the DDR access through PCIe is not available.
- Context Switch (multi contexts), since DDR access is not available.
- Sequencers (a fast PCIe-based model loading).
- hints – a list of hints, or a single hint, about the hosting platform (such as low PCIe bandwidth) that lets the compiler optimize performance for specific scenarios.
- Current supported hints:
- low_pcie_bandwidth - adjusts the compiler to reduce PCIe bandwidth usage by disabling features or changing decision thresholds regarding when PCIe should be used.
- Automatic Model Script
- The Automatic model script can be used to pin the compilation results to a previously compiled version of the same model. After the compilation process, in addition to the binary .hef file, a compiled HAR (Hailo ARchive) file is created. This HAR file contains the final compilation results, as well as an automatic model script (.auto.alls) file with the exact instructions the compiler needs to recreate the same binary (for the specific Dataflow Compiler version). This model script can be used to compile the model again (from the corresponding quantized HAR file) for a quick compilation.
- Extraction of the automatic model script out of the compiled HAR file is done with the command:
hailo har extract <COMPILED_HAR_PATH> --auto-model-script-path auto_model_script_file.alls
- The extracted model script can be used in this manner:
hailo compiler <QUANTIZED_HAR_PATH> --model-script auto_model_script_file.alls