Hailo Model Scripts

Model Scripts

Model Scripts allow you to customize the behavior of the Dataflow Compiler and make adjustments to your model. While it’s generally recommended to optimize and compile using the default configuration—whether through the CLI tools or Python APIs—Model Scripts provide additional flexibility by enabling changes to default settings.

Usage Examples

Command Line Interface (CLI)

To use a model script with the CLI, the `hailo optimize` command includes an optional argument for specifying a custom model script file:


hailo optimize <HAR_PATH> --model-script <MODEL_SCRIPT_PATH>

Python API

Using the Python API, you can load and apply a model script like this:


client_runner.load_model_script('model_script.alls')
client_runner.optimize(calib_dataset)
compiled_hef = client_runner.compile()

About Model Scripts

A model script is a text file that can be used with the Optimize or Compile functions to tailor the compilation process. This file includes a set of commands designed for various purposes, allowing you to fine-tune model behavior.
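For illustration, a model script is simply such commands listed one per line. A minimal sketch combining commands described below (the layer name and numeric values are placeholders, not taken from a real model):

 # normalize inputs on-chip, pick an optimization/compression trade-off, and add NMS
 norm_layer1 = normalization([123.675, 116.28, 103.53], [58.395, 57.12, 57.375], input_layer1)
 model_optimization_flavor(optimization_level=2, compression_level=1)
 nms_postprocess(meta_arch=yolov5, engine=cpu)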

Full-Precision Optimization Stage

Overview

The Model Modification commands modify a parsed model to add transformations that were not part of the original ONNX/TF model. These modifications aim to reduce CPU load.

Example Modifications

  • Apply normalization at the inputs.
  • Convert formats or colors at the input stage.
  • Resize the input from source resolution to the model's resolution.
  • Add post-processing to your model (on supported architectures only).

Model Modification Commands

The following model modification commands are supported. Each command inserts a layer directly after the input layer. Commands are applied sequentially in the script, so the layer added by the last-executed command ends up immediately after the input layer.

  • input_conversion
  • transpose
  • normalization
  • nms_postprocess
  • change_output_activation
  • logits_layer
  • set_seed
  • resize

Example Command Sequence

For instance:

 input_layer1 -> reshape_yuy2 -> norm_layer1
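This chain could be produced by running the normalization command first and the conversion command last, since the last-executed command's layer lands closest to the input (a sketch; mean_array and std_array stand in for real values):

 norm_layer1 = normalization(mean_array, std_array, input_layer1)
 reshape_yuy2 = input_conversion(input_layer1, yuy2_to_hailo_yuv)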
Command Explanations
Input Conversion

Purpose: Adds on-chip conversion of the input tensor, such as color and format conversions.

Types of Conversions
Color Conversions
  • yuv_full_range_to_rgb - Converts YUV to RGB with full range.
  • yuv_to_rgb / yuv601_to_rgb - Complies with ITU-R BT.601 standard.
  • yuv709_to_rgb - Complies with ITU-R BT.709 standard.
  • yuv_full_range_to_bgr - Converts YUV to BGR with full range.
  • yuv_to_bgr / yuv601_to_bgr - Complies with ITU-R BT.601 standard.
  • yuv709_to_bgr - Complies with ITU-R BT.709 standard.
  • bgr_to_rgb - Swaps the R and B channels.
  • rgb_to_bgr - Swaps the R and B channels.
Format Conversions
  • yuy2_to_hailo_yuv - Converts YUY2 to YUV.
  • nv12_to_hailo_yuv - Converts NV12 to YUV.
  • nv21_to_hailo_yuv - Converts NV21 to YUV.
  • i420_to_hailo_yuv - Converts I420 to YUV.
  • tf_rgbx_to_hailo_rgb - Converts RGBX to Hailo RGB format.
Examples
  • Basic Conversion:
    rgb_layer = input_conversion(input_layer1, yuv_to_rgb)
  • Conversion that is also emulated during optimization (emulator_support=True):
    yuv_layer = input_conversion(input_layer2, yuy2_to_hailo_yuv, emulator_support=True)
Transpose

Purpose: Transposes connected components of selected input layers to improve performance.

Usage:
  • Specific Layer:
    transpose(input_layer1)
  • All Layers:
    transpose()

Note: Not supported with SpaceToDepth or DepthToSpace layers.

Normalization

Purpose: Adds normalization layers with specified mean and standard deviation values.

Example:
# adding normalization layer with the parameters mean & std after the specified input layer. Multiple commands can be used to apply different normalization to each input layer.
norm_layer1 = normalization(mean_array, std_array, input_layer)  

 # adding normalization layers after all input layers. Return value should match the number of inputs in the network
norm_layer1, norm_layer2, ... = normalization(mean_array, std_array)
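As a concrete sketch, a script that scales 0-255 inputs down to the 0-1 range would use a zero mean and a 255 standard deviation per channel (values are illustrative; substitute your model's own statistics):

 norm_layer1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0], input_layer1)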
NMS Post-Processing

The NMS (Non-Maximum Suppression) post-processing can be configured using the nms_postprocess command. This tool helps in filtering and processing object detection results.

Basic Usage

Example command:
nms_postprocess('nms_config_file.json', meta_arch=ssd)

Configuration Options

Option 1: Basic Architecture Specification
  • Specify only the architecture name using the meta_arch argument.
  • The system will either use an auto-generated config based on the detected NMS structure, or fall back to the default configuration if no structure is detected.
  • Example:
    nms_postprocess(meta_arch=ssd)
Option 2: Custom Configuration
  • Specify the architecture name plus configuration arguments.
  • Configurable parameters:
    • nms_scores_th - Score threshold
    • nms_iou_th - IoU threshold
    • image_dims - Image dimensions
    • classes - Number of classes
  • Example:
    nms_postprocess(meta_arch=yolov5, image_dims=[512, 512], classes=70)
Option 3: Custom Config File
  • Provide both the config file path and the architecture name.
  • Important: when using a config file, all parameters must be specified in the file.
  • Example:
    nms_postprocess('config_file_path', meta_arch=centernet)

Default Configuration Files

Located in site-packages/hailo_sdk_client/tools/core_postprocess/core_postprocess:
  • default_nms_config_yolov5.json
  • default_nms_config_yolov6.json
  • default_nms_config_yolox.json
  • default_nms_config_yolo8.json
  • default_nms_config_centernet.json
  • default_nms_config_ssd.json
  • default_nms_config_yolov5_seg.json

Processing Modes

Neural Core Mode (nn_core)
  • Runs NMS post-processing on the neural core.
  • Supported architectures: YOLOv5, SSD, CenterNet.
  • Example:
    nms_postprocess(meta_arch=ssd, engine=nn_core)
CPU Mode (cpu)
  • Runs NMS post-processing on the CPU.
  • Supported architectures: YOLOv5, YOLOv5 SEG, YOLOv8, SSD, YOLOX.
  • Example:
    nms_postprocess(meta_arch=yolov5_seg, engine=cpu, image_dims=[512, 512])
Auto Mode (auto)
  • Currently supported only for YOLOv5.
  • Performs bounding box decoding and score-threshold filtering on the neural core, and IoU filtering on the CPU.
  • Example:
    nms_postprocess('config_file_path', meta_arch=yolov5, engine=auto)

Important Notes

Output Formats
  • Object Detection Models:
    • Format: [batch_size, num_classes, 5, num_proposals_per_class]
    • Axis 2 format: [y_min, x_min, y_max, x_max, score]
  • Instance Segmentation Models:
    • Format: [N, 1, num_max_proposals, 6 + image_dims[0] * image_dims[1]]
    • Last axis format: [y_min, x_min, y_max, x_max, score, class, flattened masks]
Default Settings
  • Default nms_scores_th: 0.3
  • Default nms_iou_th for CPU mode: 0.6
Bounding Box Decoding
  • Decoding can run without NMS by setting bbox_decoding_only=True.
  • Example:
    nms_postprocess(meta_arch=yolov5, engine=cpu, bbox_decoding_only=True)
  • Warning: CPU-based bbox decoding may impact performance.
  • All decoded bounding boxes are normalized between 0 and 1.
Change Output Activation

Purpose: Changes the activation function of output layers.

Example:
change_output_activation(output_layer, activation)
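For instance, assuming the model's first output should end with a sigmoid (the layer and activation names here are illustrative):
change_output_activation(output_layer1, sigmoid)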
Logits Layer

Purpose: Adds a logits layer (e.g., Softmax, Argmax) after an output layer.

Example:
logits_layer1 = logits_layer(output_layer, softmax, 1)
Set Seed

Purpose: Sets the global random seed for reproducible results.

Example:
set_seed(seed=5)
Resize

Purpose: Resizes input/output tensors, either on-chip or on CPU, using bilinear interpolation by default.

Example:
  • Resize with default settings:
    resize1 = resize(conv1, resize_shapes=[256,256], resize_method=bilinear, engine=nn_core)

Numerical Optimization Stage

Overview

Important:

  • The Optimization Level determines how aggressively algorithms are applied to enhance the accuracy of a quantized model. Higher optimization levels increase accuracy but require more time and system resources.
    • Optimization levels (may change between versions):
      • -100 - Nothing is applied; all default algorithms are switched off
      • 0 - Equalization
      • 1 - Equalization + Iterative Bias Correction
      • 2 - Equalization + Finetune with 4 epochs & 1024 images
      • 3 - Equalization + Adaround with 320 epochs & 256 images on all layers
      • 4 - Equalization + Adaround with 320 epochs & 1024 images on all layers
  • The Compression Level sets the proportion of 4-bit layers in the model. Increasing the number of 4-bit layers improves the model's performance (FPS) but requires a high optimization level to compensate for any accuracy loss.
    • Compression levels (may change between versions):
      • 0 - Nothing is applied
      • 1 - Auto 4-bit ratio is set to 0.2 if the network is large enough (20% of the weights)
      • 2 - Auto 4-bit ratio is set to 0.4 if the network is large enough (40% of the weights)
      • 3 - Auto 4-bit ratio is set to 0.6 if the network is large enough (60% of the weights)
      • 4 - Auto 4-bit ratio is set to 0.8 if the network is large enough (80% of the weights)
      • 5 - Auto 4-bit ratio is set to 1.0 if the network is large enough (100% of the weights)

Example commands:

model_optimization_flavor(optimization_level=4)
model_optimization_flavor(compression_level=2)
model_optimization_flavor(optimization_level=2, compression_level=1)
model_optimization_flavor(optimization_level=2, batch_size=4)
  • Using Resolution Reduction in the Optimization stage allows the model to run at a lower spatial resolution, significantly reducing processing time.
resolution_reduction
Reduces the model resolution at all input layers so that the model can be optimized more efficiently. Marginally affects accuracy. Not supported on models that contain Fully-Connected, MatMul, or Cross-correlation layers, or when the resolution is too small.
Example commands:
# This will enable the algorithm, optimizing over an input shape of [128, 128]
pre_quantization_optimization(resolution_reduction, shape=[128, 128])
Note:
This operation doesn't modify the structure of the model's graph
Parameters
Parameter Values Default Required Description
shape [int, int] None False The shape to reduce the model resolution to.
interpolation {disabled, bilinear} bilinear False Interpolation (the default) requires a dataset at the original model size; disabled requires a dataset at the reduced resolution.
resolution_reduction per-layer
Sub-command for configuring resolution reduction per input layer, affecting its connected component. Reduces the resolution so the component can be optimized more efficiently. Marginally affects accuracy. Not supported when the component contains Fully-Connected, MatMul, or Cross-correlation layers, or when the resolution is too small.
Example commands
# This will enable the algorithm for input_layer1 connected component, optimizing over an input shape of [128, 128]
pre_quantization_optimization(resolution_reduction, layers=input_layer1, shape=[128, 128])
Note:
This operation doesn't modify the structure of the model's graph
Parameters
Parameter Values Default Required Description
shape [int, int] None False The shape to reduce the component resolution to.
interpolation {disabled, bilinear} None False Interpolation (the default) requires a dataset at the original model size; disabled requires a dataset at the reduced resolution.

Advanced Commands

Precision Mode

The precision_mode field within the quantization_param command enables selective 16-bit precision for specific layers or outputs, which can improve model accuracy.

precision_mode
Precision mode sets the bits available for the layers' weights and activation representation. There are three precision modes that could be set on the model layers using a model script command:
  • a8_w8 - which means 8-bit activations and 8-bit weights. (This is the default)
  • a8_w4 - which means 8-bit activations and 4-bit weights. Can be used to reduce memory consumption.
Supported on all layers that have weights. Compression levels automatically assign 4-bit weights to layers in the model, according to the level.
  • a16_w16 - sets 16-bit activations and weights to improve accuracy. Supported in three cases:
    • On any output node (output_layer_X)
    • On any supported node(s), see the list below
    • On the full model, in case all its layers are supported (Hailo-8 family only)
Example commands
quantization_param(conv3, precision_mode=a8_w4) # A specific 4bit layer
quantization_param(output_layer1, precision_mode=a16_w16) # A specific 16bit output layer
quantization_param([conv1, maxpool2], precision_mode=a16_w16) # Multiple 16bit layers
model_optimization_config(compression_params, auto_16bit_weights_ratio=1) # Full 16-bit network, in case all layers are supported
16-bit precision supported layers
  • Activations
  • Average Pooling
  • Concat
  • Const Input
  • Convolution
  • Deconvolution
  • Depth to Space
  • Depthwise Convolution
  • Elementwise Add / Sub*
  • External Padding
  • Feature Shuffle
  • Feature Split
  • Fully Connected (dense) [its output(s) must also be 16-bit, or be model output layers]
  • Max Pooling
  • Normalization
  • Output Layer
  • Reduce Max*
  • Reduce Sum*
  • Resize*
  • Reshape
  • Shortcut
  • Slice
  • Space to Depth
Notes
  • Layers with (*) are supported as long as they are not part of a Softmax chain.
  • It is recommended to use Finetune when using 4-bit weights.
Example:
quantization_param(layer_name, precision_mode=a16_w16)

Weights Clipping

This command allows modification of the weight clipping behavior for selected layers during quantization. It can help reduce quantization-related degradation, especially when dealing with outlier weight values. The command is applicable only to layers that have weights.

Modes
  • disabled: Disables weight clipping and ignores any previously set clipping values for the layer.
  • manual: Uses the specified clipping values as given.
  • percentile: Computes layer-wise percentiles (clipping values range from 0 to 100).
  • mmse: Ignores clipping values and applies Minimum Mean Square Estimators to clip the layer's weights.
  • mmse_if4b: Functions like mmse, but only applies clipping when the layer uses 4-bit weights. Clipping is disabled for 8-bit weights. This is the default behavior.
Example Commands
pre_quantization_optimization(weights_clipping, layers=[conv2], mode=manual, clipping_values=[-0.1, 0.8])
pre_quantization_optimization(weights_clipping, layers=[conv3], mode=percentile, clipping_values=[1.0, 99.0])
pre_quantization_optimization(weights_clipping, layers={conv*}, mode=mmse)
pre_quantization_optimization(weights_clipping, layers=[conv3, conv4], mode=mmse_if4b)
pre_quantization_optimization(weights_clipping, layers={conv*}, mode=disabled)
Note
The dynamic range of the weights remains symmetric, even when the clipping values are not symmetric.
Parameters
Parameter Values Default Required Description
mode {disabled, manual, percentile, mmse, mmse_if4b} mmse_if4b True Mode of operation, as described above
clipping_values [float, float] None False Clipping values, required when mode is manual or percentile

Activation Clipping

By default, model optimization does not apply activation clipping during quantization. This command allows you to modify this behavior for selected layers, enabling activation clipping when running the quantization API. Activation clipping can be particularly useful to reduce quantization-related degradation, especially when dealing with outlier activation values.

Modes
  • disabled: Disables activation clipping and ignores any previously set clipping values for the layer. This is the default mode.
  • manual: Uses the specified clipping values exactly as given.
  • percentile: Calculates the activation clipping values based on layer-wise percentiles (values range from 0 to 100).
Note
Activation clipping using percentiles requires multiple iterations to collect statistics, so quantization may take longer to complete when this mode is used.
Example Commands
pre_quantization_optimization(activation_clipping, layers=[conv1], mode=manual, clipping_values=[0.188, 1.3332])
pre_quantization_optimization(activation_clipping, layers=[conv1, conv2], mode=percentile, clipping_values=[0.5, 99.5])
pre_quantization_optimization(activation_clipping, layers={conv*}, mode=disabled)
Parameters
Parameter Values Default Required Description
mode {disabled, manual, percentile} disabled True Mode of operation, as described above
clipping_values [float, float] None False Clipping values, required when mode is manual or percentile
recollect_stats bool False False Indicates whether statistics should be recollected after applying the clipping

Global Average Pool Reduction

This command allows you to reduce the spatial dimensions of global average pooling layers by adding an additional average pooling layer. The kernel size of the added average pooling layer will be [1, h // division_factors[0], w // division_factors[1], 1], where h and w refer to the height and width of the input tensor, and `division_factors` are the scaling factors for these dimensions.

Example Commands
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[4, 4])

# This will disable the reduction of avgpool1
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[1, 1])
Parameters
Parameter Values Default Required Description
division_factors [int, int] None False Specifies the scaling factors for the kernel height and width

Post-Quantization Commands

post_quantization_optimization
All the features of this command optimize the model after the quantization process.
Syntax:
post_quantization_optimization(<feature>, <**kwargs>)
Features
The following features are available with this command:
  • bias_correction
  • bias_correction per-layer
  • train_encoding
  • finetune
  • adaround
  • adaround per-layer
  • mix_precision_search


1. bias_correction
This sub-command allows configuring the global bias correction behavior during the post-quantization process. This command replaces the old ibc parameter from the quantize() API.
Example Command
# This will enable the IBC during the post-quantization
post_quantization_optimization(bias_correction, policy=enabled)
Notes
  • An in-depth explanation of the IBC algorithm can be found in the following paper: IBC Algorithm (PDF).
  • Bias correction is recommended when the model contains small kernels or depth-wise layers.
Parameters
Parameter Values Default Required Description
policy {enabled, disabled} disabled False Enable or disable the bias correction algorithm. When Optimization Level ≥ 1, it may be enabled by the default policy.
cache_compression {enabled, disabled} disabled False Enable or disable the compression of layer results when cached to disk.


2. bias_correction per-layer
This sub-command allows enabling or disabling the Iterative Bias Correction (IBC) algorithm on a per-layer basis. The allowed policy means the behavior is derived from the algorithm's configuration.
Example Commands
# This will enable IBC for a specific layer
post_quantization_optimization(bias_correction, layers=[conv1], policy=enabled)

# This will disable IBC for conv layers and enable it for the other layers
post_quantization_optimization(bias_correction, policy=enabled)
post_quantization_optimization(bias_correction, layers={conv*}, policy=disabled)
Parameters
Parameter Values Default Required Description
policy {allowed, enabled, disabled} allowed False Sets bias correction behavior for a given layer. (default is allowed)



3. train_encoding
The train_encoding sub-command allows fine-tuning the model during the post-quantization process.
Parameters
Parameter Values Default Required Description
policy {enabled, disabled} disabled True Enable or disable fine-tune training. When Optimization Level ≥ 1, this can be enabled by the default policy.
dataset_size int; 0<x 1024 False Number of images used for training. An exception is thrown if the supplied calibration set data stream falls short of this value.
batch_size int; 0<x None False Number of images used together in each training step. By default, it uses the calibration batch size. The value is determined by GPU memory constraints and algorithmic considerations.
epochs int; 0≤x 8 False Number of training epochs.
learning_rate float None False The base learning rate used for schedule calculation. Default value: `0.0002 / 8 * batch_size`. This parameter is key for experimentation to ensure convergence, especially for architectures differing from well-performing zoo examples.
def_loss_type {ce, l2, l2rel, cosine} l2rel False Default loss type to use if `loss_types` is not specified.
loss_layer_names List of {str} None False Names of layers for teacher-student losses. By default, these are the output nodes of the network.
loss_types List of {ce, l2, l2rel, cosine} None False Loss function types to apply to layers specified in `loss_layer_names`. Default: `def_loss_type`.
loss_factors List of {float} None False Weights for loss functions applied to respective layers in `loss_layer_names`. Default: 1 for all members.
native_layers List of {str} [] False Layers not quantized during training.
native_activations {allowed, enabled, disabled} enabled False Keep activations native during training.
val_images int; 0≤x 4096 False Number of validation images for evaluation between epochs.
val_batch_size int; 0≤x 128 False Batch size for validation steps.
stop_gradient_at_loss bool False False Stops gradient propagation after each loss layer.
force_pruning bool True False If true, forces zero weights to remain zero during training.
Advanced Parameters
Parameter Values Default Required Description
layers_to_freeze List of {str} [] False Freezes (prevents modification of weights and biases) any layer whose name includes an entry from this list.
lr_schedule_type {cosine_restarts, exponential, constant} cosine_restarts False Learning rate decay schedule type. Default is cosine decay.
decay_rate float 0.5 False Decay factor of the learning rate at the start of each "decay period."
decay_epochs int; 0≤x 1 False Duration of the "decay period" in epochs.
warmup_epochs int; 0≤x 1 False Duration of the warm-up period in epochs.
warmup_lr float None False Learning rate during the warm-up period. Defaults to 1/4 of the base learning rate.
optimizer {adam, sgd, momentum, rmsprop} adam False Optimizer to use. Default is Adam. For SGD, use `sgd`.
bias_only bool False False Trains only biases while freezing weights.
warmup_strategy {constant, gradual} gradual False Strategy for the learning rate warm-up stage.
wraparound_factor float; 0≤x 0.1 False Factor for wraparound loss.
shuffle_buffer_size int; 0≤x 1 False Buffer size for shuffling the dataset. A value of 0 uses the entire dataset size.
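These parameters combine into a command following the generic post_quantization_optimization syntax; a hedged example with illustrative values:

post_quantization_optimization(train_encoding, policy=enabled, dataset_size=2048, epochs=8)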



4. finetune
This sub-command enables knowledge distillation-based fine-tuning of the quantized graph.
Example Commands
# Enable fine-tune with default configuration
post_quantization_optimization(finetune)

# Enable fine-tune with a larger dataset
post_quantization_optimization(finetune, dataset_size=4096)
Parameters
Parameter Values Default Required Description
policy {enabled, disabled} disabled True Enable or disable fine-tune training. When Optimization Level ≥ 1, this can be enabled by the default policy.
dataset_size int; 0<x 1024 False Number of images used for training. An exception is thrown if the supplied calibration set data stream falls short of this value.
batch_size int; 0<x None False Uses the calibration batch size by default. Number of images processed together in each training step. This value is influenced by GPU memory constraints and the algorithmic impact, which can oppose the effect of learning_rate.
epochs int; 0≤x 4 False Number of training epochs.
learning_rate float None False Base learning rate for the schedule calculation. Default value: `0.0002 / 8 * batch_size`. This is a key parameter for experimentation, especially for architectures differing from well-performing zoo examples.
def_loss_type {ce, l2, l2rel, cosine} l2rel False Default loss type used if `loss_types` is not specified.
loss_layer_names List of {str} None False Names of layers for teacher-student losses, given in Hailo HN notation (e.g., conv20, fc1). Default: the network's output nodes.
loss_types List of {ce, l2, l2rel, cosine} None False Loss function types applied to the respective layers specified in `loss_layer_names`. Default: `def_loss_type`.
loss_factors List of {float} None False Weights for loss functions on layers specified in `loss_layer_names`. Default: 1 for all entries.
native_layers List of {str} [] False Layers not quantized during training.
native_activations {allowed, enabled, disabled} disabled False Keep activations native during training.
val_images int; 0≤x 4096 False Number of validation images used for evaluation between epochs.
val_batch_size int; 0≤x 128 False Batch size for validation steps.
stop_gradient_at_loss bool False False Stops gradient propagation after each loss layer.
force_pruning bool True False Forces zero weights to remain zero during training.
Advanced Parameters
Parameter Values Default Required Description
layers_to_freeze List of {str} [] False Freezes (prevents modification of weights and biases) any layer whose name includes an entry from this list.
lr_schedule_type {cosine_restarts, exponential, constant} cosine_restarts False Learning rate decay schedule type. Default: cosine decay.
decay_rate float 0.5 False Factor by which the learning rate is decayed at the beginning of each "decay period."
decay_epochs int; 0≤x 1 False Duration of the "decay period" in epochs.
warmup_epochs int; 0≤x 1 False Duration of the warm-up period in epochs, applied before the main schedule begins.
warmup_lr float None False Learning rate during the warm-up period. Defaults to 1/4 of the base learning rate.
optimizer {adam, sgd, momentum, rmsprop} adam False Optimizer to use. Default is Adam. For SGD, set to `sgd`.
bias_only bool False False Trains only biases while freezing weights.
warmup_strategy {constant, gradual} constant False Strategy for learning rate warm-up.
wraparound_factor float; 0≤x 0 False Factor for wraparound loss.
shuffle_buffer_size int; 0≤x 1 False Buffer size for shuffling the dataset. A value of 0 uses the entire dataset size.


5. adaround
The Adaround algorithm optimizes layers' quantization by training the rounding of kernel weights layer-by-layer.
Enabling Adaround
To enable Adaround, use a high optimization level (≥3) or the explicit command:
post_quantization_optimization(adaround, policy=enabled)
Adaround is primarily used at the highest optimization levels to mitigate quantization degradation. It is resource-intensive and requires a robust system to run effectively.
Recommendations for Reducing Resource Usage
  • Install the DALI Package
DALI accelerates the algorithm. Example installation:
  pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110 nvidia-dali-tf-plugin-cuda110
  • Lower Batch Size
Reduces memory usage but increases runtime. Example:
  post_quantization_optimization(adaround, policy=enabled, batch_size=8)
  • Enable Cache Compression
Reduces disk usage at the cost of increased runtime. Example:
  post_quantization_optimization(adaround, cache_compression=enabled, policy=enabled)
  • Use Smaller Dataset
Reduces memory consumption but might affect accuracy. Example:
  post_quantization_optimization(adaround, policy=enabled, dataset_size=256)
  • Disable Bias Training
Reduces runtime but may affect accuracy. Example:
  post_quantization_optimization(adaround, policy=enabled, train_bias=False)
  • Reduce Epochs
Lowers runtime but might impact accuracy. Example:
  post_quantization_optimization(adaround, policy=enabled, epochs=100)
Parameters
Parameter Values Default Required Description
policy {enabled, disabled} disabled False Enable or disable the Adaround algorithm. May be enabled by the default policy at Optimization Level ≥ 3.
learning_rate float; 0<x 0.001 False Learning rate for gradient descent.
batch_size int; 0<x 32 False Batch size for the Adaround algorithm.
dataset_size int; 0<x 1024 False Number of data samples for Adaround.
epochs int; 0<x 320 False Number of training epochs.
warmup float; 0≤x≤1 0.2 False Ratio of warmup epochs to total epochs.
weight float; 0<x 0.01 False Regularization weight. Higher values emphasize rounding cost over reconstruction loss (MSE).
train_bias bool True False Whether to train biases (applies bias correction if bias is not trained).
bias_correction_count int 64 False Number of samples used for bias correction.
mode {train_4bit, train_all} train_4bit False Defines the training mode. Default is `train_4bit`.
cache_compression {enabled, disabled} disabled False Enable or disable caching compression on disk.
Advanced Parameters
Parameter Values Default Required Description
b_range [float, float] [20, 2] False Defines the max and min values for temperature decay.
decay_start float; 0≤x≤1 0 False Ratio of training time without round regularization decay (`b`).


6. adaround per-layer
This sub-command allows toggling specific layers in the Adaround algorithm individually.
Example Commands
Enable or disable Adaround for specific layers:
# Disable Adaround for a specific layer
post_quantization_optimization(adaround, layers=[conv1], policy=disabled)

# Enable Adaround for specific layers
post_quantization_optimization(adaround, layers=[conv17, conv18], policy=enabled)
Parameters
Parameter Values Default Required Description
policy {allowed, enabled, disabled} allowed False Toggles Adaround behavior for the specified layer(s).
epochs int None False Number of training epochs for the specified layer(s).
weight float; 0<x None False Regularization weight for round regularization.
b_range [float, float] None False Temperature decay range for the specified layer(s).
decay_start float; 0≤x≤1 None False Ratio of round training time without regularization decay (`b`).
train_bias bool None False Toggles bias training for the specified layer(s).
warmup float; 0≤x≤1 None False Ratio of warmup epochs out of total epochs for the specified layer(s).
dataset_size int; 0<x None False Number of data samples used during training for the specified layer(s).
batch_size int; 0<x None False Batch size for training or inference for the specified layer(s).


7. mix_precision_search
This algorithm identifies the optimal precision configuration for a model using the Signal-to-Noise Ratio (SNR). SNR quantifies how much a signal is corrupted by noise and aids in balancing the compression applied to operations against the error introduced by this compression.
Parameters
Parameter Values Default Required Description
policy {enabled, disabled} disabled False Enables or disables the mix precision search.
dataset_size int; 0<x 16 False Number of images used for profiling.
batch_size int; 0<x 8 False Number of images processed together in each inference step.
snr_cap int; 0<x 140 False Maximum SNR value to be considered during the search.
compresions_markers List of {float} [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2] False Compression markers to guide the algorithm.
optimizer {linear, pareto} linear False Optimization strategy for precision configuration.
output_regulizer {harmony} harmony False Regulation function applied to the output.
comprecision_metric {macs, bops, weighs} bops False Metric used to evaluate the precision configuration.
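Using the generic post_quantization_optimization syntax, enabling the search might look like this (the values mirror the defaults above and are illustrative):

post_quantization_optimization(mix_precision_search, policy=enabled, dataset_size=16, batch_size=8)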

Checker Configuration

When Optimization Level < 2, the checker_cfg can be manually enabled to gather activation statistics. This data can be analyzed with the profiler for detailed insights.

  • The checker configuration is enabled by default when the Optimization Level is set to 2 or higher.
checker_cfg
The Checker Config generates information about the quantization process using the layer analysis tool.
Example commands
  • This will disable the algorithm:
      model_optimization_config(checker_cfg, policy=disabled)
Note: This operation does not modify the structure of the model’s graph.
Parameters
Parameter Values Default Required Description
policy {enabled, disabled} enabled False Enables or disables the checker algorithm during the quantization process.
dataset_size int; 0<x 16 False Number of images used for profiling.
batch_size int; 0<x None False Number of images used together in each inference step; uses the calibration batch size by default.
analyze_mode {simple, advanced} simple False The analysis mode used during execution. 'simple' analyzes the fully quantized model, while 'advanced' analyzes layer by layer. Default is simple.
batch_norm_checker bool True False Whether the algorithm should display a warning message when the gathered layer statistics differ from the expected distribution in batch normalization. Default is True.
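Conversely, to enable the checker manually at a low optimization level with layer-by-layer analysis, a sketch based on the parameters above:

model_optimization_config(checker_cfg, policy=enabled, analyze_mode=advanced)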

Compilation Stage

Performance Mode
The Performance Mode can be used to compile the model for the highest possible resource utilization, aiming to maximize performance (FPS).
Note: Expect the compilation time to increase dramatically when using this mode.
Performance Param
Definition
performance_param(compiler_optimization_level=max)
Description
Setting this parameter enters performance mode, in which the compiler tries as hard as it can to find a single-context solution with the highest performance. Compilation takes significantly longer in this mode because the compiler attempts very high utilization levels that might not allocate successfully; if allocation fails, it automatically retries with lower utilization until it finds the highest level that succeeds.
  • compiler_optimization_level - supports 0, 1 (default), 2, and max.
  • 0 - returns the first feasible solution found.
  • 1 - returns the best solution under default utilization.
  • 2 (or max) - exhaustively searches for the best utilization.
Notes:
  • The compiler is required to meet the specified FPS (see the sketch after this list).
  • The compiler will ignore this command if the model is multi-context.
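The FPS target itself is typically set through the same command. Assuming performance_param accepts an fps argument (the argument and value here are assumptions; check your Dataflow Compiler version):

performance_param(fps=250)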
Remove Node
Definition
remove_node(layer_name)
Example
remove_node(conv1)
Description
This command removes a layer from the network. It is useful for removing unnecessary layers provided by the HN. It should be used internally only, and with caution.
  • layer_name – the name of the layer to remove.


Suggestions
Suggestions for the compilation can be supplied (for example, compiling for platforms with low PCIe bandwidth).
Platform Param
Definition
platform_param(param=value)
Examples
platform_param(targets=[ethernet])
platform_param(hints=[low_pcie_bandwidth])
Description
This sets several parameters regarding the platform hosting Hailo as described below:
  • targets – a list or a single value of hosting-target restrictions, such as Ethernet, which requires disabling a set of features.
Current supported targets:
  • Ethernet, which disables the following features:
    • DDR portals, since DDR access through PCIe is not available.
    • Context Switch (multi contexts), since DDR access is not available.
    • Sequencers (fast PCIe-based model loading).
  • hints – a list of hints, or a single hint, about the hosting platform, such as low PCIe bandwidth, which optimizes performance for specific scenarios.
Current supported hints:
  • low_pcie_bandwidth - adjusts the compiler to reduce PCIe bandwidth by disabling features or changing the decision thresholds that govern when PCIe should be used.


Automatic Model Script
The Automatic model script can be used to pin the compilation results to a previously compiled version of the same model. After the compilation process, in addition to the binary .hef file, a compiled HAR (Hailo ARchive) file is created. This HAR file contains the final compilation results, as well as an automatic model script (.auto.alls) file, which contains the exact instructions the compiler needs to recreate the same binary file (for the specific Dataflow Compiler version). This model script can be used to compile the model again (from the corresponding quantized HAR file) for a quick compilation.
Extraction of the automatic model script out of the compiled HAR file is done with the command:
 hailo har extract <COMPILED_HAR_PATH> --auto-model-script-path auto_model_script_file.alls
The extracted model script can be used in this manner:
 hailo compiler <QUANTIZED_HAR_PATH> --model-script auto_model_script_file.alls