Model Scripts
Model Scripts allow you to customize the behavior of the Dataflow Compiler and make adjustments to your model. While it’s generally recommended to optimize and compile using the default configuration—whether through the CLI tools or Python APIs—Model Scripts provide additional flexibility by enabling changes to default settings.
Usage Examples
Command Line Interface (CLI)
To use a model script with the CLI, the `hailo optimize` command includes an optional argument for specifying a custom model script file:
hailo optimize <HAR_PATH> --model-script <MODEL_SCRIPT_PATH>
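The optimized HAR is then passed to the compiler to produce the HEF binary (this mirrors the compiler invocation shown later in this document):
hailo compiler <HAR_PATH>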
Python API
Using the Python API, you can load and apply a model script like this:
client_runner.load_model_script('model_script.alls')
client_runner.optimize(calib_dataset)
compiled_hef = client_runner.compile()
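A fuller end-to-end sketch of the same flow (hedged: the ClientRunner construction from a HAR file and the HEF-saving step are assumptions about the hailo_sdk_client API; adjust to your environment):

import numpy as np
from hailo_sdk_client import ClientRunner

# Assumption: ClientRunner can be constructed from a parsed HAR path
runner = ClientRunner(har='model.har')
calib_dataset = np.load('calib_set.npy')  # calibration images, NHWC layout

runner.load_model_script('model_script.alls')  # apply the model script
runner.optimize(calib_dataset)                 # quantize with the script applied
hef = runner.compile()                         # compile to a HEF binary

with open('model.hef', 'wb') as f:
    f.write(hef)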
About Model Scripts
A model script is a text file that can be used with the Optimize or Compile functions to tailor the compilation process. This file includes a set of commands designed for various purposes, allowing you to fine-tune model behavior.
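For instance, a minimal model script is just a few such commands in a plain-text .alls file (the normalization values below are illustrative, using the common ImageNet mean/std):
normalization1 = normalization([123.675, 116.28, 103.53], [58.395, 57.12, 57.375])
model_optimization_flavor(optimization_level=2)
performance_param(compiler_optimization_level=max)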
Full-Precision Optimization Stage
Overview
The Model Modification commands modify a parsed model to add transformations that were not part of the original ONNX/TF model. These modifications aim to reduce the load on the host CPU by moving pre- and post-processing onto the device.
Example Modifications
- Apply normalization at the inputs.
- Convert formats or colors at the input stage.
- Resize the input from source resolution to the model's resolution.
- Add post-processing to your model (on supported architectures only).
Model Modification Commands
The following model modification commands are supported. Each command inserts a layer directly after the input layer; commands are applied in script order, so the layer added by the last-executed command ends up immediately after the input layer.
- input_conversion
- transpose
- normalization
- nms_postprocess
- change_output_activation
- logits_layer
- set_seed
- resize
Example Command Sequence
For instance:
input_layer1 -> reshape_yuy2 -> norm_layer1
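A hedged sketch of a script that could produce this chain (per the insertion rule above, the conversion command runs last so its layer lands next to the input; mean_array and std_array are placeholders):
norm_layer1 = normalization(mean_array, std_array, input_layer1)
reshape_yuy2 = input_conversion(input_layer1, yuy2_to_hailo_yuv)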
Command Explanations
Input Conversion
Purpose: Adds on-chip conversion of the input tensor, such as color and format conversions.
- Types of Conversions
- Color Conversions
- yuv_full_range_to_rgb - Converts YUV to RGB with full range.
- yuv_to_rgb / yuv601_to_rgb - Complies with ITU-R BT.601 standard.
- yuv709_to_rgb - Complies with ITU-R BT.709 standard.
- yuv_full_range_to_bgr - Converts YUV to BGR with full range.
- yuv_to_bgr / yuv601_to_bgr - Complies with ITU-R BT.601 standard.
- yuv709_to_bgr - Complies with ITU-R BT.709 standard.
- bgr_to_rgb - Swaps the R and B channels.
- rgb_to_bgr - Swaps the R and B channels.
- Format Conversions
- yuy2_to_hailo_yuv - Converts YUY2 to YUV.
- nv12_to_hailo_yuv - Converts NV12 to YUV.
- nv21_to_hailo_yuv - Converts NV21 to YUV.
- i420_to_hailo_yuv - Converts i420 to YUV.
- tf_rgbx_to_hailo_rgb - Converts RGBX to Hailo RGB format.
- Examples
- Basic Conversion:
rgb_layer = input_conversion(input_layer1, yuv_to_rgb)
- Conversion with Optimization Inclusion:
yuv_layer = input_conversion(input_layer2, yuy2_to_hailo_yuv, emulator_support=True)
Transpose
Purpose: Transposes connected components of selected input layers to improve performance.
- Usage:
- Specific Layer:
transpose(input_layer1)
- All Layers:
transpose()
Note: Not supported with SpaceToDepth or DepthToSpace layers.
Normalization
Purpose: Adds normalization layers with specified mean and standard deviation values.
- Example:
# Adding a normalization layer with the parameters mean & std after the specified input layer.
# Multiple commands can be used to apply a different normalization to each input layer.
norm_layer1 = normalization(mean_array, std_array, input_layer)
# Adding normalization layers after all input layers. The number of return values should
# match the number of inputs in the network.
norm_layer1, norm_layer2, ... = normalization(mean_array, std_array)
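A concrete single-input example (the values are the common ImageNet mean/std, shown here for illustration):
norm_layer1 = normalization([123.675, 116.28, 103.53], [58.395, 57.12, 57.375], input_layer1)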
NMS Post-Processing
The NMS (Non-Maximum Suppression) post-processing can be configured using the nms_postprocess command. This tool helps in filtering and processing object detection results.
Basic Usage
- Example command:
nms_postprocess('nms_config_file.json', meta_arch=ssd)
Configuration Options
- Option 1: Basic Architecture Specification
- Simply specify the architecture name using the meta_arch argument
- System will either:
- Use auto-generated config from detected NMS structure
- OR use default configuration if no structure is detected
- Example:
nms_postprocess(meta_arch=ssd)
- Option 2: Custom Configuration
- Specify architecture name plus configuration arguments
- Configurable parameters:
- nms_scores_th - Score threshold
- nms_iou_th - IoU threshold
- image_dims - Image dimensions
- classes - Number of classes
- Example:
nms_postprocess(meta_arch=yolov5, image_dims=[512, 512], classes=70)
- Option 3: Custom Config File
- Provide both config file path and architecture name
- Important: When using a config file, all parameters must be specified in the file
- Example:
nms_postprocess('config_file_path', meta_arch=centernet)
Default Configuration Files
- Located in site-packages/hailo_sdk_client/tools/core_postprocess/core_postprocess:
default_nms_config_yolov5.json
default_nms_config_yolov6.json
default_nms_config_yolox.json
default_nms_config_yolo8.json
default_nms_config_centernet.json
default_nms_config_ssd.json
default_nms_config_yolov5_seg.json
Processing Modes
- Neural Core Mode (nn_core)
- Runs NMS post-processing on neural core
- Supported architectures: YOLOv5, SSD, Centernet
- Example:
nms_postprocess(meta_arch=ssd, engine=nn_core)
- CPU Mode (cpu)
- Runs NMS post-processing on CPU
- Supported architectures: YOLOv5, YOLOv5 SEG, YOLOv8, SSD, YOLOX
- Example:
nms_postprocess(meta_arch=yolov5_seg, engine=cpu, image_dims=[512, 512])
- Auto Mode (auto)
- Currently only supported for YOLOv5
- Performs:
- Bounding box decoding on neural core
- Score threshold filtering on neural core
- IoU filtering on CPU
- Example:
nms_postprocess('config_file_path', meta_arch=yolov5, engine=auto)
Important Notes
- Output Formats
  - Object Detection Models (a parsing sketch follows this list):
    Format: [batch_size, num_classes, 5, num_proposals_per_class]
    Axis 2 format: [y_min, x_min, y_max, x_max, score]
  - Instance Segmentation Models:
    Format: [N, 1, num_max_proposals, 6 + image_dims[0] * image_dims[1]]
    Last axis format: [y_min, x_min, y_max, x_max, score, class, flattened masks]
- Default Settings
- Default nms_scores_th: 0.3
- Default nms_iou_th for CPU mode: 0.6
- Bounding Box Decoding
- Can run without NMS using bbox_decoding_only=True
- Example:
nms_postprocess(meta_arch=yolov5, engine=cpu, bbox_decoding_only=True)
- Warning: CPU-based bbox decoding may impact performance
- All decoded bounding boxes are normalized between 0 and 1
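A minimal sketch of decoding the object-detection output described above (hypothetical helper, not part of the SDK; assumes the tensor is a NumPy array in the [batch, num_classes, 5, proposals] layout, with boxes normalized to [0, 1]):

import numpy as np

def extract_detections(nms_out, score_th=0.3):
    # Collect (class, score, box) tuples from the NMS output of image 0.
    detections = []
    _, num_classes, _, num_props = nms_out.shape
    for cls in range(num_classes):
        for p in range(num_props):
            y_min, x_min, y_max, x_max, score = nms_out[0, cls, :, p]
            if score >= score_th:  # default nms_scores_th is 0.3
                detections.append((cls, float(score), (x_min, y_min, x_max, y_max)))
    return detections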
Change Output Activation
Purpose: Changes the activation function of output layers.
- Example:
change_output_activation(output_layer, activation)
Logits Layer
Purpose: Adds a logits layer (e.g., Softmax, Argmax) after an output layer.
- Example:
logits_layer1 = logits_layer(output_layer, softmax, 1)
Set Seed
Purpose: Sets the global random seed for reproducible results.
- Example:
set_seed(seed=5)
Resize
Purpose: Resizes input/output tensors, either on-chip or on CPU, using bilinear interpolation by default.
- Example:
- Resize with default settings:
resize1 = resize(conv1, resize_shapes=[256,256], resize_method=bilinear, engine=nn_core)
Numerical Optimization Stage
Overview
Important:
- The Optimization Level determines how aggressively algorithms are applied to enhance the accuracy of a quantized model. Higher optimization levels increase accuracy but require more time and system resources.
- Optimization levels (might change every version):
  - -100 - nothing is applied; all default algorithms are switched off
  - 0 - Equalization
  - 1 - Equalization + Iterative Bias Correction
  - 2 - Equalization + Finetune with 4 epochs & 1024 images
  - 3 - Equalization + Adaround with 320 epochs & 256 images on all layers
  - 4 - Equalization + Adaround with 320 epochs & 1024 images on all layers
- The Compression Level sets the proportion of 4-bit layers in the model. Increasing the number of 4-bit layers improves the model's performance (FPS) but requires a high optimization level to compensate for any accuracy loss.
- Compression levels (might change every version):
  - 0 - nothing is applied
  - 1 - auto 4-bit is set to 0.2 if the network is large enough (20% of the weights)
  - 2 - auto 4-bit is set to 0.4 if the network is large enough (40% of the weights)
  - 3 - auto 4-bit is set to 0.6 if the network is large enough (60% of the weights)
  - 4 - auto 4-bit is set to 0.8 if the network is large enough (80% of the weights)
  - 5 - auto 4-bit is set to 1.0 if the network is large enough (100% of the weights)
Example commands:
model_optimization_flavor(optimization_level=4)
model_optimization_flavor(compression_level=2)
model_optimization_flavor(optimization_level=2, compression_level=1)
model_optimization_flavor(optimization_level=2, batch_size=4)
- Using Resolution Reduction in the Optimization stage allows the model to run at a lower spatial resolution, significantly reducing processing time.
- resolution_reduction
- Reduces the model resolution at all input layers in order to optimize the model more efficiently. Marginally affects accuracy. Not supported on models that contain Fully-connected, Matmul, or Cross-correlation layers, or when the resolution is too small.
- Example commands:
# This will enable the algorithm, optimizing over an input shape of [128, 128]
pre_quantization_optimization(resolution_reduction, shape=[128, 128])
- Note: This operation doesn't modify the structure of the model's graph.
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
shape | [int, int] | None | False | The shape to reduce the model resolution to. |
interpolation | {disabled, bilinear} | bilinear | False | bilinear (the default) requires a dataset at the original model resolution; disabled requires a dataset at the reduced resolution. |
- resolution_reduction per-layer
- Sub-command for configuring resolution reduction per input layer, affecting its connected component. Reduces the resolution in order to optimize more efficiently. Marginally affects accuracy. Not supported when the component contains Fully-connected, Matmul, or Cross-correlation layers, or when the resolution is too small.
- Example commands
# This will enable the algorithm for input_layer1's connected component, optimizing over an input shape of [128, 128]
pre_quantization_optimization(resolution_reduction, layers=input_layer1, shape=[128, 128])
- Note: This operation doesn't modify the structure of the model's graph.
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
shape | [int, int] | None | False | The shape to reduce the component resolution to. |
interpolation | {disabled, bilinear} | None | False | bilinear requires a dataset at the original model resolution; disabled requires a dataset at the reduced resolution. |
Advanced Commands
Precision Mode
The precision_mode field within the quantization_param command enables selective 16-bit precision for specific layers or outputs, which can improve model accuracy.
- precision_mode
  - Precision mode sets the number of bits used to represent the layers' weights and activations. Three precision modes can be set on the model layers using a model script command:
    - a8_w8 - 8-bit activations and 8-bit weights. (This is the default.)
    - a8_w4 - 8-bit activations and 4-bit weights. Can be used to reduce memory consumption. Supported on all layers that have weights; compression levels automatically assign 4-bit to layers in the model, according to the level.
    - a16_w16 - 16-bit activations and weights, to improve accuracy results. Supported in three cases:
      - On any output node (output_layer_X)
      - On any supported node(s), see the list below
      - On the full model, in case all its layers are supported (Hailo-8 family only)
- Example commands
quantization_param(conv3, precision_mode=a8_w4)  # A specific 4-bit layer
quantization_param(output_layer1, precision_mode=a16_w16)  # A specific 16-bit output layer
quantization_param([conv1, maxpool2], precision_mode=a16_w16)  # Multiple 16-bit layers
model_optimization_config(compression_params, auto_16bit_weights_ratio=1)  # Full 16-bit network, in case all layers are supported
- 16-bit precision supported layers
- Activations
- Average Pooling
- Concat
- Const Input
- Convolution
- Deconvolution
- Depth to Space
- Depthwise Convolution
- Elementwise Add / Sub*
- External Padding
- Feature Shuffle
- Feature Split
- Fully Connected (dense) [its output(s) must also be 16-bit, or be model output layers]
- Max Pooling
- Normalization
- Output Layer
- Reduce Max*
- Reduce Sum*
- Resize*
- Reshape
- Shortcut
- Slice
- Space to Depth
- Notes
- Layers with (*) are supported as long as they are not part of a Softmax chain.
- It is recommended to use Finetune when using 4-bit weights.
- Example:
quantization_param(layer_name, precision_mode=a16_w16)
Weights Clipping
This command allows modification of the weight clipping behavior for selected layers during quantization. It can help reduce quantization-related degradation, especially when dealing with outlier weight values. The command is applicable only to layers that have weights.
- Modes
- disabled: Disables weight clipping and ignores any previously set clipping values for the layer.
- manual: Uses the specified clipping values as given.
- percentile: Computes layer-wise percentiles (clipping values range from 0 to 100).
- mmse: Ignores clipping values and applies Minimum Mean Square Estimators to clip the layer's weights.
- mmse_if4b: Functions like mmse, but only applies clipping when the layer uses 4-bit weights. Clipping is disabled for 8-bit weights. This is the default behavior.
- Example Commands
pre_quantization_optimization(weights_clipping, layers=[conv2], mode=manual, clipping_values=[-0.1, 0.8])
pre_quantization_optimization(weights_clipping, layers=[conv3], mode=percentile, clipping_values=[1.0, 99.0])
pre_quantization_optimization(weights_clipping, layers={conv*}, mode=mmse)
pre_quantization_optimization(weights_clipping, layers=[conv3, conv4], mode=mmse_if4b)
pre_quantization_optimization(weights_clipping, layers={conv*}, mode=disabled)
- Note
- The dynamic range of the weights remains symmetric, even when the clipping values are not symmetric.
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
mode | {disabled, manual, percentile, mmse, mmse_if4b} | mmse_if4b | True | Mode of operation, as described above |
clipping_values | [float, float] | None | False | Clipping values, required when mode is manual or percentile |
Activation Clipping
By default, model optimization does not apply activation clipping during quantization. This command allows you to modify this behavior for selected layers, enabling activation clipping when running the quantization API. Activation clipping can be particularly useful to reduce quantization-related degradation, especially when dealing with outlier activation values.
- Modes
- disabled: Disables activation clipping and ignores any previously set clipping values for the layer. This is the default mode.
- manual: Uses the specified clipping values exactly as given.
- percentile: Calculates the activation clipping values based on layer-wise percentiles (values range from 0 to 100).
- Note
- Activation clipping using percentiles requires multiple iterations to collect statistics, so quantization may take longer to complete when this mode is used.
- Example Commands
pre_quantization_optimization(activation_clipping, layers=[conv1], mode=manual, clipping_values=[0.188, 1.3332])
pre_quantization_optimization(activation_clipping, layers=[conv1, conv2], mode=percentile, clipping_values=[0.5, 99.5])
pre_quantization_optimization(activation_clipping, layers={conv*}, mode=disabled)
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
mode | {disabled, manual, percentile} | disabled | True | Mode of operation, as described above |
clipping_values | [float, float] | None | False | Clipping values, required when mode is manual or percentile |
recollect_stats | bool | False | False | Indicates whether statistics should be recollected after applying the clipping |
Global Average Pool Reduction
This command allows you to reduce the spatial dimensions of global average pooling layers by adding an additional average pooling layer. The kernel size of the added average pooling layer will be [1, h // division_factors[0], w // division_factors[1], 1], where h and w refer to the height and width of the input tensor, and `division_factors` are the scaling factors for these dimensions. For example, a 14x14 input with division_factors=[7, 7] yields an added kernel of [1, 2, 2, 1].
- Example Commands
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[4, 4])
# This will disable the reduction of avgpool1
pre_quantization_optimization(global_avgpool_reduction, layers=avgpool1, division_factors=[1, 1])
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
division_factors | [int, int] | None | False | Specifies the scaling factors for the kernel height and width |
Post-Quantization Commands
- post_quantization_optimization
- All the features of this command optimize the model after the quantization process.
- Syntax:
post_quantization_optimization(<feature>, <**kwargs>)
- Features
  - The following features are available with this command:
- bias_correction
- bias_correction per-layer
- train_encoding
- finetune
- adaround
- adaround per-layer
- mix_precision_search
- 1. bias_correction
- This sub-command allows configuring the global bias correction behavior during the post-quantization process. This command replaces the old ibc parameter from the quantize() API.
- Example Command
# This will enable the IBC during the post-quantization
post_quantization_optimization(bias_correction, policy=enabled)
- Notes
- An in-depth explanation of the IBC algorithm can be found in the following paper: IBC Algorithm (PDF).
- Bias correction is recommended when the model contains small kernels or depth-wise layers.
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {enabled, disabled} | disabled | False | Enable or disable the bias correction algorithm. When Optimization Level ≥ 1, it could be enabled by the default policy. |
cache_compression | {enabled, disabled} | disabled | False | Enable or disable the compression of layer results when cached to disk. |
- 2. bias_correction per-layer
- This sub-command allows enabling or disabling the Iterative Bias Correction (IBC) algorithm on a per-layer basis. The allowed policy means the behavior is derived from the algorithm's configuration.
- Example Commands
# This will enable IBC for a specific layer
post_quantization_optimization(bias_correction, layers=[conv1], policy=enabled)
# This will disable IBC for conv layers and enable it for the other layers
post_quantization_optimization(bias_correction, policy=enabled)
post_quantization_optimization(bias_correction, layers={conv*}, policy=disabled)
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {allowed, enabled, disabled} | allowed | False | Sets bias correction behavior for a given layer. (default is allowed) |
- 3. train_encoding
- The train_encoding sub-command allows fine-tuning the model during the post-quantization process.
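- Example Command (a hedged sketch by analogy with the other sub-commands; exact argument support may differ)
post_quantization_optimization(train_encoding, policy=enabled)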
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {enabled, disabled} | disabled | True | Enable or disable fine-tune training. When Optimization Level ≥ 1, this can be enabled by the default policy. |
dataset_size | int; 0<x | 1024 | False | Number of images used for training. An exception is thrown if the supplied calibration set data stream falls short of this value. |
batch_size | int; 0<x | None | False | Number of images used together in each training step. By default, it uses the calibration batch size. The value is determined by GPU memory constraints and algorithmic considerations. |
epochs | int; 0≤x | 8 | False | Number of training epochs. |
learning_rate | float | None | False | The base learning rate used for schedule calculation. Default value: `0.0002 / 8 * batch_size`. This parameter is key for experimentation to ensure convergence, especially for architectures differing from well-performing zoo examples. |
def_loss_type | {ce, l2, l2rel, cosine} | l2rel | False | Default loss type to use if `loss_types` is not specified. |
loss_layer_names | List of {str} | None | False | Names of layers for teacher-student losses. By default, these are the output nodes of the network. |
loss_types | List of {ce, l2, l2rel, cosine} | None | False | Loss function types to apply to layers specified in `loss_layer_names`. Default: `def_loss_type`. |
loss_factors | List of {float} | None | False | Weights for loss functions applied to respective layers in `loss_layer_names`. Default: 1 for all members. |
native_layers | List of {str} | [] | False | Layers not quantized during training. |
native_activations | {allowed, enabled, disabled} | enabled | False | Keep activations native during training. |
val_images | int; 0≤x | 4096 | False | Number of validation images for evaluation between epochs. |
val_batch_size | int; 0≤x | 128 | False | Batch size for validation steps. |
stop_gradient_at_loss | bool | False | False | Stops gradient propagation after each loss layer. |
force_pruning | bool | True | False | If true, forces zero weights to remain zero during training. |
- Advanced Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
layers_to_freeze | List of {str} | [] | False | Freezes (prevents modification of weights and biases) any layer whose name includes an entry from this list. |
lr_schedule_type | {cosine_restarts, exponential, constant} | cosine_restarts | False | Learning rate decay schedule type. Default is cosine decay. |
decay_rate | float | 0.5 | False | Decay factor of the learning rate at the start of each "decay period." |
decay_epochs | int; 0≤x | 1 | False | Duration of the "decay period" in epochs. |
warmup_epochs | int; 0≤x | 1 | False | Duration of the warm-up period in epochs. |
warmup_lr | float | None | False | Learning rate during the warm-up period. Defaults to 1/4 of the base learning rate. |
optimizer | {adam, sgd, momentum, rmsprop} | adam | False | Optimizer to use. Default is Adam. For SGD, use `sgd`. |
bias_only | bool | False | False | Trains only biases while freezing weights. |
warmup_strategy | {constant, gradual} | gradual | False | Strategy for the learning rate warm-up stage. |
wraparound_factor | float; 0≤x | 0.1 | False | Factor for wraparound loss. |
shuffle_buffer_size | int; 0≤x | 1 | False | Buffer size for shuffling the dataset. A value of 0 uses the entire dataset size. |
- 4. finetune
- This sub-command enables knowledge distillation-based fine-tuning of the quantized graph.
- Example Commands
# Enable fine-tune with default configuration
post_quantization_optimization(finetune)
# Enable fine-tune with a larger dataset
post_quantization_optimization(finetune, dataset_size=4096)
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {enabled, disabled} | disabled | True | Enable or disable fine-tune training. When Optimization Level ≥ 1, this can be enabled by the default policy. |
dataset_size | int; 0<x | 1024 | False | Number of images used for training. An exception is thrown if the supplied calibration set data stream falls short of this value. |
batch_size | int; 0<x | None | False | Uses the calibration batch size by default. Number of images processed together in each training step. This value is influenced by GPU memory constraints and the algorithmic impact, which can oppose the effect of learning_rate. |
epochs | int; 0≤x | 4 | False | Number of training epochs. |
learning_rate | float | None | False | Base learning rate for the schedule calculation. Default value: `0.0002 / 8 * batch_size`. This is a key parameter for experimentation, especially for architectures differing from well-performing zoo examples. |
def_loss_type | {ce, l2, l2rel, cosine} | l2rel | False | Default loss type used if `loss_types` is not specified. |
loss_layer_names | List of {str} | None | False | Names of layers for teacher-student losses, given in Hailo HN notation (e.g., conv20, fc1). Default: the network's output nodes. |
loss_types | List of {ce, l2, l2rel, cosine} | None | False | Loss function types applied to the respective layers specified in `loss_layer_names`. Default: `def_loss_type`. |
loss_factors | List of {float} | None | False | Weights for loss functions on layers specified in `loss_layer_names`. Default: 1 for all entries. |
native_layers | List of {str} | [] | False | Layers not quantized during training. |
native_activations | {allowed, enabled, disabled} | disabled | False | Keep activations native during training. |
val_images | int; 0≤x | 4096 | False | Number of validation images used for evaluation between epochs. |
val_batch_size | int; 0≤x | 128 | False | Batch size for validation steps. |
stop_gradient_at_loss | bool | False | False | Stops gradient propagation after each loss layer. |
force_pruning | bool | True | False | Forces zero weights to remain zero during training. |
- Advanced Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
layers_to_freeze | List of {str} | [] | False | Freezes (prevents modification of weights and biases) any layer whose name includes an entry from this list. |
lr_schedule_type | {cosine_restarts, exponential, constant} | cosine_restarts | False | Learning rate decay schedule type. Default: cosine decay. |
decay_rate | float | 0.5 | False | Factor by which the learning rate is decayed at the beginning of each "decay period." |
decay_epochs | int; 0≤x | 1 | False | Duration of the "decay period" in epochs. |
warmup_epochs | int; 0≤x | 1 | False | Duration of the warm-up period in epochs, applied before the main schedule begins. |
warmup_lr | float | None | False | Learning rate during the warm-up period. Defaults to 1/4 of the base learning rate. |
optimizer | {adam, sgd, momentum, rmsprop} | adam | False | Optimizer to use. Default is Adam. For SGD, set to `sgd`. |
bias_only | bool | False | False | Trains only biases while freezing weights. |
warmup_strategy | {constant, gradual} | constant | False | Strategy for learning rate warm-up. |
wraparound_factor | float; 0≤x | 0 | False | Factor for wraparound loss. |
shuffle_buffer_size | int; 0≤x | 1 | False | Buffer size for shuffling the dataset. A value of 0 uses the entire dataset size. |
- 5. adaround
- The Adaround algorithm optimizes layers' quantization by training the rounding of kernel weights layer-by-layer.
- Enabling Adaround
- To enable Adaround, use a high optimization level (≥3) or the explicit command:
post_quantization_optimization(adaround, policy=enabled)
- Adaround is primarily used at the highest optimization levels to mitigate quantization degradation. It is resource-intensive and requires a robust system to run effectively.
- Recommendations for Reducing Resource Usage
- Install the DALI Package
- DALI accelerates the algorithm. Example installation:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110 nvidia-dali-tf-plugin-cuda110
- Lower Batch Size
- Reduces memory usage but increases runtime. Example:
post_quantization_optimization(adaround, policy=enabled, batch_size=8)
- Enable Cache Compression
- Reduces disk usage at the cost of increased runtime. Example:
post_quantization_optimization(adaround, cache_compression=enabled, policy=enabled)
- Use Smaller Dataset
- Reduces memory consumption but might affect accuracy. Example:
post_quantization_optimization(adaround, policy=enabled, dataset_size=256)
- Disable Bias Training
- Reduces runtime but may affect accuracy. Example:
post_quantization_optimization(adaround, policy=enabled, train_bias=False)
- Reduce Epochs
- Lowers runtime but might impact accuracy. Example:
post_quantization_optimization(adaround, policy=enabled, epochs=100)
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {enabled, disabled} | disabled | False | Enable or disable the Adaround algorithm. Enabled by default at Optimization Level ≥ 1. |
learning_rate | float; 0<x | 0.001 | False | Learning rate for gradient descent. |
batch_size | int; 0<x | 32 | False | Batch size for the Adaround algorithm. |
dataset_size | int; 0<x | 1024 | False | Number of data samples for Adaround. |
epochs | int; 0<x | 320 | False | Number of training epochs. |
warmup | float; 0≤x≤1 | 0.2 | False | Ratio of warmup epochs to total epochs. |
weight | float; 0<x | 0.01 | False | Regularization weight. Higher values emphasize rounding cost over reconstruction loss (MSE). |
train_bias | bool | True | False | Whether to train biases (applies bias correction if bias is not trained). |
bias_correction_count | int | 64 | False | Number of samples used for bias correction. |
mode | {train_4bit, train_all} | train_4bit | False | Defines the training mode. Default is `train_4bit`. |
cache_compression | {enabled, disabled} | disabled | False | Enable or disable caching compression on disk. |
- Advanced Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
b_range | [float, float] | [20, 2] | False | Defines the max and min values for temperature decay. |
decay_start | float; 0≤x≤1 | 0 | False | Ratio of training time without round regularization decay (`b`). |
- 6. adaround per-layer
- This sub-command allows toggling specific layers in the Adaround algorithm individually.
- Example Commands
  - Enable or disable Adaround for specific layers:
# Disable Adaround for a specific layer
post_quantization_optimization(adaround, layers=[conv1], policy=disabled)
# Enable Adaround for specific layers
post_quantization_optimization(adaround, layers=[conv17, conv18], policy=enabled)
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {allowed, enabled, disabled} | allowed | False | Toggles Adaround behavior for the specified layer(s). |
epochs | int | None | False | Number of training epochs for the specified layer(s). |
weight | float; 0<x | None | False | Regularization weight for round regularization. |
b_range | [float, float] | None | False | Temperature decay range for the specified layer(s). |
decay_start | float; 0≤x≤1 | None | False | Ratio of round training time without regularization decay (`b`). |
train_bias | bool | None | False | Toggles bias training for the specified layer(s). |
warmup | float; 0≤x≤1 | None | False | Ratio of warmup epochs out of total epochs for the specified layer(s). |
dataset_size | int; 0<x | None | False | Number of data samples used during training for the specified layer(s). |
batch_size | int; 0<x | None | False | Batch size for training or inference for the specified layer(s). |
- 7. mix_precision_search
- This algorithm identifies the optimal precision configuration for a model using the Signal-to-Noise Ratio (SNR). SNR quantifies how much a signal is corrupted by noise and aids in balancing the compression applied to operations against the error introduced by this compression.
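- Example Command (a hedged sketch following the pattern of the other features; exact argument support may differ)
post_quantization_optimization(mix_precision_search, policy=enabled)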
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {enabled, disabled} | disabled | False | Enables or disables the mix precision search. |
dataset_size | int; 0<x | 16 | False | Number of images used for profiling. |
batch_size | int; 0<x | 8 | False | Number of images processed together in each inference step. |
snr_cap | int; 0<x | 140 | False | Maximum SNR value to be considered during the search. |
compresions_markers | List of {float} | [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2] | False | Compression markers to guide the algorithm. |
optimizer | {linear, pareto} | linear | False | Optimization strategy for precision configuration. |
output_regulizer | {harmony} | harmony | False | Regulation function applied to the output. |
comprecision_metric | {macs, bops, weighs} | bops | False | Metric used to evaluate the precision configuration. |
Checker Configuration
When Optimization Level < 2, the checker_cfg can be manually enabled to gather activation statistics. This data can be analyzed with the profiler for detailed insights.
- The checker configuration is enabled by default when the Optimization Level is set to 2 or higher.
- checker_cfg
- The Checker Config generates information about the quantization process using the layer analysis tool.
- Example commands
  - This will disable the algorithm:
model_optimization_config(checker_cfg, policy=disabled)
- Note: This operation does not modify the structure of the model’s graph.
- Parameters
Parameter | Values | Default | Required | Description |
---|---|---|---|---|
policy | {enabled, disabled} | enabled | False | Enables or disables the checker algorithm during the quantization process. |
dataset_size | int; 0<x | 16 | False | Number of images used for profiling. |
batch_size | int; 0<x | None | False | Number of images used together in each inference step; uses the calibration batch size by default. |
analyze_mode | {simple, advanced} | simple | False | The analysis mode used during execution. 'simple' analyzes the fully quantized model, while 'advanced' analyzes layer by layer. Default is simple. |
batch_norm_checker | bool | True | False | Whether the algorithm should display a warning message when the gathered layer statistics differ from the expected distribution in batch normalization. Default is True. |
Compilation Stage
- Performance Mode
- The Performance Mode can be used to compile the model for the highest possible resource utilization, aiming to maximize performance (FPS).
- Note: Expect the compilation time to increase dramatically when using this mode.
- Performance Param
- Definition
performance_param(compiler_optimization_level=max)
- Description
- Setting this parameter enters performance mode, in which the compiler tries as hard as it can to find a solution that fits in a single context, with the highest performance. This method of compilation takes significantly longer, because the compiler attempts very high utilization levels that might not allocate successfully. If allocation fails, it automatically retries with lower utilization until it finds the highest utilization that allocates.
- compiler_optimization_level - supports 0, 1 (default), 2, and max.
- 0 - returns the first feasible solution found.
- 1 - returns the best solution under default utilization.
- 2 (or max) - exhaustively searches for the best utilization.
- This command requires the compiler to meet the specified FPS.
- The compiler will ignore this command if the model is Multi-Context.
- Remove Node
- Definition
remove_node(layer_name)
- Example
remove_node(conv1)
- Description
- This command removes a layer from the network. It is useful for removing layers provided by the HN that are not necessary. It should be used internally only, and with caution.
- layer_name – the name of the layer to remove.
- Suggestions
- Suggestions for the compilation can be supplied (for example: compile for platforms with low PCIe bandwidth).
- Platform Param
- Definition
platform_param(param=value)
- Examples
platform_param(targets=[ethernet])
platform_param(hints=[low_pcie_bandwidth])
- Description
- This sets several parameters regarding the platform hosting Hailo as described below:
- targets – a list or a single value of hosting target restrictions such as Ethernet which requires disabling a set of features.
- Current supported targets:
- Ethernet, which disables the following features:
- DDR portals, since the DDR access through PCIe is not available.
- Context Switch (multi contexts), since DDR access is not available.
- Sequencers (a fast PCIe-based model loading).
- hints – a list of hints, or a single hint, about the hosting platform (such as low PCIe bandwidth) that lets the compiler optimize performance for specific scenarios.
- Current supported hints:
- low_pcie_bandwidth - adjusts the compiler to reduce PCIe bandwidth usage by disabling features or changing decision thresholds regarding when PCIe should be used.
- Automatic Model Script
- The Automatic model script can be used to pin the compilation results to a previously compiled version of the same model. After the compilation process, in addition to the binary .hef file, a compiled HAR (Hailo ARchive) file is created. This HAR file contains the final compilation results, as well as an automatic model script (.auto.alls) file with the exact instructions the compiler needs to recreate the same binary (for the specific Dataflow Compiler version). This model script can be used to compile the model again (from the corresponding quantized HAR file) for a quick compilation.
- Extraction of the automatic model script out of the compiled HAR file is done with the command:
hailo har extract <COMPILED_HAR_PATH> --auto-model-script-path auto_model_script_file.alls
- The extracted model script can be used in this manner:
hailo compiler <QUANTIZED_HAR_PATH> --model-script auto_model_script_file.alls