NVIDIA Transfer Learning Toolkit
Introduction to NVIDIA Transfer Learning Toolkit
This guide shows you how to train a model in the toolkit and how to deploy it to DeepStream.
Configuring Docker
Note: If installing the image from JetPack, this step is not necessary.
- Instructions to install Docker with GPU support.
- Instructions to add NVIDIA support to Docker.
- Commands to enable the NVIDIA runtime (a sketch is shown below).
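As a sketch, on systems using nvidia-container-runtime the runtime is typically registered in /etc/docker/daemon.json and the daemon is then restarted; the exact steps depend on the nvidia-docker version you installed:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Save the snippet above as /etc/docker/daemon.json and restart the daemon:
sudo systemctl restart docker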
Troubleshooting the Docker configuration
- Error:
docker: Error response from daemon: could not select device driver "" with capabilities: gpu.
- Solution: NVIDIA drivers might need to be reinstalled. Follow the instructions in the collabnix blog on New Docker CLI API Support for NVIDIA GPUs to reinstall them. You might need to restart the device before connecting.
- Error:
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
- Solution:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.X/compat/
Downloading the container
- Create an account in NVIDIA's NGC
- Go to Account -> Setup, generate an API Key, and follow the instructions for docker. You might need to run the following commands so that your user can run docker and the registry login succeeds:
sudo usermod -aG docker $USER
newgrp docker
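With the API Key generated, log in to the NGC registry from docker; the username is the literal string $oauthtoken and the password is your API Key:
docker login nvcr.io
Username: $oauthtoken
Password: <your NGC API Key>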
- Pull the image (according to the TLT guide):
docker pull nvcr.io/nvidia/tlt-streamanalytics:v1.0.1_py2
Running the container
- Create a local directory to host the dataset and other outputs so that they persist outside the docker.
mkdir -p ~/work/tlt-experiments
cd ~/work/tlt-experiments
- Run the container
docker run --runtime=nvidia -it -v ~/work/tlt-experiments:/workspace/tlt-experiments nvcr.io/nvidia/tlt-streamanalytics:v1.0.1_py2
- Adjust the image tag accordingly if you are using a different version.
- Docker might need to be reinstalled with the following commands for it to correctly use the NVIDIA software:
sudo apt purge docker*
sudo apt install docker.io
Downloading a pre-trained model
List the available models:
ngc registry model list nvidia/iva/tlt_*_classification
Download a model:
ngc registry model download-version nvidia/iva/[model name]
For example:
ngc registry model download-version nvidia/iva/tlt_mobilenet_v2_classification:1
Training a model
- Convert your dataset to match the expected format as defined in TLT IVA Getting Started Guide - Preparing input data structure (Chapter 6); a sample conversion spec sketch is shown after the command below.
tlt-dataset-convert -d convert_config.txt -o tfrecords/tfrecord
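A minimal sketch of convert_config.txt for a KITTI-format dataset could look as follows; the paths, validation split, and shard count below are assumptions for illustration:
kitti_config {
  root_directory_path: "/workspace/tlt-experiments/data/training"
  image_dir_name: "image_2"
  label_dir_name: "label_2"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 20
  num_shards: 10
}
image_directory_path: "/workspace/tlt-experiments/data/training"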
- Add the experiment configuration file as described in TLT IVA Getting Started Guide - Creating an experiment spec file (Chapter 7).
The following could be a good example:
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tlt-experiments/tfrecords/*"
    image_directory_path: "/workspace/tlt-experiments/"
  }
  image_extension: "png"
  target_class_mapping { key: "car" value: "car" }
  target_class_mapping { key: "automobile" value: "car" }
  target_class_mapping { key: "heavy_truck" value: "car" }
  target_class_mapping { key: "person" value: "pedestrian" }
  target_class_mapping { key: "rider" value: "cyclist" }
  validation_fold: 0
}
model_config {
  arch: "resnet"
  pretrained_model_file: "pre-trained-models/tlt_resnet18_detectnet_v2_v1/resnet18.hdf5"
  freeze_blocks: 0
  freeze_blocks: 1
  all_projections: True
  num_layers: 18
  use_pooling: False
  use_batch_norm: True
  dropout_rate: 0.1
  training_precision: { backend_floatx: FLOAT32 }
  objective_set: {
    cov {}
    bbox { scale: 35.0 offset: 0.5 }
  }
}
bbox_rasterizer_config {
  target_class_config {
    key: "car"
    value: { cov_center_x: 0.5 cov_center_y: 0.5 cov_radius_x: 0.4 cov_radius_y: 0.4 bbox_min_radius: 1.0 }
  }
  target_class_config {
    key: "cyclist"
    value: { cov_center_x: 0.5 cov_center_y: 0.5 cov_radius_x: 0.4 cov_radius_y: 0.4 bbox_min_radius: 1.0 }
  }
  target_class_config {
    key: "pedestrian"
    value: { cov_center_x: 0.5 cov_center_y: 0.5 cov_radius_x: 0.4 cov_radius_y: 0.4 bbox_min_radius: 1.0 }
  }
  deadzone_radius: 0.67
}
postprocessing_config {
  target_class_config {
    key: "car"
    value: { clustering_config { coverage_threshold: 0.005 dbscan_eps: 0.15 dbscan_min_samples: 0.05 minimum_bounding_box_height: 20 } }
  }
  target_class_config {
    key: "cyclist"
    value: { clustering_config { coverage_threshold: 0.005 dbscan_eps: 0.15 dbscan_min_samples: 0.05 minimum_bounding_box_height: 20 } }
  }
  target_class_config {
    key: "pedestrian"
    value: { clustering_config { coverage_threshold: 0.005 dbscan_eps: 0.15 dbscan_min_samples: 0.05 minimum_bounding_box_height: 20 } }
  }
}
cost_function_config {
  target_classes {
    name: "car"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives { name: "cov" initial_weight: 1.0 weight_target: 1.0 }
    objectives { name: "bbox" initial_weight: 10.0 weight_target: 10.0 }
  }
  target_classes {
    name: "cyclist"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives { name: "cov" initial_weight: 1.0 weight_target: 1.0 }
    objectives { name: "bbox" initial_weight: 10.0 weight_target: 1.0 }
  }
  target_classes {
    name: "pedestrian"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives { name: "cov" initial_weight: 1.0 weight_target: 1.0 }
    objectives { name: "bbox" initial_weight: 10.0 weight_target: 10.0 }
  }
  enable_autoweighting: True
  max_objective_weight: 0.9999
  min_objective_weight: 0.0001
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 240
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-6
      max_learning_rate: 5e-4
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer { type: L1 weight: 3e-9 }
  optimizer {
    adam { epsilon: 1e-08 beta1: 0.9 beta2: 0.999 }
  }
  cost_scaling { enabled: False initial_exponent: 20.0 increment: 0.005 decrement: 1.0 }
  checkpoint_interval: 1
}
# Sample augmentation config
augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    output_image_channel: 3
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    color_shift_stddev: 0.0
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
# Sample evaluation config to run evaluation in integrate mode for the given 3 class model,
# at every 10th epoch starting from the epoch 1.
evaluation_config {
  average_precision_mode: INTEGRATE
  validation_period_during_training: 10
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap { key: "car" value: 0.7 }
  minimum_detection_ground_truth_overlap { key: "pedestrian" value: 0.5 }
  minimum_detection_ground_truth_overlap { key: "cyclist" value: 0.5 }
  evaluation_box_config {
    key: "car"
    value { minimum_height: 10 maximum_height: 9999 minimum_width: 10 maximum_width: 9999 }
  }
  evaluation_box_config {
    key: "pedestrian"
    value { minimum_height: 10 maximum_height: 9999 minimum_width: 10 maximum_width: 9999 }
  }
  evaluation_box_config {
    key: "cyclist"
    value { minimum_height: 10 maximum_height: 9999 minimum_width: 10 maximum_width: 9999 }
  }
}
- Run the training:
tlt-train detectnet_v2 --gpus 1 -r results -e experiment_config.json -k key
Resuming Training
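If a training run is interrupted, it can usually be resumed by re-running the same tlt-train command with the same results directory; the toolkit picks up the latest checkpoint saved there (one is written every checkpoint_interval epochs):
tlt-train detectnet_v2 --gpus 1 -r results -e experiment_config.json -k key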
Evaluating the model
- Run the evaluation:
tlt-evaluate detectnet_v2 -e experiment_config.json -k key -m $MODEL_FILE
This is the same process that runs automatically every N epochs during training, controlled by the following parameters in the experiment configuration file:
validation_period_during_training: 10
first_validation_epoch: 1
Running Inference
tlt-infer detectnet_v2 [-h] -m $MODEL_FILE \
                       -i $INPUT_IMAGE_DIR \
                       -o $OUTPUT_IMAGE_DIR \
                       -bs $BATCH_SIZE \
                       -cp $CLUSTER_FILE \
                       -k key \
                       -lw $LINE_WIDTH
The cluster params file should follow this structure:
{ "dbscan_criterion": "IOU", "dbscan_eps": { "cyclist": 0.25, "pedestrian": 0.35, "default": 0.15, "car": 0.3 }, "dbscan_min_samples": { "cyclist": 0.05, "pedestrian": 0.05, "default": 0.0, "car": 0.05 }, "min_cov_to_cluster": { "cyclist": 0.005, "pedestrian": 0.005, "default": 0.005, "car": 0.005 }, "min_obj_height": { "cyclist": 4, "pedestrian": 4, "car": 4, "default": 2 }, "target_classes": ["car", "cyclist", "pedestrian"], "confidence_th": { "pedestrian": 0.6, "cyclist": 0.6, "car": 0.6 }, "confidence_model": { "car": { "kind": "aggregate_cov"}, "pedestrian": { "kind": "aggregate_cov"}, "cyclist": { "kind": "aggregate_cov"}, "default": { "kind": "aggregate_cov"} }, "output_map": { "car" : "car", "cyclist" : "cyclist", "pedestrian" : "pedestrian" }, "color": { "car": "green", "cyclist": "magenta", "pedestrian": "cyan", "default": "blue" }, "postproc_classes": ["car", "cyclist", "pedestrian"], "image_height": 384, "image_width": 1248, "stride": 16 }
Pruning the model
Not all weights in a network contribute equally to its accuracy. By pruning the network, the less significant weights can be removed, speeding up inference while having only a small impact on accuracy. In the sample run with KITTI, the number of parameters was reduced from 11,555,983 to 743,751, while the car precision only dropped from 73.0718 to 73.0707.
tlt-prune -pm $MODEL \
          -o $OUTPUT_DIRECTORY \
          -eq $EQUALIZATION_CRITERION \
          -pg $PRUNING_GRANULARITY \
          -pth $PRUNING_THRESHOLD \
          -nf $MIN_FILTERS_PER_LAYER \
          -el $EXCLUDED_LAYERS \
          -k $KEY
- -eq: Optional. One of arithmetic_mean, geometric_mean, union, intersection (default: union); useful for MobileNet and ResNet.
- -pg: Optional. Pruning granularity.
- -pth: Optional. Pruning threshold.
- -nf: Optional. Minimum number of filters to keep per layer.
- -el: Optional. List of excluded layers, separated by spaces; can be left empty.
Note: NVIDIA recommends adjusting the threshold so that the number of parameters in the pruned model stays within 10-20% of the original, unpruned model.
Re-training the model
In order to regain accuracy, NVIDIA recommends that you retrain the pruned model over the same dataset. To do this, use the tlt-train command as documented in Training a model, with an updated spec file that points to the newly pruned model as the pre-trained model file.
model_config {
  pretrained_model_file: "<path to the pruned model>"
  # Since pruning modifies the network, the graph must be reloaded
  load_graph: true
}
For detectnet_v2, it is important to set the load_graph option under model_config to true so the pruned graph is imported. All other parameters may be retained in the spec file from the previous training. A sketch of the retraining command is shown below.
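As a sketch, the retraining run mirrors the original training command; the retrain spec file name and results directory below are assumptions for illustration:
tlt-train detectnet_v2 --gpus 1 -r results_retrain -e experiment_config_retrain.json -k key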
Deploying to DeepStream
Generating the INT8 calibration file
Running networks in INT8 mode improves performance, but this requires a calibration cache at engine creation time. The calibration cache is generated from a calibration tensorfile when tlt-export is run with the --data_type flag set to int8. Pre-generating the calibration information and caching it removes the need for calibrating the model on the inference machine. Moving the calibration cache is usually much more convenient than moving the calibration tensorfile, since it is a much smaller file and can be moved with the exported model. Using the calibration cache also speeds up engine creation, as building the cache can take several minutes depending on the size of the tensorfile and the model itself. This can only be done for classification or detectnet_v2 models.
tlt-int8-tensorfile detectnet_v2 -e experiment_config.json -m 10 -o calibration.tensor
Exporting the model
tlt-export results/model.step-77844.tlt \
           -o resnet18_detector.etlt \
           --outputs output_cov/Sigmoid,output_bbox/BiasAdd \
           -k key \
           --input_dims 3,512,512 \
           --max_workspace_size 1100000 \
           --export_module detectnet_v2 \
           --cal_data_file calibration.tensor \
           --data_type int8 \
           --batches 10 \
           --cal_cache_file calibration.bin
Deploying to DeepStream
For the Jetson platform, the tlt-converter for JetPack 4.2.2 and JetPack 4.2.3 / 4.3 is available to download in the dev zone. Once the tlt-converter is downloaded, please follow the instructions below to generate a TensorRT engine:
1. Install the OpenSSL development package using the command: sudo apt-get install libssl-dev
2. Run the tlt-converter using the sample command below and generate the engine.
tlt-converter -k key \
              -d 3,512,512 \
              -o output_cov/Sigmoid,output_bbox/BiasAdd \
              -e resnet10_kitti_multiclass_v1.engine \
              -m 16 \
              -t fp32 \
              resnet18_detector.etlt
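Once the engine is generated, DeepStream consumes it through a gst-nvinfer configuration file. The following is a minimal sketch assuming the engine and calibration cache generated above; the label file name and net-scale-factor are placeholders you must adapt to your model:
[property]
gpu-id=0
net-scale-factor=0.00392156862745098
model-engine-file=resnet10_kitti_multiclass_v1.engine
labelfile-path=labels.txt
int8-calib-file=calibration.bin
batch-size=1
network-mode=0
num-detected-classes=3
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
network-mode selects the precision (0: FP32, 1: INT8, 2: FP16); set it to 1 to use the INT8 calibration cache.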
Training a Classifier
This process is quite similar to training a detector (the example above); sample commands and configuration files are shown here.
Classification spec file:
model_config {
  # Model architecture can be chosen from:
  # ['resnet', 'vgg', 'googlenet', 'alexnet', 'mobilenet_v1', 'mobilenet_v2', 'squeezenet']
  arch: "squeezenet"
  # for resnet --> n_layers can be [10, 18, 50]
  # for vgg --> n_layers can be [16, 19]
  # n_layers: 18 # Only relevant for resnet and vgg
  use_bias: True
  use_batch_norm: True
  all_projections: True
  use_pooling: False
  freeze_bn: False
  # When using pretrained weights not all layers need to be retrained
  freeze_blocks: 0
  freeze_blocks: 1
  freeze_blocks: 2
  freeze_blocks: 3
  freeze_blocks: 4
  freeze_blocks: 5
  freeze_blocks: 6
  # image size should be "3, X, Y", where X,Y >= 16
  input_image_size: "3,112,112"
}
eval_config {
  eval_dataset_path: "test"
  model_path: "results/weights/squeezenet_080.tlt" # Has to be specific here
  top_k: 3 # If the correct class is in the top 3 (in this case) a special statistic is reported
  batch_size: 256
  n_workers: 8
}
train_config {
  train_dataset_path: "train"
  val_dataset_path: "val"
  pretrained_model_path: "tlt_squeezenet_classification_v1/squeezenet.hdf5"
  # optimizer can be chosen from ['adam', 'sgd']
  optimizer: "sgd"
  batch_size_per_gpu: 16
  n_epochs: 80
  n_workers: 16
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }
  # learning_rate
  lr_config {
    # "step" and "soft_anneal" are supported.
    # "soft_anneal" stands for soft annealing learning rate scheduler.
    scheduler: "soft_anneal"
    # the following 4 parameters should be specified if "soft_anneal" is used.
    learning_rate: 0.005
    soft_start: 0.056
    annealing_points: "0.3, 0.6, 0.8"
    annealing_divider: 10
    # "step" stands for step learning rate scheduler.
    # the following 3 parameters should be specified if "step" is used.
    # learning_rate: 0.006
    # step_size: 10
    # gamma: 0.1
  }
}
Directory structure:
├── test
│   ├── Abyssinian
│   ├── american_bulldog
│   ├── american_pit_bull_terrier
│   ...
├── train
│   ├── Abyssinian
│   ├── american_bulldog
│   ├── american_pit_bull_terrier
│   ...
├── val
│   ├── Abyssinian
│   ├── american_bulldog
│   ├── american_pit_bull_terrier
│   ...
Training command:
tlt-train classification --gpus 1 -k key -r results -e pets_classification.json
Evaluating command:
tlt-evaluate classification -e pets_classification.json -k key
Running Inference:
tlt-infer classification -m results/weights/squeezenet_018.tlt -i Beagle.jpg -k key -cm results/classmap.json