ReID and Target Reassociation using NVIDIA DeepStream
Introduction
DeepStream
DeepStream is a streaming analytics toolkit for building AI-powered applications. It takes streaming data as input (from a USB/CSI camera, video from a file, or streams over RTSP) and uses AI and computer vision to generate insights from pixels for a better understanding of the environment. The toolkit exposes several GStreamer plugins, such as Gst-nvtracker, which allows the DeepStream pipeline to use a low-level tracker library to track detected objects persistently over time with unique IDs. This element is the focus of this documentation.
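For orientation, here is a minimal sketch of a DeepStream pipeline that includes Gst-nvtracker; the input file, inference config, and tracker config names are placeholders for your own setup:

gst-launch-1.0 filesrc location=sample_720p.h264 ! h264parse ! nvv4l2decoder ! \
  mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720 ! \
  nvinfer config-file-path=config_infer_primary.txt ! \
  nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so \
            ll-config-file=config_tracker_NvDCF_perf.yml ! \
  nvvideoconvert ! nvdsosd ! nveglglessink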
Gst-NvTracker
The nvtracker supports 4 different tracker types, which are:
- IOU (Intersection over Union)
- NvSORT (Simple Online and Realtime Tracking)
- NvDeepSORT (Realtime Tracking with a Deep Association Metric)
- NvDCF (Discriminative Correlation Filter)
The main tracker types that work with target re-association and data association metrics are NvDCF and NvDeepSORT.
NvDCF
The NvDCF tracker uses a discriminative correlation filter (DCF), which allows it to re-identify an object after it has been partially or fully occluded. The goal of the NvDCF tracker is to identify the same target across frames using a correlation filter learned for each object. This filter yields its maximum response at the center of the object, as shown in the following image:

Thanks to these tracking capabilities, the tracker can re-identify a target object even through partial or full occlusions in some cases. The tracker is configured through a YAML file whose parameters can be tuned to improve tracking accuracy and robustness.
Some parameters that have been explored are:
maxShadowTrackingAge
: Maximum number of frames a target can remain in shadow tracking before it is terminated. Shadow tracking is a mode in which a target is tracked in the background without being explicitly detected.
probationAge
: A probationary period, in frames, during which a target exists but is not yet associated with a detector object. If the target exceeds this period without being associated, it is considered lost.
For example, for the occlusions in the following images, the probationAge and maxShadowTrackingAge parameters were increased: the former to avoid associating new targets too soon, and the latter to keep the already-assigned IDs alive in shadow tracking so that, when the objects reappear, the tracker can reassign the same IDs.


Another parameter that can be used is:
earlyTerminationAge
: If the shadowTrackingAge reaches this threshold while the target is in Tentative mode, the target is terminated prematurely. Tentative mode is the state of a newly created tracker whose object is still within the probationary period (probationAge) and has not yet been associated. A sketch of these parameters in the tracker YAML config follows.
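These target-management parameters live in the TargetManagement section of the low-level tracker YAML config. A minimal sketch; the values shown are illustrative, not recommendations:

TargetManagement:
  probationAge: 5            # frames a new target must survive before being confirmed
  maxShadowTrackingAge: 60   # frames a lost target is kept alive in shadow tracking
  earlyTerminationAge: 1     # terminate Tentative targets once shadowTrackingAge reaches this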
A parameter from the gst-nvinfer element can be used as well:
interval
: When set to a value greater than zero, that number of consecutive batches is skipped for inference.
An example where the interval parameter helps is when a new object "steals" the ID of a previously detected object, as shown in the following image:

The interval parameter can be increased so that detection is not performed on every frame, as in the configuration sketch below.
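A minimal sketch of the relevant entry in the gst-nvinfer (PGIE) configuration file; the value 2 is illustrative:

[property]
# Skip inference on 2 consecutive batches out of every 3; the tracker fills the gaps
interval=2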
To understand why increasing the interval parameter lets the tracker keep the same ID for the same object, consider the multi-object tracking principle illustrated in the following image. In multi-object tracking, data association computes a score matrix between the new detections and the existing targets and matches them; target management then instantiates new trackers, terminates lost targets, and updates target states. Therefore, if nvinfer performs detection only every interval frames, the NvDCF tracker, which tracks the center of the object, can keep the bounding box in place until the next detection and assign the same ID to the same object.

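To make the data-association step concrete, here is a minimal sketch (not DeepStream code) that matches detections to existing targets with the Hungarian algorithm on an IoU score matrix:

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(targets, detections, min_score=0.3):
    # Score matrix: rows are existing targets, columns are new detections.
    score = np.array([[iou(t, d) for d in detections] for t in targets])
    rows, cols = linear_sum_assignment(-score)  # Hungarian algorithm, maximizing IoU
    # Matches below the minimum score are rejected; target management decides
    # whether the leftovers become new targets or go to shadow tracking.
    return [(r, c) for r, c in zip(rows, cols) if score[r, c] >= min_score]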
The above strategy has a significant drawback: it is highly dependent on the tracking scenario. It is therefore not a general solution, but it can be applied when the use case allows it.
The following table summarizes other tracker parameters that can be used to tune the reassociation results.
| Parameter | Description | Comments |
|---|---|---|
| minMatchingScore4Overall | Minimum total score for re-association | Defined in both the TargetManagement and DataAssociator sections |
| searchRegionPaddingScale | Determines the size of the search region as a multiple of the diagonal of the target's bounding box | It can be set to its maximum, but even then it will not cover the entire image |
| minTrackerConfidence | If the confidence of an object tracker falls below this value at runtime, the target is tracked in shadow mode | |
| minTrajectoryLength4Projection | Minimum tracklet length (i.e., age) of a target to perform trajectory projection [frames] | |
| prepLength4TrajectoryProjection | Length of the trajectory during which the state estimator is updated to make projections [frames] | |
| trajectoryProjectionLength | Length of the projected trajectory [frames] | |
| minBboxSizeSimilarity4TrackletMatching | Minimum bbox size similarity for tracklet matching | |
| maxTrackletMatchingTimeSearchRange | Search space in time for maximum tracklet similarity | |
| minMatchingScore4SizeSimilarity | Minimum bbox size similarity score for valid matching | |
| featureImgSizeLevel | Size of a feature image | A lower value causes NvDCF to use a smaller feature size, potentially increasing GPU performance at the cost of accuracy and robustness |
| filterLr | Learning rate for the DCF filter in the exponential moving average | Learning too fast or too slow may not yield the desired results |
| filterChannelWeightsLr | Learning rate for the weights of the different feature channels in DCF | |
| gaussianSigma | Standard deviation of the Gaussian for the desired response | |
| minMatchingScore4VisualSimilarity | Minimum visual similarity score for valid matching | |
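As a rough sketch of where these parameters typically sit in the NvDCF YAML config; the section grouping follows the DeepStream sample configs, but the enableReAssoc flag and all values are illustrative assumptions that vary by DeepStream version:

TrajectoryManagement:
  enableReAssoc: 1                           # enable target re-association (assumed flag)
  minTrajectoryLength4Projection: 20
  prepLength4TrajectoryProjection: 10
  trajectoryProjectionLength: 90
  maxTrackletMatchingTimeSearchRange: 20
  minBboxSizeSimilarity4TrackletMatching: 0.6
DataAssociator:
  minMatchingScore4Overall: 0.0
  minMatchingScore4SizeSimilarity: 0.6
  minMatchingScore4VisualSimilarity: 0.7
VisualTracker:
  featureImgSizeLevel: 3
  searchRegionPaddingScale: 1
  filterLr: 0.075
  filterChannelWeightsLr: 0.1
  gaussianSigma: 0.75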
NvDeepSORT
The NvDeepSORT tracker relies on object appearance information, using deep learning to detect and match objects accurately across different frames and locations. This tracker is also useful for handling occlusions and ID switches.
One of the metrics for data association is proximity; the other is the Re-ID-based similarity, which is not covered in this section. For the proximity score, the Mahalanobis distance is used. This score is computed between the i-th detector object and the j-th target using the target's predicted location and its associated uncertainty.
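For reference, this is the standard squared Mahalanobis distance from the DeepSORT formulation, where b_i is the i-th detection, y_j the j-th target's predicted location, and S_j its covariance from the state estimator:

d(i, j) = (b_i - y_j)^\top S_j^{-1} (b_i - y_j)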
The parameters that can be used to improve object ID assignment are the same as for the NvDCF tracker. However, since NvDeepSORT relies less on visual tracking than NvDCF, the maxShadowTrackingAge and probationAge values can be smaller than in the NvDCF YAML config. This reduces GPU resource utilization, although occlusions and ID switches may be handled less well than with the NvDCF tracker.
Re-ID
Re-identification (Re-ID) uses TensorRT™-accelerated deep neural networks to extract unique feature vectors from detected objects that are robust to spatial-temporal variance and occlusion. It has two use cases in NvMultiObjectTracker: (1) in NvDeepSORT, the Re-ID similarity is used for data association of objects over consecutive frames; (2) in target re-association, the Re-ID features of targets are extracted and kept, so that a seemingly lost target can be re-associated when it reappears. The reidType parameter selects the mode for each of these use cases.
The Re-ID similarity between a detector object and a target is the cosine similarity between the detector object's Re-ID feature and its nearest neighbor in the target's feature gallery; its value is in the range [0.0, 1.0]. Specifically, the detector object's Re-ID feature takes a dot product with each Re-ID feature in the target's gallery, and the maximum of all the dot products is the similarity score, as in the sketch below.
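A minimal sketch of this computation, assuming the features are already L2-normalized NumPy arrays:

import numpy as np

def reid_similarity(det_feature, gallery):
    # gallery: (num_features, feature_size); det_feature: (feature_size,)
    # With unit-normalized vectors, each dot product is a cosine similarity;
    # the score is the maximum over the target's feature gallery.
    return float(np.max(np.asarray(gallery) @ det_feature))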
Re-ID has a spatial-temporal constraint: if an object moves out of the frame or stays occluded beyond maxShadowTrackingAge, it will be assigned a new ID even when it returns to the frame.
Target re-association takes advantage of Late Activation and Shadow Tracking in the target management module. It tries to associate newly appeared targets with previously lost targets based on motion and Re-ID similarity in a seamless, real-time manner. Before a target is lost, the Re-ID network extracts its Re-ID features at the frame interval given by reidExtractionInterval and stores them in the feature gallery. These features are then used to identify the target's reappearance in the tracklet matching stage.
The supported Re-ID model formats are NVIDIA TAO, ONNX, and UFF (deprecated). Multiple ready-to-use sample models are listed below. Scripts and a README file to set up the models are provided in sources/tracker_ReID.
NVIDIA TAO ReIdentificationNet
The NVIDIA pre-trained ReIdentificationNet is a high-accuracy ResNet-50 model with feature length 256. It can be downloaded and used directly with the following commands:
mkdir /opt/nvidia/deepstream/deepstream/samples/models/Tracker/
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/reidentificationnet/versions/deployable_v1.0/files/resnet50_market1501.etlt' -P /opt/nvidia/deepstream/deepstream/samples/models/Tracker/
The tracker config file supports this model by default. Note that the raw output from this network is not L2-normalized, so addFeatureNormalization: 1 is set to add L2 normalization as a post-processing step.
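A rough sketch of the corresponding ReID section in the low-level tracker YAML; the parameter names follow the DeepStream sample configs, but the reidType mode numbering and the exact values are version-dependent assumptions:

ReID:
  reidType: 2                 # Re-ID mode (e.g., re-association); mode numbering is an assumption
  reidExtractionInterval: 0   # extract Re-ID features every frame
  reidFeatureSize: 256        # feature length of ReIdentificationNet
  addFeatureNormalization: 1  # L2-normalize the raw network output
  tltEncodedModel: "/opt/nvidia/deepstream/deepstream/samples/models/Tracker/resnet50_market1501.etlt"
  tltModelKey: "nvidia_tao"   # model key for the TAO-encoded model (assumed default)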
This model is intended to re-identify people; other models can be added by following the instructions in this link.
Re-ID Feature Output
Objects’ Re-ID features can be accessed in the tracker plugin and downstream modules via DeepStream metadata, which can be used for other tasks such as multi-target multi-camera tracking. The low-level library configuration YAML must contain this parameter in the ReID section:
outputReidTensor: 1
This feature is supported whenever NvDeepSORT or Re-ID-based re-association is used. To retrieve Re-ID features for every frame, make sure interval=0 is set in the PGIE config and, if re-association is enabled, reidExtractionInterval: 0 in the tracker config, as in the sketch below. Otherwise, Re-ID features are extracted only when PGIE generates bounding boxes and the reidExtractionInterval is met.
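The two settings live in different files; a combined sketch with the values required for per-frame extraction:

# PGIE config (gst-nvinfer, key=value format)
[property]
interval=0

# Low-level tracker config (YAML)
ReID:
  reidExtractionInterval: 0
  outputReidTensor: 1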
Multicamera Multiobject tracking
The Re-ID feature allows us to track objects that are detected by different cameras. This does not happen out of the box in the nvtracker element, so it must be implemented in the application or in a custom element.
To determine whether two output tensors represent the same object seen from different cameras, we use the same cosine similarity that is used for intra-frame Re-ID. This measures how close the vectors are in the N-dimensional space spanned by the N output features the model gives us, as in the sketch below.
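A minimal application-side sketch (not part of nvtracker; the function names and the 0.7 threshold are illustrative assumptions), assuming the Re-ID tensors per camera have already been pulled from the DeepStream metadata into dictionaries mapping object ID to a NumPy feature vector:

import numpy as np

def cosine_similarity(f_a, f_b):
    # Cosine similarity between two Re-ID feature vectors.
    return float(np.dot(f_a, f_b) / (np.linalg.norm(f_a) * np.linalg.norm(f_b) + 1e-9))

def match_across_cameras(feats_cam0, feats_cam1, threshold=0.7):
    # Greedily pair object IDs from two cameras whose Re-ID features
    # are close enough in cosine similarity.
    matches, used = {}, set()
    for id0, f0 in feats_cam0.items():
        best_id, best_sim = None, threshold
        for id1, f1 in feats_cam1.items():
            if id1 not in used:
                sim = cosine_similarity(f0, f1)
                if sim >= best_sim:
                    best_id, best_sim = id1, sim
        if best_id is not None:
            matches[id0] = best_id
            used.add(best_id)
    return matches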
Results
Initial detections
Occlusions case:
The car is successfully re-identified and associated as the same object in both cameras: