ReID and Target Reassociation using NVIDIA DeepStream
Introduction
DeepStream
DeepStream is a streaming analytics toolkit for building AI-powered applications. It takes streaming data as input (from a USB/CSI camera, video from a file, or streams over RTSP) and uses AI and computer vision to generate insights from pixels for a better understanding of the environment. The toolkit exposes several GStreamer plugins, such as Gst-nvtracker, which allows the DeepStream pipeline to use a low-level tracker library to track detected objects persistently over time with unique IDs. This element is the focus of this documentation.
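For orientation, here is a minimal sketch of a DeepStream pipeline that includes Gst-nvtracker; the input file, inference config, and tracker config names are placeholders for your own setup:

gst-launch-1.0 filesrc location=sample_720p.h264 ! h264parse ! nvv4l2decoder ! \
  mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720 ! \
  nvinfer config-file-path=config_infer_primary.txt ! \
  nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so \
            ll-config-file=config_tracker_NvDCF_perf.yml ! \
  nvvideoconvert ! nvdsosd ! nveglglessink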
Gst-NvTracker
The nvtracker supports 4 different tracker types, which are:
- IOU (Intersection over Union)
- NvSORT (Simple Online and Realtime Tracking)
- NvDeepSORT (Realtime Tracking with a Deep Association Metric)
- NvDCF (Discriminative Correlation Filter)
The main tracker types that work with target re-association and data association metrics are NvDCF and NvDeepSORT.
NvDCF
The NvDCF tracker uses a discriminative correlation filter (DCF), which allows it to re-identify an object after it has been partially or fully occluded. The goal of the NvDCF tracker is to identify the same target across frames using a correlation filter learned for each object. This filter yields its maximum response at the center of the object, as shown in the following image:

Thanks to these tracking capabilities, the tracker can re-identify a target object even through partial or full occlusions in some cases. The tracker is configured through a YAML file whose parameters can be tuned to improve tracking accuracy and robustness.
Some parameters that have been explored are:
maxShadowTrackingAge
: Maximum number of frames a target can remain in shadow tracking before it is terminated. Shadow tracking is a mode in which a target is tracked in the background without being explicitly detected.
probationAge
: A probationary period, in frames, during which a target exists but is not yet associated with a detector object. If the target exceeds this period without being associated, it is considered lost.
For example, for the occlusions in the following images, the probationAge and maxShadowTrackingAge parameters were increased: the former to avoid associating new targets too soon, and the latter to keep the already-assigned IDs alive in shadow tracking so that, when the objects reappear, the tracker can reassign the same IDs.


Another parameter that can be used is:
earlyTerminationAge
: If the shadowTrackingAge reaches this threshold while the target is in Tentative mode, the target is terminated prematurely. Tentative mode is the state of a newly created tracker whose object is still within the probationary period (probationAge) and has not yet been associated. A sketch of these parameters in the tracker YAML config follows.
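These target-management parameters live in the TargetManagement section of the low-level tracker YAML config. A minimal sketch; the values shown are illustrative, not recommendations:

TargetManagement:
  probationAge: 5            # frames a new target must survive before being confirmed
  maxShadowTrackingAge: 60   # frames a lost target is kept alive in shadow tracking
  earlyTerminationAge: 1     # terminate Tentative targets once shadowTrackingAge reaches this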
A parameter from the gst-nvinfer element can be used as well:
interval
: When set to a value greater than zero, that number of consecutive batches is skipped for inference.
An example where the interval parameter helps is when a new object "steals" the ID of a previously detected object, as shown in the following image:

The interval parameter can be increased so that detection is not performed on every frame, as in the configuration sketch below.
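A minimal sketch of the relevant entry in the gst-nvinfer (PGIE) configuration file; the value 2 is illustrative:

[property]
# Skip inference on 2 consecutive batches out of every 3; the tracker fills the gaps
interval=2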
To understand why increasing the interval parameter lets the tracker keep the same ID for the same object, consider the multi-object tracking principle illustrated in the following image. In multi-object tracking, data association computes a score matrix between the new detections and the existing targets and matches them; target management then instantiates new trackers, terminates lost targets, and updates target states. Therefore, if nvinfer performs detection only every interval frames, the NvDCF tracker, which tracks the center of the object, can keep the bounding box in place until the next detection and assign the same ID to the same object.

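To make the data-association step concrete, here is a minimal sketch (not DeepStream code) that matches detections to existing targets with the Hungarian algorithm on an IoU score matrix:

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(targets, detections, min_score=0.3):
    # Score matrix: rows are existing targets, columns are new detections.
    score = np.array([[iou(t, d) for d in detections] for t in targets])
    rows, cols = linear_sum_assignment(-score)  # Hungarian algorithm, maximizing IoU
    # Matches below the minimum score are rejected; target management decides
    # whether the leftovers become new targets or go to shadow tracking.
    return [(r, c) for r, c in zip(rows, cols) if score[r, c] >= min_score]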
The above strategy has a significant drawback: it is highly dependent on the tracking scenario. It is therefore not a general solution, but it can be applied when the use case allows it.
The following table summarizes other tracker parameters that can be used to tune the reassociation results.
| Parameter | Description | Comments |
|---|---|---|
| minMatchingScore4Overall | Minimum total score for re-association | Defined in both the TargetManagement and DataAssociator sections |
| searchRegionPaddingScale | Determines the size of the search region as a multiple of the diagonal of the target's bounding box | It can be set to its maximum, but even then it will not cover the entire image |
| minTrackerConfidence | If the confidence of an object tracker falls below this value at runtime, the target is tracked in shadow mode | |
| minTrajectoryLength4Projection | Minimum tracklet length (i.e., age) of a target to perform trajectory projection [frames] | |
| prepLength4TrajectoryProjection | Length of the trajectory during which the state estimator is updated to make projections [frames] | |
| trajectoryProjectionLength | Length of the projected trajectory [frames] | |
| minBboxSizeSimilarity4TrackletMatching | Minimum bbox size similarity for tracklet matching | |
| maxTrackletMatchingTimeSearchRange | Search space in time for maximum tracklet similarity | |
| minMatchingScore4SizeSimilarity | Minimum bbox size similarity score for valid matching | |
| featureImgSizeLevel | Size of a feature image | A lower value causes NvDCF to use a smaller feature size, potentially increasing GPU performance at the cost of accuracy and robustness |
| filterLr | Learning rate for the DCF filter in the exponential moving average | Learning too fast or too slow may not yield the desired results |
| filterChannelWeightsLr | Learning rate for the weights of the different feature channels in DCF | |
| gaussianSigma | Standard deviation of the Gaussian for the desired response | |
| minMatchingScore4VisualSimilarity | Minimum visual similarity score for valid matching | |
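As a rough sketch of where these parameters typically sit in the NvDCF YAML config; the section grouping follows the DeepStream sample configs, but the enableReAssoc flag and all values are illustrative assumptions that vary by DeepStream version:

TrajectoryManagement:
  enableReAssoc: 1                           # enable target re-association (assumed flag)
  minTrajectoryLength4Projection: 20
  prepLength4TrajectoryProjection: 10
  trajectoryProjectionLength: 90
  maxTrackletMatchingTimeSearchRange: 20
  minBboxSizeSimilarity4TrackletMatching: 0.6
DataAssociator:
  minMatchingScore4Overall: 0.0
  minMatchingScore4SizeSimilarity: 0.6
  minMatchingScore4VisualSimilarity: 0.7
VisualTracker:
  featureImgSizeLevel: 3
  searchRegionPaddingScale: 1
  filterLr: 0.075
  filterChannelWeightsLr: 0.1
  gaussianSigma: 0.75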
NvDeepSORT
The NvDeepSORT tracker relies on object appearance information, using deep learning to detect and match objects accurately across different frames and locations. This tracker is also useful for handling occlusions and ID switches.
One of the metrics for data association is proximity; the other is the Re-ID-based similarity, which is not covered in this section. For the proximity score, the Mahalanobis distance is used. This score is computed between the i-th detector object and the j-th target using the target's predicted location and its associated uncertainty.
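For reference, this is the standard squared Mahalanobis distance from the DeepSORT formulation, where b_i is the i-th detection, y_j the j-th target's predicted location, and S_j its covariance from the state estimator:

d(i, j) = (b_i - y_j)^\top S_j^{-1} (b_i - y_j)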
The parameters that can be used to improve object ID assignment are the same as for the NvDCF tracker. However, since NvDeepSORT relies less on visual tracking than NvDCF, the maxShadowTrackingAge and probationAge values can be smaller than in the NvDCF YAML config. This reduces GPU resource utilization, although occlusions and ID switches may be handled less well than with the NvDCF tracker.
Re-ID
Re-identification (Re-ID) uses TensorRT™-accelerated deep neural networks to extract unique feature vectors from detected objects that are robust to spatial-temporal variance and occlusion. It has two use cases in NvMultiObjectTracker: (1) in NvDeepSORT, the Re-ID similarity is used for data association of objects over consecutive frames; (2) in target re-association, the Re-ID features of targets are extracted and kept, so that a seemingly lost target can be re-associated when it reappears. The reidType parameter selects the mode for each of these use cases.
The Re-ID similarity between a detector object and a target is the cosine similarity between the detector object's Re-ID feature and its nearest neighbor in the target's feature gallery; its value is in the range [0.0, 1.0]. Specifically, the detector object's Re-ID feature takes a dot product with each Re-ID feature in the target's gallery, and the maximum of all the dot products is the similarity score, as in the sketch below.
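A minimal sketch of this computation, assuming the features are already L2-normalized NumPy arrays:

import numpy as np

def reid_similarity(det_feature, gallery):
    # gallery: (num_features, feature_size); det_feature: (feature_size,)
    # With unit-normalized vectors, each dot product is a cosine similarity;
    # the score is the maximum over the target's feature gallery.
    return float(np.max(np.asarray(gallery) @ det_feature))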
Re-ID has a spatial-temporal constraint: if an object moves out of the frame or stays occluded beyond maxShadowTrackingAge, it will be assigned a new ID even when it returns to the frame.
Target re-association takes advantage of Late Activation and Shadow Tracking in the target management module. It tries to associate newly appeared targets with previously lost targets based on motion and Re-ID similarity in a seamless, real-time manner. Before a target is lost, the Re-ID network extracts its Re-ID features at the frame interval given by reidExtractionInterval and stores them in the feature gallery. These features are then used to identify the target's reappearance in the tracklet matching stage.
The supported Re-ID model formats are NVIDIA TAO, ONNX, and UFF (deprecated). Multiple ready-to-use sample models are listed below. Scripts and a README file to set up the models are provided in sources/tracker_ReID.
NVIDIA TAO ReIdentificationNet
The NVIDIA pre-trained ReIdentificationNet is a high-accuracy ResNet-50 model with feature length 256. It can be downloaded and used directly with the following commands:
mkdir /opt/nvidia/deepstream/deepstream/samples/models/Tracker/
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/reidentificationnet/versions/deployable_v1.0/files/resnet50_market1501.etlt' -P /opt/nvidia/deepstream/deepstream/samples/models/Tracker/
The tracker config file supports this model by default. Note that the raw output from this network is not L2-normalized, so addFeatureNormalization: 1 is set to add L2 normalization as a post-processing step.
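A rough sketch of the corresponding ReID section in the low-level tracker YAML; the parameter names follow the DeepStream sample configs, but the reidType mode numbering and the exact values are version-dependent assumptions:

ReID:
  reidType: 2                 # Re-ID mode (e.g., re-association); mode numbering is an assumption
  reidExtractionInterval: 0   # extract Re-ID features every frame
  reidFeatureSize: 256        # feature length of ReIdentificationNet
  addFeatureNormalization: 1  # L2-normalize the raw network output
  tltEncodedModel: "/opt/nvidia/deepstream/deepstream/samples/models/Tracker/resnet50_market1501.etlt"
  tltModelKey: "nvidia_tao"   # model key for the TAO-encoded model (assumed default)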
This model is intended to re-identify people; other models can be added by following the instructions in this link.
Re-ID Feature Output
Objects’ Re-ID features can be accessed in the tracker plugin and downstream modules via DeepStream metadata, which can be used for other tasks such as multi-target multi-camera tracking. The low-level library configuration YAML must contain this parameter in the ReID section:
outputReidTensor: 1
This feature is supported whenever NvDeepSORT or Re-ID-based re-association is used. To retrieve Re-ID features for every frame, make sure interval=0 is set in the PGIE config and, if re-association is enabled, reidExtractionInterval: 0 in the tracker config, as in the sketch below. Otherwise, Re-ID features are extracted only when PGIE generates bounding boxes and the reidExtractionInterval is met.
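The two settings live in different files; a combined sketch with the values required for per-frame extraction:

# PGIE config (gst-nvinfer, key=value format)
[property]
interval=0

# Low-level tracker config (YAML)
ReID:
  reidExtractionInterval: 0
  outputReidTensor: 1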
Multicamera Multiobject tracking
The Re-ID feature allows us to track objects that are detected by different cameras. This does not happen out of the box in the nvtracker element, so it must be implemented in the application or in a custom element.
To determine whether two output tensors represent the same object seen from different cameras, we use the same cosine similarity that is used for intra-frame Re-ID. This measures how close the vectors are in the N-dimensional space spanned by the N output features the model gives us, as in the sketch below.
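A minimal application-side sketch (not part of nvtracker; the function names and the 0.7 threshold are illustrative assumptions), assuming the Re-ID tensors per camera have already been pulled from the DeepStream metadata into dictionaries mapping object ID to a NumPy feature vector:

import numpy as np

def cosine_similarity(f_a, f_b):
    # Cosine similarity between two Re-ID feature vectors.
    return float(np.dot(f_a, f_b) / (np.linalg.norm(f_a) * np.linalg.norm(f_b) + 1e-9))

def match_across_cameras(feats_cam0, feats_cam1, threshold=0.7):
    # Greedily pair object IDs from two cameras whose Re-ID features
    # are close enough in cosine similarity.
    matches, used = {}, set()
    for id0, f0 in feats_cam0.items():
        best_id, best_sim = None, threshold
        for id1, f1 in feats_cam1.items():
            if id1 not in used:
                sim = cosine_similarity(f0, f1)
                if sim >= best_sim:
                    best_id, best_sim = id1, sim
        if best_id is not None:
            matches[id0] = best_id
            used.add(best_id)
    return matches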
Results
Initial detections
Occlusions case:
The car is successfully re-identified and associated as the same object in both cameras: