ReID and Target Reassociation using NVIDIA DeepStream

[[File:ReID promotion.png|alt=Re Identification and tracking with NVIDIA Deepstream. Image by Freepik|center|thumb|600x600px|Re-Identification and tracking with NVIDIA Deepstream.]]
= Introduction =

This page explains how to use the NVIDIA DeepStream NvDCF tracker and the ReID object features for target re-association when there are occlusions, as well as for multi-camera multi-object tracking.

= DeepStream =


DeepStream is a streaming analytics toolkit for building AI-powered applications. It takes streaming data as input (from a USB/CSI camera, a video file, or RTSP streams) and uses AI and computer vision to generate insights from pixels for a better understanding of the environment. The toolkit exposes several GStreamer plugins, such as Gst-nvtracker, which allows a DeepStream pipeline to use a low-level tracker library to persistently track detected objects over time with unique IDs. This is the element of interest in this documentation.

== Gst-NvTracker ==

The nvtracker element supports four different tracker types:

  • IOU (Intersection over Union)
  • NvSORT (Simple Online and Realtime Tracking)
  • NvDeepSORT (Simple Online and Realtime Tracking with a Deep Association Metric)
  • NvDCF (Discriminative Correlation Filter)

The main tracker types that work with target re-association and data-association metrics are NvDCF and NvDeepSORT.

=== NvDCF ===

The NvDCF tracker uses a discriminative correlation filter (DCF), which allows re-identifying an object after it has been partially or fully occluded. The goal of the NvDCF tracker is to identify the same target across frames using the DCF, which learns a correlation filter for each object. This correlation filter yields the maximum response at the center of the object, as shown in the following image:

[Figure: NvDCF tracker data association. The left image shows the target object; the right image shows the correlation response map, where red denotes higher confidence and blue lower confidence, and the yellow cross marks the center of the response. Taken from NvDCF Tracker Data Association.]


Thanks to these tracking capabilities, the tracker can re-identify a target object even after partial or full occlusions in some cases. The tracker uses a YAML configuration file with parameters that can be tuned to improve tracking accuracy and robustness.

Some parameters that have been explored are listed below, followed by a configuration sketch:

  • maxShadowTrackingAge: Once a target has been in shadow tracking for this number of frames, the tracker is terminated. Shadow tracking is a mode in which a target is tracked in the background without being explicitly detected.
  • probationAge: A probationary period, in frames, during which a target exists but is not associated with a detector object. If the target exceeds this period without being associated, it is considered lost.
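
As a minimal sketch, these parameters live in the TargetManagement section of the low-level tracker YAML configuration; the values below are illustrative and should be tuned per use case:

<syntaxhighlight lang="yaml">
TargetManagement:
  probationAge: 5          # frames a new target stays on probation before activation
  maxShadowTrackingAge: 60 # frames a lost target is kept alive in shadow tracking
</syntaxhighlight>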

For example, for the occlusions in the following images, the probationAge and maxShadowTrackingAge parameters were increased: the first to avoid new targets being associated too soon, and the second to keep the already-assigned IDs in shadow tracking, so that when the objects reappear the tracker can reassign the same IDs to the same objects.

[Figure: Object targets on the left side of the image before the occlusions happen. The occlusions are caused by the police and taxi vehicles; the targets of interest are the cars with IDs 4, 6, 7, and 8.]
[Figure: The same scene after the occlusion by the police and taxi vehicles. Note that the targets of interest keep their IDs 4, 6, 7, and 8.]

Another parameter that can be used is:

  • earlyTerminationAge: If the shadowTrackingAge reaches this threshold while the tracker is in Tentative mode, the target is terminated prematurely. Tentative mode is the state of a newly created tracker whose object is kept under a probationary period (probationAge) until it is associated with a detection.


A parameter from gst-nvinfer can be used as well:

  • interval: When set to a value greater than zero, inference is skipped for that number of consecutive batches.
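
As an illustration, the interval key goes in the [property] group of the gst-nvinfer (PGIE) configuration file; the value below is only an example:

<syntaxhighlight lang="ini">
[property]
# Run inference on every 3rd batch; the tracker covers the skipped frames
interval=2
</syntaxhighlight>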

An example of using the interval parameter is when an object "steals" the ID from a previously detected object, as shown in the following image:

[Figure: The grey vehicle steals the red car's object ID when the occlusion happens.]

The interval parameter can be increased so that detection is not performed on every frame, which produces the behavior shown in the following image:

[Figure: The red car keeps its object ID after the occlusion by the grey vehicle.]

To understand why increasing the interval parameter allows the tracker to keep the same ID for the same object, consider the multi-object tracking principle shown in the following image. In multi-object tracking, data association computes a score matrix between the new detections and the existing targets and matches them. The target management module then instantiates new trackers, terminates lost targets, and performs target state updates. Therefore, if we tell nvinfer to perform a detection only every interval frames, the bounding box can stay in place for the next detection, since the NvDCF tracker tracks the center of the object, and the tracker can assign the same ID to the same object.

[Figure: Multi-object tracking in a nutshell. The red bounding boxes are objects detected by the inference detector, while the green bounding boxes are objects tracked by the tracker. Multi-object tracking predicts the object trajectory; when the detector and tracker bounding boxes are close to each other, they are paired and matched. Adapted from NVIDIA DeepStream Technical Deep Dive: Multi-Object Tracker.]
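
A minimal Python sketch of the data-association step, assuming an IoU-based score matrix and the Hungarian algorithm (the actual NvMultiObjectTracker also uses visual similarity and other cues); the boxes and threshold below are illustrative:

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Hypothetical detector boxes (red) and tracker predictions (green)
detections = np.array([[10, 10, 50, 50], [60, 60, 100, 100]])
tracks     = np.array([[12, 11, 52, 49], [58, 62, 98, 101]])

# Score matrix: higher IoU = better match; Hungarian solves the assignment
scores = np.array([[iou(d, t) for t in tracks] for d in detections])
det_idx, trk_idx = linear_sum_assignment(-scores)  # negate to maximize
for d, t in zip(det_idx, trk_idx):
    if scores[d, t] > 0.3:  # illustrative matching threshold
        print(f"detection {d} matched to track {t} (IoU={scores[d, t]:.2f})")
</syntaxhighlight>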

The above strategy has a significant drawback: it is highly dependent on the tracking scenario. It is therefore not a general solution, but it can be applied when the use case allows it.

The following table summarizes other tracker parameters that can be used to tune the re-association results.

{| class="wikitable"
! Parameter !! Description !! Comments
|-
| minMatchingScore4Overall || Minimum total score for re-association || Defined in both TargetManagement and DataAssociator
|-
| SearchRegionPaddingScale || Sets the size of the search region as a multiple of the diagonal of the target's bounding box || It can be set to the maximum, but it would not cover the entire image
|-
| minTrackerConfidence || If the confidence of an object tracker drops below this value at runtime, the target is tracked in shadow mode ||
|-
| minTrajectoryLength4Projection || Minimum tracklet length of a target, in frames, required to perform a trajectory projection ||
|-
| prepLength4TrajectoryProjection || Length of the trajectory, in frames, during which the state estimator is updated to make projections ||
|-
| trajectoryProjectionLength || Length of the projected trajectory, in frames ||
|-
| minBboxSizeSimilarity4TrackletMatching || Minimum bbox size similarity for tracklet matching ||
|-
| maxTrackletMatchingTimeSearchRange || Search space in time for maximum tracklet similarity ||
|-
| minMatchingScore4SizeSimilarity || Minimum bbox size similarity score for valid matching ||
|-
| featureImgSizeLevel || Size of a feature image || GPU performance can be increased by lowering featureImgSizeLevel, at the cost of accuracy and robustness
|-
| filterLr || Exponential moving average learning rate for the DCF filter || Learning too much or too fast may not produce the desired results
|-
| filterChannelWeightsLr || Learning rate for the weights of the different feature channels in DCF ||
|-
| gaussianSigma || Standard deviation of the Gaussian for the desired response ||
|-
| minMatchingScore4VisualSimilarity || Minimum visual similarity score for valid matching ||
|}
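
As a hedged example, many of these re-association parameters live in the TrajectoryManagement section of the tracker YAML file; the section placement and values below are illustrative and may vary across DeepStream versions:

<syntaxhighlight lang="yaml">
TrajectoryManagement:
  enableReAssoc: 1                          # enable target re-association
  minTrajectoryLength4Projection: 20        # frames
  prepLength4TrajectoryProjection: 50       # frames
  trajectoryProjectionLength: 90            # frames
  maxTrackletMatchingTimeSearchRange: 20    # frames
  minMatchingScore4Overall: 0.6             # minimum total matching score
  minMatchingScore4SizeSimilarity: 0.6
  minBboxSizeSimilarity4TrackletMatching: 0.6
</syntaxhighlight>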

=== NvDeepSORT ===

The NvDeepSORT tracker is based on object appearance information, using a deep neural network to accurately detect and match objects across different frames and locations. This tracker is also useful for handling occlusions and ID switches.

One of the metrics for data association is proximity; the other is Re-ID-based similarity, which is not covered in this section. For the proximity score, the Mahalanobis distance is used. This score is computed between the i-th detector object and the j-th target using the target's predicted location and its associated uncertainty.
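
For reference, in the original DeepSORT formulation this proximity score is the squared Mahalanobis distance between the i-th detection <math>\boldsymbol{d}_i</math> and the j-th target's predicted state <math>\boldsymbol{y}_j</math> with covariance <math>\boldsymbol{S}_j</math>:

<math>d^{(1)}(i,j) = (\boldsymbol{d}_i - \boldsymbol{y}_j)^{\top} \boldsymbol{S}_j^{-1} (\boldsymbol{d}_i - \boldsymbol{y}_j)</math>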

The parameters that can be used to improve object ID identification are the same as for the NvDCF tracker. However, since NvDeepSORT relies less on visual tracking than NvDCF, the maxShadowTrackingAge and probationAge values can be smaller than in the NvDCF YAML config. This reduces GPU resource utilization, but occlusions and ID switches may be handled less robustly than with the NvDCF tracker.

== Re-ID ==

Re-identification (Re-ID) uses TensorRT™-accelerated deep neural networks to extract, from detected objects, unique feature vectors that are robust to spatio-temporal variance and occlusion. It has two use cases in NvMultiObjectTracker: (1) in NvDeepSORT, the Re-ID similarity is used for data association of objects over consecutive frames; (2) in target re-association, the Re-ID features of targets are extracted and kept so that they can be used to re-associate a seemingly lost target. The reidType parameter selects the mode for each of these use cases.

The Re-ID similarity between a detector object and a target is the cosine similarity between the detector object's Re-ID feature and its nearest neighbor in the target's feature gallery, with values in the range [0.0, 1.0]. Specifically, each Re-ID feature in the target's gallery takes the dot product with the detector object's Re-ID feature, and the maximum of all the dot products is the similarity score.
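
Expressed as a formula, where <math>\boldsymbol{f}_i</math> is the detector object's Re-ID feature and <math>\{\boldsymbol{f}^{j}_{k}\}</math> is target j's feature gallery (features are L2-normalized, so the dot product equals the cosine similarity):

<math>\text{sim}(i,j) = \max_{k} \left( \boldsymbol{f}^{j}_{k} \cdot \boldsymbol{f}_{i} \right)</math>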

The Re-ID has a spatio-temporal constraint: if an object moves out of the frame or stays occluded beyond maxShadowTrackingAge, it is assigned a new ID even if it returns to the frame.

Target re-association takes advantage of the Late Activation and Shadow Tracking features in the target management module. It tries to associate newly appeared targets with previously lost targets based on motion and Re-ID similarity in a seamless, real-time manner. Before a target is lost, the Re-ID network extracts its Re-ID features with the frame interval reidExtractionInterval and stores them in the feature gallery. These features are later used to identify the target's re-appearance in the tracklet matching stage.

The supported Re-ID model formats are NVIDIA TAO, ONNX, and UFF (deprecated). Multiple ready-to-use sample models are listed below. Scripts and a README file to set up the models are provided in sources/tracker_ReID.


=== NVIDIA TAO ReIdentificationNet ===

The NVIDIA pre-trained ReIdentificationNet is a high-accuracy ResNet-50 model with a feature length of 256. It can be downloaded and used directly with the following commands:

<syntaxhighlight lang="bash">
mkdir /opt/nvidia/deepstream/deepstream/samples/models/Tracker/
wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/reidentificationnet/versions/deployable_v1.0/files/resnet50_market1501.etlt' -P /opt/nvidia/deepstream/deepstream/samples/models/Tracker/
</syntaxhighlight>

The tracker config file supports this model by default. Note that the raw output of this network is not L2-normalized, so addFeatureNormalization: 1 is set to add L2 normalization as a post-processing step.

This model is intended to re-identify people; other models can be added by following the instructions on this link.

=== Re-ID Feature Output ===

Objects' Re-ID features can be accessed in the tracker plugin and in downstream modules via DeepStream metadata, and can be used for other tasks such as multi-target multi-camera tracking. The low-level library YAML configuration must contain this parameter in the ReID section:

  • outputReidTensor: 1
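
For illustration, a minimal ReID section could look like the following; reidType and the other values are examples and depend on the selected mode and DeepStream version:

<syntaxhighlight lang="yaml">
ReID:
  reidType: 2                # illustrative: Re-ID based re-association mode
  outputReidTensor: 1        # expose Re-ID tensors via DeepStream metadata
  reidExtractionInterval: 0  # extract features on every frame
</syntaxhighlight>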

This feature is supported whenever NvDeepSORT or Re-ID-based re-association is used. To retrieve Re-ID features for every frame, make sure that interval=0 in the PGIE config and that reidExtractionInterval: 0 if re-association is enabled. Otherwise, the Re-ID features are extracted only when PGIE generates bounding boxes and the reidExtractionInterval is met.

== Multicamera Multiobject Tracking ==

The ReID feature allows us to track multiple objects that are detected by different cameras. This does not happen out of the box in the nvtracker element, so it needs to be done in the application or by creating a custom element.

To determine whether two output tensors represent the same object seen from different cameras, we use the same cosine similarity index that is used for intra-frame Re-ID. This measures how close the vectors are in the N-dimensional space spanned by the N output features that the model gives us.
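
A minimal Python sketch of this comparison, assuming 256-dimensional feature vectors (as produced by ReIdentificationNet); the threshold is illustrative and should be tuned per model and scene:

<syntaxhighlight lang="python">
import numpy as np

def cosine_similarity(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Cosine similarity between two Re-ID feature vectors."""
    a = feat_a / np.linalg.norm(feat_a)
    b = feat_b / np.linalg.norm(feat_b)
    return float(np.dot(a, b))

# Hypothetical Re-ID features extracted from two different cameras
feat_cam0 = np.random.rand(256).astype(np.float32)
feat_cam1 = np.random.rand(256).astype(np.float32)

SIMILARITY_THRESHOLD = 0.7  # illustrative value
if cosine_similarity(feat_cam0, feat_cam1) >= SIMILARITY_THRESHOLD:
    print("Likely the same object seen from both cameras")
</syntaxhighlight>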

= Results =

Initial detections:

[Figure: Initial detections; the target object is the orange vehicle with ID 1.]

Occlusions case:

[Figure: Occlusion caused by the tree as the target object moves forward.]

The car is successfully re-identified and associated as the same object in both cameras:

[Figure: The left image shows the target object's re-association after the occlusion; the right image shows the target object's re-identification when it moves from one camera to another. Both re-associations use the ReID object features.]