GstInference Roadmap
Backlog
Backends
- ARM-NN
- ARM NN is an inference engine that makes efficient use of the CPU, GPU and NPU, and can serve as a backend for a variety of existing frameworks. Manufacturers such as NXP are integrating ARM NN into their development environments.
- OpenCV DNN
- OpenCV has also developed its own inference engine (the DNN module), which is reported to be highly optimized for x86 and ARM. It would be interesting to explore it.
- PyTorch Support
- I've heard very good things about PyTorch, mainly about the stability. The typical phrase is: "it just works". Contrary to what its name suggests, it does provide a C++ API.
- Caffe (supported by ONNX already)
- Caffe is another popular framework with great support on NVIDIA platforms.
- Cloud Inference
- I've been thinking about this for a while. It would be very interesting to use GstInference as a proxy for cloud inference. Our backend would just send the appropriate requests and receive the output; the rest of the pipeline wouldn't even notice. I wonder if Google Cloud Vision or Google Cloud Video Intelligence are good options. A minimal sketch of the idea follows below.
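To make the proxy idea concrete, here is a minimal sketch of what the backend side could look like, using libcurl to POST a frame to a cloud endpoint. The URL, payload format and function names are hypothetical; a real backend would follow whatever API the chosen cloud service exposes.

```c
/* Sketch only: forward a frame to a hypothetical cloud inference endpoint
 * and receive the prediction back. The URL and payload format are made up;
 * a real backend would follow the cloud provider's API. */
#include <curl/curl.h>
#include <stddef.h>

static size_t
collect_response (void *data, size_t size, size_t nmemb, void *user_data)
{
  /* Accumulate the JSON response here (omitted for brevity) */
  return size * nmemb;
}

static int
cloud_infer (const unsigned char *frame, size_t frame_size)
{
  CURL *curl = curl_easy_init ();
  CURLcode res;

  if (!curl)
    return -1;

  curl_easy_setopt (curl, CURLOPT_URL, "https://vision.example.com/v1/annotate");
  curl_easy_setopt (curl, CURLOPT_POSTFIELDS, (const char *) frame);
  curl_easy_setopt (curl, CURLOPT_POSTFIELDSIZE, (long) frame_size);
  curl_easy_setopt (curl, CURLOPT_WRITEFUNCTION, collect_response);

  res = curl_easy_perform (curl);
  curl_easy_cleanup (curl);

  return (res == CURLE_OK) ? 0 : -1;
}
```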
Use Cases
- Image Segmentation
- Image segmentation is the process of partitioning an image into multiple segments and/or objects. Fully Convolutional Networks have outperformed every other available method (in terms of accuracy) for a while now. Architecturally, supporting a segmentation model does not require any change, but the metadata needs to be extended to include a map image per prediction (see the sketch after this list). Here's a good survey on image segmentation using Deep Learning.
- Pose Estimation
- Pose estimation is the process of estimating the position of the joints of an articulated object, typically a human. Nowadays this can be solved very accurately using Deep Learning. Architecturally, supporting a pose estimation model does not require any change, but the metadata needs to be extended to include a skeleton object per detection (also covered in the sketch after this list).
- Temporal Inference
- As of now, every inference is done on a per-frame basis. With this approach we can't answer questions such as: what's the speed of the ball, is he waving, did the car in front of me suddenly stop, etc. This task attempts to support predictions that depend on frames over time. It might be that no change is required, but it needs to be analyzed.
- Audio
- We only support video at the moment. It'd be cool to support audio use cases as well.
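As a rough illustration of the metadata extensions mentioned in the segmentation and pose estimation items above, here is what a per-prediction segmentation map and a per-detection skeleton could look like. None of these types exist in GstInference today; the names and fields are purely hypothetical.

```c
/* Hypothetical extensions sketched for discussion; none of these types
 * exist in GstInference today. */
#include <gst/gst.h>

/* One segmentation map per prediction: a class id per pixel */
typedef struct _GstInferenceSegmentation
{
  guint width;
  guint height;
  guint num_classes;
  guint8 *class_map;            /* width * height class indices */
} GstInferenceSegmentation;

/* One skeleton per detection: a fixed set of 2D joints */
typedef struct _GstInferenceSkeleton
{
  guint num_joints;
  gfloat *x;                    /* normalized joint coordinates */
  gfloat *y;
  gfloat *confidence;           /* per-joint confidence */
} GstInferenceSkeleton;
```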
Design Changes
- Backend pre/post-processing
- Some accelerators not only provide modules to perform the inference process but also to compute the preprocessing involved in preparing the image for the network and interpreting the output. As of now, the base class provides the mechanism to perform inference, and the specific element provides pre- and post-processing. Each element is mapped to a network architecture. This task delegates the pre/post-processing down to R2Inference for the backend to perform it if available, or falls back to a software implementation (see the preprocessing sketch after this list).
- Allocators
- Many inferences are performed on specialized HW units. These typically have specific memory requirements such as special alignment, contiguity or simply being part of a predefined heap. As of now, we copy GStreamer buffers into these memory arrays, but GStreamer has great support for allocators. These are basically a mechanism for downstream elements to communicate their memory requirements to upstream ones, so that the buffers they receive already use the correct memory type (see the allocator sketch after this list).
- Batching
- As of now, GstInference passes one frame at a time to the backend. While this works, it may be very inefficient, as the communication overhead may be comparable to the actual processing. Batching is the process of grouping frames and passing them all together to the HW unit so that it makes better use of its resources. I'm thinking that GstBufferLists may be useful for this (see the batching sketch after this list).
- Different input/output data types
- We only support floating-point inputs and outputs at the R2Inference level. As of now, all backends have worked this way, but we have already received requests for alternative types. Maybe using templates? We need to figure out a clean way to integrate this with the GStreamer layer.
- Evaluate Aggregator
- Our GstVideoInference base class exposes 4 pads. See Anatomy of a GstInference Element. While this has a design justification, we've increasingly been wondering whether the complexity is worth it. A clear path is to get rid of the model sink and consider migrating to a base class such as GstAggregator or GstVideoAggregator. While both handle multiple inputs, I'm unsure whether they can handle the level of detail we need while maintaining synchronization. Consider the following case: a detector cascaded with a classifier. The classifier acts on the objects previously detected. The simplest case is where a single object is detected and a single buffer enters the model and the bypass. However, there is a case where no objects are detected at all and we still want to push the bypass (to avoid freezing the stream). Another case is when multiple objects are detected, in which case there is an N:1 relationship and synchronization needs to be worked out.
- Remove Model Src Pad
- This is related to the Evaluate Aggregator task.
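For the backend pre/post-processing item, the intended dispatch is roughly the following. All types and the r2i_backend_* / software_preprocess symbols are hypothetical placeholders for whatever interface R2Inference ends up exposing.

```c
/* Sketch: delegate preprocessing to the backend when it supports it,
 * otherwise fall back to the element's software implementation.
 * All types and r2i_backend_* / software_* symbols are hypothetical. */
typedef struct _Backend Backend;   /* opaque backend handle from R2Inference */
typedef struct _Frame Frame;       /* raw input image */
typedef struct _Tensor Tensor;     /* preprocessed network input */

extern int r2i_backend_has_preprocess (Backend * backend);
extern int r2i_backend_preprocess (Backend * backend, const Frame * in, Tensor * out);
extern int software_preprocess (const Frame * in, Tensor * out);

static int
preprocess_frame (Backend * backend, const Frame * in, Tensor * out)
{
  if (r2i_backend_has_preprocess (backend)) {
    /* Accelerated preprocessing (resize, normalization, ...) done by the backend */
    return r2i_backend_preprocess (backend, in, out);
  }

  /* Software fallback, as each element does today */
  return software_preprocess (in, out);
}
```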
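For the allocators item, this is the standard GStreamer mechanism we would lean on: a downstream element answers the ALLOCATION query with its preferred allocator and parameters. gst_my_hw_allocator_new() is a hypothetical HW allocator, and the vfunc is shown on a GstBaseTransform subclass purely for illustration.

```c
/* Sketch: a downstream element advertising a HW-specific allocator in its
 * ALLOCATION query response, so upstream producers allocate buffers in the
 * memory type the accelerator expects. gst_my_hw_allocator_new() is a
 * hypothetical allocator. */
#include <gst/gst.h>
#include <gst/base/gstbasetransform.h>

extern GstAllocator *gst_my_hw_allocator_new (void);    /* hypothetical */

static gboolean
propose_hw_allocation (GstBaseTransform * trans, GstQuery * decide_query,
    GstQuery * query)
{
  GstAllocator *allocator;
  GstAllocationParams params;

  gst_allocation_params_init (&params);
  params.align = 63;            /* e.g. the accelerator requires 64-byte alignment */

  allocator = gst_my_hw_allocator_new ();
  gst_query_add_allocation_param (query, allocator, &params);
  gst_object_unref (allocator);

  return TRUE;
}
```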
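For the batching item, here is a sketch of how a GstBufferList could be used to group frames before handing them to the accelerator; submit_batch_to_backend() is a hypothetical placeholder.

```c
/* Sketch: accumulate frames into a GstBufferList and submit the whole batch
 * to the accelerator at once. submit_batch_to_backend() is hypothetical. */
#include <gst/gst.h>

extern void submit_batch_to_backend (GstBufferList * batch);   /* hypothetical */

#define BATCH_SIZE 4

static GstBufferList *batch = NULL;

static void
queue_frame (GstBuffer * frame)
{
  if (batch == NULL)
    batch = gst_buffer_list_new_sized (BATCH_SIZE);

  gst_buffer_list_add (batch, gst_buffer_ref (frame));

  if (gst_buffer_list_length (batch) == BATCH_SIZE) {
    /* Hand the whole group to the backend in a single call */
    submit_batch_to_backend (batch);
    gst_buffer_list_unref (batch);
    batch = NULL;
  }
}
```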
Older
2020Q2
The following general items are planned for this quarter:
- Backend pre/post-processing
- Some accelerators not only provide modules to perform the inference process but also to compute the preprocessing involved in preparing the image for the network and interpreting the output. As of now, the base class provides the mechanism to perform inference, and the specific element provides pre- and post-processing. Each element is mapped to a network architecture. This task delegates the pre/post-processing down to R2Inference for the backend to perform it if available, or falls back to a software implementation.
- ONNX Support
- ONNX is an open format built to represent machine learning models. Basically, by exporting your model to ONNX you'll be able to run it under different backends. Different backends are supported using different runtimes. This quarter we're adding support for Microsoft's ONNX Runtime. This will add acceleration for: CPU, CUDA, TensorRT, DirectML, MKL-DNN, MKL-ML, nGraph, NUPHAR, OpenVINO.
- Google Coral TPU
- Google's Coral is a complete toolkit to build products with local AI. Coral models run using TensorFlow Lite, which we already support. However, in order to make use of the TPU, a special context and delegates need to be used. This will be absorbed by GstInference as a new backend (see the sketch after this list).
- Meson Support
- Following GStreamer's and GLib's lead, we are adopting Meson as our build system. As of now, we are not planning to drop Autotools support yet.
- Legacy GstMeta Deprecation
- Last quarter we made a large design refactor in order to produce a single, hierarchically richer metadata structure instead of separate, standalone ones. Detection, Classification and Embedding are now absorbed into GstInferenceMeta. For backward compatibility, the old metas are still kept within the project, but we'd like to drop them, as this will result in much cleaner code.
- GStreamer Mainline Integration
- There is an open merge request to gst-plugins-bad to absorb GstInference as part of GStreamer. Part of our resources will be dedicated to this effort to help move it forward.
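As a rough idea of what the "special context and delegates" for the Coral TPU involve, the sketch below creates an Edge TPU delegate and attaches it to a TensorFlow Lite interpreter. It assumes the TensorFlow Lite C API and libedgetpu's edgetpu_c.h; exact headers and signatures may differ between versions and in the final backend.

```c
/* Sketch: create an Edge TPU delegate and attach it to a TensorFlow Lite
 * interpreter. Based on the TFLite C API and libedgetpu's edgetpu_c.h;
 * error handling omitted and calls may differ between library versions. */
#include <tensorflow/lite/c/c_api.h>
#include <edgetpu_c.h>

static TfLiteInterpreter *
create_tpu_interpreter (const char *model_path)
{
  TfLiteModel *model = TfLiteModelCreateFromFile (model_path);
  TfLiteInterpreterOptions *options = TfLiteInterpreterOptionsCreate ();

  /* The delegate routes supported operations to the Edge TPU */
  TfLiteDelegate *delegate =
      edgetpu_create_delegate (EDGETPU_APEX_USB, NULL, NULL, 0);
  TfLiteInterpreterOptionsAddDelegate (options, delegate);

  return TfLiteInterpreterCreate (model, options);
}
```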
2019Q4
- InferenceCrop element
- Crop out detection bounding boxes for further processing
- InferenceFilter element
- Allow conditional inference by only processing a buffer if the prediction meets certain criteria
- InferenceBin element
- High-level bin that configures the GstInference elements in their typical configuration.
- GstInferenceMeta
- Big refactor in the way predictions are handled. Instead of having separate detections and classifications, we now have a single hierarchical structure of predictions that are all interrelated. This allows for much more complex use cases (see the traversal sketch after this list).
- TensorFlow Lite backend
- TensorFlow Lite is a stripped-down lightweight version of TensorFlow designed for mobile devices.
- TensorRT backend
- TensorRT is NVIDIA's own inference engine. It makes optimal use of their GPUs and DLAs. While TensorFlow, Caffe or ONNX may run TensorRT underneath, calling it directly resulted in a huge performance boost.
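To give an idea of how the hierarchical predictions look from an application's point of view, here is a sketch that retrieves the meta from a buffer and dumps the prediction tree. The macro, struct fields and helper shown here are illustrative; check gstinferencemeta.h for the actual API.

```c
/* Sketch: read the hierarchical predictions attached to a buffer. The exact
 * macro, field and helper names are illustrative; see gstinferencemeta.h
 * in the project for the real definitions. */
#include <gst/gst.h>
#include "gstinferencemeta.h"

static void
print_predictions (GstBuffer * buffer)
{
  GstInferenceMeta *meta = (GstInferenceMeta *)
      gst_buffer_get_meta (buffer, GST_INFERENCE_META_API_TYPE);

  if (meta == NULL)
    return;

  /* The root prediction covers the whole frame; detections hang from it as
   * children, and each child may carry its own classifications. */
  GstInferencePrediction *root = meta->prediction;
  gchar *desc = gst_inference_prediction_to_string (root);

  g_print ("prediction tree: %s\n", desc);
  g_free (desc);
}
```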
2019Q2-2019Q1
- MobileNetV2
- Popular lightweight classification network
- TinyYoloV3
- Popular lightweight detection network
- ResNet50
- Popular classification network
- i.MX8 Support
- Port and test performance on the i.MX8 embedded platforms
- TensorFlow (CPU) backend
- Add support for inference with TensorFlow models. Likely the most popular machine learning framework nowadays.
- TensorFlow (GPU) backend
- Extend TensorFlow inference to use NVIDIA GPUs.
- NCSDK backend
- Add support for Intel's NCSDK for the Movidius Neural Engine.
- Overlay elements
- Debug elements to draw bounding boxes and labels on top of video frames
- InceptionV4
- State of the art classification architecture
- TinyYoloV2
- Popular lightweight detection architecture.