R2Inference - Introduction

Deep learning has revolutionized classic computer vision techniques, enabling more intelligent and autonomous systems. Deep learning frameworks offer building blocks for designing, training, and validating deep neural networks through a high-level programming interface. However, not every framework is supported on every hardware platform, and porting a deep learning application to new hardware usually means learning a new framework. Changing deep learning architectures is also difficult, because some architectures may not yet be ported to a specific framework. R2Inference is an ongoing open-source project from RidgeRun Engineering that enables framework-independent development through an intuitive API. With R2Inference (R2I), changing frameworks is as easy as compiling the library with a different backend; no code changes are needed.

General Concepts

To solve a specific problem, the deep learning workflow usually has the following steps:

Identify the deep learning task that solves the problem (classification, detection, segmentation, ...).
Get a dataset that matches the problem data and is labeled according to the selected task.
Select a deep learning architecture that solves the task and matches its complexity. For example, it would be overkill to use GoogLeNet for a two-class classification problem. Alternatively, you can design your own architecture or modify an existing one.
Train your architecture with the dataset to produce a model with weights. R2Inference is especially useful here because some frameworks are optimized for training and others for inference.
Optimize the resulting model and deploy it on your device.

Deep Learning Architecture

Sometimes the terms "model" and "architecture" are used interchangeably. When we talk about architecture, we are referring to the model description: the description of each layer that constitutes a deep learning model that solves a specific task. Architectures by themselves can't solve the task; they first need to be trained with a dataset that matches the problem to be solved. The same deep learning architecture can be used to solve different problems as long as they belong to the same task type. For example, GoogLeNet [1] is a classification architecture: the same layer description can be trained on different datasets to classify different sets of objects.

We dive deep into the most popular and successful deep learning architectures on the GstInference supported architectures wiki page.
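
To make the distinction between an architecture and a trained model concrete, the following C++ sketch represents an architecture as nothing more than an ordered list of layer descriptions and counts how many trainable values such a description implies. The ConvLayerSpec type and CountParameters function are hypothetical and are not part of R2Inference or any framework; real layer descriptions carry many more fields (strides, padding, activations, and so on).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, framework-neutral description of one convolutional layer.
// Real frameworks store much richer layer descriptions.
struct ConvLayerSpec {
  std::string name;
  std::size_t kernel_size;  // square kernel: kernel_size x kernel_size
  std::size_t in_channels;
  std::size_t out_channels;
};

// An "architecture" here is only the ordered list of layer descriptions;
// it contains no trained values.
using Architecture = std::vector<ConvLayerSpec>;

// Number of trainable values (weights + biases) the architecture needs.
// Training fills these values in; the architecture only fixes how many exist.
std::size_t CountParameters(const Architecture &arch) {
  std::size_t total = 0;
  for (const ConvLayerSpec &layer : arch) {
    assert(layer.kernel_size > 0);
    std::size_t weights = layer.kernel_size * layer.kernel_size *
                          layer.in_channels * layer.out_channels;
    std::size_t biases = layer.out_channels;  // one bias per output channel
    total += weights + biases;
  }
  return total;
}
```

For a description with two 3x3 convolutions (3 to 16 and 16 to 32 channels), CountParameters returns 5,088. Training assigns values to those parameters; the architecture itself only fixes how many there are and how they are connected.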

Deep Learning Model

A model is the result of training an architecture with a dataset. It is composed of the model description (the architecture) and the trained weights. Some frameworks, such as TensorFlow, split them into two files, while others store both in a single file. It's easy to find pre-trained models online for popular datasets such as ImageNet, but keep in mind that additional operations are sometimes applied to the data during training, and those same operations must also be applied when evaluating the model. We call the operations applied to the image before evaluating the model "pre-processing" and the operations applied to the output of the model "post-processing".
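
As an illustration of pre-processing, the sketch below converts an interleaved 8-bit RGB image into a normalized float buffer by subtracting a per-channel mean and dividing by a per-channel standard deviation, a common choice for image classifiers. The Normalization struct and Preprocess function are hypothetical, and the actual mean and standard-deviation values (or whether normalization is used at all) depend on how the specific model was trained.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-channel normalization parameters. They must match the ones used
// during training; the values a caller passes in are model-specific.
struct Normalization {
  std::array<float, 3> mean;    // subtracted from each channel (0-255 scale)
  std::array<float, 3> stddev;  // divides each channel after subtraction
};

// Pre-processing sketch: converts an interleaved 8-bit RGB image into the
// float buffer a hypothetical model expects: (pixel - mean) / stddev.
std::vector<float> Preprocess(const std::vector<std::uint8_t> &rgb,
                              std::size_t width, std::size_t height,
                              const Normalization &norm) {
  assert(rgb.size() == width * height * 3);
  std::vector<float> out(rgb.size());
  for (std::size_t i = 0; i < rgb.size(); ++i) {
    std::size_t c = i % 3;  // channel index in the interleaved RGB layout
    out[i] = (static_cast<float>(rgb[i]) - norm.mean[c]) / norm.stddev[c];
  }
  return out;
}
```

If a pre-trained model expects inputs normalized this way, feeding it raw pixel values in the 0 to 255 range can silently degrade its predictions, even though the code runs without errors.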

Deep Learning Tasks

The tasks that can be performed by deep convolutional neural networks are virtually endless. R2Inference supports the three most popular ones.

Classification

In classification networks, the input is an image and the output is an array containing the probability that the image belongs to each training class.
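
A minimal post-processing step for classification is picking the class with the highest probability. The sketch below assumes the network output has already been copied into a std::vector<float> with one entry per training class; ClassResult and TopClass are hypothetical names, not R2Inference API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

struct ClassResult {
  std::size_t index;  // position of the class in the training label list
  float probability;  // score the network assigned to that class
};

// Post-processing sketch for classification: return the class with the
// highest probability in the network's output array.
ClassResult TopClass(const std::vector<float> &probabilities) {
  assert(!probabilities.empty());
  auto best = std::max_element(probabilities.begin(), probabilities.end());
  return {static_cast<std::size_t>(std::distance(probabilities.begin(), best)),
          *best};
}
```

The returned index is then typically mapped to a human-readable name using the list of labels the model was trained with.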

Detection

Detection can be viewed as classification on a larger scale. Instead of producing a single class for the whole image, the network can output multiple bounding boxes for one image, each with its corresponding class.
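
Detection outputs are usually handled as a list of boxes. The sketch below assumes each box has already been decoded into pixel coordinates, a class index, and a probability (the BoundingBox struct is hypothetical; real detectors encode their raw output in model-specific ways) and shows one common post-processing step: discarding boxes below a confidence threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One detection: an axis-aligned box in pixel coordinates plus its class.
struct BoundingBox {
  float x;  // left edge
  float y;  // top edge
  float width;
  float height;
  std::size_t class_index;
  float probability;
};

// Detection post-processing sketch: keep only the boxes whose probability
// reaches a caller-chosen threshold, preserving their original order.
std::vector<BoundingBox> FilterByProbability(
    const std::vector<BoundingBox> &boxes, float threshold) {
  assert(threshold >= 0.0f && threshold <= 1.0f);
  std::vector<BoundingBox> kept;
  for (const BoundingBox &box : boxes) {
    if (box.probability >= threshold) {
      kept.push_back(box);
    }
  }
  return kept;
}
```

In practice, detectors often also apply non-maximum suppression afterwards to merge overlapping boxes that describe the same object.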

Segmentation

Segmentation is pixel-wise classification: each pixel in the image is given a probability for each training class.
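
Because segmentation is classification per pixel, a simple post-processing step is to take, for each pixel, the class with the highest probability, producing a class mask. The sketch assumes a height × width × classes output layout in which the class scores of each pixel are stored contiguously; other models may use a channels-first layout, which would change the indexing. ArgmaxMask is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Segmentation post-processing sketch. The network output is assumed to be
// a height x width x num_classes array with each pixel's class scores stored
// contiguously. The result holds the most likely class index per pixel.
std::vector<std::size_t> ArgmaxMask(const std::vector<float> &scores,
                                    std::size_t width, std::size_t height,
                                    std::size_t num_classes) {
  assert(num_classes > 0);
  assert(scores.size() == width * height * num_classes);
  std::vector<std::size_t> mask(width * height);
  for (std::size_t p = 0; p < width * height; ++p) {
    const float *pixel = &scores[p * num_classes];  // scores of pixel p
    std::size_t best = 0;
    for (std::size_t c = 1; c < num_classes; ++c) {
      if (pixel[c] > pixel[best]) {
        best = c;
      }
    }
    mask[p] = best;
  }
  return mask;
}
```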

R2Inference Classes

FrameworkMeta: Describes framework meta information
IFrameworkFactory: Creates interrelated objects of a framework
IEngine: Evaluates IFrames on an IModel
ILoader: Validates an IModel for an IEngine
IModel: Abstracts a framework model
IFrame: Abstracts a data input
IParameters: Sets and gets properties
IPrediction: Abstracts a network output
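
The IFrameworkFactory role in the list above corresponds to the classic abstract factory pattern: one factory per backend creates a family of objects (loader, engine, and so on) that are meant to work together, so application code depends only on the interfaces. The C++ sketch below illustrates that idea with a fake "identity" backend. All class and method names here (sketch::FrameworkFactory, MakeLoader, Predict, and so on) are simplified stand-ins invented for this example; they do not reproduce R2Inference's actual signatures, which should be taken from its API documentation.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Stand-in for the IModel role: an opaque, framework-specific model.
class Model {
 public:
  virtual ~Model() = default;
};

// Stand-in for the IEngine role: runs a model on input data.
class Engine {
 public:
  virtual ~Engine() = default;
  virtual void SetModel(std::shared_ptr<Model> model) = 0;
  virtual std::vector<float> Predict(const std::vector<float> &input) = 0;
};

// Stand-in for the ILoader role: produces a model from a file path.
class Loader {
 public:
  virtual ~Loader() = default;
  virtual std::shared_ptr<Model> Load(const std::string &path) = 0;
};

// Stand-in for the IFrameworkFactory role: every object it creates belongs
// to the same backend, so the objects are meant to work together.
class FrameworkFactory {
 public:
  virtual ~FrameworkFactory() = default;
  virtual std::unique_ptr<Loader> MakeLoader() = 0;
  virtual std::unique_ptr<Engine> MakeEngine() = 0;
};

// A fake "identity" backend so the sketch runs without any real framework.
class IdentityModel : public Model {};

class IdentityLoader : public Loader {
 public:
  std::shared_ptr<Model> Load(const std::string &) override {
    return std::make_shared<IdentityModel>();  // ignores the path
  }
};

class IdentityEngine : public Engine {
 public:
  void SetModel(std::shared_ptr<Model> model) override {
    model_ = std::move(model);
  }
  std::vector<float> Predict(const std::vector<float> &input) override {
    assert(model_ != nullptr);  // a model must be set before predicting
    return input;               // the fake backend just echoes its input
  }

 private:
  std::shared_ptr<Model> model_;
};

class IdentityFactory : public FrameworkFactory {
 public:
  std::unique_ptr<Loader> MakeLoader() override {
    return std::make_unique<IdentityLoader>();
  }
  std::unique_ptr<Engine> MakeEngine() override {
    return std::make_unique<IdentityEngine>();
  }
};

// Application code depends only on the abstract interfaces; switching the
// backend means passing a different factory.
std::vector<float> RunInference(FrameworkFactory &factory,
                                const std::string &model_path,
                                const std::vector<float> &input) {
  std::unique_ptr<Loader> loader = factory.MakeLoader();
  std::unique_ptr<Engine> engine = factory.MakeEngine();
  engine->SetModel(loader->Load(model_path));
  return engine->Predict(input);
}

}  // namespace sketch
```

Supporting another framework in this scheme means writing another factory and its concrete classes; RunInference itself does not change, which mirrors the "no code changes" goal described in the introduction.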

References

[1] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.
