GstInference with MobileNet
Description
GoogLeNet is an image classification convolutional neural network. It was designed to participate in the ImageNet challenge, a competition where research teams evaluate classification algorithms on the ImageNet data set and compete to achieve higher accuracy. ImageNet is a collection of hand-labeled images from 1000 distinct categories. GoogLeNet won the ImageNet challenge in 2014 using twelve times fewer parameters than the previous winner. Its authors achieved this by incorporating computer vision concepts into the "inception layer".
Architecture
The MobileNetV2 architecture consists of an initial fully convolutional layer with 32 filters, followed by 19 residual bottleneck layers. It uses ReLU6 as the non-linearity because of its robustness when used with low-precision computation.
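As a minimal illustration of the ReLU6 non-linearity mentioned above, the following sketch clamps each activation to the range [0, 6]. The function name `relu6` and the sample values are chosen here for illustration only and are not part of GstInference.

```python
def relu6(x: float) -> float:
    """Clamp x to [0, 6]: negative values become 0, values above 6 become 6."""
    return min(max(0.0, x), 6.0)


# Illustrative activations, not taken from a real network.
activations = [-3.0, 0.5, 4.0, 9.0]
clipped = [relu6(v) for v in activations]
print(clipped)  # [0.0, 0.5, 4.0, 6.0]
```

Bounding the output keeps activations within a small, fixed range, which is what makes the function friendly to low-precision arithmetic.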
GStreamer Plugin
The GStreamer plugin uses the pre-process and post-process described in the original paper. Keep in mind that not all deep neural networks are trained the same way, even if they use the same model architecture. If the model was trained differently, details such as label ordering, input dimensions, and color normalization can change.
This element was based on this TensorFlow repo (see this table for pre-trained weights). The pre-trained model used to test the element may be downloaded from our R2I Model Zoo for the different frameworks.
Pre-process
Input parameters:
- Input size: 224 x 224
- Format: BGR
The pre-process consists of taking the input image and transforming it to the input size (by scaling, interpolation, cropping...). Then the mean is subtracted from each pixel in RGB. Finally, the image is converted to BGR by reversing the order of the channels.
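The three pre-process steps described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the element's actual implementation: the nearest-neighbor resize is just one of the possible scaling strategies, `MEAN_RGB` is a hypothetical placeholder (the source does not state the mean values), and the helper names `resize_nearest` and `preprocess` are invented for this example.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]  # rows of (R, G, B) pixels

INPUT_SIZE = 224
# Hypothetical per-channel mean in RGB order; replace it with the
# values the model was actually trained with.
MEAN_RGB: Pixel = (128.0, 128.0, 128.0)


def resize_nearest(image: Image, width: int, height: int) -> Image:
    """Scale an image to width x height using nearest-neighbor sampling."""
    src_h = len(image)
    src_w = len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def preprocess(image: Image, size: int = INPUT_SIZE,
               mean_rgb: Pixel = MEAN_RGB) -> Image:
    """Resize, subtract the RGB mean, then reverse channels to BGR."""
    scaled = resize_nearest(image, size, size)
    result = []
    for row in scaled:
        out_row = []
        for r, g, b in row:
            # Mean subtraction is done while the pixel is still in RGB order.
            r, g, b = r - mean_rgb[0], g - mean_rgb[1], b - mean_rgb[2]
            # Reversing the channel order yields BGR.
            out_row.append((b, g, r))
        result.append(out_row)
    return result


# Usage: a tiny 4x2 synthetic frame filled with a single RGB color.
frame = [[(255.0, 0.0, 10.0)] * 4 for _ in range(2)]
tensor = preprocess(frame)
print(len(tensor), len(tensor[0]))  # 224 224
print(tensor[0][0])                 # (-118.0, -128.0, 127.0)
```

A model trained with a different input size, mean, or channel order would need the corresponding constants changed.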
Post-process
The model output is a float array of size 1000 containing the probability for each of the ImageNet labels. The post-process consists of simply searching for the highest probability in the array.
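The search described above amounts to an argmax over the output array. A minimal sketch, assuming the output is available as a Python list of floats (the function name `top1` and the synthetic output are illustrative only):

```python
from typing import List, Tuple


def top1(probabilities: List[float]) -> Tuple[int, float]:
    """Return the index and value of the highest probability."""
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    best_index = max(range(len(probabilities)),
                     key=probabilities.__getitem__)
    return best_index, probabilities[best_index]


# Synthetic 1000-element output with a single dominant class.
output = [0.0] * 1000
output[285] = 0.9
index, probability = top1(output)
print(index, probability)  # 285 0.9
```

The returned index only maps to a meaningful label when the label file follows the same ordering the model was trained with.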
References
- Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen, “MobileNetV2: Inverted Residuals and Linear Bottlenecks”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520, 2018.