GstInference with MobileNet

From RidgeRun Developer Wiki








Description

GoogLeNet is an image classification convolutional neural network. It was designed to participate in the ImageNet challenge, a competition where research teams evaluate classification algorithms on the ImageNet data set and compete to achieve higher accuracy. ImageNet is a collection of hand-labeled images from 1000 distinct categories. GoogLeNet won the ImageNet challenge in 2014 using twelve times fewer parameters than the previous winner. It achieved this by incorporating computer vision concepts in the "inception layer".

Architecture

The MobileNetV2 architecture starts with an initial full convolution layer with 32 filters, followed by 19 residual bottleneck layers. It uses ReLU6 as the non-linearity because of its robustness in low-precision computation [1].

Mobilenet architecture [1]
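
As a small illustration of the ReLU6 non-linearity mentioned above, the following sketch clamps each activation to the range [0, 6]. It is a plain-Python illustration, not the code used by the plugin or by any framework; the function name relu6 and the sample values are chosen here for the example.

```python
def relu6(x: float) -> float:
    """Clamp an activation to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


# Illustrative sample activations (not taken from a real model).
values = [-3.0, 0.5, 6.0, 42.0]
print([relu6(v) for v in values])  # [0.0, 0.5, 6.0, 6.0]
```

Bounding the output at 6 keeps activations within a small, fixed range, which is what makes the function friendly to low-precision arithmetic.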

GStreamer Plugin

The GStreamer plugin uses the pre-processing and post-processing described in the original paper. Keep in mind that not all deep neural networks are trained the same way, even if they share the same model architecture. If the model is trained differently, details like label ordering, input dimensions, and color normalization can change.

This element was based on this TensorFlow repo (see this table for pre-trained weights). The pre-trained model used to test the element may be downloaded from our R2I Model Zoo for the different frameworks.

Pre-process

Input parameters:

  • Input size: 224 x 224
  • Format: BGR

The pre-process consists of taking the input image and transforming it to the input size (by scaling, interpolation, cropping...). Then the mean is subtracted from each pixel while the data is still in RGB order. Finally, the image is converted to BGR by reversing the order of the channels.
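
The three steps above (resize, mean subtraction in RGB, channel swap to BGR) can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the element's actual implementation: the nearest-neighbor resize stands in for whatever scaling the pipeline really uses, and the mean values in ASSUMED_MEAN_RGB are placeholders, since the mean depends on how the model was trained.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]  # rows of (R, G, B) pixels

INPUT_SIZE = 224  # input size listed in the parameters above

# Assumption: placeholder per-channel mean in RGB order. Replace it with
# the values that match the training of the model being used.
ASSUMED_MEAN_RGB = (128.0, 128.0, 128.0)


def resize_nearest(image: Image, width: int, height: int) -> Image:
    """Resize with nearest-neighbor sampling (a stand-in for real scaling)."""
    src_h = len(image)
    src_w = len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def preprocess(image: Image,
               mean_rgb: Pixel = ASSUMED_MEAN_RGB,
               size: int = INPUT_SIZE) -> Image:
    """Resize, subtract the mean in RGB order, then reorder to BGR."""
    resized = resize_nearest(image, size, size)
    output = []
    for row in resized:
        out_row = []
        for r, g, b in row:
            r, g, b = r - mean_rgb[0], g - mean_rgb[1], b - mean_rgb[2]
            out_row.append((b, g, r))  # reverse channel order: RGB -> BGR
        output.append(out_row)
    return output


# Tiny 2x2 RGB test image, upscaled to 224x224.
tiny = [[(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (255, 255, 255)]]
result = preprocess(tiny)
print(len(result), len(result[0]))  # 224 224
print(result[0][0])                 # (-128.0, -128.0, 127.0)
```

The top-left pixel is pure red, so after mean subtraction and the channel swap its red component ends up last in the tuple.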

Post-process

The model output is a float array of size 1000 containing the probability for each of the ImageNet labels. The post-process consists of simply searching for the highest probability in the array.
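
The search described above amounts to an argmax over the output array. A minimal sketch, assuming the probabilities and the label list share the same ordering; the function name top_prediction and the generated label names are hypothetical and only serve the example:

```python
from typing import List, Sequence, Tuple


def top_prediction(probabilities: Sequence[float],
                   labels: Sequence[str]) -> Tuple[int, str, float]:
    """Return (index, label, probability) of the most likely class."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best, labels[best], probabilities[best]


# Hypothetical labels and a fake model output with one dominant class.
labels: List[str] = [f"class_{i}" for i in range(1000)]
probabilities = [0.0] * 1000
probabilities[281] = 0.9

print(top_prediction(probabilities, labels))  # (281, 'class_281', 0.9)
```

Because label ordering can differ between trained models, the label list has to come from the same source as the weights for the returned name to be meaningful.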

References

  1. Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen, “MobileNetV2: Inverted Residuals and Linear Bottlenecks”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520, 2018.

