GstInference with InceptionV2 layer

From RidgeRun Developer Wiki








Description

GoogLeNet is a convolutional neural network for image classification. It was designed for the ImageNet challenge, a competition in which research teams evaluate classification algorithms on the ImageNet data set and compete for the highest accuracy. The challenge data set consists of hand-labeled images from 1000 distinct categories. GoogLeNet won the ImageNet challenge in 2014 while using twelve times fewer parameters than the 2012 winning architecture of Krizhevsky et al. (AlexNet) [1]. Its authors achieved this by building established computer vision ideas, such as multi-scale processing, into a new building block called the "inception layer".

Architecture

The network builds on the LeNet CNN design but adds a novel element, the inception layer. Each inception layer applies several small convolutions (1x1, 3x3 and 5x5) in parallel, and uses 1x1 convolutions to shrink the number of channels before the larger ones, which keeps the number of parameters low. The architecture stacks 9 inception layers, giving a CNN that is 22 layers deep.
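The parameter saving from small convolutions can be illustrated with a little arithmetic. The sketch below compares the weight count of a 5x5 convolution applied directly with the same convolution preceded by a 1x1 channel reduction, as described in [1]. The channel counts and the helper name `conv_weights` are illustrative assumptions, not values taken from this page.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


# Illustrative channel counts (assumed for this example).
in_channels = 192
reduced_channels = 16
out_channels = 32

# A 5x5 convolution applied directly to all input channels.
direct = conv_weights(5, in_channels, out_channels)

# A 1x1 convolution that reduces the channels, followed by the 5x5 convolution.
reduced = (conv_weights(1, in_channels, reduced_channels)
           + conv_weights(5, reduced_channels, out_channels))

print(direct)   # 153600
print(reduced)  # 15872
```

With these assumed sizes, the reduced branch needs roughly a tenth of the weights of the direct convolution.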

Figure: GoogLeNet architecture [1]

GStreamer Plugin

The GStreamer plugin uses the pre-process and post-process described in the original paper. Keep in mind that deep neural networks are not all trained the same way, even when they share a model architecture. If a model is trained differently, details such as label ordering, input dimensions, and color normalization can change.

This element was based on the ncappzoo repo. The pre-trained model used to test the element may be downloaded from our R2I Model Zoo for the different frameworks.

Pre-process

Input parameters:

  • Input size: 224 x 224
  • RGB Mean: [0.40787054 * 255, 0.45752458 * 255, 0.48109378 * 255]
  • Format: BGR

The pre-process first transforms the input image to the input size (by scaling, interpolation, cropping...). Then the per-channel mean is subtracted from the RGB values of each pixel. Finally, the image is converted to BGR by reversing the order of the channels.
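The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not the element's actual implementation: the function name `preprocess`, the nested-list image layout, and the nearest-neighbor scaling are assumptions made for the example. The mean values are applied to R, G and B in the order they are listed above, so the channel order should be verified against the model actually used.

```python
INPUT_SIZE = 224
# Mean values exactly as listed above, applied in the listed order.
MEAN = [0.40787054 * 255, 0.45752458 * 255, 0.48109378 * 255]


def preprocess(image, size=INPUT_SIZE, mean=MEAN):
    """Scale, mean-subtract and reorder an image.

    image: list of rows, each row a list of [R, G, B] pixels.
    Returns a size x size list of [B, G, R] float pixels.
    """
    src_h, src_w = len(image), len(image[0])
    output = []
    for y in range(size):
        # Nearest-neighbor scaling (an assumption; any interpolation works).
        src_row = image[y * src_h // size]
        row = []
        for x in range(size):
            pixel = src_row[x * src_w // size]
            # Subtract the per-channel mean from the RGB values.
            centered = [pixel[c] - mean[c] for c in range(3)]
            # Reverse the channel order: RGB -> BGR.
            row.append(centered[::-1])
        output.append(row)
    return output


# A tiny 2x2 RGB image used only to exercise the function.
tiny = [[[255, 128, 0], [0, 0, 0]],
        [[10, 20, 30], [255, 255, 255]]]
result = preprocess(tiny)
print(len(result), len(result[0]))  # 224 224
```

A real pipeline would normally do the scaling and color conversion with upstream elements or an image library; the loops here only make each step explicit.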

Post-process

The model output is a float array of size 1000 containing the probability of each ImageNet label. The post-process simply finds the highest probability in the array; its index identifies the predicted label.
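As a minimal sketch of this search, the snippet below returns the index and value of the largest entry in a probability list. The function name `top_prediction` and the sample probabilities are made up for illustration; mapping the index to a label name depends on the label file shipped with the model.

```python
def top_prediction(probabilities):
    """Return (index, probability) of the most likely class."""
    best_index = max(range(len(probabilities)),
                     key=lambda i: probabilities[i])
    return best_index, probabilities[best_index]


# Fake model output: 1000 probabilities, mostly zero (illustrative values).
probabilities = [0.0] * 1000
probabilities[281] = 0.9
probabilities[285] = 0.1

index, probability = top_prediction(probabilities)
print(index, probability)  # 281 0.9
```

If several entries tie for the maximum, this version returns the first one.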

Examples

Please refer to the GoogLeNet section on the examples page.

References

  1. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9, 2015.

