GstInference with FaceNet

From RidgeRun Developer Wiki




Previous: Supported architectures/InceptionV4 Index Next: Supported architectures/TinyYoloV2





Description

FaceNet is a convolutional neural network used for face recognition. The CNN maps input images to a Euclidean space, where the distance between points in this space corresponds to face similarity. The vector that describes a point in this space is called an embedding. The FaceNet architecture is similar to other classification CNNs, but its training process introduces a novel triplet loss method, optimized to compute a 128-dimensional embedding that highlights the similarity or difference between faces.
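The triplet loss mentioned above can be sketched numerically: it pushes the distance between an anchor and a positive (same identity) to be smaller, by a margin, than the distance between the anchor and a negative (different identity). This is a minimal NumPy sketch of the loss for a single triplet; the margin value 0.2 is the one reported in the FaceNet paper [1], and the function name is illustrative, not part of any plugin API.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss for one (anchor, positive, negative) embedding triple.

    Penalizes the case where the anchor-positive squared distance is not
    at least `alpha` smaller than the anchor-negative squared distance.
    """
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance, same identity
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance, different identity
    return float(max(d_pos - d_neg + alpha, 0.0))
```

During training this loss is minimized over many triplets, which is what shapes the embedding space so that Euclidean distance tracks face similarity.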

Architecture

As mentioned before, FaceNet's novelty is not its architecture but its training process. The authors tried four different CNN architectures, and in the end Inception v1 produced the best results [1].

GoogLeNet architecture [2]

GStreamer Plugin

The GStreamer plugin uses the same pre-processing and post-processing as InceptionV1 and InceptionV2. Keep in mind that not all deep neural networks are trained the same way, even when they share the same model architecture: if a model is trained differently, details such as label ordering, input dimensions, and color normalization can change.

This element was based on the ncappzoo repo. The pre-trained model used to test the element may be downloaded from our R2I Model Zoo for the different frameworks.

Pre-process

Input parameters:

  • Input size: 160 x 160
  • RGB mean: calculated for each input
  • RGB std: calculated for each input
  • Format: RGB

The pre-process consists of scaling the input image to the input size. The mean and standard deviation are calculated for each input image; the mean is then subtracted from each RGB pixel, and the result is divided by the standard deviation. No color space conversion is needed because the model was trained with RGB.
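The normalization step above can be sketched as follows. This is a minimal illustration, assuming the mean and standard deviation are computed over all pixels and channels of the already-resized 160x160 RGB image (per-channel statistics would be a straightforward variant); the function name is hypothetical and not part of the plugin.

```python
import numpy as np

def preprocess(image):
    """Normalize an RGB image as described: subtract the per-image mean
    from every pixel and divide by the per-image standard deviation.

    `image` is expected to be a HxWx3 array already scaled to the
    model's input size (160x160).
    """
    image = image.astype(np.float32)
    mean = image.mean()  # statistics are computed per input image,
    std = image.std()    # not taken from a fixed training-set value
    return (image - mean) / std
```

After this step the tensor fed to the network has zero mean and unit standard deviation, regardless of the brightness or contrast of the original image.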

Post-process

The model output is a float array of size 128 containing an embedding that describes the input image. The post-process consists of simply forwarding this array. The Euclidean distance between different arrays produced by the facenet element determines the face similarity between its input images.
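The distance computation downstream of the element can be sketched like this: given two 128-element embeddings produced by the facenet element, their Euclidean distance measures how similar the two faces are, with smaller values meaning more similar. The helper name and any decision threshold are illustrative assumptions; a real application would tune the threshold on its own data.

```python
import numpy as np

def face_distance(emb_a, emb_b):
    """Euclidean distance between two 128-dimensional FaceNet embeddings.

    A small distance indicates the two input images likely show the
    same face; a large distance indicates different faces.
    """
    emb_a = np.asarray(emb_a, dtype=np.float32)
    emb_b = np.asarray(emb_b, dtype=np.float32)
    return float(np.linalg.norm(emb_a - emb_b))
```

Comparing an embedding against a gallery of known faces then reduces to taking the minimum distance and checking it against a threshold.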

Examples

Please refer to the FaceNet section on the examples page.

References

  1. Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face recognition and clustering." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 815-823. 2015.
  2. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions”, in proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9, 2015.

