GstInference with FaceNet
Description
FaceNet is a convolutional neural network used for face recognition. The CNN maps input images to a Euclidean space, where the distance between points in this space corresponds to face similarity. The vector that describes a point in this space is called an embedding. The FaceNet architecture is similar to other classification CNNs, but its training process uses a novel triplet loss. This loss is optimized so that the network produces a compact 128-dimensional embedding that highlights the similarity or difference between faces.
Architecture
As mentioned before, FaceNet's novelty is not its architecture but rather its training process. The authors tried out four different CNN architectures, and in the end Inception v1 produced the best results [1].
GStreamer Plugin
The GStreamer plugin uses the same pre-processing and post-processing as InceptionV1 and InceptionV2. Please take into consideration that not all deep neural networks are trained the same way, even if they use the same model architecture. If the model is trained differently, details like label ordering, input dimensions, and color normalization can change.
This element was based on the ncappzoo repository. The pre-trained models used to test the element may be downloaded from our R2I Model Zoo for each of the supported frameworks.
Pre-process
Input parameters:
- Input size: 160 x 160
- RGB Mean: Calculated for each input
- RGB STD: Calculated for each input
- Format: RGB
The pre-process consists of resizing the input image to the input size. The mean and standard deviation are computed for each input image; the mean is then subtracted from every pixel on each RGB channel, and the result is divided by the standard deviation. No color space conversion is needed, because the model was trained with RGB images.
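The per-image standardization described above can be sketched as follows. This is an illustrative NumPy sketch, not the plugin's actual implementation; it assumes the input is already an RGB array resized to 160x160 (the resize step, e.g. with OpenCV or PIL, is omitted), and that the mean and standard deviation are taken over the whole image rather than per channel.

```python
import numpy as np

def preprocess(image):
    """Standardize one RGB image using its own mean and std.

    `image`: array of shape (160, 160, 3), assumed already resized.
    Returns a float32 array with (approximately) zero mean and unit std.
    """
    image = image.astype(np.float32)
    mean = image.mean()   # computed per input image, as described above
    std = image.std()
    return (image - mean) / std
```

Because the statistics are recomputed per image, no fixed dataset-wide mean or standard deviation needs to be stored alongside the model.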
Post-process
The model output is a float array of size 128 containing an embedding that describes the input image. The post-process consists of simply forwarding this array. The Euclidean distance between different arrays produced by the facenet element determines the face similarity between its input images.
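Comparing two outputs of the element then reduces to a Euclidean distance between the 128-element arrays, which can be sketched as below. The function name is illustrative, not part of the plugin's API; in practice, two embeddings whose distance falls under some application-chosen threshold are treated as the same face.

```python
import numpy as np

def face_distance(emb_a, emb_b):
    """Euclidean (L2) distance between two 128-dimensional embeddings.

    Smaller distances indicate more similar faces.
    """
    a = np.asarray(emb_a, dtype=np.float32)
    b = np.asarray(emb_b, dtype=np.float32)
    return float(np.linalg.norm(a - b))
```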
Examples
Please refer to the FaceNet section on the examples page.
References
- ↑ Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "FaceNet: A Unified Embedding for Face Recognition and Clustering." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815-823. 2015.
- ↑ C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. "Going Deeper with Convolutions." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9. 2015.