R2Inference - TensorFlow-Lite





Previous: Supported_backends/TensorFlow Index Next: Supported_backends/Caffe




TensorFlow Lite is an open-source software library that is part of TensorFlow™. It provides a deep learning framework for on-device inference. TensorFlow Lite models can be used on Android and iOS, as well as on embedded systems such as the Raspberry Pi and Arm64-based boards. The TensorFlow Lite backend supports both regular and quantized .tflite models.

Installation

The R2Inference TensorFlow Lite backend depends on the C/C++ TensorFlow API. The installation process consists of downloading the source code, building it, and installing the resulting library.

The TensorFlow Python API and utilities can be installed with Python pip. They are not needed by R2Inference, but they are highly recommended if you need to generate or convert models.
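For reference, a typical way to get the Python utilities (including the tflite_convert tool used later in this page) is a pip install; pinning the version to match the source branch used in the build steps below is only a suggestion:

pip3 install tensorflow==2.0.1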

X86

You can install the C/C++ TensorFlow API for x86 by following these steps:

  • Build and install Tensorflow Lite

Download Tensorflow source code:

git clone -b v2.0.1 https://github.com/tensorflow/tensorflow
export TensorflowPath=/PATH/TENSORFLOW/SRC
cd $TensorflowPath/tensorflow/lite/tools/make

Download dependencies:

./download_dependencies.sh


Build:

./build_lib.sh

Copy the static library to the libraries path:

cp gen/linux_x86_64/lib/libtensorflow-lite.a /usr/local/lib/

Install abseil dependency:

cd downloads/absl/
mkdir build && cd build
cmake ..
make && sudo make install
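
Once the static library and the abseil dependency are installed, a quick way to verify that everything links correctly is to build a minimal program against them. The following is a sketch only: the file name, include paths, and linker flags are assumptions that depend on where your TensorFlow source tree is located.

// test_tflite.cc: load a .tflite model and allocate its tensors
#include <cstdio>
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main (int argc, char *argv[]) {
  if (argc < 2) {
    fprintf (stderr, "Usage: %s MODEL.tflite\n", argv[0]);
    return 1;
  }

  /* Load the model from the file given on the command line */
  auto model = tflite::FlatBufferModel::BuildFromFile (argv[1]);
  if (!model) return 1;

  /* Build an interpreter with the built-in operator resolver */
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder (*model, resolver) (&interpreter);
  if (!interpreter || interpreter->AllocateTensors () != kTfLiteOk) return 1;

  printf ("Model loaded with %zu input tensor(s)\n", interpreter->inputs ().size ());
  return 0;
}

A possible compile line (again, adjust the include paths to your setup) is:

g++ test_tflite.cc -I$TensorflowPath -I$TensorflowPath/tensorflow/lite/tools/make/downloads/flatbuffers/include -L/usr/local/lib -ltensorflow-lite -lpthread -ldl -o test_tflite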

Cross-compile for ARM64

First, install the cross-compilation toolchain and the required libraries:

sudo apt-get update
sudo apt-get install crossbuild-essential-arm64

Download Tensorflow source code:

export TensorflowPath=/PATH/TENSORFLOW/SRC
git clone -b v2.0.1 https://github.com/tensorflow/tensorflow
cd $TensorflowPath

Download the build dependencies:

./tensorflow/lite/tools/make/download_dependencies.sh

Then compile:

./tensorflow/lite/tools/make/build_aarch64_lib.sh

The static library is generated in: tensorflow/lite/tools/make/gen/linux_aarch64/lib/libtensorflow-lite.a.

Nvidia Jetson (TX1, TX2, Xavier, Nano)

You can install the C/C++ TensorFlow API for Jetson devices by following these steps:

  • Build and install Tensorflow Lite

Download Tensorflow source code:

export TensorflowPath=/PATH/TENSORFLOW/SRC
git clone -b v2.0.1 https://github.com/tensorflow/tensorflow
cd $TensorflowPath/tensorflow/lite/tools/make

Download dependencies:

./download_dependencies.sh

Build:

./build_aarch64_lib.sh

Copy the static library to the libraries path:

cp gen/aarch64_armv8-a/lib/libtensorflow-lite.a /usr/local/lib/

Install abseil dependency:

cd downloads/absl/
mkdir build && cd build
cmake ..
make && sudo make install

Generating a model for R2I

In TensorFlow, all file formats are based on protocol buffers. In short, protocol buffers (or protobuf, as they are referred to in the documentation) are data structures for which there is a set of tools to generate classes in C, Python, and other languages, so the data can be loaded, saved, and accessed across the supported APIs. More information about TensorFlow model files can be found here. Generating a graph model suitable for GstInference on the TensorFlow backend can be summarized in three main steps:

  1. Save the graph structure that describes your model
  2. Save checkpoint of your model training session (Session variables)
  3. Combine the graph structure with the checkpoint data (this step is typically referred to as freezing the graph)

Saving a session with TensorFlow python API

In TensorFlow, you can use a saver object to handle saving and restoring your model graph metadata and the checkpoint (variables) data. In general terms, outside a TensorFlow session a graph contains only the information describing the mathematical operations that are performed, while the variables are given particular values inside a session. Typically, after training your model, you use a saver object to save both the graph structure and a data checkpoint. The following is an example when working on the TensorFlow default graph:

#! /usr/bin/env python3

# file name is model_graph.py
import os

import tensorflow as tf

dir = os.path.dirname(os.path.realpath(__file__))

# Perform your graph construction here: the variables must exist
# before the saver is created.

# The saver handles saving the graph metadata and the checkpoint data.
default_saver = tf.train.Saver()

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())

  # Perform your training here.

  # Save the graph structure and the session variables.
  default_saver.save(sess, dir + '/model_graph.chkp')

This will generate 4 files:

  • model_graph.chkp.meta: Graph data and metadata (operations, configurations, etc.); allows you to load the graph and retrain it.
  • model_graph.chkp.index: A key-value table linking each tensor name to the location of its data in the chkp.data files.
  • model_graph.chkp.data-00000-of-00001: Holds all the variables (including the weights of the graph) from the session at different timestamps.
  • checkpoint: A file that keeps a record of the latest checkpoint files saved.

The most important files are the chkp.meta and chkp.data files. On a directory containing the files generated by the saver object, you can use the freeze_graph.py tool provided with TensorFlow to combine them into a frozen protocol buffer file suitable for GstInference, as shown in the sketch below.
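
The following is a minimal sketch of the freezing step using the files produced by the saver example above; the output node name (which you can determine with TensorBoard, as described later) and the output file name are placeholders that depend on your model. If the freeze_graph entry point is not available in your installation, the same script can be run as python3 -m tensorflow.python.tools.freeze_graph.

freeze_graph \
  --input_meta_graph=model_graph.chkp.meta \
  --input_checkpoint=model_graph.chkp \
  --input_binary=true \
  --output_node_names=output \
  --output_graph=frozen_graph.pb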

You can refer to R2Inference Model Zoo for pre-trained models suitable for evaluating GstInference.

Create a model using saved weights from a saved model

This example code takes a saved model from a directory, converts it to a TensorFlow Lite model, and writes it to a .tflite file.

import tensorflow as tf

# Construct a basic model.
root = tf.train.Checkpoint()
root.v1 = tf.Variable(3.)
root.v2 = tf.Variable(2.)
root.f = tf.function(lambda x: root.v1 * root.v2 * x)

# Save the model.
export_dir = "/tmp/test_saved_model"
input_data = tf.constant(1., shape=[1, 1])
to_save = root.f.get_concrete_function(input_data)
tf.saved_model.save(root, export_dir, to_save)

# Convert the model.
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_model = converter.convert()

# Write the converted model to a .tflite file.
with open("converted_model.tflite", "wb") as f:
    f.write(tflite_model)
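
Before handing the converted file to GstInference, you may want to verify that it loads and runs with the TensorFlow Lite Python interpreter. The following is a minimal sketch; the file name matches the one written above and the input shape is read from the model itself:

import numpy as np
import tensorflow as tf

# Load the converted model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

# Query the input and output tensor details.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run a dummy inference with random data of the expected shape.
input_data = np.random.random_sample(input_details[0]['shape']).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']))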

Tools

Convert tensorflow frozen graph to tflite

If the TensorFlow Python API is installed on the system, you will find a tool to convert a TensorFlow frozen graph (.pb) to the TensorFlow Lite format (.tflite). To convert a model, run:

export OUTPUT_FILE=/PATH/TO/OUTPUT_FILE
export GRAPH_FILE=/PATH/TO/GRAPH_FILE

tflite_convert \
  --output_file=$OUTPUT_FILE \
  --graph_def_file=$GRAPH_FILE \
  --input_arrays=input \
  --output_arrays=output \
  --enable_v1_converter

Where input_arrays is the name of the input node of the model and output_arrays is the name of the output node.
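
The same conversion can also be performed from Python. The following is a minimal sketch assuming a TensorFlow 2.x installation; it reuses the path placeholders and the hypothetical input/output node names from the command above:

import tensorflow as tf

# Convert a frozen graph (.pb) to the TensorFlow Lite format.
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="/PATH/TO/GRAPH_FILE",
    input_arrays=["input"],
    output_arrays=["output"])
tflite_model = converter.convert()

# Write the result to the output file.
with open("/PATH/TO/OUTPUT_FILE", "wb") as f:
    f.write(tflite_model)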

Tensorboard

TensorBoard is a visualization tool for TensorFlow. You can use TensorBoard to visualize your TensorFlow graph, plot quantitative metrics about the execution of your graph, and show additional data such as images that pass through it. To use TensorBoard you only need to install the TensorFlow core; installing TensorFlow via pip should also automatically install TensorBoard. This tool is especially useful for determining the input and output layer names of undocumented graphs. TensorBoard can load any TensorFlow checkpoint generated with the same version (loading a checkpoint generated with a different TensorFlow version will result in errors).

tensorboard --logdir=path/to/checkpoint/dir

You will get a message similar to this:

TensorBoard 1.10.0 at http://mtaylor-laptop:6006 (Press CTRL+C to quit)

Open that address in your browser, go to the graph tab, and analyze the graph to determine the output node name. In this example the output node name is ArgMax because its input is the resnet_model/final_dense signal.

 
[Figure: Resnet output node]
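
If you prefer not to launch TensorBoard, the node names can also be listed directly from the checkpoint metadata in Python. The following is a minimal sketch using the TF 1.x-style API (available under tf.compat.v1 in TensorFlow 2.x); the .meta file name matches the saver example above:

import tensorflow as tf

# Import the graph structure stored in the .meta file.
tf.compat.v1.reset_default_graph()
tf.compat.v1.train.import_meta_graph('model_graph.chkp.meta')

# Print every node name in the graph; the first and last entries are
# usually good candidates for the input and output node names.
graph_def = tf.compat.v1.get_default_graph().as_graph_def()
for node in graph_def.node:
    print(node.name)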

API

You can find the full documentation of the C API here and the Python API here. R2Inference uses only the C API; it takes care of the session, loading the graph, and executing it. Because of this, we will only take a look at the options that you can change when using the C API through R2Inference.

R2Inference changes the options of the framework via the "IParameters" class. First you need to create an object:

r2i::RuntimeError error;
std::shared_ptr<r2i::IParameters> parameters = factory->MakeParameters (error);

Then call the "Set" or "Get" virtual functions:

parameters->Set(<option>, <value>)
parameters->Get(<option>, <value>)

Tensorflow Lite Options

Property             C API Counterpart                           Value     Operation   Description
number_of_threads    Interpreter->SetNumThreads                  Integer   R/W         Number of threads to run
allow_fp16           Interpreter->SetAllowFp16PrecisionForFp32   Integer   R/W         Allow the usage of 16-bit floating point instead of 32-bit
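
As a sketch of how these options map to the parameters object created above, the following sets them and reads one back. The exact Set/Get overloads are assumptions based on the generic R2Inference parameter API shown earlier, so check the IParameters header for the actual signatures:

r2i::RuntimeError error;

/* Run the inference with 4 threads */
error = parameters->Set ("number_of_threads", 4);

/* Allow 16-bit floating point computations for 32-bit float models */
error = parameters->Set ("allow_fp16", 1);

/* Read the value back to verify the setting */
int threads = 0;
error = parameters->Get ("number_of_threads", threads);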




Previous: Supported_backends/TensorFlow Index Next: Supported_backends/Caffe