NVIDIA Jetson Xavier - Parsing a Tensorflow model for TensorRT

TensorRT can also be used on previously generated TensorFlow models to achieve faster inference times. This is the more common deployment scenario: the convolutional neural network is trained on a host with more resources and then transferred to an embedded system for inference.

At the end of this guide you will be able to:

  • Convert a trained TensorFlow model or checkpoint to a frozen graph.
  • Generate a UFF file from a frozen graph.
  • Run a TensorRT inference engine on Xavier.

To follow this guide you will need:

  • Jetson Xavier with JetPack 4.1

And a host development computer with:

  • TensorFlow
  • TensorRT
  • A trained TensorFlow model

Step 1: Install Prerequisites

  1. Install JetPack
  2. Install Tensorflow on host
  3. Follow the instructions to train a model on the host, or use another trained model of your choosing.
  4. Install TensorRT and its tools on the host computer. First, download the .deb package from the NVIDIA download page, then install TensorRT:
sudo dpkg -i <your-deb-package>
sudo apt-get update
sudo apt-get install tensorrt
sudo apt-get install python-libnvinfer-dev
sudo apt-get install uff-converter-tf
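
After installation, you can optionally sanity-check that the Python bindings are importable on the host. This is just a quick check, not part of the official install steps, and it assumes the bindings expose __version__ as in recent releases:

# Optional sanity check: confirm the TensorRT and UFF Python bindings
# are importable. The version string varies with your TensorRT release.
import tensorrt as trt
import uff

print(trt.__version__)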

Step 2: Generate the UFF

UFF (Universal Framework Format) is the unified format TensorRT uses to describe neural networks.
There are two ways to generate a TensorRT engine from TensorFlow. If you are training your own model, you can add UFF conversion to the Python training script and generate the UFF directly from the model stream. If you already have a trained checkpoint or a frozen graph, you can convert the frozen protobuf file to UFF.

TensorFlow model stream to UFF

This step is done on the host computer. The UFF Toolkit installed in the previous step allows you to convert TensorFlow models to UFF, and the UFF parser can then build TensorRT engines from those UFF models.

You will need the following imports:

import tensorflow as tf # there is a known bug where TensorFlow needs to be imported before TensorRT
import uff # to convert the graph from a serialized frozen TensorFlow model to UFF
import numpy as np
import time
import os

Create your model. For this example we use a LeNet-5 style model to classify handwritten digits:

STARTER_LEARNING_RATE = 1e-4
BATCH_SIZE = 10
NUM_CLASSES = 10
MAX_STEPS = 3000
IMAGE_SIZE = 28
IMAGE_PIXELS = IMAGE_SIZE ** 2
OUTPUT_NAMES = ["fc2/Relu"]

def WeightsVariable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1, name='weights'))

def BiasVariable(shape):
    return tf.Variable(tf.constant(0.1, shape=shape, name='biases'))

def Conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    filter_size = W.get_shape().as_list()
    pad_size = filter_size[0]//2
    pad_mat = np.array([[0,0],[pad_size,pad_size],[pad_size,pad_size],[0,0]])
    x = tf.pad(x, pad_mat)
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='VALID')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def MaxPool2x2(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='VALID')

def network(images):
    # Convolution 1
    with tf.name_scope('conv1'):
        weights = WeightsVariable([5,5,1,32])
        biases = BiasVariable([32])
        conv1 = Conv2d(images, weights, biases)  # Conv2d already applies ReLU
        pool1 = MaxPool2x2(conv1)
    # Convolution 2
    with tf.name_scope('conv2'):
        weights = WeightsVariable([5,5,32,64])
        biases = BiasVariable([64])
        conv2 = Conv2d(pool1, weights, biases)  # Conv2d already applies ReLU
        pool2 = MaxPool2x2(conv2)
        pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
    # Fully Connected 1
    with tf.name_scope('fc1'):
        weights = WeightsVariable([7 * 7 * 64, 1024])
        biases = BiasVariable([1024])
        fc1 = tf.nn.relu(tf.matmul(pool2_flat, weights) + biases)
    # Fully Connected 2
    with tf.name_scope('fc2'):
        weights = WeightsVariable([1024, 10])
        biases = BiasVariable([10])
        fc2 = tf.nn.relu(tf.matmul(fc1, weights) + biases)
    return fc2

def loss_metrics(logits, labels):
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                                   logits=logits,
                                                                   name='softmax')
    return tf.reduce_mean(cross_entropy, name='softmax_mean')


def training(loss):
    tf.summary.scalar('loss', loss)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    learning_rate = tf.train.exponential_decay(STARTER_LEARNING_RATE,
                                               global_step,
                                               100000,
                                               0.75,
                                               staircase=True)
    tf.summary.scalar('learning_rate', learning_rate)
    optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op

def evaluation(logits, labels):
    correct = tf.nn.in_top_k(logits, labels, 1)
    return tf.reduce_sum(tf.cast(correct, tf.int32))

def do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            data_set,
            summary):
    true_count = 0
    steps_per_epoch = data_set.num_examples // BATCH_SIZE
    num_examples = steps_per_epoch * BATCH_SIZE
    for step in range(steps_per_epoch):
        feed_dict = fill_feed_dict(data_set,
                                   images_placeholder,
                                   labels_placeholder)
        log, correctness = sess.run([summary, eval_correct], feed_dict=feed_dict)
        true_count += correctness
    precision = float(true_count) / num_examples
    tf.summary.scalar('precision', tf.constant(precision))
    print('Num examples %d, Num Correct: %d Precision @ 1: %0.04f' %
          (num_examples, true_count, precision))
    return log

def placeholder_inputs(batch_size):
    images_placeholder = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))
    labels_placeholder = tf.placeholder(tf.int32, shape=(None,))
    return images_placeholder, labels_placeholder

def fill_feed_dict(data_set, images_pl, labels_pl):
    images_feed, labels_feed = data_set.next_batch(BATCH_SIZE)
    feed_dict = {
        images_pl: np.reshape(images_feed, (-1,28,28,1)),
        labels_pl: labels_feed,
    }
    return feed_dict

def run_training(data_sets):
    with tf.Graph().as_default():
        images_placeholder, labels_placeholder = placeholder_inputs(BATCH_SIZE)
        logits = network(images_placeholder)
        loss = loss_metrics(logits, labels_placeholder)
        train_op = training(loss)
        eval_correct = evaluation(logits, labels_placeholder)
        summary = tf.summary.merge_all()
        init = tf.global_variables_initializer()
        saver = tf.train.Saver()
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
        sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
        summary_writer = tf.summary.FileWriter("/tmp/tensorflow/mnist/log",
                                               graph=tf.get_default_graph())
        test_writer = tf.summary.FileWriter("/tmp/tensorflow/mnist/log/validation",
                                            graph=tf.get_default_graph())
        sess.run(init)
        for step in range(MAX_STEPS):
            start_time = time.time()
            feed_dict = fill_feed_dict(data_sets.train,
                                       images_placeholder,
                                       labels_placeholder)
            _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
            duration = time.time() - start_time
            if step % 100 == 0:
                print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
                summary_str = sess.run(summary, feed_dict=feed_dict)
                summary_writer.add_summary(summary_str, step)
                summary_writer.flush()
            if (step + 1) % 1000 == 0 or (step + 1) == MAX_STEPS:
                checkpoint_file = os.path.join("/tmp/tensorflow/mnist/log", "model.ckpt")
                saver.save(sess, checkpoint_file, global_step=step)
                print('Validation Data Eval:')
                log = do_eval(sess,
                              eval_correct,
                              images_placeholder,
                              labels_placeholder,
                              data_sets.validation,
                              summary)
                test_writer.add_summary(log, step)
        # Freeze the trained graph: convert variables to constants and strip training nodes
        graphdef = tf.get_default_graph().as_graph_def()
        frozen_graph = tf.graph_util.convert_variables_to_constants(sess,
                                                                    graphdef,
                                                                    OUTPUT_NAMES)
        return tf.graph_util.remove_training_nodes(frozen_graph)

The important part of the above code is this:

graphdef = tf.get_default_graph().as_graph_def()
frozen_graph = tf.graph_util.convert_variables_to_constants(sess,graphdef, OUTPUT_NAMES)
return tf.graph_util.remove_training_nodes(frozen_graph)

These lines generate the frozen graph from the session and the graph definition, and remove the training nodes. Now load the MNIST dataset with the TensorFlow data loader and run the training. The model includes summaries, so you can visualize the training in TensorBoard:

MNIST_DATASETS = tf.contrib.learn.datasets.load_dataset("mnist")
tf_model = run_training(MNIST_DATASETS)
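
Optionally, before converting, you can sanity-check the frozen graph by importing it back into TensorFlow and running a dummy batch through it. This is a minimal sketch; "Placeholder:0" is the default name TensorFlow assigns to the first unnamed placeholder (the images input defined in placeholder_inputs):

# Optional sanity check: import the frozen GraphDef into a fresh graph and
# run one dummy batch through it before converting to UFF.
with tf.Graph().as_default() as check_graph:
    tf.import_graph_def(tf_model, name="")
    images = check_graph.get_tensor_by_name("Placeholder:0")
    logits = check_graph.get_tensor_by_name("fc2/Relu:0")
    with tf.Session(graph=check_graph) as sess:
        out = sess.run(logits, feed_dict={images: np.zeros((1, 28, 28, 1), np.float32)})
        print(out.shape)  # expect (1, 10)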

Finally, to generate the UFF, run:

uff_model = uff.from_tensorflow(tf_model, ["fc2/Relu"], output_filename="saved_model.uff")

This function call should output something like this:

Using output node fc2/Relu
Converting to UFF graph
No. nodes: 28
UFF Output written to saved_model.uff

The function uff.from_tensorflow also accepts the following optional parameters (a usage sketch follows the list):

  • quiet=[True|False]: suppress conversion logging
  • input_nodes=[...]: define a set of input nodes for the graph
  • text=[True|False]: also save a human-readable version of the UFF model
  • list_nodes=[True|False]: list the nodes in the graph
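
As a sketch, a conversion call exercising these optional parameters could look like this (the exact behavior of each flag may vary slightly between TensorRT releases):

uff_model = uff.from_tensorflow(tf_model,
                                ["fc2/Relu"],
                                output_filename="saved_model.uff",
                                text=True,        # also write a human-readable version
                                list_nodes=True,  # print the nodes in the graph
                                quiet=False)      # keep conversion logging enabled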

TensorFlow saved session to UFF

For this section you will need a saved TensorFlow checkpoint folder. First, load the checkpoint file; the tf.train.Saver object saves and restores variables to and from checkpoint files. Note that loading a checkpoint generated with a different TensorFlow version may result in errors. To convert a TensorFlow checkpoint you also need to know the name of the output node. A trick to find it in an unknown model is to load the model in TensorBoard:

tensorboard --logdir=route/to/checkpoint/dir

You will get a message similar to this:

TensorBoard 1.10.0 at http://mtaylor-laptop:6006 (Press CTRL+C to quit)

Open that address in your browser, go to the GRAPHS tab, and analyze the graph to determine the output node name. In this example the output node is ArgMax, because its input is the resnet_model/final_dense signal.

Figure: ResNet output node in the TensorBoard graph view
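
If you prefer not to launch TensorBoard, you can also list the node names programmatically from the checkpoint's meta graph. This is a minimal sketch, with the checkpoint path as a placeholder; the last operations printed are usually good output-node candidates:

import tensorflow as tf

# Load only the meta graph (no weights are needed to inspect node names).
tf.train.import_meta_graph("route/to/checkpoint/model.ckpt.meta", clear_devices=True)
for op in tf.get_default_graph().get_operations():
    print(op.name)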

Add the following code to a Python file:

import tensorflow as tf
from tensorflow.contrib import * # this include isn't used but solves some common tf errors
import uff # to convert the graph from a serialized frozen TensorFlow model to UFF.

# Load the checkpoint: get all checkpoint names present in the given folder
checkpoint = tf.train.get_checkpoint_state("route/to/checkpoint/folder")
input_checkpoint = checkpoint.model_checkpoint_path
input_checkpoint = checkpoint.model_checkpoint_path

# Devices should be cleared to allow TensorFlow to control placement of the
# graph when loading on different machines
saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=True)

# Get the graph_def
graph = tf.get_default_graph()
input_graph_def = graph.as_graph_def()

# Array with the names of your model's output nodes
output_nodes_names = ['YOUR', 'MODEL', 'OUTPUT', 'NODES']

with tf.Session(graph=graph) as sess:
    saver.restore(sess, input_checkpoint)
    frozen_graph = tf.graph_util.convert_variables_to_constants(sess, input_graph_def, output_nodes_names)
    frozen_graph = tf.graph_util.remove_training_nodes(frozen_graph)
    uff_model = uff.from_tensorflow(frozen_graph, output_nodes_names, output_filename="model.uff")

This Python code opens the checkpoint folder and generates the file "model.uff". Note that not every TensorFlow model can be converted to UFF. For example, this ResNet can't be converted as-is because it ends in an ArgMax layer, which the UFF converter doesn't currently support.
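
A common workaround, sketched below, is to freeze the graph only up to the last supported node (here the hypothetical resnet_model/final_dense seen in the TensorBoard graph) and apply the argmax on the host after TensorRT inference:

import numpy as np

# Convert up to the last supported node instead of the unsupported ArgMax.
output_nodes_names = ['resnet_model/final_dense']  # hypothetical name taken from TensorBoard

# After running the TensorRT engine, recover the class index on the CPU.
logits = np.asarray(engine_output)  # engine_output: hypothetical buffer returned by the engine
predicted_class = np.argmax(logits)
print(predicted_class)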

Step 3: Load the UFF file and perform inference

Up to this point everything has been running on the host computer. However, the engine should be created on the actual target platform (Xavier), because TensorRT runs device-specific profiling during the optimization phase. Since the TensorRT Python API isn't supported on Xavier at this time, the UFF must be loaded with the C++ API instead.

Loading the UFF is covered by an example provided by NVIDIA with TensorRT, named sample_uff_mnist. For more details on this example please refer to the C++ API section.


