NVIDIA Jetson Xavier - Parsing a Tensorflow model for TensorRT
TensorRT can also be used on previously generated TensorFlow models to allow for faster inference times. This is the most common deployment scenario: the convolutional neural network is trained on a host with more resources and then transferred to an embedded system for inference.
At the end of this guide you will be able to:
- Convert a TensorFlow SavedModel to a Frozen Graph.
- Load a Frozen Graph for inference.
- Run a TensorRT inference engine on Xavier.
To follow this guide you will need:
- Jetson Xavier with JetPack 4.1
And a host development computer with:
- Tensorflow
- TensorRT
- A trained Tensorflow model
Step 1: Install Prerequisites
- Install JetPack
- Install Tensorflow on host
- Follow the instructions to train a model on the host. You can also use another trained model of your choosing.
- Install TensorRT and its tools on the host computer. First, download the .deb package from the NVIDIA download page and install TensorRT with the commands below (a quick sanity check follows them):
sudo dpkg -i <your-deb-package>
sudo apt-get update
sudo apt-get install tensorrt
sudo apt-get install python-libnvinfer-dev
sudo apt-get install uff-converter-tf
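Optionally, you can sanity-check the host installation from Python before moving on. The snippet below is only a verification sketch and assumes the TensorFlow and TensorRT Python bindings were installed by the packages above:

# Quick sanity check of the host installation (verification sketch, not part of the workflow).
# Note: import TensorFlow before TensorRT (see the known-bug note later in this guide).
import tensorflow as tf
import tensorrt as trt
import uff

print("TensorFlow version:", tf.__version__)
print("TensorRT version:", trt.__version__)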
Step 3: Generate the UFF
UFF (Universal Framework Format) is NVIDIA's unified format for describing neural networks.
There are two ways to generate a TensorRT engine from TensorFlow: if you are training your own model, you can add the uff module to the Python training script and generate the UFF directly from the model stream; if you already have a trained checkpoint or a frozen graph, you can convert the frozen protobuf to UFF.
Tensorflow Modelstream to UFF
This step is done on the host computer. The UFF Toolkit, installed in the previous step, allows you to convert TensorFlow models to UFF, and the UFF parser can then build TensorRT engines from these UFF models.
You will need the following imports:
import tensorflow as tf  # there is a known bug where TensorFlow needs to be imported before TensorRT
import uff  # to convert the graph from a serialized frozen TensorFlow model to UFF
import numpy as np
import time
import os
Create your model. For this example we use a LeNet-5 model to classify handwritten digits:
STARTER_LEARNING_RATE = 1e-4
BATCH_SIZE = 10
NUM_CLASSES = 10
MAX_STEPS = 3000
IMAGE_SIZE = 28
IMAGE_PIXELS = IMAGE_SIZE ** 2
OUTPUT_NAMES = ["fc2/Relu"]

def WeightsVariable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1, name='weights'))

def BiasVariable(shape):
    return tf.Variable(tf.constant(0.1, shape=shape, name='biases'))

def Conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    filter_size = W.get_shape().as_list()
    pad_size = filter_size[0]//2
    pad_mat = np.array([[0,0],[pad_size,pad_size],[pad_size,pad_size],[0,0]])
    x = tf.pad(x, pad_mat)
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='VALID')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def MaxPool2x2(x, k=2):
    # MaxPool2D wrapper
    pad_size = k//2
    pad_mat = np.array([[0,0],[pad_size,pad_size],[pad_size,pad_size],[0,0]])
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='VALID')

def network(images):
    # Convolution 1
    with tf.name_scope('conv1'):
        weights = WeightsVariable([5,5,1,32])
        biases = BiasVariable([32])
        conv1 = tf.nn.relu(Conv2d(images, weights, biases))
        pool1 = MaxPool2x2(conv1)
    # Convolution 2
    with tf.name_scope('conv2'):
        weights = WeightsVariable([5,5,32,64])
        biases = BiasVariable([64])
        conv2 = tf.nn.relu(Conv2d(pool1, weights, biases))
        pool2 = MaxPool2x2(conv2)
    pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
    # Fully Connected 1
    with tf.name_scope('fc1'):
        weights = WeightsVariable([7 * 7 * 64, 1024])
        biases = BiasVariable([1024])
        fc1 = tf.nn.relu(tf.matmul(pool2_flat, weights) + biases)
    # Fully Connected 2
    with tf.name_scope('fc2'):
        weights = WeightsVariable([1024, 10])
        biases = BiasVariable([10])
        fc2 = tf.nn.relu(tf.matmul(fc1, weights) + biases)
    return fc2

def loss_metrics(logits, labels):
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                                   logits=logits,
                                                                   name='softmax')
    return tf.reduce_mean(cross_entropy, name='softmax_mean')

def training(loss):
    tf.summary.scalar('loss', loss)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    learning_rate = tf.train.exponential_decay(STARTER_LEARNING_RATE,
                                               global_step,
                                               100000,
                                               0.75,
                                               staircase=True)
    tf.summary.scalar('learning_rate', learning_rate)
    optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op

def evaluation(logits, labels):
    correct = tf.nn.in_top_k(logits, labels, 1)
    return tf.reduce_sum(tf.cast(correct, tf.int32))

def do_eval(sess, eval_correct, images_placeholder, labels_placeholder, data_set, summary):
    true_count = 0
    steps_per_epoch = data_set.num_examples // BATCH_SIZE
    num_examples = steps_per_epoch * BATCH_SIZE
    for step in range(steps_per_epoch):
        feed_dict = fill_feed_dict(data_set, images_placeholder, labels_placeholder)
        log, correctness = sess.run([summary, eval_correct], feed_dict=feed_dict)
        true_count += correctness
    precision = float(true_count) / num_examples
    tf.summary.scalar('precision', tf.constant(precision))
    print('Num examples %d, Num Correct: %d Precision @ 1: %0.04f' % (num_examples, true_count, precision))
    return log

def placeholder_inputs(batch_size):
    images_placeholder = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))
    labels_placeholder = tf.placeholder(tf.int32, shape=(None))
    return images_placeholder, labels_placeholder

def fill_feed_dict(data_set, images_pl, labels_pl):
    images_feed, labels_feed = data_set.next_batch(BATCH_SIZE)
    feed_dict = {
        images_pl: np.reshape(images_feed, (-1, 28, 28, 1)),
        labels_pl: labels_feed,
    }
    return feed_dict

def run_training(data_sets):
    with tf.Graph().as_default():
        images_placeholder, labels_placeholder = placeholder_inputs(BATCH_SIZE)
        logits = network(images_placeholder)
        loss = loss_metrics(logits, labels_placeholder)
        train_op = training(loss)
        eval_correct = evaluation(logits, labels_placeholder)
        summary = tf.summary.merge_all()
        init = tf.global_variables_initializer()
        saver = tf.train.Saver()
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
        sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
        summary_writer = tf.summary.FileWriter("/tmp/tensorflow/mnist/log",
                                               graph=tf.get_default_graph())
        test_writer = tf.summary.FileWriter("/tmp/tensorflow/mnist/log/validation",
                                            graph=tf.get_default_graph())
        sess.run(init)
        for step in range(MAX_STEPS):
            start_time = time.time()
            feed_dict = fill_feed_dict(data_sets.train, images_placeholder, labels_placeholder)
            _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
            duration = time.time() - start_time
            if step % 100 == 0:
                print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
                summary_str = sess.run(summary, feed_dict=feed_dict)
                summary_writer.add_summary(summary_str, step)
                summary_writer.flush()
            if (step + 1) % 1000 == 0 or (step + 1) == MAX_STEPS:
                checkpoint_file = os.path.join("/tmp/tensorflow/mnist/log", "model.ckpt")
                saver.save(sess, checkpoint_file, global_step=step)
                print('Validation Data Eval:')
                log = do_eval(sess,
                              eval_correct,
                              images_placeholder,
                              labels_placeholder,
                              data_sets.validation,
                              summary)
                test_writer.add_summary(log, step)
        # Freeze the graph from the trained session and strip the training nodes before returning it
        graphdef = tf.get_default_graph().as_graph_def()
        frozen_graph = tf.graph_util.convert_variables_to_constants(sess, graphdef, OUTPUT_NAMES)
        return tf.graph_util.remove_training_nodes(frozen_graph)
The important part of the code above is this:
graphdef = tf.get_default_graph().as_graph_def()
frozen_graph = tf.graph_util.convert_variables_to_constants(sess, graphdef, OUTPUT_NAMES)
return tf.graph_util.remove_training_nodes(frozen_graph)
In these lines we generate the frozen graph from the session and the graph definition, and remove the training nodes. Now we load the TensorFlow MNIST data loader and run training. The model includes summaries, so you can visualize training in TensorBoard:
MNIST_DATASETS = tf.contrib.learn.datasets.load_dataset("mnist")
tf_model = run_training(MNIST_DATASETS)
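If you also want to keep a copy of the frozen graph itself on disk (for example, to convert it later on another machine), you could serialize it as sketched below. This step is not part of the original flow and the output path is an arbitrary example:

# Optional sketch: persist the frozen GraphDef returned by run_training().
# The output path is an arbitrary example.
with tf.gfile.GFile("/tmp/tensorflow/mnist/log/frozen_lenet5.pb", "wb") as f:
    f.write(tf_model.SerializeToString())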
Finally, to generate the UFF, run:
uff_model = uff.from_tensorflow(tf_model, ["fc2/Relu"], output_filename="saved_model.uff")
This function call should output something like this:
Using output node fc2/Relu
Converting to UFF graph
No. nodes: 28
UFF Output written to saved_model.uff
The function uff.from_tensorflow also accepts the following optional parameters (a usage sketch follows this list):
- quiet=[true|false]: To suppress logging
- input_nodes=[...]: To allow you to define a set of input nodes in the graph
- text=[true|false]: To save a human-readable version of the UFF model
- list_nodes=[true|false]: To list the nodes in the graph
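As a rough example (not part of the original guide), the conversion from the previous step could combine these options as follows, where tf_model is the frozen graph returned by run_training():

# Sketch only: same conversion as before, with the optional parameters spelled out.
uff_model = uff.from_tensorflow(tf_model,
                                ["fc2/Relu"],                      # output nodes
                                output_filename="saved_model.uff",
                                text=True,         # also write a human-readable copy of the UFF model
                                list_nodes=False,  # set to True to list the graph nodes during conversion
                                quiet=False)       # set to True to suppress logging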
Tensorflow saved session to UFF
For this section, you will need a saved TensorFlow checkpoint folder. First, we need to load the checkpoint file; the tf.train.Saver object saves and restores variables to/from checkpoint files. Note that loading a checkpoint generated with a different TensorFlow version may result in errors. Unfortunately, to load a TensorFlow checkpoint you need to know the output node name. A trick to get this from an unknown model is to load it in TensorBoard:
tensorboard --logdir=path/to/checkpoint/dir
You will get a message similar to this:
TensorBoard 1.10.0 at http://mtaylor-laptop:6006 (Press CTRL+C to quit)
Open that address in your browser, go to the graph view, and analyze the graph to determine the output node name. In this example the output node name is ArgMax because its input is the resnet_model/final_dense signal.
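If you prefer not to open TensorBoard, a small script along these lines can also print the node names in a checkpoint's graph so you can spot the output node. This helper is not part of the original workflow and the checkpoint path is a placeholder:

import tensorflow as tf

# List every node (name and op type) in the checkpoint's graph;
# the output node is usually found near the end of the list.
checkpoint = tf.train.get_checkpoint_state("path/to/checkpoint/dir")
tf.train.import_meta_graph(checkpoint.model_checkpoint_path + '.meta', clear_devices=True)
for node in tf.get_default_graph().as_graph_def().node:
    print(node.name, '(' + node.op + ')')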
Add the following code to a Python file:
import tensorflow as tf
from tensorflow.contrib import *  # this import isn't used but solves some common tf errors
import uff  # to convert the graph from a serialized frozen TensorFlow model to UFF

# Load the checkpoint
checkpoint = tf.train.get_checkpoint_state("path/to/checkpoint/folder")
# Get the latest checkpoint name present in the given folder
input_checkpoint = checkpoint.model_checkpoint_path

# Devices should be cleared to allow TensorFlow to control placement of the graph
# when loading on different machines
saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=True)

# Get the graph_def
graph = tf.get_default_graph()
input_graph_def = graph.as_graph_def()

# Output node names array
output_nodes_names = ['YOUR', 'MODEL', 'OUTPUT', 'NODES']

with tf.Session(graph=graph) as sess:
    saver.restore(sess, input_checkpoint)
    frozen_graph = tf.graph_util.convert_variables_to_constants(sess, input_graph_def, output_nodes_names)
    frozen_graph = tf.graph_util.remove_training_nodes(frozen_graph)
    uff_model = uff.from_tensorflow(frozen_graph, output_nodes_names, output_filename="model.uff")
This Python code opens the checkpoint folder and generates the file "model.uff".
Note that not all TensorFlow models can be converted to UFF. For example, ResNet can't be converted because it uses the ArgMax layer, which isn't supported by UFF at this time.
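A quick way to anticipate this kind of problem is to print the set of operation types used by the frozen graph and compare it manually against the layers listed as supported in the TensorRT/UFF documentation. This is only a helper sketch, where frozen_graph is the GraphDef produced by the script above:

# Helper sketch: print the unique TensorFlow op types found in the frozen graph,
# so unsupported layers (such as ArgMax) can be spotted before the UFF conversion.
for op_type in sorted({node.op for node in frozen_graph.node}):
    print(op_type)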
Step 4: Load the UFF file and perform inference
Up to this point, everything ran on the host computer. However, the engine should be created on the target platform (Xavier) because TensorRT runs device-specific profiling during the optimization phase. Since the Python API isn't supported on Xavier at this time, the UFF must be loaded with the C++ API instead.
Loading the UFF is covered by an example provided by NVIDIA with TensorRT, named sample_uff_mnist. For more details on this example, please refer to the C++ API section.