NVIDIA Jetson Xavier - Deep Learning with TensorRT

Previous: Deep Learning Index Next: Deep Learning/TensorRT/Tensorflow

This TensorRT wiki demonstrates how to use the C++ and Python APIs to implement the most common deep learning layers. It also provides step-by-step instructions, with examples, for common user tasks such as creating a TensorRT network definition, invoking the TensorRT builder, serializing and deserializing an engine, and feeding the engine with data to perform inference.
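
The sketch below walks through that cycle end to end with the C++ API. It is a minimal illustration, assuming TensorRT 7.x as shipped in JetPack 4.x; the one-layer ReLU network, the "input"/"output" tensor names, and the model.engine file name are illustrative placeholders, not a prescribed layout:

// Minimal TensorRT workflow sketch: define a network, build an engine,
// serialize it, deserialize it, and run inference on it.
// Assumes TensorRT 7.x (JetPack 4.x); the one-layer ReLU network is a
// stand-in for a real model.
#include "NvInfer.h"
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <vector>

using namespace nvinfer1;

// TensorRT requires the application to supply its own logger.
class Logger : public ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    // 1. Network definition: one explicit-batch input feeding a ReLU layer.
    IBuilder* builder = createInferBuilder(gLogger);
    const uint32_t flags =
        1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    INetworkDefinition* network = builder->createNetworkV2(flags);
    ITensor* input = network->addInput("input", DataType::kFLOAT, Dims4{1, 3, 224, 224});
    IActivationLayer* relu = network->addActivation(*input, ActivationType::kRELU);
    relu->getOutput(0)->setName("output");
    network->markOutput(*relu->getOutput(0));

    // 2. Builder: optimize the definition into an engine for this GPU.
    IBuilderConfig* config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1 << 28);  // 256 MiB of build scratch space
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);

    // 3. Serialize the engine so later runs can skip the slow build step.
    IHostMemory* plan = engine->serialize();
    std::ofstream planFile("model.engine", std::ios::binary);
    planFile.write(static_cast<const char*>(plan->data()), plan->size());
    planFile.close();

    // 4. Deserialize (normally done by a separate application) and infer.
    IRuntime* runtime = createInferRuntime(gLogger);
    ICudaEngine* runEngine = runtime->deserializeCudaEngine(plan->data(), plan->size());
    IExecutionContext* context = runEngine->createExecutionContext();

    const size_t count = 1 * 3 * 224 * 224;
    std::vector<float> hostIn(count, -1.0f), hostOut(count);
    void* devIn;
    void* devOut;
    cudaMalloc(&devIn, count * sizeof(float));
    cudaMalloc(&devOut, count * sizeof(float));
    cudaMemcpy(devIn, hostIn.data(), count * sizeof(float), cudaMemcpyHostToDevice);

    // Bindings are looked up by tensor name, so the order never matters.
    void* bindings[2];
    bindings[runEngine->getBindingIndex("input")] = devIn;
    bindings[runEngine->getBindingIndex("output")] = devOut;
    context->executeV2(bindings);

    cudaMemcpy(hostOut.data(), devOut, count * sizeof(float), cudaMemcpyDeviceToHost);
    std::cout << "output[0] = " << hostOut[0] << std::endl;  // ReLU(-1.0) == 0
    cudaFree(devIn);
    cudaFree(devOut);
    return 0;  // destroy() calls omitted for brevity
}

On a Jetson board this compiles with something like g++ -o trt_sketch trt_sketch.cpp -lnvinfer -lcudart (the exact include and library paths depend on the JetPack installation).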

Description

TensorRT is a C++ library that facilitates high-performance inference on NVIDIA platforms. It is designed to work with the most popular deep learning frameworks, such as TensorFlow, Caffe, and PyTorch. It focuses specifically on running an already-trained model; for training, other libraries like cuDNN are more suitable. Some frameworks, like TensorFlow, have TensorRT integrated so that it can be used to accelerate inference within the framework itself. For other frameworks, like Caffe, a parser is provided to generate a model that can be imported into TensorRT. Finally, the TensorRT C++ and Python APIs can be used to build a model from the ground up. For a more in-depth analysis of each use case, refer to the following sections:

  1. Using TensorRT integrated with TensorFlow
  2. Parsing a TensorFlow model for TensorRT
  3. Parsing a Caffe model for TensorRT (a minimal parser sketch follows this list)
  4. Building TensorRT API examples
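
To give a flavor of use case 3, the following is a minimal sketch of importing a Caffe model through the parser with the C++ API, again assuming TensorRT 7.x; the deploy.prototxt / model.caffemodel file names and the "prob" output blob are placeholders for the actual model's files and output:

// Sketch: import a Caffe model into TensorRT through the Caffe parser.
// Assumes TensorRT 7.x; the file names and the "prob" blob are placeholders.
#include "NvInfer.h"
#include "NvCaffeParser.h"
#include <iostream>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    using namespace nvinfer1;
    IBuilder* builder = createInferBuilder(gLogger);
    // The Caffe parser targets the older implicit-batch network style.
    INetworkDefinition* network = builder->createNetworkV2(0U);

    nvcaffeparser1::ICaffeParser* parser = nvcaffeparser1::createCaffeParser();
    const nvcaffeparser1::IBlobNameToTensor* blobs =
        parser->parse("deploy.prototxt",   // network topology
                      "model.caffemodel",  // trained weights
                      *network, DataType::kFLOAT);

    // Caffe models do not declare outputs, so mark the output blob by name.
    network->markOutput(*blobs->find("prob"));

    builder->setMaxBatchSize(1);
    IBuilderConfig* config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1 << 28);
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << (engine ? "engine built" : "build failed") << std::endl;
    return 0;  // destroy() calls omitted for brevity
}

Besides nvinfer, this needs the parser library at link time (libnvparsers in recent JetPack releases).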

This GitHub repo has a great collection of TensorFlow models that work with TensorRT.

Some NVIDIA benchmark results on the Jetson TX2 (inference times):

Model                | Input Size | TensorFlow on TX2 without TensorRT | TensorFlow on TX2 with TensorRT
---------------------|------------|------------------------------------|--------------------------------
inception_v4         | 299x299    | 129 ms                             | 38.5 ms
resnet_v1_50         | 224x224    | 55.1 ms                            | 12.5 ms
resnet_v1_101        | 224x224    | 91.0 ms                            | 20.6 ms
resnet_v1_152        | 224x224    | 124 ms                             | 28.9 ms
mobilenet_v1_1p0_224 | 224x224    | 17.3 ms                            | 11.1 ms

Across these models, the TensorRT path is roughly 1.6x faster for mobilenet_v1_1p0_224 and up to about 4.4x faster for resnet_v1_50 and resnet_v1_101.

Previous: Deep Learning Index Next: Deep Learning/TensorRT/Tensorflow