NVIDIA Jetson Xavier - Deep Learning with TensorRT
This TensorRT wiki demonstrates how to use the C++ and Python APIs to implement the most common deep learning layers. It also provides step-by-step instructions, with examples, for common user tasks such as creating a TensorRT network definition, invoking the TensorRT builder, serializing and deserializing engines, and feeding the engine with data to perform inference.
Description
TensorRT is a C++ library that facilitates high-performance inference on NVIDIA platforms. It is designed to work with the most popular deep learning frameworks, such as TensorFlow, Caffe, and PyTorch. It focuses specifically on running an already-trained model; for training, other libraries such as cuDNN are more suitable. Some frameworks, such as TensorFlow, have integrated TensorRT so that it can be used to accelerate inference from within the framework. For other frameworks, such as Caffe, a parser is provided to generate a model that can be imported into TensorRT. Finally, the TensorRT C++ and Python APIs can be used to build a model from the ground up. For a more in-depth analysis of each use case, refer to the following sections:
- Using TensorRT integrated with TensorFlow
- Parsing a TensorFlow model for TensorRT
- Parsing a Caffe model for TensorRT
- Building TensorRT API examples
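As a rough illustration of the API-based workflow described above (create a network definition, invoke the builder, serialize the resulting engine), here is a minimal Python sketch. It must be run on a device with TensorRT installed (e.g. a Jetson); the API names follow the TensorRT 7-era Python bindings and may differ in other versions, and the single identity layer is just a placeholder for a real model.

```python
# Minimal sketch of the TensorRT build/serialize workflow.
# Assumes a device with TensorRT installed; API names follow the
# TensorRT 7-era Python bindings and may differ in other versions.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# 1. Create a network definition and populate it with layers
#    (here a single identity layer stands in for a real model).
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
inp = network.add_input("input", trt.float32, (1, 3, 224, 224))
identity = network.add_identity(inp)
network.mark_output(identity.get_output(0))

# 2. Invoke the builder to optimize the network into an engine.
config = builder.create_builder_config()
config.max_workspace_size = 1 << 28  # 256 MiB of build workspace
engine = builder.build_engine(network, config)

# 3. Serialize the engine to a plan file; it can later be restored
#    with trt.Runtime(logger).deserialize_cuda_engine(...).
with open("identity.plan", "wb") as f:
    f.write(engine.serialize())
```

Building an engine is the expensive step; serializing the plan to disk lets deployment code skip the build and go straight to deserialization and inference.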
This GitHub repo has a great collection of TensorFlow models accelerated with TensorRT.
Some NVIDIA benchmark results on the Jetson TX2 (inference latency per image):
| Model | Input Size | TensorFlow on TX2 without TensorRT | TensorFlow on TX2 with TensorRT |
|---|---|---|---|
| inception_v4 | 299x299 | 129 ms | 38.5 ms |
| resnet_v1_50 | 224x224 | 55.1 ms | 12.5 ms |
| resnet_v1_101 | 224x224 | 91.0 ms | 20.6 ms |
| resnet_v1_152 | 224x224 | 124 ms | 28.9 ms |
| mobilenet_v1_1p0_224 | 224x224 | 17.3 ms | 11.1 ms |
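To put the table in perspective, the latencies above translate into speedup factors of roughly 1.6x (MobileNet) to 4.4x (the ResNets). The following snippet computes them from the table's numbers:

```python
# Speedup factors implied by the TX2 benchmark table above
# (latency without TensorRT divided by latency with TensorRT).
benchmarks = {
    "inception_v4":         (129.0, 38.5),
    "resnet_v1_50":         (55.1, 12.5),
    "resnet_v1_101":        (91.0, 20.6),
    "resnet_v1_152":        (124.0, 28.9),
    "mobilenet_v1_1p0_224": (17.3, 11.1),
}

for model, (tf_ms, trt_ms) in benchmarks.items():
    print(f"{model}: {tf_ms / trt_ms:.1f}x speedup")
```

Note that the lightweight MobileNet benefits least: its layers are already cheap, so the fixed per-inference overhead dominates and TensorRT's layer fusion has less work to eliminate.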