R2Inference - TensorRT
Make sure you also check R2Inference's companion project: GstInference.
NVIDIA TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT is built on CUDA, NVIDIA's parallel programming model, and enables you to optimize inference for all deep learning frameworks, leveraging libraries, development tools, and technologies in CUDA-X for artificial intelligence, autonomous machines, high-performance computing, and graphics.
The core of NVIDIA TensorRT is a C++ library that facilitates high performance inference on NVIDIA graphics processing units (GPUs). TensorRT takes a trained network, which consists of a network definition and a set of trained parameters, and produces a highly optimized runtime engine which performs inference for that network.
TensorRT provides C++ and Python APIs that let you express deep learning models through the Network Definition API, or load a pre-defined model through one of the parsers, so that TensorRT can optimize and run it on an NVIDIA GPU. TensorRT applies graph optimizations and layer fusion, among other optimizations, while also finding the fastest implementation of the model by leveraging a diverse collection of highly optimized kernels.
TensorRT is designed to work in a complementary fashion with training frameworks such as TensorFlow, Caffe, PyTorch, MXNet, etc. It focuses specifically on running an already-trained network quickly and efficiently on a GPU for the purpose of generating a result (a process that is referred to in various places as scoring, detecting, regression, or inference). Alternatively, TensorRT can be used as a library within a user application. It includes parsers for importing existing models from Caffe, ONNX, or TensorFlow, and C++ and Python APIs for building models programmatically.
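As a rough illustration of this runtime workflow, the minimal Python sketch below deserializes an already-built engine and runs a single inference using the plain TensorRT 7 Python API. The engine file name (engine.trt), the assumption that binding 0 is the input and binding 1 is the output, and the use of pycuda for buffer management are placeholders/assumptions for illustration only; this is not R2Inference code.

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize a previously built engine ("engine.trt" is a placeholder name)
with open("engine.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

with engine.create_execution_context() as context:
    # Allocate host/device buffers for every binding (inputs and outputs)
    bindings, host_bufs, dev_bufs = [], [], []
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        host = cuda.pagelocked_empty(size, dtype)
        dev = cuda.mem_alloc(host.nbytes)
        host_bufs.append(host)
        dev_bufs.append(dev)
        bindings.append(int(dev))

    # Fill the input buffer with preprocessed data (dummy data here);
    # assumes binding 0 is the input and binding 1 is the output
    np.copyto(host_bufs[0], np.random.rand(host_bufs[0].size).astype(host_bufs[0].dtype))

    # Copy the input to the GPU, run inference, and copy the output back
    stream = cuda.Stream()
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_bufs[1], dev_bufs[1], stream)
    stream.synchronize()
    print("Output:", host_bufs[1][:10])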
Installation
x86
You can choose between Debian packages, RPM packages, or a tar file for the TensorRT installation.
For tar file installation, please refer to the tar Installation Guide.
For RPM installation, check the RPM Installation Guide.
The Debian and RPM installations automatically install any dependencies. However, they:
- Require sudo or root privileges to install.
- Provide no flexibility as to which location TensorRT is installed into.
- Require that the CUDA Toolkit and cuDNN have also been installed using Debian or RPM packages.
- Do not allow more than one minor version of TensorRT to be installed at the same time.
Downloading TensorRT
- Go to the NVIDIA TensorRT page (membership is required).
- Click Download Now.
- Select the version of TensorRT that you are interested in.
- Select the check-box to agree to the license terms.
- Select the TensorRT local repo file that matches the Ubuntu version, CPU architecture, and CUDA version that you are using. Your download begins.
Installing TensorRT
- Install TensorRT from the Debian local repo package:
os="ubuntu1x04" tag="cudax.x-trt7.x.x.x-ga-yyyymmdd" sudo dpkg -i nv-tensorrt-repo-${os}-${tag}_1-1_amd64.deb sudo apt-key add /var/nv-tensorrt-repo-${tag}/7fa2af80.pub sudo apt-get update sudo apt-get install tensorrt
- Install the Python libnvinfer bindings:
If using Python 2.7, python-libnvinfer should be installed:
sudo apt-get install python-libnvinfer-dev
If using Python 3.x, python3-libnvinfer should be installed:
sudo apt-get install python3-libnvinfer-dev
- If you plan to use TensorRT with TensorFlow, the UFF converter (which also pulls in the graphsurgeon-tf package) should be installed:
sudo apt-get install uff-converter-tf
- Verify the installation.
dpkg -l | grep TensorRT
You should see something similar to the following:
ii  graphsurgeon-tf         7.0.0-1+cuda10.2    amd64  GraphSurgeon for TensorRT package
ii  libnvinfer-bin          7.0.0-1+cuda10.2    amd64  TensorRT binaries
ii  libnvinfer-dev          7.0.0-1+cuda10.2    amd64  TensorRT development libraries and headers
ii  libnvinfer-doc          7.0.0-1+cuda10.2    all    TensorRT documentation
ii  libnvinfer-plugin-dev   7.0.0-1+cuda10.2    amd64  TensorRT plugin libraries
ii  libnvinfer-plugin7      7.0.0-1+cuda10.2    amd64  TensorRT plugin libraries
ii  libnvinfer-samples      7.0.0-1+cuda10.2    all    TensorRT samples
ii  libnvinfer7             7.0.0-1+cuda10.2    amd64  TensorRT runtime libraries
ii  libnvonnxparsers-dev    7.0.0-1+cuda10.2    amd64  TensorRT ONNX libraries
ii  libnvonnxparsers7       7.0.0-1+cuda10.2    amd64  TensorRT ONNX libraries
ii  libnvparsers-dev        7.0.0-1+cuda10.2    amd64  TensorRT parsers libraries
ii  libnvparsers7           7.0.0-1+cuda10.2    amd64  TensorRT parsers libraries
ii  python-libnvinfer       7.0.0-1+cuda10.2    amd64  Python bindings for TensorRT
ii  python-libnvinfer-dev   7.0.0-1+cuda10.2    amd64  Python development package for TensorRT
ii  python3-libnvinfer      7.0.0-1+cuda10.2    amd64  Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev  7.0.0-1+cuda10.2    amd64  Python 3 development package for TensorRT
ii  tensorrt                7.0.0.x-1+cuda10.2  amd64  Meta package of TensorRT
ii  uff-converter-tf        7.0.0-1+cuda10.2    amd64  UFF converter for TensorRT package
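Additionally, assuming the Python 3 bindings were installed, you can check that the TensorRT module loads and print its version:
python3 -c "import tensorrt; print(tensorrt.__version__)"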
Nvidia Jetson (TX1, TX2, Xavier, Nano)
NVIDIA TensorRT can be installed on Jetson platforms by installing JetPack 4.3, a release that supports all Jetson modules, including the Jetson AGX Xavier series, Jetson TX2 series, Jetson TX1, and Jetson Nano. Key features include new versions of TensorRT and cuDNN that improve AI inference performance by up to 25%.
- Check the latest JetPack official page for more information.
- Visit the Jetson software installation guide to install JetPack by using the SDK Manager.
Generating a model for R2I
There are two main ways of getting a TensorRT model for R2Inference: by converting an existing model to TensorRT, or by generating a model with the NVIDIA Transfer Learning Toolkit (TLT).
Converting an existing model to TRT
From Caffe to TensorRT
- Download the caffe_to_trt.py Python script.
- Execute the script with the following parameters (a sketch of the equivalent TensorRT Python API calls is shown after the example below):
./caffe_to_trt.py --output_name [Output layer name] --deploy_file [Plain text prototxt] --model_file [Caffe model containing the weights]
For example:
./caffe_to_trt.py --output_name prob --deploy_file deploy.prototxt --model_file bvlc_googlenet.caffemodel
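For reference, the following is a rough sketch of what such a conversion looks like with the TensorRT 7 Python API; it is not the actual contents of caffe_to_trt.py, and the file names, output layer name, and workspace size are taken from the example above or are placeholders.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network() as network, \
     trt.CaffeParser() as parser:
    builder.max_workspace_size = 1 << 28  # 256 MiB of scratch space (placeholder)
    # Parse the Caffe deploy file and weights into the TensorRT network
    model_tensors = parser.parse(deploy="deploy.prototxt",
                                 model="bvlc_googlenet.caffemodel",
                                 network=network,
                                 dtype=trt.float32)
    # Mark the requested output layer (--output_name prob in the example)
    network.mark_output(model_tensors.find("prob"))
    # Build and serialize the optimized engine
    engine = builder.build_cuda_engine(network)
    with open("bvlc_googlenet.trt", "wb") as f:
        f.write(engine.serialize())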
From UFF to TensorRT
- Download the uff_to_trt.py Python script.
- Execute the script with the following parameters (a sketch of the equivalent UFF parser calls is shown after the example below):
./uff_to_trt.py --width [Network input width] --height [Network input height] --channels [Network input channels] --order [NHWC | NCHW] --input_name [Input layer name] --output_name [Output layer name] --graph_name [UFF Engine Name]
For example:
./uff_to_trt.py --width 416 --height 416 --channels 3 --order "NHWC" --input_name input/Placeholder --output_name add_8 --graph_name ./graph_tinyyolov2_tensorflow.uff
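Similarly, a rough sketch of the UFF import path with the TensorRT 7 Python API, using the TinyYOLOv2 parameters from the example; this is not the actual contents of uff_to_trt.py, and the exact shape convention expected by register_input for NHWC graphs should be double-checked against the TensorRT documentation.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network() as network, \
     trt.UffParser() as parser:
    builder.max_workspace_size = 1 << 28  # placeholder workspace size
    # Register the input/output tensors declared on the command line;
    # the shape is given in CHW here, verify this for your NHWC graph
    parser.register_input("input/Placeholder", (3, 416, 416),
                          trt.UffInputOrder.NHWC)
    parser.register_output("add_8")
    # Parse the UFF graph into the TensorRT network and build the engine
    parser.parse("graph_tinyyolov2_tensorflow.uff", network)
    engine = builder.build_cuda_engine(network)
    with open("graph_tinyyolov2_tensorflow.trt", "wb") as f:
        f.write(engine.serialize())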
Create a model using NVIDIA Transfer Learning Toolkit (TLT)
Check the NVIDIA Transfer Learning Toolkit documentation to learn how to generate a TensorRT model.