R2Inference - TensorRT

Previous: Supported_backends/Caffe Index Next: Supported_backends/EdgeTPU




NVIDIA TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT is built on CUDA, NVIDIA's parallel programming model, and enables you to optimize inference for all deep learning frameworks by leveraging libraries, development tools, and technologies in CUDA-X for artificial intelligence, autonomous machines, high-performance computing, and graphics.

The core of NVIDIA TensorRT is a C++ library that facilitates high performance inference on NVIDIA graphics processing units (GPUs). TensorRT takes a trained network, which consists of a network definition and a set of trained parameters, and produces a highly optimized runtime engine which performs inference for that network.

TensorRT provides C++ and Python APIs that let you express deep learning models via the Network Definition API, or load a pre-defined model via one of the parsers, so that TensorRT can optimize and run them on an NVIDIA GPU. TensorRT applies graph optimizations and layer fusion, among other optimizations, while also finding the fastest implementation of the model by leveraging a diverse collection of highly optimized kernels.

TensorRT is designed to work in a complementary fashion with training frameworks such as TensorFlow, Caffe, PyTorch, MXNet, etc. It focuses specifically on running an already-trained network quickly and efficiently on a GPU for the purpose of generating a result (a process that is referred to in various places as scoring, detecting, regression, or inference). Alternatively, TensorRT can be used as a library within a user application. It includes parsers for importing existing models from Caffe, ONNX, or TensorFlow, and C++ and Python APIs for building models programmatically.
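
As an illustration of this workflow, below is a minimal sketch (not part of R2Inference) of building an optimized engine from an ONNX model with the TensorRT 7 Python API; the model path model.onnx and the workspace size are placeholder assumptions:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(onnx_path):
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(EXPLICIT_BATCH) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 28  # 256 MiB of scratch memory for the optimizer
        with open(onnx_path, 'rb') as model:
            if not parser.parse(model.read()):  # populate the network from the ONNX graph
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        return builder.build_cuda_engine(network)  # optimized runtime engine

engine = build_engine('model.onnx')  # placeholder path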

Installation

x86

You can choose between Debian packages, RPM packages, or a tar file for the TensorRT installation.

For tar file installation, please refer to the tar Installation Guide.

For RPM installation, check the RPM Installation Guide.

The Debian and RPM installations automatically install any dependencies. However, they:

  • require sudo or root privileges to install.
  • provide no flexibility as to which location TensorRT is installed into.
  • require that the CUDA Toolkit and cuDNN have also been installed using Debian or RPM packages.
  • do not allow more than one minor version of TensorRT to be installed at the same time.

Downloading TensorRT

  • Go to the NVIDIA TensorRT page (membership is required).
  • Click Download Now.
  • Select the version of TensorRT that you are interested in.
  • Select the check-box to agree to the license terms.
  • Select the TensorRT local repo file that matches the Ubuntu version, CPU architecture, and CUDA version that you are using. Your download begins.

Installing TensorRT

  • Install TensorRT from the Debian local repo package:
os="ubuntu1x04"
tag="cudax.x-trt7.x.x.x-ga-yyyymmdd"
sudo dpkg -i nv-tensorrt-repo-${os}-${tag}_1-1_amd64.deb
sudo apt-key add /var/nv-tensorrt-repo-${tag}/7fa2af80.pub
sudo apt-get update
sudo apt-get install tensorrt
  • Install the Python libnvinfer bindings (if you plan to use TensorRT from Python):

If using Python 2.7, the python-libnvinfer packages should be installed:

sudo apt-get install python-libnvinfer-dev

If using Python 3.x, the python3-libnvinfer packages should be installed:

sudo apt-get install python3-libnvinfer-dev
  • If you plan to use TensorRT with TensorFlow, install the uff-converter-tf package (graphsurgeon-tf will also be installed with it):
sudo apt-get install uff-converter-tf
  • Verify the installation.
dpkg -l | grep TensorRT

You should see something similar to the following:

ii  graphsurgeon-tf	7.0.0-1+cuda10.2	amd64	GraphSurgeon for TensorRT package
ii  libnvinfer-bin		7.0.0-1+cuda10.2	amd64	TensorRT binaries
ii  libnvinfer-dev		7.0.0-1+cuda10.2	amd64	TensorRT development libraries and headers
ii  libnvinfer-doc		7.0.0-1+cuda10.2	all	TensorRT documentation
ii  libnvinfer-plugin-dev	7.0.0-1+cuda10.2	amd64	TensorRT plugin libraries
ii  libnvinfer-plugin7	7.0.0-1+cuda10.2	amd64	TensorRT plugin libraries
ii  libnvinfer-samples	7.0.0-1+cuda10.2	all	TensorRT samples
ii  libnvinfer7		7.0.0-1+cuda10.2	amd64	TensorRT runtime libraries
ii  libnvonnxparsers-dev		7.0.0-1+cuda10.2	amd64	TensorRT ONNX libraries
ii  libnvonnxparsers7	7.0.0-1+cuda10.2	amd64	TensorRT ONNX libraries
ii  libnvparsers-dev	7.0.0-1+cuda10.2	amd64	TensorRT parsers libraries
ii  libnvparsers7	7.0.0-1+cuda10.2	amd64	TensorRT parsers libraries
ii  python-libnvinfer	7.0.0-1+cuda10.2	amd64	Python bindings for TensorRT
ii  python-libnvinfer-dev	7.0.0-1+cuda10.2	amd64	Python development package for TensorRT
ii  python3-libnvinfer	7.0.0-1+cuda10.2	amd64	Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev	7.0.0-1+cuda10.2	amd64	Python 3 development package for TensorRT
ii  tensorrt		7.0.0.x-1+cuda10.2 	amd64	Meta package of TensorRT
ii  uff-converter-tf	7.0.0-1+cuda10.2	amd64	UFF converter for TensorRT package
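
If the Python packages were installed, an additional quick check (a minimal sketch, assuming python3-libnvinfer is present) is to import the module and print its version:

import tensorrt as trt
# Should print the installed version, e.g. 7.0.0.x
print(trt.__version__)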

NVIDIA Jetson (TX1, TX2, Xavier, Nano)

NVIDIA TensorRT can be installed on Jetson platforms by installing JetPack 4.3, which is a release supporting all Jetson modules, including the Jetson AGX Xavier series, Jetson TX2 series, Jetson TX1, and Jetson Nano. Key features include new versions of TensorRT and cuDNN that improve AI inference performance by up to 25%.

Generating a model for R2I

There are two main ways of getting a TensorRT model for R2Inference: converting an existing model to TRT, or generating a model with the NVIDIA Transfer Learning Toolkit (TLT).

Converting an existing model to TRT

From Caffe to TensorRT

  • Download the caffe_to_trt.py Python script.
  • Execute the script with the following parameters:
./caffe_to_trt.py  --output_name [Output layer name] --deploy_file [Plain text prototxt] --model_file [Caffe model containing the weights]

For example:

./caffe_to_trt.py  --output_name prob --deploy_file deploy.prototxt --model_file bvlc_googlenet.caffemodel
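
For reference, the kind of conversion this script performs can be sketched directly with TensorRT's Caffe parser. The following is an illustrative sketch (not the contents of caffe_to_trt.py) using the TensorRT 7 Python API, with the file names and output layer from the example above; the output file name googlenet.trt and the batch/workspace sizes are assumptions:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network() as network, \
     trt.CaffeParser() as parser:
    builder.max_batch_size = 1            # assumed batch size
    builder.max_workspace_size = 1 << 28  # assumed workspace size
    # Parse the prototxt/caffemodel pair into the TensorRT network
    model_tensors = parser.parse(deploy='deploy.prototxt',
                                 model='bvlc_googlenet.caffemodel',
                                 network=network, dtype=trt.float32)
    network.mark_output(model_tensors.find('prob'))  # output layer name
    engine = builder.build_cuda_engine(network)
    with open('googlenet.trt', 'wb') as f:           # assumed output name
        f.write(engine.serialize())                  # serialized engine for R2Inference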

From UFF to TensorRT

  • Download the uff_to_trt.py Python script.
  • Execute the script with the following parameters:
./uff_to_trt.py --width [Network input width] --height [Network input height] --channels [Network input channels] --order [NHWC | NCHW] --input_name [Input layer name] --output_name [Output layer name] --graph_name [UFF Engine Name]

For example:

./uff_to_trt.py --width 416 --height 416 --channels 3 --order "NHWC" --input_name input/Placeholder --output_name add_8 --graph_name ./graph_tinyyolov2_tensorflow.uff
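
Similarly, a hedged sketch of what a UFF conversion looks like with the TensorRT 7 UFF parser (not the contents of uff_to_trt.py); the input dimensions are registered as channels, height, width, and the output file name is an assumption:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network() as network, \
     trt.UffParser() as parser:
    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 28
    # Register the graph input/output; dims are given as (C, H, W)
    parser.register_input('input/Placeholder', (3, 416, 416),
                          trt.UffInputOrder.NHWC)  # layout of the original graph
    parser.register_output('add_8')
    parser.parse('./graph_tinyyolov2_tensorflow.uff', network, trt.float32)
    engine = builder.build_cuda_engine(network)
    with open('graph_tinyyolov2_tensorflow.trt', 'wb') as f:  # assumed output name
        f.write(engine.serialize())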

Creating a model using the NVIDIA Transfer Learning Toolkit (TLT)

Check the NVIDIA Transfer Learning Toolkit page to learn how to generate a TensorRT model.
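
Whichever method you use, the result is a serialized engine file that R2Inference will load. A quick way to confirm that a generated engine deserializes correctly on the target GPU is sketched below (a minimal check; the engine file name is a placeholder):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with open('graph_tinyyolov2_tensorflow.trt', 'rb') as f, \
     trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    # List the input/output bindings the engine exposes
    for i in range(engine.num_bindings):
        print(engine.get_binding_name(i), engine.get_binding_shape(i))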




Previous: Supported_backends/Caffe Index Next: Examples