Xavier/Deep Learning/TensorRT/Building Examples: Difference between revisions

From RidgeRun Developer Wiki
(Created page with "<noinclude> {{Xavier/Head}} </noinclude> {{DISPLAYTITLE:NVIDIA Jetson Xavier - Building TensorRT API examples|noerror}} The following section demonstrates how to build and us...")
 
mNo edit summary
 
(9 intermediate revisions by 4 users not shown)
Line 1: Line 1:
<noinclude>
<noinclude>
{{Xavier/Head}}
{{Xavier/Head|previous=Deep Learning‎/TensorRT/Parsing Caffe|next=Deep Learning/Deep Learning Accelerator|metakeywords=TensorRT,samples,examples,nvdia samples,python samples,c++ samples}}
</noinclude>
</noinclude>
{{DISPLAYTITLE:NVIDIA Jetson Xavier - Building TensorRT API examples|noerror}}
{{DISPLAYTITLE:NVIDIA Jetson Xavier - Building TensorRT API examples|noerror}}


The following section demonstrates how to build and use nvidia samples for the TensorRT C++ API and Python API
__TOC__
= C++ API=
<br>
The following section demonstrates how to build and use NVIDIA samples for the TensorRT C++ API and Python API
== C++ API==
First you need to build the samples. TensorRT is installed in <code>/usr/src/tensorrt/samples</code> by default. To build all the c++ samples run:
First you need to build the samples. TensorRT is installed in <code>/usr/src/tensorrt/samples</code> by default. To build all the c++ samples run:
<syntaxhighlight lang=bash>
<syntaxhighlight lang=bash>
Line 13: Line 16:
./<sample_name>
./<sample_name>
</syntaxhighlight>
</syntaxhighlight>
After building the samples directory, binaries are generated in the In the <code>/usr/src/tensorrt/bin</code> directory, and they are named in <code>snake_case</code>. On the other hand, the source code is located in the samples directory under a second level directory named like the binary but in <code>camelCase</code>. Some samples require some extra steps like downloading a model or a frozen graph, those steps are enumerated in the README files on the source folder. Inside the following table you can find the sample binary names and descriptions:
After building the samples directory, binaries are generated in the In the <code>/usr/src/tensorrt/bin</code> directory, and they are named in <code>snake_case</code>. On the other hand, the source code is located in the samples directory under a second-level directory named like the binary but in <code>camelCase</code>. Some samples require some extra steps like downloading a model or a frozen graph, those steps are enumerated in the README files on the source folder. Inside the following table you can find the sample binary names and descriptions:
{| class="wikitable" style="width: 100%;"
{| class="wikitable" style="width: 70%;"
|-
|-
! style="width: 20%"|Sample  
! style="width: 20%"|Sample  
Line 28: Line 31:
*Use the engine to perform inference on an input image
*Use the engine to perform inference on an input image
|  
|  
The Caffe model was trained with the MNIST data set. To test the engine, this example picks a handwritten digit at random and runs an inference with it. This sample outputs the ASCII rendering of the input image and the most likely digit associated to that image.  
The Caffe model was trained with the MNIST data set. To test the engine, this example picks a handwritten digit at random and runs an inference with it. This sample outputs the ASCII rendering of the input image and the most likely digit associated with that image.  
|-
|-
|sample_mnist_api
|sample_mnist_api
Line 48: Line 51:
*Use the engine to perform inference
*Use the engine to perform inference
|  
|  
This sample uses a pre-trained TensorFlow model that was frozen and converted to UFF <code>/usr/src/tensorrt/data/mnist/lenet5.uff</code>. To generate your own UFF files see [[Xavier/JetPack_4.1/Components/TensorRT#Step_3:_Generate_the_UFF_file|Generate the UFF file]]
This sample uses a pre-trained TensorFlow model that was frozen and converted to UFF <code>/usr/src/tensorrt/data/mnist/lenet5.uff</code>. To generate your own UFF files see [[Xavier/Deep_Learning/TensorRT/Parsing_Tensorflow#Step_3:_Generate_the_UFF|Generate the UFF file]]


This sample outputs the inference results and ASCII rendering of every digit from 0 to 9.
This sample outputs the inference results and ASCII rendering of every digit from 0 to 9.
Line 66: Line 69:
*Use layer-based profiling
*Use layer-based profiling
|  
|  
See [https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#half2mode this] for details on how to set the half precision mode and network profiling.
See [https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#half2mode this] for details on how to set the half-precision mode and network profiling.
|-
|-
|sample_char_rnn
|sample_char_rnn
Line 83: Line 86:
INT8 inference is available only on GPUs with compute capability 6.1 or 7.x. The advantage of using INT8 is that the inference and training are faster, but it requires an investment to determine how best to represent the weights and activations as 8-bit integers.
INT8 inference is available only on GPUs with compute capability 6.1 or 7.x. The advantage of using INT8 is that the inference and training are faster, but it requires an investment to determine how best to represent the weights and activations as 8-bit integers.


The sample calibrates for MNIST, but can be used to calibrate other networks. Run the sample on MNIST with: <code>./sample_int8 mnist</code>
The sample calibrates for MNIST but can be used to calibrate other networks. Run the sample on MNIST with: <code>./sample_int8 mnist</code>
|-
|-
|sample_plugin
|sample_plugin
Line 91: Line 94:
*Enable a Custom layer in NvCaffeParser
*Enable a Custom layer in NvCaffeParser
|  
|  
A limiting factor when using the Caffe and Tensorflow parser is that using not supported layers will result on an error. This sample creates a custom layer and adds it to the parser to counteract that problem.
A limiting factor when using the Caffe and Tensorflow parser is that using not supported layers will result in an error. This sample creates a custom layer and adds it to the parser to counteract that problem.


The custom layer is a replacement for the <code>FullyConnected</code> layer using cuBLAS matrix multiplication and cuDNN tensor addition. So it makes a great example on how to integrate other GPU APIs with TensorRT.
The custom layer is a replacement for the <code>FullyConnected</code> layer using cuBLAS matrix multiplication and cuDNN tensor addition. So it makes a great example of how to integrate other GPU APIs with TensorRT.
|-
|-
|sample_nmt
|sample_nmt
Line 107: Line 110:
*Implement the Faster R-CNN network in TensorRT
*Implement the Faster R-CNN network in TensorRT
|  
|  
The model used on this example is to large to be included with the package, to download it follow the guide on  <code>/usr/src/tensorrt/samples/sampleFasterRNN/README.txt</code>
The model used in this example is too large to be included with the package, to download it follow the guide on  <code>/usr/src/tensorrt/samples/sampleFasterRNN/README.txt</code>


This model is based on [https://arxiv.org/abs/1506.01497 this] paper
This model is based on [https://arxiv.org/abs/1506.01497 this] paper
Line 117: Line 120:
*Perform inference on the SSD network in TensorRT
*Perform inference on the SSD network in TensorRT
|  
|  
The model used on this example is to large to be included with the package, to download it follow the guide on  <code>/usr/src/tensorrt/samples/sampleUffSSD/README.txt</code>
The model used in this example is too large to be included with the package, to download it follow the guide on  <code>/usr/src/tensorrt/samples/sampleUffSSD/README.txt</code>
|-
|-
|sample_movielens
|sample_movielens
Line 123: Line 126:
*Implement a movie recommendation system using Neural Collaborative Filter in TensorRT
*Implement a movie recommendation system using Neural Collaborative Filter in TensorRT
|  
|  
Each input of the model consist of a userID and a list of movieIDs. The network predicts the highest rated movie for each user.
Each input of the model consists of a userID and a list of movieIDs. The network predicts the highest rated movie for each user.


The sample uses a set of 32 users with 100 movies each, and compares its prediction with the ground truth.
The sample uses a set of 32 users with 100 movies each and compares its prediction with the ground truth.
|}
|}


=Python API=
==Python API==
You can find the Python samples in the <code>/usr/src/tensorrt/samples/python</code> directory. Every Python sample includes a README.md and requirements.txt file. To run one of the Python samples, the process typically involves two steps:
You can find the Python samples in the <code>/usr/src/tensorrt/samples/python</code> directory. Every Python sample includes a README.md and requirements.txt file. To run one of the Python samples, the process typically involves two steps:


Line 142: Line 145:
*fc_plugin_caffe_mnist
*fc_plugin_caffe_mnist
*uff_custom_plugin
*uff_custom_plugin
 
<br>
NOTE: Python API isn't supported on Xavier at this time, and the Python API samples are not included with Xavier's TensorRT installation. To get these samples you need to install TensorRT on the host.
{{Ambox
|type=notice
|small=left
|issue='''Python API isn't supported on Xavier at this time, and the Python API samples are not included with Xavier's TensorRT installation. To get these samples you need to install TensorRT on the host.'''
|style=width:unset;
}}


<noinclude>
<noinclude>
{{Xavier/Foot|<Replace with "previous" page>|<Replace with "next" page>}}
{{Xavier/Foot|Deep Learning‎/TensorRT/Parsing Caffe|Deep Learning/Deep Learning Accelerator}}
</noinclude>
</noinclude>

Latest revision as of 17:56, 13 February 2023




Previous: Deep Learning‎/TensorRT/Parsing Caffe Index Next: Deep Learning/Deep Learning Accelerator







The following section demonstrates how to build and use NVIDIA samples for the TensorRT C++ API and Python API

C++ API

First you need to build the samples. TensorRT is installed in /usr/src/tensorrt/samples by default. To build all the c++ samples run:

cd /usr/src/tensorrt/samples
sudo make -j4
cd ../bin
./<sample_name>

After building the samples directory, binaries are generated in the In the /usr/src/tensorrt/bin directory, and they are named in snake_case. On the other hand, the source code is located in the samples directory under a second-level directory named like the binary but in camelCase. Some samples require some extra steps like downloading a model or a frozen graph, those steps are enumerated in the README files on the source folder. Inside the following table you can find the sample binary names and descriptions:

Sample Description Notes
sample_mnist
  • Perform basic TensorRT setup and initialization
  • Import a Caffe model using Caffe parser
  • Build an engine
  • Serialize and deserialize the engine
  • Use the engine to perform inference on an input image

The Caffe model was trained with the MNIST data set. To test the engine, this example picks a handwritten digit at random and runs an inference with it. This sample outputs the ASCII rendering of the input image and the most likely digit associated with that image.

sample_mnist_api
  • Build a network creating every layer
  • Use the engine to perform inference on an input image

This sample builds a model from scratch using the C++ API. For a more detailed guide on how to do this, you can visit this topic on the official documentation.

This sample does not train the model. It just loads the pre-trained weights.

sample_uff_mnist
  • Implement a TensorFlow model
  • Create the UFF Parser
  • Use the UFF Parser to get the dimensions and the order of the input tensor
  • Load a trained TensorFlow model converted to UFF
  • Build an engine
  • Use the engine to perform inference

This sample uses a pre-trained TensorFlow model that was frozen and converted to UFF /usr/src/tensorrt/data/mnist/lenet5.uff. To generate your own UFF files see Generate the UFF file

This sample outputs the inference results and ASCII rendering of every digit from 0 to 9.

sample_onnx_mnist
  • Configure the ONNX parser
  • Convert an MNIST network in ONNX format to a TensorRT network
  • Build the engine and run inference using the generated TensorRT network

See this for a detailed ONNX parser configuration guide.

sample_googlenet
  • Use FP16 mode in TensorRT
  • Use TensorRTHalf2Mode
  • Use layer-based profiling

See this for details on how to set the half-precision mode and network profiling.

sample_char_rnn
  • Implement a recurrent neural network based on the char-rnn.

The network is trained for predictive text completion with the Treebank-3 dataset

sample_int8
  • Perform INT8 calibration
  • Perform INT8 inference
  • Calibrate a network for execution in INT8
  • Cache the output of the calibration to avoid repeating the process

INT8 inference is available only on GPUs with compute capability 6.1 or 7.x. The advantage of using INT8 is that the inference and training are faster, but it requires an investment to determine how best to represent the weights and activations as 8-bit integers.

The sample calibrates for MNIST but can be used to calibrate other networks. Run the sample on MNIST with: ./sample_int8 mnist

sample_plugin
  • Define a Custom layer that supports multiple data formats
  • Define a Custom layer that can be serialized and deserialized
  • Enable a Custom layer in NvCaffeParser

A limiting factor when using the Caffe and Tensorflow parser is that using not supported layers will result in an error. This sample creates a custom layer and adds it to the parser to counteract that problem.

The custom layer is a replacement for the FullyConnected layer using cuBLAS matrix multiplication and cuDNN tensor addition. So it makes a great example of how to integrate other GPU APIs with TensorRT.

sample_nmt
  • Create a seq2seq type NMT inference engine using a checkpoint from TensorFlow

This sample requires more setup to test. you should follow the guide on /usr/src/tensorrt/samples/sampleNMT/README.txt

For more information about NMT models this is a great resource.

sample_fasterRCNN
  • Implement the Faster R-CNN network in TensorRT

The model used in this example is too large to be included with the package, to download it follow the guide on /usr/src/tensorrt/samples/sampleFasterRNN/README.txt

This model is based on this paper

The original Caffe model was modified to include RPN and ROIPooling layers

sample_uff_ssd
  • Perform inference on the SSD network in TensorRT

The model used in this example is too large to be included with the package, to download it follow the guide on /usr/src/tensorrt/samples/sampleUffSSD/README.txt

sample_movielens
  • Implement a movie recommendation system using Neural Collaborative Filter in TensorRT

Each input of the model consists of a userID and a list of movieIDs. The network predicts the highest rated movie for each user.

The sample uses a set of 32 users with 100 movies each and compares its prediction with the ground truth.

Python API

You can find the Python samples in the /usr/src/tensorrt/samples/python directory. Every Python sample includes a README.md and requirements.txt file. To run one of the Python samples, the process typically involves two steps:

python -m pip install -r requirements.txt #Install the sample requirements
python sample.py #run the sample

The available samples are:

  • introductory_parser_samples
  • end_to_end_tensorflow_mnist
  • network_api_pytorch_mnist
  • fc_plugin_caffe_mnist
  • uff_custom_plugin




Previous: Deep Learning‎/TensorRT/Parsing Caffe Index Next: Deep Learning/Deep Learning Accelerator