NVIDIA Jetson Xavier - Jetson Inference Deep Learning Tutorial
Jetson-inference is a guide to deploying deep learning inference on the NVIDIA Jetson TX1 and TX2 using models trained with NVIDIA DIGITS. The "dev" branch of the repository is specifically oriented toward the NVIDIA Jetson Xavier, since it uses the Deep Learning Accelerator (DLA) integration in TensorRT 5. This page is a summary of the original tutorial, which can be found in the jetson-inference GitHub repository. We focus on the inference part of the tutorial, because training is usually done on a host machine.
Building jetson-inference
To build jetson-inference from source on the Jetson, follow these steps:
# install pre-requisites
sudo apt-get install git cmake -y

# clone the repo
git clone --recursive https://github.com/dusty-nv/jetson-inference.git -b dev
cd jetson-inference/

# configure with cmake
mkdir build
cd build
cmake ../

# compile
make
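If the build completes, the compiled samples land under jetson-inference/build/aarch64/bin, the directory used throughout the rest of this page. A quick sanity check that everything was built:

cd jetson-inference/build/aarch64/bin
# list the inference samples used in this tutorial
ls imagenet-console imagenet-camera detectnet-console detectnet-camera segnet-console segnet-camera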
Classifying Images with ImageNet
ImageNet is a classification network trained on a database of 1000 object classes. It takes an image as input and outputs the most likely class together with the probability that the image belongs to that class. The repo includes a command-line interface called imagenet-console and a live camera program called imagenet-camera.
imagenet-console
You can use a pretrained model from the console:
cd jetson-inference/build/aarch64/bin
./imagenet-console <path-to-input-image> <path-to-output-image>
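For example, using a dog photo named stella.jpeg (the image referenced in the log below; any JPEG or PNG of your own will work):

# classify stella.jpeg; the annotated result is saved as output_0.jpg
./imagenet-console stella.jpeg output_0.jpg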
Console output:
class 0195 - 0.963048  (Boston bull, Boston terrier)
class 0245 - 0.017777  (French bulldog)
imagenet-console: 'stella.jpeg' -> 96.30477% class #195 (Boston bull, Boston terrier)
loaded image  fontmapA.png  (256 x 512)  2097152 bytes
[cuda]  cudaAllocMapped 2097152 bytes, CPU 0x21cda5000 GPU 0x21cda5000
[cuda]  cudaAllocMapped 8192 bytes, CPU 0x21bf66000 GPU 0x21bf66000
imagenet-console: attempting to save output image to 'output_0.jpg'
imagenet-console: completed saving 'output_0.jpg'
You can also load a custom Caffe model:
./imagenet-console <path-to-input-image> <path-to-output-image> \
--prototxt=<path-to-prototxt> \
--model=<path-to-caffemodel> \
--labels=<path-to-labels> \
--input_blob=data \
--output_blob=softmax
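As a sketch, assuming a classification model trained in DIGITS and copied to the Jetson under a hypothetical ~/models/my-model directory (all paths and file names below are placeholders; adjust them to your own snapshot):

# hypothetical paths -- point these at your own DIGITS snapshot
./imagenet-console stella.jpeg output_0.jpg \
--prototxt=$HOME/models/my-model/deploy.prototxt \
--model=$HOME/models/my-model/snapshot.caffemodel \
--labels=$HOME/models/my-model/labels.txt \
--input_blob=data \
--output_blob=softmax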
imagenet-camera
Similar to the last example, you can run imagenet as a live camera demo:
cd jetson-inference/build/aarch64/bin
./imagenet-camera googlenet    # to run using googlenet
./imagenet-camera alexnet      # to run using alexnet
Locating Object Coordinates using DetectNet
Image recognition networks output class probabilities corresponding to the entire input image. Detection networks, on the other hand, find where in the image objects are located. DetectNet accepts an input image and outputs the class and coordinates of the detected bounding boxes.
detectnet-console
You can use a pretrained model from the console:
./detectnet-console <path-to-input-image> <path-to-output-image> <network>
The network option can be any of the following pre-trained networks:
Model | Description
---|---
ped-100 | single-class pedestrian detector
multiped-500 | multi-class pedestrian + baggage detector
facenet-120 | single-class facial recognition detector
coco-airplane | MS COCO airplane class
coco-bottle | MS COCO bottle class
coco-chair | MS COCO chair class
coco-dog | MS COCO dog class
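For example, to run the MS COCO dog detector on a hypothetical photo named dog_0.jpg:

# detect dogs in dog_0.jpg (hypothetical file name) and save the annotated image
./detectnet-console dog_0.jpg output_0.jpg coco-dog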
Console output:
1 bounding boxes detected
bounding box 0   (58.201561, 171.843750)  (634.487488, 1059.500000)  w=576.285950  h=887.656250
draw boxes  1  0   0.000000 200.000000 255.000000 100.000000
detectnet-console:  writing 772x1040 image to 'output_0.jpg'
detectnet-console:  successfully wrote 772x1040 image to 'output_0.jpg'
You can also load a custom Caffe model:
./detectnet-console <path-to-input-image> <path-to-output-image> \
--prototxt=<path-to-prototxt> \
--model=<path-to-caffemodel> \
--input_blob=data \
--output_cvg=coverage \
--output_bbox=bboxes
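A sketch with placeholder paths, assuming a DetectNet model trained in DIGITS; note that, unlike the classification models, DetectNet exposes a coverage map and a bounding-box blob rather than a softmax layer, hence the different output arguments:

# hypothetical paths -- point these at your own DetectNet snapshot
./detectnet-console dog_0.jpg output_0.jpg \
--prototxt=$HOME/models/my-detector/deploy.prototxt \
--model=$HOME/models/my-detector/snapshot.caffemodel \
--input_blob=data \
--output_cvg=coverage \
--output_bbox=bboxes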
detectnet-camera
Similar to the last example, you can run detectnet as a live camera demo:
cd jetson-inference/build/aarch64/bin
./detectnet-camera coco-bottle    # detect bottles/soda cans in the camera
./detectnet-camera coco-dog       # detect dogs in the camera
./detectnet-camera multiped       # run using multi-class pedestrian/luggage detector
./detectnet-camera pednet         # run using original single-class pedestrian detector
./detectnet-camera facenet        # run using facial recognition network
./detectnet-camera                # by default, program will run using multiped
Image Segmentation with SegNet
Segmentation is based on image recognition, except that classification occurs at the pixel level rather than over the entire image. This is accomplished by convolutionalizing a pre-trained image recognition model (such as AlexNet), which turns it into a fully-convolutional segmentation model capable of per-pixel labeling.
segnet-console
You can use a pretrained model from the console:
cd jetson-inference/build/aarch64/bin
./segnet-console <path-to-input-image> <path-to-output-image>
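For example, segmenting a hypothetical street-scene photo named city_0.jpg with the default network:

# segment city_0.jpg (hypothetical file name); the colorized result is saved to output_0.jpg
./segnet-console city_0.jpg output_0.jpg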
You can also load a custom Caffe model:
./segnet-console <path-to-input-image> <path-to-output-image> \
--prototxt=<path-to-prototxt> \
--model=<path-to-caffemodel> \
--labels=<path-to-labels> \
--colors=<path-to-colors> \
--input_blob=data \
--output_blob=score_fr
segnet-camera
Similar to the last example, you can run segnet as a live camera demo:
cd jetson-inference/build/aarch64/bin
./segnet-camera