NVIDIA Jetson Xavier - Jetson Inference Deep Learning Tutorial
Jetson-inference is a guide to deploying deep learning inference on the NVIDIA Jetson TX1 and TX2 using models trained with NVIDIA DIGITS. The "dev" branch of the repository is specifically oriented toward the NVIDIA Jetson Xavier, since it uses the Deep Learning Accelerator (DLA) integration in TensorRT 5. This page is a summary of the original tutorial, which can be found in the jetson-inference GitHub repository. We focus on the inference part of the tutorial, because training is usually done on a host machine.
Building jetson-inference
To build jetson-inference from source on the Jetson, follow these steps:
# install pre-requisites
sudo apt-get install git cmake -y

# clone the repo
git clone --recursive https://github.com/dusty-nv/jetson-inference.git -b dev
cd jetson-inference/

# configure with cmake
mkdir build
cd build
cmake ../

# compile
make
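If the build completes, the compiled samples land under jetson-inference/build/aarch64/bin, the directory used throughout the rest of this page. A quick sanity check that everything was built:

cd jetson-inference/build/aarch64/bin
# list the inference samples used in this tutorial
ls imagenet-console imagenet-camera detectnet-console detectnet-camera segnet-console segnet-camera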
Classifying Images with ImageNet
ImageNet is a classification network trained on a database of 1000 object classes. It takes an image as input and outputs the most likely class together with the probability that the image belongs to that class. The repo includes a command-line interface called imagenet-console and a live camera program called imagenet-camera.
imagenet-console
You can use a pretrained model from the console:
cd jetson-inference/build/aarch64/bin
./imagenet-console <path-to-input-image> <path-to-output-image>
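For example, using a dog photo named stella.jpeg (the image referenced in the log below; any JPEG or PNG of your own will work):

# classify stella.jpeg; the annotated result is saved as output_0.jpg
./imagenet-console stella.jpeg output_0.jpg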
Console output:
class 0195 - 0.963048  (Boston bull, Boston terrier)
class 0245 - 0.017777  (French bulldog)
imagenet-console: 'stella.jpeg' -> 96.30477% class #195 (Boston bull, Boston terrier)
loaded image  fontmapA.png  (256 x 512)  2097152 bytes
[cuda]  cudaAllocMapped 2097152 bytes, CPU 0x21cda5000 GPU 0x21cda5000
[cuda]  cudaAllocMapped 8192 bytes, CPU 0x21bf66000 GPU 0x21bf66000
imagenet-console: attempting to save output image to 'output_0.jpg'
imagenet-console: completed saving 'output_0.jpg'
You can also load a custom Caffe model:
./imagenet-console <path-to-input-image> <path-to-output-image> \
--prototxt=<path-to-prototxt> \
--model=<path-to-caffemodel> \
--labels=<path-to-labels> \
--input_blob=data \
--output_blob=softmax
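As a sketch, assuming a classification model trained in DIGITS and copied to the Jetson under a hypothetical ~/models/my-model directory (all paths and file names below are placeholders; adjust them to your own snapshot):

# hypothetical paths -- point these at your own DIGITS snapshot
./imagenet-console stella.jpeg output_0.jpg \
--prototxt=$HOME/models/my-model/deploy.prototxt \
--model=$HOME/models/my-model/snapshot.caffemodel \
--labels=$HOME/models/my-model/labels.txt \
--input_blob=data \
--output_blob=softmax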
imagenet-camera
Similar to the last example, you can run imagenet as a live camera demo:
cd jetson-inference/build/aarch64/bin
./imagenet-camera googlenet    # to run using googlenet
./imagenet-camera alexnet      # to run using alexnet
Locating Object Coordinates using DetectNet
Image recognition networks output class probabilities corresponding to the entire input image. Detection networks, on the other hand, find where in the image objects are located. DetectNet accepts an input image and outputs the class and coordinates of the detected bounding boxes.
detectnet-console
You can use a pretrained model from the console:
./detectnet-console <path-to-input-image> <path-to-output-image> <network>
The network option can be any of the following pre-trained networks:
Model | Description
---|---
ped-100 | single-class pedestrian detector
multiped-500 | multi-class pedestrian + baggage detector
facenet-120 | single-class facial recognition detector
coco-airplane | MS COCO airplane class
coco-bottle | MS COCO bottle class
coco-chair | MS COCO chair class
coco-dog | MS COCO dog class
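For example, to run the MS COCO dog detector on a hypothetical photo named dog_0.jpg:

# detect dogs in dog_0.jpg (hypothetical file name) and save the annotated image
./detectnet-console dog_0.jpg output_0.jpg coco-dog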
Console output:
1 bounding boxes detected
bounding box 0   (58.201561, 171.843750)  (634.487488, 1059.500000)  w=576.285950  h=887.656250
draw boxes  1  0   0.000000 200.000000 255.000000 100.000000
detectnet-console:  writing 772x1040 image to 'output_0.jpg'
detectnet-console:  successfully wrote 772x1040 image to 'output_0.jpg'
You can also load a custom Caffe model:
./detectnet-console <path-to-input-image> <path-to-output-image> \
--prototxt=<path-to-prototxt> \
--model=<path-to-caffemodel> \
--input_blob=data \
--output_cvg=coverage \
--output_bbox=bboxes
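A sketch with placeholder paths, assuming a DetectNet model trained in DIGITS; note that, unlike the classification models, DetectNet exposes a coverage map and a bounding-box blob rather than a softmax layer, hence the different output arguments:

# hypothetical paths -- point these at your own DetectNet snapshot
./detectnet-console dog_0.jpg output_0.jpg \
--prototxt=$HOME/models/my-detector/deploy.prototxt \
--model=$HOME/models/my-detector/snapshot.caffemodel \
--input_blob=data \
--output_cvg=coverage \
--output_bbox=bboxes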
detectnet-camera
Similar to the last example, you can run detectnet as a live camera demo:
cd jetson-inference/build/aarch64/bin
./detectnet-camera coco-bottle    # detect bottles/soda cans in the camera
./detectnet-camera coco-dog       # detect dogs in the camera
./detectnet-camera multiped       # run using multi-class pedestrian/luggage detector
./detectnet-camera pednet         # run using original single-class pedestrian detector
./detectnet-camera facenet        # run using facial recognition network
./detectnet-camera                # by default, program will run using multiped
Image Segmentation with SegNet
Segmentation is based on image recognition, except that classification occurs at the pixel level rather than over the entire image. This is accomplished by convolutionalizing a pre-trained image recognition model (such as AlexNet), which turns it into a fully-convolutional segmentation model capable of per-pixel labeling.
segnet-console
You can use a pretrained model from the console:
cd jetson-inference/build/aarch64/bin
./segnet-console <path-to-input-image> <path-to-output-image>
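For example, segmenting a hypothetical street-scene photo named city_0.jpg with the default network:

# segment city_0.jpg (hypothetical file name); the colorized result is saved to output_0.jpg
./segnet-console city_0.jpg output_0.jpg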
You can also load a custom Caffe model:
./segnet-console <path-to-input-image> <path-to-output-image> \
--prototxt=<path-to-prototxt> \
--model=<path-to-caffemodel> \
--labels=<path-to-labels> \
--colors=<path-to-colors> \
--input_blob=data \
--output_blob=score_fr
segnet-camera
Similar to the last example, you can run segnet as a live camera demo:
cd jetson-inference/build/aarch64/bin
./segnet-camera