Pose Estimation using TensorRT on NVIDIA Jetson

This guide is based on NVIDIA's "Real time human pose estimation on Jetson Nano at 22FPS" project and on the repository "Real-time pose estimation accelerated with NVIDIA TensorRT".

This is an NVIDIA demo that uses a pose estimation model trained with PyTorch and deployed with TensorRT. It demonstrates PyTorch-to-TensorRT conversion and pose estimation performance on NVIDIA Jetson platforms.

PyTorch Installation

To install PyTorch on the NVIDIA Jetson TX2, you need to build it from source and apply a small patch.

First install pip and cmake

sudo apt-get install python3-pip
sudo apt-get install cmake

Clone the PyTorch repo

I used v1.0.0 because disabling NCCL in CMake and setup.py did not work in other versions:

git clone http://github.com/pytorch/pytorch
cd pytorch
git checkout v1.0.0
git submodule update --init --recursive

Install PyTorch prerequisites

sudo -H pip3 install -U setuptools
sudo -H pip3 install -r requirements.txt

Applying Patch

  • You need to disable NCCL (NVIDIA's multi-GPU communication library for desktop GPUs) and distributed processing, and load the CUDA toolkit library statically. Here is the patch:
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 159b15367e..6f7423df4e 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -95,7 +95,7 @@ option(USE_LMDB "Use LMDB" ON)
 option(USE_METAL "Use Metal for iOS build" ON)
 option(USE_MOBILE_OPENGL "Use OpenGL for mobile code" ON)
 option(USE_NATIVE_ARCH "Use -march=native" OFF)
-option(USE_NCCL "Use NCCL" ON)
+option(USE_NCCL "Use NCCL" OFF)
 option(USE_SYSTEM_NCCL "Use system-wide NCCL" OFF)
 option(USE_NNAPI "Use NNAPI" OFF)
@@ -119,7 +119,7 @@ option(USE_TENSORRT "Using Nvidia TensorRT library" OFF)
 option(USE_ZMQ "Use ZMQ" OFF)
 option(USE_ZSTD "Use ZSTD" OFF)
-option(USE_DISTRIBUTED "Use distributed" ON)
+option(USE_DISTRIBUTED "Use distributed" OFF)
     USE_MPI "Use MPI for Caffe2. Only available if USE_DISTRIBUTED is on." ON
diff --git a/cmake/public/cuda.cmake b/cmake/public/cuda.cmake
index 849fa07524..9c71bfa027 100644
--- a/cmake/public/cuda.cmake
+++ b/cmake/public/cuda.cmake
@@ -9,6 +9,8 @@ endif()
 # release (3.11.3) yet. Hence we need our own Modules_CUDA_fix to enable sccache.
 # Find CUDA.
 find_package(CUDA 7.0)
diff --git a/setup.py b/setup.py
index 20654625ab..be5191ac63 100644
--- a/setup.py
+++ b/setup.py
@@ -198,6 +198,8 @@ IS_DARWIN = (platform.system() == 'Darwin')
 IS_LINUX = (platform.system() == 'Linux')
 IS_PPC = (platform.machine() == 'ppc64le')
 IS_ARM = (platform.machine() == 'aarch64')
+USE_NCCL = False
 BUILD_PYTORCH = check_env_flag('BUILD_PYTORCH')
 # ppc64le and aarch64 do not support MKLDNN
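
Before starting the long build, it can help to confirm that the patch really changed the option defaults. The following is a minimal sketch, not part of the original guide: the helper name cmake_option_default is hypothetical, and it assumes the options are declared with the plain option(NAME "description" VALUE) form shown in the patch above.

```python
import re


def cmake_option_default(cmake_text, name):
    """Return the default value (e.g. 'ON' or 'OFF') declared for a CMake
    option(<name> "<description>" <value>) line, or None if it is not found."""
    pattern = r'^\s*option\(\s*' + re.escape(name) + r'\s+"[^"]*"\s+(\w+)\s*\)'
    match = re.search(pattern, cmake_text, flags=re.MULTILINE)
    return match.group(1) if match else None


# Sample text mirroring the patched lines; replace it with the real file
# contents, e.g. open("CMakeLists.txt").read() inside the pytorch directory.
sample = '''
option(USE_NCCL "Use NCCL" OFF)
option(USE_SYSTEM_NCCL "Use system-wide NCCL" OFF)
option(USE_DISTRIBUTED "Use distributed" OFF)
'''

for option_name in ("USE_NCCL", "USE_DISTRIBUTED"):
    print(option_name, cmake_option_default(sample, option_name))
```

If either option still reports ON for the real CMakeLists.txt, the patch was probably not applied to that file.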

Build and install PyTorch

(This will take a while.)

sudo python3 setup.py install
cd ..
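
Once the build finishes, a quick sanity check is to import the package and ask whether CUDA is visible. This is a small sketch rather than an official verification step; the helper name describe_torch is hypothetical. Run it from outside the pytorch source directory, because the local torch folder there would shadow the installed package.

```python
def describe_torch():
    """Return a small report about the installed PyTorch, if any."""
    try:
        import torch  # only importable after the build above has succeeded
    except ImportError:
        return {"installed": False, "version": None, "cuda": False}
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda": torch.cuda.is_available(),
    }


report = describe_torch()
print(report)
```

On a working Jetson install, the report should show the package as installed with CUDA available.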

Other dependencies

Install the required modules and packages with pip and apt-get

sudo -H pip3 install Pillow==6.1
sudo -H pip3 install torchvision
sudo -H pip3 install tensorrt
sudo -H pip3 install tqdm
sudo -H pip3 install cython
sudo -H pip3 install pycocotools
sudo apt-get install python3-matplotlib
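
To see at a glance which of these dependencies are still missing, a short check like the one below may be useful. It is only a sketch: the helper name missing_modules is hypothetical, and the import names in REQUIRED are assumptions about how each package above is imported (for example, Pillow is imported as PIL).

```python
import importlib.util

# Assumed import names for the packages installed above.
REQUIRED = ["PIL", "torchvision", "tensorrt", "tqdm", "Cython",
            "pycocotools", "matplotlib"]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing:", missing_modules(REQUIRED))
```

An empty list means every module was found; the check does not import the modules, so it will not catch a package that is present but broken.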

Install Jetcam

(Jetcam is an NVIDIA utility for accessing a CSI or USB camera from Python)

git clone https://github.com/NVIDIA-AI-IOT/jetcam
cd jetcam
sudo python3 setup.py install
cd ..

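Jetcam is simple to use. Once it is installed, you can access the camera with:

from jetcam.usb_camera import USBCamera
#from jetcam.csi_camera import CSICamera
from jetcam.utils import bgr8_to_jpeg  # optional: encodes a frame as JPEG for display

# Example capture size; adjust it to your camera and model input.
WIDTH = 224
HEIGHT = 224

camera = USBCamera(width=WIDTH, height=HEIGHT, capture_fps=30)
#camera = CSICamera(width=WIDTH, height=HEIGHT, capture_fps=30)

# Start capturing frames in the background.
camera.running = True

# The current camera frame.
frame = camera.value

# Function executed whenever a new camera frame is received.
def on_new_frame(change):
    new_frame = change['new']

# Attach the function to the camera's 'value' attribute.
camera.observe(on_new_frame, names='value')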
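
The function passed to camera.observe receives a change dictionary whose 'new' entry holds the latest frame. As an illustration of such a callback, here is a sketch of a frame counter that estimates the capture rate. The FrameCounter class is hypothetical and not part of Jetcam; the demo at the bottom feeds it fake frames and a fake clock so that it runs without a camera.

```python
import time


class FrameCounter:
    """Callable that counts frames and estimates frames per second."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._first = None   # time of the first frame
        self._last = None    # time of the most recent frame
        self.frames = 0
        self.last_frame = None

    def __call__(self, change):
        now = self._clock()
        if self._first is None:
            self._first = now
        self._last = now
        self.frames += 1
        self.last_frame = change["new"]

    def fps(self):
        # N frames span N - 1 intervals between the first and last frame.
        if self.frames < 2 or self._last == self._first:
            return 0.0
        return (self.frames - 1) / (self._last - self._first)


# Demo with fake frames arriving 0.5 s apart (no camera needed).
ticks = iter([0.0, 0.5, 1.0])
counter = FrameCounter(clock=lambda: next(ticks))
for fake_frame in ("frame0", "frame1", "frame2"):
    counter({"new": fake_frame})
print(counter.frames, counter.fps())
```

With a real camera, the counter would presumably be attached with camera.observe(counter, names='value') and queried later with counter.fps().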

Install torch2trt

(torch2trt is an NVIDIA utility that converts PyTorch models to TensorRT)

git clone https://github.com/NVIDIA-AI-IOT/torch2trt
cd torch2trt
sudo python3 setup.py install
cd ..
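
To give an idea of how torch2trt is used once installed, the sketch below wraps the conversion call in a helper. The helper name convert_to_trt and the default input shape of 1x3x224x224 are assumptions for illustration; the model you pass in and its expected input size depend on the weights you use. The imports are done inside the function so that the file can be loaded on a machine without CUDA.

```python
def convert_to_trt(model, input_shape=(1, 3, 224, 224), fp16=True):
    """Convert a PyTorch model to a TensorRT-backed module with torch2trt.

    input_shape is an assumed example shape; use your model's real input size.
    """
    # Lazy imports: both require the Jetson setup described in this guide.
    import torch
    from torch2trt import torch2trt

    model = model.cuda().eval()
    example_input = torch.zeros(input_shape).cuda()
    # torch2trt traces the model with the example input and builds the engine.
    return torch2trt(model, [example_input], fp16_mode=fp16,
                     max_workspace_size=1 << 25)
```

The returned module can be called like the original model. The conversion can take some time, so it is usually worth saving the result with torch.save(model_trt.state_dict(), path) and reloading it later into a torch2trt.TRTModule.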

Run the real-time human pose estimation demo using TensorRT

Clone and install the trt_pose repo

git clone https://github.com/NVIDIA-AI-IOT/trt_pose
cd trt_pose
sudo python3 setup.py install
cd ..


Download the models from the following links and place the downloaded weights in tasks/human_pose.
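
Before launching the demo, it may be worth confirming that the weights ended up in the right directory. The sketch below lists weight files in a folder. The helper name find_weights is hypothetical, and the *.pth pattern is an assumption about the file extension of the downloaded weights. The demo creates a temporary folder with a made-up file name so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def find_weights(directory, pattern="*.pth"):
    """Return the sorted names of files in directory that match pattern."""
    return sorted(path.name for path in Path(directory).glob(pattern))


# Demo in a temporary directory; "example_weights.pth" is a made-up name.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example_weights.pth").touch()
    (Path(tmp) / "notes.txt").touch()
    found = find_weights(tmp)
print(found)
```

For the real check, call find_weights("trt_pose/tasks/human_pose") from the directory where the repo was cloned; an empty list means no weights were found there.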

Run the demo

Open the demo (live_demo.py) with Jupyter Notebook:

cd trt_pose/tasks/human_pose
jupyter notebook

The expected output should be something like this:

[Figure: TRT pose demo output]

