Tritonserver support for NVIDIA Jetson Platforms
Introduction
From the official Triton documentation: The Triton Inference Server provides a cloud inferencing solution optimized for both CPUs and GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server. For edge deployments, Triton Server is also available as a shared library with an API that allows the full functionality of the server to be included directly in an application.
Here you will find how to run and test the server on a JetPack system, step by step.
Native support
Native support for JetPack is available in the latest releases of Triton server. The latest release at the time of writing is 2.16, which supports JetPack 4.6 with TensorFlow 2.6.0, TensorFlow 1.15.5, TensorRT 8.0.1.6, and ONNX Runtime 1.8.1. You can find more information here: https://github.com/triton-inference-server/server/releases/tag/v2.16.0#Jetson_Jetpack_Support.
To use this release (or any other), download the tarball with JetPack support from the release downloads: https://github.com/triton-inference-server/server/releases/download/v2.16.0/tritonserver2.16.0-jetpack4.6.tgz. The tarball includes the tritonserver executable as well as the client applications.
Steps to use Triton server
1. Install the SDK components
To use tritonserver with the full set of capabilities for your models, you should first install all the SDK components available for the board on JetPack 4.6.
2. Install Triton server dependencies
apt-get update
apt-get install -y --no-install-recommends \
        software-properties-common \
        autoconf \
        automake \
        build-essential \
        cmake \
        git \
        libb64-dev \
        libre2-dev \
        libssl-dev \
        libtool \
        libboost-dev \
        libcurl4-openssl-dev \
        rapidjson-dev \
        patchelf \
        zlib1g-dev
3. Download a model repository
This step is optional if you already have a model repository. If you don't have one, you can fetch some example models for testing purposes by following the step described here: https://github.com/triton-inference-server/server/blob/main/docs/quickstart.md#create-a-model-repository
For this you need to:
git clone https://github.com/triton-inference-server/server.git tritonserver-src
cd tritonserver-src/docs/examples
./fetch_models.sh
export MODEL_PATH=$PWD/model_repository
The models will then be located in the model_repository folder, in the same directory as the executed script, and the path is exported in the MODEL_PATH variable.
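Triton expects each model in its own subdirectory, containing a numeric version folder and a config.pbtxt file. As a rough sketch of what the fetch script produces, assuming the densenet_onnx example from the quickstart (the exact set of fetched models may vary between releases):

ls $MODEL_PATH
# densenet_onnx  inception_graphdef  ...
ls $MODEL_PATH/densenet_onnx
# 1  config.pbtxt  densenet_labels.txt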
4. Download the Triton server
The tarball includes the executable and the needed shared libraries, alongside the tritonserver backends:
wget https://github.com/triton-inference-server/server/releases/download/v2.16.0/tritonserver2.16.0-jetpack4.6.tgz
tar -xzf tritonserver2.16.0-jetpack4.6.tgz
cd tritonserver2.16.0-jetpack4.6
export BACKEND_PATH=$PWD/backends
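After extraction, the directories used in the rest of this guide should be present. The bin, backends, and clients folders are referenced below; the other entries in this sketch (such as lib) may vary between releases:

ls
# backends  bin  clients  lib  ...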
5. Execute the server
You can then execute the server as follows (the backend-config flag selects version 2 of the TensorFlow backend instead of the default version 1):
./bin/tritonserver --model-repository=$MODEL_PATH --backend-directory=$BACKEND_PATH --backend-config=tensorflow,version=2
You can check that the server is up and ready by using the ready endpoint:
curl -v localhost:8000/v2/health/ready
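If the server is ready, this endpoint returns an empty response with HTTP status 200, so the verbose curl output should include a status line along these lines (the surrounding headers may differ):

< HTTP/1.1 200 OK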
Steps to use Triton client
1. Install all dependencies
apt-get install -y --no-install-recommends \
        curl \
        pkg-config \
        python3 \
        python3-pip \
        python3-dev
pip3 install --upgrade wheel setuptools cython && \
    pip3 install --upgrade grpcio-tools numpy==1.19.4 future attrdict
2. Install the python package
The Python package is located inside the downloaded and uncompressed release.
You can install it by using:
python3 -m pip install --upgrade clients/python/tritonclient-2.16.0-py3-none-any.whl[all]
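As a quick check, you can verify both that the package imports correctly and, assuming the server from the previous section is still running on localhost:8000, that it is reachable through the HTTP client:

# Should print True if the client is installed and the server is ready
python3 -c "import tritonclient.http as httpclient; print(httpclient.InferenceServerClient('localhost:8000').is_server_ready())"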
3. Using one of the examples
There are some Python examples located here: https://github.com/triton-inference-server/client/tree/main/src/python/examples
These examples are included inside the downloaded release too.
For example, you can run classification on an image (-m selects the model, -c requests the top 3 classifications, and -s VGG selects VGG-style input scaling):
python3 ./examples/image_client.py -m densenet_onnx -c 3 -s VGG /home/nvidia/client/src/python/cistus1.jpg --url localhost:8000 --protocol HTTP
Yocto support
Docker support
Kubernetes support
A full guide on how to run Tritonserver with Kubernetes is located here: https://developer.nvidia.com/blog/deploying-nvidia-triton-at-scale-with-mig-and-kubernetes/
Below is a summary with more detailed steps.
The first thing you should do is install Kubernetes, in this case on Ubuntu 18.04.
Installing kubectl
The guide to install kubectl is located here: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-using-native-package-management
In summary, you should run:
# Install dependencies
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl

# Add Kubernetes to the package registry
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Install kubectl
sudo apt-get update
sudo apt-get install -y kubectl
sudo apt-mark hold kubectl
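You can confirm that the installation succeeded by printing the client version:

kubectl version --client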
Installing kubeadm
There are some additional tools necessary to run Kubernetes; these are listed here: https://kubernetes.io/docs/tasks/tools/
You should install kubelet and kubeadm to work alongside kubectl. As stated here https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm, you should use the following commands:
# Install kubelet and kubeadm
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
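As a quick sanity check that all three tools are installed, you can print their versions:

kubeadm version
kubelet --version
kubectl version --client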
After installing, you will need to configure multiple things:
Kubelet cgroup
You can find more information here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#configuring-a-cgroup-driver
The reason for this modification is that the kubelet and the container engine (Docker or containerd) need to run under the same cgroup driver. You can, for example, configure the kubelet to use the same driver as Docker, which runs under cgroupfs. To do this, edit the kubelet defaults file:
sudo vim /etc/default/kubelet
And add
KUBELET_EXTRA_ARGS="--cgroup-driver=cgroupfs"
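Before matching the kubelet to Docker, you can confirm which driver Docker actually reports; on a default installation this is typically cgroupfs:

# Print the cgroup driver Docker is running under
docker info 2>/dev/null | grep -i "cgroup driver"
# Cgroup Driver: cgroupfs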
Kubectl for a non-root user
You can find more information here:
First, run kubeadm once as root:
sudo kubeadm init
Then create the configuration in your home directory:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
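Before the reset below, you can verify that kubectl now works for your regular user without sudo:

kubectl get nodes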
Then reset kubeadm (the cluster will be initialized again with the pod network settings in the next section):
sudo kubeadm reset -f
Pod network add-on
You will need a pod network add-on in order for the master node to start correctly. More information here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network
That link lists the available add-ons; Flannel is one that is reported to work in this issue thread: https://github.com/kubernetes/kubernetes/issues/48798
To install it and use it:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
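To verify that the cluster is up and the Flannel pods are running, check the node status and the system pods; the node should report Ready once the network add-on is running:

kubectl get nodes
kubectl get pods --all-namespaces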