Tritonserver support for NVIDIA Jetson Platforms
Introduction to Triton Inference Server
From the official NVIDIA Triton Inference Server documentation: The Triton Inference Server provides a cloud inferencing solution optimized for both CPUs and GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server. For edge deployments, Triton Server is also available as a shared library with an API that allows the full functionality of the server to be included directly in an application.
This page shows, step by step, how to run and test the server on JetPack.
Native support
Native support for JetPack is available in the latest tritonserver releases. The latest release at the time of writing is 2.16, which supports JetPack 4.6 with TensorFlow 2.6.0, TensorFlow 1.15.5, TensorRT 8.0.1.6, and OnnxRuntime 1.8.1. You can find more information here: https://github.com/triton-inference-server/server/releases/tag/v2.16.0#Jetson_Jetpack_Support.
To use this release (or any other), download the tarball with JetPack support from the release downloads: https://github.com/triton-inference-server/server/releases/download/v2.16.0/tritonserver2.16.0-jetpack4.6.tgz. This tarball includes the tritonserver executable as well as the client applications.
Steps to use Triton server
1. Install the SDK components
To use tritonserver with the full set of model backends, first install all the SDK components available for the board on JetPack 4.6.
2. Install Triton server dependencies
apt-get update
apt-get install -y --no-install-recommends \
    software-properties-common \
    autoconf \
    automake \
    build-essential \
    cmake \
    git \
    libb64-dev \
    libre2-dev \
    libssl-dev \
    libtool \
    libboost-dev \
    libcurl4-openssl-dev \
    rapidjson-dev \
    patchelf \
    zlib1g-dev
3. Download a models repository
This step is optional if you already have a models repository. If you don't have one, for testing purposes you can fetch some example models by following the step described here: https://github.com/triton-inference-server/server/blob/main/docs/quickstart.md#create-a-model-repository
For this you need to:
git clone https://github.com/triton-inference-server/server.git tritonserver-src
cd tritonserver-src && cd docs/examples
./fetch_models.sh
export MODEL_PATH=$PWD/model_repository
The models will then be located in the model_repository folder, in the same directory as the executed script, and that path is exported in the MODEL_PATH variable.
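For reference, a Triton model repository follows a fixed layout: one directory per model, with an optional config.pbtxt, an optional labels file, and numbered version subdirectories that hold the actual model file. The sketch below shows roughly what the fetched inception_graphdef model looks like; exact file names may vary depending on the fetched models.

model_repository/
└── inception_graphdef/
    ├── config.pbtxt            # model configuration
    ├── inception_labels.txt    # labels referenced from config.pbtxt
    └── 1/                      # version directory
        └── model.graphdef      # the model itself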
4. Download the triton server
The tarball includes the executable and the required shared libraries, alongside the tritonserver backends.
wget https://github.com/triton-inference-server/server/releases/download/v2.16.0/tritonserver2.16.0-jetpack4.6.tgz
tar -xzf tritonserver2.16.0-jetpack4.6.tgz
cd tritonserver2.16.0-jetpack4.6
export BACKEND_PATH=$PWD/backends
5. Execute the server
You can then execute the server this way:
./bin/tritonserver --model-repository=$MODEL_PATH --backend-directory=$BACKEND_PATH --backend-config=tensorflow,version=2
You can check that it is up and running by using the ready endpoint:
curl -v localhost:8000/v2/health/ready
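If the server is ready, the endpoint answers with an HTTP 200 status and an empty body; the relevant part of the curl output looks roughly like this (other verbose output trimmed):

< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain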
Steps to use Triton client
1. Install all dependencies
apt-get install -y --no-install-recommends \
    curl \
    pkg-config \
    python3 \
    python3-pip \
    python3-dev
pip3 install --upgrade wheel setuptools cython && \
    pip3 install --upgrade grpcio-tools numpy==1.19.4 future attrdict
2. Install the python package
The python package is located inside the downloaded and uncompressed release.
You can install it by using:
python3 -m pip install --upgrade clients/python/tritonclient-2.16.0-py3-none-any.whl[all]
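As a quick sanity check (not part of the official steps), you can confirm the wheel installed correctly by importing the HTTP client module:

python3 -c "import tritonclient.http; print('tritonclient OK')"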
3. Using one of the examples
There are some python examples located here: https://github.com/triton-inference-server/client/tree/main/src/python/examples
These examples are included inside the downloaded release too.
For example, you can run the following to perform classification on an image:
python3 ./examples/image_client.py -m inception_graphdef -c 3 -s INCEPTION /home/nvidia/client/src/python/cistus1.jpg --url localhost:8000 --protocol HTTP
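If you prefer a minimal hand-written client instead of the bundled examples, the following sketch uses the tritonclient HTTP API against the "simple" model from the example repository. The model name and its INPUT0/INPUT1/OUTPUT0/OUTPUT1 tensor names are assumptions based on that example model; adjust them if your repository does not include it.

# Minimal sketch of a hand-written Triton HTTP client.
# Assumes the "simple" example model (two INT32 [1,16] inputs) is loaded.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Basic health checks against the running server
print("Server live:", client.is_server_live())
print("Server ready:", client.is_server_ready())
print("Model ready:", client.is_model_ready("simple"))

# Build the request inputs
inputs = [
    httpclient.InferInput("INPUT0", [1, 16], "INT32"),
    httpclient.InferInput("INPUT1", [1, 16], "INT32"),
]
inputs[0].set_data_from_numpy(np.arange(16, dtype=np.int32).reshape(1, 16))
inputs[1].set_data_from_numpy(np.ones((1, 16), dtype=np.int32))

# Request both outputs and run inference
outputs = [
    httpclient.InferRequestedOutput("OUTPUT0"),
    httpclient.InferRequestedOutput("OUTPUT1"),
]
response = client.infer("simple", inputs, outputs=outputs)
print("OUTPUT0:", response.as_numpy("OUTPUT0"))
print("OUTPUT1:", response.as_numpy("OUTPUT1"))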
Docker support
If you search the documentation, you will find that there are Docker images on NGC for tritonserver, named nvcr.io/nvidia/tritonserver:<xx.yy>-py3. These images don't work on JetPack because they are built for Windows or Ubuntu PCs, so you need to create your own image by using the following Dockerfile.
FROM nvcr.io/nvidia/l4t-ml:r32.6.1-py3

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        software-properties-common \
        autoconf \
        automake \
        build-essential \
        cmake \
        git \
        libb64-dev \
        libre2-dev \
        libssl-dev \
        libtool \
        libboost-dev \
        libcurl4-openssl-dev \
        rapidjson-dev \
        patchelf \
        zlib1g-dev && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /tritonserver

RUN wget https://github.com/triton-inference-server/server/releases/download/v2.16.0/tritonserver2.16.0-jetpack4.6.tgz && \
    tar -xzf tritonserver2.16.0-jetpack4.6.tgz && \
    rm tritonserver2.16.0-jetpack4.6.tgz

ENV LD_LIBRARY_PATH=/tritonserver/backends/tensorflow1:$LD_LIBRARY_PATH

# tritonserver looks in /opt/tritonserver/backends by default
RUN mkdir -p /opt/tritonserver/backends && cp -r ./backends/* /opt/tritonserver/backends/

ENTRYPOINT ["/tritonserver/bin/tritonserver"]
CMD ["--help"]
You may do so by saving the contents above into a file named Dockerfile and then building the image:
sudo docker build . -t tritontest:0.1
If your file is not named "Dockerfile", you may use the -f <filename> switch in the command.
Then you can run a container with that image:
sudo docker run --gpus=1 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/nvidia/server/docs/examples/model_repository:/models tritontest:0.1 --model-repository=/models
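Once the container is running, you can verify from the host that the server inside it is ready, for example with the same health endpoint used earlier:

curl -v localhost:8000/v2/health/ready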
Then you can run any client as described in: #Steps_to_use_Triton_client
NOTE: If Docker runs out of space in its default location, you can move the Docker data directory somewhere else; please follow Docker_Tutorial#Moving_docker_location.
Kubernetes support
A full guide on how to run Tritonserver with Kubernetes is available here: https://developer.nvidia.com/blog/deploying-nvidia-triton-at-scale-with-mig-and-kubernetes/
Below is a summary with more detailed steps on how to do that.
The first thing you should do is install Kubernetes, in this case on Ubuntu 18.04.
Installing kubectl
The guide to install the Kubernetes kubectl tool is available here: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-using-native-package-management
In short, you should run:
# Install dependencies
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl

# Add kubernetes to package registry
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Install kubectl
sudo apt-get update
sudo apt-get install -y kubectl
sudo apt-mark hold kubectl
Installing kubeadm and kubelet
Some additional tools are necessary to run Kubernetes; they are listed here: https://kubernetes.io/docs/tasks/tools/
You should install kubelet and kubeadm to work alongside kubectl. As stated in https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm, you should use the following commands:
# Install kubelet and kubeadm
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
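You can optionally confirm the tools are installed and check their versions (a quick sanity check, not required by the guide):

kubeadm version
kubelet --version
kubectl version --client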
After installing, you will need to configure multiple things:
Disable swap
sudo swapoff -a
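Note that swapoff -a only disables swap until the next reboot. If you want the change to persist, one common approach (shown as a sketch, since it edits /etc/fstab directly) is to comment out the swap entries:

sudo sed -i '/ swap / s/^/#/' /etc/fstab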
Kubelet cgroup
You can find more information here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#configuring-a-cgroup-driver
The reason for this modification is that the kubelet and the container engine (Docker or containerd) need to use the same cgroup driver. You can, for example, configure the kubelet to use the same cgroup driver as Docker, which uses cgroupfs. To do this, edit the kubelet defaults file:
sudo vim /etc/default/kubelet
And add:
KUBELET_EXTRA_ARGS="--cgroup-driver=cgroupfs"
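After editing the file, restart the kubelet so it picks up the new cgroup driver setting:

sudo systemctl daemon-reload
sudo systemctl restart kubelet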
Kubectl for a non-root user
You can find more information here:
First, run kubeadm init once as root:
sudo kubeadm init
Then create the configuration in your home directory:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Then reset kubeadm:
sudo kubeadm reset -f
Pod network add-on
You will need a pod network add-on in order for the master node to start correctly; more information here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network
The link lists the available add-ons, and this issue thread points to one that worked (flannel, used below): https://github.com/kubernetes/kubernetes/issues/48798
To install it and use it:
sudo kubeadm init
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
NOTE: This IP address will be the base network address for all the containers running under Kubernetes.
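Once the cluster is initialized and the flannel add-on is applied, you can verify that the master node reaches the Ready state before deploying anything:

sudo kubectl get nodes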
Tritonserver Kubernetes deployment
In this case, we are going to create a Kubernetes deployment with one replica, using the same Docker image we created above. This is done with the following deployment YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-deploy
  labels:
    app: triton-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton-app
  template:
    metadata:
      labels:
        app: triton-app
    spec:
      volumes:
      - name: models-volume
        hostPath:
          # directory location on host
          path: /home/nvidia/server/docs/examples/model_repository/
          # this field is optional
          type: Directory
      containers:
      - name: triton-container
        ports:
        - containerPort: 8000
          name: http-triton
        - containerPort: 8001
          name: grpc-triton
        - containerPort: 8002
          name: metrics-triton
        image: "tritontest:0.1"
        volumeMounts:
        - mountPath: /models
          name: models-volume
        command: ["/bin/sh", "-c"]
        args: ["/tritonserver/bin/tritonserver --model-repository=/models --allow-gpu-metrics=false --strict-model-config=false"]
NOTE: We are using a model repository already available at the specified host path, which then gets mounted inside the created container, so place your own repository there, as described in step 3 of Steps_to_use_Triton_server.
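To create the deployment, save the manifest above to a file (the name triton-deploy.yaml used here is arbitrary) and apply it with kubectl:

sudo kubectl apply -f triton-deploy.yaml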
You can check if the pods were created correctly:
$ sudo kubectl get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP           NODE                    NOMINATED NODE   READINESS GATES
triton-deploy-64464b877f-5scws   1/1     Running   0          33m   10.244.0.7   localhost.localdomain   <none>           <none>
If the pod is not running, you can check why by using:
sudo kubectl describe pod triton-deploy-64464b877f-5scws
If the pod is running, you can now use the tritonserver running at the IP obtained in the step above, using the client example mentioned in #Steps_to_use_Triton_client:
python3 ./examples/image_client.py -m inception_graphdef -c 3 -s INCEPTION /home/nvidia/client/src/python/cistus1.jpg --url 10.244.0.7:8000 --protocol HTTP
K3s support
You can run the Triton server on K3s, a lightweight Kubernetes distribution built for IoT and edge computing.
First, you need to install K3s, in this case on Ubuntu 18.04.
Installing K3s
You can find documentation about the options to install K3s here: https://rancher.com/docs/k3s/latest/en/installation/install-options/
The simplest way is to use the installation script:
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.19.7+k3s1 K3S_KUBECONFIG_MODE="644" sh -
After installing, you can check that your master node is Ready:
$ k3s kubectl get node
NAME       STATUS   ROLES                  AGE     VERSION
ridgerun   Ready    control-plane,master   5d21h   v1.19.7+k3s1
Tritonserver K3s deployment
You can use the same Kubernetes deployment YAML file from Tritonserver_Kubernetes_deployment.
Then deploy your server by running:
k3s kubectl apply -f <name-of-the-yaml-file>
You can check if the pods were created correctly:
$ k3s kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP           NODE                    NOMINATED NODE   READINESS GATES
triton-deploy-6d5f68b4d-m9qgp   1/1     Running   0          21m   10.42.0.22   localhost.localdomain   <none>           <none>
If the pod is not running, you can check why by using:
k3s kubectl describe pod triton-deploy-6d5f68b4d-m9qgp
If the pod is running, you can use the tritonserver at the IP obtained in the step above, using the client example mentioned in #Steps_to_use_Triton_client:
python3 ./examples/image_client.py -m inception_graphdef -c 3 -s INCEPTION /home/nvidia/client/src/python/cistus1.jpg --url 10.42.0.22:8000 --protocol HTTP