Tritonserver support for NVIDIA Jetson Platforms
Introduction to Triton Inference Server
From the official NVIDIA Triton Inference Server documentation: The Triton Inference Server provides a cloud inferencing solution optimized for both CPUs and GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server. For edge deployments, Triton Server is also available as a shared library with an API that allows the full functionality of the server to be included directly in an application.
This page shows, step by step, how to run and test the server on JetPack.
Native support
Native support for JetPack is available in the latest tritonserver releases. The latest release at the time of writing is 2.16, which supports JetPack 4.6 with TensorFlow 2.6.0, TensorFlow 1.15.5, TensorRT 8.0.1.6, and OnnxRuntime 1.8.1. You can find more information here: https://github.com/triton-inference-server/server/releases/tag/v2.16.0#Jetson_Jetpack_Support.
To use this release (or any other), download the tarball with JetPack support from the release downloads: https://github.com/triton-inference-server/server/releases/download/v2.16.0/tritonserver2.16.0-jetpack4.6.tgz. This tarball includes the tritonserver executable as well as the client applications.
Steps to use Triton server
1. Install the SDK components
To use tritonserver with the full set of model backends, first install all the SDK components available for the board on JetPack 4.6.
2. Install Triton server dependencies
apt-get update
apt-get install -y --no-install-recommends \
    software-properties-common \
    autoconf \
    automake \
    build-essential \
    cmake \
    git \
    libb64-dev \
    libre2-dev \
    libssl-dev \
    libtool \
    libboost-dev \
    libcurl4-openssl-dev \
    rapidjson-dev \
    patchelf \
    zlib1g-dev
3. Download a models repository
This step is optional if you already have a models repository. If you don't have one, for testing purposes you can fetch some example models by following the step described here: https://github.com/triton-inference-server/server/blob/main/docs/quickstart.md#create-a-model-repository
For this you need to:
git clone https://github.com/triton-inference-server/server.git tritonserver-src
cd tritonserver-src && cd docs/examples
./fetch_models.sh
export MODEL_PATH=$PWD/model_repository
The models will then be located in the model_repository folder, in the same directory as the executed script, and that path is exported in the MODEL_PATH variable.
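For reference, a Triton model repository follows a fixed layout: one directory per model, with an optional config.pbtxt, an optional labels file, and numbered version subdirectories that hold the actual model file. The sketch below shows roughly what the fetched inception_graphdef model looks like; exact file names may vary depending on the fetched models.

model_repository/
└── inception_graphdef/
    ├── config.pbtxt            # model configuration
    ├── inception_labels.txt    # labels referenced from config.pbtxt
    └── 1/                      # version directory
        └── model.graphdef      # the model itself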
4. Download the triton server
The tarball includes the executable and the required shared libraries, alongside the tritonserver backends.
wget https://github.com/triton-inference-server/server/releases/download/v2.16.0/tritonserver2.16.0-jetpack4.6.tgz
tar -xzf tritonserver2.16.0-jetpack4.6.tgz
cd tritonserver2.16.0-jetpack4.6
export BACKEND_PATH=$PWD/backends
5. Execute the server
You can then execute the server this way:
./bin/tritonserver --model-repository=$MODEL_PATH --backend-directory=$BACKEND_PATH --backend-config=tensorflow,version=2
You can check that it is up and running by using the ready endpoint:
curl -v localhost:8000/v2/health/ready
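If the server is ready, the endpoint answers with an HTTP 200 status and an empty body; the relevant part of the curl output looks roughly like this (other verbose output trimmed):

< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain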
Steps to use Triton client
1. Install all dependencies
apt-get install -y --no-install-recommends \
    curl \
    pkg-config \
    python3 \
    python3-pip \
    python3-dev
pip3 install --upgrade wheel setuptools cython && \
    pip3 install --upgrade grpcio-tools numpy==1.19.4 future attrdict
2. Install the python package
The python package is located inside the downloaded and uncompressed release.
You can install it by using:
python3 -m pip install --upgrade clients/python/tritonclient-2.16.0-py3-none-any.whl[all]
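As a quick sanity check (not part of the official steps), you can confirm the wheel installed correctly by importing the HTTP client module:

python3 -c "import tritonclient.http; print('tritonclient OK')"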
3. Using one of the examples
There are some python examples located here: https://github.com/triton-inference-server/client/tree/main/src/python/examples
These examples are included inside the downloaded release too.
For example, you can run the following to perform classification on an image:
python3 ./examples/image_client.py -m inception_graphdef -c 3 -s INCEPTION /home/nvidia/client/src/python/cistus1.jpg --url localhost:8000 --protocol HTTP
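If you prefer a minimal hand-written client instead of the bundled examples, the following sketch uses the tritonclient HTTP API against the "simple" model from the example repository. The model name and its INPUT0/INPUT1/OUTPUT0/OUTPUT1 tensor names are assumptions based on that example model; adjust them if your repository does not include it.

# Minimal sketch of a hand-written Triton HTTP client.
# Assumes the "simple" example model (two INT32 [1,16] inputs) is loaded.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Basic health checks against the running server
print("Server live:", client.is_server_live())
print("Server ready:", client.is_server_ready())
print("Model ready:", client.is_model_ready("simple"))

# Build the request inputs
inputs = [
    httpclient.InferInput("INPUT0", [1, 16], "INT32"),
    httpclient.InferInput("INPUT1", [1, 16], "INT32"),
]
inputs[0].set_data_from_numpy(np.arange(16, dtype=np.int32).reshape(1, 16))
inputs[1].set_data_from_numpy(np.ones((1, 16), dtype=np.int32))

# Request both outputs and run inference
outputs = [
    httpclient.InferRequestedOutput("OUTPUT0"),
    httpclient.InferRequestedOutput("OUTPUT1"),
]
response = client.infer("simple", inputs, outputs=outputs)
print("OUTPUT0:", response.as_numpy("OUTPUT0"))
print("OUTPUT1:", response.as_numpy("OUTPUT1"))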
Docker support
If you search the documentation, you will find that there are Docker images on NGC for tritonserver, named nvcr.io/nvidia/tritonserver:<xx.yy>-py3. These images don't work on JetPack because they are built for Windows or Ubuntu PCs, so you need to create your own image by using the following Dockerfile.
FROM nvcr.io/nvidia/l4t-ml:r32.6.1-py3

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        software-properties-common \
        autoconf \
        automake \
        build-essential \
        cmake \
        git \
        libb64-dev \
        libre2-dev \
        libssl-dev \
        libtool \
        libboost-dev \
        libcurl4-openssl-dev \
        rapidjson-dev \
        patchelf \
        zlib1g-dev && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /tritonserver

RUN wget https://github.com/triton-inference-server/server/releases/download/v2.16.0/tritonserver2.16.0-jetpack4.6.tgz && \
    tar -xzf tritonserver2.16.0-jetpack4.6.tgz && \
    rm tritonserver2.16.0-jetpack4.6.tgz

ENV LD_LIBRARY_PATH=/tritonserver/backends/tensorflow1:$LD_LIBRARY_PATH

# tritonserver looks in /opt/tritonserver/backends by default
RUN mkdir -p /opt/tritonserver/backends && cp -r ./backends/* /opt/tritonserver/backends/

ENTRYPOINT ["/tritonserver/bin/tritonserver"]
CMD ["--help"]
You may do so by saving the contents above into a file named Dockerfile and then building the image:
sudo docker build . -t tritontest:0.1
If your file is not named "Dockerfile", you may use the -f <filename> switch in the command.
Then you can run a container with that image:
sudo docker run --gpus=1 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/nvidia/server/docs/examples/model_repository:/models tritontest:0.1 --model-repository=/models
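Once the container is running, you can verify from the host that the server inside it is ready, for example with the same health endpoint used earlier:

curl -v localhost:8000/v2/health/ready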
Then you can run any client as described in: #Steps_to_use_Triton_client
NOTE: If Docker runs out of space in its default location, you can move the Docker data directory somewhere else; please follow Docker_Tutorial#Moving_docker_location.
Kubernetes support
A full guide on how to run Tritonserver with Kubernetes is available here: https://developer.nvidia.com/blog/deploying-nvidia-triton-at-scale-with-mig-and-kubernetes/
Below is a summary with more detailed steps on how to do that.
The first thing you should do is install Kubernetes, in this case on Ubuntu 18.04.
Installing kubectl
The guide to install the Kubernetes kubectl tool is available here: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-using-native-package-management
In short, you should run:
# Install dependencies
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl

# Add kubernetes to package registry
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Install kubectl
sudo apt-get update
sudo apt-get install -y kubectl
sudo apt-mark hold kubectl
Installing kubeadm and kubelet
Some additional tools are necessary to run Kubernetes; they are listed here: https://kubernetes.io/docs/tasks/tools/
You should install kubelet and kubeadm to work alongside kubectl. As stated in https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm, you should use the following commands:
# Install kubelet and kubeadm
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
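You can optionally confirm the tools are installed and check their versions (a quick sanity check, not required by the guide):

kubeadm version
kubelet --version
kubectl version --client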
After installing, you will need to configure multiple things:
Disable swap
sudo swapoff -a
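Note that swapoff -a only disables swap until the next reboot. If you want the change to persist, one common approach (shown as a sketch, since it edits /etc/fstab directly) is to comment out the swap entries:

sudo sed -i '/ swap / s/^/#/' /etc/fstab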
Kubelet cgroup
You can find more information here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#configuring-a-cgroup-driver
The reason for this modification is that the kubelet and the container engine (Docker or containerd) need to use the same cgroup driver. You can, for example, configure the kubelet to use the same cgroup driver as Docker, which uses cgroupfs. To do this, edit the kubelet defaults file:
sudo vim /etc/default/kubelet
And add:
KUBELET_EXTRA_ARGS="--cgroup-driver=cgroupfs"
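After editing the file, restart the kubelet so it picks up the new cgroup driver setting:

sudo systemctl daemon-reload
sudo systemctl restart kubelet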
Kubectl for a non-root user
You can find more information here:
First, run kubeadm init once as root:
sudo kubeadm init
Then create the configuration in your home directory:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Then reset kubeadm:
sudo kubeadm reset -f
Pod network add-on
You will need a pod network add-on in order for the master node to start correctly; more information here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network
The link lists the available add-ons, and this issue thread points to one that worked (flannel, used below): https://github.com/kubernetes/kubernetes/issues/48798
To install it and use it:
sudo kubeadm init
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
NOTE: This IP address will be the base network address for all the containers running under Kubernetes.
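Once the cluster is initialized and the flannel add-on is applied, you can verify that the master node reaches the Ready state before deploying anything:

sudo kubectl get nodes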
Tritonserver Kubernetes deployment
In this case, we are going to create a Kubernetes deployment with one replica, using the same Docker image we created above. This is done with the following deployment YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-deploy
  labels:
    app: triton-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton-app
  template:
    metadata:
      labels:
        app: triton-app
    spec:
      volumes:
      - name: models-volume
        hostPath:
          # directory location on host
          path: /home/nvidia/server/docs/examples/model_repository/
          # this field is optional
          type: Directory
      containers:
      - name: triton-container
        ports:
        - containerPort: 8000
          name: http-triton
        - containerPort: 8001
          name: grpc-triton
        - containerPort: 8002
          name: metrics-triton
        image: "tritontest:0.1"
        volumeMounts:
        - mountPath: /models
          name: models-volume
        command: ["/bin/sh", "-c"]
        args: ["/tritonserver/bin/tritonserver --model-repository=/models --allow-gpu-metrics=false --strict-model-config=false"]
NOTE: We are using a model repository already available at the specified host path, which then gets mounted inside the created container, so place your own repository there, as described in step 3 of Steps_to_use_Triton_server.
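To create the deployment, save the manifest above to a file (the name triton-deploy.yaml used here is arbitrary) and apply it with kubectl:

sudo kubectl apply -f triton-deploy.yaml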
You can check if the pods were created correctly:
$ sudo kubectl get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP           NODE                    NOMINATED NODE   READINESS GATES
triton-deploy-64464b877f-5scws   1/1     Running   0          33m   10.244.0.7   localhost.localdomain   <none>           <none>
If the pod is not running, you can check why by using:
sudo kubectl describe pod triton-deploy-64464b877f-5scws
If the pod is running, you can now use the tritonserver running at the IP obtained in the step above, using the client example mentioned in #Steps_to_use_Triton_client:
python3 ./examples/image_client.py -m inception_graphdef -c 3 -s INCEPTION /home/nvidia/client/src/python/cistus1.jpg --url 10.244.0.7:8000 --protocol HTTP
K3s support
You can run the Triton server on K3s, a lightweight Kubernetes distribution built for IoT and edge computing.
First, you need to install K3s, in this case on Ubuntu 18.04.
Installing K3s
You can find documentation about the options to install K3s here: https://rancher.com/docs/k3s/latest/en/installation/install-options/
The simplest way is to use the installation script:
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.19.7+k3s1 K3S_KUBECONFIG_MODE="644" sh -
After installing, you can check that your master node is Ready:
$ k3s kubectl get node
NAME       STATUS   ROLES                  AGE     VERSION
ridgerun   Ready    control-plane,master   5d21h   v1.19.7+k3s1
Tritonserver K3s deployment
You can use the same Kubernetes deployment YAML file from Tritonserver_Kubernetes_deployment.
Then deploy your server by running:
k3s kubectl apply -f <name-of-the-yaml-file>
You can check if the pods were created correctly:
$ k3s kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP           NODE                    NOMINATED NODE   READINESS GATES
triton-deploy-6d5f68b4d-m9qgp   1/1     Running   0          21m   10.42.0.22   localhost.localdomain   <none>           <none>
If the pod is not running, you can check why by using:
k3s kubectl describe pod triton-deploy-6d5f68b4d-m9qgp
If the pod is running, you can use the tritonserver at the IP obtained in the step above, using the client example mentioned in #Steps_to_use_Triton_client:
python3 ./examples/image_client.py -m inception_graphdef -c 3 -s INCEPTION /home/nvidia/client/src/python/cistus1.jpg --url 10.42.0.22:8000 --protocol HTTP