Tritonserver support for NVIDIA Jetson Platforms
Introduction
From the official Triton documentation: The Triton Inference Server provides a cloud inferencing solution optimized for both CPUs and GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server. For edge deployments, Triton Server is also available as a shared library with an API that allows the full functionality of the server to be included directly in an application.
Here you will find how to run and test the server on a JetPack system, step by step.
Native support
Native support for JetPack is available in the latest releases of Triton server. The latest release at the time of writing is 2.16, which supports JetPack 4.6 with TensorFlow 2.6.0, TensorFlow 1.15.5, TensorRT 8.0.1.6, and ONNX Runtime 1.8.1. You can find more information here: https://github.com/triton-inference-server/server/releases/tag/v2.16.0#Jetson_Jetpack_Support.
To use this release (or any other), download the tarball with JetPack support from the release downloads: https://github.com/triton-inference-server/server/releases/download/v2.16.0/tritonserver2.16.0-jetpack4.6.tgz. The tarball includes the tritonserver executable as well as the client applications.
Steps to use Triton server
1. Install the SDK components
To use tritonserver with the full set of capabilities for your models, you should first install all the SDK components available for the board on JetPack 4.6.
2. Install Triton server dependencies
apt-get update
apt-get install -y --no-install-recommends \
        software-properties-common \
        autoconf \
        automake \
        build-essential \
        cmake \
        git \
        libb64-dev \
        libre2-dev \
        libssl-dev \
        libtool \
        libboost-dev \
        libcurl4-openssl-dev \
        rapidjson-dev \
        patchelf \
        zlib1g-dev
3. Download a model repository
This step is optional if you already have a model repository. If you don't have one, you can fetch some example models for testing purposes by following the step described here: https://github.com/triton-inference-server/server/blob/main/docs/quickstart.md#create-a-model-repository
For this you need to:
git clone https://github.com/triton-inference-server/server.git tritonserver-src
cd tritonserver-src/docs/examples
./fetch_models.sh
export MODEL_PATH=$PWD/model_repository
The models will then be located in the model_repository folder, in the same directory as the executed script, and the path is exported in the MODEL_PATH variable.
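Triton expects each model in its own subdirectory, containing a numeric version folder and a config.pbtxt file. As a rough sketch of what the fetch script produces, assuming the densenet_onnx example from the quickstart (the exact set of fetched models may vary between releases):

ls $MODEL_PATH
# densenet_onnx  inception_graphdef  ...
ls $MODEL_PATH/densenet_onnx
# 1  config.pbtxt  densenet_labels.txt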
4. Download the Triton server
The tarball includes the executable and the needed shared libraries, alongside the tritonserver backends:
wget https://github.com/triton-inference-server/server/releases/download/v2.16.0/tritonserver2.16.0-jetpack4.6.tgz
tar -xzf tritonserver2.16.0-jetpack4.6.tgz
cd tritonserver2.16.0-jetpack4.6
export BACKEND_PATH=$PWD/backends
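After extraction, the directories used in the rest of this guide should be present. The bin, backends, and clients folders are referenced below; the other entries in this sketch (such as lib) may vary between releases:

ls
# backends  bin  clients  lib  ...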
5. Execute the server
You can then execute the server as follows (the backend-config flag selects version 2 of the TensorFlow backend instead of the default version 1):
./bin/tritonserver --model-repository=$MODEL_PATH --backend-directory=$BACKEND_PATH --backend-config=tensorflow,version=2
You can check that the server is up and ready by using the ready endpoint:
curl -v localhost:8000/v2/health/ready
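If the server is ready, this endpoint returns an empty response with HTTP status 200, so the verbose curl output should include a status line along these lines (the surrounding headers may differ):

< HTTP/1.1 200 OK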
Steps to use Triton client
1. Install all dependencies
apt-get install -y --no-install-recommends \
        curl \
        pkg-config \
        python3 \
        python3-pip \
        python3-dev
pip3 install --upgrade wheel setuptools cython && \
    pip3 install --upgrade grpcio-tools numpy==1.19.4 future attrdict
2. Install the python package
The Python package is located inside the downloaded and uncompressed release.
You can install it by using:
python3 -m pip install --upgrade clients/python/tritonclient-2.16.0-py3-none-any.whl[all]
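As a quick check, you can verify both that the package imports correctly and, assuming the server from the previous section is still running on localhost:8000, that it is reachable through the HTTP client:

# Should print True if the client is installed and the server is ready
python3 -c "import tritonclient.http as httpclient; print(httpclient.InferenceServerClient('localhost:8000').is_server_ready())"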
3. Using one of the examples
There are some Python examples located here: https://github.com/triton-inference-server/client/tree/main/src/python/examples
These examples are included inside the downloaded release too.
For example, you can run classification on an image (-m selects the model, -c requests the top 3 classifications, and -s VGG selects VGG-style input scaling):
python3 ./examples/image_client.py -m densenet_onnx -c 3 -s VGG /home/nvidia/client/src/python/cistus1.jpg --url localhost:8000 --protocol HTTP
Yocto support
Docker support
Kubernetes support
A full guide on how to run Tritonserver with Kubernetes is located here: https://developer.nvidia.com/blog/deploying-nvidia-triton-at-scale-with-mig-and-kubernetes/
Below is a summary with more detailed steps.
The first thing you should do is install Kubernetes, in this case on Ubuntu 18.04.
Installing kubectl
The guide to install kubectl is located here: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-using-native-package-management
In summary, you should run:
# Install dependencies
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl

# Add Kubernetes to the package registry
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Install kubectl
sudo apt-get update
sudo apt-get install -y kubectl
sudo apt-mark hold kubectl
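You can confirm that the installation succeeded by printing the client version:

kubectl version --client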
Installing kubeadm
There are some additional tools necessary to run Kubernetes; these are listed here: https://kubernetes.io/docs/tasks/tools/
You should install kubelet and kubeadm to work alongside kubectl. As stated here https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm, you should use the following commands:
# Install kubelet and kubeadm
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
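As a quick sanity check that all three tools are installed, you can print their versions:

kubeadm version
kubelet --version
kubectl version --client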
After installing, you will need to configure multiple things:
Kubelet cgroup
You can find more information here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#configuring-a-cgroup-driver
The reason for this modification is that the kubelet and the container engine (Docker or containerd) need to run under the same cgroup driver. You can, for example, configure the kubelet to use the same driver as Docker, which runs under cgroupfs. To do this, edit the kubelet defaults file:
sudo vim /etc/default/kubelet
And add
KUBELET_EXTRA_ARGS="--cgroup-driver=cgroupfs"
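Before matching the kubelet to Docker, you can confirm which driver Docker actually reports; on a default installation this is typically cgroupfs:

# Print the cgroup driver Docker is running under
docker info 2>/dev/null | grep -i "cgroup driver"
# Cgroup Driver: cgroupfs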
Kubectl for a non-root user
You can find more information here:
First, run kubeadm once as root:
sudo kubeadm init
Then create the configuration in your home directory:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
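Before the reset below, you can verify that kubectl now works for your regular user without sudo:

kubectl get nodes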
Then reset kubeadm (the cluster will be initialized again with the pod network settings in the next section):
sudo kubeadm reset -f
Pod network add-on
You will need a pod network add-on in order for the master node to start correctly. More information here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network
That link lists the available add-ons; Flannel is one that is reported to work in this issue thread: https://github.com/kubernetes/kubernetes/issues/48798
To install it and use it:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
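To verify that the cluster is up and the Flannel pods are running, check the node status and the system pods; the node should report Ready once the network add-on is running:

kubectl get nodes
kubectl get pods --all-namespaces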