Tritonserver support for NVIDIA Jetson Platforms

Introduction to the Triton Inference Server

From the official Triton documentation: The Triton Inference Server provides a cloud inferencing solution optimized for both CPUs and GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server. For edge deployments, Triton Server is also available as a shared library with an API that allows the full functionality of the server to be included directly in an application.

Here you will find how to run and test the server on JetPack, step by step.

Native support

Native support for JetPack is available in the latest releases of tritonserver. The latest release at the time of writing is 2.16, which supports JetPack 4.6 with TensorFlow 2.6.0, TensorFlow 1.15.5, TensorRT 8.0.1.6, and OnnxRuntime 1.8.1. You can find more information here: https://github.com/triton-inference-server/server/releases/tag/v2.16.0#Jetson_Jetpack_Support.

In order to use this release (or any other), download the tarball with JetPack support from the release downloads: https://github.com/triton-inference-server/server/releases/download/v2.16.0/tritonserver2.16.0-jetpack4.6.tgz. This tarball includes the tritonserver executable as well as the client applications.

Steps to use Triton server

1. Install the SDK components

In order to use tritonserver with all of its capabilities for the different models, you should first install all the SDK components available for the board on JetPack 4.6.
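
If the board was flashed without the extra SDK components, one way to add them directly on the target is through NVIDIA's apt repository (this assumes the board was flashed with the standard L4T image, which already configures those repositories):

# Check the L4T release currently installed on the board
cat /etc/nv_tegra_release

# Install the JetPack SDK components (CUDA, cuDNN, TensorRT, etc.)
sudo apt-get update
sudo apt-get install -y nvidia-jetpack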

2. Install Triton server dependencies

apt-get update
apt-get install -y --no-install-recommends \
    software-properties-common \
    autoconf \
    automake \
    build-essential \
    cmake \
    git \
    libb64-dev \
    libre2-dev \
    libssl-dev \
    libtool \
    libboost-dev \
    libcurl4-openssl-dev \
    rapidjson-dev \
    patchelf \
    zlib1g-dev

3. Download a models repository

This step is optional if you already have a models repository. If you don't have one, for testing purposes you can get some example models by following the steps located here: https://github.com/triton-inference-server/server/blob/main/docs/quickstart.md#create-a-model-repository

For this you need to:

git clone https://github.com/triton-inference-server/server.git tritonserver-src
cd tritonserver-src && cd docs/examples
./fetch_models.sh
export MODEL_PATH=$PWD/model_repository

The models will then be located under the model_repository folder, in the same directory as the executed script, and the path is exported in the MODEL_PATH variable.
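
After the script finishes, the repository should follow the layout Triton expects: one directory per model, with numbered version subdirectories and a config.pbtxt. For example, for the densenet_onnx model used later in this guide, the layout looks roughly like this:

model_repository/
└── densenet_onnx/
    ├── 1/
    │   └── model.onnx
    ├── config.pbtxt
    └── densenet_labels.txt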

4. Download the triton server

The tarball includes the executable and the needed shared libraries, alongside the tritonserver backends.

wget https://github.com/triton-inference-server/server/releases/download/v2.16.0/tritonserver2.16.0-jetpack4.6.tgz
tar -xzf tritonserver2.16.0-jetpack4.6.tgz
cd tritonserver2.16.0-jetpack4.6
export BACKEND_PATH=$PWD/backends

5. Execute the server

You can then execute the server this way:

./bin/tritonserver --model-repository=$MODEL_PATH --backend-directory=$BACKEND_PATH --backend-config=tensorflow,version=2


And you can check that it is up and running by using the ready endpoint:

curl -v localhost:8000/v2/health/ready
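
Assuming the example model repository from step 3 is being served, you can also query the server and model metadata over the same HTTP endpoint:

# Server metadata
curl localhost:8000/v2

# Metadata of one of the example models
curl localhost:8000/v2/models/densenet_onnx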

Steps to use Triton client

1. Install all dependencies

apt-get install -y --no-install-recommends \
        curl \
        pkg-config \
        python3 \
        python3-pip \
        python3-dev

pip3 install --upgrade wheel setuptools cython && \
pip3 install --upgrade grpcio-tools numpy==1.19.4 future attrdict

2. Install the python package

The python package is located inside the downloaded and uncompressed release.

You can install it by using:

python3 -m pip install --upgrade  clients/python/tritonclient-2.16.0-py3-none-any.whl[all]
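
As a quick sanity check (assuming the server from the previous section is still running on localhost:8000), you can verify that the client library imports correctly and can reach the server:

python3 -c "import tritonclient.http as httpclient; print(httpclient.InferenceServerClient(url='localhost:8000').is_server_ready())"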

3. Using one of the examples

There are some python examples located here: https://github.com/triton-inference-server/client/tree/main/src/python/examples

These examples are included inside the downloaded release too.

For example, you can run the following to perform classification on an image:

python3 ./examples/image_client.py -m densenet_onnx -c 3 -s VGG /home/nvidia/client/src/python/cistus1.jpg --url localhost:8000 --protocol HTTP

Docker support

If you search the documentation, you will find that there are docker images on NGC for tritonserver, named nvcr.io/nvidia/tritonserver:<xx.yy>-py3. These images don't work on JetPack because they are built for Windows or Ubuntu on PC, so you will need to create one by using the following Dockerfile.


FROM nvcr.io/nvidia/l4t-ml:r32.6.1-py3

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        software-properties-common \
        autoconf \
        automake \
        build-essential \
        cmake \
        git \
        libb64-dev \
        libre2-dev \
        libssl-dev \
        libtool \
        libboost-dev \
        libcurl4-openssl-dev \
        rapidjson-dev \
        patchelf \
        zlib1g-dev && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /tritonserver

RUN wget https://github.com/triton-inference-server/server/releases/download/v2.16.0/tritonserver2.16.0-jetpack4.6.tgz && \
    tar -xzf tritonserver2.16.0-jetpack4.6.tgz && \
    rm tritonserver2.16.0-jetpack4.6.tgz

ENV LD_LIBRARY_PATH=/tritonserver/backends/tensorflow1:$LD_LIBRARY_PATH

# tritonserver looks in /opt/tritonserver/backends by default
RUN mkdir -p /opt/tritonserver/backends && cp -r ./backends/* /opt/tritonserver/backends/


ENTRYPOINT ["/tritonserver/bin/tritonserver"]
CMD ["--help"]

You can save the contents above into a file named Dockerfile and then build the image:

sudo docker build . -t tritontest:0.1

If your file is not named "Dockerfile", you may use the -f <filename> switch in the command.

Then you can run a container with that image

sudo docker run --gpus=1 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/nvidia/server/docs/examples/model_repository:/models tritontest:0.1 --model-repository=/models

Then you can run any client as described above: https://developer.ridgerun.com/wiki/index.php?title=Tritonserver_support_for_NVIDIA_Jetson_Platforms#Steps_to_use_Triton_client

NOTE: In case you have no space left in the default docker location, you can move it somewhere else as described here: https://developer.ridgerun.com/wiki/index.php?title=Docker_Tutorial#Moving_docker_location

Kubernetes support

A full guide on how to run Tritonserver with Kubernetes is located here: https://developer.nvidia.com/blog/deploying-nvidia-triton-at-scale-with-mig-and-kubernetes/

Below is a summary with more detailed steps on how to do that.

The first thing that you should do is install kubernetes, in this case on Ubuntu 18.04.

Installing kubectl

The guide to install the Kubernetes kubectl tool is located here: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-using-native-package-management

But overall, you should run:

# Install dependencies
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl

# Add kubernetes to package registry
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Install kubectl
sudo apt-get update
sudo apt-get install -y kubectl
sudo apt-mark hold kubectl
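
You can verify that kubectl was installed correctly with:

kubectl version --client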

Installing kubeadm

There are some additional tools necessary to run kubernetes; these are described here: https://kubernetes.io/docs/tasks/tools/

We should install kubelet and kubeadm to work alongside kubectl. As stated in https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm, you should use the following commands:

# Install kubelet and kubeadm
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

After installing, you will need to configure multiple things:

Disable swap

sudo swapoff -a
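
Note that swapoff only disables swap until the next reboot. To make the change persistent, you can also comment out the swap entries in /etc/fstab, for example:

# Comment out any swap line in /etc/fstab so swap stays disabled after reboots
sudo sed -i '/ swap / s/^/#/' /etc/fstab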

Kubelet cgroup

You can find more information here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#configuring-a-cgroup-driver

The reason to make this modification is that the kubelet and the container engine (docker or containerd) need to run under the same cgroup driver. For this, you can configure the kubelet to use the same cgroup driver as docker, for example, which runs under cgroupfs. To do this:

sudo vim /etc/default/kubelet

And add

KUBELET_EXTRA_ARGS="--cgroup-driver=cgroupfs"
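
After changing the kubelet configuration, restart the service so the new flag takes effect:

sudo systemctl daemon-reload
sudo systemctl restart kubelet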

Kubectl for non-root users

You can find more information in the official kubeadm documentation.

First, run kubeadm once as root:

sudo kubeadm init

Then create the configuration in your home directory:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Then reset kubeadm:

sudo kubeadm reset -f

Pod network add-on

You will need a pod network add-on in order for the master node to start correctly; more information here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network

In the link, you can find a list of the available add-ons, but this issue thread points to one that worked: https://github.com/kubernetes/kubernetes/issues/48798

To install it and use it:

sudo kubeadm init
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
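
On a single-node setup like this one, you may also need to remove the control-plane taint so regular pods can be scheduled on the master node, and then check that the system pods (including flannel) come up. The taint name below assumes a Kubernetes version from this era; newer releases use node-role.kubernetes.io/control-plane instead:

kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl get pods --all-namespaces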

Tritonserver deployment

In this case, we are going to create a deployment with one replica using kubernetes, based on the same docker image we created above. This is done using the following deployment YAML file description for kubernetes:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-deploy
  labels:
    app: triton-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton-app
  template:
    metadata:
      labels:
        app: triton-app
    spec:
      volumes:
      - name: models-volume
        hostPath:
          # directory location on host
          path: /home/nvidia/server/docs/examples/model_repository/
          # this field is optional
          type: Directory
      containers:
        - name: triton-container
          ports:
          - containerPort: 8000
            name: http-triton
          - containerPort: 8001
            name: grpc-triton
          - containerPort: 8002
            name: metrics-triton
          image: "tritontest:0.1"
          volumeMounts:
          - mountPath: /models
            name: models-volume
          command: ["/bin/sh", "-c"]
          args: ["/tritonserver/bin/tritonserver --model-repository=/models --allow-gpu-metrics=false --strict-model-config=false"]
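
To create the deployment, save the YAML above into a file (here assumed to be named triton-deploy.yaml) and apply it:

sudo kubectl apply -f triton-deploy.yaml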

You can check if the pods were created correctly:

$ sudo kubectl get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP           NODE                    NOMINATED NODE   READINESS GATES
triton-deploy-64464b877f-5scws   1/1     Running   0          33m   10.244.0.7   localhost.localdomain   <none>           <none>

If the pod is not running, you can check why by using:

sudo kubectl describe pod triton-deploy-64464b877f-5scws
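
You can also inspect the logs of the Triton container inside the pod:

sudo kubectl logs triton-deploy-64464b877f-5scws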


If the pod is running, you can now use the tritonserver running under the IP obtained above, using the client example shown earlier:

python3 ./examples/image_client.py -m densenet_onnx -c 3 -s VGG /home/nvidia/client/src/python/cistus1.jpg --url 10.244.0.7:8000 --protocol HTTP


