Tutorial: Edge AI with Triton Inference Server, Kubernetes, Jetson Mate

In this tutorial, we will configure and deploy Nvidia Triton Inference Server on Jetson Mate to perform inference of computer vision models.
Apr 8th, 2022 8:46am by

In this tutorial, we will configure and deploy Nvidia Triton Inference Server on the Jetson Mate carrier board to run inference on computer vision models. It builds on our previous post, which introduced the Jetson Mate from Seeed Studio as a platform for running a Kubernetes cluster at the edge.

Though this tutorial focuses on Jetson Mate, you can use one or more Jetson Nano Developer Kits connected to a network switch to run the Kubernetes cluster.

Step 1: Install K3s on Jetson Nano System-on-Modules (SoMs)

Assuming you have installed and configured JetPack 4.6.x on all four Jetson Nano 4GB modules, let’s start with the installation of K3s.

The first step is to make the Nvidia container runtime the default runtime for Docker. To do this, add the line "default-runtime": "nvidia" to the file /etc/docker/daemon.json on each node, then restart Docker. This is important because we want K3s to access the GPU available on each Jetson Nano.
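If you prefer to script the change, here is a minimal sketch. The helper name is hypothetical and not part of the original post, and the sample configuration is what JetPack typically ships in daemon.json, so check your own file before overwriting it.

```python
import json


def with_nvidia_default_runtime(config: dict) -> dict:
    """Return a copy of a Docker daemon config with Nvidia as the default runtime."""
    updated = dict(config)  # shallow copy; existing keys such as "runtimes" are kept
    updated["default-runtime"] = "nvidia"
    return updated


# Assumed starting point: the runtime entry JetPack usually provides.
jetpack_default = {
    "runtimes": {
        "nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}
    }
}

# Print the JSON that would go into /etc/docker/daemon.json.
print(json.dumps(with_nvidia_default_runtime(jetpack_default), indent=4))
```

Write the printed JSON to /etc/docker/daemon.json with root privileges and restart the Docker service for the change to take effect.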

Config file for NVidia container runtime

SSH into the module inserted into the master slot, and run the following command to install the K3s control plane.

By setting the --docker switch, we force K3s to use Docker, and with it the Nvidia container runtime, instead of the default containerd runtime. I am also disabling the ServiceLB load balancer and the Traefik ingress controller on the cluster to avoid network-related issues.
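The exact command from the original post is not reproduced here. As a hedged sketch, the helper below assembles an install command with the options described above, using the standard K3s install script and its INSTALL_K3S_EXEC and K3S_TOKEN environment variables. The function name and the token value are assumptions for illustration.

```python
import shlex


def k3s_server_install_command(token: str) -> str:
    """Build a shell command that installs a K3s server backed by Docker."""
    # Flags taken from the text: use Docker, disable ServiceLB and Traefik.
    exec_flags = "--docker --disable servicelb --disable traefik"
    return (
        "curl -sfL https://get.k3s.io | "
        f"K3S_TOKEN={shlex.quote(token)} "
        f"INSTALL_K3S_EXEC={shlex.quote(exec_flags)} "
        "sh -"
    )


# "my-cluster-token" is a placeholder; choose your own shared secret.
print(k3s_server_install_command("my-cluster-token"))
```

Copy the printed command into the SSH session on the master module. Setting the token explicitly makes it easy to reuse when joining the other nodes.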

Once the control plane is installed, verify that the K3s service is up and running.

Checking system status

Run the following command on the remaining three nodes to join them to the cluster:
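A sketch of the join command, under the same assumptions as the server install: the helper name, the server IP address, and the token are placeholders, while K3S_URL and K3S_TOKEN are the variables the K3s install script reads to set up an agent.

```python
import shlex


def k3s_agent_install_command(server_ip: str, token: str) -> str:
    """Build a shell command that installs a K3s agent and joins it to a server."""
    server_url = f"https://{server_ip}:6443"  # 6443 is the default K3s API port
    return (
        "curl -sfL https://get.k3s.io | "
        f"K3S_URL={shlex.quote(server_url)} "
        f"K3S_TOKEN={shlex.quote(token)} "
        "INSTALL_K3S_EXEC=--docker "
        "sh -"
    )


# Placeholder values; use the master module's address and your own token.
print(k3s_agent_install_command("192.168.1.10", "my-cluster-token"))
```

The --docker flag is repeated here so that the agents also run containers through Docker and the Nvidia runtime.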

By the end of this step, we should have a four-node Kubernetes cluster based on K3s.

A four-node cluster

Step 2: Configure and Mount NFS Share in the Cluster

We are using NFS as the shared backend to store the models accessed by Triton Inference Server. The NFS client provisioner Helm chart helps us expose an existing NFS share to Kubernetes.

Replace the NFS server IP address with the address of your own server.

This Helm chart creates a storage class that supports the dynamic provisioning of shared PVCs.
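The Helm commands themselves are not reproduced above. The sketch below assumes the nfs-subdir-external-provisioner chart, the maintained successor to the older NFS client provisioner chart, which may differ from what the original post used. The helper name and the server address are placeholders.

```python
import shlex
from typing import List

CHART = "nfs-subdir-external-provisioner"
REPO_URL = "https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/"


def nfs_provisioner_helm_commands(nfs_server: str, nfs_path: str) -> List[List[str]]:
    """Return the helm invocations that register the repo and install the chart."""
    return [
        ["helm", "repo", "add", CHART, REPO_URL],
        [
            "helm", "install", CHART, f"{CHART}/{CHART}",
            "--set", f"nfs.server={nfs_server}",
            "--set", f"nfs.path={nfs_path}",
        ],
    ]


# Placeholder server address; /var/nfs/general is the export path used in this tutorial.
for command in nfs_provisioner_helm_commands("192.168.1.100", "/var/nfs/general"):
    print(shlex.join(command))
```

With this chart's default values, the storage class it creates is named nfs-client.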

Helm chart.

Step 3: Create a Shared PVC and Add a TensorFlow Model

With the NFS share ready, let’s configure the shared PVC to which we can add models accessed by Triton Inference Server.

Notice the storageClassName and the accessModes values in the specification. They help us create the shared PVC accessible to multiple Triton pods.
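As an illustration of those two fields, here is a sketch that generates a PVC manifest as JSON, which kubectl accepts alongside YAML. The claim name models-pvc comes from this tutorial. The storage class name nfs-client and the 1Gi size are assumptions, so adjust them to match your cluster.

```python
import json


def shared_pvc_manifest(
    name: str = "models-pvc",
    storage_class: str = "nfs-client",  # assumed name of the NFS storage class
    size: str = "1Gi",                  # assumed size; pick what your models need
) -> dict:
    """Build a PersistentVolumeClaim that many pods can mount at once."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            # ReadWriteMany lets pods on different nodes share the volume.
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": size}},
        },
    }


print(json.dumps(shared_pvc_manifest(), indent=2))
```

Save the script under a name of your choice and pipe its output into kubectl apply -f - to create the claim.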

Create a shared PVC

When you access the directory exposed as an NFS share, you will see a directory that matches the name of the PVC.

Shared as an NFS folder

Let’s populate this directory with the Inception model, label file, and the configuration file expected by Triton Inference Server.

On the NFS server, run the following commands to download the model and move it to the NFS share.

On my file server, the directory /var/nfs/general/ is exported as an NFS share. Replace that with your own path.

Next, let’s download the labels and the configuration file from the Triton GitHub repository.

Triton download

This directory structure enables Triton to access and load models from the shared backend.
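To make the layout concrete, the sketch below checks a model repository for the three files Triton expects for this model. It assumes the model is named inception_graphdef with a single version 1, as in the Triton example repository. The helper name is hypothetical.

```python
from pathlib import Path
from typing import List


def missing_model_files(
    repo_root: Path,
    model: str = "inception_graphdef",  # model name from the Triton examples
    version: str = "1",
) -> List[Path]:
    """Return the expected model repository files that are not present."""
    expected = [
        repo_root / model / "config.pbtxt",           # model configuration
        repo_root / model / "inception_labels.txt",   # class labels
        repo_root / model / version / "model.graphdef",  # frozen TensorFlow graph
    ]
    return [path for path in expected if not path.is_file()]
```

Call missing_model_files with the path of the PVC directory inside your NFS export. An empty list means the layout is complete.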

Step 4: Building the Triton Inference Server Docker Image

As of April 2022, Nvidia doesn’t have an official container image for Triton targeting Jetson devices. To deploy it on Kubernetes, we will have to build our own Docker image.

This Dockerfile builds the image from a base image optimized for JetPack 4.6.

Build, tag, and push this image to your favorite repository.

Step 5: Deploying Triton Inference Server on K3s

Triton Inference Server takes advantage of the GPU available on each Jetson Nano module. However, only one instance of Triton can use the GPU at a time. To ensure that exactly one Triton pod runs on each node, we will configure it as a daemonset.

Notice that the shared PVC, models-pvc, is mounted at /opt/triton/models for Triton to access the artifacts.
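The sketch below shows roughly what such a daemonset could look like, generated as JSON for kubectl. The resource name, labels, and image reference are assumptions. The ports are Triton's defaults for HTTP, gRPC, and metrics, and the container's start command is left to the image you built in Step 4.

```python
import json


def triton_daemonset_manifest(image: str, pvc_name: str = "models-pvc") -> dict:
    """Build a DaemonSet that runs one Triton pod per node with the shared models."""
    labels = {"app": "triton"}  # assumed label; any consistent label works
    container = {
        "name": "triton",
        "image": image,
        "ports": [
            {"name": "http", "containerPort": 8000},
            {"name": "grpc", "containerPort": 8001},
            {"name": "metrics", "containerPort": 8002},
        ],
        "volumeMounts": [{"name": "models", "mountPath": "/opt/triton/models"}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "triton", "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    "volumes": [
                        {"name": "models",
                         "persistentVolumeClaim": {"claimName": pvc_name}}
                    ],
                },
            },
        },
    }


# Placeholder image reference; use the repository you pushed to in Step 4.
print(json.dumps(triton_daemonset_manifest("registry.example.com/triton-jetson:latest"), indent=2))
```

A daemonset schedules one pod on every eligible node, which is what keeps each GPU tied to a single Triton instance.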

Deploy the daemonset, then check that the pods are running and the services are exposed.

kubectl get ds

kubectl get pods

kubectl get svc

The endpoints exposed by Triton pods are used by the clients for inference.
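Before sending images, it can help to confirm that an endpoint is reachable. The sketch below queries Triton's HTTP readiness endpoint, /v2/health/ready, using only the standard library. The function names are hypothetical, and the host and port you pass in depend on how your service is exposed.

```python
import urllib.error
import urllib.request


def triton_ready_url(host: str, port: int) -> str:
    """Return the URL of Triton's HTTP readiness endpoint."""
    return f"http://{host}:{port}/v2/health/ready"


def is_triton_ready(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if the server answers the readiness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(triton_ready_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False
```

Call is_triton_ready with a node's IP address and the NodePort reported by kubectl get svc.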

Step 6: Performing Inference from Python

Download the Python image client example code from the Triton GitHub repository. To install the required modules, create a requirements.txt file with the following content and use pip3 to install them.

Download a picture of a car, and run the image client by sending the picture as a parameter.
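As a sketch of what that invocation could look like, the helper below assembles the arguments for image_client.py using flags the script defines: -u for the server URL, -m for the model name, -s for input scaling, -x for the model version, and -c for the number of classes to report. The address, port, and file name are placeholders, and the original post may have used different values.

```python
import shlex
from typing import List


def image_client_args(url: str, image_path: str, model: str = "inception_graphdef") -> List[str]:
    """Build the command line for Triton's example image classification client."""
    return [
        "python3", "image_client.py",
        "-u", url,          # host:port of a Triton HTTP endpoint
        "-m", model,        # model name as it appears in the model repository
        "-s", "INCEPTION",  # scaling mode expected by the Inception model
        "-x", "1",          # model version
        "-c", "1",          # report only the top class
        image_path,
    ]


# Placeholder address, NodePort, and image file.
print(shlex.join(image_client_args("192.168.1.10:30800", "car.jpg")))
```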

Picture of a car

Replace the IP address and port with the IP address of one of your nodes and the NodePort of the Triton service. You should see the below output:

replace ip address

This concludes the end-to-end tutorial on installing and configuring Triton Inference Server on a Jetson Nano cluster running K3s.

TNS owner Insight Partners is an investor in: Docker.