Containers have become the unit of deployment not just for data center and cloud workloads but also for edge applications. Along with containers, Kubernetes has become the foundation of the infrastructure. Distributions such as K3s are fueling the adoption of Kubernetes at the edge.
I have seen many challenges when working with large retailers and system integrators rolling out Kubernetes-based edge infrastructure. One of them is the ability to mix and match ARM64 and AMD64 devices to run AI workloads. Customers often run a heterogeneous (AMD64/ARM64) multinode Kubernetes cluster for scalability and high availability. The AMD64 devices are typically based on Intel NUC, while the ARM64 devices are powered by the Jetson Nano and Jetson Xavier NX System-on-Modules (SOMs) from NVIDIA.
SeeedStudio, a leader in IoT kits for makers and developers, has launched a carrier board that can house four Jetson Nano or Jetson Xavier NX SOMs. Branded as Jetson Mate, the kit is ideal for prototyping and deploying AI workloads at the edge. It can run as a standalone Kubernetes cluster or join an existing cluster to provide AI acceleration for deep learning models.
Late last year, SeeedStudio sent me Jetson Mate Cluster Mini, which comes with the carrier board, cooling fan, and a case. This device has everything you need to build an end-to-end AI inference testbed powered by Jetson SOMs and Kubernetes.
Jetson modules are available through developer kits that combine a SOM with a carrier board. The Jetson Nano Developer Kit and Jetson Xavier NX Developer Kit are examples of integrated devices used for prototyping AI solutions for the edge. For production use, however, NVIDIA recommends Jetson modules integrated with production-grade, ruggedized hardware appliances available through its partners.
Jetson Mate supports both the modules that ship with the developer kits and the production-grade compute modules. Makers and developers can plug a SOM directly into the Jetson Mate carrier board and bootstrap it through the NVIDIA JetPack SDK running on a different machine.
When using the developer kit modules, you can boot them directly from the preconfigured SD card image available from NVIDIA. This is the most convenient option: configure each module separately, then simply plug it into the Jetson Mate carrier board.
Thanks to the onboard gigabit Ethernet switch that lets the four SoMs communicate with each other, a single network cable is all you need to connect Jetson Mate to the Internet. One of the modules acts as the master node of a Kubernetes cluster, while the other modules serve as worker nodes.
All three peripheral SoMs can be turned on or off individually. With a 65W 2-port PD charger for Jetson Nano SoMs, or a 90W 2-port PD charger for Jetson Xavier NX SoMs, and a CAT6 Ethernet cable, you can easily build your own Jetson cluster running Kubernetes.
The RGB cooling fan is quiet and efficient, which helps keep the cluster's temperature in check. I bought a USB-C PD adapter typically used with Dell laptops to provide consistent and reliable power to the entire cluster.
Building a Cloud Native AI Inference Engine with Jetson Mate
I am excited to build a powerful AI inference engine based on four NVIDIA Jetson Nano modules, SeeedStudio's Jetson Mate Mini, NVIDIA JetPack 4.6, and the NVIDIA Triton Inference Server. The Kubernetes infrastructure is based on K3s, a lightweight Kubernetes distribution.
When installing K3s on Jetson Nano, you have to point the default container runtime to the NVIDIA Container Toolkit. The first SoM is configured as the K3s server and the remaining three as K3s agents. We force K3s to use the NVIDIA Container Toolkit by installing it with the --docker switch, which makes K3s use the Docker runtime already configured for the GPU on JetPack. It's also a good idea to disable the servicelb load balancer and the Traefik ingress controller that come by default with the K3s installation. At the end of this step, we have a fully configured 4-node Kubernetes cluster.
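The steps above can be sketched as follows. The server IP address and the join token are placeholders for your own values; the flags mirror the configuration described above:

```shell
# On the first SoM (K3s server / master node):
# --docker uses the Docker runtime, which JetPack configures for the GPU;
# servicelb and traefik are disabled as discussed above.
curl -sfL https://get.k3s.io | sh -s - server \
  --docker \
  --disable servicelb \
  --disable traefik

# Print the token the agents need to join the cluster.
sudo cat /var/lib/rancher/k3s/server/node-token

# On each of the three remaining SoMs (K3s agents):
# replace the IP and token with the values from your server.
curl -sfL https://get.k3s.io | \
  K3S_URL=https://192.168.1.10:6443 \
  K3S_TOKEN=<token-from-server> \
  sh -s - agent --docker

# Back on the server, verify that all four nodes are Ready.
sudo kubectl get nodes
```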
Once K3s is installed, I configured an NFS share on the master and mounted it on each node. Triton uses this shared file system to load the models used for inference. In a typical deployment scenario, NFS is replaced by an object storage service such as Amazon S3 or MinIO. To keep the setup simple, I used NFS as the shared model store for Triton.
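The NFS setup can be sketched as below. The export path (/srv/models) and the master's IP address are assumptions; substitute your own:

```shell
# On the master node: export a directory as the shared Triton model store.
sudo apt-get install -y nfs-kernel-server
sudo mkdir -p /srv/models
echo "/srv/models *(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
sudo exportfs -ra

# On each worker node: mount the share at the same path.
sudo apt-get install -y nfs-common
sudo mkdir -p /srv/models
sudo mount 192.168.1.10:/srv/models /srv/models
```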
Triton Inference Server is a scalable model server from NVIDIA. It supports multiple deep learning frameworks and runtimes, including TensorRT, TensorFlow, PyTorch, and ONNX. Starting with JetPack 4.6.1, NVIDIA supports Triton on Jetson. But there is no readily available container image or Helm chart to deploy it in our K3s cluster running on Jetson Mate, so I decided to build a container image for Triton myself to run the inference engine at the edge.
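A minimal sketch of such an image follows. The Triton archive name, base image tag, and registry are assumptions; NVIDIA publishes the Jetson build of Triton as a tarball on the triton-inference-server GitHub releases page, so adjust the names and paths to match your JetPack release and the archive layout:

```shell
# Write a hypothetical Dockerfile that layers the Jetson build of Triton
# on top of the L4T base image.
cat > Dockerfile <<'EOF'
FROM nvcr.io/nvidia/l4t-base:r32.6.1

# The tarball must be downloaded from the Triton releases page first;
# the file name here is an assumption tied to JetPack 4.6.1.
COPY tritonserver2.19.0-jetpack4.6.1.tgz /tmp/
RUN mkdir -p /opt/tritonserver && \
    tar -xzf /tmp/tritonserver2.19.0-jetpack4.6.1.tgz -C /opt/tritonserver && \
    rm /tmp/tritonserver2.19.0-jetpack4.6.1.tgz

ENTRYPOINT ["/opt/tritonserver/bin/tritonserver"]
EOF

# Build on a Jetson device (so the image is ARM64) and push it to a
# registry reachable from the cluster; the registry name is a placeholder.
docker build -t registry.local/triton-jetson:latest .
docker push registry.local/triton-jetson:latest
```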
To ensure that Triton runs on each node and takes advantage of the underlying GPU, I deployed it as a DaemonSet, which schedules exactly one Triton pod per node and makes the service highly available. We expose Triton's REST and gRPC endpoints as Kubernetes services for the clients to invoke the AI models. The models are loaded from the NFS share configured earlier.
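The DaemonSet and Service might look like the sketch below. The image name, the hostPath to the NFS mount, and the model-repository path are assumptions carried over from the earlier steps; the ports are Triton's defaults (8000 REST, 8001 gRPC):

```shell
# Apply a hypothetical DaemonSet and Service for Triton.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: triton
spec:
  selector:
    matchLabels:
      app: triton
  template:
    metadata:
      labels:
        app: triton
    spec:
      containers:
      - name: triton
        image: registry.local/triton-jetson:latest
        args: ["--model-repository=/models"]
        ports:
        - containerPort: 8000   # REST
        - containerPort: 8001   # gRPC
        volumeMounts:
        - name: models
          mountPath: /models
      volumes:
      - name: models
        hostPath:
          path: /srv/models    # the NFS share mounted on every node
---
apiVersion: v1
kind: Service
metadata:
  name: triton
spec:
  selector:
    app: triton
  ports:
  - name: http
    port: 8000
  - name: grpc
    port: 8001
EOF
```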
Finally, I copied an Inception model graph to the NFS share and configured it for Triton. When a client requests this model by invoking the REST endpoint, the request is routed to one of the pods of the DaemonSet.
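Triton discovers models from a versioned directory layout in the model repository. Staging the model and checking it over REST can be sketched as follows; the model name, file names, and node IP are placeholders:

```shell
# Stage the model in the layout Triton expects:
#   <model-repository>/<model-name>/<version>/<model-file>
mkdir -p /srv/models/inception_graphdef/1
cp model.graphdef /srv/models/inception_graphdef/1/
cp config.pbtxt /srv/models/inception_graphdef/

# Verify the server is up and the model is loaded (KServe v2 REST API).
curl -s http://<node-ip>:8000/v2/health/ready
curl -s http://<node-ip>:8000/v2/models/inception_graphdef/ready
```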
In the next part of this article, I will walk through all the steps needed to deploy Triton Inference Server on Jetson Mate. Stay tuned!