Tutorial: Deploying TensorFlow Models at the Edge with NVIDIA Jetson Nano and K3s

In this tutorial, we will explore the idea of running TensorFlow models as microservices at the edge. Jetson Nano, a powerful edge computing device, will run the K3s distribution from Rancher Labs. It can act as a single-node K3s cluster or join an existing K3s cluster as an agent.
For background, refer to my previous article on Jetson Nano and configuring it as an AI testbed.
To keep this tutorial self-contained, we will run a single-node K3s cluster on the Jetson Nano. If you want to turn it into an agent, follow the steps covered in one of the previous articles from the K3s series.
Step 1: Configure Docker Runtime
The Jetson platform from NVIDIA runs L4T (Linux for Tegra), an operating system based on Ubuntu 18.04. The OS, along with the CUDA-X drivers and SDKs, is packaged into JetPack, a comprehensive software stack for the Jetson family of products such as Jetson Nano and Jetson Xavier.
Starting with JetPack 4.2, NVIDIA has introduced a container runtime with Docker integration. This custom runtime enables Docker containers to access the underlying GPUs available in the Jetson family.
Start by downloading the most recent version of JetPack and flashing your Jetson Nano device with it.
Check the version of the Docker runtime with the following command:
nvidia-docker version
Since Docker supports custom runtimes, we can use the standard Docker CLI with the --runtime nvidia switch to use NVIDIA’s container runtime.
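For example, the runtime can be passed explicitly on a test run. The l4t-base image tag below is only an illustration; pick a tag that matches your JetPack/L4T release:

sudo docker run -it --rm --runtime nvidia nvcr.io/nvidia/l4t-base:r32.4.3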
Instead of passing the switch with every invocation, we can make it the default runtime by modifying the /etc/docker/daemon.json file to add the line "default-runtime": "nvidia".
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
Make sure you restart the Docker service or reboot your system before proceeding.
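As a quick sanity check, assuming systemd manages Docker (the default on L4T), restart the service and confirm that nvidia is now reported as the default runtime:

sudo systemctl restart docker
sudo docker info | grep -i runtime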
Step 2: Install K3s on Jetson Nano
The default container runtime in K3s is containerd, an industry-standard container runtime. This means that Docker CE and K3s will not share the same configuration and images.
For the AI workloads running in K3s, we need access to the GPU, which is available only through the nvidia-docker runtime. In the previous step, we already configured Docker to use the custom runtime by default.
Fortunately, K3s has an option to use the existing Docker runtime instead of containerd. This is possible by adding the --docker switch to the installation script.
Let’s go ahead and install K3s on the NVIDIA Jetson Nano, pointing it to the Docker runtime. We will also add a couple of other switches that make it easy to use the kubectl CLI with K3s.
mkdir -p $HOME/.kube/
curl -sfL https://get.k3s.io | sh -s - --docker --write-kubeconfig-mode 644 --write-kubeconfig $HOME/.kube/config
Within a few minutes, K3s is up and running on our Jetson Nano. Verify the installation by listing the nodes:
kubectl get nodes
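Optionally, you can also confirm that the core K3s components are up before moving on:

kubectl get pods --all-namespaces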
Step 3: Run TensorFlow as a Kubernetes Pod on Jetson Nano
With the Kubernetes infrastructure available, we will try to run TensorFlow 2.x as a pod in our single node cluster powered by K3s.
NVIDIA has published a set of container images that are optimized for JetPack to run at the edge. They are available in the NVIDIA GPU Cloud (NGC) container registry.
Let’s pull the TensorFlow 2.2 container image for L4T from NGC.
sudo docker pull nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf2.2-py3
Let’s see if TensorFlow can access the GPU available on Jetson Nano.
sudo docker run -it --rm --runtime nvidia \
  --network host \
  nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf2.2-py3 \
  python3
Within the Python shell, run the following code snippets to check the version and GPU access:
import tensorflow as tf
print(tf.__version__)
tf.config.list_physical_devices('GPU')
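If the runtime is wired up correctly, the last command should return a list containing the GPU, similar to the following (exact details may vary):

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]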
As we can see, GPU device 0 is visible to TensorFlow.
Now, it’s time to see if we can run this as a Kubernetes pod and still access the GPU.
Create a simple pod specification that keeps the TensorFlow 2.2 container running, and save it as tf2.yaml.
apiVersion: v1
kind: Pod
metadata:
  name: tensorflow
spec:
  containers:
  - name: tf
    image: nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf2.2-py3
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
kubectl apply -f tf2.yaml
kubectl get pods
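If the pod shows up as ContainerCreating at first, you can wait for it to become ready (the 300-second timeout here is arbitrary):

kubectl wait --for=condition=Ready pod/tensorflow --timeout=300s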
With the TensorFlow pod running, let’s access its shell and try the same commands.
kubectl exec -it tensorflow -- python3
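Inside the Python shell of the pod, the same snippets used earlier should work:

import tensorflow as tf
print(tf.__version__)
tf.config.list_physical_devices('GPU')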
You should see that the GPU is available to TensorFlow.
Accessing the GPU from a K3s cluster through custom Docker runtime is a powerful mechanism to run AI at the edge in a cloud native environment. With TensorFlow running at the edge within Kubernetes, you can deploy deep learning models as microservices.
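As a rough illustration of the idea (not the approach of the upcoming tutorial), a model could be wrapped in a minimal HTTP service using only TensorFlow and the Python standard library. The model path, request schema, and port below are hypothetical placeholders:

# Minimal sketch of a TensorFlow inference microservice (hypothetical model path and schema).
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import numpy as np
import tensorflow as tf

# Hypothetical SavedModel location; mount or bake your own model into the image.
model = tf.keras.models.load_model("/models/my_model")

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expects a JSON body such as {"instances": [[...], ...]}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        inputs = np.array(payload["instances"], dtype=np.float32)
        predictions = model.predict(inputs).tolist()
        body = json.dumps({"predictions": predictions}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), InferenceHandler).serve_forever()

Such a script could be packaged on top of the l4t-tensorflow image and exposed through a Kubernetes Service.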
This enables many interesting use cases to bring the best of AI and IoT to Kubernetes infrastructure. In one of the upcoming tutorials, I will cover an end-to-end AI inference use case based on this platform. Stay tuned.