Getting Started with GPUs in Google Kubernetes Engine

In the last part of this series, I introduced Nvidia-Docker to access GPUs from containers. In this tutorial, I will walk you through the steps involved in accessing GPUs from Kubernetes.
Google Kubernetes Engine (GKE) is one of the first hosted Kubernetes platforms to offer GPUs to customers. Based on Nvidia Tesla K80 and P100 GPUs, GKE makes it possible to run containerized machine learning jobs, image processing, and financial modeling at scale in the cloud. The feature is currently available in Beta in select regions of Google Cloud Platform.
As an advocate of Kubernetes, and a budding machine learning developer, I am very excited to see the availability of GPUs. This capability will bring highly scalable training and inferencing to machine learning jobs deployed on Kubernetes.
Assuming you have a valid GCP account, and the Google Cloud SDK configured on your development machine, you can launch a GPU-backed Kubernetes cluster.
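If the Cloud SDK is not yet authenticated and pointed at the right project, something along these lines takes care of it (the project ID placeholder is yours to fill in):

$ gcloud auth login
$ gcloud config set project <your-project-id>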
Let’s start by verifying the available accelerators and supported regions in the GCP cloud.
$ gcloud beta compute accelerator-types list
The output confirms the availability of Nvidia Tesla K80 and P100 GPU accelerators in a few regions.
We will now launch a GKE cluster in the asia-east1-a zone with two nodes. This is a normal cluster with no GPU nodes. After the cluster is provisioned, we will add a couple of nodes with GPUs.
$ gcloud container clusters create k8s-gpu \
    --num-nodes=2 \
    --zone asia-east1-a \
    --cluster-version 1.9.4-gke.1

$ kubectl get nodes
With the cluster in place, we will now create a node pool with GPU-specific nodes. A node pool is a subset of node instances within a cluster that all have the same configuration.
When we create a container cluster, the number and type of nodes that are specified become the default node pool. Then, we can add additional custom node pools of different sizes and types to the cluster. All nodes in any given node pool are identical to one another.
The following command creates a new node pool and adds it to the existing cluster. The advantage of this approach is that each node pool can be scaled separately. Though we are only adding a single node initially, we can easily expand or shrink the pool based on the workload.
$ gcloud beta container node-pools create gpu-pool \
    --num-nodes=1 \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --zone asia-east1-a \
    --cluster k8s-gpu
The command above is loaded with switches. Notice the --accelerator switch, which specifies the type of GPU to use along with the number of GPUs per node. It is possible to add more than one GPU to a node in the pool, as sketched below.
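For instance, a hypothetical second pool with two K80s per node would be created like this; the pool name and count here are purely illustrative:

$ gcloud beta container node-pools create gpu-pool-2x \
    --num-nodes=1 \
    --accelerator type=nvidia-tesla-k80,count=2 \
    --zone asia-east1-a \
    --cluster k8s-gpu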
Now the cluster has an additional GPU-backed node.
$ kubectl get nodes |
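On the gcloud side, listing the node pools should now show both the default pool and gpu-pool:

$ gcloud container node-pools list \
    --cluster k8s-gpu \
    --zone asia-east1-a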
When a GPU node is added to the cluster, GKE runs the Nvidia GPU device plugin as a pod on that specific node.
Checking the pods in the kube-system namespace will confirm this.
$ kubectl get pods -n=kube-system
We also need to install the device driver as a DaemonSet that targets each GPU node in the cluster. Google provides a YAML file with the DaemonSet definition.
The installation takes several minutes to complete. Once installed, the Nvidia GPU device plugin exposes Nvidia GPU capacity via Kubernetes APIs.
$ kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/k8s-1.9/nvidia-driver-installer/cos/daemonset-preloaded.yaml
After a few minutes, the driver shows up in the kube-system namespace.
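As a quick sanity check, assuming the driver installation has finished, the GPU capacity should now show up in the node's allocatable resources:

$ kubectl describe nodes | grep nvidia.com/gpu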
We are all set to run a GPU workload on the cluster. Let’s start by deploying an Ubuntu 16.04 image to check out the Nvidia configuration.
$ kubectl run cuda \
    --image=ubuntu:16.04 \
    --env="LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/nvidia/bin" \
    --limits="nvidia.com/gpu=1" \
    --rm -it -- /bin/bash
The above command creates a deployment called cuda with the GPU limit set to 1. Depending on the number of GPUs attached to the nodes in the node pool, we can allocate more GPU resources to the pod. The command also sets an environment variable so that the Nvidia binaries and libraries are on the path inside the pod.
If everything goes well, we should be inside the shell of the Ubuntu container.
Navigate to the /usr/local/nvidia/bin directory to run the customary nvidia-smi command.
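Inside the container shell, that amounts to the following; if the scheduling and driver setup worked, nvidia-smi should list the Tesla K80 attached to the node:

$ cd /usr/local/nvidia/bin
$ ./nvidia-smi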
Congratulations! You are all set to run massively parallelizable workloads on Kubernetes.
If you are running a pod instead of a deployment, use a declaration like the one below, with a nodeSelector to create the affinity. This ensures that the pod is always scheduled on a node with a GPU. (The pod name and image in the snippet are placeholders.)
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod               # name added so the manifest is complete
spec:
  containers:
  - name: my-gpu-container
    image: ubuntu:16.04          # placeholder image; substitute your GPU workload
    resources:
      limits:
        nvidia.com/gpu: 2
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-p100 # or nvidia-tesla-k80
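Assuming the manifest above is saved as gpu-pod.yaml (any filename works), it can be submitted with the same kubectl create flow used earlier:

$ kubectl create -f gpu-pod.yaml
$ kubectl get pods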
In the next part of this tutorial, we will create a machine learning training job to build a Caffe model on the GKE cluster. That’s an exciting use case to exploit the combined power of Kubernetes and GPUs. Stay tuned!
Feature image: A 3D visualization of a heart, generated by GPUs from 2D MRI images, as demonstrated at the Nvidia GPU Technology Conference.