Accelerating Development with Container Runtimes, Kubernetes and GPUs

Nvidia, a dominant player in the GPU market, has made a series of recent investments in cloud native technologies, with initiatives on the client side, in its GPU Cloud, and in integrations with Kubernetes. For one, Nvidia is integrating its GPU Operator with the containerd runtime, starting with version 1.4.0 of the operator.
The implications are real for anyone who has followed the evolution of container architectures. GPUs paired with containers form a path to machine learning scalability, one that will weave AI into our everyday lives so transparently that it simply becomes part of how we work and live. It would not have been possible to scale apps on our devices without the massive processing power of the giant computers we know as data centers and cloud services, which together have created a software layer that abstracts away the complexities of at-scale application development.
The Gap
But there is a gap. And the gap is one of scale.
Setting up the right development environment on top of a GPU is a pain. First, there is the Nvidia graphics driver, which communicates directly with the hardware; then the CUDA runtime and various libraries such as cuDNN, a GPU-accelerated library of primitives for deep neural networks.
Finally, a framework like TensorFlow or PyTorch is installed to talk to the GPU through that stack. The versions of the driver, runtime, libraries, and framework must be compatible with one another; any mismatch can break the workflow or hurt training performance.
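A few lines of Python, as a minimal sketch assuming PyTorch is installed, can surface the versions that have to line up:

```python
# Minimal sketch: inspect the GPU stack that PyTorch sees.
# Output depends entirely on the local environment.
import torch

print("PyTorch:", torch.__version__)               # framework version
print("CUDA runtime:", torch.version.cuda)          # CUDA version the build targets
print("cuDNN:", torch.backends.cudnn.version())     # cuDNN bundled with the build
print("GPU available:", torch.cuda.is_available())  # requires a working driver
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```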
Through containers, Nvidia delivers the most compatible and optimized stack in a self-contained image. This removes the delicate process of handpicking the right components for the development environment. It also delivers portability: the same image can be used with an Amazon EC2 GPU instance or an on-premises GPU. The container packaging lets developers containerize their apps and toolkits and use the models that Nvidia researchers and data scientists build and publish, and it makes the entire workflow deployable via Helm on Kubernetes environments.
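Pulling a framework image from NGC and running it with GPU access looks the same on a cloud instance or an on-prem box. The rough sketch below scripts that from Python; the image tag is a placeholder for illustration, and the current tags live in the NGC catalog.

```python
# Illustrative only: pull and run an NGC framework image with GPU access.
# The tag below is a placeholder; consult the NGC catalog for real releases.
import subprocess

image = "nvcr.io/nvidia/pytorch:23.10-py3"  # hypothetical tag
subprocess.run(["docker", "pull", image], check=True)
subprocess.run(
    ["docker", "run", "--rm", "--gpus", "all", image,
     "python", "-c", "import torch; print(torch.cuda.is_available())"],
    check=True,
)
```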
Containers are the foundation for immutable infrastructure. They are portable, easily swapped out, and optimized. A machine learning engineer may build a TensorFlow model and package it with all of its libraries in a container. GPUs accelerate training but cannot themselves be virtualized; their value lies in the speed and massive parallelism they bring to neural networks. With containers, optimized models can be distributed and used in scaled-out architectures, ushering in a real opportunity to make machine learning operations a reality.
For the past several years, the Kubernetes architecture has been valued for its use as a container orchestrator in the software development lifecycle. With GPUs, the market has found ways to parallel process massive data sets. Now as algorithm development becomes more accessible, container technologies are viewed as a way to make algorithms that use neural nets more scalable in enterprise architectures, be it in the cloud or private, distributed environments. Nvidia’s investment demonstrates how GPUs, which can’t be virtualized, can be used to accelerate the use of models such as conversational AI with applications for different industries.
Nvidia has turned its attention in recent years to containerd and CRI-O, now the open source runtimes of choice for running container-based applications on Kubernetes. CRI-O was developed by Red Hat around 2016, when Docker was at the height of its popularity. It is a runtime built around the Kubernetes container runtime interface (CRI) and compliant with the Open Container Initiative (OCI) specifications. Containerd, which has become an industry standard, supervises the lifecycle of containers and communicates with the outside world via API calls.
Nvidia’s Container Toolkit allows users to build and run GPU-accelerated Docker containers. The toolkit includes a container runtime library and utilities to automatically configure containers to leverage Nvidia GPUs.
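A quick way to confirm that wiring is to run nvidia-smi inside a stock CUDA base image, as in this minimal sketch (the image tag is illustrative):

```python
# Sketch: verify the Nvidia Container Toolkit is configured by running
# nvidia-smi inside a CUDA base image with GPU access requested.
import subprocess

subprocess.run(
    ["docker", "run", "--rm", "--gpus", "all",
     "nvidia/cuda:12.2.0-base-ubuntu22.04",  # illustrative tag
     "nvidia-smi"],
    check=True,
)
```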
The Container Toolkit also extends to the Jetson family of edge devices, which run the same CUDA-X stack. Developers can pull images from the Nvidia GPU Cloud that are optimized for JetPack, the software stack that powers the Jetson family of products for AI workloads.
The Nvidia GPU Cloud (NGC) is a catalog of frameworks for building machine learning models, such as TensorFlow, PyTorch, and MXNet. NGC also hosts a collection of Helm charts for Kubernetes, plus a catalog of pre-trained models for tasks such as people detection and facial recognition.
Nvidia made its initial technical investment in Docker, developing Nvidia Docker to run GPU-accelerated Docker containers. It offered a library and a command-line utility to configure GNU/Linux containers to leverage Nvidia hardware, according to documentation on GitHub.
Nvidia has since dropped the Docker name and rebranded the project as the Nvidia Container Toolkit. Nvidia uses Helm charts as an application package manager for Kubernetes, and the toolkit can be used to prepare containers for Helm. Helm charts let users define applications, install them, and keep them upgraded. According to the Nvidia documentation, users may create Helm charts and push them to NGC organization and team registry spaces for sharing with others. With its push into Kubernetes, Nvidia has developed a GPU Operator that installs via Helm.
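As a rough sketch of that flow, the Helm steps Nvidia documents for the GPU Operator can be scripted like this; the repository URL, namespace, and flags follow the published instructions at the time of writing and should be checked against the current docs.

```python
# Sketch: install the GPU Operator with Helm, driven from Python for
# consistency with the other examples. Requires helm and cluster access.
import subprocess

def sh(*args):
    """Run a command and fail loudly if it returns non-zero."""
    subprocess.run(args, check=True)

sh("helm", "repo", "add", "nvidia", "https://helm.ngc.nvidia.com/nvidia")
sh("helm", "repo", "update")
sh("helm", "install", "--wait", "--generate-name",
   "-n", "gpu-operator", "--create-namespace",
   "nvidia/gpu-operator")
```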
In Kubernetes 1.20, support for Docker is deprecated, prompting many to rethink their future choice of container runtime, writes Kevin Klues, a principal software engineer on the Nvidia Cloud Native team, in a post on the Nvidia developer blog. “For existing Docker users, the obvious and less risky choice is containerd as Docker already runs on top of containerd under the hood. From a user’s perspective, such a transition would be completely transparent.”
The Nvidia GPU Operator supports the Docker and CRI-O runtimes, and users have asked for containerd support as well. One notable reason: MicroK8s, which, as Klues points out, only runs on containerd. The GPU Operator lets Kubernetes manage GPU nodes as if they were ordinary CPU nodes. It uses the Kubernetes operator framework to automate the management of all the Nvidia software components needed to provision GPUs.
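Once the operator has done its work, a pod can request a GPU like any other resource. The sketch below uses the official Kubernetes Python client; the pod name and image tag are illustrative, and it assumes a cluster where the operator’s device plugin exposes the nvidia.com/gpu resource.

```python
# Sketch: schedule a pod that requests one GPU via the nvidia.com/gpu
# resource advertised by the GPU Operator's device plugin.
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="cuda-smoke-test"),  # hypothetical name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvidia/cuda:12.2.0-base-ubuntu22.04",  # illustrative tag
                command=["nvidia-smi"],
                # The scheduler places the pod on a node with a free GPU.
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```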
The operator makes the deployment of GPUs in Kubernetes environments a first-class citizen, said Adel Al-Hallak, director of AI products for Nvidia, in an interview last year. The capability allows the operator to manage a single golden image rather than two different images, one for CPU nodes and one for GPU nodes.
The advantage of containers with GPUs is in the packaging. Traditional practice calls for models to be built from scratch by data scientists using a framework such as TensorFlow or PyTorch. The work is often so complex that only a Ph.D.-level scientist understands how to build the models. Given the complexity and knowledge required, data scientists may earn $250,000 per year, according to the University of Wisconsin.
Nvidia is now building out toolkits for data scientists that allow them to package models in containers, which developers may access through the Nvidia GPU Cloud registry and adapt using techniques such as transfer learning. With transfer learning, an existing neural network’s learned capabilities are carried over to a new model.
And here’s where the container packaging becomes useful.
“We say ‘hey, here’s a containerized toolkit of a [neural network] transfer, what we call the transfer learning toolkits,'” said Al-Hallak, who before Nvidia worked at IBM as a product manager for machine learning, deep learning and cognitive systems. “And here’s a set of models that we’ve pre-trained for you, right? So models that can classify images and computer vision or detect objects in images or video, or models in the medical imaging space that can identify anomalies. For you know, any given CT scan or X-Ray.”
The container takes much of the complexity out of the process for the user who may not be a trained data scientist. Nvidia’s transfer learning toolkit uses Docker containers to distribute pre-trained models for computer vision and conversational AI use cases.
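Transfer learning itself is framework-level plumbing. The sketch below is a generic PyTorch example, not Nvidia’s toolkit: it reuses a pretrained backbone and retrains only a new classification head, assuming a recent torchvision release.

```python
# Generic transfer-learning sketch in PyTorch (not Nvidia's toolkit):
# keep a pretrained feature extractor, train only a new head.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze the pretrained layers so their weights are not updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for a new task, e.g. 5 target classes.
num_classes = 5
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new head's parameters are optimized during fine-tuning.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```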
Containers provide both technical and commercial value, said Al-Hallak. The commercial value comes from how Nvidia packages its GPUs with ready-to-go, containerized models, offering OEMs a way to get the processing power of GPUs alongside the cloud native capabilities that containers bring.
And the processing power is immense.
“So, imagine if I have 10 GPU nodes,” said Janakiram MSV, who leads the MSV & Associates consultancy. “And if I want to optimize my machine learning training performance, Kubernetes (running on CPUs), will do it at the host level, at the machine level, and Nvidia’s CUDA, which is the Compute Unified Device Architecture, will parallelize at the GPU level. So you get 10X the performance when you run a machine learning training job on Kubernetes, with 10 GPU machines.”
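In concrete terms, Kubernetes decides which node a training pod lands on, while the framework and CUDA split the work across the GPUs visible on that node. A minimal single-node sketch in PyTorch, assuming at least one GPU is present:

```python
# Sketch: single-node, multi-GPU data parallelism with PyTorch.
# Kubernetes handles node-level placement; the framework spreads each
# batch across the GPUs the pod can see.
import torch
import torch.nn as nn

device_count = torch.cuda.device_count()
print(f"GPUs visible: {device_count}")

model = nn.Linear(1024, 10)  # toy model for illustration
if device_count > 1:
    # Replicates the model and splits each batch across the visible GPUs.
    model = nn.DataParallel(model)
model = model.cuda()

batch = torch.randn(256, 1024).cuda()
output = model(batch)  # forward pass runs on all visible GPUs
```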
More often than not, a single Helm install can stand up a complete reference application, such as the conversational AI virtual weather bot Nvidia has developed.
GPUs are not architected for distributed computing, which makes them useful for an ML engineer but not something that scales on its own. CPUs, by contrast, are built for general-purpose processing across compute, storage, and networking. Nvidia is offering containers on GPUs that can accelerate data processing in the data center and then be made available on a CPU-based distributed architecture. Containers also give Nvidia a way to sell its GPUs alongside repositories of models for vertical markets such as healthcare.
“So, see CPU, first of all, can be virtualized,” MSV said. “Compute, network, storage, all of them can be virtualized and containerized. For example, you can take one Ethernet card that is available on your computer and then build multiple VXLANs or virtual LANs to create a virtual network and overlay network within Kubernetes — that works very well. Similarly, you can virtualize storage. You can pool ten disks and create volumes that are visible to Docker and Kubernetes. CPU is time-shared between multiple containers, without even realizing it. But when it comes to GPU, it cannot be virtualized, and it is not container friendly.”
What’s Next?
The industry is moving toward virtualizing GPUs; Run:AI and VMware’s Bitfusion are two examples.
Nvidia is cementing its place in the AI-accelerated cloud, and Kubernetes is the unified, consistent layer for delivering such a complex software stack. How well Nvidia executes in the open source kingdom of container architectures will factor into how AI emerges in public cloud, data center, and edge environments.
Janakiram MSV contributed to this post.
Red Hat and VMware are sponsors of The New Stack.