GPU manufacturer Nvidia has put the open source Kubernetes container orchestration engine at the center of its strategy to bring machine learning to the enterprise.
On stage at Nvidia’s GPU Technology Conference, taking place this week in San Jose, California, Nvidia founder and CEO Jensen Huang touted a new set of optimizations for executing large machine learning jobs in a timely manner, such as the latest version of the TensorRT library, which he said could speed up ML jobs by a factor of 100. He also unveiled the “world’s largest GPU,” one capable of executing a neural network job that would typically take six days in only 18 minutes. But how would an organization manage and make full use of all these speedy technologies, across clouds and data centers, in large-scale operations?
“Turns out, there is this thing called Kubernetes,” Huang said. “Kubernetes on Nvidia GPUs is going to bring so much joy. So much joy.”
Huang attributed this elation to Kubernetes’ recent support for GPUs, introduced in beta form with the release of version 1.8. It is good news for Nvidia, the world’s largest GPU maker, that Kubernetes is now “GPU-aware,” Huang said.
In a demonstration, Huang showed how Kubernetes could scale up a workload. He started with the company’s standard demonstration of how a single GPU can scan a set of photos containing flowers and nearly instantly identify what type of flower each one is. To increase the workload, Huang’s engineer called on Kubernetes to spin up four additional replicas of the basic program, which instantly multiplied the number of flowers being identified on the screen.
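Scaling a workload this way maps onto a standard Kubernetes Deployment. A minimal sketch might look like the following; the names and image are hypothetical, not taken from Nvidia’s demo, and the `nvidia.com/gpu` resource name assumes the Nvidia device plug-in discussed below is installed:

```yaml
# Hypothetical sketch: the original pod plus four replicas,
# each requesting one GPU via the device plug-in's resource name.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flower-classifier          # hypothetical name
spec:
  replicas: 5                      # original pod plus four replicas
  selector:
    matchLabels:
      app: flower-classifier
  template:
    metadata:
      labels:
        app: flower-classifier
    spec:
      containers:
      - name: classifier
        image: example/flower-classifier:latest   # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1      # one GPU per replica
```

The scheduler then places each replica on any node with a free GPU, whether those nodes sit in one server, one data center, or several.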
“Kubernetes can assign pods — which is basically a service that contains a whole lot of containers — on one GPU, on many GPUs on one server, on many GPUs on many servers. You could also assign it across data centers. So you can have some of it in the cloud and some of it in the data center,” Huang told the audience. “All of this stuff is happening completely invisibly because we made Kubernetes GPU-aware.”
The demonstration also showed the resilience Kubernetes offers. Huang’s engineer killed off four GPUs to show how Kubernetes automatically finds replacement units; in this demo, K8s found the replacements so quickly that the audience barely noticed any slowdown in the photo identifications.
Behind the Scenes
Starting in April, according to the company, any managed Kubernetes service will be able to offer GPUs as a service. The company specifically touted Amazon Web Services and Google Cloud Platform, though not Microsoft Azure, which is still being certified by Nvidia.
Behind the scenes, Nvidia plans to contribute GPU-support code to the Kubernetes project, managed by the Cloud Native Computing Foundation. The company develops the technology first in-house, and then offers it upstream for the rest of the community to adopt, explained Kari Briski, Nvidia director of accelerated computing software.
The idea behind GPU support is that Kubernetes recognizes the GPUs in a system and can assign workloads to them. Nvidia has already been supporting container-based operations through the Nvidia Container Runtime, a modified version of runc that recognizes GPUs.
Kubernetes first started supporting GPUs, experimentally, with its 1.7 release, though it required manually mounting the volume in the pod specification, and it didn’t offer any monitoring or health checks. Starting with Kubernetes version 1.8, users could use the alpha Nvidia plug-in for Kubernetes, through the use of the Nvidia Container Runtime, noted Viraj Chavan, Nvidia’s director of GPU Cloud Compute Software, in a technical session explaining more about the Kubernetes work. That plug-in was upgraded to beta status this week.
The plug-in “exposes the GPU as a first-class resource to other software components at the top of the stack,” Chavan said.
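With the plug-in installed, a pod can request a GPU like any other first-class resource in its specification. A minimal sketch, assuming the `nvidia.com/gpu` resource name the plug-in registers (the pod and container names are illustrative):

```yaml
# Sketch of a pod requesting one GPU as a schedulable resource.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod                # illustrative name
spec:
  containers:
  - name: cuda-container
    image: nvidia/cuda:9.0-base   # CUDA base image; tag may vary
    command: ["nvidia-smi"]       # print visible GPUs, then exit
    resources:
      limits:
        nvidia.com/gpu: 1         # whole GPUs only, no fractions
```

Kubernetes schedules the pod only onto a node where the plug-in has advertised a free GPU, so software higher up the stack never has to track GPU inventory itself.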
Now the company is working to further refine this support, Chavan said. It would like to see GPU health checks and monitoring in place, as well as topology awareness and awareness of different GPU types, so that users could target a workload at a specific type of GPU. Another item on the wish list is letting users switch to different container runtimes, such as CRI-O.
“We want to add these capabilities where you can schedule your jobs with these constraints in place. For example, you can choose to say ‘my application needs this kind of GPU and this kind of memory,'” Chavan said.
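One way to express such a constraint today is with node labels and a node selector, as the Kubernetes documentation suggests for heterogeneous GPU clusters. A sketch, assuming the cluster operator has labeled GPU nodes with an `accelerator` key (the label value and image are illustrative):

```yaml
# Sketch of type-aware scheduling via node labels:
# assumes nodes were labeled, e.g.
#   kubectl label nodes node1 accelerator=nvidia-tesla-v100
apiVersion: v1
kind: Pod
metadata:
  name: training-job             # illustrative name
spec:
  nodeSelector:
    accelerator: nvidia-tesla-v100   # only nodes with this GPU type
  containers:
  - name: trainer
    image: example/trainer:latest    # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 2            # two GPUs of that type
```

The richer memory-and-GPU-type constraints Chavan describes would fold this kind of selection into the scheduler itself rather than leaving it to hand-maintained labels.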
Still, the modern enterprise has other challenges ahead when using GPUs for production-scale jobs. One is reconciling the technology with continuous integration and deployment practices. Cloud-based CI/CD systems such as Travis CI and CircleCI don’t yet support GPUs, noted Michael Wendt, Nvidia manager for applied engineering solutions, in another technical session at the conference. Wendt himself is working on a Jenkins plug-in that would support the Docker container runtime. It works for Docker Swarm but doesn’t support Kubernetes quite yet, he said.
Feature image: Nvidia CEO Jensen Huang, on stage at GTC 2018, demonstrating Kubernetes for machine learning work.