Run:AI, a company that virtualizes AI infrastructure, has launched a deep learning virtualization platform for Kubernetes that lets users dynamically assign graphics processing unit (GPU) resources for AI model training and inference.
While the company has provided virtualization for AI infrastructure since its founding two years ago, it now brings these capabilities to Kubernetes through several new features, delivered as a container that runs as part of an existing Kubernetes cluster, explained Run:AI co-founder and CEO Omri Geller in an interview with The New Stack.
“We are building a virtual environment for GPUs, but also we built a scheduler for Kubernetes. Even through Kubernetes, GPUs are allocated statically to users. We built scheduling capabilities into Kubernetes to support batch scheduling, which is very relevant when you build AI models,” said Geller. “The Kubernetes scheduler was built in order to support services and not the training of AI models. We built an extension that will be able to support efficient training and inference of AI models.”
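For context on the static allocation Geller refers to: in stock Kubernetes, a pod requests whole GPUs as an extended resource exposed by a device plugin, and that allocation is fixed for the pod's lifetime; a pod can also be directed to an alternative scheduler via the `schedulerName` field. A minimal sketch (the scheduler name and image tag here are illustrative, not Run:AI's):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  # Hypothetical custom scheduler; omitting this field uses
  # the built-in "default-scheduler".
  schedulerName: custom-batch-scheduler
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:23.10-py3
    resources:
      limits:
        # GPUs are requested as whole units via the NVIDIA device
        # plugin's extended resource; the pod holds both GPUs,
        # busy or idle, until it terminates.
        nvidia.com/gpu: 2
```

This is the allocation model a scheduler extension like Run:AI's is meant to improve on: the binding is static, and fractions of a GPU cannot be requested.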
In essence, explained Geller, Kubernetes is built for running short jobs, whereas AI training and inference may require long-running workloads. At the same time, when using Kubernetes for AI, there is usually no way to divide GPUs into fractions or to share GPUs across workloads according to policies. These are all features that Run:AI has introduced to make deep learning more efficient on Kubernetes. Geller also explained that Run:AI offers “topology awareness scheduling,” which, much like the use of regions and content delivery networks (CDNs), places workloads on hardware physically close to the processors handling them.
“If you want to run distributed computing efficiently, you want to choose correctly the right GPUs for a specific workload. You want to make sure, for example, that those GPUs are close to the CPU,” said Geller. “Topology awareness scheduling allows you to make sure that you get the most out of your expensive and powerful AI hardware.”
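Kubernetes itself has a related, node-level mechanism for the GPU-to-CPU proximity Geller mentions: the kubelet's Topology Manager can align a container's device and CPU allocations to the same NUMA node. A sketch of the relevant kubelet configuration (this handles alignment within a single node; cluster-wide topology-aware placement across nodes is what Run:AI's scheduler adds):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Require that a container's GPUs and CPUs come from the same
# NUMA node, so data does not cross the inter-socket interconnect.
topologyManagerPolicy: single-numa-node
# Static CPU management is needed for exclusive CPU assignment.
cpuManagerPolicy: static
```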
Run:AI’s virtualization platform plugs into Kubernetes with a single line of code. The platform not only offers the ability to manage a number of different workloads, putting policies in place to use GPUs most efficiently, but it also offers visibility into that process.
“Today when you enter an IT organization, Kubernetes has become the de-facto standard tool to run IT environments,” said Geller. “You want to run your AI workloads using Kubernetes, and run Kubernetes on top of GPUs, therefore there is a marriage that needs to happen to bridge this gap between what AI development needs and what Kubernetes can provide.”
While this entire process may sound akin to virtual GPUs (vGPUs), Geller said that there is a primary difference: vGPUs offer a static provisioning of GPUs, whereas Run:AI with Kubernetes allows the assignment of GPU resources to change as needed, giving users more flexibility for experimentation.
“vGPUs are something a little bit different. vGPUs carve the GPU. Run:AI dynamically orchestrates GPU pools. If you have a lot of GPUs and you want to manage them efficiently and use them efficiently among users. Without Run:AI, those are assigned statically, so every user gets a fixed amount of GPU, but they can’t get more and that’s limiting them,” said Geller.
“Sometimes, as a researcher, you want to run many experiments in parallel, and if you have a fixed quota of GPUs, you can’t run more. GPUs are an expensive resource and therefore static allocations of GPUs are not efficient when it comes to enterprise environments. Our layer virtualizes the GPUs in a way that we help to manage the resources without statically allocating GPUs to users, but to have a virtual pool of GPUs that are automatically provisioned according to business goals,” he said.
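The fixed quota Geller describes corresponds to how GPU limits are typically expressed in plain Kubernetes, for example with a per-namespace ResourceQuota (the namespace and number here are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-gpu-quota
  namespace: team-a   # hypothetical per-team namespace
spec:
  hard:
    # Hard cap: pods in team-a can never request more than
    # 8 GPUs in total, even when the rest of the cluster is idle.
    requests.nvidia.com/gpu: "8"
```

It is exactly this hard, per-team ceiling that a dynamically provisioned virtual GPU pool is meant to replace.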