Startup Run:AI Looks to Improve GPU Utilization for AI Workloads
A startup is using a VMware-style technique to give data scientists greater access to GPU compute power to run their artificial intelligence (AI) workloads.
Run:AI, whose platform is designed to enable organizations to leverage all of the compute power they need to accelerate their AI development and deployment efforts, recently unveiled two new technologies, Thin GPU Provisioning and Job Swapping, which allow data scientists to share the GPUs they’re allocated for their AI work.
Thin GPU Provisioning does for compute accelerators what VMware’s thin provisioning did for storage area networks (SANs), where available disk space is reallocated as needed. Data scientists are often allocated GPUs for their AI-related work but don’t always use all the accelerator compute power they’re given. Combined, Run:AI’s two new technologies automatically optimize the allocation of GPUs, so GPU compute capacity that one data scientist isn’t using can be automatically provisioned to and used by another.
“Many times people ask for resources, but are not really using that,” Omri Geller, Run:AI’s co-founder and CEO, told The New Stack. “Our technology is under the hood. Other data scientists that are requesting compute power may get access to a compute power that was previously assigned to someone else based on the fact that our system identifies that this someone is not actually using the GPUs. It’s really transparent for the user and that’s a strength. That means that if ‘Data Scientist A’ is not using his or her GPUs, ‘Data Scientist B’ can use the GPU that was provisioned and will not know it was previously allocated.”
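The idea Geller describes can be illustrated with a toy model. The following is a minimal sketch, not Run:AI’s implementation: the `ThinPool` and `Allocation` names, and the rule that a request is served from the requester’s own quota before borrowing idle GPUs from others, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    owner: str
    gpus: int        # GPUs reserved for this user
    in_use: int = 0  # GPUs the owner is actively using
    lent: int = 0    # reserved GPUs currently lent to other users


class ThinPool:
    """Toy thin provisioning: reserved-but-idle GPUs can be lent out."""

    def __init__(self):
        self.allocations = {}

    def allocate(self, owner, gpus):
        self.allocations[owner] = Allocation(owner, gpus)

    def idle(self, owner):
        a = self.allocations[owner]
        return a.gpus - a.in_use - a.lent

    def request(self, requester, needed):
        """Grant GPUs from the requester's own quota first,
        then from other users' idle GPUs. Returns the number granted."""
        own = self.allocations[requester]
        granted = min(needed, self.idle(requester))
        own.in_use += granted
        for other in self.allocations.values():
            if granted == needed:
                break
            if other.owner == requester:
                continue
            take = min(needed - granted, self.idle(other.owner))
            other.lent += take
            granted += take
        return granted


pool = ThinPool()
pool.allocate("Data Scientist A", 4)
pool.allocate("Data Scientist B", 2)
pool.request("Data Scientist A", 1)          # A uses only 1 of 4 GPUs
granted = pool.request("Data Scientist B", 4)  # B gets 2 own + 2 of A's idle GPUs
```

In this sketch the requester only sees the number of GPUs granted, not where they came from. What happens when the original owner wants the lent GPUs back is not modeled here.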
GPUs and AI Workloads
GPUs have become increasingly important in data centers and high-performance computing (HPC) environments since Nvidia began pushing their use as accelerators that run alongside traditional CPUs to drive improved performance and power efficiency. Six of the top 10 supercomputers on the twice-yearly Top500 list of the world’s fastest systems leverage GPUs from Nvidia.
With their parallel processing capabilities and Nvidia’s CUDA framework, GPUs can accelerate the computational processes for AI and deep learning workloads, making them essential for data scientists. The problem is that, despite their increasing importance in data centers, few tools help enterprises optimize their use, Geller said.
“Today there are a lot of technologies already implemented for normal CPUs within the computers,” he said. “The way the CPUs are utilized is very good today with all the operating systems. When it comes to chip deals, in the majority of the cases GPUs are not first-class citizens, not in Kubernetes and not in the cloud-native ecosystem. It’s really the early days of how to take advantage of those accelerators … in leading-edge Kubernetes deployments. Therefore, our roadmap is making those accelerators first-class citizens in those new environments. By making that, we will be able to get more utilization out of those compute resources.”
Run:AI Orchestrates GPU Power
The Israeli startup, which was founded more than three years ago and has about 60 employees, is designing its software platform to get data scientists as much GPU power as they need when they need it. It decouples the workloads from the underlying hardware, enabling resources to be pooled and using advanced scheduling capabilities to ensure the resources are fully utilized and get to workloads that need them most. That enables data scientists to run more experiments and get results more quickly.
The company claims the real-time visibility and control over scheduling and dynamic provisioning of GPUs can double the utilization of existing infrastructure.
“One of the most important things when building applications is compute power,” Geller said. “Those applications are becoming larger and larger and complex, and they need a lot of compute power. … We took on the technical challenges of making the best software that gets the most out of those compute resources.”
Virtual Pool of Resources
The platform creates a virtual pool of GPU systems and includes such features as splitting GPUs into fractions that can be used for different workloads and a Kubernetes-based workload scheduler. It now also will include the Thin GPU Provisioning and Job Swapping features. The goal of the platform is to make all the tasks it performs transparent to the user and to give them the power they need while running the tools they’re used to, whether on-premises or in the cloud.
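As a rough illustration of how fractional GPUs could be packed across a pool, here is a minimal sketch. It is not Run:AI’s scheduler: the `place_fractional` function, the first-fit placement rule and the use of whole percentages of one GPU are all assumptions chosen to keep the example simple.

```python
def place_fractional(requests, num_gpus):
    """First-fit placement of fractional GPU requests.

    requests: list of (workload_name, percent_of_one_gpu) pairs.
    Returns (placement, free), where placement maps each workload to a
    GPU index (or None if it did not fit) and free lists the percentage
    left on each GPU.
    """
    free = [100] * num_gpus
    placement = {}
    for name, percent in requests:
        for gpu, available in enumerate(free):
            if percent <= available:
                free[gpu] -= percent
                placement[name] = gpu
                break
        else:
            # No single GPU has enough capacity left for this workload.
            placement[name] = None
    return placement, free


placement, free = place_fractional(
    [("notebook", 50), ("inference", 25), ("training", 50), ("tuning", 50)],
    num_gpus=2,
)
```

Here four workloads share two GPUs, with a quarter of the first GPU left over for a later request.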
Run:AI this year introduced its Run:it your way initiative, which lets data scientists run whatever machine learning tools they want, from pipelines and data pre-processing to Kubeflow (an open-source machine learning platform running on Kubernetes), for managing processes such as modeling atop the vendor’s compute orchestration platform.
“The consistent thing across deployments is that data scientists are using GPUs,” the CEO said. “The things that are not consistent are what tools data scientists use in order to run workloads on the GPUs. Each of the data scientists in the world have their tools of choice and we at Run:AI, we don’t want to change that.”
Thin GPU Provisioning is transparent to the data scientists using the platform. The user who originally was allocated the GPUs doesn’t see if any of the accelerator compute power is reallocated to another data scientist and that data scientist doesn’t see that they were able to get more compute power. They both just know that their workloads are getting the compute power they need, Geller said.
The Job Swapping feature is a key to all this, Fara Hain, vice president of marketing at Run:AI, told The New Stack. It’s the tool that identifies where the GPU compute power is needed, identifies where available GPUs are located and then reallocates them, she said. It does so based on priorities and policies set by an enterprise’s IT and data science teams for specific jobs.
“Swapping happens based on free resources along with the policies and priorities that were set in advance,” Hain said.
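A priority-driven swap of the kind Hain describes might look like the following. This is a minimal sketch under stated assumptions, not Run:AI’s algorithm: the `Job` and `SwapScheduler` names are hypothetical, priorities are plain integers where higher means more important, and a swapped-out job is simply moved to a waiting list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    gpus: int
    priority: int  # higher value = more important (assumed policy)


class SwapScheduler:
    def __init__(self, total_gpus):
        self.total_gpus = total_gpus
        self.running = []
        self.waiting = []

    def free(self):
        return self.total_gpus - sum(j.gpus for j in self.running)

    def submit(self, job):
        """Start the job if free GPUs, plus GPUs held by strictly
        lower-priority running jobs, can cover it. Returns True if started."""
        victims = sorted(
            (j for j in self.running if j.priority < job.priority),
            key=lambda j: j.priority,
        )
        reclaimable = self.free()
        chosen = []
        for victim in victims:
            if reclaimable >= job.gpus:
                break
            chosen.append(victim)
            reclaimable += victim.gpus
        if reclaimable < job.gpus:
            self.waiting.append(job)
            return False
        for victim in chosen:
            # Swap out: the lower-priority job waits for free resources.
            self.running.remove(victim)
            self.waiting.append(victim)
        self.running.append(job)
        return True


sched = SwapScheduler(total_gpus=4)
sched.submit(Job("experiment", gpus=2, priority=1))
sched.submit(Job("notebook", gpus=2, priority=2))
started = sched.submit(Job("production-training", gpus=2, priority=5))
```

In this run the cluster is full when the high-priority job arrives, so the lowest-priority job ("experiment") is swapped out to make room. A real system would also need to checkpoint and later resume swapped jobs, which this sketch omits.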
Run:AI is testing Thin GPU Provisioning and Job Swapping in customer labs. Both are expected to be generally available in the fourth quarter.
Geller said the software platform from Run:AI, which has raised $43 million in funding to date, is generally available and being used by a range of large enterprises. Among the customers are Wayve, a London-based company developing AI software for autonomous vehicles, and the London Medical Imaging and AI Centre for Value Based Healthcare, which develops, tests and deploys AI systems across the country’s National Health Service (NHS).
He said most of Run:AI’s customers are in the financial services, automotive and healthcare fields, though there are some in other verticals as well.