How OpenStack Provides Scalable, Reusable Infrastructure for AI/ML Workloads
Red Hat sponsored this post, which was written independently by The New Stack.
“Infrastructure” is defined as “compute, storage, and networking resources accessed throughout a datacenter,” according to the OpenStack Foundation. Virtualization abstracts these components away from the underlying hardware. That abstraction enables infrastructure-as-a-service (IaaS) and infrastructure-as-code, both of which make it possible to automate tasks such as server provisioning and management. This automation leads to the idea of scalable and reusable infrastructure for artificial intelligence/machine learning (AI/ML) workloads. AI/ML, very much in vogue, is a special case because some of its workloads require specialized hardware while others do not.
Training, for example, is the act of building a model, a task much like studying for a test. Training a model for a task such as visual object recognition can require far more processing power than inference, the act of actually identifying those objects later, and a different kind of processing power as well. Graphics processing units (GPUs) can make these training workloads tenable. The only problem is that we again run into the scenario of a relatively rare, expensive, and power-hungry piece of hardware that we want to use as efficiently as possible. Once again, virtualization saves the day by isolating workloads and dividing these specialized processors into virtual GPUs (vGPUs).
Abstracting Infrastructure with Virtualization
Virtualization allows us to divide hardware resources, abstracting the physical resources away from the software systems that run on top of them. At the Open Infrastructure Summit earlier this year, several themes stood out as central to the conference, among them edge computing, bare metal, hybrid and multi-cloud, infrastructure-as-a-service, and 5G networking, and virtualization lies at the core of each. Today, virtualization is used in many different realms. It lets users emulate a different operating system within their own, for example, or provides networking functionality in the absence of networking hardware. And with the steady move toward cloud computing over the past decade, the methods and purposes of virtualization have only expanded, often to abstract layers of the stack so that they can be automated.
Mark Collier, chief operating officer at the OpenStack Foundation, compares recent efforts to enable AI/ML workloads to the early days of the cloud. At that time, the prevailing attitude was to move beyond specialized hardware, virtualize commodity machines, and scale horizontally. Now, however, hardware specialization has returned. GPUs have become one more resource to be virtualized, shared, and repurposed as needed, much as commodity CPUs were before them.
“What we’ve seen happen over the last few years is like the early days of cloud, when everyone said this is going to be the lowest common denominator: we’re going to have the cheapest CPUs, hard drives, memory, and it’s just going to scale out horizontally, and that’s going to be the end of hardware specialization,” said Collier. “What we’ve seen in more recent years is the return of seeing the value of different pieces of hardware, such as GPUs. You get a lot of benefit out of having that in your data center as part of your cloud fabric, so that’s meant that OpenStack has evolved to enable those things. vGPU is something that’s been incorporated in Nova for the last several releases, which is more of a sophisticated cloud-centric way of exposing the underlying GPU hardware inside of virtual machines. We definitely see this trend towards enabling different hardware architectures inside of clouds.”
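In Nova, vGPUs are requested through flavors. An administrator sets the extra spec `resources:VGPU` on a flavor. Instances booted from that flavor then receive a virtual GPU from a compute host whose GPU has been configured for vGPU. The sketch below only builds the JSON body for the Compute API's flavor extra-specs call (`POST /flavors/{flavor_id}/os-extra_specs`). The flavor ID and helper name are hypothetical, and authentication and sending the request are left out. Depending on the release and the GPU driver, Nova may support only one vGPU per instance.

```python
import json


def vgpu_extra_specs_body(vgpus_per_instance: int = 1) -> dict:
    """Build the JSON body for POST /flavors/{flavor_id}/os-extra_specs.

    Nova tracks vGPUs as the placement resource class VGPU, so a flavor
    requests them with the extra spec "resources:VGPU". Extra-spec values
    are strings in the Compute API.
    """
    if vgpus_per_instance < 1:
        raise ValueError("a vGPU flavor must request at least one vGPU")
    return {"extra_specs": {"resources:VGPU": str(vgpus_per_instance)}}


if __name__ == "__main__":
    # Hypothetical flavor ID; substitute the ID of a real flavor.
    flavor_id = "vgpu-small"
    print(f"POST /flavors/{flavor_id}/os-extra_specs")
    print(json.dumps(vgpu_extra_specs_body(), indent=2))
```

From the command line, the same property is typically set with `openstack flavor set <flavor> --property "resources:VGPU=1"`.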
OpenStack also enables field-programmable gate arrays (FPGAs), chips designed to be configured by the user after manufacturing, to be offered as generic cloud resources. Collier points to this as another way that virtualized hardware assists with modern AI/ML workloads.
“The idea is that you can update and change the capabilities of the processor well after they’ve left the factory, which is not how a normal processor works. The ability to essentially customize and create unique functions on the chip that weren’t in there when the chip designer built them is really interesting. There are specific workloads that really lend themselves to the performance characteristics of FPGAs, in particular in the machine learning area of inference, which is at the front end of the training process,” said Collier. “When you’re training, you’re collecting lots of data. In a self-driving car, for example, it’s collecting data from all the cameras and the sensors. It’s constantly trying to turn that into math, relatively simple math, but math you want to repeat millions or billions of times. That is something that performs much more efficiently on FPGAs than something like a GPU or CPU.”
Erwan Gallen, an OpenStack product specialist at Red Hat, gave a presentation at the Open Infrastructure Summit on how to quickly implement facial recognition using OpenStack. The talk explored the history, current use, and potential of vGPUs and other hardware acceleration, as well as different machine learning algorithms and architectures.
At one point, the presentation makes the difference between CPUs and vGPUs explicit: the CPUs processed 3.3 images per second, while a GPU processed 158 images in that same second.
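The two figures from the demo amount to a speedup of roughly 48x. A quick calculation shows what that means for a hypothetical batch of 10,000 images. It assumes both throughputs stay constant across the whole batch, and the batch size is illustrative only and does not come from the presentation.

```python
CPU_IMAGES_PER_SEC = 3.3    # CPU figure from the presentation
GPU_IMAGES_PER_SEC = 158.0  # GPU figure from the presentation


def speedup(fast: float, slow: float) -> float:
    """Ratio of two throughputs, both in images per second."""
    return fast / slow


def seconds_for(images: int, images_per_sec: float) -> float:
    """Wall-clock time to process a batch at a constant throughput."""
    return images / images_per_sec


if __name__ == "__main__":
    batch = 10_000  # hypothetical batch size
    print(f"speedup: {speedup(GPU_IMAGES_PER_SEC, CPU_IMAGES_PER_SEC):.1f}x")
    print(f"CPU: {seconds_for(batch, CPU_IMAGES_PER_SEC) / 60:.1f} min")
    print(f"GPU: {seconds_for(batch, GPU_IMAGES_PER_SEC):.1f} s")
```

Running it prints a speedup of 47.9x. The batch takes about 50.5 minutes on the CPU versus about 63.3 seconds on the GPU.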
In an email to The New Stack, Gallen wrote that vGPUs were in tech preview in Red Hat OpenStack Platform (RHOSP) 14 and slated for full support in RHOSP 15, which is expected later this year. Pointing to the addition of vGPUs and GPU pass-through, Gallen also remarked that OpenStack “was designed to scale,” as it “allows in one API call to provide thousands of VMs deployed with a specific AI/ML workload.” In particular, Gallen pointed to vGPUs as a move toward greater efficiency.
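The “one API call” works because Nova's create-server request (`POST /servers`) accepts `min_count` and `max_count`, so a single request can ask for many identical instances. The sketch below only builds that request body. The name, image and flavor IDs, and network UUID are hypothetical placeholders. Whether a large request succeeds depends on quotas and available capacity.

```python
import json


def bulk_boot_body(name: str, image_ref: str, flavor_ref: str,
                   network_uuid: str, count: int) -> dict:
    """Build a Compute API POST /servers body that boots `count` instances.

    Setting min_count equal to max_count asks for exactly `count` servers;
    Nova rejects the request if quota does not cover min_count of them.
    """
    if count < 1:
        raise ValueError("count must be positive")
    return {
        "server": {
            "name": name,
            "imageRef": image_ref,
            "flavorRef": flavor_ref,
            "networks": [{"uuid": network_uuid}],
            "min_count": count,
            "max_count": count,
        }
    }


if __name__ == "__main__":
    # All IDs below are hypothetical placeholders.
    body = bulk_boot_body("inference-worker", "IMAGE_ID", "VGPU_FLAVOR_ID",
                          "NETWORK_UUID", 1000)
    print(json.dumps(body, indent=2))
```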
“GPU pass-through and vGPU are reusable resources — you can get usage statistics and quotas per team per month for these accelerators. vGPU allows you to get more usage of each GPU,” wrote Gallen. “Some customers want to use vGPU because they are using only 50% of their GPU Frame Buffer memory. For some workloads, such as inference and limited training, they can increase the usage ratio of their investment.”
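The arithmetic behind Gallen's point is simple. If a workload uses only half of a GPU's frame buffer, giving it the whole card through pass-through leaves the other half idle. Splitting the card into two vGPUs lets two such workloads share it. The sketch below assumes a hypothetical 16 GB card and an idealized split into equal slices. Real vGPU types are fixed profiles defined by the GPU vendor, and frame buffer is only one of the resources being shared.

```python
def vgpus_per_gpu(gpu_framebuffer_gb: float, slice_gb: float) -> int:
    """How many equal frame-buffer slices fit on one physical GPU."""
    if slice_gb <= 0 or slice_gb > gpu_framebuffer_gb:
        raise ValueError("slice must be positive and no larger than the GPU")
    return int(gpu_framebuffer_gb // slice_gb)


def framebuffer_utilization(used_gb: float, allocated_gb: float) -> float:
    """Fraction of the allocated frame buffer that is actually in use."""
    return used_gb / allocated_gb


if __name__ == "__main__":
    gpu_gb = 16.0      # hypothetical card size
    workload_gb = 8.0  # a workload using 50% of that card
    # Pass-through: a single workload holds the whole card.
    print("pass-through:", framebuffer_utilization(workload_gb, gpu_gb))
    # vGPU: the card is carved into slices sized to the workload.
    slots = vgpus_per_gpu(gpu_gb, workload_gb)
    print("vGPU slots:", slots,
          "utilization:", framebuffer_utilization(workload_gb * slots, gpu_gb))
```

Here pass-through yields 0.5 utilization. Two vGPU slots bring the card to 1.0.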
Workload Isolation and Beyond
Daniel Riek, a senior director at the Artificial Intelligence Center of Excellence with Red Hat, took time to explain that virtual machines (VMs) go beyond providing multiple access points to singular pieces of hardware, also providing benefits like workload isolation for security purposes. Kata Containers, an OpenStack Foundation project, for example, are lightweight virtual machines built precisely for this reason — virtualization that provides security isolation in multitenant environments, such as those often encountered while using containers and container orchestration systems like Kubernetes.
“You need vGPUs whenever you want to share GPUs safely with multiple VMs, and you need VMs for security isolation for multiplexing hardware, or if you want to run multiple operating systems or different kernels. For the same reason that you need a VM, you would need a vGPU to multiplex the GPU. If you want multiple users to be able to talk to the same piece of hardware, you can’t have one and not the other safely. You need vGPUs if you want to safely run VMs with GPUs,” explained Riek. “You could also use vGPUs, even when you run containers on bare metal. You can orchestrate the bare metal with OpenStack, deploy OpenShift on it, and run machine learning on top of that. You still have a benefit from vGPUs if you want to have multiple access to the GPU. With OpenShift, you’re not using VMs to multiplex the hardware, you’re just using containers to share it. It’s a timeshare versus a virtualization.”
An older approach, bare metal, has also moved in a new direction. For organizations that want to squeeze every available bit of compute out of their hardware, bare metal is commonly the answer. IaaS now automates the provisioning and reprovisioning of bare-metal servers, delivering the performance of bare metal while commoditizing it. Combined with vGPUs, this leaves organizations better able than ever to treat infrastructure as reusable while enjoying the efficiencies that virtualization provides.
The OpenStack Foundation is a sponsor of The New Stack.
Feature Image by simplyelke from Pixabay.