How to Meet the Hardware Challenge for ML Workloads on Kubernetes

Development teams are quickly learning how machine learning (ML) can turbocharge efficiency by taking on tasks in software development and operations that once required humans. In the application-development space, developers are increasingly building ML infrastructure for emerging applications, such as driverless vehicles and facial-recognition metadata apps.
But ML application development and ML-assisted production pipelines are a different game. One key difference is hardware: computationally intensive ML applications and tools have requirements that traditional if-then computing does not, and meeting them largely means shifting from CPU configurations to the world of GPUs. In other words, GPUs, rather than traditional workstation and server CPUs, are usually required to process the millions of parallel matrix operations inside a neural network, which in many ways mimics the workings of a human brain.
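For a concrete sense of the contrast, the sketch below runs the same large matrix multiply, the core operation inside a neural network layer, first on a CPU and then, if one is available, on a GPU, where it fans out across thousands of cores. It uses Python and PyTorch, a library chosen here purely for illustration rather than one named by anyone quoted in this article.

```python
# Minimal sketch: the same matrix multiply on CPU and (if present) GPU.
# PyTorch is an illustrative choice; any GPU-accelerated math library works similarly.
import time
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# On a CPU the multiply is spread across a handful of cores.
start = time.time()
torch.matmul(a, b)
print(f"CPU matmul: {time.time() - start:.3f}s")

# On a GPU the same operation fans out across thousands of cores.
if torch.cuda.is_available():
    a_gpu, b_gpu = a.to("cuda"), b.to("cuda")
    torch.cuda.synchronize()  # wait for the copy to finish before timing
    start = time.time()
    torch.matmul(a_gpu, b_gpu)
    torch.cuda.synchronize()  # GPU work is asynchronous; sync before reading the clock
    print(f"GPU matmul: {time.time() - start:.3f}s")
```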
The potential cost of the hardware required for an ambitious ML development project can seem daunting, especially for a small- to medium-sized enterprise without the in-house resources to build the necessary infrastructure on-premises. But once again, cloud-based alternatives can come to the rescue. More specifically, Kubernetes platforms can often (though not always, as explained below) serve as the perfect conduit for creating and deploying ML software at scale.
“Because Kubernetes can be viewed as the great equalizer, offering systematic scheduling and resource management across multiple pieces of infrastructure, ML workloads will gravitate towards Kubernetes,” said Ravi Lachhman, technical evangelist for AppDynamics.
Indeed, in this way, running ML workloads on GPU-powered Kubernetes platforms should, at the very least, meet the needs of many organizations.
“There is certainly overhead when running Kubernetes versus bare-metal instances of purpose-built GPU machines,” Lachhman said. “But the ability to reconstitute or reposition available infrastructure to handle requests, and the ability to run multiple types of workloads, are key advantages. Kubernetes is built to abstract out the hardware by taking advantage of the portability of containers.”
However, more often than not, the tunability needed to boost the performance of machine learning workloads might not live at the container-orchestrator level, Lachhman said. “Organizations might take a hit on single-node performance versus a horizontally scaled or fan-out type of architecture, but having an orchestrator manage deployments to the best piece of infrastructure for the job or request is certainly a game changer,” Lachhman said.
But while GPUs certainly are ideal for neural network computing, CPUs can sometimes do the job for certain at-scale ML tasks as well. “It really depends on whether it’s scale that you need or whether you want that constant power; using any chip that’s designed specifically for the use case, no one can argue with that. But you can gain some flexibility by using containers and actually putting in technology that can scale with you as you need to scale up,” said Nick Durkin, senior director of sales engineering for Harness. “So, add more infrastructure on the bottom end using the cloud providers, which then ultimately gets you more pods or more tasks to run in order to wipe out that load, as opposed to taking it all on one physical machine, if you will.”
Ultimately, it’s a decision about what “your workload looks like,” Durkin said. “The workload should determine whether you’re going to go directly on the GPU or whether you want the scalability to potentially use multiple CPUs, which are more easily and readily available amongst every cloud provider,” Durkin said.
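As a rough illustration of that fan-out alternative, the sketch below uses the official Kubernetes Python client to run an ML service as several small CPU-backed replicas rather than one large GPU machine. The deployment name, image and sizing are hypothetical placeholders, not anything published by the companies quoted here.

```python
# Sketch: scale an ML workload horizontally across CPU-backed pods
# instead of vertically on a single GPU machine.
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig is available locally

container = client.V1Container(
    name="inference",
    image="registry.example.com/ml-inference:latest",  # hypothetical image
    resources=client.V1ResourceRequirements(
        requests={"cpu": "2", "memory": "4Gi"},  # modest per-pod slice of a cloud node
    ),
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="ml-inference"),
    spec=client.V1DeploymentSpec(
        replicas=8,  # fan out: many small pods absorb the load
        selector=client.V1LabelSelector(match_labels={"app": "ml-inference"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "ml-inference"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

The replica count can then be raised or lowered, manually or with an autoscaler, as the peaks and valleys Durkin describes come and go.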
GPUs, for example, are “phenomenal when used on a standard use case that needs continual computation,” Durkin said. This might include a workload that’s always processing, such as the decision tree for a driverless car, where embedded sensors feed neural network-guided controls that tell the vehicle when it needs to turn left or right.
The autonomous-driving capabilities of Tesla’s all-electric cars serve as a good example of the instantaneous-computation tasks that GPUs are geared for, Durkin said. “The cars need it in real time to be able to make real decisions. However, maybe the thing that you’re looking for is actually scale and flexibility because your workloads aren’t running 24/7,” Durkin said. “Like our workloads: ours run in kind of peaks and valleys when people deploy, and they’re not all deploying at the same time. And so, we’ll have massive peaks and massive valleys — the same way that you see financial transactions and so forth.”
On a practical level, GPUs are needed for specific ML/AI algorithms that require massive parallel processing to train their inference model, Torsten Volk, an analyst for Enterprise Management Associates (EMA), said. The algorithm continuously adjusts weights and other model parameters to optimize the fit between input and output, based on what can be millions of examples or more, Volk said. “Figuring this out includes incredible numbers of iterative calculations that are continuously adjusted based on the error rates produced by their predecessors. Whether this is done on Kubernetes, vSphere or bare metal is irrelevant, since each of these resources allows you to assign specific workloads, such as model training, to one or more GPUs.”
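Volk’s point can be sketched in a few lines of PyTorch (again, a library chosen here purely for illustration): the loop below predicts, measures the error and nudges the weights accordingly, over and over, and the only device-specific decision is which processor the model and data are assigned to.

```python
# Sketch of the iterative weight adjustment Volk describes; the model and data
# are toy placeholders, and the loop structure is what matters.
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"  # use a GPU if one is assigned

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

inputs = torch.randn(1024, 64, device=device)   # stand-ins for training examples
targets = torch.randn(1024, 1, device=device)   # stand-ins for the known answers

for step in range(1000):
    predictions = model(inputs)
    error = loss_fn(predictions, targets)  # how far off the current weights are
    optimizer.zero_grad()
    error.backward()   # work out how each weight contributed to the error
    optimizer.step()   # adjust the weights to reduce it, then repeat
```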
However, Kubernetes does have its advantages for ML computing tasks. “For Kubernetes, all you need to do is install the AMD or Nvidia GPU device plugin,” Volk said. “All developers then need to do is add their GPU requirements to their container manifest and they are set. The CI/CD pipeline keeps running the same way as before.”
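Assuming the Nvidia device plugin is the one installed, that GPU requirement is just a resource limit on the container. The sketch below submits such a manifest through the Kubernetes Python client; the pod name and image are hypothetical placeholders, while “nvidia.com/gpu” is the resource name the Nvidia plugin registers with the cluster.

```python
# Sketch: request a GPU for a training pod once the Nvidia device plugin is installed.
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig is available locally

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-training-job"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [{
            "name": "trainer",
            "image": "registry.example.com/ml-trainer:latest",  # hypothetical image
            "resources": {
                # The scheduler will only place this pod on a node with a free GPU.
                "limits": {"nvidia.com/gpu": 1},
            },
        }],
    },
}

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod_manifest)
```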
In many ways, Kubernetes offers advantages for ML applications beyond those that GPU-powered neural networks or CPU parallel processing provide. Kubernetes, for example, autonomously allocates resources. “In a Kubernetes-style environment, that burden gets removed from them, regardless of whether there are GPUs in the mix or not, because it becomes the system administrator’s ability to label the environment,” said Jim Scott, director of enterprise architecture for MapR. “Let’s say what types of workloads can go on to different types of machines and then keep track of what’s the utilization of the system… It makes the job of ‘how do I procure and substantiate that we need more hardware of certain types to meet certain workloads’ considerably easier.”
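A minimal sketch of that labeling pattern, again using the Kubernetes Python client, might look like the following; the node name, label key and image are hypothetical placeholders.

```python
# Sketch: an administrator labels nodes by hardware type, and workloads
# use a nodeSelector so they land only on the matching class of machine.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Administrator side: tag a GPU-equipped node (the API equivalent of
# `kubectl label node gpu-node-1 accelerator=nvidia-gpu`).
core.patch_node("gpu-node-1", {"metadata": {"labels": {"accelerator": "nvidia-gpu"}}})

# Workload side: a pod that should be scheduled only onto nodes carrying that label.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "training-on-gpu-nodes"},
    "spec": {
        "nodeSelector": {"accelerator": "nvidia-gpu"},
        "containers": [{
            "name": "trainer",
            "image": "registry.example.com/ml-trainer:latest",  # hypothetical image
        }],
    },
}
core.create_namespaced_pod(namespace="default", body=pod_manifest)
```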
Indeed, Kubernetes is seen as a great bridge between ML applications and their deployments on GPUs (or, as mentioned above, on CPUs in some cases as well). For ML development and workload management, “one can find themselves dealing with multiple distributed system technologies and those who have DataOps or ClusterOps skills can be hard to find,” Lachhman said. “Kubernetes can be seen as a great equalizer having ‘one’ resource manager if describing workloads to run inside Kubernetes.”
Harness is a sponsor of The New Stack.
Feature image via Pixabay.