ML Can Streamline Kubernetes Provisioning
DETROIT — In the rush to create, provision and manage Kubernetes, proper resource provisioning often gets left out. According to StormForge, a company spending, for example, $1 million a month on cloud computing is likely wasting $6 million a year, or about half of its annual bill, on cloud resources for Kubernetes clusters that go unused.
The reasons vary. DevOps teams tend to estimate resource needs too conservatively or too aggressively, or simply overspend on provisioning. In this episode of The New Stack Makers podcast, StormForge’s Yasmin Rajabi, vice president of product management, and Patrick Bergstrom, chief technology officer, discussed how to properly provision Kubernetes resources and explored the associated challenges.
This On the Road episode of Makers was recorded live in Detroit during KubeCon + CloudNativeCon North America 2022. The conversation was hosted by B. Cameron Gain, a longtime TNS contributor.
Developers in the Dark
Ironically, the most commonly used Kubernetes settings can themselves make it harder to optimize resources for applications. The work typically involves setting Kubernetes resource requests and limits, and predicting how those settings might affect quality of service for pods.
Developers deploying an application on Kubernetes often need to set a CPU request, memory request and other resource limits. “They are usually like ‘I don’t know — whatever was there before or whatever the default is,’” Rajabi said. “They are in the dark.”
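To make the link between requests, limits and quality of service concrete, here is a minimal Python sketch of how Kubernetes assigns a pod's QoS class from its containers' CPU and memory settings. The function name and the dictionary layout are assumptions made for illustration, and the sketch compares quantities as plain strings, so it would not treat "500m" and "0.5" as equal the way Kubernetes does.

```python
def qos_class(containers):
    """Classify a pod as Guaranteed, Burstable or BestEffort.

    `containers` is a list of dicts with optional "requests" and "limits"
    keys (a hypothetical, simplified stand-in for a real pod spec).
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # Kubernetes defaults a missing request to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"  # nothing set on any container
    return "Guaranteed" if guaranteed else "Burstable"


# Hypothetical pod: requests set, but no limits.
print(qos_class([{"requests": {"cpu": "250m", "memory": "128Mi"}}]))
```

A pod left at "whatever the default is" with nothing set lands in BestEffort, the first class to be evicted when a node runs short of resources, which is one reason guessing at these values carries a reliability cost.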
Sometimes, developers might use their favorite observability tool and say, “‘We look where the max is, and then take a guess,’” Rajabi said.
“The challenge is, if you start from there when you start to scale that out, especially for organizations that are using horizontal scaling with Kubernetes, is that then you’re taking that problem and you’re just amplifying it everywhere. And so, when you’ve hit that complexity at scale, taking a second to look back and say, ‘How do we fix this?’, you don’t want to just arbitrarily go reduce resources, because you have to look at the trade-off of how that impacts your reliability.”
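The amplification Rajabi describes comes down to simple arithmetic. The short Python sketch below uses hypothetical numbers (a pod that requests 1000 millicores but uses about 300) and a helper name invented for illustration; it is not drawn from StormForge's tooling.

```python
def wasted_cores(requested_millicores, used_millicores, replicas):
    """Return CPU cores that are requested but unused across all replicas."""
    # Over-provisioning per pod. Clamp at zero: an under-provisioned pod is
    # a reliability problem, not a saving.
    over = max(requested_millicores - used_millicores, 0)
    return over * replicas / 1000


# Hypothetical workload: each pod requests 1000m but uses about 300m.
for replicas in (1, 10, 50):
    print(replicas, "replicas:", wasted_cores(1000, 300, replicas), "cores idle")
```

The same guess that idles a fraction of a core on one pod idles dozens of cores once horizontal scaling multiplies it, yet cutting the request below real usage trades cost for reliability.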
The process then becomes very hit-or-miss. “That’s where it becomes really complex, when there are so many settings across all those environments, all those namespaces,” Rajabi said. “It’s almost a problem that can only be solved by machine learning, which makes it very interesting.”
But before organizations learn the hard way what it costs not to automate the optimization, deployment and management of Kubernetes, many resources, and the money spent on them, are laid to waste.
“It’s one of those things that becomes a bigger and bigger challenge, the more you grow as an organization,” Bergstrom said.
Many StormForge customers are deploying into thousands of namespaces and thousands of workloads, he said: “You are suddenly trying to manage each workload individually to make sure it has the resources and the memory that it needs.”
The process should actually be pain-free when ML is properly implemented. Through StormForge’s partnership with Datadog, it is possible to apply ML to historical data that has already been collected, Bergstrom said.
“Then, within just hours of us deploying our algorithm into your environment, we have machine learning that’s used two to three weeks’ worth of data to train, that can then automatically set the correct resources for your application. This is because we know what the application is actually using,” Bergstrom said. “We can predict the patterns and we know what it needs in order to be successful.”
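The article does not describe StormForge's algorithm, so the following Python sketch substitutes a much simpler, well-known heuristic to illustrate the general idea of sizing a request from historical usage: take a high percentile of observed usage and add headroom. The function name, the 95th-percentile default, the 15% headroom and the sample values are all assumptions chosen for illustration.

```python
import math


def recommend_request(samples, percentile=95, headroom_pct=15):
    """Suggest a resource request from historical usage samples.

    Uses the nearest-rank percentile plus a fixed headroom. This is a
    stand-in heuristic, not StormForge's machine learning model.
    """
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    # Nearest-rank percentile: smallest value covering `percentile` percent.
    rank = max(math.ceil(percentile * len(ordered) / 100), 1)
    observed = ordered[rank - 1]
    # Add headroom and round up to a whole unit (for example, millicores).
    return math.ceil(observed * (100 + headroom_pct) / 100)


# Hypothetical CPU usage in millicores, with one spike.
usage = [100, 120, 110, 130, 400, 125, 115, 105, 118, 122]
print(recommend_request(usage, percentile=90))
print(recommend_request(usage, percentile=95))
```

Even this crude version shows the trade-off Rajabi raises: a lower percentile ignores the spike and saves resources, while a higher one covers it at a cost. A learned model that predicts usage patterns is meant to make that choice per workload rather than with one fixed rule.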