Optimizing Resource Management Using Machine Learning to Scale Kubernetes
Kubernetes is great at managing large-scale systems, but its complexity and lack of transparency have led to higher cloud costs, deployment delays and developer frustration. As Kubernetes adoption grows and workloads continue to move to containerized environments, resource optimization is becoming increasingly important. In fact, the 2021 Cloud Native Survey revealed that Kubernetes has crossed the chasm into the mainstream, with 96% of organizations using or evaluating the technology.
In this episode of The New Stack Makers podcast, Matt Provo, founder and CEO of StormForge, discusses new ways to think about Kubernetes, including resource optimization achieved by empowering developers through automation. He also shares the company’s latest machine learning-powered multidimensional optimization solution, Optimize Live.
Alex Williams, founder and publisher of The New Stack, hosted this podcast.
Originally spun out of Harvard, the company began developing its algorithms in a lab “to figure out how to take our core machine learning, apply it to the right set of problems, as well as productize it and connect it to the right kind of market opportunities like the growth of containerized workloads and scaling Kubernetes,” said Provo.
For companies like StormForge that were born in the cloud, what’s often top of mind is “resource management and efficiency at scale, in particular, on Kubernetes,” Provo said. With machine learning models consuming cloud resources heavily, StormForge uses its own products to understand and navigate the challenges its customers also face.
As the company pivoted to a containerized architecture, Provo said that the path to scale was very challenging. “In our own lift and shift to Kubernetes, our team found and ran into the challenge of tuning the application workloads that are moving to Kubernetes which was another pain point,” Provo said. Initially focused on pre-production, the company “used load or performance tests as a data input since the machine is connected and dependent on the quality of data put into the models.” Customers found value in areas like scenario planning, and what “to deploy, into production, as events like Black Friday would come up,” Provo added.
Armed with insight from customers who want to see both pre-production and production in the same platform, StormForge recently released a new module within its platform. “Optimize Live takes in observability and telemetry data from a production standpoint. It then uses that as the data source which allows us to provide real-time recommendations on resource allocation — in the moment, as well as predictive,” said Provo.
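The article doesn’t describe how Optimize Live computes its recommendations, but the general idea of turning telemetry into a resource-allocation suggestion can be sketched simply: size a container’s CPU and memory requests at a high percentile of observed usage plus some headroom. The function name, percentile, and headroom values below are all hypothetical illustrations, not StormForge’s actual method.

```python
import numpy as np

def recommend_requests(cpu_samples_millicores, mem_samples_mib,
                       percentile=90, headroom=1.15):
    """Suggest container resource requests from observed usage telemetry.

    Sizes each request at a high usage percentile plus a safety margin,
    so the container is neither starved nor heavily over-provisioned.
    (Illustrative only; real optimizers weigh many more signals.)
    """
    cpu = np.percentile(cpu_samples_millicores, percentile) * headroom
    mem = np.percentile(mem_samples_mib, percentile) * headroom
    # Return values in the units Kubernetes manifests use.
    return {"cpu": f"{int(round(cpu))}m", "memory": f"{int(round(mem))}Mi"}

# Example: a service that hovers around 120m CPU / 300Mi of memory.
rng = np.random.default_rng(1)
cpu_usage = rng.normal(120, 25, 1000)   # millicores
mem_usage = rng.normal(300, 40, 1000)   # MiB
print(recommend_requests(cpu_usage, mem_usage))
```

A percentile-plus-headroom rule is a static heuristic; the “in the moment, as well as predictive” recommendations Provo describes would instead adapt continuously as new telemetry arrives.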
With Bayesian optimization at the core of the company’s IP, StormForge differentiates itself as “the only ones out there that can do what we would call multi-objective optimization,” said Provo. “Bayesian optimization allows us to go to an infinite number of potential parameters or metrics and how they interact with one another for that application. And we can do that not only on a static standpoint but from an ongoing standpoint,” Provo said.
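To make the technique Provo names concrete, here is a minimal, self-contained Bayesian optimization loop: fit a Gaussian process to the points sampled so far, then pick the next trial by maximizing expected improvement. The objective function is a made-up stand-in for a tuning cost surface, and the kernel and loop sizes are arbitrary; this is a sketch of the textbook algorithm, not StormForge’s implementation.

```python
import math
import numpy as np

def objective(x):
    # Hypothetical tuning cost surface with local wiggles.
    return (x - 0.3) ** 2 + 0.05 * math.sin(20 * x)

def rbf_kernel(a, b, length=0.1):
    # Squared-exponential kernel between two 1-D point sets.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Gaussian-process posterior mean and std at candidate points Xs.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    K_inv = np.linalg.inv(K)
    mu = Ks.T @ K_inv @ y
    var = np.diag(rbf_kernel(Xs, Xs) - Ks.T @ K_inv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI acquisition for minimization.
    z = (best - mu) / sigma
    cdf = 0.5 * (1 + np.vectorize(math.erf)(z / math.sqrt(2)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (best - mu) * cdf + sigma * pdf

# Start from a few random trials, then iterate: model, acquire, evaluate.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 3)
y = np.array([objective(x) for x in X])
candidates = np.linspace(0, 1, 200)
for _ in range(15):
    mu, sigma = gp_posterior(X, y, candidates)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

print(f"best x = {X[np.argmin(y)]:.3f}, cost = {y.min():.4f}")
```

The appeal for workload tuning is that each “evaluation” (a load test or a production observation) is expensive, and Bayesian optimization finds good configurations in far fewer trials than grid or random search; the multi-objective variant Provo describes extends this to trade off several metrics at once.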
With DevOps teams increasingly involved in helping organizations meet their goals, there is a “growing skills gap within a Kubernetes environment where the same humans who were responsible for the applications in a non-Kubernetes world are oftentimes now responsible for the applications in a containerized Kubernetes world, but without the training and development that they deserve and need,” said Provo. “Our goal is to empower the developers into the process, not automate them out of it. We are a huge believer in developer augmented AI, allowing them to give feedback while maintaining control where that makes sense,” said Provo.