
DevFinOps and AI to Provision Exactly the Right Cloud Spend

CAST AI releases new AI-backed features to rightsize cloud spend, saving significant financial and environmental costs, going from FinOps to DevFinOps.
Nov 9th, 2023 8:46am

The tech industry has an addiction to over-provisioning. In an effort to hit four or five nines of uptime, organizations often provision double the number of servers needed in the cloud. This is costly — for budgets and the environment. Of course, under-provisioning can be just as harmful to a business in terms of downtime and reputation.

Until now, how much was provisioned, and when, often came down to individual developer instinct — plus at least 20% of cushion on top. This meant that finance teams often found unwelcome surprises — if they could even decipher their cloud spend from those 100-page Amazon Web Services (AWS) bills.

In the tight economic climate of 2023, increasing complexity and constantly rising cloud costs have spurred a greater adoption of FinOps.

FinOps is a practice that enlists relevant stakeholders around an organization to contain IT expenses, using tooling for cost reporting, monitoring and rightsizing cloud spend. It also means optimizing cloud resources to generate revenue and value. Not every organization is making a big enough dent in their expenses with FinOps, but at least the reporting and monitoring has gone more mainstream.

In fact, the “2023 State of FinOps Report,” published in February by the FinOps Foundation, found that almost half of the organizations surveyed about their FinOps practices were forecasting their costs monthly, while about two-thirds were starting out with cost allocation.

Still, 20% of those interviewed in this year’s report don’t even track the difference between their cloud budget and their actual spending. And less than 5% of those interviewed have implemented proper automation and workload management.

With resources limited and cloud costs higher than ever — and with cost reporting finally nailed down — FinOps has been ripe for an influx of AI that can scale past the instinctive, arbitrary thresholds most developers are frankly just making up.

This week at KubeCon + CloudNativeCon North America, CAST AI, a Kubernetes cost optimization platform, announced two new offerings: Workload Rightsizing, which automates the granular scaling of workload requests in near real time, and PrecisionPack, which trains on your usage and then automates strategic pod positioning.

“If I know where I spend money, wouldn’t it be nice that I deploy some automation that will reduce it, and then report to me that the number has been reduced and why?” asked Laurent Gil, founder and chief product officer of CAST AI, in an interview with The New Stack.

CAST AI looks to now go beyond the finance function of FinOps to foster a culture of DevFinOps — which, as Gil described it, is “bringing this now to the controller of the cost, which are the DevOps and SRE teams, because they are the ones who decide which machine is used, why, which services, how much, when, how to scale, how to scale up and down.”

It’s the natural progression, he said, to operationalize FinOps.

This week, CAST AI also announced $35 million in Series B funding, further proving the demand behind the tech industry’s newest portmanteau. Gil told The New Stack what this means for FinOps and Kubernetes orchestration, how to build a DevFinOps culture, and just what CAST AI has built to attract that market confidence.

Manual FinOps Moves Slowly

In a 2021 whitepaper entitled “Is FinOps the Answer to Cloud Cost Governance?” Gartner presented FinOps as a cloud economics practice that must be done collaboratively across functions. But the analyst also noted that the practice should be owned by a dedicated FinOps team in charge of setting technical guardrails and creating monthly reports.

The paper pointed to the decentralized adoption of cloud services — Infrastructure, Platform and Software as a Service (IaaS, PaaS and SaaS) — alongside the rush to the cloud without initial cost consideration, as the root of the challenge that FinOps seeks to tackle. The authors advocate treating cloud cost governance as a business decision made on behalf of each application.

Business leadership and application teams, the report continued, should work together to budget and forecast cloud costs — though the paper noted this could be a distraction from app teams delivering value. Then, the analyst urged, the vendor procurement teams should look to renegotiate with cloud vendors.

Gartner’s recommendations seem to involve a lot of manual parameter setting and meetings. The report predicted that “The wasteful use of cloud resources can usually be corrected in a year.” Needless to say, a year is a long time in tech, when a lot of things can change.

“The traditional finance was saying: Well, you have all these 100 [virtual machines] that are not very well used. We recommend that you change these 100 VMs from that to that, so your cost goes down,” Gil gave as an example. “We need to use some automation or some techniques that will reduce this cost on the fly.”

FinOps ‘Shifts Left’

Screenshots show the gap between estimated and actual spend before and after optimizing via an autoscaler: the difference shrinks from about 1,000 CPUs to about 50. Cost savings in this example range from 50% to 65%. (Source: CAST AI)

At the start of this year, Eran Kinsbruner, global head of product marketing and brand strategy at Lightrun, a developer observability platform, predicted in The New Stack that FinOps would make a “shift left,” toward engineering.

“When organizations embrace a solid shift-left FinOps approach, it will translate not only into overall cost reduction and cost predictability but also into a much faster resolution of production defects (MTTR reduction) and enhance developer productivity and innovation,” Kinsbruner said.

This developer feedback loop, he argued, needed tightening as much as cloud budgets did. But, as he predicted in that January article, 2023 found organizations choosing between cutting cloud spend and cloud-driven speed to market.

By automating much of that decision process with a tool like CAST AI, organizations could sidestep the trade-off between cost and speed to innovation and optimize for both.

Kubernetes is the most common use case for cloud cost optimization because developers often just eyeball the number of CPUs they need, over-provisioning just in case there’s a traffic spike.

This screenshot shows the client’s checkout service: the pale blue line represents what the developer thought was necessary (about 0.3), the green represents actual usage (CAST AI suggested about 0.075), and the black represents the headroom the app team set on top of that reality (about 0.085). (Source: CAST AI)

The example presented above is of a container for a checkout service for an e-commerce customer. “This container has seven replicas,” Gil said. “It’s a Kubernetes application that is made of containers. Each container has copies — that’s why it scales. When the developer deploys this checkout service, they say to the DevOps team that this container needs 0.3 CPUs.”

That 0.3 figure isn’t hard-won knowledge that even more experienced developers carry; it’s really just a guess at what the checkout container needs. After a few days with CAST AI’s new Workload Rightsizing, the team realized the application only needs 0.075 CPUs.
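CAST AI hasn’t published the internals of Workload Rightsizing, but the general idea of replacing a guessed request with one derived from observed usage can be sketched as a percentile-plus-headroom calculation. Everything below (the function name, the percentile choice, the sample numbers) is illustrative, not CAST AI’s actual algorithm:

```python
def recommend_cpu_request(usage_samples, percentile=99, headroom=0.10):
    """Recommend a CPU request from observed usage.

    usage_samples: per-interval CPU usage in cores (e.g. from metrics).
    percentile: the usage percentile the request should cover.
    headroom: extra fraction on top, for cautious teams.
    """
    ordered = sorted(usage_samples)
    # Index of the requested percentile (nearest-rank method).
    idx = max(0, int(len(ordered) * percentile / 100) - 1)
    return round(ordered[idx] * (1 + headroom), 3)

# A developer guessed 0.3 cores; observed usage actually peaks near 0.07.
samples = [0.04, 0.05, 0.06, 0.05, 0.07, 0.06, 0.05, 0.04, 0.07, 0.06]
print(recommend_cpu_request(samples))  # → 0.077, far below the 0.3 guess
```

A real engine would, as the article notes, track both CPU and memory continuously and re-recommend as traffic patterns shift, rather than computing a one-off number.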

And because this is a new tool, and developers or their organizations will naturally be cautious about adopting it, developers can always opt out of this automated rightsizing of workloads or ask for more headroom between the actual and the provisioned.

Either way, the AI engine will be recording and responding to both memory and CPU usage data in real time.

Automated Kubernetes Pod Positioning

This week, CAST AI also introduced PrecisionPack, a next-generation Kubernetes scheduling approach that removes the randomness from pod placement. PrecisionPack, Gil said, employs a “sophisticated bin-packing algorithm to ensure strategic pod positioning onto the designated set of nodes,” which maximizes resource usage while bolstering efficiency and predictability across your Kubernetes clusters.

This means workload movement is reduced, which in turn improves both uptime and reliability of workloads, while automatically creating a perfect blueprint for cluster cost optimization.
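CAST AI hasn’t detailed its bin-packing algorithm, but the classic first-fit-decreasing heuristic gives a feel for why strategic placement lands pods on fewer nodes than scattered scheduling does. This is a minimal sketch, not PrecisionPack itself; the node capacity and pod requests are invented for the example:

```python
def first_fit_decreasing(pod_cpu_requests, node_capacity):
    """Pack pods onto the fewest nodes using first-fit decreasing.

    pod_cpu_requests: CPU request of each pod, in cores.
    node_capacity: allocatable CPU per node, in cores.
    Returns a list of nodes, each a list of the pod requests placed on it.
    """
    nodes = []   # each entry: the pod requests assigned to that node
    free = []    # remaining capacity of each node
    for pod in sorted(pod_cpu_requests, reverse=True):
        for i, cap in enumerate(free):
            if pod <= cap:          # first node with room wins
                nodes[i].append(pod)
                free[i] -= pod
                break
        else:                       # no existing node fits: add a new one
            nodes.append([pod])
            free.append(node_capacity - pod)
    return nodes

pods = [1.5, 0.5, 2.0, 0.5, 1.0, 0.5]   # six pods, 6.0 cores total
print(len(first_fit_decreasing(pods, node_capacity=2.0)))  # → 3 nodes
```

A production scheduler also has to respect memory, affinity rules and disruption budgets, which is where real implementations get far more complicated than this heuristic.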

Similarly, the app teams can tell the AI engine “to be more aggressive, to be less aggressive, to be just on time, to give headroom,” Gil said.

Finally, it’s important to note that since cloud cost continues to be the main proxy for carbon footprint, by halving your cloud budget, you’re also roughly halving your environmental impact.

Gil spoke of a customer in Germany that examined the rightsizing impact, on average, of using CAST AI versus not using it. The current savings with CAST AI, the customer determined, is 37%. “It means that out of 100 CPUs, 37 have not been used on average,” he said. Those machines aren’t kept running, waiting for the customer to use them again; instead, the cloud provider sells that capacity to another company.

“It’s almost like I’m telling AWS: Look, your data center is actually 37% bigger for those clients that use CAST AI. So you don’t need to use more energy to build a new one yet because you have a lot of resources that are unused that you can give to someone else.”

Don’t forget to respond to the FinOps Foundation’s “2024 State of FinOps Survey.”

TNS owner Insight Partners is an investor in: Pragma, Lightrun, The New Stack.