
Don’t Let Kubernetes Complexity Stall Your Cloud Momentum

26 Oct 2021 7:28am, by Erwin Daria
Erwin is a principal sales engineer at StormForge. After building and leading infrastructure teams, Erwin moved to the vendor side, where he has found success in sales, marketing and product roles at companies like Tintri and Juniper Networks.

In industries characterized by fierce competition and escalating customer demands, velocity has become a key differentiator. Because it supports the rapid development and deployment of applications, the cloud emerged as the Holy Grail for achieving this velocity: easy, on-demand capacity that can scale with a business, all in an OpEx model. Perfect! Public, private and hybrid cloud use soared, and containers and orchestration platforms, specifically Kubernetes, found their place in the development process. The global pandemic only accelerated cloud, container and Kubernetes adoption as companies turned to off-premises solutions to evolve operations, support new ways of working and enhance business resilience.

At this point, reality set in.

The rush to the cloud, and to Kubernetes specifically, created both system and organizational complexity that was poorly understood on multiple fronts. Kubernetes introduced unexpected and unwelcome challenges: one study found that 94% of organizations adopting Kubernetes say it is a source of pain.

The big lesson as organizations struggle to operationalize Kubernetes? Velocity can create friction in the form of high cloud costs, and those added costs can actually slow momentum.

Expanding cloud costs are only one area of impact, though: the complexity that comes with adopting Kubernetes also creates new burdens for the people who have to run it. That raises the question: Are you willing to trade long-term profitability for agility, and risk operational burnout along the way?

Attempts to Simplify Didn’t Fix the Real Problem

To address system complexity, development teams adopted purpose-built observability platforms to understand how an application's software and hardware components relate to one another and how well, together, they serve the end user. Unfortunately, reactive observability platforms focused only on performance don't solve the problem; they merely identify it. Then what?

Organizations pursued other approaches, too, which gave rise to CloudOps and build-run teams charged with making sense of the complexity created by migrating existing applications to the cloud and building new ones there. CloudOps organizations brought together people, processes and tools to focus specifically on how the cloud model affects all areas of IT and the business. The goal of build-run is to give development teams responsibility for the day-to-day performance of applications and services, and to let developers focus on products over projects. Put simply: you build it, you run it.

At the same time, organizations have begun implementing FinOps frameworks that bring together cross-functional stakeholders. This application-level steering committee of sorts is designed to add financial accountability to the cloud's variable-spend model.

That’s a lot of time, people and process thrown at the problem. Yet the problem persists.

We Moved Fast and Broke Things – Now What?

While a once-popular mantra urged software developers to move fast and break things, today the reality is that things are broken. They need to be fixed in a way that allows developers to push the velocity envelope while also ensuring that we don’t get caught in the trap of breaking the same things over and over.

The cloud, once seen as a panacea and the Holy Grail of agility and speed, has become a source of uncontrolled costs and management complexity. This creates problems in two ways:

  • Eroded margins: First, escalating cloud costs erode margins by adding to the total cost of revenue (COR) or cost of goods sold (COGS).
  • Missed SLAs: Second, when development teams are told to reduce cloud costs, they have no clear way to weigh those cuts against their impact on the service-level agreements (SLAs) promised to the business.

When the cost of cloud outweighs the business value it was designed to create, the result is what Sarah Wang and Martin Casado of Andreessen Horowitz call the cloud paradox: You’re crazy if you don’t start in the cloud; you’re crazy if you stay on it.

AI and ML: How a New Class of Tooling Does Help

Fortunately, technologies like machine learning (ML) are making their way into the process and improving teams' ability to optimize the trade-offs between performance and cost. With this technology and a new class of tooling, development teams can do what no individual human could: systematically explore and tune the many variables available so that performance and cost are optimized for each application.

Artificial intelligence (AI) and ML have become integral to supporting the velocity mandate, so it is natural that these tools are reaching the deployment process and that organizations are developing practices to manage their adoption and integration. AIOps tools, for example, now let operations teams automate and improve IT operations using analytics and machine learning.

At the same time, DevSecOps works to automate the integration of security into every phase of software development. Finally, continuous optimization has found its place in the CI/CD pipeline, between continuous integration and continuous delivery, where it uses machine learning to optimize Kubernetes configurations before they reach production. Continuous optimization is key to removing the bottlenecks that slow application delivery: it identifies the issues and finds an ideal configuration to resolve them. Searching that space of options is where ML far outperforms manual trial and error.

By building these capabilities into the CI/CD process, development and operations teams gain tools to combat the surprise cloud bills that slow so many organizations down, and to accelerate their transition to the cloud.

Optimization Solutions for Today’s Cloud-First Development

New resource optimization solutions like StormForge can help enterprises proactively ensure efficiency and make intelligent business trade-offs between cost and performance, without time-consuming and ineffective trial and error.

Using ML, these solutions automate the discovery of optimal application configurations. Some, like StormForge, take a proactive approach: they use performance testing to generate load on the application in pre-production, tune the application to handle that load, then produce the ideal Kubernetes configuration for deploying it to production. This saves engineers time without increasing the cost of running applications or compromising their performance and reliability.
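To make the proactive approach more concrete, here is a minimal Python sketch of the kind of loop such a tool automates: try candidate CPU and memory requests, measure latency under load, reject any configuration that misses a latency objective, and keep the cheapest survivor as a Kubernetes `resources` block. Everything in it is an assumption for illustration: the prices, the 200 ms p95 objective, and `run_load_test`, a hypothetical toy model standing in for a real load test against a pre-production deployment. Plain random search also stands in for the machine-learning-driven search described above; this is not StormForge's method or API.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Illustrative assumptions only: real prices depend on provider and instance type.
CPU_PRICE_PER_CORE_HOUR = 0.031
MEM_PRICE_PER_GIB_HOUR = 0.004
P95_SLO_MS = 200.0  # assumed p95 latency objective taken from an SLA

# Candidate settings in Kubernetes units (millicores and MiB).
CPU_CHOICES = range(250, 2001, 50)
MEMORY_CHOICES = range(128, 2049, 128)


@dataclass(frozen=True)
class Trial:
    cpu_millicores: int
    memory_mib: int
    p95_ms: float
    cost_per_hour: float


def hourly_cost(cpu_millicores: int, memory_mib: int) -> float:
    """Hourly cost of one pod's requests under the assumed prices."""
    cpu_cost = (cpu_millicores / 1000) * CPU_PRICE_PER_CORE_HOUR
    mem_cost = (memory_mib / 1024) * MEM_PRICE_PER_GIB_HOUR
    return cpu_cost + mem_cost


def run_load_test(cpu_millicores: int, memory_mib: int) -> float:
    """Hypothetical stand-in for a real pre-production load test.

    Returns a p95 latency in ms: latency falls as CPU grows and jumps when
    memory is below a working-set size this toy model assumes (384 MiB).
    """
    latency = 60.0 + 90_000.0 / cpu_millicores
    if memory_mib < 384:
        latency += 150.0  # simulated memory pressure
    return latency


def optimize(trials: int = 50, seed: int = 0) -> Optional[Trial]:
    """Random search: keep the cheapest trial that meets the latency SLO."""
    rng = random.Random(seed)
    best: Optional[Trial] = None
    for _ in range(trials):
        cpu = rng.choice(CPU_CHOICES)
        mem = rng.choice(MEMORY_CHOICES)
        p95 = run_load_test(cpu, mem)
        if p95 > P95_SLO_MS:
            continue  # misses the SLO, so it is rejected however cheap it is
        trial = Trial(cpu, mem, p95, hourly_cost(cpu, mem))
        if best is None or trial.cost_per_hour < best.cost_per_hour:
            best = trial
    return best


def to_k8s_resources(trial: Trial) -> dict:
    """Render the winning trial as a container `resources` block."""
    return {
        "requests": {
            "cpu": f"{trial.cpu_millicores}m",
            "memory": f"{trial.memory_mib}Mi",
        }
    }


if __name__ == "__main__":
    best = optimize()
    if best is None:
        print("No configuration met the SLO; widen the search space.")
    else:
        print(best)
        print(to_k8s_resources(best))
```

Treating the SLO as a hard constraint and cost as the objective mirrors the trade-off the article describes: a cheaper configuration only helps if it still honors the service levels promised to the business. An ML-based optimizer would typically reach a good configuration in fewer trials by learning from earlier results instead of sampling blindly, which matters when every trial is a real load test.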

A resource optimization solution can minimize wasted resources, empower developers to make decisions based on business goals, and offload complexity so that they can focus on actually developing. That, after all, is what makes velocity possible.

The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: StormForge, Velocity, Real.

Photo by September20th from Pexels.