How Desirable Is Your Desired State?

A looping system is how Kubernetes handles constant change with tons of moving parts and always-changing variables, like someone trying to balance one too many plates.
Feb 22nd, 2023 7:36am by

Dip your toes into Kubernetes, and you’ll quickly come across the concept of a “desired state.” And no, it doesn’t have anything to do with your dream job, or the perfect suite of cloud native tools, or exactly how you’d like to design your Kubernetes cluster if only your opinion mattered.

The concept is simpler than that: The desired state represents how you want your application and cluster to behave, and a Kubernetes “controller” creates a feedback loop to get there. Still, the cloud native ecosystem has created cultural and technical barriers that make your desired state harder to realize than ever. Barriers like:

  • Only some people know enough about Kubernetes to define the desired state conceptually or on the manifest level.
  • The desired state changes with the market and the customers (how many you have and their expectations).
  • The desired state depends on what cloud native tools you’re using. Some companies use bare YAML, others generate Helm Charts for their deployments or use Kustomize to differentiate between environments.
  • It also depends on the application source code, the way you pack it and the data stored by your application.
  • And as your organization evolves, you want to have a history of every change to your desired state, like you do with version control: how and why you made those changes.

All of this makes us wonder how desirable the desired state is in the first place.

What Is the ‘Desired State’ in Kubernetes?

There’s a common analogy about control loops that helps differentiate between current and desired state. When you set your thermostat to a specific temperature, you establish your desired state. With help from the HVAC equipment, the thermostat’s job is to bring the room from the current state (current temperature) to the desired state (set temperature). The thermostat runs through this check-change cycle repeatedly to keep the states aligned.

In Kubernetes, the desired state is how your infrastructure or application should function once running. The controller is the HVAC equipment — all the mysterious machinery that gets you from 0 to desired state. The temperature you set is your configuration.

In the cloud native world, all this happens with a declarative approach. Instead of configuring infrastructure by listing the steps, known as imperative configuration, you define how every piece of your cluster should operate and let Kubernetes worry about the dirty work. This also fits nicely with Infrastructure as Code (IaC), where managing and provisioning your infrastructure is an automated, not manual, process.

A Kubernetes controller runs continuously and tracks associated objects (pods, services, etc.), looking for variation between the spec field provided in the YAML configuration (the desired state) and the status field, which stores the current state. If there is a variation, it takes action automatically by calling the API server to make the changes required to close that gap.
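
To make the loop concrete, here is a minimal sketch in Python of a single controller pass. It is not how a real Kubernetes controller is written; the `Deployment` class, its two replica fields and the `reconcile` function are hypothetical stand-ins for the spec field, the status field and the calls a controller would make to the API server.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Toy object with a desired value (spec) and an observed value (status)."""
    name: str
    spec_replicas: int      # desired state, as declared in the manifest
    status_replicas: int    # current state, as observed in the cluster


def reconcile(obj: Deployment) -> str:
    """Run one pass: compare spec to status and close one step of the gap."""
    if obj.status_replicas < obj.spec_replicas:
        obj.status_replicas += 1   # stand-in for asking the API server to add a pod
        return "scaled up"
    if obj.status_replicas > obj.spec_replicas:
        obj.status_replicas -= 1   # stand-in for asking the API server to remove a pod
        return "scaled down"
    return "in sync"


def run_until_synced(obj: Deployment, max_passes: int = 100) -> int:
    """Repeat reconcile until it reports 'in sync'; return how many passes made a change."""
    passes = 0
    while passes < max_passes and reconcile(obj) != "in sync":
        passes += 1
    return passes
```

Starting from zero running replicas with three desired, `run_until_synced` makes three changes and then finds nothing left to do, which is the steady state a controller keeps returning to whenever something pushes the status away from the spec.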

This looping system is how Kubernetes handles constant change with tons of moving parts and always-changing variables, like someone trying to balance one too many plates. For example, your actual application state is a mix of the application image, the desired state of Kubernetes and the application’s state, which is stored in a database that’s often external to the cluster itself.

This complexity means it’s frequently impossible for your cluster to achieve that desired state, making it more of a pipe dream, an objective that is always moving.

Your desired state depends, then, on four things:

  • The code and the operating system where it runs (stored as an image in the container registry).
  • The configuration required to deploy the image to Kubernetes.
  • The output of the CD process that deploys the image with the configuration, modifying it if needed.
  • The data that is stored in the application (normally in a database).

The Container Registry and Your Manifests

The Kubernetes controller is always trying to synchronize the desired state and the actual state, but your application state also depends on other things. One of the plates you’re juggling is your container registry, which is responsible for storing all the images that provide the foundation for your containers and pods — your entire infrastructure. Your image will hold the version of the code being deployed.

As your application evolves or the underlying OS changes, your application will have to be updated. Those changes travel through your CI/CD pipeline, which generates new images, and on into your different environments. Container images are “layered” by design, with each layer building on top of the one below to produce the final image the container runs. For instance, you can have a base layer, which is the operating system, with your application running on the second layer, or you can have more layers for more complex applications.

Because Kubernetes (and modern software development style) encourages continuous application deployment, you’re likely going to be changing both layers in time:

  • You’ll update your base OS layer for new versions of the libraries/services your applications depend on.
  • You’ll update your application layer for updated dependencies or new features/bug fixes in your application itself.

Sometimes these changes are also driven by larger architectural shifts. For example, your customer data may have grown so large that your database no longer performs adequately. Changing the database means changing your desired state, which in turn might mean reckoning with additional tweaks to each layer of your images. Or, if you’re dealing with multiregional deployments, your images need to be configurable so pods are deployed evenly across your global infrastructure.

And to manage all these image-level changes, you should pin specific tags that point to a verified version of both the OS and application layers, instead of simply using latest and hoping for the best. With pinned tags, a new image version is no longer picked up automatically after an update, so you need to trigger the update yourself when you are ready.
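
As a rough illustration of that rule, the check below flags image references that rely on latest, either explicitly or by leaving the tag out. The function name `is_pinned` is ours, and the parsing is deliberately simplified; treat it as a sketch, not a full image reference parser.

```python
def is_pinned(image: str) -> bool:
    """Return True if the reference uses a digest or an explicit tag other than 'latest'."""
    if "@" in image:                          # digest reference, e.g. app@sha256:...
        return True
    last_segment = image.rsplit("/", 1)[-1]   # skip any registry host:port prefix
    if ":" not in last_segment:               # no tag given, so 'latest' is implied
        return False
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


def unpinned_images(images: list) -> list:
    """Return the image references that would float with 'latest'."""
    return [image for image in images if not is_pinned(image)]
```

Run against the image fields collected from your manifests, `unpinned_images` returns the references worth reviewing before they reach production.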

All these edits, tweaks and possibilities have real-world implications for your container registry as it tries to supply the images required to bring your cluster back to the desired state.

Your CD Process: Desired to Deployed

Kubernetes controllers make sure that your application state is what you have defined, but somehow you need to move from your local definition of the configuration to the desired state as defined in your cluster, making sure that nothing breaks in the middle.

From our perspective, Argo CD and Flux are the leaders in cloud native continuous delivery (CD). They both follow GitOps patterns of using git repositories as the single source of truth for defining your infrastructure/application’s desired state and are deployed inside your cluster as applications.

CD tools watch your git repositories for changes in your manifests (whether they’re plain YAML-based manifests, Helm charts or Kustomize applications) and automatically sync them with your cluster. In order to avoid problems, automatic tests of different kinds are performed. If a test finds an issue, the developer is asked to change the configuration and try again. Even if that works most of the time, it’s good practice to have some eyes reviewing the changes to make sure you don’t introduce a problem not covered by your testing.

Between automated tests and the pull/merge requests developers make for peers to approve or deny their changes, these processes are designed to improve the end-user experience by allowing for frequent releases. It might seem cumbersome at times, but by reducing the time required to deploy, you can ship higher-quality code and improve security by controlling who puts code in production.

CD pipelines can work with deployments that are pull-based or push-based. In push-based deployments, you use some kind of action to trigger the CD process, like a GitHub Action or Webhook. In pull-based deployments, an agent identifies when the changes have been made and triggers the process automatically. In any case, the build will likely consist of many steps that include validations and verifications.
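
The difference between the two models is mostly about who starts the process. The sketch below contrasts them in Python under simplifying assumptions: `Deployer`, `on_push_event` and `PullAgent` are hypothetical names, a revision is just a string, and the real build, validation and verification steps are reduced to recording what was deployed.

```python
from typing import Callable, List, Optional


class Deployer:
    """Records which revisions were deployed; stand-in for the real CD steps."""

    def __init__(self) -> None:
        self.deployed: List[str] = []

    def deploy(self, revision: str) -> None:
        self.deployed.append(revision)


def on_push_event(deployer: Deployer, revision: str) -> None:
    """Push-based: an outside trigger, such as a webhook, hands over the new revision."""
    deployer.deploy(revision)


class PullAgent:
    """Pull-based: the agent checks the repository itself and deploys when it changes."""

    def __init__(self, deployer: Deployer, get_head: Callable[[], str]) -> None:
        self.deployer = deployer
        self.get_head = get_head              # returns the current revision of the repo
        self.last_seen: Optional[str] = None

    def poll_once(self) -> bool:
        """Deploy if the head revision differs from the last one seen; report whether it did."""
        head = self.get_head()
        if head == self.last_seen:
            return False
        self.deployer.deploy(head)
        self.last_seen = head
        return True
```

In practice a pull-based agent would call something like `poll_once` on a timer, while a push-based setup would wire `on_push_event` to whatever event your Git host emits.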

Some advanced tools, like the ones mentioned here, support both flavors of GitOps. Some IaC tools soften the imperative approach by offering idempotence (you can run them several times and the result won’t vary), but you still need to code the steps yourself. There are pros and cons to both, but in the end, the important thing is to define your desired state in the best way possible.
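
Idempotence is easier to see in a small example. In this Python sketch, both function names are hypothetical: `ensure_label` can be run any number of times with the same result, while `append_label_note` changes the outcome on every call.

```python
def ensure_label(labels: dict, key: str, value: str) -> bool:
    """Idempotent step: make sure labels[key] == value; return True only if something changed."""
    if labels.get(key) == value:
        return False
    labels[key] = value
    return True


def append_label_note(notes: list, key: str, value: str) -> None:
    """Non-idempotent step: every call adds another entry, so repeated runs pile up."""
    notes.append(f"{key}={value}")
```

The first style is the one you have to write by hand when an imperative tool does not do it for you: check the current state, then act only if it differs from what you want.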

Reconciliation and Drift in the GitOps Era

When you’re employing GitOps and IaC, your CD trigger is often the divergence between the actual and desired states. But what happens if the actual state changes without your direct intervention or the controller can’t achieve the desired state? Kubernetes favors declarative configuration, but it also responds to imperative kubectl commands that change the state without relying on YAML manifests. The resulting gap is what we call drift, and it can happen for many reasons, like changes in the application configuration, runtime environment, autoscaling or changes to one or more layers of your images.

You might even need to quickly make a manual configuration change in a specific pod, such as providing it with a higher memory request to keep it from crashing and affecting the end-user experience. You now have an out-of-band change that doesn’t align with the desired state as you previously defined it, but you’re also accomplishing your goal of resiliency and reliability.
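
A drift check can be as simple as comparing the configuration you declared with what is actually running. The following is a minimal sketch in Python, assuming both sides are already loaded as nested dictionaries; `find_drift` is a hypothetical helper, and real tools handle lists, defaults and server-populated fields far more carefully.

```python
from typing import Any, Dict, List


def find_drift(desired: Dict[str, Any], live: Dict[str, Any], path: str = "") -> List[str]:
    """Return dotted paths where live differs from desired.

    Only fields present in desired are checked, so extra fields in the
    live object (such as status) are ignored.
    """
    drifted: List[str] = []
    for key, want in desired.items():
        here = f"{path}.{key}" if path else key
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drifted.extend(find_drift(want, have, here))
        elif want != have:
            drifted.append(here)
    return drifted
```

If the declared memory request is 256Mi and someone has manually raised it to 512Mi in the cluster, the function reports the path to that one field and nothing else, which is exactly the out-of-band change described above.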

Every time you make a manual change, your CD pipeline will try to revert it, so you need to either disable your CD or modify your desired state in configuration to make it permanent. It’s no fun if you find the perfect short-term solution for your production problem but the CD pipeline keeps “fixing it” by moving into a defined state that is not working.

In addition, your desired state also depends on policy. How can you be sure that the current policies are in place? Are you using verified images, avoiding the “latest” version in favor of a specific approved one? There are lists of best practices and policies that can help you improve the security and maintainability of your deployment, which you can apply automatically.

Open Policy Agent and Kyverno are the most commonly used projects to apply policy for cloud native infrastructure/applications. Every time your policy changes — for instance, somebody defines the tags required for cost management — so does your desired state, which often means additional configuration changes.
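
Neither Open Policy Agent nor Kyverno policies are written in Python, so treat the following as a plain illustration of the idea rather than the syntax of either tool. It assumes the required tags are expressed as Kubernetes labels; the `missing_labels` function and the label names in the usage note are hypothetical.

```python
from typing import Any, Dict, List


def missing_labels(manifest: Dict[str, Any], required: List[str]) -> List[str]:
    """Return the required label keys that the manifest's metadata does not define."""
    metadata = manifest.get("metadata") or {}
    labels = metadata.get("labels") or {}
    return [key for key in required if key not in labels]
```

A manifest labeled only with `team` would fail a policy that also requires `cost-center`, and the fix is a configuration change: the policy moved, so the desired state has to move with it.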

Which one wins? The pipe dream of desired state or what your end user wants, an application that actually works? Now you, or your CI/CD pipeline, need to start taking specific action to reduce drift through reconciliation.

How Do You Control Your Desired State from Container to Production?

When you look at the reality of desired state, container complexity and reconciliation/drift, the dream of “you build it, you deploy it” seems like just that — a dream. There are too many moving pieces for developers to maintain their own applications in production, despite the story told in job descriptions for roles in DevOps, GitOps, applications site reliability engineering and beyond.

No matter how heavily your organization invests in DevOps, platform engineering or GitOps, the reality is that you can never be sure of achieving the desired state as you’ve configured it. You need to adapt to the situation, update the configuration of your desired state in a timely way and compare it at every moment with the actual state deployed in your cluster.

Many organizations are addressing issues around desired state with internal developer platforms, which abstract the complexity of Kubernetes configuration behind a curated catalog of services and tools, but those introduce new challenges alongside their benefits. Platform engineers waste time digging into auto-generated configurations for answers when something goes wrong, and there is always a balance between giving developers what they request (the latest version of many libraries and applications) and making sure that it works appropriately and safely by following other policies.

Based on how quickly best practices are changing, we’re far from seeing the cloud native community agree on any one strategy. There are projects designed to simplify the process: Developers don’t need to know the details of Kubernetes if they use the new X deployment language that abstracts the concept into a higher-level language. The idea is great and has been put into practice in the past for infrastructure (Puppet, Ansible, Salt and others). But complex problems aren’t normally solved in the abstraction layer alone.

In the end, you need a tool that brings anyone working with Kubernetes configurations, from developers to DevOps to platform engineers, onto the same efficient platform where they can better understand the desired state by:

  • Understanding which images are being used.
  • Visualizing every resource created and its dependencies.
  • Validating configurations against schemas and policies.
  • Reconciling versions of configurations across branches and environments with visual diffs.
  • Collaborating with other teams to prevent drift.

The desperate need for this workbench is why we’re building Monokle Cloud and Monokle Desktop, a free set of tools for exploring and analyzing Kubernetes configurations and optimizing GitOps workflows.

Need a little more evidence? Sign in to Monokle Cloud with your GitHub profile to see the visual difference between two release branches in the Argo CD repository. Or check out this quick walk-through video from one of our lead engineers, Wito Delnat.

Join us in realizing the dream of a desired state!

TNS owner Insight Partners is an investor in: Pragma.