We’re well into 2021, and as far as GitOps is concerned, we continue the hype cycle just where we left off last year. We do so for perfectly good reasons, of course. Keeping configuration in a git repository and applying it when something changes has always made perfect sense. But just because something makes perfect sense doesn’t mean it is trivial to implement. As it turns out, automation that only runs when configuration in the repository has changed is too limited to deal with many of the failure scenarios of operating distributed systems.
Kubernetes separates concerns at just the right level to overcome this, by providing an API to declare the desired state, and a controller-based reconciliation loop to keep desired and current state in sync. And with that, the idea that always made perfect sense became the preferred way and got its own buzzword: GitOps.
For workloads on top of Kubernetes, there is no shortage of tools if you want to adopt GitOps. And if you can make it past the noise from all the vendors jumping on the bandwagon and rebranding their legacy products as GitOps, you can pick one and avoid having to deal with the various edge cases yourself.
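To make the difference between "apply on commit" and "reconcile continuously" concrete, here is a minimal sketch in Python. It is not how Kubernetes controllers are implemented; the `Cluster` class and the `reconcile` function are hypothetical stand-ins that only illustrate the idea of comparing desired and current state on every pass.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for a live system: workload name -> replica count."""
    replicas: dict[str, int] = field(default_factory=dict)


def reconcile(desired: dict[str, int], cluster: Cluster) -> list[str]:
    """Run one reconciliation pass and return the corrective actions taken."""
    actions = []
    for name, want in desired.items():
        have = cluster.replicas.get(name, 0)
        if have != want:
            cluster.replicas[name] = want
            actions.append(f"scale {name}: {have} -> {want}")
    # Remove anything that is running but no longer declared.
    for name in sorted(set(cluster.replicas) - set(desired)):
        del cluster.replicas[name]
        actions.append(f"delete {name}")
    return actions


desired_state = {"web": 3, "worker": 2}  # what the repository declares
cluster = Cluster()

first_pass = reconcile(desired_state, cluster)   # initial rollout
cluster.replicas["web"] = 1                      # drift: a failure, no commit involved
second_pass = reconcile(desired_state, cluster)  # the loop repairs it anyway
third_pass = reconcile(desired_state, cluster)   # nothing left to do
```

The second pass is the interesting one: nothing changed in the repository, yet the drift is still corrected. Automation that is only triggered by commits would never have run.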
But any system can only ever be as reliable as its dependencies. And your cluster is the foundation of your system. What does all that advanced GitOps deployment automation really get you, if the cluster it’s built on is the weakest link?
Since we are all human, manual operations are prone to mistakes. And many common mistakes when managing a cluster manually will have a severe negative availability impact on your application workloads.
Anything you build on top of a manually managed cluster will therefore be as unreliable as that cluster.
This leads to two questions. First, when enough teams adopt GitOps to make it the hype it is, why are the majority of clusters still managed manually via a UI? And second, of course, how do we fix that?
If you need a reminder of how dominant UIs really are, remember all those VMware installations that are primarily managed via UI. This UI is apparently so widespread that the ability to manage Amazon Web Services (AWS) with it is heavily advertised by both VMware and AWS. And VMware is far from the only example. Cloud providers’ own UIs also pay tribute to this reality. Need another example? Let’s also not forget multicloud Kubernetes UIs like Google Anthos, Azure Arc or SUSE Rancher. Heck, there are even companies that exist solely to build alternative, supposedly better, AWS UIs.
In this UI-driven world, despite all the benefits it promises, infrastructure as code is held back by its intimidatingly steep learning curve. And having Kubernetes in the mix just makes this learning curve even steeper.
But even if you are already past the steep learning curve, infrastructure as code and Kubernetes induce a hefty upfront effort for migration projects and greenfield projects alike. You will have to write loads of code from scratch, and also set up and integrate numerous prerequisites before you can maintain everything as a team using automation.
Worse, on top of this, infrastructure as code alone is not enough to achieve full-stack GitOps. Kubernetes’ separation of concerns, which made GitOps easily viable, only does so for cluster workloads. When it comes to the cluster’s infrastructure, we still face the same complexities as before, unless we have a system in place that equally separates declaring state from reconciling it for the cluster infrastructure.
Now that we have a better understanding of the problem, let’s take a look at what to do about it.
One example of a system that provides the missing separation of concerns is a managed Kubernetes offering like AKS, EKS or GKE. With these, you can declare the desired state via the API and the cloud provider takes responsibility for keeping desired and current state in sync.
Another example is ClusterAPI, a cloud native community initiative to achieve the same outside the hyperscalers’ walled gardens. And via VMware’s acquisition of Heptio, ClusterAPI has made it into vSphere, unlocking this capability for vSphere’s vast install base.
When using a managed Kubernetes solution or ClusterAPI, infrastructure as code is the perfect fit to maintain and apply the desired state. But this still leaves the issue of having to build everything from scratch.
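As a rough illustration of that division of labor, the sketch below compares a declared set of node pools with what is currently observed and lists the changes needed to bring them back in sync. Everything in it is hypothetical: `NodePool`, `plan`, the pool names and the machine types are invented for this example and do not correspond to any provider’s API or to ClusterAPI’s resources.

```python
from typing import NamedTuple


class NodePool(NamedTuple):
    """Hypothetical node pool declaration: machine type and node count."""
    machine_type: str
    nodes: int


def plan(desired: dict[str, NodePool], current: dict[str, NodePool]) -> list[str]:
    """Compare declared and observed node pools and list the changes needed."""
    steps = []
    for name in sorted(desired.keys() | current.keys()):
        want, have = desired.get(name), current.get(name)
        if have is None:
            steps.append(f"create {name} ({want.machine_type} x{want.nodes})")
        elif want is None:
            steps.append(f"delete {name}")
        elif want.machine_type != have.machine_type:
            steps.append(f"replace {name} ({have.machine_type} -> {want.machine_type})")
        elif want.nodes != have.nodes:
            steps.append(f"resize {name} ({have.nodes} -> {want.nodes})")
    return steps


# Declared in the repository (values are illustrative only).
desired = {
    "default": NodePool("medium", 3),
    "batch": NodePool("large", 2),
}
# Observed state: one pool was resized and another added by hand in a UI.
current = {
    "default": NodePool("medium", 5),
    "debug": NodePool("small", 1),
}

steps = plan(desired, current)
```

The repository only ever holds the `desired` part. Working out and carrying out the steps is the job of whichever system does the reconciling, which is exactly the work a team would rather not do by hand.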
For anyone coming from application development, it may come as a surprise that there is no framework for a popular use case like this. But infrastructure as code, as an ecosystem, is still in its infancy.
Any similarly popular use case on the application development side will often have multiple, actively maintained frameworks in any language: Django, Rails, Spring Boot or Gatsby, and their respective alternatives, just to name a few examples.
But the lack of frameworks cannot be explained merely by the fact that the infrastructure as code ecosystem is still emerging. Only the paradigm shift to containers and Kubernetes provided a clean enough abstraction to prevent application requirements from leaking into the infrastructure layer. Before this, infrastructure teams often had to keep configuration application-specific, and that left no common requirements for a reusable framework to add value.
As my contribution to advancing the infrastructure as code ecosystem, I don’t only write articles about this. I went all in and wrote the code. Kubestack, the Terraform GitOps framework I maintain, integrates everything teams need to build GitOps automation for Kubernetes cluster infrastructure and cluster services into one free and open source framework. If having the benefits of infrastructure as code and a GitOps workflow without the upfront and long-term effort sounds too good to be true, maybe you should give Kubestack a try.
One last thought about UIs. I’m not saying UIs have no place in the future of GitOps. Well-designed UIs, after all, can drastically improve the user experience. I just believe that instead of changing the desired state directly, they should help teams change the code in the repository.
Amazon Web Services and VMware are sponsors of The New Stack.
Internal image via the author.