From GitOps to Adaptable CI/CD Patterns for Kubernetes At Scale

GitHub has changed the way development teams manage code changes, improving collaboration from “commit” all the way to “merge.” Having started with Jenkins-centric continuous integration, many organizations are now trying to improve sub-optimal pipelines by taking a closer look at their GitHub workflows.
Weaveworks, the organization that originally coined the term GitOps, describes it as “Operations by Pull request.” The most valuable part of GitOps is that it enables “environments as code”: you define policies and environment configuration alongside your code.
While there’s a lot to love about this approach, there are also aspects of GitOps that are not necessarily suitable in an enterprise context. Elaborating on this, Andrew Phillips, a product manager in Google Cloud Platform’s DevOps division, explains that there are primarily two challenges with common implementations of GitOps: “First, running your release processes and approvals largely through source code repositories isn’t ideal in larger organizations. Second, it’s not always best to choose a ‘pull’ approach to make environment as code real; a ‘push’ approach can be more suitable in many cases.” These issues crop up with organizational complexity, and this is where simple implementations of GitOps fall short.
As a result, it is imperative to retain the best parts of GitOps and replace certain parts of it with alternatives more suited to your context. In this post, we discuss a better way to think about GitOps with Kubernetes so that you can apply a set of workable patterns, keeping in mind your own organization’s complexities and constraints.
Dealing with Enterprise Complexity
Large organizations typically have multiple development teams, each with its own practices. To maintain velocity and quality, they must give each team the autonomy to build, deploy and maintain the code and applications it owns. At the same time, organizations must prioritize and plan for the scaling of their applications and clusters.
There are two aspects to scaling a CI/CD pipeline. The first is organizational: new teams and new team members join. The second is operational: your application’s usage grows and needs to be supported by additional resources. Additional Kubernetes clusters and larger environments mean more complexity and a greater risk of failure during rollout. The more your application scales, the greater the cost of a failure becomes. There is more at stake for a production application that serves a million queries per second than for one that serves just ten.
These two goals of autonomy and scale seem to be at odds with each other. This is where GitOps fits: it supports both without sacrificing either.
“GitOps provides a set of best practices to organize code and configuration for applications and environments in an automated way,” Dan Lorenc, software engineering manager at Google Cloud, said.
If infrastructure as code was centered around instances, GitOps centers around environment configuration. In a GitOps model, all changes and updates to your production environment happen through changes to code repositories. Instead of manually configuring an environment, GitOps enforces changes to the environment as a by-product of code changes. This way, GitOps creates a link between your repositories and environments and enables “environments as code.”
“The power of integrating repositories and environments is that you can now extend your access controls, testing policies and authorization from your code to your environment configuration,” Lorenc said. This helps to unify your CI/CD pipeline while still enforcing development patterns that adapt to your organization.
One Repository Per Team
GitHub makes it possible to maintain and deploy from multiple code repositories, but with every additional repository, the management overhead increases. In an organization with multiple development teams, the dilemma is whether to create many fine-grained repositories or a few shared ones. If you choose many, each team ends up with separate repositories for development, staging and production, which leads to repository sprawl.
Google’s Andrew Phillips recommends an alternative. He advises, “As far as possible, it makes sense to have fewer repositories than many. Instead of creating additional repositories you could create three branches within that repository for each environment — for example, one branch for dev, one for staging and another for prod. This way, each environment has its own commit history and config, so you’ll still have the benefits of isolation while avoiding the maintenance overhead.”
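One way to picture the branch-per-environment layout is as a lookup from branch name to deployment target. The sketch below is illustrative only: the branch names come from the example above, while the `Environment` fields, the namespace names and the `environment_for_branch` helper are assumptions made for this example, not part of any tool mentioned here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of one deployment target."""
    name: str
    namespace: str            # assumed Kubernetes namespace name
    requires_approval: bool   # assumed policy flag, e.g. manual sign-off


# One repository, three long-lived branches, one environment per branch.
BRANCH_ENVIRONMENTS = {
    "dev": Environment("dev", "team-a-dev", requires_approval=False),
    "staging": Environment("staging", "team-a-staging", requires_approval=False),
    "prod": Environment("prod", "team-a-prod", requires_approval=True),
}


def environment_for_branch(branch: str) -> Optional[Environment]:
    """Return the environment a branch deploys to, or None for other branches."""
    return BRANCH_ENVIRONMENTS.get(branch)
```

A pipeline could call `environment_for_branch` with the branch that triggered the build and skip deployment when it returns `None`, so feature branches never reach a shared environment.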
It’s often the case that organizations end up with a very granular and unnecessarily complex system that is easy to create but hard to maintain. Phillips’ advice goes against this trend and offers a pragmatic approach to managing and scaling code repositories.
By following this model, you’ll likely end up with one repository per team, which is better than multiple repositories for every team, but in a large organization with many teams, this is still hard to manage. Taking the idea of fewer repositories further, you can have a shared repository for each environment. For example, a staging cluster that five teams deploy to can have its config hosted in a single repository. This way you get the best of both worlds — separation of concerns and ease of management.
‘Push’ Over ‘Pull’
Another important consideration with GitOps is how runtimes are synced when numerous developers contribute code multiple times a day. GitOps is often described as operating on a “pull model,” in which a process running in the live cluster tries to realize the desired environment configuration after changes to it have been committed. This is problematic because the team becomes aware of failures only after the repository storing the environment configuration has been updated. When this happens, teams can roll back the commit, but the failed change stays in the commit history, which is then no longer a clean record of good states.
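The pull model described above can be sketched as a single reconciliation pass. This is a minimal illustration rather than how any particular tool works: `reconcile_once`, its callable parameters and the dictionary-based config are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict

Config = Dict[str, str]


def reconcile_once(
    read_desired: Callable[[], Config],
    read_live: Callable[[], Config],
    apply: Callable[[Config], None],
) -> bool:
    """Run one pull-style sync pass; return True if a change was applied.

    The desired config is read from the repository, so it has already been
    committed by the time this runs. If apply() raises, the failing commit
    is still part of the repository history.
    """
    desired = read_desired()
    live = read_live()
    if desired == live:
        return False  # cluster already matches the committed config
    apply(desired)
    return True
```

In a real cluster an agent would run a pass like this on a schedule or in response to repository events. The point of the sketch is the ordering: the commit comes first and the attempt to apply it comes second.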
Another issue with relying on a pull model to deploy changes is that anyone with commit rights becomes a production deployer. This is not a good security practice as your repository becomes your security surface, and the security controls around repositories are less powerful than the security controls around your runtime.
“A better alternative is to avoid the ‘pull’ model altogether and instead use a ‘push’ model to deploy config and code,” Phillips said. “In this model, rather than having the cluster ‘watch’ for updates, you call the API, which allows you to apply checks and custom security controls as part of your pipeline, both just before and after the deployment.”
Before the “push” command, the new code goes through a “dry run” or a “preview environment,” and if it fails this test, the deployment is not carried out. If it succeeds, the configuration is pushed and validated using a smoke test or other verification step, and only then do you commit it to your master branch. This increases confidence in each deployment and results in a stable master branch with a commit history of successful deployments. You can still include manual approvals as part of the pipeline to have more control where required, but the goal is to automate the approval and verification process as much as possible.
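The ordering of the push model (dry run, push, verify, then commit) can be sketched as follows. This is a rough outline under stated assumptions: `push_deploy` and its callable parameters are hypothetical, and because the text does not say what should happen when the smoke test fails, the sketch simply skips the commit and reports the outcome.

```python
from typing import Callable, Dict

Config = Dict[str, str]


def push_deploy(
    config: Config,
    dry_run: Callable[[Config], bool],
    push: Callable[[Config], None],
    smoke_test: Callable[[], bool],
    commit: Callable[[Config], None],
) -> str:
    """Deploy config with a push-style pipeline and report the outcome."""
    # Gate 1: validate before anything touches the live environment.
    if not dry_run(config):
        return "rejected"

    # The pipeline calls the deployment API instead of waiting for the
    # cluster to notice a commit.
    push(config)

    # Gate 2: verify the live environment after the push.
    if not smoke_test():
        return "unverified"  # nothing is committed; follow-up is left to the caller

    # Only verified configurations are recorded in the main history.
    commit(config)
    return "committed"
```

Because `commit` is called last, the history it writes to contains only configurations that passed both gates, which is what keeps the master branch a record of successful deployments.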
The tools to enable these specific patterns of GitOps are yet to reach maturity, but there are some promising options already. Jenkins X comes closest in terms of bringing a model of GitOps to reality. It focuses on automating management of various environments, and providing fast feedback on code changes before they’re merged to master. Weave Flux is another open source tool that is based on the idea of declaring a desired state for deployments in Git. Flux then ensures that the desired state is achieved in your live cluster. While the tools are still evolving, the patterns discussed here can be applied irrespective of the tools that enforce them.
Key Takeaways:
In this series, we’ve detailed how to create and optimize CI/CD pipelines for cloud native applications.
- In the first post, we talked about how you can quickly spin up local environments with Skaffold, and automate deployments in the cloud using Google Cloud Build.
- In the second post, we discussed how to harden and secure your CI/CD pipeline using the metadata API Grafeas. We also covered vulnerability scanning with Container Registry and enforcing security policies with Kritis and Binary Authorization.
- Finally, in this post, we discussed specific patterns and modifications to optimize and scale your GitOps-style workflow.
Whether you’re just starting out with building a CI/CD pipeline, or are in the process of revamping an outdated one, the tools, best practices, and patterns discussed here can be applied. In doing so, you’ll transform the way you build and ship modern, cloud-native applications.