Argo Rollouts: How Intuit Does Blue/Green and Canary Deployments on Kubernetes
Without native support for blue/green and canary deployments in Kubernetes, developers are left on their own to work that out.
Intuit acquired Applatix, the company behind the Argo workflow engine for Kubernetes, in early 2018, and focused that team’s efforts on moving the financial software firm to cloud-native technology.
Within 18 months, it had 2,000 services running on Kubernetes. It now touts that creating or upgrading a service takes less than 10 minutes, including setting up automated build and deploy pipelines, and that a rollback takes less than five minutes for its QuickBooks software.
Intuit has 1,200 engineers using Kubernetes and making around 1,300 deployments a day, Thomson said. They use two deployment strategies: recreate (version A is terminated, then version B is rolled out) and rolling update (version B is slowly rolled out, replacing version A). But the company found its legacy software couldn’t do rolling updates, and it had new applications and strategies it wanted to employ.
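The difference between the two strategies can be seen in a toy Python sketch. It only tracks which version each pod runs at each point in time; the function names are hypothetical, and a real rolling update is also governed by surge and availability settings that are left out here.

```python
def recreate(replicas):
    """Terminate every version A pod, then start version B pods."""
    return ["A" * replicas, "", "B" * replicas]


def rolling_update(replicas):
    """Replace version A pods with version B pods one at a time."""
    return ["B" * new + "A" * (replicas - new) for new in range(replicas + 1)]


recreate_timeline = recreate(3)
rolling_timeline = rolling_update(3)

# An empty string marks a moment with no running pods.
print("recreate:      ", recreate_timeline)
print("rolling update:", rolling_timeline)
```

In this sketch, recreate passes through a moment with no pods serving traffic, while the rolling update always keeps the full pod count but mixes both versions.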
It uses a GitOps approach, with Git as the single source of truth for the desired state of infrastructure and applications. All Kubernetes manifests are stored in a Git repository, and it uses the open source tool Argo CD, which compares the manifests to what’s in the cluster, identifies the differences, and helps the user reconcile those differences.
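The compare-and-report step can be illustrated with a minimal Python sketch. This is not Argo CD's implementation: the `diff_state` helper and the sample data are hypothetical, and real manifests are nested Kubernetes objects rather than flat dictionaries.

```python
def diff_state(desired, live):
    """Report how the live (cluster) state differs from the desired (Git) state.

    Both arguments map a resource name to a flat dict of fields.
    """
    changes = {}
    for name in sorted(set(desired) | set(live)):
        if name not in live:
            changes[name] = "missing in cluster"
        elif name not in desired:
            changes[name] = "not in Git"
        elif desired[name] != live[name]:
            changed = sorted(
                key for key in set(desired[name]) | set(live[name])
                if desired[name].get(key) != live[name].get(key)
            )
            changes[name] = "out of sync: " + ", ".join(changed)
    return changes


# Hypothetical example data, invented for illustration.
desired = {
    "web": {"image": "web:2.0", "replicas": 3},
    "api": {"image": "api:1.4", "replicas": 2},
}
live = {
    "web": {"image": "web:1.9", "replicas": 3},
    "worker": {"image": "worker:0.3", "replicas": 1},
}

report = diff_state(desired, live)
for name, status in report.items():
    print(f"{name}: {status}")
```

Because the report is computed from the two states alone, running it again after any change to the cluster or the repository gives a fresh answer, with no pipeline state to keep in step.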
One of the problems Intuit encountered is that GitOps is declarative, while blue/green and canary methods are imperative.
“We wanted more of a declarative approach and integrate with CI tools so developers would have a simple, clean pipeline from commit to deploy and master,” Thomson said.
Its first approach to blue/green and canary deployments involved Jenkins scripting, but that did not fit the GitOps model.
“[It was] taking a lot of state and putting that into our pipeline, which meant we had two different sources of truth. [There were] a lot of assumptions with Jenkins. If anything changed, if there was anything different with the cluster, the pipeline would fail. Users would have to go in and modify the cluster to get it back into the right state,” he said.
It also was extremely brittle and required more work than they wanted.
Next, it tried pushing Jenkins logic into deployment hooks. There were still a lot of assumptions from Jenkins; the approach was still neither idempotent nor transparent, and it still required a lot of work. It was still not following the GitOps model.
“It worked, but we weren’t happy with it,” Thomson said.
The first two approaches were very imperative. “We said, ‘Do this, do that.’ But we wanted to take advantage of the things that make Kubernetes so great: its declarative nature,” he said.
So they set out to build a custom controller, which after six months of work became Argo Rollouts. Its design requirements were that it:
- Codifies the deployment orchestration in the controller. Developers don’t have to worry about the deployment logic.
- GitOps friendly (idempotent) — regardless of the state of the cluster, it should be able to handle it and get back to a solid, steady state.
- Runs inside the Kubernetes cluster, so there is no need to provide credentials to an outside source.
- Easy adoption and migration from deployments.
- Feature parity with deployments.
You install a custom resource called a Rollout, then a controller that operates on that resource. It handles ReplicaSet creation, scaling and deletion — the full lifecycle of the ReplicaSet. They also wanted to have a single desired state as a PodSpec. It supports manual and automated promotions and integrates with Horizontal Pod Autoscaler (HPA).
Since developers vary in their approaches to canary deployment, they wanted to make it as flexible as possible. It allows the user to define the steps they want to take to transition from old version to the new.
Thomson explained in a blog post that the Argo Rollouts controller enables both old and new versions to run simultaneously in blue/green deployments by managing the ReplicaSets and filtering the traffic by modifying Service selectors:
The ReplicaSets are created from the spec.template field of the Rollout and services are specified in the spec.strategy.blueGreen field. Each ReplicaSet created by the rollout has a unique hash (in the rollouts-pod-template-hash label) that the controller will add to the service’s selector to limit traffic to that one ReplicaSet.
When the spec.template field changes, the Rollout will create a new ReplicaSet and wait for it to become available. Once that occurs, the controller will modify the preview Service’s selector to send traffic to the new ReplicaSet and enter a paused state by setting the spec.paused field to true. The Rollout then waits until an external operator (like a CD system or a user) changes the value to false. After the Rollout is unpaused, it will modify the active Service’s selector to point it at the new ReplicaSet and scale down the old one.
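The blue/green flow described above can be sketched as a small state machine in Python. This is a toy model under stated assumptions, not the Argo Rollouts controller: the `BlueGreenRollout` class and its fields are hypothetical, the hash is a stand-in for the pod template hash, and new ReplicaSets are assumed to become available immediately.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BlueGreenRollout:
    """Toy model of a blue/green Rollout; not the real controller."""
    template: str                  # stands in for spec.template
    replicas: int = 3
    paused: bool = False           # stands in for spec.paused
    replica_sets: dict = field(default_factory=dict)  # hash -> replica count
    active_selector: str = ""      # hash selected by the active Service
    preview_selector: str = ""     # hash selected by the preview Service

    def _hash(self):
        # Hypothetical stand-in for the pod template hash label value.
        return hashlib.sha256(self.template.encode()).hexdigest()[:8]

    def reconcile(self):
        """One idempotent pass that moves state toward the template."""
        new = self._hash()
        # Ensure a full-size ReplicaSet exists for the current template
        # (assumed to become available at once in this sketch).
        self.replica_sets[new] = self.replicas
        if not self.active_selector:
            # First deployment: nothing to preview, so go live directly.
            self.active_selector = self.preview_selector = new
        elif self.active_selector == new:
            pass  # steady state, nothing to do
        elif self.preview_selector != new:
            # Point the preview Service at the new ReplicaSet, then pause.
            self.preview_selector = new
            self.paused = True
        elif not self.paused:
            # Unpaused by an operator: switch active traffic, scale down old.
            old, self.active_selector = self.active_selector, new
            self.replica_sets[old] = 0


rollout = BlueGreenRollout(template="web:1.0")
rollout.reconcile()                      # initial deployment
blue = rollout.active_selector

rollout.template = "web:2.0"             # a change to the template
rollout.reconcile()
rollout.reconcile()                      # repeating the pass changes nothing
green = rollout.preview_selector
still_blue = rollout.active_selector == blue and rollout.paused

rollout.paused = False                   # an external operator approves
rollout.reconcile()
print(rollout.active_selector == green, rollout.replica_sets[blue])
```

Each call to `reconcile` looks only at the current state, so calling it repeatedly while paused is harmless, which is the idempotent behavior the design requirements call for.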
For canary deployments, you can define a ratio of traffic to send to your canary, or you can add a step that pauses your rollout for a predetermined time or until it gets the OK to progress. When a new version is introduced, the Argo Rollouts controller will execute these steps, then mark the current template within the rollout as the new stable resource.
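A minimal Python sketch of walking such a list of steps follows. The `setWeight` and `pause` keys are modeled loosely on the step names used in Argo Rollouts manifests; the `run_canary` function, the `approve` callback and the replica arithmetic are assumptions made for illustration. Without a service mesh, the sketch approximates the traffic ratio by the share of pods running the new version.

```python
import math


def run_canary(steps, replicas, approve):
    """Walk the canary steps; return (log, promoted)."""
    log = []
    for step in steps:
        if "setWeight" in step:
            weight = step["setWeight"]
            # Approximate the traffic ratio with a share of the pods.
            canary = math.ceil(replicas * weight / 100)
            log.append(
                f"setWeight {weight}%: {canary} canary / "
                f"{replicas - canary} stable pods"
            )
        elif "pause" in step:
            duration = step["pause"].get("duration")
            if duration is not None:
                # A real controller would wait; the sketch only records it.
                log.append(f"pause for {duration}s")
            elif approve():
                log.append("pause until approved: approved")
            else:
                log.append("pause until approved: waiting")
                return log, False
    log.append("all steps done: new version marked stable")
    return log, True


# Hypothetical step list for a 10-pod service.
steps = [
    {"setWeight": 20},
    {"pause": {"duration": 60}},
    {"setWeight": 50},
    {"pause": {}},            # no duration: wait for the OK
    {"setWeight": 100},
]

log, promoted = run_canary(steps, replicas=10, approve=lambda: True)
held_log, held = run_canary(steps, replicas=10, approve=lambda: False)

for line in log:
    print(line)
```

Keeping the steps as data rather than script logic is what lets each team describe its own canary process while the controller owns the orchestration.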
It was not using service mesh at the time of this demo, but started working with Istio during the summer. It has since released Admiral, an open source project providing automatic configuration for multiple Istio deployments to work as a single mesh.
Since the acquisition, Intuit has released a number of Argo projects, including:
- Argo CD, declarative continuous deployment for Kubernetes.
- Argo Events, developed in collaboration with investment management firm BlackRock, an event-based dependency manager for Kubernetes to trigger workflows and applications.
- Argo Workflows, a highly scalable, Kubernetes-native workflow orchestrator.
- Keikoproj, a set of declarative custom resource definitions (CRDs) for managing Kubernetes at scale in production.
- Keiko, a set of independent open source declarative tools for orchestration and management of multitenant Kubernetes clusters in production.
At KubeCon+CloudNativeCon North America later this year, Intuit announced Argo Flux, a collaboration with AWS and Weaveworks to unify Flux, a Cloud Native Computing Foundation sandbox project, and Argo CD into an open source GitOps continuous delivery tool.