Primer: Blue-Green Deployments and Canary Releases
This post is part of an ongoing series from Catherine Paganini that focuses on explaining IT concepts for business leaders.
In our previous article on CI/CD, we briefly discussed blue-green deployments and canary releases and their role in continuous delivery (CD). These are very powerful approaches that significantly reduce the risk associated with application deployments, meriting their own dedicated article. So let’s dive into it a little deeper.
Blue-green deployments and canary releases mitigate application deployment risk by enabling IT to revert to the previous version should an issue occur during the release. Switching back and forth between versions is as easy as flipping a switch and can be automated, minimizing the time users are exposed to potentially faulty code. But before we dive deeper, let’s first differentiate between deployments and releases.
Decoupling Deployment from Releases
While often used interchangeably, deployment and releases are two separate processes. Deployment is the process of installing a software version for any environment, including production. It doesn’t necessarily have to be associated with a release. A release, on the other hand, is when a new feature is made available to your customer base.
Traditionally, an update or feature was deployed a day before the release date, which may have been widely promoted in the media. Because things can go wrong during deployment, this left little room to fix issues. Frequent production deployments throughout feature development reduce that risk.
In blue-green deployments, there are two production environments: blue and green. Blue is the current version with live traffic and green is the environment with the updated code. At any time, only one has live traffic.
To release a new version, code is deployed to the environment with no traffic (green), where final tests are performed. Once IT is confident the application is ready, all traffic is routed to the green environment. Green is now live, and the actual release is executed.
This is the first time the new code is tested with a production load (real-life traffic). Some risk remains until the code is actually released; that will never go away. But if something goes wrong, IT can quickly reroute traffic back to the blue version. All they have to do is closely monitor code behavior, which can be automated through proper tooling, to see whether green works well or a rollback is needed.
Blue-green deployments: There is only live traffic in one environment at any given time.
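For readers who want to see the switch in code, here is a minimal sketch of the idea, not a production router. The class `BlueGreenRouter`, the function `release`, and the `is_healthy` callback are hypothetical names made up for this illustration; in practice the switch usually lives in a load balancer or similar tooling.

```python
class BlueGreenRouter:
    """Toy router: exactly one environment receives live traffic at a time."""

    def __init__(self, live="blue"):
        self.live = live

    @property
    def idle(self):
        # The environment that currently has no live traffic.
        return "green" if self.live == "blue" else "blue"

    def switch(self):
        """Route all traffic to the idle environment."""
        self.live = self.idle
        return self.live


def release(router, is_healthy):
    """Switch traffic to the updated environment; roll back if it misbehaves.

    is_healthy is a hypothetical monitoring check supplied by the caller.
    Returns True if the new environment stays live, False after a rollback.
    """
    previous = router.live
    router.switch()
    if not is_healthy(router.live):
        router.live = previous  # reroute all traffic to the old version
        return False
    return True
```

Calling `release(BlueGreenRouter(), check)` with a check that passes leaves green live. With a check that fails, traffic ends up back on blue.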
This approach is by no means new. IT has always created a new version and then rerouted live traffic to it. What is new is the reliability and reproducibility provided through component codification in version control.
How do we get to this reliability and reproducibility? Developers codify all parameters in version control, a database-like system that tracks all code changes. These include app logic, build procedures, tests, deployment procedures, upgrade procedures, recovery procedures, etc. In short, everything that affects the app. The computer then executes the code, deploying the application within its environment matching the exact state codified in version control.
Before DevOps, the process was a lot more manual and error-prone. All changes had to be captured in documentation (which, by the way, is a lot more difficult to version and analyze), based on which developers could recreate the application and environment. With two key steps being manual, this process was far too unreliable, leading to frequent problems.
While codifying the app and environment is also a human task, it’s part of the development process, not a separate one like creating documentation. The same code that is in production is codified in version control. Any change or update automatically triggers tests that ensure the code is in a deployable state. That way, if a human error sneaks in, the system will most likely catch it.
Similar to blue-green deployments, canary releases start with two environments: one with live traffic and the other one, containing the updated code, without live traffic. Unlike blue-green deployments, traffic is moved to the updated code gradually. It can start at 1%, then move to 10%, 25%, and so on, until it reaches 100%. When the release is automated, the code is promoted to successively larger and more critical environments as it is confirmed to be operating correctly. If at any point an issue occurs, all traffic is rolled back to the previous version. This greatly reduces risk as only a small percentage of the user base is exposed to the new code initially.
Not only can IT control the percentage of the user rollout, but canary releases can also start with less critical users, such as those with a free account or in a market that is less critical to your business.
Canary releases: live traffic is gradually rolled over from the old version to the new one until only the update is live.
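The gradual rollout can be sketched in a few lines of code. This is an illustration under stated assumptions rather than how any particular tool works: the step list, the function names `pick_version` and `run_canary`, and the `is_healthy` check are all hypothetical.

```python
import random

# Percent of traffic sent to the new version at each stage (assumed values).
STEPS = [1, 10, 25, 50, 100]


def pick_version(canary_percent, rng=random):
    """Send roughly canary_percent of requests to the new version."""
    return "new" if rng.random() < canary_percent / 100 else "old"


def run_canary(is_healthy, steps=STEPS):
    """Walk through the traffic steps and return the final canary percentage.

    is_healthy(percent) is a hypothetical check supplied by the caller.
    Any failure sends all traffic back to the old version (0 percent).
    """
    for percent in steps:
        if not is_healthy(percent):
            return 0  # roll back: no traffic reaches the new version
    return steps[-1]
```

If every check passes, `run_canary` ends at 100, meaning only the update is live. If a check fails at any step, it returns 0 and the old version carries all traffic again.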
Cluster Immune System
Cluster immune systems take canary releases one step further. Because the release process is linked to the production monitoring system, a release is automatically rolled back when user-facing performance deviates from a predefined range (e.g., an error rate increase of 2%). This approach can identify errors that are otherwise hard to find through automated tests, and it reduces the time needed to detect and respond to a drop in performance.
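As a rough sketch of that automatic rollback rule, the code below compares monitored error rates against a baseline. It assumes the 2% example means an increase of two percentage points. The function names and the sample data are hypothetical, not part of any real monitoring product.

```python
def should_roll_back(baseline_error_rate, current_error_rate, max_increase=0.02):
    """True when the error rate has risen by more than max_increase.

    Rates are fractions (0.01 means 1 percent). The default threshold of
    0.02 is an assumed reading of the 2 percent example.
    """
    return current_error_rate - baseline_error_rate > max_increase


def watch_release(samples, baseline_error_rate, max_increase=0.02):
    """Scan error-rate samples in order.

    Returns the index of the first sample that triggers a rollback,
    or None if the release stays within the allowed range.
    """
    for index, rate in enumerate(samples):
        if should_roll_back(baseline_error_rate, rate, max_increase):
            return index
    return None
```

With a baseline of 0.01 and samples of 0.01, 0.02, and 0.05, the third sample triggers the rollback, since it is the first to exceed the baseline by more than the threshold.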
By decoupling deployment from releases and leveraging blue-green deployments or canary releases, the risk is significantly reduced. At any time, IT is able to revert the app back to the previous version — a far cry from the reality of traditional releases.
New technologies and approaches made this possible for the first time: version control, infrastructure as code, containers, and Kubernetes all play a role in this new, nimble, DevOps-oriented IT world.
As usual, a big thanks to Oleg Chunikhin who, with each article, teaches me a little more about cloud native tech.