Code Climate processes roughly 650 billion lines of code daily and deploys its production website five times a day, all without a staging server, and all without anything exploding. When rigor is applied to operations, mistakes are kept to a minimum and the negative impact on customers is minimized. In other words, MTTD (mean time to detection) and MTTR (mean time to repair), how long it takes to find and fix a problem, are both reduced.
They employ proportional investment, meaning that when (not if) issues happen with deployments, the team steps back, decides whether it's a small problem or a larger one, and diverts its time and energy accordingly, according to Helmkamp.
Here are the four techniques Code Climate uses for deploying production websites:
Rollout is a project on GitHub that has been around for years and is used to implement feature flags (essentially conditional logic) in a Ruby application. Rollout lets you limit a feature to specific users, to groups (such as beta testers or moderators), or to a random subset of users based on a percentage, which is the most common use. Notably, Helmkamp says, it lets you increase and decrease the rollout of a feature without having to redeploy the website. Changing a rollout from the Ruby console takes him only seconds, compared to redeploying a website, which can take one to five minutes.
All of the configuration data is stored in Redis, and Helmkamp explains how this works. If something were to go wrong, he adds, the feature can be deactivated for everybody on the website.
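As a rough illustration of the mechanism (a sketch of the pattern, not the rollout gem's actual implementation), percentage-based flags can be driven by a stable hash of the user ID, so a given user stays consistently in or out of the feature as the percentage is dialed up or down. The class and method names below are assumptions for the sketch:

```ruby
require "zlib"

# Minimal in-memory sketch of percentage-based feature flags.
# (The real rollout gem stores this state in Redis, so flags can be
# flipped from a console in seconds, with no redeploy.)
class FeatureFlags
  def initialize
    @percentages = Hash.new(0)                    # feature => 0..100
    @users = Hash.new { |h, k| h[k] = [] }        # explicit opt-ins
  end

  def activate_percentage(feature, percent)
    @percentages[feature] = percent
  end

  def activate_user(feature, user_id)
    @users[feature] << user_id
  end

  # Turning a feature off for everybody is a one-liner.
  def deactivate(feature)
    @percentages[feature] = 0
    @users[feature].clear
  end

  def active?(feature, user_id)
    @users[feature].include?(user_id) ||
      Zlib.crc32("#{feature}:#{user_id}") % 100 < @percentages[feature]
  end
end

flags = FeatureFlags.new
flags.activate_percentage(:new_dashboard, 20)     # enable for ~20% of users
enabled = (1..1000).count { |id| flags.active?(:new_dashboard, id) }
puts enabled                                      # roughly a fifth of 1000
```

Because the hash is deterministic, raising the percentage only adds users to the feature; nobody flaps in and out between requests.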
At Code Climate, deployments go through a self-written tool built on Capistrano, which allows deploying specific branches to specific servers. All deployments happen in chat, meaning any employee can issue a chat command to deploy the website at any time.
Branch deploys can push a non-master branch across the entire set of servers, to watch whether anything breaks, or to a single server, to watch how that server's metrics change. This avoids conditionals and reverts, and allows testing a change without committing to merging it to master and rolling it out to 100 percent immediately, which is useful for complex backend changes, especially on internal services.
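Code Climate's tool itself is private, but a hypothetical Capistrano 3 configuration for this kind of branch-targeted deploy might look like the fragment below. The application name, repository URL, and example chat command are invented for illustration; `BRANCH` and `HOSTS` are standard environment-variable hooks in Capistrano 3:

```ruby
# config/deploy.rb -- hypothetical sketch, not Code Climate's actual tool.
# A chat command like "deploy perf-experiment to web3" could translate to:
#   BRANCH=perf-experiment HOSTS=web3.example.com bundle exec cap production deploy
set :application, "myapp"
set :repo_url,    "git@example.com:myorg/myapp.git"

# Deploy whatever branch was requested, defaulting to master:
set :branch, ENV.fetch("BRANCH", "master")
```

With `HOSTS` set, Capistrano filters the server list down to the named machine, so a risky branch can bake on one server while the rest of the fleet keeps running master.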
Theory allows Code Climate to deploy a branch to a separate pool of processes, distinct from the main pool serving end users, that shares data with production and has live connections to all resources and databases.
This setup is limited to staff: a staff member always interacts with the code running on Theory, while end users still get the code from the master branch of the website. Like branch deploys, but unlike Rollout, this avoids conditionals, and it is useful for sweeping changes.
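A minimal, hypothetical sketch of the routing idea: requests from staff accounts go to the pool of processes running the feature branch, while everyone else hits the pool running master, and both pools talk to the same production database. The pool URLs and user types below are invented for illustration:

```ruby
# Hypothetical sketch of Theory-style routing; not Code Climate's code.
THEORY_POOL = "http://theory.internal:8080"   # processes running the branch
MASTER_POOL = "http://master.internal:8080"   # processes running master

Staff   = Struct.new(:id)   # assumed user types; a real app would
EndUser = Struct.new(:id)   # check the session or an is_staff flag

def pool_for(user)
  user.is_a?(Staff) ? THEORY_POOL : MASTER_POOL
end

puts pool_for(Staff.new(1))     # staff see the feature branch
puts pool_for(EndUser.new(42))  # end users see master
```

Because the branch code never reaches end users, sweeping changes can be exercised against live production data without feature-flag conditionals scattered through the codebase.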
Theory allows both active and passive QA. A heavily used application provides passive QA, in that bugs are reported and resolved quickly simply through normal daily activity.
If you are doing this yourself, Helmkamp warns, it is important to ensure you have up-to-date code before deploying to Theory. This is done by merging the master branch into the feature branch, which incorporates any bug fixes not yet present on the feature branch.
Scientist is an open source project developed inside GitHub and is used for mission-critical changes. It is the most rigorous form of deployment.
Scientist lets you run two or more versions of code simultaneously, in production, and track the results. It can catch mismatched values, timing differences and failures on critical code paths. It does this by creating an experiment with two alternate implementations, or blocks of code, of the same algorithm. The old version of the code, which is believed to be working, is called the control. The new version you want to replace it with is called the candidate. When the experiment runs, it executes the code in both blocks and returns the answer from the control block. If the candidate is broken, it will not affect the behavior of the production website, as long as it is contained within the Scientist experiment.
Scientist provides facilities to compare values and determine whether the candidate behaves exactly like the control. If it doesn't, the information is recorded for the developer to access. Developers can also get statistics on failure rates and timings, which helps in figuring out why the values don't match. Scientist also records exceptions and mismatches, which can help the developer find bugs in the candidate, or even in the control.
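The control/candidate flow described above can be sketched in plain Ruby. This is the pattern, not the Scientist gem's actual API: run both blocks, record any mismatch or exception, and always return the control's value, so a broken candidate never affects what users see:

```ruby
# Minimal sketch of the Scientist pattern (not the gem itself).
class Experiment
  def initialize(name)
    @name = name
  end

  def use(&block)   # the control: the trusted old implementation
    @control = block
  end

  def try(&block)   # the candidate: the new implementation under test
    @candidate = block
  end

  def run
    control_value = @control.call
    begin
      candidate_value = @candidate.call
      mismatch = (candidate_value != control_value)
    rescue StandardError => e
      # A crashing candidate is recorded, never raised to the caller.
      mismatch = true
      warn "#{@name}: candidate raised #{e.class}: #{e.message}"
    end
    warn "#{@name}: mismatch recorded" if mismatch
    control_value   # production always gets the control's answer
  end
end

exp = Experiment.new("sum-refactor")
exp.use { (1..100).sum }      # old code path, believed correct
exp.try { 100 * 101 / 2 }     # new code path being evaluated
puts exp.run                  # => 5050, always from the control
```

The real gem adds the reporting layer this sketch only hints at: timing both blocks, publishing mismatches to a store the developer can query, and sampling so the candidate runs on only a fraction of requests.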
If you don't want to spend a lot of time deploying a new, resource-intensive application to staging and then to production, where it may not work anyway, consider these four techniques. "When in doubt, ship it."
Feature image via Flickr Creative Commons.