Historically, feature flagging systems were tools built in-house by developers to test and control new features before rolling them out to customers. Since the flags originated in-house, they had to be maintained in-house, leaving developers in charge of not just the features they were tasked to build, but also the tools used to manage those features.
The software development industry is changing though, and the demand for feature flags is increasing. Better automated testing, monitoring, and observability have opened the door to testing in production. Software applications are becoming better equipped through vendor partnerships and integrations, enabling feature flags to become the foundation of a larger category of tools needed by modern-day development teams.
Before we get any further, let’s make sure we’re all on the same page about what a feature flag really is. Think of a feature flag as a light switch: flip it one way, and a feature is turned on for users; flip it the other way, and it’s no longer available. If you’re feeling more adventurous, you may add the ability to turn a feature on for a random percentage of users, or to target users with particular attributes. When feature flags reach this level of power and sophistication, they are called “controlled rollouts”. In your feature flagging system, a controlled rollout would look something like this:
// turn on the feature for 50% of users in California. For everyone
// else, turn it on for 1% of users.
if user.state = 'ca' then split 50%:on,50%:off
else split 1%:on,99%:off
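The rule above is pseudocode; in application code, the same logic might look like the following minimal Python sketch. The `rollout_treatment` function and its `user_state` parameter are hypothetical names for illustration, not any particular vendor's API:

```python
import random

def rollout_treatment(user_state):
    """Return 'on' or 'off', mirroring the rollout rule above:
    a 50/50 split for users in California, 1/99 for everyone else."""
    on_probability = 0.50 if user_state == 'ca' else 0.01
    return 'on' if random.random() < on_probability else 'off'
```

A production system would typically hash a stable user ID rather than call `random.random()`, so that the same user receives the same treatment on every request.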
Controlled rollouts are a powerful tool that lets product engineering teams get creative, shrink the blast radius of errors, and test in production. Most importantly, controlled rollouts give product engineers the opportunity to quantify the impact of a feature on engineering and product metrics without releasing it to all users.
A feature is initially released to 1% to 5% of users in a controlled rollout. This allows engineers to learn right off the bat if there are bugs, exceptions, or latency changes introduced by the feature.
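For a 1% to 5% rollout to produce trustworthy signals, assignment must be sticky: the same user should see the same treatment on every request. A common approach, sketched below with hypothetical names (this is not any specific vendor's implementation), is to hash a stable user ID into one of 100 buckets:

```python
import hashlib

def in_rollout(user_id, flag_name, percentage):
    """Deterministically place a user in the first `percentage` of
    100 buckets. Keying the hash on both user ID and flag name gives
    each flag an independent bucketing, so rollouts don't overlap."""
    key = f"{flag_name}:{user_id}".encode()
    bucket = int(hashlib.md5(key).hexdigest(), 16) % 100
    return bucket < percentage
```

Because the assignment is a pure function of the inputs, a user stays in (or out of) the rollout as the percentage ramps up, which keeps the exposed cohort consistent while you watch for bugs, exceptions, or latency changes.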
With such a small audience, the mean or 95th-percentile latency across the site will barely move. Application Performance Monitoring (APM) tools and exception tracking systems are not built to pick up a signal that small from a feature flag.
If your feature passes this first round of testing with flying colors and there is no degradation to the engineering operational metrics, it’s time to expand your testing pool and release to the next 20% to 50% of users. Just as APM tools miss the engineering signal, product analytics systems will not pick up the impact of the feature on user behavior metrics at this exposure level.
Getting to Experimentation
Since we’ve determined that feature flags and controlled rollouts blow a hole in your ability to measure changes in engineering and user behavior metrics, how can you measure their impact? The answer is to tie measurement to feature flags in a single integrated system. When you combine feature flags with controlled rollouts and measurement, that gets you to the next level: experimentation.
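Tying measurement to feature flags starts with recording an exposure event every time a flag is evaluated, so that metric changes can later be joined to treatment assignments. The sketch below uses hypothetical names (`get_treatment`, an in-memory `events` list standing in for an analytics pipeline) to show the shape of the idea:

```python
import time

events = []  # stand-in for an analytics / metrics pipeline

def get_treatment(user_id, flag_name, assign):
    """Evaluate a flag via the `assign` function and record which
    treatment the user saw, so downstream metrics can be attributed
    to the flag."""
    treatment = assign(user_id)
    events.append({
        "user_id": user_id,
        "flag": flag_name,
        "treatment": treatment,
        "timestamp": time.time(),
    })
    return treatment

# usage: every evaluation leaves a record for later analysis
t = get_treatment("user-42", "new-checkout", lambda uid: "on")
```

With exposure events in place, the measurement side of the system knows exactly which users were in which treatment, which is the prerequisite for experimentation.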
At full maturity, integrated systems both manage feature flags and run experiments. This means engineers can release a feature to 1% of users, and the system will automatically detect a problem, alert, and kill the flag if page latencies or exception rates are negatively affected. These systems also enable product managers to continue the release to 50% of users and collect data on whether the feature had the desired impact on user behavior — or at least didn’t cause a degradation.
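In outline, the automated kill behaves like a guardrail check: compare an operational metric between treatment and control and disable the flag when the gap crosses a threshold. The following is a simplified sketch with hypothetical names; a real system would use a proper statistical test rather than the fixed threshold shown here:

```python
def check_guardrail(flag, treatment_errors, treatment_total,
                    control_errors, control_total, max_delta=0.01):
    """Kill the flag if the treatment error rate exceeds the control
    error rate by more than `max_delta` (one percentage point here).
    Returns True if the flag was killed."""
    treatment_rate = treatment_errors / treatment_total
    control_rate = control_errors / control_total
    if treatment_rate - control_rate > max_delta:
        flag["enabled"] = False  # automated kill switch
        return True
    return False

# usage: treatment errors at 3% vs. control at 1% trips the guardrail
flag = {"name": "new-checkout", "enabled": True}
killed = check_guardrail(flag, treatment_errors=30, treatment_total=1000,
                         control_errors=10, control_total=1000)
```

The same loop, pointed at user behavior metrics instead of error rates, is what lets product managers judge whether the feature had the desired impact.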
Without controlled rollouts and measurements, feature flags are incomplete. However, by combining the abilities to release quickly, measure, and learn from your users through a unified solution for feature delivery, you can create a world where every feature is safe behind a flag, purposefully released to users, and quantified through metrics.
Feature image via Pixabay.