Editor’s Note: Previously in The New Stack, we’ve discussed the subject of feature flagging — a way for development teams to restrict the dissemination of new features and capabilities in code. Such a system can enable a more carefully marshalled migration from a monolithic architecture to a microservices environment. In the following contributed feature, Jonathan Anderson, who manages strategy and revenue operations for Oakland, California-based LaunchDarkly, reveals his company’s strategy behind managing software features in a distributed system — especially when that system scales up to thousands of concurrent microservices.
What I’m about to say may sound heretical, but hear me out: Not everyone needs a highly sophisticated feature management system like LaunchDarkly. While we feel strongly that everyone should manage release risk with feature flags, we know that many teams can do this successfully using a homegrown or open source system that includes just a subset of the capabilities that LaunchDarkly provides.
Whether you need a more sophisticated system is determined by two factors: the number of people in your engineering team and how often you release code to production.
The teams that build their own feature-flagging systems are often our smartest customers, and they make for our best advocates. They understand what a well-built feature management system can do: increase a team’s speed and productivity by “de-risking” the release process, by means of early user feedback, canary launches, and kill switches.
These folks have felt the pain of building and maintaining a system like LaunchDarkly (just ask Microsoft). If you’ve ever had to turn to the one developer who can still remember which flags you “definitely should not touch,” we know we’ll get along swimmingly.
If you’re ready to take a crack at building your own but don’t yet know where to start, you can find some of our favorite open source libraries here. But if you’re prepared to take a deeper dive into feature management systems, we’re about to lay out what the different levels of these systems look like, beginning with the most basic schematics, and progressively adding sophistication. Along the way, we intend to answer three principal questions:
- What are the characteristics of a simple feature management system, compared to a more sophisticated one?
- At each level, what are the pros and cons?
- How do you know if you’re outgrowing your current feature management solution?
Level 0: The Config File
The proverbial “toe in the water” for any system purporting to offer feature management is storing rudimentary flags in application configuration files.
At first glance, a config file appears to be a conventional, sensible place to store all types of settings. Chances are, you’re using configuration files for your global application settings anyway. But the main limitation with configuration file-backed feature flags is that they’re not context-sensitive, meaning that flags will either be “on” or “off” for all users.
Another limitation of storing flags in config files becomes apparent as soon as you need to update the flags. In most cases, your application picks up changes only after it is redeployed or restarted, which can be cumbersome at scale.
Configuration files are also often poorly documented, and change management (audit logging, versioning) is rudimentary at best. One of our favorite anecdotes is about a developer who, while attempting to clean up an exceedingly long config file, accidentally deleted a critical flag and brought the whole system crashing down. Since config files have no granular access controls, anyone with access may change any value.
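To make the limitation concrete, here is a minimal sketch of a config-backed flag check. The flag names, the JSON layout, and the `is_enabled` helper are all assumptions made for illustration; in a real application the text would be read from a file at startup.

```python
import json

# Hypothetical config contents. In practice this would live in a file
# (for example config.json) that the application reads once at startup.
CONFIG_TEXT = """
{
  "feature_flags": {
    "new_checkout": true,
    "dark_mode": false
  }
}
"""


def load_flags(config_text):
    """Parse the config text and return the dictionary of flags."""
    return json.loads(config_text).get("feature_flags", {})


# Loaded once: editing the config has no effect until the process loads it
# again, which usually means a restart or a redeploy.
FLAGS = load_flags(CONFIG_TEXT)


def is_enabled(flag_name, default=False):
    """Return the global value of a flag. No user context is involved."""
    return bool(FLAGS.get(flag_name, default))
```

Notice that `is_enabled` takes no user argument. Whatever it returns, it returns for everyone.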
Why does it work? It’s a straightforward way to store simple feature flags.
What are the problems?
- Config files lack context for user targeting.
- Picking up changes requires redeployment or restart.
- Config files are hard to update and maintain.
- Anyone with access can change any value.
When is it time to move on? As soon as you need something more granular than a global on/off switch.
Level 1: The Database
Moving flags from config files to a database may seem like a small change, but in the world of responsible feature management, it’s a huge leap. In practice, this is usually accomplished by storing individual feature flag values in your user model. For example, every user may have a Boolean flag value for “can upload,” or a short integer for “preferred language.” As database values, these flags are checked for type, so they’re not prone to a user error such as a typo.
The primary benefit of moving flags to a database is that you can target functionality at specific users. Most critically, you can update user targeting without restarting or re-deploying services. So you’re much less likely to take your system down while determining who gets to be a beta customer for specific features.
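As a rough sketch of what this can look like, the example below stores a per-user flag as a typed column and updates it while the application keeps running. It uses Python's built-in `sqlite3` module with an in-memory database; the `users` table, its columns, and the helper functions are hypothetical.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical users table with flags stored as typed columns."""
    conn.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " can_upload INTEGER NOT NULL DEFAULT 0 CHECK (can_upload IN (0, 1)),"
        " preferred_language INTEGER NOT NULL DEFAULT 0)"
    )


def set_can_upload(conn, user_id, enabled):
    """Turn the flag on or off for one user, with no restart or redeploy."""
    conn.execute(
        "UPDATE users SET can_upload = ? WHERE id = ?",
        (int(enabled), user_id),
    )
    conn.commit()


def can_upload(conn, user_id):
    """Read the flag for one user; unknown users default to False."""
    row = conn.execute(
        "SELECT can_upload FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return bool(row[0]) if row else False


def demo():
    """Enable uploads for one of two users and return both flag values."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    set_can_upload(conn, 1, True)
    return can_upload(conn, 1), can_upload(conn, 2)
```

The `CHECK` constraint is one way to get the type safety described above: the database rejects a value that is neither 0 nor 1.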
In addition, having your flags in a database distributes access control across the development team. So you no longer need to beg for time from the only developer on your team who has the experience — and willingness — to muck around with config files.
In theory, this means that anyone, from product managers to business users, could toggle flags on for individual users. However, we’ve found that without an intuitive UI, making these changes requires a database connection and proficiency in writing database queries. This ends up limiting who can actually make changes. What tends to happen is that a business user requesting a change has to open a ticket, and an admin then has to go into the database to update the flag, creating a backlog for minor updates.
One of the first things our customers ask is how to control feature releases, both to gather early user feedback and to limit the fallout if a feature isn’t working out. A convenient method for canary-testing new features is a controlled percentage rollout, where users are randomly assigned access to new pieces of functionality. With the database approach, you still need to write additional tooling to handle bucketing (e.g., assigning users to variations based on percent rollouts).
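The bucketing tooling mentioned above is often built on a stable hash, so that the same user always lands in the same bucket without any stored state. The sketch below shows one common way to do it; it is not a description of how LaunchDarkly or any particular library implements rollouts, and the function names and flag key are hypothetical.

```python
import hashlib


def bucket(flag_key, user_id):
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    # Use the first 8 hex digits (32 bits) as an integer, then reduce it.
    return int(digest[:8], 16) % 100


def in_rollout(flag_key, user_id, percent):
    """Return True if the user falls inside the first `percent` buckets."""
    return bucket(flag_key, user_id) < percent
```

Including the flag key in the hash means a user who falls into the first 10 percent for one flag is not automatically in the first 10 percent for every other flag. Raising the percentage only adds users; nobody who already has the feature loses it.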
With the database approach, cleaning flags becomes a chore. Beyond just removing them from your code, there’s an additional step to modify your production database — for instance, via a script. At scale, this can become a pain point.
Why does it work? With database flags, you can update which end users have access without redeploying.
What are the problems?
- Database flags limit user segmenting and release functionality.
- Only people you trust with database access can change segmenting.
When is it time to move on? As soon as you need percentage rollouts, or have too many feature flags cluttering your database.
Level 2: An Open Source Feature Flagging System
Are you ready to release new features with controlled rollouts? Do you want to get user feedback earlier in the process? Do you just want a feature kill switch? You might just be ready for an open source solution.
Generally, open source solutions allow for simple user segmenting and controlled feature rollouts. They even have simple UIs so that, once configured, non-technical users can access them. From a feature management perspective, now we’re talking.
As with most every protected asset, though, opening up access can have unintended consequences. Open source solutions tend to lack role-based access control (RBAC). You can’t limit who can do what, so as a result, there’s no way to restrict what the most junior member of the team can change.
In addition, open source solutions are not equipped with audit trails, so change management and tracking can be a challenge. There are moments (most likely around 3 a.m., after a release has gone off the rails) when it becomes really important for you to be able to zero in on exactly what changed.
Finally, open source solutions are often language- or stack-specific. So if you use multiple languages (e.g., PHP and Android) or multiple platforms (mobile, web), you’ll need multiple systems, each of which may operate a bit differently.
Why does it work? These solutions enable controlled percentage rollout and open up access with a simple UI.
What are the problems?
- There is no means for limiting access or logging changes to flags.
- Open source solutions are generally language specific.
When is it time to move on? As soon as you need additional release or access controls, or a cross-language platform.
Level 3: Sophisticated Feature Management Solutions
When people initially skip over the first three levels and reach for a more sophisticated solution like LaunchDarkly, it’s because either they have advanced use cases or they need greater control over the release process. Two common advanced cases include targeting rules based on user attributes (e.g., to set up a beta test group) and experimenting with code-driven A/B tests.
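To illustrate the first of those cases, here is a simplified sketch of attribute-based targeting, where the first matching rule decides which variation a user sees. The `Rule` and `Flag` classes, the attribute names, and the flag key are hypothetical and much simpler than a production rule engine.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Serve `variation` when the user's attribute is one of the allowed values."""
    attribute: str
    allowed_values: frozenset
    variation: str


@dataclass
class Flag:
    key: str
    default_variation: str
    rules: list = field(default_factory=list)

    def evaluate(self, user):
        """Return the variation of the first matching rule, else the default."""
        for rule in self.rules:
            if user.get(rule.attribute) in rule.allowed_values:
                return rule.variation
        return self.default_variation


# A beta group defined by user attributes rather than by individual user IDs.
beta_flag = Flag(
    key="new-dashboard",
    default_variation="off",
    rules=[
        Rule("group", frozenset({"beta"}), "on"),
        Rule("plan", frozenset({"enterprise"}), "on"),
    ],
)
```

With these rules, a call such as `beta_flag.evaluate({"group": "beta"})` returns the "on" variation, while a user with no matching attributes gets the default. The same structure could serve an A/B test by using variation names like "A" and "B".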
As for greater controls, systems like LaunchDarkly are built to scale along with engineering teams. More sophisticated solutions will include role-based access controls as well as an audit log for individual flags and contributors. An audit log is helpful not only in release crises, but also when you’re trying to figure out what a teammate may have been attempting to do with a flag.
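The sketch below shows how role-based access control and an audit log can fit together in a single toggle operation. It is a toy in-memory model built on assumptions: the role names, the permission table, and the `FlagStore` class are hypothetical and do not reflect LaunchDarkly's actual data model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS = {
    "reader": {"read"},
    "writer": {"read", "toggle"},
}


@dataclass
class FlagStore:
    flags: dict = field(default_factory=dict)      # flag key -> bool
    roles: dict = field(default_factory=dict)      # actor name -> role name
    audit_log: list = field(default_factory=list)  # one entry per change

    def toggle(self, actor, flag_key, enabled):
        """Change a flag if the actor's role allows it, and record the change."""
        role = self.roles.get(actor)
        if "toggle" not in ROLE_PERMISSIONS.get(role, set()):
            raise PermissionError(f"{actor} may not toggle {flag_key}")
        previous = self.flags.get(flag_key, False)
        self.flags[flag_key] = enabled
        self.audit_log.append(
            {
                "at": datetime.now(timezone.utc).isoformat(),
                "actor": actor,
                "flag": flag_key,
                "from": previous,
                "to": enabled,
            }
        )
```

Because every successful change goes through `toggle`, the log answers the 3 a.m. questions directly: who changed which flag, when, and from what value to what value.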
We’ve found that our customers are more and more capable, and increasingly comfortable, with expanding feature management beyond the development team. By means of an intuitive user interface, even non-technical users like product managers, and sales and marketing associates, can safely control who sees what functionality and when.
Finally, LaunchDarkly has well-documented SDKs for 11 languages, including iOS and Android, so engineering teams can have one centralized system for creating and modifying flags, identifying and targeting users, and executing feature rollouts.
Why does it work?
- Sophisticated solutions offer better controls and intuitive front ends, so non-technical users can manage code.
- Cross-language support leads to centralized feature management.
- They support advanced use cases like user group segmentation and A/B testing.
What are the problems?
- It may have more horsepower than individual developers need.
When is it time to move on? Never, we hope!
Feature image: “Experienced assembly line workers of both sexes contribute to the production of A-20 attack bombers,” now in the public domain.