Rethinking Testing in Production
Testing in production has long been regarded as risky, and it remains controversial among developers, testers and stakeholders alike. The norm has been to test software meticulously in controlled environments, such as development and staging, before deploying to production.
The very idea of testing in the live production environment has earned a bad reputation due to potential disruptions, unforeseen bugs and the fear of compromising user experience. However, in software development, this conventional wisdom is being increasingly challenged by a different approach: testing in production with the strategic use of feature flags.
Production Is Always Different
Testing in production with flags doesn’t necessarily imply abandoning other testing environments.
Instead, it recognizes the inherent challenges in maintaining identical development, staging and production environments. The rapid growth and evolving nature of production environments — fueled by user interactions and increasing volumes — make it practically impossible and economically unfeasible to mirror these environments accurately.
With products becoming more interconnected, trying to accurately replicate third-party APIs and integrations outside of production is close to impossible.
Trunk-based development, with its focus on continuous integration and delivery, calls for a different way of testing. Feature flags are the proverbial Archimedes' lever in this shift, offering a flexible and controlled approach to testing in production.
Developers can now gradually roll out features without disrupting the entire user base, mitigating the risks associated with traditional testing methodologies.
Feature flags empower developers to enable a feature in production for themselves during the development phase, allowing them to refine and perfect it before exposing it to broader testing audiences.
This progressive approach ensures that potential issues are identified and addressed early in the development process. As the feature matures, it can be selectively enabled for testing teams, engineering groups or specific user segments, facilitating thorough validation at each step.
The logistical nightmare of maintaining identical environments is alleviated, as testing in production becomes an integral part of the development workflow.
Moreover, the introduction of feature flags paves the way for A/B testing in production, enabling data-driven decision-making by comparing the performance of different feature variations in a real-world setting.
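As a rough illustration of the comparison step, the sketch below tallies conversion rates per variant from a list of exposure events. The function name `conversion_rates`, the event shape and the sample data are all made up for this example and do not come from any particular flagging tool. A real experiment would also need enough traffic and a significance test before picking a winner.

```python
from collections import Counter


def conversion_rates(events):
    """Return the conversion rate per variant.

    `events` is an iterable of (variant, converted) tuples, one per user
    exposed to the feature. This shape is an assumption for illustration.
    """
    exposures = Counter()
    conversions = Counter()
    for variant, converted in events:
        exposures[variant] += 1
        if converted:
            conversions[variant] += 1
    return {variant: conversions[variant] / exposures[variant] for variant in exposures}


# Made-up sample data: four users saw variant A, four saw variant B.
sample_events = [
    ("A", False), ("A", True), ("A", False), ("A", False),
    ("B", True), ("B", False), ("B", True), ("B", False),
]

rates = conversion_rates(sample_events)
print(rates)  # {'A': 0.25, 'B': 0.5}
```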
Empowering Development with Feature Flags
With feature flagging tools like Flagsmith, a structured hierarchy of identities (users), segments (groups) and flags sets the stage for a carefully orchestrated release of features. This hierarchy allows you to override features at different levels. In order of precedence, these are:
- Identity overrides (user)
- Segment overrides (groups of users)
- Environment defaults (default)
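To make the precedence order concrete, here is a minimal sketch of how a flag lookup could resolve the three levels. It uses a hypothetical in-memory `FlagStore` class written for this example; it is not the Flagsmith SDK, and the flag, user and segment names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical in-memory flag store, for illustration only."""

    # flag name -> enabled
    environment_defaults: dict = field(default_factory=dict)
    # segment name -> {flag name -> enabled}
    segment_overrides: dict = field(default_factory=dict)
    # user id -> {flag name -> enabled}
    identity_overrides: dict = field(default_factory=dict)

    def is_enabled(self, flag, user_id, segments=()):
        # 1. An identity override wins over everything else.
        user_flags = self.identity_overrides.get(user_id, {})
        if flag in user_flags:
            return user_flags[flag]
        # 2. Otherwise, the first of the user's segments that overrides the flag.
        for segment in segments:
            segment_flags = self.segment_overrides.get(segment, {})
            if flag in segment_flags:
                return segment_flags[flag]
        # 3. Otherwise, the environment default (off if the flag is unknown).
        return self.environment_defaults.get(flag, False)


store = FlagStore(
    environment_defaults={"new_checkout": False},
    segment_overrides={"internal_team": {"new_checkout": True}},
    identity_overrides={"qa_bob": {"new_checkout": False}},
)

print(store.is_enabled("new_checkout", "carol", ["internal_team"]))   # True
print(store.is_enabled("new_checkout", "qa_bob", ["internal_team"]))  # False
print(store.is_enabled("new_checkout", "dave"))                       # False
```

In this sketch, `qa_bob` belongs to the internal team but still sees the feature off, because the identity override takes precedence over the segment override.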
This approach allows you to have a controlled release flow, such as:
- A developer wraps a feature in a feature flag so that it can be toggled on/off.
- The developer then tests the feature in production by enabling it for just themselves (via an identity override).
- The developer enables the feature for an internal team — again, without any users being impacted (via a segment override).
- The feature is gradually rolled out to increasingly larger audiences until it reaches the entire user population, or A/B tests are run to decide which version of the feature should be final (via environment defaults).
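The first and last steps of this flow can be sketched as follows: a code path wrapped in a flag check, with the flag exposed to a growing percentage of users. The approach shown, hashing the flag name and user ID into a stable bucket, is one common way to do percentage rollouts and is an assumption here, not a description of how any specific tool implements it. The names `rollout_bucket`, `in_rollout` and `checkout` are hypothetical.

```python
import hashlib


def rollout_bucket(flag, user_id):
    """Map a user to a stable bucket in the range 0..99 for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_rollout(flag, user_id, percentage):
    """Return True if the user falls inside the rollout percentage (0 to 100)."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return rollout_bucket(flag, user_id) < percentage


def checkout(user_id, percentage):
    # The new code path is wrapped in a flag check so it can be toggled
    # without a redeploy; everyone else keeps the existing behavior.
    if in_rollout("new_checkout", user_id, percentage):
        return "new checkout flow"
    return "legacy checkout flow"


print(checkout("user-42", 0))    # legacy checkout flow
print(checkout("user-42", 100))  # new checkout flow
```

Because the bucket depends only on the flag name and the user ID, a user who sees the feature at 10% keeps seeing it as the percentage is raised, so the audience only ever grows.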
The bad reputation of testing in production is no longer justified if you have the proper tools. Feature flags not only align with the dynamic nature of production environments but also significantly enhance the development process.
Their introduction empowers developers and testers to iterate and refine features with heightened agility, progressively rolling them out to broader audiences. This approach minimizes the potential for disruptions, adding stability and fostering adaptability in the fast-paced landscape of software development.
Why Use Feature Flags to Test in Production
In embracing feature flags, two additional crucial points come to the forefront:
- Feature flags make it possible to streamline environments, even to the point of adopting a mono-environment setup. (As discussed, this is an option, not a requirement.) Fewer environments to maintain means lower costs and better use of development resources.
- Testing in production with the safeguard of feature flags allows for experimentation and refinement without impacting end users, ultimately contributing to increased system stability.
Remember that there is no silver bullet that solves all problems. It goes without saying (but let’s highlight it!) that introducing something new like feature flags means doing your homework first: learning the tradeoffs, the new processes and the best practices that come with the technology.
It’s worth it, though! We talk to developers every day and they can attest to the value of testing in production using the approach we presented above.
Testing in production is not a new subject, and we encourage you to read more about this topic. We would like to pay homage to those who wrote about it before us. You can read other perspectives right here on The New Stack!
- Two Times Integration Testing in Production Has Gone Wrong
- Embracing Testing in Production
- Chaos Engineering Progressively Moves to Production
- Testing in Production: Will You Get Eaten Alive?
- Honeycomb’s Charity Majors: Go Ahead, Test in Production
This article was co-authored by Moreno Garcia, a solutions architect with over 10 years of experience.