Serverless Testing in Production
The serverless ecosystem is still maturing, so there is not yet a full range of tools for each aspect of deploying applications on this style of infrastructure. In addition, serverless is an event-driven architecture in which cloud providers or others are responsible for autoscaling and managing the compute resources, which means that in many cases it is difficult to test usefully for how things will behave in a production environment.
Charity Majors, co-founder and CEO of platform-agnostic DevOps monitoring tool Honeycomb.io, says that this inability to test in development is not unique to serverless. Given the nature of building and deploying distributed applications at scale, there is no possible way to test for every eventuality. While she agrees that the “I don’t test, but when I do, I test in production” meme may be worthy of an eye-roll, she does believe in the concept of “testing in production.”
“When I say ‘test in production,’ I don’t mean not to do best practices first,” explained Majors. “What I mean is, there are unknown unknowns that should be tested by building our systems to be resilient. Some categories of bug can only be noticed when applications are at scale. Maybe we need fewer staging environments and more guard rails. It is not a case of ‘if’ we roll out bugs, but when, and being able to check for them, and building robots to check for them, needs to be part of a stable system.”
While this is true for all applications at scale, serverless also brings its own particular context. Serverless is built on immutable infrastructure that spins up in response to an event, so it is often not possible to recreate the same workflow twice, which often makes staging and testing prior to production impossible.
Majors points to five emerging techniques that can be used to manage serverless testing in production.
Feature flags: “Feature flags let you gate a release so you can ship your code to production,” said Majors. The idea is that for new features or upgrades, developers can use feature flags to stipulate who should see the new feature, while it remains invisible to the bulk of the customer base. “Feature flags say ‘Don’t send anyone to this code until I say so,’” said Majors. This allows developers to test everything on themselves first, prior to more widespread rollouts. New startups like LaunchDarkly are now appearing that offer feature flags as a service. “We use them at Honeycomb, and they are great,” said Majors. “The really cool thing about LaunchDarkly is that the interface is really useful, and easy to use, you can have non-technical people like marketing using it.”
This makes it possible for anyone in a business to test their own changes. For example, a CEO updating the business website, or a marketing team member adding API product landing pages to a developer portal, could use feature flags alongside a tool like LaunchDarkly to test those changes before making them live for the majority of the customer base.
Often, Majors says, feature flags are something an internal team hacks together as it goes, but that limits their usefulness inside the company, because homegrown flags are often built without the user interface that would make them accessible beyond the engineering team that built them.
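A homegrown flag of the kind described above might look something like the following minimal Python sketch. The `FeatureFlag` class, its fields, and the flag and user names are all hypothetical; this is an illustration of the idea, not how LaunchDarkly or Honeycomb implement flags.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical in-house flag: off for everyone unless explicitly allowed."""
    name: str
    enabled: bool = False                             # global kill switch
    allowed_users: set = field(default_factory=set)   # e.g. the developers themselves
    rollout_percent: int = 0                          # share of all other users, 0-100

    def is_on_for(self, user_id: str) -> bool:
        if not self.enabled:
            return False
        if user_id in self.allowed_users:
            return True
        # Hash flag name + user id so each user lands in a stable bucket 0-99.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < self.rollout_percent


# The new code is shipped to production, but only "dev-alice" is sent to it.
flag = FeatureFlag("new-checkout", enabled=True, allowed_users={"dev-alice"})


def checkout(user_id: str) -> str:
    return "new flow" if flag.is_on_for(user_id) else "old flow"
```

Raising `rollout_percent` later widens the audience without a new deploy, and setting `enabled` to `False` switches the feature off for everyone.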
Canaries: Canaries are often used in conjunction with feature flags. When Majors was at Facebook, the engineering team often deployed new features and updates to Brazil first, and would then manually compare the error rate between a control group and the Brazilian deployment group. “These are staged rollouts,” said Majors. A canary makes it possible to see whether anything goes wrong in production before rolling out more widely. Majors hopes that there will be a startup, like LaunchDarkly, that decides to focus just on managing canary staged rollouts. “There are companies that are doing it as part of their service mesh, like Buoyant and Turbine, even Nginx, do some of this,” said Majors. “It is about hardening your deployments. After all, you don’t know how it is going to run in the end.”
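The comparison between a control group and a canary group can be automated rather than checked by hand. The sketch below is one simple way to do that in Python; the `GroupStats` type, the `tolerance` and `min_requests` thresholds, and their default values are assumptions for illustration, not figures from Majors or Facebook.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    """Request and error counts for one deployment group (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_looks_healthy(control: GroupStats, canary: GroupStats,
                         tolerance: float = 0.01,
                         min_requests: int = 500) -> bool:
    """True when the canary's error rate is within `tolerance` of the control's."""
    if canary.requests < min_requests:
        return False  # too little canary traffic to judge either way
    return canary.error_rate <= control.error_rate + tolerance
```

A real check would likely look at more than error rate (latency, resource use), but the shape is the same: compare the canary against a control and refuse to proceed on thin or bad data.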
Staged rollouts with auto promotion to larger groups: This is a type of canarying in which the rollout is staged by share of traffic, instead of deploying to a specific geographic segment. “You set the canary at 10 percent of all production. Then you automate so your system compares that 10 percent against the rest of the system. When there are no errors or concerns, then it promotes the deployment to 25 percent. If at any point it is not okay, the system would either revert to pre-deployment, or, like the staged rollout itself, do a rolling revert,” said Majors. Majors says this is particularly useful for applications at scale, because some problems do not appear at lower levels of deployment in production. “There may be problems you only see when you get to 80 percent, for example, because it requires opening more connections, or using more RAM. There becomes a tipping point where at aggregate level it takes the server down,” said Majors.
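The promote-or-revert loop Majors describes can be sketched in a few lines of Python. Only the 10 and 25 percent stages come from her description; the remaining stage values, the function name, and the two callbacks (`set_traffic_percent`, `is_healthy`) are hypothetical stand-ins for whatever a deployment system actually exposes.

```python
from typing import Callable, List, Sequence

# 10 and 25 are from the quote above; 50, 80 and 100 are illustrative.
STAGES = (10, 25, 50, 80, 100)


def run_staged_rollout(set_traffic_percent: Callable[[int], None],
                       is_healthy: Callable[[int], bool],
                       stages: Sequence[int] = STAGES) -> bool:
    """Promote stage by stage; on failure, step back down to zero."""
    completed: List[int] = []
    for percent in stages:
        set_traffic_percent(percent)
        if not is_healthy(percent):
            # Rolling revert: walk back through the stages already passed.
            for previous in reversed(completed):
                set_traffic_percent(previous)
            set_traffic_percent(0)  # back to the pre-deployment state
            return False
        completed.append(percent)
    return True
```

Passing a health check that starts failing at 80 percent shows the tipping-point case: the rollout climbs through 10, 25 and 50, fails at 80, and then steps back down instead of taking the whole system with it.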
Rich instrumentation: “I firmly hope that we are going to look back to today as the bad old days when we shipped code and if we don’t get paged, then everything is okay,” warned Majors. At Honeycomb, her commitment is to building rich instrumentation so that developers can ask whether shipped code behaves as expected. “We should be capturing enough detail so we can see. As an industry, we are still coming up with universal principles for instrumentation. It is an art as well as a science.” Because applications are global, and serverless systems are large and often involve long workflows of multiple functions, an entire architecture can no longer fit in the head of a single architect overseeing the system, which calls for a richer set of tooling to help manage production.
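As a rough illustration of capturing enough detail to see how shipped code behaves, the Python sketch below wraps a function handler so that every invocation emits one structured record. The `instrument` decorator, the `BUILD_ID` constant, the `(event, context)` handler signature, and the field names are all assumptions made for this example; it does not use or represent Honeycomb's API.

```python
import functools
import time

BUILD_ID = "build-123"  # hypothetical: identifies which shipped version ran


def instrument(emit):
    """Decorator factory: `emit` receives one dict per handler invocation."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context=None):
            record = {"function": handler.__name__, "build_id": BUILD_ID}
            start = time.perf_counter()
            try:
                result = handler(event, context)
                record["ok"] = True
                return result
            except Exception as exc:
                record["ok"] = False
                record["error"] = type(exc).__name__
                raise  # instrumentation must not swallow the failure
            finally:
                elapsed = time.perf_counter() - start
                record["duration_ms"] = round(elapsed * 1000, 3)
                emit(record)
        return wrapper
    return decorator
```

Here `emit` could be as simple as a list's `append` during development or a function that forwards records to a monitoring backend in production. Tagging each record with a build identifier is what lets a team ask whether a newly shipped version behaves differently from the previous one.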
Observability-first development: All of this leads to a new paradigm that Majors hopes will take the application development industry beyond the current test-driven development principle. With rich instrumentation, it will be possible to monitor systems and understand how they are performing in real time. In an observability-first development paradigm, an engineering team might map the desired outcome from a workflow, application or feature rollout. Then, before coding that feature or workflow, the team would create its instrumentation toolkit, which might first be used to ask whether the solution is even worth building, and would then be used to monitor both development and deployment to ensure alignment with the initial objectives. “It is like wearing a headlamp. Instead of building a feature and monitoring it, which is a very blind approach, with observability-first development you build instrumentation first. It gives more confidence to developers,” Majors said.
Feature image: One Does Not Simply Meme Generator.