Where Does Trace-Based Testing Fit in the Testing Pyramid?
Trace-based testing is a new form of testing that uses observability data, namely the OpenTelemetry-compatible distributed traces already collected in modern systems. It enables you to build integration and system tests quickly and easily, dramatically reducing the time and cost involved, because you are leveraging the time and effort already invested in instrumenting your distributed application to produce distributed traces.
Trace-based testing “adds on” testing as a benefit, making your investment in observability pay dividends by reducing test cost and allowing you to increase test coverage inexpensively.
For distributed Function as a Service and microservice systems, trace-based testing should be used to enhance confidence and reduce effort compared to more traditional black box testing methods. It should be used for traditional integration tests, system tests and, if the product you deliver is an API as with Stripe or Twilio, end-to-end tests.
Key benefits of trace-based testing:
- Ease of writing and maintaining tests.
- Ability to provide meaningful tests against asynchronous and message-queue-based architectures.
- Vastly better ability to troubleshoot and debug failed tests.
- High-confidence tests.
- Made for cloud native, using cloud native observability techniques, namely distributed tracing.
Why are we so passionate about this new means to test? Is it actually a better solution for modern distributed cloud native applications? Let’s examine where trace-based testing fits in the testing pyramid and how it could affect testing strategy. Let’s start by looking at the testing pyramid and its key concepts.
Where Did the Testing Pyramid Originate?
The testing pyramid originated from the book “Succeeding with Agile” by Mike Cohn, in 2009. The concept visualizes a balanced approach to testing in agile development. The pyramid emphasizes a foundation of automated unit tests, followed by integration tests and a smaller number of end-to-end tests. The idea is to catch most issues early with fast and focused tests at the base of the pyramid, while using fewer and slower tests at higher levels for validation. The testing pyramid has become widely adopted in software development.
This diagram from “The Testing Pyramid: How to Structure Your Test Suite” clearly illustrates the parts of the system involved at each level:
There has since been much debate about the particular layers. Should they be broken up into more levels? What quantity of tests are needed at each level? As a founder and product manager, I cannot imagine not having some form of a manual test after a release and would argue that “manual” should appear above end-to-end tests. In most systems with a UI, humans are the users of the system, and while we want a high level of test coverage, a final sanity check by an actual person to catch the “unknown unknowns” we did not anticipate in our tests is mandatory.
How Do You Decide the Quantity of Tests in Each Area?
What drives the area (the number of tests) dedicated to each level in the original model? There are several factors:
- Speed of the test execution.
- Ease and expense of setting up test environments.
- Complexity and cost in creating and maintaining tests.
These can all be summarized as optimizing for cost in either time or money:
- Slower tests increase the time taken to complete the testing phase in each release.
- Increased time and money spent setting up environments, with integration and end-to-end tests requiring fuller environments.
- Integration testing in a distributed, microservice-based system is notoriously hard, with much of the effort focused on how to implement the test across multiple services.
Is cost the only factor to consider in determining the testing strategy and allocation of tests across these boundaries? Of course not. If you wanted to optimize solely for cost, you would have zero tests and spend no time or money creating or running them. Tests, however, have a purpose, and that purpose is to provide you with the confidence to release a quality product repeatedly. Perhaps considering which tests provide the most confidence and ensure quality should be an important factor in deciding the mix of tests.
Confidence as a Driving Factor in Determining Test Mix
Tomas Fernandez’s article, “The Testing Pyramid: How to Structure Your Test Suite,” explores this and explains the concept using a testing matrix diagram, shown below:
Most new projects start with lots of unit tests, which are low-effort but low-confidence tests. As the product matures, you need to move to high-confidence tests. This is accomplished by full-system testing, whether via the API, the user interface or both. The goal is to find a way to write high-confidence tests that are still low effort.
Microservice-Based Architectures Break Traditional Black Box Testing
Microservice-based architecture has dramatically altered the backend. When the testing pyramid concept was first introduced in 2009, most systems were just beginning to move to an API-based approach, which separated the frontend from the backend code. Most API calls were purely synchronous, and you could rely on the status code returned by a call to indicate the success or failure of the entire execution initiated by the call.
Fast forward to today. Most microservice-based architectures have a greater level of complexity, with various languages, asynchronous processes and message queues extensively used and developed in parallel. A call to an API may or may not be synchronous, and for the asynchronous API calls, the status code returned may just indicate success in placing the message on a message bus, not that the entire sequence of events it triggers functions properly.
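The gap between "the API answered" and "the work succeeded" can be sketched in a few lines. This is a minimal, illustrative simulation, not any real framework: `handle_create_order` and `process_order` are hypothetical names, and the in-process queue stands in for a real message bus.

```python
import queue

# Hypothetical async service: the API handler only enqueues work.
work_queue = queue.Queue()

def handle_create_order(payload):
    """API handler: enqueues the work, then answers immediately."""
    work_queue.put(payload)
    return 202  # "accepted" means "message queued", not "order processed"

def process_order(payload):
    """Downstream worker, normally run later by a separate service."""
    if "item" not in payload:
        raise ValueError("malformed order")  # the caller that saw 202 never sees this
    return "processed"

status = handle_create_order({"quantity": 2})  # note: no "item" field
assert status == 202  # a black box test on the response passes...

try:  # ...yet the downstream step fails anyway
    process_order(work_queue.get())
    downstream_ok = True
except ValueError:
    downstream_ok = False
```

A test that only checks the 202 response declares this flow healthy even though the worker rejects the message.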
In addition, by its nature, microservice-based architecture breaks systems down into more granular pieces, each focused on providing a service for a particular need. This vastly increases the interdependencies and interfaces between systems, and the number of systems involved in servicing one request. With this increase, the need for testing the combined, full system grows, as interfaces between services written by different teams are often a source of concern.
So how has testing the backend changed? Integration testing in 2009 was mostly black box testing. A test triggers an API and then asserts against the information contained in the response. Was the status code returned 200? Did we get the value we expected in the response body? These tests were inherently not much harder to write than unit tests; any added complexity and cost came from having to set up the fuller environment. These were high-confidence, fairly low-cost tests.
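A 2009-style black box integration test can be sketched in a handful of lines. Here `call_api` is a hypothetical stand-in for a real HTTP client call, and the canned response is purely for illustration:

```python
# Minimal sketch of a traditional black box integration test.
def call_api(path):
    # canned response for illustration; a real test would make an HTTP request
    return {"status_code": 200, "body": {"order_id": "abc123", "state": "created"}}

def test_create_order():
    response = call_api("/orders")
    # the entire test surface is the response the API hands back
    assert response["status_code"] == 200
    assert response["body"]["state"] == "created"

test_create_order()
```

Everything the test can know about the system arrives in that one response, which is exactly why this style breaks down for asynchronous flows.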
Today, however, the same integration tests begin to seem shallow and insufficient, not actually testing the complete flow, only providing a false sense of security. Questions that often need to be asked include:
- Was the message successfully pulled off the queue by the three downstream services?
- Did the third microservice successfully complete?
- How long did it take for all the downstream processing to complete?
- Why did the test fail?
To actually test the complete flow, more code needs to be written to assert the state of various parts of the infrastructure before and after the transaction is triggered. Instrumentation needs to be added so you can “see” or “observe” what is occurring. White box testing, often considered an antipattern, becomes necessary to provide the data needed for assertions that are meaningful and increase confidence.
These tests typically involve a senior engineer with enough experience to know all the services involved, and may involve multiple teams adding instrumentation and test rigs for critical functions to be observed and tested against. Full system tests such as these can take three to 10 days for a senior engineer with knowledge of the system to complete, and maintenance and changes are also time consuming. Creating these more robust tests results in a high-confidence test, but it is very costly and time consuming. As a result, the number of tests created is limited to only the most meaningful and covers just the critical paths.
In summary, microservice-based architectures have increased the need for integration testing against the fully connected system while reducing the confidence provided by black box testing.
Observability-Driven Development and Testing in Modern Distributed Applications
The complexity introduced by moving to asynchronous, message bus-based applications has not only affected testing, but also the ability to troubleshoot and monitor these systems. New methods to gain visibility into the system have been introduced, with the most important of these being distributed tracing. Distributed tracing allows the tracking of application requests as they flow from frontend devices to backend services and databases. It delivers to the site reliability engineering team a view of exactly what occurred when a particular API call was executed. With this visibility, identifying and resolving issues is much easier, improving mean time to repair (MTTR), increasing uptime and thus customer satisfaction.
Distributed Tracing Is the Power Behind Trace-Based Testing
Trace-based tests, similar to most integration tests, consist of a trigger and a set of assertions. What differentiates trace-based tests is the data available to assert against. Where a traditional integration test gathers and allows you to assert against the response of an API call, a trace-based test gathers both the response and the full distributed trace, and allows you to assert against any of the data. This allows you to add assertions such as:
- Assert that the call to an external system in the microservice four systems down returns a status code of 200.
- Assert that all the MySQL calls return in less than 100 milliseconds.
- Assert that a particular, or every, gRPC-based microservice returns a status code of 0.
- Assert that the database is written to by microservice “X.”
These assertions are simple, clean and allow comprehensive tests to be created, which increase confidence.
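The flavor of those assertions can be sketched against a trace. Spans are represented here as simple dicts for illustration; the attribute keys (`db.system`, `rpc.system`, `rpc.grpc.status_code`, `http.status_code`) follow OpenTelemetry semantic conventions, but the trace itself and the `spans_where` helper are made up for this example:

```python
# A captured trace, simplified to plain dicts.
trace = [
    {"name": "POST /orders", "attributes": {"http.status_code": 200}, "duration_ms": 45},
    {"name": "orders db",    "attributes": {"db.system": "mysql"},    "duration_ms": 12},
    {"name": "inventory db", "attributes": {"db.system": "mysql"},    "duration_ms": 80},
    {"name": "charge-card",  "attributes": {"rpc.system": "grpc",
                                            "rpc.grpc.status_code": 0}, "duration_ms": 30},
]

def spans_where(trace, key, value):
    """Select the spans whose attribute `key` equals `value`."""
    return [s for s in trace if s["attributes"].get(key) == value]

# All the MySQL calls return in less than 100 milliseconds.
assert all(s["duration_ms"] < 100 for s in spans_where(trace, "db.system", "mysql"))

# Every gRPC-based call returns a status code of 0.
assert all(s["attributes"]["rpc.grpc.status_code"] == 0
           for s in spans_where(trace, "rpc.system", "grpc"))
```

Because every assertion is just a selector over span data the system already emits, no extra test plumbing has to be built per service.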
Tomas Fernandez’s article has a section on how to increase confidence and reduce effort. His answer is to periodically reevaluate your tests against five characteristics:
- Installation: The effort involved in installing and setting up the test framework.
- Writing: The complexity of writing tests and the skill level of the developers for a given framework.
- Running: The difficulty of running the test suite and CI/CD performance.
- Debugging: How easy it is to find and fix a problem when a test fails.
- Maintenance: How much effort is required to maintain a test throughout the project’s lifetime.
Let’s look at trace-based testing and see how it fares for each of these categories. We will use Tracetest, an open source trace-based testing tool, as a basis for this analysis.
Installation
Tracetest can be installed in either Docker or Kubernetes. It connects to your existing distributed tracing solution, whether an open source solution such as Jaeger, Grafana Tempo or OpenSearch, or a vendor-provided solution such as Lightstep, New Relic, Honeycomb, Datadog or Elastic.
Writing Tests
This is where trace-based testing shines. With Tracetest, when you initially build a test, you define the test trigger, such as the API call you want to execute, and run the test with no assertions. You can then see a visual representation of the response and the full distributed trace, and begin to add assertions by interactively inspecting different areas of the process, selecting return data and adding assertions against it. The visual nature of the tool makes understanding the system and knowing where to add assertions easy, even for less experienced or less technical users.
Debugging
This is another area where trace-based testing excels. With every trace-based test, both the API response and the full distributed trace are captured. When looking at a failed test, you can see exactly which steps in the process failed a test specification. What engineer, upon seeing that a test failed, is not going to be happy to get a full trace showing exactly what occurred?
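A failure report built from a trace can be sketched like this. The span shape and the `failing_spans` helper are hypothetical simplifications, not any tool’s actual API; the point is that a failed test can name the exact broken step instead of just reporting "test failed":

```python
# Walk the spans of a captured trace and report which steps failed.
trace = [
    {"name": "POST /orders", "attributes": {"http.status_code": 200}},
    {"name": "charge-card",  "attributes": {"rpc.grpc.status_code": 13}},  # 13 = INTERNAL
]

def failing_spans(trace):
    failures = []
    for span in trace:
        code = span["attributes"].get("rpc.grpc.status_code")
        if code not in (None, 0):  # a non-OK gRPC status
            failures.append((span["name"], f"grpc status {code}"))
    return failures

print(failing_spans(trace))  # → [('charge-card', 'grpc status 13')]
```

The engineer reading this failure starts at the `charge-card` step rather than reproducing the whole flow from scratch.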
Maintenance
White box testing, a category trace-based testing falls into, looks at internal processes, so by its nature it is more susceptible to tests breaking when the internal details that assertions are applied against change. The ease with which trace-based tests can be revised, however, makes updating tests as flows or microservices change trivial.
So, Where Does Trace-Based Testing Fit?
Trace-based testing clearly fits in the integration area of the testing pyramid. For companies with complex microservice architectures, Function-as-a-Service-based techniques or any distributed, cloud native system, the current use of black box testing limits confidence that the entire flow is working. Black box techniques also fail to help determine which area of the flow is at fault when a test does fail, leaving the onus on the engineer to attempt to duplicate the issue and diagnose the problem.
Trace-based testing overcomes these handicaps by leveraging the work already invested in providing observability to your system via distributed tracing. Being able to test against this data makes your tests “see” the entire process and allows a test to actually verify the entire system. By doing so in a quick, declarative and visual manner, these tests are both high confidence and low effort, making trace-based tests advantageous for testing distributed applications.