How to Deal with Flaky Tests

Thundra sponsored this post.

The main goal of building a CI/CD pipeline is to improve developer velocity. The more you can automate your integration and deployment process, the faster you can get new releases out the door. A test suite — a collection of tests that check for bugs that were introduced into your code — is a crucial part of such a pipeline.
Sometimes the tests can be flaky, which means they fail or succeed at seemingly random intervals without any code changes. You’ll need to run a test suite multiple times to determine whether you have a bug or a flaky test. This process slows down your CI/CD pipeline tremendously. As noted above, these pipelines should improve developer velocity, not hold you back.
If you have too many flaky tests in your suite, they can also wear down the trust you have in those tests. If one test fails at random, how can you trust that another test won’t do the same? Without confidence in your tests, your team could stop taking test results seriously and even stop writing them in the first place.
How to Spot a Flaky Test
There are two ways to get a feel for how flaky your tests are. One is to run a test, or even the whole test suite, multiple times. If you didn't change any code between runs and the suite reports a different number of failures each time, you can be sure that something has gone awry with your tests.
The other way is to run them in a different order each time. If your tests fail when their order is changed, it’s a sign that you haven’t accounted for inter-test dependencies.
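To make the reordering idea concrete, here's a minimal Python sketch with two hypothetical tests that share a mutable cache: the same pair passes in one order and fails in the other, which is exactly the signal of an inter-test dependency.

```python
import random  # a real harness would shuffle; here we run both orders explicitly

def make_tests():
    cache = []  # shared state both tests touch -- the hidden dependency

    def test_writes_to_cache():
        cache.append("item")
        assert len(cache) == 1

    def test_cache_is_empty():
        assert cache == []

    return test_writes_to_cache, test_cache_is_empty

def run(tests):
    """Run each test, counting assertion failures."""
    failures = 0
    for test in tests:
        try:
            test()
        except AssertionError:
            failures += 1
    return failures

write, empty = make_tests()
print(run([empty, write]))   # 0 failures: "empty" sees the pristine cache

write, empty = make_tests()  # fresh state for the second ordering
print(run([write, empty]))   # 1 failure: "empty" now sees leftover data
```

A randomized runner that shuffles `tests` before each run would surface this failure on some runs and not others, which is why order randomization is such an effective flakiness detector.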
Some tools can help with this. For Java developers, there’s a Gradle plugin that will rerun your tests to see if they’re deterministic. While this doesn’t save you time, it at least automates the process of finding flaky tests.
Spotify built a GitHub bot that can run tests for new code multiple times before a merge; you can start it manually on a pull request. While the bot is not publicly available, building a similar tool yourself is straightforward.
AVA is a test runner for JavaScript that runs test files in parallel, each in an isolated process, so tests can't depend on each other.
Another option is Thundra Sidekick, which enables developers to troubleshoot tests with non-intrusive debugging by letting them set breakpoints. In this way, developers can figure out if the tests are really flaky or not.
How to Address Flaky Tests
Now that you can find the flaky tests, you need to fix them. Here are seven highly effective strategies.
1. Visualizing Test Runs
Test-run visualizations alone can give you an idea of how well your tests work; combined with multiple randomized test runs, they also give you a clear picture of whether flaky tests are increasing or decreasing over time. A simple table, with rows showing runs over time and columns showing tests, can be enough for this.
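Such a table is easy to derive from run history. A minimal Python sketch, using made-up run data and hypothetical test names: any test whose result flips between runs without a code change is a flakiness candidate.

```python
# Hypothetical run history: one row per run (time), one column per test.
history = [
    {"run": "09:00", "test_login": "pass", "test_cart": "pass"},
    {"run": "10:00", "test_login": "fail", "test_cart": "pass"},
    {"run": "11:00", "test_login": "pass", "test_cart": "pass"},
]

def flaky_tests(history):
    """A test that both passes and fails across runs is a flakiness candidate."""
    tests = [k for k in history[0] if k != "run"]
    return sorted(
        t for t in tests
        if len({row[t] for row in history}) > 1  # more than one distinct result
    )

print(flaky_tests(history))  # ['test_login'] -- its result flips between runs
```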
2. Quarantining Flaky Tests
Once you’ve found your flaky tests, you should create a separate test suite for them to serve as a quarantine. Your non-flaky tests don’t have to be run multiple times, so creating an extra test suite will save you from duplicating part of the work. Google has even created a tool to help with this by automatically putting flaky tests in a separate test suite.
This practice will also help when fixing the flaky tests, because it allows you to focus on them independently. And if isolating the tests in their own suite fixes the flakiness, that change alone tells you that inter-test dependence was likely the cause.
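A quarantine can be as simple as tagging known-flaky tests and running the two groups separately. Here's a plain-Python sketch (test names are hypothetical):

```python
# Known-flaky tests, quarantined by name.
QUARANTINE = {"test_third_party_api"}

def test_core_logic():
    assert 1 + 1 == 2       # stands in for a stable unit test

def test_third_party_api():
    assert True             # stands in for a flaky, network-dependent check

ALL_TESTS = [test_core_logic, test_third_party_api]

def split_suites(tests):
    """Split tests into a trusted main suite and a quarantine suite."""
    stable = [t for t in tests if t.__name__ not in QUARANTINE]
    flaky = [t for t in tests if t.__name__ in QUARANTINE]
    return stable, flaky

stable, flaky = split_suites(ALL_TESTS)
print([t.__name__ for t in stable])  # the fast, trusted suite
print([t.__name__ for t in flaky])   # the quarantine: reruns allowed here
```

In a real pytest project, the same split is usually done with a custom marker, e.g. running `pytest -m "not quarantine"` for the trusted suite and `pytest -m quarantine` for the rest.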
3. Cleaning up State
Remove all state and data generated before a test run, so your test can’t be derailed by existing data you forgot about. This state can live in caches, databases or even variables. You’ll also want to check that your tests clean up correctly after they’re done — clean-up errors are often silently ignored in test suites. In a worst-case scenario, you’ll need to rebuild the whole system for every test run.
For databases, it can be helpful to use transactions. These can be rolled back after a test run, bringing the database back to the state it was in before the test was started.
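The transaction approach can be sketched with the standard library's sqlite3 (chosen here for brevity; the pattern is the same for any transactional database): everything the test writes is rolled back afterwards, restoring the pre-test state.

```python
import sqlite3
from contextlib import contextmanager

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.commit()  # the schema is the permanent baseline state

@contextmanager
def rollback_after_test(conn):
    """Wrap a test in a transaction and discard its writes afterwards."""
    try:
        yield conn
    finally:
        conn.rollback()  # undo everything the test wrote

with rollback_after_test(conn) as c:
    c.execute("INSERT INTO users VALUES ('alice')")
    count = c.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    print(count)  # 1 inside the test: it sees its own uncommitted write

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)      # 0 after rollback: the database is back to its baseline
```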
4. Looking for Timeouts
Asynchronous tests that access network resources are especially prone to flake due to timeouts. The network can be quick or slow depending on the number of services using it, so a timeout that is too short can cause a test to flake. Defining your timeout values in one central place lets you adjust them all quickly later.
If you have a complex test relying on asynchronous services, check the service for availability before starting the test. If the service is down, you can skip or fail the test immediately instead of waiting for a long timeout to expire.
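Both ideas can be sketched in a few lines of Python: the timeout values live in one dictionary, and a cheap TCP probe (host and port here are hypothetical) decides whether the expensive test is worth starting at all.

```python
import socket

# Single source of truth for every network timeout in the suite.
TIMEOUTS = {
    "connect": 2.0,
    "read": 5.0,
}

def service_available(host, port, timeout=TIMEOUTS["connect"]):
    """Cheap availability probe: can we open a TCP connection at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Decide up front instead of waiting out a long read timeout mid-test.
if service_available("localhost", 9999):
    print("run the integration test")
else:
    print("skip early: service unreachable")
```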
5. Using Test Doubles
If the service you're testing against isn't deterministic, you can replace it with a simplified stand-in, known as a test double. A common critique of this practice is that test doubles don't always accurately mimic the actual service, and they can drift further from it as the service is updated. Writing contract tests that verify the double still matches the real service's behavior can help mitigate this problem.
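Here's a minimal sketch using the standard library's `unittest.mock` (the price service and its `get_price` method are hypothetical): the double returns fixed values, so the test is deterministic and never touches the network.

```python
from unittest.mock import Mock

def checkout_total(price_service, items):
    """Code under test: sums live prices fetched from a remote service."""
    return sum(price_service.get_price(item) for item in items)

# The real price service hits the network and may flake; the double is fixed.
fake_prices = Mock()
fake_prices.get_price.side_effect = lambda item: {"apple": 2, "pear": 3}[item]

total = checkout_total(fake_prices, ["apple", "pear"])
print(total)  # 5 -- deterministic, no network involved
```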
6. Checking the System Clock
If your code depends on data that can’t be known in advance, such as the system clock, wrap these data sources in your code and don’t rely on them directly. This will allow you to replace their outputs with hard-coded data before running a test.
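A small Python sketch of the wrapping idea (the `is_weekend` function is a hypothetical example): the current time is injectable, so the test can pin it to a known date while production code falls back to the real clock.

```python
from datetime import datetime, timezone

def is_weekend(now=None):
    """The clock is injectable; production callers pass nothing."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current.weekday() >= 5  # Saturday=5, Sunday=6

# The test pins the clock to a known Saturday instead of relying on "today".
fixed_saturday = datetime(2021, 6, 5, 12, 0, tzinfo=timezone.utc)
print(is_weekend(now=fixed_saturday))  # True on every run
```

Without the injectable parameter, this test would pass on weekends and fail on weekdays, a classic source of flakiness.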
7. Checking for Memory Leaks
Profile your test code to get a feel for its memory usage over time. If your code has memory leaks, you'll see your test suite's memory usage grow with every test run. Depending on the available resources and other systems running on the same hardware, a memory leak could very well be the source of your flakiness problems.
If you use resource allocation pools as wrappers between your code and the actual memory allocation, requesting too much memory will cause your code to fail in a deterministic way. You can then try to fix the issue by lowering its memory allocation.
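The profiling step can be sketched with the standard library's `tracemalloc` (the leaky cache here is a deliberate, hypothetical example): memory in use climbs with every run instead of returning to a baseline, which is the signature of a leak.

```python
import tracemalloc

_cache = []  # a "leak": grows forever because nothing ever evicts entries

def leaky_test():
    _cache.append(bytearray(1024 * 100))  # ~100 KB retained per run

tracemalloc.start()
samples = []
for run in range(5):
    leaky_test()
    current, _peak = tracemalloc.get_traced_memory()
    samples.append(current)  # bytes currently held, sampled after each run
tracemalloc.stop()

# Strictly increasing samples mean memory never returns to a baseline.
print(all(b > a for a, b in zip(samples, samples[1:])))  # True
```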
Summary
Flaky tests aren’t trivial. Even if flakiness seems random at first glance, it’s important to keep in mind that pre-existing state, network problems, timing, or even memory allocation can all play a role in how well a test functions.
Luckily, there are viable methods to eliminate flakiness from your tests. Rerun your tests multiple times, change their execution order, and visualize how they succeed and fail over time. Finally, quarantine flaky tests into a separate test suite and try to fix them by looking at the potential root causes.
Flaky tests cost you time and money by slowing down your CI/CD pipeline with multiple reruns and eroding the trust your team has in testing. Thundra Sidekick is currently available as an IntelliJ IDEA plugin for Java applications. You can get started in just a few minutes and begin troubleshooting tests today, so your team doesn't lose any more time.
Featured image via Pixabay.