Hunting Sasquatch: How to Find Intermittent Issues Using Periodic Test Automation
Tricentis sponsored this post.
In American folklore, Sasquatch is an elusive, ape-like creature infrequently seen at the boundaries of wilderness and civilization. In the software realm, we have our own version of Sasquatch: that irritating, difficult-to-find intermittent issue that occurs in a complex system and that software developers often don’t believe exists.
But the issue does exist, and your team needs to address it. As Paul Grizzaffi demonstrated in his talk at Accelerate San Francisco, periodic test automation can help.
What Causes Intermittent Issues?
Intermittent issues have increased in recent years due to the rise of continuous integration. This practice, in which developers integrate code from different branches earlier and more often, brings many benefits, including making most issues easier to address. However, continuous integration also makes it harder to catch intermittent issues that stem from quirky test environments, periodic maintenance and race conditions.
Among these scenarios, race conditions are the most common cause of intermittent issues and the hardest to defend against. A race condition occurs when a system’s behavior depends on the order or timing of two or more events, and that order is non-deterministic.
For example, a system may be expected to receive several events without being able to predict the order in which they will arrive. If its processing logic quietly assumes one particular order, it can work most of the time and fail only when the events happen to arrive differently. When this type of race condition strikes, we get an intermittent issue, a.k.a. a Sasquatch.
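To make the scenario concrete, here is a minimal, hypothetical sketch. The event names (`order_created`, `payment_received`) and the `OrderProcessor` class are illustrative assumptions, not taken from the talk. The handler silently assumes an order is created before its payment arrives, so whether it succeeds depends purely on arrival order.

```python
import random


class OutOfOrderEvent(Exception):
    """Raised when an event arrives before the event it depends on."""


class OrderProcessor:
    """Hypothetical handler that assumes events always arrive in one fixed order."""

    def __init__(self) -> None:
        self.orders: dict[str, str] = {}

    def handle(self, event: str, order_id: str) -> None:
        if event == "order_created":
            self.orders[order_id] = "awaiting_payment"
        elif event == "payment_received":
            # Hidden assumption: "order_created" was already processed.
            if order_id not in self.orders:
                raise OutOfOrderEvent(f"payment for unknown order {order_id}")
            self.orders[order_id] = "paid"


def process(events: list[str], order_id: str = "A1") -> str:
    """Feed events to a fresh processor; return "ok" or a failure description."""
    processor = OrderProcessor()
    try:
        for event in events:
            processor.handle(event, order_id)
    except OutOfOrderEvent as exc:
        return f"failed: {exc}"
    return "ok"


def simulate_deliveries(runs: int, seed: int = 0) -> int:
    """Shuffle the two events on every run to mimic non-deterministic delivery
    and return how many runs failed."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(runs):
        events = ["order_created", "payment_received"]
        rng.shuffle(events)
        if process(events) != "ok":
            failures += 1
    return failures
```

Calling `simulate_deliveries(100)` yields a mix of passing and failing runs (about half fail on average), while any single run may well pass. That is exactly how a Sasquatch slips past a one-off check.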
Why Are Race Conditions so Difficult to Test For?
Today’s software is complex, often relying on interactions with multiple third-party components. Events from these components can arrive in an unpredictable order, with numerous possible permutations and combinations. Because of this complexity, enumerating every possibility manually is, at the very least, a tedious and error-prone task. Automation is increasingly seen as a necessity for this and other tasks, especially as operations scale and grow more complex. However, this task is not easy to automate either, because race conditions are unpredictable by nature. Even if we could write a script that enumerates every possible sequence of events in a sufficiently complex system, the number of sequences grows very quickly (n independent events can arrive in n! different orders), so there are more cases to test than most teams have the resources to run.
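The arithmetic behind “so many different things to test” is easy to check. This is a minimal illustration, not something from the talk: n distinct events that can arrive in any order have n! possible sequences, and even two streams that are each internally ordered can interleave in C(a + b, a) ways.

```python
import itertools
import math


def orderings(events):
    """Yield every possible arrival order of the given events."""
    return itertools.permutations(events)


def count_orderings(n: int) -> int:
    """n distinct, independent events can arrive in n! different orders."""
    return math.factorial(n)


def interleavings(a: int, b: int) -> int:
    """Ways to interleave two streams of a and b events while keeping
    the order within each stream: C(a + b, a)."""
    return math.comb(a + b, a)


if __name__ == "__main__":
    for n in (3, 5, 10, 15):
        print(f"{n:>2} events -> {count_orderings(n):,} orderings")
    print(interleavings(5, 5), "interleavings of two 5-event streams")
```

Ten events already produce 3,628,800 orderings and fifteen produce 1,307,674,368,000, which is why exhaustive enumeration does not scale even when it can be automated.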
How Can You Find Intermittent Issues Like Race Conditions?
Continuous integration isn’t going anywhere anytime soon, and that means we need to find a better way to discover and handle intermittent issues caused by race conditions. The key to doing so lies in four steps:
- Run “on every deploy” scripts: Always test code on deployment to make sure that newly integrated work doesn’t break anything.
- Supplement those scripts with periodic automation: Periodically rerun all or some of your “on every deploy” scripts to check the results again, because the more you look for something, the more likely you are to see it. This practice increases your chances of seeing the elusive Sasquatch.
- Investigate every sighting: Every time something fails, you must dig in to determine why it failed.
- Maintain a good partnership between developers and testers: This partnership is critical to raising and resolving intermittent issues efficiently and effectively.
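The claim that “the more you look for something, the more likely you are to see it” can be put in numbers. As a simplifying assumption (not a claim from the talk), suppose an intermittent failure shows up independently on each run with some small probability p. The chance of seeing it at least once in n runs is then 1 - (1 - p)^n.

```python
import math


def chance_of_sighting(p: float, runs: int) -> float:
    """Probability that a failure with per-run probability p appears at least
    once in `runs` independent runs: 1 - (1 - p) ** runs."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return 1.0 - (1.0 - p) ** runs


def runs_needed(p: float, confidence: float) -> int:
    """Smallest number of independent runs for which the chance of at least
    one sighting reaches `confidence` (requires 0 < p < 1)."""
    if not (0.0 < p < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))
```

For a bug that surfaces on 1% of runs, a single on-deploy run catches it only 1% of the time, while 100 periodic reruns give roughly a 63% chance of at least one sighting, and about 299 runs are needed to reach 95%. Real failures are rarely fully independent from run to run, so treat these figures as a rough guide.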
Why Is Periodic Test Automation so Essential to Finding Intermittent Issues?
Periodic automation is good at finding intermittent issues because it runs on a time boundary rather than on an event boundary. While your “on every deploy” scripts run only when changes are made to the code, periodic automation runs at a set cadence, even if nothing has changed since the last run. Because it keeps exercising the same code while the conditions around it vary from run to run, periodic automation can find and isolate issues that occur only in certain circumstances.
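The difference between the two triggers can be sketched in a few lines. In this minimal example, `run_suite` is a hypothetical stand-in for your existing “on every deploy” scripts. The periodic runner calls it on a fixed cadence whether or not anything was deployed and keeps every result, pass or fail, with a timestamp so that each sighting can be investigated later. In practice, a scheduler such as cron or your CI system’s scheduled jobs would usually own the cadence.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunResult:
    started_at: float  # wall-clock timestamp (seconds since the epoch)
    passed: bool
    detail: str


def run_once(run_suite: Callable[[], None]) -> RunResult:
    """Run the suite once and record the outcome instead of stopping on failure."""
    started = time.time()
    try:
        run_suite()
        return RunResult(started, True, "passed")
    except Exception as exc:  # any failure is a sighting worth investigating
        return RunResult(started, False, repr(exc))


def run_periodically(
    run_suite: Callable[[], None],
    interval_s: float,
    iterations: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[RunResult]:
    """Time-boundary trigger: run on a fixed cadence, regardless of deploys.

    Start times are scheduled from a monotonic clock so the cadence does not
    drift with run duration.
    """
    results: List[RunResult] = []
    next_start = time.monotonic()
    for i in range(iterations):
        results.append(run_once(run_suite))
        if i == iterations - 1:
            break  # no need to wait after the final run
        next_start += interval_s
        delay = next_start - time.monotonic()
        # If a run overran its slot, delay <= 0 and the next run starts at once.
        if delay > 0:
            sleep(delay)
    return results
```

Passing a no-op `sleep` and an interval of 0 makes the loop easy to test; in production you would keep `time.sleep` and an interval such as an hour.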
What Are Best Practices for Using Periodic Test Automation to Find Intermittent Issues?
As valuable as periodic automation can be for finding intermittent issues, you must have the right framework in place for it to work properly. To start, follow these best practices:
- Focus on failures, but don’t ignore successes: Obviously you need to address failures, but successes are important too, as they can help you figure out why software works properly in certain situations but not others.
- Act quickly — time is of the essence: The longer you wait to address a failure, the more likely it is that the code or situation will change. As a result, when failures arise you need to act as quickly as possible.
- Keep noise to a minimum to avoid failure fatigue: Alert only the relevant people (e.g., those who integrated new code within a certain timeframe) about failures, and do so in a prominent but non-abrasive way so that they pay attention and act quickly when they receive alerts.
- Only test what you’re prepared to fix: If you’re not prepared to fix a failure (this can happen for numerous reasons), then you need to stop testing it. Otherwise, you will desensitize your team to failures that need to be addressed immediately.
- Trust your automation: Losing trust in automation is far from ideal, but it does happen. And when teams lose trust in their automation, they are less likely to review results, investigate and report issues or even find issues to begin with. To effectively find and resolve intermittent issues, you need automation you can trust.
Learn More About Finding Intermittent Issues Using Periodic Test Automation
Check out this session from Tricentis Accelerate 2019 featuring Paul Grizzaffi as he explains how to apply periodic automation to find issues outside of typical event boundaries, how the approach relates to High-Volume Automated Testing (HiVAT) and how it can help you avoid desensitization to failures.