The Process Equation (Cadence Is Everything, Part 2)
In the first part of this series, we looked at how a high iteration rate — or a high cadence — has been used to achieve remarkable accomplishments, even in the field of aerospace, where failure costs tend to be far, far higher and more combustible than in software.
One aspect that largely determines what is possible in spaceflight is the rocket equation, and it is inexorable: Any additional weight you want to carry into orbit requires more fuel, which is itself more weight, which requires more fuel, which is more weight, and so on.
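For reference, this feedback loop is what the Tsiolkovsky rocket equation expresses. The form below is the standard textbook one; the symbols are the conventional ones and do not appear in the original text.

```latex
\Delta v = v_e \ln\frac{m_0}{m_f}
\quad\Longleftrightarrow\quad
\frac{m_0}{m_f} = e^{\Delta v / v_e}
```

Here \Delta v is the required change in velocity, v_e the effective exhaust velocity, m_0 the initial mass including fuel, and m_f the final mass once the fuel is spent. The required mass ratio grows exponentially with \Delta v, which is why every extra kilogram is so expensive.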
Cycle times and overheads behave in a similar positive feedback loop, albeit with negative effects: When overheads increase, that invariably exerts pressure on developers to increase their batch sizes in order to keep the relative overheads in check, which leads to greater cycle times, which leads to more overheads, and thus creates more pressure on developers, and so on.
The solution is to concentrate on the other side of the equation: Maniacally prioritize removing process overheads and make sure our feedback cycles work. This sounds uncontroversial and easy: We all agree that waste is bad, so let’s eliminate it!
However, it turns out that the things that interfere with a high cadence are not just those that we obviously identify as waste, but also those that we regard as important, even critical to producing good work. The key is to also prioritize cadence in these cases, and that is enormously difficult. When these concerns come into conflict, we either need to make a hard choice to remove them in favor of the cadence or find creative ways of having them not impact cadence.
This is actual agility. It is highly effective, and when done right, can feel almost effortless, despite the fact that getting there might be hard. On the other hand, “Agile” and its rituals are mostly easy, but also at best useless.
In the traditional quality assurance mindset, product development produces … well, something, and QA then ensures that the product has the characteristics required for it to be delivered; it assures that the product actually is the product it’s supposed to be. Hence quality assurance. Giving that assurance with the required confidence requires an exhaustive test cycle, usually including manual elements.
This is a good thing.
However, it also pushes a software development organization toward longer iterations, because the QA cycle requires a certain minimum amount of time, so you probably can’t and certainly don’t want to run through it for a one-line code change. (Or maybe you do, as AT&T learned to their great cost when their phone network failed catastrophically in 1990 due to one line of code.) So developers will be motivated to increase the batch size of their code changes. Larger batch sizes, in turn, increase the likelihood of an undetected problem, thus decreasing certainty and increasing the need for thorough QA, which brings us to our positive feedback loop of ever longer cycle times.
The key to getting out of this vicious cycle is to take advantage of the same dynamics, just in reverse, by making batch sizes as small as possible, and to accept the fact that QA cannot actually assure quality (the code change that took down the phone network referenced above had gone through a thorough QA cycle). All it can do is increase our confidence in the code that is being shipped.
And small batch sizes definitely help with increasing our confidence: In the limit, if the batch is empty, we are pretty certain that the software will continue to function as before. The smaller and more frequent the changes, the smaller the chance that any one change will have unforeseen adverse effects. First, the change simply is smaller. Second, a smaller change is easier to assess. Last but not least, the higher frequency also means we do this more often, and so get better at it.
To make small batch sizes work with QA, you need to have automated tests, preferably arranged in the well-known testing hierarchy. The function of traditional QA is then no longer to test the product, but rather to validate the automated tests and anchor a set of nested testing feedback loops.
The unit tests run fast and continuously check the product, ideally with every build, but certainly with every check-in. Longer-running integration tests and automated UI tests check the quality of the unit tests. If they find a problem that the unit tests missed, the unit tests must be updated. Finally, manual QA is the backstop that periodically checks that the test suite is not missing anything.
Contrary to what you may have read in the Agile literature, the key to agility is the ability to change code quickly and safely. And the key to that is the ability to retest code quickly and effectively. Fast-running automated tests (“unit tests”) are the key to agility.
Jason Gorman (@jasongorman), April 18, 2020
Fast tests are one of the main drivers for achieving both high cadence and the courage that allows you to take greater risks, to put the things we’ve learned in previous iterations into practice. Once all your tests are green, you can check in your code and thus complete an iteration with some amount of feedback that tells you both that you’ve actually accomplished something and that you have not regressed.
Ideally, the tests are fast enough that you can do the test-driven development (TDD) cycle one test at a time: Write the test (red), make it pass (green), refactor (stay green), check in.
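As a minimal sketch of one such cycle, assume a hypothetical `slugify` function (invented here for illustration, not taken from this article) and Python's standard `unittest` module. The test is written first and fails (red), the simplest implementation makes it pass (green), and any refactoring afterwards has to keep it passing.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: turn a title into a URL slug.

    Step 2 (green): the simplest implementation that satisfies the test.
    Before this body existed, the test below failed (red).
    """
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    # Step 1 (red): this test was written before slugify() was implemented.
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Cadence Is Everything"), "cadence-is-everything")


def run_tests() -> bool:
    """Run the suite and report whether everything is green."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    # Step 3 (refactor) is only allowed while this stays green; then check in.
    print("green" if run_tests() else "red")
```

The point is less the code than the size of the step: one test, one small change, one check-in.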
This is probably the fastest iteration rate we can reasonably achieve in software.
But what if all your tests aren’t quite this fast? You can still make this work, first, by making sure you split your tests into unit tests that are localized and fast, and acceptance and/or integration tests that check the larger system. You can then treat only the unit tests as blocking and use the larger test suite as an asynchronous signal.
This means you might have to go back and revert (or fix) a change that you made a while ago if there is a failure, but as long as these types of test failures are rare — and they usually are — the tradeoff is well worth it.
What’s important to note is that the interesting case here is test failure, not the tests passing. Except for the test that you are currently trying to make green, all tests should be green essentially all the time, and if they aren’t, you need to fix that.
With this in mind, it is obvious that we can speed up the signal that we need from the larger test suite without having to speed up the test suite itself: simply run the failing test first!
Of course, for that we’d have to know what test will fail, and with prediction being difficult, especially about the future, we can’t know what test that will be. However, there are techniques for figuring out which tests are most likely to fail and running those first.
At GitLab, we have been using Fail Fast Testing ourselves to do just that, and we have also rolled out that feature to our premium customers. In addition, you can use brute force to run tests in parallel. (Your tests are independent, right?)
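To make the idea of running the likeliest failures first concrete, here is a minimal sketch of one possible heuristic. It is not a description of how GitLab's Fail Fast Testing works internally. The `HistoryRecord` type, its fields, and the scoring weights are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryRecord:
    """Hypothetical per-test history; all field names are assumptions."""
    name: str
    recent_runs: int
    recent_failures: int
    touches_changed_files: bool


def failure_score(test: HistoryRecord) -> float:
    """Heuristic likelihood of failure: a higher score means run earlier."""
    # Laplace-smoothed failure rate, so tests without history do not score 0.
    rate = (test.recent_failures + 1) / (test.recent_runs + 2)
    # Assumed weighting: tests that cover changed files are far more suspect.
    return rate + (1.0 if test.touches_changed_files else 0.0)


def prioritize(tests: list[HistoryRecord]) -> list[HistoryRecord]:
    """Order tests so that the most likely failures run first."""
    return sorted(tests, key=failure_score, reverse=True)


if __name__ == "__main__":
    history = [
        HistoryRecord("test_login", 100, 0, False),
        HistoryRecord("test_checkout", 100, 10, False),
        HistoryRecord("test_profile", 100, 1, True),
    ]
    for test in prioritize(history):
        print(test.name)
```

With this made-up history, the test touching changed files runs first, followed by the historically flaky one. The total suite runtime is unchanged; only the time until the first failure signal shrinks.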
Merge Requests and Asynchronous Feedback
The idea of making potentially slow feedback either very fast or asynchronous also applies to code review. Merge requests (MRs), also called pull requests, as commonly practiced in the industry today, are a bad idea.
The reason is that a pull request requires waiting for other developers to switch away from their current task, get into the context of your changes, review them, and leave meaningful feedback. Then you must incorporate the feedback and start the cycle anew, all the while also taking care of other developers’ PRs.
Regularly going through this process for one- or two-line code changes is obviously madness and would probably also make you very unpopular with your colleagues. So developers are strongly incentivized to make their MRs larger both in order to justify this overhead and also to be mindful of their colleagues’ time and effort.
But of course, larger PRs are more difficult to review, so not only do they take more time, they also produce an even greater reluctance to switch focus, increasing cycle times further, and so we’re back in our positive feedback loop of negative consequences.
PRs come from the open source world, where unknown collaborators might send patches that have to be reviewed carefully, at the very least because there is no trust relationship between the actors and communication is very asynchronous and distributed by default. In high-trust environments, such a defensive mechanism doesn’t really make sense, particularly when augmented with automatic checks that enforce review requirements.
Instead, code review can be done completely synchronously by pairing, similar to the hopefully instantaneous feedback provided by fast unit tests. Alternatively, code review can be performed asynchronously by default, with reviews occurring after the code has been checked in.
In most cases, such reviews should not find show-stoppers that must be removed, but rather yield suggestions for improvement that can be incorporated at your convenience. If this is not the normal case, that is, if most changes introduce such egregious problems that they should have been held up, there are probably serious team issues that need to be resolved.
Traditional Product Management and Design Processes
Having made it this far, you didn’t expect these to remain unscathed? You cannot have iterative development with a high cadence if your product management defines the specs and hands them over to design to create pixel-perfect mockups for engineering to implement. This process has a long lead-time without any feedback cycles, and thus without the compound improvement you get from iterating.
Iterations are for discovering the right specs and the best designs, for bringing real-world feedback to the process. They are not for implementing specs more quickly, and splitting a task into distinct packages is not iterating. Believing oneself able to find the right specs and designs without feedback is hubris.
Product, design, and engineering need to iterate together on the actual product. That way the specs, the designs, and the final product all benefit from feedback from the real world, rather than people’s imaginations. For such feedback-gathering to be possible, there has to be an initial product to iterate on, and so you need to get something out there really quickly, something that is not finished, not polished, not done.
Expect everybody to resist this. Engineers want specced-out tickets and designs that they can just implement and be done with. Designers want to deliver pixel-perfect design. Product managers want control over what comes out at the end before it is started, in order to match the roadmap.
Always Change a Running System, Cowboy
Rushing to code quickly, possibly without full understanding — isn’t that the dreaded cowboy coder mentality? Yes, it is, but at the same time, it is also the cornerstone of highly disciplined and effective agile approaches. How can it be both?
All the streamlining described so far is designed to enable fast iterations, to design the solution concurrently with building it while gathering and integrating feedback from earlier attempts, just like MacCready did with the Gossamer Condor and SpaceX is doing with the Raptor engines and Starship. When done right, it can obviously lead to impressive, almost magical results.
However, going to code early can also lead to quickly delivered, unmaintainable, and barely functional messes, aka cowboy coding. How it turns out depends largely on the willingness to use that ability to actually iterate. Probably the biggest enemy of this is the mentality of “never change a running system,” a mentality that is absolutely deadly in an Agile context.
What you want is pretty much the opposite: Not only should we have a running system as early as possible that is then modified as feedback from the environment is incorporated, but making changes without a running system that is capable of giving such feedback should also be highly discouraged.
How can you tell how you’re doing? If you’re doing exceptionally well, you will know, because you won’t even have to measure cycle times; they will be near instantaneous. For those of us still trying to get better, you can try to use the DORA metrics discussed in the previous installment. GitLab provides an API that automates tracking these and assists you in getting better.
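As a rough sketch of what measuring involves, two of the DORA metrics (deployment frequency and lead time for changes) can be computed from plain timestamps. This does not use GitLab's API. The function names and the sample data are hypothetical and only show the arithmetic.

```python
from datetime import datetime, timedelta
from statistics import median


def deployment_frequency(deploy_times: list[datetime], window: timedelta) -> float:
    """Average number of deployments per day over the given window."""
    return len(deploy_times) / (window / timedelta(days=1))


def median_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Median time from commit to production deployment.

    Each entry is a (committed_at, deployed_at) pair.
    """
    seconds = [(deployed - committed).total_seconds() for committed, deployed in changes]
    return timedelta(seconds=median(seconds))


if __name__ == "__main__":
    # Hypothetical sample data: three changes deployed within one week.
    changes = [
        (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11)),
        (datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 10)),
        (datetime(2024, 1, 3, 9), datetime(2024, 1, 3, 15)),
    ]
    deploys = [deployed for _, deployed in changes]
    print("deployments per day:", deployment_frequency(deploys, timedelta(days=7)))
    print("median lead time:", median_lead_time(changes))
```

If both numbers are hard to obtain at all, that is itself a signal about the state of the feedback loops.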
Conclusion: Keeping Not Just Your 10x-er Happy!
Another way of tracking your success is to see if your 10x developer is happy, as she feels the impact of process overhead much more than others. The reason is simple arithmetic: Let’s say we have processes in place that represent a 50% overhead for a normal developer like you and me. For an example task that takes two hours of coding, that will be one hour of overhead, so a third of the time spent.
A 10x developer, on the other hand, will get the coding done in 12 minutes, but the overhead remains the same at one hour. For them, the process overhead is five times the actual useful work, which is an exercise in futility and frustration. The more productive your developers, the more frustrating slow processes will be, and the more likely you are to lose them, irrespective of whether 10x developers exist or not.
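The arithmetic above can be checked with a short sketch. The function and constant names are illustrative and not from the original; the numbers are the ones from the example (two hours of coding, one hour of fixed overhead, a tenfold speedup).

```python
def overhead_ratio(coding_minutes: float, overhead_minutes: float) -> float:
    """Overhead expressed as a multiple of the useful coding time."""
    return overhead_minutes / coding_minutes


def overhead_share(coding_minutes: float, overhead_minutes: float) -> float:
    """Overhead expressed as a fraction of the total time spent."""
    return overhead_minutes / (coding_minutes + overhead_minutes)


# Numbers taken from the example in the text.
NORMAL_CODING_MINUTES = 120.0                     # two hours of coding
OVERHEAD_MINUTES = 0.5 * NORMAL_CODING_MINUTES    # 50% overhead, i.e. one hour
TENX_CODING_MINUTES = NORMAL_CODING_MINUTES / 10  # 12 minutes

if __name__ == "__main__":
    for label, coding in [("normal", NORMAL_CODING_MINUTES), ("10x", TENX_CODING_MINUTES)]:
        ratio = overhead_ratio(coding, OVERHEAD_MINUTES)
        share = overhead_share(coding, OVERHEAD_MINUTES)
        print(f"{label}: overhead is {ratio:.1f}x the coding time, {share:.0%} of total time")
```

For the normal developer, the overhead is half the coding time and a third of the total. For the 10x developer, the same fixed hour is five times the coding time and roughly 83% of the total.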
Shedding weight from your processes is simple, but like most simplifications, it can be very hard. Like the equipment in a rocket, each bit of process was put there by smart people for a good reason. But as with rockets, creative solutions are possible, and these get you from a positive feedback loop of ever more overhead to a positive feedback loop of lighter weight productivity containing ever tighter feedback loops improving your product. Try it, and watch your organization soar!