Cadence Is Everything: 10x Engineering Orgs for 10x Engineers
Cadence to the Stars, part one of a two-part series
I confess: Although I don’t believe in Bigfoot or Nessie, and do believe the moon landings happened, I am convinced that despite the current orthodoxies, 10x engineers very much exist and are a major positive force for the industry, and potentially your organization. If you can find one, convince her to work for you and keep her happy and productive (but I repeat myself).
Alas, finding one is not easy, and no, job adverts stating “We only hire the best” don’t help. However, what you can do is structure your development organization in a way that makes such a person productive.
Fortunately, making a 10x developer productive is pretty much the same as making your development organization productive for everyone, just dialed up to 11, particularly because an inefficient organization will affect a more efficient developer much more dramatically.
Unfortunately, this state appears to be neither natural nor stable.
Effective organizations are unnatural. The natural state of organizations is bureaucracy and turf wars, and once deprived of effective leadership they revert to their natural state with shocking speed.
— Paul Graham (@paulg) August 7, 2022
Similar to organizations in general, development organizations naturally tend toward inefficiency.
More specifically, development organizations tend toward ever-lengthening cycle times just as much as organizations in general tend toward bureaucracy. In both cases, this is always for good reasons. This is really important: If this tendency toward lengthening cycle times were just stupidity or laziness, it would be significantly easier to counter. Anthropologist and historian Joseph Tainter makes a similar point about civilizations, whose ever-increasing complexity leads to their collapse. Here as well, the complexity is not introduced willy-nilly but as a necessary response to problems the civilization faces.
The Sky’s the Limit
Software tends to be fairly abstract, but the principles of short cycle times are just as applicable in more down-to-earth disciplines, or should I say down to air? First, one of my favorites: the story of how Paul MacCready created the Gossamer Condor to win the first Kremer Prize for human-powered flight. More recently, Elon Musk’s SpaceX has been out-iterating NASA and the legacy spaceflight companies with results that would have seemed miraculous a couple of decades ago. Both examples show that while other factors are obviously important, cadence actually dominates them in short order.
MacCready had come into a bit of debt due to securing a friend’s business loan, and set his eyes on the first Kremer Prize for human-powered flight. This had gone unclaimed for 17 years, but not for lack of trying: There had been over 50 official attempts; all failed. It was a Very Hard Problem we couldn’t solve, so it obviously required the most aerodynamically efficient and sophisticated designs possible. So that’s what people did, and when their sophisticated plane inevitably crashed (after all, they were working on the edge of the possible), it took them a year or more to rebuild it.
MacCready approached this from the opposite angle: He would concentrate on a plane that didn’t have to be so efficient and sophisticated, but instead would fly low and slow, be light and very repairable, aiming for 12 crashes a day.
The Gossamer Condor was built out of some lightweight aluminum struts and mylar foil and could usually be repaired with Scotch tape. It was a weird contraption that didn’t look like it could fly.
Within a few months, the team had accumulated more flights, and more crashes, than the rest of the competition combined. With all that experience, they also understood the actual problems, such as how to steer, better than anyone else. They soon won the prize, which involved flying a mile in a figure eight.
Alan Kay – Normal Considered Harmful – YouTube
This wasn’t a one-off fluke either: The team went on to win the next Kremer prize as well, crossing the English Channel, and then pioneered solar flight and broke the SR-71’s altitude record. The company that came out of the effort nowadays makes drones, including the successful Switchblade drones for the U.S. military that have recently been sent to help in the Ukraine conflict.
The Sky’s Not the Limit
More recently, SpaceX has been demonstrating the efficacy of iterative development, first with the Falcon 9 rocket and now with the Starship program. While the latter hasn’t flown to space yet and so may still fail completely, both the aim and the achievements so far have been breathtaking, particularly compared to NASA’s Space Launch System (SLS), which was started around the same time and is designed to have similar capabilities, lifting around 100 tons to low earth orbit.
The NASA SLS is a cost-reduced version of the Constellation program, which was canceled early after quickly outgrowing its projected $150 billion budget. The reduced development cost of the SLS (so far $23 billion in 10 years) has been achieved by reusing not just designs, but also parts from the Space Shuttle program. Not just the solid rocket boosters but also some of the main engines are the actual parts that flew on shuttles and had been mothballed by NASA.
Despite this part reuse, launches of the fully expendable rocket are predicted to cost somewhat upward of $1 billion per pop. As of Oct. 20, there have been no flights of any of the hardware (except on space shuttles), and the first test launch scheduled for Nov. 26 will fly the full stack as designed.
In comparison, the Starship program is estimated to have cost $3 billion so far, with estimates of total development costs varying between $5 billion and $10 billion. This is for a completely new rocket, pretty much unlike any that have come before, designed for full reusability and same-day turnaround after refueling, completely new methane-burning engines, assembly-line production using relatively inexpensive materials and a projected cost target of $10 million per launch.
If they work as advertised, just a few Starships could turn the entire launch capacity of planet earth thus far into a footnote, a rounding error, and they plan to build a thousand of them. That’s why they’re building a factory to make them.
It’s anyone’s guess whether all this launch capacity, at costs two or more orders of magnitude lower than currently possible, is really for making humanity multiplanetary by establishing a Mars colony or “just” for making space-based production and asteroid mining feasible.
When asked, Elon Musk put it quite simply: “Any given technology development is how many iterations do you have and what’s your time and progress between iterations.”
The more quickly you can iterate, the more iterations you have available. But doesn’t iterating more quickly make the progress between iterations correspondingly less, canceling the effect? Surprisingly, that turns out not to be the case.
Elon Musk again: “So if you have a high production rate, you can have a lot of iterations. You can try lots of different things, and it’s OK if you blow up an engine because you’ve got a number of engines coming after that. If you have a small number of engines then you have to be much more conservative, because you can’t risk blowing them up.”
The higher iteration rate allows you to take more risks, which in turn allows you to push the boundaries more and thus gather more relevant feedback in each iteration, even as the shorter time frame reduces what you can do in each one. So there will be more failures, for example, engines blowing up or planes crashing. But as long as the failures provide the information they were supposed to provide, and the individual failure modes aren’t fatal, they aren’t actually failures.
You obviously don’t want to be cavalier about this, but accepting that risk allows you to push much farther per iteration. Musk also mentioned that this was one of the main problems of the Space Shuttle program: They couldn’t afford to have one blow up because even the first flight was manned.
“A high production rate solves many ills,” he said.
In software, the production rate is the iteration rate. If you have lots of iterations, it’s OK if one of them was a potentially high-value experiment that doesn’t pan out. If you have one iteration per year, you are less likely to want to take that risk, and your reluctance will be justified. The willingness and ability to take risks is captured in the Extreme Programming (XP) value of “courage.”
Compound Interest and Experience
The reason this works out is mathematical. If you iterate and actually use the feedback the iteration gives you to improve, you will improve a little bit each time because you will have learned something. For simplicity’s sake, let’s assume an improvement of 2% per iteration. This is like compound interest, and while it starts slow, once it ramps up it gives outsize returns, like any exponential.
Improve 2% per iteration and after three iterations, you will have improved by about 6%, which is essentially the same as a linear improvement. After 200 iterations, however, the linear approach will have improved by a respectable 400%, whereas the compounding approach will have improved by more than 50x.
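To make the arithmetic concrete, here is a minimal sketch comparing the two growth models. The function names are made up for this illustration, and the flat 2% rate is the same simplifying assumption used above, not a measured value.

```python
def linear_growth(rate: float, iterations: int) -> float:
    """Total factor if every iteration adds a fixed fraction of the starting level."""
    return 1.0 + rate * iterations


def compound_growth(rate: float, iterations: int) -> float:
    """Total factor if every iteration improves on the result of the previous one."""
    return (1.0 + rate) ** iterations


if __name__ == "__main__":
    # Compare both models at a few iteration counts, assuming 2% per iteration.
    for n in (3, 50, 200):
        print(
            f"{n:>3} iterations: "
            f"linear {linear_growth(0.02, n):.2f}x, "
            f"compound {compound_growth(0.02, n):.2f}x"
        )
```

At three iterations the two models are nearly indistinguishable. At 200 iterations the linear model reaches 5x the starting level, while the compounding one lands at roughly 52x.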
Apart from the purely mathematical, there is also the human factor: When we do things over and over again, we start to figure out how it works. We develop an intuition.
What the Science Says
The simplistic mathematical function is obviously not an accurate model of the real world, but the science has actually concluded that higher iteration rates are the single most important factor for the output of software development teams, at least according to the researchers. These findings have been published in the book “Accelerate” by Nicole Forsgren, Jez Humble and Gene Kim. The authors have since moved to Google as the DevOps Research and Assessment (DORA) team and make their findings available online.
In short, they found that performance of software teams correlates strongly with cycle times, with the lowest-performing teams having cycle times measured in months, medium performers in weeks, good performers in days and excellent performers in hours. There is also good evidence for the causality going from cycle times to performance, and not the other way around.
But there’s a deeper connection because the method of iterating on real-world feedback is really just the scientific method. No more, no less. It is somewhat surprising that in the field of software, we still often consider the scientific method as unruly and dangerous “cowboy coding,” and instead advocate for what is really little different from prescientific scholasticism as the proper approach to creating software.
To help us also be more scientific and data driven, the DORA team created metrics, called the DORA metrics. They are the following:
- Deployment frequency — How often an organization successfully releases to production
- Lead time for changes — The amount of time it takes a commit to get into production
- Change failure rate — The percentage of deployments causing a failure in production
- Time to restore service — How long it takes an organization to recover from a failure in production
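As a rough illustration of how the four metrics above could be computed from a deployment log, here is a minimal sketch. The `Deployment` record and the `dora_metrics` function are hypothetical names invented for this example, not part of any DORA tooling, and using medians for the two time-based metrics is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Summarize the four DORA metrics over an observation period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    failures = [d for d in deployments if d.failed]
    restore_times = [
        d.restored_at - d.deployed_at
        for d in failures
        if d.restored_at is not None
    ]
    return {
        # Deployment frequency: releases per day over the period
        "deployment_frequency_per_day": len(deployments) / period_days,
        # Lead time for changes: commit-to-production time (median)
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        # Change failure rate: share of deployments that caused a failure
        "change_failure_rate": len(failures) / len(deployments),
        # Time to restore service: failure-to-recovery time (median), if any
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }
```

Feeding this a list of `Deployment` records exported from a deployment pipeline and tracking the resulting numbers over time would be one simple way to see whether cadence is improving or slipping.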
The Dangers of Dead Reckoning
In reality, it is much more dangerous to stay away from actual code and real feedback from users for any length of time. For example, before GPS, ships used essentially two methods for navigation: dead reckoning and external fixes. With dead reckoning, you took a known position and applied the course, speed and known currents over time to come up with a new position.
However, despite the best equipment and methods, this method always introduces some error because the external factors cannot be known with certainty. And what’s worse, just like improvements accumulate and build on each other over time, so do these errors, making the position ever more uncertain over time.
When you are in the middle of the ocean, that might not be a huge problem, but it can be deadly close to shore, which is why the amphibious ships of the Royal Navy were required to use position fixing at intervals of a few minutes. With position fixing, you use the actual external environment, landmarks that you can triangulate to determine your position (and of course GPS is just a version of this, except using satellites for the fix instead of landmarks). This means you aren’t guessing where you are, you know where you are, and every new measurement clears the slate of any errors; there is no accumulation.
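The difference between the two navigation methods can be sketched with a toy model. It assumes a constant unknown drift per hour, standing in for the currents that dead reckoning cannot account for. The function name and the numbers are purely illustrative, and real navigation errors are of course not this tidy.

```python
def position_error(hours: int, drift_per_hour: float, fix_every: int = 0) -> list[float]:
    """Return the position error (in nautical miles) at the end of each hour.

    drift_per_hour: unmodeled drift that dead reckoning cannot account for.
    fix_every: take an external position fix every N hours (0 means never).
    The recorded value is the error just before any fix taken that hour.
    """
    errors = []
    error = 0.0
    for hour in range(1, hours + 1):
        error += drift_per_hour      # the unknown drift accumulates silently
        errors.append(error)
        if fix_every and hour % fix_every == 0:
            error = 0.0              # an external fix clears the slate
    return errors


if __name__ == "__main__":
    dead_reckoning = position_error(24, 0.5)
    with_fixes = position_error(24, 0.5, fix_every=4)
    print("worst error, dead reckoning only:", max(dead_reckoning))
    print("worst error, fix every 4 hours:  ", max(with_fixes))
```

Without fixes, the worst-case error grows with the length of the voyage. With regular fixes, it is bounded by the interval between them, which is the navigational version of a short cycle time.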
Slides don’t crash, and Jira is patient. You can have 100 tasks that are marked as 99% completed in your tracker of choice and still never ship anything to customers.
“Reality is that which, when you stop believing in it, doesn’t go away.” — science-fiction writer Philip K. Dick, “How to Build a Universe that Doesn’t Fall Apart Two Days Later”
In part two, the Process Equation, we will look at overcoming the forces that tend to push software engineering organizations toward higher cycle times and lower cadence.