Measuring Developer Productivity: Who’s Winning the Debate?
“So software development is undermeasured? No, it’s not, it’s one of the most measured and scrutinized endeavors in business history!”
So said Daniel Terhorst-North, a software consultant, who talked to The New Stack about McKinsey’s recent article that argued yes, you can measure software developer productivity.
The piece has been controversial since it was published in late August, with many influential figures in software engineering objecting to the management consultancy’s claims. Terhorst-North himself wrote a piece in response that highlighted what can be missed when productivity in software development is framed in a narrow and unreflective way.
“Just don’t try to measure the individual contribution of a unit in a complex adaptive system,” he wrote, “because the premise of the question is flawed.”
The strength of feeling the McKinsey article has inspired warrants further inspection. It appears to have hit a real nerve. Terhorst-North’s blog post doesn’t seem to have exorcized his feelings, for instance; he’s published a more in-depth rebuttal.
But why has it hit such a nerve? And if there are some real problems with the arguments that are made in it, maybe it’s a starting point for prompting a more positive conversation about what the work of software developers actually involves.
What’s the Context?
Although the conversation around developer productivity feels particularly current, the debate itself isn’t new. Martin Fowler, one of the signatories to the Agile Manifesto, wrote a short piece on his website 20 years ago called “CannotMeasureProductivity.”
“I can see why measuring productivity is so seductive,” Fowler wrote. “If we could do it we could assess software much more easily and objectively than we can now. But false measures only make things worse.”
This idea that false measures only make things worse is nicely expressed by the concept of Goodhart’s Law — the idea that when you optimize for a specific metric (like, say, lines of code) you get strange and unhelpful consequences.
The idea precedes software engineering and is actually rooted in an economic discussion about monetary policy in the UK in the 1970s, demonstrating that anxiety about measurement was a long-running issue of the late 20th century.
The lesson of Goodhart’s Law is that blindly following a metric leads to bad results. Yvonne Lam, a software engineer who has worked for Kong, a platform API company, and for Chef, a DevOps tool vendor, told The New Stack that the McKinsey piece doesn’t really address this.
“The question of why would you want to measure developer productivity is very much up in the air in that article,” Lam said. “There’s just always a moment when you talk about measurement where it’s like, OK, Goodhart’s Law has entered the chat.”
The issue of productivity and measurement is, of course, an enduring question of management. But Kent Beck, who wrote one of the most widely shared responses to McKinsey in collaboration with Gergely Orosz, creator of The Pragmatic Engineer newsletter, told TNS over email that something specific to today needs to be acknowledged to understand where McKinsey is coming from: the current economic situation and, more specifically, rising interest rates.
“No money for tech [means] execs getting antsy about return on investment,” Beck wrote TNS. “Rising interest rates have thrown additional scrutiny on all investments. Three to four years ago nobody cared, because money to just hire a bunch more programmers was basically free.”
One way of viewing McKinsey’s stance on developer productivity, then, is that the consultancy is well aware of this wave of corporate anxiety and looking to exploit it rather than addressing the more subtle and complex issues of software development productivity.
“Yes, they are trying to make a buck, trying to shift to tools in the face of declining consulting revenue,” Beck wrote in his email. “But they also kind of believe what they are selling.”
However, others aren’t so sure. “It’s always better to assume incompetence than malice,” Terhorst-North said.
In a surprising detour in our conversation, he mentioned Virginia Satir, the pioneering family therapist who introduced the idea that a child’s behavioral issues are symptoms of their family life. The relevance here, he said, is that “if you fix the child and send it back into the same system … you’re gonna end up where you started.”
He frames this in terms of the recruitment of software developers. “If your recruiting process isn’t completely broken, then you’re hiring good people; if good people aren’t performing well in your organization, what in the system is causing them to underperform?”
If you tackle those things, he continued, “you do two things. One is you fix them — you make it easy for them to do work. The other is if you put someone else in their position, they will have the same symptoms.” McKinsey is offering possible solutions but isn’t really getting its readers to think about the context — the “family situation” in therapy terms.
Tackling the Knowledge Gap
A fundamental aspect of the McKinsey piece is that there is a knowledge gap between business leadership and technologists. As the authors (Chandra Gnanasambandam, Martin Harrysson, Alharith Hussin, Jason Keovichit, and Shivam Srivastava) state in their introduction, “The long-held belief by many in tech is that it is not possible to [measure developer productivity] correctly — and that, in any case, only trained engineers are knowledgeable enough to assess the performance of their peers. Yet that status quo is no longer sustainable.”
This shouldn’t be a controversial point: the idea of insiders and outsiders in an organization — in any domain or function — is outdated and regressive at a time when DevOps, FinOps and citizen developers are part of the mainstream. However, one of the problems with how the McKinsey article frames the issue is that it appears to lay the blame on technologists rather than seeing it as a problem of communication and translation.
“I think we can all think back to a time where we worked in a team where the senior managers either didn’t know anything about our work, or were drastically out of date,” Steve Fenton, software engineer at Octopus Deploy, a DevOps automation company, told The New Stack. “While it’s not fair to say this is true in all C Suites, there are likely to be some that would benefit by updating their knowledge of software delivery.”
He added that a “collaborative approach is needed where practitioners find ways to communicate across this apparent divide.”
Lam made a similar point. “It can be really hard to make your work legible as a software developer,” she said, adding, “We have to be able to explain our work to people who don’t know anything about our work.”
The McKinsey article is able to identify this knowledge gap, but it fails to really outline how business leaders and technologists can work together. The argument almost seems to be that they just can’t, so must therefore buy McKinsey’s consulting time.
Beck is blunt: “I’ve been writing about better ways of thinking about the situation for 30 years. Near as I can tell the folks with power don’t like my answers because they would feel uncomfortable and they would need to learn new skills.”
What Can Practitioners Do?
Much of the discussion about productivity is framed as an argument about management — what can be measured, tracked and optimized. This means that the perspectives and experiences of practitioners can get lost.
However, the question as to what software engineers can actually do appears to be incredibly complex and, indeed, fraught. While on the one hand, Fenton, of Octopus Deploy, stressed the importance of communication across boundaries (“We need to spend some time understanding what the C-Suite need.”), Beck suggested almost the opposite: “Set boundaries. Hard, if necessary.”
Perhaps there is truth in both perspectives: yes, it’s important to engage with the needs and thinking of those in other domains and to, as Lam said, think about the ways in which work is made “legible.” But it’s also important to make sure that the practice and process of designing and building software is not misunderstood or reduced to being little more than writing lines of code.
“There is a long-held belief that coding is the technology equivalent of building,” Fenton said. “In fact, coding is more like the design phase of a building project, rather than the construction phase. This is the key mindset shift needed. You’d never judge the throughput or completeness of an architectural drawing by counting the number of lines drawn on a blueprint.”
This regressive way of thinking is nicely exemplified by the McKinsey writers’ notion of software development consisting of an “inner loop” and an “outer loop.”
The inner loop ostensibly comprises the fundamental elements of software development: build, code and test. The outer loop, meanwhile, includes activities that McKinsey seems to suggest are peripheral to “real” software development work: meetings, deployment at scale, security and compliance, and integration.
The problems are obvious — anyone who has worked in software over the last decade will be well aware that efforts have been made to integrate both of these loops as closely as possible, rather than pulling them further and further apart. As Terhorst-North said, “the thinking and the collaborating and the whiteboarding and the arguing with each other and the trying five different things is the inner loop.”
So, perhaps it’s not just a question of making work more legible but also changing the way it’s understood. “The whole point of the article was that the best developers should be doing rather than thinking or supporting,” he added. It’s this idea that needs to be challenged — the notion that building software is all about action and delivery.
There are a number of reasons for this misunderstanding, but Lam pointed out that what makes developer work particularly hard is the level of fragmentation and change with which developers have to reckon.
She stressed the importance of paying attention to the systems and tools in which developers are embedded and by which they are enabled.
“We need to build some porcelain, right?” Lam said, nodding to the metaphor of software infrastructures being like plumbing. “We need to build layers so that people can do their work … slowing the appearance of change to the level where people are just able to deploy their thing, or deploy their pipeline or whatever.”
Shifting things, then, seems like it may involve reframing productivity.
“It’s just unknowable,” Beck said. “It would be like looking at a sports team with an unbeaten record and complaining that one player wasn’t scoring as much as they should.”
Terhorst-North, meanwhile, suggested thinking more in terms of happiness: “You can measure productivity by looking at how happy someone is, right? Because they’re willing to do their best work.”
He agreed with Beck, though: “Contribution analysis is a crock.”
But how far does this really get us? Isn’t happiness just another vague measurement? Aren’t we in danger of perpetuating the kind of criticisms from McKinsey, that technologists are unwilling to meet business leaders in understanding their needs? Possibly, but perhaps it just requires more honesty and clarity about what building software actually involves.
However, Fenton highlighted that thinking about emotional state and psychology can have a real impact on how we think about developer productivity. In his blog response to McKinsey, he noted the importance of psychological safety, citing a Google study to support the point.
“There are only so many hours in a day, so a direct cost of low safety is the time spent creating evidence of work,” he said to The New Stack. “Something you might do after a quick conversation becomes an exercise in making statements of intent and obtaining sign-off.
“For example, if you send or receive emails confirming conversations and asking for permission to continue as agreed, two people are wasting time on the artifacts of low trust.”
For Fenton, the keys to greater developer productivity are obvious. We shouldn’t just think of productivity in terms of measuring output (like lines of code), but rather about what we can do to minimize friction in people’s day-to-day work — ensuring mutual trust so people can get on with solving problems.
This way of thinking is attractive and could be cause for optimism in this long-running debate. There is surely a way to think about productivity that is not punitive and activity-centered but instead centered on the needs of people actually doing the work.
“I think that a lot of what we’re sort of fumbling towards is like love,” Lam said. “How do we care for each other professionally? What kinds of things help us do our jobs better?”
This is the direction the conversation needs to go in. While that might be difficult given economic anxiety and uncertainty, if there’s a way forward it could be a lot worse than being informed by love.