Google Says You Might Be Doing DORA Metrics Wrong

We know 2023 saw staff cutbacks across the tech industry, which can negatively affect both job satisfaction and work output as suddenly smaller teams get more stressed. Add to that the fact that customer demands and technological complexity continue to grow at an impressive rate. So no doubt a lot of people in tech were eagerly waiting for Google’s DevOps Research and Assessment (DORA) team to finally release the State of DevOps Report 2023.
If for no other reason than to get a pulse on how the tech industry is actually doing. Because the way we work matters now more than ever. But a lot of organizations, large and small, use this report to check in on themselves: a way to benchmark how they’re doing against others and how they measure up against their past selves.
So what did the almost 3,000 respondents to this year’s DORA report say? And how should organizations that aren’t the size and sprawl of Google interpret and apply those famous DORA metrics? The New Stack sat down with two members of the report’s core team — Nathen Harvey, developer advocate and lead for DORA at Google Cloud, and Michelle Irvine, technical writer — to work out, nine years in, what the DORA metrics are and what software development teams of all kinds can gain from them.
What’s the Point of the DORA Report, Anyway?
Since 2014, the State of DevOps Report — fondly referred to as the DORA report or DORA metrics — has examined the people, processes and technical capabilities that drive performance. It remains program- and platform-agnostic and has grown to include about 50 questions, which ask respondents to focus on what they see at the level of the application or service they work on.
This year, 72% of respondents were folks who work on either development or engineering teams.
DORA is typically associated with its four main software delivery performance metrics — deployment frequency, lead time for changes, change failure rate, and failed delivery recovery time (previously called mean time to recovery, or MTTR) — but its key outcomes encompass a lot more across the sociotechnical spectrum:
- Organizational performance.
- Team performance.
- Software delivery performance.
- Operational performance.
- Reliability.
- Burnout.
- Productivity.
- Job satisfaction.
The final three — burnout, productivity and job satisfaction — are how DORA measures well-being. If you’re only measuring those core four software delivery performance metrics, you’re missing out on measuring a lot of the human aspects of tech organizations.
Collectively, these “key outcomes are the goals that we believe people, teams, or organizations are striving to either reach (organizational performance, for example) or avoid (burnout, for example),” the report reads.
The State of DevOps Report 2023 then examines core capabilities, which drive higher software delivery and organizational performance:
- Code maintainability.
- Continuous delivery.
- Continuous integration.
- Database change management.
- Deployment automation.
- Empowering teams to choose tools.
- Flexible infrastructure.
- Loosely coupled architecture.
- Monitoring and observability.
- Shifting left on security.
- Test automation.
- Test data management.
- Trunk-based development.
- Version control.
- Well-being.
“It’s things that we’ve researched multiple years,” Harvey explained, “and things that we see are meaningful and providing good insights for folks.”
But if you look through the questions, the DORA team never asks anything outright like: Do you practice continuous integration? Because that could mean something different to everyone.
“In our research in the survey itself, we never use the term continuous integration. We avoid those terms and ask instead about the characteristics. So we put a Likert scale [like rate from 1 to 5] in front of an individual and say, do you strongly disagree or strongly agree? ‘Automated test failures will block commits’ progress through the pipeline. Automated tests are executed at least daily,’ and then the individuals respond to those characteristics,” Harvey explained to The New Stack.
“In our report, we summarize those questions and give them a label — in this case, continuous integration — because we want to stay away from those industry terms like DevOps — because what is DevOps?”
“One of the things that we found every year of this study is that those two things — throughput and stability — are not trade-offs of one another. You’re either fast and stable, or you’re slow and you’re unstable. And that’s a consistent finding that we’ve had for the decade that this research program has run.”
— Nathen Harvey, Google
The outreach for survey respondents is what Irvine called snowball sampling, where they reach out to previous respondents and across the usual socials, newsletters and websites. The DORA team aims for global reach across a wide range of organizational sizes and sectors, she emphasized, because while the annual survey focuses on software development, that doesn’t mean just the tech industry — after all, almost all organizations have a significant stake in technology nowadays.
In fact, organizations don’t even have to give Google their data — the DORA team doesn’t save any identifying information anyway. Instead, they can run the full 50-ish questions of the survey themselves, or just take the DORA QuickCheck, which focuses only on the core four metrics.
And while companies are always going to compare themselves against others and strive for elite status, that isn’t the main goal of this survey and subsequent report.
“The value of this is really like benchmarking and comparing it against yourself. And focusing on that continuous improvement. Because year over year, we do focus on these four key metrics because they just keep performing in the way that we’ve been seeing, and it’s just very meaningful from our data,” Irvine told The New Stack. “You measure those four key metrics and use that as a benchmark against yourself to see if the experiments you’re running or changes you’re making are having awful impacts or if they are moving the needle in a way that is positive for your team.”
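For teams that want to run that self-benchmark continuously rather than once a year, the core four are straightforward to compute from deployment and incident records. Here is a minimal, illustrative sketch; the record format, field names and sample data are assumptions made for the example, not anything DORA or Google prescribes.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deployment records: when the change was committed, when it was
# deployed, whether it caused a failure, and when that failure was resolved.
deployments = [
    {"committed": datetime(2023, 10, 2, 9), "deployed": datetime(2023, 10, 2, 15),
     "failed": False, "restored": None},
    {"committed": datetime(2023, 10, 3, 11), "deployed": datetime(2023, 10, 4, 10),
     "failed": True, "restored": datetime(2023, 10, 4, 12)},
    {"committed": datetime(2023, 10, 5, 14), "deployed": datetime(2023, 10, 5, 18),
     "failed": False, "restored": None},
]

period_days = 7  # window covered by the sample records above

# Deployment frequency: deployments per day over the period.
deployment_frequency = len(deployments) / period_days

# Lead time for changes: median time from commit to deploy.
lead_time = median(d["deployed"] - d["committed"] for d in deployments)

# Change failure rate: share of deployments that needed immediate remediation.
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)

# Recovery time: median time to restore service after a failed deployment.
recovery_times = [d["restored"] - d["deployed"] for d in deployments if d["failed"]]
recovery_time = median(recovery_times) if recovery_times else timedelta(0)

print(f"Deployment frequency: {deployment_frequency:.2f}/day")
print(f"Lead time for changes: {lead_time}")
print(f"Change failure rate: {change_failure_rate:.0%}")
print(f"Failed delivery recovery time: {recovery_time}")
```

Tracked week over week, those four numbers give exactly the “benchmark against yourself” view Irvine describes, showing whether an experiment moved the needle for your own team.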
Over the last nine years, the DORA team has also thrown other questions into the mix that they suspect will drive the desired key outcomes, which, she said, helps organizations with “identifying in your own team and your own context what you can work on, what might be a bottleneck, what you have, what is in your control, and what you might be able to play around with.”
Do Engineering Teams Even Understand DORA?
For a recent The New Stack VoxPop — our weekly five-second anonymous polls that pop up on our site — we asked: Are DORA metrics effectively used at your organization to measure developer team productivity? Among 541 respondents, just 14% are confident in their organization’s use of DORA metrics, while 30% are loading their backpacks for an adventure with Dora the Explorer. This quickie poll was inspired by a popular piece TNS colleague Lawrence E. Hecht wrote, entitled “Despite the Hype, Engineers Not Impressed by DORA Metrics,” which cited the Engineering Team Performance Report released last month by LeadDev and Swarmia. That report found that 29% of engineers surveyed said they didn’t know enough about DORA metrics to comment on whether they’re effective or not.
So when we got a chance to sit down with some of the researchers of this year’s DORA report, we had to ask for their response.
“I would posit that you could rephrase the question: Are metrics used effectively at your organization to measure developer productivity? You’ll likely see nearly the same results,” Harvey remarked. “DORA is a lot more than four metrics. The research program itself is much more than four metrics. Of course, it’s well-known for those four key metrics. And who are those four key metrics most for? They’re for your leaders and they can be easily abused by your leaders. They are also for us, the practitioners, and can be easily abused and misused by the practitioners.”
Of course, there’s a self-fulfilling aspect to all this. Those who want to measure DORA and respond to the annual DORA survey will be more likely to care about the survey results. And if they’re working to meet these capabilities, they would likely score higher.
“On the one hand, I think that the people for whom those metrics are best suited are the teams that are doing the work, and [DORA metrics are] best used to help guide: How do we improve on my team? What are the things that I can do better? That we collectively as a team can do better?”
Harvey continued that there are organizations that should be interested in DORA, but don’t self-identify as DevOps organizations and self-select out, like “People in the world that say, ‘DevOps is dead. I’m a platform engineer now.’ The truth is that people that identify as platform engineers can get a lot of value out of this research,” he explained. “The things that they want to do as a platform engineer is help improve how their teams are delivering and operating software. Guess what helps with that — the same capabilities that we’re looking into.”
Similarly, those who don’t care about the values in the survey are likely not releasing reliably at a steady cadence, nor worrying about things like developer productivity and overall well-being.
The main concern — and why some engineers wouldn’t know DORA the report from Dora the Explorer — is that, like all metrics or goals (as we know with OKRs), they’re only valuable when they’re communicated from the top down. DORA isn’t just numbers; it’s a culture.
In the end, so much of this comes down to the leadership and how they empower software delivery teams.
Key Findings from the State of DevOps Report 2023
“Without a healthy culture, none of this matters.” Harvey drove home one of the most important findings of this year’s survey — but certainly not the only one.
The Vast Gap Between Low and Elite Performers Increases
Of course, those core four DORA metrics still matter. This year, for the first time, the DORA team moved from offering answer ranges (like 0 to 15) for change failure rate — when you have to fix something right away — to a full range of whole numbers from one to 100. With this new, more precise scale, they found that the best performers averaged a 5% change failure rate, while the worst averaged 64%.
“Think about this, they’re deploying between once a week or once per month. So if you’re only deploying 12 times a year, six or seven of those are failing. Failing in a way that you have to fix them right away,” Harvey said.
Of course, 5% can still feel like a lot if you’re deploying hourly, but, he emphasized, “If you’re deploying 300 times a year and 5% of those fail, it’s very different than if you’re deploying 12 times a year and 64% of those fail.”
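A quick back-of-the-envelope calculation, using the illustrative cadences from Harvey’s comparison rather than data from any real team, shows why the absolute failure counts matter less than the combination of throughput and stability:

```python
# Illustrative profiles from Harvey's comparison, not data from any real team.
profiles = {
    "fast and stable": {"deploys_per_year": 300, "change_failure_rate": 0.05},
    "slow and unstable": {"deploys_per_year": 12, "change_failure_rate": 0.64},
}

for name, p in profiles.items():
    failed = p["deploys_per_year"] * p["change_failure_rate"]
    succeeded = p["deploys_per_year"] - failed
    print(f"{name}: ~{failed:.0f} failed deploys/year, ~{succeeded:.0f} successful")

# fast and stable: ~15 failed deploys/year, ~285 successful
# slow and unstable: ~8 failed deploys/year, but only ~4 successful
```

The slower team has fewer failures in absolute terms, but it also ships almost nothing successfully, which is exactly the throughput-versus-stability point the research keeps landing on.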
“Everything that we talk about, these outcomes feeding into software delivery performance, all of that feeds into organizational performance, like revenue and customer satisfaction.”
— Michelle Irvine, Google
Now you could be tempted to think that companies that release less often have more time to make sure they get everything right, and thus a lower change failure rate. But that command-and-control, waterfall mindset loses out every time: the orgs with the highest failure rates also had a significantly slower release cadence.
“One of the things that we found every year of this study is that those two things — throughput and stability — are not trade-offs of one another. You’re either fast and stable, or you’re slow and you’re unstable,” Harvey said. “And that’s a consistent finding that we’ve had for the decade that this research program has run.”
And while any metric can be gamed, gaming the DORA metrics may actually not be a bad idea, he continued. If you work to shorten your change lead time, “That’s going to improve all of your other metrics as well and it’s going to be good for the entire organization.”
Establish a Generative Culture
This year’s report, more than ever, emphasized the importance of psychological safety to performance. In fact, generative organizational culture — one grounded in high trust and information flow — was added as a core capability this year. The report found that teams with a generative culture experience 30% higher performance, with a dramatic increase in productivity and job satisfaction, alongside a decrease in burnout.
Build with Users in Mind
One key finding of DORA 2023 was that teams that focus on the user — whether external customers or internal platform customers — experience 40% higher overall performance.
As Harvey put it, it’s an ethos that focuses on: “What is the user actually trying to accomplish? And are we actually thinking about the user and using their experience to help guide what we’re going to build next? This is no longer about just ‘Are we building fast?’ But, ‘Are we building the right thing? Are we building with users in mind?’”
These user-focused organizations also experienced a 20% jump in overall job satisfaction.
Code Review Speed Impacts Everything
Another notable result: teams with faster code reviews saw a staggering 50% increase in software delivery performance.
“Code reviews is sort of the culmination of people, process and tools,” Harvey remarked. “Just think for a minute on how code reviews impact the people on your team, and who’s doing them, whose changes get more scrutiny than others. How is the tooling supporting you? And how does your process support you?”
Devs (Still) Want Docs
In an astounding result, DORA 2023 estimated that quality documentation can amplify the impact of technical capabilities on organizational performance by as much as 12.8 times.
This is the third year that the report has looked at internal documentation, which they’ve found has an impact on “different cultural, well-being outcomes, and technical capabilities, for sure. But then it also interacts with the technical capabilities to amplify their impact on organizational performance,” Irvine explained. “Everything that we talk about, these outcomes feeding into software delivery performance, all of that feeds into organizational performance, like revenue and customer satisfaction.”
The Wide-Ranging Potential of AI
This year’s DORA report also uncovered that respondents from underrepresented groups experienced 24% more burnout. This is driven in part by the fact that women and members of other underrepresented genders report doing 40% more repetitive work than men.
While these last three findings about code reviews, docs and burnout due to repetitive work may seem unrelated, each is an area where AI and automation have high potential to dramatically improve both the quality of code and the quality of work. Yet this year’s report found that, on average, only a third of respondents saw AI as making an important contribution to technical tasks. This will undoubtedly increase next year.
We already know that AI has dramatically increased knowledge sharing, which in turn increases flow and unlocks developer productivity, while also increasing job satisfaction and decreasing burnout, frustration and wasted time. It stands to reason that safe, domain-specific AI will continue to enhance these benefits, as early use cases already show its ability to boost developer productivity.
Why Does Google Bother with DORA?
One last question that felt important to explore was why Google invests so much in this report — and why it bought DORA five years ago. One reason is that there’s now a DORA community that works together to enable a culture of continuous improvement. But it’s more than that.
“I think from a very commercial perspective, those teams that are getting better at delivering and operating software and happen to be doing so on Google Cloud, are going to consume more Google Cloud, but I would say that is absolutely secondary,” Harvey remarked. “At the end of the day, I think that having teams that are better able to deliver and operate software helps all of us like a rising tide lifts all ships.”