How Google Unlocks and Measures Developer Productivity
The era of rapid growth is on hold, leaving engineering teams trying to do more with less. Tech giant Google isn’t immune, having laid off 6% of its staff last January. And no matter where you work, tighter customer budgets are driving greater demand to release differentiating features faster.
Unlocking productivity for one of software development’s biggest expenses — the humans making it — is more important than ever.
Developer productivity research measures an engineer’s ability to produce a certain amount of work in a given time. This discipline studies not only the end result but also the socio-technical factors that influence it. More and more, it also attempts to measure developer experience, as research has shown that DevEx drives productivity.
After all, software development is first and foremost creative work, meaning any effort to improve developer productivity should focus on both human-to-computer and human-to-human interaction among people, processes and technology. Which is harder than you think, as the human experience is rarely multiple-choice.
Developer productivity research is also a nascent topic as developer experience in general tends to be hard to measure.
In a recent episode of the Engineering Enablement podcast, host Abi Noda interviewed Ciera Jaspan and Collin Green, who together lead the engineering productivity research team at Google. At Google, engineering productivity across tens of thousands of engineers comes down to “delivering frictionless engineering and excellent products.”
In this post, we reflect on the latest research and lessons from the engineers, user experience (UX) researchers and psychologists that look to measure and enhance the developer experience and productivity at Google.
The Set-up: Who’s on the Team
Google’s engineering productivity team has about 2,000 engineers, mostly focused on making developer tools and processes more effective. Within, there’s a much smaller team that focuses on engineering productivity research — not necessarily the how, but more the why, when, what and how much.
It’s a mixed-method team that does both quantitative and qualitative research. It is also a mixed team of about half engineers and half user experience researchers, with folks who’ve previously worked as behavioral economists, social psychologists, industrial-organizational psychologists, and even someone from public health.
The social sciences background, Jaspan said, provides the necessary context. Logs analysis — a common starting point for developer productivity research — only provides part of the picture. “It tells you what developers are doing. But it doesn’t tell you why they’re doing that. It doesn’t tell you how they feel about it, [or] if what they’re doing is good or bad. It doesn’t tell you if there’s room for improvement. It only gives you a number, but you can’t interpret that number,” she said on the podcast. “Unless you have more of the qualitative side of the world, and you understand the behaviors and how those behaviors change over time, depending upon how you change the context.”
This is why the productivity research team hired their first UX researcher about five years ago to help design better surveys. Then, by pairing the UX folks with engineers, the team was able to optimize not just what it was asking but also when and how. For example, this pairing enabled experience sampling: integrating surveys at the moment developers are running a build. The engineers can provide both firsthand experience and technical solutions that scale UX research.
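To make the experience-sampling idea concrete, here is a minimal sketch, with entirely hypothetical names and a made-up sampling rate (the article doesn’t describe Google’s actual tooling): right after a build finishes, occasionally prompt the developer with one short question so feedback is captured in the moment rather than weeks later.

```python
import random
from typing import Optional

# Hypothetical sampling rate: ask on roughly 1 in 10 builds to limit annoyance.
SAMPLING_RATE = 0.1

def maybe_sample(build_succeeded: bool, rng: random.Random) -> Optional[str]:
    """Return a one-question survey prompt for this build, or None to stay silent."""
    if rng.random() >= SAMPLING_RATE:
        return None
    if build_succeeded:
        return "How satisfied were you with this build's speed? (1-5)"
    return "Was the failure message clear enough to act on? (y/n)"

# Simulate a stream of 300 builds with a fixed seed so the run is reproducible.
rng = random.Random(42)
prompts = [maybe_sample(ok, rng) for ok in [True, False, True] * 100]
asked = [p for p in prompts if p is not None]
print(f"Prompted on {len(asked)} of {len(prompts)} builds")
```

Keeping the sampling rate low is the point: the survey rides along with an existing workflow step instead of interrupting it.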
“The direct access to subject matter experts who are way deep in it and who are at the top of their field is a really powerful augmentation to have in this quiver of arrows that is behavioral research methods,” Green said. “The domain expertise, the scalability, and the technical skills from the engineering side, combined with the wide variety of behavioral research methods and a facility accounting for things like bias, and the way people work, and what to watch out for in survey responses or interviews,” from the social scientists combine for UX research in a way that may be unique to Google. The UX folks have uncovered nonresponse bias and the engineers have discovered upstream bugs because things simply didn’t look right.
Developer Productivity Is an Org-Wide Goal
This team’s first customer is the first-party developer team that builds the developer tooling for the whole organization. The goal is to help them make improvements to infrastructure tooling, processes and best practices.
“When they want to, for example, understand what makes developers productive and what could make them more productive, our data [and] our research is one of the places they go to understand how to even measure that,” Green said.
The productivity research team also collaborates with other teams, including operations, real estate and workspaces, corporate engineering — who create tools for all Googlers, not just engineers — and other teams that can affect the overall developer experience. And of course, the learnings from developer productivity research can benefit non-technical teams as well, so long as the findings are communicated across the company.
“So when you focus on engineering productivity, you’re focusing on a big chunk of the Google population and so there’s wide interest in what we find,” Green said.
The Google engineering productivity team also acts as a conduit among different dev teams. As Jaspan said, “The company’s really big. People are doing different types of development. The people building the tools may not know about all the different types of work being done.”
All this makes for what Green calls a “playground of well-formed data” paired with engineers who have real experience with the problems at hand.
Speed, Ease and Quality Drive Productivity
So, if you had Google’s engineering budget, what would you measure?
With the rise of platform engineering and the consolidation of cross-organizational tooling, it’s become easier to track the technical developer experience. What’s still challenging is the effect of that technology on its human users and the effect of the people and processes around that experience. No single measurement could begin to capture that.
The developer productivity research team, Jaspan said, upholds a philosophy: There is no single metric that’s going to get you developer productivity. From here, she explained, the team triangulates across three intersecting axes:
- Speed: how fast engineers can get their work done.
- Ease: how frictionless the development process feels.
- Quality: the quality of the resulting code and product.
For example, Green once proposed (tongue in cheek, to make a point) that the quickest way to improve productivity would be to remove code reviews. Of course, everyone resisted: while it would increase speed and ease of release, it would decrease quality. And the team’s research has shown that code quality improves developer productivity.
For speed, they do measure logs, but they also measure engineers’ perception of how fast they think they’re going, and they run diary studies and interviews. Jaspan said, “It is both using multiple measures, but also making sure that they’re validated against each other.”
Mixed-Method Research Validates Data
To study Google’s software development behavior more deeply, the team performed a cross-tool logs study, ingesting logs from multiple developer tools. They also performed a diary study, in which engineers wrote down what they were doing every few minutes. They compared the two in order to build confidence in the data logs. Since each engineer works and perceives their work differently, comparisons can become apples and oranges, so the team applies interrater reliability, a statistic that calculates the agreement between the two studies.
“We assume there is some truth out there that we can’t directly observe without like sitting next to the developer and probably bothering them,” Green said. “And so we take these two sources and we say: Are these two lenses telling us about the same world?”
The data log study can be performed passively at scale, without having to bug engineers at all, while the diary studies can only be done with up to 50 engineers at a time — and they risk becoming annoying.
“Once we’ve sort of found good evidence that we’re getting the same information from the two sources, then we can like lean into the scalable method,” he explained.
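One common way to quantify this kind of agreement between two “raters” — here, activity labels inferred from tool logs versus labels from a developer’s diary — is Cohen’s kappa, which corrects raw agreement for agreement expected by chance. The article doesn’t say which statistic Google uses, so treat this as an illustrative sketch with made-up data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two sources labeled intervals independently.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Ten five-minute intervals labeled by the logs pipeline and by the diary.
logs  = ["code", "code", "build", "review", "code", "build", "code", "meet", "code", "review"]
diary = ["code", "code", "build", "review", "code", "code",  "code", "meet", "code", "review"]
print(f"kappa = {cohens_kappa(logs, diary):.2f}")  # kappa = 0.84
```

A kappa near 1 means the two lenses are “telling us about the same world,” which is the evidence needed before leaning into the scalable logs-only method.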
Technical Debt and the Engineering Satisfaction Survey
Since 2018, another powerful measuring tool at Google has been the quarterly engineering satisfaction survey, which goes out to about a third of the engineering force at a time. Green admitted that executives were reluctant to embrace this measurement at first because it’s “just people’s opinions.” During the pandemic lockdowns of 2020, the survey first revealed an uptick in productivity, followed by a big dip the next quarter as time at home, often alone, wore on.
Research has shown that technical debt hurts developer morale and slows development, so it’s not surprising that, early on, the survey featured two questions on the impact of technical debt on productivity:
- What are the underlying causes of technical debt that you encounter?
- What mitigations would be appropriate to fix this technical debt?
Over the years, in response, Jaspan and Green’s team combined responses until they settled on 10 categories of technical debt that could be hindering engineering productivity:
- Migration is needed or in progress.
- Documentation on project and/or APIs is hard to find, missing or incomplete.
- Poor test quality or coverage.
- Code is poorly designed or of low quality.
- Dead and/or abandoned code has not been removed.
- The codebase has degraded or has not kept up with changing standards.
- A team lacks necessary expertise.
- Dependencies are unstable, rapidly changing, or trigger rollbacks.
- Migration was poorly executed or abandoned, maybe resulting in maintaining two versions.
- Release process needs to be updated, migrated, or maintained.
Engineers can choose any or all options. The resulting data has uncovered differing technical debt interventions needed for different audiences like machine learning engineers versus backend engineers. They also slice the data along organizational lines to show and compare progress in conquering this debt.
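Since engineers can select any or all categories, each response is a multi-select, and slicing along organizational lines amounts to counting category selections per org. A minimal sketch with invented response data (the category names follow the list above; the org labels and counts are hypothetical):

```python
from collections import Counter

# Hypothetical multi-select survey responses: each engineer checks the
# technical-debt categories currently hindering them.
responses = [
    {"org": "ml",      "debt": ["poor test coverage", "missing documentation"]},
    {"org": "ml",      "debt": ["missing documentation"]},
    {"org": "backend", "debt": ["unstable dependencies", "poor test coverage"]},
    {"org": "backend", "debt": ["dead code"]},
]

def debt_by_org(responses):
    """Count how often each debt category is selected, per organization."""
    counts = {}
    for r in responses:
        org_counts = counts.setdefault(r["org"], Counter())
        org_counts.update(r["debt"])
    return counts

counts = debt_by_org(responses)
print(counts["ml"]["missing documentation"])  # 2
```

Comparing these per-org counters across quarters is one way to show progress in conquering debt for each audience.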
The paper on this technical debt question acknowledges that survey-based measures are a lagging indicator — technical debt only registers as a problem once it has become severe enough to hinder engineers. However, after exploring 117 metrics, the Google team has yet to identify one that predicts when technical debt will hinder productivity.
They’ve also added four questions on how teams are managing debt, as they look for continuous improvement.
As this survey became more important to the organization as a whole, engineering VPs started requesting their own questions. That was helpful for a while but then the survey had to be streamlined back down. Now, a different UX researcher is in charge of the survey each quarter with the support of a different engineer, alongside team feedback. Green admitted the survey is still rather “hefty.”
No matter the size (and budget) of your organization, you are encouraged to invest in a mix of research methods, automated and measurable alongside observational and experiential, to understand your developer experience and the productivity it supports or hinders.
Just remember that the metrics will change as your teams and your code change. As Jaspan said, “We know there’s not a single metric for developer productivity, so we try to use all these different research methods to see: are they all aligned? Are they telling us the same thing is happening? Or are they misaligned? In which case we need to dig deeper to figure out what’s going on.”