Metrics-Driven Developer Productivity Engineering at Spotify
At the crux of any platform engineering success or failure is your ability to get enough developers to adopt your platform, and then to measure whether it is actually helping them. Except, like the discipline itself, developer productivity metrics are inherently socio-technical, which makes them hard to gauge accurately. And then, how do you align your platform metrics with the organization’s overall goals?
“I have a strong bias for platform-focused development and developer productivity,” said Laurent Ploix, an engineering manager on the Platform Insights team at Spotify. Over the last three years, his team has worked on making data-informed decisions based on a mix of platform engineering, data science, research and development, and product management.
In the lead-up to last week’s DPE Summit on developer productivity engineering and developer experience, Ploix gave The New Stack what he calls his opinionated view on metrics-informed development, which drives platform engineering at Spotify and, as the maintainer of the most popular open source IDP, by extension the whole tech industry.
Searching for the Right Developer Productivity Metrics
There’s been a lot of huff and puff about developer productivity over the last couple of months. In reality, companies like Google and Spotify have been tracking it for years now. And then a white paper on DevEx metrics was released last May.
Why so much focus in 2023 on measuring developer productivity? It’s the year the industry’s over-hiring slowed and most teams are trying to do more with less. It’s also a moment when the cloud native landscape is so sprawling and complex that developer flow states are constantly interrupted, resulting in unbearable cognitive load.
Spotify puts the different developer metrics on a scale, from very leading all the way to very lagging.
The lagging metrics, Ploix says, are the value- or impact-focused ones, like the long-term trends of revenue, monthly users, and user satisfaction. “They tend to [be] low noise like they don’t move that fast from one day to the next. They move slowly when we take action. They are kind of hard to relate to the action we take,” he said, as they are more indicative of long-term trends. Still, these lagging metrics are strategically important.
Common value-focused lagging metrics are revenue and end-user satisfaction.
These lagging metrics won’t quickly tell you the magic thing to change to boost your engineering team’s productivity, but, said Laura Tacho, engineering leadership coach and teacher of the course Measuring Development Team Performance, they can:
- Track progress against goals and benchmark against yourselves.
- Add a quantitative perspective to known issues or trends in developer experience.
- Help create a narrative to explain your team operations, or defend project investments, to higher-level stakeholders like your exec team or the board.
On the other hand are the leading metrics. These are much more easily understood and therefore actionable, like the number of pull requests in a given day or build time. They also tend to be easier to measure, via both automated tooling and developer surveys, because these metrics live closer to the day-to-day data and developer experience.
“They’re going to be useful for tactical, short-term action. They move fast when we take action,” Ploix said. But he warned that they also “can easily turn into vanity metrics. They tend to be difficult to relate to the actual value that is created. And the most problematic part is it’s kind of easy to game.”
In the end, no matter which is preferred, both leading and lagging metrics matter. “Metrics which are both value-focused and actionable typically don’t exist. Stop looking for them,” Ploix emphasized. “What you truly care about, at the end of the day for the company, is value creation, but in a sustainable way. This is nevertheless a very lagging metric.”
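One rough heuristic for where a metric sits on Spotify’s leading-to-lagging scale is its day-to-day noise: leading metrics jitter daily and respond fast, lagging ones barely move. A minimal sketch with invented numbers (neither series is real Spotify data):

```python
from statistics import mean

def daily_volatility(series):
    """Average absolute day-over-day change, relative to the mean level."""
    changes = [abs(b - a) for a, b in zip(series, series[1:])]
    return mean(changes) / mean(series)

# Invented daily values over one week.
build_time_min = [14, 9, 17, 8, 15, 10, 16]  # leading: noisy, actionable
monthly_users_m = [100.1, 100.2, 100.2, 100.3, 100.3, 100.4, 100.4]  # lagging: slow

print(f"{daily_volatility(build_time_min):.2f}")   # high: moves fast day to day
print(f"{daily_volatility(monthly_users_m):.4f}")  # near zero: long-term trend
```

A high value suggests a metric useful for short-term tactical action; a near-zero one belongs with the strategic, value-focused lagging metrics.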
Tying Developer Metrics to Organizational Goals
The Spotify platform insights team looks at a big priority metric — in this case, the organizational top-level objective of increasing end-user satisfaction — and then at proxy metrics to support it, like leading technical metrics.
Mean time to recovery, or MTTR, factors into user satisfaction because, as Ploix said, “If we have fewer incidents or if the incidents are closed faster, we hope that the end users will be happier.” He says this is an example of one of the “bets” his team makes and then measures over time in an effort to align with cross-company objectives.
One way to decrease MTTR could be to focus on the site reliability engineering (SRE) experience, which led the team to ask: “Are we going to fix problems faster if SREs are more efficient? And are SREs going to be more efficient if we have faster log ingestion?”
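The MTTR half of that bet is straightforward to compute once incidents carry open and close timestamps. A minimal sketch, using hypothetical incident records rather than any real incident-management data:

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (opened, closed) timestamps.
# In practice these would come from an incident-management tool.
incidents = [
    (datetime(2023, 9, 1, 10, 0), datetime(2023, 9, 1, 12, 30)),
    (datetime(2023, 9, 3, 8, 15), datetime(2023, 9, 3, 9, 0)),
    (datetime(2023, 9, 5, 14, 0), datetime(2023, 9, 5, 18, 0)),
]

def mttr(incidents):
    """Mean time to recovery across closed incidents."""
    durations = [closed - opened for opened, closed in incidents]
    return sum(durations, timedelta()) / len(durations)

print(mttr(incidents))  # prints 2:25:00 for the sample above
```

Tracked as a trend rather than a single number, a falling MTTR is the lagging signal that the leading-metric work (like faster log ingestion) is paying off.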
Developer productivity metrics should always be tied back to organizational goals.
An engineering department could have an OKR on the lagging metric of MTTR and a platform team supporting SREs would have a leading metric of log ingestion speed. These would both be in support of the company-level OKR to increase customer satisfaction, which is measured by things like net promoter scores (NPS), active users and churn rate.
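That chain, from a company-level OKR down to a platform team’s leading metric, can be sketched as a simple tree. The objectives and metric names below follow the example in the text but are illustrative, not Spotify’s actual OKRs:

```python
# Illustrative OKR chain: each level names the metrics it tracks
# and the lower-level bets that support it.
okr_tree = {
    "objective": "Increase customer satisfaction",      # company level
    "metrics": ["NPS", "active users", "churn rate"],   # lagging
    "supported_by": [
        {
            "objective": "Reduce MTTR",                  # engineering dept
            "metrics": ["mean time to recovery"],        # lagging
            "supported_by": [
                {
                    "objective": "Speed up log ingestion",  # platform team
                    "metrics": ["log ingestion latency"],   # leading
                    "supported_by": [],
                }
            ],
        }
    ],
}

def leading_metrics(node):
    """Collect the metrics at the leaves: the most leading, actionable ones."""
    if not node["supported_by"]:
        return node["metrics"]
    return [m for child in node["supported_by"] for m in leading_metrics(child)]

print(leading_metrics(okr_tree))  # ['log ingestion latency']
```

Walking the tree downward answers “what do we change tomorrow?”; walking it upward answers “why does this matter to the company?”.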
This underscores one of the important goals of platform engineering: increasing engineers’ sense of purpose by connecting their work more closely to delivering business value.
“Productivity cannot be measured easily. And certainly not with a single accurate number. And probably not even with a few of them. So these metrics about SRE efficiency or developer productivity, they need to be contextualized for your own company, your tech stack, your team even,” he said, emphasizing that the trends are typically more important than the actual values. “That does not mean that we cannot have a productive conversation about them. But it does mean there is no absolute way to measure” developer productivity, knowing that proxy metrics will never capture everything.
Spotify has found that it’s really useful to align everyone in the company around OKRs and that by changing some leading metrics, they are indeed able to move some of the lagging ones.
The platform insights team has also uncovered three axioms that are needed to successfully connect developer productivity metrics to all levels of OKRs:
- The metrics you attach to the OKR must be sensitive to the change you implement.
- The change you implement must be aligned to value creation.
- The metrics you are trying to move need to be moveable within your OKR tracking period.
Through these mixed metrics-driven experiments, Spotify has also found that build time — specifically, the number of builds on the continuous integration pipeline per day — impacts developer satisfaction. And it has long been held that happy workers are more productive, so satisfied developers should be more productive.
“The faster things are built, the more people can produce code and possibly [increase] deployment frequency,” Ploix said. “We also know that developer satisfaction has an impact on attrition. That might actually mean that build time has an impact on attrition.”
The platform insights team has also realized that when test coverage is done well, it can help cut technical debt. And it’s well established that technical debt can have a negative effect on developer morale.
How to Get Started with Your DevEx Metrics
Don’t wait! The best way to get started on your developer experience or DevEx metrics is by getting started.
“Start by collecting data. Then try to grow some metrics from that, but the fact is it’s not going to be good,” Ploix warned. “You will not know if your data has bad quality until you have metrics,” so the only way to improve it is by starting that data collection. “Metrics are products that require iterations and have bugs. Deal with it.”
Data scientists should work with decision-makers to figure out what’s important to track. Then, once you start collecting data, you’ll start noticing visible trends that can be understood across business and technical domains, which, he said, creates knowledge and influences company culture.
Often, the perception of productivity is as important as the actual numbers, which can be a good place to kick off your developer productivity metrics journey.
“If you pair these kinds of workflow metrics with perception-based metrics — like those gathered through a DevEx survey — you’ll have an easier time identifying the right things to do to reduce friction in your development cycles,” Tacho wrote in a recent LinkedIn post. “Your team already knows where the inefficiencies are. They deal with the pain all day long.”
Spotify’s quarterly developer survey includes questions around developers’ perceived productivity. Of course, individual developers may not be 100% accurate, but the trends don’t lie.
Spotify has also uncovered a direct link between tool satisfaction and developer productivity — or at least the perception of it. Its quarterly Engineering Satisfaction Survey takes a deeply socio-technical approach to data collection and has a whole section on how devs are using tools and how they feel about them. It also asks engineers if they feel productive. Spotify has learned from this developer research that people who dislike tools feel less productive.
“People are surprisingly good at telling about what happens — productivity and blockers,” Ploix reminded. “Trust people! Ask them!”