A Guide to Measuring Developer Productivity
Are your developers as productive as they could be? Is this question even answerable? And, perhaps most importantly, who’s asking, and why?
Lots of people say, “Yes,” developer productivity can and should be measured. Organizations have invented many frameworks for measuring developer productivity, ranging from DORA to SPACE to all sorts of proprietary ones, including a McKinsey & Company framework that was introduced in late August and ignited a storm in the DevOps community.
But many observers argue that measuring the wrong thing or applying these metrics frameworks in the wrong way not only doesn’t give you useful information about your development team but can actually make them less productive and more likely to either try to game the system or walk away.
That’s because development work is complex and multifaceted. It can’t be captured by the kind of black-and-white metrics used to evaluate salespeople by revenue or recruiters by the number of successful hires. Numbers simply won’t tell the whole story. Qualitative assessments are also necessary, as mounting evidence points to a direct relationship between developer well-being and productivity.
Let’s review the types of developer productivity measurements commonly used today. We’ll delve into the backlash against placing developers into a surveillance culture and point out the pitfalls of various popular metrics. Finally, we’ll describe how modern observability removes some of the top barriers to productivity that developers face with today’s cloud native environments and actually improves developer productivity.
Some Definitions to Frame the Discussion
Let’s start by defining key terms. What is developer productivity? Generally, it refers to the ability of a developer team to efficiently and consistently write and deploy high-quality code that delivers value to the business.
Right away, this definition raises several questions:
- What is considered efficient?
- How do you judge high-quality code?
- Why focus on individual developers versus teams?
- And how do you measure value to the business?
Then there’s the fact that developers are humans, not machines. They are not happy to be put under a microscope or to have their work reduced to numbers. That’s why any measuring of developer productivity must include qualitative assessments as well as quantitative ones.
To begin answering these and other questions, it’s important to understand that four different aspects of work — any type of work — can be quantified.
- Inputs — Some industry observers call this effort. In the world of software, this would involve how much time, energy, thought and creativity have gone into development activities such as designing, coding, testing and debugging.
- Outputs — Tangible things that are delivered as a result of the inputs. These can include a requested software feature or the code itself, as well as any documentation.
- Outcomes — What changes ensue in response to the inputs and outputs? Will employees do their jobs differently because key business processes have been re-engineered? Will customers change their behavior?
- Impacts — What value accrues to the business? Are employees more efficient? Are customers buying more products?
How Are Businesses Measuring Developer Productivity?
The two most common metric frameworks used to measure developer productivity are DORA and SPACE.
Using DORA to Measure Development Outcomes
Named for Google’s DevOps Research and Assessment (DORA) team that created them, the DORA standards measure outcomes. The DORA group identified four metrics for DevOps teams, with the goals of improving developer efficiency and being able to communicate results that will have meaning for business leaders.
The four metrics are divided into two buckets, velocity and stability, because tracking both keeps teams from emphasizing speed over quality.
- Deployment frequency — How frequently the team successfully releases code changes to production (this measures velocity)
- Lead time for changes — How long it takes for a commit to get into production (measures velocity)
- Change failure rate — Percentage of deployments causing a failure in production (measures stability)
- Time to restore service — The time it takes to recover from a production failure (measures stability)
DORA metrics are used to classify teams as elite, high, medium and low performers to drive improvements. According to Google’s internal measurements, elite teams are twice as likely to meet or exceed their organizational performance goals as teams in other categories.
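As a rough illustration of how the four definitions above translate into numbers, here is a minimal Python sketch that computes them from a list of deployment records. The `Deployment` record and its fields are hypothetical, chosen for this example; DORA does not prescribe a data model, and in practice this data would come from CI/CD and incident-management tooling. Time to restore is approximated here as the time from the failing deployment to restoration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one successful release to production."""
    commit_time: datetime                     # when the change was committed
    deploy_time: datetime                     # when it reached production
    failed: bool = False                      # did it cause a production failure?
    restored_time: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict:
    """Sketch of the four DORA metrics over one reporting period."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    # Velocity: how often the team ships and how long a commit takes to reach production.
    frequency = len(deploys) / period_days
    lead_time = median(d.deploy_time - d.commit_time for d in deploys)

    # Stability: share of deployments that failed and how long recovery took
    # (approximated as failing deployment -> restoration).
    failures = [d for d in deploys if d.failed]
    restore_times = [d.restored_time - d.deploy_time
                     for d in failures if d.restored_time is not None]

    return {
        "deployment_frequency_per_day": frequency,
        "lead_time_for_changes": lead_time,
        "change_failure_rate": len(failures) / len(deploys),
        "time_to_restore_service": median(restore_times) if restore_times else None,
    }


# Illustrative data only: four deployments over a two-day period.
t0 = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=4), failed=True,
               restored_time=t0 + timedelta(hours=5)),
    Deployment(t0, t0 + timedelta(hours=6)),
    Deployment(t0, t0 + timedelta(hours=8)),
]
print(dora_metrics(deploys, period_days=2))
```

Using the median rather than the mean keeps one unusually slow change from dominating lead time; that is a design choice for this sketch, not part of the DORA definitions.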
SPACE Goes for Less-Quantifiable Assessments
A second portfolio of measurements is called the SPACE metrics (short for satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow). SPACE was co-developed by GitHub and Microsoft to bolster the DORA framework, which was perceived as lacking focus on the admittedly difficult-to-quantify state of developer happiness.
- Satisfaction and well-being — Surveys ask developers about these important aspects of productivity, such as whether they would recommend their team to others, whether they have the right tools for their jobs and whether they are at risk of burnout.
- Performance — This is best measured by outcomes rather than outputs or impacts. A developer might deliver a high volume of code that isn’t of sufficient quality, and even high-quality code might not be enough to induce customers to change their purchasing behavior. Often performance evaluations come down to a binary question: Does the code do what it was designed to do?
- Activity — By simply counting outputs such as builds, tests or incident mitigations, you can get some sense of productivity, but activity on its own isn’t really good or bad. Does a high volume of pull requests (PRs) automatically mean high productivity? Not if your team is opening a lot of PRs to revert changes or fix issues. Bottom line: Activity numbers should never be used by themselves, out of context. Still, assessing activity can add some data to the big productivity picture.
- Communication and collaboration — Another difficult-to-quantify attribute, communication and collaboration can be measured through proxies such as how quickly code is integrated, team members’ assessments of review quality and onboarding time for new team members.
- Efficiency and flow — Flow is an important concept for many developers, who describe it as being able to work without interruptions. You can attempt to measure it by counting the handoffs or interruptions in a process, by surveying developers about their ability to stay in flow, or with similar metrics.
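The caution about activity numbers can be made concrete with a small Python sketch that reports raw pull request volume next to the share of PRs spent undoing or repairing work. The title-matching heuristic is an assumption for illustration only: it treats titles beginning with "Revert" (the default wording `git revert` produces) as reverts and titles beginning with "fix" as fixes. Real teams would more likely rely on labels or linked issues, and the function names are hypothetical.

```python
from collections import Counter


def classify_pr(title: str) -> str:
    """Bucket a PR by its title (hypothetical heuristic, not a standard)."""
    t = title.strip().lower()
    if t.startswith(("revert ", "revert:")):
        return "revert"
    if t.startswith(("fix ", "fix:", "fix(")):
        return "fix"
    return "other"


def activity_in_context(titles: list[str]) -> dict:
    """Report PR volume alongside how much of it is rework."""
    counts = Counter(classify_pr(t) for t in titles)  # missing keys count as 0
    total = len(titles)
    rework = counts["revert"] + counts["fix"]
    return {
        "total_prs": total,
        "reverts": counts["revert"],
        "fixes": counts["fix"],
        "rework_share": rework / total if total else 0.0,
    }


# Illustrative titles only.
titles = [
    "Add search endpoint",
    'Revert "Add search endpoint"',
    "fix: handle empty query",
    "Update onboarding docs",
]
print(activity_in_context(titles))
```

Two teams with identical `total_prs` can look very different once `rework_share` sits next to it, which is why activity numbers should not be read on their own.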
Some Other Common Developer Productivity Metrics
Either as part of DORA or SPACE, or as standalone metrics, the following are also used by organizations to measure developer productivity:
- Cycle time — This is the time from first commit to production release, or from beginning to finishing work on an assignment. In general, shorter cycle times are considered better, but they shouldn’t be accelerated at the expense of quality.
- PR size — The size of a pull request, typically measured in lines of code changed. A pull request is opened when a developer is ready to begin merging new code changes into the project repository. Working through PRs allows developers to create new features or fix bugs without affecting users or worrying about breaking the overall service or application.
- Investment profile — This enables teams to visualize where they are spending their resources and time. This helps management do a better job of distributing work based on business priorities.
- Planning accuracy — Planning accuracy is the ratio of story points completed to story points planned for an iteration. This is a good metric for honing sprint planning.
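The cycle time, PR size and planning accuracy definitions above reduce to simple arithmetic. The Python sketch below spells them out; measuring PR size as lines added plus lines deleted is an assumption (tools differ), and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def cycle_time(first_commit: datetime, released: datetime) -> timedelta:
    """Time from first commit to production release."""
    if released < first_commit:
        raise ValueError("release cannot precede the first commit")
    return released - first_commit


def pr_size(lines_added: int, lines_deleted: int) -> int:
    """PR size as total lines changed (one common convention, assumed here)."""
    return lines_added + lines_deleted


def planning_accuracy(completed_points: float, planned_points: float) -> float:
    """Story points finished out of those planned for the iteration.

    completed_points should count only work that was in the original plan,
    so unplanned work pulled in mid-sprint does not inflate the ratio.
    """
    if planned_points <= 0:
        raise ValueError("planned_points must be positive")
    return completed_points / planned_points


print(cycle_time(datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 6, 16, 0)))  # 2 days, 6:00:00
print(pr_size(120, 30))           # 150
print(planning_accuracy(30, 40))  # 0.75
```

As the cycle time definition warns, a shrinking number here is only good news if quality holds steady.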
Behold the Backlash
Then there are the new McKinsey metrics that have the developer community up in arms. McKinsey isn’t alone in thinking that DORA and SPACE don’t go far enough. McKinsey says its methodology complements DORA and SPACE with new “opportunity-focused metrics,” arguing that they are necessary because generative AI tools such as ChatGPT are changing software development so rapidly. McKinsey’s own research found that such tools have the potential to enable developers to complete tasks up to two times faster.
Some of the new metrics McKinsey proposes include a “developer velocity index benchmark,” “contribution analysis” and “talent capability scores.”
Developer and father of extreme programming Kent Beck wrote on LinkedIn, “The report is so absurd and naive that it makes no sense to critique it in detail.” In a later post, he added, “Why would I take the risk of calling out a big, influential organization? It’s because what they published damages people I care about. I’m here to help geeks feel safe in the world. This kind of surveillance makes geeks feel less safe.”
Gergely Orosz, who blogs at “The Pragmatic Engineer,” co-wrote a two-part rebuttal to the McKinsey article with Beck. Among other things, the authors concluded that making development teams more accountable to the business, in the same way that sales and human resources (HR) teams are, is a worthy goal.
But to help developers become more productive, without causing harm, the goal has to be to develop and sustain high-performing teams, which Orosz and Beck defined as “teams where developers satisfy their customers, feel good about coming to work and don’t feel like they’re constantly measured on senseless metrics.”
According to Orosz, Beck and others who weighed in, the problem with the wrong metrics, or with misapplying the right ones, is that measurement invites developers to change the way they work so they can beat the system. Start judging your developers on how many lines of code they produce and you’ll get plenty of code, but quality may well suffer.
Tech journalist Bill Doerrfeld, blogging at DevOps.com, agreed, pointing to an observation by British economist Charles Goodhart that Doerrfeld summarized as “when a measure becomes a target, it ceases to be a good measure.” Chasing such targets can erode developer culture as well as the quality of the work.
So leaders must be very clear on what the real targets of developer productivity are. Do you want higher-quality code that makes an impact? Then do your best to measure those things. As a case in point, Google analyzed developer inputs and outputs on a broad range of parameters and found that improved code quality correlated with increased developer productivity.
What to Measure: Team or Individual Developer Productivity?
Generally speaking, most savvy CTOs don’t try to measure the productivity of individuals. Most industry observers, and developers themselves, believe that a successful DevOps organization is not just individuals who work independently, but a cohesive team that together produces valuable products and services.
Developers are constantly collaborating and interacting, and much of this cannot be measured because of the interdependencies and nuances. For example, some team members might not produce a lot of code, but they are invaluable to colleagues because of their help, advice and expertise.
Team productivity, on the other hand, is much more visible. Managers or HR professionals who need to assess individual performance for annual reviews or other employment milestones should invest in organizational best practices for people management, such as holding regular one-on-one meetings, soliciting anonymous feedback from all team members and encouraging individuals to take personal accountability.
Much of this depends on the culture of the DevOps team rather than on any systematic approach to tracking productivity.
Avoid Common Mistakes When Measuring Developer Productivity
Problems with Input Measurements
The issue with depending on inputs, or efforts, such as hours worked, is that it encourages the wrong behaviors. If the company culture values and rewards hours spent in front of a screen, developers will almost certainly put in the hours, but what will the quality of the delivered work be? In more toxic environments, it can even turn into a competition over who comes in earliest and stays latest.
Problems with Output Measurements
Some of the worst metrics fall into this category, such as counting lines of code or commits. Measurements like these are easy to game, because developers can churn out lines of code quite quickly. Any output metric needs to be read in context.
Problems with Outcome and Impact Measurements
The challenge with these two types of metrics — outcome and impact — is figuring out how responsible the DevOps team is for them. As Orosz and Beck point out, if you try to measure increased profits for the business, it’s nearly impossible to attribute that solely to developers. However, these are possibly the closest metrics to reflecting business goals, which is ultimately the point of measuring developer productivity.
Ways to Improve Developer Productivity
Developer productivity is influenced by a broad range of factors. Here are some of the most important ways you can improve it.
- Nurture the right culture — This is probably the single most important factor. Promote the sharing of expertise and knowledge, and make processes and operations as transparent as possible to avoid misunderstandings. Work-life balance, stress reduction, and physical and mental health should be priorities. Management should be supportive and provide sufficient resources, realistic timelines, appropriate task assignments and constructive feedback. Finally, the work environment should be as free from distractions as possible.
- Provide the right tools — High-quality, appropriate development tools, such as IDEs, debuggers, languages and frameworks, can have a huge impact on productivity. So can observability solutions, which cut troubleshooting time and help teams get to root causes fast.
- Institute proven processes — Development methodologies such as Agile, Scrum or Kanban, if implemented effectively, can streamline the development process and significantly improve productivity.
- Deploy automation where appropriate
- Offer ample training and learning opportunities
- Emphasize code quality — By stressing quality over quantity or even velocity, teams produce code that is easier to maintain, which decreases technical debt and makes future changes easier.
Here are four ways the Chronosphere cloud native observability platform helps improve productivity:
- It automatically presents each team with the data relevant to it, cutting out the rest of the noise.
- It empowers DevOps teams to analyze their data to understand what is useful and what is waste.
- It rapidly loads real-time dashboards so less time is spent worrying about metrics and more is spent working on value-added activities.
- It makes tools usable for engineers at all levels, not just power users.
It’s worth noting that the ultimate goal is not just to be more productive in a vacuum, but to produce valuable, high-quality software in a sustainable way. A best-of-breed observability solution can help do just that.