This Week in Numbers: Making Sense of Open Source Projects
Assessing who contributes to an open source project is not as easy as running a few queries against the GitHub API. Yet, as we continue to see, that doesn’t stop people from trying.
RedMonk’s James Governor riffed on the risk of using GitHub stars. We concur because they can be gamed and are only a measure of ephemeral popularity among a subset of developers. As we wrote about before, GitHub organizations are a poor way to measure a corporation’s open source activity.
A better, but still limited, way is to look at contributors’ email addresses to identify their employers. Google’s Felipe Hoffa uses this approach to report that Microsoft had over 1,300 employees push a contribution in 2017, with Google providing 911 and Amazon 134. As Hoffa explains in his methodology, the data was filtered to exclude many legitimately active repositories and contributors.
Furthermore, it excludes participants who only submit issues, as well as entire organizations that use other version control systems for most of their development activity.
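To make the email-based approach concrete, here is a minimal sketch of the general idea. It is not Hoffa's actual methodology: the `DOMAIN_TO_EMPLOYER` mapping, the helper names and the sample addresses are all hypothetical, and a real analysis would need a far larger mapping and would still miss anyone committing from a personal address.

```python
from collections import defaultdict

# Hypothetical mapping from email domain to employer; illustrative only.
DOMAIN_TO_EMPLOYER = {
    "microsoft.com": "Microsoft",
    "google.com": "Google",
    "amazon.com": "Amazon",
}


def employer_for(email: str) -> str:
    """Guess an employer from an email domain, or return 'unknown'."""
    _, sep, domain = email.strip().lower().rpartition("@")
    if not sep:
        # No "@" at all, so there is no domain to look up.
        return "unknown"
    return DOMAIN_TO_EMPLOYER.get(domain, "unknown")


def contributors_by_employer(commit_emails):
    """Count distinct contributor emails per inferred employer."""
    seen = defaultdict(set)
    for email in commit_emails:
        seen[employer_for(email)].add(email.strip().lower())
    return {employer: len(addresses) for employer, addresses in seen.items()}


# Made-up sample data, one entry per commit.
sample = [
    "ana@microsoft.com",
    "ana@microsoft.com",  # same person, second commit
    "bo@google.com",
    "cy@gmail.com",       # personal address, employer cannot be inferred
    "dee@Amazon.com",
]
counts = contributors_by_employer(sample)
```

Note how the personal address falls into "unknown" even though its owner may well work for one of the listed companies, which is exactly the limitation described above.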
Yet, it is possible to conduct a similar analysis for non-GitHub projects. For example, the 2017 State of Linux Kernel Development identified the top organizations involved with recent Linux kernel development. Unsurprisingly, Intel, Red Hat and Linaro topped the list, with many chip producers making an appearance. Yet, the report also said over eight percent of contributors came from a company called “none” and another four percent came from “unknown.” This was not a mistake. While “unknown” means that the contributor’s identity and/or affiliation could not be identified, “none” represents developers who may have a day job with a big company, but are doing the work on their own.
What is someone to do? Can they use GitHub stats to identify winners and losers? Can data “prove,” for example, that AWS is not as committed to open source as other cloud providers? Companies should probably benchmark their software development processes, but against what metrics?
While there are no easy answers, The Linux Foundation is supporting standardization efforts with the Community Health Analytics Open Source Software (CHAOSS) project. Within this group, a Metrics Committee is trying to define implementation-agnostic metrics for measuring community activity and assessing open source communities’ health and sustainability.
In other words, they’re bringing together many people who have done this type of analysis in the past and working towards consensus on common variables and ways to apply data to various use cases. Perhaps one day we’ll be able to argue about the data without arguing about the methodology. Until then, stay tuned for more open source software analysis from The New Stack.
Although an open source project has many stakeholders, The New Stack usually evaluates a project from the point of view of 1) a developer; 2) a community manager; or 3) a corporation. Below is a brief framework for how a corporation can assess its involvement with different OSS projects and foundations.
The New Stack believes that to be competitive, technology providers will need to continuously improve price/performance and meet customer demands. If those two overarching issues are addressed, then business leaders still have to evaluate both open and closed source technologies based on factors such as risk and profitability.
A Few Project Health Considerations
- Common activity metrics are the number of contributors, commits, forks and GitHub stars. Generally, the more of these a project has, the healthier it is. A significant challenge is how to weigh recent activity versus historical involvement.
- How fast pull requests are accepted and issues are resolved points to how well a project is organized. Professional software engineering groups will often address outstanding issues. Other projects require a governance structure to define community rules. Process definition is an indicator that the end product will maintain a high level of quality.
- Governance structure and diversity of corporate involvement are critical to project health. Without strong diversity, technical decisions inevitably lean towards one company or another.
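The considerations above can be turned into rough numbers. The sketch below is one possible way to do so, not a standard or CHAOSS-defined method: the function names, the 180-day half-life used to discount older activity, and all sample data are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import median


def recency_weighted_activity(event_dates, now, half_life_days=180):
    """Sum events, halving an event's weight every half_life_days."""
    total = 0.0
    for when in event_dates:
        age_days = (now - when).total_seconds() / 86400
        total += 0.5 ** (age_days / half_life_days)
    return total


def median_days_to_merge(pull_requests):
    """Median days between opening and merging, over merged PRs only."""
    durations = [
        (merged - opened).total_seconds() / 86400
        for opened, merged in pull_requests
        if merged is not None
    ]
    return median(durations) if durations else None


def top_organization_share(commit_orgs):
    """Fraction of commits from the single most active organization."""
    if not commit_orgs:
        return 0.0
    counts = Counter(commit_orgs)
    return counts.most_common(1)[0][1] / len(commit_orgs)


# Made-up sample data.
now = datetime(2018, 1, 1)
commits = [now, now - timedelta(days=180), now - timedelta(days=360)]
activity = recency_weighted_activity(commits, now)  # 1 + 0.5 + 0.25

prs = [
    (now - timedelta(days=10), now - timedelta(days=8)),  # merged in 2 days
    (now - timedelta(days=9), now - timedelta(days=5)),   # merged in 4 days
    (now - timedelta(days=3), None),                      # still open, ignored
]
merge_days = median_days_to_merge(prs)

share = top_organization_share(["A", "A", "A", "B"])
```

A half-life is only one way to balance recent against historical activity, and a high top-organization share is a prompt for a closer look at governance rather than a verdict on its own.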
Things to Think about When Evaluating Project Participation
- For projects initiated internally:
  - What is its purpose? Does it address an internal technology problem or a customer demand?
  - If related to price/performance, is it the best approach, or are other projects better?
- For projects contributed to:
  - What’s the impact of adding another developer from the corporation?
  - What benefits would the corporation receive by providing the project with financial support?
  - Is the primary goal to enable the corporation’s integration with an external project? If so, how much involvement is needed?
- For projects the corporation does not contribute to, but that are in its stack:
  - Does the corporation want to influence the project direction?
  - Does the technology represent a demand from customers? If so, what can the corporation do to better serve its customers?
  - Should the project be offered as a service?
- For projects the corporation is only monitoring:
  - Will the technology address a price/performance concern?
  - Should the corporation develop a competitive offering with a different technology?
Feature image via PXhere.