TNS Research: Developing a Methodology for Analyzing Open Source Communities

3 Dec 2015 8:41am, by

The New Stack is developing a methodology for analyzing open source communities. To begin this effort, we decided to look at the composition of several open source projects. In an initial analysis, we’ve learned a bit about the companies and the people who are participating in the development of OpenStack, Docker, Kubernetes, and other new stack technologies. Some of the initial research can be found in our recent ebook about the container ecosystem, where 50 of the 71 open source projects we cataloged had an identifiable corporate sponsor.

When we looked at the contributors to a few of these projects, we found that in many cases development was led by a single company. The table below shows that many of the more popular projects associated with containers are dominated by just a few companies.

Percentage of Contributions Coming From Employees: Select Projects in the Container Ecosystem

Project          Top Contributor         Secondary Contributor
Kubernetes       Google (72%)            Red Hat (15%)
Docker           Docker (58%)            Red Hat (7%)
Cloud Foundry    Pivotal/VMware (67%)    IBM (11%)
Mesos            Mesosphere (49%)        Twitter (14%)

The Cloud Foundry number requires a bit more explanation. Our initial analysis found that Pivotal contributed 58 percent of the Cloud Foundry code, but we also found another 10 percent from VMware, which shares the same parent company (EMC) as Pivotal. An additional 10 percent of the contributions come from “bots,” or continuous integration pipelines that automatically submit code that could come from Pivotal or from third parties such as IBM. So Pivotal/VMware could be contributing as much as 77 percent of the Cloud Foundry code.
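The arithmetic behind the Cloud Foundry range can be sketched in Python. This is a minimal illustration under stated assumptions, not the tooling TNS used. The affiliate map, the function name company_shares and the example counts are all hypothetical. The sketch folds VMware into Pivotal and keeps bot contributions in the denominator. It then reports two figures per group: one with bots credited to nobody, and an upper bound with every bot contribution credited to a single suspected owner.

```python
from collections import Counter

# Hypothetical affiliate map: companies sharing a parent are folded into one group.
AFFILIATES = {"Pivotal": "Pivotal/VMware", "VMware": "Pivotal/VMware"}


def company_shares(counts_by_company, bot_contributions=0, bot_owner=None):
    """Return {group: (share %, upper-bound share %)}.

    counts_by_company: company -> contributions attributed to its employees.
    bot_contributions: contributions from CI bots whose origin is unknown; they
        stay in the denominator but are credited to nobody for the lower figure.
    bot_owner: the group that *might* account for every bot contribution; only
        this group gets an upper figure above its lower one.
    """
    grouped = Counter()
    for company, n in counts_by_company.items():
        grouped[AFFILIATES.get(company, company)] += n
    total = sum(grouped.values()) + bot_contributions
    if total == 0:
        return {}
    shares = {}
    for group, n in grouped.items():
        upper = n + bot_contributions if group == bot_owner else n
        shares[group] = (round(100 * n / total, 1), round(100 * upper / total, 1))
    return shares


# Toy numbers, not TNS data: 1,000 contributions, 100 of them from bots.
example = company_shares(
    {"Pivotal": 500, "VMware": 100, "IBM": 150, "Other": 150},
    bot_contributions=100,
    bot_owner="Pivotal/VMware",
)
print(example)
# {'Pivotal/VMware': (60.0, 70.0), 'IBM': (15.0, 15.0), 'Other': (15.0, 15.0)}
```

Because the bots could belong to anyone, a range is more honest than a single number. That is the same reasoning that separates the measured 67 percent from the 77 percent ceiling above.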

It is worth noting that not all projects were dominated by a single company. Both Linux and OpenStack have more heterogeneous communities than the projects listed above:

Percentage of Contributions Coming From Employees: Projects With a More Diverse Contributor Base

Project          Top Contributor         Secondary Contributor
OpenStack        HPE (18%)               Red Hat (17%)
Linux            Intel (11%)             Red Hat (8%)

A more visual comparison of OpenStack and Cloud Foundry contributions looks something like this:

[Chart: Corporate Affiliation of Foundation Board Members]

If nothing else, the above numbers show that open source software development, at least for the enterprise, may not always be a community-driven process. And this is nothing new: open source has long enjoyed a strong helping hand from corporations. In a follow-up article, we plan to investigate whether this is a good or bad thing for our readers and the technologies they use. And we’d like to hear your feedback.

Methodology

To create the first table above, data about contributors was collected using a tool called Blockspring that accessed the GitHub API to pull information about contributors to specific repositories. Although each project has multiple repositories, TNS chose to focus on the primary repository for each.
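TNS used Blockspring to reach the GitHub API. As a rough, hypothetical alternative, the same list can be pulled directly from GitHub's REST endpoint for repository contributors. That endpoint returns a JSON array whose entries include a login and a contributions count. The sketch below uses only the Python standard library. The function names and the max_pages cap are assumptions. Parsing is kept separate from the network call so it can be checked offline.

```python
import json
import urllib.request

# GitHub's "list repository contributors" endpoint; per_page is capped at 100.
CONTRIBUTORS_URL = (
    "https://api.github.com/repos/{owner}/{repo}/contributors"
    "?per_page=100&page={page}"
)


def parse_contributors(payload):
    """Turn one page of JSON into (login, contributions) pairs."""
    if not payload.strip():
        # An empty repository can come back with no body at all.
        return []
    return [(c["login"], c["contributions"]) for c in json.loads(payload)]


def fetch_contributors(owner, repo, token=None, max_pages=50):
    """Page through the endpoint until an empty page (or max_pages) is reached."""
    results = []
    for page in range(1, max_pages + 1):
        url = CONTRIBUTORS_URL.format(owner=owner, repo=repo, page=page)
        req = urllib.request.Request(
            url, headers={"Accept": "application/vnd.github+json"}
        )
        if token:
            req.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(req) as resp:
            batch = parse_contributors(resp.read().decode("utf-8"))
        if not batch:
            break
        results.extend(batch)
    return results
```

For example, fetch_contributors("kubernetes", "kubernetes") would page through the main Kubernetes repository. Unauthenticated requests get a much lower rate limit, so passing a personal access token is advisable for large projects.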

Since GitHub does not identify a contributor’s employer, we identified it as follows. First, TNS used company domain names found in the email or website fields. However, because a majority of contributors provided Gmail addresses or no email address at all, we used other means to identify their employers’ names. Blockspring, for instance, has an algorithm that cross-checks a person’s email address and username across several social networks and databases. Clearbit and FullContact APIs were also used to collect information.
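The first step, inferring an employer from an email domain, can be sketched as follows. The domain table and the webmail list are small hypothetical samples. The fallback resolvers are plain callables standing in for services such as Blockspring, Clearbit or FullContact; no real API of those services is called here.

```python
# Free webmail domains carry no employer signal (illustrative, not exhaustive).
WEBMAIL = {"gmail.com", "googlemail.com", "yahoo.com", "hotmail.com", "outlook.com"}

# Hypothetical lookup table from corporate domain to company name.
DOMAIN_TO_COMPANY = {
    "google.com": "Google",
    "redhat.com": "Red Hat",
    "docker.com": "Docker",
    "ibm.com": "IBM",
}


def company_from_email(email):
    """Map an email address to a company via its domain, or return None."""
    if not email or "@" not in email:
        return None
    domain = email.rsplit("@", 1)[1].strip().lower()
    if domain in WEBMAIL:
        return None
    # Try the full domain, then drop subdomains one level at a time
    # (e.g. us.ibm.com -> ibm.com).
    parts = domain.split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in DOMAIN_TO_COMPANY:
            return DOMAIN_TO_COMPANY[candidate]
    return None


def resolve_company(email, fallbacks=()):
    """Try the email domain first, then each fallback resolver in order."""
    company = company_from_email(email)
    for resolver in fallbacks:
        if company:
            break
        company = resolver(email)
    return company
```

Ordering the resolvers from cheapest to most expensive mirrors the process described here. Anyone still unresolved at the end is left for manual review.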

While none of these methods is perfect, they were accurate the vast majority of the time. For people who still had no company information, we reviewed every personal website they provided. Additionally, if a real name was provided, we searched for the person on LinkedIn and verified that their picture and other information were consistent with their GitHub profile.

Note that the number of contributions reviewed differs from that seen on GitHub’s own dashboards because of how we counted contributions from merged repositories and those handled by bots.
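The article does not spell out the exact counting rules, so the following is only one plausible approach, using a hypothetical bot list. Contribution counts for the same login are summed across the repositories being merged. Accounts identified as bots are tallied separately so they can be excluded or reported on their own, as in the Cloud Foundry case. Logins ending in "[bot]" follow GitHub's naming for app accounts; CI pipelines that commit through ordinary user accounts have to be listed by hand.

```python
from collections import Counter

# Hypothetical list of CI/bot logins; in practice it must be curated per project.
BOT_LOGINS = {"ci-bot", "release-pipeline"}


def merge_counts(*per_repo):
    """Sum (login, contributions) pairs from several repositories.

    Returns two Counters: human contributors and bot accounts.
    """
    humans, bots = Counter(), Counter()
    for pairs in per_repo:
        for login, n in pairs:
            is_bot = login in BOT_LOGINS or login.endswith("[bot]")
            (bots if is_bot else humans)[login] += n
    return humans, bots


humans, bots = merge_counts(
    [("alice", 3), ("ci-bot", 5)],
    [("alice", 2), ("dependabot[bot]", 1)],
)
```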

The second table, with Linux and OpenStack, contains data from reports published by the organizations themselves, the Linux Foundation and The OpenStack Foundation, respectively. In both cases, they mined profiles from GitHub and other version control systems as well as the text of commits and communication about issue resolution.

Docker, Hewlett Packard Enterprise, IBM, Intel, Pivotal, Red Hat and VMware are sponsors of The New Stack.
Feature image Groucho Marx via Pixabay. Chart icons via Freepik.
