How Governance in the OpenTelemetry Project Works
In this article, I’m going to discuss how we approached money and alignment when we started the OpenTelemetry project. I’m happy about how we architected both the governance and the design of the project to support those who contribute, and how that success has created a large, healthy community.
Well-Defined Democracy
First, let’s talk about governance. OpenTelemetry does not have a benevolent dictator. In my opinion, that would be an inappropriate model for a project that wishes to become an open standard.
Instead, we have a governance committee. The committee was initially seeded with founding members, but all positions going forward are elected by the project contributors. In my experience, contributing engineers tend to vote for other contributing engineers. This keeps the project healthy by ensuring that it is governed by respected, active participants.
The responsibilities of the governance committee are clearly spelled out in our charter, which the wonderful Sarah Novotny guided us in drafting. Contracts are for handling conflict, so it is not really fun to write them. But having everything spelled out prevents a number of potential issues.
You want these rules written down before something looks like a problem, not afterwards. For example, the charter ensures that no one company can be over-represented on the committee. This ensures that no one company can hire its way into a project takeover.
In fact, every form of decision-making in OpenTelemetry is defined by a role. Each role has requirements for obtaining it, and responsibilities that go with it. Again, kind of boring to write. Do we really need to define what a Maintainer is? Yes. Yes, we do. This allows for objectivity and guidance when it comes to transferring these roles, which will happen in a large and long-lived project.
The decisions themselves also follow a similar pattern. To keep the project on track, we use a specification. Changes to that specification are handled through an RFC process, which is open to all: anyone can submit an OTEP (OpenTelemetry Enhancement Proposal).
Having an official process has been really helpful, because it ensures that all of the decision-making happens in public and is recorded on GitHub. When we meet over Zoom, which is often, those meetings also happen in public, with the recordings uploaded to YouTube.
Doing all of this work upfront took some time. But I believe that having this structure to lean on made a difference as the project grew to hundreds of contributors spread across a large number of working groups. For large projects, defining roles and processes is just as important as defining a code of conduct.
Clear Project Boundaries
This kind of explicit guidance also extends to the architecture of OpenTelemetry as software. System design and community design interact and influence each other. For OpenTelemetry, there were two important design decisions that have helped to build trust. This can be more subtle than governance structure, but it is also worth considering when you design your own project.
A major goal of the OpenTelemetry project is to create a universal telemetry system, which can describe any computer system and then transmit those observations to any observability backend without the need to reinstrument an application or library.
By observability backend, we mean a data storage and analysis tool: Prometheus, Jaeger, Zipkin, Lightstep, Datadog, X-Ray, Stackdriver, and so on. Everybody! We would like all of these systems to consider adopting OpenTelemetry as their instrumentation system and to contribute to the project. By working together to build a shared telemetry system, something like a standard language for describing the operation of computer systems becomes a feasible goal.
Defining a clear edge to the project has helped us to achieve this goal. Early on, the OpenTelemetry project declared that it would never develop its own backend. This avoids the risk that the project might become a closed system at some point, and cease to care about the requirements of other systems. Stating that boundary up front showed potential contributors and neighboring projects that we were not some kind of scary threat looking to replace them. This allowed many of those projects to take the leap and begin contributing heavily to the project.
This rule also makes it clear how you are supposed to use this project to make money: build a really awesome and novel analysis tool, then sell it to people! Since you don’t need to build the whole telemetry pipeline, you can focus your efforts on improving the kinds of analysis you provide. I believe we will see a lot of innovation in this field over the next several years, based on doing novel things with OpenTelemetry data.
Shared Ownership, Shared Credit
A huge source of tension in a large project can come from one contributing company “owning” a portion of the project, and attempting to blend their company brand with the brand of the project itself. In other words, hogging all of the credit. If you were part of the Container Wars, you may remember some of this and the awkward mess it tended to make.
We created a rule that any code which is to be considered part of OpenTelemetry must live within the OpenTelemetry GitHub organization and have the copyright moved to the OpenTelemetry authors. That, combined with some basic marketing guidelines, has helped us stay aligned as a group, which promotes the OpenTelemetry project as a whole.
Clean Separation of Concerns
Another major goal was to provide a way for shared libraries to instrument themselves natively. This would allow the authors of web frameworks, HTTP clients, and other OSS projects to provide their own observability data. The authors of these projects are the experts; they know what information is important when it comes to tuning or debugging the libraries and services they provide.
In order to ensure that native instrumentation was feasible, we implemented a strict separation of API from implementation. Any code which instruments using OpenTelemetry only pulls in the API, which has very few dependencies and strict backwards compatibility guarantees. This ensures that OpenTelemetry will not create a dependency conflict when the OSS libraries that use it are composed together to form an application.
This API separation means that no one who instruments with the API is required to use the implementation we provide. It is possible to plug in an alternative implementation, or to use no implementation at all: the API calls become no-ops if no implementation is installed.
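To make the idea concrete, here is a minimal sketch of the pattern in Python. It is not the real OpenTelemetry API: every name in it (NoOpTracer, set_tracer, get_tracer, RecordingTracer, fetch_user) is hypothetical and chosen only for illustration. It shows a library that instruments itself against a tiny API, a no-op default, and an implementation that the application owner may choose to plug in.

```python
from contextlib import contextmanager


# --- "API" layer (hypothetical): the only thing a library depends on ---

class NoOpTracer:
    """Default tracer: accepts every call and records nothing."""

    @contextmanager
    def start_span(self, name):
        yield


_tracer = NoOpTracer()


def set_tracer(tracer):
    """Called by the application owner to plug in an implementation."""
    global _tracer
    _tracer = tracer


def get_tracer():
    """Called by instrumented code; always returns something usable."""
    return _tracer


# --- Library code: instruments itself natively, using only the API ---

def fetch_user(user_id):
    with get_tracer().start_span("fetch_user"):
        return {"id": user_id}


# --- One possible implementation (hypothetical), shipped separately ---

class RecordingTracer:
    """Keeps the names of finished spans in memory."""

    def __init__(self):
        self.finished = []

    @contextmanager
    def start_span(self, name):
        try:
            yield
        finally:
            self.finished.append(name)


if __name__ == "__main__":
    fetch_user(1)                 # no implementation installed: still works
    recorder = RecordingTracer()
    set_tracer(recorder)          # the application opts in
    fetch_user(2)
    print(recorder.finished)
```

The library never imports RecordingTracer, so an application that swaps in a different implementation, or installs none, does not need to change or reinstrument the library.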
We also ensure that the Collector, while awesome, does not become a required component when running OpenTelemetry. Likewise, we don’t ever want to force anyone to adopt OTLP.
This a la carte approach allows for the kind of flexibility a universally adopted project would need. We want users to feel like they run each OpenTelemetry component because they want to, not because they have to.
Design For Happiness
The above design choices, both for code and for project governance, came with extra work. It definitely would have been easier, at first, to avoid the above commitments. But having all of this structure in place has led to smooth adoption and a pleasant, low-politics community, in spite of the scale and amount of corporate interest.
I don’t believe there is a one-size-fits-all solution for every project, but I hope you find the guidance here helpful when your own project takes off.
To learn more about OpenTelemetry and other cloud native technologies, consider coming to KubeCon+CloudNativeCon Europe 2021 – Virtual, May 4-7.