Introducing OpenTelemetry in Your Organization: 3 Steps
To bring observability with OpenTelemetry (OTEL) into your organization, you need a rollout strategy that keeps everyone on the same page and prevents teams from each going their own way. There are three steps to getting started with OpenTelemetry:
- Communicate
- Do your homework
- Follow instrumentation practices
1. Communicate
Folks in your organization won’t know you want to use OpenTelemetry unless you tell them. This is where a little advocacy goes a long way.
Explain the Benefits of OpenTelemetry
Start by communicating OpenTelemetry’s benefits so that people in your organization understand why you want to use it. These include:
- No vendor lock-in: If you become unhappy with your observability vendor, it’s easy to switch to a different vendor without reinstrumenting your code.
- Vendor neutrality: Because OpenTelemetry is vendor-neutral, you can easily send telemetry data to multiple observability backends simultaneously. This is a great opportunity to do a side-by-side comparison of observability vendors to see which suits your needs.
- Active community: OpenTelemetry is the second-most active CNCF project, behind only Kubernetes.
- Standardization: OpenTelemetry is a standardized framework backed by most major observability vendors, so it is here to stay! It also lets you correlate all three key telemetry signals (traces, metrics and logs), something its predecessors, OpenTracing and OpenCensus, didn't offer.
Declare Your Intentions
Rolling out OpenTelemetry across an organization is a big initiative. Declaring that OpenTelemetry is happening shows people it's serious, especially when the directive comes from leadership. Make the announcement at a town hall and on Slack, Microsoft Teams or whatever collaboration tool your organization uses.
But people don't like simply being told what to do. Transformation fatigue and initiative fatigue are real, so you must…
Explain What OpenTelemetry Is and Why It's Important
If you want people to follow you down this path, they need to know what they’re getting into. To explain what OpenTelemetry is and its benefits, use the resources you have in your organization, such as engineers who “geek out” on observability.
Put out a call for folks who are interested in OpenTelemetry, recruit them as your champions and consider forming an observability practices team with them. This team can serve as an advocacy group: promoting the benefits of OTEL, defining practices for its implementation and rollout, and becoming the organization's subject matter experts in OpenTelemetry. Include engineers who can dig into OpenTelemetry, turn what they learn into a set of internal practices and become the go-to folks for OTEL-related questions. Recruit a mixture of individual contributors and managers. They don't have to be OpenTelemetry experts; they can grow into that. What's important is that they believe in it and want to help roll it out.
Also, connect with folks outside your company to learn how other organizations are rolling out OTEL. Join the OTEL End User Working Group (EUWG) on CNCF Slack to meet fellow OTEL practitioners who can share tips; some may even be willing to speak with your engineers to answer burning questions or address concerns. I'm one of the co-chairs of the EUWG, so I can help make some introductions!
Develop a Plan
Create an OTEL rollout plan with milestones and dates for reaching them to demonstrate your commitment to the project. Make sure your timelines are realistic by getting input from your engineers and managers. Have them work with your observability practices team to put a plan in place, then communicate the plan.
During planning, ask your engineers:
- What are the critical path transactions in the system?
- What information is most important to you for troubleshooting an issue?
- How can we help you adopt OpenTelemetry?
2. Do Your Homework
You need to understand your system’s landscape to put your plan together accurately.
Do a Code Inventory
Your application code probably comprises multiple services. For each service, take inventory of the language it’s written in so you can determine what OTEL instrumentation library (or libraries) your dev teams need to use.
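A minimal sketch of that inventory step, assuming (hypothetically) that each service lives in its own top-level directory and that its language can be guessed from well-known manifest files. The manifest list is a heuristic, not an exhaustive detector; the SDK names are the core OpenTelemetry SDK packages for each language.

```python
from pathlib import Path

# Heuristic: manifest files that usually identify a service's language.
MANIFEST_TO_LANGUAGE = {
    "requirements.txt": "python",
    "pyproject.toml": "python",
    "package.json": "javascript",
    "go.mod": "go",
    "pom.xml": "java",
    "build.gradle": "java",
}

# Core OpenTelemetry SDK package for each detected language.
OTEL_SDK = {
    "python": "opentelemetry-sdk (PyPI)",
    "javascript": "@opentelemetry/sdk-node (npm)",
    "go": "go.opentelemetry.io/otel/sdk",
    "java": "io.opentelemetry:opentelemetry-sdk (Maven)",
}


def inventory(services_root):
    """Return {service_name: sorted list of detected languages}, one entry per subdirectory."""
    result = {}
    for service_dir in sorted(p for p in Path(services_root).iterdir() if p.is_dir()):
        languages = {
            language
            for manifest, language in MANIFEST_TO_LANGUAGE.items()
            if (service_dir / manifest).exists()
        }
        result[service_dir.name] = sorted(languages)
    return result
```

A service that reports more than one language (say, a Python backend with a JavaScript front end in the same directory) needs more than one SDK, and an empty list flags a service whose language you'll have to check by hand.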
Also inventory any third-party frameworks and libraries (e.g., Python Django, Java Hibernate) you’re using, since OTEL auto-instrumentation is available for many popular libraries and frameworks.
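For Python services, a rough first pass over `requirements.txt` can show which dependencies already have auto-instrumentation. The sketch below hard-codes a small sample of real `opentelemetry-instrumentation-*` package names and is illustrative only; the `opentelemetry-bootstrap` command from the OpenTelemetry Python project does a fuller version of this check against installed packages.

```python
import re

# A small sample of libraries with auto-instrumentation packages
# in the opentelemetry-python-contrib project.
INSTRUMENTATIONS = {
    "django": "opentelemetry-instrumentation-django",
    "flask": "opentelemetry-instrumentation-flask",
    "requests": "opentelemetry-instrumentation-requests",
    "psycopg2": "opentelemetry-instrumentation-psycopg2",
    "redis": "opentelemetry-instrumentation-redis",
}


def auto_instrumentable(requirements_text):
    """Split requirement names into (covered, not_covered) using the table above."""
    covered, not_covered = {}, []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        # The package name ends at the first version specifier, extra, marker or space.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0].lower()
        if name in INSTRUMENTATIONS:
            covered[name] = INSTRUMENTATIONS[name]
        else:
            not_covered.append(name)
    return covered, not_covered
```

Anything left in `not_covered` is either a library without auto-instrumentation or one of your homegrown packages, which feeds directly into the next step.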
Finally, identify your homegrown frameworks and libraries. More on that shortly.
Identify the Most Critical High-Value Transactions
Next, dig in a bit deeper to identify your most critical transactions. You’ll want to instrument them first because, according to OpenTelemetry co-founder and Lightstep director of developer education Ted Young, “It ensures that complete traces are being created, and you can start to investigate important issues early, without having to wait for the entire organization to complete their migration.”
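One way to turn "critical transactions" into an instrumentation order is to list the services each transaction touches: a trace is only complete if every service on its path is instrumented. The sketch below assumes you can describe your architecture as a hypothetical call graph mapping each service to its downstream services; the service and transaction names are made up, and ranking services by how many critical transactions share them is just one reasonable heuristic.

```python
from collections import deque


def services_on_path(call_graph, entry_service):
    """Breadth-first walk from a transaction's entry service.

    Returns every service the transaction can reach; each one needs
    instrumentation (and context propagation) for the trace to be complete.
    """
    seen = {entry_service}
    queue = deque([entry_service])
    while queue:
        service = queue.popleft()
        for downstream in call_graph.get(service, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen


def rollout_order(call_graph, transactions):
    """Services touched by the critical transactions, most widely shared first."""
    counts = {}
    for entry_service in transactions.values():
        for service in services_on_path(call_graph, entry_service):
            counts[service] = counts.get(service, 0) + 1
    # Ties are broken alphabetically so the order is stable.
    return sorted(counts, key=lambda s: (-counts[s], s))
```

Services that never appear in the result (a recommendations service off the critical paths, for example) can wait until the critical traces are complete.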
Identify Any Previously Instrumented Application Code
If any code has already been instrumented, find out whether it uses OpenCensus, OpenTracing or something else. OpenTelemetry offers backward compatibility with OpenTracing and OpenCensus through compatibility shims, so you won't need to make any major code changes initially. However, plan to eventually migrate fully to OpenTelemetry to take advantage of all it offers. For example, OpenTracing covers only traces and OpenCensus covers traces and metrics; neither supports logs or the correlation between traces, metrics and logs. If existing instrumentation relies on homegrown libraries or frameworks, be prepared to reinstrument that code using OpenTelemetry.
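To find previously instrumented code, a simple text search for OpenTracing and OpenCensus imports goes a long way. This sketch scans Python, Java and Go sources for the usual package prefixes of those projects; the patterns are assumptions to adapt to your codebase, and a text scan will miss dynamic imports or tracing calls hidden behind homegrown wrappers.

```python
import re
from pathlib import Path

# Import/package patterns that indicate pre-OpenTelemetry instrumentation.
LEGACY_PATTERNS = {
    "OpenTracing": re.compile(
        r"^\s*(?:import|from)\s+opentracing\b|io\.opentracing\.|github\.com/opentracing/",
        re.MULTILINE,
    ),
    "OpenCensus": re.compile(
        r"^\s*(?:import|from)\s+opencensus\b|io\.opencensus\.|go\.opencensus\.io",
        re.MULTILINE,
    ),
}
SOURCE_SUFFIXES = {".py", ".java", ".go"}


def find_legacy_instrumentation(root):
    """Map each source file that references OpenTracing/OpenCensus to the projects found."""
    findings = {}
    for path in Path(root).rglob("*"):
        if path.suffix not in SOURCE_SUFFIXES or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = sorted(name for name, pattern in LEGACY_PATTERNS.items() if pattern.search(text))
        if hits:
            findings[path.relative_to(root).as_posix()] = hits
    return findings
```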
Identify Metrics Sources
Along with application tracing data, you’ll want to send metrics data to your observability backend for a nice holistic system view. This means you need to identify your metrics sources. Is it Kubernetes? Kafka? Docker? Nomad? Virtual machines? Also, ask what application metrics you want to capture.
3. Follow Instrumentation Practices
Now you're ready to start instrumenting with OpenTelemetry. Here are some recommended practices to help teams get started.
You Might Need to Put Some Application Features on Hold
If your system is experiencing frequent reliability issues, this is usually a sign you need better observability. Therefore, you might need to delay some planned features to instrument your code or reevaluate what’s already been instrumented.
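Start with Auto-Instrumentation, If Possible
OTEL auto-instrumentation is available for many popular libraries and frameworks, and it produces telemetry with little or no code change, which makes it the quickest way for teams to get initial coverage.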
Consider Your Homegrown Libraries and Frameworks
Auto-instrumentation covers supported third-party libraries and frameworks, not your in-house ones, so you'll eventually want to supplement it with manual instrumentation, starting with your homegrown libraries and frameworks. Instrumenting them will give you much of the tracing coverage you need, since a large share of your code probably goes through these shared libraries and frameworks.
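Instrumenting a homegrown library once means every service that calls it gets spans. A minimal sketch, assuming a hypothetical in-house `acme.dblib` library: the `traced` decorator relies only on `start_as_current_span(name)`, which the real OpenTelemetry Python `Tracer` provides as a context manager, so in production you would pass it a tracer from `trace.get_tracer(...)`. `RecordingTracer` is a stand-in so the example runs without the SDK.

```python
import functools
from contextlib import contextmanager


def traced(tracer, span_name=None):
    """Decorator an in-house library can apply so that every caller gets a span.

    `tracer` is any object with OpenTelemetry's `start_as_current_span(name)`
    context manager, for example `opentelemetry.trace.get_tracer(__name__)`.
    """
    def decorate(func):
        name = span_name or f"{func.__module__}.{func.__qualname__}"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The span covers the whole call, including any exception it raises.
            with tracer.start_as_current_span(name):
                return func(*args, **kwargs)
        return wrapper
    return decorate


class RecordingTracer:
    """Stand-in tracer so this sketch runs without the OpenTelemetry SDK installed."""

    def __init__(self):
        self.span_names = []

    @contextmanager
    def start_as_current_span(self, name):
        self.span_names.append(name)
        yield None  # a real tracer yields the active Span here


# Hypothetical shared data-access helper from an in-house library.
tracer = RecordingTracer()  # in production: trace.get_tracer("acme.dblib")


@traced(tracer, "acme.dblib.query")
def run_query(sql):
    return f"rows for {sql}"
```

The real tracer's `start_as_current_span` can also be applied directly as a decorator; wrapping it in a library-owned helper like `traced` just gives the library a single place to control span naming.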
Don’t Auto-Instrument Everything
It is possible to over-instrument, which leaves you with so much irrelevant data that troubleshooting becomes harder. Auto-instrumentation is a common cause. So, once you start auto-instrumenting your code, take a step back and check whether each auto-instrumented library is one you actually need telemetry from. Fortunately, there are ways to limit what gets auto-instrumented, including in Java and Python.
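As a sketch of what that limiting looks like: OpenTelemetry Python auto-instrumentation reads a comma-separated deny list from `OTEL_PYTHON_DISABLED_INSTRUMENTATIONS`, and the Java agent can switch all instrumentations off with `OTEL_INSTRUMENTATION_COMMON_DEFAULT_ENABLED=false`, then re-enable specific ones with `OTEL_INSTRUMENTATION_<NAME>_ENABLED=true`. The helpers below only compute those environment variables from an allowlist; the instrumentation names used with them are examples, so check each agent's documentation for the exact names it recognizes.

```python
def python_disabled_env(installed, wanted):
    """Python: deny-list every installed instrumentation that isn't wanted."""
    disabled = sorted(set(installed) - set(wanted))
    return {"OTEL_PYTHON_DISABLED_INSTRUMENTATIONS": ",".join(disabled)}


def java_allowlist_env(wanted):
    """Java agent: turn every instrumentation off, then re-enable an allowlist."""
    env = {"OTEL_INSTRUMENTATION_COMMON_DEFAULT_ENABLED": "false"}
    for name in wanted:
        # Property otel.instrumentation.<name>.enabled; the environment-variable
        # form upper-cases it and replaces '-' and '.' with '_'.
        key = name.upper().replace("-", "_").replace(".", "_")
        env[f"OTEL_INSTRUMENTATION_{key}_ENABLED"] = "true"
    return env
```

An allowlist (Java) is stricter than a deny list (Python): new libraries stay uninstrumented until someone opts them in, which is usually what you want once a team has been burned by noisy data.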
Instrument as You Code
Just as test-driven development (TDD) is about writing tests alongside your application code, observability-driven development (ODD) is the practice of adding instrumentation as you write your application code. When you instrument as you code, you know exactly what to instrument, because the code is fresh in your mind. It also prevents new observability-related technical debt, since you won't have to go back later to instrument the code.
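A sketch of how ODD can sit alongside TDD: the function is written with its span and attributes in place, and a unit test asserts the telemetry exists, just as it asserts the return value. `apply_discount`, the `discount.*` attribute names and the fake tracer are all hypothetical; with the real OpenTelemetry Python SDK you would typically capture spans with its `InMemorySpanExporter` instead of a fake.

```python
from contextlib import contextmanager


class FakeSpan:
    def __init__(self, name):
        self.name, self.attributes = name, {}

    def set_attribute(self, key, value):  # same method name as the OpenTelemetry Span API
        self.attributes[key] = value


class FakeTracer:
    """Stand-in tracer that keeps finished spans so tests can inspect them."""

    def __init__(self):
        self.finished = []

    @contextmanager
    def start_as_current_span(self, name):
        span = FakeSpan(name)
        try:
            yield span
        finally:
            self.finished.append(span)


def apply_discount(tracer, cart_total, code):
    # Instrumentation written together with the business logic.
    with tracer.start_as_current_span("apply_discount") as span:
        span.set_attribute("discount.code", code)
        discounted = round(cart_total * 0.9, 2) if code == "SAVE10" else cart_total
        span.set_attribute("discount.applied", discounted != cart_total)
        return discounted


def test_apply_discount_emits_span():
    tracer = FakeTracer()
    assert apply_discount(tracer, 100.0, "SAVE10") == 90.0
    (span,) = tracer.finished  # exactly one span was recorded
    assert span.name == "apply_discount"
    assert span.attributes == {"discount.code": "SAVE10", "discount.applied": True}
```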
Instrument Your Own Code
Application teams should instrument their own code. They should never rely on an external team to instrument their code, because they know their code best. They work on it daily and know what to look for when they troubleshoot. It’s tempting to get a third party to instrument your application code when you’re in a time crunch, but it will not end well.
Deploy at Least One OTEL Collector Instance
Although you can send telemetry data directly from your instrumented code to your observability backend, you should use at least one OpenTelemetry Collector. The Collector acts as a central point that receives and processes data from multiple sources at once, then exports it to your preferred observability backend for analysis. If you're evaluating or changing observability backends, you can send the same data to several backends simultaneously, just by updating the Collector's YAML configuration file, and compare which you like best. Once you've chosen a backend, pointing the Collector at it is another YAML change; your application code doesn't need to change at all.
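To make the fan-out concrete, here is a sketch that builds a Collector configuration with an OTLP receiver, a batch processor and one OTLP exporter per backend in the traces pipeline. The backend names and endpoints are placeholders (real vendors usually also require authentication headers), and because JSON is valid YAML, the printed output can be saved as the Collector's config file.

```python
import json


def collector_config(backends):
    """Build an OpenTelemetry Collector config that sends traces to every listed backend.

    `backends` maps a short name to an OTLP/gRPC endpoint (host:port).
    """
    exporters = {f"otlp/{name}": {"endpoint": endpoint} for name, endpoint in backends.items()}
    return {
        "receivers": {
            "otlp": {
                "protocols": {
                    # Listen on all interfaces at the standard OTLP ports.
                    "grpc": {"endpoint": "0.0.0.0:4317"},
                    "http": {"endpoint": "0.0.0.0:4318"},
                }
            }
        },
        "processors": {"batch": {}},
        "exporters": exporters,
        "service": {
            "pipelines": {
                "traces": {
                    "receivers": ["otlp"],
                    "processors": ["batch"],
                    "exporters": sorted(exporters),  # fan out to every backend
                }
            }
        },
    }


if __name__ == "__main__":
    side_by_side = {
        "vendor_a": "otlp.vendor-a.example:4317",  # placeholder endpoints
        "vendor_b": "otlp.vendor-b.example:4317",
    }
    print(json.dumps(collector_config(side_by_side), indent=2))
```

Trimming `backends` to a single entry is the "change the YAML" step once you've picked a vendor.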
Rolling out OpenTelemetry in your organization is no trivial task, but having guidance for getting started goes a long way. Remember to communicate, do your homework and follow instrumentation practices. And if you get stuck, we’re here to help!