Why New Relic Supports W3C’s Distributed Tracing Protocol
New Relic sponsored this post.
Distributed tracing is an essential tool for developers working with highly distributed microservices applications, allowing them to track event interactions that traverse multiple microservices. But not all tracing tools have followed the same standards for passing contextual information, via HTTP headers, as traces move from service to service.
This lack of standardization has led to a jumble of mutually incompatible header formats — which can definitely be a problem when development teams within an organization pick their own tracing tools.
W3C Trace Context is a recommended standard that makes distributed tracing easier to implement, more reliable and ultimately more valuable for developers working with modern, highly distributed applications. The standard greatly simplifies use cases where developers instrument services using tools from different distributed tracing solutions. Now all tracers and agents that conform to the W3C Trace Context standard can participate in a trace. Trace data can be propagated from the root service all the way to the terminal service.
For nearly two years, New Relic has participated in the W3C Trace Context Working Group, helping to define the standard and shepherd it through the approval process. Now that the W3C Trace Context specification has reached “recommendation” status, the final stage of ratification, we’re excited to announce that we’ve launched support for the standard.
The following New Relic APM agents now support the W3C Trace Context standard:
- Java 5.1.0 and higher.
- Python 5.5 and higher.
- Go 3.1.0 and higher.
The New Relic open source Elixir agent also now supports the standard, and we’ll soon be adding Trace Context support for other APM agents, as well as the New Relic Browser agent.
To get started, just update your agents to the appropriate version. (We explain backward compatibility below.) We also describe in detail what the standard means for distributed tracing and observability on the New Relic platform.
The Trouble with Distributed Tracing Today
Every distributed tracing tool requires a way to “correlate” each step of a trace, in the correct order, along with the other information needed to identify and diagnose performance problems. This involves assigning a unique ID to each transaction, assigning a unique ID to each step in a trace, encoding this contextual information as a set of HTTP headers and passing (or propagating) the headers and encoded context from one service to the next as the trace makes its way through an application environment.
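The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's implementation: the `SpanContext` class, the helper functions and the `x-example-*` header names are all hypothetical.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str             # shared by every step of one transaction
    span_id: str              # unique to this step
    parent_id: Optional[str]  # the step that called this one; None at the root


def start_trace() -> SpanContext:
    """Begin a new transaction with a fresh trace ID and root span ID."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), None)


def start_child(parent: SpanContext) -> SpanContext:
    """Create the next step: same trace ID, new span ID, parent recorded."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.span_id)


def to_headers(ctx: SpanContext) -> Dict[str, str]:
    """Encode the context as HTTP headers (hypothetical header names)."""
    return {"x-example-trace-id": ctx.trace_id, "x-example-span-id": ctx.span_id}


def from_headers(headers: Dict[str, str]) -> SpanContext:
    """Continue the caller's trace inside the receiving service."""
    caller = SpanContext(headers["x-example-trace-id"], headers["x-example-span-id"], None)
    return start_child(caller)
```

A receiving service that calls `from_headers(to_headers(root))` gets a context with the caller's trace ID and the caller's span ID as its parent, which is what lets a backend put the steps back in order.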
Previously, each distributed tracing tool employed custom headers and context formats; for example, Zipkin used the B3 format and at New Relic we developed our own proprietary format. This wasn’t a problem when trace context headers mostly traveled between services monitored by a single tracing tool or when headers rarely propagated beyond a single organization’s network and middleware infrastructure.
But as noted above, development teams today often choose their own tracing tools and end up with mutually incompatible header formats. When a tracing tool receives trace context headers it doesn’t understand, it typically drops the headers and breaks the traces that relied upon them. Trace context headers are also more likely to traverse middleware boundaries including proxies, service meshes and messaging systems along the way. Some of these components will pass along proprietary headers intact, but many others will drop them, once again resulting in broken traces.
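To make the failure mode concrete, here is a small sketch of an intermediary that forwards only headers on an allowlist. The allowlist, the `x-vendor-trace-id` header name and both functions are assumptions made up for this illustration; they do not describe any particular proxy or tracing tool.

```python
import secrets
from typing import Dict

# Hypothetical allowlist used by an intermediary such as a proxy.
ALLOWED_HEADERS = frozenset({"accept", "content-type", "authorization"})

# Hypothetical proprietary trace header used by one tracing tool.
VENDOR_HEADER = "x-vendor-trace-id"


def proxy_forward(headers: Dict[str, str]) -> Dict[str, str]:
    """Forward only allowlisted headers; everything else is dropped."""
    return {k: v for k, v in headers.items() if k.lower() in ALLOWED_HEADERS}


def trace_id_for(headers: Dict[str, str]) -> str:
    """Continue the incoming trace if possible, otherwise start a new one."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get(VENDOR_HEADER) or secrets.token_hex(16)
```

If a request carrying `X-Vendor-Trace-Id` passes through `proxy_forward`, the downstream call to `trace_id_for` no longer sees the original ID and silently starts a new trace, so the two halves of the transaction can no longer be joined.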
W3C Trace Context: Breaking Down Barriers to Observability
W3C Trace Context enables cross-vendor interoperation of traces, one of the four essential telemetry types. This aligns with New Relic’s open instrumentation initiative and the release of our APIs, Telemetry SDKs and exporters to meet customer needs for interoperation between vendors and open source tools.
We opted for W3C Trace Context because it lets New Relic’s distributed tracing tool traverse services instrumented with agents from other vendors without the risk of broken traces. It also helps New Relic’s tracing tool to reliably traverse third-party components, including proxies and API gateways.
At the same time, W3C Trace Context will confer the same advantages upon open source tracers, enabling our customers to incorporate tracing telemetry from any source and to implement traces across highly distributed application environments.
Our main goal was to support Trace Context so it could hopefully become a critical and very welcome technology for the future of observability.
Functionally, W3C Trace Context defines a pair of standardized context HTTP headers that serve to propagate context correlation information between services:
- A traceparent header contains the data elements that every distributed tracing model requires to define and propagate context: a trace ID, a parent ID and a sample flag.
- A tracestate header holds vendor-specific, contextual data, typically in order to support additional functionality or optimizations associated with a particular tracing tool.
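As a rough sketch of what the traceparent header looks like on the wire: in version 00 of the specification it is four lowercase hex fields joined by dashes (a 2-character version, a 32-character trace ID, a 16-character parent ID and 2 characters of flags whose lowest bit is the sampled flag). The function and class names below are illustrative, and the parser deliberately accepts only the version 00 layout.

```python
import re
from typing import NamedTuple, Optional

# Version 00 layout: 00-<32 hex trace ID>-<16 hex parent ID>-<2 hex flags>
_TRACEPARENT_V00 = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT_V00.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are defined as invalid by the specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def format_traceparent(trace_id: str, parent_id: str, sampled: bool) -> str:
    """Build a version 00 traceparent value."""
    return f"00-{trace_id}-{parent_id}-{'01' if sampled else '00'}"
```

For example, `parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")` yields the trace ID, the parent ID and `sampled=True`, and passing those three values to `format_traceparent` reproduces the same string.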
This common context-propagation format enables trace propagation across other trace instrumentation that conforms to the standard. A standard trace header format also clears barriers for middleware vendors to support propagating trace headers and for framework vendors to build in tracing instrumentation.
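The sketch below shows, under simplifying assumptions, how a conforming service might pass context along on an outgoing call: it keeps the trace ID and flags, substitutes its own span ID as the new parent ID, and moves its own tracestate entry to the front of the list while leaving other vendors' entries in place. The function names are hypothetical, the incoming traceparent is assumed to be valid, and storing the span ID as the tracestate value is just a placeholder for whatever a real vendor would record.

```python
import secrets
from typing import Dict


def update_tracestate(tracestate: str, key: str, value: str, max_members: int = 32) -> str:
    """Put this vendor's entry first and keep the other entries in order."""
    members = [m.strip() for m in tracestate.split(",") if m.strip()]
    others = [m for m in members if m.split("=", 1)[0] != key]
    # The specification caps the list at 32 members; drop from the right.
    return ",".join([f"{key}={value}"] + others[: max_members - 1])


def outgoing_headers(incoming: Dict[str, str], vendor_key: str) -> Dict[str, str]:
    """Build headers for a downstream call from the headers just received."""
    version, trace_id, _caller_span_id, flags = incoming["traceparent"].split("-")
    span_id = secrets.token_hex(8)  # this service's own span
    return {
        "traceparent": f"{version}-{trace_id}-{span_id}-{flags}",
        "tracestate": update_tracestate(incoming.get("tracestate", ""), vendor_key, span_id),
    }
```

Because every hop preserves the trace ID and the entries it does not own, tools from different vendors can sit next to each other in the same trace without overwriting each other's data.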
If you need or want to use tools other than New Relic agents to instrument your services, but still want to capture those traces in New Relic, we expect most vendors and open source instrumentation tools will support W3C Trace Context. Many have already released compliant tracers, including OpenTelemetry, for standardizing instrumentation needed for observability across the industry.
As the standard matures, we expect any tracers or instrumentation using other header formats to adopt W3C Trace Context and for more tools and shims to become available to enable existing instrumentation to be converted to W3C Trace Context for participation in multivendor traces.
The end result is more flexibility and fewer barriers to observability.
How W3C Trace Context Works in New Relic
There are two scenarios for how W3C Trace Context works on the New Relic platform:
- Scenario 1: Where some trace data is sent to New Relic.
- Scenario 2: Where all trace data is sent to New Relic.
Let’s take a look at both.
Scenario 1: Where Some Trace Data Is Sent to New Relic
If all of your trace data is sent to New Relic, you’ll be able to observe a complete trace in the distributed tracing user interface (UI). If some of the trace’s data is sent to another tracing service, or nowhere at all, you may need to dig around to find that data. With W3C Trace Context, however, you can use the trace ID to find other data associated with that trace.
- An example of call flow where some services are not sending trace data to New Relic, but the trace is still propagated.
In this scenario, you’ll likely have a trace with missing spans. The New Relic distributed tracing UI will show that the trace has a gap, but using the surrounding spans, you can still calculate the total time for the trace, or perform other troubleshooting.
- Use the trace ID in the distributed tracing UI to find trace data for missing spans.
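As an illustration of working around a gap, the following sketch computes a trace's total time from the spans that did arrive and lists the parent IDs that no received span accounts for. The `Span` fields are hypothetical and are not the schema New Relic uses.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Span:
    id: str
    parent_id: Optional[str]  # None for the root span
    start_ms: int
    duration_ms: int


def trace_duration_ms(spans: List[Span]) -> int:
    """Total time from the earliest start to the latest end among known spans."""
    start = min(s.start_ms for s in spans)
    end = max(s.start_ms + s.duration_ms for s in spans)
    return end - start


def missing_parent_ids(spans: List[Span]) -> Set[str]:
    """Parent IDs referenced by a span but absent from the received data."""
    known = {s.id for s in spans}
    return {s.parent_id for s in spans if s.parent_id is not None and s.parent_id not in known}
```

With a root span `a` and a span `c` whose parent `b` was reported elsewhere, `missing_parent_ids` returns `{"b"}`, and `trace_duration_ms` still gives the end-to-end time as long as the root span is present.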
Scenario 2: Where All Trace Data Is Sent to New Relic
If you are using an open source tracer and want to send those traces to New Relic, we’ve created several exporters for popular open source monitoring tools, including OpenCensus and OpenTelemetry. We built the exporters using the Telemetry SDK, an open source set of API client libraries that send your trace data to the New Relic platform.
In this scenario, you could use an exporter for the OpenTelemetry agent that is collecting trace data for service 2 to send that data to New Relic, without interrupting your use of other exporters.
- An example of a call flow where an exporter allows New Relic to have data for the complete trace.
How Does Backwards Compatibility Work?
New Relic APM agents that support W3C Trace Context can accept and emit both the W3C Trace Context header format and the New Relic header format. The new agents are also backwards compatible, meaning they will continue to work with older agents, so trace context will be propagated between services with older and newer releases of New Relic agents.
In some cases, you may have services involved in a trace that are instrumented with something other than New Relic agents. As long as that instrumentation is compliant with W3C Trace Context, you can use any New Relic agent version that supports W3C Trace Context as part of that trace and be assured that the trace will be propagated.
If you have a trace with a mix of older and newer New Relic agents and non-New Relic instrumentation that is compliant with W3C Trace Context, traces can still be propagated as long as the pre-W3C Trace Context New Relic agents are always adjacent to New Relic agents that are compliant with W3C Trace Context. The New Relic agents that support W3C Trace Context will act as a “translator” of the New Relic proprietary trace context.
For example, this trace includes an OpenTelemetry agent, a W3C-compliant agent and a non-compliant agent, yet the trace context can still be propagated:
New Relic agents will always accept and emit the W3C trace header format and it takes priority over the New Relic trace header format. You can optionally disable the New Relic trace header format in the agent’s configuration file. See the documentation for instructions on disabling the New Relic format.
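The priority rule described above can be sketched as follows. This is an illustration of the behavior as stated in this post, not the agents' actual code: the `x-legacy-trace` header name is a stand-in for the proprietary format, and the `emit_legacy` parameter is a hypothetical analogue of the configuration option.

```python
from typing import Dict, Optional, Tuple

W3C_HEADER = "traceparent"
LEGACY_HEADER = "x-legacy-trace"  # hypothetical stand-in for a proprietary header


def select_inbound(headers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Prefer the W3C header; fall back to the legacy header if it is absent."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if W3C_HEADER in lowered:
        return ("w3c", lowered[W3C_HEADER])
    if LEGACY_HEADER in lowered:
        return ("legacy", lowered[LEGACY_HEADER])
    return None  # no context received; the caller would start a new trace


def build_outbound(traceparent: str, legacy_payload: str, emit_legacy: bool = True) -> Dict[str, str]:
    """Always emit the W3C header; emit the legacy header unless disabled."""
    headers = {W3C_HEADER: traceparent}
    if emit_legacy:
        headers[LEGACY_HEADER] = legacy_payload
    return headers
```

Emitting both formats is what lets a newer agent sit between an older agent and a W3C-only tracer: whichever neighbor receives the request finds a header it understands.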
And for details and limitations on backwards compatibility, see the New Relic distributed tracing documentation.
More Than Just Another Boring Protocol
We have been closely involved with our colleagues across the industry as part of the W3C Distributed Tracing Working Group to get this specification to this point of final ratification.
New Relic is committed to the W3C group and we’ll continue to provide composable instrumentation solutions that seamlessly work with open standards. We’ll also continue to add support for more distributed tracing use cases to help our users improve observability throughout their DevOps lifecycle. In the meantime, we’d love to hear from you about how you’re leveraging W3C Trace Context and our open source exporters. Drop us a line in our GitHub exporter spec repo.
Upgrade your agents and get started with the open New Relic One platform today.
Want to learn more about why you need an open, connected and programmable platform for observability? Check out our ebook, “The Age of Observability.”
Feature image via Pixabay.