OpenTracing Aims for a Clearer View of Processes in Distributed Systems
The OpenTracing diagnostic tool grew out of a workshop on transactional tracing that ultimately became a support group for people, mostly from large internet companies, who had created in-house teams to solve these problems.
“The story was just so painful across every one of these companies. We realized we were hearing the same story over and over from everyone about how hard it was to integrate with a distributed tracing system,” said Ben Sigelman, one of the project’s founders.
Last month, the Cloud Native Computing Foundation adopted OpenTracing as its third project, following Prometheus and Kubernetes.
The originators of the technology realized the underlying issue was a lack of standardization, Sigelman said. People developing applications at these companies use a lot of open source libraries, yet there was no commonality in the way tracing was described between them and the organization’s specific business logic.
“If you’re using monitoring, you need to be hearing a clear, concise story about your system. The problem is that most of the tools created 10 years ago were designed to tell stories about individual processes, individual VMs or individual machines,” Sigelman said.
Tracing systems such as Zipkin, Dapper, HTrace and X-Trace aim to address these issues, but they do so through application-level instrumentation using incompatible APIs, according to Sigelman, who is also co-founder and CEO at LightStep. Sigelman built Dapper while at Google.
With Dapper, he said, “There were so many things that when you saw them, it was blindingly obvious they were inefficient, broken or both. All these things we were discovering were multi-process interactions. … [You would see] that bug had been there for years, but things like that are very difficult to trace unless you can see them in action.”
CNCF started out working on Kubernetes, then added the monitoring project Prometheus in May. It recently announced OpenTracing to bring standardization to systems that track a transaction or workflow as it propagates through distributed architectures.
OpenTracing solves the problem of poor standardization in the propagation of tracing data from one library to the next and one process to the next. It does that in a way that doesn’t bind you to a particular vendor, whether it’s Zipkin or a commercial offering, Sigelman said, adding that a number of application performance management vendors are expected to announce OpenTracing support in the next few months.
“Say there’s a commercial vendor you want to use for performance monitoring. Rather than adding a bunch of code that refers specifically to that vendor throughout your open source dependencies, all that can be OpenTracing, which is totally generic,” he said. That also makes switching vendors easy.
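To make the vendor-switching point concrete, here is a minimal Python sketch. It does not use the real OpenTracing API: the `Tracer` protocol, `NoopTracer`, `VendorATracer`, `VendorBTracer`, `set_tracer` and `fetch_user` are all hypothetical names invented for illustration. The idea it tries to show is that library code refers only to a generic interface, so changing vendors becomes a one-line change at application startup.

```python
from typing import List, Protocol


class Tracer(Protocol):
    """Hypothetical vendor-neutral interface (not the real OpenTracing API)."""

    def record(self, operation: str) -> None: ...


class NoopTracer:
    """Default tracer: instrumentation does nothing until a tracer is registered."""

    def record(self, operation: str) -> None:
        pass


class VendorATracer:
    """Stand-in for one vendor's implementation (hypothetical)."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def record(self, operation: str) -> None:
        self.sent.append(f"vendor-a:{operation}")


class VendorBTracer:
    """Stand-in for a competing vendor's implementation (hypothetical)."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def record(self, operation: str) -> None:
        self.sent.append(f"vendor-b:{operation}")


# The single place where the application chooses a tracing backend.
_tracer: Tracer = NoopTracer()


def set_tracer(tracer: Tracer) -> None:
    """Register the tracer that all instrumented code will use."""
    global _tracer
    _tracer = tracer


def fetch_user(user_id: int) -> dict:
    """Library code: mentions only the generic interface, never a vendor."""
    _tracer.record("fetch_user")
    return {"id": user_id}
```

In this sketch, calling `set_tracer(VendorBTracer())` instead of `set_tracer(VendorATracer())` redirects every instrumented call without touching `fetch_user` or any other library code.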
Sigelman says three groups care about OpenTracing:
- Application developers: It gives them the ability to move from one vendor to another. There are a number of open source projects they can use and have OpenTracing just work out of the box.
- Open source libraries: Committers and maintainers can instrument systems in a way that will allow upstream users to tie that library to any downstream tracing system. Things like gRPC or DropWizard are able to add instrumentation without creating a specific requirement for people who use them.
- Vendors of monitoring and tracing systems: It gives them a wide swath of instrumented code without doing any work.
Application code and OSS packages program against the abstract OpenTracing APIs, describing the path that requests take within each process as well as the propagation between processes. OpenTracing implementations control the buffering and encoding of trace span data, and they also control the semantics of process-to-process trace context information.
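The split described above, with instrumented code on one side and a pluggable implementation on the other, can be sketched roughly as follows. This is an illustrative Python model, not the real OpenTracing API: the `Tracer` base class, `InMemoryTracer`, the `x-demo-*` header names and the `client_call` and `server_handle` helpers are all assumptions made for the example. The abstract class stands for what application and library code programs against, while the concrete class decides how spans are buffered and how trace context is encoded between processes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import itertools
from typing import Dict, List, Optional


@dataclass
class SpanContext:
    """The part of a span that travels between processes."""
    trace_id: str
    span_id: str


@dataclass
class Span:
    operation: str
    context: SpanContext
    parent_id: Optional[str] = None


class Tracer(ABC):
    """Hypothetical abstract API that instrumented code programs against."""

    @abstractmethod
    def start_span(self, operation: str,
                   parent: Optional[SpanContext] = None) -> Span: ...

    @abstractmethod
    def finish_span(self, span: Span) -> None: ...

    @abstractmethod
    def inject(self, context: SpanContext, carrier: Dict[str, str]) -> None: ...

    @abstractmethod
    def extract(self, carrier: Dict[str, str]) -> Optional[SpanContext]: ...


class InMemoryTracer(Tracer):
    """One possible implementation: buffers spans in a list and encodes
    context as two made-up headers. A real vendor would choose its own."""

    def __init__(self) -> None:
        self.finished_spans: List[Span] = []
        self._ids = itertools.count(1)

    def _next_id(self) -> str:
        return format(next(self._ids), "x")

    def start_span(self, operation, parent=None):
        span_id = self._next_id()
        # A child span reuses its parent's trace ID; a root span gets a new one.
        trace_id = parent.trace_id if parent else self._next_id()
        parent_id = parent.span_id if parent else None
        return Span(operation, SpanContext(trace_id, span_id), parent_id)

    def finish_span(self, span):
        self.finished_spans.append(span)

    def inject(self, context, carrier):
        carrier["x-demo-trace-id"] = context.trace_id
        carrier["x-demo-span-id"] = context.span_id

    def extract(self, carrier):
        try:
            return SpanContext(carrier["x-demo-trace-id"],
                               carrier["x-demo-span-id"])
        except KeyError:
            return None  # No trace context arrived with this request.


def client_call(tracer: Tracer) -> Dict[str, str]:
    """Process A: start a span and copy its context into outgoing headers."""
    span = tracer.start_span("client-request")
    headers: Dict[str, str] = {}
    tracer.inject(span.context, headers)
    tracer.finish_span(span)
    return headers


def server_handle(tracer: Tracer, headers: Dict[str, str]) -> Span:
    """Process B: continue the trace found in the incoming headers."""
    parent = tracer.extract(headers)
    span = tracer.start_span("server-handler", parent=parent)
    tracer.finish_span(span)
    return span
```

Note that `client_call` and `server_handle` never mention header names or storage. Under the assumptions of this sketch, those details belong entirely to the tracer implementation, which is why a different implementation could be substituted without changing the instrumented code.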
Programming languages have been a particular pain point, Sigelman said.
“When people are deploying their services, if you include the mobile web clients, they’re using five or six languages. So if you want to have something that spans your system, you need to have a single set of APIs that are conceptually consistent. I think that’s a big part of the value proposition to companies: You want to know this library both makes sense with itself and semantically from the tracing system’s perspective,” he said.
Uber was an early adopter and tester of OpenTracing, and Apple, Yelp, Cockroach Labs, Joyent and Workiva have also been involved in the project.
Sigelman foresees a proliferation of integrations over the next year. Within a container environment such as Kubernetes or another orchestrated deployment framework, it is possible to integrate OpenTracing with applications without modifying their source code, he said.
“I would like to see a story for OpenTracing in that environment, which will require standardization beyond just the level of APIs. It will involve some standardization about the way that tracing data looks on the wire and is formatted on the wire. I’m eager to pursue conversations with people from those communities,” he said.
The real Holy Grail would be to see distributed tracing work with no source code modifications in a vendor-neutral way just by changing your Kubernetes configuration or something along those lines, he said.
“It’s just a matter of human communication and technical configuration that needs to take place. … It’s definitely doable.”
The Cloud Native Computing Foundation, Intel and Joyent are sponsors of The New Stack.
Feature Image: “All Tracks Lead to …” by Les Chatfield, licensed under CC BY-SA 2.0.