Jaeger vs. Zipkin: Battle of the Open Source Tracing Tools

15 Oct 2020 6:00am, by

Thundra sponsored this post.

Serkan Özal
Serkan is co-founder and CTO of Thundra. He has 10+ years of expertise in software development, is an AWS Certified PRO and has a patent on distributed environments. He mainly works on serverless architectures, distributed systems and monitoring tools.

Often the mark of a revolutionary idea is that at first, it seems totally stupid. Take Twitter: in 2006 when the service launched, most people had a hard time seeing its potential. To make matters worse, Twitter was not only weird, it was also unreliable. Twitter was a killer app for the mobile generation, but its backend systems couldn’t handle its sudden popularity and massive adoption. At the time, most backend software was written as a monolithic application; not only were these systems fragile, but they were hard to extend and maintain.

To build more resilient and scalable systems, online services adopted distributed architectures in which applications are decomposed into microservices. Like many revolutionary ideas, this both solved existing problems and created a number of new ones. Specifically, microservices that operate in highly dispersed environments are much harder to monitor and debug.

In this article, we’ll examine two tools, Zipkin and Jaeger, that are designed to make distributed computing and microservice-based architectures easier to monitor and manage. We’ll look at what these tools provide and at their strengths and weaknesses, and we’ll offer recommendations on which one to choose.

Observability: A Brave New World

Before we look at the tools, let’s take a deeper look at the problem and at the philosophy behind its solution. A monolithic application is like an old car: as soon as it starts making weird noises or something feels wrong, most of us can figure out what is wrong. If we have enough experience, we then dive under the hood and fix it. But a distributed system is more like a modern car: it will tell you something is wrong, but it gives you no indication of how to fix it without specialized tools or knowledge.

Microservices enable you to build dispersed systems with high levels of redundancy that also offer high levels of scalability and reliability. However, their small size, interconnected nature, and high redundancy make system issues harder to locate, debug and fix. Furthermore, existing monitoring and logging software was developed for monolithic applications.

Distributed architectures, like a modern car, are black boxes. Both can be monitored through what engineers call observability, which lets you infer the internal state of a system by collecting and examining its inputs and outputs. For example, when a warning light comes on on your dashboard, your mechanic will hook the car up to a computer that reads the relevant input and output data.

In the virtual world, distributed tracing is used to make a system observable, using instrumentation frameworks such as OpenTracing, OpenCensus, and OpenTelemetry. These frameworks let you track and record requests from their point of origin to their destination, along with the systems through which they pass. Once you have instrumentation in place, you’ll need tools like Zipkin and Jaeger to manage and process the collected data.
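To make the idea of tracking a request across systems more concrete, here is a minimal sketch of trace context propagation. It is an illustration under stated assumptions, not the API of OpenTracing, OpenCensus or OpenTelemetry: the `Span` class, the helper functions, the three service names and the two header names are all hypothetical.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical header names; real frameworks define their own wire formats.
TRACE_HEADER = "x-trace-id"
PARENT_HEADER = "x-parent-span-id"


@dataclass
class Span:
    """One unit of work performed by one service on behalf of a request."""
    trace_id: str              # shared by every span in the same request
    span_id: str               # unique to this unit of work
    parent_id: Optional[str]   # span that caused this one (None for the root)
    service: str
    operation: str
    start: float = field(default_factory=time.time)
    end: Optional[float] = None


def start_span(service: str, operation: str,
               headers: Optional[Dict[str, str]] = None) -> Span:
    """Continue the caller's trace if headers carry one, else start a new trace."""
    headers = headers or {}
    trace_id = headers.get(TRACE_HEADER) or secrets.token_hex(16)
    return Span(trace_id, secrets.token_hex(8), headers.get(PARENT_HEADER),
                service, operation)


def inject(span: Span) -> Dict[str, str]:
    """Build the headers a service attaches to its outgoing calls."""
    return {TRACE_HEADER: span.trace_id, PARENT_HEADER: span.span_id}


def finish(span: Span, sink: List[Span]) -> None:
    """Close the span and hand it to whatever collects the trace data."""
    span.end = time.time()
    sink.append(span)


def payment_service(headers: Dict[str, str], sink: List[Span]) -> None:
    span = start_span("payment", "charge-card", headers)
    finish(span, sink)


def checkout_service(headers: Dict[str, str], sink: List[Span]) -> None:
    span = start_span("checkout", "place-order", headers)
    payment_service(inject(span), sink)  # downstream call carries the context
    finish(span, sink)


def simulate_request() -> List[Span]:
    """Simulate frontend -> checkout -> payment and return the recorded spans."""
    sink: List[Span] = []
    root = start_span("frontend", "GET /checkout")
    checkout_service(inject(root), sink)
    finish(root, sink)
    return sink
```

Calling `simulate_request()` returns three spans that share one trace ID and are linked by parent IDs. That linkage is what lets a tracing backend reassemble a single request's path through several services.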

Background and Features

As with the best families, the rivalry between Zipkin and Jaeger originates in their shared history. Zipkin, which predates Jaeger, was developed at Twitter and open sourced as an implementation of the ideas in Google’s Dapper paper. At its core, Zipkin is a Java-based application made up of several components: a collector, a storage layer, a query API and a web user interface, along with interfaces for tracing frameworks. For storage, Zipkin supports a range of engines to persist recorded data, such as an in-memory database, MySQL, Cassandra, and Elasticsearch.

In addition, Zipkin provides transport mechanisms, such as RabbitMQ, Scribe, HTTP, and Kafka, and a node-based server for storing data in Cassandra. Zipkin also provides instrumentation libraries for most popular high-level languages, including C#, Java, and JavaScript.

Jaeger was created by Uber and is written in Go. It resembles its older sibling but differs in important ways. In addition to Zipkin’s feature set, Jaeger provides dynamic sampling, a REST API, a ReactJS-based UI, and support for Cassandra, Elasticsearch and in-memory datastores. To implement these features, Jaeger takes a different, more distributed approach than Zipkin.

Jaeger’s architecture includes a client that emits traces to an agent, which listens for inbound spans and routes them to the collector. The collector then validates, transforms and persists spans. Jaeger’s distributed architecture makes it highly scalable. Jaeger also has a unique way of collecting data: unlike other systems that try to collect every trace and span generated, Jaeger takes a dynamic representative sample of the monitored data. This approach not only handles sudden surges in traffic, but increases Jaeger’s overall performance.
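The flow described above (the client samples, the agent forwards, and the collector validates, transforms and persists) can be sketched in a few lines. This is a simplified illustration, not Jaeger’s implementation: the class names, the span fields and the batch size are assumptions, and the sampler uses a fixed probability where Jaeger’s dynamic sampling adjusts rates at runtime.

```python
import random
from typing import Dict, List, Optional


class ProbabilisticSampler:
    """Keeps roughly `rate` of all traces (fixed rate; a simplification)."""

    def __init__(self, rate: float, seed: Optional[int] = None) -> None:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0.0 and 1.0")
        self.rate = rate
        self._rng = random.Random(seed)

    def is_sampled(self) -> bool:
        return self._rng.random() < self.rate


class Collector:
    """Validates, transforms and persists the spans it receives."""

    REQUIRED = ("trace_id", "span_id", "operation")  # hypothetical span fields

    def __init__(self) -> None:
        self.storage: List[Dict[str, str]] = []  # stand-in for a real datastore

    def validate(self, span: Dict[str, str]) -> bool:
        return all(span.get(key) for key in self.REQUIRED)

    def transform(self, span: Dict[str, str]) -> Dict[str, str]:
        normalized = dict(span)
        normalized["operation"] = normalized["operation"].strip().lower()
        return normalized

    def receive(self, batch: List[Dict[str, str]]) -> int:
        """Persist every valid span in the batch; return how many were kept."""
        accepted = 0
        for span in batch:
            if self.validate(span):
                self.storage.append(self.transform(span))
                accepted += 1
        return accepted


class Agent:
    """Buffers spans from the client and forwards them in batches."""

    def __init__(self, collector: Collector, batch_size: int = 2) -> None:
        self.collector = collector
        self.batch_size = batch_size
        self._buffer: List[Dict[str, str]] = []

    def report(self, span: Dict[str, str]) -> None:
        self._buffer.append(span)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.collector.receive(self._buffer)
            self._buffer = []


def trace_requests(count: int, sampler: ProbabilisticSampler, agent: Agent) -> int:
    """Client side: emit one span per sampled request; return the sampled count."""
    sampled = 0
    for i in range(count):
        if sampler.is_sampled():
            agent.report({"trace_id": f"trace-{i}", "span_id": f"span-{i}",
                          "operation": " GET /Orders "})
            sampled += 1
    agent.flush()
    return sampled
```

With a rate of 0.1, only about one request in ten produces spans that reach the collector. This is why sampling keeps the tracing backend’s load manageable during a traffic surge.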

Compare and Contrast

Let’s start with the positives. Zipkin, being older, is the more mature platform. It has broad industry support and a large and active community. Zipkin was written in Java, making it a good fit for enterprise environments. However, it also supports most of the popular high-level languages, which is good if you don’t know or don’t like Java. No matter what your language of choice, Zipkin supports OpenTracing, OpenCensus and OpenTelemetry — the big three open tracing frameworks — and has a wide range of extensibility options and tool integrations.

Jaeger is broadly similar to its older sibling but has some unique features of its own. For a start, it has a more modern design and architecture. Its more-distributed approach is highly flexible and performant. Jaeger gives you a web-based (React) UI that you can easily deploy and extend.

The Jaeger community compensates for Jaeger’s relative lack of maturity by providing good documentation and a range of deployment options. Jaeger also has the support of the Cloud Native Computing Foundation (CNCF). While this is more of a recommendation than a standard, it should be taken into account.

Let’s turn our attention to the negative aspects of each tool. As the older of the two, Zipkin uses a less modular, more centralized architecture, which makes it slower and less flexible than its newer rival. While this difference may not matter for smaller systems, it may become an issue as your system grows or needs to scale quickly.

Zipkin’s core components were written in Java, which is great for any organization that values stability more than performance. Not only is Zipkin relatively slow, but by default it uses ephemeral, in-memory storage for trace and span data. As a result, unless you configure a persistent backend such as Cassandra or Elasticsearch, you will lose all your recorded data if your system goes down or loses power. Zipkin offers libraries for many of the big languages, but official coverage has gaps: Python, for example, is served by a community-maintained library rather than an official one.
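The storage point above is easy to demonstrate. The sketch below contrasts an in-memory span store with a disk-backed one across a simulated restart. It is purely illustrative: both store classes are hypothetical, and SQLite stands in for a persistent backend only because it ships with Python.

```python
import os
import sqlite3
import tempfile
from typing import List, Tuple


class InMemorySpanStore:
    """Keeps spans in a Python list; contents vanish with the process."""

    def __init__(self) -> None:
        self._spans: List[Tuple[str, str]] = []

    def save(self, trace_id: str, operation: str) -> None:
        self._spans.append((trace_id, operation))

    def count(self) -> int:
        return len(self._spans)


class SqliteSpanStore:
    """Writes spans to a file on disk (illustrative persistent backend)."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS spans (trace_id TEXT, operation TEXT)")
        self._conn.commit()

    def save(self, trace_id: str, operation: str) -> None:
        self._conn.execute("INSERT INTO spans VALUES (?, ?)",
                           (trace_id, operation))
        self._conn.commit()

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM spans").fetchone()[0]

    def close(self) -> None:
        self._conn.close()


def simulate_restart(db_path: str) -> Tuple[int, int]:
    """Record one span in each store, 'restart', and return the surviving counts."""
    memory = InMemorySpanStore()
    disk = SqliteSpanStore(db_path)
    memory.save("trace-1", "GET /checkout")
    disk.save("trace-1", "GET /checkout")
    disk.close()

    # Simulated restart: the old objects are gone and new ones are created.
    memory = InMemorySpanStore()
    disk = SqliteSpanStore(db_path)
    survivors = (memory.count(), disk.count())
    disk.close()
    return survivors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        in_memory, on_disk = simulate_restart(os.path.join(tmp, "spans.db"))
        print(f"after restart: in-memory={in_memory}, on-disk={on_disk}")
```

After the simulated restart, the in-memory store is empty while the disk-backed store still holds its span. The same trade-off applies when choosing a storage backend for either tool.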

Jaeger might be newer, but that doesn’t necessarily mean it’s better. In fact, many people, especially in enterprise IT, will see Jaeger’s relative immaturity as a disadvantage. Jaeger’s choice of Go as its main language illustrates this point. Go was designed as a systems language, but it’s far less popular than Java, which means you might have to learn a new language rather than going with one you know.

Another area that is both a blessing and a curse for Jaeger is its more modern architecture. This architecture offers benefits in terms of performance, reliability and scalability, but it’s also far more complex and harder to maintain. Jaeger’s default in-memory storage has the same ephemeral-data problem as Zipkin’s, and its native API is not Zipkin’s, although the collector can be configured to accept Zipkin-format spans.

Zipkin or Jaeger: Which Is Right for You?

Before we give our recommendations, let’s summarize what Zipkin and Jaeger offer and their strengths and weaknesses. In choosing one or the other, you should also take into account your organization’s structure, its monitoring needs, and its in-house technical expertise. In addition, you should decide whether your organization and team prefer newer or more mature technologies.

Both tools are good options for collecting and managing distributed tracing data. They are remarkably similar and evenly matched, and either will do the job. Both work with the main distributed tracing frameworks: OpenCensus, OpenTracing and OpenTelemetry. In addition, Zipkin and Jaeger have a wide range of extensibility options and tool integrations, and both support virtualization and containerization. Both tools default to in-memory storage, so both risk data loss unless you configure a persistent backend.

For those who don’t want to live on the bleeding edge, Zipkin is the better choice. It’s more mature and has a larger, longer-established community. Zipkin has wide industry support, and its Java roots make it suitable for the world of enterprise IT (where Java still rules).

What Jaeger lacks in maturity, it makes up for in speed and flexibility, thanks to its newer, more distributed architecture. It’s also easier to scale. Jaeger has better official language support than its older rival, and you can also look at its CNCF support as a badge of approval.

The Cloud Native Computing Foundation is a sponsor of The New Stack.

