
Platform-Agnostic Tracing

Thundra sponsored this post.

4 Nov 2020 9:47am, by Emrah Samdan
Emrah is VP of Product at Thundra. He is enthusiastic about serverless, observability and chaos engineering.

One of the three fundamental laws of classical mechanics is Newton’s first law, which says that an object will remain at rest or continue to move at a constant velocity unless acted upon by a force.

We live in an imperfect world where nothing ever remains in motion, and nothing ever remains at rest. Nature is full of forces that toss things around, sometimes against our wishes. As engineers, we’ve yet to figure out the magic that makes apps run forever without issues.

Instead, we’ve learned to build apps that are observable, because we know that bad stuff will happen sooner or later. And when it does, we’ve got to know “what” and “why” in order to recover quickly. In a distributed system, sifting through logs can be daunting. Likewise, metrics have limitations. They can show that something is wrong, but good luck finding out “what” and “where.” This is where tracing comes in.

Tracing is a way of stitching requests together as they transit multiple services. It helps you observe distributed systems and pinpoint the causes of suboptimal performance and failures.

Microservices applications consist of interconnected systems, services or functions that work together to serve requests. For example, a microservice app may include an order service, cart service, payment service, and catalog service. Each service is isolated and separated by a network boundary, and the services could be hosted on different platforms. The distributed nature of microservices makes it necessary to have a way to track how requests go through your entire ecosystem of services.
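To make the idea of tracking a request across network boundaries concrete, here is a minimal Python sketch of context propagation. It is an illustration under stated assumptions, not any particular tracer's implementation: the header names, the `handle_request` helper, and the order in which the four example services call each other are all hypothetical.

```python
import uuid

# Hypothetical header names chosen for this sketch; real tracers define their own.
TRACE_HEADER = "x-trace-id"
PARENT_HEADER = "x-parent-span-id"


def new_id() -> str:
    """Return a random 16-character hex identifier."""
    return uuid.uuid4().hex[:16]


def handle_request(service: str, headers: dict, collected: list) -> dict:
    """Record one span for `service` and return the headers for the next hop."""
    # Reuse the incoming trace ID, or start a new trace at the edge.
    trace_id = headers.get(TRACE_HEADER) or new_id()
    span = {
        "trace_id": trace_id,
        "span_id": new_id(),
        "parent_id": headers.get(PARENT_HEADER),  # None for the first service
        "service": service,
    }
    collected.append(span)
    # Outgoing headers keep the trace ID and name this span as the parent.
    return {TRACE_HEADER: trace_id, PARENT_HEADER: span["span_id"]}


# Simulate one request passing through four services in sequence.
spans = []
headers = {}
for service in ["order", "cart", "payment", "catalog"]:
    headers = handle_request(service, headers, spans)
```

Every span ends up with the same trace ID, and each one points back to the span that called it. That shared ID is what lets a tracing backend stitch the pieces together, no matter where each service is hosted.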

A trace consolidates data to locate failures, correlate error reports, identify how an issue in a single service affects other services, and provide insights into the services that affect your application’s overall performance.

The Fabric of Trace

Fundamentally, a trace begins with a single request and represents the request’s entire journey as it transitions through all the services of a distributed system. Each trace is made up of a series of tagged time intervals, or spans.

You can view a span as a fundamental element of a distributed trace, representing a unit of work done by a single service in a distributed system. A span may have a unique ID, timestamp, name, and metadata. Spans may also contain logs — in the form of key-value pairs, which are useful for capturing span-specific informational or debugging output and logging messages.
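As a rough illustration of the fields described above, a span could be modeled like this in Python. The class and method names are assumptions made for this sketch (they loosely echo common tracing APIs) and do not reproduce any specific library.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work done by a single service (illustrative model only)."""
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start_time: float = field(default_factory=time.time)  # wall-clock timestamp
    duration: Optional[float] = None                       # seconds, set by finish()
    tags: dict = field(default_factory=dict)               # span metadata
    logs: list = field(default_factory=list)               # (timestamp, key-value pairs)
    _t0: float = field(default_factory=time.perf_counter, init=False, repr=False)

    def set_tag(self, key, value):
        """Attach a piece of metadata to the span."""
        self.tags[key] = value

    def log_kv(self, **fields):
        """Record a timestamped set of key-value pairs."""
        self.logs.append((time.time(), fields))

    def finish(self):
        """Close the span and record how long it was open."""
        self.duration = time.perf_counter() - self._t0


span = Span(name="charge-card", trace_id="trace-1")
span.set_tag("service", "payment")
span.log_kv(event="retry", attempt=2)
span.finish()
```

Tags describe the span as a whole, while each log entry captures something that happened at a particular moment inside it.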

A trace covers several spans or segments, and as the request moves through each service, you can access the contextual data from your app’s processes and components. This data helps you to profile and monitor a microservice architecture, locate problems and failures, and diagnose performance issues.
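The sketch below shows, in Python, how the spans of one trace might be assembled into a call tree and used to locate a failure and a bottleneck. The span records, field names, and numbers are invented for illustration, and the "self time" calculation assumes that child spans run one after another rather than in parallel.

```python
from collections import defaultdict

# Hypothetical finished spans from a single request; durations in milliseconds.
spans = [
    {"id": "a", "parent": None, "service": "order",   "ms": 120, "error": False},
    {"id": "b", "parent": "a",  "service": "cart",    "ms": 15,  "error": False},
    {"id": "c", "parent": "a",  "service": "payment", "ms": 90,  "error": True},
    {"id": "d", "parent": "c",  "service": "catalog", "ms": 10,  "error": False},
]

# Index the spans by parent ID so the call tree can be walked top-down.
children = defaultdict(list)
for s in spans:
    children[s["parent"]].append(s)


def self_time(span):
    """Time spent in the span itself, excluding its direct children."""
    return span["ms"] - sum(c["ms"] for c in children[span["id"]])


def render(span, depth=0):
    """Return the call tree below `span` as a list of indented lines."""
    marker = " ERROR" if span["error"] else ""
    lines = ["  " * depth + f'{span["service"]} {span["ms"]}ms{marker}']
    for child in children[span["id"]]:
        lines.extend(render(child, depth + 1))
    return lines


root = children[None][0]
tree = "\n".join(render(root))
slowest = max(spans, key=self_time)                   # largest self time
failed = [s["service"] for s in spans if s["error"]]  # where errors occurred
```

With this sample data, the payment service is both the span with the largest self time and the only one flagged with an error, which is the kind of answer to “what” and “where” that logs and metrics alone struggle to give.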

What Is Platform-Agnostic Tracing?

The concept of “platform-agnostic” refers to a set of design philosophies and attributes of a tool or software. When software is platform-agnostic, it is not tied to a specific platform or system. A platform-agnostic cloud tool is capable of running the same way on multiple platforms. It does not care if your app is running on AWS, GCP, or the Azure cloud platform. It runs the same way, irrespective of the cloud provider or language you use.

Building applications that transcend one platform (web, mobile, or desktop) calls for one-tool-fits-all tracing. Platform-agnostic tracing enables you to effectively profile and monitor a single request from start to finish, without paying attention to “where” or “what.”

Why Should Tracing Be Platform-Agnostic?

You may wonder why tracing has to be agnostic about the platform you use. Below are a few reasons you should have a tracing tool that’s blind to your architecture.

The Move from Modularized Monoliths to Microservices Architecture

You may have started with a monolithic architecture, but it will evolve as you grow, and you’ll get to a point where decomposing the monster is the only option. You’ll add more systems or services. The complexity will increase. Many modules or services that once communicated in-process will be isolated, running independently, and interacting synchronously or asynchronously across networks.

As your architecture evolves and becomes more diverse, your tracing tool should be blind to your internal architectural changes and should enhance your agility without boxing you in. A platform-agnostic tool will not limit you to a specific platform or language.

The Increasing Complexity of Modern Applications

Cloud platforms revolutionized how applications are developed, delivered and operated, but they also created different problems. The refactoring of monolithic applications into microservice or serverless architectures leads to increased risks. The greater the number of systems interacting, the higher the chances that one component will fail — and the more worries you have.

Your development teams already have enough complexities to worry about. Tracing should not be one of them. Platform-agnostic tracing allows teams to integrate and focus on what matters, irrespective of the platform or the nature of the service.

Debugging Cost

The work of fulfilling a single end-user request in a microservice application is spread across multiple services, which may be hosted on various platforms and implemented in various languages and frameworks. When each service implements or uses a different tracing tool suitable for the platform or language it’s written in, debugging becomes a lot harder and less cost-effective. Platform-agnostic tracing saves you the cost of integrating and collating traces across different platforms and tools.

Too Many Tools

Application development involves numerous tools, and engineers already have too many of them to wrap their heads around. The fewer the tools that engineers have to work with to get the job done, the more productive they can be. Switching between several tracing tools to resolve issues, or find and analyze the most important data, takes up a lot of time — and engineers don’t want to waste time.

Platform-Agnostic Tracing Tools

There are several free and open source tracing tools that you can use to profile and monitor microservices apps. At the core of each tool are libraries that provide APIs for various platforms and programming languages.

Below are some common platform-agnostic tracing tools:

Jaeger

Jaeger allows you to troubleshoot and monitor complex microservices environments. It enables you to quickly examine the entire chain of actions or events happening within microservices.

Jaeger was created by Uber Technologies as an open source project in 2015 and then donated to CNCF in 2017. Jaeger is written in Go, and it comes with features that let you optimize latency and performance, monitor distributed transactions, and perform root cause analysis, service dependency analysis and distributed context propagation.

OpenTracing

OpenTracing allows you to profile and monitor applications across different services and components.

One good thing about OpenTracing is that you can apply it across various libraries and tools. The OpenTracing API also supports different programming languages — including JavaScript, Go, C#, C++, Objective-C, PHP, Ruby, Python, and Java.

The OpenTracing open source project is hosted by the Cloud Native Computing Foundation (CNCF). At its core, OpenTracing aims to standardize the approach to distributed tracing and instrumentation.

Thundra

Currently, Thundra is well known for tracing in serverless and containerized applications, but it’s going to become completely platform-agnostic in the coming months.

Thundra’s tracing solution makes it possible to auto-generate the traces by providing runtime instrumentation libraries. However, no automated tracing tool can know the internal details of your distributed application. That’s why Thundra also provides a manual instrumentation SDK, compatible with OpenTracing standards, to let users inject their own spans along with automated spans.

Application teams, from IT operations to SRE to DevOps to software developers, rely on this tool for fast debugging and troubleshooting. Quickly pinpointing errors and performance bottlenecks in distributed architectures improves their mean time to resolution.

Zipkin

Zipkin provides mechanisms for storing, sending, receiving, and visualizing traces.

Zipkin has a simple architecture and was one of the earliest tracing systems. It was developed by Twitter and is written in Java. It offers instrumentation libraries or OpenTracing extension points for many programming languages — including Go, Scala, Python, JavaScript, Java, C#, C++, C, and more. The tool also supports the major cloud providers: Google Cloud, Azure, and AWS.

Zipkin has an open source community, where you can find publications on new data formats, libraries and APIs. Zipkin also has a client-server architecture, supports Thrift as a communication protocol, and supports Elasticsearch and Cassandra as backends for storing trace data.
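Alongside Thrift, Zipkin also accepts spans as JSON over HTTP through its v2 API. The Python sketch below builds a two-span payload in that shape without sending it anywhere. The `zipkin_span` helper, the service names, and the durations are assumptions made for illustration, and only a subset of the format's fields is shown.

```python
import json
import secrets
import time


def zipkin_span(name, service, duration_us, trace_id=None, parent_id=None, tags=None):
    """Build one span dict in the shape of Zipkin's v2 JSON format."""
    span = {
        "traceId": trace_id or secrets.token_hex(16),  # 32 lower-hex characters
        "id": secrets.token_hex(8),                     # 16 lower-hex characters
        "name": name,
        "timestamp": int(time.time() * 1_000_000),      # epoch microseconds
        "duration": duration_us,                        # microseconds
        "localEndpoint": {"serviceName": service},
        "tags": tags or {},                             # string keys and values
    }
    if parent_id:
        span["parentId"] = parent_id
    return span


root = zipkin_span("post /orders", "order-service", 120_000)
child = zipkin_span(
    "charge",
    "payment-service",
    90_000,
    trace_id=root["traceId"],
    parent_id=root["id"],
    tags={"http.method": "POST"},
)

# A client would POST this JSON array to the collector's /api/v2/spans endpoint.
body = json.dumps([root, child])
```

Because the payload is plain JSON, a client written in any language and running on any platform can produce it, which is part of what makes this kind of tracing backend platform-agnostic.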

Don’t Get Boxed In

If you don’t effectively instrument your application’s components to be observable, you’ll have a hard time debugging issues in production. In your quest for observability, you should avoid being boxed in by a tracing tool built primarily to support a single platform.

If you’re worried about dealing with adoption and management of open source solutions, you can start your tracing journey with a managed tool like Thundra. This will familiarize you with the core concepts — such as span, trace, and execution context — with auto-generated traces on the Thundra console. Thundra is flexible and leaves room for manual tracing compatible with OpenTracing standards.

