Observability and distributed tracing are intrinsically linked to the reliability of increasingly distributed systems. Observability-driven development uses data and tooling to observe the state and behavior of a system and to learn where its patterns reveal weaknesses. Distributed tracing provides the metrics and logs that let you dive into individual requests and get closer to the problem. In this powerful pairing, observability happens at the event level, which drives your questions, and tracing happens at the request level, which helps answer them.
In this episode of The New Stack Makers podcast, publisher Alex Williams sits down with Raj Dutt, CEO and co-founder of Grafana Labs, provider of the open source observability platform Grafana. They talk about creating a more seamless transition among observability, tracing, metrics and logs, across different data types and open source projects.
“You have a huge amount of containers. These containers are emitting telemetry at an increasing rate. And these containers are coming and going. So they’re extremely variable, all at the same time. And so it’s extremely complicated. It’s difficult, really, to get a complete picture in terms of what’s going on with your infrastructure and your application,” Dutt said. “And observability is really about getting deep insight into the behavior of your systems.”
This episode goes into the open source history of the six-year-old Grafana: how it grew to support 50 different data sources across metrics, logging and tracing, and how it grew from an on-premises to a cloud offering. Dutt says the beauty of always being open source is that people can run it wherever they want.
When Grafana Labs was launched in 2017, it released a hosted version of Grafana built atop Kubernetes, which enabled cross-cloud deployment. Dutt says this managed Kubernetes offering “really allows us to focus our developer innovation at the layer that’s above Kubernetes rather than having to run it ourselves.”
This infrastructural evolution also allowed the Grafana Labs team to create Loki. Loki is a multitenant, less expensive, high-volume log aggregation system, designed specifically to work with Kubernetes and Prometheus. It’s less costly because it only indexes log streams, not the full text of log lines as other aggregators do.
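To make the cost argument concrete, here is a minimal sketch of the general idea of indexing only the labels that identify a group of log lines, rather than every word in every line. It is a toy illustration, not Loki's implementation or API: the class `LabelIndexedLogStore`, its `push` and `query` methods, and the sample labels are all hypothetical names chosen for this example.

```python
from collections import defaultdict


class LabelIndexedLogStore:
    """Toy log store (hypothetical): indexes labels only, never log text."""

    def __init__(self):
        # stream key (sorted label pairs) -> raw log lines, stored unindexed
        self.streams = defaultdict(list)
        # (label name, label value) -> set of stream keys carrying that label
        self.index = defaultdict(set)

    def push(self, labels, line):
        key = tuple(sorted(labels.items()))
        self.streams[key].append(line)
        for pair in key:
            self.index[pair].add(key)

    def query(self, selector, contains=""):
        # Step 1: use the small label index to pick the matching streams.
        keys = None
        for pair in selector.items():
            matches = self.index.get(pair, set())
            keys = matches if keys is None else keys & matches
        # Step 2: scan only those streams line by line for the substring.
        results = []
        for key in sorted(keys or ()):
            results.extend(l for l in self.streams[key] if contains in l)
        return results


store = LabelIndexedLogStore()
store.push({"app": "api", "env": "prod"}, "GET /health 200")
store.push({"app": "api", "env": "prod"}, "GET /orders 500 timeout")
store.push({"app": "web", "env": "prod"}, "GET / 500")

errors = store.query({"app": "api"}, contains="500")
print(errors)
```

In this sketch the index holds one entry per distinct label pair, so it stays the same size no matter how many lines are pushed. The trade-off is that text search within the selected streams is a linear scan rather than an index lookup.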
For Dutt, observability isn’t one size fits all. It’s about building your own observability stack with what you need on the budget you’re working with. For the Grafana Labs team, that means an open source stack that serves a highly distributed, asynchronous remote team that’s part of the much more distributed, much larger open source world.