Grafana Seeks to Correct Observability’s Historic ‘Terrible Job’
Times are tougher. IT is under pressure to better comprehend the data through observability, optimize costs and make more informed business decisions. There has been an explosion in telemetry and various observability tools to parse through. At the same time, IT teams are facing job cuts and security pressures, further accentuated by a rise in terminations and even fines following security breaches.
As Raj Dutt, CEO and co-founder of Grafana Labs, said in the opening keynote of Grafana’s ObservabilityCon 2023 annual user conference, held in London in November, “The macroeconomic situation has definitely impacted our product strategy in quite a big way.”
Dutt noted how 2023 was a “challenging year for many of us from a macroeconomic standpoint, as infrastructure costs, as well as money itself, have become ‘more expensive,’” while valuations are down and “it’s no longer about growth at all costs.”
“I think the entire observability space — Grafana Labs included perhaps — has actually done quite a terrible job historically in aligning sort of the customer value of what you’re paying for and what you’re paying for,” Dutt said, while noting the “exploding data volumes” that applications and infrastructure generate.
Observability Not Keeping up with Data
In other words, data volumes and costs are rising almost exponentially, whereas the value that data and observability platforms provide has, in many cases at least, not kept up. “That relationship is something that’s been weighing heavily on our minds the last few years, particularly during the last year and a half,” Dutt said.
“That’s been a big reason behind a lot of our product strategy, including things like adaptive OpenTelemetry, adaptive metrics and adaptive logs.”
@grafana’s @nopzor: « open source is truly where all the cutting edge technology is being developed within observability and I'm sure many of you feel that way today. » #ObservabilityCon 2023 keynote. @thenewstack pic.twitter.com/BfNqL2Amnc
— BC Gain (@bcamerongain) November 14, 2023
In this context, it has been refreshing to see, after ObservabilityCon 2023 and subsequent conversations with Grafana developers, heads of engineering, developer relations staff and other team members, that they are collectively aware of these pressures and, as such, are looking to offer more.
Grafana Labs maintains its historical contributions to, and ensures compatibility with, various open source observability tools, as is especially evident in its aggressive integration of Grafana with OpenTelemetry.
Additionally, Grafana remains supportive of open source projects, notably Prometheus, and continues to introduce new projects, such as the recently launched Grafana Beyla, an eBPF auto-instrumentation tool (more about that below), and an integration with Cilium for observability and security with eBPF.
Meanwhile, Grafana has continued to offer more features for the Grafana Cloud enterprise version without substantially raising prices or unexpectedly canceling features, as some cloud providers have done (which will remain unnamed here).
Grafana’s open source policy and support contrast with a recent spate of moves that have put the open source model under scrutiny or run counter to the traditional open source philosophy. These include shifts by HashiCorp, MongoDB, Elastic and others away from offering purely open source licenses for their software. Additionally, there’s Red Hat’s decision last year to no longer make RHEL (Red Hat Enterprise Linux) source code publicly available.
@grafana’s @tom_wilkie on Loki 3.0: « We built a system that really accelerates those needle-in-a-haystack » log queries (among other upcoming Loki features), during the #ObservabilityCon 2023 keynote. @thenewstack pic.twitter.com/kb4trcpYo8
— BC Gain (@bcamerongain) November 14, 2023
All of this telemetry data is on the cloud, or very often, distributed across multiclouds and on-premises operations. A surge in AI and edge computing has also contributed heavily to this increase in data, says Tom Wilkie, CTO of Grafana Labs.
Grafana addressed cost management specifically with the general availability release, announced at ObservabilityCon, of its Cost Management Hub for Grafana Cloud.
As explained by Wilkie, it gives users access to features such as Adaptive Metrics and its UI. Additionally, there is an exporter that allows logs to be exported to an S3 bucket for extended retention.
A cardinality management tool is also available, enabling users to identify the teams, namespaces and services driving telemetry usage. Moreover, billing usage groups facilitate chargeback processes, Wilkie said. “All these features have been consolidated into the Cost Management Hub, providing a centralized platform for users to access, evaluate and manage costs,” Wilkie said.
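To make the idea of cardinality attribution concrete, here is a minimal Python sketch that counts distinct time series per label value. It is not Grafana's implementation: the `SERIES` data, the label names (`namespace`, `team`, `pod`) and the `series_count_by` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical active series; each dict is one series' metric name plus labels.
SERIES = [
    {"__name__": "http_requests_total", "namespace": "checkout", "team": "payments", "pod": "a"},
    {"__name__": "http_requests_total", "namespace": "checkout", "team": "payments", "pod": "b"},
    {"__name__": "http_requests_total", "namespace": "search", "team": "discovery", "pod": "c"},
    {"__name__": "queue_depth", "namespace": "checkout", "team": "payments", "pod": "a"},
    # Exact duplicate of the first entry; it must not be counted twice.
    {"__name__": "http_requests_total", "namespace": "checkout", "team": "payments", "pod": "a"},
]


def series_count_by(series, label):
    """Count distinct series per value of `label`.

    Series that lack the label are grouped under "(none)".
    """
    counts = defaultdict(int)
    seen = set()
    for labels in series:
        identity = frozenset(labels.items())  # a series is defined by its full label set
        if identity in seen:
            continue
        seen.add(identity)
        counts[labels.get(label, "(none)")] += 1
    return dict(counts)


by_namespace = series_count_by(SERIES, "namespace")
by_team = series_count_by(SERIES, "team")
```

Grouping the same series by `team` instead of `namespace` is the kind of breakdown that a chargeback process could build on, though how Grafana Cloud computes billing usage groups is not described here.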
Acquiring just the amount of telemetry data you need when you need it and in the way you want to visualize it is key. To that end, Grafana introduced Adaptive Metrics for Mimir in New York during ObservabilityCON 2022. Hundreds of customers have since used it to save hundreds of millions of dollars in cloud and data storage costs, Wilkie said.
“This is becoming a kind of fundamental and pretty unique strategy to Grafana Labs to make it so that you only really have to store the amount of data you need to answer the queries that you ask, fully automatically and it adapts in real time. As you ask different queries, we’ll end up storing different amounts of data,” Wilkie said. “Short-term, this is reducing people’s bills — but long-term, we think this aligns more closely with the value.”
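One way to picture storing only what queries need is to aggregate away labels that no query uses. The Python sketch below does that for counter-style samples by summing values across the dropped labels. This is an assumption made for illustration, not a description of how Adaptive Metrics works internally; `SAMPLES`, the label names and `aggregate_unused_labels` are hypothetical.

```python
from collections import defaultdict


def aggregate_unused_labels(samples, labels_in_use):
    """Sum sample values after dropping every label not in `labels_in_use`.

    `samples` is an iterable of (labels_dict, value) pairs. The result maps a
    sorted tuple of the kept (label, value) pairs to the summed value.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        kept = tuple(sorted((k, v) for k, v in labels.items() if k in labels_in_use))
        totals[kept] += value
    return dict(totals)


# Hypothetical samples: three series that differ by pod.
SAMPLES = [
    ({"service": "cart", "pod": "cart-1"}, 10.0),
    ({"service": "cart", "pod": "cart-2"}, 5.0),
    ({"service": "search", "pod": "search-1"}, 7.0),
]

# Assume the queries people actually run only ever group by `service`.
reduced = aggregate_unused_labels(SAMPLES, {"service"})
```

In this toy case three stored series collapse into two. If a later query started grouping by `pod`, the set of labels in use would grow and more series would be kept, which mirrors the adapts-to-queries behavior Wilkie describes.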
Grafana is now developing Grafana Adaptive Logs, applying the same techniques used in other areas to logging, Wilkie said. This project is in its early stages and is classified as a research project as of now, but “the team has been actively working on it and have experimented with two or three different techniques and the results have been impressive,” Wilkie said. After Grafana concludes its test with what Wilkie calls “three significant customers,” it will be interesting to see how Adaptive Logs come into play to limit overkill in log data once available for use.
OpenTelemetry and eBPF
It’s @grafana’s Beyla: eBPF-based, OpenTelemetry application auto-instrumentation, for @grafana’s Application Observability. @myrleKrantz #ObservabilityCon 2023 keynote. @thenewstack pic.twitter.com/U7vmXu9IHZ
— BC Gain (@bcamerongain) November 14, 2023
There is a need for visibility across stack layers and the network. For comprehensive stack visibility, eBPF extends from the Linux kernel across the stack to the applications to which it is applied, through the use of hooks. Meanwhile, OpenTelemetry facilitates the integration of different observability tools by providing a standardized interface.
During her keynote, Myrle Krantz, director of engineering, Grafana Labs, covered how users might find it challenging to use the OpenTelemetry SDKs. This could be the case if you are working with a compiled language like C++ or Go, Krantz said. Or perhaps you have a component for which you don’t have the code and you’re deploying it in your landscape.
Alternatively, you might have a component that uses mixed versions in a way that makes it impossible to select a specific OpenTelemetry SDK, Krantz said. “If any of these scenarios resonate with you, there’s still a solution to help instrument your applications,” Krantz said.
Grafana’s Beyla — its general availability announced at ObservabilityCon — integrates eBPF and leverages the OpenTelemetry transport protocol “to allow you to decorate application and kernel calls, track them and then send that data into the cloud,” Krantz said. It is open source, of course, licensed under Apache License V2.
“What all this means is that you can deploy Beyla today, almost regardless of the other pieces you are using from our solutions. Because it’s eBPF, there’s no need for additional code — just one command to deploy it,” Krantz said. “It works in Kubernetes clusters, Docker containers and on bare metal. Most importantly, it achieves all of this while instrumenting aspects that would otherwise be impossible to gain visibility into, precisely because it’s built on eBPF.”
It would be an understatement to say that LLMs and AI will impact observability tools and practices in the near future. However, the extent and mechanics of this influence remain to be seen. It could be assumed that observability, including alerts and actionable insights, will be orchestrated by AI, providing superior business insights. But again, the exact mechanics are yet to be revealed.
During ObservabilityCon, Grafana outlined its overall approach to LLM and AI development, introducing the Grafana LLM App in a public preview. This app offers another way to centralize access to LLMs across Grafana. The results of this facilitation will be interesting to observe in the coming months. Simultaneously, generative AI and LLM apps are under development.
During the demo at ObservabilityCon, Marc Chipouras, senior director of engineering at Grafana Labs, showcased how these generative AI and LLM apps can work. Additionally, as far as AI development in Grafana is concerned, the recent acquisition of Asserts.ai is expected to play a part, whether small or integral, in Grafana’s AI development. Asserts’ purpose aligns with this expansion, offering easier and more autonomous analysis of metric data. Asserts simplifies this contextualization, a complex task for humans.
In the immediate future, Grafana Cloud users will be able to benefit from the use of Asserts, which was created to help users find metrics data (or “contextualize” metrics data, as Grafana describes it) with the use of AI. It specifically scans the labels in Prometheus metrics and automatically discovers application and infrastructure components and how they’re connected to each other.
“What we want to do is we want to bring generative AI together as a tool to enable you and your developers and your friends in order to be able to use it across Grafana,” Chipouras said.