Observability in 2022: It Pays to Learn

DevOps teams cannot afford to leave observability out of their workflows. As networks sprawl and become increasingly distributed, organizations are challenged to keep up with the complexity involved, and doing so without proper observability processes is unthinkable. By 2026, 70% of organizations that successfully applied observability will achieve shorter latency for decision making, enabling competitive advantage for target business or IT processes, Gartner writes.
What 2022 showed is that organizations are putting observability into practice as momentum builds. As IT teams get stuck in and begin to implement observability, they are demanding simplicity and the freedom to mix and match the tools and platforms of their choice. They are learning by doing and building “muscle memory” as they go along, to borrow the term Thomas Keenan, senior product marketing manager for Kasten by Veeam, used a few months ago to describe the learning process involved for Kubernetes.
“In 2022, engineers realized how crucial observability can be to their businesses’ success — especially as they were tasked with operating with leaner teams,” Martin Mao, co-founder and CEO of Chronosphere, told The New Stack.
Strides have also been made to accommodate more observability tools under a single panel, usually Grafana’s, through the OpenTelemetry protocol (more on that below). Users are demanding easy-to-use automation, such as eBPF auto-instrumentation. Any vendor seeking to gain market share through lock-in had better beware.
“Vendors who continue pushing their own bespoke and proprietary instrumentation libraries and agents as the default way to use their products will soon find themselves on the wrong side of what consumers are demanding,” Phillip Carter, principal product manager and OpenTelemetry evangelist at Honeycomb, said.
Not Business as Usual
Amid massive layoffs in the tech sector in general, venture capital firms shortening the runways for their startups to see profits, and uncertainty abounding about the state of the U.S. and world economies, 2022 was certainly not business as usual. “With many companies laser-focused on their bottom line and implementing hiring freezes or worse, DevOps teams were left holding the bag of customer expectations,” Mao said.
Indeed, engineers spend an average of 25% of their time troubleshooting low-level tasks instead of innovating on customer-enhancing activities, according to a recent Chronosphere survey, Mao said. “To accomplish this, and avoid burning out teams, engineers turned to more effective observability solutions to fill the gap, achieve more efficiencies and ultimately satisfy customers,” Mao said.
However, organizations are in many ways just beginning to realize observability’s potential. Despite its adoption, the share of network operations teams that are successful with their overall missions has declined from 47% in 2018 to 27% in 2022, according to EMA’s Network Management Megatrends research. In other words, a lot of work remains to be done. Adopting a proper observability strategy requires appropriate tool and platform choices and the education that goes with them. Organizations must also make cultural changes. They made a lot of headway in 2022, but proper observability, not unlike security, remains a very long-term work in progress as challenges abound.
“While we love to focus on organizations that have already eliminated the artificial separation between infrastructure monitoring, log management, tracing, cloud security monitoring and DevOps pipeline monitoring, we need to remember that the vast majority of organizations still maintain separate budgets for purchasing individual solutions, instead of one observability suite,” Torsten Volk, an analyst for Enterprise Management Associates (EMA), told The New Stack.
Even seasoned DevOps teams struggle under the operational complexity and cost of aligning full-stack telemetry data streams with business metrics, Volk said. “Getting this done requires organizations to create data pipelines and data models in order to be able to truly connect the dots between resource contention at the network or infrastructure level and the user experience offered by the affected app,” Volk said.
OpenTelemetry Saves
It can be argued that OpenTelemetry was the observability story of 2022. Organizations increasingly realize that OpenTelemetry, one of the more significant Cloud Native Computing Foundation (CNCF) projects, offers vendor-neutral integration points that help them obtain observability data with relatively little integration effort. Its vendor neutrality is key, as users can mix and match many different observability tools.
With OpenTelemetry, developers can take advantage of simplified and mostly automated instrumentation, while operators receive a unified observability backend without having to worry about missing any telemetry data, Volk said.
“OpenTelemetry is the hottest project today within all of CNCF. While we are still at the beginning of this story, most observability platform vendors have committed to supporting OpenTelemetry as the future standard for tracing, logging and gathering health and performance metrics,” Volk said. “This is an exciting perspective, as OpenTelemetry can finally tie together developers and operators by providing one unified platform for both personas to address their individual priorities.”
But again, the freedom OpenTelemetry gives users to combine and swap out tools is a very big deal. “One of the biggest complaints we’ve heard from enterprises this year regarding their observability tooling is lock-in. OpenTelemetry is rising as the great hope of the observability world, as it offers a consistent and open source standard for expressing and ingesting signals and preventing lock-in,” Mao said. “Importantly, we’re not only seeing adoption but it is also being used across a critical mass of clients and programming languages. While OpenTelemetry has become the standard for distributed tracing, adoption for metrics and logging looks like it will take a little more time given the different problems and motivations for those data types.”
Indeed, OpenTelemetry helps DevOps teams achieve unified observability by letting them separate two concerns, “how do you get the data” vs. “what you do with the data,” Austin Parker, head of developer relations at Lightstep, said. “OpenTelemetry answers the first part of this by providing a vendor-agnostic standard for creating, representing, and collecting telemetry data that’s suitable for observability use cases,” Parker said.
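To make Parker's split concrete, here is a minimal Python sketch of keeping “how do you get the data” apart from “what you do with the data.” It deliberately does not use the OpenTelemetry SDK: the `Exporter` protocol, the two exporter classes and `handle_request` are hypothetical names invented for illustration, and the span is just a plain dictionary.

```python
import time
from typing import Protocol


class Exporter(Protocol):
    """Hypothetical backend interface: anything that can receive a span."""

    def export(self, span: dict) -> None: ...


class ConsoleExporter:
    """One possible backend: print each span to stdout."""

    def export(self, span: dict) -> None:
        print(span)


class InMemoryExporter:
    """Another possible backend: keep spans in a list for later analysis."""

    def __init__(self) -> None:
        self.spans = []

    def export(self, span: dict) -> None:
        self.spans.append(span)


def handle_request(exporter: Exporter, route: str) -> str:
    """Application code: records telemetry without knowing the backend."""
    start = time.perf_counter()
    result = f"handled {route}"  # stand-in for real work
    exporter.export({
        "name": "handle_request",
        "route": route,
        "duration_s": time.perf_counter() - start,
    })
    return result


# The same instrumented function works with either backend.
memory = InMemoryExporter()
handle_request(memory, "/checkout")
handle_request(ConsoleExporter(), "/checkout")
```

Because `handle_request` only depends on the `export` method, swapping the backend requires no change to the instrumented code, which is the property the vendor-neutral approach is meant to provide.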
As for why OpenTelemetry received so much attention this year, the “biggest longstanding reasons” are its capabilities (automatically capturing traces and now metrics from a huge set of platforms, languages, and technologies), its consistent data model and semantics, and the ability to send data anywhere, Morgan McLean, director of product management at Splunk, said. “OpenTelemetry provides what organizations knew that they needed yet didn’t have. What changed this year is that the project gained a new signal type (metrics), and that the project’s artifacts (collector agent, language agents, language libraries) have generally reached a level of maturity and market awareness where people are happy to use them in high-scale production environments,” McLean said.
Developers at large enterprise organizations are increasingly realizing that OpenTelemetry is part of a modern approach to observability, Liz Fong-Jones, field CTO at Honeycomb, said. That approach involves instrumenting with OpenTelemetry and harnessing the power of distributed tracing in order to understand how users are experiencing code in production, Fong-Jones said. “This confirms that developers are looking for ways beyond logs and metrics to observe and understand their increasingly complex and distributed systems,” Fong-Jones said. “Furthermore, with the rise of eBPF auto-instrumentation, batteries-included ease of use, and standards compliance with OpenTelemetry, it’s easier than ever for enterprises to have a choice when adopting observability in order to get the insights they need into their systems.”
The auto-instrumentation aspect of OpenTelemetry and eBPF means “users don’t have to change their code to understand its behavior,” Tom Wilkie, vice president of technology for Grafana Labs, said. “This year we saw the beginning of auto instrumentation, be it through OpenTelemetry or eBPF, making it easier for developers to extract the telemetry needed to understand their applications. At the same time, users are asking for a tool that stores all and allows you to analyze all your metrics, regardless of format,” Wilkie said. “This year, great strides were made like the OpenTelemetry auto instrumentation libraries, and projects like Grafana Mimir, allowing you to natively ingest metrics from OTLP, Graphite, Datadog and InfluxDB and to store and analyze these metrics.”
Wilkie also made a bold prediction for 2022. In January, he was quoted in The New Stack as saying: “By the end of 2022, worrying about your metrics cardinality will be a thing of the past.” How accurate was his prediction?
“High cardinality metrics have historically been expensive to store and inefficient to query and this year saw that change, with techniques to reduce the cost of storage, such as pre-aggregation, getting around the problem,” Wilkie said. “But this has not been universally useful — instead in some areas people have found the need for their high cardinality, such as per-user SLO reporting in multitenant systems. This year saw systems which can cost-effectively scale to support such use cases whilst enabling high-performance analysis, such as Grafana Mimir, gain popularity.”
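As a rough illustration of the pre-aggregation technique Wilkie mentions, the Python sketch below drops a high-cardinality label before storage so that many per-user series collapse into a few. The sample data, the label names (`route`, `user_id`) and the `pre_aggregate` helper are hypothetical and are not drawn from Grafana Mimir or any other product.

```python
from collections import defaultdict


def pre_aggregate(samples, drop_labels):
    """Sum sample values after removing the given labels from each series."""
    totals = defaultdict(float)
    for labels, value in samples:
        # The series identity is whatever labels remain after dropping.
        kept = tuple(sorted(
            (k, v) for k, v in labels.items() if k not in drop_labels
        ))
        totals[kept] += value
    return dict(totals)


# Hypothetical request counts, one series per (route, user_id) pair.
samples = [
    ({"route": "/checkout", "user_id": "u1"}, 3.0),
    ({"route": "/checkout", "user_id": "u2"}, 5.0),
    ({"route": "/search", "user_id": "u1"}, 2.0),
    ({"route": "/search", "user_id": "u3"}, 1.0),
]

raw_series = {tuple(sorted(labels.items())) for labels, _ in samples}
aggregated = pre_aggregate(samples, drop_labels={"user_id"})

print(len(raw_series), "series before,", len(aggregated), "after")
```

The trade-off is visible in the result: once `user_id` is dropped, the per-user breakdown cannot be recovered, which is why use cases such as per-user SLO reporting still need the full cardinality.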