What Observability Must Learn from Your IDE

Observability is a long-coveted objective for many organizations building complex software. Rather than simply relying on a few static dashboards, an observable system treats reliability like a data analytics problem and brings all of your information into a single place, where it can be queried, analyzed, and converted into actionable insights.
This approach allows teams to view complex problems from multiple perspectives, improve operational effectiveness, and give the wider business the insights it needs to make better strategic decisions.
Observability engineering is a term coined by monitoring experts Charity Majors, Liz Fong-Jones and George Miranda. It refers to the body of skills, techniques, and technologies that enable the creation of observable systems. “Observability engineering” has been around for a while, but it hasn’t become a mainstream discipline in the way that TDD or extreme programming have. What’s stopping us?
So What’s Missing in Observability Right Now?
The major challenge facing observability in the past decade has been the explosion of data. Data volume is increasing at a phenomenal rate, and organizations are being forced to innovate constantly just to maintain stable access to the data they’re generating. Even within the narrow constraints of reliability, a tracing system generates an order of magnitude more data than simple metrics do.
Many of the tools we have now were designed when data volumes weren’t as big. While we’ve upgraded our data storage and collection mechanisms to handle this new scale, there hasn’t been nearly as much effort to make our tools a pleasure to use.
In short, it’s the developer experience that has suffered the most.
So What Can Our Observability Tools Learn from the IDE?
Developer experience is a tricky thing to master. It differs from the general user experience because every one of your users has a basic level of expertise. As a result, developers expect certain features in their tooling, and observability systems have implemented some of those features poorly or, worse, not at all. Let’s focus on one example: autocomplete.
Autocomplete Should Be Non-Negotiable
Take Kibana as an example. Autocomplete is only available for Kibana Query Language (KQL) users who pay for a license. Otherwise, you’re firing blind.
Autocomplete has been around for so long that the idea of stripping it out of a developer tool seems extreme. If we turn to tooling that enshrines developer experience, like VS Code, we get a great autocomplete dialog for free.
Autocomplete Is Offered in Some Solutions, but It’s Nothing Like an IDE
Prometheus comes with autocomplete built in, and it’s a great productivity booster, but even this fantastic tool is missing some things that would never be overlooked in an IDE:
The suggestions make no mention of cardinality, number of records, how long the metrics have been gathered for, minimum values, or maximum values. The same is true if an engineer enables autocomplete in Kibana. They’re told the name and nothing more. If this were a variable in an IDE, an engineer would at least expect to know its type.
This absence of metadata is a killer for productivity because now the user needs to explore every one of these metrics to find the details they need. Compare this to VS Code, where the suggestion for a symbol shows its type and documentation right alongside its name.
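To make the gap concrete, here is a minimal Python sketch of what metadata-rich suggestions could look like. It is not how Prometheus or Kibana work internally. The sample shape, the `Suggestion` class, and the `suggest` function are all hypothetical names invented for illustration, and "cardinality" is assumed to mean the number of distinct label sets seen for a metric.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical sample shape: (metric name, labels, Unix timestamp, value).
Sample = Tuple[str, Dict[str, str], float, float]


@dataclass
class Suggestion:
    name: str
    samples: int       # number of records seen
    cardinality: int   # number of distinct label sets (assumed definition)
    first_seen: float  # earliest timestamp observed
    last_seen: float   # latest timestamp observed
    minimum: float
    maximum: float


def suggest(prefix: str, samples: Iterable[Sample]) -> List[Suggestion]:
    """Return one metadata-rich suggestion per metric matching the prefix."""
    stats: Dict[str, Suggestion] = {}
    series: Dict[str, Set[frozenset]] = {}
    for name, labels, ts, value in samples:
        if not name.startswith(prefix):
            continue
        # Track each distinct label set so cardinality can be reported.
        series.setdefault(name, set()).add(frozenset(labels.items()))
        s = stats.get(name)
        if s is None:
            stats[name] = Suggestion(name, 1, 0, ts, ts, value, value)
        else:
            s.samples += 1
            s.first_seen = min(s.first_seen, ts)
            s.last_seen = max(s.last_seen, ts)
            s.minimum = min(s.minimum, value)
            s.maximum = max(s.maximum, value)
    for name, s in stats.items():
        s.cardinality = len(series[name])
    return sorted(stats.values(), key=lambda s: s.name)
```

A real system would precompute these statistics at ingestion time rather than scanning samples on every keystroke, but the shape of the answer is the point: a name plus the facts an engineer needs before committing to a query.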
And Observability Autocomplete Is Even More Complex
Autocomplete in an IDE is a relatively simple challenge of indexing existing code paths. There is only one current version of the code, so there is no need to track anything more. Observability data, however, may only be available in specific timeframes. This means that Kibana can only tell the user that a given log field existed at some point in time, not whether it exists in the window they are querying.
The user doesn’t know if they can use it in their query until they try and inspect the results, which is often a complicated task. This is a non-trivial problem to solve, but some companies are tackling it.
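One way to picture a time-aware autocomplete is an index that remembers when each field was seen and filters suggestions by the query window. The Python sketch below is an illustration under simplifying assumptions, not a description of any vendor's implementation. `FieldIndex`, `observe`, and `suggest` are hypothetical names, and each field is tracked with a single first-seen/last-seen range, which ignores gaps in the middle.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FieldIndex:
    # Field name -> (first_seen, last_seen) as Unix timestamps.
    seen: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def observe(self, name: str, ts: float) -> None:
        """Record that a field appeared in a document at time ts."""
        first, last = self.seen.get(name, (ts, ts))
        self.seen[name] = (min(first, ts), max(last, ts))

    def suggest(self, prefix: str, start: float, end: float) -> List[str]:
        """Suggest only fields whose observed range overlaps [start, end]."""
        return sorted(
            name
            for name, (first, last) in self.seen.items()
            if name.startswith(prefix) and first <= end and last >= start
        )


index = FieldIndex()
index.observe("user.id", 100.0)
index.observe("user.id", 500.0)
index.observe("user.legacy_id", 50.0)  # field dropped after an old release

# Querying a recent window no longer offers the retired field.
recent = index.suggest("user.", start=400.0, end=600.0)
```

Tracking per-bucket presence (for example, one flag per hour) would handle fields that come and go, at the cost of a larger index.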
And Aggregating Values Is Even More Complex
It is very common to aggregate values when attempting to understand your metrics or a given log value. You would typically aggregate along a label, but which label? Some labels are strings, some are dates, and some are numeric. Most systems will simply show you all possible labels and let you try to find the right one. This is not surfacing information in an actionable way and only adds unnecessary toil.
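As a rough illustration of surfacing labels in a more actionable way, the Python sketch below infers a type for each label and ranks the ones that are plausible to group by. The heuristics are assumptions made for this example: only string labels are offered, a label with one distinct value or with more than `max_groups` values is skipped, and `infer_type` and `rank_group_by_labels` are hypothetical names.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple


def infer_type(values: Iterable[str]) -> str:
    """Classify a label as 'numeric', 'date', or 'string' from its values."""
    values = list(values)

    def all_parse(parse: Callable[[str], object]) -> bool:
        try:
            for v in values:
                parse(v)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "numeric"
    if all_parse(datetime.fromisoformat):
        return "date"
    return "string"


def rank_group_by_labels(
    rows: Sequence[Dict[str, str]], max_groups: int = 20
) -> List[Tuple[str, str, int]]:
    """Return (label, type, distinct count) for labels worth grouping by."""
    values: Dict[str, Set[str]] = {}
    for row in rows:
        for label, value in row.items():
            values.setdefault(label, set()).add(value)

    candidates = []
    for label, distinct in values.items():
        kind = infer_type(distinct)
        n = len(distinct)
        # One value yields a single group; too many yields an unreadable chart.
        if kind == "string" and 1 < n <= max_groups:
            candidates.append((label, kind, n))
    # Fewest groups first, then alphabetical for a stable order.
    return sorted(candidates, key=lambda c: (c[2], c[0]))


rows = [
    {"region": "eu", "request_id": "a1", "latency_ms": "12.5", "ts": "2024-01-01T00:00:00"},
    {"region": "us", "request_id": "b2", "latency_ms": "40", "ts": "2024-01-01T00:01:00"},
    {"region": "eu", "request_id": "c3", "latency_ms": "7", "ts": "2024-01-01T00:02:00"},
]
ranked = rank_group_by_labels(rows, max_groups=2)
```

Here `region` is the only label offered: `request_id` has too many distinct values for the chosen limit, while `latency_ms` and `ts` are classified as numeric and date and would be better suited to bucketing than to a plain group-by.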
It’s Time to Refocus on Developer Experience
Observability, as an industry, has been tackling the complex problem of managing a huge volume of data. Still, as we find new techniques for managing all of this information, we also need to remain laser-focused on how we can present that data in an actionable way.
Fundamentally, if we have a system that holds all of our data but is complicated and time-consuming to use, the greatest constraint isn’t the sophistication of our data processing. It is the ability of our users to extract value from that data once it has been processed.
So what’s next?
Fortunately, observability vendors and maintainers are working hard to make their platforms more intuitive for their users. Every day, new features are released across the industry, both SaaS and open source, that restore faith that we’re beginning to realize the depth and complexity of observability, not just as an operational challenge but as part of the fabric of a high-performing engineering organization.