In the opinion of Ben Sigelman, CEO and co-founder of Lightstep, users should not have to pay for system metrics. For monitoring microservices, telemetry should be a commodity. The value-add should come from observability: how much easier a monitoring service makes it for end users to quickly identify and resolve issues within a distributed system.
“Lightstep’s opinion is that you should never pay a margin on telemetry, and you should remain in control of that spend. It’s really just a commodity that you should have control over,” Sigelman said. “In the market right now, people are getting very, very large bills for their telemetry. It’s not even for the observability, it’s really just for the telemetry.”
Lightstep is expanding its core platform from distributed tracing to incorporate a larger array of metrics, a suite of features that will allow developers to identify performance issues faster, the company claims.
“Our objective is to be the quickest way to figure out what’s changed in a system and in the services within that system,” Sigelman said. “We for a long time have been leading the way on the distributed tracing front. But distributed tracing alone wasn’t enough for us to completely answer certain questions.”
The nascent but growing market for distributed tracing platforms includes other startups such as Honeycomb and Datadog, as well as established application performance management providers such as New Relic.
Lightstep’s update is part of an overall strategy for the company to define observability as a service that goes beyond the simple collection of metrics through telemetry.
“People have been somewhat programmed to believe that the telemetry is observability when it’s actually just the raw material,” Sigelman said. “I think we’re trying to change the way people think about the space to see the telemetry as a commodity they can control, and observability as a value based tool that they build on top of that telemetry.”
Lightstep has built its metrics collection on the OpenTelemetry project, though it sees its value-add as the workflows it offers to better pinpoint problems and keep monthly monitoring bills reasonable. It allows users to set constraints and a budget for what they actually want to spend on their telemetry. The Lightstep software will “right size within that budget from a telemetry standpoint,” Sigelman said.
One new feature, the error and latency analysis, can highlight a problem as it propagates through the call stack, providing a list of data-driven hypotheses for what went wrong. The update also provides runtime metrics, allowing the software to correlate metrics with problems in service performance.
By making runtime metrics accessible with zero additional configuration and providing side-by-side visibility of performance metrics, Lightstep lets developers quickly identify whether problems in their runtimes (e.g., increased garbage collection, CPU, or memory usage) are causing service degradations. Regression analysis can automatically show what is contributing to errors and increased latency.
One of the company’s core assets is a correlation engine that can filter for only the infrastructure metrics (CPU, memory, garbage collection, as well as inferred metrics such as latency) that are relevant to a particular regression. “The alternative is just to show pages and pages of graphs, which is a pretty low signal-to-noise,” Sigelman said.
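To make the idea of filtering metrics by relevance concrete, here is a minimal sketch that ranks metric time series by how strongly they correlate with a latency regression. It is not Lightstep's correlation engine, whose internals the company has not described here; the function names (`pearson`, `relevant_metrics`), the metric names, the sample values, and the 0.8 threshold are all assumptions made for illustration, and plain Pearson correlation stands in for whatever technique the product actually uses.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no signal about the regression
    return cov / sqrt(var_x * var_y)


def relevant_metrics(latency, metrics, threshold=0.8):
    """Return (name, score) pairs whose |correlation| with latency meets the threshold.

    The threshold is a hypothetical cutoff chosen for this example.
    """
    scores = {name: pearson(series, latency) for name, series in metrics.items()}
    ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(name, score) for name, score in ranked if abs(score) >= threshold]


# Made-up samples: latency jumps in the last three intervals.
latency_ms = [100, 102, 98, 101, 180, 220, 210]
metrics = {
    "gc_pause_ms": [5, 6, 5, 5, 40, 55, 50],       # rises with latency
    "cpu_percent": [30, 31, 29, 30, 32, 31, 30],   # roughly flat
    "memory_mb": [512, 512, 512, 512, 512, 512, 512],  # constant
}

for name, score in relevant_metrics(latency_ms, metrics):
    print(f"{name}: {score:.2f}")
```

With these sample values only the garbage collection series clears the threshold, so a single graph would be surfaced instead of three. A real system would also need to handle time alignment, lagged effects, and spurious correlations, which this sketch ignores.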
To hear more about this approach, listen to this podcast with Lightstep chief technology officer Daniel Spoonhower:
Lightstep is a sponsor of The New Stack.