One side effect of moving to a microservices architecture is that organizations can lose the ability to troubleshoot their application as a whole and to see how its different pieces fit together.
To address this issue, LightStep, which was founded in 2015 and comes out of stealth mode today, has launched a new SaaS product, called [x]PM, that offers a way for software organizations to make sense of the behavior of their complex, scaled-out production services. [x]PM has been in beta with companies including Lyft, Yext, and Twilio.
“LightStep’s [x]PM is well suited to organizations who are trying to make sense of the behavior of their complex, scaled-out production systems,” asserted Ben Sigelman, co-founder and CEO, in an interview.
Sigelman, who designed and deployed global-scale monitoring technologies at Google before founding LightStep, believes microservices is spawning a huge transition in software development, one that is “seismic and categorical.” It’s much more than an architectural or efficiency transition. “It’s a different block diagram,” he said.
The traditional ways of dealing with too much data are to throw data out or to summarize lessons prematurely, said Sigelman. LightStep is creating a third way.
Most current tools for monitoring a monolith do not work for distributed data, and most application performance management (APM) systems cover only a limited part of the path the data takes. The product was not developed exclusively for microservices, said Sigelman, because companies are running a combination of the monolith, microservices, the cloud, and their own data centers, spanning different technological generations that all have to work together.
Breadth of Data
The new service follows a transaction through all its moving parts, picking up data along the way and telling the story of what happened across the entire system in a few seconds. Data is collected from mobile apps, the back-end stack, and pretty much anywhere else in the client’s system that touches data, and failures can be reported in seconds.
A normal, vanilla mobile request will touch dozens to thousands of different moving parts, he said, and a person simply can’t follow the transactions and find the failure by hand.
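To make the idea of following one transaction across many moving parts concrete, here is a minimal sketch in Python. It is not LightStep’s implementation; the `Span` record, its field names, and the `failing_path` helper are hypothetical and simply assume that each unit of work carries a shared trace ID and a pointer to its parent, as is common in distributed tracing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One unit of work in a transaction (hypothetical record layout)."""
    trace_id: str             # shared by every span in the same transaction
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    error: bool = False


def failing_path(spans, trace_id):
    """Return service names from the root down to the deepest failing span."""
    by_id = {s.span_id: s for s in spans if s.trace_id == trace_id}
    failed = [s for s in by_id.values() if s.error]
    if not failed:
        return []

    def depth(span):
        # Count hops up to the root; assumes parent links contain no cycles.
        hops = 0
        while span.parent_id in by_id:
            span = by_id[span.parent_id]
            hops += 1
        return hops

    node = max(failed, key=depth)
    path = []
    while node is not None:
        path.append(node.service)
        node = by_id.get(node.parent_id)
    return list(reversed(path))


# Illustrative data only: one failing transaction and one healthy one.
example_spans = [
    Span("t1", "a", None, "mobile-app"),
    Span("t1", "b", "a", "api-gateway", error=True),
    Span("t1", "c", "b", "billing", error=True),
    Span("t1", "d", "b", "auth"),
    Span("t2", "e", None, "mobile-app"),
]
```

Calling `failing_path(example_spans, "t1")` walks from the deepest failing span back to the root, which is the kind of lookup a person cannot realistically do by hand across thousands of components.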
LightStep [x]PM is designed to absorb orders of magnitude more transactional data, more than previously thought possible, he explained. “With that additional information and context, we can provide insights about software that are more relevant to the most critical business needs of our customers that resolve issues, whether they are short-term fires or long-term business issues.”
Messaging giant Twilio reduced incident resolution times by 92 percent with LightStep, according to Jason Hudak, Twilio senior vice president for platform engineering. “LightStep [x]PM not only finds our performance problems, it tells us why they’re happening,” he said. “Within an hour of running LightStep, our billing transactions team was able to identify issues and deliver betterments that led to a 70 percent reduction in latency.”
The breadth of data collected by [x]PM allows companies to look at transactions at a very granular level, down to tracking events and keystrokes for a single customer.
The Next Level
The service is designed to be more flexible in how it’s applied and to absorb data from parts of the system not traditionally included in APM, providing historical context to what’s happening and identifying whether an occurrence is an emergency or within tolerated limits.
Another innovative aspect of the service is its ability to home in on very specific transactions. The granularity of data allows a company to do things like monitor service level agreements for individual teams or monitor a specific software release at launch to see how it affects the entire system and roll it back if there’s an unexpected effect.
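As a rough illustration of what slicing the data by team or by release could look like, the sketch below computes the share of requests that exceed a latency threshold for each group. It is an assumption-laden example, not a description of [x]PM: the `sla_violation_rate` function, the `(group, latency_ms)` record shape, and the 300 ms threshold are all hypothetical.

```python
from collections import defaultdict


def sla_violation_rate(records, threshold_ms):
    """Return {group: fraction of requests slower than threshold_ms}.

    `records` is an iterable of (group, latency_ms) pairs, where the group
    could be a team name or a release tag (hypothetical input shape).
    """
    totals = defaultdict(int)
    slow = defaultdict(int)
    for group, latency_ms in records:
        totals[group] += 1
        if latency_ms > threshold_ms:
            slow[group] += 1
    return {group: slow[group] / totals[group] for group in totals}


# Illustrative data only.
example_records = [
    ("billing", 120),
    ("billing", 480),
    ("auth", 90),
    ("auth", 95),
]
rates = sla_violation_rate(example_records, threshold_ms=300)
```

Grouping by release tag instead of team name would give the same kind of comparison for a new launch, which is the signal one would want before deciding on a rollback.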
“It’s important to note that the amount of data we’re talking about cannot be written to disk,” Sigelman said. “Once you write it to disk, you hobble the capacity to scale.” All this information comes from the system having all the recent data from across the entire system, with “recent” meaning a couple of seconds. That amount of data cannot be stored. The window can be extended based on customer need, but even holding a couple of seconds of data is staggering.
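The idea of holding only the last couple of seconds of data in memory can be sketched as a time-bounded buffer. This is a generic illustration under stated assumptions, not LightStep’s design: the `RecentWindow` class and its methods are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class RecentWindow:
    """Keep only events newer than `window_seconds`, entirely in memory."""

    def __init__(self, window_seconds=2.0):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, event), oldest on the left

    def add(self, timestamp, event):
        """Record an event and drop anything that has aged out."""
        self._events.append((timestamp, event))
        self._evict(timestamp)

    def _evict(self, now):
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def snapshot(self, now):
        """Return the events still inside the window as of `now`."""
        self._evict(now)
        return [event for _, event in self._events]


# Illustrative usage with a two-second window.
window = RecentWindow(window_seconds=2.0)
window.add(0.0, "a")
window.add(1.0, "b")
window.add(2.5, "c")
```

Extending the retention period, as the article says can be done on customer request, would correspond here to a larger `window_seconds`, at the cost of proportionally more memory.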
It’s crucial that the data not be passed over the internet. The network load alone would cost millions of dollars, he said. It’s also important to note that the LightStep SaaS should not ever be considered a system of record.
LightStep is not alone in addressing the emerging market for microservices monitoring. Honeycomb launched earlier this year, drawing from the Facebook Scuba architecture. The Cloud Native Computing Foundation hosts the Uber-developed Jaeger, which, like [x]PM, provides tracing capabilities based on the CNCF OpenTracing specification.
The pricing model is also unusual among APM providers. The market, Sigelman said, is really clear that it doesn’t want to pay per VM connection or by the amount of data being fed into the LightStep system.
So the company decided on a model that allows the price to scale with the value of the product to the customer. Its three-tier system has one cost for on-boarding the product. After that, the price is based on the scope of analysis being done in LightStep.
If only a couple of groups are using the product, the price will be lower, he explained. If the product is used across multiple groups, spanning the customer’s entire system, the price will be higher. While he couldn’t give me numbers, he did say it’s been considered a great value to their beta customers.
The Cloud Native Computing Foundation is a sponsor of The New Stack.
Feature image: Ben Sigelman, taken by TC Currie.