With the release of Lightstep Incident Response, ServiceNow has combined observability with incident resolution on a single platform to help developers and site reliability engineers (SREs) monitor, alert, collaborate, and respond to incidents for quick recovery.
The goal is to reduce downtime by arming these teams with the service context and automation they need to respond effectively to incidents, such as a software bug, power outage, or network failure, said Rohit Jainendra, vice president and general manager of emerging businesses at ServiceNow.
Eliminating Context Switching
Jainendra said developers and SREs have had to do a lot of manual context switching between various tools to keep up with all their systems.
“What we’re hearing from developers and SREs is that eliminating ‘context switch’ — flipping between observability, on-call, collaboration and incident management tools — would reduce human errors and speed up response times,” he said in a statement. “With Lightstep Incident Response, we are providing teams with a single platform that orchestrates on-call escalation, alert grouping, incident analysis, and remediation, while seamlessly integrating with collaboration and incident management tools to eliminate ‘context switch’ and resolve incidents with speed.”
Major New Feature Post Acquisition
ServiceNow acquired Lightstep in 2021 to extend the benefits of observability across business functions and enable enterprises to increase their cloud native capabilities. The company plans to extend Lightstep’s capabilities beyond observability, with the mission of becoming an end-to-end platform for app development organizations. The general availability of Lightstep Incident Response marks the first major step on that mission.
Ben Sigelman, general manager of Lightstep and co-creator of the CNCF OpenTelemetry project, said Lightstep would have been hard-pressed to deliver this capability as a startup on its own, but has accelerated its development under the auspices of ServiceNow.
How It Works
Lightstep Incident Response works by synchronizing on-call workers’ schedules onto a shared calendar, with specific tags that indicate who needs to be looped in based on the nature of the incident and the service that is impacted. From there, collaborators are invited to a dedicated channel through prebuilt collaboration integrations for quick remediation. Additionally, they can create automations that self-triage and self-remediate problems should they recur, Jainendra said.
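As a rough illustration of the tag-based routing described above, the following Python sketch picks the engineers to loop in for an incident. It is not Lightstep's API or data model: the `Shift` record, its fields, and the `responders_for` function are hypothetical names invented for this example, and the matching rule (same service, currently on shift, at least one shared tag) is an assumption about how such routing could work.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Shift:
    """One on-call entry on the shared calendar (hypothetical model)."""
    engineer: str
    service: str
    tags: frozenset
    start: datetime
    end: datetime


def responders_for(shifts, service, incident_tags, now):
    """Return the engineers on call at `now` for `service` whose tags
    overlap the incident's tags, sorted by name."""
    wanted = set(incident_tags)
    return sorted({
        s.engineer
        for s in shifts
        if s.service == service
        and s.start <= now < s.end
        and (s.tags & wanted)
    })


# Example calendar: two engineers cover "checkout" overnight, one covers "search".
night_start = datetime(2022, 4, 5, 22, 0)
night_end = datetime(2022, 4, 6, 6, 0)
calendar = [
    Shift("ana", "checkout", frozenset({"database"}), night_start, night_end),
    Shift("bo", "checkout", frozenset({"network"}), night_start, night_end),
    Shift("cy", "search", frozenset({"database"}), night_start, night_end),
]

print(responders_for(calendar, "checkout", ["database"], datetime(2022, 4, 6, 1, 30)))
```

In this sketch only the engineer whose shift covers the affected service and carries a matching tag is returned; a real system would then invite those people to the dedicated collaboration channel.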
The new tool integrates with leading monitoring, observability and collaboration tools, including LogicMonitor, Postman, Slack, Sumo Logic, Zoom, and more.
For ServiceNow customers, Lightstep Incident Response natively integrates with the Now Platform, allowing users to quickly respond to or escalate incidents to the right team all on one platform and connecting incident response to core operations.
“ServiceNow has realized that the IT Operations model has changed now. Modern agile and DevOps teams can’t wait for multiple levels of support such as L1, L2, and L3 or wait to work thru a tickets-based system,” said Andy Thurai, VP and Principal Analyst, Constellation Research. “CloudOps is moving towards events/incident-based workflows and escalation/resolution immediately by service owners rather than on-call support engineers, is the newer model. This initiative by ServiceNow is to create toolsets to cater to that model.”
Version one from ServiceNow has limited capabilities, but the company has a good roadmap and a good direction, he noted.
“The mentality of incident response used to be waking someone, that is in the next escalation chain, in the middle of the night is the only way to escalate the problem,” Thurai said. “The newer model is incident/events based. You can, and should, directly escalate the incidents to the service owners who will have firsthand knowledge to solve this problem. The DevOps model is about ownership and accountability of your changes.”
The ability to react quickly sets the new ServiceNow tool apart.
Speed and Efficiency
“With the introduction of Lightstep Incident Response, we are delivering the all-in-one solution for developers and SREs to act with the speed and efficiency necessary to maintain exceptional experiences for customers using their applications and services,” Sigelman said in a statement. “In combination with OpenTelemetry, a Cloud Native Computing Foundation sandbox project founded in part by Lightstep, organizations will now have the data platform, workflows, and an open standards approach necessary to successfully operate highly distributed cloud native services.”
When it comes to monitoring DevOps infrastructure and services, plenty of open source tools provide good coverage and can be found on a typical team, such as Grafana, Prometheus, and Influx, said Shailesh Mangal, VP of Engineering at Roambee.
“Two things that we were struggling with which Lightstep helped us resolve were: Get a handle on infra and service alerts,” he told The New Stack. “They essentially helped (forced) us to prioritize the alert types by adding additional metadata.”
The other issue Roambee faced was the need to set up an escalation path and have it followed automatically, he said.
“Given small team size, it’s not possible for us to have dedicated shift round the clock. We achieve this by sharing the responsibility with regular (non-DevOps) team members moonlighting and rely on alerts. If these alerts are missed out for whatever reason, there was no way for us to detect them until it’s too late. With Lightstep, we set up proper escalation paths, and alerts if not acknowledged, get escalated up the food chain.”
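The escalation behavior Mangal describes, where an unacknowledged alert moves up the chain, can be sketched in a few lines of Python. This is only an illustration under assumed rules, not Roambee's configuration or Lightstep's implementation: `EscalationStep`, `who_to_notify`, the contact names, and the wait times are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    """One rung of a hypothetical escalation path."""
    contact: str
    wait_minutes: int  # time this contact has to acknowledge before escalation


def who_to_notify(steps, minutes_since_alert, acknowledged):
    """Return the contact responsible for the alert right now, or None
    if the alert was acknowledged (or no path is configured)."""
    if acknowledged or not steps:
        return None
    deadline = 0
    for step in steps:
        deadline += step.wait_minutes
        if minutes_since_alert < deadline:
            return step.contact
    # Every window has expired: the alert stays with the last contact.
    return steps[-1].contact


# Assumed path: on-call engineer first, then the team lead, then the CTO.
path = [
    EscalationStep("on-call engineer", 5),
    EscalationStep("team lead", 10),
    EscalationStep("cto", 15),
]

for minutes in (0, 5, 20):
    print(minutes, who_to_notify(path, minutes, acknowledged=False))
```

The point of the sketch is that a missed alert is never silently dropped: each unacknowledged window hands the alert to the next person, which is the property a small team without round-the-clock shifts relies on.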
Muthu Gurumoorthy, CTO and co-founder at Assembly, in a statement said of the Lightstep Incident Response feature, “It has been a gamechanger for our on-call engineers and developers. With Lightstep Incident Response, our team is empowered and engaged, knowing that they are armed with the critical context needed to resolve incidents at speed…”
New Pricing Model
Lightstep Incident Response is offered in free and paid versions and introduces a usage-based pricing model based on the number of active services being managed, Jainendra said. Customers don’t pay by the seat; they pay only for what they use. This allows the entire team to participate in the incident response process and drive a culture of service ownership. Customers can get started immediately with a 30-day free trial.
“ServiceNow has a leg up as they are the SOR (System of Record) with many large enterprises with their ITSM solution,” Thurai said. “It is an easier upsell if this combination of observability plus incident management works, but it is a long way to go. Good start, but we have to wait and see if they execute on this properly.”
The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Postman.
CNCF, InfluxData and ServiceNow are sponsors of The New Stack.