LLMs and Incident Response? It Starts with Summarization

A look at how PagerDuty’s data science and product development teams explored how large language models could help during outage crises.
Jun 16th, 2023 11:04am by

Among other things, large language models (LLMs) are good at summarizing all kinds of content. Thanks to the advent of Transformers, natural language processing (NLP) applications like ChatGPT are able to process long sequences, understand context and capture complex relationships in data. This is useful for producing quick summaries of books, topics or 10 seasons of that reality TV show someone mentioned at a party.

In a work context, however, the conditions aren’t always ideal for using plain generative AI today. The context is specific, time may be too constrained to engineer prompts, and the stakes can be high. But that didn’t stop the PagerDuty data science and product development teams from asking the question, “How could LLMs help in incident response?”

Surprises in Generating Status Updates

Anyone who’s sat down to write anything has faced the tyranny of the blank page. It’s stressful enough under ordinary circumstances, let alone during a major outage affecting customers, where every second matters. You are trying to balance timeliness (keeping the team and customers in the loop), thoroughness (covering what’s actually known) and succinctness (getting just the essentials to the people who need them, including your execs). And yet, communicating updates during an incident is critical and fundamental.

So, how do you use an LLM to write the first draft of an incident status update? Looking into how LLMs could ease the writer’s block led to some surprising discoveries.

“When an incident first strikes, there is a fog of war and usually some initial context in the early data of the incident can really help get the first status update out there quickly. Some data that would seem valuable is actually not that useful,” explained Leeor Engel, PagerDuty’s senior manager of software engineering.

“We started looking at ‘basic’ incident data [such as title, severity] and leveraging ChatGPT to extrapolate summaries from that. These updates were very limited,” explained Everaldo Aguiar, senior manager of data science. From there, the team experimented quickly, thanks to their familiarity with PagerDuty’s foundational data model.

“We already had a fair amount of intuition and experience feeding this data into various ML models. This helped us get going,” noted Engel.

As basic data wasn’t the key, the team began to look elsewhere. Two other sources of data proved useful for generating draft status updates: PagerDuty’s own internal chat data and change data.

“We began looking at our own Slack data (where responders are constantly sharing snapshots of their progress) and other richer data sources (like incident timelines),” added Aguiar. “Updates generated from this data are significantly more informative.”
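Combining chat excerpts with timeline entries into a single prompt can be pictured with a short sketch. This is a hypothetical illustration, not PagerDuty's actual implementation; the field names (`timestamp`, `kind`, `detail`) and the prompt wording are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class TimelineEntry:
    """One incident timeline event. Field names are illustrative."""
    timestamp: str
    kind: str        # e.g. "note", "status_update", "responder_added"
    detail: str

def build_status_update_prompt(title, chat, timeline):
    """Assemble an LLM prompt from chat excerpts and timeline entries."""
    chat_block = "\n".join(f"- {msg}" for msg in chat)
    timeline_block = "\n".join(
        f"- [{e.timestamp}] {e.kind}: {e.detail}" for e in timeline
    )
    return (
        f"Incident: {title}\n\n"
        f"Responder chat excerpts:\n{chat_block}\n\n"
        f"Incident timeline:\n{timeline_block}\n\n"
        "Write a brief status update for stakeholders covering impact, "
        "actions taken so far, and next steps."
    )

# Example: a draft-update prompt built from two chat messages and two
# timeline events for a fictional incident.
prompt = build_status_update_prompt(
    "Checkout API elevated error rate",
    ["rolling back deploy 4512", "error rate dropping after rollback"],
    [TimelineEntry("14:02", "responder_added", "on-call SRE paged"),
     TimelineEntry("14:10", "note", "suspect bad deploy")],
)
```

The resulting string would then be sent to whichever LLM the team uses; the point is that richer inputs (chat plus timeline) give the model more to summarize than title and severity alone.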

Chat data has other practical benefits for generating status updates.

“Slack chats deliver good input since they discuss details and are updated almost in real time,” noted Ben Wiegelmann, senior product manager. “The LLM just reads along.”

There are limitations to using chat data, however, so the team kept experimenting. Their next lesson proved to be even more useful.

“Incident timeline entries so far have provided the best results we’ve seen,” explained Engel.

This correlates with using more PagerDuty features, like adding responders and having service dependencies mapped, which enrich the incident timeline.

“What actually helps much more is seeing how an incident changes over time,” said Engel. “It’s less about where the incident starts and more about the changes and activities that occur on an incident over time that provide real insights into what is going on.

“The series of incident state transitions is extremely important and conveys meaning and progression on the ‘story’ of an incident,” he explained.

“State transitions” are things that happen to an incident over time. Some are text-heavy, like notes added or earlier status updates summarizing the situation to that point. Others are structural: a change in priority level, responders being added, which teams those responders are on, and whether the incident gets escalated.
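Such a sequence of state transitions lends itself to a simple chronological serialization before being handed to an LLM. A minimal sketch, with an illustrative event shape rather than PagerDuty's actual schema:

```python
# Hypothetical shape for incident state transitions; the tuple fields
# (time, state, detail) are illustrative, not PagerDuty's data model.
transitions = [
    ("14:00", "triggered",  "P3, auto-detected by monitoring"),
    ("14:05", "escalated",  "raised to P1, customer impact confirmed"),
    ("14:07", "responders", "payments and infra teams added"),
    ("14:30", "mitigated",  "traffic shifted to standby region"),
]

def transitions_to_story(transitions):
    """Render transitions as a chronological narrative for a prompt."""
    return "\n".join(
        f"At {t}, the incident was {state}: {detail}."
        for t, state, detail in transitions
    )

story = transitions_to_story(transitions)
```

The ordering is the point: the model sees not just a snapshot but the progression, the “story” of the incident that Engel describes.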

The Art of Prompt Engineering

“While it may seem easy, successful prompt design can really make or break things,” noted Aguiar. “We’re finding that this is a very nuanced process that requires lots of iterations.”

The PagerDuty team’s experience validated broader industry observations about an emerging skill set called prompt engineering. Wiegelmann described three phases of using LLMs with respect to prompt engineering:

  1. Excitement at access to a new superpower.
  2. Overwhelm at the possibilities.
  3. Acceptance of the challenge, when you begin to train the system and optimize your prompts.

“We want to help customers skip past phase three and get the best results in one click,” Wiegelmann said.

After all, during a high-intensity incident, there isn’t time to iterate through prompts. To provide immediate value, you need to be asking the right question at the right time. This is where PagerDuty will add value for users: pre-engineered prompts that deliver meaningful results at the click of a button. When generative AI writes a status update, for example, the incident context informs the prompt design behind the scenes.
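One way to picture “expert-optimized prompts at one click” is a registry of pre-engineered templates keyed by task and incident stage, so the product picks the right prompt for the moment. This is a hypothetical sketch; PagerDuty has not published its implementation, and the keys and template wording here are assumptions.

```python
# Illustrative registry of expert-tuned prompt templates, keyed by
# (task, stage). Not PagerDuty's actual prompts.
PROMPTS = {
    ("status_update", "initial"):
        "Summarize what is known so far and the immediate impact:\n{context}",
    ("status_update", "followup"):
        "Given the earlier updates, describe what has changed since:\n{context}",
}

def one_click_prompt(task, stage, context):
    """Select the pre-engineered prompt for this moment and fill it in."""
    return PROMPTS[(task, stage)].format(context=context)
```

In this design, the user never sees or edits a prompt; the product supplies the context (title, timeline, chat) and the stage, and the expert-tuned template does the rest.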

“Prompt engineering offers countless possibilities,” noted Wiegelmann. “Product design needs to focus on the most impactful use cases and give the user the capability to trigger optimized (by experts) prompts in the right place at the right time with one click.”

To iterate on prompts for status updates, the team needed data.

“We carefully inventoried all our own internal data related to incidents and classified and tested it using prompt engineering,” noted Engel. “Because this data was related to incidents that we had triaged and learned from, we were able to accelerate our learning and understanding of the importance of proper prompt design.”

Once again, the team benefited from years of work that power PagerDuty’s AIOps solution. “We’re heavily reusing that expertise,” Engel said.

With the importance of data privacy and security, the team limited itself to training with PagerDuty’s internal data but also generated artificial data sets using — you guessed it — generative AI.

“We’re also using LLMs to create more examples of these datasets, mimicking what we know Slack conversations between incident responders look like. This also ensures we’re not using customer data in the design of our system,” explained Aguiar. “This really helps us refine our prompts and the quality of the updates.”
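Generating such synthetic transcripts can itself start from a carefully worded request to an LLM. A hedged sketch of what that request might look like; the wording and parameters are illustrative, not PagerDuty's actual approach:

```python
def synthetic_chat_prompt(scenario, n_messages=8):
    """Build a request asking an LLM to fabricate a responder chat.

    Hypothetical sketch: the generated transcript stands in for real
    customer chat data, which never enters the design process.
    """
    return (
        f"Write a realistic Slack thread of {n_messages} messages between "
        f"incident responders troubleshooting: {scenario}. Include fake "
        "usernames and timestamps; do not mention any real company, "
        "customer or system name."
    )

request = synthetic_chat_prompt("a cache stampede after a config change")
```

The fabricated conversations then serve as test inputs for iterating on the status-update prompts without ever touching customer data.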

Time spent studying, scaffolding and iterating on prompts has paid off. So has using the team’s own experience as a basis for learning on customers’ behalf. The team is beginning to preview AI-generated status updates, for both the initial update and subsequent updates that build on what is known up to that point.

For others who are curious, the team recommends this prompt engineering course from

Encoding Best Practices to Make the Right Thing Easier

By making the right thing the easy thing, software has the power to transform how teams work, for the better. PagerDuty has long focused on supporting “healthy habits,” but there are many practices outside the platform. For those practices, PagerDuty has contributed several Ops Guides to open source.

The most recent updates to the Incident Response Ops Guide were around communications and streamlining stakeholder management. With generative AI and large language models, the team recognized an opportunity to encode more of these best practices.

The team has already explored how status updates feed into generative AI-drafted post-mortems. Using large language models could be a boon to teams adopting both of these best practices with the help of AI.

“Using our own best practice-based internal incident review process makes all the difference in producing quality output. Specific structure and guidance with attention to detail really matters,” concluded Engel. “We spent a long time developing our incident review process and have carefully refined it over the years, and it paid off here.”

Learn more about PagerDuty’s forays into generative AI on the PagerDuty blog. You can also tune into my live interview with Engel, Aguiar, and Wiegelmann on Monday, June 26, at 1 p.m. PST on PagerDuty’s Twitch channel.
