
PagerDuty Debuts Machine Learning Capabilities and Community Outreach

Sep 7th, 2017 4:01pm

Incident management service provider PagerDuty has updated its Digital Operations Management platform to include machine learning capabilities, one of a number of new PagerDuty features the company CEO Jennifer Tejada revealed at the PagerDuty Summit in San Francisco Thursday.

Jennifer Tejada, CEO of PagerDuty, delivers the keynote address at PagerDuty Summit

Every modern business is now in a digital upheaval. Modern operations are no longer the purview of IT, Tejada said. Nearly every position is now automated and the silos of last year are gone as departments come together to optimize customer experience.

Therefore, each incident response becomes a full company response, driven by the explosion of signals. By 2020 there will be 30 billion signals crossing individual devices, each with an average of 300 apps on it. Change is necessary.

Control, Tejada said, does not scale in this environment. Delegating decisions up just results in a “Chief Executive Obstacle.” To meet this explosion, companies need to move to distributed responsibility, she explained, where engineers are empowered to solve problems.

There’s no time to identify the right person to solve the problem: “You build it, you ship it, you own it.” But it’s not just the engineers who are on call. All employees at PagerDuty are on call, including the C-suite.

On-call drives learning, she said. When you respond to incidents, you learn a lot.

PD Digital Operations Management Platform. It’s not just IT anymore

With the updated Digital Operations Management platform, customers can now take advantage of the automation capabilities for intelligent real-time decisions, automated precision response and business-wide orchestration. The dynamic notification and event routing automates notification and assignments and routes events to specified teams based on event payloads.
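To make the idea of payload-based routing concrete, here is a minimal sketch in Python. It is not PagerDuty's implementation or API: the `Rule` class, the team names, and the payload fields (`service`, `summary`) are all hypothetical, chosen only to illustrate how an event might be matched against ordered rules and handed to a team.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A hypothetical routing rule: a target team plus a payload predicate."""
    team: str
    matches: Callable[[dict], bool]


# Illustrative rules only; the field names and teams are assumptions.
RULES = [
    Rule("database-team", lambda e: e.get("service") == "postgres"),
    Rule("payments-team", lambda e: "checkout" in e.get("summary", "").lower()),
]
DEFAULT_TEAM = "ops-catchall"


def route_event(event: dict, rules=RULES, default: str = DEFAULT_TEAM) -> str:
    """Return the team of the first rule whose predicate matches the payload."""
    for rule in rules:
        if rule.matches(event):
            return rule.team
    return default
```

Calling `route_event({"summary": "Checkout latency high"})` would hand the event to the hypothetical payments team, while a payload that matches no rule falls through to the catch-all.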

These new capabilities address the challenges businesses face today resulting from digital disruptions. The addition of machine learning to the incident management platform further automates the alert process, providing rapid, frictionless service restoration and enabling instant communication across critical teams and stakeholders.

The new Group Alert function, also announced this week, uses machine learning together with rules-based automation to group related issues. This gives the on-call responder critical context for managing the incident. The alert now includes historical data on how similar incidents were resolved in the past, including the engineer who solved a similar incident, so that person can be pulled in if necessary. Likewise, the system can learn which data is critical to push forward, look for similar incidents and surface the steps that were taken to resolve them.
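As a rough illustration of what grouping related alerts can mean in practice, the Python sketch below covers only a simple rules-based case: alerts from the same service that arrive within a short time window are placed in one group. The alert fields (`service`, `timestamp`) and the five-minute window are assumptions made for the example, and the sketch does not attempt to reproduce the machine learning side of PagerDuty's feature.

```python
def group_alerts(alerts, window_seconds=300):
    """Group alerts that share a service and arrive close together in time.

    Each alert is a dict with hypothetical "service" and "timestamp"
    (seconds) keys. An alert joins the latest group for its service if it
    arrives within window_seconds of that group's most recent alert;
    otherwise it starts a new group.
    """
    groups = []
    latest_group = {}  # service -> most recent group for that service
    for alert in sorted(alerts, key=lambda a: a["timestamp"]):
        current = latest_group.get(alert["service"])
        if current and alert["timestamp"] - current[-1]["timestamp"] <= window_seconds:
            current.append(alert)
        else:
            current = [alert]
            latest_group[alert["service"]] = current
            groups.append(current)
    return groups
```

A responder would then see one grouped incident per burst of related alerts rather than a separate page for each.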

Business-Wide Orchestration

Response Plays allows you to define what a response should look like for different incident levels. For example, an incident of “severity level 2” (Sev2) where revenue is impacted will pull in a specified number of players with a click. The Plays are set off automatically, saving time and creating consistency across incidents. Automated Precision Response ensures the right people are involved using dynamic routing and notification.
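One way to picture a response play is as a lookup from severity level to the roles that should be paged. The Python sketch below assumes exactly that; the severity labels, role names and fallback are hypothetical and are not taken from PagerDuty's product.

```python
# Hypothetical plays: severity level -> roles to pull into the incident.
RESPONSE_PLAYS = {
    "sev1": ["incident-commander", "on-call-engineer", "customer-support", "executive-sponsor"],
    "sev2": ["incident-commander", "on-call-engineer", "customer-support"],
}
FALLBACK_ROLES = ["on-call-engineer"]


def responders_for(severity: str, plays=RESPONSE_PLAYS) -> list:
    """Return a copy of the roles defined for a severity, or the fallback."""
    return list(plays.get(severity.lower(), FALLBACK_ROLES))


def add_role(responders: list, role: str) -> list:
    """Return a new responder list with an extra role, avoiding duplicates."""
    return responders if role in responders else responders + [role]
```

Because the play is defined ahead of time, every Sev2 incident starts with the same set of roles, and further roles can be added as the incident develops.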

Customer support, for example, can be engaged early so that it can proactively manage the customer experience. With the feature, you can add roles to the incident as necessary. Each person on the response team is folded into the communications, so they are able to focus on their specific area of expertise (e.g., customer experience, tweeting an announcement that they know about the problem and are working on a solution) but also see the entire problem.

A new “infrastructure health” console has been added to give you a bird’s eye view of the recent events. The on-call engineer knows exactly what to do or at least where to start looking because the incident shows them similar incidents, who responded to them and the resolution. And every communication is recorded for the entire team to see.

A new post-mortem report function pulls in the timeline for an entire incident, including the communication and resolution captured during the session.

Tejada also announced the launch of a new professional services response team called Digital Insights Service. Created in response to requests for help from PagerDuty customers, these teams work alongside customers not only to streamline PagerDuty functionality but also to address any organizational changes needed in order to reimagine DevOps.

The last announcement from Tejada is the launch of, an effort to commit one percent of PagerDuty equity and employee time back to the community. “It’s early days,” she said, “but our vision is that we create a network effect that ripples throughout the community.”

The four initial partners for include Girls in Tech, the Hispanic Executive IT Council, Hackbright Academy, and Code2040, which are all committed to bringing more diverse, qualified engineers to the tech community.

“PagerDuty’s new integrated event intelligence and response automation capabilities provide a big step forward in driving agility for our customers when it matters most,” said Chris DeAntonio, a solution principal for Slalom Consulting. “With PagerDuty, we can help our customers spend less time firefighting and more time developing new features for their customers.”
