The explosive growth of digital services means companies must completely change how they communicate with customers, observed Rachel Obstler, PagerDuty vice president of product management. This shift is driving a digital operations transformation that is forcing IT operations to change as well.
Next week, the incident management company is hosting its Summit 2017 conference at Pier 27 in San Francisco. The event has three parts: PagerDuty University, the Community Breakathon on Sept. 6, and the Summit itself on Sept. 7. The two PagerDuty University tracks, held on Wednesday (Owning Incident Response and Improving On-Call Life), are sold out. There is still space at the Summit and the Breakathon, which are both free, though registration is required.
Obstler talked with us about what attendees could expect to see at the conference.
Not Just Tech
In the current competitive climate, it is important for a business to find a problem before its customers do, she explained. For that to happen, a company needs good monitoring and a good process for taking in data, making sure the data gets to the right person, and giving that person the authority to coordinate the response.
For most businesses, this requires a change in business processes, Obstler said. Old chains of command with five layers of approval simply don't work well in this new situation. At PagerDuty, the engineer who built the code has the right data, the right tools, and the authority to make decisions and change the service to fix a problem surfaced through monitoring.
The old paradigm doesn't work anymore, she said. An IT issue does not happen in a silo and cannot be fixed in one. Everything is connected, and issues cut across the organization.
For example, Obstler said, an issue surfaced by an alert may require a heads-up to the customer service team, so they are prepared if customers start calling in, and to the sales team, so they don't head into a demo unaware that the problem exists. The goal is to see problems before customers do.
“It’s a whole business response,” she said, “not just a tech response.”
One of the exciting things about PagerDuty's upcoming flagship release is that it applies these best practices to customer support, so that any issue anywhere in the organization that requires a quick response can be addressed as soon as possible.
In a digital world, Obstler said, many different kinds of events require a quick response.
For example, she said, suppose you're a retail boutique and your social media manager spots a celebrity wearing your signature piece, triggering a huge uptick in traffic to your website. That, in turn, requires a large, coordinated response. "Do we need to order more of these?" she asked. "Should this be on the front page of our website? We need to let IT know there will be more traffic."
You need to move fast or you’ll miss the opportunity. “All of these we’ll be talking about in the Summit in various ways,” Obstler said.
We Break It, You Fix It
The Community Breakathon, held on Sept. 6 and limited to four hours, is set up for small teams to compete. Unlike a typical hackathon, however, teams will be presented with common service and application problems. "We're going to give you stuff that's broken," she said, "and you get to fix it." PagerDuty has posted the details, including prizes, online.
The event is inspired by the idea of chaos engineering, said Obstler. PagerDuty created a program called Chaos Cat, which builds on the idea behind Netflix's Chaos Monkey, a program that randomly terminates instances in production to ensure services are resilient to such failures.
If you want to do incident management correctly, she said, it needs to be practiced. “Things go wrong. You want to be ready for it and resilient to it.”
In addition to Chaos Cat, the company runs something called Failure Fridays. These events are planned ahead of time and target a specific function that teams need practice with, for example, testing a failover across three data centers. These exercises not only verify that the system is actually as reliable as expected but also let teams practice the incident response protocol, which helps everyone know their role when something does go wrong.
PagerDuty is a sponsor of The New Stack.
Feature image via Pixabay.