
Is Your Incident Response Better Than an Energy Company’s?

Portland General Electric’s incident response communications with customers far exceed those from many tech companies. How do yours stack up?
Mar 21st, 2023 9:04am by

Portland, Oregon, has weathered many storms over the past few years that have brought down century-old trees and severed power lines, leaving thousands of people and pets in the dark.

As a resident, that means I have become quite familiar with Portland General Electric’s (PGE) incident response communications and, honestly, I’m impressed. Their customer communications far exceed what I see from many tech companies. How is that possible? Fewer components to their system? Highly trained responders? I suspect it is due to the information and signals they’re able to get.

It’s no secret that software engineers face information overload when diagnosing system issues. There’s no solace in assuming that at least those alerts point to real issues: 59% say half of the alerts they receive from their current observability solution aren’t actually helpful or usable. Not only is there a deluge of signals, but many are low quality; 40% frequently get alerts from their observability solution without enough context to triage, according to Chronosphere’s 2023 Cloud Native Observability report.

Follow a power outage through the eyes of a PGE customer and take the quiz at the end to see how your incident communications stack up.

Ready… Set… Outage! 

The very first observable event from a customer’s viewpoint is that all the lights suddenly go out, fans/HVAC shut off and an eerie silence settles in.

Not too long after (within minutes) a text arrives:

PGE text

This is five-star customer outreach because PGE proactively tells me it knows that the power I care about, specifically at my address, is affected. Being proactive likely saves PGE support from being overwhelmed with texts and calls, and it frees up customer time to find some flashlights and candles.

Another highlight is that PGE’s next steps are clearly outlined. When there is news to report, they’ll be back with an estimated restoration time. Lovely! That’s really the only other information I care about as the customer.

The cherry on top is their method for reporting data inaccuracies: it uses the same communication context, so the customer (who is stressed because the power just went out) doesn’t have to log into a separate website or app to report.

Say a customer hasn’t signed up for text alerts. They’d likely be navigating to the PGE website on a mobile browser and hoping their battery outlasts the outage. The Outages & Safety section is easy to spot as a permanent part of the main menu bar and gets you to information in one click.

Notice anything about the order of reporting options? They’re listed in order of ease for the customer and efficiency for PGE. Reporting through the app is trivial, followed by filing on their website; both allow PGE to quickly get reports at scale. The final option is to report by phone, which could tie up humans for the response, hence showing it last.

Hark, a Map!

Say the text notification didn’t quench your thirst for outage information. PGE’s Outage Map is here for you, on demand, showing the company’s current understanding of customer impact.

Why do I love this? Knowing if an outage is widespread across thousands of customers and regions or is isolated to my local neighborhood helps calibrate customer expectations for power restoration.

Obviously, if the entire city of Portland is experiencing outages, it’d be absurd to expect a quick resolution.

Commendations

  1. Location-aware visualization: This is a view that shows the overall system impact. A table view of numbers just wouldn’t be able to communicate that as effectively.
  2. Key information about data freshness: When it was last updated and the refresh rate.
  3. Actual number of customers affected: Less useful for me, but it increases my confidence that PGE has insight into their system since they’ve got a real live number there.
  4. Ability to search and pinpoint your area based on phone number: Not something arcane like an account ID that no one has committed to memory.

Basically, having this map lets people keep up to date on the resolution progress, or lack thereof, without having to bug and tie up PGE employees. Nervous Nellies can check and refresh to their heart’s content.
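
For teams that want to borrow the data freshness and customer count ideas for their own status page, here is a minimal Python sketch. It is not how PGE builds its map; the `OutageSummary` class, its field names and the "two missed refreshes means stale" threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OutageSummary:
    """Hypothetical status-page summary; names and fields are illustrative."""
    customers_affected: int
    last_updated: datetime
    refresh_interval: timedelta

    def is_stale(self, now: datetime) -> bool:
        # Assumption: data counts as stale after two missed refresh cycles.
        return now - self.last_updated > 2 * self.refresh_interval

    def banner(self, now: datetime) -> str:
        # Show the live count plus when it was updated and how often it refreshes.
        minutes_ago = int((now - self.last_updated).total_seconds() // 60)
        refresh_minutes = int(self.refresh_interval.total_seconds() // 60)
        note = " (data may be out of date)" if self.is_stale(now) else ""
        return (
            f"{self.customers_affected:,} customers affected. "
            f"Updated {minutes_ago} min ago, refreshes every {refresh_minutes} min{note}"
        )
```

Telling readers when the numbers were last refreshed, and flagging when they are old, is what lets customers trust the count instead of calling support to confirm it.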

The App Worth Installing

I am loath to install an app on my phone these days, but the PGE app has earned a spot in my must-haves.

The design is simple and conveys information so clearly. Let’s look at their app’s outage status page:

PGE app page

Look at that incident resolution timeline! No need to be a lineman to grok it.

  1. Simple stages of an outage with key moments highlighted for customers, especially “crew dispatched,” which means help is on the way!
  2. Updating with the cause of the outage sheds more light on roughly how long to expect mitigation to take. It’s not perfect, but it’s something.

Let There Be Light

Yay! The lights hum back to life, all the machines start making noises, heat flows again and PGE sends a little wrap-up text.

PGE Text

Imagine if you’d gotten that text and were still sitting shivering in the dark. You’d probably be PO’d until you read the final sentence and could easily inform PGE of their grave error by replying STILLOUT.

What a masterclass in customer communication!

Quiz: Is Your Incident Response Better Than PGE’s?

  1. Do you proactively notify customers about impact vs. customers notifying you?
  2. Is it both easy for customers to report impact and efficient for customer support to triage?
  3. Do you regularly provide status updates and info in a customer-friendly way? (For example: Unless your customers are devs, no need to go into detail about a database lockup.)
  4. While an incident is occurring, are you able to determine how many customers are affected?
  5. Can customers subscribe to a feed of outage notifications? Are they tailored to the features/products they use or is it all outages?
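
Questions 4 and 5 can be made concrete with a small Python sketch. It assumes you already keep a record of which products each customer uses and which ones they have subscribed to for alerts; the `Customer` class and both helper functions are hypothetical names, not part of any real tool.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    """Hypothetical customer record; fields are illustrative."""
    name: str
    products: set[str]                                 # products the customer uses
    subscribed: set[str] = field(default_factory=set)  # products they want alerts for


def affected_customers(customers, impacted_products):
    """Question 4: which customers use at least one impacted product?"""
    impacted = set(impacted_products)
    return [c for c in customers if c.products & impacted]


def notifications(customers, impacted_products, message):
    """Question 5: notify only subscribers of the impacted products."""
    impacted = set(impacted_products)
    out = {}
    for c in customers:
        relevant = c.subscribed & impacted
        if relevant:
            out[c.name] = f"{message} Affected: {', '.join(sorted(relevant))}."
    return out
```

Keeping "who is affected" separate from "who asked to be told" means you can report a real impact number during the incident while still sending each customer only the notifications they opted into.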

TNS owner Insight Partners is an investor in: Pragma.