We’ve come a long way since “Press 1 for Yes, 2 for No.” Over the past decade, tech giants such as Amazon, Apple, Google, and Microsoft have been developing proprietary voice platforms to power virtual assistants that talk with us. Building on recent dramatic advances in voice recognition, these companies have also begun selling consumer hardware with their chatty assistants built in.
Home devices like the Amazon Echo and Google Home have been hot items in consumer tech, letting people ask about the weather, traffic on the way to work, and upcoming meetings for the workday, all hands-free. A few assistants have become so pervasive that they have permeated pop culture and established their own recognizable personalities. Apple’s Siri was even credited in the cast of The Lego Batman Movie as the voice of the Batcomputer.
Now imagine IT practitioners conversing with the Internet of Things to efficiently handle day-to-day duties, coordinate major incident responses, and check in on the health of their services and teams. It’s digital operations handled by voice, or, if you like, VoiceOps. While there’s still work to be done, this vision is closer to reality than movie magic, considering the prototypes we’ve developed so far at PagerDuty.
People Will Use Voice – If It Works
Conversational interfaces are already gaining popularity and earning consumers’ trust. For instance, 15 percent of consumers use chat to interact with health care providers, and 33 percent of consumers chat with banks, either with a real person or via chatbot. Even though these aren’t spoken interactions, they show that consumers are willing to use natural language interfaces to share sensitive details such as health care and financial information.
Voice assistants are convenient for personal use, but a voice-interaction platform that IT practitioners can use every day has even greater potential. The big tech companies behind the most popular voice-powered assistants, such as Amazon, Apple and Google, have invested heavily in consumers yet have only scratched the surface of what’s possible for the enterprise. People who buy consumer products also have jobs, and chances are they work at a company, so there’s a growing market for voice capabilities in the workplace. In September, we attended the Learning to Talk meetup hosted by Menlo Ventures and were excited to come across a few companies designing for the enterprise. Nuance, for instance, helps support centers transcribe calls and suggests machine-generated solutions for human agents to use in response to callers’ issues.
Giving a Voice to PagerDuty
Our VoiceOps project began almost on a whim. Our Vice President of Product, Rachel Obstler, approached us with an inspiring question: How might voice interaction transform how people manage their digital operations? After mapping out the possibilities, our team of three (a UX designer, a product manager and a developer) set to work. Our initial prototype used a Google Home device and public PagerDuty APIs to answer basic peacetime questions from responders, such as “When am I on call next?”
From there, we considered how such functionality might be used by a manager or executive and programmed our system to answer questions such as “How was my team’s night?” and “Was there any significant downtime in the last 18 hours?”
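To make the downtime question concrete, here is a minimal sketch of how a reply might be assembled. It is not the prototype’s code: the `Incident` record, the `downtime_reply` function, the 15-minute threshold for “significant,” and the idea of treating summed incident durations as downtime are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    """Hypothetical incident record; not the PagerDuty API's schema."""
    title: str
    started: datetime
    resolved: Optional[datetime] = None  # None means the incident is still open


def downtime_reply(
    incidents: list[Incident],
    now: datetime,
    window: timedelta = timedelta(hours=18),
    significant: timedelta = timedelta(minutes=15),  # assumed threshold
) -> str:
    """Build a short spoken answer about downtime inside the look-back window."""
    since = now - window
    total = timedelta()
    count = 0
    for incident in incidents:
        # Clip each incident to the window so only time inside it is counted.
        start = max(incident.started, since)
        end = min(incident.resolved or now, now)
        if end <= start:
            continue  # the incident lies entirely outside the window
        # Simplification: overlapping incidents are counted separately.
        total += end - start
        count += 1

    hours = int(window.total_seconds() // 3600)
    if total < significant:
        return f"No significant downtime in the last {hours} hours."
    minutes = int(total.total_seconds() // 60)
    noun = "incident" if count == 1 else "incidents"
    return (
        f"About {minutes} minutes of downtime across "
        f"{count} {noun} in the last {hours} hours."
    )
```

A real assistant would likely need a richer definition of “significant,” for example one that weighs which services were affected, but even this rough version shows how a spoken question maps onto a time window and a threshold.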
Attendees at PagerDuty Summit 2017 stopped by our User Experience Labs booth to ask our PagerDuty virtual assistant prototype questions and share their prior experiences of voice technologies with us. It was great to hear their first-hand reactions, explore hypothetical scenarios with them, and find out what questions they would like to be able to ask.
On December 5, partnering with the Google Assistant team, we hosted a community event with attendees from HackBright, Code2040 and the PagerDuty community. It was an excellent opportunity for the broader PagerDuty community to explore the potential of VoiceOps in a supportive environment. By fostering a diversity of voices in bringing the project to life, we hope to improve the user experience for all.
As we’ve been experimenting with VoiceOps concepts, we have found in beta testing that if the voice recognition and interaction work, people will use them. If they don’t work as designed, individuals will understandably become frustrated and stop using them. It can also be difficult to get all the devices to work reliably; voice recognition can still be quite spotty when dealing with a range of accents or in rooms with less-than-optimal acoustics. These early investigations tended to experiment with short queries that were unlikely to be misinterpreted.
We’re also thinking about context of use and how to keep system responses sounding natural. If you’re on five different on-call schedules, for example, the complete and precise answer to the question “When am I on call next?” could end up being way more information than you have the patience for when you’re on the go. You probably want a useful yet succinct reply rather than the whole story of your upcoming on-call life, which means we need to apply a layer of situational awareness on top of the raw data. By continuing to explore and prototype, being as creative as we can about our future in the face of short-term challenges, we will be ready for what’s coming, not just what is already here.
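One way to picture that layer of situational awareness is a small function that boils a full list of shifts down to a single spoken sentence. The sketch below is illustrative only and makes several assumptions: the `Shift` record, the `spoken_time` and `next_on_call_reply` helpers, and the choice to name only the nearest shift are all hypothetical, not how the prototype works.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Shift:
    """Hypothetical on-call shift; not the PagerDuty API's schema."""
    schedule: str
    start: datetime
    end: datetime


def spoken_time(when: datetime, now: datetime) -> str:
    """Describe a start time roughly the way a person would say it aloud."""
    hour12 = when.hour % 12 or 12
    suffix = "AM" if when.hour < 12 else "PM"
    clock = f"{hour12}:{when.minute:02d} {suffix}"
    days = (when.date() - now.date()).days
    if days == 0:
        return f"today at {clock}"
    if days == 1:
        return f"tomorrow at {clock}"
    return f"in {days} days at {clock}"


def next_on_call_reply(shifts: list[Shift], now: datetime) -> str:
    """Name only the nearest shift and summarize the rest as a count."""
    upcoming = sorted((s for s in shifts if s.end > now), key=lambda s: s.start)
    if not upcoming:
        return "You have no upcoming on-call shifts."

    first = upcoming[0]
    if first.start <= now:
        lead = f"You are on call right now for {first.schedule}."
    else:
        lead = (
            f"You are next on call {spoken_time(first.start, now)} "
            f"for {first.schedule}."
        )

    others = len(upcoming) - 1
    if others == 0:
        return lead
    noun = "shift" if others == 1 else "shifts"
    return f"{lead} You have {others} more {noun} after that."
```

The design choice here is to answer the question that was literally asked, then offer a count as a hook for a follow-up, rather than reading out every schedule in full.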
Will VoiceOps Become Its Own ‘Thing’?
Incidents happen, dinners are interrupted, and your laptop is not always within arm’s reach. In our efforts to improve the lives of people on the IT operations front lines, feedback on our VoiceOps concepts has been both encouraging and enlightening. As for next steps, we’re currently engineering and testing VoiceOps for hands-free incident response scenarios. We’re also developing functionality that lets managers get answers to questions about team health, system performance, and the effectiveness of incident response.
Whether or not VoiceOps becomes a phenomenon in its own right remains to be seen. At the very least, we’re entering an era of multimodal user interfaces. As part of your professional role, you may accomplish some things through a gesture, others through typing bot commands, and still others with your voice. In the end, the best and most elegant interaction will be whatever happens to be the most convenient for you in the given situation. And that’s as it should be.
People have communicated human intent to computers through punch cards, through cryptic strings of letters nestled among precisely paired parentheses, and via keyboards, mice and fingerprint-smeared glass. Technology is now approaching the point where it can listen and speak to us on our own terms, and isn’t it about time?
In the words of Liz Lemon of 30 Rock fame, “I want to go to there.”