All About Alexa: A Developer’s Look at Amazon Echo
Released just 9 months ago, Amazon Echo is a kind of table-top personal assistant that’s voice-enabled (with far-field voice recognition, so you can activate it from across a room). This month Amazon also announced the new Echo Dot, which lets users extend the voice-enabled assistant to any room (with the same “far-field” range), and the new Amazon Tap, which will deliver its audio output over Dolby stereo speakers.
The voice-activated smart speaker is now getting increased attention, and some positive reviews, from both end users and developers. There are fresh rumors that Google is building its own version. It’s all proof that Amazon has found its way into an attractive and important niche, and the company is actively encouraging developers to help improve its A.I. assistant.
TechCrunch recently called Echo “a surprisingly good product,” noting it can now also pay your credit card bill, call Uber, and order pizza. “Third-party skills add even more capabilities,” Amazon explains on the SDK’s web page, “like ordering a pizza from Domino’s, requesting a ride from Uber, opening your garage with Garageio.” You can play music from Pandora, Spotify, Amazon Music, TuneIn, and iHeartRadio, or listen to audiobooks (from Amazon’s Audible service). Echo can announce sports scores and schedules, set an alarm, calculate the quickest route to work, and even search Yelp.
A recent article in the New York Times adds that Amazon Echo can also provide weather forecasts and shopping lists, along with new skills being built by developers. More importantly, the service is gradually evolving. “At first, the skills were a little goofy, like a Magic 8-Ball simulator or one that offers pick-up lines,” wrote one technology columnist. But Amazon is encouraging developers to come up with fresh new tricks for Echo’s personal assistant. Her name is Alexa, and you can teach her new “skills” with, of course, the Alexa Skills Kit, a collection of self-service APIs.
There’s already an impressive set of connections to other cloud-based services available at IFTTT, many contributed by Amazon itself. Besides the site’s “recipes” for finding a lost phone, you can create a list of songs you’ve listened to, and then send it as a status update to Facebook or Twitter, or export it into a Google Docs spreadsheet.
After a glowing review of Echo appeared in the New York Times, Boston developer Jean-Charles Sisk shared his own experiences building for Echo in the comments on Hacker News. There seems to be something elegant about receiving input from a source other than a mouse or keyboard. “It’s a natural interface with a decent interaction model,” Sisk wrote. “The Echo feels like one of the first voice-based interfaces that isn’t a complete gimmick.”
Beyond the elegance, Echo makes it possible to provide genuinely useful services. “My grandfather passed recently and my grandmother has horrible eyesight,” Sisk shared on Hacker News. “I wrote a skill that enables my grandmother to ask to start a phone call. She is then asked who she would like to call. Once she says a name, she receives a call on her landline, is greeted, and the call is forwarded to the intended recipient.”
“There was a small learning curve (you can’t just say, ‘call John,’ you have to ask her to ‘start a phone call’) but it has worked and has provided an often confused elderly woman with poor eyesight a straightforward way of connecting with her loved ones,” Sisk wrote.
His grandmother still has to pick up her phone separately, and there’s a small cost since the call is ultimately made using Twilio. But he applauds the high-quality speech-to-text capabilities now available, and specifically credits the far-field voice recognition that’s a big part of Echo. “It’s as much the hardware as it is the platform that makes it all work in this case.”
“No experience with speech recognition or natural language understanding is required,” explained David Isbitski, Amazon’s chief evangelist for Alexa and Echo, in a blog post when the device was first released last June. “Amazon does all the work to hear, understand, and process the customer’s spoken request so you don’t have to.”
“Get in early,” he added. “Natural user interfaces, such as those based on speech, represent the next major disruption in computing…”
And of course, there was a nudge toward cloud services and a mention of Amazon’s own suite of cloud products. “If you have an existing cloud-based service, you can easily use that to start. If not, AWS Lambda is a compute service that makes it really easy to build a cloud-based service that responds quickly to a voice request,” Isbitski wrote.
An Emerging Developer Community
There’s already an official developer community on Hackster.io, and Amazon has even created the Alexa Fund, which backs everything from early-stage, pre-revenue companies to established brands. Amazon plans to invest up to $100 million in innovative uses of voice technology.
Besides money, Amazon is also offering support for developers (and eventually, marketing), plus an AWS Activate membership (which provides low-cost cloud resources to selected startups).
And for everyone else, Amazon is offering webinars every Tuesday and is also holding weekly “office hours” with an Alexa technical evangelist.
Judging by its web page, developing for Amazon Echo is a lot like building an Android app. As on Android, developers define intents to handle user requests, but here those requests are passed along to the developer’s own cloud-based service. With Amazon Echo, everything is kicked off by a voice command, which is then mapped onto an intent described in some simple JSON.
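To make that “simple JSON” concrete, here is a minimal Python sketch of an intent schema. It assumes the intent-schema layout the Alexa Skills Kit documented at the time: a top-level "intents" list whose entries name an intent and, optionally, its slots. The custom intent StartPhoneCallIntent (loosely modeled on Sisk’s phone-call skill) and its Contact slot are hypothetical, as are the helpers intent_names and slot_types; AMAZON.HelpIntent, AMAZON.StopIntent and the AMAZON.US_FIRST_NAME slot type are built into the kit.

```python
import json

# A hypothetical intent schema for a phone-call skill. The
# "intents" / "intent" / "slots" layout follows the Alexa Skills Kit intent
# schema; the custom intent and slot names are made up for illustration.
INTENT_SCHEMA = {
    "intents": [
        {
            "intent": "StartPhoneCallIntent",  # hypothetical custom intent
            "slots": [
                # Built-in slot type that recognizes common first names
                {"name": "Contact", "type": "AMAZON.US_FIRST_NAME"},
            ],
        },
        # Built-in intents can be listed without declaring any slots
        {"intent": "AMAZON.HelpIntent"},
        {"intent": "AMAZON.StopIntent"},
    ]
}


def intent_names(schema):
    """Return the names of the intents declared in a schema, in order."""
    return [entry["intent"] for entry in schema["intents"]]


def slot_types(schema, intent_name):
    """Map each slot name of one intent to its declared slot type."""
    for entry in schema["intents"]:
        if entry["intent"] == intent_name:
            return {slot["name"]: slot["type"] for slot in entry.get("slots", [])}
    raise KeyError(intent_name)


if __name__ == "__main__":
    # Serialize the schema as JSON text, the form a skill's configuration takes.
    print(json.dumps(INTENT_SCHEMA, indent=2))
```

Building the schema as a Python dict is only a convenience for inspecting its structure; what Amazon actually receives is the JSON text.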
The Alexa Skills Kit even comes with a series of built-in intents, and on Tuesday Amazon announced some new ones. It’s now possible to add voice-controlled timers and alarms to your apps, as well as voice-controlled pausing of any media that’s playing, mute/unmute functionality, and a voice-controlled way to adjust the volume.
But developers can also create their own intents. Just remember that if you build your own, you also need to supply a “sample utterances” file that lists the many ways an actual user might phrase the command.
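A sample-utterances file is plain text: each line starts with an intent name followed by one phrasing, with slot names in braces. The Python sketch below parses such a file and runs a deliberately naive matcher over it. The intent and slot names are hypothetical, and naive_match is a toy of my own: Alexa’s real speech and language models do far more than literal pattern matching. The toy does illustrate why a phrasing missing from the list, like the “call John” in Sisk’s skill, can fall through.

```python
import re

# Hypothetical sample-utterances file: each line is an intent name, a space,
# and one phrasing a user might speak. {Contact} marks where a slot value goes.
SAMPLE_UTTERANCES = """\
StartPhoneCallIntent start a phone call
StartPhoneCallIntent start a phone call to {Contact}
StartPhoneCallIntent place a call to {Contact}
"""


def parse_utterances(text):
    """Split the file into (intent, utterance) pairs, skipping blank lines."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            intent, utterance = line.split(" ", 1)
            pairs.append((intent, utterance))
    return pairs


def utterance_to_regex(utterance):
    """Turn 'call {Contact}' into a regex with one named group per slot."""
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", utterance)
    regex = ""
    for i, part in enumerate(parts):
        regex += re.escape(part) if i % 2 == 0 else f"(?P<{part}>\\w+)"
    return regex


def naive_match(spoken, pairs):
    """Toy matcher: return (intent, slots) for the first utterance that fits.

    Alexa's own language models generalize well beyond these literal
    patterns; this only shows why listing many phrasings matters.
    """
    for intent, utterance in pairs:
        match = re.fullmatch(utterance_to_regex(utterance), spoken.lower())
        if match:
            return intent, match.groupdict()
    return None, {}
```

For example, naive_match("Place a call to John", parse_utterances(SAMPLE_UTTERANCES)) yields the intent StartPhoneCallIntent with the slot Contact set to "john", while "call John" matches nothing because no such line was listed.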
And of course developers can also specify parameters for intents (called “slots” in the Alexa universe) for any additional data their service needs to receive. When the user speaks, any parameter values in the command are converted to text and passed along with the intent. Some slot types are also built in. For example, spoken words indicating a date can be converted into an actual date format for your service, even more abstract words like “today,” “tomorrow,” and “July.” Spoken numbers can also be converted into digits, and Alexa can recognize the names of every U.S. city with a population over 100,000, along with thousands of popular first names.
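Slot values arrive inside the JSON request body under request.intent.slots, keyed by slot name, with a "value" field when the user actually spoke one. This Python sketch pulls a value out and converts a date slot into a Python date. The GetScheduleIntent request and its slot names are invented, and the date parsing assumes AMAZON.DATE delivers either a full date such as 2016-03-04 or, for a spoken month like “July,” a year-month string such as 2016-07; other forms (weeks, seasons) are deliberately left unhandled.

```python
from datetime import date

# A trimmed, hypothetical IntentRequest body. The request/intent/slots nesting
# follows the Alexa Skills Kit request format; the intent and slot names are
# invented for this example.
SAMPLE_REQUEST = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "GetScheduleIntent",
            "slots": {
                "Day": {"name": "Day", "value": "2016-07"},   # AMAZON.DATE ("July")
                "Count": {"name": "Count", "value": "3"},     # AMAZON.NUMBER
                "City": {"name": "City", "value": "Boston"},  # AMAZON.US_CITY
                "Team": {"name": "Team"},                     # slot the user never filled
            },
        },
    }
}


def slot_value(request_json, slot_name):
    """Return a slot's value as text, or None if the user did not fill it."""
    slots = request_json["request"]["intent"].get("slots", {})
    return slots.get(slot_name, {}).get("value")


def parse_alexa_date(value):
    """Convert an AMAZON.DATE value into a datetime.date.

    Handles full dates ("2016-03-04") and bare months ("2016-07", mapped to
    the first of the month). Other forms, such as weeks ("2016-W10"), return
    None in this sketch.
    """
    if value is None:
        return None
    parts = value.split("-")
    try:
        if len(parts) == 3:
            return date(int(parts[0]), int(parts[1]), int(parts[2]))
        if len(parts) == 2:
            return date(int(parts[0]), int(parts[1]), 1)
    except ValueError:
        return None
    return None
```

Note that even numeric slots arrive as strings, so the service still converts them itself (for instance, int(slot_value(SAMPLE_REQUEST, "Count"))).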
Each spoken command is ultimately handled by an online service, which can run as a function on AWS Lambda or on your own server. Just make sure the service uses SSL: it must accept HTTPS requests on port 443 and return its responses over HTTPS. The first step is to validate each request’s signature and check its timestamp. Then, broadly speaking, your service will either launch or end a session or handle a specific request, and the code can be written in a number of languages, including but not limited to Java.
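For a self-hosted service, the timestamp check and the first part of the signature check can be sketched in Python with the standard library alone. The sketch assumes the rules Amazon has documented for such endpoints: the request timestamp must fall within 150 seconds of the current time, and the SignatureCertChainUrl header must point at HTTPS on s3.amazonaws.com (port 443 if a port is given) under the case-sensitive /echo.api/ path. It covers only those preliminary checks; downloading the certificate chain and verifying the Signature header against the raw request body requires a cryptography library and is not shown. The function names are hypothetical.

```python
import posixpath
from datetime import datetime, timezone
from urllib.parse import urlparse

# Amazon's documentation for self-hosted skills has allowed at most 150
# seconds between the request timestamp and the time it is checked.
MAX_SKEW_SECONDS = 150


def timestamp_is_fresh(timestamp, now=None):
    """Return True if an ISO 8601 UTC timestamp ("...Z") is close enough to now."""
    sent = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return abs((now - sent).total_seconds()) <= MAX_SKEW_SECONDS


def cert_url_is_valid(url):
    """Vet the SignatureCertChainUrl header before fetching the certificate."""
    parsed = urlparse(url)
    if parsed.scheme.lower() != "https":
        return False
    if (parsed.hostname or "").lower() != "s3.amazonaws.com":
        return False
    if parsed.port not in (None, 443):
        return False
    # Collapse "/echo.api/../x" tricks before checking the (case-sensitive) path.
    return posixpath.normpath(parsed.path).startswith("/echo.api/")
```

Passing now explicitly keeps timestamp_is_fresh testable; in production it defaults to the current UTC time. A request that fails either check should be rejected before any skill logic runs.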
Alexa converts any text it receives from your service back into speech for the lucky end user.
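Putting the pieces together, the Python sketch below routes the three kinds of requests the service handles (launching a session, handling an intent, ending a session) and wraps its reply text in the JSON envelope that Alexa reads aloud. The envelope fields (version, outputSpeech, shouldEndSession) follow the Alexa Skills Kit response format; StartPhoneCallIntent and its Contact slot are hypothetical, and the step that would actually place a call (Sisk used Twilio) is left out. lambda_handler uses the (event, context) signature AWS Lambda expects of Python functions, but handle_request itself could just as well sit behind your own HTTPS server.

```python
# Minimal request router for a hypothetical phone-call skill. The response
# envelope (version / response / outputSpeech / shouldEndSession) follows the
# Alexa Skills Kit JSON format; placing the actual call is left out.


def build_response(text, end_session=True):
    """Wrap plain text in the JSON that Alexa turns back into speech."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(event):
    """Route launch, intent, and session-ended requests to a reply."""
    request = event["request"]
    kind = request["type"]
    if kind == "LaunchRequest":
        # Leave the session open so the user can answer the question.
        return build_response("Who would you like to call?", end_session=False)
    if kind == "IntentRequest":
        intent = request["intent"]
        if intent["name"] == "StartPhoneCallIntent":  # hypothetical intent
            contact = intent.get("slots", {}).get("Contact", {}).get("value")
            if contact is None:
                return build_response("Who would you like to call?", end_session=False)
            return build_response(f"Calling {contact} now.")
        if intent["name"] in ("AMAZON.StopIntent", "AMAZON.CancelIntent"):
            return build_response("Goodbye.")
        return build_response("Sorry, I didn't catch that.")
    # SessionEndedRequest: nothing can be spoken, so return an empty response.
    return {"version": "1.0", "response": {}}


def lambda_handler(event, context):
    """Entry point when the service runs as an AWS Lambda function."""
    return handle_request(event)
```

Setting shouldEndSession to false after the launch prompt is what keeps Alexa listening for the user’s answer, which mirrors the “who would you like to call?” exchange in Sisk’s skill.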
“[T]he Echo has a way of sneaking into your routines,” wrote Farhad Manjoo in The New York Times. “When Alexa reorders popcorn for you or calls an Uber car for you, when your children start asking Alexa to add Popsicles to the grocery list, you start to want pretty much everything else in life to be Alexa-enabled, too.”
If this is the next big disruptive technology, Amazon is already staking out its turf. And Echo could give the company a powerful new tool for driving purchases from Amazon.
Images from Amazon.