SoundHound Expands into Voice-Driven Digital Assistance

Music information service SoundHound foresees the future as powered by voice — but then that was the founders’ vision back in 2005.
“Six years out, we will be interacting with connected devices everywhere you go. … Voice will be the most effective, expedient way — in medical it will be the most hygienic and safe. In the car, it will be a transformative experience,” said Katie McMahon, SoundHound vice president and general manager.
While popular virtual personal assistants such as Apple’s Siri, Microsoft’s Cortana, Amazon’s Alexa and Google Assistant may be better known, SoundHound has expanded beyond its music search service to provide a generalized voice interface that the company claims is superior at handling complex queries and follow-up questions.
It’s working with a range of big-name enterprises, including 11 automakers, to create custom voice experiences — and wants to help developers add voice components to their own applications.
More Than 10 Years on AI
SoundHound — brainchild of James Hom, Keyvan Mohajer and Majid Emami — was born in a Stanford dorm room.
The Santa Clara, Calif.-based company’s first product was the music-recognition mobile app SoundHound, with capabilities including the ability to identify a song the user sings or hums. Hound is its voice-enabled digital assistant and Houndify its voice-AI developer platform.
The company spent more than 10 years working on artificial intelligence in the lab, with scientists from distinct specialties such as speech recognition, neural networks and natural language processing, McMahon explained.
“This is a late-stage startup, not a pivot to slap on the term ‘AI,’ ” she said. “We built the platform specifically to help developers leverage everything it took 10 years to build.”
The global intelligent virtual assistant market is projected to reach $12.28 billion by 2024, according to Grand View Research, and Juniper Research has predicted voice-enabled products will be in 55 percent of U.S. households by 2022.
The SoundHound technology is used in the home robot Kuri, the personal agent system in Hyundai cars, the Rand McNally OverDryve navigation system and in ModiFace’s augmented reality “smart mirror” that allows users to try on different makeup.
It’s also used in wearables, including the Casio G-SHOCK GBA-400 Bluetooth smartwatch and the Onkyo VC-NX01, a personal assistant you wear around the neck.
The Houndify platform provides developers with more than 150 domain sets of data, including information from partners such as Uber, Yelp, Expedia, AccuWeather and more.
The company is promoting what it calls Collective AI, a collaborative community among developers. For instance, if developer A creates a location domain and clicks the boxes on the site making it shareable and extensible, developer B can use it in a ride-sharing domain that’s shareable, and developer C can use both in a restaurant domain.
The result is the ability to answer questions such as: “How much does it cost to go from the nearest airport to the best Italian restaurant in San Francisco that has four stars, is good for kids, is not a chain, and is open after 9 p.m. on Wednesdays, and how long is the trip?”
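The sharing arrangement described above can be pictured with a small Python sketch. It is only an illustration of the idea: the `Domain` class, the `shareable` and `extensible` flags as fields, and the `compose` and `all_dependencies` helpers are hypothetical names, not part of the Houndify SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Domain:
    """Hypothetical stand-in for a voice domain published by a developer."""
    name: str
    shareable: bool = False
    extensible: bool = False
    uses: List["Domain"] = field(default_factory=list)


def compose(name: str, dependencies: List[Domain]) -> Domain:
    """Build a new shareable domain on top of others that allow reuse."""
    for dep in dependencies:
        if not (dep.shareable and dep.extensible):
            raise PermissionError(f"{dep.name} is not shareable and extensible")
    return Domain(name, shareable=True, extensible=True, uses=list(dependencies))


def all_dependencies(domain: Domain) -> List[str]:
    """List every domain reachable from this one, without duplicates."""
    seen: List[str] = []
    for dep in domain.uses:
        for dep_name in all_dependencies(dep) + [dep.name]:
            if dep_name not in seen:
                seen.append(dep_name)
    return seen


# Developer A shares a location domain, B builds ride sharing on it,
# and C builds a restaurant domain on both.
location = Domain("location", shareable=True, extensible=True)
ride_sharing = compose("ride_sharing", [location])
restaurant = compose("restaurant", [location, ride_sharing])

print(all_dependencies(restaurant))
```

In this sketch a query that spans restaurants, rides and locations would be answerable by the restaurant domain because it can reach the other two; a domain that was never marked shareable and extensible cannot be built upon.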
Its strategy has been to partner with large companies, offering to let them retain control of the voice experience and create custom experiences for their brand.
In May, it announced a $100 million corporate round from investors including Tencent, Hyundai, Orange and Daimler. It has raised $215 million total.
As Mohajer pointed out at the funding announcement, “These are companies that have successful products and a huge user base, and they clearly understand that voice AI is a very important strategic area for them.”
That huge user base serves SoundHound’s adoption strategy as well.
Complex Queries
Houndify combines a speedy speech recognition system and sophisticated algorithms for natural language understanding, enabling it to understand people’s normal speech patterns.
There are two parts to Houndify’s “special sauce,” according to McMahon. One is what the company calls speech meaning: taking speech input, understanding it and quickly providing results.
“Every other system uses a two-step process. First, they run voice queries through ASR (automated speech recognition), which pumps out a text script translation that can then go into an NLU (natural language understanding) engine, looking for the meaning. In any two-step process, there’s error, there’s latency. If there’s a mistake in the first step, then the second step isn’t great,” she explained.
She says the company has increased accuracy by combining the two steps.
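McMahon’s argument about two-step pipelines can be made concrete with a little arithmetic. The sketch below is not a model of SoundHound’s system; it assumes, purely for illustration, that the two stages fail independently and run one after the other, and the accuracy and latency figures are invented.

```python
def pipeline_accuracy(asr_accuracy: float, nlu_accuracy: float) -> float:
    """End-to-end accuracy when both stages must succeed independently."""
    return asr_accuracy * nlu_accuracy


def pipeline_latency_ms(asr_ms: float, nlu_ms: float) -> float:
    """Total latency when the second stage waits for the first to finish."""
    return asr_ms + nlu_ms


# Hypothetical figures, chosen only to show the effect.
accuracy = pipeline_accuracy(asr_accuracy=0.95, nlu_accuracy=0.90)
latency = pipeline_latency_ms(asr_ms=300, nlu_ms=200)

print(f"end-to-end accuracy: {accuracy:.3f}")
print(f"end-to-end latency: {latency:.0f} ms")
```

Under these assumptions the chained system is less accurate than either stage alone and slower than either stage alone, which is the cost McMahon attributes to any two-step process.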
It also doesn’t use pure entity detection.
“Say you want to go to a restaurant, but you don’t want to go to a Chinese restaurant. You can say to Siri, ‘Show me restaurants but not Chinese restaurants.’ It heard ‘restaurant’ and ‘Chinese’ and weighted those so it will show you everything you did not want,” she said.
“We can handle a query as specific as, ‘Show me Asian restaurants excluding Thai and Japanese.’ The response will come back. Then you can do a follow-on to further refine: ‘Show only those that have outdoor seating and are open after 8 p.m.’ ”
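The difference McMahon describes, between matching keywords and honoring an exclusion followed by a refinement, can be sketched in Python. Everything here is hypothetical: the restaurant data, the `ASIAN_CUISINES` set and the three functions are invented for illustration, and the sketch starts from an already-parsed query rather than showing how any real assistant interprets speech.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Restaurant:
    name: str
    cuisine: str
    outdoor_seating: bool
    closes_at: int  # closing hour on a 24-hour clock


# Hypothetical grouping of cuisines under "Asian".
ASIAN_CUISINES = {"chinese", "thai", "japanese", "vietnamese", "korean"}

RESTAURANTS = [
    Restaurant("Pho Corner", "vietnamese", True, 22),
    Restaurant("Seoul Table", "korean", False, 23),
    Restaurant("Bangkok Garden", "thai", True, 22),
    Restaurant("Sushi Row", "japanese", True, 21),
    Restaurant("Golden Wok", "chinese", True, 19),
    Restaurant("Trattoria Uno", "italian", True, 23),
]


def keyword_match(items: List[Restaurant], keywords: Set[str]) -> List[Restaurant]:
    """Naive matching: any mentioned cuisine counts as wanted, even after 'not'."""
    return [r for r in items if r.cuisine in keywords]


def asian_excluding(items: List[Restaurant], excluded: Set[str]) -> List[Restaurant]:
    """Honor the exclusion: Asian cuisines minus the ones the user ruled out."""
    return [r for r in items if r.cuisine in ASIAN_CUISINES - excluded]


def refine(previous: List[Restaurant], open_after: int) -> List[Restaurant]:
    """Follow-up query: narrow the previous results instead of starting over."""
    return [r for r in previous if r.outdoor_seating and r.closes_at > open_after]


first = asian_excluding(RESTAURANTS, excluded={"thai", "japanese"})
follow_up = refine(first, open_after=20)

print([r.name for r in keyword_match(RESTAURANTS, {"chinese"})])
print([r.name for r in first])
print([r.name for r in follow_up])
```

The key design point is that `refine` takes the previous result list as input, so the follow-on question is answered in the context of the first one rather than as a fresh search.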
SoundHound offers a SaaS platform, but also works with clients on a royalty-based business model. More than 60,000 developers have registered on the Houndify site, she said.
The company recently announced it’s a launch partner for Snap Kit, Snapchat’s third-party integration platform for iOS and Android.
It offers SDKs for iOS, Android, C++, Windows C++, Wiced C++, Houndify Explorer, C#, Java, JavaScript, Python and Go.
It’s working on what it calls “natural voice interfacing,” which eliminates the need for a “wake word” — you just start talking naturally.
Feature image via Pixabay.