Hedley, the robotic skull needs to react to voice commands so he can do cool things and help me in my quest for world domination. Amazon’s cloud-based Alexa voice service can easily be implemented on the Raspberry Pi 3 embedded in Hedley’s cranium. I wrote about putting Alexa on a Pi early last year.
That’s all well and good … if your robot is near a Wi-Fi hot-spot. If not, it just won’t listen.
There is good news, however: PocketSphinx is a voice recognition framework spun out of Carnegie Mellon University (CMU) that gives pretty good results and works all its magic on the device. There's no need for a network connection.
Today, we’ll discuss getting PocketSphinx up and running on Hedley’s Raspberry Pi 3 board. We’ll need an internet connection to install the software and build a language model. After that, PocketSphinx can be used without a network and with its own onboard word list.
The easiest way to install PocketSphinx is by using the Synaptic application manager.
Start Synaptic on the Raspberry Pi and use the search function to get a list of programs related to “PocketSphinx.” Checkmark the list items for installation. Next, click the “Apply” button in the main Synaptic toolbar, followed by another “Apply” button in the Summary pop-up window. Synaptic will go through its paces and install PocketSphinx on the Pi.
Once the installation completes, exit Synaptic and move on to building a language model using the browser-based “lmtool” program.
lmtool converts a regular text file of words and phrases into corresponding sounds that are “recognized” when you run the PocketSphinx program and speak into the microphone.
I used the vim editor to build a simple language model text file. Any editor that outputs regular ASCII text will work. I put the following lines in a file called commands.txt and saved it in my /home/pi/hedleytheskull directory.
hello hedley
introducing doctor torq
turn light on
turn light off
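If you'd rather stay in the terminal, the same corpus file can be written with a here-document. This sketch creates commands.txt in the current directory; the article keeps the file in /home/pi/hedleytheskull, so adjust the path to match your setup.

```shell
# Write the four command phrases into commands.txt in the
# current directory (the article uses /home/pi/hedleytheskull)
cat > commands.txt <<'EOF'
hello hedley
introducing doctor torq
turn light on
turn light off
EOF
```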
You’ll probably want to keep the model to fewer than a couple dozen phrases to keep recognition speed up. The more words in the model, the slower the response. Running PocketSphinx without the specific -dict and -lm file options is quite slow, since it falls back on a large default language model. The program will also mix and match the words in your language model, so it will recognize combinations of words that aren’t spelled out as a specific line in the file. “hello, doctor torq” would be recognized, for example.
I opened the Firefox browser and traveled to CMU’s Sphinx knowledge base tool page. Next, I clicked the “Browse” button where it reads “Upload a sentence corpus file” and chose commands.txt from the /home/pi/hedleytheskull directory.
You can then save the .dic and .lm files into your working directory. Alternatively, download and unzip the tar (.tgz) file to get the two files into your working directory. My run of the lmtool program produced a file named TAR9363.tgz. I unzipped it in a terminal with the following command line.
pi@hedley:~ tar -xvzf TAR9363.tgz
tar unzipped the files into the following set.
9363.dic
9363.lm
9363.log_pronounce
9363.sent
9363.vocab
That’s it for installing PocketSphinx and building a language model. Let’s now look at how to actually recognize speech from the command line.
Talk to Me
I used a standard Logitech C270 USB webcam with a built-in microphone as a voice input device.
PocketSphinx has over 120 command line options. You can see the list by typing pocketsphinx_continuous at the command line. You only need a couple of them to actually get it to recognize your words.
Here’s a sample command line I used.
pi@hedley:~ pocketsphinx_continuous -dict /home/pi/hedleytheskull/speech/9363.dic -lm /home/pi/hedleytheskull/speech/9363.lm -inmic yes -adcdev plughw:2,0 -logfn /dev/null
Note that I used full path names for the .dic and .lm files. The -inmic yes option turns on the microphone for input. Use -logfn /dev/null to suppress the mountain of log data you’d otherwise see on the screen. The log data is great for diagnostics, though, if you need it.
The -adcdev option took a while to figure out. This option works with the Linux audio subsystems and defines your capture device. Run the following command to find the appropriate device.
pi@hedley:~ cat /proc/asound/pcm
Here are the results on Hedley.
00-00: bcm2835 ALSA : bcm2835 ALSA : playback 7
00-01: bcm2835 ALSA : bcm2835 IEC958/HDMI : playback 1
01-00: MAI PCM vc4-hdmi-hifi-0 :  : playback 1
02-00: USB Audio : USB Audio : capture 1
Notice the capture device at the bottom. The numbers at the beginning of that line go into the plughw parameter as card,device: 02-00 becomes plughw:2,0, and you are good to go.
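That lookup can be scripted, too. Here’s a sketch that pulls the first capture line out of the pcm listing and builds the plughw string; the sample text below stands in for the live file, so swap it for pcm=$(cat /proc/asound/pcm) on a real Pi.

```shell
# Sample /proc/asound/pcm output; on the Pi use:
#   pcm=$(cat /proc/asound/pcm)
pcm="00-00: bcm2835 ALSA : bcm2835 ALSA : playback 7
02-00: USB Audio : USB Audio : capture 1"

# Grab the first line listing a capture device
line=$(printf '%s\n' "$pcm" | grep capture | head -n 1)

# Split the leading "card-device" pair, e.g. "02-00"
card=${line%%-*}                  # "02"
dev=${line#*-}; dev=${dev%%:*}    # "00"

# Strip leading zeros and print the -adcdev value
echo "plughw:$((10#$card)),$((10#$dev))"
```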
Enter the command, wait a couple of seconds, then say one of the language model lines. You should see a “ready” prompt, and shortly afterward the spoken line will appear on the screen.
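Once phrases are coming through, the recognizer’s output can drive other programs. Here’s a minimal sketch of dispatching recognized phrases with a shell case statement; the echo stands in for piping pocketsphinx_continuous output into the loop, and the handler actions are just placeholders.

```shell
# Dispatch recognized phrases to actions. In real use, replace
# the echo with the pocketsphinx_continuous command line above.
echo "turn light on" | while read -r phrase; do
    case "$phrase" in
        "turn light on")  echo "switching light on"  ;;  # placeholder action
        "turn light off") echo "switching light off" ;;  # placeholder action
        "hello hedley")   echo "hello there"         ;;  # placeholder action
    esac
done
```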
I’d recommend installing PocketSphinx on a clean build of the latest version of Raspbian. Recognition speed was almost real time a couple of weeks ago, when I first set up the program. Since then I’ve added a lot of software to the wimpy little 8GB low-end micro-SD card on Hedley’s Pi. It’s at about 95% capacity, which could be slowing things down a bit.
Recent tests take 10 or 15 seconds to recognize a phrase. I’ll install Raspbian on a new 32GB Samsung EVO+ card shortly and expect the response to be back to normal. I highly recommend these cards.
You can also try out the other command line options to tailor the program’s behavior to your needs.
The next step is to integrate speech into some of my Python programs using the PocketSphinx-to-Python API. We’ll explore that topic in a future column.
Feature image via Pixabay.