Machine-to-human interfaces are all the rage right now.
Plenty of networked, Alexa-style solutions for voice recognition and machine speech exist. They are amazingly lifelike, harnessing artificial intelligence on an internet-connected cloud server somewhere to work their magic. I covered setting up Alexa on a Raspberry Pi early last year.
While all this is interesting and useful, any break in internet connectivity renders those systems inoperable. Fortunately, with the immense horsepower built into today’s nano-Linux systems, such as the Raspberry Pi, we don’t need an internet connection. We can do text-to-speech locally, right there on the board, using eSpeak.
Hedley, the robotic skull, will definitely need the ability to talk to humans, especially when he isn’t connected to the internet.
Today, I’ll cover how to get started with eSpeak on a Raspberry Pi 3. Note that I used an HDMI monitor and a Logitech wireless keyboard/mouse-pad hooked up to the Pi in a regular desktop configuration for this story. There is another readily available text-to-speech program for Raspbian Linux called Festival. I’ve used it a few times and it sounds pretty good too. Maybe we’ll discuss Festival in a future article.
The eSpeak software is a standard package available in the “Stretch” version of Raspbian Linux. I used the Synaptic package manager for installation, although you could just as easily use apt-get. The following command line installs eSpeak on your machine.
pi@hedley:~$ sudo apt-get install espeak
With the program installed, make sure your volume is up about halfway. I use the little speaker icon on the main taskbar for adjusting sound levels.
Next, run eSpeak with a phrase enclosed in double quotes. Here’s an example.
pi@hedley:~$ espeak "My name is Hedley and I'd like to introduce Doctor Torq...hello Doctor Torq"
You should hear the phrase come out of your HDMI monitor’s speakers. By default, eSpeak speaks with a bit of a male English accent. If you don’t hear anything, the sound may be getting piped to the analog audio port. Plug in an earbud and try the command again. You’ll probably hear the voice in the earphone. Right-click on the speaker icon and select HDMI to route the sound to the speakers in your monitor. Run the command again and the words should come out of the monitor speakers.
I wanted to use an American voice, so I changed it with the “-v en-us” option. Type “espeak --voices” to see all the available voices.
What if you’d like a female American voice? Add a “+f” after the voice designation:
pi@hedley:~$ espeak "My name is Hedley and I'd like to introduce Doctor Torq...hello Doctor Torq" -v en-us+f1
The “+f” is optionally followed by a number. Different numbers adjust the pitch and accent. Experiment with “1, 2, 3” and so on to see what sounds best. As you would expect, a “+m” attached to the “-v” option specifies a male voice.
You can also adjust the sound of the speech with other parameters:
pi@hedley:~$ espeak "My name is Hedley, please help me welcome Doctor Torq to the stage" -v en-us -p 30 -s 150
This example uses the “-p” option for pitch and the “-s” option for speed. Use “espeak --help” for a complete list of options.
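If you’d rather drive eSpeak from a script than retype commands at the shell, the same options map straight onto a subprocess call. Here’s a minimal Python sketch; the helper names `espeak_args` and `say` are my own, not part of eSpeak, and the sketch assumes the espeak binary is on your PATH:

```python
import shutil
import subprocess

def espeak_args(phrase, voice="en-us+f1", pitch=30, speed=150):
    """Build the espeak argument list for a phrase.

    Mirrors the command-line options used above: -v selects the
    voice, -p sets the pitch and -s the speed in words per minute.
    """
    return ["espeak", "-v", voice, "-p", str(pitch), "-s", str(speed), phrase]

def say(phrase, **options):
    """Speak a phrase, but only if espeak is actually installed."""
    if shutil.which("espeak") is None:
        print("espeak not found; would have run:", espeak_args(phrase, **options))
        return
    subprocess.run(espeak_args(phrase, **options), check=True)

# Example usage on the Pi:
# say("My name is Hedley, please help me welcome Doctor Torq to the stage")
```

The `shutil.which` check just keeps the script from crashing on a machine where eSpeak isn’t installed yet.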
I’ve used eSpeak on both an Xubuntu Linux notebook and a Raspbian Linux Raspberry Pi. It works reliably on either one. Interestingly, there is some difference in the inflection and pitch, when using exactly the same phrases and options on different machines. I’ll leave the reader to ponder those implications for our machine-to-human future.
Putting Speech into Your Project
That wraps up the basics of using the command-line eSpeak program. Give it a phrase and it will say the words.
My plan is to use a Python program to speak a phrase when I press a button wired up to a general purpose input/output (GPIO) pin. I’m investigating the best methods of calling eSpeak from within Python, although I’ve done system calls in other Python projects without problems.
I’d like to have Hedley the Skull introduce me to my audience at the beginning of a conference tech talk. When it’s time to start, I’ll push one of the buttons and Hedley might utter one of the phrases from the previous examples. I could then respond with “Thank you, Hedley.” With another button press, he could say something about what I’ll cover. Using this approach, we can carry on a short conversation before launching into my presentation. With practice, the scripted exchange should give the illusion of real machine-to-human interaction.
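A first cut at the button-triggered version might look something like the sketch below. Everything here is an assumption on my part, not a finished build: I’m assuming the RPi.GPIO library, a push button wired between GPIO 17 and ground, and the subprocess approach for calling eSpeak. The pin number and phrases are placeholders.

```python
import shutil
import subprocess

# Hypothetical wiring: one push button between GPIO 17 and ground,
# using the Pi's internal pull-up resistor.
BUTTON_PIN = 17

# Scripted lines for the staged "conversation," one per button press.
PHRASES = [
    "My name is Hedley and I'd like to introduce Doctor Torq",
    "Today Doctor Torq will talk about building a talking robot skull",
]

def speak(phrase, voice="en-us"):
    """Hand a phrase off to the espeak command-line program."""
    if shutil.which("espeak") is None:
        print("espeak not installed; skipping:", phrase)
        return
    subprocess.run(["espeak", "-v", voice, phrase], check=True)

def next_phrase(index, phrases=PHRASES):
    """Cycle through the scripted phrases as the button is pressed."""
    return phrases[index % len(phrases)]

def main():
    # RPi.GPIO only exists on the Pi itself, so import it lazily.
    import RPi.GPIO as GPIO
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    presses = 0
    try:
        while True:
            # Block until the button pulls the pin low, then speak.
            GPIO.wait_for_edge(BUTTON_PIN, GPIO.FALLING)
            speak(next_phrase(presses))
            presses += 1
    finally:
        GPIO.cleanup()

# On the Pi, call main() to start listening for button presses.
```

Cycling through a fixed phrase list is the simplest scheme; a fancier version might map each of several buttons to its own line of dialogue.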
I’m still working on syncing Hedley’s jaw movement to anything he says. It should be pretty cool when it all works together.
Now, how long will it be until we can’t tell if it is a real conversation or just a “show”? Only off-the-shelf hackers can answer that question.
Feature image via Pixabay.