Off-The-Shelf Hacker: Conversations with Hedley, the Robotic Skull

13 Feb 2019 3:00pm

One of my dreams has always been to carry on a conversation with Hedley, the robotic skull, during shows and small-scale demos. Speaking to a skull and actually having it answer in any meaningful, real-time way is pretty complex. Alexa-styled responses are possible, although they require a solid connection to the internet. That’s not always available when you’re sitting outdoors at a patio table during a vendor-sponsored local-pub demo gig. Standalone hardware artificial intelligence (AI) machine vision is getting pretty mature. Hedley’s JeVois machine vision sensor is a good example. AI speech, not quite so much.

Like others in the talking robot niche, my solution is to “simulate” a conversation using scripted replies from Hedley. He and I become actors who say our lines at the right time.

We’ve covered Hedley’s Arduino and jaw servo combination along with the recently added audio amplifier and speaker installed in the roof of his mouth.

The last major piece is the software used to control Hedley’s utterances during an exchange. I covered a baseline Processing program back in September but didn’t have any of the audio file list stepping or push-button code developed at the time.

Making Scripts

In this latest version, the program opens a list of pre-recorded WAV sound files and steps through them as Hedley and I talk to each other. Additionally, the sound is analyzed on the fly, producing a digital stream that feeds the Arduino-jaw servo sub-system. The sound files can be created either by recording with a microphone and Audacity or by saving the audio output of a text-to-speech command line tool like eSpeak. You can develop much of the code on the Linux notebook and then move everything over to the Raspberry Pi to finish out the push button code. Alternatively, you might use the “Upload to Pi” function under the Processing version 3.4 IDE tool tab. This lets you develop on the notebook and run it on the Pi over the network. Be sure to put the audio list text and WAV files in the appropriate directory on the Pi.
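If you go the eSpeak route, the WAV files can be generated in a batch. Here is a minimal Python sketch, not the Processing code that runs on Hedley, that builds one eSpeak command per line of dialog. The phrases, the file names and the speed setting are made up for illustration. The -v, -s and -w options are eSpeak's voice, words-per-minute and write-to-WAV flags.

```python
import shlex

def espeak_command(text, wav_path, speed_wpm=150, voice="en"):
    """Build the argument list for: espeak -v VOICE -s SPEED -w FILE TEXT."""
    return ["espeak", "-v", voice, "-s", str(speed_wpm), "-w", wav_path, text]

# Hypothetical lines for Hedley; substitute your own script.
lines = ["Hello, I am Hedley.", "Nice to meet you."]

# One command per phrase, writing hedley_01.wav, hedley_02.wav, and so on.
commands = [
    espeak_command(line, f"hedley_{i:02d}.wav")
    for i, line in enumerate(lines, start=1)
]

for argv in commands:
    # Print the shell-quoted command so it can be checked before running.
    print(shlex.join(argv))
```

To actually produce the files on a machine with eSpeak installed, each argument list could be handed to subprocess.run(argv, check=True).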

Here’s the code that runs on Hedley’s Raspberry Pi.

As always, we start out initializing variables, constants and libraries. Next, we count the number of lines in the audio file list to use as an index for each phrase as we step through the list.
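The line-counting step might look something like this in Python. This is only a sketch of the idea, not the Processing code itself. The file name audio_list.txt and the one-WAV-name-per-line layout are assumptions, and the example writes a throwaway list to a temporary directory so it can run anywhere.

```python
import tempfile
from pathlib import Path

def load_audio_list(list_path):
    """Return the WAV file names listed one per line, skipping blank lines."""
    text = Path(list_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]

# Build a small, hypothetical audio list just for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    list_file = Path(tmp) / "audio_list.txt"
    list_file.write_text(
        "hello.wav\nhow_are_you.wav\n\ngoodbye.wav\n", encoding="utf-8"
    )
    phrases = load_audio_list(list_file)

# The number of entries sets the upper bound for the phrase index.
phrase_count = len(phrases)
print(phrase_count, phrases)
```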

Going into the main program (draw), we need to check the button status. If no button push is detected, the program just plays the first audio file using the SoundFile class. A button push will increment to the next audio file. You can vary the playback rate and other parameters to get the sound you want. The Amplitude class gives us a digital record of the audio file that we can then reference with rms.analyze in the next statement block. rms.analyze returns a digital stream of data that we then send to Hedley’s Arduino Nano microcontroller over the USB line. The Nano interprets the stream and moves the servo based on the data while the audio plays.
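The logic in that loop boils down to two small jobs: step the phrase index on a button push, and turn the current loudness into a jaw position. Below is a rough Python sketch of both, written under a few assumptions of mine. Samples are normalized to the range -1.0 to 1.0, the jaw travels from 0 to 30 degrees, and the gain of 2.0 is an arbitrary starting point. None of these names come from the Processing program.

```python
import math

def rms(samples):
    """Root-mean-square level of samples normalized to -1.0..1.0."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def jaw_angle(level, max_angle=30, gain=2.0):
    """Map an RMS level to a whole-degree jaw angle, clamped to 0..max_angle."""
    scaled = min(max(level * gain, 0.0), 1.0)
    return round(scaled * max_angle)

def next_index(index, button_pressed, count):
    """Advance to the next phrase on a button push, stopping at the last one."""
    if button_pressed and index < count - 1:
        return index + 1
    return index

silence = [0.0, 0.0, 0.0, 0.0]
loud = [1.0, -1.0, 1.0, -1.0]

closed = jaw_angle(rms(silence))    # no sound, jaw stays shut
wide_open = jaw_angle(rms(loud))    # full-scale sound, jaw fully open
print(closed, wide_open)
```

In a real setup, each angle would be written out to the Nano over the serial port, for example with the third-party pyserial package, instead of being printed.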

The whole process is pretty quick. There is an ever-so-slight lag between the audio coming out of Hedley’s mouth and his synchronized jaw movement. Most audience members probably won’t see it.

Going Further

While the Processing program works well, there is plenty of room for improvement. For example, I’ve thought about eliminating the Arduino Nano that controls the jaw servo and actuating the jaw right from the Raspberry Pi. I don’t think it would take much more than adding a reference to the servo library at the beginning and rewriting the code to send servo angles (from 0 to 30 degrees) out to the servo over a general-purpose input/output (GPIO) pin on the Pi. Nixing the Nano would simplify overall power management and maybe even remove the tiny lag between the audio and jaw movement.
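Driving the servo straight from the Pi comes down to converting an angle into a pulse width. Here is a small Python sketch of that arithmetic. It assumes a typical hobby servo that expects a 50 Hz signal with a 1.0 ms pulse at 0 degrees and a 2.0 ms pulse at 180 degrees. Those numbers vary from servo to servo, so check the datasheet before trusting them.

```python
def pulse_width_ms(angle_deg, min_ms=1.0, max_ms=2.0, full_range_deg=180.0):
    """Pulse width for an angle, assuming a linear 1.0-2.0 ms servo range."""
    angle = min(max(angle_deg, 0.0), full_range_deg)
    return min_ms + (max_ms - min_ms) * angle / full_range_deg

def duty_cycle_percent(angle_deg, frequency_hz=50.0):
    """Duty cycle (0-100) that produces the pulse width at the given frequency."""
    period_ms = 1000.0 / frequency_hz
    return 100.0 * pulse_width_ms(angle_deg) / period_ms

# Hedley's jaw only needs the 0 to 30 degree part of the range.
for angle in (0, 15, 30):
    print(angle, round(pulse_width_ms(angle), 3), round(duty_cycle_percent(angle), 2))
```

The duty cycle is the kind of value a GPIO PWM library on the Pi would take. Software-timed PWM can jitter, which may or may not be noticeable on a jaw.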

Another area I want to explore is near real-time text-to-speech and speech recognition. Wouldn’t it be cool to say a word or a phrase to Hedley and then have him somehow respond with his own appropriate phrase? We could start out simple and have him say certain phrases based on a keyword or two that I say.
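The simple keyword idea could start as nothing more than a lookup table. This Python sketch assumes some speech recognizer has already turned what I said into text, which is the hard part and is not shown here. The keywords and WAV file names are invented for the example.

```python
import re

# Hypothetical keyword-to-reply table; each value is a pre-recorded WAV file.
RESPONSES = {
    "hello": "hello.wav",
    "name": "my_name_is_hedley.wav",
    "joke": "skull_joke.wav",
}

def pick_response(heard_text, responses=RESPONSES, default=None):
    """Return the reply for the first known keyword in the recognized text."""
    words = re.findall(r"[a-z']+", heard_text.lower())
    for word in words:
        if word in responses:
            return responses[word]
    return default

print(pick_response("Hello there, Hedley!"))
print(pick_response("What is your NAME?"))
print(pick_response("Nice weather today"))
```

Returning a default of None when nothing matches leaves room for Hedley to stay quiet or fall back to a stock line.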

I’m not sure if Processing is up to the job of this near real-time stuff. We may have to investigate other programming languages, like C. I’ll keep you posted.
