
Off-The-Shelf Hacker: Refining Voice-Jaw Synchronization in Robotic Skulls

Aug 29th, 2018 12:00pm by

Realistically moving a robot’s jaw in sync with its synthesized voice is a challenge. Frightprops has the PicoTalk servo controller for about $80. Skulltronix offers the SON of Chuckee board for $125. While somewhat pricey, these boards also provide servo control for the eyes, pan, tilt and so on in addition to the jaw.

Ultimately, my own talking skull project, Hedley the Skull, should do text-to-speech from his onboard Raspberry Pi. I envision various applications outputting Hedley’s voice to a speaker and, at the same time, sending positional data to the jaw controller. Having Hedley “speak” results through Amazon’s voice services, from a script, or maybe even from an MQTT message, opens up a lot of interesting show possibilities.

While off-the-shelf boards are promising, I don’t think they are flexible enough to suit my purposes. I want to be able to fiddle around with how the audio is actually analyzed, so I can get the best possible synchronization of the audio with the jaw movement.

Using a combination of hardware, firmware and software, real-time audio/jaw synchronization is certainly possible with an Arduino and a Raspberry Pi. Today, we’ll examine the Arduino-jaw side of the equation. The Raspberry Pi-sound and application side will be discussed in a later story.

The Arduino Jaw Controller

In the earlier “Off-The-Shelf Hacker: Hedley the Robotic Skull Speaks” story we looked at using an Arduino with a small electret microphone to move the jaw servo. I’ve since modded the servo limits and added code to read serial data from Hedley’s Raspberry Pi 3 instead of from the hardwired microphone input. The jaw servo is attached to digital pin 6. For now, prototyping the audio analysis program is done on a Linux laptop. The code will eventually move over to Hedley’s Raspberry Pi 3, so no external computers are required for his operation.

One huge breakthrough was to reduce the number of skull jaw positions down to 10 and scale the serial input to fit within that range of motion. The jaw is closed at zero and fully open at position nine. I reasoned that just sending “0” through “9” as a text string would keep the data transfer size to the bare minimum. We only have roughly 30 degrees of jaw travel from fully open to fully closed. Divide that by 10 and we get about three degrees of movement per step.
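As a rough illustration of that wire format, here is a minimal Python sketch of how a sender might encode one jaw position. It is not the code used in this project: the helper name `encode_jaw_level` and the clamping of out-of-range values are assumptions, and the newline terminator is inferred from the firmware description later in this article.

```python
JAW_LEVELS = 10  # positions 0 (closed) through 9 (fully open), per the article


def encode_jaw_level(level: int) -> bytes:
    """Clamp a level to 0..9 and return it as an ASCII digit plus a newline.

    Hypothetical helper: the name and the clamping behavior are assumptions.
    """
    clamped = max(0, min(JAW_LEVELS - 1, int(level)))
    return f"{clamped}\n".encode("ascii")


if __name__ == "__main__":
    # Each message is two bytes: one digit plus the line terminator.
    for level in (0, 4, 9, 12):
        print(level, encode_jaw_level(level))
```

Keeping every message to a single digit plus a terminator means the receiver never has to deal with variable-width numbers.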

In real life, incrementally moving the servo in three-degree steps gives fairly well-synced jaw movement without the twitchiness of previous versions of the code. We can probably decrease the number of degrees per step for more resolution and smoother movement. That means we’d have to make a corresponding change in the serial data scale, as well.
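To get a feel for that trade-off, the following Python sketch lists the approximate step size for a few level counts. It assumes the roughly 30 degrees of jaw travel quoted above and divides the travel by the number of positions, as the article does. The function names are hypothetical.

```python
JAW_TRAVEL_DEGREES = 30  # approximate travel quoted in the article


def degrees_per_step(levels: int, travel: float = JAW_TRAVEL_DEGREES) -> float:
    """Approximate servo movement per level: travel divided by level count."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    return travel / levels


def max_message(levels: int) -> str:
    """Largest value the sender would transmit for a given level count."""
    return str(levels - 1)


if __name__ == "__main__":
    for levels in (10, 15, 30):
        print(levels, degrees_per_step(levels), max_message(levels))
```

With 30 levels the largest message becomes "29", so the sender's scale and the firmware's input range would have to change together.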

Another real-world note is that I used a practically antique Arduino NG for the jaw-servo controller. This old board screams along at 16 MHz and is built around an 8-bit microcontroller. It’s certainly of very modest horsepower, compared to more recent Arduino modules.

Nevertheless, the NG is perfectly capable of tracking the data coming from my Linux notebook running the Processing language audio analysis program, with no noticeable lag between the audio and the jaw movement. Performance, in the microcontroller world, tends to increase with time, so using an older board for initial prototyping is a pretty good beginning baseline. Things will get smoother and more capable when new hardware/software is swapped into place as the project matures.

Updated Code

The Arduino NG code is pretty straightforward. Be sure to designate the NG board when compiling and uploading. Otherwise, you’ll get errors and the jaw simply won’t move. Also, don’t forget that you have to push the reset button before a firmware upload with an NG board.


Variables are initialized at the beginning of the file.

Notice the open and closed position values. These were found through trial and error. The numbers work with my servo and jaw mechanism geometry, so you’ll need to experiment a bit to find values that work in your situation. Be ready to pull the USB cable immediately, if the servo slams against the stops.

There is also an initial position value “s” of 105. The old NG has a built-in 10-second firmware upload period after a reboot. If the NG gets any data over the USB (serial) line during that window, such as positional data from the Raspberry Pi audio analysis application, its firmware will be corrupted and you’ll sit there wondering why nothing happens. After powering up the board, the program runs and moves the jaw to a safe location at the end of the 10-second boot-up period. I just wait for the jaw to initialize and then know that I can safely start sending jaw data. If the jaw doesn’t move, I know to re-burn the NG’s firmware.
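On the sending side, one way to respect that boot-up window is simply to hold off writing until it has passed. The Python sketch below is only an illustration under stated assumptions: `send_after_boot` is a hypothetical helper, `port` is any object with a `write(bytes)` method (a stand-in for a real serial port), and the two-second `margin` is an arbitrary safety cushion, not a value from this project. It should be called right after the board is powered up.

```python
import io
import time

BOOT_WINDOW_SECONDS = 10  # upload window described in the article


def send_after_boot(port, levels, wait=time.sleep, margin=2):
    """Wait out the boot window, then write each level as a digit and newline.

    `port` is any object with a write(bytes) method. `margin` is an assumed
    extra delay in seconds, not a figure from the article.
    """
    wait(BOOT_WINDOW_SECONDS + margin)
    for level in levels:
        port.write(f"{level}\n".encode("ascii"))


if __name__ == "__main__":
    fake_port = io.BytesIO()  # stands in for a real serial port
    # Skip the real delay for this demonstration run.
    send_after_boot(fake_port, [0, 5, 9], wait=lambda seconds: None)
    print(fake_port.getvalue())
```

Passing the delay function in as a parameter keeps the sketch easy to try without actually waiting 12 seconds.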

Next, we open the serial port and start accumulating characters until a “\n” is received. The string is then converted to an integer and fed into the map function. The map function scales the 0 through 9 numbers from the input to the range between the closed and open jaw positions.
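The logic of that step can be modeled on a desktop machine. The Python sketch below is not the firmware itself: `arduino_map` is a port of the integer arithmetic behind Arduino's `map()`, while `jaw_angles`, the clamping of out-of-range input, and the `JAW_CLOSED` and `JAW_OPEN` angles are hypothetical placeholders chosen only to sit 30 degrees apart. Your own servo limits will differ.

```python
JAW_CLOSED = 100  # hypothetical servo angle, placeholder only
JAW_OPEN = 70     # hypothetical servo angle, 30 degrees from closed


def arduino_map(x, in_min, in_max, out_min, out_max):
    """Port of Arduino's map(): integer math, truncating toward zero."""
    numerator = (x - in_min) * (out_max - out_min)
    denominator = in_max - in_min
    quotient = abs(numerator) // abs(denominator)
    if (numerator < 0) != (denominator < 0):
        quotient = -quotient
    return quotient + out_min


def jaw_angles(stream: str):
    """Accumulate characters until a newline, then map each number to an angle."""
    buffer = ""
    angles = []
    for ch in stream:
        if ch == "\n":
            if buffer.strip().isdigit():
                # Clamping to 0..9 is an assumption added for safety.
                level = max(0, min(9, int(buffer)))
                angles.append(arduino_map(level, 0, 9, JAW_CLOSED, JAW_OPEN))
            buffer = ""
        else:
            buffer += ch
    return angles


if __name__ == "__main__":
    print(jaw_angles("0\n5\n9\n"))
```

Because the arithmetic is integer-only, the individual steps are not perfectly even: with these placeholder limits, some neighboring levels land three degrees apart and others four.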

Finally, the scaled jaw position number is fed to the jaw servo and the program loops through the serial data stream from the audio analysis program, currently running on the Linux notebook. I’m working on moving the analysis over to the Raspberry Pi, although we’re not quite there yet.

What’s Next

I’m very happy with the jaw synchronization to the audio analysis data. I wrote a Processing language program on the Linux notebook to play the audio and convert the amplitude of the sound to the required serial zero through nine values. The full-sized hobby servo seems more than up to the job of swinging Hedley’s jaw through its 30-degree arc in time with the speech.
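The Processing program is not shown in this article, but the amplitude-to-digit idea can be sketched in a few lines of Python. This is an assumption-laden illustration: it presumes the amplitude has already been normalized to a range of 0.0 to 1.0 and uses a plain linear scale, whereas the actual program may smooth or scale the signal differently. The function name `amplitude_to_level` is hypothetical.

```python
def amplitude_to_level(amplitude: float, levels: int = 10) -> int:
    """Quantize a normalized amplitude (0.0 to 1.0) into 0..levels-1.

    Assumes a linear scale; values outside 0.0..1.0 are clipped.
    """
    clipped = max(0.0, min(1.0, amplitude))
    return min(levels - 1, int(clipped * levels))


if __name__ == "__main__":
    for sample in (0.0, 0.25, 0.5, 0.99, 1.5):
        print(sample, amplitude_to_level(sample))
```

The final `min()` keeps a full-scale amplitude of 1.0 from producing a level of 10, which would fall outside the single-digit range.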

Soon, I’ll install a small speaker in the roof of Hedley’s mouth so the sound comes out of the right place. And naturally, Hedley will have to have his own microphone when we are on stage.

It should be a pretty cool effect when everything eventually comes together.

Stay tuned for coverage of the audio analysis program using the Processing language in an upcoming article.

Feature image via Pixabay.
