
Off-The-Shelf Hacker: ‘I See a Machine Vision Sensor’

This week, Dr. Torq adds object recognition to his robotic skull.
Jul 20th, 2019 6:00am by Dr. Torq

Last week we covered the basics of object recognition with voice synthesis. A JeVois smart machine vision sensor “sees” objects, the eSpeak command-line program “says” the name, the guvcview program interfaces to the sensor and a Processing script manages the whole works. I ran it on the old war-horse ASUS Linux notebook.

This week, I’ve successfully ported everything over to run on Hedley, the robotic skull, with his built-in Raspberry Pi and gaggle of Arduino boards. The JeVois sensor was reinstalled in his right eye-socket and a small audio amplifier lets him yak through the Raspberry Pi audio port and skull-mounted speaker.

The setup was streamlined to just use a Processing language script, without needing a video feed through guvcview. We also no longer need to send commands to the JeVois through the Arduino serial monitor. Here’s how it works.

Prep the Pi

Hedley runs on a Raspberry Pi 3 Model B+ with Raspbian GNU/Linux 9 (Stretch). The Processing IDE is at version 3.5.3. I also loaded luvcview and guvcview using the Synaptic package manager. Stretch runs great on the Pi 3, by the way.

Those two programs work very well for testing that the JeVois is recognizing and sending out data. Note that guvcview pegs the processor at 100% because it is very resource-heavy. After about a minute, the high-temperature CPU indicator will light up on the main desktop display. For longer-term testing, use luvcview. Here’s a sample testing command line. Use Ctrl-C in the terminal to exit.

luvcview -f YUYV -s 640x498

I’ve upgraded the code from last week to configure the JeVois each time you hit “run” in the Processing IDE. It eliminates the need for the webcam viewer or the Arduino serial monitor to start recognizing objects.

Here’s the new Processing code.
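What follows is a minimal version of the sketch rather than a drop-in file: the serial device name, the baud rate, the setmapping2 camera format and the exact espeak invocation are placeholders you’ll want to adjust for your own JeVois and Pi.

import processing.serial.*;

Serial myPort;

void setup() {
  // Open the JeVois serial-over-USB port. The device name is a guess;
  // check with "ls /dev/ttyACM*" on your Pi.
  myPort = new Serial(this, "/dev/ttyACM0", 115200);
  myPort.clear();

  // Configure the JeVois. Every command needs a trailing newline or it
  // will never be executed by the sensor's command interpreter.
  myPort.write("setpar serout USB\n");      // send detection messages over USB serial
  myPort.write("setpar serstyle Normal\n"); // Normal detail -> lines that start with "N2"
  // Select the recognition module. The camera format and frame rate here are
  // assumptions; use the mapping listed for DarknetYOLO on your sensor.
  myPort.write("setmapping2 YUYV 640 480 15.0 JeVois DarknetYOLO\n");
  myPort.write("streamon\n");               // start the camera without a USB video feed
}

void draw() {
  String inLine = myPort.readStringUntil('\n');

  if (inLine != null) {
    String[] q = splitTokens(inLine);

    // Only act on Normal-detail detection lines; stray "OK" responses
    // and blank lines fall through harmlessly.
    if (q.length > 1 && q[0].equals("N2")) {
      // q[1] is the recognized object's name; hand it to eSpeak.
      exec("espeak", q[1]);

      delay(3000);     // three-second pause so Hedley doesn't chatter
      myPort.clear();  // flush detections that piled up during the pause
    }
  }
}

With something like that in place, hitting “run” in the Processing IDE configures the sensor and starts the announcements, as the following sections describe.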

Code Particulars

Notice in the setup() section that we have a bunch of myPort.write() statements. These set up the JeVois to communicate over the USB serial port, set the data detail to “Normal” and pick the recognition algorithm to use. The DarknetYOLO one worked well for identifying humans, chairs, keyboards, remotes and so on. Note also that all of the command lines end with a newline character. You have to include a newline at the end of each command so it gets executed in the JeVois serial interface environment. If you leave the “\n” off, you’ll spin your wheels for hours wondering why the algorithm doesn’t change or why the data never streams.

I had a few problems with sorting out the miscellaneous response text, the actual data and blank lines when there wasn’t anything to recognize. The JeVois uses a standardized data format that varies according to the level of detail selected. A typical line might be the following.
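With the detail level set to Normal, a detection message carries the detail indicator, the object name, then the standardized location and size fields. The line below is an illustrative example rather than captured output.

N2 person 156 -56 312 413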


I was also getting an occasional “OK” as a response from the JeVois. If you get just an “OK”, even though the line is not null, Processing generates an array out-of-bounds error because we are trying to index a token that doesn’t exist; there aren’t any other tokens on the line. I’ve run into this in other projects with Processing. The simple solution is to process the data line only if it is not null and the first token is one of the detail-level indicators: N corresponds to Normal, T corresponds to Terse and so on.

Also, Processing can’t compare strings with a simple == inside a logical && or || test; you must use a string comparison method. After separating the data line into tokens with the splitTokens() function, we compare the first token with "N2" using q[0].equals("N2"). If that returns true, we proceed to say the object name in the second token (q[1]) without triggering the array out-of-bounds error.
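In the sketch above, that guard boils down to just a few lines:

String[] q = splitTokens(inLine);
if (q.length > 1 && q[0].equals("N2")) {
  // safe to use q[1], the object name
}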

There are also a couple of myPort.clear() calls in the code. These ensure that we don’t have any lingering odd data in the serial port buffer as we read new lines from the JeVois sensor. The delay(3000) call gives a three-second pause between recognized objects. Without a proper delay, there is a tremendously annoying echo of the object name from the speaker, since the JeVois sends new object hits about every 250 ms. Hedley is trying to be socially considerate at three seconds.
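Those calls sit right after the espeak launch in the sketch:

exec("espeak", q[1]);   // say the object name
delay(3000);            // give the speaker three seconds of quiet
myPort.clear();         // discard detections queued up during the pause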

I changed a couple of other things on Hedley for this project. One was disconnecting the hardware serial line running from the JeVois to the pan-servo Arduino in the top of his skull. Figuring out how to route tracking data through the USB-connected Raspberry Pi and on to the pan-servo subsystem is now on the to-do list. I’ll have to work out some coordinated scheme to control the pan, jaw and other servos from the Pi, with input from the JeVois and other sensors. Data will flow over the USB serial line instead of via the hardware serial port. Looks like I might need a “show” scripting program of some sort.

I also temporarily unhooked the jaw-servo subsystem. A Processing script currently moves the jaw based on pre-recorded .WAV audio files. I need to integrate the new object recognition and speech synthesis script with the audio analysis from the old program that drives the jaw servo realistically.

Next Steps

I’m on the hunt for a reliable way to sync the jaw movement with object recognition/speech synthesis, running through the Pi and subsystems. Low latency is important because we don’t want any lag between the actual sound output and the jaw flapping up and down. Even a 200 ms lag ruins the “talking” effect.

I’m also looking for ways to adjust the JeVois recognition algorithms on the fly. For example, it might be useful for Hedley to just sit there for long periods and, if there is movement in front of him (say, using the Surprise Recorder module), switch to the Darknet YOLO algorithm for face and object recognition. Detecting movement at longer distances would wake him up to notice faces and objects, which are harder to detect farther out.

It might also be cool to generate my own custom networks for specific faces or objects. Another idea is to combine several modules to expand what is recognized: AruCo symbols and regular objects, say.

Hedley thinks we are doing a pretty good job of explaining and documenting our journey through off-the-shelf projects. He’d love to hear feedback from readers. Well…we are working on voice recognition. Nevertheless, feel free to suggest new topics and things to explore by sending a quick note to doc@drtorq.com.

Catch Dr. Torq’s Off-The-Shelf Hacker column, each Saturday, only on The New Stack! Contact him directly for consulting, speaking appearances and commissioned projects at doc@drtorq.com or 407-718-3274.
