Off-The-Shelf Hacker: Linux Pipes, Redirection and AWK

10 Dec 2016 7:40am, by

In the October 29th article, “Off-The-Shelf Hacker: Deploying And Testing The ESP8266-PIR Yard Sensor,” I briefly talked about capturing the data from the yard sensor, via the network, to a Linux machine for later analysis. Writing the data out to a file proved unreliable, so I suggested readers explore solutions on their own.

As with all of my projects, I’ve pondered the problem since then, testing various theories as new ideas materialized.

It’s amazing how simple things can be once the old light bulb comes on. Up until that point, it’s frequently painful, frustrating and time-consuming. Take heart, because there are solutions out there if you don’t give up and keep digging.

Linux is a powerful tool on the receiving end of the physical computing stack. Embrace it, learn it, love it, for infinite physical computing fun and profit.

Pipes, Redirection and AWK

The Linux operating system was designed to be very modular, especially with the command line. You know, the console or terminal window. Great examples of that modularity show up as a couple of core concepts, namely pipes and redirection.

Pipes let you string together commands, with the output of one program acting as input to the next one. This technique makes it easy to create deceptively powerful processes that handle complex jobs, typically without resorting to heavy-duty programming efforts. Just add the vertical bar (“|”) character between commands.
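As a quick illustration of a pipe, here is a minimal sketch. It is not a command from the October article: printf simply stands in for any program that emits one number per line, and the variable name is made up for the example.

```shell
# printf stands in for a program that emits one number per line.
# sort -n orders the lines numerically; tail -n 1 keeps only the last (largest) one.
latest=$(printf '3\n1\n2\n' | sort -n | tail -n 1)
echo "$latest"
```

Three small programs, two pipes, and no real programming needed.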

Redirection takes a data stream and shoves it into a file that you can then analyze with another program or simply save for later use. Redirection is written with the greater-than (“>”) character, which looks like a right arrow, between the command and the file name.
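Here is a small sketch of redirection in action. The scratch file comes from mktemp and is only an assumption for the demonstration. It also shows “>>”, a close cousin of “>” that adds to the end of a file instead of replacing its contents.

```shell
# Scratch file for the demonstration (the name comes from mktemp, not the article).
log=$(mktemp)

# ">" creates the file, or empties it first if it already exists.
printf '1\n2\n' > "$log"

# ">>" appends to the file instead of overwriting it.
printf '3\n' >> "$log"

# Show what ended up in the file.
cat "$log"
```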

Take a look at the command we used in the October article.

In this example, the netcat command reads a text data stream from a network node on a certain port number. The node is at the 192.168.1.114 IP address with a port number of 1337, as defined in the ESP8266 yard sensor’s firmware. When you power up the yard sensor, it sits there and waits for another node to connect to its data stream. You run netcat on another machine to make that connection and begin grabbing the data. By the way, netcat can act as a server or a client. It’s a pretty powerful command.

AWK, another Unix utility, is a text-processing program. Here, it reads input from netcat through a pipe. The data streaming from the yard sensor is simply a number that’s incremented every time something moves in its field of view.

While it might be cool to just see how many hits we get from the sensor, the real value is seeing when the sensor picked up movement. So, AWK prints the line it received from netcat ($0, which here is just the hit number), then a comma (in quotes) and finally uses the strftime function to add the time and date to each line of data. As each line of data streams in from the sensor, netcat and AWK diligently print each line to your screen, in succession. The job will continue until you press Ctrl-C.
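The pipeline described above can be sketched roughly as follows. Treat it as an approximation rather than the exact command from the October article: the function name stamp and the date format are my own assumptions, and printf stands in for the sensor so the example runs without any hardware. Note that strftime is an extension found in gawk and in recent versions of mawk and BusyBox awk, not in strictly POSIX awk.

```shell
# Append a comma and the current date and time to every input line.
# $0 is the whole input line; strftime() with only a format uses the current time.
stamp() {
    awk '{ print $0 "," strftime("%Y-%m-%d %H:%M:%S") }'
}

# Stand-in for the sensor: two hit counts, one per line.
printf '1\n2\n' | stamp

# Against the real sensor (address and port as given in the article):
# nc 192.168.1.114 1337 | stamp
```

Each output line is the hit number, a comma, and the moment AWK saw it.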

To record the output of the netcat/AWK stream, you’d typically just add a redirection (the “>” character) for a file, like the following.

Using the cat command, you might see data appear in the file after several minutes.

Sadly, most of the time, you’ll see nothing.

It turns out that there’s a buffering problem.

Buffer Madness

The way awk works is that when its output goes to a file or another pipe instead of a terminal, the output typically collects in a buffer and is only written out once the buffer fills up or the program exits. With a sensor that sends a few bytes per hit, filling that buffer can take a very long time.

Apparently, there’s no problem when standard output is connected to the console (your terminal screen), where each line shows up as soon as it’s printed.
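You can watch the buffering happen without any hardware. The sketch below is an assumption-laden stand-in for the real setup: the slow_sensor function and the mktemp scratch file are made up for the demonstration, and the mid-run line count can vary with timing and with which awk you have installed.

```shell
log=$(mktemp)   # scratch file for the demonstration

# Stand-in for the sensor: one line now, another two seconds later.
slow_sensor() {
    printf '1\n'
    sleep 2
    printf '2\n'
}

# Same shape as the logging command, minus the network: awk writes to a file.
slow_sensor | awk '{ print $0 }' > "$log" &

sleep 1
# Midway through, the file is usually still empty because awk has not flushed yet.
echo "lines while running: $(grep -c '' "$log")"

wait
# Once awk exits, its buffer is written out and both lines are in the file.
echo "lines after exit: $(grep -c '' "$log")"
```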

I tried a number of schemes to get around the problem, including using the tee command and rearranging the order of the command string.

While reading the awk documentation, I discovered the fflush() function. Big fun, right?

Called inside the awk program, this function simply pushes the data out as it’s processed. Buffering exists because it’s usually more efficient to save up data and write it in one go, instead of writing one or two characters at a time. For a slow trickle of sensor hits, though, we want every line to land in the file right away.

Here’s the mod to our remote data logging command line on a networked Linux machine.
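A sketch of that change might look like the following. As before, this is a hedged reconstruction and not the exact listing: the function name stamp_and_flush, the date format, and the mktemp scratch file are assumptions, and a printf with a pause stands in for the sensor.

```shell
log=$(mktemp)   # scratch file; the article's own log file is rob.txt

# Print each line with a timestamp, then flush awk's output buffer right away.
stamp_and_flush() {
    awk '{ print $0 "," strftime("%Y-%m-%d %H:%M:%S"); fflush() }'
}

# Stand-in for the sensor: one hit, a pause, then another.
{ printf '1\n'; sleep 1; printf '2\n'; } | stamp_and_flush > "$log"

cat "$log"

# Against the real sensor (address and port as given in the article):
# nc 192.168.1.114 1337 | stamp_and_flush > rob.txt
```

The only difference from the unflushed version is the fflush() call after the print statement.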

You can then just cat or tail the file to see the results.

The tail command works much like cat, except that by default it prints only the last 10 lines of the file. You’d use tail in a separate terminal window.

Here’s some output from the rob.txt file.

Or, if you want to watch the file continuously, use tail’s -f option, which prints new lines from rob.txt as they come in.
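To get a feel for tail before pointing it at real sensor data, here is a small sketch. It uses a scratch file filled with the numbers 1 through 25 in place of rob.txt; that file and its contents are assumptions for the demonstration only.

```shell
log=$(mktemp)      # scratch file standing in for rob.txt
seq 1 25 > "$log"  # 25 numbered lines

tail "$log"        # default: the last 10 lines, 16 through 25
tail -n 3 "$log"   # just the last 3 lines: 23, 24 and 25

# Follow the real log as it grows; stop with Ctrl-C.
# tail -f rob.txt
```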

Wrap Up

So, there you have it. A simple fflush() makes everything better and completes a powerful tool chain for capturing sensor data.

Keep in mind that these physical computing stacks are actually pretty complicated systems. Sure, you might only see a little ESP8266 microcontroller feeding passive infrared sensor data over a local area network to a remote Linux machine and logging data to a file.

That’s an awful lot of off-the-shelf hacker technology. Doing that task 10 years ago was a challenge. I was there, I was doing it. With our ever-expanding levels of sophistication, learning and applying seemingly minute details, like fflush(), has become more complicated and challenging as well.

Isn’t it fun, though, to have a vision and actually be able to turn it into something you can touch and see work, even if it takes a little while?

Keep rocking those off-the-shelf hacker projects.

Feature image via Pixabay.
