Brian Kernighan Remembers the Origins of ‘grep’
This month saw the release of a fascinating oral history, in which 76-year-old Brian Kernighan remembers the origins of the Unix command grep.
Kernighan is already a legend in the world of Unix — recognized as the man who coined the term Unix back in 1970. His last initial also became the “k” in awk — and the “K” when people cite the iconic 1978 “K&R book” about C programming. The original Unix Programmer’s Manual calls Kernighan an “expositor par excellence,” and since 2000 he’s been a computer science professor at Princeton University — after 30 years at the historic Computing Science Research Center at Bell Laboratories.
In new interviews with the YouTube channel Computerphile, Kernighan has been sharing some memories…
Two years ago Kernighan recalled joining Bell Labs as a graduate student in 1967, studying electrical engineering because at the time there were no “computer science” majors. “It was a wonderful place because there was an enormous number of really good people doing really interesting things and nobody telling you what to do…”
“In one single, largish building there were probably 4,000 people, of who about 2,000 were probably PhDs in various forms of science, physics, chemistry, materials, and then on the, call it the softer end, mathematics and the relatively new field at that point of computer science.”
Remembering Where Unix Was Born
AT&T as a whole had “well over” a million employees, Kernighan remembers, making it America’s single largest employer outside of government. Its dominant position as the phone company for most Americans gave it a very stable revenue stream, and research represented only a small fraction of the company. “In some sense, it didn’t matter as long as this collection of people produced things that were useful,” he remembered.
Which they did. Kernighan cites their work on the transistor, early work on lasers, and zone refining, “which makes semiconductors actually practical — all of these things came out of Bell Labs.”
He remembers when feeding punchcards to a single computer gave way to time-sharing systems, which “actually gave everybody the illusion that they had the whole computer to themselves.” So could you call that an early version of cloud computing? “Perhaps you could. There’s the central computer — it has the resources, that’s where information is being stored, and you’re talking to it remotely.”
“If somebody had an idea they would talk about it. People would gather in the corridors talking about ideas. People would meet at lunch and talk about things like that.”
And this brings us to the moment when Ken Thompson invented grep.
Birth of a Utility
The original Unix Programmer’s Manual calls grep one of “the more memorable mini-revolutions” that Unix experienced, saying it irrevocably ingrained the “tools” outlook into Unix. “Already visible in utilities such as wc, cat, and uniq, the stream-transformation model was deliberately followed in the design of later programs such as tr, m4, sed, and a flurry of language preprocessors.” grep, of course, is used when searching for text patterns — whether that text is coming from input files or from “piped” text (output from another command).
To understand the story of grep, “You have to put yourself back in the early days of computing… the very, very early days of Unix,” Kernighan explains. Back in the early 1970s, Unix ran on PDP-11s, “a machine that had very little computing power. It didn’t run very fast, it also didn’t have very much memory, probably something of the order of 32K, maybe 64K bytes — and that’s 64K bytes, not megabytes — and very small secondary storage as well, a few megabytes of disk.”
In another interview with Computerphile, Kernighan suggests it was these low-storage environments that also gave rise to Unix’s pipe operator.
“It’s possible that if you took the output of one program and had to store it, totally, before you put it into the next program, it might not fit… And so you couldn’t instantiate, necessarily, the output of a program before passing it on to the next program.
“The pipeline never had to instantiate the whole output… That meant that you could kind of just sneak things through…!”
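To make the idea concrete, here is a minimal sketch of the same streaming principle using Python generators. It is only an analogy, not how Unix pipes are actually implemented, and the function names (produce_lines, only_matching, count_lines) are invented for illustration. Each stage hands along one line at a time, so the full intermediate output never has to exist in memory.

```python
from typing import Iterable, Iterator


def produce_lines(limit: int) -> Iterator[str]:
    """First stage: emit lines one at a time instead of building a list."""
    for n in range(limit):
        yield f"line {n}"


def only_matching(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Middle stage: pass a line along only if it contains `needle`."""
    for line in lines:
        if needle in line:
            yield line


def count_lines(lines: Iterable[str]) -> int:
    """Last stage: consume the stream, keeping only a running total."""
    total = 0
    for _ in lines:
        total += 1
    return total


# Stages are chained lazily; no stage ever holds the whole stream.
matches = count_lines(only_matching(produce_lines(100_000), "999"))
print(matches)
```

In shell terms, this has roughly the shape of a three-command pipeline: one program producing text, one filtering it, and one counting what survives.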
But there also wasn’t even enough memory to edit large files. Back then ed was the standard text editor — a small program written by Thompson for a world with only primitive monitors. “There was no cursor addressing, so you couldn’t move around within a line. The ed text editor reflected that kind of thing.”
ed let you specify regular expressions, which appeared between slashes, along with some operations (specified outside those slashes) to perform on the lines which matched. The operations were often indicated with a single letter, like ‘p’ for print or ‘a’ for append. There was also a ‘g’ command, which stood for global and would apply another command, such as that print command ‘p’, not just to one line but to every line of a file that matched the specified regular expression.
And if you wrote out that command, with g for “global” and p for “print” applied to a regular expression (“re”) between the slashes, it would look like this:
g/re/p
Kernighan supplies some crucial context in the form of a story. Their colleague Lee McMahon had wanted to study the Federalist Papers, which were written by several different authors (including Alexander Hamilton) but published under the same pseudonym. He planned to analyze the text carefully for clues about the original authors by finding all the occurrences of specific words and phrases. Unfortunately, a plaintext version of the collection was one megabyte, “down in the noise by today’s standard,” but at the time it “wouldn’t fit. He couldn’t edit them all in ed.”
“He sent this to Ken Thompson, and then went home for dinner or something like that. And he came back the next day, and Ken had written him a program.
“And the program was called grep.”
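The ed idiom described above (g for global, a regular expression between slashes, p for print) boils down to “print every line that matches.” Here is a minimal sketch of that behavior in Python. It is not Thompson’s program, which the article does not show. The function name global_print and the sample text are invented for illustration, and Python’s re syntax differs in places from ed’s regular expressions.

```python
import re
from typing import Iterable, Iterator


def global_print(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield each line in which the regular expression matches somewhere."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


# Hypothetical sample text, not taken from the Federalist Papers.
sample = [
    "the quick brown fox",
    "jumps over",
    "the lazy dog",
]

# Rough equivalent of typing g/^the /p in ed, applied to `sample`.
for hit in global_print(r"^the ", sample):
    print(hit)
```

Because the function reads its input one line at a time, it never needs the whole text in memory, which was exactly the limitation McMahon had run into with ed.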
“It was an instant hit,” asserted the original “Unix Programmer’s Manual,” “and soon entered our language as a verb.”
“Let me add one thing,” Kernighan remembers with a laugh. Twenty-five years ago, in the spring of 1993, he’d been teaching as a visiting professor and needed an assignment for his programming class. So he provided his students the source code for ed — “It was at that time probably 1800 lines of C” — and told them to do what Ken Thompson did.
“Your job is to take these 1800 lines of C and convert them into grep, as a C program. And you’ve got a week to do it.”
Kernighan pointed out to the class that they enjoyed several advantages. For example, they were working in C, while the original grep was written in PDP-11 assembly language. “This is typical of what you might encounter in a new job,” explained the text of their assignment. “You’re asked to make a small change in a big program, and most of the challenge is finding the right thing to change while making sure that nothing else breaks.”
Interestingly, Kernighan told Thompson back in 1993 that he was assigning the problem, and the historical record shows that he received this response from Thompson:
looks like a very good assignment.
it stresses reuse and cleanliness rather than grunt.
“Of course, they also had one grave disadvantage,” Kernighan tells Computerphile. “None of them were Ken Thompson.”
A follow-up video shows the interviewer insisting on the rest of the story. “Did they come up with it?”
“Oh, yeah, yeah, it’s fine,” Kernighan replied quickly. “It’s an easy job.”
“I stopped using it after a while. The problem with it is that as people have moved more and more to screen-based editors — vi, emacs, and so on — and then mouse-based editors, much more recently, ed has just fallen into disrepair.”
Though even then, Kernighan suggests that it still forms a part of our great geek heritage. “If you use vi on a regular basis, ed is hiding inside there. All the commands are very similar….”
But he laughs and says “It’s such a great exercise.”
Speaking of vi
In fact, Kernighan himself uses vi as his text editor for that reason, although to this day, he will also still use ed — “very occasionally, when I want to go with something really fast and simple. And it’s scriptable….” He also uses a hybrid text editor called sam, written by Rob Pike, “So in some sense, these are all coming from that prototypical example of the commands and the regular expressions, and that has just followed through.”
When Computerphile’s interviewer had asked if the pipe operator is also still being used today, Kernighan replied emphatically, “Oh, absolutely. It’s still a fundamental mechanism. You use it all the time.
“You don’t even think about it at this point. It’s just part of it. It’s definitely an instance of ‘don’t re-invent the wheel. Other people have done a lot of useful things for you.'”
And in addition, he says, “gluing together commands is also a lot of fun.”