Tutorial: Hunting the Secrets of Unix ‘grep’

Part of the Unix batch of utilities, grep (Global Regular Expression Print) is such a powerful search tool that it makes sense to review all its flags and metacharacters to make sure you’re not overlooking something incredibly useful. And it turns out that some of that usefulness is rooted in the work of the early Unix pioneers.
How grep Evolved
To understand the evolution of grep, “You have to put yourself back in the early days of computing… the very, very early days of Unix,” Brian Kernighan recently remembered in some new interviews. Back then Unix was running on a slow, low-memory PDP-11 — and “It’s possible that if you took the output of one program and had to store it, totally, before you put it into the next program, it might not fit.”
This explains why text editors like ed worked on one line at a time, a tradition which found its way into many Unix tools that have been handed down over the last 40 years. (For example, ed is the forefather of the ex text editor, much of which ended up incorporated into vim.)
And if you wrote out the one-line ed command for finding a regular expression re — searching “globally” throughout the entire document, and then printing every match — you’d get…
g/re/p
Yes, it was such a useful command that it eventually became its own stand-alone tool. History has it that Ken Thompson coded up the grep tool overnight to help a colleague search through the entire text of the Federalist Papers without having to load the whole thing into memory first.
But in the same way that the ed command g/re/p became a stand-alone tool named “grep,” some of grep’s most useful flags eventually spun off into more tools. For example, there’s the recursive rgrep, which is the same as grep -r. It’s like grep with a superpower — it searches through every subdirectory.
And then there’s egrep, which is the same as grep -E, though the grep man page warns that egrep “is deprecated” and “is provided to allow historical applications that rely on them to run unmodified.” Searches with egrep match not only the usual metacharacters (. * ^ $) but also the POSIX-defined set of (E)xtended regular expressions. That lets you get more specific about how many times you want a character to appear in your matches.
? | Once or none |
+ | Once or more |
{n} | Exactly n times |
{n,} | At least n times |
{n,m} | Between n and m times |
You can also do the same quantity-matching with plain grep, but you have to precede each brace with a backslash.
grep 's\{2\}' filename
With egrep no backslashes are needed.
egrep 's{2}' filename
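To see that the two spellings really do behave the same, here’s a minimal sketch. The file name sample.txt and its three words are made up for illustration, and it uses grep -E rather than egrep, since the man page marks egrep as deprecated.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'grass' 'gas' 'glasses' > "$tmpdir/sample.txt"

# Basic syntax: each brace needs a backslash.
basic=$(grep 's\{2\}' "$tmpdir/sample.txt")

# Extended syntax: the braces are written as-is.
extended=$(grep -E 's{2}' "$tmpdir/sample.txt")

# Both find the lines containing two consecutive s characters:
# grass and glasses, but not gas.
printf 'basic:\n%s\n' "$basic"
printf 'extended:\n%s\n' "$extended"

rm -rf "$tmpdir"
```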
And there’s one more powerful way to soup up your pattern matching: alternation. The “pipe” character separates two alternative patterns, and a line will match if it contains either one. Again, grep requires that you precede the pipe character with a backslash, but with egrep you can simply include it in your commands.
grep 'z\|Z' filename
egrep 'z|Z' filename
You can also use parentheses to group part of a pattern, so that an alternation (or a quantifier) applies to the whole group rather than to a single character.
grep '\(even\|odd\)' filename
egrep '(even|odd)' filename
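Grouping earns its keep once there is text on either side of the alternation. The sketch below assumes GNU grep (the backslash-pipe form is an extension there rather than part of basic POSIX syntax), and the file numbers.txt and its contents are invented for the example.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'an even number' 'an odd number' 'a prime number' > "$tmpdir/numbers.txt"

# Basic syntax: parentheses and pipe are escaped.
basic=$(grep 'an \(even\|odd\) number' "$tmpdir/numbers.txt")

# Extended syntax: same pattern without the backslashes.
extended=$(grep -E 'an (even|odd) number' "$tmpdir/numbers.txt")

# Both print the "even" and "odd" lines and skip the "prime" line.
printf 'basic:\n%s\n' "$basic"
printf 'extended:\n%s\n' "$extended"

rm -rf "$tmpdir"
```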
grep also spawned yet another standalone tool, fgrep, which grep’s man page explains is the same as grep -F (the F stands for “fixed strings”). Again it warns that fgrep “is deprecated” but “provided to allow historical applications that rely on them to run unmodified.” I’ve heard it referred to as “fast grep,” because it basically throws away all of grep’s regular-expression matching and just concentrates on quickly looking for matching strings. For example, grep uses a dollar sign as a special character matching the end of a line, so if you actually want to search for a dollar sign, you have to precede it with a backslash (and include the whole search string in single quotes).
grep '\$' filename
But fgrep lets you just type in that dollar sign.
fgrep $ filename
And you can also use fgrep to match for a dot or a caret without having to precede it with a backslash — which does make things more readable.
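Here’s a small sketch of the difference between regular-expression matching and fixed-string matching. The file literal.txt and its contents are made up, and grep -F stands in for the deprecated fgrep.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'a.b' 'axb' 'cost: $5' 'cost: 5 dollars' > "$tmpdir/literal.txt"

# As a regular expression, the dot matches any character,
# so both a.b and axb are printed.
regex=$(grep 'a.b' "$tmpdir/literal.txt")

# With -F the dot is just a dot, so only a.b is printed.
fixed=$(grep -F 'a.b' "$tmpdir/literal.txt")

# With -F a dollar sign needs no backslash.
dollar=$(grep -F '$' "$tmpdir/literal.txt")

printf 'regex:\n%s\n' "$regex"
printf 'fixed:\n%s\n' "$fixed"
printf 'dollar:\n%s\n' "$dollar"

rm -rf "$tmpdir"
```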
More Ways to Search
If your grep searches are giving you a lot of “false positives,” there’s a built-in way to only display a match when your search pattern is a complete word: the -w flag. For example, if you’re looking for the word “ant” but don’t want to match all the other words which *contain* ant:
grep -w 'ant' filename
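A quick sketch of what -w filters out, using an invented file named insects.txt. Note that a hyphen is not a word character, so “ant-hill” still counts as containing the whole word.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'an ant' 'a plant' 'want more' 'ant-hill' > "$tmpdir/insects.txt"

# Without -w, all four lines contain the letters "ant".
loose_count=$(grep -c 'ant' "$tmpdir/insects.txt")

# With -w, "ant" must stand alone as a word: letters, digits
# and underscores on either side disqualify the match.
whole=$(grep -w 'ant' "$tmpdir/insects.txt")

printf 'lines containing ant anywhere: %s\n' "$loose_count"
printf 'whole-word matches:\n%s\n' "$whole"

rm -rf "$tmpdir"
```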
And if you want to only display matches when your search string is the entire line, try -x:
grep -x 'Only this text appears on the line' filename
Of course, there’s an alternate way to do that, using grep’s ^ and $ metacharacters, which let you match the beginning and end of a line, respectively.
grep '^Only this text appears on the line$' filename
Because of ^ and $, there’s also an easy way to search for blank lines.
grep '^$' filename
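The following sketch puts -x, the ^ and $ anchors, and the blank-line search side by side. The file status.txt is invented for the example and contains one empty line.

```shell
# Hypothetical sample data: the third line is empty.
tmpdir=$(mktemp -d)
printf '%s\n' 'done' 'not done' '' 'done' > "$tmpdir/status.txt"

# -x: the pattern must account for the entire line.
exact=$(grep -x 'done' "$tmpdir/status.txt")

# The same thing spelled with anchors.
anchored=$(grep '^done$' "$tmpdir/status.txt")

# Start of line immediately followed by end of line: a blank line.
# -c prints how many lines matched instead of the lines themselves.
blank_count=$(grep -c '^$' "$tmpdir/status.txt")

printf 'exact:\n%s\n' "$exact"
printf 'anchored:\n%s\n' "$anchored"
printf 'blank lines: %s\n' "$blank_count"

rm -rf "$tmpdir"
```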
grep also supports the matching of “character classes” — displaying a line if it contains any one of the characters indicated in brackets.
grep '[a-z]' filename
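But there are also several handy predefined character sets. Here are some of the more useful ones. (Each name goes inside a bracket expression, so the pattern for “any digit” is written '[[:digit:]]', with two pairs of brackets.)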
[:alnum:] | Matches all numbers and letters (in the language of your current locale) |
[:digit:] | Matches all numbers |
[:alpha:] | Matches all letters |
[:upper:] | Matches all uppercase letters |
[:lower:] | Matches all lowercase letters |
[:punct:] | Matches all punctuation marks |
[:graph:] | Matches all “graphical” characters (all letters, numbers, and punctuation marks) |
And if you start your character class with the special ^ character, it takes on a new behavior: it matches any character that is not in the set, so grep displays the lines containing at least one such character.
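Here’s a minimal sketch of a named class and its negation, using a made-up file called mixed.txt. The thing to notice is that a negated class selects lines containing at least one character outside the set, not lines made up entirely of such characters.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'abc' 'abc123' '123' > "$tmpdir/mixed.txt"

# Lines containing at least one digit.
with_digit=$(grep '[[:digit:]]' "$tmpdir/mixed.txt")

# Lines containing at least one character that is NOT a digit.
# abc123 shows up in both results.
with_non_digit=$(grep '[^[:digit:]]' "$tmpdir/mixed.txt")

printf 'with a digit:\n%s\n' "$with_digit"
printf 'with a non-digit:\n%s\n' "$with_non_digit"

rm -rf "$tmpdir"
```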
Fun Grep Tricks
Here’s where it gets fun. If you can track down your system’s “dictionary” directory (usually /usr/share/dict), there’s a file named words in which a long list of English words appears, one per line: 99,171 words in all in my copy. This, of course, makes it trivially easy to solve that Junior Jumble that’s in your morning newspaper.
grep -x '[kriqu]\{5\}' words
quirk
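One caveat: that pattern only says “five characters, each drawn from this set,” so a word that repeats a letter can slip through. The sketch below uses a tiny invented word list instead of the real dictionary file, and tightens the search by piping through one extra grep per letter, a workaround of my own that only makes sense when the jumble’s letters are all different.

```shell
# Hypothetical four-word list standing in for the dictionary file.
tmpdir=$(mktemp -d)
printf '%s\n' 'quirk' 'kukri' 'quick' 'quark' > "$tmpdir/words"

# Five characters, each one of k r i q u: repeats are allowed,
# so kukri gets through along with quirk.
loose=$(grep -x '[kriqu]\{5\}' "$tmpdir/words")

# Also insist that every letter appears at least once.
strict=$(grep -x '[kriqu]\{5\}' "$tmpdir/words" \
  | grep q | grep u | grep i | grep r | grep k)

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"

rm -rf "$tmpdir"
```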
And for years I’ve been using grep as my workaround for getting a list of all of my directory’s subdirectories.
ls -l | grep '^d'
The -l flag on ls starts each line with the file permissions on every file or directory being listed — and of course, the directories are indicated with a d in the first position. So the grep command can just search for all lines beginning with a d!
drwxr-xr-x 2 username pg72253 24 Mar 8 15:11 directoryname
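If you’d like to try that without touching a real directory, here’s a sketch that builds a scratch directory first. The names docs, src and notes.txt are invented for the example.

```shell
# Scratch directory with two subdirectories and one regular file.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/docs" "$tmpdir/src"
touch "$tmpdir/notes.txt"

# Keep only the long-listing lines whose first character is d.
subdirs=$(ls -l "$tmpdir" | grep '^d')

# -c counts those lines instead of printing them.
subdir_count=$(ls -l "$tmpdir" | grep -c '^d')

printf '%s\n' "$subdirs"
printf 'subdirectories: %s\n' "$subdir_count"

rm -rf "$tmpdir"
```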
Over the years you eventually pick up more and more tricks. When I first began programming, I’d grep for a specific pattern, and then pipe it into wc -l to count up how many lines matched.
grep 'ERROR' filename | wc -l
Years later I realized I could accomplish the same thing just with grep — if I also used grep’s -c flag.
grep -c 'ERROR' filename
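One detail worth seeing for yourself: both approaches count matching lines, not individual matches. In the sketch below the log file and its contents are made up, and the -o flag (print each match on its own line) is something I’m adding for contrast, assuming a grep that supports it, as the GNU and BSD versions do.

```shell
# Hypothetical log: the last line contains ERROR twice.
tmpdir=$(mktemp -d)
printf '%s\n' 'ERROR disk full' 'INFO ok' 'ERROR retry after ERROR' > "$tmpdir/app.log"

# The old way: count the lines that grep prints.
piped=$(grep 'ERROR' "$tmpdir/app.log" | wc -l)

# The built-in way: -c reports the number of matching lines.
counted=$(grep -c 'ERROR' "$tmpdir/app.log")

# Counting every occurrence needs one match per line first.
occurrences=$(grep -o 'ERROR' "$tmpdir/app.log" | wc -l)

printf 'wc -l: %s, -c: %s, occurrences: %s\n' "$piped" "$counted" "$occurrences"

rm -rf "$tmpdir"
```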
And sooner or later you’ll want to ignore case — which can be done using the -i flag.
grep -i 'UNIX' filename
Recently I learned that you can also ignore case by instead using the -y flag, which grep’s man page describes as simply an “obsolete synonym” for -i.
grep -y 'UNIX' filename
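A small sketch of the difference -i makes, on an invented file named systems.txt. It sticks to -i, since -y is the obsolete spelling.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Unix' 'UNIX' 'unix' 'Linux' > "$tmpdir/systems.txt"

# Case-sensitive: only the all-caps line matches.
exact_count=$(grep -c 'UNIX' "$tmpdir/systems.txt")

# Case-insensitive: Unix, UNIX and unix all match; Linux does not.
any_case=$(grep -i 'UNIX' "$tmpdir/systems.txt")

printf 'case-sensitive matches: %s\n' "$exact_count"
printf 'case-insensitive matches:\n%s\n' "$any_case"

rm -rf "$tmpdir"
```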
But my favorite grep flag is --color. It does exactly what you think it does: it displays the matches in color. And it’s part of an especially useful trick, found on Tom Limoncelli’s “Everything Sysadmin” blog site. In a post titled “4 unix commands I abuse every day,” he writes that “As you get older your eyesight gets worse. It becomes more difficult to find something in a field of text.” So using egrep and its --color flag, he came up with a way to display an entire file while highlighting certain keywords. Since every line has a beginning, the command will match and display every line, but you’ll only see the highlights on your search word.
cat filename | egrep --color=always '^|searchword'
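To convince yourself that nothing gets filtered out, here’s a sketch that compares the line counts going in and coming out. It assumes a grep that supports --color (GNU grep does), reads the file directly with grep -E instead of piping cat into egrep, and uses an invented three-line log.

```shell
# Hypothetical three-line log in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'INFO start' 'ERROR disk full' 'INFO stop' > "$tmpdir/app.log"

# '^' matches the empty start of every line, so every line is printed.
# An empty match has nothing to color, so only ERROR is highlighted.
highlighted=$(grep -E --color=always '^|ERROR' "$tmpdir/app.log")
printf '%s\n' "$highlighted"

# Same number of lines out as in.
lines_in=$(wc -l < "$tmpdir/app.log")
lines_out=$(printf '%s\n' "$highlighted" | wc -l)
printf 'lines in: %s, lines out: %s\n' "$lines_in" "$lines_out"

rm -rf "$tmpdir"
```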
Meanwhile, there are a couple of useful flags that I abuse every day:
- -n gives you the line number for each match. (Which comes in handy if you’re planning to then jump right to that line number in vim to edit it.)
- -h lets you tell grep to stop printing the filenames before the matching text.
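Here’s a sketch of both flags in action. The file names a.log and b.log and their contents are made up; the filename prefix only appears when grep is given more than one file, which is why the example searches two.

```shell
# Two hypothetical log files in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'ok' 'ERROR one' > "$tmpdir/a.log"
printf '%s\n' 'ERROR two' > "$tmpdir/b.log"

# With several files, each match is prefixed with its filename.
with_names=$(cd "$tmpdir" && grep 'ERROR' a.log b.log)

# -h drops that prefix.
without_names=$(cd "$tmpdir" && grep -h 'ERROR' a.log b.log)

# -n prefixes each match with its line number in the file.
numbered=$(cd "$tmpdir" && grep -n 'ERROR' a.log)

printf 'default:\n%s\n' "$with_names"
printf 'with -h:\n%s\n' "$without_names"
printf 'with -n:\n%s\n' "$numbered"

rm -rf "$tmpdir"
```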
And one of the most useful flags instructs grep to print additional lines that appear either before or after your matches — using -B or -A respectively (followed immediately by the number of additional lines to print).
grep -A2 'searchstring' filename
grep -B2 'searchstring' filename
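A five-line file makes the effect easy to see. The file name count.txt and its contents are invented for this sketch.

```shell
# Hypothetical five-line file in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'one' 'two' 'three' 'four' 'five' > "$tmpdir/count.txt"

# The match plus the two lines after it: three, four, five.
after=$(grep -A2 'three' "$tmpdir/count.txt")

# The match plus the two lines before it: one, two, three.
before=$(grep -B2 'three' "$tmpdir/count.txt")

printf 'with -A2:\n%s\n' "$after"
printf 'with -B2:\n%s\n' "$before"

rm -rf "$tmpdir"
```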
Don’t forget the -v flag, which offers a sort of “opposite day grep” — where it prints every line that doesn’t match your search string.
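For instance, here’s a sketch that strips the noisy lines out of a made-up log file and keeps everything else.

```shell
# Hypothetical log in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'INFO start' 'DEBUG tick' 'INFO stop' > "$tmpdir/app.log"

# -v inverts the match: print every line that does NOT contain DEBUG.
quiet=$(grep -v 'DEBUG' "$tmpdir/app.log")

printf '%s\n' "$quiet"

rm -rf "$tmpdir"
```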
You can also just instruct grep to display only the names of the files which did not contain the matching string, using either the -L flag or the more descriptive --files-without-match.
grep -L 'ERROR' filename
grep --files-without-match 'ERROR' filename
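This flag is most useful when you hand grep several files at once. The sketch below uses two invented log files and assumes a grep that accepts the long option name, as GNU grep does.

```shell
# Two hypothetical logs: only broken.log contains ERROR.
tmpdir=$(mktemp -d)
printf '%s\n' 'INFO ok' > "$tmpdir/clean.log"
printf '%s\n' 'ERROR boom' > "$tmpdir/broken.log"

# -L lists the files in which the pattern never appears.
short_form=$(cd "$tmpdir" && grep -L 'ERROR' clean.log broken.log)

# The long option is the same thing, spelled out.
long_form=$(cd "$tmpdir" && grep --files-without-match 'ERROR' clean.log broken.log)

printf 'files without ERROR: %s\n' "$short_form"

rm -rf "$tmpdir"
```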
I’ve learned a lot, just in writing this tutorial. It’s proof that there are so many different ways to use grep that it’s always worth spending some time to make sure you know all of its hidden extra features!