
Tutorial: Find Strings in Text Files Using Grep with Regular Expressions

6 Nov 2019 9:00am, by Matt Zand
Matt Zand is the founder of High School Technology Services, DC Web Makers and Coding Bootcamps. He has written extensively on advanced topics on web design, mobile app development and blockchain. He is a senior editor at Touchstone Words where he writes and reviews coding and technology articles. He is also a senior instructor and developer living in Washington DC.

The grep Linux/Unix command-line utility is one of the most popular tools for searching and finding strings in a text file. The name “grep” derives from a command in the classic Unix ed line editor: the ed command for searching globally through a file for a regular expression and then printing the matching lines was g/re/p, where re was the regular expression you would use. grep was later written as a standalone program that performs the same search on files without opening them in ed.

In this article, we show you how to run advanced string searches using grep with regular expressions, through 10 hands-on examples. Many of them are practical enough to use in your daily work on Linux. The following samples show regexps for commonly searched-for patterns.

Find a Single Character in a Text File

To output lines in the file “book” that contain a “$” character, type:
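A minimal sketch, assuming a POSIX shell with GNU or BSD grep. The lines written to “book” are hypothetical sample data, created in a scratch directory from mktemp so that no existing file is overwritten:

```shell
# Work in a scratch directory so an existing file named "book" is not touched
cd "$(mktemp -d)"
# Hypothetical sample contents for "book"
printf '%s\n' 'Hardcover: $14.99' 'Paperback edition' 'Path: C:\books' > book

# Single quotes stop the shell from expanding "$"; the backslash tells grep
# to match a literal "$" instead of treating it as the end-of-line anchor.
grep '\$' book
```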

Find a Single String in a Text File

To output lines in the file “book” that contain the string “$14.99”, type:
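A sketch with hypothetical sample lines in a scratch “book” file. Both “$” and “.” are escaped here; the second sample line shows what an unescaped “.” would also match:

```shell
# Scratch directory plus hypothetical sample contents for "book"
cd "$(mktemp -d)"
printf '%s\n' 'Hardcover: $14.99' 'Paperback: $14299' 'Ebook: $9.99' > book

# "\$" matches a literal dollar sign and "\." a literal period; an unescaped
# "." would match any character, such as the "2" in "$14299".
grep '\$14\.99' book
```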

Find a Single Special Character in a Text File

To output lines in the file “book” that contain a “\” character, type:
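A sketch using a hypothetical scratch “book” file; note that the backslash has to be doubled in the pattern:

```shell
# Scratch directory plus hypothetical sample contents for "book"
cd "$(mktemp -d)"
printf '%s\n' 'Path: C:\books' 'Hardcover: $14.99' > book

# In a regexp "\" is itself a metacharacter, so "\\" is needed to match one
# literal backslash; the single quotes pass both backslashes to grep unchanged.
grep '\\' book
```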

Matching Lines Beginning with Certain Text

Use “^” in a regexp to denote the beginning of a line.

To output all lines in “/usr/dict/words” (on most current systems, “/usr/share/dict/words”) beginning with “pro”, type:
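A sketch that uses a small hypothetical word list in a scratch directory in place of the system dictionary file:

```shell
# Scratch directory plus a hypothetical stand-in for the dictionary file
cd "$(mktemp -d)"
printf '%s\n' program project approve pro Protest > words

# "^" anchors the match at the start of the line, so "approve" is skipped;
# the search is case-sensitive, so "Protest" is skipped too.
grep '^pro' words
```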

To output all lines in the file “book” that begin with the text “in the beginning”, regardless of case, type:
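A sketch with hypothetical sample lines; “-i” makes the search case-insensitive:

```shell
# Scratch directory plus hypothetical sample contents for "book"
cd "$(mktemp -d)"
printf '%s\n' 'In the beginning was the word.' 'in the beginning, again' 'Not in the beginning' > book

# "-i" ignores case; "^" requires the phrase to start the line,
# so the third line does not match.
grep -i '^in the beginning' book
```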

NOTE: These regexps were quoted with single-quote (') characters; this is because some shells otherwise treat the “^” character as a special “metacharacter.”

In addition to word and phrase searches, you can use grep to search for complex text patterns called regular expressions. A regular expression, or “regexp”, is a text string of ordinary and special characters that specifies a set of strings to match.

Technically speaking, the word or phrase patterns are regular expressions—just very simple ones. In a regular expression, most characters — including letters and numbers — represent themselves. For example, the regexp pattern 1 matches the string “1”, and the pattern boy matches the string “boy”.

A number of reserved characters, called metacharacters, do not represent themselves in a regular expression; instead, they have special meanings that are used to build complex patterns. These metacharacters are ., *, [, ], ^, $, and \. They are part of the standard POSIX basic regular expression syntax, so grep treats them the same way on virtually every Linux distribution. Here is a good article that covers the special meanings of the metacharacters and gives examples of their usage.

Matching Lines Ending with Certain Text

Use “$” as the last character of quoted text to match that text only at the end of a line.

To output lines in the file “going” ending with an exclamation point, type:
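A sketch using a hypothetical scratch file named “going”:

```shell
# Scratch directory plus hypothetical sample contents for "going"
cd "$(mktemp -d)"
printf '%s\n' 'We are going!' 'Going, going, gone.' 'Stop! Not yet' > going

# "$" at the end of the pattern anchors the match to the end of the line.
# Single quotes also keep interactive bash from treating "!" as history expansion.
grep '!$' going
```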

Matching Lines of a Certain Length

To match lines of a particular length, use that number of “.” characters between “^” and “$”. For example, to match all lines that are two characters (or columns) wide, use “^..$” as the regexp to search for.

To output all lines in “/usr/dict/words” that are exactly three characters wide, type:
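A sketch using a small hypothetical word list in place of the system dictionary:

```shell
# Scratch directory plus a hypothetical stand-in for the dictionary file
cd "$(mktemp -d)"
printf '%s\n' a an ant ants cat > words

# Each "." matches exactly one character, and "^" and "$" pin the pattern
# to the whole line, so only three-character lines match.
grep '^...$' words
```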

For longer lines, it is more practical to use a different construct: “^.\{number\}$”, where number is the number of characters to match. Use “,” to specify a range: “\{min,max\}” matches from min to max characters, and “\{min,\}” matches min or more.

To output all lines in “/usr/dict/words” that are exactly twelve characters wide, type:
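A sketch of the interval form, again with a hypothetical word list in place of the dictionary (“abbreviation” and “accomplished” are both exactly 12 letters):

```shell
# Scratch directory plus a hypothetical stand-in for the dictionary file
cd "$(mktemp -d)"
printf '%s\n' cat abbreviation abbreviations accomplished > words

# ".\{12\}" is a basic-regexp interval: exactly 12 of any character
grep '^.\{12\}$' words
```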

To output all lines in “/usr/dict/words” that are twenty-two or more characters wide, type:
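A sketch of an open-ended range with a hypothetical word list: “\{22,\}” means 22 or more characters:

```shell
# Scratch directory plus a hypothetical stand-in for the dictionary file
# (20, 22, and 28 characters long, respectively)
cd "$(mktemp -d)"
printf '%s\n' uncharacteristically counterrevolutionaries antidisestablishmentarianism > words

# Leaving the maximum empty after the comma means "22 or more"
grep '^.\{22,\}$' words
```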

Matching Lines That Contain Any of Some Regexps

To match lines that contain any of several regexps, join the regexps with the alternation operator (“\|”) and search for the combined regexp. Lines containing any of the given regexps will be output.

To output all lines in “playboy” that contain either the pattern “the book” or “cake”, type:
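A sketch with a hypothetical scratch “playboy” file. In a basic regexp, “\|” is a GNU grep extension; “grep -E” with a plain “|” is the portable POSIX form:

```shell
# Scratch directory plus hypothetical sample contents for "playboy"
cd "$(mktemp -d)"
printf '%s\n' 'Read the book first' 'Bake a cake' 'Watch the movie' > playboy

# "\|" separates the alternatives (GNU basic-regexp syntax)
grep 'the book\|cake' playboy

# Portable equivalent using extended regexps
grep -E 'the book|cake' playboy
```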

Matching Lines That Contain All of Some Regexps 

To output lines that match all of several regexps, use grep to output the lines containing the first regexp, then pipe that output to a second grep that searches for the next regexp. Add one more grep to the pipeline for each additional regexp.

To output all lines in “playlist” that contain both the patterns “the shore” and “sky”, regardless of case, type:
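A sketch with a hypothetical scratch “playlist” file, chaining two case-insensitive greps:

```shell
# Scratch directory plus hypothetical sample contents for "playlist"
cd "$(mktemp -d)"
printf '%s\n' 'The Shore under a grey SKY' 'the shore at night' 'Sky blue' > playlist

# The first grep keeps lines containing "the shore"; the second keeps only
# those that also contain "sky". "-i" makes both searches case-insensitive.
grep -i 'the shore' playlist | grep -i 'sky'
```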

Matching Lines That Only Contain Certain Characters

To match lines that only contain certain characters, use the regexp “^[characters]*$”, where characters are the ones to match.

To output lines in “/usr/dict/words” that only contain vowels, type:
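A sketch with a hypothetical word list; note that “*” also matches zero characters, so an empty line would match as well:

```shell
# Scratch directory plus a hypothetical stand-in for the dictionary file
cd "$(mktemp -d)"
printf '%s\n' a Io eau sky aye > words

# "[aeiou]*" matches any run of vowels; "^" and "$" require the whole line
# to be vowels. "-i" also accepts uppercase vowels such as the "I" in "Io".
grep -i '^[aeiou]*$' words
```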

The “-i” option makes the match case-insensitive, so uppercase vowels are matched as well as lowercase ones.

Finding Phrases Regardless of Spacing

One way to search for a phrase that might occur with extra spaces between words, or across a line or page break, is to remove all line breaks and extra spaces from the input and then run grep on the result. To do this, pipe the input to tr with “'\r\n:\>\|-'” as the argument to its “-d” option (deleting carriage returns and newlines, plus the listed punctuation characters); pipe that to the fmt filter with the “-u” option (outputting the text with uniform spacing); and pipe that to grep with the pattern to search for.

To search across line breaks for the string “at the same time as” in the file “docs”, type:
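Deleting line breaks outright can join the last word of one line to the first word of the next (“same” followed by “time” becomes “sametime”), and fmt then re-wraps the text at its default width, so the phrase can still end up split across lines. The sketch below is a swapped-in variation, not the article's exact pipeline: it translates line breaks and tabs to spaces, squeezes repeated spaces with “tr -s”, and uses grep's “-o” option to print only the match. The “docs” contents are hypothetical:

```shell
# Scratch directory plus hypothetical sample contents for "docs"
cd "$(mktemp -d)"
printf '%s\n' 'They arrived at the same' 'time as   the others.' > docs

# Turn carriage returns, newlines, and tabs into spaces, squeeze runs of
# spaces into one, then search the resulting single line. "-o" prints only
# the matched text instead of the whole (possibly very long) line.
tr '\r\n\t' '   ' < docs | tr -s ' ' | grep -o 'at the same time as'
```

A plain “grep 'at the same time as' docs” finds nothing here, because the phrase is split across two lines.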

Summary

In this article, we reviewed 10 practical examples of using the grep command to search for strings in text files. Along the way, we saw how to combine grep with regular expressions to run complex searches. By now you should have a better idea of how powerful the Linux command line is for searching text.

Here are additional resources for those interested in learning more about Linux programming:

Resources for System Administrators

Resources for Linux Kernel Programmers

Linux File System Dictionary

Feature image by Arek Socha from Pixabay.
