Dynamic Text Processing
As hinted in our Introduction to the Command Line, we actually have more power at our fingertips than one might expect thanks to the command line’s ability to pass a coherent stream of data from one command to another. On this page, we cover two commands that lend themselves particularly well to this approach: grep and sed.
Finding Text: grep
grep finds specific text within its input data according to some pattern. Unfortunately, explaining the name is too complicated for now, so let’s just leave it at grep:
grep "<pattern>"
This will try to find the desired pattern in the lines that you type. If a line matches, it will repeat that line. If it doesn’t match, it will just wait for the next line until you hit Control-d to end your input.
Try this:
grep "Romance"
Then, type any number of lines. Include the word Romance in some but not others. Notice that the only lines that repeat are the ones with Romance in them. Notice also that the matching is case-sensitive; that is, romance will not match.
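The same experiment can be run without typing each line by hand, by piping text into grep. This is a minimal sketch, assuming a POSIX-style shell with printf available; the three sample lines and the variable name matches are made up for illustration. Because the input comes from a pipe rather than the keyboard, only the matching lines appear.

```shell
# Feed three sample lines to grep through a pipe instead of the keyboard.
# The sample text is invented for this illustration.
matches=$(printf '%s\n' "Romance novels" "a romance story" "True Romance" | grep "Romance")

# Show what grep let through: the line with lowercase "romance" is dropped.
printf '%s\n' "$matches"
```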
Non-Exact Matches
Exact matches are interesting, but most other everyday applications can do this without a problem. Note how we said that grep can match a pattern and not just search text. It turns out that grep can “understand” a wide variety of symbols that represent different patterns of text.
A period (.) represents any single character. Thus, this pattern:
grep "st..r"
...produces all lines that have “st” and “r” with any two characters in between. So lines with steer or Fred Astaire will match, but store or restart will not.
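To check the four examples just mentioned, they can be piped through the same pattern. This is a small sketch, assuming a POSIX-style shell; the variable name dot_matches is only for illustration.

```shell
# The four sample lines from the text: two contain "st", two characters, then "r".
dot_matches=$(printf '%s\n' "steer" "Fred Astaire" "store" "restart" | grep "st..r")

# Only "steer" and "Fred Astaire" should come through.
printf '%s\n' "$dot_matches"
```

In store and restart, only one character separates “st” from the next “r,” so the two periods have nothing to match.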
Here are some other patterns that you’ll find useful. Needless to say, this is just the tip of the iceberg; as you get more comfortable with grep, you can learn more and more variations for text patterns.
Pattern            What it matches
[<characters>]     Lines that have any of the characters listed in <characters>
^pattern           Lines that start with the given pattern
pattern$           Lines that end with the given pattern
pattern*           Zero or more repetitions of the character (or bracketed set) just before the *
[^<characters>]    Lines that have at least one character not listed in <characters>
Note the dual use of ^: as the first character within brackets [ ], it means “match any character except these,” but as the first symbol of the pattern, it represents the start of a line.
As mentioned, there are many more, but this is a start.
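The start-of-line, end-of-line, and repetition symbols can be tried out on a handful of lines. The following is a rough sketch, assuming a POSIX-style shell; the sample lines and variable names are invented for this illustration.

```shell
# Sample lines invented for this illustration.
sample=$(printf '%s\n' "stop sign" "bus stop" "nonstop" "stp")

# ^stop : only lines that begin with "stop"
starts=$(printf '%s\n' "$sample" | grep "^stop")

# stop$ : only lines that end with "stop"
ends=$(printf '%s\n' "$sample" | grep "stop$")

# sto*p : "st", then zero or more "o" characters, then "p"
stars=$(printf '%s\n' "$sample" | grep "sto*p")

printf 'starts with stop:\n%s\n\n' "$starts"
printf 'ends with stop:\n%s\n\n' "$ends"
printf 'matches sto*p:\n%s\n' "$stars"
```

Note that sto*p lets all four lines through, including stp, because “zero or more” allows the o to be absent entirely.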
A Few Examples
It’s the patterns that truly reveal grep’s potential power. For example, try this:
grep "[qz]"
Here’s what appears on the screen if the user types “hello world,” “quit bugging me,” “Quit bugging me,” “what's up,” “Zounds!,” “zoundz!,” then Control-d:
hello world
quit bugging me
quit bugging me
Quit bugging me
what's up
Zounds!
zoundz!
zoundz!
Since only “quit bugging me” and “zoundz!” contain a lowercase q or z, only those lines are repeated by grep.
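The same six lines can also be piped into grep, which makes the result easier to see because the typed input is not mixed in with the output. This is a sketch, assuming a POSIX-style shell; the variable names typed and qz_matches are only for illustration.

```shell
# The six lines the user typed in the example above.
typed=$(printf '%s\n' "hello world" "quit bugging me" "Quit bugging me" "what's up" "Zounds!" "zoundz!")

# Keep only the lines that contain a lowercase q or z somewhere.
qz_matches=$(printf '%s\n' "$typed" | grep "[qz]")

printf '%s\n' "$qz_matches"
```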
Negations ([^ ]) may seem unintuitive at first but after some consideration their behavior does make sense:
grep "[^qz]"
At first, one might think that this will match only lines that contain neither q nor z. However, this is not the case:
hello world
hello world
quit bugging me
quit bugging me
Quit bugging me
Quit bugging me
what's up
what's up
Zounds!
Zounds!
zoundz!
zoundz!
That’s because if a line has any character that is neither a q nor a z, then grep considers that to be a match. Only lines that consist entirely of qs and zs will not match:
qqqq
zzzzzzz
qzqzzzq
qq
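A quick way to confirm this behavior is to mix ordinary lines with lines made up only of qs and zs. This is a minimal sketch, assuming a POSIX-style shell; the variable name negated is only for illustration.

```shell
# Two ordinary lines plus two lines made up entirely of q and z.
negated=$(printf '%s\n' "hello world" "quit bugging me" "qqqq" "qzqzzzq" | grep "[^qz]")

# The all-q/z lines are the only ones with no "other" character to match.
printf '%s\n' "$negated"
```

Here “quit bugging me” still comes through even though it contains a q, because its other characters satisfy [^qz].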
The key to matching lines that don’t contain those characters at all is to combine the negated brackets with ^, *, and $:
grep "^[^qz]*$"
This pattern says that no character from the beginning to the end of the line may be a q or a z:
hello world
hello world
quit bugging me
Quit bugging me
Quit bugging me
what's up
what's up
Zounds!
Zounds!
zoundz!
Remember again, though, that grep is case-sensitive by default, so you need to include Q and Z in your pattern if you want to factor in capital letters.
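The difference that the capital letters make can be seen by running both versions of the pattern on the same input. This is a sketch, assuming a POSIX-style shell; the variable names are invented for this illustration.

```shell
# The six lines from the earlier examples.
typed=$(printf '%s\n' "hello world" "quit bugging me" "Quit bugging me" "what's up" "Zounds!" "zoundz!")

# Lowercase only: lines with a capital Q or Z still get through.
no_qz=$(printf '%s\n' "$typed" | grep "^[^qz]*$")

# Both cases listed: lines with q, z, Q, or Z are all filtered out.
no_qz_any_case=$(printf '%s\n' "$typed" | grep "^[^qzQZ]*$")

printf 'without q or z:\n%s\n\n' "$no_qz"
printf 'without q, z, Q, or Z:\n%s\n' "$no_qz_any_case"
```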