grep in Y minutes

grep is the ultimate text search tool available on virtually all Linux machines. While there are now better alternatives (such as ripgrep), you will still often find yourself on a server where grep is the only search tool available. So it's nice to have a working knowledge of it.

Basics · Recursive search · Search options · Output options · Final thoughts

✨ This is an open source guide. Feel free to improve it!

Basics

Basically, grep works like this:

  • You give it a search pattern and a file.
  • grep reads the file line by line, printing the lines that match the pattern and ignoring others.
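
As a self-contained illustration of the two steps above, the sketch below builds a small throwaway file (its name and contents are made up for this example) and lets grep print only the lines that contain the pattern:

```shell
# Create a scratch file; the path and contents are made up for illustration.
tmp=$(mktemp -d)
printf '%s\n' 'status codes' 'nothing here' 'more codes' > "$tmp/sample.txt"

# grep prints the lines that contain "codes" and skips the rest.
matches=$(grep codes "$tmp/sample.txt")
echo "$matches"

rm -rf "$tmp"
```

Only the first and third lines are printed; the line without codes is silently skipped.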

Let's look at an example. We'll search the httpurr source code, which I've already downloaded to the /opt/httpurr directory like this:

cd /opt
curl -OL https://github.com/rednafi/httpurr/archive/refs/tags/v0.1.2.tar.gz
tar xvzf v0.1.2.tar.gz
mv httpurr-0.1.2 httpurr
cd httpurr

Search in file · Matches · Regular expressions · Fixed strings · Multiple patterns

Search in file

Let's find all occurrences of the word codes in README.md:

grep -n codes README.md
3:    <strong><i> >> HTTP status codes on speed dial << </i></strong>
30:* List the HTTP status codes:
54:* Filter the status codes by categories:
124:		  Print HTTP status codes by category with --list;
131:		  Print HTTP status codes

grep read the contents of README.md, and for each line that contained codes, grep printed it to the terminal.

grep also included the line number for each line, thanks to the -n (--line-number) option.

Not all grep versions support the long option syntax (e.g. --line-number). If you get an error using the long version, try the short one (e.g. -n) — it may work fine.

Matches

grep uses partial matches by default:

grep -n descr README.md
81:* Display the description of a status code:
127:		  Print the description of an HTTP status code

The word description matches the descr search pattern.

To search for whole words instead, use the -w (--word-regexp) option:

grep -n --word-regexp code README.md
81:* Display the description of a status code:
84:	httpurr --code 410
94:	The HyperText Transfer Protocol (HTTP) 410 Gone client error response code
99:	code should be used instead.
126:	  -c, --code [status code]
127:		  Print the description of an HTTP status code

grep found lines containing the word code, but not codes. Try removing --word-regexp and see how the results change.

When using multiple short options, you can combine them like this: grep -nw code README.md. This gives exactly the same result as using the separate options (-n -w).
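
To check the difference between partial and whole-word matching without the httpurr sources, here is a minimal sketch on a made-up file (the file name and its three lines are assumptions for illustration). It also compares separate and combined short options:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'status code' 'status codes' 'decode it' > "$tmp/words.txt"

# Partial match: "code" is also found inside "codes" and "decode".
partial=$(grep -n code "$tmp/words.txt")

# Whole-word match: separate and combined short options behave the same.
separate=$(grep -n -w code "$tmp/words.txt")
combined=$(grep -nw code "$tmp/words.txt")

echo "$partial"
echo "$combined"
rm -rf "$tmp"
```

The partial search prints all three lines, while both whole-word variants print only the first one.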

To match whole lines instead of parts of lines or whole words, use the -x (--line-regexp) option:

grep -n --line-regexp end httpurr.rb
47:end

Try removing --line-regexp and see how the results change.
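
A small sketch of how strict whole-line matching is, using a made-up file (name and contents are assumptions). Even leading whitespace is enough to reject a line:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'end' '  end' 'the end' 'ending' > "$tmp/lines.txt"

# Without -x, every line containing "end" matches (-c counts them).
any=$(grep -c end "$tmp/lines.txt")

# With -x (--line-regexp), the pattern must match the entire line,
# so even the indented "  end" is rejected.
whole=$(grep -nx end "$tmp/lines.txt")

echo "$any"
echo "$whole"
rm -rf "$tmp"
```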

Regular expressions

To make grep use regular expressions (Perl-compatible regular expressions in grep terminology), use the -P (--perl-regexp) option.

Let's find all lines with a word that contains res followed by other letters:

grep -Pn 'res\w+' README.md
94:	The HyperText Transfer Protocol (HTTP) 410 Gone client error response code
95:	indicates that access to the target resource is no longer available at the
152:of the rest.

\w+ means "one or more word-like characters" (e.g. letters like p or o, but not punctuation like . or !), so response, resource, and rest all match.

Regular expression dialects in grep

Without --perl-regexp, grep treats the search pattern as something called a basic regular expression. While regular expressions are quite common in the software world, the basic dialect is really weird (for example, it treats +, ? and | as ordinary characters), so it's better not to use it at all.

Another dialect supported by grep is extended regular expressions. You can use the -E (--extended-regexp) option to enable them. Extended regular expressions are close to the regular expressions found in most programming languages, but lack Perl-style features such as \d, lazy quantifiers, and lookarounds. So I wouldn't use them either.

Some grep versions do not support --perl-regexp. For those, --extended-regexp is the best you can get.
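
The difference between the basic and extended dialects can be seen on a tiny made-up file. This is only a sketch; the file name and contents are assumptions for illustration:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'ab+' 'abbb' 'a' > "$tmp/dialects.txt"

# Basic dialect (the default): "+" is an ordinary character,
# so only the line with a literal "ab+" matches.
basic=$(grep 'ab+' "$tmp/dialects.txt")

# Extended dialect (-E): "+" means "one or more of the previous item",
# so every line containing "a" followed by at least one "b" matches.
extended=$(grep -E 'ab+' "$tmp/dialects.txt")

echo "$basic"
echo "$extended"
rm -rf "$tmp"
```

The same pattern selects one line in the basic dialect and two lines in the extended one, which is why it pays to always state the dialect explicitly.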

Suppose we are only interested in four-letter words starting with res:

grep -Pn 'res\w\b' README.md
152:of the rest.

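\b means "word boundary": the position between a word character and a non-word character such as a space, a punctuation character, or the start or end of a line. So rest matches, but response and resource don't. Note that the pattern has no \b at the start, so it would also match the tail of a longer word such as interest; use '\bres\w\b' to require that the word begins with res.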
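
The effect of word boundaries can be checked on a few made-up words. The sketch assumes a grep built with -P support (GNU grep usually is); the file name and contents are illustrative only:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'rest' 'response' 'interest' > "$tmp/res.txt"

# "res", one word character, then a word boundary.
# "interest" matches too, because nothing anchors the start of the word.
loose=$(grep -P 'res\w\b' "$tmp/res.txt")

# A leading \b requires the word to start with "res".
strict=$(grep -P '\bres\w\b' "$tmp/res.txt")

echo "$loose"
echo "$strict"
rm -rf "$tmp"
```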

Finally, let's search for 3-digit numbers (showing first 10 matches with head):

grep -Pn '\d\d\d' README.md | head
45:	100    Continue
46:	101    Switching Protocols
47:	102    Processing
48:	103    Early Hints
69:	200    OK
70:	201    Created
71:	202    Accepted
72:	203    Non-Authoritative Information
73:	204    No Content
74:	205    Reset Content

A full tutorial on regular expressions is beyond the scope of this guide, but grep's "Perl-compatible" syntax is documented in the PCRE2 manual.

Fixed strings

What if we want to search for a literal string instead of a regular expression? Suppose we are interested in the word code followed by a dot:

grep -Pn 'code.' src/data.go | head
8:The HTTP 100 Continue informational status response code indicates that
14:status code in response before sending the body.
31:The HTTP 101 Switching Protocols response code indicates a protocol to which the
53:Deprecated: This status code is deprecated. When used, clients may still accept
56:The HTTP 102 Processing informational status response code indicates to client
59:This status code is only sent if the server expects the request to take
112:The HTTP 200 OK success status response code indicates that the request has
141:The HTTP 201 Created success status response code indicates that the request has
149:The common use case of this status code is as the result of a POST request.
165:The HTTP 202 Accepted response status code indicates that the request has been

Since . means "any character" in regular expressions, our pattern also matches code followed by a space, codes, and other cases we are not interested in.

To treat the pattern as a literal string, use the -F (--fixed-strings) option:

grep -Fn 'code.' src/data.go
197:to responses with any status code.
283:Browsers accessing web pages will never encounter this status code.
695:to an error code.
1027:erroneous cases it happens, they will handle it as a generic 400 status code.
1051:Regular web servers will normally not return this status code. But some other
1418:then the server responds with the 510 status code.

Much better!
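
The same contrast in a self-contained sketch on a made-up file (name and contents are assumptions). It also shows the usual alternative to -F, escaping the dot with a backslash:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'status code.' 'status codes' 'code 410' > "$tmp/fixed.txt"

# As a regular expression, "." matches any character: all three lines match.
regex=$(grep -c 'code.' "$tmp/fixed.txt")

# With -F (--fixed-strings), "." only matches a literal dot.
fixed=$(grep -Fn 'code.' "$tmp/fixed.txt")

# Escaping the dot gives the same result without -F.
escaped=$(grep -n 'code\.' "$tmp/fixed.txt")

echo "$regex"
echo "$fixed"
rm -rf "$tmp"
```

-F is usually the simpler choice when the whole pattern is literal; escaping is handy when only part of it is.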

Multiple patterns

To search for multiple patterns, list them with the -e (--regexp) option. grep will output lines matching at least one of the specified patterns.

For example, search for make or run:

grep -En -e make -e run README.md
139:* Go to the root directory and run:
141:	make init
145:	make lint
149:	make test

Unfortunately, grep can't use Perl-compatible regular expressions (-P) with multiple patterns. So we are stuck with the extended (-E) dialect.
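
One way around this limitation is a single pattern with alternation (|), which both the extended and the Perl-compatible dialects understand. The sketch below uses -E on a made-up file (name and contents are assumptions) to show that the two forms select the same lines:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'make init' 'go run .' 'brew install' > "$tmp/cmds.txt"

# Two patterns given with -e ...
multi=$(grep -En -e make -e run "$tmp/cmds.txt")

# ... select the same lines as one pattern with alternation.
alt=$(grep -En 'make|run' "$tmp/cmds.txt")

echo "$multi"
rm -rf "$tmp"
```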

If you have many patterns, it may be easier to put them in a file and point grep to it with -f (--file):

echo 'install' > /tmp/patterns.txt
echo 'make' >> /tmp/patterns.txt
echo 'run' >> /tmp/patterns.txt

grep -En --file=/tmp/patterns.txt README.md
13:* On MacOS, brew install:
17:	    && brew install httpurr
20:* Or elsewhere, go install:
23:	go install github.com/rednafi/httpurr/cmd/httpurr
139:* Go to the root directory and run:
141:	make init
145:	make lint
149:	make test

Recursive search

grep searches directories recursively when called with the -r (--recursive) option.

Search in directory · File globs · Binary files

Search in directory

Let's find all unexported functions (they start with a lowercase letter):

grep -Pnr 'func [a-z]\w+' .
./cmd/httpurr/main.go:12:func main() {
./src/cli.go:16:func formatStatusText(text string) string {
./src/cli.go:21:func printHeader(w *tabwriter.Writer) {
./src/cli.go:35:func printStatusCodes(w *tabwriter.Writer, category string) error {
./src/cli.go:105:func printStatusText(w *tabwriter.Writer, code string) error {

This search returned matches from both the cmd and src directories. If you are only interested in cmd, specify it instead of .:

grep -Pnr 'func [a-z]\w+' cmd
cmd/httpurr/main.go:12:func main() {

To search multiple directories, list them all like this:

grep -Pnr 'func [a-z]\w+' cmd src
cmd/httpurr/main.go:12:func main() {
src/cli.go:16:func formatStatusText(text string) string {
src/cli.go:21:func printHeader(w *tabwriter.Writer) {
src/cli.go:35:func printStatusCodes(w *tabwriter.Writer, category string) error {
src/cli.go:105:func printStatusText(w *tabwriter.Writer, code string) error {

File globs

Let's search for httpurr:

grep -Pnr --max-count=5 httpurr .
./README.md:2:    <h1>ᗢ httpurr</h1>
./README.md:16:	brew tap rednafi/httpurr https://github.com/rednafi/httpurr \
./README.md:17:	    && brew install httpurr
./README.md:23:	go install github.com/rednafi/httpurr/cmd/httpurr
./README.md:33:	httpurr --list
./cmd/httpurr/main.go:4:	"github.com/rednafi/httpurr/src"
./go.mod:1:module github.com/rednafi/httpurr
./httpurr.rb:7:  homepage "https://github.com/rednafi/httpurr"
./httpurr.rb:12:      url "https://github.com/rednafi/httpurr/releases/download/v0.1.1/httpurr_Darwin_x86_64.tar.gz"
./httpurr.rb:16:        bin.install "httpurr"
./httpurr.rb:20:      url "https://github.com/rednafi/httpurr/releases/download/v0.1.1/httpurr_Darwin_arm64.tar.gz"
./httpurr.rb:24:        bin.install "httpurr"
./src/cli.go:24:	fmt.Fprintf(w, "\nᗢ httpurr\n")
./src/cli_test.go:64:	want := "\nᗢ httpurr\n==========\n\n"

Note that I have limited the number of results per file to 5 with the -m (--max-count) option to keep the results readable in case there are many matches.

Quite a lot of results. Let's narrow it down by searching only in .go files:

grep -Pnr --include='*.go' httpurr .
./cmd/httpurr/main.go:4:	"github.com/rednafi/httpurr/src"
./src/cli.go:24:	fmt.Fprintf(w, "\nᗢ httpurr\n")
./src/cli_test.go:64:	want := "\nᗢ httpurr\n==========\n\n"

The --include option (there is no short version) takes a glob (filename pattern), typically containing a fixed part (.go in our example) and a wildcard * ("anything but the path separator").

Another example — search in files named http-something:

grep -Pnr --include='http*' httpurr .
./httpurr.rb:7:  homepage "https://github.com/rednafi/httpurr"
./httpurr.rb:12:      url "https://github.com/rednafi/httpurr/releases/download/v0.1.1/httpurr_Darwin_x86_64.tar.gz"
./httpurr.rb:16:        bin.install "httpurr"
./httpurr.rb:20:      url "https://github.com/rednafi/httpurr/releases/download/v0.1.1/httpurr_Darwin_arm64.tar.gz"
./httpurr.rb:24:        bin.install "httpurr"
./httpurr.rb:31:      url "https://github.com/rednafi/httpurr/releases/download/v0.1.1/httpurr_Linux_arm64.tar.gz"
./httpurr.rb:35:        bin.install "httpurr"
./httpurr.rb:39:      url "https://github.com/rednafi/httpurr/releases/download/v0.1.1/httpurr_Linux_x86_64.tar.gz"
./httpurr.rb:43:        bin.install "httpurr"

To negate the glob, use the --exclude option. For example, search everywhere except the .go files:

grep -Pnr --exclude '*.go' def .
./.goreleaser.yml:1:# This is an example .goreleaser.yml file with some sensible defaults.
./httpurr.rb:15:      def install
./httpurr.rb:21:      sha256 "82acefd1222f6228636f2cda6518e0316f46624398adc722defb55c68ac3bb30"
./httpurr.rb:23:      def install
./httpurr.rb:34:      def install
./httpurr.rb:42:      def install

To apply multiple filters, specify multiple glob options. For example, find all functions except those in test files:

grep -Pnr --include '*.go' --exclude '*_test.go' 'func ' .
./cmd/httpurr/main.go:12:func main() {
./src/cli.go:16:func formatStatusText(text string) string {
./src/cli.go:21:func printHeader(w *tabwriter.Writer) {
./src/cli.go:35:func printStatusCodes(w *tabwriter.Writer, category string) error {
./src/cli.go:105:func printStatusText(w *tabwriter.Writer, code string) error {
./src/cli.go:123:func Cli(w *tabwriter.Writer, version string, exitFunc func(int)) {
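
To experiment with glob filters on a tree you control, here is a minimal sketch. The directory layout and file contents are made up, and -l (print only file names) is used to keep the output short:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
echo 'func main() {}'     > "$tmp/src/main.go"
echo 'func TestMain() {}' > "$tmp/src/main_test.go"
echo 'func notes'         > "$tmp/README.md"

# Only .go files are searched; README.md is skipped.
# sort makes the order of the file names predictable.
go_only=$(cd "$tmp" && grep -rl --include='*.go' 'func ' . | sort)

# Adding --exclude removes the test file from that selection.
no_tests=$(cd "$tmp" && grep -rl --include='*.go' --exclude='*_test.go' 'func ' . | sort)

echo "$go_only"
echo "$no_tests"
rm -rf "$tmp"
```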

Binary files

By default, grep does not ignore binary files:

grep -Pnr aha .
grep: ./data.bin: binary file matches

Most of the time, this is probably not what you want. If you're searching in a directory that might contain binary files, it's better to ignore them altogether with the -I (--binary-files=without-match) option:

grep -Pnr --binary-files=without-match aha .
(not found)

If for some reason you want grep to search binary files and print the actual matches (as it does with text files), use the -a (--text) option.
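
The three behaviours can be compared side by side. This is a sketch on made-up files: data.bin is given a NUL byte so that grep considers it binary, and -l (print only file names) keeps the output short:

```shell
tmp=$(mktemp -d)
echo 'aha, plain text' > "$tmp/notes.txt"
printf 'aha\000bin\n'  > "$tmp/data.bin"   # the NUL byte marks the file as binary

# Default: the binary file is searched and counts as a match.
all=$(cd "$tmp" && grep -rl aha . | sort)

# -I (--binary-files=without-match): binary files are treated as non-matching.
text_only=$(cd "$tmp" && grep -rIl aha . | sort)

# -a (--text): the matching line of the binary file is printed like text.
# tr strips the NUL byte so the result is safe to store in a variable.
as_text=$(grep -a aha "$tmp/data.bin" | tr -d '\000')

echo "$all"
echo "$text_only"
echo "$as_text"
rm -rf "$tmp"
```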

Search options

grep supports a couple of additional search options you may find handy.

Ignore case · Inverse matching

Ignore case

Let's find all occurrences of the word codes in README.md:

grep -Pnr codes README.md
3:    <strong><i> >> HTTP status codes on speed dial << </i></strong>
30:* List the HTTP status codes:
54:* Filter the status codes by categories:
124:		  Print HTTP status codes by category with --list;
131:		  Print HTTP status codes

This matches codes but not Codes, because grep is case-sensitive by default. To change this, use -i (--ignore-case):

grep -Pnr --ignore-case codes README.md
3:    <strong><i> >> HTTP status codes on speed dial << </i></strong>
30:* List the HTTP status codes:
40:	Status Codes
54:* Filter the status codes by categories:
64:	Status Codes
124:		  Print HTTP status codes by category with --list;
131:		  Print HTTP status codes

Inverse matching

To find lines that do not contain the pattern, use -v (--invert-match). For example, find the non-empty lines without the @ symbol:

grep -Enr --invert-match -e '@' -e '^$' Makefile
1:.PHONY: lint
2:lint:
8:.PHONY: lint-check
9:lint-check:
14:.PHONY: test
15:test:
20:.PHONY: clean
21:clean:
27:.PHONY: init
28:init:
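
With several -e patterns, --invert-match keeps only the lines that match none of them. A minimal sketch on a made-up Makefile-like file (name and contents are assumptions):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'lint:' '@echo linting' '' 'test:' > "$tmp/sample.mk"

# Drop lines that contain "@" and lines that are empty ("^$");
# whatever is left matched neither pattern.
kept=$(grep -nv -e '@' -e '^$' "$tmp/sample.mk")

echo "$kept"
rm -rf "$tmp"
```

Lines 2 and 3 are filtered out, so only the two target names remain.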

Output options

grep supports a number of additional output options you may find handy.

Count matches · Limit matches · Show matches only · Show files only · Show context · Silent mode · Colors

Count matches

To count the number of matched lines (per file), use -c (--count). For example, count the number of functions in each .go file:

grep -Pnr --count --include '*.go' 'func ' .
./cmd/httpurr/main.go:1
./src/cli.go:5
./src/cli_test.go:10
./src/data_test.go:2

Note that --count counts the number of matching lines, not the number of matches. For example, the word string occurs six times in src/cli.go, but two of the occurrences are on the same line, so --count reports 5:

grep -nrw --count string src/cli.go
5
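
If the number of individual matches is what matters, one common approach is to print every match on its own line with -o (--only-matching) and count those lines with wc. A sketch on a made-up file (name and contents are assumptions):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'func f(text string) string {' 'var s string' 'return s' > "$tmp/count.txt"

# -c counts matching lines: two lines contain the word "string".
lines=$(grep -cw string "$tmp/count.txt")

# -o prints each match on its own line, so wc -l counts matches.
# tr removes the padding some wc implementations add.
matches=$(grep -ow string "$tmp/count.txt" | wc -l | tr -d ' ')

echo "$lines"
echo "$matches"
rm -rf "$tmp"
```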

Limit matches

To limit the number of matching lines per file, use the -m (--max-count) option:

grep -Pnrw --max-count=5 func .
./cmd/httpurr/main.go:12:func main() {
./src/cli.go:16:func formatStatusText(text string) string {
./src/cli.go:21:func printHeader(w *tabwriter.Writer) {
./src/cli.go:35:func printStatusCodes(w *tabwriter.Writer, category string) error {
./src/cli.go:105:func printStatusText(w *tabwriter.Writer, code string) error {
./src/cli.go:123:func Cli(w *tabwriter.Writer, version string, exitFunc func(int)) {
./src/cli_test.go:15:func TestFormatStatusText(t *testing.T) {
./src/cli_test.go:54:func TestPrintHeader(t *testing.T) {
./src/cli_test.go:71:func TestPrintStatusCodes(t *testing.T) {
./src/cli_test.go:159:		t.Run(want, func(t *testing.T) {
./src/cli_test.go:168:func TestPrintStatusText(t *testing.T) {
./src/data_test.go:9:func TestStatusCodes(t *testing.T) {
./src/data_test.go:99:func TestStatusCodeMap(t *testing.T) {

With --max-count=N, grep stops searching the file after finding the first N matching lines (or non-matching lines if used with --invert-match).
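
Because the limit applies to each file separately, the total number of printed lines can still be large. The sketch below, on two made-up files, contrasts -m with piping to head, which caps the overall output instead:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'func a' 'func b' 'func c' > "$tmp/one.go"
printf '%s\n' 'func d' 'func e' 'func f' > "$tmp/two.go"

# -m 2: at most two matching lines from each file (four lines in total).
per_file=$(cd "$tmp" && grep -n -m 2 func one.go two.go)

# head -n 2: at most two lines overall, regardless of the file.
overall=$(cd "$tmp" && grep -n func one.go two.go | head -n 2)

echo "$per_file"
echo "$overall"
rm -rf "$tmp"
```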

Show matches only

By default, grep prints the entire line containing the match. To show only the matching part, use -o (--only-matching).

Suppose we want to find functions named print-something:

grep -Pnr --only-matching --include '*.go' 'func print\w+' .
./src/cli.go:21:func printHeader
./src/cli.go:35:func printStatusCodes
./src/cli.go:105:func printStatusText

The results are much cleaner than without --only-matching (try removing the option in the above command and see for yourself).
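
When a line contains several matches, --only-matching prints each of them on a separate line. A minimal sketch with a made-up input line, using the extended dialect:

```shell
tmp=$(mktemp -d)
echo 'codes 200 and 404, but not 5xx' > "$tmp/only.txt"

# Each three-digit number is printed on its own line,
# even though both come from the same input line.
numbers=$(grep -oE '[0-9]{3}' "$tmp/only.txt")

echo "$numbers"
rm -rf "$tmp"
```

This makes -o a convenient way to extract values from text, not just to tidy up the output.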

Show files only

If there are too many matches, you may prefer to show only the files where the matches occurred. Use -l (--files-with-matches) to do this:

grep -Pnr --files-with-matches 'httpurr' .
./README.md
./cmd/httpurr/main.go
./go.mod
./httpurr.rb
./src/cli.go
./src/cli_test.go

Show context

Let's search for GitHub action jobs:

grep -Pnr 'jobs:' .github/workflows
.github/workflows/automerge.yml:8:jobs:
.github/workflows/lint.yml:11:jobs:
.github/workflows/release.yml:10:jobs:
.github/workflows/test.yml:11:jobs:

These results are kind of useless, because they don't return the actual job name (which is on the next line after jobs). To fix this, let's use -C (--context), which shows N lines around each match:

grep -Pnr --context=1 'jobs:' .github/workflows
.github/workflows/automerge.yml-7-
.github/workflows/automerge.yml:8:jobs:
.github/workflows/automerge.yml-9-  dependabot:
--
.github/workflows/lint.yml-10-
.github/workflows/lint.yml:11:jobs:
.github/workflows/lint.yml-12-  golangci:
--
.github/workflows/release.yml-9-
.github/workflows/release.yml:10:jobs:
.github/workflows/release.yml-11-  goreleaser:
--
.github/workflows/test.yml-10-
.github/workflows/test.yml:11:jobs:
.github/workflows/test.yml-12-  test:

It might be even better to show only the next line after the match, since we are not interested in the previous one. Use -A (--after-context) for this:

grep -Pnr --after-context=1 'jobs:' .github/workflows
.github/workflows/automerge.yml:8:jobs:
.github/workflows/automerge.yml-9-  dependabot:
--
.github/workflows/lint.yml:11:jobs:
.github/workflows/lint.yml-12-  golangci:
--
.github/workflows/release.yml:10:jobs:
.github/workflows/release.yml-11-  goreleaser:
--
.github/workflows/test.yml:11:jobs:
.github/workflows/test.yml-12-  test:

There is also -B (--before-context) for showing N lines before the match.

Nice!
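
The three context options can be compared on a single made-up workflow file (name and contents are assumptions). In the output, ":" after the line number marks a matching line and "-" marks a context line:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'name: test' 'on: push' 'jobs:' '  build:' '    steps:' > "$tmp/ci.yml"

after=$(grep -n -A 1 'jobs:' "$tmp/ci.yml")    # the match plus 1 line after it
before=$(grep -n -B 1 'jobs:' "$tmp/ci.yml")   # 1 line before the match plus the match
around=$(grep -n -C 1 'jobs:' "$tmp/ci.yml")   # 1 line on each side of the match

echo "$around"
rm -rf "$tmp"
```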

Silent mode

Sometimes you just want to know if a file contains a certain string; you don't care about the number or positions of the matches.

To make grep quit immediately after the first match and not print anything, use the -q (--quiet or --silent) option. Use the exit status ($?) to see if grep found anything (0 means found, 1 means not found):

grep -Pnrw --quiet main cmd/httpurr/main.go
if [ $? = "0" ]; then echo "found!"; else echo "not found"; fi
found!

Try changing the search pattern from main to Main and see how the results change.

When searching in multiple files with --quiet, grep stops after the first match in any file and does not check other files:

grep -Pnrw --quiet main .
if [ $? = "0" ]; then echo "found!"; else echo "not found"; fi
found!
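
A slightly more compact variant, shown here as a sketch on a made-up file, lets if check grep's exit status directly instead of reading $? afterwards:

```shell
tmp=$(mktemp -d)
echo 'func main() {}' > "$tmp/main.go"

# "if" tests grep's exit status directly, so $? is not needed.
if grep -qw main "$tmp/main.go"; then result="found!"; else result="not found"; fi
echo "$result"

# Case matters: "Main" does not occur as a word in the file.
if grep -qw Main "$tmp/main.go"; then other="found!"; else other="not found"; fi
echo "$other"

rm -rf "$tmp"
```

This form also avoids the risk of another command overwriting $? between the search and the check.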

Colors

To highlight matches and line numbers, use the --color=always option:

grep -Pnr --color=always codes README.md
3:    <strong><i> >> HTTP status codes on speed dial << </i></strong>
30:* List the HTTP status codes:
54:* Filter the status codes by categories:
124:		  Print HTTP status codes by category with --list;
131:		  Print HTTP status codes

Use --color=auto to let grep decide whether to use colors based on your terminal. Use --color=never to force no-color mode.
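
The difference matters mostly when the output goes somewhere other than a terminal. The sketch below, on a made-up file, captures the output in variables: with --color=always the text carries ANSI escape sequences, with --color=never it is plain:

```shell
tmp=$(mktemp -d)
echo 'status codes' > "$tmp/color.txt"

# Plain output: exactly the matching line.
plain=$(grep --color=never codes "$tmp/color.txt")

# Forced colors: the same line, wrapped in escape sequences.
colored=$(grep --color=always codes "$tmp/color.txt")

echo "$plain"
echo "$colored"
rm -rf "$tmp"
```

So --color=always is useful when piping into a pager that understands escape sequences (for example less -R), but it gets in the way when another program has to parse the output.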

Final thoughts

That's it! We've covered just about everything grep can do. Unfortunately, it doesn't support replacing text, reading options from a configuration file, or other fancy features provided by grep alternatives like ack or ripgrep. But grep is still quite powerful, as you can probably see now.

Use grep --help to quickly see all supported options and see the official guide for option descriptions.

Have fun grepping!

Anton Zhiyanov · original · CC-BY-NC-ND-4.0 · 2024-03-29