Type: Linux search command.

Introduction

“grep” stands for “global/regular expression/print”.

grep searches its input files for lines that match a given pattern and prints the matching lines.

grep places no limit on the length of input lines other than available memory.

Basic Usage

The general form is:

grep [OPTIONS]... PATTERN [FILES]...
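As a minimal sketch of the synopsis above, the commands below search one named file and then standard input. The file name sample.txt and its contents are invented for illustration.

```shell
# Work in a scratch directory so nothing on the system is touched.
tmpdir=$(mktemp -d)
printf 'apple pie\nbanana bread\napple tart\n' > "$tmpdir/sample.txt"

# PATTERN followed by one FILE operand.
from_file=$(grep 'apple' "$tmpdir/sample.txt")

# With no FILE operand, grep reads standard input.
from_stdin=$(printf 'apple pie\nbanana bread\n' | grep 'banana')

printf '%s\n' "$from_file" "$from_stdin"
rm -rf "$tmpdir"
```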

Options

There are several types of options.

Generic Program Information

--help:
       print the help text.

-V, --version:
       print version information.

Matching Control

-e PATTERN, --regexp=PATTERN:
       use PATTERN for matching.

-f FILE, --file=FILE:
       obtain patterns from FILE, one per line.

-i, -y, --ignore-case:
       ignore case distinctions.

-v, --invert-match:
       select non-matching lines.

-w, --word-regexp:
       force PATTERN to match only whole words.

-x, --line-regexp:
       force PATTERN to match only whole lines.
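The matching-control options can be tried on a small sample, sketched here assuming GNU grep. The files log.txt and patterns.txt and their contents are made up for the example.

```shell
tmpdir=$(mktemp -d)
printf 'Error: disk full\nerror: retry\nwarning: low battery\nerrors\nok\n' > "$tmpdir/log.txt"
printf 'disk\nbattery\n' > "$tmpdir/patterns.txt"

# -e can be repeated; a line is selected if any pattern matches.
multi=$(grep -e 'disk' -e 'ok' "$tmpdir/log.txt")
# -f reads one pattern per line from a file.
from_list=$(grep -f "$tmpdir/patterns.txt" "$tmpdir/log.txt")
# -i ignores case; -w requires whole-word matches, so "errors" is skipped.
whole_word=$(grep -i -w 'error' "$tmpdir/log.txt")
# -x requires the pattern to match the entire line.
whole_line=$(grep -x 'ok' "$tmpdir/log.txt")
# -v selects the lines that do NOT match.
inverted=$(grep -v -i 'error' "$tmpdir/log.txt")

printf '%s\n' "$whole_word"
rm -rf "$tmpdir"
```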

General Output Control

-c, --count:
       print only a count of matching lines per FILE.

--color[=WHEN], --colour[=WHEN]:
       use markers to highlight the matching strings; WHEN is ‘always’, ‘never’ or ‘auto’.

-L, --files-without-match:
       print only names of FILEs containing no match.

-l, --files-with-matches:
       print only names of FILEs containing matches.

-m NUM, --max-count=NUM:
       stop after NUM matching lines.

-o, --only-matching:
       show only the part of a line matching PATTERN.

-q, --quiet, --silent:
       suppress all normal output.

-s, --no-messages:
       suppress error messages.
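A short sketch of the output-control options follows, assuming GNU grep. The files a.txt and b.txt are invented sample data.

```shell
tmpdir=$(mktemp -d)
printf 'foo bar\nfoo baz\nqux\n' > "$tmpdir/a.txt"
printf 'nothing here\n' > "$tmpdir/b.txt"

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c 'foo' "$tmpdir/a.txt")
# -l lists files with at least one match; -L lists files with none.
with_match=$(cd "$tmpdir" && grep -l 'foo' a.txt b.txt)
without_match=$(cd "$tmpdir" && grep -L 'foo' a.txt b.txt)
# -m 1 stops reading after the first matching line.
first_only=$(grep -m 1 'foo' "$tmpdir/a.txt")
# -o prints each matched part on its own line, not the whole line.
matched_part=$(grep -o 'ba[rz]' "$tmpdir/a.txt")
# -q prints nothing; only the exit status reports whether a match was found.
if grep -q 'qux' "$tmpdir/a.txt"; then found=yes; else found=no; fi

printf '%s\n' "$count" "$with_match" "$without_match" "$found"
rm -rf "$tmpdir"
```

The -q form is the usual choice in scripts, because the test is the exit status rather than any printed text.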

Output Line Prefix Control

-b, --byte-offset:
       print the byte offset with output lines.

-H, --with-filename:
       print the file name for each match.

-h, --no-filename:
       suppress the file name prefix on output.

--label=LABEL:
       use LABEL as the standard input file name prefix.

-n, --line-number:
       print line number with output lines.

-T, --initial-tab:
       make tabs line up (if needed).

-u, --unix-byte-offsets:
       report offsets as if CRs were not there (MSDOS/Windows).

-Z, --null:
       print 0 byte after FILE name.
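The prefix options are easiest to compare side by side. The following is a sketch assuming GNU grep, with invented files notes.txt and other.txt; the label "input" is likewise arbitrary.

```shell
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/notes.txt"
printf 'beta test\n' > "$tmpdir/other.txt"

# -n prefixes the 1-based line number.
numbered=$(grep -n 'gamma' "$tmpdir/notes.txt")
# -b prefixes the 0-based byte offset of the line within the file.
offset=$(grep -b 'gamma' "$tmpdir/notes.txt")
# -H forces the file name prefix even for a single file.
named=$(cd "$tmpdir" && grep -H 'beta' notes.txt)
# -h suppresses the prefix that would appear with several files.
unnamed=$(cd "$tmpdir" && grep -h 'beta' notes.txt other.txt)
# --label names standard input in the prefix.
labelled=$(printf 'beta\n' | grep -H --label=input 'beta')
# -Z ends each file name with a NUL byte; tr makes the bytes visible here.
nul_names=$(cd "$tmpdir" && grep -lZ 'beta' notes.txt other.txt | tr '\0' ',')

printf '%s\n' "$numbered" "$offset" "$named" "$labelled"
rm -rf "$tmpdir"
```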

Context Line Control

-A NUM, --after-context=NUM:
       print NUM lines of trailing context.

-B NUM, --before-context=NUM:
       print NUM lines of leading context.

-C NUM, -NUM, --context=NUM:
       print NUM lines of output context.
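To illustrate the context options, here is a small sketch assuming GNU grep; the five-line file lines.txt is invented. With -n, matching lines are marked with ‘:’ and context lines with ‘-’.

```shell
tmpdir=$(mktemp -d)
printf 'line1\nline2\nline3 MATCH\nline4\nline5\n' > "$tmpdir/lines.txt"

# One line of trailing context after the match.
after=$(grep -A 1 'MATCH' "$tmpdir/lines.txt")
# One line of leading context before the match.
before=$(grep -B 1 'MATCH' "$tmpdir/lines.txt")
# Two lines on each side of the match.
around=$(grep -C 2 'MATCH' "$tmpdir/lines.txt")
# Combined with -n, context lines use "-" instead of ":" after the number.
numbered=$(grep -n -C 1 'MATCH' "$tmpdir/lines.txt")

printf '%s\n' "$numbered"
rm -rf "$tmpdir"
```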

File and Directory Selection

--binary-files=TYPE:
       assume that binary files are TYPE; TYPE is ‘binary’, ‘text’ or ‘without-match’.

-a, --text:
       equivalent to --binary-files=text.

-I:
       equivalent to --binary-files=without-match.

-D ACTION, --devices=ACTION:
       how to handle devices, FIFOs and sockets; ACTION is ‘read’ or ‘skip’.

-d ACTION, --directories=ACTION:
       how to handle directories; ACTION is ‘read’, ‘recurse’ or ‘skip’.

--exclude=FILE_PATTERN:
       skip files and directories matching FILE_PATTERN.

--include=FILE_PATTERN:
       search only files that match FILE_PATTERN.

--exclude-from=FILE:
       skip files matching any file pattern from FILE.

--exclude-dir=PATTERN:
       skip directories that match PATTERN.

-r, --recursive:
       for each directory operand, read and process all files in that directory recursively. Like ‘--directories=recurse’.

-R, --dereference-recursive:
       for each directory operand, read and process all files in that directory recursively, following all symbolic links.
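A recursive search combined with the include and exclude filters might look like the sketch below, assuming GNU grep. The directory layout (src, build) and file names are hypothetical; the output is sorted only to make the order predictable.

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/src" "$tmpdir/build"
printf 'TODO: refactor\n' > "$tmpdir/src/main.c"
printf 'TODO: write docs\n' > "$tmpdir/src/notes.md"
printf 'TODO: generated\n' > "$tmpdir/build/out.c"

# -r descends into every directory below the operand ".".
all=$(cd "$tmpdir" && grep -r -l 'TODO' . | sort)
# --include restricts the search to file names matching the glob.
only_c=$(cd "$tmpdir" && grep -r -l --include='*.c' 'TODO' . | sort)
# --exclude-dir prunes whole directories by name.
no_build=$(cd "$tmpdir" && grep -r -l --exclude-dir='build' 'TODO' . | sort)

printf '%s\n' "$only_c"
rm -rf "$tmpdir"
```

The globs are quoted so that the shell passes them to grep unexpanded.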

Other Options

--line-buffered:
       flush output on every line.

-U, --binary:
       do not strip CR characters at EOL (MSDOS/Windows).

-z, --null-data:
       a data line ends in 0 byte, not newline.
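As a rough illustration of -z and --line-buffered, assuming GNU grep: with -z each NUL-terminated record is treated as one "line", even if it contains newlines. The sample records are invented.

```shell
# Two NUL-terminated records; the first contains an embedded newline.
# -c counts matching records, so both are counted once each.
record_count=$(printf 'first\nrecord\0second record\0' | grep -z -c 'record')

# Output records are NUL-terminated too; tr converts them for display.
second=$(printf 'first\nrecord\0second record\0' | grep -z 'second' | tr '\0' '\n')

# --line-buffered flushes after every output line, which matters when grep
# sits in the middle of a pipeline that reads a slow or endless stream.
fields=$(printf 'ok\nERROR one\nERROR two\n' | grep --line-buffered 'ERROR' | cut -d' ' -f2)

printf '%s\n' "$record_count" "$second" "$fields"
```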

grep Programs

grep has four major variants, selected by the following options:

-G, --basic-regexp:
       PATTERN is a basic regular expression (BRE). This is the default.

-E, --extended-regexp:
       PATTERN is an extended regular expression (ERE).

-F, --fixed-strings:
       PATTERN is a set of newline-separated strings.

-P, --perl-regexp:
       PATTERN is a Perl regular expression.

In addition, two variant programs egrep and fgrep are available. egrep is the same as ‘grep -E’. fgrep is the same as ‘grep -F’. Both are deprecated in favor of the option forms.
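The same pattern can select different lines depending on the variant. The sketch below assumes GNU grep and an invented three-line file; -P is left out because not every build includes Perl regular expression support.

```shell
tmpdir=$(mktemp -d)
printf 'a+b\naab\na|b\n' > "$tmpdir/variants.txt"

# BRE (the default): an unescaped "+" is an ordinary character.
bre=$(grep -G 'a+b' "$tmpdir/variants.txt")
# ERE: "+" means one or more of the preceding item.
ere=$(grep -E 'a+b' "$tmpdir/variants.txt")
# Fixed strings: no character is special.
fixed=$(grep -F 'a+b' "$tmpdir/variants.txt")
# "|" is alternation in ERE but a literal bar with -F.
ere_alt=$(grep -E 'a|b' "$tmpdir/variants.txt")
fixed_alt=$(grep -F 'a|b' "$tmpdir/variants.txt")

printf 'BRE: %s\nERE: %s\nfixed: %s\n' "$bre" "$ere" "$fixed"
rm -rf "$tmpdir"
```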

Regular Expressions

A regular expression is a pattern that describes a set of strings.

Fundamental Structure

‘.’
       matches any single character.

‘?’
       the preceding item is optional and will be matched at most once.

‘*’
       the preceding item will be matched zero or more times.

‘+’
       the preceding item will be matched one or more times.

‘{n}’
       the preceding item is matched exactly n times.

‘{n,}’
       the preceding item is matched n or more times.

‘{,m}’
       the preceding item is matched at most m times.

‘{n,m}’
       the preceding item is matched at least n times, but not more than m times.

The empty regular expression matches the empty string.
Two regular expressions may be concatenated.
Two regular expressions may be joined by the infix operator ‘|’.
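The repetition operators and alternation can be compared on a word list. This is a sketch assuming GNU grep; it uses -E so the operators need no backslashes, and -x so each pattern must match the whole line. The word list is invented.

```shell
tmpdir=$(mktemp -d)
printf 'color\ncolour\ncolouur\nct\ncat\ncaat\ncaaat\n' > "$tmpdir/words.txt"

optional=$(grep -E -x 'colou?r' "$tmpdir/words.txt")     # "u" zero or one time
star=$(grep -E -x 'ca*t' "$tmpdir/words.txt")            # "a" zero or more times
plus=$(grep -E -x 'ca+t' "$tmpdir/words.txt")            # "a" one or more times
exactly_two=$(grep -E -x 'ca{2}t' "$tmpdir/words.txt")   # "a" exactly twice
two_or_more=$(grep -E -x 'ca{2,}t' "$tmpdir/words.txt")  # "a" at least twice
one_to_two=$(grep -E -x 'ca{1,2}t' "$tmpdir/words.txt")  # "a" once or twice
either=$(grep -E -x 'ct|colour' "$tmpdir/words.txt")     # either alternative

printf '%s\n' "$one_to_two"
rm -rf "$tmpdir"
```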

Character Classes and Bracket Expressions

A bracket expression is a list of characters enclosed by ‘[’ and ‘]’.
It matches any single character in that list; if the first character of the list is the caret ‘^’, then it matches any character not in the list.

‘[:alnum:]’
       alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[0-9A-Za-z]’.

‘[:alpha:]’
       alphabetic characters: ‘[:lower:]’ and ‘[:upper:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[A-Za-z]’.

‘[:blank:]’
       blank characters: space and tab.

‘[:cntrl:]’
       control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL).

‘[:digit:]’
       digits: 0-9.

‘[:graph:]’
       graphical characters: ‘[:alnum:]’ and ‘[:punct:]’.

‘[:lower:]’
       lower-case letters.

‘[:print:]’
       printable characters: ‘[:alnum:]’, ‘[:punct:]’ and space.

‘[:punct:]’
       punctuation characters. In the ‘C’ locale and ASCII character encoding, this includes ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.

‘[:space:]’
       space characters: in the ‘C’ locale, this includes tab, newline, vertical tab, form feed, carriage return and space.

‘[:upper:]’
       upper-case letters.

‘[:xdigit:]’
       hexadecimal digits: 0-9 A-F a-f.
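A class name is only valid inside a bracket expression, so the brackets are doubled: ‘[[:digit:]]’, not ‘[:digit:]’. The sketch below shows a few classes at work on invented sample lines.

```shell
tmpdir=$(mktemp -d)
printf 'abc123\nABC\n   \nhello, world!\n0xFF\n' > "$tmpdir/classes.txt"

# Lines containing at least one digit.
digits=$(grep '[[:digit:]]' "$tmpdir/classes.txt")
# Lines made up entirely of upper-case letters.
upper_only=$(grep -x '[[:upper:]][[:upper:]]*' "$tmpdir/classes.txt")
# Lines containing punctuation.
punct=$(grep '[[:punct:]]' "$tmpdir/classes.txt")
# Number of lines that contain only spaces and tabs.
blank_lines=$(grep -c -x '[[:blank:]]*' "$tmpdir/classes.txt")
# "0x" followed by one or more hexadecimal digits.
hex=$(grep -x '0x[[:xdigit:]][[:xdigit:]]*' "$tmpdir/classes.txt")
# A leading caret negates the list: lines with any non-alphanumeric character.
non_alnum=$(grep -c '[^[:alnum:]]' "$tmpdir/classes.txt")

printf '%s\n' "$digits" "$upper_only" "$punct" "$hex"
rm -rf "$tmpdir"
```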

Backslash Character and Special Expressions

‘\b’
       match the empty string at the edge of a word.

‘\B’
       match the empty string provided it is not at the edge of a word.

‘\<’
       match the empty string at the beginning of a word.

‘\>’
       match the empty string at the end of a word.

‘\w’
       match a word constituent; it is a synonym for ‘[_[:alnum:]]’.

‘\W’
       match a non-word constituent; it is a synonym for ‘[^_[:alnum:]]’.

‘\s’
       match whitespace; it is a synonym for ‘[[:space:]]’.

‘\S’
       match non-whitespace; it is a synonym for ‘[^[:space:]]’.
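The word-edge expressions differ in which side of the word they constrain. A sketch assuming GNU grep, on an invented word list; note that the underscore counts as a word constituent, so “cat_food” is a single word.

```shell
tmpdir=$(mktemp -d)
printf 'cat\nconcatenate\ncat_food\nbobcat\nthe cat sat\n' > "$tmpdir/cats.txt"

# "cat" as a complete word.
whole=$(grep '\bcat\b' "$tmpdir/cats.txt")
# Words that begin with "cat".
starts=$(grep '\<cat' "$tmpdir/cats.txt")
# Words that end with "cat".
ends=$(grep 'cat\>' "$tmpdir/cats.txt")
# "cat" strictly inside a longer word.
inside=$(grep '\Bcat\B' "$tmpdir/cats.txt")
# Lines containing whitespace.
spaced=$(grep '\s' "$tmpdir/cats.txt")
# Runs of word constituents; "-" and the space split the input.
runs=$(printf 'a_1 b-2\n' | grep -o '\w\w*')

printf '%s\n' "$whole"
rm -rf "$tmpdir"
```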

Anchoring

The caret ‘^’ matches the beginning of a line.
The dollar sign ‘$’ matches the end of a line.
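A quick sketch of the two anchors, used separately and together, on invented input. Each command prints a count with -c.

```shell
tmpdir=$(mktemp -d)
printf 'start here\nhere start\n\nstart\n' > "$tmpdir/anchors.txt"

# Lines that begin with "start".
at_begin=$(grep -c '^start' "$tmpdir/anchors.txt")
# Lines that end with "start".
at_end=$(grep -c 'start$' "$tmpdir/anchors.txt")
# Lines that consist of exactly "start".
exact=$(grep -c '^start$' "$tmpdir/anchors.txt")
# Empty lines: nothing between the beginning and the end.
empty=$(grep -c '^$' "$tmpdir/anchors.txt")

printf '%s %s %s %s\n' "$at_begin" "$at_end" "$exact" "$empty"
rm -rf "$tmpdir"
```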

Extended Regular Expressions

‘|’
       alternation. More than two alternatives can be offered by adding further pipe characters.

‘()’
       grouping.

‘{}’
       specifies match repetition. The brace characters can specify the number of times that a match is repeated.
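In a basic regular expression these operators must be backslash-escaped, while with -E they are written bare and a backslash makes them literal. A sketch assuming GNU grep, on invented input:

```shell
tmpdir=$(mktemp -d)
printf 'ab\nabab\n(ab)\na|b\n' > "$tmpdir/groups.txt"

# ERE: group "ab" and repeat the group one or more times.
ere_group=$(grep -E -x '(ab)+' "$tmpdir/groups.txt")
# The same match written as a BRE with escaped operators.
bre_group=$(grep -x '\(ab\)\{1,\}' "$tmpdir/groups.txt")
# BRE: unescaped parentheses are literal characters.
bre_literal=$(grep -x '(ab)' "$tmpdir/groups.txt")
# ERE: braces repeat the whole group exactly twice.
ere_braces=$(grep -E -x '(ab){2}' "$tmpdir/groups.txt")
# ERE: three alternatives; the escaped parentheses are literal.
three_way=$(grep -E -x 'ab|abab|\(ab\)' "$tmpdir/groups.txt")

printf '%s\n' "$three_way"
rm -rf "$tmpdir"
```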

Examples

The examples below use the GPL-3 file in the common licenses directory (present on Debian-based systems).

cd /usr/share/common-licenses

Basic Examples

Print every line that contains the string “GNU”:

grep "GNU" GPL-3

Print every line that contains “license”, ignoring case:

grep -i "license" GPL-3

Print the lines that do NOT contain the string “the” (lines containing longer words such as “other” are excluded too):

grep -v "the" GPL-3

Print the matching lines prefixed with their line numbers:

grep -n "the" GPL-3

Regular Expressions Examples

Only match “GNU” if it occurs at the beginning of a line:

grep "^GNU" GPL-3

Only match “and” if it occurs at the end of a line:

grep "and$" GPL-3

Match any two characters followed by the string “cept”:

grep "..cept" GPL-3

Find the lines that contain “too” or “two”:

grep "t[wo]o" GPL-3

Match “ode” preceded by any character other than “c”, so “mode” matches but “code” does not:

grep "[^c]ode" GPL-3

Find lines that begin with a capital letter:

grep "^[A-Z]" GPL-3
grep "^[[:upper:]]" GPL-3

Find lines that contain an opening and a closing parenthesis with only letters and spaces in between (in a basic regular expression, unescaped parentheses are literal):

grep "([A-Za-z ]*)" GPL-3

Find lines that begin with a capital letter and end with a period:

grep "^[A-Z].*\.$" GPL-3

Extended Regular Expressions Examples

Find either “GPL” or “General Public License” in the text:

grep -E "(GPL|General Public License)" GPL-3

Find lines that contain “copyright” or “right”:

grep -E "(copy)?right" GPL-3

Find lines that contain the string “free” plus one or more characters that are not whitespace (such as “freedom” and “free.”):

grep -E "free[^[:space:]]+" GPL-3

Find lines that contain a run of 16 to 20 consecutive letters (longer words also match, because the pattern is not anchored at word boundaries):

grep -E "[[:alpha:]]{16,20}" GPL-3

Other Examples

List just the names of matching files:

grep -l "GPL" *

Search directories recursively:

grep -r "GPL" .

Search for a whole word instead of a part of a word:

grep -w "GPL" *

Output context around the matching lines:

grep -C 2 "GPL" *

Force grep to print the file name even when only one file is searched:

grep "GPL" GPL-3 /dev/null
grep -H "GPL" GPL-3



Published

27 June 2018
