version 1.1.1.2, 2012/10/09 09:19:17
|
version 1.1.1.3, 2013/07/22 08:25:56
|
Line 1
|
Line 1
|
PCREGREP(1) PCREGREP(1) | PCREGREP(1) General Commands Manual PCREGREP(1) |
|
|
|
|
|
|
NAME |
NAME |
pcregrep - a grep with Perl-compatible regular expressions. |
pcregrep - a grep with Perl-compatible regular expressions. |
|
|
|
|
SYNOPSIS |
SYNOPSIS |
pcregrep [options] [long options] [pattern] [path1 path2 ...] |
pcregrep [options] [long options] [pattern] [path1 path2 ...] |
|
|
Line 26 DESCRIPTION
|
Line 26 DESCRIPTION
|
with slashes, as is common in Perl scripts), they are interpreted as |
with slashes, as is common in Perl scripts), they are interpreted as |
part of the pattern. Quotes can of course be used to delimit patterns |
part of the pattern. Quotes can of course be used to delimit patterns |
on the command line because they are interpreted by the shell, and |
on the command line because they are interpreted by the shell, and |
indeed they are required if a pattern contains white space or shell | indeed quotes are required if a pattern contains white space or shell |
metacharacters. |
metacharacters. |
|
|
The first argument that follows any option settings is treated as the |
The first argument that follows any option settings is treated as the |
Line 56 DESCRIPTION
|
Line 56 DESCRIPTION
|
times this size is used (to allow for buffering "before" and "after" |
times this size is used (to allow for buffering "before" and "after" |
lines). An error occurs if a line overflows the buffer. |
lines). An error occurs if a line overflows the buffer. |
|
|
Patterns are limited to 8K or BUFSIZ bytes, whichever is the greater. | Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the |
BUFSIZ is defined in <stdio.h>. When there is more than one pattern | greater. BUFSIZ is defined in <stdio.h>. When there is more than one |
(specified by the use of -e and/or -f), each pattern is applied to each | pattern (specified by the use of -e and/or -f), each pattern is applied |
line in the order in which they are defined, except that all the -e | to each line in the order in which they are defined, except that all |
patterns are tried before the -f patterns. | the -e patterns are tried before the -f patterns. |
|
|
By default, as soon as one pattern matches (or fails to match when -v | By default, as soon as one pattern matches a line, no further patterns |
is used), no further patterns are considered. However, if --colour (or | are considered. However, if --colour (or --color) is used to colour the |
--color) is used to colour the matching substrings, or if --only-match- | matching substrings, or if --only-matching, --file-offsets, or --line- |
ing, --file-offsets, or --line-offsets is used to output only the part | offsets is used to output only the part of the line that matched |
of the line that matched (either shown literally, or as an offset), | (either shown literally, or as an offset), scanning resumes immediately |
scanning resumes immediately following the match, so that further | following the match, so that further matches on the same line can be |
matches on the same line can be found. If there are multiple patterns, | found. If there are multiple patterns, they are all tried on the |
they are all tried on the remainder of the line, but patterns that fol- | remainder of the line, but patterns that follow the one that matched |
low the one that matched are not tried on the earlier part of the line. | are not tried on the earlier part of the line. |
|
|
This is the same behaviour as GNU grep, but it does mean that the order | This behaviour means that the order in which multiple patterns are |
in which multiple patterns are specified can affect the output when one | specified can affect the output when one of the above options is used. |
of the above options is used. | This is no longer the same behaviour as GNU grep, which now manages to |
| display earlier matches for later patterns (as long as there is no |
| overlap). |
|
|
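The scanning rule described above can be sketched in Python. This is only an illustrative model, not pcregrep's code: the function name only_matching is hypothetical, Python's re module stands in for PCRE, and empty matches are simply skipped.

```python
import re

def only_matching(line, patterns):
    """Model of scanning with several patterns when only matched parts are shown.

    At each position the patterns are tried in the order given; the first
    one that matches anywhere in the remainder of the line wins, and
    scanning resumes immediately after that match.
    """
    compiled = [re.compile(p) for p in patterns]
    found = []
    pos = 0
    while pos <= len(line):
        for rx in compiled:
            m = rx.search(line, pos)
            # Empty matches are ignored in this sketch.
            if m and m.end() > m.start():
                found.append(m.group())
                pos = m.end()
                break
        else:
            # No pattern matched the remainder of the line.
            break
    return found

print(only_matching("aYbXcY", ["X", "Y"]))
print(only_matching("aYbXcY", ["Y", "X"]))
```

With the patterns in the order X, Y the leading Y is never reported, because X matched further along the line first. With the order Y, X the X is skipped instead. This is the order dependence the paragraph describes.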
Patterns that can match an empty string are accepted, but empty string |
Patterns that can match an empty string are accepted, but empty string |
matches are never recognized. An example is the pattern |
matches are never recognized. An example is the pattern |
Line 112 OPTIONS
|
Line 114 OPTIONS
|
The order in which some of the options appear can affect the output. |
The order in which some of the options appear can affect the output. |
For example, both the -h and -l options affect the printing of file |
For example, both the -h and -l options affect the printing of file |
names. Whichever comes later in the command line will be the one that |
names. Whichever comes later in the command line will be the one that |
takes effect. Numerical values for options may be followed by K or M, | takes effect. Similarly, except where noted below, if an option is |
to signify multiplication by 1024 or 1024*1024 respectively. | given twice, the later setting is used. Numerical values for options |
| may be followed by K or M, to signify multiplication by 1024 or |
| 1024*1024 respectively. |
|
|
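As a rough illustration of the K and M suffixes, the sketch below converts an option value to a plain number. The function name parse_size is hypothetical, and only the upper-case suffixes mentioned in the text are handled; whether pcregrep itself accepts other spellings is not covered here.

```python
def parse_size(text):
    """Convert a value such as '20K' or '1M' to an integer.

    K multiplies by 1024 and M by 1024*1024, as stated in the text.
    """
    multipliers = {"K": 1024, "M": 1024 * 1024}
    suffix = text[-1:]
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)

print(parse_size("20K"))
print(parse_size("1M"))
```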
-- This terminates the list of options. It is useful if the next |
-- This terminates the list of options. It is useful if the next |
item on the command line starts with a hyphen but is not an |
item on the command line starts with a hyphen but is not an |
Line 208 OPTIONS
|
Line 212 OPTIONS
|
|
|
-d action, --directories=action |
-d action, --directories=action |
If an input path is a directory, "action" specifies how it is |
If an input path is a directory, "action" specifies how it is |
to be processed. Valid values are "read" (the default), | to be processed. Valid values are "read" (the default in |
"recurse" (equivalent to the -r option), or "skip" (silently | non-Windows environments, for compatibility with GNU grep), |
skip the path). In the default case, directories are read as | "recurse" (equivalent to the -r option), or "skip" (silently |
if they were ordinary files. In some operating systems the | skip the path, the default in Windows environments). In the |
effect of reading a directory like this is an immediate end- | "read" case, directories are read as if they were ordinary |
of-file. | files. In some operating systems the effect of reading a |
| directory like this is an immediate end-of-file; in others it |
| may provoke an error. |
|
|
-e pattern, --regex=pattern, --regexp=pattern |
-e pattern, --regex=pattern, --regexp=pattern |
Specify a pattern to be matched. This option can be used mul- |
Specify a pattern to be matched. This option can be used mul- |
Line 221 OPTIONS
|
Line 227 OPTIONS
|
be used as a way of specifying a single pattern that starts |
be used as a way of specifying a single pattern that starts |
with a hyphen. When -e is used, no argument pattern is taken |
with a hyphen. When -e is used, no argument pattern is taken |
from the command line; all arguments are treated as file |
from the command line; all arguments are treated as file |
names. There is an overall maximum of 100 patterns. They are | names. There is no limit to the number of patterns. They are |
applied to each line in the order in which they are defined |
applied to each line in the order in which they are defined |
until one matches (or fails to match if -v is used). If -f is | until one matches. |
used with -e, the command line patterns are matched first, | |
followed by the patterns from the file, independent of the | |
order in which these options are specified. Note that multi- | |
ple use of -e is not the same as a single pattern with alter- | |
natives. For example, X|Y finds the first character in a line | |
that is X or Y, whereas if the two patterns are given sepa- | |
rately, pcregrep finds X if it is present, even if it follows | |
Y in the line. It finds Y only if there is no X in the line. | |
This really matters only if you are using -o to show the | |
part(s) of the line that matched. | |
|
|
|
| If -f is used with -e, the command line patterns are matched |
| first, followed by the patterns from the file(s), independent |
| of the order in which these options are specified. Note that |
| multiple use of -e is not the same as a single pattern with |
| alternatives. For example, X|Y finds the first character in a |
| line that is X or Y, whereas if the two patterns are given |
| separately, with X first, pcregrep finds X if it is present, |
| even if it follows Y in the line. It finds Y only if there is |
| no X in the line. This matters only if you are using -o or |
| --colo(u)r to show the part(s) of the line that matched. |
|
|
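The difference between X|Y and two separate patterns can be reproduced with a small Python sketch. It uses Python's re module in place of PCRE, and the helper name first_by_pattern_order is hypothetical; it only models the "first pattern that matches wins" rule described above.

```python
import re

def first_by_pattern_order(line, patterns):
    """Return the text matched by the first pattern, in order, that matches the line."""
    for p in patterns:
        m = re.search(p, line)
        if m:
            return m.group()
    return None

line = "aYbX"
# A single pattern with alternatives reports whichever comes first in the line.
print(re.search("X|Y", line).group())
# Separate patterns, X first: X is reported even though it follows Y.
print(first_by_pattern_order(line, ["X", "Y"]))
```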
--exclude=pattern |
--exclude=pattern |
When pcregrep is searching the files in a directory as a con- | Files (but not directories) whose names match the pattern are |
sequence of the -r (recursive search) option, any regular | skipped without being processed. This applies to all files, |
files whose names match the pattern are excluded. Subdirecto- | whether listed on the command line, obtained from --file- |
ries are not excluded by this option; they are searched | list, or by scanning a directory. The pattern is a PCRE regu- |
recursively, subject to the --exclude-dir and --include-dir | lar expression, and is matched against the final component of |
options. The pattern is a PCRE regular expression, and is | the file name, not the entire path. The -F, -w, and -x |
matched against the final component of the file name (not the | options do not apply to this pattern. The option may be given |
entire path). If a file name matches both --include and | any number of times in order to specify multiple patterns. If |
--exclude, it is excluded. There is no short form for this | a file name matches both an --include and an --exclude pat- |
option. | tern, it is excluded. There is no short form for this option. |
|
|
|
| --exclude-from=filename |
| Treat each non-empty line of the file as the data for an |
| --exclude option. What constitutes a newline when reading the |
| file is the operating system's default. The --newline option |
| has no effect on this option. This option may be given more |
| than once in order to specify a number of files to read. |
|
|
--exclude-dir=pattern |
--exclude-dir=pattern |
When pcregrep is searching the contents of a directory as a | Directories whose names match the pattern are skipped without |
consequence of the -r (recursive search) option, any subdi- | being processed, whatever the setting of the --recursive |
rectories whose names match the pattern are excluded. (Note | option. This applies to all directories, whether listed on |
that the --exclude option does not affect subdirectories.) | the command line, obtained from --file-list, or by scanning a |
The pattern is a PCRE regular expression, and is matched | parent directory. The pattern is a PCRE regular expression, |
against the final component of the name (not the entire | and is matched against the final component of the directory |
path). If a subdirectory name matches both --include-dir and | name, not the entire path. The -F, -w, and -x options do not |
--exclude-dir, it is excluded. There is no short form for | apply to this pattern. The option may be given any number of |
this option. | times in order to specify more than one pattern. If a direc- |
| tory matches both --include-dir and --exclude-dir, it is |
| excluded. There is no short form for this option. |
|
|
-F, --fixed-strings |
-F, --fixed-strings |
Interpret each pattern as a list of fixed strings, separated | Interpret each data-matching pattern as a list of fixed |
by newlines, instead of as a regular expression. The -w | strings, separated by newlines, instead of as a regular |
(match as a word) and -x (match whole line) options can be | expression. What constitutes a newline for this purpose is |
used with -F. They apply to each of the fixed strings. A line | controlled by the --newline option. The -w (match as a word) |
is selected if any of the fixed strings are found in it (sub- | and -x (match whole line) options can be used with -F. They |
ject to -w or -x, if present). | apply to each of the fixed strings. A line is selected if any |
| of the fixed strings are found in it (subject to -w or -x, if |
| present). This option applies only to the patterns that are |
| matched against the contents of files; it does not apply to |
| patterns specified by any of the --include or --exclude |
| options. |
|
|
-f filename, --file=filename |
-f filename, --file=filename |
Read a number of patterns from the file, one per line, and | Read patterns from the file, one per line, and match them |
match them against each line of input. A data line is output | against each line of input. What constitutes a newline when |
if any of the patterns match it. The filename can be given as | reading the file is the operating system's default. The |
"-" to refer to the standard input. When -f is used, patterns | --newline option has no effect on this option. Trailing white |
specified on the command line using -e may also be present; | space is removed from each line, and blank lines are ignored. |
they are tested before the file's patterns. However, no other | An empty file contains no patterns and therefore matches |
pattern is taken from the command line; all arguments are | nothing. See also the comments about multiple patterns versus |
treated as the names of paths to be searched. There is an | a single pattern with alternatives in the description of -e |
overall maximum of 100 patterns. Trailing white space is | above. |
removed from each line, and blank lines are ignored. An empty | |
file contains no patterns and therefore matches nothing. See | |
also the comments about multiple patterns versus a single | |
pattern with alternatives in the description of -e above. | |
|
|
|
| If this option is given more than once, all the specified |
| files are read. A data line is output if any of the patterns |
| match it. A filename can be given as "-" to refer to the |
| standard input. When -f is used, patterns specified on the |
| command line using -e may also be present; they are tested |
| before the file's patterns. However, no other pattern is |
| taken from the command line; all arguments are treated as the |
| names of paths to be searched. |
|
|
--file-list=filename |
--file-list=filename |
Read a list of files to be searched from the given file, one | Read a list of files and/or directories that are to be |
per line. Trailing white space is removed from each line, and | scanned from the given file, one per line. Trailing white |
blank lines are ignored. These files are searched before any | space is removed from each line, and blank lines are ignored. |
others that may be listed on the command line. The filename | These paths are processed before any that are listed on the |
can be given as "-" to refer to the standard input. If --file | command line. The filename can be given as "-" to refer to |
and --file-list are both specified as "-", patterns are read | the standard input. If --file and --file-list are both spec- |
first. This is useful only when the standard input is a ter- | ified as "-", patterns are read first. This is useful only |
minal, from which further lines (the list of files) can be | when the standard input is a terminal, from which further |
read after an end-of-file indication. | lines (the list of files) can be read after an end-of-file |
| indication. If this option is given more than once, all the |
| specified files are read. |
|
|
--file-offsets |
--file-offsets |
Instead of showing lines or parts of lines that match, show | Instead of showing lines or parts of lines that match, show |
each match as an offset from the start of the file and a | each match as an offset from the start of the file and a |
length, separated by a comma. In this mode, no context is | length, separated by a comma. In this mode, no context is |
shown. That is, the -A, -B, and -C options are ignored. If | shown. That is, the -A, -B, and -C options are ignored. If |
there is more than one match in a line, each of them is shown |
there is more than one match in a line, each of them is shown |
separately. This option is mutually exclusive with --line- | separately. This option is mutually exclusive with --line- |
offsets and --only-matching. |
offsets and --only-matching. |
|
|
-H, --with-filename |
-H, --with-filename |
Force the inclusion of the filename at the start of output | Force the inclusion of the filename at the start of output |
lines when searching a single file. By default, the filename | lines when searching a single file. By default, the filename |
is not shown in this case. For matching lines, the filename | is not shown in this case. For matching lines, the filename |
is followed by a colon; for context lines, a hyphen separator |
is followed by a colon; for context lines, a hyphen separator |
is used. If a line number is also being output, it follows | is used. If a line number is also being output, it follows |
the file name. |
the file name. |
|
|
-h, --no-filename |
-h, --no-filename |
Suppress the output filenames when searching multiple files. | Suppress the output filenames when searching multiple files. |
By default, filenames are shown when multiple files are | By default, filenames are shown when multiple files are |
searched. For matching lines, the filename is followed by a | searched. For matching lines, the filename is followed by a |
colon; for context lines, a hyphen separator is used. If a | colon; for context lines, a hyphen separator is used. If a |
line number is also being output, it follows the file name. |
line number is also being output, it follows the file name. |
|
|
--help Output a help message, giving brief details of the command | --help Output a help message, giving brief details of the command |
options and file type support, and then exit. | options and file type support, and then exit. Anything else |
| on the command line is ignored. |
|
|
-I Treat binary files as never matching. This is equivalent to |
-I Treat binary files as never matching. This is equivalent to |
--binary-files=without-match. |
--binary-files=without-match. |
Line 326 OPTIONS
|
Line 355 OPTIONS
|
Ignore upper/lower case distinctions during comparisons. |
Ignore upper/lower case distinctions during comparisons. |
|
|
--include=pattern |
--include=pattern |
When pcregrep is searching the files in a directory as a con- | If any --include patterns are specified, the only files that |
sequence of the -r (recursive search) option, only those reg- | are processed are those that match one of the patterns (and |
ular files whose names match the pattern are included. Subdi- | do not match an --exclude pattern). This option does not |
rectories are always included and searched recursively, sub- | affect directories, but it applies to all files, whether |
ject to the --include-dir and --exclude-dir options. The pat- | listed on the command line, obtained from --file-list, or by |
tern is a PCRE regular expression, and is matched against the | scanning a directory. The pattern is a PCRE regular expres- |
final component of the file name (not the entire path). If a | sion, and is matched against the final component of the file |
file name matches both --include and --exclude, it is | name, not the entire path. The -F, -w, and -x options do not |
excluded. There is no short form for this option. | apply to this pattern. The option may be given any number of |
| times. If a file name matches both an --include and an |
| --exclude pattern, it is excluded. There is no short form |
| for this option. |
|
|
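The interaction of --include and --exclude in the newer revision can be sketched as follows. This is an assumption-laden model: the function name should_process is hypothetical, Python's re stands in for PCRE, and the patterns are assumed to be applied as unanchored searches against the final path component.

```python
import os
import re

def should_process(path, includes, excludes):
    """Decide whether a file is processed, given include and exclude patterns.

    Only the final component of the path is tested. An exclude match always
    wins. If any include patterns exist, at least one of them must match.
    """
    name = os.path.basename(path)
    if any(re.search(p, name) for p in excludes):
        return False
    if includes and not any(re.search(p, name) for p in includes):
        return False
    return True

print(should_process("src/main.c", [r"\.c$"], [r"^main"]))
print(should_process("src/util.c", [r"\.c$"], [r"^main"]))
```

The first call models a file that matches both kinds of pattern and is therefore excluded; the second matches only the include pattern and is processed.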
|
| --include-from=filename |
| Treat each non-empty line of the file as the data for an |
| --include option. What constitutes a newline for this purpose |
| is the operating system's default. The --newline option has |
| no effect on this option. This option may be given any number |
| of times; all the files are read. |
|
|
--include-dir=pattern |
--include-dir=pattern |
When pcregrep is searching the contents of a directory as a | If any --include-dir patterns are specified, the only direc- |
consequence of the -r (recursive search) option, only those | tories that are processed are those that match one of the |
subdirectories whose names match the pattern are included. | patterns (and do not match an --exclude-dir pattern). This |
(Note that the --include option does not affect subdirecto- | applies to all directories, whether listed on the command |
ries.) The pattern is a PCRE regular expression, and is | line, obtained from --file-list, or by scanning a parent |
matched against the final component of the name (not the | directory. The pattern is a PCRE regular expression, and is |
entire path). If a subdirectory name matches both --include- | matched against the final component of the directory name, |
dir and --exclude-dir, it is excluded. There is no short form | not the entire path. The -F, -w, and -x options do not apply |
for this option. | to this pattern. The option may be given any number of times. |
| If a directory matches both --include-dir and --exclude-dir, |
| it is excluded. There is no short form for this option. |
|
|
-L, --files-without-match |
-L, --files-without-match |
Instead of outputting lines from the files, just output the | Instead of outputting lines from the files, just output the |
names of the files that do not contain any lines that would | names of the files that do not contain any lines that would |
have been output. Each file name is output once, on a sepa- | have been output. Each file name is output once, on a sepa- |
rate line. |
rate line. |
|
|
-l, --files-with-matches |
-l, --files-with-matches |
Instead of outputting lines from the files, just output the | Instead of outputting lines from the files, just output the |
names of the files containing lines that would have been out- |
names of the files containing lines that would have been out- |
put. Each file name is output once, on a separate line. | put. Each file name is output once, on a separate line. |
Searching normally stops as soon as a matching line is found | Searching normally stops as soon as a matching line is found |
in a file. However, if the -c (count) option is also used, | in a file. However, if the -c (count) option is also used, |
matching continues in order to obtain the correct count, and | matching continues in order to obtain the correct count, and |
those files that have at least one match are listed along | those files that have at least one match are listed along |
with their counts. Using this option with -c is a way of sup- |
with their counts. Using this option with -c is a way of sup- |
pressing the listing of files with no matches. |
pressing the listing of files with no matches. |
|
|
Line 370 OPTIONS
|
Line 411 OPTIONS
|
input)" is used. There is no short form for this option. |
input)" is used. There is no short form for this option. |
|
|
--line-buffered |
--line-buffered |
When this option is given, input is read and processed line | When this option is given, input is read and processed line |
by line, and the output is flushed after each write. By | by line, and the output is flushed after each write. By |
default, input is read in large chunks, unless pcregrep can | default, input is read in large chunks, unless pcregrep can |
determine that it is reading from a terminal (which is cur- | determine that it is reading from a terminal (which is cur- |
rently possible only in Unix environments). Output to termi- | rently possible only in Unix-like environments). Output to |
nal is normally automatically flushed by the operating sys- | terminal is normally automatically flushed by the operating |
tem. This option can be useful when the input or output is | system. This option can be useful when the input or output is |
attached to a pipe and you do not want pcregrep to buffer up | attached to a pipe and you do not want pcregrep to buffer up |
large amounts of data. However, its use will affect perfor- | large amounts of data. However, its use will affect perfor- |
mance, and the -M (multiline) option ceases to work. |
mance, and the -M (multiline) option ceases to work. |
|
|
--line-offsets |
--line-offsets |
Instead of showing lines or parts of lines that match, show | Instead of showing lines or parts of lines that match, show |
each match as a line number, the offset from the start of the |
each match as a line number, the offset from the start of the |
line, and a length. The line number is terminated by a colon | line, and a length. The line number is terminated by a colon |
(as usual; see the -n option), and the offset and length are | (as usual; see the -n option), and the offset and length are |
separated by a comma. In this mode, no context is shown. | separated by a comma. In this mode, no context is shown. |
That is, the -A, -B, and -C options are ignored. If there is | That is, the -A, -B, and -C options are ignored. If there is |
more than one match in a line, each of them is shown sepa- | more than one match in a line, each of them is shown sepa- |
rately. This option is mutually exclusive with --file-offsets |
rately. This option is mutually exclusive with --file-offsets |
and --only-matching. |
and --only-matching. |
|
|
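The output shape described for --line-offsets (line number, colon, offset, comma, length) can be illustrated with a short Python sketch. It is not pcregrep's implementation: the function name line_offsets is hypothetical, offsets are assumed to be zero-based, and positions are counted in Python characters rather than bytes.

```python
import re

def line_offsets(text, pattern):
    """Return 'line:offset,length' strings for every non-empty match in text."""
    rx = re.compile(pattern)
    results = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in rx.finditer(line):
            # Skip empty matches; each remaining match is reported separately.
            if m.end() > m.start():
                results.append(f"{lineno}:{m.start()},{m.end() - m.start()}")
    return results

for entry in line_offsets("foo bar\nbar foo bar\n", "bar"):
    print(entry)
```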
--locale=locale-name |
--locale=locale-name |
This option specifies a locale to be used for pattern match- | This option specifies a locale to be used for pattern match- |
ing. It overrides the value in the LC_ALL or LC_CTYPE envi- | ing. It overrides the value in the LC_ALL or LC_CTYPE envi- |
ronment variables. If no locale is specified, the PCRE | ronment variables. If no locale is specified, the PCRE |
library's default (usually the "C" locale) is used. There is | library's default (usually the "C" locale) is used. There is |
no short form for this option. |
no short form for this option. |
|
|
--match-limit=number |
--match-limit=number |
Processing some regular expression patterns can require a | Processing some regular expression patterns can require a |
very large amount of memory, leading in some cases to a pro- | very large amount of memory, leading in some cases to a pro- |
gram crash if not enough is available. Other patterns may | gram crash if not enough is available. Other patterns may |
take a very long time to search for all possible matching | take a very long time to search for all possible matching |
strings. The pcre_exec() function that is called by pcregrep | strings. The pcre_exec() function that is called by pcregrep |
to do the matching has two parameters that can limit the | to do the matching has two parameters that can limit the |
resources that it uses. |
resources that it uses. |
|
|
The --match-limit option provides a means of limiting | The --match-limit option provides a means of limiting |
resource usage when processing patterns that are not going to |
resource usage when processing patterns that are not going to |
match, but which have a very large number of possibilities in |
match, but which have a very large number of possibilities in |
their search trees. The classic example is a pattern that | their search trees. The classic example is a pattern that |
uses nested unlimited repeats. Internally, PCRE uses a func- | uses nested unlimited repeats. Internally, PCRE uses a func- |
tion called match() which it calls repeatedly (sometimes | tion called match() which it calls repeatedly (sometimes |
recursively). The limit set by --match-limit is imposed on | recursively). The limit set by --match-limit is imposed on |
the number of times this function is called during a match, | the number of times this function is called during a match, |
which has the effect of limiting the amount of backtracking | which has the effect of limiting the amount of backtracking |
that can take place. |
that can take place. |
|
|
The --recursion-limit option is similar to --match-limit, but |
The --recursion-limit option is similar to --match-limit, but |
instead of limiting the total number of times that match() is |
instead of limiting the total number of times that match() is |
called, it limits the depth of recursive calls, which in turn |
called, it limits the depth of recursive calls, which in turn |
limits the amount of memory that can be used. The recursion | limits the amount of memory that can be used. The recursion |
depth is a smaller number than the total number of calls, | depth is a smaller number than the total number of calls, |
because not all calls to match() are recursive. This limit is |
because not all calls to match() are recursive. This limit is |
of use only if it is set smaller than --match-limit. |
of use only if it is set smaller than --match-limit. |
|
|
There are no short forms for these options. The default set- | There are no short forms for these options. The default set- |
tings are specified when the PCRE library is compiled, with | tings are specified when the PCRE library is compiled, with |
the default default being 10 million. |
the default default being 10 million. |
|
|
-M, --multiline |
-M, --multiline |
Allow patterns to match more than one line. When this option | Allow patterns to match more than one line. When this option |
is given, patterns may usefully contain literal newline char- |
is given, patterns may usefully contain literal newline char- |
acters and internal occurrences of ^ and $ characters. The | acters and internal occurrences of ^ and $ characters. The |
output for a successful match may consist of more than one | output for a successful match may consist of more than one |
line, the last of which is the one in which the match ended. | line, the last of which is the one in which the match ended. |
If the matched string ends with a newline sequence the output |
If the matched string ends with a newline sequence the output |
ends at the end of that line. |
ends at the end of that line. |
|
|
When this option is set, the PCRE library is called in "mul- | When this option is set, the PCRE library is called in "mul- |
tiline" mode. There is a limit to the number of lines that | tiline" mode. There is a limit to the number of lines that |
can be matched, imposed by the way that pcregrep buffers the | can be matched, imposed by the way that pcregrep buffers the |
input file as it scans it. However, pcregrep ensures that at | input file as it scans it. However, pcregrep ensures that at |
least 8K characters or the rest of the document (whichever is |
least 8K characters or the rest of the document (whichever is |
the shorter) are available for forward matching, and simi- | the shorter) are available for forward matching, and simi- |
larly the previous 8K characters (or all the previous charac- |
larly the previous 8K characters (or all the previous charac- |
ters, if fewer than 8K) are guaranteed to be available for | ters, if fewer than 8K) are guaranteed to be available for |
lookbehind assertions. This option does not work when input | lookbehind assertions. This option does not work when input |
is read line by line (see --line-buffered.) |
is read line by line (see --line-buffered.) |
|
|
-N newline-type, --newline=newline-type |
-N newline-type, --newline=newline-type |
The PCRE library supports five different conventions for | The PCRE library supports five different conventions for |
indicating the ends of lines. They are the single-character | indicating the ends of lines. They are the single-character |
sequences CR (carriage return) and LF (linefeed), the two- | sequences CR (carriage return) and LF (linefeed), the two- |
character sequence CRLF, an "anycrlf" convention, which rec- | character sequence CRLF, an "anycrlf" convention, which rec- |
ognizes any of the preceding three types, and an "any" con- | ognizes any of the preceding three types, and an "any" con- |
vention, in which any Unicode line ending sequence is assumed |
vention, in which any Unicode line ending sequence is assumed |
to end a line. The Unicode sequences are the three just men- | to end a line. The Unicode sequences are the three just men- |
tioned, plus VT (vertical tab, U+000B), FF (form feed, | tioned, plus VT (vertical tab, U+000B), FF (form feed, |
U+000C), NEL (next line, U+0085), LS (line separator, | U+000C), NEL (next line, U+0085), LS (line separator, |
U+2028), and PS (paragraph separator, U+2029). |
U+2028), and PS (paragraph separator, U+2029). |
|
|
When the PCRE library is built, a default line-ending |
When the PCRE library is built, a default line-ending |
sequence is specified. This is normally the standard | sequence is specified. This is normally the standard |
sequence for the operating system. Unless otherwise specified |
sequence for the operating system. Unless otherwise specified |
by this option, pcregrep uses the library's default. The | by this option, pcregrep uses the library's default. The |
possible values for this option are CR, LF, CRLF, ANYCRLF, or |
possible values for this option are CR, LF, CRLF, ANYCRLF, or |
ANY. This makes it possible to use pcregrep on files that | ANY. This makes it possible to use pcregrep to scan files |
have come from other environments without having to modify | that have come from other environments without having to mod- |
their line endings. If the data that is being scanned does | ify their line endings. If the data that is being scanned |
not agree with the convention set by this option, pcregrep | does not agree with the convention set by this option, pcre- |
may behave in strange ways. | grep may behave in strange ways. Note that this option does |
| not apply to files specified by the -f, --exclude-from, or |
| --include-from options, which are expected to use the operat- |
| ing system's standard newline sequence. |
|
|
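The five newline conventions described for -N can be illustrated with a small sketch. This is not pcregrep's implementation: split_lines and NEWLINE_PATTERNS are hypothetical names, and Python's re module stands in for PCRE. It only shows how the same data divides into different lines under each convention.

```python
import re

# Hypothetical table: one splitting pattern per value accepted by -N.
NEWLINE_PATTERNS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    # Any of the three preceding types; CRLF is tried first so that it
    # counts as one line ending rather than two.
    "ANYCRLF": r"\r\n|\r|\n",
    # The three above plus VT, FF, NEL, LS and PS.
    "ANY": r"\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}

def split_lines(text, newline_type="LF"):
    """Return the lines of text, without terminators, under one convention."""
    lines = re.split(NEWLINE_PATTERNS[newline_type.upper()], text)
    # A terminator at the very end does not start a new, empty line.
    if lines and lines[-1] == "":
        lines.pop()
    return lines

sample = "one\r\ntwo\nthree\u2028four"
print(split_lines(sample, "LF"))       # ['one\r', 'two', 'three\u2028four']
print(split_lines(sample, "ANYCRLF"))  # ['one', 'two', 'three\u2028four']
print(split_lines(sample, "ANY"))      # ['one', 'two', 'three', 'four']
```

With the LF convention the first line keeps a stray CR. This is the kind of mismatch between data and convention that the text above warns may produce strange behaviour.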
-n, --line-number |
-n, --line-number |
Precede each output line by its line number in the file, fol- |
Precede each output line by its line number in the file, fol- |
Line 503 OPTIONS
|
Line 547 OPTIONS
|
-onumber, --only-matching=number |
-onumber, --only-matching=number |
Show only the part of the line that matched the capturing |
Show only the part of the line that matched the capturing |
parentheses of the given number. Up to 32 capturing parenthe- |
parentheses of the given number. Up to 32 capturing parenthe- |
ses are supported. Because these options can be given without | ses are supported, and -o0 is equivalent to -o without a num- |
an argument (see above), if an argument is present, it must | ber. Because these options can be given without an argument |
be given in the same shell item, for example, -o3 or --only- | (see above), if an argument is present, it must be given in |
matching=2. The comments given for the non-argument case | the same shell item, for example, -o3 or --only-matching=2. |
above also apply to this case. If the specified capturing | The comments given for the non-argument case above also apply |
parentheses do not exist in the pattern, or were not set in | to this case. If the specified capturing parentheses do not |
the match, nothing is output unless the file name or line | exist in the pattern, or were not set in the match, nothing |
number are being printed. | is output unless the file name or line number are being |
| printed. |
|
|
| If this option is given multiple times, multiple substrings |
| are output, in the order the options are given. For example, |
| -o3 -o1 -o3 causes the substrings matched by capturing paren- |
| theses 3 and 1 and then 3 again to be output. By default, |
| there is no separator (but see the next option). |
|
| --om-separator=text |
| Specify a separating string for multiple occurrences of -o. |
| The default is an empty string. Separating strings are never |
| coloured. |
|
|
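The effect of repeating -o with a number, together with --om-separator, can be sketched as follows. The helper only_matching is hypothetical and uses Python's re module rather than PCRE. It handles only the first match in a line, and it assumes that parentheses which do not exist or were not set contribute nothing and receive no separator.

```python
import re

def only_matching(pattern, line, groups, separator=""):
    """Join the requested capturing groups of the first match in line.

    groups lists the numbers in the order the -o options were given;
    separator plays the role of --om-separator (empty by default).
    Returns None when the line does not match at all.
    """
    match = re.search(pattern, line)
    if match is None:
        return None
    parts = []
    for n in groups:
        # Skip parentheses that are absent from the pattern or unset.
        if n <= match.re.groups and match.group(n):
            parts.append(match.group(n))
    return separator.join(parts)

pattern = r"(\d+)-(\d+)-(\d+)"
line = "2013-07-22"
print(only_matching(pattern, line, [3, 1, 3]))       # 22201322
print(only_matching(pattern, line, [3, 1, 3], "/"))  # 22/2013/22
```

The first call corresponds to -o3 -o1 -o3 with the default empty separator, the second to the same options with a separator of "/".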
-q, --quiet |
-q, --quiet |
Work quietly, that is, display nothing except error messages. |
Work quietly, that is, display nothing except error messages. |
The exit status indicates whether or not any matches were | The exit status indicates whether or not any matches were |
found. |
found. |
|
|
-r, --recursive |
-r, --recursive |
If any given path is a directory, recursively scan the files | If any given path is a directory, recursively scan the files |
it contains, taking note of any --include and --exclude set- | it contains, taking note of any --include and --exclude set- |
tings. By default, a directory is read as a normal file; in | tings. By default, a directory is read as a normal file; in |
some operating systems this gives an immediate end-of-file. | some operating systems this gives an immediate end-of-file. |
This option is a shorthand for setting the -d option to | This option is a shorthand for setting the -d option to |
"recurse". |
"recurse". |
|
|
--recursion-limit=number |
--recursion-limit=number |
See --match-limit above. |
See --match-limit above. |
|
|
-s, --no-messages |
-s, --no-messages |
Suppress error messages about non-existent or unreadable | Suppress error messages about non-existent or unreadable |
files. Such files are quietly skipped. However, the return | files. Such files are quietly skipped. However, the return |
code is still 2, even if matches were found in other files. |
code is still 2, even if matches were found in other files. |
|
|
-u, --utf-8 |
-u, --utf-8 |
Operate in UTF-8 mode. This option is available only if PCRE | Operate in UTF-8 mode. This option is available only if PCRE |
has been compiled with UTF-8 support. Both patterns and sub- | has been compiled with UTF-8 support. All patterns (including |
ject lines must be valid strings of UTF-8 characters. | those for any --exclude and --include options) and all sub- |
| ject lines that are scanned must be valid strings of UTF-8 |
| characters. |
|
|
-V, --version |
-V, --version |
Write the version numbers of pcregrep and the PCRE library | Write the version numbers of pcregrep and the PCRE library to |
that is being used to the standard error stream. | the standard output and then exit. Anything else on the com- |
| mand line is ignored. |
|
|
-v, --invert-match |
-v, --invert-match |
Invert the sense of the match, so that lines which do not |
Invert the sense of the match, so that lines which do not |
Line 548 OPTIONS
|
Line 607 OPTIONS
|
|
|
-w, --word-regex, --word-regexp |
-w, --word-regex, --word-regexp |
Force the patterns to match only whole words. This is equiva- |
Force the patterns to match only whole words. This is equiva- |
lent to having \b at the start and end of the pattern. | lent to having \b at the start and end of the pattern. This |
| option applies only to the patterns that are matched against |
| the contents of files; it does not apply to patterns speci- |
| fied by any of the --include or --exclude options. |
|
|
-x, --line-regex, --line-regexp |
-x, --line-regex, --line-regexp |
Force the patterns to be anchored (each must start matching | Force the patterns to be anchored (each must start matching |
at the beginning of a line) and in addition, require them to | at the beginning of a line) and in addition, require them to |
match entire lines. This is equivalent to having ^ and $ | match entire lines. This is equivalent to having ^ and $ |
characters at the start and end of each alternative branch in |
characters at the start and end of each alternative branch in |
every pattern. | every pattern. This option applies only to the patterns that |
| are matched against the contents of files; it does not apply |
| to patterns specified by any of the --include or --exclude |
| options. |
|
|
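The equivalences stated for -w and -x can be tried out with a short sketch. The helpers word_regex and line_regex are hypothetical, and Python's re module is used in place of PCRE. Wrapping the pattern in a non-capturing group for -x is an assumption made here, so that ^ and $ apply to every alternative branch.

```python
import re

def word_regex(pattern):
    # -w: \b at the start and at the end of the pattern, as described.
    return r"\b" + pattern + r"\b"

def line_regex(pattern):
    # -x: anchor the whole pattern; the group makes ^ and $ apply to
    # each alternative branch.
    return r"^(?:" + pattern + r")$"

print(bool(re.search(word_regex("cat"), "the cat sat")))  # True
print(bool(re.search(word_regex("cat"), "concatenate")))  # False
print(bool(re.search(line_regex("cat|dog"), "dog")))      # True
print(bool(re.search(line_regex("cat|dog"), "hotdog")))   # False
```

Neither helper would be applied to --include or --exclude patterns, since the options affect only patterns matched against file contents.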
|
|
ENVIRONMENT VARIABLES |
ENVIRONMENT VARIABLES |
Line 569 ENVIRONMENT VARIABLES
|
Line 634 ENVIRONMENT VARIABLES
|
NEWLINES |
NEWLINES |
|
|
The -N (--newline) option allows pcregrep to scan files with different |
The -N (--newline) option allows pcregrep to scan files with different |
newline conventions from the default. However, the setting of this | newline conventions from the default. Any parts of the input files that |
option does not affect the way in which pcregrep writes information to | are written to the standard output are copied identically, with what- |
the standard error and output streams. It uses the string "\n" in C | ever newline sequences they have in the input. However, the setting of |
printf() calls to indicate newlines, relying on the C I/O library to | this option does not affect the interpretation of files specified by |
convert this to an appropriate sequence if the output is sent to a | the -f, --exclude-from, or --include-from options, which are assumed to |
file. | use the operating system's standard newline sequence, nor does it |
| affect the way in which pcregrep writes informational messages to the |
| standard error and output streams. For these it uses the string "\n" to |
| indicate newlines, relying on the C I/O library to convert this to an |
| appropriate sequence. |
|
|
|
|
OPTIONS COMPATIBILITY |
OPTIONS COMPATIBILITY |
Line 583 OPTIONS COMPATIBILITY
|
Line 652 OPTIONS COMPATIBILITY
|
in the GNU grep program. Any long option of the form --xxx-regexp (GNU |
in the GNU grep program. Any long option of the form --xxx-regexp (GNU |
terminology) is also available as --xxx-regex (PCRE terminology). How- |
terminology) is also available as --xxx-regex (PCRE terminology). How- |
ever, the --file-list, --file-offsets, --include-dir, --line-offsets, |
ever, the --file-list, --file-offsets, --include-dir, --line-offsets, |
--locale, --match-limit, -M, --multiline, -N, --newline, --recursion- | --locale, --match-limit, -M, --multiline, -N, --newline, --om-separa- |
limit, -u, and --utf-8 options are specific to pcregrep, as is the use | tor, --recursion-limit, -u, and --utf-8 options are specific to pcre- |
of the --only-matching option with a capturing parentheses number. | grep, as is the use of the --only-matching option with a capturing |
| parentheses number. |
|
|
Although most of the common options work the same way, a few are dif- | Although most of the common options work the same way, a few are dif- |
ferent in pcregrep. For example, the --include option's argument is a | ferent in pcregrep. For example, the --include option's argument is a |
glob for GNU grep, but a regular expression for pcregrep. If both the | glob for GNU grep, but a regular expression for pcregrep. If both the |
-c and -l options are given, GNU grep lists only file names, without | -c and -l options are given, GNU grep lists only file names, without |
counts, but pcregrep gives the counts. |
counts, but pcregrep gives the counts. |
|
|
|
|
OPTIONS WITH DATA |
OPTIONS WITH DATA |
|
|
There are four different ways in which an option with data can be spec- |
There are four different ways in which an option with data can be spec- |
ified. If a short form option is used, the data may follow immedi- | ified. If a short form option is used, the data may follow immedi- |
ately, or (with one exception) in the next command line item. For exam- |
ately, or (with one exception) in the next command line item. For exam- |
ple: |
ple: |
|
|
-f/some/file |
-f/some/file |
-f /some/file |
-f /some/file |
|
|
The exception is the -o option, which may appear with or without data. | The exception is the -o option, which may appear with or without data. |
Because of this, if data is present, it must follow immediately in the | Because of this, if data is present, it must follow immediately in the |
same item, for example -o3. |
same item, for example -o3. |
|
|
If a long form option is used, the data may appear in the same command | If a long form option is used, the data may appear in the same command |
line item, separated by an equals character, or (with two exceptions) | line item, separated by an equals character, or (with two exceptions) |
it may appear in the next command line item. For example: |
it may appear in the next command line item. For example: |
|
|
--file=/some/file |
--file=/some/file |
--file /some/file |
--file /some/file |
|
|
Note, however, that if you want to supply a file name beginning with ~ | Note, however, that if you want to supply a file name beginning with ~ |
as data in a shell command, and have the shell expand ~ to a home | as data in a shell command, and have the shell expand ~ to a home |
directory, you must separate the file name from the option, because the |
directory, you must separate the file name from the option, because the |
shell does not treat ~ specially unless it is at the start of an item. |
shell does not treat ~ specially unless it is at the start of an item. |
|
|
The exceptions to the above are the --colour (or --color) and --only- | The exceptions to the above are the --colour (or --color) and --only- |
matching options, for which the data is optional. If one of these | matching options, for which the data is optional. If one of these |
options does have data, it must be given in the first form, using an | options does have data, it must be given in the first form, using an |
equals character. Otherwise pcregrep will assume that it has no data. |
equals character. Otherwise pcregrep will assume that it has no data. |
|
|
|
|
MATCHING ERRORS |
MATCHING ERRORS |
|
|
It is possible to supply a regular expression that takes a very long | It is possible to supply a regular expression that takes a very long |
time to fail to match certain lines. Such patterns normally involve | time to fail to match certain lines. Such patterns normally involve |
nested indefinite repeats, for example: (a+)*\d when matched against a | nested indefinite repeats, for example: (a+)*\d when matched against a |
line of a's with no final digit. The PCRE matching function has a | line of a's with no final digit. The PCRE matching function has a |
resource limit that causes it to abort in these circumstances. If this | resource limit that causes it to abort in these circumstances. If this |
happens, pcregrep outputs an error message and the line that caused the |
happens, pcregrep outputs an error message and the line that caused the |
problem to the standard error stream. If there are more than 20 such | problem to the standard error stream. If there are more than 20 such |
errors, pcregrep gives up. |
errors, pcregrep gives up. |
|
|
The --match-limit option of pcregrep can be used to set the overall | The --match-limit option of pcregrep can be used to set the overall |
resource limit; there is a second option called --recursion-limit that | resource limit; there is a second option called --recursion-limit that |
sets a limit on the amount of memory (usually stack) that is used (see | sets a limit on the amount of memory (usually stack) that is used (see |
the discussion of these options above). |
the discussion of these options above). |
|
|
|
|
DIAGNOSTICS |
DIAGNOSTICS |
|
|
Exit status is 0 if any matches were found, 1 if no matches were found, |
Exit status is 0 if any matches were found, 1 if no matches were found, |
and 2 for syntax errors, overlong lines, non-existent or inaccessible | and 2 for syntax errors, overlong lines, non-existent or inaccessible |
files (even if matches were found in other files) or too many matching | files (even if matches were found in other files) or too many matching |
errors. Using the -s option to suppress error messages about inaccessi- |
errors. Using the -s option to suppress error messages about inaccessi- |
ble files does not affect the return code. |
ble files does not affect the return code. |
|
|
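The exit status rules above reduce to a small decision function. This is a sketch with a hypothetical name, exit_status, not code taken from pcregrep. It shows that an error takes precedence over matches found elsewhere.

```python
def exit_status(matches_found, error_occurred):
    """Return the documented exit status for one pcregrep run.

    error_occurred stands for any of the listed failures: a syntax
    error, an overlong line, a non-existent or inaccessible file, or
    too many matching errors. Suppressing messages with -s does not
    change the result.
    """
    if error_occurred:
        return 2
    return 0 if matches_found else 1

print(exit_status(True, False))   # 0
print(exit_status(False, False))  # 1
print(exit_status(True, True))    # 2
```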
|
|
SEE ALSO |
SEE ALSO |
|
|
pcrepattern(3), pcretest(1). | pcrepattern(3), pcresyntax(3), pcretest(1). |
|
|
|
|
AUTHOR |
AUTHOR |
Line 666 AUTHOR
|
Line 736 AUTHOR
|
|
|
REVISION |
REVISION |
|
|
Last updated: 04 March 2012 | Last updated: 13 September 2012 |
Copyright (c) 1997-2012 University of Cambridge. |
Copyright (c) 1997-2012 University of Cambridge. |