version 1.1, 2012/02/21 23:05:51
|
version 1.1.1.3, 2013/07/22 08:25:56
|
Line 1
|
Line 1
|
PCREGREP(1) PCREGREP(1) | PCREGREP(1) General Commands Manual PCREGREP(1) |
|
|
|
|
|
|
NAME |
NAME |
pcregrep - a grep with Perl-compatible regular expressions. |
pcregrep - a grep with Perl-compatible regular expressions. |
|
|
|
|
SYNOPSIS |
SYNOPSIS |
pcregrep [options] [long options] [pattern] [path1 path2 ...] |
pcregrep [options] [long options] [pattern] [path1 path2 ...] |
|
|
Line 26 DESCRIPTION
|
Line 26 DESCRIPTION
|
with slashes, as is common in Perl scripts), they are interpreted as |
with slashes, as is common in Perl scripts), they are interpreted as |
part of the pattern. Quotes can of course be used to delimit patterns |
part of the pattern. Quotes can of course be used to delimit patterns |
on the command line because they are interpreted by the shell, and |
on the command line because they are interpreted by the shell, and |
indeed they are required if a pattern contains white space or shell | indeed quotes are required if a pattern contains white space or shell |
metacharacters. |
metacharacters. |
|
|
The first argument that follows any option settings is treated as the |
The first argument that follows any option settings is treated as the |
Line 56 DESCRIPTION
|
Line 56 DESCRIPTION
|
times this size is used (to allow for buffering "before" and "after" |
times this size is used (to allow for buffering "before" and "after" |
lines). An error occurs if a line overflows the buffer. |
lines). An error occurs if a line overflows the buffer. |
|
|
Patterns are limited to 8K or BUFSIZ bytes, whichever is the greater. | Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the |
BUFSIZ is defined in <stdio.h>. When there is more than one pattern | greater. BUFSIZ is defined in <stdio.h>. When there is more than one |
(specified by the use of -e and/or -f), each pattern is applied to each | pattern (specified by the use of -e and/or -f), each pattern is applied |
line in the order in which they are defined, except that all the -e | to each line in the order in which they are defined, except that all |
patterns are tried before the -f patterns. | the -e patterns are tried before the -f patterns. |
|
|
By default, as soon as one pattern matches (or fails to match when -v | By default, as soon as one pattern matches a line, no further patterns |
is used), no further patterns are considered. However, if --colour (or | are considered. However, if --colour (or --color) is used to colour the |
--color) is used to colour the matching substrings, or if --only-match- | matching substrings, or if --only-matching, --file-offsets, or --line- |
ing, --file-offsets, or --line-offsets is used to output only the part | offsets is used to output only the part of the line that matched |
of the line that matched (either shown literally, or as an offset), | (either shown literally, or as an offset), scanning resumes immediately |
scanning resumes immediately following the match, so that further | following the match, so that further matches on the same line can be |
matches on the same line can be found. If there are multiple patterns, | found. If there are multiple patterns, they are all tried on the |
they are all tried on the remainder of the line, but patterns that fol- | remainder of the line, but patterns that follow the one that matched |
low the one that matched are not tried on the earlier part of the line. | are not tried on the earlier part of the line. |
|
|
This is the same behaviour as GNU grep, but it does mean that the order | This behaviour means that the order in which multiple patterns are |
in which multiple patterns are specified can affect the output when one | specified can affect the output when one of the above options is used. |
of the above options is used. | This is no longer the same behaviour as GNU grep, which now manages to |
| display earlier matches for later patterns (as long as there is no |
| overlap). |
|
|
Patterns that can match an empty string are accepted, but empty string |
Patterns that can match an empty string are accepted, but empty string |
matches are never recognized. An example is the pattern |
matches are never recognized. An example is the pattern |
Line 98 SUPPORT FOR COMPRESSED FILES
|
Line 100 SUPPORT FOR COMPRESSED FILES
|
so treated. |
so treated. |
|
|
|
|
|
BINARY FILES |
|
|
|
By default, a file that contains a binary zero byte within the first |
|
1024 bytes is identified as a binary file, and is processed specially. |
|
(GNU grep also identifies binary files in this manner.) See the |
|
--binary-files option for a means of changing the way binary files are |
|
handled. |
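
The binary-file rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not pcregrep's implementation; the function name `looks_binary` and the `probe` parameter are hypothetical.

```python
def looks_binary(data: bytes, probe: int = 1024) -> bool:
    """Return True if a zero byte occurs within the first `probe` bytes.

    Hypothetical helper mirroring the documented rule: only the first
    1024 bytes are inspected, so a zero byte later in the file does not
    mark the file as binary.
    """
    return b"\x00" in data[:probe]


if __name__ == "__main__":
    print(looks_binary(b"plain text\n"))
    print(looks_binary(b"ELF\x00\x01\x02"))
```

A zero byte that first appears after byte 1024 is not detected by this check, which matches the wording "within the first 1024 bytes".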
|
|
|
|
OPTIONS |
OPTIONS |
|
|
The order in which some of the options appear can affect the output. |
The order in which some of the options appear can affect the output. |
For example, both the -h and -l options affect the printing of file |
For example, both the -h and -l options affect the printing of file |
names. Whichever comes later in the command line will be the one that |
names. Whichever comes later in the command line will be the one that |
takes effect. Numerical values for options may be followed by K or M, | takes effect. Similarly, except where noted below, if an option is |
to signify multiplication by 1024 or 1024*1024 respectively. | given twice, the later setting is used. Numerical values for options |
| may be followed by K or M, to signify multiplication by 1024 or |
| 1024*1024 respectively. |
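
As a rough illustration of the K and M suffixes, the following Python sketch converts an option value such as "20K" into a plain number. The function name `parse_size` is hypothetical, and the sketch accepts only the upper-case suffixes named in the text; how pcregrep itself treats other spellings is not stated here.

```python
def parse_size(text: str) -> int:
    """Convert a value like "8K" or "2M" to an integer.

    K multiplies by 1024 and M by 1024*1024, as documented.
    A value without a suffix is returned unchanged.
    """
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)


if __name__ == "__main__":
    for value in ("100", "20K", "2M"):
        print(value, "->", parse_size(value))
```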
|
|
-- This terminates the list of options. It is useful if the next |
-- This terminates the list of options. It is useful if the next |
item on the command line starts with a hyphen but is not an |
item on the command line starts with a hyphen but is not an |
Line 121 OPTIONS
|
Line 134 OPTIONS
|
pcregrep guarantees to have up to 8K of following text avail- |
pcregrep guarantees to have up to 8K of following text avail- |
able for context output. |
able for context output. |
|
|
|
-a, --text |
|
Treat binary files as text. This is equivalent to --binary- |
|
files=text. |
|
|
-B number, --before-context=number |
-B number, --before-context=number |
Output number lines of context before each matching line. If | Output number lines of context before each matching line. If |
filenames and/or line numbers are being output, a hyphen sep- |
filenames and/or line numbers are being output, a hyphen sep- |
arator is used instead of a colon for the context lines. A | arator is used instead of a colon for the context lines. A |
line containing "--" is output between each group of lines, | line containing "--" is output between each group of lines, |
unless they are in fact contiguous in the input file. The | unless they are in fact contiguous in the input file. The |
value of number is expected to be relatively small. However, | value of number is expected to be relatively small. However, |
pcregrep guarantees to have up to 8K of preceding text avail- |
pcregrep guarantees to have up to 8K of preceding text avail- |
able for context output. |
able for context output. |
|
|
|
--binary-files=word |
|
Specify how binary files are to be processed. If the word is |
|
"binary" (the default), pattern matching is performed on |
|
binary files, but the only output is "Binary file <name> |
|
matches" when a match succeeds. If the word is "text", which |
|
is equivalent to the -a or --text option, binary files are |
|
processed in the same way as any other file. In this case, |
|
when a match succeeds, the output may be binary garbage, |
|
which can have nasty effects if sent to a terminal. If the |
|
word is "without-match", which is equivalent to the -I |
|
option, binary files are not processed at all; they are |
|
assumed not to be of interest. |
|
|
--buffer-size=number |
--buffer-size=number |
Set the parameter that controls how much memory is used for | Set the parameter that controls how much memory is used for |
buffering files that are being scanned. |
buffering files that are being scanned. |
|
|
-C number, --context=number |
-C number, --context=number |
Output number lines of context both before and after each | Output number lines of context both before and after each |
matching line. This is equivalent to setting both -A and -B | matching line. This is equivalent to setting both -A and -B |
to the same value. |
to the same value. |
|
|
-c, --count |
-c, --count |
Do not output individual lines from the files that are being | Do not output individual lines from the files that are being |
scanned; instead output the number of lines that would other- |
scanned; instead output the number of lines that would other- |
wise have been shown. If no lines are selected, the number | wise have been shown. If no lines are selected, the number |
zero is output. If several files are being scanned, a | zero is output. If several files are being scanned, a |
count is output for each of them. However, if the --files- | count is output for each of them. However, if the --files- |
with-matches option is also used, only those files whose | with-matches option is also used, only those files whose |
counts are greater than zero are listed. When -c is used, the |
counts are greater than zero are listed. When -c is used, the |
-A, -B, and -C options are ignored. |
-A, -B, and -C options are ignored. |
|
|
--colour, --color |
--colour, --color |
If this option is given without any data, it is equivalent to |
If this option is given without any data, it is equivalent to |
"--colour=auto". If data is required, it must be given in | "--colour=auto". If data is required, it must be given in |
the same shell item, separated by an equals sign. |
the same shell item, separated by an equals sign. |
|
|
--colour=value, --color=value |
--colour=value, --color=value |
This option specifies under what circumstances the parts of a |
This option specifies under what circumstances the parts of a |
line that matched a pattern should be coloured in the output. |
line that matched a pattern should be coloured in the output. |
By default, the output is not coloured. The value (which is | By default, the output is not coloured. The value (which is |
optional, see above) may be "never", "always", or "auto". In | optional, see above) may be "never", "always", or "auto". In |
the latter case, colouring happens only if the standard out- | the latter case, colouring happens only if the standard out- |
put is connected to a terminal. More resources are used when | put is connected to a terminal. More resources are used when |
colouring is enabled, because pcregrep has to search for all | colouring is enabled, because pcregrep has to search for all |
possible matches in a line, not just one, in order to colour | possible matches in a line, not just one, in order to colour |
them all. |
them all. |
|
|
The colour that is used can be specified by setting the envi- |
The colour that is used can be specified by setting the envi- |
ronment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value |
ronment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value |
of this variable should be a string of two numbers, separated |
of this variable should be a string of two numbers, separated |
by a semicolon. They are copied directly into the control | by a semicolon. They are copied directly into the control |
string for setting colour on a terminal, so it is your | string for setting colour on a terminal, so it is your |
responsibility to ensure that they make sense. If neither of | responsibility to ensure that they make sense. If neither of |
the environment variables is set, the default is "1;31", | the environment variables is set, the default is "1;31", |
which gives red. |
which gives red. |
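
The way the colour value ends up in a terminal control string can be sketched as follows. This assumes the usual ANSI SGR form (ESC, "[", the value, "m"); the reset sequence used here and the function name `colourise` are assumptions for illustration, not taken from pcregrep.

```python
def colourise(text: str, sgr: str = "1;31") -> str:
    """Wrap `text` in an ANSI SGR sequence built from `sgr`.

    The value is copied in verbatim, so a nonsensical value produces a
    nonsensical control string. "1;31" (bold, red) is the documented
    default. The trailing ESC[0m reset is an assumption.
    """
    return f"\x1b[{sgr}m{text}\x1b[0m"


if __name__ == "__main__":
    print("a line with a " + colourise("match") + " in it")
```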
|
|
-D action, --devices=action |
-D action, --devices=action |
If an input path is not a regular file or a directory, | If an input path is not a regular file or a directory, |
"action" specifies how it is to be processed. Valid values | "action" specifies how it is to be processed. Valid values |
are "read" (the default) or "skip" (silently skip the path). |
are "read" (the default) or "skip" (silently skip the path). |
|
|
-d action, --directories=action |
-d action, --directories=action |
If an input path is a directory, "action" specifies how it is |
If an input path is a directory, "action" specifies how it is |
to be processed. Valid values are "read" (the default), | to be processed. Valid values are "read" (the default in |
| non-Windows environments, for compatibility with GNU grep), |
"recurse" (equivalent to the -r option), or "skip" (silently |
"recurse" (equivalent to the -r option), or "skip" (silently |
skip the path). In the default case, directories are read as | skip the path, the default in Windows environments). In the |
if they were ordinary files. In some operating systems the | "read" case, directories are read as if they were ordinary |
effect of reading a directory like this is an immediate end- | files. In some operating systems the effect of reading a |
of-file. | directory like this is an immediate end-of-file; in others it |
| may provoke an error. |
|
|
-e pattern, --regex=pattern, --regexp=pattern |
-e pattern, --regex=pattern, --regexp=pattern |
Specify a pattern to be matched. This option can be used mul- |
Specify a pattern to be matched. This option can be used mul- |
tiple times in order to specify several patterns. It can also |
tiple times in order to specify several patterns. It can also |
be used as a way of specifying a single pattern that starts | be used as a way of specifying a single pattern that starts |
with a hyphen. When -e is used, no argument pattern is taken | with a hyphen. When -e is used, no argument pattern is taken |
from the command line; all arguments are treated as file | from the command line; all arguments are treated as file |
names. There is an overall maximum of 100 patterns. They are | names. There is no limit to the number of patterns. They are |
applied to each line in the order in which they are defined | applied to each line in the order in which they are defined |
until one matches (or fails to match if -v is used). If -f is | until one matches. |
used with -e, the command line patterns are matched first, | |
followed by the patterns from the file, independent of the | |
order in which these options are specified. Note that multi- | |
ple use of -e is not the same as a single pattern with alter- | |
natives. For example, X|Y finds the first character in a line | |
that is X or Y, whereas if the two patterns are given sepa- | |
rately, pcregrep finds X if it is present, even if it follows | |
Y in the line. It finds Y only if there is no X in the line. | |
This really matters only if you are using -o to show the | |
part(s) of the line that matched. | |
|
|
|
If -f is used with -e, the command line patterns are matched |
|
first, followed by the patterns from the file(s), independent |
|
of the order in which these options are specified. Note that |
|
multiple use of -e is not the same as a single pattern with |
|
alternatives. For example, X|Y finds the first character in a |
|
line that is X or Y, whereas if the two patterns are given |
|
separately, with X first, pcregrep finds X if it is present, |
|
even if it follows Y in the line. It finds Y only if there is |
|
no X in the line. This matters only if you are using -o or |
|
--colo(u)r to show the part(s) of the line that matched. |
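
The difference between X|Y and two separate patterns can be sketched in Python. This uses Python's `re` module rather than PCRE and models only the first reported match on a line; both function names are hypothetical.

```python
import re


def first_match_alternation(line: str):
    """Single pattern X|Y: the leftmost X or Y in the line wins."""
    m = re.search(r"X|Y", line)
    return m.group(0) if m else None


def first_match_separate(line: str, patterns=("X", "Y")):
    """Separate patterns tried in order: X wins wherever it is."""
    for pattern in patterns:
        m = re.search(pattern, line)
        if m:
            return m.group(0)
    return None


if __name__ == "__main__":
    line = "a Y then X"
    print(first_match_alternation(line))  # leftmost of X or Y
    print(first_match_separate(line))     # X is tried first
```

For the line "a Y then X" the alternation reports Y, because it comes first in the line, while the separate patterns report X, because X is tried first.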
|
|
--exclude=pattern |
--exclude=pattern |
When pcregrep is searching the files in a directory as a con- | Files (but not directories) whose names match the pattern are |
sequence of the -r (recursive search) option, any regular | skipped without being processed. This applies to all files, |
files whose names match the pattern are excluded. Subdirecto- | whether listed on the command line, obtained from --file- |
ries are not excluded by this option; they are searched | list, or by scanning a directory. The pattern is a PCRE regu- |
recursively, subject to the --exclude-dir and --include-dir | lar expression, and is matched against the final component of |
options. The pattern is a PCRE regular expression, and is | the file name, not the entire path. The -F, -w, and -x |
matched against the final component of the file name (not the | options do not apply to this pattern. The option may be given |
entire path). If a file name matches both --include and | any number of times in order to specify multiple patterns. If |
--exclude, it is excluded. There is no short form for this | a file name matches both an --include and an --exclude pat- |
option. | tern, it is excluded. There is no short form for this option. |
|
|
|
--exclude-from=filename |
|
Treat each non-empty line of the file as the data for an |
|
--exclude option. What constitutes a newline when reading the |
|
file is the operating system's default. The --newline option |
|
has no effect on this option. This option may be given more |
|
than once in order to specify a number of files to read. |
|
|
--exclude-dir=pattern |
--exclude-dir=pattern |
When pcregrep is searching the contents of a directory as a | Directories whose names match the pattern are skipped without |
consequence of the -r (recursive search) option, any subdi- | being processed, whatever the setting of the --recursive |
rectories whose names match the pattern are excluded. (Note | option. This applies to all directories, whether listed on |
that the --exclude option does not affect subdirectories.) | the command line, obtained from --file-list, or by scanning a |
The pattern is a PCRE regular expression, and is matched | parent directory. The pattern is a PCRE regular expression, |
against the final component of the name (not the entire | and is matched against the final component of the directory |
path). If a subdirectory name matches both --include-dir and | name, not the entire path. The -F, -w, and -x options do not |
--exclude-dir, it is excluded. There is no short form for | apply to this pattern. The option may be given any number of |
this option. | times in order to specify more than one pattern. If a direc- |
| tory matches both --include-dir and --exclude-dir, it is |
| excluded. There is no short form for this option. |
|
|
-F, --fixed-strings |
-F, --fixed-strings |
Interpret each pattern as a list of fixed strings, separated | Interpret each data-matching pattern as a list of fixed |
by newlines, instead of as a regular expression. The -w | strings, separated by newlines, instead of as a regular |
(match as a word) and -x (match whole line) options can be | expression. What constitutes a newline for this purpose is |
used with -F. They apply to each of the fixed strings. A line | controlled by the --newline option. The -w (match as a word) |
is selected if any of the fixed strings are found in it (sub- | and -x (match whole line) options can be used with -F. They |
ject to -w or -x, if present). | apply to each of the fixed strings. A line is selected if any |
| of the fixed strings are found in it (subject to -w or -x, if |
| present). This option applies only to the patterns that are |
| matched against the contents of files; it does not apply to |
| patterns specified by any of the --include or --exclude |
| options. |
|
|
-f filename, --file=filename |
-f filename, --file=filename |
Read a number of patterns from the file, one per line, and | Read patterns from the file, one per line, and match them |
match them against each line of input. A data line is output | against each line of input. What constitutes a newline when |
if any of the patterns match it. The filename can be given as | reading the file is the operating system's default. The |
"-" to refer to the standard input. When -f is used, patterns | --newline option has no effect on this option. Trailing white |
specified on the command line using -e may also be present; | space is removed from each line, and blank lines are ignored. |
they are tested before the file's patterns. However, no other | An empty file contains no patterns and therefore matches |
pattern is taken from the command line; all arguments are | nothing. See also the comments about multiple patterns versus |
treated as file names. There is an overall maximum of 100 | a single pattern with alternatives in the description of -e |
patterns. Trailing white space is removed from each line, and | above. |
blank lines are ignored. An empty file contains no patterns | |
and therefore matches nothing. See also the comments about | |
multiple patterns versus a single pattern with alternatives | |
in the description of -e above. | |
|
|
|
If this option is given more than once, all the specified |
|
files are read. A data line is output if any of the patterns |
|
match it. A filename can be given as "-" to refer to the |
|
standard input. When -f is used, patterns specified on the |
|
command line using -e may also be present; they are tested |
|
before the file's patterns. However, no other pattern is |
|
taken from the command line; all arguments are treated as the |
|
names of paths to be searched. |
|
|
|
--file-list=filename |
|
Read a list of files and/or directories that are to be |
|
scanned from the given file, one per line. Trailing white |
|
space is removed from each line, and blank lines are ignored. |
|
These paths are processed before any that are listed on the |
|
command line. The filename can be given as "-" to refer to |
|
the standard input. If --file and --file-list are both spec- |
|
ified as "-", patterns are read first. This is useful only |
|
when the standard input is a terminal, from which further |
|
lines (the list of files) can be read after an end-of-file |
|
indication. If this option is given more than once, all the |
|
specified files are read. |
|
|
--file-offsets |
--file-offsets |
Instead of showing lines or parts of lines that match, show |
Instead of showing lines or parts of lines that match, show |
each match as an offset from the start of the file and a |
each match as an offset from the start of the file and a |
Line 280 OPTIONS
|
Line 345 OPTIONS
|
line number is also being output, it follows the file name. |
line number is also being output, it follows the file name. |
|
|
--help Output a help message, giving brief details of the command |
--help Output a help message, giving brief details of the command |
options and file type support, and then exit. | options and file type support, and then exit. Anything else |
| on the command line is ignored. |
|
|
|
-I Treat binary files as never matching. This is equivalent to |
|
--binary-files=without-match. |
|
|
-i, --ignore-case |
-i, --ignore-case |
Ignore upper/lower case distinctions during comparisons. |
Ignore upper/lower case distinctions during comparisons. |
|
|
--include=pattern |
--include=pattern |
When pcregrep is searching the files in a directory as a con- | If any --include patterns are specified, the only files that |
sequence of the -r (recursive search) option, only those reg- | are processed are those that match one of the patterns (and |
ular files whose names match the pattern are included. Subdi- | do not match an --exclude pattern). This option does not |
rectories are always included and searched recursively, sub- | affect directories, but it applies to all files, whether |
ject to the --include-dir and --exclude-dir options. The pat- | listed on the command line, obtained from --file-list, or by |
tern is a PCRE regular expression, and is matched against the | scanning a directory. The pattern is a PCRE regular expres- |
final component of the file name (not the entire path). If a | sion, and is matched against the final component of the file |
file name matches both --include and --exclude, it is | name, not the entire path. The -F, -w, and -x options do not |
excluded. There is no short form for this option. | apply to this pattern. The option may be given any number of |
| times. If a file name matches both an --include and an |
| --exclude pattern, it is excluded. There is no short form |
| for this option. |
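
The combined effect of --include and --exclude on a file name, as described in the newer text, can be sketched like this. It is a simplified model using Python's `re` module instead of PCRE, it assumes unanchored matching, and the function name `should_process` is hypothetical.

```python
import posixpath
import re


def should_process(path: str, includes=(), excludes=()) -> bool:
    """Decide whether a file would be processed.

    Patterns are tested against the final path component only.
    An --exclude match always wins; if any --include patterns are
    given, the name must match at least one of them.
    """
    name = posixpath.basename(path)
    if any(re.search(p, name) for p in excludes):
        return False
    if includes:
        return any(re.search(p, name) for p in includes)
    return True


if __name__ == "__main__":
    print(should_process("src/main.c", includes=[r"\.c$"]))
    print(should_process("src/main.c", includes=[r"\.c$"], excludes=[r"^main"]))
```

Because only the final component is tested, a pattern such as `^c_` does not select `c_dir/readme.txt`.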
|
|
|
--include-from=filename |
|
Treat each non-empty line of the file as the data for an |
|
--include option. What constitutes a newline for this purpose |
|
is the operating system's default. The --newline option has |
|
no effect on this option. This option may be given any number |
|
of times; all the files are read. |
|
|
--include-dir=pattern |
--include-dir=pattern |
When pcregrep is searching the contents of a directory as a | If any --include-dir patterns are specified, the only direc- |
consequence of the -r (recursive search) option, only those | tories that are processed are those that match one of the |
subdirectories whose names match the pattern are included. | patterns (and do not match an --exclude-dir pattern). This |
(Note that the --include option does not affect subdirecto- | applies to all directories, whether listed on the command |
ries.) The pattern is a PCRE regular expression, and is | line, obtained from --file-list, or by scanning a parent |
matched against the final component of the name (not the | directory. The pattern is a PCRE regular expression, and is |
entire path). If a subdirectory name matches both --include- | matched against the final component of the directory name, |
dir and --exclude-dir, it is excluded. There is no short form | not the entire path. The -F, -w, and -x options do not apply |
for this option. | to this pattern. The option may be given any number of times. |
| If a directory matches both --include-dir and --exclude-dir, |
| it is excluded. There is no short form for this option. |
|
|
-L, --files-without-match |
-L, --files-without-match |
Instead of outputting lines from the files, just output the | Instead of outputting lines from the files, just output the |
names of the files that do not contain any lines that would | names of the files that do not contain any lines that would |
have been output. Each file name is output once, on a sepa- | have been output. Each file name is output once, on a sepa- |
rate line. |
rate line. |
|
|
-l, --files-with-matches |
-l, --files-with-matches |
Instead of outputting lines from the files, just output the | Instead of outputting lines from the files, just output the |
names of the files containing lines that would have been out- |
names of the files containing lines that would have been out- |
put. Each file name is output once, on a separate line. | put. Each file name is output once, on a separate line. |
Searching normally stops as soon as a matching line is found | Searching normally stops as soon as a matching line is found |
in a file. However, if the -c (count) option is also used, | in a file. However, if the -c (count) option is also used, |
matching continues in order to obtain the correct count, and | matching continues in order to obtain the correct count, and |
those files that have at least one match are listed along | those files that have at least one match are listed along |
with their counts. Using this option with -c is a way of sup- |
with their counts. Using this option with -c is a way of sup- |
pressing the listing of files with no matches. |
pressing the listing of files with no matches. |
|
|
Line 330 OPTIONS
|
Line 411 OPTIONS
|
input)" is used. There is no short form for this option. |
input)" is used. There is no short form for this option. |
|
|
--line-buffered |
--line-buffered |
When this option is given, input is read and processed line | When this option is given, input is read and processed line |
by line, and the output is flushed after each write. By | by line, and the output is flushed after each write. By |
default, input is read in large chunks, unless pcregrep can | default, input is read in large chunks, unless pcregrep can |
determine that it is reading from a terminal (which is cur- | determine that it is reading from a terminal (which is cur- |
rently possible only in Unix environments). Output to termi- | rently possible only in Unix-like environments). Output to |
nal is normally automatically flushed by the operating sys- | terminal is normally automatically flushed by the operating |
tem. This option can be useful when the input or output is | system. This option can be useful when the input or output is |
attached to a pipe and you do not want pcregrep to buffer up | attached to a pipe and you do not want pcregrep to buffer up |
large amounts of data. However, its use will affect perfor- | large amounts of data. However, its use will affect perfor- |
mance, and the -M (multiline) option ceases to work. |
mance, and the -M (multiline) option ceases to work. |
|
|
--line-offsets |
--line-offsets |
Instead of showing lines or parts of lines that match, show | Instead of showing lines or parts of lines that match, show |
each match as a line number, the offset from the start of the |
each match as a line number, the offset from the start of the |
line, and a length. The line number is terminated by a colon | line, and a length. The line number is terminated by a colon |
(as usual; see the -n option), and the offset and length are | (as usual; see the -n option), and the offset and length are |
separated by a comma. In this mode, no context is shown. | separated by a comma. In this mode, no context is shown. |
That is, the -A, -B, and -C options are ignored. If there is | That is, the -A, -B, and -C options are ignored. If there is |
more than one match in a line, each of them is shown sepa- | more than one match in a line, each of them is shown sepa- |
rately. This option is mutually exclusive with --file-offsets |
rately. This option is mutually exclusive with --file-offsets |
and --only-matching. |
and --only-matching. |
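
The shape of --line-offsets output (line number, colon, offset, comma, length) can be sketched as below. This assumes offsets are counted from zero, which the text does not state, and it uses Python's `re` module rather than PCRE; the function name `line_offsets` is hypothetical.

```python
import re


def line_offsets(text: str, pattern: str):
    """List every match as "line:offset,length".

    Each match on a line is reported separately. Offsets are
    zero-based here, which is an assumption. The pattern is expected
    not to match the empty string.
    """
    results = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in re.finditer(pattern, line):
            results.append(f"{lineno}:{m.start()},{m.end() - m.start()}")
    return results


if __name__ == "__main__":
    for entry in line_offsets("foo bar\nbar bar\n", "bar"):
        print(entry)
```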
|
|
--locale=locale-name |
--locale=locale-name |
This option specifies a locale to be used for pattern match- | This option specifies a locale to be used for pattern match- |
ing. It overrides the value in the LC_ALL or LC_CTYPE envi- | ing. It overrides the value in the LC_ALL or LC_CTYPE envi- |
ronment variables. If no locale is specified, the PCRE | ronment variables. If no locale is specified, the PCRE |
library's default (usually the "C" locale) is used. There is | library's default (usually the "C" locale) is used. There is |
no short form for this option. |
no short form for this option. |
|
|
--match-limit=number |
--match-limit=number |
Processing some regular expression patterns can require a | Processing some regular expression patterns can require a |
very large amount of memory, leading in some cases to a pro- | very large amount of memory, leading in some cases to a pro- |
gram crash if not enough is available. Other patterns may | gram crash if not enough is available. Other patterns may |
take a very long time to search for all possible matching | take a very long time to search for all possible matching |
strings. The pcre_exec() function that is called by pcregrep | strings. The pcre_exec() function that is called by pcregrep |
to do the matching has two parameters that can limit the | to do the matching has two parameters that can limit the |
resources that it uses. |
resources that it uses. |
|
|
The --match-limit option provides a means of limiting | The --match-limit option provides a means of limiting |
resource usage when processing patterns that are not going to |
resource usage when processing patterns that are not going to |
match, but which have a very large number of possibilities in |
match, but which have a very large number of possibilities in |
their search trees. The classic example is a pattern that | their search trees. The classic example is a pattern that |
uses nested unlimited repeats. Internally, PCRE uses a func- | uses nested unlimited repeats. Internally, PCRE uses a func- |
tion called match() which it calls repeatedly (sometimes | tion called match() which it calls repeatedly (sometimes |
recursively). The limit set by --match-limit is imposed on | recursively). The limit set by --match-limit is imposed on |
the number of times this function is called during a match, | the number of times this function is called during a match, |
which has the effect of limiting the amount of backtracking | which has the effect of limiting the amount of backtracking |
that can take place. |
that can take place. |
|
|
The --recursion-limit option is similar to --match-limit, but |
The --recursion-limit option is similar to --match-limit, but |
instead of limiting the total number of times that match() is |
instead of limiting the total number of times that match() is |
called, it limits the depth of recursive calls, which in turn |
called, it limits the depth of recursive calls, which in turn |
limits the amount of memory that can be used. The recursion | limits the amount of memory that can be used. The recursion |
depth is a smaller number than the total number of calls, | depth is a smaller number than the total number of calls, |
because not all calls to match() are recursive. This limit is |
because not all calls to match() are recursive. This limit is |
of use only if it is set smaller than --match-limit. |
of use only if it is set smaller than --match-limit. |
|
|
There are no short forms for these options. The default set- | There are no short forms for these options. The default set- |
tings are specified when the PCRE library is compiled, with | tings are specified when the PCRE library is compiled, with |
the value used unless the build overrides it being 10 million. |
the value used unless the build overrides it being 10 million. |
|
|
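The difference between the two limits can be seen in a toy model. The sketch below is not PCRE's match() function; it is a hand-written backtracking matcher for the single pattern (a+)*\d, anchored at the start of the subject, and the names count_match_calls and LimitExceeded are hypothetical. It counts every call (what --match-limit restricts) and tracks the deepest nesting (what --recursion-limit restricts).

```python
class LimitExceeded(Exception):
    """Raised when one of the illustrative limits is passed."""


def count_match_calls(subject, match_limit=None, recursion_limit=None):
    """Toy backtracking matcher for (a+)*\\d, anchored at position 0.

    Returns (matched, total_calls, max_depth). This only models the idea
    of a call limit and a depth limit; it is not how PCRE is implemented.
    """
    calls = 0
    max_depth = 0

    def match(pos, depth):
        nonlocal calls, max_depth
        calls += 1
        max_depth = max(max_depth, depth)
        if match_limit is not None and calls > match_limit:
            raise LimitExceeded("match limit exceeded")
        if recursion_limit is not None and depth > recursion_limit:
            raise LimitExceeded("recursion limit exceeded")
        # Find how far a run of 'a' characters extends from pos.
        end = pos
        while end < len(subject) and subject[end] == "a":
            end += 1
        # Try one more iteration of the group (a+), longest run first,
        # backtracking to shorter runs when the rest fails.
        for stop in range(end, pos, -1):
            if match(stop, depth + 1):
                return True
        # No further group iteration: a digit must follow.
        return pos < len(subject) and subject[pos].isdigit()

    matched = match(0, 1)
    return matched, calls, max_depth


# Ten a's with no digit: the call count doubles with each extra 'a',
# while the depth grows only linearly.
print(count_match_calls("a" * 10))
```

In this model the total number of calls is always at least as large as the depth reached, which is why a depth limit is only meaningful when it is the smaller of the two.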
-M, --multiline |
-M, --multiline |
Allow patterns to match more than one line. When this option | Allow patterns to match more than one line. When this option |
is given, patterns may usefully contain literal newline char- |
is given, patterns may usefully contain literal newline char- |
acters and internal occurrences of ^ and $ characters. The | acters and internal occurrences of ^ and $ characters. The |
output for a successful match may consist of more than one | output for a successful match may consist of more than one |
line, the last of which is the one in which the match ended. | line, the last of which is the one in which the match ended. |
If the matched string ends with a newline sequence the output |
If the matched string ends with a newline sequence the output |
ends at the end of that line. |
ends at the end of that line. |
|
|
When this option is set, the PCRE library is called in "mul- | When this option is set, the PCRE library is called in "mul- |
tiline" mode. There is a limit to the number of lines that | tiline" mode. There is a limit to the number of lines that |
can be matched, imposed by the way that pcregrep buffers the | can be matched, imposed by the way that pcregrep buffers the |
input file as it scans it. However, pcregrep ensures that at | input file as it scans it. However, pcregrep ensures that at |
least 8K characters or the rest of the document (whichever is |
least 8K characters or the rest of the document (whichever is |
the shorter) are available for forward matching, and simi- | the shorter) are available for forward matching, and simi- |
larly the previous 8K characters (or all the previous charac- |
larly the previous 8K characters (or all the previous charac- |
ters, if fewer than 8K) are guaranteed to be available for | ters, if fewer than 8K) are guaranteed to be available for |
lookbehind assertions. This option does not work when input | lookbehind assertions. This option does not work when input |
is read line by line (see --line-buffered). |
is read line by line (see --line-buffered). |
|
|
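As a rough analogy only, the output rule for multiline matches can be sketched with Python's re module, whose MULTILINE flag also lets ^ and $ match at internal line boundaries. The function name multiline_output is hypothetical, the sketch assumes LF line endings, and it ignores pcregrep's buffering limits.

```python
import re


def multiline_output(pattern, text):
    """Return the whole lines covered by the first match, or None."""
    m = re.search(pattern, text, re.MULTILINE)
    if m is None:
        return None
    # Output starts at the beginning of the line where the match starts.
    start = text.rfind("\n", 0, m.start()) + 1
    if m.end() > m.start() and text[m.end() - 1] == "\n":
        # The matched string ends with a newline: stop at the end of that line.
        end = m.end() - 1
    else:
        # Otherwise extend to the end of the line in which the match ended.
        end = text.find("\n", m.end())
        if end == -1:
            end = len(text)
    return text[start:end]


sample = "start\nalpha\nbeta\nend\n"
# A pattern with a literal newline and internal ^ and $ anchors.
print(multiline_output(r"^alpha$\n^beta$", sample))
```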
-N newline-type, --newline=newline-type |
-N newline-type, --newline=newline-type |
The PCRE library supports five different conventions for | The PCRE library supports five different conventions for |
indicating the ends of lines. They are the single-character | indicating the ends of lines. They are the single-character |
sequences CR (carriage return) and LF (linefeed), the two- | sequences CR (carriage return) and LF (linefeed), the two- |
character sequence CRLF, an "anycrlf" convention, which rec- | character sequence CRLF, an "anycrlf" convention, which rec- |
ognizes any of the preceding three types, and an "any" con- | ognizes any of the preceding three types, and an "any" con- |
vention, in which any Unicode line ending sequence is assumed |
vention, in which any Unicode line ending sequence is assumed |
to end a line. The Unicode sequences are the three just men- | to end a line. The Unicode sequences are the three just men- |
tioned, plus VT (vertical tab, U+000B), FF (form feed, | tioned, plus VT (vertical tab, U+000B), FF (form feed, |
U+000C), NEL (next line, U+0085), LS (line separator, | U+000C), NEL (next line, U+0085), LS (line separator, |
U+2028), and PS (paragraph separator, U+2029). |
U+2028), and PS (paragraph separator, U+2029). |
|
|
When the PCRE library is built, a default line-ending |
When the PCRE library is built, a default line-ending |
sequence is specified. This is normally the standard | sequence is specified. This is normally the standard |
sequence for the operating system. Unless otherwise specified |
sequence for the operating system. Unless otherwise specified |
by this option, pcregrep uses the library's default. The | by this option, pcregrep uses the library's default. The |
possible values for this option are CR, LF, CRLF, ANYCRLF, or |
possible values for this option are CR, LF, CRLF, ANYCRLF, or |
ANY. This makes it possible to use pcregrep on files that | ANY. This makes it possible to use pcregrep to scan files |
have come from other environments without having to modify | that have come from other environments without having to mod- |
their line endings. If the data that is being scanned does | ify their line endings. If the data that is being scanned |
not agree with the convention set by this option, pcregrep | does not agree with the convention set by this option, pcre- |
may behave in strange ways. | grep may behave in strange ways. Note that this option does |
| not apply to files specified by the -f, --exclude-from, or |
| --include-from options, which are expected to use the operat- |
| ing system's standard newline sequence. |
|
|
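The five conventions can be illustrated by how each one would split the same data into lines. This is a sketch in Python, not pcregrep's code; the table NEWLINE_PATTERNS and the function split_lines are hypothetical names, and the regular expressions are simply one way of writing down the sequences listed above.

```python
import re

# One regular expression per convention. CRLF is tried first in the
# combined conventions so that it counts as a single line ending.
NEWLINE_PATTERNS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "ANYCRLF": r"\r\n|\r|\n",
    "ANY": r"\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}


def split_lines(text, convention):
    """Split text into lines under the named newline convention."""
    parts = re.split(NEWLINE_PATTERNS[convention], text)
    # A trailing line ending does not start an extra, empty line.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


data = "one\r\ntwo\rthree\n"
for name in ("LF", "ANYCRLF"):
    print(name, split_lines(data, name))
```

Under LF the sample is seen as a single long line containing stray CR characters, which is the kind of mismatch that leads to the strange behaviour mentioned above.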
-n, --line-number |
-n, --line-number |
Precede each output line by its line number in the file, fol- |
Precede each output line by its line number in the file, fol- |
Line 463 OPTIONS
|
Line 547 OPTIONS
|
-onumber, --only-matching=number |
-onumber, --only-matching=number |
Show only the part of the line that matched the capturing |
Show only the part of the line that matched the capturing |
parentheses of the given number. Up to 32 capturing parenthe- |
parentheses of the given number. Up to 32 capturing parenthe- |
ses are supported. Because these options can be given without | ses are supported, and -o0 is equivalent to -o without a num- |
an argument (see above), if an argument is present, it must | ber. Because these options can be given without an argument |
be given in the same shell item, for example, -o3 or --only- | (see above), if an argument is present, it must be given in |
matching=2. The comments given for the non-argument case | the same shell item, for example, -o3 or --only-matching=2. |
above also apply to this case. If the specified capturing | The comments given for the non-argument case above also apply |
parentheses do not exist in the pattern, or were not set in | to this case. If the specified capturing parentheses do not |
the match, nothing is output unless the file name or line | exist in the pattern, or were not set in the match, nothing |
number is being printed. | is output unless the file name or line number is being |
| printed. |
|
|
|
| If this option is given multiple times, multiple substrings |
| are output, in the order the options are given. For example, |
| -o3 -o1 -o3 causes the substrings matched by capturing paren- |
| theses 3 and 1 and then 3 again to be output. By default, |
| there is no separator (but see the next option). |
|
| --om-separator=text |
| Specify a separating string for multiple occurrences of -o. |
| The default is an empty string. Separating strings are never |
| coloured. |
|
|
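The effect of repeating -o with numbers, together with a separator, can be sketched in Python. The function only_matching is a hypothetical stand-in, not pcregrep's implementation. Skipping groups that do not exist or were not set is an assumption made for the sketch; the text above says only that nothing is output for them.

```python
import re


def only_matching(pattern, line, group_numbers, separator=""):
    """Join the requested capture groups of the first match, in order."""
    m = re.search(pattern, line)
    if m is None:
        return None
    pieces = []
    for n in group_numbers:
        # Group 0 is the whole match; higher numbers are capturing
        # parentheses. Missing or unset groups contribute nothing.
        if n <= m.re.groups and m.group(n) is not None:
            pieces.append(m.group(n))
    return separator.join(pieces)


line = "id 12-34-56"
# Corresponds to: -o3 -o1 -o3 with a comma as the separating string.
print(only_matching(r"(\d+)-(\d+)-(\d+)", line, [3, 1, 3], ","))
```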
-q, --quiet |
-q, --quiet |
Work quietly, that is, display nothing except error messages. |
Work quietly, that is, display nothing except error messages. |
The exit status indicates whether or not any matches were | The exit status indicates whether or not any matches were |
found. |
found. |
|
|
-r, --recursive |
-r, --recursive |
If any given path is a directory, recursively scan the files | If any given path is a directory, recursively scan the files |
it contains, taking note of any --include and --exclude set- | it contains, taking note of any --include and --exclude set- |
tings. By default, a directory is read as a normal file; in | tings. By default, a directory is read as a normal file; in |
some operating systems this gives an immediate end-of-file. | some operating systems this gives an immediate end-of-file. |
This option is a shorthand for setting the -d option to | This option is a shorthand for setting the -d option to |
"recurse". |
"recurse". |
|
|
--recursion-limit=number |
--recursion-limit=number |
See --match-limit above. |
See --match-limit above. |
|
|
-s, --no-messages |
-s, --no-messages |
Suppress error messages about non-existent or unreadable | Suppress error messages about non-existent or unreadable |
files. Such files are quietly skipped. However, the return | files. Such files are quietly skipped. However, the return |
code is still 2, even if matches were found in other files. |
code is still 2, even if matches were found in other files. |
|
|
-u, --utf-8 |
-u, --utf-8 |
Operate in UTF-8 mode. This option is available only if PCRE | Operate in UTF-8 mode. This option is available only if PCRE |
has been compiled with UTF-8 support. Both patterns and sub- | has been compiled with UTF-8 support. All patterns (including |
ject lines must be valid strings of UTF-8 characters. | those for any --exclude and --include options) and all sub- |
| ject lines that are scanned must be valid strings of UTF-8 |
| characters. |
|
|
-V, --version |
-V, --version |
Write the version numbers of pcregrep and the PCRE library | Write the version numbers of pcregrep and the PCRE library to |
that is being used to the standard error stream. | the standard output and then exit. Anything else on the com- |
| mand line is ignored. |
|
|
-v, --invert-match |
-v, --invert-match |
Invert the sense of the match, so that lines which do not |
Invert the sense of the match, so that lines which do not |
Line 508 OPTIONS
|
Line 607 OPTIONS
|
|
|
-w, --word-regex, --word-regexp |
-w, --word-regex, --word-regexp |
Force the patterns to match only whole words. This is equiva- |
Force the patterns to match only whole words. This is equiva- |
lent to having \b at the start and end of the pattern. | lent to having \b at the start and end of the pattern. This |
| option applies only to the patterns that are matched against |
| the contents of files; it does not apply to patterns speci- |
| fied by any of the --include or --exclude options. |
|
|
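The stated equivalence can be tried out with Python's re module, which also supports \b. This is an analogy, not pcregrep itself, and the helper name word_regex is hypothetical.

```python
import re


def word_regex(pattern):
    """Add \\b at the start and end of the pattern, as described for -w."""
    return re.compile(r"\b" + pattern + r"\b")


cat = word_regex("cat")
print(bool(cat.search("a cat sat")))    # whole word present
print(bool(cat.search("concatenate")))  # only embedded in a longer word
```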
-x, --line-regex, --line-regexp |
-x, --line-regex, --line-regexp |
Force the patterns to be anchored (each must start matching | Force the patterns to be anchored (each must start matching |
at the beginning of a line) and in addition, require them to | at the beginning of a line) and in addition, require them to |
match entire lines. This is equivalent to having ^ and $ | match entire lines. This is equivalent to having ^ and $ |
characters at the start and end of each alternative branch in |
characters at the start and end of each alternative branch in |
every pattern. | every pattern. This option applies only to the patterns that |
| are matched against the contents of files; it does not apply |
| to patterns specified by any of the --include or --exclude |
| options. |
|
|
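A compact way to get the same effect as putting ^ and $ on every alternative branch is to wrap the whole pattern in a non-capturing group. The sketch below shows this with Python's re module; the helper name line_regex is hypothetical, and lines are assumed to be passed without their line ending.

```python
import re


def line_regex(pattern):
    """Anchor every alternative of the pattern to a whole line."""
    # ^(?:A|B)$ behaves like ^A$|^B$ for a single line of input.
    return re.compile("^(?:" + pattern + ")$")


animals = line_regex("cat|dog")
for line in ("dog", "hotdog", "cats"):
    print(line, bool(animals.search(line)))
```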
|
|
ENVIRONMENT VARIABLES |
ENVIRONMENT VARIABLES |
Line 529 ENVIRONMENT VARIABLES
|
Line 634 ENVIRONMENT VARIABLES
|
NEWLINES |
NEWLINES |
|
|
The -N (--newline) option allows pcregrep to scan files with different |
The -N (--newline) option allows pcregrep to scan files with different |
newline conventions from the default. However, the setting of this | newline conventions from the default. Any parts of the input files that |
option does not affect the way in which pcregrep writes information to | are written to the standard output are copied identically, with what- |
the standard error and output streams. It uses the string "\n" in C | ever newline sequences they have in the input. However, the setting of |
printf() calls to indicate newlines, relying on the C I/O library to | this option does not affect the interpretation of files specified by |
convert this to an appropriate sequence if the output is sent to a | the -f, --exclude-from, or --include-from options, which are assumed to |
file. | use the operating system's standard newline sequence, nor does it |
| affect the way in which pcregrep writes informational messages to the |
| standard error and output streams. For these it uses the string "\n" to |
| indicate newlines, relying on the C I/O library to convert this to an |
| appropriate sequence. |
|
|
|
|
OPTIONS COMPATIBILITY |
OPTIONS COMPATIBILITY |
|
|
Many of the short and long forms of pcregrep's options are the same as |
Many of the short and long forms of pcregrep's options are the same as |
in the GNU grep program (version 2.5.4). Any long option of the form | in the GNU grep program. Any long option of the form --xxx-regexp (GNU |
--xxx-regexp (GNU terminology) is also available as --xxx-regex (PCRE | terminology) is also available as --xxx-regex (PCRE terminology). How- |
terminology). However, the --file-offsets, --include-dir, --line-off- | ever, the --file-list, --file-offsets, --include-dir, --line-offsets, |
sets, --locale, --match-limit, -M, --multiline, -N, --newline, --recur- | --locale, --match-limit, -M, --multiline, -N, --newline, --om-separa- |
sion-limit, -u, and --utf-8 options are specific to pcregrep, as is the | tor, --recursion-limit, -u, and --utf-8 options are specific to pcre- |
use of the --only-matching option with a capturing parentheses number. | grep, as is the use of the --only-matching option with a capturing |
| parentheses number. |
|
|
Although most of the common options work the same way, a few are dif- | Although most of the common options work the same way, a few are dif- |
ferent in pcregrep. For example, the --include option's argument is a | ferent in pcregrep. For example, the --include option's argument is a |
glob for GNU grep, but a regular expression for pcregrep. If both the | glob for GNU grep, but a regular expression for pcregrep. If both the |
-c and -l options are given, GNU grep lists only file names, without | -c and -l options are given, GNU grep lists only file names, without |
counts, but pcregrep gives the counts. |
counts, but pcregrep gives the counts. |
|
|
|
|
OPTIONS WITH DATA |
OPTIONS WITH DATA |
|
|
There are four different ways in which an option with data can be spec- |
There are four different ways in which an option with data can be spec- |
ified. If a short form option is used, the data may follow immedi- | ified. If a short form option is used, the data may follow immedi- |
ately, or (with one exception) in the next command line item. For exam- |
ately, or (with one exception) in the next command line item. For exam- |
ple: |
ple: |
|
|
-f/some/file |
-f/some/file |
-f /some/file |
-f /some/file |
|
|
The exception is the -o option, which may appear with or without data. | The exception is the -o option, which may appear with or without data. |
Because of this, if data is present, it must follow immediately in the | Because of this, if data is present, it must follow immediately in the |
same item, for example -o3. |
same item, for example -o3. |
|
|
If a long form option is used, the data may appear in the same command | If a long form option is used, the data may appear in the same command |
line item, separated by an equals character, or (with two exceptions) | line item, separated by an equals character, or (with two exceptions) |
it may appear in the next command line item. For example: |
it may appear in the next command line item. For example: |
|
|
--file=/some/file |
--file=/some/file |
--file /some/file |
--file /some/file |
|
|
Note, however, that if you want to supply a file name beginning with ~ | Note, however, that if you want to supply a file name beginning with ~ |
as data in a shell command, and have the shell expand ~ to a home | as data in a shell command, and have the shell expand ~ to a home |
directory, you must separate the file name from the option, because the |
directory, you must separate the file name from the option, because the |
shell does not treat ~ specially unless it is at the start of an item. |
shell does not treat ~ specially unless it is at the start of an item. |
|
|
The exceptions to the above are the --colour (or --color) and --only- | The exceptions to the above are the --colour (or --color) and --only- |
matching options, for which the data is optional. If one of these | matching options, for which the data is optional. If one of these |
options does have data, it must be given in the first form, using an | options does have data, it must be given in the first form, using an |
equals character. Otherwise pcregrep will assume that it has no data. |
equals character. Otherwise pcregrep will assume that it has no data. |
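The rules in this section can be summarised as a small parsing routine. This is a sketch in Python, not pcregrep's option parser; read_option_data and OPTIONAL_DATA_LONG are hypothetical names, and the routine assumes it is only called for options that accept data.

```python
# Long options whose data is optional and must therefore use "=".
OPTIONAL_DATA_LONG = {"--colour", "--color", "--only-matching"}


def read_option_data(args, i):
    """Return (option, data, next_index) for the option at args[i]."""
    arg = args[i]
    if arg.startswith("--"):
        name, equals, data = arg.partition("=")
        if equals:
            return name, data, i + 1          # --file=/some/file
        if name in OPTIONAL_DATA_LONG:
            return name, None, i + 1          # assumed to have no data
        return name, args[i + 1], i + 2       # --file /some/file
    name, data = arg[:2], arg[2:]
    if data:
        return name, data, i + 1              # -f/some/file or -o3
    if name == "-o":
        return name, None, i + 1              # -o alone never takes the next item
    return name, args[i + 1], i + 2           # -f /some/file


print(read_option_data(["-f", "/some/file"], 0))
print(read_option_data(["--only-matching", "2"], 0))
```

In the second call the item "2" is left unconsumed, so it would go on to be treated as an ordinary argument.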
|
|
|
|
MATCHING ERRORS |
MATCHING ERRORS |
|
|
It is possible to supply a regular expression that takes a very long | It is possible to supply a regular expression that takes a very long |
time to fail to match certain lines. Such patterns normally involve | time to fail to match certain lines. Such patterns normally involve |
nested indefinite repeats, for example: (a+)*\d when matched against a | nested indefinite repeats, for example: (a+)*\d when matched against a |
line of a's with no final digit. The PCRE matching function has a | line of a's with no final digit. The PCRE matching function has a |
resource limit that causes it to abort in these circumstances. If this | resource limit that causes it to abort in these circumstances. If this |
happens, pcregrep outputs an error message and the line that caused the |
happens, pcregrep outputs an error message and the line that caused the |
problem to the standard error stream. If there are more than 20 such | problem to the standard error stream. If there are more than 20 such |
errors, pcregrep gives up. |
errors, pcregrep gives up. |
|
|
The --match-limit option of pcregrep can be used to set the overall | The --match-limit option of pcregrep can be used to set the overall |
resource limit; there is a second option called --recursion-limit that | resource limit; there is a second option called --recursion-limit that |
sets a limit on the amount of memory (usually stack) that is used (see | sets a limit on the amount of memory (usually stack) that is used (see |
the discussion of these options above). |
the discussion of these options above). |
|
|
|
|
DIAGNOSTICS |
DIAGNOSTICS |
|
|
Exit status is 0 if any matches were found, 1 if no matches were found, |
Exit status is 0 if any matches were found, 1 if no matches were found, |
and 2 for syntax errors, overlong lines, non-existent or inaccessible | and 2 for syntax errors, overlong lines, non-existent or inaccessible |
files (even if matches were found in other files) or too many matching | files (even if matches were found in other files) or too many matching |
errors. Using the -s option to suppress error messages about inaccessi- |
errors. Using the -s option to suppress error messages about inaccessi- |
ble files does not affect the return code. |
ble files does not affect the return code. |
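A calling script might interpret the exit status along the following lines. This is a minimal sketch in Python; the function names are hypothetical, and run_quiet assumes that a pcregrep executable is available on the search path.

```python
import subprocess


def describe_exit_status(code):
    """Map an exit status to the meaning given in this section."""
    if code == 0:
        return "match"
    if code == 1:
        return "no match"
    if code == 2:
        # Reported even if matches were found in other files.
        return "error"
    return "unknown"


def run_quiet(pattern, path):
    """Run pcregrep -q and describe the result (needs pcregrep installed)."""
    result = subprocess.run(["pcregrep", "-q", pattern, path])
    return describe_exit_status(result.returncode)


print(describe_exit_status(1))
```

Because status 2 takes priority over a successful match, a script that cares about both conditions cannot rely on the exit status alone.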
|
|
|
|
SEE ALSO |
SEE ALSO |
|
|
pcrepattern(3), pcretest(1). | pcrepattern(3), pcresyntax(3), pcretest(1). |
|
|
|
|
AUTHOR |
AUTHOR |
Line 626 AUTHOR
|
Line 736 AUTHOR
|
|
|
REVISION |
REVISION |
|
|
Last updated: 06 September 2011 | Last updated: 13 September 2012 |
Copyright (c) 1997-2011 University of Cambridge. | Copyright (c) 1997-2012 University of Cambridge. |