|
version 1.1.1.1, 2012/02/21 23:05:51
|
version 1.1.1.3, 2013/07/22 08:25:56
|
|
Line 1
|
Line 1
|
| PCREGREP(1) PCREGREP(1) | PCREGREP(1) General Commands Manual PCREGREP(1) |
| |
|
| |
|
| |
|
| NAME |
NAME |
| pcregrep - a grep with Perl-compatible regular expressions. |
pcregrep - a grep with Perl-compatible regular expressions. |
| |
|
| |
|
| SYNOPSIS |
SYNOPSIS |
| pcregrep [options] [long options] [pattern] [path1 path2 ...] |
pcregrep [options] [long options] [pattern] [path1 path2 ...] |
| |
|
|
Line 26 DESCRIPTION
|
Line 26 DESCRIPTION
|
| with slashes, as is common in Perl scripts), they are interpreted as |
with slashes, as is common in Perl scripts), they are interpreted as |
| part of the pattern. Quotes can of course be used to delimit patterns |
part of the pattern. Quotes can of course be used to delimit patterns |
| on the command line because they are interpreted by the shell, and |
on the command line because they are interpreted by the shell, and |
| indeed they are required if a pattern contains white space or shell | indeed quotes are required if a pattern contains white space or shell |
| metacharacters. |
metacharacters. |
| |
|
| The first argument that follows any option settings is treated as the |
The first argument that follows any option settings is treated as the |
|
Line 56 DESCRIPTION
|
Line 56 DESCRIPTION
|
| times this size is used (to allow for buffering "before" and "after" |
times this size is used (to allow for buffering "before" and "after" |
| lines). An error occurs if a line overflows the buffer. |
lines). An error occurs if a line overflows the buffer. |
| |
|
| Patterns are limited to 8K or BUFSIZ bytes, whichever is the greater. | Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the |
| BUFSIZ is defined in <stdio.h>. When there is more than one pattern | greater. BUFSIZ is defined in <stdio.h>. When there is more than one |
| (specified by the use of -e and/or -f), each pattern is applied to each | pattern (specified by the use of -e and/or -f), each pattern is applied |
| line in the order in which they are defined, except that all the -e | to each line in the order in which they are defined, except that all |
| patterns are tried before the -f patterns. | the -e patterns are tried before the -f patterns. |
| |
|
| By default, as soon as one pattern matches (or fails to match when -v | By default, as soon as one pattern matches a line, no further patterns |
| is used), no further patterns are considered. However, if --colour (or | are considered. However, if --colour (or --color) is used to colour the |
| --color) is used to colour the matching substrings, or if --only-match- | matching substrings, or if --only-matching, --file-offsets, or --line- |
| ing, --file-offsets, or --line-offsets is used to output only the part | offsets is used to output only the part of the line that matched |
| of the line that matched (either shown literally, or as an offset), | (either shown literally, or as an offset), scanning resumes immediately |
| scanning resumes immediately following the match, so that further | following the match, so that further matches on the same line can be |
| matches on the same line can be found. If there are multiple patterns, | found. If there are multiple patterns, they are all tried on the |
| they are all tried on the remainder of the line, but patterns that fol- | remainder of the line, but patterns that follow the one that matched |
| low the one that matched are not tried on the earlier part of the line. | are not tried on the earlier part of the line. |
| |
|
| This is the same behaviour as GNU grep, but it does mean that the order | This behaviour means that the order in which multiple patterns are |
| in which multiple patterns are specified can affect the output when one | specified can affect the output when one of the above options is used. |
| of the above options is used. | This is no longer the same behaviour as GNU grep, which now manages to |
| | display earlier matches for later patterns (as long as there is no |
| | overlap). |
| |
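The scanning rule described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: Python's `re` module stands in for PCRE, and the function name `only_matching` is hypothetical, not something pcregrep provides.

```python
import re

def only_matching(patterns, line):
    """Model how matched substrings are collected when -o or --colour is used."""
    compiled = [re.compile(p) for p in patterns]
    found = []
    pos = 0
    while pos <= len(line):
        for rx in compiled:                  # patterns are tried in the order given
            m = rx.search(line, pos)
            if m and m.end() > m.start():    # empty-string matches are not recognized
                found.append(m.group(0))
                pos = m.end()                # resume immediately after the match
                break
        else:
            break                            # no pattern matched the remainder
    return found

# The order of the patterns changes what is reported for the same line.
print(only_matching(["X", "Y"], "Y1 X2 Y3"))
print(only_matching(["Y", "X"], "Y1 X2 Y3"))
```

With X given first, the Y at the start of the line is never reported, because Y is only tried on the text that follows the X match.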
|
| Patterns that can match an empty string are accepted, but empty string |
Patterns that can match an empty string are accepted, but empty string |
| matches are never recognized. An example is the pattern |
matches are never recognized. An example is the pattern |
|
Line 98 SUPPORT FOR COMPRESSED FILES
|
Line 100 SUPPORT FOR COMPRESSED FILES
|
| so treated. |
so treated. |
| |
|
| |
|
| |
BINARY FILES |
| |
|
| |
By default, a file that contains a binary zero byte within the first |
| |
1024 bytes is identified as a binary file, and is processed specially. |
| |
(GNU grep also identifies binary files in this manner.) See the |
| |
--binary-files option for a means of changing the way binary files are |
| |
handled. |
| |
|
| |
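The binary-file test described above amounts to a check of the first 1024 bytes. A minimal sketch follows; the function name `looks_binary` is hypothetical and the code is an illustration, not pcregrep's own implementation.

```python
def looks_binary(data: bytes) -> bool:
    """Return True if a zero byte occurs within the first 1024 bytes."""
    return b"\x00" in data[:1024]

print(looks_binary(b"plain text\n"))
print(looks_binary(b"ELF\x00\x01\x02"))
```

A zero byte that first appears after byte 1024 does not make the file binary under this rule.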
|
| OPTIONS |
OPTIONS |
| |
|
| The order in which some of the options appear can affect the output. |
The order in which some of the options appear can affect the output. |
| For example, both the -h and -l options affect the printing of file |
For example, both the -h and -l options affect the printing of file |
| names. Whichever comes later in the command line will be the one that |
names. Whichever comes later in the command line will be the one that |
| takes effect. Numerical values for options may be followed by K or M, | takes effect. Similarly, except where noted below, if an option is |
| to signify multiplication by 1024 or 1024*1024 respectively. | given twice, the later setting is used. Numerical values for options |
| | may be followed by K or M, to signify multiplication by 1024 or |
| | 1024*1024 respectively. |
| |
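The K and M suffixes can be illustrated with a small parser. This is a sketch only: the name `parse_number` is hypothetical, and accepting only upper-case suffixes is an assumption made here because the text mentions K and M.

```python
def parse_number(text: str) -> int:
    """Convert an option value such as "8K" or "2M" to an integer."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)

print(parse_number("8K"))
print(parse_number("2M"))
```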
|
| -- This terminates the list of options. It is useful if the next |
-- This terminates the list of options. It is useful if the next |
| item on the command line starts with a hyphen but is not an |
item on the command line starts with a hyphen but is not an |
|
Line 121 OPTIONS
|
Line 134 OPTIONS
|
| pcregrep guarantees to have up to 8K of following text avail- |
pcregrep guarantees to have up to 8K of following text avail- |
| able for context output. |
able for context output. |
| |
|
| |
-a, --text |
| |
Treat binary files as text. This is equivalent to --binary- |
| |
files=text. |
| |
|
| -B number, --before-context=number |
-B number, --before-context=number |
| Output number lines of context before each matching line. If | Output number lines of context before each matching line. If |
| filenames and/or line numbers are being output, a hyphen sep- |
filenames and/or line numbers are being output, a hyphen sep- |
| arator is used instead of a colon for the context lines. A | arator is used instead of a colon for the context lines. A |
| line containing "--" is output between each group of lines, | line containing "--" is output between each group of lines, |
| unless they are in fact contiguous in the input file. The | unless they are in fact contiguous in the input file. The |
| value of number is expected to be relatively small. However, | value of number is expected to be relatively small. However, |
| pcregrep guarantees to have up to 8K of preceding text avail- |
pcregrep guarantees to have up to 8K of preceding text avail- |
| able for context output. |
able for context output. |
| |
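The output conventions for -B (hyphen separator on context lines, "--" between non-contiguous groups) can be sketched as follows. The sketch assumes line numbers are being output, uses Python's `re` in place of PCRE, and the function name `before_context` is hypothetical.

```python
import re

def before_context(lines, pattern, before):
    """Return output lines for a search with `before` lines of leading context."""
    rx = re.compile(pattern)
    out = []
    last_printed = -1                        # index of the last line already output
    for i, line in enumerate(lines):
        if not rx.search(line):
            continue
        start = max(i - before, last_printed + 1)
        if out and start > last_printed + 1:
            out.append("--")                 # groups are not contiguous in the input
        for j in range(start, i):
            out.append(f"{j + 1}-{lines[j]}")    # context line: hyphen separator
        out.append(f"{i + 1}:{line}")            # matching line: colon separator
        last_printed = i
    return out

for row in before_context(["a", "hit1", "b", "c", "d", "hit2"], "hit", 1):
    print(row)
```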
|
| |
--binary-files=word |
| |
Specify how binary files are to be processed. If the word is |
| |
"binary" (the default), pattern matching is performed on |
| |
binary files, but the only output is "Binary file <name> |
| |
matches" when a match succeeds. If the word is "text", which |
| |
is equivalent to the -a or --text option, binary files are |
| |
processed in the same way as any other file. In this case, |
| |
when a match succeeds, the output may be binary garbage, |
| |
which can have nasty effects if sent to a terminal. If the |
| |
word is "without-match", which is equivalent to the -I |
| |
option, binary files are not processed at all; they are |
| |
assumed not to be of interest. |
| |
|
| --buffer-size=number |
--buffer-size=number |
| Set the parameter that controls how much memory is used for | Set the parameter that controls how much memory is used for |
| buffering files that are being scanned. |
buffering files that are being scanned. |
| |
|
| -C number, --context=number |
-C number, --context=number |
| Output number lines of context both before and after each | Output number lines of context both before and after each |
| matching line. This is equivalent to setting both -A and -B | matching line. This is equivalent to setting both -A and -B |
| to the same value. |
to the same value. |
| |
|
| -c, --count |
-c, --count |
| Do not output individual lines from the files that are being | Do not output individual lines from the files that are being |
| scanned; instead output the number of lines that would other- |
scanned; instead output the number of lines that would other- |
| wise have been shown. If no lines are selected, the number | wise have been shown. If no lines are selected, the number |
| zero is output. If several files are being scanned, a | zero is output. If several files are being scanned, a |
| count is output for each of them. However, if the --files- | count is output for each of them. However, if the --files- |
| with-matches option is also used, only those files whose | with-matches option is also used, only those files whose |
| counts are greater than zero are listed. When -c is used, the |
counts are greater than zero are listed. When -c is used, the |
| -A, -B, and -C options are ignored. |
-A, -B, and -C options are ignored. |
| |
|
| --colour, --color |
--colour, --color |
| If this option is given without any data, it is equivalent to |
If this option is given without any data, it is equivalent to |
| "--colour=auto". If data is required, it must be given in | "--colour=auto". If data is required, it must be given in |
| the same shell item, separated by an equals sign. |
the same shell item, separated by an equals sign. |
| |
|
| --colour=value, --color=value |
--colour=value, --color=value |
| This option specifies under what circumstances the parts of a |
This option specifies under what circumstances the parts of a |
| line that matched a pattern should be coloured in the output. |
line that matched a pattern should be coloured in the output. |
| By default, the output is not coloured. The value (which is | By default, the output is not coloured. The value (which is |
| optional, see above) may be "never", "always", or "auto". In | optional, see above) may be "never", "always", or "auto". In |
| the latter case, colouring happens only if the standard out- | the latter case, colouring happens only if the standard out- |
| put is connected to a terminal. More resources are used when | put is connected to a terminal. More resources are used when |
| colouring is enabled, because pcregrep has to search for all | colouring is enabled, because pcregrep has to search for all |
| possible matches in a line, not just one, in order to colour | possible matches in a line, not just one, in order to colour |
| them all. |
them all. |
| |
|
| The colour that is used can be specified by setting the envi- |
The colour that is used can be specified by setting the envi- |
| ronment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value |
ronment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value |
| of this variable should be a string of two numbers, separated |
of this variable should be a string of two numbers, separated |
| by a semicolon. They are copied directly into the control | by a semicolon. They are copied directly into the control |
| string for setting colour on a terminal, so it is your | string for setting colour on a terminal, so it is your |
| responsibility to ensure that they make sense. If neither of | responsibility to ensure that they make sense. If neither of |
| the environment variables is set, the default is "1;31", | the environment variables is set, the default is "1;31", |
| which gives red. |
which gives red. |
| |
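How the colour value ends up in a terminal control string can be sketched as below. Several details are assumptions made for illustration: the function name `colourise`, the order in which the two environment variables are consulted, and the reset sequence `ESC[0m` appended after the match.

```python
import os

def colourise(text, environ=os.environ):
    """Wrap text in an SGR escape sequence built from the colour setting."""
    colour = (environ.get("PCREGREP_COLOUR")
              or environ.get("PCREGREP_COLOR")
              or "1;31")                     # default from the text: red
    # The value is copied verbatim, so a nonsensical value yields a
    # nonsensical control string.
    return f"\x1b[{colour}m{text}\x1b[0m"

print(repr(colourise("match", {})))
```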
|
| -D action, --devices=action |
-D action, --devices=action |
| If an input path is not a regular file or a directory, | If an input path is not a regular file or a directory, |
| "action" specifies how it is to be processed. Valid values | "action" specifies how it is to be processed. Valid values |
| are "read" (the default) or "skip" (silently skip the path). |
are "read" (the default) or "skip" (silently skip the path). |
| |
|
| -d action, --directories=action |
-d action, --directories=action |
| If an input path is a directory, "action" specifies how it is |
If an input path is a directory, "action" specifies how it is |
| to be processed. Valid values are "read" (the default), | to be processed. Valid values are "read" (the default in |
| | non-Windows environments, for compatibility with GNU grep), |
| "recurse" (equivalent to the -r option), or "skip" (silently |
"recurse" (equivalent to the -r option), or "skip" (silently |
| skip the path). In the default case, directories are read as | skip the path, the default in Windows environments). In the |
| if they were ordinary files. In some operating systems the | "read" case, directories are read as if they were ordinary |
| effect of reading a directory like this is an immediate end- | files. In some operating systems the effect of reading a |
| of-file. | directory like this is an immediate end-of-file; in others it |
| | may provoke an error. |
| |
|
| -e pattern, --regex=pattern, --regexp=pattern |
-e pattern, --regex=pattern, --regexp=pattern |
| Specify a pattern to be matched. This option can be used mul- |
Specify a pattern to be matched. This option can be used mul- |
| tiple times in order to specify several patterns. It can also |
tiple times in order to specify several patterns. It can also |
| be used as a way of specifying a single pattern that starts | be used as a way of specifying a single pattern that starts |
| with a hyphen. When -e is used, no argument pattern is taken | with a hyphen. When -e is used, no argument pattern is taken |
| from the command line; all arguments are treated as file | from the command line; all arguments are treated as file |
| names. There is an overall maximum of 100 patterns. They are | names. There is no limit to the number of patterns. They are |
| applied to each line in the order in which they are defined | applied to each line in the order in which they are defined |
| until one matches (or fails to match if -v is used). If -f is | until one matches. |
| used with -e, the command line patterns are matched first, | |
| followed by the patterns from the file, independent of the | |
| order in which these options are specified. Note that multi- | |
| ple use of -e is not the same as a single pattern with alter- | |
| natives. For example, X|Y finds the first character in a line | |
| that is X or Y, whereas if the two patterns are given sepa- | |
| rately, pcregrep finds X if it is present, even if it follows | |
| Y in the line. It finds Y only if there is no X in the line. | |
| This really matters only if you are using -o to show the | |
| part(s) of the line that matched. | |
| |
|
| |
If -f is used with -e, the command line patterns are matched |
| |
first, followed by the patterns from the file(s), independent |
| |
of the order in which these options are specified. Note that |
| |
multiple use of -e is not the same as a single pattern with |
| |
alternatives. For example, X|Y finds the first character in a |
| |
line that is X or Y, whereas if the two patterns are given |
| |
separately, with X first, pcregrep finds X if it is present, |
| |
even if it follows Y in the line. It finds Y only if there is |
| |
no X in the line. This matters only if you are using -o or |
| |
--colo(u)r to show the part(s) of the line that matched. |
| |
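The difference between X|Y and two separate -e patterns can be seen in a short model. Python's `re` stands in for PCRE here, and `first_match` is a hypothetical helper, not part of pcregrep.

```python
import re

def first_match(patterns, line):
    """Try each pattern in order; report the first pattern that matches."""
    for p in patterns:
        m = re.search(p, line)
        if m:
            return m.start(), m.group(0)
    return None

print(first_match(["X|Y"], "aYbX"))      # single pattern with alternatives
print(first_match(["X", "Y"], "aYbX"))   # two patterns, X given first
```

The alternation reports the earliest X or Y in the line, whereas the separate patterns report X whenever the line contains one.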
|
| --exclude=pattern |
--exclude=pattern |
| When pcregrep is searching the files in a directory as a con- | Files (but not directories) whose names match the pattern are |
| sequence of the -r (recursive search) option, any regular | skipped without being processed. This applies to all files, |
| files whose names match the pattern are excluded. Subdirecto- | whether listed on the command line, obtained from --file- |
| ries are not excluded by this option; they are searched | list, or by scanning a directory. The pattern is a PCRE regu- |
| recursively, subject to the --exclude-dir and --include-dir | lar expression, and is matched against the final component of |
| options. The pattern is a PCRE regular expression, and is | the file name, not the entire path. The -F, -w, and -x |
| matched against the final component of the file name (not the | options do not apply to this pattern. The option may be given |
| entire path). If a file name matches both --include and | any number of times in order to specify multiple patterns. If |
| --exclude, it is excluded. There is no short form for this | a file name matches both an --include and an --exclude pat- |
| option. | tern, it is excluded. There is no short form for this option. |
| |
|
| |
--exclude-from=filename |
| |
Treat each non-empty line of the file as the data for an |
| |
--exclude option. What constitutes a newline when reading the |
| |
file is the operating system's default. The --newline option |
| |
has no effect on this option. This option may be given more |
| |
than once in order to specify a number of files to read. |
| |
|
| --exclude-dir=pattern |
--exclude-dir=pattern |
| When pcregrep is searching the contents of a directory as a | Directories whose names match the pattern are skipped without |
| consequence of the -r (recursive search) option, any subdi- | being processed, whatever the setting of the --recursive |
| rectories whose names match the pattern are excluded. (Note | option. This applies to all directories, whether listed on |
| that the --exclude option does not affect subdirectories.) | the command line, obtained from --file-list, or by scanning a |
| The pattern is a PCRE regular expression, and is matched | parent directory. The pattern is a PCRE regular expression, |
| against the final component of the name (not the entire | and is matched against the final component of the directory |
| path). If a subdirectory name matches both --include-dir and | name, not the entire path. The -F, -w, and -x options do not |
| --exclude-dir, it is excluded. There is no short form for | apply to this pattern. The option may be given any number of |
| this option. | times in order to specify more than one pattern. If a direc- |
| | tory matches both --include-dir and --exclude-dir, it is |
| | excluded. There is no short form for this option. |
| |
|
| -F, --fixed-strings |
-F, --fixed-strings |
| Interpret each pattern as a list of fixed strings, separated | Interpret each data-matching pattern as a list of fixed |
| by newlines, instead of as a regular expression. The -w | strings, separated by newlines, instead of as a regular |
| (match as a word) and -x (match whole line) options can be | expression. What constitutes a newline for this purpose is |
| used with -F. They apply to each of the fixed strings. A line | controlled by the --newline option. The -w (match as a word) |
| is selected if any of the fixed strings are found in it (sub- | and -x (match whole line) options can be used with -F. They |
| ject to -w or -x, if present). | apply to each of the fixed strings. A line is selected if any |
| | of the fixed strings are found in it (subject to -w or -x, if |
| | present). This option applies only to the patterns that are |
| | matched against the contents of files; it does not apply to |
| | patterns specified by any of the --include or --exclude |
| | options. |
| |
|
| -f filename, --file=filename |
-f filename, --file=filename |
| Read a number of patterns from the file, one per line, and | Read patterns from the file, one per line, and match them |
| match them against each line of input. A data line is output | against each line of input. What constitutes a newline when |
| if any of the patterns match it. The filename can be given as | reading the file is the operating system's default. The |
| "-" to refer to the standard input. When -f is used, patterns | --newline option has no effect on this option. Trailing white |
| specified on the command line using -e may also be present; | space is removed from each line, and blank lines are ignored. |
| they are tested before the file's patterns. However, no other | An empty file contains no patterns and therefore matches |
| pattern is taken from the command line; all arguments are | nothing. See also the comments about multiple patterns versus |
| treated as file names. There is an overall maximum of 100 | a single pattern with alternatives in the description of -e |
| patterns. Trailing white space is removed from each line, and | above. |
| blank lines are ignored. An empty file contains no patterns | |
| and therefore matches nothing. See also the comments about | |
| multiple patterns versus a single pattern with alternatives | |
| in the description of -e above. | |
| |
|
| |
If this option is given more than once, all the specified |
| |
files are read. A data line is output if any of the patterns |
| |
match it. A filename can be given as "-" to refer to the |
| |
standard input. When -f is used, patterns specified on the |
| |
command line using -e may also be present; they are tested |
| |
before the file's patterns. However, no other pattern is |
| |
taken from the command line; all arguments are treated as the |
| |
names of paths to be searched. |
| |
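The clean-up applied to a pattern file (trailing white space removed, blank lines ignored) can be sketched as follows. The name `read_patterns` is hypothetical, and treating a line that holds only white space as blank is an assumption of this sketch.

```python
def read_patterns(lines):
    """Return the patterns found in an iterable of lines from a pattern file."""
    patterns = []
    for line in lines:
        stripped = line.rstrip()     # drops trailing white space, including the newline
        if stripped:                 # blank lines contribute no pattern
            patterns.append(stripped)
    return patterns

print(read_patterns(["foo  \n", "\n", "  bar\t\n"]))
```

Leading white space is kept, so it remains part of the pattern.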
|
| |
--file-list=filename |
| |
Read a list of files and/or directories that are to be |
| |
scanned from the given file, one per line. Trailing white |
| |
space is removed from each line, and blank lines are ignored. |
| |
These paths are processed before any that are listed on the |
| |
command line. The filename can be given as "-" to refer to |
| |
the standard input. If --file and --file-list are both spec- |
| |
ified as "-", patterns are read first. This is useful only |
| |
when the standard input is a terminal, from which further |
| |
lines (the list of files) can be read after an end-of-file |
| |
indication. If this option is given more than once, all the |
| |
specified files are read. |
| |
|
| --file-offsets |
--file-offsets |
| Instead of showing lines or parts of lines that match, show |
Instead of showing lines or parts of lines that match, show |
| each match as an offset from the start of the file and a |
each match as an offset from the start of the file and a |
|
Line 280 OPTIONS
|
Line 345 OPTIONS
|
| line number is also being output, it follows the file name. |
line number is also being output, it follows the file name. |
| |
|
| --help Output a help message, giving brief details of the command |
--help Output a help message, giving brief details of the command |
| options and file type support, and then exit. | options and file type support, and then exit. Anything else |
| | on the command line is ignored. |
| |
|
| |
-I Treat binary files as never matching. This is equivalent to |
| |
--binary-files=without-match. |
| |
|
| -i, --ignore-case |
-i, --ignore-case |
| Ignore upper/lower case distinctions during comparisons. |
Ignore upper/lower case distinctions during comparisons. |
| |
|
| --include=pattern |
--include=pattern |
| When pcregrep is searching the files in a directory as a con- | If any --include patterns are specified, the only files that |
| sequence of the -r (recursive search) option, only those reg- | are processed are those that match one of the patterns (and |
| ular files whose names match the pattern are included. Subdi- | do not match an --exclude pattern). This option does not |
| rectories are always included and searched recursively, sub- | affect directories, but it applies to all files, whether |
| ject to the --include-dir and --exclude-dir options. The pat- | listed on the command line, obtained from --file-list, or by |
| tern is a PCRE regular expression, and is matched against the | scanning a directory. The pattern is a PCRE regular expres- |
| final component of the file name (not the entire path). If a | sion, and is matched against the final component of the file |
| file name matches both --include and --exclude, it is | name, not the entire path. The -F, -w, and -x options do not |
| excluded. There is no short form for this option. | apply to this pattern. The option may be given any number of |
| | times. If a file name matches both an --include and an |
| | --exclude pattern, it is excluded. There is no short form |
| | for this option. |
| |
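The interaction of --include and --exclude in the newer text can be modelled as a small filter. This is a sketch under assumptions: Python's `re.search` stands in for an unanchored PCRE match, POSIX-style paths are assumed, and `file_is_processed` is a hypothetical name.

```python
import os
import re

def file_is_processed(path, include=(), exclude=()):
    """Decide whether a file is scanned, given include and exclude patterns."""
    name = os.path.basename(path)            # only the final component is matched
    if any(re.search(p, name) for p in exclude):
        return False                         # an --exclude match always wins
    if include:
        return any(re.search(p, name) for p in include)
    return True                              # no --include patterns: everything passes

print(file_is_processed("src/main.c", include=[r"\.c$"]))
print(file_is_processed("src/test_main.c", include=[r"\.c$"], exclude=[r"^test_"]))
```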
|
| |
--include-from=filename |
| |
Treat each non-empty line of the file as the data for an |
| |
--include option. What constitutes a newline for this purpose |
| |
is the operating system's default. The --newline option has |
| |
no effect on this option. This option may be given any number |
| |
of times; all the files are read. |
| |
|
| --include-dir=pattern |
--include-dir=pattern |
| When pcregrep is searching the contents of a directory as a | If any --include-dir patterns are specified, the only direc- |
| consequence of the -r (recursive search) option, only those | tories that are processed are those that match one of the |
| subdirectories whose names match the pattern are included. | patterns (and do not match an --exclude-dir pattern). This |
| (Note that the --include option does not affect subdirecto- | applies to all directories, whether listed on the command |
| ries.) The pattern is a PCRE regular expression, and is | line, obtained from --file-list, or by scanning a parent |
| matched against the final component of the name (not the | directory. The pattern is a PCRE regular expression, and is |
| entire path). If a subdirectory name matches both --include- | matched against the final component of the directory name, |
| dir and --exclude-dir, it is excluded. There is no short form | not the entire path. The -F, -w, and -x options do not apply |
| for this option. | to this pattern. The option may be given any number of times. |
| | If a directory matches both --include-dir and --exclude-dir, |
| | it is excluded. There is no short form for this option. |
| |
|
| -L, --files-without-match |
-L, --files-without-match |
| Instead of outputting lines from the files, just output the | Instead of outputting lines from the files, just output the |
| names of the files that do not contain any lines that would | names of the files that do not contain any lines that would |
| have been output. Each file name is output once, on a sepa- | have been output. Each file name is output once, on a sepa- |
| rate line. |
rate line. |
| |
|
| -l, --files-with-matches |
-l, --files-with-matches |
| Instead of outputting lines from the files, just output the | Instead of outputting lines from the files, just output the |
| names of the files containing lines that would have been out- |
names of the files containing lines that would have been out- |
| put. Each file name is output once, on a separate line. | put. Each file name is output once, on a separate line. |
| Searching normally stops as soon as a matching line is found | Searching normally stops as soon as a matching line is found |
| in a file. However, if the -c (count) option is also used, | in a file. However, if the -c (count) option is also used, |
| matching continues in order to obtain the correct count, and | matching continues in order to obtain the correct count, and |
| those files that have at least one match are listed along | those files that have at least one match are listed along |
| with their counts. Using this option with -c is a way of sup- |
with their counts. Using this option with -c is a way of sup- |
| pressing the listing of files with no matches. |
pressing the listing of files with no matches. |
| |
|
|
Line 330 OPTIONS
|
Line 411 OPTIONS
|
| input)" is used. There is no short form for this option. |
input)" is used. There is no short form for this option. |
| |
|
| --line-buffered |
--line-buffered |
| When this option is given, input is read and processed line | When this option is given, input is read and processed line |
| by line, and the output is flushed after each write. By | by line, and the output is flushed after each write. By |
| default, input is read in large chunks, unless pcregrep can | default, input is read in large chunks, unless pcregrep can |
| determine that it is reading from a terminal (which is cur- | determine that it is reading from a terminal (which is cur- |
| rently possible only in Unix environments). Output to termi- | rently possible only in Unix-like environments). Output to |
| nal is normally automatically flushed by the operating sys- | terminal is normally automatically flushed by the operating |
| tem. This option can be useful when the input or output is | system. This option can be useful when the input or output is |
| attached to a pipe and you do not want pcregrep to buffer up | attached to a pipe and you do not want pcregrep to buffer up |
| large amounts of data. However, its use will affect perfor- | large amounts of data. However, its use will affect perfor- |
| mance, and the -M (multiline) option ceases to work. |
mance, and the -M (multiline) option ceases to work. |
| |
|
| --line-offsets |
--line-offsets |
| Instead of showing lines or parts of lines that match, show | Instead of showing lines or parts of lines that match, show |
| each match as a line number, the offset from the start of the |
each match as a line number, the offset from the start of the |
| line, and a length. The line number is terminated by a colon | line, and a length. The line number is terminated by a colon |
| (as usual; see the -n option), and the offset and length are | (as usual; see the -n option), and the offset and length are |
| separated by a comma. In this mode, no context is shown. | separated by a comma. In this mode, no context is shown. |
| That is, the -A, -B, and -C options are ignored. If there is | That is, the -A, -B, and -C options are ignored. If there is |
| more than one match in a line, each of them is shown sepa- | more than one match in a line, each of them is shown sepa- |
| rately. This option is mutually exclusive with --file-offsets |
rately. This option is mutually exclusive with --file-offsets |
| and --only-matching. |
and --only-matching. |
| |
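The --line-offsets output format (line number, colon, offset, comma, length) can be illustrated with a short model. It assumes zero-based offsets counted in characters, uses Python's `re` rather than PCRE, and the function name `line_offsets` is hypothetical.

```python
import re

def line_offsets(pattern, lines):
    """Return one "line:offset,length" entry for every match in every line."""
    rx = re.compile(pattern)
    out = []
    for number, line in enumerate(lines, start=1):
        for m in rx.finditer(line):
            if m.end() > m.start():          # skip empty-string matches
                out.append(f"{number}:{m.start()},{m.end() - m.start()}")
    return out

for entry in line_offsets("ab", ["xxabyyab", "none", "ab"]):
    print(entry)
```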
|
| --locale=locale-name |
--locale=locale-name |
| This option specifies a locale to be used for pattern match- | This option specifies a locale to be used for pattern match- |
| ing. It overrides the value in the LC_ALL or LC_CTYPE envi- | ing. It overrides the value in the LC_ALL or LC_CTYPE envi- |
| ronment variables. If no locale is specified, the PCRE | ronment variables. If no locale is specified, the PCRE |
| library's default (usually the "C" locale) is used. There is | library's default (usually the "C" locale) is used. There is |
| no short form for this option. |
no short form for this option. |
| |
|
| --match-limit=number |
--match-limit=number |
| Processing some regular expression patterns can require a | Processing some regular expression patterns can require a |
| very large amount of memory, leading in some cases to a pro- | very large amount of memory, leading in some cases to a pro- |
| gram crash if not enough is available. Other patterns may | gram crash if not enough is available. Other patterns may |
| take a very long time to search for all possible matching | take a very long time to search for all possible matching |
| strings. The pcre_exec() function that is called by pcregrep | strings. The pcre_exec() function that is called by pcregrep |
| to do the matching has two parameters that can limit the | to do the matching has two parameters that can limit the |
| resources that it uses. |
resources that it uses. |
| |
|
| The --match-limit option provides a means of limiting | The --match-limit option provides a means of limiting |
| resource usage when processing patterns that are not going to |
resource usage when processing patterns that are not going to |
| match, but which have a very large number of possibilities in |
match, but which have a very large number of possibilities in |
| their search trees. The classic example is a pattern that | their search trees. The classic example is a pattern that |
| uses nested unlimited repeats. Internally, PCRE uses a func- | uses nested unlimited repeats. Internally, PCRE uses a func- |
| tion called match() which it calls repeatedly (sometimes | tion called match() which it calls repeatedly (sometimes |
| recursively). The limit set by --match-limit is imposed on | recursively). The limit set by --match-limit is imposed on |
| the number of times this function is called during a match, | the number of times this function is called during a match, |
| which has the effect of limiting the amount of backtracking | which has the effect of limiting the amount of backtracking |
| that can take place. |
that can take place. |
| |
|
| The --recursion-limit option is similar to --match-limit, but |
The --recursion-limit option is similar to --match-limit, but |
| instead of limiting the total number of times that match() is |
instead of limiting the total number of times that match() is |
| called, it limits the depth of recursive calls, which in turn |
called, it limits the depth of recursive calls, which in turn |
| limits the amount of memory that can be used. The recursion | limits the amount of memory that can be used. The recursion |
| depth is a smaller number than the total number of calls, | depth is a smaller number than the total number of calls, |
| because not all calls to match() are recursive. This limit is |
because not all calls to match() are recursive. This limit is |
| of use only if it is set smaller than --match-limit. |
of use only if it is set smaller than --match-limit. |
| |
|
| There are no short forms for these options. The default set- | There are no short forms for these options. The default set- |
| tings are specified when the PCRE library is compiled, with | tings are specified when the PCRE library is compiled, with |
| the default, unless changed at build time, being 10 million. |
the default, unless changed at build time, being 10 million. |
| |
|
| -M, --multiline |
-M, --multiline |
| Allow patterns to match more than one line. When this option | Allow patterns to match more than one line. When this option |
| is given, patterns may usefully contain literal newline char- |
is given, patterns may usefully contain literal newline char- |
| acters and internal occurrences of ^ and $ characters. The | acters and internal occurrences of ^ and $ characters. The |
| output for a successful match may consist of more than one | output for a successful match may consist of more than one |
| line, the last of which is the one in which the match ended. | line, the last of which is the one in which the match ended. |
| If the matched string ends with a newline sequence the output |
If the matched string ends with a newline sequence the output |
| ends at the end of that line. |
ends at the end of that line. |
| |
|
| When this option is set, the PCRE library is called in "mul- | When this option is set, the PCRE library is called in "mul- |
| tiline" mode. There is a limit to the number of lines that | tiline" mode. There is a limit to the number of lines that |
| can be matched, imposed by the way that pcregrep buffers the | can be matched, imposed by the way that pcregrep buffers the |
| input file as it scans it. However, pcregrep ensures that at | input file as it scans it. However, pcregrep ensures that at |
| least 8K characters or the rest of the document (whichever is |
least 8K characters or the rest of the document (whichever is |
| the shorter) are available for forward matching, and simi- | the shorter) are available for forward matching, and simi- |
| larly the previous 8K characters (or all the previous charac- |
larly the previous 8K characters (or all the previous charac- |
| ters, if fewer than 8K) are guaranteed to be available for | ters, if fewer than 8K) are guaranteed to be available for |
| lookbehind assertions. This option does not work when input | lookbehind assertions. This option does not work when input |
| is read line by line (see --line-buffered). |
is read line by line (see --line-buffered). |
| |
|
| -N newline-type, --newline=newline-type |
-N newline-type, --newline=newline-type |
| The PCRE library supports five different conventions for | The PCRE library supports five different conventions for |
| indicating the ends of lines. They are the single-character | indicating the ends of lines. They are the single-character |
| sequences CR (carriage return) and LF (linefeed), the two- | sequences CR (carriage return) and LF (linefeed), the two- |
| character sequence CRLF, an "anycrlf" convention, which rec- | character sequence CRLF, an "anycrlf" convention, which rec- |
| ognizes any of the preceding three types, and an "any" con- | ognizes any of the preceding three types, and an "any" con- |
| vention, in which any Unicode line ending sequence is assumed |
vention, in which any Unicode line ending sequence is assumed |
| to end a line. The Unicode sequences are the three just men- | to end a line. The Unicode sequences are the three just men- |
| tioned, plus VT (vertical tab, U+000B), FF (form feed, | tioned, plus VT (vertical tab, U+000B), FF (form feed, |
| U+000C), NEL (next line, U+0085), LS (line separator, | U+000C), NEL (next line, U+0085), LS (line separator, |
| U+2028), and PS (paragraph separator, U+2029). |
U+2028), and PS (paragraph separator, U+2029). |
| |
|
| When the PCRE library is built, a default line-ending |
When the PCRE library is built, a default line-ending |
| sequence is specified. This is normally the standard | sequence is specified. This is normally the standard |
| sequence for the operating system. Unless otherwise specified |
sequence for the operating system. Unless otherwise specified |
| by this option, pcregrep uses the library's default. The | by this option, pcregrep uses the library's default. The |
| possible values for this option are CR, LF, CRLF, ANYCRLF, or |
possible values for this option are CR, LF, CRLF, ANYCRLF, or |
| ANY. This makes it possible to use pcregrep on files that | ANY. This makes it possible to use pcregrep to scan files |
| have come from other environments without having to modify | that have come from other environments without having to mod- |
| their line endings. If the data that is being scanned does | ify their line endings. If the data that is being scanned |
| not agree with the convention set by this option, pcregrep | does not agree with the convention set by this option, pcre- |
| may behave in strange ways. | grep may behave in strange ways. Note that this option does |
| | not apply to files specified by the -f, --exclude-from, or |
| | --include-from options, which are expected to use the operat- |
| | ing system's standard newline sequence. |
| |
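The five conventions described for -N can be modelled outside pcregrep. The following is a minimal Python sketch, not pcregrep's own code: the names NEWLINE_PATTERNS and split_lines are hypothetical, and the separators are taken from the list of sequences given above.

```python
import re

# One separator regex per value accepted by -N / --newline.
# CRLF comes first in the alternations so that it counts as a single ending.
NEWLINE_PATTERNS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "ANYCRLF": r"\r\n|\r|\n",
    # CR, LF, CRLF plus VT, FF, NEL, LS and PS
    "ANY": r"\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}

def split_lines(text, convention):
    """Split text into lines under the named convention (illustrative only)."""
    return re.split(NEWLINE_PATTERNS[convention.upper()], text)

sample = "one\rtwo\nthree\r\nfour"
for name in NEWLINE_PATTERNS:
    print(name, split_lines(sample, name))
```

Under LF or CRLF the sample is cut into fewer, longer "lines" than under ANYCRLF, which is the kind of mismatch behind the warning that pcregrep may behave in strange ways when the data does not agree with the chosen convention.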
|
| -n, --line-number |
-n, --line-number |
| Precede each output line by its line number in the file, fol- |
Precede each output line by its line number in the file, fol- |
|
Line 463 OPTIONS
|
Line 547 OPTIONS
|
| -onumber, --only-matching=number |
-onumber, --only-matching=number |
| Show only the part of the line that matched the capturing |
Show only the part of the line that matched the capturing |
| parentheses of the given number. Up to 32 capturing parenthe- |
parentheses of the given number. Up to 32 capturing parenthe- |
| ses are supported. Because these options can be given without | ses are supported, and -o0 is equivalent to -o without a num- |
| an argument (see above), if an argument is present, it must | ber. Because these options can be given without an argument |
| be given in the same shell item, for example, -o3 or --only- | (see above), if an argument is present, it must be given in |
| matching=2. The comments given for the non-argument case | the same shell item, for example, -o3 or --only-matching=2. |
| above also apply to this case. If the specified capturing | The comments given for the non-argument case above also apply |
| parentheses do not exist in the pattern, or were not set in | to this case. If the specified capturing parentheses do not |
| the match, nothing is output unless the file name or line | exist in the pattern, or were not set in the match, nothing |
| number is being printed. | is output unless the file name or line number is being |
| | printed. |
| |
|
| |
If this option is given multiple times, multiple substrings |
| |
are output, in the order the options are given. For example, |
| |
-o3 -o1 -o3 causes the substrings matched by capturing paren- |
| |
theses 3 and 1 and then 3 again to be output. By default, |
| |
there is no separator (but see the next option). |
| |
|
| |
--om-separator=text |
| |
Specify a separating string for multiple occurrences of -o. |
| |
The default is an empty string. Separating strings are never |
| |
coloured. |
| |
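The effect of repeating -o with different numbers, together with --om-separator, can be approximated as follows. This is a Python sketch that uses the re module rather than PCRE. The function only_matching is hypothetical, it looks only at the first match in a line, and skipping groups that are unset is an assumption rather than documented pcregrep behaviour.

```python
import re

def only_matching(pattern, line, groups, separator=""):
    """Emulate repeated -oN options for the first match in one line."""
    m = re.search(pattern, line)
    if m is None:
        return None  # the line does not match, so nothing would be printed
    parts = []
    for n in groups:
        # Assumption: a missing or unset group contributes nothing.
        if n <= m.re.groups and m.group(n) is not None:
            parts.append(m.group(n))
    return separator.join(parts)

line = "id 12-ab-cd"
pattern = r"(\d+)-(\w+)-(\w+)"
print(only_matching(pattern, line, [3, 1, 3]))        # like -o3 -o1 -o3
print(only_matching(pattern, line, [3, 1, 3], ":"))   # with --om-separator=:
```

The group numbers are emitted in the order given, not in pattern order, which mirrors the -o3 -o1 -o3 example above.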
|
| -q, --quiet |
-q, --quiet |
| Work quietly, that is, display nothing except error messages. |
Work quietly, that is, display nothing except error messages. |
| The exit status indicates whether or not any matches were | The exit status indicates whether or not any matches were |
| found. |
found. |
| |
|
| -r, --recursive |
-r, --recursive |
| If any given path is a directory, recursively scan the files | If any given path is a directory, recursively scan the files |
| it contains, taking note of any --include and --exclude set- | it contains, taking note of any --include and --exclude set- |
| tings. By default, a directory is read as a normal file; in | tings. By default, a directory is read as a normal file; in |
| some operating systems this gives an immediate end-of-file. | some operating systems this gives an immediate end-of-file. |
| This option is a shorthand for setting the -d option to | This option is a shorthand for setting the -d option to |
| "recurse". |
"recurse". |
| |
|
| --recursion-limit=number |
--recursion-limit=number |
| See --match-limit above. |
See --match-limit above. |
| |
|
| -s, --no-messages |
-s, --no-messages |
| Suppress error messages about non-existent or unreadable | Suppress error messages about non-existent or unreadable |
| files. Such files are quietly skipped. However, the return | files. Such files are quietly skipped. However, the return |
| code is still 2, even if matches were found in other files. |
code is still 2, even if matches were found in other files. |
| |
|
| -u, --utf-8 |
-u, --utf-8 |
| Operate in UTF-8 mode. This option is available only if PCRE | Operate in UTF-8 mode. This option is available only if PCRE |
| has been compiled with UTF-8 support. Both patterns and sub- | has been compiled with UTF-8 support. All patterns (including |
| ject lines must be valid strings of UTF-8 characters. | those for any --exclude and --include options) and all sub- |
| | ject lines that are scanned must be valid strings of UTF-8 |
| | characters. |
| |
|
| -V, --version |
-V, --version |
| Write the version numbers of pcregrep and the PCRE library | Write the version numbers of pcregrep and the PCRE library to |
| that is being used to the standard error stream. | the standard output and then exit. Anything else on the com- |
| | mand line is ignored. |
| |
|
| -v, --invert-match |
-v, --invert-match |
| Invert the sense of the match, so that lines which do not |
Invert the sense of the match, so that lines which do not |
|
Line 508 OPTIONS
|
Line 607 OPTIONS
|
| |
|
| -w, --word-regex, --word-regexp |
-w, --word-regex, --word-regexp |
| Force the patterns to match only whole words. This is equiva- |
Force the patterns to match only whole words. This is equiva- |
| lent to having \b at the start and end of the pattern. | lent to having \b at the start and end of the pattern. This |
| | option applies only to the patterns that are matched against |
| | the contents of files; it does not apply to patterns speci- |
| | fied by any of the --include or --exclude options. |
| |
|
| -x, --line-regex, --line-regexp |
-x, --line-regex, --line-regexp |
| Force the patterns to be anchored (each must start matching | Force the patterns to be anchored (each must start matching |
| at the beginning of a line) and in addition, require them to | at the beginning of a line) and in addition, require them to |
| match entire lines. This is equivalent to having ^ and $ | match entire lines. This is equivalent to having ^ and $ |
| characters at the start and end of each alternative branch in |
characters at the start and end of each alternative branch in |
| every pattern. | every pattern. This option applies only to the patterns that |
| | are matched against the contents of files; it does not apply |
| | to patterns specified by any of the --include or --exclude |
| | options. |
| |
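The equivalences stated for -w and -x can be tried out with any regular expression engine that supports \b, ^ and $. The sketch below uses Python's re module, not PCRE. The helper names are hypothetical, and wrapping the pattern in a non-capturing group (so that the anchors apply to every alternative branch) is an assumption about how the equivalence is best expressed.

```python
import re

def word_regex(pattern):
    """Approximate -w: the pattern may match only whole words."""
    return r"\b(?:" + pattern + r")\b"

def line_regex(pattern):
    """Approximate -x: the pattern must match an entire line."""
    return r"^(?:" + pattern + r")$"

print(bool(re.search(word_regex("cat"), "a cat sat")))      # whole word
print(bool(re.search(word_regex("cat"), "concatenate")))    # inside a word
print(bool(re.search(line_regex("cat|dog"), "dog")))        # entire line
print(bool(re.search(line_regex("cat|dog"), "hotdog")))     # part of a line
```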
|
| |
|
| ENVIRONMENT VARIABLES |
ENVIRONMENT VARIABLES |
|
Line 529 ENVIRONMENT VARIABLES
|
Line 634 ENVIRONMENT VARIABLES
|
| NEWLINES |
NEWLINES |
| |
|
| The -N (--newline) option allows pcregrep to scan files with different |
The -N (--newline) option allows pcregrep to scan files with different |
| newline conventions from the default. However, the setting of this | newline conventions from the default. Any parts of the input files that |
| option does not affect the way in which pcregrep writes information to | are written to the standard output are copied identically, with what- |
| the standard error and output streams. It uses the string "\n" in C | ever newline sequences they have in the input. However, the setting of |
| printf() calls to indicate newlines, relying on the C I/O library to | this option does not affect the interpretation of files specified by |
| convert this to an appropriate sequence if the output is sent to a | the -f, --exclude-from, or --include-from options, which are assumed to |
| file. | use the operating system's standard newline sequence, nor does it |
| | affect the way in which pcregrep writes informational messages to the |
| | standard error and output streams. For these it uses the string "\n" to |
| | indicate newlines, relying on the C I/O library to convert this to an |
| | appropriate sequence. |
| |
|
| |
|
| OPTIONS COMPATIBILITY |
OPTIONS COMPATIBILITY |
| |
|
| Many of the short and long forms of pcregrep's options are the same as |
Many of the short and long forms of pcregrep's options are the same as |
| in the GNU grep program (version 2.5.4). Any long option of the form | in the GNU grep program. Any long option of the form --xxx-regexp (GNU |
| --xxx-regexp (GNU terminology) is also available as --xxx-regex (PCRE | terminology) is also available as --xxx-regex (PCRE terminology). How- |
| terminology). However, the --file-offsets, --include-dir, --line-off- | ever, the --file-list, --file-offsets, --include-dir, --line-offsets, |
| sets, --locale, --match-limit, -M, --multiline, -N, --newline, --recur- | --locale, --match-limit, -M, --multiline, -N, --newline, --om-separa- |
| sion-limit, -u, and --utf-8 options are specific to pcregrep, as is the | tor, --recursion-limit, -u, and --utf-8 options are specific to pcre- |
| use of the --only-matching option with a capturing parentheses number. | grep, as is the use of the --only-matching option with a capturing |
| | parentheses number. |
| |
|
| Although most of the common options work the same way, a few are dif- | Although most of the common options work the same way, a few are dif- |
| ferent in pcregrep. For example, the --include option's argument is a | ferent in pcregrep. For example, the --include option's argument is a |
| glob for GNU grep, but a regular expression for pcregrep. If both the | glob for GNU grep, but a regular expression for pcregrep. If both the |
| -c and -l options are given, GNU grep lists only file names, without | -c and -l options are given, GNU grep lists only file names, without |
| counts, but pcregrep gives the counts. |
counts, but pcregrep gives the counts. |
| |
|
| |
|
| OPTIONS WITH DATA |
OPTIONS WITH DATA |
| |
|
| There are four different ways in which an option with data can be spec- |
There are four different ways in which an option with data can be spec- |
| ified. If a short form option is used, the data may follow immedi- | ified. If a short form option is used, the data may follow immedi- |
| ately, or (with one exception) in the next command line item. For exam- |
ately, or (with one exception) in the next command line item. For exam- |
| ple: |
ple: |
| |
|
| -f/some/file |
-f/some/file |
| -f /some/file |
-f /some/file |
| |
|
| The exception is the -o option, which may appear with or without data. | The exception is the -o option, which may appear with or without data. |
| Because of this, if data is present, it must follow immediately in the | Because of this, if data is present, it must follow immediately in the |
| same item, for example -o3. |
same item, for example -o3. |
| |
|
| If a long form option is used, the data may appear in the same command | If a long form option is used, the data may appear in the same command |
| line item, separated by an equals character, or (with two exceptions) | line item, separated by an equals character, or (with two exceptions) |
| it may appear in the next command line item. For example: |
it may appear in the next command line item. For example: |
| |
|
| --file=/some/file |
--file=/some/file |
| --file /some/file |
--file /some/file |
| |
|
| Note, however, that if you want to supply a file name beginning with ~ | Note, however, that if you want to supply a file name beginning with ~ |
| as data in a shell command, and have the shell expand ~ to a home | as data in a shell command, and have the shell expand ~ to a home |
| directory, you must separate the file name from the option, because the |
directory, you must separate the file name from the option, because the |
| shell does not treat ~ specially unless it is at the start of an item. |
shell does not treat ~ specially unless it is at the start of an item. |
| |
|
| The exceptions to the above are the --colour (or --color) and --only- | The exceptions to the above are the --colour (or --color) and --only- |
| matching options, for which the data is optional. If one of these | matching options, for which the data is optional. If one of these |
| options does have data, it must be given in the first form, using an | options does have data, it must be given in the first form, using an |
| equals character. Otherwise pcregrep will assume that it has no data. |
equals character. Otherwise pcregrep will assume that it has no data. |
| |
|
| |
|
| MATCHING ERRORS |
MATCHING ERRORS |
| |
|
| It is possible to supply a regular expression that takes a very long | It is possible to supply a regular expression that takes a very long |
| time to fail to match certain lines. Such patterns normally involve | time to fail to match certain lines. Such patterns normally involve |
| nested indefinite repeats, for example: (a+)*\d when matched against a | nested indefinite repeats, for example: (a+)*\d when matched against a |
| line of a's with no final digit. The PCRE matching function has a | line of a's with no final digit. The PCRE matching function has a |
| resource limit that causes it to abort in these circumstances. If this | resource limit that causes it to abort in these circumstances. If this |
| happens, pcregrep outputs an error message and the line that caused the |
happens, pcregrep outputs an error message and the line that caused the |
| problem to the standard error stream. If there are more than 20 such | problem to the standard error stream. If there are more than 20 such |
| errors, pcregrep gives up. |
errors, pcregrep gives up. |
| |
|
| The --match-limit option of pcregrep can be used to set the overall | The --match-limit option of pcregrep can be used to set the overall |
| resource limit; there is a second option called --recursion-limit that | resource limit; there is a second option called --recursion-limit that |
| sets a limit on the amount of memory (usually stack) that is used (see | sets a limit on the amount of memory (usually stack) that is used (see |
| the discussion of these options above). |
the discussion of these options above). |
| |
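To see why a call limit bounds the work done on such patterns, consider a toy backtracking matcher for (a+)*\d. The Python sketch below is not PCRE's match() function. The names toy_match and MatchLimitExceeded are hypothetical, and the matcher handles only this one pattern, anchored at the start of the subject. It counts its own calls and gives up once a limit is passed, which is the idea behind --match-limit.

```python
class MatchLimitExceeded(Exception):
    """Raised when the toy matcher has been called more often than allowed."""

def toy_match(subject, limit):
    # Toy model of matching (a+)*\d from the start of subject.
    calls = 0

    def match(pos):
        nonlocal calls
        calls += 1
        if calls > limit:
            raise MatchLimitExceeded(calls)
        # Try the final \d first.
        if pos < len(subject) and subject[pos].isdigit():
            return True
        # Otherwise take one more (a+) group: longest run first, then backtrack.
        end = pos
        while end < len(subject) and subject[end] == "a":
            end += 1
        for stop in range(end, pos, -1):
            if match(stop):
                return True
        return False

    return match(0), calls

print(toy_match("aaa5", 100))        # matches almost immediately
print(toy_match("a" * 10, 10**6))    # fails only after trying every split
try:
    toy_match("a" * 10, 1000)
except MatchLimitExceeded:
    print("gave up: call limit reached")
```

In this model the number of calls doubles with each extra "a" in a subject that has no final digit, so a fixed limit turns an exponential search into a prompt error.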
|
| |
|
| DIAGNOSTICS |
DIAGNOSTICS |
| |
|
| Exit status is 0 if any matches were found, 1 if no matches were found, |
Exit status is 0 if any matches were found, 1 if no matches were found, |
| and 2 for syntax errors, overlong lines, non-existent or inaccessible | and 2 for syntax errors, overlong lines, non-existent or inaccessible |
| files (even if matches were found in other files) or too many matching | files (even if matches were found in other files) or too many matching |
| errors. Using the -s option to suppress error messages about inaccessi- |
errors. Using the -s option to suppress error messages about inaccessi- |
| ble files does not affect the return code. |
ble files does not affect the return code. |
| |
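The rule for the exit status can be restated compactly. This is a Python sketch of the decision as described above, not pcregrep's source. The function name exit_status and its two boolean inputs are hypothetical.

```python
def exit_status(any_match, any_error):
    """Return the exit status implied by the DIAGNOSTICS rules."""
    if any_error:
        # Errors win, even if matches were found in other files.
        return 2
    return 0 if any_match else 1

print(exit_status(any_match=True, any_error=False))
print(exit_status(any_match=False, any_error=False))
print(exit_status(any_match=True, any_error=True))
```

Because -s only suppresses the messages, a missing file counts as an error in this model whether or not -s is given.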
|
| |
|
| SEE ALSO |
SEE ALSO |
| |
|
| pcrepattern(3), pcretest(1). | pcrepattern(3), pcresyntax(3), pcretest(1). |
| |
|
| |
|
| AUTHOR |
AUTHOR |
|
Line 626 AUTHOR
|
Line 736 AUTHOR
|
| |
|
| REVISION |
REVISION |
| |
|
| Last updated: 06 September 2011 | Last updated: 13 September 2012 |
| Copyright (c) 1997-2011 University of Cambridge. | Copyright (c) 1997-2012 University of Cambridge. |