--- embedaddon/pcre/doc/pcregrep.txt 2012/10/09 09:19:17 1.1.1.2 +++ embedaddon/pcre/doc/pcregrep.txt 2013/07/22 08:25:56 1.1.1.3 @@ -1,10 +1,10 @@ -PCREGREP(1) PCREGREP(1) +PCREGREP(1) General Commands Manual PCREGREP(1) + NAME pcregrep - a grep with Perl-compatible regular expressions. - SYNOPSIS pcregrep [options] [long options] [pattern] [path1 path2 ...] @@ -26,7 +26,7 @@ DESCRIPTION with slashes, as is common in Perl scripts), they are interpreted as part of the pattern. Quotes can of course be used to delimit patterns on the command line because they are interpreted by the shell, and - indeed they are required if a pattern contains white space or shell + indeed quotes are required if a pattern contains white space or shell metacharacters. The first argument that follows any option settings is treated as the @@ -56,25 +56,27 @@ DESCRIPTION times this size is used (to allow for buffering "before" and "after" lines). An error occurs if a line overflows the buffer. - Patterns are limited to 8K or BUFSIZ bytes, whichever is the greater. - BUFSIZ is defined in <stdio.h>. When there is more than one pattern - (specified by the use of -e and/or -f), each pattern is applied to each - line in the order in which they are defined, except that all the -e - patterns are tried before the -f patterns. + Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the + greater. BUFSIZ is defined in <stdio.h>. When there is more than one + pattern (specified by the use of -e and/or -f), each pattern is applied + to each line in the order in which they are defined, except that all + the -e patterns are tried before the -f patterns. - By default, as soon as one pattern matches (or fails to match when -v - is used), no further patterns are considered.
However, if --colour (or - --color) is used to colour the matching substrings, or if --only-match- - ing, --file-offsets, or --line-offsets is used to output only the part - of the line that matched (either shown literally, or as an offset), - scanning resumes immediately following the match, so that further - matches on the same line can be found. If there are multiple patterns, - they are all tried on the remainder of the line, but patterns that fol- - low the one that matched are not tried on the earlier part of the line. + By default, as soon as one pattern matches a line, no further patterns + are considered. However, if --colour (or --color) is used to colour the + matching substrings, or if --only-matching, --file-offsets, or --line- + offsets is used to output only the part of the line that matched + (either shown literally, or as an offset), scanning resumes immediately + following the match, so that further matches on the same line can be + found. If there are multiple patterns, they are all tried on the + remainder of the line, but patterns that follow the one that matched + are not tried on the earlier part of the line. - This is the same behaviour as GNU grep, but it does mean that the order - in which multiple patterns are specified can affect the output when one - of the above options is used. + This behaviour means that the order in which multiple patterns are + specified can affect the output when one of the above options is used. + This is no longer the same behaviour as GNU grep, which now manages to + display earlier matches for later patterns (as long as there is no + overlap). Patterns that can match an empty string are accepted, but empty string matches are never recognized. An example is the pattern @@ -112,8 +114,10 @@ OPTIONS The order in which some of the options appear can affect the output. For example, both the -h and -l options affect the printing of file names. 
Whichever comes later in the command line will be the one that - takes effect. Numerical values for options may be followed by K or M, - to signify multiplication by 1024 or 1024*1024 respectively. + takes effect. Similarly, except where noted below, if an option is + given twice, the later setting is used. Numerical values for options + may be followed by K or M, to signify multiplication by 1024 or + 1024*1024 respectively. -- This terminates the list of options. It is useful if the next item on the command line starts with a hyphen but is not an @@ -208,12 +212,14 @@ OPTIONS -d action, --directories=action If an input path is a directory, "action" specifies how it is - to be processed. Valid values are "read" (the default), - "recurse" (equivalent to the -r option), or "skip" (silently - skip the path). In the default case, directories are read as - if they were ordinary files. In some operating systems the - effect of reading a directory like this is an immediate end- - of-file. + to be processed. Valid values are "read" (the default in + non-Windows environments, for compatibility with GNU grep), + "recurse" (equivalent to the -r option), or "skip" (silently + skip the path, the default in Windows environments). In the + "read" case, directories are read as if they were ordinary + files. In some operating systems the effect of reading a + directory like this is an immediate end-of-file; in others it + may provoke an error. -e pattern, --regex=pattern, --regexp=pattern Specify a pattern to be matched. This option can be used mul- @@ -221,103 +227,126 @@ OPTIONS be used as a way of specifying a single pattern that starts with a hyphen. When -e is used, no argument pattern is taken from the command line; all arguments are treated as file - names. There is an overall maximum of 100 patterns. They are + names. There is no limit to the number of patterns. 
They are applied to each line in the order in which they are defined - until one matches (or fails to match if -v is used). If -f is - used with -e, the command line patterns are matched first, - followed by the patterns from the file, independent of the - order in which these options are specified. Note that multi- - ple use of -e is not the same as a single pattern with alter- - natives. For example, X|Y finds the first character in a line - that is X or Y, whereas if the two patterns are given sepa- - rately, pcregrep finds X if it is present, even if it follows - Y in the line. It finds Y only if there is no X in the line. - This really matters only if you are using -o to show the - part(s) of the line that matched. + until one matches. + If -f is used with -e, the command line patterns are matched + first, followed by the patterns from the file(s), independent + of the order in which these options are specified. Note that + multiple use of -e is not the same as a single pattern with + alternatives. For example, X|Y finds the first character in a + line that is X or Y, whereas if the two patterns are given + separately, with X first, pcregrep finds X if it is present, + even if it follows Y in the line. It finds Y only if there is + no X in the line. This matters only if you are using -o or + --colo(u)r to show the part(s) of the line that matched. + --exclude=pattern - When pcregrep is searching the files in a directory as a con- - sequence of the -r (recursive search) option, any regular - files whose names match the pattern are excluded. Subdirecto- - ries are not excluded by this option; they are searched - recursively, subject to the --exclude-dir and --include_dir - options. The pattern is a PCRE regular expression, and is - matched against the final component of the file name (not the - entire path). If a file name matches both --include and - --exclude, it is excluded. There is no short form for this - option. 
+ Files (but not directories) whose names match the pattern are + skipped without being processed. This applies to all files, + whether listed on the command line, obtained from --file- + list, or by scanning a directory. The pattern is a PCRE regu- + lar expression, and is matched against the final component of + the file name, not the entire path. The -F, -w, and -x + options do not apply to this pattern. The option may be given + any number of times in order to specify multiple patterns. If + a file name matches both an --include and an --exclude pat- + tern, it is excluded. There is no short form for this option. + --exclude-from=filename + Treat each non-empty line of the file as the data for an + --exclude option. What constitutes a newline when reading the + file is the operating system's default. The --newline option + has no effect on this option. This option may be given more + than once in order to specify a number of files to read. + --exclude-dir=pattern - When pcregrep is searching the contents of a directory as a - consequence of the -r (recursive search) option, any subdi- - rectories whose names match the pattern are excluded. (Note - that the --exclude option does not affect subdirectories.) - The pattern is a PCRE regular expression, and is matched - against the final component of the name (not the entire - path). If a subdirectory name matches both --include-dir and - --exclude-dir, it is excluded. There is no short form for - this option. + Directories whose names match the pattern are skipped without + being processed, whatever the setting of the --recursive + option. This applies to all directories, whether listed on + the command line, obtained from --file-list, or by scanning a + parent directory. The pattern is a PCRE regular expression, + and is matched against the final component of the directory + name, not the entire path. The -F, -w, and -x options do not + apply to this pattern. 
The option may be given any number of + times in order to specify more than one pattern. If a direc- + tory matches both --include-dir and --exclude-dir, it is + excluded. There is no short form for this option. -F, --fixed-strings - Interpret each pattern as a list of fixed strings, separated - by newlines, instead of as a regular expression. The -w - (match as a word) and -x (match whole line) options can be - used with -F. They apply to each of the fixed strings. A line - is selected if any of the fixed strings are found in it (sub- - ject to -w or -x, if present). + Interpret each data-matching pattern as a list of fixed + strings, separated by newlines, instead of as a regular + expression. What constitutes a newline for this purpose is + controlled by the --newline option. The -w (match as a word) + and -x (match whole line) options can be used with -F. They + apply to each of the fixed strings. A line is selected if any + of the fixed strings are found in it (subject to -w or -x, if + present). This option applies only to the patterns that are + matched against the contents of files; it does not apply to + patterns specified by any of the --include or --exclude + options. -f filename, --file=filename - Read a number of patterns from the file, one per line, and - match them against each line of input. A data line is output - if any of the patterns match it. The filename can be given as - "-" to refer to the standard input. When -f is used, patterns - specified on the command line using -e may also be present; - they are tested before the file's patterns. However, no other - pattern is taken from the command line; all arguments are - treated as the names of paths to be searched. There is an - overall maximum of 100 patterns. Trailing white space is - removed from each line, and blank lines are ignored. An empty - file contains no patterns and therefore matches nothing. 
See - also the comments about multiple patterns versus a single - pattern with alternatives in the description of -e above. + Read patterns from the file, one per line, and match them + against each line of input. What constitutes a newline when + reading the file is the operating system's default. The + --newline option has no effect on this option. Trailing white + space is removed from each line, and blank lines are ignored. + An empty file contains no patterns and therefore matches + nothing. See also the comments about multiple patterns versus + a single pattern with alternatives in the description of -e + above. + If this option is given more than once, all the specified + files are read. A data line is output if any of the patterns + match it. A filename can be given as "-" to refer to the + standard input. When -f is used, patterns specified on the + command line using -e may also be present; they are tested + before the file's patterns. However, no other pattern is + taken from the command line; all arguments are treated as the + names of paths to be searched. + --file-list=filename - Read a list of files to be searched from the given file, one - per line. Trailing white space is removed from each line, and - blank lines are ignored. These files are searched before any - others that may be listed on the command line. The filename - can be given as "-" to refer to the standard input. If --file - and --file-list are both specified as "-", patterns are read - first. This is useful only when the standard input is a ter- - minal, from which further lines (the list of files) can be - read after an end-of-file indication. + Read a list of files and/or directories that are to be + scanned from the given file, one per line. Trailing white + space is removed from each line, and blank lines are ignored. + These paths are processed before any that are listed on the + command line. The filename can be given as "-" to refer to + the standard input. 
If --file and --file-list are both spec- + ified as "-", patterns are read first. This is useful only + when the standard input is a terminal, from which further + lines (the list of files) can be read after an end-of-file + indication. If this option is given more than once, all the + specified files are read. --file-offsets - Instead of showing lines or parts of lines that match, show - each match as an offset from the start of the file and a - length, separated by a comma. In this mode, no context is - shown. That is, the -A, -B, and -C options are ignored. If + Instead of showing lines or parts of lines that match, show + each match as an offset from the start of the file and a + length, separated by a comma. In this mode, no context is + shown. That is, the -A, -B, and -C options are ignored. If there is more than one match in a line, each of them is shown - separately. This option is mutually exclusive with --line- + separately. This option is mutually exclusive with --line- offsets and --only-matching. -H, --with-filename - Force the inclusion of the filename at the start of output - lines when searching a single file. By default, the filename - is not shown in this case. For matching lines, the filename + Force the inclusion of the filename at the start of output + lines when searching a single file. By default, the filename + is not shown in this case. For matching lines, the filename is followed by a colon; for context lines, a hyphen separator - is used. If a line number is also being output, it follows + is used. If a line number is also being output, it follows the file name. -h, --no-filename - Suppress the output filenames when searching multiple files. - By default, filenames are shown when multiple files are - searched. For matching lines, the filename is followed by a - colon; for context lines, a hyphen separator is used. If a + Suppress the output filenames when searching multiple files. 
+ By default, filenames are shown when multiple files are + searched. For matching lines, the filename is followed by a + colon; for context lines, a hyphen separator is used. If a line number is also being output, it follows the file name. - --help Output a help message, giving brief details of the command - options and file type support, and then exit. + --help Output a help message, giving brief details of the command + options and file type support, and then exit. Anything else + on the command line is ignored. -I Treat binary files as never matching. This is equivalent to --binary-files=without-match. @@ -326,41 +355,53 @@ OPTIONS Ignore upper/lower case distinctions during comparisons. --include=pattern - When pcregrep is searching the files in a directory as a con- - sequence of the -r (recursive search) option, only those reg- - ular files whose names match the pattern are included. Subdi- - rectories are always included and searched recursively, sub- - ject to the --include-dir and --exclude-dir options. The pat- - tern is a PCRE regular expression, and is matched against the - final component of the file name (not the entire path). If a - file name matches both --include and --exclude, it is - excluded. There is no short form for this option. + If any --include patterns are specified, the only files that + are processed are those that match one of the patterns (and + do not match an --exclude pattern). This option does not + affect directories, but it applies to all files, whether + listed on the command line, obtained from --file-list, or by + scanning a directory. The pattern is a PCRE regular expres- + sion, and is matched against the final component of the file + name, not the entire path. The -F, -w, and -x options do not + apply to this pattern. The option may be given any number of + times. If a file name matches both an --include and an + --exclude pattern, it is excluded. There is no short form + for this option. 
+ --include-from=filename + Treat each non-empty line of the file as the data for an + --include option. What constitutes a newline for this purpose + is the operating system's default. The --newline option has + no effect on this option. This option may be given any number + of times; all the files are read. + --include-dir=pattern - When pcregrep is searching the contents of a directory as a - consequence of the -r (recursive search) option, only those - subdirectories whose names match the pattern are included. - (Note that the --include option does not affect subdirecto- - ries.) The pattern is a PCRE regular expression, and is - matched against the final component of the name (not the - entire path). If a subdirectory name matches both --include- - dir and --exclude-dir, it is excluded. There is no short form - for this option. + If any --include-dir patterns are specified, the only direc- + tories that are processed are those that match one of the + patterns (and do not match an --exclude-dir pattern). This + applies to all directories, whether listed on the command + line, obtained from --file-list, or by scanning a parent + directory. The pattern is a PCRE regular expression, and is + matched against the final component of the directory name, + not the entire path. The -F, -w, and -x options do not apply + to this pattern. The option may be given any number of times. + If a directory matches both --include-dir and --exclude-dir, + it is excluded. There is no short form for this option. -L, --files-without-match - Instead of outputting lines from the files, just output the - names of the files that do not contain any lines that would - have been output. Each file name is output once, on a sepa- + Instead of outputting lines from the files, just output the + names of the files that do not contain any lines that would + have been output. Each file name is output once, on a sepa- rate line. 
-l, --files-with-matches - Instead of outputting lines from the files, just output the + Instead of outputting lines from the files, just output the names of the files containing lines that would have been out- - put. Each file name is output once, on a separate line. - Searching normally stops as soon as a matching line is found - in a file. However, if the -c (count) option is also used, - matching continues in order to obtain the correct count, and - those files that have at least one match are listed along + put. Each file name is output once, on a separate line. + Searching normally stops as soon as a matching line is found + in a file. However, if the -c (count) option is also used, + matching continues in order to obtain the correct count, and + those files that have at least one match are listed along with their counts. Using this option with -c is a way of sup- pressing the listing of files with no matches. @@ -370,109 +411,112 @@ OPTIONS input)" is used. There is no short form for this option. --line-buffered - When this option is given, input is read and processed line - by line, and the output is flushed after each write. By - default, input is read in large chunks, unless pcregrep can - determine that it is reading from a terminal (which is cur- - rently possible only in Unix environments). Output to termi- - nal is normally automatically flushed by the operating sys- - tem. This option can be useful when the input or output is - attached to a pipe and you do not want pcregrep to buffer up - large amounts of data. However, its use will affect perfor- + When this option is given, input is read and processed line + by line, and the output is flushed after each write. By + default, input is read in large chunks, unless pcregrep can + determine that it is reading from a terminal (which is cur- + rently possible only in Unix-like environments). Output to + terminal is normally automatically flushed by the operating + system. 
This option can be useful when the input or output is + attached to a pipe and you do not want pcregrep to buffer up + large amounts of data. However, its use will affect perfor- mance, and the -M (multiline) option ceases to work. --line-offsets - Instead of showing lines or parts of lines that match, show + Instead of showing lines or parts of lines that match, show each match as a line number, the offset from the start of the - line, and a length. The line number is terminated by a colon - (as usual; see the -n option), and the offset and length are - separated by a comma. In this mode, no context is shown. - That is, the -A, -B, and -C options are ignored. If there is - more than one match in a line, each of them is shown sepa- + line, and a length. The line number is terminated by a colon + (as usual; see the -n option), and the offset and length are + separated by a comma. In this mode, no context is shown. + That is, the -A, -B, and -C options are ignored. If there is + more than one match in a line, each of them is shown sepa- rately. This option is mutually exclusive with --file-offsets and --only-matching. --locale=locale-name - This option specifies a locale to be used for pattern match- - ing. It overrides the value in the LC_ALL or LC_CTYPE envi- - ronment variables. If no locale is specified, the PCRE - library's default (usually the "C" locale) is used. There is + This option specifies a locale to be used for pattern match- + ing. It overrides the value in the LC_ALL or LC_CTYPE envi- + ronment variables. If no locale is specified, the PCRE + library's default (usually the "C" locale) is used. There is no short form for this option. --match-limit=number - Processing some regular expression patterns can require a - very large amount of memory, leading in some cases to a pro- - gram crash if not enough is available. Other patterns may - take a very long time to search for all possible matching - strings. 
The pcre_exec() function that is called by pcregrep - to do the matching has two parameters that can limit the + Processing some regular expression patterns can require a + very large amount of memory, leading in some cases to a pro- + gram crash if not enough is available. Other patterns may + take a very long time to search for all possible matching + strings. The pcre_exec() function that is called by pcregrep + to do the matching has two parameters that can limit the resources that it uses. - The --match-limit option provides a means of limiting + The --match-limit option provides a means of limiting resource usage when processing patterns that are not going to match, but which have a very large number of possibilities in - their search trees. The classic example is a pattern that - uses nested unlimited repeats. Internally, PCRE uses a func- - tion called match() which it calls repeatedly (sometimes - recursively). The limit set by --match-limit is imposed on - the number of times this function is called during a match, - which has the effect of limiting the amount of backtracking + their search trees. The classic example is a pattern that + uses nested unlimited repeats. Internally, PCRE uses a func- + tion called match() which it calls repeatedly (sometimes + recursively). The limit set by --match-limit is imposed on + the number of times this function is called during a match, + which has the effect of limiting the amount of backtracking that can take place. The --recursion-limit option is similar to --match-limit, but instead of limiting the total number of times that match() is called, it limits the depth of recursive calls, which in turn - limits the amount of memory that can be used. The recursion - depth is a smaller number than the total number of calls, + limits the amount of memory that can be used. The recursion + depth is a smaller number than the total number of calls, because not all calls to match() are recursive. 
This limit is of use only if it is set smaller than --match-limit. - There are no short forms for these options. The default set- - tings are specified when the PCRE library is compiled, with + There are no short forms for these options. The default set- + tings are specified when the PCRE library is compiled, with the default default being 10 million. -M, --multiline - Allow patterns to match more than one line. When this option + Allow patterns to match more than one line. When this option is given, patterns may usefully contain literal newline char- - acters and internal occurrences of ^ and $ characters. The - output for a successful match may consist of more than one - line, the last of which is the one in which the match ended. + acters and internal occurrences of ^ and $ characters. The + output for a successful match may consist of more than one + line, the last of which is the one in which the match ended. If the matched string ends with a newline sequence the output ends at the end of that line. - When this option is set, the PCRE library is called in "mul- - tiline" mode. There is a limit to the number of lines that - can be matched, imposed by the way that pcregrep buffers the - input file as it scans it. However, pcregrep ensures that at + When this option is set, the PCRE library is called in "mul- + tiline" mode. There is a limit to the number of lines that + can be matched, imposed by the way that pcregrep buffers the + input file as it scans it. However, pcregrep ensures that at least 8K characters or the rest of the document (whichever is - the shorter) are available for forward matching, and simi- + the shorter) are available for forward matching, and simi- larly the previous 8K characters (or all the previous charac- - ters, if fewer than 8K) are guaranteed to be available for - lookbehind assertions. This option does not work when input + ters, if fewer than 8K) are guaranteed to be available for + lookbehind assertions. 
This option does not work when input is read line by line (see --line-buffered.) -N newline-type, --newline=newline-type - The PCRE library supports five different conventions for - indicating the ends of lines. They are the single-character - sequences CR (carriage return) and LF (linefeed), the two- - character sequence CRLF, an "anycrlf" convention, which rec- - ognizes any of the preceding three types, and an "any" con- + The PCRE library supports five different conventions for + indicating the ends of lines. They are the single-character + sequences CR (carriage return) and LF (linefeed), the two- + character sequence CRLF, an "anycrlf" convention, which rec- + ognizes any of the preceding three types, and an "any" con- vention, in which any Unicode line ending sequence is assumed - to end a line. The Unicode sequences are the three just men- - tioned, plus VT (vertical tab, U+000B), FF (form feed, - U+000C), NEL (next line, U+0085), LS (line separator, + to end a line. The Unicode sequences are the three just men- + tioned, plus VT (vertical tab, U+000B), FF (form feed, + U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029). When the PCRE library is built, a default line-ending - sequence is specified. This is normally the standard + sequence is specified. This is normally the standard sequence for the operating system. Unless otherwise specified - by this option, pcregrep uses the library's default. The + by this option, pcregrep uses the library's default. The possible values for this option are CR, LF, CRLF, ANYCRLF, or - ANY. This makes it possible to use pcregrep on files that - have come from other environments without having to modify - their line endings. If the data that is being scanned does - not agree with the convention set by this option, pcregrep - may behave in strange ways. + ANY. 
This makes it possible to use pcregrep to scan files + that have come from other environments without having to mod- + ify their line endings. If the data that is being scanned + does not agree with the convention set by this option, pcre- + grep may behave in strange ways. Note that this option does + not apply to files specified by the -f, --exclude-from, or + --include-from options, which are expected to use the operat- + ing system's standard newline sequence. -n, --line-number Precede each output line by its line number in the file, fol- @@ -503,44 +547,59 @@ OPTIONS -onumber, --only-matching=number Show only the part of the line that matched the capturing parentheses of the given number. Up to 32 capturing parenthe- - ses are supported. Because these options can be given without - an argument (see above), if an argument is present, it must - be given in the same shell item, for example, -o3 or --only- - matching=2. The comments given for the non-argument case - above also apply to this case. If the specified capturing - parentheses do not exist in the pattern, or were not set in - the match, nothing is output unless the file name or line - number are being printed. + ses are supported, and -o0 is equivalent to -o without a num- + ber. Because these options can be given without an argument + (see above), if an argument is present, it must be given in + the same shell item, for example, -o3 or --only-matching=2. + The comments given for the non-argument case above also apply + to this case. If the specified capturing parentheses do not + exist in the pattern, or were not set in the match, nothing + is output unless the file name or line number are being + printed. + If this option is given multiple times, multiple substrings + are output, in the order the options are given. For example, + -o3 -o1 -o3 causes the substrings matched by capturing paren- + theses 3 and 1 and then 3 again to be output. By default, + there is no separator (but see the next option). 
+ + --om-separator=text + Specify a separating string for multiple occurrences of -o. + The default is an empty string. Separating strings are never + coloured. + -q, --quiet Work quietly, that is, display nothing except error messages. - The exit status indicates whether or not any matches were + The exit status indicates whether or not any matches were found. -r, --recursive - If any given path is a directory, recursively scan the files - it contains, taking note of any --include and --exclude set- - tings. By default, a directory is read as a normal file; in - some operating systems this gives an immediate end-of-file. - This option is a shorthand for setting the -d option to + If any given path is a directory, recursively scan the files + it contains, taking note of any --include and --exclude set- + tings. By default, a directory is read as a normal file; in + some operating systems this gives an immediate end-of-file. + This option is a shorthand for setting the -d option to "recurse". --recursion-limit=number See --match-limit above. -s, --no-messages - Suppress error messages about non-existent or unreadable - files. Such files are quietly skipped. However, the return + Suppress error messages about non-existent or unreadable + files. Such files are quietly skipped. However, the return code is still 2, even if matches were found in other files. -u, --utf-8 - Operate in UTF-8 mode. This option is available only if PCRE - has been compiled with UTF-8 support. Both patterns and sub- - ject lines must be valid strings of UTF-8 characters. + Operate in UTF-8 mode. This option is available only if PCRE + has been compiled with UTF-8 support. All patterns (including + those for any --exclude and --include options) and all sub- + ject lines that are scanned must be valid strings of UTF-8 + characters. -V, --version - Write the version numbers of pcregrep and the PCRE library - that is being used to the standard error stream. 
+                 Write the version numbers of pcregrep and the PCRE library to
+                 the standard output and then exit. Anything else on the com-
+                 mand line is ignored.

        -v, --invert-match
                  Invert the sense of the match, so that lines which do not
@@ -548,14 +607,20 @@ OPTIONS

        -w, --word-regex, --word-regexp
                  Force the patterns to match only whole words. This is equiva-
-                 lent to having \b at the start and end of the pattern.
+                 lent to having \b at the start and end of the pattern. This
+                 option applies only to the patterns that are matched against
+                 the contents of files; it does not apply to patterns speci-
+                 fied by any of the --include or --exclude options.

        -x, --line-regex, --line-regexp
-                 Force the patterns to be anchored (each must start matching
-                 at the beginning of a line) and in addition, require them to
-                 match entire lines. This is equivalent to having ^ and $
+                 Force the patterns to be anchored (each must start matching
+                 at the beginning of a line) and in addition, require them to
+                 match entire lines. This is equivalent to having ^ and $
                  characters at the start and end of each alternative branch in
-                 every pattern.
+                 every pattern. This option applies only to the patterns that
+                 are matched against the contents of files; it does not apply
+                 to patterns specified by any of the --include or --exclude
+                 options.


 ENVIRONMENT VARIABLES
@@ -569,12 +634,16 @@ ENVIRONMENT VARIABLES
 NEWLINES

        The -N (--newline) option allows pcregrep to scan files with different
-       newline conventions from the default. However, the setting of this
-       option does not affect the way in which pcregrep writes information to
-       the standard error and output streams. It uses the string "\n" in C
-       printf() calls to indicate newlines, relying on the C I/O library to
-       convert this to an appropriate sequence if the output is sent to a
-       file.
+       newline conventions from the default.
Any parts of the input files that
+       are written to the standard output are copied identically, with what-
+       ever newline sequences they have in the input. However, the setting of
+       this option does not affect the interpretation of files specified by
+       the -f, --exclude-from, or --include-from options, which are assumed to
+       use the operating system's standard newline sequence, nor does it
+       affect the way in which pcregrep writes informational messages to the
+       standard error and output streams. For these it uses the string "\n" to
+       indicate newlines, relying on the C I/O library to convert this to an
+       appropriate sequence.


 OPTIONS COMPATIBILITY
@@ -583,78 +652,79 @@ OPTIONS COMPATIBILITY
        in the GNU grep program. Any long option of the form --xxx-regexp (GNU
        terminology) is also available as --xxx-regex (PCRE terminology). How-
        ever, the --file-list, --file-offsets, --include-dir, --line-offsets,
-       --locale, --match-limit, -M, --multiline, -N, --newline, --recursion-
-       limit, -u, and --utf-8 options are specific to pcregrep, as is the use
-       of the --only-matching option with a capturing parentheses number.
+       --locale, --match-limit, -M, --multiline, -N, --newline, --om-separa-
+       tor, --recursion-limit, -u, and --utf-8 options are specific to pcre-
+       grep, as is the use of the --only-matching option with a capturing
+       parentheses number.

-       Although most of the common options work the same way, a few are dif-
-       ferent in pcregrep. For example, the --include option's argument is a
-       glob for GNU grep, but a regular expression for pcregrep. If both the
-       -c and -l options are given, GNU grep lists only file names, without
+       Although most of the common options work the same way, a few are dif-
+       ferent in pcregrep. For example, the --include option's argument is a
+       glob for GNU grep, but a regular expression for pcregrep. If both the
+       -c and -l options are given, GNU grep lists only file names, without
        counts, but pcregrep gives the counts.
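The glob-versus-regular-expression difference noted for --include can be seen in a small Python sketch. This is only an illustration under assumptions: Python's fnmatch and re modules stand in for GNU grep's glob matching and for PCRE respectively, the file names are made up, and an unanchored search is assumed for the regular expression.

```python
import fnmatch
import re

# Hypothetical file names, used only for illustration.
names = ["main.c", "main.cpp", "notes.txt"]

# GNU grep style: the --include argument is a shell glob.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.c")]

# pcregrep style: the --include argument is a regular expression.
regex_hits = [n for n in names if re.search(r"\.c$", n)]

# Reusing the glob text as a regular expression does not compile in
# Python's re, because the leading "*" has nothing to repeat.
try:
    re.compile("*.c")
    glob_is_valid_regex = True
except re.error:
    glob_is_valid_regex = False

print(glob_hits, regex_hits, glob_is_valid_regex)
```

The practical point is that an --include argument written for GNU grep generally needs to be rewritten as a regular expression before it is passed to pcregrep.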


 OPTIONS WITH DATA

        There are four different ways in which an option with data can be spec-
-       ified. If a short form option is used, the data may follow immedi-
+       ified. If a short form option is used, the data may follow immedi-
        ately, or (with one exception) in the next command line item. For exam-
        ple:

          -f/some/file
          -f /some/file

-       The exception is the -o option, which may appear with or without data.
-       Because of this, if data is present, it must follow immediately in the
+       The exception is the -o option, which may appear with or without data.
+       Because of this, if data is present, it must follow immediately in the
        same item, for example -o3.

-       If a long form option is used, the data may appear in the same command
-       line item, separated by an equals character, or (with two exceptions)
+       If a long form option is used, the data may appear in the same command
+       line item, separated by an equals character, or (with two exceptions)
        it may appear in the next command line item. For example:

          --file=/some/file
          --file /some/file

-       Note, however, that if you want to supply a file name beginning with ~
-       as data in a shell command, and have the shell expand ~ to a home
+       Note, however, that if you want to supply a file name beginning with ~
+       as data in a shell command, and have the shell expand ~ to a home
        directory, you must separate the file name from the option, because the
        shell does not treat ~ specially unless it is at the start of an item.

-       The exceptions to the above are the --colour (or --color) and --only-
-       matching options, for which the data is optional. If one of these
-       options does have data, it must be given in the first form, using an
+       The exceptions to the above are the --colour (or --color) and --only-
+       matching options, for which the data is optional. If one of these
+       options does have data, it must be given in the first form, using an
        equals character. Otherwise pcregrep will assume that it has no data.
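The four forms and their exceptions can be modelled in a few lines of Python. This is a minimal sketch of the rules as described above, not pcregrep's actual parser: the helper name read_option is hypothetical, it assumes the item at the given index is an option that accepts data, and it does not handle a missing next item.

```python
# Long options whose data is optional: data must be attached with "=".
OPTIONAL_DATA_LONG = {"--colour", "--color", "--only-matching"}
# Short option whose data is optional: data must follow in the same item.
OPTIONAL_DATA_SHORT = {"-o"}


def read_option(items, i):
    """Return (name, data, next_index) for the option at items[i].

    Hypothetical helper for illustration only.
    """
    item = items[i]
    if item.startswith("--"):
        name, sep, data = item.partition("=")
        if sep:                               # --file=/some/file
            return name, data, i + 1
        if name in OPTIONAL_DATA_LONG:        # no "=", so no data is assumed
            return name, None, i + 1
        return name, items[i + 1], i + 2      # --file /some/file
    name, data = item[:2], item[2:]
    if data:                                  # -f/some/file or -o3
        return name, data, i + 1
    if name in OPTIONAL_DATA_SHORT:           # bare -o takes no data
        return name, None, i + 1
    return name, items[i + 1], i + 2          # -f /some/file
```

Under this model, `-o 3` and `--colour always` both leave the following item unconsumed, which matches the statement that pcregrep assumes such an option has no data unless the data is given in the same item.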


 MATCHING ERRORS

-       It is possible to supply a regular expression that takes a very long
-       time to fail to match certain lines. Such patterns normally involve
-       nested indefinite repeats, for example: (a+)*\d when matched against a
-       line of a's with no final digit. The PCRE matching function has a
-       resource limit that causes it to abort in these circumstances. If this
+       It is possible to supply a regular expression that takes a very long
+       time to fail to match certain lines. Such patterns normally involve
+       nested indefinite repeats, for example: (a+)*\d when matched against a
+       line of a's with no final digit. The PCRE matching function has a
+       resource limit that causes it to abort in these circumstances. If this
        happens, pcregrep outputs an error message and the line that caused the
-       problem to the standard error stream. If there are more than 20 such
+       problem to the standard error stream. If there are more than 20 such
        errors, pcregrep gives up.

-       The --match-limit option of pcregrep can be used to set the overall
-       resource limit; there is a second option called --recursion-limit that
-       sets a limit on the amount of memory (usually stack) that is used (see
+       The --match-limit option of pcregrep can be used to set the overall
+       resource limit; there is a second option called --recursion-limit that
+       sets a limit on the amount of memory (usually stack) that is used (see
        the discussion of these options above).


 DIAGNOSTICS

        Exit status is 0 if any matches were found, 1 if no matches were found,
-       and 2 for syntax errors, overlong lines, non-existent or inaccessible
-       files (even if matches were found in other files) or too many matching
+       and 2 for syntax errors, overlong lines, non-existent or inaccessible
+       files (even if matches were found in other files) or too many matching
        errors. Using the -s option to suppress error messages about inaccessi-
        ble files does not affect the return code.


 SEE ALSO

-       pcrepattern(3), pcretest(1).
+       pcrepattern(3), pcresyntax(3), pcretest(1).


 AUTHOR
@@ -666,5 +736,5 @@ AUTHOR

 REVISION

-       Last updated: 04 March 2012
+       Last updated: 13 September 2012
        Copyright (c) 1997-2012 University of Cambridge.