version 1.1.1.2, 2012/02/21 23:50:25
|
version 1.1.1.3, 2012/10/09 09:19:17
|
Line 111 COMMAND LINE OPTIONS
|
Line 111 COMMAND LINE OPTIONS
|
size megabytes. |
size megabytes. |
|
|
-s or -s+ Behave as if each pattern has the /S modifier; in other |
-s or -s+ Behave as if each pattern has the /S modifier; in other |
words, force each pattern to be studied. If -s+ is used, the | words, force each pattern to be studied. If -s+ is used, all |
PCRE_STUDY_JIT_COMPILE flag is passed to pcre[16]_study(), | the JIT compile options are passed to pcre[16]_study(), caus- |
causing just-in-time optimization to be set up if it is | ing just-in-time optimization to be set up if it is avail- |
available. If the /I or /D option is present on a pattern | able, for both full and partial matching. Specific JIT com- |
(requesting output about the compiled pattern), information | pile options can be selected by following -s+ with a digit in |
about the result of studying is not included when studying is | the range 1 to 7, which selects the JIT compile modes as fol- |
caused only by -s and neither -i nor -d is present on the | lows: |
command line. This behaviour means that the output from tests | |
that are run with and without -s should be identical, except | |
when options that output information about the actual running | |
of a match are set. | |
|
|
The -M, -t, and -tm options, which give information about | 1 normal match only |
resources used, are likely to produce different output with | 2 soft partial match only |
and without -s. Output may also differ if the /C option is | 3 normal match and soft partial match |
present on an individual pattern. This uses callouts to trace | 4 hard partial match only |
the matching process, and this may be different between | 6 soft and hard partial match |
studied and non-studied patterns. If the pattern contains | 7 all three modes (default) |
(*MARK) items there may also be differences, for the same | |
reason. The -s command line option can be overridden for spe- | |
cific patterns that should never be studied (see the /S pat- | |
tern modifier below). | |
|
|
-t Run each compile, study, and match many times with a timer, | If -s++ is used instead of -s+ (with or without a following |
and output resulting time per compile or match (in millisec- | digit), the text "(JIT)" is added to the first output line |
onds). Do not set -m with -t, because you will then get the | after a match or no match when JIT-compiled code was actually |
size output a zillion times, and the timing will be dis- | used. |
torted. You can control the number of iterations that are | |
used for timing by following -t with a number (as a separate | If the /I or /D option is present on a pattern (requesting output about |
| the compiled pattern), information about the result of studying is not |
| included when studying is caused only by -s and neither -i nor -d is |
| present on the command line. This behaviour means that the output from |
| tests that are run with and without -s should be identical, except when |
| options that output information about the actual running of a match are |
| set. |
| |
| The -M, -t, and -tm options, which give information about resources |
| used, are likely to produce different output with and without -s. Out- |
| put may also differ if the /C option is present on an individual pat- |
| tern. This uses callouts to trace the matching process, and this |
| may be different between studied and non-studied patterns. If the pat- |
| tern contains (*MARK) items there may also be differences, for the same |
| reason. The -s command line option can be overridden for specific pat- |
| terns that should never be studied (see the /S pattern modifier below). |
| |
| -t Run each compile, study, and match many times with a timer, |
| and output resulting time per compile or match (in millisec- |
| onds). Do not set -m with -t, because you will then get the |
| size output a zillion times, and the timing will be dis- |
| torted. You can control the number of iterations that are |
| used for timing by following -t with a number (as a separate |
item on the command line). For example, "-t 1000" would iter- |
item on the command line). For example, "-t 1000" would iter- |
ate 1000 times. The default is to iterate 500000 times. |
ate 1000 times. The default is to iterate 500000 times. |
|
|
Line 149 COMMAND LINE OPTIONS
|
Line 163 COMMAND LINE OPTIONS
|
|
|
DESCRIPTION |
DESCRIPTION |
|
|
If pcretest is given two filename arguments, it reads from the first | If pcretest is given two filename arguments, it reads from the first |
and writes to the second. If it is given only one filename argument, it |
and writes to the second. If it is given only one filename argument, it |
reads from that file and writes to stdout. Otherwise, it reads from | reads from that file and writes to stdout. Otherwise, it reads from |
stdin and writes to stdout, and prompts for each line of input, using | stdin and writes to stdout, and prompts for each line of input, using |
"re>" to prompt for regular expressions, and "data>" to prompt for data |
"re>" to prompt for regular expressions, and "data>" to prompt for data |
lines. |
lines. |
|
|
When pcretest is built, a configuration option can specify that it | When pcretest is built, a configuration option can specify that it |
should be linked with the libreadline library. When this is done, if | should be linked with the libreadline library. When this is done, if |
the input is from a terminal, it is read using the readline() function. |
the input is from a terminal, it is read using the readline() function. |
This provides line-editing and history facilities. The output from the | This provides line-editing and history facilities. The output from the |
-help option states whether or not readline() will be used. |
-help option states whether or not readline() will be used. |
|
|
The program handles any number of sets of input on a single input file. |
The program handles any number of sets of input on a single input file. |
Each set starts with a regular expression, and continues with any num- | Each set starts with a regular expression, and continues with any num- |
ber of data lines to be matched against the pattern. |
ber of data lines to be matched against the pattern. |
|
|
Each data line is matched separately and independently. If you want to | Each data line is matched separately and independently. If you want to |
do multi-line matches, you have to use the \n escape sequence (or \r or |
do multi-line matches, you have to use the \n escape sequence (or \r or |
\r\n, etc., depending on the newline setting) in a single line of input |
\r\n, etc., depending on the newline setting) in a single line of input |
to encode the newline sequences. There is no limit on the length of | to encode the newline sequences. There is no limit on the length of |
data lines; the input buffer is automatically extended if it is too | data lines; the input buffer is automatically extended if it is too |
small. |
small. |
|
|
An empty line signals the end of the data lines, at which point a new | An empty line signals the end of the data lines, at which point a new |
regular expression is read. The regular expressions are given enclosed | regular expression is read. The regular expressions are given enclosed |
in any non-alphanumeric delimiters other than backslash, for example: |
in any non-alphanumeric delimiters other than backslash, for example: |
|
|
/(a|bc)x+yz/ |
/(a|bc)x+yz/ |
|
|
White space before the initial delimiter is ignored. A regular expres- | White space before the initial delimiter is ignored. A regular expres- |
sion may be continued over several input lines, in which case the new- | sion may be continued over several input lines, in which case the new- |
line characters are included within it. It is possible to include the | line characters are included within it. It is possible to include the |
delimiter within the pattern by escaping it, for example |
delimiter within the pattern by escaping it, for example |
|
|
/abc\/def/ |
/abc\/def/ |
|
|
If you do so, the escape and the delimiter form part of the pattern, | If you do so, the escape and the delimiter form part of the pattern, |
but since delimiters are always non-alphanumeric, this does not affect | but since delimiters are always non-alphanumeric, this does not affect |
its interpretation. If the terminating delimiter is immediately fol- | its interpretation. If the terminating delimiter is immediately fol- |
lowed by a backslash, for example, |
lowed by a backslash, for example, |
|
|
/abc/\ |
/abc/\ |
|
|
then a backslash is added to the end of the pattern. This is done to | then a backslash is added to the end of the pattern. This is done to |
provide a way of testing the error condition that arises if a pattern | provide a way of testing the error condition that arises if a pattern |
finishes with a backslash, because |
finishes with a backslash, because |
|
|
/abc\/ |
/abc\/ |
|
|
is interpreted as the first line of a pattern that starts with "abc/", | is interpreted as the first line of a pattern that starts with "abc/", |
causing pcretest to read the next line as a continuation of the regular |
causing pcretest to read the next line as a continuation of the regular |
expression. |
expression. |
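
The delimiter rules above can be sketched in a few lines of Python. This is only an illustration of the behaviour described in the text, not pcretest's actual code; the function name read_pattern and its return convention (None means "read another line as a continuation") are hypothetical.

```python
def read_pattern(line):
    """Split one input line into (pattern, modifiers).

    Returns None when no terminating delimiter has been seen yet, which
    is the case where the next line would be read as a continuation.
    """
    text = line.lstrip()  # white space before the initial delimiter is ignored
    if not text:
        raise ValueError("empty line")
    delim = text[0]
    if delim.isalnum() or delim == "\\":
        raise ValueError("delimiter must be non-alphanumeric and not a backslash")
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # keep the escape and the character after it in the pattern
            continue
        if ch == delim:
            pattern, rest = text[1:i], text[i + 1:]
            if rest.startswith("\\"):
                pattern += "\\"  # "/abc/\" adds a backslash to the pattern
                rest = rest[1:]
            return pattern, rest
        i += 1
    return None  # e.g. "/abc\/": the delimiter was escaped, pattern continues
```

Under these assumptions, read_pattern(r"/abc\/def/") keeps the escape and the delimiter in the pattern, read_pattern("/abc/\\") yields a pattern that ends in a backslash, and read_pattern(r"/abc\/") returns None.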
|
|
|
|
PATTERN MODIFIERS |
PATTERN MODIFIERS |
|
|
A pattern may be followed by any number of modifiers, which are mostly | A pattern may be followed by any number of modifiers, which are mostly |
single characters. Following Perl usage, these are referred to below | single characters. Following Perl usage, these are referred to below |
as, for example, "the /i modifier", even though the delimiter of the | as, for example, "the /i modifier", even though the delimiter of the |
pattern need not always be a slash, and no slash is used when writing | pattern need not always be a slash, and no slash is used when writing |
modifiers. White space may appear between the final pattern delimiter | modifiers. White space may appear between the final pattern delimiter |
and the first modifier, and between the modifiers themselves. |
and the first modifier, and between the modifiers themselves. |
|
|
The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE, |
The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE, |
PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when pcre[16]_com- |
PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when pcre[16]_com- |
pile() is called. These four modifier letters have the same effect as | pile() is called. These four modifier letters have the same effect as |
they do in Perl. For example: |
they do in Perl. For example: |
|
|
/caseless/i |
/caseless/i |
|
|
The following table shows additional modifiers for setting PCRE com- | The following table shows additional modifiers for setting PCRE com- |
pile-time options that do not correspond to anything in Perl: |
pile-time options that do not correspond to anything in Perl: |
|
|
/8 PCRE_UTF8 ) when using the 8-bit |
/8 PCRE_UTF8 ) when using the 8-bit |
Line 248 PATTERN MODIFIERS
|
Line 262 PATTERN MODIFIERS
|
/<bsr_anycrlf> PCRE_BSR_ANYCRLF |
/<bsr_anycrlf> PCRE_BSR_ANYCRLF |
/<bsr_unicode> PCRE_BSR_UNICODE |
/<bsr_unicode> PCRE_BSR_UNICODE |
|
|
The modifiers that are enclosed in angle brackets are literal strings | The modifiers that are enclosed in angle brackets are literal strings |
as shown, including the angle brackets, but the letters within can be | as shown, including the angle brackets, but the letters within can be |
in either case. This example sets multiline matching with CRLF as the | in either case. This example sets multiline matching with CRLF as the |
line ending sequence: |
line ending sequence: |
|
|
/^abc/m<CRLF> |
/^abc/m<CRLF> |
|
|
As well as turning on the PCRE_UTF8/16 option, the /8 modifier causes | As well as turning on the PCRE_UTF8/16 option, the /8 modifier causes |
all non-printing characters in output strings to be printed using the | all non-printing characters in output strings to be printed using the |
\x{hh...} notation. Otherwise, those less than 0x100 are output in hex | \x{hh...} notation. Otherwise, those less than 0x100 are output in hex |
without the curly brackets. |
without the curly brackets. |
|
|
Full details of the PCRE options are given in the pcreapi documenta- | Full details of the PCRE options are given in the pcreapi documenta- |
tion. |
tion. |
|
|
Finding all matches in a string |
Finding all matches in a string |
|
|
Searching for all possible matches within each subject string can be | Searching for all possible matches within each subject string can be |
requested by the /g or /G modifier. After finding a match, PCRE is | requested by the /g or /G modifier. After finding a match, PCRE is |
called again to search the remainder of the subject string. The differ- |
called again to search the remainder of the subject string. The differ- |
ence between /g and /G is that the former uses the startoffset argument |
ence between /g and /G is that the former uses the startoffset argument |
to pcre[16]_exec() to start searching at a new point within the entire | to pcre[16]_exec() to start searching at a new point within the entire |
string (which is in effect what Perl does), whereas the latter passes | string (which is in effect what Perl does), whereas the latter passes |
over a shortened substring. This makes a difference to the matching | over a shortened substring. This makes a difference to the matching |
process if the pattern begins with a lookbehind assertion (including \b |
process if the pattern begins with a lookbehind assertion (including \b |
or \B). |
or \B). |
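
The difference between a start offset and a shortened substring can be seen by analogy in Python's re module. This is not PCRE and not pcretest; it is a minimal sketch that assumes Python's pos argument plays the role of startoffset and that slicing plays the role of passing a shortened substring.

```python
import re

word_start = re.compile(r"\bfoo")
subject = "xfoo"
offset = 1

# /g style: the whole subject is passed with a start offset, so \b can
# still see the "x" before the offset and finds no word boundary there.
by_offset = word_start.search(subject, offset)

# /G style: only the shortened substring is passed, so "foo" appears to
# be at the very start of the subject and \b succeeds.
by_substring = word_start.search(subject[offset:])

print(by_offset, by_substring)
```

The first search sees the preceding character and the second does not, which is the kind of difference the text describes for patterns that begin with a lookbehind assertion, \b or \B.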
|
|
If any call to pcre[16]_exec() in a /g or /G sequence matches an empty | If any call to pcre[16]_exec() in a /g or /G sequence matches an empty |
string, the next call is done with the PCRE_NOTEMPTY_ATSTART and | string, the next call is done with the PCRE_NOTEMPTY_ATSTART and |
PCRE_ANCHORED flags set in order to search for another, non-empty, | PCRE_ANCHORED flags set in order to search for another, non-empty, |
match at the same point. If this second match fails, the start offset | match at the same point. If this second match fails, the start offset |
is advanced, and the normal match is retried. This imitates the way | is advanced, and the normal match is retried. This imitates the way |
Perl handles such cases when using the /g modifier or the split() func- |
Perl handles such cases when using the /g modifier or the split() func- |
tion. Normally, the start offset is advanced by one character, but if | tion. Normally, the start offset is advanced by one character, but if |
the newline convention recognizes CRLF as a newline, and the current | the newline convention recognizes CRLF as a newline, and the current |
character is CR followed by LF, an advance of two is used. |
character is CR followed by LF, an advance of two is used. |
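
The advance rule in the last sentence can be written out as follows. This is a sketch only: the function name and the crlf_is_newline flag are hypothetical, and offsets are counted in Python string characters rather than in the code units that PCRE itself works with.

```python
def next_start_offset(subject, offset, crlf_is_newline):
    """Return the offset to retry from after an empty match fails to
    yield a non-empty match at the same point."""
    if crlf_is_newline and subject[offset:offset + 2] == "\r\n":
        return offset + 2  # step over CR LF as a single newline
    return offset + 1      # normal case: advance by one character
```

For example, with the subject "a\r\nb" and an offset of 1, the result is 3 when CRLF is recognized as a newline and 2 otherwise.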
|
|
Other modifiers |
Other modifiers |
|
|
There are yet more modifiers for controlling the way pcretest operates. |
There are yet more modifiers for controlling the way pcretest operates. |
|
|
The /+ modifier requests that as well as outputting the substring that | The /+ modifier requests that as well as outputting the substring that |
matched the entire pattern, pcretest should in addition output the | matched the entire pattern, pcretest should in addition output the |
remainder of the subject string. This is useful for tests where the | remainder of the subject string. This is useful for tests where the |
subject contains multiple copies of the same substring. If the + modi- | subject contains multiple copies of the same substring. If the + modi- |
fier appears twice, the same action is taken for captured substrings. | fier appears twice, the same action is taken for captured substrings. |
In each case the remainder is output on the following line with a plus | In each case the remainder is output on the following line with a plus |
character following the capture number. Note that this modifier must | character following the capture number. Note that this modifier must |
not immediately follow the /S modifier because /S+ has another meaning. | not immediately follow the /S modifier because /S+ and /S++ have other |
| meanings. |
|
|
The /= modifier requests that the values of all potential captured |
The /= modifier requests that the values of all potential captured |
parentheses be output after a match. By default, only those up to the |
parentheses be output after a match. By default, only those up to the |
Line 368 PATTERN MODIFIERS
|
Line 383 PATTERN MODIFIERS
|
different when the pattern is studied. |
different when the pattern is studied. |
|
|
If the /S modifier is immediately followed by a + character, the call |
If the /S modifier is immediately followed by a + character, the call |
to pcre[16]_study() is made with the PCRE_STUDY_JIT_COMPILE option, | to pcre[16]_study() is made with all the JIT study options, requesting |
requesting just-in-time optimization support if it is available. Note | just-in-time optimization support if it is available, for both normal |
that there is also a /+ modifier; it must not be given immediately | and partial matching. If you want to restrict the JIT compiling modes, |
after /S because this will be misinterpreted. If JIT studying is suc- | you can follow /S+ with a digit in the range 1 to 7: |
cessful, it will automatically be used when pcre[16]_exec() is run, | |
except when incompatible run-time options are specified. These include | |
the partial matching options; a complete list is given in the pcrejit | |
documentation. See also the \J escape sequence below for a way of set- | |
ting the size of the JIT stack. | |
|
|
|
1 normal match only |
|
2 soft partial match only |
|
3 normal match and soft partial match |
|
4 hard partial match only |
|
6 soft and hard partial match |
|
7 all three modes (default) |
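
The digits in this list behave like a combination of three flags (1, 2 and 4). The Python sketch below decodes a digit on that assumption. The names are hypothetical and say nothing about how pcretest stores these options internally. A digit that is in range but not listed above is simply decoded by the same rule.

```python
# Assumed flag values, read off the table: 1, 2 and 4 combine to give 3, 6 and 7.
JIT_MODE_BITS = (
    (1, "normal match"),
    (2, "soft partial match"),
    (4, "hard partial match"),
)

def jit_modes(digit=7):
    """Return the JIT compile modes selected by the digit after /S+ or -s+."""
    if not 1 <= digit <= 7:
        raise ValueError("digit must be in the range 1 to 7")
    return [name for bit, name in JIT_MODE_BITS if digit & bit]
```

With this reading, jit_modes(6) gives the soft and hard partial match modes, and jit_modes() with no argument gives all three, matching the default.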
|
|
|
If /S++ is used instead of /S+ (with or without a following digit), the |
|
text "(JIT)" is added to the first output line after a match or no |
|
match when JIT-compiled code was actually used. |
|
|
|
Note that there is also an independent /+ modifier; it must not be |
|
given immediately after /S or /S+ because this will be misinterpreted. |
|
|
|
If JIT studying is successful, the compiled JIT code will automatically |
|
be used when pcre[16]_exec() is run, except when incompatible run-time |
|
options are specified. For more details, see the pcrejit documentation. |
|
See also the \J escape sequence below for a way of setting the size of |
|
the JIT stack. |
|
|
The /T modifier must be followed by a single digit. It causes a spe- |
The /T modifier must be followed by a single digit. It causes a spe- |
cific set of built-in character tables to be passed to pcre[16]_com- |
cific set of built-in character tables to be passed to pcre[16]_com- |
pile(). It is used in the standard PCRE tests to check behaviour with |
pile(). It is used in the standard PCRE tests to check behaviour with |
Line 869 AUTHOR
|
Line 899 AUTHOR
|
|
|
REVISION |
REVISION |
|
|
Last updated: 14 January 2012 | Last updated: 21 February 2012 |
Copyright (c) 1997-2012 University of Cambridge. |
Copyright (c) 1997-2012 University of Cambridge. |