| version 1.1.1.2, 2012/02/21 23:50:25 | version 1.1.1.3, 2012/10/09 09:19:17 |
|---|---|
| Line 111 COMMAND LINE OPTIONS | Line 111 COMMAND LINE OPTIONS |
| size megabytes. | size megabytes. |
| -s or -s+ Behave as if each pattern has the /S modifier; in other | -s or -s+ Behave as if each pattern has the /S modifier; in other |
| words, force each pattern to be studied. If -s+ is used, the | words, force each pattern to be studied. If -s+ is used, all |
| PCRE_STUDY_JIT_COMPILE flag is passed to pcre[16]_study(), | the JIT compile options are passed to pcre[16]_study(), caus- |
| causing just-in-time optimization to be set up if it is | ing just-in-time optimization to be set up if it is avail- |
| available. If the /I or /D option is present on a pattern | able, for both full and partial matching. Specific JIT com- |
| (requesting output about the compiled pattern), information | pile options can be selected by following -s+ with a digit in |
| about the result of studying is not included when studying is | the range 1 to 7, which selects the JIT compile modes as fol- |
| caused only by -s and neither -i nor -d is present on the | lows: |
| command line. This behaviour means that the output from tests | |
| that are run with and without -s should be identical, except | |
| when options that output information about the actual running | |
| of a match are set. | |
| The -M, -t, and -tm options, which give information about | 1 normal match only |
| resources used, are likely to produce different output with | 2 soft partial match only |
| and without -s. Output may also differ if the /C option is | 3 normal match and soft partial match |
| present on an individual pattern. This uses callouts to trace | 4 hard partial match only |
| the matching process, and this may be different between | 6 soft and hard partial match |
| studied and non-studied patterns. If the pattern contains | 7 all three modes (default) |
| (*MARK) items there may also be differences, for the same | |
| reason. The -s command line option can be overridden for spe- | |
| cific patterns that should never be studied (see the /S pat- | |
| tern modifier below). | |
| -t Run each compile, study, and match many times with a timer, | If -s++ is used instead of -s+ (with or without a following |
| and output resulting time per compile or match (in millisec- | digit), the text "(JIT)" is added to the first output line |
| onds). Do not set -m with -t, because you will then get the | after a match or no match when JIT-compiled code was actually |
| size output a zillion times, and the timing will be dis- | used. |
| torted. You can control the number of iterations that are | |
| used for timing by following -t with a number (as a separate | If the /I or /D option is present on a pattern (requesting output about |
| | the compiled pattern), information about the result of studying is not |
| | included when studying is caused only by -s and neither -i nor -d is |
| | present on the command line. This behaviour means that the output from |
| | tests that are run with and without -s should be identical, except when |
| | options that output information about the actual running of a match are |
| | set. |
| | The -M, -t, and -tm options, which give information about resources |
| | used, are likely to produce different output with and without -s. Out- |
| | put may also differ if the /C option is present on an individual pat- |
| | tern. This uses callouts to trace the matching process, and this |
| | may be different between studied and non-studied patterns. If the pat- |
| | tern contains (*MARK) items there may also be differences, for the same |
| | reason. The -s command line option can be overridden for specific pat- |
| | terns that should never be studied (see the /S pattern modifier below). |
| | -t Run each compile, study, and match many times with a timer, |
| | and output resulting time per compile or match (in millisec- |
| | onds). Do not set -m with -t, because you will then get the |
| | size output a zillion times, and the timing will be dis- |
| | torted. You can control the number of iterations that are |
| | used for timing by following -t with a number (as a separate |
| item on the command line). For example, "-t 1000" would iter- | item on the command line). For example, "-t 1000" would iter- |
| ate 1000 times. The default is to iterate 500000 times. | ate 1000 times. The default is to iterate 500000 times. |
| Line 149 COMMAND LINE OPTIONS | Line 163 COMMAND LINE OPTIONS |
| DESCRIPTION | DESCRIPTION |
| If pcretest is given two filename arguments, it reads from the first | If pcretest is given two filename arguments, it reads from the first |
| and writes to the second. If it is given only one filename argument, it | and writes to the second. If it is given only one filename argument, it |
| reads from that file and writes to stdout. Otherwise, it reads from | reads from that file and writes to stdout. Otherwise, it reads from |
| stdin and writes to stdout, and prompts for each line of input, using | stdin and writes to stdout, and prompts for each line of input, using |
| "re>" to prompt for regular expressions, and "data>" to prompt for data | "re>" to prompt for regular expressions, and "data>" to prompt for data |
| lines. | lines. |
| When pcretest is built, a configuration option can specify that it | When pcretest is built, a configuration option can specify that it |
| should be linked with the libreadline library. When this is done, if | should be linked with the libreadline library. When this is done, if |
| the input is from a terminal, it is read using the readline() function. | the input is from a terminal, it is read using the readline() function. |
| This provides line-editing and history facilities. The output from the | This provides line-editing and history facilities. The output from the |
| -help option states whether or not readline() will be used. | -help option states whether or not readline() will be used. |
| The program handles any number of sets of input on a single input file. | The program handles any number of sets of input on a single input file. |
| Each set starts with a regular expression, and continues with any num- | Each set starts with a regular expression, and continues with any num- |
| ber of data lines to be matched against the pattern. | ber of data lines to be matched against the pattern. |
| Each data line is matched separately and independently. If you want to | Each data line is matched separately and independently. If you want to |
| do multi-line matches, you have to use the \n escape sequence (or \r or | do multi-line matches, you have to use the \n escape sequence (or \r or |
| \r\n, etc., depending on the newline setting) in a single line of input | \r\n, etc., depending on the newline setting) in a single line of input |
| to encode the newline sequences. There is no limit on the length of | to encode the newline sequences. There is no limit on the length of |
| data lines; the input buffer is automatically extended if it is too | data lines; the input buffer is automatically extended if it is too |
| small. | small. |
| An empty line signals the end of the data lines, at which point a new | An empty line signals the end of the data lines, at which point a new |
| regular expression is read. The regular expressions are given enclosed | regular expression is read. The regular expressions are given enclosed |
| in any non-alphanumeric delimiters other than backslash, for example: | in any non-alphanumeric delimiters other than backslash, for example: |
| /(a|bc)x+yz/ | /(a|bc)x+yz/ |
| White space before the initial delimiter is ignored. A regular expres- | White space before the initial delimiter is ignored. A regular expres- |
| sion may be continued over several input lines, in which case the new- | sion may be continued over several input lines, in which case the new- |
| line characters are included within it. It is possible to include the | line characters are included within it. It is possible to include the |
| delimiter within the pattern by escaping it, for example | delimiter within the pattern by escaping it, for example |
| /abc\/def/ | /abc\/def/ |
| If you do so, the escape and the delimiter form part of the pattern, | If you do so, the escape and the delimiter form part of the pattern, |
| but since delimiters are always non-alphanumeric, this does not affect | but since delimiters are always non-alphanumeric, this does not affect |
| its interpretation. If the terminating delimiter is immediately fol- | its interpretation. If the terminating delimiter is immediately fol- |
| lowed by a backslash, for example, | lowed by a backslash, for example, |
| /abc/\ | /abc/\ |
| then a backslash is added to the end of the pattern. This is done to | then a backslash is added to the end of the pattern. This is done to |
| provide a way of testing the error condition that arises if a pattern | provide a way of testing the error condition that arises if a pattern |
| finishes with a backslash, because | finishes with a backslash, because |
| /abc\/ | /abc\/ |
| is interpreted as the first line of a pattern that starts with "abc/", | is interpreted as the first line of a pattern that starts with "abc/", |
| causing pcretest to read the next line as a continuation of the regular | causing pcretest to read the next line as a continuation of the regular |
| expression. | expression. |
| PATTERN MODIFIERS | PATTERN MODIFIERS |
| A pattern may be followed by any number of modifiers, which are mostly | A pattern may be followed by any number of modifiers, which are mostly |
| single characters. Following Perl usage, these are referred to below | single characters. Following Perl usage, these are referred to below |
| as, for example, "the /i modifier", even though the delimiter of the | as, for example, "the /i modifier", even though the delimiter of the |
| pattern need not always be a slash, and no slash is used when writing | pattern need not always be a slash, and no slash is used when writing |
| modifiers. White space may appear between the final pattern delimiter | modifiers. White space may appear between the final pattern delimiter |
| and the first modifier, and between the modifiers themselves. | and the first modifier, and between the modifiers themselves. |
| The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE, | The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE, |
| PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when pcre[16]_com- | PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when pcre[16]_com- |
| pile() is called. These four modifier letters have the same effect as | pile() is called. These four modifier letters have the same effect as |
| they do in Perl. For example: | they do in Perl. For example: |
| /caseless/i | /caseless/i |
| The following table shows additional modifiers for setting PCRE com- | The following table shows additional modifiers for setting PCRE com- |
| pile-time options that do not correspond to anything in Perl: | pile-time options that do not correspond to anything in Perl: |
| /8 PCRE_UTF8 ) when using the 8-bit | /8 PCRE_UTF8 ) when using the 8-bit |
| Line 248 PATTERN MODIFIERS | Line 262 PATTERN MODIFIERS |
| /<bsr_anycrlf> PCRE_BSR_ANYCRLF | /<bsr_anycrlf> PCRE_BSR_ANYCRLF |
| /<bsr_unicode> PCRE_BSR_UNICODE | /<bsr_unicode> PCRE_BSR_UNICODE |
| The modifiers that are enclosed in angle brackets are literal strings | The modifiers that are enclosed in angle brackets are literal strings |
| as shown, including the angle brackets, but the letters within can be | as shown, including the angle brackets, but the letters within can be |
| in either case. This example sets multiline matching with CRLF as the | in either case. This example sets multiline matching with CRLF as the |
| line ending sequence: | line ending sequence: |
| /^abc/m<CRLF> | /^abc/m<CRLF> |
| As well as turning on the PCRE_UTF8/16 option, the /8 modifier causes | As well as turning on the PCRE_UTF8/16 option, the /8 modifier causes |
| all non-printing characters in output strings to be printed using the | all non-printing characters in output strings to be printed using the |
| \x{hh...} notation. Otherwise, those less than 0x100 are output in hex | \x{hh...} notation. Otherwise, those less than 0x100 are output in hex |
| without the curly brackets. | without the curly brackets. |
| Full details of the PCRE options are given in the pcreapi documenta- | Full details of the PCRE options are given in the pcreapi documenta- |
| tion. | tion. |
| Finding all matches in a string | Finding all matches in a string |
| Searching for all possible matches within each subject string can be | Searching for all possible matches within each subject string can be |
| requested by the /g or /G modifier. After finding a match, PCRE is | requested by the /g or /G modifier. After finding a match, PCRE is |
| called again to search the remainder of the subject string. The differ- | called again to search the remainder of the subject string. The differ- |
| ence between /g and /G is that the former uses the startoffset argument | ence between /g and /G is that the former uses the startoffset argument |
| to pcre[16]_exec() to start searching at a new point within the entire | to pcre[16]_exec() to start searching at a new point within the entire |
| string (which is in effect what Perl does), whereas the latter passes | string (which is in effect what Perl does), whereas the latter passes |
| over a shortened substring. This makes a difference to the matching | over a shortened substring. This makes a difference to the matching |
| process if the pattern begins with a lookbehind assertion (including \b | process if the pattern begins with a lookbehind assertion (including \b |
| or \B). | or \B). |
| If any call to pcre[16]_exec() in a /g or /G sequence matches an empty | If any call to pcre[16]_exec() in a /g or /G sequence matches an empty |
| string, the next call is done with the PCRE_NOTEMPTY_ATSTART and | string, the next call is done with the PCRE_NOTEMPTY_ATSTART and |
| PCRE_ANCHORED flags set in order to search for another, non-empty, | PCRE_ANCHORED flags set in order to search for another, non-empty, |
| match at the same point. If this second match fails, the start offset | match at the same point. If this second match fails, the start offset |
| is advanced, and the normal match is retried. This imitates the way | is advanced, and the normal match is retried. This imitates the way |
| Perl handles such cases when using the /g modifier or the split() func- | Perl handles such cases when using the /g modifier or the split() func- |
| tion. Normally, the start offset is advanced by one character, but if | tion. Normally, the start offset is advanced by one character, but if |
| the newline convention recognizes CRLF as a newline, and the current | the newline convention recognizes CRLF as a newline, and the current |
| character is CR followed by LF, an advance of two is used. | character is CR followed by LF, an advance of two is used. |
| Other modifiers | Other modifiers |
| There are yet more modifiers for controlling the way pcretest operates. | There are yet more modifiers for controlling the way pcretest operates. |
| The /+ modifier requests that as well as outputting the substring that | The /+ modifier requests that as well as outputting the substring that |
| matched the entire pattern, pcretest should in addition output the | matched the entire pattern, pcretest should in addition output the |
| remainder of the subject string. This is useful for tests where the | remainder of the subject string. This is useful for tests where the |
| subject contains multiple copies of the same substring. If the + modi- | subject contains multiple copies of the same substring. If the + modi- |
| fier appears twice, the same action is taken for captured substrings. | fier appears twice, the same action is taken for captured substrings. |
| In each case the remainder is output on the following line with a plus | In each case the remainder is output on the following line with a plus |
| character following the capture number. Note that this modifier must | character following the capture number. Note that this modifier must |
| not immediately follow the /S modifier because /S+ has another meaning. | not immediately follow the /S modifier because /S+ and /S++ have other |
| | meanings. |
| The /= modifier requests that the values of all potential captured | The /= modifier requests that the values of all potential captured |
| parentheses be output after a match. By default, only those up to the | parentheses be output after a match. By default, only those up to the |
| Line 368 PATTERN MODIFIERS | Line 383 PATTERN MODIFIERS |
| different when the pattern is studied. | different when the pattern is studied. |
| If the /S modifier is immediately followed by a + character, the call | If the /S modifier is immediately followed by a + character, the call |
| to pcre[16]_study() is made with the PCRE_STUDY_JIT_COMPILE option, | to pcre[16]_study() is made with all the JIT study options, requesting |
| requesting just-in-time optimization support if it is available. Note | just-in-time optimization support if it is available, for both normal |
| that there is also a /+ modifier; it must not be given immediately | and partial matching. If you want to restrict the JIT compiling modes, |
| after /S because this will be misinterpreted. If JIT studying is suc- | you can follow /S+ with a digit in the range 1 to 7: |
| cessful, it will automatically be used when pcre[16]_exec() is run, | |
| except when incompatible run-time options are specified. These include | |
| the partial matching options; a complete list is given in the pcrejit | |
| documentation. See also the \J escape sequence below for a way of set- | |
| ting the size of the JIT stack. | |
| | 1 normal match only |
| | 2 soft partial match only |
| | 3 normal match and soft partial match |
| | 4 hard partial match only |
| | 6 soft and hard partial match |
| | 7 all three modes (default) |
| | If /S++ is used instead of /S+ (with or without a following digit), the |
| | text "(JIT)" is added to the first output line after a match or no |
| | match when JIT-compiled code was actually used. |
| | Note that there is also an independent /+ modifier; it must not be |
| | given immediately after /S or /S+ because this will be misinterpreted. |
| | If JIT studying is successful, the compiled JIT code will automatically |
| | be used when pcre[16]_exec() is run, except when incompatible run-time |
| | options are specified. For more details, see the pcrejit documentation. |
| | See also the \J escape sequence below for a way of setting the size of |
| | the JIT stack. |
| The /T modifier must be followed by a single digit. It causes a spe- | The /T modifier must be followed by a single digit. It causes a spe- |
| cific set of built-in character tables to be passed to pcre[16]_com- | cific set of built-in character tables to be passed to pcre[16]_com- |
| pile(). It is used in the standard PCRE tests to check behaviour with | pile(). It is used in the standard PCRE tests to check behaviour with |
| Line 869 AUTHOR | Line 899 AUTHOR |
| REVISION | REVISION |
| Last updated: 14 January 2012 | Last updated: 21 February 2012 |
| Copyright (c) 1997-2012 University of Cambridge. | Copyright (c) 1997-2012 University of Cambridge. |
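
The digit that may follow -s+ (or /S+) in version 1.1.1.3 reads naturally as a three-bit mask: 1 for normal matching, 2 for soft partial matching, and 4 for hard partial matching. The sketch below illustrates that reading in Python. It is not pcretest's own code: the names `MODE_BITS`, `jit_modes`, and `parse_s_option` are hypothetical, and the bit interpretation is inferred from the digit table. Note that the table does not list 5, although the text allows any digit from 1 to 7. Under the bit-mask assumption, 5 would select normal and hard partial matching.

```python
import re

# Assumed bit values, inferred from the digit table in the new revision:
# 1 = normal match, 2 = soft partial match, 4 = hard partial match.
MODE_BITS = [
    (1, "normal match"),
    (2, "soft partial match"),
    (4, "hard partial match"),
]

# Hypothetical grammar: -s, -s+ or -s++, where each "+" form may be
# followed by a single digit in the range 1 to 7.
_S_OPTION = re.compile(r"-s(?:(\+{1,2})([1-7])?)?")


def jit_modes(digit=7):
    """Return the JIT compile modes selected by a digit (default 7 = all)."""
    if not 1 <= digit <= 7:
        raise ValueError(f"JIT mode digit must be in 1..7, got {digit}")
    return [name for bit, name in MODE_BITS if digit & bit]


def parse_s_option(arg):
    """Split a -s style option into the behaviours the text describes."""
    m = _S_OPTION.fullmatch(arg)
    if m is None:
        raise ValueError(f"not a recognised -s option: {arg!r}")
    plus, digit = m.groups()
    if plus is None:
        # Plain -s: study every pattern, but request no JIT compilation.
        return {"study": True, "jit_modes": [], "show_jit_marker": False}
    return {
        "study": True,
        "jit_modes": jit_modes(int(digit) if digit else 7),
        # -s++ additionally asks for the "(JIT)" marker in the output.
        "show_jit_marker": plus == "++",
    }


if __name__ == "__main__":
    for option in ("-s", "-s+", "-s+3", "-s++6"):
        print(option, parse_s_option(option))
```

The same decoding would apply to the /S+ and /S++ pattern modifiers, since the text gives them the same digit table.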