--- embedaddon/pcre/doc/pcretest.txt 2012/02/21 23:50:25 1.1.1.2 +++ embedaddon/pcre/doc/pcretest.txt 2012/10/09 09:19:17 1.1.1.3 @@ -111,35 +111,49 @@ COMMAND LINE OPTIONS size megabytes. -s or -s+ Behave as if each pattern has the /S modifier; in other - words, force each pattern to be studied. If -s+ is used, the - PCRE_STUDY_JIT_COMPILE flag is passed to pcre[16]_study(), - causing just-in-time optimization to be set up if it is - available. If the /I or /D option is present on a pattern - (requesting output about the compiled pattern), information - about the result of studying is not included when studying is - caused only by -s and neither -i nor -d is present on the - command line. This behaviour means that the output from tests - that are run with and without -s should be identical, except - when options that output information about the actual running - of a match are set. + words, force each pattern to be studied. If -s+ is used, all + the JIT compile options are passed to pcre[16]_study(), caus- + ing just-in-time optimization to be set up if it is avail- + able, for both full and partial matching. Specific JIT com- + pile options can be selected by following -s+ with a digit in + the range 1 to 7, which selects the JIT compile modes as fol- + lows: - The -M, -t, and -tm options, which give information about - resources used, are likely to produce different output with - and without -s. Output may also differ if the /C option is - present on an individual pattern. This uses callouts to trace - the the matching process, and this may be different between - studied and non-studied patterns. If the pattern contains - (*MARK) items there may also be differences, for the same - reason. The -s command line option can be overridden for spe- - cific patterns that should never be studied (see the /S pat- - tern modifier below). 
+ 1 normal match only + 2 soft partial match only + 3 normal match and soft partial match + 4 hard partial match only + 6 soft and hard partial match + 7 all three modes (default) - -t Run each compile, study, and match many times with a timer, - and output resulting time per compile or match (in millisec- - onds). Do not set -m with -t, because you will then get the - size output a zillion times, and the timing will be dis- - torted. You can control the number of iterations that are - used for timing by following -t with a number (as a separate + If -s++ is used instead of -s+ (with or without a following + digit), the text "(JIT)" is added to the first output line + after a match or no match when JIT-compiled code was actually + used. + + If the /I or /D option is present on a pattern (requesting output about + the compiled pattern), information about the result of studying is not + included when studying is caused only by -s and neither -i nor -d is + present on the command line. This behaviour means that the output from + tests that are run with and without -s should be identical, except when + options that output information about the actual running of a match are + set. + + The -M, -t, and -tm options, which give information about resources + used, are likely to produce different output with and without -s. Out- + put may also differ if the /C option is present on an individual pat- + tern. This uses callouts to trace the the matching process, and this + may be different between studied and non-studied patterns. If the pat- + tern contains (*MARK) items there may also be differences, for the same + reason. The -s command line option can be overridden for specific pat- + terns that should never be studied (see the /S pattern modifier below). + + -t Run each compile, study, and match many times with a timer, + and output resulting time per compile or match (in millisec- + onds). 
Do not set -m with -t, because you will then get the + size output a zillion times, and the timing will be dis- + torted. You can control the number of iterations that are + used for timing by following -t with a number (as a separate item on the command line). For example, "-t 1000" would iter- ate 1000 times. The default is to iterate 500000 times. @@ -149,78 +163,78 @@ COMMAND LINE OPTIONS DESCRIPTION - If pcretest is given two filename arguments, it reads from the first + If pcretest is given two filename arguments, it reads from the first and writes to the second. If it is given only one filename argument, it - reads from that file and writes to stdout. Otherwise, it reads from - stdin and writes to stdout, and prompts for each line of input, using + reads from that file and writes to stdout. Otherwise, it reads from + stdin and writes to stdout, and prompts for each line of input, using "re>" to prompt for regular expressions, and "data>" to prompt for data lines. - When pcretest is built, a configuration option can specify that it - should be linked with the libreadline library. When this is done, if + When pcretest is built, a configuration option can specify that it + should be linked with the libreadline library. When this is done, if the input is from a terminal, it is read using the readline() function. - This provides line-editing and history facilities. The output from the + This provides line-editing and history facilities. The output from the -help option states whether or not readline() will be used. The program handles any number of sets of input on a single input file. - Each set starts with a regular expression, and continues with any num- + Each set starts with a regular expression, and continues with any num- ber of data lines to be matched against the pattern. - Each data line is matched separately and independently. If you want to + Each data line is matched separately and independently. 
If you want to do multi-line matches, you have to use the \n escape sequence (or \r or \r\n, etc., depending on the newline setting) in a single line of input - to encode the newline sequences. There is no limit on the length of - data lines; the input buffer is automatically extended if it is too + to encode the newline sequences. There is no limit on the length of + data lines; the input buffer is automatically extended if it is too small. - An empty line signals the end of the data lines, at which point a new - regular expression is read. The regular expressions are given enclosed + An empty line signals the end of the data lines, at which point a new + regular expression is read. The regular expressions are given enclosed in any non-alphanumeric delimiters other than backslash, for example: /(a|bc)x+yz/ - White space before the initial delimiter is ignored. A regular expres- - sion may be continued over several input lines, in which case the new- - line characters are included within it. It is possible to include the + White space before the initial delimiter is ignored. A regular expres- + sion may be continued over several input lines, in which case the new- + line characters are included within it. It is possible to include the delimiter within the pattern by escaping it, for example /abc\/def/ - If you do so, the escape and the delimiter form part of the pattern, - but since delimiters are always non-alphanumeric, this does not affect - its interpretation. If the terminating delimiter is immediately fol- + If you do so, the escape and the delimiter form part of the pattern, + but since delimiters are always non-alphanumeric, this does not affect + its interpretation. If the terminating delimiter is immediately fol- lowed by a backslash, for example, /abc/\ - then a backslash is added to the end of the pattern. This is done to - provide a way of testing the error condition that arises if a pattern + then a backslash is added to the end of the pattern. 
This is done to + provide a way of testing the error condition that arises if a pattern finishes with a backslash, because /abc\/ - is interpreted as the first line of a pattern that starts with "abc/", + is interpreted as the first line of a pattern that starts with "abc/", causing pcretest to read the next line as a continuation of the regular expression. PATTERN MODIFIERS - A pattern may be followed by any number of modifiers, which are mostly - single characters. Following Perl usage, these are referred to below - as, for example, "the /i modifier", even though the delimiter of the - pattern need not always be a slash, and no slash is used when writing - modifiers. White space may appear between the final pattern delimiter + A pattern may be followed by any number of modifiers, which are mostly + single characters. Following Perl usage, these are referred to below + as, for example, "the /i modifier", even though the delimiter of the + pattern need not always be a slash, and no slash is used when writing + modifiers. White space may appear between the final pattern delimiter and the first modifier, and between the modifiers themselves. The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when pcre[16]_com- - pile() is called. These four modifier letters have the same effect as + pile() is called. These four modifier letters have the same effect as they do in Perl. For example: /caseless/i - The following table shows additional modifiers for setting PCRE com- + The following table shows additional modifiers for setting PCRE com- pile-time options that do not correspond to anything in Perl: /8 PCRE_UTF8 ) when using the 8-bit @@ -248,55 +262,56 @@ PATTERN MODIFIERS /<bsr_anycrlf> PCRE_BSR_ANYCRLF /<bsr_unicode> PCRE_BSR_UNICODE - The modifiers that are enclosed in angle brackets are literal strings - as shown, including the angle brackets, but the letters within can be - in either case. 
This example sets multiline matching with CRLF as the + The modifiers that are enclosed in angle brackets are literal strings + as shown, including the angle brackets, but the letters within can be + in either case. This example sets multiline matching with CRLF as the line ending sequence: /^abc/m<CRLF> - As well as turning on the PCRE_UTF8/16 option, the /8 modifier causes - all non-printing characters in output strings to be printed using the - \x{hh...} notation. Otherwise, those less than 0x100 are output in hex + As well as turning on the PCRE_UTF8/16 option, the /8 modifier causes + all non-printing characters in output strings to be printed using the + \x{hh...} notation. Otherwise, those less than 0x100 are output in hex without the curly brackets. - Full details of the PCRE options are given in the pcreapi documenta- + Full details of the PCRE options are given in the pcreapi documenta- tion. Finding all matches in a string - Searching for all possible matches within each subject string can be - requested by the /g or /G modifier. After finding a match, PCRE is + Searching for all possible matches within each subject string can be + requested by the /g or /G modifier. After finding a match, PCRE is called again to search the remainder of the subject string. The differ- ence between /g and /G is that the former uses the startoffset argument - to pcre[16]_exec() to start searching at a new point within the entire - string (which is in effect what Perl does), whereas the latter passes - over a shortened substring. This makes a difference to the matching + to pcre[16]_exec() to start searching at a new point within the entire + string (which is in effect what Perl does), whereas the latter passes + over a shortened substring. This makes a difference to the matching process if the pattern begins with a lookbehind assertion (including \b or \B). 
- If any call to pcre[16]_exec() in a /g or /G sequence matches an empty - string, the next call is done with the PCRE_NOTEMPTY_ATSTART and - PCRE_ANCHORED flags set in order to search for another, non-empty, - match at the same point. If this second match fails, the start offset - is advanced, and the normal match is retried. This imitates the way + If any call to pcre[16]_exec() in a /g or /G sequence matches an empty + string, the next call is done with the PCRE_NOTEMPTY_ATSTART and + PCRE_ANCHORED flags set in order to search for another, non-empty, + match at the same point. If this second match fails, the start offset + is advanced, and the normal match is retried. This imitates the way Perl handles such cases when using the /g modifier or the split() func- - tion. Normally, the start offset is advanced by one character, but if - the newline convention recognizes CRLF as a newline, and the current + tion. Normally, the start offset is advanced by one character, but if + the newline convention recognizes CRLF as a newline, and the current character is CR followed by LF, an advance of two is used. Other modifiers There are yet more modifiers for controlling the way pcretest operates. - The /+ modifier requests that as well as outputting the substring that - matched the entire pattern, pcretest should in addition output the - remainder of the subject string. This is useful for tests where the - subject contains multiple copies of the same substring. If the + modi- - fier appears twice, the same action is taken for captured substrings. - In each case the remainder is output on the following line with a plus - character following the capture number. Note that this modifier must - not immediately follow the /S modifier because /S+ has another meaning. + The /+ modifier requests that as well as outputting the substring that + matched the entire pattern, pcretest should in addition output the + remainder of the subject string. 
This is useful for tests where the + subject contains multiple copies of the same substring. If the + modi- + fier appears twice, the same action is taken for captured substrings. + In each case the remainder is output on the following line with a plus + character following the capture number. Note that this modifier must + not immediately follow the /S modifier because /S+ and /S++ have other + meanings. The /= modifier requests that the values of all potential captured parentheses be output after a match. By default, only those up to the @@ -368,16 +383,31 @@ PATTERN MODIFIERS different when the pattern is studied. If the /S modifier is immediately followed by a + character, the call - to pcre[16]_study() is made with the PCRE_STUDY_JIT_COMPILE option, - requesting just-in-time optimization support if it is available. Note - that there is also a /+ modifier; it must not be given immediately - after /S because this will be misinterpreted. If JIT studying is suc- - cessful, it will automatically be used when pcre[16]_exec() is run, - except when incompatible run-time options are specified. These include - the partial matching options; a complete list is given in the pcrejit - documentation. See also the \J escape sequence below for a way of set- - ting the size of the JIT stack. + to pcre[16]_study() is made with all the JIT study options, requesting + just-in-time optimization support if it is available, for both normal + and partial matching. If you want to restrict the JIT compiling modes, + you can follow /S+ with a digit in the range 1 to 7: + 1 normal match only + 2 soft partial match only + 3 normal match and soft partial match + 4 hard partial match only + 6 soft and hard partial match + 7 all three modes (default) + + If /S++ is used instead of /S+ (with or without a following digit), the + text "(JIT)" is added to the first output line after a match or no + match when JIT-compiled code was actually used. 
+ + Note that there is also an independent /+ modifier; it must not be + given immediately after /S or /S+ because this will be misinterpreted. + + If JIT studying is successful, the compiled JIT code will automatically + be used when pcre[16]_exec() is run, except when incompatible run-time + options are specified. For more details, see the pcrejit documentation. + See also the \J escape sequence below for a way of setting the size of + the JIT stack. + The /T modifier must be followed by a single digit. It causes a spe- cific set of built-in character tables to be passed to pcre[16]_com- pile(). It is used in the standard PCRE tests to check behaviour with @@ -869,5 +899,5 @@ AUTHOR REVISION - Last updated: 14 January 2012 + Last updated: 21 February 2012 Copyright (c) 1997-2012 University of Cambridge.
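
The digit table that this revision adds for -s+ and /S+ can be read as a small lookup from digit to JIT compile modes. The sketch below is not part of pcretest; the names JIT_MODES and jit_modes are hypothetical and exist only for illustration. It assumes that only the digits listed in the table are valid and rejects anything else, which may differ from what pcretest itself does with an unlisted digit.

```python
# Illustrative only: mirrors the digit table documented for -s+ and /S+.
# JIT_MODES and jit_modes are hypothetical names, not part of PCRE or pcretest.
JIT_MODES = {
    1: ("normal",),
    2: ("soft partial",),
    3: ("normal", "soft partial"),
    4: ("hard partial",),
    6: ("soft partial", "hard partial"),
    7: ("normal", "soft partial", "hard partial"),
}


def jit_modes(digit=None):
    """Return the JIT compile modes selected by the digit after -s+ or /S+.

    With no digit, the documented default (7, all three modes) is used.
    Assumption: digits that the table does not list are treated as errors.
    """
    if digit is None:
        digit = 7
    if digit not in JIT_MODES:
        raise ValueError("digit %r is not listed in the -s+ table" % (digit,))
    return JIT_MODES[digit]


if __name__ == "__main__":
    for d in (None, 1, 3, 6):
        print(d, jit_modes(d))
```

For example, jit_modes(3) gives the pair of modes described as "normal match and soft partial match", and calling jit_modes() with no argument corresponds to plain -s+ or /S+.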