| version 1.1.1.2, 2012/02/21 23:50:25 | version 1.1.1.3, 2012/10/09 09:19:17 | 
| Line 111  COMMAND LINE OPTIONS | Line 111  COMMAND LINE OPTIONS | 
 | size megabytes. | size megabytes. | 
 |  |  | 
 | -s or -s+ Behave as if each pattern  has  the  /S  modifier;  in  other | -s or -s+ Behave as if each pattern  has  the  /S  modifier;  in  other | 
| words,  force each pattern to be studied. If -s+ is used, the | words,  force each pattern to be studied. If -s+ is used, all | 
| PCRE_STUDY_JIT_COMPILE flag is  passed  to  pcre[16]_study(), | the JIT compile options are passed to pcre[16]_study(), caus- | 
| causing  just-in-time  optimization  to  be  set  up if it is | ing  just-in-time  optimization  to be set up if it is avail- | 
| available. If the /I or /D option is  present  on  a  pattern | able, for both full and partial matching. Specific  JIT  com- | 
| (requesting  output  about the compiled pattern), information | pile options can be selected by following -s+ with a digit in | 
| about the result of studying is not included when studying is | the range 1 to 7, which selects the JIT compile modes as fol- | 
| caused  only  by  -s  and neither -i nor -d is present on the | lows: | 
| command line. This behaviour means that the output from tests |  | 
| that  are run with and without -s should be identical, except |  | 
| when options that output information about the actual running |  | 
| of a match are set. |  | 
 |  |  | 
| The  -M,  -t,  and  -tm options, which give information about | 1  normal match only | 
| resources used, are likely to produce different  output  with | 2  soft partial match only | 
| and  without  -s.  Output may also differ if the /C option is | 3  normal match and soft partial match | 
| present on an individual pattern. This uses callouts to trace | 4  hard partial match only | 
| the  matching  process,  and  this may be different between | 6  soft and hard partial match | 
| studied and non-studied patterns.  If  the  pattern  contains | 7  all three modes (default) | 
| (*MARK)  items  there  may  also be differences, for the same |  | 
| reason. The -s command line option can be overridden for spe- |  | 
| cific  patterns that should never be studied (see the /S pat- |  | 
| tern modifier below). |  | 
 |  |  | 
| -t        Run each compile, study, and match many times with  a  timer, | If  -s++  is used instead of -s+ (with or without a following | 
| and  output resulting time per compile or match (in millisec- | digit), the text "(JIT)" is added to the  first  output  line | 
| onds). Do not set -m with -t, because you will then  get  the | after a match or no match when JIT-compiled code was actually | 
| size  output  a  zillion  times,  and the timing will be dis- | used. | 
| torted. You can control the number  of  iterations  that  are |  | 
| used  for timing by following -t with a number (as a separate | If the /I or /D option is present on a pattern (requesting output about | 
|  | the  compiled pattern), information about the result of studying is not | 
|  | included when studying is caused only by -s and neither -i  nor  -d  is | 
|  | present  on the command line. This behaviour means that the output from | 
|  | tests that are run with and without -s should be identical, except when | 
|  | options that output information about the actual running of a match are | 
|  | set. | 
|  |  | 
|  | The -M, -t, and -tm options, which  give  information  about  resources | 
|  | used,  are likely to produce different output with and without -s. Out- | 
|  | put may also differ if the /C option is present on an  individual  pat- | 
|  | tern.  This  uses  callouts  to  trace the matching process, and this | 
|  | may be different between studied and non-studied patterns. If the  pat- | 
|  | tern contains (*MARK) items there may also be differences, for the same | 
|  | reason. The -s command line option can be overridden for specific  pat- | 
|  | terns that should never be studied (see the /S pattern modifier below). | 
|  |  | 
|  | -t        Run  each  compile, study, and match many times with a timer, | 
|  | and output resulting time per compile or match (in  millisec- | 
|  | onds).  Do  not set -m with -t, because you will then get the | 
|  | size output a zillion times, and  the  timing  will  be  dis- | 
|  | torted.  You  can  control  the number of iterations that are | 
|  | used for timing by following -t with a number (as a  separate | 
 | item on the command line). For example, "-t 1000" would iter- | item on the command line). For example, "-t 1000" would iter- | 
 | ate 1000 times. The default is to iterate 500000 times. | ate 1000 times. The default is to iterate 500000 times. | 
 |  |  | 
| Line 149  COMMAND LINE OPTIONS | Line 163  COMMAND LINE OPTIONS | 
 |  |  | 
 | DESCRIPTION | DESCRIPTION | 
 |  |  | 
| If pcretest is given two filename arguments, it reads  from  the  first | If  pcretest  is  given two filename arguments, it reads from the first | 
 | and writes to the second. If it is given only one filename argument, it | and writes to the second. If it is given only one filename argument, it | 
| reads from that file and writes to stdout.  Otherwise,  it  reads  from | reads  from  that  file  and writes to stdout. Otherwise, it reads from | 
| stdin  and  writes to stdout, and prompts for each line of input, using | stdin and writes to stdout, and prompts for each line of  input,  using | 
 | "re>" to prompt for regular expressions, and "data>" to prompt for data | "re>" to prompt for regular expressions, and "data>" to prompt for data | 
 | lines. | lines. | 
 |  |  | 
| When  pcretest  is  built,  a  configuration option can specify that it | When pcretest is built, a configuration  option  can  specify  that  it | 
| should be linked with the libreadline library. When this  is  done,  if | should  be  linked  with the libreadline library. When this is done, if | 
 | the input is from a terminal, it is read using the readline() function. | the input is from a terminal, it is read using the readline() function. | 
| This provides line-editing and history facilities. The output from  the | This  provides line-editing and history facilities. The output from the | 
 | -help option states whether or not readline() will be used. | -help option states whether or not readline() will be used. | 
 |  |  | 
 | The program handles any number of sets of input on a single input file. | The program handles any number of sets of input on a single input file. | 
| Each set starts with a regular expression, and continues with any  num- | Each  set starts with a regular expression, and continues with any num- | 
 | ber of data lines to be matched against the pattern. | ber of data lines to be matched against the pattern. | 
 |  |  | 
| Each  data line is matched separately and independently. If you want to | Each data line is matched separately and independently. If you want  to | 
 | do multi-line matches, you have to use the \n escape sequence (or \r or | do multi-line matches, you have to use the \n escape sequence (or \r or | 
 | \r\n, etc., depending on the newline setting) in a single line of input | \r\n, etc., depending on the newline setting) in a single line of input | 
| to encode the newline sequences. There is no limit  on  the  length  of | to  encode  the  newline  sequences. There is no limit on the length of | 
| data  lines;  the  input  buffer is automatically extended if it is too | data lines; the input buffer is automatically extended  if  it  is  too | 
 | small. | small. | 
 |  |  | 
| An empty line signals the end of the data lines, at which point  a  new | An  empty  line signals the end of the data lines, at which point a new | 
| regular  expression is read. The regular expressions are given enclosed | regular expression is read. The regular expressions are given  enclosed | 
 | in any non-alphanumeric delimiters other than backslash, for example: | in any non-alphanumeric delimiters other than backslash, for example: | 
 |  |  | 
 | /(a|bc)x+yz/ | /(a|bc)x+yz/ | 
 |  |  | 
| White space before the initial delimiter is ignored. A regular  expres- | White  space before the initial delimiter is ignored. A regular expres- | 
| sion  may be continued over several input lines, in which case the new- | sion may be continued over several input lines, in which case the  new- | 
| line characters are included within it. It is possible to  include  the | line  characters  are included within it. It is possible to include the | 
 | delimiter within the pattern by escaping it, for example | delimiter within the pattern by escaping it, for example | 
 |  |  | 
 | /abc\/def/ | /abc\/def/ | 
 |  |  | 
| If  you  do  so, the escape and the delimiter form part of the pattern, | If you do so, the escape and the delimiter form part  of  the  pattern, | 
| but since delimiters are always non-alphanumeric, this does not  affect | but  since delimiters are always non-alphanumeric, this does not affect | 
| its  interpretation.   If the terminating delimiter is immediately fol- | its interpretation.  If the terminating delimiter is  immediately  fol- | 
 | lowed by a backslash, for example, | lowed by a backslash, for example, | 
 |  |  | 
 | /abc/\ | /abc/\ | 
 |  |  | 
| then a backslash is added to the end of the pattern. This  is  done  to | then  a  backslash  is added to the end of the pattern. This is done to | 
| provide  a  way of testing the error condition that arises if a pattern | provide a way of testing the error condition that arises if  a  pattern | 
 | finishes with a backslash, because | finishes with a backslash, because | 
 |  |  | 
 | /abc\/ | /abc\/ | 
 |  |  | 
| is interpreted as the first line of a pattern that starts with  "abc/", | is  interpreted as the first line of a pattern that starts with "abc/", | 
 | causing pcretest to read the next line as a continuation of the regular | causing pcretest to read the next line as a continuation of the regular | 
 | expression. | expression. | 
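
The delimiter rules described above can be sketched as a small Python function. This is only an illustration of the behaviour the text describes, not pcretest's own code; the name split_pattern and its return convention are hypothetical.

```python
def split_pattern(line):
    """Split one input line into (pattern, modifiers).

    Returns None when no closing delimiter is found, meaning the
    pattern continues on the next input line.
    """
    line = line.lstrip()              # white space before the delimiter is ignored
    if not line:
        raise ValueError("no pattern on this line")
    delim = line[0]
    if delim.isalnum() or delim == "\\":
        raise ValueError("delimiter must be non-alphanumeric and not a backslash")
    i = 1
    while i < len(line):
        if line[i] == "\\":
            i += 2                    # the escape and the escaped character both stay in the pattern
            continue
        if line[i] == delim:
            pattern, rest = line[1:i], line[i + 1:]
            if rest.startswith("\\"):
                pattern += "\\"       # "/abc/\" adds a backslash to the end of the pattern
                rest = rest[1:]
            return pattern, rest.strip()
        i += 1
    return None                       # e.g. "/abc\/": the pattern is not finished yet


print(split_pattern("/abc\\/def/"))
print(split_pattern("/abc/\\"))
print(split_pattern("/abc\\/"))
```

With these assumptions, "/abc\/def/" yields the pattern abc\/def, "/abc/\" yields abc followed by a backslash, and "/abc\/" is reported as unfinished, so the caller would read a continuation line.
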
 |  |  | 
 |  |  | 
 | PATTERN MODIFIERS | PATTERN MODIFIERS | 
 |  |  | 
| A pattern may be followed by any number of modifiers, which are  mostly | A  pattern may be followed by any number of modifiers, which are mostly | 
| single  characters.  Following  Perl usage, these are referred to below | single characters. Following Perl usage, these are  referred  to  below | 
| as, for example, "the /i modifier", even though the  delimiter  of  the | as,  for  example,  "the /i modifier", even though the delimiter of the | 
| pattern  need  not always be a slash, and no slash is used when writing | pattern need not always be a slash, and no slash is used  when  writing | 
| modifiers. White space may appear between the final  pattern  delimiter | modifiers.  White  space may appear between the final pattern delimiter | 
 | and the first modifier, and between the modifiers themselves. | and the first modifier, and between the modifiers themselves. | 
 |  |  | 
 | The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE, | The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE, | 
 | PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when pcre[16]_com- | PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when pcre[16]_com- | 
| pile()  is  called. These four modifier letters have the same effect as | pile() is called. These four modifier letters have the same  effect  as | 
 | they do in Perl. For example: | they do in Perl. For example: | 
 |  |  | 
 | /caseless/i | /caseless/i | 
 |  |  | 
| The following table shows additional modifiers for  setting  PCRE  com- | The  following  table  shows additional modifiers for setting PCRE com- | 
 | pile-time options that do not correspond to anything in Perl: | pile-time options that do not correspond to anything in Perl: | 
 |  |  | 
 | /8              PCRE_UTF8           ) when using the 8-bit | /8              PCRE_UTF8           ) when using the 8-bit | 
| Line 248  PATTERN MODIFIERS | Line 262  PATTERN MODIFIERS | 
 | /<bsr_anycrlf>  PCRE_BSR_ANYCRLF | /<bsr_anycrlf>  PCRE_BSR_ANYCRLF | 
 | /<bsr_unicode>  PCRE_BSR_UNICODE | /<bsr_unicode>  PCRE_BSR_UNICODE | 
 |  |  | 
| The  modifiers  that are enclosed in angle brackets are literal strings | The modifiers that are enclosed in angle brackets are  literal  strings | 
| as shown, including the angle brackets, but the letters within  can  be | as  shown,  including the angle brackets, but the letters within can be | 
| in  either case.  This example sets multiline matching with CRLF as the | in either case.  This example sets multiline matching with CRLF as  the | 
 | line ending sequence: | line ending sequence: | 
 |  |  | 
 | /^abc/m<CRLF> | /^abc/m<CRLF> | 
 |  |  | 
| As well as turning on the PCRE_UTF8/16 option, the /8  modifier  causes | As  well  as turning on the PCRE_UTF8/16 option, the /8 modifier causes | 
| all  non-printing  characters in output strings to be printed using the | all non-printing characters in output strings to be printed  using  the | 
| \x{hh...} notation. Otherwise, those less than 0x100 are output in  hex | \x{hh...}  notation. Otherwise, those less than 0x100 are output in hex | 
 | without the curly brackets. | without the curly brackets. | 
 |  |  | 
| Full  details  of  the PCRE options are given in the pcreapi documenta- | Full details of the PCRE options are given in  the  pcreapi  documenta- | 
 | tion. | tion. | 
 |  |  | 
 | Finding all matches in a string | Finding all matches in a string | 
 |  |  | 
| Searching for all possible matches within each subject  string  can  be | Searching  for  all  possible matches within each subject string can be | 
| requested  by  the  /g  or  /G modifier. After finding a match, PCRE is | requested by the /g or /G modifier. After  finding  a  match,  PCRE  is | 
 | called again to search the remainder of the subject string. The differ- | called again to search the remainder of the subject string. The differ- | 
 | ence between /g and /G is that the former uses the startoffset argument | ence between /g and /G is that the former uses the startoffset argument | 
| to pcre[16]_exec() to start searching at a new point within the  entire | to  pcre[16]_exec() to start searching at a new point within the entire | 
| string  (which  is in effect what Perl does), whereas the latter passes | string (which is in effect what Perl does), whereas the  latter  passes | 
| over a shortened substring. This makes a  difference  to  the  matching | over  a  shortened  substring.  This makes a difference to the matching | 
 | process if the pattern begins with a lookbehind assertion (including \b | process if the pattern begins with a lookbehind assertion (including \b | 
 | or \B). | or \B). | 
 |  |  | 
| If any call to pcre[16]_exec() in a /g or /G sequence matches an  empty | If  any call to pcre[16]_exec() in a /g or /G sequence matches an empty | 
| string,  the  next  call  is  done  with  the PCRE_NOTEMPTY_ATSTART and | string, the next  call  is  done  with  the  PCRE_NOTEMPTY_ATSTART  and | 
| PCRE_ANCHORED flags set in order  to  search  for  another,  non-empty, | PCRE_ANCHORED  flags  set  in  order  to search for another, non-empty, | 
| match  at  the same point. If this second match fails, the start offset | match at the same point. If this second match fails, the  start  offset | 
| is advanced, and the normal match is retried.  This  imitates  the  way | is  advanced,  and  the  normal match is retried. This imitates the way | 
 | Perl handles such cases when using the /g modifier or the split() func- | Perl handles such cases when using the /g modifier or the split() func- | 
| tion. Normally, the start offset is advanced by one character,  but  if | tion.  Normally,  the start offset is advanced by one character, but if | 
| the  newline  convention  recognizes CRLF as a newline, and the current | the newline convention recognizes CRLF as a newline,  and  the  current | 
 | character is CR followed by LF, an advance of two is used. | character is CR followed by LF, an advance of two is used. | 
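
The last step of the paragraph above, advancing the start offset after the anchored non-empty retry has failed, can be sketched as follows. This is a minimal Python illustration, not the pcretest source; the function name and the crlf_is_newline parameter are assumptions made for the example, and offsets are counted in characters.

```python
def next_start_offset(subject, offset, crlf_is_newline):
    """Return the offset at which the normal match is retried.

    The offset normally advances by one character, but by two when the
    newline convention recognizes CRLF and the current character is a CR
    followed by LF.
    """
    if crlf_is_newline and subject[offset:offset + 2] == "\r\n":
        return offset + 2
    return offset + 1


print(next_start_offset("a\r\nb", 1, True))    # 3
print(next_start_offset("a\r\nb", 1, False))   # 2
```

Skipping both characters keeps the next attempt from starting in the middle of a CRLF pair.
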
 |  |  | 
 | Other modifiers | Other modifiers | 
 |  |  | 
 | There are yet more modifiers for controlling the way pcretest operates. | There are yet more modifiers for controlling the way pcretest operates. | 
 |  |  | 
| The /+ modifier requests that as well as outputting the substring  that | The  /+ modifier requests that as well as outputting the substring that | 
| matched  the  entire  pattern,  pcretest  should in addition output the | matched the entire pattern, pcretest  should  in  addition  output  the | 
| remainder of the subject string. This is useful  for  tests  where  the | remainder  of  the  subject  string. This is useful for tests where the | 
| subject  contains multiple copies of the same substring. If the + modi- | subject contains multiple copies of the same substring. If the +  modi- | 
| fier appears twice, the same action is taken for  captured  substrings. | fier  appears  twice, the same action is taken for captured substrings. | 
| In  each case the remainder is output on the following line with a plus | In each case the remainder is output on the following line with a  plus | 
| character following the capture number. Note that  this  modifier  must | character  following  the  capture number. Note that this modifier must | 
| not immediately follow the /S modifier because /S+ has another meaning. | not immediately follow the /S modifier because /S+ and /S++ have  other | 
|  | meanings. | 
 |  |  | 
 | The  /=  modifier  requests  that  the values of all potential captured | The  /=  modifier  requests  that  the values of all potential captured | 
 | parentheses be output after a match. By default, only those up  to  the | parentheses be output after a match. By default, only those up  to  the | 
| Line 368  PATTERN MODIFIERS | Line 383  PATTERN MODIFIERS | 
 | different when the pattern is studied. | different when the pattern is studied. | 
 |  |  | 
 | If the /S modifier is immediately followed by a + character,  the  call | If the /S modifier is immediately followed by a + character,  the  call | 
| to  pcre[16]_study()  is  made  with the PCRE_STUDY_JIT_COMPILE option, | to  pcre[16]_study() is made with all the JIT study options, requesting | 
| requesting just-in-time optimization support if it is  available.  Note | just-in-time optimization support if it is available, for  both  normal | 
| that  there  is  also  a  /+ modifier; it must not be given immediately | and  partial matching. If you want to restrict the JIT compiling modes, | 
| after /S because this will be misinterpreted. If JIT studying  is  suc- | you can follow /S+ with a digit in the range 1 to 7: | 
| cessful,  it  will  automatically  be used when pcre[16]_exec() is run, |  | 
| except when incompatible run-time options are specified. These  include |  | 
| the  partial  matching options; a complete list is given in the pcrejit |  | 
| documentation. See also the \J escape sequence below for a way of  set- |  | 
| ting the size of the JIT stack. |  | 
 |  |  | 
 |  | 1  normal match only | 
 |  | 2  soft partial match only | 
 |  | 3  normal match and soft partial match | 
 |  | 4  hard partial match only | 
 |  | 6  soft and hard partial match | 
 |  | 7  all three modes (default) | 
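
The digits in the table above read naturally as a sum of three bit values (1 for normal, 2 for soft partial, 4 for hard partial). The Python sketch below decodes a digit on that assumption; the bit assignment is inferred from the table rather than stated in the text, and the names are illustrative only.

```python
# Bit values inferred from the table (3 = 1 + 2, 6 = 2 + 4, 7 = 1 + 2 + 4).
NORMAL = 1
SOFT_PARTIAL = 2
HARD_PARTIAL = 4


def jit_modes(digit=7):
    """Return the matching modes selected by the digit after -s+ or /S+."""
    if not 1 <= digit <= 7:
        raise ValueError("digit must be in the range 1 to 7")
    names = [
        (NORMAL, "normal"),
        (SOFT_PARTIAL, "soft partial"),
        (HARD_PARTIAL, "hard partial"),
    ]
    return [name for bit, name in names if digit & bit]


print(jit_modes(3))   # ['normal', 'soft partial']
print(jit_modes(6))   # ['soft partial', 'hard partial']
print(jit_modes())    # all three modes, the default
```

The table does not list the digit 5; under the bitmask reading it would select normal and hard partial matching, but that is an inference, not something the text confirms.
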
 |  |  | 
 |  | If /S++ is used instead of /S+ (with or without a following digit), the | 
 |  | text  "(JIT)"  is  added  to  the first output line after a match or no | 
 |  | match when JIT-compiled code was actually used. | 
 |  |  | 
 |  | Note that there is also an independent /+  modifier;  it  must  not  be | 
 |  | given immediately after /S or /S+ because this will be misinterpreted. | 
 |  |  | 
 |  | If JIT studying is successful, the compiled JIT code will automatically | 
 |  | be used when pcre[16]_exec() is run, except when incompatible  run-time | 
 |  | options are specified. For more details, see the pcrejit documentation. | 
 |  | See also the \J escape sequence below for a way of setting the size  of | 
 |  | the JIT stack. | 
 |  |  | 
 | The  /T  modifier  must be followed by a single digit. It causes a spe- | The  /T  modifier  must be followed by a single digit. It causes a spe- | 
 | cific set of built-in character tables to be  passed  to  pcre[16]_com- | cific set of built-in character tables to be  passed  to  pcre[16]_com- | 
 | pile().  It  is used in the standard PCRE tests to check behaviour with | pile().  It  is used in the standard PCRE tests to check behaviour with | 
| Line 869  AUTHOR | Line 899  AUTHOR | 
 |  |  | 
 | REVISION | REVISION | 
 |  |  | 
| Last updated: 14 January 2012 | Last updated: 21 February 2012 | 
 | Copyright (c) 1997-2012 University of Cambridge. | Copyright (c) 1997-2012 University of Cambridge. |