Diff for /embedaddon/pcre/doc/pcre.txt between versions 1.1.1.4 and 1.1.1.5

version 1.1.1.4, 2013/07/22 08:25:56 version 1.1.1.5, 2014/06/15 19:46:04
Line 53  INTRODUCTION Line 53  INTRODUCTION
        5.12, including support for UTF-8/16/32  encoded  strings  and  Unicode         5.12, including support for UTF-8/16/32  encoded  strings  and  Unicode
        general  category  properties. However, UTF-8/16/32 and Unicode support         general  category  properties. However, UTF-8/16/32 and Unicode support
        has to be explicitly enabled; it is not the default. The Unicode tables         has to be explicitly enabled; it is not the default. The Unicode tables
       correspond to Unicode release 6.2.0.       correspond to Unicode release 6.3.0.
   
        In  addition to the Perl-compatible matching function, PCRE contains an         In  addition to the Perl-compatible matching function, PCRE contains an
        alternative function that matches the same compiled patterns in a  dif-         alternative function that matches the same compiled patterns in a  dif-
Line 532  PCRE 32-BIT API BASIC FUNCTIONS Line 532  PCRE 32-BIT API BASIC FUNCTIONS
   
        pcre32 *pcre32_compile2(PCRE_SPTR32 pattern, int options,         pcre32 *pcre32_compile2(PCRE_SPTR32 pattern, int options,
             int *errorcodeptr,              int *errorcodeptr,
             const char **errptr, int *erroffset,  
             const unsigned char *tableptr);              const unsigned char *tableptr);
   
        pcre32_extra *pcre32_study(const pcre32 *code, int options,         pcre32_extra *pcre32_study(const pcre32 *code, int options,
Line 1458  THE ALTERNATIVE MATCHING ALGORITHM Line 1457  THE ALTERNATIVE MATCHING ALGORITHM
        at the fifth character of the subject. The algorithm does not automati-         at the fifth character of the subject. The algorithm does not automati-
        cally move on to find matches that start at later positions.         cally move on to find matches that start at later positions.
   
          PCRE's  "auto-possessification" optimization usually applies to charac-
          ter repeats at the end of a pattern (as well as internally). For  exam-
          ple, the pattern "a\d+" is compiled as if it were "a\d++" because there
          is no point even considering the possibility of backtracking  into  the
          repeated  digits.  For  DFA matching, this means that only one possible
          match is found. If you really do want multiple matches in  such  cases,
          either use an ungreedy repeat ("a\d+?") or set the PCRE_NO_AUTO_POSSESS
          option when compiling.
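
          The effect described above can be sketched in plain C. This is a
          hand-written simulation for the single pattern "a\d+", not PCRE
          code; the function name and its arguments are hypothetical. It
          lists the lengths of all matches that start at the beginning of
          the subject, longest first, and reports only the longest one when
          the repeat is treated as possessive.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hand-written simulation (NOT PCRE code) of matching "a\d+" at the start
 * of subject. It records the length of every possible match, longest
 * first.
 *
 * If possessive is non-zero the repeat behaves like "\d++": only the
 * longest run of digits is accepted, so at most one length is recorded.
 *
 * Returns the number of lengths written to out (at most max_out).
 */
size_t a_digits_matches(const char *subject, int possessive,
                        size_t *out, size_t max_out)
{
    size_t ndigits = 0, count = 0, len;

    if (subject[0] != 'a')
        return 0;
    while (isdigit((unsigned char)subject[1 + ndigits]))
        ndigits++;
    if (ndigits == 0)
        return 0;

    /* Longest match first: 'a' plus all digits, then one digit fewer. */
    for (len = 1 + ndigits; len >= 2 && count < max_out; len--) {
        out[count++] = len;
        if (possessive)
            break;              /* the digits cannot be given back */
    }
    return count;
}
```

          For the subject "a123" the non-possessive form yields three
          matches and the possessive form yields one, which mirrors the
          difference that PCRE_NO_AUTO_POSSESS makes for DFA matching.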
   
        There are a number of features of PCRE regular expressions that are not         There are a number of features of PCRE regular expressions that are not
        supported by the alternative matching algorithm. They are as follows:         supported by the alternative matching algorithm. They are as follows:
   
       1. Because the algorithm finds all  possible  matches,  the  greedy  or       1.  Because  the  algorithm  finds  all possible matches, the greedy or
       ungreedy  nature  of repetition quantifiers is not relevant. Greedy and       ungreedy nature of repetition quantifiers is not relevant.  Greedy  and
        ungreedy quantifiers are treated in exactly the same way. However, pos-         ungreedy quantifiers are treated in exactly the same way. However, pos-
       sessive  quantifiers can make a difference when what follows could also       sessive quantifiers can make a difference when what follows could  also
        match what is quantified, for example in a pattern like this:         match what is quantified, for example in a pattern like this:
   
          ^a++\w!           ^a++\w!
   
       This pattern matches "aaab!" but not "aaa!", which would be matched  by       This  pattern matches "aaab!" but not "aaa!", which would be matched by
       a  non-possessive quantifier. Similarly, if an atomic group is present,       a non-possessive quantifier. Similarly, if an atomic group is  present,
       it is matched as if it were a standalone pattern at the current  point,       it  is matched as if it were a standalone pattern at the current point,
       and  the  longest match is then "locked in" for the rest of the overall       and the longest match is then "locked in" for the rest of  the  overall
        pattern.         pattern.
   
        2. When dealing with multiple paths through the tree simultaneously, it         2. When dealing with multiple paths through the tree simultaneously, it
       is  not  straightforward  to  keep track of captured substrings for the       is not straightforward to keep track of  captured  substrings  for  the
       different matching possibilities, and  PCRE's  implementation  of  this       different  matching  possibilities,  and  PCRE's implementation of this
        algorithm does not attempt to do this. This means that no captured sub-         algorithm does not attempt to do this. This means that no captured sub-
        strings are available.         strings are available.
   
       3. Because no substrings are captured, back references within the  pat-       3.  Because no substrings are captured, back references within the pat-
        tern are not supported, and cause errors if encountered.         tern are not supported, and cause errors if encountered.
   
       4.  For  the same reason, conditional expressions that use a backrefer-       4. For the same reason, conditional expressions that use  a  backrefer-
       ence as the condition or test for a specific group  recursion  are  not       ence  as  the  condition or test for a specific group recursion are not
        supported.         supported.
   
       5.  Because  many  paths  through the tree may be active, the \K escape       5. Because many paths through the tree may be  active,  the  \K  escape
        sequence, which resets the start of the match when encountered (but may         sequence, which resets the start of the match when encountered (but may
       be  on  some  paths  and not on others), is not supported. It causes an       be on some paths and not on others), is not  supported.  It  causes  an
        error if encountered.         error if encountered.
   
       6. Callouts are supported, but the value of the  capture_top  field  is       6.  Callouts  are  supported, but the value of the capture_top field is
        always 1, and the value of the capture_last field is always -1.         always 1, and the value of the capture_last field is always -1.
   
       7.  The  \C  escape  sequence, which (in the standard algorithm) always       7. The \C escape sequence, which (in  the  standard  algorithm)  always
       matches a single data unit, even in UTF-8, UTF-16 or UTF-32  modes,  is       matches  a  single data unit, even in UTF-8, UTF-16 or UTF-32 modes, is
       not  supported  in these modes, because the alternative algorithm moves       not supported in these modes, because the alternative  algorithm  moves
        through the subject string one character (not data unit) at a time, for         through the subject string one character (not data unit) at a time, for
        all active paths through the tree.         all active paths through the tree.
   
       8.  Except for (*FAIL), the backtracking control verbs such as (*PRUNE)       8. Except for (*FAIL), the backtracking control verbs such as  (*PRUNE)
       are not supported. (*FAIL) is supported, and  behaves  like  a  failing       are  not  supported.  (*FAIL)  is supported, and behaves like a failing
        negative assertion.         negative assertion.
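
        The possessive example in item 1 can be sketched in plain C. This is
        a hand-written matcher for the two patterns "^a+\w!" and "^a++\w!"
        only, not PCRE code, and it models which subjects are accepted
        rather than how either PCRE algorithm works internally. The function
        names are hypothetical, and \w is assumed to mean letters, digits
        and underscore in the default C locale.

```c
#include <ctype.h>
#include <stddef.h>

/* \w in the default C locale: letters, digits and underscore. */
static int is_word_char(int c)
{
    return isalnum(c) || c == '_';
}

/*
 * Hand-written matcher (NOT PCRE code) for "^a+\w!" when possessive is
 * zero and for "^a++\w!" when it is non-zero. Returns 1 if the pattern
 * matches at the start of subject, 0 otherwise.
 */
int match_a_word_bang(const char *subject, int possessive)
{
    size_t run = 0, keep;

    while (subject[run] == 'a')
        run++;
    if (run == 0)
        return 0;

    /* Try the longest run first, then give back one 'a' at a time,
       unless the quantifier is possessive. */
    for (keep = run; keep >= 1; keep--) {
        unsigned char next = (unsigned char)subject[keep];
        if (is_word_char(next) && subject[keep + 1] == '!')
            return 1;
        if (possessive)
            break;
    }
    return 0;
}
```

        With the possessive form, "aaab!" is accepted and "aaa!" is rejected,
        because all three "a" characters stay locked into the repeat and
        nothing is left for \w. The non-possessive form accepts both.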
   
   
 ADVANTAGES OF THE ALTERNATIVE ALGORITHM  ADVANTAGES OF THE ALTERNATIVE ALGORITHM
   
       Using  the alternative matching algorithm provides the following advan-       Using the alternative matching algorithm provides the following  advan-
        tages:         tages:
   
        1. All possible matches (at a single point in the subject) are automat-         1. All possible matches (at a single point in the subject) are automat-
       ically  found,  and  in particular, the longest match is found. To find       ically found, and in particular, the longest match is  found.  To  find
        more than one match using the standard algorithm, you have to do kludgy         more than one match using the standard algorithm, you have to do kludgy
        things with callouts.         things with callouts.
   
       2.  Because  the  alternative  algorithm  scans the subject string just       2. Because the alternative algorithm  scans  the  subject  string  just
        once, and never needs to backtrack (except for lookbehinds), it is pos-         once, and never needs to backtrack (except for lookbehinds), it is pos-
       sible  to  pass  very  long subject strings to the matching function in       sible to pass very long subject strings to  the  matching  function  in
        several pieces, checking for partial matching each time. Although it is         several pieces, checking for partial matching each time. Although it is
       possible  to  do multi-segment matching using the standard algorithm by       possible to do multi-segment matching using the standard  algorithm  by
       retaining partially matched substrings, it  is  more  complicated.  The       retaining  partially  matched  substrings,  it is more complicated. The
       pcrepartial  documentation  gives  details of partial matching and dis-       pcrepartial documentation gives details of partial  matching  and  dis-
        cusses multi-segment matching.         cusses multi-segment matching.
   
   
Line 1531  DISADVANTAGES OF THE ALTERNATIVE ALGORITHM Line 1539  DISADVANTAGES OF THE ALTERNATIVE ALGORITHM
   
        The alternative algorithm suffers from a number of disadvantages:         The alternative algorithm suffers from a number of disadvantages:
   
       1. It is substantially slower than  the  standard  algorithm.  This  is       1.  It  is  substantially  slower  than the standard algorithm. This is
       partly  because  it has to search for all possible matches, but is also       partly because it has to search for all possible matches, but  is  also
        because it is less susceptible to optimization.         because it is less susceptible to optimization.
   
        2. Capturing parentheses and back references are not supported.         2. Capturing parentheses and back references are not supported.
Line 1550  AUTHOR Line 1558  AUTHOR
   
 REVISION  REVISION
   
       Last updated: 08 January 2012       Last updated: 12 November 2013
        Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
 ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
   
Line 1958  CHECKING BUILD-TIME OPTIONS Line 1966  CHECKING BUILD-TIME OPTIONS
        POSIX interface uses malloc() for output vectors. Further  details  are         POSIX interface uses malloc() for output vectors. Further  details  are
        given in the pcreposix documentation.         given in the pcreposix documentation.
   
            PCRE_CONFIG_PARENS_LIMIT
   
          The output is a long integer that gives the maximum depth of nesting of
          parentheses (of any kind) in a pattern. This limit is  imposed  to  cap
          the amount of system stack used when a pattern is compiled. It is spec-
          ified when PCRE is built; the default is 250.
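
          To make "depth of nesting" concrete, here is a simplified sketch
          in plain C. It is not PCRE's parser: it skips backslash escapes
          and the contents of simple [...] classes, and it ignores \Q...\E
          quoting, a literal ] at the start of a class, and comments. The
          function names are hypothetical, and treating the limit as the
          largest depth still allowed is an assumption.

```c
#include <stddef.h>

/* Default limit quoted in the text above; a real build may differ. */
#define DEFAULT_PARENS_LIMIT 250

/*
 * Simplified illustration (NOT PCRE's parser): returns the deepest
 * nesting of unescaped parentheses found in pattern.
 */
long max_paren_depth(const char *pattern)
{
    long depth = 0, deepest = 0;
    int in_class = 0;
    size_t i;

    for (i = 0; pattern[i] != '\0'; i++) {
        char c = pattern[i];
        if (c == '\\') {
            if (pattern[i + 1] != '\0')
                i++;                    /* skip the escaped character */
        } else if (in_class) {
            if (c == ']')
                in_class = 0;           /* end of character class */
        } else if (c == '[') {
            in_class = 1;
        } else if (c == '(') {
            if (++depth > deepest)
                deepest = depth;
        } else if (c == ')' && depth > 0) {
            depth--;
        }
    }
    return deepest;
}

/* Assumption: a pattern is rejected when its depth is above the limit. */
int exceeds_parens_limit(const char *pattern, long limit)
{
    return max_paren_depth(pattern) > limit;
}
```

          For example, "a(b(c)d)(e)" has a nesting depth of 2, because the
          group around "e" is a sibling of the first group, not a child.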
   
          PCRE_CONFIG_MATCH_LIMIT           PCRE_CONFIG_MATCH_LIMIT
   
       The  output is a long integer that gives the default limit for the num-       The output is a long integer that gives the default limit for the  num-
       ber of internal matching function calls  in  a  pcre_exec()  execution.       ber  of  internal  matching  function calls in a pcre_exec() execution.
        Further details are given with pcre_exec() below.         Further details are given with pcre_exec() below.
   
          PCRE_CONFIG_MATCH_LIMIT_RECURSION           PCRE_CONFIG_MATCH_LIMIT_RECURSION
   
        The output is a long integer that gives the default limit for the depth         The output is a long integer that gives the default limit for the depth
       of  recursion  when  calling  the  internal  matching  function  in   a       of   recursion  when  calling  the  internal  matching  function  in  a
       pcre_exec()  execution.  Further  details  are  given  with pcre_exec()       pcre_exec() execution.  Further  details  are  given  with  pcre_exec()
        below.         below.
   
          PCRE_CONFIG_STACKRECURSE           PCRE_CONFIG_STACKRECURSE
   
       The output is an integer that is set to one if internal recursion  when       The  output is an integer that is set to one if internal recursion when
        running pcre_exec() is implemented by recursive function calls that use         running pcre_exec() is implemented by recursive function calls that use
       the stack to remember their state. This is the usual way that  PCRE  is       the  stack  to remember their state. This is the usual way that PCRE is
        compiled. The output is zero if PCRE was compiled to use blocks of data         compiled. The output is zero if PCRE was compiled to use blocks of data
       on the  heap  instead  of  recursive  function  calls.  In  this  case,       on  the  heap  instead  of  recursive  function  calls.  In  this case,
       pcre_stack_malloc  and  pcre_stack_free  are  called  to  manage memory       pcre_stack_malloc and  pcre_stack_free  are  called  to  manage  memory
        blocks on the heap, thus avoiding the use of the stack.         blocks on the heap, thus avoiding the use of the stack.
   
   
Line 1995  COMPILING A PATTERN Line 2010  COMPILING A PATTERN
   
        Either of the functions pcre_compile() or pcre_compile2() can be called         Either of the functions pcre_compile() or pcre_compile2() can be called
        to compile a pattern into an internal form. The only difference between         to compile a pattern into an internal form. The only difference between
       the two interfaces is that pcre_compile2() has an additional  argument,       the  two interfaces is that pcre_compile2() has an additional argument,
       errorcodeptr,  via  which  a  numerical  error code can be returned. To       errorcodeptr, via which a numerical error  code  can  be  returned.  To
       avoid too much repetition, we refer just to pcre_compile()  below,  but       avoid  too  much repetition, we refer just to pcre_compile() below, but
        the information applies equally to pcre_compile2().         the information applies equally to pcre_compile2().
   
        The pattern is a C string terminated by a binary zero, and is passed in         The pattern is a C string terminated by a binary zero, and is passed in
       the pattern argument. A pointer to a single block  of  memory  that  is       the  pattern  argument.  A  pointer to a single block of memory that is
       obtained  via  pcre_malloc is returned. This contains the compiled code       obtained via pcre_malloc is returned. This contains the  compiled  code
        and related data. The pcre type is defined for the returned block; this         and related data. The pcre type is defined for the returned block; this
        is a typedef for a structure whose contents are not externally defined.         is a typedef for a structure whose contents are not externally defined.
        It is up to the caller to free the memory (via pcre_free) when it is no         It is up to the caller to free the memory (via pcre_free) when it is no
        longer required.         longer required.
   
       Although  the compiled code of a PCRE regex is relocatable, that is, it       Although the compiled code of a PCRE regex is relocatable, that is,  it
        does not depend on memory location, the complete pcre data block is not         does not depend on memory location, the complete pcre data block is not
       fully  relocatable, because it may contain a copy of the tableptr argu-       fully relocatable, because it may contain a copy of the tableptr  argu-
        ment, which is an address (see below).         ment, which is an address (see below).
   
        The options argument contains various bit settings that affect the com-         The options argument contains various bit settings that affect the com-
       pilation.  It  should be zero if no options are required. The available       pilation. It should be zero if no options are required.  The  available
       options are described below. Some of them (in  particular,  those  that       options  are  described  below. Some of them (in particular, those that
       are  compatible with Perl, but some others as well) can also be set and       are compatible with Perl, but some others as well) can also be set  and
       unset from within the pattern (see  the  detailed  description  in  the       unset  from  within  the  pattern  (see the detailed description in the
       pcrepattern  documentation). For those options that can be different in       pcrepattern documentation). For those options that can be different  in
       different parts of the pattern, the contents of  the  options  argument       different  parts  of  the pattern, the contents of the options argument
        specifies their settings at the start of compilation and execution. The         specifies their settings at the start of compilation and execution. The
       PCRE_ANCHORED, PCRE_BSR_xxx, PCRE_NEWLINE_xxx, PCRE_NO_UTF8_CHECK,  and       PCRE_ANCHORED,  PCRE_BSR_xxx, PCRE_NEWLINE_xxx, PCRE_NO_UTF8_CHECK, and
       PCRE_NO_START_OPTIMIZE  options  can  be set at the time of matching as       PCRE_NO_START_OPTIMIZE options can be set at the time  of  matching  as
        well as at compile time.         well as at compile time.
   
        If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,         If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,
       if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and       if compilation of a pattern fails,  pcre_compile()  returns  NULL,  and
        sets the variable pointed to by errptr to point to a textual error mes-         sets the variable pointed to by errptr to point to a textual error mes-
        sage. This is a static string that is part of the library. You must not         sage. This is a static string that is part of the library. You must not
       try to free it. Normally, the offset from the start of the  pattern  to       try  to  free it. Normally, the offset from the start of the pattern to
        the data unit that was being processed when the error was discovered is         the data unit that was being processed when the error was discovered is
       placed in the variable pointed to by erroffset, which must not be  NULL       placed  in the variable pointed to by erroffset, which must not be NULL
       (if  it is, an immediate error is given). However, for an invalid UTF-8       (if it is, an immediate error is given). However, for an invalid  UTF-8
       or UTF-16 string, the offset is that of the  first  data  unit  of  the       or  UTF-16  string,  the  offset  is that of the first data unit of the
        failing character.         failing character.
   
       Some  errors are not detected until the whole pattern has been scanned;       Some errors are not detected until the whole pattern has been  scanned;
       in these cases, the offset passed back is the length  of  the  pattern.       in  these  cases,  the offset passed back is the length of the pattern.
       Note  that  the  offset is in data units, not characters, even in a UTF       Note that the offset is in data units, not characters, even  in  a  UTF
        mode. It may sometimes point into the middle of a UTF-8 or UTF-16 char-         mode. It may sometimes point into the middle of a UTF-8 or UTF-16 char-
        acter.         acter.
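
        Because the offset counts data units, a caller that wants to report
        a character position for a UTF-8 pattern has to convert it. The
        following plain C sketch shows one way to do that; it does not call
        PCRE, the function names are hypothetical, and it assumes that the
        pattern is valid UTF-8 and that the offset does not exceed its
        length in bytes.

```c
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Counts the characters that start before the given data-unit (byte)
 * offset in a UTF-8 string.
 */
size_t utf8_chars_before(const char *pattern, size_t offset)
{
    size_t i, chars = 0;

    for (i = 0; i < offset; i++)
        if (!is_continuation((unsigned char)pattern[i]))
            chars++;
    return chars;
}

/* Returns 1 if offset lands inside a multi-byte character, else 0. */
int utf8_offset_is_mid_char(const char *pattern, size_t offset)
{
    return is_continuation((unsigned char)pattern[offset]);
}
```

        In the six-byte pattern "h\xc3\xa9llo", byte offset 3 is the start
        of the third character, while byte offset 2 points into the middle
        of the two-byte character that precedes it.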
   
       If  pcre_compile2()  is  used instead of pcre_compile(), and the error-       If pcre_compile2() is used instead of pcre_compile(),  and  the  error-
       codeptr argument is not NULL, a non-zero error code number is  returned       codeptr  argument is not NULL, a non-zero error code number is returned
       via  this argument in the event of an error. This is in addition to the       via this argument in the event of an error. This is in addition to  the
        textual error message. Error codes and messages are listed below.         textual error message. Error codes and messages are listed below.
   
       If the final argument, tableptr, is NULL, PCRE uses a  default  set  of       If  the  final  argument, tableptr, is NULL, PCRE uses a default set of
       character  tables  that  are  built  when  PCRE  is compiled, using the       character tables that are  built  when  PCRE  is  compiled,  using  the
       default C locale. Otherwise, tableptr must be an address  that  is  the       default  C  locale.  Otherwise, tableptr must be an address that is the
       result  of  a  call to pcre_maketables(). This value is stored with the       result of a call to pcre_maketables(). This value is  stored  with  the
       compiled pattern, and used again by pcre_exec(), unless  another  table       compiled  pattern,  and  used  again by pcre_exec() and pcre_dfa_exec()
       pointer is passed to it. For more discussion, see the section on locale       when the pattern is matched. For more discussion, see  the  section  on
       support below.       locale support below.
   
       This code fragment shows a typical straightforward  call  to  pcre_com-       This  code  fragment  shows a typical straightforward call to pcre_com-
        pile():         pile():
   
          pcre *re;           pcre *re;
Line 2068  COMPILING A PATTERN Line 2083  COMPILING A PATTERN
            &erroffset,       /* for error offset */             &erroffset,       /* for error offset */
            NULL);            /* use default character tables */             NULL);            /* use default character tables */
   
       The  following  names  for option bits are defined in the pcre.h header       The following names for option bits are defined in  the  pcre.h  header
        file:         file:
   
          PCRE_ANCHORED           PCRE_ANCHORED
   
        If this bit is set, the pattern is forced to be "anchored", that is, it         If this bit is set, the pattern is forced to be "anchored", that is, it
       is  constrained to match only at the first matching point in the string       is constrained to match only at the first matching point in the  string
       that is being searched (the "subject string"). This effect can also  be       that  is being searched (the "subject string"). This effect can also be
       achieved  by appropriate constructs in the pattern itself, which is the       achieved by appropriate constructs in the pattern itself, which is  the
        only way to do it in Perl.         only way to do it in Perl.
   
          PCRE_AUTO_CALLOUT           PCRE_AUTO_CALLOUT
   
        If this bit is set, pcre_compile() automatically inserts callout items,         If this bit is set, pcre_compile() automatically inserts callout items,
       all  with  number  255, before each pattern item. For discussion of the       all with number 255, before each pattern item. For  discussion  of  the
        callout facility, see the pcrecallout documentation.         callout facility, see the pcrecallout documentation.
   
          PCRE_BSR_ANYCRLF           PCRE_BSR_ANYCRLF
          PCRE_BSR_UNICODE           PCRE_BSR_UNICODE
   
        These options (which are mutually exclusive) control what the \R escape         These options (which are mutually exclusive) control what the \R escape
       sequence  matches.  The choice is either to match only CR, LF, or CRLF,       sequence matches. The choice is either to match only CR, LF,  or  CRLF,
        or to match any Unicode newline sequence. The default is specified when         or to match any Unicode newline sequence. The default is specified when
        PCRE is built. It can be overridden from within the pattern, or by set-         PCRE is built. It can be overridden from within the pattern, or by set-
        ting an option when a compiled pattern is matched.         ting an option when a compiled pattern is matched.
   
          PCRE_CASELESS           PCRE_CASELESS
   
       If this bit is set, letters in the pattern match both upper  and  lower       If  this  bit is set, letters in the pattern match both upper and lower
       case  letters.  It  is  equivalent  to  Perl's /i option, and it can be       case letters. It is equivalent to Perl's  /i  option,  and  it  can  be
       changed within a pattern by a (?i) option setting. In UTF-8 mode,  PCRE       changed  within a pattern by a (?i) option setting. In UTF-8 mode, PCRE
       always  understands the concept of case for characters whose values are       always understands the concept of case for characters whose values  are
       less than 128, so caseless matching is always possible. For  characters       less  than 128, so caseless matching is always possible. For characters
       with  higher  values,  the concept of case is supported if PCRE is com-       with higher values, the concept of case is supported if  PCRE  is  com-
       piled with Unicode property support, but not otherwise. If you want  to       piled  with Unicode property support, but not otherwise. If you want to
       use  caseless  matching  for  characters 128 and above, you must ensure       use caseless matching for characters 128 and  above,  you  must  ensure
       that PCRE is compiled with Unicode property support  as  well  as  with       that  PCRE  is  compiled  with Unicode property support as well as with
        UTF-8 support.         UTF-8 support.
   
          PCRE_DOLLAR_ENDONLY           PCRE_DOLLAR_ENDONLY
   
       If  this bit is set, a dollar metacharacter in the pattern matches only       If this bit is set, a dollar metacharacter in the pattern matches  only
       at the end of the subject string. Without this option,  a  dollar  also       at  the  end  of the subject string. Without this option, a dollar also
       matches  immediately before a newline at the end of the string (but not       matches immediately before a newline at the end of the string (but  not
       before any other newlines). The PCRE_DOLLAR_ENDONLY option  is  ignored       before  any  other newlines). The PCRE_DOLLAR_ENDONLY option is ignored
       if  PCRE_MULTILINE  is  set.   There is no equivalent to this option in       if PCRE_MULTILINE is set.  There is no equivalent  to  this  option  in
        Perl, and no way to set it within a pattern.         Perl, and no way to set it within a pattern.
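
        The rule for where a dollar can match outside PCRE_MULTILINE mode can
        be written down as a small predicate. This is an illustration in
        plain C, not PCRE code; the function name is hypothetical, and the
        newline convention is assumed to be a single LF character.

```c
#include <stddef.h>

/*
 * Illustration only (NOT PCRE code): reports whether a dollar
 * metacharacter would match at position pos in a subject of length len,
 * outside PCRE_MULTILINE mode, assuming newline is a single LF.
 */
int dollar_matches_at(const char *subject, size_t len, size_t pos,
                      int dollar_endonly)
{
    if (pos == len)
        return 1;                       /* very end of the subject */
    if (!dollar_endonly && pos + 1 == len && subject[pos] == '\n')
        return 1;                       /* just before a final newline */
    return 0;
}
```

        For the four-character subject "abc\n", position 3 is accepted only
        when PCRE_DOLLAR_ENDONLY is unset, and position 4 is accepted either
        way.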
   
          PCRE_DOTALL           PCRE_DOTALL
   
       If this bit is set, a dot metacharacter in the pattern matches a  char-       If  this bit is set, a dot metacharacter in the pattern matches a char-
        acter of any value, including one that indicates a newline. However, it         acter of any value, including one that indicates a newline. However, it
       only ever matches one character, even if newlines are  coded  as  CRLF.       only  ever  matches  one character, even if newlines are coded as CRLF.
       Without  this option, a dot does not match when the current position is       Without this option, a dot does not match when the current position  is
        at a newline. This option is equivalent to Perl's /s option, and it can         at a newline. This option is equivalent to Perl's /s option, and it can
       be  changed within a pattern by a (?s) option setting. A negative class       be changed within a pattern by a (?s) option setting. A negative  class
        such as [^a] always matches newline characters, independent of the set-         such as [^a] always matches newline characters, independent of the set-
        ting of this option.         ting of this option.
   
          PCRE_DUPNAMES           PCRE_DUPNAMES
   
       If  this  bit is set, names used to identify capturing subpatterns need       If this bit is set, names used to identify capturing  subpatterns  need
        not be unique. This can be helpful for certain types of pattern when it         not be unique. This can be helpful for certain types of pattern when it
       is  known  that  only  one instance of the named subpattern can ever be       is known that only one instance of the named  subpattern  can  ever  be
       matched. There are more details of named subpatterns  below;  see  also       matched.  There  are  more details of named subpatterns below; see also
        the pcrepattern documentation.         the pcrepattern documentation.
   
          PCRE_EXTENDED           PCRE_EXTENDED
   
       If  this  bit  is  set,  white space data characters in the pattern are       If this bit is set, most white space  characters  in  the  pattern  are
       totally ignored except when escaped or inside a character class.  White       totally  ignored  except when escaped or inside a character class. How-
       space does not include the VT character (code 11). In addition, charac-       ever, white space is not allowed within  sequences  such  as  (?>  that
       ters between an unescaped # outside a character class and the next new-       introduce  various  parenthesized  subpatterns,  nor within a numerical
       line,  inclusive,  are  also  ignored.  This is equivalent to Perl's /x       quantifier such as {1,3}.  However, ignorable white space is  permitted
       option, and it can be changed within a pattern by a  (?x)  option  set-       between an item and a following quantifier and between a quantifier and
       ting.       a following + that indicates possessiveness.
   
       Which  characters  are  interpreted  as  newlines  is controlled by the       White space did not use to include the VT character (code 11),  because
       options passed to pcre_compile() or by a special sequence at the  start       Perl did not treat this character as white space. However, Perl changed
       of  the  pattern, as described in the section entitled "Newline conven-       at release 5.18, so PCRE followed  at  release  8.34,  and  VT  is  now
        treated as white space.
 
        PCRE_EXTENDED  also  causes characters between an unescaped # outside a
        character class  and  the  next  newline,  inclusive,  to  be  ignored.
        PCRE_EXTENDED  is equivalent to Perl's /x option, and it can be changed
        within a pattern by a (?x) option setting.
 
        Which characters are interpreted  as  newlines  is  controlled  by  the
        options  passed to pcre_compile() or by a special sequence at the start
        of the pattern, as described in the section entitled  "Newline  conven-
        tions" in the pcrepattern documentation. Note that the end of this type         tions" in the pcrepattern documentation. Note that the end of this type
       of  comment  is  a  literal  newline  sequence  in  the pattern; escape       of comment is  a  literal  newline  sequence  in  the  pattern;  escape
        sequences that happen to represent a newline do not count.         sequences that happen to represent a newline do not count.
   
       This option makes it possible to include  comments  inside  complicated       This  option  makes  it possible to include comments inside complicated
       patterns.   Note,  however,  that this applies only to data characters.       patterns.  Note, however, that this applies only  to  data  characters.
       White space  characters  may  never  appear  within  special  character       White  space  characters  may  never  appear  within  special character
        sequences in a pattern, for example within the sequence (?( that intro-         sequences in a pattern, for example within the sequence (?( that intro-
        duces a conditional subpattern.         duces a conditional subpattern.
   
          PCRE_EXTRA           PCRE_EXTRA
   
       This option was invented in order to turn on  additional  functionality       This  option  was invented in order to turn on additional functionality
       of  PCRE  that  is  incompatible with Perl, but it is currently of very       of PCRE that is incompatible with Perl, but it  is  currently  of  very
       little use. When set, any backslash in a pattern that is followed by  a       little  use. When set, any backslash in a pattern that is followed by a
       letter  that  has  no  special  meaning causes an error, thus reserving       letter that has no special meaning  causes  an  error,  thus  reserving
       these combinations for future expansion. By  default,  as  in  Perl,  a       these  combinations  for  future  expansion.  By default, as in Perl, a
       backslash  followed by a letter with no special meaning is treated as a       backslash followed by a letter with no special meaning is treated as  a
        literal. (Perl can, however, be persuaded to give an error for this, by         literal. (Perl can, however, be persuaded to give an error for this, by
       running  it with the -w option.) There are at present no other features       running it with the -w option.) There are at present no other  features
       controlled by this option. It can also be set by a (?X) option  setting       controlled  by this option. It can also be set by a (?X) option setting
        within a pattern.         within a pattern.
   
          PCRE_FIRSTLINE           PCRE_FIRSTLINE
   
       If  this  option  is  set,  an  unanchored pattern is required to match       If this option is set, an  unanchored  pattern  is  required  to  match
       before or at the first  newline  in  the  subject  string,  though  the       before  or  at  the  first  newline  in  the subject string, though the
        matched text may continue over the newline.         matched text may continue over the newline.
   
          PCRE_JAVASCRIPT_COMPAT           PCRE_JAVASCRIPT_COMPAT
   
        If this option is set, PCRE's behaviour is changed in some ways so that         If this option is set, PCRE's behaviour is changed in some ways so that
       it is compatible with JavaScript rather than Perl. The changes  are  as       it  is  compatible with JavaScript rather than Perl. The changes are as
        follows:         follows:
   
       (1)  A  lone  closing square bracket in a pattern causes a compile-time       (1) A lone closing square bracket in a pattern  causes  a  compile-time
       error, because this is illegal in JavaScript (by default it is  treated       error,  because this is illegal in JavaScript (by default it is treated
        as a data character). Thus, the pattern AB]CD becomes illegal when this         as a data character). Thus, the pattern AB]CD becomes illegal when this
        option is set.         option is set.
   
       (2) At run time, a back reference to an unset subpattern group  matches       (2)  At run time, a back reference to an unset subpattern group matches
       an  empty  string (by default this causes the current matching alterna-       an empty string (by default this causes the current  matching  alterna-
       tive to fail). A pattern such as (\1)(a) succeeds when this  option  is       tive  to  fail). A pattern such as (\1)(a) succeeds when this option is
       set  (assuming  it can find an "a" in the subject), whereas it fails by       set (assuming it can find an "a" in the subject), whereas it  fails  by
        default, for Perl compatibility.         default, for Perl compatibility.
   
        (3) \U matches an upper case "U" character; by default \U causes a com-         (3) \U matches an upper case "U" character; by default \U causes a com-
        pile time error (Perl uses \U to upper case subsequent characters).         pile time error (Perl uses \U to upper case subsequent characters).
   
        (4) \u matches a lower case "u" character unless it is followed by four         (4) \u matches a lower case "u" character unless it is followed by four
       hexadecimal digits, in which case the hexadecimal  number  defines  the       hexadecimal  digits,  in  which case the hexadecimal number defines the
       code  point  to match. By default, \u causes a compile time error (Perl       code point to match. By default, \u causes a compile time  error  (Perl
        uses it to upper case the following character).         uses it to upper case the following character).
   
       (5) \x matches a lower case "x" character unless it is followed by  two       (5)  \x matches a lower case "x" character unless it is followed by two
       hexadecimal  digits,  in  which case the hexadecimal number defines the       hexadecimal digits, in which case the hexadecimal  number  defines  the
       code point to match. By default, as in Perl, a  hexadecimal  number  is       code  point  to  match. By default, as in Perl, a hexadecimal number is
        always expected after \x, but it may have zero, one, or two digits (so,         always expected after \x, but it may have zero, one, or two digits (so,
        for example, \xz matches a binary zero character followed by z).         for example, \xz matches a binary zero character followed by z).
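
       The difference between the two readings of \x in item (5) can be
       sketched as follows. This is an illustration of the rule only, not
       PCRE's own parser: the function names interpret_x_escape() and
       hex_value() are hypothetical, the js_compat flag stands in for
       PCRE_JAVASCRIPT_COMPAT, and the \x{...} form is deliberately not
       handled.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not a PCRE function): value of one hex digit.
   The caller must already have checked the character with isxdigit(). */
static int hex_value(unsigned char c)
{
    if (isdigit(c)) return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Hypothetical sketch of how the text following "\x" is read.
   'after' points just past the "\x"; 'js_compat' is non-zero to model
   PCRE_JAVASCRIPT_COMPAT. Returns the character value to match and
   stores in *consumed how many characters after "\x" were used. */
static int interpret_x_escape(const char *after, int js_compat,
                              size_t *consumed)
{
    size_t n = 0;
    int value = 0;

    /* Accumulate at most two hexadecimal digits. */
    while (n < 2 && isxdigit((unsigned char)after[n])) {
        value = value * 16 + hex_value((unsigned char)after[n]);
        n++;
    }

    /* JavaScript rule: without exactly two digits, \x is a literal "x"
       and none of the following characters are consumed. */
    if (js_compat && n < 2) {
        *consumed = 0;
        return 'x';
    }

    /* Default (Perl-like) rule: zero, one, or two digits give the value,
       so zero digits yield a binary zero. */
    *consumed = n;
    return value;
}
```

       With the text "z" following \x, the default rule gives a binary
       zero and leaves "z" to be matched as the next character, whereas
       the JavaScript rule gives a literal "x".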
   
          PCRE_MULTILINE           PCRE_MULTILINE
   
       By default, for the purposes of matching "start of line"  and  "end  of       By  default,  for  the purposes of matching "start of line" and "end of
        line", PCRE treats the subject string as consisting of a single line of         line", PCRE treats the subject string as consisting of a single line of
       characters, even if it actually contains newlines. The "start of  line"       characters,  even if it actually contains newlines. The "start of line"
        metacharacter (^) matches only at the start of the string, and the "end         metacharacter (^) matches only at the start of the string, and the "end
       of line" metacharacter ($) matches only at the end of  the  string,  or       of  line"  metacharacter  ($) matches only at the end of the string, or
       before  a terminating newline (except when PCRE_DOLLAR_ENDONLY is set).       before a terminating newline (except when PCRE_DOLLAR_ENDONLY is  set).
       Note, however, that unless PCRE_DOTALL  is  set,  the  "any  character"       Note,  however,  that  unless  PCRE_DOTALL  is set, the "any character"
       metacharacter  (.)  does not match at a newline. This behaviour (for ^,       metacharacter (.) does not match at a newline. This behaviour  (for  ^,
        $, and dot) is the same as Perl.         $, and dot) is the same as Perl.
   
       When PCRE_MULTILINE is set, the "start of line" and  "end  of  line"       When  PCRE_MULTILINE  is set, the "start of line" and "end of line"
       constructs  match  immediately following or immediately before internal       constructs match immediately following or immediately  before  internal
       newlines in the subject string, respectively, as well as  at  the  very       newlines  in  the  subject string, respectively, as well as at the very
       start  and  end.  This is equivalent to Perl's /m option, and it can be       start and end. This is equivalent to Perl's /m option, and  it  can  be
        changed within a pattern by a (?m) option setting. If there are no new-         changed within a pattern by a (?m) option setting. If there are no new-
       lines  in  a  subject string, or no occurrences of ^ or $ in a pattern,       lines in a subject string, or no occurrences of ^ or $  in  a  pattern,
        setting PCRE_MULTILINE has no effect.         setting PCRE_MULTILINE has no effect.
   
          PCRE_NEVER_UTF           PCRE_NEVER_UTF
   
        This option locks out interpretation of the pattern as UTF-8 (or UTF-16         This option locks out interpretation of the pattern as UTF-8 (or UTF-16
       or  UTF-32  in the 16-bit and 32-bit libraries). In particular, it pre-       or UTF-32 in the 16-bit and 32-bit libraries). In particular,  it  pre-
       vents the creator of the pattern from switching to  UTF  interpretation       vents  the  creator of the pattern from switching to UTF interpretation
        by starting the pattern with (*UTF). This may be useful in applications         by starting the pattern with (*UTF). This may be useful in applications
        that  process  patterns  from  external  sources.  The  combination  of         that  process  patterns  from  external  sources.  The  combination  of
        PCRE_UTF8 and PCRE_NEVER_UTF also causes an error.         PCRE_UTF8 and PCRE_NEVER_UTF also causes an error.
Line 2243  COMPILING A PATTERN Line 2268  COMPILING A PATTERN
          PCRE_NEWLINE_ANYCRLF           PCRE_NEWLINE_ANYCRLF
          PCRE_NEWLINE_ANY           PCRE_NEWLINE_ANY
   
       These  options  override the default newline definition that was chosen       These options override the default newline definition that  was  chosen
       when PCRE was built. Setting the first or the second specifies  that  a       when  PCRE  was built. Setting the first or the second specifies that a
       newline  is  indicated  by a single character (CR or LF, respectively).       newline is indicated by a single character (CR  or  LF,  respectively).
       Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by  the       Setting  PCRE_NEWLINE_CRLF specifies that a newline is indicated by the
       two-character  CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF specifies       two-character CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF  specifies
        that any of the three preceding sequences should be recognized. Setting         that any of the three preceding sequences should be recognized. Setting
       PCRE_NEWLINE_ANY  specifies that any Unicode newline sequence should be       PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should  be
        recognized.         recognized.
   
       In an ASCII/Unicode environment, the Unicode newline sequences are  the       In  an ASCII/Unicode environment, the Unicode newline sequences are the
       three  just  mentioned,  plus  the  single characters VT (vertical tab,       three just mentioned, plus the  single  characters  VT  (vertical  tab,
        U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line sep-         U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line sep-
       arator,  U+2028),  and  PS (paragraph separator, U+2029). For the 8-bit       arator, U+2028), and PS (paragraph separator, U+2029).  For  the  8-bit
        library, the last two are recognized only in UTF-8 mode.         library, the last two are recognized only in UTF-8 mode.
   
       When PCRE is compiled to run in an EBCDIC (mainframe) environment,  the       When  PCRE is compiled to run in an EBCDIC (mainframe) environment, the
        code for CR is 0x0d, the same as ASCII. However, the character code for         code for CR is 0x0d, the same as ASCII. However, the character code for
       LF is normally 0x15, though in some EBCDIC environments 0x25  is  used.       LF  is  normally 0x15, though in some EBCDIC environments 0x25 is used.
       Whichever  of  these  is  not LF is made to correspond to Unicode's NEL       Whichever of these is not LF is made to  correspond  to  Unicode's  NEL
       character. EBCDIC codes are all less than 256. For  more  details,  see       character.  EBCDIC  codes  are all less than 256. For more details, see
        the pcrebuild documentation.         the pcrebuild documentation.
   
       The  newline  setting  in  the  options  word  uses three bits that are       The newline setting in the  options  word  uses  three  bits  that  are
        treated as a number, giving eight possibilities. Currently only six are         treated as a number, giving eight possibilities. Currently only six are
       used  (default  plus the five values above). This means that if you set       used (default plus the five values above). This means that if  you  set
       more than one newline option, the combination may or may not be  sensi-       more  than one newline option, the combination may or may not be sensi-
        ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to         ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to
       PCRE_NEWLINE_CRLF, but other combinations may yield unused numbers  and       PCRE_NEWLINE_CRLF,  but other combinations may yield unused numbers and
        cause an error.         cause an error.
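
       The three-bit encoding described above can be examined with a few
       lines of C. This is a sketch under stated assumptions: the option
       values are copied here from pcre.h so that the fragment stands on
       its own (a real program should include <pcre.h> and not redefine
       them), and NEWLINE_MASK, newline_field(), and
       newline_field_is_used() are hypothetical names, not part of the
       PCRE API.

```c
/* Option values as defined in pcre.h, repeated here only so that the
   sketch is self-contained; include <pcre.h> in real code. */
#define PCRE_NEWLINE_CR       0x00100000
#define PCRE_NEWLINE_LF       0x00200000
#define PCRE_NEWLINE_CRLF     0x00300000
#define PCRE_NEWLINE_ANY      0x00400000
#define PCRE_NEWLINE_ANYCRLF  0x00500000

/* Hypothetical mask covering the three newline bits. */
#define NEWLINE_MASK          0x00700000

/* Extracts the newline setting from an options word as a number 0..7
   (0 means "use the build-time default"). */
static int newline_field(int options)
{
    return (options & NEWLINE_MASK) >> 20;
}

/* Only the values 0..5 are in use (default plus the five options);
   6 and 7 are the unused combinations that lead to an error. */
static int newline_field_is_used(int options)
{
    return newline_field(options) <= 5;
}
```

       Under these definitions, PCRE_NEWLINE_CR | PCRE_NEWLINE_LF yields
       the same field value as PCRE_NEWLINE_CRLF, while
       PCRE_NEWLINE_LF | PCRE_NEWLINE_ANY yields 6, one of the two unused
       numbers.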
   
       The  only  time  that a line break in a pattern is specially recognized       The only time that a line break in a pattern  is  specially  recognized
       when compiling is when PCRE_EXTENDED is set. CR and LF are white  space       when  compiling is when PCRE_EXTENDED is set. CR and LF are white space
       characters,  and so are ignored in this mode. Also, an unescaped # out-       characters, and so are ignored in this mode. Also, an unescaped #  out-
       side a character class indicates a comment that lasts until  after  the       side  a  character class indicates a comment that lasts until after the
       next  line break sequence. In other circumstances, line break sequences       next line break sequence. In other circumstances, line break  sequences
        in patterns are treated as literal data.         in patterns are treated as literal data.
   
        The newline option that is set at compile time becomes the default that         The newline option that is set at compile time becomes the default that
Line 2286  COMPILING A PATTERN Line 2311  COMPILING A PATTERN
          PCRE_NO_AUTO_CAPTURE           PCRE_NO_AUTO_CAPTURE
   
        If this option is set, it disables the use of numbered capturing paren-         If this option is set, it disables the use of numbered capturing paren-
       theses in the pattern. Any opening parenthesis that is not followed  by       theses  in the pattern. Any opening parenthesis that is not followed by
       ?  behaves as if it were followed by ?: but named parentheses can still       ? behaves as if it were followed by ?: but named parentheses can  still
       be used for capturing (and they acquire  numbers  in  the  usual  way).       be  used  for  capturing  (and  they acquire numbers in the usual way).
        There is no equivalent of this option in Perl.         There is no equivalent of this option in Perl.
   
            PCRE_NO_AUTO_POSSESS
   
          If this option is set, it disables "auto-possessification". This is  an
          optimization  that,  for example, turns a+b into a++b in order to avoid
          backtracks into a+ that can never be successful. However,  if  callouts
          are  in  use,  auto-possessification  means that some of them are never
          taken. You can set this option if you want the matching functions to do
          a  full  unoptimized  search and run all the callouts, but it is mainly
          provided for testing purposes.
   
          PCRE_NO_START_OPTIMIZE           PCRE_NO_START_OPTIMIZE
   
       This  is an option that acts at matching time; that is, it is really an       This is an option that acts at matching time; that is, it is really  an
       option for pcre_exec() or pcre_dfa_exec(). If  it  is  set  at  compile       option  for  pcre_exec()  or  pcre_dfa_exec().  If it is set at compile
       time,  it is remembered with the compiled pattern and assumed at match-       time, it is remembered with the compiled pattern and assumed at  match-
       ing time. This is necessary if you want to use JIT  execution,  because       ing  time.  This is necessary if you want to use JIT execution, because
       the  JIT  compiler needs to know whether or not this option is set. For       the JIT compiler needs to know whether or not this option is  set.  For
        details see the discussion of PCRE_NO_START_OPTIMIZE below.         details see the discussion of PCRE_NO_START_OPTIMIZE below.
   
          PCRE_UCP           PCRE_UCP
   
       This option changes the way PCRE processes \B, \b, \D, \d, \S, \s,  \W,       This  option changes the way PCRE processes \B, \b, \D, \d, \S, \s, \W,
       \w,  and  some  of  the POSIX character classes. By default, only ASCII       \w, and some of the POSIX character classes.  By  default,  only  ASCII
       characters are recognized, but if PCRE_UCP is set,  Unicode  properties       characters  are  recognized, but if PCRE_UCP is set, Unicode properties
       are  used instead to classify characters. More details are given in the       are used instead to classify characters. More details are given in  the
       section on generic character types in the pcrepattern page. If you  set       section  on generic character types in the pcrepattern page. If you set
       PCRE_UCP,  matching  one of the items it affects takes much longer. The       PCRE_UCP, matching one of the items it affects takes much  longer.  The
       option is available only if PCRE has been compiled with  Unicode  prop-       option  is  available only if PCRE has been compiled with Unicode prop-
        erty support.         erty support.
   
          PCRE_UNGREEDY           PCRE_UNGREEDY
   
       This  option  inverts  the "greediness" of the quantifiers so that they       This option inverts the "greediness" of the quantifiers  so  that  they
       are not greedy by default, but become greedy if followed by "?". It  is       are  not greedy by default, but become greedy if followed by "?". It is
       not  compatible  with Perl. It can also be set by a (?U) option setting       not compatible with Perl. It can also be set by a (?U)  option  setting
        within the pattern.         within the pattern.
   
          PCRE_UTF8           PCRE_UTF8
   
       This option causes PCRE to regard both the pattern and the  subject  as       This  option  causes PCRE to regard both the pattern and the subject as
        strings of UTF-8 characters instead of single-byte strings. However, it         strings of UTF-8 characters instead of single-byte strings. However, it
       is available only when PCRE is built to include UTF  support.  If  not,       is  available  only  when PCRE is built to include UTF support. If not,
       the  use  of  this option provokes an error. Details of how this option       the use of this option provokes an error. Details of  how  this  option
        changes the behaviour of PCRE are given in the pcreunicode page.         changes the behaviour of PCRE are given in the pcreunicode page.
   
          PCRE_NO_UTF8_CHECK           PCRE_NO_UTF8_CHECK
   
        When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is         When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
       automatically  checked.  There  is  a  discussion about the validity of       automatically checked. There is a  discussion  about  the  validity  of
       UTF-8 strings in the pcreunicode page. If an invalid UTF-8 sequence  is       UTF-8  strings in the pcreunicode page. If an invalid UTF-8 sequence is
       found,  pcre_compile()  returns an error. If you already know that your       found, pcre_compile() returns an error. If you already know  that  your
       pattern is valid, and you want to skip this check for performance  rea-       pattern  is valid, and you want to skip this check for performance rea-
       sons,  you  can set the PCRE_NO_UTF8_CHECK option.  When it is set, the       sons, you can set the PCRE_NO_UTF8_CHECK option.  When it is  set,  the
        effect of passing an invalid UTF-8 string as a pattern is undefined. It         effect of passing an invalid UTF-8 string as a pattern is undefined. It
       may  cause  your  program  to  crash. Note that this option can also be       may cause your program to crash or loop. Note that this option can also
       passed to pcre_exec() and pcre_dfa_exec(),  to  suppress  the  validity       be  passed to pcre_exec() and pcre_dfa_exec(), to suppress the validity
       checking  of  subject strings only. If the same string is being matched       checking of subject strings only. If the same string is  being  matched
       many times, the option can be safely set for the second and  subsequent       many  times, the option can be safely set for the second and subsequent
        matchings to improve performance.         matchings to improve performance.
   
   
 COMPILATION ERROR CODES  COMPILATION ERROR CODES
   
       The  following  table  lists  the  error  codes that may be returned by       The following table lists the error  codes  that  may  be  returned  by
       pcre_compile2(), along with the error messages that may be returned  by       pcre_compile2(),  along with the error messages that may be returned by
       both  compiling  functions.  Note  that error messages are always 8-bit       both compiling functions. Note that error  messages  are  always  8-bit
       ASCII strings, even in 16-bit or 32-bit mode. As  PCRE  has  developed,       ASCII  strings,  even  in 16-bit or 32-bit mode. As PCRE has developed,
       some  error codes have fallen out of use. To avoid confusion, they have       some error codes have fallen out of use. To avoid confusion, they  have
        not been re-used.         not been re-used.
   
           0  no error            0  no error
Line 2385  COMPILATION ERROR CODES Line 2420  COMPILATION ERROR CODES
          31  POSIX collating elements are not supported           31  POSIX collating elements are not supported
          32  this version of PCRE is compiled without UTF support           32  this version of PCRE is compiled without UTF support
          33  [this code is not in use]           33  [this code is not in use]
         34  character value in \x{...} sequence is too large         34  character value in \x{} or \o{} is too large
          35  invalid condition (?(0)           35  invalid condition (?(0)
          36  \C not allowed in lookbehind assertion           36  \C not allowed in lookbehind assertion
          37  PCRE does not support \L, \l, \N{name}, \U, or \u           37  PCRE does not support \L, \l, \N{name}, \U, or \u
Line 2433  COMPILATION ERROR CODES Line 2468  COMPILATION ERROR CODES
          75  name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)           75  name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
          76  character value in \u.... sequence is too large           76  character value in \u.... sequence is too large
          77  invalid UTF-32 string (specifically UTF-32)           77  invalid UTF-32 string (specifically UTF-32)
            78  setting UTF is disabled by the application
            79  non-hex character in \x{} (closing brace missing?)
            80  non-octal character in \o{} (closing brace missing?)
            81  missing opening brace after \o
            82  parentheses are too deeply nested
            83  invalid range in character class
   
       The numbers 32 and 10000 in errors 48 and 49  are  defaults;  different       The  numbers  32  and 10000 in errors 48 and 49 are defaults; different
        values may be used if the limits were changed when PCRE was built.         values may be used if the limits were changed when PCRE was built.
   
   
 STUDYING A PATTERN  STUDYING A PATTERN
   
       pcre_extra *pcre_study(const pcre *code, int options       pcre_extra *pcre_study(const pcre *code, int options,
             const char **errptr);              const char **errptr);
   
       If  a  compiled  pattern is going to be used several times, it is worth       If a compiled pattern is going to be used several times,  it  is  worth
        spending more time analyzing it in order to speed up the time taken for         spending more time analyzing it in order to speed up the time taken for
       matching.  The function pcre_study() takes a pointer to a compiled pat-       matching. The function pcre_study() takes a pointer to a compiled  pat-
        tern as its first argument. If studying the pattern produces additional         tern as its first argument. If studying the pattern produces additional
       information  that  will  help speed up matching, pcre_study() returns a       information that will help speed up matching,  pcre_study()  returns  a
       pointer to a pcre_extra block, in which the study_data field points  to       pointer  to a pcre_extra block, in which the study_data field points to
        the results of the study.         the results of the study.
   
        The  returned  value  from  pcre_study()  can  be  passed  directly  to         The  returned  value  from  pcre_study()  can  be  passed  directly  to
       pcre_exec() or pcre_dfa_exec(). However, a pcre_extra block  also  con-       pcre_exec()  or  pcre_dfa_exec(). However, a pcre_extra block also con-
       tains  other  fields  that can be set by the caller before the block is       tains other fields that can be set by the caller before  the  block  is
        passed; these are described below in the section on matching a pattern.         passed; these are described below in the section on matching a pattern.
   
       If studying the  pattern  does  not  produce  any  useful  information,       If  studying  the  pattern  does  not  produce  any useful information,
       pcre_study()  returns  NULL  by  default.  In that circumstance, if the       pcre_study() returns NULL by default.  In  that  circumstance,  if  the
        calling program wants to pass any of the other fields to pcre_exec() or         calling program wants to pass any of the other fields to pcre_exec() or
       pcre_dfa_exec(),  it  must set up its own pcre_extra block. However, if       pcre_dfa_exec(), it must set up its own pcre_extra block.  However,  if
       pcre_study() is called  with  the  PCRE_STUDY_EXTRA_NEEDED  option,  it       pcre_study()  is  called  with  the  PCRE_STUDY_EXTRA_NEEDED option, it
        returns a pcre_extra block even if studying did not find any additional         returns a pcre_extra block even if studying did not find any additional
       information. It may still return NULL, however, if an error  occurs  in       information.  It  may still return NULL, however, if an error occurs in
        pcre_study().         pcre_study().
   
       The  second  argument  of  pcre_study() contains option bits. There are       The second argument of pcre_study() contains  option  bits.  There  are
        three further options in addition to PCRE_STUDY_EXTRA_NEEDED:         three further options in addition to PCRE_STUDY_EXTRA_NEEDED:
   
          PCRE_STUDY_JIT_COMPILE           PCRE_STUDY_JIT_COMPILE
          PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE           PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
          PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE           PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
   
       If any of these are set, and the just-in-time  compiler  is  available,       If  any  of  these are set, and the just-in-time compiler is available,
       the  pattern  is  further compiled into machine code that executes much       the pattern is further compiled into machine code  that  executes  much
       faster than the pcre_exec()  interpretive  matching  function.  If  the       faster  than  the  pcre_exec()  interpretive  matching function. If the
       just-in-time  compiler is not available, these options are ignored. All       just-in-time compiler is not available, these options are ignored.  All
        undefined bits in the options argument must be zero.         undefined bits in the options argument must be zero.
   
       JIT compilation is a heavyweight optimization. It can  take  some  time       JIT  compilation  is  a heavyweight optimization. It can take some time
       for  patterns  to  be analyzed, and for one-off matches and simple pat-       for patterns to be analyzed, and for one-off matches  and  simple  pat-
       terns the benefit of faster execution might be offset by a much  slower       terns  the benefit of faster execution might be offset by a much slower
        study time.  Not all patterns can be optimized by the JIT compiler. For         study time.  Not all patterns can be optimized by the JIT compiler. For
       those that cannot be handled, matching automatically falls back to  the       those  that cannot be handled, matching automatically falls back to the
       pcre_exec()  interpreter.  For more details, see the pcrejit documenta-       pcre_exec() interpreter. For more details, see the  pcrejit  documenta-
        tion.         tion.
   
       The third argument for pcre_study() is a pointer for an error  message.       The  third argument for pcre_study() is a pointer for an error message.
       If  studying  succeeds  (even  if no data is returned), the variable it       If studying succeeds (even if no data is  returned),  the  variable  it
       points to is set to NULL. Otherwise it is set to  point  to  a  textual       points  to  is  set  to NULL. Otherwise it is set to point to a textual
        error message. This is a static string that is part of the library. You         error message. This is a static string that is part of the library. You
       must not try to free it. You should test the  error  pointer  for  NULL       must  not  try  to  free it. You should test the error pointer for NULL
        after calling pcre_study(), to be sure that it has run successfully.         after calling pcre_study(), to be sure that it has run successfully.
   
       When  you are finished with a pattern, you can free the memory used for       When you are finished with a pattern, you can free the memory used  for
        the study data by calling pcre_free_study(). This function was added to         the study data by calling pcre_free_study(). This function was added to
       the  API  for  release  8.20. For earlier versions, the memory could be       the API for release 8.20. For earlier versions,  the  memory  could  be
       freed with pcre_free(), just like the pattern itself. This  will  still       freed  with  pcre_free(), just like the pattern itself. This will still
       work  in  cases where JIT optimization is not used, but it is advisable       work in cases where JIT optimization is not used, but it  is  advisable
        to change to the new function when convenient.         to change to the new function when convenient.
   
       This is a typical way in which pcre_study() is used (except that  in  a       This  is  a typical way in which pcre_study() is used (except that in a
        real application there should be tests for errors):         real application there should be tests for errors):
   
          int rc;           int rc;
Line 2520  STUDYING A PATTERN Line 2561  STUDYING A PATTERN
        Studying a pattern does two things: first, a lower bound for the length         Studying a pattern does two things: first, a lower bound for the length
        of subject string that is needed to match the pattern is computed. This         of subject string that is needed to match the pattern is computed. This
        does not mean that there are any strings of that length that match, but         does not mean that there are any strings of that length that match, but
       it does guarantee that no shorter strings match. The value is  used  to       it  does  guarantee that no shorter strings match. The value is used to
        avoid wasting time by trying to match strings that are shorter than the         avoid wasting time by trying to match strings that are shorter than the
       lower bound. You can find out the value in a calling  program  via  the       lower  bound.  You  can find out the value in a calling program via the
        pcre_fullinfo() function.         pcre_fullinfo() function.
   
        Studying a pattern is also useful for non-anchored patterns that do not         Studying a pattern is also useful for non-anchored patterns that do not
       have a single fixed starting character. A bitmap of  possible  starting       have  a  single fixed starting character. A bitmap of possible starting
       bytes  is  created. This speeds up finding a position in the subject at       bytes is created. This speeds up finding a position in the  subject  at
        which to start matching. (In 16-bit mode, the bitmap is used for 16-bit         which to start matching. (In 16-bit mode, the bitmap is used for 16-bit
       values  less  than  256.  In 32-bit mode, the bitmap is used for 32-bit       values less than 256.  In 32-bit mode, the bitmap is  used  for  32-bit
        values less than 256.)         values less than 256.)
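
       The starting-byte bitmap described above can be pictured with a
       small, self-contained sketch. This is not PCRE's internal code; the
       type start_bitmap and both function names are hypothetical and exist
       only for illustration. It assumes 8-bit subjects, so 256 bits (32
       bytes) cover every possible first byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A 256-bit map: bit c is set if byte value c can start a match. */
typedef struct {
    unsigned char bits[32];
} start_bitmap;

/* Clear the map, then mark each byte in `starters` as a possible first byte. */
void start_bitmap_init(start_bitmap *map, const unsigned char *starters,
                       size_t n)
{
    assert(map != NULL && (starters != NULL || n == 0));
    memset(map->bits, 0, sizeof map->bits);
    for (size_t i = 0; i < n; i++)
        map->bits[starters[i] >> 3] |= (unsigned char)(1u << (starters[i] & 7));
}

/* Return the first offset at or after `from` whose byte is set in the map,
   or `length` if no such position exists. */
size_t start_bitmap_scan(const start_bitmap *map, const unsigned char *subject,
                         size_t length, size_t from)
{
    assert(map != NULL && subject != NULL && from <= length);
    size_t i = from;
    while (i < length &&
           !(map->bits[subject[i] >> 3] & (1u << (subject[i] & 7))))
        i++;
    return i;
}
```

       A matcher built this way would call start_bitmap_scan() before each
       match attempt and give up at once when the result equals length,
       which is the saving the text describes.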
   
       These two optimizations apply to both pcre_exec() and  pcre_dfa_exec(),       These  two optimizations apply to both pcre_exec() and pcre_dfa_exec(),
       and  the  information  is also used by the JIT compiler.  The optimiza-       and the information is also used by the JIT  compiler.   The  optimiza-
       tions can be disabled by  setting  the  PCRE_NO_START_OPTIMIZE  option.       tions  can  be  disabled  by setting the PCRE_NO_START_OPTIMIZE option.
       You  might want to do this if your pattern contains callouts or (*MARK)       You might want to do this if your pattern contains callouts or  (*MARK)
       and you want to make use of these facilities in  cases  where  matching       and  you  want  to make use of these facilities in cases where matching
        fails.         fails.
   
       PCRE_NO_START_OPTIMIZE  can be specified at either compile time or exe-       PCRE_NO_START_OPTIMIZE can be specified at either compile time or  exe-
       cution  time.  However,  if   PCRE_NO_START_OPTIMIZE   is   passed   to       cution   time.   However,   if   PCRE_NO_START_OPTIMIZE  is  passed  to
        pcre_exec(), (that is, after any JIT compilation has happened) JIT exe-         pcre_exec(), (that is, after any JIT compilation has happened) JIT exe-
       cution is disabled. For JIT execution to work with  PCRE_NO_START_OPTI-       cution  is disabled. For JIT execution to work with PCRE_NO_START_OPTI-
        MIZE, the option must be set at compile time.         MIZE, the option must be set at compile time.
   
        There is a longer discussion of PCRE_NO_START_OPTIMIZE below.         There is a longer discussion of PCRE_NO_START_OPTIMIZE below.
Line 2550  STUDYING A PATTERN Line 2591  STUDYING A PATTERN
   
 LOCALE SUPPORT  LOCALE SUPPORT
   
       PCRE  handles  caseless matching, and determines whether characters are       PCRE handles caseless matching, and determines whether  characters  are
       letters, digits, or whatever, by reference to a set of tables,  indexed       letters,  digits, or whatever, by reference to a set of tables, indexed
       by  character  value.  When running in UTF-8 mode, this applies only to       by character code point. When running in UTF-8 mode, or in the  16-  or
       characters with codes less than 128. By  default,  higher-valued  codes       32-bit libraries, this applies only to characters with code points less
       never match escapes such as \w or \d, but they can be tested with \p if       than 256. By default, higher-valued code  points  never  match  escapes
       PCRE is built with Unicode character property  support.  Alternatively,       such  as \w or \d. However, if PCRE is built with Unicode property sup-
       the  PCRE_UCP  option  can  be  set at compile time; this causes \w and       port, all characters can be tested with \p and \P,  or,  alternatively,
       friends to use Unicode property support instead of built-in tables. The       the  PCRE_UCP option can be set when a pattern is compiled; this causes
       use of locales with Unicode is discouraged. If you are handling charac-       \w and friends to use Unicode property support instead of the  built-in
       ters with codes greater than 128, you should either use UTF-8 and  Uni-       tables.
       code, or use locales, but not try to mix the two. 
   
          The  use  of  locales  with Unicode is discouraged. If you are handling
          characters with code points greater than 128,  you  should  either  use
          Unicode support, or use locales, but not try to mix the two.
   
        PCRE  contains  an  internal set of tables that are used when the final         PCRE  contains  an  internal set of tables that are used when the final
        argument of pcre_compile() is  NULL.  These  are  sufficient  for  many         argument of pcre_compile() is  NULL.  These  are  sufficient  for  many
        applications.  Normally, the internal tables recognize only ASCII char-         applications.  Normally, the internal tables recognize only ASCII char-
Line 2576  LOCALE SUPPORT Line 2620  LOCALE SUPPORT
   
        External  tables  are  built by calling the pcre_maketables() function,         External  tables  are  built by calling the pcre_maketables() function,
        which has no arguments, in the relevant locale. The result can then  be         which has no arguments, in the relevant locale. The result can then  be
       passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For       passed  to  pcre_compile() as often as necessary. For example, to build
       example, to build and use tables that are appropriate  for  the  French       and use tables that  are  appropriate  for  the  French  locale  (where
       locale  (where  accented  characters  with  values greater than 128 are       accented  characters  with  values greater than 128 are treated as let-
       treated as letters), the following code could be used:       ters), the following code could be used:
   
          setlocale(LC_CTYPE, "fr_FR");           setlocale(LC_CTYPE, "fr_FR");
          tables = pcre_maketables();           tables = pcre_maketables();
Line 2595  LOCALE SUPPORT Line 2639  LOCALE SUPPORT
   
        The pointer that is passed to pcre_compile() is saved with the compiled         The pointer that is passed to pcre_compile() is saved with the compiled
        pattern,  and the same tables are used via this pointer by pcre_study()         pattern,  and the same tables are used via this pointer by pcre_study()
       and normally also by pcre_exec(). Thus, by default, for any single pat-       and also by pcre_exec() and pcre_dfa_exec(). Thus, for any single  pat-
        tern, compilation, studying and matching all happen in the same locale,         tern, compilation, studying and matching all happen in the same locale,
       but different patterns can be compiled in different locales.       but different patterns can be processed in different locales.
   
        It is possible to pass a table pointer or NULL (indicating the  use  of         It is possible to pass a table pointer or NULL (indicating the  use  of
       the  internal  tables)  to  pcre_exec(). Although not intended for this       the internal tables) to pcre_exec() or pcre_dfa_exec() (see the discus-
       purpose, this facility could be used to match a pattern in a  different       sion below in the section on matching a pattern). This facility is pro-
       locale from the one in which it was compiled. Passing table pointers at       vided  for  use  with  pre-compiled  patterns  that have been saved and
       run time is discussed below in the section on matching a pattern.       reloaded.  Character tables are not saved with patterns, so if  a  non-
        standard table was used at compile time, it must be provided again when
        the reloaded pattern is matched. Attempting to  use  this  facility  to
        match a pattern in a different locale from the one in which it was com-
        piled is likely to lead to anomalous (usually incorrect) results.
   
   
 INFORMATION ABOUT A PATTERN  INFORMATION ABOUT A PATTERN
Line 2744  INFORMATION ABOUT A PATTERN Line 2792  INFORMATION ABOUT A PATTERN
        /^a\dz\d/ the returned value is -1.         /^a\dz\d/ the returned value is -1.
   
        Since for the 32-bit library using the non-UTF-32 mode,  this  function         Since for the 32-bit library using the non-UTF-32 mode,  this  function
       is  unable to return the full 32-bit range of the character, this value       is  unable to return the full 32-bit range of characters, this value is
       is   deprecated;   instead    the    PCRE_INFO_REQUIREDCHARFLAGS    and       deprecated;     instead     the     PCRE_INFO_REQUIREDCHARFLAGS     and
        PCRE_INFO_REQUIREDCHAR values should be used.         PCRE_INFO_REQUIREDCHAR values should be used.
   
            PCRE_INFO_MATCH_EMPTY
   
          Return  1  if  the  pattern can match an empty string, otherwise 0. The
          fourth argument should point to an int variable.
   
          PCRE_INFO_MATCHLIMIT           PCRE_INFO_MATCHLIMIT
   
       If  the  pattern  set  a  match  limit by including an item of the form       If the pattern set a match limit by  including  an  item  of  the  form
       (*LIMIT_MATCH=nnnn) at the start, the value  is  returned.  The  fourth       (*LIMIT_MATCH=nnnn)  at  the  start,  the value is returned. The fourth
       argument  should  point to an unsigned 32-bit integer. If no such value       argument should point to an unsigned 32-bit integer. If no  such  value
       has  been  set,  the  call  to  pcre_fullinfo()   returns   the   error       has   been   set,   the  call  to  pcre_fullinfo()  returns  the  error
        PCRE_ERROR_UNSET.         PCRE_ERROR_UNSET.
   
          PCRE_INFO_MAXLOOKBEHIND           PCRE_INFO_MAXLOOKBEHIND
   
       Return  the  number  of  characters  (NB not data units) in the longest       Return the number of characters (NB not  data  units)  in  the  longest
       lookbehind assertion in the pattern. This information  is  useful  when       lookbehind  assertion  in  the pattern. This information is useful when
       doing  multi-segment  matching  using  the partial matching facilities.       doing multi-segment matching using  the  partial  matching  facilities.
        Note that the simple assertions \b and \B require a one-character look-         Note that the simple assertions \b and \B require a one-character look-
       behind.  \A  also  registers a one-character lookbehind, though it does       behind. \A also registers a one-character lookbehind,  though  it  does
       not actually inspect the previous character. This is to ensure that  at       not  actually inspect the previous character. This is to ensure that at
        least one character from the old segment is retained when a new segment         least one character from the old segment is retained when a new segment
        is processed. Otherwise, if there are no lookbehinds in the pattern, \A         is processed. Otherwise, if there are no lookbehinds in the pattern, \A
        might match incorrectly at the start of a new segment.         might match incorrectly at the start of a new segment.
   
          PCRE_INFO_MINLENGTH           PCRE_INFO_MINLENGTH
   
       If  the  pattern  was studied and a minimum length for matching subject       If the pattern was studied and a minimum length  for  matching  subject
       strings was computed, its value is  returned.  Otherwise  the  returned       strings  was  computed,  its  value is returned. Otherwise the returned
        value is -1. The value is a number of characters, which in UTF mode may         value is -1. The value is a number of characters, which in UTF mode may
       be different from the number of data units. The fourth argument  should       be  different from the number of data units. The fourth argument should
       point  to an int variable. A non-negative value is a lower bound to the       point to an int variable. A non-negative value is a lower bound to  the
       length of any matching string. There may not be  any  strings  of  that       length  of  any  matching  string. There may not be any strings of that
       length  that  do actually match, but every string that does match is at       length that do actually match, but every string that does match  is  at
        least that long.         least that long.
   
          PCRE_INFO_NAMECOUNT           PCRE_INFO_NAMECOUNT
          PCRE_INFO_NAMEENTRYSIZE           PCRE_INFO_NAMEENTRYSIZE
          PCRE_INFO_NAMETABLE           PCRE_INFO_NAMETABLE
   
       PCRE supports the use of named as well as numbered capturing  parenthe-       PCRE  supports the use of named as well as numbered capturing parenthe-
       ses.  The names are just an additional way of identifying the parenthe-       ses. The names are just an additional way of identifying the  parenthe-
        ses, which still acquire numbers. Several convenience functions such as         ses, which still acquire numbers. Several convenience functions such as
       pcre_get_named_substring()  are  provided  for extracting captured sub-       pcre_get_named_substring() are provided for  extracting  captured  sub-
       strings by name. It is also possible to extract the data  directly,  by       strings  by  name. It is also possible to extract the data directly, by
       first  converting  the  name to a number in order to access the correct       first converting the name to a number in order to  access  the  correct
        pointers in the output vector (described with pcre_exec() below). To do         pointers in the output vector (described with pcre_exec() below). To do
       the  conversion,  you  need  to  use  the  name-to-number map, which is       the conversion, you need  to  use  the  name-to-number  map,  which  is
        described by these three values.         described by these three values.
   
        The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT         The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
        gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size         gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
       of each entry; both of these  return  an  int  value.  The  entry  size       of  each  entry;  both  of  these  return  an int value. The entry size
       depends  on the length of the longest name. PCRE_INFO_NAMETABLE returns       depends on the length of the longest name. PCRE_INFO_NAMETABLE  returns
        a pointer to the first entry of the table. This is a pointer to char in         a pointer to the first entry of the table. This is a pointer to char in
        the 8-bit library, where the first two bytes of each entry are the num-         the 8-bit library, where the first two bytes of each entry are the num-
       ber of the capturing parenthesis, most significant byte first.  In  the       ber  of  the capturing parenthesis, most significant byte first. In the
       16-bit  library,  the pointer points to 16-bit data units, the first of       16-bit library, the pointer points to 16-bit data units, the  first  of
       which contains the parenthesis  number.  In  the  32-bit  library,  the       which  contains  the  parenthesis  number.  In  the 32-bit library, the
       pointer  points  to  32-bit data units, the first of which contains the       pointer points to 32-bit data units, the first of  which  contains  the
       parenthesis number. The rest of the entry is  the  corresponding  name,       parenthesis  number.  The  rest of the entry is the corresponding name,
        zero terminated.         zero terminated.
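
       The 8-bit entry layout just described (two bytes of group number,
       most significant first, then the zero-terminated name) can be walked
       as in the sketch below. The function name find_group_number is
       hypothetical; the three parameters are assumed to hold the values
       obtained for PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT and
       PCRE_INFO_NAMEENTRYSIZE. The sketch covers the 8-bit library only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Search an 8-bit name table for `name`.
 *   table      - pointer to the first entry
 *   count      - number of entries
 *   entry_size - size of each entry in bytes
 * Returns the group number of the first matching entry, or -1 if absent.
 */
int find_group_number(const unsigned char *table, int count, int entry_size,
                      const char *name)
{
    assert(table != NULL && name != NULL && entry_size > 2);
    for (int i = 0; i < count; i++) {
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
        /* Bytes 0 and 1: the group number, most significant byte first. */
        int number = (entry[0] << 8) | entry[1];
        /* Bytes 2 onwards: the zero-terminated name. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return number;
    }
    return -1;
}
```

       When duplicate names are present, this sketch reports only the
       first entry it finds; a caller that needs every group with that
       name would have to keep scanning.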
   
       The  names are in alphabetical order. Duplicate names may appear if (?|       The names are in alphabetical order. If (?| is used to create  multiple
       is used to create multiple groups with the same number, as described in       groups  with  the same number, as described in the section on duplicate
       the  section  on  duplicate subpattern numbers in the pcrepattern page.       subpattern numbers in the pcrepattern page, the groups may be given the
       Duplicate names for subpatterns with different  numbers  are  permitted       same  name,  but  there is only one entry in the table. Different names
       only  if  PCRE_DUPNAMES  is  set. In all cases of duplicate names, they       for groups of the same number are not permitted.  Duplicate  names  for
       appear in the table in the order in which they were found in  the  pat-       subpatterns with different numbers are permitted, but only if PCRE_DUP-
       tern.  In  the  absence  of (?| this is the order of increasing number;       NAMES is set. They appear in the table in the order in which they  were
       when (?| is used this is not necessarily the case because later subpat-       found  in  the  pattern.  In  the  absence  of (?| this is the order of
       terns may have lower numbers.       increasing number; when (?| is used this is not  necessarily  the  case
        because later subpatterns may have lower numbers.
   
        As  a  simple  example of the name/number table, consider the following         As  a  simple  example of the name/number table, consider the following
        pattern after compilation by the 8-bit library (assume PCRE_EXTENDED is         pattern after compilation by the 8-bit library (assume PCRE_EXTENDED is
Line 2924  INFORMATION ABOUT A PATTERN Line 2978  INFORMATION ABOUT A PATTERN
   
          PCRE_INFO_FIRSTCHARACTER           PCRE_INFO_FIRSTCHARACTER
   
       Return  the  fixed  first character value, if PCRE_INFO_FIRSTCHARACTER-       Return   the  fixed  first  character  value  in  the  situation  where
       FLAGS returned 1; otherwise returns 0. The fourth argument should point       PCRE_INFO_FIRSTCHARACTERFLAGS returns 1; otherwise return 0. The fourth
       to a uint_t variable.       argument should point to a uint_t variable.
   
        In  the 8-bit library, the value is always less than 256. In the 16-bit         In  the 8-bit library, the value is always less than 256. In the 16-bit
        library the value can be up to 0xffff. In the 32-bit library in  UTF-32         library the value can be up to 0xffff. In the 32-bit library in  UTF-32
        mode  the  value  can  be up to 0x10ffff, and up to 0xffffffff when not         mode  the  value  can  be up to 0x10ffff, and up to 0xffffffff when not
        using UTF-32 mode.         using UTF-32 mode.
   
        If there is no fixed first value, and if either  
   
        (a) the pattern was compiled with the PCRE_MULTILINE option, and  every  
        branch starts with "^", or  
   
        (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not  
        set (if it were set, the pattern would be anchored),  
   
        -1 is returned, indicating that the pattern matches only at  the  start  
        of  a  subject string or after any newline within the string. Otherwise  
        -2 is returned. For anchored patterns, -2 is returned.  
   
          PCRE_INFO_REQUIREDCHARFLAGS           PCRE_INFO_REQUIREDCHARFLAGS
   
        Returns 1 if there is a rightmost literal data unit that must exist  in         Returns 1 if there is a rightmost literal data unit that must exist  in
Line 3133  MATCHING A PATTERN: THE TRADITIONAL FUNCTION Line 3175  MATCHING A PATTERN: THE TRADITIONAL FUNCTION
        The callout_data field is used in conjunction with the  "callout"  fea-         The callout_data field is used in conjunction with the  "callout"  fea-
        ture, and is described in the pcrecallout documentation.         ture, and is described in the pcrecallout documentation.
   
       The  tables  field  is  used  to  pass  a  character  tables pointer to       The  tables field is provided for use with patterns that have been pre-
       pcre_exec(); this overrides the value that is stored with the  compiled       compiled using custom character tables, saved to disc or elsewhere, and
       pattern.  A  non-NULL value is stored with the compiled pattern only if       then  reloaded,  because the tables that were used to compile a pattern
       custom tables were supplied to pcre_compile() via  its  tableptr  argu-       are not saved with it. See the pcreprecompile documentation for a  dis-
       ment.  If NULL is passed to pcre_exec() using this mechanism, it forces       cussion  of  saving  compiled patterns for later use. If NULL is passed
       PCRE's internal tables to be used. This facility is  helpful  when  re-       using this mechanism, it forces PCRE's internal tables to be used.
       using  patterns  that  have been saved after compiling with an external 
       set of tables, because the external tables  might  be  at  a  different 
       address  when  pcre_exec() is called. See the pcreprecompile documenta- 
       tion for a discussion of saving compiled patterns for later use. 
   
          Warning: The tables that pcre_exec() uses must be  the  same  as  those
          that  were used when the pattern was compiled. If this is not the case,
          the behaviour of pcre_exec() is undefined. Therefore, when a pattern is
          compiled  and  matched  in the same process, this field should never be
          set. In this (the most common) case, the correct table pointer is auto-
          matically  passed  with  the  compiled  pattern  from pcre_compile() to
          pcre_exec().
   
        If PCRE_EXTRA_MARK is set in the flags field, the mark  field  must  be         If PCRE_EXTRA_MARK is set in the flags field, the mark  field  must  be
        set  to point to a suitable variable. If the pattern contains any back-         set  to point to a suitable variable. If the pattern contains any back-
        tracking control verbs such as (*MARK:NAME), and the execution ends  up         tracking control verbs such as (*MARK:NAME), and the execution ends  up
Line 3351  MATCHING A PATTERN: THE TRADITIONAL FUNCTION Line 3397  MATCHING A PATTERN: THE TRADITIONAL FUNCTION
        points  to  the  start of a character (or the end of the subject). When         points  to  the  start of a character (or the end of the subject). When
        PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid string as a         PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid string as a
        subject  or  an invalid value of startoffset is undefined. Your program         subject  or  an invalid value of startoffset is undefined. Your program
       may crash.       may crash or loop.
   
          PCRE_PARTIAL_HARD           PCRE_PARTIAL_HARD
          PCRE_PARTIAL_SOFT           PCRE_PARTIAL_SOFT
Line 4131  MATCHING A PATTERN: THE ALTERNATIVE FUNCTION Line 4177  MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
        filled  with  the  longest matches. Unlike pcre_exec(), pcre_dfa_exec()         filled  with  the  longest matches. Unlike pcre_exec(), pcre_dfa_exec()
        can use the entire ovector for returning matched strings.         can use the entire ovector for returning matched strings.
   
          NOTE: PCRE's "auto-possessification" optimization  usually  applies  to
          character  repeats at the end of a pattern (as well as internally). For
          example, the pattern "a\d+" is compiled as if it were  "a\d++"  because
          there is no point even considering the possibility of backtracking into
          the repeated digits. For DFA matching, this means that only one  possi-
          ble  match  is  found.  If  you really do want multiple matches in such
          cases,  either  use  an  ungreedy   repeat   ("a\d+?")   or   set   the
          PCRE_NO_AUTO_POSSESS option when compiling.
   
    Error returns from pcre_dfa_exec()     Error returns from pcre_dfa_exec()
   
       The pcre_dfa_exec() function returns a negative number when  it  fails.       The  pcre_dfa_exec()  function returns a negative number when it fails.
       Many  of  the  errors  are  the  same as for pcre_exec(), and these are       Many of the errors are the same  as  for  pcre_exec(),  and  these  are
       described above.  There are in addition the following errors  that  are       described  above.   There are in addition the following errors that are
        specific to pcre_dfa_exec():         specific to pcre_dfa_exec():
   
          PCRE_ERROR_DFA_UITEM      (-16)           PCRE_ERROR_DFA_UITEM      (-16)
   
       This  return is given if pcre_dfa_exec() encounters an item in the pat-       This return is given if pcre_dfa_exec() encounters an item in the  pat-
       tern that it does not support, for instance, the use of \C  or  a  back       tern  that  it  does not support, for instance, the use of \C or a back
        reference.         reference.
   
          PCRE_ERROR_DFA_UCOND      (-17)           PCRE_ERROR_DFA_UCOND      (-17)
   
       This  return  is  given  if pcre_dfa_exec() encounters a condition item       This return is given if pcre_dfa_exec()  encounters  a  condition  item
       that uses a back reference for the condition, or a test  for  recursion       that  uses  a back reference for the condition, or a test for recursion
        in a specific group. These are not supported.         in a specific group. These are not supported.
   
          PCRE_ERROR_DFA_UMLIMIT    (-18)           PCRE_ERROR_DFA_UMLIMIT    (-18)
   
       This  return  is given if pcre_dfa_exec() is called with an extra block       This return is given if pcre_dfa_exec() is called with an  extra  block
       that contains a setting of  the  match_limit  or  match_limit_recursion       that  contains  a  setting  of the match_limit or match_limit_recursion
       fields.  This  is  not  supported (these fields are meaningless for DFA       fields. This is not supported (these fields  are  meaningless  for  DFA
        matching).         matching).
   
          PCRE_ERROR_DFA_WSSIZE     (-19)           PCRE_ERROR_DFA_WSSIZE     (-19)
   
       This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the       This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the
        workspace vector.         workspace vector.
   
          PCRE_ERROR_DFA_RECURSE    (-20)           PCRE_ERROR_DFA_RECURSE    (-20)
   
       When  a  recursive subpattern is processed, the matching function calls       When a recursive subpattern is processed, the matching  function  calls
       itself recursively, using private vectors for  ovector  and  workspace.       itself  recursively,  using  private vectors for ovector and workspace.
       This  error  is  given  if  the output vector is not large enough. This       This error is given if the output vector  is  not  large  enough.  This
        should be extremely rare, as a vector of size 1000 is used.         should be extremely rare, as a vector of size 1000 is used.
   
          PCRE_ERROR_DFA_BADRESTART (-30)           PCRE_ERROR_DFA_BADRESTART (-30)
   
       When pcre_dfa_exec() is called with the PCRE_DFA_RESTART  option,  some       When  pcre_dfa_exec()  is called with the PCRE_DFA_RESTART option, some
       plausibility  checks  are  made on the contents of the workspace, which       plausibility checks are made on the contents of  the  workspace,  which
       should contain data about the previous partial match. If any  of  these       should  contain  data about the previous partial match. If any of these
        checks fail, this error is given.         checks fail, this error is given.
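
       The DFA-specific codes listed above can be turned into short
       diagnostic strings as in the following sketch. The function name
       dfa_error_text is hypothetical. The numeric values are copied from
       the list above and are defined only if pcre.h has not already
       supplied them.

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from the list above; pcre.h, if included first, wins. */
#ifndef PCRE_ERROR_DFA_UITEM
#define PCRE_ERROR_DFA_UITEM      (-16)
#define PCRE_ERROR_DFA_UCOND      (-17)
#define PCRE_ERROR_DFA_UMLIMIT    (-18)
#define PCRE_ERROR_DFA_WSSIZE     (-19)
#define PCRE_ERROR_DFA_RECURSE    (-20)
#define PCRE_ERROR_DFA_BADRESTART (-30)
#endif

/* Map a negative pcre_dfa_exec() return value to a short description.
   Returns NULL for codes that are not specific to pcre_dfa_exec(). */
const char *dfa_error_text(int rc)
{
    assert(rc < 0); /* only failure codes are meaningful here */
    switch (rc) {
    case PCRE_ERROR_DFA_UITEM:
        return "pattern item not supported by DFA matching";
    case PCRE_ERROR_DFA_UCOND:
        return "condition type not supported by DFA matching";
    case PCRE_ERROR_DFA_UMLIMIT:
        return "match limits are not supported by DFA matching";
    case PCRE_ERROR_DFA_WSSIZE:
        return "workspace vector too small";
    case PCRE_ERROR_DFA_RECURSE:
        return "output vector too small during recursion";
    case PCRE_ERROR_DFA_BADRESTART:
        return "workspace failed restart plausibility checks";
    default:
        return NULL;
    }
}
```

       A NULL result means the code is one of those shared with
       pcre_exec(), which a caller would handle separately.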
   
   
 SEE ALSO  SEE ALSO
   
       pcre16(3),   pcre32(3),  pcrebuild(3),  pcrecallout(3),  pcrecpp(3),       pcre16(3),  pcre32(3),  pcrebuild(3),  pcrecallout(3),   pcrecpp(3),
        pcrematching(3), pcrepartial(3), pcreposix(3), pcreprecompile(3), pcre-         pcrematching(3), pcrepartial(3), pcreposix(3), pcreprecompile(3), pcre-
        sample(3), pcrestack(3).         sample(3), pcrestack(3).
   
Line 4193  AUTHOR Line 4248  AUTHOR
   
 REVISION  REVISION
   
       Last updated: 12 May 2013       Last updated: 12 November 2013
        Copyright (c) 1997-2013 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
 ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
   
Line 4256  DESCRIPTION Line 4311  DESCRIPTION
        independent groups).         independent groups).
   
        Automatic callouts can be used for tracking  the  progress  of  pattern         Automatic callouts can be used for tracking  the  progress  of  pattern
       matching.  The pcretest command has an option that sets automatic call-       matching.   The pcretest program has a pattern qualifier (/C) that sets
       outs; when it is used, the output indicates how the pattern is matched.       automatic callouts; when it is used, the output indicates how the  pat-
       This  is useful information when you are trying to optimize the perfor-       tern  is  being matched. This is useful information when you are trying
       mance of a particular pattern.       to optimize the performance of a particular pattern.
   
   
 MISSING CALLOUTS  MISSING CALLOUTS
   
       You should be aware that, because of  optimizations  in  the  way  PCRE       You should be aware that, because of optimizations in the way PCRE com-
       matches  patterns  by  default,  callouts  sometimes do not happen. For       piles and matches patterns, callouts sometimes do not happen exactly as
       example, if the pattern is       you might expect.
   
          At compile time, PCRE "auto-possessifies" repeated items when it  knows
          that  what follows cannot be part of the repeat. For example, a+[bc] is
          compiled as if it were a++[bc]. The pcretest output when  this  pattern
          is  anchored  and  then  applied  with automatic callouts to the string
          "aaaa" is:
   
            --->aaaa
             +0 ^        ^
             +1 ^        a+
             +3 ^   ^    [bc]
            No match
   
          This indicates that when matching [bc] fails, there is no  backtracking
          into  a+  and  therefore the callouts that would be taken for the back-
          tracks do not occur.  You can disable the  auto-possessify  feature  by
          passing PCRE_NO_AUTO_POSSESS to pcre_compile(), or starting the pattern
          with (*NO_AUTO_POSSESS). If this is done  in  pcretest  (using  the  /O
          qualifier), the output changes to this:
   
            --->aaaa
             +0 ^        ^
             +1 ^        a+
             +3 ^   ^    [bc]
             +3 ^  ^     [bc]
             +3 ^ ^      [bc]
             +3 ^^       [bc]
            No match
   
          This time, when matching [bc] fails, the matcher backtracks into a+ and
          tries again, repeatedly, until a+ itself fails.
   
          Other optimizations that provide fast "no match"  results  also  affect
          callouts.  For example, if the pattern is
   
          ab(?C4)cd           ab(?C4)cd
   
        PCRE knows that any matching string must contain the letter "d". If the         PCRE knows that any matching string must contain the letter "d". If the
       subject  string  is "abyz", the lack of "d" means that matching doesn't       subject string is "abyz", the lack of "d" means that  matching  doesn't
       ever start, and the callout is never  reached.  However,  with  "abyd",       ever  start,  and  the  callout is never reached. However, with "abyd",
        though the result is still no match, the callout is obeyed.         though the result is still no match, the callout is obeyed.
   
       If  the pattern is studied, PCRE knows the minimum length of a matching       If the pattern is studied, PCRE knows the minimum length of a  matching
       string, and will immediately give a "no match" return without  actually       string,  and will immediately give a "no match" return without actually
       running  a  match if the subject is not long enough, or, for unanchored       running a match if the subject is not long enough, or,  for  unanchored
        patterns, if it has been scanned far enough.         patterns, if it has been scanned far enough.
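
The effect of these two checks on callouts can be pictured with a small stand-alone sketch. It is not PCRE's implementation: the structure and function names (demo_pattern_facts, demo_should_attempt) are hypothetical, and the sketch only models the "required character" and "minimum length" tests described above for the pattern ab(?C4)cd.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for facts derived from a pattern; these names
   are illustrative and are not part of the PCRE API. */
typedef struct {
    int    required_char;  /* a character every match must contain, or -1 */
    size_t min_length;     /* minimum length of a matching string */
} demo_pattern_facts;

/* Returns 1 if a real match attempt is worth starting, or 0 if "no match"
   can be reported at once, in which case no callout would be reached. */
static int demo_should_attempt(const demo_pattern_facts *facts,
                               const char *subject)
{
    size_t length = strlen(subject);

    if (length < facts->min_length)
        return 0;  /* subject is too short to match */
    if (facts->required_char >= 0 &&
        memchr(subject, facts->required_char, length) == NULL)
        return 0;  /* the required character is absent */
    return 1;
}
```

With facts of { 'd', 4 }, the subject "abyz" is rejected before matching starts, whereas "abyd" passes both checks, so the match runs and the callout is reached even though the overall result is still no match.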
   
       You can disable these optimizations by passing the  PCRE_NO_START_OPTI-       You  can disable these optimizations by passing the PCRE_NO_START_OPTI-
       MIZE  option  to the matching function, or by starting the pattern with       MIZE option to the matching function, or by starting the  pattern  with
       (*NO_START_OPT). This slows down the matching process, but does  ensure       (*NO_START_OPT).  This slows down the matching process, but does ensure
        that callouts such as the example above are obeyed.         that callouts such as the example above are obeyed.
   
   
 THE CALLOUT INTERFACE  THE CALLOUT INTERFACE
   
       During  matching, when PCRE reaches a callout point, the external func-       During matching, when PCRE reaches a callout point, the external  func-
        tion defined by pcre_callout or pcre[16|32]_callout is called (if it is         tion defined by pcre_callout or pcre[16|32]_callout is called (if it is
       set).  This  applies to both normal and DFA matching. The only argument       set). This applies to both normal and DFA matching. The  only  argument
       to  the  callout  function  is  a  pointer   to   a   pcre_callout   or       to   the   callout   function   is  a  pointer  to  a  pcre_callout  or
       pcre[16|32]_callout  block.   These  structures  contain  the following       pcre[16|32]_callout block.  These  structures  contain  the   following
        fields:         fields:
   
          int           version;           int           version;
Line 4313  THE CALLOUT INTERFACE Line 4402  THE CALLOUT INTERFACE
          const PCRE_UCHAR16  *mark;       (16-bit version)           const PCRE_UCHAR16  *mark;       (16-bit version)
          const PCRE_UCHAR32  *mark;       (32-bit version)           const PCRE_UCHAR32  *mark;       (32-bit version)
   
       The version field is an integer containing the version  number  of  the       The  version  field  is an integer containing the version number of the
       block  format. The initial version was 0; the current version is 2. The       block format. The initial version was 0; the current version is 2.  The
       version number will change again in future  if  additional  fields  are       version  number  will  change  again in future if additional fields are
        added, but the intention is never to remove any of the existing fields.         added, but the intention is never to remove any of the existing fields.
   
       The  callout_number  field  contains the number of the callout, as com-       The callout_number field contains the number of the  callout,  as  com-
       piled into the pattern (that is, the number after ?C for  manual  call-       piled  into  the pattern (that is, the number after ?C for manual call-
        outs, and 255 for automatically generated callouts).         outs, and 255 for automatically generated callouts).
   
       The  offset_vector field is a pointer to the vector of offsets that was       The offset_vector field is a pointer to the vector of offsets that  was
       passed by the caller to the  matching  function.  When  pcre_exec()  or       passed  by  the  caller  to  the matching function. When pcre_exec() or
       pcre[16|32]_exec()  is used, the contents can be inspected, in order to       pcre[16|32]_exec() is used, the contents can be inspected, in order  to
       extract substrings that have been matched so far, in the  same  way  as       extract  substrings  that  have been matched so far, in the same way as
       for  extracting  substrings  after  a  match has completed. For the DFA       for extracting substrings after a match  has  completed.  For  the  DFA
        matching functions, this field is not useful.         matching functions, this field is not useful.
   
        The subject and subject_length fields contain copies of the values that         The subject and subject_length fields contain copies of the values that
        were passed to the matching function.         were passed to the matching function.
   
       The  start_match  field normally contains the offset within the subject       The start_match field normally contains the offset within  the  subject
       at which the current match attempt  started.  However,  if  the  escape       at  which  the  current  match  attempt started. However, if the escape
       sequence  \K has been encountered, this value is changed to reflect the       sequence \K has been encountered, this value is changed to reflect  the
       modified starting point. If the pattern is not  anchored,  the  callout       modified  starting  point.  If the pattern is not anchored, the callout
        function may be called several times from the same point in the pattern         function may be called several times from the same point in the pattern
        for different starting points in the subject.         for different starting points in the subject.
   
       The current_position field contains the offset within  the  subject  of       The  current_position  field contains the offset within the subject of
        the current match pointer.         the current match pointer.
   
       When  pcre_exec()  or  pcre[16|32]_exec()  is  used,  the  capture_top       When pcre_exec() or pcre[16|32]_exec()  is  used,  the  capture_top
       field contains one more than the number of the  highest  numbered  cap-       field  contains  one  more than the number of the highest numbered cap-
       tured  substring so far. If no substrings have been captured, the value       tured substring so far. If no substrings have been captured, the  value
       of capture_top is one. This is always the case when the  DFA  functions       of  capture_top  is one. This is always the case when the DFA functions
        are used, because they do not support captured substrings.         are used, because they do not support captured substrings.
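
As a rough illustration of how offset_vector and capture_top fit together, the following sketch lists the substrings captured so far. It uses a locally declared stand-in type (demo_capture_view, a hypothetical name) holding only the three fields it needs, so that it compiles without pcre.h. It assumes the usual layout in which group n occupies elements 2n and 2n+1 of the vector and an unset group is marked by -1.

```c
#include <stdio.h>

/* Stand-in holding the subset of callout-block fields used here. The
   field names follow the text above; the type name is hypothetical. */
typedef struct {
    int        *offset_vector;
    const char *subject;
    int         capture_top;
} demo_capture_view;

/* Writes each group captured so far into out as "n=text;" and returns
   how many groups were written. Output that does not fit is truncated. */
static int demo_list_captures(const demo_capture_view *cb,
                              char *out, size_t out_size)
{
    int set_count = 0;
    size_t used = 0;

    if (out_size > 0)
        out[0] = '\0';
    /* Group 0 (the whole match) is not complete during a callout, so
       start at group 1 and stop below capture_top. */
    for (int n = 1; n < cb->capture_top; n++) {
        int start = cb->offset_vector[2 * n];
        int end   = cb->offset_vector[2 * n + 1];
        if (start < 0 || end < start)
            continue;  /* this group has not been set */
        int written = snprintf(out + used, out_size - used, "%d=%.*s;",
                               n, end - start, cb->subject + start);
        if (written < 0 || (size_t)written >= out_size - used)
            break;  /* output buffer is full */
        used += (size_t)written;
        set_count++;
    }
    return set_count;
}
```

In a real callout function the same loop would read the fields directly from the block that PCRE passes in.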
   
       The  capture_last  field  contains the number of the most recently cap-       The capture_last field contains the number of the  most  recently  cap-
       tured substring. However, when a recursion exits, the value reverts  to       tured  substring. However, when a recursion exits, the value reverts to
       what  it  was  outside  the recursion, as do the values of all captured       what it was outside the recursion, as do the  values  of  all  captured
       substrings. If no substrings have been  captured,  the  value  of  cap-       substrings.  If  no  substrings  have  been captured, the value of cap-
       ture_last  is  -1.  This  is always the case for the DFA matching func-       ture_last is -1. This is always the case for  the  DFA  matching  func-
        tions.         tions.
   
       The callout_data field contains a value that is passed  to  a  matching       The  callout_data  field  contains a value that is passed to a matching
       function  specifically so that it can be passed back in callouts. It is       function specifically so that it can be passed back in callouts. It  is
       passed in the callout_data field of a pcre_extra  or  pcre[16|32]_extra       passed  in  the callout_data field of a pcre_extra or pcre[16|32]_extra
       data  structure.  If no such data was passed, the value of callout_data       data structure. If no such data was passed, the value  of  callout_data
       in a callout block is NULL. There is a description  of  the  pcre_extra       in  a  callout  block is NULL. There is a description of the pcre_extra
        structure in the pcreapi documentation.         structure in the pcreapi documentation.
   
       The  pattern_position  field  is  present from version 1 of the callout       The pattern_position field is present from version  1  of  the  callout
        structure. It contains the offset to the next item to be matched in the         structure. It contains the offset to the next item to be matched in the
        pattern string.         pattern string.
   
       The  next_item_length  field  is  present from version 1 of the callout       The next_item_length field is present from version  1  of  the  callout
        structure. It contains the length of the next item to be matched in the         structure. It contains the length of the next item to be matched in the
       pattern  string.  When  the callout immediately precedes an alternation       pattern string. When the callout immediately  precedes  an  alternation
       bar, a closing parenthesis, or the end of the pattern,  the  length  is       bar,  a  closing  parenthesis, or the end of the pattern, the length is
       zero.  When  the callout precedes an opening parenthesis, the length is       zero. When the callout precedes an opening parenthesis, the  length  is
        that of the entire subpattern.         that of the entire subpattern.
   
       The pattern_position and next_item_length fields are intended  to  help       The  pattern_position  and next_item_length fields are intended to help
       in  distinguishing between different automatic callouts, which all have       in distinguishing between different automatic callouts, which all  have
        the same callout number. However, they are set for all callouts.         the same callout number. However, they are set for all callouts.
   
       The mark field is present from version 2 of the callout  structure.  In       The  mark  field is present from version 2 of the callout structure. In
       callouts  from  pcre_exec() or pcre[16|32]_exec() it contains a pointer       callouts from pcre_exec() or pcre[16|32]_exec() it contains  a  pointer
       to the zero-terminated  name  of  the  most  recently  passed  (*MARK),       to  the  zero-terminated  name  of  the  most  recently passed (*MARK),
       (*PRUNE),  or  (*THEN) item in the match, or NULL if no such items have       (*PRUNE), or (*THEN) item in the match, or NULL if no such  items  have
       been passed. Instances of (*PRUNE) or (*THEN) without  a  name  do  not       been  passed.  Instances  of  (*PRUNE) or (*THEN) without a name do not
       obliterate  a previous (*MARK). In callouts from the DFA matching func-       obliterate a previous (*MARK). In callouts from the DFA matching  func-
        tions this field always contains NULL.         tions this field always contains NULL.
   
   
 RETURN VALUES  RETURN VALUES
   
       The external callout function returns an integer to PCRE. If the  value       The  external callout function returns an integer to PCRE. If the value
       is  zero,  matching  proceeds  as  normal. If the value is greater than       is zero, matching proceeds as normal. If  the  value  is  greater  than
       zero, matching fails at the current point, but  the  testing  of  other       zero,  matching  fails  at  the current point, but the testing of other
        matching possibilities goes ahead, just as if a lookahead assertion had         matching possibilities goes ahead, just as if a lookahead assertion had
       failed. If the value is less than zero, the match is abandoned, and the       failed. If the value is less than zero, the match is abandoned, and the
        matching function returns the negative value.         matching function returns the negative value.
   
       Negative   values   should   normally   be   chosen  from  the  set  of       Negative  values  should  normally  be   chosen   from   the   set   of
        PCRE_ERROR_xxx values. In particular, PCRE_ERROR_NOMATCH forces a stan-         PCRE_ERROR_xxx values. In particular, PCRE_ERROR_NOMATCH forces a stan-
       dard  "no  match"  failure.   The  error  number  PCRE_ERROR_CALLOUT is       dard "no  match"  failure.   The  error  number  PCRE_ERROR_CALLOUT  is
       reserved for use by callout functions; it will never be  used  by  PCRE       reserved  for  use  by callout functions; it will never be used by PCRE
        itself.         itself.
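
The three kinds of return value can be sketched as follows. This is an illustration only: demo_callout_info is a hypothetical stand-in for the callout block (declared locally so the sketch compiles without pcre.h), DEMO_ERROR_CALLOUT mirrors the value that pcre.h is assumed to give PCRE_ERROR_CALLOUT, and the policy itself (a call budget passed through callout_data, plus a veto on odd positions for callout 4) is invented for the example.

```c
#include <stddef.h>

/* Assumed to mirror PCRE_ERROR_CALLOUT from pcre.h; defined locally so
   that the sketch is self-contained. */
#define DEMO_ERROR_CALLOUT (-9)

/* Stand-in with the subset of callout-block fields used here. */
typedef struct {
    int   callout_number;
    int   current_position;
    void *callout_data;
} demo_callout_info;

/* Example policy: callout_data points to an int holding the number of
   callouts still allowed before the match is abandoned. */
static int demo_callout(demo_callout_info *cb)
{
    int *budget = (int *)cb->callout_data;

    if (budget != NULL && --(*budget) < 0)
        return DEMO_ERROR_CALLOUT;  /* negative: abandon the whole match */
    if (cb->callout_number == 4 && cb->current_position % 2 != 0)
        return 1;                   /* positive: fail here, try alternatives */
    return 0;                       /* zero: matching proceeds as normal */
}
```

A function of this shape, taking the real block type, is what would be assigned to pcre_callout.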
   
   
Line 4411  AUTHOR Line 4500  AUTHOR
   
 REVISION  REVISION
   
       Last updated: 03 March 2013       Last updated: 12 November 2013
        Copyright (c) 1997-2013 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
 ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
   
Line 4533  DIFFERENCES BETWEEN PCRE AND PERL Line 4622  DIFFERENCES BETWEEN PCRE AND PERL
   
        15. Perl recognizes comments in some places that  PCRE  does  not,  for         15. Perl recognizes comments in some places that  PCRE  does  not,  for
        example,  between  the  ( and ? at the start of a subpattern. If the /x         example,  between  the  ( and ? at the start of a subpattern. If the /x
       modifier is set, Perl allows white space between ( and ? but PCRE never       modifier is set, Perl allows white space between ( and ?  (though  cur-
       does, even if the PCRE_EXTENDED option is set.       rent  Perls  warn that this is deprecated) but PCRE never does, even if
        the PCRE_EXTENDED option is set.
   
       16.  In  PCRE,  the upper/lower case character properties Lu and Ll are       16. Perl, when in warning mode, gives warnings  for  character  classes
        such  as  [A-\d] or [a-[:digit:]]. It then treats the hyphens as liter-
        als. PCRE has no warning features, so it gives an error in these  cases
        because they are almost certainly user mistakes.
 
        17.  In  PCRE,  the upper/lower case character properties Lu and Ll are
        not affected when case-independent matching is specified. For  example,         not affected when case-independent matching is specified. For  example,
        \p{Lu} always matches an upper case letter. I think Perl has changed in         \p{Lu} always matches an upper case letter. I think Perl has changed in
        this respect; in the release at the time of writing (5.16), \p{Lu}  and         this respect; in the release at the time of writing (5.16), \p{Lu}  and
        \p{Ll} match all letters, regardless of case, when case independence is         \p{Ll} match all letters, regardless of case, when case independence is
        specified.         specified.
   
       17. PCRE provides some extensions to the Perl regular expression facil-       18. PCRE provides some extensions to the Perl regular expression facil-
        ities.   Perl  5.10  includes new features that are not in earlier ver-         ities.   Perl  5.10  includes new features that are not in earlier ver-
        sions of Perl, some of which (such as named parentheses) have  been  in         sions of Perl, some of which (such as named parentheses) have  been  in
        PCRE for some time. This list is with respect to Perl 5.10:         PCRE for some time. This list is with respect to Perl 5.10:
Line 4600  AUTHOR Line 4695  AUTHOR
   
 REVISION  REVISION
   
       Last updated: 19 March 2013       Last updated: 10 November 2013
        Copyright (c) 1997-2013 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
 ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
   
Line 4679  SPECIAL START-OF-PATTERN ITEMS Line 4774  SPECIAL START-OF-PATTERN ITEMS
   
    Unicode property support     Unicode property support
   
       Another special sequence that may appear at the start of a pattern is       Another special sequence that may appear at the start of a  pattern  is
        (*UCP).   This  has  the same effect as setting the PCRE_UCP option: it
        causes sequences such as \d and \w to use Unicode properties to  deter-
        mine character types, instead of recognizing only characters with codes
        less than 128 via a lookup table.
   
         (*UCP)   Disabling auto-possessification
   
       This has the same effect as setting  the  PCRE_UCP  option:  it  causes       If a pattern starts with (*NO_AUTO_POSSESS), it has the same effect  as
       sequences  such  as  \d  and  \w to use Unicode properties to determine       setting  the  PCRE_NO_AUTO_POSSESS  option  at compile time. This stops
       character types, instead of recognizing only characters with codes less       PCRE from making quantifiers possessive when what follows cannot  match
       than 128 via a lookup table.       the  repeated item. For example, by default a+b is treated as a++b. For
        more details, see the pcreapi documentation.
   
    Disabling start-up optimizations     Disabling start-up optimizations
   
       If  a  pattern  starts  with (*NO_START_OPT), it has the same effect as       If a pattern starts with (*NO_START_OPT), it has  the  same  effect  as
        setting the PCRE_NO_START_OPTIMIZE option either at compile or matching         setting the PCRE_NO_START_OPTIMIZE option either at compile or matching
       time.       time. This disables several  optimizations  for  quickly  reaching  "no
        match" results. For more details, see the pcreapi documentation.
   
    Newline conventions     Newline conventions
   
Line 4746  SPECIAL START-OF-PATTERN ITEMS Line 4847  SPECIAL START-OF-PATTERN ITEMS
          (*LIMIT_RECURSION=d)           (*LIMIT_RECURSION=d)
   
        where d is any number of decimal digits. However, the value of the set-         where d is any number of decimal digits. However, the value of the set-
       ting must be less than the value set by the caller of  pcre_exec()  for       ting must be less than the value set (or defaulted) by  the  caller  of
       it to have any effect. In other words, the pattern writer can lower the       pcre_exec()  for  it  to  have  any effect. In other words, the pattern
       limit set by the programmer, but not raise it. If there  is  more  than       writer can lower the limits set by the programmer, but not raise  them.
       one setting of one of these limits, the lower value is used.       If  there  is  more  than one setting of one of these limits, the lower
        value is used.
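
The rule amounts to taking the lowest of the caller's limit and any in-pattern settings. A minimal sketch, with a hypothetical function name and with the in-pattern values assumed to have been parsed already:

```c
#include <stddef.h>

/* Returns the limit that takes effect: the caller's limit, lowered by
   any in-pattern setting that is smaller. Settings that are larger than
   the caller's limit have no effect. */
static unsigned long demo_effective_limit(unsigned long caller_limit,
                                          const unsigned long *pattern_limits,
                                          size_t count)
{
    unsigned long effective = caller_limit;

    for (size_t i = 0; i < count; i++) {
        if (pattern_limits[i] < effective)
            effective = pattern_limits[i];  /* can lower, never raise */
    }
    return effective;
}
```

For a caller limit of 10000000 and in-pattern settings of 5000, 20000000 and 3000, the sketch yields 3000.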
   
   
 EBCDIC CHARACTER CODES  EBCDIC CHARACTER CODES
   
       PCRE  can  be compiled to run in an environment that uses EBCDIC as its       PCRE can be compiled to run in an environment that uses EBCDIC  as  its
        character code rather than ASCII or Unicode (typically a mainframe sys-         character code rather than ASCII or Unicode (typically a mainframe sys-
       tem).  In  the  sections below, character code values are ASCII or Uni-       tem). In the sections below, character code values are  ASCII  or  Uni-
        code; in an EBCDIC environment these characters may have different code         code; in an EBCDIC environment these characters may have different code
        values, and there are no code points greater than 255.         values, and there are no code points greater than 255.
   
   
 CHARACTERS AND METACHARACTERS  CHARACTERS AND METACHARACTERS
   
       A  regular  expression  is  a pattern that is matched against a subject       A regular expression is a pattern that is  matched  against  a  subject
       string from left to right. Most characters stand for  themselves  in  a       string  from  left  to right. Most characters stand for themselves in a
       pattern,  and  match  the corresponding characters in the subject. As a       pattern, and match the corresponding characters in the  subject.  As  a
        trivial example, the pattern         trivial example, the pattern
   
          The quick brown fox           The quick brown fox
   
        matches a portion of a subject string that is identical to itself. When         matches a portion of a subject string that is identical to itself. When
       caseless  matching is specified (the PCRE_CASELESS option), letters are       caseless matching is specified (the PCRE_CASELESS option), letters  are
       matched independently of case. In a UTF mode, PCRE  always  understands       matched  independently  of case. In a UTF mode, PCRE always understands
       the  concept  of case for characters whose values are less than 128, so       the concept of case for characters whose values are less than  128,  so
       caseless matching is always possible. For characters with  higher  val-       caseless  matching  is always possible. For characters with higher val-
       ues,  the concept of case is supported if PCRE is compiled with Unicode       ues, the concept of case is supported if PCRE is compiled with  Unicode
       property support, but not otherwise.   If  you  want  to  use  caseless       property  support,  but  not  otherwise.   If  you want to use caseless
       matching  for  characters  128  and above, you must ensure that PCRE is       matching for characters 128 and above, you must  ensure  that  PCRE  is
        compiled with Unicode property support as well as with UTF support.         compiled with Unicode property support as well as with UTF support.
   
       The power of regular expressions comes  from  the  ability  to  include       The  power  of  regular  expressions  comes from the ability to include
       alternatives  and  repetitions in the pattern. These are encoded in the       alternatives and repetitions in the pattern. These are encoded  in  the
        pattern by the use of metacharacters, which do not stand for themselves         pattern by the use of metacharacters, which do not stand for themselves
        but instead are interpreted in some special way.         but instead are interpreted in some special way.
   
       There  are  two different sets of metacharacters: those that are recog-       There are two different sets of metacharacters: those that  are  recog-
       nized anywhere in the pattern except within square brackets, and  those       nized  anywhere in the pattern except within square brackets, and those
       that  are  recognized  within square brackets. Outside square brackets,       that are recognized within square brackets.  Outside  square  brackets,
        the metacharacters are as follows:         the metacharacters are as follows:
   
          \      general escape character with several uses           \      general escape character with several uses
Line 4806  CHARACTERS AND METACHARACTERS Line 4908  CHARACTERS AND METACHARACTERS
                 also "possessive quantifier"                  also "possessive quantifier"
          {      start min/max quantifier           {      start min/max quantifier
   
       Part of a pattern that is in square brackets  is  called  a  "character       Part  of  a  pattern  that is in square brackets is called a "character
        class". In a character class the only metacharacters are:         class". In a character class the only metacharacters are:
   
          \      general escape character           \      general escape character
Line 4823  BACKSLASH Line 4925  BACKSLASH
   
        The backslash character has several uses. Firstly, if it is followed by         The backslash character has several uses. Firstly, if it is followed by
        a character that is not a number or a letter, it takes away any special         a character that is not a number or a letter, it takes away any special
       meaning  that  character  may  have. This use of backslash as an escape       meaning that character may have. This use of  backslash  as  an  escape
        character applies both inside and outside character classes.         character applies both inside and outside character classes.
   
       For example, if you want to match a * character, you write  \*  in  the       For  example,  if  you want to match a * character, you write \* in the
       pattern.   This  escaping  action  applies whether or not the following       pattern.  This escaping action applies whether  or  not  the  following
       character would otherwise be interpreted as a metacharacter, so  it  is       character  would  otherwise be interpreted as a metacharacter, so it is
       always  safe  to  precede  a non-alphanumeric with backslash to specify       always safe to precede a non-alphanumeric  with  backslash  to  specify
       that it stands for itself. In particular, if you want to match a  back-       that  it stands for itself. In particular, if you want to match a back-
        slash, you write \\.         slash, you write \\.
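
Because escaping any non-alphanumeric is safe, literal text can be prepared for inclusion in a pattern mechanically. The helper below is a sketch under stated assumptions: the name demo_escape_literal is hypothetical, the execution character set is assumed to be ASCII, and bytes above 127 are copied unchanged.

```c
#include <stddef.h>

/* Copies text into out, putting a backslash before every ASCII character
   that is not a letter or a digit. Returns the length written, or
   (size_t)-1 if out is too small for the result and its terminator. */
static size_t demo_escape_literal(const char *text, char *out, size_t out_size)
{
    size_t used = 0;

    for (; *text != '\0'; text++) {
        unsigned char c = (unsigned char)*text;
        int is_alnum = (c >= '0' && c <= '9') ||
                       (c >= 'A' && c <= 'Z') ||
                       (c >= 'a' && c <= 'z');
        /* Bytes above 127 are left as they are. */
        int needs_escape = !is_alnum && c < 128;

        if (used + (needs_escape ? 2u : 1u) >= out_size)
            return (size_t)-1;
        if (needs_escape)
            out[used++] = '\\';
        out[used++] = (char)c;
    }
    if (used >= out_size)
        return (size_t)-1;
    out[used] = '\0';
    return used;
}
```

Given the text a*b, the helper produces a\*b, which matches the three characters literally.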
   
       In  a UTF mode, only ASCII numbers and letters have any special meaning       In a UTF mode, only ASCII numbers and letters have any special  meaning
       after a backslash. All other characters  (in  particular,  those  whose       after  a  backslash.  All  other characters (in particular, those whose
        codepoints are greater than 127) are treated as literals.         codepoints are greater than 127) are treated as literals.
   
       If  a pattern is compiled with the PCRE_EXTENDED option, white space in       If a pattern is compiled with  the  PCRE_EXTENDED  option,  most  white
       the pattern (other than in a character class) and characters between  a       space  in the pattern (other than in a character class), and characters
       # outside a character class and the next newline are ignored. An escap-       between a # outside a character class and the next newline,  inclusive,
       ing backslash can be used to include a white space or  #  character  as       are ignored. An escaping backslash can be used to include a white space
       part of the pattern.       or # character as part of the pattern.
   
       If  you  want  to remove the special meaning from a sequence of charac-       If you want to remove the special meaning from a  sequence  of  charac-
       ters, you can do so by putting them between \Q and \E. This is  differ-       ters,  you can do so by putting them between \Q and \E. This is differ-
       ent  from  Perl  in  that  $  and  @ are handled as literals in \Q...\E       ent from Perl in that $ and  @  are  handled  as  literals  in  \Q...\E
       sequences in PCRE, whereas in Perl, $ and @ cause  variable  interpola-       sequences  in  PCRE, whereas in Perl, $ and @ cause variable interpola-
        tion. Note the following examples:         tion. Note the following examples:
   
          Pattern            PCRE matches   Perl matches           Pattern            PCRE matches   Perl matches
Line 4856  BACKSLASH Line 4958  BACKSLASH
          \Qabc\$xyz\E       abc\$xyz       abc\$xyz           \Qabc\$xyz\E       abc\$xyz       abc\$xyz
          \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz           \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
   
       The  \Q...\E  sequence  is recognized both inside and outside character       The \Q...\E sequence is recognized both inside  and  outside  character
       classes.  An isolated \E that is not preceded by \Q is ignored.  If  \Q       classes.   An  isolated \E that is not preceded by \Q is ignored. If \Q
       is  not followed by \E later in the pattern, the literal interpretation       is not followed by \E later in the pattern, the literal  interpretation
       continues to the end of the pattern (that is,  \E  is  assumed  at  the       continues  to  the  end  of  the pattern (that is, \E is assumed at the
       end).  If  the  isolated \Q is inside a character class, this causes an       end). If the isolated \Q is inside a character class,  this  causes  an
        error, because the character class is not terminated.         error, because the character class is not terminated.
   
    Non-printing characters     Non-printing characters
   
        A second use of backslash provides a way of encoding non-printing char-         A second use of backslash provides a way of encoding non-printing char-
       acters  in patterns in a visible manner. There is no restriction on the       acters in patterns in a visible manner. There is no restriction on  the
       appearance of non-printing characters, apart from the binary zero  that       appearance  of non-printing characters, apart from the binary zero that
       terminates  a  pattern,  but  when  a pattern is being prepared by text       terminates a pattern, but when a pattern  is  being  prepared  by  text
       editing, it is  often  easier  to  use  one  of  the  following  escape       editing,  it  is  often  easier  to  use  one  of  the following escape
        sequences than the binary character it represents:         sequences than the binary character it represents:
   
          \a        alarm, that is, the BEL character (hex 07)           \a        alarm, that is, the BEL character (hex 07)
Line 4879  BACKSLASH Line 4981  BACKSLASH
          \n        linefeed (hex 0A)           \n        linefeed (hex 0A)
          \r        carriage return (hex 0D)           \r        carriage return (hex 0D)
          \t        tab (hex 09)           \t        tab (hex 09)
            \0dd      character with octal code 0dd
          \ddd      character with octal code ddd, or back reference           \ddd      character with octal code ddd, or back reference
            \o{ddd..} character with octal code ddd..
          \xhh      character with hex code hh           \xhh      character with hex code hh
          \x{hhh..} character with hex code hhh.. (non-JavaScript mode)           \x{hhh..} character with hex code hhh.. (non-JavaScript mode)
          \uhhhh    character with hex code hhhh (JavaScript mode only)           \uhhhh    character with hex code hhhh (JavaScript mode only)
   
       The  precise effect of \cx on ASCII characters is as follows: if x is a       The precise effect of \cx on ASCII characters is as follows: if x is  a
       lower case letter, it is converted to upper case. Then  bit  6  of  the       lower  case  letter,  it  is converted to upper case. Then bit 6 of the
        character (hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A         character (hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A
       (A is 41, Z is 5A), but \c{ becomes hex 3B ({ is 7B), and  \c;  becomes       (A  is  41, Z is 5A), but \c{ becomes hex 3B ({ is 7B), and \c; becomes
       hex  7B (; is 3B). If the data item (byte or 16-bit value) following \c       hex 7B (; is 3B). If the data item (byte or 16-bit value) following  \c
       has a value greater than 127, a compile-time error occurs.  This  locks       has  a  value greater than 127, a compile-time error occurs. This locks
        out non-ASCII characters in all modes.         out non-ASCII characters in all modes.
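
The ASCII rule just described can be written out directly. This is a sketch of the arithmetic only, not PCRE's code; the function name is hypothetical, an ASCII execution character set is assumed, and -1 stands in for the compile-time error.

```c
/* Returns the character denoted by \cx for an ASCII x, or -1 for a value
   above 127, where the text says a compile-time error occurs. */
static int demo_control_escape_ascii(int x)
{
    if (x < 0 || x > 127)
        return -1;
    if (x >= 'a' && x <= 'z')
        x = x - 'a' + 'A';  /* lower case is converted to upper case */
    return x ^ 0x40;        /* bit 6 (hex 40) is inverted */
}
```

This reproduces the values in the text: A gives hex 01, Z gives hex 1A, { gives hex 3B, and ; gives hex 7B.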
   
       The  \c  facility  was designed for use with ASCII characters, but with       The \c facility was designed for use with ASCII  characters,  but  with
       the extension to Unicode it is even less useful than it  once  was.  It       the  extension  to  Unicode it is even less useful than it once was. It
       is,  however,  recognized  when  PCRE is compiled in EBCDIC mode, where       is, however, recognized when PCRE is compiled  in  EBCDIC  mode,  where
       data items are always bytes. In this mode, all values are  valid  after       data  items  are always bytes. In this mode, all values are valid after
       \c.  If  the  next character is a lower case letter, it is converted to       \c. If the next character is a lower case letter, it  is  converted  to
       upper case. Then the 0xc0 bits of  the  byte  are  inverted.  Thus  \cA       upper  case.  Then  the  0xc0  bits  of the byte are inverted. Thus \cA
       becomes  hex  01, as in ASCII (A is C1), but because the EBCDIC letters       becomes hex 01, as in ASCII (A is C1), but because the  EBCDIC  letters
       are disjoint, \cZ becomes hex 29 (Z is E9), and other  characters  also       are  disjoint,  \cZ becomes hex 29 (Z is E9), and other characters also
        generate different values.         generate different values.
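
A comparable sketch for the EBCDIC rule is shown below. It is not PCRE's code. The function name is hypothetical, and the lower case letter ranges (hex 81-89, 91-99, A2-A9, each hex 40 below the upper case form) are an assumption based on the common EBCDIC code pages; they are not stated in the text above.

```c
/* Hypothetical helper: x is an EBCDIC byte value. Lower case letters are
   assumed to lie in 0x81-0x89, 0x91-0x99 and 0xA2-0xA9, with the upper
   case forms 0x40 higher. */
unsigned ctrl_escape_ebcdic(unsigned x)
{
    x &= 0xFFu;
    if ((x >= 0x81 && x <= 0x89) ||
        (x >= 0x91 && x <= 0x99) ||
        (x >= 0xA2 && x <= 0xA9))
        x += 0x40;        /* convert lower case to upper case */
    return x ^ 0xC0u;     /* invert the 0xc0 bits */
}
```

Passing hex C1 (A) gives hex 01 and passing hex E9 (Z) gives hex 29, as in the paragraph above.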
   
       By  default,  after  \x,  from  zero to two hexadecimal digits are read       After \0 up to two further octal digits are read. If  there  are  fewer
       (letters can be in upper or lower case). Any number of hexadecimal dig-       than  two  digits,  just  those  that  are  present  are used. Thus the
       its may appear between \x{ and }, but the character code is constrained 
       as follows: 
 
         8-bit non-UTF mode    less than 0x100 
         8-bit UTF-8 mode      less than 0x10ffff and a valid codepoint 
         16-bit non-UTF mode   less than 0x10000 
         16-bit UTF-16 mode    less than 0x10ffff and a valid codepoint 
         32-bit non-UTF mode   less than 0x80000000 
         32-bit UTF-32 mode    less than 0x10ffff and a valid codepoint 
 
       Invalid Unicode codepoints are the range  0xd800  to  0xdfff  (the  so- 
       called "surrogate" codepoints), and 0xffef. 
 
       If  characters  other than hexadecimal digits appear between \x{ and }, 
       or if there is no terminating }, this form of escape is not recognized. 
       Instead,  the  initial  \x  will  be interpreted as a basic hexadecimal 
       escape, with no following digits, giving a  character  whose  value  is 
       zero. 
 
       If  the  PCRE_JAVASCRIPT_COMPAT option is set, the interpretation of \x 
       is as just described only when it is followed by two  hexadecimal  dig- 
       its.   Otherwise,  it  matches  a  literal "x" character. In JavaScript 
       mode, support for code points greater than 256 is provided by \u, which 
       must  be  followed  by  four hexadecimal digits; otherwise it matches a 
       literal "u" character.  Character codes specified by \u  in  JavaScript 
       mode  are  constrained in the same way as those specified by \x in non- 
       JavaScript mode. 
 
       Characters whose value is less than 256 can be defined by either of the 
       two  syntaxes for \x (or by \u in JavaScript mode). There is no differ- 
       ence in the way they are handled. For example, \xdc is exactly the same 
       as \x{dc} (or \u00dc in JavaScript mode). 
 
       After  \0  up  to two further octal digits are read. If there are fewer 
       than two digits, just  those  that  are  present  are  used.  Thus  the 
        sequence \0\x\07 specifies two binary zeros followed by a BEL character         sequence \0\x\07 specifies two binary zeros followed by a BEL character
       (code value 7). Make sure you supply two digits after the initial  zero       (code  value 7). Make sure you supply two digits after the initial zero
        if the pattern character that follows is itself an octal digit.         if the pattern character that follows is itself an octal digit.
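
The reading of \0 can be sketched as below. The structure and function names are invented for this illustration and do not come from PCRE; the sketch only shows why a following octal digit is absorbed unless two digits are supplied.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch. */
struct zero_octal {
    unsigned value;    /* character code */
    size_t consumed;   /* characters used, counting the initial '0' */
};

/* p points at the '0' that follows the backslash. Up to two further
   octal digits are read; if fewer are present, only those are used. */
struct zero_octal parse_zero_octal(const char *p)
{
    struct zero_octal r;
    size_t i = 1;                      /* skip the initial '0' */

    r.value = 0;
    while (i < 3 && p[i] >= '0' && p[i] <= '7') {
        r.value = r.value * 8 + (unsigned)(p[i] - '0');
        i++;
    }
    r.consumed = i;
    return r;
}
```

For the text "07" this yields the value 7, while for "0123" it yields octal 12 and leaves the final "3" to stand for itself.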
   
          The escape \o must be followed by a sequence of octal digits,  enclosed
          in  braces.  An  error occurs if this is not the case. This escape is a
          recent addition to Perl; it provides a way of specifying character code
          points  as  octal  numbers  greater than 0777, and it also allows octal
          numbers and back references to be unambiguously specified.
   
          For greater clarity and unambiguity, it is best to avoid following \ by
          a digit greater than zero. Instead, use \o{} or \x{} to specify charac-
          ter numbers, and \g{} to specify back references. The  following  para-
          graphs describe the old, ambiguous syntax.
   
        The handling of a backslash followed by a digit other than 0 is compli-         The handling of a backslash followed by a digit other than 0 is compli-
       cated.  Outside a character class, PCRE reads it and any following dig-       cated, and Perl has changed in recent releases, causing  PCRE  also  to
       its  as  a  decimal  number. If the number is less than 10, or if there       change. Outside a character class, PCRE reads the digit and any follow-
       have been at least that many previous capturing left parentheses in the       ing digits as a decimal number. If the number is less  than  8,  or  if
       expression,  the  entire  sequence  is  taken  as  a  back reference. A       there  have been at least that many previous capturing left parentheses
       description of how this works is given later, following the  discussion       in the expression, the entire sequence is taken as a back reference.  A
        description  of how this works is given later, following the discussion
        of parenthesized subpatterns.         of parenthesized subpatterns.
   
       Inside  a  character  class, or if the decimal number is greater than 9       Inside a character class, or if  the  decimal  number  following  \  is
       and there have not been that many capturing subpatterns, PCRE  re-reads       greater than 7 and there have not been that many capturing subpatterns,
       up to three octal digits following the backslash, and uses them to gen-       PCRE handles \8 and \9 as the literal characters "8" and "9", and  oth-
       erate a data character. Any subsequent digits stand for themselves. The       erwise re-reads up to three octal digits following the backslash, using
       value  of  the  character  is constrained in the same way as characters       them to generate a data character.  Any  subsequent  digits  stand  for
       specified in hexadecimal.  For example:       themselves. For example:
   
          \040   is another way of writing an ASCII space           \040   is another way of writing an ASCII space
          \40    is the same, provided there are fewer than 40           \40    is the same, provided there are fewer than 40
Line 4970  BACKSLASH Line 5051  BACKSLASH
                    character with octal code 113                     character with octal code 113
          \377   might be a back reference, otherwise           \377   might be a back reference, otherwise
                    the value 255 (decimal)                     the value 255 (decimal)
         \81    is either a back reference, or a binary zero         \81    is either a back reference, or the two
                   followed by the two characters "8" and "1"                   characters "8" and "1"
   
       Note that octal values of 100 or greater must not be  introduced  by  a       Note  that octal values of 100 or greater that are specified using this
       leading zero, because no more than three octal digits are ever read.       syntax must not be introduced by a leading zero, because no  more  than
        three octal digits are ever read.
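
The newer rule (the right-hand column) for a backslash followed by a non-zero digit can be sketched as follows. This is a reading of the paragraphs above, not PCRE's implementation; all names are hypothetical, and constraints on the resulting character value are not checked.

```c
#include <ctype.h>
#include <stddef.h>

enum esc_kind { ESC_BACKREF, ESC_CHAR };

/* Hypothetical result type for this sketch. */
struct digit_escape {
    enum esc_kind kind;
    unsigned long value;   /* group number or character code */
    size_t consumed;       /* digits used after the backslash */
};

/* p points at the first digit after the backslash, which must be 1-9.
   capture_count is the number of capturing left parentheses seen so far. */
struct digit_escape classify_digit_escape(const char *p, int in_class,
                                          unsigned long capture_count)
{
    struct digit_escape r;
    unsigned long n = 0;
    size_t i = 0;

    /* Read the digits as a decimal number (saturating, to avoid overflow). */
    while (isdigit((unsigned char)p[i])) {
        if (n <= 1000000UL)
            n = n * 10 + (unsigned long)(p[i] - '0');
        i++;
    }
    if (!in_class && (n < 8 || n <= capture_count)) {
        r.kind = ESC_BACKREF;
        r.value = n;
        r.consumed = i;
        return r;
    }
    r.kind = ESC_CHAR;
    if (p[0] == '8' || p[0] == '9') {  /* \8 and \9 are literal digits */
        r.value = (unsigned long)p[0];
        r.consumed = 1;
        return r;
    }
    /* Otherwise re-read up to three octal digits as a data character. */
    r.value = 0;
    for (i = 0; i < 3 && p[i] >= '0' && p[i] <= '7'; i++)
        r.value = r.value * 8 + (unsigned long)(p[i] - '0');
    r.consumed = i;
    return r;
}
```

With no capturing groups, "40" is classified as the character with code 32 and "81" as the literal "8" (leaving "1" to stand for itself); with 40 or more groups, "40" becomes a back reference.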
   
          By  default, after \x that is not followed by {, from zero to two hexa-
          decimal digits are read (letters can be in upper or  lower  case).  Any
          number of hexadecimal digits may appear between \x{ and }. If a charac-
          ter other than a hexadecimal digit appears between \x{  and  },  or  if
          there is no terminating }, an error occurs.
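
A minimal sketch of the newer \x rules (default, non-JavaScript mode) is given below. The names are hypothetical and this is not PCRE's parser. The text does not say how an empty \x{} is treated; the sketch accepts it as value zero, which is an assumption. Limits on the character value are not checked here.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
struct hex_escape {
    int ok;               /* 0 when the text says an error occurs */
    unsigned long value;  /* character code */
    size_t consumed;      /* characters used after "\x" */
};

static unsigned long hex_digit_value(int c)
{
    if (c >= '0' && c <= '9')
        return (unsigned long)(c - '0');
    return (unsigned long)(tolower(c) - 'a') + 10UL;
}

/* p points just after "\x". */
struct hex_escape parse_hex_escape(const char *p)
{
    struct hex_escape r = { 1, 0, 0 };
    size_t i = 0;

    if (p[0] == '{') {
        /* Any number of hexadecimal digits, then a closing brace. */
        for (i = 1; isxdigit((unsigned char)p[i]); i++)
            r.value = r.value * 16 + hex_digit_value((unsigned char)p[i]);
        if (p[i] != '}') {             /* bad character or no '}' */
            r.ok = 0;
            r.value = 0;
            return r;
        }
        r.consumed = i + 1;
        return r;
    }
    /* Without a brace, from zero to two hexadecimal digits are read. */
    while (i < 2 && isxdigit((unsigned char)p[i])) {
        r.value = r.value * 16 + hex_digit_value((unsigned char)p[i]);
        i++;
    }
    r.consumed = i;
    return r;
}
```

Both "dc" and "{dc}" give the value hex DC, whereas "{dz}" and an unterminated "{dc" are reported as errors.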
   
          If  the  PCRE_JAVASCRIPT_COMPAT option is set, the interpretation of \x
          is as just described only when it is followed by two  hexadecimal  dig-
          its.   Otherwise,  it  matches  a  literal "x" character. In JavaScript
          mode, support for code points greater than 256 is provided by \u, which
          must  be  followed  by  four hexadecimal digits; otherwise it matches a
          literal "u" character.
   
          Characters whose value is less than 256 can be defined by either of the
          two  syntaxes for \x (or by \u in JavaScript mode). There is no differ-
          ence in the way they are handled. For example, \xdc is exactly the same
          as \x{dc} (or \u00dc in JavaScript mode).
   
      Constraints on character values
   
          Characters  that  are  specified using octal or hexadecimal numbers are
          limited to certain values, as follows:
   
            8-bit non-UTF mode    less than 0x100
            8-bit UTF-8 mode      less than 0x10ffff and a valid codepoint
            16-bit non-UTF mode   less than 0x10000
            16-bit UTF-16 mode    less than 0x10ffff and a valid codepoint
            32-bit non-UTF mode   less than 0x100000000
            32-bit UTF-32 mode    less than 0x10ffff and a valid codepoint
   
          Invalid Unicode codepoints are the range  0xd800  to  0xdfff  (the  so-
          called "surrogate" codepoints), and 0xffef.
   
      Escape sequences in character classes
   
        All the sequences that define a single character value can be used both         All the sequences that define a single character value can be used both
        inside and outside character classes. In addition, inside  a  character         inside and outside character classes. In addition, inside  a  character
        class, \b is interpreted as the backspace character (hex 08).         class, \b is interpreted as the backspace character (hex 08).
Line 5039  BACKSLASH Line 5156  BACKSLASH
        the subject string, all of them fail, because there is no character  to         the subject string, all of them fail, because there is no character  to
        match.         match.
   
       For  compatibility  with Perl, \s does not match the VT character (code       For  compatibility with Perl, \s did not use to match the VT  character
       11).  This makes it different from the POSIX "space" class.  The  \s       (code 11), which made it  different  from  the  POSIX  "space"  class.
       characters  are  HT  (9), LF (10), FF (12), CR (13), and space (32). If       However,  Perl  added  VT  at  release  5.18, and PCRE followed suit at
       "use locale;" is included in a Perl script, \s may match the VT charac-       release 8.34. The default \s characters are now HT  (9),  LF  (10),  VT
       ter. In PCRE, it never does.       (11),  FF  (12),  CR  (13),  and space (32), which are defined as white
        space in the "C" locale. This list may vary if locale-specific matching
        is  taking place. For example, in some locales the "non-breaking space"
        character (\xA0) is recognized as white space, and  in  others  the  VT
        character is not.
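
The change to the default \s set can be written out as two small predicates. These are illustrative only; the function names are hypothetical, and locale-specific tables are not modelled.

```c
/* Default \s set from release 8.34 on, as listed above:
   HT (9), LF (10), VT (11), FF (12), CR (13), and space (32). */
int is_default_space(int c)
{
    return c == 9 || c == 10 || c == 11 || c == 12 || c == 13 || c == 32;
}

/* Default \s set before release 8.34: the same, without VT (11). */
int is_default_space_pre_834(int c)
{
    return c != 11 && is_default_space(c);
}
```

The two predicates differ only for code 11.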
   
        A  "word"  character is an underscore or any character that is a letter         A  "word"  character is an underscore or any character that is a letter
        or digit.  By default, the definition of letters  and  digits  is  con-         or digit.  By default, the definition of letters  and  digits  is  con-
        trolled  by PCRE's low-valued character tables, and may vary if locale-         trolled  by PCRE's low-valued character tables, and may vary if locale-
        specific matching is taking place (see "Locale support" in the  pcreapi         specific matching is taking place (see "Locale support" in the  pcreapi
        page).  For  example,  in  a French locale such as "fr_FR" in Unix-like         page).  For  example,  in  a French locale such as "fr_FR" in Unix-like
       systems, or "french" in Windows, some character codes greater than  128       systems, or "french" in Windows, some character codes greater than  127
        are  used  for  accented letters, and these are then matched by \w. The         are  used  for  accented letters, and these are then matched by \w. The
        use of locales with Unicode is discouraged.         use of locales with Unicode is discouraged.
   
       By default, in a UTF mode, characters  with  values  greater  than  128       By default, characters whose code points are  greater  than  127  never
       never  match  \d,  \s,  or  \w,  and always match \D, \S, and \W. These       match \d, \s, or \w, and always match \D, \S, and \W, although this may
       sequences retain their original meanings from before  UTF  support  was       vary for characters in the range 128-255 when locale-specific  matching
       available,  mainly for efficiency reasons. However, if PCRE is compiled       is  happening.   These  escape sequences retain their original meanings
       with Unicode property support, and the PCRE_UCP option is set, the  be-       from before Unicode support was available, mainly for  efficiency  rea-
       haviour  is  changed  so  that Unicode properties are used to determine       sons.  If  PCRE  is  compiled  with  Unicode  property support, and the
       character types, as follows:       PCRE_UCP option is set, the behaviour is changed so that Unicode  prop-
        erties are used to determine character types, as follows:
   
         \d  any character that \p{Nd} matches (decimal digit)         \d  any character that matches \p{Nd} (decimal digit)
         \s  any character that \p{Z} matches, plus HT, LF, FF, CR         \s  any character that matches \p{Z} or \h or \v
         \w  any character that \p{L} or \p{N} matches, plus underscore         \w  any character that matches \p{L} or \p{N}, plus underscore
   
       The upper case escapes match the inverse sets of characters. Note  that       The  upper case escapes match the inverse sets of characters. Note that
       \d  matches  only decimal digits, whereas \w matches any Unicode digit,       \d matches only decimal digits, whereas \w matches any  Unicode  digit,
       as well as any Unicode letter, and underscore. Note also that  PCRE_UCP       as  well as any Unicode letter, and underscore. Note also that PCRE_UCP
       affects  \b,  and  \B  because  they are defined in terms of \w and \W.       affects \b, and \B because they are defined in  terms  of  \w  and  \W.
        Matching these sequences is noticeably slower when PCRE_UCP is set.         Matching these sequences is noticeably slower when PCRE_UCP is set.
   
       The sequences \h, \H, \v, and \V are features that were added  to  Perl       The  sequences  \h, \H, \v, and \V are features that were added to Perl
       at  release  5.10. In contrast to the other sequences, which match only       at release 5.10. In contrast to the other sequences, which  match  only
       ASCII characters by default, these  always  match  certain  high-valued       ASCII  characters  by  default,  these always match certain high-valued
       codepoints,  whether or not PCRE_UCP is set. The horizontal space char-       code points, whether or not PCRE_UCP is set. The horizontal space char-
        acters are:         acters are:
   
          U+0009     Horizontal tab (HT)           U+0009     Horizontal tab (HT)
Line 5113  BACKSLASH Line 5235  BACKSLASH
   
    Newline sequences     Newline sequences
   
       Outside  a  character class, by default, the escape sequence \R matches       Outside a character class, by default, the escape sequence  \R  matches
       any Unicode newline sequence. In 8-bit non-UTF-8 mode \R is  equivalent       any  Unicode newline sequence. In 8-bit non-UTF-8 mode \R is equivalent
        to the following:         to the following:
   
          (?>\r\n|\n|\x0b|\f|\r|\x85)           (?>\r\n|\n|\x0b|\f|\r|\x85)
   
       This  is  an  example  of an "atomic group", details of which are given       This is an example of an "atomic group", details  of  which  are  given
        below.  This particular group matches either the two-character sequence         below.  This particular group matches either the two-character sequence
       CR  followed  by  LF,  or  one  of  the single characters LF (linefeed,       CR followed by LF, or  one  of  the  single  characters  LF  (linefeed,
       U+000A), VT (vertical tab, U+000B), FF (form feed,  U+000C),  CR  (car-       U+000A),  VT  (vertical  tab, U+000B), FF (form feed, U+000C), CR (car-
       riage  return,  U+000D),  or NEL (next line, U+0085). The two-character       riage return, U+000D), or NEL (next line,  U+0085).  The  two-character
        sequence is treated as a single unit that cannot be split.         sequence is treated as a single unit that cannot be split.
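
What the 8-bit group above accepts can be sketched as a plain function. It is not how PCRE implements \R, and the function name is hypothetical. The CRLF case is tested first and returned as a whole, which mirrors the atomic behaviour described above.

```c
#include <stddef.h>

/* Hypothetical helper: returns the number of bytes that
   (?>\r\n|\n|\x0b|\f|\r|\x85) would match at the start of s, or 0 if
   there is no match. len is the number of bytes available. */
size_t match_bsr_8bit(const unsigned char *s, size_t len)
{
    if (len == 0)
        return 0;
    if (len >= 2 && s[0] == 0x0D && s[1] == 0x0A)
        return 2;                  /* CR LF, treated as a single unit */
    switch (s[0]) {
    case 0x0A:                     /* LF  */
    case 0x0B:                     /* VT  */
    case 0x0C:                     /* FF  */
    case 0x0D:                     /* CR  */
    case 0x85:                     /* NEL */
        return 1;
    default:
        return 0;
    }
}
```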
   
       In other modes, two additional characters whose codepoints are  greater       In  other modes, two additional characters whose codepoints are greater
        than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa-         than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa-
       rator, U+2029).  Unicode character property support is not  needed  for       rator,  U+2029).   Unicode character property support is not needed for
        these characters to be recognized.         these characters to be recognized.
   
        It is possible to restrict \R to match only CR, LF, or CRLF (instead of         It is possible to restrict \R to match only CR, LF, or CRLF (instead of
       the complete set  of  Unicode  line  endings)  by  setting  the  option       the  complete  set  of  Unicode  line  endings)  by  setting the option
        PCRE_BSR_ANYCRLF either at compile time or when the pattern is matched.         PCRE_BSR_ANYCRLF either at compile time or when the pattern is matched.
        (BSR is an abbreviation for "backslash R".) This can be made the default        (BSR is an abbreviation for "backslash R".) This can be made the default
       when  PCRE  is  built;  if this is the case, the other behaviour can be       when PCRE is built; if this is the case, the  other  behaviour  can  be
       requested via the PCRE_BSR_UNICODE option.   It  is  also  possible  to       requested  via  the  PCRE_BSR_UNICODE  option.   It is also possible to
       specify  these  settings  by  starting a pattern string with one of the       specify these settings by starting a pattern string  with  one  of  the
        following sequences:         following sequences:
   
          (*BSR_ANYCRLF)   CR, LF, or CRLF only           (*BSR_ANYCRLF)   CR, LF, or CRLF only
          (*BSR_UNICODE)   any Unicode newline sequence           (*BSR_UNICODE)   any Unicode newline sequence
   
        These override the default and the options given to the compiling func-         These override the default and the options given to the compiling func-
       tion,  but  they  can  themselves  be  overridden by options given to a       tion, but they can themselves be  overridden  by  options  given  to  a
       matching function. Note that these  special  settings,  which  are  not       matching  function.  Note  that  these  special settings, which are not
       Perl-compatible,  are  recognized  only at the very start of a pattern,       Perl-compatible, are recognized only at the very start  of  a  pattern,
       and that they must be in upper case.  If  more  than  one  of  them  is       and  that  they  must  be  in  upper  case. If more than one of them is
       present,  the  last  one is used. They can be combined with a change of       present, the last one is used. They can be combined with  a  change  of
        newline convention; for example, a pattern can start with:         newline convention; for example, a pattern can start with:
   
          (*ANY)(*BSR_ANYCRLF)           (*ANY)(*BSR_ANYCRLF)
   
       They can also be combined with the (*UTF8), (*UTF16), (*UTF32),  (*UTF)       They  can also be combined with the (*UTF8), (*UTF16), (*UTF32), (*UTF)
        or (*UCP) special sequences. Inside a character class, \R is treated as         or (*UCP) special sequences. Inside a character class, \R is treated as
       an unrecognized escape sequence, and  so  matches  the  letter  "R"  by       an  unrecognized  escape  sequence,  and  so  matches the letter "R" by
        default, but causes an error if PCRE_EXTRA is set.         default, but causes an error if PCRE_EXTRA is set.
   
    Unicode character properties     Unicode character properties
   
        When PCRE is built with Unicode character property support, three addi-         When PCRE is built with Unicode character property support, three addi-
       tional escape sequences that match characters with specific  properties       tional  escape sequences that match characters with specific properties
       are  available.   When  in 8-bit non-UTF-8 mode, these sequences are of       are available.  When in 8-bit non-UTF-8 mode, these  sequences  are  of
       course limited to testing characters whose  codepoints  are  less  than       course  limited  to  testing  characters whose codepoints are less than
        256, but they do work in this mode.  The extra escape sequences are:         256, but they do work in this mode.  The extra escape sequences are:
   
          \p{xx}   a character with the xx property           \p{xx}   a character with the xx property
          \P{xx}   a character without the xx property           \P{xx}   a character without the xx property
          \X       a Unicode extended grapheme cluster           \X       a Unicode extended grapheme cluster
   
       The  property  names represented by xx above are limited to the Unicode       The property names represented by xx above are limited to  the  Unicode
        script names, the general category properties, "Any", which matches any         script names, the general category properties, "Any", which matches any
       character   (including  newline),  and  some  special  PCRE  properties       character  (including  newline),  and  some  special  PCRE   properties
       (described in the next section).  Other Perl properties such as  "InMu-       (described  in the next section).  Other Perl properties such as "InMu-
       sicalSymbols"  are  not  currently supported by PCRE. Note that \P{Any}       sicalSymbols" are not currently supported by PCRE.  Note  that  \P{Any}
        does not match any characters, so always causes a match failure.         does not match any characters, so always causes a match failure.
   
        Sets of Unicode characters are defined as belonging to certain scripts.         Sets of Unicode characters are defined as belonging to certain scripts.
       A  character from one of these sets can be matched using a script name.       A character from one of these sets can be matched using a script  name.
        For example:         For example:
   
          \p{Greek}           \p{Greek}
          \P{Han}           \P{Han}
   
       Those that are not part of an identified script are lumped together  as       Those  that are not part of an identified script are lumped together as
        "Common". The current list of scripts is:         "Common". The current list of scripts is:
   
       Arabic,  Armenian,  Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo,       Arabic, Armenian, Avestan, Balinese, Bamum, Batak,  Bengali,  Bopomofo,
       Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian,  Chakma,       Brahmi,  Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Chakma,
       Cham,  Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,       Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic,  Deseret,
       Devanagari,  Egyptian_Hieroglyphs,  Ethiopic,   Georgian,   Glagolitic,       Devanagari,   Egyptian_Hieroglyphs,   Ethiopic,  Georgian,  Glagolitic,
       Gothic,  Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira-       Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew,  Hira-
       gana,  Imperial_Aramaic,  Inherited,  Inscriptional_Pahlavi,   Inscrip-       gana,   Imperial_Aramaic,  Inherited,  Inscriptional_Pahlavi,  Inscrip-
       tional_Parthian,   Javanese,   Kaithi,   Kannada,  Katakana,  Kayah_Li,       tional_Parthian,  Javanese,  Kaithi,   Kannada,   Katakana,   Kayah_Li,
       Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B,  Lisu,  Lycian,       Kharoshthi,  Khmer,  Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian,
        Lydian,    Malayalam,    Mandaic,    Meetei_Mayek,    Meroitic_Cursive,         Lydian,    Malayalam,    Mandaic,    Meetei_Mayek,    Meroitic_Cursive,
       Meroitic_Hieroglyphs,  Miao,  Mongolian,  Myanmar,  New_Tai_Lue,   Nko,       Meroitic_Hieroglyphs,   Miao,  Mongolian,  Myanmar,  New_Tai_Lue,  Nko,
       Ogham,    Old_Italic,   Old_Persian,   Old_South_Arabian,   Old_Turkic,       Ogham,   Old_Italic,   Old_Persian,   Old_South_Arabian,    Old_Turkic,
       Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic,  Samari-       Ol_Chiki,  Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Samari-
       tan,  Saurashtra,  Sharada,  Shavian, Sinhala, Sora_Sompeng, Sundanese,       tan, Saurashtra, Sharada, Shavian,  Sinhala,  Sora_Sompeng,  Sundanese,
       Syloti_Nagri, Syriac, Tagalog, Tagbanwa,  Tai_Le,  Tai_Tham,  Tai_Viet,       Syloti_Nagri,  Syriac,  Tagalog,  Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet,
       Takri,  Tamil,  Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai,       Takri, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh,  Ugaritic,  Vai,
        Yi.         Yi.
   
        Each character has exactly one Unicode general category property, spec-         Each character has exactly one Unicode general category property, spec-
       ified  by a two-letter abbreviation. For compatibility with Perl, nega-       ified by a two-letter abbreviation. For compatibility with Perl,  nega-
       tion can be specified by including a  circumflex  between  the  opening       tion  can  be  specified  by including a circumflex between the opening
       brace  and  the  property  name.  For  example,  \p{^Lu} is the same as       brace and the property name.  For  example,  \p{^Lu}  is  the  same  as
        \P{Lu}.         \P{Lu}.
   
        If only one letter is specified with \p or \P, it includes all the gen-         If only one letter is specified with \p or \P, it includes all the gen-
       eral  category properties that start with that letter. In this case, in       eral category properties that start with that letter. In this case,  in
       the absence of negation, the curly brackets in the escape sequence  are       the  absence of negation, the curly brackets in the escape sequence are
        optional; these two examples have the same effect:         optional; these two examples have the same effect:
   
          \p{L}           \p{L}
Line 5264  BACKSLASH Line 5386  BACKSLASH
          Zp    Paragraph separator           Zp    Paragraph separator
          Zs    Space separator           Zs    Space separator
   
       The  special property L& is also supported: it matches a character that       The special property L& is also supported: it matches a character  that
       has the Lu, Ll, or Lt property, in other words, a letter  that  is  not       has  the  Lu,  Ll, or Lt property, in other words, a letter that is not
        classified as a modifier or "other".         classified as a modifier or "other".
   
       The  Cs  (Surrogate)  property  applies only to characters in the range       The Cs (Surrogate) property applies only to  characters  in  the  range
       U+D800 to U+DFFF. Such characters are not valid in Unicode strings  and       U+D800  to U+DFFF. Such characters are not valid in Unicode strings and
       so  cannot  be  tested  by  PCRE, unless UTF validity checking has been       so cannot be tested by PCRE, unless  UTF  validity  checking  has  been
        turned    off    (see    the    discussion    of    PCRE_NO_UTF8_CHECK,         turned    off    (see    the    discussion    of    PCRE_NO_UTF8_CHECK,
       PCRE_NO_UTF16_CHECK  and PCRE_NO_UTF32_CHECK in the pcreapi page). Perl       PCRE_NO_UTF16_CHECK and PCRE_NO_UTF32_CHECK in the pcreapi page).  Perl
        does not support the Cs property.         does not support the Cs property.
   
       The long synonyms for  property  names  that  Perl  supports  (such  as       The  long  synonyms  for  property  names  that  Perl supports (such as
       \p{Letter})  are  not  supported by PCRE, nor is it permitted to prefix       \p{Letter}) are not supported by PCRE, nor is it  permitted  to  prefix
        any of these properties with "Is".         any of these properties with "Is".
   
        No character that is in the Unicode table has the Cn (unassigned) prop-         No character that is in the Unicode table has the Cn (unassigned) prop-
        erty.  Instead, this property is assumed for any code point that is not         erty.  Instead, this property is assumed for any code point that is not
        in the Unicode table.         in the Unicode table.
   
       Specifying caseless matching does not affect  these  escape  sequences.       Specifying  caseless  matching  does not affect these escape sequences.
       For  example,  \p{Lu}  always  matches only upper case letters. This is       For example, \p{Lu} always matches only upper  case  letters.  This  is
        different from the behaviour of current versions of Perl.         different from the behaviour of current versions of Perl.
   
       Matching characters by Unicode property is not fast, because  PCRE  has       Matching  characters  by Unicode property is not fast, because PCRE has
       to  do  a  multistage table lookup in order to find a character's prop-       to do a multistage table lookup in order to find  a  character's  prop-
        erty. That is why the traditional escape sequences such as \d and \w do         erty. That is why the traditional escape sequences such as \d and \w do
        not use Unicode properties in PCRE by default, though you can make them         not use Unicode properties in PCRE by default, though you can make them
       do so by setting the PCRE_UCP option or by starting  the  pattern  with       do  so  by  setting the PCRE_UCP option or by starting the pattern with
        (*UCP).         (*UCP).
   
    Extended grapheme clusters     Extended grapheme clusters
   
       The  \X  escape  matches  any number of Unicode characters that form an       The \X escape matches any number of Unicode  characters  that  form  an
        "extended grapheme cluster", and treats the sequence as an atomic group         "extended grapheme cluster", and treats the sequence as an atomic group
       (see  below).   Up  to and including release 8.31, PCRE matched an ear-       (see below).  Up to and including release 8.31, PCRE  matched  an  ear-
        lier, simpler definition that was equivalent to         lier, simpler definition that was equivalent to
   
          (?>\PM\pM*)           (?>\PM\pM*)
   
       That is, it matched a character without the "mark"  property,  followed       That  is,  it matched a character without the "mark" property, followed
       by  zero  or  more characters with the "mark" property. Characters with       by zero or more characters with the "mark"  property.  Characters  with
       the "mark" property are typically non-spacing accents that  affect  the       the  "mark"  property are typically non-spacing accents that affect the
        preceding character.         preceding character.
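
The earlier definition can be sketched outside PCRE. The following is a minimal Python illustration, not PCRE code: it approximates the "mark" property by the Unicode general categories that begin with M, as reported by the standard unicodedata module. The names is_mark and legacy_cluster_end are hypothetical and exist only for this example.

```python
import unicodedata

def is_mark(ch):
    # General categories Mn, Mc and Me together form the "mark" property.
    return unicodedata.category(ch).startswith("M")

def legacy_cluster_end(text, start=0):
    # Mimics the older rule: one character without the mark property,
    # followed by zero or more characters with it.
    # Returns the index just past the cluster, or `start` when the rule
    # cannot match there (end of text, or the first character is a mark).
    if start >= len(text) or is_mark(text[start]):
        return start
    end = start + 1
    while end < len(text) and is_mark(text[end]):
        end += 1
    return end

# "e" followed by U+0301 (combining acute accent), then "x"
accented = "e\u0301x"
first_end = legacy_cluster_end(accented)
second_end = legacy_cluster_end(accented, first_end)
```

Here the first cluster covers the base letter and its accent, and the second covers the plain "x". The extended grapheme cluster rules described next are more involved and are not covered by this sketch.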
   
       This  simple definition was extended in Unicode to include more compli-       This simple definition was extended in Unicode to include more  compli-
       cated kinds of composite character by giving each character a  grapheme       cated  kinds of composite character by giving each character a grapheme
       breaking  property,  and  creating  rules  that use these properties to       breaking property, and creating rules  that  use  these  properties  to
       define the boundaries of extended grapheme  clusters.  In  releases  of       define  the  boundaries  of  extended grapheme clusters. In releases of
        PCRE later than 8.31, \X matches one of these clusters.         PCRE later than 8.31, \X matches one of these clusters.
   
       \X  always  matches  at least one character. Then it decides whether to       \X always matches at least one character. Then it  decides  whether  to
        add additional characters according to the following rules for ending a         add additional characters according to the following rules for ending a
        cluster:         cluster:
   
        1. End at the end of the subject string.         1. End at the end of the subject string.
   
       2.  Do not end between CR and LF; otherwise end after any control char-       2. Do not end between CR and LF; otherwise end after any control  char-
        acter.         acter.
   
       3. Do not break Hangul (a Korean  script)  syllable  sequences.  Hangul       3.  Do  not  break  Hangul (a Korean script) syllable sequences. Hangul
       characters  are of five types: L, V, T, LV, and LVT. An L character may       characters are of five types: L, V, T, LV, and LVT. An L character  may
       be followed by an L, V, LV, or LVT character; an LV or V character  may       be  followed by an L, V, LV, or LVT character; an LV or V character may
        be followed by a V or T character; an LVT or T character may be followed         be followed by a V or T character; an LVT or T character may be followed
        only by a T character.         only by a T character.
   
       4. Do not end before extending characters or spacing marks.  Characters       4.  Do not end before extending characters or spacing marks. Characters
       with  the  "mark"  property  always have the "extend" grapheme breaking       with the "mark" property always have  the  "extend"  grapheme  breaking
        property.         property.
   
        5. Do not end after prepend characters.         5. Do not end after prepend characters.
Line 5339  BACKSLASH Line 5461  BACKSLASH
   
    PCRE's additional properties     PCRE's additional properties
   
       As well as the standard Unicode properties described above,  PCRE  sup-       As  well  as the standard Unicode properties described above, PCRE sup-
       ports  four  more  that  make it possible to convert traditional escape       ports four more that make it possible  to  convert  traditional  escape
       sequences such as \w and \s and POSIX character classes to use  Unicode       sequences  such as \w and \s to use Unicode properties. PCRE uses these
       properties.  PCRE  uses  these non-standard, non-Perl properties inter-       non-standard, non-Perl properties internally when PCRE_UCP is set. How-
       nally when PCRE_UCP is set. However, they may also be used  explicitly. 
       These properties are: 
   
          Xan   Any alphanumeric character           Xan   Any alphanumeric character
          Xps   Any POSIX space character           Xps   Any POSIX space character
Line 5354  BACKSLASH Line 5475  BACKSLASH
        Xan  matches  characters that have either the L (letter) or the N (num-         Xan  matches  characters that have either the L (letter) or the N (num-
        ber) property. Xps matches the characters tab, linefeed, vertical  tab,         ber) property. Xps matches the characters tab, linefeed, vertical  tab,
        form  feed,  or carriage return, and any other character that has the Z         form  feed,  or carriage return, and any other character that has the Z
       (separator) property.  Xsp is the same as Xps, except that vertical tab       (separator) property.  Xsp is the same as Xps; it used to exclude  ver-
       is excluded. Xwd matches the same characters as Xan, plus underscore.       tical  tab,  for Perl compatibility, but Perl changed, and so PCRE fol-
        lowed at release 8.34. Xwd matches the same  characters  as  Xan,  plus
        underscore.
   
        There  is another non-standard property, Xuc, which matches any charac-         There  is another non-standard property, Xuc, which matches any charac-
        ter that can be represented by a Universal Character Name  in  C++  and         ter that can be represented by a Universal Character Name  in  C++  and
Line 5628  SQUARE BRACKETS AND CHARACTER CLASSES Line 5751  SQUARE BRACKETS AND CHARACTER CLASSES
        between d and m, inclusive. If a  minus  character  is  required  in  a         between d and m, inclusive. If a  minus  character  is  required  in  a
        class,  it  must  be  escaped  with a backslash or appear in a position         class,  it  must  be  escaped  with a backslash or appear in a position
        where it cannot be interpreted as indicating a range, typically as  the         where it cannot be interpreted as indicating a range, typically as  the
       first or last character in the class.       first or last character in the class, or immediately after a range. For
        example, [b-d-z] matches letters in the range b to d, a hyphen  charac-
        ter, or z.
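
The [b-d-z] example can be tried in any engine that shares this class syntax. The sketch below uses Python's re module as a stand-in for PCRE, on the assumption (true for this particular class) that both engines read a hyphen immediately after a range as a literal.

```python
import re

# "b-d" is a range; the hyphen straight after it cannot start another
# range, so it is taken literally; "z" is an ordinary member.
hyphen_class = re.compile(r"[b-d-z]")

candidates = "abcde-yz"
matched = [ch for ch in candidates if hyphen_class.fullmatch(ch)]
```

Only b, c, d, the hyphen and z are accepted; a, e and y fall outside the class.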
   
        It is not possible to have the literal character "]" as the end charac-         It is not possible to have the literal character "]" as the end charac-
        ter of a range. A pattern such as [W-]46] is interpreted as a class  of         ter of a range. A pattern such as [W-]46] is interpreted as a class  of
Line 5639  SQUARE BRACKETS AND CHARACTER CLASSES Line 5764  SQUARE BRACKETS AND CHARACTER CLASSES
        The  octal or hexadecimal representation of "]" can also be used to end         The  octal or hexadecimal representation of "]" can also be used to end
        a range.         a range.
   
       Ranges operate in the collating sequence of character values. They  can       An error is generated if a POSIX character  class  (see  below)  or  an
       also   be  used  for  characters  specified  numerically,  for  example       escape  sequence other than one that defines a single character appears
       [\000-\037]. Ranges can include any characters that are valid  for  the       at a point where a range ending character  is  expected.  For  example,
        [z-\xff] is valid, but [A-\d] and [A-[:digit:]] are not.
 
        Ranges  operate in the collating sequence of character values. They can
        also  be  used  for  characters  specified  numerically,  for   example
        [\000-\037].  Ranges  can include any characters that are valid for the
        current mode.         current mode.
   
        If a range that includes letters is used when caseless matching is set,         If a range that includes letters is used when caseless matching is set,
        it matches the letters in either case. For example, [W-c] is equivalent         it matches the letters in either case. For example, [W-c] is equivalent
       to  [][\\^_`wxyzabc],  matched  caselessly,  and  in a non-UTF mode, if       to [][\\^_`wxyzabc], matched caselessly, and  in  a  non-UTF  mode,  if
       character tables for a French locale are in  use,  [\xc8-\xcb]  matches       character  tables  for  a French locale are in use, [\xc8-\xcb] matches
       accented  E  characters  in both cases. In UTF modes, PCRE supports the       accented E characters in both cases. In UTF modes,  PCRE  supports  the
       concept of case for characters with values greater than 128  only  when       concept  of  case for characters with values greater than 128 only when
        it is compiled with Unicode property support.         it is compiled with Unicode property support.
   
       The  character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v, \V,       The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,  \V,
        \w, and \W may appear in a character class, and add the characters that         \w, and \W may appear in a character class, and add the characters that
       they  match to the class. For example, [\dABCDEF] matches any hexadeci-       they match to the class. For example, [\dABCDEF] matches any  hexadeci-
       mal digit. In UTF modes, the PCRE_UCP option affects  the  meanings  of       mal  digit.  In  UTF modes, the PCRE_UCP option affects the meanings of
       \d,  \s,  \w  and  their upper case partners, just as it does when they       \d, \s, \w and their upper case partners, just as  it  does  when  they
       appear outside a character class, as described in the section  entitled       appear  outside a character class, as described in the section entitled
        "Generic character types" above. The escape sequence \b has a different         "Generic character types" above. The escape sequence \b has a different
       meaning inside a character class; it matches the  backspace  character.       meaning  inside  a character class; it matches the backspace character.
       The  sequences  \B,  \N,  \R, and \X are not special inside a character       The sequences \B, \N, \R, and \X are not  special  inside  a  character
       class. Like any other unrecognized escape sequences, they  are  treated       class.  Like  any other unrecognized escape sequences, they are treated
       as  the literal characters "B", "N", "R", and "X" by default, but cause       as the literal characters "B", "N", "R", and "X" by default, but  cause
        an error if the PCRE_EXTRA option is set.         an error if the PCRE_EXTRA option is set.
   
       A circumflex can conveniently be used with  the  upper  case  character       A  circumflex  can  conveniently  be used with the upper case character
       types  to specify a more restricted set of characters than the matching       types to specify a more restricted set of characters than the  matching
       lower case type.  For example, the class [^\W_] matches any  letter  or       lower  case  type.  For example, the class [^\W_] matches any letter or
        digit, but not underscore, whereas [\w] includes underscore. A positive         digit, but not underscore, whereas [\w] includes underscore. A positive
        character class should be read as "something OR something OR ..." and a         character class should be read as "something OR something OR ..." and a
        negative class as "NOT something AND NOT something AND NOT ...".         negative class as "NOT something AND NOT something AND NOT ...".
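
The class examples in the last two paragraphs can be checked with a short sketch. It uses Python's re module rather than PCRE, with the re.ASCII flag so that \w and \d stay limited to ASCII, roughly as they are in PCRE when PCRE_UCP is not set. The helper in_class is hypothetical.

```python
import re

alnum_only = re.compile(r"[^\W_]", re.ASCII)    # NOT non-word AND NOT underscore
word_char = re.compile(r"[\w]", re.ASCII)       # includes underscore
hex_upper = re.compile(r"[\dABCDEF]", re.ASCII) # digit OR one of A to F

def in_class(regex, ch):
    # True when the single character ch belongs to the class.
    return regex.fullmatch(ch) is not None

letter_ok = in_class(alnum_only, "a")
underscore_rejected = not in_class(alnum_only, "_")
underscore_is_word = in_class(word_char, "_")
```

Note that [\dABCDEF] as written accepts only upper case hexadecimal letters, so "f" is not a member unless caseless matching is set.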
   
       The  only  metacharacters  that are recognized in character classes are       The only metacharacters that are recognized in  character  classes  are
       backslash, hyphen (only where it can be  interpreted  as  specifying  a       backslash,  hyphen  (only  where  it can be interpreted as specifying a
       range),  circumflex  (only  at the start), opening square bracket (only       range), circumflex (only at the start), opening  square  bracket  (only
       when it can be interpreted as introducing a POSIX class name - see  the       when  it can be interpreted as introducing a POSIX class name, or for a
       next  section),  and  the  terminating closing square bracket. However,       special compatibility feature - see the next  two  sections),  and  the
       escaping other non-alphanumeric characters does no harm.       terminating  closing  square  bracket.  However,  escaping  other  non-
        alphanumeric characters does no harm.
   
   
 POSIX CHARACTER CLASSES  POSIX CHARACTER CLASSES
Line 5701  POSIX CHARACTER CLASSES Line 5832  POSIX CHARACTER CLASSES
          lower    lower case letters           lower    lower case letters
          print    printing characters, including space           print    printing characters, including space
          punct    printing characters, excluding letters and digits and space           punct    printing characters, excluding letters and digits and space
         space    white space (not quite the same as \s)         space    white space (the same as \s from PCRE 8.34)
          upper    upper case letters           upper    upper case letters
          word     "word" characters (same as \w)           word     "word" characters (same as \w)
          xdigit   hexadecimal digits           xdigit   hexadecimal digits
   
       The  "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),       The  default  "space" characters are HT (9), LF (10), VT (11), FF (12),
       and space (32). Notice that this list includes the VT  character  (code       CR (13), and space (32). If locale-specific matching is  taking  place,
       11). This makes "space" different to \s, which does not include VT (for       the  list  of  space characters may be different; there may be fewer or
       Perl compatibility).       more of them. "Space" used to be different to \s, which did not include
        VT, for Perl compatibility.  However, Perl changed at release 5.18, and
        PCRE followed at release 8.34.  "Space" and \s now match the  same  set
        of characters.
   
       The name "word" is a Perl extension, and "blank"  is  a  GNU  extension       The  name  "word"  is  a Perl extension, and "blank" is a GNU extension
       from  Perl  5.8. Another Perl extension is negation, which is indicated       from Perl 5.8. Another Perl extension is negation, which  is  indicated
        by a ^ character after the colon. For example,         by a ^ character after the colon. For example,
   
          [12[:^digit:]]           [12[:^digit:]]
   
       matches "1", "2", or any non-digit. PCRE (and Perl) also recognize  the       matches  "1", "2", or any non-digit. PCRE (and Perl) also recognize the
        POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but         POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but
        these are not supported, and an error is given if they are encountered.         these are not supported, and an error is given if they are encountered.
   
       By default, in UTF modes, characters with values greater  than  128  do       By default, characters with values greater than 128 do not match any of
       not  match any of the POSIX character classes. However, if the PCRE_UCP       the POSIX character classes. However, if the PCRE_UCP option is  passed
       option is passed to pcre_compile(), some of the classes are changed  so       to  pcre_compile(),  some  of  the  classes are changed so that Unicode
       that Unicode character properties are used. This is achieved by replac-       character properties are used. This is achieved  by  replacing  certain
       ing the POSIX classes by other sequences, as follows:       POSIX classes by other sequences, as follows:
   
          [:alnum:]  becomes  \p{Xan}           [:alnum:]  becomes  \p{Xan}
          [:alpha:]  becomes  \p{L}           [:alpha:]  becomes  \p{L}
Line 5736  POSIX CHARACTER CLASSES Line 5870  POSIX CHARACTER CLASSES
          [:upper:]  becomes  \p{Lu}           [:upper:]  becomes  \p{Lu}
          [:word:]   becomes  \p{Xwd}           [:word:]   becomes  \p{Xwd}
   
       Negated versions, such as [:^alpha:] use \P instead of  \p.  The  other       Negated  versions, such as [:^alpha:] use \P instead of \p. Three other
       POSIX classes are unchanged, and match only characters with code points       POSIX classes are handled specially in UCP mode:
       less than 128. 
   
          [:graph:] This matches characters that have glyphs that mark  the  page
                    when printed. In Unicode property terms, it matches all char-
                    acters with the L, M, N, P, S, or Cf properties, except for:
   
                      U+061C           Arabic Letter Mark
                      U+180E           Mongolian Vowel Separator
                      U+2066 - U+2069  Various "isolate"s
   
   
          [:print:] This matches the same  characters  as  [:graph:]  plus  space
                    characters  that  are  not controls, that is, characters with
                    the Zs property.
   
          [:punct:] This matches all characters that have the Unicode P (punctua-
                    tion)  property,  plus those characters whose code points are
                    less than 128 that have the S (Symbol) property.
   
          The other POSIX classes are unchanged, and match only  characters  with
          code points less than 128.
   
   
   COMPATIBILITY FEATURE FOR WORD BOUNDARIES
   
          In  the POSIX.2 compliant library that was included in 4.4BSD Unix, the
          ugly syntax [[:<:]] and [[:>:]] is used for matching  "start  of  word"
          and "end of word". PCRE treats these items as follows:
   
            [[:<:]]  is converted to  \b(?=\w)
            [[:>:]]  is converted to  \b(?<=\w)
   
          Only these exact character sequences are recognized. A sequence such as
          [a[:<:]b] provokes an error for an unrecognized POSIX class name.  This
          support  is not compatible with Perl. It is provided to help migrations
          from other environments, and is best not used in any new patterns. Note
          that  \b matches at the start and the end of a word (see "Simple asser-
          tions" above), and in a Perl-style pattern the preceding  or  following
          character  normally  shows  which  is  wanted, without the need for the
          assertions that are used above in order to give exactly the  POSIX  be-
          haviour.
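
The two converted forms can be examined on their own. The sketch below uses Python's re module as a stand-in for PCRE and writes the assertions directly; it does not use the [[:<:]] and [[:>:]] spellings themselves. It relies on \b, lookahead and lookbehind behaving alike in both engines for plain ASCII text.

```python
import re

subject = "hi there"

start_of_word = re.compile(r"\b(?=\w)")   # what [[:<:]] is converted to
end_of_word = re.compile(r"\b(?<=\w)")    # what [[:>:]] is converted to

# Both patterns match the empty string, so record only the positions.
starts = [m.start() for m in start_of_word.finditer(subject)]
ends = [m.start() for m in end_of_word.finditer(subject)]
```

A bare \b would report all four boundaries in this subject; the lookahead keeps only those with a word character to the right, and the lookbehind only those with a word character to the left.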
   
   
 VERTICAL BAR  VERTICAL BAR
   
       Vertical bar characters are used to separate alternative patterns.  For       Vertical  bar characters are used to separate alternative patterns. For
        example, the pattern         example, the pattern
   
          gilbert|sullivan           gilbert|sullivan
   
       matches  either "gilbert" or "sullivan". Any number of alternatives may       matches either "gilbert" or "sullivan". Any number of alternatives  may
       appear, and an empty  alternative  is  permitted  (matching  the  empty       appear,  and  an  empty  alternative  is  permitted (matching the empty
        string). The matching process tries each alternative in turn, from left         string). The matching process tries each alternative in turn, from left
       to right, and the first one that succeeds is used. If the  alternatives       to  right, and the first one that succeeds is used. If the alternatives
       are  within a subpattern (defined below), "succeeds" means matching the       are within a subpattern (defined below), "succeeds" means matching  the
        rest of the main pattern as well as the alternative in the subpattern.         rest of the main pattern as well as the alternative in the subpattern.
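
The left-to-right rule can be seen in a small sketch. It is written with Python's re module, which is assumed here to order alternatives the same way as PCRE's traditional matching function; it is not PCRE code.

```python
import re

names = re.compile(r"gilbert|sullivan")
either_name = [bool(names.fullmatch(s)) for s in ("gilbert", "sullivan", "gilbert and")]

# The first alternative that succeeds is used, even if a later one
# would have matched more of the subject.
first_wins = re.match(r"a|ab", "ab").group()

# Inside a subpattern, "succeeds" includes the rest of the pattern:
# "a" alone cannot be followed by "c" here, so the second branch is used.
rest_must_match = re.match(r"(a|ab)c", "abc").group(1)

# An empty alternative matches the empty string.
empty_ok = re.fullmatch(r"gilbert|", "") is not None
```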
   
   
 INTERNAL OPTION SETTING  INTERNAL OPTION SETTING
   
       The settings of the  PCRE_CASELESS,  PCRE_MULTILINE,  PCRE_DOTALL,  and       The  settings  of  the  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
       PCRE_EXTENDED  options  (which are Perl-compatible) can be changed from       PCRE_EXTENDED options (which are Perl-compatible) can be  changed  from
       within the pattern by  a  sequence  of  Perl  option  letters  enclosed       within  the  pattern  by  a  sequence  of  Perl option letters enclosed
        between "(?" and ")".  The option letters are         between "(?" and ")".  The option letters are
   
          i  for PCRE_CASELESS           i  for PCRE_CASELESS
Line 5770  INTERNAL OPTION SETTING Line 5943  INTERNAL OPTION SETTING
   
        For example, (?im) sets caseless, multiline matching. It is also possi-         For example, (?im) sets caseless, multiline matching. It is also possi-
        ble to unset these options by preceding the letter with a hyphen, and a         ble to unset these options by preceding the letter with a hyphen, and a
       combined  setting and unsetting such as (?im-sx), which sets PCRE_CASE-       combined setting and unsetting such as (?im-sx), which sets  PCRE_CASE-
       LESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and  PCRE_EXTENDED,       LESS  and PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED,
       is  also  permitted.  If  a  letter  appears  both before and after the       is also permitted. If a  letter  appears  both  before  and  after  the
        hyphen, the option is unset.         hyphen, the option is unset.
   
       The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and  PCRE_EXTRA       The  PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA
       can  be changed in the same way as the Perl-compatible options by using       can be changed in the same way as the Perl-compatible options by  using
        the characters J, U and X respectively.         the characters J, U and X respectively.
   
       When one of these option changes occurs at  top  level  (that  is,  not       When  one  of  these  option  changes occurs at top level (that is, not
       inside  subpattern parentheses), the change applies to the remainder of       inside subpattern parentheses), the change applies to the remainder  of
        the pattern that follows. If the change is placed right at the start of         the pattern that follows. If the change is placed right at the start of
        a pattern, PCRE extracts it into the global options (and it will there-         a pattern, PCRE extracts it into the global options (and it will there-
        fore show up in data extracted by the pcre_fullinfo() function).         fore show up in data extracted by the pcre_fullinfo() function).
   
       An option change within a subpattern (see below for  a  description  of       An  option  change  within a subpattern (see below for a description of
       subpatterns)  affects only that part of the subpattern that follows it,       subpatterns) affects only that part of the subpattern that follows  it,
        so         so
   
          (a(?i)b)c           (a(?i)b)c
   
        matches abc and aBc and no other strings (assuming PCRE_CASELESS is not         matches abc and aBc and no other strings (assuming PCRE_CASELESS is not
       used).   By  this means, options can be made to have different settings       used).  By this means, options can be made to have  different  settings
       in different parts of the pattern. Any changes made in one  alternative       in  different parts of the pattern. Any changes made in one alternative
       do  carry  on  into subsequent branches within the same subpattern. For       do carry on into subsequent branches within the  same  subpattern.  For
        example,         example,
   
          (a(?i)b|c)           (a(?i)b|c)
   
       matches "ab", "aB", "c", and "C", even though  when  matching  "C"  the       matches  "ab",  "aB",  "c",  and "C", even though when matching "C" the
       first  branch  is  abandoned before the option setting. This is because       first branch is abandoned before the option setting.  This  is  because
       the effects of option settings happen at compile time. There  would  be       the  effects  of option settings happen at compile time. There would be
        some very weird behaviour otherwise.         some very weird behaviour otherwise.
   
       Note:  There  are  other  PCRE-specific  options that can be set by the       Note: There are other PCRE-specific options that  can  be  set  by  the
       application when the compiling or matching  functions  are  called.  In       application  when  the  compiling  or matching functions are called. In
       some  cases  the  pattern can contain special leading sequences such as       some cases the pattern can contain special leading  sequences  such  as
       (*CRLF) to override what the application  has  set  or  what  has  been       (*CRLF)  to  override  what  the  application  has set or what has been
       defaulted.   Details   are  given  in  the  section  entitled  "Newline       defaulted.  Details  are  given  in  the  section   entitled   "Newline
       sequences" above. There are also the  (*UTF8),  (*UTF16),(*UTF32),  and       sequences"  above.  There  are also the (*UTF8), (*UTF16),(*UTF32), and
       (*UCP)  leading sequences that can be used to set UTF and Unicode prop-       (*UCP) leading sequences that can be used to set UTF and Unicode  prop-
       erty modes; they are equivalent to setting the  PCRE_UTF8,  PCRE_UTF16,       erty  modes;  they are equivalent to setting the PCRE_UTF8, PCRE_UTF16,
       PCRE_UTF32  and the PCRE_UCP options, respectively. The (*UTF) sequence       PCRE_UTF32 and the PCRE_UCP options, respectively. The (*UTF)  sequence
       is a generic version that can be used with any of the  libraries.  How-       is  a  generic version that can be used with any of the libraries. How-
       ever,  the  application  can set the PCRE_NEVER_UTF option, which locks       ever, the application can set the PCRE_NEVER_UTF  option,  which  locks
        out the use of the (*UTF) sequences.         out the use of the (*UTF) sequences.
   
   
Line 5827  SUBPATTERNS Line 6000  SUBPATTERNS
   
          cat(aract|erpillar|)           cat(aract|erpillar|)
   
       matches  "cataract",  "caterpillar", or "cat". Without the parentheses,       matches "cataract", "caterpillar", or "cat". Without  the  parentheses,
        it would match "cataract", "erpillar" or an empty string.         it would match "cataract", "erpillar" or an empty string.
   
       2. It sets up the subpattern as  a  capturing  subpattern.  This  means       2.  It  sets  up  the  subpattern as a capturing subpattern. This means
       that,  when  the  whole  pattern  matches,  that portion of the subject       that, when the whole pattern  matches,  that  portion  of  the  subject
        string that matched the subpattern is passed back to the caller via the         string that matched the subpattern is passed back to the caller via the
       ovector  argument  of  the matching function. (This applies only to the       ovector argument of the matching function. (This applies  only  to  the
       traditional matching functions; the DFA matching functions do not  sup-       traditional  matching functions; the DFA matching functions do not sup-
        port capturing.)         port capturing.)
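
The localizing effect of parentheses in point 1 can be checked with a short sketch. Python's re module is used as a stand-in for PCRE, and the variable names are illustrative only.

```python
import re

grouped = re.compile(r"cat(aract|erpillar|)")
ungrouped = re.compile(r"cataract|erpillar|")

subjects = ["cataract", "caterpillar", "cat", "erpillar", ""]
grouped_hits = [s for s in subjects if grouped.fullmatch(s)]
ungrouped_hits = [s for s in subjects if ungrouped.fullmatch(s)]

# Point 2: the part matched by the subpattern is also captured.
captured_tail = grouped.fullmatch("caterpillar").group(1)
```

With the parentheses, "cat" is a fixed prefix and only the tail varies. Without them, the vertical bars split the whole pattern into three unrelated alternatives.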
   
        Opening parentheses are counted from left to right (starting from 1) to         Opening parentheses are counted from left to right (starting from 1) to
       obtain numbers for the  capturing  subpatterns.  For  example,  if  the       obtain  numbers  for  the  capturing  subpatterns.  For example, if the
        string "the red king" is matched against the pattern         string "the red king" is matched against the pattern
   
          the ((red|white) (king|queen))           the ((red|white) (king|queen))
Line 5846  SUBPATTERNS Line 6019  SUBPATTERNS
        the captured substrings are "red king", "red", and "king", and are num-         the captured substrings are "red king", "red", and "king", and are num-
        bered 1, 2, and 3, respectively.         bered 1, 2, and 3, respectively.
   
       The fact that plain parentheses fulfil  two  functions  is  not  always       The  fact  that  plain  parentheses  fulfil two functions is not always
       helpful.   There are often times when a grouping subpattern is required       helpful.  There are often times when a grouping subpattern is  required
       without a capturing requirement. If an opening parenthesis is  followed       without  a capturing requirement. If an opening parenthesis is followed
       by  a question mark and a colon, the subpattern does not do any captur-       by a question mark and a colon, the subpattern does not do any  captur-
       ing, and is not counted when computing the  number  of  any  subsequent       ing,  and  is  not  counted when computing the number of any subsequent
       capturing  subpatterns. For example, if the string "the white queen" is       capturing subpatterns. For example, if the string "the white queen"  is
        matched against the pattern         matched against the pattern
   
          the ((?:red|white) (king|queen))           the ((?:red|white) (king|queen))
Line 5859  SUBPATTERNS Line 6032  SUBPATTERNS
        the captured substrings are "white queen" and "queen", and are numbered         the captured substrings are "white queen" and "queen", and are numbered
        1 and 2. The maximum number of capturing subpatterns is 65535.         1 and 2. The maximum number of capturing subpatterns is 65535.
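
The numbering in both examples can be reproduced with a sketch in Python's re module, which counts opening parentheses from left to right in the same way and likewise skips (?: groups. It stands in for PCRE here; with the PCRE library the same substrings would be read from the ovector argument instead.

```python
import re

capturing = re.fullmatch(r"the ((red|white) (king|queen))", "the red king")
non_capturing = re.fullmatch(r"the ((?:red|white) (king|queen))", "the white queen")

# groups() lists the captured substrings in order of their numbers.
numbered = capturing.groups()
renumbered = non_capturing.groups()
```

In the second pattern the (?: group takes no number, so the group for king or queen moves from 3 to 2.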
   
       As  a  convenient shorthand, if any option settings are required at the       As a convenient shorthand, if any option settings are required  at  the
       start of a non-capturing subpattern,  the  option  letters  may  appear       start  of  a  non-capturing  subpattern,  the option letters may appear
        between the "?" and the ":". Thus the two patterns         between the "?" and the ":". Thus the two patterns
   
          (?i:saturday|sunday)           (?i:saturday|sunday)
          (?:(?i)saturday|sunday)           (?:(?i)saturday|sunday)
   
        match exactly the same set of strings. Because alternative branches are         match exactly the same set of strings. Because alternative branches are
       tried from left to right, and options are not reset until  the  end  of       tried  from  left  to right, and options are not reset until the end of
       the  subpattern is reached, an option setting in one branch does affect       the subpattern is reached, an option setting in one branch does  affect
       subsequent branches, so the above patterns match "SUNDAY"  as  well  as       subsequent  branches,  so  the above patterns match "SUNDAY" as well as
        "Saturday".         "Saturday".
   
   
 DUPLICATE SUBPATTERN NUMBERS  DUPLICATE SUBPATTERN NUMBERS
   
        Perl 5.10 introduced a feature whereby each alternative in a subpattern         Perl 5.10 introduced a feature whereby each alternative in a subpattern
       uses the same numbers for its capturing parentheses. Such a  subpattern       uses  the same numbers for its capturing parentheses. Such a subpattern
       starts  with (?| and is itself a non-capturing subpattern. For example,       starts with (?| and is itself a non-capturing subpattern. For  example,
        consider this pattern:         consider this pattern:
   
          (?|(Sat)ur|(Sun))day           (?|(Sat)ur|(Sun))day
   
       Because the two alternatives are inside a (?| group, both sets of  cap-       Because  the two alternatives are inside a (?| group, both sets of cap-
       turing  parentheses  are  numbered one. Thus, when the pattern matches,       turing parentheses are numbered one. Thus, when  the  pattern  matches,
       you can look at captured substring number  one,  whichever  alternative       you  can  look  at captured substring number one, whichever alternative
       matched.  This  construct  is useful when you want to capture part, but       matched. This construct is useful when you want to  capture  part,  but
        not all, of one of a number of alternatives. Inside a (?| group, paren-         not all, of one of a number of alternatives. Inside a (?| group, paren-
       theses  are  numbered as usual, but the number is reset at the start of       theses are numbered as usual, but the number is reset at the  start  of
       each branch. The numbers of any capturing parentheses that  follow  the       each  branch.  The numbers of any capturing parentheses that follow the
       subpattern  start after the highest number used in any branch. The fol-       subpattern start after the highest number used in any branch. The  fol-
        lowing example is taken from the Perl documentation. The numbers under-         lowing example is taken from the Perl documentation. The numbers under-
        neath show in which buffer the captured content will be stored.         neath show in which buffer the captured content will be stored.
   
Line 5897  DUPLICATE SUBPATTERN NUMBERS Line 6070  DUPLICATE SUBPATTERN NUMBERS
          / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x           / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
          # 1            2         2  3        2     3     4           # 1            2         2  3        2     3     4
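
The numbering rule described above can be sketched in a few lines of Python. This is only an illustration, not PCRE's compiler: the function name number_groups is made up for this example, and it assumes a balanced pattern that uses only plain capturing groups, (?: groups and (?| groups, with no character classes or named groups.

```python
def number_groups(pattern):
    """Return the number given to each capturing '(' in order of appearance."""
    numbers = []
    next_num = 1
    # One stack entry per open group: None for an ordinary group, or
    # [first_number, highest_next_number] for a (?| branch reset group.
    stack = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":                      # skip an escaped character
            i += 2
            continue
        if ch == "(":
            if pattern.startswith("(?|", i):
                stack.append([next_num, next_num])
                i += 3
                continue
            if pattern.startswith("(?", i):
                stack.append(None)          # treated as non-capturing here
                i += 2
                continue
            numbers.append(next_num)        # plain capturing group
            next_num += 1
            stack.append(None)
        elif ch == "|" and stack and stack[-1] is not None:
            frame = stack[-1]
            frame[1] = max(frame[1], next_num)
            next_num = frame[0]             # reset at the start of each branch
        elif ch == ")":
            frame = stack.pop()
            if frame is not None:
                # continue after the highest number used in any branch
                next_num = max(frame[1], next_num)
        i += 1
    return numbers


perl_example = "( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z )"
assigned = number_groups(perl_example)
```

For the Perl example, assigned reproduces the row of numbers shown underneath the pattern.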
   
       A  back  reference  to a numbered subpattern uses the most recent value       A back reference to a numbered subpattern uses the  most  recent  value
       that is set for that number by any subpattern.  The  following  pattern       that  is  set  for that number by any subpattern. The following pattern
        matches "abcabc" or "defdef":         matches "abcabc" or "defdef":
   
          /(?|(abc)|(def))\1/           /(?|(abc)|(def))\1/
   
       In  contrast,  a subroutine call to a numbered subpattern always refers       In contrast, a subroutine call to a numbered subpattern  always  refers
       to the first one in the pattern with the given  number.  The  following       to  the  first  one in the pattern with the given number. The following
        pattern matches "abcabc" or "defabc":         pattern matches "abcabc" or "defabc":
   
          /(?|(abc)|(def))(?1)/           /(?|(abc)|(def))(?1)/
   
       If  a condition test for a subpattern's having matched refers to a non-       If a condition test for a subpattern's having matched refers to a  non-
       unique number, the test is true if any of the subpatterns of that  num-       unique  number, the test is true if any of the subpatterns of that num-
        ber have matched.         ber have matched.
   
       An  alternative approach to using this "branch reset" feature is to use       An alternative approach to using this "branch reset" feature is to  use
        duplicate named subpatterns, as described in the next section.         duplicate named subpatterns, as described in the next section.
   
   
 NAMED SUBPATTERNS  NAMED SUBPATTERNS
   
       Identifying capturing parentheses by number is simple, but  it  can  be       Identifying  capturing  parentheses  by number is simple, but it can be
       very  hard  to keep track of the numbers in complicated regular expres-       very hard to keep track of the numbers in complicated  regular  expres-
       sions. Furthermore, if an  expression  is  modified,  the  numbers  may       sions.  Furthermore,  if  an  expression  is  modified, the numbers may
       change.  To help with this difficulty, PCRE supports the naming of sub-       change. To help with this difficulty, PCRE supports the naming of  sub-
        patterns. This feature was not added to Perl until release 5.10. Python         patterns. This feature was not added to Perl until release 5.10. Python
       had  the  feature earlier, and PCRE introduced it at release 4.0, using       had the feature earlier, and PCRE introduced it at release  4.0,  using
       the Python syntax. PCRE now supports both the Perl and the Python  syn-       the  Python syntax. PCRE now supports both the Perl and the Python syn-
       tax.  Perl  allows  identically  numbered subpatterns to have different       tax. Perl allows identically numbered  subpatterns  to  have  different
        names, but PCRE does not.         names, but PCRE does not.
   
       In PCRE, a subpattern can be named in one of three  ways:  (?<name>...)       In  PCRE,  a subpattern can be named in one of three ways: (?<name>...)
       or  (?'name'...)  as in Perl, or (?P<name>...) as in Python. References       or (?'name'...) as in Perl, or (?P<name>...) as in  Python.  References
       to capturing parentheses from other parts of the pattern, such as  back       to  capturing parentheses from other parts of the pattern, such as back
       references,  recursion,  and conditions, can be made by name as well as       references, recursion, and conditions, can be made by name as  well  as
        by number.         by number.
   
       Names consist of up to  32  alphanumeric  characters  and  underscores.       Names  consist of up to 32 alphanumeric characters and underscores, but
       Named  capturing  parentheses  are  still  allocated numbers as well as       must start with a non-digit.  Named  capturing  parentheses  are  still
       names, exactly as if the names were not present. The PCRE API  provides       allocated  numbers  as  well as names, exactly as if the names were not
       function calls for extracting the name-to-number translation table from       present. The PCRE API provides function calls for extracting the  name-
       a compiled pattern. There is also a convenience function for extracting       to-number  translation  table  from a compiled pattern. There is also a
       a captured substring by name.       convenience function for extracting a captured substring by name.
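
As a small illustration of names and numbers coexisting, the following sketch uses Python's re module, which is a separate engine from PCRE but accepts the (?P<name>...) spelling mentioned above. The pattern and the variable names are invented for the example, and the PCRE C functions themselves are not shown.

```python
import re

# Python's (?P<name>...) form is one of the three spellings listed in the text.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")

# Named groups still receive ordinary numbers, as if the names were absent.
name_to_number = dict(date.groupindex)

m = date.fullmatch("2014-06")
by_name = m.group("month")
by_number = m.group(name_to_number["month"])
```

Both lookups return the same substring, because the name is only an alias for the group's number.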
   
       By  default, a name must be unique within a pattern, but it is possible       By default, a name must be unique within a pattern, but it is  possible
        to relax this constraint by setting the PCRE_DUPNAMES option at compile         to relax this constraint by setting the PCRE_DUPNAMES option at compile
       time.  (Duplicate  names are also always permitted for subpatterns with       time. (Duplicate names are also always permitted for  subpatterns  with
       the same number, set up as described in the previous  section.)  Dupli-       the  same  number, set up as described in the previous section.) Dupli-
       cate  names  can  be useful for patterns where only one instance of the       cate names can be useful for patterns where only one  instance  of  the
       named parentheses can match. Suppose you want to match the  name  of  a       named  parentheses  can  match. Suppose you want to match the name of a
       weekday,  either as a 3-letter abbreviation or as the full name, and in       weekday, either as a 3-letter abbreviation or as the full name, and  in
        both cases you want to extract the abbreviation. This pattern (ignoring         both cases you want to extract the abbreviation. This pattern (ignoring
        the line breaks) does the job:         the line breaks) does the job:
   
Line 5958  NAMED SUBPATTERNS Line 6131  NAMED SUBPATTERNS
          (?<DN>Thu)(?:rsday)?|           (?<DN>Thu)(?:rsday)?|
          (?<DN>Sat)(?:urday)?           (?<DN>Sat)(?:urday)?
   
       There  are  five capturing substrings, but only one is ever set after a       There are five capturing substrings, but only one is ever set  after  a
        match.  (An alternative way of solving this problem is to use a "branch         match.  (An alternative way of solving this problem is to use a "branch
        reset" subpattern, as described in the previous section.)         reset" subpattern, as described in the previous section.)
   
       The  convenience  function  for extracting the data by name returns the       The convenience function for extracting the data by  name  returns  the
       substring for the first (and in this example, the only)  subpattern  of       substring  for  the first (and in this example, the only) subpattern of
       that  name  that  matched.  This saves searching to find which numbered       that name that matched. This saves searching  to  find  which  numbered
        subpattern it was.         subpattern it was.
   
       If you make a back reference to  a  non-unique  named  subpattern  from       If  you  make  a  back  reference to a non-unique named subpattern from
       elsewhere  in the pattern, the one that corresponds to the first occur-       elsewhere in the pattern, the subpatterns to which the name refers  are
       rence of the name is used. In the absence of duplicate numbers (see the       checked  in  the order in which they appear in the overall pattern. The
       previous  section) this is the one with the lowest number. If you use a       first one that is set is used for the reference. For example, this pat-
       named reference in a condition test (see the section  about  conditions       tern matches both "foofoo" and "barbar" but not "foobar" or "barfoo":
       below),  either  to check whether a subpattern has matched, or to check 
       for recursion, all subpatterns with the same name are  tested.  If  the 
       condition  is  true for any one of them, the overall condition is true. 
       This is the same behaviour as testing by number. For further details of 
       the interfaces for handling named subpatterns, see the pcreapi documen- 
       tation. 
   
            (?:(?<n>foo)|(?<n>bar))\k<n>
   
   
          If you make a subroutine call to a non-unique named subpattern, the one
          that corresponds to the first occurrence of the name is  used.  In  the
          absence of duplicate numbers (see the previous section) this is the one
          with the lowest number.
   
          If you use a named reference in a condition test (see the section about
          conditions below), either to check whether a subpattern has matched, or
          to check for recursion, all subpatterns with the same name are  tested.
          If  the condition is true for any one of them, the overall condition is
          true. This is the same behaviour as  testing  by  number.  For  further
          details  of  the  interfaces  for  handling  named subpatterns, see the
          pcreapi documentation.
   
        Warning: You cannot use different names to distinguish between two sub-         Warning: You cannot use different names to distinguish between two sub-
        patterns  with  the same number because PCRE uses only the numbers when         patterns  with  the same number because PCRE uses only the numbers when
        matching. For this reason, an error is given at compile time if differ-         matching. For this reason, an error is given at compile time if differ-
        ent  names  are given to subpatterns with the same number. However, you         ent  names  are given to subpatterns with the same number. However, you
       can give the same name to subpatterns with the same number,  even  when       can always give the same name to subpatterns with the same number, even
       PCRE_DUPNAMES is not set.       when PCRE_DUPNAMES is not set.
   
   
 REPETITION  REPETITION
Line 6619  CONDITIONAL SUBPATTERNS Line 6802  CONDITIONAL SUBPATTERNS
        Perl uses the syntax (?(<name>)...) or (?('name')...)  to  test  for  a         Perl uses the syntax (?(<name>)...) or (?('name')...)  to  test  for  a
        used  subpattern  by  name.  For compatibility with earlier versions of         used  subpattern  by  name.  For compatibility with earlier versions of
        PCRE, which had this facility before Perl, the syntax  (?(name)...)  is         PCRE, which had this facility before Perl, the syntax  (?(name)...)  is
       also  recognized. However, there is a possible ambiguity with this syn-       also recognized.
       tax, because subpattern names may  consist  entirely  of  digits.  PCRE 
       looks  first for a named subpattern; if it cannot find one and the name 
       consists entirely of digits, PCRE looks for a subpattern of  that  num- 
       ber,  which must be greater than zero. Using subpattern names that con- 
       sist entirely of digits is not recommended. 
   
        Rewriting the above example to use a named subpattern gives this:         Rewriting the above example to use a named subpattern gives this:
   
          (?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )           (?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )
   
       If the name used in a condition of this kind is a duplicate,  the  test       If  the  name used in a condition of this kind is a duplicate, the test
       is  applied to all subpatterns of the same name, and is true if any one       is applied to all subpatterns of the same name, and is true if any  one
        of them has matched.         of them has matched.
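
The named condition can be tried out with Python's re module, which is not PCRE but supports the (?(name)...) form of the condition. The sketch below assumes that form and Python's (?P<OPEN>...) spelling for the group. White space is removed because the extended option is not used.

```python
import re

# Optional "(", a run of non-parentheses, then ")" only if "(" was matched.
balanced = re.compile(r"(?P<OPEN>\()?[^()]+(?(OPEN)\))")

def is_ok(text):
    """True if the whole of text matches the pattern."""
    return balanced.fullmatch(text) is not None
```

With this definition, is_ok accepts "abc" and "(abc)" and rejects "(abc" and "abc)".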
   
    Checking for pattern recursion     Checking for pattern recursion
   
        If the condition is the string (R), and there is no subpattern with the         If the condition is the string (R), and there is no subpattern with the
       name  R, the condition is true if a recursive call to the whole pattern       name R, the condition is true if a recursive call to the whole  pattern
        or any subpattern has been made. If digits or a name preceded by amper-         or any subpattern has been made. If digits or a name preceded by amper-
        sand follow the letter R, for example:         sand follow the letter R, for example:
   
Line 6645  CONDITIONAL SUBPATTERNS Line 6823  CONDITIONAL SUBPATTERNS
   
        the condition is true if the most recent recursion is into a subpattern         the condition is true if the most recent recursion is into a subpattern
        whose number or name is given. This condition does not check the entire         whose number or name is given. This condition does not check the entire
       recursion  stack.  If  the  name  used in a condition of this kind is a       recursion stack. If the name used in a condition  of  this  kind  is  a
        duplicate, the test is applied to all subpatterns of the same name, and         duplicate, the test is applied to all subpatterns of the same name, and
        is true if any one of them is the most recent recursion.         is true if any one of them is the most recent recursion.
   
       At  "top  level",  all  these recursion test conditions are false.  The       At "top level", all these recursion test  conditions  are  false.   The
        syntax for recursive patterns is described below.         syntax for recursive patterns is described below.
   
    Defining subpatterns for use by reference only     Defining subpatterns for use by reference only
   
       If the condition is the string (DEFINE), and  there  is  no  subpattern       If  the  condition  is  the string (DEFINE), and there is no subpattern
       with  the  name  DEFINE,  the  condition is always false. In this case,       with the name DEFINE, the condition is  always  false.  In  this  case,
       there may be only one alternative  in  the  subpattern.  It  is  always       there  may  be  only  one  alternative  in the subpattern. It is always
       skipped  if  control  reaches  this  point  in the pattern; the idea of       skipped if control reaches this point  in  the  pattern;  the  idea  of
       DEFINE is that it can be used to define subroutines that can be  refer-       DEFINE  is that it can be used to define subroutines that can be refer-
       enced  from elsewhere. (The use of subroutines is described below.) For       enced from elsewhere. (The use of subroutines is described below.)  For
       example, a pattern to match an IPv4 address  such  as  "192.168.23.245"       example,  a  pattern  to match an IPv4 address such as "192.168.23.245"
        could be written like this (ignore white space and line breaks):         could be written like this (ignore white space and line breaks):
   
          (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )           (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
          \b (?&byte) (\.(?&byte)){3} \b           \b (?&byte) (\.(?&byte)){3} \b
   
       The  first part of the pattern is a DEFINE group inside which another       The first part of the pattern is a DEFINE group inside which  another
       group named "byte" is defined. This matches an individual component  of       group  named "byte" is defined. This matches an individual component of
       an  IPv4  address  (a number less than 256). When matching takes place,       an IPv4 address (a number less than 256). When  matching  takes  place,
       this part of the pattern is skipped because DEFINE acts  like  a  false       this  part  of  the pattern is skipped because DEFINE acts like a false
       condition.  The  rest of the pattern uses references to the named group       condition. The rest of the pattern uses references to the  named  group
       to match the four dot-separated components of an IPv4 address,  insist-       to  match the four dot-separated components of an IPv4 address, insist-
        ing on a word boundary at each end.         ing on a word boundary at each end.
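
Python's re module has neither DEFINE groups nor (?&name) calls, so the sketch below approximates the same idea by splicing a shared subpattern into the main pattern as text. The names BYTE, IPV4 and is_ipv4 are invented for this example. It checks whole strings with fullmatch rather than searching inside longer text.

```python
import re

# One "byte" component (0-255), using the same alternatives as the pattern above.
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"

# The shared subpattern is inserted textually four times instead of being
# called as a subroutine; {{3}} is an escaped {3} inside the f-string.
IPV4 = re.compile(rf"\b{BYTE}(?:\.{BYTE}){{3}}\b")

def is_ipv4(text):
    """True if the whole of text is a dotted IPv4 address."""
    return IPV4.fullmatch(text) is not None
```

Textual splicing gives the same matches here, but unlike a DEFINE group it repeats the subpattern in the compiled expression.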
   
    Assertion conditions     Assertion conditions
   
       If  the  condition  is  not  in any of the above formats, it must be an       If the condition is not in any of the above  formats,  it  must  be  an
       assertion.  This may be a positive or negative lookahead or  lookbehind       assertion.   This may be a positive or negative lookahead or lookbehind
       assertion.  Consider  this  pattern,  again  containing non-significant       assertion. Consider  this  pattern,  again  containing  non-significant
        white space, and with the two alternatives on the second line:         white space, and with the two alternatives on the second line:
   
          (?(?=[^a-z]*[a-z])           (?(?=[^a-z]*[a-z])
          \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )           \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
   
       The condition  is  a  positive  lookahead  assertion  that  matches  an       The  condition  is  a  positive  lookahead  assertion  that  matches an
       optional  sequence of non-letters followed by a letter. In other words,       optional sequence of non-letters followed by a letter. In other  words,
       it tests for the presence of at least one letter in the subject.  If  a       it  tests  for the presence of at least one letter in the subject. If a
       letter  is found, the subject is matched against the first alternative;       letter is found, the subject is matched against the first  alternative;
       otherwise it is  matched  against  the  second.  This  pattern  matches       otherwise  it  is  matched  against  the  second.  This pattern matches
       strings  in  one  of the two forms dd-aaa-dd or dd-dd-dd, where aaa are       strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are
        letters and dd are digits.         letters and dd are digits.
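
An assertion condition of the form (?(?=X)A|B) behaves like the alternation (?:(?=X)A|(?!X)B). Python's re module does not accept assertions as conditions, so the sketch below uses that rewriting to reproduce the example. The names COND and DATE are invented for the illustration.

```python
import re

# The lookahead used as the condition: some non-letters, then a letter.
COND = r"[^a-z]*[a-z]"

# First branch is guarded by the positive assertion, second by its negation.
DATE = re.compile(
    rf"(?:(?={COND})\d{{2}}-[a-z]{{3}}-\d{{2}}"
    rf"|(?!{COND})\d{{2}}-\d{{2}}-\d{{2}})"
)

def is_date(text):
    """True if the whole of text has the form dd-aaa-dd or dd-dd-dd."""
    return DATE.fullmatch(text) is not None
```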
   
   
Line 6698  COMMENTS Line 6876  COMMENTS
        There are two ways of including comments in patterns that are processed         There are two ways of including comments in patterns that are processed
        by PCRE. In both cases, the start of the comment must not be in a char-         by PCRE. In both cases, the start of the comment must not be in a char-
        acter class, nor in the middle of any other sequence of related charac-         acter class, nor in the middle of any other sequence of related charac-
       ters  such  as  (?: or a subpattern name or number. The characters that       ters such as (?: or a subpattern name or number.  The  characters  that
        make up a comment play no part in the pattern matching.         make up a comment play no part in the pattern matching.
   
       The sequence (?# marks the start of a comment that continues up to  the       The  sequence (?# marks the start of a comment that continues up to the
       next  closing parenthesis. Nested parentheses are not permitted. If the       next closing parenthesis. Nested parentheses are not permitted. If  the
        PCRE_EXTENDED option is set, an unescaped # character also introduces a         PCRE_EXTENDED option is set, an unescaped # character also introduces a
       comment,  which  in  this  case continues to immediately after the next       comment, which in this case continues to  immediately  after  the  next
       newline character or character sequence in the pattern.  Which  charac-       newline  character  or character sequence in the pattern. Which charac-
        ters are interpreted as newlines is controlled by the options passed to         ters are interpreted as newlines is controlled by the options passed to
       a compiling function or by a special sequence at the start of the  pat-       a  compiling function or by a special sequence at the start of the pat-
        tern, as described in the section entitled "Newline conventions" above.         tern, as described in the section entitled "Newline conventions" above.
        Note that the end of this type of comment is a literal newline sequence         Note that the end of this type of comment is a literal newline sequence
       in  the pattern; escape sequences that happen to represent a newline do       in the pattern; escape sequences that happen to represent a newline  do
       not count. For example, consider this  pattern  when  PCRE_EXTENDED  is       not  count.  For  example,  consider this pattern when PCRE_EXTENDED is
        set, and the default newline convention is in force:         set, and the default newline convention is in force:
   
          abc #comment \n still comment           abc #comment \n still comment
   
       On  encountering  the  # character, pcre_compile() skips along, looking       On encountering the # character, pcre_compile()  skips  along,  looking
       for a newline in the pattern. The sequence \n is still literal at  this       for  a newline in the pattern. The sequence \n is still literal at this
       stage,  so  it does not terminate the comment. Only an actual character       stage, so it does not terminate the comment. Only an  actual  character
        with the code value 0x0a (the default newline) does so.         with the code value 0x0a (the default newline) does so.
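
The same distinction can be observed in Python's re module with its VERBOSE flag, which plays a role similar to PCRE_EXTENDED. This demonstrates Python's behaviour, not PCRE's, but the rule is the same: only a real newline character in the pattern ends a # comment. The variable names are invented for the example.

```python
import re

# Raw string: a backslash followed by "n", two literal characters.
# The comment therefore runs to the end of the pattern.
escaped = re.compile(r"abc #comment \n still comment", re.VERBOSE)

# Ordinary string: "\n" is a real newline (0x0a), which ends the comment,
# so " still comment" becomes part of the pattern (white space is ignored).
real_newline = re.compile("abc #comment \n still comment", re.VERBOSE)
```

The first pattern matches only "abc". The second matches "abcstillcomment".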
   
   
 RECURSIVE PATTERNS  RECURSIVE PATTERNS
   
       Consider the problem of matching a string in parentheses, allowing  for       Consider  the problem of matching a string in parentheses, allowing for
       unlimited  nested  parentheses.  Without the use of recursion, the best       unlimited nested parentheses. Without the use of  recursion,  the  best
       that can be done is to use a pattern that  matches  up  to  some  fixed       that  can  be  done  is  to use a pattern that matches up to some fixed
       depth  of  nesting.  It  is not possible to handle an arbitrary nesting       depth of nesting. It is not possible to  handle  an  arbitrary  nesting
        depth.         depth.
   
        For some time, Perl has provided a facility that allows regular expres-         For some time, Perl has provided a facility that allows regular expres-
       sions  to recurse (amongst other things). It does this by interpolating       sions to recurse (amongst other things). It does this by  interpolating
       Perl code in the expression at run time, and the code can refer to  the       Perl  code in the expression at run time, and the code can refer to the
        expression itself. A Perl pattern using code interpolation to solve the         expression itself. A Perl pattern using code interpolation to solve the
        parentheses problem can be created like this:         parentheses problem can be created like this:
   
Line 6742  RECURSIVE PATTERNS Line 6920  RECURSIVE PATTERNS
        refers recursively to the pattern in which it appears.         refers recursively to the pattern in which it appears.
   
        Obviously, PCRE cannot support the interpolation of Perl code. Instead,         Obviously, PCRE cannot support the interpolation of Perl code. Instead,
       it supports special syntax for recursion of  the  entire  pattern,  and       it  supports  special  syntax  for recursion of the entire pattern, and
       also  for  individual  subpattern  recursion. After its introduction in       also for individual subpattern recursion.  After  its  introduction  in
       PCRE and Python, this kind of  recursion  was  subsequently  introduced       PCRE  and  Python,  this  kind of recursion was subsequently introduced
        into Perl at release 5.10.         into Perl at release 5.10.
   
       A  special  item  that consists of (? followed by a number greater than       A special item that consists of (? followed by a  number  greater  than
       zero and a closing parenthesis is a recursive subroutine  call  of  the       zero  and  a  closing parenthesis is a recursive subroutine call of the
       subpattern  of  the  given  number, provided that it occurs inside that       subpattern of the given number, provided that  it  occurs  inside  that
       subpattern. (If not, it is a non-recursive subroutine  call,  which  is       subpattern.  (If  not,  it is a non-recursive subroutine call, which is
       described  in  the  next  section.)  The special item (?R) or (?0) is a       described in the next section.) The special item  (?R)  or  (?0)  is  a
        recursive call of the entire regular expression.         recursive call of the entire regular expression.
   
       This PCRE pattern solves the nested  parentheses  problem  (assume  the       This  PCRE  pattern  solves  the nested parentheses problem (assume the
        PCRE_EXTENDED option is set so that white space is ignored):         PCRE_EXTENDED option is set so that white space is ignored):
   
          \( ( [^()]++ | (?R) )* \)           \( ( [^()]++ | (?R) )* \)
   
       First  it matches an opening parenthesis. Then it matches any number of       First it matches an opening parenthesis. Then it matches any number  of
       substrings which can either be a  sequence  of  non-parentheses,  or  a       substrings  which  can  either  be  a sequence of non-parentheses, or a
       recursive  match  of the pattern itself (that is, a correctly parenthe-       recursive match of the pattern itself (that is, a  correctly  parenthe-
        sized substring).  Finally there is a closing parenthesis. Note the use         sized substring).  Finally there is a closing parenthesis. Note the use
        of a possessive quantifier to avoid backtracking into sequences of non-         of a possessive quantifier to avoid backtracking into sequences of non-
        parentheses.         parentheses.
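
Python's re module has no (?R), so the following sketch spells out by hand what the recursive pattern does. It is a recursive-descent equivalent written for illustration, not PCRE's matching algorithm, and the function name match_nested is invented here.

```python
def match_nested(s, pos=0):
    """Match a balanced parenthesized string starting at s[pos].

    Returns the index just past the closing parenthesis, or None.
    """
    if pos >= len(s) or s[pos] != "(":
        return None                      # \( must match first
    pos += 1
    while pos < len(s):
        if s[pos] == ")":
            return pos + 1               # the final \)
        if s[pos] == "(":
            end = match_nested(s, pos)   # the (?R) alternative
            if end is None:
                return None
            pos = end
        else:
            pos += 1                     # part of a run of non-parentheses
    return None                          # ran out of input before \)
```

Because each character is examined once and never revisited, an unbalanced subject is rejected quickly, which is the effect the possessive quantifier has in the pattern.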
   
       If this were part of a larger pattern, you would not  want  to  recurse       If  this  were  part of a larger pattern, you would not want to recurse
        the entire pattern, so instead you could use this:         the entire pattern, so instead you could use this:
   
          ( \( ( [^()]++ | (?1) )* \) )           ( \( ( [^()]++ | (?1) )* \) )
   
       We  have  put the pattern into parentheses, and caused the recursion to       We have put the pattern into parentheses, and caused the  recursion  to
        refer to them instead of the whole pattern.         refer to them instead of the whole pattern.
   
       In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be       In  a  larger  pattern,  keeping  track  of  parenthesis numbers can be
       tricky.  This is made easier by the use of relative references. Instead       tricky. This is made easier by the use of relative references.  Instead
        of (?1) in the pattern above you can write (?-2) to refer to the second         of (?1) in the pattern above you can write (?-2) to refer to the second
       most  recently  opened  parentheses  preceding  the recursion. In other       most recently opened parentheses  preceding  the  recursion.  In  other
       words, a negative number counts capturing  parentheses  leftwards  from       words,  a  negative  number counts capturing parentheses leftwards from
        the point at which it is encountered.         the point at which it is encountered.
   
       It  is  also  possible  to refer to subsequently opened parentheses, by       It is also possible to refer to  subsequently  opened  parentheses,  by
       writing references such as (?+2). However, these  cannot  be  recursive       writing  references  such  as (?+2). However, these cannot be recursive
       because  the  reference  is  not inside the parentheses that are refer-       because the reference is not inside the  parentheses  that  are  refer-
       enced. They are always non-recursive subroutine calls, as described  in       enced.  They are always non-recursive subroutine calls, as described in
        the next section.         the next section.
   
       An  alternative  approach is to use named parentheses instead. The Perl       An alternative approach is to use named parentheses instead.  The  Perl
       syntax for this is (?&name); PCRE's earlier syntax  (?P>name)  is  also       syntax  for  this  is (?&name); PCRE's earlier syntax (?P>name) is also
        supported. We could rewrite the above example as follows:         supported. We could rewrite the above example as follows:
   
          (?<pn> \( ( [^()]++ | (?&pn) )* \) )           (?<pn> \( ( [^()]++ | (?&pn) )* \) )
   
       If  there  is more than one subpattern with the same name, the earliest       If there is more than one subpattern with the same name,  the  earliest
        one is used.         one is used.
   
       This particular example pattern that we have been looking  at  contains       This  particular  example pattern that we have been looking at contains
        nested unlimited repeats, and so the use of a possessive quantifier for         nested unlimited repeats, and so the use of a possessive quantifier for
        matching strings of non-parentheses is important when applying the pat-         matching strings of non-parentheses is important when applying the pat-
       tern  to  strings  that do not match. For example, when this pattern is       tern to strings that do not match. For example, when  this  pattern  is
        applied to         applied to
   
          (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()           (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
   
       it yields "no match" quickly. However, if a  possessive  quantifier  is       it  yields  "no  match" quickly. However, if a possessive quantifier is
       not  used, the match runs for a very long time indeed because there are       not used, the match runs for a very long time indeed because there  are
       so many different ways the + and * repeats can carve  up  the  subject,       so  many  different  ways the + and * repeats can carve up the subject,
        and all have to be tested before failure can be reported.         and all have to be tested before failure can be reported.
   
       At  the  end  of a match, the values of capturing parentheses are those       At the end of a match, the values of capturing  parentheses  are  those
       from the outermost level. If you want to obtain intermediate values,  a       from  the outermost level. If you want to obtain intermediate values, a
       callout  function can be used (see below and the pcrecallout documenta-       callout function can be used (see below and the pcrecallout  documenta-
        tion). If the pattern above is matched against         tion). If the pattern above is matched against
   
          (ab(cd)ef)           (ab(cd)ef)
   
       the value for the inner capturing parentheses  (numbered  2)  is  "ef",       the  value  for  the  inner capturing parentheses (numbered 2) is "ef",
       which  is the last value taken on at the top level. If a capturing sub-       which is the last value taken on at the top level. If a capturing  sub-
       pattern is not matched at the top level, its final  captured  value  is       pattern  is  not  matched at the top level, its final captured value is
       unset,  even  if  it was (temporarily) set at a deeper level during the       unset, even if it was (temporarily) set at a deeper  level  during  the
        matching process.         matching process.
   
       If there are more than 15 capturing parentheses in a pattern, PCRE  has       If  there are more than 15 capturing parentheses in a pattern, PCRE has
       to  obtain extra memory to store data during a recursion, which it does       to obtain extra memory to store data during a recursion, which it  does
        by using pcre_malloc, freeing it via pcre_free afterwards. If no memory         by using pcre_malloc, freeing it via pcre_free afterwards. If no memory
        can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.         can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
   
       Do  not  confuse  the (?R) item with the condition (R), which tests for       Do not confuse the (?R) item with the condition (R),  which  tests  for
       recursion.  Consider this pattern, which matches text in  angle  brack-       recursion.   Consider  this pattern, which matches text in angle brack-
       ets,  allowing for arbitrary nesting. Only digits are allowed in nested       ets, allowing for arbitrary nesting. Only digits are allowed in  nested
       brackets (that is, when recursing), whereas any characters are  permit-       brackets  (that is, when recursing), whereas any characters are permit-
        ted at the outer level.         ted at the outer level.
   
          < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >           < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >
   
       In  this  pattern, (?(R) is the start of a conditional subpattern, with       In this pattern, (?(R) is the start of a conditional  subpattern,  with
       two different alternatives for the recursive and  non-recursive  cases.       two  different  alternatives for the recursive and non-recursive cases.
        The (?R) item is the actual recursive call.         The (?R) item is the actual recursive call.
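
The set of strings accepted by the pattern above can be sketched as a small hand-written recognizer. This is only an illustrative model in Python, not PCRE code: the function name match_brackets and its nested flag (which plays the role of the (R) condition) are assumptions made for this sketch, and it tests for a match at one fixed offset instead of scanning the subject.

```python
ASCII_DIGITS = "0123456789"

def match_brackets(subject, pos=0, nested=False):
    """Return the offset just past a bracketed item starting at pos, or -1."""
    if pos >= len(subject) or subject[pos] != "<":
        return -1
    pos += 1
    while pos < len(subject) and subject[pos] != ">":
        if subject[pos] == "<":
            # Corresponds to (?R): every deeper level is a recursive call.
            pos = match_brackets(subject, pos, nested=True)
            if pos < 0:
                return -1
        elif nested and subject[pos] not in ASCII_DIGITS:
            # Inside a recursion only \d++ is allowed.
            return -1
        else:
            # Outer level: [^<>]*+ accepts any non-bracket character.
            pos += 1
    # The loop stops either at a closing ">" or at the end of the subject.
    return pos + 1 if pos < len(subject) else -1
```

Called as match_brackets("<abc<123><45>x>"), the sketch accepts the whole string, whereas "<abc<12a>>" is rejected because a letter appears inside a nested pair.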
   
    Differences in recursion processing between PCRE and Perl     Differences in recursion processing between PCRE and Perl
   
       Recursion  processing  in PCRE differs from Perl in two important ways.       Recursion processing in PCRE differs from Perl in two  important  ways.
       In PCRE (like Python, but unlike Perl), a recursive subpattern call  is       In  PCRE (like Python, but unlike Perl), a recursive subpattern call is
        always treated as an atomic group. That is, once it has matched some of         always treated as an atomic group. That is, once it has matched some of
        the subject string, it is never re-entered, even if it contains untried         the subject string, it is never re-entered, even if it contains untried
       alternatives  and  there  is a subsequent matching failure. This can be       alternatives and there is a subsequent matching failure.  This  can  be
       illustrated by the following pattern, which purports to match a  palin-       illustrated  by the following pattern, which purports to match a palin-
       dromic  string  that contains an odd number of characters (for example,       dromic string that contains an odd number of characters  (for  example,
        "a", "aba", "abcba", "abcdcba"):         "a", "aba", "abcba", "abcdcba"):
   
          ^(.|(.)(?1)\2)$           ^(.|(.)(?1)\2)$
   
        The idea is that it either matches a single character, or two identical         The idea is that it either matches a single character, or two identical
       characters  surrounding  a sub-palindrome. In Perl, this pattern works;       characters surrounding a sub-palindrome. In Perl, this  pattern  works;
       in PCRE it does not if the subject is  longer  than  three  characters.       in  PCRE  it  does  not if the subject is longer than three characters.
        Consider the subject string "abcba":         Consider the subject string "abcba":
   
       At  the  top level, the first character is matched, but as it is not at       At the top level, the first character is matched, but as it is  not  at
        the end of the string, the first alternative fails; the second alterna-         the end of the string, the first alternative fails; the second alterna-
        tive is taken and the recursion kicks in. The recursive call to subpat-         tive is taken and the recursion kicks in. The recursive call to subpat-
       tern 1 successfully matches the next character ("b").  (Note  that  the       tern  1  successfully  matches the next character ("b"). (Note that the
        beginning and end of line tests are not part of the recursion).         beginning and end of line tests are not part of the recursion).
   
       Back  at  the top level, the next character ("c") is compared with what       Back at the top level, the next character ("c") is compared  with  what
       subpattern 2 matched, which was "a". This fails. Because the  recursion       subpattern  2 matched, which was "a". This fails. Because the recursion
       is  treated  as  an atomic group, there are now no backtracking points,       is treated as an atomic group, there are now  no  backtracking  points,
       and so the entire match fails. (Perl is able, at  this  point,  to  re-       and  so  the  entire  match fails. (Perl is able, at this point, to re-
       enter  the  recursion  and try the second alternative.) However, if the       enter the recursion and try the second alternative.)  However,  if  the
        pattern is written with the alternatives in the other order, things are         pattern is written with the alternatives in the other order, things are
        different:         different:
   
          ^((.)(?1)\2|.)$           ^((.)(?1)\2|.)$
   
       This  time,  the recursing alternative is tried first, and continues to       This time, the recursing alternative is tried first, and  continues  to
       recurse until it runs out of characters, at which point  the  recursion       recurse  until  it runs out of characters, at which point the recursion
       fails.  But  this  time  we  do  have another alternative to try at the       fails. But this time we do have  another  alternative  to  try  at  the
       higher level. That is the big difference:  in  the  previous  case  the       higher  level.  That  is  the  big difference: in the previous case the
        remaining alternative is at a deeper recursion level, which PCRE cannot         remaining alternative is at a deeper recursion level, which PCRE cannot
        use.         use.
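
The effect of treating a recursive call as an atomic group can be modelled by hand for the two odd-length palindrome patterns above. The following Python sketch is an assumption-laden model, not PCRE's implementation: the names call_group and matches and the dot_first flag are hypothetical, and newline handling for "." is ignored. A recursive call returns only the first way the group can match; the caller can never ask it for another.

```python
def call_group(subject, pos, dot_first):
    """Atomic call of group 1 at pos: end offset of its first match, or -1."""
    if pos >= len(subject):
        return -1                  # both alternatives need a character
    if dot_first:
        return pos + 1             # "." succeeds first and is never revisited
    end = call_group(subject, pos + 1, dot_first)
    if 0 <= end < len(subject) and subject[end] == subject[pos]:
        return end + 1             # (.)(?1)\2 matched
    return pos + 1                 # otherwise fall back to "."

def matches(subject, dot_first):
    """Top level: not atomic, so each alternative is tried against ^...$."""
    n = len(subject)
    if n == 0:
        return False
    if n == 1:
        return True                # "." followed by end of string
    end = call_group(subject, 1, dot_first)
    return 0 <= end < n and subject[end] == subject[0] and end + 1 == n
```

With dot_first=True (the pattern ^(.|(.)(?1)\2)$) the model accepts "aba" but rejects "abcba"; with dot_first=False (the pattern ^((.)(?1)\2|.)$) it accepts "abcba".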
   
       To change the pattern so that it matches all palindromic  strings,  not       To  change  the pattern so that it matches all palindromic strings, not
       just  those  with an odd number of characters, it is tempting to change       just those with an odd number of characters, it is tempting  to  change
        the pattern to this:         the pattern to this:
   
          ^((.)(?1)\2|.?)$           ^((.)(?1)\2|.?)$
   
       Again, this works in Perl, but not in PCRE, and for  the  same  reason.       Again,  this  works  in Perl, but not in PCRE, and for the same reason.
       When  a  deeper  recursion has matched a single character, it cannot be       When a deeper recursion has matched a single character,  it  cannot  be
       entered again in order to match an empty string.  The  solution  is  to       entered  again  in  order  to match an empty string. The solution is to
       separate  the two cases, and write out the odd and even cases as alter-       separate the two cases, and write out the odd and even cases as  alter-
        natives at the higher level:         natives at the higher level:
   
          ^(?:((.)(?1)\2|)|((.)(?3)\4|.))           ^(?:((.)(?1)\2|)|((.)(?3)\4|.))
   
       If you want to match typical palindromic phrases, the  pattern  has  to       If  you  want  to match typical palindromic phrases, the pattern has to
        ignore all non-word characters, which can be done like this:         ignore all non-word characters, which can be done like this:
   
          ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$           ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
   
        If run with the PCRE_CASELESS option, this pattern matches phrases such         If run with the PCRE_CASELESS option, this pattern matches phrases such
        as "A man, a plan, a canal: Panama!" and it works well in both PCRE and         as "A man, a plan, a canal: Panama!" and it works well in both PCRE and
       Perl.  Note the use of the possessive quantifier *+ to avoid backtrack-       Perl. Note the use of the possessive quantifier *+ to avoid  backtrack-
       ing into sequences of non-word characters. Without this, PCRE  takes  a       ing  into  sequences of non-word characters. Without this, PCRE takes a
       great  deal  longer  (ten  times or more) to match typical phrases, and       great deal longer (ten times or more) to  match  typical  phrases,  and
        Perl takes so long that you think it has gone into a loop.         Perl takes so long that you think it has gone into a loop.
   
       WARNING: The palindrome-matching patterns above work only if  the  sub-       WARNING:  The  palindrome-matching patterns above work only if the sub-
       ject  string  does not start with a palindrome that is shorter than the       ject string does not start with a palindrome that is shorter  than  the
       entire string.  For example, although "abcba" is correctly matched,  if       entire  string.  For example, although "abcba" is correctly matched, if
       the  subject  is "ababa", PCRE finds the palindrome "aba" at the start,       the subject is "ababa", PCRE finds the palindrome "aba" at  the  start,
       then fails at top level because the end of the string does not  follow.       then  fails at top level because the end of the string does not follow.
       Once  again, it cannot jump back into the recursion to try other alter-       Once again, it cannot jump back into the recursion to try other  alter-
        natives, so the entire match fails.         natives, so the entire match fails.
   
       The second way in which PCRE and Perl differ in  their  recursion  pro-       The  second  way  in which PCRE and Perl differ in their recursion pro-
       cessing  is in the handling of captured values. In Perl, when a subpat-       cessing is in the handling of captured values. In Perl, when a  subpat-
       tern is called recursively or as a subpattern (see the  next  section),       tern  is  called recursively or as a subpattern (see the next section),
       it  has  no  access to any values that were captured outside the recur-       it has no access to any values that were captured  outside  the  recur-
       sion, whereas in PCRE these values can  be  referenced.  Consider  this       sion,  whereas  in  PCRE  these values can be referenced. Consider this
        pattern:         pattern:
   
          ^(.)(\1|a(?2))           ^(.)(\1|a(?2))
   
       In  PCRE,  this  pattern matches "bab". The first capturing parentheses       In PCRE, this pattern matches "bab". The  first  capturing  parentheses
       match "b", then in the second group, when the back reference  \1  fails       match  "b",  then in the second group, when the back reference \1 fails
       to  match "b", the second alternative matches "a" and then recurses. In       to match "b", the second alternative matches "a" and then recurses.  In
       the recursion, \1 does now match "b" and so the whole  match  succeeds.       the  recursion,  \1 does now match "b" and so the whole match succeeds.
       In  Perl,  the pattern fails to match because inside the recursive call       In Perl, the pattern fails to match because inside the  recursive  call
        \1 cannot access the externally set value.         \1 cannot access the externally set value.
   
   
 SUBPATTERNS AS SUBROUTINES  SUBPATTERNS AS SUBROUTINES
   
       If the syntax for a recursive subpattern call (either by number  or  by       If  the  syntax for a recursive subpattern call (either by number or by
       name)  is  used outside the parentheses to which it refers, it operates       name) is used outside the parentheses to which it refers,  it  operates
       like a subroutine in a programming language. The called subpattern  may       like  a subroutine in a programming language. The called subpattern may
       be  defined  before or after the reference. A numbered reference can be       be defined before or after the reference. A numbered reference  can  be
        absolute or relative, as in these examples:         absolute or relative, as in these examples:
   
          (...(absolute)...)...(?2)...           (...(absolute)...)...(?2)...
Line 6947  SUBPATTERNS AS SUBROUTINES Line 7125  SUBPATTERNS AS SUBROUTINES
   
          (sens|respons)e and \1ibility           (sens|respons)e and \1ibility
   
       matches "sense and sensibility" and "response and responsibility",  but       matches  "sense and sensibility" and "response and responsibility", but
        not "sense and responsibility". If instead the pattern         not "sense and responsibility". If instead the pattern
   
          (sens|respons)e and (?1)ibility           (sens|respons)e and (?1)ibility
   
       is  used, it does match "sense and responsibility" as well as the other       is used, it does match "sense and responsibility" as well as the  other
       two strings. Another example is  given  in  the  discussion  of  DEFINE       two  strings.  Another  example  is  given  in the discussion of DEFINE
        above.         above.
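
The difference between the back reference and the subroutine call can be tried out with Python's standard re module. Note the assumption: re has no (?1) syntax, so the call is emulated here by repeating the text of the group, which is what the call amounts to for this particular pattern. The variable names are illustrative only.

```python
import re

# Back reference: \1 must repeat the text that group 1 actually matched.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Emulated subroutine call: the subpattern itself is applied again, so
# either alternative may match regardless of what group 1 captured.
subcall = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")

for text in ("sense and sensibility", "sense and responsibility"):
    print(text, bool(backref.fullmatch(text)), bool(subcall.fullmatch(text)))
```

Only the emulated call accepts "sense and responsibility"; both forms accept "sense and sensibility".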
   
       All  subroutine  calls, whether recursive or not, are always treated as       All subroutine calls, whether recursive or not, are always  treated  as
       atomic groups. That is, once a subroutine has matched some of the  sub-       atomic  groups. That is, once a subroutine has matched some of the sub-
        ject string, it is never re-entered, even if it contains untried alter-         ject string, it is never re-entered, even if it contains untried alter-
       natives and there is  a  subsequent  matching  failure.  Any  capturing       natives  and  there  is  a  subsequent  matching failure. Any capturing
       parentheses  that  are  set  during the subroutine call revert to their       parentheses that are set during the subroutine  call  revert  to  their
        previous values afterwards.         previous values afterwards.
   
       Processing options such as case-independence are fixed when  a  subpat-       Processing  options  such as case-independence are fixed when a subpat-
       tern  is defined, so if it is used as a subroutine, such options cannot       tern is defined, so if it is used as a subroutine, such options  cannot
        be changed for different calls. For example, consider this pattern:         be changed for different calls. For example, consider this pattern:
   
          (abc)(?i:(?-1))           (abc)(?i:(?-1))
   
       It matches "abcabc". It does not match "abcABC" because the  change  of       It  matches  "abcabc". It does not match "abcABC" because the change of
        processing option does not affect the called subpattern.         processing option does not affect the called subpattern.
   
   
 ONIGURUMA SUBROUTINE SYNTAX  ONIGURUMA SUBROUTINE SYNTAX
   
       For  compatibility with Oniguruma, the non-Perl syntax \g followed by a       For compatibility with Oniguruma, the non-Perl syntax \g followed by  a
        name or a number enclosed either in angle brackets or single quotes, is         name or a number enclosed either in angle brackets or single quotes, is
       an  alternative  syntax  for  referencing a subpattern as a subroutine,       an alternative syntax for referencing a  subpattern  as  a  subroutine,
       possibly recursively. Here are two of the examples used above,  rewrit-       possibly  recursively. Here are two of the examples used above, rewrit-
        ten using this syntax:         ten using this syntax:
   
          (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )           (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
          (sens|respons)e and \g'1'ibility           (sens|respons)e and \g'1'ibility
   
       PCRE  supports  an extension to Oniguruma: if a number is preceded by a       PCRE supports an extension to Oniguruma: if a number is preceded  by  a
        plus or a minus sign it is taken as a relative reference. For example:         plus or a minus sign it is taken as a relative reference. For example:
   
          (abc)(?i:\g<-1>)           (abc)(?i:\g<-1>)
   
       Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are  not       Note  that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not
       synonymous.  The former is a back reference; the latter is a subroutine       synonymous. The former is a back reference; the latter is a  subroutine
        call.         call.
   
   
 CALLOUTS  CALLOUTS
   
        Perl has a feature whereby using the sequence (?{...}) causes arbitrary         Perl has a feature whereby using the sequence (?{...}) causes arbitrary
       Perl  code to be obeyed in the middle of matching a regular expression.       Perl code to be obeyed in the middle of matching a regular  expression.
        This makes it possible, amongst other things, to extract different sub-         This makes it possible, amongst other things, to extract different sub-
        strings that match the same pair of parentheses when there is a repeti-         strings that match the same pair of parentheses when there is a repeti-
        tion.         tion.
   
        PCRE provides a similar feature, but of course it cannot obey arbitrary         PCRE provides a similar feature, but of course it cannot obey arbitrary
        Perl code. The feature is called "callout". The caller of PCRE provides         Perl code. The feature is called "callout". The caller of PCRE provides
       an external function by putting its entry point in the global  variable       an  external function by putting its entry point in the global variable
       pcre_callout  (8-bit  library) or pcre[16|32]_callout (16-bit or 32-bit       pcre_callout (8-bit library) or pcre[16|32]_callout (16-bit  or  32-bit
       library).  By default, this variable contains NULL, which disables  all       library).   By default, this variable contains NULL, which disables all
        calling out.         calling out.
   
       Within  a  regular  expression,  (?C) indicates the points at which the       Within a regular expression, (?C) indicates the  points  at  which  the
       external function is to be called. If you want  to  identify  different       external  function  is  to be called. If you want to identify different
       callout  points, you can put a number less than 256 after the letter C.       callout points, you can put a number less than 256 after the letter  C.
       The default value is zero.  For example, this pattern has  two  callout       The  default  value is zero.  For example, this pattern has two callout
        points:         points:
   
          (?C1)abc(?C2)def           (?C1)abc(?C2)def
   
       If  the PCRE_AUTO_CALLOUT flag is passed to a compiling function, call-       If the PCRE_AUTO_CALLOUT flag is passed to a compiling function,  call-
       outs are automatically installed before each item in the pattern.  They       outs  are automatically installed before each item in the pattern. They
       are  all  numbered  255. If there is a conditional group in the pattern       are all numbered 255. If there is a conditional group  in  the  pattern
        whose condition is an assertion, an additional callout is inserted just         whose condition is an assertion, an additional callout is inserted just
        before the condition. An explicit callout may also be set at this posi-         before the condition. An explicit callout may also be set at this posi-
        tion, as in this example:         tion, as in this example:
Line 7029  CALLOUTS Line 7207  CALLOUTS
        Note that this applies only to assertion conditions, not to other types         Note that this applies only to assertion conditions, not to other types
        of condition.         of condition.
   
       During  matching, when PCRE reaches a callout point, the external func-       During matching, when PCRE reaches a callout point, the external  func-
       tion is called. It is provided with the  number  of  the  callout,  the       tion  is  called.  It  is  provided with the number of the callout, the
       position  in  the pattern, and, optionally, one item of data originally       position in the pattern, and, optionally, one item of  data  originally
       supplied by the caller of the matching function. The  callout  function       supplied  by  the caller of the matching function. The callout function
       may  cause  matching to proceed, to backtrack, or to fail altogether. A       may cause matching to proceed, to backtrack, or to fail altogether.
       complete description of the interface to the callout function is  given 
       in the pcrecallout documentation. 
   
          By default, PCRE implements a number of optimizations at  compile  time
          and  matching  time, and one side-effect is that sometimes callouts are
          skipped. If you need all possible callouts to happen, you need  to  set
          options  that  disable  the relevant optimizations. More details, and a
          complete description of the interface  to  the  callout  function,  are
          given in the pcrecallout documentation.
   
   
 BACKTRACKING CONTROL  BACKTRACKING CONTROL
   
        Perl  5.10 introduced a number of "Special Backtracking Control Verbs",         Perl  5.10 introduced a number of "Special Backtracking Control Verbs",
Line 7364  BACKTRACKING CONTROL Line 7547  BACKTRACKING CONTROL
          ...(*COMMIT)(*PRUNE)...           ...(*COMMIT)(*PRUNE)...
   
        If there is a matching failure to the right, backtracking onto (*PRUNE)         If there is a matching failure to the right, backtracking onto (*PRUNE)
       cases it to be triggered, and its action is taken. There can never be a       causes  it to be triggered, and its action is taken. There can never be
       backtrack onto (*COMMIT).       a backtrack onto (*COMMIT).
   
    Backtracking verbs in repeated groups     Backtracking verbs in repeated groups
   
Line 7435  AUTHOR Line 7618  AUTHOR
   
 REVISION  REVISION
   
       Last updated: 26 April 2013       Last updated: 03 December 2013
        Copyright (c) 1997-2013 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
 ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
   
Line 7469  CHARACTERS Line 7652  CHARACTERS
          \n         newline (hex 0A)           \n         newline (hex 0A)
          \r         carriage return (hex 0D)           \r         carriage return (hex 0D)
          \t         tab (hex 09)           \t         tab (hex 09)
            \0dd       character with octal code 0dd
          \ddd       character with octal code ddd, or backreference           \ddd       character with octal code ddd, or backreference
            \o{ddd..}  character with octal code ddd..
          \xhh       character with hex code hh           \xhh       character with hex code hh
          \x{hhh..}  character with hex code hhh..           \x{hhh..}  character with hex code hhh..
   
          Note that \0dd is always an octal code, and that \8 and \9 are the lit-
          eral characters "8" and "9".
   
   
 CHARACTER TYPES  CHARACTER TYPES
   
          .          any character except newline;           .          any character except newline;
Line 7495  CHARACTER TYPES Line 7683  CHARACTER TYPES
          \W         a "non-word" character           \W         a "non-word" character
          \X         a Unicode extended grapheme cluster           \X         a Unicode extended grapheme cluster
   
       In  PCRE,  by  default, \d, \D, \s, \S, \w, and \W recognize only ASCII       By default, \d, \s, and \w match only ASCII characters, even  in  UTF-8
       characters, even in a UTF mode. However, this can be changed by setting       mode  or  in  the 16-bit and 32-bit libraries. However, if locale-spe-
       the PCRE_UCP option.       cific matching is happening, \s and \w may also match  characters  with
        code  points  in  the range 128-255. If the PCRE_UCP option is set, the
        behaviour of these escape sequences is changed to use  Unicode  proper-
        ties and they match many more characters.
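
The practical difference between ASCII-only and Unicode-property matching of \d can be seen by analogy in Python's standard re module. This is an analogy, not PCRE behaviour: Python's default for str patterns is the reverse of PCRE's, so re.ASCII stands in for the PCRE default and the unflagged pattern stands in for PCRE_UCP. The variable names are illustrative.

```python
import re

ARABIC_INDIC_THREE = "\u0663"   # a decimal digit outside the ASCII range

ascii_digit = re.compile(r"\d", re.ASCII)   # analogous to PCRE without PCRE_UCP
unicode_digit = re.compile(r"\d")           # analogous to PCRE with PCRE_UCP

for ch in ("3", ARABIC_INDIC_THREE):
    print(repr(ch), bool(ascii_digit.fullmatch(ch)), bool(unicode_digit.fullmatch(ch)))
```

The ASCII-only pattern accepts "3" but not the Arabic-Indic digit, while the Unicode-aware pattern accepts both.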
   
   
 GENERAL CATEGORY PROPERTIES FOR \p and \P  GENERAL CATEGORY PROPERTIES FOR \p and \P
Line 7552  PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P Line 7743  PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P
   
          Xan        Alphanumeric: union of properties L and N           Xan        Alphanumeric: union of properties L and N
          Xps        POSIX space: property Z or tab, NL, VT, FF, CR           Xps        POSIX space: property Z or tab, NL, VT, FF, CR
         Xsp        Perl space: property Z or tab, NL, FF, CR         Xsp        Perl space: property Z or tab, NL, VT, FF, CR
          Xuc        Universally-named character: one that can be           Xuc        Universally-named character: one that can be
                       represented by a Universal Character Name                        represented by a Universal Character Name
          Xwd        Perl word: property Xan or underscore           Xwd        Perl word: property Xan or underscore
   
          Perl and POSIX space are now the same. Perl added VT to its space char-
          acter set at release 5.18 and PCRE changed at release 8.34.
   
   
 SCRIPT NAMES FOR \p AND \P  SCRIPT NAMES FOR \p AND \P
   
       Arabic,  Armenian,  Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo,       Arabic, Armenian, Avestan, Balinese, Bamum, Batak,  Bengali,  Bopomofo,
       Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian,  Chakma,       Brahmi,  Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Chakma,
       Cham,  Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,       Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic,  Deseret,
       Devanagari,  Egyptian_Hieroglyphs,  Ethiopic,   Georgian,   Glagolitic,       Devanagari,   Egyptian_Hieroglyphs,   Ethiopic,  Georgian,  Glagolitic,
       Gothic,  Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira-       Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew,  Hira-
       gana,  Imperial_Aramaic,  Inherited,  Inscriptional_Pahlavi,   Inscrip-       gana,   Imperial_Aramaic,  Inherited,  Inscriptional_Pahlavi,  Inscrip-
       tional_Parthian,   Javanese,   Kaithi,   Kannada,  Katakana,  Kayah_Li,       tional_Parthian,  Javanese,  Kaithi,   Kannada,   Katakana,   Kayah_Li,
       Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B,  Lisu,  Lycian,       Kharoshthi,  Khmer,  Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian,
        Lydian,    Malayalam,    Mandaic,    Meetei_Mayek,    Meroitic_Cursive,         Lydian,    Malayalam,    Mandaic,    Meetei_Mayek,    Meroitic_Cursive,
       Meroitic_Hieroglyphs,  Miao,  Mongolian,  Myanmar,  New_Tai_Lue,   Nko,       Meroitic_Hieroglyphs,   Miao,  Mongolian,  Myanmar,  New_Tai_Lue,  Nko,
       Ogham,    Old_Italic,   Old_Persian,   Old_South_Arabian,   Old_Turkic,       Ogham,   Old_Italic,   Old_Persian,   Old_South_Arabian,    Old_Turkic,
       Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic,  Samari-       Ol_Chiki,  Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Samari-
       tan,  Saurashtra,  Sharada,  Shavian, Sinhala, Sora_Sompeng, Sundanese,       tan, Saurashtra, Sharada, Shavian,  Sinhala,  Sora_Sompeng,  Sundanese,
       Syloti_Nagri, Syriac, Tagalog, Tagbanwa,  Tai_Le,  Tai_Tham,  Tai_Viet,       Syloti_Nagri,  Syriac,  Tagalog,  Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet,
       Takri,  Tamil,  Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai,       Takri, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh,  Ugaritic,  Vai,
        Yi.         Yi.
   
   
Line 7601  CHARACTER CLASSES Line 7795  CHARACTER CLASSES
          word        same as \w           word        same as \w
          xdigit      hexadecimal digit           xdigit      hexadecimal digit
   
       In PCRE, POSIX character set names recognize only ASCII  characters  by       In  PCRE,  POSIX character set names recognize only ASCII characters by
       default,  but  some  of them use Unicode properties if PCRE_UCP is set.       default, but some of them use Unicode properties if  PCRE_UCP  is  set.
        You can use \Q...\E inside a character class.         You can use \Q...\E inside a character class.
   
   
Line 7683  OPTION SETTING Line 7877  OPTION SETTING
          (?x)            extended (ignore white space)           (?x)            extended (ignore white space)
          (?-...)         unset option(s)           (?-...)         unset option(s)
   
       The following are recognized only at the start of a  pattern  or  after       The  following  are  recognized only at the start of a pattern or after
        one of the newline-setting options with similar syntax:         one of the newline-setting options with similar syntax:
   
          (*LIMIT_MATCH=d) set the match limit to d (decimal number)           (*LIMIT_MATCH=d) set the match limit to d (decimal number)
Line 7695  OPTION SETTING Line 7889  OPTION SETTING
          (*UTF)          set appropriate UTF mode for the library in use           (*UTF)          set appropriate UTF mode for the library in use
          (*UCP)          set PCRE_UCP (use Unicode properties for \d etc)           (*UCP)          set PCRE_UCP (use Unicode properties for \d etc)
   
          Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value  of
          the limits set by the caller of pcre_exec(), not increase them.
   
   
 LOOKAHEAD AND LOOKBEHIND ASSERTIONS  LOOKAHEAD AND LOOKBEHIND ASSERTIONS
   
          (?=...)         positive look ahead           (?=...)         positive look ahead
Line 7819  AUTHOR Line 8016  AUTHOR
   
 REVISION  REVISION
   
       Last updated: 26 April 2013       Last updated: 12 November 2013
        Copyright (c) 1997-2013 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
 ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
   
Line 8743  MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() OR pcre[16 Line 8940  MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() OR pcre[16
        matched  string. It is up to the calling program to do that if it needs         matched  string. It is up to the calling program to do that if it needs
        to.         to.
   
          That means that, for an unanchored pattern, if a continued match fails,
          it  is  not  possible  to  try  again at a new starting point. All this
          facility is capable of doing is  continuing  with  the  previous  match
          attempt.  In  the previous example, if the second set of data is "ug23"
          the result is no match, even though there would be a match for  "aug23"
          if  the entire string were given at once. Depending on the application,
          this may or may not be what you want.  The only way to allow for start-
          ing  again  at  the next character is to retain the matched part of the
          subject and try a new complete match.
   
        You can set the PCRE_PARTIAL_SOFT  or  PCRE_PARTIAL_HARD  options  with         You can set the PCRE_PARTIAL_SOFT  or  PCRE_PARTIAL_HARD  options  with
        PCRE_DFA_RESTART  to  continue partial matching over multiple segments.         PCRE_DFA_RESTART  to  continue partial matching over multiple segments.
        This facility can be used to pass very long subject strings to the  DFA         This facility can be used to pass very long subject strings to the  DFA
Line 8926  AUTHOR Line 9133  AUTHOR
   
 REVISION  REVISION
   
       Last updated: 20 February 2013       Last updated: 02 July 2013
        Copyright (c) 1997-2013 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
 ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
   
Line 9030  RE-USING A PRECOMPILED PATTERN Line 9237  RE-USING A PRECOMPILED PATTERN
        is  used  to  pass this data, as described in the section on matching a         is  used  to  pass this data, as described in the section on matching a
        pattern in the pcreapi documentation.         pattern in the pcreapi documentation.
   
          Warning: The tables that pcre_exec() and pcre_dfa_exec()  use  must  be
          the same as those that were used when the pattern was compiled. If this
          is not the case, the behaviour is undefined.
   
        If you did not provide custom character tables  when  the  pattern  was         If you did not provide custom character tables  when  the  pattern  was
        compiled, the pointer in the compiled pattern is NULL, which causes the         compiled, the pointer in the compiled pattern is NULL, which causes the
        matching functions to use PCRE's internal tables. Thus, you do not need         matching functions to use PCRE's internal tables. Thus, you do not need
Line 9061  AUTHOR Line 9272  AUTHOR
   
 REVISION  REVISION
   
       Last updated: 24 June 2012       Last updated: 12 November 2013
       Copyright (c) 1997-2012 University of Cambridge.       Copyright (c) 1997-2013 University of Cambridge.
 ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
   
   
Line 9243  PCREPOSIX(3)               Library Functions Manual    Line 9454  PCREPOSIX(3)               Library Functions Manual   
 NAME  NAME
        PCRE - Perl-compatible regular expressions.         PCRE - Perl-compatible regular expressions.
   
SYNOPSIS OF POSIX API       SYNOPSIS
   
        #include <pcreposix.h>         #include <pcreposix.h>
   
Line 9252  SYNOPSIS OF POSIX API Line 9463  SYNOPSIS OF POSIX API
   
        int regexec(regex_t *preg, const char *string,         int regexec(regex_t *preg, const char *string,
             size_t nmatch, regmatch_t pmatch[], int eflags);              size_t nmatch, regmatch_t pmatch[], int eflags);
            size_t regerror(int errcode, const regex_t *preg,
       size_t regerror(int errcode, const regex_t *preg, 
             char *errbuf, size_t errbuf_size);              char *errbuf, size_t errbuf_size);
   
        void regfree(regex_t *preg);         void regfree(regex_t *preg);
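
       A minimal sketch of how the functions in this synopsis are typically
       combined. It includes the system <regex.h> so that it builds without
       PCRE installed; the assumption is that including <pcreposix.h> instead
       and linking with the PCRE POSIX wrapper library gives the same calling
       sequence. The name match_once() is a hypothetical helper, not part of
       either API.

```c
#include <regex.h>   /* assumption: <pcreposix.h> can be substituted here */

/* Hypothetical helper: returns 1 on a match, 0 on no match, -1 on error.
   On error, the message produced by regerror() is left in errbuf. */
int match_once(const char *pattern, const char *subject,
               char *errbuf, size_t errbuf_size)
{
    regex_t re;
    regmatch_t pmatch[1];
    int rc;

    /* Compile the pattern; a non-zero result is an error code. */
    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0) {
        regerror(rc, &re, errbuf, errbuf_size);
        return -1;   /* nothing to free after a failed regcomp() */
    }

    /* Match against the subject, asking for the overall match only. */
    rc = regexec(&re, subject, 1, pmatch, 0);
    if (rc != 0 && rc != REG_NOMATCH)
        regerror(rc, &re, errbuf, errbuf_size);

    /* Release the compiled pattern in every case. */
    regfree(&re);

    if (rc == 0)
        return 1;
    if (rc == REG_NOMATCH)
        return 0;
    return -1;
}
```

       A call such as match_once("^a[0-9]+$", "a123", buf, sizeof buf) is
       expected to report a match, and an unbalanced pattern such as "("
       should fail at the regcomp() step and leave an error message in buf.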
Line 9943  SIZE AND OTHER LIMITATIONS Line 10153  SIZE AND OTHER LIMITATIONS
        never in practice be relevant.         never in practice be relevant.
   
        The maximum length of a compiled  pattern  is  approximately  64K  data         The maximum length of a compiled  pattern  is  approximately  64K  data
       units  (bytes  for  the  8-bit  library,  32-bit  units  for the 32-bit       units  (bytes  for  the  8-bit  library,  16-bit  units  for the 16-bit
        library, and 32-bit units for the 32-bit library) if PCRE  is  compiled         library, and 32-bit units for the 32-bit library) if PCRE  is  compiled
       with  the  default  internal  linkage  size  of 2 bytes. If you want to       with  the default internal linkage size, which is 2 bytes for the 8-bit
       process regular expressions that are truly enormous,  you  can  compile       and 16-bit libraries, and 4 bytes for the 32-bit library. If  you  want
       PCRE  with an internal linkage size of 3 or 4 (when building the 16-bit       to process regular expressions that are truly enormous, you can compile
       or 32-bit library, 3 is rounded up to 4). See the README  file  in  the       PCRE with an internal linkage size of 3 or 4 (when building the  16-bit
       source  distribution  and  the  pcrebuild documentation for details. In       or  32-bit  library,  3 is rounded up to 4). See the README file in the
       these cases the limit is substantially larger.  However, the  speed  of       source distribution and the pcrebuild  documentation  for  details.  In
        these  cases  the limit is substantially larger.  However, the speed of
        execution is slower.         execution is slower.
   
        All values in repeating quantifiers must be less than 65536.         All values in repeating quantifiers must be less than 65536.
   
        There is no limit to the number of parenthesized subpatterns, but there         There is no limit to the number of parenthesized subpatterns, but there
       can be no more than 65535 capturing subpatterns.       can  be  no more than 65535 capturing subpatterns. There is, however, a
        limit to the depth of  nesting  of  parenthesized  subpatterns  of  all
        kinds.  This  is  imposed  in order to limit the amount of system stack
        used at compile time. The limit can be specified when  PCRE  is  built;
        the default is 250.
   
        There is a limit to the number of forward references to subsequent sub-         There is a limit to the number of forward references to subsequent sub-
       patterns  of  around  200,000.  Repeated  forward references with fixed       patterns of around 200,000.  Repeated  forward  references  with  fixed
       upper limits, for example, (?2){0,100} when subpattern number 2  is  to       upper  limits,  for example, (?2){0,100} when subpattern number 2 is to
       the  right,  are included in the count. There is no limit to the number       the right, are included in the count. There is no limit to  the  number
        of backward references.         of backward references.
   
        The maximum length of name for a named subpattern is 32 characters, and         The maximum length of name for a named subpattern is 32 characters, and
        the maximum number of named subpatterns is 10000.         the maximum number of named subpatterns is 10000.
   
       The  maximum  length  of  a  name  in  a (*MARK), (*PRUNE), (*SKIP), or       The maximum length of a  name  in  a  (*MARK),  (*PRUNE),  (*SKIP),  or
       (*THEN) verb is 255 for the 8-bit library and 65535 for the 16-bit  and       (*THEN)  verb is 255 for the 8-bit library and 65535 for the 16-bit and
       32-bit library.       32-bit libraries.
   
       The  maximum  length of a subject string is the largest positive number       The maximum length of a subject string is the largest  positive  number
       that an integer variable can hold. However, when using the  traditional       that  an integer variable can hold. However, when using the traditional
        matching function, PCRE uses recursion to handle subpatterns and indef-         matching function, PCRE uses recursion to handle subpatterns and indef-
       inite repetition.  This means that the available stack space may  limit       inite  repetition.  This means that the available stack space may limit
        the size of a subject string that can be processed by certain patterns.         the size of a subject string that can be processed by certain patterns.
        For a discussion of stack issues, see the pcrestack documentation.         For a discussion of stack issues, see the pcrestack documentation.
   
Line 9988  AUTHOR Line 10203  AUTHOR
   
 REVISION  REVISION
   
       Last updated: 04 May 2012       Last updated: 05 November 2013
       Copyright (c) 1997-2012 University of Cambridge.       Copyright (c) 1997-2013 University of Cambridge.
 ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
   
   
