Diff for /embedaddon/pcre/doc/html/pcrepattern.html between versions 1.1.1.2 and 1.1.1.3

version 1.1.1.2, 2012/02/21 23:50:25 version 1.1.1.3, 2012/10/09 09:19:18
Line 227  backslash. All other characters (in particular, those  Line 227  backslash. All other characters (in particular, those 
 greater than 127) are treated as literals.  greater than 127) are treated as literals.
 </P>  </P>
 <P>  <P>
If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the  If a pattern is compiled with the PCRE_EXTENDED option, white space in the
 pattern (other than in a character class) and characters between a # outside  pattern (other than in a character class) and characters between a # outside
 a character class and the next newline are ignored. An escaping backslash can  a character class and the next newline are ignored. An escaping backslash can
be used to include a whitespace or # character as part of the pattern.  be used to include a white space or # character as part of the pattern.
 </P>  </P>
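
The white space rule in the paragraph above can be tried out with a minimal sketch. It assumes Python's re.VERBOSE flag as a stand-in for PCRE_EXTENDED; this is Python's own regex engine rather than PCRE, and the variable names are illustrative only.

```python
import re

# Assumption: re.VERBOSE plays the role of PCRE_EXTENDED in this sketch.
# Unescaped white space and everything from an unescaped # to the end of
# the line are ignored, so this pattern is equivalent to "abc".
loose = re.compile(r"a b c   # trailing comment", re.VERBOSE)

# An escaping backslash keeps the space and the # as part of the pattern.
strict = re.compile(r"a\ b\#c", re.VERBOSE)

ignored_ws = loose.fullmatch("abc") is not None
spaces_not_matched = loose.fullmatch("a b c") is None
escaped_ok = strict.fullmatch("a b#c") is not None
```

All three flags should come out true if the stand-in behaves like the option described above.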
 <P>  <P>
 If you want to remove the special meaning from a sequence of characters, you  If you want to remove the special meaning from a sequence of characters, you
Line 264  one of the following escape sequences than the binary  Line 264  one of the following escape sequences than the binary 
   \a        alarm, that is, the BEL character (hex 07)    \a        alarm, that is, the BEL character (hex 07)
   \cx       "control-x", where x is any ASCII character    \cx       "control-x", where x is any ASCII character
   \e        escape (hex 1B)    \e        escape (hex 1B)
  \f        formfeed (hex 0C)  \f        form feed (hex 0C)
   \n        linefeed (hex 0A)    \n        linefeed (hex 0A)
   \r        carriage return (hex 0D)    \r        carriage return (hex 0D)
   \t        tab (hex 09)    \t        tab (hex 09)
Line 307  as just described only when it is followed by two hexa Line 307  as just described only when it is followed by two hexa
 Otherwise, it matches a literal "x" character. In JavaScript mode, support for  Otherwise, it matches a literal "x" character. In JavaScript mode, support for
 code points greater than 256 is provided by \u, which must be followed by  code points greater than 256 is provided by \u, which must be followed by
 four hexadecimal digits; otherwise it matches a literal "u" character.  four hexadecimal digits; otherwise it matches a literal "u" character.
   Character codes specified by \u in JavaScript mode are constrained in the same
   way as those specified by \x in non-JavaScript mode.
 </P>  </P>
 <P>  <P>
 Characters whose value is less than 256 can be defined by either of the two  Characters whose value is less than 256 can be defined by either of the two
Line 406  Another use of backslash is for specifying generic cha Line 408  Another use of backslash is for specifying generic cha
 <pre>  <pre>
   \d     any decimal digit    \d     any decimal digit
   \D     any character that is not a decimal digit    \D     any character that is not a decimal digit
  \h     any horizontal whitespace character  \h     any horizontal white space character
  \H     any character that is not a horizontal whitespace character  \H     any character that is not a horizontal white space character
  \s     any whitespace character  \s     any white space character
  \S     any character that is not a whitespace character  \S     any character that is not a white space character
  \v     any vertical whitespace character  \v     any vertical white space character
  \V     any character that is not a vertical whitespace character  \V     any character that is not a vertical white space character
   \w     any "word" character    \w     any "word" character
   \W     any "non-word" character    \W     any "non-word" character
 </pre>  </pre>
Line 497  The vertical space characters are: Line 499  The vertical space characters are:
 <pre>  <pre>
   U+000A     Linefeed    U+000A     Linefeed
   U+000B     Vertical tab    U+000B     Vertical tab
  U+000C     Formfeed  U+000C     Form feed
   U+000D     Carriage return    U+000D     Carriage return
   U+0085     Next line    U+0085     Next line
   U+2028     Line separator    U+2028     Line separator
Line 520  This is an example of an "atomic group", details of wh Line 522  This is an example of an "atomic group", details of wh
 <a href="#atomicgroup">below.</a>  <a href="#atomicgroup">below.</a>
 This particular group matches either the two-character sequence CR followed by  This particular group matches either the two-character sequence CR followed by
 LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,  LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,
U+000B), FF (formfeed, U+000C), CR (carriage return, U+000D), or NEL (next  U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next
 line, U+0085). The two-character sequence is treated as a single unit that  line, U+0085). The two-character sequence is treated as a single unit that
 cannot be split.  cannot be split.
 </P>  </P>
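
The newline group described above can be approximated with a short sketch. It assumes Python's re module and swaps the atomic group for a plain non-capturing group with the CR LF alternative listed first, so it is not the PCRE construct itself; NEWLINE_SEQUENCE is a hypothetical name.

```python
import re

# Assumption: a non-capturing group replaces the atomic group from the text.
# CR LF comes first so the two-character sequence is consumed as one unit
# before the single-character alternatives are tried.
NEWLINE_SEQUENCE = re.compile("(?:\r\n|\n|\x0b|\f|\r|\x85)")

# Splitting on the group treats CR LF, LF and NEL as one separator each.
parts = NEWLINE_SEQUENCE.split("one\r\ntwo\nthree\x85four")
```

Because nothing follows the group in this pattern, the lack of atomicity does not change the result here; inside a larger pattern the two forms can differ.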
Line 596  Armenian, Line 598  Armenian,
 Avestan,  Avestan,
 Balinese,  Balinese,
 Bamum,  Bamum,
   Batak,
 Bengali,  Bengali,
 Bopomofo,  Bopomofo,
   Brahmi,
 Braille,  Braille,
 Buginese,  Buginese,
 Buhid,  Buhid,
 Canadian_Aboriginal,  Canadian_Aboriginal,
 Carian,  Carian,
   Chakma,
 Cham,  Cham,
 Cherokee,  Cherokee,
 Common,  Common,
Line 645  Lisu, Line 650  Lisu,
 Lycian,  Lycian,
 Lydian,  Lydian,
 Malayalam,  Malayalam,
   Mandaic,
 Meetei_Mayek,  Meetei_Mayek,
   Meroitic_Cursive,
   Meroitic_Hieroglyphs,
   Miao,
 Mongolian,  Mongolian,
 Myanmar,  Myanmar,
 New_Tai_Lue,  New_Tai_Lue,
Line 664  Rejang, Line 673  Rejang,
 Runic,  Runic,
 Samaritan,  Samaritan,
 Saurashtra,  Saurashtra,
   Sharada,
 Shavian,  Shavian,
 Sinhala,  Sinhala,
   Sora_Sompeng,
 Sundanese,  Sundanese,
 Syloti_Nagri,  Syloti_Nagri,
 Syriac,  Syriac,
Line 674  Tagbanwa, Line 685  Tagbanwa,
 Tai_Le,  Tai_Le,
 Tai_Tham,  Tai_Tham,
 Tai_Viet,  Tai_Viet,
   Takri,
 Tamil,  Tamil,
 Telugu,  Telugu,
 Thaana,  Thaana,
Line 812  PCRE_UCP is set. They are: Line 824  PCRE_UCP is set. They are:
   Xwd   Any Perl "word" character    Xwd   Any Perl "word" character
 </pre>  </pre>
 Xan matches characters that have either the L (letter) or the N (number)  Xan matches characters that have either the L (letter) or the N (number)
property. Xps matches the characters tab, linefeed, vertical tab, formfeed, or  property. Xps matches the characters tab, linefeed, vertical tab, form feed, or
 carriage return, and any other character that has the Z (separator) property.  carriage return, and any other character that has the Z (separator) property.
 Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the  Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the
 same characters as Xan, plus underscore.  same characters as Xan, plus underscore.
Line 1008  used. Because \C breaks up characters into individual  Line 1020  used. Because \C breaks up characters into individual 
 unit with \C in a UTF mode means that the rest of the string may start with a  unit with \C in a UTF mode means that the rest of the string may start with a
 malformed UTF character. This has undefined results, because PCRE assumes that  malformed UTF character. This has undefined results, because PCRE assumes that
 it is dealing with valid UTF strings (and by default it checks this at the  it is dealing with valid UTF strings (and by default it checks this at the
start of processing unless the PCRE_NO_UTF8_CHECK option is used).  start of processing unless the PCRE_NO_UTF8_CHECK or PCRE_NO_UTF16_CHECK option
 is used).
 </P>  </P>
 <P>  <P>
 PCRE does not allow \C to appear in lookbehind assertions  PCRE does not allow \C to appear in lookbehind assertions
Line 1818  Because there may be many capturing parentheses in a p Line 1831  Because there may be many capturing parentheses in a p
 following a backslash are taken as part of a potential back reference number.  following a backslash are taken as part of a potential back reference number.
 If the pattern continues with a digit character, some delimiter must be used to  If the pattern continues with a digit character, some delimiter must be used to
 terminate the back reference. If the PCRE_EXTENDED option is set, this can be  terminate the back reference. If the PCRE_EXTENDED option is set, this can be
whitespace. Otherwise, the \g{ syntax or an empty comment (see  white space. Otherwise, the \g{ syntax or an empty comment (see
 <a href="#comments">"Comments"</a>  <a href="#comments">"Comments"</a>
 below) can be used.  below) can be used.
 </P>  </P>
Line 2160  point in the pattern; the idea of DEFINE is that it ca Line 2173  point in the pattern; the idea of DEFINE is that it ca
 subroutines that can be referenced from elsewhere. (The use of  subroutines that can be referenced from elsewhere. (The use of
 <a href="#subpatternsassubroutines">subroutines</a>  <a href="#subpatternsassubroutines">subroutines</a>
 is described below.) For example, a pattern to match an IPv4 address such as  is described below.) For example, a pattern to match an IPv4 address such as
"192.168.23.245" could be written like this (ignore whitespace and line  "192.168.23.245" could be written like this (ignore white space and line
 breaks):  breaks):
 <pre>  <pre>
   (?(DEFINE) (?&#60;byte&#62; 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )    (?(DEFINE) (?&#60;byte&#62; 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
Line 2554  exception: the name from a *(MARK), (*PRUNE), or (*THE Line 2567  exception: the name from a *(MARK), (*PRUNE), or (*THE
 a successful positive assertion <i>is</i> passed back when a match succeeds  a successful positive assertion <i>is</i> passed back when a match succeeds
 (compare capturing parentheses in assertions). Note that such subpatterns are  (compare capturing parentheses in assertions). Note that such subpatterns are
 processed as anchored at the point where they are tested. Note also that Perl's  processed as anchored at the point where they are tested. Note also that Perl's
treatment of subroutines is different in some cases.  treatment of subroutines and assertions is different in some cases.
 </P>  </P>
 <P>  <P>
 The new verbs make use of what was previously invalid syntax: an opening  The new verbs make use of what was previously invalid syntax: an opening
 parenthesis followed by an asterisk. They are generally of the form  parenthesis followed by an asterisk. They are generally of the form
 (*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,  (*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,
 depending on whether or not an argument is present. A name is any sequence of  depending on whether or not an argument is present. A name is any sequence of
characters that does not include a closing parenthesis. If the name is empty,  characters that does not include a closing parenthesis. The maximum length of
that is, if the closing parenthesis immediately follows the colon, the effect  name is 255 in the 8-bit library and 65535 in the 16-bit library. If the name
is as if the colon were not there. Any number of these verbs may occur in a  is empty, that is, if the closing parenthesis immediately follows the colon,
pattern.  the effect is as if the colon were not there. Any number of these verbs may
</P>  occur in a pattern.
 <a name="nooptimize"></a></P>
 <br><b>
 Optimizations that affect backtracking verbs
 </b><br>
 <P>  <P>
 PCRE contains some optimizations that are used to speed up matching by running  PCRE contains some optimizations that are used to speed up matching by running
 some checks at the start of each match attempt. For example, it may know the  some checks at the start of each match attempt. For example, it may know the
Line 2574  present. When one of these optimizations suppresses th Line 2591  present. When one of these optimizations suppresses th
 included backtracking verbs will not, of course, be processed. You can suppress  included backtracking verbs will not, of course, be processed. You can suppress
 the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option  the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option
 when calling <b>pcre_compile()</b> or <b>pcre_exec()</b>, or by starting the  when calling <b>pcre_compile()</b> or <b>pcre_exec()</b>, or by starting the
pattern with (*NO_START_OPT).  pattern with (*NO_START_OPT). There is more discussion of this option in the
 section entitled
 <a href="pcreapi.html#execoptions">"Option bits for <b>pcre_exec()</b>"</a>
 in the
 <a href="pcreapi.html"><b>pcreapi</b></a>
 documentation.
 </P>  </P>
 <P>  <P>
 Experiments with Perl suggest that it too has similar optimizations, sometimes  Experiments with Perl suggest that it too has similar optimizations, sometimes
Line 2662  After a partial match or a failed match, the name of t Line 2684  After a partial match or a failed match, the name of t
   No match, mark = B    No match, mark = B
 </pre>  </pre>
 Note that in this unanchored example the mark is retained from the match  Note that in this unanchored example the mark is retained from the match
attempt that started at the letter "X". Subsequent match attempts starting at  attempt that started at the letter "X" in the subject. Subsequent match
"P" and then with an empty string do not get as far as the (*MARK) item, but  attempts starting at "P" and then with an empty string do not get as far as the
nevertheless do not reset it.  (*MARK) item, but nevertheless do not reset it.
 </P>  </P>
   <P>
   If you are interested in (*MARK) values after failed matches, you should
   probably set the PCRE_NO_START_OPTIMIZE option
   <a href="#nooptimize">(see above)</a>
   to ensure that the match is always attempted.
   </P>
 <br><b>  <br><b>
 Verbs that act after backtracking  Verbs that act after backtracking
 </b><br>  </b><br>
Line 2843  Cambridge CB2 3QH, England. Line 2871  Cambridge CB2 3QH, England.
 </P>  </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>  <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>  <P>
Last updated: 09 January 2012  Last updated: 17 June 2012
 <br>  <br>
 Copyright &copy; 1997-2012 University of Cambridge.  Copyright &copy; 1997-2012 University of Cambridge.
 <br>  <br>
