Diff for /embedaddon/pcre/doc/pcrepattern.3 between versions 1.1.1.2 and 1.1.1.3

version 1.1.1.2, 2012/02/21 23:50:25 version 1.1.1.3, 2012/10/09 09:19:17
Line 1 Line 1
.TH PCREPATTERN 3  .TH PCREPATTERN 3 "04 May 2012" "PCRE 8.31"
 .SH NAME  .SH NAME
 PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"  .SH "PCRE REGULAR EXPRESSION DETAILS"
Line 198  In a UTF mode, only ASCII numbers and letters have any Line 198  In a UTF mode, only ASCII numbers and letters have any
 backslash. All other characters (in particular, those whose codepoints are  backslash. All other characters (in particular, those whose codepoints are
 greater than 127) are treated as literals.  greater than 127) are treated as literals.
 .P  .P
If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the  If a pattern is compiled with the PCRE_EXTENDED option, white space in the
 pattern (other than in a character class) and characters between a # outside  pattern (other than in a character class) and characters between a # outside
 a character class and the next newline are ignored. An escaping backslash can  a character class and the next newline are ignored. An escaping backslash can
be used to include a whitespace or # character as part of the pattern.  be used to include a white space or # character as part of the pattern.
 .P  .P
 If you want to remove the special meaning from a sequence of characters, you  If you want to remove the special meaning from a sequence of characters, you
 can do so by putting them between \eQ and \eE. This is different from Perl in  can do so by putting them between \eQ and \eE. This is different from Perl in
Line 237  one of the following escape sequences than the binary  Line 237  one of the following escape sequences than the binary 
   \ea        alarm, that is, the BEL character (hex 07)    \ea        alarm, that is, the BEL character (hex 07)
   \ecx       "control-x", where x is any ASCII character    \ecx       "control-x", where x is any ASCII character
   \ee        escape (hex 1B)    \ee        escape (hex 1B)
  \ef        formfeed (hex 0C)  \ef        form feed (hex 0C)
   \en        linefeed (hex 0A)    \en        linefeed (hex 0A)
   \er        carriage return (hex 0D)    \er        carriage return (hex 0D)
   \et        tab (hex 09)    \et        tab (hex 09)
Line 277  as just described only when it is followed by two hexa Line 277  as just described only when it is followed by two hexa
 Otherwise, it matches a literal "x" character. In JavaScript mode, support for  Otherwise, it matches a literal "x" character. In JavaScript mode, support for
 code points greater than 256 is provided by \eu, which must be followed by  code points greater than 256 is provided by \eu, which must be followed by
 four hexadecimal digits; otherwise it matches a literal "u" character.  four hexadecimal digits; otherwise it matches a literal "u" character.
   Character codes specified by \eu in JavaScript mode are constrained in the same
   way as those specified by \ex in non-JavaScript mode.
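The JavaScript-mode rule for \u can be sketched in C. The function js_u_escape is hypothetical and not part of the PCRE API. It shows only the decision described above: exactly four hexadecimal digits form a character code, and anything less leaves a literal "u". The range check implied by the \x constraint is left to the caller, because it depends on the library width and UTF mode.

```c
#include <ctype.h>

/* Hypothetical helper, not part of the PCRE API. p points just past the
   backslash and "u" of an escape in JavaScript-compatibility mode. If
   exactly four hexadecimal digits follow, store their value in *out and
   return 1; otherwise return 0, meaning the escape is a literal "u".
   Checking the value against the limits that apply to \x is left to the
   caller. */
int js_u_escape(const char *p, unsigned int *out)
{
    unsigned int value = 0;

    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)p[i];
        if (!isxdigit(c))
            return 0;   /* too few hex digits (or end of string) */
        value = value * 16u +
                (unsigned int)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    *out = value;
    return 1;
}
```

Only four digits are read. Any further hex digit is left in place to be matched as an ordinary character.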
 .P  .P
 Characters whose value is less than 256 can be defined by either of the two  Characters whose value is less than 256 can be defined by either of the two
 syntaxes for \ex (or by \eu in JavaScript mode). There is no difference in the  syntaxes for \ex (or by \eu in JavaScript mode). There is no difference in the
Line 399  Another use of backslash is for specifying generic cha Line 401  Another use of backslash is for specifying generic cha
 .sp  .sp
   \ed     any decimal digit    \ed     any decimal digit
   \eD     any character that is not a decimal digit    \eD     any character that is not a decimal digit
  \eh     any horizontal whitespace character  \eh     any horizontal white space character
  \eH     any character that is not a horizontal whitespace character  \eH     any character that is not a horizontal white space character
  \es     any whitespace character  \es     any white space character
  \eS     any character that is not a whitespace character  \eS     any character that is not a white space character
  \ev     any vertical whitespace character  \ev     any vertical white space character
  \eV     any character that is not a vertical whitespace character  \eV     any character that is not a vertical white space character
   \ew     any "word" character    \ew     any "word" character
   \eW     any "non-word" character    \eW     any "non-word" character
 .sp  .sp
Line 493  The vertical space characters are: Line 495  The vertical space characters are:
 .sp  .sp
   U+000A     Linefeed    U+000A     Linefeed
   U+000B     Vertical tab    U+000B     Vertical tab
  U+000C     Formfeed  U+000C     Form feed
   U+000D     Carriage return    U+000D     Carriage return
   U+0085     Next line    U+0085     Next line
   U+2028     Line separator    U+2028     Line separator
Line 520  below. Line 522  below.
 .\"  .\"
 This particular group matches either the two-character sequence CR followed by  This particular group matches either the two-character sequence CR followed by
 LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,  LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,
U+000B), FF (formfeed, U+000C), CR (carriage return, U+000D), or NEL (next  U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next
 line, U+0085). The two-character sequence is treated as a single unit that  line, U+0085). The two-character sequence is treated as a single unit that
 cannot be split.  cannot be split.
 .P  .P
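The following C sketch roughly models the \R behaviour described above. It reports how many characters \R would consume at a given position. For simplicity, it assumes the subject is an array of 32-bit code points, whereas PCRE itself works on 8-bit or 16-bit code units. The function name is hypothetical. The sketch covers the Unicode newline set, including LS (U+2028) and PS (U+2029). In the 8-bit library without UTF mode, only the first five characters can occur, because LS and PS lie above 255.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE API. Returns how many code
   points \R would consume at s[i]: 2 for CR followed by LF, 1 for any other
   vertical space character, 0 if \R cannot match there. */
size_t bsr_match_length(const uint32_t *s, size_t len, size_t i)
{
    if (i >= len)
        return 0;

    /* CRLF is treated as a single unit that cannot be split. */
    if (s[i] == 0x0D && i + 1 < len && s[i + 1] == 0x0A)
        return 2;

    switch (s[i]) {
    case 0x000A:   /* LF, linefeed */
    case 0x000B:   /* VT, vertical tab */
    case 0x000C:   /* FF, form feed */
    case 0x000D:   /* CR, carriage return */
    case 0x0085:   /* NEL, next line */
    case 0x2028:   /* LS, line separator (not reachable in 8-bit non-UTF mode) */
    case 0x2029:   /* PS, paragraph separator (likewise) */
        return 1;
    default:
        return 0;
    }
}
```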
Line 596  Armenian, Line 598  Armenian,
 Avestan,  Avestan,
 Balinese,  Balinese,
 Bamum,  Bamum,
   Batak,
 Bengali,  Bengali,
 Bopomofo,  Bopomofo,
   Brahmi,
 Braille,  Braille,
 Buginese,  Buginese,
 Buhid,  Buhid,
 Canadian_Aboriginal,  Canadian_Aboriginal,
 Carian,  Carian,
   Chakma,
 Cham,  Cham,
 Cherokee,  Cherokee,
 Common,  Common,
Line 645  Lisu, Line 650  Lisu,
 Lycian,  Lycian,
 Lydian,  Lydian,
 Malayalam,  Malayalam,
   Mandaic,
 Meetei_Mayek,  Meetei_Mayek,
   Meroitic_Cursive,
   Meroitic_Hieroglyphs,
   Miao,
 Mongolian,  Mongolian,
 Myanmar,  Myanmar,
 New_Tai_Lue,  New_Tai_Lue,
Line 664  Rejang, Line 673  Rejang,
 Runic,  Runic,
 Samaritan,  Samaritan,
 Saurashtra,  Saurashtra,
   Sharada,
 Shavian,  Shavian,
 Sinhala,  Sinhala,
   Sora_Sompeng,
 Sundanese,  Sundanese,
 Syloti_Nagri,  Syloti_Nagri,
 Syriac,  Syriac,
Line 674  Tagbanwa, Line 685  Tagbanwa,
 Tai_Le,  Tai_Le,
 Tai_Tham,  Tai_Tham,
 Tai_Viet,  Tai_Viet,
   Takri,
 Tamil,  Tamil,
 Telugu,  Telugu,
 Thaana,  Thaana,
Line 809  PCRE_UCP is set. They are: Line 821  PCRE_UCP is set. They are:
   Xwd   Any Perl "word" character    Xwd   Any Perl "word" character
 .sp  .sp
 Xan matches characters that have either the L (letter) or the N (number)  Xan matches characters that have either the L (letter) or the N (number)
property. Xps matches the characters tab, linefeed, vertical tab, formfeed, or  property. Xps matches the characters tab, linefeed, vertical tab, form feed, or
 carriage return, and any other character that has the Z (separator) property.  carriage return, and any other character that has the Z (separator) property.
 Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the  Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the
 same characters as Xan, plus underscore.  same characters as Xan, plus underscore.
Line 1010  used. Because \eC breaks up characters into individual Line 1022  used. Because \eC breaks up characters into individual
 unit with \eC in a UTF mode means that the rest of the string may start with a  unit with \eC in a UTF mode means that the rest of the string may start with a
 malformed UTF character. This has undefined results, because PCRE assumes that  malformed UTF character. This has undefined results, because PCRE assumes that
 it is dealing with valid UTF strings (and by default it checks this at the  it is dealing with valid UTF strings (and by default it checks this at the
start of processing unless the PCRE_NO_UTF8_CHECK option is used).  start of processing unless the PCRE_NO_UTF8_CHECK or PCRE_NO_UTF16_CHECK option
 is used).
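The hazard with \C in a UTF mode is easiest to see at the byte level. In the sketch below, the helper is hypothetical and not a PCRE function. Consuming a single byte of a two-byte character such as U+00E9 (bytes C3 A9) leaves the rest of the subject starting with a continuation byte, which cannot begin a valid UTF-8 character. The same reasoning applies to surrogate pairs in the 16-bit library, which this sketch does not cover.

```c
#include <stdbool.h>
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx and can never
   start a well-formed character. */
static bool is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper, not part of the PCRE API. If \C matches the single
   byte at subject[offset], would the rest of the subject start in the
   middle of a UTF-8 character? */
bool rest_starts_malformed(const unsigned char *subject, size_t len,
                           size_t offset)
{
    size_t next = offset + 1;   /* \C consumes exactly one code unit */
    return next < len && is_utf8_continuation(subject[next]);
}
```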
 .P  .P
 PCRE does not allow \eC to appear in lookbehind assertions  PCRE does not allow \eC to appear in lookbehind assertions
 .\" HTML <a href="#lookbehind">  .\" HTML <a href="#lookbehind">
Line 1832  Because there may be many capturing parentheses in a p Line 1845  Because there may be many capturing parentheses in a p
 following a backslash are taken as part of a potential back reference number.  following a backslash are taken as part of a potential back reference number.
 If the pattern continues with a digit character, some delimiter must be used to  If the pattern continues with a digit character, some delimiter must be used to
 terminate the back reference. If the PCRE_EXTENDED option is set, this can be  terminate the back reference. If the PCRE_EXTENDED option is set, this can be
whitespace. Otherwise, the \eg{ syntax or an empty comment (see  white space. Otherwise, the \eg{ syntax or an empty comment (see
 .\" HTML <a href="#comments">  .\" HTML <a href="#comments">
 .\" </a>  .\" </a>
 "Comments"  "Comments"
Line 2189  subroutines that can be referenced from elsewhere. (Th Line 2202  subroutines that can be referenced from elsewhere. (Th
 subroutines  subroutines
 .\"  .\"
 is described below.) For example, a pattern to match an IPv4 address such as  is described below.) For example, a pattern to match an IPv4 address such as
"192.168.23.245" could be written like this (ignore whitespace and line"192.168.23.245" could be written like this (ignore white space and line
 breaks):  breaks):
 .sp  .sp
   (?(DEFINE) (?<byte> 2[0-4]\ed | 25[0-5] | 1\ed\ed | [1-9]?\ed) )    (?(DEFINE) (?<byte> 2[0-4]\ed | 25[0-5] | 1\ed\ed | [1-9]?\ed) )
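The following plain C restatement of the four alternatives makes explicit what the "byte" subpattern accepts. It tests a complete token rather than a position inside a larger match, which is an assumption of this sketch. The function is_ip_byte is a hypothetical helper used only for illustration. Taken together, the alternatives accept the values 0 to 255 written without leading zeros.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true when the whole string s matches the
   "byte" subpattern 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d. Each branch
   below mirrors one alternative. */
bool is_ip_byte(const char *s)
{
    size_t n = strlen(s);

    for (size_t i = 0; i < n; i++)
        if (!isdigit((unsigned char)s[i]))
            return false;

    switch (n) {
    case 1:                                   /* [1-9]?\d with no prefix */
        return true;
    case 2:                                   /* [1-9]\d: no leading zero */
        return s[0] != '0';
    case 3:
        if (s[0] == '1')                      /* 1\d\d */
            return true;
        if (s[0] == '2' && s[1] <= '4')       /* 2[0-4]\d */
            return true;
        return s[0] == '2' && s[1] == '5' && s[2] <= '5';   /* 25[0-5] */
    default:                                  /* empty or too long */
        return false;
    }
}
```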
Line 2588  exception: the name from a (*MARK), (*PRUNE), or (*THE Line 2601  exception: the name from a (*MARK), (*PRUNE), or (*THE
 a successful positive assertion \fIis\fP passed back when a match succeeds  a successful positive assertion \fIis\fP passed back when a match succeeds
 (compare capturing parentheses in assertions). Note that such subpatterns are  (compare capturing parentheses in assertions). Note that such subpatterns are
 processed as anchored at the point where they are tested. Note also that Perl's  processed as anchored at the point where they are tested. Note also that Perl's
treatment of subroutines is different in some cases.  treatment of subroutines and assertions is different in some cases.
 .P  .P
 The new verbs make use of what was previously invalid syntax: an opening  The new verbs make use of what was previously invalid syntax: an opening
 parenthesis followed by an asterisk. They are generally of the form  parenthesis followed by an asterisk. They are generally of the form
 (*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,  (*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,
 depending on whether or not an argument is present. A name is any sequence of  depending on whether or not an argument is present. A name is any sequence of
characters that does not include a closing parenthesis. If the name is empty,  characters that does not include a closing parenthesis. The maximum length of
that is, if the closing parenthesis immediately follows the colon, the effect  a name is 255 in the 8-bit library and 65535 in the 16-bit library. If the name
is as if the colon were not there. Any number of these verbs may occur in a  is empty, that is, if the closing parenthesis immediately follows the colon,
pattern.  the effect is as if the colon were not there. Any number of these verbs may
.P  occur in a pattern.
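The naming rules in the previous paragraph can be sketched as a small C scanner. verb_name_length is hypothetical and does not reproduce PCRE's own parser. For simplicity it scans a char string, and it takes the maximum name length as a parameter. The caller passes 255 for the 8-bit library or 65535 for the 16-bit library.

```c
/* Hypothetical helper, not PCRE's parser. p points just after the colon of
   a (*VERB:NAME) item. The name runs up to the next closing parenthesis
   and may not exceed max_len characters (255 in the 8-bit library, 65535
   in the 16-bit library). Returns the name length, 0 for an empty name
   (handled as if the colon were absent), or -1 if the name is too long or
   is not terminated by ")". */
long verb_name_length(const char *p, long max_len)
{
    long n = 0;

    while (p[n] != ')') {
        if (p[n] == '\0')
            return -1;          /* no closing parenthesis */
        if (++n > max_len)
            return -1;          /* longer than the library allows */
    }
    return n;
}
```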
 .
 .
 .\" HTML <a name="nooptimize"></a>
 .SS "Optimizations that affect backtracking verbs"
 .rs
 .sp
 PCRE contains some optimizations that are used to speed up matching by running  PCRE contains some optimizations that are used to speed up matching by running
 some checks at the start of each match attempt. For example, it may know the  some checks at the start of each match attempt. For example, it may know the
 minimum length of matching subject, or that a particular character must be  minimum length of matching subject, or that a particular character must be
Line 2606  present. When one of these optimizations suppresses th Line 2625  present. When one of these optimizations suppresses th
 included backtracking verbs will not, of course, be processed. You can suppress  included backtracking verbs will not, of course, be processed. You can suppress
 the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option  the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option
 when calling \fBpcre_compile()\fP or \fBpcre_exec()\fP, or by starting the  when calling \fBpcre_compile()\fP or \fBpcre_exec()\fP, or by starting the
pattern with (*NO_START_OPT).  pattern with (*NO_START_OPT). There is more discussion of this option in the
 section entitled
 .\" HTML <a href="pcreapi.html#execoptions">
 .\" </a>
 "Option bits for \fBpcre_exec()\fP"
 .\"
 in the
 .\" HREF
 \fBpcreapi\fP
 .\"
 documentation.
 .P  .P
 Experiments with Perl suggest that it too has similar optimizations, sometimes  Experiments with Perl suggest that it too has similar optimizations, sometimes
 leading to anomalous results.  leading to anomalous results.
Line 2695  After a partial match or a failed match, the name of t Line 2724  After a partial match or a failed match, the name of t
   No match, mark = B    No match, mark = B
 .sp  .sp
 Note that in this unanchored example the mark is retained from the match  Note that in this unanchored example the mark is retained from the match
attempt that started at the letter "X". Subsequent match attempts starting at  attempt that started at the letter "X" in the subject. Subsequent match
"P" and then with an empty string do not get as far as the (*MARK) item, but  attempts starting at "P" and then with an empty string do not get as far as the
nevertheless do not reset it.  (*MARK) item, but nevertheless do not reset it.
 .P
 If you are interested in (*MARK) values after failed matches, you should
 probably set the PCRE_NO_START_OPTIMIZE option
 .\" HTML <a href="#nooptimize">
 .\" </a>
 (see above)
 .\"
 to ensure that the match is always attempted.
 .  .
 .  .
 .SS "Verbs that act after backtracking"  .SS "Verbs that act after backtracking"
Line 2876  Cambridge CB2 3QH, England. Line 2913  Cambridge CB2 3QH, England.
 .rs  .rs
 .sp  .sp
 .nf  .nf
Last updated: 09 January 2012  Last updated: 17 June 2012
 Copyright (c) 1997-2012 University of Cambridge.  Copyright (c) 1997-2012 University of Cambridge.
 .fi  .fi
