| version 1.1.1.2, 2012/02/21 23:50:25 | version 1.1.1.5, 2014/06/15 19:46:05 | 
| Line 61  documentation. This document contains a quick-referenc | Line 61  documentation. This document contains a quick-referenc | 
 | \a         alarm, that is, the BEL character (hex 07) | \a         alarm, that is, the BEL character (hex 07) | 
 | \cx        "control-x", where x is any ASCII character | \cx        "control-x", where x is any ASCII character | 
 | \e         escape (hex 1B) | \e         escape (hex 1B) | 
| \f         formfeed (hex 0C) | \f         form feed (hex 0C) | 
 | \n         newline (hex 0A) | \n         newline (hex 0A) | 
 | \r         carriage return (hex 0D) | \r         carriage return (hex 0D) | 
 | \t         tab (hex 09) | \t         tab (hex 09) | 
 |  | \0dd       character with octal code 0dd | 
 | \ddd       character with octal code ddd, or backreference | \ddd       character with octal code ddd, or backreference | 
 |  | \o{ddd..}  character with octal code ddd.. | 
 | \xhh       character with hex code hh | \xhh       character with hex code hh | 
 | \x{hhh..}  character with hex code hhh.. | \x{hhh..}  character with hex code hhh.. | 
| </PRE> | </pre> | 
|  | Note that \0dd is always an octal code, and that \8 and \9 are the literal | 
|  | characters "8" and "9". | 
 | </P> | </P> | 
 | <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br> | <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br> | 
 | <P> | <P> | 
| Line 78  documentation. This document contains a quick-referenc | Line 82  documentation. This document contains a quick-referenc | 
 | \C         one data unit, even in UTF mode (best avoided) | \C         one data unit, even in UTF mode (best avoided) | 
 | \d         a decimal digit | \d         a decimal digit | 
 | \D         a character that is not a decimal digit | \D         a character that is not a decimal digit | 
| \h         a horizontal whitespace character | \h         a horizontal white space character | 
| \H         a character that is not a horizontal whitespace character | \H         a character that is not a horizontal white space character | 
 | \N         a character that is not a newline | \N         a character that is not a newline | 
 | \p{<i>xx</i>}     a character with the <i>xx</i> property | \p{<i>xx</i>}     a character with the <i>xx</i> property | 
 | \P{<i>xx</i>}     a character without the <i>xx</i> property | \P{<i>xx</i>}     a character without the <i>xx</i> property | 
 | \R         a newline sequence | \R         a newline sequence | 
| \s         a whitespace character | \s         a white space character | 
| \S         a character that is not a whitespace character | \S         a character that is not a white space character | 
| \v         a vertical whitespace character | \v         a vertical white space character | 
| \V         a character that is not a vertical whitespace character | \V         a character that is not a vertical white space character | 
 | \w         a "word" character | \w         a "word" character | 
 | \W         a "non-word" character | \W         a "non-word" character | 
| \X         an extended Unicode sequence | \X         a Unicode extended grapheme cluster | 
 | </pre> | </pre> | 
| In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII | By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode | 
| characters, even in a UTF mode. However, this can be changed by setting the | or in the 16-bit and 32-bit libraries. However, if locale-specific matching is | 
| PCRE_UCP option. | happening, \s and \w may also match characters with code points in the range | 
|  | 128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences | 
|  | is changed to use Unicode properties and they match many more characters. | 
 | </P> | </P> | 
 | <br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br> | <br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br> | 
 | <P> | <P> | 
| Line 150  PCRE_UCP option. | Line 156  PCRE_UCP option. | 
 | <pre> | <pre> | 
 | Xan        Alphanumeric: union of properties L and N | Xan        Alphanumeric: union of properties L and N | 
 | Xps        POSIX space: property Z or tab, NL, VT, FF, CR | Xps        POSIX space: property Z or tab, NL, VT, FF, CR | 
| Xsp        Perl space: property Z or tab, NL, FF, CR | Xsp        Perl space: property Z or tab, NL, VT, FF, CR | 
|  | Xuc        Universally-named character: one that can be | 
|  | represented by a Universal Character Name | 
 | Xwd        Perl word: property Xan or underscore | Xwd        Perl word: property Xan or underscore | 
| </PRE> | </pre> | 
|  | Perl and POSIX space are now the same. Perl added VT to its space character set | 
|  | at release 5.18 and PCRE changed at release 8.34. | 
 | </P> | </P> | 
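The Xps and Xsp definitions in the newer revision can be sketched as a small character classifier. This is an illustrative approximation built on Python's `unicodedata` module, not PCRE's own implementation; the function names `is_xps` and `is_xsp` and the `vt_is_space` flag are hypothetical, introduced only to show the effect of adding VT to Perl space.

```python
import unicodedata

# Control characters listed explicitly for Xps: tab, NL, VT, FF, CR.
_SPACE_CONTROLS = {"\t", "\n", "\x0b", "\x0c", "\r"}


def is_xps(ch: str) -> bool:
    """Approximate Xps: general category Z (Zs, Zl, Zp) or tab, NL, VT, FF, CR."""
    return unicodedata.category(ch).startswith("Z") or ch in _SPACE_CONTROLS


def is_xsp(ch: str, vt_is_space: bool = True) -> bool:
    """Approximate Xsp.

    vt_is_space=False models the older definition, which excluded VT
    (an assumption made for illustration, based on the table above).
    """
    if ch == "\x0b" and not vt_is_space:
        return False
    return is_xps(ch)


if __name__ == "__main__":
    for ch in (" ", "\x0b", "\u00a0", "a"):
        print(repr(ch), is_xps(ch), is_xsp(ch), is_xsp(ch, vt_is_space=False))
```

With the default `vt_is_space=True` the two functions agree on every character, which matches the statement that Perl and POSIX space are now the same.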
 | <br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br> | <br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br> | 
 | <P> | <P> | 
| Line 161  Armenian, | Line 171  Armenian, | 
 | Avestan, | Avestan, | 
 | Balinese, | Balinese, | 
 | Bamum, | Bamum, | 
 |  | Batak, | 
 | Bengali, | Bengali, | 
 | Bopomofo, | Bopomofo, | 
 |  | Brahmi, | 
 | Braille, | Braille, | 
 | Buginese, | Buginese, | 
 | Buhid, | Buhid, | 
 | Canadian_Aboriginal, | Canadian_Aboriginal, | 
 | Carian, | Carian, | 
 |  | Chakma, | 
 | Cham, | Cham, | 
 | Cherokee, | Cherokee, | 
 | Common, | Common, | 
| Line 210  Lisu, | Line 223  Lisu, | 
 | Lycian, | Lycian, | 
 | Lydian, | Lydian, | 
 | Malayalam, | Malayalam, | 
 |  | Mandaic, | 
 | Meetei_Mayek, | Meetei_Mayek, | 
 |  | Meroitic_Cursive, | 
 |  | Meroitic_Hieroglyphs, | 
 |  | Miao, | 
 | Mongolian, | Mongolian, | 
 | Myanmar, | Myanmar, | 
 | New_Tai_Lue, | New_Tai_Lue, | 
| Line 229  Rejang, | Line 246  Rejang, | 
 | Runic, | Runic, | 
 | Samaritan, | Samaritan, | 
 | Saurashtra, | Saurashtra, | 
 |  | Sharada, | 
 | Shavian, | Shavian, | 
 | Sinhala, | Sinhala, | 
 |  | Sora_Sompeng, | 
 | Sundanese, | Sundanese, | 
 | Syloti_Nagri, | Syloti_Nagri, | 
 | Syriac, | Syriac, | 
| Line 239  Tagbanwa, | Line 258  Tagbanwa, | 
 | Tai_Le, | Tai_Le, | 
 | Tai_Tham, | Tai_Tham, | 
 | Tai_Viet, | Tai_Viet, | 
 |  | Takri, | 
 | Tamil, | Tamil, | 
 | Telugu, | Telugu, | 
 | Thaana, | Thaana, | 
| Line 268  Yi. | Line 288  Yi. | 
 | lower       lower case letter | lower       lower case letter | 
 | print       printing, including space | print       printing, including space | 
 | punct       printing, excluding alphanumeric | punct       printing, excluding alphanumeric | 
| space       whitespace | space       white space | 
 | upper       upper case letter | upper       upper case letter | 
 | word        same as \w | word        same as \w | 
 | xdigit      hexadecimal digit | xdigit      hexadecimal digit | 
| Line 365  but some of them use Unicode properties if PCRE_UCP is | Line 385  but some of them use Unicode properties if PCRE_UCP is | 
 | The following are recognized only at the start of a pattern or after one of the | The following are recognized only at the start of a pattern or after one of the | 
 | newline-setting options with similar syntax: | newline-setting options with similar syntax: | 
 | <pre> | <pre> | 
 |  | (*LIMIT_MATCH=d) set the match limit to d (decimal number) | 
 |  | (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number) | 
 | (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) | (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) | 
 | (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8) | (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8) | 
 | (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16) | (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16) | 
 |  | (*UTF32)        set UTF-32 mode: 32-bit library (PCRE_UTF32) | 
 |  | (*UTF)          set appropriate UTF mode for the library in use | 
 | (*UCP)          set PCRE_UCP (use Unicode properties for \d etc) | (*UCP)          set PCRE_UCP (use Unicode properties for \d etc) | 
| </PRE> | </pre> | 
|  | Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the | 
|  | limits set by the caller of pcre_exec(), not increase them. | 
 | </P> | </P> | 
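The rule that LIMIT_MATCH and LIMIT_RECURSION can only reduce the caller's limits amounts to taking the minimum of the two values. The following is a minimal sketch of that rule, not PCRE's code: `effective_limits` and the dictionary keys are hypothetical names, and the scan is deliberately simplified to handle only limit items that appear consecutively at the very start of the pattern.

```python
import re

# Matches one leading (*LIMIT_MATCH=d) or (*LIMIT_RECURSION=d) item, d decimal.
_LIMIT_ITEM = re.compile(r"\(\*(LIMIT_MATCH|LIMIT_RECURSION)=(\d+)\)")


def effective_limits(pattern: str, caller_limits: dict) -> dict:
    """Return the limits in force after applying leading limit items.

    A pattern-supplied value can lower a caller limit but never raise it.
    """
    limits = dict(caller_limits)
    pos = 0
    while True:
        m = _LIMIT_ITEM.match(pattern, pos)
        if m is None:
            break
        name, value = m.group(1), int(m.group(2))
        # Clamp: keep whichever of the caller's and the pattern's value is smaller.
        limits[name] = min(limits.get(name, value), value)
        pos = m.end()
    return limits


if __name__ == "__main__":
    caller = {"LIMIT_MATCH": 1000, "LIMIT_RECURSION": 1000}
    print(effective_limits("(*LIMIT_MATCH=500)abc", caller))
    print(effective_limits("(*LIMIT_MATCH=5000)abc", caller))
```

In the second call the pattern asks for a higher match limit than the caller allowed, so the caller's value stays in force.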
 | <br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br> | <br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br> | 
 | <P> | <P> | 
| Line 459  pattern is not anchored. | Line 485  pattern is not anchored. | 
 | <br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br> | <br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br> | 
 | <P> | <P> | 
 | These are recognized only at the very start of the pattern or after a | These are recognized only at the very start of the pattern or after a | 
| (*BSR_...), (*UTF8), (*UTF16) or (*UCP) option. | (*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option. | 
 | <pre> | <pre> | 
 | (*CR)           carriage return only | (*CR)           carriage return only | 
 | (*LF)           linefeed only | (*LF)           linefeed only | 
| Line 500  Cambridge CB2 3QH, England. | Line 526  Cambridge CB2 3QH, England. | 
 | </P> | </P> | 
 | <br><a name="SEC27" href="#TOC1">REVISION</a><br> | <br><a name="SEC27" href="#TOC1">REVISION</a><br> | 
 | <P> | <P> | 
| Last updated: 10 January 2012 | Last updated: 12 November 2013 | 
 | <br> | <br> | 
| Copyright © 1997-2012 University of Cambridge. | Copyright © 1997-2013 University of Cambridge. | 
 | <br> | <br> | 
 | <p> | <p> | 
 | Return to the <a href="index.html">PCRE index page</a>. | Return to the <a href="index.html">PCRE index page</a>. |