version 1.1.1.2, 2012/02/21 23:50:25
|
version 1.1.1.5, 2014/06/15 19:46:05
|
Line 61 documentation. This document contains a quick-referenc
|
Line 61 documentation. This document contains a quick-referenc
|
\a alarm, that is, the BEL character (hex 07) |
\a alarm, that is, the BEL character (hex 07) |
\cx "control-x", where x is any ASCII character |
\cx "control-x", where x is any ASCII character |
\e escape (hex 1B) |
\e escape (hex 1B) |
\f formfeed (hex 0C) | \f form feed (hex 0C) |
\n newline (hex 0A) |
\n newline (hex 0A) |
\r carriage return (hex 0D) |
\r carriage return (hex 0D) |
\t tab (hex 09) |
\t tab (hex 09) |
|
\0dd character with octal code 0dd |
\ddd character with octal code ddd, or backreference |
\ddd character with octal code ddd, or backreference |
|
\o{ddd..} character with octal code ddd.. |
\xhh character with hex code hh |
\xhh character with hex code hh |
\x{hhh..} character with hex code hhh.. |
\x{hhh..} character with hex code hhh.. |
</PRE> | </pre> |
| Note that \0dd is always an octal code, and that \8 and \9 are the literal |
| characters "8" and "9". |
</P> |
</P> |
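The escape table above can be illustrated with a small decoder. This is only a sketch. `decode_escape` and `_digits` are hypothetical helpers, not part of the PCRE API. They cover the single-character escapes, \cx, \0dd, \o{..}, \xhh and \x{..}. Backreferences written as \ddd, the literal \8 and \9, and the character types such as \d are deliberately left unhandled. For \cx, the sketch applies the rule PCRE documents for ASCII characters: upper-case a lower case letter, then invert bit 6 (hex 40).

```python
# Hypothetical helper, not part of PCRE: decodes the escapes in the table above
# into code points. \ddd backreferences, \8, \9 and character types (\d, \w, ...)
# are deliberately left unhandled; for those, None is returned.

SIMPLE = {"a": 0x07, "e": 0x1B, "f": 0x0C, "n": 0x0A, "r": 0x0D, "t": 0x09}


def _digits(s, i, base, limit):
    """Read at most `limit` digits of `base` from s[i:]; return (value, next_index)."""
    allowed = "0123456789abcdef"[:base]
    j = i
    while j < len(s) and j - i < limit and s[j].lower() in allowed:
        j += 1
    return (int(s[i:j], base) if j > i else 0), j


def decode_escape(s, i=0):
    """Decode the escape starting at s[i]; return (code_point, next_index) or None."""
    if i + 1 >= len(s) or s[i] != "\\":
        return None
    c = s[i + 1]
    if c in SIMPLE:
        return SIMPLE[c], i + 2
    if c == "c":
        # \cx: a lower case letter is upper-cased, then bit 6 (hex 40) is inverted.
        if i + 2 >= len(s) or ord(s[i + 2]) > 127:
            return None
        return ord(s[i + 2].upper()) ^ 0x40, i + 3
    if c == "0":
        # \0dd: always octal; \0 is followed by at most two further octal digits.
        return _digits(s, i + 2, 8, 2)
    if c in ("o", "x") and i + 2 < len(s) and s[i + 2] == "{":
        # Braced forms \o{ddd..} and \x{hhh..}: digits up to the closing brace.
        end = s.find("}", i + 3)
        if end <= i + 3:  # no closing brace, or nothing between the braces
            return None
        value, j = _digits(s, i + 3, 8 if c == "o" else 16, end - (i + 3))
        return (value, end + 1) if j == end else None
    if c == "x":
        # \xhh: at most two hex digits.
        return _digits(s, i + 2, 16, 2)
    return None
```

For example, `decode_escape(r"\cz")` returns `(26, 3)`. The value is hex 1A, and three pattern characters are consumed.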
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br> |
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br> |
<P> |
<P> |
Line 78 documentation. This document contains a quick-referenc
|
Line 82 documentation. This document contains a quick-referenc
|
\C one data unit, even in UTF mode (best avoided) |
\C one data unit, even in UTF mode (best avoided) |
\d a decimal digit |
\d a decimal digit |
\D a character that is not a decimal digit |
\D a character that is not a decimal digit |
\h a horizontal whitespace character | \h a horizontal white space character |
\H a character that is not a horizontal whitespace character | \H a character that is not a horizontal white space character |
\N a character that is not a newline |
\N a character that is not a newline |
\p{<i>xx</i>} a character with the <i>xx</i> property |
\p{<i>xx</i>} a character with the <i>xx</i> property |
\P{<i>xx</i>} a character without the <i>xx</i> property |
\P{<i>xx</i>} a character without the <i>xx</i> property |
\R a newline sequence |
\R a newline sequence |
\s a whitespace character | \s a white space character |
\S a character that is not a whitespace character | \S a character that is not a white space character |
\v a vertical whitespace character | \v a vertical white space character |
\V a character that is not a vertical whitespace character | \V a character that is not a vertical white space character |
\w a "word" character |
\w a "word" character |
\W a "non-word" character |
\W a "non-word" character |
\X an extended Unicode sequence | \X a Unicode extended grapheme cluster |
</pre> |
</pre> |
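The following sketch shows what \R matches at a single position. It assumes PCRE was built with the default (*BSR_UNICODE) behaviour and that a UTF mode is in use; outside UTF mode, U+2028 and U+2029 are not part of \R. `match_R` is a hypothetical helper. Its sequence list follows PCRE's description of \R as the atomic group (?>\r\n|\n|\x0b|\f|\r|\x85) plus U+2028 and U+2029. With (*BSR_ANYCRLF), only CR, LF and CR LF are recognized.

```python
# Hypothetical helper modelling \R at one position (default BSR_UNICODE, UTF mode).
_BSR_UNICODE = ("\r\n", "\n", "\x0b", "\x0c", "\r", "\x85", "\u2028", "\u2029")
_BSR_ANYCRLF = ("\r\n", "\n", "\r")


def match_R(subject, pos=0, bsr_anycrlf=False):
    """Return the index just past the \\R match at pos, or None if there is none."""
    # CR LF is tried first, so it is consumed as one unit; because \R is atomic,
    # a match of CR LF is never shortened to CR alone by backtracking.
    for seq in (_BSR_ANYCRLF if bsr_anycrlf else _BSR_UNICODE):
        if subject.startswith(seq, pos):
            return pos + len(seq)
    return None
```

Because CR LF is tried first, `match_R("\r\nx")` consumes two characters, while a lone CR consumes one.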
In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII | By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode |
characters, even in a UTF mode. However, this can be changed by setting the | or in the 16-bit and 32-bit libraries. However, if locale-specific matching is |
PCRE_UCP option. | happening, \s and \w may also match characters with code points in the range |
| 128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences |
| is changed to use Unicode properties and they match many more characters. |
</P> |
</P> |
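The default behaviour described above can be sketched as ASCII-only set membership. This sketch assumes PCRE's default character tables and PCRE_UCP not set. It models neither locale-specific tables (which can add code points 128-255 to \s and \w) nor the Unicode-property behaviour enabled by PCRE_UCP. The names `matches_d` and `matches_w` are hypothetical.

```python
import string

# Default tables, PCRE_UCP not set: \d and \w look only at ASCII characters,
# even in a UTF mode. Locale tables and PCRE_UCP are not modelled.
ASCII_DIGIT = frozenset(string.digits)
ASCII_WORD = frozenset(string.ascii_letters + string.digits + "_")


def matches_d(ch):
    """\\d with default tables and no PCRE_UCP: ASCII 0-9 only."""
    return ch in ASCII_DIGIT


def matches_w(ch):
    """\\w with default tables and no PCRE_UCP: ASCII letters, digits, underscore."""
    return ch in ASCII_WORD
```

U+0663 (ARABIC-INDIC DIGIT THREE) is a Unicode decimal digit, and `"\u0663".isdecimal()` is True in Python, but this sketch of \d does not match it. With PCRE_UCP set, \d is based on the Unicode Nd property instead and does match it.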
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br> |
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br> |
<P> |
<P> |
Line 150 PCRE_UCP option.
|
Line 156 PCRE_UCP option.
|
<pre> |
<pre> |
Xan Alphanumeric: union of properties L and N |
Xan Alphanumeric: union of properties L and N |
Xps POSIX space: property Z or tab, NL, VT, FF, CR |
Xps POSIX space: property Z or tab, NL, VT, FF, CR |
Xsp Perl space: property Z or tab, NL, FF, CR | Xsp Perl space: property Z or tab, NL, VT, FF, CR |
| Xuc Universally-named character: one that can be |
| represented by a Universal Character Name |
Xwd Perl word: property Xan or underscore |
Xwd Perl word: property Xan or underscore |
</PRE> | </pre> |
| Perl and POSIX space are now the same. Perl added VT to its space character set |
| at release 5.18 and PCRE changed at release 8.34. |
</P> |
</P> |
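The Xps and Xsp rows can be checked against each other with a short sketch. It uses Python's `unicodedata` module for the Z (separator) property. Python's Unicode version may differ from the tables compiled into a given PCRE release. `xps` and `xsp` are hypothetical helpers. The `pcre_8_34_or_later` flag selects between the old table, where Xsp lacked VT, and the new one.

```python
import unicodedata

# Control characters named explicitly in the Xps/Xsp rows: tab, NL, VT, FF, CR.
_POSIX_CONTROLS = frozenset("\t\n\x0b\x0c\r")
_OLD_PERL_CONTROLS = _POSIX_CONTROLS - {"\x0b"}  # Xsp before 8.34: no VT


def is_z(ch):
    """True if ch has general category Z (Zs, Zl or Zp)."""
    return unicodedata.category(ch).startswith("Z")


def xps(ch):
    """Sketch of \\p{Xps}: property Z or tab, NL, VT, FF, CR."""
    return is_z(ch) or ch in _POSIX_CONTROLS


def xsp(ch, pcre_8_34_or_later=True):
    """Sketch of \\p{Xsp}; releases before 8.34 did not include VT."""
    controls = _POSIX_CONTROLS if pcre_8_34_or_later else _OLD_PERL_CONTROLS
    return is_z(ch) or ch in controls
```

VT (hex 0B) is the only character on which the old and new Xsp definitions disagree. From 8.34 onwards, Xsp and Xps give the same answer for every character.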
<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br> |
<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br> |
<P> |
<P> |
Line 161 Armenian,
|
Line 171 Armenian,
|
Avestan, |
Avestan, |
Balinese, |
Balinese, |
Bamum, |
Bamum, |
|
Batak, |
Bengali, |
Bengali, |
Bopomofo, |
Bopomofo, |
|
Brahmi, |
Braille, |
Braille, |
Buginese, |
Buginese, |
Buhid, |
Buhid, |
Canadian_Aboriginal, |
Canadian_Aboriginal, |
Carian, |
Carian, |
|
Chakma, |
Cham, |
Cham, |
Cherokee, |
Cherokee, |
Common, |
Common, |
Line 210 Lisu,
|
Line 223 Lisu,
|
Lycian, |
Lycian, |
Lydian, |
Lydian, |
Malayalam, |
Malayalam, |
|
Mandaic, |
Meetei_Mayek, |
Meetei_Mayek, |
|
Meroitic_Cursive, |
|
Meroitic_Hieroglyphs, |
|
Miao, |
Mongolian, |
Mongolian, |
Myanmar, |
Myanmar, |
New_Tai_Lue, |
New_Tai_Lue, |
Line 229 Rejang,
|
Line 246 Rejang,
|
Runic, |
Runic, |
Samaritan, |
Samaritan, |
Saurashtra, |
Saurashtra, |
|
Sharada, |
Shavian, |
Shavian, |
Sinhala, |
Sinhala, |
|
Sora_Sompeng, |
Sundanese, |
Sundanese, |
Syloti_Nagri, |
Syloti_Nagri, |
Syriac, |
Syriac, |
Line 239 Tagbanwa,
|
Line 258 Tagbanwa,
|
Tai_Le, |
Tai_Le, |
Tai_Tham, |
Tai_Tham, |
Tai_Viet, |
Tai_Viet, |
|
Takri, |
Tamil, |
Tamil, |
Telugu, |
Telugu, |
Thaana, |
Thaana, |
Line 268 Yi.
|
Line 288 Yi.
|
lower lower case letter |
lower lower case letter |
print printing, including space |
print printing, including space |
punct printing, excluding alphanumeric |
punct printing, excluding alphanumeric |
space whitespace | space white space |
upper upper case letter |
upper upper case letter |
word same as \w |
word same as \w |
xdigit hexadecimal digit |
xdigit hexadecimal digit |
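An ASCII-only sketch of the POSIX classes listed above follows. It assumes PCRE's default (C locale) character tables and PCRE_UCP not set. Under those tables, [:punct:] follows C's `ispunct`, so it also excludes the space character. `POSIX_CLASSES` and `posix_class` are hypothetical names. The `negated` flag models the [:^name:] form.

```python
import string

# ASCII-only sketch of POSIX classes, assuming default (C locale) tables
# and PCRE_UCP not set.
POSIX_CLASSES = {
    "lower": frozenset(string.ascii_lowercase),
    "print": frozenset(map(chr, range(0x20, 0x7F))),  # printing, including space
    "punct": frozenset(string.punctuation),            # printing, not alphanumeric or space
    "space": frozenset(string.whitespace),             # SP, HT, LF, VT, FF, CR
    "upper": frozenset(string.ascii_uppercase),
    "word": frozenset(string.ascii_letters + string.digits + "_"),  # same as \w
    "xdigit": frozenset(string.hexdigits),
}


def posix_class(name, ch, negated=False):
    """Return whether ch matches [:name:], or [:^name:] when negated is True."""
    return (ch in POSIX_CLASSES[name]) != negated
```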
Line 365 but some of them use Unicode properties if PCRE_UCP is
|
Line 385 but some of them use Unicode properties if PCRE_UCP is
|
The following are recognized only at the start of a pattern or after one of the |
The following are recognized only at the start of a pattern or after one of the |
newline-setting options with similar syntax: |
newline-setting options with similar syntax: |
<pre> |
<pre> |
|
(*LIMIT_MATCH=d) set the match limit to d (decimal number) |
|
(*LIMIT_RECURSION=d) set the recursion limit to d (decimal number) |
(*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) |
(*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) |
(*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8) |
(*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8) |
(*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16) |
(*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16) |
|
(*UTF32) set UTF-32 mode: 32-bit library (PCRE_UTF32) |
|
(*UTF) set appropriate UTF mode for the library in use |
(*UCP) set PCRE_UCP (use Unicode properties for \d etc) |
(*UCP) set PCRE_UCP (use Unicode properties for \d etc) |
</PRE> | </pre> |
| Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the |
| limits set by the caller of pcre_exec(), not increase them. |
</P> |
</P> |
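The rule that (*LIMIT_MATCH=d) and (*LIMIT_RECURSION=d) can only reduce the caller's limits can be sketched as taking a minimum. `effective_limits` is a hypothetical helper, not a PCRE function. It only looks at consecutive LIMIT items at the very start of the pattern and ignores other leading items such as (*UTF8). If the same item appears more than once, this sketch keeps the smallest value.

```python
import re

# Hypothetical helper: apply leading (*LIMIT_MATCH=d) / (*LIMIT_RECURSION=d)
# items to the caller's limits. A pattern setting can lower a limit, never raise it.
_LIMIT_ITEM = re.compile(r"\(\*LIMIT_(MATCH|RECURSION)=([0-9]+)\)")


def effective_limits(pattern, caller_match, caller_recursion):
    """Return (match_limit, recursion_limit, rest_of_pattern)."""
    limits = {"MATCH": caller_match, "RECURSION": caller_recursion}
    pos = 0
    while (m := _LIMIT_ITEM.match(pattern, pos)) is not None:
        kind, value = m.group(1), int(m.group(2))
        limits[kind] = min(limits[kind], value)  # only ever reduces the limit
        pos = m.end()
    return limits["MATCH"], limits["RECURSION"], pattern[pos:]
```

With a caller match limit of 500, the item (*LIMIT_MATCH=1000) has no effect, because it cannot raise the limit above the caller's value.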
<br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br> |
<br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br> |
<P> |
<P> |
Line 459 pattern is not anchored.
|
Line 485 pattern is not anchored.
|
<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br> |
<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br> |
<P> |
<P> |
These are recognized only at the very start of the pattern or after a |
These are recognized only at the very start of the pattern or after a |
(*BSR_...), (*UTF8), (*UTF16) or (*UCP) option. | (*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option. |
<pre> |
<pre> |
(*CR) carriage return only |
(*CR) carriage return only |
(*LF) linefeed only |
(*LF) linefeed only |
Line 500 Cambridge CB2 3QH, England.
|
Line 526 Cambridge CB2 3QH, England.
|
</P> |
</P> |
<br><a name="SEC27" href="#TOC1">REVISION</a><br> |
<br><a name="SEC27" href="#TOC1">REVISION</a><br> |
<P> |
<P> |
Last updated: 10 January 2012 | Last updated: 12 November 2013 |
<br> |
<br> |
Copyright © 1997-2012 University of Cambridge. | Copyright © 1997-2013 University of Cambridge. |
<br> |
<br> |
<p> |
<p> |
Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |