|
version 1.1.1.3, 2012/10/09 09:19:18
|
version 1.1.1.5, 2014/06/15 19:46:05
|
|
Line 65 documentation. This document contains a quick-referenc
|
Line 65 documentation. This document contains a quick-referenc
|
| \n newline (hex 0A) |
\n newline (hex 0A) |
| \r carriage return (hex 0D) |
\r carriage return (hex 0D) |
| \t tab (hex 09) |
\t tab (hex 09) |
| |
\0dd character with octal code 0dd |
| \ddd character with octal code ddd, or backreference |
\ddd character with octal code ddd, or backreference |
| |
\o{ddd..} character with octal code ddd.. |
| \xhh character with hex code hh |
\xhh character with hex code hh |
| \x{hhh..} character with hex code hhh.. |
\x{hhh..} character with hex code hhh.. |
| </PRE> | </pre> |
| | Note that \0dd is always an octal code, and that \8 and \9 are the literal |
| | characters "8" and "9". |
| </P> |
</P> |
| <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br> |
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br> |
| <P> |
<P> |
|
Line 90 documentation. This document contains a quick-referenc
|
Line 94 documentation. This document contains a quick-referenc
|
| \V a character that is not a vertical white space character |
\V a character that is not a vertical white space character |
| \w a "word" character |
\w a "word" character |
| \W a "non-word" character |
\W a "non-word" character |
| \X an extended Unicode sequence | \X a Unicode extended grapheme cluster |
| </pre> |
</pre> |
| In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII | By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode |
| characters, even in a UTF mode. However, this can be changed by setting the | or in the 16-bit and 32-bit libraries. However, if locale-specific matching is |
| PCRE_UCP option. | happening, \s and \w may also match characters with code points in the range |
| | 128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences |
| | is changed to use Unicode properties and they match many more characters. |
| </P> |
</P> |
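As a rough illustration of the ASCII-versus-Unicode distinction described above, the sketch below uses Python's built-in `re` module, which is not PCRE. Its `re.ASCII` flag serves only as an analogue of PCRE's default behaviour, and its default Unicode matching for `str` patterns as an analogue of PCRE_UCP. The exact character sets may differ from PCRE's, and the helper name `matches` is hypothetical.

```python
import re

ARABIC_INDIC_THREE = "\u0663"  # a Unicode decimal digit outside ASCII

# ASCII-only matching: analogous to PCRE's default for \d, \s, \w.
ascii_digit = re.compile(r"\d", re.ASCII)

# Unicode-aware matching: analogous to setting PCRE_UCP (or (*UCP)).
unicode_digit = re.compile(r"\d")


def matches(pattern, text):
    """Return True if the whole of text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None
```

With these definitions, `matches(ascii_digit, ARABIC_INDIC_THREE)` is false while `matches(unicode_digit, ARABIC_INDIC_THREE)` is true, mirroring the effect the text attributes to PCRE_UCP.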
| <br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br> |
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br> |
| <P> |
<P> |
|
Line 150 PCRE_UCP option.
|
Line 156 PCRE_UCP option.
|
| <pre> |
<pre> |
| Xan Alphanumeric: union of properties L and N |
Xan Alphanumeric: union of properties L and N |
| Xps POSIX space: property Z or tab, NL, VT, FF, CR |
Xps POSIX space: property Z or tab, NL, VT, FF, CR |
| Xsp Perl space: property Z or tab, NL, FF, CR | Xsp Perl space: property Z or tab, NL, VT, FF, CR |
| | Xuc Universally-named character: one that can be |
| | represented by a Universal Character Name |
| Xwd Perl word: property Xan or underscore |
Xwd Perl word: property Xan or underscore |
| </PRE> | </pre> |
| | Perl and POSIX space are now the same. Perl added VT to its space character set |
| | at release 5.18 and PCRE changed at release 8.34. |
| </P> |
</P> |
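The definitions of Xan, Xps, Xsp and Xwd in the table above can be approximated with Python's `unicodedata` module. This is a sketch only: the function names are hypothetical, each function assumes a single-character string, and Python's Unicode database may be a different version from the one PCRE was built with.

```python
import unicodedata


def general_category(ch):
    """First letter of the Unicode general category, e.g. 'L', 'N', 'Z'."""
    return unicodedata.category(ch)[0]


def is_xan(ch):
    # Xan: union of properties L (letter) and N (number).
    return general_category(ch) in ("L", "N")


def is_xwd(ch):
    # Xwd: property Xan or underscore.
    return is_xan(ch) or ch == "_"


def is_xps(ch):
    # Xps: property Z or tab, NL, VT, FF, CR.
    return general_category(ch) == "Z" or ch in "\t\n\v\f\r"


def is_xsp(ch, include_vt=True):
    # Xsp: identical to Xps from PCRE 8.34 onwards; pass include_vt=False
    # to model the older Perl space set, which excluded VT.
    if ch == "\v" and not include_vt:
        return False
    return is_xps(ch)
```

The `include_vt` switch exists only to show the single difference between the two columns of the diff: vertical tab joins the Perl space set in the newer version.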
| <br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br> |
<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br> |
| <P> |
<P> |
|
Line 375 but some of them use Unicode properties if PCRE_UCP is
|
Line 385 but some of them use Unicode properties if PCRE_UCP is
|
| The following are recognized only at the start of a pattern or after one of the |
The following are recognized only at the start of a pattern or after one of the |
| newline-setting options with similar syntax: |
newline-setting options with similar syntax: |
| <pre> |
<pre> |
| |
(*LIMIT_MATCH=d) set the match limit to d (decimal number) |
| |
(*LIMIT_RECURSION=d) set the recursion limit to d (decimal number) |
| (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) |
(*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) |
| (*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8) |
(*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8) |
| (*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16) |
(*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16) |
| |
(*UTF32) set UTF-32 mode: 32-bit library (PCRE_UTF32) |
| |
(*UTF) set appropriate UTF mode for the library in use |
| (*UCP) set PCRE_UCP (use Unicode properties for \d etc) |
(*UCP) set PCRE_UCP (use Unicode properties for \d etc) |
| </PRE> | </pre> |
| | Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the |
| | limits set by the caller of pcre_exec(), not increase them. |
| </P> |
</P> |
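The rule that (*LIMIT_MATCH=d) and (*LIMIT_RECURSION=d) can lower but never raise the caller's limit amounts to taking a minimum. The sketch below models only that rule; `effective_limit` and its parameters are hypothetical names, not part of the PCRE API, and it does not reflect PCRE's internal code.

```python
def effective_limit(caller_limit, pattern_limit=None):
    """Combine a caller-supplied limit with an optional in-pattern limit.

    caller_limit:  the limit set by the caller of pcre_exec() (assumed int).
    pattern_limit: the value d from (*LIMIT_MATCH=d) or
                   (*LIMIT_RECURSION=d), or None if the pattern sets none.
    """
    if pattern_limit is None:
        return caller_limit
    # The pattern may only reduce the limit, never increase it.
    return min(caller_limit, pattern_limit)
```

For example, a caller limit of 1000 combined with (*LIMIT_MATCH=100) gives 100, whereas (*LIMIT_MATCH=5000) leaves the limit at 1000.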
| <br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br> |
<br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br> |
| <P> |
<P> |
|
Line 469 pattern is not anchored.
|
Line 485 pattern is not anchored.
|
| <br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br> |
<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br> |
| <P> |
<P> |
| These are recognized only at the very start of the pattern or after a |
These are recognized only at the very start of the pattern or after a |
| (*BSR_...), (*UTF8), (*UTF16) or (*UCP) option. | (*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option. |
| <pre> |
<pre> |
| (*CR) carriage return only |
(*CR) carriage return only |
| (*LF) linefeed only |
(*LF) linefeed only |
|
Line 510 Cambridge CB2 3QH, England.
|
Line 526 Cambridge CB2 3QH, England.
|
| </P> |
</P> |
| <br><a name="SEC27" href="#TOC1">REVISION</a><br> |
<br><a name="SEC27" href="#TOC1">REVISION</a><br> |
| <P> |
<P> |
| Last updated: 10 January 2012 | Last updated: 12 November 2013 |
| <br> |
<br> |
| Copyright © 1997-2012 University of Cambridge. | Copyright © 1997-2013 University of Cambridge. |
| <br> |
<br> |
| <p> |
<p> |
| Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |