--- embedaddon/pcre/doc/pcresyntax.3 2012/10/09 09:19:17 1.1.1.3
+++ embedaddon/pcre/doc/pcresyntax.3 2014/06/15 19:46:05 1.1.1.5
@@ -1,4 +1,4 @@
-.TH PCRESYNTAX 3 "10 January 2012" "PCRE 8.30"
+.TH PCRESYNTAX 3 "12 November 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -29,9 +29,14 @@ documentation. This document contains a quick-referenc
   \en         newline (hex 0A)
   \er         carriage return (hex 0D)
   \et         tab (hex 09)
+  \e0dd       character with octal code 0dd
   \eddd       character with octal code ddd, or backreference
+  \eo{ddd..}  character with octal code ddd..
   \exhh       character with hex code hh
   \ex{hhh..}  character with hex code hhh..
+.sp
+Note that \e0dd is always an octal code, and that \e8 and \e9 are the literal
+characters "8" and "9".
 .
 .
 .SH "CHARACTER TYPES"
@@ -54,11 +59,13 @@ documentation. This document contains a quick-referenc
   \eV         a character that is not a vertical white space character
   \ew         a "word" character
   \eW         a "non-word" character
-  \eX         an extended Unicode sequence
+  \eX         a Unicode extended grapheme cluster
 .sp
-In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
-characters, even in a UTF mode. However, this can be changed by setting the
-PCRE_UCP option.
+By default, \ed, \es, and \ew match only ASCII characters, even in UTF-8 mode
+or in the 16- bit and 32-bit libraries. However, if locale-specific matching is
+happening, \es and \ew may also match characters with code points in the range
+128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences
+is changed to use Unicode properties and they match many more characters.
 .
 .
 .SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
@@ -115,8 +122,13 @@ PCRE_UCP option.
 .sp
   Xan        Alphanumeric: union of properties L and N
   Xps        POSIX space: property Z or tab, NL, VT, FF, CR
-  Xsp        Perl space: property Z or tab, NL, FF, CR
+  Xsp        Perl space: property Z or tab, NL, VT, FF, CR
+  Xuc        Univerally-named character: one that can be
+               represented by a Universal Character Name
   Xwd        Perl word: property Xan or underscore
+.sp
+Perl and POSIX space are now the same. Perl added VT to its space character set
+at release 5.18 and PCRE changed at release 8.34.
 .
 .
 .SH "SCRIPT NAMES FOR \ep AND \eP"
@@ -345,10 +357,17 @@ but some of them use Unicode properties if PCRE_UCP is
 The following are recognized only at the start of a pattern or after one of the
 newline-setting options with similar syntax:
 .sp
+  (*LIMIT_MATCH=d) set the match limit to d (decimal number)
+  (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
   (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
   (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
   (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
+  (*UTF32)        set UTF-32 mode: 32-bit library (PCRE_UTF32)
+  (*UTF)          set appropriate UTF mode for the library in use
   (*UCP)          set PCRE_UCP (use Unicode properties for \ed etc)
+.sp
+Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
+limits set by the caller of pcre_exec(), not increase them.
 .
 .
 .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
@@ -442,7 +461,7 @@ pattern is not anchored.
 .rs
 .sp
 These are recognized only at the very start of the pattern or after a
-(*BSR_...), (*UTF8), (*UTF16) or (*UCP) option.
+(*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
 .sp
   (*CR)           carriage return only
   (*LF)           linefeed only
@@ -489,6 +508,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 10 January 2012
-Copyright (c) 1997-2012 University of Cambridge.
+Last updated: 12 November 2013
+Copyright (c) 1997-2013 University of Cambridge.
 .fi