version 1.1.1.1, 2012/02/21 23:05:52
|
version 1.1.1.2, 2012/02/21 23:50:25
|
Line 46 man page, in case the conversion went wrong.
|
Line 46 man page, in case the conversion went wrong.
|
The full syntax and semantics of the regular expressions that are supported by |
The full syntax and semantics of the regular expressions that are supported by |
PCRE are described in the |
PCRE are described in the |
<a href="pcrepattern.html"><b>pcrepattern</b></a> |
<a href="pcrepattern.html"><b>pcrepattern</b></a> |
documentation. This document contains just a quick-reference summary of the | documentation. This document contains a quick-reference summary of the syntax. |
syntax. | |
</P> |
</P> |
<br><a name="SEC2" href="#TOC1">QUOTING</a><br> |
<br><a name="SEC2" href="#TOC1">QUOTING</a><br> |
<P> |
<P> |
Line 76 syntax.
|
Line 75 syntax.
|
<pre> |
<pre> |
. any character except newline; |
. any character except newline; |
in dotall mode, any character whatsoever |
in dotall mode, any character whatsoever |
\C one byte, even in UTF-8 mode (best avoided) | \C one data unit, even in UTF mode (best avoided) |
\d a decimal digit |
\d a decimal digit |
\D a character that is not a decimal digit |
\D a character that is not a decimal digit |
\h a horizontal whitespace character |
\h a horizontal whitespace character |
Line 94 syntax.
|
Line 93 syntax.
|
\X an extended Unicode sequence |
\X an extended Unicode sequence |
</pre> |
</pre> |
In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII |
In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII |
characters, even in UTF-8 mode. However, this can be changed by setting the | characters, even in a UTF mode. However, this can be changed by setting the |
PCRE_UCP option. |
PCRE_UCP option. |
</P> |
</P> |
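The ASCII-only default for \d described above can be illustrated by analogy. The sketch below uses Python's standard `re` module, which is not PCRE and has no PCRE_UCP option. It assumes only that Python's `re.ASCII` flag is a reasonable stand-in for PCRE's default behaviour, and that Python's default Unicode-aware matching for `str` patterns is a stand-in for PCRE with PCRE_UCP set. The helper name `is_single_digit` is hypothetical.

```python
import re

# ARABIC-INDIC DIGIT THREE (U+0663): a decimal digit outside the ASCII range.
ARABIC_INDIC_THREE = "\u0663"

# Analogy only: re.ASCII restricts \d to [0-9], similar to PCRE's default.
ascii_digit = re.compile(r"\d", re.ASCII)

# Analogy only: without re.ASCII, Python's \d uses Unicode properties,
# similar in spirit to PCRE with PCRE_UCP set.
unicode_digit = re.compile(r"\d")


def is_single_digit(pattern, text):
    """Return True if the whole of text is matched by the compiled pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for label, pattern in (("ASCII-only", ascii_digit), ("Unicode", unicode_digit)):
        print(label, is_single_digit(pattern, "3"),
              is_single_digit(pattern, ARABIC_INDIC_THREE))
```

Note that the defaults are reversed between the two libraries: PCRE needs PCRE_UCP (or (*UCP) at the start of the pattern) to opt in to Unicode properties, whereas Python needs `re.ASCII` to opt out.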
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br> |
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br> |
Line 367 The following are recognized only at the start of a pa
|
Line 366 The following are recognized only at the start of a pa
|
newline-setting options with similar syntax: |
newline-setting options with similar syntax: |
<pre> |
<pre> |
(*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) |
(*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) |
(*UTF8) set UTF-8 mode (PCRE_UTF8) | (*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8) |
| (*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16) |
(*UCP) set PCRE_UCP (use Unicode properties for \d etc) |
(*UCP) set PCRE_UCP (use Unicode properties for \d etc) |
</PRE> |
</PRE> |
</P> |
</P> |
Line 439 The following act immediately they are reached:
|
Line 439 The following act immediately they are reached:
|
<pre> |
<pre> |
(*ACCEPT) force successful match |
(*ACCEPT) force successful match |
(*FAIL) force backtrack; synonym (*F) |
(*FAIL) force backtrack; synonym (*F) |
|
(*MARK:NAME) set name to be passed back; synonym (*:NAME) |
</pre> |
</pre> |
The following act only when a subsequent match failure causes a backtrack to |
The following act only when a subsequent match failure causes a backtrack to |
reach them. They all force a match failure, but they differ in what happens |
reach them. They all force a match failure, but they differ in what happens |
Line 447 pattern is not anchored.
|
Line 448 pattern is not anchored.
|
<pre> |
<pre> |
(*COMMIT) overall failure, no advance of starting point |
(*COMMIT) overall failure, no advance of starting point |
(*PRUNE) advance to next starting character |
(*PRUNE) advance to next starting character |
(*SKIP) advance start to current matching position | (*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE) |
| (*SKIP) advance to current matching position |
| (*SKIP:NAME) advance to position corresponding to an earlier |
| (*MARK:NAME); if not found, the (*SKIP) is ignored |
(*THEN) local failure, backtrack to next alternation |
(*THEN) local failure, backtrack to next alternation |
|
(*THEN:NAME) equivalent to (*MARK:NAME)(*THEN) |
</PRE> |
</PRE> |
</P> |
</P> |
<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br> |
<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br> |
<P> |
<P> |
These are recognized only at the very start of the pattern or after a |
These are recognized only at the very start of the pattern or after a |
(*BSR_...) or (*UTF8) or (*UCP) option. | (*BSR_...), (*UTF8), (*UTF16) or (*UCP) option. |
<pre> |
<pre> |
(*CR) carriage return only |
(*CR) carriage return only |
(*LF) linefeed only |
(*LF) linefeed only |
Line 466 These are recognized only at the very start of the pat
|
Line 471 These are recognized only at the very start of the pat
|
<br><a name="SEC23" href="#TOC1">WHAT \R MATCHES</a><br> |
<br><a name="SEC23" href="#TOC1">WHAT \R MATCHES</a><br> |
<P> |
<P> |
These are recognized only at the very start of the pattern or after a |
These are recognized only at the very start of the pattern or after a |
(*...) option that sets the newline convention or UTF-8 or UCP mode. | (*...) option that sets the newline convention or a UTF or UCP mode. |
<pre> |
<pre> |
(*BSR_ANYCRLF) CR, LF, or CRLF |
(*BSR_ANYCRLF) CR, LF, or CRLF |
(*BSR_UNICODE) any Unicode newline sequence |
(*BSR_UNICODE) any Unicode newline sequence |
Line 495 Cambridge CB2 3QH, England.
|
Line 500 Cambridge CB2 3QH, England.
|
</P> |
</P> |
<br><a name="SEC27" href="#TOC1">REVISION</a><br> |
<br><a name="SEC27" href="#TOC1">REVISION</a><br> |
<P> |
<P> |
Last updated: 21 November 2010 | Last updated: 10 January 2012 |
<br> |
<br> |
Copyright © 1997-2010 University of Cambridge. | Copyright © 1997-2012 University of Cambridge. |
<br> |
<br> |
<p> |
<p> |
Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |