|
version 1.1.1.4, 2013/07/22 08:25:57
|
version 1.1.1.5, 2014/06/15 19:46:05
|
|
Line 23 man page, in case the conversion went wrong.
|
Line 23 man page, in case the conversion went wrong.
|
| <li><a name="TOC8" href="#SEC8">MATCHING A SINGLE DATA UNIT</a> |
<li><a name="TOC8" href="#SEC8">MATCHING A SINGLE DATA UNIT</a> |
| <li><a name="TOC9" href="#SEC9">SQUARE BRACKETS AND CHARACTER CLASSES</a> |
<li><a name="TOC9" href="#SEC9">SQUARE BRACKETS AND CHARACTER CLASSES</a> |
| <li><a name="TOC10" href="#SEC10">POSIX CHARACTER CLASSES</a> |
<li><a name="TOC10" href="#SEC10">POSIX CHARACTER CLASSES</a> |
| <li><a name="TOC11" href="#SEC11">VERTICAL BAR</a> | <li><a name="TOC11" href="#SEC11">COMPATIBILITY FEATURE FOR WORD BOUNDARIES</a> |
| <li><a name="TOC12" href="#SEC12">INTERNAL OPTION SETTING</a> | <li><a name="TOC12" href="#SEC12">VERTICAL BAR</a> |
| <li><a name="TOC13" href="#SEC13">SUBPATTERNS</a> | <li><a name="TOC13" href="#SEC13">INTERNAL OPTION SETTING</a> |
| <li><a name="TOC14" href="#SEC14">DUPLICATE SUBPATTERN NUMBERS</a> | <li><a name="TOC14" href="#SEC14">SUBPATTERNS</a> |
| <li><a name="TOC15" href="#SEC15">NAMED SUBPATTERNS</a> | <li><a name="TOC15" href="#SEC15">DUPLICATE SUBPATTERN NUMBERS</a> |
| <li><a name="TOC16" href="#SEC16">REPETITION</a> | <li><a name="TOC16" href="#SEC16">NAMED SUBPATTERNS</a> |
| <li><a name="TOC17" href="#SEC17">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a> | <li><a name="TOC17" href="#SEC17">REPETITION</a> |
| <li><a name="TOC18" href="#SEC18">BACK REFERENCES</a> | <li><a name="TOC18" href="#SEC18">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a> |
| <li><a name="TOC19" href="#SEC19">ASSERTIONS</a> | <li><a name="TOC19" href="#SEC19">BACK REFERENCES</a> |
| <li><a name="TOC20" href="#SEC20">CONDITIONAL SUBPATTERNS</a> | <li><a name="TOC20" href="#SEC20">ASSERTIONS</a> |
| <li><a name="TOC21" href="#SEC21">COMMENTS</a> | <li><a name="TOC21" href="#SEC21">CONDITIONAL SUBPATTERNS</a> |
| <li><a name="TOC22" href="#SEC22">RECURSIVE PATTERNS</a> | <li><a name="TOC22" href="#SEC22">COMMENTS</a> |
| <li><a name="TOC23" href="#SEC23">SUBPATTERNS AS SUBROUTINES</a> | <li><a name="TOC23" href="#SEC23">RECURSIVE PATTERNS</a> |
| <li><a name="TOC24" href="#SEC24">ONIGURUMA SUBROUTINE SYNTAX</a> | <li><a name="TOC24" href="#SEC24">SUBPATTERNS AS SUBROUTINES</a> |
| <li><a name="TOC25" href="#SEC25">CALLOUTS</a> | <li><a name="TOC25" href="#SEC25">ONIGURUMA SUBROUTINE SYNTAX</a> |
| <li><a name="TOC26" href="#SEC26">BACKTRACKING CONTROL</a> | <li><a name="TOC26" href="#SEC26">CALLOUTS</a> |
| <li><a name="TOC27" href="#SEC27">SEE ALSO</a> | <li><a name="TOC27" href="#SEC27">BACKTRACKING CONTROL</a> |
| <li><a name="TOC28" href="#SEC28">AUTHOR</a> | <li><a name="TOC28" href="#SEC28">SEE ALSO</a> |
| <li><a name="TOC29" href="#SEC29">REVISION</a> | <li><a name="TOC29" href="#SEC29">AUTHOR</a> |
| | <li><a name="TOC30" href="#SEC30">REVISION</a> |
| </ul> |
</ul> |
| <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br> |
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br> |
| <P> |
<P> |
|
Line 116 appearance causes an error.
|
Line 117 appearance causes an error.
|
| Unicode property support |
Unicode property support |
| </b><br> |
</b><br> |
| <P> |
<P> |
| Another special sequence that may appear at the start of a pattern is | Another special sequence that may appear at the start of a pattern is (*UCP). |
| <pre> | |
| (*UCP) | |
| </pre> | |
| This has the same effect as setting the PCRE_UCP option: it causes sequences |
This has the same effect as setting the PCRE_UCP option: it causes sequences |
| such as \d and \w to use Unicode properties to determine character types, |
such as \d and \w to use Unicode properties to determine character types, |
| instead of recognizing only characters with codes less than 128 via a lookup |
instead of recognizing only characters with codes less than 128 via a lookup |
| table. |
table. |
| </P> |
</P> |
| <br><b> |
<br><b> |
| |
Disabling auto-possessification |
| |
</b><br> |
| |
<P> |
| |
If a pattern starts with (*NO_AUTO_POSSESS), it has the same effect as setting |
| |
the PCRE_NO_AUTO_POSSESS option at compile time. This stops PCRE from making |
| |
quantifiers possessive when what follows cannot match the repeated item. For |
| |
example, by default a+b is treated as a++b. For more details, see the |
| |
<a href="pcreapi.html"><b>pcreapi</b></a> |
| |
documentation. |
| |
</P> |
| |
<br><b> |
| Disabling start-up optimizations |
Disabling start-up optimizations |
| </b><br> |
</b><br> |
| <P> |
<P> |
| If a pattern starts with (*NO_START_OPT), it has the same effect as setting the |
If a pattern starts with (*NO_START_OPT), it has the same effect as setting the |
| PCRE_NO_START_OPTIMIZE option either at compile or matching time. | PCRE_NO_START_OPTIMIZE option either at compile or matching time. This disables |
| | several optimizations for quickly reaching "no match" results. For more |
| | details, see the |
| | <a href="pcreapi.html"><b>pcreapi</b></a> |
| | documentation. |
| <a name="newlines"></a></P> |
<a name="newlines"></a></P> |
| <br><b> |
<br><b> |
| Newline conventions |
Newline conventions |
|
Line 193 pattern of the form
|
Line 206 pattern of the form
|
| (*LIMIT_RECURSION=d) |
(*LIMIT_RECURSION=d) |
| </pre> |
</pre> |
| where d is any number of decimal digits. However, the value of the setting must |
where d is any number of decimal digits. However, the value of the setting must |
| be less than the value set by the caller of <b>pcre_exec()</b> for it to have | be less than the value set (or defaulted) by the caller of <b>pcre_exec()</b> |
| any effect. In other words, the pattern writer can lower the limit set by the | for it to have any effect. In other words, the pattern writer can lower the |
| programmer, but not raise it. If there is more than one setting of one of these | limits set by the programmer, but not raise them. If there is more than one |
| limits, the lower value is used. | setting of one of these limits, the lower value is used. |
| </P> |
</P> |
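The rule for combining limits can be sketched in a few lines. This is only an illustration of the arithmetic described above, not PCRE's code; the function name effective_limit and its parameters are hypothetical.

```python
def effective_limit(caller_limit, pattern_settings):
    """Return the limit that applies once pattern-level settings are considered.

    caller_limit: the value set (or defaulted) by the caller of pcre_exec().
    pattern_settings: the values d taken from limit items such as
        (*LIMIT_RECURSION=d) at the start of the pattern, in order.
    Both names are hypothetical and exist only for this sketch.
    """
    limit = caller_limit
    for value in pattern_settings:
        # A pattern setting only has an effect if it lowers the limit.
        if value < limit:
            limit = value
    return limit


if __name__ == "__main__":
    print(effective_limit(1000, [500, 2000]))
    print(effective_limit(1000, [5000]))
```

A setting larger than the caller's value is ignored, and when several settings are present the lowest one applies.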
| <br><a name="SEC3" href="#TOC1">EBCDIC CHARACTER CODES</a><br> |
<br><a name="SEC3" href="#TOC1">EBCDIC CHARACTER CODES</a><br> |
| <P> |
<P> |
|
Line 283 backslash. All other characters (in particular, those
|
Line 296 backslash. All other characters (in particular, those
|
| greater than 127) are treated as literals. |
greater than 127) are treated as literals. |
| </P> |
</P> |
| <P> |
<P> |
| If a pattern is compiled with the PCRE_EXTENDED option, white space in the | If a pattern is compiled with the PCRE_EXTENDED option, most white space in the |
| pattern (other than in a character class) and characters between a # outside | pattern (other than in a character class), and characters between a # outside a |
| a character class and the next newline are ignored. An escaping backslash can | character class and the next newline, inclusive, are ignored. An escaping |
| be used to include a white space or # character as part of the pattern. | backslash can be used to include a white space or # character as part of the |
| | pattern. |
| </P> |
</P> |
| <P> |
<P> |
| If you want to remove the special meaning from a sequence of characters, you |
If you want to remove the special meaning from a sequence of characters, you |
|
Line 324 one of the following escape sequences than the binary
|
Line 338 one of the following escape sequences than the binary
|
| \n linefeed (hex 0A) |
\n linefeed (hex 0A) |
| \r carriage return (hex 0D) |
\r carriage return (hex 0D) |
| \t tab (hex 09) |
\t tab (hex 09) |
| |
\0dd character with octal code 0dd |
| \ddd character with octal code ddd, or back reference |
\ddd character with octal code ddd, or back reference |
| |
\o{ddd..} character with octal code ddd.. |
| \xhh character with hex code hh |
\xhh character with hex code hh |
| \x{hhh..} character with hex code hhh.. (non-JavaScript mode) |
\x{hhh..} character with hex code hhh.. (non-JavaScript mode) |
| \uhhhh character with hex code hhhh (JavaScript mode only) |
\uhhhh character with hex code hhhh (JavaScript mode only) |
|
Line 347 the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z
|
Line 363 the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z
|
| characters also generate different values. |
characters also generate different values. |
| </P> |
</P> |
| <P> |
<P> |
| By default, after \x, from zero to two hexadecimal digits are read (letters |
|
| can be in upper or lower case). Any number of hexadecimal digits may appear |
|
| between \x{ and }, but the character code is constrained as follows: |
|
| <pre> |
|
| 8-bit non-UTF mode less than 0x100 |
|
| 8-bit UTF-8 mode less than 0x10ffff and a valid codepoint |
|
| 16-bit non-UTF mode less than 0x10000 |
|
| 16-bit UTF-16 mode less than 0x10ffff and a valid codepoint |
|
| 32-bit non-UTF mode less than 0x80000000 |
|
| 32-bit UTF-32 mode less than 0x10ffff and a valid codepoint |
|
| </pre> |
|
| Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called |
|
| "surrogate" codepoints), and 0xffef. |
|
| </P> |
|
| <P> |
|
| If characters other than hexadecimal digits appear between \x{ and }, or if |
|
| there is no terminating }, this form of escape is not recognized. Instead, the |
|
| initial \x will be interpreted as a basic hexadecimal escape, with no |
|
| following digits, giving a character whose value is zero. |
|
| </P> |
|
| <P> |
|
| If the PCRE_JAVASCRIPT_COMPAT option is set, the interpretation of \x is |
|
| as just described only when it is followed by two hexadecimal digits. |
|
| Otherwise, it matches a literal "x" character. In JavaScript mode, support for |
|
| code points greater than 255 is provided by \u, which must be followed by |
|
| four hexadecimal digits; otherwise it matches a literal "u" character. |
|
| Character codes specified by \u in JavaScript mode are constrained in the same |
|
| way as those specified by \x in non-JavaScript mode. |
|
| </P> |
|
| <P> |
|
| Characters whose value is less than 256 can be defined by either of the two |
|
| syntaxes for \x (or by \u in JavaScript mode). There is no difference in the |
|
| way they are handled. For example, \xdc is exactly the same as \x{dc} (or |
|
| \u00dc in JavaScript mode). |
|
| </P> |
|
| <P> |
|
| After \0 up to two further octal digits are read. If there are fewer than two |
After \0 up to two further octal digits are read. If there are fewer than two |
| digits, just those that are present are used. Thus the sequence \0\x\07 |
digits, just those that are present are used. Thus the sequence \0\x\07 |
| specifies two binary zeros followed by a BEL character (code value 7). Make |
specifies two binary zeros followed by a BEL character (code value 7). Make |
|
Line 390 sure you supply two digits after the initial zero if t
|
Line 370 sure you supply two digits after the initial zero if t
|
| follows is itself an octal digit. |
follows is itself an octal digit. |
| </P> |
</P> |
| <P> |
<P> |
| The handling of a backslash followed by a digit other than 0 is complicated. | The escape \o must be followed by a sequence of octal digits, enclosed in |
| Outside a character class, PCRE reads it and any following digits as a decimal | braces. An error occurs if this is not the case. This escape is a recent |
| number. If the number is less than 10, or if there have been at least that many | addition to Perl; it provides a way of specifying character code points as octal |
| | numbers greater than 0777, and it also allows octal numbers and back references |
| | to be unambiguously specified. |
| | </P> |
| | <P> |
| | For greater clarity and unambiguity, it is best to avoid following \ by a |
| | digit greater than zero. Instead, use \o{} or \x{} to specify character |
| | numbers, and \g{} to specify back references. The following paragraphs |
| | describe the old, ambiguous syntax. |
| | </P> |
| | <P> |
| | The handling of a backslash followed by a digit other than 0 is complicated, |
| | and Perl has changed in recent releases, causing PCRE also to change. Outside a |
| | character class, PCRE reads the digit and any following digits as a decimal |
| | number. If the number is less than 8, or if there have been at least that many |
| previous capturing left parentheses in the expression, the entire sequence is |
previous capturing left parentheses in the expression, the entire sequence is |
| taken as a <i>back reference</i>. A description of how this works is given |
taken as a <i>back reference</i>. A description of how this works is given |
| <a href="#backreferences">later,</a> |
<a href="#backreferences">later,</a> |
|
Line 400 following the discussion of
|
Line 394 following the discussion of
|
| <a href="#subpattern">parenthesized subpatterns.</a> |
<a href="#subpattern">parenthesized subpatterns.</a> |
| </P> |
</P> |
| <P> |
<P> |
| Inside a character class, or if the decimal number is greater than 9 and there | Inside a character class, or if the decimal number following \ is greater than |
| have not been that many capturing subpatterns, PCRE re-reads up to three octal | 7 and there have not been that many capturing subpatterns, PCRE handles \8 and |
| digits following the backslash, and uses them to generate a data character. Any | \9 as the literal characters "8" and "9", and otherwise re-reads up to three |
| subsequent digits stand for themselves. The value of the character is | octal digits following the backslash, using them to generate a data character. |
| constrained in the same way as characters specified in hexadecimal. | Any subsequent digits stand for themselves. For example: |
| For example: | |
| <pre> |
<pre> |
| \040 is another way of writing an ASCII space |
\040 is another way of writing an ASCII space |
| \40 is the same, provided there are fewer than 40 previous capturing subpatterns |
\40 is the same, provided there are fewer than 40 previous capturing subpatterns |
|
Line 415 For example:
|
Line 408 For example:
|
| \0113 is a tab followed by the character "3" |
\0113 is a tab followed by the character "3" |
| \113 might be a back reference, otherwise the character with octal code 113 |
\113 might be a back reference, otherwise the character with octal code 113 |
| \377 might be a back reference, otherwise the value 255 (decimal) |
\377 might be a back reference, otherwise the value 255 (decimal) |
| \81 is either a back reference, or a binary zero followed by the two characters "8" and "1" | \81 is either a back reference, or the two characters "8" and "1" |
| </pre> |
</pre> |
| Note that octal values of 100 or greater must not be introduced by a leading | Note that octal values of 100 or greater that are specified using this syntax |
| zero, because no more than three octal digits are ever read. | must not be introduced by a leading zero, because no more than three octal |
| | digits are ever read. |
| </P> |
</P> |
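The decision procedure for a backslash followed by digits, as described in the newer text above, can be sketched as follows. This is an illustration of the stated rules only, not PCRE's implementation; the function name, its parameters, and the tuple it returns are all hypothetical, and the mode-dependent limits on character values are not checked.

```python
OCTAL_DIGITS = "01234567"


def interpret_backslash_digits(digits, groups_so_far, in_class=False):
    """Classify the run of decimal digits that follows a backslash.

    digits: the digits after the backslash, for example "40" for \\40.
    groups_so_far: number of previous capturing left parentheses.
    Returns (kind, value, rest): kind is "backref" or "char", value is
    the group number or character code, and rest holds any trailing
    digits that stand for themselves.
    """
    if not digits or not digits.isdigit():
        raise ValueError("expected one or more decimal digits")

    if digits[0] == "0":
        # \0 is followed by up to two further octal digits.
        n = 1
        while n < 3 and n < len(digits) and digits[n] in OCTAL_DIGITS:
            n += 1
        return ("char", int(digits[:n], 8), digits[n:])

    if not in_class:
        # Outside a class, all the digits are first read as a decimal number.
        number = int(digits)
        if number < 8 or number <= groups_so_far:
            return ("backref", number, "")

    if digits[0] in "89":
        # \8 and \9 are the literal characters "8" and "9".
        return ("char", ord(digits[0]), digits[1:])

    # Otherwise re-read up to three octal digits as a data character.
    n = 0
    while n < 3 and n < len(digits) and digits[n] in OCTAL_DIGITS:
        n += 1
    return ("char", int(digits[:n], 8), digits[n:])


if __name__ == "__main__":
    for text in ("040", "40", "0113", "113", "377", "81"):
        print(text, interpret_backslash_digits(text, groups_so_far=0))
```

With no previous capturing groups, this reproduces the examples listed above: \40 yields a space, \0113 yields a tab followed by "3", and \81 yields the two characters "8" and "1".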
| <P> |
<P> |
| |
By default, after \x that is not followed by {, from zero to two hexadecimal |
| |
digits are read (letters can be in upper or lower case). Any number of |
| |
hexadecimal digits may appear between \x{ and }. If a character other than |
| |
a hexadecimal digit appears between \x{ and }, or if there is no terminating |
| |
}, an error occurs. |
| |
</P> |
| |
<P> |
| |
If the PCRE_JAVASCRIPT_COMPAT option is set, the interpretation of \x is |
| |
as just described only when it is followed by two hexadecimal digits. |
| |
Otherwise, it matches a literal "x" character. In JavaScript mode, support for |
| |
code points greater than 255 is provided by \u, which must be followed by |
| |
four hexadecimal digits; otherwise it matches a literal "u" character. |
| |
</P> |
| |
<P> |
| |
Characters whose value is less than 256 can be defined by either of the two |
| |
syntaxes for \x (or by \u in JavaScript mode). There is no difference in the |
| |
way they are handled. For example, \xdc is exactly the same as \x{dc} (or |
| |
\u00dc in JavaScript mode). |
| |
</P> |
| |
<br><b> |
| |
Constraints on character values |
| |
</b><br> |
| |
<P> |
| |
Characters that are specified using octal or hexadecimal numbers are |
| |
limited to certain values, as follows: |
| |
<pre> |
| |
8-bit non-UTF mode less than 0x100 |
| |
8-bit UTF-8 mode less than 0x10ffff and a valid codepoint |
| |
16-bit non-UTF mode less than 0x10000 |
| |
16-bit UTF-16 mode less than 0x10ffff and a valid codepoint |
| |
32-bit non-UTF mode less than 0x100000000 |
| |
32-bit UTF-32 mode less than 0x10ffff and a valid codepoint |
| |
</pre> |
| |
Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called |
| |
"surrogate" codepoints), and 0xffef. |
| |
</P> |
| |
<br><b> |
| |
Escape sequences in character classes |
| |
</b><br> |
| |
<P> |
| All the sequences that define a single character value can be used both inside |
All the sequences that define a single character value can be used both inside |
| and outside character classes. In addition, inside a character class, \b is |
and outside character classes. In addition, inside a character class, \b is |
| interpreted as the backspace character (hex 08). |
interpreted as the backspace character (hex 08). |
|
Line 498 matching point is at the end of the subject string, al
|
Line 532 matching point is at the end of the subject string, al
|
| there is no character to match. |
there is no character to match. |
| </P> |
</P> |
| <P> |
<P> |
| For compatibility with Perl, \s does not match the VT character (code 11). | For compatibility with Perl, \s formerly did not match the VT character (code |
| This makes it different from the POSIX "space" class. The \s characters | 11), which made it different from the POSIX "space" class. However, Perl |
| are HT (9), LF (10), FF (12), CR (13), and space (32). If "use locale;" is | added VT at release 5.18, and PCRE followed suit at release 8.34. The default |
| included in a Perl script, \s may match the VT character. In PCRE, it never | \s characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space |
| does. | (32), which are defined as white space in the "C" locale. This list may vary if |
| | locale-specific matching is taking place. For example, in some locales the |
| | "non-breaking space" character (\xA0) is recognized as white space, and in |
| | others the VT character is not. |
| </P> |
</P> |
| <P> |
<P> |
| A "word" character is an underscore or any character that is a letter or digit. |
A "word" character is an underscore or any character that is a letter or digit. |
|
Line 513 place (see
|
Line 550 place (see
|
| in the |
in the |
| <a href="pcreapi.html"><b>pcreapi</b></a> |
<a href="pcreapi.html"><b>pcreapi</b></a> |
| page). For example, in a French locale such as "fr_FR" in Unix-like systems, |
page). For example, in a French locale such as "fr_FR" in Unix-like systems, |
| or "french" in Windows, some character codes greater than 128 are used for | or "french" in Windows, some character codes greater than 127 are used for |
| accented letters, and these are then matched by \w. The use of locales with |
accented letters, and these are then matched by \w. The use of locales with |
| Unicode is discouraged. |
Unicode is discouraged. |
| </P> |
</P> |
| <P> |
<P> |
| By default, in a UTF mode, characters with values greater than 128 never match | By default, characters whose code points are greater than 127 never match \d, |
| \d, \s, or \w, and always match \D, \S, and \W. These sequences retain | \s, or \w, and always match \D, \S, and \W, although this may vary for |
| their original meanings from before UTF support was available, mainly for | characters in the range 128-255 when locale-specific matching is happening. |
| efficiency reasons. However, if PCRE is compiled with Unicode property support, | These escape sequences retain their original meanings from before Unicode |
| and the PCRE_UCP option is set, the behaviour is changed so that Unicode | support was available, mainly for efficiency reasons. If PCRE is compiled with |
| properties are used to determine character types, as follows: | Unicode property support, and the PCRE_UCP option is set, the behaviour is |
| | changed so that Unicode properties are used to determine character types, as |
| | follows: |
| <pre> |
<pre> |
| \d any character that \p{Nd} matches (decimal digit) | \d any character that matches \p{Nd} (decimal digit) |
| \s any character that \p{Z} matches, plus HT, LF, FF, CR | \s any character that matches \p{Z} or \h or \v |
| \w any character that \p{L} or \p{N} matches, plus underscore | \w any character that matches \p{L} or \p{N}, plus underscore |
| </pre> |
</pre> |
| The upper case escapes match the inverse sets of characters. Note that \d |
The upper case escapes match the inverse sets of characters. Note that \d |
| matches only decimal digits, whereas \w matches any Unicode digit, as well as |
matches only decimal digits, whereas \w matches any Unicode digit, as well as |
|
Line 538 is noticeably slower when PCRE_UCP is set.
|
Line 577 is noticeably slower when PCRE_UCP is set.
|
| <P> |
<P> |
| The sequences \h, \H, \v, and \V are features that were added to Perl at |
The sequences \h, \H, \v, and \V are features that were added to Perl at |
| release 5.10. In contrast to the other sequences, which match only ASCII |
release 5.10. In contrast to the other sequences, which match only ASCII |
| characters by default, these always match certain high-valued codepoints, | characters by default, these always match certain high-valued code points, |
| whether or not PCRE_UCP is set. The horizontal space characters are: |
whether or not PCRE_UCP is set. The horizontal space characters are: |
| <pre> |
<pre> |
| U+0009 Horizontal tab (HT) |
U+0009 Horizontal tab (HT) |
|
Line 913 PCRE's additional properties
|
Line 952 PCRE's additional properties
|
| <P> |
<P> |
| As well as the standard Unicode properties described above, PCRE supports four |
As well as the standard Unicode properties described above, PCRE supports four |
| more that make it possible to convert traditional escape sequences such as \w |
more that make it possible to convert traditional escape sequences such as \w |
| and \s and POSIX character classes to use Unicode properties. PCRE uses these | and \s to use Unicode properties. PCRE uses these non-standard, non-Perl |
| non-standard, non-Perl properties internally when PCRE_UCP is set. However, | properties internally when PCRE_UCP is set. However, they may also be used |
| they may also be used explicitly. These properties are: | explicitly. These properties are: |
| <pre> |
<pre> |
| Xan Any alphanumeric character |
Xan Any alphanumeric character |
| Xps Any POSIX space character |
Xps Any POSIX space character |
|
Line 925 they may also be used explicitly. These properties are
|
Line 964 they may also be used explicitly. These properties are
|
| Xan matches characters that have either the L (letter) or the N (number) |
Xan matches characters that have either the L (letter) or the N (number) |
| property. Xps matches the characters tab, linefeed, vertical tab, form feed, or |
property. Xps matches the characters tab, linefeed, vertical tab, form feed, or |
| carriage return, and any other character that has the Z (separator) property. |
carriage return, and any other character that has the Z (separator) property. |
| Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the | Xsp is the same as Xps; it used to exclude vertical tab, for Perl |
| same characters as Xan, plus underscore. | compatibility, but Perl changed, and so PCRE followed at release 8.34. Xwd |
| | matches the same characters as Xan, plus underscore. |
| </P> |
</P> |
| <P> |
<P> |
| There is another non-standard property, Xuc, which matches any character that |
There is another non-standard property, Xuc, which matches any character that |
|
Line 1218 The minus (hyphen) character can be used to specify a
|
Line 1258 The minus (hyphen) character can be used to specify a
|
| character class. For example, [d-m] matches any letter between d and m, |
character class. For example, [d-m] matches any letter between d and m, |
| inclusive. If a minus character is required in a class, it must be escaped with |
inclusive. If a minus character is required in a class, it must be escaped with |
| a backslash or appear in a position where it cannot be interpreted as |
a backslash or appear in a position where it cannot be interpreted as |
| indicating a range, typically as the first or last character in the class. | indicating a range, typically as the first or last character in the class, or |
| | immediately after a range. For example, [b-d-z] matches letters in the range b |
| | to d, a hyphen character, or z. |
| </P> |
</P> |
| <P> |
<P> |
| It is not possible to have the literal character "]" as the end character of a |
It is not possible to have the literal character "]" as the end character of a |
|
Line 1230 followed by two other characters. The octal or hexadec
|
Line 1272 followed by two other characters. The octal or hexadec
|
| "]" can also be used to end a range. |
"]" can also be used to end a range. |
| </P> |
</P> |
| <P> |
<P> |
| |
An error is generated if a POSIX character class (see below) or an escape |
| |
sequence other than one that defines a single character appears at a point |
| |
where a range ending character is expected. For example, [z-\xff] is valid, |
| |
but [A-\d] and [A-[:digit:]] are not. |
| |
</P> |
| |
<P> |
| Ranges operate in the collating sequence of character values. They can also be |
Ranges operate in the collating sequence of character values. They can also be |
| used for characters specified numerically, for example [\000-\037]. Ranges |
used for characters specified numerically, for example [\000-\037]. Ranges |
| can include any characters that are valid for the current mode. |
can include any characters that are valid for the current mode. |
|
Line 1269 something AND NOT ...".
|
Line 1317 something AND NOT ...".
|
| The only metacharacters that are recognized in character classes are backslash, |
The only metacharacters that are recognized in character classes are backslash, |
| hyphen (only where it can be interpreted as specifying a range), circumflex |
hyphen (only where it can be interpreted as specifying a range), circumflex |
| (only at the start), opening square bracket (only when it can be interpreted as |
(only at the start), opening square bracket (only when it can be interpreted as |
| introducing a POSIX class name - see the next section), and the terminating | introducing a POSIX class name, or for a special compatibility feature - see |
| closing square bracket. However, escaping other non-alphanumeric characters | the next two sections), and the terminating closing square bracket. However, |
| does no harm. | escaping other non-alphanumeric characters does no harm. |
| </P> |
</P> |
| <br><a name="SEC10" href="#TOC1">POSIX CHARACTER CLASSES</a><br> |
<br><a name="SEC10" href="#TOC1">POSIX CHARACTER CLASSES</a><br> |
| <P> |
<P> |
|
Line 1294 are:
|
Line 1342 are:
|
| lower lower case letters |
lower lower case letters |
| print printing characters, including space |
print printing characters, including space |
| punct printing characters, excluding letters and digits and space |
punct printing characters, excluding letters and digits and space |
| space white space (not quite the same as \s) | space white space (the same as \s from PCRE 8.34) |
| upper upper case letters |
upper upper case letters |
| word "word" characters (same as \w) |
word "word" characters (same as \w) |
| xdigit hexadecimal digits |
xdigit hexadecimal digits |
| </pre> |
</pre> |
| The "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), and | The default "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), |
| space (32). Notice that this list includes the VT character (code 11). This | and space (32). If locale-specific matching is taking place, the list of space |
| makes "space" different to \s, which does not include VT (for Perl | characters may be different; there may be fewer or more of them. "Space" used |
| compatibility). | to be different to \s, which did not include VT, for Perl compatibility. |
| | However, Perl changed at release 5.18, and PCRE followed at release 8.34. |
| | "Space" and \s now match the same set of characters. |
| </P> |
</P> |
| <P> |
<P> |
| The name "word" is a Perl extension, and "blank" is a GNU extension from Perl |
The name "word" is a Perl extension, and "blank" is a GNU extension from Perl |
|
Line 1316 syntax [.ch.] and [=ch=] where "ch" is a "collating el
|
Line 1366 syntax [.ch.] and [=ch=] where "ch" is a "collating el
|
| supported, and an error is given if they are encountered. |
supported, and an error is given if they are encountered. |
| </P> |
</P> |
| <P> |
<P> |
| By default, in UTF modes, characters with values greater than 127 do not match | By default, characters with values greater than 127 do not match any of the |
| any of the POSIX character classes. However, if the PCRE_UCP option is passed | POSIX character classes. However, if the PCRE_UCP option is passed to |
| to <b>pcre_compile()</b>, some of the classes are changed so that Unicode | <b>pcre_compile()</b>, some of the classes are changed so that Unicode character |
| character properties are used. This is achieved by replacing the POSIX classes | properties are used. This is achieved by replacing certain POSIX classes by |
| by other sequences, as follows: | other sequences, as follows: |
| <pre> |
<pre> |
| [:alnum:] becomes \p{Xan} |
[:alnum:] becomes \p{Xan} |
| [:alpha:] becomes \p{L} |
[:alpha:] becomes \p{L} |
|
Line 1331 by other sequences, as follows:
|
Line 1381 by other sequences, as follows:
|
| [:upper:] becomes \p{Lu} |
[:upper:] becomes \p{Lu} |
| [:word:] becomes \p{Xwd} |
[:word:] becomes \p{Xwd} |
| </pre> |
</pre> |
| Negated versions, such as [:^alpha:], use \P instead of \p. The other POSIX | Negated versions, such as [:^alpha:], use \P instead of \p. Three other POSIX |
| classes are unchanged, and match only characters with code points less than | classes are handled specially in UCP mode: |
| 128. | |
| </P> |
</P> |
| <br><a name="SEC11" href="#TOC1">VERTICAL BAR</a><br> |
|
| <P> |
<P> |
| |
[:graph:] |
| |
This matches characters that have glyphs that mark the page when printed. In |
| |
Unicode property terms, it matches all characters with the L, M, N, P, S, or Cf |
| |
properties, except for: |
| |
<pre> |
| |
U+061C Arabic Letter Mark |
| |
U+180E Mongolian Vowel Separator |
| |
U+2066 - U+2069 Various "isolate"s |
| |
|
| |
</PRE> |
| |
</P> |
| |
<P> |
| |
[:print:] |
| |
This matches the same characters as [:graph:] plus space characters that are |
| |
not controls, that is, characters with the Zs property. |
| |
</P> |
| |
<P> |
| |
[:punct:] |
| |
This matches all characters that have the Unicode P (punctuation) property, |
| |
plus those characters whose code points are less than 128 that have the S |
| |
(Symbol) property. |
| |
</P> |
| |
<P> |
| |
The other POSIX classes are unchanged, and match only characters with code |
| |
points less than 128. |
| |
</P> |
| |
<br><a name="SEC11" href="#TOC1">COMPATIBILITY FEATURE FOR WORD BOUNDARIES</a><br> |
| |
<P> |
| |
In the POSIX.2 compliant library that was included in 4.4BSD Unix, the ugly |
| |
syntax [[:<:]] and [[:>:]] is used for matching "start of word" and "end of |
| |
word". PCRE treats these items as follows: |
| |
<pre> |
| |
[[:<:]] is converted to \b(?=\w) |
| |
[[:>:]] is converted to \b(?<=\w) |
| |
</pre> |
| |
Only these exact character sequences are recognized. A sequence such as |
| |
[a[:<:]b] provokes an error for an unrecognized POSIX class name. This support is |
| |
not compatible with Perl. It is provided to help migrations from other |
| |
environments, and is best not used in any new patterns. Note that \b matches |
| |
at the start and the end of a word (see |
| |
<a href="#smallassertions">"Simple assertions"</a> |
| |
above), and in a Perl-style pattern the preceding or following character |
| |
normally shows which is wanted, without the need for the assertions that are |
| |
used above in order to give exactly the POSIX behaviour. |
| |
</P> |
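The effect of the two converted forms can be checked with any Perl-style engine that supports \b and lookaround assertions. The sketch below uses Python's standard `re` module as a stand-in, not PCRE itself, and applies the converted patterns directly because `re` does not recognize the [[:<:]] and [[:>:]] syntax. The helper name `boundary_positions` is hypothetical.

```python
import re

# The two replacement forms quoted above, written as ordinary Perl-style patterns.
START_OF_WORD = re.compile(r"\b(?=\w)")   # stands in for [[:<:]]
END_OF_WORD = re.compile(r"\b(?<=\w)")    # stands in for [[:>:]]
ANY_BOUNDARY = re.compile(r"\b")          # plain \b, for comparison


def boundary_positions(pattern, text):
    """Return the offsets at which a zero-width pattern matches in text."""
    return [m.start() for m in pattern.finditer(text)]


text = "foo bar"
starts = boundary_positions(START_OF_WORD, text)  # offsets where a word begins
ends = boundary_positions(END_OF_WORD, text)      # offsets where a word ends
both = boundary_positions(ANY_BOUNDARY, text)     # every word boundary
```

For the subject "foo bar", plain \b matches at offsets 0, 3, 4, and 7. The added lookahead keeps only the word starts (0 and 4), and the added lookbehind keeps only the word ends (3 and 7).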
| |
<br><a name="SEC12" href="#TOC1">VERTICAL BAR</a><br> |
| |
<P> |
| Vertical bar characters are used to separate alternative patterns. For example, |
Vertical bar characters are used to separate alternative patterns. For example, |
| the pattern |
the pattern |
| <pre> |
<pre> |
|
Line 1350 that succeeds is used. If the alternatives are within
|
Line 1445 that succeeds is used. If the alternatives are within
|
| "succeeds" means matching the rest of the main pattern as well as the |
"succeeds" means matching the rest of the main pattern as well as the |
| alternative in the subpattern. |
alternative in the subpattern. |
| </P> |
</P> |
| <br><a name="SEC12" href="#TOC1">INTERNAL OPTION SETTING</a><br> | <br><a name="SEC13" href="#TOC1">INTERNAL OPTION SETTING</a><br> |
| <P> |
<P> |
| The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and |
The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and |
| PCRE_EXTENDED options (which are Perl-compatible) can be changed from within |
PCRE_EXTENDED options (which are Perl-compatible) can be changed from within |
|
Line 1413 options, respectively. The (*UTF) sequence is a generi
|
Line 1508 options, respectively. The (*UTF) sequence is a generi
|
| used with any of the libraries. However, the application can set the |
used with any of the libraries. However, the application can set the |
| PCRE_NEVER_UTF option, which locks out the use of the (*UTF) sequences. |
PCRE_NEVER_UTF option, which locks out the use of the (*UTF) sequences. |
| <a name="subpattern"></a></P> |
<a name="subpattern"></a></P> |
| <br><a name="SEC13" href="#TOC1">SUBPATTERNS</a><br> | <br><a name="SEC14" href="#TOC1">SUBPATTERNS</a><br> |
| <P> |
<P> |
| Subpatterns are delimited by parentheses (round brackets), which can be nested. |
Subpatterns are delimited by parentheses (round brackets), which can be nested. |
| Turning part of a pattern into a subpattern does two things: |
Turning part of a pattern into a subpattern does two things: |
|
Line 1469 from left to right, and options are not reset until th
|
Line 1564 from left to right, and options are not reset until th
|
| is reached, an option setting in one branch does affect subsequent branches, so |
is reached, an option setting in one branch does affect subsequent branches, so |
| the above patterns match "SUNDAY" as well as "Saturday". |
the above patterns match "SUNDAY" as well as "Saturday". |
| <a name="dupsubpatternnumber"></a></P> |
<a name="dupsubpatternnumber"></a></P> |
| <br><a name="SEC14" href="#TOC1">DUPLICATE SUBPATTERN NUMBERS</a><br> | <br><a name="SEC15" href="#TOC1">DUPLICATE SUBPATTERN NUMBERS</a><br> |
| <P> |
<P> |
| Perl 5.10 introduced a feature whereby each alternative in a subpattern uses |
Perl 5.10 introduced a feature whereby each alternative in a subpattern uses |
| the same numbers for its capturing parentheses. Such a subpattern starts with |
the same numbers for its capturing parentheses. Such a subpattern starts with |
|
Line 1513 true if any of the subpatterns of that number have mat
|
Line 1608 true if any of the subpatterns of that number have mat
|
| An alternative approach to using this "branch reset" feature is to use |
An alternative approach to using this "branch reset" feature is to use |
| duplicate named subpatterns, as described in the next section. |
duplicate named subpatterns, as described in the next section. |
| </P> |
</P> |
| <br><a name="SEC15" href="#TOC1">NAMED SUBPATTERNS</a><br> | <br><a name="SEC16" href="#TOC1">NAMED SUBPATTERNS</a><br> |
| <P> |
<P> |
| Identifying capturing parentheses by number is simple, but it can be very hard |
Identifying capturing parentheses by number is simple, but it can be very hard |
| to keep track of the numbers in complicated regular expressions. Furthermore, |
to keep track of the numbers in complicated regular expressions. Furthermore, |
|
Line 1535 and
|
Line 1630 and
|
| can be made by name as well as by number. |
can be made by name as well as by number. |
| </P> |
</P> |
| <P> |
<P> |
| Names consist of up to 32 alphanumeric characters and underscores. Named | Names consist of up to 32 alphanumeric characters and underscores, but must |
| capturing parentheses are still allocated numbers as well as names, exactly as | start with a non-digit. Named capturing parentheses are still allocated numbers |
| if the names were not present. The PCRE API provides function calls for | as well as names, exactly as if the names were not present. The PCRE API |
| extracting the name-to-number translation table from a compiled pattern. There | provides function calls for extracting the name-to-number translation table |
| is also a convenience function for extracting a captured substring by name. | from a compiled pattern. There is also a convenience function for extracting a |
| | captured substring by name. |
| </P> |
</P> |
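The naming rule in the revised text (up to 32 alphanumeric characters and underscores, not starting with a digit) can be expressed as a small check. This is a sketch only: it assumes ASCII letters and digits, which the text above does not specify, and the function name `is_valid_group_name` is hypothetical and not part of the PCRE API.

```python
import re

# Up to 32 characters, alphanumerics and underscores only, first one not a digit.
# ASCII is assumed here; the text above does not say how non-ASCII letters are treated.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_group_name(name):
    """Return True if name satisfies the subpattern naming rule described above."""
    return _NAME_RE.fullmatch(name) is not None
```

A check of this kind could be run on names before building a pattern string, so that a name such as "2013year" is rejected early instead of surfacing as a compile-time error.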
| <P> |
<P> |
| By default, a name must be unique within a pattern, but it is possible to relax |
By default, a name must be unique within a pattern, but it is possible to relax |
|
Line 1568 matched. This saves searching to find which numbered s
|
Line 1664 matched. This saves searching to find which numbered s
|
| </P> |
</P> |
| <P> |
<P> |
| If you make a back reference to a non-unique named subpattern from elsewhere in |
If you make a back reference to a non-unique named subpattern from elsewhere in |
| the pattern, the one that corresponds to the first occurrence of the name is | the pattern, the subpatterns to which the name refers are checked in the order |
| used. In the absence of duplicate numbers (see the previous section) this is | in which they appear in the overall pattern. The first one that is set is used |
| the one with the lowest number. If you use a named reference in a condition | for the reference. For example, this pattern matches both "foofoo" and |
| | "barbar" but not "foobar" or "barfoo": |
| | <pre> |
| | (?:(?<n>foo)|(?<n>bar))\k<n> |
| | |
| | </PRE> |
| | </P> |
| | <P> |
| | If you make a subroutine call to a non-unique named subpattern, the one that |
| | corresponds to the first occurrence of the name is used. In the absence of |
| | duplicate numbers (see the previous section) this is the one with the lowest |
| | number. |
| | </P> |
| | <P> |
| | If you use a named reference in a condition |
| test (see the |
test (see the |
| <a href="#conditions">section about conditions</a> |
<a href="#conditions">section about conditions</a> |
| below), either to check whether a subpattern has matched, or to check for |
below), either to check whether a subpattern has matched, or to check for |
|
Line 1585 documentation.
|
Line 1695 documentation.
|
| <b>Warning:</b> You cannot use different names to distinguish between two |
<b>Warning:</b> You cannot use different names to distinguish between two |
| subpatterns with the same number because PCRE uses only the numbers when |
subpatterns with the same number because PCRE uses only the numbers when |
| matching. For this reason, an error is given at compile time if different names |
matching. For this reason, an error is given at compile time if different names |
| are given to subpatterns with the same number. However, you can give the same | are given to subpatterns with the same number. However, you can always give the |
| name to subpatterns with the same number, even when PCRE_DUPNAMES is not set. | same name to subpatterns with the same number, even when PCRE_DUPNAMES is not |
| | set. |
| </P> |
</P> |
| <br><a name="SEC16" href="#TOC1">REPETITION</a><br> | <br><a name="SEC17" href="#TOC1">REPETITION</a><br> |
| <P> |
<P> |
| Repetition is specified by quantifiers, which can follow any of the following |
Repetition is specified by quantifiers, which can follow any of the following |
| items: |
items: |
|
Line 1756 example, after
|
Line 1867 example, after
|
| </pre> |
</pre> |
| matches "aba" the value of the second captured substring is "b". |
matches "aba" the value of the second captured substring is "b". |
| <a name="atomicgroup"></a></P> |
<a name="atomicgroup"></a></P> |
| <br><a name="SEC17" href="#TOC1">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a><br> | <br><a name="SEC18" href="#TOC1">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a><br> |
| <P> |
<P> |
| With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy") |
With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy") |
| repetition, failure of what follows normally causes the repeated item to be |
repetition, failure of what follows normally causes the repeated item to be |
|
Line 1860 an atomic group, like this:
|
Line 1971 an atomic group, like this:
|
| </pre> |
</pre> |
| sequences of non-digits cannot be broken, and failure happens quickly. |
sequences of non-digits cannot be broken, and failure happens quickly. |
| <a name="backreferences"></a></P> |
<a name="backreferences"></a></P> |
| <br><a name="SEC18" href="#TOC1">BACK REFERENCES</a><br> | <br><a name="SEC19" href="#TOC1">BACK REFERENCES</a><br> |
| <P> |
<P> |
| Outside a character class, a backslash followed by a digit greater than 0 (and |
Outside a character class, a backslash followed by a digit greater than 0 (and |
| possibly further digits) is a back reference to a capturing subpattern earlier |
possibly further digits) is a back reference to a capturing subpattern earlier |
|
Line 1988 as an
|
Line 2099 as an
|
| Once the whole group has been matched, a subsequent matching failure cannot |
Once the whole group has been matched, a subsequent matching failure cannot |
| cause backtracking into the middle of the group. |
cause backtracking into the middle of the group. |
| <a name="bigassertions"></a></P> |
<a name="bigassertions"></a></P> |
| <br><a name="SEC19" href="#TOC1">ASSERTIONS</a><br> | <br><a name="SEC20" href="#TOC1">ASSERTIONS</a><br> |
| <P> |
<P> |
| An assertion is a test on the characters following or preceding the current |
An assertion is a test on the characters following or preceding the current |
| matching point that does not actually consume any characters. The simple |
matching point that does not actually consume any characters. The simple |
|
Line 2178 preceded by "foo", while
|
Line 2289 preceded by "foo", while
|
| is another pattern that matches "foo" preceded by three digits and any three |
is another pattern that matches "foo" preceded by three digits and any three |
| characters that are not "999". |
characters that are not "999". |
| <a name="conditions"></a></P> |
<a name="conditions"></a></P> |
| <br><a name="SEC20" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br> | <br><a name="SEC21" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br> |
| <P> |
<P> |
| It is possible to cause the matching process to obey a subpattern |
It is possible to cause the matching process to obey a subpattern |
| conditionally or to choose between two alternative subpatterns, depending on |
conditionally or to choose between two alternative subpatterns, depending on |
|
Line 2252 Checking for a used subpattern by name
|
Line 2363 Checking for a used subpattern by name
|
| <P> |
<P> |
| Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a used |
Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a used |
| subpattern by name. For compatibility with earlier versions of PCRE, which had |
subpattern by name. For compatibility with earlier versions of PCRE, which had |
| this facility before Perl, the syntax (?(name)...) is also recognized. However, | this facility before Perl, the syntax (?(name)...) is also recognized. |
| there is a possible ambiguity with this syntax, because subpattern names may | |
| consist entirely of digits. PCRE looks first for a named subpattern; if it | |
| cannot find one and the name consists entirely of digits, PCRE looks for a | |
| subpattern of that number, which must be greater than zero. Using subpattern | |
| names that consist entirely of digits is not recommended. | |
| </P> |
</P> |
| <P> |
<P> |
| Rewriting the above example to use a named subpattern gives this: |
Rewriting the above example to use a named subpattern gives this: |
|
Line 2333 subject is matched against the first alternative; othe
|
Line 2439 subject is matched against the first alternative; othe
|
| against the second. This pattern matches strings in one of the two forms |
against the second. This pattern matches strings in one of the two forms |
| dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits. |
dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits. |
| <a name="comments"></a></P> |
<a name="comments"></a></P> |
| <br><a name="SEC21" href="#TOC1">COMMENTS</a><br> | <br><a name="SEC22" href="#TOC1">COMMENTS</a><br> |
| <P> |
<P> |
| There are two ways of including comments in patterns that are processed by |
There are two ways of including comments in patterns that are processed by |
| PCRE. In both cases, the start of the comment must not be in a character class, |
PCRE. In both cases, the start of the comment must not be in a character class, |
|
Line 2362 a newline in the pattern. The sequence \n is still lit
|
Line 2468 a newline in the pattern. The sequence \n is still lit
|
| it does not terminate the comment. Only an actual character with the code value |
it does not terminate the comment. Only an actual character with the code value |
| 0x0a (the default newline) does so. |
0x0a (the default newline) does so. |
| <a name="recursion"></a></P> |
<a name="recursion"></a></P> |
| <br><a name="SEC22" href="#TOC1">RECURSIVE PATTERNS</a><br> | <br><a name="SEC23" href="#TOC1">RECURSIVE PATTERNS</a><br> |
| <P> |
<P> |
| Consider the problem of matching a string in parentheses, allowing for |
Consider the problem of matching a string in parentheses, allowing for |
| unlimited nested parentheses. Without the use of recursion, the best that can |
unlimited nested parentheses. Without the use of recursion, the best that can |
|
Line 2577 now match "b" and so the whole match succeeds. In Perl
|
Line 2683 now match "b" and so the whole match succeeds. In Perl
|
| match because inside the recursive call \1 cannot access the externally set |
match because inside the recursive call \1 cannot access the externally set |
| value. |
value. |
| <a name="subpatternsassubroutines"></a></P> |
<a name="subpatternsassubroutines"></a></P> |
| <br><a name="SEC23" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br> | <br><a name="SEC24" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br> |
| <P> |
<P> |
| If the syntax for a recursive subpattern call (either by number or by |
If the syntax for a recursive subpattern call (either by number or by |
| name) is used outside the parentheses to which it refers, it operates like a |
name) is used outside the parentheses to which it refers, it operates like a |
|
Line 2618 different calls. For example, consider this pattern:
|
Line 2724 different calls. For example, consider this pattern:
|
| It matches "abcabc". It does not match "abcABC" because the change of |
It matches "abcabc". It does not match "abcABC" because the change of |
| processing option does not affect the called subpattern. |
processing option does not affect the called subpattern. |
| <a name="onigurumasubroutines"></a></P> |
<a name="onigurumasubroutines"></a></P> |
| <br><a name="SEC24" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br> | <br><a name="SEC25" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br> |
| <P> |
<P> |
| For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or |
For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or |
| a number enclosed either in angle brackets or single quotes, is an alternative |
a number enclosed either in angle brackets or single quotes, is an alternative |
|
Line 2636 plus or a minus sign it is taken as a relative referen
|
Line 2742 plus or a minus sign it is taken as a relative referen
|
| Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i> |
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i> |
| synonymous. The former is a back reference; the latter is a subroutine call. |
synonymous. The former is a back reference; the latter is a subroutine call. |
| </P> |
</P> |
| <br><a name="SEC25" href="#TOC1">CALLOUTS</a><br> | <br><a name="SEC26" href="#TOC1">CALLOUTS</a><br> |
| <P> |
<P> |
| Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl |
Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl |
| code to be obeyed in the middle of matching a regular expression. This makes it |
code to be obeyed in the middle of matching a regular expression. This makes it |
|
Line 2674 During matching, when PCRE reaches a callout point, th
|
Line 2780 During matching, when PCRE reaches a callout point, th
|
| called. It is provided with the number of the callout, the position in the |
called. It is provided with the number of the callout, the position in the |
| pattern, and, optionally, one item of data originally supplied by the caller of |
pattern, and, optionally, one item of data originally supplied by the caller of |
| the matching function. The callout function may cause matching to proceed, to |
the matching function. The callout function may cause matching to proceed, to |
| backtrack, or to fail altogether. A complete description of the interface to | backtrack, or to fail altogether. |
| the callout function is given in the | </P> |
| | <P> |
| | By default, PCRE implements a number of optimizations at compile time and |
| | matching time, and one side-effect is that sometimes callouts are skipped. If |
| | you need all possible callouts to happen, you need to set options that disable |
| | the relevant optimizations. More details, and a complete description of the |
| | interface to the callout function, are given in the |
| <a href="pcrecallout.html"><b>pcrecallout</b></a> |
<a href="pcrecallout.html"><b>pcrecallout</b></a> |
| documentation. |
documentation. |
| <a name="backtrackcontrol"></a></P> |
<a name="backtrackcontrol"></a></P> |
| <br><a name="SEC26" href="#TOC1">BACKTRACKING CONTROL</a><br> | <br><a name="SEC27" href="#TOC1">BACKTRACKING CONTROL</a><br> |
| <P> |
<P> |
| Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which |
Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which |
| are still described in the Perl documentation as "experimental and subject to |
are still described in the Perl documentation as "experimental and subject to |
|
Line 3026 example:
|
Line 3138 example:
|
| <pre> |
<pre> |
| ...(*COMMIT)(*PRUNE)... |
...(*COMMIT)(*PRUNE)... |
| </pre> |
</pre> |
| If there is a matching failure to the right, backtracking onto (*PRUNE) cases | If there is a matching failure to the right, backtracking onto (*PRUNE) causes |
| it to be triggered, and its action is taken. There can never be a backtrack |
it to be triggered, and its action is taken. There can never be a backtrack |
| onto (*COMMIT). |
onto (*COMMIT). |
| <a name="btrepeat"></a></P> |
<a name="btrepeat"></a></P> |
|
Line 3093 the subroutine match to fail.
|
Line 3205 the subroutine match to fail.
|
| the subpattern that has alternatives. If there is no such group within the |
the subpattern that has alternatives. If there is no such group within the |
| subpattern, (*THEN) causes the subroutine match to fail. |
subpattern, (*THEN) causes the subroutine match to fail. |
| </P> |
</P> |
| <br><a name="SEC27" href="#TOC1">SEE ALSO</a><br> | <br><a name="SEC28" href="#TOC1">SEE ALSO</a><br> |
| <P> |
<P> |
| <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), |
<b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), |
| <b>pcresyntax</b>(3), <b>pcre</b>(3), <b>pcre16(3)</b>, <b>pcre32(3)</b>. |
<b>pcresyntax</b>(3), <b>pcre</b>(3), <b>pcre16(3)</b>, <b>pcre32(3)</b>. |
| </P> |
</P> |
| <br><a name="SEC28" href="#TOC1">AUTHOR</a><br> | <br><a name="SEC29" href="#TOC1">AUTHOR</a><br> |
| <P> |
<P> |
| Philip Hazel |
Philip Hazel |
| <br> |
<br> |
|
Line 3107 University Computing Service
|
Line 3219 University Computing Service
|
| Cambridge CB2 3QH, England. |
Cambridge CB2 3QH, England. |
| <br> |
<br> |
| </P> |
</P> |
| <br><a name="SEC29" href="#TOC1">REVISION</a><br> | <br><a name="SEC30" href="#TOC1">REVISION</a><br> |
| <P> |
<P> |
| Last updated: 26 April 2013 | Last updated: 03 December 2013 |
| <br> |
<br> |
| Copyright © 1997-2013 University of Cambridge. |
Copyright © 1997-2013 University of Cambridge. |
| <br> |
<br> |