version 1.1.1.2, 2012/02/21 23:50:25
|
version 1.1.1.5, 2014/06/15 19:46:05
|
Line 36 these do not seem to have any use.
|
Line 36 these do not seem to have any use.
|
</P> |
</P> |
<P> |
<P> |
3. Capturing subpatterns that occur inside negative lookahead assertions are |
3. Capturing subpatterns that occur inside negative lookahead assertions are |
counted, but their entries in the offsets vector are never set. Perl sets its | counted, but their entries in the offsets vector are never set. Perl sometimes |
numerical variables from any such patterns that are matched before the | (but not always) sets its numerical variables from inside negative assertions. |
assertion fails to match something (thereby succeeding), but only if the | |
negative lookahead assertion contains just one branch. | |
</P> |
</P> |
<P> |
<P> |
4. Though binary zero characters are supported in the subject string, they are |
4. Though binary zero characters are supported in the subject string, they are |
Line 67 the internal representation of Unicode characters, the
|
Line 65 the internal representation of Unicode characters, the
|
implement the somewhat messy concept of surrogates." |
implement the somewhat messy concept of surrogates." |
</P> |
</P> |
<P> |
<P> |
7. PCRE implements a simpler version of \X than Perl, which changed to make | 7. PCRE does support the \Q...\E escape for quoting substrings. Characters in |
\X match what Unicode calls an "extended grapheme cluster". This is more | |
complicated than an extended Unicode sequence, which is what PCRE matches. | |
</P> | |
<P> | |
8. PCRE does support the \Q...\E escape for quoting substrings. Characters in | |
between are treated as literals. This is slightly different from Perl in that $ |
between are treated as literals. This is slightly different from Perl in that $ |
and @ are also handled as literals inside the quotes. In Perl, they cause |
and @ are also handled as literals inside the quotes. In Perl, they cause |
variable interpolation (but of course PCRE does not have variables). Note the |
variable interpolation (but of course PCRE does not have variables). Note the |
Line 87 following examples:
|
Line 80 following examples:
|
The \Q...\E sequence is recognized both inside and outside character classes. |
The \Q...\E sequence is recognized both inside and outside character classes. |
</P> |
</P> |
<P> |
<P> |
9. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) | 8. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) |
constructions. However, there is support for recursive patterns. This is not |
constructions. However, there is support for recursive patterns. This is not |
available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE "callout" |
available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE "callout" |
feature allows an external function to be called during pattern matching. See |
feature allows an external function to be called during pattern matching. See |
Line 96 the
|
Line 89 the
|
documentation for details. |
documentation for details. |
</P> |
</P> |
<P> |
<P> |
10. Subpatterns that are called as subroutines (whether or not recursively) are | 9. Subpatterns that are called as subroutines (whether or not recursively) are |
always treated as atomic groups in PCRE. This is like Python, but unlike Perl. |
always treated as atomic groups in PCRE. This is like Python, but unlike Perl. |
Captured values that are set outside a subroutine call can be referenced from |
Captured values that are set outside a subroutine call can be referenced from |
inside in PCRE, but not in Perl. There is a discussion that explains these |
inside in PCRE, but not in Perl. There is a discussion that explains these |
Line 107 in the
|
Line 100 in the
|
page. |
page. |
</P> |
</P> |
<P> |
<P> |
11. If (*THEN) is present in a group that is called as a subroutine, its action | 10. If any of the backtracking control verbs are used in a subpattern that is |
is limited to that group, even if the group does not contain any | characters. | called as a subroutine (whether or not recursively), their effect is confined |
| to that subpattern; it does not extend to the surrounding pattern. This is not |
| always the case in Perl. In particular, if (*THEN) is present in a group that |
| is called as a subroutine, its action is limited to that group, even if the |
| group does not contain any | characters. Note that such subpatterns are |
| processed as anchored at the point where they are tested. |
</P> |
</P> |
<P> |
<P> |
12. There are some differences that are concerned with the settings of captured | 11. If a pattern contains more than one backtracking control verb, the first |
| one that is backtracked onto acts. For example, in the pattern |
| A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C |
| triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the |
| same as PCRE, but there are examples where it differs. |
| </P> |
| <P> |
| 12. Most backtracking verbs in assertions have their normal actions. They are |
| not confined to the assertion. |
| </P> |
| <P> |
| 13. There are some differences that are concerned with the settings of captured |
strings when part of a pattern is repeated. For example, matching "aba" against |
strings when part of a pattern is repeated. For example, matching "aba" against |
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b". |
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b". |
</P> |
</P> |
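The repeated-group example above can be tried out with a short sketch. This uses Python's standard `re` module as a stand-in, which is an assumption for illustration only: it is neither PCRE nor Perl, and it is used here merely because it also keeps the value a nested group captured in an earlier iteration. The helper name `captured_groups` is hypothetical.

```python
import re

# Pattern taken from the text: an outer repeated group with an optional inner group.
# NOTE: Python's re engine is a stand-in here, not PCRE itself (assumption).
PATTERN = re.compile(r"^(a(b)?)+$")


def captured_groups(subject):
    """Return the tuple of captured groups, or None when the subject does not match."""
    match = PATTERN.match(subject)
    return match.groups() if match else None


if __name__ == "__main__":
    # Group 1 holds the text of the last iteration of the outer group;
    # group 2 shows whether the inner capture from an earlier iteration survives.
    print(captured_groups("aba"))
```

With this engine, group 2 should still hold "b" after the second iteration matches only "a", which is the PCRE-style result described above; Perl would leave the corresponding variable unset.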
<P> |
<P> |
13. PCRE's handling of duplicate subpattern numbers and duplicate subpattern | 14. PCRE's handling of duplicate subpattern numbers and duplicate subpattern |
names is not as general as Perl's. This is a consequence of the fact that PCRE |
names is not as general as Perl's. This is a consequence of the fact that PCRE |
works internally just with numbers, using an external table to translate |
works internally just with numbers, using an external table to translate |
between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B), |
between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B), |
Line 127 names map to capturing subpattern number 1. To avoid t
|
Line 136 names map to capturing subpattern number 1. To avoid t
|
an error is given at compile time. |
an error is given at compile time. |
</P> |
</P> |
<P> |
<P> |
14. Perl recognizes comments in some places that PCRE does not, for example, | 15. Perl recognizes comments in some places that PCRE does not, for example, |
between the ( and ? at the start of a subpattern. If the /x modifier is set, |
between the ( and ? at the start of a subpattern. If the /x modifier is set, |
Perl allows whitespace between ( and ? but PCRE never does, even if the | Perl allows white space between ( and ? (though current Perls warn that this is |
PCRE_EXTENDED option is set. | deprecated) but PCRE never does, even if the PCRE_EXTENDED option is set. |
</P> |
</P> |
<P> |
<P> |
15. PCRE provides some extensions to the Perl regular expression facilities. | 16. Perl, when in warning mode, gives warnings for character classes such as |
| [A-\d] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE has no |
| warning features, so it gives an error in these cases because they are almost |
| certainly user mistakes. |
| </P> |
| <P> |
| 17. In PCRE, the upper/lower case character properties Lu and Ll are not |
| affected when case-independent matching is specified. For example, \p{Lu} |
| always matches an upper case letter. I think Perl has changed in this respect; |
| in the release at the time of writing (5.16), \p{Lu} and \p{Ll} match all |
| letters, regardless of case, when case independence is specified. |
| </P> |
| <P> |
| 18. PCRE provides some extensions to the Perl regular expression facilities. |
Perl 5.10 includes new features that are not in earlier versions of Perl, some |
Perl 5.10 includes new features that are not in earlier versions of Perl, some |
of which (such as named parentheses) have been in PCRE for some time. This list |
of which (such as named parentheses) have been in PCRE for some time. This list |
is with respect to Perl 5.10: |
is with respect to Perl 5.10: |
Line 181 different hosts that have the other endianness. Howeve
|
Line 203 different hosts that have the other endianness. Howeve
|
optimized data created by the just-in-time compiler. |
optimized data created by the just-in-time compiler. |
<br> |
<br> |
<br> |
<br> |
(k) The alternative matching functions (<b>pcre_dfa_exec()</b> and | (k) The alternative matching functions (<b>pcre_dfa_exec()</b>, |
<b>pcre16_dfa_exec()</b>) match in a different way and are not Perl-compatible. | <b>pcre16_dfa_exec()</b> and <b>pcre32_dfa_exec()</b>) match in a different way
| and are not Perl-compatible. |
<br> |
<br> |
<br> |
<br> |
(l) PCRE recognizes some special sequences such as (*CR) at the start of |
(l) PCRE recognizes some special sequences such as (*CR) at the start of |
Line 203 Cambridge CB2 3QH, England.
|
Line 226 Cambridge CB2 3QH, England.
|
REVISION |
REVISION |
</b><br> |
</b><br> |
<P> |
<P> |
Last updated: 08 January 2012 | Last updated: 10 November 2013 |
<br> |
<br> |
Copyright © 1997-2012 University of Cambridge. | Copyright © 1997-2013 University of Cambridge. |
<br> |
<br> |
<p> |
<p> |
Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |