version 1.1.1.1, 2012/02/21 23:05:52
|
version 1.1.1.5, 2014/06/15 19:46:05
|
Line 1
|
Line 1
|
.TH PCRECOMPAT 3 | .TH PCRECOMPAT 3 "10 November 2013" "PCRE 8.34" |
.SH NAME |
.SH NAME |
PCRE - Perl-compatible regular expressions |
PCRE - Perl-compatible regular expressions |
.SH "DIFFERENCES BETWEEN PCRE AND PERL" |
.SH "DIFFERENCES BETWEEN PCRE AND PERL" |
Line 8 This document describes the differences in the ways th
|
Line 8 This document describes the differences in the ways th
|
regular expressions. The differences described here are with respect to Perl |
regular expressions. The differences described here are with respect to Perl |
versions 5.10 and above. |
versions 5.10 and above. |
.P |
.P |
1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what | 1. PCRE has only a subset of Perl's Unicode support. Details of what it does |
it does have are given in the | have are given in the |
.\" HREF |
.\" HREF |
\fBpcreunicode\fP |
\fBpcreunicode\fP |
.\" |
.\" |
Line 23 just once). Perl allows repeat quantifiers on other as
|
Line 23 just once). Perl allows repeat quantifiers on other as
|
these do not seem to have any use. |
these do not seem to have any use. |
.P |
.P |
3. Capturing subpatterns that occur inside negative lookahead assertions are |
3. Capturing subpatterns that occur inside negative lookahead assertions are |
counted, but their entries in the offsets vector are never set. Perl sets its | counted, but their entries in the offsets vector are never set. Perl sometimes |
numerical variables from any such patterns that are matched before the | (but not always) sets its numerical variables from inside negative assertions. |
assertion fails to match something (thereby succeeding), but only if the | |
negative lookahead assertion contains just one branch. | |
.P |
.P |
4. Though binary zero characters are supported in the subject string, they are |
4. Though binary zero characters are supported in the subject string, they are |
not allowed in a pattern string because it is passed as a normal C string, |
not allowed in a pattern string because it is passed as a normal C string, |
Line 50 Perl documentation says "Because Perl hides the need f
|
Line 48 Perl documentation says "Because Perl hides the need f
|
the internal representation of Unicode characters, there is no need to |
the internal representation of Unicode characters, there is no need to |
implement the somewhat messy concept of surrogates." |
implement the somewhat messy concept of surrogates." |
.P |
.P |
7. PCRE implements a simpler version of \eX than Perl, which changed to make | 7. PCRE does support the \eQ...\eE escape for quoting substrings. Characters in |
\eX match what Unicode calls an "extended grapheme cluster". This is more | |
complicated than an extended Unicode sequence, which is what PCRE matches. | |
.P | |
8. PCRE does support the \eQ...\eE escape for quoting substrings. Characters in | |
between are treated as literals. This is slightly different from Perl in that $ |
between are treated as literals. This is slightly different from Perl in that $ |
and @ are also handled as literals inside the quotes. In Perl, they cause |
and @ are also handled as literals inside the quotes. In Perl, they cause |
variable interpolation (but of course PCRE does not have variables). Note the |
variable interpolation (but of course PCRE does not have variables). Note the |
Line 70 following examples:
|
Line 64 following examples:
|
.sp |
.sp |
The \eQ...\eE sequence is recognized both inside and outside character classes. |
The \eQ...\eE sequence is recognized both inside and outside character classes. |
.P |
.P |
9. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) | 8. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) |
constructions. However, there is support for recursive patterns. This is not |
constructions. However, there is support for recursive patterns. This is not |
available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE "callout" |
available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE "callout" |
feature allows an external function to be called during pattern matching. See |
feature allows an external function to be called during pattern matching. See |
Line 80 the
|
Line 74 the
|
.\" |
.\" |
documentation for details. |
documentation for details. |
.P |
.P |
10. Subpatterns that are called as subroutines (whether or not recursively) are | 9. Subpatterns that are called as subroutines (whether or not recursively) are |
always treated as atomic groups in PCRE. This is like Python, but unlike Perl. |
always treated as atomic groups in PCRE. This is like Python, but unlike Perl. |
Captured values that are set outside a subroutine call can be referenced from |
Captured values that are set outside a subroutine call can be referenced from |
inside in PCRE, but not in Perl. There is a discussion that explains these |
inside in PCRE, but not in Perl. There is a discussion that explains these |
Line 95 in the
|
Line 89 in the
|
.\" |
.\" |
page. |
page. |
.P |
.P |
11. If (*THEN) is present in a group that is called as a subroutine, its action | 10. If any of the backtracking control verbs are used in a subpattern that is |
is limited to that group, even if the group does not contain any | characters. | called as a subroutine (whether or not recursively), their effect is confined |
| to that subpattern; it does not extend to the surrounding pattern. This is not |
| always the case in Perl. In particular, if (*THEN) is present in a group that |
| is called as a subroutine, its action is limited to that group, even if the |
| group does not contain any | characters. Note that such subpatterns are |
| processed as anchored at the point where they are tested. |
.P |
.P |
12. There are some differences that are concerned with the settings of captured | 11. If a pattern contains more than one backtracking control verb, the first |
| one that is backtracked onto acts. For example, in the pattern |
| A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C |
| triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the |
| same as PCRE, but there are examples where it differs. |
| .P |
| 12. Most backtracking verbs in assertions have their normal actions. They are |
| not confined to the assertion. |
| .P |
| 13. There are some differences that are concerned with the settings of captured |
strings when part of a pattern is repeated. For example, matching "aba" against |
strings when part of a pattern is repeated. For example, matching "aba" against |
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b". |
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b". |
.P |
.P |
13. PCRE's handling of duplicate subpattern numbers and duplicate subpattern | 14. PCRE's handling of duplicate subpattern numbers and duplicate subpattern |
names is not as general as Perl's. This is a consequence of the fact that PCRE |
names is not as general as Perl's. This is a consequence of the fact that PCRE |
works internally just with numbers, using an external table to translate |
works internally just with numbers, using an external table to translate |
between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B), |
between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B), |
Line 112 would not be possible to distinguish which parentheses
|
Line 120 would not be possible to distinguish which parentheses
|
names map to capturing subpattern number 1. To avoid this confusing situation, |
names map to capturing subpattern number 1. To avoid this confusing situation, |
an error is given at compile time. |
an error is given at compile time. |
.P |
.P |
14. Perl recognizes comments in some places that PCRE does not, for example, | 15. Perl recognizes comments in some places that PCRE does not, for example, |
between the ( and ? at the start of a subpattern. If the /x modifier is set, |
between the ( and ? at the start of a subpattern. If the /x modifier is set, |
Perl allows whitespace between ( and ? but PCRE never does, even if the | Perl allows white space between ( and ? (though current Perls warn that this is |
PCRE_EXTENDED option is set. | deprecated) but PCRE never does, even if the PCRE_EXTENDED option is set. |
.P |
.P |
15. PCRE provides some extensions to the Perl regular expression facilities. | 16. Perl, when in warning mode, gives warnings for character classes such as |
| [A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE has no |
| warning features, so it gives an error in these cases because they are almost |
| certainly user mistakes. |
| .P |
| 17. In PCRE, the upper/lower case character properties Lu and Ll are not |
| affected when case-independent matching is specified. For example, \ep{Lu} |
| always matches an upper case letter. I think Perl has changed in this respect; |
| in the release at the time of writing (5.16), \ep{Lu} and \ep{Ll} match all |
| letters, regardless of case, when case independence is specified. |
| .P |
| 18. PCRE provides some extensions to the Perl regular expression facilities. |
Perl 5.10 includes new features that are not in earlier versions of Perl, some |
Perl 5.10 includes new features that are not in earlier versions of Perl, some |
of which (such as named parentheses) have been in PCRE for some time. This list |
of which (such as named parentheses) have been in PCRE for some time. This list |
is with respect to Perl 5.10: |
is with respect to Perl 5.10: |
Line 154 by the PCRE_BSR_ANYCRLF option.
|
Line 173 by the PCRE_BSR_ANYCRLF option.
|
different hosts that have the other endianness. However, this does not apply to |
different hosts that have the other endianness. However, this does not apply to |
optimized data created by the just-in-time compiler. |
optimized data created by the just-in-time compiler. |
.sp |
.sp |
(k) The alternative matching function (\fBpcre_dfa_exec()\fP) matches in a | (k) The alternative matching functions (\fBpcre_dfa_exec()\fP, |
different way and is not Perl-compatible. | \fBpcre16_dfa_exec()\fP and \fBpcre32_dfa_exec()\fP) match in a different way
| and are not Perl-compatible. |
.sp |
.sp |
(l) PCRE recognizes some special sequences such as (*CR) at the start of |
(l) PCRE recognizes some special sequences such as (*CR) at the start of |
a pattern that set overall options that cannot be changed within the pattern. |
a pattern that set overall options that cannot be changed within the pattern. |
Line 175 Cambridge CB2 3QH, England.
|
Line 195 Cambridge CB2 3QH, England.
|
.rs |
.rs |
.sp |
.sp |
.nf |
.nf |
Last updated: 14 November 2011 | Last updated: 10 November 2013 |
Copyright (c) 1997-2011 University of Cambridge. | Copyright (c) 1997-2013 University of Cambridge. |
.fi |
.fi |