| version 1.1.1.1, 2012/02/21 23:05:52 | version 1.1.1.4, 2013/07/22 08:25:57 | 
| Line 21  regular expressions. The differences described here ar | Line 21  regular expressions. The differences described here ar | 
 | versions 5.10 and above. | versions 5.10 and above. | 
 | </P> | </P> | 
 | <P> | <P> | 
| 1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what | 1. PCRE has only a subset of Perl's Unicode support. Details of what it does | 
| it does have are given in the | have are given in the | 
 | <a href="pcreunicode.html"><b>pcreunicode</b></a> | <a href="pcreunicode.html"><b>pcreunicode</b></a> | 
 | page. | page. | 
 | </P> | </P> | 
| Line 36  these do not seem to have any use. | Line 36  these do not seem to have any use. | 
 | </P> | </P> | 
 | <P> | <P> | 
 | 3. Capturing subpatterns that occur inside negative lookahead assertions are | 3. Capturing subpatterns that occur inside negative lookahead assertions are | 
| counted, but their entries in the offsets vector are never set. Perl sets its | counted, but their entries in the offsets vector are never set. Perl sometimes | 
| numerical variables from any such patterns that are matched before the | (but not always) sets its numerical variables from inside negative assertions. | 
| assertion fails to match something (thereby succeeding), but only if the |  | 
| negative lookahead assertion contains just one branch. |  | 
 | </P> | </P> | 
 | <P> | <P> | 
 | 4. Though binary zero characters are supported in the subject string, they are | 4. Though binary zero characters are supported in the subject string, they are | 
| Line 67  the internal representation of Unicode characters, the | Line 65  the internal representation of Unicode characters, the | 
 | implement the somewhat messy concept of surrogates." | implement the somewhat messy concept of surrogates." | 
 | </P> | </P> | 
 | <P> | <P> | 
| 7. PCRE implements a simpler version of \X than Perl, which changed to make | 7. PCRE does support the \Q...\E escape for quoting substrings. Characters in | 
| \X match what Unicode calls an "extended grapheme cluster". This is more |  | 
| complicated than an extended Unicode sequence, which is what PCRE matches. |  | 
| </P> |  | 
| <P> |  | 
| 8. PCRE does support the \Q...\E escape for quoting substrings. Characters in |  | 
 | between are treated as literals. This is slightly different from Perl in that $ | between are treated as literals. This is slightly different from Perl in that $ | 
 | and @ are also handled as literals inside the quotes. In Perl, they cause | and @ are also handled as literals inside the quotes. In Perl, they cause | 
 | variable interpolation (but of course PCRE does not have variables). Note the | variable interpolation (but of course PCRE does not have variables). Note the | 
| Line 87  following examples: | Line 80  following examples: | 
 | The \Q...\E sequence is recognized both inside and outside character classes. | The \Q...\E sequence is recognized both inside and outside character classes. | 
 | </P> | </P> | 
 | <P> | <P> | 
| 9. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) | 8. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) | 
 | constructions. However, there is support for recursive patterns. This is not | constructions. However, there is support for recursive patterns. This is not | 
 | available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE "callout" | available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE "callout" | 
 | feature allows an external function to be called during pattern matching. See | feature allows an external function to be called during pattern matching. See | 
| Line 96  the | Line 89  the | 
 | documentation for details. | documentation for details. | 
 | </P> | </P> | 
 | <P> | <P> | 
| 10. Subpatterns that are called as subroutines (whether or not recursively) are | 9. Subpatterns that are called as subroutines (whether or not recursively) are | 
 | always treated as atomic groups in PCRE. This is like Python, but unlike Perl. | always treated as atomic groups in PCRE. This is like Python, but unlike Perl. | 
 | Captured values that are set outside a subroutine call can be referenced from | Captured values that are set outside a subroutine call can be referenced from | 
 | inside in PCRE, but not in Perl. There is a discussion that explains these | inside in PCRE, but not in Perl. There is a discussion that explains these | 
| Line 107  in the | Line 100  in the | 
 | page. | page. | 
 | </P> | </P> | 
 | <P> | <P> | 
| 11. If (*THEN) is present in a group that is called as a subroutine, its action | 10. If any of the backtracking control verbs are used in a subpattern that is | 
| is limited to that group, even if the group does not contain any | characters. | called as a subroutine (whether or not recursively), their effect is confined | 
|  | to that subpattern; it does not extend to the surrounding pattern. This is not | 
|  | always the case in Perl. In particular, if (*THEN) is present in a group that | 
|  | is called as a subroutine, its action is limited to that group, even if the | 
|  | group does not contain any | characters. Note that such subpatterns are | 
|  | processed as anchored at the point where they are tested. | 
 | </P> | </P> | 
 | <P> | <P> | 
| 12. There are some differences that are concerned with the settings of captured | 11. If a pattern contains more than one backtracking control verb, the first | 
|  | one that is backtracked onto acts. For example, in the pattern | 
|  | A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C | 
|  | triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the | 
|  | same as PCRE, but there are examples where it differs. | 
|  | </P> | 
|  | <P> | 
|  | 12. Most backtracking verbs in assertions have their normal actions. They are | 
|  | not confined to the assertion. | 
|  | </P> | 
|  | <P> | 
|  | 13. There are some differences that are concerned with the settings of captured | 
 | strings when part of a pattern is repeated. For example, matching "aba" against | strings when part of a pattern is repeated. For example, matching "aba" against | 
 | the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b". | the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b". | 
 | </P> | </P> | 
 | <P> | <P> | 
| 13. PCRE's handling of duplicate subpattern numbers and duplicate subpattern | 14. PCRE's handling of duplicate subpattern numbers and duplicate subpattern | 
 | names is not as general as Perl's. This is a consequence of the fact that PCRE | names is not as general as Perl's. This is a consequence of the fact that PCRE | 
 | works internally just with numbers, using an external table to translate | works internally just with numbers, using an external table to translate | 
 | between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B), | between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B), | 
| Line 127  names map to capturing subpattern number 1. To avoid t | Line 136  names map to capturing subpattern number 1. To avoid t | 
 | an error is given at compile time. | an error is given at compile time. | 
 | </P> | </P> | 
 | <P> | <P> | 
| 14. Perl recognizes comments in some places that PCRE does not, for example, | 15. Perl recognizes comments in some places that PCRE does not, for example, | 
 | between the ( and ? at the start of a subpattern. If the /x modifier is set, | between the ( and ? at the start of a subpattern. If the /x modifier is set, | 
| Perl allows whitespace between ( and ? but PCRE never does, even if the | Perl allows white space between ( and ? but PCRE never does, even if the | 
 | PCRE_EXTENDED option is set. | PCRE_EXTENDED option is set. | 
 | </P> | </P> | 
 | <P> | <P> | 
| 15. PCRE provides some extensions to the Perl regular expression facilities. | 16. In PCRE, the upper/lower case character properties Lu and Ll are not | 
|  | affected when case-independent matching is specified. For example, \p{Lu} | 
|  | always matches an upper case letter. I think Perl has changed in this respect; | 
|  | in the release at the time of writing (5.16), \p{Lu} and \p{Ll} match all | 
|  | letters, regardless of case, when case independence is specified. | 
|  | </P> | 
|  | <P> | 
|  | 17. PCRE provides some extensions to the Perl regular expression facilities. | 
 | Perl 5.10 includes new features that are not in earlier versions of Perl, some | Perl 5.10 includes new features that are not in earlier versions of Perl, some | 
 | of which (such as named parentheses) have been in PCRE for some time. This list | of which (such as named parentheses) have been in PCRE for some time. This list | 
 | is with respect to Perl 5.10: | is with respect to Perl 5.10: | 
| Line 181  different hosts that have the other endianness. Howeve | Line 197  different hosts that have the other endianness. Howeve | 
 | optimized data created by the just-in-time compiler. | optimized data created by the just-in-time compiler. | 
 | <br> | <br> | 
 | <br> | <br> | 
| (k) The alternative matching function (<b>pcre_dfa_exec()</b>) matches in a | (k) The alternative matching functions (<b>pcre_dfa_exec()</b>, | 
| different way and is not Perl-compatible. | <b>pcre16_dfa_exec()</b> and <b>pcre32_dfa_exec()</b>) match in a different way | 
|  | and are not Perl-compatible. | 
 | <br> | <br> | 
 | <br> | <br> | 
 | (l) PCRE recognizes some special sequences such as (*CR) at the start of | (l) PCRE recognizes some special sequences such as (*CR) at the start of | 
| Line 203  Cambridge CB2 3QH, England. | Line 220  Cambridge CB2 3QH, England. | 
 | REVISION | REVISION | 
 | </b><br> | </b><br> | 
 | <P> | <P> | 
| Last updated: 14 November 2011 | Last updated: 19 March 2013 | 
 | <br> | <br> | 
| Copyright © 1997-2011 University of Cambridge. | Copyright © 1997-2013 University of Cambridge. | 
 | <br> | <br> | 
 | <p> | <p> | 
 | Return to the <a href="index.html">PCRE index page</a>. | Return to the <a href="index.html">PCRE index page</a>. |