--- embedaddon/pcre/doc/pcrematching.3 2012/10/09 09:19:17 1.1.1.3
+++ embedaddon/pcre/doc/pcrematching.3 2014/06/15 19:46:05 1.1.1.5
@@ -1,4 +1,4 @@
-.TH PCREMATCHING 3 "08 January 2012" "PCRE 8.30"
+.TH PCREMATCHING 3 "12 November 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE MATCHING ALGORITHMS"
@@ -6,19 +6,20 @@ PCRE - Perl-compatible regular expressions
 .sp
 This document describes the two different algorithms that are available in PCRE
 for matching a compiled regular expression against a given subject string. The
-"standard" algorithm is the one provided by the \fBpcre_exec()\fP and
-\fBpcre16_exec()\fP functions. These work in the same was as Perl's matching
-function, and provide a Perl-compatible matching operation. The just-in-time
-(JIT) optimization that is described in the
+"standard" algorithm is the one provided by the \fBpcre_exec()\fP,
+\fBpcre16_exec()\fP and \fBpcre32_exec()\fP functions. These work in the same
+as as Perl's matching function, and provide a Perl-compatible matching operation.
+The just-in-time (JIT) optimization that is described in the
 .\" HREF
 \fBpcrejit\fP
 .\"
 documentation is compatible with these functions.
 .P
-An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP and
-\fBpcre16_dfa_exec()\fP functions; they operate in a different way, and are not
-Perl-compatible. This alternative has advantages and disadvantages compared
-with the standard algorithm, and these are described below.
+An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP,
+\fBpcre16_dfa_exec()\fP and \fBpcre32_dfa_exec()\fP functions; they operate in
+a different way, and are not Perl-compatible. This alternative has advantages
+and disadvantages compared with the standard algorithm, and these are described
+below.
 .P
 When there is only one possible way in which a given subject string can match a
 pattern, the two algorithms give the same answer. A difference arises, however,
@@ -105,6 +106,14 @@ the three strings "caterpillar", "cater", and "cat" th
 character of the subject. The algorithm does not automatically move on to find
 matches that start at later positions.
 .P
+PCRE's "auto-possessification" optimization usually applies to character
+repeats at the end of a pattern (as well as internally). For example, the
+pattern "a\ed+" is compiled as if it were "a\ed++" because there is no point
+even considering the possibility of backtracking into the repeated digits. For
+DFA matching, this means that only one possible match is found. If you really
+do want multiple matches in such cases, either use an ungreedy repeat
+("a\ed+?") or set the PCRE_NO_AUTO_POSSESS option when compiling.
+.P
 There are a number of features of PCRE regular expressions that are not
 supported by the alternative matching algorithm. They are as follows:
 .P
@@ -140,9 +149,9 @@ and not on others), is not supported. It causes an err
 always 1, and the value of the \fIcapture_last\fP field is always -1.
 .P
 7. The \eC escape sequence, which (in the standard algorithm) always matches a
-single data unit, even in UTF-8 or UTF-16 modes, is not supported in these
-modes, because the alternative algorithm moves through the subject string one
-character (not data unit) at a time, for all active paths through the tree.
+single data unit, even in UTF-8, UTF-16 or UTF-32 modes, is not supported in
+these modes, because the alternative algorithm moves through the subject string
+one character (not data unit) at a time, for all active paths through the tree.
 .P
 8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
 supported. (*FAIL) is supported, and behaves like a failing negative assertion.
@@ -200,6 +209,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 08 January 2012
+Last updated: 12 November 2013
 Copyright (c) 1997-2012 University of Cambridge.
 .fi
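
Note on the new auto-possessification paragraph: the effect it describes for DFA matching can be pictured with a toy model. The sketch below does not call PCRE at all. The function names `toy_all_matches` and `toy_possessive_matches` are hypothetical and hard-code the single pattern "a\d+" from the added text. It assumes, as the man page states for the alternative algorithm, that all matches starting at the first matching position are reported, longest first.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Toy model, NOT PCRE code: collect the lengths of every match of "a\d+"
 * that starts at subject[0], longest first. This mimics what a DFA-style
 * matcher would report when the repeat is not possessive.
 * Returns the number of lengths stored (at most max). */
size_t toy_all_matches(const char *subject, size_t *lengths, size_t max)
{
    assert(subject != NULL && lengths != NULL);
    if (subject[0] != 'a')
        return 0;

    /* Count the run of digits that follows the leading 'a'. */
    size_t digits = 0;
    while (isdigit((unsigned char)subject[1 + digits]))
        digits++;

    /* Every non-empty prefix of the digit run yields one match. */
    size_t count = 0;
    for (size_t d = digits; d >= 1 && count < max; d--)
        lengths[count++] = 1 + d;
    return count;
}

/* Toy model of the possessive form "a\d++": the digit run is never given
 * back, so only the longest match can be reported. */
size_t toy_possessive_matches(const char *subject, size_t *lengths, size_t max)
{
    size_t longest[1];
    size_t n = toy_all_matches(subject, longest, 1);
    if (n == 0 || max == 0)
        return 0;
    lengths[0] = longest[0];
    return 1;
}
```

For the subject "a123", the first function stores three lengths (4, 3, 2, for "a123", "a12", "a1"), while the possessive model stores only 4. In this model, that gap is what the added paragraph says PCRE_NO_AUTO_POSSESS or an ungreedy repeat is meant to restore.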