--- embedaddon/pcre/doc/html/pcrematching.html 2012/02/21 23:50:25 1.1.1.2
+++ embedaddon/pcre/doc/html/pcrematching.html 2014/06/15 19:46:05 1.1.1.4
@@ -26,18 +26,19 @@ man page, in case the conversion went wrong.

This document describes the two different algorithms that are available in PCRE for matching a compiled regular expression against a given subject string. The
-"standard" algorithm is the one provided by the pcre_exec() and
-pcre16_exec() functions. These work in the same was as Perl's matching
-function, and provide a Perl-compatible matching operation. The just-in-time
-(JIT) optimization that is described in the
+"standard" algorithm is the one provided by the pcre_exec(),
+pcre16_exec() and pcre32_exec() functions. These work in the same
+way as Perl's matching function, and provide a Perl-compatible matching operation.
+The just-in-time (JIT) optimization that is described in the
pcrejit documentation is compatible with these functions.

-An alternative algorithm is provided by the pcre_dfa_exec() and
-pcre16_dfa_exec() functions; they operate in a different way, and are not
-Perl-compatible. This alternative has advantages and disadvantages compared
-with the standard algorithm, and these are described below.
+An alternative algorithm is provided by the pcre_dfa_exec(),
+pcre16_dfa_exec() and pcre32_dfa_exec() functions; they operate in
+a different way, and are not Perl-compatible. This alternative has advantages
+and disadvantages compared with the standard algorithm, and these are described
+below.

When there is only one possible way in which a given subject string can match a
@@ -125,6 +126,15 @@ character of the subject. The algorithm does not autom
matches that start at later positions.

+PCRE's "auto-possessification" optimization usually applies to character
+repeats at the end of a pattern (as well as internally). For example, the
+pattern "a\d+" is compiled as if it were "a\d++" because there is no point
+even considering the possibility of backtracking into the repeated digits. For
+DFA matching, this means that only one possible match is found. If you really
+do want multiple matches in such cases, either use an ungreedy repeat
+("a\d+?") or set the PCRE_NO_AUTO_POSSESS option when compiling.
+

+
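The added paragraph on auto-possessification can be illustrated with a small stand-in written in plain Python. This is only a sketch of the idea and does not call PCRE: the helper name `all_matches_at` is hypothetical, and it imitates an "all matches from one starting point" search by testing every substring that begins at that point. The full list is roughly what one would expect to see when the repeat is not treated as possessive (an ungreedy repeat or PCRE_NO_AUTO_POSSESS); keeping only the first entry models the single match that is left when "a\d+" is compiled as if it were "a\d++".

```python
import re


def all_matches_at(pattern, subject, start=0):
    """Return every substring of `subject` beginning at `start` that the
    whole pattern matches, longest first.

    Hypothetical helper for illustration only; it is not part of PCRE.
    """
    compiled = re.compile(pattern)
    found = []
    # Try the longest candidate first, then shrink one character at a time.
    for end in range(len(subject), start - 1, -1):
        candidate = subject[start:end]
        if compiled.fullmatch(candidate):
            found.append(candidate)
    return found


# Without possessification, every shorter run of digits is also a match.
every_match = all_matches_at(r"a\d+", "a123")

# Assumption: a possessive repeat never gives digits back, so only the
# longest match remains. Modelled here by keeping the first entry.
only_longest = every_match[:1]

print(every_match)
print(only_longest)
```

The brute-force substring loop is chosen for clarity rather than speed; it makes no claim about how pcre_dfa_exec() works internally.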

There are a number of features of PCRE regular expressions that are not supported by the alternative matching algorithm. They are as follows:

@@ -167,9 +177,9 @@ always 1, and the value of the capture_last fie

7. The \C escape sequence, which (in the standard algorithm) always matches a
-single data unit, even in UTF-8 or UTF-16 modes, is not supported in these
-modes, because the alternative algorithm moves through the subject string one
-character (not data unit) at a time, for all active paths through the tree.
+single data unit, even in UTF-8, UTF-16 or UTF-32 modes, is not supported in
+these modes, because the alternative algorithm moves through the subject string
+one character (not data unit) at a time, for all active paths through the tree.

8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
@@ -223,7 +233,7 @@ Cambridge CB2 3QH, England.


REVISION

-Last updated: 08 January 2012
+Last updated: 12 November 2013
Copyright © 1997-2012 University of Cambridge.