--- embedaddon/pcre/doc/html/pcrematching.html 2012/02/21 23:05:52 1.1
+++ embedaddon/pcre/doc/html/pcrematching.html 2014/06/15 19:46:05 1.1.1.4
@@ -26,13 +26,17 @@ man page, in case the conversion went wrong.

This document describes the two different algorithms that are available in PCRE for matching a compiled regular expression against a given subject string. The
-"standard" algorithm is the one provided by the pcre_exec() function.
-This works in the same was as Perl's matching function, and provides a
-Perl-compatible matching operation.
+"standard" algorithm is the one provided by the pcre_exec(),
+pcre16_exec() and pcre32_exec() functions. These work in the same
+way as Perl's matching function, and provide a Perl-compatible matching operation.
+The just-in-time (JIT) optimization that is described in the
+pcrejit
+documentation is compatible with these functions.

-An alternative algorithm is provided by the pcre_dfa_exec() function;
-this operates in a different way, and is not Perl-compatible. It has advantages
+An alternative algorithm is provided by the pcre_dfa_exec(),
+pcre16_dfa_exec() and pcre32_dfa_exec() functions; they operate in
+a different way, and are not Perl-compatible. This alternative has advantages
and disadvantages compared with the standard algorithm, and these are described below.

@@ -122,6 +126,15 @@ character of the subject. The algorithm does not autom
matches that start at later positions.

+PCRE's "auto-possessification" optimization usually applies to character
+repeats at the end of a pattern (as well as internally). For example, the
+pattern "a\d+" is compiled as if it were "a\d++" because there is no point
+even considering the possibility of backtracking into the repeated digits. For
+DFA matching, this means that only one possible match is found. If you really
+do want multiple matches in such cases, either use an ungreedy repeat
+("a\d+?") or set the PCRE_NO_AUTO_POSSESS option when compiling.
+

+
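The effect of auto-possessification on the number of reported matches can be pictured with a small toy model in C. This is only a sketch: it does not call PCRE, the function name toy_match_lengths and its parameters are hypothetical, and the pattern a\d+ is hard-coded purely for illustration. It lists every length at which a\d+ could match at the start of a subject, and shows that treating the repeat as \d++ leaves only the longest one.

```c
#include <ctype.h>
#include <stddef.h>

/* Toy model only: this is not PCRE code and calls no PCRE function.
 * Stores in `lengths` every length at which the pattern a\d+ matches at
 * the start of `subject`, longest first. When `possessive` is non-zero
 * the repeat is treated as \d++, so only the longest length is kept.
 * Returns the number of lengths stored (at most `max`). */
size_t toy_match_lengths(const char *subject, int possessive,
                         size_t *lengths, size_t max)
{
    size_t digits = 0;
    size_t count = 0;

    if (subject[0] != 'a')
        return 0;

    /* Count the run of digits that follows the leading 'a'. */
    while (isdigit((unsigned char)subject[1 + digits]))
        digits++;

    /* Every non-empty prefix of the digit run is a match for a\d+ ... */
    for (size_t n = digits; n >= 1 && count < max; n--) {
        lengths[count++] = 1 + n;
        if (possessive)
            break;  /* ... but a\d++ never gives any digits back. */
    }
    return count;
}
```

For the subject "a123", calling the function with possessive set to 0 stores the lengths 4, 3 and 2 (for "a123", "a12" and "a1"), while calling it with possessive set to 1 stores only 4.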

There are a number of features of PCRE regular expressions that are not supported by the alternative matching algorithm. They are as follows:

@@ -163,10 +176,10 @@ and not on others), is not supported. It causes an err
always 1, and the value of the capture_last field is always -1.

-7. The \C escape sequence, which (in the standard algorithm) matches a single
-byte, even in UTF-8 mode, is not supported in UTF-8 mode, because the
-alternative algorithm moves through the subject string one character at a time,
-for all active paths through the tree.
+7. The \C escape sequence, which (in the standard algorithm) always matches a
+single data unit, even in UTF-8, UTF-16 or UTF-32 modes, is not supported in
+these modes, because the alternative algorithm moves through the subject string
+one character (not data unit) at a time, for all active paths through the tree.

8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
@@ -184,11 +197,11 @@ callouts.

2. Because the alternative algorithm scans the subject string just once, and
-never needs to backtrack, it is possible to pass very long subject strings to
-the matching function in several pieces, checking for partial matching each
-time. Although it is possible to do multi-segment matching using the standard
-algorithm (pcre_exec()), by retaining partially matched substrings, it is
-more complicated. The
+never needs to backtrack (except for lookbehinds), it is possible to pass very
+long subject strings to the matching function in several pieces, checking for
+partial matching each time. Although it is possible to do multi-segment
+matching using the standard algorithm by retaining partially matched
+substrings, it is more complicated. The
pcrepartial documentation gives details of partial matching and discusses multi-segment matching.
@@ -220,9 +233,9 @@ Cambridge CB2 3QH, England.


REVISION

-Last updated: 19 November 2011
+Last updated: 12 November 2013
-Copyright © 1997-2010 University of Cambridge.
+Copyright © 1997-2012 University of Cambridge.

Return to the PCRE index page.