--- embedaddon/pcre/doc/html/pcrematching.html 2012/02/21 23:05:52 1.1
+++ embedaddon/pcre/doc/html/pcrematching.html 2012/02/21 23:50:25 1.1.1.2
@@ -26,15 +26,18 @@ man page, in case the conversion went wrong.

This document describes the two different algorithms that are available in PCRE for matching a compiled regular expression against a given subject string. The
-"standard" algorithm is the one provided by the pcre_exec() function.
-This works in the same was as Perl's matching function, and provides a
-Perl-compatible matching operation.
+"standard" algorithm is the one provided by the pcre_exec() and
+pcre16_exec() functions. These work in the same was as Perl's matching
+function, and provide a Perl-compatible matching operation. The just-in-time
+(JIT) optimization that is described in the
+pcrejit
+documentation is compatible with these functions.

-An alternative algorithm is provided by the pcre_dfa_exec() function;
-this operates in a different way, and is not Perl-compatible. It has advantages
-and disadvantages compared with the standard algorithm, and these are described
-below.
+An alternative algorithm is provided by the pcre_dfa_exec() and
+pcre16_dfa_exec() functions; they operate in a different way, and are not
+Perl-compatible. This alternative has advantages and disadvantages compared
+with the standard algorithm, and these are described below.

When there is only one possible way in which a given subject string can match a
@@ -163,10 +166,10 @@ and not on others), is not supported. It causes an err
always 1, and the value of the capture_last field is always -1.

-7. The \C escape sequence, which (in the standard algorithm) matches a single
-byte, even in UTF-8 mode, is not supported in UTF-8 mode, because the
-alternative algorithm moves through the subject string one character at a time,
-for all active paths through the tree.
+7. The \C escape sequence, which (in the standard algorithm) always matches a
+single data unit, even in UTF-8 or UTF-16 modes, is not supported in these
+modes, because the alternative algorithm moves through the subject string one
+character (not data unit) at a time, for all active paths through the tree.

8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
@@ -184,11 +187,11 @@ callouts.

2. Because the alternative algorithm scans the subject string just once, and
-never needs to backtrack, it is possible to pass very long subject strings to
-the matching function in several pieces, checking for partial matching each
-time. Although it is possible to do multi-segment matching using the standard
-algorithm (pcre_exec()), by retaining partially matched substrings, it is
-more complicated. The
+never needs to backtrack (except for lookbehinds), it is possible to pass very
+long subject strings to the matching function in several pieces, checking for
+partial matching each time. Although it is possible to do multi-segment
+matching using the standard algorithm by retaining partially matched
+substrings, it is more complicated. The
pcrepartial documentation gives details of partial matching and discusses multi-segment matching.
@@ -220,9 +223,9 @@ Cambridge CB2 3QH, England.


REVISION

-Last updated: 19 November 2011
+Last updated: 08 January 2012
-Copyright © 1997-2010 University of Cambridge.
+Copyright © 1997-2012 University of Cambridge.

Return to the PCRE index page.
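
Note on advantage 2 in the patched text (a matcher that scans the subject once and never backtracks can be given a long subject in several pieces): the idea can be illustrated with a minimal sketch in C. This sketch does not call PCRE and is not how PCRE is implemented. The names matcher, matcher_init, matcher_feed and seg_result are hypothetical, and the hand-written state machine for the anchored pattern [0-9]+\.[0-9]+ is an assumption chosen only to keep the example small. It shows that when each character is examined exactly once, the current state is the only thing that has to be carried from one segment to the next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is PCRE's API. */
typedef enum { SEG_NOMATCH, SEG_PARTIAL, SEG_MATCH } seg_result;

/* States of a hand-written DFA for the anchored pattern [0-9]+\.[0-9]+ */
typedef enum { S_START, S_INT, S_DOT, S_FRAC, S_DEAD } dfa_state;

typedef struct { dfa_state state; } matcher;

static void matcher_init(matcher *m) { m->state = S_START; }

static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Feed one segment of the subject. Every character is looked at once,
   left to right, and nothing but m->state survives between calls.
   The result describes all input seen so far, not just this segment. */
static seg_result matcher_feed(matcher *m, const char *seg, size_t len)
{
    assert(m != NULL);
    for (size_t i = 0; i < len && m->state != S_DEAD; i++) {
        char c = seg[i];
        switch (m->state) {
        case S_START:
            m->state = is_digit(c) ? S_INT : S_DEAD;
            break;
        case S_INT:
            m->state = is_digit(c) ? S_INT : (c == '.' ? S_DOT : S_DEAD);
            break;
        case S_DOT:
            m->state = is_digit(c) ? S_FRAC : S_DEAD;
            break;
        case S_FRAC:
            m->state = is_digit(c) ? S_FRAC : S_DEAD;
            break;
        default:
            break;
        }
    }
    if (m->state == S_DEAD) return SEG_NOMATCH;  /* no continuation can match */
    if (m->state == S_FRAC) return SEG_MATCH;    /* input so far matches */
    return SEG_PARTIAL;                          /* more input is needed */
}
```

With one matcher, feeding the segments "12", ".", and "5" in turn gives SEG_PARTIAL, SEG_PARTIAL, and SEG_MATCH. A backtracking matcher would instead have to retain the partially matched text, which is why the patched paragraph calls the multi-segment case more complicated for the standard algorithm.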