--- embedaddon/pcre/doc/pcrematching.3 2012/02/21 23:05:52 1.1.1.1
+++ embedaddon/pcre/doc/pcrematching.3 2012/02/21 23:50:25 1.1.1.2
@@ -6,14 +6,19 @@ PCRE - Perl-compatible regular expressions
 .sp
 This document describes the two different algorithms that are available in PCRE
 for matching a compiled regular expression against a given subject string. The
-"standard" algorithm is the one provided by the \fBpcre_exec()\fP function.
-This works in the same was as Perl's matching function, and provides a
-Perl-compatible matching operation.
+"standard" algorithm is the one provided by the \fBpcre_exec()\fP and
+\fBpcre16_exec()\fP functions. These work in the same was as Perl's matching
+function, and provide a Perl-compatible matching operation. The just-in-time
+(JIT) optimization that is described in the
+.\" HREF
+\fBpcrejit\fP
+.\"
+documentation is compatible with these functions.
 .P
-An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP function;
-this operates in a different way, and is not Perl-compatible. It has advantages
-and disadvantages compared with the standard algorithm, and these are described
-below.
+An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP and
+\fBpcre16_dfa_exec()\fP functions; they operate in a different way, and are not
+Perl-compatible. This alternative has advantages and disadvantages compared
+with the standard algorithm, and these are described below.
 .P
 When there is only one possible way in which a given subject string can match a
 pattern, the two algorithms give the same answer. A difference arises, however,
@@ -28,6 +33,7 @@ is matched against the string
 there are three possible answers. The standard algorithm finds only one of
 them, whereas the alternative algorithm finds all three.
 .
+.
 .SH "REGULAR EXPRESSIONS AS TREES"
 .rs
 .sp
@@ -38,6 +44,7 @@ string (from a given starting point) can be thought of
 There are two ways to search a tree: depth-first and breadth-first, and these
 correspond to the two matching algorithms provided by PCRE.
 .
+.
 .SH "THE STANDARD MATCHING ALGORITHM"
 .rs
 .sp
@@ -63,6 +70,7 @@ straightforward for this algorithm to keep track of th
 matched by portions of the pattern in parentheses. This provides support for
 capturing parentheses and back references.
 .
+.
 .SH "THE ALTERNATIVE MATCHING ALGORITHM"
 .rs
 .sp
@@ -131,14 +139,15 @@ and not on others), is not supported. It causes an err
 6. Callouts are supported, but the value of the \fIcapture_top\fP field is
 always 1, and the value of the \fIcapture_last\fP field is always -1.
 .P
-7. The \eC escape sequence, which (in the standard algorithm) matches a single
-byte, even in UTF-8 mode, is not supported in UTF-8 mode, because the
-alternative algorithm moves through the subject string one character at a time,
-for all active paths through the tree.
+7. The \eC escape sequence, which (in the standard algorithm) always matches a
+single data unit, even in UTF-8 or UTF-16 modes, is not supported in these
+modes, because the alternative algorithm moves through the subject string one
+character (not data unit) at a time, for all active paths through the tree.
 .P
 8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
 supported. (*FAIL) is supported, and behaves like a failing negative assertion.
 .
+.
 .SH "ADVANTAGES OF THE ALTERNATIVE ALGORITHM"
 .rs
 .sp
@@ -150,11 +159,11 @@ match using the standard algorithm, you have to do klu
 callouts.
 .P
 2. Because the alternative algorithm scans the subject string just once, and
-never needs to backtrack, it is possible to pass very long subject strings to
-the matching function in several pieces, checking for partial matching each
-time. Although it is possible to do multi-segment matching using the standard
-algorithm (\fBpcre_exec()\fP), by retaining partially matched substrings, it is
-more complicated. The
+never needs to backtrack (except for lookbehinds), it is possible to pass very
+long subject strings to the matching function in several pieces, checking for
+partial matching each time. Although it is possible to do multi-segment
+matching using the standard algorithm by retaining partially matched
+substrings, it is more complicated. The
 .\" HREF
 \fBpcrepartial\fP
 .\"
@@ -191,6 +200,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 19 November 2011
-Copyright (c) 1997-2010 University of Cambridge.
+Last updated: 08 January 2012
+Copyright (c) 1997-2012 University of Cambridge.
 .fi