--- embedaddon/pcre/doc/pcrematching.3 2012/02/21 23:05:52 1.1
+++ embedaddon/pcre/doc/pcrematching.3 2014/06/15 19:46:05 1.1.1.5
@@ -1,4 +1,4 @@
-.TH PCREMATCHING 3
+.TH PCREMATCHING 3 "12 November 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE MATCHING ALGORITHMS"
@@ -6,12 +6,18 @@ PCRE - Perl-compatible regular expressions
 .sp
 This document describes the two different algorithms that are available in PCRE
 for matching a compiled regular expression against a given subject string. The
-"standard" algorithm is the one provided by the \fBpcre_exec()\fP function.
-This works in the same was as Perl's matching function, and provides a
-Perl-compatible matching operation.
+"standard" algorithm is the one provided by the \fBpcre_exec()\fP,
+\fBpcre16_exec()\fP and \fBpcre32_exec()\fP functions. These work in the same
+way as Perl's matching function, and provide a Perl-compatible matching operation.
+The just-in-time (JIT) optimization that is described in the
+.\" HREF
+\fBpcrejit\fP
+.\"
+documentation is compatible with these functions.
 .P
-An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP function;
-this operates in a different way, and is not Perl-compatible. It has advantages
+An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP,
+\fBpcre16_dfa_exec()\fP and \fBpcre32_dfa_exec()\fP functions; they operate in
+a different way, and are not Perl-compatible. This alternative has advantages
 and disadvantages compared with the standard algorithm, and these are described
 below.
 .P
@@ -28,6 +34,7 @@ is matched against the string
 there are three possible answers. The standard algorithm finds only one of
 them, whereas the alternative algorithm finds all three.
 .
+.
 .SH "REGULAR EXPRESSIONS AS TREES"
 .rs
 .sp
@@ -38,6 +45,7 @@ string (from a given starting point) can be thought of
 There are two ways to search a tree: depth-first and breadth-first, and these
 correspond to the two matching algorithms provided by PCRE.
 .
+.
 .SH "THE STANDARD MATCHING ALGORITHM"
 .rs
 .sp
@@ -63,6 +71,7 @@ straightforward for this algorithm to keep track of th
 matched by portions of the pattern in parentheses. This provides support for
 capturing parentheses and back references.
 .
+.
 .SH "THE ALTERNATIVE MATCHING ALGORITHM"
 .rs
 .sp
@@ -97,6 +106,14 @@ the three strings "caterpillar", "cater", and "cat" th
 character of the subject. The algorithm does not automatically move on to find
 matches that start at later positions.
 .P
+PCRE's "auto-possessification" optimization usually applies to character
+repeats at the end of a pattern (as well as internally). For example, the
+pattern "a\ed+" is compiled as if it were "a\ed++" because there is no point
+even considering the possibility of backtracking into the repeated digits. For
+DFA matching, this means that only one possible match is found. If you really
+do want multiple matches in such cases, either use an ungreedy repeat
+("a\ed+?") or set the PCRE_NO_AUTO_POSSESS option when compiling.
+.P
 There are a number of features of PCRE regular expressions that are not
 supported by the alternative matching algorithm. They are as follows:
 .P
@@ -131,14 +148,15 @@ and not on others), is not supported. It causes an err
 6. Callouts are supported, but the value of the \fIcapture_top\fP field is
 always 1, and the value of the \fIcapture_last\fP field is always -1.
 .P
-7. The \eC escape sequence, which (in the standard algorithm) matches a single
-byte, even in UTF-8 mode, is not supported in UTF-8 mode, because the
-alternative algorithm moves through the subject string one character at a time,
-for all active paths through the tree.
+7. The \eC escape sequence, which (in the standard algorithm) always matches a
+single data unit, even in UTF-8, UTF-16 or UTF-32 modes, is not supported in
+these modes, because the alternative algorithm moves through the subject string
+one character (not data unit) at a time, for all active paths through the tree.
 .P
 8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
 supported. (*FAIL) is supported, and behaves like a failing negative assertion.
 .
+.
 .SH "ADVANTAGES OF THE ALTERNATIVE ALGORITHM"
 .rs
 .sp
@@ -150,11 +168,11 @@ match using the standard algorithm, you have to do klu
 callouts.
 .P
 2. Because the alternative algorithm scans the subject string just once, and
-never needs to backtrack, it is possible to pass very long subject strings to
-the matching function in several pieces, checking for partial matching each
-time. Although it is possible to do multi-segment matching using the standard
-algorithm (\fBpcre_exec()\fP), by retaining partially matched substrings, it is
-more complicated. The
+never needs to backtrack (except for lookbehinds), it is possible to pass very
+long subject strings to the matching function in several pieces, checking for
+partial matching each time. Although it is possible to do multi-segment
+matching using the standard algorithm by retaining partially matched
+substrings, it is more complicated. The
 .\" HREF
 \fBpcrepartial\fP
 .\"
@@ -191,6 +209,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 19 November 2011
-Copyright (c) 1997-2010 University of Cambridge.
+Last updated: 12 November 2013
+Copyright (c) 1997-2012 University of Cambridge.
 .fi
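
The difference between the two algorithms that this page describes can be illustrated with a small toy model. The sketch below is not PCRE code and does not call the PCRE API. It uses Python's own backtracking `re` module to stand in for the standard algorithm, and a hand-written prefix check to imitate the way the alternative algorithm reports every match at the first matching position, longest first. The subject string, the alternation pattern, and the function names `standard_style_match` and `dfa_style_matches` are assumptions chosen for illustration only.

```python
import re

# Illustrative inputs for this sketch (assumptions, not taken from PCRE itself).
SUBJECT = "the caterpillar catchment"
ALTERNATIVES = ["cat", "cater", "caterpillar"]


def standard_style_match(subject):
    """Depth-first, backtracking search: stop at the first match found.

    Python's re engine, like Perl's, tries alternatives left to right,
    so only one answer is returned.
    """
    m = re.search("|".join(ALTERNATIVES), subject)
    return (m.start(), m.group()) if m else None


def dfa_style_matches(subject):
    """Toy stand-in for the alternative algorithm.

    Scans for the first position where anything matches, then reports
    every alternative that matches there, longest first. It does not
    move on to later starting positions.
    """
    for start in range(len(subject)):
        found = [alt for alt in ALTERNATIVES if subject.startswith(alt, start)]
        if found:
            return start, sorted(found, key=len, reverse=True)
    return None
```

With these inputs, `standard_style_match(SUBJECT)` returns `(4, "cat")`, while `dfa_style_matches(SUBJECT)` returns `(4, ["caterpillar", "cater", "cat"])`. Neither reports the later "cat" in "catchment".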