Diff for /embedaddon/pcre/doc/pcrematching.3 between versions 1.1.1.1 and 1.1.1.5

version 1.1.1.1, 2012/02/21 23:05:52 version 1.1.1.5, 2014/06/15 19:46:05
Line 1 Line 1
.TH PCREMATCHING 3  .TH PCREMATCHING 3 "12 November 2013" "PCRE 8.34"
 .SH NAME  .SH NAME
 PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
 .SH "PCRE MATCHING ALGORITHMS"  .SH "PCRE MATCHING ALGORITHMS"
Line 6  PCRE - Perl-compatible regular expressions Line 6  PCRE - Perl-compatible regular expressions
 .sp  .sp
 This document describes the two different algorithms that are available in PCRE  This document describes the two different algorithms that are available in PCRE
 for matching a compiled regular expression against a given subject string. The  for matching a compiled regular expression against a given subject string. The
"standard" algorithm is the one provided by the \fBpcre_exec()\fP function.  "standard" algorithm is the one provided by the \fBpcre_exec()\fP,
This works in the same was as Perl's matching function, and provides a  \fBpcre16_exec()\fP and \fBpcre32_exec()\fP functions. These work in the same
Perl-compatible matching operation.  as as Perl's matching function, and provide a Perl-compatible matching operation.
 The just-in-time (JIT) optimization that is described in the
 .\" HREF
 \fBpcrejit\fP
 .\"
 documentation is compatible with these functions.
 .P  .P
An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP function;  An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP,
this operates in a different way, and is not Perl-compatible. It has advantages  \fBpcre16_dfa_exec()\fP and \fBpcre32_dfa_exec()\fP functions; they operate in
 a different way, and are not Perl-compatible. This alternative has advantages
 and disadvantages compared with the standard algorithm, and these are described  and disadvantages compared with the standard algorithm, and these are described
 below.  below.
 .P  .P
Line 28  is matched against the string Line 34  is matched against the string
 there are three possible answers. The standard algorithm finds only one of  there are three possible answers. The standard algorithm finds only one of
 them, whereas the alternative algorithm finds all three.  them, whereas the alternative algorithm finds all three.
 .  .
   .
 .SH "REGULAR EXPRESSIONS AS TREES"  .SH "REGULAR EXPRESSIONS AS TREES"
 .rs  .rs
 .sp  .sp
Line 38  string (from a given starting point) can be thought of Line 45  string (from a given starting point) can be thought of
 There are two ways to search a tree: depth-first and breadth-first, and these  There are two ways to search a tree: depth-first and breadth-first, and these
 correspond to the two matching algorithms provided by PCRE.  correspond to the two matching algorithms provided by PCRE.
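
The two search orders can be illustrated with a toy sketch. It does not call PCRE: the "pattern" is only a list of literal alternatives, and the function names depth_first and breadth_first are hypothetical, chosen for this illustration. The depth-first version follows one branch at a time and stops at the first success; the breadth-first version reads the subject one character at a time and keeps every branch that is still alive.

```python
def depth_first(alternatives, subject, start=0):
    """Try each alternative in turn and stop at the first one that matches."""
    for alt in alternatives:
        if subject.startswith(alt, start):
            return alt
    return None


def breadth_first(alternatives, subject, start=0):
    """Read the subject one character at a time, keeping every alternative
    that is still consistent with the characters seen so far."""
    active = list(alternatives)
    found = []
    offset = 0
    while active and start + offset <= len(subject):
        # An alternative that is fully consumed at this depth is a match.
        found.extend(a for a in active if len(a) == offset)
        if start + offset == len(subject):
            break
        ch = subject[start + offset]
        # Drop finished alternatives and those that disagree with ch.
        active = [a for a in active if len(a) > offset and a[offset] == ch]
        offset += 1
    return found[::-1]  # report the longest match first


alternatives = ["cat", "cater", "caterpillar"]
print(depth_first(alternatives, "caterpillar"))    # cat
print(breadth_first(alternatives, "caterpillar"))  # ['caterpillar', 'cater', 'cat']
```

The breadth-first version never revisits a character it has already read, which is why it finds every match from the starting point in a single pass, while the depth-first result depends on the order in which the branches are tried.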
 .  .
   .
 .SH "THE STANDARD MATCHING ALGORITHM"  .SH "THE STANDARD MATCHING ALGORITHM"
 .rs  .rs
 .sp  .sp
Line 63  straightforward for this algorithm to keep track of th Line 71  straightforward for this algorithm to keep track of th
 matched by portions of the pattern in parentheses. This provides support for  matched by portions of the pattern in parentheses. This provides support for
 capturing parentheses and back references.  capturing parentheses and back references.
 .  .
   .
 .SH "THE ALTERNATIVE MATCHING ALGORITHM"  .SH "THE ALTERNATIVE MATCHING ALGORITHM"
 .rs  .rs
 .sp  .sp
Line 97  the three strings "caterpillar", "cater", and "cat" th Line 106  the three strings "caterpillar", "cater", and "cat" th
 character of the subject. The algorithm does not automatically move on to find  character of the subject. The algorithm does not automatically move on to find
 matches that start at later positions.  matches that start at later positions.
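
As a rough illustration of this behaviour, the sketch below uses Python's re module as a stand-in backtracking matcher and emulates the "all matches at the first matching position" result by brute force. It is not PCRE and not how the alternative algorithm works internally; the pattern, the subject string and both function names are assumptions chosen so that the three strings mentioned above come out.

```python
import re


def standard_match(pattern, subject):
    """Backtracking search: report the single match the engine settles on."""
    m = re.search(pattern, subject)
    return m.group(0) if m else None


def all_matches_at_first_start(pattern, subject):
    """Brute-force emulation: find the first start position with any match,
    list every match beginning there (longest first), and then stop."""
    compiled = re.compile(pattern)
    for start in range(len(subject) + 1):
        ends = [end for end in range(len(subject), start - 1, -1)
                if compiled.fullmatch(subject, start, end)]
        if ends:
            # Do not move on to later start positions.
            return [subject[start:end] for end in ends]
    return []


pattern = r"cat(er(pillar)?)?"          # illustrative pattern (assumption)
subject = "the caterpillar catchment"   # illustrative subject (assumption)
print(standard_match(pattern, subject))              # caterpillar
print(all_matches_at_first_start(pattern, subject))  # ['caterpillar', 'cater', 'cat']
```

The later "cat" in "catchment" is not reported, because the search stops at the first position where anything matches. Truncating the subject at each candidate end point would change the meaning of lookaheads and end anchors, so this emulation is only reasonable for simple patterns like the one shown.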
 .P  .P
   PCRE's "auto-possessification" optimization usually applies to character
   repeats at the end of a pattern (as well as internally). For example, the
   pattern "a\ed+" is compiled as if it were "a\ed++" because there is no point
   even considering the possibility of backtracking into the repeated digits. For
   DFA matching, this means that only one possible match is found. If you really
   do want multiple matches in such cases, either use an ungreedy repeat
   ("a\ed+?") or set the PCRE_NO_AUTO_POSSESS option when compiling.
   .P
 There are a number of features of PCRE regular expressions that are not  There are a number of features of PCRE regular expressions that are not
 supported by the alternative matching algorithm. They are as follows:  supported by the alternative matching algorithm. They are as follows:
 .P  .P
Line 131  and not on others), is not supported. It causes an err Line 148  and not on others), is not supported. It causes an err
 6. Callouts are supported, but the value of the \fIcapture_top\fP field is  6. Callouts are supported, but the value of the \fIcapture_top\fP field is
 always 1, and the value of the \fIcapture_last\fP field is always -1.  always 1, and the value of the \fIcapture_last\fP field is always -1.
 .P  .P
7. The \eC escape sequence, which (in the standard algorithm) matches a single  7. The \eC escape sequence, which (in the standard algorithm) always matches a
byte, even in UTF-8 mode, is not supported in UTF-8 mode, because the  single data unit, even in UTF-8, UTF-16 or UTF-32 modes, is not supported in
alternative algorithm moves through the subject string one character at a time,  these modes, because the alternative algorithm moves through the subject string
for all active paths through the tree.  one character (not data unit) at a time, for all active paths through the tree.
 .P  .P
 8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not  8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
 supported. (*FAIL) is supported, and behaves like a failing negative assertion.  supported. (*FAIL) is supported, and behaves like a failing negative assertion.
 .  .
   .
 .SH "ADVANTAGES OF THE ALTERNATIVE ALGORITHM"  .SH "ADVANTAGES OF THE ALTERNATIVE ALGORITHM"
 .rs  .rs
 .sp  .sp
Line 150  match using the standard algorithm, you have to do klu Line 168  match using the standard algorithm, you have to do klu
 callouts.  callouts.
 .P  .P
 2. Because the alternative algorithm scans the subject string just once, and  2. Because the alternative algorithm scans the subject string just once, and
never needs to backtrack, it is possible to pass very long subject strings to  never needs to backtrack (except for lookbehinds), it is possible to pass very
the matching function in several pieces, checking for partial matching each  long subject strings to the matching function in several pieces, checking for
time. Although it is possible to do multi-segment matching using the standard  partial matching each time. Although it is possible to do multi-segment
algorithm (\fBpcre_exec()\fP), by retaining partially matched substrings, it is  matching using the standard algorithm by retaining partially matched
more complicated. The  substrings, it is more complicated. The
 .\" HREF  .\" HREF
 \fBpcrepartial\fP  \fBpcrepartial\fP
 .\"  .\"
Line 191  Cambridge CB2 3QH, England. Line 209  Cambridge CB2 3QH, England.
 .rs  .rs
 .sp  .sp
 .nf  .nf
Last updated: 19 November 2011  Last updated: 12 November 2013
Copyright (c) 1997-2010 University of Cambridge.  Copyright (c) 1997-2012 University of Cambridge.
 .fi  .fi
