Diff for /embedaddon/pcre/doc/pcrematching.3 between versions 1.1 and 1.1.1.3

version 1.1, 2012/02/21 23:05:52 version 1.1.1.3, 2012/10/09 09:19:17
Line 1 Line 1
-.TH PCREMATCHING 3
+.TH PCREMATCHING 3 "08 January 2012" "PCRE 8.30"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE MATCHING ALGORITHMS"
Line 6 Line 6  PCRE - Perl-compatible regular expressions
 .sp
 This document describes the two different algorithms that are available in PCRE
 for matching a compiled regular expression against a given subject string. The
-"standard" algorithm is the one provided by the \fBpcre_exec()\fP function.
-This works in the same way as Perl's matching function, and provides a
-Perl-compatible matching operation.
+"standard" algorithm is the one provided by the \fBpcre_exec()\fP and
+\fBpcre16_exec()\fP functions. These work in the same way as Perl's matching
+function, and provide a Perl-compatible matching operation. The just-in-time
+(JIT) optimization that is described in the
+.\" HREF
+\fBpcrejit\fP
+.\"
+documentation is compatible with these functions.
 .P
-An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP function;
-this operates in a different way, and is not Perl-compatible. It has advantages
-and disadvantages compared with the standard algorithm, and these are described
-below.
+An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP and
+\fBpcre16_dfa_exec()\fP functions; they operate in a different way, and are not
+Perl-compatible. This alternative has advantages and disadvantages compared
+with the standard algorithm, and these are described below.
 .P
 When there is only one possible way in which a given subject string can match a
 pattern, the two algorithms give the same answer. A difference arises, however,
Line 28 Line 33  is matched against the string
 there are three possible answers. The standard algorithm finds only one of
 them, whereas the alternative algorithm finds all three.
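
The "one answer versus all three" behaviour described in this hunk can be imitated with a short Python sketch. The pattern and subject string below are hypothetical, chosen only because they give three possible matches from the same starting point; they are not taken from the man page. Python's re module stands in for a backtracking matcher, and the helper all_matches_at_start() is an illustrative brute-force stand-in that tests every prefix. It reproduces the set of answers only, not the way the alternative algorithm computes them, and the longest-first ordering is a choice made for this sketch.

```python
import re


def first_match(pattern, subject):
    """Backtracking match anchored at offset 0: reports one match only."""
    m = re.match(pattern, subject)
    return m.group(0) if m else None


def all_matches_at_start(pattern, subject):
    """Brute-force stand-in for 'report every match from this start point'.

    Tests each prefix of the subject with fullmatch() and returns the
    matching prefixes, longest first. This only imitates the result; it is
    not how a DFA-style matcher works internally.
    """
    compiled = re.compile(pattern)
    return [
        subject[:end]
        for end in range(len(subject), -1, -1)
        if compiled.fullmatch(subject, 0, end)
    ]


# Hypothetical pattern and subject, chosen for illustration.
pattern = r"<.*>"
subject = "<a> <b> <c>"

print(first_match(pattern, subject))           # the single greedy match
print(all_matches_at_start(pattern, subject))  # every match from offset 0
```

The backtracking call reports only the greedy match, while the prefix scan lists each of the three candidate matches.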
 .
+.
 .SH "REGULAR EXPRESSIONS AS TREES"
 .rs
 .sp
Line 38 Line 44  string (from a given starting point) can be thought of
 There are two ways to search a tree: depth-first and breadth-first, and these
 correspond to the two matching algorithms provided by PCRE.
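
As a rough illustration of the two tree searches, the following Python sketch walks a tiny hand-built automaton both ways. Everything in it is an assumption made for the example: the automaton encodes the hypothetical pattern <.*>, and the names TRANSITIONS, ACCEPTING, depth_first() and breadth_first() are invented here. It is a toy model of the idea, not PCRE's implementation.

```python
ANY = None  # label that matches any single character

# Hand-built toy automaton for the hypothetical pattern <.*>
# state 0 --'<'--> 1, state 1 --any--> 1, state 1 --'>'--> 2 (accepting).
# Transition order encodes priority: trying "any" first makes .* greedy.
TRANSITIONS = {
    0: [("<", 1)],
    1: [(ANY, 1), (">", 2)],
}
ACCEPTING = {2}


def depth_first(subject, state=0, pos=0):
    """Follow one branch at a time, backtracking when a branch fails.

    Returns the end offset of the first match found, or None.
    """
    if pos < len(subject):
        for label, nxt in TRANSITIONS.get(state, []):
            if label is ANY or label == subject[pos]:
                end = depth_first(subject, nxt, pos + 1)
                if end is not None:
                    return end
    return pos if state in ACCEPTING else None


def breadth_first(subject):
    """Advance all live branches in lockstep, one character at a time.

    Returns the end offsets of every match from offset 0, shortest first.
    """
    active = {0}
    ends = []
    for pos, ch in enumerate(subject):
        active = {
            nxt
            for state in active
            for label, nxt in TRANSITIONS.get(state, [])
            if label is ANY or label == ch
        }
        if not active:
            break  # no branch survives, so no further match is possible
        if active & ACCEPTING:
            ends.append(pos + 1)
    return ends


print(depth_first("<a> <b>"))    # one end offset
print(breadth_first("<a> <b>"))  # all end offsets
```

The depth-first walk stops at the first complete path it finds, so it needs to remember where to back up to. The breadth-first walk never revisits a character, which is why it can report every match but has no single path on which to record captured substrings.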
 .
+.
 .SH "THE STANDARD MATCHING ALGORITHM"
 .rs
 .sp
Line 63 Line 70  straightforward for this algorithm to keep track of th
 matched by portions of the pattern in parentheses. This provides support for
 capturing parentheses and back references.
 .
+.
 .SH "THE ALTERNATIVE MATCHING ALGORITHM"
 .rs
 .sp
Line 131 Line 139  and not on others), is not supported. It causes an err
 6. Callouts are supported, but the value of the \fIcapture_top\fP field is
 always 1, and the value of the \fIcapture_last\fP field is always -1.
 .P
-7. The \eC escape sequence, which (in the standard algorithm) matches a single
-byte, even in UTF-8 mode, is not supported in UTF-8 mode, because the
-alternative algorithm moves through the subject string one character at a time,
-for all active paths through the tree.
+7. The \eC escape sequence, which (in the standard algorithm) always matches a
+single data unit, even in UTF-8 or UTF-16 modes, is not supported in these
+modes, because the alternative algorithm moves through the subject string one
+character (not data unit) at a time, for all active paths through the tree.
 .P
 8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
 supported. (*FAIL) is supported, and behaves like a failing negative assertion.
 .
+.
 .SH "ADVANTAGES OF THE ALTERNATIVE ALGORITHM"
 .rs
 .sp
Line 150 Line 159  match using the standard algorithm, you have to do klu
 callouts.
 .P
 2. Because the alternative algorithm scans the subject string just once, and
-never needs to backtrack, it is possible to pass very long subject strings to
-the matching function in several pieces, checking for partial matching each
-time. Although it is possible to do multi-segment matching using the standard
-algorithm (\fBpcre_exec()\fP), by retaining partially matched substrings, it is
-more complicated. The
+never needs to backtrack (except for lookbehinds), it is possible to pass very
+long subject strings to the matching function in several pieces, checking for
+partial matching each time. Although it is possible to do multi-segment
+matching using the standard algorithm by retaining partially matched
+substrings, it is more complicated. The
 .\" HREF
 \fBpcrepartial\fP
 .\"
Line 191 Line 200  Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 19 November 2011
-Copyright (c) 1997-2010 University of Cambridge.
+Last updated: 08 January 2012
+Copyright (c) 1997-2012 University of Cambridge.
 .fi
