version 1.1.1.2, 2012/02/21 23:50:25
|
version 1.1.1.5, 2014/06/15 19:46:05
|
Line 1
|
Line 1
|
.TH PCREMATCHING 3 | .TH PCREMATCHING 3 "12 November 2013" "PCRE 8.34" |
.SH NAME |
.SH NAME |
PCRE - Perl-compatible regular expressions |
PCRE - Perl-compatible regular expressions |
.SH "PCRE MATCHING ALGORITHMS" |
.SH "PCRE MATCHING ALGORITHMS" |
Line 6 PCRE - Perl-compatible regular expressions
|
Line 6 PCRE - Perl-compatible regular expressions
|
.sp |
.sp |
This document describes the two different algorithms that are available in PCRE |
This document describes the two different algorithms that are available in PCRE |
for matching a compiled regular expression against a given subject string. The |
for matching a compiled regular expression against a given subject string. The |
"standard" algorithm is the one provided by the \fBpcre_exec()\fP and | "standard" algorithm is the one provided by the \fBpcre_exec()\fP, |
\fBpcre16_exec()\fP functions. These work in the same way as Perl's matching | \fBpcre16_exec()\fP and \fBpcre32_exec()\fP functions. These work in the same |
function, and provide a Perl-compatible matching operation. The just-in-time | way as Perl's matching function, and provide a Perl-compatible matching operation. |
(JIT) optimization that is described in the | The just-in-time (JIT) optimization that is described in the |
.\" HREF |
.\" HREF |
\fBpcrejit\fP |
\fBpcrejit\fP |
.\" |
.\" |
documentation is compatible with these functions. |
documentation is compatible with these functions. |
.P |
.P |
An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP and | An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP, |
\fBpcre16_dfa_exec()\fP functions; they operate in a different way, and are not | \fBpcre16_dfa_exec()\fP and \fBpcre32_dfa_exec()\fP functions; they operate in |
Perl-compatible. This alternative has advantages and disadvantages compared | a different way, and are not Perl-compatible. This alternative has advantages |
with the standard algorithm, and these are described below. | and disadvantages compared with the standard algorithm, and these are described |
| below. |
.P |
.P |
When there is only one possible way in which a given subject string can match a |
When there is only one possible way in which a given subject string can match a |
pattern, the two algorithms give the same answer. A difference arises, however, |
pattern, the two algorithms give the same answer. A difference arises, however, |
Line 105 the three strings "caterpillar", "cater", and "cat" th
|
Line 106 the three strings "caterpillar", "cater", and "cat" th
|
character of the subject. The algorithm does not automatically move on to find |
character of the subject. The algorithm does not automatically move on to find |
matches that start at later positions. |
matches that start at later positions. |
.P |
.P |
|
PCRE's "auto-possessification" optimization usually applies to character |
|
repeats at the end of a pattern (as well as internally). For example, the |
|
pattern "a\ed+" is compiled as if it were "a\ed++" because there is no point |
|
even considering the possibility of backtracking into the repeated digits. For |
|
DFA matching, this means that only one possible match is found. If you really |
|
do want multiple matches in such cases, either use an ungreedy repeat |
|
("a\ed+?") or set the PCRE_NO_AUTO_POSSESS option when compiling. |
|
.P |
There are a number of features of PCRE regular expressions that are not |
There are a number of features of PCRE regular expressions that are not |
supported by the alternative matching algorithm. They are as follows: |
supported by the alternative matching algorithm. They are as follows: |
.P |
.P |
Line 140 and not on others), is not supported. It causes an err
|
Line 149 and not on others), is not supported. It causes an err
|
always 1, and the value of the \fIcapture_last\fP field is always -1. |
always 1, and the value of the \fIcapture_last\fP field is always -1. |
.P |
.P |
7. The \eC escape sequence, which (in the standard algorithm) always matches a |
7. The \eC escape sequence, which (in the standard algorithm) always matches a |
single data unit, even in UTF-8 or UTF-16 modes, is not supported in these | single data unit, even in UTF-8, UTF-16 or UTF-32 modes, is not supported in |
modes, because the alternative algorithm moves through the subject string one | these modes, because the alternative algorithm moves through the subject string |
character (not data unit) at a time, for all active paths through the tree. | one character (not data unit) at a time, for all active paths through the tree. |
.P |
.P |
8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not |
8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not |
supported. (*FAIL) is supported, and behaves like a failing negative assertion. |
supported. (*FAIL) is supported, and behaves like a failing negative assertion. |
Line 200 Cambridge CB2 3QH, England.
|
Line 209 Cambridge CB2 3QH, England.
|
.rs |
.rs |
.sp |
.sp |
.nf |
.nf |
Last updated: 08 January 2012 | Last updated: 12 November 2013 |
Copyright (c) 1997-2012 University of Cambridge. |
Copyright (c) 1997-2012 University of Cambridge. |
.fi |
.fi |