|
version 1.1.1.1, 2012/02/21 23:05:52
|
version 1.1.1.4, 2013/07/22 08:25:56
|
|
Line 1
|
Line 1
|
| .TH PCREMATCHING 3 | .TH PCREMATCHING 3 "08 January 2012" "PCRE 8.30" |
| .SH NAME |
.SH NAME |
| PCRE - Perl-compatible regular expressions |
PCRE - Perl-compatible regular expressions |
| .SH "PCRE MATCHING ALGORITHMS" |
.SH "PCRE MATCHING ALGORITHMS" |
|
Line 6 PCRE - Perl-compatible regular expressions
|
Line 6 PCRE - Perl-compatible regular expressions
|
| .sp |
.sp |
| This document describes the two different algorithms that are available in PCRE |
This document describes the two different algorithms that are available in PCRE |
| for matching a compiled regular expression against a given subject string. The |
for matching a compiled regular expression against a given subject string. The |
| "standard" algorithm is the one provided by the \fBpcre_exec()\fP function. | "standard" algorithm is the one provided by the \fBpcre_exec()\fP, |
| This works in the same way as Perl's matching function, and provides a | \fBpcre16_exec()\fP and \fBpcre32_exec()\fP functions. These work in the same |
| Perl-compatible matching operation. | way as Perl's matching function, and provide a Perl-compatible matching operation. |
| | The just-in-time (JIT) optimization that is described in the |
| | .\" HREF |
| | \fBpcrejit\fP |
| | .\" |
| | documentation is compatible with these functions. |
| .P |
.P |
| An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP function; | An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP, |
| this operates in a different way, and is not Perl-compatible. It has advantages | \fBpcre16_dfa_exec()\fP and \fBpcre32_dfa_exec()\fP functions; they operate in |
| | a different way, and are not Perl-compatible. This alternative has advantages |
| and disadvantages compared with the standard algorithm, and these are described |
and disadvantages compared with the standard algorithm, and these are described |
| below. |
below. |
| .P |
.P |
|
Line 28 is matched against the string
|
Line 34 is matched against the string
|
| there are three possible answers. The standard algorithm finds only one of |
there are three possible answers. The standard algorithm finds only one of |
| them, whereas the alternative algorithm finds all three. |
them, whereas the alternative algorithm finds all three. |
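The difference between "one answer" and "all answers" can be sketched outside PCRE. The example below is a minimal illustration, not PCRE code: it uses Python's `re` module (a backtracking matcher, standing in for the standard algorithm) and a brute-force loop that only emulates the result of the alternative algorithm. The pattern, the subject string, and the helper names `first_match` and `all_matches` are assumptions chosen for illustration.

```python
import re

# Hypothetical pattern and subject, chosen only for illustration.
PATTERN = r"<.*>"
SUBJECT = "<a> <b> <c>"


def first_match(pattern, subject):
    """Backtracking search anchored at offset 0: returns the first match found."""
    m = re.match(pattern, subject)
    return m.group(0) if m else None


def all_matches(pattern, subject):
    """Enumerate every match that starts at offset 0, longest first.

    This loop only reproduces the *set of answers*; it is not how
    pcre_dfa_exec() works internally.
    """
    compiled = re.compile(pattern)
    found = []
    for end in range(len(subject), -1, -1):
        # fullmatch(string, pos, endpos) requires the match to span [0, end).
        if compiled.fullmatch(subject, 0, end):
            found.append(subject[:end])
    return found


if __name__ == "__main__":
    print(first_match(PATTERN, SUBJECT))
    for answer in all_matches(PATTERN, SUBJECT):
        print(answer)
```

With this subject there are three strings starting at offset 0 that satisfy the pattern. The backtracking call reports only one of them (the longest, because the quantifier is greedy), while the enumeration reports all three.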
| . |
. |
| | . |
| .SH "REGULAR EXPRESSIONS AS TREES" |
.SH "REGULAR EXPRESSIONS AS TREES" |
| .rs |
.rs |
| .sp |
.sp |
|
Line 38 string (from a given starting point) can be thought of
|
Line 45 string (from a given starting point) can be thought of
|
| There are two ways to search a tree: depth-first and breadth-first, and these |
There are two ways to search a tree: depth-first and breadth-first, and these |
| correspond to the two matching algorithms provided by PCRE. |
correspond to the two matching algorithms provided by PCRE. |
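The two search orders can be sketched on a toy matcher. This is a simplified model, not PCRE's implementation: the token format (a character plus a "starred" flag, with "." meaning any character) and the function names are hypothetical. The breadth-first version here advances one search step at a time and reports shorter matches first, which differs from the order in which the alternative algorithm reports its results.

```python
def children(tokens, subject, state):
    """Return the child nodes of one node in the matching tree."""
    i, pos = state
    if i == len(tokens):
        return []
    ch, starred = tokens[i]
    can_eat = pos < len(subject) and (ch == "." or subject[pos] == ch)
    out = []
    if starred:
        if can_eat:
            out.append((i, pos + 1))   # greedy branch: consume one more character
        out.append((i + 1, pos))       # other branch: stop repeating
    elif can_eat:
        out.append((i + 1, pos + 1))
    return out


def depth_first(tokens, subject):
    """Follow one branch to its end, backtrack on failure, stop at the first match."""
    stack = [(0, 0)]
    while stack:
        state = stack.pop()
        if state[0] == len(tokens):
            return subject[:state[1]]
        # Reverse so that the first (greedy) child is explored first.
        stack.extend(reversed(children(tokens, subject, state)))
    return None


def breadth_first(tokens, subject):
    """Advance all live branches together and collect every match."""
    frontier, seen, matches = [(0, 0)], {(0, 0)}, []
    while frontier:
        next_frontier = []
        for state in frontier:
            if state[0] == len(tokens):
                matches.append(subject[:state[1]])
                continue
            for child in children(tokens, subject, state):
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return matches


if __name__ == "__main__":
    tokens = [("<", False), (".", True), (">", False)]  # models the pattern <.*>
    print(depth_first(tokens, "<a> <b>"))
    print(breadth_first(tokens, "<a> <b>"))
```

The depth-first search never needs to remember more than one path, which is why tracking captured substrings is straightforward for it. The breadth-first search keeps every live path at once and so finds all matches in a single pass.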
| . |
. |
| | . |
| .SH "THE STANDARD MATCHING ALGORITHM" |
.SH "THE STANDARD MATCHING ALGORITHM" |
| .rs |
.rs |
| .sp |
.sp |
|
Line 63 straightforward for this algorithm to keep track of th
|
Line 71 straightforward for this algorithm to keep track of th
|
| matched by portions of the pattern in parentheses. This provides support for |
matched by portions of the pattern in parentheses. This provides support for |
| capturing parentheses and back references. |
capturing parentheses and back references. |
| . |
. |
| | . |
| .SH "THE ALTERNATIVE MATCHING ALGORITHM" |
.SH "THE ALTERNATIVE MATCHING ALGORITHM" |
| .rs |
.rs |
| .sp |
.sp |
|
Line 131 and not on others), is not supported. It causes an err
|
Line 140 and not on others), is not supported. It causes an err
|
| 6. Callouts are supported, but the value of the \fIcapture_top\fP field is |
6. Callouts are supported, but the value of the \fIcapture_top\fP field is |
| always 1, and the value of the \fIcapture_last\fP field is always -1. |
always 1, and the value of the \fIcapture_last\fP field is always -1. |
| .P |
.P |
| 7. The \eC escape sequence, which (in the standard algorithm) matches a single | 7. The \eC escape sequence, which (in the standard algorithm) always matches a |
| byte, even in UTF-8 mode, is not supported in UTF-8 mode, because the | single data unit, even in UTF-8, UTF-16 or UTF-32 modes, is not supported in |
| alternative algorithm moves through the subject string one character at a time, | these modes, because the alternative algorithm moves through the subject string |
| for all active paths through the tree. | one character (not data unit) at a time, for all active paths through the tree. |
| .P |
.P |
| 8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not |
8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not |
| supported. (*FAIL) is supported, and behaves like a failing negative assertion. |
supported. (*FAIL) is supported, and behaves like a failing negative assertion. |
| . |
. |
| | . |
| .SH "ADVANTAGES OF THE ALTERNATIVE ALGORITHM" |
.SH "ADVANTAGES OF THE ALTERNATIVE ALGORITHM" |
| .rs |
.rs |
| .sp |
.sp |
|
Line 150 match using the standard algorithm, you have to do klu
|
Line 160 match using the standard algorithm, you have to do klu
|
| callouts. |
callouts. |
| .P |
.P |
| 2. Because the alternative algorithm scans the subject string just once, and |
2. Because the alternative algorithm scans the subject string just once, and |
| never needs to backtrack, it is possible to pass very long subject strings to | never needs to backtrack (except for lookbehinds), it is possible to pass very |
| the matching function in several pieces, checking for partial matching each | long subject strings to the matching function in several pieces, checking for |
| time. Although it is possible to do multi-segment matching using the standard | partial matching each time. Although it is possible to do multi-segment |
| algorithm (\fBpcre_exec()\fP), by retaining partially matched substrings, it is | matching using the standard algorithm by retaining partially matched |
| more complicated. The | substrings, it is more complicated. The |
| .\" HREF |
.\" HREF |
| \fBpcrepartial\fP |
\fBpcrepartial\fP |
| .\" |
.\" |
|
Line 191 Cambridge CB2 3QH, England.
|
Line 201 Cambridge CB2 3QH, England.
|
| .rs |
.rs |
| .sp |
.sp |
| .nf |
.nf |
| Last updated: 19 November 2011 | Last updated: 08 January 2012 |
| Copyright (c) 1997-2010 University of Cambridge. | Copyright (c) 1997-2012 University of Cambridge. |
| .fi |
.fi |