version 1.1.1.3, 2013/07/22 08:25:57
|
version 1.1.1.4, 2014/06/15 19:46:05
|
Line 77 independent groups).
|
Line 77 independent groups).
|
Automatic callouts can be used for tracking the progress of pattern matching. |
Automatic callouts can be used for tracking the progress of pattern matching. |
The |
The |
<a href="pcretest.html"><b>pcretest</b></a> |
<a href="pcretest.html"><b>pcretest</b></a> |
command has an option that sets automatic callouts; when it is used, the output | program has a pattern qualifier (/C) that sets automatic callouts; when it is |
indicates how the pattern is matched. This is useful information when you are | used, the output indicates how the pattern is being matched. This is useful |
trying to optimize the performance of a particular pattern. | information when you are trying to optimize the performance of a particular |
| pattern. |
</P> |
</P> |
<br><a name="SEC3" href="#TOC1">MISSING CALLOUTS</a><br> |
<br><a name="SEC3" href="#TOC1">MISSING CALLOUTS</a><br> |
<P> |
<P> |
You should be aware that, because of optimizations in the way PCRE matches | You should be aware that, because of optimizations in the way PCRE compiles and |
patterns by default, callouts sometimes do not happen. For example, if the | matches patterns, callouts sometimes do not happen exactly as you might expect. |
pattern is | </P> |
| <P> |
| At compile time, PCRE "auto-possessifies" repeated items when it knows that |
| what follows cannot be part of the repeat. For example, a+[bc] is compiled as |
| if it were a++[bc]. The <b>pcretest</b> output when this pattern is anchored and |
| then applied with automatic callouts to the string "aaaa" is: |
<pre> |
<pre> |
|
--->aaaa |
|
+0 ^ ^ |
|
+1 ^ a+ |
|
+3 ^ ^ [bc] |
|
No match |
|
</pre> |
|
This indicates that when matching [bc] fails, there is no backtracking into a+ |
|
and therefore the callouts that would be taken for the backtracks do not occur. |
|
You can disable the auto-possessify feature by passing PCRE_NO_AUTO_POSSESS |
|
to <b>pcre_compile()</b>, or starting the pattern with (*NO_AUTO_POSSESS). If |
|
this is done in <b>pcretest</b> (using the /O qualifier), the output changes to |
|
this: |
|
<pre> |
|
--->aaaa |
|
+0 ^ ^ |
|
+1 ^ a+ |
|
+3 ^ ^ [bc] |
|
+3 ^ ^ [bc] |
|
+3 ^ ^ [bc] |
|
+3 ^^ [bc] |
|
No match |
|
</pre> |
|
This time, when matching [bc] fails, the matcher backtracks into a+ and tries |
|
again, repeatedly, until a+ itself fails. |
|
</P> |
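The difference between the two outputs above can be reproduced with a small, self-contained sketch. It does not call PCRE. The helper `count_bc_attempts` and its `possessive` flag are hypothetical names introduced only for illustration, and the code models just the anchored pattern a+[bc] under the assumption that each attempt at [bc] corresponds to one "+3" callout line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE. Counts how many times the [bc]
   item would be tried when a+[bc] is matched at the start of subject.
   possessive != 0 models a++[bc] (auto-possessified);
   possessive == 0 models a+[bc] with ordinary backtracking. */
int count_bc_attempts(const char *subject, int possessive)
{
    size_t run = 0;      /* length of the leading run of 'a' characters */
    size_t n;
    int attempts = 0;

    assert(subject != NULL);

    while (subject[run] == 'a')
        run++;
    if (run == 0)
        return 0;        /* a+ fails immediately, so [bc] is never reached */

    /* Try [bc] after the longest run first, then give back one 'a' at a time. */
    for (n = run; n >= 1; n--) {
        attempts++;
        if (subject[n] == 'b' || subject[n] == 'c')
            break;       /* [bc] matched, no further attempts are needed */
        if (possessive)
            break;       /* a++ never gives characters back */
    }
    return attempts;
}
```

For the subject "aaaa" this sketch reports one attempt in the possessive case and four in the backtracking case, which agrees with the number of "+3" lines in the two outputs shown above.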
|
<P> |
|
Other optimizations that provide fast "no match" results also affect callouts. |
|
For example, if the pattern is |
|
<pre> |
ab(?C4)cd |
ab(?C4)cd |
</pre> |
</pre> |
PCRE knows that any matching string must contain the letter "d". If the subject |
PCRE knows that any matching string must contain the letter "d". If the subject |
Line 109 callouts such as the example above are obeyed.
|
Line 144 callouts such as the example above are obeyed.
|
<br><a name="SEC4" href="#TOC1">THE CALLOUT INTERFACE</a><br> |
<br><a name="SEC4" href="#TOC1">THE CALLOUT INTERFACE</a><br> |
<P> |
<P> |
During matching, when PCRE reaches a callout point, the external function |
During matching, when PCRE reaches a callout point, the external function |
defined by <i>pcre_callout</i> or <i>pcre[16|32]_callout</i> is called | defined by <i>pcre_callout</i> or <i>pcre[16|32]_callout</i> is called (if it is |
(if it is set). This applies to both normal and DFA matching. The only | set). This applies to both normal and DFA matching. The only argument to the |
argument to the callout function is a pointer to a <b>pcre_callout</b> | callout function is a pointer to a <b>pcre_callout</b> or |
or <b>pcre[16|32]_callout</b> block. | <b>pcre[16|32]_callout</b> block. These structures contain the following |
These structures contain the following fields: | fields: |
<pre> |
<pre> |
int <i>version</i>; |
int <i>version</i>; |
int <i>callout_number</i>; |
int <i>callout_number</i>; |
Line 242 Cambridge CB2 3QH, England.
|
Line 277 Cambridge CB2 3QH, England.
|
</P> |
</P> |
<br><a name="SEC7" href="#TOC1">REVISION</a><br> |
<br><a name="SEC7" href="#TOC1">REVISION</a><br> |
<P> |
<P> |
Last updated: 03 March 2013 | Last updated: 12 November 2013 |
<br> |
<br> |
Copyright © 1997-2013 University of Cambridge. |
Copyright © 1997-2013 University of Cambridge. |
<br> |
<br> |