version 1.1.1.2, 2012/02/21 23:50:25
|
version 1.1.1.3, 2012/10/09 09:19:17
|
Line 58 the details differ between the two types of matching f
|
Line 58 the details differ between the two types of matching f
|
are set, PCRE_PARTIAL_HARD takes precedence. |
are set, PCRE_PARTIAL_HARD takes precedence. |
</P> |
</P> |
<P> |
<P> |
Setting a partial matching option disables the use of any just-in-time code | If you want to use partial matching with just-in-time optimized code, you must |
that was set up by studying the compiled pattern with the | call <b>pcre_study()</b> or <b>pcre16_study()</b> with one or both of these |
PCRE_STUDY_JIT_COMPILE option. It also disables two of PCRE's standard | options: |
| <pre> |
| PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE |
| PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE |
| </pre> |
| PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non-partial |
| matches on the same pattern. If the appropriate JIT study mode has not been set |
| for a match, the interpretive matching code is used. |
| </P> |
| <P> |
| Setting a partial matching option disables two of PCRE's standard |
optimizations. PCRE remembers the last literal data unit in a pattern, and |
optimizations. PCRE remembers the last literal data unit in a pattern, and |
abandons matching immediately if it is not present in the subject string. This |
abandons matching immediately if it is not present in the subject string. This |
optimization cannot be used for a subject string that might match only |
optimization cannot be used for a subject string that might match only |
Line 317 treat the end of a segment as the end of the subject w
|
Line 327 treat the end of a segment as the end of the subject w
|
</pre> |
</pre> |
At this stage, an application could discard the text preceding "23ja", add on |
At this stage, an application could discard the text preceding "23ja", add on |
text from the next segment, and call the matching function again. Unlike the |
text from the next segment, and call the matching function again. Unlike the |
DFA matching functions the entire matching string must always be available, and | DFA matching functions, the entire matching string must always be available, |
the complete matching process occurs for each call, so more memory and more | and the complete matching process occurs for each call, so more memory and more |
processing time is needed. |
processing time is needed. |
</P> |
</P> |
<P> |
<P> |
Line 326 processing time is needed.
|
Line 336 processing time is needed.
|
with \b or \B, the string that is returned for a partial match includes |
with \b or \B, the string that is returned for a partial match includes |
characters that precede the partially matched string itself, because these must |
characters that precede the partially matched string itself, because these must |
be retained when adding on more characters for a subsequent matching attempt. |
be retained when adding on more characters for a subsequent matching attempt. |
|
However, in some cases you may need to retain even earlier characters, as |
|
discussed in the next section. |
</P> |
</P> |
<br><a name="SEC9" href="#TOC1">ISSUES WITH MULTI-SEGMENT MATCHING</a><br> |
<br><a name="SEC9" href="#TOC1">ISSUES WITH MULTI-SEGMENT MATCHING</a><br> |
<P> |
<P> |
Line 340 doing multi-segment matching you should be using PCRE_
|
Line 352 doing multi-segment matching you should be using PCRE_
|
includes the effect of PCRE_NOTEOL. |
includes the effect of PCRE_NOTEOL. |
</P> |
</P> |
<P> |
<P> |
2. Lookbehind assertions at the start of a pattern are catered for in the | 2. Lookbehind assertions that have already been obeyed are catered for in the |
offsets that are returned for a partial match. However, in theory, a lookbehind | offsets that are returned for a partial match. However a lookbehind assertion |
assertion later in the pattern could require even earlier characters to be | later in the pattern could require even earlier characters to be inspected. You |
inspected, and it might not have been reached when a partial match occurs. This | can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the |
is probably an extremely unlikely case; you could guard against it to a certain | <b>pcre_fullinfo()</b> or <b>pcre16_fullinfo()</b> functions to obtain the length |
extent by always including extra characters at the start. | of the largest lookbehind in the pattern. This length is given in characters, |
| not bytes. If you always retain at least that many characters before the |
| partially matched string, all should be well. (Of course, near the start of the |
| subject, fewer characters may be present; in that case all characters should be |
| retained.) |
</P> |
</P> |
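The retention rule in the revised item 2 can be sketched in C as follows. This is a minimal illustration, not PCRE code: `retain_from` is a hypothetical helper, and it assumes that both the partial-match start offset and the PCRE_INFO_MAXLOOKBEHIND value are counted in characters (the two coincide with byte offsets only in non-UTF 8-bit mode).

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the PCRE API).
 *
 * partial_start:  offset of the partially matched string in the current
 *                 subject, in characters.
 * max_lookbehind: length of the largest lookbehind in the pattern, in
 *                 characters, as an application might obtain it through
 *                 PCRE_INFO_MAXLOOKBEHIND.
 *
 * Returns the offset of the first character that should be kept when the
 * next segment is appended. Near the start of the subject there may be
 * fewer than max_lookbehind characters available, in which case
 * everything is kept (offset 0).
 */
size_t retain_from(size_t partial_start, size_t max_lookbehind)
{
    if (partial_start > max_lookbehind)
        return partial_start - max_lookbehind;
    return 0;
}
```

For instance, with a partial match starting at offset 10 and a largest lookbehind of 3 characters, the application would keep the subject from offset 7 onwards. With a partial match starting at offset 2, it would keep the whole subject.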
<P> |
<P> |
3. Matching a subject string that is split into multiple segments may not | 3. Because a partial match must always contain at least one character, what |
| might be considered a partial match of an empty string actually gives a "no |
| match" result. For example: |
| <pre> |
| re> /c(?<=abc)x/ |
| data> ab\P |
| No match |
| </pre> |
| If the next segment begins "cx", a match should be found, but this will only |
| happen if characters from the previous segment are retained. For this reason, a |
| "no match" result should be interpreted as "partial match of an empty string" |
| when the pattern contains lookbehinds. |
| </P> |
| <P> |
| 4. Matching a subject string that is split into multiple segments may not |
always produce exactly the same result as matching over one single long string, |
always produce exactly the same result as matching over one single long string, |
especially when PCRE_PARTIAL_SOFT is used. The section "Partial Matching and |
especially when PCRE_PARTIAL_SOFT is used. The section "Partial Matching and |
Word Boundaries" above describes an issue that arises if the pattern ends with |
Word Boundaries" above describes an issue that arises if the pattern ends with |
Line 390 multi-segment data. The example above then behaves dif
|
Line 420 multi-segment data. The example above then behaves dif
|
data> gsb\R\P\P\D |
data> gsb\R\P\P\D |
Partial match: gsb |
Partial match: gsb |
</pre> |
</pre> |
4. Patterns that contain alternatives at the top level which do not all start | 5. Patterns that contain alternatives at the top level which do not all start |
with the same pattern item may not work as expected when PCRE_DFA_RESTART is |
with the same pattern item may not work as expected when PCRE_DFA_RESTART is |
used. For example, consider this pattern: |
used. For example, consider this pattern: |
<pre> |
<pre> |
Line 435 Cambridge CB2 3QH, England.
|
Line 465 Cambridge CB2 3QH, England.
|
</P> |
</P> |
<br><a name="SEC11" href="#TOC1">REVISION</a><br> |
<br><a name="SEC11" href="#TOC1">REVISION</a><br> |
<P> |
<P> |
Last updated: 21 January 2012 | Last updated: 24 February 2012 |
<br> |
<br> |
Copyright © 1997-2012 University of Cambridge. |
Copyright © 1997-2012 University of Cambridge. |
<br> |
<br> |