--- embedaddon/pcre/doc/html/pcrepartial.html 2012/02/21 23:50:25 1.1.1.2
+++ embedaddon/pcre/doc/html/pcrepartial.html 2012/10/09 09:19:17 1.1.1.3
@@ -58,9 +58,19 @@ the details differ between the two types of matching f
are set, PCRE_PARTIAL_HARD takes precedence.

-Setting a partial matching option disables the use of any just-in-time code
-that was set up by studying the compiled pattern with the
-PCRE_STUDY_JIT_COMPILE option. It also disables two of PCRE's standard
+If you want to use partial matching with just-in-time optimized code, you must
+call pcre_study() or pcre16_study() with one or both of these
+options:
+

+  PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
+  PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
+
+PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non-partial
+matches on the same pattern. If the appropriate JIT study mode has not been set
+for a match, the interpretive matching code is used.
+

+

+Setting a partial matching option disables two of PCRE's standard
optimizations. PCRE remembers the last literal data unit in a pattern, and
abandons matching immediately if it is not present in the subject string. This
optimization cannot be used for a subject string that might match only
@@ -317,8 +327,8 @@ treat the end of a segment as the end of the subject w
At this stage, an application could discard the text preceding "23ja", add on
text from the next segment, and call the matching function again. Unlike the
-DFA matching functions the entire matching string must always be available, and
-the complete matching process occurs for each call, so more memory and more
+DFA matching functions, the entire matching string must always be available,
+and the complete matching process occurs for each call, so more memory and more
processing time is needed.

@@ -326,6 +336,8 @@ processing time is needed.
with \b or \B, the string that is returned for a partial match includes
characters that precede the partially matched string itself, because these must
be retained when adding on more characters for a subsequent matching attempt.
+However, in some cases you may need to retain even earlier characters, as
+discussed in the next section.


ISSUES WITH MULTI-SEGMENT MATCHING

@@ -340,15 +352,33 @@ doing multi-segment matching you should be using PCRE_
includes the effect of PCRE_NOTEOL.

-2. Lookbehind assertions at the start of a pattern are catered for in the
-offsets that are returned for a partial match. However, in theory, a lookbehind
-assertion later in the pattern could require even earlier characters to be
-inspected, and it might not have been reached when a partial match occurs. This
-is probably an extremely unlikely case; you could guard against it to a certain
-extent by always including extra characters at the start.
+2. Lookbehind assertions that have already been obeyed are catered for in the
+offsets that are returned for a partial match. However a lookbehind assertion
+later in the pattern could require even earlier characters to be inspected. You
+can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the
+pcre_fullinfo() or pcre16_fullinfo() functions to obtain the length
+of the largest lookbehind in the pattern. This length is given in characters,
+not bytes. If you always retain at least that many characters before the
+partially matched string, all should be well. (Of course, near the start of the
+subject, fewer characters may be present; in that case all characters should be
+retained.)

-3. Matching a subject string that is split into multiple segments may not
+3. Because a partial match must always contain at least one character, what
+might be considered a partial match of an empty string actually gives a "no
+match" result. For example:
+

+    re> /c(?<=abc)x/
+  data> ab\P
+  No match
+
+If the next segment begins "cx", a match should be found, but this will only
+happen if characters from the previous segment are retained. For this reason, a
+"no match" result should be interpreted as "partial match of an empty string"
+when the pattern contains lookbehinds.
+

+

+4. Matching a subject string that is split into multiple segments may not
always produce exactly the same result as matching over one single long string,
especially when PCRE_PARTIAL_SOFT is used. The section "Partial Matching and
Word Boundaries" above describes an issue that arises if the pattern ends with
@@ -390,7 +420,7 @@ multi-segment data. The example above then behaves dif
  data> gsb\R\P\P\D
  Partial match: gsb
-4. Patterns that contain alternatives at the top level which do not all start
+5. Patterns that contain alternatives at the top level which do not all start
with the same pattern item may not work as expected when PCRE_DFA_RESTART is
used. For example, consider this pattern:

@@ -435,7 +465,7 @@ Cambridge CB2 3QH, England.
 


REVISION

-Last updated: 21 January 2012
+Last updated: 24 February 2012
Copyright © 1997-2012 University of Cambridge.
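
Note on the revised items 2 and 3 in the diff above: the new text says to keep at least PCRE_INFO_MAXLOOKBEHIND characters before the partially matched string, and to read a "no match" result as a partial match of an empty string when the pattern contains lookbehinds. The bookkeeping this implies might be sketched as below. The helper name retain_from() and its parameters are hypothetical and are not part of PCRE. The sketch assumes the 8-bit library without UTF-8 mode, so that one character is one byte, and it assumes the caller passes the buffer length as partial_start in the "no match" case.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a PCRE function).
 *
 * Returns the offset in the current buffer from which text should be kept
 * before the next segment is appended.
 *
 *   buf_len         length of the current buffer, in bytes
 *   partial_start   start offset of the partial match (ovector[0]); for a
 *                   "no match" result the caller is assumed to pass buf_len,
 *                   i.e. an empty partial match at the end of the buffer
 *   max_lookbehind  value obtained via PCRE_INFO_MAXLOOKBEHIND
 *
 * Assumption: 8-bit, non-UTF mode, so characters and bytes coincide.
 */
size_t retain_from(size_t buf_len, size_t partial_start,
                   size_t max_lookbehind)
{
    /* Clamp a stray offset to the end of the buffer. */
    if (partial_start > buf_len) {
        partial_start = buf_len;
    }

    /* Near the start of the subject fewer characters are present,
       so everything is retained. */
    if (partial_start <= max_lookbehind) {
        return 0;
    }

    return partial_start - max_lookbehind;
}
```

A caller would move the bytes from the returned offset to the end of the buffer up to the front, append the next segment, and call the matching function again. In UTF-8 or 16-bit mode the lookbehind length is in characters, so the subtraction would have to step back over whole characters instead.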