--- embedaddon/pcre/doc/html/pcrepartial.html	2012/02/21 23:05:52	1.1.1.1
+++ embedaddon/pcre/doc/html/pcrepartial.html	2012/02/21 23:50:25	1.1.1.2
@@ -14,24 +14,24 @@ man page, in case the conversion went wrong.
 <br>
 <ul>
 <li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE</a>
-<li><a name="TOC2" href="#SEC2">PARTIAL MATCHING USING pcre_exec()</a>
-<li><a name="TOC3" href="#SEC3">PARTIAL MATCHING USING pcre_dfa_exec()</a>
+<li><a name="TOC2" href="#SEC2">PARTIAL MATCHING USING pcre_exec() OR pcre16_exec()</a>
+<li><a name="TOC3" href="#SEC3">PARTIAL MATCHING USING pcre_dfa_exec() OR pcre16_dfa_exec()</a>
 <li><a name="TOC4" href="#SEC4">PARTIAL MATCHING AND WORD BOUNDARIES</a>
 <li><a name="TOC5" href="#SEC5">FORMERLY RESTRICTED PATTERNS</a>
 <li><a name="TOC6" href="#SEC6">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>
-<li><a name="TOC7" href="#SEC7">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a>
-<li><a name="TOC8" href="#SEC8">MULTI-SEGMENT MATCHING WITH pcre_exec()</a>
+<li><a name="TOC7" href="#SEC7">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() OR pcre16_dfa_exec()</a>
+<li><a name="TOC8" href="#SEC8">MULTI-SEGMENT MATCHING WITH pcre_exec() OR pcre16_exec()</a>
 <li><a name="TOC9" href="#SEC9">ISSUES WITH MULTI-SEGMENT MATCHING</a>
 <li><a name="TOC10" href="#SEC10">AUTHOR</a>
 <li><a name="TOC11" href="#SEC11">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE</a><br>
 <P>
-In normal use of PCRE, if the subject string that is passed to
-<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> matches as far as it goes, but is
-too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There
-are circumstances where it might be helpful to distinguish this case from other
-cases in which there is no match.
+In normal use of PCRE, if the subject string that is passed to a matching
+function matches as far as it goes, but is too short to match the entire
+pattern, PCRE_ERROR_NOMATCH is returned. There are circumstances where it might
+be helpful to distinguish this case from other cases in which there is no
+match.
 </P>
 <P>
 Consider, for example, an application where a human is required to type in data
@@ -50,42 +50,41 @@ long and is not all available at once.
 </P>
 <P>
 PCRE supports partial matching by means of the PCRE_PARTIAL_SOFT and
-PCRE_PARTIAL_HARD options, which can be set when calling <b>pcre_exec()</b> or
-<b>pcre_dfa_exec()</b>. For backwards compatibility, PCRE_PARTIAL is a synonym
-for PCRE_PARTIAL_SOFT. The essential difference between the two options is
-whether or not a partial match is preferred to an alternative complete match,
-though the details differ between the two matching functions. If both options
+PCRE_PARTIAL_HARD options, which can be set when calling any of the matching
+functions. For backwards compatibility, PCRE_PARTIAL is a synonym for
+PCRE_PARTIAL_SOFT. The essential difference between the two options is whether
+or not a partial match is preferred to an alternative complete match, though
+the details differ between the two types of matching function. If both options
 are set, PCRE_PARTIAL_HARD takes precedence.
 </P>
 <P>
-Setting a partial matching option for <b>pcre_exec()</b> disables the use of any
-just-in-time code that was set up by calling <b>pcre_study()</b> with the
+Setting a partial matching option disables the use of any just-in-time code
+that was set up by studying the compiled pattern with the
 PCRE_STUDY_JIT_COMPILE option. It also disables two of PCRE's standard
-optimizations. PCRE remembers the last literal byte in a pattern, and abandons
-matching immediately if such a byte is not present in the subject string. This
+optimizations. PCRE remembers the last literal data unit in a pattern, and
+abandons matching immediately if it is not present in the subject string. This
 optimization cannot be used for a subject string that might match only
 partially. If the pattern was studied, PCRE knows the minimum length of a
 matching string, and does not bother to run the matching function on shorter
 strings. This optimization is also disabled for partial matching.
 </P>
-<br><a name="SEC2" href="#TOC1">PARTIAL MATCHING USING pcre_exec()</a><br>
+<br><a name="SEC2" href="#TOC1">PARTIAL MATCHING USING pcre_exec() OR pcre16_exec()</a><br>
 <P>
-A partial match occurs during a call to <b>pcre_exec()</b> when the end of the
-subject string is reached successfully, but matching cannot continue because
-more characters are needed. However, at least one character in the subject must
-have been inspected. This character need not form part of the final matched
-string; lookbehind assertions and the \K escape sequence provide ways of
-inspecting characters before the start of a matched substring. The requirement
-for inspecting at least one character exists because an empty string can always
-be matched; without such a restriction there would always be a partial match of
-an empty string at the end of the subject.
+A partial match occurs during a call to <b>pcre_exec()</b> or
+<b>pcre16_exec()</b> when the end of the subject string is reached successfully,
+but matching cannot continue because more characters are needed. However, at
+least one character in the subject must have been inspected. This character
+need not form part of the final matched string; lookbehind assertions and the
+\K escape sequence provide ways of inspecting characters before the start of a
+matched substring. The requirement for inspecting at least one character exists
+because an empty string can always be matched; without such a restriction there
+would always be a partial match of an empty string at the end of the subject.
 </P>
 <P>
-If there are at least two slots in the offsets vector when <b>pcre_exec()</b>
-returns with a partial match, the first slot is set to the offset of the
-earliest character that was inspected when the partial match was found. For
-convenience, the second offset points to the end of the subject so that a
-substring can easily be identified.
+If there are at least two slots in the offsets vector when a partial match is
+returned, the first slot is set to the offset of the earliest character that
+was inspected. For convenience, the second offset points to the end of the
+subject so that a substring can easily be identified.
 </P>
 <P>
 For the majority of patterns, the first offset identifies the start of the
@@ -105,13 +104,14 @@ What happens when a partial match is identified depend
 partial matching options are set.
 </P>
 <br><b>
-PCRE_PARTIAL_SOFT with pcre_exec()
+PCRE_PARTIAL_SOFT WITH pcre_exec() OR pcre16_exec()
 </b><br>
 <P>
-If PCRE_PARTIAL_SOFT is set when <b>pcre_exec()</b> identifies a partial match,
-the partial match is remembered, but matching continues as normal, and other
-alternatives in the pattern are tried. If no complete match can be found,
-<b>pcre_exec()</b> returns PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH.
+If PCRE_PARTIAL_SOFT is set when <b>pcre_exec()</b> or <b>pcre16_exec()</b>
+identifies a partial match, the partial match is remembered, but matching
+continues as normal, and other alternatives in the pattern are tried. If no
+complete match can be found, PCRE_ERROR_PARTIAL is returned instead of
+PCRE_ERROR_NOMATCH.
 </P>
 <P>
 This option is "soft" because it prefers a complete match over a partial match.
@@ -134,22 +134,25 @@ example, there are two partial matches, because "dog" 
 matches the second alternative.)
 </P>
 <br><b>
-PCRE_PARTIAL_HARD with pcre_exec()
+PCRE_PARTIAL_HARD WITH pcre_exec() OR pcre16_exec()
 </b><br>
 <P>
-If PCRE_PARTIAL_HARD is set for <b>pcre_exec()</b>, it returns
-PCRE_ERROR_PARTIAL as soon as a partial match is found, without continuing to
-search for possible complete matches. This option is "hard" because it prefers
-an earlier partial match over a later complete match. For this reason, the
-assumption is made that the end of the supplied subject string may not be the
-true end of the available data, and so, if \z, \Z, \b, \B, or $ are
-encountered at the end of the subject, the result is PCRE_ERROR_PARTIAL.
+If PCRE_PARTIAL_HARD is set for <b>pcre_exec()</b> or <b>pcre16_exec()</b>,
+PCRE_ERROR_PARTIAL is returned as soon as a partial match is found, without
+continuing to search for possible complete matches. This option is "hard"
+because it prefers an earlier partial match over a later complete match. For
+this reason, the assumption is made that the end of the supplied subject string
+may not be the true end of the available data, and so, if \z, \Z, \b, \B,
+or $ are encountered at the end of the subject, the result is
+PCRE_ERROR_PARTIAL, provided that at least one character in the subject has
+been inspected.
 </P>
 <P>
-Setting PCRE_PARTIAL_HARD also affects the way <b>pcre_exec()</b> checks UTF-8
-subject strings for validity. Normally, an invalid UTF-8 sequence causes the
-error PCRE_ERROR_BADUTF8. However, in the special case of a truncated UTF-8
-character at the end of the subject, PCRE_ERROR_SHORTUTF8 is returned when
+Setting PCRE_PARTIAL_HARD also affects the way UTF-8 and UTF-16
+subject strings are checked for validity. Normally, an invalid sequence
+causes the error PCRE_ERROR_BADUTF8 or PCRE_ERROR_BADUTF16. However, in the
+special case of a truncated character at the end of the subject,
+PCRE_ERROR_SHORTUTF8 or PCRE_ERROR_SHORTUTF16 is returned when
 PCRE_PARTIAL_HARD is set.
 </P>
 <br><b>
@@ -169,23 +172,23 @@ if the pattern is made ungreedy the result is differen
 <pre>
   /dog(sbody)??/
 </pre>
-In this case the result is always a complete match because <b>pcre_exec()</b>
-finds that first, and it never continues after finding a match. It might be
-easier to follow this explanation by thinking of the two patterns like this:
+In this case the result is always a complete match because that is found first,
+and matching never continues after finding a complete match. It might be easier
+to follow this explanation by thinking of the two patterns like this:
 <pre>
   /dog(sbody)?/    is the same as  /dogsbody|dog/
   /dog(sbody)??/   is the same as  /dog|dogsbody/
 </pre>
-The second pattern will never match "dogsbody" when <b>pcre_exec()</b> is
-used, because it will always find the shorter match first.
+The second pattern will never match "dogsbody", because it will always find the
+shorter match first.
 </P>
-<br><a name="SEC3" href="#TOC1">PARTIAL MATCHING USING pcre_dfa_exec()</a><br>
+<br><a name="SEC3" href="#TOC1">PARTIAL MATCHING USING pcre_dfa_exec() OR pcre16_dfa_exec()</a><br>
 <P>
-The <b>pcre_dfa_exec()</b> function moves along the subject string character by
-character, without backtracking, searching for all possible matches
-simultaneously. If the end of the subject is reached before the end of the
-pattern, there is the possibility of a partial match, again provided that at
-least one character has been inspected.
+The DFA functions move along the subject string character by character, without
+backtracking, searching for all possible matches simultaneously. If the end of
+the subject is reached before the end of the pattern, there is the possibility
+of a partial match, again provided that at least one character has been
+inspected.
 </P>
 <P>
 When PCRE_PARTIAL_SOFT is set, PCRE_ERROR_PARTIAL is returned only if there
@@ -196,16 +199,16 @@ partial match was found is set as the first matching s
 at least two slots in the offsets vector.
 </P>
 <P>
-Because <b>pcre_dfa_exec()</b> always searches for all possible matches, and
-there is no difference between greedy and ungreedy repetition, its behaviour is
-different from <b>pcre_exec</b> when PCRE_PARTIAL_HARD is set. Consider the
-string "dog" matched against the ungreedy pattern shown above:
+Because the DFA functions always search for all possible matches, and there is
+no difference between greedy and ungreedy repetition, their behaviour is
+different from the standard functions when PCRE_PARTIAL_HARD is set. Consider
+the string "dog" matched against the ungreedy pattern shown above:
 <pre>
   /dog(sbody)??/
 </pre>
-Whereas <b>pcre_exec()</b> stops as soon as it finds the complete match for
-"dog", <b>pcre_dfa_exec()</b> also finds the partial match for "dogsbody", and
-so returns that when PCRE_PARTIAL_HARD is set.
+Whereas the standard functions stop as soon as they find the complete match for
+"dog", the DFA functions also find the partial match for "dogsbody", and so
+return that when PCRE_PARTIAL_HARD is set.
 </P>
 <br><a name="SEC4" href="#TOC1">PARTIAL MATCHING AND WORD BOUNDARIES</a><br>
 <P>
@@ -217,23 +220,19 @@ results. Consider this pattern:
 </pre>
 This matches "cat", provided there is a word boundary at either end. If the
 subject string is "the cat", the comparison of the final "t" with a following
-character cannot take place, so a partial match is found. However,
-<b>pcre_exec()</b> carries on with normal matching, which matches \b at the end
-of the subject when the last character is a letter, thus finding a complete
-match. The result, therefore, is <i>not</i> PCRE_ERROR_PARTIAL. The same thing
-happens with <b>pcre_dfa_exec()</b>, because it also finds the complete match.
+character cannot take place, so a partial match is found. However, normal
+matching carries on, and \b matches at the end of the subject when the last
+character is a letter, so a complete match is found. The result, therefore, is
+<i>not</i> PCRE_ERROR_PARTIAL. Using PCRE_PARTIAL_HARD in this case does yield
+PCRE_ERROR_PARTIAL, because then the partial match takes precedence.
 </P>
-<P>
-Using PCRE_PARTIAL_HARD in this case does yield PCRE_ERROR_PARTIAL, because
-then the partial match takes precedence.
-</P>
 <br><a name="SEC5" href="#TOC1">FORMERLY RESTRICTED PATTERNS</a><br>
 <P>
 For releases of PCRE prior to 8.00, because of the way certain internal
 optimizations were implemented in the <b>pcre_exec()</b> function, the
 PCRE_PARTIAL option (predecessor of PCRE_PARTIAL_SOFT) could not be used with
 all patterns. From release 8.00 onwards, the restrictions no longer apply, and
-partial matching with <b>pcre_exec()</b> can be requested for any pattern.
+partial matching can be requested for any pattern.
 </P>
 <P>
 Items that were formerly restricted were repeated single characters and
@@ -265,22 +264,21 @@ that uses the date example quoted above:
 The first data string is matched completely, so <b>pcretest</b> shows the
 matched substrings. The remaining four strings do not match the complete
 pattern, but the first two are partial matches. Similar output is obtained
-when <b>pcre_dfa_exec()</b> is used.
+if DFA matching is used.
 </P>
 <P>
 If the escape sequence \P is present more than once in a <b>pcretest</b> data
 line, the PCRE_PARTIAL_HARD option is set for the match.
 </P>
-<br><a name="SEC7" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a><br>
+<br><a name="SEC7" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() OR pcre16_dfa_exec()</a><br>
 <P>
-When a partial match has been found using <b>pcre_dfa_exec()</b>, it is possible
-to continue the match by providing additional subject data and calling
-<b>pcre_dfa_exec()</b> again with the same compiled regular expression, this
-time setting the PCRE_DFA_RESTART option. You must pass the same working
-space as before, because this is where details of the previous partial match
-are stored. Here is an example using <b>pcretest</b>, using the \R escape
-sequence to set the PCRE_DFA_RESTART option (\D specifies the use of
-<b>pcre_dfa_exec()</b>):
+When a partial match has been found using a DFA matching function, it is
+possible to continue the match by providing additional subject data and calling
+the function again with the same compiled regular expression, this time setting
+the PCRE_DFA_RESTART option. You must pass the same working space as before,
+because this is where details of the previous partial match are stored. Here is
+an example using <b>pcretest</b>, using the \R escape sequence to set the
+PCRE_DFA_RESTART option (\D specifies the use of the DFA matching function):
 <pre>
     re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
   data&#62; 23ja\P\D
@@ -297,33 +295,35 @@ program to do that if it needs to.
 <P>
 You can set the PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with
 PCRE_DFA_RESTART to continue partial matching over multiple segments. This
-facility can be used to pass very long subject strings to
-<b>pcre_dfa_exec()</b>.
+facility can be used to pass very long subject strings to the DFA matching
+functions.
 </P>
-<br><a name="SEC8" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_exec()</a><br>
+<br><a name="SEC8" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_exec() OR pcre16_exec()</a><br>
 <P>
-From release 8.00, <b>pcre_exec()</b> can also be used to do multi-segment
-matching. Unlike <b>pcre_dfa_exec()</b>, it is not possible to restart the
-previous match with a new segment of data. Instead, new data must be added to
-the previous subject string, and the entire match re-run, starting from the
-point where the partial match occurred. Earlier data can be discarded. It is
-best to use PCRE_PARTIAL_HARD in this situation, because it does not treat the
-end of a segment as the end of the subject when matching \z, \Z, \b, \B,
-and $. Consider an unanchored pattern that matches dates:
+From release 8.00, the standard matching functions can also be used to do
+multi-segment matching. Unlike the DFA functions, it is not possible to
+restart the previous match with a new segment of data. Instead, new data must
+be added to the previous subject string, and the entire match re-run, starting
+from the point where the partial match occurred. Earlier data can be discarded.
+</P>
+<P>
+It is best to use PCRE_PARTIAL_HARD in this situation, because it does not
+treat the end of a segment as the end of the subject when matching \z, \Z,
+\b, \B, and $. Consider an unanchored pattern that matches dates:
 <pre>
     re&#62; /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
   data&#62; The date is 23ja\P\P
   Partial match: 23ja
 </pre>
 At this stage, an application could discard the text preceding "23ja", add on
-text from the next segment, and call <b>pcre_exec()</b> again. Unlike
-<b>pcre_dfa_exec()</b>, the entire matching string must always be available, and
+text from the next segment, and call the matching function again. Unlike the
+DFA matching functions, the entire matching string must always be available, and
 the complete matching process occurs for each call, so more memory and more
 processing time is needed.
 </P>
 <P>
 <b>Note:</b> If the pattern contains lookbehind assertions, or \K, or starts
-with \b or \B, the string that is returned for a partial match will include
+with \b or \B, the string that is returned for a partial match includes
 characters that precede the partially matched string itself, because these must
 be retained when adding on more characters for a subsequent matching attempt.
 </P>
@@ -369,14 +369,14 @@ longer possible. Consider again this <b>pcretest</b> e
    0: dogsbody
    1: dog
 </pre>
-The first data line passes the string "dogsb" to <b>pcre_exec()</b>, setting the
-PCRE_PARTIAL_SOFT option. Although the string is a partial match for
-"dogsbody", the result is not PCRE_ERROR_PARTIAL, because the shorter string
-"dog" is a complete match. Similarly, when the subject is presented to
-<b>pcre_dfa_exec()</b> in several parts ("do" and "gsb" being the first two) the
-match stops when "dog" has been found, and it is not possible to continue. On
-the other hand, if "dogsbody" is presented as a single string,
-<b>pcre_dfa_exec()</b> finds both matches.
+The first data line passes the string "dogsb" to a standard matching function,
+setting the PCRE_PARTIAL_SOFT option. Although the string is a partial match
+for "dogsbody", the result is not PCRE_ERROR_PARTIAL, because the shorter
+string "dog" is a complete match. Similarly, when the subject is presented to
+a DFA matching function in several parts ("do" and "gsb" being the first two)
+the match stops when "dog" has been found, and it is not possible to continue.
+On the other hand, if "dogsbody" is presented as a single string, a DFA
+matching function finds both matches.
 </P>
 <P>
 Because of these problems, it is best to use PCRE_PARTIAL_HARD when matching
@@ -390,10 +390,9 @@ multi-segment data. The example above then behaves dif
   data&#62; gsb\R\P\P\D
   Partial match: gsb
 </pre>
-4. Patterns that contain alternatives at the top level which do not all
-start with the same pattern item may not work as expected when
-PCRE_DFA_RESTART is used with <b>pcre_dfa_exec()</b>. For example, consider this
-pattern:
+4. Patterns that contain alternatives at the top level which do not all start
+with the same pattern item may not work as expected when PCRE_DFA_RESTART is
+used. For example, consider this pattern:
 <pre>
   1234|3789
 </pre>
@@ -409,8 +408,8 @@ patterns or patterns such as:
   1234|ABCD
 </pre>
 where no string can be a partial match for both alternatives. This is not a
-problem if <b>pcre_exec()</b> is used, because the entire match has to be rerun
-each time:
+problem if a standard matching function is used, because the entire match has
+to be rerun each time:
 <pre>
     re&#62; /1234|3789/
   data&#62; ABC123\P\P
@@ -419,7 +418,7 @@ each time:
    0: 3789
 </pre>
 Of course, instead of using PCRE_DFA_RESTART, the same technique of re-running
-the entire match can also be used with <b>pcre_dfa_exec()</b>. Another
+the entire match can also be used with the DFA matching functions. Another
 possibility is to work with two buffers. If a partial match at offset <i>n</i>
 in the first buffer is followed by "no match" when PCRE_DFA_RESTART is used on
 the second buffer, you can then try a new match starting at offset <i>n+1</i> in
@@ -436,9 +435,9 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC11" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 26 August 2011
+Last updated: 21 January 2012
 <br>
-Copyright &copy; 1997-2011 University of Cambridge.
+Copyright &copy; 1997-2012 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.