--- embedaddon/pcre/doc/html/pcreapi.html 2012/02/21 23:50:25 1.1.1.2
+++ embedaddon/pcre/doc/html/pcreapi.html 2012/10/09 09:19:18 1.1.1.3
@@ -317,7 +317,7 @@ PCRE supports five different conventions for indicatin
 strings: a single CR (carriage return) character, a single LF (linefeed) character, the two-character sequence CRLF, any of the three preceding, or any Unicode newline sequence. The Unicode newline sequences are the three just
-mentioned, plus the single characters VT (vertical tab, U+000B), FF (formfeed,
+mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed,
 U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029).

@@ -524,7 +524,7 @@ documentation). For those options that can be differen
 the pattern, the contents of the options argument specifies their settings at the start of compilation and execution. The PCRE_ANCHORED, PCRE_BSR_xxx, PCRE_NEWLINE_xxx, PCRE_NO_UTF8_CHECK, and
-PCRE_NO_START_OPT options can be set at the time of matching as well as at
+PCRE_NO_START_OPTIMIZE options can be set at the time of matching as well as at
 compile time.

@@ -641,8 +641,8 @@ documentation.

   PCRE_EXTENDED
 
-If this bit is set, whitespace data characters in the pattern are totally
-ignored except when escaped or inside a character class. Whitespace does not
+If this bit is set, white space data characters in the pattern are totally
+ignored except when escaped or inside a character class. White space does not
 include the VT character (code 11). In addition, characters between an unescaped # outside a character class and the next newline, inclusive, are also ignored. This is equivalent to Perl's /x option, and it can be changed within a
@@ -659,7 +659,7 @@ happen to represent a newline do not count.

 This option makes it possible to include comments inside complicated patterns.
-Note, however, that this applies only to data characters. Whitespace characters
+Note, however, that this applies only to data characters. White space characters
 may never appear within special character sequences in a pattern, for example within the sequence (?( that introduces a conditional subpattern.

@@ -745,7 +745,7 @@ CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies 
 preceding sequences should be recognized. Setting PCRE_NEWLINE_ANY specifies
 that any Unicode newline sequence should be recognized. The Unicode newline
 sequences are the three just mentioned, plus the single characters VT (vertical
-tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
+tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
 separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit
 library, the last two are recognized only in UTF-8 mode.
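The PCRE_NEWLINE_ANY convention described in this hunk can be illustrated with a small C sketch. It is not PCRE's implementation: the helper name newline_length is hypothetical, and the sketch assumes the subject has already been decoded into an array of Unicode code points (recall that the 8-bit library recognizes LS and PS only in UTF-8 mode). It reports how many code points the newline starting at position i occupies, treating CRLF as one two-character sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE. Returns the number of code points
   in the newline sequence that starts at s[i] under the "any Unicode
   newline" convention, or 0 if s[i] does not start a newline. */
static size_t newline_length(const uint32_t *s, size_t len, size_t i)
{
    if (i >= len)
        return 0;
    switch (s[i]) {
    case 0x000D:                 /* CR, possibly the start of CRLF */
        return (i + 1 < len && s[i + 1] == 0x000A) ? 2 : 1;
    case 0x000A:                 /* LF */
    case 0x000B:                 /* VT (vertical tab) */
    case 0x000C:                 /* FF (form feed) */
    case 0x0085:                 /* NEL (next line) */
    case 0x2028:                 /* LS (line separator) */
    case 0x2029:                 /* PS (paragraph separator) */
        return 1;
    default:
        return 0;
    }
}
```

Treating CR followed by LF as a single sequence, rather than as two separate newlines, is what keeps a CRLF-terminated line from appearing to end twice.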
 

@@ -759,7 +759,7 @@ other combinations may yield unused numbers and cause

 The only time that a line break in a pattern is specially recognized when
-compiling is when PCRE_EXTENDED is set. CR and LF are whitespace characters,
+compiling is when PCRE_EXTENDED is set. CR and LF are white space characters,
 and so are ignored in this mode. Also, an unescaped # outside a character class indicates a comment that lasts until after the next line break sequence. In other circumstances, line break sequences in patterns are treated as literal
@@ -916,6 +916,8 @@ fallen out of use. To avoid confusion, they have not b
 72 too many forward references
 73 disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
 74 invalid UTF-16 string (specifically UTF-16)
+ 75 name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
+ 76 character value in \u.... sequence is too large

 The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may be used if the limits were changed when PCRE was built.
@@ -949,12 +951,18 @@ wants to pass any of the other fields to pcre_exec(
 pcre_dfa_exec(), it must set up its own pcre_extra block.

-The second argument of pcre_study() contains option bits. There is only
-one option: PCRE_STUDY_JIT_COMPILE. If this is set, and the just-in-time
-compiler is available, the pattern is further compiled into machine code that
-executes much faster than the pcre_exec() matching function. If
-the just-in-time compiler is not available, this option is ignored. All other
-bits in the options argument must be zero.
+The second argument of pcre_study() contains option bits. There are three
+options:
+

+  PCRE_STUDY_JIT_COMPILE
+  PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
+  PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
+
+If any of these are set, and the just-in-time compiler is available, the
+pattern is further compiled into machine code that executes much faster than
+the pcre_exec() interpretive matching function. If the just-in-time
+compiler is not available, these options are ignored. All other bits in the
+options argument must be zero.

 JIT compilation is a heavyweight optimization. It can take some time for
@@ -979,8 +987,8 @@ When you are finished with a pattern, you can free the
 study data by calling pcre_free_study(). This function was added to the API for release 8.20. For earlier versions, the memory could be freed with pcre_free(), just like the pattern itself. This will still work in cases
-where PCRE_STUDY_JIT_COMPILE is not used, but it is advisable to change to the
-new function when convenient.
+where JIT optimization is not used, but it is advisable to change to the new
+function when convenient.

 This is a typical way in which pcre_study() is used (except that in a
@@ -1016,14 +1024,12 @@ matching. (In 16-bit mode, the bitmap is used for 16-b

 These two optimizations apply to both pcre_exec() and
-pcre_dfa_exec(). However, they are not used by pcre_exec() if
-pcre_study() is called with the PCRE_STUDY_JIT_COMPILE option, and
-just-in-time compiling is successful. The optimizations can be disabled by
-setting the PCRE_NO_START_OPTIMIZE option when calling pcre_exec() or
-pcre_dfa_exec(). You might want to do this if your pattern contains
-callouts or (*MARK) (which cannot be handled by the JIT compiler), and you want
-to make use of these facilities in cases where matching fails. See the
-discussion of PCRE_NO_START_OPTIMIZE
+pcre_dfa_exec(), and the information is also used by the JIT compiler.
+The optimizations can be disabled by setting the PCRE_NO_START_OPTIMIZE option
+when calling pcre_exec() or pcre_dfa_exec(), but if this is done,
+JIT execution is also disabled. You might want to do this if your pattern
+contains callouts or (*MARK) and you want to make use of these facilities in
+cases where matching fails. See the discussion of PCRE_NO_START_OPTIMIZE
 below.
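The start-of-match optimizations discussed in this hunk come from study data: a lower bound on the length of a matching subject and, when one can be computed, a table of bytes that can begin a match (the bitmap mentioned in the hunk header). The C sketch below shows how such data could let a matcher skip hopeless starting positions. All names (struct start_info, start_info_init, start_bitmap_set, next_candidate) are hypothetical, this is not PCRE's internal layout or code, and it deliberately ignores cases such as patterns that can match an empty string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical study data for illustration, not PCRE's internal layout. */
struct start_info {
    unsigned char bitmap[32];   /* bit b set: byte b may start a match */
    size_t min_length;          /* lower bound on match length (bytes here) */
};

static void start_info_init(struct start_info *si, size_t min_length)
{
    memset(si->bitmap, 0, sizeof si->bitmap);
    si->min_length = min_length;
}

static void start_bitmap_set(struct start_info *si, unsigned char b)
{
    si->bitmap[b >> 3] |= (unsigned char)(1u << (b & 7));
}

static int start_bitmap_test(const struct start_info *si, unsigned char b)
{
    return (si->bitmap[b >> 3] >> (b & 7)) & 1;
}

/* Return the first offset >= start at which a match could begin, or
   (size_t)-1 if there is none. Positions with too few bytes left, or whose
   byte is not in the bitmap, are skipped without running the matcher. */
static size_t next_candidate(const struct start_info *si,
                             const unsigned char *subject, size_t length,
                             size_t start)
{
    for (size_t i = start; i < length; i++) {
        if (length - i < si->min_length)
            break;                      /* not enough subject left */
        if (start_bitmap_test(si, subject[i]))
            return i;
    }
    return (size_t)-1;
}
```

Only the positions returned by next_candidate would be handed to the full, much more expensive matcher. Setting PCRE_NO_START_OPTIMIZE turns off this kind of skipping, which is why callouts and (*MARK) are then seen at every possible starting position.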


LOCALE SUPPORT
@@ -1199,20 +1205,19 @@ Return 1 if the (?J) or (?-J) option setting is used i
   PCRE_INFO_JIT
 
-Return 1 if the pattern was studied with the PCRE_STUDY_JIT_COMPILE option, and
+Return 1 if the pattern was studied with one of the JIT options, and
 just-in-time compiling was successful. The fourth argument should point to an int variable. A return value of 0 means that JIT support is not available
-in this version of PCRE, or that the pattern was not studied with the
-PCRE_STUDY_JIT_COMPILE option, or that the JIT compiler could not handle this
-particular pattern. See the
+in this version of PCRE, or that the pattern was not studied with a JIT option,
+or that the JIT compiler could not handle this particular pattern. See the
 pcrejit documentation for details of what can and cannot be handled.
   PCRE_INFO_JITSIZE
 
-If the pattern was successfully studied with the PCRE_STUDY_JIT_COMPILE option,
-return the size of the JIT compiled code, otherwise return zero. The fourth
-argument should point to a size_t variable.
+If the pattern was successfully studied with a JIT option, return the size of
+the JIT compiled code, otherwise return zero. The fourth argument should point
+to a size_t variable.
   PCRE_INFO_LASTLITERAL
 
@@ -1224,6 +1229,13 @@ only if it follows something of variable length. For e
 /^a\d+z\d+/ the returned value is "z", but for /^a\dz\d/ the returned value is -1.
+  PCRE_INFO_MAXLOOKBEHIND
+
+Return the number of characters (NB not bytes) in the longest lookbehind
+assertion in the pattern. Note that the simple assertions \b and \B require a
+one-character lookbehind. This information is useful when doing multi-segment
+matching using the partial matching facilities.
+
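Because PCRE_INFO_MAXLOOKBEHIND is counted in characters, an application doing multi-segment matching on UTF-8 data has to convert that count into a byte offset when deciding how much of the previous segment to keep. A minimal C sketch, assuming the buffer holds valid UTF-8 and that end lies on a character boundary; retain_start is a hypothetical helper, not a PCRE function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE. Assuming s[0..end) is valid UTF-8
   and end is on a character boundary, return the byte offset that lies
   n characters before end, or 0 if there are fewer than n characters. */
static size_t retain_start(const unsigned char *s, size_t end, size_t n)
{
    size_t pos = end;
    assert(s != NULL || end == 0);
    while (n > 0 && pos > 0) {
        pos--;
        /* Step back over continuation bytes (10xxxxxx) to the lead byte. */
        while (pos > 0 && (s[pos] & 0xC0) == 0x80)
            pos--;
        n--;
    }
    return pos;
}
```

If the longest lookbehind is n characters, the bytes from retain_start(buf, end, n) onward are the minimum that would need to be carried in front of the next segment; how that interacts with PCRE's partial-match return codes is a separate question this sketch does not address.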
   PCRE_INFO_MINLENGTH
 
 If the pattern was studied and a minimum length for matching subject strings
@@ -1439,22 +1451,22 @@ In the 16-bit version of this structure, the mark

-The flags field is a bitmap that specifies which of the other fields
-are set. The flag bits are:
+The flags field is used to specify which of the other fields are set. The
+flag bits are:

-  PCRE_EXTRA_STUDY_DATA
+  PCRE_EXTRA_CALLOUT_DATA
   PCRE_EXTRA_EXECUTABLE_JIT
+  PCRE_EXTRA_MARK
   PCRE_EXTRA_MATCH_LIMIT
   PCRE_EXTRA_MATCH_LIMIT_RECURSION
-  PCRE_EXTRA_CALLOUT_DATA
+  PCRE_EXTRA_STUDY_DATA
   PCRE_EXTRA_TABLES
-  PCRE_EXTRA_MARK
 
 Other flag bits should be set to zero. The study_data field and sometimes the executable_jit field are set in the pcre_extra block that is returned by pcre_study(), together with the appropriate flag bits. You
-should not set these yourself, but you may add to the block by setting the
-other fields and their corresponding flag bits.
+should not set these yourself, but you may add to the block by setting other
+fields and their corresponding flag bits.

 The match_limit field provides a means of preventing PCRE from using up a
@@ -1472,11 +1484,10 @@ in the subject string.

 When pcre_exec() is called with a pattern that was successfully studied
-with the PCRE_STUDY_JIT_COMPILE option, the way that the matching is executed
-is entirely different. However, there is still the possibility of runaway
-matching that goes on for a very long time, and so the match_limit value
-is also used in this case (but in a different way) to limit how long the
-matching can continue.
+with a JIT option, the way that the matching is executed is entirely different.
+However, there is still the possibility of runaway matching that goes on for a
+very long time, and so the match_limit value is also used in this case
+(but in a different way) to limit how long the matching can continue.

 The default value for the limit can be set when PCRE is built; the default
@@ -1497,8 +1508,7 @@ This limit is of use only if it is set smaller than

 The default value for match_limit_recursion can be set when PCRE is
@@ -1549,16 +1559,16 @@ Option bits for pcre_exec()
 The unused bits of the options argument for pcre_exec() must be zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,
-PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and
-PCRE_PARTIAL_HARD.
+PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, and
+PCRE_PARTIAL_SOFT.

-If the pattern was successfully studied with the PCRE_STUDY_JIT_COMPILE option,
-the only supported options for JIT execution are PCRE_NO_UTF8_CHECK,
-PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NOTEMPTY_ATSTART. Note in
-particular that partial matching is not supported. If an unsupported option is
-used, JIT execution is disabled and the normal interpretive code in
-pcre_exec() is run.
+If the pattern was successfully studied with one of the just-in-time (JIT)
+compile options, the only supported options for JIT execution are
+PCRE_NO_UTF8_CHECK, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,
+PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT. If an
+unsupported option is used, JIT execution is disabled and the normal
+interpretive code in pcre_exec() is run.
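The rule in this hunk amounts to a subset test on the pcre_exec() option bits: JIT code is used only if every option passed belongs to the supported set. The C sketch below models that test. The OPT_* bit values are hypothetical stand-ins (real code must use the PCRE_* constants from pcre.h), and jit_can_run is an illustrative name, not a PCRE function. Options outside the supported set, such as PCRE_ANCHORED and PCRE_NO_START_OPTIMIZE, cause a fallback to the interpreter; the PCRE_NEWLINE_xxx bits are not modeled here.

```c
#include <stdbool.h>

/* Hypothetical bit values for illustration only; these are NOT the values
   defined in pcre.h. */
enum {
    OPT_ANCHORED          = 1u << 0,
    OPT_NOTBOL            = 1u << 1,
    OPT_NOTEOL            = 1u << 2,
    OPT_NOTEMPTY          = 1u << 3,
    OPT_NOTEMPTY_ATSTART  = 1u << 4,
    OPT_NO_START_OPTIMIZE = 1u << 5,
    OPT_NO_UTF8_CHECK     = 1u << 6,
    OPT_PARTIAL_HARD      = 1u << 7,
    OPT_PARTIAL_SOFT      = 1u << 8
};

/* The exec-time options that, per the text, JIT execution supports. */
static const unsigned JIT_SUPPORTED =
    OPT_NO_UTF8_CHECK | OPT_NOTBOL | OPT_NOTEOL | OPT_NOTEMPTY |
    OPT_NOTEMPTY_ATSTART | OPT_PARTIAL_HARD | OPT_PARTIAL_SOFT;

/* True if JIT code exists and every requested option is supported by it;
   false means the interpretive matcher would run instead. */
static bool jit_can_run(bool have_jit_code, unsigned exec_options)
{
    return have_jit_code && (exec_options & ~JIT_SUPPORTED) == 0;
}
```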

   PCRE_ANCHORED
 
@@ -1681,7 +1691,8 @@ causing performance to suffer, but ensuring that in ca
 "no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK) are considered at every possible starting position in the subject string. If PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching
-time.
+time. The use of PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set,
+matching is always done interpretively.

 Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.
@@ -1715,9 +1726,11 @@ returned.
 When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8 string is automatically checked when pcre_exec() is subsequently called.
-The value of startoffset is also checked to ensure that it points to the
-start of a UTF-8 character. There is a discussion about the validity of UTF-8
-strings in the
+The entire string is checked before any other processing takes place. The value
+of startoffset is also checked to ensure that it points to the start of a
+UTF-8 character. There is a discussion about the
+validity of UTF-8 strings
+in the
 pcreunicode page. If an invalid sequence of bytes is found, pcre_exec() returns the error PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a
@@ -1868,7 +1881,7 @@ string that it matched that is returned.

 If the vector is too small to hold all the captured substring offsets, it is used as far as possible (up to two-thirds of its length), and the function
-returns a value of zero. If neither the actual string matched not any captured
+returns a value of zero. If neither the actual string matched nor any captured
 substrings are of interest, pcre_exec() may be called with ovector passed as NULL and ovecsize as zero. However, if the pattern contains back references and the ovector is not big enough to remember the related
@@ -2067,18 +2080,18 @@ time.

   PCRE_ERROR_JIT_STACKLIMIT (-27)
 
-This error is returned when a pattern that was successfully studied using the
-PCRE_STUDY_JIT_COMPILE option is being matched, but the memory available for
-the just-in-time processing stack is not large enough. See the
+This error is returned when a pattern that was successfully studied using a
+JIT compile option is being matched, but the memory available for the
+just-in-time processing stack is not large enough. See the
 pcrejit documentation for more details.
-  PCRE_ERROR_BADMODE (-28)
+  PCRE_ERROR_BADMODE        (-28)
 
This error is given if a pattern that was compiled by the 8-bit library is passed to a 16-bit library function, or vice versa.
-  PCRE_ERROR_BADENDIANNESS (-29)
+  PCRE_ERROR_BADENDIANNESS  (-29)
 
 This error is given if a pattern that was compiled and saved is reloaded on a host with different endianness. The utility function
@@ -2086,7 +2099,7 @@ host with different endianness. The utility function
 so that it runs on the new host.

-Error numbers -16 to -20 and -22 are not used by pcre_exec().
+Error numbers -16 to -20, -22, and -30 are not used by pcre_exec().


 Reason codes for invalid UTF-8 strings
@@ -2581,6 +2594,13 @@ When a recursive subpattern is processed, the matching
 recursively, using private vectors for ovector and workspace. This error is given if the output vector is not large enough. This should be extremely rare, as a vector of size 1000 is used.
+
+  PCRE_ERROR_DFA_BADRESTART (-30)
+
+When pcre_dfa_exec() is called with the PCRE_DFA_RESTART option,
+some plausibility checks are made on the contents of the workspace, which
+should contain data about the previous partial match. If any of these checks
+fail, this error is given.
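When logging return codes, an application might map the negative values documented in this diff to their symbolic names. A minimal C sketch; the numbers are those given in the text, rc_name_of and struct rc_name are hypothetical, and real code would compare against the PCRE_ERROR_* macros from pcre.h rather than numeric literals. Per the text, -30 is produced by pcre_dfa_exec(), never by pcre_exec().

```c
#include <stddef.h>

/* Hypothetical lookup table, not part of PCRE. The values are the ones
   listed in this section; production code should use the macros instead. */
struct rc_name {
    int code;
    const char *name;
};

static const struct rc_name rc_names[] = {
    { -27, "PCRE_ERROR_JIT_STACKLIMIT" },
    { -28, "PCRE_ERROR_BADMODE" },
    { -29, "PCRE_ERROR_BADENDIANNESS" },
    { -30, "PCRE_ERROR_DFA_BADRESTART" },
};

/* Return the symbolic name for rc, or NULL if this sketch does not cover it. */
static const char *rc_name_of(int rc)
{
    for (size_t i = 0; i < sizeof rc_names / sizeof rc_names[0]; i++) {
        if (rc_names[i].code == rc)
            return rc_names[i].name;
    }
    return NULL;
}
```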


SEE ALSO

@@ -2599,7 +2619,7 @@ Cambridge CB2 3QH, England.


REVISION

-Last updated: 21 January 2012
+Last updated: 17 June 2012
Copyright © 1997-2012 University of Cambridge.