--- embedaddon/pcre/doc/html/pcretest.html	2012/10/09 09:19:18	1.1.1.3
+++ embedaddon/pcre/doc/html/pcretest.html	2013/07/22 08:25:57	1.1.1.4
@@ -14,21 +14,22 @@ man page, in case the conversion went wrong.
 <br>
 <ul>
 <li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
-<li><a name="TOC2" href="#SEC2">PCRE's 8-BIT and 16-BIT LIBRARIES</a>
-<li><a name="TOC3" href="#SEC3">COMMAND LINE OPTIONS</a>
-<li><a name="TOC4" href="#SEC4">DESCRIPTION</a>
-<li><a name="TOC5" href="#SEC5">PATTERN MODIFIERS</a>
-<li><a name="TOC6" href="#SEC6">DATA LINES</a>
-<li><a name="TOC7" href="#SEC7">THE ALTERNATIVE MATCHING FUNCTION</a>
-<li><a name="TOC8" href="#SEC8">DEFAULT OUTPUT FROM PCRETEST</a>
-<li><a name="TOC9" href="#SEC9">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
-<li><a name="TOC10" href="#SEC10">RESTARTING AFTER A PARTIAL MATCH</a>
-<li><a name="TOC11" href="#SEC11">CALLOUTS</a>
-<li><a name="TOC12" href="#SEC12">NON-PRINTING CHARACTERS</a>
-<li><a name="TOC13" href="#SEC13">SAVING AND RELOADING COMPILED PATTERNS</a>
-<li><a name="TOC14" href="#SEC14">SEE ALSO</a>
-<li><a name="TOC15" href="#SEC15">AUTHOR</a>
-<li><a name="TOC16" href="#SEC16">REVISION</a>
+<li><a name="TOC2" href="#SEC2">INPUT DATA FORMAT</a>
+<li><a name="TOC3" href="#SEC3">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
+<li><a name="TOC4" href="#SEC4">COMMAND LINE OPTIONS</a>
+<li><a name="TOC5" href="#SEC5">DESCRIPTION</a>
+<li><a name="TOC6" href="#SEC6">PATTERN MODIFIERS</a>
+<li><a name="TOC7" href="#SEC7">DATA LINES</a>
+<li><a name="TOC8" href="#SEC8">THE ALTERNATIVE MATCHING FUNCTION</a>
+<li><a name="TOC9" href="#SEC9">DEFAULT OUTPUT FROM PCRETEST</a>
+<li><a name="TOC10" href="#SEC10">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
+<li><a name="TOC11" href="#SEC11">RESTARTING AFTER A PARTIAL MATCH</a>
+<li><a name="TOC12" href="#SEC12">CALLOUTS</a>
+<li><a name="TOC13" href="#SEC13">NON-PRINTING CHARACTERS</a>
+<li><a name="TOC14" href="#SEC14">SAVING AND RELOADING COMPILED PATTERNS</a>
+<li><a name="TOC15" href="#SEC15">SEE ALSO</a>
+<li><a name="TOC16" href="#SEC16">AUTHOR</a>
+<li><a name="TOC17" href="#SEC17">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
 <P>
@@ -43,38 +44,75 @@ details of the regular expressions themselves, see the
 documentation. For details of the PCRE library function calls and their
 options, see the
 <a href="pcreapi.html"><b>pcreapi</b></a>
-and
+,
 <a href="pcre16.html"><b>pcre16</b></a>
-documentation. The input for <b>pcretest</b> is a sequence of regular expression
-patterns and strings to be matched, as described below. The output shows the
-result of each match. Options on the command line and the patterns control PCRE
-options and exactly what is output.
+and
+<a href="pcre32.html"><b>pcre32</b></a>
+documentation.
 </P>
-<br><a name="SEC2" href="#TOC1">PCRE's 8-BIT and 16-BIT LIBRARIES</a><br>
 <P>
+The input for <b>pcretest</b> is a sequence of regular expression patterns and
+strings to be matched, as described below. The output shows the result of each
+match. Options on the command line and the patterns control PCRE options and
+exactly what is output.
+</P>
+<P>
+As PCRE has evolved, it has acquired many different features, and as a result,
+<b>pcretest</b> now has rather a lot of obscure options for testing every
+possible feature. Some of these options are specifically designed for use in
+conjunction with the test script and data files that are distributed as part of
+PCRE, and are unlikely to be of use otherwise. They are all documented here,
+but without much justification.
+</P>
+<br><a name="SEC2" href="#TOC1">INPUT DATA FORMAT</a><br>
+<P>
+Input to <b>pcretest</b> is processed line by line, either by calling the C
+library's <b>fgets()</b> function, or via the <b>libreadline</b> library (see
+below). In Unix-like environments, <b>fgets()</b> treats any bytes other than
+newline as data characters. However, in some Windows environments character 26
+(hex 1A) causes an immediate end of file, and no further data is read. For
+maximum portability, therefore, it is safest to use only printable ASCII
+characters in <b>pcretest</b> input files.
+</P>
+<br><a name="SEC3" href="#TOC1">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
+<P>
 From release 8.30, two separate PCRE libraries can be built. The original one
 supports 8-bit character strings, whereas the newer 16-bit library supports
-character strings encoded in 16-bit units. The <b>pcretest</b> program can be
-used to test both libraries. However, it is itself still an 8-bit program,
-reading 8-bit input and writing 8-bit output. When testing the 16-bit library,
-the patterns and data strings are converted to 16-bit format before being
-passed to the PCRE library functions. Results are converted to 8-bit for
-output.
+character strings encoded in 16-bit units. From release 8.32, a third library
+can be built, supporting character strings encoded in 32-bit units. The
+<b>pcretest</b> program can be used to test all three libraries. However, it is
+itself still an 8-bit program, reading 8-bit input and writing 8-bit output.
+When testing the 16-bit or 32-bit library, the patterns and data strings are
+converted to 16- or 32-bit format before being passed to the PCRE library
+functions. Results are converted to 8-bit for output.
 </P>
 <P>
-References to functions and structures of the form <b>pcre[16]_xx</b> below
-mean "<b>pcre_xx</b> when using the 8-bit library or <b>pcre16_xx</b> when using
-the 16-bit library".
+References to functions and structures of the form <b>pcre[16|32]_xx</b> below
+mean "<b>pcre_xx</b> when using the 8-bit library, <b>pcre16_xx</b> when using
+the 16-bit library, or <b>pcre32_xx</b> when using the 32-bit library".
 </P>
-<br><a name="SEC3" href="#TOC1">COMMAND LINE OPTIONS</a><br>
+<br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br>
 <P>
-<b>-16</b>
-If both the 8-bit and the 16-bit libraries have been built, this option causes
-the 16-bit library to be used. If only the 16-bit library has been built, this
-is the default (so has no effect). If only the 8-bit library has been built,
+<b>-8</b>
+If the 8-bit library has been built, this option causes the 8-bit library
+to be used (which is the default); if the 8-bit library has not been built,
 this option causes an error.
 </P>
 <P>
+<b>-16</b>
+If the 16-bit library has been built together with the 8-bit or the 32-bit
+library, this option causes the 16-bit library to be used. If only the 16-bit
+library has been built, this is the default (so the option has no effect). If
+the 16-bit library has not been built, this option causes an error.
+</P>
+<P>
+<b>-32</b>
+If the 32-bit library has been built together with the 8-bit or the 16-bit
+library, this option causes the 32-bit library to be used. If only the 32-bit
+library has been built, this is the default (so the option has no effect). If
+the 32-bit library has not been built, this option causes an error.
+</P>
+<P>
 <b>-b</b>
 Behave as if each pattern has the <b>/B</b> (show byte code) modifier; the
 internal form is output after compilation.
@@ -82,27 +120,38 @@ internal form is output after compilation.
 <P>
 <b>-C</b>
 Output the version number of the PCRE library, and all available information
-about the optional features that are included, and then exit. All other options
-are ignored.
+about the optional features that are included, and then exit with zero exit
+code. All other options are ignored.
 </P>
 <P>
 <b>-C</b> <i>option</i>
 Output information about a specific build-time option, then exit. This
 functionality is intended for use in scripts such as <b>RunTest</b>. The
-following options output the value indicated:
+following options output the value and set the exit code as indicated:
 <pre>
-  linksize   the internal link size (2, 3, or 4)
+  ebcdic-nl  the code for LF (= NL) in an EBCDIC environment:
+               0x15 or 0x25
+               0 if used in an ASCII environment
+               exit code is always 0
+  linksize   the configured internal link size (2, 3, or 4)
+               exit code is set to the link size
   newline    the default newline setting:
                CR, LF, CRLF, ANYCRLF, or ANY
+               exit code is always 0
 </pre>
-The following options output 1 for true or zero for false:
+The following options output 1 for true or 0 for false, and set the exit code
+to the same value:
 <pre>
+  ebcdic     compiled for an EBCDIC environment
   jit        just-in-time support is available
   pcre16     the 16-bit library was built
+  pcre32     the 32-bit library was built
   pcre8      the 8-bit library was built
   ucp        Unicode property support is available
-  utf        UTF-8 and/or UTF-16 support is available
-</PRE>
+  utf        UTF-8 and/or UTF-16 and/or UTF-32 support
+               is available
+</pre>
+If an unknown option is given, an error message is output; the exit code is 0.
 </P>
 <P>
 <b>-d</b>
@@ -113,8 +162,8 @@ form and information about the compiled pattern is out
 <P>
 <b>-dfa</b>
 Behave as if each data line contains the \D escape sequence; this causes the
-alternative matching function, <b>pcre[16]_dfa_exec()</b>, to be used instead of
-the standard <b>pcre[16]_exec()</b> function (more detail is given below).
+alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, to be used instead
+of the standard <b>pcre[16|32]_exec()</b> function (more detail is given below).
 </P>
 <P>
 <b>-help</b>
@@ -129,7 +178,7 @@ compiled pattern is given after compilation.
 <b>-M</b>
 Behave as if each data line contains the \M escape sequence; this causes
 PCRE to discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings by
-calling <b>pcre[16]_exec()</b> repeatedly with different limits.
+calling <b>pcre[16|32]_exec()</b> repeatedly with different limits.
 </P>
 <P>
 <b>-m</b>
@@ -140,9 +189,10 @@ bytes for both libraries.
 <P>
 <b>-o</b> <i>osize</i>
 Set the number of elements in the output vector that is used when calling
-<b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b> to be <i>osize</i>. The
+<b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> to be <i>osize</i>. The
 default value is 45, which is enough for 14 capturing subexpressions for
-<b>pcre[16]_exec()</b> or 22 different matches for <b>pcre[16]_dfa_exec()</b>.
+<b>pcre[16|32]_exec()</b> or 22 different matches for
+<b>pcre[16|32]_dfa_exec()</b>.
 The vector size can be changed for individual matching calls by including \O
 in the data line (see below).
 </P>
@@ -165,7 +215,7 @@ megabytes.
 <b>-s</b> or <b>-s+</b>
 Behave as if each pattern has the <b>/S</b> modifier; in other words, force each
 pattern to be studied. If <b>-s+</b> is used, all the JIT compile options are
-passed to <b>pcre[16]_study()</b>, causing just-in-time optimization to be set
+passed to <b>pcre[16|32]_study()</b>, causing just-in-time optimization to be set
 up if it is available, for both full and partial matching. Specific JIT compile
 options can be selected by following <b>-s+</b> with a digit in the range 1 to
 7, which selects the JIT compile modes as follows:
@@ -180,8 +230,12 @@ options can be selected by following <b>-s+</b> with a
 If <b>-s++</b> is used instead of <b>-s+</b> (with or without a following digit),
 the text "(JIT)" is added to the first output line after a match or no match
 when JIT-compiled code was actually used.
-</P>
-<P>
+<br>
+<br>
+Note that there are pattern modifiers that can override <b>-s</b>, either
+specifying no studying at all, or suppressing JIT compilation.
+<br>
+<br>
 If the <b>/I</b> or <b>/D</b> option is present on a pattern (requesting output
 about the compiled pattern), information about the result of studying is not
 included when studying is caused only by <b>-s</b> and neither <b>-i</b> nor
@@ -215,7 +269,7 @@ to iterate 500000 times.
 This is like <b>-t</b> except that it times only the matching phase, not the
 compile or study phases.
 </P>
-<br><a name="SEC4" href="#TOC1">DESCRIPTION</a><br>
+<br><a name="SEC5" href="#TOC1">DESCRIPTION</a><br>
 <P>
 If <b>pcretest</b> is given two filename arguments, it reads from the first and
 writes to the second. If it is given only one filename argument, it reads from
@@ -272,23 +326,80 @@ backslash, because
 is interpreted as the first line of a pattern that starts with "abc/", causing
 pcretest to read the next line as a continuation of the regular expression.
 </P>
-<br><a name="SEC5" href="#TOC1">PATTERN MODIFIERS</a><br>
+<br><a name="SEC6" href="#TOC1">PATTERN MODIFIERS</a><br>
 <P>
 A pattern may be followed by any number of modifiers, which are mostly single
-characters. Following Perl usage, these are referred to below as, for example,
-"the <b>/i</b> modifier", even though the delimiter of the pattern need not
-always be a slash, and no slash is used when writing modifiers. White space may
-appear between the final pattern delimiter and the first modifier, and between
-the modifiers themselves.
+characters, though some of these can be qualified by further characters.
+Following Perl usage, these are referred to below as, for example, "the
+<b>/i</b> modifier", even though the delimiter of the pattern need not always be
+a slash, and no slash is used when writing modifiers. White space may appear
+between the final pattern delimiter and the first modifier, and between the
+modifiers themselves. For reference, here is a complete list of modifiers. They
+fall into several groups that are described in detail in the following
+sections.
+<pre>
+  <b>/8</b>              set UTF mode
+  <b>/9</b>              set PCRE_NEVER_UTF (locks out UTF mode)
+  <b>/?</b>              disable UTF validity check
+  <b>/+</b>              show remainder of subject after match
+  <b>/=</b>              show all captures (not just those that are set)
+
+  <b>/A</b>              set PCRE_ANCHORED
+  <b>/B</b>              show compiled code
+  <b>/C</b>              set PCRE_AUTO_CALLOUT
+  <b>/D</b>              same as <b>/B</b> plus <b>/I</b>
+  <b>/E</b>              set PCRE_DOLLAR_ENDONLY
+  <b>/F</b>              flip byte order in compiled pattern
+  <b>/f</b>              set PCRE_FIRSTLINE
+  <b>/G</b>              find all matches (shorten string)
+  <b>/g</b>              find all matches (use startoffset)
+  <b>/I</b>              show information about pattern
+  <b>/i</b>              set PCRE_CASELESS
+  <b>/J</b>              set PCRE_DUPNAMES
+  <b>/K</b>              show backtracking control names
+  <b>/L</b>              set locale
+  <b>/M</b>              show compiled memory size
+  <b>/m</b>              set PCRE_MULTILINE
+  <b>/N</b>              set PCRE_NO_AUTO_CAPTURE
+  <b>/P</b>              use the POSIX wrapper
+  <b>/S</b>              study the pattern after compilation
+  <b>/s</b>              set PCRE_DOTALL
+  <b>/T</b>              select character tables
+  <b>/U</b>              set PCRE_UNGREEDY
+  <b>/W</b>              set PCRE_UCP
+  <b>/X</b>              set PCRE_EXTRA
+  <b>/x</b>              set PCRE_EXTENDED
+  <b>/Y</b>              set PCRE_NO_START_OPTIMIZE
+  <b>/Z</b>              don't show lengths in <b>/B</b> output
+
+  <b>/&#60;any&#62;</b>          set PCRE_NEWLINE_ANY
+  <b>/&#60;anycrlf&#62;</b>      set PCRE_NEWLINE_ANYCRLF
+  <b>/&#60;cr&#62;</b>           set PCRE_NEWLINE_CR
+  <b>/&#60;crlf&#62;</b>         set PCRE_NEWLINE_CRLF
+  <b>/&#60;lf&#62;</b>           set PCRE_NEWLINE_LF
+  <b>/&#60;bsr_anycrlf&#62;</b>  set PCRE_BSR_ANYCRLF
+  <b>/&#60;bsr_unicode&#62;</b>  set PCRE_BSR_UNICODE
+  <b>/&#60;JS&#62;</b>           set PCRE_JAVASCRIPT_COMPAT
+
+</pre>
 </P>
+<br><b>
+Perl-compatible modifiers
+</b><br>
 <P>
 The <b>/i</b>, <b>/m</b>, <b>/s</b>, and <b>/x</b> modifiers set the PCRE_CASELESS,
 PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when
-<b>pcre[16]_compile()</b> is called. These four modifier letters have the same
+<b>pcre[16|32]_compile()</b> is called. These four modifier letters have the same
 effect as they do in Perl. For example:
 <pre>
   /caseless/i
-</pre>
+
+</pre>
+</P>
+<br><b>
+Modifiers for other PCRE options
+</b><br>
+<P>
 The following table shows additional modifiers for setting PCRE compile-time
 options that do not correspond to anything in Perl:
 <pre>
@@ -298,6 +409,10 @@ options that do not correspond to anything in Perl:
   <b>/8</b>              PCRE_UTF16          ) when using the 16-bit
   <b>/?</b>              PCRE_NO_UTF16_CHECK )   library
 
+  <b>/8</b>              PCRE_UTF32          ) when using the 32-bit
+  <b>/?</b>              PCRE_NO_UTF32_CHECK )   library
+
+  <b>/9</b>              PCRE_NEVER_UTF
   <b>/A</b>              PCRE_ANCHORED
   <b>/C</b>              PCRE_AUTO_CALLOUT
   <b>/E</b>              PCRE_DOLLAR_ENDONLY
@@ -308,14 +423,14 @@ options that do not correspond to anything in Perl:
   <b>/W</b>              PCRE_UCP
   <b>/X</b>              PCRE_EXTRA
   <b>/Y</b>              PCRE_NO_START_OPTIMIZE
-  <b>/&#60;JS&#62;</b>           PCRE_JAVASCRIPT_COMPAT
+  <b>/&#60;any&#62;</b>          PCRE_NEWLINE_ANY
+  <b>/&#60;anycrlf&#62;</b>      PCRE_NEWLINE_ANYCRLF
   <b>/&#60;cr&#62;</b>           PCRE_NEWLINE_CR
-  <b>/&#60;lf&#62;</b>           PCRE_NEWLINE_LF
   <b>/&#60;crlf&#62;</b>         PCRE_NEWLINE_CRLF
-  <b>/&#60;anycrlf&#62;</b>      PCRE_NEWLINE_ANYCRLF
-  <b>/&#60;any&#62;</b>          PCRE_NEWLINE_ANY
+  <b>/&#60;lf&#62;</b>           PCRE_NEWLINE_LF
   <b>/&#60;bsr_anycrlf&#62;</b>  PCRE_BSR_ANYCRLF
   <b>/&#60;bsr_unicode&#62;</b>  PCRE_BSR_UNICODE
+  <b>/&#60;JS&#62;</b>           PCRE_JAVASCRIPT_COMPAT
 </pre>
 The modifiers that are enclosed in angle brackets are literal strings as shown,
 including the angle brackets, but the letters within can be in either case.
@@ -323,7 +438,7 @@ This example sets multiline matching with CRLF as the 
 <pre>
   /^abc/m&#60;CRLF&#62;
 </pre>
-As well as turning on the PCRE_UTF8/16 option, the <b>/8</b> modifier causes
+As well as turning on the PCRE_UTF8/16/32 option, the <b>/8</b> modifier causes
 all non-printing characters in output strings to be printed using the
 \x{hh...} notation. Otherwise, those less than 0x100 are output in hex without
 the curly brackets.
@@ -341,13 +456,13 @@ Searching for all possible matches within each subject
 by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called
 again to search the remainder of the subject string. The difference between
 <b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to
-<b>pcre[16]_exec()</b> to start searching at a new point within the entire
+<b>pcre[16|32]_exec()</b> to start searching at a new point within the entire
 string (which is in effect what Perl does), whereas the latter passes over a
 shortened substring. This makes a difference to the matching process if the
 pattern begins with a lookbehind assertion (including \b or \B).
 </P>
 <P>
-If any call to <b>pcre[16]_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches
+If any call to <b>pcre[16|32]_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches
 an empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and
 PCRE_ANCHORED flags set in order to search for another, non-empty, match at the
 same point. If this second match fails, the start offset is advanced, and the
@@ -378,7 +493,7 @@ modifier because /S+ and /S++ have other meanings.
 The <b>/=</b> modifier requests that the values of all potential captured
 parentheses be output after a match. By default, only those up to the highest
 one actually used in the match are output (corresponding to the return code
-from <b>pcre[16]_exec()</b>). Values in the offsets vector corresponding to
+from <b>pcre[16|32]_exec()</b>). Values in the offsets vector corresponding to
 higher numbers should be set to -1, and these are output as "&#60;unset&#62;". This
 modifier gives a way of checking that this is happening.
 </P>
@@ -406,16 +521,16 @@ below.
 <P>
 The <b>/I</b> modifier requests that <b>pcretest</b> output information about the
 compiled pattern (whether it is anchored, has a fixed first character, and
-so on). It does this by calling <b>pcre[16]_fullinfo()</b> after compiling a
+so on). It does this by calling <b>pcre[16|32]_fullinfo()</b> after compiling a
 pattern. If the pattern is studied, the results of that are also output.
 </P>
 <P>
 The <b>/K</b> modifier requests <b>pcretest</b> to show names from backtracking
-control verbs that are returned from calls to <b>pcre[16]_exec()</b>. It causes
-<b>pcretest</b> to create a <b>pcre[16]_extra</b> block if one has not already
-been created by a call to <b>pcre[16]_study()</b>, and to set the
+control verbs that are returned from calls to <b>pcre[16|32]_exec()</b>. It causes
+<b>pcretest</b> to create a <b>pcre[16|32]_extra</b> block if one has not already
+been created by a call to <b>pcre[16|32]_study()</b>, and to set the
 PCRE_EXTRA_MARK flag and the <b>mark</b> field within it, every time that
-<b>pcre[16]_exec()</b> is called. If the variable that the <b>mark</b> field
+<b>pcre[16|32]_exec()</b> is called. If the variable that the <b>mark</b> field
 points to is non-NULL for a match, non-match, or partial match, <b>pcretest</b>
 prints the string to which it points. For a match, this is shown on a line by
 itself, tagged with "MK:". For a non-match it is added to the message.
@@ -427,8 +542,8 @@ example,
   /pattern/Lfr_FR
 </pre>
 For this reason, it must be the last modifier. The given locale is set,
-<b>pcre[16]_maketables()</b> is called to build a set of character tables for
-the locale, and this is then passed to <b>pcre[16]_compile()</b> when compiling
+<b>pcre[16|32]_maketables()</b> is called to build a set of character tables for
+the locale, and this is then passed to <b>pcre[16|32]_compile()</b> when compiling
 the regular expression. Without an <b>/L</b> (or <b>/T</b>) modifier, NULL is
 passed as the tables pointer; that is, <b>/L</b> applies only to the expression
 on which it appears.
@@ -436,22 +551,31 @@ on which it appears.
 <P>
 The <b>/M</b> modifier causes the size in bytes of the memory block used to hold
 the compiled pattern to be output. This does not include the size of the
-<b>pcre[16]</b> block; it is just the actual compiled data. If the pattern is
+<b>pcre[16|32]</b> block; it is just the actual compiled data. If the pattern is
 successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the
 JIT compiled code is also output.
 </P>
 <P>
-If the <b>/S</b> modifier appears once, it causes <b>pcre[16]_study()</b> to be
-called after the expression has been compiled, and the results used when the
-expression is matched. If <b>/S</b> appears twice, it suppresses studying, even
+The <b>/S</b> modifier causes <b>pcre[16|32]_study()</b> to be called after the
+expression has been compiled, and the results used when the expression is
+matched. There are a number of qualifying characters that may follow <b>/S</b>.
+They may appear in any order.
+</P>
+<P>
+If <b>/S</b> is followed by an exclamation mark, <b>pcre[16|32]_study()</b> is
+called with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a
+<b>pcre[16|32]_extra</b> block, even when studying discovers no useful information.
+</P>
+<P>
+If <b>/S</b> is followed by a second S character, it suppresses studying, even
 if it was requested externally by the <b>-s</b> command line option. This makes
 it possible to specify that certain patterns are always studied, and others are
 never studied, independently of <b>-s</b>. This feature is used in the test
 files in a few cases where the output is different when the pattern is studied.
 </P>
 <P>
-If the <b>/S</b> modifier is immediately followed by a + character, the call to
-<b>pcre[16]_study()</b> is made with all the JIT study options, requesting
+If the <b>/S</b> modifier is followed by a + character, the call to
+<b>pcre[16|32]_study()</b> is made with all the JIT study options, requesting
 just-in-time optimization support if it is available, for both normal and
 partial matching. If you want to restrict the JIT compiling modes, you can
 follow <b>/S+</b> with a digit in the range 1 to 7:
@@ -473,15 +597,21 @@ immediately after <b>/S</b> or <b>/S+</b> because this
 </P>
 <P>
 If JIT studying is successful, the compiled JIT code will automatically be used
-when <b>pcre[16]_exec()</b> is run, except when incompatible run-time options
+when <b>pcre[16|32]_exec()</b> is run, except when incompatible run-time options
 are specified. For more details, see the
 <a href="pcrejit.html"><b>pcrejit</b></a>
 documentation. See also the <b>\J</b> escape sequence below for a way of
 setting the size of the JIT stack.
 </P>
 <P>
+Finally, if <b>/S</b> is followed by a minus character, JIT compilation is
+suppressed, even if it was requested externally by the <b>-s</b> command line
+option. This makes it possible to specify that JIT is never to be used for
+certain patterns.
+</P>
+<P>
 The <b>/T</b> modifier must be followed by a single digit. It causes a specific
-set of built-in character tables to be passed to <b>pcre[16]_compile()</b>. It
+set of built-in character tables to be passed to <b>pcre[16|32]_compile()</b>. It
 is used in the standard PCRE tests to check behaviour with different character
 tables. The digit specifies the tables as follows:
 <pre>
@@ -512,9 +642,9 @@ function:
 The <b>/+</b> modifier works as described above. All other modifiers are
 ignored.
 </P>
-<br><a name="SEC6" href="#TOC1">DATA LINES</a><br>
+<br><a name="SEC7" href="#TOC1">DATA LINES</a><br>
 <P>
-Before each data line is passed to <b>pcre[16]_exec()</b>, leading and trailing
+Before each data line is passed to <b>pcre[16|32]_exec()</b>, leading and trailing
 white space is removed, and it is then scanned for \ escapes. Some of these
 are pretty esoteric features, intended for checking out some of the more
 complicated features of PCRE. If you are just testing "ordinary" regular
@@ -531,45 +661,45 @@ recognized:
   \t         tab (\x09)
   \v         vertical tab (\x0b)
   \nnn       octal character (up to 3 octal digits); always
-               a byte unless &#62; 255 in UTF-8 or 16-bit mode
+               a byte unless &#62; 255 in UTF-8, 16-bit, or 32-bit mode
   \xhh       hexadecimal byte (up to 2 hex digits)
   \x{hh...}  hexadecimal character (any number of hex digits)
-  \A         pass the PCRE_ANCHORED option to <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b>
-  \B         pass the PCRE_NOTBOL option to <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b>
-  \Cdd       call pcre[16]_copy_substring() for substring dd after a successful match (number less than 32)
-  \Cname     call pcre[16]_copy_named_substring() for substring "name" after a successful match (name termin-
+  \A         pass the PCRE_ANCHORED option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+  \B         pass the PCRE_NOTBOL option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+  \Cdd       call pcre[16|32]_copy_substring() for substring dd after a successful match (number less than 32)
+  \Cname     call pcre[16|32]_copy_named_substring() for substring "name" after a successful match (name termin-
                ated by next non alphanumeric character)
   \C+        show the current captured substrings at callout time
   \C-        do not supply a callout function
   \C!n       return 1 instead of 0 when callout number n is reached
   \C!n!m     return 1 instead of 0 when callout number n is reached for the nth time
   \C*n       pass the number n (may be negative) as callout data; this is used as the callout return value
-  \D         use the <b>pcre[16]_dfa_exec()</b> match function
-  \F         only shortest match for <b>pcre[16]_dfa_exec()</b>
-  \Gdd       call pcre[16]_get_substring() for substring dd after a successful match (number less than 32)
-  \Gname     call pcre[16]_get_named_substring() for substring "name" after a successful match (name termin-
+  \D         use the <b>pcre[16|32]_dfa_exec()</b> match function
+  \F         only shortest match for <b>pcre[16|32]_dfa_exec()</b>
+  \Gdd       call pcre[16|32]_get_substring() for substring dd after a successful match (number less than 32)
+  \Gname     call pcre[16|32]_get_named_substring() for substring "name" after a successful match (name termin-
                ated by next non-alphanumeric character)
   \Jdd       set up a JIT stack of dd kilobytes maximum (any number of digits)
-  \L         call pcre[16]_get_substringlist() after a successful match
+  \L         call pcre[16|32]_get_substringlist() after a successful match
   \M         discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
-  \N         pass the PCRE_NOTEMPTY option to <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b>; if used twice, pass the
+  \N         pass the PCRE_NOTEMPTY option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>; if used twice, pass the
                PCRE_NOTEMPTY_ATSTART option
-  \Odd       set the size of the output vector passed to <b>pcre[16]_exec()</b> to dd (any number of digits)
-  \P         pass the PCRE_PARTIAL_SOFT option to <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b>; if used twice, pass the
+  \Odd       set the size of the output vector passed to <b>pcre[16|32]_exec()</b> to dd (any number of digits)
+  \P         pass the PCRE_PARTIAL_SOFT option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>; if used twice, pass the
                PCRE_PARTIAL_HARD option
   \Qdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits)
-  \R         pass the PCRE_DFA_RESTART option to <b>pcre[16]_dfa_exec()</b>
+  \R         pass the PCRE_DFA_RESTART option to <b>pcre[16|32]_dfa_exec()</b>
   \S         output details of memory get/free calls during matching
-  \Y         pass the PCRE_NO_START_OPTIMIZE option to <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b>
-  \Z         pass the PCRE_NOTEOL option to <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b>
-  \?         pass the PCRE_NO_UTF[8|16]_CHECK option to <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b>
+  \Y         pass the PCRE_NO_START_OPTIMIZE option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+  \Z         pass the PCRE_NOTEOL option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+  \?         pass the PCRE_NO_UTF[8|16|32]_CHECK option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
   \&#62;dd       start the match at offset dd (optional "-"; then any number of digits); this sets the <i>startoffset</i>
-               argument for <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b>
-  \&#60;cr&#62;      pass the PCRE_NEWLINE_CR option to <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b>
-  \&#60;lf&#62;      pass the PCRE_NEWLINE_LF option to <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b>
-  \&#60;crlf&#62;    pass the PCRE_NEWLINE_CRLF option to <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b>
-  \&#60;anycrlf&#62; pass the PCRE_NEWLINE_ANYCRLF option to <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b>
-  \&#60;any&#62;     pass the PCRE_NEWLINE_ANY option to <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b>
+               argument for <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+  \&#60;cr&#62;      pass the PCRE_NEWLINE_CR option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+  \&#60;lf&#62;      pass the PCRE_NEWLINE_LF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+  \&#60;crlf&#62;    pass the PCRE_NEWLINE_CRLF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+  \&#60;anycrlf&#62; pass the PCRE_NEWLINE_ANYCRLF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+  \&#60;any&#62;     pass the PCRE_NEWLINE_ANY option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
 </pre>
 The use of \x{hh...} is not dependent on the use of the <b>/8</b> modifier on
 the pattern. It is recognized always. There may be any number of hexadecimal
@@ -588,6 +718,10 @@ In UTF-16 mode, all 4-digit \x{hhhh} values are accept
 possible to construct invalid UTF-16 sequences for testing purposes.
 </P>
 <P>
+In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This makes it
+possible to construct invalid UTF-32 sequences for testing purposes.
+</P>
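As an illustration of what "invalid UTF-32" means here: under the Unicode standard, a 32-bit code unit is invalid if it lies in the surrogate range 0xD800-0xDFFF or exceeds 0x10FFFF. The Python sketch below classifies values that could be written with \x{...} in a data line. The helper name is hypothetical and is not part of <b>pcretest</b> or the PCRE API.

```python
def is_valid_utf32_unit(value: int) -> bool:
    """Return True if value is a Unicode scalar value, i.e. a valid UTF-32 code unit.

    Hypothetical helper for illustration only; not a pcretest or PCRE function.
    """
    if value < 0 or value > 0x10FFFF:
        # Beyond the Unicode code space.
        return False
    if 0xD800 <= value <= 0xDFFF:
        # Surrogates are reserved for UTF-16 and are not valid in UTF-32.
        return False
    return True
```

For example, \x{d800} and \x{110000} would both be rejected by this check, so they are candidates for testing how invalid input is handled.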
+<P>
 The escapes that specify line ending sequences are literal strings, exactly as
 shown. No more than one newline setting should be present in any data line.
 </P>
@@ -604,12 +738,12 @@ is not being used. Providing a stack that is larger th
 necessary only for very complicated patterns.
 </P>
 <P>
-If \M is present, <b>pcretest</b> calls <b>pcre[16]_exec()</b> several times,
+If \M is present, <b>pcretest</b> calls <b>pcre[16|32]_exec()</b> several times,
 with different values in the <i>match_limit</i> and <i>match_limit_recursion</i>
-fields of the <b>pcre[16]_extra</b> data structure, until it finds the minimum
-numbers for each parameter that allow <b>pcre[16]_exec()</b> to complete without
+fields of the <b>pcre[16|32]_extra</b> data structure, until it finds the minimum
+numbers for each parameter that allow <b>pcre[16|32]_exec()</b> to complete without
 error. Because this is testing a specific feature of the normal interpretive
-<b>pcre[16]_exec()</b> execution, the use of any JIT optimization that might
+<b>pcre[16|32]_exec()</b> execution, the use of any JIT optimization that might
 have been set up by the <b>/S+</b> qualifier or the <b>-s+</b> option is disabled.
 </P>
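The search for a minimum limit can be pictured as repeated doubling followed by bisection. The Python sketch below is an assumption about the general approach, not a transcription of <b>pcretest</b>'s code. The callable <code>succeeds_with_limit</code> is a hypothetical stand-in for one call of the matching function with a given limit, and the starting value of 64 is likewise only illustrative. The sketch assumes that success is monotonic in the limit and that some finite limit succeeds.

```python
from typing import Callable


def find_minimum_limit(succeeds_with_limit: Callable[[int], bool],
                       start: int = 64) -> int:
    """Find the smallest limit for which succeeds_with_limit returns True.

    Assumes a limit of 0 fails, that success is monotonic in the limit,
    and that some finite limit succeeds.
    """
    low = 0          # largest limit known (or assumed) to fail
    high = start     # candidate upper bound
    # Phase 1: double the limit until the match attempt completes.
    while not succeeds_with_limit(high):
        low = high
        high *= 2
    # Phase 2: bisect between the failing and the succeeding limit.
    while high - low > 1:
        mid = (low + high) // 2
        if succeeds_with_limit(mid):
            high = mid
        else:
            low = mid
    return high
```

Under these assumptions the same routine could be run once per parameter, which matches the description above of finding a minimum for each of <i>match_limit</i> and <i>match_limit_recursion</i>.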
 <P>
@@ -624,7 +758,7 @@ needed to complete the match attempt.
 <P>
 When \O is used, the value specified may be higher or lower than the size set
 by the <b>-O</b> command line option (or defaulted to 45); \O applies only to
-the call of <b>pcre[16]_exec()</b> for the line in which it appears.
+the call of <b>pcre[16|32]_exec()</b> for the line in which it appears.
 </P>
 <P>
 If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper
@@ -632,11 +766,11 @@ API to be used, the only option-setting sequences that
 \N, and \Z, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively,
 to be passed to <b>regexec()</b>.
 </P>
-<br><a name="SEC7" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
+<br><a name="SEC8" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
 <P>
 By default, <b>pcretest</b> uses the standard PCRE matching function,
-<b>pcre[16]_exec()</b> to match each data line. PCRE also supports an
-alternative matching function, <b>pcre[16]_dfa_test()</b>, which operates in a
+<b>pcre[16|32]_exec()</b> to match each data line. PCRE also supports an
+alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, which operates in a
 different way, and has some restrictions. The differences between the two
 functions are described in the
 <a href="pcrematching.html"><b>pcrematching</b></a>
@@ -649,17 +783,17 @@ This function finds all possible matches at a given po
 escape sequence is present in the data line, it stops after the first match is
 found. This is always the shortest possible match.
 </P>
-<br><a name="SEC8" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
+<br><a name="SEC9" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
 <P>
 This section describes the output when the normal matching function,
-<b>pcre[16]_exec()</b>, is being used.
+<b>pcre[16|32]_exec()</b>, is being used.
 </P>
 <P>
 When a match succeeds, <b>pcretest</b> outputs the list of captured substrings
-that <b>pcre[16]_exec()</b> returns, starting with number 0 for the string that
+that <b>pcre[16|32]_exec()</b> returns, starting with number 0 for the string that
 matched the whole pattern. Otherwise, it outputs "No match" when the return is
 PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching
-substring when <b>pcre[16]_exec()</b> returns PCRE_ERROR_PARTIAL. (Note that
+substring when <b>pcre[16|32]_exec()</b> returns PCRE_ERROR_PARTIAL. (Note that
 this is the entire substring that was inspected during the partial match; it
 may include characters before the actual match start if a lookbehind assertion,
 \K, \b, or \B was involved.) For any other return, <b>pcretest</b> outputs
@@ -679,7 +813,7 @@ at least two. Here is an example of an interactive <b>
   No match
 </pre>
 Unset capturing substrings that are not followed by one that is set are not
-returned by <b>pcre[16]_exec()</b>, and are not shown by <b>pcretest</b>. In the
+returned by <b>pcre[16|32]_exec()</b>, and are not shown by <b>pcretest</b>. In the
 following example, there are two capturing substrings, but when the first data
 line is matched, the second, unset substring is not shown. An "internal" unset
 substring is shown as "&#60;unset&#62;", as for the second data line.
@@ -740,9 +874,9 @@ prompt is used for continuations), data lines may not.
 included in data by means of the \n escape (or \r, \r\n, etc., depending on
 the newline sequence setting).
 </P>
-<br><a name="SEC9" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
+<br><a name="SEC10" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
 <P>
-When the alternative matching function, <b>pcre[16]_dfa_exec()</b>, is used (by
+When the alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, is used (by
 means of the \D escape sequence or the <b>-dfa</b> command line option), the
 output consists of a list of all the matches that start at the first point in
 the subject where there is at least one match. For example:
@@ -776,7 +910,7 @@ at the end of the longest match. For example:
 Since the matching function does not support substring capture, the escape
 sequences that are concerned with captured substrings are not relevant.
 </P>
-<br><a name="SEC10" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
+<br><a name="SEC11" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
 <P>
 When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
 indicating that the subject partially matched the pattern, you can restart the
@@ -793,7 +927,7 @@ For further information about partial matching, see th
 <a href="pcrepartial.html"><b>pcrepartial</b></a>
 documentation.
 </P>
-<br><a name="SEC11" href="#TOC1">CALLOUTS</a><br>
+<br><a name="SEC12" href="#TOC1">CALLOUTS</a><br>
 <P>
 If the pattern contains any callout requests, <b>pcretest</b>'s callout function
 is called during matching. This works with both matching functions. By default,
@@ -854,7 +988,7 @@ the
 <a href="pcrecallout.html"><b>pcrecallout</b></a>
 documentation.
 </P>
-<br><a name="SEC12" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
+<br><a name="SEC13" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
 <P>
 When <b>pcretest</b> is outputting text in the compiled version of a pattern,
 bytes other than 32-126 are always treated as non-printing characters and are
@@ -866,7 +1000,7 @@ string, it behaves in the same way, unless a different
 the pattern (using the <b>/L</b> modifier). In this case, the <b>isprint()</b>
 function is used to distinguish printing and non-printing characters.
 </P>
-<br><a name="SEC13" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
+<br><a name="SEC14" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
 <P>
 The facilities described in this section are not available when the POSIX
 interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is
@@ -939,13 +1073,14 @@ string using a reloaded pattern is likely to cause <b>
 Finally, if you attempt to load a file that is not in the correct format, the
 result is undefined.
 </P>
-<br><a name="SEC14" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC15" href="#TOC1">SEE ALSO</a><br>
 <P>
-<b>pcre</b>(3), <b>pcre16</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
+<b>pcre</b>(3), <b>pcre16</b>(3), <b>pcre32</b>(3), <b>pcreapi</b>(3),
+<b>pcrecallout</b>(3),
 <b>pcrejit</b>(3), <b>pcrematching</b>(3), <b>pcrepartial</b>(3),
 <b>pcrepattern</b>(3), <b>pcreprecompile</b>(3).
 </P>
-<br><a name="SEC15" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC16" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@@ -954,11 +1089,11 @@ University Computing Service
 Cambridge CB2 3QH, England.
 <br>
 </P>
-<br><a name="SEC16" href="#TOC1">REVISION</a><br>
+<br><a name="SEC17" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 21 February 2012
+Last updated: 26 April 2013
 <br>
-Copyright &copy; 1997-2012 University of Cambridge.
+Copyright &copy; 1997-2013 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.