version 1.1.1.1, 2012/02/21 23:05:52
|
version 1.1.1.4, 2013/07/22 08:25:57
|
Line 14 man page, in case the conversion went wrong.
|
Line 14 man page, in case the conversion went wrong.
|
<br> |
<br> |
<ul> |
<ul> |
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a> |
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a> |
<li><a name="TOC2" href="#SEC2">COMMAND LINE OPTIONS</a> | <li><a name="TOC2" href="#SEC2">INPUT DATA FORMAT</a> |
<li><a name="TOC3" href="#SEC3">DESCRIPTION</a> | <li><a name="TOC3" href="#SEC3">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a> |
<li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a> | <li><a name="TOC4" href="#SEC4">COMMAND LINE OPTIONS</a> |
<li><a name="TOC5" href="#SEC5">DATA LINES</a> | <li><a name="TOC5" href="#SEC5">DESCRIPTION</a> |
<li><a name="TOC6" href="#SEC6">THE ALTERNATIVE MATCHING FUNCTION</a> | <li><a name="TOC6" href="#SEC6">PATTERN MODIFIERS</a> |
<li><a name="TOC7" href="#SEC7">DEFAULT OUTPUT FROM PCRETEST</a> | <li><a name="TOC7" href="#SEC7">DATA LINES</a> |
<li><a name="TOC8" href="#SEC8">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a> | <li><a name="TOC8" href="#SEC8">THE ALTERNATIVE MATCHING FUNCTION</a> |
<li><a name="TOC9" href="#SEC9">RESTARTING AFTER A PARTIAL MATCH</a> | <li><a name="TOC9" href="#SEC9">DEFAULT OUTPUT FROM PCRETEST</a> |
<li><a name="TOC10" href="#SEC10">CALLOUTS</a> | <li><a name="TOC10" href="#SEC10">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a> |
<li><a name="TOC11" href="#SEC11">NON-PRINTING CHARACTERS</a> | <li><a name="TOC11" href="#SEC11">RESTARTING AFTER A PARTIAL MATCH</a> |
<li><a name="TOC12" href="#SEC12">SAVING AND RELOADING COMPILED PATTERNS</a> | <li><a name="TOC12" href="#SEC12">CALLOUTS</a> |
<li><a name="TOC13" href="#SEC13">SEE ALSO</a> | <li><a name="TOC13" href="#SEC13">NON-PRINTING CHARACTERS</a> |
<li><a name="TOC14" href="#SEC14">AUTHOR</a> | <li><a name="TOC14" href="#SEC14">SAVING AND RELOADING COMPILED PATTERNS</a> |
<li><a name="TOC15" href="#SEC15">REVISION</a> | <li><a name="TOC15" href="#SEC15">SEE ALSO</a> |
| <li><a name="TOC16" href="#SEC16">AUTHOR</a> |
| <li><a name="TOC17" href="#SEC17">REVISION</a> |
</ul> |
</ul> |
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br> |
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br> |
<P> |
<P> |
Line 42 details of the regular expressions themselves, see the
|
Line 44 details of the regular expressions themselves, see the
|
documentation. For details of the PCRE library function calls and their |
documentation. For details of the PCRE library function calls and their |
options, see the |
options, see the |
<a href="pcreapi.html"><b>pcreapi</b></a> |
<a href="pcreapi.html"><b>pcreapi</b></a> |
documentation. The input for <b>pcretest</b> is a sequence of regular expression | , |
patterns and strings to be matched, as described below. The output shows the | <a href="pcre16.html"><b>pcre16</b></a> |
result of each match. Options on the command line and the patterns control PCRE | and |
options and exactly what is output. | <a href="pcre32.html"><b>pcre32</b></a> |
| documentation. |
</P> |
</P> |
<br><a name="SEC2" href="#TOC1">COMMAND LINE OPTIONS</a><br> |
|
<P> |
<P> |
|
The input for <b>pcretest</b> is a sequence of regular expression patterns and |
|
strings to be matched, as described below. The output shows the result of each |
|
match. Options on the command line and the patterns control PCRE options and |
|
exactly what is output. |
|
</P> |
|
<P> |
|
As PCRE has evolved, it has acquired many different features, and as a result, |
|
<b>pcretest</b> now has rather a lot of obscure options for testing every |
|
possible feature. Some of these options are specifically designed for use in |
|
conjunction with the test script and data files that are distributed as part of |
|
PCRE, and are unlikely to be of use otherwise. They are all documented here, |
|
but without much justification. |
|
</P> |
|
<br><a name="SEC2" href="#TOC1">INPUT DATA FORMAT</a><br> |
|
<P> |
|
Input to <b>pcretest</b> is processed line by line, either by calling the C |
|
library's <b>fgets()</b> function, or via the <b>libreadline</b> library (see |
|
below). In Unix-like environments, <b>fgets()</b> treats any bytes other than |
|
newline as data characters. However, in some Windows environments character 26 |
|
(hex 1A) causes an immediate end of file, and no further data is read. For |
|
maximum portability, therefore, it is safest to use only ASCII characters in |
|
<b>pcretest</b> input files. |
|
</P> |
|
<br><a name="SEC3" href="#TOC1">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br> |
|
<P> |
|
From release 8.30, two separate PCRE libraries can be built. The original one |
|
supports 8-bit character strings, whereas the newer 16-bit library supports |
|
character strings encoded in 16-bit units. From release 8.32, a third library |
|
can be built, supporting character strings encoded in 32-bit units. The |
|
<b>pcretest</b> program can be used to test all three libraries. However, it is |
|
itself still an 8-bit program, reading 8-bit input and writing 8-bit output. |
|
When testing the 16-bit or 32-bit library, the patterns and data strings are |
|
converted to 16- or 32-bit format before being passed to the PCRE library |
|
functions. Results are converted to 8-bit for output. |
|
</P> |
|
<P> |
|
References to functions and structures of the form <b>pcre[16|32]_xx</b> below |
|
mean "<b>pcre_xx</b> when using the 8-bit library, <b>pcre16_xx</b> when using |
|
the 16-bit library, or <b>pcre32_xx</b> when using the 32-bit library". |
|
</P> |
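<P>
As a rough illustration of this naming convention, the following Python sketch
expands a name written in the <b>pcre[16|32]_xx</b> form for a given code unit
width. The helper name <b>expand_name</b> is hypothetical; it is not part of
PCRE or <b>pcretest</b>.
</P>

```python
def expand_name(generic: str, width: int) -> str:
    """Expand a 'pcre[16|32]_xx' style name for the 8-, 16-, or 32-bit library.

    Hypothetical helper for illustration only.
    """
    if width not in (8, 16, 32):
        raise ValueError("width must be 8, 16, or 32")
    marker = "[16|32]"
    if marker not in generic:
        # Names without the marker are the same for every library.
        return generic
    # The 8-bit library uses the plain 'pcre_' prefix.
    suffix = "" if width == 8 else str(width)
    return generic.replace(marker, suffix)
```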
|
<br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br> |
|
<P> |
|
<b>-8</b> |
|
If the 8-bit library has been built, this option causes the 8-bit library |
|
to be used (which is the default); if the 8-bit library has not been built, |
|
this option causes an error. |
|
</P> |
|
<P> |
|
<b>-16</b> |
|
If the 16-bit library and either the 8-bit or the 32-bit library have been built, this |
|
option causes the 16-bit library to be used. If only the 16-bit library has been |
|
built, this is the default (so has no effect). If only the 8-bit or the 32-bit |
|
library has been built, this option causes an error. |
|
</P> |
|
<P> |
|
<b>-32</b> |
|
If the 32-bit library and either the 8-bit or the 16-bit library have been built, this |
|
option causes the 32-bit library to be used. If only the 32-bit library has been |
|
built, this is the default (so has no effect). If only the 8-bit or the 16-bit |
|
library has been built, this option causes an error. |
|
</P> |
|
<P> |
<b>-b</b> |
<b>-b</b> |
Behave as if each pattern has the <b>/B</b> (show byte code) modifier; the |
Behave as if each pattern has the <b>/B</b> (show byte code) modifier; the |
internal form is output after compilation. |
internal form is output after compilation. |
Line 56 internal form is output after compilation.
|
Line 120 internal form is output after compilation.
|
<P> |
<P> |
<b>-C</b> |
<b>-C</b> |
Output the version number of the PCRE library, and all available information |
Output the version number of the PCRE library, and all available information |
about the optional features that are included, and then exit. | about the optional features that are included, and then exit with zero exit |
| code. All other options are ignored. |
</P> |
</P> |
<P> |
<P> |
|
<b>-C</b> <i>option</i> |
|
Output information about a specific build-time option, then exit. This |
|
functionality is intended for use in scripts such as <b>RunTest</b>. The |
|
following options output the value and set the exit code as indicated: |
|
<pre> |
|
ebcdic-nl the code for LF (= NL) in an EBCDIC environment: |
|
0x15 or 0x25 |
|
0 if used in an ASCII environment |
|
exit code is always 0 |
|
linksize the configured internal link size (2, 3, or 4) |
|
exit code is set to the link size |
|
newline the default newline setting: |
|
CR, LF, CRLF, ANYCRLF, or ANY |
|
exit code is always 0 |
|
</pre> |
|
The following options output 1 for true or 0 for false, and set the exit code |
|
to the same value: |
|
<pre> |
|
ebcdic compiled for an EBCDIC environment |
|
jit just-in-time support is available |
|
pcre16 the 16-bit library was built |
|
pcre32 the 32-bit library was built |
|
pcre8 the 8-bit library was built |
|
ucp Unicode property support is available |
|
utf UTF-8 and/or UTF-16 and/or UTF-32 support |
|
is available |
|
</pre> |
|
If an unknown option is given, an error message is output; the exit code is 0. |
|
</P> |
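<P>
A script that consumes <b>-C</b> <i>option</i> results might interpret them
along the following lines. This is a minimal Python sketch, not how
<b>RunTest</b> itself works: the function names are hypothetical, and
<b>query_build_option</b> assumes that a <b>pcretest</b> binary is on the PATH.
</P>

```python
import subprocess

# Options documented as printing 1 or 0 and using the same value as exit code.
BOOLEAN_OPTIONS = {"ebcdic", "jit", "pcre16", "pcre32", "pcre8", "ucp", "utf"}


def interpret_build_option(option, output, exit_code):
    """Convert the output and exit code of 'pcretest -C option' to a value."""
    if option in BOOLEAN_OPTIONS:
        return exit_code == 1
    if option == "linksize":
        # The exit code is documented as being set to the link size.
        return exit_code
    if option in ("newline", "ebcdic-nl"):
        # Exit code is always 0 here, so the printed text carries the value.
        return output.strip()
    raise ValueError("option not covered by this sketch: " + option)


def query_build_option(option):
    """Run pcretest (assumed to be on PATH) and interpret the result."""
    proc = subprocess.run(
        ["pcretest", "-C", option], capture_output=True, text=True
    )
    return interpret_build_option(option, proc.stdout, proc.returncode)
```

<P>
Because an unknown option also yields exit code 0, a caller cannot tell
"false" from "unknown" by the exit code alone; the sketch therefore accepts
only the option names listed above.
</P>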
|
<P> |
<b>-d</b> |
<b>-d</b> |
Behave as if each pattern has the <b>/D</b> (debug) modifier; the internal |
Behave as if each pattern has the <b>/D</b> (debug) modifier; the internal |
form and information about the compiled pattern is output after compilation; |
form and information about the compiled pattern is output after compilation; |
Line 67 form and information about the compiled pattern is out
|
Line 162 form and information about the compiled pattern is out
|
<P> |
<P> |
<b>-dfa</b> |
<b>-dfa</b> |
Behave as if each data line contains the \D escape sequence; this causes the |
Behave as if each data line contains the \D escape sequence; this causes the |
alternative matching function, <b>pcre_dfa_exec()</b>, to be used instead of the | alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, to be used instead |
standard <b>pcre_exec()</b> function (more detail is given below). | of the standard <b>pcre[16|32]_exec()</b> function (more detail is given below). |
</P> |
</P> |
<P> |
<P> |
<b>-help</b> |
<b>-help</b> |
Line 83 compiled pattern is given after compilation.
|
Line 178 compiled pattern is given after compilation.
|
<b>-M</b> |
<b>-M</b> |
Behave as if each data line contains the \M escape sequence; this causes |
Behave as if each data line contains the \M escape sequence; this causes |
PCRE to discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings by |
PCRE to discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings by |
calling <b>pcre_exec()</b> repeatedly with different limits. | calling <b>pcre[16|32]_exec()</b> repeatedly with different limits. |
</P> |
</P> |
<P> |
<P> |
<b>-m</b> |
<b>-m</b> |
Output the size of each compiled pattern after it has been compiled. This is |
Output the size of each compiled pattern after it has been compiled. This is |
equivalent to adding <b>/M</b> to each regular expression. | equivalent to adding <b>/M</b> to each regular expression. The size is given in |
| bytes for both libraries. |
</P> |
</P> |
<P> |
<P> |
<b>-o</b> <i>osize</i> |
<b>-o</b> <i>osize</i> |
Set the number of elements in the output vector that is used when calling |
Set the number of elements in the output vector that is used when calling |
<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> to be <i>osize</i>. The default value | <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> to be <i>osize</i>. The |
is 45, which is enough for 14 capturing subexpressions for <b>pcre_exec()</b> or | default value is 45, which is enough for 14 capturing subexpressions for |
22 different matches for <b>pcre_dfa_exec()</b>. The vector size can be | <b>pcre[16|32]_exec()</b> or 22 different matches for |
changed for individual matching calls by including \O in the data line (see | <b>pcre[16|32]_dfa_exec()</b>. |
below). | The vector size can be changed for individual matching calls by including \O |
| in the data line (see below). |
</P> |
</P> |
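<P>
The figures quoted for the default <i>osize</i> of 45 can be reproduced with
simple arithmetic. The sketch below assumes the conventions described in the
<b>pcreapi</b> documentation: the standard matching function passes back
offsets in pairs using only the first two-thirds of the vector, with the first
pair reserved for the whole match, while the alternative function uses one pair
per match. The function names are hypothetical.
</P>

```python
def exec_capture_capacity(osize):
    """Capturing subexpressions that fit when the standard function is used.

    One third of the vector is workspace; the remaining two thirds hold
    offset pairs, and the first pair describes the whole match.
    """
    return max(osize // 3 - 1, 0)


def dfa_match_capacity(osize):
    """Matches that fit when the alternative (DFA) function is used.

    Each match occupies one pair of offsets.
    """
    return osize // 2
```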
<P> |
<P> |
<b>-p</b> |
<b>-p</b> |
Behave as if each pattern has the <b>/P</b> modifier; the POSIX wrapper API is |
Behave as if each pattern has the <b>/P</b> modifier; the POSIX wrapper API is |
used to call PCRE. None of the other options has any effect when <b>-p</b> is |
used to call PCRE. None of the other options has any effect when <b>-p</b> is |
set. | set. This option can be used only with the 8-bit library. |
</P> |
</P> |
<P> |
<P> |
<b>-q</b> |
<b>-q</b> |
Line 117 megabytes.
|
Line 214 megabytes.
|
<P> |
<P> |
<b>-s</b> or <b>-s+</b> |
<b>-s</b> or <b>-s+</b> |
Behave as if each pattern has the <b>/S</b> modifier; in other words, force each |
Behave as if each pattern has the <b>/S</b> modifier; in other words, force each |
pattern to be studied. If <b>-s+</b> is used, the PCRE_STUDY_JIT_COMPILE flag is | pattern to be studied. If <b>-s+</b> is used, all the JIT compile options are |
passed to <b>pcre_study()</b>, causing just-in-time optimization to be set up if | passed to <b>pcre[16|32]_study()</b>, causing just-in-time optimization to be set |
it is available. If the <b>/I</b> or <b>/D</b> option is present on a pattern | up if it is available, for both full and partial matching. Specific JIT compile |
(requesting output about the compiled pattern), information about the result of | options can be selected by following <b>-s+</b> with a digit in the range 1 to |
studying is not included when studying is caused only by <b>-s</b> and neither | 7, which selects the JIT compile modes as follows: |
<b>-i</b> nor <b>-d</b> is present on the command line. This behaviour means that | <pre> |
the output from tests that are run with and without <b>-s</b> should be | 1 normal match only |
identical, except when options that output information about the actual running | 2 soft partial match only |
of a match are set. The <b>-M</b>, <b>-t</b>, and <b>-tm</b> options, which give | 3 normal match and soft partial match |
information about resources used, are likely to produce different output with | 4 hard partial match only |
and without <b>-s</b>. Output may also differ if the <b>/C</b> option is present | 6 soft and hard partial match |
on an individual pattern. This uses callouts to trace the matching process, | 7 all three modes (default) |
and this may be different between studied and non-studied patterns. If the | </pre> |
pattern contains (*MARK) items there may also be differences, for the same | If <b>-s++</b> is used instead of <b>-s+</b> (with or without a following digit), |
reason. The <b>-s</b> command line option can be overridden for specific | the text "(JIT)" is added to the first output line after a match or no match |
patterns that should never be studied (see the <b>/S</b> pattern modifier | when JIT-compiled code was actually used. |
below). | <br> |
| <br> |
| Note that there are pattern options that can override <b>-s</b>, either |
| specifying no studying at all, or suppressing JIT compilation. |
| <br> |
| <br> |
| If the <b>/I</b> or <b>/D</b> option is present on a pattern (requesting output |
| about the compiled pattern), information about the result of studying is not |
| included when studying is caused only by <b>-s</b> and neither <b>-i</b> nor |
| <b>-d</b> is present on the command line. This behaviour means that the output |
| from tests that are run with and without <b>-s</b> should be identical, except |
| when options that output information about the actual running of a match are |
| set. |
| <br> |
| <br> |
| The <b>-M</b>, <b>-t</b>, and <b>-tm</b> options, which give information about |
| resources used, are likely to produce different output with and without |
| <b>-s</b>. Output may also differ if the <b>/C</b> option is present on an |
| individual pattern. This uses callouts to trace the matching process, and |
| this may be different between studied and non-studied patterns. If the pattern |
| contains (*MARK) items there may also be differences, for the same reason. The |
| <b>-s</b> command line option can be overridden for specific patterns that |
| should never be studied (see the <b>/S</b> pattern modifier below). |
</P> |
</P> |
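<P>
The digits that may follow <b>-s+</b> read naturally as a bit mask, with 1 for
normal matching, 2 for soft partial matching, and 4 for hard partial matching.
The Python sketch below decodes a digit on that assumption. The function name
is hypothetical, and the meaning given to 5 (normal plus hard partial) is an
inference from the pattern of the table, which does not list that value.
</P>

```python
# Bit values inferred from the table of -s+ digits.
NORMAL = 1
SOFT_PARTIAL = 2
HARD_PARTIAL = 4


def jit_modes(digit=7):
    """Return the JIT compile modes selected by the digit after -s+."""
    if not 1 <= digit <= 7:
        raise ValueError("digit must be in the range 1 to 7")
    modes = []
    if digit & NORMAL:
        modes.append("normal")
    if digit & SOFT_PARTIAL:
        modes.append("soft partial")
    if digit & HARD_PARTIAL:
        modes.append("hard partial")
    return modes
```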
<P> |
<P> |
<b>-t</b> |
<b>-t</b> |
Line 150 to iterate 500000 times.
|
Line 269 to iterate 500000 times.
|
This is like <b>-t</b> except that it times only the matching phase, not the |
This is like <b>-t</b> except that it times only the matching phase, not the |
compile or study phases. |
compile or study phases. |
</P> |
</P> |
<br><a name="SEC3" href="#TOC1">DESCRIPTION</a><br> | <br><a name="SEC5" href="#TOC1">DESCRIPTION</a><br> |
<P> |
<P> |
If <b>pcretest</b> is given two filename arguments, it reads from the first and |
If <b>pcretest</b> is given two filename arguments, it reads from the first and |
writes to the second. If it is given only one filename argument, it reads from |
writes to the second. If it is given only one filename argument, it reads from |
Line 207 backslash, because
|
Line 326 backslash, because
|
is interpreted as the first line of a pattern that starts with "abc/", causing |
is interpreted as the first line of a pattern that starts with "abc/", causing |
pcretest to read the next line as a continuation of the regular expression. |
pcretest to read the next line as a continuation of the regular expression. |
</P> |
</P> |
<br><a name="SEC4" href="#TOC1">PATTERN MODIFIERS</a><br> | <br><a name="SEC6" href="#TOC1">PATTERN MODIFIERS</a><br> |
<P> |
<P> |
A pattern may be followed by any number of modifiers, which are mostly single |
A pattern may be followed by any number of modifiers, which are mostly single |
characters. Following Perl usage, these are referred to below as, for example, | characters, though some of these can be qualified by further characters. |
"the <b>/i</b> modifier", even though the delimiter of the pattern need not | Following Perl usage, these are referred to below as, for example, "the |
always be a slash, and no slash is used when writing modifiers. White space may | <b>/i</b> modifier", even though the delimiter of the pattern need not always be |
appear between the final pattern delimiter and the first modifier, and between | a slash, and no slash is used when writing modifiers. White space may appear |
the modifiers themselves. | between the final pattern delimiter and the first modifier, and between the |
| modifiers themselves. For reference, here is a complete list of modifiers. They |
| fall into several groups that are described in detail in the following |
| sections. |
| <pre> |
| <b>/8</b> set UTF mode |
| <b>/9</b> set PCRE_NEVER_UTF (locks out UTF mode) |
| <b>/?</b> disable UTF validity check |
| <b>/+</b> show remainder of subject after match |
| <b>/=</b> show all captures (not just those that are set) |
| |
| <b>/A</b> set PCRE_ANCHORED |
| <b>/B</b> show compiled code |
| <b>/C</b> set PCRE_AUTO_CALLOUT |
| <b>/D</b> same as <b>/B</b> plus <b>/I</b> |
| <b>/E</b> set PCRE_DOLLAR_ENDONLY |
| <b>/F</b> flip byte order in compiled pattern |
| <b>/f</b> set PCRE_FIRSTLINE |
| <b>/G</b> find all matches (shorten string) |
| <b>/g</b> find all matches (use startoffset) |
| <b>/I</b> show information about pattern |
| <b>/i</b> set PCRE_CASELESS |
| <b>/J</b> set PCRE_DUPNAMES |
| <b>/K</b> show backtracking control names |
| <b>/L</b> set locale |
| <b>/M</b> show compiled memory size |
| <b>/m</b> set PCRE_MULTILINE |
| <b>/N</b> set PCRE_NO_AUTO_CAPTURE |
| <b>/P</b> use the POSIX wrapper |
| <b>/S</b> study the pattern after compilation |
| <b>/s</b> set PCRE_DOTALL |
| <b>/T</b> select character tables |
| <b>/U</b> set PCRE_UNGREEDY |
| <b>/W</b> set PCRE_UCP |
| <b>/X</b> set PCRE_EXTRA |
| <b>/x</b> set PCRE_EXTENDED |
| <b>/Y</b> set PCRE_NO_START_OPTIMIZE |
| <b>/Z</b> don't show lengths in <b>/B</b> output |
| |
| <b>/<any></b> set PCRE_NEWLINE_ANY |
| <b>/<anycrlf></b> set PCRE_NEWLINE_ANYCRLF |
| <b>/<cr></b> set PCRE_NEWLINE_CR |
| <b>/<crlf></b> set PCRE_NEWLINE_CRLF |
| <b>/<lf></b> set PCRE_NEWLINE_LF |
| <b>/<bsr_anycrlf></b> set PCRE_BSR_ANYCRLF |
| <b>/<bsr_unicode></b> set PCRE_BSR_UNICODE |
| <b>/<JS></b> set PCRE_JAVASCRIPT_COMPAT |
| |
| </PRE> |
</P> |
</P> |
|
<br><b> |
|
Perl-compatible modifiers |
|
</b><br> |
<P> |
<P> |
The <b>/i</b>, <b>/m</b>, <b>/s</b>, and <b>/x</b> modifiers set the PCRE_CASELESS, |
The <b>/i</b>, <b>/m</b>, <b>/s</b>, and <b>/x</b> modifiers set the PCRE_CASELESS, |
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when |
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when |
<b>pcre_compile()</b> is called. These four modifier letters have the same | <b>pcre[16|32]_compile()</b> is called. These four modifier letters have the same |
effect as they do in Perl. For example: |
effect as they do in Perl. For example: |
<pre> |
<pre> |
/caseless/i |
/caseless/i |
</pre> | |
| </PRE> |
| </P> |
| <br><b> |
| Modifiers for other PCRE options |
| </b><br> |
| <P> |
The following table shows additional modifiers for setting PCRE compile-time |
The following table shows additional modifiers for setting PCRE compile-time |
options that do not correspond to anything in Perl: |
options that do not correspond to anything in Perl: |
<pre> |
<pre> |
<b>/8</b> PCRE_UTF8 | <b>/8</b> PCRE_UTF8 ) when using the 8-bit |
<b>/?</b> PCRE_NO_UTF8_CHECK | <b>/?</b> PCRE_NO_UTF8_CHECK ) library |
| |
| <b>/8</b> PCRE_UTF16 ) when using the 16-bit |
| <b>/?</b> PCRE_NO_UTF16_CHECK ) library |
| |
| <b>/8</b> PCRE_UTF32 ) when using the 32-bit |
| <b>/?</b> PCRE_NO_UTF32_CHECK ) library |
| |
| <b>/9</b> PCRE_NEVER_UTF |
<b>/A</b> PCRE_ANCHORED |
<b>/A</b> PCRE_ANCHORED |
<b>/C</b> PCRE_AUTO_CALLOUT |
<b>/C</b> PCRE_AUTO_CALLOUT |
<b>/E</b> PCRE_DOLLAR_ENDONLY |
<b>/E</b> PCRE_DOLLAR_ENDONLY |
Line 239 options that do not correspond to anything in Perl:
|
Line 423 options that do not correspond to anything in Perl:
|
<b>/W</b> PCRE_UCP |
<b>/W</b> PCRE_UCP |
<b>/X</b> PCRE_EXTRA |
<b>/X</b> PCRE_EXTRA |
<b>/Y</b> PCRE_NO_START_OPTIMIZE |
<b>/Y</b> PCRE_NO_START_OPTIMIZE |
<b>/<JS></b> PCRE_JAVASCRIPT_COMPAT | <b>/<any></b> PCRE_NEWLINE_ANY |
| <b>/<anycrlf></b> PCRE_NEWLINE_ANYCRLF |
<b>/<cr></b> PCRE_NEWLINE_CR |
<b>/<cr></b> PCRE_NEWLINE_CR |
<b>/<lf></b> PCRE_NEWLINE_LF |
|
<b>/<crlf></b> PCRE_NEWLINE_CRLF |
<b>/<crlf></b> PCRE_NEWLINE_CRLF |
<b>/<anycrlf></b> PCRE_NEWLINE_ANYCRLF | <b>/<lf></b> PCRE_NEWLINE_LF |
<b>/<any></b> PCRE_NEWLINE_ANY | |
<b>/<bsr_anycrlf></b> PCRE_BSR_ANYCRLF |
<b>/<bsr_anycrlf></b> PCRE_BSR_ANYCRLF |
<b>/<bsr_unicode></b> PCRE_BSR_UNICODE |
<b>/<bsr_unicode></b> PCRE_BSR_UNICODE |
|
<b>/<JS></b> PCRE_JAVASCRIPT_COMPAT |
</pre> |
</pre> |
The modifiers that are enclosed in angle brackets are literal strings as shown, |
The modifiers that are enclosed in angle brackets are literal strings as shown, |
including the angle brackets, but the letters within can be in either case. |
including the angle brackets, but the letters within can be in either case. |
Line 254 This example sets multiline matching with CRLF as the
|
Line 438 This example sets multiline matching with CRLF as the
|
<pre> |
<pre> |
/^abc/m<CRLF> |
/^abc/m<CRLF> |
</pre> |
</pre> |
As well as turning on the PCRE_UTF8 option, the <b>/8</b> modifier also causes | As well as turning on the PCRE_UTF8/16/32 option, the <b>/8</b> modifier causes |
any non-printing characters in output strings to be printed using the | all non-printing characters in output strings to be printed using the |
\x{hh...} notation if they are valid UTF-8 sequences. Full details of the PCRE | \x{hh...} notation. Otherwise, those less than 0x100 are output in hex without |
options are given in the | the curly brackets. |
| </P> |
| <P> |
| Full details of the PCRE options are given in the |
<a href="pcreapi.html"><b>pcreapi</b></a> |
<a href="pcreapi.html"><b>pcreapi</b></a> |
documentation. |
documentation. |
</P> |
</P> |
Line 269 Searching for all possible matches within each subject
|
Line 456 Searching for all possible matches within each subject
|
by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called |
by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called |
again to search the remainder of the subject string. The difference between |
again to search the remainder of the subject string. The difference between |
<b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to |
<b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to |
<b>pcre_exec()</b> to start searching at a new point within the entire string | <b>pcre[16|32]_exec()</b> to start searching at a new point within the entire |
(which is in effect what Perl does), whereas the latter passes over a shortened | string (which is in effect what Perl does), whereas the latter passes over a |
substring. This makes a difference to the matching process if the pattern | shortened substring. This makes a difference to the matching process if the |
begins with a lookbehind assertion (including \b or \B). | pattern begins with a lookbehind assertion (including \b or \B). |
</P> |
</P> |
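<P>
The effect of the two strategies can be seen by analogy in Python's <b>re</b>
module, which is used here only as a stand-in and is not PCRE. Searching the
whole subject from a start position behaves like <b>/g</b>, and searching a
slice behaves like <b>/G</b>. With a pattern that begins with \b, the second
search gives different results because the slice hides the preceding
character.
</P>

```python
import re

# Pattern starting with \b, which must look at the character before the
# current position.
rx = re.compile(r"\bab")
subject = "abab"

first = rx.search(subject)
offset = first.end()  # position just after the first match

# /g style: keep the whole subject and search from the offset. The 'b' at
# offset - 1 is still visible, so there is no word boundary before "ab".
by_offset = rx.search(subject, offset)

# /G style: search a shortened string. The slice starts at "ab", so \b
# matches at its beginning.
by_shortening = rx.search(subject[offset:])
```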
<P> |
<P> |
If any call to <b>pcre_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches an | If any call to <b>pcre[16|32]_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches |
empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and | an empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and |
PCRE_ANCHORED flags set in order to search for another, non-empty, match at the |
PCRE_ANCHORED flags set in order to search for another, non-empty, match at the |
same point. If this second match fails, the start offset is advanced, and the |
same point. If this second match fails, the start offset is advanced, and the |
normal match is retried. This imitates the way Perl handles such cases when |
normal match is retried. This imitates the way Perl handles such cases when |
Line 300 contains multiple copies of the same substring. If the
|
Line 487 contains multiple copies of the same substring. If the
|
twice, the same action is taken for captured substrings. In each case the |
twice, the same action is taken for captured substrings. In each case the |
remainder is output on the following line with a plus character following the |
remainder is output on the following line with a plus character following the |
capture number. Note that this modifier must not immediately follow the /S |
capture number. Note that this modifier must not immediately follow the /S |
modifier because /S+ has another meaning. | modifier because /S+ and /S++ have other meanings. |
</P> |
</P> |
<P> |
<P> |
The <b>/=</b> modifier requests that the values of all potential captured |
The <b>/=</b> modifier requests that the values of all potential captured |
parentheses be output after a match by <b>pcre_exec()</b>. By default, only | parentheses be output after a match. By default, only those up to the highest |
those up to the highest one actually used in the match are output | one actually used in the match are output (corresponding to the return code |
(corresponding to the return code from <b>pcre_exec()</b>). Values in the | from <b>pcre[16|32]_exec()</b>). Values in the offsets vector corresponding to |
offsets vector corresponding to higher numbers should be set to -1, and these | higher numbers should be set to -1, and these are output as "<unset>". This |
are output as "<unset>". This modifier gives a way of checking that this is | modifier gives a way of checking that this is happening. |
happening. | |
</P> |
</P> |
<P> |
<P> |
The <b>/B</b> modifier is a debugging feature. It requests that <b>pcretest</b> |
The <b>/B</b> modifier is a debugging feature. It requests that <b>pcretest</b> |
output a representation of the compiled byte code after compilation. Normally | output a representation of the compiled code after compilation. Normally this |
this information contains length and offset values; however, if <b>/Z</b> is | information contains length and offset values; however, if <b>/Z</b> is also |
also present, this data is replaced by spaces. This is a special feature for | present, this data is replaced by spaces. This is a special feature for use in |
use in the automatic test scripts; it ensures that the same output is generated | the automatic test scripts; it ensures that the same output is generated for |
for different internal link sizes. | different internal link sizes. |
</P> |
</P> |
<P> |
<P> |
The <b>/D</b> modifier is a PCRE debugging feature, and is equivalent to |
The <b>/D</b> modifier is a PCRE debugging feature, and is equivalent to |
Line 325 The <b>/D</b> modifier is a PCRE debugging feature, an
|
Line 511 The <b>/D</b> modifier is a PCRE debugging feature, an
|
</P> |
</P> |
<P> |
<P> |
The <b>/F</b> modifier causes <b>pcretest</b> to flip the byte order of the |
The <b>/F</b> modifier causes <b>pcretest</b> to flip the byte order of the |
fields in the compiled pattern that contain 2-byte and 4-byte numbers. This | 2-byte and 4-byte fields in the compiled pattern. This facility is for testing |
facility is for testing the feature in PCRE that allows it to execute patterns | the feature in PCRE that allows it to execute patterns that were compiled on a |
that were compiled on a host with a different endianness. This feature is not | host with a different endianness. This feature is not available when the POSIX |
available when the POSIX interface to PCRE is being used, that is, when the | interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is |
<b>/P</b> pattern modifier is specified. See also the section about saving and | specified. See also the section about saving and reloading compiled patterns |
reloading compiled patterns below. | below. |
</P> |
</P> |
<P> |
<P> |
The <b>/I</b> modifier requests that <b>pcretest</b> output information about the |
The <b>/I</b> modifier requests that <b>pcretest</b> output information about the |
compiled pattern (whether it is anchored, has a fixed first character, and |
compiled pattern (whether it is anchored, has a fixed first character, and |
so on). It does this by calling <b>pcre_fullinfo()</b> after compiling a | so on). It does this by calling <b>pcre[16|32]_fullinfo()</b> after compiling a |
pattern. If the pattern is studied, the results of that are also output. |
pattern. If the pattern is studied, the results of that are also output. |
</P> |
</P> |
<P> |
<P> |
The <b>/K</b> modifier requests <b>pcretest</b> to show names from backtracking |
The <b>/K</b> modifier requests <b>pcretest</b> to show names from backtracking |
control verbs that are returned from calls to <b>pcre_exec()</b>. It causes | control verbs that are returned from calls to <b>pcre[16|32]_exec()</b>. It causes |
<b>pcretest</b> to create a <b>pcre_extra</b> block if one has not already been | <b>pcretest</b> to create a <b>pcre[16|32]_extra</b> block if one has not already |
created by a call to <b>pcre_study()</b>, and to set the PCRE_EXTRA_MARK flag | been created by a call to <b>pcre[16|32]_study()</b>, and to set the |
and the <b>mark</b> field within it, every time that <b>pcre_exec()</b> is | PCRE_EXTRA_MARK flag and the <b>mark</b> field within it, every time that |
called. If the variable that the <b>mark</b> field points to is non-NULL for a | <b>pcre[16|32]_exec()</b> is called. If the variable that the <b>mark</b> field |
match, non-match, or partial match, <b>pcretest</b> prints the string to which | points to is non-NULL for a match, non-match, or partial match, <b>pcretest</b> |
it points. For a match, this is shown on a line by itself, tagged with "MK:". | prints the string to which it points. For a match, this is shown on a line by |
For a non-match it is added to the message. | itself, tagged with "MK:". For a non-match it is added to the message. |
</P> |
</P> |
<P> |
<P> |
The <b>/L</b> modifier must be followed directly by the name of a locale, for |
The <b>/L</b> modifier must be followed directly by the name of a locale, for |
Line 356 example,
|
Line 542 example,
|
/pattern/Lfr_FR |
/pattern/Lfr_FR |
</pre> |
</pre> |
For this reason, it must be the last modifier. The given locale is set, |
For this reason, it must be the last modifier. The given locale is set, |
<b>pcre_maketables()</b> is called to build a set of character tables for the | <b>pcre[16|32]_maketables()</b> is called to build a set of character tables for |
locale, and this is then passed to <b>pcre_compile()</b> when compiling the | the locale, and this is then passed to <b>pcre[16|32]_compile()</b> when compiling |
regular expression. Without an <b>/L</b> (or <b>/T</b>) modifier, NULL is passed | the regular expression. Without an <b>/L</b> (or <b>/T</b>) modifier, NULL is |
as the tables pointer; that is, <b>/L</b> applies only to the expression on | passed as the tables pointer; that is, <b>/L</b> applies only to the expression |
which it appears. | on which it appears. |
</P> |
</P> |
<P> |
<P> |
The <b>/M</b> modifier causes the size of memory block used to hold the compiled | The <b>/M</b> modifier causes the size in bytes of the memory block used to hold |
pattern to be output. This does not include the size of the <b>pcre</b> block; | the compiled pattern to be output. This does not include the size of the |
it is just the actual compiled data. If the pattern is successfully studied | <b>pcre[16|32]</b> block; it is just the actual compiled data. If the pattern is |
with the PCRE_STUDY_JIT_COMPILE option, the size of the JIT compiled code is | successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the |
also output. | JIT compiled code is also output. |
</P> |
</P> |
<P> |
<P> |
If the <b>/S</b> modifier appears once, it causes <b>pcre_study()</b> to be | The <b>/S</b> modifier causes <b>pcre[16|32]_study()</b> to be called after the |
called after the expression has been compiled, and the results used when the | expression has been compiled, and the results used when the expression is |
expression is matched. If <b>/S</b> appears twice, it suppresses studying, even | matched. There are a number of qualifying characters that may follow <b>/S</b>. |
| They may appear in any order. |
| </P> |
| <P> |
| If <b>S</b> is followed by an exclamation mark, <b>pcre[16|32]_study()</b> is called |
| with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a |
| <b>pcre_extra</b> block, even when studying discovers no useful information. |
| </P> |
| <P> |
| If <b>/S</b> is followed by a second S character, it suppresses studying, even |
if it was requested externally by the <b>-s</b> command line option. This makes |
if it was requested externally by the <b>-s</b> command line option. This makes |
it possible to specify that certain patterns are always studied, and others are |
it possible to specify that certain patterns are always studied, and others are |
never studied, independently of <b>-s</b>. This feature is used in the test |
never studied, independently of <b>-s</b>. This feature is used in the test |
files in a few cases where the output is different when the pattern is studied. |
files in a few cases where the output is different when the pattern is studied. |
</P> |
</P> |
<P> |
<P> |
If the <b>/S</b> modifier is immediately followed by a + character, the call to | If the <b>/S</b> modifier is followed by a + character, the call to |
<b>pcre_study()</b> is made with the PCRE_STUDY_JIT_COMPILE option, requesting | <b>pcre[16|32]_study()</b> is made with all the JIT study options, requesting |
just-in-time optimization support if it is available. Note that there is also a | just-in-time optimization support if it is available, for both normal and |
<b>/+</b> modifier; it must not be given immediately after <b>/S</b> because this | partial matching. If you want to restrict the JIT compiling modes, you can |
will be misinterpreted. If JIT studying is successful, it will automatically be | follow <b>/S+</b> with a digit in the range 1 to 7: |
used when <b>pcre_exec()</b> is run, except when incompatible run-time options | <pre> |
are specified. These include the partial matching options; a complete list is | 1 normal match only |
given in the | 2 soft partial match only |
| 3 normal match and soft partial match |
| 4 hard partial match only |
| 6 soft and hard partial match |
| 7 all three modes (default) |
| </pre> |
| If <b>/S++</b> is used instead of <b>/S+</b> (with or without a following digit), |
| the text "(JIT)" is added to the first output line after a match or no match |
| when JIT-compiled code was actually used. |
| </P> |
| <P> |
| Note that there is also an independent <b>/+</b> modifier; it must not be given |
| immediately after <b>/S</b> or <b>/S+</b> because this will be misinterpreted. |
| </P> |
| <P> |
| If JIT studying is successful, the compiled JIT code will automatically be used |
| when <b>pcre[16|32]_exec()</b> is run, except when incompatible run-time options |
| are specified. For more details, see the |
<a href="pcrejit.html"><b>pcrejit</b></a> |
<a href="pcrejit.html"><b>pcrejit</b></a> |
documentation. See also the <b>\J</b> escape sequence below for a way of |
documentation. See also the <b>\J</b> escape sequence below for a way of |
setting the size of the JIT stack. |
setting the size of the JIT stack. |
</P> |
</P> |
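<P>
The digits that may follow <b>/S+</b> look like a bit mask over the three JIT
compiling modes. The sketch below is an illustration only: the helper name
<b>jit_modes</b> and the bit assignment (1 = normal, 2 = soft partial, 4 = hard
partial) are assumptions inferred from the table, not something
<b>pcretest</b> itself exposes.
</P>

```python
# Hypothetical helper: interpret the optional digit after /S+ as a bit mask.
# Assumption: bit 1 = normal match, bit 2 = soft partial, bit 4 = hard partial.
# This agrees with the table in the text, which does not spell the rule out.
JIT_MODE_BITS = [
    (1, "normal match"),
    (2, "soft partial match"),
    (4, "hard partial match"),
]


def jit_modes(digit=7):
    """Return the JIT compile modes selected by a /S+ digit (default 7)."""
    if not 1 <= digit <= 7:
        raise ValueError("digit must be in the range 1 to 7")
    return [name for bit, name in JIT_MODE_BITS if digit & bit]


if __name__ == "__main__":
    for d in range(1, 8):
        print(d, ", ".join(jit_modes(d)))
```

<P>
Under this reading, <b>/S+3</b> selects normal and soft partial matching, and
omitting the digit is the same as giving 7.
</P>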
<P> |
<P> |
|
Finally, if <b>/S</b> is followed by a minus character, JIT compilation is |
|
suppressed, even if it was requested externally by the <b>-s</b> command line |
|
option. This makes it possible to specify that JIT is never to be used for |
|
certain patterns. |
|
</P> |
|
<P> |
The <b>/T</b> modifier must be followed by a single digit. It causes a specific |
The <b>/T</b> modifier must be followed by a single digit. It causes a specific |
set of built-in character tables to be passed to <b>pcre_compile()</b>. It is | set of built-in character tables to be passed to <b>pcre[16|32]_compile()</b>. It |
used in the standard PCRE tests to check behaviour with different character | is used in the standard PCRE tests to check behaviour with different character |
tables. The digit specifies the tables as follows: |
tables. The digit specifies the tables as follows: |
<pre> |
<pre> |
0 the default ASCII tables, as distributed in |
0 the default ASCII tables, as distributed in |
Line 409 Using the POSIX wrapper API
|
Line 627 Using the POSIX wrapper API
|
</b><br> |
</b><br> |
<P> |
<P> |
The <b>/P</b> modifier causes <b>pcretest</b> to call PCRE via the POSIX wrapper |
The <b>/P</b> modifier causes <b>pcretest</b> to call PCRE via the POSIX wrapper |
API rather than its native API. When <b>/P</b> is set, the following modifiers | API rather than its native API. This supports only the 8-bit library. When |
set options for the <b>regcomp()</b> function: | <b>/P</b> is set, the following modifiers set options for the <b>regcomp()</b> |
| function: |
<pre> |
<pre> |
/i REG_ICASE |
/i REG_ICASE |
/m REG_NEWLINE |
/m REG_NEWLINE |
Line 423 set options for the <b>regcomp()</b> function:
|
Line 642 set options for the <b>regcomp()</b> function:
|
The <b>/+</b> modifier works as described above. All other modifiers are |
The <b>/+</b> modifier works as described above. All other modifiers are |
ignored. |
ignored. |
</P> |
</P> |
<br><a name="SEC5" href="#TOC1">DATA LINES</a><br> | <br><a name="SEC7" href="#TOC1">DATA LINES</a><br> |
<P> |
<P> |
Before each data line is passed to <b>pcre_exec()</b>, leading and trailing | Before each data line is passed to <b>pcre[16|32]_exec()</b>, leading and trailing |
white space is removed, and it is then scanned for \ escapes. Some of these |
white space is removed, and it is then scanned for \ escapes. Some of these |
are pretty esoteric features, intended for checking out some of the more |
are pretty esoteric features, intended for checking out some of the more |
complicated features of PCRE. If you are just testing "ordinary" regular |
complicated features of PCRE. If you are just testing "ordinary" regular |
Line 441 recognized:
|
Line 660 recognized:
|
\r carriage return (\x0d) |
\r carriage return (\x0d) |
\t tab (\x09) |
\t tab (\x09) |
\v vertical tab (\x0b) |
\v vertical tab (\x0b) |
\nnn octal character (up to 3 octal digits) | \nnn octal character (up to 3 octal digits); always |
always a byte unless > 255 in UTF-8 mode | a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode |
\xhh hexadecimal byte (up to 2 hex digits) |
\xhh hexadecimal byte (up to 2 hex digits) |
\x{hh...} hexadecimal character, any number of digits in UTF-8 mode | \x{hh...} hexadecimal character (any number of hex digits) |
\A pass the PCRE_ANCHORED option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \A pass the PCRE_ANCHORED option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
\B pass the PCRE_NOTBOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \B pass the PCRE_NOTBOL option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
\Cdd call pcre_copy_substring() for substring dd after a successful match (number less than 32) | \Cdd call pcre[16|32]_copy_substring() for substring dd after a successful match (number less than 32) |
\Cname call pcre_copy_named_substring() for substring "name" after a successful match (name termin- | \Cname call pcre[16|32]_copy_named_substring() for substring "name" after a successful match (name termin- |
ated by next non alphanumeric character) |
ated by next non alphanumeric character) |
\C+ show the current captured substrings at callout time |
\C+ show the current captured substrings at callout time |
\C- do not supply a callout function |
\C- do not supply a callout function |
\C!n return 1 instead of 0 when callout number n is reached |
\C!n return 1 instead of 0 when callout number n is reached |
\C!n!m return 1 instead of 0 when callout number n is reached for the mth time |
\C!n!m return 1 instead of 0 when callout number n is reached for the mth time |
\C*n pass the number n (may be negative) as callout data; this is used as the callout return value |
\C*n pass the number n (may be negative) as callout data; this is used as the callout return value |
\D use the <b>pcre_dfa_exec()</b> match function | \D use the <b>pcre[16|32]_dfa_exec()</b> match function |
\F only shortest match for <b>pcre_dfa_exec()</b> | \F only shortest match for <b>pcre[16|32]_dfa_exec()</b> |
\Gdd call pcre_get_substring() for substring dd after a successful match (number less than 32) | \Gdd call pcre[16|32]_get_substring() for substring dd after a successful match (number less than 32) |
\Gname call pcre_get_named_substring() for substring "name" after a successful match (name termin- | \Gname call pcre[16|32]_get_named_substring() for substring "name" after a successful match (name termin- |
ated by next non-alphanumeric character) |
ated by next non-alphanumeric character) |
\Jdd set up a JIT stack of dd kilobytes maximum (any number of digits) |
\Jdd set up a JIT stack of dd kilobytes maximum (any number of digits) |
\L call pcre_get_substringlist() after a successful match | \L call pcre[16|32]_get_substringlist() after a successful match |
\M discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings |
\M discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings |
\N pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>; if used twice, pass the | \N pass the PCRE_NOTEMPTY option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>; if used twice, pass the |
PCRE_NOTEMPTY_ATSTART option |
PCRE_NOTEMPTY_ATSTART option |
\Odd set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits) | \Odd set the size of the output vector passed to <b>pcre[16|32]_exec()</b> to dd (any number of digits) |
\P pass the PCRE_PARTIAL_SOFT option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>; if used twice, pass the | \P pass the PCRE_PARTIAL_SOFT option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>; if used twice, pass the |
PCRE_PARTIAL_HARD option |
PCRE_PARTIAL_HARD option |
\Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits) |
\Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits) |
\R pass the PCRE_DFA_RESTART option to <b>pcre_dfa_exec()</b> | \R pass the PCRE_DFA_RESTART option to <b>pcre[16|32]_dfa_exec()</b> |
\S output details of memory get/free calls during matching |
\S output details of memory get/free calls during matching |
\Y pass the PCRE_NO_START_OPTIMIZE option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \Y pass the PCRE_NO_START_OPTIMIZE option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
\Z pass the PCRE_NOTEOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \Z pass the PCRE_NOTEOL option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
\? pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \? pass the PCRE_NO_UTF[8|16|32]_CHECK option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
\>dd start the match at offset dd (optional "-"; then any number of digits); this sets the <i>startoffset</i> |
\>dd start the match at offset dd (optional "-"; then any number of digits); this sets the <i>startoffset</i> |
argument for <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | argument for <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
\<cr> pass the PCRE_NEWLINE_CR option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \<cr> pass the PCRE_NEWLINE_CR option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
\<lf> pass the PCRE_NEWLINE_LF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \<lf> pass the PCRE_NEWLINE_LF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
\<crlf> pass the PCRE_NEWLINE_CRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \<crlf> pass the PCRE_NEWLINE_CRLF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
\<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
\<any> pass the PCRE_NEWLINE_ANY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \<any> pass the PCRE_NEWLINE_ANY option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
</pre> |
</pre> |
Note that \xhh always specifies one byte, even in UTF-8 mode; this makes it | The use of \x{hh...} is not dependent on the use of the <b>/8</b> modifier on |
possible to construct invalid UTF-8 sequences for testing purposes. On the | the pattern. It is recognized always. There may be any number of hexadecimal |
other hand, \x{hh} is interpreted as a UTF-8 character in UTF-8 mode, | digits inside the braces; invalid values provoke error messages. |
generating more than one byte if the value is greater than 127. When not in | |
UTF-8 mode, it generates one byte for values less than 256, and causes an error | |
for greater values. | |
</P> |
</P> |
<P> |
<P> |
|
Note that \xhh specifies one byte rather than one character in UTF-8 mode; |
|
this makes it possible to construct invalid UTF-8 sequences for testing |
|
purposes. On the other hand, \x{hh} is interpreted as a UTF-8 character in |
|
UTF-8 mode, generating more than one byte if the value is greater than 127. |
|
When testing the 8-bit library not in UTF-8 mode, \x{hh} generates one byte |
|
for values less than 256, and causes an error for greater values. |
|
</P> |
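<P>
The difference between \xhh and \x{hh...} for the 8-bit library can be
sketched as follows. This is an illustrative model, not <b>pcretest</b>'s own
code: the function name <b>escape_to_bytes</b> is hypothetical, and because it
relies on Python's UTF-8 encoder it only handles values up to 0x10FFFF.
</P>

```python
def escape_to_bytes(value, braces, utf8_mode):
    """Bytes that a \\xhh (braces=False) or \\x{hh...} (braces=True) escape
    would add to an 8-bit subject string, following the rules in the text."""
    if not braces:
        # \xhh: always exactly one byte, even in UTF-8 mode.
        if not 0 <= value <= 0xFF:
            raise ValueError("\\xhh takes at most two hex digits")
        return bytes([value])
    if utf8_mode:
        # \x{hh...}: one UTF-8 character, more than one byte above 127.
        # Assumption of this sketch: Python's encoder stops at U+10FFFF.
        return chr(value).encode("utf-8", "surrogatepass")
    if value > 0xFF:
        raise ValueError("values above 255 are an error outside UTF-8 mode")
    return bytes([value])


if __name__ == "__main__":
    print(escape_to_bytes(0xE9, braces=False, utf8_mode=True))
    print(escape_to_bytes(0xE9, braces=True, utf8_mode=True))
```

<P>
The first call yields a lone byte, which on its own is not valid UTF-8; this
is how \xhh lets a test construct invalid sequences on purpose.
</P>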
|
<P> |
|
In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it |
|
possible to construct invalid UTF-16 sequences for testing purposes. |
|
</P> |
|
<P> |
|
In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This makes it |
|
possible to construct invalid UTF-32 sequences for testing purposes. |
|
</P> |
|
<P> |
The escapes that specify line ending sequences are literal strings, exactly as |
The escapes that specify line ending sequences are literal strings, exactly as |
shown. No more than one newline setting should be present in any data line. |
shown. No more than one newline setting should be present in any data line. |
</P> |
</P> |
Line 506 is not being used. Providing a stack that is larger th
|
Line 738 is not being used. Providing a stack that is larger th
|
necessary only for very complicated patterns. |
necessary only for very complicated patterns. |
</P> |
</P> |
<P> |
<P> |
If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with | If \M is present, <b>pcretest</b> calls <b>pcre[16|32]_exec()</b> several times, |
different values in the <i>match_limit</i> and <i>match_limit_recursion</i> | with different values in the <i>match_limit</i> and <i>match_limit_recursion</i> |
fields of the <b>pcre_extra</b> data structure, until it finds the minimum | fields of the <b>pcre[16|32]_extra</b> data structure, until it finds the minimum |
numbers for each parameter that allow <b>pcre_exec()</b> to complete without | numbers for each parameter that allow <b>pcre[16|32]_exec()</b> to complete without |
error. Because this is testing a specific feature of the normal interpretive |
error. Because this is testing a specific feature of the normal interpretive |
<b>pcre_exec()</b> execution, the use of any JIT optimization that might have | <b>pcre[16|32]_exec()</b> execution, the use of any JIT optimization that might |
been set up by the <b>/S+</b> qualifier or the <b>-s+</b> option is disabled. | have been set up by the <b>/S+</b> qualifier or the <b>-s+</b> option is disabled. |
</P> |
</P> |
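<P>
The idea behind \M, finding the smallest limit that still lets the match
complete, can be sketched as a doubling phase followed by a binary search.
This is only an assumed strategy for illustration; the text does not say how
<b>pcretest</b> chooses its trial values. The callback <b>succeeds</b> is a
hypothetical stand-in for one match attempt with a given limit.
</P>

```python
def minimum_limit(succeeds, upper=10_000_000):
    """Smallest limit in 1..upper for which succeeds(limit) is true.

    Assumes succeeds() is monotonic: once a limit is large enough,
    every larger limit also works.
    """
    lo, hi = 1, 1
    # Phase 1: double the trial limit until the match completes.
    while not succeeds(hi):
        if hi >= upper:
            raise RuntimeError("no limit up to the cap allows completion")
        lo = hi + 1
        hi = min(hi * 2, upper)
    # Phase 2: binary search between the last failure and the first success.
    while lo < hi:
        mid = (lo + hi) // 2
        if succeeds(mid):
            hi = mid
        else:
            lo = mid + 1
    return hi


if __name__ == "__main__":
    # Stand-in for a pattern that needs a limit of at least 37 to complete.
    print(minimum_limit(lambda limit: limit >= 37))
```

<P>
The same routine would be run once for each of the two limits.
</P>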
<P> |
<P> |
The <i>match_limit</i> number is a measure of the amount of backtracking |
The <i>match_limit</i> number is a measure of the amount of backtracking |
Line 526 needed to complete the match attempt.
|
Line 758 needed to complete the match attempt.
|
<P> |
<P> |
When \O is used, the value specified may be higher or lower than the size set |
When \O is used, the value specified may be higher or lower than the size set |
by the <b>-O</b> command line option (or defaulted to 45); \O applies only to |
by the <b>-O</b> command line option (or defaulted to 45); \O applies only to |
the call of <b>pcre_exec()</b> for the line in which it appears. | the call of <b>pcre[16|32]_exec()</b> for the line in which it appears. |
</P> |
</P> |
<P> |
<P> |
If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper |
If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper |
Line 534 API to be used, the only option-setting sequences that
|
Line 766 API to be used, the only option-setting sequences that
|
\N, and \Z, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, |
\N, and \Z, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, |
to be passed to <b>regexec()</b>. |
to be passed to <b>regexec()</b>. |
</P> |
</P> |
|
<br><a name="SEC8" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br> |
<P> |
<P> |
The use of \x{hh...} to represent UTF-8 characters is not dependent on the use |
|
of the <b>/8</b> modifier on the pattern. It is recognized always. There may be |
|
any number of hexadecimal digits inside the braces. The result is from one to |
|
six bytes, encoded according to the original UTF-8 rules of RFC 2279. This |
|
allows for values in the range 0 to 0x7FFFFFFF. Note that not all of those are |
|
valid Unicode code points, or indeed valid UTF-8 characters according to the |
|
later rules in RFC 3629. |
|
</P> |
|
<br><a name="SEC6" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br> |
|
<P> |
|
By default, <b>pcretest</b> uses the standard PCRE matching function, |
By default, <b>pcretest</b> uses the standard PCRE matching function, |
<b>pcre_exec()</b> to match each data line. From release 6.0, PCRE supports an | <b>pcre[16|32]_exec()</b> to match each data line. PCRE also supports an |
alternative matching function, <b>pcre_dfa_exec()</b>, which operates in a | alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, which operates in a |
different way, and has some restrictions. The differences between the two |
different way, and has some restrictions. The differences between the two |
functions are described in the |
functions are described in the |
<a href="pcrematching.html"><b>pcrematching</b></a> |
<a href="pcrematching.html"><b>pcrematching</b></a> |
Line 555 documentation.
|
Line 778 documentation.
|
</P> |
</P> |
<P> |
<P> |
If a data line contains the \D escape sequence, or if the command line |
If a data line contains the \D escape sequence, or if the command line |
contains the <b>-dfa</b> option, the alternative matching function is called. | contains the <b>-dfa</b> option, the alternative matching function is used. |
This function finds all possible matches at a given point. If, however, the \F |
This function finds all possible matches at a given point. If, however, the \F |
escape sequence is present in the data line, it stops after the first match is |
escape sequence is present in the data line, it stops after the first match is |
found. This is always the shortest possible match. |
found. This is always the shortest possible match. |
</P> |
</P> |
<br><a name="SEC7" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br> | <br><a name="SEC9" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br> |
<P> |
<P> |
This section describes the output when the normal matching function, |
This section describes the output when the normal matching function, |
<b>pcre_exec()</b>, is being used. | <b>pcre[16|32]_exec()</b>, is being used. |
</P> |
</P> |
<P> |
<P> |
When a match succeeds, <b>pcretest</b> outputs the list of captured substrings |
When a match succeeds, <b>pcretest</b> outputs the list of captured substrings |
that <b>pcre_exec()</b> returns, starting with number 0 for the string that | that <b>pcre[16|32]_exec()</b> returns, starting with number 0 for the string that |
matched the whole pattern. Otherwise, it outputs "No match" when the return is |
matched the whole pattern. Otherwise, it outputs "No match" when the return is |
PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching |
PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching |
substring when <b>pcre_exec()</b> returns PCRE_ERROR_PARTIAL. (Note that this is | substring when <b>pcre[16|32]_exec()</b> returns PCRE_ERROR_PARTIAL. (Note that |
the entire substring that was inspected during the partial match; it may | this is the entire substring that was inspected during the partial match; it |
include characters before the actual match start if a lookbehind assertion, | may include characters before the actual match start if a lookbehind assertion, |
\K, \b, or \B was involved.) For any other return, <b>pcretest</b> outputs |
\K, \b, or \B was involved.) For any other return, <b>pcretest</b> outputs |
the PCRE negative error number and a short descriptive phrase. If the error is |
the PCRE negative error number and a short descriptive phrase. If the error is |
a failed UTF-8 string check, the byte offset of the start of the failing | a failed UTF string check, the offset of the start of the failing character and |
character and the reason code are also output, provided that the size of the | the reason code are also output, provided that the size of the output vector is |
output vector is at least two. Here is an example of an interactive | at least two. Here is an example of an interactive <b>pcretest</b> run. |
<b>pcretest</b> run. | |
<pre> |
<pre> |
$ pcretest |
$ pcretest |
PCRE version 8.13 2011-04-30 |
PCRE version 8.13 2011-04-30 |
Line 591 output vector is at least two. Here is an example of a
|
Line 813 output vector is at least two. Here is an example of a
|
No match |
No match |
</pre> |
</pre> |
Unset capturing substrings that are not followed by one that is set are not |
Unset capturing substrings that are not followed by one that is set are not |
returned by <b>pcre_exec()</b>, and are not shown by <b>pcretest</b>. In the | returned by <b>pcre[16|32]_exec()</b>, and are not shown by <b>pcretest</b>. In the |
following example, there are two capturing substrings, but when the first data |
following example, there are two capturing substrings, but when the first data |
line is matched, the second, unset substring is not shown. An "internal" unset |
line is matched, the second, unset substring is not shown. An "internal" unset |
substring is shown as "<unset>", as for the second data line. |
substring is shown as "<unset>", as for the second data line. |
Line 605 substring is shown as "<unset>", as for the se
|
Line 827 substring is shown as "<unset>", as for the se
|
1: <unset> |
1: <unset> |
2: b |
2: b |
</pre> |
</pre> |
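
The rule for unset substrings can be modelled with a short sketch. It assumes
the usual convention that the match function returns one more than the highest
numbered substring that was set, and that unset pairs in the output vector hold
-1. The function name and the exact column widths are illustrative assumptions.

```python
def format_captures(subject, rc, ovector):
    """Lines shown for a successful match.

    rc is the number of offset pairs that were set (highest set group + 1).
    ovector holds start/end offsets flattened; -1 marks an unset group.
    """
    lines = []
    for i in range(rc):
        start, end = ovector[2 * i], ovector[2 * i + 1]
        text = "<unset>" if start < 0 else subject[start:end]
        lines.append(f"{i:2d}: {text}")
    return lines


if __name__ == "__main__":
    # Illustrative pattern (a)|(b): two groups, only one can be set.
    print(format_captures("a", 2, [0, 1, 0, 1]))          # trailing group 2 omitted
    print(format_captures("b", 3, [0, 1, -1, -1, 0, 1]))  # group 1 is internal
```

Because the return value stops at the last set group, a trailing unset group
never reaches the loop, while an internal one does.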
If the strings contain any non-printing characters, they are output as \0x | If the strings contain any non-printing characters, they are output as \xhh |
escapes, or as \x{...} escapes if the <b>/8</b> modifier was present on the | escapes if the value is less than 256 and UTF mode is not set. Otherwise they |
pattern. See below for the definition of non-printing characters. If the | are output as \x{hh...} escapes. See below for the definition of non-printing |
pattern has the <b>/+</b> modifier, the output for substring 0 is followed by | characters. If the pattern has the <b>/+</b> modifier, the output for substring |
the rest of the subject string, identified by "0+" like this: | 0 is followed by the rest of the subject string, identified by "0+" like |
| this: |
<pre> |
<pre> |
re> /cat/+ |
re> /cat/+ |
data> cataract |
data> cataract |
Line 651 prompt is used for continuations), data lines may not.
|
Line 874 prompt is used for continuations), data lines may not.
|
included in data by means of the \n escape (or \r, \r\n, etc., depending on |
included in data by means of the \n escape (or \r, \r\n, etc., depending on |
the newline sequence setting). |
the newline sequence setting). |
</P> |
</P> |
<br><a name="SEC8" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br> | <br><a name="SEC10" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br> |
<P> |
<P> |
When the alternative matching function, <b>pcre_dfa_exec()</b>, is used (by | When the alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, is used (by |
means of the \D escape sequence or the <b>-dfa</b> command line option), the |
means of the \D escape sequence or the <b>-dfa</b> command line option), the |
output consists of a list of all the matches that start at the first point in |
output consists of a list of all the matches that start at the first point in |
the subject where there is at least one match. For example: |
the subject where there is at least one match. For example: |
Line 687 at the end of the longest match. For example:
|
Line 910 at the end of the longest match. For example:
|
Since the matching function does not support substring capture, the escape |
Since the matching function does not support substring capture, the escape |
sequences that are concerned with captured substrings are not relevant. |
sequences that are concerned with captured substrings are not relevant. |
</P> |
</P> |
<br><a name="SEC9" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br> | <br><a name="SEC11" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br> |
<P> |
<P> |
When the alternative matching function has given the PCRE_ERROR_PARTIAL return, |
When the alternative matching function has given the PCRE_ERROR_PARTIAL return, |
indicating that the subject partially matched the pattern, you can restart the |
indicating that the subject partially matched the pattern, you can restart the |
Line 704 For further information about partial matching, see th
|
Line 927 For further information about partial matching, see th
|
<a href="pcrepartial.html"><b>pcrepartial</b></a> |
<a href="pcrepartial.html"><b>pcrepartial</b></a> |
documentation. |
documentation. |
</P> |
</P> |
<br><a name="SEC10" href="#TOC1">CALLOUTS</a><br> | <br><a name="SEC12" href="#TOC1">CALLOUTS</a><br> |
<P> |
<P> |
If the pattern contains any callout requests, <b>pcretest</b>'s callout function |
If the pattern contains any callout requests, <b>pcretest</b>'s callout function |
is called during matching. This works with both matching functions. By default, |
is called during matching. This works with both matching functions. By default, |
the called function displays the callout number, the start and current |
the called function displays the callout number, the start and current |
positions in the text at the callout time, and the next pattern item to be |
positions in the text at the callout time, and the next pattern item to be |
tested. For example, the output | tested. For example: |
<pre> |
<pre> |
--->pqrabcdef |
--->pqrabcdef |
0 ^ ^ \d |
0 ^ ^ \d |
</pre> |
</pre> |
indicates that callout number 0 occurred for a match attempt starting at the | This output indicates that callout number 0 occurred for a match attempt |
fourth character of the subject string, when the pointer was at the seventh | starting at the fourth character of the subject string, when the pointer was at |
character of the data, and when the next pattern item was \d. Just one | the seventh character of the data, and when the next pattern item was \d. Just |
circumflex is output if the start and current positions are the same. | one circumflex is output if the start and current positions are the same. |
</P> |
</P> |
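<P>
The layout of the default callout display can be approximated as below. This
is a rough sketch: the function name is hypothetical and the spacing is an
assumption, so it shows where the circumflexes fall rather than reproducing
<b>pcretest</b>'s exact columns.
</P>

```python
def format_callout(subject, number, start, current, next_item):
    """Two display lines: the subject, then the callout number, carets
    under the start and current offsets, and the next pattern item."""
    prefix = "--->"
    marks = [" "] * (max(start, current) + 1)
    marks[start] = "^"
    marks[current] = "^"  # same slot when start == current, so one caret
    header = prefix + subject
    pad = " " * max(1, len(subject) - len(marks) + 1)
    # The number field is as wide as the prefix so the carets line up.
    detail = f"{number:3d} " + "".join(marks) + pad + next_item
    return [header, detail]


if __name__ == "__main__":
    # Offsets 3 and 6 are the fourth and seventh characters of the subject.
    for line in format_callout("pqrabcdef", 0, 3, 6, r"\d"):
        print(line)
```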
<P> |
<P> |
Callouts numbered 255 are assumed to be automatic callouts, inserted as a |
Callouts numbered 255 are assumed to be automatic callouts, inserted as a |
Line 765 the
|
Line 988 the
|
<a href="pcrecallout.html"><b>pcrecallout</b></a> |
<a href="pcrecallout.html"><b>pcrecallout</b></a> |
documentation. |
documentation. |
</P> |
</P> |
<br><a name="SEC11" href="#TOC1">NON-PRINTING CHARACTERS</a><br> | <br><a name="SEC13" href="#TOC1">NON-PRINTING CHARACTERS</a><br> |
<P> |
<P> |
When <b>pcretest</b> is outputting text in the compiled version of a pattern, |
When <b>pcretest</b> is outputting text in the compiled version of a pattern, |
bytes other than 32-126 are always treated as non-printing characters and are |
bytes other than 32-126 are always treated as non-printing characters and are |
Line 777 string, it behaves in the same way, unless a different
|
Line 1000 string, it behaves in the same way, unless a different
|
the pattern (using the <b>/L</b> modifier). In this case, the <b>isprint()</b> |
the pattern (using the <b>/L</b> modifier). In this case, the <b>isprint()</b> |
function is used to distinguish printing and non-printing characters. |
function is used to distinguish printing and non-printing characters. |
</P> |
</P> |
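<P>
The default treatment of non-printing characters can be sketched as follows.
This models only the case where no locale has been set with <b>/L</b>, so it
uses the fixed range 32-126 rather than <b>isprint()</b>. The function names
and the lower-case, unpadded hex formatting are assumptions of the sketch.
</P>

```python
def show_char(value, utf_mode=False):
    """Render one character value as it would appear in matched output."""
    if 32 <= value <= 126:
        return chr(value)          # printing character, shown as-is
    if value < 256 and not utf_mode:
        return f"\\x{value:02x}"   # \xhh form
    return f"\\x{{{value:x}}}"     # \x{hh...} form


def show_string(values, utf_mode=False):
    """Render a sequence of character values."""
    return "".join(show_char(v, utf_mode) for v in values)


if __name__ == "__main__":
    print(show_string([0x61, 0x0A, 0x62]))
    print(show_string([0x61, 0x100], utf_mode=True))
```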
<br><a name="SEC12" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br> | <br><a name="SEC14" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br> |
<P> |
<P> |
The facilities described in this section are not available when the POSIX |
The facilities described in this section are not available when the POSIX |
interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is |
interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is |
Line 825 been loaded, <b>pcretest</b> proceeds to read data lin
|
Line 1048 been loaded, <b>pcretest</b> proceeds to read data lin
|
You can copy a file written by <b>pcretest</b> to a different host and reload it |
You can copy a file written by <b>pcretest</b> to a different host and reload it |
there, even if the new host has opposite endianness to the one on which the |
there, even if the new host has opposite endianness to the one on which the |
pattern was compiled. For example, you can compile on an i86 machine and run on |
pattern was compiled. For example, you can compile on an i86 machine and run on |
a SPARC machine. | a SPARC machine. When a pattern is reloaded on a host with different |
| endianness, the confirmation message is changed to: |
| <pre> |
| Compiled pattern (byte-inverted) loaded from /some/file |
| </pre> |
| The test suite contains some saved pre-compiled patterns with different |
| endianness. These are reloaded using "<!" instead of just "<". This suppresses |
| the "(byte-inverted)" text so that the output is the same on all hosts. It also |
| forces debugging output once the pattern has been reloaded. |
</P> |
</P> |
<P> |
<P> |
File names for saving and reloading can be absolute or relative, but note that |
File names for saving and reloading can be absolute or relative, but note that |
Line 842 string using a reloaded pattern is likely to cause <b>
|
Line 1073 string using a reloaded pattern is likely to cause <b>
|
Finally, if you attempt to load a file that is not in the correct format, the |
Finally, if you attempt to load a file that is not in the correct format, the |
result is undefined. |
result is undefined. |
</P> |
</P> |
<br><a name="SEC13" href="#TOC1">SEE ALSO</a><br> | <br><a name="SEC15" href="#TOC1">SEE ALSO</a><br> |
<P> |
<P> |
<b>pcre</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrejit</b>(3), | <b>pcre</b>(3), <b>pcre16</b>(3), <b>pcre32</b>(3), <b>pcreapi</b>(3), |
<b>pcrematching</b>(3), <b>pcrepartial</b>(3), <b>pcrepattern</b>(3), | <b>pcrecallout</b>(3), |
<b>pcreprecompile</b>(3). | <b>pcrejit</b>(3), <b>pcrematching</b>(3), <b>pcrepartial</b>(3), |
| <b>pcrepattern</b>(3), <b>pcreprecompile</b>(3). |
</P> |
</P> |
<br><a name="SEC14" href="#TOC1">AUTHOR</a><br> | <br><a name="SEC16" href="#TOC1">AUTHOR</a><br> |
<P> |
<P> |
Philip Hazel |
Philip Hazel |
<br> |
<br> |
Line 857 University Computing Service
|
Line 1089 University Computing Service
|
Cambridge CB2 3QH, England. |
Cambridge CB2 3QH, England. |
<br> |
<br> |
</P> |
</P> |
<br><a name="SEC15" href="#TOC1">REVISION</a><br> | <br><a name="SEC17" href="#TOC1">REVISION</a><br> |
<P> |
<P> |
Last updated: 02 December 2011 | Last updated: 26 April 2013 |
<br> |
<br> |
Copyright © 1997-2011 University of Cambridge. | Copyright © 1997-2013 University of Cambridge. |
<br> |
<br> |
<p> |
<p> |
Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |