|
version 1.1.1.1, 2012/02/21 23:05:52
|
version 1.1.1.4, 2013/07/22 08:25:57
|
|
Line 14 man page, in case the conversion went wrong.
|
Line 14 man page, in case the conversion went wrong.
|
| <br> |
<br> |
| <ul> |
<ul> |
| <li><a name="TOC1" href="#SEC1">SYNOPSIS</a> |
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a> |
| <li><a name="TOC2" href="#SEC2">COMMAND LINE OPTIONS</a> | <li><a name="TOC2" href="#SEC2">INPUT DATA FORMAT</a> |
| <li><a name="TOC3" href="#SEC3">DESCRIPTION</a> | <li><a name="TOC3" href="#SEC3">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a> |
| <li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a> | <li><a name="TOC4" href="#SEC4">COMMAND LINE OPTIONS</a> |
| <li><a name="TOC5" href="#SEC5">DATA LINES</a> | <li><a name="TOC5" href="#SEC5">DESCRIPTION</a> |
| <li><a name="TOC6" href="#SEC6">THE ALTERNATIVE MATCHING FUNCTION</a> | <li><a name="TOC6" href="#SEC6">PATTERN MODIFIERS</a> |
| <li><a name="TOC7" href="#SEC7">DEFAULT OUTPUT FROM PCRETEST</a> | <li><a name="TOC7" href="#SEC7">DATA LINES</a> |
| <li><a name="TOC8" href="#SEC8">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a> | <li><a name="TOC8" href="#SEC8">THE ALTERNATIVE MATCHING FUNCTION</a> |
| <li><a name="TOC9" href="#SEC9">RESTARTING AFTER A PARTIAL MATCH</a> | <li><a name="TOC9" href="#SEC9">DEFAULT OUTPUT FROM PCRETEST</a> |
| <li><a name="TOC10" href="#SEC10">CALLOUTS</a> | <li><a name="TOC10" href="#SEC10">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a> |
| <li><a name="TOC11" href="#SEC11">NON-PRINTING CHARACTERS</a> | <li><a name="TOC11" href="#SEC11">RESTARTING AFTER A PARTIAL MATCH</a> |
| <li><a name="TOC12" href="#SEC12">SAVING AND RELOADING COMPILED PATTERNS</a> | <li><a name="TOC12" href="#SEC12">CALLOUTS</a> |
| <li><a name="TOC13" href="#SEC13">SEE ALSO</a> | <li><a name="TOC13" href="#SEC13">NON-PRINTING CHARACTERS</a> |
| <li><a name="TOC14" href="#SEC14">AUTHOR</a> | <li><a name="TOC14" href="#SEC14">SAVING AND RELOADING COMPILED PATTERNS</a> |
| <li><a name="TOC15" href="#SEC15">REVISION</a> | <li><a name="TOC15" href="#SEC15">SEE ALSO</a> |
| | <li><a name="TOC16" href="#SEC16">AUTHOR</a> |
| | <li><a name="TOC17" href="#SEC17">REVISION</a> |
| </ul> |
</ul> |
| <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br> |
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br> |
| <P> |
<P> |
|
Line 42 details of the regular expressions themselves, see the
|
Line 44 details of the regular expressions themselves, see the
|
| documentation. For details of the PCRE library function calls and their |
documentation. For details of the PCRE library function calls and their |
| options, see the |
options, see the |
| <a href="pcreapi.html"><b>pcreapi</b></a> |
<a href="pcreapi.html"><b>pcreapi</b></a> |
| documentation. The input for <b>pcretest</b> is a sequence of regular expression | , |
| patterns and strings to be matched, as described below. The output shows the | <a href="pcre16.html"><b>pcre16</b></a> |
| result of each match. Options on the command line and the patterns control PCRE | and |
| options and exactly what is output. | <a href="pcre32.html"><b>pcre32</b></a> |
| | documentation. |
| </P> |
</P> |
| <br><a name="SEC2" href="#TOC1">COMMAND LINE OPTIONS</a><br> |
|
| <P> |
<P> |
| |
The input for <b>pcretest</b> is a sequence of regular expression patterns and |
| |
strings to be matched, as described below. The output shows the result of each |
| |
match. Options on the command line and the patterns control PCRE options and |
| |
exactly what is output. |
| |
</P> |
| |
<P> |
| |
As PCRE has evolved, it has acquired many different features, and as a result, |
| |
<b>pcretest</b> now has rather a lot of obscure options for testing every |
| |
possible feature. Some of these options are specifically designed for use in |
| |
conjunction with the test script and data files that are distributed as part of |
| |
PCRE, and are unlikely to be of use otherwise. They are all documented here, |
| |
but without much justification. |
| |
</P> |
| |
<br><a name="SEC2" href="#TOC1">INPUT DATA FORMAT</a><br> |
| |
<P> |
| |
Input to <b>pcretest</b> is processed line by line, either by calling the C |
| |
library's <b>fgets()</b> function, or via the <b>libreadline</b> library (see |
| |
below). In Unix-like environments, <b>fgets()</b> treats any bytes other than |
| |
newline as data characters. However, in some Windows environments character 26 |
| |
(hex 1A) causes an immediate end of file, and no further data is read. For |
| |
maximum portability, therefore, it is safest to use only ASCII characters in |
| |
<b>pcretest</b> input files. |
| |
</P> |
| |
<br><a name="SEC3" href="#TOC1">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br> |
| |
<P> |
| |
From release 8.30, two separate PCRE libraries can be built. The original one |
| |
supports 8-bit character strings, whereas the newer 16-bit library supports |
| |
character strings encoded in 16-bit units. From release 8.32, a third library |
| |
can be built, supporting character strings encoded in 32-bit units. The |
| |
<b>pcretest</b> program can be used to test all three libraries. However, it is |
| |
itself still an 8-bit program, reading 8-bit input and writing 8-bit output. |
| |
When testing the 16-bit or 32-bit library, the patterns and data strings are |
| |
converted to 16- or 32-bit format before being passed to the PCRE library |
| |
functions. Results are converted to 8-bit for output. |
| |
</P> |
| |
<P> |
| |
References to functions and structures of the form <b>pcre[16|32]_xx</b> below |
| |
mean "<b>pcre_xx</b> when using the 8-bit library, <b>pcre16_xx</b> when using |
| |
the 16-bit library, or <b>pcre32_xx</b> when using the 32-bit library". |
| |
</P> |
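<P>
The naming shorthand above can be sketched as a small lookup. This is an
illustration only: <b>library_function_name</b> is a hypothetical helper, not
part of PCRE or <b>pcretest</b>.
</P>

```python
def library_function_name(suffix, width=8):
    """Expand the pcre[16|32]_xx shorthand for a given code unit width.

    Hypothetical helper for illustration; PCRE itself provides no such call.
    """
    prefixes = {8: "pcre", 16: "pcre16", 32: "pcre32"}
    if width not in prefixes:
        raise ValueError("width must be 8, 16, or 32")
    return prefixes[width] + "_" + suffix
```

<P>
For example, <b>library_function_name("exec", 16)</b> yields the name used when
the 16-bit library is being tested.
</P>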
| |
<br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br> |
| |
<P> |
| |
<b>-8</b> |
| |
If the 8-bit library has been built, this option causes the 8-bit library |
| |
to be used (which is the default); if the 8-bit library has not been built, |
| |
this option causes an error. |
| |
</P> |
| |
<P> |
| |
<b>-16</b> |
| |
If the 16-bit library and at least one of the 8-bit or 32-bit libraries have been built, this |
| |
option causes the 16-bit library to be used. If only the 16-bit library has been |
| |
built, this is the default (so has no effect). If only the 8-bit or the 32-bit |
| |
library has been built, this option causes an error. |
| |
</P> |
| |
<P> |
| |
<b>-32</b> |
| |
If the 32-bit library and at least one of the 8-bit or 16-bit libraries have been built, this |
| |
option causes the 32-bit library to be used. If only the 32-bit library has been |
| |
built, this is the default (so has no effect). If only the 8-bit or the 16-bit |
| |
library has been built, this option causes an error. |
| |
</P> |
| |
<P> |
| <b>-b</b> |
<b>-b</b> |
| Behave as if each pattern has the <b>/B</b> (show byte code) modifier; the |
Behave as if each pattern has the <b>/B</b> (show byte code) modifier; the |
| internal form is output after compilation. |
internal form is output after compilation. |
|
Line 56 internal form is output after compilation.
|
Line 120 internal form is output after compilation.
|
| <P> |
<P> |
| <b>-C</b> |
<b>-C</b> |
| Output the version number of the PCRE library, and all available information |
Output the version number of the PCRE library, and all available information |
| about the optional features that are included, and then exit. | about the optional features that are included, and then exit with zero exit |
| | code. All other options are ignored. |
| </P> |
</P> |
| <P> |
<P> |
| |
<b>-C</b> <i>option</i> |
| |
Output information about a specific build-time option, then exit. This |
| |
functionality is intended for use in scripts such as <b>RunTest</b>. The |
| |
following options output the value and set the exit code as indicated: |
| |
<pre> |
| |
ebcdic-nl the code for LF (= NL) in an EBCDIC environment: |
| |
0x15 or 0x25 |
| |
0 if used in an ASCII environment |
| |
exit code is always 0 |
| |
linksize the configured internal link size (2, 3, or 4) |
| |
exit code is set to the link size |
| |
newline the default newline setting: |
| |
CR, LF, CRLF, ANYCRLF, or ANY |
| |
exit code is always 0 |
| |
</pre> |
| |
The following options output 1 for true or 0 for false, and set the exit code |
| |
to the same value: |
| |
<pre> |
| |
ebcdic compiled for an EBCDIC environment |
| |
jit just-in-time support is available |
| |
pcre16 the 16-bit library was built |
| |
pcre32 the 32-bit library was built |
| |
pcre8 the 8-bit library was built |
| |
ucp Unicode property support is available |
| |
utf UTF-8 and/or UTF-16 and/or UTF-32 support |
| |
is available |
| |
</pre> |
| |
If an unknown option is given, an error message is output; the exit code is 0. |
| |
</P> |
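<P>
A script might interpret the output and exit code of <b>-C</b> <i>option</i>
along the following lines. This is a minimal sketch based only on the tables
above; the function names <b>interpret</b> and <b>query</b> are hypothetical,
and <b>query</b> assumes a <b>pcretest</b> executable is on the search path.
</P>

```python
import subprocess

# Options that output 1 or 0 and set the exit code to the same value.
BOOLEAN_OPTIONS = {"ebcdic", "jit", "pcre16", "pcre32", "pcre8", "ucp", "utf"}
# Options that output a value and always exit with code 0.
VALUE_OPTIONS = {"ebcdic-nl", "newline"}


def interpret(option, stdout, exit_code):
    """Turn the result of 'pcretest -C option' into a Python value."""
    if option in BOOLEAN_OPTIONS:
        return exit_code == 1
    if option == "linksize":
        # The exit code is set to the link size (2, 3, or 4).
        return exit_code
    if option in VALUE_OPTIONS:
        return stdout.strip()
    raise ValueError("option not listed in the tables: " + option)


def query(option, executable="pcretest"):
    """Run pcretest (assumed to be installed) and interpret the result."""
    proc = subprocess.run([executable, "-C", option],
                          capture_output=True, text=True)
    return interpret(option, proc.stdout, proc.returncode)
```

<P>
Keeping <b>interpret</b> separate from the process call makes the mapping easy
to check without a PCRE build to hand.
</P>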
| |
<P> |
| <b>-d</b> |
<b>-d</b> |
| Behave as if each pattern has the <b>/D</b> (debug) modifier; the internal |
Behave as if each pattern has the <b>/D</b> (debug) modifier; the internal |
| form and information about the compiled pattern is output after compilation; |
form and information about the compiled pattern is output after compilation; |
|
Line 67 form and information about the compiled pattern is out
|
Line 162 form and information about the compiled pattern is out
|
| <P> |
<P> |
| <b>-dfa</b> |
<b>-dfa</b> |
| Behave as if each data line contains the \D escape sequence; this causes the |
Behave as if each data line contains the \D escape sequence; this causes the |
| alternative matching function, <b>pcre_dfa_exec()</b>, to be used instead of the | alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, to be used instead |
| standard <b>pcre_exec()</b> function (more detail is given below). | of the standard <b>pcre[16|32]_exec()</b> function (more detail is given below). |
| </P> |
</P> |
| <P> |
<P> |
| <b>-help</b> |
<b>-help</b> |
|
Line 83 compiled pattern is given after compilation.
|
Line 178 compiled pattern is given after compilation.
|
| <b>-M</b> |
<b>-M</b> |
| Behave as if each data line contains the \M escape sequence; this causes |
Behave as if each data line contains the \M escape sequence; this causes |
| PCRE to discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings by |
PCRE to discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings by |
| calling <b>pcre_exec()</b> repeatedly with different limits. | calling <b>pcre[16|32]_exec()</b> repeatedly with different limits. |
| </P> |
</P> |
| <P> |
<P> |
| <b>-m</b> |
<b>-m</b> |
| Output the size of each compiled pattern after it has been compiled. This is |
Output the size of each compiled pattern after it has been compiled. This is |
| equivalent to adding <b>/M</b> to each regular expression. | equivalent to adding <b>/M</b> to each regular expression. The size is given in |
| | bytes for all libraries. |
| </P> |
</P> |
| <P> |
<P> |
| <b>-o</b> <i>osize</i> |
<b>-o</b> <i>osize</i> |
| Set the number of elements in the output vector that is used when calling |
Set the number of elements in the output vector that is used when calling |
| <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> to be <i>osize</i>. The default value | <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> to be <i>osize</i>. The |
| is 45, which is enough for 14 capturing subexpressions for <b>pcre_exec()</b> or | default value is 45, which is enough for 14 capturing subexpressions for |
| 22 different matches for <b>pcre_dfa_exec()</b>. The vector size can be | <b>pcre[16|32]_exec()</b> or 22 different matches for |
| changed for individual matching calls by including \O in the data line (see | <b>pcre[16|32]_dfa_exec()</b>. |
| below). | The vector size can be changed for individual matching calls by including \O |
| | in the data line (see below). |
| </P> |
</P> |
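<P>
The figures of 14 and 22 for the default vector size can be reproduced with a
little arithmetic. The sketch below assumes, as the <b>pcreapi</b>
documentation describes, that the standard matching function uses the first
two-thirds of the vector for pairs of offsets (one pair being the whole match)
and that the alternative function uses one pair per match. The function names
are hypothetical.
</P>

```python
def exec_capture_capacity(osize):
    """Capturing subexpressions that fit when using the standard function.

    Assumption: two thirds of the vector hold offset pairs, which gives
    osize // 3 pairs, and the first pair is used for the whole match.
    """
    return max(osize // 3 - 1, 0)


def dfa_match_capacity(osize):
    """Different matches that fit when using the alternative function.

    Assumption: each match occupies one pair of elements.
    """
    return osize // 2


print(exec_capture_capacity(45), dfa_match_capacity(45))  # prints "14 22"
```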
| <P> |
<P> |
| <b>-p</b> |
<b>-p</b> |
| Behave as if each pattern has the <b>/P</b> modifier; the POSIX wrapper API is |
Behave as if each pattern has the <b>/P</b> modifier; the POSIX wrapper API is |
| used to call PCRE. None of the other options has any effect when <b>-p</b> is |
used to call PCRE. None of the other options has any effect when <b>-p</b> is |
| set. | set. This option can be used only with the 8-bit library. |
| </P> |
</P> |
| <P> |
<P> |
| <b>-q</b> |
<b>-q</b> |
|
Line 117 megabytes.
|
Line 214 megabytes.
|
| <P> |
<P> |
| <b>-s</b> or <b>-s+</b> |
<b>-s</b> or <b>-s+</b> |
| Behave as if each pattern has the <b>/S</b> modifier; in other words, force each |
Behave as if each pattern has the <b>/S</b> modifier; in other words, force each |
| pattern to be studied. If <b>-s+</b> is used, the PCRE_STUDY_JIT_COMPILE flag is | pattern to be studied. If <b>-s+</b> is used, all the JIT compile options are |
| passed to <b>pcre_study()</b>, causing just-in-time optimization to be set up if | passed to <b>pcre[16|32]_study()</b>, causing just-in-time optimization to be set |
| it is available. If the <b>/I</b> or <b>/D</b> option is present on a pattern | up if it is available, for both full and partial matching. Specific JIT compile |
| (requesting output about the compiled pattern), information about the result of | options can be selected by following <b>-s+</b> with a digit in the range 1 to |
| studying is not included when studying is caused only by <b>-s</b> and neither | 7, which selects the JIT compile modes as follows: |
| <b>-i</b> nor <b>-d</b> is present on the command line. This behaviour means that | <pre> |
| the output from tests that are run with and without <b>-s</b> should be | 1 normal match only |
| identical, except when options that output information about the actual running | 2 soft partial match only |
| of a match are set. The <b>-M</b>, <b>-t</b>, and <b>-tm</b> options, which give | 3 normal match and soft partial match |
| information about resources used, are likely to produce different output with | 4 hard partial match only |
| and without <b>-s</b>. Output may also differ if the <b>/C</b> option is present | 6 soft and hard partial match |
| on an individual pattern. This uses callouts to trace the matching process, | 7 all three modes (default) |
| and this may be different between studied and non-studied patterns. If the | </pre> |
| pattern contains (*MARK) items there may also be differences, for the same | If <b>-s++</b> is used instead of <b>-s+</b> (with or without a following digit), |
| reason. The <b>-s</b> command line option can be overridden for specific | the text "(JIT)" is added to the first output line after a match or no match |
| patterns that should never be studied (see the <b>/S</b> pattern modifier | when JIT-compiled code was actually used. |
| below). | <br> |
| | <br> |
| | Note that there are pattern options that can override <b>-s</b>, either |
| | specifying no studying at all, or suppressing JIT compilation. |
| | <br> |
| | <br> |
| | If the <b>/I</b> or <b>/D</b> option is present on a pattern (requesting output |
| | about the compiled pattern), information about the result of studying is not |
| | included when studying is caused only by <b>-s</b> and neither <b>-i</b> nor |
| | <b>-d</b> is present on the command line. This behaviour means that the output |
| | from tests that are run with and without <b>-s</b> should be identical, except |
| | when options that output information about the actual running of a match are |
| | set. |
| | <br> |
| | <br> |
| | The <b>-M</b>, <b>-t</b>, and <b>-tm</b> options, which give information about |
| | resources used, are likely to produce different output with and without |
| | <b>-s</b>. Output may also differ if the <b>/C</b> option is present on an |
| | individual pattern. This uses callouts to trace the matching process, and |
| | this may be different between studied and non-studied patterns. If the pattern |
| | contains (*MARK) items there may also be differences, for the same reason. The |
| | <b>-s</b> command line option can be overridden for specific patterns that |
| | should never be studied (see the <b>/S</b> pattern modifier below). |
| </P> |
</P> |
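<P>
The digits in the <b>-s+</b> table read naturally as a bit mask: 1 for normal
matching, 2 for soft partial matching, and 4 for hard partial matching. The
sketch below decodes a digit on that assumption; <b>jit_modes</b> is a
hypothetical helper and not part of <b>pcretest</b>. Note that 5 is not listed
in the table, although it lies within the stated range of 1 to 7.
</P>

```python
# Assumed bit values, inferred from the table of -s+ digits.
JIT_NORMAL = 1
JIT_SOFT_PARTIAL = 2
JIT_HARD_PARTIAL = 4


def jit_modes(digit=7):
    """Return the JIT compile modes selected by the digit after -s+."""
    if not 1 <= digit <= 7:
        raise ValueError("digit must be in the range 1 to 7")
    modes = []
    if digit & JIT_NORMAL:
        modes.append("normal match")
    if digit & JIT_SOFT_PARTIAL:
        modes.append("soft partial match")
    if digit & JIT_HARD_PARTIAL:
        modes.append("hard partial match")
    return modes
```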
| <P> |
<P> |
| <b>-t</b> |
<b>-t</b> |
|
Line 150 to iterate 500000 times.
|
Line 269 to iterate 500000 times.
|
| This is like <b>-t</b> except that it times only the matching phase, not the |
This is like <b>-t</b> except that it times only the matching phase, not the |
| compile or study phases. |
compile or study phases. |
| </P> |
</P> |
| <br><a name="SEC3" href="#TOC1">DESCRIPTION</a><br> | <br><a name="SEC5" href="#TOC1">DESCRIPTION</a><br> |
| <P> |
<P> |
| If <b>pcretest</b> is given two filename arguments, it reads from the first and |
If <b>pcretest</b> is given two filename arguments, it reads from the first and |
| writes to the second. If it is given only one filename argument, it reads from |
writes to the second. If it is given only one filename argument, it reads from |
|
Line 207 backslash, because
|
Line 326 backslash, because
|
| is interpreted as the first line of a pattern that starts with "abc/", causing |
is interpreted as the first line of a pattern that starts with "abc/", causing |
| pcretest to read the next line as a continuation of the regular expression. |
pcretest to read the next line as a continuation of the regular expression. |
| </P> |
</P> |
| <br><a name="SEC4" href="#TOC1">PATTERN MODIFIERS</a><br> | <br><a name="SEC6" href="#TOC1">PATTERN MODIFIERS</a><br> |
| <P> |
<P> |
| A pattern may be followed by any number of modifiers, which are mostly single |
A pattern may be followed by any number of modifiers, which are mostly single |
| characters. Following Perl usage, these are referred to below as, for example, | characters, though some of these can be qualified by further characters. |
| "the <b>/i</b> modifier", even though the delimiter of the pattern need not | Following Perl usage, these are referred to below as, for example, "the |
| always be a slash, and no slash is used when writing modifiers. White space may | <b>/i</b> modifier", even though the delimiter of the pattern need not always be |
| appear between the final pattern delimiter and the first modifier, and between | a slash, and no slash is used when writing modifiers. White space may appear |
| the modifiers themselves. | between the final pattern delimiter and the first modifier, and between the |
| | modifiers themselves. For reference, here is a complete list of modifiers. They |
| | fall into several groups that are described in detail in the following |
| | sections. |
| | <pre> |
| | <b>/8</b> set UTF mode |
| | <b>/9</b> set PCRE_NEVER_UTF (locks out UTF mode) |
| | <b>/?</b> disable UTF validity check |
| | <b>/+</b> show remainder of subject after match |
| | <b>/=</b> show all captures (not just those that are set) |
| | |
| | <b>/A</b> set PCRE_ANCHORED |
| | <b>/B</b> show compiled code |
| | <b>/C</b> set PCRE_AUTO_CALLOUT |
| | <b>/D</b> same as <b>/B</b> plus <b>/I</b> |
| | <b>/E</b> set PCRE_DOLLAR_ENDONLY |
| | <b>/F</b> flip byte order in compiled pattern |
| | <b>/f</b> set PCRE_FIRSTLINE |
| | <b>/G</b> find all matches (shorten string) |
| | <b>/g</b> find all matches (use startoffset) |
| | <b>/I</b> show information about pattern |
| | <b>/i</b> set PCRE_CASELESS |
| | <b>/J</b> set PCRE_DUPNAMES |
| | <b>/K</b> show backtracking control names |
| | <b>/L</b> set locale |
| | <b>/M</b> show compiled memory size |
| | <b>/m</b> set PCRE_MULTILINE |
| | <b>/N</b> set PCRE_NO_AUTO_CAPTURE |
| | <b>/P</b> use the POSIX wrapper |
| | <b>/S</b> study the pattern after compilation |
| | <b>/s</b> set PCRE_DOTALL |
| | <b>/T</b> select character tables |
| | <b>/U</b> set PCRE_UNGREEDY |
| | <b>/W</b> set PCRE_UCP |
| | <b>/X</b> set PCRE_EXTRA |
| | <b>/x</b> set PCRE_EXTENDED |
| | <b>/Y</b> set PCRE_NO_START_OPTIMIZE |
| | <b>/Z</b> don't show lengths in <b>/B</b> output |
| | |
| | <b>/<any></b> set PCRE_NEWLINE_ANY |
| | <b>/<anycrlf></b> set PCRE_NEWLINE_ANYCRLF |
| | <b>/<cr></b> set PCRE_NEWLINE_CR |
| | <b>/<crlf></b> set PCRE_NEWLINE_CRLF |
| | <b>/<lf></b> set PCRE_NEWLINE_LF |
| | <b>/<bsr_anycrlf></b> set PCRE_BSR_ANYCRLF |
| | <b>/<bsr_unicode></b> set PCRE_BSR_UNICODE |
| | <b>/<JS></b> set PCRE_JAVASCRIPT_COMPAT |
| | |
| | </PRE> |
| </P> |
</P> |
| |
<br><b> |
| |
Perl-compatible modifiers |
| |
</b><br> |
| <P> |
<P> |
| The <b>/i</b>, <b>/m</b>, <b>/s</b>, and <b>/x</b> modifiers set the PCRE_CASELESS, |
The <b>/i</b>, <b>/m</b>, <b>/s</b>, and <b>/x</b> modifiers set the PCRE_CASELESS, |
| PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when |
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when |
| <b>pcre_compile()</b> is called. These four modifier letters have the same | <b>pcre[16|32]_compile()</b> is called. These four modifier letters have the same |
| effect as they do in Perl. For example: |
effect as they do in Perl. For example: |
| <pre> |
<pre> |
| /caseless/i |
/caseless/i |
| </pre> | |
| | </PRE> |
| | </P> |
| | <br><b> |
| | Modifiers for other PCRE options |
| | </b><br> |
| | <P> |
| The following table shows additional modifiers for setting PCRE compile-time |
The following table shows additional modifiers for setting PCRE compile-time |
| options that do not correspond to anything in Perl: |
options that do not correspond to anything in Perl: |
| <pre> |
<pre> |
| <b>/8</b> PCRE_UTF8 | <b>/8</b> PCRE_UTF8 ) when using the 8-bit |
| <b>/?</b> PCRE_NO_UTF8_CHECK | <b>/?</b> PCRE_NO_UTF8_CHECK ) library |
| | |
| | <b>/8</b> PCRE_UTF16 ) when using the 16-bit |
| | <b>/?</b> PCRE_NO_UTF16_CHECK ) library |
| | |
| | <b>/8</b> PCRE_UTF32 ) when using the 32-bit |
| | <b>/?</b> PCRE_NO_UTF32_CHECK ) library |
| | |
| | <b>/9</b> PCRE_NEVER_UTF |
| <b>/A</b> PCRE_ANCHORED |
<b>/A</b> PCRE_ANCHORED |
| <b>/C</b> PCRE_AUTO_CALLOUT |
<b>/C</b> PCRE_AUTO_CALLOUT |
| <b>/E</b> PCRE_DOLLAR_ENDONLY |
<b>/E</b> PCRE_DOLLAR_ENDONLY |
|
Line 239 options that do not correspond to anything in Perl:
|
Line 423 options that do not correspond to anything in Perl:
|
| <b>/W</b> PCRE_UCP |
<b>/W</b> PCRE_UCP |
| <b>/X</b> PCRE_EXTRA |
<b>/X</b> PCRE_EXTRA |
| <b>/Y</b> PCRE_NO_START_OPTIMIZE |
<b>/Y</b> PCRE_NO_START_OPTIMIZE |
| <b>/<JS></b> PCRE_JAVASCRIPT_COMPAT | <b>/<any></b> PCRE_NEWLINE_ANY |
| | <b>/<anycrlf></b> PCRE_NEWLINE_ANYCRLF |
| <b>/<cr></b> PCRE_NEWLINE_CR |
<b>/<cr></b> PCRE_NEWLINE_CR |
| <b>/<lf></b> PCRE_NEWLINE_LF |
|
| <b>/<crlf></b> PCRE_NEWLINE_CRLF |
<b>/<crlf></b> PCRE_NEWLINE_CRLF |
| <b>/<anycrlf></b> PCRE_NEWLINE_ANYCRLF | <b>/<lf></b> PCRE_NEWLINE_LF |
| <b>/<any></b> PCRE_NEWLINE_ANY | |
| <b>/<bsr_anycrlf></b> PCRE_BSR_ANYCRLF |
<b>/<bsr_anycrlf></b> PCRE_BSR_ANYCRLF |
| <b>/<bsr_unicode></b> PCRE_BSR_UNICODE |
<b>/<bsr_unicode></b> PCRE_BSR_UNICODE |
| |
<b>/<JS></b> PCRE_JAVASCRIPT_COMPAT |
| </pre> |
</pre> |
| The modifiers that are enclosed in angle brackets are literal strings as shown, |
The modifiers that are enclosed in angle brackets are literal strings as shown, |
| including the angle brackets, but the letters within can be in either case. |
including the angle brackets, but the letters within can be in either case. |
|
Line 254 This example sets multiline matching with CRLF as the
|
Line 438 This example sets multiline matching with CRLF as the
|
| <pre> |
<pre> |
| /^abc/m<CRLF> |
/^abc/m<CRLF> |
| </pre> |
</pre> |
| As well as turning on the PCRE_UTF8 option, the <b>/8</b> modifier also causes | As well as turning on the PCRE_UTF8/16/32 option, the <b>/8</b> modifier causes |
| any non-printing characters in output strings to be printed using the | all non-printing characters in output strings to be printed using the |
| \x{hh...} notation if they are valid UTF-8 sequences. Full details of the PCRE | \x{hh...} notation. Otherwise, those less than 0x100 are output in hex without |
| options are given in the | the curly brackets. |
| | </P> |
| | <P> |
| | Full details of the PCRE options are given in the |
| <a href="pcreapi.html"><b>pcreapi</b></a> |
<a href="pcreapi.html"><b>pcreapi</b></a> |
| documentation. |
documentation. |
| </P> |
</P> |
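<P>
The output convention for non-printing characters described above can be
sketched as follows. This is an illustration under stated assumptions, not the
code used by <b>pcretest</b>: "printing" is taken to mean ASCII 32 to 126, and
the two-digit lower-case hex format is a guess at the layout. The name
<b>show_char</b> is hypothetical.
</P>

```python
def show_char(code_point, utf_mode):
    """Render one character of an output string.

    Assumptions: printable means ASCII 32..126; hex is lower case with at
    least two digits.
    """
    if 32 <= code_point < 127:
        return chr(code_point)
    if utf_mode or code_point >= 0x100:
        # With /8, or for values that do not fit in two hex digits.
        return "\\x{%02x}" % code_point
    # Without /8, values below 0x100 have no curly brackets.
    return "\\x%02x" % code_point
```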
|
Line 269 Searching for all possible matches within each subject
|
Line 456 Searching for all possible matches within each subject
|
| by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called |
by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called |
| again to search the remainder of the subject string. The difference between |
again to search the remainder of the subject string. The difference between |
| <b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to |
<b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to |
| <b>pcre_exec()</b> to start searching at a new point within the entire string | <b>pcre[16|32]_exec()</b> to start searching at a new point within the entire |
| (which is in effect what Perl does), whereas the latter passes over a shortened | string (which is in effect what Perl does), whereas the latter passes over a |
| substring. This makes a difference to the matching process if the pattern | shortened substring. This makes a difference to the matching process if the |
| begins with a lookbehind assertion (including \b or \B). | pattern begins with a lookbehind assertion (including \b or \B). |
| </P> |
</P> |
| <P> |
<P> |
| If any call to <b>pcre_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches an | If any call to <b>pcre[16|32]_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches |
| empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and | an empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and |
| PCRE_ANCHORED flags set in order to search for another, non-empty, match at the |
PCRE_ANCHORED flags set in order to search for another, non-empty, match at the |
| same point. If this second match fails, the start offset is advanced, and the |
same point. If this second match fails, the start offset is advanced, and the |
| normal match is retried. This imitates the way Perl handles such cases when |
normal match is retried. This imitates the way Perl handles such cases when |
|
Line 300 contains multiple copies of the same substring. If the
|
Line 487 contains multiple copies of the same substring. If the
|
| twice, the same action is taken for captured substrings. In each case the |
twice, the same action is taken for captured substrings. In each case the |
| remainder is output on the following line with a plus character following the |
remainder is output on the following line with a plus character following the |
| capture number. Note that this modifier must not immediately follow the /S |
capture number. Note that this modifier must not immediately follow the /S |
| modifier because /S+ has another meaning. | modifier because /S+ and /S++ have other meanings. |
| </P> |
</P> |
| <P> |
<P> |
| The <b>/=</b> modifier requests that the values of all potential captured |
The <b>/=</b> modifier requests that the values of all potential captured |
| parentheses be output after a match by <b>pcre_exec()</b>. By default, only | parentheses be output after a match. By default, only those up to the highest |
| those up to the highest one actually used in the match are output | one actually used in the match are output (corresponding to the return code |
| (corresponding to the return code from <b>pcre_exec()</b>). Values in the | from <b>pcre[16|32]_exec()</b>). Values in the offsets vector corresponding to |
| offsets vector corresponding to higher numbers should be set to -1, and these | higher numbers should be set to -1, and these are output as "<unset>". This |
| are output as "<unset>". This modifier gives a way of checking that this is | modifier gives a way of checking that this is happening. |
| happening. | |
| </P> |
</P> |
| <P> |
<P> |
| The <b>/B</b> modifier is a debugging feature. It requests that <b>pcretest</b> |
The <b>/B</b> modifier is a debugging feature. It requests that <b>pcretest</b> |
| output a representation of the compiled byte code after compilation. Normally | output a representation of the compiled code after compilation. Normally this |
| this information contains length and offset values; however, if <b>/Z</b> is | information contains length and offset values; however, if <b>/Z</b> is also |
| also present, this data is replaced by spaces. This is a special feature for | present, this data is replaced by spaces. This is a special feature for use in |
| use in the automatic test scripts; it ensures that the same output is generated | the automatic test scripts; it ensures that the same output is generated for |
| for different internal link sizes. | different internal link sizes. |
| </P> |
</P> |
| <P> |
<P> |
| The <b>/D</b> modifier is a PCRE debugging feature, and is equivalent to |
The <b>/D</b> modifier is a PCRE debugging feature, and is equivalent to |
|
Line 325 The <b>/D</b> modifier is a PCRE debugging feature, an
|
Line 511 The <b>/D</b> modifier is a PCRE debugging feature, an
|
| </P> |
</P> |
| <P> |
<P> |
| The <b>/F</b> modifier causes <b>pcretest</b> to flip the byte order of the |
The <b>/F</b> modifier causes <b>pcretest</b> to flip the byte order of the |
| fields in the compiled pattern that contain 2-byte and 4-byte numbers. This | 2-byte and 4-byte fields in the compiled pattern. This facility is for testing |
| facility is for testing the feature in PCRE that allows it to execute patterns | the feature in PCRE that allows it to execute patterns that were compiled on a |
| that were compiled on a host with a different endianness. This feature is not | host with a different endianness. This feature is not available when the POSIX |
| available when the POSIX interface to PCRE is being used, that is, when the | interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is |
| <b>/P</b> pattern modifier is specified. See also the section about saving and | specified. See also the section about saving and reloading compiled patterns |
| reloading compiled patterns below. | below. |
| </P> |
</P> |
| <P> |
<P> |
| The <b>/I</b> modifier requests that <b>pcretest</b> output information about the |
The <b>/I</b> modifier requests that <b>pcretest</b> output information about the |
| compiled pattern (whether it is anchored, has a fixed first character, and |
compiled pattern (whether it is anchored, has a fixed first character, and |
| so on). It does this by calling <b>pcre_fullinfo()</b> after compiling a | so on). It does this by calling <b>pcre[16|32]_fullinfo()</b> after compiling a |
| pattern. If the pattern is studied, the results of that are also output. |
pattern. If the pattern is studied, the results of that are also output. |
| </P> |
</P> |
| <P> |
<P> |
| The <b>/K</b> modifier requests <b>pcretest</b> to show names from backtracking |
The <b>/K</b> modifier requests <b>pcretest</b> to show names from backtracking |
| control verbs that are returned from calls to <b>pcre_exec()</b>. It causes | control verbs that are returned from calls to <b>pcre[16|32]_exec()</b>. It causes |
| <b>pcretest</b> to create a <b>pcre_extra</b> block if one has not already been | <b>pcretest</b> to create a <b>pcre[16|32]_extra</b> block if one has not already |
| created by a call to <b>pcre_study()</b>, and to set the PCRE_EXTRA_MARK flag | been created by a call to <b>pcre[16|32]_study()</b>, and to set the |
| and the <b>mark</b> field within it, every time that <b>pcre_exec()</b> is | PCRE_EXTRA_MARK flag and the <b>mark</b> field within it, every time that |
| called. If the variable that the <b>mark</b> field points to is non-NULL for a | <b>pcre[16|32]_exec()</b> is called. If the variable that the <b>mark</b> field |
| match, non-match, or partial match, <b>pcretest</b> prints the string to which | points to is non-NULL for a match, non-match, or partial match, <b>pcretest</b> |
| it points. For a match, this is shown on a line by itself, tagged with "MK:". | prints the string to which it points. For a match, this is shown on a line by |
| For a non-match it is added to the message. | itself, tagged with "MK:". For a non-match it is added to the message. |
| </P> |
</P> |
| <P> |
<P> |
| The <b>/L</b> modifier must be followed directly by the name of a locale, for |
The <b>/L</b> modifier must be followed directly by the name of a locale, for |
|
Line 356 example,
|
Line 542 example,
|
| /pattern/Lfr_FR |
/pattern/Lfr_FR |
| </pre> |
</pre> |
| For this reason, it must be the last modifier. The given locale is set, |
For this reason, it must be the last modifier. The given locale is set, |
| <b>pcre_maketables()</b> is called to build a set of character tables for the | <b>pcre[16|32]_maketables()</b> is called to build a set of character tables for |
| locale, and this is then passed to <b>pcre_compile()</b> when compiling the | the locale, and this is then passed to <b>pcre[16|32]_compile()</b> when compiling |
| regular expression. Without an <b>/L</b> (or <b>/T</b>) modifier, NULL is passed | the regular expression. Without an <b>/L</b> (or <b>/T</b>) modifier, NULL is |
| as the tables pointer; that is, <b>/L</b> applies only to the expression on | passed as the tables pointer; that is, <b>/L</b> applies only to the expression |
| which it appears. | on which it appears. |
| </P> |
</P> |
| <P> |
<P> |
| The <b>/M</b> modifier causes the size of memory block used to hold the compiled | The <b>/M</b> modifier causes the size in bytes of the memory block used to hold |
| pattern to be output. This does not include the size of the <b>pcre</b> block; | the compiled pattern to be output. This does not include the size of the |
| it is just the actual compiled data. If the pattern is successfully studied | <b>pcre[16|32]</b> block; it is just the actual compiled data. If the pattern is |
| with the PCRE_STUDY_JIT_COMPILE option, the size of the JIT compiled code is | successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the |
| also output. | JIT compiled code is also output. |
| </P> |
</P> |
| <P> |
<P> |
| If the <b>/S</b> modifier appears once, it causes <b>pcre_study()</b> to be | The <b>/S</b> modifier causes <b>pcre[16|32]_study()</b> to be called after the |
| called after the expression has been compiled, and the results used when the | expression has been compiled, and the results used when the expression is |
| expression is matched. If <b>/S</b> appears twice, it suppresses studying, even | matched. There are a number of qualifying characters that may follow <b>/S</b>. |
| | They may appear in any order. |
| | </P> |
| | <P> |
| | If <b>S</b> is followed by an exclamation mark, <b>pcre[16|32]_study()</b> is called |
| | with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a |
| | <b>pcre_extra</b> block, even when studying discovers no useful information. |
| | </P> |
| | <P> |
| | If <b>/S</b> is followed by a second S character, it suppresses studying, even |
| if it was requested externally by the <b>-s</b> command line option. This makes |
if it was requested externally by the <b>-s</b> command line option. This makes |
| it possible to specify that certain patterns are always studied, and others are |
it possible to specify that certain patterns are always studied, and others are |
| never studied, independently of <b>-s</b>. This feature is used in the test |
never studied, independently of <b>-s</b>. This feature is used in the test |
| files in a few cases where the output is different when the pattern is studied. |
files in a few cases where the output is different when the pattern is studied. |
| </P> |
</P> |
| <P> |
<P> |
| If the <b>/S</b> modifier is immediately followed by a + character, the call to | If the <b>/S</b> modifier is followed by a + character, the call to |
| <b>pcre_study()</b> is made with the PCRE_STUDY_JIT_COMPILE option, requesting | <b>pcre[16|32]_study()</b> is made with all the JIT study options, requesting |
| just-in-time optimization support if it is available. Note that there is also a | just-in-time optimization support if it is available, for both normal and |
| <b>/+</b> modifier; it must not be given immediately after <b>/S</b> because this | partial matching. If you want to restrict the JIT compiling modes, you can |
| will be misinterpreted. If JIT studying is successful, it will automatically be | follow <b>/S+</b> with a digit in the range 1 to 7: |
| used when <b>pcre_exec()</b> is run, except when incompatible run-time options | <pre> |
| are specified. These include the partial matching options; a complete list is | 1 normal match only |
| given in the | 2 soft partial match only |
| | 3 normal match and soft partial match |
| | 4 hard partial match only |
| | 6 soft and hard partial match |
| | 7 all three modes (default) |
| | </pre> |
| | If <b>/S++</b> is used instead of <b>/S+</b> (with or without a following digit), |
| | the text "(JIT)" is added to the first output line after a match or no match |
| | when JIT-compiled code was actually used. |
| | </P> |
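<P>
The digit table above reads naturally as a three-bit mask. The Python sketch below assumes that reading (1 for normal, 2 for soft partial, 4 for hard partial matching), which is inferred from the table and not stated in it. The name jit_modes() is hypothetical and is not part of <b>pcretest</b>.
</P>

```python
# Assumed bit values, inferred from the /S+ digit table (not taken from PCRE headers).
NORMAL, SOFT_PARTIAL, HARD_PARTIAL = 1, 2, 4


def jit_modes(digit=7):
    """Return the JIT compile modes selected by the digit that follows /S+."""
    if not 1 <= digit <= 7:
        raise ValueError("digit must be in the range 1 to 7")
    names = [
        (NORMAL, "normal"),
        (SOFT_PARTIAL, "soft partial"),
        (HARD_PARTIAL, "hard partial"),
    ]
    # Keep every mode whose bit is set in the digit.
    return [name for bit, name in names if digit & bit]
```

Under this assumption, /S+3 selects normal and soft partial matching, and a bare /S+ behaves like /S+7.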
| | <P> |
| | Note that there is also an independent <b>/+</b> modifier; it must not be given |
| | immediately after <b>/S</b> or <b>/S+</b> because this will be misinterpreted. |
| | </P> |
| | <P> |
| | If JIT studying is successful, the compiled JIT code will automatically be used |
| | when <b>pcre[16|32]_exec()</b> is run, except when incompatible run-time options |
| | are specified. For more details, see the |
| <a href="pcrejit.html"><b>pcrejit</b></a> |
<a href="pcrejit.html"><b>pcrejit</b></a> |
| documentation. See also the <b>\J</b> escape sequence below for a way of |
documentation. See also the <b>\J</b> escape sequence below for a way of |
| setting the size of the JIT stack. |
setting the size of the JIT stack. |
| </P> |
</P> |
| <P> |
<P> |
| |
Finally, if <b>/S</b> is followed by a minus character, JIT compilation is |
| |
suppressed, even if it was requested externally by the <b>-s</b> command line |
| |
option. This makes it possible to specify that JIT is never to be used for |
| |
certain patterns. |
| |
</P> |
| |
<P> |
| The <b>/T</b> modifier must be followed by a single digit. It causes a specific |
The <b>/T</b> modifier must be followed by a single digit. It causes a specific |
| set of built-in character tables to be passed to <b>pcre_compile()</b>. It is | set of built-in character tables to be passed to <b>pcre[16|32]_compile()</b>. It |
| used in the standard PCRE tests to check behaviour with different character | is used in the standard PCRE tests to check behaviour with different character |
| tables. The digit specifies the tables as follows: |
tables. The digit specifies the tables as follows: |
| <pre> |
<pre> |
| 0 the default ASCII tables, as distributed in |
0 the default ASCII tables, as distributed in |
|
Line 409 Using the POSIX wrapper API
|
Line 627 Using the POSIX wrapper API
|
| </b><br> |
</b><br> |
| <P> |
<P> |
| The <b>/P</b> modifier causes <b>pcretest</b> to call PCRE via the POSIX wrapper |
The <b>/P</b> modifier causes <b>pcretest</b> to call PCRE via the POSIX wrapper |
| API rather than its native API. When <b>/P</b> is set, the following modifiers | API rather than its native API. This supports only the 8-bit library. When |
| set options for the <b>regcomp()</b> function: | <b>/P</b> is set, the following modifiers set options for the <b>regcomp()</b> |
| | function: |
| <pre> |
<pre> |
| /i REG_ICASE |
/i REG_ICASE |
| /m REG_NEWLINE |
/m REG_NEWLINE |
|
Line 423 set options for the <b>regcomp()</b> function:
|
Line 642 set options for the <b>regcomp()</b> function:
|
| The <b>/+</b> modifier works as described above. All other modifiers are |
The <b>/+</b> modifier works as described above. All other modifiers are |
| ignored. |
ignored. |
| </P> |
</P> |
| <br><a name="SEC5" href="#TOC1">DATA LINES</a><br> | <br><a name="SEC7" href="#TOC1">DATA LINES</a><br> |
| <P> |
<P> |
| Before each data line is passed to <b>pcre_exec()</b>, leading and trailing | Before each data line is passed to <b>pcre[16|32]_exec()</b>, leading and trailing |
| white space is removed, and it is then scanned for \ escapes. Some of these |
white space is removed, and it is then scanned for \ escapes. Some of these |
| are pretty esoteric features, intended for checking out some of the more |
are pretty esoteric features, intended for checking out some of the more |
| complicated features of PCRE. If you are just testing "ordinary" regular |
complicated features of PCRE. If you are just testing "ordinary" regular |
|
Line 441 recognized:
|
Line 660 recognized:
|
| \r carriage return (\x0d) |
\r carriage return (\x0d) |
| \t tab (\x09) |
\t tab (\x09) |
| \v vertical tab (\x0b) |
\v vertical tab (\x0b) |
| \nnn octal character (up to 3 octal digits) | \nnn octal character (up to 3 octal digits); always |
| always a byte unless > 255 in UTF-8 mode | a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode |
| \xhh hexadecimal byte (up to 2 hex digits) |
\xhh hexadecimal byte (up to 2 hex digits) |
| \x{hh...} hexadecimal character, any number of digits in UTF-8 mode | \x{hh...} hexadecimal character (any number of hex digits) |
| \A pass the PCRE_ANCHORED option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \A pass the PCRE_ANCHORED option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
| \B pass the PCRE_NOTBOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \B pass the PCRE_NOTBOL option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
| \Cdd call pcre_copy_substring() for substring dd after a successful match (number less than 32) | \Cdd call pcre[16|32]_copy_substring() for substring dd after a successful match (number less than 32) |
| \Cname call pcre_copy_named_substring() for substring "name" after a successful match (name termin- | \Cname call pcre[16|32]_copy_named_substring() for substring "name" after a successful match (name termin- |
| ated by next non alphanumeric character) |
ated by next non alphanumeric character) |
| \C+ show the current captured substrings at callout time |
\C+ show the current captured substrings at callout time |
| \C- do not supply a callout function |
\C- do not supply a callout function |
| \C!n return 1 instead of 0 when callout number n is reached |
\C!n return 1 instead of 0 when callout number n is reached |
| \C!n!m return 1 instead of 0 when callout number n is reached for the mth time |
\C!n!m return 1 instead of 0 when callout number n is reached for the mth time |
| \C*n pass the number n (may be negative) as callout data; this is used as the callout return value |
\C*n pass the number n (may be negative) as callout data; this is used as the callout return value |
| \D use the <b>pcre_dfa_exec()</b> match function | \D use the <b>pcre[16|32]_dfa_exec()</b> match function |
| \F only shortest match for <b>pcre_dfa_exec()</b> | \F only shortest match for <b>pcre[16|32]_dfa_exec()</b> |
| \Gdd call pcre_get_substring() for substring dd after a successful match (number less than 32) | \Gdd call pcre[16|32]_get_substring() for substring dd after a successful match (number less than 32) |
| \Gname call pcre_get_named_substring() for substring "name" after a successful match (name termin- | \Gname call pcre[16|32]_get_named_substring() for substring "name" after a successful match (name termin- |
| ated by next non-alphanumeric character) |
ated by next non-alphanumeric character) |
| \Jdd set up a JIT stack of dd kilobytes maximum (any number of digits) |
\Jdd set up a JIT stack of dd kilobytes maximum (any number of digits) |
| \L call pcre_get_substringlist() after a successful match | \L call pcre[16|32]_get_substringlist() after a successful match |
| \M discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings |
\M discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings |
| \N pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>; if used twice, pass the | \N pass the PCRE_NOTEMPTY option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>; if used twice, pass the |
| PCRE_NOTEMPTY_ATSTART option |
PCRE_NOTEMPTY_ATSTART option |
| \Odd set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits) | \Odd set the size of the output vector passed to <b>pcre[16|32]_exec()</b> to dd (any number of digits) |
| \P pass the PCRE_PARTIAL_SOFT option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>; if used twice, pass the | \P pass the PCRE_PARTIAL_SOFT option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>; if used twice, pass the |
| PCRE_PARTIAL_HARD option |
PCRE_PARTIAL_HARD option |
| \Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits) |
\Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits) |
| \R pass the PCRE_DFA_RESTART option to <b>pcre_dfa_exec()</b> | \R pass the PCRE_DFA_RESTART option to <b>pcre[16|32]_dfa_exec()</b> |
| \S output details of memory get/free calls during matching |
\S output details of memory get/free calls during matching |
| \Y pass the PCRE_NO_START_OPTIMIZE option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \Y pass the PCRE_NO_START_OPTIMIZE option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
| \Z pass the PCRE_NOTEOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \Z pass the PCRE_NOTEOL option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
| \? pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \? pass the PCRE_NO_UTF[8|16|32]_CHECK option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
| \>dd start the match at offset dd (optional "-"; then any number of digits); this sets the <i>startoffset</i> |
\>dd start the match at offset dd (optional "-"; then any number of digits); this sets the <i>startoffset</i> |
| argument for <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | argument for <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
| \<cr> pass the PCRE_NEWLINE_CR option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \<cr> pass the PCRE_NEWLINE_CR option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
| \<lf> pass the PCRE_NEWLINE_LF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \<lf> pass the PCRE_NEWLINE_LF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
| \<crlf> pass the PCRE_NEWLINE_CRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \<crlf> pass the PCRE_NEWLINE_CRLF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
| \<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
| \<any> pass the PCRE_NEWLINE_ANY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> | \<any> pass the PCRE_NEWLINE_ANY option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> |
| </pre> |
</pre> |
| Note that \xhh always specifies one byte, even in UTF-8 mode; this makes it | The use of \x{hh...} is not dependent on the use of the <b>/8</b> modifier on |
| possible to construct invalid UTF-8 sequences for testing purposes. On the | the pattern. It is recognized always. There may be any number of hexadecimal |
| other hand, \x{hh} is interpreted as a UTF-8 character in UTF-8 mode, | digits inside the braces; invalid values provoke error messages. |
| generating more than one byte if the value is greater than 127. When not in | |
| UTF-8 mode, it generates one byte for values less than 256, and causes an error | |
| for greater values. | |
| </P> |
</P> |
| <P> |
<P> |
| |
Note that \xhh specifies one byte rather than one character in UTF-8 mode; |
| |
this makes it possible to construct invalid UTF-8 sequences for testing |
| |
purposes. On the other hand, \x{hh} is interpreted as a UTF-8 character in |
| |
UTF-8 mode, generating more than one byte if the value is greater than 127. |
| |
When testing the 8-bit library not in UTF-8 mode, \x{hh} generates one byte |
| |
for values less than 256, and causes an error for greater values. |
| |
</P> |
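<P>
The difference between \xhh and \x{hh} when testing the 8-bit library can be sketched in Python. This only illustrates the rule described above and is not <b>pcretest</b>'s own code; the function names are hypothetical, and Python's UTF-8 encoder stands in for the real one (it rejects surrogates and values above 0x10FFFF, which <b>pcretest</b> may treat differently).
</P>

```python
def xhh(value):
    # \xhh always yields exactly one byte, even in UTF-8 mode,
    # which is what makes deliberately invalid UTF-8 possible.
    if not 0 <= value <= 0xFF:
        raise ValueError("\\xhh takes at most two hex digits")
    return bytes([value])


def x_braces(value, utf8_mode):
    # \x{hh...} yields a whole character: UTF-8 encoded in UTF-8 mode,
    # otherwise a single byte that must be less than 256.
    if utf8_mode:
        return chr(value).encode("utf-8")
    if value < 256:
        return bytes([value])
    raise ValueError("value too large outside UTF-8 mode")
```

For example, xhh(0xE9) gives the single byte E9, while x_braces(0xE9, True) gives the two bytes C3 A9.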
| |
<P> |
| |
In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it |
| |
possible to construct invalid UTF-16 sequences for testing purposes. |
| |
</P> |
| |
<P> |
| |
In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This makes it |
| |
possible to construct invalid UTF-32 sequences for testing purposes. |
| |
</P> |
| |
<P> |
| The escapes that specify line ending sequences are literal strings, exactly as |
The escapes that specify line ending sequences are literal strings, exactly as |
| shown. No more than one newline setting should be present in any data line. |
shown. No more than one newline setting should be present in any data line. |
| </P> |
</P> |
|
Line 506 is not being used. Providing a stack that is larger th
|
Line 738 is not being used. Providing a stack that is larger th
|
| necessary only for very complicated patterns. |
necessary only for very complicated patterns. |
| </P> |
</P> |
| <P> |
<P> |
| If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with | If \M is present, <b>pcretest</b> calls <b>pcre[16|32]_exec()</b> several times, |
| different values in the <i>match_limit</i> and <i>match_limit_recursion</i> | with different values in the <i>match_limit</i> and <i>match_limit_recursion</i> |
| fields of the <b>pcre_extra</b> data structure, until it finds the minimum | fields of the <b>pcre[16|32]_extra</b> data structure, until it finds the minimum |
| numbers for each parameter that allow <b>pcre_exec()</b> to complete without | numbers for each parameter that allow <b>pcre[16|32]_exec()</b> to complete without |
| error. Because this is testing a specific feature of the normal interpretive |
error. Because this is testing a specific feature of the normal interpretive |
| <b>pcre_exec()</b> execution, the use of any JIT optimization that might have | <b>pcre[16|32]_exec()</b> execution, the use of any JIT optimization that might |
| been set up by the <b>/S+</b> qualifier or the <b>-s+</b> option is disabled. | have been set up by the <b>/S+</b> qualifier or the <b>-s+</b> option is disabled. |
| </P> |
</P> |
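<P>
The search that \M performs can be pictured as "grow the limit until the match completes, then narrow down". The Python sketch below is a generic version of that idea and is not taken from the <b>pcretest</b> source. The callback completes(limit) is a hypothetical stand-in for one matching call with the given limit, and the starting value of 64 is an arbitrary assumption.
</P>

```python
def minimum_limit(completes, start=64):
    """Return the smallest limit for which completes(limit) is true.

    Assumes completes() is monotonic: once a limit is large enough,
    every larger limit also succeeds.
    """
    low, high = 0, start
    # Phase 1: double the limit until the match completes.
    while not completes(high):
        low, high = high, high * 2
    # Phase 2: bisect between the last failure and the first success.
    while high - low > 1:
        mid = (low + high) // 2
        if completes(mid):
            high = mid
        else:
            low = mid
    return high
```

The same routine would be run once for each of the two limits, holding the other one at a value known to be large enough.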
| <P> |
<P> |
| The <i>match_limit</i> number is a measure of the amount of backtracking |
The <i>match_limit</i> number is a measure of the amount of backtracking |
|
Line 526 needed to complete the match attempt.
|
Line 758 needed to complete the match attempt.
|
| <P> |
<P> |
| When \O is used, the value specified may be higher or lower than the size set |
When \O is used, the value specified may be higher or lower than the size set |
| by the <b>-O</b> command line option (or defaulted to 45); \O applies only to |
by the <b>-O</b> command line option (or defaulted to 45); \O applies only to |
| the call of <b>pcre_exec()</b> for the line in which it appears. | the call of <b>pcre[16|32]_exec()</b> for the line in which it appears. |
| </P> |
</P> |
| <P> |
<P> |
| If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper |
If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper |
|
Line 534 API to be used, the only option-setting sequences that
|
Line 766 API to be used, the only option-setting sequences that
|
| \N, and \Z, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, |
\N, and \Z, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, |
| to be passed to <b>regexec()</b>. |
to be passed to <b>regexec()</b>. |
| </P> |
</P> |
| |
<br><a name="SEC8" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br> |
| <P> |
<P> |
| The use of \x{hh...} to represent UTF-8 characters is not dependent on the use |
|
| of the <b>/8</b> modifier on the pattern. It is recognized always. There may be |
|
| any number of hexadecimal digits inside the braces. The result is from one to |
|
| six bytes, encoded according to the original UTF-8 rules of RFC 2279. This |
|
| allows for values in the range 0 to 0x7FFFFFFF. Note that not all of those are |
|
| valid Unicode code points, or indeed valid UTF-8 characters according to the |
|
| later rules in RFC 3629. |
|
| </P> |
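<P>
The one-to-six-byte encoding mentioned in the older text follows the original UTF-8 scheme of RFC 2279. A minimal Python sketch of that scheme is shown here for orientation; the function name is hypothetical, and the code is an independent illustration, not <b>pcretest</b>'s encoder.
</P>

```python
def utf8_rfc2279(value):
    """Encode 0..0x7FFFFFFF as one to six bytes (original RFC 2279 rules)."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value out of range")
    if value < 0x80:
        return bytes([value])
    # (exclusive upper bound, total length, lead-byte marker)
    forms = [
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ]
    for limit, length, lead in forms:
        if value < limit:
            break
    tail = []
    for _ in range(length - 1):
        # Each continuation byte carries six bits, lowest bits first.
        tail.append(0x80 | (value & 0x3F))
        value >>= 6
    return bytes([lead | value] + tail[::-1])
```

Values above 0x10FFFF, and the five- and six-byte forms they need, are not valid under the later rules of RFC 3629.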
|
| <br><a name="SEC6" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br> |
|
| <P> |
|
| By default, <b>pcretest</b> uses the standard PCRE matching function, |
By default, <b>pcretest</b> uses the standard PCRE matching function, |
| <b>pcre_exec()</b> to match each data line. From release 6.0, PCRE supports an | <b>pcre[16|32]_exec()</b> to match each data line. PCRE also supports an |
| alternative matching function, <b>pcre_dfa_exec()</b>, which operates in a | alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, which operates in a |
| different way, and has some restrictions. The differences between the two |
different way, and has some restrictions. The differences between the two |
| functions are described in the |
functions are described in the |
| <a href="pcrematching.html"><b>pcrematching</b></a> |
<a href="pcrematching.html"><b>pcrematching</b></a> |
|
Line 555 documentation.
|
Line 778 documentation.
|
| </P> |
</P> |
| <P> |
<P> |
| If a data line contains the \D escape sequence, or if the command line |
If a data line contains the \D escape sequence, or if the command line |
| contains the <b>-dfa</b> option, the alternative matching function is called. | contains the <b>-dfa</b> option, the alternative matching function is used. |
| This function finds all possible matches at a given point. If, however, the \F |
This function finds all possible matches at a given point. If, however, the \F |
| escape sequence is present in the data line, it stops after the first match is |
escape sequence is present in the data line, it stops after the first match is |
| found. This is always the shortest possible match. |
found. This is always the shortest possible match. |
| </P> |
</P> |
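<P>
To make "all possible matches at a given point" concrete, the Python sketch below emulates the idea with the standard <b>re</b> module by testing every candidate end position. It is only an illustration: it does not call the alternative matching function, and the helper name is hypothetical.
</P>

```python
import re


def all_matches_at(pattern, subject, start=0):
    """List every substring starting at `start` that matches the whole pattern,
    longest first."""
    rx = re.compile(pattern)
    return [
        subject[start:end]
        for end in range(len(subject), start - 1, -1)
        if rx.fullmatch(subject, start, end)
    ]


matches = all_matches_at(r"cat(er(pillar)?)?", "caterpillar")
shortest = matches[-1]  # roughly what the \F escape asks for
```

Here matches holds "caterpillar", "cater" and "cat", and shortest is "cat".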
| <br><a name="SEC7" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br> | <br><a name="SEC9" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br> |
| <P> |
<P> |
| This section describes the output when the normal matching function, |
This section describes the output when the normal matching function, |
| <b>pcre_exec()</b>, is being used. | <b>pcre[16|32]_exec()</b>, is being used. |
| </P> |
</P> |
| <P> |
<P> |
| When a match succeeds, <b>pcretest</b> outputs the list of captured substrings |
When a match succeeds, <b>pcretest</b> outputs the list of captured substrings |
| that <b>pcre_exec()</b> returns, starting with number 0 for the string that | that <b>pcre[16|32]_exec()</b> returns, starting with number 0 for the string that |
| matched the whole pattern. Otherwise, it outputs "No match" when the return is |
matched the whole pattern. Otherwise, it outputs "No match" when the return is |
| PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching |
PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching |
| substring when <b>pcre_exec()</b> returns PCRE_ERROR_PARTIAL. (Note that this is | substring when <b>pcre[16|32]_exec()</b> returns PCRE_ERROR_PARTIAL. (Note that |
| the entire substring that was inspected during the partial match; it may | this is the entire substring that was inspected during the partial match; it |
| include characters before the actual match start if a lookbehind assertion, | may include characters before the actual match start if a lookbehind assertion, |
| \K, \b, or \B was involved.) For any other return, <b>pcretest</b> outputs |
\K, \b, or \B was involved.) For any other return, <b>pcretest</b> outputs |
| the PCRE negative error number and a short descriptive phrase. If the error is |
the PCRE negative error number and a short descriptive phrase. If the error is |
| a failed UTF-8 string check, the byte offset of the start of the failing | a failed UTF string check, the offset of the start of the failing character and |
| character and the reason code are also output, provided that the size of the | the reason code are also output, provided that the size of the output vector is |
| output vector is at least two. Here is an example of an interactive | at least two. Here is an example of an interactive <b>pcretest</b> run. |
| <b>pcretest</b> run. | |
| <pre> |
<pre> |
| $ pcretest |
$ pcretest |
| PCRE version 8.13 2011-04-30 |
PCRE version 8.13 2011-04-30 |
|
Line 591 output vector is at least two. Here is an example of a
|
Line 813 output vector is at least two. Here is an example of a
|
| No match |
No match |
| </pre> |
</pre> |
| Unset capturing substrings that are not followed by one that is set are not |
Unset capturing substrings that are not followed by one that is set are not |
| returned by <b>pcre_exec()</b>, and are not shown by <b>pcretest</b>. In the | returned by <b>pcre[16|32]_exec()</b>, and are not shown by <b>pcretest</b>. In the |
| following example, there are two capturing substrings, but when the first data |
following example, there are two capturing substrings, but when the first data |
| line is matched, the second, unset substring is not shown. An "internal" unset |
line is matched, the second, unset substring is not shown. An "internal" unset |
| substring is shown as "<unset>", as for the second data line. |
substring is shown as "<unset>", as for the second data line. |
|
Line 605 substring is shown as "<unset>", as for the se
|
Line 827 substring is shown as "<unset>", as for the se
|
| 1: <unset> |
1: <unset> |
| 2: b |
2: b |
| </pre> |
</pre> |
| If the strings contain any non-printing characters, they are output as \0x | If the strings contain any non-printing characters, they are output as \xhh |
| escapes, or as \x{...} escapes if the <b>/8</b> modifier was present on the | escapes if the value is less than 256 and UTF mode is not set. Otherwise they |
| pattern. See below for the definition of non-printing characters. If the | are output as \x{hh...} escapes. See below for the definition of non-printing |
| pattern has the <b>/+</b> modifier, the output for substring 0 is followed by | characters. If the pattern has the <b>/+</b> modifier, the output for substring |
| the rest of the subject string, identified by "0+" like this: | 0 is followed by the rest of the subject string, identified by "0+" like |
| | this: |
| <pre> |
<pre> |
| re> /cat/+ |
re> /cat/+ |
| data> cataract |
data> cataract |
|
Line 651 prompt is used for continuations), data lines may not.
|
Line 874 prompt is used for continuations), data lines may not.
|
| included in data by means of the \n escape (or \r, \r\n, etc., depending on |
included in data by means of the \n escape (or \r, \r\n, etc., depending on |
| the newline sequence setting). |
the newline sequence setting). |
| </P> |
</P> |
| <br><a name="SEC8" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br> | <br><a name="SEC10" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br> |
| <P> |
<P> |
| When the alternative matching function, <b>pcre_dfa_exec()</b>, is used (by | When the alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, is used (by |
| means of the \D escape sequence or the <b>-dfa</b> command line option), the |
means of the \D escape sequence or the <b>-dfa</b> command line option), the |
| output consists of a list of all the matches that start at the first point in |
output consists of a list of all the matches that start at the first point in |
| the subject where there is at least one match. For example: |
the subject where there is at least one match. For example: |
|
Line 687 at the end of the longest match. For example:
|
Line 910 at the end of the longest match. For example:
|
| Since the matching function does not support substring capture, the escape |
Since the matching function does not support substring capture, the escape |
| sequences that are concerned with captured substrings are not relevant. |
sequences that are concerned with captured substrings are not relevant. |
| </P> |
</P> |
| <br><a name="SEC9" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br> | <br><a name="SEC11" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br> |
| <P> |
<P> |
| When the alternative matching function has given the PCRE_ERROR_PARTIAL return, |
When the alternative matching function has given the PCRE_ERROR_PARTIAL return, |
| indicating that the subject partially matched the pattern, you can restart the |
indicating that the subject partially matched the pattern, you can restart the |
|
Line 704 For further information about partial matching, see th
|
Line 927 For further information about partial matching, see th
|
| <a href="pcrepartial.html"><b>pcrepartial</b></a> |
<a href="pcrepartial.html"><b>pcrepartial</b></a> |
| documentation. |
documentation. |
| </P> |
</P> |
| <br><a name="SEC10" href="#TOC1">CALLOUTS</a><br> | <br><a name="SEC12" href="#TOC1">CALLOUTS</a><br> |
| <P> |
<P> |
| If the pattern contains any callout requests, <b>pcretest</b>'s callout function |
If the pattern contains any callout requests, <b>pcretest</b>'s callout function |
| is called during matching. This works with both matching functions. By default, |
is called during matching. This works with both matching functions. By default, |
| the called function displays the callout number, the start and current |
the called function displays the callout number, the start and current |
| positions in the text at the callout time, and the next pattern item to be |
positions in the text at the callout time, and the next pattern item to be |
| tested. For example, the output | tested. For example: |
| <pre> |
<pre> |
| --->pqrabcdef |
--->pqrabcdef |
| 0 ^ ^ \d |
0 ^ ^ \d |
| </pre> |
</pre> |
| indicates that callout number 0 occurred for a match attempt starting at the | This output indicates that callout number 0 occurred for a match attempt |
| fourth character of the subject string, when the pointer was at the seventh | starting at the fourth character of the subject string, when the pointer was at |
| character of the data, and when the next pattern item was \d. Just one | the seventh character of the data, and when the next pattern item was \d. Just |
| circumflex is output if the start and current positions are the same. | one circumflex is output if the start and current positions are the same. |
| </P> |
</P> |
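<P>
The way the circumflexes line up under the subject can be reproduced with a few lines of Python. This sketch assumes a four-character "--->" prefix and zero-based offsets; the exact column widths are illustrative, and callout_lines() is a hypothetical helper, not a <b>pcretest</b> function.
</P>

```python
def callout_lines(number, subject, start, current, next_item):
    """Build the two display lines for one callout."""
    prefix = "--->"
    # One slot per subject position, plus one for "end of subject".
    marks = [" "] * (len(subject) + 1)
    marks[start] = "^"
    marks[current] = "^"  # same slot as start if the positions coincide
    label = str(number).rjust(3).ljust(len(prefix))
    return prefix + subject, label + "".join(marks) + " " + next_item


top, bottom = callout_lines(0, "pqrabcdef", 3, 6, r"\d")
```

With these arguments the first circumflex falls under the fourth subject character and the second under the seventh, as in the example above.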
| <P> |
<P> |
| Callouts numbered 255 are assumed to be automatic callouts, inserted as a |
Callouts numbered 255 are assumed to be automatic callouts, inserted as a |
|
Line 765 the
|
Line 988 the
|
| <a href="pcrecallout.html"><b>pcrecallout</b></a> |
<a href="pcrecallout.html"><b>pcrecallout</b></a> |
| documentation. |
documentation. |
| </P> |
</P> |
| <br><a name="SEC11" href="#TOC1">NON-PRINTING CHARACTERS</a><br> | <br><a name="SEC13" href="#TOC1">NON-PRINTING CHARACTERS</a><br> |
| <P> |
<P> |
| When <b>pcretest</b> is outputting text in the compiled version of a pattern, |
When <b>pcretest</b> is outputting text in the compiled version of a pattern, |
| bytes other than 32-126 are always treated as non-printing characters and are |
bytes other than 32-126 are always treated as non-printing characters and are |
|
Line 777 string, it behaves in the same way, unless a different
|
Line 1000 string, it behaves in the same way, unless a different
|
| the pattern (using the <b>/L</b> modifier). In this case, the <b>isprint()</b> |
the pattern (using the <b>/L</b> modifier). In this case, the <b>isprint()</b> |
| function is used to distinguish printing and non-printing characters. |
function is used to distinguish printing and non-printing characters. |
| </P> |
</P> |
| <br><a name="SEC12" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br> | <br><a name="SEC14" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br> |
| <P> |
<P> |
| The facilities described in this section are not available when the POSIX |
The facilities described in this section are not available when the POSIX |
| interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is |
interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is |
|
Line 825 been loaded, <b>pcretest</b> proceeds to read data lin
|
Line 1048 been loaded, <b>pcretest</b> proceeds to read data lin
|
| You can copy a file written by <b>pcretest</b> to a different host and reload it |
You can copy a file written by <b>pcretest</b> to a different host and reload it |
| there, even if the new host has opposite endianness to the one on which the |
there, even if the new host has opposite endianness to the one on which the |
| pattern was compiled. For example, you can compile on an i86 machine and run on |
pattern was compiled. For example, you can compile on an i86 machine and run on |
| a SPARC machine. | a SPARC machine. When a pattern is reloaded on a host with different |
| | endianness, the confirmation message is changed to: |
| | <pre> |
| | Compiled pattern (byte-inverted) loaded from /some/file |
| | </pre> |
| | The test suite contains some saved pre-compiled patterns with different |
| | endianness. These are reloaded using "<!" instead of just "<". This suppresses |
| | the "(byte-inverted)" text so that the output is the same on all hosts. It also |
| | forces debugging output once the pattern has been reloaded. |
| </P> |
</P> |
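One common way to recognize a file written on a host of opposite endianness is to store a known 32-bit magic number and compare it on reload, both as read and byte-swapped. The sketch below shows that general technique only. The on-disk layout used by pcretest is not described in this section, so EXAMPLE_MAGIC, swap32 and check_byte_order are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Illustrative magic value (the ASCII bytes 'P','C','R','E').
   Assumed for this example only. */
#define EXAMPLE_MAGIC 0x50435245u

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) <<  8) |
           ((v & 0x00ff0000u) >>  8) |
           ((v & 0xff000000u) >> 24);
}

/* Classify a magic number read from a saved file:
     0  -> written in this host's byte order
     1  -> byte-inverted (written on a host of opposite endianness)
    -1  -> not a recognized file */
static int check_byte_order(uint32_t stored_magic)
{
    if (stored_magic == EXAMPLE_MAGIC)
        return 0;
    if (swap32(stored_magic) == EXAMPLE_MAGIC)
        return 1;
    return -1;
}
```

A loader built this way would byte-swap the remaining multi-byte fields when check_byte_order returns 1, and could print a "(byte-inverted)" note at that point.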
| <P> |
<P> |
| File names for saving and reloading can be absolute or relative, but note that |
File names for saving and reloading can be absolute or relative, but note that |
|
Line 842 string using a reloaded pattern is likely to cause <b>
|
Line 1073 string using a reloaded pattern is likely to cause <b>
|
| Finally, if you attempt to load a file that is not in the correct format, the |
Finally, if you attempt to load a file that is not in the correct format, the |
| result is undefined. |
result is undefined. |
| </P> |
</P> |
| <br><a name="SEC13" href="#TOC1">SEE ALSO</a><br> | <br><a name="SEC15" href="#TOC1">SEE ALSO</a><br> |
| <P> |
<P> |
| <b>pcre</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrejit</b>(3), | <b>pcre</b>(3), <b>pcre16</b>(3), <b>pcre32</b>(3), <b>pcreapi</b>(3), |
| <b>pcrematching</b>(3), <b>pcrepartial</b>(3), <b>pcrepattern</b>(3), | <b>pcrecallout</b>(3), |
| <b>pcreprecompile</b>(3). | <b>pcrejit</b>(3), <b>pcrematching</b>(3), <b>pcrepartial</b>(3), |
| | <b>pcrepattern</b>(3), <b>pcreprecompile</b>(3). |
| </P> |
</P> |
| <br><a name="SEC14" href="#TOC1">AUTHOR</a><br> | <br><a name="SEC16" href="#TOC1">AUTHOR</a><br> |
| <P> |
<P> |
| Philip Hazel |
Philip Hazel |
| <br> |
<br> |
|
Line 857 University Computing Service
|
Line 1089 University Computing Service
|
| Cambridge CB2 3QH, England. |
Cambridge CB2 3QH, England. |
| <br> |
<br> |
| </P> |
</P> |
| <br><a name="SEC15" href="#TOC1">REVISION</a><br> | <br><a name="SEC17" href="#TOC1">REVISION</a><br> |
| <P> |
<P> |
| Last updated: 02 December 2011 | Last updated: 26 April 2013 |
| <br> |
<br> |
| Copyright © 1997-2011 University of Cambridge. | Copyright © 1997-2013 University of Cambridge. |
| <br> |
<br> |
| <p> |
<p> |
| Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |