|
version 1.1.1.3, 2012/10/09 09:19:18
|
version 1.1.1.4, 2013/07/22 08:25:57
|
|
Line 14 man page, in case the conversion went wrong.
|
Line 14 man page, in case the conversion went wrong.
|
| <br> |
<br> |
| <ul> |
<ul> |
| <li><a name="TOC1" href="#SEC1">SYNOPSIS</a> |
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a> |
| <li><a name="TOC2" href="#SEC2">PCRE's 8-BIT and 16-BIT LIBRARIES</a> | <li><a name="TOC2" href="#SEC2">INPUT DATA FORMAT</a> |
| <li><a name="TOC3" href="#SEC3">COMMAND LINE OPTIONS</a> | <li><a name="TOC3" href="#SEC3">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a> |
| <li><a name="TOC4" href="#SEC4">DESCRIPTION</a> | <li><a name="TOC4" href="#SEC4">COMMAND LINE OPTIONS</a> |
| <li><a name="TOC5" href="#SEC5">PATTERN MODIFIERS</a> | <li><a name="TOC5" href="#SEC5">DESCRIPTION</a> |
| <li><a name="TOC6" href="#SEC6">DATA LINES</a> | <li><a name="TOC6" href="#SEC6">PATTERN MODIFIERS</a> |
| <li><a name="TOC7" href="#SEC7">THE ALTERNATIVE MATCHING FUNCTION</a> | <li><a name="TOC7" href="#SEC7">DATA LINES</a> |
| <li><a name="TOC8" href="#SEC8">DEFAULT OUTPUT FROM PCRETEST</a> | <li><a name="TOC8" href="#SEC8">THE ALTERNATIVE MATCHING FUNCTION</a> |
| <li><a name="TOC9" href="#SEC9">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a> | <li><a name="TOC9" href="#SEC9">DEFAULT OUTPUT FROM PCRETEST</a> |
| <li><a name="TOC10" href="#SEC10">RESTARTING AFTER A PARTIAL MATCH</a> | <li><a name="TOC10" href="#SEC10">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a> |
| <li><a name="TOC11" href="#SEC11">CALLOUTS</a> | <li><a name="TOC11" href="#SEC11">RESTARTING AFTER A PARTIAL MATCH</a> |
| <li><a name="TOC12" href="#SEC12">NON-PRINTING CHARACTERS</a> | <li><a name="TOC12" href="#SEC12">CALLOUTS</a> |
| <li><a name="TOC13" href="#SEC13">SAVING AND RELOADING COMPILED PATTERNS</a> | <li><a name="TOC13" href="#SEC13">NON-PRINTING CHARACTERS</a> |
| <li><a name="TOC14" href="#SEC14">SEE ALSO</a> | <li><a name="TOC14" href="#SEC14">SAVING AND RELOADING COMPILED PATTERNS</a> |
| <li><a name="TOC15" href="#SEC15">AUTHOR</a> | <li><a name="TOC15" href="#SEC15">SEE ALSO</a> |
| <li><a name="TOC16" href="#SEC16">REVISION</a> | <li><a name="TOC16" href="#SEC16">AUTHOR</a> |
| | <li><a name="TOC17" href="#SEC17">REVISION</a> |
| </ul> |
</ul> |
| <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br> |
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br> |
| <P> |
<P> |
|
Line 43 details of the regular expressions themselves, see the
|
Line 44 details of the regular expressions themselves, see the
|
| documentation. For details of the PCRE library function calls and their |
documentation. For details of the PCRE library function calls and their |
| options, see the |
options, see the |
| <a href="pcreapi.html"><b>pcreapi</b></a> |
<a href="pcreapi.html"><b>pcreapi</b></a> |
| and | , |
| <a href="pcre16.html"><b>pcre16</b></a> |
<a href="pcre16.html"><b>pcre16</b></a> |
| documentation. The input for <b>pcretest</b> is a sequence of regular expression | and |
| patterns and strings to be matched, as described below. The output shows the | <a href="pcre32.html"><b>pcre32</b></a> |
| result of each match. Options on the command line and the patterns control PCRE | documentation. |
| options and exactly what is output. | |
| </P> |
</P> |
| <br><a name="SEC2" href="#TOC1">PCRE's 8-BIT and 16-BIT LIBRARIES</a><br> |
|
| <P> |
<P> |
| |
The input for <b>pcretest</b> is a sequence of regular expression patterns and |
| |
strings to be matched, as described below. The output shows the result of each |
| |
match. Options on the command line and the patterns control PCRE options and |
| |
exactly what is output. |
| |
</P> |
| |
<P> |
| |
As PCRE has evolved, it has acquired many different features, and as a result, |
| |
<b>pcretest</b> now has rather a lot of obscure options for testing every |
| |
possible feature. Some of these options are specifically designed for use in |
| |
conjunction with the test script and data files that are distributed as part of |
| |
PCRE, and are unlikely to be of use otherwise. They are all documented here, |
| |
but without much justification. |
| |
</P> |
| |
<br><a name="SEC2" href="#TOC1">INPUT DATA FORMAT</a><br> |
| |
<P> |
| |
Input to <b>pcretest</b> is processed line by line, either by calling the C |
| |
library's <b>fgets()</b> function, or via the <b>libreadline</b> library (see |
| |
below). In Unix-like environments, <b>fgets()</b> treats any bytes other than |
| |
newline as data characters. However, in some Windows environments character 26 |
| |
(hex 1A) causes an immediate end of file, and no further data is read. For |
| |
maximum portability, therefore, it is safest to use only ASCII characters in |
| |
<b>pcretest</b> input files. |
| |
</P> |
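<P>The portability advice above can be checked before a file is given to <b>pcretest</b>. The following is a minimal sketch of a hypothetical helper; <b>count_unportable_bytes()</b> is not part of PCRE. It counts non-ASCII bytes, and the byte 0x1A that some Windows environments treat as end of file. It reads the file in binary mode so that no platform translation hides these bytes.</P>

```c
#include <stddef.h>
#include <stdio.h>

/* Count bytes that may not survive fgets() on every platform: any non-ASCII
   byte (0x80-0xFF), plus byte 26 (0x1A), which some Windows environments
   treat as end of file. Returns the count, or -1 if the file cannot be
   opened. This helper is hypothetical, not part of pcretest. */
long count_unportable_bytes(const char *path)
{
  FILE *f = fopen(path, "rb");   /* binary mode: no end-of-file translation */
  long count = 0;
  int c;
  if (f == NULL) return -1;
  while ((c = getc(f)) != EOF)
    {
    if (c >= 0x80 || c == 0x1A) count++;
    }
  fclose(f);
  return count;
}
```

<P>A result of 0 suggests that the file is plain ASCII. A positive count points to bytes that may be handled differently by different C libraries.</P>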
| |
<br><a name="SEC3" href="#TOC1">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br> |
| |
<P> |
| From release 8.30, two separate PCRE libraries can be built. The original one |
From release 8.30, two separate PCRE libraries can be built. The original one |
| supports 8-bit character strings, whereas the newer 16-bit library supports |
supports 8-bit character strings, whereas the newer 16-bit library supports |
| character strings encoded in 16-bit units. The <b>pcretest</b> program can be | character strings encoded in 16-bit units. From release 8.32, a third library |
| used to test both libraries. However, it is itself still an 8-bit program, | can be built, supporting character strings encoded in 32-bit units. The |
| reading 8-bit input and writing 8-bit output. When testing the 16-bit library, | <b>pcretest</b> program can be used to test all three libraries. However, it is |
| the patterns and data strings are converted to 16-bit format before being | itself still an 8-bit program, reading 8-bit input and writing 8-bit output. |
| passed to the PCRE library functions. Results are converted to 8-bit for | When testing the 16-bit or 32-bit library, the patterns and data strings are |
| output. | converted to 16- or 32-bit format before being passed to the PCRE library |
| | functions. Results are converted to 8-bit for output. |
| </P> |
</P> |
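<P>The conversion step can be pictured with the following sketch. It assumes the simplest case, in which no UTF mode is in effect and each input byte becomes one 16-bit or 32-bit code unit. When UTF mode is selected, the 8-bit input would instead have to be decoded as UTF-8, which this sketch does not attempt. The function names are hypothetical.</P>

```c
#include <stddef.h>
#include <stdint.h>

/* Widen an 8-bit string to 16-bit code units, one unit per byte (the
   non-UTF case only). out must have room for len + 1 units; a zero
   terminator is appended. */
void widen_to_16(const unsigned char *in, size_t len, uint16_t *out)
{
  for (size_t i = 0; i < len; i++) out[i] = in[i];
  out[len] = 0;
}

/* The same conversion for the 32-bit library. */
void widen_to_32(const unsigned char *in, size_t len, uint32_t *out)
{
  for (size_t i = 0; i < len; i++) out[i] = in[i];
  out[len] = 0;
}
```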
| <P> |
<P> |
| References to functions and structures of the form <b>pcre[16]_xx</b> below | References to functions and structures of the form <b>pcre[16|32]_xx</b> below |
| mean "<b>pcre_xx</b> when using the 8-bit library or <b>pcre16_xx</b> when using | mean "<b>pcre_xx</b> when using the 8-bit library, <b>pcre16_xx</b> when using |
| the 16-bit library". | the 16-bit library, or <b>pcre32_xx</b> when using the 32-bit library". |
| </P> |
</P> |
| <br><a name="SEC3" href="#TOC1">COMMAND LINE OPTIONS</a><br> | <br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br> |
| <P> |
<P> |
| <b>-16</b> | <b>-8</b> |
| If both the 8-bit and the 16-bit libraries have been built, this option causes | If the 8-bit library has been built, this option causes the 8-bit library |
| the 16-bit library to be used. If only the 16-bit library has been built, this | to be used (which is the default); if the 8-bit library has not been built, |
| is the default (so has no effect). If only the 8-bit library has been built, | |
| this option causes an error. |
this option causes an error. |
| </P> |
</P> |
| <P> |
<P> |
| |
<b>-16</b> |
| |
If the 16-bit library has been built together with the 8-bit library, the |
| |
32-bit library, or both, this option causes the 16-bit library to be used. If |
| |
only the 16-bit library has been built, this is the default (so the option has |
| |
no effect). If the 16-bit library has not been built, this option causes an error. |
| |
</P> |
| |
<P> |
| |
<b>-32</b> |
| |
If the 32-bit library has been built together with the 8-bit library, the |
| |
16-bit library, or both, this option causes the 32-bit library to be used. If |
| |
only the 32-bit library has been built, this is the default (so the option has |
| |
no effect). If the 32-bit library has not been built, this option causes an error. |
| |
</P> |
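<P>Together, the rules for <b>-8</b>, <b>-16</b>, and <b>-32</b> form a small decision table. The sketch below encodes that table in a hypothetical <b>select_library()</b> function. The text does not cover the case where no option is given and the 8-bit library is absent but both other libraries exist. Preferring the 16-bit library over the 32-bit library in that case is an assumption.</P>

```c
#include <stdbool.h>

/* Hypothetical model of library selection. requested is 8, 16 or 32 for an
   explicit -8, -16 or -32 option, or 0 when none is given. Returns the width
   to use, or -1 where the option "causes an error". */
int select_library(int requested, bool built8, bool built16, bool built32)
{
  switch (requested)
    {
    case 8:  return built8  ? 8  : -1;
    case 16: return built16 ? 16 : -1;
    case 32: return built32 ? 32 : -1;
    case 0:
      /* Default: the 8-bit library if it exists; otherwise whichever library
         was built (16 before 32 is an assumption). */
      if (built8)  return 8;
      if (built16) return 16;
      if (built32) return 32;
      return -1;
    default:
      return -1;
    }
}
```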
| |
<P> |
| <b>-b</b> |
<b>-b</b> |
| Behave as if each pattern has the <b>/B</b> (show byte code) modifier; the |
Behave as if each pattern has the <b>/B</b> (show byte code) modifier; the |
| internal form is output after compilation. |
internal form is output after compilation. |
|
Line 82 internal form is output after compilation.
|
Line 120 internal form is output after compilation.
|
| <P> |
<P> |
| <b>-C</b> |
<b>-C</b> |
| Output the version number of the PCRE library, and all available information |
Output the version number of the PCRE library, and all available information |
| about the optional features that are included, and then exit. All other options | about the optional features that are included, and then exit with zero exit |
| are ignored. | code. All other options are ignored. |
| </P> |
</P> |
| <P> |
<P> |
| <b>-C</b> <i>option</i> |
<b>-C</b> <i>option</i> |
| Output information about a specific build-time option, then exit. This |
Output information about a specific build-time option, then exit. This |
| functionality is intended for use in scripts such as <b>RunTest</b>. The |
functionality is intended for use in scripts such as <b>RunTest</b>. The |
| following options output the value indicated: | following options output the value and set the exit code as indicated: |
| <pre> |
<pre> |
| linksize the internal link size (2, 3, or 4) | ebcdic-nl the code for LF (= NL) in an EBCDIC environment: |
| | 0x15 or 0x25 |
| | 0 if used in an ASCII environment |
| | exit code is always 0 |
| | linksize the configured internal link size (2, 3, or 4) |
| | exit code is set to the link size |
| newline the default newline setting: |
newline the default newline setting: |
| CR, LF, CRLF, ANYCRLF, or ANY |
CR, LF, CRLF, ANYCRLF, or ANY |
| |
exit code is always 0 |
| </pre> |
</pre> |
| The following options output 1 for true or zero for false: | The following options output 1 for true or 0 for false, and set the exit code |
| | to the same value: |
| <pre> |
<pre> |
| |
ebcdic compiled for an EBCDIC environment |
| jit just-in-time support is available |
jit just-in-time support is available |
| pcre16 the 16-bit library was built |
pcre16 the 16-bit library was built |
| |
pcre32 the 32-bit library was built |
| pcre8 the 8-bit library was built |
pcre8 the 8-bit library was built |
| ucp Unicode property support is available |
ucp Unicode property support is available |
| utf UTF-8 and/or UTF-16 support is available | utf UTF-8 and/or UTF-16 and/or UTF-32 support |
| </PRE> | is available |
| | </pre> |
| | If an unknown option is given, an error message is output; the exit code is 0. |
| </P> |
</P> |
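<P><b>-C</b> <i>option</i> reports its answer through the exit code. A calling program must therefore not treat a nonzero status as failure, because for the boolean options 1 means "true". The sketch below assumes a POSIX system with <b>pcretest</b> on the search path. It runs the query through <b>system()</b> and decodes the status with the standard <b>WIFEXITED</b> and <b>WEXITSTATUS</b> macros. The helper names are hypothetical.</P>

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run a shell command and return its exit code, or -1 if it could not be
   run or did not exit normally. POSIX-only. */
int command_exit_code(const char *cmd)
{
  int status = system(cmd);
  if (status == -1 || !WIFEXITED(status)) return -1;
  return WEXITSTATUS(status);
}

/* Query one build-time option, e.g. "jit" or "linksize". For the boolean
   options the result is 1 (true) or 0 (false); it is an answer, not an error
   status. option is assumed to be a fixed, trusted string. */
int pcretest_config(const char *option)
{
  char cmd[256];
  int n = snprintf(cmd, sizeof cmd, "pcretest -C %s >/dev/null 2>&1", option);
  if (n < 0 || (size_t)n >= sizeof cmd) return -1;
  return command_exit_code(cmd);
}
```

<P>Given the behaviour described above, <b>pcretest_config("jit")</b> would return 1 or 0, and <b>pcretest_config("linksize")</b> would return 2, 3, or 4. An unknown option also yields 0, so a 0 result cannot distinguish "false" from a misspelt option name. A shell that cannot find <b>pcretest</b> commonly reports 127.</P>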
| <P> |
<P> |
| <b>-d</b> |
<b>-d</b> |
|
Line 113 form and information about the compiled pattern is out
|
Line 162 form and information about the compiled pattern is out
|
| <P> |
<P> |
| <b>-dfa</b> |
<b>-dfa</b> |
| Behave as if each data line contains the \D escape sequence; this causes the |
Behave as if each data line contains the \D escape sequence; this causes the |
| alternative matching function, <b>pcre[16]_dfa_exec()</b>, to be used instead of | alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, to be used instead |
| the standard <b>pcre[16]_exec()</b> function (more detail is given below). | of the standard <b>pcre[16|32]_exec()</b> function (more detail is given below). |
| </P> |
</P> |
| <P> |
<P> |
| <b>-help</b> |
<b>-help</b> |
|
Line 129 compiled pattern is given after compilation.
|
Line 178 compiled pattern is given after compilation.
|
| <b>-M</b> |
<b>-M</b> |
| Behave as if each data line contains the \M escape sequence; this causes |
Behave as if each data line contains the \M escape sequence; this causes |
| PCRE to discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings by |
PCRE to discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings by |
| calling <b>pcre[16]_exec()</b> repeatedly with different limits. | calling <b>pcre[16|32]_exec()</b> repeatedly with different limits. |
| </P> |
</P> |
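<P>The text does not say how the different limits are chosen. One plausible strategy, shown here only as a sketch, is to double the limit until the match no longer fails with a limit error, then bisect between the last failing value and the first succeeding one. This assumes that failure is monotonic in the limit. <b>run</b> is a hypothetical callback that stands in for a call to <b>pcre[16|32]_exec()</b> with the limit set. <b>threshold_hit()</b> is a stand-in callback for trying the search without PCRE.</P>

```c
#include <limits.h>

/* Returns nonzero if a match attempt with this limit hit the limit. */
typedef int (*limit_hit_fn)(unsigned long limit, void *ctx);

/* Smallest limit (starting from 1) at which run() no longer hits the limit,
   or 0 if none is found before overflow. */
unsigned long minimum_limit(limit_hit_fn run, void *ctx)
{
  unsigned long lo = 0;   /* largest limit known (or assumed) too small */
  unsigned long hi = 1;   /* candidate that may be large enough */

  while (run(hi, ctx))    /* double until the limit is no longer hit */
    {
    if (hi > ULONG_MAX / 2) return 0;
    lo = hi;
    hi *= 2;
    }

  while (hi - lo > 1)     /* invariant: lo too small, hi large enough */
    {
    unsigned long mid = lo + (hi - lo) / 2;
    if (run(mid, ctx)) lo = mid; else hi = mid;
    }
  return hi;
}

/* Stand-in for a real match call: the limit is "hit" whenever it is below
   the threshold that ctx points to. */
int threshold_hit(unsigned long limit, void *ctx)
{
  return limit < *(const unsigned long *)ctx;
}
```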
| <P> |
<P> |
| <b>-m</b> |
<b>-m</b> |
|
Line 140 bytes for both libraries.
|
Line 189 bytes for both libraries.
|
| <P> |
<P> |
| <b>-o</b> <i>osize</i> |
<b>-o</b> <i>osize</i> |
| Set the number of elements in the output vector that is used when calling |
Set the number of elements in the output vector that is used when calling |
| <b>pcre[16]_exec()</b> or <b>pcre[16]_dfa_exec()</b> to be <i>osize</i>. The | <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> to be <i>osize</i>. The |
| default value is 45, which is enough for 14 capturing subexpressions for |
default value is 45, which is enough for 14 capturing subexpressions for |
| <b>pcre[16]_exec()</b> or 22 different matches for <b>pcre[16]_dfa_exec()</b>. | <b>pcre[16|32]_exec()</b> or 22 different matches for |
| | <b>pcre[16|32]_dfa_exec()</b>. |
| The vector size can be changed for individual matching calls by including \O |
The vector size can be changed for individual matching calls by including \O |
| in the data line (see below). |
in the data line (see below). |
| </P> |
</P> |
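<P>The figures 14 and 22 follow from how the two matching functions use the vector. For <b>pcre[16|32]_exec()</b>, only the first two thirds of the vector return offset pairs, because the remaining third is working space. The first pair describes the whole match. For <b>pcre[16|32]_dfa_exec()</b>, each pair describes one match. The following sketch shows the arithmetic; the function names are hypothetical.</P>

```c
/* Capturing subexpressions that pcre[16|32]_exec() can report for an output
   vector of osize ints: osize / 3 offset pairs, minus the whole-match pair. */
int exec_capture_capacity(int osize)
{
  int pairs = osize / 3;
  return pairs > 0 ? pairs - 1 : 0;
}

/* Matches that pcre[16|32]_dfa_exec() can report: one start/end pair each. */
int dfa_match_capacity(int osize)
{
  return osize / 2;
}
```

<P>With <i>osize</i> set to 45, these functions give 14 and 22, which agrees with the default described above. Because of the division by three, choosing a multiple of three wastes no space for <b>pcre[16|32]_exec()</b>.</P>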
|
Line 165 megabytes.
|
Line 215 megabytes.
|
| <b>-s</b> or <b>-s+</b> |
<b>-s</b> or <b>-s+</b> |
| Behave as if each pattern has the <b>/S</b> modifier; in other words, force each |
Behave as if each pattern has the <b>/S</b> modifier; in other words, force each |
| pattern to be studied. If <b>-s+</b> is used, all the JIT compile options are |
pattern to be studied. If <b>-s+</b> is used, all the JIT compile options are |
| passed to <b>pcre[16]_study()</b>, causing just-in-time optimization to be set | passed to <b>pcre[16|32]_study()</b>, causing just-in-time optimization to be set |
| up if it is available, for both full and partial matching. Specific JIT compile |
up if it is available, for both full and partial matching. Specific JIT compile |
| options can be selected by following <b>-s+</b> with a digit in the range 1 to |
options can be selected by following <b>-s+</b> with a digit in the range 1 to |
| 7, which selects the JIT compile modes as follows: |
7, which selects the JIT compile modes as follows: |
|
Line 180 options can be selected by following <b>-s+</b> with a
|
Line 230 options can be selected by following <b>-s+</b> with a
|
| If <b>-s++</b> is used instead of <b>-s+</b> (with or without a following digit), |
If <b>-s++</b> is used instead of <b>-s+</b> (with or without a following digit), |
| the text "(JIT)" is added to the first output line after a match or no match |
the text "(JIT)" is added to the first output line after a match or no match |
| when JIT-compiled code was actually used. |
when JIT-compiled code was actually used. |
| </P> | <br> |
| <P> | <br> |
| | Note that there are pattern options that can override <b>-s</b>, either |
| | specifying no studying at all, or suppressing JIT compilation. |
| | <br> |
| | <br> |
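<P>The sketch below assumes that the digit following <b>-s+</b> is a three-bit mask: 1 selects normal matching, 2 soft partial matching, and 4 hard partial matching. Under that reading, 7 (all modes) corresponds to plain <b>-s+</b>. The local constants are intended to mirror PCRE_STUDY_JIT_COMPILE, PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE, and PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE; real code should take those values from <b>pcre.h</b>. Rejecting 0, 8, and 9 is also an assumption.</P>

```c
/* Local stand-ins for the JIT study option bits (assumed mask layout). */
enum
  {
  JIT_NORMAL       = 0x1,
  JIT_PARTIAL_SOFT = 0x2,
  JIT_PARTIAL_HARD = 0x4
  };

/* Decode the character that follows "+". A non-digit means that no digit
   was given, so all three modes are selected. Digits outside 1..7 give -1. */
int jit_modes_from_char(char c)
{
  if (c < '0' || c > '9') return JIT_NORMAL | JIT_PARTIAL_SOFT | JIT_PARTIAL_HARD;
  if (c < '1' || c > '7') return -1;
  return c - '0';                /* the digit is the mask itself */
}
```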
| If the <b>/I</b> or <b>/D</b> option is present on a pattern (requesting output |
If the <b>/I</b> or <b>/D</b> option is present on a pattern (requesting output |
| about the compiled pattern), information about the result of studying is not |
about the compiled pattern), information about the result of studying is not |
| included when studying is caused only by <b>-s</b> and neither <b>-i</b> nor |
included when studying is caused only by <b>-s</b> and neither <b>-i</b> nor |
|
Line 215 to iterate 500000 times.
|
Line 269 to iterate 500000 times.
|
| This is like <b>-t</b> except that it times only the matching phase, not the |
This is like <b>-t</b> except that it times only the matching phase, not the |
| compile or study phases. |
compile or study phases. |
| </P> |
</P> |
| <br><a name="SEC4" href="#TOC1">DESCRIPTION</a><br> | <br><a name="SEC5" href="#TOC1">DESCRIPTION</a><br> |
| <P> |
<P> |
| If <b>pcretest</b> is given two filename arguments, it reads from the first and |
If <b>pcretest</b> is given two filename arguments, it reads from the first and |
| writes to the second. If it is given only one filename argument, it reads from |
writes to the second. If it is given only one filename argument, it reads from |
|
Line 272 backslash, because
|
Line 326 backslash, because
|
| is interpreted as the first line of a pattern that starts with "abc/", causing |
is interpreted as the first line of a pattern that starts with "abc/", causing |
| pcretest to read the next line as a continuation of the regular expression. |
pcretest to read the next line as a continuation of the regular expression. |
| </P> |
</P> |
| <br><a name="SEC5" href="#TOC1">PATTERN MODIFIERS</a><br> | <br><a name="SEC6" href="#TOC1">PATTERN MODIFIERS</a><br> |
| <P> |
<P> |
| A pattern may be followed by any number of modifiers, which are mostly single |
A pattern may be followed by any number of modifiers, which are mostly single |
| characters. Following Perl usage, these are referred to below as, for example, | characters, though some of these can be qualified by further characters. |
| "the <b>/i</b> modifier", even though the delimiter of the pattern need not | Following Perl usage, these are referred to below as, for example, "the |
| always be a slash, and no slash is used when writing modifiers. White space may | <b>/i</b> modifier", even though the delimiter of the pattern need not always be |
| appear between the final pattern delimiter and the first modifier, and between | a slash, and no slash is used when writing modifiers. White space may appear |
| the modifiers themselves. | between the final pattern delimiter and the first modifier, and between the |
| | modifiers themselves. For reference, here is a complete list of modifiers. They |
| | fall into several groups that are described in detail in the following |
| | sections. |
| | <pre> |
| | <b>/8</b> set UTF mode |
| | <b>/9</b> set PCRE_NEVER_UTF (locks out UTF mode) |
| | <b>/?</b> disable UTF validity check |
| | <b>/+</b> show remainder of subject after match |
| | <b>/=</b> show all captures (not just those that are set) |
| | |
| | <b>/A</b> set PCRE_ANCHORED |
| | <b>/B</b> show compiled code |
| | <b>/C</b> set PCRE_AUTO_CALLOUT |
| | <b>/D</b> same as <b>/B</b> plus <b>/I</b> |
| | <b>/E</b> set PCRE_DOLLAR_ENDONLY |
| | <b>/F</b> flip byte order in compiled pattern |
| | <b>/f</b> set PCRE_FIRSTLINE |
| | <b>/G</b> find all matches (shorten string) |
| | <b>/g</b> find all matches (use startoffset) |
| | <b>/I</b> show information about pattern |
| | <b>/i</b> set PCRE_CASELESS |
| | <b>/J</b> set PCRE_DUPNAMES |
| | <b>/K</b> show backtracking control names |
| | <b>/L</b> set locale |
| | <b>/M</b> show compiled memory size |
| | <b>/m</b> set PCRE_MULTILINE |
| | <b>/N</b> set PCRE_NO_AUTO_CAPTURE |
| | <b>/P</b> use the POSIX wrapper |
| | <b>/S</b> study the pattern after compilation |
| | <b>/s</b> set PCRE_DOTALL |
| | <b>/T</b> select character tables |
| | <b>/U</b> set PCRE_UNGREEDY |
| | <b>/W</b> set PCRE_UCP |
| | <b>/X</b> set PCRE_EXTRA |
| | <b>/x</b> set PCRE_EXTENDED |
| | <b>/Y</b> set PCRE_NO_START_OPTIMIZE |
| | <b>/Z</b> don't show lengths in <b>/B</b> output |
| | |
| | <b>/<any></b> set PCRE_NEWLINE_ANY |
| | <b>/<anycrlf></b> set PCRE_NEWLINE_ANYCRLF |
| | <b>/<cr></b> set PCRE_NEWLINE_CR |
| | <b>/<crlf></b> set PCRE_NEWLINE_CRLF |
| | <b>/<lf></b> set PCRE_NEWLINE_LF |
| | <b>/<bsr_anycrlf></b> set PCRE_BSR_ANYCRLF |
| | <b>/<bsr_unicode></b> set PCRE_BSR_UNICODE |
| | <b>/<JS></b> set PCRE_JAVASCRIPT_COMPAT |
| | |
| | </PRE> |
| </P> |
</P> |
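<P>One way to picture how these modifiers map onto compile options is as a lookup table. The sketch below is hypothetical and covers only a subset of the list. It compares single-letter modifiers case-sensitively, because, for example, <b>/i</b> and <b>/I</b> mean different things. It compares the angle-bracket modifiers without regard to case, since the documentation says their letters can be in either case.</P>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct modifier { const char *text; const char *option; };

/* A subset of the modifier list, for illustration only. */
static const struct modifier modifiers[] =
  {
  { "i", "PCRE_CASELESS" },          { "m", "PCRE_MULTILINE" },
  { "s", "PCRE_DOTALL" },            { "x", "PCRE_EXTENDED" },
  { "A", "PCRE_ANCHORED" },          { "U", "PCRE_UNGREEDY" },
  { "<cr>", "PCRE_NEWLINE_CR" },     { "<lf>", "PCRE_NEWLINE_LF" },
  { "<crlf>", "PCRE_NEWLINE_CRLF" }, { "<any>", "PCRE_NEWLINE_ANY" },
  { "<anycrlf>", "PCRE_NEWLINE_ANYCRLF" },
  { "<JS>", "PCRE_JAVASCRIPT_COMPAT" },
  };

/* Case-insensitive comparison of two NUL-terminated strings. */
static int equal_nocase(const char *a, const char *b)
{
  for (; *a != 0 && *b != 0; a++, b++)
    if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
  return *a == *b;               /* equal only if both ended together */
}

/* Returns the option name for one modifier, or NULL if it is not known. */
const char *modifier_option(const char *text)
{
  for (size_t i = 0; i < sizeof modifiers / sizeof modifiers[0]; i++)
    {
    const char *m = modifiers[i].text;
    int match = (m[0] == '<')? equal_nocase(m, text) : strcmp(m, text) == 0;
    if (match) return modifiers[i].option;
    }
  return NULL;
}
```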
| |
<br><b> |
| |
Perl-compatible modifiers |
| |
</b><br> |
| <P> |
<P> |
| The <b>/i</b>, <b>/m</b>, <b>/s</b>, and <b>/x</b> modifiers set the PCRE_CASELESS, |
The <b>/i</b>, <b>/m</b>, <b>/s</b>, and <b>/x</b> modifiers set the PCRE_CASELESS, |
| PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when |
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when |
| <b>pcre[16]_compile()</b> is called. These four modifier letters have the same | <b>pcre[16|32]_compile()</b> is called. These four modifier letters have the same |
| effect as they do in Perl. For example: |
effect as they do in Perl. For example: |
| <pre> |
<pre> |
| /caseless/i |
/caseless/i |
| </pre> | |
| | </PRE> |
| | </P> |
| | <br><b> |
| | Modifiers for other PCRE options |
| | </b><br> |
| | <P> |
| The following table shows additional modifiers for setting PCRE compile-time |
The following table shows additional modifiers for setting PCRE compile-time |
| options that do not correspond to anything in Perl: |
options that do not correspond to anything in Perl: |
| <pre> |
<pre> |
|
Line 298 options that do not correspond to anything in Perl:
|
Line 409 options that do not correspond to anything in Perl:
|
| <b>/8</b> PCRE_UTF16 ) when using the 16-bit |
<b>/8</b> PCRE_UTF16 ) when using the 16-bit |
| <b>/?</b> PCRE_NO_UTF16_CHECK ) library |
<b>/?</b> PCRE_NO_UTF16_CHECK ) library |
| |
|
| |
<b>/8</b> PCRE_UTF32 ) when using the 32-bit |
| |
<b>/?</b> PCRE_NO_UTF32_CHECK ) library |
| |
|
| |
<b>/9</b> PCRE_NEVER_UTF |
| <b>/A</b> PCRE_ANCHORED |
<b>/A</b> PCRE_ANCHORED |
| <b>/C</b> PCRE_AUTO_CALLOUT |
<b>/C</b> PCRE_AUTO_CALLOUT |
| <b>/E</b> PCRE_DOLLAR_ENDONLY |
<b>/E</b> PCRE_DOLLAR_ENDONLY |
|
Line 308 options that do not correspond to anything in Perl:
|
Line 423 options that do not correspond to anything in Perl:
|
| <b>/W</b> PCRE_UCP |
<b>/W</b> PCRE_UCP |
| <b>/X</b> PCRE_EXTRA |
<b>/X</b> PCRE_EXTRA |
| <b>/Y</b> PCRE_NO_START_OPTIMIZE |
<b>/Y</b> PCRE_NO_START_OPTIMIZE |
| <b>/<JS></b> PCRE_JAVASCRIPT_COMPAT | <b>/<any></b> PCRE_NEWLINE_ANY |
| | <b>/<anycrlf></b> PCRE_NEWLINE_ANYCRLF |
| <b>/<cr></b> PCRE_NEWLINE_CR |
<b>/<cr></b> PCRE_NEWLINE_CR |
| <b>/<lf></b> PCRE_NEWLINE_LF |
|
| <b>/<crlf></b> PCRE_NEWLINE_CRLF |
<b>/<crlf></b> PCRE_NEWLINE_CRLF |
| <b>/<anycrlf></b> PCRE_NEWLINE_ANYCRLF | <b>/<lf></b> PCRE_NEWLINE_LF |
| <b>/<any></b> PCRE_NEWLINE_ANY | |
| <b>/<bsr_anycrlf></b> PCRE_BSR_ANYCRLF |
<b>/<bsr_anycrlf></b> PCRE_BSR_ANYCRLF |
| <b>/<bsr_unicode></b> PCRE_BSR_UNICODE |
<b>/<bsr_unicode></b> PCRE_BSR_UNICODE |
| |
<b>/<JS></b> PCRE_JAVASCRIPT_COMPAT |
| </pre> |
</pre> |
| The modifiers that are enclosed in angle brackets are literal strings as shown, |
The modifiers that are enclosed in angle brackets are literal strings as shown, |
| including the angle brackets, but the letters within can be in either case. |
including the angle brackets, but the letters within can be in either case. |
|
Line 323 This example sets multiline matching with CRLF as the
|
Line 438 This example sets multiline matching with CRLF as the
|
| <pre> |
<pre> |
| /^abc/m<CRLF> |
/^abc/m<CRLF> |
| </pre> |
</pre> |
| As well as turning on the PCRE_UTF8/16 option, the <b>/8</b> modifier causes | As well as turning on the PCRE_UTF8/16/32 option, the <b>/8</b> modifier causes |
| all non-printing characters in output strings to be printed using the |
all non-printing characters in output strings to be printed using the |
| \x{hh...} notation. Otherwise, those less than 0x100 are output in hex without |
\x{hh...} notation. Otherwise, those less than 0x100 are output in hex without |
| the curly brackets. |
the curly brackets. |
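<P>This output rule can be sketched as a formatting function. Two details are assumptions: treating only ASCII 0x20 to 0x7E as printable, and using braces for values of 0x100 or more when UTF mode is off. <b>format_char()</b> is a hypothetical name, not a <b>pcretest</b> function.</P>

```c
#include <stdint.h>
#include <stdio.h>

/* Format one output character: printable ASCII as itself; otherwise
   \x{hh...} in UTF (/8) mode or for values of 0x100 and above, and \xhh
   without braces for smaller values. Returns the number of characters
   written (as reported by snprintf). */
int format_char(uint32_t c, int utf_mode, char *buf, size_t size)
{
  if (c >= 0x20 && c < 0x7f)
    return snprintf(buf, size, "%c", (int)c);
  if (utf_mode || c >= 0x100)
    return snprintf(buf, size, "\\x{%02x}", (unsigned int)c);
  return snprintf(buf, size, "\\x%02x", (unsigned int)c);
}
```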
|
Line 341 Searching for all possible matches within each subject
|
Line 456 Searching for all possible matches within each subject
|
| by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called |
by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called |
| again to search the remainder of the subject string. The difference between |
again to search the remainder of the subject string. The difference between |
| <b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to |
<b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to |
| <b>pcre[16]_exec()</b> to start searching at a new point within the entire | <b>pcre[16|32]_exec()</b> to start searching at a new point within the entire |
| string (which is in effect what Perl does), whereas the latter passes over a |
string (which is in effect what Perl does), whereas the latter passes over a |
| shortened substring. This makes a difference to the matching process if the |
shortened substring. This makes a difference to the matching process if the |
| pattern begins with a lookbehind assertion (including \b or \B). |
pattern begins with a lookbehind assertion (including \b or \B). |
| </P> |
</P> |
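<P>The difference between <b>/g</b> and <b>/G</b> can be demonstrated without PCRE by using a toy matcher for the pattern (?<=a)a. The matcher below is a hypothetical stand-in for <b>pcre[16|32]_exec()</b>. Like the real function, its lookbehind can inspect characters before the start offset, but not characters outside the string it was given.</P>

```c
#include <stddef.h>

/* Toy matcher for (?<=a)a: an "a" preceded by another "a" within the
   string passed in. Returns the match position or -1. */
static long match_a_after_a(const char *subject, size_t length, size_t start)
{
  for (size_t i = start; i < length; i++)
    if (i > 0 && subject[i] == 'a' && subject[i - 1] == 'a') return (long)i;
  return -1;
}

/* /g style: always pass the whole string and advance the start offset. */
int count_matches_g(const char *s, size_t len)
{
  int count = 0;
  size_t start = 0;
  long pos;
  while (start < len && (pos = match_a_after_a(s, len, start)) >= 0)
    {
    count++;
    start = (size_t)pos + 1;     /* each match is one character long */
    }
  return count;
}

/* /G style: pass only the remainder of the string, starting at offset 0. */
int count_matches_G(const char *s, size_t len)
{
  int count = 0;
  size_t offset = 0;
  long pos;
  while (offset < len && (pos = match_a_after_a(s + offset, len - offset, 0)) >= 0)
    {
    count++;
    offset += (size_t)pos + 1;
    }
  return count;
}
```

<P>On the subject "aaa", the /g style finds two matches (at offsets 1 and 2), but the /G style finds only one. After the first match, the shortened substring "a" leaves the lookbehind nothing to look at.</P>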
| <P> |
<P> |
| If any call to <b>pcre[16]_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches | If any call to <b>pcre[16|32]_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches |
| an empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and |
an empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and |
| PCRE_ANCHORED flags set in order to search for another, non-empty, match at the |
PCRE_ANCHORED flags set in order to search for another, non-empty, match at the |
| same point. If this second match fails, the start offset is advanced, and the |
same point. If this second match fails, the start offset is advanced, and the |
|
Line 378 modifier because /S+ and /S++ have other meanings.
|
Line 493 modifier because /S+ and /S++ have other meanings.
|
| The <b>/=</b> modifier requests that the values of all potential captured |
The <b>/=</b> modifier requests that the values of all potential captured |
| parentheses be output after a match. By default, only those up to the highest |
parentheses be output after a match. By default, only those up to the highest |
| one actually used in the match are output (corresponding to the return code |
one actually used in the match are output (corresponding to the return code |
| from <b>pcre[16]_exec()</b>). Values in the offsets vector corresponding to | from <b>pcre[16|32]_exec()</b>). Values in the offsets vector corresponding to |
| higher numbers should be set to -1, and these are output as "<unset>". This |
higher numbers should be set to -1, and these are output as "<unset>". This |
| modifier gives a way of checking that this is happening. |
modifier gives a way of checking that this is happening. |
| </P> |
</P> |
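<P>The empty-match rule for <b>/g</b> and <b>/G</b> sequences can be sketched with another toy matcher, this time for x*. Setting <b>retry</b> models a call made with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED. The function names are hypothetical. Advancing by exactly one character after a failed retry is an assumption, and may not be right for multi-character newline sequences or multi-unit UTF characters.</P>

```c
#include <stddef.h>

/* Toy matcher for x*. With retry set it acts as if PCRE_NOTEMPTY_ATSTART and
   PCRE_ANCHORED were given: only a non-empty match at start succeeds. x* can
   always match (possibly empty) at start, so no forward scan is needed. */
static int match_xstar(const char *s, size_t len, size_t start, int retry,
                       size_t *mstart, size_t *mend)
{
  size_t end = start;
  while (end < len && s[end] == 'x') end++;
  if (retry && end == start) return 0;   /* empty match not allowed here */
  *mstart = start;
  *mend = end;
  return 1;
}

/* Find all matches, recording up to max of them. Returns the total found. */
size_t find_all(const char *s, size_t len, size_t *starts, size_t *ends,
                size_t max)
{
  size_t n = 0, start = 0, ms, me;
  int retry = 0;

  for (;;)
    {
    if (match_xstar(s, len, start, retry, &ms, &me))
      {
      if (n < max) { starts[n] = ms; ends[n] = me; }
      n++;
      start = me;
      retry = (ms == me);        /* empty: look for a non-empty match here */
      }
    else if (retry)
      {
      if (start >= len) break;   /* nothing left to advance over */
      start++;                   /* advance one character, search normally */
      retry = 0;
      }
    else break;                  /* ordinary search failed: no more matches */
    }
  return n;
}
```

<P>For the subject "axb", this sketch finds four matches: an empty string at offset 0, "x" at offset 1, and empty strings at offsets 2 and 3. Perl reports the same four matches for /x*/g on that subject.</P>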
|
Line 406 below.
|
Line 521 below.
|
| <P> |
<P> |
| The <b>/I</b> modifier requests that <b>pcretest</b> output information about the |
The <b>/I</b> modifier requests that <b>pcretest</b> output information about the |
| compiled pattern (whether it is anchored, has a fixed first character, and |
compiled pattern (whether it is anchored, has a fixed first character, and |
| so on). It does this by calling <b>pcre[16]_fullinfo()</b> after compiling a | so on). It does this by calling <b>pcre[16|32]_fullinfo()</b> after compiling a |
| pattern. If the pattern is studied, the results of that are also output. |
pattern. If the pattern is studied, the results of that are also output. |
| </P> |
</P> |
| <P> |
<P> |
| The <b>/K</b> modifier requests <b>pcretest</b> to show names from backtracking |
The <b>/K</b> modifier requests <b>pcretest</b> to show names from backtracking |
| control verbs that are returned from calls to <b>pcre[16]_exec()</b>. It causes | control verbs that are returned from calls to <b>pcre[16|32]_exec()</b>. It causes |
| <b>pcretest</b> to create a <b>pcre[16]_extra</b> block if one has not already | <b>pcretest</b> to create a <b>pcre[16|32]_extra</b> block if one has not already |
| been created by a call to <b>pcre[16]_study()</b>, and to set the | been created by a call to <b>pcre[16|32]_study()</b>, and to set the |
| PCRE_EXTRA_MARK flag and the <b>mark</b> field within it, every time that |
PCRE_EXTRA_MARK flag and the <b>mark</b> field within it, every time that |
| <b>pcre[16]_exec()</b> is called. If the variable that the <b>mark</b> field | <b>pcre[16|32]_exec()</b> is called. If the variable that the <b>mark</b> field |
| points to is non-NULL for a match, non-match, or partial match, <b>pcretest</b> |
points to is non-NULL for a match, non-match, or partial match, <b>pcretest</b> |
| prints the string to which it points. For a match, this is shown on a line by |
prints the string to which it points. For a match, this is shown on a line by |
| itself, tagged with "MK:". For a non-match it is added to the message. |
itself, tagged with "MK:". For a non-match it is added to the message. |
|
Line 427 example,
|
Line 542 example,
|
| /pattern/Lfr_FR |
/pattern/Lfr_FR |
| </pre> |
</pre> |
| For this reason, it must be the last modifier. The given locale is set, |
For this reason, it must be the last modifier. The given locale is set, |
| <b>pcre[16]_maketables()</b> is called to build a set of character tables for | <b>pcre[16|32]_maketables()</b> is called to build a set of character tables for |
| the locale, and this is then passed to <b>pcre[16]_compile()</b> when compiling | the locale, and this is then passed to <b>pcre[16|32]_compile()</b> when compiling |
| the regular expression. Without an <b>/L</b> (or <b>/T</b>) modifier, NULL is |
the regular expression. Without an <b>/L</b> (or <b>/T</b>) modifier, NULL is |
| passed as the tables pointer; that is, <b>/L</b> applies only to the expression |
passed as the tables pointer; that is, <b>/L</b> applies only to the expression |
| on which it appears. |
on which it appears. |
|
Line 436 on which it appears.
|
Line 551 on which it appears.
|
| <P> |
<P> |
| The <b>/M</b> modifier causes the size in bytes of the memory block used to hold |
The <b>/M</b> modifier causes the size in bytes of the memory block used to hold |
| the compiled pattern to be output. This does not include the size of the |
the compiled pattern to be output. This does not include the size of the |
| <b>pcre[16]</b> block; it is just the actual compiled data. If the pattern is | <b>pcre[16|32]</b> block; it is just the actual compiled data. If the pattern is |
| successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the |
successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the |
| JIT compiled code is also output. |
JIT compiled code is also output. |
| </P> |
</P> |
| <P> |
<P> |
| If the <b>/S</b> modifier appears once, it causes <b>pcre[16]_study()</b> to be | The <b>/S</b> modifier causes <b>pcre[16|32]_study()</b> to be called after the |
| called after the expression has been compiled, and the results used when the | expression has been compiled, and the results used when the expression is |
| expression is matched. If <b>/S</b> appears twice, it suppresses studying, even | matched. There are a number of qualifying characters that may follow <b>/S</b>. |
| | They may appear in any order. |
| | </P> |
| | <P> |
| | If <b>/S</b> is followed by an exclamation mark, <b>pcre[16|32]_study()</b> is called |
| | with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a |
| | <b>pcre[16|32]_extra</b> block, even when studying discovers no useful information. |
| | </P> |
| | <P> |
| | If <b>/S</b> is followed by a second S character, it suppresses studying, even |
| if it was requested externally by the <b>-s</b> command line option. This makes |
if it was requested externally by the <b>-s</b> command line option. This makes |
| it possible to specify that certain patterns are always studied, and others are |
it possible to specify that certain patterns are always studied, and others are |
| never studied, independently of <b>-s</b>. This feature is used in the test |
never studied, independently of <b>-s</b>. This feature is used in the test |
| files in a few cases where the output is different when the pattern is studied. |
files in a few cases where the output is different when the pattern is studied. |
| </P> |
</P> |
| <P> |
<P> |
| If the <b>/S</b> modifier is immediately followed by a + character, the call to | If the <b>/S</b> modifier is followed by a + character, the call to |
| <b>pcre[16]_study()</b> is made with all the JIT study options, requesting | <b>pcre[16|32]_study()</b> is made with all the JIT study options, requesting |
| just-in-time optimization support if it is available, for both normal and |
just-in-time optimization support if it is available, for both normal and |
| partial matching. If you want to restrict the JIT compiling modes, you can |
partial matching. If you want to restrict the JIT compiling modes, you can |
| follow <b>/S+</b> with a digit in the range 1 to 7: |
follow <b>/S+</b> with a digit in the range 1 to 7: |
|
Line 473 immediately after <b>/S</b> or <b>/S+</b> because this
|
Line 597 immediately after <b>/S</b> or <b>/S+</b> because this
|
</P>
<P>
If JIT studying is successful, the compiled JIT code will automatically be used
when <b>pcre[16|32]_exec()</b> is run, except when incompatible run-time options
are specified. For more details, see the
<a href="pcrejit.html"><b>pcrejit</b></a>
documentation. See also the <b>\J</b> escape sequence below for a way of
setting the size of the JIT stack.
</P>
<P>
Finally, if <b>/S</b> is followed by a minus character, JIT compilation is
suppressed, even if it was requested externally by the <b>-s</b> command line
option. This makes it possible to specify that JIT is never to be used for
certain patterns.
</P>
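<P>
The qualifiers above can be combined after a single <b>/S</b>. As an illustration only, the following Python sketch folds a qualifier string into study settings; it is not pcretest's own parser. The flag values are assumed to match those in <b>pcre.h</b>, and treating the digit after <b>+</b> as a bitmask of the three JIT modes (normal, soft partial, hard partial) is likewise an assumption.
</P>

```python
# Sketch: fold the characters that follow /S into study settings.
# Flag values are assumed to match pcre.h; the parser itself is hypothetical.
PCRE_STUDY_JIT_COMPILE = 0x0001
PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE = 0x0002
PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE = 0x0004
PCRE_STUDY_EXTRA_NEEDED = 0x0008

ALL_JIT_MODES = (PCRE_STUDY_JIT_COMPILE
                 | PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
                 | PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE)


def parse_study_qualifiers(quals: str) -> tuple[bool, int, bool]:
    """Return (study, options, suppress_jit) for the text after /S."""
    study, options, suppress_jit = True, 0, False
    i = 0
    while i < len(quals):
        c = quals[i]
        if c == "S":
            # /SS: never study, even if -s was given.
            study = False
        elif c == "!":
            # /S!: always get a pcre_extra block back.
            options |= PCRE_STUDY_EXTRA_NEEDED
        elif c == "+":
            # /S+: request JIT; an optional digit 1-7 restricts the modes.
            if i + 1 < len(quals) and quals[i + 1] in "1234567":
                options |= int(quals[i + 1])  # assumed bitmask of the modes
                i += 1
            else:
                options |= ALL_JIT_MODES
        elif c == "-":
            # /S-: never JIT-compile, even if -s+ was given.
            suppress_jit = True
        else:
            raise ValueError(f"unknown /S qualifier {c!r}")
        i += 1
    return study, options, suppress_jit
```

<P>
Under these assumptions, <b>/S!+1</b> would ask for a <b>pcre_extra</b> block in every case and JIT code for normal matching only.
</P>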
<P>
The <b>/T</b> modifier must be followed by a single digit. It causes a specific
set of built-in character tables to be passed to <b>pcre[16|32]_compile()</b>. It
is used in the standard PCRE tests to check behaviour with different character
tables. The digit specifies the tables as follows:
<pre>
Line 642 function:
The <b>/+</b> modifier works as described above. All other modifiers are
ignored.
</P>
<br><a name="SEC7" href="#TOC1">DATA LINES</a><br>
<P>
Before each data line is passed to <b>pcre[16|32]_exec()</b>, leading and trailing
white space is removed, and it is then scanned for \ escapes. Some of these
are pretty esoteric features, intended for checking out some of the more
complicated features of PCRE. If you are just testing "ordinary" regular
Line 661 recognized:
  \t         tab (\x09)
  \v         vertical tab (\x0b)
  \nnn       octal character (up to 3 octal digits); always
             a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
  \xhh       hexadecimal byte (up to 2 hex digits)
  \x{hh...}  hexadecimal character (any number of hex digits)
  \A         pass the PCRE_ANCHORED option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
  \B         pass the PCRE_NOTBOL option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
  \Cdd       call pcre[16|32]_copy_substring() for substring dd after a successful match (number less than 32)
  \Cname     call pcre[16|32]_copy_named_substring() for substring "name" after a successful match (name
             terminated by next non-alphanumeric character)
  \C+        show the current captured substrings at callout time
  \C-        do not supply a callout function
  \C!n       return 1 instead of 0 when callout number n is reached
  \C!n!m     return 1 instead of 0 when callout number n is reached for the mth time
  \C*n       pass the number n (may be negative) as callout data; this is used as the callout return value
  \D         use the <b>pcre[16|32]_dfa_exec()</b> match function
  \F         only shortest match for <b>pcre[16|32]_dfa_exec()</b>
  \Gdd       call pcre[16|32]_get_substring() for substring dd after a successful match (number less than 32)
  \Gname     call pcre[16|32]_get_named_substring() for substring "name" after a successful match (name
             terminated by next non-alphanumeric character)
  \Jdd       set up a JIT stack of dd kilobytes maximum (any number of digits)
  \L         call pcre[16|32]_get_substringlist() after a successful match
  \M         discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
  \N         pass the PCRE_NOTEMPTY option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>; if used twice, pass the
             PCRE_NOTEMPTY_ATSTART option
  \Odd       set the size of the output vector passed to <b>pcre[16|32]_exec()</b> to dd (any number of digits)
  \P         pass the PCRE_PARTIAL_SOFT option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>; if used twice, pass the
             PCRE_PARTIAL_HARD option
  \Qdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits)
  \R         pass the PCRE_DFA_RESTART option to <b>pcre[16|32]_dfa_exec()</b>
  \S         output details of memory get/free calls during matching
  \Y         pass the PCRE_NO_START_OPTIMIZE option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
  \Z         pass the PCRE_NOTEOL option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
  \?         pass the PCRE_NO_UTF[8|16|32]_CHECK option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
  \>dd       start the match at offset dd (optional "-"; then any number of digits); this sets the <i>startoffset</i>
             argument for <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
  \<cr>      pass the PCRE_NEWLINE_CR option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
  \<lf>      pass the PCRE_NEWLINE_LF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
  \<crlf>    pass the PCRE_NEWLINE_CRLF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
  \<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
  \<any>     pass the PCRE_NEWLINE_ANY option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
</pre>
The use of \x{hh...} is not dependent on the use of the <b>/8</b> modifier on
the pattern. It is recognized always. There may be any number of hexadecimal
Line 718 In UTF-16 mode, all 4-digit \x{hhhh} values are accept
possible to construct invalid UTF-16 sequences for testing purposes.
</P>
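<P>
To show how the escapes in the table above turn a data line into characters, here is a simplified Python model that handles only \t, \v, \nnn, \xhh and \x{hh...} and rejects everything else. It is a sketch, not pcretest's code; in particular, treating a bare \x with no hex digits as character 0 is an assumption.
</P>

```python
OCTAL = "01234567"
HEX = "0123456789abcdefABCDEF"


def decode_data_escapes(line: str) -> list[int]:
    # Model of a subset of pcretest data-line escapes: \t, \v, \nnn (octal,
    # up to 3 digits), \xhh (up to 2 hex digits) and \x{hh...}.
    s = line.strip()  # leading and trailing white space is removed first
    out: list[int] = []
    i = 0
    while i < len(s):
        if s[i] != "\\":
            out.append(ord(s[i]))
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(s):
            raise ValueError("backslash at end of line")
        e = s[i]
        if e == "t":
            out.append(0x09)
            i += 1
        elif e == "v":
            out.append(0x0B)
            i += 1
        elif e in OCTAL:
            j = i
            while j < len(s) and j - i < 3 and s[j] in OCTAL:
                j += 1
            out.append(int(s[i:j], 8))
            i = j
        elif e == "x" and s[i + 1:i + 2] == "{":
            end = s.index("}", i + 2)  # ValueError if the brace is unclosed
            out.append(int(s[i + 2:end], 16))
            i = end + 1
        elif e == "x":
            j = i + 1
            while j < len(s) and j - (i + 1) < 2 and s[j] in HEX:
                j += 1
            # A bare \x (no hex digits) yields 0 in this sketch.
            out.append(int(s[i + 1:j], 16) if j > i + 1 else 0)
            i = j
        else:
            raise ValueError(f"escape \\{e} is not modelled here")
    return out
```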
<P>
In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This makes it
possible to construct invalid UTF-32 sequences for testing purposes.
</P>
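<P>
Since these escapes can produce code units that are not valid UTF-16 or UTF-32, it may help to see the validity rules they break. The Python sketch below checks for unpaired surrogates in UTF-16 and for surrogates or values above 0x10FFFF in UTF-32. The function names are hypothetical and are not part of PCRE.
</P>

```python
def valid_utf16(units: list[int]) -> bool:
    """True if the 16-bit code units form well-formed UTF-16."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            return False  # a low surrogate on its own
        else:
            i += 1
    return True


def valid_utf32(units: list[int]) -> bool:
    """True if every 32-bit code unit is a Unicode scalar value."""
    return all(u <= 0x10FFFF and not 0xD800 <= u <= 0xDFFF for u in units)
```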
<P>
The escapes that specify line ending sequences are literal strings, exactly as
shown. No more than one newline setting should be present in any data line.
</P>
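<P>
A minimal Python sketch of the newline escapes, assuming the option names listed in the data-line table: it finds the literal escape strings in a line and maps them to option names. Rejecting a second setting with an error is this sketch's own choice; the text above says only that more than one should not be present.
</P>

```python
import re

# Newline escapes from the data-line table, mapped to option names.
NEWLINE_OPTIONS = {
    "cr": "PCRE_NEWLINE_CR",
    "lf": "PCRE_NEWLINE_LF",
    "crlf": "PCRE_NEWLINE_CRLF",
    "anycrlf": "PCRE_NEWLINE_ANYCRLF",
    "any": "PCRE_NEWLINE_ANY",
}
NEWLINE_ESCAPE = re.compile(r"\\<(cr|lf|crlf|anycrlf|any)>")


def newline_option(line: str):
    """Return the newline option named in a data line, or None."""
    found = NEWLINE_ESCAPE.findall(line)
    if len(found) > 1:
        raise ValueError("more than one newline setting in a data line")
    return NEWLINE_OPTIONS[found[0]] if found else None
```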
Line 738 is not being used. Providing a stack that is larger th
necessary only for very complicated patterns.
</P>
<P>
If \M is present, <b>pcretest</b> calls <b>pcre[16|32]_exec()</b> several times,
with different values in the <i>match_limit</i> and <i>match_limit_recursion</i>
fields of the <b>pcre[16|32]_extra</b> data structure, until it finds the minimum
numbers for each parameter that allow <b>pcre[16|32]_exec()</b> to complete without
error. Because this is testing a specific feature of the normal interpretive
<b>pcre[16|32]_exec()</b> execution, the use of any JIT optimization that might
have been set up by the <b>/S+</b> qualifier or <b>-s+</b> option is disabled.
</P>
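<P>
The \M search can be pictured as finding the smallest limit at which a "does the match complete?" test first succeeds. The Python sketch below does this by doubling and then bisecting, assuming the test is monotonic in the limit. The predicate stands in for a call to <b>pcre[16|32]_exec()</b>, and the starting value of 64 and the overall strategy are assumptions rather than a description of pcretest's code.
</P>

```python
from typing import Callable


def minimum_limit(completes: Callable[[int], bool], start: int = 64) -> int:
    """Smallest limit n >= 1 for which completes(n) is True.

    Assumes completes() is monotonic (once True, it stays True for larger
    limits) and that some limit eventually succeeds.
    """
    hi = start
    while not completes(hi):  # grow until the limit is large enough
        hi *= 2
    # lo is 0 (nothing tested below start) or the last value that failed.
    lo = hi // 2 if hi > start else 0
    while hi - lo > 1:  # invariant: completes(hi) is True
        mid = (lo + hi) // 2
        if completes(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

<P>
As the text says, a minimum is found for each of the two parameters; the sketch handles one limit at a time.
</P>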
<P>
Line 758 needed to complete the match attempt.
<P>
When \O is used, the value specified may be higher or lower than the size set
by the <b>-O</b> command line option (or defaulted to 45); \O applies only to
the call of <b>pcre[16|32]_exec()</b> for the line in which it appears.
</P>
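<P>
The effect of \O on what can be returned follows from the rules of the PCRE API: only the first two-thirds of the output vector hold offset pairs, so a vector of <i>n</i> elements has room for <i>n</i>/3 (rounded down) substrings, including substring 0, and <b>pcre[16|32]_exec()</b> returns zero when there is not enough room for all of them. The Python sketch below models that rule and the precedence of \O over <b>-O</b> over the default of 45; the function names are hypothetical.
</P>

```python
DEFAULT_OVECSIZE = 45  # used when neither \O nor -O is given


def effective_ovecsize(line_O=None, cmdline_O=None) -> int:
    """The per-line \\O value wins; otherwise -O; otherwise the default."""
    if line_O is not None:
        return line_O
    if cmdline_O is not None:
        return cmdline_O
    return DEFAULT_OVECSIZE


def exec_return_value(highest_set: int, ovecsize: int) -> int:
    """Model of a successful match's return value: one more than the
    highest-numbered substring that was set, or 0 if the offset pairs
    (two-thirds of the vector) cannot hold them all."""
    pairs = ovecsize // 3
    needed = highest_set + 1
    return needed if needed <= pairs else 0
```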
<P>
If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper
Line 766 API to be used, the only option-setting sequences that
\N, and \Z, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively,
to be passed to <b>regexec()</b>.
</P>
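<P>
Under <b>/P</b>, the data-line handling reduces to a small mapping. This Python sketch (hypothetical names) keeps only the three escapes named above and ignores the rest.
</P>

```python
# Data-line escapes that still matter under /P, mapped to regexec() flags.
POSIX_EXEC_FLAGS = {"B": "REG_NOTBOL", "N": "REG_NOTEMPTY", "Z": "REG_NOTEOL"}


def posix_exec_flags(escape_letters) -> list[str]:
    """Flags passed to regexec(); other option escapes have no effect."""
    return sorted({POSIX_EXEC_FLAGS[c] for c in escape_letters
                   if c in POSIX_EXEC_FLAGS})
```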
<br><a name="SEC8" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
<P>
By default, <b>pcretest</b> uses the standard PCRE matching function,
<b>pcre[16|32]_exec()</b>, to match each data line. PCRE also supports an
alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, which operates in a
different way, and has some restrictions. The differences between the two
functions are described in the
<a href="pcrematching.html"><b>pcrematching</b></a>
Line 783 This function finds all possible matches at a given po
escape sequence is present in the data line, it stops after the first match is
found. This is always the shortest possible match.
</P>
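<P>
A toy Python model can make this behaviour concrete. It treats the pattern as a set of literal alternatives rather than a regular expression, finds the first subject position at which any of them matches, and returns every match there, longest first, or just the shortest when \F is in effect. The function is hypothetical and is not how <b>pcre[16|32]_dfa_exec()</b> works internally.
</P>

```python
def dfa_style_matches(alternatives: list[str], subject: str,
                      shortest_only: bool = False) -> list[str]:
    """All matches at the first matching start position, longest first."""
    for start in range(len(subject) + 1):
        found = sorted({a for a in alternatives if subject.startswith(a, start)},
                       key=len, reverse=True)
        if found:
            # With \F, stop at the first match, which is the shortest one.
            return found[-1:] if shortest_only else found
    return []
```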
<br><a name="SEC9" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
<P>
This section describes the output when the normal matching function,
<b>pcre[16|32]_exec()</b>, is being used.
</P>
<P>
When a match succeeds, <b>pcretest</b> outputs the list of captured substrings
that <b>pcre[16|32]_exec()</b> returns, starting with number 0 for the string that
matched the whole pattern. Otherwise, it outputs "No match" when the return is
PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching
substring when <b>pcre[16|32]_exec()</b> returns PCRE_ERROR_PARTIAL. (Note that
this is the entire substring that was inspected during the partial match; it
may include characters before the actual match start if a lookbehind assertion,
\K, \b, or \B was involved.) For any other return, <b>pcretest</b> outputs
Line 813 at least two. Here is an example of an interactive <b>
No match
</pre>
Unset capturing substrings that are not followed by one that is set are not
returned by <b>pcre[16|32]_exec()</b>, and are not shown by <b>pcretest</b>. In the
following example, there are two capturing substrings, but when the first data
line is matched, the second, unset substring is not shown. An "internal" unset
substring is shown as "<unset>", as for the second data line.
Line 874 prompt is used for continuations), data lines may not.
included in data by means of the \n escape (or \r, \r\n, etc., depending on
the newline sequence setting).
</P>
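<P>
The rule for unset substrings given earlier in this section can be sketched in Python as follows. The input lists a match's substrings with None for unset ones; trailing unset substrings are dropped and internal ones are shown as "<unset>". The two-column number layout imitates pcretest's output and is an assumption.
</P>

```python
from typing import Optional


def format_substrings(groups: list[Optional[str]]) -> list[str]:
    """groups[0] is the whole match; None marks an unset substring."""
    # Trailing unset substrings are not returned, so they are not shown.
    last = max((i for i, g in enumerate(groups) if g is not None), default=-1)
    return [f"{i:2d}: {'<unset>' if g is None else g}"
            for i, g in enumerate(groups[:last + 1])]
```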
<br><a name="SEC10" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
<P>
When the alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, is used (by
means of the \D escape sequence or the <b>-dfa</b> command line option), the
output consists of a list of all the matches that start at the first point in
the subject where there is at least one match. For example:
Line 910 at the end of the longest match. For example:
Since the matching function does not support substring capture, the escape
sequences that are concerned with captured substrings are not relevant.
</P>
<br><a name="SEC11" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
<P>
When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
indicating that the subject partially matched the pattern, you can restart the
Line 927 For further information about partial matching, see th
<a href="pcrepartial.html"><b>pcrepartial</b></a>
documentation.
</P>
<br><a name="SEC12" href="#TOC1">CALLOUTS</a><br>
<P>
If the pattern contains any callout requests, <b>pcretest</b>'s callout function
is called during matching. This works with both matching functions. By default,
Line 988 the
<a href="pcrecallout.html"><b>pcrecallout</b></a>
documentation.
</P>
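<P>
In the PCRE callout interface, a return of 0 lets matching continue, a positive value makes matching fail at the current point (other possibilities are still tried), and a negative value abandons the match and is returned by the matching function. The hypothetical Python class below models how the \C!n, \C!n!m and \C*n escapes in the data-line table choose that return value; the precedence of \C*n and the choice to keep returning 1 after the <i>m</i>th call are assumptions.
</P>

```python
class CalloutModel:
    """Hypothetical model of the return values set by \\C!n!m and \\C*n."""

    def __init__(self, fail_id=None, fail_count=1, data=0):
        self.fail_id = fail_id        # n in \C!n or \C!n!m
        self.fail_count = fail_count  # m in \C!n!m (1 for plain \C!n)
        self.data = data              # n in \C*n; 0 means "not set"
        self.seen = 0                 # times callout fail_id has been reached

    def __call__(self, callout_number: int) -> int:
        if self.data != 0:
            return self.data          # callout data is used as the return value
        if callout_number != self.fail_id:
            return 0                  # 0 lets matching continue
        self.seen += 1
        return 1 if self.seen >= self.fail_count else 0
```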
<br><a name="SEC13" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
<P>
When <b>pcretest</b> is outputting text in the compiled version of a pattern,
bytes other than 32-126 are always treated as non-printing characters and are
Line 1000 string, it behaves in the same way, unless a different
the pattern (using the <b>/L</b> modifier). In this case, the <b>isprint()</b>
function is used to distinguish printing and non-printing characters.
</P>
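<P>
The default printing rule can be sketched as a small Python function: bytes 32-126 are shown as themselves and everything else as a hex escape. The \x{hh} output form is an assumption, and the optional <i>is_printing</i> argument merely stands in for <b>isprint()</b> under a locale set by <b>/L</b>.
</P>

```python
from typing import Callable


def ascii_printing(b: int) -> bool:
    # Default rule: only bytes 32-126 count as printing characters.
    return 32 <= b <= 126


def show_bytes(data: bytes,
               is_printing: Callable[[int], bool] = ascii_printing) -> str:
    """Render bytes, escaping the ones treated as non-printing."""
    return "".join(chr(b) if is_printing(b) else f"\\x{{{b:02x}}}"
                   for b in data)
```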
<br><a name="SEC14" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
<P>
The facilities described in this section are not available when the POSIX
interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is
Line 1073 string using a reloaded pattern is likely to cause <b>
Finally, if you attempt to load a file that is not in the correct format, the
result is undefined.
</P>
<br><a name="SEC15" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcre</b>(3), <b>pcre16</b>(3), <b>pcre32</b>(3), <b>pcreapi</b>(3),
<b>pcrecallout</b>(3), <b>pcrejit</b>(3), <b>pcrematching</b>(3),
<b>pcrepartial</b>(3), <b>pcrepattern</b>(3), <b>pcreprecompile</b>(3).
</P>
<br><a name="SEC16" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC17" href="#TOC1">REVISION</a><br>
<P>
Last updated: 26 April 2013
<br>
Copyright © 1997-2013 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.