    1: .TH PCRE 3 "13 May 2013" "PCRE 8.33"
    2: .SH NAME
    3: PCRE - Perl-compatible regular expressions
    4: .SH INTRODUCTION
    5: .rs
    6: .sp
    7: The PCRE library is a set of functions that implement regular expression
    8: pattern matching using the same syntax and semantics as Perl, with just a few
    9: differences. Some features that appeared in Python and PCRE before they
   10: appeared in Perl are also available using the Python syntax, there is some
   11: support for one or two .NET and Oniguruma syntax items, and there is an option
   12: for requesting some minor changes that give better JavaScript compatibility.
   13: .P
   14: Starting with release 8.30, it is possible to compile two separate PCRE
   15: libraries: the original, which supports 8-bit character strings (including
   16: UTF-8 strings), and a second library that supports 16-bit character strings
   17: (including UTF-16 strings). The build process allows either one or both to be
   18: built. The majority of the work to make this possible was done by Zoltan
   19: Herczeg.
   20: .P
   21: Starting with release 8.32 it is possible to compile a third separate PCRE
   22: library that supports 32-bit character strings (including UTF-32 strings). The
   23: build process allows any combination of the 8-, 16- and 32-bit libraries. The
   24: work to make this possible was done by Christian Persch.
   25: .P
   26: The three libraries contain identical sets of functions, except that the names
   27: in the 16-bit library start with \fBpcre16_\fP instead of \fBpcre_\fP, and the
   28: names in the 32-bit library start with \fBpcre32_\fP instead of \fBpcre_\fP. To
   29: avoid over-complication and reduce the documentation maintenance load, most of
   30: the documentation describes the 8-bit library, with the differences for the
   31: 16-bit and 32-bit libraries described separately in the
   32: .\" HREF
   33: \fBpcre16\fP
   34: and
   35: .\" HREF
   36: \fBpcre32\fP
   37: .\"
   38: pages. References to functions or structures of the form \fIpcre[16|32]_xxx\fP
   39: should be read as meaning "\fIpcre_xxx\fP when using the 8-bit library,
   40: \fIpcre16_xxx\fP when using the 16-bit library, or \fIpcre32_xxx\fP when using
   41: the 32-bit library".
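.P
As a small illustration of this naming convention, the following C sketch
expands a generic reference into the name used by a given library. The helper
name pcre_name_for_width is hypothetical and is not part of PCRE; it only
mirrors the prefix rule described above.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not a PCRE function): writes the library-specific
   name for a generic "pcre[16|32]_xxx" reference into out.
   width is 8, 16 or 32; suffix is the "xxx" part, for example "compile".
   Returns 0 on success, -1 for an unsupported width or a too-small buffer. */
int pcre_name_for_width(int width, const char *suffix,
                        char *out, size_t outsize)
{
    const char *prefix;
    int n;

    switch (width) {
    case 8:  prefix = "pcre_";   break;  /* original 8-bit library */
    case 16: prefix = "pcre16_"; break;  /* 16-bit library */
    case 32: prefix = "pcre32_"; break;  /* 32-bit library */
    default: return -1;
    }

    n = snprintf(out, outsize, "%s%s", prefix, suffix);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

For example, width 16 with the suffix "compile" yields "pcre16_compile".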
   42: .P
   43: The current implementation of PCRE corresponds approximately with Perl 5.12,
   44: including support for UTF-8/16/32 encoded strings and Unicode general category
   45: properties. However, UTF-8/16/32 and Unicode support has to be explicitly
   46: enabled; it is not the default. The Unicode tables correspond to Unicode
   47: release 6.2.0.
   48: .P
   49: In addition to the Perl-compatible matching function, PCRE contains an
   50: alternative function that matches the same compiled patterns in a different
   51: way. In certain circumstances, the alternative function has some advantages.
   52: For a discussion of the two matching algorithms, see the
   53: .\" HREF
   54: \fBpcrematching\fP
   55: .\"
   56: page.
   57: .P
   58: PCRE is written in C and released as a C library. A number of people have
   59: written wrappers and interfaces of various kinds. In particular, Google Inc.
   60: have provided a comprehensive C++ wrapper for the 8-bit library. This is now
   61: included as part of the PCRE distribution. The
   62: .\" HREF
   63: \fBpcrecpp\fP
   64: .\"
   65: page has details of this interface. Other people's contributions can be found
   66: in the \fIContrib\fP directory at the primary FTP site, which is:
   67: .sp
   68: .\" HTML <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">
   69: .\" </a>
   70: ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre
   71: .\"
   72: .P
   73: Details of exactly which Perl regular expression features are and are not
   74: supported by PCRE are given in separate documents. See the
   75: .\" HREF
   76: \fBpcrepattern\fP
   77: .\"
   78: and
   79: .\" HREF
   80: \fBpcrecompat\fP
   81: .\"
   82: pages. There is a syntax summary in the
   83: .\" HREF
   84: \fBpcresyntax\fP
   85: .\"
   86: page.
   87: .P
   88: Some features of PCRE can be included, excluded, or changed when the library is
   89: built. The
   90: .\" HREF
   91: \fBpcre_config()\fP
   92: .\"
   93: function makes it possible for a client to discover which features are
   94: available. The features themselves are described in the
   95: .\" HREF
   96: \fBpcrebuild\fP
   97: .\"
   98: page. Documentation about building PCRE for various operating systems can be
   99: found in the
  100: .\" HTML <a href="README.txt">
  101: .\" </a>
  102: \fBREADME\fP
  103: .\"
  104: and
  105: .\" HTML <a href="NON-AUTOTOOLS-BUILD.txt">
  106: .\" </a>
  107: \fBNON-AUTOTOOLS-BUILD\fP
  108: .\"
  109: files in the source distribution.
  110: .P
  111: The libraries contain a number of undocumented internal functions and data
  112: tables that are used by more than one of the exported external functions, but
  113: which are not intended for use by external callers. Their names all begin with
  114: "_pcre_" or "_pcre16_" or "_pcre32_", which hopefully will not provoke any name
  115: clashes. In some environments, it is possible to control which external symbols
  116: are exported when a shared library is built, and in these cases the
  117: undocumented symbols are not exported.
  118: .
  119: .
  120: .SH "SECURITY CONSIDERATIONS"
  121: .rs
  122: .sp
  123: If you are using PCRE in a non-UTF application that permits users to supply
  124: arbitrary patterns for compilation, you should be aware of a feature that
  125: allows users to turn on UTF support from within a pattern, provided that PCRE
  126: was built with UTF support. For example, an 8-bit pattern that begins with
  127: "(*UTF8)" or "(*UTF)" turns on UTF-8 mode, which interprets patterns and
  128: subjects as strings of UTF-8 characters instead of individual 8-bit characters.
  129: This causes both the pattern and any data against which it is matched to be
  130: checked for UTF-8 validity. If the data string is very long, such a check might
  131: consume enough resources to degrade your application's
  132: performance.
  133: .P
  134: One way of guarding against this possibility is to use the
  135: \fBpcre_fullinfo()\fP function to check the compiled pattern's options for UTF.
  136: Alternatively, from release 8.33, you can set the PCRE_NEVER_UTF option at
  137: compile time. This causes a compile-time error if a pattern contains a
  138: UTF-setting sequence.
  139: .P
  140: If your application is one that supports UTF, be aware that validity checking
  141: can take time. If the same data string is to be matched many times, you can use
  142: the PCRE_NO_UTF[8|16|32]_CHECK option for the second and subsequent matches to
  143: save redundant checks.
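.P
To give a feel for why validity checking takes time proportional to the length
of the data, here is a minimal C sketch of a UTF-8 validity check that follows
the RFC 3629 rules. It is an illustration only, not PCRE's own checking code,
and the function name utf8_is_valid is an assumption made for this example.
Every byte of the subject has to be visited once, which is the work that can
be skipped on repeated matches of the same string.

```c
#include <stddef.h>

/* Illustrative only: returns 1 if buf[0..len) is well-formed UTF-8
   according to RFC 3629, otherwise 0. One pass over every byte. */
int utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra, k;      /* continuation bytes expected, loop index */
        unsigned long cp;     /* decoded code point */
        unsigned long min;    /* smallest code point allowed at this length */

        if (c < 0x80) { i++; continue; }              /* ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;        /* stray continuation or invalid lead byte */

        if (len - i <= extra) return 0;               /* truncated sequence */

        for (k = 1; k <= extra; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return 0;        /* not 10xxxxxx */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min) return 0;                       /* overlong encoding */
        if (cp > 0x10FFFF) return 0;                  /* beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* UTF-16 surrogate */

        i += extra + 1;
    }
    return 1;
}
```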
  144: .P
  145: Another way that performance can suffer is by running a pattern that has a very
  146: large search tree against a string that will never match. Nested unlimited
  147: repeats in a pattern are a common example. PCRE provides some protection
  148: against this: see the PCRE_EXTRA_MATCH_LIMIT feature in the
  149: .\" HREF
  150: \fBpcreapi\fP
  151: .\"
  152: page.
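.P
The effect of nested unlimited repeats, and of a limit on the amount of
matching work, can be sketched with a toy backtracking matcher. The C code
below is not PCRE's algorithm and uses no PCRE functions; it hard-codes the
single pattern ^(a+)+$ and counts its own recursive calls. All names in it
(toy_match, toy_group, the TOY_ constants) are assumptions made for this
example. The call limit plays a role loosely analogous to PCRE's match limit.

```c
#include <stddef.h>

#define TOY_MATCH       1
#define TOY_NOMATCH     0
#define TOY_LIMIT_HIT (-1)

/* Tries one more iteration of the group (a+) starting at s[pos], then either
   accepts at end of subject or recurses for a further iteration.
   *calls counts invocations; the search is abandoned once it exceeds limit. */
static int toy_group(const char *s, size_t pos,
                     unsigned long *calls, unsigned long limit)
{
    size_t run = 0, take;

    if (++*calls > limit)
        return TOY_LIMIT_HIT;

    while (s[pos + run] == 'a')          /* longest run of 'a' available */
        run++;

    /* Greedy first, then backtrack by giving up one 'a' at a time. */
    for (take = run; take >= 1; take--) {
        size_t next = pos + take;
        int r;

        if (s[next] == '\0')
            return TOY_MATCH;            /* whole subject consumed */
        r = toy_group(s, next, calls, limit);
        if (r != TOY_NOMATCH)
            return r;                    /* matched, or limit reached */
    }
    return TOY_NOMATCH;
}

/* Matches subject against ^(a+)+$ with at most limit calls to toy_group. */
int toy_match(const char *subject, unsigned long limit, unsigned long *calls)
{
    *calls = 0;
    return toy_group(subject, 0, calls, limit);
}
```

In this toy, a subject of n letters "a" followed by "!" can never match and
needs 2 to the power n calls to prove it, so each extra letter doubles the
work. With a limit of 1000 calls, a subject of 20 letters followed by "!" is
abandoned early instead of making about a million calls.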
  153: .
  154: .
  155: .SH "USER DOCUMENTATION"
  156: .rs
  157: .sp
  158: The user documentation for PCRE comprises a number of different sections. In
  159: the "man" format, each of these is a separate "man page". In the HTML format,
  160: each is a separate page, linked from the index page. In the plain text format,
  161: all the sections, except the \fBpcredemo\fP section, are concatenated, for ease
  162: of searching. The sections are as follows:
  163: .sp
  164:   pcre              this document
  165:   pcre-config       show PCRE installation configuration information
  166:   pcre16            details of the 16-bit library
  167:   pcre32            details of the 32-bit library
  168:   pcreapi           details of PCRE's native C API
  169:   pcrebuild         building PCRE
  170:   pcrecallout       details of the callout feature
  171:   pcrecompat        discussion of Perl compatibility
  172:   pcrecpp           details of the C++ wrapper for the 8-bit library
  173:   pcredemo          a demonstration C program that uses PCRE
  174:   pcregrep          description of the \fBpcregrep\fP command (8-bit only)
  175:   pcrejit           discussion of the just-in-time optimization support
  176:   pcrelimits        details of size and other limits
  177:   pcrematching      discussion of the two matching algorithms
  178:   pcrepartial       details of the partial matching facility
  179: .\" JOIN
  180:   pcrepattern       syntax and semantics of supported
  181:                       regular expressions
  182:   pcreperform       discussion of performance issues
  183:   pcreposix         the POSIX-compatible C API for the 8-bit library
  184:   pcreprecompile    details of saving and re-using precompiled patterns
  185:   pcresample        discussion of the pcredemo program
  186:   pcrestack         discussion of stack usage
  187:   pcresyntax        quick syntax reference
  188:   pcretest          description of the \fBpcretest\fP testing command
  189:   pcreunicode       discussion of Unicode and UTF-8/16/32 support
  190: .sp
  191: In addition, in the "man" and HTML formats, there is a short page for each
  192: C library function, listing its arguments and results.
  193: .
  194: .
  195: .SH AUTHOR
  196: .rs
  197: .sp
  198: .nf
  199: Philip Hazel
  200: University Computing Service
  201: Cambridge CB2 3QH, England.
  202: .fi
  203: .P
  204: Putting an actual email address here seems to have been a spam magnet, so I've
  205: taken it away. If you want to email me, use my two initials, followed by the
  206: two digits 10, at the domain cam.ac.uk.
  207: .
  208: .
  209: .SH REVISION
  210: .rs
  211: .sp
  212: .nf
  213: Last updated: 13 May 2013
  214: Copyright (c) 1997-2013 University of Cambridge.
  215: .fi
