embedaddon/pcre/doc/pcrebuild.3 - annotate

Annotation of embedaddon/pcre/doc/pcrebuild.3, revision 1.1.1.2

1.1       misho       1: .TH PCREBUILD 3
                      2: .SH NAME
                      3: PCRE - Perl-compatible regular expressions
                      4: .
                      5: .
                      6: .SH "PCRE BUILD-TIME OPTIONS"
                      7: .rs
                      8: .sp
                      9: This document describes the optional features of PCRE that can be selected when
                     10: the library is compiled. It assumes use of the \fBconfigure\fP script, where
                     11: the optional features are selected or deselected by providing options to
                     12: \fBconfigure\fP before running the \fBmake\fP command. However, the same
                     13: options can be selected in both Unix-like and non-Unix-like environments using
                     14: the GUI facility of \fBcmake-gui\fP if you are using \fBCMake\fP instead of
                     15: \fBconfigure\fP to build PCRE.
                     16: .P
                     17: There is a lot more information about building PCRE in non-Unix-like
                     18: environments in the file called \fINON_UNIX_USE\fP, which is part of the PCRE
                     19: distribution. You should consult this file as well as the \fIREADME\fP file if
                     20: you are building in a non-Unix-like environment.
                     21: .P
                     22: The complete list of options for \fBconfigure\fP (which includes the standard
                     23: ones such as the selection of the installation directory) can be obtained by
                     24: running
                     25: .sp
                     26:   ./configure --help
                     27: .sp
                     28: The following sections include descriptions of options whose names begin with
                     29: --enable or --disable. These settings specify changes to the defaults for the
                     30: \fBconfigure\fP command. Because of the way that \fBconfigure\fP works,
                     31: --enable and --disable always come in pairs, so the complementary option always
                     32: exists as well, but as it specifies the default, it is not described.
                     33: .
                     34: .
1.1.1.2 ! misho      35: .SH "BUILDING 8-BIT AND 16-BIT LIBRARIES"
        !            36: .rs
        !            37: .sp
        !            38: By default, a library called \fBlibpcre\fP is built, containing functions that
        !            39: take string arguments contained in vectors of bytes, either as single-byte
        !            40: characters, or interpreted as UTF-8 strings. You can also build a separate
        !            41: library, called \fBlibpcre16\fP, in which strings are contained in vectors of
        !            42: 16-bit data units and interpreted either as single-unit characters or UTF-16
        !            43: strings, by adding
        !            44: .sp
        !            45:   --enable-pcre16
        !            46: .sp
        !            47: to the \fBconfigure\fP command. If you do not want the 8-bit library, add
        !            48: .sp
        !            49:   --disable-pcre8
        !            50: .sp
        !            51: as well. At least one of the two libraries must be built. Note that the C++ and
        !            52: POSIX wrappers are for the 8-bit library only, and that \fBpcregrep\fP is an
        !            53: 8-bit program. None of these are built if you select only the 16-bit library.
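The rule that at least one of the two libraries must be built can be checked before \fBconfigure\fP is run. The following is a minimal sketch in POSIX shell. The function name check_pcre_widths is hypothetical and is not part of the PCRE distribution; it inspects only the two option pairs described above.

```shell
# Hypothetical helper, not part of PCRE: succeed if the given configure
# options leave at least one of the 8-bit and 16-bit libraries enabled.
check_pcre_widths() {
    build8=yes   # libpcre is built by default
    build16=no   # libpcre16 is built only on request
    for opt in "$@"; do
        case "$opt" in
            --disable-pcre8)  build8=no ;;
            --enable-pcre8)   build8=yes ;;
            --enable-pcre16)  build16=yes ;;
            --disable-pcre16) build16=no ;;
        esac
    done
    [ "$build8" = yes ] || [ "$build16" = yes ]
}

# Example: a 16-bit-only build passes the check, so print the command
# to run from the top of the PCRE source tree.
if check_pcre_widths --enable-pcre16 --disable-pcre8; then
    echo "./configure --enable-pcre16 --disable-pcre8"
fi
```

Calling the helper with only --disable-pcre8 returns a non-zero status, because neither library would then be built.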
        !            54: .
        !            55: .
1.1       misho      56: .SH "BUILDING SHARED AND STATIC LIBRARIES"
                     57: .rs
                     58: .sp
                     59: The PCRE building process uses \fBlibtool\fP to build both shared and static
                     60: Unix libraries by default. You can suppress one of these by adding one of
                     61: .sp
                     62:   --disable-shared
                     63:   --disable-static
                     64: .sp
                     65: to the \fBconfigure\fP command, as required.
                     66: .
                     67: .
                     68: .SH "C++ SUPPORT"
                     69: .rs
                     70: .sp
1.1.1.2 ! misho      71: By default, if the 8-bit library is being built, the \fBconfigure\fP script
        !            72: will search for a C++ compiler and C++ header files. If it finds them, it
        !            73: automatically builds the C++ wrapper library (which supports only 8-bit
        !            74: strings). You can disable this by adding
1.1       misho      75: .sp
                     76:   --disable-cpp
                     77: .sp
                     78: to the \fBconfigure\fP command.
                     79: .
                     80: .
1.1.1.2 ! misho      81: .SH "UTF-8 AND UTF-16 SUPPORT"
1.1       misho      82: .rs
                     83: .sp
1.1.1.2 ! misho      84: To build PCRE with support for UTF Unicode character strings, add
1.1       misho      85: .sp
1.1.1.2 ! misho      86:   --enable-utf
1.1       misho      87: .sp
1.1.1.2 ! misho      88: to the \fBconfigure\fP command. This setting applies to both libraries, adding
        !            89: support for UTF-8 to the 8-bit library and support for UTF-16 to the 16-bit
        !            90: library. There are no separate options for enabling UTF-8 and UTF-16
        !            91: independently because that would allow ridiculous settings such as requesting
        !            92: UTF-16 support while building only the 8-bit library. It is not possible to
        !            93: build one library with UTF support and the other without in the same
        !            94: configuration. (For backwards compatibility, --enable-utf8 is a synonym of
        !            95: --enable-utf.)
        !            96: .P
        !            97: Of itself, this setting does not make PCRE treat strings as UTF-8 or UTF-16. As
        !            98: well as compiling PCRE with this option, you also have to set the
        !            99: PCRE_UTF8 or PCRE_UTF16 option when you call one of the pattern compiling
        !           100: functions.
1.1       misho     101: .P
1.1.1.2 ! misho     102: If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects
1.1       misho     103: its input to be either ASCII or UTF-8 (depending on the runtime option). It is
                    104: not possible to support both EBCDIC and UTF-8 codes in the same version of the
1.1.1.2 ! misho     105: library. Consequently, --enable-utf and --enable-ebcdic are mutually
1.1       misho     106: exclusive.
                    107: .
                    108: .
                    109: .SH "UNICODE CHARACTER PROPERTY SUPPORT"
                    110: .rs
                    111: .sp
1.1.1.2 ! misho     112: UTF support allows the libraries to process character codepoints up to 0x10ffff
        !           113: in the strings that they handle. On its own, however, it does not provide any
1.1       misho     114: facilities for accessing the properties of such characters. If you want to be
                    115: able to use the pattern escapes \eP, \ep, and \eX, which refer to Unicode
                    116: character properties, you must add
                    117: .sp
                    118:   --enable-unicode-properties
                    119: .sp
1.1.1.2 ! misho     120: to the \fBconfigure\fP command. This implies UTF support, even if you have
1.1       misho     121: not explicitly requested it.
                    122: .P
                    123: Including Unicode property support adds around 30K of tables to the PCRE
                    124: library. Only the general category properties such as \fILu\fP and \fINd\fP are
                    125: supported. Details are given in the
                    126: .\" HREF
                    127: \fBpcrepattern\fP
                    128: .\"
                    129: documentation.
                    130: .
                    131: .
                    132: .SH "JUST-IN-TIME COMPILER SUPPORT"
                    133: .rs
                    134: .sp
                    135: Just-in-time compiler support is included in the build by specifying
                    136: .sp
                    137:   --enable-jit
                    138: .sp
                    139: This support is available only for certain hardware architectures. If this
                    140: option is set for an unsupported architecture, a compile time error occurs.
                    141: See the
                    142: .\" HREF
                    143: \fBpcrejit\fP
                    144: .\"
                    145: documentation for a discussion of JIT usage. When JIT support is enabled,
                    146: \fBpcregrep\fP automatically makes use of it, unless you add
                    147: .sp
                    148:   --disable-pcregrep-jit
                    149: .sp
                    150: to the \fBconfigure\fP command.
                    151: .
                    152: .
                    153: .SH "CODE VALUE OF NEWLINE"
                    154: .rs
                    155: .sp
                    156: By default, PCRE interprets the linefeed (LF) character as indicating the end
                    157: of a line. This is the normal newline character on Unix-like systems. You can
                    158: compile PCRE to use carriage return (CR) instead, by adding
                    159: .sp
                    160:   --enable-newline-is-cr
                    161: .sp
                    162: to the \fBconfigure\fP command. There is also a --enable-newline-is-lf option,
                    163: which explicitly specifies linefeed as the newline character.
                    164: .sp
                    165: Alternatively, you can specify that line endings are to be indicated by the two
                    166: character sequence CRLF. If you want this, add
                    167: .sp
                    168:   --enable-newline-is-crlf
                    169: .sp
                    170: to the \fBconfigure\fP command. There is a fourth option, specified by
                    171: .sp
                    172:   --enable-newline-is-anycrlf
                    173: .sp
                    174: which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as
                    175: indicating a line ending. Finally, a fifth option, specified by
                    176: .sp
                    177:   --enable-newline-is-any
                    178: .sp
                    179: causes PCRE to recognize any Unicode newline sequence.
                    180: .P
                    181: Whatever line ending convention is selected when PCRE is built can be
                    182: overridden when the library functions are called. At build time it is
                    183: conventional to use the standard for your operating system.
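The five newline settings can be summarized as a lookup from a convention to its option. This is a sketch only: the function name newline_option and the short convention names (lf, cr, crlf, anycrlf, any) are invented for illustration, while the option names are those listed above.

```shell
# Hypothetical helper: map a line-ending convention to the configure
# option described above. The convention names are this sketch's own.
newline_option() {
    case "$1" in
        lf)      echo "--enable-newline-is-lf" ;;
        cr)      echo "--enable-newline-is-cr" ;;
        crlf)    echo "--enable-newline-is-crlf" ;;
        anycrlf) echo "--enable-newline-is-anycrlf" ;;
        any)     echo "--enable-newline-is-any" ;;
        *)       echo "unknown newline convention: $1" >&2; return 1 ;;
    esac
}

# Example: print the command for a build that defaults to CRLF line endings.
echo "./configure $(newline_option crlf)"
```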
                    184: .
                    185: .
                    186: .SH "WHAT \eR MATCHES"
                    187: .rs
                    188: .sp
                    189: By default, the sequence \eR in a pattern matches any Unicode newline sequence,
                    190: whatever has been selected as the line ending sequence. If you specify
                    191: .sp
                    192:   --enable-bsr-anycrlf
                    193: .sp
                    194: the default is changed so that \eR matches only CR, LF, or CRLF. Whatever is
                    195: selected when PCRE is built can be overridden when the library functions are
                    196: called.
                    197: .
                    198: .
                    199: .SH "POSIX MALLOC USAGE"
                    200: .rs
                    201: .sp
1.1.1.2 ! misho     202: When the 8-bit library is called through the POSIX interface (see the
1.1       misho     203: .\" HREF
                    204: \fBpcreposix\fP
                    205: .\"
                    206: documentation), additional working storage is required for holding the pointers
                    207: to capturing substrings, because PCRE requires three integers per substring,
                    208: whereas the POSIX interface provides only two. If the number of expected
                    209: substrings is small, the wrapper function uses space on the stack, because this
                    210: is faster than using \fBmalloc()\fP for each call. The default threshold above
                    211: which the stack is no longer used is 10; it can be changed by adding a setting
                    212: such as
                    213: .sp
                    214:   --with-posix-malloc-threshold=20
                    215: .sp
                    216: to the \fBconfigure\fP command.
                    217: .
                    218: .
                    219: .SH "HANDLING VERY LARGE PATTERNS"
                    220: .rs
                    221: .sp
                    222: Within a compiled pattern, offset values are used to point from one part to
                    223: another (for example, from an opening parenthesis to an alternation
                    224: metacharacter). By default, two-byte values are used for these offsets, leading
                    225: to a maximum size for a compiled pattern of around 64K. This is sufficient to
                    226: handle all but the most gigantic patterns. Nevertheless, some people do want to
1.1.1.2 ! misho     227: process truly enormous patterns, so it is possible to compile PCRE to use
1.1       misho     228: three-byte or four-byte offsets by adding a setting such as
                    229: .sp
                    230:   --with-link-size=3
                    231: .sp
1.1.1.2 ! misho     232: to the \fBconfigure\fP command. The value given must be 2, 3, or 4. For the
        !           233: 16-bit library, a value of 3 is rounded up to 4. Using longer offsets slows
        !           234: down the operation of PCRE because it has to load additional data when handling
        !           235: them.
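The rounding rule for the 16-bit library can be stated as a small function. This is an illustrative sketch: effective_link_size is a hypothetical name, and the second argument (8 or 16) is this sketch's own way of naming the library being built.

```shell
# Hypothetical helper: print the link size actually used for a requested
# --with-link-size value. Arguments: requested size, library width (8 or 16).
effective_link_size() {
    case "$1" in
        2|4) echo "$1" ;;
        3)   if [ "$2" = 16 ]; then echo 4; else echo 3; fi ;;
        *)   echo "link size must be 2, 3, or 4" >&2; return 1 ;;
    esac
}

effective_link_size 3 8     # prints 3
effective_link_size 3 16    # prints 4 (3 is rounded up for the 16-bit library)
```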
1.1       misho     236: .
                    237: .
                    238: .SH "AVOIDING EXCESSIVE STACK USAGE"
                    239: .rs
                    240: .sp
                    241: When matching with the \fBpcre_exec()\fP function, PCRE implements backtracking
                    242: by making recursive calls to an internal function called \fBmatch()\fP. In
                    243: environments where the size of the stack is limited, this can severely limit
                    244: PCRE's operation. (The Unix environment does not usually suffer from this
                    245: problem, but it may sometimes be necessary to increase the maximum stack size.
                    246: There is a discussion in the
                    247: .\" HREF
                    248: \fBpcrestack\fP
                    249: .\"
                    250: documentation.) An alternative approach to recursion that uses memory from the
                    251: heap to remember data, instead of using recursive function calls, has been
                    252: implemented to work round the problem of limited stack size. If you want to
                    253: build a version of PCRE that works this way, add
                    254: .sp
                    255:   --disable-stack-for-recursion
                    256: .sp
                    257: to the \fBconfigure\fP command. With this configuration, PCRE will use the
                    258: \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables to call memory
                    259: management functions. By default these point to \fBmalloc()\fP and
                    260: \fBfree()\fP, but you can replace the pointers so that your own functions are
                    261: used instead.
                    262: .P
                    263: Separate functions are provided rather than using \fBpcre_malloc\fP and
                    264: \fBpcre_free\fP because the usage is very predictable: the block sizes
                    265: requested are always the same, and the blocks are always freed in reverse
                    266: order. A calling program might be able to implement optimized functions that
                    267: perform better than \fBmalloc()\fP and \fBfree()\fP. PCRE runs noticeably more
                    268: slowly when built in this way. This option affects only the \fBpcre_exec()\fP
                    269: function; it is not relevant for \fBpcre_dfa_exec()\fP.
                    270: .
                    271: .
                    272: .SH "LIMITING PCRE RESOURCE USAGE"
                    273: .rs
                    274: .sp
                    275: Internally, PCRE has a function called \fBmatch()\fP, which it calls repeatedly
                    276: (sometimes recursively) when matching a pattern with the \fBpcre_exec()\fP
                    277: function. By controlling the maximum number of times this function may be
                    278: called during a single matching operation, a limit can be placed on the
                    279: resources used by a single call to \fBpcre_exec()\fP. The limit can be changed
                    280: at run time, as described in the
                    281: .\" HREF
                    282: \fBpcreapi\fP
                    283: .\"
                    284: documentation. The default is 10 million, but this can be changed by adding a
                    285: setting such as
                    286: .sp
                    287:   --with-match-limit=500000
                    288: .sp
                    289: to the \fBconfigure\fP command. This setting has no effect on the
                    290: \fBpcre_dfa_exec()\fP matching function.
                    291: .P
                    292: In some environments it is desirable to limit the depth of recursive calls of
                    293: \fBmatch()\fP more strictly than the total number of calls, in order to
                    294: restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
                    295: is specified) that is used. A second limit controls this; it defaults to the
                    296: value that is set for --with-match-limit, which imposes no additional
                    297: constraints. However, you can set a lower limit by adding, for example,
                    298: .sp
                    299:   --with-match-limit-recursion=10000
                    300: .sp
                    301: to the \fBconfigure\fP command. This value can also be overridden at run time.
                    302: .
                    303: .
                    304: .SH "CREATING CHARACTER TABLES AT BUILD TIME"
                    305: .rs
                    306: .sp
                    307: PCRE uses fixed tables for processing characters whose code values are less
                    308: than 256. By default, PCRE is built with a set of tables that are distributed
                    309: in the file \fIpcre_chartables.c.dist\fP. These tables are for ASCII codes
                    310: only. If you add
                    311: .sp
                    312:   --enable-rebuild-chartables
                    313: .sp
                    314: to the \fBconfigure\fP command, the distributed tables are no longer used.
                    315: Instead, a program called \fBdftables\fP is compiled and run. This outputs the
                    316: source for a new set of tables, created in the default locale of your C runtime
                    317: system. (This method of replacing the tables does not work if you are cross
                    318: compiling, because \fBdftables\fP is run on the local host. If you need to
                    319: create alternative tables when cross compiling, you will have to do so "by
                    320: hand".)
                    321: .
                    322: .
                    323: .SH "USING EBCDIC CODE"
                    324: .rs
                    325: .sp
                    326: PCRE assumes by default that it will run in an environment where the character
                    327: code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
                    328: most computer operating systems. PCRE can, however, be compiled to run in an
                    329: EBCDIC environment by adding
                    330: .sp
                    331:   --enable-ebcdic
                    332: .sp
                    333: to the \fBconfigure\fP command. This setting implies
                    334: --enable-rebuild-chartables. You should only use it if you know that you are in
                    335: an EBCDIC environment (for example, an IBM mainframe operating system). The
1.1.1.2 ! misho     336: --enable-ebcdic option is incompatible with --enable-utf.
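Because EBCDIC and UTF support are mutually exclusive, a build script may want to reject the combination before calling \fBconfigure\fP. The sketch below uses a hypothetical function name, check_ebcdic_utf. It also treats --enable-unicode-properties as a conflict, on the reasoning that this option implies UTF support; that inference is this sketch's assumption, not a statement from the PCRE documentation.

```shell
# Hypothetical pre-flight check: fail if the options request both EBCDIC
# and UTF support. --enable-utf8 is the backwards-compatible synonym, and
# --enable-unicode-properties is included because it implies UTF support.
check_ebcdic_utf() {
    ebcdic=no
    utf=no
    for opt in "$@"; do
        case "$opt" in
            --enable-ebcdic) ebcdic=yes ;;
            --enable-utf|--enable-utf8|--enable-unicode-properties) utf=yes ;;
        esac
    done
    if [ "$ebcdic" = yes ] && [ "$utf" = yes ]; then
        echo "--enable-ebcdic cannot be combined with UTF support" >&2
        return 1
    fi
}

# Example: an EBCDIC-only build passes, so print the command to run.
check_ebcdic_utf --enable-ebcdic && echo "./configure --enable-ebcdic"
```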
1.1       misho     337: .
                    338: .
                    339: .SH "PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT"
                    340: .rs
                    341: .sp
                    342: By default, \fBpcregrep\fP reads all files as plain text. You can build it so
                    343: that it recognizes files whose names end in \fB.gz\fP or \fB.bz2\fP, and reads
                    344: them with \fBlibz\fP or \fBlibbz2\fP, respectively, by adding one or both of
                    345: .sp
                    346:   --enable-pcregrep-libz
                    347:   --enable-pcregrep-libbz2
                    348: .sp
                    349: to the \fBconfigure\fP command. These options naturally require that the
                    350: relevant libraries are installed on your system. Configuration will fail if
                    351: they are not.
                    352: .
                    353: .
                    354: .SH "PCREGREP BUFFER SIZE"
                    355: .rs
                    356: .sp
                    357: \fBpcregrep\fP uses an internal buffer to hold a "window" on the file it is
                    358: scanning, in order to be able to output "before" and "after" lines when it
                    359: finds a match. The size of the buffer is controlled by a parameter whose
                    360: default value is 20K. The buffer itself is three times this size, but because
                    361: of the way it is used for holding "before" lines, the longest line that is
                    362: guaranteed to be processable is the parameter size. You can change the default
                    363: parameter value by adding, for example,
                    364: .sp
                    365:   --with-pcregrep-bufsize=50K
                    366: .sp
                    367: to the \fBconfigure\fP command. The caller of \fBpcregrep\fP can, however,
                    368: override this value by specifying a run-time option.
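The relationship between the parameter and the real buffer is simple arithmetic: the buffer is three times the parameter. The sketch below works this out in bytes. The function name pcregrep_buffer_bytes is hypothetical, and treating the K suffix as 1024 bytes is an assumption of this sketch.

```shell
# Print the actual buffer size in bytes for a given parameter value.
# Accepts a plain byte count or a K suffix; treating K as 1024 bytes is
# an assumption of this sketch.
pcregrep_buffer_bytes() {
    case "$1" in
        *K) param=$(( ${1%K} * 1024 )) ;;
        *)  param=$1 ;;
    esac
    # The buffer itself is three times the parameter size.
    echo $(( $param * 3 ))
}

pcregrep_buffer_bytes 20K   # default parameter: prints 61440
pcregrep_buffer_bytes 50K   # prints 153600
```

Only one third of that figure, the parameter itself, is the longest line that is guaranteed to be processable.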
                    369: .
                    370: .
                    371: .SH "PCRETEST OPTION FOR LIBREADLINE SUPPORT"
                    372: .rs
                    373: .sp
                    374: If you add
                    375: .sp
                    376:   --enable-pcretest-libreadline
                    377: .sp
                    378: to the \fBconfigure\fP command, \fBpcretest\fP is linked with the
                    379: \fBlibreadline\fP library, and when its input is from a terminal, it reads it
                    380: using the \fBreadline()\fP function. This provides line-editing and history
                    381: facilities. Note that \fBlibreadline\fP is GPL-licensed, so if you distribute a
                    382: binary of \fBpcretest\fP linked in this way, there may be licensing issues.
                    383: .P
                    384: Setting this option causes the \fB-lreadline\fP option to be added to the
                    385: \fBpcretest\fP build. In many operating environments with a system-installed
                    386: \fBlibreadline\fP this is sufficient. However, in some environments (e.g.
                    387: if an unmodified distribution version of readline is in use), some extra
                    388: configuration may be necessary. The INSTALL file for \fBlibreadline\fP says
                    389: this:
                    390: .sp
                    391:   "Readline uses the termcap functions, but does not link with the
                    392:   termcap or curses library itself, allowing applications which link
                    393:   with readline the to choose an appropriate library."
                    394: .sp
                    395: If your environment has not been set up so that an appropriate library is
                    396: automatically included, you may need to add something like
                    397: .sp
                    398:   LIBS="-lncurses"
                    399: .sp
                    400: immediately before the \fBconfigure\fP command.
                    401: .
                    402: .
                    403: .SH "SEE ALSO"
                    404: .rs
                    405: .sp
1.1.1.2 ! misho     406: \fBpcreapi\fP(3), \fBpcre16\fP, \fBpcre_config\fP(3).
1.1       misho     407: .
                    408: .
                    409: .SH AUTHOR
                    410: .rs
                    411: .sp
                    412: .nf
                    413: Philip Hazel
                    414: University Computing Service
                    415: Cambridge CB2 3QH, England.
                    416: .fi
                    417: .
                    418: .
                    419: .SH REVISION
                    420: .rs
                    421: .sp
                    422: .nf
1.1.1.2 ! misho     423: Last updated: 07 January 2012
        !           424: Copyright (c) 1997-2012 University of Cambridge.
1.1       misho     425: .fi