--- embedaddon/pcre/README 2012/02/21 23:05:51 1.1.1.1 +++ embedaddon/pcre/README 2012/10/09 09:19:17 1.1.1.3 @@ -18,11 +18,12 @@ The contents of this README file are: The PCRE APIs Documentation for PCRE Contributions by users of PCRE - Building PCRE on non-Unix systems - Building PCRE on Unix-like systems - Retrieving configuration information on Unix-like systems - Shared libraries on Unix-like systems - Cross-compiling on Unix-like systems + Building PCRE on non-Unix-like systems + Building PCRE without using autotools + Building PCRE using autotools + Retrieving configuration information + Shared libraries + Cross-compiling using autotools Using HP's ANSI C++ compiler (aCC) Using PCRE from MySQL Making new tarballs @@ -34,16 +35,19 @@ The contents of this README file are: The PCRE APIs ------------- -PCRE is written in C, and it has its own API. The distribution also includes a -set of C++ wrapper functions (see the pcrecpp man page for details), courtesy -of Google Inc. +PCRE is written in C, and it has its own API. There are two sets of functions, +one for the 8-bit library, which processes strings of bytes, and one for the +16-bit library, which processes strings of 16-bit values. The distribution also +includes a set of C++ wrapper functions (see the pcrecpp man page for details), +courtesy of Google Inc., which can be used to call the 8-bit PCRE library from +C++. -In addition, there is a set of C wrapper functions that are based on the POSIX -regular expression API (see the pcreposix man page). These end up in the -library called libpcreposix. Note that this just provides a POSIX calling -interface to PCRE; the regular expressions themselves still follow Perl syntax -and semantics. The POSIX API is restricted, and does not give full access to -all of PCRE's facilities. +In addition, there is a set of C wrapper functions (again, just for the 8-bit +library) that are based on the POSIX regular expression API (see the pcreposix +man page). 
These end up in the library called libpcreposix. Note that this just +provides a POSIX calling interface to PCRE; the regular expressions themselves +still follow Perl syntax and semantics. The POSIX API is restricted, and does +not give full access to all of PCRE's facilities. The header file for the POSIX-style functions is called pcreposix.h. The official POSIX name is regex.h, but I did not want to risk possible problems @@ -106,36 +110,45 @@ Windows (I myself do not use Windows). Nowadays there in the standard distribution, so these contibutions have been archived. -Building PCRE on non-Unix systems ---------------------------------- +Building PCRE on non-Unix-like systems +-------------------------------------- -For a non-Unix system, please read the comments in the file NON-UNIX-USE, -though if your system supports the use of "configure" and "make" you may be -able to build PCRE in the same way as for Unix-like systems. PCRE can also be -configured in many platform environments using the GUI facility provided by -CMake's cmake-gui command. This creates Makefiles, solution files, etc. +For a non-Unix-like system, please read the comments in the file +NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and +"make" you may be able to build PCRE using autotools in the same way as for +many Unix-like systems. +PCRE can also be configured using the GUI facility provided by CMake's +cmake-gui command. This creates Makefiles, solution files, etc. The file +NON-AUTOTOOLS-BUILD has information about CMake. + PCRE has been compiled on many different operating systems. It should be straightforward to build PCRE on any system that has a Standard C compiler and library, because it uses only Standard C functions. 
-Building PCRE on Unix-like systems ----------------------------------- +Building PCRE without using autotools +------------------------------------- +The use of autotools (in particular, libtool) is problematic in some +environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD +file for ways of building PCRE without using autotools. + + +Building PCRE using autotools +----------------------------- + If you are using HP's ANSI C++ compiler (aCC), please see the special note in the section entitled "Using HP's ANSI C++ compiler (aCC)" below. -The following instructions assume the use of the widely used "configure, make, -make install" process. There is also support for CMake in the PCRE -distribution; there are some comments about using CMake in the NON-UNIX-USE -file, though it can also be used in Unix-like systems. +The following instructions assume the use of the widely used "configure; make; +make install" (autotools) process. -To build PCRE on a Unix-like system, first run the "configure" command from the -PCRE distribution directory, with your current directory set to the directory -where you want the files to be created. This command is a standard GNU -"autoconf" configuration script, for which generic instructions are supplied in -the file INSTALL. +To build PCRE on system that supports autotools, first run the "configure" +command from the PCRE distribution directory, with your current directory set +to the directory where you want the files to be created. This command is a +standard GNU "autoconf" configuration script, for which generic instructions +are supplied in the file INSTALL. Most commonly, people build PCRE within its own distribution directory, and in this case, on many systems, just running "./configure" is sufficient. 
However, @@ -143,9 +156,9 @@ the usual methods of changing standard defaults are av CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local -specifies that the C compiler should be run with the flags '-O2 -Wall' instead -of the default, and that "make install" should install PCRE under /opt/local -instead of the default /usr/local. +This command specifies that the C compiler should be run with the flags '-O2 +-Wall' instead of the default, and that "make install" should install PCRE +under /opt/local instead of the default /usr/local. If you want to build in a different directory, just run "configure" with that directory as current. For example, suppose you have unpacked the PCRE source @@ -169,11 +182,16 @@ library. They are also documented in the pcrebuild man (See also "Shared libraries on Unix-like systems" below.) -. If you want to suppress the building of the C++ wrapper library, you can add - --disable-cpp to the "configure" command. Otherwise, when "configure" is run, - it will try to find a C++ compiler and C++ header files, and if it succeeds, - it will try to build the C++ wrapper. +. By default, only the 8-bit library is built. If you add --enable-pcre16 to + the "configure" command, the 16-bit library is also built. If you want only + the 16-bit library, use "./configure --enable-pcre16 --disable-pcre8". +. If you are building the 8-bit library and want to suppress the building of + the C++ wrapper library, you can add --disable-cpp to the "configure" + command. Otherwise, when "configure" is run without --disable-pcre8, it will + try to find a C++ compiler and C++ header files, and if it succeeds, it will + try to build the C++ wrapper. + . If you want to include support for just-in-time compiling, which can give large performance improvements on certain platforms, add --enable-jit to the "configure" command. This support is available only for certain hardware @@ -184,20 +202,30 @@ library. 
They are also documented in the pcrebuild man you add --disable-pcregrep-jit to the "configure" command. . If you want to make use of the support for UTF-8 Unicode character strings in - PCRE, you must add --enable-utf8 to the "configure" command. Without it, the - code for handling UTF-8 is not included in the library. Even when included, - it still has to be enabled by an option at run time. When PCRE is compiled - with this option, its input can only either be ASCII or UTF-8, even when - running on EBCDIC platforms. It is not possible to use both --enable-utf8 and - --enable-ebcdic at the same time. + the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library, + you must add --enable-utf to the "configure" command. Without it, the code + for handling UTF-8 and UTF-16 is not included in the relevant library. Even + when --enable-utf is included, the use of a UTF encoding still has to be + enabled by an option at run time. When PCRE is compiled with this option, its + input can only either be ASCII or UTF-8/16, even when running on EBCDIC + platforms. It is not possible to use both --enable-utf and --enable-ebcdic at + the same time. -. If, in addition to support for UTF-8 character strings, you want to include - support for the \P, \p, and \X sequences that recognize Unicode character - properties, you must add --enable-unicode-properties to the "configure" - command. This adds about 30K to the size of the library (in the form of a - property table); only the basic two-letter properties such as Lu are - supported. +. There are no separate options for enabling UTF-8 and UTF-16 independently + because that would allow ridiculous settings such as requesting UTF-16 + support while building only the 8-bit library. However, the option + --enable-utf8 is retained for backwards compatibility with earlier releases + that did not support 16-bit character strings. It is synonymous with + --enable-utf. 
It is not possible to configure one library with UTF support + and the other without in the same configuration. +. If, in addition to support for UTF-8/16 character strings, you want to + include support for the \P, \p, and \X sequences that recognize Unicode + character properties, you must add --enable-unicode-properties to the + "configure" command. This adds about 30K to the size of the library (in the + form of a property table); only the basic two-letter properties such as Lu + are supported. + . You can build PCRE to recognize either CR or LF or the sequence CRLF or any of the preceding, or any of the Unicode newline sequences as indicating the end of a line. Whatever you specify at build time is the default; the caller @@ -249,10 +277,11 @@ library. They are also documented in the pcrebuild man sizes in the pcrestack man page. . The default maximum compiled pattern size is around 64K. You can increase - this by adding --with-link-size=3 to the "configure" command. You can - increase it even more by setting --with-link-size=4, but this is unlikely - ever to be necessary. Increasing the internal link size will reduce - performance. + this by adding --with-link-size=3 to the "configure" command. In the 8-bit + library, PCRE then uses three bytes instead of two for offsets to different + parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is + the same as --with-link-size=4, which (in both libraries) uses four-byte + offsets. Increasing the internal link size reduces performance. . You can build PCRE so that its internal match() function that is called from pcre_exec() does not call itself recursively. Instead, it uses memory blocks @@ -287,10 +316,12 @@ library. They are also documented in the pcrebuild man This automatically implies --enable-rebuild-chartables (see above). However, when PCRE is built this way, it always operates in EBCDIC. It cannot support - both EBCDIC and UTF-8. + both EBCDIC and UTF-8/16. -. 
It is possible to compile pcregrep to use libz and/or libbz2, in order to - read .gz and .bz2 files (respectively), by specifying one or both of +. The pcregrep program currently supports only 8-bit data files, and so + requires the 8-bit PCRE library. It is possible to compile pcregrep to use + libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by + specifying one or both of --enable-pcregrep-libz --enable-pcregrep-libbz2 @@ -305,16 +336,17 @@ library. They are also documented in the pcrebuild man The default value is 20K. . It is possible to compile pcretest so that it links with the libreadline - library, by specifying + or libedit libraries, by specifying, respectively, - --enable-pcretest-libreadline + --enable-pcretest-libreadline or --enable-pcretest-libedit If this is done, when pcretest's input is from a terminal, it reads it using the readline() function. This provides line-editing and history facilities. Note that libreadline is GPL-licenced, so if you distribute a binary of - pcretest linked in this way, there may be licensing issues. + pcretest linked in this way, there may be licensing issues. These can be + avoided by linking with libedit (which has a BSD licence) instead. - Setting this option causes the -lreadline option to be added to the pcretest + Enabling libreadline causes the -lreadline option to be added to the pcretest build. In many operating environments with a sytem-installed readline library this is sufficient. However, in some environments (e.g. if an unmodified distribution version of readline is in use), it may be necessary @@ -333,17 +365,17 @@ The "configure" script builds the following files for . pcre-config script that shows the building settings such as CFLAGS that were set for "configure" . libpcre.pc ) data for the pkg-config command +. libpcre16.pc ) . libpcreposix.pc ) . libtool script that builds shared and/or static libraries -. RunTest script for running tests on the basic C library -. 
RunGrepTest script for running tests on the pcregrep command Versions of config.h and pcre.h are distributed in the PCRE tarballs under the names config.h.generic and pcre.h.generic. These are provided for those who have to built PCRE without using "configure" or CMake. If you use "configure" or CMake, the .generic versions are not used. -If a C++ compiler is found, the following files are also built: +When building the 8-bit library, if a C++ compiler is found, the following +files are also built: . libpcrecpp.pc data for the pkg-config command . pcrecpparg.h header file for calling PCRE via the C++ wrapper @@ -353,14 +385,17 @@ The "configure" script also creates config.status, whi script that can be run to recreate the configuration, and config.log, which contains compiler output from tests that "configure" runs. -Once "configure" has run, you can run "make". It builds two libraries, called -libpcre and libpcreposix, a test program called pcretest, and the pcregrep -command. If a C++ compiler was found on your system, and you did not disable it -with --disable-cpp, "make" also builds the C++ wrapper library, which is called -libpcrecpp, and some test programs called pcrecpp_unittest, -pcre_scanner_unittest, and pcre_stringpiece_unittest. If you enabled JIT -support with --enable-jit, a test program called pcre_jit_test is also built. +Once "configure" has run, you can run "make". This builds either or both of the +libraries libpcre and libpcre16, and a test program called pcretest. If you +enabled JIT support with --enable-jit, a test program called pcre_jit_test is +built as well. +If the 8-bit library is built, libpcreposix and the pcregrep command are also +built, and if a C++ compiler was found on your system, and you did not disable +it with --disable-cpp, "make" builds the C++ wrapper library, which is called +libpcrecpp, as well as some test programs called pcrecpp_unittest, +pcre_scanner_unittest, and pcre_stringpiece_unittest. 
+ The command "make check" runs all the appropriate tests. Details of the PCRE tests are given below in a separate section of this document. @@ -370,15 +405,17 @@ system. The following are installed (file names are al Commands (bin): pcretest - pcregrep + pcregrep (if 8-bit support is enabled) pcre-config Libraries (lib): - libpcre - libpcreposix - libpcrecpp (if C++ support is enabled) + libpcre16 (if 16-bit support is enabled) + libpcre (if 8-bit support is enabled) + libpcreposix (if 8-bit support is enabled) + libpcrecpp (if 8-bit and C++ support is enabled) Configuration information (lib/pkgconfig): + libpcre16.pc libpcre.pc libpcreposix.pc libpcrecpp.pc (if C++ support is enabled) @@ -419,8 +456,8 @@ This removes all the files that "make install" install remove any directories, because these are often shared with other programs. -Retrieving configuration information on Unix-like systems ---------------------------------------------------------- +Retrieving configuration information +------------------------------------ Running "make install" installs the command pcre-config, which can be used to recall information about the PCRE configuration and installation. For example: @@ -445,8 +482,8 @@ The data is held in *.pc files that are installed in a /lib/pkgconfig. -Shared libraries on Unix-like systems -------------------------------------- +Shared libraries +---------------- The default distribution builds PCRE as shared libraries and static libraries, as long as the operating system supports shared libraries. Shared library @@ -471,8 +508,8 @@ Then run "make" in the usual way. Similarly, you can u build only shared libraries. -Cross-compiling on Unix-like systems ------------------------------------- +Cross-compiling using autotools +------------------------------- You can specify CC and CFLAGS in the normal way to the "configure" command, in order to cross-compile PCRE for some other host. 
However, you should NOT @@ -544,22 +581,23 @@ script creates the .txt and HTML forms of the document Testing PCRE ------------ -To test the basic PCRE library on a Unix system, run the RunTest script that is -created by the configuring process. There is also a script called RunGrepTest -that tests the options of the pcregrep command. If the C++ wrapper library is -built, three test programs called pcrecpp_unittest, pcre_scanner_unittest, and -pcre_stringpiece_unittest are also built. When JIT support is enabled, another -test program called pcre_jit_test is built. +To test the basic PCRE library on a Unix-like system, run the RunTest script. +There is another script called RunGrepTest that tests the options of the +pcregrep command. If the C++ wrapper library is built, three test programs +called pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest +are also built. When JIT support is enabled, another test program called +pcre_jit_test is built. Both the scripts and all the program tests are run if you obey "make check" or -"make test". For other systems, see the instructions in NON-UNIX-USE. +"make test". For other environments, see the instructions in +NON-AUTOTOOLS-BUILD. The RunTest script runs the pcretest test program (which is documented in its own man page) on each of the relevant testinput files in the testdata directory, and compares the output with the contents of the corresponding testoutput files. Some tests are relevant only when certain build-time options -were selected. For example, the tests for UTF-8 support are run only if ---enable-utf8 was used. RunTest outputs a comment when it skips a test. +were selected. For example, the tests for UTF-8/16 support are run only if +--enable-utf was used. RunTest outputs a comment when it skips a test. Many of the tests that are not skipped are run up to three times. 
The second run forces pcre_study() to be called for all patterns except for a few in some @@ -567,17 +605,25 @@ tests that are marked "never study" (see the pcretest done). If JIT support is available, the non-DFA tests are run a third time, this time with a forced pcre_study() with the PCRE_STUDY_JIT_COMPILE option. -RunTest uses a file called testtry to hold the main output from pcretest -(testsavedregex is also used as a working file). To run pcretest on just one of -the test files, give its number as an argument to RunTest, for example: +When both 8-bit and 16-bit support is enabled, the entire set of tests is run +twice, once for each library. If you want to run just one set of tests, call +RunTest with either the -8 or -16 option. - RunTest 2 +RunTest uses a file called testtry to hold the main output from pcretest. +Other files whose names begin with "test" are used as working files in some +tests. To run pcretest on just one or more specific test files, give their +numbers as arguments to RunTest, for example: + RunTest 2 7 11 + +You can also call RunTest with the single argument "list" to cause it to output +a list of tests. + The first test file can be fed directly into the perltest.pl script to check that Perl gives the same results. The only difference you should see is in the first few lines, where the Perl version is given instead of the PCRE version. -The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(), +The second set of tests check pcre_fullinfo(), pcre_study(), pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error detection, and run-time flags that are specific to PCRE, as well as the POSIX wrapper API. It also uses the debugging flags to check some of the internals of @@ -612,38 +658,34 @@ RunTest.bat. The version of RunTest.bat included with Windows versions of test 2. More info on using RunTest.bat is included in the document entitled NON-UNIX-USE.] -The fourth test checks the UTF-8 support. 
This file can be also fed directly to -the perltest.pl script, provided you are running Perl 5.8 or higher. +The fourth and fifth tests check the UTF-8/16 support and error handling and +internal UTF features of PCRE that are not relevant to Perl, respectively. The +sixth and seventh tests do the same for Unicode character properties support. -The fifth test checks error handling with UTF-8 encoding, and internal UTF-8 -features of PCRE that are not relevant to Perl. +The eighth, ninth, and tenth tests check the pcre_dfa_exec() alternative +matching function, in non-UTF-8/16 mode, UTF-8/16 mode, and UTF-8/16 mode with +Unicode property support, respectively. -The sixth test (which is Perl-5.10 compatible) checks the support for Unicode -character properties. This file can be also fed directly to the perltest.pl -script, provided you are running Perl 5.10 or higher. - -The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative -matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode -property support, respectively. - -The tenth test checks some internal offsets and code size features; it is run -only when the default "link size" of 2 is set (in other cases the sizes +The eleventh test checks some internal offsets and code size features; it is +run only when the default "link size" of 2 is set (in other cases the sizes change) and when Unicode property support is enabled. -The eleventh and twelfth tests check out features that are new in Perl 5.10, -without and with UTF-8 support, respectively. This file can be also fed -directly to the perltest.pl script, provided you are running Perl 5.10 or -higher. +The twelfth test is run only when JIT support is available, and the thirteenth +test is run only when JIT support is not available. They test some JIT-specific +features such as information output from pcretest about JIT compilation. 
-The thirteenth test checks a number internals and non-Perl features concerned -with Unicode property support. +The fourteenth, fifteenth, and sixteenth tests are run only in 8-bit mode, and +the seventeenth, eighteenth, and nineteenth tests are run only in 16-bit mode. +These are tests that generate different output in the two modes. They are for +general cases, UTF-8/16 support, and Unicode property support, respectively. -The fourteenth test is run only when JIT support is available, and the -fifteenth test is run only when JIT support is not available. They test some -JIT-specific features such as information output from pcretest about JIT -compilation. +The twentieth test is run only in 16-bit mode. It tests some specific 16-bit +features of the DFA matching engine. +The twenty-first and twenty-second tests are run only in 16-bit mode, when the +link size is set to 2. They test reloading pre-compiled patterns. + Character tables ---------------- @@ -701,7 +743,9 @@ will cause PCRE to malfunction. File manifest ------------- -The distribution should contain the following files: +The distribution should contain the files listed below. Where a file name is +given as pcre[16]_xxx it means that there are two files, one with the name +pcre_xxx and the other with the name pcre16_xxx. 
(A) Source files of the PCRE library functions and their headers: @@ -710,31 +754,36 @@ The distribution should contain the following files: pcre_chartables.c.dist a default set of character tables that assume ASCII coding; used, unless --enable-rebuild-chartables is - specified, by copying to pcre_chartables.c + specified, by copying to pcre[16]_chartables.c pcreposix.c ) - pcre_compile.c ) - pcre_config.c ) - pcre_dfa_exec.c ) - pcre_exec.c ) - pcre_fullinfo.c ) - pcre_get.c ) sources for the functions in the library, - pcre_globals.c ) and some internal functions that they use - pcre_info.c ) - pcre_jit_compile.c ) - pcre_maketables.c ) - pcre_newline.c ) + pcre[16]_byte_order.c ) + pcre[16]_compile.c ) + pcre[16]_config.c ) + pcre[16]_dfa_exec.c ) + pcre[16]_exec.c ) + pcre[16]_fullinfo.c ) + pcre[16]_get.c ) sources for the functions in the library, + pcre[16]_globals.c ) and some internal functions that they use + pcre[16]_jit_compile.c ) + pcre[16]_maketables.c ) + pcre[16]_newline.c ) + pcre[16]_refcount.c ) + pcre[16]_string_utils.c ) + pcre[16]_study.c ) + pcre[16]_tables.c ) + pcre[16]_ucd.c ) + pcre[16]_version.c ) + pcre[16]_xclass.c ) pcre_ord2utf8.c ) - pcre_refcount.c ) - pcre_study.c ) - pcre_tables.c ) - pcre_try_flipped.c ) - pcre_ucd.c ) pcre_valid_utf8.c ) - pcre_version.c ) - pcre_xclass.c ) - pcre_printint.src ) debugging function that is #included in pcretest, + pcre16_ord2utf16.c ) + pcre16_utf16_utils.c ) + pcre16_valid_utf16.c ) + + pcre[16]_printint.c ) debugging function that is used by pcretest, ) and can also be #included in pcre_compile() + pcre.h.in template for pcre.h when built by "configure" pcreposix.h header for the external POSIX wrapper API pcre_internal.h header for internal use @@ -775,7 +824,8 @@ The distribution should contain the following files: Makefile.am ) the automake input that was used to create ) Makefile.in NEWS important changes in this release - NON-UNIX-USE notes on building PCRE on non-Unix systems + 
NON-UNIX-USE the previous name for NON-AUTOTOOLS-BUILD + NON-AUTOTOOLS-BUILD notes on building PCRE without using autotools PrepareRelease script to make preparations for "make dist" README this file RunTest a Unix shell script for running tests @@ -796,6 +846,7 @@ The distribution should contain the following files: doc/pcretest.txt plain text documentation of test program doc/perltest.txt plain text documentation of Perl test program install-sh a shell script for installing files + libpcre16.pc.in template for libpcre16.pc for pkg-config libpcre.pc.in template for libpcre.pc for pkg-config libpcreposix.pc.in template for libpcreposix.pc for pkg-config libpcrecpp.pc.in template for libpcrecpp.pc for pkg-config @@ -812,11 +863,13 @@ The distribution should contain the following files: testdata/testinput* test data for main library tests testdata/testoutput* expected test results testdata/grep* input and output for pcregrep tests + testdata/* other supporting test files (D) Auxiliary files for cmake support cmake/COPYING-CMAKE-SCRIPTS cmake/FindPackageHandleStandardArgs.cmake + cmake/FindEditline.cmake cmake/FindReadline.cmake CMakeLists.txt config-cmake.h.in @@ -842,4 +895,4 @@ The distribution should contain the following files: Philip Hazel Email local part: ph10 Email domain: cam.ac.uk -Last updated: 06 September 2011 +Last updated: 18 June 2012
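Note on the --with-link-size text in the hunk at -249,10 +277,11: the rule it describes (two, three or four bytes per offset in the 8-bit library; three treated as four in the 16-bit library) can be sketched in C as below. This is only an illustration of the README wording, not code from PCRE. The function names effective_link_bytes and max_offset_value are hypothetical, and the second function is plain arithmetic on the byte count; the limit that PCRE actually enforces on compiled pattern size may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE.
   Bytes used for one internal offset, following the README wording:
     8-bit library:  link size n (2, 3 or 4) uses n bytes;
     16-bit library: 2 uses two bytes, 3 is treated the same as 4,
                     and 4 uses four bytes.
   Returns -1 for any combination the README text does not describe. */
int effective_link_bytes(int code_unit_bits, int link_size)
{
    if (link_size < 2 || link_size > 4)
        return -1;
    if (code_unit_bits == 8)
        return link_size;
    if (code_unit_bits == 16)
        return (link_size == 2) ? 2 : 4;
    return -1;
}

/* Hypothetical helper, not part of PCRE.
   Largest unsigned value that fits in the given number of bytes.
   This is simple arithmetic (an assumption used for illustration),
   not a limit taken from the PCRE sources. */
uint64_t max_offset_value(int bytes)
{
    assert(bytes >= 1 && bytes <= 4);
    return (UINT64_C(1) << (8 * bytes)) - 1;
}
```

With the default link size of 2, max_offset_value(2) is 65535, which agrees with the "around 64K" default maximum pattern size stated in the README. For the 16-bit library, effective_link_bytes(16, 3) and effective_link_bytes(16, 4) give the same result, which is why the README says the two settings are equivalent there.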