--- embedaddon/pcre/ChangeLog 2012/02/21 23:05:51 1.1.1.1 +++ embedaddon/pcre/ChangeLog 2014/06/15 19:46:04 1.1.1.5 @@ -1,6 +1,758 @@ ChangeLog for PCRE ------------------ +Version 8.34 15-December-2013 +----------------------------- + +1. Add pcre[16|32]_jit_free_unused_memory to forcibly free unused JIT + executable memory. Patch inspired by Carsten Klein. + +2. ./configure --enable-coverage defined SUPPORT_GCOV in config.h, although + this macro is never tested and has no effect, because the work to support + coverage involves only compiling and linking options and special targets in + the Makefile. The comment in config.h implied that defining the macro would + enable coverage support, which is totally false. There was also support for + setting this macro in the CMake files (my fault, I just copied it from + configure). SUPPORT_GCOV has now been removed. + +3. Make a small performance improvement in strlen16() and strlen32() in + pcretest. + +4. Change 36 for 8.33 left some unreachable statements in pcre_exec.c, + detected by the Solaris compiler (gcc doesn't seem to be able to diagnose + these cases). There was also one in pcretest.c. + +5. Cleaned up a "may be uninitialized" compiler warning in pcre_exec.c. + +6. In UTF mode, the code for checking whether a group could match an empty + string (which is used for indefinitely repeated groups to allow for + breaking an infinite loop) was broken when the group contained a repeated + negated single-character class with a character that occupied more than one + data item and had a minimum repetition of zero (for example, [^\x{100}]* in + UTF-8 mode). The effect was undefined: the group might or might not be + deemed as matching an empty string, or the program might have crashed. + +7. The code for checking whether a group could match an empty string was not + recognizing that \h, \H, \v, \V, and \R must match a character. + +8. 
Implemented PCRE_INFO_MATCH_EMPTY, which yields 1 if the pattern can match + an empty string. If it can, pcretest shows this in its information output. + +9. Fixed two related bugs that applied to Unicode extended grapheme clusters + that were repeated with a maximizing qualifier (e.g. \X* or \X{2,5}) when + matched by pcre_exec() without using JIT: + + (a) If the rest of the pattern did not match after a maximal run of + grapheme clusters, the code for backing up to try with fewer of them + did not always back up over a full grapheme when characters that do not + have the modifier quality were involved, e.g. Hangul syllables. + + (b) If the match point in a subject started with modifier character, and + there was no match, the code could incorrectly back up beyond the match + point, and potentially beyond the first character in the subject, + leading to a segfault or an incorrect match result. + +10. A conditional group with an assertion condition could lead to PCRE + recording an incorrect first data item for a match if no other first data + item was recorded. For example, the pattern (?(?=ab)ab) recorded "a" as a + first data item, and therefore matched "ca" after "c" instead of at the + start. + +11. Change 40 for 8.33 (allowing pcregrep to find empty strings) showed up a + bug that caused the command "echo a | ./pcregrep -M '|a'" to loop. + +12. The source of pcregrep now includes z/OS-specific code so that it can be + compiled for z/OS as part of the special z/OS distribution. + +13. Added the -T and -TM options to pcretest. + +14. The code in pcre_compile.c for creating the table of named capturing groups + has been refactored. Instead of creating the table dynamically during the + actual compiling pass, the information is remembered during the pre-compile + pass (on the stack unless there are more than 20 named groups, in which + case malloc() is used) and the whole table is created before the actual + compile happens. 
This has simplified the code (it is now nearly 150 lines + shorter) and prepared the way for better handling of references to groups + with duplicate names. + +15. A back reference to a named subpattern when there is more than one of the + same name now checks them in the order in which they appear in the pattern. + The first one that is set is used for the reference. Previously only the + first one was inspected. This change makes PCRE more compatible with Perl. + +16. Unicode character properties were updated from Unicode 6.3.0. + +17. The compile-time code for auto-possessification has been refactored, based + on a patch by Zoltan Herczeg. It now happens after instead of during + compilation. The code is cleaner, and more cases are handled. The option + PCRE_NO_AUTO_POSSESS is added for testing purposes, and the -O and /O + options in pcretest are provided to set it. It can also be set by + (*NO_AUTO_POSSESS) at the start of a pattern. + +18. The character VT has been added to the default ("C" locale) set of + characters that match \s and are generally treated as white space, + following this same change in Perl 5.18. There is now no difference between + "Perl space" and "POSIX space". Whether VT is treated as white space in + other locales depends on the locale. + +19. The code for checking named groups as conditions, either for being set or + for being recursed, has been refactored (this is related to 14 and 15 + above). Processing unduplicated named groups should now be as fast as + numerical groups, and processing duplicated groups should be faster than + before. + +20. Two patches to the CMake build system, by Alexander Barkov: + + (1) Replace the "source" command by "." in CMakeLists.txt because + "source" is a bash-ism. + + (2) Add missing HAVE_STDINT_H and HAVE_INTTYPES_H to config-cmake.h.in; + without these the CMake build does not work on Solaris. + +21. Perl has changed its handling of \8 and \9.
If there is no previously + encountered capturing group of those numbers, they are treated as the + literal characters 8 and 9 instead of a binary zero followed by the + literals. PCRE now does the same. + +22. Following Perl, added \o{} to specify codepoints in octal, making it + possible to specify values greater than 0777 and also making them + unambiguous. + +23. Perl now gives an error for missing closing braces after \x{... instead of + treating the string as literal. PCRE now does the same. + +24. RunTest used to grumble if an inappropriate test was selected explicitly, + but just skip it when running all tests. This made it awkward to run ranges + of tests when one of them was inappropriate. Now it just skips any + inappropriate tests, as it always did when running all tests. + +25. If PCRE_AUTO_CALLOUT and PCRE_UCP were set for a pattern that contained + character types such as \d or \w, too many callouts were inserted, and the + data that they returned was rubbish. + +26. In UCP mode, \s was not matching two of the characters that Perl matches, + namely NEL (U+0085) and MONGOLIAN VOWEL SEPARATOR (U+180E), though they + were matched by \h. The code has now been refactored so that the lists of + the horizontal and vertical whitespace characters used for \h and \v (which + are defined only in one place) are now also used for \s. + +27. Add JIT support for the 64 bit TileGX architecture. + Patch by Jiong Wang (Tilera Corporation). + +28. Possessive quantifiers for classes (both explicit and automatically + generated) now use special opcodes instead of wrapping in ONCE brackets. + +29. Whereas an item such as A{4}+ ignored the possessiveness of the quantifier + (because it's meaningless), this was not happening when PCRE_CASELESS was + set. Not wrong, but inefficient. + +30. Updated perltest.pl to add /u (force Unicode mode) when /W (use Unicode + properties for \w, \d, etc) is present in a test regex.
Otherwise if the + test contains no characters greater than 255, Perl doesn't realise it + should be using Unicode semantics. + +31. Upgraded the handling of the POSIX classes [:graph:], [:print:], and + [:punct:] when PCRE_UCP is set so as to include the same characters as Perl + does in Unicode mode. + +32. Added the "forbid" facility to pcretest so that putting tests into the + wrong test files can sometimes be quickly detected. + +33. There is now a limit (default 250) on the depth of nesting of parentheses. + This limit is imposed to control the amount of system stack used at compile + time. It can be changed at build time by --with-parens-nest-limit=xxx or + the equivalent in CMake. + +34. Character classes such as [A-\d] or [a-[:digit:]] now cause compile-time + errors. Perl warns for these when in warning mode, but PCRE has no facility + for giving warnings. + +35. Change 34 for 8.13 allowed quantifiers on assertions, because Perl does. + However, this was not working for (?!) because it is optimized to (*FAIL), + for which PCRE does not allow quantifiers. The optimization is now disabled + when a quantifier follows (?!). I can't see any use for this, but it makes + things uniform. + +36. Perl no longer allows group names to start with digits, so I have made this + change also in PCRE. It simplifies the code a bit. + +37. In extended mode, Perl ignores spaces before a + that indicates a + possessive quantifier. PCRE allowed a space before the quantifier, but not + before the possessive +. It now does. + +38. The use of \K (reset reported match start) within a repeated possessive + group such as (a\Kb)*+ was not working. + +40. Document that the same character tables must be used at compile time and + run time, and that the facility to pass tables to pcre_exec() and + pcre_dfa_exec() is for use only with saved/restored patterns. + +41. Applied Jeff Trawick's patch CMakeLists.txt, which "provides two new + features for Builds with MSVC: + + 1. 
Support pcre.rc and/or pcreposix.rc (as is already done for MinGW + builds). The .rc files can be used to set FileDescription and many other + attributes. + + 2. Add an option (-DINSTALL_MSVC_PDB) to enable installation of .pdb files. + This allows higher-level build scripts which want .pdb files to avoid + hard-coding the exact files needed." + +42. Added support for [[:<:]] and [[:>:]] as used in the BSD POSIX library to + mean "start of word" and "end of word", respectively, as a transition aid. + +43. A minimizing repeat of a class containing codepoints greater than 255 in + non-UTF 16-bit or 32-bit modes caused an internal error when PCRE was + compiled to use the heap for recursion. + +44. Got rid of some compiler warnings for unused variables when UTF but not UCP + is configured. + + +Version 8.33 28-May-2013 +------------------------ + +1. Added 'U' to some constants that are compared to unsigned integers, to + avoid compiler signed/unsigned warnings. Added (int) casts to unsigned + variables that are added to signed variables, to ensure the result is + signed and can be negated. + +2. Applied patch by Daniel Richard G for quashing MSVC warnings to the + CMake config files. + +3. Revise the creation of config.h.generic so that all boolean macros are + #undefined, whereas non-boolean macros are #ifndef/#endif-ed. This makes + overriding via -D on the command line possible. + +4. Changing the definition of the variable "op" in pcre_exec.c from pcre_uchar + to unsigned int is reported to make a quite noticeable speed difference in + a specific Windows environment. Testing on Linux did also appear to show + some benefit (and it is clearly not harmful). Also fixed the definition of + Xop which should be unsigned. + +5. Related to (4), changing the definition of the intermediate variable cc + in repeated character loops from pcre_uchar to pcre_uint32 also gave speed + improvements. + +6. Fix forward search in JIT when link size is 3 or greater. 
Also removed some + unnecessary spaces. + +7. Adjust autogen.sh and configure.ac to lose warnings given by automake 1.12 + and later. + +8. Fix two buffer over read issues in 16 and 32 bit modes. Affects JIT only. + +9. Optimizing fast_forward_start_bits in JIT. + +10. Adding support for callouts in JIT, and fixing some issues revealed + during this work. Namely: + + (a) Unoptimized capturing brackets incorrectly reset on backtrack. + + (b) Minimum length was not checked before the matching is started. + +11. The value of capture_last that is passed to callouts was incorrect in some + cases when there was a capture on one path that was subsequently abandoned + after a backtrack. Also, the capture_last value is now reset after a + recursion, since all captures are also reset in this case. + +12. The interpreter no longer returns the "too many substrings" error in the + case when an overflowing capture is in a branch that is subsequently + abandoned after a backtrack. + +13. In the pathological case when an offset vector of size 2 is used, pcretest + now prints out the matched string after a yield of 0 or 1. + +14. Inlining subpatterns in recursions, when certain conditions are fulfilled. + Only supported by the JIT compiler at the moment. + +15. JIT compiler now supports 32 bit Macs thanks to Lawrence Velazquez. + +16. Partial matches now set offsets[2] to the "bumpalong" value, that is, the + offset of the starting point of the matching process, provided the offsets + vector is large enough. + +17. The \A escape now records a lookbehind value of 1, though its execution + does not actually inspect the previous character. This is to ensure that, + in partial multi-segment matching, at least one character from the old + segment is retained when a new segment is processed. Otherwise, if there + are no lookbehinds in the pattern, \A might match incorrectly at the start + of a new segment. + +18. Added some #ifdef __VMS code into pcretest.c to help VMS implementations. 
+ +19. Redefined some pcre_uchar variables in pcre_exec.c as pcre_uint32; this + gives some modest performance improvement in 8-bit mode. + +20. Added the PCRE-specific property \p{Xuc} for matching characters that can + be expressed in certain programming languages using Universal Character + Names. + +21. Unicode validation has been updated in the light of Unicode Corrigendum #9, + which points out that "non characters" are not "characters that may not + appear in Unicode strings" but rather "characters that are reserved for + internal use and have only local meaning". + +22. When a pattern was compiled with automatic callouts (PCRE_AUTO_CALLOUT) and + there was a conditional group that depended on an assertion, if the + assertion was false, the callout that immediately followed the alternation + in the condition was skipped when pcre_exec() was used for matching. + +23. Allow an explicit callout to be inserted before an assertion that is the + condition for a conditional group, for compatibility with automatic + callouts, which always insert a callout at this point. + +24. In 8.31, (*COMMIT) was confined to within a recursive subpattern. Perl also + confines (*SKIP) and (*PRUNE) in the same way, and this has now been done. + +25. (*PRUNE) is now supported by the JIT compiler. + +26. Fix infinite loop when /(?<=(*SKIP)ac)a/ is matched against aa. + +27. Fix the case where there are two or more SKIPs with arguments that may be + ignored. + +28. (*SKIP) is now supported by the JIT compiler. + +29. (*THEN) is now supported by the JIT compiler. + +30. Update RunTest with additional test selector options. + +31. The way PCRE handles backtracking verbs has been changed in two ways. + + (1) Previously, in something like (*COMMIT)(*SKIP), COMMIT would override + SKIP. Now, PCRE acts on whichever backtracking verb is reached first by + backtracking. In some cases this makes it more Perl-compatible, but Perl's + rather obscure rules do not always do the same thing. 
+ + (2) Previously, backtracking verbs were confined within assertions. This is + no longer the case for positive assertions, except for (*ACCEPT). Again, + this sometimes improves Perl compatibility, and sometimes does not. + +32. A number of tests that were in test 2 because Perl did things differently + have been moved to test 1, because either Perl or PCRE has changed, and + these tests are now compatible. + +32. Backtracking control verbs are now handled in the same way in JIT and + interpreter. + +33. An opening parenthesis in a MARK/PRUNE/SKIP/THEN name in a pattern that + contained a forward subroutine reference caused a compile error. + +34. Auto-detect and optimize limited repetitions in JIT. + +35. Implement PCRE_NEVER_UTF to lock out the use of UTF, in particular, + blocking (*UTF) etc. + +36. In the interpreter, maximizing pattern repetitions for characters and + character types now use tail recursion, which reduces stack usage. + +37. The value of the max lookbehind was not correctly preserved if a compiled + and saved regex was reloaded on a host of different endianness. + +38. Implemented (*LIMIT_MATCH) and (*LIMIT_RECURSION). As part of the extension + of the compiled pattern block, expand the flags field from 16 to 32 bits + because it was almost full. + +39. Try madvise first before posix_madvise. + +40. Change 7 for PCRE 7.9 made it impossible for pcregrep to find empty lines + with a pattern such as ^$. It has taken 4 years for anybody to notice! The + original change locked out all matches of empty strings. This has been + changed so that one match of an empty string per line is recognized. + Subsequent searches on the same line (for colouring or for --only-matching, + for example) do not recognize empty strings. + +41. Applied a user patch to fix a number of spelling mistakes in comments. + +42. Data lines longer than 65536 caused pcretest to crash. + +43. 
Clarified the data type for length and startoffset arguments for pcre_exec + and pcre_dfa_exec in the function-specific man pages, where they were + explicitly stated to be in bytes, never having been updated. I also added + some clarification to the pcreapi man page. + +44. A call to pcre_dfa_exec() with an output vector size less than 2 caused + a segmentation fault. + + +Version 8.32 30-November-2012 +----------------------------- + +1. Improved JIT compiler optimizations for first character search and single + character iterators. + +2. Supporting IBM XL C compilers for PPC architectures in the JIT compiler. + Patch by Daniel Richard G. + +3. Single character iterator optimizations in the JIT compiler. + +4. Improved JIT compiler optimizations for character ranges. + +5. Rename the "leave" variable names to "quit" to improve WinCE compatibility. + Reported by Giuseppe D'Angelo. + +6. The PCRE_STARTLINE bit, indicating that a match can occur only at the start + of a line, was being set incorrectly in cases where .* appeared inside + atomic brackets at the start of a pattern, or where there was a subsequent + *PRUNE or *SKIP. + +7. Improved instruction cache flush for POWER/PowerPC. + Patch by Daniel Richard G. + +8. Fixed a number of issues in pcregrep, making it more compatible with GNU + grep: + + (a) There is now no limit to the number of patterns to be matched. + + (b) An error is given if a pattern is too long. + + (c) Multiple uses of --exclude, --exclude-dir, --include, and --include-dir + are now supported. + + (d) --exclude-from and --include-from (multiple use) have been added. + + (e) Exclusions and inclusions now apply to all files and directories, not + just to those obtained from scanning a directory recursively. + + (f) Multiple uses of -f and --file-list are now supported. 
+ + (g) In a Windows environment, the default for -d has been changed from + "read" (the GNU grep default) to "skip", because otherwise the presence + of a directory in the file list provokes an error. + + (h) The documentation has been revised and clarified in places. + +9. Improve the matching speed of capturing brackets. + +10. Changed the meaning of \X so that it now matches a Unicode extended + grapheme cluster. + +11. Patch by Daniel Richard G to the autoconf files to add a macro for sorting + out POSIX threads when JIT support is configured. + +12. Added support for PCRE_STUDY_EXTRA_NEEDED. + +13. In the POSIX wrapper regcomp() function, setting re_nsub field in the preg + structure could go wrong in environments where size_t is not the same size + as int. + +14. Applied user-supplied patch to pcrecpp.cc to allow PCRE_NO_UTF8_CHECK to be + set. + +15. The EBCDIC support had decayed; later updates to the code had included + explicit references to (e.g.) \x0a instead of CHAR_LF. There has been a + general tidy up of EBCDIC-related issues, and the documentation was also + not quite right. There is now a test that can be run on ASCII systems to + check some of the EBCDIC-related things (but it is not a full test). + +16. The new PCRE_STUDY_EXTRA_NEEDED option is now used by pcregrep, resulting + in a small tidy to the code. + +17. Fix JIT tests when UTF is disabled and both 8 and 16 bit mode are enabled. + +18. If the --only-matching (-o) option in pcregrep is specified multiple + times, each one causes appropriate output. For example, -o1 -o2 outputs the + substrings matched by the 1st and 2nd capturing parentheses. A separating + string can be specified by --om-separator (default empty). + +19. Improving the first n character searches. + +20. Turn case lists for horizontal and vertical white space into macros so that + they are defined only once. + +21.
This set of changes together give more compatible Unicode case-folding + behaviour for characters that have more than one other case when UCP + support is available. + + (a) The Unicode property table now has offsets into a new table of sets of + three or more characters that are case-equivalent. The MultiStage2.py + script that generates these tables (the pcre_ucd.c file) now scans + CaseFolding.txt instead of UnicodeData.txt for character case + information. + + (b) The code for adding characters or ranges of characters to a character + class has been abstracted into a generalized function that also handles + case-independence. In UTF-mode with UCP support, this uses the new data + to handle characters with more than one other case. + + (c) A bug that is fixed as a result of (b) is that codepoints less than 256 + whose other case is greater than 256 are now correctly matched + caselessly. Previously, the high codepoint matched the low one, but not + vice versa. + + (d) The processing of \h, \H, \v, and \V in character classes now makes use + of the new class addition function, using character lists defined as + macros alongside the case definitions of 20 above. + + (e) Caseless back references now work with characters that have more than + one other case. + + (f) General caseless matching of characters with more than one other case + is supported. + +22. Unicode character properties were updated from Unicode 6.2.0. + +23. Improved CMake support under Windows. Patch by Daniel Richard G. + +24. Add support for 32-bit character strings, and UTF-32. + +25. Major JIT compiler update (code refactoring and bugfixing). + Experimental Sparc 32 support is added. + +26. Applied a modified version of Daniel Richard G's patch to create + pcre.h.generic and config.h.generic by "make" instead of in the + PrepareRelease script. + +27. Added a definition for CHAR_NULL (helpful for the z/OS port), and use it in + pcre_compile.c when checking for a zero character. + +28.
Introducing a native interface for JIT. Through this interface, the compiled + machine code can be directly executed. The purpose of this interface is to + provide fast pattern matching, so several sanity checks are not performed. + However, feature tests are still performed. The new interface provides + 1.4x speedup compared to the old one. + +29. If pcre_exec() or pcre_dfa_exec() was called with a negative value for + the subject string length, the error given was PCRE_ERROR_BADOFFSET, which + was confusing. There is now a new error PCRE_ERROR_BADLENGTH for this case. + +30. In 8-bit UTF-8 mode, pcretest failed to give an error for data codepoints + greater than 0x7fffffff (which cannot be represented in UTF-8, even under + the "old" RFC 2279). Instead, it ended up passing a negative length to + pcre_exec(). + +31. Add support for GCC's visibility feature to hide internal functions. + +32. Running "pcretest -C pcre8" or "pcretest -C pcre16" gave a spurious error + "unknown -C option" after outputting 0 or 1. + +33. There is now support for generating a code coverage report for the test + suite in environments where gcc is the compiler and lcov is installed. This + is mainly for the benefit of the developers. + +34. If PCRE is built with --enable-valgrind, certain memory regions are marked + unaddressable using valgrind annotations, allowing valgrind to detect + invalid memory accesses. This is mainly for the benefit of the developers. + +25. (*UTF) can now be used to start a pattern in any of the three libraries. + +26. Give configure error if --enable-cpp but no C++ compiler found. + + +Version 8.31 06-July-2012 +------------------------- + +1. Fixing a wrong JIT test case and some compiler warnings. + +2. Removed a bashism from the RunTest script. + +3. Add a cast to pcre_exec.c to fix the warning "unary minus operator applied + to unsigned type, result still unsigned" that was given by an MS compiler + on encountering the code "-sizeof(xxx)". + +4. 
Partial matching support is added to the JIT compiler. + +5. Fixed several bugs concerned with partial matching of items that consist + of more than one character: + + (a) /^(..)\1/ did not partially match "aba" because checking references was + done on an "all or nothing" basis. This also applied to repeated + references. + + (b) \R did not give a hard partial match if \r was found at the end of the + subject. + + (c) \X did not give a hard partial match after matching one or more + characters at the end of the subject. + + (d) When newline was set to CRLF, a pattern such as /a$/ did not recognize + a partial match for the string "\r". + + (e) When newline was set to CRLF, the metacharacter "." did not recognize + a partial match for a CR character at the end of the subject string. + +6. If JIT is requested using /S++ or -s++ (instead of just /S+ or -s+) when + running pcretest, the text "(JIT)" is added to the output whenever JIT is + actually used to run the match. + +7. Individual JIT compile options can be set in pcretest by following -s+[+] + or /S+[+] with a digit between 1 and 7. + +8. OP_NOT now supports any UTF character not just single-byte ones. + +9. (*MARK) control verb is now supported by the JIT compiler. + +10. The command "./RunTest list" lists the available tests without actually + running any of them. (Because I keep forgetting what they all are.) + +11. Add PCRE_INFO_MAXLOOKBEHIND. + +12. Applied a (slightly modified) user-supplied patch that improves performance + when the heap is used for recursion (compiled with --disable-stack-for- + recursion). Instead of malloc and free for each heap frame each time a + logical recursion happens, frames are retained on a chain and re-used where + possible. This sometimes gives as much as 30% improvement. + +13. As documented, (*COMMIT) is now confined to within a recursive subpattern + call. + +14. As documented, (*COMMIT) is now confined to within a positive assertion. + +15.
It is now possible to link pcretest with libedit as an alternative to + libreadline. + +16. (*COMMIT) control verb is now supported by the JIT compiler. + +17. The Unicode data tables have been updated to Unicode 6.1.0. + +18. Added --file-list option to pcregrep. + +19. Added binary file support to pcregrep, including the -a, --binary-files, + -I, and --text options. + +20. The madvise function is renamed to posix_madvise for QNX compatibility + reasons. Fixed by Giuseppe D'Angelo. + +21. Fixed a bug for backward assertions with REVERSE 0 in the JIT compiler. + +22. Changed the option for creating symbolic links for 16-bit man pages from + -s to -sf so that re-installing does not cause issues. + +23. Support PCRE_NO_START_OPTIMIZE in JIT as (*MARK) support requires it. + +24. Fixed a very old bug in pcretest that caused errors with restarted DFA + matches in certain environments (the workspace was not being correctly + retained). Also added to pcre_dfa_exec() a simple plausibility check on + some of the workspace data at the beginning of a restart. + +25. \s*\R was auto-possessifying the \s* when it should not, whereas \S*\R + was not doing so when it should - probably a typo introduced by SVN 528 + (change 8.10/14). + +26. When PCRE_UCP was not set, \w+\x{c4} was incorrectly auto-possessifying the + \w+ when the character tables indicated that \x{c4} was a word character. + There were several related cases, all because the tests for doing a table + lookup were testing for characters less than 127 instead of 255. + +27. If a pattern contains capturing parentheses that are not used in a match, + their slots in the ovector are set to -1. For those that are higher than + any matched groups, this happens at the end of processing.
In the case when + there were back references that the ovector was too small to contain + (causing temporary malloc'd memory to be used during matching), and the + highest capturing number was not used, memory off the end of the ovector + was incorrectly being set to -1. (It was using the size of the temporary + memory instead of the true size.) + +28. To catch bugs like 27 using valgrind, when pcretest is asked to specify an + ovector size, it uses memory at the end of the block that it has got. + +29. Check for an overlong MARK name and give an error at compile time. The + limit is 255 for the 8-bit library and 65535 for the 16-bit library. + +30. JIT compiler update. + +31. JIT is now supported on jailbroken iOS devices. Thanks to Ruiger + Rill for the patch. + +32. Put spaces around SLJIT_PRINT_D in the JIT compiler. Required by CXX11. + +33. Variable renamings in the PCRE-JIT compiler. No functionality change. + +34. Fixed typos in pcregrep: in two places there was SUPPORT_LIBZ2 instead of + SUPPORT_LIBBZ2. This caused a build problem when bzip2 but not gzip (zlib) + was enabled. + +35. Improve JIT code generation for greedy plus quantifier. + +36. When /((?:a?)*)*c/ or /((?>a?)*)*c/ was matched against "aac", it set group + 1 to "aa" instead of to an empty string. The bug affected repeated groups + that could potentially match an empty string. + +37. Optimizing single character iterators in JIT. + +38. Wide characters specified with \uxxxx in JavaScript mode are now subject to + the same checks as \x{...} characters in non-JavaScript mode. Specifically, + codepoints that are too big for the mode are faulted, and in a UTF mode, + disallowed codepoints are also faulted. + +39. If PCRE was compiled with UTF support, in three places in the DFA + matcher there was code that should only have been obeyed in UTF mode, but + was being obeyed unconditionally. In 8-bit mode this could cause incorrect + processing when bytes with values greater than 127 were present.
In 16-bit + mode the bug would be provoked by values in the range 0xfc00 to 0xdc00. In + both cases the values are those that cannot be the first data item in a UTF + character. The three items that might have provoked this were recursions, + possessively repeated groups, and atomic groups. + +40. Ensure that libpcre is explicitly listed in the link commands for pcretest + and pcregrep, because some OS require shared objects to be explicitly + passed to ld, causing the link step to fail if they are not. + +41. There were two incorrect #ifdefs in pcre_study.c, meaning that, in 16-bit + mode, patterns that started with \h* or \R* might be incorrectly matched. + + +Version 8.30 04-February-2012 +----------------------------- + +1. Renamed "isnumber" as "is_a_number" because in some Mac environments this + name is defined in ctype.h. + +2. Fixed a bug in fixed-length calculation for lookbehinds that would show up + only in quite long subpatterns. + +3. Removed the function pcre_info(), which has been obsolete and deprecated + since it was replaced by pcre_fullinfo() in February 2000. + +4. For a non-anchored pattern, if (*SKIP) was given with a name that did not + match a (*MARK), and the match failed at the start of the subject, a + reference to memory before the start of the subject could occur. This bug + was introduced by fix 17 of release 8.21. + +5. A reference to an unset group with zero minimum repetition was giving + totally wrong answers (in non-JavaScript-compatibility mode). For example, + /(another)?(\1?)test/ matched against "hello world test". This bug was + introduced in release 8.13. + +6. Add support for 16-bit character strings (a large amount of work involving + many changes and refactorings). + +7. RunGrepTest failed on msys because \r\n was replaced by whitespace when the + command "pattern=`printf 'xxx\r\njkl'`" was run. The pattern is now taken + from a file. + +8. 
Ovector size of 2 is also supported by JIT based pcre_exec (the ovector size + rounding is not applied in this particular case). + +9. The invalid Unicode surrogate codepoints U+D800 to U+DFFF are now rejected + if they appear, or are escaped, in patterns. + +10. Get rid of a number of -Wunused-but-set-variable warnings. + +11. The pattern /(?=(*:x))(q|)/ matches an empty string, and returns the mark + "x". The similar pattern /(?=(*:x))((*:y)q|)/ did not return a mark at all. + Oddly, Perl behaves the same way. PCRE has been fixed so that this pattern + also returns the mark "x". This bug applied to capturing parentheses, + non-capturing parentheses, and atomic parentheses. It also applied to some + assertions. + +12. Stephen Kelly's patch to CMakeLists.txt allows it to parse the version + information out of configure.ac instead of relying on pcre.h.generic, which + is not stored in the repository. + +13. Applied Dmitry V. Levin's patch for a more portable method for linking with + -lreadline. + +14. ZH added PCRE_CONFIG_JITTARGET; added its output to pcretest -C. + +15. Applied Graycode's patch to put the top-level frame on the stack rather + than the heap when not using the stack for recursion. This gives a + performance improvement in many cases when recursion is not deep. + +16. Experimental code added to "pcretest -C" to output the stack frame size. + + Version 8.21 12-Dec-2011 ------------------------ @@ -1131,7 +1883,8 @@ Version 7.9 11-Apr-09 7. A pattern that could match an empty string could cause pcregrep to loop; it doesn't make sense to accept an empty string match in pcregrep, so I have locked it out (using PCRE's PCRE_NOTEMPTY option). By experiment, this - seems to be how GNU grep behaves. + seems to be how GNU grep behaves. [But see later change 40 for release + 8.33.] 8. 
The pattern (?(?=.*b)b|^) was incorrectly compiled as "match must be at start or after a newline", because the conditional assertion was not being @@ -1374,7 +2127,7 @@ Version 7.7 07-May-08 containing () gave an internal compiling error instead of "reference to non-existent subpattern". Fortunately, when the pattern did exist, the compiled code was correct. (When scanning forwards to check for the - existencd of the subpattern, it was treating the data ']' as terminating + existence of the subpattern, it was treating the data ']' as terminating the class, so got the count wrong. When actually compiling, the reference was subsequently set up correctly.)