Diff for /embedaddon/pcre/doc/pcreunicode.3 between versions 1.1.1.2 and 1.1.1.3

version 1.1.1.2, 2012/02/21 23:50:25 version 1.1.1.3, 2012/10/09 09:19:17
Line 1 Line 1
.TH PCREUNICODE 3  .TH PCREUNICODE 3 "14 April 2012" "PCRE 8.30"
 .SH NAME  .SH NAME
 PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
 .SH "UTF-8, UTF-16, AND UNICODE PROPERTY SUPPORT"  .SH "UTF-8, UTF-16, AND UNICODE PROPERTY SUPPORT"
Line 70  compatibility with Perl 5.6. PCRE does not support thi Line 70  compatibility with Perl 5.6. PCRE does not support thi
 .sp  .sp
 When you set the PCRE_UTF8 flag, the byte strings passed as patterns and  When you set the PCRE_UTF8 flag, the byte strings passed as patterns and
 subjects are (by default) checked for validity on entry to the relevant  subjects are (by default) checked for validity on entry to the relevant
functions. From release 7.3 of PCRE, the check is according the rules of RFC  functions. The entire string is checked before any other processing takes
3629, which are themselves derived from the Unicode specification. Earlier  place. From release 7.3 of PCRE, the check is according the rules of RFC 3629,
releases of PCRE followed the rules of RFC 2279, which allows the full range of  which are themselves derived from the Unicode specification. Earlier releases
31-bit values (0 to 0x7FFFFFFF). The current check allows only values in the  of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit
range U+0 to U+10FFFF, excluding U+D800 to U+DFFF.  values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0
 to U+10FFFF, excluding U+D800 to U+DFFF.
 .P  .P
 The excluded code points are the "Surrogate Area" of Unicode. They are reserved  The excluded code points are the "Surrogate Area" of Unicode. They are reserved
 for use by UTF-16, where they are used in pairs to encode codepoints with  for use by UTF-16, where they are used in pairs to encode codepoints with
Line 84  surrogate thing is a fudge for UTF-16 which unfortunat Line 85  surrogate thing is a fudge for UTF-16 which unfortunat
 .P  .P
 If an invalid UTF-8 string is passed to PCRE, an error return is given. At  If an invalid UTF-8 string is passed to PCRE, an error return is given. At
 compile time, the only additional information is the offset to the first byte  compile time, the only additional information is the offset to the first byte
of the failing character. The runtime functions \fBpcre_exec()\fP and  of the failing character. The run-time functions \fBpcre_exec()\fP and
 \fBpcre_dfa_exec()\fP also pass back this information, as well as a more  \fBpcre_dfa_exec()\fP also pass back this information, as well as a more
 detailed reason code if the caller has provided memory in which to do this.  detailed reason code if the caller has provided memory in which to do this.
 .P  .P
 In some situations, you may already know that your strings are valid, and  In some situations, you may already know that your strings are valid, and
therefore want to skip these checks in order to improve performance. If you set  therefore want to skip these checks in order to improve performance, for
the PCRE_NO_UTF8_CHECK flag at compile time or at run time, PCRE assumes that  example in the case of a long subject string that is being scanned repeatedly
the pattern or subject it is given (respectively) contains only valid UTF-8  with different patterns. If you set the PCRE_NO_UTF8_CHECK flag at compile time
codes. In this case, it does not diagnose an invalid UTF-8 string.  or at run time, PCRE assumes that the pattern or subject it is given
 (respectively) contains only valid UTF-8 codes. In this case, it does not
 diagnose an invalid UTF-8 string.
 .P  .P
 If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what  If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what
 happens depends on why the string is invalid. If the string conforms to the  happens depends on why the string is invalid. If the string conforms to the
Line 124  must be used in pairs in the correct manner. Line 127  must be used in pairs in the correct manner.
 .P  .P
 If an invalid UTF-16 string is passed to PCRE, an error return is given. At  If an invalid UTF-16 string is passed to PCRE, an error return is given. At
 compile time, the only additional information is the offset to the first data  compile time, the only additional information is the offset to the first data
unit of the failing character. The runtime functions \fBpcre16_exec()\fP and  unit of the failing character. The run-time functions \fBpcre16_exec()\fP and
 \fBpcre16_dfa_exec()\fP also pass back this information, as well as a more  \fBpcre16_dfa_exec()\fP also pass back this information, as well as a more
 detailed reason code if the caller has provided memory in which to do this.  detailed reason code if the caller has provided memory in which to do this.
 .P  .P
Line 189  documentation. Line 192  documentation.
 7. Similarly, characters that match the POSIX named character classes are all  7. Similarly, characters that match the POSIX named character classes are all
 low-valued characters, unless the PCRE_UCP option is set.  low-valued characters, unless the PCRE_UCP option is set.
 .P  .P
8. However, the horizontal and vertical whitespace matching escapes (\eh, \eH,  8. However, the horizontal and vertical white space matching escapes (\eh, \eH,
 \ev, and \eV) do match all the appropriate Unicode characters, whether or not  \ev, and \eV) do match all the appropriate Unicode characters, whether or not
 PCRE_UCP is set.  PCRE_UCP is set.
 .P  .P
Line 217  Cambridge CB2 3QH, England. Line 220  Cambridge CB2 3QH, England.
 .rs  .rs
 .sp  .sp
 .nf  .nf
Last updated: 13 January 2012  Last updated: 14 April 2012
 Copyright (c) 1997-2012 University of Cambridge.  Copyright (c) 1997-2012 University of Cambridge.
 .fi  .fi
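
Note on the validity rules quoted in this diff: the text says that UTF-8 input is accepted only for code points U+0 to U+10FFFF excluding U+D800 to U+DFFF, that UTF-16 surrogates must be used in correct pairs, and that the error information is the offset of the first byte (or data unit) of the failing character. The C sketch below illustrates those rules only. It is not PCRE's own checking code; the function names utf8_first_invalid and utf16_first_invalid are hypothetical, and the use of unsigned short for a 16-bit data unit is an assumption. Rejecting overlong encodings follows RFC 3629 and is not spelled out in the quoted text.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API.
   Returns the offset of the first byte of the first invalid character,
   or -1 if the whole string is valid under the RFC 3629 rules. */
long utf8_first_invalid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t extra, k;

        if (c < 0x80) { i++; continue; }                    /* one-byte (ASCII) */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return (long)i;   /* stray continuation byte, or a 5/6-byte lead
                                  that only the older RFC 2279 rules allowed */

        if (len - i <= extra) return (long)i;               /* truncated character */
        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return (long)i;  /* bad continuation */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }
        if (cp < min) return (long)i;                       /* overlong encoding */
        if (cp > 0x10FFFF) return (long)i;                  /* above U+10FFFF */
        if (cp >= 0xD800 && cp <= 0xDFFF) return (long)i;   /* surrogate area */
        i += extra + 1;
    }
    return -1;
}

/* Hypothetical helper, not part of the PCRE API.
   Returns the offset (in 16-bit data units) of the first unit of the first
   invalid character, or -1 if every surrogate is correctly paired. */
long utf16_first_invalid(const unsigned short *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned int u = s[i] & 0xFFFFu;
        if (u >= 0xD800 && u <= 0xDBFF) {                   /* high surrogate */
            unsigned int lo;
            if (i + 1 >= len) return (long)i;               /* missing partner */
            lo = s[i + 1] & 0xFFFFu;
            if (lo < 0xDC00 || lo > 0xDFFF) return (long)i; /* not a low surrogate */
            i += 2;
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return (long)i;                                 /* isolated low surrogate */
        } else {
            i++;
        }
    }
    return -1;
}
```

A caller that has already validated a long subject once with a check of this kind is in the situation the revised paragraph describes, where setting PCRE_NO_UTF8_CHECK on repeated scans avoids rechecking the same string.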
