embedaddon/pcre/doc/pcreunicode.3 - diff

Diff for /embedaddon/pcre/doc/pcreunicode.3 between versions 1.1.1.2 and 1.1.1.3

version 1.1.1.2, 2012/02/21 23:50:25	version 1.1.1.3, 2012/10/09 09:19:17
Line 1	Line 1
.TH PCREUNICODE 3	.TH PCREUNICODE 3 "14 April 2012" "PCRE 8.30"
.SH NAME	.SH NAME
PCRE - Perl-compatible regular expressions	PCRE - Perl-compatible regular expressions
.SH "UTF-8, UTF-16, AND UNICODE PROPERTY SUPPORT"	.SH "UTF-8, UTF-16, AND UNICODE PROPERTY SUPPORT"
Line 70 compatibility with Perl 5.6. PCRE does not support thi	Line 70 compatibility with Perl 5.6. PCRE does not support thi
.sp	.sp
When you set the PCRE_UTF8 flag, the byte strings passed as patterns and	When you set the PCRE_UTF8 flag, the byte strings passed as patterns and
subjects are (by default) checked for validity on entry to the relevant	subjects are (by default) checked for validity on entry to the relevant
functions. From release 7.3 of PCRE, the check is according the rules of RFC	functions. The entire string is checked before any other processing takes
3629, which are themselves derived from the Unicode specification. Earlier	place. From release 7.3 of PCRE, the check is according the rules of RFC 3629,
releases of PCRE followed the rules of RFC 2279, which allows the full range of	which are themselves derived from the Unicode specification. Earlier releases
31-bit values (0 to 0x7FFFFFFF). The current check allows only values in the	of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit
range U+0 to U+10FFFF, excluding U+D800 to U+DFFF.	values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0
	to U+10FFFF, excluding U+D800 to U+DFFF.
.P	.P
The excluded code points are the "Surrogate Area" of Unicode. They are reserved	The excluded code points are the "Surrogate Area" of Unicode. They are reserved
for use by UTF-16, where they are used in pairs to encode codepoints with	for use by UTF-16, where they are used in pairs to encode codepoints with
Line 84 surrogate thing is a fudge for UTF-16 which unfortunat	Line 85 surrogate thing is a fudge for UTF-16 which unfortunat
.P	.P
If an invalid UTF-8 string is passed to PCRE, an error return is given. At	If an invalid UTF-8 string is passed to PCRE, an error return is given. At
compile time, the only additional information is the offset to the first byte	compile time, the only additional information is the offset to the first byte
of the failing character. The runtime functions \fBpcre_exec()\fP and	of the failing character. The run-time functions \fBpcre_exec()\fP and
\fBpcre_dfa_exec()\fP also pass back this information, as well as a more	\fBpcre_dfa_exec()\fP also pass back this information, as well as a more
detailed reason code if the caller has provided memory in which to do this.	detailed reason code if the caller has provided memory in which to do this.
.P	.P
In some situations, you may already know that your strings are valid, and	In some situations, you may already know that your strings are valid, and
therefore want to skip these checks in order to improve performance. If you set	therefore want to skip these checks in order to improve performance, for
the PCRE_NO_UTF8_CHECK flag at compile time or at run time, PCRE assumes that	example in the case of a long subject string that is being scanned repeatedly
the pattern or subject it is given (respectively) contains only valid UTF-8	with different patterns. If you set the PCRE_NO_UTF8_CHECK flag at compile time
codes. In this case, it does not diagnose an invalid UTF-8 string.	or at run time, PCRE assumes that the pattern or subject it is given
	(respectively) contains only valid UTF-8 codes. In this case, it does not
	diagnose an invalid UTF-8 string.
.P	.P
If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what	If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what
happens depends on why the string is invalid. If the string conforms to the	happens depends on why the string is invalid. If the string conforms to the
Line 124 must be used in pairs in the correct manner.	Line 127 must be used in pairs in the correct manner.
.P	.P
If an invalid UTF-16 string is passed to PCRE, an error return is given. At	If an invalid UTF-16 string is passed to PCRE, an error return is given. At
compile time, the only additional information is the offset to the first data	compile time, the only additional information is the offset to the first data
unit of the failing character. The runtime functions \fBpcre16_exec()\fP and	unit of the failing character. The run-time functions \fBpcre16_exec()\fP and
\fBpcre16_dfa_exec()\fP also pass back this information, as well as a more	\fBpcre16_dfa_exec()\fP also pass back this information, as well as a more
detailed reason code if the caller has provided memory in which to do this.	detailed reason code if the caller has provided memory in which to do this.
.P	.P
Line 189 documentation.	Line 192 documentation.
7. Similarly, characters that match the POSIX named character classes are all	7. Similarly, characters that match the POSIX named character classes are all
low-valued characters, unless the PCRE_UCP option is set.	low-valued characters, unless the PCRE_UCP option is set.
.P	.P
8. However, the horizontal and vertical whitespace matching escapes (\eh, \eH,	8. However, the horizontal and vertical white space matching escapes (\eh, \eH,
\ev, and \eV) do match all the appropriate Unicode characters, whether or not	\ev, and \eV) do match all the appropriate Unicode characters, whether or not
PCRE_UCP is set.	PCRE_UCP is set.
.P	.P
Line 217 Cambridge CB2 3QH, England.	Line 220 Cambridge CB2 3QH, England.
.rs	.rs
.sp	.sp
.nf	.nf
Last updated: 13 January 2012	Last updated: 14 April 2012
Copyright (c) 1997-2012 University of Cambridge.	Copyright (c) 1997-2012 University of Cambridge.
.fi	.fi
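
The RFC 3629 validity rule described in the diff can be sketched as a small standalone C function. Under that rule only code points U+0 to U+10FFFF are allowed, and the surrogate range U+D800 to U+DFFF is excluded. This is a minimal illustration, not PCRE's internal validator. The function name `utf8_check` is hypothetical. It reports only the offset of the first byte of the failing character, which is the information given at compile time, and not PCRE's detailed run-time reason codes. As an assumption beyond the diff text, it also rejects overlong encodings, which RFC 3629 forbids.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the PCRE API): returns -1 if buf[0..len)
   is valid UTF-8 under RFC 3629 rules, otherwise the offset of the first
   byte of the failing character. */
static long utf8_check(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;      /* number of continuation bytes expected */
        uint32_t cp;   /* code point being decoded */
        uint32_t min;  /* smallest code point legal for this length */

        if (c < 0x80) { i++; continue; }                     /* ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return (long)i;  /* stray continuation byte, or a 5/6-byte lead
                                 that only RFC 2279 allowed */

        if (len - i <= n) return (long)i;                    /* truncated */
        for (size_t k = 1; k <= n; k++) {
            unsigned char d = buf[i + k];
            if ((d & 0xC0) != 0x80) return (long)i;          /* not 10xxxxxx */
            cp = (cp << 6) | (d & 0x3F);
        }
        if (cp < min) return (long)i;                        /* overlong */
        if (cp > 0x10FFFF) return (long)i;                   /* beyond U+10FFFF */
        if (cp >= 0xD800 && cp <= 0xDFFF) return (long)i;    /* surrogate area */
        i += n + 1;
    }
    return -1;
}
```

A caller that has validated a long subject once in this way is in the situation the new text describes. It could then pass PCRE_NO_UTF8_CHECK to \fBpcre_exec()\fP on each later scan of that subject.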

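The UTF-16 rule in the diff states that surrogates must be used in pairs in the correct manner. That rule can be sketched the same way. As before, `utf16_check` is a hypothetical helper and not part of the PCRE API. It returns the offset, in 16-bit data units, of the first unit of the failing character. This mirrors the compile-time error offset described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the PCRE API): returns -1 if buf[0..len)
   is valid UTF-16, otherwise the offset (in data units) of the first unit
   of the failing character. */
static long utf16_check(const uint16_t *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint16_t u = buf[i];
        if (u >= 0xD800 && u <= 0xDBFF) {            /* lead (high) surrogate */
            if (i + 1 >= len) return (long)i;        /* no trail unit follows */
            uint16_t t = buf[i + 1];
            if (t < 0xDC00 || t > 0xDFFF) return (long)i;  /* wrong partner */
            /* A valid pair encodes 0x10000 + ((u - 0xD800) << 10) + (t - 0xDC00). */
            i += 2;
        } else if (u >= 0xDC00 && u <= 0xDFFF) {     /* isolated trail surrogate */
            return (long)i;
        } else {
            i++;                                     /* ordinary BMP unit */
        }
    }
    return -1;
}
```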