embedaddon/pcre/doc/pcre.txt - diff


Diff for /embedaddon/pcre/doc/pcre.txt between versions 1.1.1.2 and 1.1.1.3

version 1.1.1.2, 2012/02/21 23:50:25	version 1.1.1.3, 2012/10/09 09:19:17
Line 367 OPTION NAMES	Line 367 OPTION NAMES
There are two new general option names, PCRE_UTF16 and	There are two new general option names, PCRE_UTF16 and
PCRE_NO_UTF16_CHECK, which correspond to PCRE_UTF8 and	PCRE_NO_UTF16_CHECK, which correspond to PCRE_UTF8 and
PCRE_NO_UTF8_CHECK in the 8-bit library. In fact, these new options	PCRE_NO_UTF8_CHECK in the 8-bit library. In fact, these new options
define the same bits in the options word.	define the same bits in the options word. There is a discussion about
	the validity of UTF-16 strings in the pcreunicode page.

For the pcre16_config() function there is an option PCRE_CONFIG_UTF16	For the pcre16_config() function there is an option PCRE_CONFIG_UTF16
that returns 1 if UTF-16 support is configured, otherwise 0. If this	that returns 1 if UTF-16 support is configured, otherwise 0. If this
option is given to pcre_config(), or if the PCRE_CONFIG_UTF8 option is	option is given to pcre_config(), or if the PCRE_CONFIG_UTF8 option is
given to pcre16_config(), the result is the PCRE_ERROR_BADOPTION error.	given to pcre16_config(), the result is the PCRE_ERROR_BADOPTION error.


CHARACTER CODES	CHARACTER CODES

In 16-bit mode, when PCRE_UTF16 is not set, character values are	In 16-bit mode, when PCRE_UTF16 is not set, character values are
treated in the same way as in 8-bit, non UTF-8 mode, except, of course,	treated in the same way as in 8-bit, non UTF-8 mode, except, of course,
that they can range from 0 to 0xffff instead of 0 to 0xff. Character	that they can range from 0 to 0xffff instead of 0 to 0xff. Character
types for characters less than 0xff can therefore be influenced by the	types for characters less than 0xff can therefore be influenced by the
locale in the same way as before. Characters greater than 0xff have	locale in the same way as before. Characters greater than 0xff have
only one case, and no "type" (such as letter or digit).	only one case, and no "type" (such as letter or digit).

In UTF-16 mode, the character code is Unicode, in the range 0 to	In UTF-16 mode, the character code is Unicode, in the range 0 to
0x10ffff, with the exception of values in the range 0xd800 to 0xdfff	0x10ffff, with the exception of values in the range 0xd800 to 0xdfff
because those are "surrogate" values that are used in pairs to encode	because those are "surrogate" values that are used in pairs to encode
values greater than 0xffff.	values greater than 0xffff.
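
The surrogate mechanism described above can be sketched in C. This is an illustration only, not PCRE code: the helper name utf16_decode() and its return convention are assumptions made for this example, and the input is assumed to be in host byte order already.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE: decode one code point from a
   UTF-16 buffer that is already in host byte order. Returns the number
   of 16-bit units consumed (1 or 2), or 0 if the input is empty,
   truncated, or contains an unpaired surrogate. */
static size_t utf16_decode(const uint16_t *s, size_t len, uint32_t *cp)
{
    if (len == 0)
        return 0;

    uint16_t hi = s[0];

    /* Values outside 0xd800-0xdfff stand for themselves. */
    if (hi < 0xd800 || hi > 0xdfff) {
        *cp = hi;
        return 1;
    }

    /* 0xdc00-0xdfff is a low surrogate; it cannot come first. */
    if (hi >= 0xdc00)
        return 0;

    /* A high surrogate must be followed by a low surrogate. */
    if (len < 2)
        return 0;
    uint16_t lo = s[1];
    if (lo < 0xdc00 || lo > 0xdfff)
        return 0;

    /* Each surrogate carries 10 bits; the pair encodes a value
       in the range 0x10000 to 0x10ffff. */
    *cp = 0x10000 + (((uint32_t)(hi - 0xd800) << 10) |
                     (uint32_t)(lo - 0xdc00));
    return 2;
}
```

For instance, the pair 0xd83d 0xde00 decodes to the code point 0x1f600, while a lone 0xdc00 is rejected.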

A UTF-16 string can indicate its endianness by a special code known as a	A UTF-16 string can indicate its endianness by a special code known as a
byte-order mark (BOM). The PCRE functions do not handle this, expecting	byte-order mark (BOM). The PCRE functions do not handle this, expecting
strings to be in host byte order. A utility function called	strings to be in host byte order. A utility function called
pcre16_utf16_to_host_byte_order() is provided to help with this (see	pcre16_utf16_to_host_byte_order() is provided to help with this (see
above).	above).
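
To make the byte-order issue concrete, here is a minimal C sketch of the kind of conversion involved. It does not call or reproduce pcre16_utf16_to_host_byte_order(); the helper names swap16() and utf16_to_host() are hypothetical, and the sketch assumes that a BOM, if present, appears only as the first 16-bit unit and should be dropped from the output.

```c
#include <stddef.h>
#include <stdint.h>

/* Exchange the two bytes of a 16-bit unit. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Hypothetical helper, not part of PCRE: copy len 16-bit units from
   in to out, converting to host byte order. A leading 0xfeff means the
   data is already in host order; a leading 0xfffe is a BOM read with
   the wrong endianness, so every following unit is byte-swapped.
   The BOM itself is not copied. Returns the number of units written;
   out must have room for len units. */
static size_t utf16_to_host(uint16_t *out, const uint16_t *in, size_t len)
{
    int swapped = 0;
    size_t i = 0, n = 0;

    if (len > 0 && in[0] == 0xfeff) {
        i = 1;                      /* host order, skip the BOM */
    } else if (len > 0 && in[0] == 0xfffe) {
        swapped = 1;                /* opposite order, skip the BOM */
        i = 1;
    }

    for (; i < len; i++)
        out[n++] = swapped ? swap16(in[i]) : in[i];

    return n;
}
```

Given the input 0xfffe 0x4100 0x4200, this writes 0x0041 0x0042 and returns 2. Input without a BOM is copied unchanged, on the assumption that it is already in host byte order.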


ERROR NAMES	ERROR NAMES

The errors PCRE_ERROR_BADUTF16_OFFSET and PCRE_ERROR_SHORTUTF16 corre-	The errors PCRE_ERROR_BADUTF16_OFFSET and PCRE_ERROR_SHORTUTF16 corre-
spond to their 8-bit counterparts. The error PCRE_ERROR_BADMODE is	spond to their 8-bit counterparts. The error PCRE_ERROR_BADMODE is
given when a compiled pattern is passed to a function that processes	given when a compiled pattern is passed to a function that processes
patterns in the other mode, for example, if a pattern compiled with	patterns in the other mode, for example, if a pattern compiled with

Removed from v.1.1.1.2
changed lines
	Added in v.1.1.1.3