embedaddon/pcre/doc/pcrepattern.3 - diff

Diff for /embedaddon/pcre/doc/pcrepattern.3 between versions 1.1.1.2 and 1.1.1.3

version 1.1.1.2, 2012/02/21 23:50:25	version 1.1.1.3, 2012/10/09 09:19:17
Line 1	Line 1
.TH PCREPATTERN 3	.TH PCREPATTERN 3 "04 May 2012" "PCRE 8.31"
.SH NAME	.SH NAME
PCRE - Perl-compatible regular expressions	PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION DETAILS"	.SH "PCRE REGULAR EXPRESSION DETAILS"
Line 198 In a UTF mode, only ASCII numbers and letters have any	Line 198 In a UTF mode, only ASCII numbers and letters have any
backslash. All other characters (in particular, those whose codepoints are	backslash. All other characters (in particular, those whose codepoints are
greater than 127) are treated as literals.	greater than 127) are treated as literals.
.P	.P
If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the	If a pattern is compiled with the PCRE_EXTENDED option, white space in the
pattern (other than in a character class) and characters between a # outside	pattern (other than in a character class) and characters between a # outside
a character class and the next newline are ignored. An escaping backslash can	a character class and the next newline are ignored. An escaping backslash can
be used to include a whitespace or # character as part of the pattern.	be used to include a white space or # character as part of the pattern.
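The PCRE_EXTENDED rule above can be illustrated with a simplified C sketch that removes the ignored characters from a pattern string. The helper name extended_strip is hypothetical and not part of the PCRE API. It assumes "white space" means whatever isspace() accepts in the C locale, and it deliberately ignores \eQ...\eE, (?#...) comments and POSIX [:name:] classes.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE: copies pattern into out, dropping
 * what PCRE_EXTENDED ignores outside a character class, namely white space
 * (assumed here to be whatever isspace() accepts in the C locale) and text
 * from '#' up to the next newline. A backslash keeps the next character, so
 * "\ " and "\#" survive. Not handled: \Q...\E, (?#...) comments and POSIX
 * [:name:] classes. out must have room for strlen(pattern) + 1 bytes. */
void extended_strip(const char *pattern, char *out)
{
    int in_class = 0;
    size_t o = 0;
    for (const char *p = pattern; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (c == '\\' && p[1] != '\0') {   /* escaped char: keep both bytes */
            out[o++] = (char)c;
            out[o++] = *++p;
            continue;
        }
        if (in_class) {                    /* classes are copied verbatim */
            if (c == ']') in_class = 0;
            out[o++] = (char)c;
            continue;
        }
        if (c == '[') {
            in_class = 1;
            out[o++] = (char)c;
            if (p[1] == '^') out[o++] = *++p;  /* negation marker */
            if (p[1] == ']') out[o++] = *++p;  /* leading ']' is literal */
            continue;
        }
        if (c == '#') {                    /* skip the comment text */
            while (p[1] != '\0' && p[1] != '\n') p++;
            continue;                      /* the '\n' is dropped next */
        }
        if (isspace(c)) continue;          /* ignored white space */
        out[o++] = (char)c;
    }
    out[o] = '\0';
}
```

For example, "a b # comment\enc" reduces to "abc", while "[ #]x\e y" is kept unchanged because the space and # are inside a class or escaped.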
.P	.P
If you want to remove the special meaning from a sequence of characters, you	If you want to remove the special meaning from a sequence of characters, you
can do so by putting them between \eQ and \eE. This is different from Perl in	can do so by putting them between \eQ and \eE. This is different from Perl in
Line 237 one of the following escape sequences than the binary	Line 237 one of the following escape sequences than the binary
\ea alarm, that is, the BEL character (hex 07)	\ea alarm, that is, the BEL character (hex 07)
\ecx "control-x", where x is any ASCII character	\ecx "control-x", where x is any ASCII character
\ee escape (hex 1B)	\ee escape (hex 1B)
\ef formfeed (hex 0C)	\ef form feed (hex 0C)
\en linefeed (hex 0A)	\en linefeed (hex 0A)
\er carriage return (hex 0D)	\er carriage return (hex 0D)
\et tab (hex 09)	\et tab (hex 09)
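The escape table above can be restated as a small C lookup. Both function names are hypothetical. The \ecx rule used here (a lower case x is converted to upper case, then bit 6, hex 40, is inverted) is the one given elsewhere in the full pcrepattern page rather than in this hunk.

```c
#include <ctype.h>

/* Hypothetical helper: character code for the single-letter escapes in the
 * table (\a \e \f \n \r \t), or -1 for any other letter. */
int simple_escape_value(char letter)
{
    switch (letter) {
    case 'a': return 0x07;  /* alarm (BEL) */
    case 'e': return 0x1B;  /* escape */
    case 'f': return 0x0C;  /* form feed */
    case 'n': return 0x0A;  /* linefeed */
    case 'r': return 0x0D;  /* carriage return */
    case 't': return 0x09;  /* tab */
    default:  return -1;
    }
}

/* \cx: upper-case a lower case x, then invert bit 6 (hex 40).
 * So \ca and \cA both give 0x01, while \c{ gives 0x3B. */
int control_escape_value(char x)
{
    unsigned char c = (unsigned char)toupper((unsigned char)x);
    return c ^ 0x40;
}
```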
Line 277 as just described only when it is followed by two hexa	Line 277 as just described only when it is followed by two hexa
Otherwise, it matches a literal "x" character. In JavaScript mode, support for	Otherwise, it matches a literal "x" character. In JavaScript mode, support for
code points greater than 256 is provided by \eu, which must be followed by	code points greater than 256 is provided by \eu, which must be followed by
four hexadecimal digits; otherwise it matches a literal "u" character.	four hexadecimal digits; otherwise it matches a literal "u" character.
	Character codes specified by \eu in JavaScript mode are constrained in the same
	way as those specified by \ex in non-JavaScript mode.
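The JavaScript-mode rule described above (\ex needs two hexadecimal digits, \eu needs four, otherwise the letter is literal) can be sketched in C. The function js_hex_escape is hypothetical; it does not enforce the value constraints that PCRE applies afterwards, such as the limit on code points in non-UTF mode.

```c
#include <ctype.h>

/* Hypothetical helper. s points just after the backslash, at 'x' or 'u'.
 * If the next 2 (for \x) or 4 (for \u) characters are hex digits, the value
 * is stored in *value and 1 + digit count is returned. Otherwise the escape
 * is a literal letter: *value is set to 'x' or 'u' and 1 is returned.
 * Returns 0 for any other letter. */
int js_hex_escape(const char *s, unsigned int *value)
{
    int ndigits;
    if (s[0] == 'x') ndigits = 2;
    else if (s[0] == 'u') ndigits = 4;
    else return 0;

    unsigned int v = 0;
    for (int i = 1; i <= ndigits; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isxdigit(c)) {            /* too few hex digits: literal letter */
            *value = (unsigned char)s[0];
            return 1;
        }
        v = v * 16 + (unsigned int)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    *value = v;
    return 1 + ndigits;
}
```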
.P	.P
Characters whose value is less than 256 can be defined by either of the two	Characters whose value is less than 256 can be defined by either of the two
syntaxes for \ex (or by \eu in JavaScript mode). There is no difference in the	syntaxes for \ex (or by \eu in JavaScript mode). There is no difference in the
Line 399 Another use of backslash is for specifying generic cha	Line 401 Another use of backslash is for specifying generic cha
.sp	.sp
\ed any decimal digit	\ed any decimal digit
\eD any character that is not a decimal digit	\eD any character that is not a decimal digit
\eh any horizontal whitespace character	\eh any horizontal white space character
\eH any character that is not a horizontal whitespace character	\eH any character that is not a horizontal white space character
\es any whitespace character	\es any white space character
\eS any character that is not a whitespace character	\eS any character that is not a white space character
\ev any vertical whitespace character	\ev any vertical white space character
\eV any character that is not a vertical whitespace character	\eV any character that is not a vertical white space character
\ew any "word" character	\ew any "word" character
\eW any "non-word" character	\eW any "non-word" character
.sp	.sp
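A few of the generic character types in the table above can be sketched as C predicates on code points. The names are hypothetical. \ed and \ew assume PCRE's default character tables with PCRE_UCP unset, so only ASCII qualifies; the \eh list is the horizontal space list from the full pcrepattern page.

```c
#include <stdbool.h>

/* \d with PCRE's default tables and PCRE_UCP unset: ASCII digits only. */
bool matches_d(unsigned int c) { return c >= '0' && c <= '9'; }

/* \w under the same assumptions: ASCII letter, digit, or underscore. */
bool matches_w(unsigned int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           matches_d(c) || c == '_';
}

/* \h: horizontal white space, a fixed list independent of locale tables. */
bool matches_h(unsigned int c)
{
    switch (c) {
    case 0x0009: case 0x0020: case 0x00A0: case 0x1680: case 0x180E:
    case 0x202F: case 0x205F: case 0x3000:
        return true;
    default:
        return c >= 0x2000 && c <= 0x200A;   /* en quad .. hair space */
    }
}
/* \D, \W and \H are the negations: !matches_d(c), and so on. */
```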
Line 493 The vertical space characters are:	Line 495 The vertical space characters are:
.sp	.sp
U+000A Linefeed	U+000A Linefeed
U+000B Vertical tab	U+000B Vertical tab
U+000C Formfeed	U+000C Form feed
U+000D Carriage return	U+000D Carriage return
U+0085 Next line	U+0085 Next line
U+2028 Line separator	U+2028 Line separator
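The vertical space list above translates directly into a C predicate. The name matches_v is hypothetical. The full pcrepattern list continues past this hunk with U+2029 (paragraph separator), which is included here.

```c
#include <stdbool.h>

/* Code points matched by \v; \V matches everything this rejects. */
bool matches_v(unsigned int cp)
{
    switch (cp) {
    case 0x000A: /* linefeed */
    case 0x000B: /* vertical tab */
    case 0x000C: /* form feed */
    case 0x000D: /* carriage return */
    case 0x0085: /* next line */
    case 0x2028: /* line separator */
    case 0x2029: /* paragraph separator */
        return true;
    default:
        return false;
    }
}
```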
Line 520 below.	Line 522 below.
.\"	.\"
This particular group matches either the two-character sequence CR followed by	This particular group matches either the two-character sequence CR followed by
LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,	LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,
U+000B), FF (formfeed, U+000C), CR (carriage return, U+000D), or NEL (next	U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next
line, U+0085). The two-character sequence is treated as a single unit that	line, U+0085). The two-character sequence is treated as a single unit that
cannot be split.	cannot be split.
.P	.P
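The \eR group described above can be sketched for an 8-bit, non-UTF subject, where NEL is the single byte 0x85. The function match_R is hypothetical. In UTF modes PCRE also accepts U+2028 and U+2029, which this byte-level sketch does not handle.

```c
#include <stddef.h>

/* Returns the number of bytes \R would match at s[i]: 2 for CR LF,
 * 1 for LF, VT, FF, CR or NEL, and 0 for no match. CR LF is tested first
 * and consumed as one unit, so it is never split into CR then LF. */
size_t match_R(const unsigned char *s, size_t len, size_t i)
{
    if (i >= len) return 0;
    if (s[i] == 0x0D && i + 1 < len && s[i + 1] == 0x0A) return 2; /* CR LF */
    switch (s[i]) {
    case 0x0A: case 0x0B: case 0x0C: case 0x0D: case 0x85:
        return 1;
    default:
        return 0;
    }
}
```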
Line 596 Armenian,	Line 598 Armenian,
Avestan,	Avestan,
Balinese,	Balinese,
Bamum,	Bamum,
	Batak,
Bengali,	Bengali,
Bopomofo,	Bopomofo,
	Brahmi,
Braille,	Braille,
Buginese,	Buginese,
Buhid,	Buhid,
Canadian_Aboriginal,	Canadian_Aboriginal,
Carian,	Carian,
	Chakma,
Cham,	Cham,
Cherokee,	Cherokee,
Common,	Common,
Line 645 Lisu,	Line 650 Lisu,
Lycian,	Lycian,
Lydian,	Lydian,
Malayalam,	Malayalam,
	Mandaic,
Meetei_Mayek,	Meetei_Mayek,
	Meroitic_Cursive,
	Meroitic_Hieroglyphs,
	Miao,
Mongolian,	Mongolian,
Myanmar,	Myanmar,
New_Tai_Lue,	New_Tai_Lue,
Line 664 Rejang,	Line 673 Rejang,
Runic,	Runic,
Samaritan,	Samaritan,
Saurashtra,	Saurashtra,
	Sharada,
Shavian,	Shavian,
Sinhala,	Sinhala,
	Sora_Sompeng,
Sundanese,	Sundanese,
Syloti_Nagri,	Syloti_Nagri,
Syriac,	Syriac,
Line 674 Tagbanwa,	Line 685 Tagbanwa,
Tai_Le,	Tai_Le,
Tai_Tham,	Tai_Tham,
Tai_Viet,	Tai_Viet,
	Takri,
Tamil,	Tamil,
Telugu,	Telugu,
Thaana,	Thaana,
Line 809 PCRE_UCP is set. They are:	Line 821 PCRE_UCP is set. They are:
Xwd Any Perl "word" character	Xwd Any Perl "word" character
.sp	.sp
Xan matches characters that have either the L (letter) or the N (number)	Xan matches characters that have either the L (letter) or the N (number)
property. Xps matches the characters tab, linefeed, vertical tab, formfeed, or	property. Xps matches the characters tab, linefeed, vertical tab, form feed, or
carriage return, and any other character that has the Z (separator) property.	carriage return, and any other character that has the Z (separator) property.
Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the	Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the
same characters as Xan, plus underscore.	same characters as Xan, plus underscore.
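The four PCRE-specific properties above can be illustrated for the ASCII range only. Within ASCII, the only letters (L) are A-Z and a-z, the only numbers (N) are 0-9, and the only separator (Z) is the space character. The function names are hypothetical, and code points of 128 and above, which need real Unicode property data, simply return false here.

```c
#include <stdbool.h>

static bool ascii_L(unsigned int c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
static bool ascii_N(unsigned int c) { return c >= '0' && c <= '9'; }

/* Xan: letter or number. Xwd: Xan plus underscore. */
bool xan(unsigned int c) { return ascii_L(c) || ascii_N(c); }
bool xwd(unsigned int c) { return xan(c) || c == '_'; }

/* Xps: tab, linefeed, vertical tab, form feed, carriage return,
 * plus anything with the Z property (only space within ASCII). */
bool xps(unsigned int c)
{
    return c == 0x09 || c == 0x0A || c == 0x0B || c == 0x0C || c == 0x0D || c == ' ';
}

/* Xsp: the same as Xps but without vertical tab. */
bool xsp(unsigned int c) { return xps(c) && c != 0x0B; }
```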
Line 1010 used. Because \eC breaks up characters into individual	Line 1022 used. Because \eC breaks up characters into individual
unit with \eC in a UTF mode means that the rest of the string may start with a	unit with \eC in a UTF mode means that the rest of the string may start with a
malformed UTF character. This has undefined results, because PCRE assumes that	malformed UTF character. This has undefined results, because PCRE assumes that
it is dealing with valid UTF strings (and by default it checks this at the	it is dealing with valid UTF strings (and by default it checks this at the
start of processing unless the PCRE_NO_UTF8_CHECK option is used).	start of processing unless the PCRE_NO_UTF8_CHECK or PCRE_NO_UTF16_CHECK option
	is used).
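The hazard described above can be made concrete for UTF-8: \eC consumes one byte, so the rest of the subject may begin with a continuation byte (binary 10xxxxxx), that is, a malformed character. The hypothetical helper below reports whether a byte offset is at the start of a character.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if offset i in the UTF-8 string s[0..len) is a character boundary. */
bool at_utf8_char_start(const unsigned char *s, size_t len, size_t i)
{
    if (i >= len) return i == len;      /* the end of the string counts */
    return (s[i] & 0xC0) != 0x80;       /* not a continuation byte */
}
```

For "\exC3\exA9x" (e-acute followed by x), offset 1 is not a boundary, which is where a single \eC would leave the match.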
.P	.P
PCRE does not allow \eC to appear in lookbehind assertions	PCRE does not allow \eC to appear in lookbehind assertions
.\" HTML <a href="#lookbehind">	.\" HTML <a href="#lookbehind">
Line 1832 Because there may be many capturing parentheses in a p	Line 1845 Because there may be many capturing parentheses in a p
following a backslash are taken as part of a potential back reference number.	following a backslash are taken as part of a potential back reference number.
If the pattern continues with a digit character, some delimiter must be used to	If the pattern continues with a digit character, some delimiter must be used to
terminate the back reference. If the PCRE_EXTENDED option is set, this can be	terminate the back reference. If the PCRE_EXTENDED option is set, this can be
whitespace. Otherwise, the \eg{ syntax or an empty comment (see	white space. Otherwise, the \eg{ syntax or an empty comment (see
.\" HTML <a href="#comments">	.\" HTML <a href="#comments">
.\" </a>	.\" </a>
"Comments"	"Comments"
Line 2189 subroutines that can be referenced from elsewhere. (Th	Line 2202 subroutines that can be referenced from elsewhere. (Th
subroutines	subroutines
.\"	.\"
is described below.) For example, a pattern to match an IPv4 address such as	is described below.) For example, a pattern to match an IPv4 address such as
"192.168.23.245" could be written like this (ignore whitespace and line	"192.168.23.245" could be written like this (ignore white space and line
breaks):	breaks):
.sp	.sp
(?(DEFINE) (?<byte> 2[0-4]\ed \| 25[0-5] \| 1\ed\ed \| [1-9]?\ed) )	(?(DEFINE) (?<byte> 2[0-4]\ed \| 25[0-5] \| 1\ed\ed \| [1-9]?\ed) )
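The "byte" group above accepts 0 to 255 written without leading zeros. A C sketch of the same check, plus a whole-string dotted-quad test, may make the alternatives easier to follow. Both function names are hypothetical; unlike a regex search, this sketch only validates an entire string and does not locate an address inside longer text.

```c
#include <stdbool.h>
#include <string.h>

/* Mirrors 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d on exactly s[0..n). */
static bool byte_group(const char *s, size_t n)
{
    if (n < 1 || n > 3) return false;
    for (size_t i = 0; i < n; i++)
        if (s[i] < '0' || s[i] > '9') return false;
    if (n > 1 && s[0] == '0') return false;     /* [1-9]?\d forbids "05" */
    int v = 0;
    for (size_t i = 0; i < n; i++) v = v * 10 + (s[i] - '0');
    return v <= 255;
}

/* Four byte groups separated by dots, covering the whole string. */
bool valid_ipv4(const char *s)
{
    for (int part = 0; part < 4; part++) {
        const char *end = part < 3 ? strchr(s, '.') : s + strlen(s);
        if (end == NULL || !byte_group(s, (size_t)(end - s))) return false;
        s = end + (part < 3);                   /* step over the dot */
    }
    return true;
}
```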
Line 2588 exception: the name from a (*MARK), (*PRUNE), or (*THE	Line 2601 exception: the name from a (*MARK), (*PRUNE), or (*THE
a successful positive assertion \fIis\fP passed back when a match succeeds	a successful positive assertion \fIis\fP passed back when a match succeeds
(compare capturing parentheses in assertions). Note that such subpatterns are	(compare capturing parentheses in assertions). Note that such subpatterns are
processed as anchored at the point where they are tested. Note also that Perl's	processed as anchored at the point where they are tested. Note also that Perl's
treatment of subroutines is different in some cases.	treatment of subroutines and assertions is different in some cases.
.P	.P
The new verbs make use of what was previously invalid syntax: an opening	The new verbs make use of what was previously invalid syntax: an opening
parenthesis followed by an asterisk. They are generally of the form	parenthesis followed by an asterisk. They are generally of the form
(*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,	(*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,
depending on whether or not an argument is present. A name is any sequence of	depending on whether or not an argument is present. A name is any sequence of
characters that does not include a closing parenthesis. If the name is empty,	characters that does not include a closing parenthesis. The maximum length of
that is, if the closing parenthesis immediately follows the colon, the effect	a name is 255 in the 8-bit library and 65535 in the 16-bit library. If the name
is as if the colon were not there. Any number of these verbs may occur in a	is empty, that is, if the closing parenthesis immediately follows the colon,
pattern.	the effect is as if the colon were not there. Any number of these verbs may
.P	occur in a pattern.
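The verb syntax described above, (*VERB) or (*VERB:NAME), with a name that may contain anything except a closing parenthesis, can be sketched as a small C parser. parse_verb and MAX_VERB_NAME_8BIT are hypothetical names; the sketch applies the 8-bit library's 255-character limit and does not check that the verb itself is one PCRE recognizes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_VERB_NAME_8BIT 255  /* name limit for the 8-bit library */

/* Splits "(*VERB)" or "(*VERB:NAME)". An empty name after ':' is treated
 * as if the colon were absent (name NULL, name_len 0). The first ':'
 * separates verb and name, so the name may itself contain ':'. */
bool parse_verb(const char *s, const char **verb, size_t *verb_len,
                const char **name, size_t *name_len)
{
    if (s[0] != '(' || s[1] != '*') return false;
    const char *close = strchr(s + 2, ')');
    if (close == NULL) return false;
    const char *colon = memchr(s + 2, ':', (size_t)(close - (s + 2)));
    *verb = s + 2;
    if (colon == NULL) {                      /* plain (*VERB) */
        *verb_len = (size_t)(close - *verb);
        *name = NULL;
        *name_len = 0;
        return true;
    }
    *verb_len = (size_t)(colon - *verb);
    *name_len = (size_t)(close - (colon + 1));
    *name = *name_len ? colon + 1 : NULL;
    return *name_len <= MAX_VERB_NAME_8BIT;
}
```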
	.
	.
	.\" HTML <a name="nooptimize"></a>
	.SS "Optimizations that affect backtracking verbs"
	.rs
	.sp
PCRE contains some optimizations that are used to speed up matching by running	PCRE contains some optimizations that are used to speed up matching by running
some checks at the start of each match attempt. For example, it may know the	some checks at the start of each match attempt. For example, it may know the
minimum length of matching subject, or that a particular character must be	minimum length of matching subject, or that a particular character must be
Line 2606 present. When one of these optimizations suppresses th	Line 2625 present. When one of these optimizations suppresses th
included backtracking verbs will not, of course, be processed. You can suppress	included backtracking verbs will not, of course, be processed. You can suppress
the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option	the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option
when calling \fBpcre_compile()\fP or \fBpcre_exec()\fP, or by starting the	when calling \fBpcre_compile()\fP or \fBpcre_exec()\fP, or by starting the
pattern with (*NO_START_OPT).	pattern with (*NO_START_OPT). There is more discussion of this option in the
	section entitled
	.\" HTML <a href="pcreapi.html#execoptions">
	.\" </a>
	"Option bits for \fBpcre_exec()\fP"
	.\"
	in the
	.\" HREF
	\fBpcreapi\fP
	.\"
	documentation.
.P	.P
Experiments with Perl suggest that it too has similar optimizations, sometimes	Experiments with Perl suggest that it too has similar optimizations, sometimes
leading to anomalous results.	leading to anomalous results.
Line 2695 After a partial match or a failed match, the name of t	Line 2724 After a partial match or a failed match, the name of t
No match, mark = B	No match, mark = B
.sp	.sp
Note that in this unanchored example the mark is retained from the match	Note that in this unanchored example the mark is retained from the match
attempt that started at the letter "X". Subsequent match attempts starting at	attempt that started at the letter "X" in the subject. Subsequent match
"P" and then with an empty string do not get as far as the (*MARK) item, but	attempts starting at "P" and then with an empty string do not get as far as the
nevertheless do not reset it.	(*MARK) item, but nevertheless do not reset it.
	.P
	If you are interested in (*MARK) values after failed matches, you should
	probably set the PCRE_NO_START_OPTIMIZE option
	.\" HTML <a href="#nooptimize">
	.\" </a>
	(see above)
	.\"
	to ensure that the match is always attempted.
.	.
.	.
.SS "Verbs that act after backtracking"	.SS "Verbs that act after backtracking"
Line 2876 Cambridge CB2 3QH, England.	Line 2913 Cambridge CB2 3QH, England.
.rs	.rs
.sp	.sp
.nf	.nf
Last updated: 09 January 2012	Last updated: 17 June 2012
Copyright (c) 1997-2012 University of Cambridge.	Copyright (c) 1997-2012 University of Cambridge.
.fi	.fi
