Annotation of embedaddon/pcre/doc/html/pcresyntax.html, revision 1.1.1.3
1.1 misho 1: <html>
2: <head>
3: <title>pcresyntax specification</title>
4: </head>
5: <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6: <h1>pcresyntax man page</h1>
7: <p>
8: Return to the <a href="index.html">PCRE index page</a>.
9: </p>
10: <p>
11: This page is part of the PCRE HTML documentation. It was generated automatically
12: from the original man page. If there is any nonsense in it, please consult the
13: man page, in case the conversion went wrong.
14: <br>
15: <ul>
16: <li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
17: <li><a name="TOC2" href="#SEC2">QUOTING</a>
18: <li><a name="TOC3" href="#SEC3">CHARACTERS</a>
19: <li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
20: <li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
21: <li><a name="TOC6" href="#SEC6">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
22: <li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
23: <li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
24: <li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
25: <li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
26: <li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a>
27: <li><a name="TOC12" href="#SEC12">ALTERNATION</a>
28: <li><a name="TOC13" href="#SEC13">CAPTURING</a>
29: <li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
30: <li><a name="TOC15" href="#SEC15">COMMENT</a>
31: <li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
32: <li><a name="TOC17" href="#SEC17">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
33: <li><a name="TOC18" href="#SEC18">BACKREFERENCES</a>
34: <li><a name="TOC19" href="#SEC19">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
35: <li><a name="TOC20" href="#SEC20">CONDITIONAL PATTERNS</a>
36: <li><a name="TOC21" href="#SEC21">BACKTRACKING CONTROL</a>
37: <li><a name="TOC22" href="#SEC22">NEWLINE CONVENTIONS</a>
38: <li><a name="TOC23" href="#SEC23">WHAT \R MATCHES</a>
39: <li><a name="TOC24" href="#SEC24">CALLOUTS</a>
40: <li><a name="TOC25" href="#SEC25">SEE ALSO</a>
41: <li><a name="TOC26" href="#SEC26">AUTHOR</a>
42: <li><a name="TOC27" href="#SEC27">REVISION</a>
43: </ul>
44: <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
45: <P>
46: The full syntax and semantics of the regular expressions that are supported by
47: PCRE are described in the
48: <a href="pcrepattern.html"><b>pcrepattern</b></a>
1.1.1.2 misho 49: documentation. This document contains a quick-reference summary of the syntax.
1.1 misho 50: </P>
51: <br><a name="SEC2" href="#TOC1">QUOTING</a><br>
52: <P>
53: <pre>
54: \x where x is non-alphanumeric is a literal x
55: \Q...\E treat enclosed characters as literal
56: </PRE>
57: </P>
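<P>
As a rough illustration of the first rule above, the sketch below shows why
non-alphanumeric characters need a backslash to be taken literally. It uses
Python's <b>re</b> module as a stand-in, which is an assumption of this example
and not part of PCRE; <b>re.escape()</b> is a Python helper, and Python does not
support \Q...\E.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE; re.escape() is a
# Python helper that backslash-escapes metacharacters, not a PCRE feature.
literal = "1+1=2?"
escaped = re.escape(literal)

# Unescaped, "+" and "?" act as quantifiers, so the text does not match itself.
unescaped_hit = re.search(literal, "1+1=2?")

# Escaped, every character is matched literally.
escaped_hit = re.search(escaped, "1+1=2?")

# The unescaped pattern matches a different string instead.
other_hit = re.search(literal, "11=2")
```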
58: <br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
59: <P>
60: <pre>
61: \a alarm, that is, the BEL character (hex 07)
62: \cx "control-x", where x is any ASCII character
63: \e escape (hex 1B)
1.1.1.3 ! misho 64: \f form feed (hex 0C)
1.1 misho 65: \n newline (hex 0A)
66: \r carriage return (hex 0D)
67: \t tab (hex 09)
68: \ddd character with octal code ddd, or backreference
69: \xhh character with hex code hh
70: \x{hhh..} character with hex code hhh..
71: </PRE>
72: </P>
73: <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
74: <P>
75: <pre>
76: . any character except newline;
77: in dotall mode, any character whatsoever
1.1.1.2 misho 78: \C one data unit, even in UTF mode (best avoided)
1.1 misho 79: \d a decimal digit
80: \D a character that is not a decimal digit
1.1.1.3 ! misho 81: \h a horizontal white space character
! 82: \H a character that is not a horizontal white space character
1.1 misho 83: \N a character that is not a newline
84: \p{<i>xx</i>} a character with the <i>xx</i> property
85: \P{<i>xx</i>} a character without the <i>xx</i> property
86: \R a newline sequence
1.1.1.3 ! misho 87: \s a white space character
! 88: \S a character that is not a white space character
! 89: \v a vertical white space character
! 90: \V a character that is not a vertical white space character
1.1 misho 91: \w a "word" character
92: \W a "non-word" character
93: \X an extended Unicode sequence
94: </pre>
95: In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII
1.1.1.2 misho 96: characters, even in a UTF mode. However, this can be changed by setting the
1.1 misho 97: PCRE_UCP option.
98: </P>
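<P>
The difference between ASCII-only and Unicode-aware matching of \d can be
sketched as follows. This is only an analogy: it uses Python's <b>re</b> module
(an assumption of this example), where the <b>re.ASCII</b> flag behaves roughly
like PCRE's default and Python's own default behaves roughly like PCRE with
PCRE_UCP set.
</P>

```python
import re

# Assumption: Python's re module is used as an analogy for PCRE behaviour.
# The subject holds ASCII "7" followed by U+0663 ARABIC-INDIC DIGIT THREE.
subject = "7\u0663"

# ASCII-only \d, comparable to PCRE without PCRE_UCP.
ascii_digits = re.findall(r"\d", subject, re.ASCII)

# Unicode-aware \d, comparable to PCRE with PCRE_UCP.
unicode_digits = re.findall(r"\d", subject)
```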
99: <br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
100: <P>
101: <pre>
102: C Other
103: Cc Control
104: Cf Format
105: Cn Unassigned
106: Co Private use
107: Cs Surrogate
108:
109: L Letter
110: Ll Lower case letter
111: Lm Modifier letter
112: Lo Other letter
113: Lt Title case letter
114: Lu Upper case letter
115: L& Ll, Lu, or Lt
116:
117: M Mark
118: Mc Spacing mark
119: Me Enclosing mark
120: Mn Non-spacing mark
121:
122: N Number
123: Nd Decimal number
124: Nl Letter number
125: No Other number
126:
127: P Punctuation
128: Pc Connector punctuation
129: Pd Dash punctuation
130: Pe Close punctuation
131: Pf Final punctuation
132: Pi Initial punctuation
133: Po Other punctuation
134: Ps Open punctuation
135:
136: S Symbol
137: Sc Currency symbol
138: Sk Modifier symbol
139: Sm Mathematical symbol
140: So Other symbol
141:
142: Z Separator
143: Zl Line separator
144: Zp Paragraph separator
145: Zs Space separator
146: </PRE>
147: </P>
148: <br><a name="SEC6" href="#TOC1">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
149: <P>
150: <pre>
151: Xan Alphanumeric: union of properties L and N
152: Xps POSIX space: property Z or tab, NL, VT, FF, CR
153: Xsp Perl space: property Z or tab, NL, FF, CR
154: Xwd Perl word: property Xan or underscore
155: </PRE>
156: </P>
157: <br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
158: <P>
159: Arabic,
160: Armenian,
161: Avestan,
162: Balinese,
163: Bamum,
1.1.1.3 ! misho 164: Batak,
1.1 misho 165: Bengali,
166: Bopomofo,
1.1.1.3 ! misho 167: Brahmi,
1.1 misho 168: Braille,
169: Buginese,
170: Buhid,
171: Canadian_Aboriginal,
172: Carian,
1.1.1.3 ! misho 173: Chakma,
1.1 misho 174: Cham,
175: Cherokee,
176: Common,
177: Coptic,
178: Cuneiform,
179: Cypriot,
180: Cyrillic,
181: Deseret,
182: Devanagari,
183: Egyptian_Hieroglyphs,
184: Ethiopic,
185: Georgian,
186: Glagolitic,
187: Gothic,
188: Greek,
189: Gujarati,
190: Gurmukhi,
191: Han,
192: Hangul,
193: Hanunoo,
194: Hebrew,
195: Hiragana,
196: Imperial_Aramaic,
197: Inherited,
198: Inscriptional_Pahlavi,
199: Inscriptional_Parthian,
200: Javanese,
201: Kaithi,
202: Kannada,
203: Katakana,
204: Kayah_Li,
205: Kharoshthi,
206: Khmer,
207: Lao,
208: Latin,
209: Lepcha,
210: Limbu,
211: Linear_B,
212: Lisu,
213: Lycian,
214: Lydian,
215: Malayalam,
1.1.1.3 ! misho 216: Mandaic,
1.1 misho 217: Meetei_Mayek,
1.1.1.3 ! misho 218: Meroitic_Cursive,
! 219: Meroitic_Hieroglyphs,
! 220: Miao,
1.1 misho 221: Mongolian,
222: Myanmar,
223: New_Tai_Lue,
224: Nko,
225: Ogham,
226: Old_Italic,
227: Old_Persian,
228: Old_South_Arabian,
229: Old_Turkic,
230: Ol_Chiki,
231: Oriya,
232: Osmanya,
233: Phags_Pa,
234: Phoenician,
235: Rejang,
236: Runic,
237: Samaritan,
238: Saurashtra,
1.1.1.3 ! misho 239: Sharada,
1.1 misho 240: Shavian,
241: Sinhala,
1.1.1.3 ! misho 242: Sora_Sompeng,
1.1 misho 243: Sundanese,
244: Syloti_Nagri,
245: Syriac,
246: Tagalog,
247: Tagbanwa,
248: Tai_Le,
249: Tai_Tham,
250: Tai_Viet,
1.1.1.3 ! misho 251: Takri,
1.1 misho 252: Tamil,
253: Telugu,
254: Thaana,
255: Thai,
256: Tibetan,
257: Tifinagh,
258: Ugaritic,
259: Vai,
260: Yi.
261: </P>
262: <br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
263: <P>
264: <pre>
265: [...] positive character class
266: [^...] negative character class
267: [x-y] range (can be used for hex characters)
268: [[:xxx:]] positive POSIX named set
269: [[:^xxx:]] negative POSIX named set
270:
271: alnum alphanumeric
272: alpha alphabetic
273: ascii 0-127
274: blank space or tab
275: cntrl control character
276: digit decimal digit
277: graph printing, excluding space
278: lower lower case letter
279: print printing, including space
280: punct printing, excluding alphanumeric
1.1.1.3 ! misho 281: space white space
1.1 misho 282: upper upper case letter
283: word same as \w
284: xdigit hexadecimal digit
285: </pre>
286: In PCRE, POSIX character set names recognize only ASCII characters by default,
287: but some of them use Unicode properties if PCRE_UCP is set. You can use
288: \Q...\E inside a character class.
289: </P>
290: <br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
291: <P>
292: <pre>
293: ? 0 or 1, greedy
294: ?+ 0 or 1, possessive
295: ?? 0 or 1, lazy
296: * 0 or more, greedy
297: *+ 0 or more, possessive
298: *? 0 or more, lazy
299: + 1 or more, greedy
300: ++ 1 or more, possessive
301: +? 1 or more, lazy
302: {n} exactly n
303: {n,m} at least n, no more than m, greedy
304: {n,m}+ at least n, no more than m, possessive
305: {n,m}? at least n, no more than m, lazy
306: {n,} n or more, greedy
307: {n,}+ n or more, possessive
308: {n,}? n or more, lazy
309: </PRE>
310: </P>
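<P>
A minimal sketch of the difference between greedy and lazy quantifiers. It uses
Python's <b>re</b> module, which is an assumption of this example; the
quantifiers shown behave the same way there as in PCRE. Possessive quantifiers
are left out because older Python versions do not support them.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE for this syntax subset.
subject = "<b>bold</b>"

# Greedy: takes as much as possible, then backs off only as needed.
greedy = re.search(r"<.+>", subject).group()     # "<b>bold</b>"

# Lazy: takes as little as possible.
lazy = re.search(r"<.+?>", subject).group()      # "<b>"

# {n,m} greedy versus {n,m}? lazy on a run of five digits.
greedy_runs = re.findall(r"\d{2,4}", "12345")    # ["1234"]
lazy_runs = re.findall(r"\d{2,4}?", "12345")     # ["12", "34"]
```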
311: <br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
312: <P>
313: <pre>
314: \b word boundary
315: \B not a word boundary
316: ^ start of subject
317: also after internal newline in multiline mode
318: \A start of subject
319: $ end of subject
320: also before newline at end of subject
321: also before internal newline in multiline mode
322: \Z end of subject
323: also before newline at end of subject
324: \z end of subject
325: \G first matching position in subject
326: </PRE>
327: </P>
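<P>
The effect of multiline mode on ^ and the behaviour of $ before a final newline
can be sketched as below. Python's <b>re</b> module is used as a stand-in, which
is an assumption of this example. \Z is deliberately not shown, because Python's
\Z matches only at the very end of the string, like PCRE's \z.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE for these anchors.
subject = "first\nsecond\n"

# Without multiline mode, ^ matches only at the start of the subject.
default_starts = re.findall(r"^\w+", subject)

# In multiline mode, ^ also matches after each internal newline.
multiline_starts = re.findall(r"^\w+", subject, re.MULTILINE)

# $ matches before a newline at the end of the subject.
dollar_hit = re.search(r"second$", subject)

# \b requires a word boundary on both sides of "cat".
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
```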
328: <br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br>
329: <P>
330: <pre>
331: \K reset start of match
332: </PRE>
333: </P>
334: <br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
335: <P>
336: <pre>
337: expr|expr|expr...
338: </PRE>
339: </P>
340: <br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
341: <P>
342: <pre>
343: (...) capturing group
344: (?<name>...) named capturing group (Perl)
345: (?'name'...) named capturing group (Perl)
346: (?P<name>...) named capturing group (Python)
347: (?:...) non-capturing group
348: (?|...) non-capturing group; reset group numbers for
349: capturing groups in each alternative
350: </PRE>
351: </P>
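<P>
A short sketch of how named, numbered, and non-capturing groups interact. It
uses Python's <b>re</b> module (an assumption of this example) and therefore the
Python-style named group syntax from the table above. The date string is made
up for illustration.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE; the subject is made up.
match = re.match(r"(?P<year>\d{4})-(?:\d{2})-(\d{2})", "2012-01-10")

# The named group is also group 1; the non-capturing group gets no number,
# so the day is group 2.
year = match.group("year")
day = match.group(2)
all_groups = match.groups()
```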
352: <br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
353: <P>
354: <pre>
355: (?>...) atomic, non-capturing group
356: </PRE>
357: </P>
358: <br><a name="SEC15" href="#TOC1">COMMENT</a><br>
359: <P>
360: <pre>
361: (?#....) comment (not nestable)
362: </PRE>
363: </P>
364: <br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
365: <P>
366: <pre>
367: (?i) caseless
368: (?J) allow duplicate names
369: (?m) multiline
370: (?s) single line (dotall)
371: (?U) default ungreedy (lazy)
372: (?x) extended (ignore white space)
373: (?-...) unset option(s)
374: </pre>
375: The following are recognized only at the start of a pattern or after one of the
376: newline-setting options with similar syntax:
377: <pre>
378: (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
1.1.1.2 misho 379: (*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8)
380: (*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16)
1.1 misho 381: (*UCP) set PCRE_UCP (use Unicode properties for \d etc)
382: </PRE>
383: </P>
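<P>
The inline options (?i), (?s) and (?x) can be tried out as in the sketch below.
Python's <b>re</b> module is used as a stand-in, which is an assumption of this
example; it accepts these three options only at the start of a pattern, and it
does not support the (*...) items listed above.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE for these inline options.

# (?i) makes the match caseless.
caseless = re.search(r"(?i)pcre", "Uses PCRE").group()

# Without (?s), dot does not match a newline; with it, dot matches anything.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"(?s)a.b", "a\nb")

# (?x) ignores white space in the pattern, so it can be laid out freely.
extended = re.search(r"(?x) \d+ \s* px", "width: 12 px").group()
```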
384: <br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
385: <P>
386: <pre>
387: (?=...) positive look ahead
388: (?!...) negative look ahead
389: (?<=...) positive look behind
390: (?<!...) negative look behind
391: </pre>
392: Each top-level branch of a look behind must be of a fixed length.
393: </P>
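<P>
The four assertions can be sketched as follows, using Python's <b>re</b> module
as a stand-in (an assumption of this example). Python is stricter than the rule
stated above: it requires the whole look behind, not just each top-level branch,
to have a fixed length. The last lines show its rejection of a variable-length
look behind.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE for these assertions.

# Positive look ahead: a word only if a colon follows.
keys = re.findall(r"\w+(?=:)", "key: value")

# Negative look ahead: "foo" only if "bar" does not follow.
foos = re.findall(r"\bfoo(?!bar)", "foobar foobaz")

# Positive look behind: digits only if a dollar sign precedes them.
prices = re.findall(r"(?<=\$)\d+", "cost $40, qty 3")

# Negative look behind: a number not preceded by a dollar sign.
plain = re.findall(r"(?<!\$)\b\d+", "cost $40, qty 3")

# A look behind of variable length is rejected at compile time.
try:
    re.compile(r"(?<=a+)b")
    variable_length_accepted = True
except re.error:
    variable_length_accepted = False
```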
394: <br><a name="SEC18" href="#TOC1">BACKREFERENCES</a><br>
395: <P>
396: <pre>
397: \n reference by number (can be ambiguous)
398: \gn reference by number
399: \g{n} reference by number
400: \g{-n} relative reference by number
401: \k<name> reference by name (Perl)
402: \k'name' reference by name (Perl)
403: \g{name} reference by name (Perl)
404: \k{name} reference by name (.NET)
405: (?P=name) reference by name (Python)
406: </PRE>
407: </P>
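<P>
A brief sketch of a numbered and a named backreference. Python's <b>re</b>
module is used as a stand-in, which is an assumption of this example; of the
forms in the table above it supports only the plain numbered form and the
Python-style named form.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE for these two forms.

# \1 must match the same text that group 1 captured: a doubled word.
doubled = re.search(r"\b(\w+)\s+\1\b", "this is is a test").group(1)

# (?P=q) refers back to the named group q, so the closing quote
# has to be the same character as the opening one.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", 'say "hi" now').group()
```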
408: <br><a name="SEC19" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
409: <P>
410: <pre>
411: (?R) recurse whole pattern
412: (?n) call subpattern by absolute number
413: (?+n) call subpattern by relative number
414: (?-n) call subpattern by relative number
415: (?&name) call subpattern by name (Perl)
416: (?P>name) call subpattern by name (Python)
417: \g<name> call subpattern by name (Oniguruma)
418: \g'name' call subpattern by name (Oniguruma)
419: \g<n> call subpattern by absolute number (Oniguruma)
420: \g'n' call subpattern by absolute number (Oniguruma)
421: \g<+n> call subpattern by relative number (PCRE extension)
422: \g'+n' call subpattern by relative number (PCRE extension)
423: \g<-n> call subpattern by relative number (PCRE extension)
424: \g'-n' call subpattern by relative number (PCRE extension)
425: </PRE>
426: </P>
427: <br><a name="SEC20" href="#TOC1">CONDITIONAL PATTERNS</a><br>
428: <P>
429: <pre>
430: (?(condition)yes-pattern)
431: (?(condition)yes-pattern|no-pattern)
432:
433: (?(n)... absolute reference condition
434: (?(+n)... relative reference condition
435: (?(-n)... relative reference condition
436: (?(<name>)... named reference condition (Perl)
437: (?('name')... named reference condition (Perl)
438: (?(name)... named reference condition (PCRE)
439: (?(R)... overall recursion condition
440: (?(Rn)... specific group recursion condition
441: (?(R&name)... specific recursion condition
442: (?(DEFINE)... define subpattern for reference
443: (?(assert)... assertion condition
444: </PRE>
445: </P>
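<P>
An absolute reference condition can be sketched as below: a closing angle
bracket is required only if an opening one was captured. Python's <b>re</b>
module is used as a stand-in, which is an assumption of this example; it
supports the numbered and named reference conditions but not the recursion,
DEFINE, or assertion conditions.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE for this condition form.
# Group 1 optionally captures "<"; (?(1)>) demands ">" only if group 1 matched.
pattern = re.compile(r"^(<)?\w+(?(1)>)$")

subjects = ["<abc>", "abc", "<abc", "abc>"]
results = {s: bool(pattern.match(s)) for s in subjects}
```

<P>
Only the balanced and the bare forms are accepted; the two subjects with a
single bracket are rejected.
</P>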
446: <br><a name="SEC21" href="#TOC1">BACKTRACKING CONTROL</a><br>
447: <P>
448: The following act as soon as they are reached:
449: <pre>
450: (*ACCEPT) force successful match
451: (*FAIL) force backtrack; synonym (*F)
1.1.1.2 misho 452: (*MARK:NAME) set name to be passed back; synonym (*:NAME)
1.1 misho 453: </pre>
454: The following act only when a subsequent match failure causes a backtrack to
455: reach them. They all force a match failure, but they differ in what happens
456: afterwards. Those that advance the start-of-match point do so only if the
457: pattern is not anchored.
458: <pre>
459: (*COMMIT) overall failure, no advance of starting point
460: (*PRUNE) advance to next starting character
1.1.1.2 misho 461: (*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
462: (*SKIP) advance to current matching position
463: (*SKIP:NAME) advance to position corresponding to an earlier
464: (*MARK:NAME); if not found, the (*SKIP) is ignored
1.1 misho 465: (*THEN) local failure, backtrack to next alternation
1.1.1.2 misho 466: (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
1.1 misho 467: </PRE>
468: </P>
469: <br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br>
470: <P>
471: These are recognized only at the very start of the pattern or after a
1.1.1.2 misho 472: (*BSR_...), (*UTF8), (*UTF16) or (*UCP) option.
1.1 misho 473: <pre>
474: (*CR) carriage return only
475: (*LF) linefeed only
476: (*CRLF) carriage return followed by linefeed
477: (*ANYCRLF) all three of the above
478: (*ANY) any Unicode newline sequence
479: </PRE>
480: </P>
481: <br><a name="SEC23" href="#TOC1">WHAT \R MATCHES</a><br>
482: <P>
483: These are recognized only at the very start of the pattern or after a
1.1.1.2 misho 484: (*...) option that sets the newline convention or a UTF or UCP mode.
1.1 misho 485: <pre>
486: (*BSR_ANYCRLF) CR, LF, or CRLF
487: (*BSR_UNICODE) any Unicode newline sequence
488: </PRE>
489: </P>
490: <br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
491: <P>
492: <pre>
493: (?C) callout
494: (?Cn) callout with data n
495: </PRE>
496: </P>
497: <br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
498: <P>
499: <b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
500: <b>pcrematching</b>(3), <b>pcre</b>(3).
501: </P>
502: <br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
503: <P>
504: Philip Hazel
505: <br>
506: University Computing Service
507: <br>
508: Cambridge CB2 3QH, England.
509: <br>
510: </P>
511: <br><a name="SEC27" href="#TOC1">REVISION</a><br>
512: <P>
1.1.1.2 misho 513: Last updated: 10 January 2012
1.1 misho 514: <br>
1.1.1.2 misho 515: Copyright © 1997-2012 University of Cambridge.
1.1 misho 516: <br>
517: <p>
518: Return to the <a href="index.html">PCRE index page</a>.
519: </p>