<html>
<head>
<title>pcresyntax specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcresyntax man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC6" href="#SEC6">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
<li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a>
<li><a name="TOC12" href="#SEC12">ALTERNATION</a>
<li><a name="TOC13" href="#SEC13">CAPTURING</a>
<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
<li><a name="TOC15" href="#SEC15">COMMENT</a>
<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
<li><a name="TOC17" href="#SEC17">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC18" href="#SEC18">BACKREFERENCES</a>
<li><a name="TOC19" href="#SEC19">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC20" href="#SEC20">CONDITIONAL PATTERNS</a>
<li><a name="TOC21" href="#SEC21">BACKTRACKING CONTROL</a>
<li><a name="TOC22" href="#SEC22">NEWLINE CONVENTIONS</a>
<li><a name="TOC23" href="#SEC23">WHAT \R MATCHES</a>
<li><a name="TOC24" href="#SEC24">CALLOUTS</a>
<li><a name="TOC25" href="#SEC25">SEE ALSO</a>
<li><a name="TOC26" href="#SEC26">AUTHOR</a>
<li><a name="TOC27" href="#SEC27">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. This document contains a quick-reference summary of the syntax.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
<pre>
  \x         where x is non-alphanumeric is a literal x
  \Q...\E    treat enclosed characters as literal
</PRE>
</P>
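<P>
Python's <b>re</b> module, which is not PCRE, has no \Q...\E. The sketch below defines a hypothetical helper, <b>expand_quoting</b>, that rewrites such spans with <b>re.escape()</b>. It assumes two rules from the pcrepattern documentation: an unterminated \Q quotes everything to the end of the pattern, and an isolated \E is ignored. It is an illustration, not a complete pattern translator.
</P>

```python
import re


def expand_quoting(pattern):
    """Rewrite PCRE-style \\Q...\\E spans as escaped literals (hypothetical helper)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("\\Q", i):
            end = pattern.find("\\E", i + 2)
            if end == -1:
                end = len(pattern)  # no \E: quote to the end of the pattern
            out.append(re.escape(pattern[i + 2:end]))
            i = end + 2
        elif pattern.startswith("\\E", i):
            i += 2  # an isolated \E is ignored
        elif pattern[i] == "\\":
            out.append(pattern[i:i + 2])  # copy any other escape unchanged
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


print(expand_quoting(r"x\Q.*\Ey"))  # x\.\*y
```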
<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
<P>
<pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any ASCII character
  \e         escape (hex 1B)
  \f         form feed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \ddd       character with octal code ddd, or backreference
  \xhh       character with hex code hh
  \x{hhh..}  character with hex code hhh..
</PRE>
</P>
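<P>
The table does not say how \cx computes its value. According to the pcrepattern documentation, a lower case letter is first converted to upper case, and then bit 6 (hex 40) of the character is inverted, so \cA to \cZ become hex 01 to 1A. The hypothetical Python function <b>control_char</b> below is a minimal sketch of that rule for ASCII input only.
</P>

```python
def control_char(x):
    """Return the character denoted by PCRE's \\cx (hypothetical helper, ASCII only)."""
    if len(x) != 1 or ord(x) > 127:
        raise ValueError("\\c must be followed by one ASCII character")
    if "a" <= x <= "z":
        x = x.upper()          # lower case letters are converted to upper case
    return chr(ord(x) ^ 0x40)  # then bit 6 (hex 40) is inverted


print(hex(ord(control_char("m"))))  # 0xd, a carriage return
```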
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
  .          any character except newline;
               in dotall mode, any character whatsoever
  \C         one data unit, even in UTF mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal white space character
  \H         a character that is not a horizontal white space character
  \N         a character that is not a newline
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a white space character
  \S         a character that is not a white space character
  \v         a vertical white space character
  \V         a character that is not a vertical white space character
  \w         a "word" character
  \W         a "non-word" character
  \X         a Unicode extended grapheme cluster
</pre>
In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII
characters, even in a UTF mode. However, this can be changed by setting the
PCRE_UCP option.
</P>
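<P>
Python's <b>re</b> module, which is not PCRE, has the opposite default. For str patterns, \d, \s and \w are Unicode-aware, and the <b>re.ASCII</b> flag restricts them to ASCII. So Python's default loosely corresponds to PCRE with PCRE_UCP set, and <b>re.ASCII</b> to PCRE's default. The snippet below shows the difference on two non-ASCII characters.
</P>

```python
import re

ARABIC_THREE = "\u0663"  # ARABIC-INDIC DIGIT THREE, general category Nd
E_ACUTE = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE, general category Ll

# Python's default for str patterns is Unicode-aware, much like PCRE with PCRE_UCP.
digit_unicode = re.fullmatch(r"\d", ARABIC_THREE) is not None           # True
word_unicode = re.fullmatch(r"\w", E_ACUTE) is not None                 # True

# re.ASCII restricts the escapes to ASCII, much like PCRE's default.
digit_ascii = re.fullmatch(r"\d", ARABIC_THREE, re.ASCII) is not None   # False
word_ascii = re.fullmatch(r"\w", E_ACUTE, re.ASCII) is not None         # False
```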
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate

  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&#38;         Ll, Lu, or Lt

  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark

  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number

  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation

  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol

  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
</PRE>
</P>
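<P>
Python's <b>unicodedata.category()</b> reports the same two-letter general category codes. That makes it possible to sketch a rough stand-in for \p on a single character. <b>has_property</b> is a hypothetical name. Python's Unicode tables may also be a different version from the ones compiled into a given PCRE build, so results can differ for recently added characters.
</P>

```python
import unicodedata


def has_property(ch, prop):
    """Approximate \\p{prop} for a general category name (hypothetical helper)."""
    cat = unicodedata.category(ch)  # two-letter code such as "Lu" or "Nd"
    if prop == "L&":
        return cat in ("Ll", "Lu", "Lt")
    if len(prop) == 1:
        return cat[0] == prop       # one letter selects the whole major category
    return cat == prop


print(unicodedata.category("\u20ac"))  # Sc (EURO SIGN)
```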
<br><a name="SEC6" href="#TOC1">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, FF, CR
  Xuc        Universally-named character: one that can be
               represented by a Universal Character Name
  Xwd        Perl word: property Xan or underscore
</PRE>
</P>
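<P>
These special properties are combinations of general categories and a few extra characters, so they can be sketched in Python on top of <b>unicodedata</b>. The helper name <b>has_special_property</b> is hypothetical. The Xuc rule follows the pcrepattern description: the characters $, @ and ` (grave accent), plus every code point from U+00A0 upward except the surrogates U+D800 to U+DFFF.
</P>

```python
import unicodedata


def has_special_property(ch, prop):
    """Approximate PCRE's Xan, Xps, Xsp, Xuc and Xwd properties (hypothetical helper)."""
    major = unicodedata.category(ch)[0]  # first letter of the general category
    if prop == "Xan":
        return major in "LN"                          # letters and numbers
    if prop == "Xps":
        return major == "Z" or ch in "\t\n\x0b\f\r"   # POSIX space includes VT
    if prop == "Xsp":
        return major == "Z" or ch in "\t\n\f\r"       # Perl space excludes VT
    if prop == "Xwd":
        return major in "LN" or ch == "_"
    if prop == "Xuc":
        cp = ord(ch)
        return ch in "$@`" or (cp >= 0xA0 and not 0xD800 <= cp <= 0xDFFF)
    raise ValueError("unknown property: " + prop)


print(has_special_property("\x0b", "Xps"), has_special_property("\x0b", "Xsp"))  # True False
```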
<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
<P>
Arabic,
Armenian,
Avestan,
Balinese,
Bamum,
Batak,
Bengali,
Bopomofo,
Brahmi,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Chakma,
Cham,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Egyptian_Hieroglyphs,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Imperial_Aramaic,
Inherited,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Javanese,
Kaithi,
Kannada,
Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Lao,
Latin,
Lepcha,
Limbu,
Linear_B,
Lisu,
Lycian,
Lydian,
Malayalam,
Mandaic,
Meetei_Mayek,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Sinhala,
Sora_Sompeng,
Sundanese,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tai_Tham,
Tai_Viet,
Takri,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Vai,
Yi.
</P>
<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set

  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       white space
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
</pre>
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\Q...\E inside a character class.
</P>
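<P>
Python's <b>re</b> module does not understand [[:xxx:]]. The sketch below spells out the ASCII-only members of each named set listed above, which is PCRE's behaviour without PCRE_UCP when the default C-locale character tables are assumed. It then defines a hypothetical helper, <b>posix_class</b>, that turns a set name into an ordinary Python character class. Under that assumption, punct holds the printing characters that are neither alphanumeric nor space.
</P>

```python
import re
import string

# ASCII-only members of each POSIX named set (assumed C-locale definitions).
POSIX_ASCII = {
    "alnum": string.ascii_letters + string.digits,
    "alpha": string.ascii_letters,
    "ascii": "".join(chr(c) for c in range(128)),
    "blank": " \t",
    "cntrl": "".join(chr(c) for c in range(32)) + "\x7f",
    "digit": string.digits,
    "graph": "".join(chr(c) for c in range(33, 127)),   # printing, excluding space
    "lower": string.ascii_lowercase,
    "print": "".join(chr(c) for c in range(32, 127)),   # printing, including space
    "punct": string.punctuation,                        # graph minus alphanumerics
    "space": " \t\n\r\x0b\x0c",
    "upper": string.ascii_uppercase,
    "word": string.ascii_letters + string.digits + "_",
    "xdigit": string.hexdigits,
}


def posix_class(name, negate=False):
    """Translate [[:name:]] or [[:^name:]] into a Python character class (hypothetical helper)."""
    body = "".join(re.escape(c) for c in POSIX_ASCII[name])
    return "[" + ("^" if negate else "") + body + "]"


print(re.fullmatch(posix_class("xdigit") + "+", "c0ffee") is not None)  # True
```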
<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
</PRE>
</P>
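<P>
Python's <b>re</b> module, which is not PCRE, uses the same greedy and lazy quantifier syntax. From Python 3.11 it also accepts the possessive forms, so the three behaviours can be compared directly.
</P>

```python
import re
import sys

text = 'say "hi" and "bye"'

greedy = re.search(r'".+"', text).group()   # as much as possible, then backtrack
lazy = re.search(r'".+?"', text).group()    # as little as possible, then extend
print(greedy)  # "hi" and "bye"
print(lazy)    # "hi"

bounded = re.fullmatch(r"a{2,3}", "aaaa")   # None: at most three a's allowed

if sys.version_info >= (3, 11):            # possessive quantifiers need Python 3.11+
    # a++ takes every "a" and never gives one back, so nothing is left for the final a.
    print(re.fullmatch(r"a++a", "aaa"))    # None
```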
<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
                 also after internal newline in multiline mode
  \A          start of subject
  $           end of subject
                 also before newline at end of subject
                 also before internal newline in multiline mode
  \Z          end of subject
                 also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
</PRE>
</P>
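<P>
Python's <b>re</b> module, which is not PCRE, treats ^, $, \A and \b as this table describes. One difference matters here: Python's \Z matches only at the very end of the subject, which is what PCRE writes as \z. A short demonstration:
</P>

```python
import re

subject = "abc\n"

# $ may match just before a newline at the end of the subject, as in PCRE.
dollar = re.search(r"abc$", subject) is not None              # True
# Python's \Z is end of subject only, i.e. PCRE's \z, not PCRE's \Z.
strict_end = re.search(r"abc\Z", subject) is not None         # False

# In multiline mode ^ also matches after each internal newline; \A does not.
starts = re.findall(r"^\w+", "one\ntwo\nthree", re.MULTILINE)  # ['one', 'two', 'three']
first_only = re.findall(r"\A\w+", "one\ntwo\nthree", re.MULTILINE)  # ['one']

# \b matches between a word character and a non-word character (or an edge).
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")   # ['cat', 'cat']
```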
<br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br>
<P>
<pre>
  \K          reset start of match
</PRE>
</P>
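<P>
\K is useful where a look behind would be natural but the prefix has no fixed length. In PCRE, \d+-\K\w+ reports only the part after the hyphen as the match. Python's <b>re</b> module has neither \K nor variable-length look behind. The sketch below therefore gets a similar effect with a capturing group, reading the result from that group instead of from the whole match.
</P>

```python
import re

subject = "id 1234-abc"

# PCRE: \d+-\K\w+ would report just "abc". Python has no \K, and a look behind
# such as (?<=\d+-) is rejected because it is not fixed-length, so a capturing
# group stands in for the part of the match that follows \K.
m = re.search(r"\d+-(\w+)", subject)
print(m.group(1), m.start(1))  # abc 8
```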
<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
  expr|expr|expr...
</PRE>
</P>
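<P>
PCRE's normal (Perl-compatible) matching function tries alternatives from left to right. It keeps the first one that lets the rest of the pattern match, not the longest one. Python's <b>re</b> module, which is not PCRE, follows the same rule, so it can illustrate the point:
</P>

```python
import re

# The first alternative that lets the whole match succeed wins, even if a later one is longer.
first = re.match(r"cat|category", "category").group()          # 'cat'
reordered = re.match(r"category|cat", "category").group()      # 'category'

# A later alternative is tried only when the rest of the pattern fails with an earlier one.
anchored = re.fullmatch(r"cat|category", "category").group()   # 'category'
```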
<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
  (...)           capturing group
  (?&#60;name&#62;...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P&#60;name&#62;...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                    capturing groups in each alternative
</PRE>
</P>
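<P>
Python's <b>re</b> module, which is not PCRE, numbers (...) and (?:...) groups in the same way. It spells named groups only in the (?P&#60;name&#62;...) form and has no (?|...) branch reset. The second example shows what the branch reset would change.
</P>

```python
import re

m = re.match(r"(\d+)-(?:[a-z]+)-(\d+)", "12-ab-34")
print(m.groups())  # ('12', '34'): the (?:...) group takes no number

# Without a branch reset, each alternative's group has its own number, so a word
# lands in group 2. PCRE's (?|(\d+)|([a-z]+)) would put either one in group 1.
alt = re.fullmatch(r"(?:(\d+)|([a-z]+))", "abc").groups()   # (None, 'abc')
```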
<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
  (?&#62;...)         atomic, non-capturing group
</PRE>
</P>
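<P>
Python's <b>re</b> module accepts atomic groups from version 3.11 on, which is enough to show why (?&#62;...) is called atomic: once the group has matched, the engine never backtracks into it. The function name <b>atomic_vs_ordinary</b> is hypothetical.
</P>

```python
import re
import sys


def atomic_vs_ordinary(subject):
    """Report whether (?>a+)a and (?:a+)a fully match subject (needs Python 3.11+)."""
    atomic = re.fullmatch(r"(?>a+)a", subject) is not None
    ordinary = re.fullmatch(r"(?:a+)a", subject) is not None
    return atomic, ordinary


if sys.version_info >= (3, 11):
    # (?>a+) swallows every "a" and, being atomic, never gives one back, so the
    # final a has nothing left to match; the ordinary group backtracks by one.
    print(atomic_vs_ordinary("aaa"))  # (False, True)
```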
<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
<P>
<pre>
  (?#....)        comment (not nestable)
</PRE>
</P>
<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
<P>
<pre>
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
</pre>
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
<pre>
  (*LIMIT_MATCH=d)       set the match limit to d (decimal number)
  (*LIMIT_RECURSION=d)   set the recursion limit to d (decimal number)
  (*NO_START_OPT)        no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)                set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)               set UTF-16 mode: 16-bit library (PCRE_UTF16)
  (*UTF32)               set UTF-32 mode: 32-bit library (PCRE_UTF32)
  (*UTF)                 set appropriate UTF mode for the library in use
  (*UCP)                 set PCRE_UCP (use Unicode properties for \d etc)
</PRE>
</P>
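<P>
Python's <b>re</b> module, which is not PCRE, accepts (?i), (?m), (?s) and (?x) with the same meanings. It also accepts the scoped forms (?i:...) and (?-i:...) and the (?#...) comment. It has no (?J) or (?U), and recent versions require a global setting such as (?i) at the very start of the pattern, whereas PCRE allows an option setting anywhere. The snippet below exercises the shared subset.
</P>

```python
import re

# (?i) at the start makes the whole pattern caseless.
caseless = re.fullmatch(r"(?i)hello", "HeLLo") is not None                 # True

# Scoped forms switch an option on or off for one group only; PCRE accepts them too.
scoped_on = re.fullmatch(r"a(?i:b)c", "aBc") is not None                   # True
scoped_on_outside = re.fullmatch(r"a(?i:b)c", "ABC") is not None           # False
scoped_off = re.fullmatch(r"(?i)a(?-i:b)", "AB") is not None               # False

# (?s) lets . match a newline; without it, . stops at a newline.
dotall = re.fullmatch(r"(?s)a.b", "a\nb") is not None                      # True
plain_dot = re.fullmatch(r"a.b", "a\nb") is not None                       # False

# (?x) ignores white space and #-comments; (?#...) is a comment in any mode.
extended = re.fullmatch(r"(?x) \d{3} - \d{4}  # local number", "555-1234") is not None
inline_comment = re.fullmatch(r"\d{3}(?#area code omitted)-\d{4}", "555-1234") is not None
```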
<br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?&#60;=...)        positive look behind
  (?&#60;!...)        negative look behind
</pre>
Each top-level branch of a look behind must be of a fixed length.
</P>
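<P>
Python's <b>re</b> module, which is not PCRE, supports the same four assertions, so it can illustrate them. The last lines show a difference related to the rule above. PCRE accepts a look behind whose top-level branches have different fixed lengths, such as (?&#60;=a|bc). Python's <b>re</b> rejects it because the assertion as a whole must have a single fixed width.
</P>

```python
import re

# Positive look behind: digits preceded by "$", without including the "$".
prices = re.findall(r"(?<=\$)\d+", "cost $30, tax $4, total 34")   # ['30', '4']

# Negative look ahead: whole numbers that are not followed by "%".
counts = re.findall(r"\b\d+\b(?!%)", "50% of 200 is 100")          # ['200', '100']

# PCRE accepts (?<=a|bc) because each top-level branch has a fixed length;
# Python's re requires the whole look behind to have one length.
try:
    re.compile(r"(?<=a|bc)x")
    python_accepts_branches = True
except re.error:
    python_accepts_branches = False
```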
<br><a name="SEC18" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
  \n              reference by number (can be ambiguous)
  \gn             reference by number
  \g{n}           reference by number
  \g{-n}          relative reference by number
  \k&#60;name&#62;        reference by name (Perl)
  \k'name'        reference by name (Perl)
  \g{name}        reference by name (Perl)
  \k{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
</PRE>
</P>
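<P>
Python's <b>re</b> module, which is not PCRE, supports numbered back references such as \1 and the (?P=name) form listed above, but not the \g or \k forms. The sketch therefore uses \1 only. A back reference must match the same text that the group captured, not just the same pattern.
</P>

```python
import re

# \1 must match exactly the text that group 1 captured: a doubled word here.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)        # 'is'

# The closing quote must be the same character as the opening one, so the
# apostrophe inside don't does not end the match.
quoted = re.search(r"(['\"])(.*?)\1", "he said \"don't\" twice").group(2)  # "don't"
```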
<br><a name="SEC19" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
  (?R)            recurse whole pattern
  (?n)            call subpattern by absolute number
  (?+n)           call subpattern by relative number
  (?-n)           call subpattern by relative number
  (?&#38;name)        call subpattern by name (Perl)
  (?P&#62;name)       call subpattern by name (Python)
  \g&#60;name&#62;        call subpattern by name (Oniguruma)
  \g'name'        call subpattern by name (Oniguruma)
  \g&#60;n&#62;           call subpattern by absolute number (Oniguruma)
  \g'n'           call subpattern by absolute number (Oniguruma)
  \g&#60;+n&#62;          call subpattern by relative number (PCRE extension)
  \g'+n'          call subpattern by relative number (PCRE extension)
  \g&#60;-n&#62;          call subpattern by relative number (PCRE extension)
  \g'-n'          call subpattern by relative number (PCRE extension)
</PRE>
</P>
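<P>
Python's <b>re</b> module has no subroutine calls, so it cannot express the classic recursive pattern for balanced parentheses, written in PCRE's extended mode as \( ( [^()] | (?R) )* \). The hypothetical Python function <b>match_balanced</b> below hand-codes the same recursion to show what such a pattern accepts. Each nested ( triggers a recursive call at the point where (?R) would recurse.
</P>

```python
def match_balanced(s, i=0):
    """Return the index just past a balanced (...) group starting at s[i], or -1.

    Hypothetical helper mirroring the extended-mode PCRE pattern \\( ( [^()] | (?R) )* \\)
    """
    if i >= len(s) or s[i] != "(":
        return -1
    i += 1
    while i < len(s):
        if s[i] == ")":
            return i + 1               # the closing parenthesis ends this level
        if s[i] == "(":
            i = match_balanced(s, i)   # the recursive call plays the role of (?R)
            if i == -1:
                return -1
        else:
            i += 1                     # any other character: the [^()] branch
    return -1                          # ran out of input before the closing ")"


print(match_balanced("(a(b)c) tail"))  # 7
```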
<br><a name="SEC20" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)

  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(&#60;name&#62;)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&#38;name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
</PRE>
</P>
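<P>
Python's <b>re</b> module, which is not PCRE, supports the absolute reference form (?(n)...). That is enough to show a typical use: a closing parenthesis is required only when an opening one was matched.
</P>

```python
import re

# Group 1 optionally captures "("; the condition (?(1)...) then demands ")"
# only when that group actually took part in the match.
pattern = re.compile(r"(\()?\d+(?(1)\))")
results = {s: pattern.fullmatch(s) is not None for s in ["(42)", "42", "(42", "42)"]}
print(results)  # {'(42)': True, '42': True, '(42': False, '42)': False}
```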
<br><a name="SEC21" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
The following act as soon as they are reached:
<pre>
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
</pre>
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
<pre>
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
</PRE>
</P>
<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
<pre>
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
</PRE>
</P>
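<P>
These items, like the option settings that begin with (*, are recognized only at the start of a pattern. The hypothetical Python helper <b>split_start_items</b> below sketches that rule. It peels off a run of leading items whose names appear in this summary and leaves everything else in the pattern, including backtracking verbs such as (*COMMIT). It does not check the ordering restrictions between the different kinds of item.
</P>

```python
import re

# Start-of-pattern items named in this summary; \d+ stands for the decimal number d.
_START_ITEM = re.compile(
    r"\(\*(LIMIT_MATCH=\d+|LIMIT_RECURSION=\d+|NO_START_OPT|UTF8|UTF16|UTF32|UTF|UCP"
    r"|CRLF|CR|LF|ANYCRLF|ANY|BSR_ANYCRLF|BSR_UNICODE)\)"
)


def split_start_items(pattern):
    """Split leading (*...) settings from the rest of a pattern (hypothetical helper)."""
    items = []
    pos = 0
    while True:
        m = _START_ITEM.match(pattern, pos)
        if m is None:
            return items, pattern[pos:]
        items.append(m.group(1))
        pos = m.end()


print(split_start_items("(*UTF8)(*CRLF)a.b"))  # (['UTF8', 'CRLF'], 'a.b')
```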
<br><a name="SEC23" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
<pre>
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
</PRE>
</P>
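<P>
Python's <b>re</b> module has no \R, but the two sets above can be written out as ordinary alternations. The constants below (hypothetical names) list CR, LF and CRLF for (*BSR_ANYCRLF). For (*BSR_UNICODE) they add VT, FF, NEL (U+0085), LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029), following the pcrepattern description of \R in a UTF mode. Unlike PCRE's \R, these alternations are not atomic. That only matters when later parts of a pattern force backtracking into them.
</P>

```python
import re

# \R under (*BSR_UNICODE): CRLF as a unit, or one of LF, VT, FF, CR, NEL, LS, PS.
R_UNICODE = r"(?:\r\n|[\n\x0b\f\r\x85\u2028\u2029])"
# \R under (*BSR_ANYCRLF): CRLF as a unit, or a lone CR or LF.
R_ANYCRLF = r"(?:\r\n|[\r\n])"

text = "a\r\nb\u2028c\x85d"
print(re.split(R_UNICODE, text))       # ['a', 'b', 'c', 'd']
print(len(re.split(R_ANYCRLF, text)))  # 2: only the CRLF counts as a newline
```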
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
  (?C)            callout
  (?Cn)           callout with data n
</PRE>
</P>
<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
<b>pcrematching</b>(3), <b>pcre</b>(3).
</P>
<br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P>
Last updated: 26 April 2013
<br>
Copyright © 1997-2013 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>