.TH PCRESYNTAX 3 "10 January 2012" "PCRE 8.30"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
.rs
.sp
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation. This document contains a quick-reference summary of the syntax.
.
.
.SH "QUOTING"
.rs
.sp
  \ex         where x is non-alphanumeric is a literal x
  \eQ...\eE    treat enclosed characters as literal
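.sp
A backslash before any non-alphanumeric character makes that character literal.
An arbitrary string can therefore be embedded in a pattern by escaping every
ASCII character that is not a letter or digit. Unlike \eQ...\eE, this also works
when the string itself contains \eE. The C function below is a minimal sketch
of the idea and is not part of the PCRE API. The name quote_literal() and its
buffer-based interface are hypothetical, and the sketch assumes that bytes
above 127 need no escaping.
.sp
.nf
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define PATTERN_BACKSLASH 0x5C  /* the backslash character */

/* Hypothetical helper: copy the NUL-terminated string "in" to "out",
   putting a backslash before every ASCII character that is not a
   letter or digit, so that the result matches "in" literally.
   Returns the length written (excluding the NUL), or -1 if "out"
   (of size outsize) is too small. */
long quote_literal(const char *in, char *out, size_t outsize)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != 0; in++) {
        unsigned char c = (unsigned char)*in;
        int needs_escape = c < 128 && !isalnum(c);

        if (n + (needs_escape ? 2 : 1) >= outsize)
            return -1;                  /* leave room for the NUL */
        if (needs_escape)
            out[n++] = PATTERN_BACKSLASH;
        out[n++] = (char)c;
    }
    if (n >= outsize)
        return -1;
    out[n] = 0;
    return (long)n;
}
```
.fi
.sp
For example, quoting "a.b*c" produces the seven characters a, \e., b, \e*, c.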
.
.
.SH "CHARACTERS"
.rs
.sp
  \ea         alarm, that is, the BEL character (hex 07)
  \ecx        "control-x", where x is any ASCII character
  \ee         escape (hex 1B)
  \ef         form feed (hex 0C)
  \en         newline (hex 0A)
  \er         carriage return (hex 0D)
  \et         tab (hex 09)
  \eddd       character with octal code ddd, or backreference
  \exhh       character with hex code hh
  \ex{hhh..}  character with hex code hhh..
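.sp
The \fBpcrepattern\fP documentation gives the exact effect of \ecx. If x is a
lower case letter, it is first converted to upper case. Then bit 6 (hex 40) of
the character is inverted. The C function below sketches that rule. The name
ctrl_escape() is hypothetical and is not a PCRE function.
.sp
.nf
```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: the character value of the escape "control-x",
   for an ASCII character x.  Lower case letters are first folded to
   upper case; then bit 6 (0x40) is inverted. */
int ctrl_escape(int x)
{
    assert(x >= 0 && x < 128);   /* x must be an ASCII character */
    if (islower(x))
        x = toupper(x);
    return x ^ 0x40;
}
```
.fi
.sp
For example, ctrl_escape('z') is 0x1A, while ctrl_escape('{') is 0x3B and
ctrl_escape(';') is 0x7B.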
.
.
.SH "CHARACTER TYPES"
.rs
.sp
  .          any character except newline;
               in dotall mode, any character whatsoever
  \eC         one data unit, even in UTF mode (best avoided)
  \ed         a decimal digit
  \eD         a character that is not a decimal digit
  \eh         a horizontal white space character
  \eH         a character that is not a horizontal white space character
  \eN         a character that is not a newline
  \ep{\fIxx\fP}     a character with the \fIxx\fP property
  \eP{\fIxx\fP}     a character without the \fIxx\fP property
  \eR         a newline sequence
  \es         a white space character
  \eS         a character that is not a white space character
  \ev         a vertical white space character
  \eV         a character that is not a vertical white space character
  \ew         a "word" character
  \eW         a "non-word" character
  \eX         an extended Unicode sequence
.sp
In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
characters, even in a UTF mode. However, this can be changed by setting the
PCRE_UCP option.
.
.
.SH "GENERAL CATEGORY PROPERTIES FOR \ep AND \eP"
.rs
.sp
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate
.sp
  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&         Ll, Lu, or Lt
.sp
  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark
.sp
  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number
.sp
  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation
.sp
  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol
.sp
  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
.
.
.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep AND \eP"
.rs
.sp
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, FF, CR
  Xwd        Perl word: property Xan or underscore
.
.
.SH "SCRIPT NAMES FOR \ep AND \eP"
.rs
.sp
Arabic,
Armenian,
Avestan,
Balinese,
Bamum,
Batak,
Bengali,
Bopomofo,
Brahmi,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Chakma,
Cham,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Egyptian_Hieroglyphs,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Imperial_Aramaic,
Inherited,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Javanese,
Kaithi,
Kannada,
Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Lao,
Latin,
Lepcha,
Limbu,
Linear_B,
Lisu,
Lycian,
Lydian,
Malayalam,
Mandaic,
Meetei_Mayek,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Sinhala,
Sora_Sompeng,
Sundanese,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tai_Tham,
Tai_Viet,
Takri,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Vai,
Yi.
.
.
.SH "CHARACTER CLASSES"
.rs
.sp
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set
.sp
  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric and space
  space       white space
  upper       upper case letter
  word        same as \ew
  xdigit      hexadecimal digit
.sp
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\eQ...\eE inside a character class.
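.sp
When PCRE_UCP is not set, the POSIX names behave much like the C library's
character classification functions in the C locale, from which PCRE's default
character tables are generated. The sketch below makes that correspondence
explicit. posix_class_match() is a hypothetical helper, and the mapping is only
an approximation under the C-locale assumption. It does not describe PCRE's
internals.
.sp
.nf
```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: is byte c a member of the POSIX named set
   "name" (as written between [: and :])?  Returns 1 or 0, or -1 for an
   unknown name.  Assumes the C locale and ASCII-only matching. */
int posix_class_match(const char *name, unsigned char c)
{
    static const struct {
        const char *name;
        int (*test)(int);
    } table[] = {
        { "alnum", isalnum }, { "alpha", isalpha }, { "blank", isblank },
        { "cntrl", iscntrl }, { "digit", isdigit }, { "graph", isgraph },
        { "lower", islower }, { "print", isprint }, { "punct", ispunct },
        { "space", isspace }, { "upper", isupper }, { "xdigit", isxdigit },
    };

    assert(name != NULL);
    if (strcmp(name, "ascii") == 0)
        return c < 128;
    if (strcmp(name, "word") == 0)          /* letter, digit, underscore */
        return isalnum(c) || c == '_';
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(name, table[i].name) == 0)
            return table[i].test(c) != 0;
    return -1;
}
```
.fi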
.
.
.SH "QUANTIFIERS"
.rs
.sp
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
.
.
.SH "ANCHORS AND SIMPLE ASSERTIONS"
.rs
.sp
  \eb          word boundary
  \eB          not a word boundary
  ^           start of subject
                also after internal newline in multiline mode
  \eA          start of subject
  $           end of subject
                also before newline at end of subject
                also before internal newline in multiline mode
  \eZ          end of subject
                also before newline at end of subject
  \ez          end of subject
  \eG          first matching position in subject
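.sp
Several of these assertions are simple predicates on the subject and the
current offset. The C sketch below models ^, $, \eZ, \ez, and \eb under three
assumptions: the LF newline convention, ASCII word characters, and no
PCRE_DOLLAR_ENDONLY option. All function names are hypothetical. In multiline
mode, ^ does not match after a newline that ends the subject.
.sp
.nf
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define NL 0x0A   /* assumes the LF newline convention */

static int is_word(unsigned char c) { return isalnum(c) || c == '_'; }

/* circumflex: start of subject; in multiline mode also after an
   internal newline (not after a newline that ends the subject) */
int at_circumflex(const char *s, size_t len, size_t pos, int multiline)
{
    assert(pos <= len);
    if (pos == 0)
        return 1;
    return multiline && pos < len && s[pos - 1] == NL;
}

/* dollar: end of subject, or before a newline that ends it; in
   multiline mode also before any internal newline */
int at_dollar(const char *s, size_t len, size_t pos, int multiline)
{
    assert(pos <= len);
    if (pos == len)
        return 1;
    if (s[pos] != NL)
        return 0;
    return multiline || pos == len - 1;
}

/* capital Z behaves like dollar without multiline mode */
int at_cap_z(const char *s, size_t len, size_t pos)
{
    return at_dollar(s, len, pos, 0);
}

/* lower case z matches only at the very end of the subject */
int at_small_z(size_t len, size_t pos)
{
    assert(pos <= len);
    return pos == len;
}

/* word boundary: the characters on either side differ in "wordness";
   the edges of the subject count as non-word */
int at_word_boundary(const char *s, size_t len, size_t pos)
{
    int before, after;

    assert(pos <= len);
    before = pos > 0 && is_word((unsigned char)s[pos - 1]);
    after = pos < len && is_word((unsigned char)s[pos]);
    return before != after;
}
```
.fi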
.
.
.SH "MATCH POINT RESET"
.rs
.sp
  \eK          reset start of match
.
.
.SH "ALTERNATION"
.rs
.sp
  expr|expr|expr...
.
.
.SH "CAPTURING"
.rs
.sp
  (...)           capturing group
  (?<name>...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P<name>...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                    capturing groups in each alternative
.
.
.SH "ATOMIC GROUPS"
.rs
.sp
  (?>...)         atomic, non-capturing group
.
.
.
.
.SH "COMMENT"
.rs
.sp
  (?#....)        comment (not nestable)
.
.
.SH "OPTION SETTING"
.rs
.sp
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
.sp
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
.sp
  (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
  (*UCP)          set PCRE_UCP (use Unicode properties for \ed etc.)
.
.
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
.rs
.sp
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?<=...)        positive look behind
  (?<!...)        negative look behind
.sp
Each top-level branch of a look behind must be of a fixed length.
.
.
.SH "BACKREFERENCES"
.rs
.sp
  \en              reference by number (can be ambiguous)
  \egn             reference by number
  \eg{n}           reference by number
  \eg{-n}          relative reference by number
  \ek<name>        reference by name (Perl)
  \ek'name'        reference by name (Perl)
  \eg{name}        reference by name (Perl)
  \ek{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
.
.
.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
.rs
.sp
  (?R)            recurse whole pattern
  (?n)            call subpattern by absolute number
  (?+n)           call subpattern by relative number
  (?-n)           call subpattern by relative number
  (?&name)        call subpattern by name (Perl)
  (?P>name)       call subpattern by name (Python)
  \eg<name>        call subpattern by name (Oniguruma)
  \eg'name'        call subpattern by name (Oniguruma)
  \eg<n>           call subpattern by absolute number (Oniguruma)
  \eg'n'           call subpattern by absolute number (Oniguruma)
  \eg<+n>          call subpattern by relative number (PCRE extension)
  \eg'+n'          call subpattern by relative number (PCRE extension)
  \eg<-n>          call subpattern by relative number (PCRE extension)
  \eg'-n'          call subpattern by relative number (PCRE extension)
.
.
.SH "CONDITIONAL PATTERNS"
.rs
.sp
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)
.sp
  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(<name>)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
.
.
.SH "BACKTRACKING CONTROL"
.rs
.sp
The following act as soon as they are reached:
.sp
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
.sp
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
.sp
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                    (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
.
.
.SH "NEWLINE CONVENTIONS"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16) or (*UCP) option.
.sp
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
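.sp
The sketch below shows which byte sequences each convention treats as a
newline in the 8-bit library without UTF mode. In that setting, (*ANY) reduces
to CR, LF, CRLF, VT, FF, and NEL (hex 85), because the other Unicode line
terminators (U+2028 and U+2029) lie above hex FF. The enum and the function
newline_length() are hypothetical names and are not PCRE API.
.sp
.nf
```c
#include <assert.h>
#include <stddef.h>

enum newline_conv { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Hypothetical helper: length (1 or 2) of the newline that starts at
   s[pos] under the given convention, or 0 if there is none.  Assumes
   8-bit, non-UTF data, so U+2028 and U+2029 cannot occur. */
size_t newline_length(const unsigned char *s, size_t len, size_t pos,
                      enum newline_conv conv)
{
    unsigned char c;
    int crlf;

    assert(pos <= len);
    if (pos == len)
        return 0;
    c = s[pos];
    crlf = c == 0x0D && pos + 1 < len && s[pos + 1] == 0x0A;

    switch (conv) {
    case NL_CR:
        return c == 0x0D;
    case NL_LF:
        return c == 0x0A;
    case NL_CRLF:
        return crlf ? 2 : 0;
    case NL_ANYCRLF:
        if (crlf)
            return 2;
        return c == 0x0D || c == 0x0A;
    case NL_ANY:
        if (crlf)
            return 2;
        return (c >= 0x0A && c <= 0x0D) || c == 0x85;  /* LF VT FF CR NEL */
    }
    return 0;
}
```
.fi
.sp
Under (*CRLF), for instance, a lone CR or a lone LF is not a newline at all.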
.
.
.SH "WHAT \eR MATCHES"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
.sp
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
.
.
.SH "CALLOUTS"
.rs
.sp
  (?C)            callout
  (?Cn)           callout with data n
.
.
.SH "SEE ALSO"
.rs
.sp
\fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
\fBpcrematching\fP(3), \fBpcre\fP(3).
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 10 January 2012
Copyright (c) 1997-2012 University of Cambridge.
.fi