Annotation of embedaddon/pcre/doc/pcresyntax.3, revision 1.1.1.2
1.1 misho 1: .TH PCRESYNTAX 3
2: .SH NAME
3: PCRE - Perl-compatible regular expressions
4: .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
5: .rs
6: .sp
7: The full syntax and semantics of the regular expressions that are supported by
8: PCRE are described in the
9: .\" HREF
10: \fBpcrepattern\fP
11: .\"
1.1.1.2 ! misho 12: documentation. This document contains a quick-reference summary of the syntax.
1.1 misho 13: .
14: .
15: .SH "QUOTING"
16: .rs
17: .sp
18: \ex where x is non-alphanumeric is a literal x
19: \eQ...\eE treat enclosed characters as literal
20: .
21: .
22: .SH "CHARACTERS"
23: .rs
24: .sp
25: \ea alarm, that is, the BEL character (hex 07)
26: \ecx "control-x", where x is any ASCII character
27: \ee escape (hex 1B)
28: \ef formfeed (hex 0C)
29: \en newline (hex 0A)
30: \er carriage return (hex 0D)
31: \et tab (hex 09)
32: \eddd character with octal code ddd, or backreference
33: \exhh character with hex code hh
34: \ex{hhh..} character with hex code hhh..
35: .
36: .
37: .SH "CHARACTER TYPES"
38: .rs
39: .sp
40: . any character except newline;
41: in dotall mode, any character whatsoever
1.1.1.2 ! misho 42: \eC one data unit, even in UTF mode (best avoided)
1.1 misho 43: \ed a decimal digit
44: \eD a character that is not a decimal digit
45: \eh a horizontal whitespace character
46: \eH a character that is not a horizontal whitespace character
47: \eN a character that is not a newline
48: \ep{\fIxx\fP} a character with the \fIxx\fP property
49: \eP{\fIxx\fP} a character without the \fIxx\fP property
50: \eR a newline sequence
51: \es a whitespace character
52: \eS a character that is not a whitespace character
53: \ev a vertical whitespace character
54: \eV a character that is not a vertical whitespace character
55: \ew a "word" character
56: \eW a "non-word" character
57: \eX an extended Unicode sequence
58: .sp
59: In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
1.1.1.2 ! misho 60: characters, even in a UTF mode. However, this can be changed by setting the
1.1 misho 61: PCRE_UCP option.
62: .
63: .
64: .SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
65: .rs
66: .sp
67: C Other
68: Cc Control
69: Cf Format
70: Cn Unassigned
71: Co Private use
72: Cs Surrogate
73: .sp
74: L Letter
75: Ll Lower case letter
76: Lm Modifier letter
77: Lo Other letter
78: Lt Title case letter
79: Lu Upper case letter
80: L& Ll, Lu, or Lt
81: .sp
82: M Mark
83: Mc Spacing mark
84: Me Enclosing mark
85: Mn Non-spacing mark
86: .sp
87: N Number
88: Nd Decimal number
89: Nl Letter number
90: No Other number
91: .sp
92: P Punctuation
93: Pc Connector punctuation
94: Pd Dash punctuation
95: Pe Close punctuation
96: Pf Final punctuation
97: Pi Initial punctuation
98: Po Other punctuation
99: Ps Open punctuation
100: .sp
101: S Symbol
102: Sc Currency symbol
103: Sk Modifier symbol
104: Sm Mathematical symbol
105: So Other symbol
106: .sp
107: Z Separator
108: Zl Line separator
109: Zp Paragraph separator
110: Zs Space separator
111: .
112: .
113: .SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
114: .rs
115: .sp
116: Xan Alphanumeric: union of properties L and N
117: Xps POSIX space: property Z or tab, NL, VT, FF, CR
118: Xsp Perl space: property Z or tab, NL, FF, CR
119: Xwd Perl word: property Xan or underscore
120: .
121: .
122: .SH "SCRIPT NAMES FOR \ep AND \eP"
123: .rs
124: .sp
125: Arabic,
126: Armenian,
127: Avestan,
128: Balinese,
129: Bamum,
130: Bengali,
131: Bopomofo,
132: Braille,
133: Buginese,
134: Buhid,
135: Canadian_Aboriginal,
136: Carian,
137: Cham,
138: Cherokee,
139: Common,
140: Coptic,
141: Cuneiform,
142: Cypriot,
143: Cyrillic,
144: Deseret,
145: Devanagari,
146: Egyptian_Hieroglyphs,
147: Ethiopic,
148: Georgian,
149: Glagolitic,
150: Gothic,
151: Greek,
152: Gujarati,
153: Gurmukhi,
154: Han,
155: Hangul,
156: Hanunoo,
157: Hebrew,
158: Hiragana,
159: Imperial_Aramaic,
160: Inherited,
161: Inscriptional_Pahlavi,
162: Inscriptional_Parthian,
163: Javanese,
164: Kaithi,
165: Kannada,
166: Katakana,
167: Kayah_Li,
168: Kharoshthi,
169: Khmer,
170: Lao,
171: Latin,
172: Lepcha,
173: Limbu,
174: Linear_B,
175: Lisu,
176: Lycian,
177: Lydian,
178: Malayalam,
179: Meetei_Mayek,
180: Mongolian,
181: Myanmar,
182: New_Tai_Lue,
183: Nko,
184: Ogham,
185: Old_Italic,
186: Old_Persian,
187: Old_South_Arabian,
188: Old_Turkic,
189: Ol_Chiki,
190: Oriya,
191: Osmanya,
192: Phags_Pa,
193: Phoenician,
194: Rejang,
195: Runic,
196: Samaritan,
197: Saurashtra,
198: Shavian,
199: Sinhala,
200: Sundanese,
201: Syloti_Nagri,
202: Syriac,
203: Tagalog,
204: Tagbanwa,
205: Tai_Le,
206: Tai_Tham,
207: Tai_Viet,
208: Tamil,
209: Telugu,
210: Thaana,
211: Thai,
212: Tibetan,
213: Tifinagh,
214: Ugaritic,
215: Vai,
216: Yi.
217: .
218: .
219: .SH "CHARACTER CLASSES"
220: .rs
221: .sp
222: [...] positive character class
223: [^...] negative character class
224: [x-y] range (can be used for hex characters)
225: [[:xxx:]] positive POSIX named set
226: [[:^xxx:]] negative POSIX named set
227: .sp
228: alnum alphanumeric
229: alpha alphabetic
230: ascii 0-127
231: blank space or tab
232: cntrl control character
233: digit decimal digit
234: graph printing, excluding space
235: lower lower case letter
236: print printing, including space
237: punct printing, excluding alphanumeric
238: space whitespace
239: upper upper case letter
240: word same as \ew
241: xdigit hexadecimal digit
242: .sp
243: In PCRE, POSIX character set names recognize only ASCII characters by default,
244: but some of them use Unicode properties if PCRE_UCP is set. You can use
245: \eQ...\eE inside a character class.
246: .
247: .
248: .SH "QUANTIFIERS"
249: .rs
250: .sp
251: ? 0 or 1, greedy
252: ?+ 0 or 1, possessive
253: ?? 0 or 1, lazy
254: * 0 or more, greedy
255: *+ 0 or more, possessive
256: *? 0 or more, lazy
257: + 1 or more, greedy
258: ++ 1 or more, possessive
259: +? 1 or more, lazy
260: {n} exactly n
261: {n,m} at least n, no more than m, greedy
262: {n,m}+ at least n, no more than m, possessive
263: {n,m}? at least n, no more than m, lazy
264: {n,} n or more, greedy
265: {n,}+ n or more, possessive
266: {n,}? n or more, lazy
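The difference between greedy and lazy quantifiers can be sketched as follows. This is an illustration only: it assumes Python's standard re module as a stand-in for PCRE, on the grounds that the two accept the same greedy and lazy forms. The subject string is made up for the example.

```python
import re

subject = "<a><b>"

# Greedy: .+ takes as much as it can, then backtracks to the last ">".
greedy = re.match(r"<.+>", subject).group(0)

# Lazy: .+? takes as little as it can, stopping at the first ">".
lazy = re.match(r"<.+?>", subject).group(0)

print(greedy)  # <a><b>
print(lazy)    # <a>
```

The possessive forms are left out of the sketch because older versions of Python's re module do not accept them.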
267: .
268: .
269: .SH "ANCHORS AND SIMPLE ASSERTIONS"
270: .rs
271: .sp
272: \eb word boundary
273: \eB not a word boundary
274: ^ start of subject
275: also after internal newline in multiline mode
276: \eA start of subject
277: $ end of subject
278: also before newline at end of subject
279: also before internal newline in multiline mode
280: \eZ end of subject
281: also before newline at end of subject
282: \ez end of subject
283: \eG first matching position in subject
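A short sketch of how ^, $ and \A behave with and without multiline mode. It assumes Python's standard re module rather than PCRE itself. \Z, \z and \G are deliberately left out, because Python either defines them differently or does not provide them.

```python
import re

subject = "one\ntwo\n"

# Default mode: ^ matches only at the start of the subject, and $ only at
# the end or before a final newline, so no whole line matches here.
default = re.findall(r"^\w+$", subject)

# Multiline mode: ^ and $ also match after and before internal newlines.
multiline = re.findall(r"(?m)^\w+$", subject)

# $ matches before the newline that ends the subject.
before_final_newline = re.search(r"two$", subject) is not None

# \A matches at the start of the subject only, so "two" is not found.
start_only = re.search(r"\Atwo", subject) is not None
```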
284: .
285: .
286: .SH "MATCH POINT RESET"
287: .rs
288: .sp
289: \eK reset start of match
290: .
291: .
292: .SH "ALTERNATION"
293: .rs
294: .sp
295: expr|expr|expr...
296: .
297: .
298: .SH "CAPTURING"
299: .rs
300: .sp
301: (...) capturing group
302: (?<name>...) named capturing group (Perl)
303: (?'name'...) named capturing group (Perl)
304: (?P<name>...) named capturing group (Python)
305: (?:...) non-capturing group
306: (?|...) non-capturing group; reset group numbers for
307: capturing groups in each alternative
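As an illustration of named and non-capturing groups, the sketch below uses the Python-style (?P<name>...) form. It assumes Python's standard re module as a stand-in for PCRE. The group names and the date string are invented for the example.

```python
import re

# Two named capturing groups around one non-capturing group.
pattern = re.compile(r"(?P<year>\d{4})-(?:\d{2})-(?P<day>\d{2})")
match = pattern.match("2012-01-10")

year = match.group("year")
day = match.group("day")

# The non-capturing group takes no group number, so only two groups exist.
groups = match.groups()
```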
308: .
309: .
310: .SH "ATOMIC GROUPS"
311: .rs
312: .sp
313: (?>...) atomic, non-capturing group
314: .
315: .
318: .SH "COMMENT"
319: .rs
320: .sp
321: (?#....) comment (not nestable)
322: .
323: .
324: .SH "OPTION SETTING"
325: .rs
326: .sp
327: (?i) caseless
328: (?J) allow duplicate names
329: (?m) multiline
330: (?s) single line (dotall)
331: (?U) default ungreedy (lazy)
332: (?x) extended (ignore white space)
333: (?-...) unset option(s)
334: .sp
335: The following are recognized only at the start of a pattern or after one of the
336: newline-setting options with similar syntax:
337: .sp
338: (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
1.1.1.2 ! misho 339: (*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8)
! 340: (*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16)
1.1 misho 341: (*UCP) set PCRE_UCP (use Unicode properties for \ed etc)
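The inline options (?i), (?s) and (?x) can be tried out with the sketch below. It assumes Python's standard re module as a stand-in for PCRE. Python does not provide (?J), (?U) or the (*...) items, so they are not shown.

```python
import re

# (?i) caseless: lower-case letters in the pattern match upper-case text.
caseless = re.search(r"(?i)pcre", "Uses PCRE") is not None

# (?s) dotall: "." also matches the newline between "a" and "b".
without_dotall = re.search(r"a.b", "a\nb") is not None
with_dotall = re.search(r"(?s)a.b", "a\nb") is not None

# (?x) extended: unescaped white space in the pattern is ignored.
extended = re.search(r"(?x) \d+ - \d+", "10-20").group(0)
```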
342: .
343: .
344: .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
345: .rs
346: .sp
347: (?=...) positive lookahead
348: (?!...) negative lookahead
349: (?<=...) positive lookbehind
350: (?<!...) negative lookbehind
351: .sp
352: Each top-level branch of a lookbehind must be of a fixed length.
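The assertions above match a position without consuming any characters. A minimal sketch, assuming Python's standard re module as a stand-in for PCRE (its fixed-length rule for lookbehind is somewhat stricter than the one stated here), with made-up subject strings:

```python
import re

# Positive lookahead: digits that are followed by " USD".
usd_amounts = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")

# Positive lookbehind: digits that are preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", "cost $15, qty 3")

# Negative lookbehind: whole numbers that are not preceded by a dollar sign.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", "cost $15, qty 3")
```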
353: .
354: .
355: .SH "BACKREFERENCES"
356: .rs
357: .sp
358: \en reference by number (can be ambiguous)
359: \egn reference by number
360: \eg{n} reference by number
361: \eg{-n} relative reference by number
362: \ek<name> reference by name (Perl)
363: \ek'name' reference by name (Perl)
364: \eg{name} reference by name (Perl)
365: \ek{name} reference by name (.NET)
366: (?P=name) reference by name (Python)
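Two of the forms listed above, the numbered reference and the Python-style (?P=name), can be sketched as follows. The example assumes Python's standard re module as a stand-in for PCRE. The group name "q" and the subject strings are invented.

```python
import re

# Reference by number: \1 must repeat exactly what group 1 captured.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(0)

# Reference by name: the closing quote must be the same as the opening one.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi' now").group(0)
```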
367: .
368: .
369: .SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
370: .rs
371: .sp
372: (?R) recurse whole pattern
373: (?n) call subpattern by absolute number
374: (?+n) call subpattern by relative number
375: (?-n) call subpattern by relative number
376: (?&name) call subpattern by name (Perl)
377: (?P>name) call subpattern by name (Python)
378: \eg<name> call subpattern by name (Oniguruma)
379: \eg'name' call subpattern by name (Oniguruma)
380: \eg<n> call subpattern by absolute number (Oniguruma)
381: \eg'n' call subpattern by absolute number (Oniguruma)
382: \eg<+n> call subpattern by relative number (PCRE extension)
383: \eg'+n' call subpattern by relative number (PCRE extension)
384: \eg<-n> call subpattern by relative number (PCRE extension)
385: \eg'-n' call subpattern by relative number (PCRE extension)
386: .
387: .
388: .SH "CONDITIONAL PATTERNS"
389: .rs
390: .sp
391: (?(condition)yes-pattern)
392: (?(condition)yes-pattern|no-pattern)
393: .sp
394: (?(n)... absolute reference condition
395: (?(+n)... relative reference condition
396: (?(-n)... relative reference condition
397: (?(<name>)... named reference condition (Perl)
398: (?('name')... named reference condition (Perl)
399: (?(name)... named reference condition (PCRE)
400: (?(R)... overall recursion condition
401: (?(Rn)... specific group recursion condition
402: (?(R&name)... specific recursion condition
403: (?(DEFINE)... define subpattern for reference
404: (?(assert)... assertion condition
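A sketch of the absolute reference condition (?(n)...), assuming Python's standard re module as a stand-in for PCRE (it supports the numbered and named conditions, but not the recursion, DEFINE or assertion forms). The pattern accepts a number that is either bare or fully parenthesized.

```python
import re

# Group 1 optionally captures "("; the condition (?(1)\)) then requires
# a closing ")" only if group 1 took part in the match.
pattern = re.compile(r"^(\()?\d+(?(1)\))$")

subjects = ["12", "(12)", "(12", "12)"]
results = {s: pattern.match(s) is not None for s in subjects}
```

With these subjects, only the two balanced forms match.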
405: .
406: .
407: .SH "BACKTRACKING CONTROL"
408: .rs
409: .sp
410: The following act immediately when they are reached:
411: .sp
412: (*ACCEPT) force successful match
413: (*FAIL) force backtrack; synonym (*F)
1.1.1.2 ! misho 414: (*MARK:NAME) set name to be passed back; synonym (*:NAME)
1.1 misho 415: .sp
416: The following act only when a subsequent match failure causes a backtrack to
417: reach them. They all force a match failure, but they differ in what happens
418: afterwards. Those that advance the start-of-match point do so only if the
419: pattern is not anchored.
420: .sp
421: (*COMMIT) overall failure, no advance of starting point
422: (*PRUNE) advance to next starting character
1.1.1.2 ! misho 423: (*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
! 424: (*SKIP) advance to current matching position
! 425: (*SKIP:NAME) advance to position corresponding to an earlier
! 426: (*MARK:NAME); if not found, the (*SKIP) is ignored
1.1 misho 427: (*THEN) local failure, backtrack to next alternation
1.1.1.2 ! misho 428: (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
1.1 misho 429: .
430: .
431: .SH "NEWLINE CONVENTIONS"
432: .rs
433: .sp
434: These are recognized only at the very start of the pattern or after a
1.1.1.2 ! misho 435: (*BSR_...), (*UTF8), (*UTF16) or (*UCP) option.
1.1 misho 436: .sp
437: (*CR) carriage return only
438: (*LF) linefeed only
439: (*CRLF) carriage return followed by linefeed
440: (*ANYCRLF) all three of the above
441: (*ANY) any Unicode newline sequence
442: .
443: .
444: .SH "WHAT \eR MATCHES"
445: .rs
446: .sp
447: These are recognized only at the very start of the pattern or after a
1.1.1.2 ! misho 448: (*...) option that sets the newline convention or a UTF or UCP mode.
1.1 misho 449: .sp
450: (*BSR_ANYCRLF) CR, LF, or CRLF
451: (*BSR_UNICODE) any Unicode newline sequence
452: .
453: .
454: .SH "CALLOUTS"
455: .rs
456: .sp
457: (?C) callout
458: (?Cn) callout with data n
459: .
460: .
461: .SH "SEE ALSO"
462: .rs
463: .sp
464: \fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
465: \fBpcrematching\fP(3), \fBpcre\fP(3).
466: .
467: .
468: .SH AUTHOR
469: .rs
470: .sp
471: .nf
472: Philip Hazel
473: University Computing Service
474: Cambridge CB2 3QH, England.
475: .fi
476: .
477: .
478: .SH REVISION
479: .rs
480: .sp
481: .nf
1.1.1.2 ! misho 482: Last updated: 10 January 2012
! 483: Copyright (c) 1997-2012 University of Cambridge.
1.1 misho 484: .fi