Annotation of embedaddon/pcre/doc/pcresyntax.3, revision 1.1.1.4
1.1.1.4 ! misho 1: .TH PCRESYNTAX 3 "26 April 2013" "PCRE 8.33"
1.1 misho 2: .SH NAME
3: PCRE - Perl-compatible regular expressions
4: .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
5: .rs
6: .sp
7: The full syntax and semantics of the regular expressions that are supported by
8: PCRE are described in the
9: .\" HREF
10: \fBpcrepattern\fP
11: .\"
1.1.1.2 misho 12: documentation. This document contains a quick-reference summary of the syntax.
1.1 misho 13: .
14: .
15: .SH "QUOTING"
16: .rs
17: .sp
18: \ex where x is non-alphanumeric is a literal x
19: \eQ...\eE treat enclosed characters as literal
20: .
21: .
22: .SH "CHARACTERS"
23: .rs
24: .sp
25: \ea alarm, that is, the BEL character (hex 07)
26: \ecx "control-x", where x is any ASCII character
27: \ee escape (hex 1B)
1.1.1.3 misho 28: \ef form feed (hex 0C)
1.1 misho 29: \en newline (hex 0A)
30: \er carriage return (hex 0D)
31: \et tab (hex 09)
32: \eddd character with octal code ddd, or backreference
33: \exhh character with hex code hh
34: \ex{hhh..} character with hex code hhh..
35: .
36: .
37: .SH "CHARACTER TYPES"
38: .rs
39: .sp
40: . any character except newline;
41: in dotall mode, any character whatsoever
1.1.1.2 misho 42: \eC one data unit, even in UTF mode (best avoided)
1.1 misho 43: \ed a decimal digit
44: \eD a character that is not a decimal digit
1.1.1.3 misho 45: \eh a horizontal white space character
46: \eH a character that is not a horizontal white space character
1.1 misho 47: \eN a character that is not a newline
48: \ep{\fIxx\fP} a character with the \fIxx\fP property
49: \eP{\fIxx\fP} a character without the \fIxx\fP property
50: \eR a newline sequence
1.1.1.3 misho 51: \es a white space character
52: \eS a character that is not a white space character
53: \ev a vertical white space character
54: \eV a character that is not a vertical white space character
1.1 misho 55: \ew a "word" character
56: \eW a "non-word" character
1.1.1.4 ! misho 57: \eX a Unicode extended grapheme cluster
1.1 misho 58: .sp
59: In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
1.1.1.2 misho 60: characters, even in a UTF mode. However, this can be changed by setting the
1.1 misho 61: PCRE_UCP option.
62: .
63: .
64: .SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
65: .rs
66: .sp
67: C Other
68: Cc Control
69: Cf Format
70: Cn Unassigned
71: Co Private use
72: Cs Surrogate
73: .sp
74: L Letter
75: Ll Lower case letter
76: Lm Modifier letter
77: Lo Other letter
78: Lt Title case letter
79: Lu Upper case letter
80: L& Ll, Lu, or Lt
81: .sp
82: M Mark
83: Mc Spacing mark
84: Me Enclosing mark
85: Mn Non-spacing mark
86: .sp
87: N Number
88: Nd Decimal number
89: Nl Letter number
90: No Other number
91: .sp
92: P Punctuation
93: Pc Connector punctuation
94: Pd Dash punctuation
95: Pe Close punctuation
96: Pf Final punctuation
97: Pi Initial punctuation
98: Po Other punctuation
99: Ps Open punctuation
100: .sp
101: S Symbol
102: Sc Currency symbol
103: Sk Modifier symbol
104: Sm Mathematical symbol
105: So Other symbol
106: .sp
107: Z Separator
108: Zl Line separator
109: Zp Paragraph separator
110: Zs Space separator
111: .
112: .
113: .SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
114: .rs
115: .sp
116: Xan Alphanumeric: union of properties L and N
117: Xps POSIX space: property Z or tab, NL, VT, FF, CR
118: Xsp Perl space: property Z or tab, NL, FF, CR
1.1.1.4 ! misho 119: Xuc Universally-named character: one that can be
! 120: represented by a Universal Character Name
1.1 misho 121: Xwd Perl word: property Xan or underscore
122: .
123: .
124: .SH "SCRIPT NAMES FOR \ep AND \eP"
125: .rs
126: .sp
127: Arabic,
128: Armenian,
129: Avestan,
130: Balinese,
131: Bamum,
1.1.1.3 misho 132: Batak,
1.1 misho 133: Bengali,
134: Bopomofo,
1.1.1.3 misho 135: Brahmi,
1.1 misho 136: Braille,
137: Buginese,
138: Buhid,
139: Canadian_Aboriginal,
140: Carian,
1.1.1.3 misho 141: Chakma,
1.1 misho 142: Cham,
143: Cherokee,
144: Common,
145: Coptic,
146: Cuneiform,
147: Cypriot,
148: Cyrillic,
149: Deseret,
150: Devanagari,
151: Egyptian_Hieroglyphs,
152: Ethiopic,
153: Georgian,
154: Glagolitic,
155: Gothic,
156: Greek,
157: Gujarati,
158: Gurmukhi,
159: Han,
160: Hangul,
161: Hanunoo,
162: Hebrew,
163: Hiragana,
164: Imperial_Aramaic,
165: Inherited,
166: Inscriptional_Pahlavi,
167: Inscriptional_Parthian,
168: Javanese,
169: Kaithi,
170: Kannada,
171: Katakana,
172: Kayah_Li,
173: Kharoshthi,
174: Khmer,
175: Lao,
176: Latin,
177: Lepcha,
178: Limbu,
179: Linear_B,
180: Lisu,
181: Lycian,
182: Lydian,
183: Malayalam,
1.1.1.3 misho 184: Mandaic,
1.1 misho 185: Meetei_Mayek,
1.1.1.3 misho 186: Meroitic_Cursive,
187: Meroitic_Hieroglyphs,
188: Miao,
1.1 misho 189: Mongolian,
190: Myanmar,
191: New_Tai_Lue,
192: Nko,
193: Ogham,
194: Old_Italic,
195: Old_Persian,
196: Old_South_Arabian,
197: Old_Turkic,
198: Ol_Chiki,
199: Oriya,
200: Osmanya,
201: Phags_Pa,
202: Phoenician,
203: Rejang,
204: Runic,
205: Samaritan,
206: Saurashtra,
1.1.1.3 misho 207: Sharada,
1.1 misho 208: Shavian,
209: Sinhala,
1.1.1.3 misho 210: Sora_Sompeng,
1.1 misho 211: Sundanese,
212: Syloti_Nagri,
213: Syriac,
214: Tagalog,
215: Tagbanwa,
216: Tai_Le,
217: Tai_Tham,
218: Tai_Viet,
1.1.1.3 misho 219: Takri,
1.1 misho 220: Tamil,
221: Telugu,
222: Thaana,
223: Thai,
224: Tibetan,
225: Tifinagh,
226: Ugaritic,
227: Vai,
228: Yi.
229: .
230: .
231: .SH "CHARACTER CLASSES"
232: .rs
233: .sp
234: [...] positive character class
235: [^...] negative character class
236: [x-y] range (can be used for hex characters)
237: [[:xxx:]] positive POSIX named set
238: [[:^xxx:]] negative POSIX named set
239: .sp
240: alnum alphanumeric
241: alpha alphabetic
242: ascii 0-127
243: blank space or tab
244: cntrl control character
245: digit decimal digit
246: graph printing, excluding space
247: lower lower case letter
248: print printing, including space
249: punct printing, excluding alphanumeric
1.1.1.3 misho 250: space white space
1.1 misho 251: upper upper case letter
252: word same as \ew
253: xdigit hexadecimal digit
254: .sp
255: In PCRE, POSIX character set names recognize only ASCII characters by default,
256: but some of them use Unicode properties if PCRE_UCP is set. You can use
257: \eQ...\eE inside a character class.
258: .
259: .
260: .SH "QUANTIFIERS"
261: .rs
262: .sp
263: ? 0 or 1, greedy
264: ?+ 0 or 1, possessive
265: ?? 0 or 1, lazy
266: * 0 or more, greedy
267: *+ 0 or more, possessive
268: *? 0 or more, lazy
269: + 1 or more, greedy
270: ++ 1 or more, possessive
271: +? 1 or more, lazy
272: {n} exactly n
273: {n,m} at least n, no more than m, greedy
274: {n,m}+ at least n, no more than m, possessive
275: {n,m}? at least n, no more than m, lazy
276: {n,} n or more, greedy
277: {n,}+ n or more, possessive
278: {n,}? n or more, lazy
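The difference between greedy and lazy quantifiers can be sketched as follows. This is only an illustration and uses Python's re module as a stand-in, not PCRE itself; the assumption is that both engines treat the constructs shown here the same way. Possessive forms are left out because older Python versions do not accept them. The variable names are hypothetical.

```python
import re

subject = "<a><b>"

# Greedy: .+ takes as much as it can, then backtracks to the last ">".
greedy = re.match(r"<.+>", subject).group(0)

# Lazy: .+? takes as little as it can, stopping at the first ">".
lazy = re.match(r"<.+?>", subject).group(0)

# {n,m}: at least n, no more than m repetitions.
two_to_four_digits = re.fullmatch(r"\d{2,4}", "123") is not None
five_is_too_many = re.fullmatch(r"\d{2,4}", "12345") is None

print(greedy, lazy, two_to_four_digits, five_is_too_many)
```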
279: .
280: .
281: .SH "ANCHORS AND SIMPLE ASSERTIONS"
282: .rs
283: .sp
284: \eb word boundary
285: \eB not a word boundary
286: ^ start of subject
287: also after internal newline in multiline mode
288: \eA start of subject
289: $ end of subject
290: also before newline at end of subject
291: also before internal newline in multiline mode
292: \eZ end of subject
293: also before newline at end of subject
294: \ez end of subject
295: \eG first matching position in subject
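As a rough illustration of the anchors above, the sketch below uses Python's re module rather than PCRE itself. This is an assumption of convenience: the two engines agree on ^, $ and \b as used here, but Python's \Z behaves like PCRE's \z, so it is left out. The variable names are hypothetical.

```python
import re

subject = "one\ntwo\n"

# Without multiline mode, ^ matches only at the start of the subject.
starts_default = re.findall(r"^\w+", subject)

# In multiline mode, ^ also matches after each internal newline.
starts_multiline = re.findall(r"^\w+", subject, re.MULTILINE)

# $ also matches before a newline that ends the subject.
dollar_before_final_newline = re.search(r"two$", subject) is not None

# \b is a zero-width word boundary, so "concat" does not contribute a match.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

print(starts_default, starts_multiline, dollar_before_final_newline, whole_words)
```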
296: .
297: .
298: .SH "MATCH POINT RESET"
299: .rs
300: .sp
301: \eK reset start of match
302: .
303: .
304: .SH "ALTERNATION"
305: .rs
306: .sp
307: expr|expr|expr...
308: .
309: .
310: .SH "CAPTURING"
311: .rs
312: .sp
313: (...) capturing group
314: (?<name>...) named capturing group (Perl)
315: (?'name'...) named capturing group (Perl)
316: (?P<name>...) named capturing group (Python)
317: (?:...) non-capturing group
318: (?|...) non-capturing group; reset group numbers for
319: capturing groups in each alternative
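The group types above can be tried out with a short sketch. It uses Python's re module as a stand-in for PCRE, so only the Python-style named group (?P<name>...) is shown; the Perl-style spellings and (?|...) are not assumed to work there. The subject string and variable names are hypothetical.

```python
import re

# One named group, one non-capturing group, one numbered group.
pattern = r"(?P<year>\d{4})-(?:\d{2})-(\d{2})"
m = re.match(pattern, "2013-04-26")

year = m.group("year")   # named group, also available as group 1
day = m.group(2)         # the non-capturing group does not get a number
all_groups = m.groups()

print(year, day, all_groups)
```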
320: .
321: .
322: .SH "ATOMIC GROUPS"
323: .rs
324: .sp
325: (?>...) atomic, non-capturing group
326: .
327: .
328: .
329: .
330: .SH "COMMENT"
331: .rs
332: .sp
333: (?#....) comment (not nestable)
334: .
335: .
336: .SH "OPTION SETTING"
337: .rs
338: .sp
339: (?i) caseless
340: (?J) allow duplicate names
341: (?m) multiline
342: (?s) single line (dotall)
343: (?U) default ungreedy (lazy)
344: (?x) extended (ignore white space)
345: (?-...) unset option(s)
346: .sp
347: The following are recognized only at the start of a pattern or after one of the
348: newline-setting options with similar syntax:
349: .sp
1.1.1.4 ! misho 350: (*LIMIT_MATCH=d) set the match limit to d (decimal number)
! 351: (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
1.1 misho 352: (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
1.1.1.2 misho 353: (*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8)
354: (*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16)
1.1.1.4 ! misho 355: (*UTF32) set UTF-32 mode: 32-bit library (PCRE_UTF32)
! 356: (*UTF) set appropriate UTF mode for the library in use
1.1 misho 357: (*UCP) set PCRE_UCP (use Unicode properties for \ed etc)
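The inline letter options (?i), (?s) and (?x) listed above can be illustrated with a minimal sketch. It uses Python's re module, which is assumed to treat these three options the same way as PCRE when they appear at the start of a pattern; the (*...) items are PCRE-specific and are not shown.

```python
import re

# (?i): caseless matching.
caseless = re.match(r"(?i)hello", "HeLLo") is not None

# (?s): dotall, so "." also matches a newline.
dot_default = re.match(r"a.b", "a\nb") is not None
dot_all = re.match(r"(?s)a.b", "a\nb") is not None

# (?x): extended, so unescaped white space in the pattern is ignored.
extended = re.fullmatch(r"(?x) \d{3} - \d{4}", "555-1234") is not None

print(caseless, dot_default, dot_all, extended)
```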
358: .
359: .
360: .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
361: .rs
362: .sp
363: (?=...) positive look ahead
364: (?!...) negative look ahead
365: (?<=...) positive look behind
366: (?<!...) negative look behind
367: .sp
368: Each top-level branch of a look behind must be of a fixed length.
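The four assertions can be sketched as below, using Python's re module as a stand-in for PCRE. One known difference is an assumption to keep in mind: Python requires the whole look behind to have a single fixed width, whereas PCRE applies the fixed-length rule to each top-level branch separately. The sample strings and variable names are hypothetical.

```python
import re

prices = "10 USD, 20 EUR, 30 USD"
# Positive look ahead: digits that are followed by " USD".
usd_amounts = re.findall(r"\d+(?= USD)", prices)
# Negative look ahead: whole numbers that are not followed by " USD".
non_usd_amounts = re.findall(r"\b\d+\b(?! USD)", prices)

text = "$5 and 7 and $12"
# Positive look behind: digits preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", text)
# Negative look behind: numbers not preceded by a dollar sign.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)

# A look behind of variable length is rejected at compile time.
try:
    re.compile(r"(?<=a+)b")
    variable_length_accepted = True
except re.error:
    variable_length_accepted = False

print(usd_amounts, non_usd_amounts, after_dollar, not_after_dollar)
```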
369: .
370: .
371: .SH "BACKREFERENCES"
372: .rs
373: .sp
374: \en reference by number (can be ambiguous)
375: \egn reference by number
376: \eg{n} reference by number
377: \eg{-n} relative reference by number
378: \ek<name> reference by name (Perl)
379: \ek'name' reference by name (Perl)
380: \eg{name} reference by name (Perl)
381: \ek{name} reference by name (.NET)
382: (?P=name) reference by name (Python)
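A small sketch of backreferences follows. It uses Python's re module, which is assumed to share only the numbered form and the Python-style (?P=name) form with PCRE; the \eg and \ek forms above are not expected to work there. The sample strings are hypothetical.

```python
import re

# Reference by number: find a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(0)

# Reference by name: the closing quote must be the same as the opening one.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", 'say "hi" now').group(0)

# Mismatched quotes do not satisfy the backreference.
mismatched = re.search(r"(?P<q>['\"]).*?(?P=q)", "say \"hi' now")

print(doubled, quoted, mismatched)
```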
383: .
384: .
385: .SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
386: .rs
387: .sp
388: (?R) recurse whole pattern
389: (?n) call subpattern by absolute number
390: (?+n) call subpattern by relative number
391: (?-n) call subpattern by relative number
392: (?&name) call subpattern by name (Perl)
393: (?P>name) call subpattern by name (Python)
394: \eg<name> call subpattern by name (Oniguruma)
395: \eg'name' call subpattern by name (Oniguruma)
396: \eg<n> call subpattern by absolute number (Oniguruma)
397: \eg'n' call subpattern by absolute number (Oniguruma)
398: \eg<+n> call subpattern by relative number (PCRE extension)
399: \eg'+n' call subpattern by relative number (PCRE extension)
400: \eg<-n> call subpattern by relative number (PCRE extension)
401: \eg'-n' call subpattern by relative number (PCRE extension)
402: .
403: .
404: .SH "CONDITIONAL PATTERNS"
405: .rs
406: .sp
407: (?(condition)yes-pattern)
408: (?(condition)yes-pattern|no-pattern)
409: .sp
410: (?(n)... absolute reference condition
411: (?(+n)... relative reference condition
412: (?(-n)... relative reference condition
413: (?(<name>)... named reference condition (Perl)
414: (?('name')... named reference condition (Perl)
415: (?(name)... named reference condition (PCRE)
416: (?(R)... overall recursion condition
417: (?(Rn)... specific group recursion condition
418: (?(R&name)... specific recursion condition
419: (?(DEFINE)... define subpattern for reference
420: (?(assert)... assertion condition
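The absolute reference condition (?(n)... can be illustrated with a minimal sketch. It uses Python's re module, which is assumed to handle this one form like PCRE; the recursion, DEFINE and assertion conditions are PCRE features that are not shown. The pattern requires a closing ">" only when an opening "<" was captured by group 1.

```python
import re

# (<)? optionally captures "<"; (?(1)>) demands ">" only if group 1 matched.
pattern = r"(<)?\w+(?(1)>)"

bracketed = re.fullmatch(pattern, "<abc>") is not None
bare = re.fullmatch(pattern, "abc") is not None
unclosed = re.fullmatch(pattern, "<abc") is not None

print(bracketed, bare, unclosed)
```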
421: .
422: .
423: .SH "BACKTRACKING CONTROL"
424: .rs
425: .sp
426: The following act immediately when they are reached:
427: .sp
428: (*ACCEPT) force successful match
429: (*FAIL) force backtrack; synonym (*F)
1.1.1.2 misho 430: (*MARK:NAME) set name to be passed back; synonym (*:NAME)
1.1 misho 431: .sp
432: The following act only when a subsequent match failure causes a backtrack to
433: reach them. They all force a match failure, but they differ in what happens
434: afterwards. Those that advance the start-of-match point do so only if the
435: pattern is not anchored.
436: .sp
437: (*COMMIT) overall failure, no advance of starting point
438: (*PRUNE) advance to next starting character
1.1.1.2 misho 439: (*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
440: (*SKIP) advance to current matching position
441: (*SKIP:NAME) advance to position corresponding to an earlier
442: (*MARK:NAME); if not found, the (*SKIP) is ignored
1.1 misho 443: (*THEN) local failure, backtrack to next alternation
1.1.1.2 misho 444: (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
1.1 misho 445: .
446: .
447: .SH "NEWLINE CONVENTIONS"
448: .rs
449: .sp
450: These are recognized only at the very start of the pattern or after a
1.1.1.4 ! misho 451: (*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
1.1 misho 452: .sp
453: (*CR) carriage return only
454: (*LF) linefeed only
455: (*CRLF) carriage return followed by linefeed
456: (*ANYCRLF) all three of the above
457: (*ANY) any Unicode newline sequence
458: .
459: .
460: .SH "WHAT \eR MATCHES"
461: .rs
462: .sp
463: These are recognized only at the very start of the pattern or after a
1.1.1.2 misho 464: (*...) option that sets the newline convention or a UTF or UCP mode.
1.1 misho 465: .sp
466: (*BSR_ANYCRLF) CR, LF, or CRLF
467: (*BSR_UNICODE) any Unicode newline sequence
468: .
469: .
470: .SH "CALLOUTS"
471: .rs
472: .sp
473: (?C) callout
474: (?Cn) callout with data n
475: .
476: .
477: .SH "SEE ALSO"
478: .rs
479: .sp
480: \fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
481: \fBpcrematching\fP(3), \fBpcre\fP(3).
482: .
483: .
484: .SH AUTHOR
485: .rs
486: .sp
487: .nf
488: Philip Hazel
489: University Computing Service
490: Cambridge CB2 3QH, England.
491: .fi
492: .
493: .
494: .SH REVISION
495: .rs
496: .sp
497: .nf
1.1.1.4 ! misho 498: Last updated: 26 April 2013
! 499: Copyright (c) 1997-2013 University of Cambridge.
1.1 misho 500: .fi