.TH PCRESYNTAX 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
.rs
.sp
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation. This document contains just a quick-reference summary of the
syntax.
.
.
.SH "QUOTING"
.rs
.sp
  \ex          where x is non-alphanumeric is a literal x
  \eQ...\eE     treat enclosed characters as literal
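.sp
Python's \fBre\fP module is not PCRE and has no \eQ...\eE syntax, but its
\fBre.escape()\fP function does the same job by escaping every metacharacter
in a string. A minimal sketch, assuming Python 3 and made-up sample strings:
.sp
.nf
```python
import re

# A literal string that happens to contain regex metacharacters.
literal = "1+1=2 (really?)"

# re.escape() escapes every metacharacter, so the compiled pattern matches
# the string exactly, much as PCRE's quoting would.
pattern = re.compile(re.escape(literal))
exact = pattern.fullmatch("1+1=2 (really?)")

# Unescaped, "+" is a quantifier, so "1+1=2" does not match itself.
unquoted = re.fullmatch("1+1=2", "1+1=2")

print(exact is not None, unquoted is None)
```
.fi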
.
.
.SH "CHARACTERS"
.rs
.sp
  \ea          alarm, that is, the BEL character (hex 07)
  \ecx         "control-x", where x is any ASCII character
  \ee          escape (hex 1B)
  \ef          formfeed (hex 0C)
  \en          newline (hex 0A)
  \er          carriage return (hex 0D)
  \et          tab (hex 09)
  \eddd        character with octal code ddd, or backreference
  \exhh        character with hex code hh
  \ex{hhh..}   character with hex code hhh..
.
.
.SH "CHARACTER TYPES"
.rs
.sp
  .           any character except newline;
                in dotall mode, any character whatsoever
  \eC          one byte, even in UTF-8 mode (best avoided)
  \ed          a decimal digit
  \eD          a character that is not a decimal digit
  \eh          a horizontal whitespace character
  \eH          a character that is not a horizontal whitespace character
  \eN          a character that is not a newline
  \ep{\fIxx\fP}      a character with the \fIxx\fP property
  \eP{\fIxx\fP}      a character without the \fIxx\fP property
  \eR          a newline sequence
  \es          a whitespace character
  \eS          a character that is not a whitespace character
  \ev          a vertical whitespace character
  \eV          a character that is not a vertical whitespace character
  \ew          a "word" character
  \eW          a "non-word" character
  \eX          an extended Unicode sequence
.sp
In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
characters, even in UTF-8 mode. Setting the PCRE_UCP option makes them use
Unicode properties instead.
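.sp
The dot entry at the top of the list above can be checked with Python's
\fBre\fP module. It is not PCRE, but like PCRE with the usual LF newline
convention it stops the dot at a newline unless dotall mode (\fBre.DOTALL\fP,
or (?s) in the pattern) is on. A minimal sketch, assuming Python 3:
.sp
.nf
```python
import re

NL = chr(10)  # a newline (LF) character
subject = "a" + NL + "b"

# By default "." matches any character except a newline ...
default = re.fullmatch("a.b", subject)
# ... but in dotall mode (re.DOTALL, or "(?s)" in the pattern) it matches one too.
dotall = re.fullmatch("a.b", subject, re.DOTALL)
inline = re.fullmatch("(?s)a.b", subject)
ordinary = re.fullmatch("a.b", "a-b")

print(default, dotall is not None, inline is not None, ordinary is not None)
```
.fi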
.
.
.SH "GENERAL CATEGORY PROPERTIES FOR \ep AND \eP"
.rs
.sp
  C           Other
  Cc          Control
  Cf          Format
  Cn          Unassigned
  Co          Private use
  Cs          Surrogate
.sp
  L           Letter
  Ll          Lower case letter
  Lm          Modifier letter
  Lo          Other letter
  Lt          Title case letter
  Lu          Upper case letter
  L&          Ll, Lu, or Lt
.sp
  M           Mark
  Mc          Spacing mark
  Me          Enclosing mark
  Mn          Non-spacing mark
.sp
  N           Number
  Nd          Decimal number
  Nl          Letter number
  No          Other number
.sp
  P           Punctuation
  Pc          Connector punctuation
  Pd          Dash punctuation
  Pe          Close punctuation
  Pf          Final punctuation
  Pi          Initial punctuation
  Po          Other punctuation
  Ps          Open punctuation
.sp
  S           Symbol
  Sc          Currency symbol
  Sk          Modifier symbol
  Sm          Mathematical symbol
  So          Other symbol
.sp
  Z           Separator
  Zl          Line separator
  Zp          Paragraph separator
  Zs          Space separator
.
.
.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep AND \eP"
.rs
.sp
  Xan         Alphanumeric: union of properties L and N
  Xps         POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp         Perl space: property Z or tab, NL, FF, CR
  Xwd         Perl word: property Xan or underscore
.
.
.SH "SCRIPT NAMES FOR \ep AND \eP"
.rs
.sp
Arabic,
Armenian,
Avestan,
Balinese,
Bamum,
Bengali,
Bopomofo,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Cham,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Egyptian_Hieroglyphs,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Imperial_Aramaic,
Inherited,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Javanese,
Kaithi,
Kannada,
Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Lao,
Latin,
Lepcha,
Limbu,
Linear_B,
Lisu,
Lycian,
Lydian,
Malayalam,
Meetei_Mayek,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Rejang,
Runic,
Samaritan,
Saurashtra,
Shavian,
Sinhala,
Sundanese,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tai_Tham,
Tai_Viet,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Vai,
Yi.
.
.
.SH "CHARACTER CLASSES"
.rs
.sp
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set
.sp
  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       whitespace
  upper       upper case letter
  word        same as \ew
  xdigit      hexadecimal digit
.sp
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\eQ...\eE inside a character class.
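.sp
Python's \fBre\fP module does not recognize the POSIX [[:xxx:]] names, but it
writes positive classes, negative classes and ranges the same way as PCRE. A
minimal sketch, assuming Python 3, with made-up sample strings:
.sp
.nf
```python
import re

# Positive class with ranges: one or more lowercase hex digits.
hex_run = re.compile("[0-9a-f]+")
# Negative class: one or more characters other than a comma.
field = re.compile("[^,]+")

print(hex_run.findall("ff, 1a, zz, 0c"))
print(field.findall("alpha,beta,,gamma"))
```
.fi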
.
.
.SH "QUANTIFIERS"
.rs
.sp
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
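.sp
Greedy and lazy repeats are spelled the same way in Python's \fBre\fP module,
which makes it handy for seeing the difference. Possessive forms are left out
because \fBre\fP accepts them only from Python 3.11 onward. A minimal sketch
with illustrative inputs:
.sp
.nf
```python
import re

html = "<b>bold</b>"

# Greedy ".+" runs to the last ">", lazy ".+?" stops at the first one.
greedy = re.match("<.+>", html).group()
lazy = re.match("<.+?>", html).group()

# Counted repeats: {2,3} takes as many as allowed, {2,3}? as few as possible.
most = re.match("a{2,3}", "aaaa").group()
fewest = re.match("a{2,3}?", "aaaa").group()

print(greedy, lazy, most, fewest)
```
.fi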
.
.
.SH "ANCHORS AND SIMPLE ASSERTIONS"
.rs
.sp
  \eb          word boundary
  \eB          not a word boundary
  ^           start of subject
                also after internal newline in multiline mode
  \eA          start of subject
  $           end of subject
                also before newline at end of subject
                also before internal newline in multiline mode
  \eZ          end of subject
                also before newline at end of subject
  \ez          end of subject
  \eG          first matching position in subject
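.sp
For the cases below, Python's \fBre\fP module treats ^ and $ as described
above, so it can serve as a quick check. Beware that Python's \eZ matches only
at the very end of the subject, like PCRE's \ez rather than PCRE's \eZ. A
minimal sketch, assuming Python 3:
.sp
.nf
```python
import re

NL = chr(10)  # a newline (LF) character
text = "one" + NL + "two" + NL

# "^" matches at the start of the subject, and after internal newlines only
# in multiline mode.
start_plain = re.search("^two", text)
start_multi = re.search("^two", text, re.MULTILINE)

# "$" also matches before a newline that ends the subject ...
end_plain = re.search("two$", text)
# ... but before an internal newline only in multiline mode.
inner_plain = re.search("one$", text)
inner_multi = re.search("one$", text, re.MULTILINE)

print(start_plain, start_multi is not None)
print(end_plain is not None, inner_plain, inner_multi is not None)
```
.fi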
.
.
.SH "MATCH POINT RESET"
.rs
.sp
  \eK          reset start of match
.
.
.SH "ALTERNATION"
.rs
.sp
  expr|expr|expr...
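.sp
PCRE's standard matching algorithm, like Perl's, tries alternatives from left
to right and takes the first one that lets the rest of the pattern match, not
the longest one. Python's \fBre\fP module backtracks the same way, so it is
used here for a minimal sketch with invented patterns:
.sp
.nf
```python
import re

# The first alternative that allows an overall match wins, even when a later
# alternative would have matched more text.
first_wins = re.match("ab|abc", "abcd").group()
longer_first = re.match("abc|ab", "abcd").group()

# Parentheses limit the alternation to part of the pattern.
animal = re.compile("(cat|dog)s?")

print(first_wins, longer_first)
print(animal.fullmatch("dogs") is not None, animal.fullmatch("cows") is None)
```
.fi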
.
.
.SH "CAPTURING"
.rs
.sp
  (...)           capturing group
  (?<name>...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P<name>...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                    capturing groups in each alternative
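.sp
Python's \fBre\fP module accepts the Python-style (?P<name>...) form and the
(?:...) group listed above, which is enough to show how groups are numbered.
A minimal sketch, assuming Python 3; the date pattern is only an example:
.sp
.nf
```python
import re

# Two named capturing groups and one non-capturing group for the optional day.
date = re.compile("(?P<year>[0-9]{4})-(?P<month>[0-9]{2})(?:-[0-9]{2})?")

m = date.match("2010-11-21")
year = m.group("year")
month = m.group(2)    # named groups also get numbers, counted from the left
groups = m.groups()   # the (?:...) group is not included

print(year, month, groups)
```
.fi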
.
.
.SH "ATOMIC GROUPS"
.rs
.sp
  (?>...)     atomic, non-capturing group
.
.
.SH "COMMENT"
.rs
.sp
  (?#....)    comment (not nestable)
.
.
.SH "OPTION SETTING"
.rs
.sp
  (?i)        caseless
  (?J)        allow duplicate names
  (?m)        multiline
  (?s)        single line (dotall)
  (?U)        default ungreedy (lazy)
  (?x)        extended (ignore white space)
  (?-...)     unset option(s)
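.sp
Python's \fBre\fP module uses the same letters for (?i), (?m), (?s) and (?x),
but has no counterpart to (?J) or (?U), and it needs the scoped form
(?-i:...), which PCRE also supports, to change an option for only part of a
pattern. A minimal sketch, assuming Python 3.7 or later:
.sp
.nf
```python
import re

# (?i) at the start of the pattern makes the whole match caseless.
caseless = re.fullmatch("(?i)hello", "HeLLo")

# (?x) ignores unescaped white space in the pattern, and "#" starts a comment.
spaced = re.compile("(?x) [0-9]+  -  [0-9]+   # digits, a dash, digits")

# (?-i:...) switches caseless matching off again for the enclosed part only.
mixed = re.compile("(?i)ab(?-i:c)")

print(caseless is not None, spaced.fullmatch("12-34") is not None)
print(mixed.fullmatch("ABc") is not None, mixed.fullmatch("ABC") is None)
```
.fi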
.sp
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
.sp
  (*NO_START_OPT)  no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)          set UTF-8 mode (PCRE_UTF8)
  (*UCP)           set PCRE_UCP (use Unicode properties for \ed etc.)
.
.
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
.rs
.sp
  (?=...)     positive look ahead
  (?!...)     negative look ahead
  (?<=...)    positive look behind
  (?<!...)    negative look behind
.sp
Each top-level branch of a look behind must be of a fixed length.
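.sp
All four assertions use the same syntax in Python's \fBre\fP module. Python is
stricter about lookbehinds, though: the whole assertion must have a single
fixed length, so a pattern such as (?<=ab|c), which PCRE accepts, is rejected
by \fBre\fP. A minimal sketch, assuming Python 3, with invented sample text:
.sp
.nf
```python
import re

# Positive lookbehind: digits preceded by "USD ", which is not consumed.
amounts = re.findall("(?<=USD )[0-9]+", "USD 100, EUR 200, USD 300")

# Negative lookahead: "foo" only where it is not followed by "bar".
foo_hits = [m.start() for m in re.finditer("foo(?!bar)", "foobar foobaz")]

# A lookbehind whose length can vary is rejected when the pattern is compiled.
try:
    re.compile("(?<=a+)b")
    variable_ok = True
except re.error:
    variable_ok = False

print(amounts, foo_hits, variable_ok)
```
.fi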
.
.
.SH "BACKREFERENCES"
.rs
.sp
  \en          reference by number (can be ambiguous)
  \egn         reference by number
  \eg{n}       reference by number
  \eg{-n}      relative reference by number
  \ek<name>    reference by name (Perl)
  \ek'name'    reference by name (Perl)
  \eg{name}    reference by name (Perl)
  \ek{name}    reference by name (.NET)
  (?P=name)   reference by name (Python)
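.sp
Of the forms listed above, Python's \fBre\fP module accepts only the plain
numbered form and the Python-style (?P=name). A minimal sketch, assuming
Python 3, that uses the named form to find a doubled word; the backreference
matches the text the group captured, not the group's pattern:
.sp
.nf
```python
import re

# (?P<word>...) captures a word; (?P=word) must then match the same text.
doubled = re.compile("(?P<word>[a-z]+) (?P=word)")

m = doubled.search("we saw the the cat")
repeated = m.group("word") if m else None
no_repeat = doubled.search("we saw the cat")

print(repeated, no_repeat)
```
.fi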
.
.
.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
.rs
.sp
  (?R)        recurse whole pattern
  (?n)        call subpattern by absolute number
  (?+n)       call subpattern by relative number
  (?-n)       call subpattern by relative number
  (?&name)    call subpattern by name (Perl)
  (?P>name)   call subpattern by name (Python)
  \eg<name>    call subpattern by name (Oniguruma)
  \eg'name'    call subpattern by name (Oniguruma)
  \eg<n>       call subpattern by absolute number (Oniguruma)
  \eg'n'       call subpattern by absolute number (Oniguruma)
  \eg<+n>      call subpattern by relative number (PCRE extension)
  \eg'+n'      call subpattern by relative number (PCRE extension)
  \eg<-n>      call subpattern by relative number (PCRE extension)
  \eg'-n'      call subpattern by relative number (PCRE extension)
.
.
.SH "CONDITIONAL PATTERNS"
.rs
.sp
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)
.sp
  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(<name>)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
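.sp
Python's \fBre\fP module supports conditions that test a group by number or by
name, but not the recursion, DEFINE or assertion conditions. A minimal sketch,
assuming Python 3, that requires a closing > only when an opening < was
matched:
.sp
.nf
```python
import re

# Group 1 captures an optional "<"; (?(1)>) then demands ">" only if group 1
# took part in the match. There is no no-pattern, so otherwise nothing is needed.
tag = re.compile("(<)?[a-z]+(?(1)>)")

results = {s: tag.fullmatch(s) is not None for s in ["<abc>", "abc", "<abc", "abc>"]}
print(results)
```
.fi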
.
.
.SH "BACKTRACKING CONTROL"
.rs
.sp
The following act as soon as they are reached:
.sp
  (*ACCEPT)   force successful match
  (*FAIL)     force backtrack; synonym (*F)
.sp
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
.sp
  (*COMMIT)   overall failure, no advance of starting point
  (*PRUNE)    advance to next starting character
  (*SKIP)     advance start to current matching position
  (*THEN)     local failure, backtrack to next alternation
.
.
.SH "NEWLINE CONVENTIONS"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*BSR_...) or (*UTF8) or (*UCP) option.
.sp
  (*CR)       carriage return only
  (*LF)       linefeed only
  (*CRLF)     carriage return followed by linefeed
  (*ANYCRLF)  all three of the above
  (*ANY)      any Unicode newline sequence
.
.
.SH "WHAT \eR MATCHES"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or UTF-8 or UCP mode.
.sp
  (*BSR_ANYCRLF)   CR, LF, or CRLF
  (*BSR_UNICODE)   any Unicode newline sequence
.
.
.SH "CALLOUTS"
.rs
.sp
  (?C)        callout
  (?Cn)       callout with data n
! 453: (?Cn) callout with data n
! 454: .
! 455: .
! 456: .SH "SEE ALSO"
! 457: .rs
! 458: .sp
! 459: \fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
! 460: \fBpcrematching\fP(3), \fBpcre\fP(3).
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 21 November 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi