Annotation of embedaddon/pcre/doc/pcresyntax.3, revision 1.1.1.2

.TH PCRESYNTAX 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
.rs
.sp
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation. This document contains a quick-reference summary of the syntax.
.
.
.SH "QUOTING"
.rs
.sp
  \ex         where x is non-alphanumeric is a literal x
  \eQ...\eE    treat enclosed characters as literal
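.sp
Because \ex is a literal x for any non-alphanumeric x, a piece of text can also
be made literal by escaping each of its non-alphanumeric characters. Unlike
\eQ...\eE, this still works when the text itself contains \eE. The C sketch
below uses a hypothetical function name, \fBquote_literal\fP. It escapes every
ASCII character that is not a letter or digit and copies bytes of hex 80 and
above unchanged, so that UTF-8 characters are never split.
.sp
.nf
```c
  #include <stddef.h>

  /* True for ASCII letters and digits.  These must not be escaped,
     because an escaped letter or digit may have a special meaning. */
  static int is_ascii_alnum(unsigned char c)
  {
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z');
  }

  /* Hypothetical helper: copy src to dst, putting a backslash (0x5C)
     before each ASCII non-alphanumeric character.  dst needs room for
     2 * strlen(src) + 1 bytes.  Returns the length of the result. */
  size_t quote_literal(const char *src, char *dst)
  {
    size_t n = 0;
    for (; *src != 0; src++) {
      unsigned char c = (unsigned char)*src;
      if (c < 0x80 && !is_ascii_alnum(c))
        dst[n++] = 0x5C;
      dst[n++] = (char)c;
    }
    dst[n] = 0;
    return n;
  }
```
.fi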
.
.
.SH "CHARACTERS"
.rs
.sp
  \ea         alarm, that is, the BEL character (hex 07)
  \ecx        "control-x", where x is any ASCII character
  \ee         escape (hex 1B)
  \ef         formfeed (hex 0C)
  \en         newline (hex 0A)
  \er         carriage return (hex 0D)
  \et         tab (hex 09)
  \eddd       character with octal code ddd, or backreference
  \exhh       character with hex code hh
  \ex{hhh..}  character with hex code hhh..
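.sp
The value of \ecx can be worked out directly. As \fBpcrepattern\fP explains, a
lower case x is first converted to upper case and then bit 6 (hex 40) is
inverted, so \ecz is hex 1A, \ec; is hex 7B, and \ec? is hex 7F. The C sketch
below follows that rule; the function name \fBcontrol_char\fP is hypothetical,
and it returns -1 for anything outside ASCII.
.sp
.nf
```c
  /* Hypothetical helper: the character produced by "backslash c x"
     for an ASCII character x, or -1 for anything outside ASCII. */
  int control_char(int x)
  {
    if (x < 0 || x > 127)
      return -1;
    if (x >= 'a' && x <= 'z')
      x -= 0x20;                /* lower case letter to upper case */
    return x ^ 0x40;            /* invert bit 6 */
  }
```
.fi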
.
.
.SH "CHARACTER TYPES"
.rs
.sp
  .          any character except newline;
               in dotall mode, any character whatsoever
  \eC         one data unit, even in UTF mode (best avoided)
  \ed         a decimal digit
  \eD         a character that is not a decimal digit
  \eh         a horizontal whitespace character
  \eH         a character that is not a horizontal whitespace character
  \eN         a character that is not a newline
  \ep{\fIxx\fP}     a character with the \fIxx\fP property
  \eP{\fIxx\fP}     a character without the \fIxx\fP property
  \eR         a newline sequence
  \es         a whitespace character
  \eS         a character that is not a whitespace character
  \ev         a vertical whitespace character
  \eV         a character that is not a vertical whitespace character
  \ew         a "word" character
  \eW         a "non-word" character
  \eX         an extended Unicode sequence
.sp
In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
characters, even in a UTF mode. However, this can be changed by setting the
PCRE_UCP option.
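.sp
In the default (non-UCP) mode these sets are small enough to write out. The C
sketch below assumes PCRE's default character tables and takes a code point;
the function names are hypothetical and this is not PCRE's own code. It shows
two details of this version: \es does not match VT (hex 0B) although \ev does,
and \eh and \ev include some non-ASCII characters, such as NBSP (hex A0) and
NEL (hex 85), even without PCRE_UCP. Values above hex FF occur only in a UTF
mode or with the 16-bit library.
.sp
.nf
```c
  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical helpers for a code point c, default (non-UCP) mode.
     The digit, space and word tests are ASCII only. */
  bool match_d(uint32_t c) { return c >= '0' && c <= '9'; }

  bool match_s(uint32_t c)      /* tab, LF, FF, CR, space; not VT */
  {
    return c == 0x09 || c == 0x0A || c == 0x0C || c == 0x0D || c == 0x20;
  }

  bool match_w(uint32_t c)      /* ASCII letters, digits, underscore */
  {
    return match_d(c) || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           c == '_';
  }

  bool match_h(uint32_t c)      /* horizontal space, including non-ASCII */
  {
    return c == 0x09 || c == 0x20 || c == 0xA0 || c == 0x1680 ||
           c == 0x180E || (c >= 0x2000 && c <= 0x200A) || c == 0x202F ||
           c == 0x205F || c == 0x3000;
  }

  bool match_v(uint32_t c)      /* LF, VT, FF, CR, NEL, LS, PS */
  {
    return (c >= 0x0A && c <= 0x0D) || c == 0x85 || c == 0x2028 ||
           c == 0x2029;
  }
```
.fi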
.
.
.SH "GENERAL CATEGORY PROPERTIES FOR \ep AND \eP"
.rs
.sp
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate
.sp
  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&         Ll, Lu, or Lt
.sp
  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark
.sp
  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number
.sp
  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation
.sp
  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol
.sp
  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
.
.
.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep AND \eP"
.rs
.sp
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, FF, CR
  Xwd        Perl word: property Xan or underscore
.
.
.SH "SCRIPT NAMES FOR \ep AND \eP"
.rs
.sp
Arabic,
Armenian,
Avestan,
Balinese,
Bamum,
Bengali,
Bopomofo,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Cham,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Egyptian_Hieroglyphs,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Imperial_Aramaic,
Inherited,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Javanese,
Kaithi,
Kannada,
Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Lao,
Latin,
Lepcha,
Limbu,
Linear_B,
Lisu,
Lycian,
Lydian,
Malayalam,
Meetei_Mayek,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Rejang,
Runic,
Samaritan,
Saurashtra,
Shavian,
Sinhala,
Sundanese,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tai_Tham,
Tai_Viet,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Vai,
Yi.
.
.
.SH "CHARACTER CLASSES"
.rs
.sp
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (x and y may be given as hex escapes)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set
.sp
  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       whitespace
  upper       upper case letter
  word        same as \ew
  xdigit      hexadecimal digit
.sp
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\eQ...\eE inside a character class.
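.sp
Without PCRE_UCP, the named sets can be approximated with the C library's
character tests in the "C" locale. The sketch below assumes that PCRE's
default tables agree with those tests for ASCII characters; the lookup
function \fBposix_class_match\fP is hypothetical and returns 1 or 0 for a byte,
or -1 for an unknown name. Note that [:space:] includes VT (hex 0B), which \es
does not match in this version of PCRE.
.sp
.nf
```c
  #include <ctype.h>
  #include <string.h>

  /* Word characters, as for the w escape: letters, digits, underscore. */
  static int is_word(int c) { return isalnum(c) || c == '_'; }

  static const struct {
    const char *name;
    int (*test)(int);
  } posix_sets[] = {
    { "alnum", isalnum }, { "alpha", isalpha }, { "blank", isblank },
    { "cntrl", iscntrl }, { "digit", isdigit }, { "graph", isgraph },
    { "lower", islower }, { "print", isprint }, { "punct", ispunct },
    { "space", isspace }, { "upper", isupper }, { "word", is_word },
    { "xdigit", isxdigit },
  };

  /* Hypothetical lookup: 1 if byte c is in the named POSIX set (name given
     without the [: :] brackets), 0 if not, -1 if the name is unknown.
     Assumes the C locale, the default until a program calls setlocale(). */
  int posix_class_match(const char *name, int c)
  {
    int ascii = c >= 0 && c <= 127;
    if (strcmp(name, "ascii") == 0)
      return ascii;
    for (size_t i = 0; i < sizeof posix_sets / sizeof posix_sets[0]; i++)
      if (strcmp(name, posix_sets[i].name) == 0)
        return ascii && posix_sets[i].test(c) != 0;
    return -1;
  }
```
.fi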
.
.
.SH "QUANTIFIERS"
.rs
.sp
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
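.sp
The greedy, possessive, and lazy forms differ only when a later part of the
pattern needs characters that the repeat could have taken. The C sketch below
is a deliberately tiny matcher, not PCRE's algorithm: it handles one literal
character repeated {min,max} times (a negative max meaning no upper limit),
followed by a literal tail, anchored at the start of the subject. The names
are hypothetical.
.sp
.nf
```c
  #include <stddef.h>
  #include <string.h>

  enum quant_mode { GREEDY, LAZY, POSSESSIVE };

  /* Does the literal string tail occur at subject + pos? */
  static int tail_matches(const char *subject, size_t pos, const char *tail)
  {
    return strncmp(subject + pos, tail, strlen(tail)) == 0;
  }

  /* Hypothetical matcher for c{min,max} followed by tail, anchored at the
     start of subject; max < 0 means unbounded.  Returns the number of
     repeats used, or -1 if there is no match. */
  int match_repeat(const char *subject, char c, int min, int max,
                   enum quant_mode mode, const char *tail)
  {
    int avail = 0;              /* repeats available in the subject */
    while (subject[avail] != 0 && subject[avail] == c &&
           (max < 0 || avail < max))
      avail++;
    if (avail < min)
      return -1;

    switch (mode) {
    case GREEDY:                /* take all, then give back one at a time */
      for (int k = avail; k >= min; k--)
        if (tail_matches(subject, (size_t)k, tail))
          return k;
      return -1;
    case LAZY:                  /* take the minimum, then add one at a time */
      for (int k = min; k <= avail; k++)
        if (tail_matches(subject, (size_t)k, tail))
          return k;
      return -1;
    case POSSESSIVE:            /* take all and never give any back */
      return tail_matches(subject, (size_t)avail, tail) ? avail : -1;
    }
    return -1;
  }
```
.fi
.sp
For the subject "aaab" and the pattern a*a, the greedy form gives back one
character and succeeds after two repeats, the lazy form a*?a succeeds after
none, and the possessive form a*+a fails because the repeat never gives
anything back.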
.
.
.SH "ANCHORS AND SIMPLE ASSERTIONS"
.rs
.sp
  \eb          word boundary
  \eB          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \eA          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \eZ          end of subject
               also before newline at end of subject
  \ez          end of subject
  \eG          first matching position in subject
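.sp
In the default ASCII mode, \eb is true at a position where the characters on
either side differ in whether they are \ew characters, with the start and end
of the subject counting as non-word; \eB is the opposite. A C sketch of that
test, using a hypothetical function name and ignoring PCRE_UCP:
.sp
.nf
```c
  #include <stdbool.h>
  #include <stddef.h>

  static bool is_word_char(unsigned char c)   /* ASCII letters, digits, _ */
  {
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') || c == '_';
  }

  /* Hypothetical helper: true if there is a word boundary before
     subject[pos], for a subject of length len and 0 <= pos <= len. */
  bool is_word_boundary(const char *subject, size_t len, size_t pos)
  {
    bool before = pos > 0 && is_word_char((unsigned char)subject[pos - 1]);
    bool after = pos < len && is_word_char((unsigned char)subject[pos]);
    return before != after;
  }
```
.fi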
.
.
.SH "MATCH POINT RESET"
.rs
.sp
  \eK          reset start of match
.
.
.SH "ALTERNATION"
.rs
.sp
  expr|expr|expr...
.
.
.SH "CAPTURING"
.rs
.sp
  (...)           capturing group
  (?<name>...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P<name>...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                   capturing groups in each alternative
.
.
.SH "ATOMIC GROUPS"
.rs
.sp
  (?>...)         atomic, non-capturing group
.
.
.
.
.SH "COMMENT"
.rs
.sp
  (?#....)        comment (not nestable)
.
.
.SH "OPTION SETTING"
.rs
.sp
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
.sp
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
.sp
  (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
  (*UCP)          set PCRE_UCP (use Unicode properties for \ed etc.)
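.sp
(*UTF8) and (*UTF16) determine how many data units one character occupies,
and a data unit is what \eC matches. The C sketch below (hypothetical function
names) counts the UTF-8 bytes or UTF-16 code units needed for a code point,
returning 0 for surrogates and for values above hex 10FFFF, which are not
valid characters.
.sp
.nf
```c
  #include <stdint.h>

  /* Hypothetical helpers: number of data units needed for code point cp,
     or 0 for a surrogate or a value above 0x10FFFF. */
  int utf8_units(uint32_t cp)             /* bytes, 8-bit library */
  {
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return cp <= 0x10FFFF ? 4 : 0;
  }

  int utf16_units(uint32_t cp)            /* 16-bit units, 16-bit library */
  {
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) return 1;
    return cp <= 0x10FFFF ? 2 : 0;
  }
```
.fi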
.
.
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
.rs
.sp
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?<=...)        positive look behind
  (?<!...)        negative look behind
.sp
Each top-level branch of a look behind must be of a fixed length.
.
.
.SH "BACKREFERENCES"
.rs
.sp
  \en              reference by number (can be ambiguous)
  \egn             reference by number
  \eg{n}           reference by number
  \eg{-n}          relative reference by number
  \ek<name>        reference by name (Perl)
  \ek'name'        reference by name (Perl)
  \eg{name}        reference by name (Perl)
  \ek{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
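.sp
A relative reference counts back from the point where it is written: \eg{-1}
refers to the most recently started capturing group before it, so in
(a)(b)\eg{-1} it means group 2, and \eg{-2} would mean group 1. The C sketch
below converts a relative number to an absolute one. It is a simplification:
it counts each ( that is not followed by ? and skips escaped characters, so
character classes, named groups, and other capturing (?...) forms are not
handled. The function names are hypothetical.
.sp
.nf
```c
  #include <stddef.h>

  /* Count the capturing groups opened before pattern[pos], where
     pos <= strlen(pattern).  Simplified: a "(" not followed by "?" counts
     as capturing, and the character after a backslash (0x5C) is skipped. */
  int groups_opened_before(const char *pattern, size_t pos)
  {
    int count = 0;
    for (size_t i = 0; i < pos; i++) {
      if (pattern[i] == 0x5C) {
        i++;                    /* skip the escaped character */
        continue;
      }
      if (pattern[i] == '(' && pattern[i + 1] != '?')
        count++;
    }
    return count;
  }

  /* Hypothetical helper: absolute group number for a relative back
     reference with relative number n (n >= 1) written at pattern[pos],
     or -1 if no such group exists. */
  int resolve_relative(const char *pattern, size_t pos, int n)
  {
    int group = groups_opened_before(pattern, pos) - n + 1;
    return group >= 1 ? group : -1;
  }
```
.fi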
.
.
.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
.rs
.sp
  (?R)            recurse whole pattern
  (?n)            call subpattern by absolute number
  (?+n)           call subpattern by relative number
  (?-n)           call subpattern by relative number
  (?&name)        call subpattern by name (Perl)
  (?P>name)       call subpattern by name (Python)
  \eg<name>        call subpattern by name (Oniguruma)
  \eg'name'        call subpattern by name (Oniguruma)
  \eg<n>           call subpattern by absolute number (Oniguruma)
  \eg'n'           call subpattern by absolute number (Oniguruma)
  \eg<+n>          call subpattern by relative number (PCRE extension)
  \eg'+n'          call subpattern by relative number (PCRE extension)
  \eg<-n>          call subpattern by relative number (PCRE extension)
  \eg'-n'          call subpattern by relative number (PCRE extension)
.
.
.SH "CONDITIONAL PATTERNS"
.rs
.sp
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)
.sp
  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(<name>)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
.
.
.SH "BACKTRACKING CONTROL"
.rs
.sp
The following act as soon as they are reached:
.sp
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
.sp
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
.sp
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
.
.
.SH "NEWLINE CONVENTIONS"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16) or (*UCP) option.
.sp
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
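.sp
The newline convention decides where multiline ^ and $ find line boundaries.
As a sketch, the C code below lists the offsets at which ^ can match in
multiline mode: offset 0, and after each newline that does not end the
subject. It assumes an 8-bit, non-UTF subject, in which (*ANY) recognizes CR,
LF, CRLF, VT, FF, and NEL (hex 85); the names are hypothetical.
.sp
.nf
```c
  #include <stddef.h>

  enum newline_conv { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

  /* Length of the newline that starts at s[i] (0 if none), for an 8-bit
     non-UTF subject s of length len, where i < len. */
  static size_t newline_len(const unsigned char *s, size_t len, size_t i,
                            enum newline_conv conv)
  {
    unsigned char c = s[i];
    int crlf = c == 0x0D && i + 1 < len && s[i + 1] == 0x0A;
    switch (conv) {
    case NL_CR:      return c == 0x0D;
    case NL_LF:      return c == 0x0A;
    case NL_CRLF:    return crlf ? 2 : 0;
    case NL_ANYCRLF: return crlf ? 2 : (c == 0x0D || c == 0x0A);
    case NL_ANY:     return crlf ? 2 : ((c >= 0x0A && c <= 0x0D) || c == 0x85);
    }
    return 0;
  }

  /* Hypothetical helper: store in starts[] every offset where ^ can match
     in multiline mode and return their number.  starts needs room for
     len + 1 entries. */
  size_t line_starts(const unsigned char *s, size_t len,
                     enum newline_conv conv, size_t *starts)
  {
    size_t n = 0, i = 0;
    starts[n++] = 0;
    while (i < len) {
      size_t nl = newline_len(s, len, i, conv);
      if (nl == 0) {
        i++;
        continue;
      }
      i += nl;
      if (i < len)              /* no line start after a final newline */
        starts[n++] = i;
    }
    return n;
  }
```
.fi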
.
.
.SH "WHAT \eR MATCHES"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
.sp
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
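.sp
\eR takes CRLF as a single unit and does not backtrack into it. The C sketch
below returns the length that \eR would match at an offset in an 8-bit,
non-UTF subject under either setting. The names are hypothetical; in UTF mode,
(*BSR_UNICODE) would also accept U+2028 and U+2029.
.sp
.nf
```c
  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical helper: length of the newline sequence matched at s[i]
     by the R escape, or 0 if there is none.  unicode selects the
     (*BSR_UNICODE) set; otherwise the (*BSR_ANYCRLF) set is used. */
  size_t bsr_match_len(const unsigned char *s, size_t len, size_t i,
                       bool unicode)
  {
    if (i >= len)
      return 0;
    if (s[i] == 0x0D && i + 1 < len && s[i + 1] == 0x0A)
      return 2;                 /* CRLF, always taken as one unit */
    if (s[i] == 0x0D || s[i] == 0x0A)
      return 1;                 /* lone CR or LF */
    if (unicode && (s[i] == 0x0B || s[i] == 0x0C || s[i] == 0x85))
      return 1;                 /* VT, FF, NEL */
    return 0;
  }
```
.fi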
.
.
.SH "CALLOUTS"
.rs
.sp
  (?C)      callout
  (?Cn)     callout with data n
.
.
.SH "SEE ALSO"
.rs
.sp
\fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
\fBpcrematching\fP(3), \fBpcre\fP(3).
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 10 January 2012
Copyright (c) 1997-2012 University of Cambridge.
.fi
