Annotation of embedaddon/pcre/doc/pcresyntax.3, revision 1.1.1.5

1.1.1.5 ! misho       1: .TH PCRESYNTAX 3 "12 November 2013" "PCRE 8.34"
1.1       misho       2: .SH NAME
                      3: PCRE - Perl-compatible regular expressions
                      4: .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
                      5: .rs
                      6: .sp
                      7: The full syntax and semantics of the regular expressions that are supported by
                      8: PCRE are described in the
                      9: .\" HREF
                     10: \fBpcrepattern\fP
                     11: .\"
1.1.1.2   misho      12: documentation. This document contains a quick-reference summary of the syntax.
1.1       misho      13: .
                     14: .
                     15: .SH "QUOTING"
                     16: .rs
                     17: .sp
                     18:   \ex         where x is non-alphanumeric is a literal x
                     19:   \eQ...\eE    treat enclosed characters as literal
                     20: .
                     21: .
                     22: .SH "CHARACTERS"
                     23: .rs
                     24: .sp
                     25:   \ea         alarm, that is, the BEL character (hex 07)
                     26:   \ecx        "control-x", where x is any ASCII character
                     27:   \ee         escape (hex 1B)
1.1.1.3   misho      28:   \ef         form feed (hex 0C)
1.1       misho      29:   \en         newline (hex 0A)
                     30:   \er         carriage return (hex 0D)
                     31:   \et         tab (hex 09)
1.1.1.5 ! misho      32:   \e0dd       character with octal code 0dd
1.1       misho      33:   \eddd       character with octal code ddd, or backreference
1.1.1.5 ! misho      34:   \eo{ddd..}  character with octal code ddd..
1.1       misho      35:   \exhh       character with hex code hh
                     36:   \ex{hhh..}  character with hex code hhh..
1.1.1.5 ! misho      37: .sp
        !            38: Note that \e0dd is always an octal code, and that \e8 and \e9 are the literal
        !            39: characters "8" and "9".
1.1       misho      40: .
                     41: .
                     42: .SH "CHARACTER TYPES"
                     43: .rs
                     44: .sp
                     45:   .          any character except newline;
                     46:                in dotall mode, any character whatsoever
1.1.1.2   misho      47:   \eC         one data unit, even in UTF mode (best avoided)
1.1       misho      48:   \ed         a decimal digit
                     49:   \eD         a character that is not a decimal digit
1.1.1.3   misho      50:   \eh         a horizontal white space character
                     51:   \eH         a character that is not a horizontal white space character
1.1       misho      52:   \eN         a character that is not a newline
                     53:   \ep{\fIxx\fP}     a character with the \fIxx\fP property
                     54:   \eP{\fIxx\fP}     a character without the \fIxx\fP property
                     55:   \eR         a newline sequence
1.1.1.3   misho      56:   \es         a white space character
                     57:   \eS         a character that is not a white space character
                     58:   \ev         a vertical white space character
                     59:   \eV         a character that is not a vertical white space character
1.1       misho      60:   \ew         a "word" character
                     61:   \eW         a "non-word" character
1.1.1.4   misho      62:   \eX         a Unicode extended grapheme cluster
1.1       misho      63: .sp
1.1.1.5 ! misho      64: By default, \ed, \es, and \ew match only ASCII characters, even in UTF-8 mode
        !            65: or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
        !            66: happening, \es and \ew may also match characters with code points in the range
        !            67: 128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences
        !            68: is changed to use Unicode properties and they match many more characters.
1.1       misho      69: .
                     70: .
                     71: .SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
                     72: .rs
                     73: .sp
                     74:   C          Other
                     75:   Cc         Control
                     76:   Cf         Format
                     77:   Cn         Unassigned
                     78:   Co         Private use
                     79:   Cs         Surrogate
                     80: .sp
                     81:   L          Letter
                     82:   Ll         Lower case letter
                     83:   Lm         Modifier letter
                     84:   Lo         Other letter
                     85:   Lt         Title case letter
                     86:   Lu         Upper case letter
                     87:   L&         Ll, Lu, or Lt
                     88: .sp
                     89:   M          Mark
                     90:   Mc         Spacing mark
                     91:   Me         Enclosing mark
                     92:   Mn         Non-spacing mark
                     93: .sp
                     94:   N          Number
                     95:   Nd         Decimal number
                     96:   Nl         Letter number
                     97:   No         Other number
                     98: .sp
                     99:   P          Punctuation
                    100:   Pc         Connector punctuation
                    101:   Pd         Dash punctuation
                    102:   Pe         Close punctuation
                    103:   Pf         Final punctuation
                    104:   Pi         Initial punctuation
                    105:   Po         Other punctuation
                    106:   Ps         Open punctuation
                    107: .sp
                    108:   S          Symbol
                    109:   Sc         Currency symbol
                    110:   Sk         Modifier symbol
                    111:   Sm         Mathematical symbol
                    112:   So         Other symbol
                    113: .sp
                    114:   Z          Separator
                    115:   Zl         Line separator
                    116:   Zp         Paragraph separator
                    117:   Zs         Space separator
                    118: .
                    119: .
                    120: .SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
                    121: .rs
                    122: .sp
                    123:   Xan        Alphanumeric: union of properties L and N
                    124:   Xps        POSIX space: property Z or tab, NL, VT, FF, CR
1.1.1.5 ! misho     125:   Xsp        Perl space: property Z or tab, NL, VT, FF, CR
1.1.1.4   misho     126:   Xuc        Universally-named character: one that can be
                    127:                represented by a Universal Character Name
1.1       misho     128:   Xwd        Perl word: property Xan or underscore
1.1.1.5 ! misho     129: .sp
        !           130: Perl and POSIX space are now the same. Perl added VT to its space character set
        !           131: at release 5.18 and PCRE changed at release 8.34.
1.1       misho     132: .
                    133: .
                    134: .SH "SCRIPT NAMES FOR \ep AND \eP"
                    135: .rs
                    136: .sp
                    137: Arabic,
                    138: Armenian,
                    139: Avestan,
                    140: Balinese,
                    141: Bamum,
1.1.1.3   misho     142: Batak,
1.1       misho     143: Bengali,
                    144: Bopomofo,
1.1.1.3   misho     145: Brahmi,
1.1       misho     146: Braille,
                    147: Buginese,
                    148: Buhid,
                    149: Canadian_Aboriginal,
                    150: Carian,
1.1.1.3   misho     151: Chakma,
1.1       misho     152: Cham,
                    153: Cherokee,
                    154: Common,
                    155: Coptic,
                    156: Cuneiform,
                    157: Cypriot,
                    158: Cyrillic,
                    159: Deseret,
                    160: Devanagari,
                    161: Egyptian_Hieroglyphs,
                    162: Ethiopic,
                    163: Georgian,
                    164: Glagolitic,
                    165: Gothic,
                    166: Greek,
                    167: Gujarati,
                    168: Gurmukhi,
                    169: Han,
                    170: Hangul,
                    171: Hanunoo,
                    172: Hebrew,
                    173: Hiragana,
                    174: Imperial_Aramaic,
                    175: Inherited,
                    176: Inscriptional_Pahlavi,
                    177: Inscriptional_Parthian,
                    178: Javanese,
                    179: Kaithi,
                    180: Kannada,
                    181: Katakana,
                    182: Kayah_Li,
                    183: Kharoshthi,
                    184: Khmer,
                    185: Lao,
                    186: Latin,
                    187: Lepcha,
                    188: Limbu,
                    189: Linear_B,
                    190: Lisu,
                    191: Lycian,
                    192: Lydian,
                    193: Malayalam,
1.1.1.3   misho     194: Mandaic,
1.1       misho     195: Meetei_Mayek,
1.1.1.3   misho     196: Meroitic_Cursive,
                    197: Meroitic_Hieroglyphs,
                    198: Miao,
1.1       misho     199: Mongolian,
                    200: Myanmar,
                    201: New_Tai_Lue,
                    202: Nko,
                    203: Ogham,
                    204: Old_Italic,
                    205: Old_Persian,
                    206: Old_South_Arabian,
                    207: Old_Turkic,
                    208: Ol_Chiki,
                    209: Oriya,
                    210: Osmanya,
                    211: Phags_Pa,
                    212: Phoenician,
                    213: Rejang,
                    214: Runic,
                    215: Samaritan,
                    216: Saurashtra,
1.1.1.3   misho     217: Sharada,
1.1       misho     218: Shavian,
                    219: Sinhala,
1.1.1.3   misho     220: Sora_Sompeng,
1.1       misho     221: Sundanese,
                    222: Syloti_Nagri,
                    223: Syriac,
                    224: Tagalog,
                    225: Tagbanwa,
                    226: Tai_Le,
                    227: Tai_Tham,
                    228: Tai_Viet,
1.1.1.3   misho     229: Takri,
1.1       misho     230: Tamil,
                    231: Telugu,
                    232: Thaana,
                    233: Thai,
                    234: Tibetan,
                    235: Tifinagh,
                    236: Ugaritic,
                    237: Vai,
                    238: Yi.
                    239: .
                    240: .
                    241: .SH "CHARACTER CLASSES"
                    242: .rs
                    243: .sp
                    244:   [...]       positive character class
                    245:   [^...]      negative character class
                    246:   [x-y]       range (can be used for hex characters)
                    247:   [[:xxx:]]   positive POSIX named set
                    248:   [[:^xxx:]]  negative POSIX named set
                    249: .sp
                    250:   alnum       alphanumeric
                    251:   alpha       alphabetic
                    252:   ascii       0-127
                    253:   blank       space or tab
                    254:   cntrl       control character
                    255:   digit       decimal digit
                    256:   graph       printing, excluding space
                    257:   lower       lower case letter
                    258:   print       printing, including space
                    259:   punct       printing, excluding alphanumeric
1.1.1.3   misho     260:   space       white space
1.1       misho     261:   upper       upper case letter
                    262:   word        same as \ew
                    263:   xdigit      hexadecimal digit
                    264: .sp
                    265: In PCRE, POSIX character set names recognize only ASCII characters by default,
                    266: but some of them use Unicode properties if PCRE_UCP is set. You can use
                    267: \eQ...\eE inside a character class.
                    268: .
                    269: .
                    270: .SH "QUANTIFIERS"
                    271: .rs
                    272: .sp
                    273:   ?           0 or 1, greedy
                    274:   ?+          0 or 1, possessive
                    275:   ??          0 or 1, lazy
                    276:   *           0 or more, greedy
                    277:   *+          0 or more, possessive
                    278:   *?          0 or more, lazy
                    279:   +           1 or more, greedy
                    280:   ++          1 or more, possessive
                    281:   +?          1 or more, lazy
                    282:   {n}         exactly n
                    283:   {n,m}       at least n, no more than m, greedy
                    284:   {n,m}+      at least n, no more than m, possessive
                    285:   {n,m}?      at least n, no more than m, lazy
                    286:   {n,}        n or more, greedy
                    287:   {n,}+       n or more, possessive
                    288:   {n,}?       n or more, lazy
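The difference between the greedy and lazy forms listed above can be seen in a short sketch. It uses Python's re module, which is a separate engine from PCRE and is assumed here only because it shares this part of the syntax. The sample strings are made up for illustration.

```python
import re

subject = "<a><b>"  # hypothetical sample text

# Greedy: .+ takes as much as it can, then backs off to the last '>'.
greedy = re.match(r"<.+>", subject).group()

# Lazy: .+? takes as little as it can, stopping at the first '>'.
lazy = re.match(r"<.+?>", subject).group()

# {n,m} is greedy by default; {n,m}? prefers the minimum count.
bounded_greedy = re.match(r"\d{2,4}", "123456").group()
bounded_lazy = re.match(r"\d{2,4}?", "123456").group()

print(greedy, lazy, bounded_greedy, bounded_lazy)
```

The possessive forms are left out of this sketch because older versions of Python's re module do not accept them.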
                    289: .
                    290: .
                    291: .SH "ANCHORS AND SIMPLE ASSERTIONS"
                    292: .rs
                    293: .sp
                    294:   \eb          word boundary
                    295:   \eB          not a word boundary
                    296:   ^           start of subject
                    297:                also after internal newline in multiline mode
                    298:   \eA          start of subject
                    299:   $           end of subject
                    300:                also before newline at end of subject
                    301:                also before internal newline in multiline mode
                    302:   \eZ          end of subject
                    303:                also before newline at end of subject
                    304:   \ez          end of subject
                    305:   \eG          first matching position in subject
                    306: .
                    307: .
                    308: .SH "MATCH POINT RESET"
                    309: .rs
                    310: .sp
                    311:   \eK          reset start of match
                    312: .
                    313: .
                    314: .SH "ALTERNATION"
                    315: .rs
                    316: .sp
                    317:   expr|expr|expr...
                    318: .
                    319: .
                    320: .SH "CAPTURING"
                    321: .rs
                    322: .sp
                    323:   (...)           capturing group
                    324:   (?<name>...)    named capturing group (Perl)
                    325:   (?'name'...)    named capturing group (Perl)
                    326:   (?P<name>...)   named capturing group (Python)
                    327:   (?:...)         non-capturing group
                    328:   (?|...)         non-capturing group; reset group numbers for
                    329:                    capturing groups in each alternative
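As a rough illustration of how capturing and non-capturing groups differ, the following sketch uses Python's re module. This is an assumption made for convenience: Python's engine is not PCRE, accepts only the (?P<name>...) spelling of named groups, and has no (?|...) form. The date string is a made-up sample.

```python
import re

# Two named capturing groups and one non-capturing group in the middle.
pattern = r"(?P<year>\d{4})-(?:\d{2})-(?P<day>\d{2})"
m = re.match(pattern, "2013-11-12")

# Only the capturing groups are numbered; (?:...) is skipped.
numbered = m.groups()

# Named groups can be fetched by name as well as by number.
by_name = m.groupdict()

print(numbered, by_name)
```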
                    330: .
                    331: .
                    332: .SH "ATOMIC GROUPS"
                    333: .rs
                    334: .sp
                    335:   (?>...)         atomic, non-capturing group
                    336: .
                    337: .
                    338: .
                    339: .
                    340: .SH "COMMENT"
                    341: .rs
                    342: .sp
                    343:   (?#....)        comment (not nestable)
                    344: .
                    345: .
                    346: .SH "OPTION SETTING"
                    347: .rs
                    348: .sp
                    349:   (?i)            caseless
                    350:   (?J)            allow duplicate names
                    351:   (?m)            multiline
                    352:   (?s)            single line (dotall)
                    353:   (?U)            default ungreedy (lazy)
                    354:   (?x)            extended (ignore white space)
                    355:   (?-...)         unset option(s)
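The effect of the inline options above can be sketched with Python's re module, which is assumed here only because it accepts (?i), (?m), (?s), and (?x) at the start of a pattern with the same meaning. It is a different engine from PCRE and does not support (?J), (?U), or the bare (?-...) form. All subject strings are hypothetical.

```python
import re

# (?i): caseless matching.
caseless = re.search(r"(?i)pcre", "Uses PCRE 8.34").group()

# (?s): dot also matches a newline; without it the match fails.
dotall_on = re.search(r"(?s)a.b", "a\nb") is not None
dotall_off = re.search(r"a.b", "a\nb") is not None

# (?m): ^ also matches after each internal newline.
line_starts = re.findall(r"(?m)^\w+", "one two\nthree four")

# (?x): unescaped white space in the pattern is ignored.
extended = re.search(r"(?x) \d+ \s* kg", "12 kg").group()

print(caseless, dotall_on, dotall_off, line_starts, extended)
```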
                    356: .sp
                    357: The following are recognized only at the start of a pattern or after one of the
                    358: newline-setting options with similar syntax:
                    359: .sp
1.1.1.4   misho     360:   (*LIMIT_MATCH=d) set the match limit to d (decimal number)
                    361:   (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
1.1       misho     362:   (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
1.1.1.2   misho     363:   (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
                    364:   (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
1.1.1.4   misho     365:   (*UTF32)        set UTF-32 mode: 32-bit library (PCRE_UTF32)
                    366:   (*UTF)          set appropriate UTF mode for the library in use
1.1       misho     367:   (*UCP)          set PCRE_UCP (use Unicode properties for \ed etc)
1.1.1.5 ! misho     368: .sp
        !           369: Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
        !           370: limits set by the caller of pcre_exec(), not increase them.
1.1       misho     371: .
                    372: .
                    373: .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
                    374: .rs
                    375: .sp
                    376:   (?=...)         positive look ahead
                    377:   (?!...)         negative look ahead
                    378:   (?<=...)        positive look behind
                    379:   (?<!...)        negative look behind
                    380: .sp
                    381: Each top-level branch of a look behind must be of a fixed length.
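A minimal sketch of the four assertions, written with Python's re module on the assumption that it treats these constructs the same way for simple cases (it is not PCRE, and its fixed-length rule for look behind is stricter). The subject strings are invented.

```python
import re

# Positive look ahead: a word that is followed by a colon.
keys = re.findall(r"\w+(?=:)", "key: value")

# Negative look ahead: "foo" only when not followed by "bar".
not_foobar = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# Positive look behind: digits preceded by a dollar sign.
priced = re.findall(r"(?<=\$)\d+", "$42 and 17")

# Negative look behind: whole numbers not preceded by a dollar sign.
unpriced = re.findall(r"(?<!\$)\b\d+", "$42 and 17")

print(keys, not_foobar, priced, unpriced)
```

In every case the asserted text is not part of the match itself; only the position is tested.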
                    382: .
                    383: .
                    384: .SH "BACKREFERENCES"
                    385: .rs
                    386: .sp
                    387:   \en              reference by number (can be ambiguous)
                    388:   \egn             reference by number
                    389:   \eg{n}           reference by number
                    390:   \eg{-n}          relative reference by number
                    391:   \ek<name>        reference by name (Perl)
                    392:   \ek'name'        reference by name (Perl)
                    393:   \eg{name}        reference by name (Perl)
                    394:   \ek{name}        reference by name (.NET)
                    395:   (?P=name)       reference by name (Python)
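Two of the forms above, reference by number and the Python-style reference by name, can be tried with Python's re module. That module is assumed here purely for illustration; it is not PCRE and does not accept the other reference forms inside a pattern. The subject strings are made up.

```python
import re

# Reference by number: find a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group()

# Reference by name (Python syntax): matching open and close quotes.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi' now").group()

# A backreference matches the captured text, not the subpattern,
# so "ab ba" does not count as a doubled word.
mismatch = re.search(r"\b(\w+) \1\b", "ab ba")

print(doubled, quoted, mismatch)
```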
                    396: .
                    397: .
                    398: .SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
                    399: .rs
                    400: .sp
                    401:   (?R)            recurse whole pattern
                    402:   (?n)            call subpattern by absolute number
                    403:   (?+n)           call subpattern by relative number
                    404:   (?-n)           call subpattern by relative number
                    405:   (?&name)        call subpattern by name (Perl)
                    406:   (?P>name)       call subpattern by name (Python)
                    407:   \eg<name>        call subpattern by name (Oniguruma)
                    408:   \eg'name'        call subpattern by name (Oniguruma)
                    409:   \eg<n>           call subpattern by absolute number (Oniguruma)
                    410:   \eg'n'           call subpattern by absolute number (Oniguruma)
                    411:   \eg<+n>          call subpattern by relative number (PCRE extension)
                    412:   \eg'+n'          call subpattern by relative number (PCRE extension)
                    413:   \eg<-n>          call subpattern by relative number (PCRE extension)
                    414:   \eg'-n'          call subpattern by relative number (PCRE extension)
                    415: .
                    416: .
                    417: .SH "CONDITIONAL PATTERNS"
                    418: .rs
                    419: .sp
                    420:   (?(condition)yes-pattern)
                    421:   (?(condition)yes-pattern|no-pattern)
                    422: .sp
                    423:   (?(n)...        absolute reference condition
                    424:   (?(+n)...       relative reference condition
                    425:   (?(-n)...       relative reference condition
                    426:   (?(<name>)...   named reference condition (Perl)
                    427:   (?('name')...   named reference condition (Perl)
                    428:   (?(name)...     named reference condition (PCRE)
                    429:   (?(R)...        overall recursion condition
                    430:   (?(Rn)...       specific group recursion condition
                    431:   (?(R&name)...   specific recursion condition
                    432:   (?(DEFINE)...   define subpattern for reference
                    433:   (?(assert)...   assertion condition
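The absolute reference condition can be sketched as follows. Python's re module is assumed because it supports the (?(n)yes-pattern|no-pattern) form; it is a different engine from PCRE and lacks the recursion, DEFINE, and assertion conditions. The pattern and subjects are hypothetical.

```python
import re

# Group 1 optionally matches "<". The condition (?(1)>) then demands a
# closing ">" only if group 1 took part in the match.
pattern = re.compile(r"^(<)?\w+(?(1)>)$")

results = {s: pattern.match(s) is not None for s in ["<a>", "a", "<a", "a>"]}

print(results)
```

With no no-pattern given, the condition matches the empty string when the group is unset, which is why the bare "a" is accepted.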
                    434: .
                    435: .
                    436: .SH "BACKTRACKING CONTROL"
                    437: .rs
                    438: .sp
                    439: The following act immediately when they are reached:
                    440: .sp
                    441:   (*ACCEPT)       force successful match
                    442:   (*FAIL)         force backtrack; synonym (*F)
1.1.1.2   misho     443:   (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
1.1       misho     444: .sp
                    445: The following act only when a subsequent match failure causes a backtrack to
                    446: reach them. They all force a match failure, but they differ in what happens
                    447: afterwards. Those that advance the start-of-match point do so only if the
                    448: pattern is not anchored.
                    449: .sp
                    450:   (*COMMIT)       overall failure, no advance of starting point
                    451:   (*PRUNE)        advance to next starting character
1.1.1.2   misho     452:   (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
                    453:   (*SKIP)         advance to current matching position
                    454:   (*SKIP:NAME)    advance to position corresponding to an earlier
                    455:                   (*MARK:NAME); if not found, the (*SKIP) is ignored
1.1       misho     456:   (*THEN)         local failure, backtrack to next alternation
1.1.1.2   misho     457:   (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
1.1       misho     458: .
                    459: .
                    460: .SH "NEWLINE CONVENTIONS"
                    461: .rs
                    462: .sp
                    463: These are recognized only at the very start of the pattern or after a
1.1.1.4   misho     464: (*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
1.1       misho     465: .sp
                    466:   (*CR)           carriage return only
                    467:   (*LF)           linefeed only
                    468:   (*CRLF)         carriage return followed by linefeed
                    469:   (*ANYCRLF)      all three of the above
                    470:   (*ANY)          any Unicode newline sequence
                    471: .
                    472: .
                    473: .SH "WHAT \eR MATCHES"
                    474: .rs
                    475: .sp
                    476: These are recognized only at the very start of the pattern or after a
1.1.1.2   misho     477: (*...) option that sets the newline convention or a UTF or UCP mode.
1.1       misho     478: .sp
                    479:   (*BSR_ANYCRLF)  CR, LF, or CRLF
                    480:   (*BSR_UNICODE)  any Unicode newline sequence
                    481: .
                    482: .
                    483: .SH "CALLOUTS"
                    484: .rs
                    485: .sp
                    486:   (?C)      callout
                    487:   (?Cn)     callout with data n
                    488: .
                    489: .
                    490: .SH "SEE ALSO"
                    491: .rs
                    492: .sp
                    493: \fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
                    494: \fBpcrematching\fP(3), \fBpcre\fP(3).
                    495: .
                    496: .
                    497: .SH AUTHOR
                    498: .rs
                    499: .sp
                    500: .nf
                    501: Philip Hazel
                    502: University Computing Service
                    503: Cambridge CB2 3QH, England.
                    504: .fi
                    505: .
                    506: .
                    507: .SH REVISION
                    508: .rs
                    509: .sp
                    510: .nf
1.1.1.5 ! misho     511: Last updated: 12 November 2013
1.1.1.4   misho     512: Copyright (c) 1997-2013 University of Cambridge.
1.1       misho     513: .fi
