Annotation of embedaddon/pcre/doc/pcresyntax.3, revision 1.1

1.1     ! misho       1: .TH PCRESYNTAX 3
        !             2: .SH NAME
        !             3: PCRE - Perl-compatible regular expressions
        !             4: .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
        !             5: .rs
        !             6: .sp
        !             7: The full syntax and semantics of the regular expressions that are supported by
        !             8: PCRE are described in the
        !             9: .\" HREF
        !            10: \fBpcrepattern\fP
        !            11: .\"
        !            12: documentation. This document contains just a quick-reference summary of the
        !            13: syntax.
        !            14: .
        !            15: .
        !            16: .SH "QUOTING"
        !            17: .rs
        !            18: .sp
        !            19:   \ex         where x is non-alphanumeric is a literal x
        !            20:   \eQ...\eE    treat enclosed characters as literal
        !            21: .
        !            22: .
        !            23: .SH "CHARACTERS"
        !            24: .rs
        !            25: .sp
        !            26:   \ea         alarm, that is, the BEL character (hex 07)
        !            27:   \ecx        "control-x", where x is any ASCII character
        !            28:   \ee         escape (hex 1B)
        !            29:   \ef         formfeed (hex 0C)
        !            30:   \en         newline (hex 0A)
        !            31:   \er         carriage return (hex 0D)
        !            32:   \et         tab (hex 09)
        !            33:   \eddd       character with octal code ddd, or backreference
        !            34:   \exhh       character with hex code hh
        !            35:   \ex{hhh..}  character with hex code hhh..
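
As a rough illustration of the escapes above, the following sketch uses Python's re module. It is not PCRE, but it accepts the same \a, \f, \n, \r, \t, \xhh, and three-digit octal forms. The \e, \cx, and \x{hhh..} forms are left out because Python's re does not accept them; the sample strings are made up for the example.

```python
import re

# \x41 (hex) and \101 (octal) both denote "A"; \t is a tab, \n a newline.
pattern = re.compile(r"\x41\101\t\n")
escape_match = pattern.fullmatch("AA\t\n")

# \a is BEL (hex 07), \f is formfeed (hex 0C), \r is carriage return (hex 0D).
control_match = re.fullmatch(r"\a\f\r", "\x07\x0c\x0d")
```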
        !            36: .
        !            37: .
        !            38: .SH "CHARACTER TYPES"
        !            39: .rs
        !            40: .sp
        !            41:   .          any character except newline;
        !            42:                in dotall mode, any character whatsoever
        !            43:   \eC         one byte, even in UTF-8 mode (best avoided)
        !            44:   \ed         a decimal digit
        !            45:   \eD         a character that is not a decimal digit
        !            46:   \eh         a horizontal whitespace character
        !            47:   \eH         a character that is not a horizontal whitespace character
        !            48:   \eN         a character that is not a newline
        !            49:   \ep{\fIxx\fP}     a character with the \fIxx\fP property
        !            50:   \eP{\fIxx\fP}     a character without the \fIxx\fP property
        !            51:   \eR         a newline sequence
        !            52:   \es         a whitespace character
        !            53:   \eS         a character that is not a whitespace character
        !            54:   \ev         a vertical whitespace character
        !            55:   \eV         a character that is not a vertical whitespace character
        !            56:   \ew         a "word" character
        !            57:   \eW         a "non-word" character
        !            58:   \eX         an extended Unicode sequence
        !            59: .sp
        !            60: In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
        !            61: characters, even in UTF-8 mode. However, this can be changed by setting the
        !            62: PCRE_UCP option.
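
The ASCII-only versus Unicode-property distinction can be sketched with Python's re module, with one caveat: Python's default is the reverse of PCRE's. In Python, \d matches Unicode digits unless re.ASCII is given, so re.ASCII stands in here for PCRE's default and its absence stands in for PCRE_UCP. This is an analogy under those assumptions, not PCRE itself.

```python
import re

subject = "1\u0662"  # ASCII digit one followed by ARABIC-INDIC DIGIT TWO

# re.ASCII restricts \d to [0-9], comparable to PCRE's default behaviour.
ascii_digits = re.findall(r"\d", subject, re.ASCII)

# Without re.ASCII, \d uses Unicode properties, comparable to PCRE_UCP.
unicode_digits = re.findall(r"\d", subject)
```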
        !            63: .
        !            64: .
        !            65: .SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
        !            66: .rs
        !            67: .sp
        !            68:   C          Other
        !            69:   Cc         Control
        !            70:   Cf         Format
        !            71:   Cn         Unassigned
        !            72:   Co         Private use
        !            73:   Cs         Surrogate
        !            74: .sp
        !            75:   L          Letter
        !            76:   Ll         Lower case letter
        !            77:   Lm         Modifier letter
        !            78:   Lo         Other letter
        !            79:   Lt         Title case letter
        !            80:   Lu         Upper case letter
        !            81:   L&         Ll, Lu, or Lt
        !            82: .sp
        !            83:   M          Mark
        !            84:   Mc         Spacing mark
        !            85:   Me         Enclosing mark
        !            86:   Mn         Non-spacing mark
        !            87: .sp
        !            88:   N          Number
        !            89:   Nd         Decimal number
        !            90:   Nl         Letter number
        !            91:   No         Other number
        !            92: .sp
        !            93:   P          Punctuation
        !            94:   Pc         Connector punctuation
        !            95:   Pd         Dash punctuation
        !            96:   Pe         Close punctuation
        !            97:   Pf         Final punctuation
        !            98:   Pi         Initial punctuation
        !            99:   Po         Other punctuation
        !           100:   Ps         Open punctuation
        !           101: .sp
        !           102:   S          Symbol
        !           103:   Sc         Currency symbol
        !           104:   Sk         Modifier symbol
        !           105:   Sm         Mathematical symbol
        !           106:   So         Other symbol
        !           107: .sp
        !           108:   Z          Separator
        !           109:   Zl         Line separator
        !           110:   Zp         Paragraph separator
        !           111:   Zs         Space separator
        !           112: .
        !           113: .
        !           114: .SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
        !           115: .rs
        !           116: .sp
        !           117:   Xan        Alphanumeric: union of properties L and N
        !           118:   Xps        POSIX space: property Z or tab, NL, VT, FF, CR
        !           119:   Xsp        Perl space: property Z or tab, NL, FF, CR
        !           120:   Xwd        Perl word: property Xan or underscore
        !           121: .
        !           122: .
        !           123: .SH "SCRIPT NAMES FOR \ep AND \eP"
        !           124: .rs
        !           125: .sp
        !           126: Arabic,
        !           127: Armenian,
        !           128: Avestan,
        !           129: Balinese,
        !           130: Bamum,
        !           131: Bengali,
        !           132: Bopomofo,
        !           133: Braille,
        !           134: Buginese,
        !           135: Buhid,
        !           136: Canadian_Aboriginal,
        !           137: Carian,
        !           138: Cham,
        !           139: Cherokee,
        !           140: Common,
        !           141: Coptic,
        !           142: Cuneiform,
        !           143: Cypriot,
        !           144: Cyrillic,
        !           145: Deseret,
        !           146: Devanagari,
        !           147: Egyptian_Hieroglyphs,
        !           148: Ethiopic,
        !           149: Georgian,
        !           150: Glagolitic,
        !           151: Gothic,
        !           152: Greek,
        !           153: Gujarati,
        !           154: Gurmukhi,
        !           155: Han,
        !           156: Hangul,
        !           157: Hanunoo,
        !           158: Hebrew,
        !           159: Hiragana,
        !           160: Imperial_Aramaic,
        !           161: Inherited,
        !           162: Inscriptional_Pahlavi,
        !           163: Inscriptional_Parthian,
        !           164: Javanese,
        !           165: Kaithi,
        !           166: Kannada,
        !           167: Katakana,
        !           168: Kayah_Li,
        !           169: Kharoshthi,
        !           170: Khmer,
        !           171: Lao,
        !           172: Latin,
        !           173: Lepcha,
        !           174: Limbu,
        !           175: Linear_B,
        !           176: Lisu,
        !           177: Lycian,
        !           178: Lydian,
        !           179: Malayalam,
        !           180: Meetei_Mayek,
        !           181: Mongolian,
        !           182: Myanmar,
        !           183: New_Tai_Lue,
        !           184: Nko,
        !           185: Ogham,
        !           186: Old_Italic,
        !           187: Old_Persian,
        !           188: Old_South_Arabian,
        !           189: Old_Turkic,
        !           190: Ol_Chiki,
        !           191: Oriya,
        !           192: Osmanya,
        !           193: Phags_Pa,
        !           194: Phoenician,
        !           195: Rejang,
        !           196: Runic,
        !           197: Samaritan,
        !           198: Saurashtra,
        !           199: Shavian,
        !           200: Sinhala,
        !           201: Sundanese,
        !           202: Syloti_Nagri,
        !           203: Syriac,
        !           204: Tagalog,
        !           205: Tagbanwa,
        !           206: Tai_Le,
        !           207: Tai_Tham,
        !           208: Tai_Viet,
        !           209: Tamil,
        !           210: Telugu,
        !           211: Thaana,
        !           212: Thai,
        !           213: Tibetan,
        !           214: Tifinagh,
        !           215: Ugaritic,
        !           216: Vai,
        !           217: Yi.
        !           218: .
        !           219: .
        !           220: .SH "CHARACTER CLASSES"
        !           221: .rs
        !           222: .sp
        !           223:   [...]       positive character class
        !           224:   [^...]      negative character class
        !           225:   [x-y]       range (can be used for hex characters)
        !           226:   [[:xxx:]]   positive POSIX named set
        !           227:   [[:^xxx:]]  negative POSIX named set
        !           228: .sp
        !           229:   alnum       alphanumeric
        !           230:   alpha       alphabetic
        !           231:   ascii       0-127
        !           232:   blank       space or tab
        !           233:   cntrl       control character
        !           234:   digit       decimal digit
        !           235:   graph       printing, excluding space
        !           236:   lower       lower case letter
        !           237:   print       printing, including space
        !           238:   punct       printing, excluding alphanumeric
        !           239:   space       whitespace
        !           240:   upper       upper case letter
        !           241:   word        same as \ew
        !           242:   xdigit      hexadecimal digit
        !           243: .sp
        !           244: In PCRE, POSIX character set names recognize only ASCII characters by default,
        !           245: but some of them use Unicode properties if PCRE_UCP is set. You can use
        !           246: \eQ...\eE inside a character class.
        !           247: .
        !           248: .
        !           249: .SH "QUANTIFIERS"
        !           250: .rs
        !           251: .sp
        !           252:   ?           0 or 1, greedy
        !           253:   ?+          0 or 1, possessive
        !           254:   ??          0 or 1, lazy
        !           255:   *           0 or more, greedy
        !           256:   *+          0 or more, possessive
        !           257:   *?          0 or more, lazy
        !           258:   +           1 or more, greedy
        !           259:   ++          1 or more, possessive
        !           260:   +?          1 or more, lazy
        !           261:   {n}         exactly n
        !           262:   {n,m}       at least n, no more than m, greedy
        !           263:   {n,m}+      at least n, no more than m, possessive
        !           264:   {n,m}?      at least n, no more than m, lazy
        !           265:   {n,}        n or more, greedy
        !           266:   {n,}+       n or more, possessive
        !           267:   {n,}?       n or more, lazy
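
The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below uses Python's re module, which shares this part of the syntax with PCRE. Possessive quantifiers are not shown because older Python versions do not accept them. The subject strings are invented for the example.

```python
import re

subject = "<a><b>"

# Greedy: .+ takes as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.+>", subject).group()

# Lazy: .+? takes as little as possible, so the match stops at the first ">".
lazy = re.search(r"<.+?>", subject).group()

# {n,m}: at least 2 and no more than 3 digits, greedy.
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")
```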
        !           268: .
        !           269: .
        !           270: .SH "ANCHORS AND SIMPLE ASSERTIONS"
        !           271: .rs
        !           272: .sp
        !           273:   \eb          word boundary
        !           274:   \eB          not a word boundary
        !           275:   ^           start of subject
        !           276:                also after internal newline in multiline mode
        !           277:   \eA          start of subject
        !           278:   $           end of subject
        !           279:                also before newline at end of subject
        !           280:                also before internal newline in multiline mode
        !           281:   \eZ          end of subject
        !           282:                also before newline at end of subject
        !           283:   \ez          end of subject
        !           284:   \eG          first matching position in subject
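
A small sketch of ^, $, and \b, written with Python's re module, which treats these three the way the table describes. \Z is deliberately left out: Python's re gives it the meaning this summary lists for \z. \G is not available in Python's re. The subject strings are invented for the example.

```python
import re

subject = "one\ntwo\n"

# Without multiline mode, ^ matches only at the start of the subject.
default_starts = re.findall(r"^\w+", subject)

# With (?m), ^ also matches after each internal newline.
multiline_starts = re.findall(r"(?m)^\w+", subject)

# $ matches before the newline that ends the subject.
end_before_newline = re.search(r"two$", subject)

# \b requires a word boundary on both sides, so "concat" is skipped.
word_bounded = re.findall(r"\bcat\b", "cat concat cat.")
```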
        !           285: .
        !           286: .
        !           287: .SH "MATCH POINT RESET"
        !           288: .rs
        !           289: .sp
        !           290:   \eK          reset start of match
        !           291: .
        !           292: .
        !           293: .SH "ALTERNATION"
        !           294: .rs
        !           295: .sp
        !           296:   expr|expr|expr...
        !           297: .
        !           298: .
        !           299: .SH "CAPTURING"
        !           300: .rs
        !           301: .sp
        !           302:   (...)           capturing group
        !           303:   (?<name>...)    named capturing group (Perl)
        !           304:   (?'name'...)    named capturing group (Perl)
        !           305:   (?P<name>...)   named capturing group (Python)
        !           306:   (?:...)         non-capturing group
        !           307:   (?|...)         non-capturing group; reset group numbers for
        !           308:                    capturing groups in each alternative
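
The following sketch shows how capturing, named, and non-capturing groups are numbered. It uses Python's re module, so only the (?P<name>...) form is shown; Python's re does not accept the Perl forms (?<name>...) and (?'name'...), nor (?|...). The date string is an arbitrary example.

```python
import re

date_pattern = re.compile(r"(?P<year>\d{4})-(?:\d{2})-(\d{2})")
m = date_pattern.fullmatch("2010-11-21")

year = m.group("year")   # named group, also available as group 1
day = m.group(2)         # the (?:...) group did not use up a group number
all_groups = m.groups()  # only the two capturing groups appear
```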
        !           309: .
        !           310: .
        !           311: .SH "ATOMIC GROUPS"
        !           312: .rs
        !           313: .sp
        !           314:   (?>...)         atomic, non-capturing group
        !           315: .
        !           316: .
        !           317: .
        !           318: .
        !           319: .SH "COMMENT"
        !           320: .rs
        !           321: .sp
        !           322:   (?#....)        comment (not nestable)
        !           323: .
        !           324: .
        !           325: .SH "OPTION SETTING"
        !           326: .rs
        !           327: .sp
        !           328:   (?i)            caseless
        !           329:   (?J)            allow duplicate names
        !           330:   (?m)            multiline
        !           331:   (?s)            single line (dotall)
        !           332:   (?U)            default ungreedy (lazy)
        !           333:   (?x)            extended (ignore white space)
        !           334:   (?-...)         unset option(s)
        !           335: .sp
        !           336: The following are recognized only at the start of a pattern or after one of the
        !           337: newline-setting options with similar syntax:
        !           338: .sp
        !           339:   (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
        !           340:   (*UTF8)         set UTF-8 mode (PCRE_UTF8)
        !           341:   (*UCP)          set PCRE_UCP (use Unicode properties for \ed etc.)
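
The inline settings (?i), (?s), and (?x) can be tried out with Python's re module, which accepts them at the start of a pattern. This is only a sketch of those three: (?J), (?U), and the (*...) items are PCRE-specific and are not recognized by Python's re. The subject strings are invented for the example.

```python
import re

# (?i): caseless matching.
caseless = re.search(r"(?i)pcre", "Uses PCRE syntax")

# By default the dot does not match a newline; with (?s) it does.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"(?s)a.b", "a\nb")

# (?x): white space in the pattern is ignored, so it can be spaced out.
extended = re.search(r"(?x) \d+  \s  [a-z]+ ", "21 november")
```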
        !           342: .
        !           343: .
        !           344: .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
        !           345: .rs
        !           346: .sp
        !           347:   (?=...)         positive lookahead
        !           348:   (?!...)         negative lookahead
        !           349:   (?<=...)        positive lookbehind
        !           350:   (?<!...)        negative lookbehind
        !           351: .sp
        !           352: Each top-level branch of a lookbehind must be of a fixed length.
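
The four assertions can be sketched with Python's re module, which uses the same syntax. Python's fixed-length rule for lookbehind is stricter than the one stated above (it applies to the whole assertion, not per top-level branch), so the final check illustrates the general idea rather than PCRE's exact behaviour. The subject strings are invented for the example.

```python
import re

prices = "10 USD, 20 EUR"
followed_by_usd = re.findall(r"\d+(?= USD)", prices)          # positive lookahead
not_followed_by_usd = re.findall(r"\b\d+\b(?! USD)", prices)  # negative lookahead

text = "cost $15 and 20"
after_dollar = re.findall(r"(?<=\$)\d+", text)        # positive lookbehind
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)  # negative lookbehind

# A lookbehind of variable length is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=a+)b")
    variable_length_rejected = False
except re.error:
    variable_length_rejected = True
```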
        !           353: .
        !           354: .
        !           355: .SH "BACKREFERENCES"
        !           356: .rs
        !           357: .sp
        !           358:   \en              reference by number (can be ambiguous)
        !           359:   \egn             reference by number
        !           360:   \eg{n}           reference by number
        !           361:   \eg{-n}          relative reference by number
        !           362:   \ek<name>        reference by name (Perl)
        !           363:   \ek'name'        reference by name (Perl)
        !           364:   \eg{name}        reference by name (Perl)
        !           365:   \ek{name}        reference by name (.NET)
        !           366:   (?P=name)       reference by name (Python)
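
A brief sketch of backreferences by number and by name, using Python's re module. Only the \n and (?P=name) forms are shown, since the \g and \k forms listed above are not accepted in patterns by Python's re. The subject strings are invented for the example.

```python
import re

# \1 must match the same text that group 1 captured.
by_number = re.search(r"(\w+) \1", "the the cat")

# (?P=word) refers back to the group named "word".
by_name = re.search(r"(?P<word>\w+) (?P=word)", "it is is so")
```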
        !           367: .
        !           368: .
        !           369: .SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
        !           370: .rs
        !           371: .sp
        !           372:   (?R)            recurse whole pattern
        !           373:   (?n)            call subpattern by absolute number
        !           374:   (?+n)           call subpattern by relative number
        !           375:   (?-n)           call subpattern by relative number
        !           376:   (?&name)        call subpattern by name (Perl)
        !           377:   (?P>name)       call subpattern by name (Python)
        !           378:   \eg<name>        call subpattern by name (Oniguruma)
        !           379:   \eg'name'        call subpattern by name (Oniguruma)
        !           380:   \eg<n>           call subpattern by absolute number (Oniguruma)
        !           381:   \eg'n'           call subpattern by absolute number (Oniguruma)
        !           382:   \eg<+n>          call subpattern by relative number (PCRE extension)
        !           383:   \eg'+n'          call subpattern by relative number (PCRE extension)
        !           384:   \eg<-n>          call subpattern by relative number (PCRE extension)
        !           385:   \eg'-n'          call subpattern by relative number (PCRE extension)
        !           386: .
        !           387: .
        !           388: .SH "CONDITIONAL PATTERNS"
        !           389: .rs
        !           390: .sp
        !           391:   (?(condition)yes-pattern)
        !           392:   (?(condition)yes-pattern|no-pattern)
        !           393: .sp
        !           394:   (?(n)...        absolute reference condition
        !           395:   (?(+n)...       relative reference condition
        !           396:   (?(-n)...       relative reference condition
        !           397:   (?(<name>)...   named reference condition (Perl)
        !           398:   (?('name')...   named reference condition (Perl)
        !           399:   (?(name)...     named reference condition (PCRE)
        !           400:   (?(R)...        overall recursion condition
        !           401:   (?(Rn)...       specific group recursion condition
        !           402:   (?(R&name)...   specific recursion condition
        !           403:   (?(DEFINE)...   define subpattern for reference
        !           404:   (?(assert)...   assertion condition
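
The absolute reference condition (?(n)...) can be sketched with Python's re module, which supports that form along with a named variant. The other conditions listed above (recursion, DEFINE, assertion) are not available there. The pattern below is a made-up example: it requires a closing ">" only if an opening "<" was captured by group 1.

```python
import re

# (?(1)>) means: if group 1 took part in the match, require ">" here;
# otherwise match nothing (the no-pattern is omitted).
bracketed = re.compile(r"^(<)?\w+(?(1)>)$")

subjects = ["<tag>", "tag", "<tag", "tag>"]
results = {s: bool(bracketed.search(s)) for s in subjects}
```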
        !           405: .
        !           406: .
        !           407: .SH "BACKTRACKING CONTROL"
        !           408: .rs
        !           409: .sp
        !           410: The following act as soon as they are reached:
        !           411: .sp
        !           412:   (*ACCEPT)       force successful match
        !           413:   (*FAIL)         force backtrack; synonym (*F)
        !           414: .sp
        !           415: The following act only when a subsequent match failure causes a backtrack to
        !           416: reach them. They all force a match failure, but they differ in what happens
        !           417: afterwards. Those that advance the start-of-match point do so only if the
        !           418: pattern is not anchored.
        !           419: .sp
        !           420:   (*COMMIT)       overall failure, no advance of starting point
        !           421:   (*PRUNE)        advance to next starting character
        !           422:   (*SKIP)         advance start to current matching position
        !           423:   (*THEN)         local failure, backtrack to next alternation
        !           424: .
        !           425: .
        !           426: .SH "NEWLINE CONVENTIONS"
        !           427: .rs
        !           428: .sp
        !           429: These are recognized only at the very start of the pattern or after a
        !           430: (*BSR_...) or (*UTF8) or (*UCP) option.
        !           431: .sp
        !           432:   (*CR)           carriage return only
        !           433:   (*LF)           linefeed only
        !           434:   (*CRLF)         carriage return followed by linefeed
        !           435:   (*ANYCRLF)      all three of the above
        !           436:   (*ANY)          any Unicode newline sequence
        !           437: .
        !           438: .
        !           439: .SH "WHAT \eR MATCHES"
        !           440: .rs
        !           441: .sp
        !           442: These are recognized only at the very start of the pattern or after a
        !           443: (*...) option that sets the newline convention or UTF-8 or UCP mode.
        !           444: .sp
        !           445:   (*BSR_ANYCRLF)  CR, LF, or CRLF
        !           446:   (*BSR_UNICODE)  any Unicode newline sequence
        !           447: .
        !           448: .
        !           449: .SH "CALLOUTS"
        !           450: .rs
        !           451: .sp
        !           452:   (?C)      callout
        !           453:   (?Cn)     callout with data n
        !           454: .
        !           455: .
        !           456: .SH "SEE ALSO"
        !           457: .rs
        !           458: .sp
        !           459: \fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
        !           460: \fBpcrematching\fP(3), \fBpcre\fP(3).
        !           461: .
        !           462: .
        !           463: .SH AUTHOR
        !           464: .rs
        !           465: .sp
        !           466: .nf
        !           467: Philip Hazel
        !           468: University Computing Service
        !           469: Cambridge CB2 3QH, England.
        !           470: .fi
        !           471: .
        !           472: .
        !           473: .SH REVISION
        !           474: .rs
        !           475: .sp
        !           476: .nf
        !           477: Last updated: 21 November 2010
        !           478: Copyright (c) 1997-2010 University of Cambridge.
        !           479: .fi
