Annotation of embedaddon/php/ext/ereg/regex/regex.3, revision 1.1

1.1     ! misho       1: .TH REGEX 3 "17 May 1993"
        !             2: .BY "Henry Spencer"
        !             3: .de ZR
        !             4: .\" one other place knows this name:  the SEE ALSO section
        !             5: .IR regex (7) \\$1
        !             6: ..
        !             7: .SH NAME
        !             8: regcomp, regexec, regerror, regfree \- regular-expression library
        !             9: .SH SYNOPSIS
        !            10: .ft B
        !            11: .\".na
        !            12: #include <sys/types.h>
        !            13: .br
        !            14: #include <regex.h>
        !            15: .HP 10
        !            16: int regcomp(regex_t\ *preg, const\ char\ *pattern, int\ cflags);
        !            17: .HP
        !            18: int\ regexec(const\ regex_t\ *preg, const\ char\ *string,
        !            19: size_t\ nmatch, regmatch_t\ pmatch[], int\ eflags);
        !            20: .HP
        !            21: size_t\ regerror(int\ errcode, const\ regex_t\ *preg,
        !            22: char\ *errbuf, size_t\ errbuf_size);
        !            23: .HP
        !            24: void\ regfree(regex_t\ *preg);
        !            25: .\".ad
        !            26: .ft
        !            27: .SH DESCRIPTION
        !            28: These routines implement POSIX 1003.2 regular expressions (``RE''s);
        !            29: see
        !            30: .ZR .
        !            31: .I Regcomp
        !            32: compiles an RE written as a string into an internal form,
        !            33: .I regexec
        !            34: matches that internal form against a string and reports results,
        !            35: .I regerror
        !            36: transforms error codes from either into human-readable messages,
        !            37: and
        !            38: .I regfree
        !            39: frees any dynamically-allocated storage used by the internal form
        !            40: of an RE.
        !            41: .PP
        !            42: The header
        !            43: .I <regex.h>
        !            44: declares two structure types,
        !            45: .I regex_t
        !            46: and
        !            47: .IR regmatch_t ,
        !            48: the former for compiled internal forms and the latter for match reporting.
        !            49: It also declares the four functions,
        !            50: a type
        !            51: .IR regoff_t ,
        !            52: and a number of constants with names starting with ``REG_''.
        !            53: .PP
        !            54: .I Regcomp
        !            55: compiles the regular expression contained in the
        !            56: .I pattern
        !            57: string,
        !            58: subject to the flags in
        !            59: .IR cflags ,
        !            60: and places the results in the
        !            61: .I regex_t
        !            62: structure pointed to by
        !            63: .IR preg .
        !            64: .I Cflags
        !            65: is the bitwise OR of zero or more of the following flags:
        !            66: .IP REG_EXTENDED \w'REG_EXTENDED'u+2n
        !            67: Compile modern (``extended'') REs,
        !            68: rather than the obsolete (``basic'') REs that
        !            69: are the default.
        !            70: .IP REG_BASIC
        !            71: This is a synonym for 0,
        !            72: provided as a counterpart to REG_EXTENDED to improve readability.
        !            73: .IP REG_NOSPEC
        !            74: Compile with recognition of all special characters turned off.
        !            75: All characters are thus considered ordinary,
        !            76: so the ``RE'' is a literal string.
        !            77: This is an extension,
        !            78: compatible with but not specified by POSIX 1003.2,
        !            79: and should be used with
        !            80: caution in software intended to be portable to other systems.
        !            81: REG_EXTENDED and REG_NOSPEC may not be used
        !            82: in the same call to
        !            83: .IR regcomp .
        !            84: .IP REG_ICASE
        !            85: Compile for matching that ignores upper/lower case distinctions.
        !            86: See
        !            87: .ZR .
        !            88: .IP REG_NOSUB
        !            89: Compile for matching that need only report success or failure,
        !            90: not what was matched.
        !            91: .IP REG_NEWLINE
        !            92: Compile for newline-sensitive matching.
        !            93: By default, newline is a completely ordinary character with no special
        !            94: meaning in either REs or strings.
        !            95: With this flag,
        !            96: `[^' bracket expressions and `.' never match newline,
        !            97: a `^' anchor matches the null string after any newline in the string
        !            98: in addition to its normal function,
        !            99: and the `$' anchor matches the null string before any newline in the
        !           100: string in addition to its normal function.
        !           101: .IP REG_PEND
        !           102: The regular expression ends,
        !           103: not at the first NUL,
        !           104: but just before the character pointed to by the
        !           105: .I re_endp
        !           106: member of the structure pointed to by
        !           107: .IR preg .
        !           108: The
        !           109: .I re_endp
        !           110: member is of type
        !           111: .IR const\ char\ * .
        !           112: This flag permits inclusion of NULs in the RE;
        !           113: they are considered ordinary characters.
        !           114: This is an extension,
        !           115: compatible with but not specified by POSIX 1003.2,
        !           116: and should be used with
        !           117: caution in software intended to be portable to other systems.
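The effect of the compile flags above can be checked with a short C sketch. The helper name matches_with_flags is hypothetical and not part of the library; it ORs REG_NOSUB into the caller's flags on the assumption that only success or failure is wanted.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the regex library).
 * Returns 1 if `pattern` matches somewhere in `text` when compiled with
 * `cflags`, 0 if regexec() does not report a match, and -1 if regcomp()
 * rejects the pattern.
 */
int matches_with_flags(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only success or failure is needed, not match offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Called with the RE `^b$' and the string "a\nb\nc", the helper should report no match under REG_EXTENDED alone and a match once REG_NEWLINE is added, since only then do the anchors match around the embedded newlines.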
        !           118: .PP
        !           119: When successful,
        !           120: .I regcomp
        !           121: returns 0 and fills in the structure pointed to by
        !           122: .IR preg .
        !           123: One member of that structure
        !           124: (other than
        !           125: .IR re_endp )
        !           126: is publicized:
        !           127: .IR re_nsub ,
        !           128: of type
        !           129: .IR size_t ,
        !           130: contains the number of parenthesized subexpressions within the RE
        !           131: (except that the value of this member is undefined if the
        !           132: REG_NOSUB flag was used).
        !           133: If
        !           134: .I regcomp
        !           135: fails, it returns a non-zero error code;
        !           136: see DIAGNOSTICS.
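A minimal C sketch of the compile step described above: check the return value of regcomp, read re_nsub, then release the compiled form. The function name count_subexpressions is hypothetical, chosen only for this illustration.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the regex library).
 * Compiles `pattern` as an extended RE and returns the number of
 * parenthesized subexpressions it contains, or -1 if regcomp() fails.
 */
long count_subexpressions(const char *pattern)
{
    regex_t re;
    long n;

    /* REG_NOSUB is deliberately not used: it would leave re_nsub undefined. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;      /* no valid compiled RE exists, so nothing to free */

    n = (long)re.re_nsub;
    regfree(&re);
    return n;
}
```

For the RE `(a)(b(c))' this should return 3, counting each opening parenthesis once.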
        !           137: .PP
        !           138: .I Regexec
        !           139: matches the compiled RE pointed to by
        !           140: .I preg
        !           141: against the
        !           142: .IR string ,
        !           143: subject to the flags in
        !           144: .IR eflags ,
        !           145: and reports results using
        !           146: .IR nmatch ,
        !           147: .IR pmatch ,
        !           148: and the returned value.
        !           149: The RE must have been compiled by a previous invocation of
        !           150: .IR regcomp .
        !           151: The compiled form is not altered during execution of
        !           152: .IR regexec ,
        !           153: so a single compiled RE can be used simultaneously by multiple threads.
        !           154: .PP
        !           155: By default,
        !           156: the NUL-terminated string pointed to by
        !           157: .I string
        !           158: is considered to be the text of an entire line, minus any terminating
        !           159: newline.
        !           160: The
        !           161: .I eflags
        !           162: argument is the bitwise OR of zero or more of the following flags:
        !           163: .IP REG_NOTBOL \w'REG_STARTEND'u+2n
        !           164: The first character of
        !           165: the string
        !           166: is not the beginning of a line, so the `^' anchor should not match before it.
        !           167: This does not affect the behavior of newlines under REG_NEWLINE.
        !           168: .IP REG_NOTEOL
        !           169: The NUL terminating
        !           170: the string
        !           171: does not end a line, so the `$' anchor should not match before it.
        !           172: This does not affect the behavior of newlines under REG_NEWLINE.
        !           173: .IP REG_STARTEND
        !           174: The string is considered to start at
        !           175: \fIstring\fR\ + \fIpmatch\fR[0].\fIrm_so\fR
        !           176: and to have a terminating NUL located at
        !           177: \fIstring\fR\ + \fIpmatch\fR[0].\fIrm_eo\fR
        !           178: (there need not actually be a NUL at that location),
        !           179: regardless of the value of
        !           180: .IR nmatch .
        !           181: See below for the definition of
        !           182: .I pmatch
        !           183: and
        !           184: .IR nmatch .
        !           185: This is an extension,
        !           186: compatible with but not specified by POSIX 1003.2,
        !           187: and should be used with
        !           188: caution in software intended to be portable to other systems.
        !           189: Note that a non-zero \fIrm_so\fR does not imply REG_NOTBOL;
        !           190: REG_STARTEND affects only the location of the string,
        !           191: not how it is matched.
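One common use of REG_NOTBOL is repeated searching through a single line. The C sketch below counts successive matches by restarting regexec just past the previous match and setting REG_NOTBOL on every call after the first. The name count_matches and the policy of skipping one character after an empty match are assumptions of this sketch, not something the library prescribes.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the regex library).
 * Counts successive non-overlapping matches of `pattern` in `text`.
 * Returns -1 if the pattern does not compile.
 */
int count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    const char *p = text;
    int count = 0;
    int eflags = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, &m, eflags) == 0) {
        count++;
        if (m.rm_eo > m.rm_so) {
            p += m.rm_eo;           /* resume right after the match */
        } else if (p[m.rm_eo] != '\0') {
            p += m.rm_eo + 1;       /* empty match: skip one character */
        } else {
            break;                  /* empty match at end of string */
        }
        eflags = REG_NOTBOL;        /* p is now mid-line: `^' must not match */
    }

    regfree(&re);
    return count;
}
```

Without REG_NOTBOL on the later calls, the RE `^a' would match at every restart point in "aaa"; with it, only the first `a' counts.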
        !           192: .PP
        !           193: See
        !           194: .ZR
        !           195: for a discussion of what is matched in situations where an RE or a
        !           196: portion thereof could match any of several substrings of
        !           197: .IR string .
        !           198: .PP
        !           199: Normally,
        !           200: .I regexec
        !           201: returns 0 for success and the non-zero code REG_NOMATCH for failure.
        !           202: Other non-zero error codes may be returned in exceptional situations;
        !           203: see DIAGNOSTICS.
        !           204: .PP
        !           205: If REG_NOSUB was specified in the compilation of the RE,
        !           206: or if
        !           207: .I nmatch
        !           208: is 0,
        !           209: .I regexec
        !           210: ignores the
        !           211: .I pmatch
        !           212: argument (but see below for the case where REG_STARTEND is specified).
        !           213: Otherwise,
        !           214: .I pmatch
        !           215: points to an array of
        !           216: .I nmatch
        !           217: structures of type
        !           218: .IR regmatch_t .
        !           219: Such a structure has at least the members
        !           220: .I rm_so
        !           221: and
        !           222: .IR rm_eo ,
        !           223: both of type
        !           224: .I regoff_t
        !           225: (a signed arithmetic type at least as large as an
        !           226: .I off_t
        !           227: and a
        !           228: .IR ssize_t ),
        !           229: containing respectively the offset of the first character of a substring
        !           230: and the offset of the first character after the end of the substring.
        !           231: Offsets are measured from the beginning of the
        !           232: .I string
        !           233: argument given to
        !           234: .IR regexec .
        !           235: An empty substring is denoted by equal offsets,
        !           236: both indicating the character following the empty substring.
        !           237: .PP
        !           238: The 0th member of the
        !           239: .I pmatch
        !           240: array is filled in to indicate what substring of
        !           241: .I string
        !           242: was matched by the entire RE.
        !           243: Remaining members report what substring was matched by parenthesized
        !           244: subexpressions within the RE;
        !           245: member
        !           246: .I i
        !           247: reports subexpression
        !           248: .IR i ,
        !           249: with subexpressions counted (starting at 1) by the order of their opening
        !           250: parentheses in the RE, left to right.
        !           251: Unused entries in the array\(emcorresponding either to subexpressions that
        !           252: did not participate in the match at all, or to subexpressions that do not
        !           253: exist in the RE (that is, \fIi\fR\ > \fIpreg\fR\->\fIre_nsub\fR)\(emhave both
        !           254: .I rm_so
        !           255: and
        !           256: .I rm_eo
        !           257: set to \-1.
        !           258: If a subexpression participated in the match several times,
        !           259: the reported substring is the last one it matched.
        !           260: (Note, as a particular example, that when the RE `(b*)+' matches `bbb',
        !           261: the parenthesized subexpression matches each of the three `b's and then
        !           262: an infinite number of empty strings following the last `b',
        !           263: so the reported substring is one of the empties.)
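The pmatch conventions above can be sketched in C as follows. The helper submatch, its return codes, and the MAX_GROUPS limit are all hypothetical choices made for this illustration. It copies the text of subexpression i into a caller-supplied buffer and reports separately the case where the entry is unused.

```c
#include <regex.h>
#include <stdio.h>

#define MAX_GROUPS 10   /* arbitrary limit chosen for this sketch */

/*
 * Hypothetical helper (not part of the regex library).
 * Copies the substring matched by subexpression `i` (0 = whole RE) into
 * `out`.  Returns 0 on success, 1 if the RE matched but entry `i` is
 * unused (no such subexpression, or it did not participate), and -1 on
 * compile failure, no match, or `i` out of range for this sketch.
 */
int submatch(const char *pattern, const char *text, size_t i,
             char *out, size_t outsz)
{
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    int result = -1;

    if (i >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, text, MAX_GROUPS, pm, 0) == 0) {
        if (i > re.re_nsub || pm[i].rm_so == -1) {
            result = 1;             /* unused entry: offsets are -1 */
        } else {
            /* rm_so and rm_eo are offsets from the start of `text`. */
            snprintf(out, outsz, "%.*s",
                     (int)(pm[i].rm_eo - pm[i].rm_so), text + pm[i].rm_so);
            result = 0;
        }
    }

    regfree(&re);
    return result;
}
```

With the RE `(a)|(b)' and the string "b", entry 2 holds the match and entry 1 is unused, because the first alternative did not participate.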
        !           264: .PP
        !           265: If REG_STARTEND is specified,
        !           266: .I pmatch
        !           267: must point to at least one
        !           268: .I regmatch_t
        !           269: (even if
        !           270: .I nmatch
        !           271: is 0 or REG_NOSUB was specified),
        !           272: to hold the input offsets for REG_STARTEND.
        !           273: Use for output is still entirely controlled by
        !           274: .IR nmatch ;
        !           275: if
        !           276: .I nmatch
        !           277: is 0 or REG_NOSUB was specified,
        !           278: the value of
        !           279: .IR pmatch [0]
        !           280: will not be changed by a successful
        !           281: .IR regexec .
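A hedged C sketch of REG_STARTEND follows. Because the flag is an extension, the code is guarded with #ifdef and returns -2 where the flag is not defined. The helper name find_in_range and its return codes are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the regex library).
 * Searches only the bytes text[start] .. text[end-1].  Returns the offset
 * of the start of the match measured from `text`, -1 if there is no
 * match, or -2 if the pattern does not compile or REG_STARTEND is not
 * available on this system.
 */
long find_in_range(const char *pattern, const char *text,
                   size_t start, size_t end)
{
#ifdef REG_STARTEND
    regex_t re;
    regmatch_t pm[1];
    long pos = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -2;

    pm[0].rm_so = (regoff_t)start;  /* input: where the string starts */
    pm[0].rm_eo = (regoff_t)end;    /* input: where the notional NUL sits */
    if (regexec(&re, text, 1, pm, REG_STARTEND) == 0)
        pos = (long)pm[0].rm_so;    /* output: offset from `text` itself */

    regfree(&re);
    return pos;
#else
    (void)pattern; (void)text; (void)start; (void)end;
    return -2;
#endif
}
```

The same pmatch[0] carries the input range and, because nmatch is 1 here, is overwritten with the match offsets on success.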
        !           282: .PP
        !           283: .I Regerror
        !           284: maps a non-zero
        !           285: .I errcode
        !           286: from either
        !           287: .I regcomp
        !           288: or
        !           289: .I regexec
        !           290: to a human-readable, printable message.
        !           291: If
        !           292: .I preg
        !           293: is non-NULL,
        !           294: the error code should have arisen from use of
        !           295: the
        !           296: .I regex_t
        !           297: pointed to by
        !           298: .IR preg ,
        !           299: and if the error code came from
        !           300: .IR regcomp ,
        !           301: it should have been the result from the most recent
        !           302: .I regcomp
        !           303: using that
        !           304: .IR regex_t .
        !           305: .RI ( Regerror
        !           306: may be able to supply a more detailed message using information
        !           307: from the
        !           308: .IR regex_t .)
        !           309: .I Regerror
        !           310: places the NUL-terminated message into the buffer pointed to by
        !           311: .IR errbuf ,
        !           312: limiting the length (including the NUL) to at most
        !           313: .I errbuf_size
        !           314: bytes.
        !           315: If the whole message won't fit,
        !           316: as much of it as will fit before the terminating NUL is supplied.
        !           317: In any case,
        !           318: the returned value is the size of buffer needed to hold the whole
        !           319: message (including terminating NUL).
        !           320: If
        !           321: .I errbuf_size
        !           322: is 0,
        !           323: .I errbuf
        !           324: is ignored but the return value is still correct.
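Since regerror returns the buffer size needed even when errbuf_size is 0, a message can be fetched in two calls: one to size the buffer and one to fill it. The C sketch below assumes a hypothetical helper name, regex_error_message, and leaves freeing the result to the caller.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the regex library).
 * Returns a malloc()ed copy of the message for `errcode`, or NULL if
 * memory cannot be allocated.  The caller must free() the result.
 */
char *regex_error_message(int errcode, const regex_t *preg)
{
    /* First call: errbuf_size is 0, so errbuf is ignored and only the
     * required size (including the terminating NUL) is returned. */
    size_t need = regerror(errcode, preg, NULL, 0);
    char *msg = malloc(need);

    /* Second call: the buffer is now large enough for the whole message. */
    if (msg != NULL)
        regerror(errcode, preg, msg, need);
    return msg;
}
```

A typical caller passes the non-zero code returned by regcomp together with the same regex_t, prints the message, and then frees it.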
        !           325: .PP
        !           326: If the
        !           327: .I errcode
        !           328: given to
        !           329: .I regerror
        !           330: is first ORed with REG_ITOA,
        !           331: the ``message'' that results is the printable name of the error code,
        !           332: e.g. ``REG_NOMATCH'',
        !           333: rather than an explanation thereof.
        !           334: If
        !           335: .I errcode
        !           336: is REG_ATOI,
        !           337: then
        !           338: .I preg
        !           339: shall be non-NULL and the
        !           340: .I re_endp
        !           341: member of the structure it points to
        !           342: must point to the printable name of an error code;
        !           343: in this case, the result in
        !           344: .I errbuf
        !           345: is the decimal digits of
        !           346: the numeric value of the error code
        !           347: (0 if the name is not recognized).
        !           348: REG_ITOA and REG_ATOI are intended primarily as debugging facilities;
        !           349: they are extensions,
        !           350: compatible with but not specified by POSIX 1003.2,
        !           351: and should be used with
        !           352: caution in software intended to be portable to other systems.
        !           353: Be warned also that they are considered experimental and changes are possible.
        !           354: .PP
        !           355: .I Regfree
        !           356: frees any dynamically-allocated storage associated with the compiled RE
        !           357: pointed to by
        !           358: .IR preg .
        !           359: The remaining
        !           360: .I regex_t
        !           361: is no longer a valid compiled RE
        !           362: and the effect of supplying it to
        !           363: .I regexec
        !           364: or
        !           365: .I regerror
        !           366: is undefined.
        !           367: .PP
        !           368: None of these functions references global variables except for tables
        !           369: of constants;
        !           370: all are safe for use from multiple threads if the arguments are safe.
        !           371: .SH IMPLEMENTATION CHOICES
        !           372: There are a number of decisions that 1003.2 leaves up to the implementor,
        !           373: either by explicitly saying ``undefined'' or by virtue of their being
        !           374: forbidden by the RE grammar.
        !           375: This implementation treats them as follows.
        !           376: .PP
        !           377: See
        !           378: .ZR
        !           379: for a discussion of the definition of case-independent matching.
        !           380: .PP
        !           381: There is no particular limit on the length of REs,
        !           382: except insofar as memory is limited.
        !           383: Memory usage is approximately linear in RE size, and largely insensitive
        !           384: to RE complexity, except for bounded repetitions.
        !           385: See BUGS for one short RE using them
        !           386: that will run almost any system out of memory.
        !           387: .PP
        !           388: A backslashed character other than one specifically given a magic meaning
        !           389: by 1003.2 (such magic meanings occur only in obsolete [``basic''] REs)
        !           390: is taken as an ordinary character.
        !           391: .PP
        !           392: Any unmatched [ is a REG_EBRACK error.
        !           393: .PP
        !           394: Equivalence classes cannot begin or end bracket-expression ranges.
        !           395: The endpoint of one range cannot begin another.
        !           396: .PP
        !           397: RE_DUP_MAX, the limit on repetition counts in bounded repetitions, is 255.
        !           398: .PP
        !           399: A repetition operator (?, *, +, or bounds) cannot follow another
        !           400: repetition operator.
        !           401: A repetition operator cannot begin an expression or subexpression
        !           402: or follow `^' or `|'.
        !           403: .PP
        !           404: `|' cannot appear first or last in a (sub)expression or after another `|',
        !           405: i.e. an operand of `|' cannot be an empty subexpression.
        !           406: An empty parenthesized subexpression, `()', is legal and matches an
        !           407: empty (sub)string.
        !           408: An empty string is not a legal RE.
        !           409: .PP
        !           410: A `{' followed by a digit is considered the beginning of bounds for a
        !           411: bounded repetition, which must then follow the syntax for bounds.
        !           412: A `{' \fInot\fR followed by a digit is considered an ordinary character.
        !           413: .PP
        !           414: `^' and `$' beginning and ending subexpressions in obsolete (``basic'')
        !           415: REs are anchors, not ordinary characters.
        !           416: .SH SEE ALSO
        !           417: grep(1), regex(7)
        !           418: .PP
        !           419: POSIX 1003.2, sections 2.8 (Regular Expression Notation)
        !           420: and
        !           421: B.5 (C Binding for Regular Expression Matching).
        !           422: .SH DIAGNOSTICS
        !           423: Non-zero error codes from
        !           424: .I regcomp
        !           425: and
        !           426: .I regexec
        !           427: include the following:
        !           428: .PP
        !           429: .nf
        !           430: .ta \w'REG_ECOLLATE'u+3n
        !           431: REG_NOMATCH    regexec() failed to match
        !           432: REG_BADPAT     invalid regular expression
        !           433: REG_ECOLLATE   invalid collating element
        !           434: REG_ECTYPE     invalid character class
        !           435: REG_EESCAPE    \e applied to unescapable character
        !           436: REG_ESUBREG    invalid backreference number
        !           437: REG_EBRACK     brackets [ ] not balanced
        !           438: REG_EPAREN     parentheses ( ) not balanced
        !           439: REG_EBRACE     braces { } not balanced
        !           440: REG_BADBR      invalid repetition count(s) in { }
        !           441: REG_ERANGE     invalid character range in [ ]
        !           442: REG_ESPACE     ran out of memory
        !           443: REG_BADRPT     ?, *, or + operand invalid
        !           444: REG_EMPTY      empty (sub)expression
        !           445: REG_ASSERT     ``can't happen''\(emyou found a bug
        !           446: REG_INVARG     invalid argument, e.g. negative-length string
        !           447: .fi
        !           448: .SH HISTORY
        !           449: Written by Henry Spencer at University of Toronto,
        !           450: henry@zoo.toronto.edu.
        !           451: .SH BUGS
        !           452: This is an alpha release with known defects.
        !           453: Please report problems.
        !           454: .PP
        !           455: There is one known functionality bug.
        !           456: The implementation of internationalization is incomplete:
        !           457: the locale is always assumed to be the default one of 1003.2,
        !           458: and only the collating elements etc. of that locale are available.
        !           459: .PP
        !           460: The back-reference code is subtle and doubts linger about its correctness
        !           461: in complex cases.
        !           462: .PP
        !           463: .I Regexec
        !           464: performance is poor.
        !           465: This will improve with later releases.
        !           466: .I Nmatch
        !           467: exceeding 0 is expensive;
        !           468: .I nmatch
        !           469: exceeding 1 is worse.
        !           470: .I Regexec
        !           471: is largely insensitive to RE complexity \fIexcept\fR that back
        !           472: references are massively expensive.
        !           473: RE length does matter; in particular, there is a strong speed bonus
        !           474: for keeping RE length under about 30 characters,
        !           475: with most special characters counting roughly double.
        !           476: .PP
        !           477: .I Regcomp
        !           478: implements bounded repetitions by macro expansion,
        !           479: which is costly in time and space if counts are large
        !           480: or bounded repetitions are nested.
        !           481: An RE like, say,
        !           482: `((((a{1,100}){1,100}){1,100}){1,100}){1,100}'
        !           483: will (eventually) run almost any existing machine out of swap space.
        !           484: .PP
        !           485: There are suspected problems with response to obscure error conditions.
        !           486: Notably,
        !           487: certain kinds of internal overflow,
        !           488: produced only by truly enormous REs or by multiply nested bounded repetitions,
        !           489: are probably not handled well.
        !           490: .PP
        !           491: Due to a mistake in 1003.2, things like `a)b' are legal REs because `)' is
        !           492: a special character only in the presence of a previous unmatched `('.
        !           493: This can't be fixed until the spec is fixed.
        !           494: .PP
        !           495: The standard's definition of back references is vague.
        !           496: For example, does
        !           497: `a\e(\e(b\e)*\e2\e)*d' match `abbbd'?
        !           498: Until the standard is clarified,
        !           499: behavior in such cases should not be relied on.
        !           500: .PP
        !           501: The implementation of word-boundary matching is a bit of a kludge,
        !           502: and bugs may lurk in combinations of word-boundary matching and anchoring.
