Annotation of embedaddon/pcre/doc/html/pcrecpp.html, revision 1.1

1.1     ! misho       1: <html>
        !             2: <head>
        !             3: <title>pcrecpp specification</title>
        !             4: </head>
        !             5: <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
        !             6: <h1>pcrecpp man page</h1>
        !             7: <p>
        !             8: Return to the <a href="index.html">PCRE index page</a>.
        !             9: </p>
        !            10: <p>
        !            11: This page is part of the PCRE HTML documentation. It was generated automatically
        !            12: from the original man page. If there is any nonsense in it, please consult the
        !            13: man page, in case the conversion went wrong.
        !            14: <br>
        !            15: <ul>
        !            16: <li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a>
        !            17: <li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
        !            18: <li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>
        !            19: <li><a name="TOC4" href="#SEC4">QUOTING METACHARACTERS</a>
        !            20: <li><a name="TOC5" href="#SEC5">PARTIAL MATCHES</a>
        !            21: <li><a name="TOC6" href="#SEC6">UTF-8 AND THE MATCHING INTERFACE</a>
        !            22: <li><a name="TOC7" href="#SEC7">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a>
        !            23: <li><a name="TOC8" href="#SEC8">SCANNING TEXT INCREMENTALLY</a>
        !            24: <li><a name="TOC9" href="#SEC9">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>
        !            25: <li><a name="TOC10" href="#SEC10">REPLACING PARTS OF STRINGS</a>
        !            26: <li><a name="TOC11" href="#SEC11">AUTHOR</a>
        !            27: <li><a name="TOC12" href="#SEC12">REVISION</a>
        !            28: </ul>
        !            29: <br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>
        !            30: <P>
        !            31: <b>#include &#60;pcrecpp.h&#62;</b>
        !            32: </P>
        !            33: <br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
        !            34: <P>
        !            35: The C++ wrapper for PCRE was provided by Google Inc. Some additional
        !            36: functionality was added by Giuseppe Maxia. This brief man page was constructed
        !            37: from the notes in the <i>pcrecpp.h</i> file, which should be consulted for
        !            38: further details.
        !            39: </P>
        !            40: <br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>
        !            41: <P>
        !            42: The "FullMatch" operation checks that supplied text matches a supplied pattern
        !            43: exactly. If pointer arguments are supplied, it copies the sub-strings matched
        !            44: by sub-patterns into them.
        !            45: <pre>
        !            46:   Example: successful match
        !            47:      pcrecpp::RE re("h.*o");
        !            48:      re.FullMatch("hello");
        !            49: 
        !            50:   Example: unsuccessful match (requires full match):
        !            51:      pcrecpp::RE re("e");
        !            52:      !re.FullMatch("hello");
        !            53: 
        !            54:   Example: creating a temporary RE object:
        !            55:      pcrecpp::RE("h.*o").FullMatch("hello");
        !            56: </pre>
        !            57: You can pass in a "const char*" or a "string" for "text". The examples below
        !            58: tend to use a const char*. As the examples above show, you can either store
        !            59: the RE object explicitly in a variable or use a temporary RE object. The
        !            60: examples below use one mode or the other arbitrarily. Either could correctly be
        !            61: used for any of these examples.
        !            62: </P>
        !            63: <P>
        !            64: You must supply extra pointer arguments to extract matched subpieces.
        !            65: <pre>
        !            66:   Example: extracts "ruby" into "s" and 1234 into "i"
        !            67:      int i;
        !            68:      string s;
        !            69:      pcrecpp::RE re("(\\w+):(\\d+)");
        !            70:      re.FullMatch("ruby:1234", &s, &i);
        !            71: 
        !            72:   Example: does not try to extract any extra sub-patterns
        !            73:      re.FullMatch("ruby:1234", &s);
        !            74: 
        !            75:   Example: does not try to extract into NULL
        !            76:      re.FullMatch("ruby:1234", NULL, &i);
        !            77: 
        !            78:   Example: integer overflow causes failure
        !            79:      !re.FullMatch("ruby:1234567891234", NULL, &i);
        !            80: 
        !            81:   Example: fails because there aren't enough sub-patterns:
        !            82:      !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);
        !            83: 
        !            84:   Example: fails because string cannot be stored in integer
        !            85:      !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
        !            86: </pre>
        !            87: The provided pointer arguments can be pointers to any scalar numeric
        !            88: type, or one of:
        !            89: <pre>
        !            90:    string        (matched piece is copied to string)
        !            91:    StringPiece   (StringPiece is mutated to point to matched piece)
        !            92:    T             (where "bool T::ParseFrom(const char*, int)" exists)
        !            93:    NULL          (the corresponding matched sub-pattern is not copied)
        !            94: </pre>
        !            95: The function returns true iff all of the following conditions are satisfied:
        !            96: <pre>
        !            97:   a. "text" matches "pattern" exactly;
        !            98: 
        !            99:   b. The number of matched sub-patterns is &#62;= number of supplied
        !           100:      pointers;
        !           101: 
        !           102:   c. The "i"th argument has a suitable type for holding the
        !           103:      string captured as the "i"th sub-pattern. If you pass in
        !           104:      void * NULL for the "i"th argument, or a non-void * NULL
        !           105:      of the correct type, or pass fewer arguments than the
        !           106:      number of sub-patterns, the "i"th captured sub-pattern is
        !           107:      ignored.
        !           108: </pre>
        !           109: CAVEAT: An optional sub-pattern that does not exist in the matched
        !           110: string is assigned the empty string. Therefore, the following will
        !           111: return false (because the empty string is not a valid number):
        !           112: <pre>
        !           113:    int number;
        !           114:    pcrecpp::RE("[a-z]+(\\d+)?").FullMatch("abc", &number);
        !           115: </pre>
        !           116: The matching interface supports at most 16 arguments per call.
        !           117: If you need more, consider using the more general interface
        !           118: <b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature for
        !           119: <b>DoMatch</b>.
        !           120: </P>
        !           121: <P>
        !           122: NOTE: Do not use <b>no_arg</b>, which is used internally to mark the end of a
        !           123: list of optional arguments, as a placeholder for missing arguments, as this can
        !           124: lead to segfaults.
        !           125: </P>
        !           126: <br><a name="SEC4" href="#TOC1">QUOTING METACHARACTERS</a><br>
        !           127: <P>
        !           128: You can use the "QuoteMeta" operation to insert backslashes before all
        !           129: potentially meaningful characters in a string. The returned string, used as a
        !           130: regular expression, will exactly match the original string.
        !           131: <pre>
        !           132:   Example:
        !           133:      string quoted = RE::QuoteMeta(unquoted);
        !           134: </pre>
        !           135: Note that it's legal to escape a character even if it has no special meaning in
        !           136: a regular expression -- so this function does that. (This also makes it
        !           137: identical to the Perl function of the same name; see "perldoc -f quotemeta".)
        !           138: For example, "1.5-2.0?" becomes "1\.5\-2\.0\?".
        !           139: </P>
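The escaping rule described above can be sketched with the standard library alone. This is an illustration, not the pcrecpp implementation: the helper name quote_meta_sketch is hypothetical, and it assumes that "potentially meaningful" means every ASCII character other than letters, digits, and underscore. The real QuoteMeta may treat some bytes (such as NUL or non-ASCII bytes) differently.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not part of pcrecpp: put a backslash in front of
// every ASCII character that is not a letter, digit, or underscore.
// Bytes >= 0x80 are copied unchanged (an assumption made for this sketch).
std::string quote_meta_sketch(const std::string& unquoted) {
  std::string quoted;
  quoted.reserve(unquoted.size() * 2);
  for (unsigned char c : unquoted) {
    const bool plain = c >= 0x80 || c == '_' || std::isalnum(c) != 0;
    if (!plain) quoted += '\\';
    quoted += static_cast<char>(c);
  }
  return quoted;
}
```

Under these assumptions, quote_meta_sketch("1.5-2.0?") yields the same text as the example above, with a backslash before each ".", "-", and "?".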
        !           140: <br><a name="SEC5" href="#TOC1">PARTIAL MATCHES</a><br>
        !           141: <P>
        !           142: You can use the "PartialMatch" operation when you want the pattern
        !           143: to match any substring of the text.
        !           144: <pre>
        !           145:   Example: simple search for a string:
        !           146:      pcrecpp::RE("ell").PartialMatch("hello");
        !           147: 
        !           148:   Example: find first number in a string:
        !           149:      int number;
        !           150:      pcrecpp::RE re("(\\d+)");
        !           151:      re.PartialMatch("x*100 + 20", &number);
        !           152:      assert(number == 100);
        !           153: </PRE>
        !           154: </P>
        !           155: <br><a name="SEC6" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br>
        !           156: <P>
        !           157: By default, pattern and text are plain text, one byte per character. The UTF8
        !           158: flag, passed to the constructor, causes both pattern and string to be treated
        !           159: as UTF-8 text, still a byte stream but potentially multiple bytes per
        !           160: character. In practice, the text is likelier to be UTF-8 than the pattern, but
        !           161: the match returned may depend on the UTF8 flag, so always use it when matching
        !           162: UTF-8 text. For example, "." will match one byte normally but with UTF8 set may
        !           163: match all the bytes of a multi-byte character.
        !           164: <pre>
        !           165:   Example:
        !           166:      pcrecpp::RE_Options options;
        !           167:      options.set_utf8();
        !           168:      pcrecpp::RE re(utf8_pattern, options);
        !           169:      re.FullMatch(utf8_string);
        !           170: 
        !           171:   Example: using the convenience function UTF8():
        !           172:      pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
        !           173:      re.FullMatch(utf8_string);
        !           174: </pre>
        !           175: NOTE: The UTF8 flag is ignored if pcre was not configured with the
        !           176: <pre>
        !           177:       --enable-utf8 flag.
        !           178: </PRE>
        !           179: </P>
        !           180: <br><a name="SEC7" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br>
        !           181: <P>
        !           182: PCRE defines some modifiers to change the behavior of the regular expression
        !           183: engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
        !           184: pass such modifiers to a RE class. Currently, the following modifiers are
        !           185: supported:
        !           186: <pre>
        !           187:    modifier              description                 Perl equivalent
        !           188: 
        !           189:    PCRE_CASELESS         case insensitive match      /i
        !           190:    PCRE_MULTILINE        multiple lines match        /m
        !           191:    PCRE_DOTALL           dot matches newlines        /s
        !           192:    PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
        !           193:    PCRE_EXTRA            strict escape parsing       N/A
        !           194:    PCRE_EXTENDED         ignore whitespace           /x
        !           195:    PCRE_UTF8             handles UTF8 chars          built-in
        !           196:    PCRE_UNGREEDY         reverses * and *?           N/A
        !           197:    PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
        !           198: </pre>
        !           199: (*) Both Perl and PCRE allow non-capturing parentheses by means of the
        !           200: "?:" modifier within the pattern itself. For example, (?:ab|cd) does not
        !           201: capture, while (ab|cd) does.
        !           202: </P>
        !           203: <P>
        !           204: For a full account of how each modifier works, please check the
        !           205: PCRE API reference page.
        !           206: </P>
        !           207: <P>
        !           208: For each modifier, there are two member functions whose name is made
        !           209: out of the modifier in lowercase, without the "PCRE_" prefix. For
        !           210: instance, PCRE_CASELESS is handled by
        !           211: <pre>
        !           212:   bool caseless()
        !           213: </pre>
        !           214: which returns true if the modifier is set, and
        !           215: <pre>
        !           216:   RE_Options & set_caseless(bool)
        !           217: </pre>
        !           218: which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
        !           219: accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member
        !           220: functions. Setting <i>match_limit</i> to a non-zero value will limit the
        !           221: execution of pcre to keep it from doing bad things like blowing the stack or
        !           222: taking an eternity to return a result. A value of 5000 is good enough to stop
        !           223: stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables
        !           224: match limiting. Alternatively, you can call <b>set_match_limit_recursion()</b>,
        !           225: which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
        !           226: recurses. <b>match_limit()</b> limits the number of matches PCRE does;
        !           227: <b>match_limit_recursion()</b> limits the depth of internal recursion, and
        !           228: therefore the amount of stack that is used.
        !           229: </P>
        !           230: <P>
        !           231: Normally, to pass one or more modifiers to a RE class, you declare
        !           232: a <i>RE_Options</i> object, set the appropriate options, and pass this
        !           233: object to a RE constructor. Example:
        !           234: <pre>
        !           235:    RE_Options opt;
        !           236:    opt.set_caseless(true);
        !           237:    if (RE("HELLO", opt).PartialMatch("hello world")) ...
        !           238: </pre>
        !           239: RE_Options has two constructors. The default constructor takes no arguments and
        !           240: creates a set of flags that are off by default. The other takes an
        !           241: <i>option_flags</i> argument, which facilitates transfer of legacy code from C
        !           242: programs. This lets you do
        !           243: <pre>
        !           244:    RE(pattern,
        !           245:      RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
        !           246: </pre>
        !           247: However, new code is better off doing
        !           248: <pre>
        !           249:    RE(pattern,
        !           250:      RE_Options().set_caseless(true).set_multiline(true))
        !           251:        .PartialMatch(str);
        !           252: </pre>
        !           253: If you are going to pass one of the most used modifiers, there are some
        !           254: convenience functions that return a RE_Options class with the
        !           255: appropriate modifier already set: <b>CASELESS()</b>, <b>UTF8()</b>,
        !           256: <b>MULTILINE()</b>, <b>DOTALL()</b>, and <b>EXTENDED()</b>.
        !           257: </P>
        !           258: <P>
        !           259: If you need to set several options at once, and you don't want to go through
        !           260: the pains of declaring a RE_Options object and setting several options, there
        !           261: is a parallel method that gives you such ability on the fly. You can concatenate
        !           262: several <b>set_xxxxx()</b> member functions, since each of them returns a
        !           263: reference to its class object. For example, to pass PCRE_CASELESS,
        !           264: PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
        !           265: <pre>
        !           266:    RE(" ^ xyz \\s+ .* blah$",
        !           267:      RE_Options()
        !           268:        .set_caseless(true)
        !           269:        .set_extended(true)
        !           270:        .set_multiline(true)).PartialMatch(sometext);
        !           271: 
        !           272: </PRE>
        !           273: </P>
        !           274: <br><a name="SEC8" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>
        !           275: <P>
        !           276: The "Consume" operation may be useful if you want to repeatedly
        !           277: match regular expressions at the front of a string and skip over
        !           278: them as they match. This requires use of the "StringPiece" type,
        !           279: which represents a sub-range of a real string. Like RE, StringPiece
        !           280: is defined in the pcrecpp namespace.
        !           281: <pre>
        !           282:   Example: read lines of the form "var = value" from a string.
        !           283:      string contents = ...;                 // Fill string somehow
        !           284:      pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece
        !           285: 
        !           286:      string var;
        !           287:      int value;
        !           288:      pcrecpp::RE re("(\\w+) = (\\d+)\n");
        !           289:      while (re.Consume(&input, &var, &value)) {
        !           290:        ...;
        !           291:      }
        !           292: </pre>
        !           293: Each successful call to "Consume" will set "var/value", and also
        !           294: advance "input" so it points past the matched text.
        !           295: </P>
        !           296: <P>
        !           297: The "FindAndConsume" operation is similar to "Consume" but does not
        !           298: anchor your match at the beginning of the string. For example, you
        !           299: could extract all words from a string by repeatedly calling
        !           300: <pre>
        !           301:   pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
        !           302: </PRE>
        !           303: </P>
        !           304: <br><a name="SEC9" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>
        !           305: <P>
        !           306: By default, if you pass a pointer to a numeric value, the
        !           307: corresponding text is interpreted as a base-10 number. You can
        !           308: instead wrap the pointer with a call to one of the operators Hex(),
        !           309: Octal(), or CRadix() to interpret the text in another base. The
        !           310: CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
        !           311: prefixes, but defaults to base-10.
        !           312: <pre>
        !           313:   Example:
        !           314:     int a, b, c, d;
        !           315:     pcrecpp::RE re("(.*) (.*) (.*) (.*)");
        !           316:     re.FullMatch("100 40 0100 0x40",
        !           317:                  pcrecpp::Octal(&a), pcrecpp::Hex(&b),
        !           318:                  pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
        !           319: </pre>
        !           320: will leave 64 in a, b, c, and d.
        !           321: </P>
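The three radix modes above map closely onto the base argument of the standard C function strtol: 16 for hex, 8 for octal, and 0 for C-style prefix detection. The sketch below uses that function to show the arithmetic behind the example. The helper name parse_int_sketch is hypothetical, and this is not how pcrecpp itself is implemented.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of pcrecpp: parse all of `text` as an int.
// base 8 = octal, base 16 = hex, base 0 = C-style ("0" and "0x" prefixes,
// otherwise decimal). Returns false on empty input, leading whitespace,
// trailing junk, or overflow of int.
bool parse_int_sketch(const std::string& text, int base, int* out) {
  if (text.empty()) return false;
  if (std::isspace(static_cast<unsigned char>(text[0]))) return false;
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(text.c_str(), &end, base);
  if (errno == ERANGE) return false;                      // does not fit in long
  if (end == text.c_str() || *end != '\0') return false;  // not entirely a number
  if (value < INT_MIN || value > INT_MAX) return false;   // does not fit in int
  *out = static_cast<int>(value);
  return true;
}
```

With this helper, "100" in base 8, "40" in base 16, and "0100" and "0x40" in base 0 all parse to 64, matching the result stated above. The empty string and "ruby" are rejected, which mirrors the failure cases listed for the matching interface.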
        !           322: <br><a name="SEC10" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>
        !           323: <P>
        !           324: You can replace the first match of "pattern" in "str" with "rewrite".
        !           325: Within "rewrite", backslash-escaped digits (\1 to \9) can be
        !           326: used to insert text matching corresponding parenthesized group
        !           327: from the pattern. \0 in "rewrite" refers to the entire matching
        !           328: text. For example:
        !           329: <pre>
        !           330:   string s = "yabba dabba doo";
        !           331:   pcrecpp::RE("b+").Replace("d", &s);
        !           332: </pre>
        !           333: will leave "s" containing "yada dabba doo". The result is true if the pattern
        !           334: matches and a replacement occurs, false otherwise.
        !           335: </P>
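To make the \0 to \9 substitution rule concrete, here is a small standard-library sketch of how a rewrite string could be expanded once the captured groups are known. The helper name expand_rewrite_sketch is hypothetical. The handling of a doubled backslash and of references to groups that do not exist are assumptions of this sketch, not documented pcrecpp behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of pcrecpp. groups[0] holds the entire
// matched text and groups[n] the text of the n-th parenthesized group.
std::string expand_rewrite_sketch(const std::string& rewrite,
                                  const std::vector<std::string>& groups) {
  std::string out;
  for (std::size_t i = 0; i < rewrite.size(); ++i) {
    const char c = rewrite[i];
    if (c == '\\' && i + 1 < rewrite.size()) {
      const char next = rewrite[i + 1];
      if (next >= '0' && next <= '9') {
        const std::size_t n = static_cast<std::size_t>(next - '0');
        // Assumption: a reference to a missing group expands to nothing.
        if (n < groups.size()) out += groups[n];
        ++i;  // skip the digit
        continue;
      }
      if (next == '\\') {  // Assumption: "\\" stands for one literal backslash.
        out += '\\';
        ++i;
        continue;
      }
    }
    out += c;  // ordinary character, copied as is
  }
  return out;
}
```

For a match of "(\\w+):(\\d+)" against "ruby:1234", the groups would be "ruby:1234", "ruby", and "1234", so a rewrite of "\2=\1" would expand to "1234=ruby".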
        !           336: <P>
        !           337: <b>GlobalReplace</b> is like <b>Replace</b> except that it replaces all
        !           338: occurrences of the pattern in the string with the rewrite. Replacements are
        !           339: not subject to re-matching. For example:
        !           340: <pre>
        !           341:   string s = "yabba dabba doo";
        !           342:   pcrecpp::RE("b+").GlobalReplace("d", &s);
        !           343: </pre>
        !           344: will leave "s" containing "yada dada doo". It returns the number of
        !           345: replacements made.
        !           346: </P>
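The difference between replacing the first match and replacing every match can be reproduced with std::regex from the C++ standard library, which is used here only as a stand-in for pcrecpp. The helper names are hypothetical. Note that std::regex replacement strings use $1-style group references rather than the \1 style described above. The literal rewrite "d" contains neither, so it behaves the same in both.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: replace only the first match of `re` in *s.
// Returns true if a replacement was made.
bool replace_first_sketch(const std::regex& re, const std::string& rewrite,
                          std::string* s) {
  if (!std::regex_search(*s, re)) return false;
  *s = std::regex_replace(*s, re, rewrite,
                          std::regex_constants::format_first_only);
  return true;
}

// Hypothetical helper: replace every non-overlapping match of `re` in *s.
// Returns the number of replacements made.
int replace_all_sketch(const std::regex& re, const std::string& rewrite,
                       std::string* s) {
  const auto first = std::sregex_iterator(s->begin(), s->end(), re);
  const int count =
      static_cast<int>(std::distance(first, std::sregex_iterator()));
  if (count > 0) *s = std::regex_replace(*s, re, rewrite);
  return count;
}
```

Applied to "yabba dabba doo" with the pattern "b+" and the rewrite "d", the first helper leaves "yada dabba doo" and the second leaves "yada dada doo" with a count of 2.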
        !           347: <P>
        !           348: <b>Extract</b> is like <b>Replace</b>, except that if the pattern matches,
        !           349: "rewrite" is copied into "out" (an additional argument) with substitutions.
        !           350: The non-matching portions of "text" are ignored. Returns true iff a match
        !           351: occurred and the extraction happened successfully; if no match occurs, the
        !           352: string is left unaffected.
        !           353: </P>
        !           354: <br><a name="SEC11" href="#TOC1">AUTHOR</a><br>
        !           355: <P>
        !           356: The C++ wrapper was contributed by Google Inc.
        !           357: <br>
        !           358: Copyright &copy; 2007 Google Inc.
        !           359: <br>
        !           360: </P>
        !           361: <br><a name="SEC12" href="#TOC1">REVISION</a><br>
        !           362: <P>
        !           363: Last updated: 17 March 2009
        !           364: <br>
        !           365: Minor typo fixed: 25 July 2011
        !           366: <br>
        !           367: <p>
        !           368: Return to the <a href="index.html">PCRE index page</a>.
        !           369: </p>
