Annotation of embedaddon/pcre/doc/html/pcrecpp.html, revision 1.1
1.1 ! misho 1: <html>
! 2: <head>
! 3: <title>pcrecpp specification</title>
! 4: </head>
! 5: <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
! 6: <h1>pcrecpp man page</h1>
! 7: <p>
! 8: Return to the <a href="index.html">PCRE index page</a>.
! 9: </p>
! 10: <p>
! 11: This page is part of the PCRE HTML documentation. It was generated automatically
! 12: from the original man page. If there is any nonsense in it, please consult the
! 13: man page, in case the conversion went wrong.
! 14: <br>
! 15: <ul>
! 16: <li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a>
! 17: <li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
! 18: <li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>
! 19: <li><a name="TOC4" href="#SEC4">QUOTING METACHARACTERS</a>
! 20: <li><a name="TOC5" href="#SEC5">PARTIAL MATCHES</a>
! 21: <li><a name="TOC6" href="#SEC6">UTF-8 AND THE MATCHING INTERFACE</a>
! 22: <li><a name="TOC7" href="#SEC7">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a>
! 23: <li><a name="TOC8" href="#SEC8">SCANNING TEXT INCREMENTALLY</a>
! 24: <li><a name="TOC9" href="#SEC9">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>
! 25: <li><a name="TOC10" href="#SEC10">REPLACING PARTS OF STRINGS</a>
! 26: <li><a name="TOC11" href="#SEC11">AUTHOR</a>
! 27: <li><a name="TOC12" href="#SEC12">REVISION</a>
! 28: </ul>
! 29: <br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>
! 30: <P>
! 31: <b>#include &lt;pcrecpp.h&gt;</b>
! 32: </P>
! 33: <br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
! 34: <P>
! 35: The C++ wrapper for PCRE was provided by Google Inc. Some additional
! 36: functionality was added by Giuseppe Maxia. This brief man page was constructed
! 37: from the notes in the <i>pcrecpp.h</i> file, which should be consulted for
! 38: further details.
! 39: </P>
! 40: <br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>
! 41: <P>
! 42: The "FullMatch" operation checks that supplied text matches a supplied pattern
! 43: exactly. If pointer arguments are supplied, it copies matched sub-strings that
! 44: match sub-patterns into them.
! 45: <pre>
! 46: Example: successful match
! 47: pcrecpp::RE re("h.*o");
! 48: re.FullMatch("hello");
! 49:
! 50: Example: unsuccessful match (requires full match):
! 51: pcrecpp::RE re("e");
! 52: !re.FullMatch("hello");
! 53:
! 54: Example: creating a temporary RE object:
! 55: pcrecpp::RE("h.*o").FullMatch("hello");
! 56: </pre>
! 57: You can pass in a "const char*" or a "string" for "text". The examples below
! 58: tend to use a const char*. As the examples above show, you can either store
! 59: the RE object in a variable or use a temporary RE object. The
! 60: examples below use one form or the other arbitrarily; either works
! 61: for any of them.
! 62: </P>
! 63: <P>
! 64: You must supply extra pointer arguments to extract matched subpieces.
! 65: <pre>
! 66: Example: extracts "ruby" into "s" and 1234 into "i"
! 67: int i;
! 68: string s;
! 69: pcrecpp::RE re("(\\w+):(\\d+)");
! 70: re.FullMatch("ruby:1234", &s, &i);
! 71:
! 72: Example: does not try to extract any extra sub-patterns
! 73: re.FullMatch("ruby:1234", &s);
! 74:
! 75: Example: does not try to extract into NULL
! 76: re.FullMatch("ruby:1234", NULL, &i);
! 77:
! 78: Example: integer overflow causes failure
! 79: !re.FullMatch("ruby:1234567891234", NULL, &i);
! 80:
! 81: Example: fails because there aren't enough sub-patterns:
! 82: !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);
! 83:
! 84: Example: fails because string cannot be stored in integer
! 85: !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
! 86: </pre>
! 87: The provided pointer arguments can be pointers to any scalar numeric
! 88: type, or one of:
! 89: <pre>
! 90: string (matched piece is copied to string)
! 91: StringPiece (StringPiece is mutated to point to matched piece)
! 92: T (where "bool T::ParseFrom(const char*, int)" exists)
! 93: NULL (the corresponding matched sub-pattern is not copied)
! 94: </pre>
! 95: The function returns true iff all of the following conditions are satisfied:
! 96: <pre>
! 97: a. "text" matches "pattern" exactly;
! 98:
! 99: b. The number of matched sub-patterns is >= number of supplied
! 100: pointers;
! 101:
! 102: c. The "i"th argument has a suitable type for holding the
! 103: string captured as the "i"th sub-pattern. If you pass in
! 104: void * NULL for the "i"th argument, or a non-void * NULL
! 105: of the correct type, or pass fewer arguments than the
! 106: number of sub-patterns, the "i"th captured sub-pattern is
! 107: ignored.
! 108: </pre>
! 109: CAVEAT: An optional sub-pattern that does not exist in the matched
! 110: string is assigned the empty string. Therefore, the following will
! 111: return false (because the empty string is not a valid number):
! 112: <pre>
! 113: int number;
! 114: pcrecpp::RE("[a-z]+(\\d+)?").FullMatch("abc", &number);
! 115: </pre>
! 116: The matching interface supports at most 16 arguments per call.
! 117: If you need more, consider using the more general interface
! 118: <b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature for
! 119: <b>DoMatch</b>.
! 120: </P>
! 121: <P>
! 122: NOTE: <b>no_arg</b> is used internally to mark the end of a list of optional
! 123: arguments. Do not use it as a placeholder for missing arguments, because doing
! 124: so can lead to segfaults.
! 125: </P>
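<P>
The numeric failure cases listed above (overflow, non-numeric text, and an empty
capture from an unmatched optional sub-pattern) can be illustrated with a small
standalone sketch. The helper name <i>parse_int_sketch</i> is hypothetical and is
not part of pcrecpp; it only mirrors the described behaviour using the C library.
</P>

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of pcrecpp): convert a captured
// sub-string to int, rejecting anything that does not fit.
bool parse_int_sketch(const std::string& captured, int* out) {
  if (captured.empty()) return false;  // empty capture is not a number
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(captured.c_str(), &end, 10);
  if (end != captured.c_str() + captured.size()) return false;  // not all digits
  if (errno == ERANGE) return false;                     // does not fit in long
  if (value < INT_MIN || value > INT_MAX) return false;  // does not fit in int
  *out = static_cast<int>(value);
  return true;
}

// Mirrors the examples in the text above.
void demo_parse_int() {
  int i = 0;
  assert(parse_int_sketch("1234", &i) && i == 1234);
  assert(!parse_int_sketch("1234567891234", &i));  // integer overflow
  assert(!parse_int_sketch("ruby", &i));           // string is not a number
  assert(!parse_int_sketch("", &i));               // unmatched optional group
}
```

<P>
Rejecting the empty string is what makes the CAVEAT case above return false.
</P>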
! 126: <br><a name="SEC4" href="#TOC1">QUOTING METACHARACTERS</a><br>
! 127: <P>
! 128: You can use the "QuoteMeta" operation to insert backslashes before all
! 129: potentially meaningful characters in a string. The returned string, used as a
! 130: regular expression, will exactly match the original string.
! 131: <pre>
! 132: Example:
! 133: string quoted = RE::QuoteMeta(unquoted);
! 134: </pre>
! 135: Note that it's legal to escape a character even if it has no special meaning in
! 136: a regular expression -- so this function does that. (This also makes it
! 137: identical to the perl function of the same name; see "perldoc -f quotemeta".)
! 138: For example, "1.5-2.0?" becomes "1\.5\-2\.0\?".
! 139: </P>
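<P>
The escaping rule described above can be sketched without the library. The
function <i>quote_meta_sketch</i> below is a hypothetical stand-in, not the
pcrecpp implementation. It assumes that ASCII letters, digits, underscore, and
bytes with the high bit set are left alone, and it does not treat NUL bytes
specially.
</P>

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the behaviour described above; not the
// pcrecpp implementation. Assumption: ASCII letters, digits, '_' and
// bytes with the high bit set (UTF-8 sequences) are left alone.
std::string quote_meta_sketch(const std::string& unquoted) {
  std::string quoted;
  quoted.reserve(unquoted.size() * 2);
  for (unsigned char c : unquoted) {
    const bool is_word = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                         (c >= '0' && c <= '9') || c == '_';
    if (!is_word && c < 128) quoted += '\\';  // escape, even if not special
    quoted += static_cast<char>(c);
  }
  return quoted;
}

// Reproduces the example from the text above.
void demo_quote_meta() {
  assert(quote_meta_sketch("1.5-2.0?") == "1\\.5\\-2\\.0\\?");
  assert(quote_meta_sketch("abc_123") == "abc_123");
}
```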
! 140: <br><a name="SEC5" href="#TOC1">PARTIAL MATCHES</a><br>
! 141: <P>
! 142: You can use the "PartialMatch" operation when you want the pattern
! 143: to match any substring of the text.
! 144: <pre>
! 145: Example: simple search for a string:
! 146: pcrecpp::RE("ell").PartialMatch("hello");
! 147:
! 148: Example: find first number in a string:
! 149: int number;
! 150: pcrecpp::RE re("(\\d+)");
! 151: re.PartialMatch("x*100 + 20", &number);
! 152: assert(number == 100);
! 153: </PRE>
! 154: </P>
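<P>
For readers without the library at hand, the difference between a full and a
partial match can be tried with the standard <i>std::regex</i> facilities. This is
a swapped-in technique, not pcrecpp: <i>std::regex_match</i> plays the role of
FullMatch and <i>std::regex_search</i> the role of PartialMatch. The function
names are hypothetical.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// std::regex stand-ins: regex_match requires the whole text to match
// (like FullMatch); regex_search accepts any substring (like PartialMatch).
bool full_match_sketch(const std::string& text, const std::string& pattern) {
  return std::regex_match(text, std::regex(pattern));
}

bool partial_match_sketch(const std::string& text, const std::string& pattern) {
  return std::regex_search(text, std::regex(pattern));
}

// Returns true and stores the first run of digits found in text.
// Note: std::stoi throws if the digits do not fit in an int.
bool first_number_sketch(const std::string& text, int* number) {
  static const std::regex digits("(\\d+)");
  std::smatch m;
  if (!std::regex_search(text, m, digits)) return false;
  *number = std::stoi(m[1].str());
  return true;
}

// Mirrors the FullMatch and PartialMatch examples in the text above.
void demo_match() {
  assert(full_match_sketch("hello", "h.*o"));
  assert(!full_match_sketch("hello", "e"));
  assert(partial_match_sketch("hello", "ell"));
  int number = 0;
  assert(first_number_sketch("x*100 + 20", &number) && number == 100);
}
```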
! 155: <br><a name="SEC6" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br>
! 156: <P>
! 157: By default, pattern and text are plain text, one byte per character. The UTF8
! 158: flag, passed to the constructor, causes both pattern and string to be treated
! 159: as UTF-8 text, still a byte stream but potentially multiple bytes per
! 160: character. In practice, the text is more likely to be UTF-8 than the pattern, but
! 161: the match returned may depend on the UTF8 flag, so always set it when matching
! 162: UTF-8 text. For example, "." normally matches one byte, but with UTF8 set it
! 163: matches all the bytes of a multi-byte character.
! 164: <pre>
! 165: Example:
! 166: pcrecpp::RE_Options options;
! 167: options.set_utf8();
! 168: pcrecpp::RE re(utf8_pattern, options);
! 169: re.FullMatch(utf8_string);
! 170:
! 171: Example: using the convenience function UTF8():
! 172: pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
! 173: re.FullMatch(utf8_string);
! 174: </pre>
! 175: NOTE: The UTF8 flag is ignored if pcre was not configured with the
! 176: <pre>
! 177: --enable-utf8 flag.
! 178: </PRE>
! 179: </P>
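<P>
To see why byte-wise and character-wise matching differ, the following
standalone sketch counts UTF-8 characters by inspecting lead bytes. The helper
names are hypothetical and unrelated to pcrecpp; the sequence lengths follow the
standard UTF-8 encoding, and valid input is assumed.
</P>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with `lead`, or 0 if `lead` cannot start a sequence.
std::size_t utf8_sequence_length(unsigned char lead) {
  if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                             // continuation or invalid byte
}

// Counts characters rather than bytes, assuming valid UTF-8 input.
std::size_t utf8_char_count(const std::string& text) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < text.size();) {
    const std::size_t len =
        utf8_sequence_length(static_cast<unsigned char>(text[i]));
    i += (len == 0 ? 1 : len);  // step over stray bytes one at a time
    ++count;
  }
  return count;
}

void demo_utf8() {
  const std::string euro = "\xE2\x82\xAC";  // U+20AC, three bytes
  assert(euro.size() == 3);
  assert(utf8_char_count(euro) == 1);
  assert(utf8_char_count("h\xC3\xA9llo") == 5);  // 6 bytes, 5 characters
}
```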
! 180: <br><a name="SEC7" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br>
! 181: <P>
! 182: PCRE defines some modifiers to change the behavior of the regular expression
! 183: engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
! 184: pass such modifiers to a RE class. Currently, the following modifiers are
! 185: supported:
! 186: <pre>
! 187:   modifier              description               Perl equivalent
! 188:
! 189:   PCRE_CASELESS         case insensitive match      /i
! 190:   PCRE_MULTILINE        multiple lines match        /m
! 191:   PCRE_DOTALL           dot matches newlines        /s
! 192:   PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
! 193:   PCRE_EXTRA            strict escape parsing       N/A
! 194:   PCRE_EXTENDED         ignore whitespace           /x
! 195:   PCRE_UTF8             handles UTF8 chars          built-in
! 196:   PCRE_UNGREEDY         reverses * and *?           N/A
! 197:   PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
! 198: </pre>
! 199: (*) Both Perl and PCRE allow non-capturing parentheses by means of the
! 200: "?:" modifier within the pattern itself. For example, (?:ab|cd) does not
! 201: capture, while (ab|cd) does.
! 202: </P>
! 203: <P>
! 204: For a full account of how each modifier works, please check the
! 205: PCRE API reference page.
! 206: </P>
! 207: <P>
! 208: For each modifier, there are two member functions whose names are derived
! 209: from the modifier name in lowercase, without the "PCRE_" prefix. For
! 210: instance, PCRE_CASELESS is handled by
! 211: <pre>
! 212: bool caseless()
! 213: </pre>
! 214: which returns true if the modifier is set, and
! 215: <pre>
! 216: RE_Options & set_caseless(bool)
! 217: </pre>
! 218: which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
! 219: accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member
! 220: functions. Setting <i>match_limit</i> to a non-zero value will limit the
! 221: execution of pcre to keep it from doing bad things like blowing the stack or
! 222: taking an eternity to return a result. A value of 5000 is good enough to stop
! 223: stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables
! 224: match limiting. Alternatively, you can call <b>set_match_limit_recursion()</b>,
! 225: which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
! 226: recurses. <b>match_limit()</b> limits the number of matches PCRE does;
! 227: <b>match_limit_recursion()</b> limits the depth of internal recursion, and
! 228: therefore the amount of stack that is used.
! 229: </P>
! 230: <P>
! 231: Normally, to pass one or more modifiers to a RE class, you declare
! 232: a <i>RE_Options</i> object, set the appropriate options, and pass this
! 233: object to a RE constructor. Example:
! 234: <pre>
! 235: RE_Options opt;
! 236: opt.set_caseless(true);
! 237: if (RE("HELLO", opt).PartialMatch("hello world")) ...
! 238: </pre>
! 239: RE_Options has two constructors. The default constructor takes no arguments and
! 240: creates a set of flags that are all off. The second takes an
! 241: <i>option_flags</i> parameter, intended to ease porting of legacy code from C programs.
! 242: This lets you write
! 243: <pre>
! 244: RE(pattern,
! 245: RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
! 246: </pre>
! 247: However, new code is better off doing
! 248: <pre>
! 249: RE(pattern,
! 250: RE_Options().set_caseless(true).set_multiline(true))
! 251: .PartialMatch(str);
! 252: </pre>
! 253: If you are going to pass one of the most used modifiers, there are some
! 254: convenience functions that return a RE_Options class with the
! 255: appropriate modifier already set: <b>CASELESS()</b>, <b>UTF8()</b>,
! 256: <b>MULTILINE()</b>, <b>DOTALL</b>(), and <b>EXTENDED()</b>.
! 257: </P>
! 258: <P>
! 259: If you need to set several options at once and do not want to go through
! 260: the pains of declaring a RE_Options object and setting each option separately,
! 261: there is a way to do it on the fly. You can chain
! 262: several <b>set_xxxxx()</b> member functions, since each of them returns a
! 263: reference to its class object. For example, to pass PCRE_CASELESS,
! 264: PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
! 265: <pre>
! 266: RE(" ^ xyz \\s+ .* blah$",
! 267: RE_Options()
! 268: .set_caseless(true)
! 269: .set_extended(true)
! 270: .set_multiline(true)).PartialMatch(sometext);
! 271:
! 272: </PRE>
! 273: </P>
! 274: <br><a name="SEC8" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>
! 275: <P>
! 276: The "Consume" operation may be useful if you want to repeatedly
! 277: match regular expressions at the front of a string and skip over
! 278: them as they match. This requires use of the "StringPiece" type,
! 279: which represents a sub-range of a real string. Like RE, StringPiece
! 280: is defined in the pcrecpp namespace.
! 281: <pre>
! 282: Example: read lines of the form "var = value" from a string.
! 283: string contents = ...; // Fill string somehow
! 284: pcrecpp::StringPiece input(contents); // Wrap in a StringPiece
! 285:
! 286: string var;
! 287: int value;
! 288: pcrecpp::RE re("(\\w+) = (\\d+)\n");
! 289: while (re.Consume(&input, &var, &value)) {
! 290: ...;
! 291: }
! 292: </pre>
! 293: Each successful call to "Consume" will set "var/value", and also
! 294: advance "input" so it points past the matched text.
! 295: </P>
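<P>
The consume-and-advance loop can also be tried with the standard library alone.
The sketch below is a swapped-in technique, not pcrecpp: it uses
<i>std::regex_search</i> with the <i>match_continuous</i> flag so that each match
must start at the current position, then moves the position past the match. The
function name and the use of a map for the results are assumptions made for
illustration.
</P>

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// std::regex stand-in for the Consume loop: match only at the current
// position (match_continuous) and advance past each match.
std::map<std::string, int> parse_assignments_sketch(const std::string& contents) {
  static const std::regex line("(\\w+) = (\\d+)\n");
  std::map<std::string, int> values;
  std::string::const_iterator pos = contents.begin();
  std::smatch m;
  while (std::regex_search(pos, contents.end(), m, line,
                           std::regex_constants::match_continuous)) {
    values[m[1].str()] = std::stoi(m[2].str());
    pos = m[0].second;  // skip over the text just consumed
  }
  return values;
}

void demo_consume() {
  const auto values =
      parse_assignments_sketch("a = 1\nbb = 22\nstop here\nc = 3\n");
  assert(values.size() == 2);  // scanning stops at the first non-matching line
  assert(values.at("a") == 1);
  assert(values.at("bb") == 22);
}
```

<P>
Dropping the <i>match_continuous</i> flag would give behaviour closer to
"FindAndConsume", since the search could then skip ahead to the next match.
</P>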
! 296: <P>
! 297: The "FindAndConsume" operation is similar to "Consume" but does not
! 298: anchor your match at the beginning of the string. For example, you
! 299: could extract all words from a string by repeatedly calling
! 300: <pre>
! 301: pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
! 302: </PRE>
! 303: </P>
! 304: <br><a name="SEC9" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>
! 305: <P>
! 306: By default, if you pass a pointer to a numeric value, the
! 307: corresponding text is interpreted as a base-10 number. You can
! 308: instead wrap the pointer with a call to one of the operators Hex(),
! 309: Octal(), or CRadix() to interpret the text in another base. The
! 310: CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
! 311: prefixes, but defaults to base-10.
! 312: <pre>
! 313: Example:
! 314: int a, b, c, d;
! 315: pcrecpp::RE re("(.*) (.*) (.*) (.*)");
! 316: re.FullMatch("100 40 0100 0x40",
! 317: pcrecpp::Octal(&a), pcrecpp::Hex(&b),
! 318: pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
! 319: </pre>
! 320: will leave 64 in a, b, c, and d.
! 321: </P>
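<P>
The arithmetic behind the example above can be checked with the C library's
<i>strtol</i>, used here as a stand-in for the three operators. The mapping
(base 8 for Octal, base 16 for Hex, base 0 for CRadix) is an assumption made for
illustration, and the function names are hypothetical.
</P>

```cpp
#include <cassert>
#include <cstdlib>

// strtol stand-ins for the bases described above: base 8 for Octal(),
// base 16 for Hex(), and base 0 for CRadix(), where strtol itself
// inspects the "0" / "0x" prefix and otherwise assumes base 10.
long parse_octal_sketch(const char* text) {
  return std::strtol(text, nullptr, 8);
}

long parse_hex_sketch(const char* text) {
  return std::strtol(text, nullptr, 16);
}

long parse_cradix_sketch(const char* text) {
  return std::strtol(text, nullptr, 0);
}

void demo_radix() {
  assert(parse_octal_sketch("100") == 64);    // 1 * 8 * 8
  assert(parse_hex_sketch("40") == 64);       // 4 * 16
  assert(parse_cradix_sketch("0100") == 64);  // leading 0: octal
  assert(parse_cradix_sketch("0x40") == 64);  // leading 0x: hexadecimal
  assert(parse_cradix_sketch("64") == 64);    // no prefix: decimal
}
```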
! 322: <br><a name="SEC10" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>
! 323: <P>
! 324: You can replace the first match of "pattern" in "str" with "rewrite".
! 325: Within "rewrite", backslash-escaped digits (\1 to \9) can be
! 326: used to insert text matching corresponding parenthesized group
! 327: from the pattern. \0 in "rewrite" refers to the entire matching
! 328: text. For example:
! 329: <pre>
! 330: string s = "yabba dabba doo";
! 331: pcrecpp::RE("b+").Replace("d", &s);
! 332: </pre>
! 333: will leave "s" containing "yada dabba doo". The result is true if the pattern
! 334: matches and a replacement occurs, false otherwise.
! 335: </P>
! 336: <P>
! 337: <b>GlobalReplace</b> is like <b>Replace</b> except that it replaces all
! 338: occurrences of the pattern in the string with the rewrite. Replacements are
! 339: not subject to re-matching. For example:
! 340: <pre>
! 341: string s = "yabba dabba doo";
! 342: pcrecpp::RE("b+").GlobalReplace("d", &s);
! 343: </pre>
! 344: will leave "s" containing "yada dada doo". It returns the number of
! 345: replacements made.
! 346: </P>
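<P>
The two replacement examples above can be reproduced with
<i>std::regex_replace</i> as a stand-in. This is not pcrecpp: the standard
function returns a new string instead of modifying its argument, reports neither
success nor a count, and uses $1-style references in the replacement text rather
than \1. The function names are hypothetical.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// std::regex stand-in for Replace: only the first match is rewritten.
std::string replace_first_sketch(const std::string& s,
                                 const std::string& pattern,
                                 const std::string& rewrite) {
  return std::regex_replace(s, std::regex(pattern), rewrite,
                            std::regex_constants::format_first_only);
}

// std::regex stand-in for GlobalReplace: every match is rewritten.
std::string replace_all_sketch(const std::string& s,
                               const std::string& pattern,
                               const std::string& rewrite) {
  return std::regex_replace(s, std::regex(pattern), rewrite);
}

// Reproduces the "yabba dabba doo" examples from the text above.
void demo_replace() {
  assert(replace_first_sketch("yabba dabba doo", "b+", "d") == "yada dabba doo");
  assert(replace_all_sketch("yabba dabba doo", "b+", "d") == "yada dada doo");
}
```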
! 347: <P>
! 348: <b>Extract</b> is like <b>Replace</b>, except that if the pattern matches,
! 349: "rewrite" is copied into "out" (an additional argument) with substitutions.
! 350: The non-matching portions of "text" are ignored. Returns true iff a match
! 351: occurred and the extraction happened successfully; if no match occurs, the
! 352: string is left unaffected.
! 353: </P>
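<P>
How a "rewrite" string with \0 to \9 references turns into output can be
sketched as follows. The helper <i>expand_rewrite_sketch</i> is hypothetical and
is not the pcrecpp implementation. It assumes that the captured groups are
already available in a vector, that a reference to a missing group inserts
nothing, and that a doubled backslash yields one literal backslash.
</P>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not the pcrecpp implementation: expand \0..\9 in
// `rewrite` using `groups`, where groups[0] is the whole match and
// groups[n] is the n-th parenthesized group.
std::string expand_rewrite_sketch(const std::string& rewrite,
                                  const std::vector<std::string>& groups) {
  std::string out;
  for (std::size_t i = 0; i < rewrite.size(); ++i) {
    const char c = rewrite[i];
    if (c == '\\' && i + 1 < rewrite.size()) {
      const char next = rewrite[i + 1];
      if (next >= '0' && next <= '9') {
        const std::size_t n = static_cast<std::size_t>(next - '0');
        if (n < groups.size()) out += groups[n];  // missing group: nothing
        ++i;
        continue;
      }
      if (next == '\\') {  // assumption: "\\" yields one literal backslash
        out += '\\';
        ++i;
        continue;
      }
    }
    out += c;  // ordinary character, copied as-is
  }
  return out;
}

void demo_expand() {
  const std::vector<std::string> groups = {"ruby:1234", "ruby", "1234"};
  assert(expand_rewrite_sketch("\\2=\\1", groups) == "1234=ruby");
  assert(expand_rewrite_sketch("[\\0]", groups) == "[ruby:1234]");
}
```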
! 354: <br><a name="SEC11" href="#TOC1">AUTHOR</a><br>
! 355: <P>
! 356: The C++ wrapper was contributed by Google Inc.
! 357: <br>
! 358: Copyright © 2007 Google Inc.
! 359: <br>
! 360: </P>
! 361: <br><a name="SEC12" href="#TOC1">REVISION</a><br>
! 362: <P>
! 363: Last updated: 17 March 2009
! 364: <br>
! 365: Minor typo fixed: 25 July 2011
! 366: <br>