Annotation of embedaddon/pcre/doc/pcrecpp.3, revision 1.1
1.1 ! misho 1: .TH PCRECPP 3
! 2: .SH NAME
! 3: PCRE - Perl-compatible regular expressions.
! 4: .SH "SYNOPSIS OF C++ WRAPPER"
! 5: .rs
! 6: .sp
! 7: .B #include <pcrecpp.h>
! 8: .
! 9: .SH DESCRIPTION
! 10: .rs
! 11: .sp
! 12: The C++ wrapper for PCRE was provided by Google Inc. Some additional
! 13: functionality was added by Giuseppe Maxia. This brief man page was constructed
! 14: from the notes in the \fIpcrecpp.h\fP file, which should be consulted for
! 15: further details.
! 16: .
! 17: .
! 18: .SH "MATCHING INTERFACE"
! 19: .rs
! 20: .sp
! 21: The "FullMatch" operation checks that supplied text matches a supplied pattern
! 22: exactly. If pointer arguments are supplied, it copies matched sub-strings that
! 23: match sub-patterns into them.
! 24: .sp
! 25: Example: successful match
! 26: pcrecpp::RE re("h.*o");
! 27: re.FullMatch("hello");
! 28: .sp
! 29: Example: unsuccessful match (requires full match):
! 30: pcrecpp::RE re("e");
! 31: !re.FullMatch("hello");
! 32: .sp
! 33: Example: creating a temporary RE object:
! 34: pcrecpp::RE("h.*o").FullMatch("hello");
! 35: .sp
! 36: You can pass in a "const char*" or a "string" for "text". The examples below
! 37: tend to use a const char*. As the examples above show, you can either store
! 38: the RE object in a variable or use a temporary RE object. The examples below
! 39: use one form or the other arbitrarily; either would work correctly in any of
! 40: them.
! 41: .P
! 42: You must supply extra pointer arguments to extract matched subpieces.
! 43: .sp
! 44: Example: extracts "ruby" into "s" and 1234 into "i"
! 45: int i;
! 46: string s;
! 47: pcrecpp::RE re("(\e\ew+):(\e\ed+)");
! 48: re.FullMatch("ruby:1234", &s, &i);
! 49: .sp
! 50: Example: does not try to extract any extra sub-patterns
! 51: re.FullMatch("ruby:1234", &s);
! 52: .sp
! 53: Example: does not try to extract into NULL
! 54: re.FullMatch("ruby:1234", NULL, &i);
! 55: .sp
! 56: Example: integer overflow causes failure
! 57: !re.FullMatch("ruby:1234567891234", NULL, &i);
! 58: .sp
! 59: Example: fails because there aren't enough sub-patterns:
! 60: !pcrecpp::RE("\e\ew+:\e\ed+").FullMatch("ruby:1234", &s);
! 61: .sp
! 62: Example: fails because string cannot be stored in integer
! 63: !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
! 64: .sp
! 65: The provided pointer arguments can be pointers to any scalar numeric
! 66: type, or one of:
! 67: .sp
! 68:   string        (matched piece is copied to string)
! 69:   StringPiece   (StringPiece is mutated to point to matched piece)
! 70:   T             (where "bool T::ParseFrom(const char*, int)" exists)
! 71:   NULL          (the corresponding matched sub-pattern is not copied)
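.sp
As an illustration of the "T" case in the list above, here is a minimal sketch of a user-defined type that provides the required member function. The type name Percent and its parsing rules are invented for this example, and the assumption that the captured text is passed as a pointer plus length (without relying on a terminating NUL) should be checked against \fIpcrecpp.h\fP.

```cpp
#include <cassert>
#include <string>

// Hypothetical user-defined type holding a percentage such as "42%".
// It meets the documented requirement by providing
//   bool ParseFrom(const char*, int)
struct Percent {
  int value = 0;

  // str points at the captured text and n is its length. This sketch
  // assumes the text is NOT NUL-terminated, so it never reads past n.
  bool ParseFrom(const char* str, int n) {
    if (n < 2 || str[n - 1] != '%') return false;
    int v = 0;
    for (int i = 0; i < n - 1; ++i) {
      if (str[i] < '0' || str[i] > '9') return false;
      v = v * 10 + (str[i] - '0');
      if (v > 100) return false;  // reject out-of-range input early
    }
    value = v;
    return true;
  }
};
```

With a type like this, passing a Percent* as a match argument should cause the captured text to be handed to ParseFrom, and a false return should make the match call fail.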
! 72: .sp
! 73: The function returns true iff all of the following conditions are satisfied:
! 74: .sp
! 75: a. "text" matches "pattern" exactly;
! 76: .sp
! 77: b. The number of matched sub-patterns is >= number of supplied
! 78: pointers;
! 79: .sp
! 80: c. The "i"th argument has a suitable type for holding the
! 81: string captured as the "i"th sub-pattern. If you pass in
! 82: void * NULL for the "i"th argument, or a non-void * NULL
! 83: of the correct type, or pass fewer arguments than the
! 84: number of sub-patterns, the "i"th captured sub-pattern is
! 85: ignored.
! 86: .sp
! 87: CAVEAT: An optional sub-pattern that does not exist in the matched
! 88: string is assigned the empty string. Therefore, the following will
! 89: return false (because the empty string is not a valid number):
! 90: .sp
! 91: int number;
! 92: pcrecpp::RE("[a-z]+(\e\ed+)?").FullMatch("abc", &number);
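.sp
The failures described above (integer overflow, non-numeric text, and the empty string from an unmatched optional group) all come down to a strict text-to-integer conversion. The following is a minimal sketch of such a conversion using only the standard library. ParseIntStrict is a hypothetical helper written for this page, not the wrapper's own code.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Illustrative strict conversion (not pcrecpp's own code): succeeds only if
// the text is a base-10 integer with no surrounding junk that fits in an int.
bool ParseIntStrict(const std::string& text, int* out) {
  if (text.empty()) return false;  // "" is not a valid number
  if (std::isspace(static_cast<unsigned char>(text[0]))) return false;
  errno = 0;
  char* end = nullptr;
  long v = std::strtol(text.c_str(), &end, 10);
  if (end == text.c_str() || *end != '\0') return false;  // e.g. "ruby"
  if (errno == ERANGE) return false;                      // overflows long
  if (v < INT_MIN || v > INT_MAX) return false;           // overflows int
  *out = static_cast<int>(v);
  return true;
}
```

On failure the destination is left untouched, which is why a caller that wants a default for a missing optional group may prefer to capture into a string first and convert only when it is non-empty.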
! 93: .sp
! 94: The matching interface supports at most 16 arguments per call.
! 95: If you need more, consider using the more general interface
! 96: \fBpcrecpp::RE::DoMatch\fP. See \fBpcrecpp.h\fP for the signature for
! 97: \fBDoMatch\fP.
! 98: .P
! 99: NOTE: Do not use \fBno_arg\fP, which is used internally to mark the end of a
! 100: list of optional arguments, as a placeholder for missing arguments, as this can
! 101: lead to segfaults.
! 102: .
! 103: .
! 104: .SH "QUOTING METACHARACTERS"
! 105: .rs
! 106: .sp
! 107: You can use the "QuoteMeta" operation to insert backslashes before all
! 108: potentially meaningful characters in a string. The returned string, used as a
! 109: regular expression, will exactly match the original string.
! 110: .sp
! 111: Example:
! 112: string quoted = RE::QuoteMeta(unquoted);
! 113: .sp
! 114: Note that it's legal to escape a character even if it has no special meaning in
! 115: a regular expression -- so this function does that. (This also makes it
! 116: identical to the perl function of the same name; see "perldoc -f quotemeta".)
! 117: For example, "1.5-2.0?" becomes "1\e.5\e-2\e.0\e?".
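.sp
The quoting rule can be sketched in a few lines. This is an approximation for ASCII input only: QuoteMetaSketch is a hypothetical name, and the real function may treat NUL bytes and non-ASCII (UTF-8) bytes specially, which this sketch does not attempt.

```cpp
#include <cassert>
#include <string>

// Sketch of the quoting rule described above (an approximation, not the
// library's code): put a backslash before every byte that is not an ASCII
// letter, an ASCII digit, or an underscore.
std::string QuoteMetaSketch(const std::string& unquoted) {
  std::string quoted;
  quoted.reserve(unquoted.size() * 2);
  for (char c : unquoted) {
    const bool plain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                       (c >= '0' && c <= '9') || c == '_';
    if (!plain) quoted += '\\';
    quoted += c;
  }
  return quoted;
}
```

Applied to the string 1.5-2.0? this produces the same escaped form as the example above, and a string made only of letters, digits, and underscores comes back unchanged.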
! 118: .
! 119: .SH "PARTIAL MATCHES"
! 120: .rs
! 121: .sp
! 122: You can use the "PartialMatch" operation when you want the pattern
! 123: to match any substring of the text.
! 124: .sp
! 125: Example: simple search for a string:
! 126: pcrecpp::RE("ell").PartialMatch("hello");
! 127: .sp
! 128: Example: find first number in a string:
! 129: int number;
! 130: pcrecpp::RE re("(\e\ed+)");
! 131: re.PartialMatch("x*100 + 20", &number);
! 132: assert(number == 100);
! 133: .
! 134: .
! 135: .SH "UTF-8 AND THE MATCHING INTERFACE"
! 136: .rs
! 137: .sp
! 138: By default, pattern and text are plain text, one byte per character. The UTF8
! 139: flag, passed to the constructor, causes both pattern and string to be treated
! 140: as UTF-8 text, still a byte stream but potentially multiple bytes per
! 141: character. In practice, the text is likelier to be UTF-8 than the pattern, but
! 142: the match returned may depend on the UTF8 flag, so always use it when matching
! 143: UTF-8 text. For example, "." will match one byte normally but with UTF8 set may
! 144: match a whole multi-byte character (up to four bytes in standard UTF-8).
! 145: .sp
! 146: Example:
! 147: pcrecpp::RE_Options options;
! 148: options.set_utf8();
! 149: pcrecpp::RE re(utf8_pattern, options);
! 150: re.FullMatch(utf8_string);
! 151: .sp
! 152: Example: using the convenience function UTF8():
! 153: pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
! 154: re.FullMatch(utf8_string);
! 155: .sp
! 156: NOTE: The UTF8 flag is ignored if pcre was not configured with the
! 157: --enable-utf8 flag.
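.sp
To see why byte-oriented and UTF-8 matching can give different results, it helps to look at how many bytes one character occupies. The sketch below derives the sequence length from the lead byte, following the standard UTF-8 bit patterns. Both function names are hypothetical, and the code is independent of PCRE.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence that starts with lead byte b,
// or 0 if b cannot start a sequence (continuation or invalid byte).
std::size_t Utf8SequenceLength(unsigned char b) {
  if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;
}

// Counts characters in a UTF-8 string, assuming well-formed input.
// Stray bytes are counted as one character each.
std::size_t CountUtf8Chars(const std::string& s) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < s.size();) {
    std::size_t len = Utf8SequenceLength(static_cast<unsigned char>(s[i]));
    i += (len == 0 ? 1 : len);
    ++count;
  }
  return count;
}
```

A pattern such as "....." therefore needs five bytes of text without the UTF8 flag, but five characters (possibly more bytes) with it.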
! 158: .
! 159: .
! 160: .SH "PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE"
! 161: .rs
! 162: .sp
! 163: PCRE defines some modifiers to change the behavior of the regular expression
! 164: engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
! 165: pass such modifiers to a RE class. Currently, the following modifiers are
! 166: supported:
! 167: .sp
! 168:   modifier              description                Perl equivalent
! 169: .sp
! 170:   PCRE_CASELESS         case-insensitive match     /i
! 171:   PCRE_MULTILINE        multiple lines match       /m
! 172:   PCRE_DOTALL           dot matches newlines       /s
! 173:   PCRE_DOLLAR_ENDONLY   $ matches only at end      N/A
! 174:   PCRE_EXTRA            strict escape parsing      N/A
! 175:   PCRE_EXTENDED         ignore whitespace          /x
! 176:   PCRE_UTF8             handles UTF8 chars         built-in
! 177:   PCRE_UNGREEDY         reverses * and *?          N/A
! 178:   PCRE_NO_AUTO_CAPTURE  disables capturing parens  N/A (*)
! 179: .sp
! 180: (*) Both Perl and PCRE allow non-capturing parentheses by means of the
! 181: "?:" modifier within the pattern itself. For example, (?:ab|cd) does not
! 182: capture, while (ab|cd) does.
! 183: .P
! 184: For a full account of how each modifier works, please check the
! 185: PCRE API reference page.
! 186: .P
! 187: For each modifier, there are two member functions whose names are derived
! 188: from the modifier name in lowercase, without the "PCRE_" prefix. For
! 189: instance, PCRE_CASELESS is handled by
! 190: .sp
! 191: bool caseless()
! 192: .sp
! 193: which returns true if the modifier is set, and
! 194: .sp
! 195: RE_Options & set_caseless(bool)
! 196: .sp
! 197: which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
! 198: accessed through the \fBset_match_limit()\fP and \fBmatch_limit()\fP member
! 199: functions. Setting \fImatch_limit\fP to a non-zero value will limit the
! 200: execution of pcre to keep it from doing bad things like blowing the stack or
! 201: taking an eternity to return a result. A value of 5000 is good enough to stop
! 202: stack blowup in a 2MB thread stack. Setting \fImatch_limit\fP to zero disables
! 203: match limiting. Alternatively, you can call \fBset_match_limit_recursion()\fP,
! 204: which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
! 205: recurses. \fBmatch_limit()\fP limits the number of times PCRE calls its
! 206: internal matching function; \fBmatch_limit_recursion()\fP limits the depth of
! 207: internal recursion, and therefore the amount of stack that is used.
! 208: .P
! 209: Normally, to pass one or more modifiers to a RE class, you declare
! 210: a \fIRE_Options\fP object, set the appropriate options, and pass this
! 211: object to a RE constructor. Example:
! 212: .sp
! 213: RE_Options opt;
! 214: opt.set_caseless(true);
! 215: if (RE("HELLO", opt).PartialMatch("hello world")) ...
! 216: .sp
! 217: RE_Options has two constructors. The default constructor takes no arguments and
! 218: creates a set of flags that are all off. The second takes an
! 219: \fIoption_flags\fP parameter, which is intended to ease the transfer of legacy
! 220: code from C programs. This lets you do
! 221: .sp
! 222: RE(pattern,
! 223: RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
! 224: .sp
! 225: However, new code is better off doing
! 226: .sp
! 227: RE(pattern,
! 228: RE_Options().set_caseless(true).set_multiline(true))
! 229: .PartialMatch(str);
! 230: .sp
! 231: If you are going to pass one of the most used modifiers, there are some
! 232: convenience functions that return a RE_Options object with the
! 233: appropriate modifier already set: \fBCASELESS()\fP, \fBUTF8()\fP,
! 234: \fBMULTILINE()\fP, \fBDOTALL()\fP, and \fBEXTENDED()\fP.
! 235: .P
! 236: If you need to set several options at once and do not want to go to the
! 237: trouble of declaring a RE_Options object and setting each option on it, there
! 238: is an alternative that lets you do so on the fly. You can chain
! 239: several \fBset_xxxxx()\fP member functions, since each of them returns a
! 240: reference to its class object. For example, to pass PCRE_CASELESS,
! 241: PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
! 242: .sp
! 243: RE(" ^ xyz \e\es+ .* blah$",
! 244: RE_Options()
! 245: .set_caseless(true)
! 246: .set_extended(true)
! 247: .set_multiline(true)).PartialMatch(sometext);
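.sp
The chaining shown above works because each setter returns a reference to the object it was called on. The following stand-alone sketch shows that pattern with a hypothetical class. OptionsSketch and its flag values are invented for illustration and are not the real RE_Options or the real PCRE_* constants.

```cpp
#include <cassert>

// Hypothetical stand-in for an options class. The flag values are invented
// for this sketch and are NOT the real PCRE_* constants.
class OptionsSketch {
 public:
  enum Flag { kCaseless = 1, kMultiline = 2, kExtended = 4 };

  // Getters report whether a flag is currently set.
  bool caseless() const { return (flags_ & kCaseless) != 0; }
  bool multiline() const { return (flags_ & kMultiline) != 0; }
  bool extended() const { return (flags_ & kExtended) != 0; }

  // Setters return *this so that calls can be chained in one expression.
  OptionsSketch& set_caseless(bool on) { return Set(kCaseless, on); }
  OptionsSketch& set_multiline(bool on) { return Set(kMultiline, on); }
  OptionsSketch& set_extended(bool on) { return Set(kExtended, on); }

 private:
  OptionsSketch& Set(Flag f, bool on) {
    if (on) {
      flags_ |= f;
    } else {
      flags_ &= ~f;
    }
    return *this;
  }

  int flags_ = 0;  // all flags off by default
};
```

Because the temporary lives until the end of the full expression, a chained temporary can be passed directly as a constructor argument, as in the example above.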
! 248: .sp
! 249: .
! 250: .
! 251: .SH "SCANNING TEXT INCREMENTALLY"
! 252: .rs
! 253: .sp
! 254: The "Consume" operation may be useful if you want to repeatedly
! 255: match regular expressions at the front of a string and skip over
! 256: them as they match. This requires use of the "StringPiece" type,
! 257: which represents a sub-range of a real string. Like RE, StringPiece
! 258: is defined in the pcrecpp namespace.
! 259: .sp
! 260: Example: read lines of the form "var = value" from a string.
! 261: string contents = ...; // Fill string somehow
! 262: pcrecpp::StringPiece input(contents); // Wrap in a StringPiece
! 263: .sp
! 264: string var;
! 265: int value;
! 266: pcrecpp::RE re("(\e\ew+) = (\e\ed+)\en");
! 267: while (re.Consume(&input, &var, &value)) {
! 268: ...;
! 269: }
! 270: .sp
! 271: Each successful call to "Consume" will set "var" and "value", and also
! 272: advance "input" so it points past the matched text.
! 273: .P
! 274: The "FindAndConsume" operation is similar to "Consume" but does not
! 275: anchor your match at the beginning of the string. For example, you
! 276: could extract all words from a string by repeatedly calling
! 277: .sp
! 278: pcrecpp::RE("(\e\ew+)").FindAndConsume(&input, &word)
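.sp
The difference between the two operations is only whether the match must start at the current position. The sketch below reproduces that behaviour with the standard <regex> library rather than with PCRE, so pattern syntax and performance differ. Cursor, ConsumeSketch, and FindAndConsumeSketch are hypothetical names, with Cursor standing in for the role StringPiece plays above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Cursor over a string that must outlive it.
struct Cursor {
  std::string::const_iterator pos, end;
};

// Like "Consume": the match must start exactly at the cursor.
bool ConsumeSketch(Cursor* in, const std::regex& re, std::smatch* m) {
  if (!std::regex_search(in->pos, in->end, *m, re,
                         std::regex_constants::match_continuous)) {
    return false;
  }
  in->pos = (*m)[0].second;  // advance past the matched text
  return true;
}

// Like "FindAndConsume": the match may start anywhere at or after the cursor.
bool FindAndConsumeSketch(Cursor* in, const std::regex& re, std::smatch* m) {
  if (!std::regex_search(in->pos, in->end, *m, re)) return false;
  in->pos = (*m)[0].second;  // skip everything up to the end of the match
  return true;
}
```

In both cases a loop ends at the first failed call, leaving the cursor at the unconsumed remainder. A pattern that can match the empty string would never advance the cursor, so such patterns need extra care in a loop.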
! 279: .
! 280: .
! 281: .SH "PARSING HEX/OCTAL/C-RADIX NUMBERS"
! 282: .rs
! 283: .sp
! 284: By default, if you pass a pointer to a numeric value, the
! 285: corresponding text is interpreted as a base-10 number. You can
! 286: instead wrap the pointer with a call to one of the operators Hex(),
! 287: Octal(), or CRadix() to interpret the text in another base. The
! 288: CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
! 289: prefixes, but defaults to base-10.
! 290: .sp
! 291: Example:
! 292: int a, b, c, d;
! 293: pcrecpp::RE re("(.*) (.*) (.*) (.*)");
! 294: re.FullMatch("100 40 0100 0x40",
! 295: pcrecpp::Octal(&a), pcrecpp::Hex(&b),
! 296: pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
! 297: .sp
! 298: will leave 64 in a, b, c, and d.
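.sp
The standard strtol function follows the same prefix conventions when it is called with base 0, so the arithmetic in the example above can be checked without PCRE. ParseLongRadix is a hypothetical helper written for this sketch, and nothing here is meant to describe how the wrapper itself converts numbers.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>

// Illustrative helper (hypothetical, not part of pcrecpp): parse text as a
// long in the given base. Base 0 asks strtol to honour C-style prefixes:
// "0x" selects base 16, a leading "0" selects base 8, anything else base 10.
bool ParseLongRadix(const std::string& text, int base, long* out) {
  if (text.empty()) return false;
  errno = 0;
  char* end = nullptr;
  long v = std::strtol(text.c_str(), &end, base);
  if (end == text.c_str() || *end != '\0' || errno == ERANGE) return false;
  *out = v;
  return true;
}
```

Here base 8 corresponds to Octal(), base 16 to Hex(), and base 0 to CRadix().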
! 299: .
! 300: .
! 301: .SH "REPLACING PARTS OF STRINGS"
! 302: .rs
! 303: .sp
! 304: You can replace the first match of "pattern" in "str" with "rewrite".
! 305: Within "rewrite", backslash-escaped digits (\e1 to \e9) can be
! 306: used to insert the text matching the corresponding parenthesized group
! 307: from the pattern. \e0 in "rewrite" refers to the entire matching
! 308: text. For example:
! 309: .sp
! 310: string s = "yabba dabba doo";
! 311: pcrecpp::RE("b+").Replace("d", &s);
! 312: .sp
! 313: will leave "s" containing "yada dabba doo". The result is true if the pattern
! 314: matches and a replacement occurs, false otherwise.
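.sp
The substitution of backslash-escaped digits in "rewrite" can be sketched independently of the matcher. In the code below, ExpandRewrite is a hypothetical function that takes the already-captured groups as input. Its handling of a doubled backslash and of a trailing lone backslash are choices made for this sketch, not documented behaviour of the wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of rewrite expansion (not the library's code). groups[0] is the
// whole match and groups[1] onwards are the parenthesized groups.
// A backslash followed by a digit inserts that group; a backslash followed
// by any other character inserts that character literally.
// Returns false if the rewrite refers to a group that does not exist.
bool ExpandRewrite(const std::string& rewrite,
                   const std::vector<std::string>& groups,
                   std::string* out) {
  out->clear();
  for (std::size_t i = 0; i < rewrite.size(); ++i) {
    const char c = rewrite[i];
    if (c != '\\' || i + 1 == rewrite.size()) {
      *out += c;  // ordinary character, or a trailing lone backslash
      continue;
    }
    const char next = rewrite[++i];
    if (next >= '0' && next <= '9') {
      const std::size_t n = static_cast<std::size_t>(next - '0');
      if (n >= groups.size()) return false;
      *out += groups[n];
    } else {
      *out += next;  // a doubled backslash yields a single backslash
    }
  }
  return true;
}
```

Note that in C++ source each backslash in a rewrite string must itself be escaped, so a reference to group 1 is written as "\\1" in a string literal.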
! 315: .P
! 316: \fBGlobalReplace\fP is like \fBReplace\fP except that it replaces all
! 317: occurrences of the pattern in the string with the rewrite. Replacements are
! 318: not subject to re-matching. For example:
! 319: .sp
! 320: string s = "yabba dabba doo";
! 321: pcrecpp::RE("b+").GlobalReplace("d", &s);
! 322: .sp
! 323: will leave "s" containing "yada dada doo". It returns the number of
! 324: replacements made.
! 325: .P
! 326: \fBExtract\fP is like \fBReplace\fP, except that if the pattern matches,
! 327: "rewrite" is copied into "out" (an additional argument) with substitutions.
! 328: The non-matching portions of "text" are ignored. Returns true iff a match
! 329: occurred and the extraction happened successfully; if no match occurs, the
! 330: string is left unaffected.
! 331: .
! 332: .
! 333: .SH AUTHOR
! 334: .rs
! 335: .sp
! 336: .nf
! 337: The C++ wrapper was contributed by Google Inc.
! 338: Copyright (c) 2007 Google Inc.
! 339: .fi
! 340: .
! 341: .
! 342: .SH REVISION
! 343: .rs
! 344: .sp
! 345: .nf
! 346: Last updated: 17 March 2009
! 347: Minor typo fixed: 25 July 2011
! 348: .fi