<html>
<head>
<title>pcrecallout specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrecallout man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE CALLOUTS</a>
<li><a name="TOC2" href="#SEC2">MISSING CALLOUTS</a>
<li><a name="TOC3" href="#SEC3">THE CALLOUT INTERFACE</a>
<li><a name="TOC4" href="#SEC4">RETURN VALUES</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
<li><a name="TOC6" href="#SEC6">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE CALLOUTS</a><br>
<P>
<b>int (*pcre_callout)(pcre_callout_block *);</b>
</P>
<P>
PCRE provides a feature called "callout", which is a means of temporarily
passing control to the caller of PCRE in the middle of pattern matching. The
caller of PCRE provides an external function by putting its entry point in the
global variable <i>pcre_callout</i>. By default, this variable contains NULL,
which disables all calling out.
</P>
<P>
Within a regular expression, (?C) indicates the points at which the external
function is to be called. Different callout points can be identified by putting
a number less than 256 after the letter C. The default value is zero.
For example, this pattern has two callout points:
<pre>
  (?C1)abc(?C2)def
</pre>
If the PCRE_AUTO_CALLOUT option bit is set when <b>pcre_compile()</b> or
<b>pcre_compile2()</b> is called, PCRE automatically inserts callouts, all with
number 255, before each item in the pattern. For example, if PCRE_AUTO_CALLOUT
is used with the pattern
<pre>
  A(\d{2}|--)
</pre>
it is processed as if it were
<br>
<br>
(?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
<br>
<br>
Notice that there is a callout before and after each parenthesis and
alternation bar. Automatic callouts can be used for tracking the progress of
pattern matching. The
<a href="pcretest.html"><b>pcretest</b></a>
command has an option that sets automatic callouts; when it is used, the output
indicates how the pattern is matched. This is useful information when you are
trying to optimize the performance of a particular pattern.
</P>
<P>
The use of callouts in a pattern makes it ineligible for optimization by the
just-in-time compiler. Studying such a pattern with the PCRE_STUDY_JIT_COMPILE
option always fails.
</P>
<br><a name="SEC2" href="#TOC1">MISSING CALLOUTS</a><br>
<P>
You should be aware that, because of optimizations in the way PCRE matches
patterns by default, callouts sometimes do not happen. For example, if the
pattern is
<pre>
  ab(?C4)cd
</pre>
PCRE knows that any matching string must contain the letter "d". If the subject
string is "abyz", the lack of "d" means that matching doesn't ever start, and
the callout is never reached. However, with "abyd", though the result is still
no match, the callout is obeyed.
</P>
<P>
If the pattern is studied, PCRE knows the minimum length of a matching string,
and will immediately give a "no match" return without actually running a match
if the subject is not long enough, or, for unanchored patterns, if it has
been scanned far enough.
</P>
<P>
You can disable these optimizations by passing the PCRE_NO_START_OPTIMIZE
option to <b>pcre_compile()</b>, <b>pcre_exec()</b>, or <b>pcre_dfa_exec()</b>,
or by starting the pattern with (*NO_START_OPT). This slows down the matching
process, but does ensure that callouts such as the example above are obeyed.
</P>
<br><a name="SEC3" href="#TOC1">THE CALLOUT INTERFACE</a><br>
<P>
During matching, when PCRE reaches a callout point, the external function
defined by <i>pcre_callout</i> is called (if it is set). This applies to both
the <b>pcre_exec()</b> and the <b>pcre_dfa_exec()</b> matching functions. The
only argument to the callout function is a pointer to a <b>pcre_callout_block</b>
structure, which contains the following fields:
<pre>
  int           <i>version</i>;
  int           <i>callout_number</i>;
  int          *<i>offset_vector</i>;
  const char   *<i>subject</i>;
  int           <i>subject_length</i>;
  int           <i>start_match</i>;
  int           <i>current_position</i>;
  int           <i>capture_top</i>;
  int           <i>capture_last</i>;
  void         *<i>callout_data</i>;
  int           <i>pattern_position</i>;
  int           <i>next_item_length</i>;
  const unsigned char *<i>mark</i>;
</pre>
The <i>version</i> field is an integer containing the version number of the
block format. The initial version was 0; the current version is 2. The version
number will change again in future if additional fields are added, but the
intention is never to remove any of the existing fields.
</P>
<P>
The <i>callout_number</i> field contains the number of the callout, as compiled
into the pattern (that is, the number after ?C for manual callouts, and 255 for
automatically generated callouts).
</P>
<P>
The <i>offset_vector</i> field is a pointer to the vector of offsets that was
passed by the caller to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>. When
<b>pcre_exec()</b> is used, the contents can be inspected in order to extract
substrings that have been matched so far, in the same way as for extracting
substrings after a match has completed. For <b>pcre_dfa_exec()</b> this field is
not useful.
</P>
<P>
The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
that were passed to <b>pcre_exec()</b>.
</P>
<P>
The <i>start_match</i> field normally contains the offset within the subject at
which the current match attempt started. However, if the escape sequence \K
has been encountered, this value is changed to reflect the modified starting
point. If the pattern is not anchored, the callout function may be called
several times from the same point in the pattern for different starting points
in the subject.
</P>
<P>
The <i>current_position</i> field contains the offset within the subject of the
current match pointer.
</P>
<P>
When the <b>pcre_exec()</b> function is used, the <i>capture_top</i> field
contains one more than the number of the highest numbered captured substring so
far. If no substrings have been captured, the value of <i>capture_top</i> is
one. This is always the case when <b>pcre_dfa_exec()</b> is used, because it
does not support captured substrings.
</P>
<P>
The <i>capture_last</i> field contains the number of the most recently captured
substring. If no substrings have been captured, its value is -1. This is always
the case when <b>pcre_dfa_exec()</b> is used.
</P>
<P>
The <i>callout_data</i> field contains a value that is passed to
<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> specifically so that it can be
passed back in callouts.
It is passed in the <i>callout_data</i> field of the
<b>pcre_extra</b> data structure. If no such data was passed, the value of
<i>callout_data</i> in a <b>pcre_callout_block</b> is NULL. There is a
description of the <b>pcre_extra</b> structure in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<P>
The <i>pattern_position</i> field is present from version 1 of the
<i>pcre_callout_block</i> structure. It contains the offset to the next item to be
matched in the pattern string.
</P>
<P>
The <i>next_item_length</i> field is present from version 1 of the
<i>pcre_callout_block</i> structure. It contains the length of the next item to be
matched in the pattern string. When the callout immediately precedes an
alternation bar, a closing parenthesis, or the end of the pattern, the length
is zero. When the callout precedes an opening parenthesis, the length is that
of the entire subpattern.
</P>
<P>
The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
help in distinguishing between different automatic callouts, which all have the
same callout number. However, they are set for all callouts.
</P>
<P>
The <i>mark</i> field is present from version 2 of the <i>pcre_callout_block</i>
structure. In callouts from <b>pcre_exec()</b> it contains a pointer to the
zero-terminated name of the most recently passed (*MARK), (*PRUNE), or (*THEN)
item in the match, or NULL if no such items have been passed. Instances of
(*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
callouts from <b>pcre_dfa_exec()</b> this field always contains NULL.
</P>
<br><a name="SEC4" href="#TOC1">RETURN VALUES</a><br>
<P>
The external callout function returns an integer to PCRE. If the value is zero,
matching proceeds as normal. If the value is greater than zero, matching fails
at the current point, but the testing of other matching possibilities goes
ahead, just as if a lookahead assertion had failed. If the value is less than
zero, the match is abandoned, and <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
returns the negative value.
</P>
<P>
Negative values should normally be chosen from the set of PCRE_ERROR_xxx
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
it will never be used by PCRE itself.
</P>
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
Last updated: 30 November 2011
<br>
Copyright © 1997-2011 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
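<P>
As an illustration of the callout interface and the three kinds of return value
described in this page, the following is a minimal sketch of a callout function.
It is not taken from the PCRE distribution. So that it compiles without
<b>pcre.h</b>, it uses a local stand-in structure, <i>demo_callout_block</i>, that
mirrors the fields listed for the callout block. The names <i>demo_state</i>,
<i>demo_callout</i> and DEMO_ERROR_CALLOUT are hypothetical, and the constant is
assumed to have the same value as PCRE_ERROR_CALLOUT in <b>pcre.h</b>.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in mirroring the callout block fields listed in this page.
   Real code would include <pcre.h> and use pcre_callout_block instead. */
typedef struct {
  int version;
  int callout_number;
  int *offset_vector;
  const char *subject;
  int subject_length;
  int start_match;
  int current_position;
  int capture_top;
  int capture_last;
  void *callout_data;
  int pattern_position;
  int next_item_length;
  const unsigned char *mark;
} demo_callout_block;

/* Hypothetical user data, handed back through the callout_data field. */
typedef struct {
  int calls;   /* number of callouts seen so far */
  int limit;   /* abandon the match after this many callouts */
} demo_state;

/* Assumed to equal PCRE_ERROR_CALLOUT as defined in pcre.h. */
#define DEMO_ERROR_CALLOUT (-9)

/* Sketch of a callout function showing the three return conventions. */
static int demo_callout(demo_callout_block *cb)
{
  demo_state *st;

  assert(cb != NULL);
  st = (demo_state *)cb->callout_data;

  /* Negative: abandon the whole match once the call budget is spent. */
  if (st != NULL && ++st->calls > st->limit)
    return DEMO_ERROR_CALLOUT;

  /* Positive: fail at this point, but let other alternatives be tried.
     The rule here (callout 2 before subject offset 3) is arbitrary. */
  if (cb->callout_number == 2 && cb->current_position < 3)
    return 1;

  /* Zero: matching proceeds as normal. */
  return 0;
}
```

<P>
With the real library, a function of this shape (taking a
<b>pcre_callout_block</b> pointer) would be assigned to the global variable
<i>pcre_callout</i>, and the address of the state structure would be passed in
the <i>callout_data</i> field of a <b>pcre_extra</b> block.
</P>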