Return to README.txt CVS log | Up to [ELWIX - Embedded LightWeight unIX -] / embedaddon / sqlite3 / ext / icu |
1.1 ! misho 1: ! 2: This directory contains source code for the SQLite "ICU" extension, an ! 3: integration of the "International Components for Unicode" library with ! 4: SQLite. Documentation follows. ! 5: ! 6: 1. Features ! 7: ! 8: 1.1 SQL Scalars upper() and lower() ! 9: 1.2 Unicode Aware LIKE Operator ! 10: 1.3 ICU Collation Sequences ! 11: 1.4 SQL REGEXP Operator ! 12: ! 13: 2. Compilation and Usage ! 14: ! 15: 3. Bugs, Problems and Security Issues ! 16: ! 17: 3.1 The "case_sensitive_like" Pragma ! 18: 3.2 The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro ! 19: 3.3 Collation Sequence Security Issue ! 20: ! 21: ! 22: 1. FEATURES ! 23: ! 24: 1.1 SQL Scalars upper() and lower() ! 25: ! 26: SQLite's built-in implementations of these two functions only ! 27: provide case mapping for the 26 letters used in the English ! 28: language. The ICU based functions provided by this extension ! 29: provide case mapping, where defined, for the full range of ! 30: unicode characters. ! 31: ! 32: ICU provides two types of case mapping, "general" case mapping and ! 33: "language specific". Refer to ICU documentation for the differences ! 34: between the two. Specifically: ! 35: ! 36: http://www.icu-project.org/userguide/caseMappings.html ! 37: http://www.icu-project.org/userguide/posix.html#case_mappings ! 38: ! 39: To utilise "general" case mapping, the upper() or lower() scalar ! 40: functions are invoked with one argument: ! 41: ! 42: upper('ABC') -> 'abc' ! 43: lower('abc') -> 'ABC' ! 44: ! 45: To access ICU "language specific" case mapping, upper() or lower() ! 46: should be invoked with two arguments. The second argument is the name ! 47: of the locale to use. Passing an empty string ("") or SQL NULL value ! 48: as the second argument is the same as invoking the 1 argument version ! 49: of upper() or lower(): ! 50: ! 51: lower('I', 'en_us') -> 'i' ! 52: lower('I', 'tr_tr') -> 'ı' (small dotless i) ! 53: ! 54: 1.2 Unicode Aware LIKE Operator ! 55: ! 56: Similarly to the upper() and lower() functions, the built-in SQLite LIKE ! 57: operator understands case equivalence for the 26 letters of the English ! 58: language alphabet. The implementation of LIKE included in this ! 59: extension uses the ICU function u_foldCase() to provide case ! 60: independent comparisons for the full range of unicode characters. ! 61: ! 62: The U_FOLD_CASE_DEFAULT flag is passed to u_foldCase(), meaning the ! 63: dotless 'I' character used in the Turkish language is considered ! 64: to be in the same equivalence class as the dotted 'I' character ! 65: used by many languages (including English). ! 66: ! 67: 1.3 ICU Collation Sequences ! 68: ! 69: A special SQL scalar function, icu_load_collation() is provided that ! 70: may be used to register ICU collation sequences with SQLite. It ! 71: is always called with exactly two arguments, the ICU locale ! 72: identifying the collation sequence to ICU, and the name of the ! 73: SQLite collation sequence to create. For example, to create an ! 74: SQLite collation sequence named "turkish" using Turkish language ! 75: sorting rules, the SQL statement: ! 76: ! 77: SELECT icu_load_collation('tr_TR', 'turkish'); ! 78: ! 79: Or, for Australian English: ! 80: ! 81: SELECT icu_load_collation('en_AU', 'australian'); ! 82: ! 83: The identifiers "turkish" and "australian" may then be used ! 84: as collation sequence identifiers in SQL statements: ! 85: ! 86: CREATE TABLE aust_turkish_penpals( ! 87: australian_penpal_name TEXT COLLATE australian, ! 88: turkish_penpal_name TEXT COLLATE turkish ! 89: ); ! 90: ! 91: 1.4 SQL REGEXP Operator ! 92: ! 93: This extension provides an implementation of the SQL binary ! 94: comparision operator "REGEXP", based on the regular expression functions ! 95: provided by the ICU library. The syntax of the operator is as described ! 96: in SQLite documentation: ! 97: ! 98: <string> REGEXP <re-pattern> ! 99: ! 100: This extension uses the ICU defaults for regular expression matching ! 101: behaviour. Specifically, this means that: ! 102: ! 103: * Matching is case-sensitive, ! 104: * Regular expression comments are not allowed within patterns, and ! 105: * The '^' and '$' characters match the beginning and end of the ! 106: <string> argument, not the beginning and end of lines within ! 107: the <string> argument. ! 108: ! 109: Even more specifically, the value passed to the "flags" parameter ! 110: of ICU C function uregex_open() is 0. ! 111: ! 112: ! 113: 2 COMPILATION AND USAGE ! 114: ! 115: The easiest way to compile and use the ICU extension is to build ! 116: and use it as a dynamically loadable SQLite extension. To do this ! 117: using gcc on *nix: ! 118: ! 119: gcc -shared icu.c `icu-config --ldflags` -o libSqliteIcu.so ! 120: ! 121: You may need to add "-I" flags so that gcc can find sqlite3ext.h ! 122: and sqlite3.h. The resulting shared lib, libSqliteIcu.so, may be ! 123: loaded into sqlite in the same way as any other dynamically loadable ! 124: extension. ! 125: ! 126: ! 127: 3 BUGS, PROBLEMS AND SECURITY ISSUES ! 128: ! 129: 3.1 The "case_sensitive_like" Pragma ! 130: ! 131: This extension does not work well with the "case_sensitive_like" ! 132: pragma. If this pragma is used before the ICU extension is loaded, ! 133: then the pragma has no effect. If the pragma is used after the ICU ! 134: extension is loaded, then SQLite ignores the ICU implementation and ! 135: always uses the built-in LIKE operator. ! 136: ! 137: The ICU extension LIKE operator is always case insensitive. ! 138: ! 139: 3.2 The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro ! 140: ! 141: Passing very long patterns to the built-in SQLite LIKE operator can ! 142: cause excessive CPU usage. To curb this problem, SQLite defines the ! 143: SQLITE_MAX_LIKE_PATTERN_LENGTH macro as the maximum length of a ! 144: pattern in bytes (irrespective of encoding). The default value is ! 145: defined in internal header file "limits.h". ! 146: ! 147: The ICU extension LIKE implementation suffers from the same ! 148: problem and uses the same solution. However, since the ICU extension ! 149: code does not include the SQLite file "limits.h", modifying ! 150: the default value therein does not affect the ICU extension. ! 151: The default value of SQLITE_MAX_LIKE_PATTERN_LENGTH used by ! 152: the ICU extension LIKE operator is 50000, defined in source ! 153: file "icu.c". ! 154: ! 155: 3.3 Collation Sequence Security Issue ! 156: ! 157: Internally, SQLite assumes that indices stored in database files ! 158: are sorted according to the collation sequence indicated by the ! 159: SQL schema. Changing the definition of a collation sequence after ! 160: an index has been built is therefore equivalent to database ! 161: corruption. The SQLite library is not very well tested under ! 162: these conditions, and may contain potential buffer overruns ! 163: or other programming errors that could be exploited by a malicious ! 164: programmer. ! 165: ! 166: If the ICU extension is used in an environment where potentially ! 167: malicious users may execute arbitrary SQL (i.e. gears), they ! 168: should be prevented from invoking the icu_load_collation() function, ! 169: possibly using the authorisation callback.