Annotation of embedaddon/sqlite3/ext/icu/README.txt, revision 1.1.1.1

This directory contains source code for the SQLite "ICU" extension, an
integration of the "International Components for Unicode" library with
SQLite. Documentation follows.

    1. Features

        1.1  SQL Scalars upper() and lower()
        1.2  Unicode Aware LIKE Operator
        1.3  ICU Collation Sequences
        1.4  SQL REGEXP Operator

    2. Compilation and Usage

    3. Bugs, Problems and Security Issues

        3.1  The "case_sensitive_like" Pragma
        3.2  The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro
        3.3  Collation Sequence Security Issue


1. FEATURES

  1.1  SQL Scalars upper() and lower()

    SQLite's built-in implementations of these two functions only
    provide case mapping for the 26 letters used in the English
    language. The ICU based functions provided by this extension
    provide case mapping, where defined, for the full range of
    Unicode characters.

    ICU provides two types of case mapping, "general" case mapping and
    "language specific". Refer to ICU documentation for the differences
    between the two. Specifically:

       http://www.icu-project.org/userguide/caseMappings.html
       http://www.icu-project.org/userguide/posix.html#case_mappings

    To utilise "general" case mapping, the upper() or lower() scalar
    functions are invoked with one argument:

        upper('abc') -> 'ABC'
        lower('ABC') -> 'abc'

    To access ICU "language specific" case mapping, upper() or lower()
    should be invoked with two arguments. The second argument is the name
    of the locale to use. Passing an empty string ("") or SQL NULL value
    as the second argument is the same as invoking the 1 argument version
    of upper() or lower():

        lower('I', 'en_us') -> 'i'
        lower('I', 'tr_tr') -> 'ı' (small dotless i)
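
    The difference between the built-in and ICU case mapping can be
    probed from a host language. The following is a minimal sketch using
    Python's standard sqlite3 module; the helper name case_map() is
    hypothetical and not part of this extension. It only calls the
    one-argument forms, so it runs whether or not the ICU extension is
    present, and the result for non-ASCII input depends on which
    implementation is registered.

```python
import sqlite3


def case_map(conn, func, text):
    """Apply the SQL scalar upper() or lower() to text and return the result.

    case_map() is a hypothetical helper written for this illustration.
    """
    if func not in ("upper", "lower"):
        raise ValueError("func must be 'upper' or 'lower'")
    # The function name is validated above; the text is a bound parameter.
    return conn.execute(f"SELECT {func}(?)", (text,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
# ASCII letters are mapped by both the built-in and the ICU functions.
print(case_map(conn, "upper", "abc"))
# Non-ASCII letters are mapped only when the ICU functions are registered.
print(case_map(conn, "upper", "é"))
```

    If the second call returns its input unchanged, the connection is
    most likely using the built-in implementation.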

  1.2  Unicode Aware LIKE Operator

    Similarly to the upper() and lower() functions, the built-in SQLite LIKE
    operator understands case equivalence only for the 26 letters of the
    English language alphabet. The implementation of LIKE included in this
    extension uses the ICU function u_foldCase() to provide case
    independent comparisons for the full range of Unicode characters.

    The U_FOLD_CASE_DEFAULT flag is passed to u_foldCase(), meaning the
    dotless 'I' character used in the Turkish language is considered
    to be in the same equivalence class as the dotted 'I' character
    used by many languages (including English).

  1.3  ICU Collation Sequences

    A special SQL scalar function, icu_load_collation(), is provided that
    may be used to register ICU collation sequences with SQLite. It
    is always called with exactly two arguments: the ICU locale
    identifying the collation sequence to ICU, and the name of the
    SQLite collation sequence to create. For example, to create an
    SQLite collation sequence named "turkish" using Turkish language
    sorting rules, execute the SQL statement:

        SELECT icu_load_collation('tr_TR', 'turkish');

    Or, for Australian English:

        SELECT icu_load_collation('en_AU', 'australian');

    The identifiers "turkish" and "australian" may then be used
    as collation sequence identifiers in SQL statements:

        CREATE TABLE aust_turkish_penpals(
          australian_penpal_name TEXT COLLATE australian,
          turkish_penpal_name    TEXT COLLATE turkish
        );
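
    The register-then-reference pattern above can be tried without ICU.
    The sketch below is an illustration only: it registers a collation
    through Python's sqlite3.Connection.create_collation() rather than
    icu_load_collation(), and the comparison function, the collation
    name "demo_nocase" and the table are all hypothetical stand-ins for
    real ICU sorting rules.

```python
import sqlite3


def casefold_compare(a, b):
    """Three-way comparison of case-folded strings (stand-in for ICU rules)."""
    x, y = a.casefold(), b.casefold()
    return (x > y) - (x < y)


def sorted_names(names):
    """Insert names into a column using the demo collation and return them sorted."""
    conn = sqlite3.connect(":memory:")
    # Register the collation first; only then may SQL refer to it by name.
    conn.create_collation("demo_nocase", casefold_compare)
    conn.execute("CREATE TABLE penpals(name TEXT COLLATE demo_nocase)")
    conn.executemany("INSERT INTO penpals VALUES (?)", [(n,) for n in names])
    rows = conn.execute("SELECT name FROM penpals ORDER BY name")
    return [row[0] for row in rows]


print(sorted_names(["bob", "alice", "Carol"]))
```

    As with ICU collations, the name must be registered on every
    connection that opens the database before the table is used.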

  1.4  SQL REGEXP Operator

    This extension provides an implementation of the SQL binary
    comparison operator "REGEXP", based on the regular expression functions
    provided by the ICU library. The syntax of the operator is as described
    in SQLite documentation:

        <string> REGEXP <re-pattern>

    This extension uses the ICU defaults for regular expression matching
    behaviour. Specifically, this means that:

        * Matching is case-sensitive,
        * Regular expression comments are not allowed within patterns, and
        * The '^' and '$' characters match the beginning and end of the
          <string> argument, not the beginning and end of lines within
          the <string> argument.

    Even more specifically, the value passed to the "flags" parameter
    of ICU C function uregex_open() is 0.
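
    SQLite evaluates "X REGEXP Y" by calling a user function named
    regexp() with the arguments (Y, X). The sketch below shows that
    mechanism using Python's re module as a stand-in for ICU; it is an
    assumption-laden illustration, not this extension's code, and
    Python's matching semantics are not identical to ICU's. The helper
    name regexp_matches() is hypothetical.

```python
import re
import sqlite3


def regexp(pattern, string):
    """Invoked by SQLite as regexp(Y, X) for the expression X REGEXP Y."""
    if pattern is None or string is None:
        return None  # SQL NULL in, SQL NULL out
    # No flags: case-sensitive, and '^' does not match after embedded newlines.
    return 1 if re.search(pattern, string) else 0


def regexp_matches(string, pattern):
    """Evaluate <string> REGEXP <pattern> on a fresh in-memory connection."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("regexp", 2, regexp)
    return conn.execute("SELECT ? REGEXP ?", (string, pattern)).fetchone()[0]


print(regexp_matches("abc", "^a.c$"))
print(regexp_matches("ABC", "^a.c$"))
```

    Registering a function this way on a connection replaces any other
    regexp() implementation for that connection, including the ICU one.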


2. COMPILATION AND USAGE

  The easiest way to compile and use the ICU extension is to build
  and use it as a dynamically loadable SQLite extension. To do this
  using gcc on *nix:

    gcc -shared icu.c `icu-config --ldflags` -o libSqliteIcu.so

  You may need to add "-I" flags so that gcc can find sqlite3ext.h
  and sqlite3.h. Depending on the platform, gcc may also need the
  "-fPIC" flag to produce a shared library, and on systems where
  icu-config is not installed, pkg-config can usually supply the
  equivalent ICU flags. The resulting shared lib, libSqliteIcu.so, may
  be loaded into sqlite in the same way as any other dynamically
  loadable extension.
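
  As one illustration of loading the resulting library, the sketch
  below uses Python's standard sqlite3 module. The helper name
  load_icu() and the path are hypothetical, and some Python builds are
  compiled without extension loading support, which the helper treats
  as a failure rather than an error.

```python
import sqlite3


def load_icu(conn, path):
    """Try to load the ICU extension from path; return True on success.

    load_icu() is a hypothetical helper; path is whatever the build produced.
    """
    if not hasattr(conn, "enable_load_extension"):
        return False  # this Python build cannot load extensions
    conn.enable_load_extension(True)
    try:
        conn.load_extension(path)
        return True
    except sqlite3.OperationalError:
        return False  # file missing, wrong architecture, unresolved symbols, ...
    finally:
        # Switch extension loading off again so later SQL cannot load libraries.
        conn.enable_load_extension(False)


conn = sqlite3.connect(":memory:")
print(load_icu(conn, "./libSqliteIcu.so"))
```

  Extension loading is per connection, so the call has to be repeated
  for every connection that needs the ICU functions.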


3. BUGS, PROBLEMS AND SECURITY ISSUES

  3.1  The "case_sensitive_like" Pragma

    This extension does not work well with the "case_sensitive_like"
    pragma. If this pragma is used before the ICU extension is loaded,
    then the pragma has no effect. If the pragma is used after the ICU
    extension is loaded, then SQLite ignores the ICU implementation and
    always uses the built-in LIKE operator.

    The ICU extension LIKE operator is always case insensitive.
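
    The effect of the pragma on the built-in LIKE operator can be
    observed directly. The sketch below uses Python's standard sqlite3
    module and a hypothetical helper named like(); it does not load the
    ICU extension, so it only shows the built-in behaviour that takes
    over once the pragma has been issued.

```python
import sqlite3


def like(conn, text, pattern):
    """Return True if text LIKE pattern holds on the given connection."""
    row = conn.execute("SELECT ? LIKE ?", (text, pattern)).fetchone()
    return bool(row[0])


conn = sqlite3.connect(":memory:")
# By default LIKE treats ASCII letters case-insensitively.
before = like(conn, "ABC", "abc")
# Issuing the pragma makes SQLite use its built-in, case-sensitive LIKE.
conn.execute("PRAGMA case_sensitive_like = ON")
after = like(conn, "ABC", "abc")
print(before, after)
```

    Because the pragma installs the built-in implementation, issuing
    it on a connection that relies on ICU LIKE is best avoided.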

  3.2  The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro

    Passing very long patterns to the built-in SQLite LIKE operator can
    cause excessive CPU usage. To curb this problem, SQLite defines the
    SQLITE_MAX_LIKE_PATTERN_LENGTH macro as the maximum length of a
    pattern in bytes (irrespective of encoding). The default value is
    defined in internal header file "limits.h".

    The ICU extension LIKE implementation suffers from the same
    problem and uses the same solution. However, since the ICU extension
    code does not include the SQLite file "limits.h", modifying
    the default value therein does not affect the ICU extension.
    The default value of SQLITE_MAX_LIKE_PATTERN_LENGTH used by
    the ICU extension LIKE operator is 50000, defined in source
    file "icu.c".
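
    Whether a given connection enforces a pattern length limit can be
    checked empirically. The sketch below uses Python's standard sqlite3
    module; the helper name like_pattern_accepted() is hypothetical, and
    the length at which a pattern is rejected depends on how the SQLite
    library (or the ICU extension) in use was compiled.

```python
import sqlite3


def like_pattern_accepted(conn, pattern_length):
    """Return True if LIKE accepts a pattern of pattern_length bytes."""
    pattern = "a" * pattern_length  # ASCII, so one byte per character
    try:
        conn.execute("SELECT 'a' LIKE ?", (pattern,)).fetchone()
        return True
    except sqlite3.Error:
        # An over-long pattern is reported as an SQL error by LIKE.
        return False


conn = sqlite3.connect(":memory:")
print(like_pattern_accepted(conn, 10))
print(like_pattern_accepted(conn, 50001))
```

    With a limit of 50000 the second call would be expected to return
    False; a build compiled with a different limit may behave otherwise.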

  3.3  Collation Sequence Security Issue

    Internally, SQLite assumes that indices stored in database files
    are sorted according to the collation sequence indicated by the
    SQL schema. Changing the definition of a collation sequence after
    an index has been built is therefore equivalent to database
    corruption. The SQLite library is not very well tested under
    these conditions, and may contain potential buffer overruns
    or other programming errors that could be exploited by a malicious
    programmer.

    If the ICU extension is used in an environment where potentially
    malicious users may execute arbitrary SQL (e.g. Gears), they
    should be prevented from invoking the icu_load_collation() function,
    possibly using the authorisation callback (see
    sqlite3_set_authorizer()).
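
    One way to apply the authorisation callback is sketched below using
    Python's standard sqlite3 module, whose set_authorizer() method wraps
    sqlite3_set_authorizer(). The helper name deny_functions() is
    hypothetical. The callback is consulted while a statement is being
    prepared, so a denied function call fails before any of it runs.

```python
import sqlite3


def deny_functions(conn, denied):
    """Install an authorizer that rejects calls to the named SQL functions."""
    denied = {name.lower() for name in denied}

    def authorizer(action, arg1, arg2, db_name, source):
        # For SQLITE_FUNCTION actions the function name arrives in arg2.
        if (action == sqlite3.SQLITE_FUNCTION
                and arg2 is not None
                and arg2.lower() in denied):
            return sqlite3.SQLITE_DENY
        return sqlite3.SQLITE_OK

    conn.set_authorizer(authorizer)


conn = sqlite3.connect(":memory:")
deny_functions(conn, ["icu_load_collation"])
# Statements that do not call a denied function are unaffected.
print(conn.execute("SELECT lower('A')").fetchone()[0])
```

    An authorizer applies to a single connection, so it must be
    installed on each connection that runs untrusted SQL.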
