Annotation of embedaddon/sqlite3/ext/icu/README.txt, revision 1.1

This directory contains source code for the SQLite "ICU" extension, an
integration of the "International Components for Unicode" library with
SQLite. Documentation follows.

    1. Features

        1.1  SQL Scalars upper() and lower()
        1.2  Unicode Aware LIKE Operator
        1.3  ICU Collation Sequences
        1.4  SQL REGEXP Operator

    2. Compilation and Usage

    3. Bugs, Problems and Security Issues

        3.1  The "case_sensitive_like" Pragma
        3.2  The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro
        3.3  Collation Sequence Security Issue


1. FEATURES

  1.1  SQL Scalars upper() and lower()

    SQLite's built-in implementations of these two functions only
    provide case mapping for the 26 letters used in the English
    language. The ICU based functions provided by this extension
    provide case mapping, where defined, for the full range of
    Unicode characters.

    ICU provides two types of case mapping, "general" case mapping and
    "language specific". Refer to ICU documentation for the differences
    between the two. Specifically:

       http://www.icu-project.org/userguide/caseMappings.html
       http://www.icu-project.org/userguide/posix.html#case_mappings

    To utilise "general" case mapping, the upper() or lower() scalar
    functions are invoked with one argument:

        upper('abc') -> 'ABC'
        lower('ABC') -> 'abc'

    To access ICU "language specific" case mapping, upper() or lower()
    should be invoked with two arguments. The second argument is the name
    of the locale to use. Passing an empty string ("") or SQL NULL value
    as the second argument is the same as invoking the one-argument
    version of upper() or lower():

        lower('I', 'en_us') -> 'i'
        lower('I', 'tr_tr') -> 'ı' (small dotless i)

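    The one-argument and two-argument forms described above can be
    tried from Python's standard sqlite3 module. This is a minimal
    sketch, not part of the extension: the helper names scalar() and
    lower_with_locale() are made up for illustration, and the code
    assumes the ICU extension may or may not be present. Without it,
    the two-argument call is rejected and the helper returns None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(sql, params=()):
    """Run a query and return the first column of the first row."""
    return conn.execute(sql, params).fetchone()[0]

# One-argument form: available with or without the ICU extension.
ascii_upper = scalar("SELECT upper('abc')")   # 'ABC'
ascii_lower = scalar("SELECT lower('ABC')")   # 'abc'

def lower_with_locale(text, locale):
    """Try the two-argument form; return None if it is unavailable."""
    try:
        return scalar("SELECT lower(?, ?)", (text, locale))
    except sqlite3.OperationalError:
        # The built-in lower() accepts exactly one argument.
        return None

if __name__ == "__main__":
    print(ascii_upper, ascii_lower)
    print(lower_with_locale("I", "tr_tr"))
```

    Returning None instead of raising keeps the sketch usable on
    systems where the extension has not been built yet.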
  1.2  Unicode Aware LIKE Operator

    Similarly to the upper() and lower() functions, the built-in SQLite LIKE
    operator only understands case equivalence for the 26 letters of the
    English language alphabet. The implementation of LIKE included in this
    extension uses the ICU function u_foldCase() to provide case
    independent comparisons for the full range of Unicode characters.

    The U_FOLD_CASE_DEFAULT flag is passed to u_foldCase(), meaning the
    dotless 'I' character used in the Turkish language is considered
    to be in the same equivalence class as the dotted 'I' character
    used by many languages (including English).

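    The idea of replacing LIKE with a case-folding comparison can be
    illustrated in Python. This is a rough stand-in, not the
    extension's C code: it assumes Python's str.casefold() as a
    substitute for u_foldCase() (casefold() performs full case folding,
    so results can differ for characters such as the German sharp s),
    it ignores the ESCAPE clause, and the function names are
    hypothetical.

```python
import re
import sqlite3

def like_pattern_to_regex(pattern):
    """Translate SQL LIKE wildcards: % -> any run, _ -> any one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def folded_like(pattern, value):
    """Stand-in for like(pattern, value): compare case-folded strings."""
    if pattern is None or value is None:
        return None
    regex = like_pattern_to_regex(str(pattern).casefold())
    return 1 if regex.fullmatch(str(value).casefold()) else 0

conn = sqlite3.connect(":memory:")
# "X LIKE Y" is evaluated as like(Y, X), so registering a two-argument
# like() function replaces the operator on this connection.
conn.create_function("like", 2, folded_like)

def like(value, pattern):
    """Evaluate `value LIKE pattern` through SQLite."""
    return conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()[0]

if __name__ == "__main__":
    print(like("\u00c9COLE", "\u00e9cole"))
```

    Registering the function under the name like() is what makes the
    operator pick it up; no change to the SQL text is needed.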
  1.3  ICU Collation Sequences

    A special SQL scalar function, icu_load_collation(), is provided that
    may be used to register ICU collation sequences with SQLite. It
    is always called with exactly two arguments: the ICU locale
    identifying the collation sequence to ICU, and the name of the
    SQLite collation sequence to create. For example, to create an
    SQLite collation sequence named "turkish" using Turkish language
    sorting rules, use the SQL statement:

        SELECT icu_load_collation('tr_TR', 'turkish');

    Or, for Australian English:

        SELECT icu_load_collation('en_AU', 'australian');

    The identifiers "turkish" and "australian" may then be used
    as collation sequence identifiers in SQL statements:

        CREATE TABLE aust_turkish_penpals(
          australian_penpal_name TEXT COLLATE australian,
          turkish_penpal_name    TEXT COLLATE turkish
        );

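    How a registered collation name is then picked up by SQL can be
    shown without ICU. The sketch below is only an illustration: it
    registers a simple casefold-based comparator through Python's
    create_collation() in place of icu_load_collation(), and the names
    demo_casefold, casefold_compare and penpals are invented for the
    example.

```python
import sqlite3

def casefold_compare(a, b):
    """Return -1, 0 or 1, as sqlite3 expects from a collation callable."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
# With the ICU extension loaded, this registration step would instead be:
#   SELECT icu_load_collation('tr_TR', 'turkish');
conn.create_collation("demo_casefold", casefold_compare)

conn.execute("CREATE TABLE penpals(name TEXT COLLATE demo_casefold)")
conn.executemany("INSERT INTO penpals VALUES (?)",
                 [("cherry",), ("Banana",), ("apple",)])

# ORDER BY uses the collation declared on the column.
by_collation = [row[0] for row in conn.execute(
    "SELECT name FROM penpals ORDER BY name")]
# An explicit COLLATE overrides it; BINARY sorts uppercase ASCII first.
by_binary = [row[0] for row in conn.execute(
    "SELECT name FROM penpals ORDER BY name COLLATE BINARY")]

if __name__ == "__main__":
    print(by_collation)
    print(by_binary)
```

    As with ICU collations, the name must be registered on every
    connection that touches the table before the table is used.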
  1.4  SQL REGEXP Operator

    This extension provides an implementation of the SQL binary
    comparison operator "REGEXP", based on the regular expression functions
    provided by the ICU library. The syntax of the operator is as described
    in SQLite documentation:

        <string> REGEXP <re-pattern>

    This extension uses the ICU defaults for regular expression matching
    behaviour. Specifically, this means that:

        * Matching is case-sensitive,
        * Regular expression comments are not allowed within patterns, and
        * The '^' and '$' characters match the beginning and end of the
          <string> argument, not the beginning and end of lines within
          the <string> argument.

    Even more specifically, the value passed to the "flags" parameter
    of the ICU C function uregex_open() is 0.


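    The three defaults listed above can be observed with a small
    stand-in written against Python's re module, which has comparable
    defaults when no flags are given. This is an approximation under
    stated assumptions: Python's regular expression dialect is not
    ICU's, the use of re.search() (rather than requiring the whole
    string to match) is a choice made for this sketch, and the helper
    names are hypothetical.

```python
import re
import sqlite3

def regexp(pattern, value):
    """Stand-in for regexp(pattern, value) using default flags (0)."""
    if pattern is None or value is None:
        return None
    return 1 if re.search(pattern, value, flags=0) else 0

conn = sqlite3.connect(":memory:")
# "X REGEXP Y" is evaluated as regexp(Y, X).
conn.create_function("regexp", 2, regexp)

def matches(value, pattern):
    """Evaluate `value REGEXP pattern` through SQLite."""
    return conn.execute("SELECT ? REGEXP ?", (value, pattern)).fetchone()[0]

if __name__ == "__main__":
    print(matches("abc", "ABC"))         # case-sensitive
    print(matches("a#b", "a#b"))         # '#' is a literal, not a comment
    print(matches("x\nabc", "^abc$"))    # '^' anchors to the whole string
```

    Passing the operands as bound parameters avoids having to escape
    newlines or quotes inside SQL string literals.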
2. COMPILATION AND USAGE

  The easiest way to compile and use the ICU extension is to build
  and use it as a dynamically loadable SQLite extension. To do this
  using gcc on *nix:

    gcc -shared icu.c `icu-config --ldflags` -o libSqliteIcu.so

  You may need to add "-I" flags so that gcc can find sqlite3ext.h
  and sqlite3.h. The resulting shared library, libSqliteIcu.so, may be
  loaded into SQLite in the same way as any other dynamically loadable
  extension.


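  Loading the resulting library from a host program might look like
  the following Python sketch. It is an assumption-laden example: the
  path "./libSqliteIcu.so" simply reuses the output name from the gcc
  command above, load_icu_extension() is a hypothetical helper, and
  some Python builds are compiled without extension loading, in which
  case the helper reports failure. In the sqlite3 shell the
  corresponding step is the ".load" command.

```python
import sqlite3

def load_icu_extension(conn, path):
    """Try to load the shared library at `path`; return True on success."""
    if not hasattr(conn, "enable_load_extension"):
        # Some Python builds are compiled without extension loading.
        return False
    conn.enable_load_extension(True)
    try:
        conn.load_extension(path)
        return True
    except sqlite3.OperationalError:
        # Raised, for example, when the file cannot be found or opened.
        return False
    finally:
        # Switch loading back off so later SQL cannot load libraries.
        conn.enable_load_extension(False)

conn = sqlite3.connect(":memory:")
loaded = load_icu_extension(conn, "./libSqliteIcu.so")

if __name__ == "__main__":
    print("ICU extension loaded:", loaded)
```

  Extension loading is re-disabled in the finally clause because it is
  a per-connection switch that is safer left off once the library is in.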
3. BUGS, PROBLEMS AND SECURITY ISSUES

  3.1  The "case_sensitive_like" Pragma

    This extension does not work well with the "case_sensitive_like"
    pragma. If this pragma is used before the ICU extension is loaded,
    then the pragma has no effect. If the pragma is used after the ICU
    extension is loaded, then SQLite ignores the ICU implementation and
    always uses the built-in LIKE operator.

    The ICU extension LIKE operator is always case insensitive.

  3.2  The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro

    Passing very long patterns to the built-in SQLite LIKE operator can
    cause excessive CPU usage. To curb this problem, SQLite defines the
    SQLITE_MAX_LIKE_PATTERN_LENGTH macro as the maximum length of a
    pattern in bytes (irrespective of encoding). The default value is
    defined in internal header file "limits.h".

    The ICU extension LIKE implementation suffers from the same
    problem and uses the same solution. However, since the ICU extension
    code does not include the SQLite file "limits.h", modifying
    the default value therein does not affect the ICU extension.
    The default value of SQLITE_MAX_LIKE_PATTERN_LENGTH used by
    the ICU extension LIKE operator is 50000, defined in source
    file "icu.c".

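    An application that wants to reject oversized patterns before they
    reach SQLite could apply the same kind of byte-length check itself.
    The sketch below is illustrative only: the constant reuses the
    50000 figure quoted above, while check_like_pattern(), its use of
    UTF-8 for measuring length, and its error message are assumptions
    of this example rather than anything taken from icu.c.

```python
MAX_LIKE_PATTERN_LENGTH = 50000  # figure quoted above for icu.c

def check_like_pattern(pattern):
    """Return the pattern, or raise ValueError if it is too long in bytes."""
    size = len(pattern.encode("utf-8"))
    if size > MAX_LIKE_PATTERN_LENGTH:
        raise ValueError("LIKE pattern is %d bytes, limit is %d"
                         % (size, MAX_LIKE_PATTERN_LENGTH))
    return pattern

if __name__ == "__main__":
    print(check_like_pattern("abc%"))
```

    Measuring bytes rather than characters matters for non-ASCII
    patterns, which can exceed the limit with far fewer characters.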
  3.3  Collation Sequence Security Issue

    Internally, SQLite assumes that indices stored in database files
    are sorted according to the collation sequence indicated by the
    SQL schema. Changing the definition of a collation sequence after
    an index has been built is therefore equivalent to database
    corruption. The SQLite library is not very well tested under
    these conditions, and may contain potential buffer overruns
    or other programming errors that could be exploited by a malicious
    programmer.

    If the ICU extension is used in an environment where potentially
    malicious users may execute arbitrary SQL (e.g. Gears), they
    should be prevented from invoking the icu_load_collation() function,
    possibly using the authorisation callback.
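
    One way such an authorisation callback might be wired up is shown
    below, using set_authorizer() from Python's sqlite3 module. It is
    a sketch under assumptions: a do-nothing placeholder is registered
    under the name icu_load_collation so the example runs without the
    ICU extension, and the helper names are hypothetical.

```python
import sqlite3

def deny_icu_load_collation(action, arg1, arg2, db_name, source):
    """Authorizer callback: refuse any call to icu_load_collation()."""
    # For SQLITE_FUNCTION events the function name arrives in arg2.
    if (action == sqlite3.SQLITE_FUNCTION and arg2 is not None
            and arg2.lower() == "icu_load_collation"):
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK

conn = sqlite3.connect(":memory:")
# Placeholder so the sketch runs without the ICU extension being loaded.
conn.create_function("icu_load_collation", 2, lambda locale, name: None)
conn.set_authorizer(deny_icu_load_collation)

def try_sql(sql):
    """Return True if the statement is allowed, False if it is refused."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.DatabaseError:
        return False

if __name__ == "__main__":
    print(try_sql("SELECT upper('abc')"))
    print(try_sql("SELECT icu_load_collation('tr_TR', 'turkish')"))
```

    Trusted setup code can register the collations it needs before the
    authorizer is installed, so only untrusted SQL is restricted.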
