1. Collator::getAvailableLocales().
Returns the locales available at the time of the call, including registered locales.
If a severe error occurs (such as an out-of-memory condition), this returns null.
If there is no locale data, an empty enumeration is returned.
The returned locales are strings based on the RFC 4646 standard (see http://www.rfc-editor.org/rfc/rfc4646.txt),
written ICU-style with '_' as the separator.
Example locale IDs: 'en_US', 'ru_UA', 'uk_UA' (see http://demo.icu-project.org/icu-bin/locexp).
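
The relationship between ICU-style IDs such as 'en_US' and hyphenated RFC 4646 tags can be sketched in Python (standard library only). This is a simplified illustration, not the PHP or ICU API: the names parse_locale_id, to_rfc4646 and LocaleParts are hypothetical, keywords such as "@collation=..." are not handled, and subtags are classified only by their shape (4 letters for a script, 2 letters or 3 digits for a region).

```python
import re
from typing import NamedTuple, Optional, Tuple


class LocaleParts(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    variants: Tuple[str, ...]


def parse_locale_id(locale_id: str) -> LocaleParts:
    """Hypothetical helper: split 'sr_Latn_RS' or 'sr-Latn-RS' into subtags."""
    subtags = re.split(r"[-_]", locale_id)
    language, rest = subtags[0].lower(), subtags[1:]
    script = region = None
    # A 4-letter subtag right after the language is treated as a script code.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        script = rest.pop(0).title()
    # A 2-letter or 3-digit subtag is treated as a region code.
    if rest and ((len(rest[0]) == 2 and rest[0].isalpha())
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        region = rest.pop(0).upper()
    # Whatever is left over is kept as variants.
    return LocaleParts(language, script, region, tuple(rest))


def to_rfc4646(locale_id: str) -> str:
    """Hypothetical helper: rejoin the subtags with '-' as RFC 4646 does."""
    p = parse_locale_id(locale_id)
    return "-".join(s for s in (p.language, p.script, p.region, *p.variants) if s)


print(parse_locale_id("en_US"))  # LocaleParts(language='en', script=None, region='US', variants=())
print(to_rfc4646("sr_Latn_RS"))  # sr-Latn-RS
```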


2. Collator::getDisplayName( $obj_locale, $disp_locale ).
Returns the display name of the locale $obj_locale in the language of $disp_locale. Both arguments
must be values returned by the getAvailableLocales method.

     @param  string  $obj_locale   Locale to get the display name for.
     @param  string  $disp_locale  Locale whose language is used for the output.

Both parameters are case-insensitive.
For the locale format, see the RFC 4646 standard (http://www.rfc-editor.org/rfc/rfc4646.txt); for
locale matching, see RFC 4647 (ftp://ftp.rfc-editor.org/in-notes/rfc4647.txt).

3. Collator::getLocaleByType( $type ).
Lets the caller choose whether to get the requested, valid, or actual locale.
The returned locale tag is a string in RFC 4646 format, normalized to canonical form;
value is a string from
For example, suppose a collator for "en_US_CALIFORNIA" was requested. In the current state of ICU (2.0),
the requested locale is "en_US_CALIFORNIA", the valid locale is "en_US" (the most specific locale
supported by ICU), and the actual locale is "root" (the collation data comes unmodified from the UCA).
A locale is considered supported by ICU if there is a core ICU bundle for that locale (although
it may be empty).
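
The requested/valid/actual distinction can be modeled as truncation fallback: drop the last '_' subtag until a locale with data is found, and fall back to "root" otherwise. The Python sketch below is a simplification (real ICU fallback also involves aliases and parent-locale data). The function name fallback and both locale sets are hypothetical stand-ins for the bundles an ICU build actually ships.

```python
from typing import AbstractSet


def fallback(requested: str, available: AbstractSet[str]) -> str:
    """Return the most specific prefix of `requested` found in `available`, else 'root'."""
    parts = requested.split("_")
    while parts:
        candidate = "_".join(parts)
        if candidate in available:
            return candidate
        parts.pop()  # en_US_CALIFORNIA -> en_US -> en
    return "root"


# Hypothetical data: locales with a core bundle vs. locales with a collation tailoring.
CORE_BUNDLES = {"en", "en_US", "de", "sv"}
COLLATION_TAILORINGS = {"sv", "da"}

requested = "en_US_CALIFORNIA"
valid = fallback(requested, CORE_BUNDLES)           # most specific supported locale
actual = fallback(requested, COLLATION_TAILORINGS)  # no tailoring found: data comes from the UCA
print(requested, valid, actual)  # en_US_CALIFORNIA en_US root
```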


4. VariableTop
The Variable_Top attribute is only meaningful if the Alternate attribute is not set to NonIgnorable.
In that case, it controls which characters count as ignorable. The string value specifies
the character with the "highest" weight (in UCA order) that is to be considered ignorable.
For example, if a user wanted whitespace to be ignorable, but not any visible characters,
they would use the value Variable_Top="\u0020" (space). The string should only be a
single character. All characters of the same primary weight are equivalent, so
Variable_Top="\u3000" (ideographic space) has the same effect as Variable_Top="\u0020".
This setting alone has little impact on string comparison performance; setting it lower or higher
makes sort keys slightly shorter or longer, respectively.
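
A coarse Python model of Variable_Top follows. It is an assumption-laden sketch, not ICU's implementation. Characters are ranked by Unicode general category (separators, then punctuation, then symbols), and every character ranked at or below the variable-top character is treated as ignorable. This captures why "\u3000" and "\u0020" behave alike, because both fall in the same group. Real UCA primary weights are much finer-grained. The names GROUP_RANK, variable_rank and is_ignorable are hypothetical.

```python
import unicodedata
from typing import Optional

# Assumed ordering of the variable groups, lowest first.
GROUP_RANK = {"Z": 0, "P": 1, "S": 2}  # separators, punctuation, symbols


def variable_rank(ch: str) -> Optional[int]:
    """Rank of ch among variable characters, or None if ch is not variable."""
    return GROUP_RANK.get(unicodedata.category(ch)[0])


def is_ignorable(ch: str, variable_top: str) -> bool:
    """True if ch would be ignorable with the given Variable_Top character."""
    top = variable_rank(variable_top)
    if top is None:
        raise ValueError("variable_top must be whitespace, punctuation or a symbol")
    rank = variable_rank(ch)
    return rank is not None and rank <= top


print([c for c in "a b-c" if is_ignorable(c, " ")])  # [' ']
```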


5. Strength
The ICU Collation Service supports many levels of comparison (named "Levels", but also
known as "Strengths"). Having these categories enables ICU to sort strings precisely
according to local conventions. Moreover, because the levels can be selectively
employed, a search for a string in text can be performed with various matching
conditions.
ICU collation is optimized for performance with the default level settings; the performance
impact of the higher levels is noted in their descriptions below.
The levels, with an example of each, are:

1. Primary Level: Typically, this is used to denote differences between base characters
(for example, "a" < "b"). It is the strongest difference. For example, dictionaries are
divided into different sections by base character. This is also called the level 1
strength.

2. Secondary Level: Accents in the characters are considered secondary differences (for
example, "as" < "às" < "at"). Other differences between letters can also be considered
secondary differences, depending on the language. A secondary difference is ignored
when there is a primary difference anywhere in the strings. This is also called the
level 2 strength.
Note: In some languages (such as Danish), certain accented letters are considered to
be separate base characters. In most languages, however, an accented letter only has a
secondary difference from the unaccented version of that letter.

3. Tertiary Level: Upper and lower case differences in characters are distinguished at the
tertiary level (for example, "ao" < "Ao" < "aò"). In addition, a variant of a letter differs
from the base form on the tertiary level (such as "A" and "Ⓐ"). Another example is the
difference between large and small Kana. A tertiary difference is ignored when there is
a primary or secondary difference anywhere in the strings. This is also called the level 3
strength.

4. Quaternary Level: When punctuation is ignored at levels 1 to 3 (see ALTERNATE_HANDLING below),
an additional level can be used to distinguish words with and without punctuation
(for example, "a-b" < "ab" < "aB"). This difference is ignored when there is a primary,
secondary or tertiary difference. This is also known as the level 4 strength. The
quaternary level should only be used if ignoring punctuation is required or when
processing Japanese text (see HIRAGANA_QUATERNARY_MODE below).

5. Identical Level: When all other levels are equal, the identical level is used as a
tiebreaker. The Unicode code point values of the NFD form of each string are
compared at this level, in case there is no difference at levels 1 to 4.
For example, Hebrew cantillation marks are only distinguished at this level. This level should be
used sparingly, because strings that differ only in code point values are extremely rare.
Using this level substantially decreases the performance of
both incremental comparison and sort key generation (and increases the sort
key length). It is also known as the level 5 strength.

For example, people may choose to ignore accents, or to ignore both accents and case, when searching
for text. Almost all characters are distinguished by the first three levels, and in most
locales the default value is thus Tertiary. However, if Alternate is set to Shifted,
then the Quaternary strength can be used to break ties among whitespace, punctuation, and
symbols that would otherwise be ignored. If very fine distinctions among characters are required,
then the Identical strength can be used (for example, Identical strength distinguishes
between the Mathematical Bold Small A and the Mathematical Italic Small A). However, using
levels higher than Tertiary (including Identical) results in significantly longer sort
keys and slower string comparison for equal strings.
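
The first three levels can be illustrated with a toy sort key in Python. Each string maps to a tuple of per-level keys, and the strength decides how many levels take part in the comparison. This is a sketch under simplifying assumptions, not the UCA. Base letters are compared by code point after lowercasing, and accents are the combining marks left by NFD decomposition. Case is a single upper/lower bit, and the quaternary and identical levels are not modeled. The names sort_key and compare are hypothetical.

```python
import unicodedata


def sort_key(s: str, strength: int = 3) -> tuple:
    """Toy key: level 1 = base letters, level 2 = accents, level 3 = case."""
    primary, secondary, tertiary = [], [], []
    for ch in s:
        decomposed = unicodedata.normalize("NFD", ch)  # 'à' -> 'a' + U+0300
        base = decomposed[0]
        primary.append(base.lower())
        secondary.append(tuple(ord(m) for m in decomposed[1:]))
        tertiary.append(1 if base.isupper() else 0)
    levels = (tuple(primary), tuple(secondary), tuple(tertiary))
    return levels[:strength]  # 1 = primary only, 2 = + accents, 3 = + case


def compare(a: str, b: str, strength: int = 3) -> int:
    """Return -1, 0 or 1 depending on how a sorts relative to b."""
    ka, kb = sort_key(a, strength), sort_key(b, strength)
    return (ka > kb) - (ka < kb)


print(sorted(["at", "às", "as"], key=sort_key))  # ['as', 'às', 'at']
print(sorted(["aò", "Ao", "ao"], key=sort_key))  # ['ao', 'Ao', 'aò']
print(compare("role", "Rôle", strength=1))       # 0
```

Lowering the strength simply truncates the key, which is why a primary-strength comparison ignores both accents and case.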


6. Collator::__construct( $locale ).
The Locale attribute is typically the most important attribute for correct sorting and matching
according to user expectations in different countries and regions. The default UCA
ordering sorts only a few languages, such as Dutch and Portuguese, correctly ("correctly"
meaning according to the normal expectations of users of those languages).
For other languages, you need to supply a locale so that the collator is correctly
tailored for that language. The choice of a locale automatically presets the values of
all the attributes to something reasonable for that locale, so most of the time the
other attributes do not need to be set explicitly. In some cases, the choice of locale also affects
string comparison performance and/or sort key length.
A locale ID has the form <language>_<script>_<region>_<keyword>; not all the elements are required.
Valid values for the locale elements follow RFC 4646 locale naming and the RFC 4647 lookup
algorithm.
Example:
Locale="sv" (Swedish) "Kypper" < "Köpfe"
Locale="de" (German) "Köpfe" < "Kypper"
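
How a locale changes the order can be sketched with two hand-written key functions in Python. Both are toy assumptions that cover only the letters in this example, not the real CLDR/ICU tailorings. The German key treats "ö" as "o" at the primary level, keeping the accent only as a tiebreak. The Swedish key uses the Swedish alphabet, where "å", "ä" and "ö" are separate letters after "z". The names german_key, swedish_key and SWEDISH_ALPHABET are hypothetical.

```python
import unicodedata

SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def german_key(word: str) -> tuple:
    """Toy German key: accents stripped for the primary level, kept as a tiebreak."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    primary = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (primary, decomposed)


def swedish_key(word: str) -> list:
    """Toy Swedish key: position of each letter in the Swedish alphabet.

    Raises ValueError for characters outside SWEDISH_ALPHABET.
    """
    return [SWEDISH_ALPHABET.index(c) for c in word.lower()]


words = ["Köpfe", "Kypper"]
print(sorted(words, key=swedish_key))  # ['Kypper', 'Köpfe']
print(sorted(words, key=german_key))   # ['Köpfe', 'Kypper']
```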


7. Collator::get/setAttribute.
ICU uses the UCA as the default starting point for ordering. Not all languages have sorting sequences
that correspond to the UCA, because the UCA cannot simultaneously encompass the specifics of all
the languages currently in use. Therefore, ICU provides a data-driven, flexible, and run-time
customizable mechanism called "tailoring". Tailoring overrides the default order of code points
and the values of the ICU Collation Service attributes.
The Collator has the following attributes:
   - FRENCH_COLLATION, possible values are:
       ON
       OFF (default)
       DEFAULT

   - CASE_FIRST, possible values are:
       OFF (default)
       LOWER_FIRST
       UPPER_FIRST
       DEFAULT

   - CASE_LEVEL, possible values are:
       OFF (default)
       ON
       DEFAULT

   - NORMALIZATION_MODE, possible values are:
       OFF (default)
       ON
       DEFAULT

   - STRENGTH, possible values are:
       PRIMARY
       SECONDARY
       TERTIARY (default)
       QUATERNARY
       IDENTICAL
       DEFAULT

   - ALTERNATE_HANDLING, possible values are:
       NON_IGNORABLE (default)
       SHIFTED
       DEFAULT

   - HIRAGANA_QUATERNARY_MODE, possible values are:
       ON
       OFF (default)
       DEFAULT

   - NUMERIC_COLLATION, possible values are:
       ON
       OFF (default)
       DEFAULT

Descriptions of these attributes:

FRENCH_COLLATION - Sorts strings with different accents from the back of the string. This attribute
is automatically set to On for the French locales and a few others. Users normally would
not need to set this attribute explicitly. There is a string comparison performance cost when
it is set On, but sort key length is unaffected.
Example:
F=X cote < coté < côte < côté
F=O cote < côte < coté < côté
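
The effect of FRENCH_COLLATION can be reproduced with a toy two-level key in Python: the accents (the secondary level) are compared starting from the end of the string when the flag is on. This sketch assumes lowercase input that differs only in accents, and uses the NFD combining marks as the secondary weights. The function name french_aware_key is hypothetical.

```python
import unicodedata


def french_aware_key(word: str, french: bool = False) -> tuple:
    """Toy key: (base letters, accents); accents are read backwards if french=True."""
    primary, secondary = [], []
    for ch in word:
        decomposed = unicodedata.normalize("NFD", ch)
        primary.append(decomposed[0])
        secondary.append(tuple(ord(m) for m in decomposed[1:]))
    if french:
        secondary.reverse()  # the last accent in the string is compared first
    return (tuple(primary), tuple(secondary))


words = ["côté", "cote", "côte", "coté"]
print(sorted(words, key=french_aware_key))
# ['cote', 'coté', 'côte', 'côté']   (F=X)
print(sorted(words, key=lambda w: french_aware_key(w, french=True)))
# ['cote', 'côte', 'coté', 'côté']   (F=O)
```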

CASE_FIRST - The Case_First attribute controls whether uppercase letters come before
lowercase letters or vice versa, in the absence of other differences in the strings. The possible
values are Uppercase_First (U) and Lowercase_First (L), plus the standard Default and Off.
There is almost no difference between the Off and Lowercase_First options in terms of results,
so typically users will use only Off or Uppercase_First, not Lowercase_First. (People interested
in the detailed differences between X and L should consult the Collation Customization section
of the ICU documentation.)
Specifying either L or U won't affect string comparison performance, but will affect sort key
length.
Example:
C=X or C=L "china" < "China" < "denmark" < "Denmark"
C=U "China" < "china" < "Denmark" < "denmark"
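
A toy Python key shows that Case_First only changes the tertiary (case) weights. The primary level is the lowercased string, and the per-character case bit is flipped when upper_first is set. The function case_first_key is a hypothetical illustration for simple alphabetic strings, not ICU's implementation.

```python
def case_first_key(word: str, upper_first: bool = False) -> tuple:
    """Toy key: (letters ignoring case, per-character case weights)."""
    # Off: lowercase weighs 0 and uppercase 1. Upper_First swaps the two weights.
    case_weights = tuple(int(c.isupper() != upper_first) for c in word)
    return (word.lower(), case_weights)


words = ["Denmark", "china", "denmark", "China"]
print(sorted(words, key=case_first_key))
# ['china', 'China', 'denmark', 'Denmark']
print(sorted(words, key=lambda w: case_first_key(w, upper_first=True)))
# ['China', 'china', 'Denmark', 'denmark']
```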

CASE_LEVEL - The Case_Level attribute is used when ignoring accents but not case. In such a situation,
set Strength to Primary and Case_Level to On. In most locales, this setting is Off by default.
There is a small impact on string comparison performance and sort key length if this attribute is set to On.
Example:
S=1, E=X role = Role = rôle
S=1, E=O role = rôle < Role
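
With Strength set to Primary, Case_Level adds a level that carries only case. A toy Python key reproduces the example above. Accents are dropped entirely, and the case bits are compared only when case_level is on. The function primary_with_case_key is hypothetical and is not the ICU algorithm.

```python
import unicodedata


def primary_with_case_key(word: str, case_level: bool = False) -> tuple:
    """Toy key for Strength=Primary, optionally with Case_Level=On."""
    bases = [unicodedata.normalize("NFD", c)[0] for c in word]  # drop accents
    key = (tuple(b.lower() for b in bases),)
    if case_level:
        key += (tuple(int(b.isupper()) for b in bases),)  # the extra case-only level
    return key


k = primary_with_case_key
print(k("role") == k("Role") == k("rôle"))                   # True  (S=1, E=X)
print(k("role", True) == k("rôle", True) < k("Role", True))  # True  (S=1, E=O)
```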

NORMALIZATION_MODE - The Normalization setting determines whether text is thoroughly normalized
during comparison. Even if the setting is Off (which is the default for many locales), text as
represented in common usage will compare correctly (for details, see UTN #5). Problems arise only
if the accent marks are in non-canonical order. If the setting is On, the best
results are guaranteed for all possible text input. There is a medium string comparison performance
cost if this attribute is On, depending on the frequency of sequences that require normalization.
There is no significant effect on sort key length. If the input text is known to be in NFD or NFKD
normalization form, there is no need to enable this Normalization option.
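
The non-canonical-order problem can be seen with Python's unicodedata module. The two strings below contain the same two combining marks in different orders, and they are canonically equivalent. A comparison that normalizes first treats them as equal, which is roughly what turning NORMALIZATION_MODE On provides, while a raw code point comparison does not. The helper name canonically_equal is hypothetical.

```python
import unicodedata

# 'a' + COMBINING CIRCUMFLEX ACCENT (class 230) + COMBINING DOT BELOW (class 220):
# not in canonical order, because marks with lower combining classes come first.
noncanonical = "a\u0302\u0323"
canonical = "a\u0323\u0302"


def canonically_equal(x: str, y: str) -> bool:
    """Compare after NFD, which puts combining marks into canonical order."""
    return unicodedata.normalize("NFD", x) == unicodedata.normalize("NFD", y)


print(noncanonical == canonical)                   # False
print(canonically_equal(noncanonical, canonical))  # True
```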

STRENGTH - see section 5 (Strength) above.

ALTERNATE_HANDLING - The Alternate attribute controls the handling of the so-called
variable characters in the UCA: whitespace, punctuation and symbols. If Alternate is set to
NonIgnorable (N), differences among these characters are of the same importance as
differences among letters. If Alternate is set to Shifted (S), these characters are of only
minor importance. The Shifted value is often used in combination with Strength set to Quaternary.
In that case, whitespace, punctuation, and symbols are considered when comparing strings,
but only if all other aspects of the strings (base letters, accents, and case) are identical.
If Alternate is not set to Shifted, there is no difference between a Strength of 3 and
a Strength of 4. For more information and examples, see
Variable_Weighting in the UCA (http://www.unicode.org/reports/tr10/#Variable_Weighting).
The reason the Alternate values are not simply On and Off is that additional Alternate values
may be added in the future. The UCA option Blanked is expressed with Strength set to 3
and Alternate set to Shifted. The default for most locales is NonIgnorable. If Shifted is selected,
comparison may be slower if there are many strings that are the same except for punctuation;
sort key length is not affected unless the strength level is also increased.
Example:
S=3, A=N di Silva < Di Silva < diSilva < U.S.A. < USA
S=3, A=S di Silva = diSilva < Di Silva < U.S.A. = USA
S=4, A=S di Silva < diSilva < Di Silva < U.S.A. < USA
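
The three example lines can be reproduced with a toy Python model of variable weighting, loosely following the Variable_Weighting rules of UTS #10. Whitespace, punctuation and symbols, identified by Unicode general category, are assumed to be the variable characters. With NonIgnorable, they get a primary weight below every letter. With Shifted, they are dropped from levels 1 to 3 and keep only a low quaternary weight, while ordinary characters get a maximal quaternary weight. Accents (level 2) are not modeled, and the names is_variable, alternate_key and HIGH are hypothetical.

```python
import sys
import unicodedata

HIGH = sys.maxunicode + 1  # quaternary weight of ordinary (non-variable) characters


def is_variable(ch: str) -> bool:
    """Toy rule: separators (Z*), punctuation (P*) and symbols (S*) are variable."""
    return unicodedata.category(ch)[0] in "ZPS"


def alternate_key(s: str, strength: int = 3, shifted: bool = False) -> tuple:
    primary, tertiary, quaternary = [], [], []
    for ch in s:
        if is_variable(ch) and shifted:
            quaternary.append(ord(ch))  # ignored at levels 1-3, low weight at level 4
            continue
        if is_variable(ch):
            primary.append((0, ord(ch)))  # NonIgnorable: sorts before any letter
        else:
            primary.append((1, ch.lower()))
        tertiary.append(int(ch.isupper()))
        quaternary.append(HIGH)
    key = (tuple(primary), tuple(tertiary))
    if shifted and strength >= 4:
        key += (tuple(quaternary),)  # Strength 4 only matters with Shifted
    return key


names = ["USA", "Di Silva", "U.S.A.", "diSilva", "di Silva"]
for strength, shifted in [(3, False), (3, True), (4, True)]:
    print(strength, shifted, sorted(names, key=lambda n: alternate_key(n, strength, shifted)))
```

With S=3, A=S, "di Silva" and "diSilva" (and likewise "U.S.A." and "USA") get identical keys, which corresponds to the "=" in the example. sorted() leaves such ties in their input order.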

HIRAGANA_QUATERNARY_MODE - Compatibility with JIS X 4061 requires the introduction of an additional
level to distinguish Hiragana and Katakana characters. If compatibility with that standard is required,
this attribute should be set On, and the strength set to Quaternary. This affects sort key
length and string comparison performance.

NUMERIC_COLLATION - When turned on, this attribute generates a collation key for the
numeric value of substrings of digits. This makes '100' sort AFTER '2'.
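
Numeric collation can be approximated in Python with a key that splits a string into runs of digits and non-digits, comparing the digit runs by their integer value. This is a sketch of the idea, not ICU's implementation, and numeric_key is a hypothetical name.

```python
import re


def numeric_key(s: str) -> list:
    """Toy key: digit runs compare by numeric value, other text as-is."""
    parts = re.split(r"(\d+)", s)
    # With a capturing group, re.split puts the digit runs at the odd indices.
    # Tagging runs with 0/1 keeps int and str values from being compared directly.
    return [(0, int(p)) if i % 2 else (1, p) for i, p in enumerate(parts) if p]


print(sorted(["100", "2", "10"]))                   # ['10', '100', '2']
print(sorted(["100", "2", "10"], key=numeric_key))  # ['2', '10', '100']
```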