<?xml version="1.0"?>
<!DOCTYPE kanjidic2 [
<!-- Version 1.3
This is the DTD of the XML-format kanji file combining information from
the KANJIDIC and KANJD212 files. It is intended to be largely
self-documenting, with each field being accompanied by an explanatory
comment.

The file covers the following kanji:
  (a) the 6,355 kanji from JIS X 0208;
  (b) the 5,801 kanji from JIS X 0212;
  (c) the 3,625 kanji from JIS X 0213 as follows:
    (i) the 2,741 kanji which are also in JIS X 0212 have
      JIS X 0213 code-points (kuten) added to the existing entry;
    (ii) the 884 "new" kanji have new entries.

At the end of the explanation for a number of fields there is a tag
with the format [N]. This indicates the leading letter(s) of the
equivalent field in the KANJIDIC and KANJD212 files.

The KANJIDIC documentation should also be read for additional
information about the information in the file.
--><!ELEMENT kanjidic2 (header , character*)>
<!ELEMENT header (file_version , database_version , date_of_creation)>
<!--
The single header element will contain identification information
about the version of the file
--><!ELEMENT file_version (#PCDATA)>
<!--
This field denotes the version of kanjidic2 structure, as more
than one version may exist.
--><!ELEMENT database_version (#PCDATA)>
<!--
The version of the file, in the format YYYY-NN, where NN will be
a number starting with 01 for the first version released in a
calendar year, then increasing for each version in that year.
--><!ELEMENT date_of_creation (#PCDATA)>
<!--
The date the file was created in international format (YYYY-MM-DD).
--><!ELEMENT character (literal , codepoint , radical , misc , dic_number? , query_code? , reading_meaning? , nanori?)*>
<!ELEMENT literal (#PCDATA)>
<!--
The character itself in UTF8 coding.
--><!ELEMENT codepoint (cp_value)+>
<!--
The codepoint element states the code of the character in the various
character set standards.
--><!ELEMENT cp_value (#PCDATA)>
<!--
The cp_value contains the codepoint of the character in a particular
standard. The standard will be identified in the cp_type attribute.
--><!ATTLIST cp_value cp_type CDATA #REQUIRED>
<!--
The cp_type attribute states the coding standard applying to the
element. The values assigned so far are:
  jis208 - JIS X 0208-1997 - kuten coding (nn-nn)
  jis212 - JIS X 0212-1990 - kuten coding (nn-nn)
  jis213 - JIS X 0213-2000 - kuten coding (p-nn-nn)
  ucs - Unicode 4.0 - hex coding (4 or 5 hexadecimal digits)
--><!ELEMENT radical (rad_value)+>
<!ELEMENT rad_value (#PCDATA)>
<!--
The radical number, in the range 1 to 214. The particular
classification type is stated in the rad_type attribute.
--><!ATTLIST rad_value rad_type CDATA #REQUIRED>
<!--
The rad_type attribute states the type of radical classification.
  classical - as recorded in the KangXi Zidian.
  nelson - as used in the Nelson "Modern Japanese-English
    Character Dictionary" (i.e. the Classic, not the New Nelson).
    This will only be used where Nelson reclassified the kanji.
--><!ELEMENT misc (grade? , stroke_count+ , variant* , freq* , rad_name*)>
<!ELEMENT grade (#PCDATA)>
<!--
The Jouyou Kanji grade level. 1 through 6 indicate the grade in which
the kanji is taught in Japanese schools. 8 indicates it is one of the
remaining Jouyou Kanji to be learned in junior high school, and 9
indicates it is a Jinmeiyou (for use in names) kanji. [G]
--><!ELEMENT stroke_count (#PCDATA)>
<!--
The stroke count of the kanji, including the radical. If more than
one, the first is considered the accepted count, while subsequent ones
are common miscounts. (See Appendix E. of the KANJIDIC documentation
for some of the rules applied when counting strokes in some of the
radicals.) [S]
--><!ELEMENT variant (#PCDATA)>
<!--
A cross-reference code to another kanji, usually regarded as a variant.
The type of cross-reference is given in the var_type attribute.
--><!ATTLIST variant var_type CDATA #REQUIRED>
<!--
The var_type attribute indicates the type of variant code. The current
values are:
  jis208 - in JIS X 0208 - kuten coding
  jis212 - in JIS X 0212 - kuten coding
  jis213 - in JIS X 0213 - kuten coding
  deroo - De Roo number - numeric
  njecd - Halpern NJECD index number - numeric
  s_h - The Kanji Dictionary (Spahn & Hadamitzky) - descriptor
  nelson - "Classic" Nelson - numeric
  oneill - Japanese Names (O'Neill) - numeric
--><!ELEMENT freq (#PCDATA)>
<!--
A frequency-of-use ranking. The 2,500 most-used characters have a
ranking; those characters that lack this field are not ranked. The
frequency is a number from 1 to 2,500 that expresses the relative
frequency of occurrence of a character in modern Japanese. This is
based on a survey in newspapers, so it is biassed towards kanji
used in newspaper articles. The discrimination between the less
frequently used kanji is not strong.
--><!ELEMENT rad_name (#PCDATA)>
<!--
When the kanji is itself a radical and has a name, this element
contains the name (in hiragana.) [T2]
--><!ELEMENT dic_number (dic_ref)+>
<!--
This element contains the index numbers and similar unstructured
information such as page numbers in a number of published dictionaries,
and instructional books on kanji.
--><!ELEMENT dic_ref (#PCDATA)>
<!--
Each dic_ref contains an index number. The particular dictionary,
etc. is defined by the dr_type attribute.
--><!ATTLIST dic_ref dr_type CDATA #REQUIRED>
<!--
The dr_type defines the dictionary or reference book, etc. to which
dic_ref element applies. The initial allocation is:
  nelson_c - "Modern Reader's Japanese-English Character Dictionary",
    edited by Andrew Nelson (now published as the "Classic"
    Nelson).
  nelson_n - "The New Nelson Japanese-English Character Dictionary",
    edited by John Haig.
  halpern_njecd - "New Japanese-English Character Dictionary",
    edited by Jack Halpern.
  halpern_kkld - "Kanji Learners Dictionary" (Kodansha) edited by
    Jack Halpern.
  heisig - "Remembering The Kanji" by James Heisig.
  gakken - "A New Dictionary of Kanji Usage" (Gakken)
  oneill_names - "Japanese Names", by P.G. O'Neill.
  oneill_kk - "Essential Kanji" by P.G. O'Neill.
  moro - "Daikanwajiten" compiled by Morohashi. For some kanji two
    additional attributes are used: m_vol: the volume of the
    dictionary in which the kanji is found, and m_page: the page
    number in the volume.
  henshall - "A Guide To Remembering Japanese Characters" by
    Kenneth G. Henshall.
  sh_kk - "Kanji and Kana" by Spahn and Hadamitzky.
  sakade - "A Guide To Reading and Writing Japanese" edited by
    Florence Sakade.
  henshall3 - "A Guide To Reading and Writing Japanese" 3rd
    edition, edited by Henshall, Seeley and De Groot.
  tutt_cards - Tuttle Kanji Cards, compiled by Alexander Kask.
  crowley - "The Kanji Way to Japanese Language Power" by
    Dale Crowley.
  kanji_in_context - "Kanji in Context" by Nishiguchi and Kono.
  busy_people - "Japanese For Busy People" vols I-III, published
    by the AJLT. The codes are the volume.chapter.
  kodansha_compact - the "Kodansha Compact Kanji Guide".
--><!ATTLIST dic_ref m_vol CDATA #IMPLIED>
<!--
See above under "moro".
--><!ATTLIST dic_ref m_page CDATA #IMPLIED>
<!--
See above under "moro".
--><!ELEMENT query_code (q_code)+>
<!--
These codes contain information relating to the glyph, and can be used
for finding a required kanji. The type of code is defined by the
qc_type attribute.
--><!ELEMENT q_code (#PCDATA)>
<!--
The q_code contains the actual query-code value, according to the
qc_type attribute.
--><!ATTLIST q_code qc_type CDATA #REQUIRED>
<!--
The qc_type attribute defines the type of query code. The current values
are:
  skip - Halpern's SKIP (System of Kanji Indexing by Patterns)
    code. The format is n-nn-nn. See the KANJIDIC documentation
    for a description of the code and restrictions on the
    commercial use of this data. [P]

  sh_desc - the descriptor codes for The Kanji Dictionary (Tuttle
    1996) by Spahn and Hadamitzky. They are in the form nxnn.n,
    e.g. 3k11.2, where the kanji has 3 strokes in the
    identifying radical, it is radical "k" in the SH
    classification system, there are 11 other strokes, and it is
    the 2nd kanji in the 3k11 sequence. (I am very grateful to
    Mark Spahn for providing the list of these descriptor codes
    for the kanji in this file.) [I]
  four_corner - the "Four Corner" code for the kanji. This is a code
    invented by Wang Chen in 1928. See the KANJIDIC documentation
    for an overview of the Four Corner System. [Q]

  deroo - the codes developed by the late Father Joseph De Roo, and
    published in his book "2001 Kanji" (Bojinsha). Fr De Roo
    gave his permission for these codes to be included. [DR]
  misclass - a possible misclassification of the kanji according
    to one of the code types. (See the "Z" codes in the KANJIDIC
    documentation for more details.)

--><!ELEMENT reading_meaning (rmgroup* , nanori*)>
<!--
The readings for the kanji in several languages, and the meanings, also
in several languages. The readings and meanings are grouped to enable
the handling of the situation where the meaning is differentiated by
reading. [T1]
--><!ELEMENT nanori (#PCDATA)>
<!--
Japanese readings that are now only associated with names.
--><!ELEMENT rmgroup (reading* , meaning*)>
<!ELEMENT reading (#PCDATA)>
<!--
The reading element contains the reading or pronunciation
of the kanji.
--><!ATTLIST reading r_type CDATA #REQUIRED>
<!--
The r_type attribute defines the type of reading in the reading
element. The current values are:
  pinyin - the modern PinYin romanization of the Chinese reading
    of the kanji. The tones are represented by a concluding
    digit. [Y]
  korean_r - the romanized form of the Korean reading(s) of the
    kanji. The readings are in the (Republic of Korea) Ministry
    of Education style of romanization. [W]
  korean_h - the Korean reading(s) of the kanji in hangul.
  ja_on - the "on" Japanese reading of the kanji, in katakana. A
    second attribute r_status, if present, will indicate with
    a value of "jy" whether the reading is approved for a
    "Jouyou kanji".
  ja_kun - the "kun" Japanese reading of the kanji, in hiragana.
    Where relevant the okurigana is also included separated by a
    ".". Readings associated with prefixes and suffixes are
    marked with a "-". A second attribute r_status, if present,
    will indicate with a value of "jy" whether the reading is
    approved for a "Jouyou kanji".
--><!ATTLIST reading r_status CDATA #IMPLIED>
<!--
See under ja_on and ja_kun above.
--><!ELEMENT meaning (#PCDATA)>
<!--
The meaning associated with the kanji.
--><!ATTLIST meaning m_lang CDATA #IMPLIED>
<!--
The m_lang attribute defines the target language of the meaning. It
will be coded using the two-letter language code from the ISO 639
standard. When absent, the value "en" (i.e. English) is implied. [{}]
-->]>
<kanjidic2>
</kanjidic2>