
File: [ELWIX - Embedded LightWeight unIX -] / embedaddon / libxml2 / TODO
Revision 1.1.1.1 (vendor branch)
Tue Feb 21 23:37:58 2012 UTC (12 years, 3 months ago) by misho
Branches: libxml2, MAIN
CVS tags: v2_9_1p0, v2_9_1, v2_8_0p0, v2_8_0, v2_7_8, HEAD

libxml2

           124907 HTML parse buffer problem when parsing large in-memory docs
           124110 DTD validation && wrong namespace
           123564 xmllint --html --format

                TODO for the XML parser and stuff:
                ==================================

                $Id: TODO,v 1.1.1.1 2012/02/21 23:37:58 misho Exp $

    this tends to be outdated :-\ ...

DOCS:
=====

- use case of using XInclude to load for example a description.
  order document + product base -(XSLT)-> quote with XIncludes
                                                   |
  HTML output with description of parts <---(XSLT)--

TODO:
=====
- XInclude at the SAX level (libSRVG)
- fix the C code prototype to bring back doc/libxml-undocumented.txt
  to a reasonable level
- Computation of base when an HTTP redirect occurs, might affect HTTP
  interfaces.
- Computation of base in XInclude. Relativization of URIs.
- listing all attributes in a node.
- Better checking of external parsed entities TAG 1234
- Go through the errata and do the cleanup.
  http://www.w3.org/XML/xml-19980210-errata ... started ...
- jamesh suggestion: SAX-like functions to save a document, i.e. call a
  function to open a new element with given attributes, write character
  data, close last element, etc.
  + inverted SAX, initial patch in April 2002 archives.
- htmlParseDoc has an encoding parameter which is not used.
  Function htmlCreateDocParserCtxt ignores it.
- fix realloc() usage.
- Tighten the UTF8 conformance (Martin Duerst):
  http://www.w3.org/2001/06/utf-8-test/.
  The bad files are in http://www.w3.org/2001/06/utf-8-wrong/.
- xml:id normalized value

TODO:
=====

- move all string manipulation functions (xmlStrdup, xmlStrlen, etc.) to
  global.c. Bjorn noted that the following files depend on parser.o solely
  because of these string functions: entities.o, global.o, hash.o, tree.o,
  xmlIO.o, and xpath.o.

- Optimization of tag strings allocation ?

- maintain coherency of namespace when doing cut'n paste operations
  => the functions are coded, but need testing

- function to rebuild the ID table
- functions to rebuild the DTD hash tables (after DTD changes).


EXTENSIONS:
===========

- Tools to produce man pages from the SGML docs.

- Add Xpointer recognition/API

- Add Xlink recognition/API
  => started adding an xlink.[ch] with a unified API for XML and HTML.
     it's crap :-(

- Implement XSchemas
  => Really need to be done <grin/>
  - datatypes are complete, but structure support is very limited.

- extend the shell with:
   - edit
   - load/save
   - mv (yum, yum, but it's harder because directories are ordered in
     our case, mvup and mvdown would be required)


Done:
=====

- Add HTML validation using the XHTML DTD
  - problem: do we want to keep and maintain the code for handling
    DTD/System ID cache directly in libxml ?
  => not really done that way, but there are new APIs to check elements
     or attributes. Otherwise XHTML validation directly ...

- XML Schemas datatypes except Base64 and BinHex

- Relax NG validation

- XmlTextReader streaming API + validation

- Add a DTD cache prefilled with xhtml DTDs and entities and a program to
  manage them -> like the /usr/bin/install-catalog from SGML
  right place seems $datadir/xmldtds
  Maybe this is better left to user apps
  => use a catalog instead, and xhtml1-dtd package

- Add output to XHTML
  => XML serializer automatically recognizes the DTD and applies the specific
     rules.

- Fix output of <tst val="x
y"/>

- compliance to XML-Namespace checking, see section 6 of
  http://www.w3.org/TR/REC-xml-names/

- Correct standalone checking/emitting (hard)
  2.9 Standalone Document Declaration

- Implement OASIS XML Catalog support
  http://www.oasis-open.org/committees/entity/

- Get OASIS testsuite to a more friendly result, check all the results
  once stable. The check-xml-test-suite.py script does this

- Implement XSLT
  => libxslt

- Finish XPath
  => attributes addressing troubles
  => defaulted attributes handling
  => namespace axis ?
  done as XSLT got debugged

- bug reported by Michael Meallin on validation problems
  => Actually means I need to add support (and warn) for non-deterministic
     content model.
- Handle undefined namespaces in entity contents better ... at least
  issue a warning
- DOM needs
  int xmlPruneProp(xmlNodePtr node, xmlAttrPtr attr);
  => done, it's actually xmlRemoveProp, xmlUnsetProp, xmlUnsetNsProp

- HTML: handling of Script and style data elements, need special code in
  the parser and saving functions (handling of < > " ' ...):
  http://www.w3.org/TR/html4/types.html#type-script
  Attributes are not a problem since entities are accepted.
- DOM needs
  xmlAttrPtr xmlNewDocProp(xmlDocPtr doc, const xmlChar *name, const xmlChar *value)
- problem when parsing hrefs with & with the HTML parser (IRC ac)
- If the internal encoding is not UTF8, saving to a given encoding doesn't
  work => fix to force UTF8 encoding ...
  done, added documentation too
- Add an ASCII I/O encoder (asciiToUTF8 and UTF8Toascii)
- Issue warning when using non-absolute namespace URIs.
- the html parser should add <head> and <body> if they don't exist
  started, not finished.
  Done, the automatic closing is added and 3 testcases were inserted
- Command to force the parser to stop parsing and ignore the rest of the file.
  xmlStopParser() should allow this, mostly untested
- support for HTML empty attributes like <hr noshade>
- plugged iconv() in for support of a large set of encodings.
- xmlSwitchToEncoding() rewrite done
- URI checks (no fragments) rfc2396.txt
- Added a clean mechanism for overloading or adding input methods:
  xmlRegisterInputCallbacks()
- dynamically adapt the alloc entry point to use g_alloc()/g_free()
  if the programmer wants it:
    - use xmlMemSetup() to reset the routines used.
- Check attribute normalization especially xmlGetProp()
- Validity checking problems for NOTATION attributes
- Validity checking problems for ENTITY ENTITIES attributes
- Parsing of a well balanced chunk xmlParseBalancedChunkMemory()
- URI module: validation, base, etc ... see uri.[ch]
- turn tester into a generic program xmllint installed with libxml
- extend validity checks to go through entities content instead of
  just labelling them PCDATA
- Save DTDs using the children list instead of dumping the tables,
  order is preserved as well as comments and PIs
- Wrote a notice of changes required to go from 1.x to 2.x
- make sure that all SAX callbacks are disabled if a WF error is detected
- checking/handling of newline normalization
  http://localhost/www.xml.com/axml/target.html#sec-line-ends
- correct checking of '&' '%' on entities content.
- checking of PE/Nesting on entities declaration
- checking/handling of xml:space
   - checking done.
   - handling done, not well tested
- Language identification code, productions [33] to [38]
  => done, the check has been added and reports WFness errors
- Conditional sections in DTDs [61] to [65]
  => should this crap really be implemented ???
  => Yep OASIS testsuite uses them
- Allow parsed entities defined in the internal subset to override
  the ones defined in the external subset (DTD customization).
  => This means that the entity content should be computed only at
     use time, i.e. keep the orig string only at parse time and expand
     only when referenced from the external subset :-(
     Needed for complete use of most DTDs from Eve Maler
- Add regression tests for all WFC errors
  => did some in test/WFC
  => added OASIS testsuite routines
     http://xmlsoft.org/conf/result.html

- I18N: http://wap.trondheim.com/vaer/index.phtml is not XML but is accepted
  by the XML parser, UTF-8 should be checked when there is no "encoding"
  declared !
- Support for UTF-8 and UTF-16 encoding
  => added some conversion routines provided by Martin Durst
     patched them, got fixes from @@@
     I plan to keep everything internally as UTF-8 (or ISO-Latin-X)
     this is slightly more costly but more compact, and recent processors'
     efficiency is cache related. The key to good performance is keeping
     the data set small, so I will.
  => the new progressive reading routines call the detection code,
     it is enabled, tested the ISO->UTF-8 stuff
- External entities loading:
   - allow override by client code
   - make sure it is called for all external entities referenced
  Done, client code should use xmlSetExternalEntityLoader() to set
  the default loading routine. It will be called each time an external
  entity resolution is triggered.
- maintain ID coherency when removing/changing attributes
  The function used to deallocate attributes now checks whether it is an
  ID and removes it from the table.
- push mode parsing i.e. non-blocking state based parser
  done, both for XML and HTML parsers. Use xmlCreatePushParserCtxt()
  and xmlParseChunk() and the HTML counterparts.
  The tester program now has a --push option to select that parser
  front-end. Duplicated tests to use both and check that the results are similar.

- Most of XPath, still see some troubles and occasional memleaks.
- an XML shell, allowing one to traverse/manipulate an XML document with
  a shell-like interface, and using XPath for the naming syntax
  - use of readline and history added when available
  - the shell interface has been cleanly separated and moved to debugXML.c
- HTML parser, should be fairly stable now
- API to search the lang of an attribute
- Collect IDs at parsing and maintain a table.
   PBM: maintain the table coherency
   PBM: how to detect ID types in absence of DTD !
- Use it for XPath ID support
- Add validity checking
  Should be finished now !
- Add regression tests with entity substitutions

- External Parsed entities, either XML or external Subset [78] and [79]
  parsing the xmllang DTD now works, so it should be sufficient for
  most cases !

- progressive reading. The entity support is a first step toward
  abstraction of an input stream. A large part of the context is still
  located on the stack, moving to a state machine and putting everything
  in the parsing context should provide an adequate solution.
  => Rather than progressive parsing, give more power to the SAX-like
     interface. Currently the DOM-like representation is built but
  => it should be possible to define that only as a set of SAX callbacks
     and remove the tree creation from the parser code.
     DONE

- DOM support, instead of using a proprietary in memory
  format for the document representation, the parser should
  call a DOM API to actually build the resulting document.
  Then the parser becomes independent of the in-memory
  representation of the document. Even better, using RPCs
  the parser can actually build the document in another
  program.
  => Work started, now the internal representation is by default
     very near a direct DOM implementation. The DOM glue is implemented
     as a separate module. See the GNOME gdome module.

- C++ support : John Ehresman <jehresma@dsg.harvard.edu>
- Updated code to follow more recent specs, added compatibility flag
- Better error handling, use a dedicated, overridable error
  handling function.
- Support for CDATA.
- Keep track of line numbers for better error reporting.
- Support for PI (SAX one).
- Support for Comments (bad, should be in ASAP, they are parsed
  but not stored), should be configurable.
- Improve the support of entities on save (+SAX).
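The open item on stricter UTF8 conformance, and the I18N note that UTF-8 should be checked when no "encoding" is declared, both come down to rejecting ill-formed byte sequences. Below is a minimal stand-alone sketch in C. It assumes the well-formedness rules of RFC 3629, which rule out overlong forms, UTF-16 surrogates, anything above U+10FFFF, and truncated sequences. The name utf8_is_strict is hypothetical. It is not libxml2's own checker.

```c
#include <stddef.h>

/* Hypothetical helper (not a libxml2 API): returns 1 if buf[0..len) is
 * well-formed UTF-8 per RFC 3629, 0 otherwise. Rejects overlong encodings,
 * UTF-16 surrogates (U+D800..U+DFFF), code points above U+10FFFF and
 * truncated sequences. */
int utf8_is_strict(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */

        if (c < 0x80) {                      /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3; cp = c & 0x07;
        } else {
            /* stray continuation byte, overlong lead C0/C1, or F5..FF */
            return 0;
        }

        if (len - i <= need)
            return 0;                        /* truncated sequence */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;                    /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* overlong 3-byte forms and surrogates */
        if (need == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
            return 0;
        /* overlong 4-byte forms and values beyond U+10FFFF */
        if (need == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return 0;

        i += need + 1;
    }
    return 1;
}
```

The sketch decodes each code point and then range-checks it. This keeps the overlong and surrogate rules easy to read. A table-driven validator would be faster but harder to audit.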
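The "checking/handling of newline normalization" item refers to the XML 1.0 end-of-line rule (section 2.11): every CR LF pair, and every CR not followed by LF, is passed on as a single LF. A minimal sketch of that rule, working in place on a complete buffer, follows. xml_normalize_newlines is a hypothetical helper, not a libxml2 function.

```c
#include <stddef.h>

/* Hypothetical helper (not a libxml2 API): applies the XML 1.0 section 2.11
 * end-of-line rule in place. Every CR LF pair and every CR not followed by
 * LF becomes a single LF. Returns the new length (never larger than len). */
size_t xml_normalize_newlines(char *buf, size_t len)
{
    size_t r = 0, w = 0;   /* read and write positions */

    while (r < len) {
        if (buf[r] == '\r') {
            buf[w++] = '\n';
            r++;
            if (r < len && buf[r] == '\n')
                r++;       /* swallow the LF of a CR LF pair */
        } else {
            buf[w++] = buf[r++];
        }
    }
    return w;
}
```

A caveat applies to the push mode listed above. If input arrives in chunks, a CR at the end of one chunk has to be remembered. Otherwise an LF that opens the next chunk would be emitted as a second newline.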
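The "Check attribute normalization especially xmlGetProp()" item concerns XML 1.0 attribute-value normalization (section 3.3.3). The simplified sketch below rests on two assumptions:

- End-of-line normalization has already run.
- The value contains no entity or character references. The full algorithm expands these, and a character reference such as &#xA; must survive as a real newline rather than become a space.

xml_normalize_attr and its is_cdata flag are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical helper (not a libxml2 API): simplified XML 1.0 section 3.3.3
 * attribute-value normalization, done in place on a NUL-terminated string.
 * Assumes the value contains no entity or character references.
 * Each #x20, #x9, #xA or #xD becomes a space. If is_cdata is 0 (tokenized
 * types such as ID, IDREFS, NMTOKENS), leading and trailing spaces are
 * dropped and runs of spaces are collapsed to one. */
void xml_normalize_attr(char *value, int is_cdata)
{
    char *r = value, *w = value;
    int pending_space = 0;   /* tokenized mode: a separator is owed */

    for (; *r != '\0'; r++) {
        char c = *r;
        int ws = (c == ' ' || c == '\t' || c == '\n' || c == '\r');

        if (is_cdata) {
            *w++ = ws ? ' ' : c;             /* one space per whitespace char */
        } else if (ws) {
            if (w != value)                  /* ignore leading whitespace */
                pending_space = 1;
        } else {
            if (pending_space) {
                *w++ = ' ';
                pending_space = 0;
            }
            *w++ = c;
        }
    }
    *w = '\0';   /* a trailing pending space is simply never written */
}
```

The is_cdata flag stands in for the declared attribute type, which a real parser would look up in the DTD. XML 1.0 says a non-validating processor should treat an attribute with no declaration as CDATA.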
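The "Add an ASCII I/O encoder (asciiToUTF8 and UTF8Toascii)" item is small enough to sketch. The version below borrows the in/out length-pointer style of libxml2's encoding callbacks (xmlCharEncodingInputFunc). Its name, ascii_to_utf8_sketch, and its return codes are assumptions of this sketch, not libxml2's actual asciiToUTF8().

```c
/* Hypothetical sketch, not libxml2's asciiToUTF8(): copies 7-bit ASCII
 * from in to out. On entry *inlen and *outlen hold the available input
 * bytes and output space; on return they hold the bytes consumed and
 * produced. Returns 0 on success, -1 if out was too small, -2 on a byte
 * >= 0x80 (not ASCII). Conversion stops at the first problem. */
int ascii_to_utf8_sketch(unsigned char *out, int *outlen,
                         const unsigned char *in, int *inlen)
{
    int i = 0, status = 0;
    int avail_in = *inlen, avail_out = *outlen;

    while (i < avail_in) {
        if (in[i] >= 0x80) {          /* not ASCII: refuse to guess */
            status = -2;
            break;
        }
        if (i >= avail_out) {         /* output buffer is full */
            status = -1;
            break;
        }
        out[i] = in[i];               /* ASCII bytes are already valid UTF-8 */
        i++;
    }
    *inlen = i;
    *outlen = i;
    return status;
}
```

The reverse direction (UTF8Toascii) can reuse the same loop. In UTF-8 input, any byte >= 0x80 belongs to a non-ASCII character, so it is rejected in the same way.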