Annotation of embedaddon/libxml2/TODO, revision 1.1
124907 HTML parse buffer problem when parsing large in-memory docs
124110 DTD validation && wrong namespace
123564 xmllint --html --format

TODO for the XML parser and stuff:
==================================

$Id$

this tends to be outdated :-\ ...

DOCS:
=====

- use case of using XInclude to load, for example, a description.
  order document + product base -(XSLT)-> quote with XIncludes
                                                   |
  HTML output with description of parts <---(XSLT)--

TODO:
=====
- XInclude at the SAX level (libSRVG)
- fix the C code prototype to bring back doc/libxml-undocumented.txt
  to a reasonable level
- Computation of the base URI when an HTTP redirect occurs; this might
  affect the HTTP interfaces.
- Computation of base in XInclude. Relativization of URIs.
- listing all attributes in a node.
- Better checking of external parsed entities TAG 1234
- Go through the errata and do the cleanup.
  http://www.w3.org/XML/xml-19980210-errata ... started ...
- jamesh suggestion: SAX-like functions to save a document, i.e. call a
  function to open a new element with given attributes, write character
  data, close the last element, etc.
  + inverted SAX, initial patch in April 2002 archives.
- htmlParseDoc has an encoding parameter which is not used.
  The function htmlCreateDocParserCtxt ignores it.
- fix realloc() usage.
- Tighten UTF-8 conformance (Martin Duerst):
  http://www.w3.org/2001/06/utf-8-test/.
  The bad files are in http://www.w3.org/2001/06/utf-8-wrong/.
- xml:id normalized value
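
A minimal sketch of what a stricter UTF-8 check could look like, assuming
the RFC 3629 rules: overlong forms, UTF-16 surrogates, code points above
U+10FFFF and truncated sequences are rejected. The function name
utf8_is_strict is hypothetical and not part of libxml2.

```c
#include <stddef.h>

/* Hypothetical helper (not a libxml2 API): returns 1 if s[0..len) is
 * well-formed UTF-8 under RFC 3629, 0 otherwise. */
int utf8_is_strict(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;   /* decoded code point */
        size_t n;           /* number of continuation bytes */
        size_t k;

        if (c < 0x80) {                       /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {  /* C0/C1 would be overlong */
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {  /* F5..FF exceed U+10FFFF */
            n = 3; cp = c & 0x07;
        } else {
            return 0;  /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= n)
            return 0;  /* sequence truncated by end of buffer */
        for (k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;  /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;  /* overlong 3- or 4-byte form */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;  /* UTF-16 surrogates are not characters */
        if (cp > 0x10FFFF)
            return 0;  /* beyond the Unicode range */
        i += n + 1;
    }
    return 1;
}
```

The byte ranges for lead bytes do most of the work; the explicit code point
checks catch the overlong 3- and 4-byte forms that a lead-byte test alone
cannot see.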

TODO:
=====

- move all string manipulation functions (xmlStrdup, xmlStrlen, etc.) to
  global.c. Bjorn noted that the following files depend on parser.o solely
  because of these string functions: entities.o, global.o, hash.o, tree.o,
  xmlIO.o, and xpath.o.

- Optimization of tag strings allocation ?

- maintain coherency of namespaces when doing cut'n paste operations
  => the functions are coded, but need testing

- function to rebuild the ID table
- functions to rebuild the DTD hash tables (after DTD changes).


EXTENSIONS:
===========

- Tools to produce man pages from the SGML docs.

- Add XPointer recognition/API

- Add XLink recognition/API
  => started adding an xlink.[ch] with a unified API for XML and HTML.
     It's crap :-(

- Implement XSchemas
  => Really need to be done <grin/>
  - datatypes are complete, but structure support is very limited.

- extend the shell with:
  - edit
  - load/save
  - mv (yum, yum, but it's harder because directories are ordered in
    our case, mvup and mvdown would be required)


Done:
=====

- Add HTML validation using the XHTML DTD
  - problem: do we want to keep and maintain the code for handling
    DTD/System ID cache directly in libxml ?
  => not really done that way, but there are new APIs to check elements
     or attributes. Otherwise XHTML validation directly ...

- XML Schemas datatypes except Base64 and BinHex

- Relax NG validation

- XmlTextReader streaming API + validation

- Add a DTD cache prefilled with xhtml DTDs and entities and a program to
  manage them -> like the /usr/bin/install-catalog from SGML
  right place seems $datadir/xmldtds
  Maybe this is better left to user apps
  => use a catalog instead, and the xhtml1-dtd package

- Add output to XHTML
  => the XML serializer automatically recognizes the DTD and applies the
     specific rules.

- Fix output of <tst val="x&#xA;y"/>

- compliance with XML Namespaces checking, see section 6 of
  http://www.w3.org/TR/REC-xml-names/

- Correct standalone checking/emitting (hard)
  2.9 Standalone Document Declaration

- Implement OASIS XML Catalog support
  http://www.oasis-open.org/committees/entity/

- Get the OASIS testsuite to a more friendly result, check all the results
  once stable. The check-xml-test-suite.py script does this.

- Implement XSLT
  => libxslt

- Finish XPath
  => attributes addressing troubles
  => defaulted attributes handling
  => namespace axis ?
  done as XSLT got debugged

- bug reported by Michael Meallin on validation problems
  => Actually means I need to add support (and warn) for non-deterministic
     content models.
- Handle undefined namespaces in entity contents better ... at least
  issue a warning
- DOM needs
  int xmlPruneProp(xmlNodePtr node, xmlAttrPtr attr);
  => done, it's actually xmlRemoveProp, xmlUnsetProp, xmlUnsetNsProp

- HTML: handling of script and style data elements, needs special code in
  the parser and saving functions (handling of < > " ' ...):
  http://www.w3.org/TR/html4/types.html#type-script
  Attributes are no problem since entities are accepted.
- DOM needs
  xmlAttrPtr xmlNewDocProp(xmlDocPtr doc, const xmlChar *name, const xmlChar *value)
- problem when parsing hrefs with & with the HTML parser (IRC ac)
- If the internal encoding is not UTF-8, saving to a given encoding doesn't
  work => fix to force UTF-8 encoding ...
  done, added documentation too
- Add an ASCII I/O encoder (asciiToUTF8 and UTF8Toascii)
- Issue a warning when using non-absolute namespace URIs.
- the html parser should add <head> and <body> if they don't exist
  started, not finished.
  Done, the automatic closing is added and 3 testcases were inserted
- Command to force the parser to stop parsing and ignore the rest of the file.
  xmlStopParser() should allow this, mostly untested
- support for HTML empty attributes like <hr noshade>
- plugged iconv() in for support of a large set of encodings.
- xmlSwitchToEncoding() rewrite done
- URI checking (no fragments) rfc2396.txt
- Added a clean mechanism for overloading or adding input methods:
  xmlRegisterInputCallbacks()
- dynamically adapt the alloc entry point to use g_malloc()/g_free()
  if the programmer wants it:
  - use xmlMemSetup() to reset the routines used.
- Check attribute normalization, especially xmlGetProp()
- Validity checking problems for NOTATION attributes
- Validity checking problems for ENTITY ENTITIES attributes
- Parsing of a well balanced chunk xmlParseBalancedChunkMemory()
- URI module: validation, base, etc ... see uri.[ch]
- turn tester into a generic program xmllint installed with libxml
- extend validity checks to go through entity content instead of
  just labelling it PCDATA
- Save DTDs using the children list instead of dumping the tables,
  order is preserved as well as comments and PIs
- Wrote a notice of changes required to go from 1.x to 2.x
- make sure that all SAX callbacks are disabled if a WF error is detected
- checking/handling of newline normalization
  http://localhost/www.xml.com/axml/target.html#sec-line-ends
- correct checking of '&' '%' in entity content.
- checking of PE/Nesting on entity declarations
- checking/handling of xml:space
  - checking done.
  - handling done, not well tested
- Language identification code, productions [33] to [38]
  => done, the check has been added and reports WFness errors
- Conditional sections in DTDs [61] to [65]
  => should this crap really be implemented ???
  => Yep, the OASIS testsuite uses them
- Allow parsed entities defined in the internal subset to override
  the ones defined in the external subset (DTD customization).
  => This means that the entity content should be computed only at
     use time, i.e. keep the orig string only at parse time and expand
     only when referenced from the external subset :-(
     Needed for complete use of most DTDs from Eve Maler
- Add regression tests for all WFC errors
  => did some in test/WFC
  => added OASIS testsuite routines
     http://xmlsoft.org/conf/result.html
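
Newline normalization here means XML 1.0 end-of-line handling (section
2.11): every CR LF pair and every CR not followed by LF is passed to the
application as a single LF. A minimal in-place sketch; the function name is
hypothetical and not a libxml2 API.

```c
#include <stddef.h>

/* Hypothetical helper: rewrites buf[0..len) in place so that "\r\n" and
 * a lone '\r' both become '\n'. Returns the new length. */
size_t xml_normalize_newlines(char *buf, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        if (buf[in] == '\r') {
            buf[out++] = '\n';
            /* swallow the LF that completes a CR LF pair */
            if (in + 1 < len && buf[in + 1] == '\n')
                in++;
            in++;
        } else {
            buf[out++] = buf[in++];
        }
    }
    return out;
}
```

This assumes the whole buffer is available at once. In push mode a CR at the
very end of a chunk cannot be resolved until the next chunk arrives, so that
state would have to be carried across calls.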

- I18N: http://wap.trondheim.com/vaer/index.phtml is not XML but is accepted
  by the XML parser; UTF-8 should be checked when there is no "encoding"
  declared !
- Support for UTF-8 and UTF-16 encoding
  => added some conversion routines provided by Martin Duerst,
     patched them, got fixes from @@@
     I plan to keep everything internally as UTF-8 (or ISO-Latin-X);
     this is slightly more costly but more compact, and the efficiency of
     recent processors is cache-related. The key to good performance is
     keeping the data set small, so I will.
  => the new progressive reading routines call the detection code;
     it is enabled, and I tested the ISO->UTF-8 stuff
- External entities loading:
  - allow override by client code
  - make sure it is called for all external entities referenced
  Done, client code should use xmlSetExternalEntityLoader() to set
  the default loading routine. It will be called each time an external
  entity resolution is triggered.
- maintain ID coherency when removing/changing attributes
  The function used to deallocate attributes now checks whether the
  attribute is an ID and removes it from the table.
- push mode parsing, i.e. non-blocking state based parser
  done, both for XML and HTML parsers. Use xmlCreatePushParserCtxt()
  and xmlParseChunk() and the HTML counterparts.
  The tester program now has a --push option to select that parser
  front-end. Duplicated the tests to use both and check that the results
  are similar.
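
Keeping everything internally as UTF-8 means UTF-16 input has to be
converted on the way in. A sketch of a UTF-16LE to UTF-8 converter, assuming
the whole input is in one buffer (no chunk boundaries) and that unpaired
surrogates are errors; utf16le_to_utf8 is a hypothetical helper, not the
conversion routine libxml2 actually uses.

```c
#include <stddef.h>

/* Hypothetical helper: converts UTF-16LE bytes in[0..inlen) to UTF-8 in
 * out[0..outlen). Returns the number of bytes written, or -1 on odd
 * input length, an unpaired surrogate, or insufficient output space. */
long utf16le_to_utf8(const unsigned char *in, size_t inlen,
                     unsigned char *out, size_t outlen)
{
    size_t i = 0, o = 0;

    if (inlen % 2 != 0)
        return -1;
    while (i < inlen) {
        unsigned long cp = in[i] | ((unsigned long)in[i + 1] << 8);
        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF) {       /* high surrogate */
            unsigned long lo;
            if (i + 2 > inlen)
                return -1;
            lo = in[i] | ((unsigned long)in[i + 1] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;                        /* not followed by low */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                            /* lone low surrogate */
        }

        /* emit 1 to 4 UTF-8 bytes depending on the code point */
        if (cp < 0x80) {
            if (o + 1 > outlen) return -1;
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            if (o + 2 > outlen) return -1;
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            if (o + 3 > outlen) return -1;
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            if (o + 4 > outlen) return -1;
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return (long)o;
}
```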

- Most of XPath; still seeing some troubles and occasional memory leaks.
- an XML shell, allowing one to traverse/manipulate an XML document with
  a shell-like interface, and using XPath for the naming syntax
  - use of readline and history added when available
  - the shell interface has been cleanly separated and moved to debugXML.c
- HTML parser, should be fairly stable now
- API to search the lang of an attribute
- Collect IDs at parsing and maintain a table.
  PBM: maintain the table coherency
  PBM: how to detect ID types in absence of DTD !
  - Use it for XPath ID support
  - Add validity checking
  Should be finished now !
- Add regression tests with entity substitutions
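
The ID table items (collect IDs at parse time, keep the table coherent when
ID attributes are removed) can be pictured with a small chained hash table.
This is a hypothetical sketch, not libxml2's implementation: it records only
the ID values, reports duplicates (XML 1.0 requires an ID value to appear
only once in a document) and supports removal.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical ID table: all names here are illustrative only. */
#define ID_BUCKETS 64

typedef struct IdEntry {
    char *value;
    struct IdEntry *next;
} IdEntry;

typedef struct {
    IdEntry *buckets[ID_BUCKETS];   /* zero-initialize before use */
} IdTable;

static unsigned id_hash(const char *s)
{
    unsigned h = 5381;              /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % ID_BUCKETS;
}

/* Returns 1 if added, 0 if the ID is already present, -1 if out of memory. */
int id_add(IdTable *t, const char *value)
{
    unsigned b = id_hash(value);
    IdEntry *e;

    for (e = t->buckets[b]; e != NULL; e = e->next)
        if (strcmp(e->value, value) == 0)
            return 0;               /* duplicate ID: validity error */
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->value = malloc(strlen(value) + 1);
    if (e->value == NULL) {
        free(e);
        return -1;
    }
    strcpy(e->value, value);
    e->next = t->buckets[b];
    t->buckets[b] = e;
    return 1;
}

/* Called when an ID attribute is freed or changed, to keep the table
 * coherent. Returns 1 if removed, 0 if the value was not in the table. */
int id_remove(IdTable *t, const char *value)
{
    IdEntry **p = &t->buckets[id_hash(value)];

    while (*p != NULL) {
        if (strcmp((*p)->value, value) == 0) {
            IdEntry *dead = *p;
            *p = dead->next;
            free(dead->value);
            free(dead);
            return 1;
        }
        p = &(*p)->next;
    }
    return 0;
}
```

Freeing the whole table is left out for brevity; a real table would also map
each value back to the attribute node that carries it.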

- External parsed entities, either XML or external subset [78] and [79]
  parsing the xmllang DTD now works, so it should be sufficient for
  most cases !

- progressive reading. The entity support is a first step toward
  abstraction of an input stream. A large part of the context is still
  located on the stack; moving to a state machine and putting everything
  in the parsing context should provide an adequate solution.
  => Rather than progressive parsing, give more power to the SAX-like
     interface. Currently the DOM-like representation is built by the
     parser, but it should be possible to define that only as a set of
     SAX callbacks and remove the tree creation from the parser code.
     DONE

- DOM support: instead of using a proprietary in-memory
  format for the document representation, the parser should
  call a DOM API to actually build the resulting document.
  Then the parser becomes independent of the in-memory
  representation of the document. Even better, using RPCs
  the parser can actually build the document in another
  program.
  => Work started, now the internal representation is by default
     very near a direct DOM implementation. The DOM glue is implemented
     as a separate module. See the GNOME gdome module.

- C++ support: John Ehresman <jehresma@dsg.harvard.edu>
- Updated code to follow more recent specs, added compatibility flag
- Better error handling, use a dedicated, overridable error
  handling function.
- Support for CDATA.
- Keep track of line numbers for better error reporting.
- Support for PI (SAX one).
- Support for Comments (bad, should be in ASAP, they are parsed
  but not stored), should be configurable.
- Improve the support of entities on save (+SAX).
