Annotation of embedaddon/libxml2/TODO, revision 1.1

1.1     ! misho       1: 124907 HTML parse buffer problem when parsing large in-memory docs
        !             2: 124110 DTD validation && wrong namespace
        !             3: 123564 xmllint --html --format
        !             4: 
        !             5:            TODO for the XML parser and stuff:
        !             6:           ==================================
        !             7: 
        !             8:       $Id$
        !             9: 
        !            10:     this tends to be outdated :-\ ...
        !            11: 
        !            12: DOCS:
        !            13: =====
        !            14: 
        !            15: - use case of using XInclude to load for example a description.
        !            16:   order document + product base -(XSLT)-> quote with XIncludes 
        !            17:                                                    |
        !            18:   HTML output with description of parts <---(XSLT)--
        !            19: 
        !            20: TODO:
        !            21: =====
        !            22: - XInclude at the SAX level (libSRVG)
        !            23: - fix the C code prototype to bring back doc/libxml-undocumented.txt
        !            24:   to a reasonable level
        !            25: - Computation of base when HTTP redirect occurs, might affect HTTP
        !            26:   interfaces.
        !            27: - Computation of base in XInclude. Relativization of URIs.
        !            28: - listing all attributes in a node.
        !            29: - Better checking of external parsed entities TAG 1234
        !            30: - Go through the errata and do the cleanup.
        !            31:   http://www.w3.org/XML/xml-19980210-errata ... started ...
        !            32: - jamesh suggestion: SAX like functions to save a document ie. call a
        !            33:   function to open a new element with given attributes, write character
        !            34:   data, close last element, etc
        !            35:   + inverted SAX, initial patch in April 2002 archives.
        !            36: - htmlParseDoc has an encoding parameter which is not used.
        !            37:   Function htmlCreateDocParserCtxt ignores it.
        !            38: - fix realloc() usage.
        !            39: - Tighten the UTF-8 conformance (Martin Duerst):
        !            40:   http://www.w3.org/2001/06/utf-8-test/.
        !            41:   The bad files are in http://www.w3.org/2001/06/utf-8-wrong/.
        !            42: - xml:id normalized value
        !            43: 
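The UTF-8 conformance item above could be approached with a check along these lines. This is only a sketch under assumptions: is_strict_utf8() is a hypothetical helper, not a libxml2 function, and it rejects overlong forms, surrogates, truncated sequences and code points above U+10FFFF, which is one reading of what the "wrong" test files are meant to exercise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of libxml2): returns 1 if buf[0..len)
 * is well-formed UTF-8, 0 otherwise. */
int is_strict_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;
        size_t need, k;

        if (c < 0x80) {                       /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {  /* 2-byte lead, no overlongs */
            need = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {      /* 3-byte lead */
            need = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {  /* 4-byte lead */
            need = 3; cp = c & 0x07;
        } else {
            return 0;  /* stray continuation byte, 0xC0/0xC1, or above 0xF4 */
        }

        if (len - i <= need)
            return 0;                         /* truncated sequence */
        for (k = 1; k <= need; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;                     /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (need == 2 && cp < 0x800)
            return 0;                         /* overlong 3-byte form */
        if (need == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return 0;                         /* overlong or out of range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                         /* UTF-16 surrogate */
        i += need + 1;
    }
    return 1;
}
```

A parser would presumably run such a check on input that declares no encoding, before treating the bytes as UTF-8.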
        !            44: TODO:
        !            45: =====
        !            46: 
        !            47: - move all string manipulation functions (xmlStrdup, xmlStrlen, etc.) to
        !            48:   global.c. Bjorn noted that the following files depend on parser.o solely
        !            49:   because of these string functions: entities.o, global.o, hash.o, tree.o,
        !            50:   xmlIO.o, and xpath.o.
        !            51: 
        !            52: - Optimization of tag strings allocation ?
        !            53: 
        !            54: - maintain coherency of namespace when doing cut'n paste operations
        !            55:   => the functions are coded, but need testing
        !            56: 
        !            57: - function to rebuild the ID table
        !            58: - functions to rebuild the DTD hash tables (after DTD changes).
        !            59:    
        !            60: 
        !            61: EXTENSIONS:
        !            62: ===========
        !            63: 
        !            64: - Tools to produce man pages from the SGML docs.
        !            65: 
        !            66: - Add Xpointer recognition/API
        !            67: 
        !            68: - Add Xlink recognition/API
        !            69:   => started adding an xlink.[ch] with a unified API for XML and HTML.
        !            70:      it's crap :-(
        !            71: 
        !            72: - Implement XSchemas
        !            73:   => Really needs to be done <grin/>
        !            74:   - datatypes are complete, but structure support is very limited.
        !            75: 
        !            76: - extend the shell with:
        !            77:    - edit
        !            78:    - load/save
        !            79:    - mv (yum, yum, but it's harder because directories are ordered in
        !            80:      our case, mvup and mvdown would be required)
        !            81: 
        !            82: 
        !            83: Done:
        !            84: =====
        !            85: 
        !            86: - Add HTML validation using the XHTML DTD
        !            87:   - problem: do we want to keep and maintain the code for handling
        !            88:     DTD/System ID cache directly in libxml ?
        !            89:   => not really done that way, but there are new APIs to check elements
        !            90:      or attributes. Otherwise XHTML validation directly ...
        !            91: 
        !            92: - XML Schemas datatypes except Base64 and BinHex
        !            93: 
        !            94: - Relax NG validation
        !            95: 
        !            96: - XmlTextReader streaming API + validation
        !            97: 
        !            98: - Add a DTD cache prefilled with xhtml DTDs and entities and a program to
        !            99:   manage them -> like the /usr/bin/install-catalog from SGML
        !           100:   right place seems $datadir/xmldtds
        !           101:   Maybe this is better left to user apps
        !           102:   => use a catalog instead , and xhtml1-dtd package
        !           103: 
        !           104: - Add output to XHTML
        !           105:   => XML serializer automatically recognizes the DTD and applies the specific
        !           106:      rules.
        !           107: 
        !           108: - Fix output of <tst val="x&#xA;y"/>
        !           109: 
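The <tst val="x&#xA;y"/> item above is about keeping a newline inside an attribute value across a save/reload cycle: a literal newline written into the attribute would be normalized to a space by the next parser. A minimal sketch of the escaping step follows, assuming a hypothetical helper escape_attr_value() (not the libxml2 serializer) that writes into a caller-supplied buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libxml2): writes the escaped form of an
 * attribute value into out, NUL-terminated. Returns the escaped length,
 * or -1 if out is too small. */
int escape_attr_value(const char *in, char *out, size_t outsz)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        const char *rep;
        char one[2];
        size_t n;

        one[0] = *in;
        one[1] = '\0';
        switch (*in) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '"':  rep = "&quot;"; break;
        /* Whitespace other than a plain space is written as a character
         * reference so attribute-value normalization cannot erase it. */
        case '\n': rep = "&#xA;";  break;
        case '\r': rep = "&#xD;";  break;
        case '\t': rep = "&#x9;";  break;
        default:   rep = one;      break;
        }
        n = strlen(rep);
        if (used + n + 1 > outsz)
            return -1;             /* no room for text plus terminator */
        memcpy(out + used, rep, n);
        used += n;
    }
    if (used + 1 > outsz)
        return -1;
    out[used] = '\0';
    return (int)used;
}
```

With this sketch the value "x", newline, "y" is written out as x&#xA;y, matching the test case named in the item.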
        !           110: - compliance to XML-Namespace checking, see section 6 of
        !           111:   http://www.w3.org/TR/REC-xml-names/
        !           112: 
        !           113: - Correct standalone checking/emitting (hard)
        !           114:   2.9 Standalone Document Declaration
        !           115: 
        !           116: - Implement OASIS XML Catalog support
        !           117:   http://www.oasis-open.org/committees/entity/
        !           118: 
        !           119: - Get OASIS testsuite to a more friendly result, check all the results
        !           120:   once stable. The check-xml-test-suite.py script does this
        !           121: 
        !           122: - Implement XSLT
        !           123:   => libxslt
        !           124: 
        !           125: - Finish XPath
        !           126:   => attributes addressing troubles
        !           127:   => defaulted attributes handling
        !           128:   => namespace axis ?
        !           129:   done as XSLT got debugged
        !           130: 
        !           131: - bug reported by Michael Meallin on validation problems
        !           132:   => Actually means I need to add support (and warn) for non-deterministic
        !           133:      content model.
        !           134: - Handle undefined namespaces in entity contents better ... at least
        !           135:   issue a warning
        !           136: - DOM needs
        !           137:   int xmlPruneProp(xmlNodePtr node, xmlAttrPtr attr);
        !           138:   => done, it's actually xmlRemoveProp, xmlUnsetProp and xmlUnsetNsProp
        !           139: 
        !           140: - HTML: handling of Script and style data elements, need special code in
        !           141:   the parser and saving functions (handling of < > " ' ...):
        !           142:   http://www.w3.org/TR/html4/types.html#type-script
        !           143:   Attributes are no problems since entities are accepted.
        !           144: - DOM needs
        !           145:   xmlAttrPtr xmlNewDocProp(xmlDocPtr doc, const xmlChar *name, const xmlChar *value)
        !           146: - problem when parsing hrefs with & with the HTML parser (IRC ac)
        !           147: - If the internal encoding is not UTF8 saving to a given encoding doesn't
        !           148:   work => fix to force UTF8 encoding ...
        !           149:   done, added documentation too
        !           150: - Add an ASCII I/O encoder (asciiToUTF8 and UTF8Toascii)
        !           151: - Issue warning when using non-absolute namespaces URI.
        !           152: - the html parser should add <head> and <body> if they don't exist
        !           153:   started, not finished.
        !           154:   Done, the automatic closing is added and 3 testcases were inserted
        !           155: - Command to force the parser to stop parsing and ignore the rest of the file.
        !           156:   xmlStopParser() should allow this, mostly untested
        !           157: - support for HTML empty attributes like <hr noshade>
        !           158: - plugged iconv() in for support of a large set of encodings.
        !           159: - xmlSwitchToEncoding() rewrite done
        !           160: - URI checkings (no fragments) rfc2396.txt
        !           161: - Added a clean mechanism for overriding or adding input methods:
        !           162:   xmlRegisterInputCallbacks()
        !           163: - dynamically adapt the alloc entry point to use g_alloc()/g_free()
        !           164:   if the programmer wants it: 
        !           165:     - use xmlMemSetup() to reset the routines used.
        !           166: - Check attribute normalization especially xmlGetProp()
        !           167: - Validity checking problems for NOTATIONS attributes
        !           168: - Validity checking problems for ENTITY ENTITIES attributes
        !           169: - Parsing of a well balanced chunk xmlParseBalancedChunkMemory()
        !           170: - URI module: validation, base, etc ... see uri.[ch]
        !           171: - turn tester into a generic program xmllint installed with libxml
        !           172: - extend validity checks to go through entities content instead of
        !           173:   just labelling them PCDATA
        !           174: - Save Dtds using the children list instead of dumping the tables,
        !           175:   order is preserved as well as comments and PIs
        !           176: - Wrote a notice of changes required to go from 1.x to 2.x
        !           177: - make sure that all SAX callbacks are disabled if a WF error is detected
        !           178: - checking/handling of newline normalization
        !           179:   http://localhost/www.xml.com/axml/target.html#sec-line-ends
        !           180: - correct checking of '&' '%' on entities content.
        !           181: - checking of PE/Nesting on entities declaration
        !           182: - checking/handling of xml:space
        !           183:    - checking done.
        !           184:    - handling done, not well tested
        !           185: - Language identification code, productions [33] to [38]
        !           186:   => done, the check has been added and reports WFness errors
        !           187: - Conditional sections in DTDs [61] to [65]
        !           188:   => should this crap be really implemented ???
        !           189:   => Yep OASIS testsuite uses them
        !           190: - Allow parsed entities defined in the internal subset to override
        !           191:   the ones defined in the external subset (DtD customization).
        !           192:   => This means that the entity content should be computed only at
        !           193:      use time, i.e. keep the orig string only at parse time and expand
        !           194:      only when referenced from the external subset :-(
        !           195:      Needed for complete use of most DTD from Eve Maler
        !           196: - Add regression tests for all WFC errors
        !           197:   => did some in test/WFC
        !           198:   => added OASIS testsuite routines
        !           199:      http://xmlsoft.org/conf/result.html
        !           200: 
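The newline normalization item in the list above refers to the XML rule that a CR LF pair, and any CR not followed by LF, is passed on as a single LF. A minimal in-place sketch, assuming a hypothetical helper normalize_line_ends() (not a libxml2 function) that works on a complete buffer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of libxml2): rewrites CR LF and lone CR
 * as LF in place and returns the new length. The buffer is not
 * NUL-terminated by this function. */
size_t normalize_line_ends(char *buf, size_t len)
{
    size_t r, w = 0;

    assert(buf != NULL || len == 0);
    for (r = 0; r < len; r++) {
        if (buf[r] == '\r') {
            buf[w++] = '\n';
            if (r + 1 < len && buf[r + 1] == '\n')
                r++;               /* swallow the LF of a CR LF pair */
        } else {
            buf[w++] = buf[r];
        }
    }
    return w;
}
```

A push-mode parser would also have to remember a CR that ends one chunk, since the matching LF may only arrive with the next chunk; the sketch does not handle that case.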
        !           201: - I18N: http://wap.trondheim.com/vaer/index.phtml is not XML and accepted
        !           202:   by the XML parser, UTF-8 should be checked when there is no "encoding"
        !           203:   declared !
        !           204: - Support for UTF-8 and UTF-16 encoding
        !           205:   => added some conversion routines provided by Martin Durst
        !           206:      patched them, got fixes from @@@
        !           207:      I plan to keep everything internally as UTF-8 (or ISO-Latin-X)
        !           208:      this is slightly more costly but more compact, and recent processors
        !           209:      efficiency is cache related. The key for good performances is keeping
        !           210:      the data set small, so will I.
        !           211:   => the new progressive reading routines call the detection code
        !           212:      is enabled, tested the ISO->UTF-8 stuff
        !           213: - External entities loading: 
        !           214:    - allow override by client code
        !           215:    - make sure it is called for all external entities referenced
        !           216:   Done, client code should use xmlSetExternalEntityLoader() to set
        !           217:   the default loading routine. It will be called each time an external
        !           218:   entity resolution is triggered.
        !           219: - maintain ID coherency when removing/changing attributes
        !           220:   The function used to deallocate attributes now checks for it being an
        !           221:   ID and removes it from the table.
        !           222: - push mode parsing i.e. non-blocking state based parser
        !           223:   done, both for XML and HTML parsers. Use xmlCreatePushParserCtxt()
        !           224:   and xmlParseChunk() and html counterparts.
        !           225:   The tester program now has a --push option to select that parser 
        !           226:   front-end. Duplicated tests to use both and check results are similar.
        !           227: 
        !           228: - Most of XPath, still see some troubles and occasional memleaks.
        !           229: - an XML shell, allowing to traverse/manipulate an XML document with
        !           230:   a shell like interface, and using XPath for the naming syntax
        !           231:   - use of readline and history added when available
        !           232:   - the shell interface has been cleanly separated and moved to debugXML.c
        !           233: - HTML parser, should be fairly stable now
        !           234: - API to search the lang of an attribute
        !           235: - Collect IDs at parsing and maintain a table. 
        !           236:    PBM: maintain the table coherency
        !           237:    PBM: how to detect ID types in absence of DtD !
        !           238: - Use it for XPath ID support
        !           239: - Add validity checking
        !           240:   Should be finished now !
        !           241: - Add regression tests with entity substitutions
        !           242: 
        !           243: - External Parsed entities, either XML or external Subset [78] and [79]
        !           244:   parsing the xmllang DtD now works, so it should be sufficient for
        !           245:   most cases !
        !           246: 
        !           247: - progressive reading. The entity support is a first step toward
        !           248:   abstraction of an input stream. A large part of the context is still
        !           249:   located on the stack, moving to a state machine and putting everything
        !           250:   in the parsing context should provide an adequate solution.
        !           251:   => Rather than progressive parsing, give more power to the SAX-like
        !           252:      interface. Currently the DOM-like representation is built but
        !           253:      => it should be possible to define that only as a set of SAX callbacks
        !           254:        and remove the tree creation from the parser code.
        !           255:        DONE
        !           256: 
        !           257: - DOM support, instead of using a proprietary in memory
        !           258:   format for the document representation, the parser should
        !           259:   call a DOM API to actually build the resulting document.
        !           260:   Then the parser becomes independent of the in-memory
        !           261:   representation of the document. Even better using RPC's
        !           262:   the parser can actually build the document in another
        !           263:   program.
        !           264:   => Work started, now the internal representation is by default
        !           265:      very near a direct DOM implementation. The DOM glue is implemented
        !           266:      as a separate module. See the GNOME gdome module.
        !           267: 
        !           268: - C++ support : John Ehresman <jehresma@dsg.harvard.edu>
        !           269: - Updated code to follow more recent specs, added compatibility flag
        !           270: - Better error handling, use a dedicated, overridable error
        !           271:   handling function.
        !           272: - Support for CDATA.
        !           273: - Keep track of line numbers for better error reporting.
        !           274: - Support for PI (SAX one).
        !           275: - Support for Comments (bad, should be in ASAP, they are parsed
        !           276:   but not stored), should be configurable.
        !           277: - Improve the support of entities on save (+SAX).
        !           278: 
