Annotation of embedaddon/libxml2/doc/guidelines.html, revision 1.1

1.1     ! misho       1: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        !             2:     "http://www.w3.org/TR/html4/loose.dtd">
        !             3: <html>
        !             4: <head>
        !             5:   <meta http-equiv="Content-Type" content="text/html">
        !             6:   <style type="text/css">
        !             7: <!--
        !             8: TD {font-family: Verdana,Arial,Helvetica}
        !             9: BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
        !            10: H1 {font-family: Verdana,Arial,Helvetica}
        !            11: H2 {font-family: Verdana,Arial,Helvetica}
        !            12: H3 {font-family: Verdana,Arial,Helvetica}
        !            13: A:link, A:visited, A:active { text-decoration: underline }
        !            14: -->
        !            15:   </style>
        !            16:   <title>XML resources publication guidelines</title>
        !            17: </head>
        !            18: 
        !            19: <body bgcolor="#fffacd" text="#000000">
        !            20: <h1 align="center">XML resources publication guidelines</h1>
        !            21: 
        !            22: <p></p>
        !            23: 
        !            24: <p>The goal of this document is to provide a set of guidelines and tips
        !            25: to help with the publication and deployment of <a
        !            26: href="http://www.w3.org/XML/">XML</a> resources for the <a
        !            27: href="http://www.gnome.org/">GNOME project</a>. However, it is not tied to
        !            28: GNOME and might be helpful more generally. I welcome <a
        !            29: href="mailto:veillard@redhat.com">feedback</a> on this document.</p>
        !            30: 
        !            31: <p>The intended audience is software developers who have started using XML
        !            32: for some of the resources of their project, as a storage format, for data
        !            33: exchange, checking or transformations. An increasing number of new XML
        !            34: formats have been defined, but, possibly because of a lack of documentation,
        !            35: not all the steps needed to truly gain the benefits of XML have been taken.
        !            36: These guidelines hope to improve the matter and provide a better overview of
        !            37: the overall XML processing and the associated steps needed to deploy it
        !            38: successfully.</p>
        !            39: 
        !            40: <p>Table of contents:</p>
        !            41: <ol>
        !            42:   <li><a href="#Design">Design guidelines</a></li>
        !            43:   <li><a href="#Canonical">Canonical URL</a></li>
        !            44:   <li><a href="#Catalog">Catalog setup</a></li>
        !            45:   <li><a href="#Package">Package integration</a></li>
        !            46: </ol>
        !            47: 
        !            48: <h2><a name="Design">Design guidelines</a></h2>
        !            49: 
        !            50: <p>This part focuses on the design of the XML format itself. It may arrive
        !            51: a bit too late since the structure of the document may already be cast in
        !            52: existing and deployed code. Still, here are a few rules which might be helpful
        !            53: when designing a new XML vocabulary or revising an existing
        !            54: format:</p>
        !            55: 
        !            56: <h3>Reuse existing formats:</h3>
        !            57: 
        !            58: <p>This may sound a bit simplistic, but before designing your own format,
        !            59: try to look up existing XML vocabularies for similar data. Ideally this allows
        !            60: you to reuse them, in which case a lot of the existing tools like DTDs, schemas
        !            61: and stylesheets may already be available. If you are looking at a
        !            62: documentation format, <a href="http://www.docbook.org/">DocBook</a> should
        !            63: handle your needs. Even if reuse is not possible because some semantic or use
        !            64: case aspects are too different, studying them helps avoid design errors like
        !            65: targeting the vocabulary at the wrong abstraction level. In this format
        !            66: design phase try to be concise, be sure to express the real content of
        !            67: your data, and use the XML structure to express the semantics and context of
        !            68: that data.</p>
        !            69: 
        !            70: <h3>DTD rules:</h3>
        !            71: 
        !            72: <p>Building a DTD (Document Type Definition) or a Schema describing the
        !            73: structure allowed by instances is the core of the design process of the
        !            74: vocabulary. Here are a few tips:</p>
        !            75: <ul>
        !            76:   <li>use meaningful words for element and attribute names.</li>
        !            77:   <li>do not use attributes for general textual content: attribute values
        !            78:     are normalized by the parser before reaching the application, so
        !            79:     spaces and line breaks will be modified.</li>
        !            80:   <li>use a separate element for every string that might be subject to
        !            81:     localization. The canonical way to localize XML content is to use
        !            82:     sibling elements carrying different xml:lang attributes, as in the
        !            83:     following:
        !            84:     <pre>&lt;welcome&gt;
        !            85:   &lt;msg xml:lang="en"&gt;hello&lt;/msg&gt;
        !            86:   &lt;msg xml:lang="fr"&gt;bonjour&lt;/msg&gt;
        !            87: &lt;/welcome&gt;</pre>
        !            88:   </li>
        !            89:   <li>use attributes to refine the content of an element but avoid them for
        !            90:     more complex tasks: parsing an attribute is not cheaper than parsing an
        !            91:     element, and it is far easier to make element content more complex, while
        !            92:     attributes have to remain very simple.</li>
        !            93: </ul>
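<p>As a minimal sketch of how an application might consume the localized
siblings shown above, the following Python fragment uses only the standard
<code>xml.etree.ElementTree</code> module. The helper name
<code>localized_text</code>, its fallback policy and the <code>note</code>
element are assumptions made for illustration. The last lines also show the
attribute value normalization mentioned in the list: a line break inside an
attribute reaches the application as a space.</p>

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<welcome>
  <msg xml:lang="en">hello</msg>
  <msg xml:lang="fr">bonjour</msg>
</welcome>"""


def localized_text(parent, tag, lang, fallback="en"):
    """Return the text of the child `tag` whose xml:lang equals `lang`.

    Hypothetical policy: fall back to `fallback`, then to the first
    sibling, then to None when there is no such child at all.
    """
    candidates = parent.findall(tag)
    by_lang = {el.get(XML_LANG): el.text for el in candidates}
    if lang in by_lang:
        return by_lang[lang]
    if fallback in by_lang:
        return by_lang[fallback]
    return candidates[0].text if candidates else None


welcome = ET.fromstring(DOC)
print(localized_text(welcome, "msg", "fr"))  # bonjour
print(localized_text(welcome, "msg", "de"))  # hello (fallback)

# Attribute values are normalized by the parser: the line break is lost.
note = ET.fromstring('<note text="line one\nline two"/>')
print(repr(note.get("text")))  # 'line one line two'
```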
        !            94: 
        !            95: <h3>Versioning:</h3>
        !            96: 
        !            97: <p>As part of the design, make sure the structure you define can accommodate
        !            98: future extensions that you may not have considered for the current version. There
        !            99: are two parts to this:</p>
        !           100: <ul>
        !           101:   <li>Make sure the instance contains a version number, which makes
        !           102:     backward compatibility easy to handle. Something as simple as having a
        !           103:     <code>version="1.0"</code> attribute on the root element of the instance is
        !           104:     sufficient.</li>
        !           105:   <li>While designing the code that analyzes the data provided by the
        !           106:     XML parser, make sure it can work with unknown versions: generate a UI
        !           107:     warning and process only the tags recognized by your version. Keep in
        !           108:     mind that you should not break on unknown elements if the version
        !           109:     attribute was not in the recognized set.</li>
        !           110: </ul>
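<p>The two points above can be sketched as follows. This is only an
illustration in Python with the standard library: the <code>list</code>,
<code>title</code>, <code>item</code> and <code>priority</code> element names
are hypothetical, and a real application would show the warning in its UI
rather than through the <code>warnings</code> module. For simplicity the
sketch skips unknown elements whatever the version is.</p>

```python
import warnings
import xml.etree.ElementTree as ET

KNOWN_VERSIONS = {"1.0"}            # versions this reader was written for
KNOWN_ELEMENTS = {"title", "item"}  # hypothetical vocabulary


def load_items(xml_text):
    """Collect (tag, text) pairs for recognized elements, skipping the rest."""
    root = ET.fromstring(xml_text)
    version = root.get("version")
    if version not in KNOWN_VERSIONS:
        # Stand-in for a UI warning.
        warnings.warn(f"unknown format version {version!r}; "
                      "unrecognized elements will be ignored")
    recognized = []
    for child in root:
        if child.tag in KNOWN_ELEMENTS:
            recognized.append((child.tag, child.text))
        # Unknown elements are skipped rather than treated as errors.
    return recognized


NEWER = """<list version="2.0">
  <title>Groceries</title>
  <priority>high</priority>
  <item>milk</item>
</list>"""

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    items = load_items(NEWER)

print(items)        # [('title', 'Groceries'), ('item', 'milk')]
print(len(caught))  # 1
```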
        !           111: 
        !           112: <h3>Other design parts:</h3>
        !           113: 
        !           114: <p>While defining your vocabulary, try to think in terms of other uses of your
        !           115: data, for example how XSLT stylesheets could be used to make an HTML
        !           116: view of your data, or to convert it into a different format. It is also worth
        !           117: looking at XML Schemas and defining one that provides more complete
        !           118: validation and datatyping of your data structures; this helps
        !           119: avoid some mistakes in the design phase.</p>
        !           120: 
        !           121: <h3>Namespace:</h3>
        !           122: 
        !           123: <p>If you expect your XML vocabulary to be used or recognized outside of your
        !           124: application (for example binding a specific processing from a graphic shell
        !           125: like Nautilus to an instance of your data) then you should really define an <a
        !           126: href="http://www.w3.org/TR/REC-xml-names/">XML namespace</a> for your
        !           127: vocabulary. A namespace name is a URL (an absolute URI, more precisely). It is
        !           128: generally recommended to anchor it as an HTTP resource on a server associated
        !           129: with the software project. See the next section about this. In practice this
        !           130: means that XML parsers will not handle your element names as-is but as a
        !           131: pair made of the namespace name and the element name. This allows them to
        !           132: recognize your elements and disambiguate processing. Uniqueness of the namespace name is,
        !           133: for the most part, guaranteed by the use of the DNS registry. Namespaces can
        !           134: also be used to carry versioning information, like:</p>
        !           135: 
        !           136: <p><code>"http://www.gnome.org/project/projectname/1.0/"</code></p>
        !           137: 
        !           138: <p>An easy way to use a namespace is to make it the default namespace on the
        !           139: root element of the XML instance, like:</p>
        !           140: <pre>&lt;structure xmlns="http://www.gnome.org/project/projectname/1.0/"&gt;
        !           141:   &lt;data&gt;
        !           142:   ...
        !           143:   &lt;/data&gt;
        !           144: &lt;/structure&gt;</pre>
        !           145: 
        !           146: <p>In that document, structure and all descendant elements like data are in
        !           147: the given namespace.</p>
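<p>To illustrate what handling names as a (namespace name, element name) pair
means for application code, here is a small sketch using Python's standard
<code>xml.etree.ElementTree</code>, which writes the pair as
<code>{namespace}local</code>. The element text <code>payload</code> and the
prefix <code>p</code> are arbitrary choices for this example.</p>

```python
import xml.etree.ElementTree as ET

NS = "http://www.gnome.org/project/projectname/1.0/"

DOC = f"""<structure xmlns="{NS}">
  <data>payload</data>
</structure>"""

root = ET.fromstring(DOC)

# The parser reports the pair, not the bare name written in the file.
print(root.tag)  # {http://www.gnome.org/project/projectname/1.0/}structure

# Looking up the bare local name finds nothing: <data> is in the namespace.
print(root.find("data"))  # None

# The qualified name matches.
print(root.find(f"{{{NS}}}data").text)  # payload

# A prefix map keeps queries readable; the prefix is local to this code
# and does not need to appear in the document.
print(root.find("p:data", {"p": NS}).text)  # payload
```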
        !           148: 
        !           149: <h2><a name="Canonical">Canonical URL</a></h2>
        !           150: 
        !           151: <p>As seen in the previous namespace section, while XML processing is not
        !           152: tied to the Web there is a natural synergy between the two. XML was designed to
        !           153: be available on the Web, and keeping the infrastructure that way helps
        !           154: deploying the XML resources. The core of this issue is the notion of the
        !           155: "Canonical URL" of an XML resource. The resource can be an XML document, a
        !           156: DTD, a stylesheet, a schema, or even non-XML data associated with an XML
        !           157: resource. The canonical URL is the URL where the "master" copy of that
        !           158: resource is expected to be present on the Web. Usually when processing XML a
        !           159: copy of the resource will be present on the local disk, maybe in
        !           160: /usr/share/xml or /usr/share/sgml, maybe in /opt or even in C:\projectname\
        !           161: (horror!). The key point is that the way to name that resource should be
        !           162: independent of the actual place where it resides on disk, if it is available,
        !           163: and that the processing should still work if there is no local copy
        !           164: (provided the machine where the processing happens is connected to the Internet).</p>
        !           165: 
        !           166: <p>What this really means is that one should never use the local name of a
        !           167: resource to reference it but always use the canonical URL. For example in a
        !           168: DocBook instance the following should not be used:</p>
        !           169: <pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
        !           172:                          "/usr/share/xml/docbook/4.2/docbookx.dtd"&gt;</pre>
        !           173: 
        !           174: <p>But always reference the canonical URL for the DTD:</p>
        !           175: <pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
        !           178:                          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"&gt;</pre>
        !           179: 
        !           180: <p>Similarly, the document instance may reference the <a
        !           181: href="http://www.w3.org/TR/xslt">XSLT</a> stylesheets needed to process it to
        !           182: generate HTML, and the canonical URL should be used:</p>
        !           183: <pre>&lt;?xml-stylesheet
        !           184:   href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"
        !           185:   type="text/xsl"?&gt;</pre>
        !           186: 
        !           187: <p>Defining the canonical URL for the resources needed should obey a few
        !           188: simple rules similar to those used to design namespace names:</p>
        !           189: <ul>
        !           190:   <li>use a DNS name you know is associated with the project and will be
        !           191:     available in the long term</li>
        !           192:   <li>within that server space, reserve the subtree where you
        !           193:     intend to keep the data</li>
        !           194:   <li>version the URL so that multiple concurrent versions of the resources
        !           195:     can be hosted simultaneously</li>
        !           196: </ul>
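<p>A project could check its own instances for local paths used in place of
the canonical URL. The following Python sketch is one possible way to do so
and is not part of any existing tool: the function names are hypothetical, and
the regular expression is a simplification that only looks at the external
identifier of the DOCTYPE declaration.</p>

```python
import re
from urllib.parse import urlparse

# Finds the system identifier of a DOCTYPE with a PUBLIC or SYSTEM external
# ID. Simplification: internal subsets and comments are not handled.
DOCTYPE_RE = re.compile(
    r"""<!DOCTYPE\s+\S+\s+
        (?:PUBLIC\s+(?:"[^"]*"|'[^']*')\s+|SYSTEM\s+)
        (?:"(?P<dq>[^"]*)"|'(?P<sq>[^']*)')
    """,
    re.VERBOSE,
)


def system_identifier(xml_text):
    """Return the DOCTYPE system identifier, or None if there is none."""
    match = DOCTYPE_RE.search(xml_text)
    if match is None:
        return None
    if match.group("dq") is not None:
        return match.group("dq")
    return match.group("sq")


def is_canonical(system_id):
    """True when the identifier is an absolute http(s) URL."""
    parsed = urlparse(system_id)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


LOCAL = '''<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "/usr/share/xml/docbook/4.2/docbookx.dtd">
<article/>'''

CANONICAL = '''<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<article/>'''

for doc in (LOCAL, CANONICAL):
    sysid = system_identifier(doc)
    print(sysid, "->", "ok" if is_canonical(sysid) else "not canonical")
```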
        !           197: 
        !           198: <h2><a name="Catalog">Catalog setup</a></h2>
        !           199: 
        !           200: <h3>How catalogs work:</h3>
        !           201: 
        !           202: <p>Catalogs are the technical mechanism which allows XML processing
        !           203: tools to use a local copy of a resource if it is available, even if the
        !           204: instance document references the canonical URL. <a
        !           205: href="http://www.oasis-open.org/committees/entity/">XML Catalogs</a> are
        !           206: anchored in the root catalog (usually <code>/etc/xml/catalog</code> or
        !           207: defined by the user). They are a tree of XML documents defining the mappings
        !           208: between the canonical naming space and the locally installed copies; this can be
        !           209: seen as a static cache structure.</p>
        !           210: 
        !           211: <p>When the XML processor is asked to process a resource it will
        !           212: automatically test for a locally available version in the catalog, starting
        !           213: from the root catalog, and possibly fetching sub-catalog resources until it
        !           214: determines whether the catalog has that resource. If not, the default
        !           215: processing of fetching the resource from the Web is done, which in most
        !           216: cases makes it possible to recover from a catalog miss. The key point is that the document
        !           217: instances are totally independent of the availability of a catalog and of
        !           218: the actual place where the local resource they reference may be installed.
        !           219: This greatly improves the management of the documents in the long run, making
        !           220: them independent of the platform or toolchain used to process them. The
        !           221: figure below tries to express that mechanism:<img src="catalog.gif"
        !           222: alt="Picture describing the catalog"></p>
        !           223: 
        !           224: <h3>Usual catalog setup:</h3>
        !           225: 
        !           226: <p>Usually catalogs for a project are set up as a two-level hierarchical cache,
        !           227: the root catalog containing only "delegates" indicating a separate subcatalog
        !           228: dedicated to the project. The goal is to keep the root catalog clean and
        !           229: simplify the maintenance of the catalog by using separate catalogs per
        !           230: project. For example when creating a catalog for the <a
        !           231: href="http://www.w3.org/TR/xhtml1">XHTML1</a> DTDs, only 3 items are added to
        !           232: the root catalog:</p>
        !           233: <pre>  &lt;delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
        !           234:                   catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
        !           235:   &lt;delegateSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
        !           236:                   catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
        !           237:   &lt;delegateURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
        !           238:                   catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;</pre>
        !           239: 
        !           240: <p>They are all "delegates", meaning that if the catalog system is asked to
        !           241: resolve a reference corresponding to them, it has to look up a subcatalog.
        !           242: Here the subcatalog was installed as
        !           243: <code>/usr/share/sgml/xhtml1/xmlcatalog</code> in the local tree. That
        !           244: decision is left to the sysadmin or the packager for that system and may
        !           245: obey different rules, but the actual place on the filesystem (or on a
        !           246: resource cache on the local network) will not influence the processing as
        !           247: long as it is available. The first rule indicates that if the reference uses a
        !           248: PUBLIC identifier beginning with the</p>
        !           249: 
        !           250: <p><code>"-//W3C//DTD XHTML 1.0"</code></p>
        !           251: 
        !           252: <p>substring, then the catalog lookup should be limited to the given
        !           253: subcatalog. Similarly the second and third entries give the
        !           254: delegation rules for SYSTEM, DOCTYPE or normal URI references when the URL
        !           255: starts with the <code>"http://www.w3.org/TR/xhtml1/DTD"</code> substring,
        !           256: which indicates the location on the W3C server where the XHTML1 resources are
        !           257: stored. This is the beginning of all Canonical URLs for XHTML1 resources.
        !           258: Those three rules are sufficient in practice to capture all references to XHTML1
        !           259: resources and direct the processing tools to the right subcatalog.</p>
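<p>The delegation step can be pictured with the short Python sketch below.
It is not how libxml2 implements catalogs, only an illustration of prefix
matching on the three root catalog entries shown above. The function name
<code>delegated_catalogs</code> is hypothetical. Matching entries are ordered
longest prefix first, following the OASIS XML Catalogs specification.</p>

```python
import xml.etree.ElementTree as ET

CAT_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

ROOT_CATALOG = f"""<catalog xmlns="{CAT_NS}">
  <delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/>
  <delegateSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/>
  <delegateURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/>
</catalog>"""

# Attribute holding the prefix for each kind of delegate entry.
PREFIX_ATTR = {
    "delegatePublic": "publicIdStartString",
    "delegateSystem": "systemIdStartString",
    "delegateURI": "uriStartString",
}


def delegated_catalogs(root, kind, identifier):
    """List the subcatalogs to consult for `identifier`, longest prefix first."""
    attr = PREFIX_ATTR[kind]
    matches = []
    for entry in root.iter(f"{{{CAT_NS}}}{kind}"):
        prefix = entry.get(attr)
        if identifier.startswith(prefix):
            matches.append((len(prefix), entry.get("catalog")))
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [catalog for _, catalog in matches]


root_catalog = ET.fromstring(ROOT_CATALOG)

# An XHTML1 public identifier is delegated to the XHTML1 subcatalog.
print(delegated_catalogs(root_catalog, "delegatePublic",
                         "-//W3C//DTD XHTML 1.0 Strict//EN"))

# A DocBook identifier matches no delegate: the list is empty.
print(delegated_catalogs(root_catalog, "delegatePublic",
                         "-//OASIS//DTD DocBook XML V4.2//EN"))
```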
        !           260: 
        !           261: <h3>A subcatalog example:</h3>
        !           262: 
        !           263: <p>Here is the complete subcatalog used for XHTML1:</p>
        !           264: <pre>&lt;?xml version="1.0"?&gt;
        !           265: &lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
        !           266:           "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
        !           267: &lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
        !           268:   &lt;public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
        !           269:           uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/&gt;
        !           270:   &lt;public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
        !           271:           uri="xhtml1-20020801/DTD/xhtml1-transitional.dtd"/&gt;
        !           272:   &lt;public publicId="-//W3C//DTD XHTML 1.0 Frameset//EN"
        !           273:           uri="xhtml1-20020801/DTD/xhtml1-frameset.dtd"/&gt;
        !           274:   &lt;rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
        !           275:           rewritePrefix="xhtml1-20020801/DTD"/&gt;
        !           276:   &lt;rewriteURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
        !           277:           rewritePrefix="xhtml1-20020801/DTD"/&gt;
        !           278: &lt;/catalog&gt;</pre>
        !           279: 
        !           280: <p>There are a few things to notice:</p>
        !           281: <ul>
        !           282:   <li>this is an XML resource: it points to the DTD using Canonical URLs, and the
        !           283:     root element defines a namespace (but based on a URN, not an HTTP
        !           284:   URL).</li>
        !           285:   <li>it contains 5 rules: the first 3 are direct mappings for the 3
        !           286:     PUBLIC identifiers defined by the XHTML1 specification, associating
        !           287:     them with the local resource containing the DTD; the last 2 are
        !           288:     rewrite rules that build the local filename for any URL based on
        !           289:     "http://www.w3.org/TR/xhtml1/DTD". The local cache simplifies the rules by
        !           290:     keeping the same structure as the on-line server at the Canonical URL</li>
        !           291:   <li>the local resources are designated using URI references (the uri or
        !           292:     rewritePrefix attributes), the base being the containing sub-catalog URL,
        !           293:     which means that in practice the copy of the XHTML1 strict DTD is stored
        !           294:     locally in
        !           295:     <code>/usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd</code></li>
        !           296: </ul>
        !           297: 
        !           298: <p>Those 5 rules are sufficient to cover all references to the resources held
        !           299: at the Canonical URL for the XHTML1 DTDs.</p>
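<p>How the <code>public</code> and <code>rewriteSystem</code> rules lead to a
local resource can be sketched in Python as below. This is a simplified
illustration, not the libxml2 resolver: the <code>resolve</code> function is
hypothetical, only two of the five entries are reproduced, and features of
the specification such as <code>prefer</code> or <code>xml:base</code> are
ignored. Relative references are resolved against the URL of the subcatalog
file with standard URI resolution (<code>urljoin</code>).</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

CAT_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# URL of the subcatalog file itself; it is the base for relative references.
SUBCATALOG_URL = "file:///usr/share/sgml/xhtml1/xmlcatalog"

SUBCATALOG = f"""<catalog xmlns="{CAT_NS}">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/>
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/>
</catalog>"""


def resolve(catalog, base_url, public_id=None, system_id=None):
    """Return the local URL for a reference, or None on a catalog miss."""
    def qname(local):
        return f"{{{CAT_NS}}}{local}"

    if system_id is not None:
        for entry in catalog.iter(qname("rewriteSystem")):
            start = entry.get("systemIdStartString")
            if system_id.startswith(start):
                # Replace the matched prefix by the (resolved) rewrite prefix.
                prefix = urljoin(base_url, entry.get("rewritePrefix"))
                return prefix + system_id[len(start):]
    if public_id is not None:
        for entry in catalog.iter(qname("public")):
            if entry.get("publicId") == public_id:
                return urljoin(base_url, entry.get("uri"))
    return None


subcatalog = ET.fromstring(SUBCATALOG)

print(resolve(subcatalog, SUBCATALOG_URL,
              public_id="-//W3C//DTD XHTML 1.0 Strict//EN"))
print(resolve(subcatalog, SUBCATALOG_URL,
              system_id="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))
# A reference outside the XHTML1 space is a miss and yields None.
print(resolve(subcatalog, SUBCATALOG_URL,
              system_id="http://example.org/other.dtd"))
```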
        !           300: 
        !           301: <h2><a name="Package">Package integration</a></h2>
        !           302: 
        !           303: <p>Creating and removing catalogs should be handled as part of the process of
        !           304: (un)installing the local copy of the resources. The catalog files, being XML
        !           305: resources, should be processed with XML-based tools to avoid problems with the
        !           306: generated files. The xmlcatalog command coming with libxml2 allows you to create
        !           307: catalogs, and add or remove rules at that time. Here is a complete example
        !           308: coming from the post-install script of the RPM for the XHTML1 DTDs. While this example
        !           309: is platform and packaging specific, it can be useful as an example in
        !           310: other contexts:</p>
        !           311: <pre>%post
        !           312: CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
        !           313: #
        !           314: # Register it in the super catalog with the appropriate delegates
        !           315: #
        !           316: ROOTCATALOG=/etc/xml/catalog
        !           317: 
        !           318: if [ ! -r $ROOTCATALOG ]
        !           319: then
        !           320:     /usr/bin/xmlcatalog --noout --create $ROOTCATALOG
        !           321: fi
        !           322: 
        !           323: if [ -w $ROOTCATALOG ]
        !           324: then
        !           325:         /usr/bin/xmlcatalog --noout --add "delegatePublic" \
        !           326:                 "-//W3C//DTD XHTML 1.0" \
        !           327:                 "file://$CATALOG" $ROOTCATALOG
        !           328:         /usr/bin/xmlcatalog --noout --add "delegateSystem" \
        !           329:                 "http://www.w3.org/TR/xhtml1/DTD" \
        !           330:                 "file://$CATALOG" $ROOTCATALOG
        !           331:         /usr/bin/xmlcatalog --noout --add "delegateURI" \
        !           332:                 "http://www.w3.org/TR/xhtml1/DTD" \
        !           333:                 "file://$CATALOG" $ROOTCATALOG
        !           334: fi</pre>
        !           335: 
        !           336: <p>The XHTML1 subcatalog is not created on-the-fly in that case; it is
        !           337: installed as part of the files of the package. So the only work needed is to
        !           338: make sure the root catalog exists and register the delegate rules.</p>
        !           339: 
        !           340: <p>Similarly, the script for the post-uninstall just removes the rules from the
        !           341: catalog:</p>
        !           342: <pre>%postun
        !           343: #
        !           344: # On removal, unregister the xmlcatalog from the supercatalog
        !           345: #
        !           346: if [ "$1" = 0 ]; then
        !           347:     CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
        !           348:     ROOTCATALOG=/etc/xml/catalog
        !           349: 
        !           350:     if [ -w $ROOTCATALOG ]
        !           351:     then
        !           352:             /usr/bin/xmlcatalog --noout --del \
        !           353:                     "-//W3C//DTD XHTML 1.0" $ROOTCATALOG
        !           354:             /usr/bin/xmlcatalog --noout --del \
        !           355:                     "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
        !           356:             /usr/bin/xmlcatalog --noout --del \
        !           357:                     "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
        !           358:     fi
        !           359: fi</pre>
        !           360: 
        !           361: <p>Note the test against $1: it is needed so that the delegate rules are not
        !           362: removed when the package is upgraded.</p>
        !           363: 
        !           364: <p>Following the set of guidelines and tips provided in this document should
        !           365: help deploy the XML resources in the GNOME framework without much pain and
        !           366: ensure a smooth evolution of the resource and instances.</p>
        !           367: 
        !           368: <p><a href="mailto:veillard@redhat.com">Daniel Veillard</a></p>
        !           369: 
        !           370: <p>$Id$</p>
        !           371: 
        !           372: <p></p>
        !           373: </body>
        !           374: </html>
