<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html">
<style type="text/css">
<!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
-->
</style>
<title>XML resources publication guidelines</title>
</head>

<body bgcolor="#fffacd" text="#000000">
<h1 align="center">XML resources publication guidelines</h1>

<p></p>

<p>The goal of this document is to provide a set of guidelines and tips
to help with the publication and deployment of <a
href="http://www.w3.org/XML/">XML</a> resources for the <a
href="http://www.gnome.org/">GNOME project</a>. However, they are not tied to
GNOME and might be helpful more generally. I welcome <a
href="mailto:veillard@redhat.com">feedback</a> on this document.</p>

<p>The intended audience is software developers who have started using XML
for some of the resources of their project, whether as a storage format or
for data exchange, checking or transformations. An increasing number of new
XML formats have been defined, but, possibly for lack of documentation, not
all the steps needed to gain the full benefits of XML have been taken. These
guidelines aim to improve matters and to give a better overview of XML
processing and of the steps needed to deploy it successfully.</p>

<p>Table of contents:</p>
<ol>
  <li><a href="#Design">Design guidelines</a></li>
  <li><a href="#Canonical">Canonical URL</a></li>
  <li><a href="#Catalog">Catalog setup</a></li>
  <li><a href="#Package">Package integration</a></li>
</ol>

<h2><a name="Design">Design guidelines</a></h2>

<p>This part focuses on the design of the XML format itself. It may come a
bit late, since the structure of the documents may already be cast in
existing and deployed code. Still, here are a few rules that might help when
designing a new XML vocabulary or revising an existing format:</p>

<h3>Reuse existing formats:</h3>

<p>This may sound a bit simplistic, but before designing your own format,
look for existing XML vocabularies covering similar data. Ideally this allows
you to reuse them, in which case many of the existing tools, like DTDs,
schemas and stylesheets, may already be available. If you are looking for a
documentation format, <a href="http://www.docbook.org/">DocBook</a> should
handle your needs. If reuse is not possible because the semantics or use
cases differ too much, studying the existing vocabularies still helps avoid
design errors such as targeting the vocabulary at the wrong abstraction level.
During the design phase, be concise: express the real content of your data,
and use the XML structure to express the semantics and context of that
data.</p>

<h3>DTD rules:</h3>

<p>Building a DTD (Document Type Definition) or a Schema describing the
structure allowed for instances is the core of the vocabulary design
process. Here are a few tips:</p>
<ul>
  <li>use meaningful words for element and attribute names.</li>
  <li>do not use attributes for general textual content: attribute values
    are normalized by the parser before reaching the application, so
    whitespace and line breaks will be altered.</li>
  <li>use a separate element for every string that might be subject to
    localization. The canonical way to localize XML content is to use
    sibling elements carrying different xml:lang attributes, as in the
    following:
<pre>&lt;welcome&gt;
  &lt;msg xml:lang="en"&gt;hello&lt;/msg&gt;
  &lt;msg xml:lang="fr"&gt;bonjour&lt;/msg&gt;
&lt;/welcome&gt;</pre>
  </li>
  <li>use attributes to refine the content of an element, but avoid them for
    more complex tasks: parsing an attribute is not cheaper than parsing an
    element, and it is far easier to make element content more complex later,
    while an attribute value has to remain very simple.</li>
</ul>
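<p>As a rough illustration of the xml:lang convention, here is a minimal
sketch in Python using the standard-library ElementTree parser rather than
libxml2. The helper name <code>localized_text</code> and the fallback to
English are assumptions made for the example, not part of any library API.
The last lines show the attribute normalization mentioned above: the same
text keeps its newline and tab as element content, but not as an attribute
value.</p>

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<welcome>
  <msg xml:lang="en">hello</msg>
  <msg xml:lang="fr">bonjour</msg>
</welcome>"""


def localized_text(parent, lang, default="en"):
    """Hypothetical helper: return the text of the child whose xml:lang
    matches lang, else the default language, else the first child."""
    children = list(parent)
    for wanted in (lang, default):
        for child in children:
            if child.get(XML_LANG) == wanted:
                return child.text
    return children[0].text if children else None


root = ET.fromstring(DOC)
print(localized_text(root, "fr"))  # bonjour
print(localized_text(root, "de"))  # hello (falls back to "en")

# The parser normalizes attribute values: the newline and the tab each
# become a plain space, while the same characters survive in element content.
note = ET.fromstring('<note title="line one\n\tline two">line one\n\tline two</note>')
print(repr(note.get("title")))  # 'line one  line two'
print(repr(note.text))          # 'line one\n\tline two'
```

<p>Keeping each translation in its own element also means a translation tool
can add or replace one language without touching the others.</p>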

<h3>Versioning:</h3>

<p>As part of the design, make sure the structure you define will be usable
for future extensions that you may not be considering for the current
version. There are two parts to this:</p>
<ul>
  <li>Make sure the instance contains a version number, which makes backward
    compatibility easy. Something as simple as a
    <code>version="1.0"</code> attribute on the root element of the instance
    is sufficient.</li>
  <li>While designing the code that analyzes the data provided by the XML
    parser, make sure it can work with unknown versions: generate a UI
    warning and process only the tags recognized by your version, keeping in
    mind that you should not break on unknown elements if the version
    attribute was not in the recognized set.</li>
</ul>
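<p>A minimal Python sketch of the second point, assuming a hypothetical
format whose known versions and element names are listed in
<code>KNOWN_VERSIONS</code> and <code>KNOWN_ELEMENTS</code>; a Python
warning stands in for the UI warning. Treating an unknown element inside a
recognized version as an error is a design choice of this sketch, not a
requirement of the guidelines.</p>

```python
import warnings
import xml.etree.ElementTree as ET

# Hypothetical values: the versions and element names this code understands.
KNOWN_VERSIONS = {"1.0", "1.1"}
KNOWN_ELEMENTS = {"data", "name"}


def load(text):
    """Return (tag, text) pairs for the elements this code understands."""
    root = ET.fromstring(text)
    version = root.get("version")
    recognized = version in KNOWN_VERSIONS
    if not recognized:
        # Stand-in for a UI warning in a real application.
        warnings.warn(f"unknown format version {version!r}; reading known elements only")
    records = []
    for child in root:
        if child.tag in KNOWN_ELEMENTS:
            records.append((child.tag, child.text))
        elif recognized:
            # In a version we claim to know, an unknown element is a real error.
            raise ValueError(f"unexpected element {child.tag!r} in version {version}")
        # Otherwise this is a newer format: skip the element instead of failing.
    return records


print(load('<structure version="1.0"><name>a</name></structure>'))
# [('name', 'a')]
print(load('<structure version="2.0"><name>b</name><extra/></structure>'))
# emits a UserWarning, then prints [('name', 'b')]
```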

<h3>Other design parts:</h3>

<p>While defining your vocabulary, think in terms of other uses of your data,
for example how XSLT stylesheets could be used to make an HTML view of your
data, or to convert it into a different format. Checking XML Schemas and
considering an XML Schema with more complete validation and datatyping of
your data structures is important; this helps avoid some mistakes in the
design phase.</p>

<h3>Namespace:</h3>

<p>If you expect your XML vocabulary to be used or recognized outside of your
application (for example, to bind specific processing in a graphical shell
like Nautilus to instances of your data), then you should really define an <a
href="http://www.w3.org/TR/REC-xml-names/">XML namespace</a> for your
vocabulary. A namespace name is a URL (more precisely, an absolute URI). It is
generally recommended to anchor it as an HTTP resource on a server associated
with the software project; see the next section about this. In practice, this
means that XML parsers will not handle your element names as-is but as a
pair made of the namespace name and the element name. This allows processing
to be recognized and disambiguated. Uniqueness of the namespace name can,
for the most part, be guaranteed by the DNS registry. Namespaces can also be
used to carry versioning information, like:</p>

<p><code>"http://www.gnome.org/project/projectname/1.0/"</code></p>

<p>An easy way to use a namespace is to make it the default namespace on the
root element of the XML instance, like:</p>
<pre>&lt;structure xmlns="http://www.gnome.org/project/projectname/1.0/"&gt;
  &lt;data&gt;
  ...
  &lt;/data&gt;
&lt;/structure&gt;</pre>

<p>In that document, <code>structure</code> and all descendant elements like
<code>data</code> are in the given namespace.</p>
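<p>To see the "namespace name plus element name" pair concretely, here is a
small Python sketch using the standard-library ElementTree parser (not
libxml2) on the example above; the <code>data</code> content is a
placeholder. ElementTree spells each qualified name as
<code>{namespace}local-name</code>, and the prefix used in a lookup is only
a local alias for the namespace name.</p>

```python
import xml.etree.ElementTree as ET

NS = "http://www.gnome.org/project/projectname/1.0/"

DOC = f"""<structure xmlns="{NS}">
  <data>example</data>
</structure>"""

root = ET.fromstring(DOC)

# The parser reports each name as a {namespace}local-name pair ...
print(root.tag)  # {http://www.gnome.org/project/projectname/1.0/}structure

# ... and the default namespace is inherited by descendants such as data.
data = root.find(f"{{{NS}}}data")
print(data.text)  # example

# A bare "data" lookup does not match, because the real name is qualified.
print(root.find("data"))  # None

# Any prefix works in a query, as long as it maps to the same namespace name.
print(root.find("p:data", {"p": NS}).text)  # example
```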

<h2><a name="Canonical">Canonical URL</a></h2>

<p>As seen in the previous namespace section, while XML processing is not
tied to the Web, there is a natural synergy between the two. XML was designed
to be available on the Web, and keeping the infrastructure that way helps
deploy XML resources. The core of this issue is the notion of the "Canonical
URL" of an XML resource. The resource can be an XML document, a DTD, a
stylesheet, a schema, or even non-XML data associated with an XML resource;
the canonical URL is the URL where the "master" copy of that resource is
expected to be present on the Web. Usually, when processing XML, a copy of
the resource will be present on the local disk, maybe in /usr/share/xml or
/usr/share/sgml, maybe in /opt, or even in C:\projectname\ (horror!). The key
point is that the way a resource is named should be independent of where it
actually resides on disk, if it is available locally at all, and that
processing should still work if there is no local copy (provided the machine
doing the processing is connected to the Internet).</p>

<p>What this really means is that one should never use the local name of a
resource to reference it, but always its canonical URL. For example, in a
DocBook instance the following should not be used:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
                 "/usr/share/xml/docbook/4.2/docbookx.dtd"&gt;</pre>

<p>Instead, always reference the canonical URL of the DTD:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
                 "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"&gt;</pre>

<p>Similarly, the document instance may reference the <a
href="http://www.w3.org/TR/xslt">XSLT</a> stylesheets needed to process it
into HTML, and the canonical URL should be used there too:</p>
<pre>&lt;?xml-stylesheet
  href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"
  type="text/xsl"?&gt;</pre>

<p>When defining the canonical URLs of the resources you need, follow a few
simple rules similar to those used to design namespace names:</p>
<ul>
  <li>use a DNS name you know is associated with the project and will be
    available in the long term</li>
  <li>within that server space, reserve the subtree where you intend to keep
    those data</li>
  <li>version the URL so that multiple versions of the resources can be
    hosted concurrently</li>
</ul>
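<p>These rules lend themselves to a quick automated check of the identifiers
a project ships. The following Python sketch is a hypothetical lint, not an
existing tool: <code>canonical_url_problems</code> flags system identifiers
that are not absolute http(s) URLs, and uses a crude heuristic (a digit
somewhere in the path) as a stand-in for "the URL is versioned".</p>

```python
from urllib.parse import urlparse


def canonical_url_problems(system_id):
    """Hypothetical lint: return reasons why system_id does not look like
    a canonical URL (an empty list means it looks fine)."""
    problems = []
    parsed = urlparse(system_id)
    if parsed.scheme not in ("http", "https"):
        problems.append("not an http(s) URL; looks like a local name")
    elif not parsed.netloc:
        problems.append("missing DNS host name")
    # Crude heuristic for "version the URL": some part of the path
    # should contain a digit, as in .../docbook/xml/4.2/...
    if not any(ch.isdigit() for ch in parsed.path):
        problems.append("no version number visible in the path")
    return problems


print(canonical_url_problems("/usr/share/xml/docbook/4.2/docbookx.dtd"))
# ['not an http(s) URL; looks like a local name']
print(canonical_url_problems("http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"))
# []
```

<p>The version heuristic will accept some unversioned URLs and reject some
valid ones; it is only meant to catch the obvious mistakes before release.</p>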

<h2><a name="Catalog">Catalog setup</a></h2>

<h3>How catalogs work:</h3>

<p>Catalogs are the technical mechanism that allows XML processing tools to
use a local copy of a resource, if one is available, even when the instance
document references the canonical URL. <a
href="http://www.oasis-open.org/committees/entity/">XML Catalogs</a> are
anchored in the root catalog (usually <code>/etc/xml/catalog</code>, or
defined by the user). They form a tree of XML documents defining the mappings
between the canonical naming space and the locally installed resources; this
can be seen as a static cache structure.</p>

<p>When the XML processor is asked to process a resource, it automatically
tests for a locally available version in the catalog, starting from the root
catalog and possibly fetching sub-catalog resources, until it finds out
whether the catalog has that resource. If it does not, the default
processing of fetching the resource from the Web is done, which in most cases
allows recovery from a catalog miss. The key point is that the document
instances are totally independent of the availability of a catalog and of
the actual place where the local resources they reference may be installed.
This greatly improves the management of the documents in the long run, making
them independent of the platform or toolchain used to process them. The
figure below tries to express that mechanism:<img src="catalog.gif"
alt="Picture describing the catalog"></p>
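<p>The lookup-then-fallback behaviour can be sketched in a few lines of
Python. This is a deliberately simplified, hypothetical model: the catalog
tree is an in-memory dictionary, entries are exact matches searched depth
first, and the local DocBook path is made up for the example. Real XML
Catalogs have several entry types and precise ordering rules.</p>

```python
# Hypothetical in-memory model of a catalog tree: each catalog holds direct
# mappings (canonical identifier -> local copy) and further catalogs to try.
SUB = {
    "map": {
        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd":
            "file:///usr/share/xml/docbook/4.2/docbookx.dtd",
    },
    "next": [],
}
ROOT = {"map": {}, "next": [SUB]}


def lookup(catalog, identifier):
    """Depth-first search from the given catalog; None on a catalog miss."""
    if identifier in catalog["map"]:
        return catalog["map"][identifier]
    for sub in catalog["next"]:
        found = lookup(sub, identifier)
        if found is not None:
            return found
    return None


def resolve(identifier, root=ROOT):
    """Prefer the local copy; on a miss, fall back to the canonical URL."""
    local = lookup(root, identifier)
    return local if local is not None else identifier


print(resolve("http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"))
# file:///usr/share/xml/docbook/4.2/docbookx.dtd
print(resolve("http://example.org/other/1.0/other.dtd"))
# http://example.org/other/1.0/other.dtd (fetched from the Web)
```

<p>Because <code>resolve</code> only ever returns either a local copy or the
canonical URL itself, the document never needs to know which of the two was
used.</p>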

<h3>Usual catalog setup:</h3>

<p>Usually, the catalogs for a project are set up as a two-level hierarchical
cache, the root catalog containing only "delegates" that point to a separate
subcatalog dedicated to the project. The goal is to keep the root catalog
clean and to simplify maintenance by using a separate catalog per project. For
example, when creating a catalog for the <a
href="http://www.w3.org/TR/xhtml1">XHTML1</a> DTDs, only 3 items are added to
the root catalog:</p>
<pre>  &lt;delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
  &lt;delegateSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
  &lt;delegateURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
               catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;</pre>

<p>They are all "delegates", meaning that if the catalog system is asked to
resolve a reference matching them, it has to look it up in a subcatalog.
Here the subcatalog was installed as
<code>/usr/share/sgml/xhtml1/xmlcatalog</code> in the local tree. That
decision is left to the sysadmin or the packager for that system and may
obey different rules, but the actual place on the filesystem (or in a
resource cache on the local network) will not influence the processing as
long as it is available. The first rule indicates that if the reference uses
a PUBLIC identifier beginning with the</p>

<p><code>"-//W3C//DTD XHTML 1.0"</code></p>

<p>substring, then the catalog lookup should be limited to the specific given
lookup catalog. Similarly, the second and third entries set up the same
delegation for SYSTEM identifiers (as used in DOCTYPE declarations) and for
normal URI references, respectively, when the URL starts with the
<code>"http://www.w3.org/TR/xhtml1/DTD"</code> substring, which indicates the
location on the W3C server where the XHTML1 resources are stored. That string
is the beginning of all Canonical URLs for XHTML1 resources. Those three
rules are sufficient in practice to capture all references to XHTML1
resources and direct the processing tools to the right subcatalog.</p>
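<p>The prefix matching performed by these delegate entries can be sketched
in Python as follows. The function name and the tuple representation are
assumptions made for illustration; ordering matching delegates by the length
of their start string, longest first, follows the OASIS XML Catalogs
specification, but the other details of that specification are left out.</p>

```python
# The three delegate entries from the root catalog above, as
# (entry kind, start string, delegated catalog) tuples.
SUBCATALOG = "file:///usr/share/sgml/xhtml1/xmlcatalog"
DELEGATES = [
    ("public", "-//W3C//DTD XHTML 1.0", SUBCATALOG),
    ("system", "http://www.w3.org/TR/xhtml1/DTD", SUBCATALOG),
    ("uri", "http://www.w3.org/TR/xhtml1/DTD", SUBCATALOG),
]


def delegate_catalogs(kind, identifier, delegates=DELEGATES):
    """Return the catalogs the lookup is restricted to, longest matching
    start string first; an empty list means no delegate applies."""
    matches = [(start, catalog) for k, start, catalog in delegates
               if k == kind and identifier.startswith(start)]
    matches.sort(key=lambda m: len(m[0]), reverse=True)
    return [catalog for _, catalog in matches]


print(delegate_catalogs("public", "-//W3C//DTD XHTML 1.0 Strict//EN"))
# ['file:///usr/share/sgml/xhtml1/xmlcatalog']
print(delegate_catalogs("system", "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))
# ['file:///usr/share/sgml/xhtml1/xmlcatalog']
print(delegate_catalogs("public", "-//OASIS//DTD DocBook XML V4.2//EN"))
# []
```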

<h3>A subcatalog example:</h3>

<p>Here is the complete subcatalog used for XHTML1:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="xhtml1-20020801/DTD/xhtml1-transitional.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Frameset//EN"
          uri="xhtml1-20020801/DTD/xhtml1-frameset.dtd"/&gt;
  &lt;rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                 rewritePrefix="xhtml1-20020801/DTD"/&gt;
  &lt;rewriteURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
              rewritePrefix="xhtml1-20020801/DTD"/&gt;
&lt;/catalog&gt;</pre>

<p>There are a few things to notice:</p>
<ul>
  <li>this is an XML resource: it points to its DTD using Canonical URLs, and
    the root element defines a namespace (but one based on a URN, not an HTTP
    URL).</li>
  <li>it contains 5 rules. The first 3 are direct mappings for the 3 PUBLIC
    identifiers defined by the XHTML1 specification, associating them with
    the local resource containing the DTD. The last 2 are rewrite rules that
    build the local filename for any URL based on
    "http://www.w3.org/TR/xhtml1/DTD"; the local cache simplifies the rules
    by keeping the same structure as the on-line server at the Canonical
    URL.</li>
  <li>the local resources are designated using URI references (the uri or
    rewritePrefix attributes), resolved against the URL of the containing
    sub-catalog file. Since a relative reference replaces the last segment of
    its base, the local copy of the XHTML1 strict DTD is in practice stored
    next to the catalog file, in
    <code>/usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd</code></li>
</ul>

<p>Those 5 rules are sufficient to cover all references to the resources held
at the Canonical URL for the XHTML1 DTDs.</p>
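<p>How the subcatalog turns identifiers into local files can be sketched in
Python with <code>urllib.parse.urljoin</code>, which applies the standard
relative-reference resolution. The function names are hypothetical, and only
the <code>public</code> and <code>rewriteSystem</code> entries are modelled;
picking the longest matching <code>systemIdStartString</code> follows the
OASIS XML Catalogs specification.</p>

```python
from urllib.parse import urljoin

# The sub-catalog's own URL is the base for its relative uri/rewritePrefix values.
BASE = "file:///usr/share/sgml/xhtml1/xmlcatalog"

PUBLIC = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "xhtml1-20020801/DTD/xhtml1-strict.dtd",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "xhtml1-20020801/DTD/xhtml1-transitional.dtd",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "xhtml1-20020801/DTD/xhtml1-frameset.dtd",
}
REWRITE_SYSTEM = [("http://www.w3.org/TR/xhtml1/DTD", "xhtml1-20020801/DTD")]


def resolve_public(public_id):
    """Direct mapping of a PUBLIC identifier, made absolute against BASE."""
    ref = PUBLIC.get(public_id)
    return urljoin(BASE, ref) if ref else None


def resolve_system(system_id):
    """rewriteSystem: the longest matching start string is replaced by
    rewritePrefix, then the result is made absolute against BASE."""
    for start, prefix in sorted(REWRITE_SYSTEM, key=lambda r: len(r[0]), reverse=True):
        if system_id.startswith(start):
            return urljoin(BASE, prefix + system_id[len(start):])
    return None


print(resolve_public("-//W3C//DTD XHTML 1.0 Strict//EN"))
# file:///usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd
print(resolve_system("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))
# file:///usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd
```

<p>Both routes lead to the same file, which is the point of mirroring the
on-line directory layout in the local cache.</p>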

<h2><a name="Package">Package integration</a></h2>

<p>Creating and removing catalogs should be handled as part of the process of
(un)installing the local copy of the resources. Since catalog files are XML
resources, they should be processed with XML-based tools to avoid problems
with the generated files; the xmlcatalog command that comes with libxml2
allows you to create catalogs and to add or remove rules at that time. Here
is a complete example taken from the post-install script of the RPM for the
XHTML1 DTDs. While this example is platform- and packaging-specific, it can be
useful as an example in other contexts:</p>
<pre>%post
CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
#
# Register it in the super catalog with the appropriate delegates
#
ROOTCATALOG=/etc/xml/catalog

if [ ! -r $ROOTCATALOG ]
then
    /usr/bin/xmlcatalog --noout --create $ROOTCATALOG
fi

if [ -w $ROOTCATALOG ]
then
    /usr/bin/xmlcatalog --noout --add "delegatePublic" \
        "-//W3C//DTD XHTML 1.0" \
        "file://$CATALOG" $ROOTCATALOG
    /usr/bin/xmlcatalog --noout --add "delegateSystem" \
        "http://www.w3.org/TR/xhtml1/DTD" \
        "file://$CATALOG" $ROOTCATALOG
    /usr/bin/xmlcatalog --noout --add "delegateURI" \
        "http://www.w3.org/TR/xhtml1/DTD" \
        "file://$CATALOG" $ROOTCATALOG
fi</pre>

<p>The XHTML1 subcatalog is not created on the fly in that case; it is
installed as part of the package's files. So the only work needed is to make
sure the root catalog exists and to register the delegate rules.</p>
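<p>Registering the delegates is a small, idempotent edit of the root catalog
tree. For readers who want to see that edit spelled out, here is a Python
sketch using the standard-library ElementTree module; it is only an
illustration of the tree operation and not a replacement for
<code>xmlcatalog</code>. The names <code>add_delegates</code> and
<code>XHTML1_DELEGATES</code> are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

CAT_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
ET.register_namespace("", CAT_NS)  # serialize without an ns0: prefix

# (element name, attribute holding the start string, start string)
XHTML1_DELEGATES = [
    ("delegatePublic", "publicIdStartString", "-//W3C//DTD XHTML 1.0"),
    ("delegateSystem", "systemIdStartString", "http://www.w3.org/TR/xhtml1/DTD"),
    ("delegateURI", "uriStartString", "http://www.w3.org/TR/xhtml1/DTD"),
]


def add_delegates(root, subcatalog_url, delegates=XHTML1_DELEGATES):
    """Add each delegate entry unless an identical one already exists,
    so that running the post-install step twice is harmless."""
    for name, attr, start in delegates:
        tag = f"{{{CAT_NS}}}{name}"
        exists = any(e.get(attr) == start and e.get("catalog") == subcatalog_url
                     for e in root.iter(tag))
        if not exists:
            ET.SubElement(root, tag, {attr: start, "catalog": subcatalog_url})
    return root


root = ET.Element(f"{{{CAT_NS}}}catalog")  # a freshly created root catalog
add_delegates(root, "file:///usr/share/sgml/xhtml1/xmlcatalog")
add_delegates(root, "file:///usr/share/sgml/xhtml1/xmlcatalog")  # no duplicates
print(len(root))  # 3
print(ET.tostring(root, encoding="unicode"))
```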

<p>Similarly, the post-uninstall script just removes the rules from the
catalog:</p>
<pre>%postun
#
# On removal, unregister the xmlcatalog from the supercatalog
#
if [ "$1" = 0 ]; then
    CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
    ROOTCATALOG=/etc/xml/catalog

    if [ -w $ROOTCATALOG ]
    then
        /usr/bin/xmlcatalog --noout --del \
            "-//W3C//DTD XHTML 1.0" $ROOTCATALOG
        /usr/bin/xmlcatalog --noout --del \
            "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
        /usr/bin/xmlcatalog --noout --del \
            "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
    fi
fi</pre>

<p>Note the test against $1: RPM sets it to the number of instances of the
package that remain installed, which is 0 only on a real removal, so the test
keeps the delegate rules in place when the package is upgraded.</p>
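<p>The decision made by the <code>%postun</code> script can be mirrored in a
short Python sketch. The <code>remaining_installs</code> argument plays the
role of RPM's <code>$1</code>; the function and constant names, and the
in-memory catalog built by <code>make_root</code>, are hypothetical and only
serve to show which entries go away and when.</p>

```python
import xml.etree.ElementTree as ET

CAT_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
XHTML1_STARTS = {"-//W3C//DTD XHTML 1.0", "http://www.w3.org/TR/xhtml1/DTD"}
START_ATTRS = ("publicIdStartString", "systemIdStartString", "uriStartString")


def postun(remaining_installs, root, starts=XHTML1_STARTS):
    """Remove the XHTML1 delegates only on a real removal ($1 == 0);
    return how many entries were removed."""
    if remaining_installs != 0:
        return 0  # upgrade: the new package version still needs the rules
    removed = 0
    for entry in list(root):  # copy, since we mutate root while looping
        if any(entry.get(a) in starts for a in START_ATTRS):
            root.remove(entry)
            removed += 1
    return removed


def make_root():
    """Hypothetical root catalog: the 3 XHTML1 delegates plus one unrelated entry."""
    root = ET.Element(f"{{{CAT_NS}}}catalog")
    sub = "file:///usr/share/sgml/xhtml1/xmlcatalog"
    ET.SubElement(root, f"{{{CAT_NS}}}delegatePublic",
                  publicIdStartString="-//W3C//DTD XHTML 1.0", catalog=sub)
    ET.SubElement(root, f"{{{CAT_NS}}}delegateSystem",
                  systemIdStartString="http://www.w3.org/TR/xhtml1/DTD", catalog=sub)
    ET.SubElement(root, f"{{{CAT_NS}}}delegateURI",
                  uriStartString="http://www.w3.org/TR/xhtml1/DTD", catalog=sub)
    ET.SubElement(root, f"{{{CAT_NS}}}delegatePublic",
                  publicIdStartString="-//OASIS//DTD DocBook XML",
                  catalog="file:///usr/share/xml/docbook/catalog")
    return root


print(postun(1, make_root()))  # 0 (upgrade keeps the rules)
print(postun(0, make_root()))  # 3 (erase removes them; DocBook entry stays)
```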

<p>Following the guidelines and tips in this document should help deploy XML
resources in the GNOME framework without much pain and ensure a smooth
evolution of the resources and instances.</p>

<p><a href="mailto:veillard@redhat.com">Daniel Veillard</a></p>

<p>$Id$</p>

<p></p>
</body>
</html>