Annotation of embedaddon/expat/doc/xmlwf.1, revision 1.1.1.1

1.1       misho       1: .\" This manpage has been automatically generated by docbook2man 
                      2: .\" from a DocBook document.  This tool can be found at:
                      3: .\" <http://shell.ipoline.com/~elmert/comp/docbook2X/> 
                      4: .\" Please send any bug reports, improvements, comments, patches, 
                      5: .\" etc. to Steve Cheng <steve@ggi-project.org>.
                      6: .TH "XMLWF" "1" "24 January 2003" "" ""
                      7: .SH NAME
                      8: xmlwf \- Determines if an XML document is well-formed
                      9: .SH SYNOPSIS
                     10: 
                     11: \fBxmlwf\fR [ \fB-s\fR]  [ \fB-n\fR]  [ \fB-p\fR]  [ \fB-x\fR]  [ \fB-e \fIencoding\fB\fR]  [ \fB-w\fR]  [ \fB-d \fIoutput-dir\fB\fR]  [ \fB-c\fR]  [ \fB-m\fR]  [ \fB-r\fR]  [ \fB-t\fR]  [ \fB-v\fR]  [ \fBfile ...\fR] 
                     12: 
                     13: .SH "DESCRIPTION"
                     14: .PP
                     15: \fBxmlwf\fR uses the Expat library to
                     16: determine if an XML document is well-formed.  It is
                     17: non-validating.
                     18: .PP
                     19: If you do not specify any files on the command line, and you
                     20: have a recent version of \fBxmlwf\fR, the
                     21: input is read from standard input.
                     22: .SH "WELL-FORMED DOCUMENTS"
                     23: .PP
                     24: A well-formed document must adhere to the
                     25: following rules:
                     26: .TP 0.2i
                     27: \(bu
                     28: The file should begin with an XML declaration, for instance
                     29: <?xml version="1.0" standalone="yes"?>.
                     30: \fBNOTE:\fR
                     31: XML 1.0 recommends the declaration but does not require it, and
                     32: \fBxmlwf\fR does not currently check for one.
                     33: .TP 0.2i
                     34: \(bu
                     35: Every element is either an empty-element tag (<tag/>)
                     36: or a start tag with a corresponding end tag.
                     37: .TP 0.2i
                     38: \(bu
                     39: There is exactly one root element.  This element must contain
                     40: all other elements in the document.  Only comments, white
                     41: space, and processing instructions may come after the close
                     42: of the root element.
                     43: .TP 0.2i
                     44: \(bu
                     45: All elements nest properly.
                     46: .TP 0.2i
                     47: \(bu
                     48: All attribute values are enclosed in quotes (either single
                     49: or double).
                     50: .PP
                     51: If the document has a DTD, and it strictly complies with that
                     52: DTD, then the document is also considered \fBvalid\fR.
                     53: \fBxmlwf\fR is a non-validating parser --
                     54: it does not check the DTD.  However, it does support
                     55: external entities (see the \fB-x\fR option).
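.PP
The rules above can be tried out without \fBxmlwf\fR itself.
The following is a minimal sketch in Python, whose standard
xml.parsers.expat module wraps the Expat library. The function
name check_well_formed is hypothetical, and the message format is
an assumption, not the exact text \fBxmlwf\fR prints.

```python
import xml.parsers.expat


def check_well_formed(data: bytes):
    """Return None if data is well-formed, else a one-line description.

    Like xmlwf, this is non-validating: any DTD is not checked.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        # The "line:column: message" layout is an illustrative choice.
        message = xml.parsers.expat.ErrorString(err.code)
        return f"{err.lineno}:{err.offset}: {message}"
    return None


if __name__ == "__main__":
    for sample in (b"<a><b/></a>", b"<a><b></a>", b"<a/><b/>", b"<a x=1/>"):
        print(sample, "->", check_well_formed(sample))
```

Only the first sample passes: the others break the nesting,
single-root, and quoted-attribute rules respectively.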
                     56: .SH "OPTIONS"
                     57: .PP
                     58: When an option includes an argument, you may specify the argument either
                     59: separately ("\fB-d\fR output") or concatenated with the
                     60: option ("\fB-d\fRoutput").  \fBxmlwf\fR
                     61: supports both.
                     62: .TP
                     63: \fB-c\fR
                     64: If the input file is well-formed and \fBxmlwf\fR
                     65: doesn't encounter any errors, the input file is simply copied to
                     66: the output directory unchanged.
                     67: This implies no namespaces (turns off \fB-n\fR) and
                     68: requires \fB-d\fR to specify an output directory.
                     69: .TP
                     70: \fB-d output-dir\fR
                     71: Specifies a directory to contain transformed
                     72: representations of the input files.
                     73: By default, \fB-d\fR outputs a canonical representation
                     74: (described below).
                     75: You can select different output formats using \fB-c\fR
                     76: and \fB-m\fR.
                     77: 
                     78: The output filenames will
                     79: be exactly the same as the input filenames or "STDIN" if the input is
                     80: coming from standard input.  Therefore, you must be careful that the
                     81: output file does not go into the same directory as the input
                     82: file.  Otherwise, \fBxmlwf\fR will delete the
                     83: input file before it generates the output file (just like running
                     84: cat < file > file in most shells).
                     85: 
                     86: Two structurally equivalent XML documents have a byte-for-byte
                     87: identical canonical XML representation.
                     88: Note that ignorable white space is considered significant and
                     89: is treated equivalently to data.
                     90: More on canonical XML can be found at
                     91: http://www.jclark.com/xml/canonxml.html .
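The overwrite hazard described above can be checked before
running \fBxmlwf\fR. This is a sketch only: the helper name
planned_output is hypothetical, and it assumes that the output
file keeps only the base name of the input file.

```python
import os


def planned_output(input_path: str, output_dir: str) -> str:
    """Return the path -d is assumed to write to, or raise if unsafe."""
    # Assumption: the output file is named after the input's base name.
    target = os.path.join(output_dir, os.path.basename(input_path))
    # Compare resolved paths so "." and symlinks are handled too.
    if os.path.realpath(target) == os.path.realpath(input_path):
        raise ValueError(f"{target} would overwrite the input file")
    return target
```

For example, planned_output("docs/a.xml", "out") gives the path
of a.xml inside out, while planned_output("a.xml", ".") raises
ValueError.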
                     92: .TP
                     93: \fB-e encoding\fR
                     94: Specifies the character encoding for the document, overriding
                     95: any document encoding declaration.  \fBxmlwf\fR
                     96: supports four built-in encodings:
                     97: US-ASCII,
                     98: UTF-8,
                     99: UTF-16, and
                    100: ISO-8859-1.
                    101: Also see the \fB-w\fR option.
                    102: .TP
                    103: \fB-m\fR
                    104: Outputs an XML file that describes the input file in
                    105: detail, including character positions.
                    106: Requires \fB-d\fR to specify an output directory.
                    107: .TP
                    108: \fB-n\fR
                    109: Turns on namespace processing.
                    110: \fB-c\fR disables namespaces.
                    111: .TP
                    112: \fB-p\fR
                    113: Tells \fBxmlwf\fR to process external DTDs and parameter
                    114: entities.
                    115: 
                    116: Normally \fBxmlwf\fR never parses parameter
                    117: entities.  \fB-p\fR tells it to always parse them.
                    118: \fB-p\fR implies \fB-x\fR.
                    119: .TP
                    120: \fB-r\fR
                    121: Normally \fBxmlwf\fR memory-maps the XML file
                    122: before parsing; this can result in faster parsing on many
                    123: platforms.
                    124: \fB-r\fR turns off memory-mapping and uses normal file
                    125: I/O calls instead.
                    126: Memory-mapping is automatically turned off
                    127: when reading from standard input.
                    128: 
                    129: Use of memory-mapping can cause some platforms to report
                    130: substantially higher memory usage for
                    131: \fBxmlwf\fR, but this appears to be a matter of
                    132: the operating system reporting memory in a strange way; there is
                    133: not a leak in \fBxmlwf\fR.
                    134: .TP
                    135: \fB-s\fR
                    136: Prints an error if the document is not standalone. 
                    137: A document is standalone if it has no external subset and no
                    138: references to parameter entities.
                    139: .TP
                    140: \fB-t\fR
                    141: Turns on timings.  This tells Expat to parse the entire file,
                    142: but not perform any processing.
                    143: This gives a fairly accurate idea of the raw speed of Expat itself
                    144: without client overhead.
                    145: \fB-t\fR turns off most of the output options
                    146: (\fB-d\fR, \fB-m\fR, \fB-c\fR,
                    147: \&...).
                    148: .TP
                    149: \fB-v\fR
                    150: Prints the version of the Expat library being used, including some
                    151: information on the compile-time configuration of the library, and
                    152: then exits.
                    153: .TP
                    154: \fB-w\fR
                    155: Enables support for Windows code pages.
                    156: Normally, \fBxmlwf\fR reports an error if it
                    157: encounters an encoding that it is not equipped to handle itself.  With
                    158: \fB-w\fR, \fBxmlwf\fR will try to use a Windows code
                    159: page.  See also \fB-e\fR.
                    160: .TP
                    161: \fB-x\fR
                    162: Turns on parsing external entities.
                    163: 
                    164: Non-validating parsers are not required to resolve external
                    165: entities, or even expand entities at all.
                    166: Expat always expands internal entities,
                    167: but external entity parsing must be enabled explicitly.
                    168: 
                    169: External entities are simply entities that obtain their
                    170: data from outside the XML file currently being parsed.
                    171: 
                    172: This is an example of an internal entity:
                    173: 
                    174: .nf
                    175: <!ENTITY vers '1.0.2'>
                    176: .fi
                    177: 
                    178: And here are some examples of external entities:
                    179: 
                    180: .nf
                    181: <!ENTITY header SYSTEM "header-&vers;.xml">  (parsed)
                    182: <!ENTITY logo SYSTEM "logo.png" NDATA PNG>   (unparsed)
                    183: .fi
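The difference between internal and external entities can be
observed through Python's xml.parsers.expat module, which wraps
Expat. This is a sketch, not how \fBxmlwf\fR implements
\fB-x\fR: the function name parse_with_entities and the file
name header.xml are hypothetical, and the handler only records
the system identifier instead of opening and parsing the file.

```python
import xml.parsers.expat

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY vers '1.0.2'>
  <!ENTITY header SYSTEM "header.xml">
]>
<doc>&vers; &header;</doc>"""


def parse_with_entities(doc: bytes, follow_external: bool):
    """Parse doc; return (character data, external system ids seen)."""
    text, seen = [], []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = text.append

    if follow_external:
        def on_external(context, base, system_id, public_id):
            # A real client would open system_id and parse it with
            # parser.ExternalEntityParserCreate(context); here the
            # reference is only recorded.
            seen.append(system_id)
            return 1  # non-zero tells Expat to carry on

        parser.ExternalEntityRefHandler = on_external

    parser.Parse(doc, True)
    return "".join(text), seen
```

In both modes the internal entity vers is expanded into the
character data. The external entity header is reported only when
a handler is installed, which mirrors the need to enable external
entity parsing explicitly.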
                    184: .TP
                    185: \fB--\fR
                    186: (Two hyphens.)
                    187: Terminates the list of options.  This is only needed if a filename
                    188: starts with a hyphen.  For example:
                    189: 
                    190: .nf
                    191: xmlwf -- -myfile.xml
                    192: .fi
                    193: 
                    194: will run \fBxmlwf\fR on the file
                    195: \fI-myfile.xml\fR.
                    196: .PP
                    197: Older versions of \fBxmlwf\fR do not support
                    198: reading from standard input.
                    199: .SH "OUTPUT"
                    200: .PP
                    201: If an input file is not well-formed,
                    202: \fBxmlwf\fR prints a single line describing
                    203: the problem to standard output.  If a file is well-formed,
                    204: \fBxmlwf\fR outputs nothing.
                    205: Note that the exit status is \fBnot\fR set to indicate failure.
                    206: .SH "BUGS"
                    207: .PP
                    208: The XML 1.0 recommendation says that a document should begin
                    209: with an XML declaration, although it does not strictly require one.
                    210: \fBxmlwf\fR accepts documents without a declaration.
                    211: .PP
                    212: \fBxmlwf\fR exits with status 0 (no error)
                    213: even if the file is not well-formed.  There is no good way for
                    214: a program to use \fBxmlwf\fR to quickly
                    215: check a file -- it must parse \fBxmlwf\fR's
                    216: standard output.
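.PP
Until the exit status is usable, a calling program has to inspect
standard output. The following Python sketch shows one way to do
that, assuming only what this page documents: no output means no
errors were reported. The function names is_well_formed_output
and run_xmlwf are hypothetical.

```python
import subprocess


def is_well_formed_output(stdout_text: str) -> bool:
    """Interpret xmlwf's standard output: empty means no errors."""
    return stdout_text.strip() == ""


def run_xmlwf(path: str) -> bool:
    """Run xmlwf on one file and judge by its output, not its status."""
    result = subprocess.run(
        # "--" ends the option list, so a leading "-" in path is safe.
        ["xmlwf", "--", path],
        capture_output=True,
        text=True,
        check=False,  # the exit status carries no information here
    )
    return is_well_formed_output(result.stdout)
```

A call such as run_xmlwf("doc.xml") requires \fBxmlwf\fR to be
installed and on the search path.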
                    217: .PP
                    218: The errors should go to standard error, not standard output.
                    219: .PP
                    220: There should be a way to get \fB-d\fR to send its
                    221: output to standard output rather than forcing the user to send
                    222: it to a file.
                    223: .PP
                    224: I have no idea why anyone would want to use the
                    225: \fB-d\fR, \fB-c\fR, and
                    226: \fB-m\fR options.  If someone could explain it to
                    227: me, I'd like to add this information to this manpage.
                    228: .SH "ALTERNATIVES"
                    229: .PP
                    230: Here are some XML validators on the web:
                    231: 
                    232: .nf
                    233: http://www.hcrc.ed.ac.uk/~richard/xml-check.html
                    234: http://www.stg.brown.edu/service/xmlvalid/
                    235: http://www.scripting.com/frontier5/xml/code/xmlValidator.html
                    236: http://www.xml.com/pub/a/tools/ruwf/check.html
                    237: .fi
                    238: .SH "SEE ALSO"
                    239: .PP
                    240: 
                    241: .nf
                    242: The Expat home page:        http://www.libexpat.org/
                    243: The W3C XML specification:  http://www.w3.org/TR/REC-xml
                    244: .fi
                    245: .SH "AUTHOR"
                    246: .PP
                    247: This manual page was written by Scott Bronson <bronson@rinspin.com> for
                    248: the Debian GNU/Linux system (but may be used by others).  Permission is
                    249: granted to copy, distribute and/or modify this document under
                    250: the terms of the GNU Free Documentation
                    251: License, Version 1.1.
