create macro-like objects (called entities) that are expanded into text as the XML docu-
ment is parsed. All of this is explored in Chapter 2.
An XML document that conforms to the fundamental syntax of markup tags is
called well-formed. To be a well-formed document, all elements must have matching
opening and closing tags, and the tags must be nested properly. For example, the ex-
pression <p><b>text</b></p> is well-formed because the opening tags, <p><b>,
have closing tags, </b></p>, and they are nested properly. To be well-formed, every
closing tag must match the most recently opened tag that is still open. In short, all tags
must be closed in the exact reverse order in which they were opened. The expressions
<p><b>text</b> and <p><b>text</p></b> are not well-formed: the first never closes <p>,
and the second closes <p> while <b> is still open.
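The well-formedness rule can be tried out with a few lines of Java. The following is only a minimal sketch: the class name WellFormedCheck and the method isWellFormed are made up for this illustration. It relies on the standard JAXP SAX parser, whose DefaultHandler rethrows fatal parsing errors as a SAXParseException.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of JAXP itself.
class WellFormedCheck {

    // Returns true if the text parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content but rethrows fatal errors,
            // which is how a well-formedness violation is reported.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false; // mismatched, unclosed, or badly nested tags
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p><b>text</b></p>"));  // properly nested
        System.out.println(isWellFormed("<p><b>text</b>"));      // <p> never closed
        System.out.println(isWellFormed("<p><b>text</p></b>"));  // closed out of order
    }
}
```

Only the first of the three expressions should be reported as well-formed.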
An XML document that conforms to the rules of its DTD is referred to as a valid doc-
ument. For a document to be successfully tested as being valid, it must also be well-
formed. Some parsers can check the document against the DTD definitions and throw
an exception if the document is not valid.
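As a rough illustration of validation, the sketch below turns on DTD validation in a JAXP DocumentBuilderFactory and installs an error handler that rethrows validity errors. The class name DtdValidationCheck, the method isValid, and the small note DTD are all assumptions invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, not part of JAXP itself.
class DtdValidationCheck {

    // A made-up DTD, embedded as an internal subset: a note holds a to and a body.
    static final String DTD =
          "<!DOCTYPE note [\n"
        + "<!ELEMENT note (to, body)>\n"
        + "<!ELEMENT to (#PCDATA)>\n"
        + "<!ELEMENT body (#PCDATA)>\n"
        + "]>\n";

    // Returns true only if the document is well-formed and valid against its DTD.
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // ask the parser to check the DTD rules
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                // Validity violations arrive here; rethrow so parse() fails.
                public void error(SAXParseException e) throws SAXException { throw e; }
                // Well-formedness violations arrive here.
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(DTD + "<note><to>Ann</to><body>Hi</body></note>"));
        System.out.println(isValid(DTD + "<note><body>Hi</body></note>")); // missing <to>
    }
}
```

The second document is well-formed but leaves out the required to element, so it should be rejected as invalid.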
Use of DTDs is very important to the portability of XML. If a DTD is well written
(that is, if all the tags are defined properly), a process can be written that will be able to
read and interpret any XML data from any document that conforms to the rules of the
DTD.
A single XML document can use more than one DTD. However, this multiple DTD
use can result in a naming collision. If two or more DTDs define a tag by the same
name, they will more than likely define that tag as having different characteristics. For
example, one could be defined as requiring a font attribute, whereas the other has no
such attribute. This problem is solved by using a device known as a namespace. An element
specified as being from one namespace is distinct from one of the same name
from another namespace. For example, if a pair of DTDs both include a definition for
an element with the tag name selectable, one DTD could be declared in the name-
space max and the other in the namespace scrim; then there would be the two distinct
tag names max:selectable and scrim:selectable available for use in the docu-
ment. Examples using namespaces are explained in Chapter 2.
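To make the max and scrim example concrete, here is a small sketch using a namespace-aware JAXP parser. It assumes the two prefixes are bound to namespace URIs with xmlns attributes; the URIs http://example.com/max and http://example.com/scrim, the class name NamespaceDemo, and the method describeChildren are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; the namespace URIs below are placeholders.
class NamespaceDemo {

    static final String XML =
          "<doc xmlns:max='http://example.com/max'"
        + " xmlns:scrim='http://example.com/scrim'>"
        + "<max:selectable/><scrim:selectable/>"
        + "</doc>";

    // Lists each child element of the root as "localName in namespaceURI".
    static List<String> describeChildren(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> result = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    result.add(child.getLocalName() + " in " + child.getNamespaceURI());
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : describeChildren(XML)) {
            System.out.println(line);
        }
    }
}
```

Both children share the local name selectable, yet the parser reports a different namespace URI for each, which is what keeps them distinct.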
XSL
XSL stands for Extensible Stylesheet Language. It is a set of instructions for
translating the content of an XML document into another form, usually a presentation
form intended to be displayed to a human. An XSL program is actually, in itself, a
document that adheres to the syntax of XML. It contains a set of detailed instructions
for extracting data from another XML document and converting it to a new format. Performing
such transformations is the subject of Chapter 8.
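As a small preview, the sketch below applies an XSL document to an XML document through the javax.xml.transform classes of JAXP. The stylesheet, the greeting input document, the class name XslTransformDemo, and the method transform are invented for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; the stylesheet and input are made up.
class XslTransformDemo {

    // The stylesheet is itself an XML document. It emits plain text.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/greeting'>"
        + "Hello, <xsl:value-of select='name'/>!"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the output.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints "Hello, World!"
        System.out.println(transform("<greeting><name>World</name></greeting>", XSL));
    }
}
```

The Java code never walks the source document itself; the template in the stylesheet selects the name element and the transformer does the rest.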
The process of using XSL to change the format of the data is known as transformation.
JAXP has built-in transformation methods that can perform any data
format translation you define in an XSL document. These transformations can be programmed
directly into Java instead of using XSL, but XSL simplifies things by taking
care of some of the underlying mechanics, such as walking through the memory-
resident parse tree to examine the source document. It also supplies you with some