In Depth
As you know, Extensible Markup Language (XML) is the base
language of XHTML. To be able to extend XHTML with new elements and attributes,
we’re going to dig into XML in this chapter, seeing how it works and then using
it to extend XHTML. I’ll begin with a solid foundation of XML.
XML is a markup language that you use to describe data, and it
allows far more precise structuring of that data than is possible with HTML. In
XML, you create your own tags and syntax for those tags, so you can let the
document structure follow the data structure. Using a scripting language like
JavaScript, you can access the various elements of an XML page and make use of
your data. In this chapter, I’ll start by discussing how to create an XML
document and how to work with it using JavaScript. In the next chapter, I’ll
take a look at the data-binding uses for XML.
The following list contains some resources you can use to learn
more about XML:
• http://msdn.microsoft.com/workshop/xml/index.asp—Microsoft’s
discussion of XML.
• http://msdn.microsoft.com/xml/tutorial/default.asp—Microsoft’s
XML tutorial.
• www.projectcool.com/developer/xmlz/index.html—Project
Cool’s in-depth tutorial.
• www.w3.org/TR/REC-xml—The latest XML specification.
The World Wide Web Consortium (W3C) is in charge of the specification of XML.
W3C sets the rules on how to create Document Type Definitions (DTDs) and the other
constructs that we’ll see throughout this chapter.
What Does XML Look Like?
I’ll create an XML page that holds the purchasing records of
several customers, showing how easy it is to create data structures in XML. As
you might expect, you start an XML page with the XML declaration
<?xml version = "1.0"?>, which tells the browser that this document is XML.
Here’s the first line of an XML document, just as in
XHTML:
<?xml version = "1.0"?>
.
.
.
You can name your own tags in XML, and I’ll do that here. The
body of the XML document should be enclosed in one XML element, which I’ll call
<DOCUMENT>:
<?xml version = "1.0"?>
<DOCUMENT>
.
.
.
</DOCUMENT>
Now I’ll start storing purchasing data by customer. To store a
customer’s data, I’ll create a new element, <CUSTOMER>,
which goes inside the <DOCUMENT>
element:
<?xml version = "1.0"?>
<DOCUMENT>
<CUSTOMER>
.
.
.
</CUSTOMER>
</DOCUMENT>
I can also store the customer’s name by creating a new <NAME> element, which itself
contains two elements—<LAST_NAME>
and <FIRST_NAME>:
<?xml version = "1.0"?>
<DOCUMENT>
<CUSTOMER>
<NAME>
<LAST_NAME>Thomson</LAST_NAME>
<FIRST_NAME>Susan</FIRST_NAME>
</NAME>
.
.
.
</CUSTOMER>
</DOCUMENT>
Additionally, I store the date of the record and the customer
orders in an
<ORDERS> element, where I place all the items the customer bought:
<?xml version = "1.0"?>
<DOCUMENT>
<CUSTOMER>
<NAME>
<LAST_NAME>Thomson</LAST_NAME>
<FIRST_NAME>Susan</FIRST_NAME>
</NAME>
<DATE>September 1, 2001</DATE>
<ORDERS>
<ITEM>
<PRODUCT>Video tape</PRODUCT>
<NUMBER>5</NUMBER>
<PRICE>$1.25</PRICE>
</ITEM>
<ITEM>
<PRODUCT>Shovel</PRODUCT>
<NUMBER>2</NUMBER>
<PRICE>$4.98</PRICE>
</ITEM>
</ORDERS>
</CUSTOMER>
.
.
.
</DOCUMENT>
I can store the records of as many customers as I want in this
XML page. Here’s how I add a new customer’s record:
<?xml version = "1.0"?>
<DOCUMENT>
<CUSTOMER>
<NAME>
<LAST_NAME>Thomson</LAST_NAME>
<FIRST_NAME>Susan</FIRST_NAME>
</NAME>
<DATE>September 1, 2001</DATE>
<ORDERS>
<ITEM>
<PRODUCT>Video tape</PRODUCT>
<NUMBER>5</NUMBER>
<PRICE>$1.25</PRICE>
</ITEM>
<ITEM>
<PRODUCT>Shovel</PRODUCT>
<NUMBER>2</NUMBER>
<PRICE>$4.98</PRICE>
</ITEM>
</ORDERS>
</CUSTOMER>
<CUSTOMER>
<NAME>
<LAST_NAME>Smithson</LAST_NAME>
<FIRST_NAME>Nancy</FIRST_NAME>
</NAME>
<DATE>September 2, 2001</DATE>
<ORDERS>
<ITEM>
<PRODUCT>Ribbon</PRODUCT>
<NUMBER>12</NUMBER>
<PRICE>$2.95</PRICE>
</ITEM>
<ITEM>
<PRODUCT>Goldfish</PRODUCT>
<NUMBER>6</NUMBER>
<PRICE>$1.50</PRICE>
</ITEM>
</ORDERS>
</CUSTOMER>
</DOCUMENT>
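To make the structure concrete, here is a minimal sketch of reading this document from code. The chapter itself works with JavaScript in Internet Explorer; this sketch instead assumes Python's standard xml.etree.ElementTree module as a stand-in, and the names CUSTOMERS_XML and list_purchases are my own, not part of any XML API.

```python
import xml.etree.ElementTree as ET

# The two-customer document from the text, embedded as a string.
CUSTOMERS_XML = """<?xml version = "1.0"?>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Thomson</LAST_NAME>
            <FIRST_NAME>Susan</FIRST_NAME>
        </NAME>
        <DATE>September 1, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Video tape</PRODUCT>
                <NUMBER>5</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Shovel</PRODUCT>
                <NUMBER>2</NUMBER>
                <PRICE>$4.98</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smithson</LAST_NAME>
            <FIRST_NAME>Nancy</FIRST_NAME>
        </NAME>
        <DATE>September 2, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Ribbon</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>$2.95</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Goldfish</PRODUCT>
                <NUMBER>6</NUMBER>
                <PRICE>$1.50</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>"""


def list_purchases(xml_text):
    """Return a (customer name, [product names]) pair for each <CUSTOMER>."""
    root = ET.fromstring(xml_text)  # root is the <DOCUMENT> element
    purchases = []
    for customer in root.findall("CUSTOMER"):
        first = customer.findtext("NAME/FIRST_NAME")
        last = customer.findtext("NAME/LAST_NAME")
        products = [item.findtext("PRODUCT")
                    for item in customer.findall("ORDERS/ITEM")]
        purchases.append((f"{first} {last}", products))
    return purchases


for name, products in list_purchases(CUSTOMERS_XML):
    print(name, "bought", ", ".join(products))
```

Because the document structure follows the data structure, the paths used in the code (NAME/FIRST_NAME, ORDERS/ITEM) read almost like a description of the record itself.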
As you can see, XML provides you with a way of creating and
structuring your data in a manner that fits that data. You might be wondering
how browsers deal with such free-form data. For example, how will a browser
know how you want the <CUSTOMER>
element displayed? This points out a fundamental difference between XML and
HTML; XML provides a way of structuring your data, not a method to display it
as HTML does. (However, you can use Cascading Style Sheets [CSS] or the
Extensible Stylesheet Language [XSL] to do just that.) Although HTML can indicate
which text should be bold and which text italic, XML has no such formatting
built-in.
Internet Explorer provides you with access to the elements in an
XML page, as we’ll see throughout this chapter, in which I’ll use JavaScript to
access the data in XML pages. It’s up to you to interpret the data in the
document itself—Internet Explorer only makes it available to you through an
object model with properties and methods.
Internet Explorer can display an XML document directly, and you
can see the page we’ve just created in Figure 15.1. (You must give the
extension .xml to documents you want to view as XML documents.)
Figure 15.1 An XML document in Internet Explorer.
You can click the plus (+)
and minus (–) signs to
expand and collapse XML elements. As you see in the figure, I’ve collapsed the
first <CUSTOMER>
element and expanded the second.
XML browsers can do even more—they can check the XML page’s
syntax. You provide the elements in the page and specify what syntax is legal
and what is not. For example, you indicate which elements may contain other
elements, exactly which elements an element can contain, how many elements it
can contain, and so on. There are two ways of specifying syntax for an XML
page: using a Document Type Definition (DTD), as we’ve seen with XHTML, or using
an XML schema. The XML
schema format covered here is Microsoft’s implementation; it serves the same purpose as a DTD,
although the schema is supposed to be easier to create and allow you more
control. I’ll take a look at creating both DTDs and schemas in this chapter.
The latest technique in Web pages is separating the user
interface from data, and XML enables you to do this. (The lack of such a
separation is the reason W3C didn’t adopt Netscape’s <LAYER> element as official.) On its Microsoft
Developer Network (MSDN) Web site, Microsoft says, “XML separates the data from
the presentation and the process, enabling you to display and process the data
as you wish by applying different style sheets and applications.” In practice,
what this means is that the real XML processing takes place in code, and you’re
responsible for writing that code. As the XML tags you use become standardized
in your group or corporation, you can exchange XML pages with others. The
JavaScript you write can extract the data from the XML page and work with it,
even displaying that data to your specification. We’ll see quite a few examples
of this in this chapter.
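As a rough illustration of keeping data and presentation apart, the following sketch extracts order totals from an abbreviated copy of the customer document and then presents the same totals two ways. It assumes Python's standard library rather than the JavaScript used in this chapter, and the function names (customer_totals, as_text, as_html) are hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal
from html import escape

# Abbreviated customer data (the <DATE> elements are left out for brevity).
ORDERS_XML = """<DOCUMENT>
  <CUSTOMER>
    <NAME><LAST_NAME>Thomson</LAST_NAME><FIRST_NAME>Susan</FIRST_NAME></NAME>
    <ORDERS>
      <ITEM><PRODUCT>Video tape</PRODUCT><NUMBER>5</NUMBER><PRICE>$1.25</PRICE></ITEM>
      <ITEM><PRODUCT>Shovel</PRODUCT><NUMBER>2</NUMBER><PRICE>$4.98</PRICE></ITEM>
    </ORDERS>
  </CUSTOMER>
  <CUSTOMER>
    <NAME><LAST_NAME>Smithson</LAST_NAME><FIRST_NAME>Nancy</FIRST_NAME></NAME>
    <ORDERS>
      <ITEM><PRODUCT>Ribbon</PRODUCT><NUMBER>12</NUMBER><PRICE>$2.95</PRICE></ITEM>
      <ITEM><PRODUCT>Goldfish</PRODUCT><NUMBER>6</NUMBER><PRICE>$1.50</PRICE></ITEM>
    </ORDERS>
  </CUSTOMER>
</DOCUMENT>"""


def customer_totals(xml_text):
    """Data step: return (customer name, total spent) pairs."""
    totals = []
    for customer in ET.fromstring(xml_text).findall("CUSTOMER"):
        first = customer.findtext("NAME/FIRST_NAME")
        last = customer.findtext("NAME/LAST_NAME")
        total = sum(
            int(item.findtext("NUMBER")) * Decimal(item.findtext("PRICE").lstrip("$"))
            for item in customer.findall("ORDERS/ITEM")
        )
        totals.append((f"{first} {last}", total))
    return totals


def as_text(totals):
    """Presentation 1: one plain-text line per customer."""
    return "\n".join(f"{name}: ${total}" for name, total in totals)


def as_html(totals):
    """Presentation 2: the same data as an HTML table."""
    rows = "".join(
        f"<tr><td>{escape(name)}</td><td>${total}</td></tr>"
        for name, total in totals
    )
    return f"<table>{rows}</table>"


totals = customer_totals(ORDERS_XML)
print(as_text(totals))
print(as_html(totals))
```

Only customer_totals knows about the XML structure; the two presentation functions could be swapped or extended without touching the data.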
From Microsoft’s point of view, you can use XML to create:
• An ordinary document
• A structured record,
such as an appointment record or purchase order
• An object with data and
methods, such as the persistent form of a Java object or ActiveX control
• A data record, such as
the result set of a query
• Metacontent about a Web
site, such as Channel Definition Format (CDF) data
• A graphical
presentation, such as an application’s user interface
• XML schemas and types
We’ve seen how to create a basic XML document, but there’s more
to the process. Ideally, XML documents—and therefore XHTML documents—should
also be valid and well-formed, and I’ll
discuss what this means before getting into the details of working with an XML
document’s data.
Valid and Well-Formed XML Documents
An XML document is considered valid
if there is a DTD or an XML schema associated with it and if the document
complies with the DTD or schema. That’s all there is to making a document
valid.
TIP: To
check the validity of an XML page, you can open it in Internet Explorer, which
will tell you if the document does not comply with the DTD or the schema. Also,
you might want to check out the Microsoft XML validator page at http://msdn.microsoft.com/downloads/samples/internet/xml/xml_validator/default.asp.
You can download and run the Microsoft validator to test XML documents, or
enter the URL of an XML document to check it online.
An XML document is considered well
formed if it contains one or more elements, if there is precisely one
element (the root or document element) for which
neither the start nor the end tag is inside any other element, and if all other
tags nest within each other correctly. Well-formedness does not require that the
elements be declared in a DTD or an XML schema; that requirement applies only to valid documents.
TIP: Note
in particular the requirement that the entire XML document be enclosed in one
element, the root element. This fact will be important when we start working
with the contents of XML documents in code because we’ll get access to the root
element first, and then move to other elements as required. Take a look at the
previous XML example, in which the root element is <DOCUMENT>. In XHTML documents, the
root element is <html>.
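The root-first pattern looks roughly like this in code. This is only a sketch that assumes Python's standard xml.dom.minidom module in place of Internet Explorer's object model; the SAMPLE string is a shortened version of the customer document.

```python
from xml.dom import minidom

# A shortened version of the customer document; <DOCUMENT> is the root.
SAMPLE = """<?xml version="1.0"?>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Thomson</LAST_NAME>
            <FIRST_NAME>Susan</FIRST_NAME>
        </NAME>
    </CUSTOMER>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smithson</LAST_NAME>
            <FIRST_NAME>Nancy</FIRST_NAME>
        </NAME>
    </CUSTOMER>
</DOCUMENT>"""

doc = minidom.parseString(SAMPLE)

# Step 1: get the root (document) element.
root = doc.documentElement
print(root.tagName)

# Step 2: move from the root to the elements it contains.
customers = root.getElementsByTagName("CUSTOMER")
last_names = [
    customer.getElementsByTagName("LAST_NAME")[0].firstChild.data
    for customer in customers
]
print(last_names)
```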
Here’s an example. I’ll add a DTD to the XML document we created
in the beginning of the chapter to make it both valid and well formed:
<?xml version = "1.0" ?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT>
<CUSTOMER>
<NAME>
<LAST_NAME>Thomson</LAST_NAME>
<FIRST_NAME>Susan</FIRST_NAME>
</NAME>
<DATE>September 1, 2001</DATE>
<ORDERS>
<ITEM>
<PRODUCT>Video tape</PRODUCT>
<NUMBER>5</NUMBER>
<PRICE>$1.25</PRICE>
</ITEM>
<ITEM>
<PRODUCT>Shovel</PRODUCT>
<NUMBER>2</NUMBER>
<PRICE>$4.98</PRICE>
</ITEM>
</ORDERS>
</CUSTOMER>
<CUSTOMER>
<NAME>
<LAST_NAME>Smithson</LAST_NAME>
<FIRST_NAME>Nancy</FIRST_NAME>
</NAME>
<DATE>September 2, 2001</DATE>
<ORDERS>
<ITEM>
<PRODUCT>Ribbon</PRODUCT>
<NUMBER>12</NUMBER>
<PRICE>$2.95</PRICE>
</ITEM>
<ITEM>
<PRODUCT>Goldfish</PRODUCT>
<NUMBER>6</NUMBER>
<PRICE>$1.50</PRICE>
</ITEM>
</ORDERS>
</CUSTOMER>
</DOCUMENT>
Here’s a document that is well formed but not valid (because
there is no DTD or schema):
<?xml version="1.0"?>
<DOCUMENT>
<TITLE>
A Noisy Noise
Annoys An Oyster
</TITLE>
</DOCUMENT>
Here is a document that contains a nesting error and no DTD, so
it is neither valid nor well formed:
<?xml version="1.0"?>
<TITLE>
A Noisy Noise
Annoys An Oyster
<HEADING>
</TITLE>
A Study Of
Shellfish And Audio Disturbances
</HEADING>
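You can check well-formedness from code as well. The sketch below assumes Python's standard xml.etree.ElementTree module, which checks only well-formedness and does not validate against a DTD; the helper name is_well_formed is my own.

```python
import xml.etree.ElementTree as ET

WELL_FORMED = """<?xml version="1.0"?>
<DOCUMENT>
    <TITLE>
        A Noisy Noise
        Annoys An Oyster
    </TITLE>
</DOCUMENT>"""

# <HEADING> opens inside <TITLE> but closes outside it: a nesting error.
NOT_WELL_FORMED = """<?xml version="1.0"?>
<TITLE>
    A Noisy Noise
    Annoys An Oyster
    <HEADING>
</TITLE>
    A Study Of
    Shellfish And Audio Disturbances
    </HEADING>"""


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```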
Most XML parsers, like the one in Internet Explorer, require XML
documents to be well formed but not necessarily valid. (Most XML parsers do not
require a DTD, but if there is one, the parser will use it to check the XML
document.) The XML specification requires every XML document to be well formed;
validity is an additional level of conformance, and it's good practice to make
your documents both valid and well formed.
To make an XML document valid, you must check it against a DTD
or schema. In the following sections, I’ll briefly discuss how to create both
of these items and what they look like. (Note also that neither a DTD nor a
schema is necessary before Internet Explorer will work with an XML document.)
XML Document Type Definitions
We’ve already seen how easy it is to create XML documents.
However, if you want to make sure your documents are valid (that is, that they adhere to the
syntax rules you set), you’ll need a DTD or a schema. Creating these items is
more complex than creating XML documents. I’ll take a look at DTDs first.
You can use internal or external DTDs with XML documents. Here’s
an example of an internal DTD. Note that you enclose the DTD in a <!DOCTYPE> declaration,
which also gives the name of the root element of the document (THESIS here):
<?xml version="1.0"?>
<!DOCTYPE THESIS [
<!ELEMENT THESIS (P*)>
<!ELEMENT P (#PCDATA)>
]>
<THESIS>
<P>
This is my Ph.D.
thesis.
</P>
<P>
Do you like it?
</P>
<P>
If so, please give
me my Ph.D.
</P>
</THESIS>
The DTD indicates how the syntax works for the XML elements
you’re creating. For example, which elements can be inside which other
elements? This DTD follows the W3C syntax conventions, which means that you
specify each element with <!ELEMENT>.
You can also specify that the contents of an element be parsed character data (#PCDATA), other elements that
you’ve created, or both. In this example, I’m indicating that the <THESIS> element must
contain only <P>
elements, but it can contain zero or more occurrences of the <P> element (which
is what the asterisk [*] after P in
<!ELEMENT THESIS (P*)>
means). The following list contains the symbols that you can use when defining
the syntax of an element:
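• a , b—a
followed by b.
• a | b—a or b but not both.
• a - b—The set of strings represented by a but not represented by b. (This notation comes from the grammar in the XML specification; it isn't available in the content models you write in a DTD.)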
• a?—a or
nothing.
• a+—One or more occurrences of a.
• a*—Zero or more occurrences of a.
• (expression)—Surrounding an expression with
parentheses means that it is treated as a unit and can carry the suffix
operator ?, *, or +.
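One way to get a feel for these operators is to notice that an element-only content model behaves like a regular expression over the names of an element's children. The sketch below is my own illustration in Python, not how any real parser works; it handles only models built from element names, commas, |, ?, +, *, and parentheses (not #PCDATA), and the function names are hypothetical.

```python
import re

# Matches an element name such as NAME or LAST_NAME inside a content model.
NAME = re.compile(r"[A-Za-z_][\w.\-]*")


def content_model_to_regex(model):
    """Translate a content model such as "(NAME,DATE,ORDERS)" into a regex.

    Each element name becomes the token "NAME;" so names can't run together;
    commas (sequence) disappear because regex sequence is plain adjacency,
    while |, ?, +, * and parentheses already mean the same thing in a regex.
    """
    wrapped = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", model)
    return re.compile(re.sub(r"[,\s]", "", wrapped))


def children_match(model, child_names):
    """Return True if the list of child element names satisfies the model."""
    sequence = "".join(name + ";" for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(NAME,DATE,ORDERS)", ["NAME", "DATE", "ORDERS"]))
print(children_match("(NAME,DATE,ORDERS)", ["DATE", "NAME", "ORDERS"]))
print(children_match("(P*)", []))
print(children_match("(TITLE?,P+)", ["P", "P"]))
```

The second call fails because a comma-separated sequence fixes the order of the children, while (P*) accepts an empty list because the asterisk allows zero occurrences.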
In addition to defining the <THESIS>
element, I define the <P>
element so that it can hold only text, that is, parsed character data, with the
keyword #PCDATA:
<?xml version="1.0"?>
<!DOCTYPE THESIS [
<!ELEMENT THESIS (P*)>
<!ELEMENT P (#PCDATA)>
]>
<THESIS>
<P>
This is my Ph.D.
thesis.
</P>
<P>
Do you like it?
</P>
<P>
If so, please give
me my Ph.D.
</P>
</THESIS>
In this way, I’ve specified the syntax of these two elements, <THESIS> and <P>. We’ll learn how to
create DTDs, and we’ll review an extensive example in the section “Creating XML
Documents with DTDs” in the “Immediate Solutions” section.
You can also specify an external
DTD with the SYSTEM keyword
in the <!DOCTYPE>
element. (The SYSTEM keyword
is primarily for private DTDs; as we’ll see in the “Immediate Solutions”
section “Creating Public Extended XHTML DTDs,” you can also use the PUBLIC keyword to create public
DTDs.) Here’s an example of an external DTD:
<?xml version="1.0"?>
<!DOCTYPE THESIS SYSTEM "dtdthesis.dtd">
<THESIS>
<P>
This is my Ph.D.
thesis.
</P>
<P>
Do you like it?
</P>
<P>
If so, please give
me my Ph.D.
</P>
</THESIS>
The file dtdthesis.dtd just contains the <!ELEMENT> declarations, like this:
<!ELEMENT THESIS (P*)>
<!ELEMENT P (#PCDATA)>
And that’s all it takes to create an external DTD. Besides
specifying the syntax of XML elements, you can also specify which attributes
elements can have, as we’ll see in “Specifying Attributes in DTDs” in the
“Immediate Solutions” section. In the meantime, I’ll take a look at creating
XML schemas now.
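Whether the DTD is internal or external, the name given in <!DOCTYPE> has to be the name of the document's root element. Here is a small sketch of checking that from code, assuming Python's standard xml.dom.minidom module (which reads the <!DOCTYPE> line but does not fetch or apply the external DTD file); the function name doctype_matches_root is my own.

```python
from xml.dom import minidom

THESIS_XML = """<?xml version="1.0"?>
<!DOCTYPE THESIS SYSTEM "dtdthesis.dtd">
<THESIS>
    <P>This is my Ph.D. thesis.</P>
    <P>Do you like it?</P>
</THESIS>"""


def doctype_matches_root(xml_text):
    """Return True if the document has a <!DOCTYPE> naming its root element."""
    doc = minidom.parseString(xml_text)
    doctype = doc.doctype  # None when the document has no <!DOCTYPE>
    return doctype is not None and doctype.name == doc.documentElement.tagName


doc = minidom.parseString(THESIS_XML)
print(doc.doctype.name, doc.doctype.systemId)
print(doctype_matches_root(THESIS_XML))
```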
XML Schemas
XML schemas are an alternative to DTDs, and are supported in
some measure by Microsoft. If you want to create valid XML documents for use
with Internet Explorer, you can use either a DTD or a schema.
NOTE: The
XML Schema implementation that ships with Internet Explorer 5 is based on the
XML-Data Note (www.w3.org/TR/1998/NOTE-XML-data-0105/)
posted by the W3C in January 1998 and on the Document Content Description (DCD)
Note (www.w3.org/TR/NOTE-dcd).
It does not match the current W3C XML Schema working draft, which you can
find at www.w3.org/TR/xmlschema-0.
Here’s an example in which I’m creating an XML document with the
root element <TASKFORCE>.
To specify the name of a schema that resides in a separate file, you use the xmlns (XML namespace) attribute in
an XML document, like this:
<?xml version="1.0" ?>
<TASKFORCE xmlns="x-schema:schema1.xml">
<EMPLOYEE>George Patton</EMPLOYEE>
<EMPLOYEE>Douglas MacArthur</EMPLOYEE>
<DESCRIPTION>XML Programming Taskforce</DESCRIPTION>
</TASKFORCE>
In this case, I’m indicating that the schema for this XML
document is schema1.xml. In schema1.xml, you start with the <SCHEMA> element like this:
<SCHEMA NAME="schema1">
.
.
.
</SCHEMA>
To use schemas, you must include the following two lines, which
declare two XML namespaces with the xmlns
attribute, set to the Uniform Resource Names (URNs) for the Microsoft
definitions for use in schemas:
<SCHEMA NAME="schema1"
xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
.
.
.
</SCHEMA>
The xmlns
attribute defines an XML namespace, and we’ve been using xmlns all along in XHTML documents
to define the approved XHTML namespace, http://www.w3.org/1999/xhtml:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>
.
.
.
Namespaces were introduced to avoid element and attribute name
clashes. For example, the code snippet just before the previous one defines a
namespace called dt, for
data type, which includes an attribute named type.
You might have already defined an attribute named type, so to avoid clashes with Microsoft’s attribute
with the same name, you qualify the Microsoft attributes with the dt namespace prefix, like this: dt:type="int". You can
also qualify element names with namespaces in the same way, such as <coriolis:document>, which
creates a <document>
element as defined in the coriolis
namespace. Namespaces become important when you’re importing someone else’s XML
element and attribute definitions, which you do when you’re creating a
Microsoft XML schema. (I’ll cover how to use namespaces in schemas in “Creating
XML Documents with Schemas” in the “Immediate Solutions” section.)
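To see how a prefix keeps two attributes named type apart, consider this sketch. It assumes Python's standard xml.etree.ElementTree module, which spells a qualified name as {namespace}localname; the inventory document and its urn:example:inventory namespace are invented for the illustration.

```python
import xml.etree.ElementTree as ET

DT_NS = "urn:schemas-microsoft-com:datatypes"  # the dt namespace from the text
MY_NS = "urn:example:inventory"                # hypothetical namespace of our own

# <count> carries two different "type" attributes: our own and Microsoft's.
SAMPLE = """<inventory xmlns="urn:example:inventory"
           xmlns:dt="urn:schemas-microsoft-com:datatypes">
    <count type="boxes" dt:type="int">12</count>
</inventory>"""


def qualified(namespace, local_name):
    """Build the {namespace}localname form that ElementTree uses."""
    return "{" + namespace + "}" + local_name


root = ET.fromstring(SAMPLE)
count = root.find(qualified(MY_NS, "count"))

own_type = count.get("type")                   # the unprefixed attribute
dt_type = count.get(qualified(DT_NS, "type"))  # the dt:type attribute

print(root.tag)
print(own_type, dt_type)
```

The parser never confuses the two attributes, because the prefixed one is identified by the namespace behind the prefix, not by the letters dt themselves.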
To specify the syntax of an element in a schema, you use the <ELEMENTTYPE> element, as in
this next example. I’m specifying that the <EMPLOYEE>
and <DESCRIPTION>
elements can contain only text and that their specifications are closed, which means that
they cannot accept any content other than what is listed. (If you leave the
specifications open, the element can contain content other than what you list.)
Here’s the code:
<SCHEMA NAME="schema1"
xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
<ELEMENTTYPE name="EMPLOYEE"
content="textOnly" model="closed"/>
<ELEMENTTYPE
name="DESCRIPTION" content="textOnly"
model="closed"/>
.
.
.
</SCHEMA>
Here’s how I define the <TASKFORCE>
element, which can contain both
<EMPLOYEE> and <DESCRIPTION>
elements—but can contain only
elements (not text). You specify this condition with the eltOnly keyword. I’m also
specifying that the <EMPLOYEE>
element must occur at least once, and the <DESCRIPTION>
element must occur once, but only once, in the <TASKFORCE>
element:
<SCHEMA NAME="schema1"
xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
<ELEMENTTYPE
name="EMPLOYEE" content="textOnly"
model="closed"/>
<ELEMENTTYPE
name="DESCRIPTION" content="textOnly"
model="closed"/>
<ELEMENTTYPE name="TASKFORCE"
content="eltOnly" model="closed">
<ELEMENT type="EMPLOYEE" minOccurs="1"
maxOccurs="*"/>
<ELEMENT type="DESCRIPTION" minOccurs="1"
maxOccurs="1"/>
</ELEMENTTYPE>
</SCHEMA>
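To show what the minOccurs and maxOccurs settings mean in practice, here is a hand-rolled sketch that reads them from the schema above and counts the children of a <TASKFORCE> element. It is not a real schema validator, only an illustration assuming Python's standard xml.etree.ElementTree module; it uses the element names exactly as they appear in this chapter's listing, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XDR_NS = "{urn:schemas-microsoft-com:xml-data}"

SCHEMA_XML = """<SCHEMA NAME="schema1"
    xmlns="urn:schemas-microsoft-com:xml-data"
    xmlns:dt="urn:schemas-microsoft-com:datatypes">
    <ELEMENTTYPE name="EMPLOYEE" content="textOnly" model="closed"/>
    <ELEMENTTYPE name="DESCRIPTION" content="textOnly" model="closed"/>
    <ELEMENTTYPE name="TASKFORCE" content="eltOnly" model="closed">
        <ELEMENT type="EMPLOYEE" minOccurs="1" maxOccurs="*"/>
        <ELEMENT type="DESCRIPTION" minOccurs="1" maxOccurs="1"/>
    </ELEMENTTYPE>
</SCHEMA>"""

TASKFORCE_XML = """<TASKFORCE xmlns="x-schema:schema1.xml">
    <EMPLOYEE>George Patton</EMPLOYEE>
    <EMPLOYEE>Douglas MacArthur</EMPLOYEE>
    <DESCRIPTION>XML Programming Taskforce</DESCRIPTION>
</TASKFORCE>"""


def local_name(tag):
    """Strip ElementTree's {namespace} part from a tag name."""
    return tag.rsplit("}", 1)[-1]


def occurrence_rules(schema_text, type_name):
    """Return {child name: (minOccurs, maxOccurs)} for one ELEMENTTYPE."""
    schema = ET.fromstring(schema_text)
    for element_type in schema.findall(XDR_NS + "ELEMENTTYPE"):
        if element_type.get("name") == type_name:
            return {
                child.get("type"): (child.get("minOccurs"), child.get("maxOccurs"))
                for child in element_type.findall(XDR_NS + "ELEMENT")
            }
    raise KeyError(type_name)


def occurrence_problems(document_text, schema_text):
    """List the ways the root element's children break the occurrence rules."""
    root = ET.fromstring(document_text)
    rules = occurrence_rules(schema_text, local_name(root.tag))
    counts = Counter(local_name(child.tag) for child in root)
    problems = []
    for name, (lowest, highest) in rules.items():
        found = counts[name]
        if found < int(lowest):
            problems.append(f"<{name}> occurs {found} time(s); minimum is {lowest}")
        if highest != "*" and found > int(highest):
            problems.append(f"<{name}> occurs {found} time(s); maximum is {highest}")
    # A closed model allows nothing beyond the listed child elements.
    problems.extend(f"<{name}> is not allowed here"
                    for name in counts if name not in rules)
    return problems


print(occurrence_problems(TASKFORCE_XML, SCHEMA_XML))
```

Adding a second <DESCRIPTION> element or removing every <EMPLOYEE> element makes the function report a problem, which mirrors the "at least once" and "once, but only once" rules described above.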
That completes our overview of XML DTDs and schemas; I’ll
discuss how to work with XML documents in Internet Explorer 5 next.