From XHTML Black Book (The Coriolis Group)
 

Chapter 15

XML and Extending XHTML



In Depth

As you know, Extensible Markup Language (XML) is the base language of XHTML. To be able to extend XHTML with new elements and attributes, we’re going to dig into XML in this chapter, seeing how it works and then using it to extend XHTML. I’ll begin with a solid foundation of XML.

XML is a markup language that you use to describe data, and it allows far more precise structuring of that data than is possible with HTML. In XML, you create your own tags and syntax for those tags, so you can let the document structure follow the data structure. Using a scripting language like JavaScript, you can access the various elements of an XML page and make use of your data. In this chapter, I’ll start by discussing how to create an XML document and how to work with it using JavaScript. In the next chapter, I’ll take a look at the data-binding uses for XML.

The following list contains some resources you can use to learn more about XML:

http://msdn.microsoft.com/workshop/xml/index.asp—Microsoft’s discussion of XML.

http://msdn.microsoft.com/xml/tutorial/default.asp—Microsoft’s XML tutorial.

www.projectcool.com/developer/xmlz/index.html—Project Cool’s in-depth tutorial.

www.w3.org/TR/REC-xml—The latest XML specification. The World Wide Web Consortium (W3C) is in charge of the specification of XML. W3C sets the rules on how to create Document Type Definitions (DTDs) and other elements that we’ll see throughout this chapter.

What Does XML Look Like?

I’ll create an XML page that holds the purchasing records of several customers, showing how easy it is to create data structures in XML. To start an XML page, you begin with the XML declaration <?xml version = "1.0"?>, which identifies the document as XML and states the XML version it uses.

Here’s the first line of an XML document, just as in XHTML (the declaration is technically optional in XML 1.0, but you should include it):

 

<?xml version = "1.0"?>

    .

    .

    .

 

You can name your own tags in XML, and I’ll do that here. The body of the XML document should be enclosed in one XML element, which I’ll call <DOCUMENT>:

 

<?xml version = "1.0"?>

<DOCUMENT>

    .

    .

    .

</DOCUMENT>

 

Now I’ll start storing purchasing data by customer. To store a customer’s data, I’ll create a new element, <CUSTOMER>, which goes inside the <DOCUMENT> element:

 

<?xml version = "1.0"?>

<DOCUMENT>

    <CUSTOMER>

    .

    .

    .

    </CUSTOMER>

</DOCUMENT>

 

I can also store the customer’s name by creating a new <NAME> element, which itself contains two elements—<LAST_NAME> and <FIRST_NAME>:

 

<?xml version = "1.0"?>

<DOCUMENT>

    <CUSTOMER>

        <NAME>

            <LAST_NAME>Thomson</LAST_NAME>

            <FIRST_NAME>Susan</FIRST_NAME>

        </NAME>

        .

        .

        .

    </CUSTOMER>

</DOCUMENT>

 

Additionally, I store the date of the record in a <DATE> element and the customer’s orders in an <ORDERS> element, where I place all the items the customer bought:

 

<?xml version = "1.0"?>

<DOCUMENT>

    <CUSTOMER>

        <NAME>

            <LAST_NAME>Thomson</LAST_NAME>

            <FIRST_NAME>Susan</FIRST_NAME>

        </NAME>

        <DATE>September 1, 2001</DATE>

        <ORDERS>

            <ITEM>

                <PRODUCT>Video tape</PRODUCT>

                <NUMBER>5</NUMBER>

                <PRICE>$1.25</PRICE>

            </ITEM>

            <ITEM>

                <PRODUCT>Shovel</PRODUCT>

                <NUMBER>2</NUMBER>

                <PRICE>$4.98</PRICE>

            </ITEM>

        </ORDERS>

    </CUSTOMER>

    .

    .

    .

</DOCUMENT>

 

I can store the records of as many customers as I want in this XML page. Here’s how I add a new customer’s record:

 

<?xml version = "1.0"?>

<DOCUMENT>

    <CUSTOMER>

        <NAME>

            <LAST_NAME>Thomson</LAST_NAME>

            <FIRST_NAME>Susan</FIRST_NAME>

        </NAME>

        <DATE>September 1, 2001</DATE>

        <ORDERS>

            <ITEM>

                <PRODUCT>Video tape</PRODUCT>

                <NUMBER>5</NUMBER>

                <PRICE>$1.25</PRICE>

            </ITEM>

            <ITEM>

                <PRODUCT>Shovel</PRODUCT>

                <NUMBER>2</NUMBER>

                <PRICE>$4.98</PRICE>

            </ITEM>

        </ORDERS>

    </CUSTOMER>

    <CUSTOMER>

        <NAME>

            <LAST_NAME>Smithson</LAST_NAME>

            <FIRST_NAME>Nancy</FIRST_NAME>

        </NAME>

        <DATE>September 2, 2001</DATE>

        <ORDERS>

            <ITEM>

                <PRODUCT>Ribbon</PRODUCT>

                <NUMBER>12</NUMBER>

                <PRICE>$2.95</PRICE>

            </ITEM>

            <ITEM>

                <PRODUCT>Goldfish</PRODUCT>

                <NUMBER>6</NUMBER>

                <PRICE>$1.50</PRICE>

            </ITEM>

        </ORDERS>

    </CUSTOMER>

</DOCUMENT>

 

As you can see, XML provides you with a way of creating and structuring your data in a manner that fits that data. You might be wondering how browsers deal with such free-form data. For example, how will a browser know how you want the <CUSTOMER> element displayed? This points out a fundamental difference between XML and HTML: XML provides a way of structuring your data, not a method of displaying it, as HTML does. (However, you can use Cascading Style Sheets [CSS] or the Extensible Stylesheet Language [XSL] to do just that.) Although HTML can indicate which text should be bold and which text italic, XML has no such formatting built in.

Internet Explorer provides you with access to the elements in an XML page, as we’ll see throughout this chapter, in which I’ll use JavaScript to access the data in XML pages. It’s up to you to interpret the data in the document itself—Internet Explorer only makes it available to you through an object model with properties and methods.
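
The chapter works with this data through JavaScript and Internet Explorer’s object model. As a rough illustration of what interpreting the data yourself involves, here is a minimal sketch in Python using the standard library’s xml.etree.ElementTree rather than the browser. The function name order_totals and the decision to strip the dollar sign from <PRICE> are assumptions made for this example; they are not part of the chapter’s code.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# A shortened copy of the chapter's customer document (first customer only).
XML_TEXT = """<?xml version="1.0"?>
<DOCUMENT>
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Thomson</LAST_NAME>
            <FIRST_NAME>Susan</FIRST_NAME>
        </NAME>
        <DATE>September 1, 2001</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Video tape</PRODUCT>
                <NUMBER>5</NUMBER>
                <PRICE>$1.25</PRICE>
            </ITEM>
            <ITEM>
                <PRODUCT>Shovel</PRODUCT>
                <NUMBER>2</NUMBER>
                <PRICE>$4.98</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>
"""


def order_totals(xml_text):
    """Return a dict mapping "First Last" to the customer's order total."""
    root = ET.fromstring(xml_text)          # the root element, <DOCUMENT>
    totals = {}
    for customer in root.findall("CUSTOMER"):
        name = "{} {}".format(
            customer.findtext("NAME/FIRST_NAME"),
            customer.findtext("NAME/LAST_NAME"),
        )
        total = Decimal("0")
        for item in customer.findall("ORDERS/ITEM"):
            quantity = int(item.findtext("NUMBER"))
            # Assumption: prices are written as "$" followed by a decimal number.
            price = Decimal(item.findtext("PRICE").lstrip("$"))
            total += quantity * price
        totals[name] = total
    return totals


print(order_totals(XML_TEXT))
```

XML itself says nothing about what <NUMBER> or <PRICE> mean; the multiplication above is an interpretation that the code supplies.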

Internet Explorer can display an XML document directly, and you can see the page we’ve just created in Figure 15.1. (You must give the extension .xml to files you want to view as XML documents.)

 

Figure 15.1  An XML document in Internet Explorer.

 

You can click the plus (+) and minus (-) signs to expand and collapse XML elements. As you see in the figure, I’ve collapsed the first <CUSTOMER> element and expanded the second.

XML browsers can do even more: they can check the XML page’s syntax. You provide the elements in the page and specify what syntax is legal and what is not. For example, you indicate which elements may contain other elements, exactly which elements an element can contain, how many elements it can contain, and so on. There are two ways of specifying syntax for an XML page: using a Document Type Definition (DTD), as we’ve seen with XHTML, or using an XML schema. The schema format that Internet Explorer supports is a Microsoft innovation that serves the same purpose as a DTD, although the schema is supposed to be easier to create and allow you more control. I’ll take a look at creating both DTDs and schemas in this chapter.

The latest technique in Web pages is separating the user interface from data, and XML enables you to do this. (The lack of such a separation is the reason W3C didn’t adopt Netscape’s <LAYER> element as official.) On its Microsoft Developer Network (MSDN) Web site, Microsoft says, “XML separates the data from the presentation and the process, enabling you to display and process the data as you wish by applying different style sheets and applications.” In practice, what this means is that the real XML processing takes place in code, and you’re responsible for writing that code. As the XML tags you use become standardized in your group or corporation, you can exchange XML pages with others. The JavaScript you write can extract the data from the XML page and work with it, even displaying that data to your specification. We’ll see quite a few examples of this in this chapter.

From Microsoft’s point of view, you can use XML to create:

An ordinary document

A structured record, such as an appointment record or purchase order

An object with data and methods, such as the persistent form of a Java object or ActiveX control

A data record, such as the result set of a query

Metacontent about a Web site, such as Channel Definition Format (CDF) data

A graphical presentation, such as an application’s user interface

XML schemas and types

We’ve seen how to create a basic XML document, but there’s more to the process. Ideally, XML documents—and therefore XHTML documents—should also be valid and well-formed, and I’ll discuss what this means before getting into the details of working with an XML document’s data.

Valid and Well-Formed XML Documents

An XML document is considered valid if there is a DTD or an XML schema associated with it and if the document complies with the DTD or schema. That’s all there is to making a document valid.

TIP: To check the validity of an XML page, you can open it in Internet Explorer, which will tell you if the document does not comply with the DTD or the schema. Also, you might want to check out the Microsoft XML validator page at http://msdn.microsoft.com/downloads/samples/internet/xml/xml_validator/default.asp. You can download and run the Microsoft validator to test XML documents, or enter the URL of an XML document to check it online.

An XML document is considered well formed if it contains one or more elements, if there is precisely one element (the root or document element) for which neither the start nor the end tag is inside any other element, and if all other tags nest within each other correctly. Well-formedness does not require the elements to be declared anywhere; declaring them in a DTD or an XML schema matters only for validity.

TIP: Note in particular the requirement that the entire XML document be enclosed in one element, the root element. This fact will be important when we start working with the contents of XML documents in code because we’ll get access to the root element first, and then move to other elements as required. Take a look at the previous XML example, in which the root element is <DOCUMENT>. In XHTML documents, the root element is <html>.

Here’s an example. I’ll add a DTD to the XML document we created in the beginning of the chapter to make it both valid and well formed:

 

<?xml version = "1.0" ?>

<!DOCTYPE DOCUMENT [                           

<!ELEMENT DOCUMENT (CUSTOMER)*>                

<!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>         

<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>           

<!ELEMENT LAST_NAME (#PCDATA)>                  

<!ELEMENT FIRST_NAME (#PCDATA)>                 

<!ELEMENT DATE (#PCDATA)>                      

<!ELEMENT ORDERS (ITEM)*>                      

<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>         

<!ELEMENT PRODUCT (#PCDATA)>                   

<!ELEMENT NUMBER (#PCDATA)>                    

<!ELEMENT PRICE (#PCDATA)>                     

]>                                             

<DOCUMENT>

    <CUSTOMER>

        <NAME>

            <LAST_NAME>Thomson</LAST_NAME>

            <FIRST_NAME>Susan</FIRST_NAME>

        </NAME>

        <DATE>September 1, 2001</DATE>

        <ORDERS>

            <ITEM>

                <PRODUCT>Video tape</PRODUCT>

                <NUMBER>5</NUMBER>

                <PRICE>$1.25</PRICE>

            </ITEM>

            <ITEM>

                <PRODUCT>Shovel</PRODUCT>

                <NUMBER>2</NUMBER>

                <PRICE>$4.98</PRICE>

            </ITEM>

        </ORDERS>

    </CUSTOMER>

    <CUSTOMER>

        <NAME>

            <LAST_NAME>Smithson</LAST_NAME>

            <FIRST_NAME>Nancy</FIRST_NAME>

        </NAME>

        <DATE>September 2, 2001</DATE>

        <ORDERS>

            <ITEM>

                <PRODUCT>Ribbon</PRODUCT>

                <NUMBER>12</NUMBER>

                <PRICE>$2.95</PRICE>

            </ITEM>

            <ITEM>

                <PRODUCT>Goldfish</PRODUCT>

                <NUMBER>6</NUMBER>

                <PRICE>$1.50</PRICE>

            </ITEM>

        </ORDERS>

    </CUSTOMER>

</DOCUMENT>

 

Here’s a document that is well formed but not valid (because there is no DTD or schema):

 

<?xml version="1.0"?>

<DOCUMENT>

    <TITLE>

        A Noisy Noise Annoys An Oyster

    </TITLE>

</DOCUMENT>

 

Here is a document that contains a nesting error and no DTD, so it is neither valid nor well formed:

 

<?xml version="1.0"?>

    <TITLE>

        A Noisy Noise Annoys An Oyster

    <HEADING>

    </TITLE>

        A Study Of Shellfish And Audio Disturbances

    </HEADING>

 

Most XML parsers, like the one in Internet Explorer, require XML documents to be well formed but not necessarily valid. (Most XML parsers do not require a DTD, but if there is one, a validating parser will use it to check the XML document.) The formal specification requires every XML document to be well formed and treats validity as an additional constraint; ideally, your XML documents should be both.
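
To see the well-formedness requirement in action outside the browser, here is a small sketch that feeds the two preceding documents to Python’s standard xml.etree.ElementTree parser. That parser is non-validating, so it reports nesting errors but never checks a DTD or schema. The helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Well formed (one root element, correct nesting), but no DTD or schema.
WELL_FORMED = """<?xml version="1.0"?>
<DOCUMENT>
    <TITLE>
        A Noisy Noise Annoys An Oyster
    </TITLE>
</DOCUMENT>
"""

# <HEADING> opens inside <TITLE> but closes outside it: a nesting error.
BADLY_NESTED = """<?xml version="1.0"?>
    <TITLE>
        A Noisy Noise Annoys An Oyster
    <HEADING>
    </TITLE>
        A Study Of Shellfish And Audio Disturbances
    </HEADING>
"""

print(is_well_formed(WELL_FORMED))
print(is_well_formed(BADLY_NESTED))
```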

To make an XML document valid, you must check it against a DTD or schema. In the following sections, I’ll briefly discuss how to create both of these items and what they look like. (Note also that neither a DTD nor a schema is necessary before Internet Explorer will work with an XML document.)

XML Document Type Definitions

We’ve already seen how easy it is to create XML documents. If you want to make sure your documents are valid (that is, that they adhere to the syntax rules you set), however, you’ll need a DTD or a schema. Creating these items is more complex than creating XML documents. I’ll take a look at DTDs first.

You can use internal or external DTDs with XML documents. Here’s an example of an internal DTD. Note that you enclose the DTD in a <!DOCTYPE> declaration, which also names the root element of the document (THESIS here):

 

<?xml version="1.0"?>

<!DOCTYPE THESIS [                 

    <!ELEMENT THESIS (P*)>

    <!ELEMENT P (#PCDATA)>

]>

<THESIS>

    <P>

        This is my Ph.D. thesis.

    </P>

    <P>

        Do you like it?

    </P>

    <P>

        If so, please give me my Ph.D.

    </P>

</THESIS>

 

The DTD indicates how the syntax works for the XML elements you’re creating. For example, which elements can be inside which other elements? This DTD follows the W3C syntax conventions, which means that you specify each element with <!ELEMENT>. You can also specify that the contents of an element be parsed character data (#PCDATA), other elements that you’ve created, or both. In this example, I’m indicating that the <THESIS> element must contain only <P> elements, but it can contain zero or more occurrences of the <P> element (which is what the asterisk [*] after P in <!ELEMENT THESIS (P*)> means). The following list contains the symbols that you can use when defining the syntax of an element:

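a, b: a followed by b. (Inside a DTD content model, a comma separates the items of a sequence, as in (NAME,DATE,ORDERS).)

a | b: a or b but not both.

a - b: The set of strings represented by a but not represented by b. (This form belongs to the grammar notation of the XML specification; it cannot be written in a DTD content model.)

a?: a or nothing.

a+: One or more occurrences of a.

a*: Zero or more occurrences of a.

(expression): Surrounding an expression with parentheses means that it is treated as a unit and can carry the suffix operator ?, *, or +.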
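
The occurrence symbols in this list behave much like regular-expression operators, and the following Python sketch uses that similarity to test a list of child element names against a content model. It is only an illustration, not a DTD validator: it handles element-only models and ignores #PCDATA, mixed content, EMPTY, and ANY. The function names model_to_regex and children_match are invented for this example.

```python
import re


def model_to_regex(model):
    """Translate an element-only DTD content model into a compiled regex.

    Each element name becomes a token followed by one space, so a list of
    child tags such as ["P", "P"] can be matched as the string "P P ".
    """
    compact = re.sub(r"\s+", "", model)              # drop all whitespace
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",                         # every element name
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # A comma means "followed by", which in a regex is plain concatenation.
    return re.compile(pattern.replace(",", ""))


def children_match(child_tags, model):
    """Return True if the sequence of child tags satisfies the content model."""
    text = "".join(tag + " " for tag in child_tags)
    return model_to_regex(model).fullmatch(text) is not None


print(children_match(["P", "P", "P"], "(P*)"))
print(children_match([], "(P*)"))
print(children_match(["NAME", "DATE", "ORDERS"], "(NAME,DATE,ORDERS)"))
print(children_match(["DATE", "NAME", "ORDERS"], "(NAME,DATE,ORDERS)"))
```

The last call fails because a comma-separated sequence fixes the order of the children, which is why the <DATE> element cannot come before <NAME> in a <CUSTOMER> record.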

In addition to defining the <THESIS> element, I define the <P> element so that it can hold only text, that is, parsed character data, with the keyword #PCDATA:

 

<?xml version="1.0"?>

<!DOCTYPE THESIS [                 

    <!ELEMENT THESIS (P*)>

    <!ELEMENT P (#PCDATA)>

]>

<THESIS>

    <P>

        This is my Ph.D. thesis.

    </P>

    <P>

        Do you like it?

    </P>

    <P>

        If so, please give me my Ph.D.

    </P>

</THESIS>

 

In this way, I’ve specified the syntax of these two elements, <THESIS> and <P>. We’ll learn how to create DTDs, and we’ll review an extensive example in the section “Creating XML Documents with DTDs” in the “Immediate Solutions” section.

You can also specify an external DTD with the SYSTEM keyword in the <!DOCTYPE> declaration. (The SYSTEM keyword is primarily for private DTDs; as we’ll see in the “Immediate Solutions” section “Creating Public Extended XHTML DTDs,” you can also use the PUBLIC keyword to create public DTDs.) Here’s an example of an external DTD:

 

<?xml version="1.0"?>

<!DOCTYPE DOCUMENT SYSTEM "dtdthesis.dtd">

<DOCUMENT>

    <P>

        This is my Ph.D. thesis.

    </P>

    <P>

        Do you like it?

    </P>

    <P>

        If so, please give me my Ph.D.

    </P>

</DOCUMENT>

 

The file dtdthesis.dtd just contains the <!ELEMENT> declarations, like this:

 

<!ELEMENT DOCUMENT (P*)>

<!ELEMENT P (#PCDATA)>

 

And that’s all it takes to create an external DTD. Besides specifying the syntax of XML elements, you can also specify which attributes elements can have, as we’ll see in “Specifying Attributes in DTDs” in the “Immediate Solutions” section. In the meantime, I’ll take a look at creating XML schemas now.

XML Schemas

XML schemas are an alternative to DTDs, and are supported in some measure by Microsoft. If you want to create valid XML documents for use with Internet Explorer, you can use either a DTD or a schema.

NOTE: The XML Schema implementation that ships with Internet Explorer 5 is based on the XML-Data Note (www.w3.org/TR/1998/NOTE-XML-data-0105/) posted by the W3C in January 1998 and on the Document Content Description (DCD) Note (www.w3.org/TR/NOTE-dcd). It’s out of date with the current W3C XML Schema working draft, which you can find at www.w3.org/TR/xmlschema-0.

Here’s an example in which I’m creating an XML document with the root element <TASKFORCE>. To specify the name of a schema that resides in a separate file, you use the xmlns (XML namespace) attribute in an XML document, like this:

 

<?xml version="1.0" ?>

<TASKFORCE xmlns="x-schema:schema1.xml">

    <EMPLOYEE>George Patton</EMPLOYEE>

    <EMPLOYEE>Douglas MacArthur</EMPLOYEE>

    <DESCRIPTION>XML Programming Taskforce</DESCRIPTION>

</TASKFORCE>

 

In this case, I’m indicating that the schema for this XML document is schema1.xml. In schema1.xml, you start with the <SCHEMA> element like this:

 

<SCHEMA NAME="schema1">

    .

    .

    .

</SCHEMA>

NOTE: XML names are case-sensitive, so schema element and attribute names must be spelled exactly as the schema processor expects. Microsoft’s XML-Data Reduced reference writes them in mixed case (<Schema name="schema1">, <ElementType>, and <element>), so check the casing your parser requires if a schema fails to load.

 

To use schemas, you must include the following two xmlns attributes, which declare two XML namespaces. You set them to the Uniform Resource Names (URNs) that identify Microsoft’s schema and data type definitions:

 

<SCHEMA NAME="schema1"

    xmlns="urn:schemas-microsoft-com:xml-data"

    xmlns:dt="urn:schemas-microsoft-com:datatypes">

    .

    .

    .

</SCHEMA>

 

The xmlns attribute defines an XML namespace, and we’ve been using xmlns all along in XHTML documents to define the approved XHTML namespace, http://www.w3.org/1999/xhtml:

 

<?xml version="1.0"?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

    <head>

        <title>

    .

    .

    .

 

Namespaces were introduced to avoid element and attribute name clashes. For example, the code snippet just before the previous one declares a namespace prefix, dt (for data type), whose namespace includes an attribute named type. You might have already defined an attribute named type, so to avoid clashes with Microsoft’s attribute with the same name, you qualify the Microsoft attributes with the dt namespace prefix, like this: dt:type="int". You can also qualify element names with namespaces in the same way, such as <coriolis:document>, which creates a <document> element as defined in the coriolis namespace. Namespaces become important when you’re importing someone else’s XML element and attribute definitions, which you do when you’re creating a Microsoft XML schema. (I’ll cover how to use namespaces in schemas in “Creating XML Documents with Schemas” in the “Immediate Solutions” section.)
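
The following Python sketch shows how a namespace-aware parser keeps a plain type attribute and a dt:type attribute apart. The XML fragment is hypothetical (the chapter does not show an element carrying both attributes), and the parser is the standard library’s xml.etree.ElementTree, which replaces each prefix with the namespace name in braces.

```python
import xml.etree.ElementTree as ET

# The namespace name the chapter binds to the dt prefix.
DT_NS = "urn:schemas-microsoft-com:datatypes"

# Hypothetical fragment: one unqualified "type" attribute and one dt:type.
XML_TEXT = """<ITEM xmlns:dt="urn:schemas-microsoft-com:datatypes">
    <NUMBER type="count" dt:type="int">5</NUMBER>
</ITEM>
"""

number = ET.fromstring(XML_TEXT).find("NUMBER")

# The two attributes have different expanded names, so they do not clash.
plain_type = number.get("type")
qualified_type = number.get("{" + DT_NS + "}type")

print(plain_type)
print(qualified_type)
print(sorted(number.attrib))
```

The prefix itself (dt) is only shorthand; what distinguishes the two attributes is the namespace name the prefix stands for.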

To specify the syntax of an element in a schema, you use the <ELEMENTTYPE> element, as in this next example. I’m specifying that the <EMPLOYEE> and <DESCRIPTION> elements can contain only text and that their specifications are closed, which means that they cannot accept any content other than what is listed. (If you leave the specifications open, the element can contain content other than what you list.) Here’s the code:

 

<SCHEMA NAME="schema1"

    xmlns="urn:schemas-microsoft-com:xml-data"

    xmlns:dt="urn:schemas-microsoft-com:datatypes">

 

    <ELEMENTTYPE name="EMPLOYEE" content="textOnly" model="closed"/>

    <ELEMENTTYPE name="DESCRIPTION" content="textOnly" model="closed"/>

    .

    .

    .

</SCHEMA>

 

Here’s how I define the <TASKFORCE> element, which can contain both <EMPLOYEE> and <DESCRIPTION> elements but can contain only elements (not text). You specify this condition with the eltOnly keyword. I’m also specifying that the <EMPLOYEE> element must occur at least once, and the <DESCRIPTION> element must occur once, but only once, in the <TASKFORCE> element:

 

<SCHEMA NAME="schema1"

    xmlns="urn:schemas-microsoft-com:xml-data"

    xmlns:dt="urn:schemas-microsoft-com:datatypes">

 

    <ELEMENTTYPE name="EMPLOYEE" content="textOnly" model="closed"/>

    <ELEMENTTYPE name="DESCRIPTION" content="textOnly" model="closed"/>

 

    <ELEMENTTYPE name="TASKFORCE" content="eltOnly" model="closed">

        <ELEMENT type="EMPLOYEE" minOccurs="1" maxOccurs="*"/>

        <ELEMENT type="DESCRIPTION" minOccurs="1" maxOccurs="1"/>

    </ELEMENTTYPE>

</SCHEMA>
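
To make the constraints in this schema concrete, here is a hand-written Python sketch that applies the same three rules to a <TASKFORCE> document: only the listed children are allowed (model="closed"), no text appears directly inside the element (content="eltOnly"), and each child occurs within its minOccurs and maxOccurs bounds. It is not a schema processor and does not read schema1.xml; the names TASKFORCE_RULES and check_taskforce are invented for the example, the sample documents omit the xmlns attribute to keep the tag names plain, and the order of the children is not checked.

```python
import xml.etree.ElementTree as ET

# (minOccurs, maxOccurs) for each child of <TASKFORCE>; None stands for "*".
TASKFORCE_RULES = {
    "EMPLOYEE": (1, None),
    "DESCRIPTION": (1, 1),
}


def check_taskforce(xml_text):
    """Return a list of problems; an empty list means every check passed."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "TASKFORCE":
        problems.append("root element is not TASKFORCE")

    counts = {}
    for child in root:
        counts[child.tag] = counts.get(child.tag, 0) + 1

    # model="closed": only the listed child elements may appear.
    for tag in counts:
        if tag not in TASKFORCE_RULES:
            problems.append("unexpected element " + tag)

    # minOccurs / maxOccurs bounds.
    for tag, (low, high) in TASKFORCE_RULES.items():
        n = counts.get(tag, 0)
        if n < low or (high is not None and n > high):
            problems.append("{} occurs {} time(s)".format(tag, n))

    # content="eltOnly": no text directly inside <TASKFORCE>.
    texts = [root.text] + [child.tail for child in root]
    if any(t and t.strip() for t in texts):
        problems.append("text found directly inside TASKFORCE")
    return problems


GOOD = """<TASKFORCE>
    <EMPLOYEE>George Patton</EMPLOYEE>
    <EMPLOYEE>Douglas MacArthur</EMPLOYEE>
    <DESCRIPTION>XML Programming Taskforce</DESCRIPTION>
</TASKFORCE>"""

BAD = """<TASKFORCE>
    <DESCRIPTION>First description</DESCRIPTION>
    <DESCRIPTION>Second description</DESCRIPTION>
</TASKFORCE>"""

print(check_taskforce(GOOD))
print(check_taskforce(BAD))
```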

 

That completes our overview of XML DTDs and schemas; I’ll discuss how to work with XML documents in Internet Explorer 5 next.


