home » free library » Microsoft Press » Chapter 3: Creating DTDs from the book XML Pocket Consultant Fri, Jul 13, 2007
Chapter 3: Creating DTDs

In the previous chapter you learned about the basic structures for creating XML documents. As mentioned previously, XML documents can contain many different types of markup, including elements, attributes, and entity references. Whether you generate documents manually in an editor, programmatically in an application, or automatically using a document management system, you'll often need to ensure that documents conform to a specific set of rules. That is, you'll want to ensure that not only are the data structures in the documents formatted correctly, but also that the documents can be understood by the applications that will process them. What you need is a way of expressing the necessary data structures as a set of rules and ensuring conformity. What you need is a custom application of XML. Enter document type definitions (DTDs) and schemas.

Both DTDs and schemas allow you to create XML applications. An XML application defines a custom markup language that describes specific types of data and uses the rules set out in the associated DTD or schema. The DTD/schema rules specify items that are allowed or required in compliant documents. Once you create an XML application using a DTD or schema, you can write documents that conform to your custom markup language. Application software, database systems, and other programs can use the DTD/schema to interpret compliant documents and ensure conformity to the rule set. DTDs are the focus of this chapter. You'll learn more about schemas in Part III, "XML Schemas."

Understanding DTDs

DTDs have a formal—fairly rigid—syntax that precisely describes the elements and entities that may appear in a document, as well as the contents and attributes for acceptable elements. In a DTD you could specify that a purchase order must have one and only one order number but can have one or more requested products. You could go on to specify that each purchase order must have one order date and one customer identifier but no more. These details in the DTD would allow programs to determine if purchase orders are valid.

Validity is an important concept when DTDs are used. If a document is valid, it can be said that it conforms to its DTD. If a document is invalid, the document doesn't conform to its DTD. However, keep in mind that validation is an optional step in processing XML. Programs that use validating parsers can compare documents to their DTD and list places where the document differs from the DTD specification. The programs can then determine actions to take regarding noncompliance with the DTD. Some programs may mark the document as invalid and stop processing it. Other programs may try to correct problems in the document and reprocess it.

Although DTDs can help you specify constraints for documents, DTDs don't specify every nuance of a document's format. Among other things, a DTD doesn't control allowed values, the denotation of elements (explicit meaning), the connotation of elements (figurative meaning), or the character data that can be associated with elements. This allows for flexibility in the document structure so that you can create many types of documents using the same set of rules.

All valid documents include a reference to the DTD to which they conform. DTDs aren't mandatory, however. When a document lacks a DTD, the XML processor can't verify that the data format is appropriate, but it can still attempt to interpret the data.

DTDs can be specified in several different ways. An internal DTD is one that is defined within a document. An external DTD is one that is defined in a separate document and is imported into the document. Both types of DTDs have their advantages and disadvantages.

Internal DTDs are convenient when you want to apply constraints to an individual document and then easily distribute the document along with its DTD. They're also convenient when you're developing a complex DTD and want to test an example document against the DTD. Putting the DTD and the related markup in the same file makes it easy to modify the DTD and the example document as often as necessary during testing.

With an external DTD, you place a reference to a DTD in a file rather than the DTD itself. This makes it easy to apply the DTD to multiple documents. Because the DTD is referenced rather than included, you can make changes to the DTD later and you don't need to edit the DTD definition in each and every document to which it's applied. Two types of external DTDs are used:

  • Public  Public DTDs are DTDs that have been standardized and provide a publicly available set of rules for writing specific types of XML documents, such as those used by the airlines or insurance industries.
  • Nonpublic  Nonpublic DTDs are DTDs created by private organizations or individuals. Generally speaking, these DTDs aren't publicly available (or haven't become a public standard).

When you use an external DTD, you should set the standalone attribute of the XML declaration to no, such as:

<?xml version="1.0" encoding="US-ASCII" standalone="no"?>


Working with Internal DTDs

You specify internal DTDs using the DOCTYPE assignment, also known as the document type declaration. It's one of the most basic constructs in an XML document. Just as the root element is a container for all other elements, the DOCTYPE declaration is a container for all DTD assignments.

Declaring Internal DTDs

Internal DTD declarations are formatted as follows:

<!DOCTYPE root_name [ assignments ]>

The declaration begins with the DOCTYPE keyword, followed by the name of the root element for the document. Typically, the name of the root element serves as a descriptor for the type of information the document contains. The root name is followed by an open bracket, which signifies the beginning of the declaration assignments. Because there's usually a large group of declarations, assignments are normally entered on separate lines following the document type declaration. The last entry in the document type declaration is always the closing bracket followed by the greater-than sign (]>).

Following this, if you wanted to structure a set of purchase orders, you might define the document type as follows:

<?xml version="1.0" ?>
<!DOCTYPE purchase_order [
 
]>

Within the DTD for the purchase_order document, you could then define elements, such as:

<!DOCTYPE purchase_order [
 <!ELEMENT purchase_order (customer)>
 <!ELEMENT customer (account_id, name)>
 <!ELEMENT account_id (#PCDATA)>
 <!ELEMENT name (first, mi, last)>
 <!ELEMENT first (#PCDATA)>
 <!ELEMENT mi (#PCDATA)>
 <!ELEMENT last (#PCDATA)>
]>

The example DTD declares seven elements (purchase_order, customer, account_id, name, first, mi, and last) and sets the order in which those elements may be entered in a document. The line breaks used aren't relevant to the DTD, and neither is the order in which the elements are listed. Although the elements are entered from the highest level to the lowest level, you could also enter them in this order:

<!DOCTYPE purchase_order [
 <!ELEMENT last (#PCDATA)>
 <!ELEMENT mi (#PCDATA)>
 <!ELEMENT first (#PCDATA)>
 <!ELEMENT name (first, mi, last)>
 <!ELEMENT account_id (#PCDATA)>
 <!ELEMENT customer (account_id, name)>
 <!ELEMENT purchase_order (customer)>
]>

or this order:

<!DOCTYPE purchase_order [
 <!ELEMENT account_id (#PCDATA)>
 <!ELEMENT customer (account_id, name)>
 <!ELEMENT first (#PCDATA)>
 <!ELEMENT last (#PCDATA)>
 <!ELEMENT mi (#PCDATA)>
 <!ELEMENT name (first, mi, last)>
 <!ELEMENT purchase_order (customer)>
]>

As these examples show, the order of DTD declarations isn't important. What is important are the declaration, the declaration name, and the associated values. In the case of elements, the values in parentheses set the order in which the elements must be used. Here, customer elements must contain exactly one account_id element followed by exactly one name element. The name element must contain exactly one first element followed by exactly one mi element followed by exactly one last element. The first, mi, and last elements must contain parsed character data (#PCDATA), which is raw text that could contain entity references, such as &gt; or &lt;, but doesn't contain other markup or child elements.

The spacing between the declaration name and other elements is also important. The following declaration is improperly formatted:

<!ELEMENTaccount_id (#PCDATA)>

as is the following declaration:

<!ELEMENT account_id(#PCDATA)>

The correct format is:

<!ELEMENT account_id (#PCDATA)>
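If you want to confirm the spacing rule for yourself, one option is to hand each declaration to a parser and see whether it's accepted. The following is a minimal sketch that uses Python's built-in expat parser (xml.parsers.expat), which isn't part of this chapter and is used here only for illustration. The helper name is_well_formed and the wrapper element doc are made up for the example.

```python
import xml.parsers.expat as expat


def is_well_formed(declaration):
    """Wrap one declaration in a tiny document and report whether expat accepts it."""
    text = "<!DOCTYPE doc [\n" + declaration + "\n]>\n<doc/>"
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)
        return True
    except expat.ExpatError:
        return False


# The three declarations discussed above: two with missing spaces, one correct.
for decl in ("<!ELEMENTaccount_id (#PCDATA)>",
             "<!ELEMENT account_id(#PCDATA)>",
             "<!ELEMENT account_id (#PCDATA)>"):
    print(decl, "->", "accepted" if is_well_formed(decl) else "rejected")
```

Because expat is a nonvalidating parser, this checks only the syntax of the declaration, not whether the document conforms to it.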

Listing 3-1 provides the source for a basic document with an internal DTD. As you can see, the internal DTD definition follows the XML declaration and is in turn followed by the document's contents. The opening tag for the root element, purchase_order, is the first tag in the body of the document. It's followed by other elements in the order prescribed in the DTD. The closing tag for the root element is the last item in the document.

Listing 3-1.  An XML Document with an Internal DTD

<?xml version="1.0" ?>
<!DOCTYPE purchase_order [
 <!ELEMENT purchase_order (customer)>
 <!ELEMENT customer (account_id, name)>
 <!ELEMENT account_id (#PCDATA)>
 <!ELEMENT name (first, mi, last)>
 <!ELEMENT first (#PCDATA)>
 <!ELEMENT mi (#PCDATA)>
 <!ELEMENT last (#PCDATA)>
]>

<purchase_order>
 <customer>
  <account_id>10-487</account_id>
  <name>
   <first> William </first>
   <mi> R </mi>
   <last> Stanek </last>
  </name>
  </customer>
</purchase_order>
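To see how a program might compare a document such as Listing 3-1 against its DTD, consider the following rough sketch. It uses Python's built-in expat parser, which reads the declarations but doesn't validate, so the comparison is written by hand. It handles only the plain comma-separated sequences used in this chapter, such as (first, mi, last), and isn't a substitute for a validating parser. The function name sequence_problems is made up for the example.

```python
import xml.parsers.expat as expat

LISTING_3_1 = """<?xml version="1.0" ?>
<!DOCTYPE purchase_order [
 <!ELEMENT purchase_order (customer)>
 <!ELEMENT customer (account_id, name)>
 <!ELEMENT account_id (#PCDATA)>
 <!ELEMENT name (first, mi, last)>
 <!ELEMENT first (#PCDATA)>
 <!ELEMENT mi (#PCDATA)>
 <!ELEMENT last (#PCDATA)>
]>
<purchase_order>
 <customer>
  <account_id>10-487</account_id>
  <name>
   <first> William </first>
   <mi> R </mi>
   <last> Stanek </last>
  </name>
 </customer>
</purchase_order>
"""


def sequence_problems(xml_text):
    """Return (element, expected_children, found_children) for each mismatch."""
    model = expat.model
    expected = {}        # element name -> required child names, in order
    open_children = []   # one list of child names per currently open element
    problems = []

    def on_element_decl(name, content):
        ctype, quant, _unused, children = content
        # Record only simple sequences: (a, b, c) with no ?, * or + anywhere.
        simple = (ctype == model.XML_CTYPE_SEQ
                  and quant == model.XML_CQUANT_NONE
                  and all(c[0] == model.XML_CTYPE_NAME
                          and c[1] == model.XML_CQUANT_NONE for c in children))
        if simple:
            expected[name] = [c[2] for c in children]

    def on_start(name, attrs):
        if open_children:
            open_children[-1].append(name)   # note this child on its parent
        open_children.append([])

    def on_end(name):
        found = open_children.pop()
        if name in expected and found != expected[name]:
            problems.append((name, expected[name], found))

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)
    return problems


print(sequence_problems(LISTING_3_1))
```

If you remove the mi element from the document and run the check again, the name element is reported, because its children no longer match the sequence declared in the DTD.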

Adding Internal DTDs to Documents

To declare an internal DTD in an XML document, follow these steps:

  1. Open your XML document for editing. At the top of the document following the XML declaration, type <!DOCTYPE root_name [, where root_name is the name of the document's root element.
  2. Enter a few blank lines, in which you'll later enter your declarations as discussed in Chapter 4, "XML Elements in DTDs," Chapter 5, "XML Attributes in DTDs," and Chapter 6, "XML Entities and Notations in DTDs."
  3. Type ]> to complete the DTD.

The result should look similar to the following example:

<?xml version="1.0" ?>
<!DOCTYPE purchase_order [
 
]>

Working with External DTDs

You specify external DTDs using a DOCTYPE assignment that contains a Uniform Resource Identifier (URI). The URI in the assignment identifies the location of the DTD. Because URIs are a superset of Uniform Resource Locators (URLs) and Uniform Resource Names (URNs), XML documents can reference both a URL and a URN.

The DOCTYPE declaration must occur after the XML declaration but before the root element. Officially, the part of the XML document before the root element start tag is called the prolog. You can think of the prolog as a header, much like the header in HTML documents.

As discussed previously in this chapter, there are two types of external DTDs: public and nonpublic. The sections that follow examine each type of external DTD.

Declaring Public External DTDs

Standard, publicly accessible DTDs are specified using the keyword PUBLIC in the DOCTYPE declaration. A public DTD can have a public ID, officially referred to as a formal public identifier (FPI). The idea is that an XML parser could use the public ID to find the latest version of the DTD on a public server. In practice, however, most XML parsers rely on the URI that follows the public ID to locate the DTD and validate documents.

The following document type declaration refers to the version 2.0 DTD specification for XML 1.0:

<!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.0//EN"
     "/XML/1998/06/xmlspec-v20.dtd">

By examining the previous declaration, you can learn many things about how declarations are defined and used. The example declaration says that the root element is spec and then specifies information about the DTD's owner and location. The owner information is supplied first as a URN:

-//W3C//DTD Specification V2.0//EN

The double slashes (//) separate categories of information regarding the DTD and its owner:

  • - (Minus)  Indicates that the DTD's owner isn't a registered owner, which is the usual case for DTDs that aren't formally recognized standards. A plus (+) here would have meant that the owner is registered.
  • W3C  Specifies the owner of the DTD as the World Wide Web Consortium (W3C). This means that the W3C wrote and maintains the DTD. The owner can be a person or an organization.
  • DTD Specification V2.0  Sets a descriptive label for the DTD. The label can contain any standard characters except double slashes (//).
  • EN  A two-letter abbreviation for the language of the XML documents to which the DTD applies. In this case the language is English. A complete list of two-letter language abbreviations is specified in ISO 639-1.
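The four fields described above can be pulled apart with a few lines of code. The following sketch, written in Python purely for illustration, splits a public ID on the double slashes. The function name parse_public_id and the field names in the result are made up for the example.

```python
def parse_public_id(public_id):
    """Split a public ID of the form prefix//owner//label//language."""
    parts = public_id.split("//")
    if len(parts) != 4:
        raise ValueError("expected four fields separated by //: %r" % public_id)
    prefix, owner, label, language = parts
    return {"prefix": prefix, "owner": owner, "label": label, "language": language}


print(parse_public_id("-//W3C//DTD Specification V2.0//EN"))
```

Because the double slash is the field separator, a label that contained // would be split into extra fields and rejected here, which is why the label can't contain double slashes.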

Unlike the URN, which allows the DTD to be located by a publicly identified name, the next section of the declaration refers to a static URL:

"/XML/1998/06/xmlspec-v20.dtd"

In this example the URL is relative to a location on a specific server but could also have been an absolute URL that pointed to a specific location on a remote server, such as:

"http://www.w3.org/XML/1998/06/xmlspec-v20.dtd"

The important thing to note about the URL is the DTD file name, which is xmlspec-v20.dtd. As with XML file names, the extension for DTDs doesn't have to be .dtd, as shown. However, the .dtd extension does make it easier for you and other developers to locate your DTDs.

Listing 3-2 provides the source for a basic document with an external DTD that is public. As with an internal DTD definition, an external DTD definition declares the root element, which in this case is purchase_order. The document type declaration is in turn followed by the document's contents.

Listing 3-2.  An XML Document with a Public External DTD

<?xml version="1.0" standalone="no"?>
<!DOCTYPE purchase_order PUBLIC "-//Stanek//PO Specification//EN"
 "http://www.tvpress.com/pospec.dtd">
<purchase_order>
 <customer>
  <account_id>10-487</account_id>
  <name>
   <first> William </first>
   <mi> R </mi>
   <last> Stanek </last>
  </name>
 </customer>
</purchase_order>
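A program can read the pieces of the prolog without fetching the DTD at all. The following sketch uses Python's built-in expat parser, chosen here only for illustration, to report what Listing 3-2 declares. By default, expat doesn't retrieve the external DTD, so no network access is involved. The function name describe_prolog and the keys in the result are made up for the example.

```python
import xml.parsers.expat as expat

LISTING_3_2 = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE purchase_order PUBLIC "-//Stanek//PO Specification//EN"
 "http://www.tvpress.com/pospec.dtd">
<purchase_order>
 <customer>
  <account_id>10-487</account_id>
  <name>
   <first> William </first>
   <mi> R </mi>
   <last> Stanek </last>
  </name>
 </customer>
</purchase_order>
"""


def describe_prolog(xml_text):
    """Collect the standalone setting and the parts of the DOCTYPE declaration."""
    info = {}

    def on_xml_decl(version, encoding, standalone):
        # expat reports 1 for yes, 0 for no, and -1 when the attribute is omitted.
        info["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["root"] = name
        info["public_id"] = public_id
        info["system_id"] = system_id
        info["internal_subset"] = bool(has_internal_subset)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return info


for key, value in describe_prolog(LISTING_3_2).items():
    print(key, "=", value)
```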

Adding Public External DTDs to Documents

To add a public external DTD to an XML document, follow these steps:

  1. Open your XML document for editing. In the XML declaration, add standalone="no" (or replace an existing value of yes with no).
  2. Type <!DOCTYPE root_name, where root_name is the name of the document's root element.
  3. Type PUBLIC to indicate that the external DTD is publicly accessible. Be sure there's a space before and after the keyword.
  4. Type the public ID of the external DTD between quotation marks, such as: "-//Stanek//PO Specification//EN".
  5. Type the URL for the public DTD between quotation marks, such as: "http://www.microsoft.com/pospec.dtd".
  6. Type > to complete the declaration.

The result should look similar to the following:

<!DOCTYPE purchase_order PUBLIC "-//Stanek//PO Specification//EN"
 "http://www.microsoft.com/pospec.dtd">

Declaring Nonpublic External DTDs

DTDs that organizations and individuals create for their own purposes are personal DTDs and are declared with the keyword SYSTEM rather than PUBLIC. With nonpublic DTDs, the standard declaration usually looks like this:

<!DOCTYPE purchase_order SYSTEM 
 "http://www.microsoft.com/pospec.dtd">

In this example:

  • purchase_order  The designator for the root element, which is the name of the root of the XML tree.
  • SYSTEM  Identifies an external DTD that isn't public and generally is created by an organization or individual for their own purposes.
  • http://www.microsoft.com/pospec.dtd  Sets the URL for the DTD, which could be relative to a specific location or an absolute URL to a specific location on a remote server.

Listing 3-3 provides the source for a basic document with an external DTD that isn't public. Again, the document type declaration is followed by the contents of the document. Note that the DTD location reflects only a file name, which means the file is in the same directory as the associated XML document.

Listing 3-3.  An XML Document with a Nonpublic External DTD

<?xml version="1.0" standalone="no"?>
<!DOCTYPE purchase_order SYSTEM "pospec.dtd">
<purchase_order>
 <customer>
  <account_id>10-487</account_id>
  <name>
   <first> William </first>
   <mi> R </mi>
   <last> Stanek </last>
  </name>
 </customer>
</purchase_order>

Adding Nonpublic DTDs to Documents

To add a nonpublic DTD to an XML document, follow these steps:

  1. Open your XML document for editing. In the XML declaration, add standalone="no" (or replace an existing value of yes with no).
  2. Type <!DOCTYPE root_name, where root_name is the name of the document's root element.
  3. Type SYSTEM to indicate that the external DTD is nonpublic and nonstandardized. Be sure there's a space before and after the keyword.
  4. Type the URL for the DTD between quotation marks, such as: "pospec.dtd".
  5. Type > to complete the declaration.

The result should look similar to the following:

<!DOCTYPE purchase_order SYSTEM "pospec.dtd">

Resolving Errors with Externally Referenced DTDs

The XML parser processing a document must be able to locate external DTDs using the URI you've provided. If the parser is unable to locate the DTD, the error you'll see usually specifies that the system can't locate the resource specified or that there was an error processing the resource DTD.

If this occurs, check the accuracy and syntax of the URL you're using. Keep in mind that URLs in the form dir_name/file.dtd are located relative to the current directory but URLs in the form /dir_name/file.dtd are located relative to the root directory for the server or current file system. Further, if only a filename is specified as the URL, the DTD is expected to be in the same directory as the associated XML document.
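When you're tracking down a DTD that can't be found, it can help to work out the full address that a given URL resolves to. The following sketch uses Python's standard urllib.parse.urljoin function for this. The document location http://www.example.com/orders/purchase.xml and the dtds directory are hypothetical, and a particular parser may resolve local file paths somewhat differently.

```python
from urllib.parse import urljoin


def resolve_dtd_url(document_url, system_id):
    """Resolve the URL from a DOCTYPE declaration against the document's own URL."""
    return urljoin(document_url, system_id)


DOCUMENT_URL = "http://www.example.com/orders/purchase.xml"  # hypothetical location

for system_id in ("pospec.dtd",            # file name only
                  "dtds/pospec.dtd",       # relative to the document's directory
                  "/dtds/pospec.dtd",      # relative to the server root
                  "http://www.microsoft.com/pospec.dtd"):  # absolute URL
    print(system_id, "->", resolve_dtd_url(DOCUMENT_URL, system_id))
```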

Combining Internal and External DTDs

XML documents can have an internal and an external DTD. To do this, you add the internal DTD declarations after specifying the location of the external DTD. As before, the internal declarations begin with the open bracket ([) and end with the closing bracket and the greater than sign (]>).

Listing 3-4 shows an example document with an internal DTD and a nonpublic external DTD. Note the reference to the external nonpublic DTD (pospec.dtd) as well as the internal DTD declarations.

Listing 3-4.  An XML Document with an Internal and External DTD

<?xml version="1.0" standalone="no"?>
<!DOCTYPE purchase_order SYSTEM "pospec.dtd"[
 <!ELEMENT purchase_order (customer)>
 <!ELEMENT customer (account_id, name)>
 <!ELEMENT account_id (#PCDATA)>
 <!ELEMENT name (first, mi, last)>
 <!ELEMENT first (#PCDATA)>
 <!ELEMENT mi (#PCDATA)>
 <!ELEMENT last (#PCDATA)>
]>

<purchase_order>
 <customer>
  <account_id>10-487</account_id>
  <name>
   <first> William </first>
   <mi> R </mi>
   <last> Stanek </last>
  </name>
 </customer>
</purchase_order>

If you decide to create documents with an internal and external DTD, you should ensure that the two DTDs are compatible. When determining compatibility, keep in mind these basic rules:

  • Neither DTD can override the element or attribute declarations of the other. This also means that the DTDs can't contain the same element or attribute declarations. For a document such as Listing 3-4 to be valid, for example, pospec.dtd can't also declare the seven elements that are declared internally.
  • Entities can be declared in the external DTD and redefined in the internal DTD. If there are conflicting entity declarations, the first declaration has precedence. Because internal DTDs are read first, internal declarations have precedence over identically named external declarations. The external declaration is still read, but it doesn't replace the definition set by the internal declaration.
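The precedence rule for entities can be observed with a short experiment. The following sketch uses Python's built-in expat parser, chosen here only for illustration, and supplies the external DTD from an in-memory dictionary instead of a real file. The entity name company, both replacement strings, and the function name expand are made up for the example, and the one-line pospec.dtd is a stand-in, not the DTD used elsewhere in this chapter.

```python
import xml.parsers.expat as expat

# Stand-in for the external DTD file; it declares the same entity as the document.
EXTERNAL_DTDS = {"pospec.dtd": '<!ENTITY company "External Corp">'}

DOCUMENT = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE purchase_order SYSTEM "pospec.dtd" [
 <!ENTITY company "Internal Corp">
]>
<purchase_order>&company;</purchase_order>
"""


def expand(xml_text, external_dtds):
    """Return the document's character data and the external DTDs that were read."""
    text, loaded = [], []
    parser = expat.ParserCreate()
    # Ask expat to read the external DTD; by default it skips it.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def on_external(context, base, system_id, public_id):
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(external_dtds[system_id], True)
        loaded.append(system_id)
        return 1

    parser.ExternalEntityRefHandler = on_external
    parser.CharacterDataHandler = text.append
    parser.Parse(xml_text, True)
    return "".join(text), loaded


print(expand(DOCUMENT, EXTERNAL_DTDS))
```

The external DTD is read, but the reference in the document expands to the internally declared value because that declaration was encountered first.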

Adding Internal and External DTDs to Documents

To add an internal and external DTD to an XML document, follow these steps:

  1. Open your XML document for editing. In the XML declaration, add standalone="no" (or replace an existing value of yes with no).
  2. Type <!DOCTYPE root_name, where root_name is the name of the document's root element.
  3. Type PUBLIC or SYSTEM as appropriate. The PUBLIC keyword indicates a public, standard DTD. The SYSTEM keyword indicates a nonpublic, nonstandard DTD.
  4. If you're specifying a public external DTD, type the public ID of the external DTD between quotation marks, such as: "-//Stanek//PO Specification//EN".
  5. Type the URL for the external DTD between quotation marks, such as: "http://www.microsoft.com/pospec.dtd".
  6. Type [.
  7. Enter a few blank lines, in which you'll later enter your declarations as discussed in Chapters 4, 5, and 6.
  8. Type ]> to complete the DTD.

The result should look similar to the following example:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE purchase_order PUBLIC "-//Stanek//PO Specification//EN"
      "http://www.microsoft.com/pospec.dtd" [
 
]>

Writing External DTD Files

DTD files are standard Unicode or ASCII text files that contain the definitions you're declaring externally. In the DTD file, you don't enter the DOCTYPE declaration or any formatting characters other than those required for the definitions being declared. Although the file can be named with any valid system name, it's better to name the file with the .dtd extension. The .dtd extension makes it easy to recognize that the file contains a DTD.

Listing 3-5 provides the contents of an external DTD file and shows how the file could be referenced in a conforming document.

Listing 3-5.  An External DTD File with an Associated XML Document File

Filename: pospec.dtd

<!ELEMENT purchase_order (customer)>
<!ELEMENT customer (account_id, name)>
<!ELEMENT account_id (#PCDATA)>
<!ELEMENT name (first, mi, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT mi (#PCDATA)>
<!ELEMENT last (#PCDATA)>

Filename: purchase.xml

<?xml version="1.0" standalone="no"?>
<!DOCTYPE purchase_order SYSTEM "pospec.dtd">
<purchase_order>
 <customer>
  <account_id>10-487</account_id>
  <name>
   <first> William </first>
   <mi> R </mi>
   <last> Stanek </last>
  </name>
 </customer>
</purchase_order>
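To check that a document and its external DTD file work together, you can have a parser read both and list what the DTD declares. The following sketch uses Python's built-in expat parser, chosen here only for illustration. It holds the two files from Listing 3-5 as in-memory strings rather than reading them from disk, and the function name declared_elements is made up for the example.

```python
import xml.parsers.expat as expat

# In-memory copies of the two files from Listing 3-5.
DTD_FILES = {"pospec.dtd": """<!ELEMENT purchase_order (customer)>
<!ELEMENT customer (account_id, name)>
<!ELEMENT account_id (#PCDATA)>
<!ELEMENT name (first, mi, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT mi (#PCDATA)>
<!ELEMENT last (#PCDATA)>
"""}

PURCHASE_XML = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE purchase_order SYSTEM "pospec.dtd">
<purchase_order>
 <customer>
  <account_id>10-487</account_id>
  <name>
   <first> William </first>
   <mi> R </mi>
   <last> Stanek </last>
  </name>
 </customer>
</purchase_order>
"""


def declared_elements(xml_text, dtd_files):
    """Return the element names declared by the document's external DTD."""
    names = []
    parser = expat.ParserCreate()
    # Ask expat to read the external DTD; by default it skips it.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def on_element_decl(name, model):
        names.append(name)

    def on_external(context, base, system_id, public_id):
        # Parse the DTD text with a sub-parser tied to the main parser.
        sub = parser.ExternalEntityParserCreate(context)
        sub.ElementDeclHandler = on_element_decl
        sub.Parse(dtd_files[system_id], True)
        return 1

    parser.ElementDeclHandler = on_element_decl
    parser.ExternalEntityRefHandler = on_external
    parser.Parse(xml_text, True)
    return names


print(declared_elements(PURCHASE_XML, DTD_FILES))
```

To read real files instead, you could replace the dictionary lookup with code that opens the file named by system_id.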

Although DTD files can contain blank lines and properly formatted comments, they shouldn't contain XML or DOCTYPE declarations. Additionally, XML elements aren't allowed inside a DTD. An example DTD file with properly formatted comments follows:

<!-- Purchase Order Specification V2.1 -->
<!-- Author: William Stanek -->
<!-- Last modified: 12/15/01 -->
<!ELEMENT purchase_order (customer)>
<!ELEMENT customer (account_id, name)>
<!ELEMENT account_id (#PCDATA)>
<!ELEMENT name (first, mi, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT mi (#PCDATA)>
<!ELEMENT last (#PCDATA)>

Note that DTD comments use the same syntax as standard XML comments, beginning with <!-- and ending with -->.
