The main difference between XML languages and SGML languages such as
HTML is that XML documents can be parsed without a DTD, whereas
SGML documents (whether in HTML or any other SGML language) can be parsed
only with the help of the DTD. This is because, in SGML languages, the end tag of
an element can frequently be omitted even if the element is not empty: in HTML,
you don't have to close off your <p>s with a </p>. For HTML empty elements such
as <br>, the end tag is left out altogether: nobody puts <br></br> in a Web page.
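The point about parsing without a DTD can be checked with a short sketch. It assumes Python's standard xml.etree.ElementTree module as a stand-in for any non-validating XML parser; the helper name is_well_formed and the two sample strings are made up for illustration and do not come from the listings in this chapter.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML with no DTD available.

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every element is closed explicitly, so the parser can find where each
# element ends without consulting a DTD.
xml_style = "<body><p>first</p><p>second<br/></p></body>"

# HTML-style tag omission: with no DTD, the parser has no way of knowing
# that a new <p> ends the previous one or that <br> is an empty element.
html_style = "<body><p>first<p>second<br></body>"

print(is_well_formed(xml_style))
print(is_well_formed(html_style))
```

The second string is rejected not because the markup is bad HTML, but because an XML parser has only the tags themselves to go on. An SGML parser would need the DTD to fill in the missing end tags.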
HTML vs. XHTML
Listing 2-1 provides an example of a perfectly grammatical HTML document
(paralist.htm); it uses CSS within a style attribute to specify the font properties
for the first <p> element:
Listing 2-1. An HTML Document
<html>
<head><title>HTML Example</title></head>
<body bgcolor=#ffffef>
<h1>Heading</h1>
<p style=color:maroon;font-size:2em>a paragraph with <em>italics</em>
followed by a list
<ul>
<li>item one
<li>item two
</ul>
<p>Another paragraph with a line break <br> in the middle.
</body>
</html>
What would the element tree for this document look like? Figure 2-1 shows
one possibility.