As you can imagine, this means a lot of work both for parser writers and for
parsers themselves. All SGML parsers are complex, but HTML parsers are even
more so because they try to anticipate and correct users' mistakes. XML is more
economical on all counts. (One of XML's design goals was that it shall be easy to
write programs that process XML documents.) XML parsers, especially nonvalidating
ones, are small and relatively easy to write, both because XML is simpler
than SGML and because XML's attitude to syntax errors is totally negative.
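As a rough illustration of how little a nonvalidating parser asks of the programmer, the following sketch uses Python's standard `xml.parsers.expat` module (our choice of parser for the example; nothing in the discussion depends on it) to count the elements in a document. The helper name `count_elements` is hypothetical. Note that the parser makes no attempt to guess what the author meant: at the first well-formedness error it raises `ExpatError`, which records the line and column where it gave up.

```python
import xml.parsers.expat


def count_elements(text: str) -> int:
    """Count the elements in a document using expat, a nonvalidating parser.

    count_elements is a hypothetical helper for illustration. Expat does not
    repair bad input: it raises ExpatError at the first well-formedness error.
    """
    parser = xml.parsers.expat.ParserCreate()
    count = 0

    def on_start(name, attrs):
        # Called once for every start tag (and for every empty-element tag).
        nonlocal count
        count += 1

    parser.StartElementHandler = on_start
    parser.Parse(text, True)  # True marks this as the final chunk of input
    return count


if __name__ == "__main__":
    print(count_elements("<doc><p>one</p><p>two</p></doc>"))  # 3
    try:
        count_elements("<doc><p>one</doc>")  # <p> is never closed
    except xml.parsers.expat.ExpatError as err:
        print("rejected at line", err.lineno, "column", err.offset, "-", err)
```

The whole "error recovery" strategy is a single exception: the caller either gets a result for a well-formed document or learns exactly where the document stopped being XML.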
Error handling by XML parsers is not only strict, it is also uniform. The W3C
XML recommendation, in a section on conformant processors, precisely specifies
what a processor must do in response to different kinds of errors. There is, as we
mentioned in Chapter 1, a test suite that is designed to test the parsers' compliance
with the XML specification, especially in its error handling. (See
http://oasis-open.org/committees/xml-conformance/xml-test-suite.shtml.)
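The idea behind a conformance suite can be sketched in a few lines: pair each test document with the verdict a conforming nonvalidating parser must reach, then check that the parser under test agrees. The cases below are made up for illustration and are not taken from the actual suite; `verdict`, `run_cases`, and the "wf"/"not-wf" labels are hypothetical names, and Python's `xml.etree.ElementTree` stands in for the parser being tested.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini test cases (not from the real conformance suite):
# each pairs a document with the verdict a nonvalidating parser must reach.
CASES = [
    ("<a/>", "wf"),
    ('<a x="1" y="2"/>', "wf"),
    ("<a><b></a></b>", "not-wf"),    # improperly nested elements
    ('<a x="1" x="2"/>', "not-wf"),  # duplicate attribute
    ("<a>AT&T</a>", "not-wf"),       # bare ampersand
    ("<a/><b/>", "not-wf"),          # two root elements
]


def verdict(text: str) -> str:
    """Return 'wf' if the document parses, 'not-wf' if parsing hits a fatal error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return "not-wf"
    return "wf"


def run_cases(cases):
    """Return the cases whose actual verdict differs from the expected one."""
    return [(doc, want) for doc, want in cases if verdict(doc) != want]


if __name__ == "__main__":
    print("failures:", run_cases(CASES))
```

Because the specification fixes what counts as a fatal error, every conforming parser should return the same verdicts; that uniformity is what makes a shared test suite meaningful.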
There are two main reasons for this strict attitude. First, XML parsers are
frequently used to mediate between computer applications or components within
an application. XML data is often generated by one program and consumed by
another program that performs computations on it. In this sort of configuration,
ill-formed data must be inadmissible. (In particular, it must not be possible to
feed the same XML data into two different browsers and see one of them succeed
and the other fail to parse it.) Second, XML was developed from the start in
anticipation of small mobile devices. A parser sitting in a cell phone, wristwatch, or
remote sensor cannot afford the megabytes of memory that are needed to
anticipate and accommodate grammatical errors.
Well-Formed Documents and Namespaces
NOTE Dave Raggett, a longtime staff member at W3C, wrote a remarkable
program called Tidy (http://www.w3.org/People/Raggett/tidy/). It per-
forms several functions on HTML documents: fixes grammatical errors,
points out deprecated features, and converts the HTML document to
XHTML. We will use Tidy in Chapter 7.
Why XHTML?
Why use XHTML instead of HTML? The main reason is that the entire array of
XML technologies becomes available to you. If you want your Web page to be
produced by an XSLT program, you have to make your template HTML material
conformant to XML rules, because an XSLT program is itself an XML document: if
you enter, for example, <br> instead of <br /> in the template, the parser will object.
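To see the objection concretely, the sketch below feeds two versions of the same template fragment to an XML parser. It uses Python's `xml.etree.ElementTree` as a stand-in for whatever parser an XSLT processor uses to read its stylesheet (an assumption made for illustration); the names `is_well_formed`, `HTML_STYLE`, and `XHTML_STYLE` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two versions of the same template fragment; only the second follows XML rules.
HTML_STYLE = "<p>First line<br>Second line</p>"     # <br> is never closed
XHTML_STYLE = "<p>First line<br />Second line</p>"  # empty-element tag


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(HTML_STYLE))   # False: </p> arrives while <br> is still open
    print(is_well_formed(XHTML_STYLE))  # True
```

An HTML browser happily accepts the first fragment because it knows that br has no content; an XML parser knows nothing about HTML, so it sees an element that is opened and never closed.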
Since 1999, HTML has in effect been mothballed, while XHTML has been an
area of active development. A quick look at the list of W3C recommendations
shows XHTML 1.1: Module-Based XHTML (May 2001) and XHTML Basic for