XML for Web Developers – in 500 words or less!
Compiled by perfectxml.com Team
Friday, November 01, 2002
Recently, Andrew Watt (author of the book Sams Teach Yourself XML in 10 Minutes) started an interesting discussion thread on the XML-DEV mailing list.
The topic was to describe XML in 500 words or less to an "ordinary" Web developer (one who has no formal Computer Science training).
The first reply came from Mike Champion and he referred to his reply from the
At the most basic level, XML is simply a standardized syntax, or
metalanguage for building languages, for describing labelled trees.
But such a definition understates the potential of XML just as badly
as the statement "XML is the silver bullet for interoperability"
overstates it. "XML" provides not only this rather abstract
metalanguage, but also a community of experts, tools, examples, and
associated specifications that provide a substantial percentage of
the infrastructure that underlies Web-based applications and business
XML is about standardized syntax, labels, and trees.
The "standardized syntax" means that you usually don't have to
write your own grammars and parsers, you can leverage XML. XML is
not a perfect or optimal syntax, but it's good enough for most
purposes, and its limitations for any particular problem are
generally overwhelmed by the benefits provided by the network effect.
The "labels" make it much easier for human readers and simple
programs to associate a value with a description of what it is all
about. This is not a general solution to the (probably insolvable)
problem of representing the "meaning" of data, but it is a very
practical solution to real world information exchange problems. C'mon
folks, there's no "standard" for paper invoices, and they have been written
up using different languages, formats, currencies, and private
vocabularies for centuries. I submit that human accounting clerks in
bureaucratic organizations are not all that much better at pattern
recognition than modern specialized software can be, and if "labels"
have taken the world of business as far as they have, that's something
worth taking seriously. If formal type systems and ontologies can do
better, fine ... but I'll believe that when I see it working well in
"Trees" mean that an XML document can describe a hierarchical
package of inter-related information. This lets XML easily represent
"documents" and "messages" of the sort that have enabled human and
organization communication since the invention of paper.
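To make the "labels" and "trees" in Mike's description concrete, here is a minimal sketch of our own (it is not part of the thread). It uses Python's standard xml.etree.ElementTree module; the invoice document and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice; every element and attribute name here is made up.
INVOICE = """
<invoice number="1001">
  <customer>Acme Ltd</customer>
  <item>
    <description>Widget</description>
    <price currency="USD">9.50</price>
  </item>
</invoice>
"""

def walk(element, depth=0):
    """Return one indented line per element: label, attributes, text value."""
    line = "  " * depth + element.tag
    if element.attrib:
        line += " " + str(dict(element.attrib))
    text = (element.text or "").strip()
    if text:
        line += ": " + text
    lines = [line]
    for child in element:  # children are the sub-trees of this node
        lines.extend(walk(child, depth + 1))
    return lines

root = ET.fromstring(INVOICE)
outline = walk(root)
print("\n".join(outline))
```

Each printed line pairs a label with its value, and the indentation mirrors the hierarchy. No custom grammar or parser had to be written to get there.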
Andrew replied, describing terms such as "metalanguage" and "labelled tree" as
"XML geek terminology"!
How would one explain XML to a "typical" Web developer?
Mike replied, agreeing that "it is hard to de-geekify one's vocabulary", but also asked Andrew to
be "more specific about what we can assume about typical Web developers".
Dare Obasanjo from Microsoft replied, describing XML in one sentence:
XML is a format for describing structured data using a syntax similar to HTML but with stricter rules.
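As a small illustration of the "stricter rules" (our own sketch, not part of Dare's reply), the snippet below feeds two fragments to Python's standard XML parser. The helper name is_well_formed is hypothetical; the first fragment is the kind of markup a browser would tolerate as HTML, the second is its well-formed XML equivalent.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# HTML habit: <br> is left open, so </p> closes the wrong element.
html_style = "<p>First line<br>Second line</p>"
# XML requires every element to be closed; <br/> is an empty element.
xml_style = "<p>First line<br/>Second line</p>"

print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```

An XML parser stops at the first such error instead of guessing what the author meant, which is the main behavioural difference a Web developer coming from HTML will notice.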
Meanwhile, Andrew replied to Mike's post. Andrew mentioned that
"Communicating XML with the bulk of developers is key to XML's future acceptance" and that
there are many out there who do not even understand XML's hierarchical nature.
Paul Prescod responded with an excellent explanation:
XML is a common alphabet and grammar ("syntax") for newer Web
development standards like SVG (a common vector graphics format for the
Web), XForms (the next generation of Web forms) and RSS (a way to
summarize the information on your site so that "subscribers" can find
out about documents added to the site).
Before XML was popular, it was common to invent new syntaxes from
scratch. Learning, implementing and using new syntaxes can be quite
challenging so this process was very inefficient.
For instance, before SVG, it was common to do vector graphics in
PostScript. If you compare these SVG and PostScript examples you'll see
the SVG example has much more in common with the HTML you already
know than the PostScript.
Tools for working with HTML (especially its XHTML variant) can more
easily be adapted for working with SVG than with something totally
different like PostScript.
XML goes well beyond web interface development standards. It is also
used under the covers for many machine to machine communications such as
automated purchasing. The benefits there are similar.
Compare a purchase order in the older
to the markup-based one in XML
Similarly, compare the
line-based syntax MCF
Each of these standards (SVG for graphics, RSS for syndication, ebXML
for electronic business) can be considered a "vocabulary". Just as it is
more efficient to invent new words for English than to invent a whole
new language, it is more efficient to invent new vocabularies for XML
than invent whole new syntaxes.
Vocabularies in English are defined in glossaries or dictionaries.
Similarly, vocabularies in XML are defined in so-called "schemas". Just
as there are a variety of formats for glossaries, there are a variety of
"schema languages" for defining XML vocabularies. The existence of
schema languages serves to encourage an important best-practice:
documentation of vocabularies! Before XML, it was common for the syntax
and vocabulary of data formats to be completely proprietary trade
secrets (consider the Office document file format). This is no longer
considered an acceptable practice and Microsoft Office 11 is expected to
have a fully-documented XML file-format (and perhaps a formal schema).
XML also encourages best practices around internationalization. XML
builds on a standard for characters known as "Unicode." Using XML (and
thus Unicode), it is easy to insert characters in anything from Arabic
to Chinese to Cyrillic and even from ancient dead languages! This is an
important step forward in an industry that has historically thought of
English first and then other languages only much later.
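The Unicode point can be tried out with a short sketch of our own (not part of Paul's post). It builds a document containing Arabic, Chinese and Cyrillic text, serializes it as UTF-8 and parses it back, again with Python's standard library. The element name greeting and the plain lang attribute are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: <greetings> holding one <greeting lang="..."> each.
samples = [("ar", "مرحبا"), ("zh", "你好"), ("ru", "Привет")]

greetings = ET.Element("greetings")
for lang, text in samples:
    item = ET.SubElement(greetings, "greeting", {"lang": lang})
    item.text = text

# Serialize to UTF-8 bytes (an XML declaration naming the encoding is added).
data = ET.tostring(greetings, encoding="utf-8")

# Parse the bytes back and collect the text by language code.
parsed = ET.fromstring(data)
roundtrip = {g.get("lang"): g.text for g in parsed}
print(roundtrip)
```

All three scripts survive the round trip unchanged, with no per-language handling in the code.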
In response to Andrew's earlier reply about Web developers not being able to understand terms such
as "labelled trees", Uche Ogbuji offered a different opinion:
Any Web developer likely to be in a
position to consider XML (i.e. is not just a
graphic artist or copy writer, but designs technology to be deployed to the
server) better understand the most basic principles of information
management. If not, the answer, as far as I'm concerned, is not to try
telling them everything in kindergarten terms, but rather to find their boss
and tell them they have hired a liability. Believe me, they will cause far
more damage in their work than anyone can help, regardless of how things are
explained to them.
Every profession has standards. Web developers should be no exception.
Rick Jelliffe wrote:
XML is like
- SGML without configurability
- HTML without forgivingness
- LISP without functions
- CSV without flatness
- PDF without Acrobat
- ASN.1 without binary encodings
- EDI without commercial semantics
- RTF without word-processing semantics
- CORBA without tight coupling
- ZIP without compression or packaging
- FLASH without the multimedia
- A database without a DBMS or DDL or DML or SQL or a formal model
- A MIME header which does not evaporate
- Morse code with more characters
- Unicode with more control characters
- A mean spoilsport, depriving programmers of the fun of inventing their own syntaxes during work hours
- The first step in Mao's journey of a thousand miles
- The intersection of James Clark and Oracle
- The common ground between Simon St. L and Henry Thompson
- The secret love child of Uche and Elliotte
- Microsoft's secret weapon against Sun's Open Office
- Sun's secret weapon against Microsoft's Office
- The town bicycle
In other words, XML is a very thin layer which provides some common
functionality that may be useful in many disparate applications.
Typical uses are data exchange, protocols, literature processing, information archiving,
hypertext, and configuration files.
Some of its benefits are that it is:
- standard: learn once, use many times, and you don't need to explain the syntax to others,
- generalized: can be applied to many kinds of data,
- human-readable: not the same thing as "comprehensible" markup, and
- parseable: susceptible to treatment as an artificial language (using tools based on formal grammar theory and parse trees).
Steve Muench from Oracle replied with the following one-paragraph description of XML:
XML stands for the "Extensible Markup Language",
which defines a universal standard for electronic
data exchange. It provides a rigorous set of rules
enabling the structure inherent in data to be easily
encoded and unambiguously interpreted using
human-readable text documents.
Finally, Jeff Lowery presented the following description:
At an idealized level, XML is a generic syntax that is amenable to a wide
variety of contexts and domains.
Its utility is attributable to the following qualities:
- It is text based: Labels, delimiters, and content are encoded as Unicode characters.
Consequence: There are few impediments to transferring XML data across
heterogeneous system boundaries. XML documents can be created using simple
- It is well-formed: The syntax can be interpreted unambiguously and without reliance upon rules
Consequence: XML parsers are simple and ubiquitous.
- It is partially self-describing: All content (data) is labeled.
Consequence: Content type can be clearly identified, both by human and machine. However, semantics are implicit.
- It is structured: The syntax describes a single-rooted tree.
Consequence: Most real-world content maps well to hierarchies; hierarchies
are also readily grasped by human beings. Thus, XML documents tend to be
At a practical level, XML has some warts:
- It is text based
Consequence: Not all data is efficiently encoded as characters.
- It is well-formed
Consequence: Well-formedness is wordy.
- It is partially self-describing
Consequence: Names alone are often not sufficient to unambiguously convey
- It is structured
Consequence: Data retrieval from trees is not always efficient. Some data
cannot be naturally modeled hierarchically.
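The first wart in Jeff's list ("not all data is efficiently encoded as characters") can be made tangible with a rough sketch of our own. It assumes binary data is carried as Base64 text, which is a common choice but our assumption; the post does not name an encoding. The element name blob and its encoding attribute are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

# 768 bytes of arbitrary binary data (every byte value, three times over).
raw = bytes(range(256)) * 3

# Binary must become characters before it can sit inside an element.
encoded = base64.b64encode(raw).decode("ascii")

blob = ET.Element("blob", {"encoding": "base64"})
blob.text = encoded
document = ET.tostring(blob)  # serialized XML as bytes

print(len(raw), len(encoded), len(document))

# The data is recoverable, but only after parsing and decoding.
decoded = base64.b64decode(ET.fromstring(document).text)
```

Base64 turns every 3 bytes into 4 characters, so the payload grows by about a third before the tags are even counted.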
To an "ordinary" Web developer, we would explain XML using the "by example" approach:
- As Web developers are familiar with HTML and Web pages, the first example we would take is
a "data" related Web site such as www.nasdaq.com. We would
explain how HTML combines "data" with "presentation", and describe how difficult (if not impossible) it
would be for an application to get at this data (in other words, machine-to-machine data communication, without any platform issues).
- We would then explain that XML is like HTML in that it is text-based and uses "angle brackets", and then clarify the differences:
XML is all about data, XML is stricter than HTML, the tag set is not restricted, and so on.
- We would then write a simple XML document, and describe how easy it is to "get to the data" or
search, filter, etc.
- We would then talk about the reasons for XML's success, including, but not limited to,
the presence of a supporting "XML family" of standards, such as XSLT, XPath, DOM, SAX, Schemas, etc.;
the availability of tools (such as parsers/processors);
and the textual, self-describing, human-readable and structured nature of XML.
- Next, we'll talk about "innovative" technologies and standards built upon XML,
such as XML-based messaging or XML Web services.
- Finally, we'll ask the Web developer to get their hands dirty with some code and try out using
XML in their Web application.
On the Microsoft platform, we'll guide them to try using MSXML
in an ASP page or the System.Xml namespace in an ASP.NET page.
On Apache, various tools and technologies from
Apache Software Foundation , &
- We would also recommend the following books:
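As an illustration of the third step in the list above (writing a simple XML document and then "getting to the data"), here is a minimal sketch in Python using the standard xml.etree.ElementTree module. The quotes document, its element names, the ticker symbols and all the numbers are invented for the example; they are not real market data or a real feed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical stock-quote document: data only, no presentation markup.
QUOTES = """
<quotes date="2002-11-01">
  <quote symbol="AAAA"><price>12.50</price><change>0.75</change></quote>
  <quote symbol="BBBB"><price>48.10</price><change>-1.20</change></quote>
  <quote symbol="CCCC"><price>7.05</price><change>0.10</change></quote>
</quotes>
"""

root = ET.fromstring(QUOTES)

# Search: look up one value by its labels, using a simple XPath-style path.
price = float(root.find("quote[@symbol='BBBB']/price").text)

# Filter: keep only the symbols whose change is positive.
gainers = [
    q.get("symbol")
    for q in root.findall("quote")
    if float(q.findtext("change")) > 0
]

print(price)    # 48.1
print(gainers)  # ['AAAA', 'CCCC']
```

Getting the same two answers out of an HTML page would mean picking values out of table cells and font tags, and the code would break whenever the page layout changed.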