
XML for Web Developers – in 500 words or less!

Compiled by perfectxml.com Team
Friday, November 01, 2002

Recently, Andrew Watt (author of the book Sams Teach Yourself XML in 10 Minutes) started an interesting discussion thread on the XML-DEV mailing list. The challenge was to describe XML in 500 words or less to an "ordinary" Web developer (one with no formal Computer Science training).

The first reply came from Mike Champion, who pointed to an earlier post of his in the XML-DEV archives:

At the most basic level, XML is simply a standardized syntax, or metalanguage for building languages, for describing labelled trees. But such a definition understates the potential of XML just as badly as the statement "XML is the silver bullet for interoperability" overstates it. "XML" provides not only this rather abstract metalanguage, but also a community of experts, tools, examples, and associated specifications that provide a substantial percentage of the infrastructure that underlies Web-based applications and business integration efforts.
... ...

XML is about standardized syntax, labels, and trees.

  • The "standardized syntax" means that you usually don't have to write your own grammars and parsers, you can leverage XML. XML is not a perfect or optimal syntax, but it's good enough for most purposes, and its limitations for any particular problem are generally overwhelmed by the benefits provided by the network effect.

  • The "labels" make it much easier for human readers and simple programs to associate a value with a description of what it is all about. This is not a general solution to the (probably insoluble) problem of representing the "meaning" of data, but it is a very practical solution to real world information exchange problems. C'mon folks, there's no "standard" for paper invoices, and they have been written up using different languages, formats, currencies, and private vocabularies for centuries. I submit that human accounting clerks in bureaucratic organizations are not all that much better at pattern recognition than modern specialized software can be, and if "labels" have taken the world of business as far as it has, that's something worth taking seriously. If formal type systems and ontologies can do better, fine ... but I'll believe that when I see it working well in widespread practice.

  • "Trees" mean that an XML document can describe a hierarchical package of inter-related information. This lets XML easily represent "documents" and "messages" of the sort that have enabled human and organization communication since the invention of paper.
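To make the "labels" and "trees" points concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The invoice document, its element names, and the `walk` helper are assumptions made up for illustration; they do not come from the discussion thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice: every value carries a label, and labels nest.
INVOICE = """<invoice number="1001">
  <customer>Example Corp</customer>
  <item>
    <description>Widget</description>
    <price currency="USD">9.99</price>
  </item>
</invoice>"""


def walk(element, depth=0):
    """Return (depth, label, text) tuples for an element and its descendants."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(walk(child, depth + 1))
    return rows


root = ET.fromstring(INVOICE)
rows = walk(root)

# Print the tree with indentation that mirrors the nesting.
for depth, label, text in rows:
    print("  " * depth + label + (": " + text if text else ""))
```

No custom grammar or parser was written here: the generic XML parser recovers both the labels and the hierarchy.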



Andrew replied that terms such as "metalanguage" and "labelled tree" are "XML geek terminology"! How would one explain XML to a "typical" Web developer?

Mike replied, agreeing that "it is hard to de-geekify one's vocabulary", but asked Andrew to be "more specific about what we can assume about typical Web developers".

Dare Obasanjo from Microsoft replied describing XML in one sentence:

XML is a format for describing structured data using a syntax similar to HTML but with stricter rules.
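The "stricter rules" can be seen by feeding HTML-style markup to an XML parser. The following is a small sketch, assuming Python's standard `xml.etree.ElementTree` module; the sample strings and the `is_well_formed` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Markup that browsers commonly tolerate in HTML, but that is not XML.
unclosed_tags = "<p>First line<br>Second line"
unquoted_attribute = "<a href=page.html>link</a>"

# The same content written to XML's rules.
closed_tags = "<p>First line<br/>Second line</p>"
quoted_attribute = '<a href="page.html">link</a>'

for sample in (unclosed_tags, unquoted_attribute, closed_tags, quoted_attribute):
    print(is_well_formed(sample), sample)
```

An XML parser rejects the first two samples outright rather than guessing at the author's intent, which is the main practical difference an HTML author will notice.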

Meanwhile, Andrew replied to Mike's post. Andrew mentioned that "Communicating XML with the bulk of developers is key to XML's future acceptance" and that there are many out there who do not even understand XML's hierarchical nature.

Paul Prescod responded with an excellent explanation:

XML is a common alphabet and grammar ("syntax") for newer Web development standards like SVG (a common vector graphics format for the Web), XForms (the next generation of Web forms) and RSS (a way to summarize the information on your site so that "subscribers" can find out about documents added to the site).

Before XML was popular, it was common to invent new syntaxes from scratch. Learning, implementing and using new syntaxes can be quite challenging so this process was very inefficient.

For instance, before SVG, it was common to do vector graphics in PostScript. If you compare these SVG and PostScript examples you'll see that the SVG example has much more in common with the HTML you already know than the PostScript.

Tools for working with HTML (especially its XHTML variant) can more easily be adapted for working with SVG than with something totally different like PostScript.

XML goes well beyond web interface development standards. It is also used under the covers for many machine to machine communications such as automated purchasing. The benefits there are similar.

Compare a purchase order in the older EDI syntax to the markup-based one in XML.

Similarly, compare the line-based syntax MCF to RSS.

Each of these standards (SVG for graphics, RSS for syndication, ebXML for electronic business) can be considered a "vocabulary". Just as it is more efficient to invent new words for English than to invent a whole new language, it is more efficient to invent new vocabularies for XML than invent whole new syntaxes.

Vocabularies in English are defined in glossaries or dictionaries. Similarly, vocabularies in XML are defined in so-called "schemas". Just as there are a variety of formats for glossaries, there are a variety of "schema languages" for defining XML vocabularies. The existence of schema languages serves to encourage an important best-practice: documentation of vocabularies! Before XML, it was common for the syntax and vocabulary of data formats to be completely proprietary trade secrets (consider the Office document file format). This is no longer considered an acceptable practice and Microsoft Office 11 is expected to have a fully-documented XML file-format (and perhaps a formal schema).

XML also encourages best practices around internationalization. XML builds on a standard for characters known as "Unicode." Using XML (and thus Unicode), it is easy to insert characters in anything from Arabic to Chinese to Cyrillic and even from ancient dead languages! This is an important step forward in an industry that has historically thought of English first and then other languages only much later.
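The Unicode point can be illustrated with a short round trip. This is a sketch only, assuming Python's standard `xml.etree.ElementTree` module; the `greetings` vocabulary and its `lang` attribute are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Sample text in Arabic, Chinese and Cyrillic script (illustrative data).
greetings = {"ar": "مرحبا", "zh": "你好", "ru": "Привет"}

# Build one document that mixes all three scripts.
root = ET.Element("greetings")
for lang, text in greetings.items():
    greeting = ET.SubElement(root, "greeting", {"lang": lang})
    greeting.text = text

# Serialize to UTF-8 bytes, then parse the bytes back.
data = ET.tostring(root, encoding="utf-8")
parsed = ET.fromstring(data)
round_trip = {g.get("lang"): g.text for g in parsed}

print(round_trip == greetings)
```

No per-language code page handling is needed; the encoding is declared once for the whole document.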

In response to Andrew's earlier reply about Web developers not being able to understand terms such as "labelled trees", Uche Ogbuji offered a different opinion:

Any Web developer likely to be in a position to consider XML (i.e. is not just a graphic artist or copy writer, but designs technology to be deployed to the server) better understand the most basic principles of information management. If not, the answer, as far as I'm concerned, is not to try telling them everything in kindergarten terms, but rather to find their boss and tell them they have hired a liability. Believe me, they will cause far more damage in their work than anyone can help, regardless of how things are explained to them.

Every profession has standards. Web developers should be no exception.

Rick Jelliffe wrote:

XML is like

  • SGML without configurability
  • HTML without forgivingness
  • LISP without functions
  • CSV without flatness
  • PDF without Acrobat
  • ASN.1 without binary encodings
  • EDI without commercial semantics
  • RTF without word-processing semantics
  • CORBA without tight coupling
  • ZIP without compression or packaging
  • FLASH without the multimedia
  • A database without a DBMS or DDL or DML or SQL or a formal model
  • A MIME header which does not evaporate
  • Morse code with more characters
  • Unicode with more control characters
  • A mean spoilsport, depriving programmers the fun of inventing their own syntaxes during work hours
  • The first step in Mao's journey of a thousand miles
  • The intersection of James Clark and Oracle
  • The common ground between Simon St. L and Henry Thompson
  • The secret love child of Uche and Elliotte
  • Microsoft's secret weapon against Sun's Open Office
  • Sun's secret weapon against Microsoft's Office
  • The town bicycle

In other words, XML is a very thin layer which provides some common functionality that may be useful in many disparate applications.

Typical uses are data exchange, protocols, literature processing, information archiving, hypertext, and configuration files.

Some of its benefits are that it is:

  • standard: learn once, use many times, and you don't need to explain the syntax to others,
  • generalized: can be applied to many kinds of data,
  • human-readable: not the same thing as "comprehensible" markup, and
  • parseable: it is susceptible to treatment as an artificial language (using tools based on formal grammar theory and parse trees).


Steve Muench from Oracle replied with the following one-paragraph description of XML:

XML stands for the "Extensible Markup Language", which defines a universal standard for electronic data exchange. It provides a rigorous set of rules enabling the structure inherent in data to be easily encoded and unambiguously interpreted using human-readable text documents.

Finally, Jeff Lowery presented the following description:

At an idealized level, XML is a generic syntax that is amenable to a wide variety of contexts and domains. Its utility is attributable to the following qualities:

  1. It is text based: Labels, delimiters, and content are encoded as Unicode characters.

    Consequence: There are few impediments to transferring XML data across heterogeneous system boundaries. XML documents can be created using simple text editors.

  2. It is well-formed: The syntax can be interpreted unambiguously and without reliance upon rules of inference.

    Consequence: XML parsers are simple and ubiquitous.

  3. It is partially self-describing: All content (data) is labeled.

    Consequence: Content type can be clearly identified, both by human and machine. However, semantics are implicit.

  4. It is structured: The syntax describes a single-rooted tree.

    Consequence: Most real-world content maps well to hierarchies; hierarchies are also readily grasped by human beings. Thus, XML documents tend to be human-readable.

At a practical level, XML has some warts:

  1. It is text based

    Consequence: Not all data is efficiently encoded as characters.

  2. It is well-formed

    Consequence: Well-formedness is wordy.

  3. It is partially self-describing

    Consequence: Names alone are often not sufficient to unambiguously convey semantics.

  4. It is structured

    Consequence: Data retrieval from trees is not always efficient. Some data cannot be naturally modeled hierarchically.
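Two of these warts, wordiness and the cost of encoding non-text data as characters, can be measured directly. The sketch below uses only Python's standard library; the records and the `quotes`, `quote`, `symbol` and `price` element names are invented for illustration, and base64 is just one common way of carrying binary data in text.

```python
import base64
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records, written once as CSV and once as XML.
records = [("AAA", "10.50"), ("BBB", "7.25"), ("CCC", "103.00")]

buffer = io.StringIO()
csv.writer(buffer).writerows(records)
csv_text = buffer.getvalue()

root = ET.Element("quotes")
for symbol, price in records:
    quote = ET.SubElement(root, "quote")
    ET.SubElement(quote, "symbol").text = symbol
    ET.SubElement(quote, "price").text = price
xml_text = ET.tostring(root, encoding="unicode")

# Wart 2: the labelled, well-formed version is longer than the flat one.
print(len(csv_text), len(xml_text))

# Wart 1: binary data must be turned into characters, e.g. with base64,
# which emits 4 characters for every 3 input bytes.
raw = bytes(range(256)) + bytes(44)  # 300 arbitrary bytes
encoded = base64.b64encode(raw).decode("ascii")
print(len(raw), len(encoded))
```

The extra characters buy the labels and structure listed among the qualities above; whether that trade is worthwhile depends on the data.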

Our Take:

To an "ordinary" Web developer, we would explain XML using the "by example" approach:

  • Because Web developers are familiar with HTML and Web pages, our first example would be a data-driven Web site such as www.nasdaq.com. We would explain how HTML mixes the "data" with its "presentation", and how difficult (if not impossible) that makes it for an application to get at the data, that is, to do machine-to-machine data communication without platform issues.

  • We would then explain that XML is like HTML in that it is text-based and uses "angle brackets", and then clarify the differences: XML is all about data, XML is stricter than HTML, the tag set is not fixed, and so on.

  • We would then write a simple XML document and show how easy it is to "get to the data", or to search it, filter it, and so on.

  • We would then discuss the reasons for XML's success, including but not limited to the supporting "XML family" of standards (XSLT, XPath, DOM, SAX, Schemas, etc.); the availability of tools such as parsers and processors; and the textual, self-describing, human-readable and structured nature of XML.

  • Next, we would talk about the "innovative" technologies and standards built upon XML, such as XML-based messaging and XML Web services.

  • Finally, we would ask the Web developer to get their hands dirty with some code and try using XML in a Web application. On the Microsoft platform, we would guide them to MSXML in an ASP page or the System.Xml namespace in an ASP.NET page; on Apache, to the various tools and technologies from IBM, the Apache Software Foundation, and Oracle.

  • We would also recommend the following books:

Practical XML for the Web
XML Family of Specifications
XML Application Development with MSXML 4.0
Professional XML for .NET Developers
Processing XML with Java
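The "simple XML document" step in the list above might look something like the following sketch. It uses Python's standard `xml.etree.ElementTree` module rather than MSXML or System.Xml, purely to keep the example short; the stock symbols, exchanges and prices are invented, and `symbols_above` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical quote data: the "data" without any "presentation".
QUOTES = """<quotes>
  <quote symbol="AAA" exchange="NASDAQ"><price>10.50</price></quote>
  <quote symbol="BBB" exchange="NYSE"><price>7.25</price></quote>
  <quote symbol="CCC" exchange="NASDAQ"><price>103.00</price></quote>
</quotes>"""

root = ET.fromstring(QUOTES)

# Search: look up one quote by an attribute value.
bbb_price = root.find("quote[@symbol='BBB']/price").text


def symbols_above(root, exchange, threshold):
    """Filter: symbols on one exchange whose price exceeds the threshold."""
    matches = root.findall(f"quote[@exchange='{exchange}']")
    return [q.get("symbol") for q in matches
            if float(q.findtext("price")) > threshold]


expensive = symbols_above(root, "NASDAQ", 50.0)
print(bbb_price, expensive)
```

Getting the same values out of an HTML page would mean picking them out of table cells and styling markup, which is the contrast the first bullet is meant to draw.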


Related reading: Defining Web Services Is No Easy Task
One presenter at last week's XML Web Services One Conference drew a laugh when she told attendees, "Ask five people to define Web services and you'll get at least six answers." Read more…

Web services: The next big thing?
With all the next-big-thing hype surrounding Web services, it is easy to write off the idea as marketing mumbo jumbo. But those who do miss a very important point: Web services probably are the next big thing. At the very least, this latest model of collaborative computing is the next logical step in the evolution of e-business, and it may represent a fundamental shift in the way we build and use software. Learn More...

