Extensible Markup Language (XML) is a W3C Recommendation (http://www.w3.org/TR/REC-xml) for how to represent information in a text-based document.
To give you an analogy: if Java is portable code, then XML is portable data.
XML is extensible (it does not have a fixed set of tags); it makes use of markup (angle brackets, elements, attributes) to add meaning to the text; and it is a meta-language (a language used to create other languages; XML syntax is used for various other markup languages, such as SVG, XSLT, WML, and so on).
XML is not...
...a programming language like C++, Visual Basic, ...
Even though XML has the word "language" in its name, it is not a programming language; it is a meta-language. The syntax and rules defined by XML can be used to create other markup languages.
...only for Internet/Web applications
It is true that XML is a strong choice if you need to transfer data across platforms over the Internet. However, XML is not just for Internet/Web applications. Today, XML is used for a wide variety of applications, such as graphics (SVG), configuration files, code documentation, speech/multimodal applications, mathematical notation (MathML), and so on.
...something to replace or in competition with HTML
When you first look at XML, it might look very similar to HTML. That is no surprise: like HTML, XML is a markup language, and an XML document also has a hierarchical structure containing elements (a start-tag and an end-tag, for example <name>Darshan</name>) and attributes. However, XML's goal and utility are quite different from those of HTML. It is true that XML is being used to make HTML better (XHTML), but XML is not here to replace or compete with HTML. HTML is about presentation; XML is all about data.
...some proprietary technology
As noted earlier, XML is a Recommendation of the W3C, the same standards body that has published the HTML specifications. The XML standard has received excellent tool and vendor support. All major vendors support XML and provide tools and technologies that use XML or help in working with it.
XML is all about data; it does not provide any display/presentation details.
XML does not have a fixed set of tags
XML is case-sensitive
XML has strict rules
Each start tag must have a matching end tag, or the element must be written as an empty-element tag (<Name>D</Name> or <Comments />)
Attribute values must be in single or double quotes (<Vendor id="1" /> or <Vendor id='1' />)
Tags cannot overlap (<a><b></a></b> is not allowed). They should be properly nested (<a><b></b></a>).
Only one top/root element is allowed.
Strict rules for element names (for example, a name cannot start with a digit, so <123 /> is not allowed)
No element may have two attributes with the same name
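The rules above can be tried out with any conforming parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the sample documents and the helper name is_well_formed are made up for illustration. Each sample breaks exactly one rule from the list, and the parser is expected to reject every one of them except the first.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: the first obeys the rules, the rest each break one.
SAMPLES = {
    "well-formed": "<Vendors><Vendor id='1'><Name>D</Name><Comments /></Vendor></Vendors>",
    "case mismatch": "<Name>D</name>",
    "missing end tag": "<Vendors><Name>D</Vendors>",
    "unquoted attribute": "<Vendor id=1 />",
    "overlapping tags": "<a><b></a></b>",
    "two root elements": "<a></a><b></b>",
    "invalid element name": "<123 />",
    "duplicate attribute": "<Vendor id='1' id='2' />",
}


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = {label: is_well_formed(doc) for label, doc in SAMPLES.items()}
for label, ok in results.items():
    print(f"{label:22} {'accepted' if ok else 'rejected'}")
```

Note that the parser stops at the first error it finds, so a document with several problems reports only one of them.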
Well-Formed and Valid XML
All documents that conform to the XML 1.0 rules (one root element, matching start and end tags, attributes in quotation marks, ...) are known as well-formed XML documents.
If, in addition to the XML 1.0 rules above, an XML document also follows rules that you have defined (structural rules, such as the hierarchy of elements, presence of attributes, value data types, child-element occurrences, etc.), that document is called a valid XML document. You can write DTDs or XML Schemas (XSD) to validate XML documents; they help make sure that the XML structure is what you expect.
In other words, a well-formed XML document that adheres to the structure defined by a DTD or XML Schema is known as a valid XML document.
All valid XML documents are well-formed, but the converse is not necessarily true.
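To make the distinction concrete, here is a rough sketch in Python. The standard library does not validate against a DTD or XML Schema, so the function check_vendors below is a hand-written stand-in for the kind of structural rules a schema would express; the element names and the rules themselves are assumptions chosen for illustration. Both sample documents are well-formed, but only the first satisfies the rules.

```python
import xml.etree.ElementTree as ET


def check_vendors(text):
    """Return a list of structural problems; an empty list means the rules hold.

    Hand-written stand-in for a DTD/XSD, not a real schema validator.
    """
    root = ET.fromstring(text)  # raises ParseError if the text is not well-formed
    problems = []
    if root.tag != "Vendors":
        problems.append(f"root element is <{root.tag}>, expected <Vendors>")
    for child in root:
        if child.tag != "Vendor":
            problems.append(f"unexpected child <{child.tag}>")
            continue
        if not child.get("id", "").isdigit():
            problems.append("<Vendor> needs a numeric id attribute")
        if len(child.findall("Name")) != 1:
            problems.append("<Vendor> needs exactly one <Name> child")
    return problems


valid_doc = "<Vendors><Vendor id='1'><Name>D</Name></Vendor></Vendors>"
# Well-formed, but it breaks two of the assumed rules: no id, two <Name> children.
invalid_doc = "<Vendors><Vendor><Name>D</Name><Name>E</Name></Vendor></Vendors>"

print(check_vendors(valid_doc))
print(check_vendors(invalid_doc))
```

In practice you would express these rules declaratively in a DTD or XSD and let a validating parser enforce them, rather than coding each check by hand.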
In a well-formed XML document:
entity references are used for five special characters (&amp; for &; &lt; for <; &gt; for >; &apos; for '; and &quot; for ")
character references are used for other special characters (for example: &#174; or &#xAE; is used for ®; &#937; or &#x3A9; for Ω)
Characters with codes from 0 to 31 (except CR, LF, and tab) are not allowed
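These character rules can be exercised with a short sketch, again assuming Python's standard library. It escapes the five special characters with the predefined entity references (&amp;amp; &amp;lt; &amp;gt; &amp;apos; &amp;quot;), parses decimal and hexadecimal character references for ® and Ω, and checks that a control character is rejected. The sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, <, and > by default; the extra mapping adds the two quote entities.
QUOTES = {"'": "&apos;", '"': "&quot;"}
raw = 'Tom & Jerry <"cat" vs \'mouse\'>'
escaped = escape(raw, QUOTES)
print(escaped)

# The parser turns the entity references back into the original characters.
round_trip = ET.fromstring(f"<t>{escaped}</t>").text

# Decimal (&#174;) and hexadecimal (&#xAE;) references name the same character.
symbols = ET.fromstring("<t>&#174;&#xAE;&#937;&#x3A9;</t>").text

# A control character such as U+0001 makes the document not well-formed.
try:
    ET.fromstring("<t>\x01</t>")
    control_char_accepted = True
except ET.ParseError:
    control_char_accepted = False
```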
What’s so great about XML?
Self-describing data in text format
XML's textual nature makes it highly portable, which is why XML is heavily used for cross-platform data integration. If meaningful tag and attribute names are used to enclose the data, the document becomes self-describing (compare this with comma-separated values (CSV) or fixed-length records, where the meaning of each field must be documented elsewhere).
Open, Standard, License-free, Platform-neutral with great tool and vendor support
Many developers have started using XML in their application design and architecture for three reasons: XML is an open standard (which avoids vendor lock-in); all major vendors agree upon and support it; and XML is surrounded by many supporting standards (XSLT, XPath, XML Schema, and so on) that help when implementing an XML solution.
Clean separation between Content & Presentation
Once you have your data in XML format, you can transform the same XML document into HTML, text, SVG (Scalable Vector Graphics), WML (Wireless Markup Language), PDF, or any other format that you desire. This gives you a "document-view" architecture, in which there is a clean separation between your content/data and its presentation.
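The "document-view" idea can be sketched in a few lines. The example below is an illustration in plain Python rather than XSLT (which is the standard tool for this job); the content document and the function names as_html and as_text are assumptions. One XML document holds the data, and two independent functions render it as different views.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content document: it carries data only, no display details.
CONTENT = (
    "<Vendors>"
    "<Vendor id='1'><Name>Darshan</Name></Vendor>"
    "<Vendor id='2'><Name>A &amp; B Ltd</Name></Vendor>"
    "</Vendors>"
)


def vendors(xml_text):
    """Yield (id, name) pairs from the content document."""
    for vendor in ET.fromstring(xml_text).findall("Vendor"):
        yield vendor.get("id"), vendor.findtext("Name")


def as_html(xml_text):
    """Render the data as an HTML list (one possible view)."""
    items = "".join(
        f"<li>{escape(name)} (#{vid})</li>" for vid, name in vendors(xml_text)
    )
    return f"<ul>{items}</ul>"


def as_text(xml_text):
    """Render the same data as tab-separated plain text (another view)."""
    return "\n".join(f"{vid}\t{name}" for vid, name in vendors(xml_text))


print(as_html(CONTENT))
print(as_text(CONTENT))
```

Changing how the data looks means changing only a view function; the XML content stays untouched.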
Unicode support
XML documents may contain Unicode characters (excluding the surrogate blocks, FFFE, and FFFF);
and hence XML can be readily used for international applications.
Easy to transfer and transform
Because XML is plain text, it is easy to transfer between systems, and it can be readily transformed: XSLT stylesheets can be used to transform an XML document into other formats (such as HTML, CSV, PDF, and so on).
Easy to parse, process, and search
XML parsers are used to parse and process XML. For instance, MSXML (Microsoft XML Core Services) or .NET can be used on the Microsoft platform to work with XML. Similarly, there are many Java-based XML processing APIs and tools (JAXP, Xerces, Xalan, and so on) available from various vendors, including Sun, Apache, and Oracle.
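As a small illustration of parsing and searching, the sketch below uses Python's standard xml.etree.ElementTree parser, which supports a limited subset of XPath; it is not one of the parsers named above, and the catalog document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog document used only for this example.
CATALOG = """
<Catalog>
  <Book id="b1" lang="en"><Title>XML Basics</Title><Price>20.00</Price></Book>
  <Book id="b2" lang="fr"><Title>XSLT en pratique</Title><Price>35.50</Price></Book>
  <Book id="b3" lang="en"><Title>Schemas</Title><Price>42.00</Price></Book>
</Catalog>
"""

root = ET.fromstring(CATALOG)

# Search: titles of all books whose lang attribute is "en".
english_titles = [b.findtext("Title") for b in root.findall("Book[@lang='en']")]

# Process: sum every <Price> element anywhere in the tree.
total = sum(float(p.text) for p in root.iter("Price"))

# Look up a single element by attribute value.
b2 = root.find(".//Book[@id='b2']")

print(english_titles, total, b2.findtext("Title"))
```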
(Machine and) Human-readable
XML documents are text documents, so you don't need any special tools to read or write them. A plain text editor such as Notepad will do!
Hierarchical Structure
XML documents are hierarchical in nature, with one top-level root element, and hence XML is an excellent choice for modeling hierarchical data in an easy-to-read fashion.
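A brief sketch of this tree structure, assuming Python's xml.etree.ElementTree: the company, department, and employee names are made up, and outline is a hypothetical helper that walks the tree recursively and indents each element by its depth.

```python
import xml.etree.ElementTree as ET

# Build a small hierarchy: one root, two departments, three employees.
root = ET.Element("Company", name="Example Corp")
eng = ET.SubElement(root, "Department", name="Engineering")
ET.SubElement(eng, "Employee").text = "Asha"
ET.SubElement(eng, "Employee").text = "Ben"
sales = ET.SubElement(root, "Department", name="Sales")
ET.SubElement(sales, "Employee").text = "Chen"


def outline(elem, depth=0):
    """Return one indented line per element, children nested under parents."""
    label = elem.get("name") or elem.text or ""
    lines = [f"{'  ' * depth}{elem.tag}: {label}"]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


serialized = ET.tostring(root, encoding="unicode")
print(serialized)
print("\n".join(outline(root)))
```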
Enables many other technologies (for example: Web services)
Web services refer to cross-platform messaging over the Internet, and they facilitate application integration. XML is one of the core building blocks of the Web services architecture. For instance, the caller (client) sends an XML-formatted message envelope to the server, and in return the server sends an XML-formatted response.
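The request/response exchange can be sketched without any network code. The example below is a simplified illustration, not a complete Web services stack: it wraps messages in an envelope using the SOAP 1.1 envelope namespace, while the GetPrice operation, its element names, and the price data are hypothetical. The "server" here is just a function that receives the request XML and returns response XML.

```python
import xml.etree.ElementTree as ET

SOAP = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
PRICES = {"widget": "9.99"}  # hypothetical server-side data


def envelope(payload):
    """Wrap a payload element in an Envelope/Body pair and serialize it."""
    env = ET.Element(f"{{{SOAP}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP}}}Body")
    body.append(payload)
    return ET.tostring(env, encoding="unicode")


def client_request(item):
    """Client side: build the XML request message (hypothetical GetPrice call)."""
    req = ET.Element("GetPrice")
    ET.SubElement(req, "Item").text = item
    return envelope(req)


def server_handle(request_xml):
    """Server side: parse the request and answer with an XML response."""
    body = ET.fromstring(request_xml).find(f"{{{SOAP}}}Body")
    item = body.findtext("GetPrice/Item")
    resp = ET.Element("GetPriceResponse")
    ET.SubElement(resp, "Price").text = PRICES.get(item, "unknown")
    return envelope(resp)


response_xml = server_handle(client_request("widget"))
price = ET.fromstring(response_xml).findtext(
    f"{{{SOAP}}}Body/GetPriceResponse/Price"
)
print(price)
```

Because both sides exchange only XML text, the client and server could be written in different languages and run on different platforms.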
What's NOT so great about XML?
XML is a space, processor, and bandwidth hog
Suppose you are working on an application that will be used only inside a corporate network, and good performance or low network bandwidth usage is a critical requirement. In such cases, it does not make sense to use XML for data transfer; a proprietary binary format can give better results. XML's textual nature and markup requirements place more demand on space, bandwidth, and processor time.
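A rough way to see the space overhead is to encode the same record both ways and compare sizes. The sketch below assumes a made-up three-field order record and an arbitrary fixed binary layout; real ratios depend heavily on the data, the tag names, and whether compression is applied.

```python
import struct

# Hypothetical record: (id, quantity, price).
record = (1042, 3, 19.99)

# Text encoding: every value is wrapped in a start tag and an end tag.
xml_bytes = (
    "<Order><Id>{}</Id><Quantity>{}</Quantity><Price>{}</Price></Order>"
    .format(*record)
    .encode("utf-8")
)

# Fixed binary layout: two unsigned 32-bit ints and one 64-bit float, little-endian.
binary_bytes = struct.pack("<IId", *record)

print(len(xml_bytes), "bytes as XML")
print(len(binary_bytes), "bytes as packed binary")
```

The binary form is smaller and cheaper to decode, but it is no longer self-describing: the reader must already know the field order and types.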
What about binary data?
We now know that XML documents are text data. What if you need to transmit some binary data?
There are primarily three options if you need to send some binary data along with XML:
Provide a reference
Instead of sending binary data along with XML, you can just include a reference to the binary data and let the client get to it separately.