Introduction
The eXtensible Markup Language (XML) started life as a replacement for the
immensely popular but limited HTML format. In time, it became apparent that XML
was even more useful "outside the browser" and that it was extremely suitable
as a format for data interchange and data storage. The popularity and
widespread use of XML have led to a large and continuously growing number of
XML documents and a strong demand for efficient storage, query and retrieval
solutions.
Initially, XML database vendors concentrated on devising smart storage methods
for XML data, but the focus now seems to have shifted towards creating powerful
query and retrieval methods. There were some early attempts to define a query
language for XML data (including XQL, XML-QL and Quilt), but XQuery is the
first language to receive industry-wide attention and support. It is currently
being developed by the W3C XML Query Working Group and has "Working Draft" status.
Industry experts expect XQuery to do for XML and XML databases what SQL did for
relational data and relational database systems: provide a vendor-independent,
powerful and easy-to-use method for query and retrieval of XML data.
The Language
The data model that XQuery uses is based on that of XPath and defines each XML
document as a tree of nodes. The data model is not only capable of handling
documents but is also designed to work on well-formed document parts (a.k.a.
"fragments"), collections of documents, or collections of fragments.
XQuery is a functional language in which each query is an expression. There are
seven types of expressions in XQuery: path expressions, element constructors,
FLWR expressions, expressions involving operators and functions, conditional
expressions, quantified expressions and expressions that test or modify
datatypes. The various expressions can be combined, both sequentially and by
nesting.
(All examples are copied from or inspired by the XML Query Use Cases.)
a. Path expressions
Path expressions are based on the
syntax of XPath, the XML standard for specifying "paths" in an XML document,
for example:
Find all titles of chapters in document books.xml:
document("books.xml")//chapter/title
Find all books in document bib.xml published by Addison-Wesley after 1991:
document("bib.xml")//book[publisher = "Addison-Wesley" AND @year > "1991"]
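The two path expressions above can be mimicked in Python as a rough, runnable analogue. This is only a sketch: the sample documents are invented for illustration (the article does not show books.xml or bib.xml), and the helper names chapter_titles and books_by_publisher_after are hypothetical. Python's ElementTree supports only a subset of XPath, so the second predicate is evaluated in ordinary Python code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real books.xml and bib.xml are not shown in the article.
BOOKS_XML = """<book>
  <chapter><title>Data Model</title></chapter>
  <chapter><title>Syntax</title>
    <section><title>Keywords</title></section>
  </chapter>
</book>"""

BIB_XML = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
  <book year="1990"><title>Old Book</title><publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann</publisher></book>
</bib>"""


def chapter_titles(xml_text):
    """Analogue of //chapter/title: titles that are direct children of a chapter."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(".//chapter/title")]


def books_by_publisher_after(xml_text, publisher, year):
    """Analogue of //book[publisher = P AND @year > Y], returning the book titles."""
    root = ET.fromstring(xml_text)
    # ElementTree predicates do not support "and" or ">", so filter in Python.
    return [
        b.findtext("title")
        for b in root.iter("book")
        if b.findtext("publisher") == publisher and int(b.get("year")) > year
    ]


print(chapter_titles(BOOKS_XML))
print(books_by_publisher_after(BIB_XML, "Addison-Wesley", 1991))
```

Note that the section title is not returned by the first function, because the path only selects titles directly below a chapter element.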
b. Element constructors
This type of expression is used when a query needs to create
new elements, for example:
Generate a <book> element that has a "year" attribute and
contains the title of the book:
<book>
{ $b/@year }
{ $b/title }
</book>
The variable $b is bound in another
part of the query. When the complete query is run, the above element
constructor will generate a result like this:
<book year="1992">
<title>Advanced Programming in the Unix environment</title>
</book>
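For readers more familiar with programmatic tree building, the element constructor can be compared to the following Python sketch. The function make_book and the source element (including its price child) are assumptions made for illustration; the article only shows the constructor and its result.

```python
import xml.etree.ElementTree as ET


def make_book(b):
    """Build <book year="..."><title>...</title></book> from a source book element."""
    book = ET.Element("book")
    book.set("year", b.get("year"))        # corresponds to { $b/@year }
    title = ET.SubElement(book, "title")   # corresponds to { $b/title }
    title.text = b.findtext("title")
    return book


# Hypothetical binding for $b; in XQuery it would come from another part of the query.
source = ET.fromstring(
    '<book year="1992">'
    "<title>Advanced Programming in the Unix environment</title>"
    "<price>65.95</price>"
    "</book>"
)
result = ET.tostring(make_book(source), encoding="unicode")
print(result)
```

Only the attribute and the title are copied, so other children of the source element (the price here) do not appear in the generated element.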
c. FLWR expressions
The FLWR (pronounced "flower") expression is the analogue
of the SELECT-FROM-WHERE construction in SQL and forms the skeleton of the XQuery
expression. A FLWR expression consists of:
- FOR-clause: binds one or more variables to a sequence of values returned by
another expression (usually a path expression) and iterates over the values.
- LET-clause: also binds one or more variables but without iterating.
- WHERE-clause: contains one or more predicates that filter or limit the set of
nodes generated by the FOR/LET-clauses.
- RETURN-clause: generates the output of the FLWR expression. The RETURN-clause
usually contains one or more element constructors and/or references to
variables and is executed once for each node reference that is returned by the
FOR/LET/WHERE-clauses.
The following example returns the title and average price of
all books published by Addison-Wesley:
<results>
{
FOR $t IN distinct(document("prices.xml")/prices/book/title)
LET $p := avg(document("prices.xml")/prices/book[title=$t]/price)
WHERE (document("bib.xml")//book[title=$t]/publisher) = "Addison-Wesley"
RETURN
<result>
{ $t }
<avg>
{ $p }
</avg>
</result>
}
</results>
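The roles of the four clauses can be sketched in Python as a loop (FOR), a local assignment (LET), a filter (WHERE) and an output step (RETURN). This is an illustrative analogue, not an XQuery implementation: the sample data and the function name average_prices are hypothetical, and the average is formatted to two decimals purely for readability.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical sample data standing in for prices.xml and bib.xml.
PRICES_XML = """<prices>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><price>63.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
</prices>"""

BIB_XML = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann</publisher></book>
</bib>"""


def average_prices(prices_xml, bib_xml, publisher):
    prices = ET.fromstring(prices_xml)
    bib = ET.fromstring(bib_xml)
    results = ET.Element("results")
    # FOR: iterate over the distinct titles, keeping first-seen order.
    titles = list(dict.fromkeys(t.text for t in prices.findall("book/title")))
    for t in titles:
        # LET: bind the average price for this title, without iterating further.
        p = mean(
            float(b.findtext("price"))
            for b in prices.findall("book")
            if b.findtext("title") == t
        )
        # WHERE: keep the title only if a matching book has the given publisher.
        publishers = [
            b.findtext("publisher") for b in bib.iter("book") if b.findtext("title") == t
        ]
        if publisher not in publishers:
            continue
        # RETURN: construct one <result> element per surviving title.
        result = ET.SubElement(results, "result")
        ET.SubElement(result, "title").text = t
        ET.SubElement(result, "avg").text = f"{p:.2f}"
    return results


print(ET.tostring(average_prices(PRICES_XML, BIB_XML, "Addison-Wesley"), encoding="unicode"))
```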
d. Expressions involving operators and functions
XQuery provides most of the operators and functions that can
also be found in other computer languages, including arithmetic operators,
comparison operators, logical operators and sequence-related operators. The
built-in functions include avg, sum, count, max and min,
but also XML document and node set related functions like document, empty and
distinct.
In this example the minimum price of each book is returned in
element <minprice> which has the title of the book as an attribute:
<results>
{
LET $doc := document("prices.xml")
FOR $t IN distinct($doc/prices/book/title)
LET $p := $doc/prices/book[title = $t]/price
RETURN
<minprice title={ $t/text() }>
{
min($p)
}
</minprice>
}
</results>
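As a rough analogue of the query above, the same grouping and aggregation can be written in Python with a dictionary and the built-in min function. The sample prices and the name min_prices are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for prices.xml.
PRICES_XML = """<prices>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><price>63.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
</prices>"""


def min_prices(prices_xml):
    doc = ET.fromstring(prices_xml)
    # Group all prices by title; dict keys play the role of distinct(...).
    by_title = {}
    for book in doc.findall("book"):
        by_title.setdefault(book.findtext("title"), []).append(float(book.findtext("price")))
    results = ET.Element("results")
    for title, prices in by_title.items():
        # One <minprice> element per title, with the title as an attribute.
        elem = ET.SubElement(results, "minprice", title=title)
        elem.text = f"{min(prices):.2f}"
    return results


print(ET.tostring(min_prices(PRICES_XML), encoding="unicode"))
```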
Besides built-in functions, XQuery also provides a mechanism
for specifying user-defined functions.
e. Conditional expressions
XQuery also allows the usage of IF-THEN-ELSE expressions:
<user>
{ $u/userid }
{ $u/name }
{
IF (empty($b))
THEN <status>inactive</status>
ELSE <status>active</status>
}
</user>
f. Quantified expressions
SOME and EVERY are so-called quantified expressions. Through
the SOME expression it is possible to identify whether at least one node of a
set of nodes satisfies a predicate. The EVERY expression is used to test
whether all nodes of a set satisfy a predicate.
The following example lists the names of users, if any, who have bid on every item:
<frequent_bidder>
{
FOR $u IN document("users.xml")//user_tuple
WHERE
EVERY $item IN document("items.xml")//item_tuple SATISFIES
SOME $b IN document("bids.xml")//bid_tuple SATISFIES
($item/itemno = $b/itemno AND $u/userid = $b/userid)
RETURN
$u/name
}
</frequent_bidder>
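The nesting of EVERY and SOME corresponds closely to nesting Python's built-in all() and any(). The following sketch uses invented sample data and a hypothetical function name, frequent_bidders, to show the same logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for users.xml, items.xml and bids.xml.
USERS_XML = """<users>
  <user_tuple><userid>U01</userid><name>Tom Jones</name></user_tuple>
  <user_tuple><userid>U02</userid><name>Mary Doe</name></user_tuple>
</users>"""

ITEMS_XML = """<items>
  <item_tuple><itemno>1001</itemno></item_tuple>
  <item_tuple><itemno>1002</itemno></item_tuple>
</items>"""

BIDS_XML = """<bids>
  <bid_tuple><userid>U01</userid><itemno>1001</itemno></bid_tuple>
  <bid_tuple><userid>U01</userid><itemno>1002</itemno></bid_tuple>
  <bid_tuple><userid>U02</userid><itemno>1001</itemno></bid_tuple>
</bids>"""


def frequent_bidders(users_xml, items_xml, bids_xml):
    users = list(ET.fromstring(users_xml).iter("user_tuple"))
    items = list(ET.fromstring(items_xml).iter("item_tuple"))
    bids = list(ET.fromstring(bids_xml).iter("bid_tuple"))
    return [
        u.findtext("name")
        for u in users
        if all(      # EVERY $item ... SATISFIES
            any(     # SOME $b ... SATISFIES
                item.findtext("itemno") == b.findtext("itemno")
                and u.findtext("userid") == b.findtext("userid")
                for b in bids
            )
            for item in items
        )
    ]


print(frequent_bidders(USERS_XML, ITEMS_XML, BIDS_XML))
```

In this sketch, all() over an empty collection is true, so every user would be listed if there were no items at all.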
g. Expressions that test or modify datatypes
XQuery supports both standard datatypes (based on XML Schema's type system)
and user-defined datatypes. The INSTANCEOF and TYPESWITCH/CASE expressions are
used to test whether an instance is of a certain datatype.
Implementations
Currently, several implementations of XQuery are available (see References).
X-Hive Corporation has developed an XQuery implementation on top of its native
XML database, X-Hive/DB. The goal of this implementation was twofold: to
investigate the implementability and usability of XQuery in X-Hive/DB, and to
provide feedback to the W3C XML Query Working Group, especially from a "native
XML database vendor's point of view". Both goals were met: X-Hive currently
implements the majority of the XQuery specifications, and comments (part 1,
part 2) and corrections have been submitted to the Working Group.
Like most of the implementations, X-Hive's XQuery implementation includes the
XML Query Use Cases and associated data. Unlike other implementations, X-Hive
also provides the sample queries and data from the XMach-1 XML benchmark, as
this set of samples features queries that are run over a collection of XML
documents.
Current Shortcomings
XQuery is still in the W3C Working Draft
stage. The combined working drafts contain numerous issues to be
resolved. Apart from the issues actually mentioned, there are also many
inconsistencies within and between the working drafts. As a minor
example, the Use Cases document uses several functions that are not defined in
the Functions & Operators document. Current implementations and
queries will need major rewrites as the drafts evolve.
Update queries
(including insert queries) are specifically not a goal of XQuery version 1.0,
but are expected in a later version. Without update queries it is not
possible to use XQuery as a complete database interface in the way that SQL is
now used for relational database systems.
Going by the current drafts, XQuery will not contain full-text search
facilities like "Find all elements containing a particular word". Considering
some of the current application areas of XML, users will have a need for such
facilities. Of course, it is already very hard to give a definition of "word"
that satisfies both English and French users, let alone to make it work for
Cantonese and every other language as well. These problems may relegate these
features to the realm of vendor-specific extensions forever.
XQuery will use (a subset of) the yet to be defined XPath 2.0 language. Backward
compatibility with XPath 1.0 is an important goal of XPath 2.0. However,
this does not fit well with several XQuery fundamentals. For example, in
XPath 1.0 node-sets do not have an ordering, while in XQuery they do have an
ordering to allow sorting. This means that in many places XQuery is going
to be a compromise between the cleanest solution and the one that keeps XPath
1.0 expressions working.
Despite the ambition to make XQuery the default query language for XML, XQuery
itself is not XML. To solve this, the W3C is also working on an XML syntax for
the XQuery semantics: XQueryX.
Conclusion
Although a lot of work remains to be done, XQuery is a very promising
initiative in defining the standard for query and retrieval of XML documents
and document collections. The majority of XML database vendors, and even some
relational database vendors, have developed or announced XQuery implementations.
References to XQuery Implementations
About X-Hive Corporation
X-Hive Corporation is a leading innovator in XML database technology.
Its mission is to provide superior technology and expertise to the growing
market for XML applications and services. Its flagship product, X-Hive/DB,
is a native XML database based upon open standards that can
instantly locate and retrieve the smallest element within large quantities
of data. This sets X-Hive/DB apart from the competition and makes it the ideal
foundation for building mission critical applications and large volume XML
data environments.
X-Hive Corporation is an active member of the World Wide Web Consortium
(W3C) and is based in Rotterdam, The Netherlands. X-Hive is proud to be an
Official Partner 2001 for Renault Sport F1 and the Benetton Formula 1 Team.
For more information, visit www.x-hive.com.