perfectxml.com

An Introduction to XQuery

By: Bas de Bakker and Irsan Widarto (X-Hive Corporation)

Introduction

The eXtensible Markup Language (XML) started its life as a replacement for the immensely popular but limited HTML format. In time, it became apparent that XML was even more useful "outside the browser" and that it was extremely suitable as a format for data interchange and data storage. The popularity and widespread use of XML have led to a large and continuously growing number of XML documents and a strong demand for efficient storage, query and retrieval solutions.

Initially, XML database vendors concentrated on devising smart storage methods for XML data, but nowadays the focus seems to have shifted towards creating powerful query and retrieval methods. Several early attempts have been made to define a query language for XML data (including XQL, XML-QL and Quilt), but XQuery is the first language to receive industry-wide attention and support. It is currently being developed by the W3C XML Query Working Group and has "Working Draft" status.

Industry experts expect XQuery to do for XML and XML databases what SQL did for relational data and relational database systems: provide a vendor-independent, powerful and easy-to-use method for query and retrieval of XML data.

The Language

The data model that XQuery uses is based on that of XPath and defines each XML document as a tree of nodes. The data model is not only capable of handling documents but is also designed to work on well-formed document parts (a.k.a. "fragments"), collections of documents, or collections of fragments.

XQuery is a functional language in which each query is an expression. There are seven types of expressions in XQuery: path expressions, element constructors, FLWR expressions, expressions involving operators and functions, conditional expressions, quantified expressions and expressions that test or modify datatypes. The various expressions can be combined, both in sequence and nested within one another.

(All examples are copied from or inspired by the XML Query Use Cases.)

a. Path expressions
Path expressions are based on the syntax of XPath, the XML standard for specifying "paths" in an XML document, for example:

Find all titles of chapters in document books.xml:
	document("books.xml")//chapter/title
Find all books in document bib.xml published by Addison-Wesley after 1991:
	document("bib.xml")//book[publisher = "Addison-Wesley" AND @year > "1991"]
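
For readers more at home in a general-purpose language, the second query can be approximated in Python with the standard library's xml.etree.ElementTree module. This is only a rough sketch of what the path expression selects, not XQuery itself. The sample data and the function name books_by_publisher_after are made up for illustration, and the year is compared as a number here, whereas the query above compares it with the string "1991".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real bib.xml from the Use Cases is not reproduced here.
BIB_XML = """
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><publisher>Addison-Wesley</publisher></book>
  <book year="1990"><title>An Older Book</title><publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann Publishers</publisher></book>
</bib>
"""


def books_by_publisher_after(root, publisher, year):
    """Rough analogue of //book[publisher = P AND @year > Y]."""
    return [
        book
        for book in root.iter("book")  # like //book: every <book> in document order
        if book.findtext("publisher") == publisher  # predicate on a child element
        and int(book.get("year")) > year  # predicate on the "year" attribute
    ]


root = ET.fromstring(BIB_XML)
titles = [
    book.findtext("title")
    for book in books_by_publisher_after(root, "Addison-Wesley", 1991)
]
print(titles)
# prints ['TCP/IP Illustrated', 'Advanced Programming in the Unix environment']
```

The part inside the square brackets of the path expression becomes the filter of the list comprehension, and the // step becomes a walk over all descendants.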

b. Element constructors
This type of expression is used when a query needs to create new elements, for example:

Generate a <book> element that has a "year" attribute and contains the title of the book:
	<book>
	  { $b/@year }
	  { $b/title }
	</book>
The variable $b is bound in another part of the query. When the complete query is run, the above element constructor will generate a result like this:
	<book year="1992">
	    <title>Advanced Programming in the Unix environment</title>
	</book>
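
The same construction can be sketched in Python, again with xml.etree.ElementTree. This is an illustrative analogue under assumptions: the input element b stands in for the node bound to $b, the <price> child is invented to show that only the selected parts are copied, and construct_book is a hypothetical helper name.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the node that $b is bound to.
b = ET.fromstring(
    '<book year="1992">'
    "<title>Advanced Programming in the Unix environment</title>"
    "<price>65.95</price>"
    "</book>"
)


def construct_book(source):
    """Build a new <book> from the year attribute and title child of source."""
    book = ET.Element("book")
    book.set("year", source.get("year"))  # { $b/@year } becomes an attribute
    book.append(copy.deepcopy(source.find("title")))  # { $b/title } is copied as a child
    return book


serialized = ET.tostring(construct_book(b), encoding="unicode")
print(serialized)
# prints <book year="1992"><title>Advanced Programming in the Unix environment</title></book>
```

The title is deep-copied so that the new element does not share a node with the source document.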

c. FLWR expressions
The FLWR (pronounced "flower") expression is the analogue of the SELECT-FROM-WHERE construction in SQL and forms the skeleton of the XQuery expression. A FLWR expression consists of:

  • FOR-clause: binds one or more variables to a sequence of values returned by another expression (usually a path expression) and iterates over the values.
  • LET-clause: also binds one or more variables but without iterating.
  • WHERE-clause: contains one or more predicates that filter or limit the set of nodes as generated by the FOR/LET-clauses.
  • RETURN-clause: generates the output of the FLWR expression. The RETURN-clause usually contains one or more element constructors and/or references to variables and is executed once for each node-reference that is returned by the FOR/LET/WHERE-clauses.

The following example returns the title and average price of all books published by Addison-Wesley:

	<results>
	{
	  FOR $t IN distinct(document("prices.xml")/prices/book/title)
	  LET $p := avg(document("prices.xml")/prices/book[title=$t]/price) 
	  WHERE (document("bib.xml")//book[title=$t]/publisher) = "Addison-Wesley"
	  RETURN
	    <result>
	      { $t }
	      <avg>
	        { $p }
	      </avg>
	    </result>
	}
	</results>
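
To make the role of each clause concrete, here is a rough Python analogue of the query above. It is a sketch under assumptions, not a translation any XQuery processor would produce: the two sample documents are invented, average_prices is a hypothetical function name, and the average is formatted with two decimals purely for readability.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for prices.xml and bib.xml.
PRICES_XML = """<prices>
  <book><title>TCP/IP Illustrated</title><price>60.00</price></book>
  <book><title>TCP/IP Illustrated</title><price>70.00</price></book>
  <book><title>Data on the Web</title><price>40.00</price></book>
</prices>"""
BIB_XML = """<bib>
  <book><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
  <book><title>Data on the Web</title><publisher>Morgan Kaufmann Publishers</publisher></book>
</bib>"""


def average_prices(prices_root, bib_root, publisher):
    results = ET.Element("results")
    # FOR: iterate over the distinct titles, keeping first-seen order.
    titles = dict.fromkeys(t.text for t in prices_root.findall("book/title"))
    for t in titles:
        # LET: bind a value once per iteration, without iterating over it.
        prices = [
            float(book.findtext("price"))
            for book in prices_root.findall("book")
            if book.findtext("title") == t
        ]
        p = sum(prices) / len(prices)
        # WHERE: discard titles whose publisher does not match.
        publishers = [
            book.findtext("publisher")
            for book in bib_root.iter("book")
            if book.findtext("title") == t
        ]
        if publisher not in publishers:
            continue
        # RETURN: construct one <result> per surviving binding.
        result = ET.SubElement(results, "result")
        ET.SubElement(result, "title").text = t
        ET.SubElement(result, "avg").text = f"{p:.2f}"
    return results


results = average_prices(
    ET.fromstring(PRICES_XML), ET.fromstring(BIB_XML), "Addison-Wesley"
)
serialized = ET.tostring(results, encoding="unicode")
print(serialized)
# prints <results><result><title>TCP/IP Illustrated</title><avg>65.00</avg></result></results>
```

The loop body runs once per distinct title, which mirrors the way the RETURN-clause is evaluated once for each binding that passes the WHERE-clause.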

d. Expressions involving operators and functions
XQuery provides most of the operators and functions that can also be found in other computer languages, including arithmetic operators, comparison operators, logical operators and sequence-related operators. The built-in functions include AVG, SUM, COUNT, MAX and MIN, but also XML document and node-set related functions like DOCUMENT, EMPTY and DISTINCT.

In this example the minimum price of each book is returned in element <minprice> which has the title of the book as an attribute:

	<results>
	{
	  LET $doc := document("prices.xml")
	  FOR $t IN distinct($doc/prices/book/title)
	  LET $p := $doc/prices/book[title = $t]/price
	  RETURN
	    <minprice title={ $t/text() }>
	    { 
	      min($p) 
	    }
	    </minprice>
	}
	</results>

Besides built-in functions, XQuery also provides a mechanism for specifying user-defined functions.

e. Conditional expressions
XQuery also supports IF-THEN-ELSE expressions. In the following fragment, the variables $u and $b are bound in another part of the query:

	<user>
	  { $u/userid }
	  { $u/name }
	  {
	    IF (empty($b))
	    THEN <status>inactive</status>
	    ELSE <status>active</status>
	  }
	</user>

f. Quantified expressions
SOME and EVERY are so-called quantified expressions. A SOME expression tests whether at least one node in a set of nodes satisfies a predicate. An EVERY expression tests whether all nodes in a set satisfy a predicate.

The following example lists the names of users, if any, who have bid on every item:

	<frequent_bidder>
	{
	  FOR $u IN document("users.xml")//user_tuple
	  WHERE 
	    EVERY $item IN document("items.xml")//item_tuple SATISFIES 
	      SOME $b IN document("bids.xml")//bid_tuple SATISFIES 
	        ($item/itemno = $b/itemno AND $u/userid = $b/userid)
	  RETURN
	    $u/name
	}
	</frequent_bidder>
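
The nesting of EVERY and SOME maps closely onto Python's built-in all() and any(). The following is a minimal sketch under assumptions: the three sample documents are invented, and frequent_bidders is a hypothetical function name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for users.xml, items.xml and bids.xml.
USERS_XML = """<users>
  <user_tuple><userid>U01</userid><name>Tom Jones</name></user_tuple>
  <user_tuple><userid>U02</userid><name>Mary Doe</name></user_tuple>
</users>"""
ITEMS_XML = """<items>
  <item_tuple><itemno>1001</itemno></item_tuple>
  <item_tuple><itemno>1002</itemno></item_tuple>
</items>"""
BIDS_XML = """<bids>
  <bid_tuple><userid>U01</userid><itemno>1001</itemno></bid_tuple>
  <bid_tuple><userid>U02</userid><itemno>1001</itemno></bid_tuple>
  <bid_tuple><userid>U02</userid><itemno>1002</itemno></bid_tuple>
</bids>"""


def frequent_bidders(users, items, bids):
    """Names of users who have bid on every item."""
    return [
        u.findtext("name")
        for u in users.iter("user_tuple")
        if all(  # EVERY $item ... SATISFIES
            any(  # SOME $b ... SATISFIES
                b.findtext("itemno") == item.findtext("itemno")
                and b.findtext("userid") == u.findtext("userid")
                for b in bids.iter("bid_tuple")
            )
            for item in items.iter("item_tuple")
        )
    ]


names = frequent_bidders(
    ET.fromstring(USERS_XML), ET.fromstring(ITEMS_XML), ET.fromstring(BIDS_XML)
)
print(names)
# prints ['Mary Doe']
```

Note that all() is true for an empty input, so with no items at all every user would be listed.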

g. Expressions that test or modify datatypes
XQuery supports both standard datatypes (based on XML Schema's type system) and user-defined datatypes. The INSTANCEOF and TYPESWITCH/CASE expressions are used to test whether an instance is of a certain datatype.


Implementations

Currently, several implementations of XQuery are available (see References). X-Hive Corporation has developed an XQuery implementation on top of its native XML database, X-Hive/DB. The goal of this implementation was twofold: to investigate the implementability and usability of XQuery in X-Hive/DB, and to provide feedback to the W3C XML Query Working Group, especially from a "native XML database vendor's point-of-view". Both goals were met: X-Hive currently implements the majority of the XQuery specifications, and comments (part 1, part 2) and corrections have been submitted to the Working Group.

Like most of the implementations, X-Hive's XQuery implementation includes the XML Query Use Cases and associated data. Unlike other implementations, X-Hive also provides the sample queries and data from the XMach-1 XML benchmark, as this set of samples features queries that are run over a collection of XML documents.

Current Shortcomings

XQuery is still in the W3C Working Draft stage. The combined working drafts contain numerous issues to be resolved. Apart from the issues actually mentioned, there are also many inconsistencies within and between the working drafts. As a minor example, the Use Cases document uses several functions that are not defined in the Functions & Operators document. Current implementations and queries will need major rewrites as the drafts evolve.

Update queries (including insert queries) are specifically not a goal of XQuery version 1.0, but are expected in a later version. Without update queries it is not possible to use XQuery as a complete database interface in the way that SQL is now used for relational database systems.

Going by the current drafts, XQuery will not contain full text search facilities like "Find all elements containing a particular word". Considering some of the current application areas of XML, users will have a need for such facilities. Of course, it is already very hard to give a definition of "word" that satisfies both English and French users, let alone to make it work for Cantonese and every other language as well. These problems may relegate these features to the realm of vendor-specific extensions forever.

XQuery will use (a subset of) the yet to be defined XPath 2.0 language. Backward compatibility with XPath 1.0 is an important goal of XPath 2.0. However, this does not fit well with several XQuery fundamentals. For example, in XPath 1.0 node-sets do not have an ordering, while in XQuery they do have an ordering to allow sorting. This means that in many places XQuery is going to be a compromise between the cleanest solution and the one that keeps XPath 1.0 expressions working.

Despite the ambition to make XQuery the default query language for XML, XQuery itself is not XML. To solve this, the W3C is also working on an XML syntax for the XQuery semantics: XQueryX.

Conclusion

Although a lot of work remains to be done, XQuery is a very promising initiative to define the standard for query and retrieval of XML documents and document collections. The majority of XML database vendors, and even some relational database vendors, have developed or announced XQuery implementations.

References to XQuery Implementations

About X-Hive Corporation

X-Hive Corporation is a leading innovator in XML database technology. Its mission is to provide superior technology and expertise to the growing market for XML applications and services. Its flagship product, X-Hive/DB, is a native XML database based upon open standards which has the ability to instantly locate and retrieve the smallest element within large quantities of data. This sets X-Hive/DB apart from the competition and makes it the ideal foundation for building mission-critical applications and large volume XML data environments.

X-Hive Corporation is an active member of the World Wide Web Consortium (W3C) and is based in Rotterdam, The Netherlands. X-Hive is proud to be an Official Partner 2001 for Renault Sport F1 and the Benetton Formula 1 Team. For more information, visit www.x-hive.com.

  
