perfectxml.com
 Basic Search  Advanced Search   
Topics Resources Free Library Software XML News About Us
  You are here: home Info Bank Articles OPENXML Saturday, 23 February 2008
 

Back to Articles Page      

        

OPENXML

Author: Darshan Singh -- Managing Editor, perfectxml.com


Article Date: August 2001
Abstract: With SQL Server 2000, there are many ways you can get XML representation of relational data. The options include using FOR XML clause with SELECT statement, ADO Recordset's Save method with adPersistXML parameter, etc. It would be nice to have a reverse mechanism, which allows turning XML document into tabular, relational rowset format. The good news is that SQL Server 2000 team thought about this and introduced a new Transact-SQL keyword named OPENXML along with two new supporting system stored procedures: sp_xml_preparedocument and sp_xml_removedocument.

OPENXML allows accessing XML data as if it were a relational recordset. It provides a tabular (rowset) view of in-memory representation of an XML document. Technically, OPENXML is a rowset provider similar to a table or a view; hence it can be used wherever a table or a view is used. For instance, you can use OPENXML with SELECT or SELECT INTO statements, to work on an XML document that is retained in memory.

In this article, we're going to take a detailed look at OPENXML syntax and write couple of examples. Since use of OPENXML is dependent on sp_xml_preparedocument and sp_xml_removedocument system stored procedures, we'll study those first.
Related Articles: Importing XML into SQL Server 2000
Exporting SQL Data as XML
  • Using sp_xml_preparedocument
    As mentioned earlier, OPENXML provides a tabular rowset view of an in-memory XML document. So how do we set up our XML document in memory so that OPENXML can consume it? The answer is to use sp_xml_preparedocument system stored procedure.

    The sp_xml_preparedocument stored procedure takes XML document as a string input parameter and returns a handle to this in-memory representation, which then can be passed to OPENXML statement.

    sp_xml_preparedocument [@hdoc =] <integer variable> OUTPUT, [, @xmltext = <character data>] [, @xpath_namespaces = <namespace URIs>]

    This stored procedure reads the XML text provided as input, parses the text using Microsoft XML parser MSXML (msxml2.dll), and returns a handle that can be used to access this internal memory representation of input XML document.
    Note that SQL Server 2000 will use MSXML 2.6, and not the latest MSXML 3.0, even though it is installed in replace mode. This will be fixed in SP1 for SQL Server 2000.
    The parameters of this stored procedure are fairly self-describing: hdoc is an output parameter and is an integer handle to in-memory representation of XML document passed as xmltext input parameter. The third optional parameter lists the namespaces URIs used in the XML document.

    The sp_xml_preparedocument stored procedure loads and stores XML document in memory that is part of SQL Server cache. Hence it is recommended to free up the memory once the handle is no longer required. This is done using sp_xml_removedocument system stored procedure.


  • Using sp_xml_removedocument
    It is very important to do cleanup and free up memory to avoid running into out of memory issues. The system stored procedure sp_xml_removedocument takes document handle as an integer input parameter and removes the internal representation of the XML document specified by the document handle and invalidates the document handle.

    sp_xml_removedocument [@hdoc =] <handle to XML document>

    To summarize, use of OPENXML involves three steps: calling sp_xml_preparedocument to generate in-memory representation of parsed XML document, using OPENXML, and finally doing cleanup by calling sp_xml_removedocument. Let's now look at OPENXML in detail.


  • OPENXML
    Once we have the XML document loaded in memory using sp_xml_preparedocument, we can then pass the document handle to OPENXML and use the XML document as traditional relational tabular rowset. The syntax of OPENXML looks like:

    OPENXML (<handle>, <XPath rowpattern>, [, <mapping flags>]) [WITH (<Schema Declaration> | <Table Name>)]

    The various clauses with OPENXML statement are explained here.
    • <handle> parameter is the document handle of the internal representation of an XML document created by calling sp_xml_preparedocument.
    • <XPath rowpattern> is the XPath pattern used to identify the nodes to be processed as rows.
    • <mapping flags>, an optional input parameter, is the bit-mask flag value to indicate if attributes or sub-elements be used to do mapping between XML data and the relational rowset columns, and how the spill-over column should be filled. The values that this parameter can have include:
      0 Use the attribute-centric mapping. This is the default.
      1 Use the attribute-centric mapping. This can be combined with XML_ELEMENTS (2) to indicate use of element-centric mapping for all columns not yet dealt with. If XPath finds both an attribute and an element with the same name, the attribute wins. This can also be combined with option 8 or both (2 and 8).
      2 Use the element-centric mapping. This can be combined with XML_ATTRIBUTES (1) to indicate use of attribute-centric mapping for all columns not yet dealt with. If XPath finds both an element and an attribute with the same name, the element wins. This can also be combined with option 8 or both (1 and 8).
      8 This flag indicates that the consumed data should not be copied to the overflow property @mp:xmltext. More about this and @mp:xmltext meta property discussed later in the article. This can be combined with XML_ATTRIBUTES (1) or XML_ELEMENTS (2).
    • Finally, the optional WITH clause provides the format for resulting rowset and additional mapping information, using either <Schema Declaration> or by specifying an existing <Table Name>.


    The <Schema Declaration> is of the form:

    <Column Name> <data type> [<Column XPath> | <Meta Property>]
    [, <Column Name> <data type> [<Column XPath> | <Meta Property>] ]

    • <Column Name> is the name of attribute or element being retrieved and the column name in the rowset. This will also serve as the name that we'll refer to when building the SELECT list, performing JOINs, etc.
    • <data type> is any valid SQL Server data type of the column in the rowset. As XML can have data types that are not equivalents of those in SQL Server, an automatic coercion takes place in these cases if necessary. If the column is of type timestamp, the data value from XML document is disregarded when selecting from an OPENXML rowset, and the autofill values are returned.
    • <Column XPath> is an optional relative XPath pattern that describes how the XML nodes should be mapped to columns. If this clause is not included, the default mapping (attribute-centric or element-centric as specified by <mapping flags>) is used.
    • <Meta Property> is one of special Meta properties that OPENXML provides. Meta properties describe various aspects of whatever part of XML DOM you're interested in. Here is a list of available Meta properties:

    @mp:id Provides system-generated, document-wide identifier of the DOM node. An XML ID of 0 indicates that the element is a root element and its @mp:parentid is NULL.
    @mp:parentid Same as @mp:id, only for the parent.
    @mp:localname Provides the non-fully qualified name of the node. It is used with prefix and namespace URI to name element or attribute nodes.
    @mp:parentlocalname Same as @mp:localname, only for the parent.
    @mp:namespaceuri Provides the namespace URI of the current element. If the value of this attribute is NULL, no namespace is present.
    @mp:parentnamespaceuri Same as @mp:namespaceuri, only for the parent.
    @mp:prefix The namespace prefix of the current element name.
    @mp:prev Stores the @mp:id of the previous sibling relative to a node. Using this, we can tell something about the ordering of the elements at the current level of the hierarchy.
    @mp:xmltext This Meta property is used for processing purposes, and contains the actual XML text for the current element as used in the overflow handling of OPENXML.


    The alternative to <Schema Declaration> is specifying the name of an existing table. <Table Name> can be specified if a table with the desired schema already exists and no column patterns are required.

    If the optional WITH clause is not specified, the results are returned in a format known as edge table, which represents a single table having columns: id, parentid, nodetype, localname, prefix, namespaceuri, datatype, prev, and text. The nodetype column can have value of 1, 2 or 3 to indicate a element node, attribute node, or text node respectively.

    Having looked at syntax of OPENXML and available options, let's now look at some examples.


  • Simple SELECT Example
    This is a very simple example illustrating how we can use OPENXML as a rowset provider in conjunction with SELECT T-SQL statement:

    DECLARE @idoc int

    DECLARE @doc varchar (1000)

    SET @doc ='
    <ROOT>
    <ShipperRec Company="ABC Shippers" Ph="(503) 555-9191" />
    </ROOT>'

    --Create an internal representation of the XML document
    EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

    -- Execute a SELECT statement that uses the OPENXML rowset provider
    SELECT *
    FROM OPENXML (@idoc, '/ROOT/ShipperRec',1)
    WITH (Company varchar(80),
    Ph varchar(48)
    )

    -- Clear the XML document from memory
    EXEC sp_xml_removedocument @idoc



    The above lines of code simply declares two variables, one to hold document handle and other variable is a XML string. In this example we are directly assigning the XML text to the variable, but remember that this XML document can come from any source, for instance an web application form post, or two-tier Microsoft Visual Basic client application, etc.

    The three important statements in above code include calling sp_xml_preparedocument to load the XML document string @doc. Next, we call SELECT statement with using OPENXML in the FROM clause. The OPENXML uses an attribute-centric mapping and its schema definition specifies output rowset to have two columns: Company and Ph. The XPath row pattern (/ROOT/ShipperRec) identifies the node in XML tree to work on. Finally, we call to remove XML document from memory. The output of above example execution looks like:

    Company            Ph
    ---------------    ------------------
    ABC Shippers       (503) 555-9191
    


  • Data Modifications with OPENXML
    OPENXML can also be used to perform data modifications. Let's change the above example slightly and see how we can insert records into a table:

    ...
    -- Execute an INSERT statement that uses the OPENXML rowset provider
    INSERT INTO Shippers
    SELECT *
    FROM OPENXML (@idoc, '/ROOT/ShipperRec',1)
    WITH (Company varchar(80),
    Ph varchar(48)
    )

    ...


    The only change we have done is added an INSERT INTO statement and with this change, the code inserts a new record into Shippers table in Northwind database.


  • Element-centric Mapping
    Here is an example of how to use XML elements instead of attributes to map to rowset columns:

    DECLARE @idoc int

    DECLARE @doc varchar (1000)

    SET @doc ='
    <ROOT>
    <ShipperRec>
    <Company>ABC Shippers</Company>
    <Ph>(503) 555-9191</Ph>
    </ShipperRec>
    </ROOT>'

    --Create an internal representation of the XML document
    EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

    -- Execute a SELECT statement that uses the OPENXML rowset provider
    SELECT *
    FROM OPENXML (@idoc, '/ROOT/ShipperRec',2)
    WITH (Company varchar(80),
    Ph varchar(48)
    )


    -- Clear the XML document from memory
    EXEC sp_xml_removedocument @idoc


    The source XML document now uses sub-elements rather than attributes to specify data values and XML_ELEMENTS (2) is passed as third parameter to OPENXML. It is also possible to combine attribute-centric and element-centric mapping.


  • Summary
    OPENXML provides a way to treat XML document as a relational rowset and hence can be used in conjunction with T-SQL statements to work directly on XML data. The first step for using OPENXML is making call to sp_xml_preparedocument system stored procedure, which loads XML document string into memory, parses it and returns a handle that further can be used to access this in-memory representation. Finally, when done, clean up is required and call to sp_xml_removedocument frees up the memory allocated to this internal representation of the XML document.

    In this article, we studied the OPENXML syntax and options available with it. We also saw couple of simple OPENXML examples.


If you have any questions or comments, feel free to contact author of this article, Darshan Singh at darshan@perfectxml.com.
Related KB Articles:
Recently asked questions on this article:
Q:  I would like to dynamically generate the OPENXML statement and pass it the selection XPath expression and the select column list. Can you please show an example of how to do it?

A:  There are I think few ways to dynamically build and execute T-SQL statement. Here is an example that makes use of sp_executesql stored procedure to execute a dynamically generated T-SQL statement.

--Source XML Document
DECLARE @doc nvarchar (4000)
SET @doc ='<empDetails>
    <emp id="1" firstName="John" lastName="smith"/>
    <emp id="2" firstName="Mark" lastName="Robinson"/>
    <emp id="3" firstName="Dave" lastName="Schur"/>
</empDetails>'

--Load XML document
DECLARE @idoc int
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

--Dynamically generated SQL statement
DECLARE @sqlStatement nvarchar(4000)

--Our main params that we would like to pass/control
DECLARE @primaryXPath nvarchar(255)
DECLARE @selectFieldList varchar(2000)

--First Test
    SET @primaryXPath = '/empDetails/emp'
    SET @selectFieldList = '[FirstName] NVARCHAR(100) ''@firstName'''
    SET @sqlStatement = 'SELECT * FROM OPENXML (@idoc, @primaryXPath, 3)
    WITH
    (' + @selectFieldList + ')'
    
    EXECUTE  sp_executesql @sqlStatement, 
    @params = N'@idoc INT, @primaryXPath nvarchar(255), @selectFieldList varchar(2000)',
    @idoc=@idoc, @primaryXPath=@primaryXPath, @selectFieldList=@selectFieldList

--Second Test
    SET @primaryXPath = '/empDetails/emp[@id=3]'
    SET @selectFieldList='[FirstName] NVARCHAR(100) ''@firstName'', [LastName] NVARCHAR(100) ''@lastName'''
    SET @sqlStatement = 'SELECT * FROM OPENXML (@idoc, @primaryXPath, 3)
    WITH
    (' + @selectFieldList + ')'
    
    EXECUTE  sp_executesql @sqlStatement, 
    @params = N'@idoc INT, @primaryXPath nvarchar(255), @selectFieldList varchar(2000)',
    @idoc=@idoc, @primaryXPath=@primaryXPath, @selectFieldList=@selectFieldList

--Unload the document
EXEC sp_xml_removedocument @idoc 
The first test above produces three rows with firstName value; and the second test returns first and last name for employee with id = 3.


Q:  I have an XML document that uses Namespaces, when I try to use this document with OPENXML, it does not work. I am not able to get the node values.

A:  The answer lies in the fact that sp_xml_preparedocument takes an optional third parameter where we can specify the namespace to be used with the XPath expressions passed to the OPENXML. The use of prefixes in the namespace declaration with sp_xml_preparedocument is mandatory, even if the XML document uses the default namespace.

In the following example, the XML document uses the namespace declarations (default and with prefix). The first OPENXML illustrates accessing the nodes defined with namespace prefix; and the second OPENXML illustrates accessing the nodes defined with default namespace.


DECLARE @idoc int

DECLARE @doc varchar (1000)

SET @doc ='
<?xml version="1.0" encoding="UTF-8"?>
<bk:Books xmlns:bk="http://www.perfectxml.com">
	<bk:Book bk:ISBN="186100589X">
		<bk:Title>XML Application Development with MSXML 4.0</bk:Title>
		<bk:Publisher>Wrox Press</bk:Publisher>
		<bk:DateOfPurchase>2/1/2002</bk:DateOfPurchase>
		<Stores xmlns="BooksURI">
			Amazon, B&amp;N
		</Stores>
	</bk:Book>
	<bk:Book bk:ISBN="1861005466">
		<bk:Title>Professional SQL Server 2000 XML</bk:Title>
		<bk:Publisher>Wrox Press</bk:Publisher>
		<bk:DateOfPurchase>9/10/2001</bk:DateOfPurchase>
		<Stores xmlns="BooksURI">
			BookPool
		</Stores>
	</bk:Book>
</bk:Books>
'

--Create an internal representation of the XML document
-- Example 1: XML doc uses the Namespace prefix
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc, 
   '<bk:Books xmlns:bk="http://www.perfectxml.com" />'

-- Execute a SELECT statement that uses the OPENXML rowset provider
SELECT *
FROM OPENXML (@idoc, '/bk:Books/bk:Book', 8)
WITH 
(
	ISBN varchar(100) '@bk:ISBN',
	Title varchar(100) 'bk:Title',
	Publisher varchar(100) 'bk:Publisher',
	DateOfPurchase varchar(100) 'bk:DateOfPurchase'
)

-- Clear the XML document from memory
EXEC sp_xml_removedocument @idoc 

---------------------------------------

--Create an internal representation of the XML document
-- Example 2: XML Node uses the Default Namespace
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc, 
   '<bk:Books xmlns:st="BooksURI" xmlns:bk="http://www.perfectxml.com"/>'

-- Execute a SELECT statement that uses the OPENXML rowset provider
SELECT *
FROM OPENXML (@idoc, '/bk:Books/bk:Book', 8)
WITH 
(
	Stores varchar(100) 'st:Stores'
)

-- Clear the XML document from memory
EXEC sp_xml_removedocument @idoc 

In the above example, the XML document contains Books node and its child nodes defined using a namespace prefix, while the Stores node uses the default namespace.

Note how we have defined the namespace with the sp_xml_preparedocument; even for the default namespace node in the XML document, we have defined the prefix while declaring the namespace with sp_xml_preparedocument.

Q:  How do I update using OPENXML and is it required to match column names with attribute/element names? Thanks.

Answer:

A:  Here is an example of updating a record using OPENXML

USE tempdb
go
CREATE TABLE emps (empid INT IDENTITY(1,1), empname VARCHAR(30)) 
go 

INSERT INTO emps SELECT 'Darshan' 
go 
INSERT INTO emps SELECT 'Sunny' 
go 
INSERT INTO emps SELECT 'Andy' 
go 

SELECT * FROM emps 
go

-----------Updating using OPENXML
DECLARE @hDoc int
DECLARE @cDoc varchar(8000)

SET @cDOC = '<ROOT><NewName ID="3">ASmith</NewName></ROOT>' 

EXEC sp_xml_preparedocument @hDoc OUTPUT, @cDoc 

UPDATE emps 
	SET empname = x.NewName
	FROM 
	(
		SELECT NewName, empid
		FROM OPENXML(@hDOC, '/ROOT', 1)
		WITH 
		(
			NewName varchar(30) '/ROOT/NewName',
			empid int '/ROOT/NewName/@ID'
		)
	) x, emps
	WHERE emps.empid = x.empid

EXEC sp_xml_removedocument @hdoc 
go
SELECT * FROM emps
go


B.) I don't think element/attribute names should necessarily match column names (infact above example proves that! ).
Here is another small example of inserting data into a table from the XML document:

DROP TABLE emps
go
CREATE TABLE emps (ID int, EmpName varchar(30))
GO

DECLARE @hDoc int
DECLARE @cDoc varchar(8000)

SET @cDOC = '<ROOT><NewName ID="3">ASmith</NewName><NewName ID="4">BOB</NewName></ROOT>'
EXEC sp_xml_preparedocument @hDoc OUTPUT, @cDoc 

INSERT INTO emps 
	SELECT *
	FROM OPENXML(@hDOC, '/ROOT/NewName', 3)
	WITH 
	(
		ID int '@ID',
		NewName varchar(30) '.'
	)

EXEC sp_xml_removedocument @hdoc 
go
SELECT * FROM emps
go

  

Back to Articles Page      

All information on this site is for training only. We do not warrant its correctness or its fitness to be used. The risk of using it remains entirely with the user. 

 

  Contact Us | E-mail Us | Site Guide | About PerfectXML | Advertise ©2004 perfectxml.com. All rights reserved. | Privacy