Transforming XML with XPath in SQL Server

Since the next version of SQL Server (2005, aka Yukon) is at least nominally coming out this year, it seems a bit late to write a tip about using the XML features in SQL Server 2000. However, better late than never.

As part of a project I’m working on, I needed to bring XML data into our production database for subsequent processing. The XML document that our partner’s application provides is what I would consider “loosely typed”—all the useful information about the data contained within is defined outside the actual schema. For an example, say the canonical sample XML document is written like this:


<ROOT>

<Customer CustomerID="VINET" ContactName="Paul Henriot">
 <Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
  <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
  <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
 </Order>
</Customer>
<Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">
 <Order CustomerID="LILAS" EmployeeID="3" OrderDate="1996-08-16T00:00:00">
  <OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>
 </Order>
</Customer>
</ROOT>

In the flavor of data description used by our partner, the same document would be written like this:


<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
 <Order>
  <OrderAttribute rawname="EmployeeID">"5"</OrderAttribute>
  <OrderAttribute rawname="OrderDate=">"1996-07-04T00:00:00"</OrderAttribute>
   <OrderDetail>
    <OrderDetailAttribute rawname="OrderID">"10248"</OrderDetailAttribute>
    <OrderDetailAttribute rawname="ProductID">"11"</OrderDetailAttribute>
    <OrderDetailAttribute rawname="Quantity">"12"</OrderDetailAttribute>
   </OrderDetail>
...
<ROOT>

That’s only slightly unfortunate; after all, XML is meant to be transformed, and if you wanted to turn the sample into sensible row-and-column data for a relational database you could use the features provided in OPENXML to do so.

Ah, but it turns out the XPATH syntax needed to transform the data (using the ColPattern mapping in the OPENXML) is slightly nontrivial. To extract the text of the elements, you need the text() function, which must be called after the predicate specifier. That is, if your OPENXML statement looks at the /Root/Customer/Order/OrderDetail branch, rather than writing OrderDetailAttribute/text()[@rawname="Quantity"] to get the value of the quantity of the order, you need to write child::OrderDetailAttribute[@rawname="Quantity"]/text(). (Note: If you wanted to include details from both the order and orderdetail branch, you actually have to specify child:: rather than SQL Server’s shorthand, otherwise you get a crossproduct.) (see below)

Anyway, once you have the correct XPath sequence, the rest is relatively trivial. The stored procedure below accepts the XML document as a parameter (you need to do this, generally speaking, unless you know that the XML document will be smaller than 4000 characters in size, since SQL Server 2000 doesn’t let you use local variables of type TEXT) and transforms it into a rowset, ready to be inserted into the database or otherwise manipulated:


CREATE PROCEDURE dbo.import_xml
	@xmldoc text
as
	DECLARE @hdoc int

EXEC sp_xml_preparedocument @hdoc OUTPUT, @xmldoc

--let's peek at the rowset

SELECT * FROM 
OPENXML (@hdoc,'/Root/Customer/Order',1)
 WITH (CustomerID varchar(300) 
'../@CustomerID', EmployeeID varchar(30)
'OrderAttribute[@rawname="EmployeeID"]', OrderID varchar(300)
'OrderDetail/OrderDetailAttribute[@rawname="OrderID"]', ProductID varchar(30)
'OrderDetail/OrderDetailAttribute[@rawname="ProductID"]', Quantity varchar(30)
'OrderDetail/OrderDetailAttribute[@rawname="Quantity"]') exec sp_xml_removedocument @hdoc GO

which produces, for the sample file excerpted above,

CustomerID EmployeeID OrderID ProductID Quantity
VINET 5 10248 11 12

This is probably old hat to those of my readers who know what the hell I’m talking about, but it was eye opening and nonobvious to me.

Update: Corrected after some input from a reader who knew better than I what the hell I was talking about (thanks, Michael Rys). As it turns out, you don’t need the text() function; the XPath shown above will correctly pull the text in the node once you’ve written the predicate correctly. Nothing to see here; move along.