Transforming XML with XPath in SQL Server

Since the next version of SQL Server (2005, aka Yukon) is at least nominally coming out this year, it seems a bit late to write a tip about using the XML features in SQL Server 2000. However, better late than never.

As part of a project I’m working on, I needed to bring XML data into our production database for subsequent processing. The XML document that our partner’s application provides is what I would consider “loosely typed”—all the useful information about the data contained within is defined outside the actual schema. For an example, say the canonical sample XML document is written like this:


<ROOT>

<Customer CustomerID="VINET" ContactName="Paul Henriot">
 <Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
  <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
  <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
 </Order>
</Customer>
<Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">
 <Order CustomerID="LILAS" EmployeeID="3" OrderDate="1996-08-16T00:00:00">
  <OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>
 </Order>
</Customer>
</ROOT>

In the flavor of data description used by our partner, the same document would be written like this:


<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
 <Order>
  <OrderAttribute rawname="EmployeeID">"5"</OrderAttribute>
  <OrderAttribute rawname="OrderDate=">"1996-07-04T00:00:00"</OrderAttribute>
   <OrderDetail>
    <OrderDetailAttribute rawname="OrderID">"10248"</OrderDetailAttribute>
    <OrderDetailAttribute rawname="ProductID">"11"</OrderDetailAttribute>
    <OrderDetailAttribute rawname="Quantity">"12"</OrderDetailAttribute>
   </OrderDetail>
...
<ROOT>

That’s only slightly unfortunate; after all, XML is meant to be transformed, and if you wanted to turn the sample into sensible row-and-column data for a relational database you could use the features provided in OPENXML to do so.

Ah, but it turns out the XPATH syntax needed to transform the data (using the ColPattern mapping in the OPENXML) is slightly nontrivial. To extract the text of the elements, you need the text() function, which must be called after the predicate specifier. That is, if your OPENXML statement looks at the /Root/Customer/Order/OrderDetail branch, rather than writing OrderDetailAttribute/text()[@rawname="Quantity"] to get the value of the quantity of the order, you need to write child::OrderDetailAttribute[@rawname="Quantity"]/text(). (Note: If you wanted to include details from both the order and orderdetail branch, you actually have to specify child:: rather than SQL Server’s shorthand, otherwise you get a crossproduct.) (see below)

Anyway, once you have the correct XPath sequence, the rest is relatively trivial. The stored procedure below accepts the XML document as a parameter (you need to do this, generally speaking, unless you know that the XML document will be smaller than 4000 characters in size, since SQL Server 2000 doesn’t let you use local variables of type TEXT) and transforms it into a rowset, ready to be inserted into the database or otherwise manipulated:


CREATE PROCEDURE dbo.import_xml
	@xmldoc text
as
	DECLARE @hdoc int

EXEC sp_xml_preparedocument @hdoc OUTPUT, @xmldoc

--let's peek at the rowset

SELECT * FROM 
OPENXML (@hdoc,'/Root/Customer/Order',1)
 WITH (CustomerID varchar(300) 
'../@CustomerID', EmployeeID varchar(30)
'OrderAttribute[@rawname="EmployeeID"]', OrderID varchar(300)
'OrderDetail/OrderDetailAttribute[@rawname="OrderID"]', ProductID varchar(30)
'OrderDetail/OrderDetailAttribute[@rawname="ProductID"]', Quantity varchar(30)
'OrderDetail/OrderDetailAttribute[@rawname="Quantity"]') exec sp_xml_removedocument @hdoc GO

which produces, for the sample file excerpted above,

CustomerID EmployeeID OrderID ProductID Quantity
VINET 5 10248 11 12

This is probably old hat to those of my readers who know what the hell I’m talking about, but it was eye opening and nonobvious to me.

Update: Corrected after some input from a reader who knew better than I what the hell I was talking about (thanks, Michael Rys). As it turns out, you don’t need the text() function; the XPath shown above will correctly pull the text in the node once you’ve written the predicate correctly. Nothing to see here; move along.

New music Tuesday: Spoon

Man. I wish the iTunes Music Store folks would get their acts together and consistently update the New Releases and Just Added pages. As a music addict, I generally await Tuesdays like I await my first cup of coffee in the morning, and I just can’t get going without my fix. So it’s a good thing that the store’s banner ads at least are up to date; they advised me that the new release from Spoon, Gimme Fiction, is available.

I was looking forward to the release because the lead single, “I Turn My Camera On,” was excellent—a bit like an indie white boy version of Prince, only more swagger-y. But after the disappointment with the new music listings, I thought, I wonder if it’s available at eMusic instead? Sure enough. And buying a fifty-song booster pack let me download the album, plus recordings by Ali Farka Toure, Gene Ammons, Red Garland, and Material for just a few dollars more than the Spoon album by itself would cost at the iTMS.

Also available, and according to Fury worth hearing, is the new recording of Pierre Boulez conducting his own Le Marteau Sans Maitre. (And thanks to Fury for hipping me to the Gurgling Cod, which just got added to my list of favorite Boston area blogs.)

Finally, I happened to be in our local indy record store (okay, chain) this weekend, and found a copy of Godspeed You Black Emperor’s F#, A#, (∞) on vinyl, about which the band’s official discography page says it best:

initially limited to 500 numbered copies, then repressed. hand-made jacket, with one of three actual photographs glued on to the front. comes with a bag of goodies, including a penny flattened by a train. each track fills a side, and is made up of different pieces. side b runs out into a locked groove.

It is not a knock against the recording to say that last night, in a somewhat exhausted state, it took me about ten minutes to realize that the album runs out into a locked groove; the musical effect is quite consistent with the rest of that side. F#, A#, ∞, indeed. But what a cool album (even without the bag of goodies). And apparently, the only way to get the experience is on vinyl, since the band remastered and re-recorded portions for the 1998 CD release. (See the aforementioned discography page for some MP3 samples of the CD version.)

Mother’s Day … for peace?

The Reverend Dr. Nancy Taylor pointed out from the pulpit of Old South Church on Sunday that Mother’s Day has deep ties to Boston, and to progressive thought. Julia Ward Howe, the author of the words to the “Battle Hymn of the Republic,” called for the first Mother’s Day in 1870 as a day for peace, a protest against the carnage of war. (It was to take another 37 years to get an actual observance of the day, by which point it had morphed into a memorial day for mothers.) Here is Howe’s actual proclamation from 1870:

Arise, then, women of this day! Arise all women who have hearts,
whether our baptism be that of water or of fears!

Say firmly: “We will not have great questions decided by
irrelevant agencies. Our husbands shall not come to us, reeking
with carnage, for caresses and applause. Our sons shall not be
taken from us to unlearn all that we have been able to teach
them of charity, mercy and patience.”

We women of one country will be too tender of those of another
country to allow our sons to be trained to injure theirs. From
the bosom of the devastated earth a voice goes up with our own.
It says “Disarm, Disarm! The sword of murder is not the balance
of justice.”

Blood does not wipe our dishonor nor violence indicate possession.
As men have often forsaken the plow and the anvil at the summons
of war, let women now leave all that may be left of home for a
great and earnest day of counsel. Let them meet first, as women,
to bewail and commemorate the dead.

Let them then solemnly take counsel with each other as to the
means whereby the great human family can live in peace, each
bearing after their own time the sacred impress, not of Caesar,
but of God.

In the name of womanhood and of humanity, I earnestly ask that a
general congress of women without limit of nationality may be
appointed and held at some place deemed most convenient and at
the earliest period consistent with its objects, to promote the
alliance of the different nationalities, the amicable settlement
of international questions, the great and general interests of
peace.

Interesting that somehow in the intervening 135 years between the first proclamation and today, Mother’s Day became about flowers, candy, and lunch out… Dr. Taylor’s sermon on the topic was thought-provoking; hopefully it will get posted soon. (I really need to talk to her about blogging.)

(I’m about two days behind on my blogging; I was thinking about writing this on Sunday but didn’t quite get there.)