Transforming XML with XPath in SQL Server

Since the next version of SQL Server (2005, aka Yukon) is at least nominally coming out this year, it seems a bit late to write a tip about using the XML features in SQL Server 2000. However, better late than never.

As part of a project I’m working on, I needed to bring XML data into our production database for subsequent processing. The XML document that our partner’s application provides is what I would consider “loosely typed”—all the useful information about the data contained within is defined outside the actual schema. For an example, say the canonical sample XML document is written like this:


<ROOT>

<Customer CustomerID="VINET" ContactName="Paul Henriot">
 <Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
  <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
  <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
 </Order>
</Customer>
<Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">
 <Order CustomerID="LILAS" EmployeeID="3" OrderDate="1996-08-16T00:00:00">
  <OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>
 </Order>
</Customer>
</ROOT>

In the flavor of data description used by our partner, the same document would be written like this:


<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
 <Order>
  <OrderAttribute rawname="EmployeeID">"5"</OrderAttribute>
  <OrderAttribute rawname="OrderDate=">"1996-07-04T00:00:00"</OrderAttribute>
   <OrderDetail>
    <OrderDetailAttribute rawname="OrderID">"10248"</OrderDetailAttribute>
    <OrderDetailAttribute rawname="ProductID">"11"</OrderDetailAttribute>
    <OrderDetailAttribute rawname="Quantity">"12"</OrderDetailAttribute>
   </OrderDetail>
...
<ROOT>

That’s only slightly unfortunate; after all, XML is meant to be transformed, and if you wanted to turn the sample into sensible row-and-column data for a relational database you could use the features provided in OPENXML to do so.

Ah, but it turns out the XPATH syntax needed to transform the data (using the ColPattern mapping in the OPENXML) is slightly nontrivial. To extract the text of the elements, you need the text() function, which must be called after the predicate specifier. That is, if your OPENXML statement looks at the /Root/Customer/Order/OrderDetail branch, rather than writing OrderDetailAttribute/text()[@rawname="Quantity"] to get the value of the quantity of the order, you need to write child::OrderDetailAttribute[@rawname="Quantity"]/text(). (Note: If you wanted to include details from both the order and orderdetail branch, you actually have to specify child:: rather than SQL Server’s shorthand, otherwise you get a crossproduct.) (see below)

Anyway, once you have the correct XPath sequence, the rest is relatively trivial. The stored procedure below accepts the XML document as a parameter (you need to do this, generally speaking, unless you know that the XML document will be smaller than 4000 characters in size, since SQL Server 2000 doesn’t let you use local variables of type TEXT) and transforms it into a rowset, ready to be inserted into the database or otherwise manipulated:


CREATE PROCEDURE dbo.import_xml
	@xmldoc text
as
	DECLARE @hdoc int

EXEC sp_xml_preparedocument @hdoc OUTPUT, @xmldoc

--let's peek at the rowset

SELECT * FROM 
OPENXML (@hdoc,'/Root/Customer/Order',1)
 WITH (CustomerID varchar(300) 
'../@CustomerID', EmployeeID varchar(30)
'OrderAttribute[@rawname="EmployeeID"]', OrderID varchar(300)
'OrderDetail/OrderDetailAttribute[@rawname="OrderID"]', ProductID varchar(30)
'OrderDetail/OrderDetailAttribute[@rawname="ProductID"]', Quantity varchar(30)
'OrderDetail/OrderDetailAttribute[@rawname="Quantity"]') exec sp_xml_removedocument @hdoc GO

which produces, for the sample file excerpted above,

CustomerID EmployeeID OrderID ProductID Quantity
VINET 5 10248 11 12

This is probably old hat to those of my readers who know what the hell I’m talking about, but it was eye opening and nonobvious to me.

Update: Corrected after some input from a reader who knew better than I what the hell I was talking about (thanks, Michael Rys). As it turns out, you don’t need the text() function; the XPath shown above will correctly pull the text in the node once you’ve written the predicate correctly. Nothing to see here; move along.

New music Tuesday: Spoon

Man. I wish the iTunes Music Store folks would get their acts together and consistently update the New Releases and Just Added pages. As a music addict, I generally await Tuesdays like I await my first cup of coffee in the morning, and I just can’t get going without my fix. So it’s a good thing that the store’s banner ads at least are up to date; they advised me that the new release from Spoon, Gimme Fiction, is available.

I was looking forward to the release because the lead single, “I Turn My Camera On,” was excellent—a bit like an indie white boy version of Prince, only more swagger-y. But after the disappointment with the new music listings, I thought, I wonder if it’s available at eMusic instead? Sure enough. And buying a fifty-song booster pack let me download the album, plus recordings by Ali Farka Toure, Gene Ammons, Red Garland, and Material for just a few dollars more than the Spoon album by itself would cost at the iTMS.

Also available, and according to Fury worth hearing, is the new recording of Pierre Boulez conducting his own Le Marteau Sans Maitre. (And thanks to Fury for hipping me to the Gurgling Cod, which just got added to my list of favorite Boston area blogs.)

Finally, I happened to be in our local indy record store (okay, chain) this weekend, and found a copy of Godspeed You Black Emperor’s F#, A#, (∞) on vinyl, about which the band’s official discography page says it best:

initially limited to 500 numbered copies, then repressed. hand-made jacket, with one of three actual photographs glued on to the front. comes with a bag of goodies, including a penny flattened by a train. each track fills a side, and is made up of different pieces. side b runs out into a locked groove.

It is not a knock against the recording to say that last night, in a somewhat exhausted state, it took me about ten minutes to realize that the album runs out into a locked groove; the musical effect is quite consistent with the rest of that side. F#, A#, ∞, indeed. But what a cool album (even without the bag of goodies). And apparently, the only way to get the experience is on vinyl, since the band remastered and re-recorded portions for the 1998 CD release. (See the aforementioned discography page for some MP3 samples of the CD version.)

Mother’s Day … for peace?

The Reverend Dr. Nancy Taylor pointed out from the pulpit of Old South Church on Sunday that Mother’s Day has deep ties to Boston, and to progressive thought. Julia Ward Howe, the author of the words to the “Battle Hymn of the Republic,” called for the first Mother’s Day in 1870 as a day for peace, a protest against the carnage of war. (It was to take another 37 years to get an actual observance of the day, by which point it had morphed into a memorial day for mothers.) Here is Howe’s actual proclamation from 1870:

Arise, then, women of this day! Arise all women who have hearts,
whether our baptism be that of water or of fears!

Say firmly: “We will not have great questions decided by
irrelevant agencies. Our husbands shall not come to us, reeking
with carnage, for caresses and applause. Our sons shall not be
taken from us to unlearn all that we have been able to teach
them of charity, mercy and patience.”

We women of one country will be too tender of those of another
country to allow our sons to be trained to injure theirs. From
the bosom of the devastated earth a voice goes up with our own.
It says “Disarm, Disarm! The sword of murder is not the balance
of justice.”

Blood does not wipe our dishonor nor violence indicate possession.
As men have often forsaken the plow and the anvil at the summons
of war, let women now leave all that may be left of home for a
great and earnest day of counsel. Let them meet first, as women,
to bewail and commemorate the dead.

Let them then solemnly take counsel with each other as to the
means whereby the great human family can live in peace, each
bearing after their own time the sacred impress, not of Caesar,
but of God.

In the name of womanhood and of humanity, I earnestly ask that a
general congress of women without limit of nationality may be
appointed and held at some place deemed most convenient and at
the earliest period consistent with its objects, to promote the
alliance of the different nationalities, the amicable settlement
of international questions, the great and general interests of
peace.

Interesting that somehow in the intervening 135 years between the first proclamation and today, Mother’s Day became about flowers, candy, and lunch out… Dr. Taylor’s sermon on the topic was thought-provoking; hopefully it will get posted soon. (I really need to talk to her about blogging.)

(I’m about two days behind on my blogging; I was thinking about writing this on Sunday but didn’t quite get there.)

Stumps and saws and prybars

Lisa spent the weekend in our front yard doing “structural landscape improvements” (meaning pulling nasty hedges out by the roots), while I helped uproot some of the stumps while reversing the last bit of damage done to the house when we moved the fridge in.

First, the hedges: with such a small front yard, we decided it didn’t make sense to devote a lot of it to hedges—particularly to hedges that grow like weeds and lose all their leaves in a spectacularly ugly fashion in winter. So last weekend Lisa took a variety of pruning tools to them and cut them down to a few inches above the ground. This was a weekend’s worth of work in itself, particularly the part where we had to bag all the cuttings. Then on Friday Lisa took a day off and started digging the roots out. Almost all the hedges had taproots that were a foot long or more and several inches in diameter, which were impossible to dig out with shovels and very difficult to sever. Fortunately our neighbors stopped by and said, “We have a crowbar that might help you out.” Thanks, said Lisa, but we already have one. “Not like this you don’t,” said our neighbor, and came back with a five-foot-long solid iron bar that had to be about seventy years old. It was cylindrical at one end, but tapered to a rectangular form at the other end with a pyramidal tip. I had never heard of using a prybar for gardening, but it’s apparently a standard item in Roger Cook’s toolkit. I’ll upload a picture later.

With the prybar in hand, Lisa was able to lever out all the taproots unaided, leaving nothing but a few smaller, easily-extractable roots behind. I helped with some of the bigger root balls and did disposal work, but all in all the job went extraordinarily smoothly, which was good because I had other tasks to do.

I spent Saturday sanding the first coat of joint compound I had previously applied over the bare frame and drywall of our kitchen doorway, then applying a second skim coat. I was worried about making a mess with the sanding, but I found a useful ShopVac attachment that kept the dust from getting out of hand. I primed most of the new wall surface on Sunday, but I need to go back and make one more pass on one side where the compound didn’t smooth out the tape. Fixing the doorway left just one problem—the ceiling over the fridge.

Over the last few months we’ve kept looking at the gaping holes in the ceiling over the fridge (regular readers will remember we discovered them when I pulled down a cabinet to make room for the fridge). We had thought about putting up a patch, but after the pain of working with the backing board on the doorway we started contemplating other approaches. Late last week Lisa asked, “Why don’t we just re-mount the cabinet?” and I had to admit it seemed like the best approach. Unfortunately I needed to make the cabinet about an inch shorter, and I only had about 5/8ths of an inch extra trim above the top and below the bottom of the cabinet. So out came the Sawzall, which I used to trim a half inch from the top and the bottom of the cabinet. It was tight—in one or two places I accidentally planed a little off the top of the cabinet—but in the end it got it done. Then Lisa and I put the cabinet on a dolly and rolled it up and into the kitchen via our porch. The last step was the hardest, essentially a clean-and-jerk from the dolly straight up to the top of the fridge, but the cabinet made it and is now resting up there until I can re-secure it to the studs.

I dream of having a kitchen in which the Sawzall has no place. I think it will be a while before we’re there.

iChat issues in Tiger

The iChat issues I wrote about persist, despite the now-total absence of Virex from my machine. These threads on the Apple discussion board suggests that it is a fairly widespread problem. I’ve read every suggestion, from the mundane (do routine system maintenance and try again) to the exotic (delete some entries in the iChat plist file, make sure your correspondents apply all system updates), to the just plain superstitious (make sure you start iChat before you start any other application).

Me? I think this smells like insufficient testing on Apple’s part, and I’m looking forward to seeing the first post-Tiger iChat update.

Can you IM me now? Good!

Boing Boing: Space Needle to be converted to WiFi antenna. According to this KOMO TV story, Speakeasy (and two other firms) are teaming up to create a city-wide WiMax network that will eventually be available to individuals as well as businesses. Though I disagree with Boing Boing’s assertion that the Space Needle is a “white elephant” (the only other places with an equally cool Seattle views are Pike Place Market and Anthony’s Pier 66, and both of those have bay views rather than the incredibly cool Lake Union views), I can’t argue with the following:

Quinn Norton first observed that looking at some big weird chunk of metal (say, a Stanford radio-telescope) and saying “That would make a great WiFi antenna” is the twenty-first century equivalent of pointing at every hollow object and opining “that would make a great bong.”

Whither Mac anti-virus protection?

I’m starting to wonder a little about what I will do on my Mac for virus protection since Virex 7.5.x isn’t Tiger compatible. I’m apparently not alone in wondering. MrBarrett.com neatly summarizes the current Mac antivirus marketplace, and points out a few contenders I hadn’t considered, including Sophos (which is apparently only available to business customers) and ClamAV, which may be the only option to fill the gap between now and the vaporous release of Virex 7.7.

Solving ASP.NET application problems

I have been working with a partner of ours to get an ASP.NET web application running on my Windows 2000 computer at the office. We had no joy for several hours last night trying to figure out why none of the .aspx pages in the application could be contacted. We were getting an interesting error message:


Server Error in '/BogusAppname' Application.
----------------------------------------------------

The resource cannot be found.
Description: HTTP 404. The resource you are looking for (or one of its dependencies) could have been removed, had its name changed, or is temporarily unavailable. Please review the following URL and make sure that it is spelled correctly.

Requested Url: /BogusAppname/login.aspx

----------------------------------------------------
Version Information: Microsoft .NET Framework Version:1.1.4322.2032; ASP.NET Version:1.1.4322.2032

I had to give up with the guy from our partner company at 8 PM my time last night. This morning, I did some Googling and found an all-too-simple-sounding solution: the application wasn’t correctly registered as a virtual directory. I went to the IIS manager, created a new virtual directory with the same name as the directory I was trying to hit, and lo and behold the application started working.

The difference was only apparent in the properties dialog on the application directory. The virtual directories on the server had “Virtual Directory” as the first tab on the properties dialog, but our application directory had “Directory”.

Perhaps this will save someone’s thinning hair.

Calling BS

Two compelling cries of bullshit in the blogosphere this morning. First, Dave Winer pegs the onslaught of advertisements in RSS accurately: it represents lazy thinking by marketers:

Here’s some food for thought for “marketers” who say they need to put ads in RSS feeds to make them pay. By some calculations, in three years, 27 percent of the NY Times hits will originate from their RSS feeds. The BBC is aiming for 10 percent by the end of the year. Neither company puts ads in their feeds because: The feeds themselves are ads for the stories they link to, which are revenue-generators. Anything that keeps people from clicking, that confuses them, takes them off course, is going to drop the click-through rate. And it’s a good deal for the users, because they get the headlines and summaries for articles they only have a superficial interest in, and can easily access the full stories for articles that they want more information on. The rare win-win.

And Doc Searls posts an insightful criticism of the effects of bell-curve thinking on IT, the educational system, and individual achievers, in a post that follows up his equally insightful two-part review of Tom Friedman’s The World is Flat and “It’s a Flat World, After All”:

…all this might also help explain why I chafe at the caste system implied in labels like “Alpha blogger.”

What I love about blogging is that it isn’t school. Instead it’s a great way to discover how the long, flat tail features plenty of original and brilliant individuals. These good folks succeed by earning links, not grades. It’s a much better, and a much flatter, system.

Expansion begins at home, in Redmond

microsoft campus buildings 1 - 6

CNet: After looking around, Microsoft decides to expand at home. That the travel across the 520 bridge to get to hypothetical Seattle offices should be viewed as an insurmountable obstacle to development should surprise no one. Microsoft doesn’t know how to build software (broad generalization) unless all the people are within two minutes’ walk of each other. For all its high-techness, the most surprising thing about the Microsoft culture is how meeting-centric it is.

The thing that caught my eye was this sentence in the third paragraph: “The company also plans to demolish and rebuild 600,000 square feet of older buildings that lack the power and cooling capacity needed for modern computer equipment.” I wonder if that means a final sayonara for buildings 1 through 6? These were the original Microsoft buildings on the Redmond campus, and while they’re earthquake damaged, mold infested, and confusing as hell (check out the satellite photo here and then imagine navigating around the corridors inside those Xs), they’s historic. They also fit sympathetically into the wooded ravine landscape in a way that none of the later buildings manage. Maybe one of my Microsoftie friends can comment on this?

Tiger part II: iChat and Virex

Tonight we tried to talk to Lisa’s parents over iChat, and it didn’t work. I kept throttling the bandwidth of the client down, and it kept reporting “Insufficient bandwidth to maintain the connection.” I thought, huh? Then I checked online.

Thanks to the magic of Google, I found it: Virex 7.5.1, not compatible with Tiger. The good readers of Macintouch had already flagged it as an issue with iChat. I had forgotten about the reported incompatibility until an iChat reader pointed out that processor utilization was pegged by one of the vshield processes. Sure enough: killing the process freed up the CPU.

The Virex issue is troubling: it’s software that was provided by Apple, via the .Mac subscription service. Surely they would have thought to test it? Or for Apple to let Network Associates know that they ought to test it?

Time traveler’s convention, anyone?

Okay, pop quiz: what’s funnier than a bunch of MIT students deciding that this weekend will be the only ever Time Travelers’ Convention (because, technically, you would only need one time traveler convention)? First, the decision that it would be held in East Campus Courtyard, which certainly won’t hold all the necessary time travelers. Second, the story on CNet. And third, the comments on CNet, which as of this writing included:

  • I went last year. I was a bit disappointed as the only people there were a bunch of geek wanna-bes.”
  • Which parallel universe? They forgot to mention which parallel universe it was held in. My timeline led to 28,762 instances — of which 32 looked very likely — but I was low on Planck energy… ”

Finally, of course, the CNet story misses one very important thing: the proximity of the Time Travelers’ Convention to the Disco Dance Floor. I submit that this is nothing more than a cheap excuse for a party. Which of course I want to crash.

Tiger notes: install, Spotlight, one-time hits

My copy of Mac OS X 10.4 (Tiger) arrived yesterday, finally. So far? Well, the upgrade went smoothly enough. That’s about all I can report, really; I got home at 9 pm last night and had just enough time to run a backup, verify and repair some minor permissions issues on my hard drive, and kick off the installer before going to bed.

One thing that almost bit me in the butt: hard disk space. I have habitually been down to less than 2 GB free disk space for about the past month—blame digital music—and the installer told me it didn’t have enough room to install Tiger. I was able to proceed by deselecting a bunch of printer drivers. I would have deselected some language files instead, but it appears that, at least with Tiger, there is no way to opt not to upgrade a previously installed language pack. (Incidentally, it makes me nervous that by deselecting drivers from eight or so printer manufacturers, I was able to reclaim nearly 800 MB of hard disk from the install. What do they put in those things—encyclopedias?)

At any rate, in the morning I checked the install after walking the dogs and found that it had happily rebooted and was waiting for me to log in. I did so, watched it slowly proceed, decided not to wait for it, and got in the shower. When I got out the login had finished and I could play with Dashboard and Spotlight.

Spotlight is cool: it fished up a bunch of stuff I didn’t know I had, including iChat logs, when I typed in my wife’s name. However, the short results list (which appears in a dropdown menu as you type, along with the option to show all results) is going to suffer from the same search challenges as Internet search engines: given a potential universe of content, how do you decide which content to surface as most relevant?

In this case, the problem was, I think, Spotlight’s result categories. By default, Spotlight returns categorized search results. Amazon and Microsoft.com both used to do this. The problem with categorized search results is that they interfere with the relevance ranking of the actual results list. For instance, if the four most relevant results for the query “doc searls” included a chat log, an Address Book card, a mail message from him, two more chat logs, a bookmark, and another mail message, how should the search results be categorized? If your first category is “Chat,” including the first, fourth, and fifth search hits, the Address Book card and mail message appear lower in the search list than they should, making the search results appear incorrect. In my case, I searched for “lisa” and the system returned a bunch of information, including an address card. But it wasn’t Lisa’s address—it was the address card of one of her friends, on whose card I had entered “Lisa Jarrett” in the Friend field.

I have a suspicion that some of my issues with Spotlight were related to the fact that it was still indexing my hard drive. This also caused Dashboard to be less responsive than it could have been. I can definitely see the joy to come with Dashboard, though; just having one-key access to a good dictionary and to Wikipedia is a killer benefit.

I had to go to work, so I left Mail importing my 44,000+ email messages (Mail in Tiger uses a new file format to store mail messages, so there’s a one-time hit for translation and indexing). More reports tonight.