September 22, 2003 – Jarrett House North

I get to do random weird stuff in my job. Sometimes I’m working from really high-level customer satisfaction data and making recommendations about how to improve the customer experience on Microsoft.com. Other times I have to roll up my sleeves and get into the rawest of raw data before I can get to the recommendations. I just got done wading through some pretty low-level log data with SQL and thought I’d share how I was able to munge it to get usable information. If you’re not a SQL geek, feel free to skip.

I have a raw data set that consists of information about clusters and tests. There are one or more tests that run periodically against each of a set of clusters. Each time the test runs, a set of information is recorded: the server and test number, the date and time, and the status returned by the test (essentially a pass-fail), among other parameters.

Here’s a sample:

Server ID	Test ID	Date and Time	Status
1	1	9/15/2003 00:01:23.456	Pass
1	1	9/15/2003 00:01:24.540	Fail
1	1	9/15/2003 00:01:25.006	Pass
1	1	9/15/2003 00:01:28.456	Pass
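
For anyone who wants to follow along, here is a minimal sketch of a table that matches the sample. The table and column names (EventHeap, EventHeapID, server_id, TestID, DateAndTime, Status) come from the query later in this post; the column types, the IDENTITY key, and the use of datetime2(3) (SQL Server 2008 or later) are assumptions, since the post does not show the real schema.

```sql
-- Assumed schema: only the names are taken from the post's query.
CREATE TABLE EventHeap (
    EventHeapID int IDENTITY(1,1) PRIMARY KEY,  -- assumed to increase with time
    server_id   int          NOT NULL,
    TestID      int          NOT NULL,
    DateAndTime datetime2(3) NOT NULL,          -- millisecond precision (assumption)
    Status      varchar(10)  NOT NULL           -- 'Pass' or 'Fail'
);

-- The four sample rows, inserted in timestamp order.
INSERT INTO EventHeap (server_id, TestID, DateAndTime, Status)
VALUES (1, 1, '2003-09-15T00:01:23.456', 'Pass'),
       (1, 1, '2003-09-15T00:01:24.540', 'Fail'),
       (1, 1, '2003-09-15T00:01:25.006', 'Pass'),
       (1, 1, '2003-09-15T00:01:28.456', 'Pass');
```

The query relies on EventHeapID to decide which row came first, so the sketch assumes rows are logged in time order.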

This is pretty useful, except what I really want to know is, if a test fails, how long does it take before the test starts passing again? And how many times does it happen a day? a week?

So I started trying to aggregate the data into something that would look like this:

Server ID	Test ID	Event Start	Event End	Duration (seconds)
1	1	9/15/2003 00:01:24.540	9/15/2003 00:01:25.006	0.466
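
The gap between those two timestamps is 466 milliseconds. As a small sketch of the arithmetic, assuming SQL Server 2008 or later for datetime2 and inline DECLARE initialization, note that DATEDIFF with the ss datepart counts whole-second boundaries crossed, so a fractional duration needs the ms datepart instead. The variable names are made up for illustration.

```sql
-- Timestamps copied from the sample failure and the pass that follows it.
DECLARE @EventStart datetime2(3) = '2003-09-15T00:01:24.540';
DECLARE @EventEnd   datetime2(3) = '2003-09-15T00:01:25.006';

SELECT DATEDIFF(ss, @EventStart, @EventEnd)          AS SecondBoundariesCrossed, -- 1
       DATEDIFF(ms, @EventStart, @EventEnd) / 1000.0 AS DurationSeconds;         -- 466 ms, i.e. 0.466 s
```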

It turned out to be trickier than I thought. I ended up joining the table to itself to get the beginning and end dates (and therefore the duration), then using a NOT EXISTS clause to screen out the many cases where one failure is followed by multiple rows with normal statuses. If you don’t, the table above would show two events, both starting at the same time but one ending at 00:01:25.006 and the other at 00:01:28.456.
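
To make the duplicate problem concrete, here is a sketch of the self-join on its own, without the NOT EXISTS filter. It assumes the EventHeap table and column names used in the full query, and that the four sample rows were loaded in timestamp order so that EventHeapID reflects that order. The aliases f and p are just illustrative.

```sql
-- Naive pairing: every Fail row joined to every later Pass row
-- for the same server and test.
SELECT f.server_id,
       f.TestID,
       f.DateAndTime AS EventStart,
       p.DateAndTime AS EventEnd
FROM   EventHeap f
INNER JOIN EventHeap p
        ON  p.server_id   = f.server_id
        AND p.TestID      = f.TestID
        AND p.EventHeapID > f.EventHeapID
WHERE  f.Status = 'Fail'
  AND  p.Status = 'Pass';
```

Against the sample data this pairs the single failure with both later passes, which is exactly the pair of spurious events described above.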

Here’s the query I used to make it all work:

-- Pair each Fail row (eh1) with the first Pass row (eh2) that follows it
-- for the same server and test.
SELECT eh1.server_id,
       eh1.TestID,
       eh1.DateAndTime AS DateStart1,
       eh2.DateAndTime AS DateEnd2,
       DateDiff(ss, eh1.DateAndTime, eh2.DateAndTime) AS Duration  -- ss = whole-second boundaries crossed
FROM EventHeap eh1
INNER JOIN EventHeap eh2
    ON (eh1.server_id = eh2.server_id AND eh1.TestID = eh2.TestID)
-- eh3 is never referenced again; it only multiplies rows, which the GROUP BY then collapses
INNER JOIN EventHeap eh3
    ON (eh1.server_id = eh3.server_id AND eh1.TestID = eh3.TestID)
WHERE eh1.EventHeapID < eh2.EventHeapID
  AND eh1.Status = 'Fail'
  AND eh2.Status = 'Pass'
  -- keep only the first Pass after the failure: no other Pass may fall between eh1 and eh2
  AND NOT EXISTS (
        SELECT *
        FROM EventHeap eh4
        WHERE eh4.server_id = eh1.server_id
          AND eh4.TestID = eh1.TestID
          AND eh4.EventHeapID < eh2.EventHeapID
          AND eh4.EventHeapID >= eh1.EventHeapID
          AND eh4.Status = 'Pass'
      )
GROUP BY eh1.server_id, eh1.TestID, eh1.DateAndTime, eh2.DateAndTime
ORDER BY eh1.server_id, eh1.TestID, DateStart1

The biggest problem I have now is performance; running the aggregation on 57,000 rows takes a while. But the end result is much more usable data.
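
One possible way to cut the work, sketched here and not measured against the real data: drop the third join and pick the first later Pass with MIN instead of NOT EXISTS. It should return the same pairs as long as EventHeapID increases with DateAndTime within each server and test, which is an assumption. The temp table name #FailureEvents and the output column names are hypothetical, and CAST(... AS date) needs SQL Server 2008 or later. The second statement answers the earlier question of how many failures happen per day.

```sql
-- Sketch only: one row per Fail row, ending at the earliest later Pass.
SELECT f.server_id,
       f.TestID,
       f.DateAndTime      AS EventStart,
       MIN(p.DateAndTime) AS EventEnd
INTO   #FailureEvents                    -- hypothetical temp table
FROM   EventHeap f
INNER JOIN EventHeap p
        ON  p.server_id   = f.server_id
        AND p.TestID      = f.TestID
        AND p.EventHeapID > f.EventHeapID
        AND p.Status      = 'Pass'
WHERE  f.Status = 'Fail'
GROUP BY f.server_id, f.TestID, f.EventHeapID, f.DateAndTime;

-- Failure events per server, test, and calendar day.
SELECT server_id,
       TestID,
       CAST(EventStart AS date) AS EventDay,
       COUNT(*)                 AS FailureCount
FROM   #FailureEvents
GROUP BY server_id, TestID, CAST(EventStart AS date)
ORDER BY server_id, TestID, EventDay;
```

As with the original query, a run of consecutive Fail rows produces one row per Fail, all ending at the same Pass, and a failure with no later Pass is left out. An index on (server_id, TestID, EventHeapID) might help either version, though that is a guess.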

Mandatory disclaimer: This posting is provided “AS IS” with no warranties, and confers no rights. Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.

Fun with SQL and temporal data