Followup day part 2: Coalescing temporal data in SQL queries

I have to confess that there’s a little trick I didn’t mention in my first post about summarizing time range data using SQL. Specifically, my solution relies on the data set being sorted in a certain way, in this case by server_id and DateAndTime, and then inserting a sequential key on the table using an identity column. So my solution isn’t very general.

This came back to bite me in the butt when I wanted to take the next step and summarize the output from that script further by eliminating the TestID and summarizing by server_id and time range. I couldn’t get my original algorithm to work. At all. Frustrated, I did more research and found that this problem, which is formally known as coalescing temporal data, is really hard—so hard that there are people like database guru Rick Snodgrass who devote their whole careers to figuring out how best to summarize temporal data using SQL.

The difficulty is that SQL is a set based language, but to properly summarize temporal data, SQL needs to understand time spans bounded by a start and end date and be able to compare them to see if one partially or wholly contains another. Fortunately for me, Snodgrass wrote an article a few years back in Database Programming and Design, called “Temporal Coalescing,” that lays out several options for solving this problem, including a mostly procedural option, a cursor-based option, an option all in one SQL query that’s even hairier than the one I proposed, and an option that uses a view and a HAVING COUNT statement, which is what I’m using now. It’s not fast, but it is correct. Here’s Snodgrass’s sample code, translated into the terms of my original solution:

CREATE VIEW V1 (server_id, DateStart1, DateEnd2) 
AS SELECT F.server_id, F.DateStart1, L.DateEnd2
FROM ServerHistory AS F, ServerHistory AS L, ServerHistory AS E 
WHERE F.DateEnd2 <= L.DateEnd2
   AND F.server_id = L.server_id AND F.server_id = E.server_id
GROUP BY F.server_id, F.DateStart1, L.DateEnd2
HAVING COUNT(CASE
         WHEN (E.DateStart1 < F.DateStart1
            AND F.DateStart1 <= E.DateEnd2)
         OR (E.DateStart1 <= L.DateEnd2
            AND L.DateEnd2 < E.DateEnd2)
      THEN 1 END) = 0

CREATE TABLE Temp(server_id int,
   DateStart1 DATETIME, DateEnd2 DATETIME)
INSERT INTO Temp
SELECT server_id, DateStart1, MIN(DateEnd2) 
FROM V1
GROUP BY server_id, DateStart1

Mandatory disclaimer: This posting is provided “AS IS” with no warranties, and confers no rights.

Followup day part 1: Voting machines

Some additional thoughts and links about the Diebold voting machine security issue: Greg was kind enough to point out to me over IM, rather than chiding me publicly for it on his blog, that it’s unlikely that Georgia’s Democrat Secretary of State, who has responsibility for running elections, would be unlikely to help rig an election via voting machines in favor of the Republicans. He also passed along a link or two, including one I should have caught in the MIT Technology Review, about security issues with the voting machines.

At bottom, the whole mess still feels to me like a problem of under-secured, under-audited, badly-managed software development.

Black and White week

I was hoping to go for a whole week with album covers that consisted of black and white portraits of two men, but didn’t quite make it. Still, I continue to be slightly creeped out by the Kruder and Dorfmeister cover, which is like a bizarro version of the Bookends cover. And is the pose on the cover of Songs From the Big Chair coincidence? I don’t think so. Check out the Past Listening page for the whole picture.