Congratulations

Echoing Scoble, my congratulations to my friends on the Office team, who shipped Office 2003 today. Coordinating that many applications—plus OneNote and InfoPath, and Office LiveMeeting—together into a single release counts as a major accomplishment in anyone’s book.

I won’t comment on the release itself except to say that PowerPoint is now consistently stable for me, and that I love cached mode in Outlook…

Follow up: Windows iTunes and protected music

Despite my hopes yesterday, I didn’t get a chance to try sharing music between Mac and Windows iTunes last night, but I did copy a few songs I bought from the iTunes Music Store using my Mac to my Windows machine this morning. When I asked it to play the first track, it prompted for my music store account information, connected to authorize this machine, and then played that track and the others that were brought over in the same way.

So the downside is that moving tracks from one machine to another is not seamless, requiring you to dig up the files and physically copy them over; the upside is that once they are there, authorizing the other computer is painless.

The rumors are true: iTunes for Windows

As rumored, the iTunes Music Store for Windows arrived today. Two significant parts of that announcement: the store is available to Windows users, and iTunes itself is available to Windows users.

I downloaded the software today, which includes QuickTime 6.4, and installed it. After a restart (probably necessary because I already had QuickTime installed), I started iTunes. It asked me whether I wanted to find MP3 and AAC files in my My Music folder, and whether I wanted to see the store right away. I said yes and no, respectively, and let it start finding music. About a minute later, it had populated 1473 MP3s into my library.

Note: It didn’t ask about WMA files. Not a problem unless you were already using Windows Media Player and ripping to WMA format. But support for the format is absent, unsurprisingly.

The music store looks slightly different from the Mac version, but the functionality is the same, including the two new features added today, Gift Certificates and Allowance (smart features for family and friends). One major problem: I couldn’t get purchased music that I had bought on my Mac to download to my Windows account. I’ll have to see if I can figure out how to do that… (Update: Looks like I’ll have to copy the files manually; see these Support Articles about moving music between authorized computers.)

Update 2: Holy crap, music sharing works too! Someone else in my building has downloaded the software, and his shared list just showed up in my library… I’ll have to wait until tonight to see how it works on our home network.

Followup day part 2: Coalescing temporal data in SQL queries

I have to confess that there’s a little trick I didn’t mention in my first post about summarizing time range data using SQL. Specifically, my solution relies on the data set being sorted in a certain way, in this case by server_id and DateAndTime, and then inserting a sequential key on the table using an identity column. So my solution isn’t very general.
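To make that dependency concrete, here is a minimal Python sketch of the preparation step. The field names (server_id, date_and_time, event_heap_id) and the helper assign_sequential_ids are illustrative assumptions, not the actual table definition; the point is only that the key is handed out after sorting, so a smaller key always means an earlier reading on the same server.

```python
from datetime import datetime

def assign_sequential_ids(rows):
    """Sort by (server_id, date_and_time) and number the rows 1..n,
    mimicking an identity column that is filled in sorted order."""
    ordered = sorted(rows, key=lambda r: (r["server_id"], r["date_and_time"]))
    # Copy each row and attach the hypothetical sequential key.
    return [dict(r, event_heap_id=i) for i, r in enumerate(ordered, start=1)]

# Rows deliberately arrive out of time order.
rows = [
    {"server_id": 1, "date_and_time": datetime(2003, 9, 15, 0, 1, 25), "status": "Pass"},
    {"server_id": 1, "date_and_time": datetime(2003, 9, 15, 0, 1, 24), "status": "Fail"},
]
keyed = assign_sequential_ids(rows)
for row in keyed:
    print(row["event_heap_id"], row["date_and_time"], row["status"])
```

If the keys were assigned in arrival order instead, comparing keys would no longer say anything about which reading came first.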

This came back to bite me in the butt when I wanted to take the next step and summarize the output from that script further by eliminating the TestID and summarizing by server_id and time range. I couldn’t get my original algorithm to work. At all. Frustrated, I did more research and found that this problem, which is formally known as coalescing temporal data, is really hard—so hard that there are people like database guru Rick Snodgrass who devote their whole careers to figuring out how best to summarize temporal data using SQL.

The difficulty is that SQL is a set-based language, but to properly summarize temporal data, SQL needs to understand time spans bounded by a start and end date and be able to compare them to see if one partially or wholly contains another. Fortunately for me, Snodgrass wrote an article a few years back in Database Programming and Design, called “Temporal Coalescing,” that lays out several options for solving this problem: a mostly procedural option, a cursor-based option, an option all in one SQL query that’s even hairier than the one I proposed, and an option that uses a view and a HAVING COUNT clause, which is what I’m using now. It’s not fast, but it is correct. Here’s Snodgrass’s sample code, translated into the terms of my original solution:

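-- V1: every (start, end) pair for a server where neither boundary can be
-- pushed outward by some other row E for the same server.
CREATE VIEW V1 (server_id, DateStart1, DateEnd2)
AS SELECT F.server_id, F.DateStart1, L.DateEnd2
FROM ServerHistory AS F, ServerHistory AS L, ServerHistory AS E
WHERE F.DateEnd2 <= L.DateEnd2
   AND F.server_id = L.server_id AND F.server_id = E.server_id
GROUP BY F.server_id, F.DateStart1, L.DateEnd2
-- Count the rows E that begin before F's start and reach it, or that
-- cover L's end and run past it; a valid pair has none.
HAVING COUNT(CASE
         WHEN (E.DateStart1 < F.DateStart1
            AND F.DateStart1 <= E.DateEnd2)
         OR (E.DateStart1 <= L.DateEnd2
            AND L.DateEnd2 < E.DateEnd2)
      THEN 1 END) = 0

-- For each start, keep the earliest qualifying end; that is the coalesced period.
CREATE TABLE Temp(server_id int,
   DateStart1 DATETIME, DateEnd2 DATETIME)
INSERT INTO Temp
SELECT server_id, DateStart1, MIN(DateEnd2)
FROM V1
GROUP BY server_id, DateStart1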
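For comparison, here is a rough procedural sketch in Python of what coalescing produces. It is not taken from Snodgrass's article; the function name coalesce and the tuple layout (server_id, start, end) are assumptions made for illustration. Like the view above, it treats periods that overlap or touch as one period.

```python
from datetime import datetime

def coalesce(periods):
    """Merge overlapping or touching (server_id, start, end) periods per server."""
    merged = []
    # Sorting groups rows by server and orders them by start time.
    for server_id, start, end in sorted(periods):
        last = merged[-1] if merged else None
        if last and last[0] == server_id and start <= last[2]:
            # Period begins inside (or at the edge of) the previous one: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([server_id, start, end])
    return [tuple(p) for p in merged]

def t(minute):
    # Hypothetical helper: a time on 9/15/2003 at 00:MM.
    return datetime(2003, 9, 15, 0, minute)

periods = [
    (1, t(5), t(20)),
    (1, t(0), t(10)),
    (1, t(30), t(40)),
    (2, t(0), t(5)),
]
result = coalesce(periods)
for row in result:
    print(row)
```

The first two periods for server 1 overlap and collapse into a single 00:00 to 00:20 period; the third stays separate because of the gap, and server 2 is never merged with server 1.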

Mandatory disclaimer: This posting is provided “AS IS” with no warranties, and confers no rights.

Fun with SQL and temporal data

I get to do random weird stuff in my job. Sometimes I’m working from really high level customer satisfaction data and making recommendations about how to improve the customer experience on Microsoft.com. Other times I have to roll up my sleeves and get into the rawest of raw data before I can get to the recommendations. I just got done wading through some pretty low level log data with SQL and thought I’d share how I was able to munge it to get usable information. If you’re not a SQL geek, feel free to skip.

I have a raw data set that consists of information about clusters and tests. There are one or more tests that run periodically against each of a set of clusters. Each time the test runs, a set of information is recorded: the server and test number, the date and time, and the status returned by the test (essentially a pass-fail), among other parameters.

Here’s a sample:

Server ID  Test ID  Date and Time            Status
1          1        9/15/2003 00:01:23.456   Pass
1          1        9/15/2003 00:01:24.540   Fail
1          1        9/15/2003 00:01:25.006   Pass
1          1        9/15/2003 00:01:28.456   Pass

This is pretty useful, except what I really want to know is, if a test fails, how long does it take before the test starts passing again? And how many times does it happen a day? a week?

So I started trying to aggregate the data into something that would look like this:

Server ID  Test ID  Event Start              Event End                Duration (s)
1          1        9/15/2003 00:01:24.540   9/15/2003 00:01:25.006   0.466

It turns out to be trickier than I thought. What I ended up having to do was to join the table to itself to get the beginning and end date (and therefore duration), then use a NOT EXISTS clause to screen out lots and lots of cases where one failure might have multiple rows afterward with normal statuses—because if you don’t, the table above would show two events, both starting at the same time but one ending at 00:01:25.006 and the other at 00:01:28.456.

Here’s the query I used to make it all work:

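-- Pair each failing row (eh1) with the first passing row (eh2) that follows it
-- for the same server and test. Relies on EventHeapID increasing with DateAndTime.
select eh1.server_id,
       eh1.TestID,
       eh1.DateAndTime as DateStart1,
       eh2.DateAndTime as DateEnd2,
       -- DateDiff(ss, ...) counts whole-second boundaries, so sub-second precision is lost
       DateDiff(ss, eh1.DateAndTime, eh2.DateAndTime) as Duration
from EventHeap eh1
  INNER JOIN EventHeap eh2
    ON (eh1.server_id = eh2.server_id AND eh1.TestID = eh2.TestID)
  -- eh3 is never referenced below; the GROUP BY collapses the extra rows it produces
  INNER JOIN EventHeap eh3
    ON (eh1.server_id = eh3.server_id AND eh1.TestID = eh3.TestID)
WHERE eh1.EventHeapID < eh2.EventHeapID
  AND eh1.Status = 'Fail' AND eh2.Status = 'Pass'
  -- Reject eh2 if another passing row falls between eh1 and eh2
  AND NOT EXISTS
    (select * from EventHeap eh4
     WHERE eh4.server_id = eh1.server_id
       AND eh4.TestID = eh1.TestID
       AND eh4.EventHeapID < eh2.EventHeapID
       AND eh4.EventHeapID >= eh1.EventHeapID
       AND eh4.Status = 'Pass')
GROUP BY eh1.server_id, eh1.TestID, eh1.DateAndTime, eh2.DateAndTime
ORDER BY eh1.server_id, eh1.TestID, DateStart1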

The biggest problem I have now is performance; running the aggregation on 57,000 rows takes a while. But the end result is much more usable data.
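For anyone who would rather do this outside the database, the same Fail-to-Pass pairing can be sketched in Python as a single pass over sorted rows. This is a sketch under stated assumptions, not the approach used above: the function failure_events, the helper ts, and the tuple layout are made up for illustration. One deliberate difference: a run of consecutive Fail rows is reported as one event starting at the first Fail, whereas the query appears to return one row per Fail in the run.

```python
from datetime import datetime

def failure_events(rows):
    """Return (server_id, test_id, start, end, seconds) for each outage:
    the first Fail in a run, paired with the next Pass for that server and test."""
    events = []
    failing_since = {}  # (server_id, test_id) -> time of the first unresolved Fail
    for server_id, test_id, when, status in sorted(rows, key=lambda r: (r[0], r[1], r[2])):
        key = (server_id, test_id)
        if status == "Fail":
            # Keep the earliest Fail of the run; later Fails do not reset it.
            failing_since.setdefault(key, when)
        elif status == "Pass" and key in failing_since:
            start = failing_since.pop(key)
            events.append((server_id, test_id, start, when, (when - start).total_seconds()))
    return events

def ts(text):
    # Hypothetical helper: parse timestamps in the format of the sample above.
    return datetime.strptime(text, "%m/%d/%Y %H:%M:%S.%f")

sample = [
    (1, 1, ts("9/15/2003 00:01:23.456"), "Pass"),
    (1, 1, ts("9/15/2003 00:01:24.540"), "Fail"),
    (1, 1, ts("9/15/2003 00:01:25.006"), "Pass"),
    (1, 1, ts("9/15/2003 00:01:28.456"), "Pass"),
]
events = failure_events(sample)
for event in events:
    print(event)
```

On the four sample rows this yields one event, from 00:01:24.540 to 00:01:25.006. Because each row is visited once after the sort, the cost grows roughly with the sort rather than with the size of a self-join.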

Mandatory disclaimer: This posting is provided “AS IS” with no warranties, and confers no rights. Use of included script samples is subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.

Bastard Son of Blaster?

Scoble pointed to this AP article that says the next Blaster (really a tool that exploits a similar vulnerability) is being distributed from China. I wonder if it’s the same as this warning from Symantec. Either way, looks like a busy day at Microsoft.com.

As always, protect yourself. The Protect site on Microsoft.com makes it easier than ever—there’s a step by step wizard for each OS, and on Windows XP you can use a Windows Update-like client to set up your firewall for you.

Suggestion: Beat the rush, update now

Based on what happened during Blaster, it might be a good idea to hit Windows Update tonight to get the latest patch, MS03-039, which addresses another RPC vulnerability. To the best of my knowledge there’s no worm that exploits this vulnerability, but based on sad experience it’s only a matter of time. As a bonus, this patch supersedes the one that fixed the vulnerability that Blaster exploited, so if you never got around to patching your system last time this one will cover you.

If WU is overloaded, you can also get the download for your operating system from the TechNet advisory. Background information for the less technically inclined here.

More Microsoft.com Web Service goodness

When one talented programmer succeeds in implementing something, you can be sure another is a step behind making improvements. Sam Ruby provides a stripped down version of Phillip’s Python wrapper for the Microsoft.com Web Service that doesn’t depend on a toolkit or XML parser and supports one operation: Top 10 Popular US Downloads.

Looking at the implementations these guys have done, it’s clear that the bulk of the work on non-.NET platforms is implementing WS-Security. It strikes me as a worthwhile challenge to try it out. I wonder how easy it is to get Apple’s SOAP libraries to support the headers required.

Other notes: Both a human-readable description of the web service and a WSDL description are available.

More about Microsoft.com Web Services 1.0

Dave asks for more details about the Microsoft.com Web Services release I discussed earlier. I can’t really give too many more details other than what’s in the MSDN announcement linked above, but the key is in the name, Microsoft.com Web Services (which I’ve corrected in my post below).

This is a web service layer on Microsoft.com, which is intended, as the release says, to “enable you to integrate information and services from MSDN, Technet, other Microsoft.com sites, and Microsoft Support.” Version 1.0, which is a proof of concept and shakedown for the infrastructure, provides an API via SOAP that allows accessing a designated set of Microsoft.com content, the Top Downloads on the site. Future releases, the release indicates, will allow you to access other Microsoft.com content, including presumably info from Support, MSDN, and Technet.

“Sounds like RSS—so why isn’t it RSS?” you cry. Good question. I’ll see if I can find out. But the key point is that this is kind of analogous to the Amazon SOAP API or the Google API: a way to programmatically access certain content on Microsoft.com. Potentially this could be of interest to Microsoft’s partners and content providers in the Microsoft community, who might want to selectively expose some of Microsoft.com’s content without having to send their users blindly to us.

Obligatory disclaimer courtesy our legal folks: This posting is provided “AS IS” with no warranties, and confers no rights.

Microsoft.com Web Services v. 1.0

Mark Pilgrim points to the 1.0 release of Microsoft.com Web Services. This is kind of a big deal at Microsoft.com (where I work), because it is a publicly-visible hook into a new publishing model for us. Of course, it inevitably raises some snags too; Mark does a good job of highlighting the problems with our registration process and the fact that the documentation is only available if you already have one of the recent versions of Visual Studio.

Maybe I’ll play with trying an AppleScript wrapper for the service, which will almost certainly grow in usefulness beyond listing Top Downloads. Of course, I’ll have to be very careful about doing so in accordance with the license terms, which among other things prohibit redistributing the documentation off my premises or distributing modified sample code that does not run on the Windows platform.

Update: It’s Microsoft.com Web Services, not just Microsoft Web Services as I incorrectly indicated earlier; my apologies for the confusion.

More SoBig fallout: blacklists

In my mail this morning, along with the few SoBig messages that made it past my ISP’s mail virus filter and my junk mail filters (see this entry at MacOSXHints for a rule to filter the rest as junk manually), was a notice from Yahoo! Groups that my account had been paused because I had exceeded the maximum number of bounces to my email account. I clicked the provided link to reactivate my account, then looked at the bounce history. Interestingly, only one bounce happened during SoBig; the rest were ancient history. But the email that bounced yesterday was hard bounced by my ISP because the IP address that sent it had been blacklisted. Not by my ISP, by SpamCop.

Now think about the implications of that. Because of an email worm with its own mail engine, not just ISPs and spammers but innocent users could end up on blacklists run by third parties—with no warning. Maybe Dave and others are right about this being the end of email.

SoBiggest

Lawrence points to a News.com story that sez Sobig is aptly named: the fastest spreading virus ever. Guesses as to what made it spread so quickly: a combination of good social engineering (randomly selected forged return addresses) and good spam-filter-busting capabilities (the rotating subject lines, the changing return addresses, the changing attachment name). No surprise: the BBC says that Sobig seems to have been written by a spammer who needed a way to get his messages past spam filters.

Frustrating point about this worm: it really has nothing to do with Outlook. It doesn’t exploit any Outlook vulnerabilities—except maybe the fact that it’s easy to click and execute an attachment in Outlook, and to read Outlook address books. The worm carries its own mail sending engine around with it. And because the worm is so self reliant, it isn’t easy to avoid it—there’s no “magic bullet” patch that will keep it from spreading. Except behavioral changes on the part of users, and maybe switching OSes.

More SoBig updates

Something I didn’t mention in my initial posts about SoBig: the worm can send mail by itself, since it contains its own SMTP server, and will forge return addresses based on entries in your Outlook address book or your Internet cache. So if you see email from me, no, I’m not infected with the virus, but someone else who knows me or has read my web page is.

Technical details of SoBig at the Berkman Geekroom. Reaction from Kevin Werbach: “either email is broken, Microsoft’s email software is broken, or those two statements are the same.” Rob McNair-Huff at MacNetJournal has been hit hard, as has Mark Frauenfelder at Boing-Boing.