Dave: Everyone’s a newbie in Boston now

Dave points out that the northbound lane of the I-93 tunnel and the Leonard Zakim Bunker Hill Bridge have opened in downtown Boston. Wow: the prospect of half the noise near the North End being moved underground is just incredible. (I’d normally get this sort of update from George, but he’s still in California. Oops. I almost said “out in California,” as though it were so far away from me now.)

MSDN comes to the party

Tim Ewald: “RSS at MSDN!” New RSS feeds for MSDN, including a comprehensive all-new-articles feed and separate feeds for Visual Basic, C#, C++, the overall Visual Studio product, the .NET Framework, and XML Web Services. There’s a lot of content in MSDN (even if most of it is by definition Microsoft-centric), and having RSS feeds for it makes it far easier to read, navigate, and blog about, natch. Dave thinks so too.

Finally, a decent technical critique of TIA

DM Review: “TIAin’t.” Herb Edelstein points out four major problems with the TIA strategy from a technical point of view:

  • Data integration and data quality: How much time and money will the TIA folks spend just on trying to match disparate records from fifty state drivers’ license bureaus, hundreds of utility bill providers and credit application sources, and all the different banks, credit card providers, and so forth?
  • Too much data, too few examples: With only a handful of domestic terrorists in a US population (age 16 and up) of about 220 million, Edelstein points out, the signal-to-noise ratio is far too low: “Let’s assume there are 1,000 active terrorists in the U.S. (a number that likely overstates the case by an order of magnitude) out of a population (age 16 and up) of approximately 220 million. An algorithm could be 99.999995 percent accurate by saying no one is a terrorist. Even were we to look only at non-citizens (an arguable tactic), we would still have an accuracy rate of 99.99995 percent by declaring no one a terrorist.”
  • Lack of sufficient examples to create good signatures (identifying patterns). This is a technical refinement of the previous point: the sample of known terrorists is so small that it’s hard to build patterns from it that can reliably predict future terrorist activity. Further, Edelstein points out, terrorists exhibit adaptive behavior, learning from what gets other terrorists caught.
  • False positives. Edelstein summarizes this point as a kind of Hobson’s choice: you don’t want to falsely accuse anyone, but you don’t want to miss any terrorists. And if your algorithms have a failure rate of 0.1% (an overwhelming success in most data mining applications), that’s still about 220,000 potential false positives!
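
To make the false-positive arithmetic concrete, here is a quick back-of-the-envelope sketch in Python. The population and terrorist counts are Edelstein's; everything else is my own assumption for illustration: I read the 0.1% "failure rate" as the share of innocent people who get wrongly flagged, and I generously assume the system catches every real terrorist. The function names are made up for this sketch.

```python
# Figures from Edelstein's article, as quoted above.
POPULATION = 220_000_000     # US population, age 16 and up
TERRORISTS = 1_000           # his deliberately high estimate

# Assumptions of this sketch, not claims from the article.
FALSE_POSITIVE_RATE = 0.001  # the 0.1% failure rate, applied to innocent people
RECALL = 1.0                 # hypothetical best case: no terrorist is missed


def flagged_counts(population, positives, false_positive_rate, recall):
    """Return (true positives, false positives) among everyone flagged."""
    negatives = population - positives
    true_pos = round(positives * recall)
    false_pos = round(negatives * false_positive_rate)
    return true_pos, false_pos


def precision(true_pos, false_pos):
    """Share of flagged people who really are what the system says they are."""
    return true_pos / (true_pos + false_pos)


true_pos, false_pos = flagged_counts(
    POPULATION, TERRORISTS, FALSE_POSITIVE_RATE, RECALL
)
hit_rate = precision(true_pos, false_pos)

print(f"Correctly flagged: {true_pos:,}")
print(f"Wrongly flagged:   {false_pos:,}")
print(f"Chance a flagged person is a terrorist: {hit_rate:.2%}")
```

Under those assumptions, fewer than one in two hundred of the people flagged would be an actual terrorist, and that is with a detector that never misses. Drop the terrorist count to Edelstein's more realistic order of magnitude and the odds get worse still.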

Edelstein concludes that the right answer is to improve the technology and use it to answer fixed questions rather than look for patterns in all possible available data—to use the system for decision support rather than rely on it to make the decisions.

My question: given the large amount of money to be spent, and the likely severe consequences of arresting and incarcerating innocent people, how big a disaster would a system like this have to predict and prevent before it justifies its cost?

These are the things about my neighborhood

  1. No matter how wet and nasty the previous night was, I’ve been waking up each morning to sunlight and a world washed clean. There’s a bit of a wet green glow everywhere I drive. (Never mind that much of it might be dandelions.)
  2. I discovered the world’s scariest parking lot in downtown Kirkland last night: not in terms of violence but just in terms of gravity. The lot is on a steep (about 40°) hill, and rather than have the cars park with noses facing toward the bottom of the hill, they have the spaces along the contour of the hill, so that the parked cars have their driver’s side about two feet lower than the passenger side. I swear, I was afraid the car was going to tip over on me as I got out. Pictures soon.