Just got off a long IM conversation with Greg, in which he pointed me to, in no particular order:
- stevenf’s proposal for a standard way, called mtaste, to represent your musical tastes on a blog (more thoughts on this in a second)
- Jeremy Zawodny’s complaint that Apple’s music store doesn’t already know, from his iTunes library file, what he might like to listen to (more on this too)
- The assertion in Business Week that the Verizon piracy records ruling + the Grokster/Morpheus ruling = a legal ruling that piracy is behavioral, not technological. No more thoughts about this one; I just thought it was interesting.
- Lessig’s argument that all the international and state DMCA maneuvers are a way to lock in the DMCA here, which makes a scary amount of sense. It also makes me wonder whose analysis of the law is right. Is BW right, and all these laws are, courtesy of the Grokster decision, obsolete? Or is Lessig right, and the clawed hand of the RIAA is closing in on my iTunes? Or are both right, and the RIAA is betting that the forces of consumerism will get tired of getting all the laws struck down (or go broke doing so) by the time the RIAA has set them all up?
So, on that note… the mtaste thing. Comparing those files would be even more difficult than stevenf imagines. Basically, he’s describing something like what Amazon does, but decentralized.
The problem is, unless your mtaste file exactly matches the other guy’s, you have to do the music-matching thing that Amazon does, which is a big clustering problem. To get a good match you need a big sample of users. Given the number of artists out there (428 in my limited library, and probably a lot more in other people’s), that probably means thousands of users, maybe more, because each record you compare against someone else’s is at least 428 artists long. By way of comparison, a 1998 Microsoft Research study by Breese, Heckerman, and Kadie used training sets of around 1600 users to predict television show preferences, 5000 for a movie database, and over 32000 for movement around Microsoft.com.
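To make the "exact match" point concrete: suppose each mtaste file is boiled down to just the set of artists it lists (an assumption; the proposal doesn't pin down a format). An exact match is then a Jaccard similarity of 1.0, and almost every real comparison lands somewhere below that, so you end up ranking everyone by partial overlap. This is a deliberately simple stand-in measure, not what Amazon actually uses, and the user names and artist lists below are made up:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of artists two people have in common: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0  # two empty profiles are trivially identical
    return len(a & b) / len(a | b)


def nearest_neighbors(me: set[str], others: dict[str, set[str]], k: int = 3) -> list[tuple[str, float]]:
    """Rank other users by artist overlap with `me` (brute-force scan over every user)."""
    scored = [(name, jaccard(me, artists)) for name, artists in others.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical profiles, for illustration only.
mine = {"Wilco", "Radiohead", "Bjork", "The Shins"}
others = {
    "user_a": {"Wilco", "Radiohead", "Spoon"},
    "user_b": {"Bjork", "Aphex Twin"},
    "user_c": {"Wilco", "Radiohead", "Bjork", "The Shins"},
}
print(nearest_neighbors(mine, others, k=2))  # [('user_c', 1.0), ('user_a', 0.4)]
```

Only `user_c` is an exact match; everyone else gets a partial score. And every lookup scans every user's whole artist list, which is exactly why the sample sizes above turn this into a big problem.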
So, it’s a hard problem. You’d need, oh, a lot of data from users, a database to store it, and a collaborative filtering algorithm to crunch it. And you’d need a standard way to convert the mtaste file (which, as proposed, is an arbitrary flat file) into a standard data structure.
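Here is a rough sketch of two of those pieces. It assumes, purely hypothetically, that an mtaste file is one `Artist<TAB>rating` line per artist; the real proposal may look nothing like this. The prediction step is the Pearson-correlation-weighted, memory-based approach that Breese, Heckerman, and Kadie evaluate, cut down to a few lines of Python with no database behind it:

```python
import math
from typing import Optional


def parse_taste_file(text: str) -> dict[str, float]:
    """Turn a HYPOTHETICAL flat file of 'Artist<TAB>rating' lines into {artist: rating}.

    Blank lines and '#' comments are skipped; malformed lines are silently ignored.
    """
    ratings: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        artist, sep, rating = line.rpartition("\t")
        if not sep or not artist.strip():
            continue  # no tab, or no artist name
        try:
            ratings[artist.strip()] = float(rating)
        except ValueError:
            continue  # rating isn't a number
    return ratings


def pearson(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson correlation over the artists both people rated (0.0 if undefined)."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[k] for k in common) / len(common)
    mean_b = sum(b[k] for k in common) / len(common)
    num = sum((a[k] - mean_a) * (b[k] - mean_b) for k in common)
    den = math.sqrt(sum((a[k] - mean_a) ** 2 for k in common)
                    * sum((b[k] - mean_b) ** 2 for k in common))
    return num / den if den else 0.0


def predict(me: dict[str, float], others: list[dict[str, float]], artist: str) -> Optional[float]:
    """My mean rating plus the correlation-weighted average of others' deviations for `artist`."""
    if not me:
        return None
    my_mean = sum(me.values()) / len(me)
    num = norm = 0.0
    for other in others:
        if artist not in other:
            continue
        w = pearson(me, other)
        if w == 0.0:
            continue
        other_mean = sum(other.values()) / len(other)
        num += w * (other[artist] - other_mean)
        norm += abs(w)
    return my_mean + num / norm if norm else None


me = parse_taste_file("# my taste (hypothetical format)\nWilco\t5\nRadiohead\t4\nBjork\t2\n")
others = [
    {"Wilco": 4, "Radiohead": 5, "Bjork": 1, "Spoon": 5},  # likes what I like
    {"Wilco": 1, "Radiohead": 2, "Bjork": 5, "Spoon": 1},  # likes the opposite
]
print(predict(me, others, "Spoon"))  # about 4.92 on the same 1-5 scale
```

Note that the user who disagrees with me still helps: a negative correlation flips their low Spoon rating into evidence that I'd like it. Every prediction walks the whole user list, though, which is where the database and the thousands of users come in.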
But if you could pull it off… it would disintermediate Amazon. Really. It would do away with one of their most powerful competitive advantages.
Which is probably why Apple hasn’t done it. Anyone seen the licensing terms under which they got the rights to use Amazon’s One Click business method?