My thought: this is where selection bias potentially becomes a real problem for Comscore.
Comscore gets its metrics from a panel of Internet users who are recruited with a package of incentives; in the past, these were browsing acceleration and download monitoring. Each user’s traffic is passed through a Comscore proxy server, monitored, and warehoused. Traffic is anonymized but indexed against demographic variables.
So the collection method itself is pretty unremarkable: if Comscore sees a transaction, it really happened. The larger question is: do people who volunteer to have their traffic sniffed represent the whole Internet? It seems pretty clear to me that they don’t, and in fact Comscore regularly adjusts the metrics it reports from the panel to account for overall demographics (percentage of women, percentage of users from different geographic areas). But you can’t a priori adjust the numbers based on a worldwide count of Radiohead fans, or of Radiohead fans who are comfortable downloading music. And that’s where I think there is a potential problem with the numbers. Comscore’s Andrew Lipsman says that their sample, 1,000 people, of whom several hundred downloaded the data, is representative. I say that’s true only if there was no selection bias in the sample to begin with, and Comscore hasn’t proven that.
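To make the worry concrete, here is a rough sketch of the kind of demographic adjustment described above. Every number in it is made up for illustration, and the group names, rates, and function names are my own assumptions; none of this is Comscore's actual data or method. The sketch assumes a panel that over-represents one group and whose volunteers are also more likely to download music than the people they supposedly stand in for.

```python
# All figures below are hypothetical, chosen only to illustrate the argument.
# They are NOT Comscore's numbers or Comscore's actual weighting procedure.

# Known share of each demographic group in the overall population.
population_share = {"women": 0.5, "men": 0.5}

# Assumed true download rate in the population (unknowable in practice).
true_rate = {"women": 0.10, "men": 0.20}

# Panel of 1,000 volunteers: (members, downloaders) per group.
# The panel skews male, and volunteers download at twice the true rate.
panel = {"women": (300, 60), "men": (700, 280)}


def raw_estimate(panel):
    """Unadjusted share of panel members who downloaded."""
    members = sum(n for n, _ in panel.values())
    downloaders = sum(d for _, d in panel.values())
    return downloaders / members


def adjusted_estimate(panel, population_share):
    """Reweight each group's panel rate by its known population share."""
    return sum(population_share[g] * d / n for g, (n, d) in panel.items())


# What a perfectly representative sample would report.
true_value = sum(population_share[g] * true_rate[g] for g in population_share)

print(round(raw_estimate(panel), 2))
print(round(adjusted_estimate(panel, population_share), 2))
print(round(true_value, 2))
```

With these invented inputs, the raw panel figure is 0.34, the demographically adjusted figure is 0.30, and the true rate is 0.15. The adjustment fixes the gender skew, but it cannot touch the part of the error that comes from who chose to join the panel, because that trait is not one of the variables being weighted on.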