BrowseRank and the challenge of improving search

I posted a quick link to an article about Microsoft’s new BrowseRank search technology a few days ago. Here’s why the paper is informative, why I think BrowseRank is an interesting technology for improving search, and why I think it’s doomed as a general-purpose basis for building relevance data for the web.

Informative: This paper should be required reading for anyone who wants to know the fundamentals of how web search ranking currently works, what PageRank actually does for Google, and how to objectively test the quality of a search engine. It also offers an interesting two-pronged critique of PageRank:

  • PageRank can be manipulated. PageRank assumes that a link from a page with authority to another page confers some higher rank on the second page. The paper points out the well-known issue that, since the “authority” of the first page is itself derived from inbound links, it’s possible to use Google bombing, link farms and other mechanisms to artificially inflate the importance of individual pages for fun and profit. (A toy sketch of this dynamic follows the list.) It’s pretty well known that Google periodically adjusts its implementation of PageRank to correct for this problem.
  • PageRank neglects user behavior. The paper argues this somewhat tendentiously, saying that PageRank doesn’t incorporate information about the amount of time the user spends on the page–of course, the paper’s whole hypothesis is that time on page matters, so this doesn’t reveal any deep insight into PageRank. But it’s an interesting point that PageRank does assume that only web authors contribute to the ranking algorithm. Or does it? I’ll come back to this in a bit.
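
To make the manipulation point concrete, here’s a toy power-iteration version of PageRank in Python. This is a sketch of the textbook algorithm, not Google’s production implementation, and the link-farm pages in it are entirely hypothetical; the point is just to show how a cluster of throwaway pages that all point at a target can inflate that target’s score without any human ever endorsing it.

```python
# Toy PageRank via power iteration (illustrative only, not Google's implementation).

def pagerank(graph, damping=0.85, iterations=50):
    """graph: dict mapping each page to the list of pages it links to."""
    pages = list(graph)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in graph.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for out in outlinks:
                    new_rank[out] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

honest_web = {"a": ["b"], "b": ["a", "c"], "c": ["a"], "target": ["a"]}
print(pagerank(honest_web)["target"])

# A hypothetical link farm: ten throwaway pages that exist only to link to "target".
spammy_web = dict(honest_web)
for i in range(10):
    spammy_web[f"farm{i}"] = ["target"]
print(pagerank(spammy_web)["target"])  # noticeably higher, no readers involved
```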

Interesting: The proposed BrowseRank algorithm uses user data–pages visited, browse activity, and time on page–to create a user browsing graph that relies on the user’s activity in the browser to confer value on pages. The authors suggest that the user data could be provided by web server administrators, in the form of logs, or directly by users via browser add-ins. A footnote helpfully suggests that “search engines such as Google, Yahoo, and Live Search provide client software called toolbars, which can serve the purpose.”
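
As a rough illustration (and only that: the paper’s actual model is a continuous-time Markov process, and the log records and site names below are invented), here’s the general shape of the calculation as I read it: build a transition graph from observed clicks, compute a stationary visit probability, and then weight it by mean dwell time so that pages users linger on outrank pages they merely click through.

```python
# Rough sketch of a BrowseRank-style calculation; all data here is made up.
from collections import defaultdict

# Hypothetical log records: (from_page, to_page, seconds spent on from_page).
observations = [
    ("news.example", "story.example", 45),
    ("story.example", "video.example", 120),
    ("portal.example", "story.example", 5),
    ("video.example", "news.example", 300),
]

transitions = defaultdict(lambda: defaultdict(int))
dwell_times = defaultdict(list)
for src, dst, seconds in observations:
    transitions[src][dst] += 1
    dwell_times[src].append(seconds)

pages = set(transitions) | {dst for outs in transitions.values() for dst in outs}

# Power iteration over the click graph gives a stationary visit probability
# (same machinery as PageRank, but the edges come from observed clicks, not links).
visit = {p: 1.0 / len(pages) for p in pages}
for _ in range(100):
    nxt = {p: 0.15 / len(pages) for p in pages}
    for src, outs in transitions.items():
        total = sum(outs.values())
        for dst, count in outs.items():
            nxt[dst] += 0.85 * visit[src] * count / total
    visit = nxt

# The key addition is dwell time: weight each page's visit probability by how
# long users actually stay on it.
def mean_dwell(page):
    times = dwell_times.get(page)
    return sum(times) / len(times) if times else 1.0

browse_rank = {p: visit[p] * mean_dwell(p) for p in pages}
for page, score in sorted(browse_rank.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```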

The claim of the paper is that user behavior such as time on page confers an “implicit vote” on the content in a way that’s harder to spam than PageRank. I’ll come back to this point too.

Doomed: BrowseRank relies on the following:

  1. A way to obtain a statistically valid sample of user browsing information
  2. A reliable way to determine intent from user browsing information, such as session construction
  3. Evidence that time on page is a statistically valid indicator of page quality

Each of these requirements poses non-trivial problems.

User browsing information. The paper proposes that user browsing data can be obtained through a client-side browser add-in or by parsing server logs, and says that this practice would eliminate linkspam. Well, yeah, but it opens up two concerns: first, how are you going to recruit those users and site administrators so that you get a representative sample? And second, how do you ensure that the users are not themselves spamming the quality information? In the first case, we have plenty of evidence (Alexa, Comscore) that user-driven panel results can yield misleading information about things like site traffic. In the second case, we know that it’s trivial to trick the browser into doing things even without having a toolbar installed (botnet, anyone?), and it’s been proven that Alexa rankings can be manipulated.

In short, the user browse data model has two compounding problems: recruiting a representative panel of honest users willing to install a browser plugin that monitors their online activity is hard enough, and screening spam activity out of the data they send back is harder still.

Session construction: The boundaries of a user’s session are one of those interesting things that turn out to be quite difficult to construct in practice, especially when you care about meaningful time on page data. The method described in the Microsoft paper is pretty typical (there’s a minimal sketch of the approach after this list), and it neglects usage patterns like the following:

  1. Spending large amounts of time in a web app UI opening tabs to read later (a web-based blog aggregator, say)
  2. Switching quickly back and forth between multiple windows or multiple tabs (continuous partial attention)
  3. Walking away from the computer mid-session, so the last page gets credited with too much time on page under the arbitrary 30-minute session cutoff (the “bathroom break” problem)
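
For reference, here’s a minimal sketch of the kind of timeout-based session splitting the paper describes, using a 30-minute cutoff; the page-view log is made up. It also shows where the dwell-time estimate for the last page of a session falls apart, which is exactly the “bathroom break” problem.

```python
# Minimal sketch of timeout-based session construction (the common 30-minute rule).
SESSION_TIMEOUT = 30 * 60  # seconds

# Hypothetical page-view log for a single user: (timestamp in seconds, url).
page_views = [
    (0,    "portal.example"),
    (40,   "story.example"),
    (55,   "video.example"),
    (4000, "news.example"),    # gap of more than 30 minutes: a new session starts
    (4010, "story.example"),
]

def build_sessions(views, timeout=SESSION_TIMEOUT):
    """Split an ordered page-view log into sessions at gaps longer than `timeout`."""
    sessions, current = [], []
    for ts, url in views:
        if current and ts - current[-1][0] > timeout:
            sessions.append(current)
            current = []
        current.append((ts, url))
    if current:
        sessions.append(current)
    return sessions

for session in build_sessions(page_views):
    for (ts, url), nxt in zip(session, session[1:] + [None]):
        # Time on page is the gap to the next view. The last page of a session has
        # no next view, so its dwell time has to be guessed (often by falling back
        # on the timeout), which is how a quick glance before a long break can get
        # credited with up to 30 minutes.
        dwell = (nxt[0] - ts) if nxt else None
        print(url, dwell)
```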

Time on page as an indicator of search quality: This is where my main gripe with the article comes from. The authors conclude that their user browsing graph yields better results than PageRank and TrustRank. The problem is, better results at what? The tests posed were to construct a top 20 list of web sites; to differentiate between spam and non-spam sites; and to identify relevant results for a sample query. The authors claim BrowseRank’s superiority in all three areas. I would argue that the first test is irrelevant, the second was not done on an even playing field, and the third is incomplete. To wit: First, if you aren’t using the relationships between web pages in your algorithm, knowing the absolute top 20 sites buys you nothing, because that information is completely irrelevant to the results for any specific query. Second, testing spam detection with user input on a spammy corpus that contains no spammy users is not a real-world test.

Third, the paper’s authors themselves note that “The use of user behavior data can lead to reliable importance calculation for the head web pages, but not for the tail web pages, which have low frequency or even zero frequency in the user behavior data.” In other words, BrowseRank is great, if you only care about what everyone else cares about. The reality is that most user queries are in the long tail, so optimizing how you’re doing on the head web pages is a little like rearranging deck chairs on the Titanic. And because we don’t know what the sample queries were for this part of the study, it’s impossible to tell for which type of searches BrowseRank performs better.

Finally, there’s a real philosophical difference between BrowseRank and PageRank. BrowseRank assumes that the only interaction a user can have with a web page is to read it. (This is the model of user as consumer.) PageRank makes a more powerful assumption: that a user is free to contribute to the web by adding to it, specifically by writing new content. The paper talks a lot about Web 2.0 in the context of sites like MySpace and Facebook, but arguably PageRank, which implicitly empowers the user by assuming their equal participation in authoring the Web, is the more Web 2.0-like metric.

Bill Gates’ Movie Maker experience, as seen from the inside

Yesterday I posted a quick link (last entry) to one of the epic Billg emails that somehow became evidence in the Microsoft antitrust trial. The mail was sent in January 2003, when I was working in the marketing group that was responsible for Microsoft.com, which was one of the groups implicated in the email about Bill’s being unable to find, download and install the updated version of Windows Movie Maker.

I spent most of my next 18 months at Microsoft working on some of those challenges, so here’s how Bill’s experience matched up with the problems in the Microsoft customer experience at that time. (Microsoft.com has completely changed by now, almost five years later, so I feel safe in describing the way it was then):

“The first 5 times I used the site it timed out while trying to bring up the download page. Then after an 8 second delay I got it to come up. This site is so slow it is unusable.” I don’t remember the specific issues here, except to note that capacity management was an ongoing challenge for a part of the site that typically saw between 60 and 80 million unique users a month.

“It wasn’t in the top 5 so I expanded the other 45. These 45 names are totally confusing. These names make stuff like: C:\Documents and Settings\billg\My Documents\My Pictures seem clear. They are not filtered by the system … and so many of the things are strange.” The Download Center was something of a battleground and the user experience showed it. The thought process was that search would be the primary way to allow people to get targeted downloads, and the default experience would just be ordered by download frequency; the only filter was the country you accessed the site from. The top 5/top 50 list that Bill refers to accordingly mixed downloads aimed at consumers, IT pros, developers, and business users without regard for audience or for operating system.

When the web marketing groups that I worked with did research to figure out how to fix this issue and present more targeted downloads, we found that there was no easy way to “fix it.” You couldn’t do it by OS–if an IT pro were logged in from his XP box and searching for server downloads he wouldn’t find them. You couldn’t even do it by cookie, because business users were consumers when they got home.

And the best part? Some execs who read this part thought that the answer was editorial promotion of “featured downloads.” Never mind that 99% of the users who came to Microsoft.com weren’t looking for Movie Maker; if Billg wants to see it in the top 5, let’s jam it into the top 5!!!!

“I tried scoping to Media stuff. Still no moviemaker.” The product groups owned the keywords used to describe their products, and though we had acres of search data to inform them, very few of them mined the search strings to figure out how to categorize their products. Usually the categories were driven by product group, and so “media” would have meant Windows Media–at that time a separate product group and totally disconnected from the Movie Maker team.

“I typed in movie. Nothing. I typed in movie maker. Nothing.” Ah, Search. I spent so long on problems with Microsoft.com Search that it’s not even funny. At this point in time the search engine behind the 4 million pages of content on Microsoft.com was based on the one that came with Commerce Server. Did Commerce Server scale to cover that much content? Did it do well with dynamically generated content like those download pages? Let’s just say there’s a new engine there now.

“So I gave up and sent mail to Amir saying – where is this Moviemaker download? Does it exist? So they told me that using the download page to download something was not something they anticipated.” Heh. This is my favorite point. Sadly it’s not as insane as it sounds. The product groups had control over their own content areas on Microsoft.com, and so they thought that customers just knew to come to the Windows site to start looking for Windows downloads. This is one of the reasons that the Downloads site was such a ghetto; a lot of marketing groups didn’t understand that it was a destination for a lot of users and thus spent no time on it.

“They told me to go to the main page search button and type movie maker (not moviemaker!). I tried that. The site was pathetically slow but after 6 seconds of waiting up it came.” Search again. There was no internal search engine optimization effort, no one (in the program groups) looking at actual search keyword strings, and the search engine wasn’t smart enough to match moviemaker to movie maker. Since the keyword moviemaker didn’t appear on the page, the search didn’t return the content.

“I thought for sure now I would see a button to just go do the download. In fact it is more like a puzzle that you get to solve. It told me to go to Windows Update and do a bunch of incantations.” The product group had chosen to deploy MovieMaker as an optional download through Windows Update, rather than as a regular software download. Why? Well, the Windows product group had more control over WU than the downloads area. Plus, apparently they thought no one would ever want to download it. How many times do you look for optional downloads through Microsoft Update? Yeah, me either. And from this point the story becomes the familiar one of the nightmare of WU.

It’s really no wonder everyone hated Microsoft at this point. The web experience really showed no understanding of how users actually used the site and what they were trying to do.

So what would the right answer have been? Some steps were taken right away: a dedicated focus on improving Microsoft.com search through more scalable indexing and tuning and much better search algorithms. (Unfortunately the guy who headed up the “better algorithms” part was famously sniped away by Google.) There was a better editorial focus across the entire site starting around this time, based on user behavior data, to improve the user experience. And there was significant improvement of the internal BI tooling to help us better understand what people were trying to do on the site (I worked on this part).

I wish I could say that the product groups started working together more closely to figure out an integrated user experience. I don’t know that I can give a fair perspective on this part of Microsoft’s culture now, since I left in July 2004. But at the time this was the big drawback of Microsoft’s legendary empowerment of its product teams: all the incentives were there for individual product marketers to do everything they could for their particular product or product segment, without considering how it played with the rest of what Microsoft did. While the Microsoft.com team that I worked on had this as its charter, we didn’t have the power to change things or override the product groups. In fact, Billg’s email and others like it were critical to Microsoft’s success because there were so few other mechanisms that considered the customer experience as a whole–and had the power to change it.