Secrets of Wikipedia research

Also known as: How on earth did people write encyclopedias before the Internet?

I’ve been a regular editor on Wikipedia for a while now, with a pretty narrow focus on the University of Virginia and related topics. In the process, I’ve found a list of sources that have made the topic much easier, and might be helpful for other fans of the history of the University:

Note that the sources are hosted by the UVA Library, Google Books, and the Internet Archive. Without the efforts of text initiatives like these I don’t think that what is being done on Wikipedia would be possible. I don’t think that I imagined, when I was an intern applying SGML markup to out-of-copyright texts in the University’s Electronic Text Center (since incorporated into the library’s Scholars Lab), that the work would lead here.

The non-linear cost of bad software development

I ran across an interesting concept in my reading today: technical debt, and its cousin design debt. The concept is basically the application of the Second Law of Thermodynamics to software development. As you develop software, you affect the entropy of the code. Feature development typically increases entropy, while refactoring and explicit design activities decrease entropy.

Why do we care about entropy in software code? Code with high entropy is harder to maintain, harder to fix bugs in, and harder to add features to. It basically increases the cost and time to get new releases of the software out.

The concept of design debt argues that this kind of entropy is additive across releases, and that each time you perform entropy-positive actions you increase the amount of work needed to dig out and make the code maintainable again.

I’ve lived this, for sure, and I suspect most others have too. But what makes it really interesting is thinking about it dynamically, where it is made clear that design debt decreases the profitability of a project. I think it’s even worse than it appears in the diagram, because the diagram neglects the time dimension. As the cost of development increases, more than likely the time to develop also increases—which means that Domain Evolution proceeds even farther while you are trying to catch up. This means that you have to increase the number of features even more, but that incurs a higher design debt still. It’s an unpleasant positive feedback loop.
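The feedback loop can be sketched as a toy simulation. This is only an illustration under stated assumptions: every constant below is hypothetical, release time is assumed to be proportional to release cost, and no refactoring is ever done to pay the debt down.

```python
# Toy model of the design-debt feedback loop; every constant is hypothetical.
DEBT_PER_FEATURE = 1.0   # entropy added by each feature shipped
BASE_COST = 1.0          # cost of one feature when there is no debt
DEBT_PENALTY = 0.05      # extra cost per feature for each unit of debt
DOMAIN_DRIFT = 0.02      # extra features demanded per unit of release time


def simulate(releases: int, initial_features: float = 10.0) -> list[float]:
    """Return the cost of each release when no refactoring is ever done."""
    debt = 0.0
    features = initial_features
    costs = []
    for _ in range(releases):
        cost = features * (BASE_COST + DEBT_PENALTY * debt)
        costs.append(cost)
        # Debt is additive across releases.
        debt += DEBT_PER_FEATURE * features
        # Assumption: release time is proportional to cost. The domain keeps
        # evolving in the meantime, so the next release needs more features.
        features = initial_features + DOMAIN_DRIFT * cost
    return costs


costs = simulate(6)
# Release-over-release increases in cost.
increments = [later - earlier for earlier, later in zip(costs, costs[1:])]
print([round(c, 1) for c in costs])
print([round(i, 2) for i in increments])
```

In this sketch each release costs more than the last, and the size of the increase itself keeps growing, which is the non-linear part.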

Design mistakes cost

I’ve stopped reading Jakob Nielsen on a regular basis, so I missed this: Top-10 Application-Design Mistakes. As it turns out, this is one of the few of Jakob’s Alertboxes that I agree with more than disagree with. Iterative design, paper prototypes, decide what your app should do, beware nonstandard GUI controls, design for the user rather than the back-end system, etc.

Number two particularly amuses me. I was on a business trip with someone who was bitten, hard, by this bug (on a different travel site). His boss booked his travel, and didn’t pay attention to the fact that the position of the months on the calendar changed between the Start and End date fields. Worse, the travel was in February in a non-leap year, so there wasn’t even a difference in date numbers to clue him in (since the Wednesday in March was exactly 28 days after the Wednesday in February). Result? A very long delay for our friend at McCarran Airport in Las Vegas trying to straighten the problem out, so that my friend could get back 28 days earlier than his ticket specified.
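The calendar coincidence is easy to check. A minimal sketch, assuming 2007 as a stand-in non-leap year (the year of the trip is not stated here) and the 14th as an arbitrary day of the month:

```python
from datetime import date

# Assumption: 2007 stands in for the unstated non-leap year of the trip.
YEAR = 2007
DAY = 14  # arbitrary day of the month, chosen only for illustration

feb = date(YEAR, 2, DAY)
mar = date(YEAR, 3, DAY)

# February has exactly 28 days in a non-leap year, so the same day number
# in March lands on the same weekday, four weeks later.
gap_days = (mar - feb).days
same_weekday = feb.weekday() == mar.weekday()

print(gap_days, same_weekday)
```

With the months swapped, neither the day number nor the day of the week looks wrong, so nothing on the booking form gives the mistake away.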

Usability mistakes cost.

Getting ready for the big one

The big concert, that is, or concerts to be more precise. The last Tanglewood Festival Chorus concert series of the Symphony Hall part of our season is coming up, and it’s big: Hector Berlioz’s two-part opera, Les Troyens. Everything about it is big: five acts divided into two nights, big chorus, big orchestra, big writing.

The background on the opera’s composition makes for some interesting reading, a classic battle between artist and public. Berlioz wrote what he felt to be a magnum opus, only to have it whittled down by the only opera house willing to perform it. Of the audiences who came to see the opera, he remarked glumly, “Yes, they are coming, but I am going.”

We’ve had a pair of rehearsals, and all I can say is that so much tonality, after the astringent aesthetic of the Bolcom, feels kind of sinful. Should be a fun run.

Opening Day, very early in the morning

New York Times: Red Sox Top A’s, 6-5, in Tokyo Opener. For the curious, no, I did not get out of bed at 5:30 to watch the opener. I did, however, tune in to the game on AM radio (something I haven’t ever used on my car before) on the way in to work, to hear that the A’s were up in the seventh inning.

Yes indeed: daytime temps nearing 50, the Red Sox are back in action, and it’s still light when I drive home from work. Must be spring.

Ham and mushrooms, butter and garlic

It’s been a while since I wrote a food-oriented post—and of course a holiday weekend is just the thing to trigger one. Lisa’s parents were here this weekend, so our relatively freewheeling Easter dinner that we have honed over the past few years got expanded a little stylistically while reining in a few of the more eccentric ingredients.

The menu: deviled eggs for hors d’oeuvres; glazed ham; mashed potatoes; asparagus; and mushrooms. The deviled eggs were the most restrained compared to past years, where I used wasabi in place of the horseradish my parents always used to perk things up. Instead of wasabi, I just used hot sauce, slightly increased the salt for flavor, and diced up some shallot very fine to mix into the filling. The eggs were superb: eminently edible but leaving one still hungry—and thirsty. As is also traditional at Easter, I accompanied mine with a small amount of bourbon over ice as I was cooking. This year it was Blanton’s, a serendipitous find that I was delighted to have in my liquor cabinet. No juleps this year, though; for one thing, at 30-something degrees, it was too damned cold out to have them or want them.

The potatoes were simple too—half and half and butter in the place of the chicken broth and buttermilk that I’ve used in the past to give them flavor, and I thought the potatoes were bland as a result. But! They were a perfect foil to the mushrooms (sliced, cooked in olive oil and butter with more diced shallot and two cloves of garlic, and then finished covered in the pan), which were a hit. The garlic was definitely the thing. Alas the asparagus! cooked much too long.

The ham was tasty, but (and here regional prejudices rear their head) I do wish I could have found a proper ham. And by proper, I mean country ham, dry-aged, the kind that comes in a burlap bag and tastes a little like a salt lick and a little like a smoky prosciutto. That’s the ham I had a lot of growing up, both at home and at church, where ham biscuits were the order of the day after a sunrise service. But this ham, a spiral-sliced ham with a brown sugar and orange juice glaze, was pretty good in its own way, just not quite the way my mouth remembered it.

After dinner, of course, the requisite ham biscuits. Mine reflected my inner culinary struggle, with mustard on top and butter on the bottom. Yes! Butter with ham. And if you think it’s insane, ask the street vendors in Provence selling jambon cru sandwiches with thick local butter about it, and then come back and tell me I was right. Of course it’s not the Provençal coming out in me so much as the Pennsylvania Dutch grandmother, but oh well.

Others had clam chowder with dinner—Legal’s, sold prepackaged, and it occurred to me how much easy access to the greatest ambrosia breeds contempt. Watching the others eat it made me think about the Bull Island clam chowder I grew up with, cooked with a clear broth, not milk, and certainly not with tomato.

Program to live vs. live to program: early hacker critique

Happy Good Friday! In honor of the day when history turned upside down, here’s a keen little insight from the late Joseph Weizenbaum (via helmintholog, via Scott Rosenberg): some programmers are compulsive programmers who, in taking a purely software-centric approach to solving problems, set themselves up for failure and take the organizations in which they work hostage. Weizenbaum cites some nifty examples of this, such as the programmer who can’t be bothered to write documentation for his mission critical hacks.

Weizenbaum goes on to cast this critique of hacker culture—the concept that everything can be explained by the computer, and that no external skills are needed—in the context of scientism, the belief that science alone, without external belief systems or other human considerations, is sufficient to explain everything about the world around us.

I may have to go and dig up a copy of the book. It sounds like a thought-provoking read along the lines of Winner’s The Whale and the Reactor.

Is Apple evil? Maybe, but not the way Wired says

I was going to take a shot at ripping apart this Leander Kahney article in Wired magazine on how Apple is the anti-Google and therefore evil, but I figured if I waited long enough that John Gruber at Daring Fireball would do it for me. Gruber didn’t disappoint, noting that “by Kahney’s logic, any company that is different from Google – and clearly most companies are far more different from Google than Apple is – is evil. I can’t tell if Kahney is being willfully obtuse or is simply a shithead.” Heh.

The accompanying list of 5 ways that Apple “breaks the rules” makes me wish that Gruber had gone after it as well. Software should be decoupled from hardware, huh? So it can run on just any phone or computer? We have a name for that kind of application. It’s called a web application. You know, the kind of application that Apple encouraged people to develop for the iPhone, and that all the pundits said wasn’t sufficient. Now Kahney slams Apple for encouraging people to build apps that run on the iPhone natively. What does he really want? Maybe Kahney is really asking for the iPhone OS to run on any old phone hardware platform. I can tell you that I can think of no surer way to ruin the user experience, and the brand, than to cram the iPhone software onto a piece of crap like the Sony Ericsson phone I just got rid of, or even onto my wife’s Blackberry Pearl.

The third point, that every Mac is preloaded with Apple software, makes me laugh. You think PC users like having a bunch of crap applications preloaded on their machines? Windows Media Player, which is preloaded on Windows everywhere but the EU, is an OK media player and it’s the default, unless the OEM changes it. But that has nothing to do with the OEM’s concern for the end user’s experience, and everything to do with the revenue they get from the partner from whom they are bundling the software. To be fair, Apple chooses not to bundle competing products, but they have bundled third party software, notably Quickbooks and trials from the Omni group. On both Windows and the Mac, the user can change the default music player (or any other default program) very easily. Would Kahney prefer that Apple shipped with no default player and made the user download one?

And the whole point about the iTunes/iPod closed loop is such a piece of crap. One word: MP3. Available on every platform. You can rip your CDs to MP3s, using iTunes, and put the MP3s on your iPod. One point in favor of this argument: iTunes for Windows doesn’t support syncing to non-iPod players, but there’s a free plugin to fix that.

The fourth point, love your customers, sounds like a page from the Good Product Manager blog. How to be a bad product manager: give your customers whatever they want and ask for in your product, regardless of the cost of support and regardless of whether the resulting product actually does what your customer wants it to do. How else to explain Kahney’s inexplicably picking on the “no floppy drive in an iMac” decision, which in retrospect was not only one of the smartest things that Apple ever did but also created the market for USB thumb drive storage? And the MacBook Air “no optical drive” situation has been covered over and over again. It’s called making intelligent trade offs. It’s what every product manager does.

I enjoyed the Fake Steve Jobs smack-down on Kahney, and wish that he had gone farther. There’s a lot of good lessons to learn in the article for a product manager with half a brain; you just need to dig in and question every assumption that Kahney makes.

Free as in beer, Wind as in air

A few comics-related links this morning. First, it will be of interest to comics historians, fantasy fans, and my sister that the full archive of Elfquest is going online for free to mark the comic’s thirtieth anniversary; the archive will fill up over the coming year. That’s a whole lotta Pini, folks. If you thought catching up with the Sluggy Freelance archives took a long time, just wait.

The other freebie is an archive of the original art for the first issue of Elektra: Assassin, written by Frank Miller and lovingly painted by Bill Sienkiewicz. If you think Miller’s later work was weird, intense, and violent, just wait until you feast your mind on this one. (Greg Burgas wrote an excellent review of the series that might lend some context to the art.)

Print on demand from the Internet Archive

Browsing a Wired.com photo feature on the Internet Archive’s book scanning operation, I was struck by this image, showing a self-contained book press. PDF goes in, paperback bound book comes out.

I would pay for a copy of Cabell’s Early History of the University of Virginia, for sure, and maybe even the five-volume centennial History of the University of Virginia by Bruce, which has provided so much material for my Wikipedia articles. I hope they get this capability on line soon.

A defining moment: Obama on race

I’ve just read what I hope will be the first speech collected in Barack Obama’s presidential library, the prepared text of his address on race that he is giving right now in Philadelphia (New York Times liveblog). I don’t think I’ve heard any candidate in recent memory speak so cogently about problems with racial perspectives on both sides of the color line, nor put things in perspective quite so eloquently. Bottom line: Obama has taken what his opponents tried to paint as a liability and made of it an opportunity for one of the great statements of challenge to the nation, the first great challenge speech of the 21st century, and the first presidential speech to stand alongside Kennedy’s inaugural address.

Excerpts:

And this helps explain, perhaps, my relationship with Reverend Wright. As imperfect as he may be, he has been like family to me. He strengthened my faith, officiated my wedding, and baptized my children. Not once in my conversations with him have I heard him talk about any ethnic group in derogatory terms, or treat whites with whom he interacted with anything but courtesy and respect. He contains within him the contradictions — the good and the bad — of the community that he has served diligently for so many years.

I can no more disown him than I can disown the black community. I can no more disown him than I can my white grandmother — a woman who helped raise me, a woman who sacrificed again and again for me, a woman who loves me as much as she loves anything in this world, but a woman who once confessed her fear of black men who passed by her on the street, and who on more than one occasion has uttered racial or ethnic stereotypes that made me cringe…

But for all those who scratched and clawed their way to get a piece of the American Dream, there were many who didn’t make it — those who were ultimately defeated, in one way or another, by discrimination. That legacy of defeat was passed on to future generations — those young men and increasingly young women who we see standing on street corners or languishing in our prisons, without hope or prospects for the future. Even for those blacks who did make it, questions of race, and racism, continue to define their worldview in fundamental ways. For the men and women of Reverend Wright’s generation, the memories of humiliation and doubt and fear have not gone away; nor has the anger and the bitterness of those years. That anger may not get expressed in public, in front of white co-workers or white friends. But it does find voice in the barbershop or around the kitchen table. At times, that anger is exploited by politicians, to gin up votes along racial lines, or to make up for a politician’s own failings.

And occasionally it finds voice in the church on Sunday morning, in the pulpit and in the pews. The fact that so many people are surprised to hear that anger in some of Reverend Wright’s sermons simply reminds us of the old truism that the most segregated hour in American life occurs on Sunday morning. That anger is not always productive; indeed, all too often it distracts attention from solving real problems; it keeps us from squarely facing our own complicity in our condition, and prevents the African-American community from forging the alliances it needs to bring about real change. But the anger is real; it is powerful; and to simply wish it away, to condemn it without understanding its roots, only serves to widen the chasm of misunderstanding that exists between the races.

In fact, a similar anger exists within segments of the white community. Most working- and middle-class white Americans don’t feel that they have been particularly privileged by their race. Their experience is the immigrant experience — as far as they’re concerned, no one’s handed them anything, they’ve built it from scratch. They’ve worked hard all their lives, many times only to see their jobs shipped overseas or their pension dumped after a lifetime of labor. They are anxious about their futures, and feel their dreams slipping away; in an era of stagnant wages and global competition, opportunity comes to be seen as a zero sum game, in which your dreams come at my expense. So when they are told to bus their children to a school across town; when they hear that an African American is getting an advantage in landing a good job or a spot in a good college because of an injustice that they themselves never committed; when they’re told that their fears about crime in urban neighborhoods are somehow prejudiced, resentment builds over time….

In the end, then, what is called for is nothing more, and nothing less, than what all the world’s great religions demand — that we do unto others as we would have them do unto us. Let us be our brother’s keeper, Scripture tells us. Let us be our sister’s keeper. Let us find that common stake we all have in one another, and let our politics reflect that spirit as well.

For we have a choice in this country. We can accept a politics that breeds division, and conflict, and cynicism. We can tackle race only as spectacle — as we did in the OJ trial — or in the wake of tragedy, as we did in the aftermath of Katrina – or as fodder for the nightly news. We can play Reverend Wright’s sermons on every channel, every day and talk about them from now until the election, and make the only question in this campaign whether or not the American people think that I somehow believe or sympathize with his most offensive words. We can pounce on some gaffe by a Hillary supporter as evidence that she’s playing the race card, or we can speculate on whether white men will all flock to John McCain in the general election regardless of his policies.

We can do that.

But if we do, I can tell you that in the next election, we’ll be talking about some other distraction. And then another one. And then another one. And nothing will change.

That is one option. Or, at this moment, in this election, we can come together and say, “Not this time.” This time we want to talk about the crumbling schools that are stealing the future of black children and white children and Asian children and Hispanic children and Native American children. This time we want to reject the cynicism that tells us that these kids can’t learn; that those kids who don’t look like us are somebody else’s problem. The children of America are not those kids, they are our kids, and we will not let them fall behind in a 21st century economy. Not this time.

Laws of the Internet, continued

It seems to be the day for oracular pronouncements about the Net. An engineer I work with told me about an intermittent network connectivity problem he had experienced yesterday. Sometimes he could get on the network and sometimes he couldn’t. The cause? A bad network cable! He said, “Normally with a network problem like this it’s either on or off, not somewhere in the middle.”

I responded without thinking, “Yeah, every now and then we need to be reminded that we live in a very shallow digital layer on an analog world.”

That just might be my first law of the Internet.

Spafford’s axioms of Usenet, generalized

In looking for a source for the “https = armored truck between two cardboard boxes” analogy referenced in my previous post, I came across a list of other famous analogies by the author, Gene “Spaf” Spafford. Many of the ones cited need some context, but #7, which I reproduce below in its entirety, is completely understandable to any Internet veteran of a certain age:

Usenet is like a herd of performing elephants with diarrhea: massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it.

The comment, posted prior to Spafford’s withdrawal from recreational Usenet use, sits alongside his three axioms of Usenet (Usenet is not the real world, and usually does not resemble it; ability to type on a computer keyboard is no guarantee of sanity, intelligence, or common sense; and Sturgeon’s Law applies to Usenet). I think the quote above, and Spafford’s axioms, deserve elevating to a higher consideration. They are certainly directly applicable to blogs, MySpace, Facebook, and just about every other online expression of individuality. They may be applicable to Wikipedia, and are certainly applicable if the deletions and random vandalism all too visible from the Recent Changes page are taken into account. They may even generally apply to humanity itself, as formulated below:

  1. Humanity is not (all of) the real world, and human models of the real world usually do not resemble it.
  2. Humanity is no guarantee of sanity, intelligence, or common sense.
  3. Sturgeon’s Law applies to humanity.
  4. Humanity is like a herd of performing elephants with diarrhea: massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it.

To which I can only say: True. True.

Ripples from SOURCE: Boston: how much security is optimal?

I wasn’t able to attend this week’s SOURCE: Boston conference, which my company is cosponsoring, but reading about some of the talks and looking at some of the papers that are coming out of it has been fascinating. A few points:

If you think protecting digital systems is hard, what about analog systems like the telephone?

The number of potential points of compromise is staggering… Once the X-rays of telephone equipment and close-ups of modified circuit boards came out (notice that there’s supposed to be a diode there, but someone replaced it with a capacitor…) we were headed into real spy vs. spy territory. Tracking down covert channels requires identifying, mapping, and physically and electronically testing every conductor out of an area. Even the conduit and grounds can be used to carry signal, and they have to be checked.

We don’t normally think about telephone security as an issue (although given the shenanigans that the FBI has been up to, with retroactive blanket wiretapping warrants, almost 200,000 National Security Letters authorizing warrantless wiretapping in a four-year period from 2003 to 2006, and collection of data that they are specifically disallowed from collecting, maybe we should). Why? Because there’s an implicit cost-benefit calculation at play: given the size of the attack surface, or the vulnerable parts of the infrastructure, the cost of absolute security is staggering.

But very few people bother to follow that thought to the logical conclusion, which is that the optimum number of security violations is greater than zero. I’m not recommending hacks, mind, but if you use a cost-benefit approach to analyze security spending, you are constantly trading the cost of protection vs. the cost of attacks. If you spend so much on security that there are no breaches, you have spent more than warranted by the cost of the attack. Dan Geer makes this argument neatly, in graphical form, in the opening of his article “The Evolution of Security” in ACM Queue. The whole article is worth digesting and mulling over. He points out that as our networked world gets more complex, we start to replicate design patterns found in nature (central nervous systems, primitive immune systems, hive behavior) and perhaps we ought to look to those natural models to understand how to create more effective security responses.
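The cost-benefit argument can be made concrete with a toy model. This is a sketch under loud assumptions, not Geer's own formulation: the dollar figures are invented, and the exponential fall-off of breaches with spending is just one convenient way to model diminishing returns.

```python
import math

# All figures are hypothetical, chosen only to illustrate the trade-off.
BASELINE_BREACHES = 10.0     # expected breaches per year with no spending
COST_PER_BREACH = 50_000.0   # assumed loss per breach, in dollars
DECAY = 1 / 100_000.0        # assumed diminishing returns per dollar spent


def expected_breaches(spend: float) -> float:
    """Expected breaches per year, falling off exponentially with spend."""
    return BASELINE_BREACHES * math.exp(-DECAY * spend)


def total_cost(spend: float) -> float:
    """Protection spend plus expected losses from the breaches that remain."""
    return spend + COST_PER_BREACH * expected_breaches(spend)


# Grid search over spending levels from $0 to $1,000,000 in $1,000 steps.
candidates = [1_000.0 * i for i in range(1_001)]
best_spend = min(candidates, key=total_cost)

print(f"cheapest total cost at a spend of ${best_spend:,.0f}")
print(f"expected breaches at that spend: {expected_breaches(best_spend):.2f}")
```

Under these assumptions the cheapest point sits well short of the maximum spend and still leaves some breaches on the table. Pushing the breach count toward zero from there costs more than the breaches themselves would.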

Getting back to SOURCE: Boston, Geer’s keynote there amplified some of his points from the Evolution paper and addressed other uncomfortable thoughts. Such as:

  1. The model for security used to be “I’m OK, you’re OK, the network is compromised,” which leads to the widespread use of encryption. But SSL and other network encryption technology has been famously likened (by Gene Spafford) to hiring an armored car to deliver rolls of pennies from someone living in a cardboard box to someone living on a park bench. Meaning: in a world of malware and botnets, maybe the model ought to be: “I’m OK, I think, but you’re not.”
  2. Epidemiologically, as malware and botnets become more prevalent, they will become less virulent. One of the L0pht team has said (as cited by Geer) that computers might be better off in botnets than in the wild, because the botmaster will want to keep them from being infected by other malware. (This is the gang membership theory of the inner city writ large.) Geer likens this to the evolution of beneficial parasites and symbiotes.
  3. So if botnets are here to stay and we need to assume everyone is compromised, why shouldn’t bots become a part of doing business? Why shouldn’t ETrade 0wnz0r my computer when I make a trade, if only to ensure that no one else can listen in? Suddenly the Sony BMG rootkit begins to make more sense, in a sick sort of way.

Geer closes his talk by bringing back the question of how much security we want. If the cost of absolute security is absolute surveillance, of having one’s computer routinely 0wnz0r3d by one’s chosen e-commerce sites, then perhaps we need to be prepared to tolerate a little insecurity. Maybe the digital equivalent of telephone equipment boxes “secured” with a single hex bolt makes sense after all.