
April 21, 2009

Filed under: journalism»new_media

The Precision Hack

Yesterday, Jeff Atwood at Coding Horror linked to "Inside the Precision Hack", a blog entry describing the process by which 4chan hackers broke the Time 100 poll. The poll, which is meant to nominate the "world's most influential people," had practically no security built into the voting mechanism. The kids from notorious Internet sewer and discussion board 4chan were able to manipulate it to the point where they could spell out messages, acrostic-style, at the top of the list.

Since Coding Horror is a programming blog run by a guy who's relatively new to web programming, he mainly sees this as a funny way to make a point: look how easy it is to bypass security when it's incompetent! But there's a wider question that ought to be raised: is this level of incompetence actually uncommon in journalism? And as newspapers and other outlets increasingly work through "new media," will they do so securely? What are the risks if they don't? These are relatively simple questions, and ones of self-evident importance. But as journalism conducts its internal debate regarding "innovation" in reporting, they're not questions I see asked as often as they should be.

So what did Time do wrong? It turns out they made lots of basic mistakes. Votes were submitted in plaintext as URL variables, and the voting page accepted GET requests as well as POSTs, so innocent visitors could be enlisted simply by embedding the voting URL in an iframe on an unrelated page. When it became clear that this was skewing the vote, Time added a verification parameter consisting of the URL and a secret code run through an MD5 hash. Unfortunately, it sounds like they left the secret code in the Flash file as a literal, which is pretty easy to extract with one of the many SWF decompilers out there. These are some pretty weak security measures: a low barrier to entry that made it easy for some relatively unskilled hackers to precisely manipulate Time's poll.
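
For readers who want to see the failure modes concretely, here's a minimal sketch in Python. The URL, parameter names, and secret value are hypothetical stand-ins, not Time's actual endpoints; the point is simply that a vote accepted over GET can be triggered from anywhere, and that a hash built from a secret shipped inside the client can be forged by anyone who decompiles that client.

    # A minimal sketch of the two weaknesses described above. All names are
    # hypothetical stand-ins for illustration.
    import hashlib

    VOTE_URL = "http://example.com/vote?id=1234567&rating=1"

    # Weakness 1: votes accepted over GET. Any page can enlist its visitors
    # just by embedding something like:
    #   <iframe src="http://example.com/vote?id=1234567&rating=1"></iframe>
    # The browser fires the request; the visitor never knows they "voted."

    # Weakness 2: the verification parameter is just MD5(url + secret), and
    # the secret ships inside the Flash file. Pull it out of the decompiled
    # SWF and you can sign any vote you like.
    SECRET = "key-extracted-from-the-swf"

    def sign(url):
        """Reproduce the client's verification parameter: MD5(url + secret)."""
        return hashlib.md5((url + SECRET).encode()).hexdigest()

    forged_vote = VOTE_URL + "&key=" + sign(VOTE_URL)
    print(forged_vote)  # a "verified" vote, forged without ever loading the poll

Once that hash can be reproduced outside the Flash client, scripting thousands of "verified" votes is trivial, which is exactly how the acrostic at the top of the list got spelled out.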

I want to make it clear that I'm not bringing this up at Time's expense, as Atwood is (I like Coding Horror, but he's not exactly a crack security researcher). In fact, I sympathize with Time. Security is hard! And expensive! And if you're not used to thinking about it from the very beginning, you're going to screw it up.

But why did it happen? Here's my completely unsubstantiated hunch: they got caught trying to do more with less. News organizations these days are caught between two directives: cut costs, and simultaneously jump onto the Web 2.0 bandwagon. These goals are directly opposed to each other. You can't get the kind of programmers you need to keep up with Google/Yahoo/Microsoft for cheap. So what happens? Chances are, you take journalists who are a little technically inclined, give them a few books on Ruby on Rails, and ta-da! you've got an "innovation" team. It's not a recipe for tight security.

It doesn't help that the buzz in newsrooms for years has basically been around "hybrid journalists" who are video producers, writers, and programmers all at once. Now, I have some respect for that idea. I personally believe in being well-rounded. But it's not always realistic, and more importantly, some things are too important to be left to generalists. Security is one of those things. Not only can poor data security undermine your institutional reputation, it can be dangerous for your reporting as well.

Take note, for example, of this article from Poynter on data visualizations. Washington Post reporter Sarah Cohen explains how graphing data isn't just useful for external audiences: it can also help reporters zero in on interesting stories, or eliminate stories that actually aren't newsworthy. In fact, she says, the internal usage probably far exceeds what makes it to the web or to print. It's a great explanation of why data visualization is an actual reporting tool, a point that tends to get lost in the fuss over Twitter and blogging ethics panels.

So newsroom data isn't only meant for public consumption. It's a real source for journalists, particularly in number-heavy beats like public policy or business. And that means that data needs to be trusted. As long as it's siloed away inside the building, that's probably fine. Once it's moved outside and exposed through any kind of API, measures need to be taken to ensure it isn't tampered with in any way. And if it's used for any kind of crowdsourcing (which, to be fair, I have advocated in the past), that goes double.
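
What might those measures look like? Here's a minimal sketch of one common approach, assuming a hypothetical data-submission workflow (the endpoint and key names are mine, not any particular newsroom's API): the server signs data on the way out and verifies the signature when contributions come back, and the key never leaves the server, so there's nothing for a decompiler to find.

    # A minimal sketch of server-side integrity checking for a hypothetical
    # crowdsourced-data endpoint. The signing key lives only on the server,
    # so, unlike an embedded Flash secret, there is nothing to extract from
    # the client.
    import hashlib
    import hmac
    import os

    # In practice this comes from configuration; it is never shipped to clients.
    SERVER_KEY = os.environ.get("SUBMISSION_KEY", "dev-only-key").encode()

    def sign(payload: bytes) -> str:
        """Signature issued alongside data the newsroom publishes for annotation."""
        return hmac.new(SERVER_KEY, payload, hashlib.sha256).hexdigest()

    def is_authentic(payload: bytes, signature: str) -> bool:
        """Check a returning submission; constant-time compare avoids leaking info."""
        return hmac.compare_digest(sign(payload), signature)

None of this is exotic; it's the same signed-token idea behind most session cookies. The point is only that the checking has to happen on the server, and has to be designed in before launch, not bolted on after the vote starts to skew.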

So am I saying we should back away from opening up our newsrooms to online audiences? Not at all. But we should understand the gravity of the situation first, making sure the resources we spend on security are commensurate with the reputational risk. And let's be honest: while it's great that NPR and the New York Times are making neat API calls and interactive polls available to everyone, maybe that's simply not appropriate, or aligned with the newsroom's primary mission, at smaller organizations.

Journalism has to come first. That journalism has to be trustworthy, down to the data on which it relies. Think of it as an editorial bar that needs to be cleared: if you don't feel like your security is up to the task, perhaps caution is in order. On the other hand, if you can't justify security from the start (as Time clearly couldn't), what you're really saying is that your results don't really matter (Time's certainly shouldn't). In that case, is it really the best use of your time?
