this space intentionally left blank

June 22, 2011

Filed under: journalism»new_media»data_driven

Against the Grain

If I have a self-criticism of the work I'm doing at CQ, it's that I mostly make flat tools for data-excavation. We rarely set out with a narrative that we want to tell--instead, we present people with a window into a dataset and give them the opportunity to uncover their own conclusions. This is partly due to CQ's newsroom culture: I like to think we frown a bit on sensationalism here. But it is also because, to a certain extent, my team is building the kinds of interactives we would want to use. We are data-as-playground people, less data-as-theme-park.

It's also easier to create general purpose tools than it is to create a carefully-curated narrative. But that sounds less flattering.

In any case, our newest project does not buck this trend, but I think it's pretty fascinating anyway. "Against the Grain" is a browseable database of dissent on party unity votes in the House and Senate (party unity votes are defined by CQ as those votes where a majority of Republicans and a majority of Democrats took opposing sides on a bill). Go ahead, take a look at it, and then I'd like to talk about the two sides of something like this: the editorial and the technical.

The Editorial

Even when you're building a relatively straightforward data-exploration application like this one, there's still an editorial process in play. It comes through in the flow of interaction, in the filters that are made available to the user, and the items given particular emphasis by the visual design.

Inescapably, there are parallels here to the concept of "objective" journalism. People are tempted to think of data as "objective," and I guess at its most pure level it might be, but from a practical standpoint we don't ever deal with absolutely raw data. Raw data isn't useful--it has to be aggregated to have value (and boy, if there's a more perilous-but-true phrase in journalism these days than "aggregation has value," I haven't heard it). Once you start making decisions about how to combine, organize, and display your set, you've inevitably committed to an editorial viewpoint on what you want that data to mean. That's not a bad thing, but it has to be acknowledged.

Regardless, from an editorial perspective, we had a pretty specific goal with "Against the Grain." It began as an offshoot of a common print graphic using our votestudy data, but we wanted to be able to take advantage of the web's unlimited column inches. What quickly emerged as our showcase feature--what made people say "ooooh" when we talked it up in the newsroom--was to organize a given member's dissenting votes by subject code. What are the policy areas on which Member X most often breaks from the party line? Is it regulation, energy, or financial services? How are those different between parties, or between chambers? With an interactive presentation, we could even let people drill down from there into individual bills--and jump from there back out to other subject codes or specific members.

To present this process, I went with a panel-oriented navigation method, modeled on mobile interaction patterns (although, unfortunately, it still doesn't work on mobile--if anyone can tell me why the panels stack instead of floating next to each other on both Webkit and Mobile Firefox, I'd love to know). By presenting users with a series of rich menu options, while keeping the previous filters onscreen if there's space, I tried to strike a balance between query-building and giving room for exploration. Users can either start from the top and work down, by viewing the top members and exploring their dissent; from the bottom up, by viewing the most contentious votes and seeing who split from the party; or somewhere in the middle, by filtering the two main views through a vote's subject code.

We succeeded, I think, in giving people the ability to look at patterns of dissent at a member and subject level, but there's more that could be done. Congressional voting is CQ's raison d'etre, and we store a mind-boggling amount of legislative information that could be exploited. I'd like to add arbitrary member lookup, so people could find their own senator or representative. And I think it might be interesting to slice dissent by vote type--to see if there's a stage in the legislative process where discipline is particularly low or high.

So sure, now that we've got this foundation, there are lots of stories we'd like it to handle, and certain views that seem clunkier than necessary. It's certainly got its flaws and its oddities. But on the other hand, this is a way of browsing through CQ's vote database that people outside of CQ (and most of the people inside) have never had before. Whatever its limitations, it enables people to answer questions they couldn't have asked prior to its creation. That makes me happy, because I think a certain portion of my job is simply to push the organization forward in terms of what we consider possible.

So with that out of the way, how did I do it?

The Technical

"Against the Grain" is probably the biggest JavaScript application I've written to date. It's certainly the best-written--our live election night interactive might have been bigger, but it was a mess of display code and XML parsing. With this project, I wanted to stop writing JavaScript as if it was the poor man's ActionScript (even if it is), and really engage on its own peculiar terms: closures, prototypal inheritance, and all.

I also wanted to write an application that would be maintainable and extensible, so at first I gave Backbone.js a shot. Backbone is a Model-View-Controller library of the type that's been all the rage with the startup hipster crowd, particularly those who use obstinately-MVC frameworks like Ruby on Rails. I've always thought that MVC--like most design patterns--feels like a desperate attempt to convert common sense into jargon, but the basic goal of it seemed admirable: to separate display code from internal logic, so that your code remains clean and abstracted from its own presentation.

Long story short, Backbone seems designed to be completely incomprehensible to someone who hasn't been writing formal MVC applications before. The documentation is terrible, there's no error reporting to speak of, and the sample application is next to useless. I tried to figure it out for a couple of hours, then ended up coding my own display/data layer. But it gave me a conceptual model to aim for, and I did use Backbone's underlying collections library, Underscore.js, to handle some of the filtering and sorting duties, so it wasn't a total loss.

One feature I appreciated in Backbone was the templating it inherits from Underscore (and which they got in turn from jQuery's John Resig). It takes advantage of the fact that browsers will ignore the contents of <script> tags with a type set to something other than "text/javascript"--if you set it to, say, "text/html" or "template," you can put arbitrary HTML in there. I created a version with Mustache-style support for replacing tags from an optional hash, and it made populating my panels a lot easier. Instead of manually searching for <span> IDs and replacing them in a JavaScript soup, I could simply pass my data objects to the template and have panels populated automatically. Most of the vote detail display is done this way.
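For readers who haven't seen this trick, here is a minimal sketch of Mustache-style tag replacement from an optional hash. It is not the code used in "Against the Grain": the function name renderTemplate, the template text, and the sample values are all made up for illustration, and it assumes tags look like {{name}}.

```javascript
// Hypothetical helper: replace {{key}} tags in a template string with values
// from an optional hash. Tags with no matching key are left as-is so that
// missing data stays visible. Note: values are not HTML-escaped here.
function renderTemplate(templateText, data) {
  data = data || {};
  return templateText.replace(/\{\{\s*(\w+)\s*\}\}/g, function (tag, key) {
    return Object.prototype.hasOwnProperty.call(data, key)
      ? String(data[key])
      : tag;
  });
}

// In a browser, the template text would be read from a <script> tag whose
// type is not "text/javascript", e.g. via
// document.getElementById("vote-detail").innerHTML (the id is an assumption).
// A plain string stands in for it here so the sketch runs anywhere.
var voteTemplate = "<h2>{{title}}</h2><p>{{dissenters}} members dissented</p>";

// Illustrative values only, not real vote data.
var html = renderTemplate(voteTemplate, { title: "Sample vote", dissenters: 12 });
```

The appeal is that the markup lives in the page as markup, and the JavaScript only has to hand over a data object.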

I also wanted to implement some kind of inheritance to simplify my code. After all, each panel in the interactive shares a lot of functionality: they're basically all lists, most of them have a cascading "close" button, and they trigger new panels of information based on interaction. Panels are managed by a (wait for it...) PanelManager singleton that handles adding, removing, and positioning them within the viewport. The panels themselves take care of instantiating and populating their descendants, but in future versions I'd like to move that into the PanelManager as well and trigger it using custom events.
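As a rough illustration of that structure, the sketch below sets up a base panel, one specialized panel that inherits from it through the prototype chain, and a singleton manager. Only the name PanelManager comes from the description above; Panel, MemberPanel, and every method name are assumptions, and viewport positioning is left out entirely.

```javascript
// Singleton manager: a plain object that tracks the open panels.
var PanelManager = {
  panels: [],
  add: function (panel) {
    this.panels.push(panel);
    return panel;
  },
  remove: function (panel) {
    var index = this.panels.indexOf(panel);
    if (index !== -1) {
      this.panels.splice(index, 1);
    }
  }
};

// Base panel (hypothetical): behavior every list-style panel shares.
function Panel(title) {
  this.title = title;
  this.children = [];
}

// Open a descendant panel and register it with the manager.
Panel.prototype.open = function (child) {
  this.children.push(child);
  return PanelManager.add(child);
};

// Cascading close: shut every descendant first, then this panel.
Panel.prototype.close = function () {
  while (this.children.length) {
    this.children.pop().close();
  }
  PanelManager.remove(this);
};

// Specialized panel (hypothetical) that inherits from Panel.
function MemberPanel(title, memberName) {
  Panel.call(this, title); // run the base constructor on this instance
  this.memberName = memberName;
}
MemberPanel.prototype = Object.create(Panel.prototype);
MemberPanel.prototype.constructor = MemberPanel;

var root = PanelManager.add(new Panel("Top dissenters"));
var detail = root.open(new MemberPanel("Member detail", "Member X"));
root.close(); // also closes the detail panel, leaving the manager empty
```

Here the panels still create their own descendants, as described above; moving that job into the manager would mostly mean relocating the open method.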

Unfortunately, out-of-the-box JavaScript inheritance is deeply weird, and it's tangled up in the biggest flaw of the language: terrible variable scoping. I never realized how important scope is until I saw how many frustrations JavaScript's bad implementation creates (no real namespaces! overuse of the "this" keyword! closures over loop values! ARGH IT BURNS).
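The "closures over loop values" complaint is easiest to see in a small case. The sketch below is illustrative only; the function names and labels are invented. Because var is scoped to the whole function rather than to the loop body, every callback created in the loop shares the same counter.

```javascript
// Broken: all of the callbacks close over the same variable i, which has
// already reached labels.length by the time any of them runs.
function makeHandlersBroken(labels) {
  var handlers = [];
  for (var i = 0; i < labels.length; i++) {
    handlers.push(function () {
      return labels[i];
    });
  }
  return handlers;
}

// Fixed: an immediately-invoked function creates a fresh scope on each pass,
// so each callback keeps its own copy of the index.
function makeHandlers(labels) {
  var handlers = [];
  for (var i = 0; i < labels.length; i++) {
    (function (index) {
      handlers.push(function () {
        return labels[index];
      });
    })(i);
  }
  return handlers;
}

var subjects = ["regulation", "energy", "financial services"];
var broken = makeHandlersBroken(subjects)[0](); // undefined, not "regulation"
var fixed = makeHandlers(subjects)[0]();        // "regulation"
```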

Scope in JavaScript is eerily like Inception: at every turn, the language drops into a leaky subcontext, except that instead of slow-motion vans and antigravity hotels and Leonardo DiCaprio's dead wife, every level change is a new function scope. With each nested function call, the meaning of the "this" keyword can change to something different (often to something ridiculous like the Window object), a tendency worsened in a functional library like Underscore, where nearly everything is a callback. In ActionScript, the use of well-defined Event objects and real namespaces meant I'd never had trouble untangling scope from itself, but in JavaScript it was a major source of bugs. In the end I found it helpful, in any function that uses "this" (read: practically everything you'll write in JavaScript), to immediately cache it in another variable and then only use that variable if possible, so that even inside callbacks and anonymous functions I could still reliably refer to the parent scope.
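The caching habit described above looks something like the following sketch. The VoteList object and its method names are hypothetical, chosen only to show the pattern; the variable name self is a common convention, not a requirement.

```javascript
// Hypothetical object with a method that uses a callback.
function VoteList(title, votes) {
  this.title = title;
  this.votes = votes;
}

// Broken: inside the callback, "this" is no longer the VoteList. It is the
// global object in sloppy mode (Window in a browser) or undefined in strict
// mode, so the title lookup comes back undefined either way.
VoteList.prototype.labelsBroken = function () {
  return this.votes.map(function (vote) {
    var owner = this;
    return (owner && owner.title) + ": " + vote;
  });
};

// Fixed: cache "this" up front and use only the cached variable, which the
// callback can still reach through its closure.
VoteList.prototype.labels = function () {
  var self = this;
  return self.votes.map(function (vote) {
    return self.title + ": " + vote;
  });
};

var list = new VoteList("Energy", ["Vote 12", "Vote 40"]);
var good = list.labels();       // ["Energy: Vote 12", "Energy: Vote 40"]
var bad = list.labelsBroken();  // ["undefined: Vote 12", "undefined: Vote 40"]
```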

After this experience, I still like JavaScript, but some of the shine has worn off. The language has some incredibly powerful features, particularly its first-class functions, that the community uses to paper over the huge gaps in its design. Like Lisp, it's a small language that everyone can extend--and like Lisp, the downside is that everyone has to do so in order to get anything done. The result is a million non-standard libraries re-implementing basic necessities like classes and dependencies, and no sign that we'll ever get those gaps filled in the language itself. Like it or not, we're largely stuck with JavaScript, and I can't quite be thrilled about that.

Conclusions

This has been a long post, so I'll try to wrap up quickly. I learned a lot creating "Against the Grain," not all of it technical. I'm intrigued by the way these kinds of interactives fit into our wider concept of journalism: by operating less as story presentations and more as tools, do they represent an abandonment of narrative, of expertise, or even a kind of "sponsored" citizen journalism? Is their appearance of transparency and neutrality dangerous or even deceptive? And is that really any less true of traditional journalism, which has seen its fair share of abused "objectivity" over the years?

I don't know the answers to those questions. We're still figuring them out as an industry. I do believe that an important part of data journalism in the future is transparency of methodology, possibly incorporating open source. After all, this style of interactive is (obviously, given the verbosity on display above) increasingly complex and difficult for laymen to understand. Some way for the public to check our math is important, and open source may offer that. At the same time, the role of the journalist is to understand the dataset, including its limitations and possible misuses, and there is no technological fix for that. Yet.
