Perhaps this is irony, coming on the heels of the previous post, but I'd like to announce that the first version of the NPR client for Android incorporating my patches has gone live. You can find it in the Market, or see it on App Brain. I get a credit and everything, as the second coder on the project. I'm pretty thrilled.
I got involved because, in keeping with the open-source spirit behind Android itself, NPR has released the source for the client at a Google Code repository under the Apache license. You can download it for yourself, if you'd like (you'll need an API key to compile, though). The NPR team would love to have contributions from other coders, designers, or even just interested listeners. You can hit them up via @nprandroid on Twitter, or send an e-mail to the app's feedback address.
This version mainly splits playback off into a background service with a notification, which is a better user experience and means the stream won't be killed if you leave the application with the Back button. We've got another version in the works that improves this functionality, incorporates some little UI tweaks, and lays the groundwork for home screen widgets. I'd like to thank Corvus for his help in spotting areas where the Android client needs improvement. The NPR design team is also finishing up an overhaul of the look-and-feel of the application, and hopefully we can get that out soon. Along with taking care of bug fixes and project cleanup, that's my priority as soon as existing revisions are cleared.
I'm biased, of course, as someone who's interested in what I call data-driven journalism. But the way I see it, the basic task of journalism is to ask questions, and with more data than ever being made available by governments, non-profits, corporations, and individuals, it becomes difficult to answer those questions--or even to know where to start--unless you can leverage a computer's ability to filter and scale.
For example: our graphics reporter is pulling together some information regarding cloture over the last century. She's got a complete list of all the motions filed since the 66th Congress (Treaty of Versailles in 1919!). Getting a count of motions from the whole set with a given result is easy with Excel's COUNTIF function, but how do we get a count of rejected motions by individual Congress? You could do it by manually filtering the list and noting the results, or you could write a custom counting function (which we later extended to check for additional criteria--say, motions that were rejected by the majority party). The latter only takes about 10 lines of code, and it saves a tremendous amount of tedium. More importantly, it let her immediately figure out which avenues of analysis would be dead ends, and concentrate our editorial efforts elsewhere.
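The actual counting function lived in a spreadsheet, but the idea translates directly. Here's a minimal Python sketch of the same approach, with invented field names and made-up motion records standing in for the real data:

```python
from collections import Counter

# Hypothetical cloture records -- the real list lives in a spreadsheet.
motions = [
    {"congress": 110, "result": "Rejected", "majority_against": True},
    {"congress": 110, "result": "Agreed to", "majority_against": False},
    {"congress": 111, "result": "Rejected", "majority_against": False},
    {"congress": 111, "result": "Rejected", "majority_against": True},
]

def count_by_congress(rows, **criteria):
    """Count motions per Congress matching all of the given field values."""
    return Counter(
        row["congress"]
        for row in rows
        if all(row.get(field) == value for field, value in criteria.items())
    )

rejected = count_by_congress(motions, result="Rejected")
# The "extended" version is just more keyword arguments: rejected
# motions where the majority party voted against cloture.
rejected_by_majority = count_by_congress(
    motions, result="Rejected", majority_against=True
)
```

Adding a new criterion costs one keyword argument instead of another round of manual filtering, which is exactly the kind of leverage the ten lines buy you.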
We also do a fair amount of page-scraping here--sometimes even for our own data, given that we don't always have an API for a given database field. I'm trying to get more of our economic data loaded this way--right now, one of our researchers has to go out and get updates on the numbers from various sources manually. That's time they can't spend crunching those numbers for trends, or writing up the newest results. It's frustratingly inefficient, and really ought to be automated--this is, after all, exactly what most scripting languages were written to do.
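A page-scraper doesn't need to be elaborate, either. Here's a toy version using only Python's standard library, pulling figures out of a hypothetical stats-release page (the HTML layout and class names are invented; a real script would fetch the page with urllib.request rather than hard-coding it):

```python
from html.parser import HTMLParser

# A made-up economic-indicators page, standing in for a real fetch.
PAGE = """
<table>
  <tr><td class="label">Unemployment rate</td><td class="value">9.8%</td></tr>
  <tr><td class="label">CPI change</td><td class="value">0.2%</td></tr>
</table>
"""

class ValueScraper(HTMLParser):
    """Collect the text of every <td class="value"> cell."""
    def __init__(self):
        super().__init__()
        self.in_value = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "td" and ("class", "value") in attrs:
            self.in_value = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_value = False

    def handle_data(self, data):
        if self.in_value:
            self.values.append(data.strip())

scraper = ValueScraper()
scraper.feed(PAGE)
```

Run on a schedule, something this size replaces a recurring manual chore--which was the whole point of scripting languages in the first place.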
It's true that these are all examples of fairly narrow journalism--business and economic trends, specific political analysis, metatextual reporting. Not every section of the paper will use these tools all the time, and I'm not claiming that old style, call-people-and-harass-them-for-answers reporting will go away any time soon. But I've been thinking lately about the cost of investigative reporting, and the ways that computer automation could make it more profitable. Take Pro Publica's nursing board investigation, for example. It's a mix of traditional shoe leather reporting and database pattern-matching, with the latter used to direct the former. Investigative reporting has always been expensive and slow, but could tools like this speed the process up? Could it multiply the effectiveness of investigative reporters? Could it revive the ability for local papers to act as a watchdog for their regional governments and businesses?
Well, maybe. There are a lot of reasons why it wouldn't work right now, not the least of which is the dependence of data-driven journalists on, well, data. It assumes that the people you're investigating are actually putting information somewhere you can get to it, and that the data is good--or that you have the skills and sufficient signal to distinguish between good data and bad. If I imagine trying to do this kind of thing out where my parents live in rural Virginia (a decent acid test for local American news), I'd say it's probably not living up to its potential yet.
But I think that day is coming. And I'm not the only one: Columbia just announced a dual-degree masters program in journalism and computer science (Wired has more, including examples of what the degree hopes to teach). To no small degree, the pitch for developing these skills isn't just a matter of leveraging newsroom time efficiently. It's more that in the future, this is how the world will increasingly work: rich (but disconnected) private databases, electronic governmental records, and interesting stories buried under petabytes of near-random noise. Journalists don't just need to learn their way around basic scripting because it's a faster way to research. They may need it just to keep up.
Product X will not save journalism.
I'm writing this so I can link back to it regularly. Because roughly once a year, someone invents a new device or service that moves words around, and at that point the media punditry proclaims it the second coming for poor beleaguered Journalism. This is the flip side--and direct result--of the industry's "The Internets Are In Ur Base, Killing Ur Journalisms" meme. And they're both wrong, consistently and repeatedly, but no-one ever seems to learn.
How do we know it's wrong? Well, for one thing, because Product X keeps Not Saving the industry. The Kindle didn't do it. Twitter isn't doing it. Smartphones aren't doing it. Past experience isn't a perfect guide to future performance, but there does seem to be a trend here--namely, that a single technical innovation is not enough to single-handedly stop journalists from A) ruining their industry and B) loudly complaining that their industry is being ruined. To be fair, "complain loudly" is basically the sum of the journalist's credo.
The issue with Product X is invariably that it solves the wrong problem. Usually, this means monetizing digital distribution channels, which media pundits find fascinating for no good reason (see: "The Internets In Ur Base etc."). I say no good reason, because we've known how to monetize distribution over the Internet for literally years now. These are solved problems, technologically. We know that advertising can work (for certain values of "advertising" and with sufficient infrastructure), and it's not like we didn't have ways of selling subscriptions via standard web browsers, which are the great leveller of access and audience. It's not that we don't have the capabilities--it's that people (with some notable exceptions) don't want to pay for journalism this way.
And yet, every time Product X is announced, editorialists shout from the mountaintops that publishing will be saved by its peculiar virtues. Because if they couldn't sell subscriptions to the whole Internet at the implausible profit margins demanded by Wall Street, surely they'll be able to sell them to a much smaller audience through an expensive gadget and its specialized storefront--which demands a cut off the top. And this is their idea of a sustainable business plan!
What it comes down to is this: you can't solve journalism's problems with technological Product X because its most pressing problems aren't really technological. They're social, commercial, and editorial (see: Your Plan To Save The Media Will Not Work, A Checklist). And anyone claiming that Product X can solve those problems has been reading their own marketing materials for too long.
It's intensely frustrating to me that, of all the professions where this could take hold, journalism is so prone to this kind of reductionist thinking. Perhaps it's a symptom of the industry's current obsession with horse-race narratives. But if it were up to me, we'd make this an interview test for news executives. Do you believe in Product X? Yes? Then maybe you should stay on your beat a little longer, until you figure out that's not the way the world works.
Via Aleks Krotoski, web developer Jeremy Keith discusses the "truism" that The Internets Never Forget:
We seem to have a collective fundamental attribution error when it comes to the longevity of data on the web. While we are very quick to recall the instances when a resource remains addressable for a long enough time period to cause embarrassment or shame later on, we completely ignore all the link rot and 404s that is the fate of most data on the web.

There is an inverse relationship between the age of a resource and its longevity. You are one hundred times more likely to find an embarrassing picture of you on the web uploaded in the last year than to find an embarrassing picture of you uploaded ten years ago.

From there, Keith muses a bit on domain names, which are rented from ICANN: you can own your data (or own your name), but you can't own your domain in perpetuity. We've been dealing with content-management questions a bit at work lately, as any news organization transitioning from print to web must, so this kind of thing has been on my mind anyway. And I've reached the point, personally, where I take a fairly radical stance: not just that the web does lose content over time, but that it should do so. Permanence is unrealistic, if not actively harmful.
Now, I say this as someone who likes URLs, and who believes that basic URL literacy is not too much to expect from people. I also think URLs should be stable for a reasonable period of time--inversely proportional to their depth in the directory tree, for example, so that "www.domain.com/stories" should be much more stable than "www.domain.com/content/stories/about/buildings/and/food.html" or something like that. But the idea that you can have URLs that are stable forever? Or that you should expect all content to be equally preservation-worthy? That's just foolish.
Take a news organization, for example. Your average news site produces a truly galling amount of content every day: not just text stories, but also images, video, audio, slideshows, interactives, and so on. Keeping track of all of this is a monumental task, and the general feeling I get is that these companies are failing miserably at it. I cannot think of a single newspaper website (including CQ, no less) where it is easier for me to find a given item through their own navigation or search than it is to go to Google and type "washington post mccain obama timeline" (to pick a recent example).
And that's not a bad thing. Google spends a lot of time learning how to read your mind (effectively). They (and their competitors at Bing, or wherever) employ a lot of smart people to do nothing but help you find what you're looking for, even if you don't spell it right or if the URL has changed. I say, let them do that. If it were up to me, I'd replace every in-site search engine with a custom Google query and then forget about it: the results would probably be better (they could hardly be worse) and newsroom tech departments could spend their time and money on actual journalism-related activities.
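Replacing an in-site search engine with a Google query really is that simple in the crudest case: just build a site-restricted search URL. A sketch (the site name here is only an example; a production setup would more likely use Google's Custom Search product):

```python
from urllib.parse import urlencode

def google_site_search(query, site="washingtonpost.com"):
    """Build a Google search URL restricted to a single site."""
    return "https://www.google.com/search?" + urlencode(
        {"q": "site:{} {}".format(site, query)}
    )

url = google_site_search("mccain obama timeline")
```

Point the site's search box at a URL like that and the hard problems--misspellings, moved pages, relevance--become someone else's full-time job instead of your tech department's side project.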
The thing is, the vast majority of content (particularly in journalism) has a set lifespan, and we should respect that. The window of time when stable URLs are crucial is limited to a couple of months or so: enough time for bloggers (micro- or otherwise) and social networkers to discuss those rare few articles that catch on with the Internet audience and have legs. After that, searchability is more important than stability, because people aren't going to dig up the old links. They're going to locate what they want via someone else's search engine. That's if they search for it at all, of course. Because realistically speaking, most news has little in the way of legs, especially on the Internet where readers expect breaking stories to adopt a blog-style hit-and-run update pattern. It's intensely valuable for about a day, and then it's digital hamster-cage lining. Don't throw it away haphazardly--but don't fool yourself about its long-term value, either.
This may sound like I'm saying that we should give up on archiving. I'm not--after all, I'm the world's biggest fan of Lexis-Nexis. I simply propose that fighting link rot can't be our top priority. When it comes to content management, my question is not "how do we store this at the same location forever?" but "how easy will it be to port this medium-term storage solution into another with a minimum of degradation for the content that actually matters?" It's that content that I really care about, not its address, because Google (or Bing, or whatever) will always be able to find the new location. That makes ease of migration much more important than URL fidelity. If you're thinking about a news CMS timeframe longer than, say, two years, I think you risk losing sight of that fact.
Ultimately, links break. Let them. Attempting to engineer for eternity is a great way to never finish building--or to lock yourself into a poor foundation when the technological ground shifts. And honestly, we're far enough behind as an industry now. We don't need to bury ourselves any more.
It's been a big week for CQ's vote studies, which measure the presidential support and party unity of each senator and representative on a series of key votes. Our editorial and research team finished up the results for President Obama's first year in office, leading to a pair of NPR stories based on that data, in addition to our own coverage, of course.
To accompany our stories, I built a new version of our vote study visualization, leveraging what I've learned since creating the original almost two years ago. It is, as you'd expect, faster and better-looking. But there are subtle improvements included as well, ones I hope will make this a solid base for our vote study multimedia during the Obama administration.
As I've said before, I'm extraordinarily proud of the work our vote study team does, and thrilled to be able to contribute to their online presence in this way. Check it out, and I'd love to hear your thoughts.
As part of my new team leader position at CQ, I get to pick which technologies and platforms our multimedia team will use for its projects. This is less impressive than it sounds: for content management reasons, our team often has to work separately from the rest of the CQ.com publishing platform, so it's not like I get to decide the fate of the organization. In any case, today I want to talk about a particular aspect of the limited power I do have: the use of "web standards" in creating online journalism.
But it's wrong to think that we should avoid Flash for ideological reasons instead of jumping in the moment it becomes more convenient--and frankly, the "web standards" approach is often anything but convenient, particularly for interaction and rich graphics. Building good-looking UI components out of div tags or fighting with stylesheets is not my idea of a good time. And it's not just painful, it's much less productive than the rapid pace of development in Actionscript. I personally feel that the speed factor--the time it takes for me to write a complex, rich application--is something that web standards groups aren't spending enough time on. The <aside> tag won't help me create content faster, while making CSS behave in a sane and easily predictable fashion would, but there are working groups for the former and seemingly none for the latter.
(Advocates for these "semantic" tags, by the way, would do well to read Arika Okrent's In The Land of Invented Languages, particularly the parts about the "philosophical" conlangs, which attempted--and failed miserably--to create a logical, self-evident classification for all the concepts we express in our messy and meaning-overloaded "natural" languages. Sound familiar?)
HTML 5 proponents point to its new tags (such as <canvas> or <video>) as alternatives, an idea that should make even the most inexperienced Actionscript developer chuckle in cynical mirth. Canvas in particular is phenomenally unsuited to replace Flash's animation and interaction capabilities, as a single glance at the API tutorial should make clear. All drawing is done manually on every frame, transforms are awkward, and compositing is done in the most confusing possible manner. It's fine for simple graphs and charts, but I'd have to re-implement the equivalent of Actionscript's display list--its powerful, tree-based rendering engine--and its event dispatch libraries from scratch before canvas could be useful. Our team's time is too valuable to spend hacking around on that kind of low-level functionality instead of producing actual journalism. Not to mention the time it would take to replace Actionscript's enormous library of other utility code in the DOM (also known as the world's worst programming API).
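For readers who never touched Flash: a display list is just a tree of objects that each know their own position and draw their children relative to themselves. Here's a toy version--in Python, purely to illustrate the concept, not anything you'd ship--of the machinery canvas makes you rebuild by hand:

```python
class DisplayObject:
    """One node in a toy display list: a position plus child nodes."""
    def __init__(self, name, x=0, y=0):
        self.name, self.x, self.y = name, x, y
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

    def render(self, off_x=0, off_y=0):
        """Walk the tree, composing each node's position with its parent's."""
        out = [(self.name, off_x + self.x, off_y + self.y)]
        for child in self.children:
            out.extend(child.render(off_x + self.x, off_y + self.y))
        return out

stage = DisplayObject("stage")
panel = stage.add_child(DisplayObject("panel", x=100, y=50))
panel.add_child(DisplayObject("label", x=10, y=5))

# Move the panel and the label comes along for free -- in canvas,
# you would redraw both at manually recomputed coordinates.
panel.x = 200
```

That "for free" is the whole argument: Actionscript gives you this tree (plus events, transforms, and compositing) as a starting point, while canvas hands you a blank bitmap.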
All of which is to say that I just can't get worked up when people start ranting about killing off Flash and replacing it with "standards"-based design. As far as I'm concerned, Actionscript has become a de-facto standard for the web, one that anyone can leverage (the free Flex SDK and FlashDevelop IDE are a must-have combo). By all means, let's put pressure out there for less centralized and more open solutions, ones that aren't owned by a single corporate entity. But in the meantime, if we want to get things done, there are two options. We can shun Flash out of spite, in favor of solutions that require more work for less return. Or we can start telling news stories in interesting ways using this technology. I know which path my team is going to take.
Being the hip young technologist that she is, Belle has one of those Palm Pre phones, which does something very cool: given login information for various social media accounts (Google, Facebook, etc.), it collates and cross-links that information into the device's contact list. So a person's ID picture is the same as their Facebook profile image, and when they update their contact information online, it automatically changes on the phone. Handy--when it works.
My understanding is that most of the time it does, but sometimes Palm's system doesn't quite connect the dots, and then Belle has to go in and tell it that certain entries are, in fact, the same person. Frankly, I'm impressed that it works at all. It's an example of the kind of pattern recognition that people are very good at, and computers typically are not. I personally think we'll always have an edge, which makes me feel absurdly better, as if Skynet's assassin robots will never be able to track down Sarah Connor or something.
In essence, what Palm has done is create a system for linking facts with a confidence threshold. And it's something I've been thinking about in relation to journalism, particularly after watching a presentation by the Sunlight Foundation on their data harvesting efforts during the age of data.gov, not to mention the work I've been doing lately on budget and economic indicators. There's a lot of information floating around (and more every day), but how can we coordinate it with confidence? And is it possible that the truth will get buried under its weight?
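A crude version of that kind of linking fits in a few lines of standard-library Python. This is only a sketch of the idea--the names and the 0.8 threshold are invented, and real record linkage is far more sophisticated--but it shows the shape of "link it if we're confident, punt to a human if we're not":

```python
from difflib import SequenceMatcher

def match_confidence(a, b):
    """Similarity between two contact names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

THRESHOLD = 0.8  # below this, ask the user instead of guessing

def link(entry, candidates):
    """Return the best-matching candidate if we're confident, else None."""
    best = max(candidates, key=lambda c: match_confidence(entry, c))
    return best if match_confidence(entry, best) >= THRESHOLD else None

linked = link("Sarah Connor", ["Sara Connor", "John Connor", "Miles Dyson"])
```

The interesting engineering isn't the similarity score, it's choosing the threshold: set it too low and you merge strangers, too high and the user does all the work.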
Larry Lessig, of all people, pessimistically pitched the latter earlier this month, in a New Republic essay titled "Against Transparency." Lessig ties together the open government movement, free content activists, and privacy advocates into what he calls the "salience" problem: extracting meaning in context from a soup of easily-manipulated facts, without swamping the audience in data or misinterpreting it for political gain. It's a familiar problem: I consider myself a journalist, but I spend pretty much my entire workday nowadays chin-deep in databases, figuring out how to present them for use by both our readers and our own editorial team. It is, in other words, the same confidence problem: how do we decide which bits of data are connected, and which are not?
Well, part of the answer is that you need journalists who are good subject experts. All the data in the world is meaningless unless you have someone who can interpret it. In fact, this is one of the main directions I see journalism exploring as newsrooms become more comfortable with technology. Assuming journalists can survive until that point, of course: being a deep subject expert is well and good, but it seems to be the first thing that gets cut these days when newsroom profitability drops.
Second, as journalism and crowdsourcing become more comfortable with each other, I think we're going to have to start tagging information with a confidence rating: how sure are we that these bits of information are related? Data that's increasingly pulled from disparate--and unevenly vetted--sources will need to be identified by its reliability. I'd still like to be able to use it, but I should be able to adjust for "truthiness" and alert others about it.
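Concretely, that might look something like this sketch, where every fact carries a source and a confidence score and downstream consumers filter on it (the records, scores, and field names here are all invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Fact:
    claim: str
    source: str
    confidence: float  # 0.0 (unvetted rumor) to 1.0 (verified on the record)

facts = [
    Fact("Cloture motion rejected", "Senate roll call", 0.99),
    Fact("Agency budget cut 5%", "anonymous tip", 0.35),
    Fact("Bill reported out of committee", "committee press release", 0.85),
]

def publishable(items, floor=0.8):
    """Keep only the facts vetted above a given confidence floor."""
    return [f for f in items if f.confidence >= floor]

vetted = publishable(facts)
```

The anonymous tip isn't discarded--it's still in the dataset for anyone who wants to chase it--but it's flagged so nobody mistakes it for a verified roll call vote. That's the "truthiness" adjustment in miniature.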
But perhaps most importantly, this kind of debate really highlights how the open government movement needs to be not just about the amount of data, but also its degree of interoperability. This has really been driven home to me by the federal budget process: from what I can understand of this fantastically complicated accounting system, you can track funds from the top down (via the subcommittees), or from the bottom up (actual agency funding). But getting the numbers to meet in the middle is incredibly hard, due to the ways that money is tracked. Indeed, you can get the entire federal budget as a spreadsheet (it's something like 30,000 line items), but good luck decoding it into something understandable, much less following funding from year to year.
That's a problem for a journalist, but it's also a problem for a citizen. Without clean data, open government initiatives may be severely weakened. But contra Lessig, I don't think that makes them worthless. I think it creates an interesting problem to solve--one we can't just brute-force with computing power. Open government shouldn't just be about amount, but about quality. When both are high, I see a lot of great opportunities for future reporting.
My apologies for a slow week posting here--in addition to rewriting the site and learning a bit more about Android, you may have heard that there's been some excitement going around at CQ. It's been busy.
But we're not the only journalistic institution feeling a little shaken up. In the aftermath of the Google Wave invite frenzy, Mark Milian of the LA Times got a little overexcited. He lists some "wild ideas" they've had while testing the technology. And I am all for wild ideas, but I think he's missing the point. The problem in newsrooms isn't the lack of technology, it's that journalists don't use it.
Case in point: most of Milian's suggestions involve using Wave as a kind of glorified content management system--using it to log notes during collaborative stories, archiving interview recordings, or providing a better update mechanism. I absolutely understand why such a thing seems like a dream come true, because as far as I can tell most CMSs in the journalism world are appalling (often because they were geared toward print needs, and have been jury-rigged into double-duty online). But look realistically at what he's asking for: effectively, it's a wiki (albeit a very slick one) and a modern editorial policy. This isn't rocket science.
We've had the tools to do what Milian wants for years now. The problem, in my experience, has been getting reporters and editors to cooperate. They're an independent lot, and we still sometimes have trouble getting them to use our existing centralized, collaborative, versioned publishing toolchain, much less a complex and possibly overwhelming web app like Wave. Moreover, what's the real benefit? Will we get more readers with prettier correction notes? Will the fact-checking be more accurate if it's transmitted over AJAX? Can Wave halt the erosion of trust in American journalism? No? Then it's kind of a distraction from the real problem, as far as I'm concerned. I mean, I'd love it if all the reporters I work with knew their way around a data visualization. I'd like a pony, too. But at the end of the day, what matters is the journalism, not the tools that were used to create it.
Where Milian might have a point is in the centralization of Wave, with its integration of audio, video, and image assets. The catch is where it's centralized: with Google. I doubt many newsrooms are incredibly keen to trust reporters' notes, call recordings, and editing chatter entirely to a third party, particularly one with which they already consider themselves at odds. There are real questions of liability, safety, and integrity to be considered here. Not to mention what happens if one of those interlinked services goes down (I'm looking at you, GMail). If we're headed for a griefer future (and I think we are), maybe it's wise not to leap headfirst into that cloud just yet.
So look: everything he's written is a fine idea. I agree that they'd be great options to have, and you'll never find me arguing against better content management. But the barrier to entry has never been that we lacked a Google Wave to make it happen--it's been an ideological resistance to the basic principles of new media publishing in newsrooms around the country. Until you change that, by convincing journalists of the value of community interaction/link traffic/transparency/multimedia, all the fancy innovations in the world won't make an impact.
Yesterday I attended the Knight-Batten awards for innovation in journalism with some of the other multimedia team members at CQ. There was some really interesting work being shown (such as Pro Publica's Change Tracker project), as well as some about which I remain skeptical (the concept of "printcasting," for example, seems deeply misguided to me).
One award-winner that did truly impress me was the Center for Public Integrity's investigative journalism into tobacco smuggling. Titled Tobacco Underground, CPI lays out the global implications of the illicit tobacco economy, including hazardous counterfeit cigarettes from China, contraband flooding out of Russia and Ukraine, and a billion-dollar black market in the US and Canada run by organized crime. Tobacco is even a major funding source for terrorists in Pakistan, Northern Ireland, and Colombia. CPI's piece is an astonishing look at something that I (as a non-smoker) and likely most others would never suspect was an international criminal enterprise worth billions of dollars. Check it out.
Since web video is kind of a hobbyhorse for me, at least one coworker has sent me their reactions to the Washington Post's ill-advised "Mouthpiece Theater" videos. These were a series of "comedy" shorts centering on political reporters Dana Milbank and Chris Cillizza, culminating in a piece that recommended a brand of beer named "Mad Bitch" to Secretary of State Hillary Clinton. The Columbia Journalism Review has a decent overview here. CJR's Megan Garber also draws attention to an important point from the paper's ombudsman: the Post views this, and other web video, as an "experiment."
I wish I could say that this is uncharacteristic. But there's just something about new media that makes otherwise sane, respectable journalistic outlets ignore the infrastructure of fact-checking, editorial review, and reputational risk that they've built for their traditional output. Executive Editor Marcus Brauchli admits as much in his reply to the Center for New Words when he writes: "We did not have a good process in place for reviewing videos before they are published on our site, and we are correcting that." Obviously, the Post would never treat its print reporting with a similar lack of oversight, but when YouTube enters the picture, caution is apparently tossed to the winds. I don't know exactly what it is that causes this. But I do have some guesses.
In truth, reliable comedy takes a massive amount of work. The Onion staff says in interviews that they start each week with six to eight hundred headline ideas, which are eventually culled to the 15 or 20 strongest candidates before publication. With that much effort bent to the task, the Onion and the Daily Show simply make this look easy. Journalists attempting to ape them quickly find out that it's not. Mouthpiece Theater caused offense for a joke that went too far, but the first warning sign should have been that it wasn't particularly hilarious to begin with.
I'll close with a somewhat in-the-trenches observation: as print organizations have moved online, there's been a great deal of panic over the role that video and multimedia will play in relation to more familiar formats. Most of the time, this panic means there's no clear vision behind their use: are they for clowning around? For infodumps by talking heads? For reposting network footage to accompany articles? For aping the stilted, much-ridiculed delivery of the local TV news? You only have to look at the schizophrenic archives of most American media sites to realize that there's no real plan behind it (the unsurprising exception among the big names being the New York Times, which has a generally savvy new media team).
In elementary school, we learn to write about the five questions: who, what, when, where, and why. I think you can answer these in any medium--but I think that each format has its strengths. My guiding rule of thumb has been that video is best-suited to answering the "who" and the "why"--the human angle, in other words. Who are these people? What are their motivations, and their reasoning? Video leverages the tools that we've evolved over millennia for reading faces and telling stories, in ways that would be very difficult to evoke through text or an interactive graphic. In my opinion, as news organizations try to figure out where video fits into their lineup, that's the high-level discussion they should be having. In the meantime, they should probably leave the comedy to the experts.