It has been a busy week, but I wanted to take a moment to recognize my colleagues at The Seattle Times for their tremendous work, resulting in a 2015 Pulitzer Prize for breaking news journalism. Their coverage of the Oso landslide was clear, comprehensive, and accurate, and followup work continues to this day (including one of my first projects for the paper). It's very cool to be working in a newsroom that's the winner of 10 Pulitzer Prizes over the years, and I'm looking forward to being here when we win #11.
We have sent several people from the newsroom, including myself, to journalism conferences over the last few months. Most conferences are about 50% inspirational and 50% crap (tilted heavily crap-wards in the keynotes), but you meet good people and you get to see the nuts and bolts behind the scenes of some of the best interactive news stories published.
It's natural to come back from a conference with a kind of inferiority complex, and equally easy to conclude that we're not making similar rich presentations because we don't have the cool tools that those other (richer, more tech-savvy) newsrooms have. We too, according to this train of thought, need to be coding elaborate visualization generators and complicated new CMS features — or, as Ryan Pitts from Mozilla said to me last weekend at the Society for News Design workshop, "let's not rest until every paper in the country has built its own charting application."
I think better newsroom tech is important, but let's play devil's advocate for a bit with an unpopular hypothesis: developing tools for the editors and reporters at your newspaper is a waste of your time, and a distraction from the journalism you should be doing.
Why a waste of your time? Partly because newsroom tools get a lot less uptake than you probably think they do (certainly less than we'd hope they would). I've written a lot of internal applications in my time, and they've never been particularly popular, because most reporters and editors don't care. They're too busy doing journalism to use your solution (which is as it should be), and they are probably not big on technology anyway (I have a lot of reporters who can't use Excel, which pains me greatly). Creating tools for reporters is, most of the time, attacking the problem at the wrong point.
For many newsrooms, that wasted time will end up being twice as expensive, because development resources are scarce and UI is hard. Building a polished, feature-filled chart generator that the average journalist can use will take at least a couple of programmer months, which is time those developers aren't working on stories and visualizations that readers want. Are you willing to sacrifice that time, especially if you can't guarantee that it'll actually get used? That's a pretty big gamble, unless you have the resources of the New York Times. You're probably better off just going with an off-the-shelf package, or even finding a simpler solution.
I don't think it's a coincidence that, for all the noise people make about the new data journalism startups like Vox and FiveThirtyEight, 99% of their chart output does not come from a fancy tool or a complex interactive: they post JPEGs. And that's fine! No actual reader has ever complained about having to look at a picture of a graph instead of a souped-up vector rendering (in Vox's case, they're too busy complaining that the graph was stolen from someone else, but that's another story). JPEG is a perfectly decent solution when it comes to simply telling the story across the entire web platform — in fact, it's a great embodiment of "do the simplest thing that works," which has served me well as a guiding motto in life.
So, as a rule of thumb: don't build charting libraries. Don't build general-purpose databases. Don't build drag-and-drop slideshows. Leave these things to other people, who have time and energy to build them for a living. Does this mean you shouldn't create tools at all? No, but the target audience should be you, the news developer, and other semi-technical newsroom staff like the web producers. In other words, make technology for the people who will actually use it, and can handle something that's not polished to a mirror sheen.
I believe this is the big strength of web components, and one reason I'm so bullish on them at the Seattle Times. They're not glossy, end-user products, but they are a great balance between power and accessibility for people with a little technical skill, and they're very fast to build. If the day comes when we do choose to invest in a slicker newsroom app, we can leverage them anyway, the same way that the NYT's fancy chart designers are all based on the developer-oriented D3 library.
In the meanwhile, while I would consider an anti-tool stance a "strong opinion weakly held," I think there's a workable philosophy there. These days, I feel two concerns very strongly (outside of my normal news/editorial production, of course): how to get the newsroom to make use of our skills, and how to best use the limited developer resources we have. A "no tools" guideline is not an absolute rule, but it serves as a useful heuristic to weed out the kinds of projects that might otherwise take over our time.
In the last couple of weeks, a few more of my Seattle Times projects have gone live — namely, the animated graph in this story about EB-5 visa growth, and the Seattle architecture quiz. Both use the FLIP animation technique I wrote about a few weeks ago, although it's much more elaborate in the EB-5 graph, which animates roughly 150 elements at 60fps on older mobile devices.
About nine months ago, I made the first check-in on the Seattle Times news app template. Since that time, it's been at the heart of pretty much everything we've done at the Times, ranging from big investigative projects to Super Bowl coverage to dog name analysis. We've adapted it to form the basis of our web component stack, and made a version that automates Leaflet map creation. It's been a pretty great tool, used by news apps developers, producers, and graphics team members alike.
That said, I think in digital journalism we often talk in glowing terms about our tools, but we don't nearly as often discuss the downsides they possess. So let's be honest with ourselves: I love this scaffolding, but it's not perfect. It has issues. And I think those issues say interesting things about not only the template itself, but also newsroom culture, and the challenges of creating tools that can operate there.
What are the common threads here? While you could point to the static page approach as being part of the issue, I actually think what causes a lot of these problems is that the intended audience for the news app template is both broad and narrow. It's broad in that its users range from novice journalists to experienced developers (and, indirectly, non-technical editors and reporters feeding data into Google Sheets). It's narrow in that the actual production still requires a high level of technical comfort: familiarity with the command line, new kinds of tooling, and some ability to roll with unexpected bugs.
This is a tough, and self-contradictory, audience for a visualization toolkit. It's not, however, out of character for a general-purpose dev framework. And indeed, when we talk about app scaffolds from any news organization (not just The Seattle Times), that's what they are. They're written to be fast, to be portable, and to generate static files, because those are our priorities as deadline-driven journalists. They are also the far end of a range of newsroom tools, where news apps are at one end and pre-built widgets live on the other. I'm not really worried about where the template lives on that range, and I'm certainly not planning on reducing the complexity — I think it's at a sweet spot right now. But I do worry about the ways that it (and our CMS) fit into newsroom culture.
At the Times, like in many newsrooms, the online presence is largely run by "producers," who curate the stories on the home page and handle the print-to-digital transition process (it's not the same as a "producer" in software development). This process is complicated and highly-skilled, because news CMS systems are generally terrible. The web production staff also often work on projects that would, in print, fall under page design: building complex HTML presentations for special stories. This isn't because they're trained designers: producers are often younger, and while it's not entry-level work, it's close. They end up doing this work because trained, HTML-fluent designers are rare, and because nobody else in the newsroom bothers to learn web design.
As a result, we end up in a funny situation: the only people in the newsroom who really understand the web are the producers. Editors and reporters are discouraged from becoming more technically savvy because the workflow is print-first, and the CMS is so intimidating. Meanwhile, producers rarely become editors or reporters because the newsroom can't afford to lose their skills. There's a tremendous gap in newsroom culture between people who produce the content, and people who actually understand the medium in which that content is consumed. While the tooling is not entirely responsible for that, it is a contributing factor.
I think the challenge we face, as newsroom developers, is to be always aware and vigilant of that gap and its causes. Tools like the news app template are important, because they speed up our work, and the work of other technical people. But they don't mitigate the need for better, web-first publishing systems — something that can help diffuse web thinking from a producer-only skill to something that's available throughout the newsroom.
This week, if you want to be horrified by our grim meathook future, check out these posts from Seattle Times news librarian Gene Balk on vaccination rates at Washington State schools. There's a searchable data table and a map, but I'll spoil it for you: a large proportion of parents should probably pack surgical masks and antibiotics with their kids' lunches, because herd immunity is basically a thing of the past.
This kind of database-driven reporting is a staple of Gene's "FYI Guy" blog, and readers seem to enjoy it. Done right, it can help flesh out local coverage in interesting ways, explore topics that are off the beaten path, and find connections that we might otherwise miss. That said, I don't think you can stress enough how much of that depends on the quality of the reporter: Gene is a great researcher, and not everyone has his skills and experience.
By coincidence, yesterday Melissa Bell at Vox announced that they're (re)entering the field of data journalism in a almost parodically-titled post. I'm a little confused about the timing, since I thought data journalism was a part of their whole raison d'etre, but maybe I'm confusing them with a different scrappy, SEO-oriented news startup. Regardless, welcome to the party! After name-checking Philip Meyer's Precision Journalism, Bell adds a list of nine basic guidelines they plan to use. It's not a bad list, although several items are inoffensively bland (has anyone ever aspired to produce content that isn't "relevant and useful?").
- Vox will work to provide the most relevant and useful data behind the news, when you need it, in ways that help you understand the stories that matter most.
- We will work to make all the data behind our stories available to you to download and play with for yourself.
- We want you to improve on what we’ve done, to play with the data, visualize it, and help us analyze it — and make our work better.
- We will prioritize building data sets that can feed many stories, rather than focusing on one-off projects.
- Our data visualizations will be clear, concise, and deep — to help you understand our editorial better. They will adhere to design rules which ensure their accuracy and transparency.
- In the event we make a mistake (they do happen), we will swiftly and clearly clarify, correct, and communicate that as transparently as we can.
- We will curate and showcase the best data infographics and visualizations on the web.
- Visualizations we produce in-house will work well on as many platforms as possible: if you view it on a smartphone, it will function as well as it does on web.
- We will curate and publish the best content that our community of readers produces. Our data journalism is as much about you, the community, as it is about us: this is a partnership.
Some of these goals are particularly strong, and we share them at the Seattle Times. Take #2, for example: not only do I think it's important that we publish the data on which our visualizations are built whenever possible, but we also open-source our graphics so that people can see the methodology we used. It's also just good sense to be mobile-friendly (#8), although I personally believe that there are some times when a story simply can't be fully told on a 4" screen.
I'm less sure about curation, either from readers (#9) or around the web(#7), particularly in conjunction with accuracy and corrections (#6). One of the strengths of a newsroom is supposed to be fact-checking, but it's not clear to me what the process is for verification of third-party visualizations, or if Vox plans to do so at all (it hasn't been evident to me as a reader that they do it now). Which is too bad, because I think a kind of real-time "Snopes for bad reporting" is a site I'd definitely support.
But I'm really most skeptical of #4, which Bell elsewhere refers to as "finding, cleaning, and setting up data streams so that they can be the source for repeated stories." It's not that I think it's necessarily a stupid idea. I'm just not sure that it's effective, based on my experience. Data stories are just reporting. Data streams are reporting on top of engineering on top of reporting.
CQ's Economy Tracker, for example, was my team's attempt at a reusable data API, but it turned out to be a frustrating experience to keep it topped off with up-to-date content, the architecture was a hard problem to solve, and the number of stories we pulled out of it probably didn't justify the effort. It turns out that it's hard to find a data set that can actually support a series of articles.
(You may say, at this point, hang on a minute: wasn't Congressional Quarterly an example of exactly what we're talking about? It's a large, data-oriented news organization that sold access to data streams, and maintained datasets that were used to build stories and interactives via the multimedia team. Which is true, but it elides a number of factors: CQ was a single-purpose news site — congress and legislation only — with a huge number of reporters feeding the beast and a large technical staff to tend to it. Vox does not have those advantages, since it's a general-audience, international news site with a much smaller staff.)
More importantly, a "data stream," like an API, demands maintenance which quickly becomes a drag on the amount of time that can be spent on efforts outside those streams. That's doubly true if you make them public, and people start relying on them. Will will Vox sunset these data streams, if they stop being useful internally? What are the cutoff criteria? How will they let people know before the source is shut down? Most importantly, how much time will be taken away from reporting to maintain the data products?
When I joined at the Seattle Times, I made a pitch to editors that was a little different: instead of designing long-running services, we generally build news apps that are scoped to a specific point in time. In other words, we make stories, the same as the rest of the newsroom does. And just as you wouldn't normally ask a reporter to go back and update all their old stories when new events happen, we don't maintain news apps more than a week or two after publication (barring, of course, normal corrections and serious bug-fixes). Our entire development stack, in fact, is based on this assumption — that's why we publish static files to S3 (which is cheap and easy), instead of running a Rails/Laravel/Node server (which is expensive and hard).
Maybe for Vox, this isn't a problem. After all, they're the people with the "poor man's Wikipedia" card stacks that they maintain for topics over many months, and the evergreen experiments. At the very least, though, it does highlight a very real distinction that goes (in my opinion) beyond "data journalism" and to the core of the digital news mission. Are we building general systems and tools to cover unique stories? Or are we optimizing for semi-predictable products built around APIs and data sources? I'm leaning toward the former because I think it's a better match for a messy, unpredictable, human world. But best of luck to Vox with the latter.
While we've still got plenty of interesting projects in the works, the Seahawks rampage into the Super Bowl has pretty much taken over the News Apps budget at the Seattle Times this week. As a result, we've got some interesting interactives you might have seen:
More to come, obviously, as the road to the Super Bowl continues! Or, as Marshawn Lynch would say, "Yeah."
This week The Seattle Times published a piece I've been looking forward to all year: Sell Block details the way that the Washington correctional industries have fallen down on the promises that they made over three decades ago. I did the logo design for this series, after daydreaming at the bus stop one day, and I also put together the maps for it.
From a development perspective, this project is also noteworthy as a test case for the custom elements that I've been pushing hard at the Times. Both maps are embedded through <responsive-frame> elements, which has supplanted our use of Pym.js, and the statewide map was built on top of <leaflet-map>. Custom elements continue to be a phenomenal way to develop and deploy web apps. In fact, I have an article up on Source, the OpenNews blog, about how we used them in the elections and this project, and calling for more news developers to adopt them.
I had hoped, personally, that we were past the app craze in newsrooms, particularly since the New York Times (eternally the canary in the coalmine for the rest of the industry) started killing off its unsuccessful subscription apps. But there's a sucker born every minute, and this time it's the Washington Post, which is launching a special Kindle app, the main goal of which seems to be to remind everyone that Jeff Bezos bought the Post last year:
The app, which was designed to reduce the noise of the Web to something as streamlined as a print publication, will be automatically added to certain Kindle Fire tablets as part of a software update. It will feature two editions each day, at 5 a.m. and 5 p.m. Eastern time, when the company believes it will reach the most readers.
Two things stick out in that paragraph: first, the "noise of the Web," as though that's a thing that exists in and of itself and not a product of newspaper websites being largely assembled at the whims of a huge number of competing interests (advertising, editorial, IT, etc). If your web site is noisy, it's because you made it that way, and maybe you should fix it instead of launching a new platform.
Second, two editions? In a world of twenty-four hour online news, someone's making a digital news publication that updates (with exceptions for breaking events) twice a day? That's not a strategy for reaching readers, it's a sop to a print-oriented workflow that has to produce distinct physical "products" instead of a stream of content. It's not like they have to lay out a page, so what's the point? Why make people wait?
I expect this thing to go the way of The Daily within a year, quietly killed when the Post announces some new shiny object, probably. Of course, as a long-time web partisan, I think launching another native journalism app is a silly move anyway. The reasons for this are well-rehearsed and familiar: ease of production, greater audience reach, and creating a single path for content. But ultimately what makes native news apps fail is that they can't interoperate with other services the way the web can.
A lot of ink has been spilled about the NYTimes innovation report, for better or worse, but one of the big takeaways for me was the graph on page 23 of home page visitors compared to page views. The Times has seen no real drop-off in overall traffic, but the number of people seeing the home page has dropped by half over the last two years alone. And the reason for this is simple: most people don't go looking for journalism anymore. It finds them instead, when stories are shared through Facebook and Twitter and (increasingly less) RSS/Atom feeds.
Whether you're thinking of your app as a new home page or as a new publishing platform entirely, this trend seems equally grim — a choice between apathy or obscurity. It's probably possible, somehow, to make an app share to Facebook or Twitter. But it's never going to be as quick, as smooth, or as easy as sharing to those services via a simple URL. As much as anything else, this dooms native news apps from the start: if users can't share your content, it might as well be stored in a sealed vault. If you make the app share a web link as a workaround, everyone ends up on the site anyway, so why bother creating the app in the first place?
(Incidentally, this is why the line tossed around by some pundits that "native apps are too on the web, because they use HTTP" is nonsense. Does your native app have a front-facing URL? Can I link someone to a specific page in your app? No? Then it's not on the web.)
Don't get me wrong: I'm not necessarily sanguine about this state of affairs. The increasing role of social media in discovery and spread of journalism is worrying, from the silencing effects to the loss of control for publishers. One day I'd like to think we'll be out from underneath Facebook's thumb, or anyone else seeking to wall off the web until it pays up. We need better solutions for that problem, ones that don't make us sharecroppers on anyone else's land.
Meanwhile, however, this is the world we live in: the social networks dominate, and ultimately they run on URLs, not on binary blobs stored in a native bundle. Publishing two gimmicky "editions" a day through a fancy app, on a device that relatively few people use, is not going to change that anytime soon. If you want people to read your news, it had better be on the (sharable, linkable, endlessly flexible) web.
You might have heard that there was an election this last week. Like every news organization, The Seattle Times had a live results page, powered by a Node-based scraper. It did pretty well: we had no glitches with pulling results, and the response has been solid. It also generated the source data for the print edition. Oh, and we put bunting on the front page, which is not something you get to do every day.
Behind the scenes, however, that results page has another interesting feature: as far as I'm aware, it's the first use of Web Components (at least, the custom elements part) in production by a news organization. Each of the Washington maps on the page is a custom-built <svg-map> element, which handles loading the image document and provides a set of convenience methods for manipulating the map once it's available.
SVG is one of those technologies that I really want to like, but has always been a total pain to actually use. It's an annoying format to author, doesn't seem to actually save any space compared to bitmap images, and has a ton of edge cases even in "standard" browsers (for example, Chrome will forget the state of an SVG document inside an object tag if that tag or its parents are set to display: none). Wrapping it up in a component that would manage its lifecycle and quirks for me just seemed like a no-brainer.
To create the component, I used Andrea Giammarchi's registerElement() shim instead of Polymer's polyfill layer — Giammarchi's script only shims the custom element portion of Web Components, but it works all the way back to IE9 and (more importantly) is only 2KB. On top of that, I used RSVP.js to create a quick shared cache for SVG source documents, ICanHaz for my templating, and a custom module called Savage to do SVG class/style manipulation.
From the outside, however, you don't need to know any of that. Instead, the interface is simple:
As a developer, I'm really excited by the potential of these new custom elements. Although I had used them at ArenaNet for building the new Guild Wars 2 trading post, those were used to create tight integration with the in-game interface, and only needed to work in a single browser. This is the first time I've used them in a wider ecosystem, and they worked like a charm.
But as a library consumer, and particularly as a harried newsroom dev, I think web components have a tremendous potential to make complex behavior way easier to build and train for. Take the afore-mentioned Landline, for example: wouldn't it be nice to simply include a script tag (or an HTML import) and then be able to write <landline-map> tags into the page, with an attribute pointing to a CSV or a Google Sheet containing the necessary data? Or consider Pym, NPR's responsive iframe library that's so great I forked and rewrote big chunks of it. Right now, using Pym on the parent page requires including the script, adding a dummy element, and then initializing the script — why shouldn't it just be <pym-embed> instead?
Distributing libraries not as modules or loose scripts, but as chunks of new HTML functionality, has the potential to radically change how we create new content on the web in the future. Newsrooms, which are always under pressure and often consume "pre-made" tools for interactive elements like timelines and galleries, are a perfect use-case for Web Components. After this election experience, I'm planning to lean heavily on them whenever possible, and I'm hoping other people will as well.
I'm very proud to say that "Loaded with lead," a Seattle Times investigation into the ways that gun ranges poison their customers and workers, went live this weekend. I worked on all four interactives for this project, as well as doing the header design and various special effects. We'll have a post up soon on the developer blog about those headers, but what I'd like to talk about today is one particular graphic — specifically, the string-of-pearls chart from part 2.
The data underlying the pearl chart is a set of almost 300 blood tests. These are not all tests taken by range workers in Washington, just the ones that had to be reported after exceeding the safe threshold of 10 micrograms per deciliter. Although we know who some of the tested workers are, most of them are identified only by an anonymous patient ID and the name of their employer. My first impulse was to simply toss the data into a scatter chart, but as is often the case, that first impulse proved ill-advised:
Talking with reporters, what emerged was that the time dimension was not really important to this dataset. What was important was to show that there was a repeated pattern of negligence: that these ranges posted high numbers repeatedly, over long periods of time (in several cases, more than five years). Once we discard a strict time axis, a lot more interesting options open up to us for data visualization.
One way to handle this would be with a traditional box and whiskers plot, which shows the median and variation within a statistical set. Unfortunately, box plots are also wonky and weird-looking for most readers, who are not statisticians and would not know a quartile if it offered them a grilled cheese sandwich. So one prototype simplified the box plot down to its simplest form — probably too simple: I rendered a bar that began and ended within the total range of test results for each range, with individual test results marked with a line inside that bar.
This version of the plot was visually interesting, but it had flaws. It made it easy to see the general level of blood tests found at each range, and compare gun ranges against each other, but it didn't show concentration. Since a single tick mark was shown within the bar no matter how many test results at a given level, there was litttle visual difference between two employers with the same range of test results, even if one employer mainly showed results at the top of the range, and the other results were clustered at the bottom. We needed a way to show not only level, but also distribution, of results.
Given that the chart was already basically a number line, with a bar drawn from the lowest to the highest test result, I removed the bar and replaced the tick marks with circles that were sized to match the number of test results at each amount. Essentially, this is a histogram, but I liked the way that the circles overlapped to create "blobs" around areas of common test results. You can immediately see where most of the tests fall for each employer, but you don't lose sight of the overall picture (which in some cases, like the contractors working outside of a ventilation hood at Wade's, can be horrific — almost three times the amount considered dangerous by the CDC). I'm not aware of anyone else who's done this kind of chart before, but it seems too simple for me to be the first to think of it.
I'd like to take a moment here to observe that pretty much all data visualization comes down to translating information into a form that our visual systems are evolved to quickly understand. There's a great post on how that translation functions here, with illustrations that show where each arrangement sits on a spectrum of perceived accuracy and meaning. It's not rocket science, but I think it's a helpful perspective: I'm just trying to trick your visual cortex into absorbing a table's worth of data at a glance.
But what I've been trying to stress in the newsroom from this example is less technical, and more about how much effective digital journalism comes from the simple process of iteration and self-evaluation. We shouldn't expect to come up with a brilliant interactive on the first try every time, or even any of the time. I think the string-of-pearls is a great example of that, going from a visualization that I was confusing and overly-broad to a more focused graphic statement, thanks to a lot of evolution and brainstorming. It was exhausting work, but it's become my favorite of the four visualizations for this project, and I'm looking forward to tweaking it for future stories.