I'm not sure what it says about Seattle that one of our biggest yearly events is a May Day protest that wreaks havoc across big chunks of downtown. What even competes? The Blue Angels shut down traffic on the bridges once a summer, and there's Seafair downtown, but reception to those is always pretty muted in my experience. International Workers' Day is the big show.
The May Day map I put together to track our reporters has quickly become one of my favorite projects for the Seattle Times. It was real-time, it posed interesting data challenges, and it really exploited our <leaflet-map> element more than anything else we've done so far. While I also wrote a post on it for our dev blog at work, I wanted to call out a couple of other interesting points here.
The most interesting technical detail here is the use of the Twitter streaming API, which delivers nearly instant updates for a search query (either on users, geolocation, or keyword). Node is a great fit for this, with the twitter module offering a readable stream that fires events as new items come in. Our scaffolding, on the other hand, is not intended to be run as a long-standing process, and I didn't really want to retrofit Grunt into a general-purpose application framework. I ended up writing the Twitter part of the app as a completely separate, continuous Node process, which then dumped out its data as a JSON file and started a standard build/deploy in a child process whenever new data arrived.
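As a rough illustration of that arrangement, here is a minimal sketch of a long-running process that writes a JSON file and kicks off a build in a child process each time new data arrives. It is not the code from the May Day map: `watchStream`, its options, and the in-memory array (standing in for the database described below) are all assumptions made for the example, and `source` can be any event emitter that fires `data` events.

```javascript
const { spawn } = require("child_process");
const fs = require("fs");

// Hypothetical helper, not the original app's code. `source` is any
// EventEmitter that fires "data" with a tweet-like object, standing in
// for the readable stream a Twitter client module would provide.
function watchStream(source, options) {
  const { outputPath, buildCommand, buildArgs = [] } = options;
  const tweets = []; // in-memory stand-in for a real data store
  let building = false;
  let dirty = false;

  function build() {
    // Run only one build at a time; remember that more data arrived.
    if (building) {
      dirty = true;
      return;
    }
    building = true;

    // Dump the current data as JSON for the build to pick up.
    fs.writeFileSync(outputPath, JSON.stringify(tweets, null, 2));

    const child = spawn(buildCommand, buildArgs, { stdio: "inherit" });
    let finished = false;
    const finish = () => {
      if (finished) return;
      finished = true;
      building = false;
      // If tweets arrived mid-build, run once more with the fresh data.
      if (dirty) {
        dirty = false;
        build();
      }
    };
    child.on("error", (err) => {
      console.error("Build could not be started:", err.message);
      finish();
    });
    child.on("exit", finish);
  }

  source.on("data", (tweet) => {
    tweets.push(tweet);
    build();
  });

  return tweets;
}
```

In use, `buildCommand` would be whatever runs the normal build and deploy, for instance `grunt` with a publish task (that task name is a guess, not something taken from the scaffolding). Collapsing bursts of tweets into a single follow-up build keeps a busy stream from stacking up child processes.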
To store the tweets from the stream, the application uses a SQLite3 database, since that's the easiest way to query and update data. A static data store like this is not something that we've used on projects before, and I don't know if I'd use it again. Using SQLite itself is always a pleasure, but reliance on a local database means that I couldn't just clone the project from home and update it when I wanted to change the coloring on Saturday morning. Using cloud storage, like Google Sheets, has a lot of advantages for distributed and remote development.
Working with Twitter itself is an interesting problem, because it's clear that the company has no real coherent plan for outside developers. Over the last few years, the API for user access has been increasingly limited and broken as Twitter tried to drive third-party clients (which don't show ads and don't make money) out of existence. On the other hand, if you are building a Twitter bot, which our map effectively is, it remains a pretty useful and effective service for pub/sub communication. I'm not sure it says very much about Twitter's strategy that they'll let bots run wild while ordinary people are locked into a client monoculture, but that's honestly the least of my frustrations with them at this point.
All that said, I would personally work with this stack again in a heartbeat. Twitter is not the highest social traffic source for the Seattle Times, but almost all of our reporters use it anyway, and it's much nicer to program against compared to Facebook. The impending dilemma is whether (or when) Twitter will decide to switch to a "curated" (read: algorithmically-tampered) stream a la Facebook's timeline. When that happens, its value to me as a news developer drops basically to nothing, because I won't be able to guarantee message delivery any more.
Which brings me to the most boring but probably most profound lesson of this project: we need a better build server. The May Day map ran on a box in the office we've affectionately dubbed "Cronda," which also currently tests our traffic alert application and previously powered the Seahawks fan map. In each of those cases, we've jury-rigged together a solution for pulling the latest source code and running builds at regular intervals (the cron Grunt task), but it's not optimal. We can't check on those builds remotely, or restart them if something goes wrong.
At some point, we'll probably move our builds from Cronda to an EC2 box that we can access remotely, but doing so doesn't honestly solve the problem; it just makes it less fragile. Eventually, I think we'll need to look into a real build monitor like Jenkins, which can automate deployments, track error logs, and respond to queries in our team chat. I'm not entirely looking forward to that, since it feels like a very heavyweight solution, but the more complex our applications get, the more a little up-front rigor will pay off.
It has been a busy week, but I wanted to take a moment to recognize my colleagues at The Seattle Times for their tremendous work, resulting in a 2015 Pulitzer Prize for breaking news journalism. Their coverage of the Oso landslide was clear, comprehensive, and accurate, and followup work continues to this day (including one of my first projects for the paper). It's very cool to be working in a newsroom that's the winner of 10 Pulitzer Prizes over the years, and I'm looking forward to being here when we win #11.
We have sent several people from the newsroom, including myself, to journalism conferences over the last few months. Most conferences are about 50% inspirational and 50% crap (tilted heavily crap-wards in the keynotes), but you meet good people and you get to see the nuts and bolts behind the scenes of some of the best interactive news stories published.
It's natural to come back from a conference with a kind of inferiority complex, and equally easy to conclude that we're not making similar rich presentations because we don't have the cool tools that those other (richer, more tech-savvy) newsrooms have. We too, according to this train of thought, need to be coding elaborate visualization generators and complicated new CMS features — or, as Ryan Pitts from Mozilla said to me last weekend at the Society for News Design workshop, "let's not rest until every paper in the country has built its own charting application."
I think better newsroom tech is important, but let's play devil's advocate for a bit with an unpopular hypothesis: developing tools for the editors and reporters at your newspaper is a waste of your time, and a distraction from the journalism you should be doing.
Why a waste of your time? Partly because newsroom tools get a lot less uptake than you probably think they do (certainly less than we'd hope they would). I've written a lot of internal applications in my time, and they've never been particularly popular, because most reporters and editors don't care. They're too busy doing journalism to use your solution (which is as it should be), and they are probably not big on technology anyway (I have a lot of reporters who can't use Excel, which pains me greatly). Creating tools for reporters is, most of the time, attacking the problem at the wrong point.
For many newsrooms, that wasted time will end up being twice as expensive, because development resources are scarce and UI is hard. Building a polished, feature-filled chart generator that the average journalist can use will take at least a couple of programmer months, which is time those developers aren't working on stories and visualizations that readers want. Are you willing to sacrifice that time, especially if you can't guarantee that it'll actually get used? That's a pretty big gamble, unless you have the resources of the New York Times. You're probably better off just going with an off-the-shelf package, or even finding a simpler solution.
I don't think it's a coincidence that, for all the noise people make about the new data journalism startups like Vox and FiveThirtyEight, 99% of their chart output does not come from a fancy tool or a complex interactive: they post JPEGs. And that's fine! No actual reader has ever complained about having to look at a picture of a graph instead of a souped-up vector rendering (in Vox's case, they're too busy complaining that the graph was stolen from someone else, but that's another story). JPEG is a perfectly decent solution when it comes to simply telling the story across the entire web platform — in fact, it's a great embodiment of "do the simplest thing that works," which has served me well as a guiding motto in life.
So, as a rule of thumb: don't build charting libraries. Don't build general-purpose databases. Don't build drag-and-drop slideshows. Leave these things to other people, who have time and energy to build them for a living. Does this mean you shouldn't create tools at all? No, but the target audience should be you, the news developer, and other semi-technical newsroom staff like the web producers. In other words, make technology for the people who will actually use it, and can handle something that's not polished to a mirror sheen.
I believe this is the big strength of web components, and one reason I'm so bullish on them at the Seattle Times. They're not glossy, end-user products, but they are a great balance between power and accessibility for people with a little technical skill, and they're very fast to build. If the day comes when we do choose to invest in a slicker newsroom app, we can leverage them anyway, the same way that the NYT's fancy chart designers are all based on the developer-oriented D3 library.
In the meantime, while I would consider an anti-tool stance a "strong opinion weakly held," I think there's a workable philosophy there. These days, I feel two concerns very strongly (outside of my normal news/editorial production, of course): how to get the newsroom to make use of our skills, and how to best use the limited developer resources we have. A "no tools" guideline is not an absolute rule, but it serves as a useful heuristic to weed out the kinds of projects that might otherwise take over our time.
The honest answer is "about five years of practice," but that's not the whole story (nor is it something I can take to a curriculum planning committee). I think there are two areas of growth that students need to be aware of, and that I'm planning on stressing for this quarter: tooling and functional programming.
Learning their way around this trio is going to be a huge challenge for my students, most of whom still live in a world where individual files are edited and sent to the browser as-is (possibly with a PHP include or two). They haven't built applications with RESTful routes, or written client-side code in a module system. SCC hasn't typically stressed those techniques, which is a shame.
I'm happy to be the person who forces students into the deep end, but I do want to make sure they have a good, structured experience. Throwing everything at students is a quick way to make sure that they get overwhelmed and give up (not a hypothetical scenario: the previous ITC 298 class had exactly that problem, and ended poorly). To ease them in, we'll try building the following sequence of exercises in our directed lab sessions:
The progression starts with Node, and then builds out gradually so that each step conceptually depends on a previous lesson. Along the way, students will learn a lot about how to structure an application across all three of these environments — which brings us to the second, and probably harder, focus of the class.
The hard part of writing for Node is that you must embrace some degree of functional programming: the continuation-passing style used in the core APIs makes it inescapable. But the great part of writing for Node (especially as the first section of the course) is that it's actually a fairly gentle ramp-up. Callback functions are not that far from event listeners, and the ubiquitous async library softens the difficulty of mapping an array functionally. Between the two, there's no shortage of practice, since there's literally no other way to write a Node program.
That's the other strategy behind the tooling sequence I've laid out. We'll start from Node, and then build toward increasingly complex functional constructs, like modules, constructors, and promises. By the time the class have finished their final projects, they should be old hands at callbacks and closures, which will serve them well in almost any language.
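To make the callback style concrete, here is a small sketch of mapping an array through an asynchronous function using Node-style `(err, value)` callbacks. `mapAsync` and `double` are names invented for this example; the `async` library's `map` covers the same ground with more features, so treat this as a teaching aid, not a replacement.

```javascript
// A hand-rolled asynchronous map using Node-style callbacks.
// `iterator(item, callback)` must call `callback(err, value)` exactly once.
function mapAsync(items, iterator, done) {
  const results = new Array(items.length);
  let pending = items.length;
  let failed = false;

  // Nothing to do for an empty list.
  if (pending === 0) return done(null, results);

  items.forEach((item, index) => {
    iterator(item, (err, value) => {
      if (failed) return; // an earlier item already reported an error
      if (err) {
        failed = true;
        return done(err);
      }
      // Store by index so results keep the input order,
      // whatever order the callbacks arrive in.
      results[index] = value;
      pending -= 1;
      if (pending === 0) done(null, results);
    });
  });
}

// Example iterator: doubles a number on a later tick of the event loop.
function double(n, callback) {
  setImmediate(() => callback(null, n * 2));
}

mapAsync([1, 2, 3], double, (err, doubled) => {
  if (err) throw err;
  console.log(doubled); // [ 2, 4, 6 ]
});
```

The closure over `results`, `pending`, and `failed` is the part worth dwelling on in class: each callback shares that state, which is exactly the habit students will need for event listeners and, later, promises.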
The specifics of this quarter are still a little bit in flux, and will likely remain so, since I think it's good to be flexible the first time teaching a class. But if you're interested in following along, feel free to check out the class repo, which contains the syllabus, supporting materials, and example code so far. Issues and pull requests are also welcome!
In the last couple of weeks, a few more of my Seattle Times projects have gone live — namely, the animated graph in this story about EB-5 visa growth, and the Seattle architecture quiz. Both use the FLIP animation technique I wrote about a few weeks ago, although it's much more elaborate in the EB-5 graph, which animates roughly 150 elements at 60fps on older mobile devices.
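For readers who haven't run into FLIP (First, Last, Invert, Play), here is a generic sketch of the idea. It is not the code behind either project: `invertTransform` and `flip` are hypothetical helpers, and the browser half assumes a modern DOM with `getBoundingClientRect` and `requestAnimationFrame`.

```javascript
// Invert step: given the element's box before (`first`) and after (`last`)
// a layout change, build the transform that makes the element appear
// to still be at its starting position and size.
function invertTransform(first, last) {
  const dx = first.left - last.left;
  const dy = first.top - last.top;
  // Guard against zero-sized boxes to avoid dividing by zero.
  const sx = last.width ? first.width / last.width : 1;
  const sy = last.height ? first.height / last.height : 1;
  return `translate(${dx}px, ${dy}px) scale(${sx}, ${sy})`;
}

// Browser-only: run the full First, Last, Invert, Play sequence.
// `mutate` is a function that applies the layout change.
function flip(element, mutate, duration = 300) {
  const first = element.getBoundingClientRect(); // First
  mutate();
  const last = element.getBoundingClientRect(); // Last

  // Invert: jump back to the old position with no transition.
  element.style.transformOrigin = "top left";
  element.style.transition = "none";
  element.style.transform = invertTransform(first, last);
  element.getBoundingClientRect(); // force the browser to apply the styles

  // Play: remove the transform and let the transition animate it away.
  requestAnimationFrame(() => {
    element.style.transition = `transform ${duration}ms ease-out`;
    element.style.transform = "";
  });
}
```

The appeal is that layout is measured only twice, up front, and the animation itself runs on `transform`, which browsers can usually composite without recalculating layout on every frame.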
I saw a lot of shocked reactions when Nintendo announced it would be partnering with another company to make smartphone games. The company was quick to stress that it wouldn't be moving entirely to app stores controlled by third parties: these games will not be re-releases of existing titles, and Nintendo is still working on new dedicated console hardware for the next generation. You shouldn't expect New Super Mario on your phone anytime soon. Basically, their smartphone games will serve as ads for the "real" games.
Unlike a lot of people, I've never really rooted for Nintendo to become a software-only company. Other companies that make that jump often do so to their detriment — look at Sega, which lost a real creative spark when they got out of the hardware business — and it's even more true for Nintendo, which has always explored the physical aspects of gaming as much as the virtual. The playful design of the GameCube controller buttons, or the weirdness of a double-screened handheld, or the runaway popularity of Wii Sports, are the result of designers who are encouraged to hold strong opinions. A touchscreen, on the other hand, is a weak opinion — even no opinion, as it imitates (but never really emulates) physical controls like buttons or joysticks.
But here's the other thing: what Nintendo represents on dedicated handheld hardware, as much as wacky design chops, is a sustainable market. I play a lot of Android games, I own a Shield, I'm generally positive on the idea of microconsoles. Even given those facts, a lot of the games I play on the go are either emulators or console ports, because the app store model simply does not support development beyond a single mechanic or a few hours of gameplay. The race to the bottom, and the resulting crash of mobile game prices, means that you will almost never see a phone game with the kind of lifespan and complexity you'd get out of even the lamest Nintendo title (Yoshi Touch & Go aside).
I don't think everything Nintendo produces is golden, but they're reliable. People buy Nintendo games because you're pretty much guaranteed a polished, enjoyable experience, to the point where they can start with an expanded riff on a gimmick level and still end up with a solid gameplay hit. They're the Pixar of games. And as a result of that consistency, people will pay $40 for first-party Nintendo titles, largely sight-unseen. This creates a virtuous cycle: the revenue from a relatively-expensive gaming market lets them make the kind of games that justify that cost. It's almost impossible to imagine Nintendo being able to sustain the same halo in a $1-5 game market.
There's room for both experiences in the gaming ecosystem. Microsoft, Sony, and Steam will all provide big-budget, adult-oriented games. The app stores are overflowing with shorter, quirkier, free-to-play fare. Nintendo's niche is that they crossed those lines: oddball software for all ages that was polished to a mirror sheen. Luckily, even though observers seem convinced that Nintendo is doomed, the company itself seems well aware of where their value lies — and it's not on someone else's platform.
About nine months ago, I made the first check-in on the Seattle Times news app template. Since that time, it's been at the heart of pretty much everything we've done at the Times, ranging from big investigative projects to Super Bowl coverage to dog name analysis. We've adapted it to form the basis of our web component stack, and made a version that automates Leaflet map creation. It's been a pretty great tool, used by news apps developers, producers, and graphics team members alike.
That said, I think in digital journalism we often talk in glowing terms about our tools, but we don't nearly as often discuss the downsides they possess. So let's be honest with ourselves: I love this scaffolding, but it's not perfect. It has issues. And I think those issues say interesting things about not only the template itself, but also newsroom culture, and the challenges of creating tools that can operate there.
What are the common threads here? While you could point to the static page approach as being part of the issue, I actually think what causes a lot of these problems is that the intended audience for the news app template is both broad and narrow. It's broad in that its users range from novice journalists to experienced developers (and, indirectly, non-technical editors and reporters feeding data into Google Sheets). It's narrow in that the actual production still requires a high level of technical comfort: familiarity with the command line, new kinds of tooling, and some ability to roll with unexpected bugs.
This is a tough, and self-contradictory, audience for a visualization toolkit. It's not, however, out of character for a general-purpose dev framework. And indeed, when we talk about app scaffolds from any news organization (not just The Seattle Times), that's what they are. They're written to be fast, to be portable, and to generate static files, because those are our priorities as deadline-driven journalists. They are also the far end of a range of newsroom tools, where news apps are at one end and pre-built widgets live on the other. I'm not really worried about where the template lives on that range, and I'm certainly not planning on reducing the complexity — I think it's at a sweet spot right now. But I do worry about the ways that it (and our CMS) fit into newsroom culture.
At the Times, like in many newsrooms, the online presence is largely run by "producers," who curate the stories on the home page and handle the print-to-digital transition process (it's not the same as a "producer" in software development). This process is complicated and highly-skilled, because news CMS systems are generally terrible. The web production staff also often work on projects that would, in print, fall under page design: building complex HTML presentations for special stories. This isn't because they're trained designers: producers are often younger, and while it's not entry-level work, it's close. They end up doing this work because trained, HTML-fluent designers are rare, and because nobody else in the newsroom bothers to learn web design.
As a result, we end up in a funny situation: the only people in the newsroom who really understand the web are the producers. Editors and reporters are discouraged from becoming more technically savvy because the workflow is print-first, and the CMS is so intimidating. Meanwhile, producers rarely become editors or reporters because the newsroom can't afford to lose their skills. There's a tremendous gap in newsroom culture between people who produce the content, and people who actually understand the medium in which that content is consumed. While the tooling is not entirely responsible for that, it is a contributing factor.
I think the challenge we face, as newsroom developers, is to always be aware of, and vigilant about, that gap and its causes. Tools like the news app template are important, because they speed up our work, and the work of other technical people. But they don't mitigate the need for better, web-first publishing systems: something that can help diffuse web thinking from a producer-only skill to something that's available throughout the newsroom.
It's an accepted truth on the web that fast pages are better for users — people stay on them longer, follow more links from them, and generally report being happier with them. I think a lot about performance on my projects, because I want readers to be thinking about the story, not distracted by slow load times.
It's possible that I've been more aware of it, just because I've been working on a project that involves smoothly animating a chart using regular HTML instead of canvas, but it seems like it's been a bad month for that kind of thing. First Peter-Paul Koch wrote a diatribe about client-side templating, insisting that it's a needless performance hit. Then Flipboard wrote about discarding traditional elements entirely, instead rendering everything to a canvas tag in pursuit of 60 frames/second animations. Ironically, you'll notice that these are radically different approaches that both claim they create a better experience.
Instead of just sighing while the usual native app advocates use these posts to bash the web, and given that I am working on a page where high-performance mobile animations are a key part, I thought it'd be nice to talk about some experiments I've run with the approaches found in both. There are a lot of places where the web platform needs help competing on mobile, no doubt. But I'd prefer we talk about actual performance problems, and not get sidetracked into chasing down scattered criticisms without evidence.
At the other extreme is Flipboard's experiment with canvas rendering. Instead of putting everything in the document, like normal websites, they put a full-screen canvas image up and render everything (text, images, animations, etc.) manually to that buffer. You can try a demo out on your device here. On my Nexus 5, which is a reasonably new device running the latest version of Chrome, it's noticeably choppy. My experience with canvas is that Chrome's implementation is actually much faster than Safari's, so I don't expect it to be smooth on iOS either (they've blacklisted tablets, so I can't be sure).
In order to get this "fluid" experience, here's what the Flipboard team threw away:
Again, I'm not claiming that my use case is a perfect analogue. I'm animating a graphic in response to a single button press, and they're attempting to create an "infinite scroll" (sort of: it's not really a scroll so much as an animated pager). But this idea that "the DOM is lava" and touching it will cause your readers' phones to instantly burst into flames of scorn seems patently ridiculous, especially when we look back at that list of everything that was sacrificed in the single-minded pursuit of speed.
Performance is important, and I care deeply and obsessively about it. As a gamer and a graphics nerd, I love tweaking out those last few frames per second, or adding flashy effects to a page. But it's not the most important thing. It's not more important than making your content available to the blind or visually impaired. It's not more important than providing standard UI actions like copy-and-paste or "open in new tab." And it's not more important than providing a fallback for older and less-powerful devices, the kind that are used by poor readers. Let's keep speed in perspective on the web, and not get so caught up in dogma that we abandon useful techniques like client-side templating and the DOM.
This week, if you want to be horrified by our grim meathook future, check out these posts from Seattle Times news librarian Gene Balk on vaccination rates at Washington State schools. There's a searchable data table and a map, but I'll spoil it for you: a large proportion of parents should probably pack surgical masks and antibiotics with their kids' lunches, because herd immunity is basically a thing of the past.
This kind of database-driven reporting is a staple of Gene's "FYI Guy" blog, and readers seem to enjoy it. Done right, it can help flesh out local coverage in interesting ways, explore topics that are off the beaten path, and find connections that we might otherwise miss. That said, I don't think you can stress enough how much of that depends on the quality of the reporter: Gene is a great researcher, and not everyone has his skills and experience.
By coincidence, yesterday Melissa Bell at Vox announced that they're (re)entering the field of data journalism in an almost parodically-titled post. I'm a little confused about the timing, since I thought data journalism was a part of their whole raison d'etre, but maybe I'm confusing them with a different scrappy, SEO-oriented news startup. Regardless, welcome to the party! After name-checking Philip Meyer's Precision Journalism, Bell adds a list of nine basic guidelines they plan to use. It's not a bad list, although several items are inoffensively bland (has anyone ever aspired to produce content that isn't "relevant and useful"?).
- Vox will work to provide the most relevant and useful data behind the news, when you need it, in ways that help you understand the stories that matter most.
- We will work to make all the data behind our stories available to you to download and play with for yourself.
- We want you to improve on what we’ve done, to play with the data, visualize it, and help us analyze it — and make our work better.
- We will prioritize building data sets that can feed many stories, rather than focusing on one-off projects.
- Our data visualizations will be clear, concise, and deep — to help you understand our editorial better. They will adhere to design rules which ensure their accuracy and transparency.
- In the event we make a mistake (they do happen), we will swiftly and clearly clarify, correct, and communicate that as transparently as we can.
- We will curate and showcase the best data infographics and visualizations on the web.
- Visualizations we produce in-house will work well on as many platforms as possible: if you view it on a smartphone, it will function as well as it does on web.
- We will curate and publish the best content that our community of readers produces. Our data journalism is as much about you, the community, as it is about us: this is a partnership.
Some of these goals are particularly strong, and we share them at the Seattle Times. Take #2, for example: not only do I think it's important that we publish the data on which our visualizations are built whenever possible, but we also open-source our graphics so that people can see the methodology we used. It's also just good sense to be mobile-friendly (#8), although I personally believe that there are some times when a story simply can't be fully told on a 4" screen.
I'm less sure about curation, either from readers (#9) or around the web (#7), particularly in conjunction with accuracy and corrections (#6). One of the strengths of a newsroom is supposed to be fact-checking, but it's not clear to me what the process is for verification of third-party visualizations, or if Vox plans to do so at all (it hasn't been evident to me as a reader that they do it now). Which is too bad, because I think a kind of real-time "Snopes for bad reporting" is a site I'd definitely support.
But I'm really most skeptical of #4, which Bell elsewhere refers to as "finding, cleaning, and setting up data streams so that they can be the source for repeated stories." It's not that I think it's necessarily a stupid idea. I'm just not sure that it's effective, based on my experience. Data stories are just reporting. Data streams are reporting on top of engineering on top of reporting.
CQ's Economy Tracker, for example, was my team's attempt at a reusable data API, but it turned out to be a frustrating experience to keep it topped off with up-to-date content, the architecture was a hard problem to solve, and the number of stories we pulled out of it probably didn't justify the effort. It turns out that it's hard to find a data set that can actually support a series of articles.
(You may say, at this point, hang on a minute: wasn't Congressional Quarterly an example of exactly what we're talking about? It's a large, data-oriented news organization that sold access to data streams, and maintained datasets that were used to build stories and interactives via the multimedia team. Which is true, but it elides a number of factors: CQ was a single-purpose news site — congress and legislation only — with a huge number of reporters feeding the beast and a large technical staff to tend to it. Vox does not have those advantages, since it's a general-audience, international news site with a much smaller staff.)
More importantly, a "data stream," like an API, demands maintenance, which quickly becomes a drag on the amount of time that can be spent on efforts outside those streams. That's doubly true if you make them public, and people start relying on them. Will Vox sunset these data streams if they stop being useful internally? What are the cutoff criteria? How will they let people know before the source is shut down? Most importantly, how much time will be taken away from reporting to maintain the data products?
When I joined the Seattle Times, I made a pitch to editors that was a little different: instead of designing long-running services, we generally build news apps that are scoped to a specific point in time. In other words, we make stories, the same as the rest of the newsroom does. And just as you wouldn't normally ask a reporter to go back and update all their old stories when new events happen, we don't maintain news apps more than a week or two after publication (barring, of course, normal corrections and serious bug-fixes). Our entire development stack, in fact, is based on this assumption, which is why we publish static files to S3 (which is cheap and easy), instead of running a Rails/Laravel/Node server (which is expensive and hard).
Maybe for Vox, this isn't a problem. After all, they're the people with the "poor man's Wikipedia" card stacks that they maintain for topics over many months, and the evergreen experiments. At the very least, though, it does highlight a very real distinction that goes (in my opinion) beyond "data journalism" and to the core of the digital news mission. Are we building general systems and tools to cover unique stories? Or are we optimizing for semi-predictable products built around APIs and data sources? I'm leaning toward the former because I think it's a better match for a messy, unpredictable, human world. But best of luck to Vox with the latter.
While we've still got plenty of interesting projects in the works, the Seahawks' rampage into the Super Bowl has pretty much taken over the News Apps budget at the Seattle Times this week. As a result, we've got some interesting interactives you might have seen:
More to come, obviously, as the road to the Super Bowl continues! Or, as Marshawn Lynch would say, "Yeah."