Mile Zero :: this space intentionally left blank

April 16, 2015

Empty-handed

We have sent several people from the newsroom, including myself, to journalism conferences over the last few months. Most conferences are about 50% inspirational and 50% crap (tilted heavily crap-wards in the keynotes), but you meet good people and you get to see the nuts and bolts behind the scenes of some of the best interactive news stories published.

It's natural to come back from a conference with a kind of inferiority complex, and equally easy to conclude that we're not making similar rich presentations because we don't have the cool tools that those other (richer, more tech-savvy) newsrooms have. We too, according to this train of thought, need to be coding elaborate visualization generators and complicated new CMS features — or, as Ryan Pitts from Mozilla said to me last weekend at the Society for News Design workshop, "let's not rest until every paper in the country has built its own charting application."

I think better newsroom tech is important, but let's play devil's advocate for a bit with an unpopular hypothesis: developing tools for the editors and reporters at your newspaper is a waste of your time, and a distraction from the journalism you should be doing.

Why a waste of your time? Partly because newsroom tools get a lot less uptake than you probably think they do (certainly less than we'd hope they would). I've written a lot of internal applications in my time, and they've never been particularly popular, because most reporters and editors don't care. They're too busy doing journalism to use your solution (which is as it should be), and they are probably not big on technology anyway (I have a lot of reporters who can't use Excel, which pains me greatly). Creating tools for reporters is, most of the time, attacking the problem at the wrong point.

For many newsrooms, that wasted time will end up being twice as expensive, because development resources are scarce and UI is hard. Building a polished, feature-filled chart generator that the average journalist can use will take at least a couple of programmer months, which is time those developers aren't working on stories and visualizations that readers want. Are you willing to sacrifice that time, especially if you can't guarantee that it'll actually get used? That's a pretty big gamble, unless you have the resources of the New York Times. You're probably better off just going with an off-the-shelf package, or even finding a simpler solution.

I don't think it's a coincidence that, for all the noise people make about the new data journalism startups like Vox and FiveThirtyEight, 99% of their chart output does not come from a fancy tool or a complex interactive: they post JPEGs. And that's fine! No actual reader has ever complained about having to look at a picture of a graph instead of a souped-up vector rendering (in Vox's case, they're too busy complaining that the graph was stolen from someone else, but that's another story). JPEG is a perfectly decent solution when it comes to simply telling the story across the entire web platform — in fact, it's a great embodiment of "do the simplest thing that works," which has served me well as a guiding motto in life.

So, as a rule of thumb: don't build charting libraries. Don't build general-purpose databases. Don't build drag-and-drop slideshows. Leave these things to other people, who have time and energy to build them for a living. Does this mean you shouldn't create tools at all? No, but the target audience should be you, the news developer, and other semi-technical newsroom staff like the web producers. In other words, make technology for the people who will actually use it, and can handle something that's not polished to a mirror sheen.

I believe this is the big strength of web components, and one reason I'm so bullish on them at the Seattle Times. They're not glossy, end-user products, but they are a great balance between power and accessibility for people with a little technical skill, and they're very fast to build. If the day comes when we do choose to invest in a slicker newsroom app, we can leverage them anyway, the same way that the NYT's fancy chart designers are all based on the developer-oriented D3 library.

In the meantime, while I would consider an anti-tool stance a "strong opinion weakly held," I think there's a workable philosophy there. These days, I feel two concerns very strongly (outside of my normal news/editorial production, of course): how to get the newsroom to make use of our skills, and how to best use the limited developer resources we have. A "no tools" guideline is not an absolute rule, but it serves as a useful heuristic to weed out the kinds of projects that might otherwise take over our time.

10:33 x permalink

April 3, 2015

What's advanced?

Next week, I'll start teaching ITC 298 at Seattle Central College. It's the first special topics class I've actually gotten to teach (a previous class on tooling couldn't get enough students), and it's on a topic near and dear to my heart: namely, intermediate to advanced JavaScript. But what does that actually mean? If web development were a medieval blacksmith shop, what would you need to know in order to move beyond "apprentice" and hit the road as a journeyman? What is it that separates a jQuery dabbler from a serious front-end coder?

The honest answer is "about five years of practice," but that's not the whole story (nor is it something I can take to a curriculum planning committee). I think there are two areas of growth that students need to be aware of, and that I'm planning on stressing for this quarter: tooling and functional programming.

Tooling

Rebecca Murphey recently wrote a revised baseline for front-end developers, just as I was putting together my syllabus. It's still a great guide to the state of the art, even if I don't agree with everything in it. But I would add that modern JavaScript applications tend to be built on three pillars that students need to understand:

Server code written in NodeJS,
Client code built on top of some kind of MVC, and
A build system (also in Node) that weaves the two together.

Learning their way around this trio is going to be a huge challenge for my students, most of whom still live in a world where individual files are edited and sent to the browser as-is (possibly with a PHP include or two). They haven't built applications with RESTful routes, or written client-side code in a module system. SCC hasn't typically stressed those techniques, which is a shame.

I'm happy to be the person who forces students into the deep end, but I do want to make sure they have a good, structured experience. Throwing everything at students is a quick way to make sure that they get overwhelmed and give up (not a hypothetical scenario: the previous ITC 298 class had exactly that problem, and ended poorly). To ease them in, we'll try building the following sequence of exercises in our directed lab sessions:

simple Node script with callbacks
scraper using async/events/streams
basic site using Hapi.js
authenticated site with session-handling
Grunt scripts to build JavaScript with Browserify
simple MVC with Backbone

The progression starts with Node, and then builds out gradually so that each step conceptually depends on a previous lesson. Along the way, students will learn a lot about how to structure an application across all three of these environments — which brings us to the second, and probably harder, focus of the class.

Functional programming

Undoubtedly, students need to have a better grasp of JavaScript fundamentals ("the good parts") if they're to be considered intermediate front-end devs. But how do we break down those fundamentals? We could concentrate on inheritance and object orientation, or think of everything as MVC. Maybe we could spend a bunch of time on modules, or deep-dive into how to write high-performance DOM code. Murphey recommends learning ES2015 features, like fat arrows and destructuring. All of these are important pieces of the JavaScript toolkit, but they are patterns available for use rather than core theoretical knowledge.

The heart of the language, however, remains the humble function. Where other languages have modules, private/static properties, blocks, async/await, classes, and list comprehensions, JavaScript just has first-class functions. Astonishingly, this has actually worked out pretty well, but it means you do really need to understand them in order to read and write code — particularly on Node, where callbacks are still the preferred method of handling concurrency.
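As a rough sketch of what that means in practice (the `createCounter` name and its methods are invented for illustration), a closure can stand in for the private properties and modules that other languages ship as separate features:

```javascript
// Hypothetical example: a small "module" with private state, built from functions alone.
function createCounter(start) {
  // `count` is only reachable through the functions returned below.
  var count = start || 0;

  return {
    increment: function() {
      count += 1;
      return count;
    },
    current: function() {
      return count;
    }
  };
}

var counter = createCounter(10);
counter.increment();
counter.increment();

console.log(counter.current()); // 12
console.log(counter.count);     // undefined: the variable is not a property
```

Each call to `createCounter` gets its own `count`, which is the behavior students usually expect from a class instance with a private field.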

Typically, functional coding has been one of the more difficult concepts to introduce in my basic class: we spend some time about halfway through the quarter implementing Array.forEach(), and we wire up a lot of event listeners. Map/reduce usually overwhelms most students, however, and call/apply gets a lot of blank stares. These are literally unavoidable in a well-written, modern JavaScript codebase: we have to find a way to approach them if students are to reach the next stage of their professional development.
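To make those terms concrete, here is a minimal sketch of the sort of exercise involved. The standalone `forEach`, `map`, and `reduce` functions are hand-rolled for illustration; real code would use the built-in array methods instead.

```javascript
// Hand-rolled versions of the array helpers, for illustration only.
function forEach(list, callback, context) {
  for (var i = 0; i < list.length; i++) {
    // call() invokes the callback with an explicit `this` value
    callback.call(context, list[i], i, list);
  }
}

function map(list, callback) {
  var result = [];
  forEach(list, function(item, index) {
    result.push(callback(item, index));
  });
  return result;
}

function reduce(list, callback, initial) {
  var memo = initial;
  forEach(list, function(item) {
    memo = callback(memo, item);
  });
  return memo;
}

var doubled = map([1, 2, 3], function(n) { return n * 2; });
var total = reduce(doubled, function(sum, n) { return sum + n; }, 0);

// apply() spreads an array out into individual arguments
var largest = Math.max.apply(null, doubled);

// doubled is [2, 4, 6], total is 12, largest is 6
console.log(doubled, total, largest);
```

Building `map` and `reduce` on top of `forEach` shows that all three are the same loop with a different callback contract.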

The hard part of writing for Node is that you must embrace some degree of functional programming: the continuation-passing style used in the core APIs makes it inescapable. But the great part of writing for Node (especially as the first section of the course) is that it's actually a fairly gentle ramp-up. Callback functions are not that far from event listeners, and the ubiquitous async library softens the difficulty of mapping an array functionally. Between the two, there's no shortage of practice, since there's literally no other way to write a Node program.
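A minimal sketch of continuation-passing style, assuming nothing beyond plain functions: `mapSeries` below is a simplified stand-in for the kind of helper the async library provides, not that library's implementation, and `measure` is a made-up worker that calls back immediately so the example stays short.

```javascript
// Run `worker` on each item in order, collecting results through callbacks.
function mapSeries(list, worker, done) {
  var results = [];

  function next(index) {
    if (index >= list.length) return done(null, results);
    // Node convention: the callback's first argument is an error, if any.
    worker(list[index], function(err, value) {
      if (err) return done(err);
      results.push(value);
      next(index + 1);
    });
  }

  next(0);
}

// Hypothetical worker; a real one would read a file or make a request
// and call back later instead of immediately.
function measure(word, callback) {
  if (typeof word !== "string") return callback(new Error("not a string"));
  callback(null, word.length);
}

var lengths;
mapSeries(["node", "callbacks"], measure, function(err, results) {
  if (err) throw err;
  lengths = results;
  console.log(lengths); // lengths is [4, 9]
});
```

The shape is the same as an event listener: hand over a function, and the interesting work happens when it gets called.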

That's the other strategy behind the tooling sequence I've laid out. We'll start from Node, and then build toward increasingly complex functional constructs, like modules, constructors, and promises. By the time the students have finished their final projects, they should be old hands at callbacks and closures, which will serve them well in almost any language.

The plan

Originally, I joked with people that we'd spend six weeks reading JavaScript: the Good Parts and then six weeks building a chat application. Two students would have enjoyed this, and the rest would have followed me home and murdered me in my sleep. But the two-part plan laid out in this post hopefully marries the practical and the theoretical in a way that will help students grow. They'll learn about the intricacies of JavaScript's functional quirks, but they'll do so by building real applications to solve real problems.

The specifics of this quarter are still a little bit in flux, and will likely remain so, since I think it's good to be flexible the first time teaching a class. But if you're interested in following along, feel free to check out the class repo, which contains the syllabus, supporting materials, and example code so far. Issues and pull requests are also welcome!

16:31 x permalink

March 27, 2015

Construction and architecture

In the last couple of weeks, a few more of my Seattle Times projects have gone live — namely, the animated graph in this story about EB-5 visa growth, and the Seattle architecture quiz. Both use the FLIP animation technique I wrote about a few weeks ago, although it's much more elaborate in the EB-5 graph, which animates roughly 150 elements at 60fps on older mobile devices.

In the case of the architecture quiz, I also added the Babel compiler (formerly 6to5), which turns ES6 code into readable ES5 JavaScript that the average browser can run. Although it's not an enormous change, looking through the original source code will show the new object literal syntax, template strings, and (my personal favorite) fat arrow functions, which do not rebind this and offer a lighter-weight syntax for array sort, map, and filter operations.
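For anyone who hasn't seen those features yet, here's a short sketch of all three. The data and the `quiz` object are illustrative, not taken from the quiz source.

```javascript
// Illustrative data only.
const buildings = [
  { name: "Columbia Center", year: 1985 },
  { name: "Smith Tower", year: 1914 },
  { name: "Space Needle", year: 1962 }
];

// Fat arrows give filter, sort, and map a lighter syntax.
const modern = buildings
  .filter(b => b.year > 1950)
  .sort((a, b) => a.year - b.year)
  .map(b => `${b.name} (${b.year})`); // template string

const quiz = {
  title: "Architecture quiz",
  // Method shorthand, part of the new object literal syntax.
  label(names) {
    // The arrow does not rebind `this`, so it still refers to `quiz` here.
    return names.map(name => `${this.title}: ${name}`);
  }
};

const title = "Results";
const summary = { title, modern }; // property shorthand

console.log(summary.modern);
console.log(quiz.label(["Smith Tower"]));
```

With an ES5 function expression inside `label`, the callback would need a `var self = this` or a `bind` call to reach `title`.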

I'm not sold on all of the changes in ES6 — I think let is overrated, and the module syntax is pretty terrible — but these changes are definitely a positive step that reduces much of the boilerplate that was required for modern JavaScript. Most importantly, one of Babel's big advantages is that it produces readable output, compared to previous compilers like Traceur, so that even without source maps it's easy to debug. We've added Babel as a part of the default build step in the Times news app template, so if you're looking to try it out, there's no better time than now.

11:28 x permalink

March 19, 2015

Platformers

I saw a lot of shocked reactions when Nintendo announced it would be partnering with another company to make smartphone games. The company was quick to stress that it wouldn't be moving entirely to app stores controlled by third parties: these games will not be re-releases of existing titles, and Nintendo is still working on new dedicated console hardware for the next generation. You shouldn't expect New Super Mario on your phone anytime soon. Basically, their smartphone games will serve as ads for the "real" games.

Unlike a lot of people, I've never really rooted for Nintendo to become a software-only company. Other companies that make that jump often do so to their detriment — look at Sega, which lost a real creative spark when they got out of the hardware business — and it's even more true for Nintendo, which has always explored the physical aspects of gaming as much as the virtual. The playful design of the GameCube controller buttons, or the weirdness of a double-screened handheld, or the runaway popularity of Wii Sports, are the result of designers who are encouraged to hold strong opinions. A touchscreen, on the other hand, is a weak opinion — even no opinion, as it imitates (but never really emulates) physical controls like buttons or joysticks.

But here's the other thing: what Nintendo represents on dedicated handheld hardware, as much as wacky design chops, is a sustainable market. I play a lot of Android games, I own a Shield, I'm generally positive on the idea of microconsoles. Even given those facts, a lot of the games I play on the go are either emulators or console ports, because the app store model simply does not support development beyond a single mechanic or a few hours of gameplay. The race to the bottom, and the resulting crash of mobile game prices, means that you will almost never see a phone game with the kind of lifespan and complexity you'd get out of even the lamest Nintendo title (Yoshi Touch & Go aside).

I don't think everything Nintendo produces is golden, but they're reliable. People buy Nintendo games because you're pretty much guaranteed a polished, enjoyable experience, to the point where they can start with an expanded riff on a gimmick level and still end up with a solid gameplay hit. They're the Pixar of games. And as a result of that consistency, people will pay $40 for first-party Nintendo titles, largely sight-unseen. This creates a virtuous cycle: the revenue from a relatively-expensive gaming market lets them make the kind of games that justify that cost. It's almost impossible to imagine Nintendo being able to sustain the same halo in a $1-5 game market.

There's room for both experiences in the gaming ecosystem. Microsoft, Sony, and Steam will all provide big-budget, adult-oriented games. The app stores are overflowing with shorter, quirkier, free-to-play fare. Nintendo's niche is that they crossed those lines: oddball software for all ages that was polished to a mirror sheen. Luckily, even though observers seem convinced that Nintendo is doomed, the company itself seems well aware of where their value lies — and it's not on someone else's platform.

18:54 x permalink

March 11, 2015

Template Trouble

About nine months ago, I made the first check-in on the Seattle Times news app template. Since that time, it's been at the heart of pretty much everything we've done at the Times, ranging from big investigative projects to Super Bowl coverage to dog name analysis. We've adapted it to form the basis of our web component stack, and made a version that automates Leaflet map creation. It's been a pretty great tool, used by news apps developers, producers, and graphics team members alike.

That said, I think in digital journalism we often talk in glowing terms about our tools, but we don't nearly as often discuss the downsides they possess. So let's be honest with ourselves: I love this scaffolding, but it's not perfect. It has issues. And I think those issues say interesting things about not only the template itself, but also newsroom culture, and the challenges of creating tools that can operate there.

The templating situation can be confusing. Since it's all JavaScript, it's sometimes hard for scaffolding users to keep track of what's running during the build, and what will run on the client. Generally, we use a different library in each scenario (Lo-Dash during builds, ICanHaz or DoT in the browser), but it can still be odd for people who are used to a language split — and worse for those who have little or no programming experience.
Deployment could be better. This really has less to do with our scaffold, and more to do with the environment in which it operates. We don't have great CMS integration, because the hooks don't exist. And we have to keep credentials in a separate file (which isn't checked into Git), because many of our users can't update environment variables on their own machines. We're also still trying to figure out what we check into Git: should Google Sheets go in there? What about their ID numbers?
It was great when the paper launched its new responsive site last month, because it meant we finally have reasonable default styles. The news app scaffolding has previously left these up to the project authors, and the result is that we're not nearly consistent enough. I think we have a fine line to walk between "build everything in" and "provide flexibility" — what's good for the main site may not be good for us.
Along those same lines, the new CMS offers us a better, responsive layout, but it also took away a lot of flexibility. The result is that we're probably overusing the news app template to compensate. While I think it's great that we have a place for generating unconventional pages, I'm not wild about effectively creating a parallel content system on S3 whenever we need a small amount of control over the page.
Old apps are locked to old dependencies. Like any good Node package, we load dependencies for the news app template on a per-project basis. But I've been tinkering with this framework for 8 months now, and several things have changed radically (most notably, a switch from RequireJS to Browserify). Stepping back into old projects often requires a bit of code archaeology to figure out where everything used to live.

What are the common threads here? While you could point to the static page approach as being part of the issue, I actually think what causes a lot of these problems is that the intended audience for the news app template is both broad and narrow. It's broad in that its users range from novice journalists to experienced developers (and, indirectly, non-technical editors and reporters feeding data into Google Sheets). It's narrow in that the actual production still requires a high level of technical comfort: familiarity with the command line, new kinds of tooling, and some ability to roll with unexpected bugs.

This is a tough, and self-contradictory, audience for a visualization toolkit. It's not, however, out of character for a general-purpose dev framework. And indeed, when we talk about app scaffolds from any news organization (not just The Seattle Times), that's what they are. They're written to be fast, to be portable, and to generate static files, because those are our priorities as deadline-driven journalists. They are also the far end of a range of newsroom tools, where news apps are at one end and pre-built widgets live on the other. I'm not really worried about where the template lives on that range, and I'm certainly not planning on reducing the complexity — I think it's at a sweet spot right now. But I do worry about the ways that it (and our CMS) fit into newsroom culture.

At the Times, like in many newsrooms, the online presence is largely run by "producers," who curate the stories on the home page and handle the print-to-digital transition process (it's not the same as a "producer" in software development). This process is complicated and highly-skilled, because news CMS systems are generally terrible. The web production staff also often work on projects that would, in print, fall under page design: building complex HTML presentations for special stories. This isn't because they're trained designers: producers are often younger, and while it's not entry-level work, it's close. They end up doing this work because trained, HTML-fluent designers are rare, and because nobody else in the newsroom bothers to learn web design.

As a result, we end up in a funny situation: the only people in the newsroom who really understand the web are the producers. Editors and reporters are discouraged from becoming more technically savvy because the workflow is print-first, and the CMS is so intimidating. Meanwhile, producers rarely become editors or reporters because the newsroom can't afford to lose their skills. There's a tremendous gap in newsroom culture between people who produce the content, and people who actually understand the medium in which that content is consumed. While the tooling is not entirely responsible for that, it is a contributing factor.

I think the challenge we face, as newsroom developers, is to be always aware and vigilant of that gap and its causes. Tools like the news app template are important, because they speed up our work, and the work of other technical people. But they don't mitigate the need for better, web-first publishing systems — something that can help diffuse web thinking from a producer-only skill to something that's available throughout the newsroom.

12:19 x permalink

February 18, 2015

Speed Kills

It's an accepted truth on the web that fast pages are better for users — people stay on them longer, follow more links from them, and generally report being happier with them. I think a lot about performance on my projects, because I want readers to be thinking about the story, not distracted by slow load times.

Unfortunately, web performance has a bad rap, in part because it's a complicated topic. Making it work effectively and efficiently means learning a lot about how the browser runtime works, and optimizing for new techniques like GPU transforms. Like everything else in web development, there's also a lot of misinformation out there, and a lot of people who insist that everything was better back when we built everything without all the JavaScript and fancy-pants frameworks.

It's possible that I've been more aware of it, just because I've been working on a project that involves smoothly animating a chart using regular HTML instead of canvas, but it seems like it's been a bad month for that kind of thing. First Peter-Paul Koch wrote a diatribe about client-side templating, insisting that it's a needless performance hit. Then Flipboard wrote about discarding traditional elements entirely, instead rendering everything to a canvas tag in pursuit of 60 frames/second animations. Ironically, you'll notice that these are radically different approaches that both claim they create a better experience.

Instead of just sighing while the usual native app advocates use these posts to bash the web, and given that I am working on a page where high-performance mobile animations are a key part, I thought it'd be nice to talk about some experiments I've run with the approaches found in both. There are a lot of places where the web platform needs help competing on mobile, no doubt. But I'd prefer we talk about actual performance problems, and not get sidetracked into chasing down scattered criticisms without evidence.

Let's start with templating, which is serving as a stand-in for client-side JavaScript in general. PPK argues that templating (and by extension, single-page app design) is terrible for performance, but is that true? While I was working on my graph, I worried a little bit about startup time. Since I write JavaScript on both the server and the client, it was pretty easy to port my code from one to the other and check. I personally found the results conclusive, and a little surprising.

The client-side version of the page weighed in at 10KB and spent roughly 35ms in JavaScript during startup, rendering the page and prepping its data structures. That's actually not bad for something that's doing some fairly heavy positioning and styling, and it fits in the 14KB first TCP round-trip recommended by Google. In contrast, the server-side page, in which all the markup was pre-rendered and then progressively enhanced after page load, was 160KB and spent about 30ms in JavaScript. In other words, following PPK's advice to avoid client-side templating caused the page to be sixteen times larger, and still required two video frames to start up.

Now, this is a slightly special case: unlike typical server applications, my news apps are useless without JavaScript. They're not RESTful, they don't talk to a database, they involve a lot of moving parts. But even I was surprised by how little impact client-side templating actually had. Browsers these days are just ridiculously fast at assembling HTML. So while I don't recommend doing the entire page this way, or abandoning server-generated HTML entirely, it's pretty clear to me that it's not the slam-dunk case that holdouts for traditional server rendering claim it is.
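If "client-side templating" sounds abstract, this is roughly what it boils down to. The `render` function is a toy written for this example (it is not Lo-Dash, ICanHaz, or doT, and it does no HTML escaping), and the rows are placeholder values rather than data from my chart.

```javascript
// Toy template function: replaces {{key}} with the matching value from `data`.
function render(template, data) {
  return template.replace(/\{\{(\w+)\}\}/g, function(match, key) {
    return key in data ? String(data[key]) : "";
  });
}

var barTemplate = '<div class="bar" style="height: {{percent}}%">{{label}}</div>';

// Placeholder values for illustration.
var rows = [
  { label: "A", percent: 40 },
  { label: "B", percent: 65 }
];

var html = rows.map(function(row) {
  return render(barTemplate, row);
}).join("\n");

console.log(html);

// In a browser, the markup would then be handed to the DOM, for example:
// document.querySelector(".chart").innerHTML = html;
```

The page ships one small template plus the raw data instead of every repeated block of markup, which is where the size difference in my test came from.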

At the other extreme is Flipboard's experiment with canvas rendering. Instead of putting everything in the document, like normal websites, they put a full-screen canvas image up and render everything (text, images, animations, etc.) manually to that buffer. You can try a demo out on your device here. On my Nexus 5, which is a reasonably new device running the latest version of Chrome, it's noticeably choppy. My experience with canvas is that Chrome's implementation is actually much faster than Safari's, so I don't expect it to be smooth on iOS either (they've blacklisted tablets, so I can't be sure).

In order to get this "fluid" experience, here's what the Flipboard team threw away:

Accessibility: nothing on the page actually exists to a screen reader. Users that invert their screens or change the text size to make it easier to read are out of luck.
Copy and paste: since there's no document, there's nothing to select for these basic text operations.
Real links: you can't open pages in a new tab. You can't share them using the OS share panel. You can't do anything with the links, because they aren't really there.
GPU acceleration: by doing everything in canvas, they've ignored all the optimizations that browsers actually do to ensure a smooth, battery-efficient experience.
View source: inspecting the Flipboard page gives you no information at all, and the JavaScript is minified without source maps. The app is completely opaque to anyone who wants to learn from it.

The irony of doing all this work for a fluid experience that isn't actually fluid is that the kinds of animations they're doing — transform and opacity — are actually the exact properties that browsers can animate at 60FPS. Much has been written about using GPU compositing for smooth animations, but this article is a great start. If Flipboard had stuck with the DOM and used the GPU fully, they probably could have had fluid animations without leaving all those other features behind.

That's an easy thing to say, but is it true? Here's another experiment from my stacked bar chart: when a toggle is pressed, the chart shifts from being a measure of absolute numbers to relative proportions, with each bar smoothly animating up to 100%. I'm using an adaptation of Paul Lewis' FLIP technique, in which animations are set in JavaScript but run via CSS transitions. In my case, each of the 160+ blocks is measured, assigned a transform to "freeze" it in place as a new GPU layer, then transitioned to its final position with a second transform and "thawed" back into a regular, responsive element.
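For reference, a generic sketch of the FLIP steps (first, last, invert, play) looks something like this. It is not the code from my chart: the `flip` helper and its `change` callback are hypothetical, and it assumes the element has a non-zero size in its final position.

```javascript
// Invert: given an element's first and last bounding boxes, compute the
// transform that makes it appear not to have left its starting position.
function invert(first, last) {
  var dx = first.left - last.left;
  var dy = first.top - last.top;
  var sx = first.width / last.width;
  var sy = first.height / last.height;
  return "translate(" + dx + "px, " + dy + "px) scale(" + sx + ", " + sy + ")";
}

// Browser-only. `change` is a hypothetical callback that moves the element
// to its final layout, for example by toggling a class.
function flip(element, change) {
  var first = element.getBoundingClientRect();   // First
  change();
  var last = element.getBoundingClientRect();    // Last
  element.style.transformOrigin = "0 0";
  element.style.transition = "none";
  element.style.transform = invert(first, last); // Invert
  element.getBoundingClientRect();               // force layout so the frozen state applies
  element.style.transition = "transform 0.4s ease";
  element.style.transform = "";                  // Play: the browser animates the rest
}

// The math can be checked without a browser:
var example = invert(
  { left: 0, top: 100, width: 50, height: 20 },
  { left: 0, top: 40, width: 50, height: 80 }
);
console.log(example); // translate(0px, 60px) scale(1, 0.25)
```

All of the measuring happens before the transition starts, so the animation itself only touches `transform`, which the compositor can handle on its own.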

Even though I'm animating many more elements than Flipboard is doing in their demo, the animation is perfectly smooth on my Nexus 5, and on the aging iPad we use for testing. By doing all the hard computational work up front, and then handing the pre-computed transitions over to the browser, I'm actually not JavaScript-bound at all: everything is done on the graphics chip, and in the C++ compositing layer. The result is a smooth 60FPS during the animation, all done via regular DOM elements. So much for "If you touch the DOM in any way during an animation you’ve already blown through your 16ms frame budget."

Again, I'm not claiming that my use case is a perfect analogue. I'm animating a graphic in response to a single button press, and they're attempting to create an "infinite scroll" (sort of: it's not really a scroll so much as an animated pager). But this idea that "the DOM is lava" and touching it will cause your readers' phones to instantly burst into flames of scorn seems patently ridiculous, especially when we look back at that list of everything that was sacrificed in the single-minded pursuit of speed.

Performance is important, and I care deeply and obsessively about it. As a gamer and a graphics nerd, I love tweaking out those last few frames per second, or adding flashy effects to a page. But it's not the most important thing. It's not more important than making your content available to the blind or visually impaired. It's not more important than providing standard UI actions like copy-and-paste or "open in new tab." And it's not more important than providing a fallback for older and less-powerful devices, the kind that are used by poor readers. Let's keep speed in perspective on the web, and not get so caught up in dogma that we abandon useful techniques like client-side templating and the DOM.

14:38 x permalink

February 5, 2015

What is Data Journalism?

This week, if you want to be horrified by our grim meathook future, check out these posts from Seattle Times news librarian Gene Balk on vaccination rates at Washington State schools. There's a searchable data table and a map, but I'll spoil it for you: a large proportion of parents should probably pack surgical masks and antibiotics with their kids' lunches, because herd immunity is basically a thing of the past.

This kind of database-driven reporting is a staple of Gene's "FYI Guy" blog, and readers seem to enjoy it. Done right, it can help flesh out local coverage in interesting ways, explore topics that are off the beaten path, and find connections that we might otherwise miss. That said, I don't think you can stress enough how much of that depends on the quality of the reporter: Gene is a great researcher, and not everyone has his skills and experience.

By coincidence, yesterday Melissa Bell at Vox announced that they're (re)entering the field of data journalism in an almost parodically-titled post. I'm a little confused about the timing, since I thought data journalism was a part of their whole raison d'etre, but maybe I'm confusing them with a different scrappy, SEO-oriented news startup. Regardless, welcome to the party! After name-checking Philip Meyer's Precision Journalism, Bell adds a list of nine basic guidelines they plan to use. It's not a bad list, although several items are inoffensively bland (has anyone ever aspired to produce content that isn't "relevant and useful?").

1. Vox will work to provide the most relevant and useful data behind the news, when you need it, in ways that help you understand the stories that matter most.

2. We will work to make all the data behind our stories available to you to download and play with for yourself.

3. We want you to improve on what we’ve done, to play with the data, visualize it, and help us analyze it — and make our work better.

4. We will prioritize building data sets that can feed many stories, rather than focusing on one-off projects.

5. Our data visualizations will be clear, concise, and deep — to help you understand our editorial better. They will adhere to design rules which ensure their accuracy and transparency.

6. In the event we make a mistake (they do happen), we will swiftly and clearly clarify, correct, and communicate that as transparently as we can.

7. We will curate and showcase the best data infographics and visualizations on the web.

8. Visualizations we produce in-house will work well on as many platforms as possible: if you view it on a smartphone, it will function as well as it does on web.

9. We will curate and publish the best content that our community of readers produces. Our data journalism is as much about you, the community, as it is about us: this is a partnership.

Some of these goals are particularly strong, and we share them at the Seattle Times. Take #2, for example: not only do I think it's important that we publish the data on which our visualizations are built whenever possible, but we also open-source our graphics so that people can see the methodology we used. It's also just good sense to be mobile-friendly (#8), although I personally believe that there are some times when a story simply can't be fully told on a 4" screen.

I'm less sure about curation, either from readers (#9) or around the web (#7), particularly in conjunction with accuracy and corrections (#6). One of the strengths of a newsroom is supposed to be fact-checking, but it's not clear to me what the process is for verification of third-party visualizations, or if Vox plans to do so at all (it hasn't been evident to me as a reader that they do it now). Which is too bad, because I think a kind of real-time "Snopes for bad reporting" is a site I'd definitely support.

But I'm really most skeptical of #4, which Bell elsewhere refers to as "finding, cleaning, and setting up data streams so that they can be the source for repeated stories." It's not that I think it's necessarily a stupid idea. I'm just not sure that it's effective, based on my experience. Data stories are just reporting. Data streams are reporting on top of engineering on top of reporting.

CQ's Economy Tracker, for example, was my team's attempt at a reusable data API, but it turned out to be a frustrating experience to keep it topped off with up-to-date content, the architecture was a hard problem to solve, and the number of stories we pulled out of it probably didn't justify the effort. It turns out that it's hard to find a data set that can actually support a series of articles.

(You may say, at this point, hang on a minute: wasn't Congressional Quarterly an example of exactly what we're talking about? It's a large, data-oriented news organization that sold access to data streams, and maintained datasets that were used to build stories and interactives via the multimedia team. Which is true, but it elides a number of factors: CQ was a single-purpose news site, Congress and legislation only, with a huge number of reporters feeding the beast and a large technical staff to tend to it. Vox does not have those advantages, since it's a general-audience, international news site with a much smaller staff.)

More importantly, a "data stream," like an API, demands maintenance, which quickly becomes a drag on the amount of time that can be spent on efforts outside those streams. That's doubly true if you make them public, and people start relying on them. When will Vox sunset these data streams, if they stop being useful internally? What are the cutoff criteria? How will they let people know before the source is shut down? Most importantly, how much time will be taken away from reporting to maintain the data products?

When I joined the Seattle Times, I made a pitch to editors that was a little different: instead of designing long-running services, we generally build news apps that are scoped to a specific point in time. In other words, we make stories, the same as the rest of the newsroom does. And just as you wouldn't normally ask a reporter to go back and update all their old stories when new events happen, we don't maintain news apps more than a week or two after publication (barring, of course, normal corrections and serious bug-fixes). Our entire development stack, in fact, is based on this assumption: that's why we publish static files to S3 (which is cheap and easy), instead of running a Rails/Laravel/Node server (which is expensive and hard).

Maybe for Vox, this isn't a problem. After all, they're the people with the "poor man's Wikipedia" card stacks that they maintain for topics over many months, and the evergreen experiments. At the very least, though, it does highlight a very real distinction that goes (in my opinion) beyond "data journalism" and to the core of the digital news mission. Are we building general systems and tools to cover unique stories? Or are we optimizing for semi-predictable products built around APIs and data sources? I'm leaning toward the former because I think it's a better match for a messy, unpredictable, human world. But best of luck to Vox with the latter.

20:34 x permalink

January 30, 2015

Bowled Over

While we've still got plenty of interesting projects in the works, the Seahawks' rampage into the Super Bowl has pretty much taken over the News Apps budget at the Seattle Times this week. As a result, we've got some interesting interactives you might have seen:

The Seahawks personality quiz and I-90 trivia quiz were put together by a new member of the team, and proved extremely popular. I suspect we'll be working on a <pop-quiz> web component soon enough as a result.
Top 12 Moments of the Seahawks season was a super-fast project that I tossed together this week, collecting the favorite plays of our knowledgeable football pundits along with some of our best sports photos.
(update: Jan. 30) Super Bowl Pick'em improves on our serverless infrastructure from the fan map to let people make their own score predictions (and read predictions by columnists and celebrities) even while hosting on S3.
(update: Jan. 30) Super Bowl Matchups is another scroll-oriented presentation, this time focusing on comparisons between the coaches, quarterbacks, and cornerbacks who are playing on Sunday. After this, I don't want to see another parallax layout until July.

Additionally, you may have seen that the Seahawks have launched I'm In, a fan map that bears a suspiciously close resemblance to our Hawks fan map. Personally, I prefer to think that since imitation is the sincerest form of flattery, a page like that is just Richard Sherman's way of letting me know how great I am.

More to come, obviously, as the road to the Super Bowl continues! Or, as Marshawn Lynch would say, "Yeah."

18:27 x permalink

January 22, 2015

Whitman

Whitman is a simple sampler (womp womp) written for modern web browsers. Built in Angular, it uses the WebAudio API to load and play sound files via a basic groovebox interface. You can try a demo on GitHub Pages. I put Whitman together for my dad's elementary school classes, so it's pretty simple by design, but it was a good learning experience.

The WebAudio API is not the worst new interface I've ever seen in a browser, but it's pretty bad. Some of its problems are just weird: for example, audio nodes are one-shot, and have to be created with new each time that you want to play the sound, which seems like a great way to trigger garbage collection and cause stuttering. Loading audio data is also kind of obnoxious, but at least you only have to do it once. I really wanted to be able to save the audio files in local storage so that they'd persist between refreshes, but getting access to the buffer (at least from the console/debugger) was oddly difficult, and eventually I just gave up.

But parts of it are genuinely cool, too. The API is built around wiring together nodes as if they were synthesizer components — an oscillator might get hooked up to a low-pass filter, then sent through a gain node before being mixed into the audio context — which feels pleasantly flexible. I'd like to put together a chiptune tracker with it. Support is decent, too, with the mobile browsers I care about (Safari and Chrome) already having decent availability. IE support is on the way.
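To give a feel for that wiring, here's a toy sketch in TypeScript. It is not Whitman's code and it doesn't touch the real audio stack: SketchNode and signalPath are hypothetical stand-ins I made up so the example runs anywhere. The only thing borrowed from WebAudio is the shape of connect(), which returns the destination node so that calls can be chained.

```typescript
// Toy stand-in for an audio node (hypothetical, for illustration only):
// it just records which nodes it has been connected to.
class SketchNode {
  readonly label: string;
  readonly outputs: SketchNode[] = [];

  constructor(label: string) {
    this.label = label;
  }

  // Same shape as AudioNode.connect(): returns the destination for chaining.
  connect(destination: SketchNode): SketchNode {
    this.outputs.push(destination);
    return destination;
  }
}

// Follow the first output at each hop and list the labels along the way.
// Assumes the graph has no cycles.
function signalPath(start: SketchNode): string[] {
  const path = [start.label];
  let node: SketchNode | undefined = start.outputs[0];
  while (node !== undefined) {
    path.push(node.label);
    node = node.outputs[0];
  }
  return path;
}

const oscillator = new SketchNode("oscillator");
const lowPass = new SketchNode("low-pass filter");
const gain = new SketchNode("gain");
const destination = new SketchNode("destination");

// Wire the chain in one expression, the way you would with real nodes.
oscillator.connect(lowPass).connect(gain).connect(destination);

// In a browser, the equivalent with the real API would look something like:
//   const ctx = new AudioContext();
//   const osc = ctx.createOscillator();
//   const filter = ctx.createBiquadFilter();
//   const amp = ctx.createGain();
//   osc.connect(filter).connect(amp).connect(ctx.destination);
```

Calling signalPath(oscillator) walks the chain from the oscillator through the filter and gain node to the destination.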

The most surprising thing about Whitman is that it ended up being entirely built on web tech. When I started the project, I expected to move it over to a Chrome App at some point (it'll be taught on Chromebooks). There are still some places where that would have been nice (file retention, better support for saving data and synchronization), but for the most part it wasn't necessary at all. Believe it or not, you can pretty much write a basic audio app completely on the web these days, which is amazing.

In the parts where there is friction, it feels like a strong argument in favor of the Extensible Web. Take saving files, for example: without a "File Writer" object, Whitman does it by creating a link with a download attribute, base64-encoding the file into the href, and then programmatically clicking it when the user goes to save. That's a pretty crappy solution, because browsers only expose data URIs to create files. We need something lower-level, that can ask for permission to write locally, outside of a sandbox (especially now that the File System API is dead in the water).
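For the curious, the href half of that trick looks roughly like the sketch below. This is not Whitman's actual code: the function names, the MIME type, and the file name are all hypothetical, and the base64 encoder is hand-rolled only so the sketch runs outside a browser (in a page you'd more likely reach for the built-in btoa).

```typescript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode raw bytes as base64, three input bytes per four output characters.
function base64Encode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    // Bytes past the end of the input count as zero and are padded with "=".
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    out += BASE64_ALPHABET.charAt(b0 >> 2);
    out += BASE64_ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out += i + 1 < bytes.length
      ? BASE64_ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6))
      : "=";
    out += i + 2 < bytes.length ? BASE64_ALPHABET.charAt(b2 & 0x3f) : "=";
  }
  return out;
}

// The two attributes a download link needs (hypothetical helper type).
interface DownloadLink {
  href: string;
  download: string;
}

// Pack the file contents into a data URI and pair it with a file name.
function buildDownloadLink(
  bytes: Uint8Array,
  mimeType: string,
  fileName: string
): DownloadLink {
  return {
    href: "data:" + mimeType + ";base64," + base64Encode(bytes),
    download: fileName,
  };
}

// Example with three placeholder bytes standing in for real audio data.
const link = buildDownloadLink(
  new Uint8Array([77, 97, 110]),
  "audio/wav",
  "beat.wav"
);
// link.href is "data:audio/wav;base64,TWFu"
```

In a page, you would copy href and download onto an anchor element and call its click() method when the user hits save. The whole file has to fit in that one string, which is a big part of why this feels like a workaround.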

14:50 x permalink

January 15, 2015

Ponographic

This week, Neil Young finally made the dreams of heavy-walleted audiophiles a reality by releasing the PonoPlayer, a digital audio player that's specifically made for lossless files recorded at 192KHz. Basically, it plays master recordings, as opposed to the downsampled audio that ends up on CDs (or, god forbid, those horrible MP3s that all the kids are listening to these days). It's been a while since I've written about audio or science, so let's talk about why the whole idea behind Pono — or indeed, most audiophile nattering about sample rate — is hogwash.

To understand sample rates, we need to back up and talk about one of the fundamental theories of digital audio: the Nyquist limit, which says that in order to accurately record and reproduce a signal, you need to sample at a rate of at least twice the frequency of that signal. Above the limit, the sampler doesn't record often enough to preserve the variation of the wave, and the input "wraps around" the limit. The technical term for this is "aliasing," because the sampled wave becomes indistinguishable from a lower-frequency waveform. Obviously, this doesn't sound great: at a 10KHz sample rate, a 9KHz audio signal would wrap around and play in the recording as 1KHz, a transition in scale roughly the same as going from one end of the piano to another.
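To make the wrap-around concrete, here's a minimal sketch of the folding arithmetic in TypeScript. The function names are mine, invented for illustration, and it assumes an ideal sampler with no filter in front of it and a non-negative input frequency.

```typescript
// Highest frequency a given sample rate can represent without aliasing.
function nyquistLimit(sampleRateHz: number): number {
  return sampleRateHz / 2;
}

// Frequency at which an unfiltered input tone shows up after sampling.
// Assumes signalHz >= 0 and an ideal sampler (no anti-aliasing filter).
function aliasedFrequency(signalHz: number, sampleRateHz: number): number {
  // Sampled tones repeat every sampleRateHz, so reduce to one period first.
  const folded = signalHz % sampleRateHz;
  // Anything in the upper half of the period mirrors back below the limit.
  return folded > nyquistLimit(sampleRateHz)
    ? sampleRateHz - folded
    : folded;
}

// The example from the text: 9KHz sampled at 10KHz comes back as 1KHz.
const wrapped = aliasedFrequency(9000, 10000); // 1000

// A tone below the limit is left alone.
const untouched = aliasedFrequency(4000, 10000); // 4000

// The CD sample rate gives a limit of 22.05KHz.
const cdLimit = nyquistLimit(44100); // 22050
```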

To solve this problem, when digital audio came of age with CDs, engineers did two things. First, they put a filter in front of the sample input that filters out anything above the Nyquist limit, which keeps extremely high-frequency sounds from showing up in the recording as low-frequency noises. Secondly, they selected a sample rate for playback that would be twice the frequency range of normal human hearing, ensuring that the resulting audio would accurately represent anything people could actually hear. That's why CDs use 44.1KHz sampling: it gives you signal accuracy at up to 22.05KHz, which is frankly generous (most human hearing actually drops off sharply at around 14KHz). There's not very much point in playback above 44.1KHz, because you couldn't hear it anyway.

There's a lot of misunderstanding of how this works among people who consider themselves to be audiophiles (or musicians). They look at something like the Nyquist limit and what they see is information that's lost: filtered out before sampling, then filtered again when it gets downsampled from the high-resolution Pro Tools session (which may need the extra sample data for filtering and time-stretching). But truthfully, this is a glass-half-full situation. Sure, the Nyquist limit says we can't accurately record above 1/2 the sample rate — but on the other hand, below that limit accuracy is guaranteed. Everything that people can actually hear is reproduced in CD-quality audio.

This isn't to say that the $400 you'll pay for a PonoPlayer is a total scam. Although the digital-analog converter (DAC) inside it probably isn't that much better than the typical phone headphone jack, there are lots of places where you can improve digital audio playback with that kind of budget. You can add a cleaner amplifier, for example, so that there's less noise in the signal. But for most people, will it actually sound better? Not particularly. I think it's telling that one of their testimonials compares it to a high-end turntable — vinyl records having a notoriously high noise floor and crappy dynamic range, which is the polar opposite of what Pono's trying to do. You'd probably be better off spending the money on a really nice set of headphones, which will make a real difference in audio quality for most people.

I think the really interesting question raised by Pono is not the technical gibberish on their specifications page (audiophile homeopathy at its best), but rather to ask why: why is this the solution? Neil Young is a rich, influential figure, and he's decided that the industry problem he wants to solve is MP3 bitrates and CD sampling, but why?

I find Young's quest for clarity and precision fascinating, in part, because the rock tradition he's known for has always been heavily mediated and filtered, albeit in a way that we could generously call "engineered" (and cynically call "dishonest"). A rock recording is literally unnatural. Microphones are chosen very specifically for the flavor that they bring to a given instrument. Fake reverb is added to particular parts of the track and not to others, in a way that's not at all like live music. Don't even get me started on distortion, or the tonal characteristics of recording on magnetic tape.

The resulting characteristics that we think of as a "rock sound" are profoundly artificial. So I think it's interesting — not wrong, necessarily, but interesting — that someone would spend so much time on recreating the "original form" (their words) of music that doesn't sound anything like its live performance. And I do question whether it matters musically: one of my favorite albums of all time, the Black Keys' Rubber Factory, is a cheaply-produced and badly-mastered recording of performances in an abandoned building. Arguably Rubber Factory might sound better as MP3 than it does as the master, but the power it has musically has nothing to do with its sample rate.

(I'd still rather listen to it than Neil Young, too, but that's a separate issue.)

At the same time, I'm not surprised that a rock musician pitched and sold Pono, because it seems very much of that genre — trying to get closer to analog sound, because it came from an age of tape. These days, I wonder what would be the equivalent "quality" measurement for music that is deeply rooted in digital (and lo-fi digital, at that). What would be the point of Squarepusher at 192KHz? How could you remaster the Bomb Squad, when so much of their sound is in the sampled source material? And who would care, frankly, about high-fidelity chiptunes?

It's kind of fun to speculate if we'll see something like Pono in 20 years aimed at a generation that grew up on digital compression: maybe a ~~hypertext~~ hyperaudio player that can connect songs via the original tunes they both sample, and annotate lyrics for you a la Rap Genius? 3D audio, that shifts based on position? Time-stretching and resampling to match your surroundings? I don't know, personally. And I probably won't buy it then, either. But I like to think that those solutions will be at least more interesting than just increasing some numbers and calling it a revolution.

11:18 x permalink
