this space intentionally left blank

February 18, 2015

Filed under: tech»web

Speed Kills

It's an accepted truth on the web that fast pages are better for users — people stay on them longer, follow more links from them, and generally report being happier with them. I think a lot about performance on my projects, because I want readers to be thinking about the story, not distracted by slow load times.

Unfortunately, web performance has a bad rap, in part because it's a complicated topic. Making it work effectively and efficiently means learning a lot about how the browser runtime works, and optimizing for new techniques like GPU transforms. Like everything else in web development, there's also a lot of misinformation out there, and a lot of people who insist that everything was better back when we built pages without all the JavaScript and fancy-pants frameworks.

It's possible that I've just been more aware of it because I've been working on a project that involves smoothly animating a chart using regular HTML instead of canvas, but it seems like it's been a bad month for that kind of thing. First, Peter-Paul Koch wrote a diatribe about client-side templating, insisting that it's a needless performance hit. Then Flipboard wrote about discarding traditional elements entirely, instead rendering everything to a canvas tag in pursuit of 60 frames-per-second animations. Ironically, you'll notice that these are radically different approaches that both claim to create a better experience.

Instead of just sighing while the usual native app advocates use these posts to bash the web, and given that I'm working on a page where high-performance mobile animation is a key part of the experience, I thought it'd be nice to talk about some experiments I've run with the approaches found in both. There are a lot of places where the web platform needs help competing on mobile, no doubt. But I'd prefer we talk about actual performance problems, and not get sidetracked into chasing down scattered criticisms without evidence.

Let's start with templating, which is serving as a stand-in for client-side JavaScript in general. PPK argues that templating (and by extension, single-page app design) is terrible for performance, but is that true? While I was working on my graph, I worried a little bit about startup time. Since I write JavaScript on both the server and the client, it was pretty easy to port my code from one to the other and check. I personally found the results conclusive, and a little surprising.

The client-side version of the page weighed in at 10KB and spent roughly 35ms in JavaScript during startup, rendering the page and prepping its data structures. That's actually not bad for something that's doing some fairly heavy positioning and styling, and it fits in the 14KB first TCP round-trip recommended by Google. In contrast, the server-side page, in which all the markup was pre-rendered and then progressively enhanced after page load, was 160KB and spent about 30ms in JavaScript. In other words, following PPK's advice to avoid client-side templating caused the page to be sixteen times larger, and still required two video frames to start up.
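
For what it's worth, the measurement itself doesn't take much machinery. Here's a rough sketch of the kind of timing harness involved — the template logic below is a throwaway stand-in, not the actual chart code:

    // Time how long it takes to build and insert markup on the client.
    // The "template" here is a trivial stand-in for real chart code.
    var data = [];
    for (var i = 0; i < 160; i++) {
      data.push({ label: "Block " + i, value: Math.random() * 100 });
    }

    var start = performance.now();

    var html = data.map(function(d) {
      return '<div class="block" style="width:' + d.value + '%">' + d.label + '</div>';
    }).join("");
    document.body.insertAdjacentHTML("beforeend", html);

    console.log("startup JS: " + (performance.now() - start).toFixed(1) + "ms");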

Now, this is a slightly special case: unlike typical server applications, my news apps are useless without JavaScript. They're not RESTful, they don't talk to a database, they involve a lot of moving parts. But even I was surprised by how little impact client-side templating actually had. Browsers these days are just ridiculously fast at assembling HTML. So while I don't recommend doing the entire page this way, or abandoning server-generated HTML entirely, it's pretty clear to me that it's not the slam-dunk case that holdouts for traditional server rendering claim it is.

At the other extreme is Flipboard's experiment with canvas rendering. Instead of putting everything in the document, like normal websites, they put up a full-screen canvas image and render everything — text, images, animations, etc. — manually to that buffer. You can try a demo out on your device here. On my Nexus 5, which is a reasonably new device running the latest version of Chrome, it's noticeably choppy. My experience with canvas is that Chrome's implementation is actually much faster than Safari's, so I don't expect it to be smooth on iOS either (they've blacklisted tablets, so I can't be sure).

In order to get this "fluid" experience, here's what the Flipboard team threw away:

  • Accessibility: nothing on the page actually exists to a screen reader. Users that invert their screens or change the text size to make it easier to read are out of luck.
  • Copy and paste: since there's no document, there's nothing to select for these basic text operations.
  • Real links: you can't open pages in a new tab. You can't share them using the OS share panel. You can't do anything with the links, because they aren't really there.
  • GPU acceleration: by doing everything in canvas, they've ignored all the optimizations that browsers actually do to ensure a smooth, battery-efficient experience.
  • View source: inspecting the Flipboard page gives you no information at all, and the JavaScript is minified without source maps. The app is completely opaque to anyone who wants to learn from it.

The irony of doing all this work for a fluid experience that isn't actually fluid is that the kinds of animations they're doing — transform and opacity — are exactly the properties that browsers can animate at 60FPS. Much has been written about using GPU compositing for smooth animations, and this article is a great start. If Flipboard had stuck with the DOM and used the GPU fully, they probably could have had fluid animations without leaving all those other features behind.

That's an easy thing to say, but is it true? Here's another experiment from my stacked bar chart: when a toggle is pressed, the chart shifts from being a measure of absolute numbers to relative proportions, with each bar smoothly animating up to 100%. I'm using an adaptation of Paul Lewis' FLIP technique, in which animations are set in JavaScript but run via CSS transitions. In my case, each of the 160+ blocks is measured, assigned a transform to "freeze" it in place as a new GPU layer, then transitioned to its final position with a second transform and "thawed" back into a regular, responsive element.
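
Stripped of the chart-specific details, that process looks roughly like the sketch below. The selectors, class names, and timing are hypothetical, but the measure/invert/play structure is the point:

    // Measure/invert/play, per Paul Lewis' FLIP pattern. The selectors and
    // transition timing are hypothetical; the real chart does more bookkeeping.
    var blocks = [].slice.call(document.querySelectorAll(".block"));

    // First: record where every element currently sits.
    var firstRects = blocks.map(function(el) {
      return el.getBoundingClientRect();
    });

    // Change the layout (here, by toggling a class on the chart container).
    document.querySelector(".chart").classList.toggle("percentage-mode");

    blocks.forEach(function(el, i) {
      // Last: measure the new position. Invert: transform the element back
      // to where it started, which promotes it to its own GPU layer.
      var first = firstRects[i];
      var last = el.getBoundingClientRect();
      var dx = first.left - last.left;
      var dy = first.top - last.top;
      var sy = first.height / last.height;
      el.style.transformOrigin = "0 0";
      el.style.transition = "none";
      el.style.transform = "translate(" + dx + "px," + dy + "px) scaleY(" + sy + ")";
      // Force a style flush so the "frozen" position actually takes effect.
      el.getBoundingClientRect();
      // Play: hand the animation off to a CSS transition, then thaw afterward.
      el.style.transition = "transform 400ms ease-out";
      el.style.transform = "";
      el.addEventListener("transitionend", function thaw() {
        el.removeEventListener("transitionend", thaw);
        el.style.transition = "";
      });
    });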

Even though I'm animating many more elements than Flipboard is doing in their demo, the animation is perfectly smooth on my Nexus 5, and on the aging iPad we use for testing. By doing all the hard computational work up front, and then handing the pre-computed transitions over to the browser, I'm actually not JavaScript-bound at all: everything is done on the graphics chip, and in the C++ compositing layer. The result is a smooth 60FPS during the animation, all done via regular DOM elements. So much for "If you touch the DOM in any way during an animation you’ve already blown through your 16ms frame budget."

Again, I'm not claiming that my use case is a perfect analogue. I'm animating a graphic in response to a single button press, and they're attempting to create an "infinite scroll" (sort of — it's not really a scroll so much as an animated pager). But this idea that "the DOM is lava" and touching it will cause your readers' phones to instantly burst into flames seems patently ridiculous, especially when we look back at that list of everything that was sacrificed in the single-minded pursuit of speed.

Performance is important, and I care deeply and obsessively about it. As a gamer and a graphics nerd, I love tweaking out those last few frames per second, or adding flashy effects to a page. But it's not the most important thing. It's not more important than making your content available to the blind or visually impaired. It's not more important than providing standard UI actions like copy-and-paste or "open in new tab." And it's not more important than providing a fallback for older and less powerful devices, the kind used by poorer readers. Let's keep speed in perspective on the web, and not get so caught up in dogma that we abandon useful techniques like client-side templating and the DOM.

February 5, 2015

Filed under: journalism»industry

What is Data Journalism?

This week, if you want to be horrified by our grim meathook future, check out these posts from Seattle Times news librarian Gene Balk on vaccination rates at Washington State schools. There's a searchable data table and a map, but I'll spoil it for you: a large proportion of parents should probably pack surgical masks and antibiotics with their kids' lunches, because herd immunity is basically a thing of the past.

This kind of database-driven reporting is a staple of Gene's "FYI Guy" blog, and readers seem to enjoy it. Done right, it can help flesh out local coverage in interesting ways, explore topics that are off the beaten path, and find connections that we might otherwise miss. That said, I don't think you can stress enough how much of that depends on the quality of the reporter: Gene is a great researcher, and not everyone has his skills and experience.

By coincidence, yesterday Melissa Bell at Vox announced that they're (re)entering the field of data journalism in an almost parodically-titled post. I'm a little confused about the timing, since I thought data journalism was part of their whole raison d'être, but maybe I'm confusing them with a different scrappy, SEO-oriented news startup. Regardless, welcome to the party! After name-checking Philip Meyer's Precision Journalism, Bell adds a list of nine basic guidelines they plan to use. It's not a bad list, although several items are inoffensively bland (has anyone ever aspired to produce content that isn't "relevant and useful"?).

  1. Vox will work to provide the most relevant and useful data behind the news, when you need it, in ways that help you understand the stories that matter most.
  2. We will work to make all the data behind our stories available to you to download and play with for yourself.
  3. We want you to improve on what we’ve done, to play with the data, visualize it, and help us analyze it — and make our work better.
  4. We will prioritize building data sets that can feed many stories, rather than focusing on one-off projects.
  5. Our data visualizations will be clear, concise, and deep — to help you understand our editorial better. They will adhere to design rules which ensure their accuracy and transparency.
  6. In the event we make a mistake (they do happen), we will swiftly and clearly clarify, correct, and communicate that as transparently as we can.
  7. We will curate and showcase the best data infographics and visualizations on the web.
  8. Visualizations we produce in-house will work well on as many platforms as possible: if you view it on a smartphone, it will function as well as it does on web.
  9. We will curate and publish the best content that our community of readers produces. Our data journalism is as much about you, the community, as it is about us: this is a partnership.

Some of these goals are particularly strong, and we share them at the Seattle Times. Take #2, for example: not only do I think it's important that we publish the data on which our visualizations are built whenever possible, but we also open-source our graphics so that people can see the methodology we used. It's also just good sense to be mobile-friendly (#8), although I personally believe that there are some times when a story simply can't be fully told on a 4" screen.

I'm less sure about curation, either from readers (#9) or around the web (#7), particularly in conjunction with accuracy and corrections (#6). One of the strengths of a newsroom is supposed to be fact-checking, but it's not clear to me what the process is for verification of third-party visualizations, or if Vox plans to do so at all (it hasn't been evident to me as a reader that they do it now). Which is too bad, because I think a kind of real-time "Snopes for bad reporting" is a site I'd definitely support.

But I'm really most skeptical of #4, which Bell elsewhere refers to as "finding, cleaning, and setting up data streams so that they can be the source for repeated stories." It's not that I think it's necessarily a stupid idea. I'm just not sure that it's effective, based on my experience. Data stories are just reporting. Data streams are reporting on top of engineering on top of reporting.

CQ's Economy Tracker, for example, was my team's attempt at a reusable data API, but it turned out to be a frustrating experience to keep it topped off with up-to-date content, the architecture was a hard problem to solve, and the number of stories we pulled out of it probably didn't justify the effort. It turns out that it's hard to find a data set that can actually support a series of articles.

(You may say, at this point, hang on a minute: wasn't Congressional Quarterly an example of exactly what we're talking about? It's a large, data-oriented news organization that sold access to data streams, and maintained datasets that were used to build stories and interactives via the multimedia team. Which is true, but it elides a number of factors: CQ was a single-purpose news site — congress and legislation only — with a huge number of reporters feeding the beast and a large technical staff to tend to it. Vox does not have those advantages, since it's a general-audience, international news site with a much smaller staff.)

More importantly, a "data stream," like an API, demands maintenance, which quickly becomes a drag on the amount of time that can be spent on efforts outside those streams. That's doubly true if you make them public and people start relying on them. Will Vox sunset these data streams if they stop being useful internally? What are the cutoff criteria? How will they let people know before the source is shut down? Most importantly, how much time will be taken away from reporting to maintain the data products?

When I joined the Seattle Times, I made a pitch to editors that was a little different: instead of designing long-running services, we generally build news apps that are scoped to a specific point in time. In other words, we make stories, the same as the rest of the newsroom does. And just as you wouldn't normally ask a reporter to go back and update all their old stories when new events happen, we don't maintain news apps more than a week or two after publication (barring, of course, normal corrections and serious bug fixes). Our entire development stack, in fact, is based on this assumption — that's why we publish static files to S3 (which is cheap and easy), instead of running a Rails/Laravel/Node server (which is expensive and hard).

Maybe for Vox, this isn't a problem. After all, they're the people with the "poor man's Wikipedia" card stacks that they maintain for topics over many months, and the evergreen experiments. At the very least, though, it does highlight a very real distinction that goes (in my opinion) beyond "data journalism" and to the core of the digital news mission. Are we building general systems and tools to cover unique stories? Or are we optimizing for semi-predictable products built around APIs and data sources? I'm leaning toward the former because I think it's a better match for a messy, unpredictable, human world. But best of luck to Vox with the latter.

January 30, 2015

Filed under: journalism»new_media

Bowled Over

While we've still got plenty of interesting projects in the works, the Seahawks rampage into the Super Bowl has pretty much taken over the News Apps budget at the Seattle Times this week. As a result, we've got some interesting interactives you might have seen:

  • The Seahawks personality quiz and I-90 trivia quiz were put together by a new member of the team, and proved extremely popular. I suspect we'll be working on a <pop-quiz> web component soon enough as a result.
  • Top 12 Moments of the Seahawks season was a super-fast project that I tossed together this week, collecting the favorite plays of our knowledgeable football pundits along with some of our best sports photos.
  • (update: Jan. 30) Super Bowl Pick'em improves on our serverless infrastructure from the fan map to let people make their own score predictions (and read predictions by columnists and celebrities) even while hosting on S3.
  • (update: Jan. 30) Super Bowl Matchups is another scroll-oriented presentation, this time focusing on comparisons between the coaches, quarterbacks, and cornerbacks who are playing on Sunday. After this, I don't want to see another parallax layout until July.

Additionally, you may have seen that the Seahawks have launched I'm In, a fan map that bears a suspiciously close resemblance to our Hawks fan map. Personally, I prefer to think that since imitation is the sincerest form of flattery, a page like that is just Richard Sherman's way of letting me know how great I am.

More to come, obviously, as the road to the Super Bowl continues! Or, as Marshawn Lynch would say, "Yeah."

January 22, 2015

Filed under: music»tools

Whitman

Whitman is a simple sampler (womp womp) written for modern web browsers. Built in Angular, it uses the WebAudio API to load and play sound files via a basic groovebox interface. You can try a demo on GitHub Pages. I put Whitman together for my dad's elementary school classes, so it's pretty simple by design, but it was a good learning experience.

The WebAudio API is not the worst new interface I've ever seen in a browser, but it's pretty bad. Some of its problems are just weird: for example, audio source nodes are one-shot, and have to be created fresh each time you want to play a sound, which seems like a great way to trigger garbage collection and cause stuttering. Loading audio data is also kind of obnoxious, but at least you only have to do it once. I really wanted to be able to save the audio files in local storage so that they'd persist between refreshes, but getting access to the buffer (at least from the console/debugger) was oddly difficult, and eventually I just gave up.
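
For anyone who hasn't used it, the basic load-once, play-many pattern looks something like this — a minimal sketch, not Whitman's actual code, and the file path is made up:

    // Decode a sample once, then create a fresh source node per playback.
    var context = new (window.AudioContext || window.webkitAudioContext)();
    var sampleBuffer = null;

    var xhr = new XMLHttpRequest();
    xhr.open("GET", "sounds/kick.wav", true);
    xhr.responseType = "arraybuffer";
    xhr.onload = function() {
      context.decodeAudioData(xhr.response, function(decoded) {
        sampleBuffer = decoded;
      });
    };
    xhr.send();

    function play() {
      if (!sampleBuffer) return;
      // Source nodes are single-use: one is created per trigger and discarded.
      var source = context.createBufferSource();
      source.buffer = sampleBuffer;
      source.connect(context.destination);
      source.start(0);
    }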

But parts of it are genuinely cool, too. The API is built around wiring together nodes as if they were synthesizer components — an oscillator might get hooked up to a low-pass filter, then sent through a gain node before being mixed into the audio context — which feels pleasantly flexible. I'd like to put together a chiptune tracker with it. Support is decent, too: the mobile browsers I care about (Safari and Chrome) already have it, and IE support is on the way.
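
For example, the kind of wiring described above might look like this sketch (nothing from Whitman, just an illustration of the node graph):

    // Oscillator -> low-pass filter -> gain -> output, wired like synth modules.
    var ctx = new (window.AudioContext || window.webkitAudioContext)();

    var osc = ctx.createOscillator();
    osc.type = "square";
    osc.frequency.value = 220; // A3

    var filter = ctx.createBiquadFilter();
    filter.type = "lowpass";
    filter.frequency.value = 800;

    var gain = ctx.createGain();
    gain.gain.value = 0.5;

    osc.connect(filter);
    filter.connect(gain);
    gain.connect(ctx.destination);

    osc.start();
    osc.stop(ctx.currentTime + 1); // play for one second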

The most surprising thing about Whitman is that it ended up being entirely built on web tech. When I started the project, I expected to move it over to a Chrome App at some point (it'll be taught on Chromebooks). There are still some places where that would have been nice (file retention, better support for saving data and synchronization), but for the most part it wasn't necessary at all. Believe it or not, you can pretty much write a basic audio app completely on the web these days, which is amazing.

In the parts where there is friction, it feels like a strong argument in favor of the Extensible Web. Take saving files, for example: without a "File Writer" object, Whitman does it by creating a link with a download attribute, base64-encoding the file into the href, and then programmatically clicking it when the user goes to save. That's a pretty crappy solution, because browsers only expose data URIs to create files. We need something lower-level, that can ask for permission to write locally, outside of a sandbox (especially now that the File System API is dead in the water).
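
Roughly, the workaround looks like this sketch — the filename and MIME type are just for illustration:

    // Build a data URI, attach it to a link with a download attribute, and
    // click the link programmatically. (btoa only handles Latin-1 text, which
    // is fine for a sketch like this.)
    function saveFile(filename, text) {
      var a = document.createElement("a");
      a.download = filename;
      a.href = "data:application/json;base64," + btoa(text);
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
    }

    saveFile("patch.json", JSON.stringify({ samples: [] }));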

January 15, 2015

Filed under: music»recording

Ponographic

This week, Neil Young finally made the dreams of heavy-walleted audiophiles a reality by releasing the PonoPlayer, a digital audio player that's specifically made for lossless files recorded at 192KHz. Basically, it plays master recordings, as opposed to the downsampled audio that ends up on CDs (or, god forbid, those horrible MP3s that all the kids are listening to these days). It's been a while since I've written about audio or science, so let's talk about why the whole idea behind Pono — or indeed, most audiophile nattering about sample rate — is hogwash.

To understand sample rates, we need to back up and talk about one of the fundamental theorems of digital audio: the Nyquist limit, which says that in order to accurately record and reproduce a signal, you need to sample at at least twice the frequency of that signal. Above the limit, the sampler doesn't record often enough to preserve the variation of the wave, and the input "wraps around" the limit. The technical term for this is "aliasing," because the sampled wave becomes indistinguishable from a lower-frequency waveform. Obviously, this doesn't sound great: at a 10KHz sample rate, a 9KHz audio signal would wrap around and play back in the recording as 1KHz — a transition in scale roughly the same as going from one end of the piano to the other.
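
If you want the fold-back as arithmetic, it's a one-liner (a sketch that only covers signals between the Nyquist limit and the sample rate):

    // A tone above the Nyquist limit folds back below it.
    function aliasedFrequency(signal, sampleRate) {
      var nyquist = sampleRate / 2;
      return signal <= nyquist ? signal : sampleRate - signal;
    }

    aliasedFrequency(9000, 10000); // 1000 — a 9KHz tone plays back as 1KHz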

To solve this problem, when digital audio came of age with CDs, engineers did two things. First, they put a filter in front of the sample input that filters out anything above the Nyquist limit, which keeps extremely high-frequency sounds from showing up in the recording as low-frequency noises. Secondly, they selected a sample rate for playback that would be twice the frequency range of normal human hearing, ensuring that the resulting audio would accurately represent anything people could actually hear. That's why CDs use 44.1KHz sampling: it gives you signal accuracy at up to 22.05KHz, which is frankly generous (most human hearing actually drops off sharply at around 14KHz). There's not very much point in playback above 44.1KHz, because you couldn't hear it anyway.

There's a lot of misunderstanding of how this works among people who consider themselves to be audiophiles (or musicians). They look at something like the Nyquist limit and what they see is information that's lost: filtered out before sampling, then filtered again when it gets downsampled from the high-resolution Pro Tools session (which may need the extra sample data for filtering and time-stretching). But truthfully, this is a glass-half-full situation. Sure, the Nyquist limit says we can't accurately record above 1/2 the sample rate — but on the other hand, below that limit accuracy is guaranteed. Everything that people can actually hear is reproduced in CD-quality audio.

This isn't to say that the $400 you'll pay for a PonoPlayer is a total scam. Although the digital-analog converter (DAC) inside it probably isn't that much better than the typical phone headphone jack, there are lots of places where you can improve digital audio playback with that kind of budget. You can add a cleaner amplifier, for example, so that there's less noise in the signal. But for most people, will it actually sound better? Not particularly. I think it's telling that one of their testimonials compares it to a high-end turntable — vinyl records having a notoriously high noise floor and crappy dynamic range, which is the polar opposite of what Pono's trying to do. You'd probably be better off spending the money on a really nice set of headphones, which will make a real difference in audio quality for most people.

I think the really interesting question raised by Pono is not the technical gibberish on their specifications page (audiophile homeopathy at its best), but rather to ask why: why is this the solution? Neil Young is a rich, influential figure, and he's decided that the industry problem he wants to solve is MP3 bitrates and CD sampling, but why?

I find Young's quest for clarity and precision fascinating, in part, because the rock tradition he's known for has always been heavily mediated and filtered, albeit in a way that we could generously call "engineered" (and cynically call "dishonest"). A rock recording is literally unnatural. Microphones are chosen very specifically for the flavor that they bring to a given instrument. Fake reverb is added to particular parts of the track and not to others, in a way that's not at all like live music. Don't even get me started on distortion, or the tonal characteristics of recording on magnetic tape.

The resulting characteristics that we think of as a "rock sound" are profoundly artificial. So I think it's interesting — not wrong, necessarily, but interesting — that someone would spend so much time on recreating the "original form" (their words) of music that doesn't sound anything like its live performance. And I do question whether it matters musically: one of my favorite albums of all time, the Black Keys' Rubber Factory, is a cheaply-produced and badly-mastered recording of performances in an abandoned building. Arguably Rubber Factory might sound better as MP3 than it does as the master, but the power it has musically has nothing to do with its sample rate.

(I'd still rather listen to it than Neil Young, too, but that's a separate issue.)

At the same time, I'm not surprised that a rock musician pitched and sold Pono, because it seems very much of that genre — trying to get closer to analog sound, because it came from an age of tape. These days, I wonder what would be the equivalent "quality" measurement for music that is deeply rooted in digital (and lo-fi digital, at that). What would be the point of Squarepusher at 192KHz? How could you remaster the Bomb Squad, when so much of their sound is in the sampled source material? And who would care, frankly, about high-fidelity chiptunes?

It's kind of fun to speculate if we'll see something like Pono in 20 years aimed at a generation that grew up on digital compression: maybe a hypertext hyperaudio player that can connect songs via the original tunes they both sample, and annotate lyrics for you a la Rap Genius? 3D audio, that shifts based on position? Time-stretching and resampling to match your surroundings? I don't know, personally. And I probably won't buy it then, either. But I like to think that those solutions will be at least more interesting than just increasing some numbers and calling it a revolution.

December 16, 2014

Filed under: journalism»professional

Sell Block

This week The Seattle Times published a piece I've been looking forward to all year: Sell Block details the way that the Washington correctional industries have fallen down on the promises that they made over three decades ago. I did the logo design for this series, after daydreaming at the bus stop one day, and I also put together the maps for it.

From a development perspective, this project is also noteworthy as a test case for the custom elements that I've been pushing hard at the Times. Both maps are embedded through <responsive-frame> elements, which have supplanted our use of Pym.js, and the statewide map was built on top of <leaflet-map>. Custom elements continue to be a phenomenal way to develop and deploy web apps. In fact, I have an article up on Source, the OpenNews blog, about how we used them in the elections and in this project, and calling for more news developers to adopt them.

November 26, 2014

Filed under: journalism»new_media

Ex Post Facto

I had hoped, personally, that we were past the app craze in newsrooms, particularly since the New York Times (eternally the canary in the coalmine for the rest of the industry) started killing off its unsuccessful subscription apps. But there's a sucker born every minute, and this time it's the Washington Post, which is launching a special Kindle app, the main goal of which seems to be to remind everyone that Jeff Bezos bought the Post last year:

The app, which was designed to reduce the noise of the Web to something as streamlined as a print publication, will be automatically added to certain Kindle Fire tablets as part of a software update. It will feature two editions each day, at 5 a.m. and 5 p.m. Eastern time, when the company believes it will reach the most readers.

Two things stick out in that paragraph: first, the "noise of the Web," as though that's a thing that exists in and of itself and not a product of newspaper websites being largely assembled at the whims of a huge number of competing interests (advertising, editorial, IT, etc.). If your web site is noisy, it's because you made it that way, and maybe you should fix it instead of launching a new platform.

Second, two editions? In a world of twenty-four hour online news, someone's making a digital news publication that updates (with exceptions for breaking events) twice a day? That's not a strategy for reaching readers, it's a sop to a print-oriented workflow that has to produce distinct physical "products" instead of a stream of content. It's not like they have to lay out a page, so what's the point? Why make people wait?

I expect this thing to go the way of The Daily within a year, quietly killed when the Post announces some new shiny object, probably. Of course, as a long-time web partisan, I think launching another native journalism app is a silly move anyway. The reasons for this are well-rehearsed and familiar: ease of production, greater audience reach, and creating a single path for content. But ultimately what makes native news apps fail is that they can't interoperate with other services the way the web can.

A lot of ink has been spilled about the NYTimes innovation report, for better or worse, but one of the big takeaways for me was the graph on page 23 comparing home page visitors to page views. The Times has seen no real drop-off in overall traffic, but the number of people seeing the home page has dropped by half over the last two years alone. And the reason for this is simple: most people don't go looking for journalism anymore. It finds them instead, when stories are shared through Facebook and Twitter and (less and less) RSS/Atom feeds.

Whether you're thinking of your app as a new home page or as a new publishing platform entirely, this trend seems equally grim — a choice between apathy or obscurity. It's probably possible, somehow, to make an app share to Facebook or Twitter. But it's never going to be as quick, as smooth, or as easy as sharing to those services via a simple URL. As much as anything else, this dooms native news apps from the start: if users can't share your content, it might as well be stored in a sealed vault. If you make the app share a web link as a workaround, everyone ends up on the site anyway, so why bother creating the app in the first place?

(Incidentally, this is why the line tossed around by some pundits that "native apps are too on the web, because they use HTTP" is nonsense. Does your native app have a front-facing URL? Can I link someone to a specific page in your app? No? Then it's not on the web.)

Don't get me wrong: I'm not necessarily sanguine about this state of affairs. The increasing role of social media in discovery and spread of journalism is worrying, from the silencing effects to the loss of control for publishers. One day I'd like to think we'll be out from underneath Facebook's thumb, or anyone else seeking to wall off the web until it pays up. We need better solutions for that problem, ones that don't make us sharecroppers on anyone else's land.

Meanwhile, however, this is the world we live in: the social networks dominate, and ultimately they run on URLs, not on binary blobs stored in a native bundle. Publishing two gimmicky "editions" a day through a fancy app, on a device that relatively few people use, is not going to change that anytime soon. If you want people to read your news, it had better be on the (sharable, linkable, endlessly flexible) web.

November 12, 2014

Filed under: tech»web

grunt-init component

Last week, I wrote a little bit about using custom elements for our election pages. Being able to interact with SVG maps through a simple DOM interface, while still annoying (it's SVG, after all), is miles more pleasant than working with the tags directly. At the end of that post, I recommended that newsrooms thinking about creating new JavaScript libraries look into Web Components — or at least custom elements. This week, I've got a way to make good on that pitch.

Similar to our news app template, I've put together a Grunt scaffolding for creating bundled custom elements, including HTML templating and CSS, all in a single standalone file. It's our component template — or, as I like to call it, the Poor Journalist's Polymer.

As with the app template, I'm developing the component scaffolding by building projects with it and then integrating the improvements back in. The first is a responsive-frame element that serves as a smaller, easier-to-use replacement for NPR's Pym. I like Pym, and I've used it in several projects now, but it's a little buggy and the setup process is cumbersome. In contrast, the custom elements don't require any JavaScript skills: just include the script to start using them on the page, and they'll connect up with the child elements on the other side of the iframe automatically.

My second testbed project is a Leaflet map element that uses custom HTML to set the map configuration without ever writing a line of JSON (unless you really want to). It's intended to make mapping simple and fast for web producers, while still offering plenty of power for people like me who just want the boilerplate out of the way. Leaflet's a great candidate for this kind of declarative approach, and I think this is a really promising demo for the power of custom elements.

For standalone components like these, the template seems to be working well. I haven't yet solved the problem of easily embedding them in highly opinionated news apps, due to the way that dependencies are handled. It's useful for custom elements to be able to bundle their CSS and other assets into their package, similar to the way that HTML imports and shadow root offer embedded styles, but that means they may not integrate well into projects that already have their own build system. As far as I can tell, the best solution for now will probably be to load the packages from Bower and require() the standalone files from its build directory, which should work with whatever module system you like.

But to be clear, the component template isn't really intended to solve those problems. Its goal is to simplify and modernize the kinds of scripts that, even now, people tend to solve with a jQuery plugin. I'd like to change that, so that more newsrooms produce reusable HTML elements instead of JavaScript spaghetti code. If you build something interesting with the component template, or if it inspires you to make your own, please let me know!

November 5, 2014

Filed under: journalism»new_media

Election Elements

You might have heard that there was an election this last week. Like every news organization, The Seattle Times had a live results page, powered by a Node-based scraper. It did pretty well: we had no glitches with pulling results, and the response has been solid. It also generated the source data for the print edition. Oh, and we put bunting on the front page, which is not something you get to do every day.

Behind the scenes, however, that results page has another interesting feature: as far as I'm aware, it's the first use of Web Components (at least, the custom elements part) in production by a news organization. Each of the Washington maps on the page is a custom-built <svg-map> element, which handles loading the image document and provides a set of convenience methods for manipulating the map once it's available.

SVG is one of those technologies that I really want to like, but has always been a total pain to actually use. It's an annoying format to author, doesn't seem to actually save any space compared to bitmap images, and has a ton of edge cases even in "standard" browsers (for example, Chrome will forget the state of an SVG document inside an object tag if that tag or its parents are set to display: none). Wrapping it up in a component that would manage its lifecycle and quirks for me just seemed like a no-brainer.

To create the component, I used Andrea Giammarchi's registerElement() shim instead of Polymer's polyfill layer — Giammarchi's script only shims the custom element portion of Web Components, but it works all the way back to IE9 and (more importantly) is only 2KB. On top of that, I used RSVP.js to create a quick shared cache for SVG source documents, ICanHaz for my templating, and a custom module called Savage to do SVG class/style manipulation.
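
To give a flavor of the v0 API that the shim provides, here's a heavily simplified sketch of how an element like <svg-map> might be registered. This is not the production code — the real element layers in the RSVP cache, templating, and Savage mentioned above:

    // Define a bare-bones custom element with the v0 registerElement() API.
    var proto = Object.create(HTMLElement.prototype);

    proto.createdCallback = function() {
      var self = this;
      // Expose a promise that resolves once the SVG source has been loaded.
      self.ready = new Promise(function(resolve) {
        var xhr = new XMLHttpRequest();
        xhr.open("GET", self.getAttribute("src"), true);
        xhr.onload = function() {
          self.innerHTML = xhr.responseText;
          resolve(self);
        };
        xhr.send();
      });
    };

    // Convenience method for painting: run a callback against every path.
    // (The real element also wires up hover/tooltip behavior.)
    proto.eachPath = function(callback) {
      var paths = this.querySelectorAll("path");
      Array.prototype.forEach.call(paths, callback);
    };

    document.registerElement("svg-map", { prototype: proto });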

From the outside, however, you don't need to know any of that. Instead, the interface is simple:

  1. Write a map element into the page, with a src attribute pointing to the SVG file you want to load. Put your tooltip template inside the tag.
  2. Attach callbacks to the element's ready promise to run code once the image is on the page.
  3. Use the eachPath method on the element to do painting, and set the onhover callback to pass in data for templating in the tooltip.

Using these maps, in other words, is basically just like using a regular element, if regular elements had a DOM API that wasn't written by psychopaths. All their complexity is tucked away inside, and what they present externally is clean, simple, and self-contained. The map element does 80% of what Pro Publica's Landline does, and I'd argue it does it better.
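
In code, consuming the element looks something like this rough sketch. The method names follow the description above, the data and attributes are hypothetical, and it assumes the element definition script is already on the page:

    // Hypothetical results data, keyed by the county FIPS codes in the SVG.
    var results = {
      "53033": { winner: "approved", yes: 68, no: 32 },
      "53061": { winner: "rejected", yes: 45, no: 55 }
    };

    var map = document.createElement("svg-map");
    map.setAttribute("src", "counties.svg");
    document.querySelector(".results").appendChild(map);

    map.ready.then(function() {
      // Paint each county path based on the results.
      map.eachPath(function(path) {
        var county = results[path.getAttribute("data-fips")];
        if (county) path.classList.add(county.winner);
      });
      // Hand data to the tooltip template when a path is hovered.
      map.onhover = function(path) {
        return results[path.getAttribute("data-fips")];
      };
    });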

As a developer, I'm really excited by the potential of these new custom elements. Although I had used them at ArenaNet for building the new Guild Wars 2 trading post, those were used to create tight integration with the in-game interface, and only needed to work in a single browser. This is the first time I've used them in a wider ecosystem, and they worked like a charm.

But as a library consumer, and particularly as a harried newsroom dev, I think web components have tremendous potential to make complex behavior way easier to build and train for. Take the aforementioned Landline, for example: wouldn't it be nice to simply include a script tag (or an HTML import) and then be able to write <landline-map> tags into the page, with an attribute pointing to a CSV or a Google Sheet containing the necessary data? Or consider Pym, NPR's responsive iframe library that's so great I forked and rewrote big chunks of it. Right now, using Pym on the parent page requires including the script, adding a dummy element, and then initializing the script — why shouldn't it just be <pym-embed> instead?

Distributing libraries not as modules or loose scripts, but as chunks of new HTML functionality, has the potential to radically change how we create new content on the web in the future. Newsrooms, which are always under pressure and often consume "pre-made" tools for interactive elements like timelines and galleries, are a perfect use-case for Web Components. After this election experience, I'm planning to lean heavily on them whenever possible, and I'm hoping other people will as well.

October 21, 2014

Filed under: journalism»investigation

Loaded with lead

I'm very proud to say that "Loaded with lead," a Seattle Times investigation into the ways that gun ranges poison their customers and workers, went live this weekend. I worked on all four interactives for this project, as well as doing the header design and various special effects. We'll have a post up soon on the developer blog about those headers, but what I'd like to talk about today is one particular graphic — specifically, the string-of-pearls chart from part 2.

The data underlying the pearl chart is a set of almost 300 blood tests. These are not all tests taken by range workers in Washington, just the ones that had to be reported after exceeding the safe threshold of 10 micrograms per deciliter. Although we know who some of the tested workers are, most of them are identified only by an anonymous patient ID and the name of their employer. My first impulse was to simply toss the data into a scatter chart, but as is often the case, that first impulse proved ill-advised:

  • Although the tests are taken from a ten-year period, for any given employer/employee the tests tend to be more compact in timeframe, which makes it tough to look at any series without either losing the wider context, or making it impossible to see the individual tests.
  • There aren't that many workers, or even many ranges, with lots of test results. It's hard to draw a trend when the filtered dataset might be composed of only a handful of points. And since within a range the tests might be from one worker or from many, they can't really be meaningfully compared.
  • Since these tests are only of workers that exceeded the safe limit, even those trends that can be graphed do not tell a good story visually: they usually show a high exposure, followed by gradually lowered lead levels. The impression given is that gun ranges are becoming safer, but the truth is that workers with hazardous blood lead levels undergo treatment and may be removed from the high-lead environment, resulting in lowered test results but not necessarily a lead-free workplace. It's one of those graphs that's "technically correct," but is actually misleading.

Talking with reporters, what emerged was that the time dimension was not really important to this dataset. What was important was to show that there was a repeated pattern of negligence: that these ranges posted high numbers repeatedly, over long periods of time (in several cases, more than five years). Once we discard a strict time axis, a lot more interesting options open up to us for data visualization.

One way to handle this would be with a traditional box-and-whiskers plot, which shows the median and variation within a statistical set. Unfortunately, box plots are also wonky and weird-looking for most readers, who are not statisticians and would not know a quartile if it offered them a grilled cheese sandwich. So one prototype pared the box plot down to its simplest form — probably too simple: I rendered a bar spanning the lowest to the highest test result for each gun range, with individual test results marked as lines inside that bar.

This version of the plot was visually interesting, but it had flaws. It made it easy to see the general level of blood tests found at each range, and to compare gun ranges against each other, but it didn't show concentration. Since a single tick mark was shown within the bar no matter how many test results fell at a given level, there was little visual difference between two employers with the same range of test results, even if one employer's results were mainly at the top of the range and the other's were clustered at the bottom. We needed a way to show not only the level, but also the distribution, of results.

Given that the chart was already basically a number line, with a bar drawn from the lowest to the highest test result, I removed the bar and replaced the tick marks with circles that were sized to match the number of test results at each amount. Essentially, this is a histogram, but I liked the way that the circles overlapped to create "blobs" around areas of common test results. You can immediately see where most of the tests fall for each employer, but you don't lose sight of the overall picture (which in some cases, like the contractors working outside of a ventilation hood at Wade's, can be horrific — almost three times the amount considered dangerous by the CDC). I'm not aware of anyone else who's done this kind of chart before, but it seems too simple for me to be the first to think of it.
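
Mechanically, the pearl chart boils down to something like the sketch below, with made-up data and scale factors — the real version handles labels, color, tooltips, and layout:

    // Made-up blood test results in micrograms per deciliter.
    var results = [12, 12, 15, 15, 15, 22, 30, 30, 48];

    // Count how many tests fall at each level.
    var counts = {};
    results.forEach(function(value) {
      counts[value] = (counts[value] || 0) + 1;
    });

    var svgNS = "http://www.w3.org/2000/svg";
    var svg = document.createElementNS(svgNS, "svg");
    svg.setAttribute("width", 600);
    svg.setAttribute("height", 40);
    document.body.appendChild(svg);

    var xScale = 10; // pixels per microgram/deciliter
    var rScale = 4;  // base radius in pixels

    Object.keys(counts).forEach(function(value) {
      var circle = document.createElementNS(svgNS, "circle");
      circle.setAttribute("cx", +value * xScale);
      circle.setAttribute("cy", 20);
      // Scale by the square root so area, not radius, tracks the count.
      circle.setAttribute("r", Math.sqrt(counts[value]) * rScale);
      circle.setAttribute("fill", "rgba(200, 0, 0, 0.6)");
      svg.appendChild(circle);
    });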

I'd like to take a moment here to observe that pretty much all data visualization comes down to translating information into a form that our visual systems are evolved to quickly understand. There's a great post on how that translation functions here, with illustrations that show where each arrangement sits on a spectrum of perceived accuracy and meaning. It's not rocket science, but I think it's a helpful perspective: I'm just trying to trick your visual cortex into absorbing a table's worth of data at a glance.

But what I've been trying to stress in the newsroom from this example is less technical, and more about how much effective digital journalism comes from the simple process of iteration and self-evaluation. We shouldn't expect to come up with a brilliant interactive on the first try every time, or even any of the time. I think the string-of-pearls is a great example of that, going from a visualization that was confusing and overly broad to a more focused graphic statement, thanks to a lot of evolution and brainstorming. It was exhausting work, but it's become my favorite of the four visualizations for this project, and I'm looking forward to tweaking it for future stories.
