this space intentionally left blank

May 10, 2016

Filed under: tech»web

Behind the Times

The paper recently launched a new native app. I can't say I'm thrilled about that, but nobody made me CEO. Still, the technical approach it takes is "interesting": its backing API converts articles into a linear stream of blocks, each of which is then hand-rendered in the app. That's the plan, at least: at this time, it doesn't support non-text inline content at all. As a result, a lot of our more creative digital content doesn't appear in the app, or is distorted when it does appear.

The justification given for this decision was speed, with the implicit statement being that a webview would be inherently too slow to use. But is that true? I can't resist a challenge, and it seemed like a great opportunity to test out some new web features I haven't used much, so I decided to try building a client. You can find the code here. It's currently structured as a Chrome app, but that's just to get around the CORS limit since our API doesn't have the Access-Control-Allow-Origin headers added.

The app uses a technique that's been popularized by Nolan Lawson's Pokedex.org, in which almost all of the time-consuming code runs in a Web Worker, and the main thread just handles capturing UI events and re-rendering. I started out with the worker process handling network and caching in IndexedDB (the poor man's Service Worker), and then expanded it to do HTML sanitization as well. There's probably other stuff I could move in, but honestly I think it's at a good balance now.

Putting all this stuff into a second script that runs independently frees up the browser to maintain a smooth frame rate in animations and UI response. It's not just the fact that I'm doing work elsewhere, but also that there's hardly any garbage collection on the main thread, which means no halting while the JavaScript VM cleans up. I thought building an app this way would be difficult, but it turns out to be mostly similar to writing any page that uses a lot of AJAX: structure the worker as a "server" and the patterns are pretty much the same.
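To make the "worker as server" idea concrete, here is a minimal sketch of the request/response pattern. The helper names (createServer, createClient) and the message shape are assumptions for illustration, not the code from this project. The two halves are wired together directly so the example runs without a real Worker.

```javascript
// Hypothetical helpers: the post does not show its actual messaging code.

// Worker side: route each { id, type, payload } message to a handler,
// the way a tiny server routes requests.
function createServer(handlers) {
  return function onMessage(message, reply) {
    const handler = handlers[message.type];
    if (!handler) {
      reply({ id: message.id, error: "Unknown request type: " + message.type });
      return;
    }
    reply({ id: message.id, result: handler(message.payload) });
  };
}

// Main-thread side: tag each request with an id so that replies can be
// matched back to the callback that is waiting for them.
function createClient(post) {
  let nextId = 0;
  const pending = new Map();
  return {
    request(type, payload, callback) {
      const id = nextId++;
      pending.set(id, callback);
      post({ id, type, payload });
    },
    receive(message) {
      const callback = pending.get(message.id);
      if (!callback) return;
      pending.delete(message.id);
      callback(message.error || null, message.result);
    }
  };
}

// In a browser, `post` would be worker.postMessage and `reply` would be
// self.postMessage inside the worker. Here they call each other directly.
const server = createServer({
  getSection: (name) => ({ section: name, articles: ["a1", "a2"] })
});
const client = createClient((message) =>
  server(message, (response) => client.receive(response))
);

let received = null;
client.request("getSection", "sports", (err, data) => { received = data; });
```

The id on each message is what makes this feel like AJAX: several requests can be in flight at once, and each callback only sees its own reply.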

The other new technology that I learned for this project is Mithril, a virtual DOM framework that my old coworkers at ArenaNet rave about. I'm not using much of its MVC architecture, but its view rendering code is great at gradually updating the page as the worker sends back new data: I can generate the initial article list using just the titles that come from one network endpoint, and then add the thumbnails that I get from a second, lower-priority request. Readers get a faster feed of stories, and I don't have to manually synchronize the DOM with the new data.
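The data side of that gradual update can be sketched as a plain merge keyed by article ID. The field names below (id, title, thumbnail) are assumptions, since the real API's shape isn't shown here. A virtual DOM view re-run against the merged list would then only touch the rows that changed.

```javascript
// Hypothetical data shapes: the real API's fields are not shown in the post.
// Merge a batch of partial article records into the current list, keyed by id.
function mergeArticles(current, updates) {
  const byId = new Map(current.map((article) => [article.id, article]));
  for (const update of updates) {
    const existing = byId.get(update.id) || {};
    byId.set(update.id, Object.assign({}, existing, update));
  }
  // A Map keeps insertion order, so the list order from the first response survives.
  return Array.from(byId.values());
}

// First response: titles only, which is enough to draw the list.
let articles = mergeArticles([], [
  { id: 1, title: "City council votes" },
  { id: 2, title: "Mariners win" }
]);

// Later, lower-priority response: a thumbnail for one of the same ids.
articles = mergeArticles(articles, [{ id: 2, thumbnail: "mariners.jpg" }]);
```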

The metrics from this version of the app are (unsurprisingly) pretty good! The biggest slowdown is the network, which would also be a problem in native code: loading the article list for a section requires one request to get the article IDs, and then one request for each article in that section (up to 21 in total). That takes a while: about a second, on average. On the other hand, it means we have every article cached by the time that the user can choose something to read, which cuts the time for requesting and loading an individual article to around 150ms on my Chromebook.

That's not to say that there aren't problems, although I think they're manageable. For one thing, the worker and app bundles are way too big right now (700KB and 200KB, respectively), in part because they're pulling in a bunch of big NPM modules to do their processing. These should be lazy-loaded for speed as much as possible: we don't need HTML parsing right away, for example, which would cut a good 500KB off of the worker's initial size. Every kilobyte of script is roughly 1ms of load time on a mobile device, so spreading that out will drastically speed up the app's startup time.
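As a rough sketch of what lazy-loading could look like inside the worker, the wrapper below defers an expensive load until first use and then reuses the result. The loader here is a stand-in, and the bundle file name in the comment is hypothetical. The arithmetic at the end just applies the 1ms-per-kilobyte rule of thumb to the sizes quoted above.

```javascript
// Wrap an expensive loader so it runs once, on first use, instead of at startup.
function lazy(loader) {
  let loaded = false;
  let value;
  return function get() {
    if (!loaded) {
      value = loader();
      loaded = true;
    }
    return value;
  };
}

// Stand-in loader. Inside a real worker this might call importScripts(),
// which is synchronous, on a separately built parser bundle.
let loadCount = 0;
const getParser = lazy(() => {
  loadCount++;
  // importScripts("parser-bundle.js"); // hypothetical file name
  return { parse: (html) => html.length };
});

// Rough startup estimate using the figures above: about 1ms per KB of script.
const MS_PER_KB = 1;
const workerKB = 700;
const parserKB = 500;
const eagerStartupMs = workerKB * MS_PER_KB;              // 700
const lazyStartupMs = (workerKB - parserKB) * MS_PER_KB;  // 200
```

Nothing is loaded until the first getParser() call, so the parser's cost moves from startup to the first time an article actually needs sanitizing.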

As an interesting side note, we could cut almost all that weight entirely if the document.implementation object were available in Web Workers. Weir, for example, does all its parsing and sanitization in an inert document. Unfortunately, the DOM isn't thread-safe, so nothing related to document is available outside the main thread, and I suspect a serious sanitization pass would blow past our frame budget anyway. Oh well: htmlparser2 and friends it is.

Ironically, the other big issue is mostly a result of packaging this up as a Chrome app. While that lets me talk to the CMS without having CORS support, it also comes with a fearsome content security policy. The app shell can't directly load images or fonts from the network, so we have to load article thumbnails through JavaScript manually instead. Within Chrome's <webview> tag, we have the opposite problem: the webview can't load anything from the app, and it has a weird protocol location when loaded from a data URL, so all relative links have to be rewritten. It's not insurmountable, but you have to be pretty comfortable with the way browsers work to figure it out, and the debugging can get a little hairy.
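The link rewriting step can be sketched with the standard URL constructor, which resolves a relative path against a base. The function name and the example.com base are assumptions for illustration, and the regex is only there to keep the example short: real article HTML would be safer to rewrite with a proper parser.

```javascript
// Rewrite relative href/src attributes to absolute URLs against a known base.
// Assumes double-quoted attribute values; anything else is left alone.
function absolutizeLinks(html, baseUrl) {
  return html.replace(/\b(href|src)="([^"]*)"/g, (match, attr, value) => {
    try {
      return attr + '="' + new URL(value, baseUrl).href + '"';
    } catch (err) {
      return match; // leave anything the URL parser rejects untouched
    }
  });
}

// Hypothetical base URL standing in for the article's real location.
const sample = '<a href="/news/story.html">Story</a><img src="thumb.jpg">';
const rewritten = absolutizeLinks(sample, "https://example.com/section/index.html");
```

Links that are already absolute pass through with the same target, so it should be safe to run this over an entire article body before handing it to the webview.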

So there you have it: a web app that performs like native, but includes support for features like DocumentCloud embeds or interactive HTML graphs. At the very least, I think you could use this to advocate for a hybrid native/web client on your news site. But there's a strong argument to be made that this could be your only app: add a Service Worker and (in Chrome and Firefox) it could load instantly and work offline after the first visit. It would even get a home screen icon and push notification support. I think the possibilities for progressive web apps in the news industry are really exciting, and building this client makes me think it's doable without a huge amount of extra work.
