
August 10, 2016

Filed under: tech»web

RIP Chrome apps

Update: Well, that was prescient.

At least once a day, I log into the Chrome Web Store dashboard to check on support requests and see how many users I've still got. Caret has held steady for the last year or so at about 150,000 active users, give or take ten thousand, and the support and feature requests have settled into a predictable rut:

  • People who can't run Caret because their version of Chrome is too old, and I've started using new ES6 features that aren't supported six browser versions back.
  • People who want split-screen support, and are out of luck barring a major rewrite.
  • People who don't like the built-in search/replace functionality, which makes sense, because it's honestly pretty terrible.
  • People who don't like the icons, and are just going to have to get over it.

In a few cases, however, users have more interesting questions about the fundamental capabilities of developer tooling, like file system monitoring or plugging into the OS in a deeper way. And there I have bad news, because as far as I can tell, Chrome apps are no longer actively developed by the Chromium team at all, and probably never will be again.

I don't think Chrome apps are going away immediately — they're still useful and used by a lot of third-party companies — but it's pretty clear from the dev side of things that Google's heart isn't in it anymore. New APIs have ceased to roll out, and apps don't get much play at conferences. The new party line is all about progressive web apps, with browser extensions for the few cases where you need more capabilities.

Now, progressive web apps are great, and anything that moves offline applications away from a single browser and out to the wider web is a good thing. But the fact remains that while a large number of Chrome apps can become PWAs with little fuss, Caret can't. Because it interacts with the filesystem so heavily, in a way that assumes a broader ecosystem of file-based tools (like Git or Node), there's actually no path forward for it using browser-only APIs. As such, it's an interesting litmus test for just how far web apps can actually reach — not, as some people have wrongly assumed, because there's an inherent performance penalty on the web, but because of fundamental limits in the security model of the browser.

Bounding boxes

What's considered "possible" for a web app in, say, 2020? It may be easier to talk about what isn't possible, which avoids the judgment call on what is "suitable." For example, it's a safe bet that the following capabilities won't ever be added to the web, even though they've been hotly debated in and out of standards committees for years:

  • Read/write file access (died when the W3C pulled the plug on the Directories part of the Filesystem API)
  • Non-HTTP sockets and networking (an endless number of reasons, but mostly "routers are awful")

There are also a bunch of APIs that are in experimental stages, but which I seriously doubt will see stable deployment in multiple browsers, such as:

  • Web Bluetooth (enormous security and usability issues)
  • Web USB (same as Bluetooth, but with added attacks from the physical connection)
  • Battery status (privacy concerns)
  • Web MIDI

It's tough to get worked up about a lot of the initiatives in the second list, which mostly read as a bad case of mobile envy. There are good reasons not to let a web page have drive-by access to hardware, and who's hooking up a MIDI keyboard to a browser anyway? The physical web is a better answer to most of these problems.

When you look at both lists together, one thing is clear: Chrome apps have been a testing ground for web features. Almost all the not-to-be-implemented web APIs have counterparts in Chrome apps. And in the end, the web did learn from it — mainly that even in a sandboxed, locked-down, centrally distributed environment, giving developers that much power with so little install friction could be really dangerous. Rogue extensions and apps are a serious problem for Chrome, as I can attest: about once a week, shady people e-mail me to ask if they can purchase Caret. They don't explicitly say that they're going to use it to distribute malware and takeover ads, but the subtext is pretty clear.

The great thing about the web is that it can run code without any installation step, but that's also the worst thing about it. Even as a huge fan of the platform, the idea that any of the uncountable pages I visit in any given week could access USB directly is pretty chilling, especially when combined with exploits for devices that are plugged in, like hacking a phone (a nice twist on the drive-by jailbreak of iOS 4). Access to the file system opens up an even bigger can of worms.

Basically, all the things that we want as developers are probably too dangerous to hand out to the web. I wish that weren't true, but it is.

Untrusted computing

Let's assume that all of the above is true, and the web can't safely expand for developer tools. You can still build powerful apps in a browser, they just have to be supported by a server. For example, you can use a service like Cloud 9 (now an AWS subsidiary) to work on a hosted VM. This is the revival of the thick-client model: offline capabilities in a pinch, but ultimately you're still going to need an internet connection to get work done.

In this vision, we lean more heavily on the browser sandbox: a two-tier system with the web as a client runtime, and a native tier for code that needs more trust on the local machine. But does the sandbox deserve that much faith? Can the web be made safe? Is it safe now? The answer is, at best, "it depends." Every third-party embed or script exposes your users to risk — if you use an ad network, you don't have any real idea who could be reading their auth cookies or tracking their movements. The miracle of the web isn't that it is safe, it's that it manages to be useful despite how rampantly unsafe its defaults are.

So along with the shift back to thick clients has come a change in the browser vendors' attitude toward powerful API features. For example, you can no longer use geolocation or the camera/microphone in Chrome on pages that aren't served over HTTPS, with other browsers to follow. Safari already disallows third-party cookie access as a general rule. New APIs, like Service Worker, require HTTPS. And I don't think it's hard to imagine a world where an API also requires a strict Content Security Policy that bans third-party embeds altogether (another place where Chrome apps led the way).

The packaged app security model was that if you put these safeguards into place and verified the package contents, you could trust the code to access additional capabilities. But trusting the client was a mistake when people were writing Quakebots, and it stayed a mistake in the browser. In the new model, those controls are the minimum just to keep what you had. Anything extra that lives solely on the client is going to face a serious uphill battle.

Mind the gap

The longer that I work on Caret, the less I'm upset by the idea that its days are numbered. Working on a moderately-successful open source project is exhausting: people have no problems making demands, sending in random changes, or asking the same questions over and over again. It's like having a second boss, but one that doesn't pay me or offer me any opportunities for advancement. It's good for exposure, but people die from exposure.

The one regret that I will have is the loss of Caret's educational value. Since its early days, there's been a small but steady stream of e-mail from teachers who are using it in classrooms, both because Chromebooks are huge in education and because Caret provides a pretty good editor with almost no fuss (you don't even have to be signed in). If you're a student, or poor, or a poor student, it's a pretty good starter option, with no real competition for its market niche.

There are alternatives, but they tend to be online-only (like Mozilla's Thimble) or they're not Chromebook friendly (Atom) or they're completely unacceptable in a just world (Vim). And for that reason alone, I hope Chrome keeps packaged apps around, even if they refuse to spend any time improving the infrastructure. Google's not great at end-of-life maintenance, but there are a lot of people counting on this weird little ecosystem they've enabled. It would be a shame to let that die.

August 1, 2016

Filed under: tech»web

<slide-show>

On Thursday, I'll be giving a talk at CascadiaFest on using custom elements in production. It's kind of a sales pitch, to convince people that adopting web components is safe to do, despite the instability of the spec and the contentious politics between browsers. After all, we've been publishing with several components at the Times for almost two years now, with good results.

When I presented an early version of this talk at SeattleJS, I did it by scrolling through a single text file instead of slides, because I've always wanted to do that. But for Cascadia, I wanted to do something a little more special, so I built the presentation itself out of custom elements, with the goal that it would demonstrate how to write code that works with both versions of the spec. It's also meant to be a good example for someone who's just learning how web components function — I use pretty much every custom elements feature at one point or another in 300 lines of code. You can take a look at the source for it here.

There are several strategies that I ended up emphasizing while writing the <slide-show> elements, primarily the heavy use of events to tame asynchronicity. It turns out that between V0, V1, and the two major polyfills, elements and their attributes are resolved by the parser with entirely different timing. It's really important that child elements notify their parent when they upgrade, and parents shouldn't assume that children are ready at startup.
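
A sketch of that hand-off, using V1 class syntax — this isn't the presentation's actual code, and the "slide-ready" event name and slide tag are invented, but it's the shape of the pattern:

//child side: announce yourself once you're connected and upgraded
class TextSlide extends HTMLElement {
  connectedCallback() {
    this.dispatchEvent(new CustomEvent("slide-ready", { bubbles: true }));
  }
}
customElements.define("text-slide", TextSlide);

//parent side: don't assume children are ready at startup, just listen for them
class SlideShow extends HTMLElement {
  connectedCallback() {
    this.slides = [];
    this.addEventListener("slide-ready", e => this.slides.push(e.target));
  }
}
customElements.define("slide-show", SlideShow);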

One way to deal with asynchronous upgrades is just to put all your functionality in the parent element (our <leaflet-map> does this), but I wanted to make these slides easier to extend with new types (such as text, code, or image slides). In this case, the slide show looks for a parsedContent property on the current slide, and it's the child's job to populate and update that value. An earlier version called a parseContents() method, but using properties as "duck-typing" makes it much easier to handle un-upgraded elements, and moving the responsibility to the child also greatly simplified the process of watching slide contents for changes.
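
In sketch form, the contract is just a property on the child that the parent reads. Again, this is not the real element code — the highlight step and the tag name are stand-ins — but the duck-typing looks roughly like this:

//child: own the parsedContent property and keep it current
class CodeSlide extends HTMLElement {
  connectedCallback() {
    //a stand-in for whatever formatting this slide type actually does
    this.parsedContent = `<pre>${this.textContent}</pre>`;
  }
}
customElements.define("code-slide", CodeSlide);

//parent: duck-type instead of calling a method that may not exist yet
var showSlide = (stage, slide) => {
  if (!("parsedContent" in slide)) return; //not upgraded yet, skip it
  stage.innerHTML = slide.parsedContent;
};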

A nice side effect of using live properties and events is that it "feels" a lot more like a built-in element. The modern DOM API is built on similar primitives, so writing the glue code for the UI ended up being very pleasant, and it's possible to interact using the dev tools in a natural way. I suspect that well-built component libraries in the future will be judged on how well they leverage a declarative interface to blend in with existing elements.

Ironically, between child elements and Shadow DOM, it's actually much harder to move between different polyfills than it is to write an element definition for both the new and old specifications. We've always written for Giammarchi's registerElement shim at the Times, and it was shocking for me to find out that Polymer's shim not only diverges from its counterpart, but also differs from Chrome's native implementation. Coding around these differences took a bit of effort, but it's probably work I should have done at the start, and the result is quite a bit nicer than some of the hacks I've done for the Times. I almost feel like I need to go back now and update them with what I've learned.

Writing this presentation was a good way to make sure I was current on the new spec, and I'm actually pretty happy with the way things have turned out. When WebKit started prototyping their own API, I started to get a bit nervous, but the resulting changes are relatively minor: some property names have changed, the lifecycle is ordered a bit differently, and upgrade code is called in the constructor (to encourage using the class syntax) instead of from a createdCallback() method. Most of these are positive alterations, and while there are some losses going from V0 to V1 (no is attribute to subclass arbitrary elements), they're not dealbreakers. Overall, I'm more optimistic about the future of web components than I have been in quite a while, and I'm looking forward to telling people about it at Cascadia!

May 10, 2016

Filed under: tech»web

Behind the Times

The paper recently launched a new native app. I can't say I'm thrilled about that, but nobody made me CEO. Still, the technical approach it takes is "interesting": its backing API converts articles into a linear stream of blocks, each of which is then hand-rendered in the app. That's the plan, at least: at this time, it doesn't support non-text inline content at all. As a result, a lot of our more creative digital content doesn't appear in the app, or is distorted when it does appear.

The justification given for this decision was speed, with the implicit statement being that a webview would be inherently too slow to use. But is that true? I can't resist a challenge, and it seemed like a great opportunity to test out some new web features I haven't used much, so I decided to try building a client. You can find the code here. It's currently structured as a Chrome app, but that's just to get around the CORS limit since our API doesn't have the Access-Control-Allow-Origin headers added.

The app uses a technique that's been popularized by Nolan Lawson's Pokedex.org, in which almost all of the time-consuming code runs in a Web Worker, and the main thread just handles capturing UI events and re-rendering. I started out with the worker process handling network and caching in IndexedDB (the poor man's Service Worker), and then expanded it to do HTML sanitization as well. There's probably other stuff I could move in, but honestly I think it's at a good balance now.

By putting all this stuff into a second script that runs independently, it frees up the browser to maintain a smooth frame rate in animations and UI response. It's not just the fact that I'm doing work elsewhere, but also that there's hardly any garbage collection on the main thread, which means no halting while the JavaScript VM cleans up. I thought building an app this way would be difficult, but it turns out to be mostly similar to writing any page that uses a lot of AJAX — structure the worker as a "server" and the patterns are pretty much the same.
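
Structured that way, the main thread gets a little request helper and the worker ends up looking like a route handler. A rough sketch — the message fields, endpoints, and pipeline function are all invented:

//main thread: treat the worker like a tiny server
var worker = new Worker("worker.js");
var counter = 0;
var pending = {};

var request = (type, params) => new Promise(resolve => {
  var id = counter++;
  pending[id] = resolve;
  worker.postMessage({ id, type, params });
});

worker.onmessage = e => {
  //match each response back up to its request, like an AJAX callback
  var resolve = pending[e.data.id];
  delete pending[e.data.id];
  if (resolve) resolve(e.data.body);
};

request("article", { id: 1234 }).then(article => console.log(article.title));

//worker.js: do the slow work off the main thread
//loadArticle stands in for the real fetch/cache/sanitize pipeline
var loadArticle = id =>
  fetch("https://example.com/api/articles/" + id).then(r => r.json());

self.onmessage = e => {
  var msg = e.data;
  loadArticle(msg.params.id).then(body => {
    self.postMessage({ id: msg.id, body });
  });
};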

The other new technology that I learned for this project is Mithril, a virtual DOM framework that my old coworkers at ArenaNet rave about. I'm not using much of its MVC architecture, but its view rendering code is great at gradually updating the page as the worker sends back new data: I can generate the initial article list using just the titles that come from one network endpoint, and then add the thumbnails that I get from a second, lower-priority request. Readers get a faster feed of stories, and I don't have to manually synchronize the DOM with the new data.
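
In miniature, that progressive render looks something like the sketch below. The endpoints and fields are invented, and in the real app the requests would go through the worker rather than straight to fetch; m() and m.render() are just Mithril's basic rendering pair.

var stories = [];

var storyList = () => m("ul.stories", stories.map(story => m("li", [
  story.thumb ? m("img", { src: story.thumb }) : null,
  m("a", { href: story.url }, story.title)
])));

var redraw = () => {
  //the virtual DOM diffs against the last render, so repeated calls
  //only patch the parts that changed
  m.render(document.querySelector("main"), storyList());
};

//first paint: titles only, from the fast endpoint
fetch("/api/headlines").then(r => r.json()).then(list => {
  stories = list;
  redraw();
});

//later: a lower-priority request fills in the thumbnails
fetch("/api/thumbnails").then(r => r.json()).then(thumbs => {
  stories.forEach((story, i) => story.thumb = thumbs[i]);
  redraw();
});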

The metrics from this version of the app are (unsurprisingly) pretty good! The biggest slowdown is the network, which would also be a problem in native code: loading the article list for a section requires one request to get the article IDs, and then one request for each article in that section (up to 21 in total). That takes a while — about a second, on average. On the other hand, it means we have every article cached by the time the user can choose something to read, so the time to request and load an individual article hovers around 150ms on my Chromebook.

That's not to say that there aren't problems, although I think they're manageable. For one thing, the worker and app bundles are way too big right now (700KB and 200KB, respectively), in part because they're pulling in a bunch of big NPM modules to do their processing. These should be lazy-loaded for speed as much as possible: we don't need HTML parsing right away, for example, which would cut a good 500KB off of the worker's initial size. Every kilobyte of script is roughly 1ms of load time on a mobile device, so spreading that out will drastically speed up the app's startup time.
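
One way to defer the heavy parsing code, assuming the bundle gets split so the HTML parser lives in its own (hypothetical) file, is to pull it in from inside the worker only when it's first needed:

//inside the worker: load the parsing bundle on first use
var sanitizer = null;

var getSanitizer = () => {
  if (!sanitizer) {
    //importScripts is synchronous, but by now the UI is already up and
    //this cost never lands on the main thread
    importScripts("sanitize-bundle.js");
    sanitizer = self.sanitizeHTML; //exposed by the lazy-loaded bundle
  }
  return sanitizer;
};

self.onmessage = e => {
  if (e.data.type == "sanitize") {
    var clean = getSanitizer()(e.data.html);
    self.postMessage({ id: e.data.id, body: clean });
  }
};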

As an interesting side note, we could cut almost all that weight entirely if the document.implementation object was available in Web Workers. Weir, for example, does all its parsing and sanitization in an inert document. Unfortunately, the DOM isn't thread-safe, so nothing related to document is available outside the main process, and I suspect a serious sanitization pass would blow past our frame budget anyway. Oh well: htmlparser2 and friends it is.

Ironically, the other big issue is mostly a result of packaging this up as a Chrome app. While that lets me talk to the CMS without having CORS support, it also comes with a fearsome content security policy. The app shell can't directly load images or fonts from the network, so we have to load article thumbnails through JavaScript manually instead. Within Chrome's <webview> tag, we have the opposite problem: the webview can't load anything from the app, and it has a weird protocol location when loaded from a data URL, so all relative links have to be rewritten. It's not insurmountable, but you have to be pretty comfortable with the way browsers work to figure it out, and the debugging can get a little hairy.
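
The manual image loading is roughly the following pattern: the app shell can't point an img tag at a remote URL directly, but it can fetch the bytes itself and hand the element a local blob: URL. The URL here is a placeholder, and it assumes the remote host is whitelisted in the app's manifest.

var loadThumbnail = (img, remoteURL) =>
  fetch(remoteURL)
    .then(response => response.blob())
    .then(blob => {
      //blob: URLs are local, so the CSP is satisfied
      img.src = URL.createObjectURL(blob);
    });

loadThumbnail(document.querySelector(".story img"), "https://example.com/thumb.jpg");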

So there you have it: a web app that performs like native, but includes support for features like DocumentCloud embeds or interactive HTML graphs. At the very least, I think you could use this to advocate for a hybrid native/web client on your news site. But there's a strong argument to be made that this could be your only app: add a Service Worker and (in Chrome and Firefox) it could load instantly and work offline after the first visit. It would even get a home screen icon and push notification support. I think the possibilities for progressive web apps in the news industry are really exciting, and building this client makes me think it's doable without a huge amount of extra work.

March 22, 2016

Filed under: tech»web

ES6 in anger

One of the (many) advantages of running Seattle Times interactives on an entirely different tech stack from the rest of the paper is that we can use new web features as quickly as we can train ourselves on them. And because each news app ships with an isolated set of dependencies, it's easy to experiment. We've been using a lot of new ES6 features as standard for more than a year now, and I think it's a good chance to talk about how to use them effectively.

The Good

Surprisingly (to me at least), the single most useful ES6 feature has been arrow functions. The key to using them well is to restrict them only to one-liners, which you'd think would limit their usefulness. Instead, it frees you up to write much more readable JavaScript, especially in array processing. As soon as it breaks to a second line (or seems like it might do so in the future), I switch to writing regular function statements.


//It's easy to filter and map:
var result = list.filter(d => d.id).map(d => d.value);

//Better querySelectorAll with the spread operator:
var $ = s => [...document.querySelectorAll(s)];

//Fast event logging:
map.on("click", e => console.log(e.latlng);

//Better styling with template strings:
var translate = (x, y) => `translate(${x}px, ${y}px);`;

Template strings are the second biggest win, especially as above, where they're combined with arrow functions to create text snippets. Having a multiline string in JS is very useful, and being able to insert arbitrary values makes building dynamic popups or CSS styles enormously simpler. I love writing template strings for quick chunks of templating, or embedding readable SQL in my Node apps.

Despite the name, template strings aren't real templates: they can't handle loops or conditional sections, and the interface for using "tagged" strings is cumbersome. If you're writing very long template strings (say, more than five lines), it's probably a sign that you need to switch to something like Handlebars or EJS. I have yet to see a "templating language" built on tagged strings that didn't seem like a wildly frustrating experience, and despite the industry's shift toward embedded DSLs like React's JSX, there is a benefit to keeping different types of code in different files (if only for widespread syntax highlighting).

The last feature I've really embraced is destructuring and object literals. They're mostly valuable for cleanup, since all they do is cut down on repetition. But they're pleasant to use, especially when parsing text and interacting with CommonJS modules.



//Splitting dates is much nicer now:
var [year, month, day] = dateString.split(/\/|-/);

//Or getting substrings out of a regex match:
var re = /(\w{3})mlb_(\w{3})mlb_(\d+)/;
var [match, away, home, index] = gameString.match(re);

//Exporting from a module can be simpler:
var x = "a";
var y = "b";
module.exports = { x, y };

//And imports are cleaner:
var { x } = require("module");

The bad

I've tried to like ES6 classes and modules, and it's possible that one day they're going to be really great, but right now they're not terribly friendly. Classes are just syntactic sugar around ES5 prototypes — although they look like Java-esque class statements, they're still going to act in surprising ways for developers who are used to traditional inheritance. And for JavaScript programmers who understand how the language actually works, class definitions boast a weird, comma-less syntax that's sort of like the new object literal syntax, but far enough off that it keeps tripping me up.
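
A contrived example of that mismatch: methods in a class body take no separators at all, while the similar-looking shorthand object literal still wants its commas.

//no commas between members in a class body...
class Scoreboard {
  constructor(teams) {
    this.teams = teams;
  }
  render() {
    return this.teams.join(" vs ");
  }
}

//...but the equivalent object literal still needs them
var scoreboard = {
  teams: ["SEA", "OAK"],
  render() {
    return this.teams.join(" vs ");
  }
};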

The turning point for the new class keyword will be when the related, un-polyfillable features make their way into browsers — I'm thinking mainly of the new Symbols that serve as feature flags and the ability to extend Array and other built-ins. Until that time, I don't really see the appeal, but on the other hand I've developed a general aversion to traditional object-oriented programming, so I'm probably not the best person to ask.

Modules also have some nice features from a technical standpoint, but there's just no reason to use them over CommonJS right now, especially since we're already compiling our applications during the build process (and you have to do that, because browser support is basically nil). The parts that are really interesting to me about the module system — namely, the configurable loader system — aren't even fully specified yet.

New discoveries

Most of what we use on the Times' interactive team is restricted to portions of ES6 that can be transpiled by Babel, so there are a lot of features (proxies, for example) that I don't have any experience using. In a Node environment, however, I've had a chance to use some of those features on the server. When I was writing our MLB scraper, I took the opportunity to try out generators for the first time.

Generators are borrowed liberally from Python, and they're basically constructors for custom iterable sequences. You can use them to make normal objects respond to language-level iteration (i.e., for ... of and the spread operator), but you can also define sequences that don't correspond to anything in particular. In my case, I created a generator for the calendar months that the scraper loads from the API, which (when hooked up to the command line flags) lets users restart an MLB download from a later time period:


//feed this a starting year and month
var monthGen = function*(year, month) {
  while (year < 2016) {
    yield { year, month };
    month++;
    if (month > 12) {
      month = 1;
      year++;
    }
  }
};

//generate a sequence from 2008 to 2016
var months = [...monthGen(2008, 1)];

That's a really nice code pattern for creating arbitrary lists, and it opens up a lot of doors for JavaScript developers. I've been reading and writing a bit more Python lately, and it's been amazing to see how much a simple pattern like this, applied language-wide, can really contribute to its ergonomics. Instead of the Stream object that's common in Node, Python often uses generators and iteration for common tasks, like reading a file line-by-line or processing a data pipeline. As a result, I suspect most new Python programmers need to survey a lot less intellectual surface area to get up and running, even while the guts underneath are probably less elegant for advanced users.

It surprised me that I was so impressed with generators, since I haven't particularly liked Python very much in the past. But in reading the Cookbook to prep for a UW class in Python, I've realized that the two languages are actually closer together than I'd thought, and getting closer. Python's class implementation is actually prototypical behind the scenes, and its use of duck typing for built-in language features (such as the with statement) bears a strong resemblance to the work being done on JavaScript Promises (a.k.a. "then-ables") and iterator protocols.

It's easy to be resistant to change, and especially when it's at the level of a language (computer or otherwise). I've been critical of a lot of the decisions made in ES6 in the past, but these are positive additions on the whole. It's also exciting, as someone who has been working in JavaScript at a deep level, to find that it has new tricks, and to stretch my brain a little integrating them into my vocabulary. It's good for all of us to be newcomers every so often, so that we don't get too full of ourselves.

December 28, 2015

Filed under: tech»web

Let's not

Right now you can access my portfolio over a secure, encrypted connection, thanks to Let's Encrypt. Which is pretty cool! On the other hand, if nginx restarts this week, it'll probably crash on a bad config value, temporarily disabling all my public-facing websites. This has been emblematic of my HTTPS experience in general: a mix of triumphs and severe configuration mishaps.

A little background: in order to serve a website over a secure connection, you need a digital certificate to encrypt communication with the browser. You can generate these certificates yourself, but that's really only good for personal use. The self-signed cert has to be manually installed on each machine that accesses the server, otherwise the browser will throw up a big, ugly warning screen. The alternative is to buy a certificate from a "trusted authority," most of which are not particularly trustworthy or authoritative, but it'll get you a green lock icon in the URL bar. Purchased certs tend to be either expensive or a hassle or both.

After the Snowden leaks, there was a lot of interest in encrypting all web traffic, which meant bypassing the existing certificate authority protection racket run by Symantec et al. Mozilla and some other organizations got together and started Let's Encrypt, with the goal of making trusted certificates free and easy. I figure they're halfway there: I didn't pay anyone for the cert, at least.

There's an official client for the service, but it only works for Apache and it's kind of hefty. My server is set up in an unsupported (but still pretty standard) configuration: I run nginx as a forward proxy in front of Apache (for PHP scripts) and Node (for various apps, including Weir), both of which I'd like to be secured. So I used acme-tiny instead, which basically just talks to the cert API and is small enough that I could read and understand the whole thing. I wrote a shell script to wrap it up and automate things. Automation is important, because unlike paid certificates, these are only good for 90 days, so you need a cron job set to run every month or so to renew them.

Setting all this up wasn't an easy process. The acme-tiny script is well-written, but it has bugs on the version of Python that comes with CentOS. Then I had to set up nginx to use the certificates manually. My webmail got locked into an infinite redirect once I moved my self-signed cert out from Apache and out to the proxy. And the restart crash? Turns out that Let's Encrypt is rate-limited on a per-domain basis, and I didn't back up the current certificate before I hit the rate limit, so my update script overwrote it with an empty version. Luckily, nginx caches certs and won't restart if it detects a bad config, so I'm safe as long as it can outlast the seven-day rate-limit window (it probably will: it's been up 333 days so far, after all).

Without literally years of server admin experience, I'm not sure I would have made it through these issues. And as I mentioned, my system is pretty standard — there's no load-balancer, no CDN, and I don't need to host third-party content. I also don't have any business that gets lost if anything is busted and the certificate expires in March. If I were, say, an IT department responsible for a high-traffic site, I'd be a lot more cautious about moving everything over to HTTPS, either through Let's Encrypt or a paid option.

Ultimately, the news industry and other sites are going to have to follow the lead of the Washington Post, even if the transition takes a while. Even apart from the security benefits it carries, browsers have locked new features (Service Worker, for example) behind HTTPS, and are moving old features behind it as well (geolocation is going to be the biggest disruption there). If you want to develop fast websites in the future (assuming that's something news product management cares about, which is... questionable), and especially if you want to create rich news applications, you're going to have to be encrypted.

In my case, I wanted to get a head start on developing with new browser features (a Service Worker would clean up a lot of Weir code), so it's worth the hassle. And we will continue to push these boundaries on the Seattle Times interactives team, since we've moved our S3 hosting to HTTPS (the rest of the site will follow eventually).

But I think there's a lot of tension between where we want to be, as a news industry, and where it's possible for us to be right now. Although I've seen people calling for incentives to change it (such as requiring HTTPS for news grants), the truth is that it just isn't that simple. News sites are often built in a baroque, overcomplicated set of layers — the Seattle Times, for example, currently sits behind a CDN, several instances of Varnish, some reverse proxies, and a load balancer, mostly due to a lot of historical baggage. Changing this to run securely is going to be a big process, even for a company of our size (maybe because of our size). I can't imagine the hassle for local papers that might have little or no IT support. It won't happen overnight, and Let's Encrypt hasn't done anything to change that yet.

In the meantime, I think it's worth stepping back and asking what we really want out of a digital news industry, because sometimes it's hard to maintain perspective from in the trenches. Is it important that readers be able to see our sites securely, free from worries that third parties are snooping or altering what they see? Sure, that's important. Is it in the top three things that Americans need from local news, above problems like "a sustainable revenue model" and "a CMS that doesn't actively fight against the newsroom?" Probably not. Given a choice between a cryptographically-secure media and a diverse, sustainably-funded media, I'm personally going to take the latter every time.

December 7, 2015

Filed under: tech»web

How We're Fast

Over the Thanksgiving holiday, when I wasn't busily digesting as much cornbread stuffing as I could eat, I spent some time running WebPageTest against various projects that the Seattle Times Interactive team has built. The news industry as a whole may not care about speed, but I do, and I want our pages as fast as possible — especially the ones that are embedded in the regular CMS via responsive frames.

After all the testing, I'm generally pleased by how our stuff stacks up, especially when compared against the rest of the site. We have some advantages, of course: our pages typically have fewer ads, and we can strip down the page for maximum efficiency. But it's also the result of a lot of hard work on our news app template, ensuring that every project comes with smart decisions built in. I genuinely think that all news pages could be this fast, so it's worth talking about how we've made it happen, especially for other news organizations that use a similar flat-file approach to their interactives.

Browserify with care

We use Browserify to package up our JavaScript, because we're not savages, and you need some sort of module system for JavaScript these days. Browserify builds all our scripts into a single file, which is important for high-latency connections (which means most cellular networks, even on 4G). We also make sure to load that bundle file with the async attribute at the bottom of the page, so that it won't block rendering.

All of that is pretty standard best practice, but we've also learned that Browserify can be dangerous if you're not careful. A lot of NPM modules are published with the unminified, debug version of the library as the default export from the module. Angular in particular is bad about this: running require("angular") on its own will load a file filled with comments and documentation, totalling more than a megabyte in size (even after gzip, it's still more than 200KB). That's huge!

As a result, one of our production checklist items is to make sure that we are loading the minified version of any external libraries. We also use the browser property in our package.json file to alias common libraries to their minified versions, so that when we require Angular, jQuery, or Leaflet, it automatically defaults to the smallest file.
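
The alias section is just a mapping from module names to files. It looks roughly like this, although the exact paths depend on how each package happens to be published:

{
  "browser": {
    "angular": "./node_modules/angular/angular.min.js",
    "jquery": "./node_modules/jquery/dist/jquery.min.js",
    "leaflet": "./node_modules/leaflet/dist/leaflet.js"
  }
}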

Gzip on S3

Like a lot of newsroom developers, my team hosts files on Amazon S3, mostly because it's cheap and reliable. People like to think about S3 as though it's just a normal, hierarchical flat-file server, like Apache or Nginx, but it's not. S3 is really a key-value store: you put in a path, and it spits back a prerecorded reply, including the headers.

If you think of S3 as a server, you'll expect it to do a bunch of things that it doesn't actually do. For example, it doesn't set a cache expiration date, and it doesn't know about content types. It also doesn't understand Gzip compression, so it'll merrily serve your files in their uncompressed form, making them way bigger than they need to be, even if the browser requests the compressed version.

We get around this by running a compression stage on any text-based file during deployment, and setting the headers for the stored object to match. This does mean that, theoretically, a browser that doesn't support Gzip will be unable to use that content, because S3 will always respond with compressed content no matter what Accept-Encoding header the browser sends. Luckily, every browser since IE4 supports it.
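
A rough sketch of that deployment step, using Node's zlib and the aws-sdk client — the bucket name and cache length here are placeholders:

//gzip a text asset and store the matching headers on the S3 object
var fs = require("fs");
var zlib = require("zlib");
var AWS = require("aws-sdk");

var s3 = new AWS.S3();

var deployTextFile = (path, contentType) => s3.putObject({
  Bucket: "projects.example.com",
  Key: path,
  Body: zlib.gzipSync(fs.readFileSync(path)),
  ContentType: contentType,
  //S3 replays this header verbatim on every response
  ContentEncoding: "gzip",
  CacheControl: "public, max-age=300"
}).promise();

deployTextFile("app.js", "application/javascript");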

Reduce framework code

I love Angular. If you want to quickly generate a visualization with powerful tools for filtering and data binding, you can't do much better. I personally think it's an order of magnitude better than D3. But Angular can also be brutally slow: its change detection algorithm requires a lot of time and memory as a tradeoff for developer convenience.

On a recent project that looked at animal imports, we started with Angular as a way to test out the visualization, but soon noticed that it was taking three or four seconds just to parse and apply the data. On a desktop, that time is a drag. On mobile, it's likely to get the tab terminated, or convince readers that there's something wrong with it.

When the profiler says that you're spending that much time in JavaScript, there are two options. The first is to try to find ways to work around the framework, which can range from unpleasant to actively painful. The second is to just rewrite in vanilla JS. It sounds more difficult to do the rewrite, but if all you're doing is data-binding and events, you can usually replace it pretty easily with a little templating and some custom data attributes. The resulting code isn't as clean or simple, but in the case of the animal imports, it dropped our JS execution time to under 100ms. That's fast.
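
In miniature, that kind of replacement tends to look like the sketch below: a template string for the markup, and a data attribute plus one delegated listener instead of framework bindings. The data and class names are made up.

//render the rows once from plain data
var animals = [
  { id: "otters", label: "Otters", count: 482 },
  { id: "geckos", label: "Geckos", count: 1293 }
];

var list = document.querySelector(".animal-list");

list.innerHTML = animals.map(a =>
  `<button class="animal" data-id="${a.id}">${a.label}: ${a.count}</button>`
).join("");

//one delegated listener instead of per-element bindings
list.addEventListener("click", e => {
  var id = e.target.dataset.id;
  if (!id) return;
  console.log("selected", animals.find(a => a.id == id));
});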

Even jQuery can be optional these days. Because we compile ES6 down with Babel, a lot of DOM code that would be ungainly can become elegant. Template strings and arrow functions alone have allowed us to cut out DOM libraries entirely, and as a result many of our interactives consist of no external libraries at all. If you haven't checked into the advantages of using Babel in your build process, it's well worth another look.

Reduce third-party code

The number one contributor to page load time is not written by journalists: it's the third-party ad code that runs on the page. There may be only so much you can do about this, since it pays the bills, and of course it may not even apply on embedded graphics. But on our standalone pages, I've taken a strong stance on implementing all code ourselves whenever possible. For example, although our commenting system usually requires multiple scripts loaded synchronously, I wrote a loader that runs through and adds them asynchronously, and only after a user clicks on the "view comments" banner. We can't avoid the hit, but we can delay it until well after the rest of the page has had a chance to render.
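
The loader itself is basically a Promise chain that appends each script and waits for it to finish before adding the next. Something like this sketch, with placeholder URLs standing in for the vendor's scripts:

//placeholder URLs; the real list comes from the commenting vendor
var commentScripts = [
  "https://comments.example.com/loader.js",
  "https://comments.example.com/widget.js"
];

var addScript = url => new Promise((resolve, reject) => {
  var script = document.createElement("script");
  script.src = url;
  script.onload = resolve;
  script.onerror = reject;
  document.head.appendChild(script);
});

var loadComments = () =>
  commentScripts.reduce((chain, url) => chain.then(() => addScript(url)), Promise.resolve());

var banner = document.querySelector(".view-comments");
banner.addEventListener("click", function onClick() {
  //only do this once, and only after the reader asks for it
  banner.removeEventListener("click", onClick);
  loadComments();
});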

Lazy-load everything

Once you've delayed scripts with the async attribute, trimmed the size of those scripts and compressed them, and deferred as much third-party code as you can, what's left over? In our case, this is where we start getting into the structure of the actual interactive, and how it loads itself. For most interactives, we embed data directly into the page, but beyond a certain size it becomes worthwhile to grab it via AJAX instead.

But there's another way to think about lazy-loading, and that's to consider what format you're actually using to populate the page. I'm as big a fan of progressive enhancement as anyone else, but in the case of my team, what we produce is interactive — there's literally no point if JavaScript is disabled. I've found that moving content into JSON and then templating it onto the page can reduce download times significantly, while the speed hit is negligible. Finding the balance between network speed and JavaScript execution time is a constant process for us.
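
Past a certain payload size, the pattern is just a fetch for the JSON and a quick templating pass once it arrives — the endpoint and field names here are invented:

//fetch the data separately, then template it onto the page
fetch("data/results.json")
  .then(response => response.json())
  .then(rows => {
    document.querySelector(".results tbody").innerHTML = rows.map(row =>
      `<tr><td>${row.name}</td><td>${row.total}</td></tr>`
    ).join("");
  });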

When performance matters

Finally, a note of caution: as much fun as it is to squeeze every last millisecond out of the browser, I'm a little uncomfortable making it the alpha and omega of the job. Ultimately, our goal is to inform people — we'd like that to be fast, but a fast page with bad or misleading reporting is still a failure.

What I like about front-end speed is that it serves as a useful proxy for site quality. A site that's fast can't load too many ads. It can't serve too many tracking scripts. It has to put the reader first. It's easy, much of the time, to chip away at performance in the name of business metrics: loading an additional analytics script to get more information, or an obnoxious ad for a short-term revenue boost.

But if you put speed first, every decision has to start from the perspective of "what's good for the reader?" It's hard to measure the impact of good journalism, but we can have metrics for speed and other technical aspects of the presentation. We can spend more time on the former if we have strong, user-centric guidelines on the latter. If we want people to give us money over the long term, that seems like the only healthy strategy to me.

September 3, 2015

Filed under: tech»web

Codes of Conduct

Rachel Nabors writes that you literally cannot pay her to attend a conference without a code of conduct:

When I promised not to go to conferences without Codes of Conduct, I wasn’t paying lip service to a trend, doing the popular thing to gain brownie points with my feminist besties. I meant every word. It is my greatest fear in life that something bad would happen to someone attending a conference I attracted them to.

It was weird the other day to realize that I'm actually becoming a mentor figure for some people: I have interns, I teach classes, I'm now the co-organizer of the local Hacks/Hackers chapter. I'm still basically nobody, but I will also make this pledge: if a conference does not have a code of conduct (or its equivalent), I won't go, as a speaker or an attendee. It's important to me that industry gatherings be places where people are comfortable and safe, no matter who they are. Everyone deserves that much consideration, and while a code of conduct is not a guarantee of safe behavior, it's a good start.

It's disappointing, but perhaps to be expected, that several high-profile white dudes have decided that the most important thing they can do this year is fight against codes of conduct. Many of them feel attacked and want to circle the wagons instead of listening to voices from the community. That's too bad for people who attend conferences — but part of the point of the pledge is that it should punish organizers who don't think creating a standard framework for conference behavior is a priority. If they can't get speakers or attendees to come, sooner or later they don't have an event.

Let's be clear: nobody is entitled to run a conference, and helping people with their careers once upon a time does not excuse you from being a good citizen. If the pressure causes people like Jared Spool and Mike Monteiro to change their mind, everyone benefits. If it doesn't, and they go the way of the dinosaurs, well... that's how evolution works. We'll survive without them.

But I think this is a great opportunity to re-examine one particular community role that I see a lot, both in tech and outside. You know the one: that guy (almost always a man) whose schtick is being the "honest teller of truth," which really means "rude, petty, and abusive in kind of a funny way about stuff we all agree on." A lot of times, we tolerate that behavior because honestly, it is satisfying to have someone saying what we're all thinking about clients who won't pay, or bad design, or ugly code. I have sometimes thought of myself as that person, but I'm trying not to be anymore.

The problem with the "angry truth-teller" is that it stops being funny when they suddenly turn those tools on their allies. When that happens, you realize that it was never actually funny: you just didn't care about being respectful, because you didn't think the target was worthy. Unfortunately, a lack of empathy is not the same as a sense of humor. There might be a thin line between "comedian" and "jerk," but unless you're Don Rickles I don't really see the point in intentionally crossing it.

In the end, life is too short to give money and attention to people who can't have a little empathy over something as silly as tech conference administration. We are not required to make them leaders in our community, nor are we required to keep them as leaders if we decide that the negatives of their input outweigh the positives. There are lots of technically-skilled people out there who can be right about an issue — or wrong on it, even — without being entitled, nasty jackasses about it. Let's boost them instead.

June 25, 2015

Filed under: tech»web

A good (virtual) walk spoiled

Earlier today I took the wraps off of the private repo for our Chambers Bay interactive flyover. You can find the source code here, and a post on our dev blog about it here. It was a really fun challenge, and a rare example of using WebGL in a news capacity (the NYT did the Dawn Wall, but that's the only one I can remember recently).

From a technical standpoint, this was my first three.js project, and the experience was largely positive. I think there's a strong case to be made that three.js is basically jQuery for WebGL: sure, you don't need it, but it only takes a couple of features to make it worthwhile. In this case, I didn't particularly feel like writing a model loader or a scene graph. There are still plenty of hooks to write the parts that I do enjoy, like the fragment shader for the landscape (check out that sweet dithering), or the UI for directing the camera. Sure, three.js is a relatively large library, but I'm loading 4MB of textures and another 4MB of gzipped landscape model, so what's a few more hundred KB of code?

WebGL itself runs surprisingly well these days, although failure modes do not seem to be its strong suit. For example, the browser may have WebGL support enabled, but then crash when it tries to render (or it may be lying about support, as with the remote VM sessions used in Times meeting rooms). That said, I was astonished to find that pretty much everything (mobile included) could run the landscape at a solid frame rate, despite the fact that it's a badly-optimized mesh with 150,000 triangles. Even iOS, which usually falls over and dies when WebGL pushes past its skimpy RAM limits, was able to run smoothly once I added a low-res texture for it to use.

This was an ambitious project using some pretty cutting-edge web technology, which makes it interesting in light of arguments that the web suffers from "featuritis". After all, when you're talking about feature overkill, WebGL is a barnstormer. But this story would have been tough to tell another way, and it would have never had the same reach siloed in an app store.

Or take Paul Ford's mind-boggling What is Code? in Businessweek this month: behind Bloomberg's Trapper Keeper design aesthetic, it's a powerful article that integrates animations, videos, and interactive demonstrations with the textual message. Ironically, I saw many of the same people that criticize web apps going wild for Ford's piece, a stance that I can only attribute to sophistry.

My team at the Seattle Times has gradually abandoned the term "news apps," since everyone who hears it assumes we're actually writing iPhone clients for the paper. As a term of art, it has always been clumsy. But it does strike at a crucial quality of what we do, which occupies a gray area between "text" and "program." And if it seems like I'm touchy about pundits who think we should abandon the web, this is in large part the reason why.

Arguments that browsers should just go back to being document viewers ignore the fact that HTML is not just a text format: it's a hypermedia format, and those have always blurred the already-fuzzy lines between data and code (see also: Excel, Hypercard, and IPython notebooks). It's true that the features of the web platform are often abused. Nobody likes slow navigation, ad popups, or user tracking scripts. But it's those same features that make new kinds of storytelling possible — my journalism is built on the same heavily-structured, "over-tooled" web platform that critics find so objectionable. I wouldn't give that up for all the native apps in the world.

February 18, 2015

Filed under: tech»web

Speed Kills

It's an accepted truth on the web that fast pages are better for users — people stay on them longer, follow more links from them, and generally report being happier with them. I think a lot about performance on my projects, because I want readers to be thinking about the story, not distracted by slow load times.

Unfortunately, web performance has a bad rap, in part because it's a complicated topic. Making it work effectively and efficiently means learning a lot about how the browser runtime works, and optimizing for new techniques like GPU transforms. Like everything else in web development, there's also a lot of misinformation out there, and a lot of people who insist that everything was better back when we built everything without all the JavaScript and fancy-pants frameworks.

It's possible that I've been more aware of it, just because I've been working on a project that involves smoothly animating a chart using regular HTML instead of canvas, but it seems like it's been a bad month for that kind of thing. First Peter-Paul Koch wrote a diatribe about client-side templating, insisting that it's a needless performance hit. Then Flipboard wrote about discarding traditional elements entirely, instead rendering everything to a canvas tag in pursuit of 60 frames/second animations. Ironically, you'll notice that these are radically different approaches that both claim they create a better experience.

Instead of just sighing while the usual native app advocates use these posts to bash the web, and given that I am working on a page where high-performance mobile animations are a key part, I thought it'd be nice to talk about some experiments I've run with the approaches found in both. There are a lot of places where the web platform needs help competing on mobile, no doubt. But I'd prefer we talk about actual performance problems, and not get sidetracked into chasing down scattered criticisms without evidence.

Let's start with templating, which is serving as a stand-in for client-side JavaScript in general. PPK argues that templating (and by extension, single-page app design) is terrible for performance, but is that true? While I was working on my graph, I worried a little bit about startup time. Since I write JavaScript on both the server and the client, it was pretty easy to port my code from one to the other and check. I personally found the results conclusive, and a little surprising.

The client-side version of the page weighed in at 10KB and spent roughly 35ms in JavaScript during startup, rendering the page and prepping its data structures. That's actually not bad for something that's doing some fairly heavy positioning and styling, and it fits in the 14KB first TCP round-trip recommended by Google. In contrast, the server-side page, in which all the markup was pre-rendered and then progressively enhanced after page load, was 160KB and spent about 30ms in JavaScript. In other words, following PPK's advice to avoid client-side templating caused the page to be sixteen times larger, and still required two video frames to start up.

Now, this is a slightly special case: unlike typical server applications, my news apps are useless without JavaScript. They're not RESTful, they don't talk to a database, they involve a lot of moving parts. But even I was surprised by how little impact client-side templating actually had. Browsers these days are just ridiculously fast at assembling HTML. So while I don't recommend doing the entire page this way, or abandoning server-generated HTML entirely, it's pretty clear to me that it's not the slam-dunk case that holdouts for traditional server rendering claim it is.

At the other extreme is Flipboard's experiment with canvas rendering. Instead of putting everything in the document, like normal websites, they put a full-screen canvas image up and render everything — text, images, animations, etc. — manually to that buffer. You can try a demo out on your device here. On my Nexus 5, which is a reasonably new device running the latest version of Chrome, it's noticeably choppy. My experience with canvas is that Chrome's implementation is actually much faster than Safari, so I don't expect it to be smooth on iOS either (they've blacklisted tablets, so I can't be sure).

In order to get this "fluid" experience, here's what the Flipboard team threw away:

  • Accessibility: nothing on the page actually exists to a screen reader. Users that invert their screens or change the text size to make it easier to read are out of luck.
  • Copy and paste: since there's no document, there's nothing to select for these basic text operations.
  • Real links: you can't open pages in a new tab. You can't share them using the OS share panel. You can't do anything with the links, because they aren't really there.
  • GPU acceleration: by doing everything in canvas, they've ignored all the optimizations that browsers actually do to ensure a smooth, battery-efficient experience.
  • View source: inspecting the Flipboard page gives you no information at all, and the JavaScript is minified without source maps. The app is completely opaque to anyone who wants to learn from it.

The irony of doing all this work for a fluid experience that isn't actually fluid is that the kinds of animations they're doing — transform and opacity — are actually the exact properties that browsers can animate at 60FPS. Much has been written about using GPU compositing for smooth animations, but this article is a great start. If Flipboard had stuck with the DOM and used the GPU fully, they probably could have had fluid animations without leaving all those other features behind.

That's an easy thing to say, but is it true? Here's another experiment from my stacked bar chart: when a toggle is pressed, the chart shifts from being a measure of absolute numbers to relative proportions, with each bar smoothly animating up to 100%. I'm using an adaptation of Paul Lewis' FLIP technique, in which animations are set in JavaScript but run via CSS transitions. In my case, each of the 160+ blocks is measured, assigned a transform to "freeze" it in place as a new GPU layer, then transitioned to its final position with a second transform and "thawed" back into a regular, responsive element.
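
A stripped-down version of that freeze/thaw cycle for a single block looks roughly like this — the transition timing and the way the end state gets applied are placeholders, not the chart's actual code:

//First: measure where the block is now
//Last: apply the end state and measure again
//Invert: transform it back to the old position, promoting it to a GPU layer
//Play: clear the transform under a transition so the compositor animates it
var flip = (block, applyFinalLayout) => {
  var first = block.getBoundingClientRect();

  applyFinalLayout();
  var last = block.getBoundingClientRect();

  var dx = first.left - last.left;
  var dy = first.top - last.top;
  var scale = first.height / last.height;
  block.style.transformOrigin = "0 0";
  block.style.transform = `translate(${dx}px, ${dy}px) scaleY(${scale})`;

  //wait two frames so the "frozen" transform is committed before animating
  requestAnimationFrame(() => requestAnimationFrame(() => {
    block.style.transition = "transform 300ms ease-out";
    block.style.transform = "";
  }));

  //"thaw" back into a regular element once the transition finishes
  block.addEventListener("transitionend", function cleanup() {
    block.removeEventListener("transitionend", cleanup);
    block.style.transition = "";
    block.style.transformOrigin = "";
  });
};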

Even though I'm animating many more elements than Flipboard is doing in their demo, the animation is perfectly smooth on my Nexus 5, and on the aging iPad we use for testing. By doing all the hard computational work up front, and then handing the pre-computed transitions over to the browser, I'm actually not JavaScript-bound at all: everything is done on the graphics chip, and in the C++ compositing layer. The result is a smooth 60FPS during the animation, all done via regular DOM elements. So much for "If you touch the DOM in any way during an animation you’ve already blown through your 16ms frame budget."

Again, I'm not claiming that my use case is a perfect analogue. I'm animating a graphic in response to a single button press, and they're attempting to create an "infinite scroll" (sort of — it's not really a scroll so much as an animated pager). But this idea that "the DOM is lava" and touching it will cause your reader's phones to instantly burst into flames of scorn seems patently ridiculous, especially when we look back at that list of everything that was sacrificed in the single-minded pursuit of speed.

Performance is important, and I care deeply and obsessively about it. As a gamer and a graphics nerd, I love tweaking out those last few frames per second, or adding flashy effects to a page. But it's not the most important thing. It's not more important than making your content available to the blind or visually impaired. It's not more important than providing standard UI actions like copy-and-paste or "open in new tab." And it's not more important than providing a fallback for older and less-powerful devices, the kind that are used by poor readers. Let's keep speed in perspective on the web, and not get so caught up in dogma that we abandon useful techniques like client-side templating and the DOM.

November 12, 2014

Filed under: tech»web

grunt-init component

Last week, I wrote a little bit about using custom elements for our election pages. Being able to interact with SVG maps using a simple DOM interface, while still annoying (it's SVG, after all), is miles more pleasant than actually using the tags directly. At the end of that post, I recommended that newsrooms thinking about creating new JavaScript libraries look into Web Components — or at least custom elements. This week, I've got a way to make good on that pitch.

Similar to our news app template, I've put together a Grunt scaffolding for creating bundled custom elements, including HTML templating and CSS, all in a single standalone file. It's our component template — or, as I like to call it, the Poor Journalist's Polymer.

As with the app template, I'm developing the component scaffolding by building projects with it and then integrating the improvements back in. The first is a responsive-frame element that serves as a smaller, easier-to-use replacement for NPR's Pym. I like Pym, and I've used it in several projects now, but it's a little buggy and the setup process is cumbersome. In contrast, the custom elements don't require any JavaScript skills: just include the script to start using them on the page, and they'll connect up with the child elements on the other side of the iframe automatically.

My second testbed project is a Leaflet map element that uses custom HTML to set the map configuration without ever writing a line of JSON (unless you really want to). It's intended to make mapping simple and fast for web producers, while still offering plenty of power for people like me who just want the boilerplate out of the way. Leaflet's a great candidate for this kind of declarative approach, and I think this is a really promising demo for the power of custom elements.
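
The declarative idea boils down to reading the element's attributes and handing them to Leaflet. Here's a sketch of that, written against the current custom elements API rather than the registerElement shim this post predates; the attribute names are made up, it assumes Leaflet is already loaded as the global L, and the element needs a height set in CSS.

//the element itself is the map container
class LeafletMap extends HTMLElement {
  connectedCallback() {
    var lat = parseFloat(this.getAttribute("lat")) || 47.6;
    var lng = parseFloat(this.getAttribute("lng")) || -122.3;
    var zoom = parseInt(this.getAttribute("zoom"), 10) || 10;

    this.map = L.map(this).setView([lat, lng], zoom);
    L.tileLayer("https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png").addTo(this.map);
  }
}
customElements.define("leaflet-map", LeafletMap);

//web producers then write markup, not JSON:
//<leaflet-map lat="47.61" lng="-122.33" zoom="12"></leaflet-map>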

For standalone components like these, the template seems to be working well. I haven't yet solved the problem of easily embedding them in highly opinionated news apps, due to the way that dependencies are handled. It's useful for custom elements to be able to bundle their CSS and other assets into their package, similar to the way that HTML imports and shadow root offer embedded styles, but that means they may not integrate well into projects that already have their own build system. As far as I can tell, the best solution for now will probably be to load the packages from Bower and require() the standalone files from its build directory, which should work with whatever module system you like.

But to be clear, the component template isn't really intended to solve those problems. Its goal is to simplify and modernize the kinds of scripts that, even now, people tend to solve with a jQuery plugin. I'd like to change that, so that more newsrooms produce reusable HTML elements instead of JavaScript spaghetti code. If you build something interesting with the component template, or if it inspires you to make your own, please let me know!
