this space intentionally left blank

May 21, 2019

Filed under: tech»web

Radios Hack

The past few months, I've mostly been writing in public for NPR's News Apps team blog, with posts on the new Dailygraphics rig (and setting it up on Windows), the Mueller report redactions, and building a scrolling audio story. However, in my personal time, I decided to listen to some podcasts. So naturally, I built a web-based listener app, just for me.

I had a few goals for Radio as I was building it. The first was my own personal use case, in which I wanted to track and listen to a few podcasts, but not actually install a dedicated player on my phone. A web app makes perfect sense for this kind of ephemeral use case, especially since I'm not ever really offline anymore. I also wanted to try building something entirely using Web Components instead of a UI framework, and to use modern features like import — in part because I wanted to see if I could recommend it as a standard workflow for younger developers, and for internal newsroom tools.

Was it a success? Well, I've been using it to listen to Waypoint and Says Who for the last couple of months, so I'd say on that metric it proved itself. And I certainly learned a lot. There are definitely parts of this experience that I can whole-heartedly recommend — importing JavaScript modules instead of using a bundler is an amazing experience, and is the right kind of tradeoff for the health of the open web. It's slower (equivalent to dynamic AMD imports) but fast enough for most applications, and it lets many projects opt entirely out of beginner-unfriendly tooling.
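To give a sense of what that workflow looks like, here's a minimal sketch of native module loading (the file and function names are hypothetical, not Radio's actual layout). The entry point is loaded with a script tag whose type is set to "module", and everything else is a plain relative import:

// episodes.js: a hypothetical helper module
export function describe(episode) {
  return `${episode.title} (${episode.duration})`;
}

// app.js: loaded via <script type="module" src="app.js"></script>
import { describe } from "./episodes.js";

var latest = { title: "Example Episode", duration: "34:00" };
console.log(describe(latest));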

Not everything was as smooth. I'm on record for years as a huge fan of Web Components, particularly custom elements. And for an application of this size, it was a reasonably good experience. I wrote a base class that automated some of the rough edges, like templating and synchronizing element attributes and properties. But for anything bigger or more complex, there are some cases where the platform feels lacking — or even sometimes actively hostile.
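I won't pretend this is the exact class Radio uses, but the general shape of that kind of base class looks something like the sketch below (the template and boundAttributes names are illustrative):

// A sketch of an element base class, not Radio's actual code.
class CustomElement extends HTMLElement {
  constructor() {
    super();
    // stamp a static template string defined by the subclass
    if (this.constructor.template) {
      this.attachShadow({ mode: "open" });
      this.shadowRoot.innerHTML = this.constructor.template;
    }
    // mirror a list of attributes onto properties
    for (var attr of this.constructor.boundAttributes || []) {
      Object.defineProperty(this, attr, {
        get: () => this.getAttribute(attr),
        set: value => this.setAttribute(attr, value)
      });
    }
  }

  static get observedAttributes() {
    return this.boundAttributes || [];
  }
}

// a subclass only has to declare its template and bound attributes
class EpisodeItem extends CustomElement {
  static get template() { return `<button>Play</button>`; }
  static get boundAttributes() { return ["src", "title"]; }
}
customElements.define("episode-item", EpisodeItem);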

For example: in the modern, V1 spec for custom elements, they're not allowed to manipulate their own contents until they've been placed in the page. If you want to skip the extra bookkeeping that would require, you are allowed to create a shadow root in the constructor and put whatever HTML you want inside. It feels very much like this is the workflow you're supposed to use. But shadow DOM is harder to inspect at the moment (browser tools tend to leave it collapsed when inspecting the page), and it causes problems with events (which do not cross the shadow DOM boundary unless you alter your dispatch code).
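Custom events are the usual victim: by default they stop at the shadow root, so you have to opt in to crossing the boundary. A quick sketch of what that looks like:

// Sketch: events fired from inside a shadow root need composed: true
// to be visible to listeners outside the element.
class PlayButton extends HTMLElement {
  constructor() {
    super();
    var root = this.attachShadow({ mode: "open" });
    root.innerHTML = `<button>Play</button>`;
    root.querySelector("button").addEventListener("click", function() {
      // bubbles alone isn't enough: without composed: true, this event
      // stops at the shadow root and the page never sees it
      this.dispatchEvent(new CustomEvent("play-request", {
        bubbles: true,
        composed: true
      }));
    });
  }
}
customElements.define("play-button", PlayButton);

// listeners outside the element work as usual
document.addEventListener("play-request", e => console.log("play!", e.target));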

There's also no equivalent in Web Components for the state management that's core to most modern frameworks. If you want to pass information down to child components from the parent, it either needs to be set through attributes (meaning you only get strings) or properties (more bookkeeping during the render step). I suspect if I were building something larger than this simple list-of-lists, I'd want to add something like Redux to manage data. This would tie my components to that framework, but it would substantially clean up the code.

Ironically, the biggest hassle in the development process was not from a new browser feature, but from a very old one: while it's very easy to create an audio tag and set its source to any sound clip on the web, actually getting the list of audio files is often impossible, thanks to CORS. Most podcasts do not publish their episode feeds with the cross-origin header set, so the browser's security settings shut down the AJAX requests completely. It's wild that in 2019, there's still no good way to make a secure request (say, one that transmits no cookies or custom headers) to another domain. I ended up running the final app on Glitch, which provides basic Node hosting, so that I could deploy a simple proxy for feed data.
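The proxy itself is only a few lines. This is a from-memory sketch rather than the actual Glitch project, and it assumes Express and node-fetch, which the real thing may or may not use:

// A minimal feed proxy sketch, assuming Express and node-fetch.
var express = require("express");
var fetch = require("node-fetch");

var app = express();

app.get("/proxy", async (req, res) => {
  try {
    var response = await fetch(req.query.url);
    var xml = await response.text();
    // add the header the podcast feed didn't
    res.set("Access-Control-Allow-Origin", "*");
    res.type("application/xml").send(xml);
  } catch (err) {
    res.status(502).send("Unable to fetch feed");
  }
});

app.listen(process.env.PORT || 8080);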

For me, the neat thing about this project was how it brought back the feeling of hackability on the web, something I haven't really felt since I first built Caret years ago. It's so easy to get something spun up this way, and that's a huge incentive for creating little personal apps. I love being able to make an ugly little app for myself in only a few hours, instead of needing to evaluate between a bunch of native apps run by people I don't entirely trust. And I really appreciated the ways that Glitch made that easy to do, and emphasized that in its design. It helps that podcasting, so far, is still a platform built on open web tech: XML and MP3. More of this, please!

October 2, 2018

Filed under: tech»coding

Generators: the best JS feature you're not using

People love to talk about "beautiful" code. There was a book written about it! And I think it's mostly crap. The vast majority of programming is not beautiful. It's plumbing: moving values from one place to another in response to button presses. I love indoor plumbing, but I don't particularly want to frame it and hang it in my living room.

That's not to say, though, that there isn't pleasant code, or that we can't make the plumbing as nice as possible. A program is, in many ways, a diagram for a machine that is also the machine itself — when we talk about "clean" or "elegant" code, we mean cases where those two things dovetail neatly, as if you sketched an idea and it just happened to do the work as a side effect.

In the last couple of years, JavaScript updates have given us a lot of new tools for writing code that's more expressive. Some of these have seen wide adoption (arrow functions, template strings), and deservedly so. But if you ask me, one of the most elegant and interesting new features is generators, and they've seen little to no adoption. They're the best JavaScript syntax that you're not using.

To see why generators are neat, we need to know about iterators, which were added to the language in pieces over the years. An iterator is an object with a next() method, which you can call to get a new result with either a value or a "done" flag. Initially, this seems like a fairly silly convention — who wants to manually call a loop function over and over, unwrapping its values each time? — but the goal is actually to enable new syntax once the convention is in place. In this case, we get the generic for ... of loop, which hides all the next() and result.done checks behind a familiar-looking construct:

for (var item of iterator) {
  // item comes from iterator and
  // loops until the "done" flag is set
}
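To make the convention concrete, here's what an iterator looks like when you build it by hand, without any generator help:

// a hand-built object satisfying the iterator protocol
var countdown = {
  current: 3,
  next() {
    if (this.current > 0) {
      return { value: this.current--, done: false };
    }
    return { value: undefined, done: true };
  },
  // adding Symbol.iterator lets for ... of consume the object directly
  [Symbol.iterator]() { return this; }
};

for (var n of countdown) {
  console.log(n); // 3, 2, 1
}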

Designing iteration as a protocol of specific method/property names, similar to the way that Promises are signaled via the then() method, is something that's been used in languages like Python and Lua in the past. In fact, the new loop works very similarly to Python's iteration protocol, especially with the role of generators: while for ... of makes consuming iterators easier, generators make it easier to create them.

You create a generator by adding a * after the function keyword. Within the generator, you can output a value using the yield keyword. This is kind of like a return, but instead of exiting the function, it pauses it and allows it to resume the next time it's called. This is easier to understand with an example than it is in text:

function* range(from, to) {
  while (from <= to) {
    yield from;
    from += 1;
  }
}

for (var num of range(3, 6)) {
  // logs 3, 4, 5, 6
  console.log(num);
}

Behind the scenes, the generator will create a function that, when called, produces an iterator. When the function reaches its end, it'll be "done" for the purposes of looping, but internally it can yield as many values as you want. The for ... of syntax will take care of calling next() for you, and the function starts up from where it was paused at the last yield.
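You can watch that happen by driving the iterator yourself instead of letting the loop do it:

// stepping through the range() generator from above by hand
var iter = range(3, 4);
console.log(iter.next()); // { value: 3, done: false }
console.log(iter.next()); // { value: 4, done: false }
console.log(iter.next()); // { value: undefined, done: true }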

Previously, in JavaScript, if we created a new collection object (like jQuery or D3 selections), we would probably have to add a method on it for doing iteration, like collection.forEach(). This new syntax means that instead of every collection creating its own looping method (that can't be interrupted and requires a new function scope), there's a standard construct that everyone can use. More importantly, you can use it to loop over abstract things that weren't previously loopable.
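As a sketch, a hypothetical collection class can hook into the standard loop by pointing its Symbol.iterator method at a generator, instead of shipping its own forEach():

// sketch: a custom collection made loopable with a generator method
class Selection {
  constructor(elements) {
    this.elements = elements;
  }
  *[Symbol.iterator]() {
    for (var el of this.elements) {
      yield el;
    }
  }
}

var paragraphs = new Selection([...document.querySelectorAll("p")]);
for (var p of paragraphs) {
  p.classList.add("highlighted");
}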

For example, let's take a problem that many data journalists deal with regularly: CSV. In order to read a CSV, you probably need to get a file line by line. It's possible to split the file and create a new array of strings, but what if we could just lazily request the lines in a loop?

function* readLines(str) {
  var buffer = "";
  for (var c of str) {
    if (c == "\n") {
      yield buffer;
      buffer = "";
    } else {
      buffer += c;
    }
  }
  // don't drop the final line if the input lacks a trailing newline
  if (buffer) yield buffer;
}

Reading input this way is much easier on memory, and it's much more expressive to think about looping through lines directly versus creating an array of strings. But what's really neat is that it also becomes composable. Let's say I wanted to read every other line from the first five lines (this is a weird use case, but go with it). I might write the following:

function* take(x, list) {
  var i = 0;
  for (var item of list) {
    if (i == x) return;
    yield item;
    i++;
  }
}

function* everyOther(list) {
  var other = true;
  for (var item of list) {
    // flip the flag on every item so the output actually alternates
    if (other) yield item;
    other = !other;
  }
}

// get my weird subset
var lines = readLines(file);
var firstFive = take(5, lines);
var alternating = everyOther(firstFive);

for (var value of alternating) {
  // ...
}

Not only are these generators chained, they're also lazy: until I hit my loop, they do no work, and they'll only read as much as they need to (in this case, only the first five lines are read). To me, this makes generators a really nice way to write library code, and it's surprising that it's seen so little uptake in the community (especially compared to streams, which they largely supplant).

So much of programming is just looping in different forms: processing delimited data line by line, shading polygons by pixel fragment, updating sets of HTML elements. But by baking fundamental tools for creating easy loops into the language, it's now easier to create pipelines of transformations that build on each other. It may still be plumbing, but you shouldn't have to think about it much — and that's as close to beautiful as most code needs to be.

September 16, 2017

Filed under: tech»web

Hacks and Hackers

Last month, I spoke at the first SeattleJS conference, and talked about how we've developed our tools and philosophy for the Seattle Times interactives team. The slides are available here, if you have trouble reading them on-screen.

I'm pretty happy with how this talk turned out. The first two-thirds were largely given, but the last section went through a lot of revision as I wrote and polished. It originally started as a meditation on training and junior developers, which is an important topic in both journalism and tech. Then it became a long rant about news organizations and native applications. But in the end, it came back to something more positive: how the web is important to journalism, and why I believe that both communities have a lot to gain from mutual education and protection.

In addition to my talk, there are a few others from Seattle JS that I really enjoyed:

November 7, 2016

Filed under: tech

Return values

This is a tale of two algorithms.

Every now and then, someone publishes a piece of crossover writing that blows me away. Last week, Claudia Lo at Rock Paper Shotgun examined the algorithms behind relationship simulation in the indie game Rimworld by decompiling its code:

The question we're asking is, 'what are the stories that RimWorld is already telling?' Yes, making a game is a lot of work, and maybe these numbers were just thrown in without too much thought as to how they'd influence the game. But what kind of system is being designed, that in order to 'just make it work', you wind up with a system where there will never be bisexual men? Or where all women, across the board, are eight times less likely to initiate romance?

I want to take a moment to admire Lo's writing, which takes a difficult technical subject (bias in object-oriented simulation code) and explains it in a way that's both readable and ultimately damning to its subject. It's a calm dissection of how stereotypes can get encoded (literally!) into entertainment, and even goes out of its way to be generous to the developer. I haven't talked much about games journalism in the last few years, but this is the kind of work I wish I'd thought to do when I was still writing.

In short, Lo notes that the code for Rimworld combines sexuality, gender, age, and emotional modeling in a way that's more than a little unsettling: women in the game are all bisexual and may be attracted to much older partners, while men have a narrower (and notably younger) attraction range. The game also models penalties for rejection, but not for being harassed, which leaves players trying to solve bizarre problems like "attractive lesbians that destroy community morale."

Part of what makes Lo's story interesting is how deeply this particular simulation reflects not just general social biases, but the particular biases of its developer. It took a lot more work to encode a complex, weighted system of asymmetrical attraction than it would to just use the same basic algorithm for everyone. Someone planned this very carefully — in fact, since the developer jumped into the comments on the piece with a pseudoscientific rant about the non-existence of bisexual men, we know that he's thought way too much about it. He's really invested in this view of "how people are," and very upset that anyone would think that it's just the slightest bit unrealistic or ignorant.

Keep that in mind when reading Pro Publica's investigation into Facebook ad preferences, in which advertisers can target ads toward (or against) particular "ethnic affinities" in violation of federal laws against discrimination in housing and other services. "Affinity" is Facebook's term, a way of hiding behind "we're not actually discriminating on race." Even for the blinkered Silicon Valley, it's astonishing that nobody thought about the ways that this could be misapplied.

The existence of these affinities may be confusing, since Facebook never actually asks about your ethnicity when setting up a profile. Instead, it generates this metadata based on some kind of machine-learning process as you use the service. As is typical for Facebook, this is not only a very bad idea, it's also very badly implemented. I know this myself: Facebook thinks I have an ethnic affinity of "African-American (US)," probably because my friend group via Urban Artistry is largely black, even though I am one of the whitest people in America (there doesn't seem to be an "Ethnic Affinity: White" tag, maybe because that's considered the default setting).

Both Facebook and Rimworld are examples of programming real-world bias into technology, but of the two it's probably Facebook that bothers me more. For the game, a developer very intentionally designed this system, and is clear about his reasons for doing so. But in Facebook's case, this behavior is partially emergent, which gives it deniability. There's no source code that we can decompile, either because it's hidden away on Facebook's servers or because it's the output of a long, machine-generated process and even Facebook can't directly "read" it. The only way to discover these problems is indirectly — such as how some news organizations are trying to collect and analyze Facebook's political ads via user reporting tools.

As more and more of the world becomes data-driven, it's easy to see how emergent behavior could reinforce existing biases if not carefully checked and guarded. When the behavior in question is Pokemon Go redlining, maybe it's not such a big deal, but when it becomes housing — or job opportunities, or access/pricing for services — the effects will be far more serious. Class action lawsuits are a reasonable start, but I think it may be time to consider stronger regulation on what personal data companies can maintain and utilize (granted, I don't have any good answers as to what that regulation would look like). After all, it's pretty clear that tech won't regulate itself.


Addendum: on journalism and public policy

I'm posting this the night before Election Day in the USA. Like a lot of newsrooms, The Seattle Times has a strict non-partisan policy for its journalists, so I can tell you to vote (please do! it's important!), but I can't tell you how I think you should vote or how I voted, and I'm discouraged from taking part in public activities that would signal bias, such as caucusing for either party. It's a condition of my employment at the paper.

Here's what I think is funny: in the post above, I lay out a strong but (I think) reasoned argument for public policy around the use of personal data. While I'm not particularly well-sourced compared to the journalists at Pro Publica, I am one of the few people at the Times who can speak knowledgeably about how databases are stored and interact. I can make this argument to you, and feel like it carries some authority behind it.

But I can only make that argument because our political parties do not (yet) have any coherent preference on how personal data should be treated. If, at some point, the use of personal data was to become politicized by one side or another, I'd no longer be free to share this argument with you — the merits of my position wouldn't have changed, nor would my expertise, but I'd no longer be considered "objective" or "unbiased."

The idea that reporters do not have opinions, or can only have opinions as long as the subjects don't matter, seems self-evidently silly to me. Data privacy is a useful thought experiment: in this case, I'm safe as long as the government never gets its act together. If that seems like a terrible way to run the fourth estate — indeed, if this sounds like a recipe for the proliferation of fact-free publishing outlets encouraged by the malicious inattention of Facebook — I can't disagree. But to tie this to my original point, however this election turns out, I think a lot of journalists are going to need to ask themselves some hard questions about the value of "objectivity" in an algorithm-powered media universe.

September 9, 2016

Filed under: tech»web

Classless components

In early August, I delivered my talk on "custom elements in production" to the CascadiaFest crowd. We've been using these new web platform features at the Seattle Times for more than two years now, and I wanted to share the lessons we've learned, and encourage others to give them a shot. Apart from some awkward technical problems with the projector, I actually think the talk went pretty well:

One of the big changes in the web component world, which I touched on briefly, is the transition from the V0 API that originally shipped in Chrome to the V1 spec currently being finalized. For the most part, the changeover is not a difficult one: some callbacks have been renamed, and there's a new function used to register the element definition.

There is, however, one aspect of the new spec that is deeply problematic. In V0, to avoid complicated questions around parser timing and integration, elements were only defined using a prototype object, with the constructor handled internally and inheritance specified in the options hash. V1 relies instead on an ES6 class definition, like so:

class CustomElement extends HTMLElement {
  constructor() {
    super();
  }
}

customElements.define("custom-element", CustomElement);

When I wrote my presentation, I didn't think that this would be a huge problem. The conventional wisdom on classes in JavaScript is that they're just syntactic sugar for the existing prototype system — it should be possible to write a standard constructor function that's effectively identical, albeit more verbose.

The conventional wisdom, sadly, is wrong, as became clear once I started testing the V1 API currently available behind a flag in Chrome Canary. In fact, ES6 classes are not just a wrapper for prototypes: specifically, the super() call is not a straightforward translation to older inheritance models, especially when used to extend browser built-ins as it does here. No matter what workarounds I tried, Chrome's V1 custom elements implementation threw errors when passed an ES5 constructor with an otherwise valid prototype chain.
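For reference, this is the kind of ES5 definition I mean. It's a sketch of the failing approach, not working code: the explicit HTMLElement.call() is exactly the step that the new constructor semantics forbid, so constructing the element throws.

// The ES5 equivalent you'd expect to work. A V1 implementation rejects it,
// because HTMLElement can no longer be invoked as a plain function.
function CustomElement() {
  HTMLElement.call(this);
}
CustomElement.prototype = Object.create(HTMLElement.prototype);
CustomElement.prototype.constructor = CustomElement;

CustomElement.prototype.connectedCallback = function() {
  this.textContent = "upgraded";
};

customElements.define("custom-element", CustomElement);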

In a perfect world, we would just use the new syntax. But at the Seattle Times, we target Internet Explorer 10 and up, which doesn't support the class keyword. That means that we need to be able to write (or transpile to) an ES5 constructor that will work in both environments. Since the specification is written only in terms of classes, I did what you're supposed to do and filed a bug against the spec, asking how to write a backwards-compatible element definition.

It shouldn't surprise me, but the responses from the spec authors were wildly unhelpful. Apple's representative flounced off, insisting that it's not his job to teach people how to use new features. Google's rep closed the bug as irrelevant, stating that supporting older browsers isn't their problem.

Both of these statements are wrong, although only the second is wrong in an interesting way. Obviously, if you work on standards specifications, it is part of your job to educate developers. A spec isn't just for browsers to implement — if it were, it'd be written in a machine-readable language like WebIDL, or as a series of automated tests, not in stilted (but still recognizable) English. Indeed, the same Google representative that closed my issue previously defended the "tutorial-like" introductory sections elsewhere. Personally, I don't think a little consistency is too much to ask.

But it is the dismissal of older browsers, and the spec's responsibility to them, that I find more jarring. Obviously, a spec for a new feature needs to be free to break from the past. But a big part of the Extensible Web Manifesto, which directly references web components and custom elements, is that the platform should be explainable, and driven by feedback from real web developers. Specifically, it states:

Making new features easy to understand and polyfill introduces a virtuous cycle:
  • Developers can ramp up more quickly on new APIs, providing quicker feedback to the platform while the APIs are still the most malleable.
  • Mistakes in APIs can be corrected quickly by the developers who use them, and library authors who serve them, providing high-fidelity, critical feedback to browser vendors and platform designers.
  • Library authors can experiment with new APIs and create more cow-paths for the platform to pave.

In the case of the V1 custom elements spec, feedback from developers is being ignored — I'm not the only person that has complained publicly about the way that the class-based definitions are a pain to use in a mixed-browser environment. But more importantly, the spec is actively hostile to polyfills in a way that the original version was not. Authors currently working to shim the V1 API into browsers have faced three problems:

  1. Calling super() invokes magic that's hard to reproduce in ES5, and needlessly so.
  2. HTMLElement isn't a callable function in older environments, and has to be awkwardly monkey-patched.
  3. Apple publicly opposes extending anything other than the generic HTMLElement, and has only allowed it into the spec so they can kill it later.

The end result is that you can write code that will work in old and new browsers, but it won't exactly look like real V1 code. It's not a true polyfill, more of a mini-framework that looks almost — but not exactly! — like the native API.

I find this frustrating in part for its inelegance, but more so because it fundamentally puts the lie to the principles of the extensible web. You can't claim that you're explaining the capabilities of the platform when your API is polyfill-hostile, since a polyfill is the mechanism by which we seek to explain and extend those capabilities.

More importantly, there is no surer way to slow adoption of a web feature than to artificially restrict its usage, and to refuse to educate developers on how to use it. The spec didn't have to be this way: they could detail ES5 semantics, and help people who are struggling, but they've chosen not to care. As someone who literally stood on a stage in front of hundreds of people and advocated for this feature, that's insulting.

Contrast the bullying attitude of the custom elements spec authors with the advocacy that's been done on behalf of Service Worker. You couldn't swing a dead cat in 2016 without hitting a developer advocate talking up their benefits, creating detailed demos, offering advice to people trying them out, and talking about how they gracefully degrade in older browsers. As a result, chances are good that Service Worker will ship in multiple browsers, and see widespread adoption, by the end of next year.

Meanwhile, custom elements will probably languish in relative obscurity, as they've done for many years now. It's a shame, because I'd argue that the benefits of custom elements are strong enough to justify using them even via the old V0 polyfill. I still think they're a wonderful way to build and declare UI, and we'll keep using them at the Times. But whatever wider success they achieve will be despite the spec, not because of it. It's a disgrace to the idea of an extensible web. And the authors have only themselves to blame.

August 10, 2016

Filed under: tech»web

RIP Chrome apps

Update: Well, that was prescient.

At least once a day, I log into the Chrome Web Store dashboard to check on support requests and see how many users I've still got. Caret has held steady for the last year or so at about 150,000 active users, give or take ten thousand, and the support and feature requests have settled into a predictable rut:

  • People who can't run Caret because their version of Chrome is too old, and I've started using new ES6 features that aren't supported six browser versions back.
  • People who want split-screen support, and are out of luck barring a major rewrite.
  • People who don't like the built-in search/replace functionality, which makes sense, because it's honestly pretty terrible.
  • People who don't like the icons, and are just going to have to get over it.

In a few cases, however, users have more interesting questions about the fundamental capabilities of developer tooling, like file system monitoring or plugging into the OS in a deeper way. And there I have bad news, because as far as I can tell, Chrome apps are no longer actively developed by the Chromium team at all, and probably never will be again.

I don't think Chrome apps are going away immediately — they're still useful and used by a lot of third-party companies — but it's pretty clear from the dev side of things that Google's heart isn't in it anymore. New APIs have ceased to roll out, and apps don't get much play at conferences. The new party line is all about progressive web apps, with browser extensions for the few cases where you need more capabilities.

Now, progressive web apps are great, and anything that moves offline applications away from a single browser and out to the wider web is a good thing. But the fact remains that while a large number of Chrome apps can become PWAs with little fuss, Caret can't. Because it interacts with the filesystem so heavily, in a way that assumes a broader ecosystem of file-based tools (like Git or Node), there's actually no path forward for it using browser-only APIs. As such, it's an interesting litmus test for just how far web apps can actually reach — not, as some people have wrongly assumed, because there's an inherent performance penalty on the web, but because of fundamental limits in the security model of the browser.

Bounding boxes

What's considered "possible" for a web app in, say, 2020? It may be easier to talk about what isn't possible, which avoids the judgment call on what is "suitable." For example, it's a safe bet that the following capabilities won't ever be added to the web, even though they've been hotly debated in and out of standards committees for years:

  • Read/write file access (died when the W3C pulled the plug on the Directories part of the Filesystem API)
  • Non-HTTP sockets and networking (an endless number of reasons, but mostly "routers are awful")

There are also a bunch of APIs that are in experimental stages, but which I seriously doubt will see stable deployment in multiple browsers, such as:

  • Web Bluetooth (enormous security and usability issues)
  • Web USB (same as Bluetooth, but with added attacks from the physical connection)
  • Battery status (privacy concerns)
  • Web MIDI

It's tough to get worked up about a lot of the initiatives in the second list, which mostly read as a bad case of mobile envy. There are good reasons not to let a web page have drive-by access to hardware, and who's hooking up a MIDI keyboard to a browser anyway? The physical web is a better answer to most of these problems.

When you look at both lists together, one thing is clear: Chrome apps have clearly been a testing ground for web features. Almost all the not-to-be-implemented web APIs have counterparts in Chrome apps. And in the end, the web did learn from it — mainly that even in a sandboxed, locked-down, centrally distributed environment, giving developers that much power with so little install friction could be really dangerous. Rogue extensions and apps are a serious problem for Chrome, as I can attest: about once a week, shady people e-mail me to ask if they can purchase Caret. They don't explicitly say that they're going to use it to distribute malware and takeover ads, but the subtext is pretty clear.

The great thing about the web is that it can run code without any installation step, but that's also the worst thing about it. Even as a huge fan of the platform, the idea that any of the uncountable pages I visit in any given week could access USB directly is pretty chilling, especially when combined with exploits for devices that are plugged in, like hacking a phone (a nice twist on the drive-by jailbreak of iOS 4). Access to the file system opens up an even bigger can of worms.

Basically, all the things that we want as developers are probably too dangerous to hand out to the web. I wish that weren't true, but it is.

Untrusted computing

Let's assume that all of the above is true, and the web can't safely expand for developer tools. You can still build powerful apps in a browser, they just have to be supported by a server. For example, you can use a service like Cloud 9 (now an AWS subsidiary) to work on a hosted VM. This is the revival of the thick-client model: offline capabilities in a pinch, but ultimately you're still going to need an internet connection to get work done.

In this vision, we are leaning more on the browser sandbox: creating a two-tier system with the web as a client runtime, and a native tier for more trust on the local machine. But is that true? Can the web be made safe? Is it safe now? The answer is, at best, "it depends." Every third-party embed or script exposes your users to risk — if you use an ad network, you don't have any real idea who could be reading their auth cookies or tracking their movements. The miracle of the web isn't that it is safe, it's that it manages to be useful despite how rampantly unsafe its defaults are.

So along with the shift back to thick clients has come a change in the browser vendors' attitude toward powerful API features. For example, you can no longer use geolocation or the camera/microphone in Chrome on pages that aren't served over HTTPS, with other browsers to follow. Safari already disallows third-party cookie access as a general rule. New APIs, like Service Worker, require HTTPS. And I don't think it's hard to imagine a world where an API also requires a strict Content Security Policy that bans third-party embeds altogether (another place where Chrome apps led the way).
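To make that last point concrete, a policy along these lines (purely illustrative, not something any vendor has actually mandated) would rule out third-party embeds entirely:

// Illustrative only: serving every page with a strict policy like this
// bans third-party scripts and frames outright, much as the Chrome app
// CSP did. Assumes an Express server; adapt to taste.
var express = require("express");
var app = express();

app.use(function(req, res, next) {
  res.set("Content-Security-Policy",
    "default-src 'self'; script-src 'self'; object-src 'none'; frame-src 'none'");
  next();
});

app.use(express.static("public"));
app.listen(8080);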

The packaged app security model was that if you put these safeguards into place and verified the package contents, you could trust the code to access additional capabilities. But trusting the client was a mistake when people were writing Quakebots, and it stayed a mistake in the browser. In the new model, those controls are the minimum just to keep what you had. Anything extra that lives solely on the client is going to face a serious uphill battle.

Mind the gap

The longer that I work on Caret, the less I'm upset by the idea that its days are numbered. Working on a moderately-successful open source project is exhausting: people have no problems making demands, sending in random changes, or asking the same questions over and over again. It's like having a second boss, but one that doesn't pay me or offer me any opportunities for advancement. It's good for exposure, but people die from exposure.

The one regret that I will have is the loss of Caret's educational value. Since its early days, there's been a small but steady stream of e-mail from teachers who are using it in classrooms, both because Chromebooks are huge in education and because Caret provides a pretty good editor with almost no fuss (you don't even have to be signed in). If you're a student, or poor, or a poor student, it's a pretty good starter option, with no real competition for its market niche.

There are alternatives, but they tend to be online-only (like Mozilla's Thimble) or they're not Chromebook friendly (Atom) or they're completely unacceptable in a just world (Vim). And for that reason alone, I hope Chrome keeps packaged apps around, even if they refuse to spend any time improving the infrastructure. Google's not great at end-of-life maintenance, but there are a lot of people counting on this weird little ecosystem they've enabled. It would be a shame to let that die.

August 1, 2016

Filed under: tech»web

<slide-show>

On Thursday, I'll be giving a talk at CascadiaFest on using custom elements in production. It's kind of a sales pitch, to convince people that adopting web components is safe to do, despite the instability of the spec and the contentious politics between browsers. After all, we've been publishing with several components at the Times for almost two years now, with good results.

When I presented an early version of this at SeattleJS, I presented by scrolling through a single text file instead of slides, because I've always wanted to do that. But for Cascadia, I wanted to do something a little more special, so I built the presentation itself out of custom elements, with the goal that it would demonstrate how to write code that works with both versions of the spec. It's also meant to be a good example for someone who's just learning how web components function — I use pretty much every custom elements feature at one point or another in 300 lines of code. You can take a look at the source for it here.

There are several strategies that I ended up emphasizing while writing the <slide-show> elements, primarily the heavy use of events to tame asynchronicity. It turns out that between V0, V1, and the two major polyfills, elements and their attributes are resolved by the parser with entirely different timing. It's really important that child elements notify their parent when they upgrade, and parents shouldn't assume that children are ready at startup.
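The notification pattern itself is simple enough to spell out here. This is a sketch in V1 syntax with a made-up event name, not the actual <slide-show> source:

// Sketch: children announce themselves with a bubbling event, and the
// parent listens rather than assuming they're ready at startup.
// The "slide-ready" event name is made up for illustration.
class TextSlide extends HTMLElement {
  connectedCallback() {
    this.parsedContent = this.textContent.trim();
    this.dispatchEvent(new CustomEvent("slide-ready", { bubbles: true }));
  }
}

class SlideShow extends HTMLElement {
  connectedCallback() {
    this.addEventListener("slide-ready", () => this.render());
  }
  render() {
    var current = this.querySelector("text-slide");
    if (current && current.parsedContent) {
      console.log("rendering:", current.parsedContent);
    }
  }
}

customElements.define("text-slide", TextSlide);
customElements.define("slide-show", SlideShow);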

One way to deal with asynchronous upgrades is just to put all your functionality in the parent element (our <leaflet-map> does this), but I wanted to make these slides easier to extend with new types (such as text, code, or image slides). In this case, the slide show looks for a parsedContent property on the current slide, and it's the child's job to populate and update that value. An earlier version called a parseContents() method, but using properties as "duck-typing" makes it much easier to handle un-upgraded elements, and moving the responsibility to the child also greatly simplified the process of watching slide contents for changes.

A nice side effect of using live properties and events is that it "feels" a lot more like a built-in element. The modern DOM API is built on similar primitives, so writing the glue code for the UI ended up being very pleasant, and it's possible to interact using the dev tools in a natural way. I suspect that well-built component libraries in the future will be judged on how well they leverage a declarative interface to blend in with existing elements.

Ironically, between child elements and Shadow DOM, it's actually much harder to move between different polyfills than it is to write an element definition for both the new and old specifications. We've always written for Giammarchi's registerElement shim at the Times, and it was shocking for me to find out that Polymer's shim not only diverges from its counterpart, but also differs from Chrome's native implementation. Coding around these differences took a bit of effort, but it's probably work I should have done at the start, and the result is quite a bit nicer than some of the hacks I've done for the Times. I almost feel like I need to go back now and update them with what I've learned.

Writing this presentation was a good way to make sure I was current on the new spec, and I'm actually pretty happy with the way things have turned out. When WebKit started prototyping their own API, I started to get a bit nervous, but the resulting changes are relatively minor: some property names have changed, the lifecycle is ordered a bit differently, and upgrade code is called in the constructor (to encourage using the class syntax) instead of from a createdCallback() method. Most of these are positive alterations, and while there are some losses going from V0 to V1 (no is attribute to subclass arbitrary elements), they're not dealbreakers. Overall, I'm more optimistic about the future of web components than I have been in quite a while, and I'm looking forward to telling people about it at Cascadia!

May 10, 2016

Filed under: tech»web

Behind the Times

The paper recently launched a new native app. I can't say I'm thrilled about that, but nobody made me CEO. Still, the technical approach it takes is "interesting:" its backing API converts articles into a linear stream of blocks, each of which is then hand-rendered in the app. That's the plan, at least: at this time, it doesn't support non-text inline content at all. As a result, a lot of our more creative digital content doesn't appear in the app, or is distorted when it does appear.

The justification given for this decision was speed, with the implicit statement being that a webview would be inherently too slow to use. But is that true? I can't resist a challenge, and it seemed like a great opportunity to test out some new web features I haven't used much, so I decided to try building a client. You can find the code here. It's currently structured as a Chrome app, but that's just to get around the CORS limit since our API doesn't have the Access-Control-Allow-Origin headers added.

The app uses a technique that's been popularized by Nolan Lawson's Pokedex.org, in which almost all of the time-consuming code runs in a Web Worker, and the main thread just handles capturing UI events and re-rendering. I started out with the worker process handling network and caching in IndexedDB (the poor man's Service Worker), and then expanded it to do HTML sanitization as well. There's probably other stuff I could move in, but honestly I think it's at a good balance now.

By putting all this stuff into a second script that runs independently, it frees up the browser to maintain a smooth frame rate in animations and UI response. It's not just the fact that I'm doing work elsewhere, but also that there's hardly any garbage collection on the main thread, which means no halting while the JavaScript VM cleans up. I thought building an app this way would be difficult, but it turns out to be mostly similar to writing any page that uses a lot of AJAX — structure the worker as a "server" and the patterns are pretty much the same.
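In a stripped-down form, "the worker as a server" is just request/response message passing. The message types and endpoint below are hypothetical, not the app's real protocol:

// main.js: post a request, get a promise back when the worker responds
var worker = new Worker("worker.js");
var callbacks = {};
var counter = 0;

function request(type, data) {
  return new Promise(function(resolve) {
    var id = counter++;
    callbacks[id] = resolve;
    worker.postMessage({ id, type, data });
  });
}

worker.onmessage = function(e) {
  var { id, result } = e.data;
  callbacks[id](result);
  delete callbacks[id];
};

request("getSection", { section: "news" }).then(function(articles) {
  // rendering happens here (with Mithril in the real app)
  document.body.textContent = articles.map(a => a.title).join("\n");
});

// worker.js: fetch and process off the main thread, then reply by id
self.onmessage = async function(e) {
  var { id, type, data } = e.data;
  if (type == "getSection") {
    var response = await fetch("/api/sections/" + data.section);
    var articles = await response.json();
    self.postMessage({ id, result: articles });
  }
};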

The other new technology that I learned for this project is Mithril, a virtual DOM framework that my old coworkers at ArenaNet rave about. I'm not using much of its MVC architecture, but its view rendering code is great at gradually updating the page as the worker sends back new data: I can generate the initial article list using just the titles that come from one network endpoint, and then add the thumbnails that I get from a second, lower-priority request. Readers get a faster feed of stories, and I don't have to manually synchronize the DOM with the new data.

The metrics from this version of the app are (unsurprisingly) pretty good! The biggest slowdown is the network, which would also be a problem in native code: loading the article list for a section requires one request to get the article IDs, and then one request for each article in that section (up to 21 in total). That takes a while — about a second, on average. On the other hand, it means we have every article cached by the time the user chooses something to read, so the time to request and load an individual article hovers around 150ms on my Chromebook.

That's not to say that there aren't problems, although I think they're manageable. For one thing, the worker and app bundles are way too big right now (700KB and 200KB, respectively), in part because they're pulling in a bunch of big NPM modules to do their processing. These should be lazy-loaded for speed as much as possible: we don't need HTML parsing right away, for example, which would cut a good 500KB off of the worker's initial size. Every kilobyte of script is roughly 1ms of load time on a mobile device, so spreading that out will drastically speed up the app's startup time.

As an interesting side note, we could cut almost all that weight entirely if the document.implementation object was available in Web Workers. Weir, for example, does all its parsing and sanitization in an inert document. Unfortunately, the DOM isn't thread-safe, so nothing related to document is available outside the main process, and I suspect a serious sanitization pass would blow past our frame budget anyway. Oh well: htmlparser2 and friends it is.

Ironically, the other big issue is mostly a result of packaging this up as a Chrome app. While that lets me talk to the CMS without having CORS support, it also comes with a fearsome content security policy. The app shell can't directly load images or fonts from the network, so we have to load article thumbnails through JavaScript manually instead. Within Chrome's <webview> tag, we have the opposite problem: the webview can't load anything from the app, and it has a weird protocol location when loaded from a data URL, so all relative links have to be rewritten. It's not insurmountable, but you have to be pretty comfortable with the way browsers work to figure it out, and the debugging can get a little hairy.

So there you have it: a web app that performs like native, but includes support for features like DocumentCloud embeds or interactive HTML graphs. At the very least, I think you could use this to advocate for a hybrid native/web client on your news site. But there's a strong argument to be made that this could be your only app: add a Service Worker and (in Chrome and Firefox) it could load instantly and work offline after the first visit. It would even get a home screen icon and push notification support. I think the possibilities for progressive web apps in the news industry are really exciting, and building this client makes me think it's doable without a huge amount of extra work.

April 14, 2016

Filed under: tech»coding

Calculated Amalgamation

In a fit of nostalgia, I've been trying to get my hands on a TI-82 calculator for a few weeks now. TI BASIC was probably the first programming language in which I actually wrote significant amounts of code: although a few years later I'd start working in C for PalmOS and Windows CE, I have a lot of memories of trying to squeeze programs for speed and size during slow class periods. While I keep checking Goodwill for spares, there are plenty of TI calculator emulation apps, so I grabbed one and loaded up a TI-82 ROM to see what I've retained.

Actually, TI BASIC is really weird. Things I had forgotten:

  • You can type in all-caps text if you want, but most of the time you don't, because all of the programming keywords (If, Else, While, etc.) are actually single "character" glyphs that you insert from a menu.
  • In fact, pretty much the only code that's typed manually are variable names, of which you get 26 (one for each letter). There are also six arrays (max length 99), five two-dimensional matrices (limited by memory), and a handful of state variables you can abuse if you really need more. Everything is global.
  • Variables aren't stored using =, which is reserved for testing, but with a left-to-right arrow operator: value → dest. I imagine this clears up a lot of ambiguity in the parser.
  • Of course, if you're processing data linearly, you can do a lot without explicit variables, because the result of any statement gets stored in Ans. So you can chain a lot of operations together as long as you just keep operating on the output of the previous line.
  • There's no debugger, but you can hit the On key to break at any time, and either quit or jump to the current line.
  • You can call other programs and they do return after calling, but there are no function definitions or return values other than Ans (remember, everything is global). There is GOTO, but it apparently causes memory leaks when used (thanks, Dijkstra!).

I'd romanticized it over time — the self-contained hardware, the variable-juggling, the 1-bit graphics on a 96x64 screen. Even today, I'm kind of bizarrely fascinated by this environment, which feels like the world's most cumbersome register VM. But loading up the emulator, it's obvious why I never actually finished any of my projects: TI BASIC is legitimately a terrible way to work.

In retrospect, it's obviously a scripting language for a plotting library, and not the game development environment I wanted it to be when I was trying to build Wolf3D clones. You're supposed to write simple macros in TI BASIC, not full-sized applications. But as a bored kid, it was a great playground, and the limitations of the platform (including its molasses-slow interpreter) made simple problems into brainteasers (it's almost literally the challenge behind TIS-100).

These days, the kids have it way better than I did. A micro:bit is cheaper and syncs with a phone or computer. A Raspberry Pi is a real computer of its own, as is the average smartphone. And a laptop or Chromebook with a browser is miles more productive than a TI-82 could ever be. On the other hand, they probably can't sneak any of those into their trig classes and get away with it. And maybe that's for the best — look how I turned out!

March 22, 2016

Filed under: tech»web

ES6 in anger

One of the (many) advantages of running Seattle Times interactives on an entirely different tech stack from the rest of the paper is that we can use new web features as quickly as we can train ourselves on them. And because each news app ships with an isolated set of dependencies, it's easy to experiment. We've been using a lot of new ES6 features as standard for more than a year now, and I think it's a good chance to talk about how to use them effectively.

The Good

Surprisingly (to me at least), the single most useful ES6 feature has been arrow functions. The key to using them well is to restrict them only to one-liners, which you'd think would limit their usefulness. Instead, it frees you up to write much more readable JavaScript, especially in array processing. As soon as it breaks to a second line (or seems like it might do so in the future), I switch to writing regular function statements.


//It's easy to filter and map:
var result = list.filter(d => d.id).map(d => d.value);

//Better querySelectorAll with the spread operator:
var $ = s => [...document.querySelectorAll(s)];

//Fast event logging:
map.on("click", e => console.log(e.latlng));

//Better styling with template strings:
var translate = (x, y) => `translate(${x}px, ${y}px);`;

Template strings are the second biggest win, especially as above, where they're combined with arrow functions to create text snippets. Having a multiline string in JS is very useful, and being able to insert arbitrary values makes building dynamic popups or CSS styles enormously simpler. I love writing template strings for quick chunks of templating, or embedding readable SQL in my Node apps.

Despite the name, template strings aren't real templates: they can't handle loops, they don't really do interpolation, and the interface for using "tagged" strings is cumbersome. If you're writing very long template strings (say, more than five lines), it's probably a sign that you need to switch to something like Handlebars or EJS. I have yet to see a "templating language" built on tagged strings that didn't seem like a wildly frustrating experience, and despite the industry's shift toward embedded DSLs like React's JSX, there is a benefit to keeping different types of code in different files (if only for widespread syntax highlighting).
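For anyone who hasn't run into them, "tagging" a string just means handing its literal chunks and interpolated values to a function, which is flexible but clumsy as a templating interface. A quick illustration:

// a tag function receives the literal chunks and the interpolated values
function shout(strings, ...values) {
  return strings.reduce(function(out, chunk, i) {
    var value = i < values.length ? String(values[i]).toUpperCase() : "";
    return out + chunk + value;
  }, "");
}

var name = "world";
console.log(shout`Hello, ${name}!`);
// "Hello, WORLD!"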

The last feature I've really embraced is destructuring and object literals. They're mostly valuable for cleanup, since all they do is cut down on repetition. But they're pleasant to use, especially when parsing text and interacting with CommonJS modules.



//Splitting dates is much nicer now:
var [year, month, day] = dateString.split(/\/|-/);

//Or getting substrings out of a regex match:
var re = /(\w{3})mlb_(\w{3})mlb_(\d+)/;
var [match, away, home, index] = gameString.match(re);

//Exporting from a module can be simpler:
var x = "a";
var y = "b";
module.exports = { x, y };

//And imports are cleaner:
var { x } = require("module");

The bad

I've tried to like ES6 classes and modules, and it's possible that one day they're going to be really great, but right now they're not terribly friendly. Classes are just syntactic sugar around ES5 prototypes — although they look like Java-esque class statements, they're still going to act in surprising ways for developers who are used to traditional inheritance. And for JavaScript programmers who understand how the language actually works, class definitions boast a weird, comma-less syntax that's sort of like the new object literal syntax, but far enough off that it keeps tripping me up.

The turning point for the new class keyword will be when the related, un-polyfillable features make their way into browsers — I'm thinking mainly of the new Symbols that serve as feature flags and the ability to extend Array and other built-ins. Until that time, I don't really see the appeal, but on the other hand I've developed a general aversion to traditional object-oriented programming, so I'm probably not the best person to ask.

Modules also have some nice features from a technical standpoint, but there's just no reason to use them over CommonJS right now, especially since we're already compiling our applications during the build process (and you have to do that, because browser support is basically nil). The parts that are really interesting to me about the module system — namely, the configurable loader system — aren't even fully specified yet.

New discoveries

Most of what we use on the Times' interactive team is restricted to portions of ES6 that can be transpiled by Babel, so there are a lot of features (proxies, for example) that I don't have any experience using. In a Node environment, however, I've had a chance to use some of those features on the server. When I was writing our MLB scraper, I took the opportunity to try out generators for the first time.

Generators are borrowed liberally from Python, and they're basically constructors for custom iterable sequences. You can use them to make normal objects respond to language-level iteration (i.e., for ... of and the spread operator), but you can also define sequences that don't correspond to anything in particular. In my case, I created a generator for the calendar months that the scraper loads from the API, which (when hooked up to the command line flags) lets users restart an MLB download from a later time period:


//feed this a starting year and month
var monthGen = function*(year, month) {
  while (year < 2016) {
    yield { year, month };
    month++;
    if (month > 12) {
      month = 1;
      year++;
    }
  }
};

//generate a sequence from 2008 to 2016
var months = [...monthGen(2008, 1)];

That's a really nice code pattern for creating arbitrary lists, and it opens up a lot of doors for JavaScript developers. I've been reading and writing a bit more Python lately, and it's been amazing to see how much a simple pattern like this, applied language-wide, can really contribute to its ergonomics. Instead of the Stream object that's common in Node, Python often uses generators and iteration for common tasks, like reading a file line-by-line or processing a data pipeline. As a result, I suspect most new Python programmers need to survey a lot less intellectual surface area to get up and running, even while the guts underneath are probably less elegant for advanced users.

It surprised me that I was so impressed with generators, since I haven't particularly liked Python very much in the past. But in reading the Cookbook to prep for a UW class in Python, I've realized that the two languages are actually closer together than I'd thought, and getting closer. Python's class implementation is actually prototypical behind the scenes, and its use of duck typing for built-in language features (such as the with statement) bears a strong resemblance to the work being done on JavaScript Promises (a.k.a. "then-ables") and iterator protocols.

It's easy to be resistant to change, and especially when it's at the level of a language (computer or otherwise). I've been critical of a lot of the decisions made in ES6 in the past, but these are positive additions on the whole. It's also exciting, as someone who has been working in JavaScript at a deep level, to find that it has new tricks, and to stretch my brain a little integrating them into my vocabulary. It's good for all of us to be newcomers every so often, so that we don't get too full of ourselves.
