this space intentionally left blank

October 2, 2018


Generators: the best JS feature you're not using

People love to talk about "beautiful" code. There was a book written about it! And I think it's mostly crap. The vast majority of programming is not beautiful. It's plumbing: moving values from one place to another in response to button presses. I love indoor plumbing, but I don't particularly want to frame it and hang it in my living room.

That's not to say, though, that there isn't pleasant code, or that we can't make the plumbing as nice as possible. A program is, in many ways, a diagram for a machine that is also the machine itself — when we talk about "clean" or "elegant" code, we mean cases where those two things dovetail neatly, as if you sketched an idea and it just happened to do the work as a side effect.

In the last couple of years, JavaScript updates have given us a lot of new tools for writing code that's more expressive. Some of these have seen wide adoption (arrow functions, template strings), and deservedly so. But if you ask me, one of the most elegant and interesting new features is generators, and they've seen little to no adoption. They're the best JavaScript syntax that you're not using.

To see why generators are neat, we need to know about iterators, which were added to the language in pieces over the years. An iterator is an object with a next() method, which you can call to get a new result with either a value or a "done" flag. Initially, this seems like a fairly silly convention (who wants to manually call a loop function over and over, unwrapping its values each time?), but the goal is actually to enable new syntax once the convention is in place. In this case, we get the generic for ... of loop, which hides all the next() and result.done checks behind a familiar-looking construct:

    for (var item of iterator) {
      // item comes from iterator and
      // loops until the "done" flag is set
    }
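To make the convention concrete, here's a rough sketch of the bookkeeping that the loop hides, using a plain array as the thing being looped over. The variable names are only for illustration; the part that comes from the language is that arrays hand out an iterator through their Symbol.iterator method.

```javascript
// A plain array stands in for any loopable value.
var letters = ["a", "b", "c"];

// Ask the array for an iterator, then step through it by hand.
var iterator = letters[Symbol.iterator]();
var manual = [];
var result = iterator.next();
while (!result.done) {
  manual.push(result.value);
  result = iterator.next();
}

// The same traversal, with for ... of doing the bookkeeping.
var looped = [];
for (var item of letters) {
  looped.push(item);
}

console.log(manual); // ["a", "b", "c"]
console.log(looped); // ["a", "b", "c"]
```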

Designing iteration as a protocol of specific method/property names, similar to the way that Promises are signaled via the then() method, is something that's been used in languages like Python and Lua in the past. In fact, the new loop works very similarly to Python's iteration protocol, especially with the role of generators: while for ... of makes consuming iterators easier, generators make it easier to create them.
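To see what generators save you from, here's a rough sketch of an iterator built by hand. The manualRange name and its arguments are made up for illustration; the only parts the language cares about are the next() method and the Symbol.iterator hook that lets for ... of accept the object.

```javascript
// Hypothetical hand-rolled counting iterator, written without generators.
function manualRange(from, to) {
  var current = from;
  return {
    // Returning itself lets for ... of accept this object directly.
    [Symbol.iterator]: function() {
      return this;
    },
    // Each call hands back the next number, or signals that we're done.
    next: function() {
      if (current <= to) {
        var value = current;
        current += 1;
        return { value: value, done: false };
      }
      return { value: undefined, done: true };
    }
  };
}

var collected = [];
for (var num of manualRange(3, 6)) {
  collected.push(num);
}

console.log(collected); // [3, 4, 5, 6]
```

All of the state tracking lives in a closure variable and a hand-built result object, which is exactly the kind of plumbing that's easy to get subtly wrong.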

You create a generator by adding a * after the function keyword. Within the generator, you can output a value using the yield keyword. This is kind of like a return, but instead of exiting the function, it pauses it and allows it to resume the next time a value is requested. This is easier to understand with an example than it is in text:

    function* range(from, to) {
      while (from <= to) {
        yield from;
        from += 1;
      }
    }

    for (var num of range(3, 6)) {
      // logs 3, 4, 5, 6
      console.log(num);
    }

Behind the scenes, the generator will create a function that, when called, produces an iterator. When the function reaches its end, it'll be "done" for the purposes of looping, but internally it can yield as many values as you want. The for ... of syntax will take care of calling next() for you, and each call resumes the function from where it was paused at the last yield.
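If you want to watch the pausing happen, you can skip the loop and call next() yourself. This is a small sketch using a copy of the range generator; the variable names are just for illustration.

```javascript
function* range(from, to) {
  while (from <= to) {
    yield from;
    from += 1;
  }
}

// Calling the generator function runs none of its body yet;
// it just hands back an iterator.
var numbers = range(1, 2);

var first = numbers.next();  // runs up to the first yield
var second = numbers.next(); // resumes after the yield, loops, yields again
var third = numbers.next();  // the loop ends and the function returns

console.log(first);  // { value: 1, done: false }
console.log(second); // { value: 2, done: false }
console.log(third);  // { value: undefined, done: true }
```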

Previously, in JavaScript, if we created a new collection object (like jQuery or D3 selections), we would probably have to add a method on it for doing iteration, like collection.forEach(). This new syntax means that instead of every collection creating its own looping method (that can't be interrupted and requires a new function scope), there's a standard construct that everyone can use. More importantly, you can use it to loop over abstract things that weren't previously loopable.
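As a sketch of what that looks like for a custom collection, here's a made-up Selection class. It's loosely in the spirit of a jQuery or D3 selection but isn't either library's real API; the only language feature involved is putting a generator method under Symbol.iterator.

```javascript
// Selection is a hypothetical collection class, not a real library API.
class Selection {
  constructor(items) {
    this.items = items;
  }

  // A generator method under Symbol.iterator makes instances loopable.
  *[Symbol.iterator]() {
    for (var item of this.items) {
      yield item;
    }
  }
}

var selection = new Selection(["header", "nav", "main", "footer"]);

var seen = [];
for (var name of selection) {
  // Unlike a forEach() callback, the loop body can bail out early.
  if (name == "main") break;
  seen.push(name);
}

console.log(seen); // ["header", "nav"]
```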

For example, let's take a problem that many data journalists deal with regularly: CSV. In order to read a CSV, you probably need to get a file line by line. It's possible to split the file and create a new array of strings, but what if we could just lazily request the lines in a loop?

    function* readLines(str) {
      var buffer = "";
      for (var c of str) {
        if (c == "\n") {
          yield buffer;
          buffer = "";
        } else {
          buffer += c;
        }
      }
      // don't drop a final line that has no trailing newline
      if (buffer) yield buffer;
    }
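Here's a rough sketch of how a line reader like this could feed a CSV parser. The readRows helper and the sample data are invented for illustration, and the parsing is deliberately naive: it splits on commas only, so quoted fields would need real handling.

```javascript
// A copy of the line reader, with a final yield so that a last line
// without a trailing newline isn't dropped.
function* readLines(str) {
  var buffer = "";
  for (var c of str) {
    if (c == "\n") {
      yield buffer;
      buffer = "";
    } else {
      buffer += c;
    }
  }
  if (buffer) yield buffer;
}

// Hypothetical helper: naive CSV parsing that splits on commas only.
// Real CSV files need quote and escape handling, which this skips.
function* readRows(str) {
  for (var line of readLines(str)) {
    yield line.split(",");
  }
}

var csv = "name,votes\nalice,10\nbob,7";
var rows = [];
for (var row of readRows(csv)) {
  rows.push(row);
}

console.log(rows);
// [["name", "votes"], ["alice", "10"], ["bob", "7"]]
```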

Reading input this way is much easier on memory, and it's much more expressive to think about looping through lines directly versus creating an array of strings. But what's really neat is that it also becomes composable. Let's say I wanted to read every other line from the first five lines (this is a weird use case, but go with it). I might write the following:

    function* take(x, list) {
      if (x <= 0) return;
      var i = 0;
      for (var item of list) {
        yield item;
        i++;
        // stop before pulling another item from the source
        if (i == x) return;
      }
    }

    function* everyOther(list) {
      var other = true;
      for (var item of list) {
        // yield the first, third, fifth... items and skip the rest
        if (other) yield item;
        other = !other;
      }
    }

    // get my weird subset
    var lines = readLines(file);
    var firstFive = take(5, lines);
    var alternating = everyOther(firstFive);
    for (var value of alternating) {
      // ...
    }

Not only are these generators chained, they're also lazy: until I hit my loop, they do no work, and they'll only read as much as they need to (in this case, only the first five lines are read). To me, this makes generators a really nice way to write library code, and it's surprising that they've seen so little uptake in the community (especially compared to streams, which they largely supplant).
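One way to convince yourself of the laziness is to count how much work the source actually does. This sketch swaps the file for a made-up endless source called countedLines, and uses a version of take that checks its limit right after yielding so that it never pulls an extra item.

```javascript
// Hypothetical source that records how many values it has produced.
var produced = 0;
function* countedLines() {
  while (true) {
    produced += 1;
    yield "line " + produced;
  }
}

// Passes along the first x items, then stops without asking for more.
function* take(x, list) {
  if (x <= 0) return;
  var i = 0;
  for (var item of list) {
    yield item;
    i++;
    if (i == x) return;
  }
}

var firstFive = take(5, countedLines());
var beforeLoop = produced; // nothing has run yet

var lines = [];
for (var line of firstFive) {
  lines.push(line);
}

console.log(beforeLoop);   // 0
console.log(produced);     // 5, even though the source never ends
console.log(lines.length); // 5
```

Because the work is pulled from the end of the chain, an infinite source is fine as long as something downstream stops asking.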

So much of programming is just looping in different forms: processing delimited data line by line, shading polygons by pixel fragment, updating sets of HTML elements. But by baking fundamental tools for creating easy loops into the language, it's now easier to create pipelines of transformations that build on each other. It may still be plumbing, but you shouldn't have to think about it much — and that's as close to beautiful as most code needs to be.
