this space intentionally left blank

May 10, 2016

Filed under: tech»web

Behind the Times

The paper recently launched a new native app. I can't say I'm thrilled about that, but nobody made me CEO. Still, the technical approach it takes is "interesting:" its backing API converts articles into a linear stream of blocks, each of which is then hand-rendered in the app. That's the plan, at least: at this time, it doesn't support non-text inline content at all. As a result, a lot of our more creative digital content doesn't appear in the app, or is distorted when it does appear.

The justification given for this decision was speed, with the implicit statement being that a webview would be inherently too slow to use. But is that true? I can't resist a challenge, and it seemed like a great opportunity to test out some new web features I haven't used much, so I decided to try building a client. You can find the code here. It's currently structured as a Chrome app, but that's just to get around the CORS limit since our API doesn't have the Access-Control-Allow-Origin headers added.

The app uses a technique that's been popularized by Nolan Lawson's Pokedex.org, in which almost all of the time-consuming code runs in a Web Worker, and the main thread just handles capturing UI events and re-rendering. I started out with the worker process handling network and caching in IndexedDB (the poor man's Service Worker), and then expanded it to do HTML sanitization as well. There's probably other stuff I could move in, but honestly I think it's at a good balance now.

By putting all this stuff into a second script that runs independently, it frees up the browser to maintain a smooth frame rate in animations and UI response. It's not just the fact that I'm doing work elsewhere, but also that there's hardly any garbage collection on the main thread, which means no halting while the JavaScript VM cleans up. I thought building an app this way would be difficult, but it turns out to be mostly similar to writing any page that uses a lot of AJAX — structure the worker as a "server" and the patterns are pretty much the same.
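To make the "worker as server" idea concrete, here is a rough sketch of a request/response layer over postMessage. The message shape ({ id, route, payload }), the createClient and serve helpers, and the route names are all invented for illustration and are not taken from the actual app. It only assumes a port-like object with postMessage and onmessage, which is what a Worker (or self, inside the worker) provides.

```javascript
//Hypothetical page-side helper: wraps a Worker-like port so that each
//request returns a promise, much like an AJAX call would.
function createClient(port) {
  var nextId = 0;
  var pending = new Map();
  port.onmessage = function(e) {
    var { id, error, result } = e.data;
    var callbacks = pending.get(id);
    if (!callbacks) return;
    pending.delete(id);
    if (error !== undefined) callbacks.reject(new Error(error));
    else callbacks.resolve(result);
  };
  return function request(route, payload) {
    return new Promise(function(resolve, reject) {
      var id = nextId++;
      pending.set(id, { resolve, reject });
      port.postMessage({ id, route, payload });
    });
  };
}

//Hypothetical worker-side helper: routes are plain functions keyed by
//name, like endpoints on a server. They may return values or promises.
function serve(port, routes) {
  port.onmessage = function(e) {
    var { id, route, payload } = e.data;
    Promise.resolve()
      .then(function() {
        if (!routes[route]) throw new Error("No such route: " + route);
        return routes[route](payload);
      })
      .then(
        result => port.postMessage({ id, result }),
        err => port.postMessage({ id, error: err.message })
      );
  };
}
```

On the page this would be used as something like createClient(new Worker("worker.js")), with serve(self, { ... }) inside the worker script. Matching replies to requests by ID is what lets several requests be in flight at once without the main thread caring about the order in which they finish.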

The other new technology that I learned for this project is Mithril, a virtual DOM framework that my old coworkers at ArenaNet rave about. I'm not using much of its MVC architecture, but its view rendering code is great at gradually updating the page as the worker sends back new data: I can generate the initial article list using just the titles that come from one network endpoint, and then add the thumbnails that I get from a second, lower-priority request. Readers get a faster feed of stories, and I don't have to manually synchronize the DOM with the new data.

The metrics from this version of the app are (unsurprisingly) pretty good! The biggest slowdown is the network, which would also be a problem in native code: loading the article list for a section requires one request to get the article IDs, and then one request for each article in that section (up to 21 in total). That takes a while — about a second, on average. On the other hand, it means we have every article cached by the time that the user can choose something to read, which keeps the time for requesting and loading an individual article at around 150ms on my Chromebook.
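The loading pattern described above (one request for the IDs, then one per article, with everything cached along the way) might look roughly like this. It's a sketch under assumptions: fetchJSON stands in for any function that takes a URL and returns a promise for parsed JSON, the URL shapes are made up, and a Map stands in for the IndexedDB cache.

```javascript
//Hypothetical loader: fetches the ID list for a section, then every
//article in it, storing each article request in the cache as it goes.
function loadSection(fetchJSON, section, cache) {
  return fetchJSON("/sections/" + section).then(function(ids) {
    var requests = ids.map(function(id) {
      //reuse an in-flight or finished request if we already have one
      if (!cache.has(id)) cache.set(id, fetchJSON("/articles/" + id));
      return cache.get(id);
    });
    //resolves once every article in the section has arrived
    return Promise.all(requests);
  });
}
```

Caching the promise rather than the finished article means a second request for the same ID never hits the network twice, even if the first one hasn't come back yet.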

That's not to say that there aren't problems, although I think they're manageable. For one thing, the worker and app bundles are way too big right now (700KB and 200KB, respectively), in part because they're pulling in a bunch of big NPM modules to do their processing. These should be lazy-loaded for speed as much as possible: we don't need HTML parsing right away, for example, which would cut a good 500KB off of the worker's initial size. Every kilobyte of script is roughly 1ms of load time on a mobile device, so spreading that out will drastically speed up the app's startup time.

As an interesting side note, we could cut almost all that weight entirely if the document.implementation object was available in Web Workers. Weir, for example, does all its parsing and sanitization in an inert document. Unfortunately, the DOM isn't thread-safe, so nothing related to document is available outside the main process, and I suspect a serious sanitization pass would blow past our frame budget anyway. Oh well: htmlparser2 and friends it is.

Ironically, the other big issue is mostly a result of packaging this up as a Chrome app. While that lets me talk to the CMS without having CORS support, it also comes with a fearsome content security policy. The app shell can't directly load images or fonts from the network, so we have to load article thumbnails through JavaScript manually instead. Within Chrome's <webview> tag, we have the opposite problem: the webview can't load anything from the app, and it has a weird protocol location when loaded from a data URL, so all relative links have to be rewritten. It's not insurmountable, but you have to be pretty comfortable with the way browsers work to figure it out, and the debugging can get a little hairy.
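For the relative link problem, the URL constructor does most of the work, since it resolves a link against a base address. This is only a sketch: absolutize and rewriteLinks are hypothetical names, baseURL is assumed to be whatever address the article was originally fetched from, and the regex is a deliberately naive stand-in for a real HTML parser.

```javascript
//Resolve one possibly-relative link against the article's real address.
function absolutize(link, baseURL) {
  try {
    return new URL(link, baseURL).href;
  } catch (err) {
    //leave anything unparseable alone
    return link;
  }
}

//Naive rewrite for illustration only: it handles double-quoted href
//and src attributes and nothing else.
function rewriteLinks(html, baseURL) {
  return html.replace(/\b(href|src)="([^"]*)"/g, function(match, attr, link) {
    return attr + '="' + absolutize(link, baseURL) + '"';
  });
}
```

Links that are already absolute pass through unchanged, so the same function can be run over every link in an article without checking first.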

So there you have it: a web app that performs like native, but includes support for features like DocumentCloud embeds or interactive HTML graphs. At the very least, I think you could use this to advocate for a hybrid native/web client on your news site. But there's a strong argument to be made that this could be your only app: add a Service Worker and (in Chrome and Firefox) it could load instantly and work offline after the first visit. It would even get a home screen icon and push notification support. I think the possibilities for progressive web apps in the news industry are really exciting, and building this client makes me think it's doable without a huge amount of extra work.

April 29, 2016

Filed under: journalism»education

Reporting with Python

This month, I'm teaching a class at the University of Washington on reporting with Python. This seems like an odd match for me, since I hardly ever work with Python, but I wanted to do a class that was more journalism-focused (as opposed to the front-end development that I normally teach) and teaching first-time programmers how to do data analysis in Node just isn't realistic. If you're interested in following along, the repository with the class materials is located here.

I'm not the Times' data reporter, so I don't get to do this kind of analysis often, but I always really enjoy it when I do. The danger when planning a class on a fun topic is that it's easy to over-stuff the curriculum in my eagerness to cover the techniques that I think are particularly interesting. To fight that impulse, I typically make a list of material I want to cover, then cut it in half, then think about cutting it in half again. As a result, there's a lot of stuff that didn't make it in — SQL and web scraping primary among them.

What's left, however, is a pretty solid base for reporters who are interested in starting to use code to generate and explore stories. Last week, we cleaned and searched 1,000 text files for a string, and this week we'll look at doing analysis on CSV files. In the final session, I'm planning on taking a deep dive into regular expressions: so much of reporting is based around interrogating text files, and the nice thing about an education in regex is that it will travel into almost any programming language (as well as being useful for many command line tools like grep or sed).

If I can get anything across in this class, I'm hoping to leave students with an understanding of just how big digital scale can be, and how important it is to have tools for handling it. I was talking one night with one of the Girl Develop It organizers, who works for a local analytics company. Whereas millions of rows of data is a pretty big deal for me, for her it's a couple of hours on a Saturday — she's working at a whole other order of magnitude. I wouldn't even know where to start.

Right now, most record requests and data dumps operate more at my scale. A list of all animal imports/exports in the US for the last ten years is about 7 million records, for example. That's approachable with Python, although you'd be better off learning some SQL for the heavy lifting, but it's past the point where Excel is useful, and it certainly couldn't be explored by hand. If you can't code, or you don't have access to someone who does, you can't write that story.

At some point, the leaks and government records that reporters pore over may grow to a larger kind of scale (leaks, certainly; government data will be aggregated as long as there are privacy concerns). When that happens, reporters will have to develop the kinds of skills that I don't have. We already see hints of this in the tremendous tooling and coordination required for investigating the Panama Papers. But in the meantime, I think it's tremendously important that students learn how to automate data at a basic level, and I'm really excited that this class will introduce them to it.

April 15, 2016

Filed under: tech»coding

Calculated Amalgamation

In a fit of nostalgia, I've been trying to get my hands on a TI-82 calculator for a few weeks now. TI BASIC was probably the first programming language in which I actually wrote significant amounts of code: although a few years later I'd start working in C for PalmOS and Windows CE, I have a lot of memories of trying to squeeze programs for speed and size during slow class periods. While I keep checking Goodwill for spares, there are plenty of TI calculator emulation apps, so I grabbed one and loaded up a TI-82 ROM to see what I've retained.

Actually, TI BASIC is really weird. Things I had forgotten:

  • You can type in all-caps text if you want, but most of the time you don't, because all of the programming keywords (If, Else, While, etc.) are actually single "character" glyphs that you insert from a menu.
  • In fact, pretty much the only code that's typed manually is variable names, of which you get 26 (one for each letter). There are also six arrays (max length 99), five two-dimensional matrices (limited by memory), and a handful of state variables you can abuse if you really need more. Everything is global.
  • Variables aren't stored using =, which is reserved for testing, but with a left-to-right arrow operator: value → dest. I imagine this clears up a lot of ambiguity in the parser.
  • Of course, if you're processing data linearly, you can do a lot without explicit variables, because the result of any statement gets stored in Ans. So you can chain a lot of operations together as long as you just keep operating on the output of the previous line.
  • There's no debugger, but you can hit the On key to break at any time, and either quit or jump to the current line.
  • You can call other programs and they do return after calling, but there are no function definitions or return values other than Ans (remember, everything is global). There is GOTO, but it apparently causes memory leaks when used (thanks, Dijkstra!).

I'd romanticized it over time — the self-contained hardware, the variable-juggling, the 1-bit graphics on a 96x64 screen. Even today, I'm kind of bizarrely fascinated by this environment, which feels like the world's most cumbersome register VM. But loading up the emulator, it's obvious why I never actually finished any of my projects: TI BASIC is legitimately a terrible way to work.

In retrospect, it's obviously a scripting language for a plotting library, and not the game development environment I wanted it to be when I was trying to build Wolf3D clones. You're supposed to write simple macros in TI BASIC, not full-sized applications. But as a bored kid, it was a great playground, and the limitations of the platform (including its molasses-slow interpreter) made simple problems into brainteasers (it's almost literally the challenge behind TIS-100).

These days, the kids have it way better than I did. A micro:bit is cheaper and syncs with a phone or computer. A Raspberry Pi is a real computer of its own, as is the average smartphone. And a laptop or Chromebook with a browser is miles more productive than a TI-82 could ever be. On the other hand, they probably can't sneak any of those into their trig classes and get away with it. And maybe that's for the best — look how I turned out!

March 22, 2016

Filed under: tech»web

ES6 in anger

One of the (many) advantages of running Seattle Times interactives on an entirely different tech stack from the rest of the paper is that we can use new web features as quickly as we can train ourselves on them. And because each news app ships with an isolated set of dependencies, it's easy to experiment. We've been using a lot of new ES6 features as standard for more than a year now, and I think it's a good chance to talk about how to use them effectively.

The Good

Surprisingly (to me at least), the single most useful ES6 feature has been arrow functions. The key to using them well is to restrict them only to one-liners, which you'd think would limit their usefulness. Instead, it frees you up to write much more readable JavaScript, especially in array processing. As soon as it breaks to a second line (or seems like it might do so in the future), I switch to writing regular function statements.


//It's easy to filter and map:
var result = list.filter(d => d.id).map(d => d.value);

//Better querySelectorAll with the spread operator:
var $ = s => [...document.querySelectorAll(s)];

//Fast event logging:
map.on("click", e => console.log(e.latlng));

//Better styling with template strings:
var translate = (x, y) => `translate(${x}px, ${y}px);`;

Template strings are the second biggest win, especially as above, where they're combined with arrow functions to create text snippets. Having a multiline string in JS is very useful, and being able to insert arbitrary values makes building dynamic popups or CSS styles enormously simpler. I love writing template strings for quick chunks of templating, or embedding readable SQL in my Node apps.

Despite the name, template strings aren't real templates: they can't handle loops, they don't really do interpolation, and the interface for using "tagged" strings is cumbersome. If you're writing very long template strings (say, more than five lines), it's probably a sign that you need to switch to something like Handlebars or EJS. I have yet to see a "templating language" built on tagged strings that didn't seem like a wildly frustrating experience, and despite the industry's shift toward embedded DSLs like React's JSX, there is a benefit to keeping different types of code in different files (if only for widespread syntax highlighting).
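For reference, here is roughly what a tag function looks like in practice. It's a small sketch, with html, escapeHTML, and list all being names invented for the example rather than part of any library.

```javascript
//escape the characters that matter inside HTML text
var escapeHTML = s => String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

//A tag function receives the literal chunks and the interpolated values
//separately, and has to stitch them back together itself.
var html = function(strings, ...values) {
  return strings.reduce((out, chunk, i) => out + escapeHTML(values[i - 1]) + chunk);
};

//loops still have to be done by hand, with map() and join():
var list = items => `<ul>${items.map(d => html`<li>${d}</li>`).join("")}</ul>`;
```

Even in this tiny case, the loop ends up in an untagged outer string (tagging it would escape the list items a second time), which is a fair taste of why longer templates get awkward quickly.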

The last features I've really embraced are destructuring and the new object literal syntax. They're mostly valuable for cleanup, since all they do is cut down on repetition. But they're pleasant to use, especially when parsing text and interacting with CommonJS modules.



//Splitting dates is much nicer now:
var [year, month, day] = dateString.split(/\/|-/);

//Or getting substrings out of a regex match:
var re = /(\w{3})mlb_(\w{3})mlb_(\d+)/;
var [match, away, home, index] = gameString.match(re);

//Exporting from a module can be simpler:
var x = "a";
var y = "b";
module.exports = { x, y };

//And imports are cleaner:
var { x } = require("module");

The Bad

I've tried to like ES6 classes and modules, and it's possible that one day they're going to be really great, but right now they're not terribly friendly. Classes are just syntactic sugar around ES5 prototypes — although they look like Java-esque class statements, they're still going to act in surprising ways for developers who are used to traditional inheritance. And for JavaScript programmers who understand how the language actually works, class definitions boast a weird, comma-less syntax that's sort of like the new object literal syntax, but far enough off that it keeps tripping me up.
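A quick way to see the prototype underneath the sugar is to poke at a class after defining it. The Animal class here is just a throwaway example.

```javascript
class Animal {
  constructor(name) {
    this.name = name;
  }
  speak() {
    return this.name + " makes a noise";
  }
}

//the "class" is still a constructor function with a prototype
console.log(typeof Animal); // "function"
console.log(typeof Animal.prototype.speak); // "function"

//so methods can be swapped out at runtime, and existing instances
//pick up the change, which is not how Java-style classes behave
var rex = new Animal("Rex");
Animal.prototype.speak = function() {
  return this.name + " barks";
};
console.log(rex.speak()); // "Rex barks"
```

Note that there are no commas between constructor and speak in the class body, unlike the otherwise similar method shorthand in an object literal.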

The turning point for the new class keyword will be when the related, un-polyfillable features make their way into browsers — I'm thinking mainly of the new Symbols that serve as feature flags and the ability to extend Array and other built-ins. Until that time, I don't really see the appeal, but on the other hand I've developed a general aversion to traditional object-oriented programming, so I'm probably not the best person to ask.

Modules also have some nice features from a technical standpoint, but there's just no reason to use them over CommonJS right now, especially since we're already compiling our applications during the build process (and you have to do that, because browser support is basically nil). The parts that are really interesting to me about the module system — namely, the configurable loader system — aren't even fully specified yet.

New discoveries

Most of what we use on the Times' interactive team is restricted to portions of ES6 that can be transpiled by Babel, so there are a lot of features (proxies, for example) that I don't have any experience using. In a Node environment, however, I've had a chance to use some of those features on the server. When I was writing our MLB scraper, I took the opportunity to try out generators for the first time.

Generators are borrowed liberally from Python, and they're basically constructors for custom iterable sequences. You can use them to make normal objects respond to language-level iteration (i.e., for ... of and the spread operator), but you can also define sequences that don't correspond to anything in particular. In my case, I created a generator for the calendar months that the scraper loads from the API, which (when hooked up to the command line flags) lets users restart an MLB download from a later time period:


//feed this a starting year and month
var monthGen = function*(year, month) {
  while (year < 2016) {
    yield { year, month };
    month++;
    if (month > 12) {
      month = 1;
      year++;
    }
  }
};

//generate a sequence from January 2008 through December 2015
var months = [...monthGen(2008, 1)];

That's a really nice code pattern for creating arbitrary lists, and it opens up a lot of doors for JavaScript developers. I've been reading and writing a bit more Python lately, and it's been amazing to see how much a simple pattern like this, applied language-wide, can really contribute to its ergonomics. Instead of the Stream object that's common in Node, Python often uses generators and iteration for common tasks, like reading a file line-by-line or processing a data pipeline. As a result, I suspect most new Python programmers need to survey a lot less intellectual surface area to get up and running, even while the guts underneath are probably less elegant for advanced users.
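The other use mentioned above, making a normal object respond to for ... of and the spread operator, comes down to putting a generator under the Symbol.iterator key. The season object and its start and end values are made up for this sketch.

```javascript
//a plain object with a generator method stored under Symbol.iterator
var season = {
  start: 4,
  end: 10,
  *[Symbol.iterator]() {
    for (var month = this.start; month <= this.end; month++) {
      yield month;
    }
  }
};

//spread and for ... of both go through the same protocol
var seasonMonths = [...season];
var total = 0;
for (var m of season) {
  total += m;
}
```

Each spread or loop calls the generator again, so the object can be iterated as many times as needed.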

It surprised me that I was so impressed with generators, since I haven't particularly liked Python very much in the past. But in reading the Cookbook to prep for a UW class in Python, I've realized that the two languages are actually closer together than I'd thought, and getting closer. Python's class implementation is actually prototypical behind the scenes, and its use of duck typing for built-in language features (such as the with statement) bears a strong resemblance to the work being done on JavaScript Promises (a.k.a. "then-ables") and iterator protocols.
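On the JavaScript side, the duck typing is easy to demonstrate: anything with a then() method gets treated as a promise. The thenable object below is a made-up example.

```javascript
//not a real Promise, just an object with a then() method
var thenable = {
  then: function(resolve, reject) {
    resolve("quacks like a promise");
  }
};

//Promise.resolve() adopts anything with a then() method
var adopted = Promise.resolve(thenable);
adopted.then(value => console.log(value));
```

The result is a genuine Promise that settles with whatever the then() method passed to resolve, which is how promise libraries written before ES6 can interoperate with the built-in version.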

It's easy to be resistant to change, and especially when it's at the level of a language (computer or otherwise). I've been critical of a lot of the decisions made in ES6 in the past, but these are positive additions on the whole. It's also exciting, as someone who has been working in JavaScript at a deep level, to find that it has new tricks, and to stretch my brain a little integrating them into my vocabulary. It's good for all of us to be newcomers every so often, so that we don't get too full of ourselves.

March 17, 2016

Filed under: politics»national»executive

Seventy-two

Since it is election season, when I ran out of library books last week I decided to re-read Hunter S. Thompson's Fear and Loathing: On the Campaign Trail '72, as I do every four years or so. Surprisingly, I don't appear to have written about it here, even though it's one of my favorite books, and the reason I got into journalism in the first place.

On the Campaign Trail is always a relevant text, but it feels particularly apt this year. In the middle of the Trump presidential run, the book's passage on the original populist rabble-rouser, George Wallace, could have been written yesterday if you just swap some names — not to mention the whirlwind chaos of the primaries and a convention battle. On the other hand, with writing this good, there's really no wrong time to bring it up.

Even before his death in 2005, when most people thought about Thompson, what usually came to mind was wild indulgence: drugs, guns, and "bat country." Ironically, On the Campaign Trail makes the strong case that his best writing was powerfully controlled and focused, not loose and hedonistic: the first two-thirds of the book (or even the first third) contain his finest work. After the Democratic national convention, and the resulting breakdowns in Thompson's health, the analysis remains sharp but the writing never reaches those heights again.

That's not to say that the book is without moments of depravity — his account of accidentally unleashing a drunken yahoo on the Muskie whistlestop tour is still a classic, not to mention the extended threat to chop the big toes off the McGovern political director — but it's never random or undirected. For Thompson, wild fabrication is the only way to bring readers into the surreal world of a political race. His genius is that it actually works.

Despite all that, if On the Campaign Trail has a legacy, it's not the craziness, the drugs, or even the politics. The core of the book is two warring impulses that drive Thompson at every turn: sympathy for the voters who pull the levers of democracy, and simultaneously a deep distrust of the kind of people that they reliably elect. The union of the two is the fuel behind his best writing. Or, as he puts it:

The highways are full of good mottos. But T.S. Eliot put them all in a sack when he coughed up that line about... what was it? Have these Dangerous Drugs fucked my memory? Maybe so. But I think it went something like this:

"Between the Idea and the Reality... Falls the Shadow."

The Shadow? I could almost smell the bastard behind me when I made the last turn into Manchester. It was late Tuesday night, and tomorrow's schedule was calm. All the candidates had zipped off to Florida — except for Sam Yorty, and I didn't feel ready for that.

The next day, around noon, I drove down to Boston. The only hitchhiker I saw was an eighteen-year-old kid with long black hair who was going to Reading — or "Redding," as he said it — but when I asked him who he planned to vote for in the election he looked at me like I'd said something crazy.

"What election?" he asked.

"Never mind," I said. "I was only kidding."

March 3, 2016

Filed under: journalism»industry

Spotlit

Judging by my peers, it's possible that I'm the only journalist in America who didn't absolutely love Spotlight. I thought it was a serviceable movie, but when it comes to this year's Best Picture award I still harbor a fantasy that there's an Oscar waiting in Valhalla, shiny and chrome, for Fury Road (or for Creed, if push came to shove).

But I'm not upset to see Spotlight win, either. The movie may have been underwhelming for me, but its subject deserves all the attention it gets (whether or not, as former NYT designer Khoi Vinh wonders, the Globe fully capitalizes on it). My only real concern is that soon it'll be mostly valuable as a historical document, with the kind of deep reporting that it portrays either dying or dead.

To recap: Spotlight centers on the Boston Globe's investigation into the Catholic Church's pedophilia scandals in the 1990s — and specifically, into how the church covered up for abusive priests by moving them around or assigning them to useless "rehabilitation" sessions. The paper not only proved the fact that the church was aware of the problem, but also demonstrated that it was far more common than anyone suspected. It's one of the most important, influential works of journalism in modern memory, done by a local newsroom.

It's also a story of successful data journalism, which I feel is often rare: while my industry niche likes to talk itself up, our track record is shorter than many of us like to admit. The data in question isn't complex — the team used spreadsheets and data entry, not scripting languages or visualizations — but it represents long hours of carefully entering, cleaning, and checking data to discover priests that were shuffled out of public view after reports of abuse. Matt Carroll, the team's "data geek," writes about that experience here, including notes on what he'd do differently now.

So it's very cool to see the film getting acclaim. At the same time, it's a love letter to an increasingly small part of the news industry. Investigative teams are rare these days, and many local papers don't have them anymore. We're lucky that we still have them at the Seattle Times — it's one of the things I really like about working there.

Why do investigative teams vanish? They're expensive, for one thing: a team may spend months, or even a year working on a story. They may need legal help to pursue evidence, or legal protection once a story is published. And investigative stories are not huge traffic winners, certainly not proportional to the effort they take. They're one of the things newsrooms do on principle, and when budget gets tight, those principles often start to look more negotiable than they used to.

In this void, there are still a few national publishers pursuing investigations, both among the startups (BuzzFeed, which partnered on our mobile home stories) and the non-profits (ProPublica and the Marshall Project). I'm a big fan of the work they're doing. Still, they're spread thin trying to cover the whole country, or a particular topic, leaving a lot of shadows at the local level that could use a little sun.

It's nice to imagine that the success of Spotlight the movie will lead to a resurgence in funding for Spotlight the investigative department, and others like it. I suspect that's wishful thinking, though. In the end, that Oscar isn't going to pay for more reporters or editors. If even Hollywood glamor can't get reporters and editors funded, can anything?

February 17, 2016

Filed under: culture»pop»comics

Excelsior?

Marvel Comics has a digital subscription service called "Marvel Unlimited" that's basically Netflix for their comics: access to most of their archives online for ten bucks a month or so. I decided to give it a shot after Ta-Nehisi Coates kept singing its praises. I buy a few trades a year, but don't always keep them on my shelf, and I figured this was a good chance to go trolling through a few classics that aren't collected in print anymore.

Is it worth it? Well, usually. It turns out that Marvel's back catalog is hardly immune to Sturgeon's Law: most of it is crap. It doesn't help that it's almost all superhero-flavored, which is fine in small doses but starts to feel a little ridiculous when you're exposed to literally thousands of titles and they've all got capes: really, this is all you have? Sure, it's Marvel and that's what they do, but knowing that there's a broad range of other stories being told in this medium makes their genre limitations feel all the more jarring.

Marvel's other bad habit, which only seems to have gotten worse as far as I can tell, is the "special events" that make it impossible to just read through a single storyline. For example, trying to read through the new X-Men titles is an exercise in frustration, since they keep being interrupted or pre-empted by crossover stories from other books. As a way to sell comics to a hardcore faithful, it probably works pretty well. But as a relative newcomer, it's disorienting and irritating, as though a medical drama came crashing into your favorite sitcoms at random intervals.

As a result, it's not surprising that my favorite series to read so far have been either standalone humor titles or oddball takes on the genre. Dan Slott's 2004 run on She-Hulk (often referred to as "Single Green Female") is more legal workplace drama than anything else, and while it sometimes got too clever with the meta-humor, it delivers a nice, funny, self-contained story arc. Ditto for The Superior Foes of Spider-Man, which ran for 17 issues and follows a set of petty, incompetent super-thieves who get in way over their heads. X-Men: Legacy is another short storyline focusing on Charles Xavier's son, David, who has some legitimate disagreements with his "peaceful" father's violent vigilante organization. With its frequent trips into psychic psychedelia, it makes a great case for the infinite effects budget that comics so rarely exploit.

On the other side of the coin, I went trolling through Walt Simonson's tenure on Thor, which ran back in the 1980s and often gets mentioned as a stellar example of classic comics writing. It's pretty good! But it's also a decidedly weird artifact: while there's overlap with the rest of the Marvel universe from time to time, most of the story is a kind of bonkers faux-Norse legend, with characters taking oaths of honor, pursuing doomed love, and striking off on various quests. The most impressive thing, from a modern perspective, is how many storylines it manages to juggle per issue. There's A, B, C, and sometimes even a D plot, all playing out in 30-page chunks.

But by far my favorite discovery has been the original reason I signed up: Priest's late-'90s Black Panther, which is a really fascinating, thought-provoking bit of work. While parts of the art and dialog have not aged gracefully, a lot of it continues to feel very current, both in terms of topic matter and storytelling.

As early as possible, and throughout the rest of the book, Priest emphasizes that T'Challa (the titular Panther) is not just a vigilante out to fight crime, like other superheroes. He's the king of a country — a legitimate state power with an entirely different set of priorities and concerns. To drive that point home, Priest frames the narrative as a series of progress reports from the US liaison to T'Challa, Everett Ross, a move that turns out to be an elegant narrative hat trick:

  • Being a white State Department functionary, Ross can explain the political element of the books and serve as an audience surrogate for the largely-white readership.
  • He's useful as comic relief, which is good, since the story arcs themselves revolve around political coups and international sovereignty, and can get a little byzantine.
  • He's a terrible narrator, which starts a running gag where each issue starts disastrously in media res and then unshuffles itself as Ross is forced to double back and explain the situation.

It's a comic book, so of course there are goofy action scenes, and much like the current crop of comic-inspired movies, these rarely rise above "vaguely interesting." But when I think back to the most memorable pages, it's mostly quieter or more subversive scenes. Most of the real plot happens in dialog: negotiations between the Panther and other governments, discussions of succession and history, sarcastic asides that mock the standard superhero schtick. Along the way, Priest is happy to extend a scene for either pathos or awkward humor, to undercut his own pretension, or let characters react to The Black Panther's quietly revolutionary core — an African nation that's portrayed as a technological superpower of its own. As Coates says, when talking about his own plans to write for the character:

It's obviously not the case, but T'Challa — the Black Panther and mythical ruler of Wakanda — has always struck as the product of the black nationalist dream, a walking revocation of white supremacist myth. T'Challa isn't just a superhero in the physical sense, he is one of the smartest people in the world, ruling the most advanced civilization on the planet. Wakanda's status as ever-independent seems to eerily parallel Ethiopia's history as well as its place in the broader black imagination. Maybe it's only me, but I can't read Jason Aaron's superb "See Wakanda And Die" and not think of Adowa.

Comic book creators, like all story-tellers, get great mileage out of myth and history. But given the society we live in, some people's myths are privileged over others. Some of that is changing, no doubt. In the more recent incarnations of T'Challa you can see Christopher Priest invoking the language of the Hausa or Reginald Hudlin employing the legacy of colonialism. These were shrewd artistic decisions, rooted in the fact that anyone writing Black Panther enjoys an immediate, if paradoxical, advantage: the black diaspora is terra incognita for much of the world. What does the broader world really know of Adowa? Of Nanny and Cudjoe? Of the Maji-Maji rebellion? Of Legba and Oshun? Of Shine? Of High John The Conqueror? T'Challa's writers have always enjoyed access to a rich and under-utilized pool of allusion and invocation.

It's a proudly Afrocentric (and Afrofuturist) book, way ahead of its time, and put out by a major comics publisher. I imagine there are a lot of people for whom these throwaway, cheaply-printed comics were profound experiences when they were young. It's hard to imagine how much of that material can translate through to the eventual movie version, even when directed by a thoughtful and talented filmmaker like Ryan Coogler. But kids who go looking for the originals after they see it in theaters are in for a real surprise.

February 5, 2016

Filed under: journalism»articles

Catch-up: 2015

The last thing I'd written about here was the paper's investigation into police shootings, so let's take this chance to wander through the rest of 2015.

In October, after a Seattle dentist shot Cecil the lion and made himself temporarily infamous, one of our reporters put in a records request for all historical animal imports into the USA. The resulting story involved querying through seven and a half million rows of data to find out what we import, and how Paul Allen's Initiative 1401 (which banned the resale of several species of animal trophies) would affect these imports (answer: hardly at all). We also got to do some fun visualizations for it.

In November, my teammate Audrey worked with the Seattle Sketcher to create a voiced history of Ravensdale, a boomtown destroyed after a mining accident. In general, audio slideshows aren't hugely successful online, but I think this one was a really pleasant experience, and analytics indicate that a lot of people listened to it.

Every year, during the Seahawks season, the paper does a series of "paper hawks": foldable paper dolls for players on the team. The last one is blank, so people can put in their own faces. To make things interesting, I put together a paper hawk web app that could use a camera to take a picture of the reader, and do all the customization in the page (including changing skin tones and hair color), then print it out. This was an interesting project in part because the API I used (getUserMedia) is restricted to HTTPS only in Chrome. To make it work, we moved all of our projects to secure domains, which was a great test case for encrypting additional content at the paper.

For MLK Day, my team revived the Seattle Times' tribute to the great man, which was originally published twenty years ago (and had been last updated in 2011). The new version is responsive and easier to update, so that each year we can add more information to it. It's fitting, of course, that the paper has a page just for Dr. King, since it was a major part of the campaign to rename King County in his honor back in 1995. It's pretty cool to keep that tradition going.

Finally, just this week, we published a Pacific NW Magazine story on modern dating, with an interactive "mini-documentary" that I built with our video team. Based on your answers, it generates a custom playlist from the interviews that we recorded. We were inspired by this great piece done by the Washington Post on "the N word." I really enjoyed putting the interactions and animation together, but honestly, most of the credit goes to our video team, and my work was just the window dressing.

These are just the major interactives, of course. All told, we built 84 projects of all sizes last year, not including various small pages built by the producers using our app template. That's a pretty good rate of production for a two-developer team. Here's to a busy 2016!

January 21, 2016

Filed under: journalism»professional

Unconferencing

How do we level up data journalists? In a few months, we'll have a new digital/data intern at the Times, and so I've been asking myself this question quite a bit, especially in light of our team's efforts to recruit diverse candidates. There are a lot of students and young journalists out there with a little bit of training, but no idea where to go from there: how do we get them across the gap to where they're capable of working on a newsroom development team? There's a catch-22 at work here: it's especially tough for aspiring news devs to get a job without experience, but they can't get experience without the job.

One strategy I've often heard is that young people should attend industry conferences as a way to learn from experienced journalists and build connections. Myself, I'm skeptical of this. Conferences have never really been a part of my professional life. We didn't go to them at CQ, and I never got a chance to go to GDC when I worked in the game industry. After I was hired at the paper, I got to go to SND2015 and Write the Docs, and this year I'm heading to NICAR, SRCCON, and (possibly) CascadiaJS. It's possible I really hate myself.

Visiting conferences is rewarding, but it's also exhausting, expensive, and a huge time-sink. And while host organizations often work to mitigate that through scholarships and grants to disadvantaged communities, it's still a big ask for neophytes. Even if I weren't skeptical of the benefits conferences actually bring, I think it's hard to argue that we don't need better, more accessible solutions.

The way I see it, there are three things that you get out of a conference as a young person:

  1. Mentorship
  2. Training
  3. Exposure to developing industry trends

Of the three, the first is the hardest to duplicate, and yet it's the most crucial. Networks are powerful in this industry, and you can practically watch them develop before your eyes if you look closely: young people who catch a break early with the right people, and find themselves quickly elevated with opportunities to work on well-known teams, fill industry panels, and write insipid Nieman Lab think-pieces on the future of news. Then we all end up competing over hiring those same six people, which I don't really think is healthy.

Ironically, this is something I want to discuss with other newsrooms at the conferences this year, before I retreat into my Seattle cave for the rest of my natural life. But I'm also starting a personal initiative to make myself available for "remote mentorship," and asking other people to do so. If you're in news and would like to join, feel free to add yourself to the sheet, and I'll share it with students or other people who get in touch!

December 28, 2015

Filed under: tech»web

Let's not

Right now you can access my portfolio over a secure, encrypted connection, thanks to Let's Encrypt. Which is pretty cool! On the other hand, if nginx restarts this week, it'll probably crash on a bad config value, temporarily disabling all my public-facing websites. This has been emblematic of my HTTPS experience in general: a mix of triumphs and severe configuration mishaps.

A little background: in order to serve a website over a secure connection, you need a digital certificate to encrypt communication with the browser. You can generate these certificates yourself, but that's really only good for personal use. The self-signed cert has to be manually installed on each machine that accesses the server; otherwise the browser will throw up a big, ugly warning screen. The alternative is to buy a certificate from a "trusted authority," most of which are not particularly trustworthy or authoritative, but it'll get you a green lock icon in the URL bar. Purchased certs tend to be either expensive or a hassle or both.

After the Snowden leaks, there was a lot of interest in encrypting all web traffic, which meant bypassing the existing certificate authority protection racket run by Symantec et al. Mozilla and some other organizations got together and started Let's Encrypt, with the goal of making trusted certificates free and easy. I figure they're halfway there: I didn't pay anyone for the cert, at least.

There's an official client for the service, but it only works for Apache and it's kind of hefty. My server is set up in an unsupported (but still pretty standard) configuration: I run nginx as a reverse proxy in front of Apache (for PHP scripts) and Node (for various apps, including Weir), both of which I'd like to be secured. So I used acme-tiny instead, which basically just talks to the cert API and is small enough that I could read and understand the whole thing. I wrote a shell script to wrap it up and automate things. Automation is important, because unlike paid certificates, these are only good for 90 days, so you need a cron job set to run every month or so to renew them.
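To make the "90 days, renew every month or so" arithmetic concrete, here's a minimal sketch of the check a renewal job could run before it bothers the cert API. This isn't what my shell script does; the `should_renew` function and the 30-day margin are assumptions for illustration, and only the 90-day lifetime comes from Let's Encrypt.

```python
from datetime import datetime, timedelta, timezone

CERT_LIFETIME = timedelta(days=90)  # Let's Encrypt certificates last 90 days
RENEW_MARGIN = timedelta(days=30)   # assumed safety margin, not an official value


def should_renew(issued_at: datetime, now: datetime) -> bool:
    """Return True once the certificate is within RENEW_MARGIN of expiring."""
    expires_at = issued_at + CERT_LIFETIME
    return now >= expires_at - RENEW_MARGIN


if __name__ == "__main__":
    issued = datetime(2015, 12, 28, tzinfo=timezone.utc)
    for day in (0, 30, 59, 60, 89):
        today = issued + timedelta(days=day)
        print(day, should_renew(issued, today))
```

With a monthly cron job and a margin like this, a single failed run still leaves roughly a month to notice and retry before the certificate actually lapses.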

Setting all this up wasn't an easy process. The acme-tiny script is well-written, but it has bugs on the version of Python that comes with CentOS. Then I had to set up nginx to use the certificates manually. My webmail got locked into an infinite redirect once I moved my self-signed cert from Apache out to the proxy. And the restart crash? Turns out that Let's Encrypt is rate-limited on a per-domain basis, and I didn't back up the current certificate before I hit the rate limit, so my update script overwrote it with an empty version. Luckily, nginx caches certs and won't restart if it detects a bad config, so I'm safe as long as it can outlast the seven-day rate-limit window (it probably will: it's been up 333 days so far, after all).
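The overwrite-with-an-empty-file mistake is the kind of thing a few lines of defensive code would have caught. The following is a rough sketch, not my actual update script: `install_cert` is a hypothetical helper, and the `.bak` suffix and the "does it contain a PEM header" check are assumptions chosen to keep the example short.

```python
import os
import shutil
import tempfile

PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def install_cert(new_pem: str, cert_path: str) -> bool:
    """Replace cert_path with new_pem only if new_pem looks like a certificate.

    Returns True if the file was replaced, False if the old one was kept.
    """
    if PEM_HEADER not in new_pem:
        # Empty or error output (for example after a rate-limit failure):
        # leave the working certificate alone.
        return False

    if os.path.exists(cert_path):
        # Keep a copy of the certificate that is currently being served.
        shutil.copy2(cert_path, cert_path + ".bak")

    # Write to a temporary file in the same directory, then rename it into
    # place so the server never sees a half-written file.
    directory = os.path.dirname(os.path.abspath(cert_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(new_pem)
    os.replace(tmp_path, cert_path)
    return True
```

The header check is deliberately crude. It won't catch an expired or mismatched certificate, but it does stop an empty response from clobbering a good one, which is all I would have needed here.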

Without literally years of server admin experience, I'm not sure I would have made it through these issues. And as I mentioned, my system is pretty standard — there's no load-balancer, no CDN, and I don't need to host third-party content. I also don't have any business that gets lost if anything is busted and the certificate expires in March. If I were, say, an IT department responsible for a high-traffic site, I'd be a lot more cautious about moving everything over to HTTPS, either through Let's Encrypt or a paid option.

Ultimately, the news industry and other sites are going to have to follow the lead of the Washington Post, even if it takes a while. Even apart from the security benefits it carries, browsers have locked new features (Service Worker, for example) behind HTTPS, and are moving old features behind it as well (geolocation is going to be the biggest disruption there). If you want to develop fast websites in the future (assuming that's something news product management cares about, which is... questionable), and especially if you want to create rich news applications, you're going to have to be encrypted.

In my case, I wanted to get a head start on developing with new browser features (a Service Worker would clean up a lot of Weir code), so it's worth the hassle. And we will continue to push these boundaries on the Seattle Times interactives team, since we've moved our S3 hosting to HTTPS (the rest of the site will follow eventually).

But I think there's a lot of tension between where we want to be, as a news industry, and where it's possible for us to be right now. Although I've seen people calling for incentives to change it (such as requiring HTTPS for news grants), the truth is that it just isn't that simple. News sites are often built in a baroque, overcomplicated set of layers — the Seattle Times, for example, currently sits behind a CDN, several instances of Varnish, some reverse proxies, and a load balancer, mostly due to a lot of historical baggage. Changing this to run securely is going to be a big process, even for a company of our size (maybe because of our size). I can't imagine the hassle for local papers that might have little or no IT support. It won't happen overnight, and Let's Encrypt hasn't done anything to change that yet.

In the meantime, I think it's worth stepping back and asking what we really want out of a digital news industry, because sometimes it's hard to maintain perspective from in the trenches. Is it important that readers be able to see our sites securely, free from worries that third parties are snooping or altering what they see? Sure, that's important. Is it in the top three things that Americans need from local news, above problems like "a sustainable revenue model" and "a CMS that doesn't actively fight against the newsroom?" Probably not. Given a choice between a cryptographically-secure media and a diverse, sustainably-funded media, I'm personally going to take the latter every time.
