this space intentionally left blank

July 19, 2013

Filed under: tech»web

Over the Top

By the time you read this, I'll have been running Weir as my full-time RSS reader for two and a half weeks, starting on July 1. It's going well! Having just added OPML export, so that I can switch if it stops being worth the trouble, I've had a chance to sit back and consider some lessons learned from the project.

  1. Eight megabytes does not seem like a lot of data these days, but it adds up. Before I started culling feeds (and before I added Gzip support to the request service), Weir was pulling down roughly eight megs of data with each fetch, at ten minute intervals. That's 48MB per hour, a small amount that adds up to over a gigabyte per day. By default, I get 24 gigs per month of transfer allowance on my server. Something had to go.

    It's interesting to note, by the way, that this is something that RSS services like Newsblur or Feedly don't worry so much about, because the cost of each feed is spread across all subscribers. I didn't cost Google Reader as much traffic as Weir requires on its own.

  2. So I started unsubscribing. The majority of my original subscription list in Google Reader came from Paul Irish's front-end feed collection, and while I had already unsubscribed from the crazy people, it turns out most of the other blogs in that collection were dead. Even with 304 support added in, Weir was downloading a ton of RSS, only to discard much of it as being past the configured expiration date. I don't think this means blogging is a thing of the past, personally, but it's clearly down from its heyday in favor of social services, particularly (in the technical community) Google+.
  3. That said, using Feedburner seems to be a clear indication that you weren't that interested in blogging anyway, because it makes up a disproportionate amount of the abandoned or simply broken RSS feeds on my list. I suspect this is because it signals a lack of ownership. If you care about your feed, you maintain it yourself.
  4. Even with feeds that work, sometimes connections fail, or things break, just because it's the wild and crazy web out there. Taking that into consideration from the start, and tracking the last result for every feed, was one of the smarter things I did. I should probably be tracking more, but I'm too lazy to do real logging.
  5. Feeds are messy, and sanitization is hard. People inject all kinds of styles into their RSS. They include height and width attributes that don't play well with mobile. They put things into tables. They load scripts that I don't want to run, and images that I'd like to defer until their containing post is activated. Right now, I'm using document.implementation.createHTMLDocument() to make a functional (but "dead") DOM, then running a sanitization task over that, but figuring out that process--and making it watertight--has not been easy.
  6. In fact, working with RSS--ostensibly a "machine-readable" format--tends to drive home just how porous the web can be, and how amazing it is that it works at all. Take RSS date formats discovered by the developer of another reader web app, for example. I'm relatively isolated from the actual parsing, but there's code in Weir to work around buggy HTTP responses, missing feed information, and weird characters. Postel's Law has a lot to answer for, in my opinion.
  7. "Worse-is-better" really works for me as a personal development philosophy. My priority has been to get things running, no matter how badly--hacks get added to the .plan file and addressed later on, when I have time to figure out a graceful solution. This has kept my momentum high, and ensured that I don't get bogged down with architecture that I might not even need.
  8. The value of open-source software for this project can't be overstated. There's no way I could have built Weir on my own. In addition to Node, of course, I'm using a number of open-source modules for parsing feeds, handling two-factor auth, and storing posts in the database. Because of open source, I can patch those various libraries together, add my own code on top, and have a newsreader application that does everything I need. Weir doesn't stand on the shoulders of giants--it stands on the shoulders of countless other people, each giving a little bit back to the wider community.

Past - Present