this space intentionally left blank

October 26, 2010

Filed under: meta»blosxom

The PHP Version

About a year back, Mile Zero started to seriously drag in terms of performance, taking more than two seconds to render the page. The problem seemed to be a combination of things: the CGI interface was slow, it didn't run under mod_perl, and I had accumulated a vast number of posts it was having to sift through--which, given that my Blosxom CMS uses the file system as its database, meant lots of drive I/O.

Since I needed to dip my feet into server-side coding anyway, I rewrote Blosxom in PHP. There are a few PHP versions of the script online, but they seemed like a hassle to install, and none of them had support for the plugins I was using--it was almost easier to just do it myself. The result was faster, smaller, and proved to be a great first programming project. Since it's also proved basically stable over the last year, I've decided to go ahead and post the source code in case anyone else wants it (consider it released under the WTF Public License). I suspect the market for file-based blogging scripts is fairly small at this point, but you never know.

HOW IT WORKS

Essentially, both versions of Blosxom work the same way: they recurse through the contents of your blog folder, looking for text files with a certain extension (.txt by default) and building a list. Then they sort the list by reverse-chron date and insert the contents of the first N entries into a set of templates (head, foot, story, and date). Using a REST-like URL scheme, you can change templates (helpful for mobile or RSS) or filter entries by subfolder. It's primitive, but it's also practically unhackable, and it's an awesome way to blog if you like text files. Turns out that I like text files a lot.

Original Blosxom boasted an impressive plugin collection, which it implemented via a package system: plugins exposed a function for each stage of the page assembly process where they wanted to get involved, and the main script would call out to them during those actions, passing in various parameters depending on the task. This being Perl, the whole thing was a weird approximation of object-oriented code that looked like a string of constant cartoon profanity.

PHP provided, I think, better tools. So a plugin for my new version of Blosxom does three things: it sets up its class definition, which should include the appropriate methods for its type, as well as any class or static properties it might need, then it instantiates itself, and finally adds itself to one of several global arrays by plugin type. During execution, the main script iterates through these arrays at the proper time, calling each plugin object's processing method in turn. At least, that's how it works in theory. In practice, I've only implemented plugins for entry text manipulation, because that's all I needed. But the pattern should carry forward without problems to other parts of the process, although you might want to rename the existing process() API method to something more specific, like processEntry(). That way a single plugin could register to handle multiple stages of rendering.

ENOUGH OF THAT, HOW DO I INSTALL IT?

Just copy the script to a publicly-accessible directory, and edit the configuration variables to point it toward your content directory. The part that tends to be confusing is the $datadir variable, which needs to be set to your internal server path (what you see if you log in via FTP or SSH), not the external URL path.

Next, you'll need to set up your templates. For each flavor, Blosxom loads a series of template files and inserts your content. These files are:

  • content_type.flavor - Allows you to mess with the HTTP content-type, if you really want to.
  • head.flavor - Everything that comes before your entries
  • date.flavor - Format for the date subhead
  • story.flavor - Template for each story, including its title and metadata
  • foot.flavor - Everything that comes after your entries
  • 404.flavor - What to display instead of an entry if no content is found
In each of these templates, you simply write standard HTML, inserting a placeholder variable where the actual content will go. This process is identical to original Blosxom theming system, including most of the supported variable names.

At that point, when you put text files in the data directory, they'll be assembled into blog entries based on the file modification time. The first line becomes the title of the entry. You can categorize entries by putting them in subfolders, or subfolders of subfolders, and then appending the path after the Blosxom script URL.

I've included a couple of entry plugins, as well, just to show how they generally work. One is a comment counter for the old Pollxn comment system that I still use--the CGI script works fine for comments, but the Perl can't interface with the new PHP script to say how many comments there are on any given entry. The other is a port of the directorybrowse plugin, which creates the little broken-up paths at the end of each entry, so people can jump up to a different level of the category heirarchy. They're short and mostly self-explanatory.

LESSONS LEARNED

At this point, I've been blogging on Blosxom, either the original or this custom version, for slightly more than five years. During that time, I've had all the dates wiped out during a bad server transition, I've moved hosts two or three times, and I've tweaked the site constantly. I think the level of effort is comparable to people I know on more traditional blogging platforms like Wordpress or Movable Type. Of course, the art of writing online isn't really about the tools. But there are some ways that Blosxom has its own quirks--for better and for worse.

The big hassle has been the folder system, especially for a personal blog like this one, where I may ramble across any number of loosely-connected topics from day to day. Basing a taxonomy on folders means that posts can't span multiple categories--I can't have something that's both /journalism and /gaming, for example, which is unfortunate when writing about something like incentive systems for news sites. And once it's been created, you're pretty much stuck with a category, since most of the old links will target the old folder. There aren't many reasons I would want to switch to a database CMS, but the ability to categorize by tags tops the list.

On the other hand, there's something to be said as a writer for the flat-file approach. It has an immediacy to it that a database layer can't duplicate. I don't have to sign into an admin page, visit the "create post" section, type my code into one of those text-mangling rich editing forms, select "publish," and then watch it validate and republish the whole blog. I just open a text editor and start typing, and when I save it somewhere, it's live. Working this way is great for eliminating distractions and obstacles. There's no abstraction between what my fingers and the end product.

And while working via individual files is probably less safe or reliable compared to a SQL store, it benefits from easy hackability. I don't have to understand the CMS schema to fix anything that goes wrong, or add new features, or make wide changes. If I decided to change where my linked images are located tomorrow, I'd have all the power of UNIX's text-obsessed command line at my fingertips for propagating those changes. For an organization, that'd be insane. But it works pretty well as long as it's just me tinkering around on the server in my spare time.

Blosxom also ended up being a pretty decent content framework when I recoded my portfolio earlier this year as a single-page JQuery interactive. I tweaked a couple of themes, added a client URL parameter and a teaser plugin, and within a day had it serving up the desired HTML snippets in response to my AJAX calls, while still providing a low-fidelity version for JavaScript-disabled browsers. I'm sure you could do the same kind of thing with a serious CMS like Drupal or Django, but I don't know that I could have done it as quickly or simply.

I don't recommend that anyone else try running a web page this way. But as a learning experience, writing your own tiny server framework serves pretty well. It's a good challenge that covers the broadest parts of Internet programming--file access, data structures, sorting, filtering, caching, HTTP requests, and output. And hey, it works for me. Maybe it'll work for you too.

Past - Present