
September 7, 2011

Filed under: tech»i_slash_o

In the Thick of It

After a month of much weeping and gnashing of teeth, my venerable laptop is back from out-of-warranty repair. That month was the longest I've been without a personal computer in years, if not decades, but I've survived--giving me hope for when our machine overlords rise up and replace Google Reader with endless reruns of The Biggest Loser.

After the second day, I felt good. In fact, I started to think that computing without a PC was doable. I do a lot of work "in the cloud," after all: I write posts and browse the web using Nano and Lynx via terminal window, and my social networks are all mobile-friendly. Maybe I'd be perfectly happy just using my smartphone for access--or anything that could open an SSH session, for that matter! Quick, fetch my Palm V!

Unfortunately, that was just the "denial" stage. By the second week of laptoplessness, my optimism had faded and I was climbing the walls. It's a shame, but I don't think cloud computing is there yet. As I tried to make it through a light month of general tasks, I kept running into barriers to acceptance, sorted into three categories: the screen, the keyboard, and the sandbox.

The Screen

Trying to do desktop-sized tasks in a browser immediately runs into problems on small-screen devices. It's painful to interact with desktop sites in an area that's under 13" across. It's even more annoying to lose half of that space for a virtual keyboard. When writing, the inability to fit an entire paragraph onscreen in a non-squint font makes it tougher to write coherently. I probably spend more time zooming in and out than actually working.

But more importantly, the full-screen application model is broken for doing real work. That sounds like nerd snobbery ("I demand a tiling window manager for maximum efficiency snort"), but it's really not. Consider a simple task requiring both reading and writing, like assembling a song list from a set of e-mails. On a regular operating system, I can put those two tasks side by side, referring to one in order to compile the other. But on today's smartphones, I'm forced to juggle between two fullscreen views, a process that's slow and clumsy.

There's probably no good way to make a multi-window smartphone. But existing tablets and thin clients (like Chromebooks) are also designed around a modal UI, which makes them equally impractical for tasks that involve working between two documents at the same time. I think the only company even thinking about this problem is Microsoft, with its new Windows 8 shell, but that's still deep in development--it won't begin to influence the market for at least another year, if then.

The Keyboard

I bought a Thinkpad in the first place because I'm a sucker for a good keyboard, but I'm not entirely opposed to virtual input schemes. I still remember Palm's Graffiti--I even once wrote a complete (albeit terrible) screenplay in Graffiti. On the other hand, that was in college, when I was young and stupid and spending a lot of time in fifteen-passenger vans on the way to speech tournaments (this last quality may render the previous two redundant). My patience is a lot thinner now.

Input matters. A good input method stays out of your way--as a touch-typist, I find a physical keyboard completely effortless--while a weaker input system introduces cognitive friction. And what I've noticed is that I'm less likely to produce anything substantial using an input method with high friction; I'm unlikely to even start. That's true of prose, and even more so of the technical work I do (typical programming syntax, like ><{}[];$#!=, is truly painful on a virtual keyboard).

Proponents of tablets are often defensive about the conventional wisdom that they're oriented more toward consumption than toward creation. They trumpet the range of production software available for making music and writing text on a "post-PC" device, and they tirelessly champion every instance of an artist using one to make something (no matter how many supporting devices were also involved). But let's face it: these devices are--like anything else--a collection of tradeoffs, and those tradeoffs almost invariably make it more difficult to create than to consume. Virtual keyboards make it harder to type, interaction options are limited by the size of the screen, input and output can't be easily expanded, and touch controls are imprecise at best.

Sure, I could thumb-type a novel on my phone. I could play bass on a shoebox and a set of rubber bands, too, but apart from the novelty value it's hard to see the point. I'm a pragmatist, not a masochist.

The Sandbox

I almost called this section "the ecosystem," but to be honest it's not about a scarcity of applications. It's more about process and restriction, and the way the new generation of mobile operating systems is designed.

All of these operating systems, to a greater or lesser extent, are designed to sandbox applications' data from one another, and to hide the hierarchical file system from the user. Yes, Android lets you access the SD card as an external drive. But the core applications, and most add-on applications, are written to interact with each other at a highly abstracted level. So you don't pick a file to open; you select an image from the gallery, or a song from the music player.
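
To make that concrete, here's a minimal sketch of what that abstraction looks like to an Android developer of this era: instead of opening a path, the app asks the system for "an image" and gets back an opaque content URI. The activity name and request code below are my own illustration, not code from any shipping app.

```java
import android.app.Activity;
import android.content.Intent;
import android.net.Uri;

// Sketch only: ask the OS for "some image" rather than browsing a
// file system. The Gallery (or another provider) answers the request.
public class PickerDemoActivity extends Activity {
    private static final int REQUEST_PICK_IMAGE = 1; // arbitrary request code

    private void pickImage() {
        Intent intent = new Intent(Intent.ACTION_GET_CONTENT);
        intent.setType("image/*"); // any image, from any provider
        startActivityForResult(intent, REQUEST_PICK_IMAGE);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == REQUEST_PICK_IMAGE && resultCode == RESULT_OK) {
            // An opaque content:// URI, not a path the user ever saw.
            Uri image = data.getData();
            // Read it via getContentResolver().openInputStream(image).
        }
    }
}
```

The point isn't that the API is bad--it's that the user-facing notion of a "file" has been replaced by whatever each provider decides to expose.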

As an old PalmOS user and developer, from back in the days when they had monochrome screens and ran on AAAs, this has an unpleasantly familiar taste: Palm also tried to abstract the file system away from the user, by putting everything into a flat soup of tagged databases. Once you accumulated enough stuff, or tried to use a file across multiple applications, you were forced to either A) scroll through an unusably lengthy list that might not even include what you want, or B) run normal files through an external conversion process before they could be used. The new metaphors ('gallery' or 'camera roll' instead of files) feel like a slightly hipper version of the PalmOS behavior that forced me over to Windows CE. We're just using Dropbox now instead of Hotsync, and I fail to see how that's a substantial improvement.

Look, hierarchical file systems are like democracy. Everyone agrees that they're terrible, but everything else we've tried is worse. Combined with decent search, I think they can actually be pretty good. It's possible that mobile devices can't support them in their full richness yet, but that's not an argument that they've gotten it right--it shows that they still have a lot of work to do. (The web, of course, has barely even started to address the problem of shared resources: web intents might get us partway there, one day, maybe.)

The Future

When I was in high school, Java had just come out. I remember talking with some friends about how network-enabled, platform-independent software would be revolutionary: instead of owning a computer, a person would simply log onto the network from any device with a Java VM and instantly load their documents and software--the best of thin and thick clients in one package.

Today's web apps are so close to that idea that it kind of amazes me. I am, despite my list of gripes, still optimistic that cloud computing can take over much of what I do on a daily basis, if only for environmental reasons. My caveats are primarily those of form-factor: it's technically possible for me to work in the cloud, but the tools aren't productive, given the recent craze for full-screen, touch-only UIs.

Maybe that makes me a holdover, but I think it's more likely that these things are cyclical. It's as though on each new platform, be it mobile or the browser, we're forced to re-enact the invention of features like windows, multitasking, application management, and the hard drive. Each time, we start with the thinnest of clients and then gradually move more and more complexity into the local device. By the time "the cloud" reaches a level where it's functional, will it really be that different from what I'm using now?

December 29, 2010

Filed under: tech»i_slash_o

Keyboard. How quaint.

Obligatory Scotty reference aside, voice recognition has come a long way, and it's becoming more common: just in my apartment, a Windows 7 laptop, my Android phone, and the Kinect each boast some variation on it. That's impressive, and helpful from an accessibility standpoint--not everyone can comfortably use a keyboard and mouse. Speaking personally, though, I'm finding that I use it very differently on each device. As a result, I suspect that voice control is going to end up like video calling--a marker of "futureness" that we're glad to have in theory, but rarely leverage in practice.

I tried using Windows voice recognition when I had a particularly bad case of tendonitis last year. It's surprisingly good for what it's trying to do, which is to provide a voice-control system to a traditional desktop operating system. It recognizes well, has a decent set of text-correction commands, and two helpful navigation shortcuts: Show Numbers, which overlays each clickable object with a numerical ID for fast access, and Mouse Grid, which lets you interact with arbitrary targets using a system right out of Blade Runner.

That said, I couldn't stick with it, and I haven't really activated it since. The problem was not so much the voice recognition quality, which was excellent, but rather the underlying UI. Windows is not designed to be used by voice commands (understandably). No matter how good the recognition, every time it made a mistake or asked me to repeat myself, my hands itched to grab the keyboard and mouse.

The system also (and this is very frustrating, given the extensive accessibility features built into Windows) has a hard time with applications built around non-standard GUI frameworks, like Firefox or Zune--in fact, just running Firefox seems to throw a big monkey wrench into the whole thing, which is impractical if you depend on it as much as I do. I'm happy that Windows ships with speech recognition, especially for people with limited dexterity, but I'll probably never have the patience to use it even semi-exclusively.

At the other end of the spectrum is Android, where voice recognition is much more limited--you can dictate text, or use a few specific keywords (map of, navigate to, send text, call), but there's no attempt to voice-enable the entire OS. The recognition is also done remotely, on Google's servers, so it takes a little longer to work and requires a data connection. That said, I find myself using the phone's voice commands all the time--much more than I thought I would when the feature was first announced for Android 2.2. Part of the difference, I think, is that input on a touchscreen feels nowhere near as speedy as a physical keyboard--there's a lot of cognitive overhead to it that I don't have when I'm touch-typing--and the expectations of accuracy are much lower. Voice commands also fit my smartphone usage pattern: answer a quick query, then go away.
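
For the curious, this is roughly how an app plugs into that same server-side recognizer--a hedged sketch using Android's RecognizerIntent, where the activity name, prompt string, and request code are my own illustration.

```java
import android.app.Activity;
import android.content.Intent;
import android.speech.RecognizerIntent;

import java.util.ArrayList;

// Sketch only: fire an intent, let the system (and Google's servers)
// do the recognition, then read back a list of candidate transcriptions.
public class VoiceDemoActivity extends Activity {
    private static final int REQUEST_SPEECH = 1; // arbitrary request code

    private void startDictation() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Speak now"); // illustrative prompt
        startActivityForResult(intent, REQUEST_SPEECH);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == REQUEST_SPEECH && resultCode == RESULT_OK) {
            // Candidate transcriptions, best guess first; needs a data connection.
            ArrayList<String> results =
                    data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
            String bestGuess = (results == null || results.isEmpty()) ? "" : results.get(0);
        }
    }
}
```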

Almost exactly between these two is the Kinect. It's got on-device voice recognition that no doubt is based on the Windows speech codebase, so the accuracy's usually high, and like Android it mainly uses voice to augment a limited UI scheme, so the commands tend to be more reliable. When voice is available, it's pretty great--arguably better than the gesture control system, which is prone to misfires (I can't use it to listen to music while folding laundry because, like the Heart of Gold sub-etha radio, it interprets inadvertent movements as "next track" swipes). Unfortunately, Kinect voice commands are only available in a few places (commands for Netflix, for example, are still notably absent), and a voice system that you can't use everywhere is a system that doesn't get used. No doubt future updates will address this, but right now the experience is kind of disappointing.

Despite its obvious flaws, the idea of total voice control has a certain pull. Part of it, probably, is the fact that we're creatures of communication by nature: it seems natural to use our built-in language toolkit with machines instead of having to learn abstractions like the keyboard and mouse, or even touch. There may be a touch of the Frankenstein to it as well--being able to converse with a computer would feel like A.I., even if it were a lot more limited. But the more I actually use voice recognition systems, the more I think this is a case of not knowing what we really want. Language is ambiguous by its nature, and computers are already scary and unpredictable for a lot of people. Simple commands for a direct result are helpful. Beyond that, it's a novelty, and one that quickly wears out its welcome.
