February 27, 2008

Out on the Dynamic Range

For this month's Round Table, Corvus has written about using ambient noise in a game. Basically, he raises two points: first, that the perfect silence of quiet moments in video games undermines their ability to suspend disbelief, because the real world contains no perfect silences; and second, that few games have used ambient noise as a gameplay mechanic.

As I mentioned in the comments, this is not really restricted to gaming--it's a variant of arguments over dynamic range that audio producers have been having for more than 20 years. Ever since people realized that heavy compression (of volume, not data) made songs on FM radio sound "louder" (because our ears judge loudness by a signal's average level, not its peaks), people have been complaining that dynamic range has been abandoned--a debate that became even more bitter with the advent of digital media, which has a hard ceiling (0 dBFS) that no sample can exceed.
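To put rough numbers on the "louder" trick, here is a minimal sketch in Python. It is a toy, not how a real broadcast compressor works: the gain curve is applied sample by sample with no attack or release envelope, and the threshold, ratio, and test tone are all values I made up for illustration. The point is only that two signals can share the same peak while one has a much higher average level and a much smaller gap between its quiet and loud passages.

```python
import math

def rms(samples):
    """Average (root-mean-square) level, a rough proxy for perceived loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def peak(samples):
    return max(abs(s) for s in samples)

def compress(samples, threshold=0.25, ratio=4.0):
    """Toy static compressor: level above the threshold grows at 1/ratio the rate."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(math.copysign(mag, s))
    return out

def normalize(samples, target_peak=1.0):
    """Scale so the loudest sample sits at the ceiling (make-up gain)."""
    gain = target_peak / peak(samples)
    return [s * gain for s in samples]

def gap_db(samples):
    """Level difference in dB between the second half and the first half."""
    half = len(samples) // 2
    return 20 * math.log10(rms(samples[half:]) / rms(samples[:half]))

# Hypothetical test signal: one second of quiet tone, then one second of loud tone.
rate = 8000
quiet = [0.1 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]
loud = [1.0 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]

original = normalize(quiet + loud)
squashed = normalize(compress(original))

print("peak:", peak(original), "vs", peak(squashed))
print("average level:", rms(original), "vs", rms(squashed))
print("quiet-to-loud gap (dB):", gap_db(original), "vs", gap_db(squashed))
```

Both versions hit the same ceiling, but the squashed one carries more average level and a narrower quiet-to-loud gap, which is exactly the trade the dynamic range argument is about.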

Indeed, the decision to go softer instead of louder is something that any form of media can use--and no doubt has used--to accentuate a critical moment for the audience. You can even take it to extremes--on both With Teeth and The Downward Spiral, Trent Reznor includes sections in several songs that drop to barely a fraction of their previous volume, which makes for quite a shock when the music returns to full blast. And The Wire is notable for its diegetic sound, meaning that any music or sound effects have to come from the environment around the characters. The lack of "sonic cues" means that the audience isn't constantly being told how to feel about the onscreen events, which fits in with the show's "all grey areas" mentality.

So while I agree whole-heartedly with Corvus' second point, that silence or ambient sounds could be used far better, it's hard for me to agree with his first: that ambient sounds should be explicitly modeled, or even used for crucial gameplay cues.

Many of the reasons are technical. It may be impractical to model environmental white noise for entertainment, for example, just due to the uncertain hardware on the player's side. There's already a lot of noise possible there--do you really want to add more? On many built-in soundcards, which is what most people will use instead of having an add-in card, the internal processing often takes place at 48 kHz instead of the more widely-used 44.1 kHz, so all sounds get resampled up and down during playback--not a high-fidelity process. Most soundcard amplifiers, which raise the signal to full line-level for output, are noisy and terrible. The amps in speakers or most low-end sound systems are hardly better. And who knows how someone will listen to your audio? I once had a roommate who ran his laptop into a tape adapter plugged into a cheap Sony boombox. Few living rooms are great acoustic environments, and few consumer headphones are a decent replacement. And of course, all of this applies to consoles, which are just as likely nowadays to use the same kinds of components as a PC.
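For a sense of why the 44.1-to-48 round trip is lossy, here is a rough Python sketch. One big assumption: it uses plain linear interpolation, which is a deliberately naive stand-in and not a claim about what any particular soundcard does (real resamplers use much better filters). The function names and the test tones are mine. The awkward ratio between the two rates (160/147) means nearly every output sample falls between two input samples and has to be estimated.

```python
import math
from fractions import Fraction

def resample_linear(samples, src_rate, dst_rate):
    """Naive resampler: estimate each output sample by linear interpolation."""
    step = Fraction(src_rate, dst_rate)  # source samples advanced per output sample
    out = []
    pos = Fraction(0)
    while pos <= len(samples) - 1:
        i = int(pos)                     # index of the source sample just before pos
        frac = float(pos - i)            # how far pos sits between sample i and i + 1
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
        pos += step
    return out

def tone(freq, rate, count):
    return [math.sin(2 * math.pi * freq * n / rate) for n in range(count)]

def round_trip_error(freq):
    """Worst-case sample error after going 44.1 kHz -> 48 kHz -> 44.1 kHz."""
    src = tone(freq, 44100, 4410)        # a tenth of a second
    up = resample_linear(src, 44100, 48000)
    back = resample_linear(up, 48000, 44100)
    return max(abs(a - b) for a, b in zip(src, back))

print("rate ratio:", Fraction(48000, 44100))
print("round-trip error at 1 kHz: ", round_trip_error(1000))
print("round-trip error at 10 kHz:", round_trip_error(10000))
```

With this crude method the damage grows with frequency, so it is the high end that suffers first. Better filters shrink the error a great deal, but the conversion is still extra work done to every sound on its way out.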

Now it's true that the magical "HD era" is upon us, and more people are using digital sound output and other solutions for moving audio from one place to another. But it's also true that not everyone--not nearly--has a nice HD set. And even when they do, who's to say that sound receives the same attention as video? I have an LCD TV that will run up to 1080i, but I still listen to my DVDs through its built-in speakers--in stereo, no less--and while they're certainly adequate they're also nothing special. It's a sad truth that audio doesn't get nearly the attention that video does, from either the consumer or the manufacturer.

At the other end of the spectrum, there's simultaneously a movement taking place that can't leverage high-def audio--namely, downloads. It's a funny thing--going 3D meant that art assets actually got smaller for a little while, and with procedural tricks you can do quite a lot visually with a little. But you can't make something out of nothing in audio, unless you're going fully synthesized--which is an intriguing idea, but unlikely to occur for a very large number of reasons (latency, processor-drain, and lack of specialized hardware, to name a few). This leaves anyone who is aiming for a download market to consider their audio options carefully, and more than likely means choosing from compressed audio.

I won't go into the whole process of psychoacoustic compression (you could read my AudioFile article for that), but here's the basic idea: MP3 and almost every other lossy audio compression format actually work by figuring out exactly how much noise they can add before you'll notice it. It's not impossible to compress a file containing nothing but ambient sound, but you kind of have to wonder why you'd bother. You'd be better off trying to generate it, or trying to produce it from the environment (Creative's ill-fated EAX was a first step towards doing so with modeled reverb), either of which would probably be extremely hard to do more convincingly than just relying on the trope of silence. After all, the brain actually filters out ambient noise most of the time--if you make it noticeable enough to overpower the existing noise floor of hardware and compression and the listening environment, it's probably going to sound pretty terrible--like a load of static hiss over all your samples.

It is funny, in a way, that we're having this discussion about actually adding noise, because most people don't realize how important noise is to even the "cleanest" of sounds. How does analog-digital conversion overcome rounding error in its samples? How does MP3 shrink file size? How do delta-sigma DACs turn those numbers back into sound? The answer to all of these questions is the manipulation of noise. Without noise, your digital audio would actually sound more distorted.
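The analog-digital case is the easiest to show in a few lines. What follows is a sketch of dither under my own assumptions: the converter is modeled as simple rounding to whole steps (1.0 = one least significant bit), the test tone is a made-up 440 Hz sine that peaks at 0.4 of a step, and the added noise is triangular (the sum of two uniform random values), which is a common choice but not the only one. Without noise, the tone rounds to pure silence. With noise, it survives, buried in hiss but intact on average.

```python
import math
import random

def quantize(samples, dither=False, rng=None):
    """Round each sample to the nearest whole step (1.0 = one LSB)."""
    rng = rng or random.Random(0)
    out = []
    for s in samples:
        noise = 0.0
        if dither:
            # Triangular (TPDF) dither: two uniform values summed, spanning +/- 1 LSB.
            noise = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        out.append(round(s + noise))
    return out

def signal_gain(original, quantized):
    """Projection of the output onto the input: 1.0 means the tone came through whole."""
    num = sum(o * q for o, q in zip(original, quantized))
    den = sum(o * o for o in original)
    return num / den

# Hypothetical test tone: one second of 440 Hz, peaking below half a step.
rate = 48000
tone = [0.4 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]

plain = quantize(tone)
dithered = quantize(tone, dither=True, rng=random.Random(1))

print("without dither, non-zero samples:", sum(1 for q in plain if q != 0))
print("without dither, surviving signal:", signal_gain(tone, plain))
print("with dither, surviving signal:   ", signal_gain(tone, dithered))
```

The rounding error without dither depends entirely on the signal, which is what makes it distortion. Adding noise first turns that error into steady hiss, and hiss is something the ear tolerates far better.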

While we could argue whether I'm correct about all this from a purely technical standpoint, and heaven knows I've been wrong plenty of times before, I think there are also practical reasons to not rely on soft audio cues for storytelling--urban living and accessibility. For the former, remember that a lot of people live in apartments or townhouses, and can't crank up the sound. These days, plenty of gamers have kids, or irritable spouses, or other assorted wildlife which (in the name of continued coexistence) means that they can't listen as carefully as they might like. Likewise, while deaf gamers are not a cohort that's going to drive a lot of sales, they do exist, as do audience members with differing degrees of hearing damage. What are you going to do, subtitle the ambient noises?

This is not to say that it's not a good idea to incorporate more realistic dynamic range into entertainments of all kinds. Subtlety and realism are always welcome, and if we really are going HD, then by all means let's do it right. Likewise, as someone who often feels that video production is overstressed compared to the audible element in almost all forms of media, I really do love having these kinds of questions raised. I'm pointing out reasons why they might not work, but it's also entirely possible that they might. I hope someone will prove me wrong.

I'll leave that point with an anecdote: a while back I wrote another one of these posts about how I'd always wanted to play a shooter that worked just with audio--forcing players to orient themselves using the stereo field. Soon after, I think someone referred me to AudioQuake, which tries to do exactly that. There's just one problem with AudioQuake: it doesn't work very well. Turns out that stereo isn't really enough information to place objects in space (at least for me, and I'd guess most other people), and the cues it uses to represent level geometry aren't exactly user-friendly. I mean, I know the first level of Quake pretty well, but I couldn't find my way around with my eyes closed at all.

Does that mean no one should have made AudioQuake, that it was a waste of time? Not at all! It sounded like a good idea at the time. And it might still be. I can think of a number of additions that might make it feasible--comb filters and delays to mimic the actual response of sounds travelling past a person's head, for example--but we won't know until someone (not me) actually tries it.
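As a taste of what the delay half of that might look like, here is a small Python sketch. Everything in it is an assumption on my part rather than anything AudioQuake does: it treats the head as a sphere, uses Woodworth's classic approximation for the time difference between the ears, picks 8.75 cm as a typical head radius, and only handles sources in the front half-plane (-90 to +90 degrees). It also ignores the comb filtering and level differences entirely.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed average head radius
SPEED_OF_SOUND = 343.0    # metres per second, air at roughly room temperature

def itd_seconds(azimuth_deg):
    """Woodworth's spherical-head estimate of the interaural time difference.

    Azimuth 0 is straight ahead, +90 is hard right, -90 is hard left.
    """
    theta = math.radians(abs(azimuth_deg))
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

def place(mono, azimuth_deg, rate=44100):
    """Return (left, right) channels with the far ear delayed by the ITD."""
    delay = round(itd_seconds(azimuth_deg) * rate)
    delayed = [0.0] * delay + list(mono)
    direct = list(mono) + [0.0] * delay   # pad so both channels match in length
    if azimuth_deg >= 0:                  # source on the right: left ear hears it later
        return delayed, direct
    return direct, delayed

# Hypothetical usage: a single click placed hard right.
left, right = place([1.0, 0.0, 0.0], 90)
print("ITD at 90 degrees (ms):", itd_seconds(90) * 1000)
print("left ear lags by", left.index(1.0) - right.index(1.0), "samples")
```

The delays involved are well under a millisecond, which is why plain left-right volume panning doesn't capture them, and why an engine would need to do this work per sound source, per frame of movement.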

