This week, Neil Young finally made the dreams of heavy-walleted audiophiles a reality by releasing the PonoPlayer, a digital audio player that's specifically made for lossless files recorded at 192KHz. Basically, it plays master recordings, as opposed to the downsampled audio that ends up on CDs (or, god forbid, those horrible MP3s that all the kids are listening to these days). It's been a while since I've written about audio or science, so let's talk about why the whole idea behind Pono — or indeed, most audiophile nattering about sample rate — is hogwash.
To understand sample rates, we need to back up and talk about one of the fundemental theories of digital audio: the Nyquist limit, which says that in order to accurately record and reproduce a signal, you need to sample at twice the frequency of that signal. Above the limit, the sampler doesn't record often enough to preserve the variation of the wave, and the input "wraps around" the limit. The technical term for this is "aliasing," because the sampled wave becomes indistinguishable from a lower-frequency waveform. Obviously, this doesn't sound great: at a 10KHz sample rate, an 9KHz audio signal would wrap around and play in the recording as 1KHz — a transition in scale roughly the same as going from one end of the piano to another.
To solve this problem, when digital audio came of age with CDs, engineers did two things. First, they put a filter in front of the sample input that filters out anything above the Nyquist limit, which keeps extremely high-frequency sounds from showing up in the recording as low-frequency noises. Secondly, they selected a sample rate for playback that would be twice the frequency range of normal human hearing, ensuring that the resulting audio would accurately represent anything people could actually hear. That's why CDs use 44.1KHz sampling: it gives you signal accuracy at up to 22.05KHz, which is frankly generous (most human hearing actually drops off sharply at around 14KHz). There's not very much point in playback above 44.1KHz, because you couldn't hear it anyway.
There's a lot of misunderstanding of how this works among people who consider themselves to be audiophiles (or musicians). They look at something like the Nyquist limit and what they see is information that's lost: filtered out before sampling, then filtered again when it gets downsampled from the high-resolution Pro Tools session (which may need the extra sample data for filtering and time-stretching). But truthfully, this is a glass-half-full situation. Sure, the Nyquist limit says we can't accurately record above 1/2 the sample rate — but on the other hand, below that limit accuracy is guaranteed. Everything that people can actually hear is reproduced in CD-quality audio.
This isn't to say that the $400 you'll pay for a PonoPlayer is a total scam. Although the digital-analog converter (DAC) inside it probably isn't that much better than the typical phone headphone jack, there are lots of places where you can improve digital audio playback with that kind of budget. You can add a cleaner amplifier, for example, so that there's less noise in the signal. But for most people, will it actually sound better? Not particularly. I think it's telling that one of their testimonials compares it to a high-end turntable — vinyl records having a notoriously high noise floor and crappy dynamic range, which is the polar opposite of what Pono's trying to do. You'd probably be better off spending the money on a really nice set of headphones, which will make a real difference in audio quality for most people.
I think the really interesting question raised by Pono is not the technical gibberish on their specifications page (audiophile homeopathy at its best), but rather to ask why: why is this the solution? Neil Young is a rich, influential figure, and he's decided that the industry problem he wants to solve is MP3 bitrates and CD sampling, but why?
I find Young's quest for clarity and precision fascinating, in part, because the rock tradition he's known for has always been heavily mediated and filtered, albeit in a way that we could generously call "engineered" (and cynically call "dishonest"). A rock recording is literally unnatural. Microphones are chosen very specifically for the flavor that they bring to a given instrument. Fake reverb is added to particular parts of the track and not to others, in a way that's not at all like live music. Don't even get me started on distortion, or the tonal characteristics of recording on magnetic tape.
The resulting characteristics that we think of as a "rock sound" are profoundly artificial. So I think it's interesting — not wrong, necessarily, but interesting — that someone would spend so much time on recreating the "original form" (their words) of music that doesn't sound anything like its live performance. And I do question whether it matters musically: one of my favorite albums of all time, the Black Keys' Rubber Factory, is a cheaply-produced and badly-mastered recording of performances in an abandoned building. Arguably Rubber Factory might sound better as MP3 than it does as the master, but the power it has musically has nothing to do with its sample rate.
(I'd still rather listen to it than Neil Young, too, but that's a separate issue.)
At the same time, I'm not surprised that a rock musician pitched and sold Pono, because it seems very much of that genre — trying to get closer to analog sound, because it came from an age of tape. These days, I wonder what would be the equivalent "quality" measurement for music that is deeply rooted in digital (and lo-fi digital, at that). What would be the point of Squarepusher at 192KHz? How could you remaster the Bomb Squad, when so much of their sound is in the sampled source material? And who would care, frankly, about high-fidelity chiptunes?
It's kind of fun to speculate if we'll see something like Pono in 20 years aimed at a
generation that grew up on digital compression: maybe a
hyperaudio player that can connect songs via the original tunes they both sample, and
annotate lyrics for you a la Rap Genius? 3D audio, that shifts based on position?
Time-stretching and resampling to match your surroundings? I don't know, personally. And I
probably won't buy it then, either. But I like to think that those solutions will be at
least more interesting than just increasing some numbers and calling it a revolution.