this space intentionally left blank

April 8, 2010

Filed under: music»recording»mp3

Bitrot

Digital files don't wear out, right? This is one of the big advantages of the medium, particularly in studio situations: people love the warmth of tape, but it's fragile and it loses a tiny bit of fidelity every time you play it, let alone when you make a copy. If you read a lot of studio how-to articles (a guilty pleasure of mine), a common theme is the engineer who records on tape for the sound, then immediately dumps it into Pro Tools for actual editing and mixing. And of course you can make a perfect copy of a digital file, whereas there's no such thing in analog.

With one exception: back when DRM'd music sales were the norm, the typical way to remove that DRM was to burn the file to a CD and re-rip to MP3 format. This was seen as kind of a kludge, because the process involves conversion to a lossless .WAV format and then back into lossy psychoacoustic compression. In theory, every time this happens, the latter step means a loss of information, and thus fidelity.

But how much of a loss? I started wondering this when I went to make a CD for a fellow dance student from some MP3 files I'd gotten from More Than A Stance. I didn't know how he planned to play them or how tech-savvy he was, so audio CDs seemed like a better choice than audio files on a data CD. But if he decided to rip the CDs back, how bad would the quality hit be? I decided to find out.

Using some shell scripting (first PowerShell, then old-fashioned batch files--never use a computer without at least one scripting option, kids), I sent a couple of MP3 files through a conversion roundtrip a few hundred times. My choices were "Beam Katana Chronicles" from the No More Heroes soundtrack and a remix of the Jackson Five's "Life of the Party" from DJ D.L.'s Soul Movement II, picking these particular tracks for a few reasons:

  • Both tracks are relatively close to the real-world case I was trying to figure out, with the latter being an actual dance track.
  • Both were layered compositions, with plenty of detail to lose during conversion.
  • Both included strong percussion tracks with plenty of hi-hat and snare--the kinds of high-frequency transient noises that easily smear and blur under psychoacoustic compression.

I used LAME to do the decoding and encoding at a 256kbps bitrate. On the first test, I actually ran the file out to a separate .wav and back. The second time, I figured out how to pipe the stdout from one LAME instance to the stdin of a second, and just bounced it between two MP3 files, which was much faster.
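
For anyone who wants to repeat the experiment, here is a minimal sketch of that piped roundtrip in Python rather than the PowerShell and batch scripts I actually used. It assumes a `lame` binary on your PATH that accepts `-` for stdin and stdout; the function names (`decode_cmd`, `encode_cmd`, `roundtrip_once`, `bounce`) and the output file names are made up for illustration, so treat it as a starting point rather than the exact procedure.

```python
import shutil
import subprocess
from pathlib import Path

BITRATE_KBPS = 256  # the bitrate used in this post


def decode_cmd(src):
    """LAME command that decodes an MP3 and writes WAV data to stdout."""
    return ["lame", "--quiet", "--decode", str(src), "-"]


def encode_cmd(dst, bitrate=BITRATE_KBPS):
    """LAME command that reads WAV data from stdin and encodes it to dst."""
    return ["lame", "--quiet", "-b", str(bitrate), "-", str(dst)]


def roundtrip_once(src, dst, bitrate=BITRATE_KBPS):
    """One generation: decode src and pipe it straight into a second encoder."""
    decoder = subprocess.Popen(decode_cmd(src), stdout=subprocess.PIPE)
    encoder = subprocess.run(encode_cmd(dst, bitrate), stdin=decoder.stdout)
    decoder.stdout.close()
    if decoder.wait() != 0 or encoder.returncode != 0:
        raise RuntimeError(f"LAME failed while converting {src} to {dst}")


def bounce(original, iterations, workdir, keep=(50, 100, 500),
           bitrate=BITRATE_KBPS, step=roundtrip_once):
    """Bounce a track between two scratch MP3s, saving selected generations.

    `step` is injectable so the loop can be exercised without LAME installed.
    Returns the path of the scratch file holding the final generation.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    src = workdir / "bounce_a.mp3"  # hypothetical scratch file names
    dst = workdir / "bounce_b.mp3"
    shutil.copyfile(original, src)  # never overwrite the original file
    for generation in range(1, iterations + 1):
        step(src, dst, bitrate)
        if generation in keep:
            shutil.copyfile(dst, workdir / f"gen_{generation:03d}.mp3")
        src, dst = dst, src  # the fresh output becomes the next input
    return src
```

Calling something like `bounce("track.mp3", 500, "scratch")` (with your own file in place of the hypothetical `track.mp3`) should leave `gen_050.mp3`, `gen_100.mp3`, and `gen_500.mp3` in the scratch folder for side-by-side listening.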

The results were surprising. Here's a table with some samples (caution: may be loud), which I'll summarize below.

iterations | track          | audio
original   | No More Heroes | [audio sample]
original   | DJ D.L.        | [audio sample]
50         | No More Heroes | [audio sample]
50         | DJ D.L.        | [audio sample]
100        | No More Heroes | [audio sample]
100        | DJ D.L.        | [audio sample]
500        | No More Heroes | [audio sample]
500        | DJ D.L.        | [audio sample]

At under 10 iterations, I can't tell a difference between the two files. At 30-50, it's subtle--there's a little bit of swirliness around the high end, and the transients are a little blurry, but nothing more than you'd expect from, say, a turntable. It's not until you hit 100 iterations--that's 100 times going from an MP3 file to a WAV and back--that it starts to become noticeable. At that point, there's some definite artifacting, and you can start to hear a little bit of pumping in the volume after each peak. Even still, it's not much beyond the extremes of dynamic compression that have emerged from the loudness wars, and if you snuck it into my playlist I wouldn't guarantee that I'd pick it out. Once you get beyond 100, it becomes more obvious that something's broken. By 500, there's some real glitchiness going on when the track hits full volume--surprisingly, much more in the NMH track than the J5, although the latter also has its "underwater washing machine" moments.

There are a few holes in my experiment that would be interesting to test:

  • I used a symmetrical encoding and decoding process, with the same codec feeding into itself. It would be interesting to see how a mix of two or more encoders would change these results. It's likely that this would accelerate the decay rate, but would it be enough to overcome the sizeable margin in this test?
  • Likewise, this was a test of high-bitrate encoding--simply because that's the scenario most people would realistically encounter. I'm guessing the minimum bitrate for most people is 192kbps, and anything you buy these days is usually higher. But yes, at lower bitrates I'm guessing this is dramatically more detrimental.
  • Finally, this is a test of MP3. I like MP3, and I think the folks behind LAME have done about as good a job with it as they could, but it is a last-generation compression format. It'd be interesting to see how OGG, AAC, or WMA could stack up against it.

Still, I have to admit this is far better performance than I expected going in, and I was cheering for LAME to begin with. I think we can safely reach the conclusion that for limited, real-world cases of digital dubbing, there's no serious impact on sound quality that wasn't already lost in the first MP3 encoding. Burn and rip away!
