My last day at NPR was September 3, and I started at Chalkbeat on September 13. In the nine days in between, I tried to detox: I stayed away from the news, played a lot of Castlevania, and — in an effort to not feel completely useless — worked on a project I've been meaning to tackle for a while: I wrote a browser-based FM synth modeled on the classic Yamaha DX-7.
The DX-7 is the classic Lament Configuration of digital sound design. It's not only based on a model of synthesis that's unintuitive, but Yamaha wrapped it in a pushbutton user interface that discourages experimentation. Its sound defined an era almost entirely through the presets: the piano arpeggios from Twin Peaks, countless Whitney Houston ballads, and the bass line from Take on Me. I never owned a genuine DX-7, but I had one of Yamaha's budget models, and learning to build sounds on it was a long-standing white whale of mine.
Most synthesizers are what we call additive and subtractive. You generate a waveform, either by combining different wave shapes (sine, rectangle, sawtooth, or triangle) or using a noise generator, and then patch that through a series of filters and effects, and out the other end either emerges a transcendant reinterpretation of Bach (if you're Wendy Carlos) or a kind of deranged squawking (if you're me). This kind of synthesis isn't easy, per se, but it makes sense to someone who has used, say, a guitar pedalboard.
The DX-7 works differently, using something called frequency modulation (FM) synthesis. Essentially, it uses up to six sine wave oscillator units (called "operators"), but most of them aren't audible at any given time. Instead, the secondary units (called "modulators") are used to tweak the frequency of the audible operators ("carriers"). When the wave output of the modulator goes up, so does the frequency of its carrier. When the wave goes down, the freqency dips. Since these changes in frequency happen many times a second, and are often scaled to the input pitch from the keyboard, the result are complicated harmonic patterns, often described as metallic, bell-like, percussive.
WebAudio is a kind of beautiful monstrosity. It's easy to imagine an API designer deciding that browser audio should be mostly WebGL-style primitives, basically just handing you an audio buffer array and leaving the sound generation up to you. Or the pendulum could have swung the other way, toward extreme user-friendliness, with just a slightly more performant version of the <audio> tag letting you load and trigger preset clips.
Instead, the final API ends up looking more akin to a classic Moog patchbay or a studio effects rack, letting you wire various modules together into a complex signal chain. Those nodes start out as simple oscillators and gain amplifiers, but from there it gets pretty batteries-included: nodes for impulse convolution, multiple shaped filters, and compression, plus a custom "script worker" node as an escape hatch.
Crucially for my purposes, WebAudio signal nodes can be wired to more than just audio inputs and outputs. You can also hook nodes into the control parameters, so that the output from one changes the volume or strength of another. In our case, we can use the audio signal from our modulators and pipe it into the frequency value of our carriers. It's not quite the same as the classic DX-7 formula, but it performs very well, and the sound actually isn't that far off. You can hear the classic EPIANO1 preset adapted to my code on the GitHub demo page for the project.
However, while this implementation felt more intuitive than juggling Math.sin(), WebAudio also has some quirks that made it tricky. For example, oscillators are single-shot: they can only be started and stopped once. The API is full of this kind of design, where you're supposed to create nodes, connect them to the graph, and then throw them away. But when you have modulator oscillators feeding into carrier oscillators in a complicated web of amplifiers and filters, disposable audio sources don't really fit the design.
In the end, I had to wrap the whole thing in a disposable Voice class that encapsulates an arrangement of operators for a single note. When the synth is asked to play a sound, it creates a Voice containing a fresh set of operators, hooks that up to the audio context, and sends it on its merry way. This effectively makes our synthesizer polyphonic by default, since each individual frequency gets its own voice on demand. It feels wasteful, but it works.
Working on a project like this makes me think a lot about how it is that I build projects, and how to teach others to do the same. We often tell junior developers that they should learn by creating something fairly complex, but we don't really tell them how to do it. I suspect this is because it becomes fairly instinctual over time, so it's hard to explain.
Part of what we don't tell junior developers is that big projects are built out of little projects, one level of abstraction at a time. For example, for the Hello Operator repo, the process of getting a (mostly) working synthesizer looks like:
I suspect that when we say "build bigger projects," what people hear is that their application needs to spring fully-formed from their head like Athena, but literally nothing I've ever built has been scoped that way. It's always been a gradual accretion of functionality. Caret, for example, started out as just a text box and a keyboard input, and everything else, from tabs to project management, grew from there.
Did I accomplish my goal of learning to program a DX-7? No. But ultimately, for these kinds of projects, that's not really the point. I learned a lot about sound, how the browser processes it, and how to handle new kinds of input. One day, I might even finish it. Brian Eno, eat your heart out.