The patent was rejected. The earliest digital phone lines were primitive affairs, and the enormous amount of audio data on the compact disc could never fit down such a narrow pipe. For Seitzer’s scheme to work, the files on the disc would have to be shrunk to one-twelfth their original size, and no known approach to data compression would get you anywhere near this level. Seitzer battled with the patent examiner for a few years, citing the importance of Zwicker’s findings, but without a working implementation it was hopeless. Eventually, he withdrew his application.
Still, the idea stayed with him. If the limitations of the human ear had been mapped by Zwicker, then the remaining task was to quantify these limitations with math. Seitzer himself had never been able to solve this problem, nor had any of the many other researchers who had tried. But he directed his own protégé toward the problem with enthusiasm: the young electrical engineering student named Karlheinz Brandenburg was one of the smartest people he’d ever met.
Privately, Brandenburg wondered if a decade of table tennis with an eccentric otological experimenter had driven Seitzer insane. Information in the digital age was stored in binary units of zero or one, termed “bits,” and the goal of compression was to use as few of these bits as possible. CD audio used more than 1.4 million bits to store a single second of stereo sound. Seitzer wanted to do it with 128,000.
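The scale of the gap can be checked with simple arithmetic. A compact disc stores sixteen-bit samples, 44,100 of them per second, in two stereo channels; the sketch below works out what that comes to against Seitzer's proposed budget:

```python
# CD audio: 44,100 samples per second, 16 bits per sample, 2 stereo channels
cd_bits_per_second = 44_100 * 16 * 2
target_bits_per_second = 128_000   # Seitzer's proposed budget

ratio = cd_bits_per_second / target_bits_per_second
print(cd_bits_per_second)   # 1411200 -- "more than 1.4 million bits"
print(ratio)                # 11.025 -- roughly one-twelfth the original size
```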
Brandenburg thought this goal was preposterous—it was like trying to build a car on a budget of two hundred dollars. But he also thought it was a worthy target for his own ambitions. He worked on the problem for the next three years, until in early 1986 he spotted an avenue of inquiry that had never been explored. Dubbing this insight “analysis by synthesis,” he spent the next few sleepless weeks writing a set of mathematical instructions for how those precious bits could be assigned.
He began by chopping the audio up. With a “sampler,” he divided the incoming sound into fractional slivers of a second. With a “filter bank,” he then further sorted the audio into different frequency partitions. (The filter bank worked on sound the way a prism worked on light.) The result was a grid of time and frequency, consisting of microscopic snippets of sound, sorted into narrow bands of pitch—the audio version of pixels.
Brandenburg then told the computer how to simplify these audio “pixels” using four of Zwicker’s psychoacoustic tricks:
First, Zwicker had shown that human hearing was best at a certain range of pitch frequencies, roughly corresponding to the tonal range of the human voice. At registers beyond that, hearing degraded, particularly as you went higher on the scale. That meant you could assign fewer bits to the extreme ends of the spectrum.
Second, Zwicker had shown that tones that were close in pitch tended to cancel each other out. In particular, lower tones overrode higher ones, so if you were digitizing music with overlapping instrumentation—say a violin and a cello at the same time—you could assign fewer bits to the violin.
Third, Zwicker had shown that the auditory system canceled out noise following a loud click. So if you were digitizing music with, say, a cymbal crash every few measures, you could assign fewer bits to the first few milliseconds following the beat.
Fourth—and this is where it gets weird—Zwicker had shown that the auditory system also canceled out noise prior to a loud click. This was because it took a few milliseconds for the ear to actually process what it was sensing, and this processing could be disrupted by a sudden onrush of louder noise. So, going back to the cymbal crash, you could also assign fewer bits to the first few milliseconds before the beat.
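Taken together, the four rules amount to a set of deductions against a per-pixel bit budget. The sketch below is a toy allocator in that spirit; the band labels, thresholds, and bit counts are invented for illustration and bear no relation to Zwicker's actual measurements:

```python
def allocate_bits(band, is_masked_by_lower_tone, ms_from_click, base_bits=16):
    """Toy psychoacoustic bit allocator. 'band' is a coarse pitch region;
    every number here is an illustrative placeholder, not real masking data."""
    bits = base_bits
    # Rule 1: hearing degrades at the extremes of the pitch spectrum,
    # especially the high end, so spend fewer bits there.
    if band == "low":
        bits -= 2
    elif band == "high":
        bits -= 6
    # Rule 2: a louder, lower tone (the cello) masks a nearby higher
    # one (the violin).
    if is_masked_by_lower_tone:
        bits -= 4
    # Rules 3 and 4: the ear cancels out noise in the milliseconds just
    # after -- and, strangely, just before -- a loud click like a cymbal crash.
    if ms_from_click is not None and -5 <= ms_from_click <= 20:
        bits -= 4
    return max(bits, 2)   # always keep a floor of a few bits

print(allocate_bits("voice", False, None))   # 16: nothing to exploit
print(allocate_bits("high", True, 3))        # 2: heavily masked
```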
Relying on decades of empirical auditory research, Brandenburg told the bits where to go. But this was just the first step. Brandenburg’s real achievement was figuring out that you could run this process iteratively. In other words, you could take the output of his bit-assignment algorithm, feed it back into the algorithm, and run it again. And you could do this as many times as you wished, each time reducing the number of bits you were spending, making the audio file as small as you liked. There was degradation, of course: like a copy of a copy, or a fourth-generation cassette dub, the audio got worse with each successive pass of the algorithm. In fact, if you ran the process a million times, you’d end up with nothing more than a single bit. But if you struck the right balance, it would be possible to both compress the audio and preserve fidelity, using only those bits you knew the human ear could actually hear.
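The iterative trade-off can be sketched numerically. Assuming, purely for illustration, that each pass keeps about half the bits of the one before, the size of one second of CD audio collapses pass by pass:

```python
def run_passes(total_bits, passes, keep=0.5):
    """Feed the output of one bit-assignment pass back in as the input to
    the next. The 'keep' ratio is an illustrative assumption, not a
    property of Brandenburg's algorithm."""
    sizes = [total_bits]
    for _ in range(passes):
        total_bits = max(int(total_bits * keep), 1)
        sizes.append(total_bits)
    return sizes

# One second of CD audio, shrunk pass by pass
print(run_passes(1_411_200, 4))       # [1411200, 705600, 352800, 176400, 88200]
# Run it long enough and nothing is left but a single bit
print(run_passes(1_411_200, 50)[-1])  # 1
```

Striking the right balance meant stopping somewhere along this curve: small enough to fit the pipe, early enough that the ear could not tell the difference.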
Of course, not all musical work employed such complex instrumentation. A violin concerto might have all sorts of psychoacoustic redundancies; a violin solo would not. Without cymbal crashes, or an overlapping cello, or high register information to be simplified, there was just a pure tone and nowhere to hide. What Brandenburg could do here, though, was dump the output bits from his compression method into a second, completely different one.
Termed “Huffman coding,” this approach had been developed by the pioneering computer scientist David Huffman at MIT in the 1950s. Working at the dawn of the Information Age, Huffman had observed that if you wanted to save on bits, you had to look for patterns, because patterns, by definition, repeated. Which meant that rather than assigning bits to the pattern every time it occurred, you just had to do it once, then refer back to those bits as needed. And from the perspective of information theory, that was all a violin solo was: a vibrating string, cutting predictable, repetitive patterns of sound in the air.
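Huffman's insight survives in every programming curriculum, and a minimal version fits in a few lines. The sketch below builds a Huffman code over a toy "violin solo" in which one note dominates; the frequent pattern gets the shortest bit string, exactly as Huffman observed:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code: the more often a symbol repeats, the fewer
    bits it is assigned. Returns a dict mapping each symbol to its code."""
    heap = [[count, i, {sym: ""}] for i, (sym, count)
            in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)          # two rarest subtrees...
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], next_id, merged])   # ...merged
        next_id += 1
    return heap[0][2]

# A repetitive, predictable signal: mostly one note
notes = "AAAAAAABBBC"
code = huffman_code(notes)
assert len(code["A"]) < len(code["C"])   # the common note costs fewer bits
```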
The two methods complemented each other perfectly: Brandenburg’s algorithm for complicated, overlapping noise; Huffman’s for pure, simple tones. The combined result united decades of research into acoustic physics and human anatomy with basic principles of information theory and complex higher math. By the middle of 1986, Brandenburg had even written a rudimentary computer program that provided a working demonstration of this approach. It was the signature achievement of his career: a proven method for capturing audio data that could stick to even the stingiest budget for bits. He was thirty-one years old.
He received his first patent before he’d even defended his thesis. For a graduate student, Brandenburg was unusually interested in the dynamic potential of the marketplace. With a mind like his, a tenure-track position was guaranteed, but academia held little interest for him. As a child he’d read biographies of the great inventors, and at an early age had internalized the importance of the hands-on approach. Brandenburg—like Bell, like Edison—was an inventor first.
These ambitions were encouraged. After escaping from Zwicker, Dieter Seitzer had spent most of his own career at IBM, accruing basic patents and developing keen commercial instincts. He directed his graduate students to do likewise. When he saw the progress that Brandenburg was making in psychoacoustic research, he pushed him away from the university and toward the nearby Fraunhofer Institute for Integrated Circuits, the newly founded Bavarian technology incubator that Seitzer oversaw.
The institute was a division of the Fraunhofer Society, a massive state-run research organization with dozens of campuses across the country—Germany’s answer to Bell Labs. Fraunhofer allocated taxpayer money toward promising research across a wide variety of academic disciplines, and, as the research matured, brokered commercial relationships with large consumer industrial firms. For a stake in the future revenues of Brandenburg’s ideas, Fraunhofer offered state-of-the-art supercomputers, high-end acoustic equipment, professional intellectual property expertise, and skilled engineering manpower.