The last was critical. Brandenburg’s method was complex, and required several computationally demanding mathematical operations to be conducted simultaneously. 1980s computing technology was barely up to the task, and algorithmic efficiency was key. Brandenburg needed a virtuoso, a caffeine-addled superstar who could translate graduate-level mathematical concepts into flawless computer code. At Fraunhofer he found his man: a 26-year-old computer programmer by the name of Bernhard Grill.
Grill was shorter than Brandenburg and his manner was far more calm. His face was broad and friendly and he wore his sandy hair a little long. He spoke more loudly than Brandenburg, with more passion, and conversations with him were composed and natural. He told jokes, too, jokes that were—well, not all that funny either, but certainly better than Brandenburg’s.
In the world of audio, Grill stood out, for it was possible to imagine him as something other than an engineer. Like Brandenburg, he was Bavarian, but his attitude was more bohemian. He had a relaxed, wonkish nature, and was the sort of person who, had he lived in America, might have favored sandals and a Hawaiian shirt. Perhaps it was his background. While Brandenburg’s father was himself a professor, and most of the other Fraunhofer researchers hailed from the upper middle class, Grill’s father had worked in a factory. For Brandenburg, a university education had been a given, practically a birthright, but for Grill it had real meaning.
In his own way he had rebelled against the typisch Deutsch mentality. His original passion had been music. At a young age Grill had taken up the trumpet, and by his teens he was practicing six hours a day. During a brief period in his early 20s he had played professionally in a nine-piece swing band. When the economic realities of that career choice became apparent, he’d returned to engineering, and ended up studying computers. But music remained close to his heart, and over the years he amassed an enormous, eclectic collection of recorded music from a variety of obscure genres. His other hobby was building loudspeakers.
Brandenburg and Grill were joined by four other Fraunhofer researchers. Heinz Gerhäuser oversaw the institute’s audio research group; Harald Popp was a hardware specialist; Ernst Eberlein was a signal processing expert; and Jürgen Herre was another graduate student whose mathematical prowess rivaled Brandenburg’s own. In later years the members of this group would refer to themselves as “the original six.”
Beginning in 1987, they took on the full-time task of creating commercial products based on Brandenburg’s patent. The group saw two potential avenues for development. First, Brandenburg’s compression algorithm could be used to “stream” music—that is, send it directly to the user from a central server, as Seitzer had envisioned. Alternatively, Brandenburg’s compression algorithm could be used to “store” music—that is, create replayable music files that the user would keep on a personal computer. Either way, size mattered, and getting the compression ratio to 12 to 1 was the key.
It was slow going. Computing was still emerging from its homebrew origins, and the team built most of its equipment by hand. The lab was a sea of cables, speakers, signal processors, CD players, woofers, and converters. Brandenburg’s algorithm had to be coded directly onto programmable chips, a process that could take days. Once a chip was created, the team would use it to compress a ten-second sample from a compact disc, then compare it with the original to see if they could hear the difference. When they could—which, in the early days, was almost always—they refined the algorithm and tried again.
They started at the top, with the piccolo, then worked down the scale. Grill, who had obsessed over acoustics since childhood, could see at once that the compression technology was far from being marketable. Brandenburg’s algorithm generated a variety of unpredictable errors, and at times it was all Grill could do to take inventory. Sometimes, the encoding was “muddy,” as if the music were being played underwater. Sometimes it “hissed,” like static from an AM radio. Sometimes there was “double-speak,” as if the same recording had been overlaid twice. Worst of all was “pre-echo,” a peculiar phenomenon where ghostly remnants of musical phrases popped up several milliseconds early.
Brandenburg’s math was elegant, even beautiful, but it couldn’t fully account for the messy reality of perception. To truly model human hearing, they needed human test subjects. And these subjects required training to understand the vocabulary of failure as well as Grill did. And once this expertise was established, it would have to be submitted to thousands upon thousands of controlled, randomized, double-blind trials.
Grill approached this time-consuming endeavor with enthusiasm. He was what they called a “golden ear”: he could distinguish between microtones and pick up on frequencies normally available only to children and dogs. He approached the sense of hearing the way a perfumer approached the sense of smell, and this sharpened sense allowed him to name and grade certain sensory phenomena—certain aspects of reality, really—that others could never know.
Charged with selecting the reference material, Grill combed his massive compact disc archive for every conceivable form of music: funk, jazz, rock, R&B, metal, classical—every genre except rap, which he disliked. He wanted to throw everything he could find at Brandenburg’s algorithm, to be sure it could handle every conceivable case. Funded by Fraunhofer’s generous research budget, Grill went beyond music to become a collector of exotic noise. He found recordings of fast talkers with difficult accents. He found recordings of birdcalls and crowd noise. He found recordings of clacking castanets and mistuned harpsichords. His personal favorite came from a visit to Boeing headquarters in Seattle, where, in the gift shop, he found a collection of audio samples from roaring jet engines.
Under Grill’s direction, Fraunhofer also purchased several pairs of thousand-dollar Stax headphones. Made in Japan, these “electrostatic earspeakers” were the size of bricks and required their own dedicated amplifiers. They were impractical and expensive, but Grill considered the Stax to be the finest piece of equipment in the history of audio. They revealed every imperfection with grating clarity, and the ability to isolate these digital glitches spurred a cycle of continuous improvement.
Like a shrinking ray, the compression algorithm could target different output sizes. At half size, the files sounded decent. At quarter size, they sounded OK. In March 1988, Brandenburg isolated a recording of a piano solo, then dialed the encoding ratio as low as he dared—all the way down to Seitzer’s crazy stretch goal of one-twelfth CD size. The resulting encoding was lousy with errors. Brandenburg would later say the pianist sounded “drunk.” But even so, this experiment in uneasy listening gave him confidence, and he began to see for the first time how Seitzer’s vision might be achieved.
Increases in processing power spurred progress. Within a year Brandenburg’s algorithm was handling a wide variety of recorded music. The team hit a milestone with the 1812 Overture, then another with Tracy Chapman, then another with a track by Gloria Estefan (Grill was on a Latin kick). In late 1988, the team made its first sale, shipping a hand-built decoder to the first-ever end user of mp3 technology: a tiny radio station run by missionaries on the remote Micronesian island of Saipan.
But one audio source was proving intractable: what Grill, with his imperfect command of English, called “the lonely voice.” (He meant “lone.”) Human speech could not, in isolation, be psychoacoustically masked. Nor could you use Huffman’s pattern recognition approach—the essence of speech was its dynamic nature, its plosives and sibilants and glottal stops. Brandenburg’s shrinking algorithm could handle symphonies, guitar solos, cannons, even “Oye Mi Canto,” but it still couldn’t handle a newscast.