Выбрать главу

In industry, there’s no sign that knowledge engineering will ever be able to compete with machine learning outside of a few niche areas. Why pay experts to slowly and painfully encode knowledge into a form computers can understand, when you can extract it from data at a fraction of the cost? What about all the things the experts don’t know but you can discover from data? And when data is not available, the cost of knowledge engineering seldom exceeds the benefit. Imagine if farmers had to engineer each cornstalk in turn, instead of sowing the seeds and letting them grow: we would all starve.

Another prominent machine-learning skeptic is the linguist Noam Chomsky. Chomsky believes that language must be innate, because the examples of grammatical sentences children hear are not enough to learn a grammar. This only puts the burden of learning language on evolution, however; it does not argue against the Master Algorithm but only against it being something like the brain. Moreover, if a universal grammar exists (as Chomsky believes), elucidating it is a step toward elucidating the Master Algorithm. The only way this is not the case is if language has nothing in common with other cognitive abilities, which is implausible given its evolutionary recency.

In any case, if we formalize Chomsky’s “poverty of the stimulus” argument, we find that it’s demonstrably false. In 1969, J. J. Horning proved that probabilistic context-free grammars can be learned from positive examples only, and stronger results have followed. (Context-free grammars are the linguist’s bread and butter, and the probabilistic version models how likely each rule is to be used.) Besides, language learning doesn’t happen in a vacuum; children get all sorts of cues from their parents and the environment. If we’re able to learn language from a few years’ worth of examples, it’s partly because of the similarity between its structure and the structure of the world. This common structure is what we’re interested in, and we know from Horning and others that it suffices.

More generally, Chomsky is critical of all statistical learning. He has a list of things statistical learners can’t do, but the list is fifty years out of date. Chomsky seems to equate machine learning with behaviorism, where animal behavior is reduced to associating responses with rewards. But machine learning is not behaviorism. Modern learning algorithms can learn rich internal representations, not just pairwise associations between stimuli.

In the end, the proof is in the pudding. Statistical language learners work, and hand-engineered language systems don’t. The first eye-opener came in the 1970s, when DARPA, the Pentagon’s research arm, organized the first large-scale speech recognition project. To everyone’s surprise, a simple sequential learner of the type Chomsky derided handily beat a sophisticated knowledge-based system. Learners like it are now used in just about every speech recognizer, including Siri. Fred Jelinek, head of the speech group at IBM, famously quipped that “every time I fire a linguist, the recognizer’s performance goes up.” Stuck in the knowledge-engineering mire, computational linguistics had a near-death experience in the late 1980s. Since then, learning-based methods have swept the field, to the point where it’s hard to find a paper devoid of learning in a computational linguistics conference. Statistical parsers analyze language with accuracy close to that of humans, where hand-coded ones lagged far behind. Machine translation, spelling correction, part-of-speech tagging, word sense disambiguation, question answering, dialogue, summarization: the best systems in these areas all use learning. Watson, the Jeopardy! computer champion, would not have been possible without it.

To this Chomsky might reply that engineering successes are not proof of scientific validity. On the other hand, if your buildings collapse and your engines don’t run, perhaps something is wrong with your theory of physics. Chomsky thinks linguists should focus on “ideal” speaker-listeners, as defined by him, and this gives him license to ignore things like the need for statistics in language learning. Perhaps it’s not surprising, then, that few experimentalists take his theories seriously any more.

Another potential source of objections to the Master Algorithm is the notion, popularized by the psychologist Jerry Fodor, that the mind is composed of a set of modules with only limited communication between them. For example, when you watch TV your “higher brain” knows that it’s only light flickering on a flat surface, but your visual system still sees three-dimensional shapes. Even if we believe in the modularity of mind, however, that does not imply that different modules use different learning algorithms. The same algorithm operating on, say, visual and verbal information may suffice.

Critics like Minsky, Chomsky, and Fodor once had the upper hand, but thankfully their influence has waned. Nevertheless, we should keep their criticisms in mind as we set out on the road to the Master Algorithm for two reasons. The first is that knowledge engineers faced many of the same problems machine learners do, and even if they didn’t succeed, they learned many valuable lessons. The second is that learning and knowledge are intertwined in surprisingly subtle ways, as we’ll soon find out. Unfortunately, the two camps often talk past each other. They speak different languages: machine learning speaks probability, and knowledge engineering speaks logic. Later in the book we’ll see what to do about this.

Swan bites robot

“No matter how smart your algorithm, there are some things it just can’t learn.” Outside of AI and cognitive science, the most common objections to machine learning are variants of this claim. Nassim Taleb hammered on it forcefully in his book The Black Swan. Some events are simply not predictable. If you’ve only ever seen white swans, you think the probability of ever seeing a black one is zero. The financial meltdown of 2008 was a “black swan.”

It’s true that some things are predictable and some aren’t, and the first duty of the machine learner is to distinguish between them. But the goal of the Master Algorithm is to learn everything that can be known, and that’s a vastly wider domain than Taleb and others imagine. The housing bust was far from a black swan; on the contrary, it was widely predicted. Most banks’ models failed to see it coming, but that was due to well-understood limitations of those models, not limitations of machine learning in general. Learning algorithms are quite capable of accurately predicting rare, never-before-seen events; you could even say that that’s what machine learning is all about. What’s the probability of a black swan if you’ve never seen one? How about it’s the fraction of known species that belatedly turned out to have black specimens? This is only a crude example; we’ll see many deeper ones in this book.

A related, frequently heard objection is “Data can’t replace human intuition.” In fact, it’s the other way around: human intuition can’t replace data. Intuition is what you use when you don’t know the facts, and since you often don’t, intuition is precious. But when the evidence is before you, why would you deny it? Statistical analysis beats talent scouts in baseball (as Michael Lewis memorably documented in Moneyball), it beats connoisseurs at wine tasting, and every day we see new examples of what it can do. Because of the influx of data, the boundary between evidence and intuition is shifting rapidly, and as with any revolution, entrenched ways have to be overcome. If I’m the expert on X at company Y, I don’t like to be overridden by some guy with data. There’s a saying in industry: “Listen to your customers, not to the HiPPO,” HiPPO being short for “highest paid person’s opinion.” If you want to be tomorrow’s authority, ride the data, don’t fight it.