The algorithm we’ll arrive at is not yet the Master Algorithm, for reasons we’ll see, but it’s the closest anyone has come. And we’ll gather enough riches along the way to make Croesus envious. Nevertheless, this book is only part one of the Master Algorithm saga. Part two’s protagonist is you, dear reader. Your mission, should you choose to accept it, is to go the rest of the way and bring back the prize. I will be your humble guide in part one, from here to the edge of the known world. Do I hear you protest that you don’t know enough, or algorithms are not your forte? Fear not. Computer science is still young, and unlike in physics or biology, you don’t need a PhD to start a revolution. (Just ask Bill Gates, Messrs. Sergey Brin and Larry Page, or Mark Zuckerberg.) Insight and persistence are what count.
Are you ready? Our journey begins with a visit to the symbolists, the tribe with the oldest roots.
CHAPTER THREE: Hume’s Problem of Induction
Are you a rationalist or an empiricist?
Rationalists believe that the senses deceive and that logical reasoning is the only sure path to knowledge. Empiricists believe that all reasoning is fallible and that knowledge must come from observation and experimentation. The French are rationalists; the Anglo-Saxons (as the French call them) are empiricists. Pundits, lawyers, and mathematicians are rationalists; journalists, doctors, and scientists are empiricists. Murder, She Wrote is a rationalist TV crime show; CSI: Crime Scene Investigation is an empiricist one. In computer science, theorists and knowledge engineers are rationalists; hackers and machine learners are empiricists.
The rationalist likes to plan everything in advance before making the first move. The empiricist prefers to try things and see how they turn out. I don’t know if there’s a gene for rationalism or one for empiricism, but looking at my computer scientist colleagues, I’ve observed time and again that they are almost like personality traits: some people are rationalistic to the core and could never have been otherwise; and others are empiricist through and through, and that’s what they’ll always be. The two sides can converse with each other and sometimes draw on each other’s results, but they can understand each other only so much. Deep down each believes that what the other does is secondary, and not very interesting.
Rationalists and empiricists have probably been around since the dawn of Homo sapiens. Before setting out on a hunt, Caveman Bob spent a long time sitting in his cave figuring out where the game would be. In the meantime, Cavewoman Alice was out systematically surveying the territory. Since both kinds are still with us, it’s probably safe to say that neither approach was better. You might think that machine learning is the final triumph of the empiricists, but the truth is more subtle, as we’ll soon see.
Rationalism versus empiricism is a favorite question of philosophers. Plato was an early rationalist, and Aristotle an early empiricist. But the debate really took off during the Enlightenment, with a trio of great thinkers on each side: Descartes, Spinoza, and Leibniz were the leading rationalists; Locke, Berkeley, and Hume were their empiricist counterparts. Trusting in their powers of reasoning, the rationalists concocted theories of the universe that, to put it gently, did not stand the test of time, but they also invented fundamental mathematical techniques like calculus and analytical geometry. The empiricists were altogether more practical, and their influence is everywhere from the scientific method to the Constitution of the United States.
David Hume was the greatest of the empiricists and the greatest English-speaking philosopher of all time. Thinkers like Adam Smith and Charles Darwin count him among their key influences. You could also say he’s the patron saint of the symbolists. He was born in Scotland in 1711 and spent most of his life in eighteenth-century Edinburgh, a prosperous city full of intellectual ferment. A man of genial disposition, he was nevertheless an exacting skeptic who spent much of his time debunking the myths of his age. He also took the empiricist train of thought that Locke had started to its logical conclusion and asked a question that has since hung like a sword of Damocles over all knowledge, from the most trivial to the most advanced: How can we ever be justified in generalizing from what we’ve seen to what we haven’t? Every learning algorithm is, in a sense, an attempt to answer this question.
Hume’s question is also the departure point for our journey. We’ll start by illustrating it with an example from daily life and meeting its modern embodiment in the famous “no free lunch” theorem. Then we’ll see the symbolists’ answer to Hume. This leads us to the most important problem in machine learning: overfitting, or hallucinating patterns that aren’t really there. We’ll see how the symbolists solve it, and how machine learning is at heart a kind of alchemy, transmuting data into knowledge with the aid of a philosopher’s stone. For the symbolists, the philosopher’s stone is knowledge itself. In the next four chapters we’ll study the solutions of the other tribes’ alchemists.
To date or not to date?
You have a friend you really like, and you want to ask her out on a date. You have a hard time dealing with rejection, though, and you’re only going to ask her if you’re pretty sure she’ll say yes. It’s Friday evening, and there you sit with cell phone in hand, trying to decide whether or not to call her. You remember that the previous time you asked her, she said no. But why? Two times before that she said yes, and the one before those she said no. Maybe there are days she doesn’t like to go out? Or maybe she likes clubbing but not dinner dates? Being of an unusually systematic nature, you put down the phone and jot down what you can remember about those previous occasions:
So… what shall it be? Date or no date? Is there a pattern that distinguishes the yeses from the nos? And, most important, what does that pattern say about today?
Clearly, there’s no single factor that correctly predicts the answer: on some weekends she likes to go out, and on some she doesn’t; sometimes she likes to go clubbing, and sometimes she doesn’t, and so on. What about a combination of factors? Maybe she likes to go clubbing on weekends? No, occasion number 4 crosses that one out. Or maybe she only likes to go out on warm weekend nights? Bingo! That works! In which case, looking at the frosty weather outside, tonight doesn’t look promising. But wait! What if she likes to go clubbing when there’s nothing good on TV? That also works, and that means today is a yes! Quick, call her before it gets too late. But wait a second. How do you know this is the right pattern? You’ve found two that agree with your previous experience, but they make opposite predictions. Come to think of it, what if she only goes clubbing when the weather is nice? Or she goes out on weekends when there’s nothing to watch on TV? Or…
At this point you crumple your notes in frustration and fling them into the wastebasket. There’s no way to know! What can you do? The ghost of Hume nods sadly over your shoulder. You have no basis to pick one generalization over another. Yes and no are equally legitimate answers to the question “What will she say?” And the clock is ticking. Bitterly, you fish out a quarter from your pocket and prepare to flip it.
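To see the predicament concretely, here is a minimal Python sketch. The table of occasions is not reproduced in the text above, so the four rows and today's conditions below are an assumed reconstruction, chosen only to agree with everything the narrative says about them. The names OCCASIONS, TODAY, predicts_yes, and consistent_rules are likewise hypothetical. The sketch enumerates every rule of the form "she says yes only when all of these conditions hold," with one or two conditions, and keeps the rules that match all four past answers.

```python
from itertools import combinations

FEATURES = ("day", "type", "weather", "tv")

# ASSUMED data: a reconstruction consistent with the narrative, not the
# author's actual table. Each entry is (situation, did she say yes?).
OCCASIONS = [
    ({"day": "weekday", "type": "dinner", "weather": "warm", "tv": "bad"}, False),
    ({"day": "weekend", "type": "club", "weather": "warm", "tv": "bad"}, True),
    ({"day": "weekend", "type": "club", "weather": "warm", "tv": "bad"}, True),
    ({"day": "weekend", "type": "club", "weather": "cold", "tv": "good"}, False),
]

# ASSUMED: a frosty weekend evening, clubbing on offer, nothing good on TV.
TODAY = {"day": "weekend", "type": "club", "weather": "cold", "tv": "bad"}


def predicts_yes(rule, situation):
    """A rule is a tuple of (feature, value) pairs; it says yes if all hold."""
    return all(situation[feature] == value for feature, value in rule)


def consistent_rules(max_size=2):
    """Return every rule of up to max_size conditions that fits all occasions."""
    conditions = sorted({(f, s[f]) for s, _ in OCCASIONS for f in FEATURES})
    rules = []
    for size in range(1, max_size + 1):
        for rule in combinations(conditions, size):
            # Skip rules that test the same feature twice (e.g. warm AND cold).
            if len({feature for feature, _ in rule}) < size:
                continue
            if all(predicts_yes(rule, s) == answer for s, answer in OCCASIONS):
                rules.append(rule)
    return rules


for rule in consistent_rules():
    description = " and ".join(f"{f}={v}" for f, v in rule)
    verdict = "yes" if predicts_yes(rule, TODAY) else "no"
    print(f"{description:35s} -> today: {verdict}")
```

Under these assumed rows, no single factor survives, four two-factor rules do, and they split evenly on today: two say yes and two say no. The data alone cannot break the tie, which is exactly Hume's point.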