The first is that, in reality, we never have enough data to completely determine the world. Even ignoring the uncertainty principle, precisely knowing the positions and velocities of all particles in the world at some point in time is not remotely feasible. And because the laws of physics are chaotic, uncertainty compounds over time, and pretty soon the laws, applied to stale data, determine very little indeed. To accurately describe the world, we need a fresh batch of data at regular intervals. In effect, the laws of physics only tell us what happens locally. This drastically reduces their power.
The second problem is that, even if we had complete knowledge of the world at some point in time, the laws of physics would still not allow us to determine its past and future. This is because the sheer amount of computation required to make those predictions would be beyond the capabilities of any imaginable computer. In effect, to perfectly simulate the universe we would need another, identical universe. This is why string theory is mostly irrelevant outside of physics. The theories we have in biology, psychology, sociology, or economics are not corollaries of the laws of physics; they had to be created from scratch. We assume that they are approximations of what the laws of physics would predict when applied at the scale of cells, brains, and societies, but there’s no way to know.
Unlike the theories of a given field, which only have power within that field, the Master Algorithm has power across all fields. Within field X, it has less power than field X’s prevailing theory, but across all fields, when we consider the whole world, it has vastly more power than any other theory. The Master Algorithm is the germ of every theory; all we need to add to it to obtain theory X is the minimum amount of data required to induce it. (In the case of physics, that would be just the results of perhaps a few hundred key experiments.) The upshot is that, pound for pound, the Master Algorithm may well be the best starting point for a theory of everything we’ll ever have. Pace Stephen Hawking, it may ultimately tell us more about the mind of God than string theory.
Some may say that seeking a universal learner is the epitome of techno-hubris. But dreaming is not hubris. Maybe the Master Algorithm will take its place among the great chimeras, alongside the philosopher’s stone and the perpetual motion machine. Or perhaps it will be more like finding the longitude at sea, given up as too difficult until a lone genius solved it. More likely, it will be the work of generations, raised stone by stone like a cathedral. The only way to find out is to get up early one day and set out on the journey.
Candidates that don’t make the cut
So, if the Master Algorithm exists, what is it? A seemingly obvious candidate is memorization: just remember everything you’ve seen; after a while you’ll have seen everything there is to see, and therefore know everything there is to know. The problem with this is that, as Heraclitus said, you never step in the same river twice. There’s far more to see than you ever could. No matter how many snowflakes you’ve examined, the next one will be different. Even if you had been present at the Big Bang and everywhere since, you would still have seen only a tiny fraction of what you could see in the future. If you had witnessed life on Earth up to ten thousand years ago, that would not have prepared you for what was to come. Someone who grew up in one city doesn’t become paralyzed when they move to another, but a robot capable only of memorization would. Besides, knowledge is not just a long list of facts. Knowledge is general, and has structure. “All humans are mortal” is much more succinct than seven billion statements of mortality, one for each human. Memorization gives us none of these things.
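To see the problem in the starkest terms, here is a minimal sketch, with invented names, of what a memorization-only learner amounts to: a lookup table that answers perfectly on what it has seen and is helpless the moment it meets anything new.

```python
# A memorization-only "learner" (hypothetical sketch; names invented).
# It recalls what it has seen and is helpless on anything new.

class Memorizer:
    def __init__(self):
        self.memory = {}

    def learn(self, example, label):
        # Store the example verbatim, nothing more.
        self.memory[example] = label

    def predict(self, example):
        # No generalization: an unseen example stumps it completely.
        if example not in self.memory:
            raise KeyError(f"never seen {example!r} before")
        return self.memory[example]

m = Memorizer()
m.learn("Socrates", "mortal")
print(m.predict("Socrates"))  # "mortal"
m.predict("Plato")            # KeyError: the robot moved to a new city
```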
Another candidate Master Algorithm is the microprocessor. After all, the one in your computer can be viewed as a single algorithm whose job is to execute other algorithms, like a universal Turing machine; and it can run any imaginable algorithm, up to its limits of memory and speed. In effect, to a microprocessor an algorithm is just another kind of data. The problem here is that, by itself, the microprocessor doesn’t know how to do anything; it just sits there idle all day. Where do the algorithms it runs come from? If they were coded up by a human programmer, no learning is involved. Nevertheless, there’s a sense in which the microprocessor is a good analog for the Master Algorithm. A microprocessor is not the best hardware for running any particular algorithm. That would be an ASIC (application-specific integrated circuit) designed very precisely for that algorithm. Yet microprocessors are what we use for almost all applications, because their flexibility trumps their relative inefficiency. If we had to build an ASIC for every new application, the Information Revolution would never have happened. Similarly, the Master Algorithm is not the best algorithm for learning any particular piece of knowledge; that would be an algorithm that already encodes most of that knowledge (or all of it, making the data superfluous). The point, however, is to induce the knowledge from data, because it’s easier and costs less; so the more general the learning algorithm, the better.
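The “algorithms are just data” point is easy to see in miniature. Here is a toy sketch, with invented names, of a general-purpose executor that runs whatever program it is handed, rather than one circuit per task:

```python
# "To a microprocessor an algorithm is just another kind of data":
# a toy sketch (invented names) of one general executor running
# any program handed to it.

def run(program, data):
    # The 'microprocessor': it knows nothing on its own;
    # it only executes the algorithm it is given.
    return program(data)

square = lambda x: x * x   # one 'algorithm'...
negate = lambda x: -x      # ...and another, both passed around as data

print(run(square, 7))  # 49
print(run(negate, 7))  # -7
```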
An even more extreme candidate is the humble NOR gate: a logic switch whose output is 1 only if its inputs are both 0. Recall that all computers are made of logic gates built out of transistors, and all computations can be reduced to combinations of AND, OR, and NOT gates. A NOR gate is just an OR gate followed by a NOT gate: the negation of a disjunction, as in “I’m happy as long as I’m not starving or sick.” AND, OR, and NOT can all be implemented using NOR gates, so NOR can do everything, and in fact it’s all some microprocessors use. So why can’t it be the Master Algorithm? It’s certainly unbeatable for simplicity. Unfortunately, a NOR gate is not the Master Algorithm any more than a Lego brick is the universal toy. It can certainly be a universal building block for toys, but a pile of Legos doesn’t spontaneously assemble itself into a toy. The same applies to other simple computation schemes, like Petri nets or cellular automata.
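For the skeptical, the claim that NOR suffices is easy to verify. Here is a quick sketch building NOT, OR, and AND out of nothing but NOR, checked against their truth tables:

```python
# NOT, OR, and AND built from NOR alone, verified exhaustively.

def NOR(a, b):
    return int(not (a or b))

def NOT(a):
    return NOR(a, a)            # NOR(a, a) = not a

def OR(a, b):
    return NOT(NOR(a, b))       # undo the negation of the disjunction

def AND(a, b):
    return NOR(NOT(a), NOT(b))  # De Morgan: a and b = not(not a or not b)

for a in (0, 1):
    for b in (0, 1):
        assert OR(a, b) == int(bool(a) or bool(b))
        assert AND(a, b) == int(bool(a) and bool(b))
print("NOR suffices")
```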
Moving on to more sophisticated alternatives, what about the queries that any good database engine can answer, or the simple algorithms in a statistical package? Aren’t those enough? These are bigger Lego bricks, but they’re still only bricks. A database engine never discovers anything new; it just tells you what it knows. Even if all the humans in a database are mortal, it doesn’t occur to it to generalize mortality to other humans. (Database engineers would blanch at the thought.) Much of statistics is about testing hypotheses, but someone has to formulate them in the first place. Statistical packages can do linear regression and other simple procedures, but these have a very low limit on what they can learn, no matter how much data you feed them. The better packages cross into the gray zone between statistics and machine learning, but there are still many kinds of knowledge they can’t discover.
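That ceiling is easy to demonstrate. Here is a small sketch, assuming NumPy, of a linear regression fit to data generated by y = x²; no amount of data rescues it, because a straight line simply cannot represent the curve:

```python
# The "low ceiling" of linear regression (sketch, assuming NumPy):
# ordinary least squares fit to data generated by y = x**2.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100_000)  # plenty of data...
y = x ** 2                            # ...but a nonlinear truth

# Fit y ~ a*x + b by least squares.
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"learned: y = {a:.3f}*x + {b:.3f}")           # roughly y = 0*x + 3
print(f"at x = 2 it predicts {a*2 + b:.2f}, not 4")  # more data won't fix it
```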
OK, it’s time to come clean: the Master Algorithm is the equation U(X) = 0. Not only does it fit on a T-shirt; it fits on a postage stamp. Huh? U(X) = 0 just says that some (possibly very complex) function U of some (possibly very complex) variable X is equal to 0. Every equation can be reduced to this form; for example, F = ma is equivalent to F − ma = 0, so if you think of F − ma as a function U of F, voilà: U(F) = 0. In general, X could be any input and U could be any algorithm, so surely the Master Algorithm can’t be any more general than this; and since we’re looking for the most general algorithm we can find, this must be it. I’m just kidding, of course, but this particular failed candidate points to a real danger in machine learning: coming up with a learner that’s so general, it doesn’t have enough content to be useful.