One of the greatest mathematicians of all time, Laplace is perhaps best known for his dream of Newtonian determinism:
An intelligence that, at a given instant, could comprehend all the forces by which nature is animated and the respective situation of the beings that make it up, if moreover it were vast enough to submit these data to analysis, would encompass in the same formula the movements of the greatest bodies of the universe and those of the lightest atoms. For such an intelligence nothing would be uncertain, and the future, like the past, would be open to its eyes.
This is ironic, since Laplace was also the father of probability theory, which he believed was just common sense reduced to calculation. At the heart of his explorations in probability was a preoccupation with Hume’s question. For example, how do we know the sun will rise tomorrow? It has done so every day until today, but that’s no guarantee it will continue. Laplace’s answer had two parts. The first is what we now call the principle of indifference, or principle of insufficient reason. We wake up one day (at the beginning of time, let’s say, which for Laplace was five thousand years or so ago) and, after a beautiful afternoon, we see the sun go down. Will it come back? We’ve never seen the sun rise, and there is no particular reason to believe it will or won’t. Therefore we should consider the two scenarios equally likely and say that the sun will rise again with a probability of one-half. But, Laplace went on, if the past is any guide to the future, every day that the sun rises should increase our confidence that it will continue to do so. After five thousand years, the probability that the sun will rise yet again tomorrow should be very close to one, but not quite there, since we can never be completely certain. From this thought experiment, Laplace derived his so-called rule of succession, which estimates the probability that the sun will rise again after having risen n times as (n + 1) / (n + 2). When n = 0, this is just ½; and as n increases, so does the probability, approaching 1 as n approaches infinity.
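To make the arithmetic concrete, here is a minimal Python sketch of the rule of succession; the values of n are chosen purely for illustration.

```python
def rule_of_succession(n):
    """Laplace's rule of succession: the probability of one more sunrise
    after n sunrises in a row, starting from the principle of indifference."""
    return (n + 1) / (n + 2)

# Illustrative values of n: no sunrises yet, one, a year's worth,
# and roughly five thousand years' worth.
for n in [0, 1, 365, 5000 * 365]:
    print(f"n = {n:>7}: P(sunrise tomorrow) = {rule_of_succession(n):.7f}")
```

With n = 0 the estimate is exactly one-half; after five thousand years of daily sunrises it is within a millionth of 1, but it never quite gets there.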
This rule arises from a more general principle. Suppose you awake in the middle of the night on a strange planet. Even though all you can see is the starry sky, you have reason to believe that the sun will rise at some point, since most planets rotate on their axis as they revolve around their sun. So your estimate of the corresponding probability should be greater than one-half (two-thirds, say). We call this the prior probability that the sun will rise, since it’s prior to seeing any evidence. It’s not based on counting the number of times the sun has risen on this planet in the past, because you weren’t there to see it; rather, it reflects your a priori beliefs about what will happen, based on your general knowledge of the universe. But now the stars start to fade, so your confidence that the sun does rise on this planet goes up, based on your experience on Earth. Your confidence is now a posterior probability, since it’s after seeing some evidence. The sky begins to lighten, and the posterior probability takes another leap. Finally, a sliver of the sun’s bright disk appears above the horizon and perhaps catches “the Sultan’s turret in a noose of light,” as in the opening verse of the Rubaiyat. Unless you’re hallucinating, it is now certain that the sun will rise.
The crucial question is exactly how the posterior probability should evolve as you see more evidence. The answer is Bayes’ theorem. We can think of it in terms of cause and effect. Sunrise causes the stars to fade and the sky to lighten, but the latter is stronger evidence of daybreak, since the stars could fade in the middle of the night due to, say, fog rolling in. So the probability of sunrise should increase more after seeing the sky lighten than after seeing the stars fade. In mathematical notation, we say that P(sunrise | lightening-sky), the conditional probability of sunrise given that the sky is lightening, is greater than P(sunrise | fading-stars), its conditional probability given that the stars are fading. According to Bayes’ theorem, the more likely the effect is given the cause, the more likely the cause is given the effect: if P(lightening-sky | sunrise) is higher than P(fading-stars | sunrise), perhaps because some planets are far enough from their sun that the stars still shine after sunrise, then P(sunrise | lightening-sky) is also higher than P(sunrise | fading-stars).
This is not the whole story, however. If we observe an effect that would happen even without the cause, then surely that’s not much evidence of the cause being present. Bayes’ theorem incorporates this by saying that P(cause | effect) goes down with P(effect), the prior probability of the effect (i.e., its probability in the absence of any knowledge of the causes). Finally, other things being equal, the more likely a cause is a priori, the more likely it should be a posteriori. Putting all of these together, Bayes’ theorem says that
P(cause | effect) = P(cause) × P(effect | cause) / P(effect).
Replace cause by A and effect by B and omit the multiplication sign for brevity, and you get the ten-foot formula in the cathedral.
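As a sketch of how this plays out in the strange-planet story, here is the same formula in Python; the priors and likelihoods below are invented for illustration and are not given in the text.

```python
def bayes(p_cause, p_effect_given_cause, p_effect):
    """P(cause | effect) = P(cause) * P(effect | cause) / P(effect)."""
    return p_cause * p_effect_given_cause / p_effect

p_sunrise = 2 / 3  # prior: most planets have sunrises (assumed value)

# Fading stars: likely given a sunrise, but also common without one (fog, clouds).
p_fading_given_sunrise, p_fading = 0.9, 0.8
# Lightening sky: slightly more likely given a sunrise, and much rarer without one.
p_lightening_given_sunrise, p_lightening = 0.95, 0.7

print(bayes(p_sunrise, p_fading_given_sunrise, p_fading))          # 0.75
print(bayes(p_sunrise, p_lightening_given_sunrise, p_lightening))  # ~0.90
```

With these made-up numbers, the fading stars raise the posterior from two-thirds to 0.75, while the lightening sky, being the stronger evidence, lifts it to about 0.90; in the story, each posterior then serves as the prior for the next observation.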
That’s just a statement of the theorem, not a proof, of course. But the proof is surprisingly simple. We can illustrate it with an example from medical diagnosis, one of the “killer apps” of Bayesian inference. Suppose you’re a doctor, and you’ve diagnosed a hundred patients in the last month. Fourteen of them had the flu, twenty had a fever, and eleven had both. The conditional probability of fever given flu is therefore eleven out of fourteen, or 11/14. Conditioning reduces the size of the universe that we’re considering, in this case from all patients to only patients with the flu. In the universe of all patients, the probability of fever is 20/100; in the universe of flu-stricken patients, it’s 11/14. The probability that a patient has the flu and a fever is the fraction of patients that have the flu times the fraction of those that also have a fever: P(flu, fever) = P(flu) × P(fever | flu) = 14/100 × 11/14 = 11/100. But we could equally well have done this the other way around: P(flu, fever) = P(fever) × P(flu | fever). Therefore, since they’re both equal to P(flu, fever), P(fever) × P(flu | fever) = P(flu) × P(fever | flu). Divide both sides by P(fever), and you get P(flu | fever) = P(flu) × P(fever | flu) / P(fever). That’s it! That’s Bayes’ theorem, with flu as the cause and fever as the effect.
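The same counting argument can be checked in a short Python sketch using the numbers from the example:

```python
patients, flu, fever, both = 100, 14, 20, 11  # counts from the example

p_flu = flu / patients                   # P(flu) = 14/100
p_fever = fever / patients               # P(fever) = 20/100
p_fever_given_flu = both / flu           # P(fever | flu) = 11/14
p_flu_given_fever_direct = both / fever  # counted directly in the fever universe

# Bayes' theorem recovers the same number without ever counting that universe.
p_flu_given_fever_bayes = p_flu * p_fever_given_flu / p_fever

print(f"{p_flu_given_fever_direct:.2f} {p_flu_given_fever_bayes:.2f}")  # 0.55 0.55
```

Counting directly in the fever universe and applying Bayes’ theorem both give 11/20 = 0.55, which is exactly the point of the proof.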
Humans, it turns out, are not very good at Bayesian inference, at least when verbal reasoning is involved. The problem is that we tend to neglect the cause’s prior probability. If you test positive for HIV, and the test only gives 1 percent false positives, should you panic? At first sight, it seems like your chances of having the virus are now 99 percent. Yikes! But let’s keep a cool head and apply Bayes’ theorem step-by-step: P(HIV | positive) = P(HIV) × P(positive | HIV) / P(positive). P(HIV) is the prevalence of HIV in the general population, which is about 0.3 percent in the United States. P(positive | HIV), the probability of testing positive if you do carry the virus, is close to one for a good test; call it 99 percent. P(positive) is the probability that the test comes out positive whether or not you have HIV; let’s say that’s 1 percent. So P(HIV | positive) = 0.003 × 0.99 / 0.01 = 0.297. That’s very different from 0.99! The reason is that HIV is rare in the general population. The test coming out positive increases your chances of carrying the virus by two orders of magnitude, but they’re still under a third. If you test positive for HIV, the right thing to do is to stay calm and take another, more definitive test. Chances are you’ll be fine.
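A quick sketch of the same calculation, with the naive reading shown alongside Bayes’ answer; the numbers are the ones used above.

```python
def bayes(p_cause, p_effect_given_cause, p_effect):
    """P(cause | effect) = P(cause) * P(effect | cause) / P(effect)."""
    return p_cause * p_effect_given_cause / p_effect

p_hiv = 0.003           # prevalence of HIV in the general population
p_pos_given_hiv = 0.99  # probability of testing positive if you carry the virus
p_pos = 0.01            # overall probability of a positive test, as assumed above

naive = 1 - 0.01        # the "1 percent false positives, so 99 percent sure" intuition
print(f"naive: {naive:.3f}")                                 # 0.990
print(f"Bayes: {bayes(p_hiv, p_pos_given_hiv, p_pos):.3f}")  # 0.297
```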