Chapter Five
“Evolutionary robotics,” by Josh Bongard (Communications of the ACM, 2013), surveys the work of Hod Lipson and others on evolving robots. Artificial Life, by Steven Levy (Vintage, 1993), gives a tour of the digital zoo, from computer-created animals in virtual worlds to genetic algorithms. Chapter 5 of Complexity, by Mitchell Waldrop (Touchstone, 1992), tells the story of John Holland and the first few decades of research on genetic algorithms. Genetic Algorithms in Search, Optimization, and Machine Learning,* by David Goldberg (Addison-Wesley, 1989), is the standard introduction to genetic algorithms.
Niles Eldredge and Stephen Jay Gould propose their theory of punctuated equilibria in “Punctuated equilibria: An alternative to phyletic gradualism,” in Models in Paleobiology, edited by T. J. M. Schopf (Freeman, 1972). Richard Dawkins critiques it in Chapter 9 of The Blind Watchmaker (Norton, 1986). The exploration-exploitation dilemma is discussed in Chapter 2 of Reinforcement Learning,* by Richard Sutton and Andrew Barto (MIT Press, 1998). John Holland proposes his solution, and much else, in Adaptation in Natural and Artificial Systems* (University of Michigan Press, 1975).
John Koza’s Genetic Programming* (MIT Press, 1992) is the key reference on this paradigm. An evolved robot soccer team is described in “Evolving team Darwin United,”* by David Andre and Astro Teller, in RoboCup-98: Robot Soccer World Cup II, edited by Minoru Asada and Hiroaki Kitano (Springer, 1999). Genetic Programming III,* by John Koza, Forrest Bennett III, David Andre, and Martin Keane (Morgan Kaufmann, 1999), includes many examples of evolved electronic circuits. Danny Hillis argues that parasites are good for evolution in “Co-evolving parasites improve simulated evolution as an optimization procedure”* (Physica D, 1990). Adi Livnat, Christos Papadimitriou, Jonathan Dushoff, and Marcus Feldman propose that sex optimizes mixability in “A mixability theory of the role of sex in evolution”* (Proceedings of the National Academy of Sciences, 2008). Kevin Lang’s paper comparing genetic programming and hill climbing is “Hill climbing beats genetic search on a Boolean circuit synthesis problem of Koza’s”* (Proceedings of the Twelfth International Conference on Machine Learning, 1995). Koza’s reply is “A response to the ML-95 paper entitled…”* (unpublished; online at www.genetic-programming.com/jktahoe24page.html).
James Baldwin proposed the eponymous effect in “A new factor in evolution” (American Naturalist, 1896). Geoff Hinton and Steven Nowlan describe their implementation of it in “How learning can guide evolution”* (Complex Systems, 1987). The Baldwin effect was the theme of a 1996 special issue* of the journal Evolutionary Computation edited by Peter Turney, Darrell Whitley, and Russell Anderson.
The distinction between descriptive and normative theories was articulated by John Neville Keynes in The Scope and Method of Political Economy (Macmillan, 1891).
Chapter Six
Sharon Bertsch McGrayne tells the history of Bayesianism, from Bayes and Laplace to the present, in The Theory That Would Not Die (Yale University Press, 2011). A First Course in Bayesian Statistical Methods,* by Peter Hoff (Springer, 2009), is an introduction to Bayesian statistics.
The Naïve Bayes algorithm is first mentioned in Pattern Classification and Scene Analysis,* by Richard Duda and Peter Hart (Wiley, 1973). Milton Friedman argues for oversimplified theories in “The methodology of positive economics,” which appears in Essays in Positive Economics (University of Chicago Press, 1966). The use of Naïve Bayes in spam filtering is described in “Stopping spam,” by Joshua Goodman, David Heckerman, and Robert Rounthwaite (Scientific American, 2005). “Relevance weighting of search terms,”* by Stephen Robertson and Karen Spärck Jones (Journal of the American Society for Information Science, 1976), explains the use of Naïve Bayes-like methods in information retrieval.
“First links in the Markov chain,” by Brian Hayes (American Scientist, 2013), recounts Markov’s invention of the eponymous chains. “Large language models in machine translation,”* by Thorsten Brants et al. (Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007), explains how Google Translate works. “The PageRank citation ranking: Bringing order to the Web,”* by Larry Page, Sergey Brin, Rajeev Motwani, and Terry Winograd (Stanford University technical report, 1998), describes the PageRank algorithm and its interpretation as a random walk over the web. Statistical Language Learning,* by Eugene Charniak (MIT Press, 1996), explains how hidden Markov models work. Statistical Methods for Speech Recognition,* by Fred Jelinek (MIT Press, 1997), describes their application to speech recognition. The story of HMM-style inference in communication is told in “The Viterbi algorithm: A personal history,” by David Forney (unpublished; online at arxiv.org/pdf/cs/0504020v2.pdf). Bioinformatics: The Machine Learning Approach,* by Pierre Baldi and Søren Brunak (2nd ed., MIT Press, 2001), is an introduction to the use of machine learning in biology, including HMMs. “Engineers look to Kalman filtering for guidance,” by Barry Cipra (SIAM News, 1993), is a brief introduction to Kalman filters, their history, and their applications.
Judea Pearl’s pioneering work on Bayesian networks appears in his book Probabilistic Reasoning in Intelligent Systems* (Morgan Kaufmann, 1988). “Bayesian networks without tears,”* by Eugene Charniak (AI Magazine, 1991), is a largely nonmathematical introduction to them. “Probabilistic interpretation for MYCIN’s certainty factors,”* by David Heckerman (Proceedings of the Second Conference on Uncertainty in Artificial Intelligence, 1986), explains when sets of rules with confidence estimates are and aren’t a reasonable approximation to Bayesian networks. “Module networks: Identifying regulatory modules and their condition-specific regulators from gene expression data,” by Eran Segal et al. (Nature Genetics, 2003), is an example of using Bayesian networks to model gene regulation. “Microsoft virus fighter: Spam may be more difficult to stop than HIV,” by Ben Paynter (Fast Company, 2012), tells how David Heckerman took inspiration from spam filters and used Bayesian networks to design a potential AIDS vaccine. The probabilistic or “noisy” OR is explained in Pearl’s book.* “Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base,” by M. A. Shwe et al. (Parts I and II, Methods of Information in Medicine, 1991), describes a noisy-OR Bayesian network for medical diagnosis. Google’s Bayesian network for ad placement is described in Section 26.5.4 of Kevin Murphy’s Machine Learning* (MIT Press, 2012). Microsoft’s player rating system is described in “TrueSkill™: A Bayesian skill rating system,”* by Ralf Herbrich, Tom Minka, and Thore Graepel (Advances in Neural Information Processing Systems 19, 2007).
Modeling and Reasoning with Bayesian Networks,* by Adnan Darwiche (Cambridge University Press, 2009), explains the main algorithms for inference in Bayesian networks. The January/February 2000 issue* of Computing in Science and Engineering, edited by Jack Dongarra and Francis Sullivan, has articles on the top ten algorithms of the twentieth century, including MCMC. “Stanley: The robot that won the DARPA Grand Challenge,” by Sebastian Thrun et al. (Journal of Field Robotics, 2006), explains how the eponymous self-driving car works. “Bayesian networks for data mining,”* by David Heckerman (Data Mining and Knowledge Discovery, 1997), summarizes the Bayesian approach to learning and explains how to learn Bayesian networks from data. “Gaussian processes: A replacement for supervised neural networks?,”* by David MacKay (NIPS tutorial notes, 1997; online at www.inference.eng.cam.ac.uk/mackay/gp.pdf), gives a flavor of how the Bayesians co-opted NIPS.