If computer technology were being pursued by only a handful of researchers, it would indeed be unpredictable. But it’s the product of a sufficiently dynamic system of competitive projects that a basic measure of its price/performance, such as calculations per second per constant dollar, follows a very smooth exponential path, dating back to the 1890 American census as I noted in the previous chapter. While the theoretical basis for the LOAR is presented extensively in The Singularity Is Near, the strongest case for it is made by the extensive empirical evidence that I and others present.

Allen writes that “these ‘laws’ work until they don’t.” Here he is confusing paradigms with the ongoing trajectory of a basic area of information technology. If we were examining, for example, the trend of creating ever smaller vacuum tubes—the paradigm for improving computation in the 1950s—it’s true that it continued until it didn’t. But as the end of this particular paradigm became clear, research pressure grew for the next paradigm. The technology of transistors kept the underlying trend of the exponential growth of price/performance of computation going, and that led to the fifth paradigm (Moore’s law) and the continual compression of features on integrated circuits. There have been regular predictions that Moore’s law will come to an end. The semiconductor industry’s “International Technology Roadmap for Semiconductors” projects seven-nanometer features by the early 2020s.2 At that point key features will be the width of thirty-five carbon atoms, and it will be difficult to continue shrinking them any farther. However, Intel and other chip makers are already taking the first steps toward the sixth paradigm, computing in three dimensions, to continue exponential improvement in price/performance. Intel projects that three-dimensional chips will be mainstream by the teen years; three-dimensional transistors and 3-D memory chips have already been introduced. This sixth paradigm will keep the LOAR going with regard to computer price/performance to a time later in this century when a thousand dollars’ worth of computation will be trillions of times more powerful than the human brain.3 (It appears that Allen and I are at least in agreement on what level of computation is required to functionally simulate the human brain.)4
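To make the scale concrete, the "thirty-five carbon atoms" figure follows from simple arithmetic, assuming a carbon atom diameter of roughly 0.2 nanometers (the atom size is a working approximation supplied here; the text states only the final count):

```python
# Back-of-the-envelope check of the feature-width claim.
# The carbon atom diameter (~0.2 nm) is an assumed approximation, not from the text.
feature_width_nm = 7.0      # projected feature size, early 2020s
carbon_diameter_nm = 0.2    # approximate diameter of a carbon atom (assumption)

atoms_across = feature_width_nm / carbon_diameter_nm
print(f"~{atoms_across:.0f} carbon atoms across")   # -> ~35
```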

Allen then goes on to give the standard argument that software is not progressing in the same exponential manner as hardware. In The Singularity Is Near I addressed this issue at length, citing different methods of measuring complexity and capability in software that do demonstrate a similar exponential growth.5 One recent study (“Report to the President and Congress, Designing a Digital Future: Federally Funded Research and Development in Networking and Information Technology,” by the President’s Council of Advisors on Science and Technology) states the following:

Even more remarkable—and even less widely understood—is that in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed. The algorithms that we use today for speech recognition, for natural language translation, for chess playing, for logistics planning, have evolved remarkably in the past decade…. Here is just one example, provided by Professor Martin Grötschel of Konrad-Zuse-Zentrum für Informationstechnik Berlin. Grötschel, an expert in optimization, observes that a benchmark production planning model solved using linear programming would have taken 82 years to solve in 1988, using the computers and the linear programming algorithms of the day. Fifteen years later—in 2003—this same model could be solved in roughly 1 minute, an improvement by a factor of roughly 43 million. Of this, a factor of roughly 1,000 was due to increased processor speed, whereas a factor of roughly 43,000 was due to improvements in algorithms! Grötschel also cites an algorithmic improvement of roughly 30,000 for mixed integer programming between 1991 and 2008. The design and analysis of algorithms, and the study of the inherent computational complexity of problems, are fundamental subfields of computer science.
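The factors quoted in the report are easy to check with a quick back-of-the-envelope calculation; the arithmetic below only shows how the numbers fit together, while the benchmark data itself is the report's:

```python
# Sanity check of the speedup factors cited in the PCAST report.
minutes_per_year = 365.25 * 24 * 60           # ~525,960 minutes

overall_factor = 82 * minutes_per_year / 1.0  # 82 years reduced to roughly 1 minute
print(f"overall improvement: roughly {overall_factor / 1e6:.0f} million-fold")  # ~43 million

hardware_factor = 1_000       # from faster processors
algorithm_factor = 43_000     # from better algorithms
print(f"hardware x algorithms: {hardware_factor * algorithm_factor:,}")          # 43,000,000
```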

Note that the linear programming that Grötschel cites above as having benefited from a 43-million-to-1 improvement in performance is the mathematical technique used to optimally assign resources in a hierarchical memory system such as the HHMM approach I discussed earlier. I cite many other examples like this in The Singularity Is Near.6
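For readers unfamiliar with the technique, the minimal sketch below applies linear programming to a toy resource-allocation problem. The levels, costs, and utilities are hypothetical numbers chosen for illustration, and this is not the actual optimization inside Watson or any particular HHMM system; it simply shows what the algorithm does.

```python
# A minimal, illustrative linear program: allocate a fixed memory budget
# across three levels of a hierarchy to maximize total utility.
# All numbers are hypothetical; scipy's linprog performs the optimization.
import numpy as np
from scipy.optimize import linprog

utility_per_unit = np.array([3.0, 5.0, 2.0])   # value of one allocated unit at each level
cost_per_unit = np.array([1.0, 2.0, 1.5])      # memory cost of one unit at each level
budget = 100.0                                 # total memory available

result = linprog(
    c=-utility_per_unit,                       # linprog minimizes, so negate to maximize
    A_ub=cost_per_unit.reshape(1, -1),         # single constraint: total cost <= budget
    b_ub=[budget],
    bounds=[(0, 60)] * 3,                      # cap each level's allocation
    method="highs",
)
print("allocation per level:", result.x)
print("total utility:", -result.fun)
```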

Regarding AI, Allen is quick to dismiss IBM’s Watson, an opinion shared by many other critics. Many of these detractors don’t know anything about Watson other than the fact that it is software running on a computer (albeit a parallel one with 720 processor cores). Allen writes that systems such as Watson “remain brittle, their performance boundaries are rigidly set by their internal assumptions and defining algorithms, they cannot generalize, and they frequently give nonsensical answers outside of their specific areas.”

First of all, we could make a similar observation about humans. I would also point out that Watson’s “specific areas” include all of Wikipedia plus many other knowledge bases, which hardly constitute a narrow focus. Watson deals with a vast range of human knowledge and is capable of dealing with subtle forms of language, including puns, similes, and metaphors in virtually all fields of human endeavor. It’s not perfect, but neither are humans, and it was good enough to be victorious on Jeopardy! over the best human players.

Allen argues that Watson was assembled by the scientists themselves, building each link of narrow knowledge in specific areas. This is simply not true. Although a few areas of Watson’s data were programmed directly, Watson acquired the significant majority of its knowledge on its own by reading natural-language documents such as Wikipedia. That represents its key strength, as does its ability to understand the convoluted language in Jeopardy! queries (answers in search of a question).

As I mentioned earlier, much of the criticism of Watson is that it works through statistical probabilities rather than “true” understanding. Many readers interpret this to mean that Watson is merely gathering statistics on word sequences. The term “statistical information” in the case of Watson actually refers to distributed coefficients and symbolic connections in self-organizing methods such as hierarchical hidden Markov models. One could just as easily dismiss the distributed neurotransmitter concentrations and redundant connection patterns in the human cortex as “statistical information.” Indeed we resolve ambiguities in much the same way that Watson does—by considering the likelihood of different interpretations of a phrase.
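To illustrate what "considering the likelihood of different interpretations" means in practice, here is a minimal sketch that scores a word sequence under two small hidden Markov models, one per candidate interpretation, and prefers the more probable one. The states, symbols, and probabilities are invented for illustration; Watson's actual hierarchical hidden Markov models are vastly larger and learned from data.

```python
# Minimal illustration: choose between two interpretations of a phrase by
# comparing the likelihood each model assigns to the observed word sequence.
# All probabilities are invented; real HHMMs are learned and far larger.
import numpy as np

def sequence_likelihood(observations, start_p, trans_p, emit_p):
    """Forward algorithm: total probability of the observation sequence."""
    alpha = start_p * emit_p[:, observations[0]]
    for obs in observations[1:]:
        alpha = (alpha @ trans_p) * emit_p[:, obs]
    return alpha.sum()

phrase = [0, 2, 1]   # a word sequence encoded as symbol indices

# Interpretation A: a two-state model with its own transition/emission statistics.
lik_a = sequence_likelihood(
    phrase,
    start_p=np.array([0.6, 0.4]),
    trans_p=np.array([[0.7, 0.3], [0.2, 0.8]]),
    emit_p=np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]),
)

# Interpretation B: different statistics over the same observable symbols.
lik_b = sequence_likelihood(
    phrase,
    start_p=np.array([0.5, 0.5]),
    trans_p=np.array([[0.6, 0.4], [0.5, 0.5]]),
    emit_p=np.array([[0.2, 0.2, 0.6], [0.4, 0.4, 0.2]]),
)

print("preferred interpretation:", "A" if lik_a > lik_b else "B")
```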

Allen continues, “Every structure [in the brain] has been precisely shaped by millions of years of evolution to do a particular thing, whatever it might be. It is not like a computer, with billions of identical transistors in regular memory arrays that are controlled by a CPU with a few different elements. In the brain every individual structure and neural circuit has been individually refined by evolution and environmental factors.”

This contention that every structure and neural circuit in the brain is unique and there by design is simply impossible, for it would mean that the blueprint of the brain would require hundreds of trillions of bytes of information. The brain’s structural plan (like that of the rest of the body) is contained in the genome, and the brain itself cannot contain more design information than the genome. Note that epigenetic information (such as the peptides controlling gene expression) does not appreciably add to the amount of information in the genome. Experience and learning do add significantly to the amount of information contained in the brain, but the same can be said of AI systems like Watson. I show in The Singularity Is Near that, after lossless compression (due to massive redundancy in the genome), the amount of design information in the genome is about 50 million bytes, roughly half of which (that is, about 25 million bytes) pertains to the brain.7 That’s not simple, but it is a level of complexity we can deal with and represents less complexity than many software systems in the modern world. Moreover much of the brain’s 25 million bytes of genetic design information pertain to the biological requirements of neurons, not to their information-processing algorithms.
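A rough sketch of the arithmetic behind that figure combines the compressed size cited in the text with a common approximation for the raw genome (about three billion base pairs at two bits each, which is an added assumption here):

```python
# Back-of-the-envelope version of the genome argument.
# The raw-genome figures are supplied approximations; the 50-million-byte compressed
# estimate and the "roughly half for the brain" split are the figures given in the text.
base_pairs = 3_000_000_000           # approximate human genome length (assumption)
bits_per_base = 2                    # A, C, G, T encode in 2 bits each
raw_bytes = base_pairs * bits_per_base // 8

compressed_bytes = 50_000_000        # design information after lossless compression
brain_bytes = compressed_bytes // 2  # roughly half pertains to the brain

print(f"raw genome: ~{raw_bytes / 1e6:.0f} MB")                            # ~750 MB
print(f"compressed design information: ~{compressed_bytes / 1e6:.0f} MB")  # ~50 MB
print(f"portion pertaining to the brain: ~{brain_bytes / 1e6:.0f} MB")     # ~25 MB
```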