How did this viewpoint come about, and how sensible is it?

Humanity acquired the ability to quantify information in 1948, when the mathematician-turned-engineer Claude Shannon found a way to define how much information is contained in a message (he preferred the term 'signal') sent from a transmitter to a receiver using some kind of code. By a signal, Shannon meant a series of binary digits ('bits', 0 and 1) of the kind that is ubiquitous in modern computers and communication devices, and in Murray's semaphore. By a code, he meant a specific procedure that transforms an original signal into another one. The simplest code is the trivial 'leave it alone'; more sophisticated codes can be used to detect or even correct transmission errors. In engineering applications, codes are a central issue, but for our purposes here we can ignore them and assume the message is sent 'in plain'.

Shannon's information measure puts a number to the extent to which our uncertainty about the bits that make up a signal is reduced by what we receive. In the simplest case, where the message is a string of 0s and 1s and every choice is equally likely, the amount of information in a message is entirely straightforward: it is the total number of binary digits. Each digit that we receive reduces our uncertainty about that particular digit (is it 0 or 1?) to certainty ('it's a 1', say) but tells us nothing about the others, so we have received one bit of information. Do this a thousand times and we have received a thousand bits of information. Easy.

The point of view here is that of a communications engineer, and the unstated assumption is that we are interested in the bit-by-bit content of the signal, not in its meaning. So the message 111111111111111 contains 15 bits of information, and so does the message 111001101101011.
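In this simplest setting the count really is just the length of the message. A minimal sketch in Python, assuming a plain string of independent, equally likely binary digits:

```python
def shannon_bits(message):
    """Shannon information of a string of independent, equally likely
    binary digits: exactly one bit per digit."""
    assert set(message) <= {"0", "1"}
    return len(message)

# The patterned and the random-looking message carry the same amount.
print(shannon_bits("111111111111111"))  # 15
print(shannon_bits("111001101101011"))  # 15
```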

But Shannon's concept of information is not the only possible one. More recently, Gregory Chaitin has pointed out that you can quantify the extent to which a signal contains patterns. The way to do this is to focus not on the size of the message, but on the size of a computer program, or algorithm, that can generate it. For instance, the first of the above messages can be created by the algorithm 'every digit is a 1'. But there is no simple way to describe the second message, other than to write it down bit by bit. So these two messages have the same Shannon information content, but from Chaitin's point of view the second contains far more 'algorithmic information' than the first.
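The contrast can be made concrete. For the patterned message, a description far shorter than the message itself will do; for the random-looking one, no comparably short recipe is known. A toy illustration (the notion of 'program' here is informal shorthand, not Chaitin's formal machine model):

```python
# A tiny "program" that regenerates the whole patterned message:
patterned = "1" * 15

# No short recipe is apparent for this one; we just write it out in full:
random_ish = "111001101101011"

# The description '"1" * 15' is 8 characters; the literal message is 15.
print(len('"1" * 15'), len(random_ish))  # 8 15
```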

Another way to say this is that Chaitin's concept focuses on the extent to which the message is 'compressible'. If a short program can generate a long message, then we can transmit the program instead of the message and save time and money. Such a program 'compresses' the message.

When your computer takes a big graphics file (a photograph, say) and turns it into a much smaller file in JPEG format, it has used a standard algorithm to compress the information in the original file. This is possible because photographs contain numerous patterns: lots of repetitions of blue pixels for the sky, for instance. The more incompressible a signal is, the more information in Chaitin's sense it contains. And the way to compress a signal is to describe the patterns that make it up. This implies that incompressible signals are random, have no pattern, yet contain the most information. In one way this is reasonable: when each successive bit is maximally unpredictable, you learn more from knowing what it is. If the signal reads 111111111111111 then there is no great surprise if the next bit turns out to be 1; but if the signal reads 111001101101011 (which we obtained by tossing a coin 15 times) then there is no obvious guess for the next bit.
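A general-purpose compressor makes the point tangible. The sketch below uses Python's zlib as a practical stand-in for algorithmic information (only a stand-in: the true quantity is uncomputable): a heavily patterned signal collapses to a few dozen bytes, while genuinely random bytes barely shrink at all.

```python
import os
import zlib

patterned = b"1" * 10_000          # one repeated symbol: maximal pattern
random_bytes = os.urandom(10_000)  # nothing for the compressor to exploit

print(len(zlib.compress(patterned)))     # a few dozen bytes
print(len(zlib.compress(random_bytes)))  # roughly 10,000 bytes, or a little more
```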

Both measures of information are useful in the design of electronic technology. Shannon information governs the time it takes to transmit a signal somewhere else; Chaitin information tells you whether there's a clever way to compress the signal first, and transmit something smaller. At least, it would do if you could calculate it, but one of the features of Chaitin's theory is that it is impossible to calculate the amount of algorithmic information in a message (and he can prove it). The wizards would approve of this twist.

'Information' is therefore a useful concept, but it is curious that 'To be or not to be' contains the same Shannon information as, and less Chaitin information than, 'xyQGRlfryu&d%skOwc'. The reason for this disparity is that information is not the same thing as meaning. That's fascinating.

What really matters to people is the meaning of a message, not its bit-count, but mathematicians have been unable to quantify meaning. So far.

And that brings us back to stories, which are messages that convey meaning. The moral is that we should not confuse a story with 'information'. The elves gave humanity stories, but they didn't give them any information. In fact, the stories people came up with included things like werewolves, which don't even exist on Roundworld. No information there -at least, apart from what it might tell you about the human imagination.

Most people, scientists in particular, are happiest with a concept when they can put a number to it. Anything else, they feel, is too vague to be useful. 'Information' is a number, so that comfortable feeling of precision slips in without anyone noticing that it might be spurious. Two sciences that have gone a long way down this slippery path are biology and physics.

The discovery of the linear molecular structure of DNA has given evolutionary biology a seductive metaphor for the complexity of organisms and how they evolve, namely: the genome of an organism represents the information that is required to construct it. The origin of this metaphor is Francis Crick and James Watson's epic discovery that an organism's DNA consists of 'code words' in the four molecular 'letters' A, C, T, G, which, you'll recall, are the initials of the four possible 'bases'. This description led to the inevitable metaphor that the genome contains information about the corresponding organism. Indeed, the genome is widely described as 'containing the information needed to produce' an organism.

The easy target here is the word 'the'. There are innumerable reasons why a developing organism's DNA does not determine the organism. These non-genomic influences on development are collectively known as 'epigenetics', and they range from subtle chemical tagging of DNA to the investment of parental care. The hard target is 'information'. Certainly, the genome includes information in some sense: currently an enormous international effort is being devoted to listing that information for the human genome, and also for other organisms such as rice, yeast, and the nematode worm Caenorhabditis elegans. But notice how easily we slip into cavalier attitudes, for here the word 'information' refers to the human mind as receiver, not to the developing organism. The Human Genome Project informs us, not organisms.

This flawed metaphor leads to the equally flawed conclusion that the genome explains the complexity of an organism in terms of the amount of information in its DNA code. Humans are complicated because they have a long genome that carries a lot of information; nematodes are less complicated because their genome is shorter. However, this seductive idea can't be true. For example, the Shannon information content of the human genome is smaller by several orders of magnitude than the quantity of information needed to describe the wiring of the neurons in the human brain. How can we be more complex than the information that describes us? And some amoebas have much longer genomes than ours, which takes us down several pegs as well as casting even more doubt on DNA as information.

Underlying the widespread belief that DNA complexity explains organism complexity (even though it clearly doesn't) are two assumptions, two scientific stories that we tell ourselves. The first story is DNA as Blueprint, in which the genome is represented not just as an important source of control and guidance over biological development, but as the information needed to determine an organism. The second is DNA as Message, the 'Book of Life' metaphor.