This flawed metaphor leads to the equally flawed conclusion that the genome explains the complexity of an organism in terms of the amount of information in its DNA code. Humans are complicated because they have a long genome that carries a lot of information; nematodes are less complicated because their genome is shorter. However, this seductive idea can't be true. For example, the Shannon information content of the human genome is smaller by several orders of magnitude than the quantity of information needed to describe the wiring of the neurons in the human brain. How can we be more complex than the information that describes us? And some amoebas have much longer genomes than ours, which takes us down several pegs as well as casting even more doubt on DNA as information.
Underlying the widespread belief that DNA complexity explains organism complexity (even though it clearly doesn't) are two assumptions, two scientific stories that we tell ourselves. The first story is DNA as Blueprint, in which the genome is represented not just as an important source of control and guidance over biological development, but as the information needed to determine an organism. The second is DNA as Message, the 'Book of Life' metaphor.
Both stories oversimplify a beautifully complex interactive system. DNA as Blueprint says that the genome is a molecular 'map' of an organism. DNA as Message says that an organism can pass that map to the next generation by 'sending' the appropriate information.
Both of these are wrong, although they're quite good science fiction, or at least interestingly bad science fiction with good special effects.
If there is a 'receiver' for the DNA 'message' it is not the next generation of the organism, which does not even exist at the time the 'message' is being 'sent', but the ribosome, which is the molecular machine that turns DNA sequences (in a protein-coding gene) into protein. The ribosome is an essential part of the coding system; it functions as an 'adapter', changing the sequence information along the DNA into an amino acid sequence in proteins. Every cell contains many ribosomes: we say 'the' because they are all identical. The metaphor of DNA as information has become almost universal, yet virtually nobody has suggested that the ribosome must be a vast repository of information. The structure of the ribosome is now known in high detail, and there is no sign of obvious 'information-bearing' structure like that in DNA. The ribosome seems to be a fixed machine. So where has the information gone? Nowhere. That's the wrong question.
The root of these misunderstandings lies in a lack of attention to context. Science is very strong on content, but it has a habit of ignoring 'external' constraints on the systems being studied.
Context is an important but neglected feature of information. It is so easy to focus on the combinatorial clarity of the message and to ignore the messy, complicated processes carried out by the receiver when it decodes the message. Context is crucial to the interpretation of messages: to their meaning. In his book The User Illusion Tor Nørretranders introduced the term exformation to capture the role of the context, and Douglas Hofstadter made the same general point in Gödel, Escher, Bach. Observe how, in the next chapter, the otherwise incomprehensible message 'THEOSTRY' becomes obvious when context is taken into account.
Instead of thinking about a DNA 'blueprint' encoding an organism, it's easier to think of a CD encoding music. Biological development is like a CD that contains instructions for building a new CD-player. You can't 'read' those instructions without already having one. If meaning does not depend upon context, then the code on the CD should have an invariant meaning, one that is independent of the player. Does it, though?
Compare two extremes: a 'standard' player that maps the digital code on the CD to music in the manner intended by the design engineers, and a jukebox. With a normal jukebox, the only message that you send is some money and a button-push; yet in the context of the jukebox these are interpreted as a specific several minutes' worth of music. In principle, any numerical code can 'mean' any piece of music you wish; it just depends on how the jukebox is set up, that is, on the exformation associated with the jukebox's design. Now consider a jukebox that reacts to a CD not by playing the tune that's encoded on it, as a series of bits, but by interpreting that code as a number, and then playing some other CD to which that number has been assigned. For instance, suppose that a recording of Beethoven's Fifth Symphony starts, in digital form, with 11001. That's the number 25 in binary. So the jukebox reads the CD as '25', and looks for CD number 25, which we'll assume is a recording of Charlie Parker playing jazz. On the other hand, elsewhere in the jukebox is CD number 973, which actually is Beethoven's Fifth Symphony. Then a CD of Beethoven's Fifth can be 'read' in two totally different ways: as a 'pointer' to Charlie Parker, or as Beethoven's Fifth Symphony itself (triggered by whichever CDs start with 973 in binary). Two contexts, two interpretations, two meanings, two results.
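The two readings of the same disc can be made concrete in a few lines of Python. This is only a toy sketch: the names `standard_player`, `pointer_jukebox`, `DECODER` and `SHELF` are invented for illustration, and the lookup table in the 'standard' player is a stand-in for real audio decoding, which is far more involved.

```python
# Toy model: one bit string, two receivers, two meanings.
# All names here are hypothetical; nothing models a real CD format.

BEETHOVEN_FIFTH = "Beethoven's Fifth Symphony"
CHARLIE_PARKER = "Charlie Parker playing jazz"

# Opening bits of the Beethoven disc, as in the text (rest of the disc omitted).
DISC_BITS = "11001"

# Stand-in for the engineers' decoding scheme: bits -> the music they encode.
DECODER = {DISC_BITS: BEETHOVEN_FIFTH}

# The jukebox's shelf: slot number -> recording (slots 25 and 973 as in the text).
SHELF = {25: CHARLIE_PARKER, 973: BEETHOVEN_FIFTH}


def standard_player(bits):
    """Interpret the bits as the music encoded on the disc itself."""
    return DECODER[bits]


def pointer_jukebox(bits):
    """Interpret the bits as a binary slot number and play that slot."""
    slot = int(bits, 2)  # "11001" read as base 2
    return SHELF[slot]


if __name__ == "__main__":
    print(standard_player(DISC_BITS))   # the disc read as music
    print(pointer_jukebox(DISC_BITS))   # the same disc read as a pointer
```

Nothing about `DISC_BITS` changes between the two calls. All of the difference lives in the receiver's tables, which is where the exformation sits in this toy model.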
Whether something is a message depends upon context, too: sender and receiver must agree upon a protocol for turning meanings into symbols and back again. Without this protocol a semaphore is just a few bits of wood that flap about. Tree branches are bits of wood that flap about, too, but no one ever tries to decode the message being transmitted by a tree. Tree rings (the growth rings that appear when you saw through the trunk, one ring per year) are a different matter. We have learned to 'decode' their 'message', about climate in the year 1066 and the like.
A thick ring indicates a good year with lots of growth on the tree, probably warm and wet; a thin ring indicates a poor year, probably cold and dry. But the sequence of tree rings only became a message, only conveyed information, when we figured out the rules that link climate to tree growth. The tree didn't send its message to us.
In biological development the protocol that gives meaning to the DNA message is the laws of physics and chemistry. That is where the exformation resides. However, it is unlikely that exformation can be quantified. An organism's complexity is not determined by the number of bases in its DNA sequence, but by the complexity of the actions initiated by those bases within the context of biological development. That is, by the meaning of the DNA 'message' when it is received by a finely tuned, up-and-running biochemical machine. This is where we gain an edge over those amoebas. Starting with an embryo that develops little flaps, and making a baby with those exquisite little hands, involves a series of processes that produce skeleton, muscles, skin, and so on. Each stage depends on the current state of the others, and all of them depend on contextual physical, biological, chemical and cultural processes.
A central concept in Shannon's information theory is something that he called entropy, which in this context is a measure of how statistical patterns in a source of messages affect the amount of information that the messages can convey. If certain patterns of bits are more likely than others, then their presence conveys less information, because the uncertainty is reduced by a smaller amount. In English, for example, the letter 'E' is much more common than the letter 'Q'. So receiving an 'E' tells you less than receiving a 'Q'. Given a choice between 'E' and 'Q', your best bet is that you're going to receive an 'E'*. And you learn the most when your expectations are proved wrong. Shannon's entropy smooths out these statistical biases and provides a 'fair' measure of information content.
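The 'E' versus 'Q' comparison can be put into numbers with a short Python sketch. It uses the standard definitions: a symbol of probability p carries -log2(p) bits, and the entropy of a source is the average of that quantity over all its symbols. The letter frequencies below are rough assumed figures for English text, not values taken from this book, and the function names are illustrative.

```python
import math

# Assumed, approximate letter frequencies for English text (illustrative only).
P_E = 0.127   # 'E': roughly one letter in eight
P_Q = 0.001   # 'Q': roughly one letter in a thousand


def surprisal(p):
    """Information, in bits, conveyed by receiving a symbol of probability p."""
    return -math.log2(p)


def entropy(probs):
    """Shannon entropy in bits: the average surprisal over a source's symbols."""
    return sum(p * surprisal(p) for p in probs if p > 0)


if __name__ == "__main__":
    print(f"'E' conveys {surprisal(P_E):.2f} bits")
    print(f"'Q' conveys {surprisal(P_Q):.2f} bits")
    # A biased source conveys less per symbol, on average, than an unbiased one.
    print(f"fair coin:   {entropy([0.5, 0.5]):.3f} bits per toss")
    print(f"biased coin: {entropy([0.9, 0.1]):.3f} bits per toss")
```

With these assumed frequencies an 'E' is worth about 3 bits and a 'Q' about 10: the rarer letter is the bigger surprise. The two coin examples show the same effect for a whole source, where the biased coin's entropy falls below the one bit per toss of the fair coin.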