One educated look at how the molecule is shaped and you can figure out what it’s for. Even at the molecular level, function follows form. Before us is a detailed blueprint of breathtaking precision for building complex molecular machines. The molecule is very long and composed of two intertwined strands. Running the length of each strand is a sequence made of four smaller molecular building blocks, the nucleotides—which humans conventionally represent by the letters A, C, G, and T. (Each nucleotide molecule actually looks like a ring, or two connected rings, made of atoms.) On and on the sequence goes, for billions of letters. A short segment of it might read something like this:

ATGAAGTCGATCCTAGATGGCCTTGCAGACACCACCTTCCGTACCATCACCACAGACCTCCT …

Along the opposite strand there’s an identical sequence, except that wherever nucleotide A was in the first strand, it’s T in the second; and instead of G it’s always C. And vice versa. Like this:

TACTTCAGCTAGGATCTACCGGAACGTCTGTGGTGGAAGGCATGGTAGTGGTGTCTGGAGGA …
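The substitution rule can be tried out in a few lines of Python. This is only a sketch: the name `opposite_strand` is made up for illustration, and the strands are compared letter by letter exactly as they are printed here, ignoring the fact that in a real molecule the two strands run in opposite chemical directions.

```python
# Pairing rule described in the text: A <-> T and G <-> C.
PAIRING = str.maketrans("ACGT", "TGCA")


def opposite_strand(strand: str) -> str:
    """Return the letter found opposite each position of `strand`."""
    return strand.translate(PAIRING)


# The short segment quoted in the text.
first = "ATGAAGTCGATCCTAGATGGCCTTGCAGACACCACCTTCCGTACCATCACCACAGACCTCCT"
second = opposite_strand(first)
print(second)

# Applying the rule a second time recovers the first strand,
# which is why either strand is enough to rebuild the other.
print(opposite_strand(second) == first)
```

The printed result should agree with the second sequence quoted above.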

This is a code, a long sequence of words written out in an alphabet of only four letters. As in ancient human writing, there are no spaces between the words. Inside this molecule there are, written in a special language of life, detailed instructions—or rather, two copies of the same detailed instructions, because the information in one strand can surely be reconstructed from the information in the other, once you understand the simple substitution cipher. The message is redundant, bespeaking care, conservatism; it conveys a sense that whatever it is saying must be preserved, treasured, passed intact to future generations.

Almost every issue of leading scientific journals such as Science or Nature contains the newly uncovered ACGT sequence of some part of the genetic instructions of some lifeform or other. We’re slowly beginning to read the genetic libraries. The library of our own hereditary information, the human genome, is also becoming increasingly revealed, but there’s a lot to read: Every cell of your body has a full set of instructions about how to manufacture you, encoded in a very compressed format—it takes only a picogram (a trillionth of a gram) of this molecule to specify everything you’ve inherited from your ancestors, back to the first beings of the primeval sea. Yet, there are almost as many nucleotide building blocks, or “letters,” in the microminiaturized genetic information in any of your cells as there are people on Earth.

All words in the genetic code are three letters long. So, if we insert the implicit spaces between the words, the beginning of the first message above looks like this:

ATG AAG TCG ATC CTA GAT GGC CTT GCA GAC ACC ACC TTC CGT ACC …
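Inserting the implicit spaces is a mechanical operation, and a rough Python sketch makes that plain. The function name `split_into_words` is invented for this illustration; any letters left over at the end that do not fill a whole word are simply dropped.

```python
def split_into_words(message: str, word_length: int = 3) -> list[str]:
    """Cut `message` into consecutive words of `word_length` letters.

    Trailing letters that do not make up a complete word are ignored.
    """
    complete = len(message) - len(message) % word_length
    return [message[i:i + word_length] for i in range(0, complete, word_length)]


# The opening 45 letters of the first message quoted in the text.
segment = "ATGAAGTCGATCCTAGATGGCCTTGCAGACACCACCTTCCGTACC"
print(" ".join(split_into_words(segment)))
```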

Since there are only four kinds of nucleotides (A, C, G, and T), there are at most only 4 × 4 × 4 = 64 possible words in this language. But if the order in which the words are put together is central to the meaning of the message, you can say a great deal with only a few dozen different words. With messages that are a billion carefully selected words long, what might be possible? You must take care in reading the message, though: With no spaces between the words, if you start reading at the wrong place, the meaning will surely change and a lucid message might be reduced to gibberish. This is one reason the giant molecule has special code words meaning “START READING HERE” and “STOP READING HERE.”
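Both points, the count of 64 and the danger of starting in the wrong place, can be checked with a short Python sketch. The helper `read_words` and the twelve-letter sample are chosen only for illustration; the sample is the opening of the message quoted earlier.

```python
from itertools import product

LETTERS = "ACGT"

# Every possible three-letter word over a four-letter alphabet.
all_words = ["".join(letters) for letters in product(LETTERS, repeat=3)]
print(len(all_words))  # 4 * 4 * 4 = 64


def read_words(message: str, start: int = 0) -> list[str]:
    """Read three-letter words beginning at position `start`.

    Letters left over at the end that do not fill a word are dropped.
    """
    body = message[start:]
    complete = len(body) - len(body) % 3
    return [body[i:i + 3] for i in range(0, complete, 3)]


message = "ATGAAGTCGATC"
print(read_words(message, start=0))  # the intended reading
print(read_words(message, start=1))  # one letter late: entirely different words
```

Shifting the starting point by a single letter changes every word that follows, which is the sense in which a lucid message can collapse into gibberish.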

As you watch the molecule closely you observe that the two strands occasionally unwind and unzip. Each copies the other, using available A, C, G, and T raw materials—like the metal type stored in an old-fashioned printer’s box. Now, instead of one pair, there are two pairs of identical messages. As well as utilizing a language and embodying a complex, redundantly encoded text, this molecule is a printing press.
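The printing-press behavior can be imitated in a toy Python model. It is a deliberately simplified sketch: `build_partner` and `replicate` are hypothetical names, a strand is just a string of letters, and none of the cell's actual copying machinery is represented.

```python
# Pairing rule from the text: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def build_partner(template: str) -> str:
    """Assemble a new strand against `template`, one letter at a time."""
    return "".join(PAIRING[letter] for letter in template)


def replicate(pair: tuple[str, str]) -> tuple[tuple[str, str], tuple[str, str]]:
    """Unzip a pair of strands and let each old strand template a new partner."""
    first, second = pair
    return (first, build_partner(first)), (build_partner(second), second)


original = ("ATGAAGTCG", "TACTTCAGC")
copy_one, copy_two = replicate(original)
print(copy_one == original, copy_two == original)
```

Each of the two resulting pairs keeps one old strand and gains one newly built strand, and both pairs carry the same message as the original.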

But what’s the use of a message if nobody reads it? Through copying links and relays, the sequences of As, Cs, Gs, and Ts are revealed to be the job orders and blueprints for the construction of particular molecular machine tools. Some sequences are orders to itself—arranging for the giant molecule to twist and kink so it can then issue a particular set of instructions. Other sequences ensure that the instructions will be followed to the letter. Many three-letter words specify a particular amino acid (or a punctuation mark, like the one that signifies “START”) out there in the surrounding cell, and the sequence of words encoded determines the sequence of amino acids that will make up the protein machine tools that control the life of the cell. Once such a protein is manufactured, it usually twists and folds itself into a three-dimensional shape spring-loaded for action. Sometimes another protein bends it into shape. These machine tools, at a pace determined both by the long double-stranded molecule and by the outside world, then proceed on their own to strip other molecules down, to build new ones up, to help communicate molecular or electrical messages to other cells.
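How a string of three-letter words becomes a chain of amino acids can be sketched in Python. Several assumptions are made here for illustration: the first strand quoted earlier is treated as the one being read, from left to right; the intermediate copying steps are skipped; and the table holds only the handful of standard genetic code entries needed for this segment, not all 64 words. The name `translate` is invented.

```python
# A small excerpt of the standard genetic code, written in DNA letters.
# Only the words that occur in the sample segment are listed.
CODON_TABLE = {
    "ATG": "Met", "AAG": "Lys", "TCG": "Ser", "ATC": "Ile",
    "CTA": "Leu", "CTT": "Leu", "GAT": "Asp", "GAC": "Asp",
    "GGC": "Gly", "GCA": "Ala", "ACC": "Thr", "TTC": "Phe",
    "CGT": "Arg",
}
START_WORD = "ATG"                   # also stands for the amino acid Met
STOP_WORDS = {"TAA", "TAG", "TGA"}   # the three "STOP READING HERE" words


def translate(message: str) -> list[str]:
    """Read from the first START word until a STOP word or the end."""
    start = message.find(START_WORD)
    if start == -1:
        return []  # no START word, nothing to read
    chain = []
    for i in range(start, len(message) - 2, 3):
        word = message[i:i + 3]
        if word in STOP_WORDS:
            break
        # Words missing from this partial table are marked, not guessed.
        chain.append(CODON_TABLE.get(word, "???"))
    return chain


segment = "ATGAAGTCGATCCTAGATGGCCTTGCAGACACCACCTTCCGTACC"
print(" ".join(translate(segment)))
```

The quoted segment contains no STOP word in this reading frame, so the sketch simply reads to the end of the excerpt.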

This is a description of some of the humdrum, everyday action in each of the ten trillion or so cells of your body, and those of nearly every other plant, animal, and microbe on Earth. The tiny machine tools perform stupefying feats of molecular transformation. They are submicroscopic and made of organic molecules, rather than macroscopic and made of silicates or steel, but at the molecular level life was tool-using and tool-making from the start.

The long self-replicating double-stranded molecule with the complex message is a sequence of genes, a little like beads on a string. Chemically, it is a nucleic acid (here, the kind abbreviated DNA, which stands for deoxyribonucleic acid). The two strands, wrapped around each other, comprise the famous DNA double helix. The nucleotide bases in DNA are called adenine, cytosine, guanine, and thymine, which is where the abbreviations A, C, G, and T come from. Their names date back to long before their key role in heredity was understood. Guanine, for example, is named unpretentiously after guano, the bird droppings from which it was first isolated. It is a double ring molecule made of five carbon atoms, five hydrogens, five nitrogens, and one oxygen. There’s something like a billion guanines (and roughly equal numbers of As, Cs, and Ts) in the genes of any one of your cells.
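The counts given here can be set against the picogram figure mentioned earlier with a rough piece of arithmetic. The average mass of about 330 grams per mole for one nucleotide within a strand is an outside approximation, not a number from the text, so the result is an order-of-magnitude check only.

```python
AVOGADRO = 6.02214076e23        # molecules per mole

GUANINES_PER_CELL = 1e9         # "something like a billion guanines" (from the text)
KINDS_OF_NUCLEOTIDE = 4         # roughly equal numbers of A, C, G, and T (from the text)
AVG_NUCLEOTIDE_MASS = 330.0     # grams per mole; an assumed round figure

total_nucleotides = GUANINES_PER_CELL * KINDS_OF_NUCLEOTIDE
grams = total_nucleotides * AVG_NUCLEOTIDE_MASS / AVOGADRO
picograms = grams * 1e12

print(f"{total_nucleotides:.0e} nucleotides weigh about {picograms:.1f} picograms")
```

Under these assumptions the answer comes out at a few picograms, the same order of magnitude as the figure quoted earlier.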

Except for some oddball microbes, the genetic information of every organism on Earth is contained in DNA—a molecular engineer of formidable, even awesome talents. One (very long) sequence of As, Cs, Gs, and Ts contains all the information for making a person; another such sequence, nearly identical, for a chimpanzee; others, not so different, for a wolf or a mouse. In turn, the sequences for nightingales, sidewinders, toads, carp, scallops, forsythia, club mosses, seaweed, and bacteria are still more different—although even they collectively hold many sequences of As, Cs, Gs, and Ts in common. A typical gene, controlling or contributing to one specific hereditary trait, might be a few thousand nucleotides long. Some genes may comprise more than a million As, Cs, Gs, and Ts. Their sequences specify the chemical instructions for, say, manufacturing the organic pigments that make eyes brown or green; or extracting energy out of food; or finding the opposite sex.

How this complex information got into our cells, and how arrangements were made for its precise replication and the obedient implementation of its instructions, is tantamount to asking how life evolved. Nucleic acids were unknown when The Origin of Species was first published, and the messages they contain were not to be deciphered for another century. They constitute the demonstration and definitive record of evolution that Darwin sought. Scattered in the ACGT sequences of the diverse lifeforms of our planet is an incomplete history of the evolution of life—not the blood, bones, brains, and the other manufactured products of the genetic factories, but the actual production records, the master instructions themselves, slowly varying at different rates in different beings in different epochs.