O Romeo, Romeo! wherefore fart thou Romeo?

This puerile example illustrates why a script needs to be reproduced faithfully. It can be the same with our DNA – one inappropriate change (a mutation) can have devastating effects. This is particularly true if the mutation is present in an egg or a sperm, as this can ultimately lead to the birth of an individual in whom all the cells carry the mutation. The clinical consequences of some mutations are severe. They range from children who age so prematurely that a ten-year-old has the body of a person of 70, to women who are pretty much predestined to develop aggressive and difficult-to-treat breast cancer before they are 40 years of age. Thankfully, these sorts of genetic mutations and conditions are relatively rare compared with the types of diseases that afflict most people.

The 50,000,000,000,000 or so cells in a human body are all the result of perfect replication of DNA, time after time after time, whenever cells divide after the formation of that single-cell zygote from Chapter 1. This is all the more impressive when we realise just how much DNA has to be reproduced each time one cell divides to form two daughter cells. Each cell contains six billion base-pairs of DNA (half originally came from your father and half from your mother). This sequence of six billion base-pairs is what we call the genome. So every single cell division in the human body was the result of copying 6,000,000,000 base-pairs of DNA. Using the same type of calculation as in Chapter 1, if we counted one base-pair every second without stopping, it would take a mere 190 years to count all the base-pairs in the genome of a cell. When we consider that a baby is born just nine months after the creation of the single-celled zygote, we can see that our cells must be able to replicate DNA really fast.
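The counting estimate can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch: the genome size is the figure quoted above, while the length of a year (365.25 days) is an assumption made for the calculation.

```python
# Back-of-envelope check of the counting estimate.
GENOME_BASE_PAIRS = 6_000_000_000  # figure quoted in the text

# Assumption: an average year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# Counting one base-pair per second, without stopping.
years_to_count = GENOME_BASE_PAIRS / SECONDS_PER_YEAR

print(f"About {years_to_count:.0f} years to count every base-pair")
```

With these assumptions the result comes out at roughly 190 years, in line with the figure above.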

The three billion base-pairs we inherit from each parent aren’t formed of one long string of DNA. They are arranged into smaller bundles, which are the chromosomes. We’ll delve deeper into these in Chapter 9.

Reading the script

Let’s go back to the more fundamental question of what these six billion base-pairs of DNA actually do, and how the script works. More specifically, how can a code that has only four letters (A, C, G and T) create the thousands and thousands of different proteins found in our cells? The answer is surprisingly elegant. It could be described as the modular paradigm of molecular biology, but it’s probably far more useful to think of it as Lego.

Lego used to have a great advertising slogan, ‘It’s a new toy every day’, and it was very accurate. A large box of Lego contains a limited number of designs, essentially a fairly small range of bricks of certain shapes, sizes and colours. Yet it’s possible to use these bricks to create models of everything from ducks to houses, and from planes to hippos. Proteins are rather like that. The ‘bricks’ in proteins are quite small molecules called amino acids, and there are twenty standard amino acids (different Lego bricks) in our cells. But these twenty amino acids can be joined together in an incredible array of combinations, in chains of widely varying order and length, to create an enormous number of proteins.
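To get a feel for how quickly the Lego bricks multiply, here is a minimal sketch that counts how many different chains of a given length could, in principle, be built from twenty amino acids. The function name `possible_chains` and the chain lengths are chosen purely for illustration, and the count says nothing about which chains would make useful proteins.

```python
# Number of standard amino acids, as given in the text.
AMINO_ACIDS = 20

def possible_chains(length: int) -> int:
    """Count the distinct chains of `length` amino acids (order matters)."""
    return AMINO_ACIDS ** length

# Illustrative lengths only; real proteins vary widely in length.
for length in (2, 3, 10, 100):
    count = possible_chains(length)
    print(f"length {length}: a number with {len(str(count))} digits")
```

Even a chain of just two amino acids has 400 possible forms, and the number grows twentyfold with every brick added.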

That still leaves the problem of how even as few as twenty amino acids can be encoded by just four bases in DNA. The way this works is that the cell machinery ‘reads’ DNA in blocks of three base-pairs at a time. Each block of three is known as a codon and may be AAA, or GCG or any other combination of A, C, G and T. From just four bases it’s possible to create sixty-four different codons, more than enough for the twenty amino acids. Some amino acids are coded for by more than one codon. For example, the amino acid called lysine is coded for by AAA and AAG. A few codons don’t code for amino acids at all. Instead they act as signals to tell the cellular machinery that it’s at the end of a protein-coding sequence. These are referred to as stop codons.
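The codon arithmetic is easy to check by listing every possible block of three. The sketch below does that, then looks up a deliberately partial table: the lysine codons are the ones named above, and the three stop codons (TAA, TAG and TGA in DNA letters) come from the standard genetic code rather than from this chapter. The table name `PARTIAL_CODE` is made up for the example.

```python
from itertools import product

BASES = "ACGT"

# Every ordered block of three bases: 4 x 4 x 4 possibilities.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(codons))

# Deliberately partial lookup, for illustration only.
# Lysine codons are from the text; stop codons are from the
# standard genetic code (an addition, not from this chapter).
PARTIAL_CODE = {
    "AAA": "lysine",
    "AAG": "lysine",
    "TAA": "stop",
    "TAG": "stop",
    "TGA": "stop",
}

lysine_codons = [c for c in codons if PARTIAL_CODE.get(c) == "lysine"]
stop_codons = [c for c in codons if PARTIAL_CODE.get(c) == "stop"]

print("lysine:", lysine_codons)
print("stop:", stop_codons)
```

Sixty-four codons for twenty amino acids and a stop signal leaves plenty of spare capacity, which is why several codons can share one meaning.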

How exactly does the DNA in our chromosomes act as a script for producing proteins? It does it through an intermediary, a molecule called messenger RNA (mRNA). mRNA is very like DNA, although it does differ in a few significant details. Its backbone is slightly different from DNA (hence RNA, which stands for ribonucleic acid rather than deoxyribonucleic acid); it is single-stranded (only one backbone); and it replaces the T base with a very similar but slightly different one called U (we don’t need to go into the reason it does this here). When a particular DNA stretch is ‘read’ so that a protein can be produced using that bit of script, a huge complex of proteins unzips the right piece of DNA and makes mRNA copies. The complex uses the base-pairing principle to make perfect mRNA copies. The mRNA molecules are then used as temporary templates at specialised structures in the cell that produce protein. These read the three-letter codon code and stitch together the right amino acids to form the longer protein chains. There is of course a lot more to it than all this, but that’s probably sufficient detail.
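The two steps, copying DNA into mRNA by base-pairing and then reading the mRNA three letters at a time, can be sketched as a toy program. This is a simplification under stated assumptions, not a model of the real machinery: the template strand is an invented sequence, strand direction is ignored, and the codon table is a tiny hand-picked subset of the standard genetic code. The names `transcribe`, `translate`, `PAIRING` and `PARTIAL_CODON_TABLE` are made up for the example.

```python
# Base-pairing used when copying a DNA template into mRNA:
# A in the DNA pairs with U in the mRNA (U stands in for T).
PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Tiny subset of the standard genetic code, in mRNA letters.
# A complete table would have 64 entries.
PARTIAL_CODON_TABLE = {
    "AUG": "methionine",
    "AAA": "lysine",
    "AAG": "lysine",
    "GCG": "alanine",
    "UAA": "stop",
}

def transcribe(template_strand: str) -> str:
    """Build the mRNA copy by pairing each template base in turn."""
    return "".join(PAIRING[base] for base in template_strand)

def translate(mrna: str) -> list[str]:
    """Read the mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = PARTIAL_CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "stop":
            break
        protein.append(amino_acid)
    return protein

# Hypothetical template strand, invented for this illustration.
template = "TACTTTCGCTTCATT"
mrna = transcribe(template)
print(mrna)
print(translate(mrna))
```

The toy version keeps the two stages in separate functions because the cell does too: many mRNA copies can be made from one stretch of DNA, and each copy can be read many times.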

An analogy from everyday life may be useful here. The process of moving from DNA to mRNA to protein is a bit like controlling an image from a digital photograph. Let’s say we take a photograph on a digital camera of the most amazing thing in the world. We want other people to have access to the image, but we don’t want them to be able to change the original in any way. The raw data file from the camera is like the DNA blueprint. We copy it into another format, that can’t be changed very much – a PDF maybe – and then we email out thousands of copies of this PDF, to everyone who asks for it. The PDF is the messenger RNA. If people want to, they can print paper copies from this PDF, as many as they want, and these paper copies are the proteins. So everyone in the world can print the image, but there is only one original file.

Why so complicated? Why not just have a direct mechanism? There are a number of good reasons why evolution has favoured this indirect method. One of them is to prevent damage to the script, the original image file. When DNA is unzipped it is relatively susceptible to damage, and that’s something that cells have evolved to avoid. The indirect way in which DNA codes for proteins minimises the period of time for which a particular stretch of DNA is open and vulnerable. Another reason is that the indirect method allows a lot of control over the amount of a specific protein that’s produced, and this creates flexibility.

Consider the protein called alcohol dehydrogenase (ADH). This is produced in the liver and breaks down alcohol. If we drink a lot of alcohol, the cells of our livers will increase the amounts of ADH they produce. If we don’t drink for a while, the liver will produce less of this protein. This is one of the reasons why people who drink frequently are better able to tolerate the immediate effects of alcohol than those who rarely drink, who will become tipsy very quickly on just a couple of glasses of wine. The more often we drink alcohol, the more ADH protein our livers produce (up to a limit). The cells of the liver don’t do this by increasing the number of copies of the ADH gene. They do this by reading the ADH gene more efficiently, i.e. producing more mRNA copies and/or by using these mRNA copies more efficiently as protein templates.

As we shall see, epigenetics is one of the mechanisms a cell uses to control the amount of a particular protein that is produced, especially by controlling how many mRNA copies are made from the original template.

The last few paragraphs have all been about how genes encode proteins. How many genes are there in our cells? This seems like a simple question but, oddly enough, there is no agreed figure. This is because scientists can’t agree on how to define a gene. It used to be quite straightforward – a gene was a stretch of DNA that encoded a protein. We now know that this is far too simplistic. However, it’s certainly true to say that all proteins are encoded by genes, even if not all genes encode proteins. There are about 20,000 to 24,000 protein-encoding genes in our DNA, a much lower estimate than the 100,000 that scientists thought was a good guess just ten years ago[17].

[17] See http://genome.wellcome.ac.uk/doc_WTD020745.html for a wealth of useful genome-related facts and figures.