Consider pigmentation. Pigmentation is a complex trait, by which we mean it is influenced by lots of genes acting together. The end results in this case are eye, hair and skin colour. We all know from experience that humans vary enormously in these features of our appearance. In addition to the several genes contributing to pigmentation levels, there are also different variants of those genes, creating additional potential for variation.{293}
One of the major variants is a single base difference, which occurs as either a C or a T. The T version is associated with higher levels of dark pigment, the C version with lower levels.[45] But this variation doesn't lie in a protein-coding region. It has been shown to affect pigmentation because it lies in an enhancer region, 21,000 base pairs away from the target gene. This target gene codes for a protein that is important for pigment production. We know this because mutations in this gene lead to a form of albinism in which the affected individual can't make pigment.{294}[46]
It has been shown experimentally that the enhancer loops round to the target gene. Transcription factors that control the target bind to the enhancer with greater or lesser efficiency depending on whether it carries the C or the T base.{295} This is very similar to the situation outlined above for pancreatic agenesis, and uses pretty much the same mechanism as shown in Figure 15.2.
It’s quite likely that there are a lot of similar relationships between single base changes in junk DNA and the expression of protein-coding genes. This has implications for understanding human diversity and human health and disease. There are a large number of conditions where we know genetics plays a role in whether or not a person develops a disorder. In these conditions, a person’s genetic background influences their likelihood of suffering from an illness, but doesn’t explain it entirely. The environment also plays a role; as, sometimes, does plain bad luck.
We can identify disorders with a genetic contribution by looking at how often a disease occurs in a family. Twins are particularly useful in this analysis. Let’s look at Huntington’s disease, a devastating neurological disorder caused by a mutation in one gene. If one twin has the condition, their identical twin will also always have the disease (unless they die early from an unrelated cause, such as a traffic accident). Huntington’s disease is 100 per cent due to genetics.
But if we look instead at schizophrenia, we find that if one twin suffers from this condition, there is only a 50 per cent chance that their identical twin is also affected. This figure has been calculated by analysing lots of twin pairs and working out how often both twins develop the condition. It tells us that genetics contributes about half the risk of developing schizophrenia, and that the remaining risk factors lie outside the genome.
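The concordance calculation described in the paragraph above can be sketched in a few lines. This is a minimal illustration of pairwise concordance, assuming each twin pair is recorded simply as two affected/unaffected flags; the function name and the counts are hypothetical, chosen so that the result matches the 50 per cent figure.

```python
from typing import Iterable, Tuple


def pairwise_concordance(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Among twin pairs with at least one affected twin, the fraction in which both are affected."""
    concordant = 0  # both twins affected
    discordant = 0  # exactly one twin affected
    for twin_a, twin_b in pairs:
        if twin_a and twin_b:
            concordant += 1
        elif twin_a or twin_b:
            discordant += 1
        # pairs in which neither twin is affected carry no information here
    if concordant + discordant == 0:
        raise ValueError("no pair contains an affected twin")
    return concordant / (concordant + discordant)


# Hypothetical identical-twin sample (not real data): 40 pairs where both are
# affected, 40 where only one is, and 920 where neither is.
sample = [(True, True)] * 40 + [(True, False)] * 40 + [(False, False)] * 920
print(pairwise_concordance(sample))  # 0.5
```

Published twin studies often report probandwise concordance instead, which counts concordant pairs differently and can give a different number from the same data.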
Researchers can extend these studies to other family members, because we know how much genetic information relatives share. For example, non-identical siblings share, on average, 50 per cent of their genetic information, and parents and children share 50 per cent. First cousins share, on average, only 12.5 per cent of their genomes. It's possible to use this information to calculate the contribution of genetics to a wide range of conditions, from rheumatoid arthritis to diabetes and from multiple sclerosis to Alzheimer's disease. In these conditions, and many more, genetics and environment act together.
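The percentages in this paragraph follow from a simple rule, known as Wright's coefficient of relationship: each parent-to-child link passes on half of a person's genetic material, so for every route connecting two relatives through a shared ancestor you multiply by one half per link, then add the routes together. A minimal sketch, assuming nobody in the family tree is inbred; the function and dictionary names are illustrative.

```python
from fractions import Fraction


def relatedness(path_lengths: list[int]) -> Fraction:
    """Sum of (1/2)**n over every route linking two relatives through a common
    ancestor, where n counts the parent-child links on that route.
    Assumes the common ancestors are not themselves inbred."""
    return sum((Fraction(1, 2) ** n for n in path_lengths), Fraction(0))


# Each relationship is described by the lengths of its connecting routes.
RELATIONSHIPS = {
    "parent-child": [1],       # the parent is itself the shared ancestor
    "full siblings": [2, 2],   # one route via the mother, one via the father
    "first cousins": [4, 4],   # one route via each shared grandparent
}

for name, routes in RELATIONSHIPS.items():
    print(f"{name}: {float(relatedness(routes)):.1%}")
# parent-child: 50.0%
# full siblings: 50.0%
# first cousins: 12.5%
```

The sibling and cousin figures are averages: because chromosomes are shuffled when eggs and sperm are made, two real siblings may share somewhat more or less than 50 per cent.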
If we can find enough families, we can analyse their genomes to identify regions that are associated with disease. But we have to remember that the data we generate will look very different from the simple situation we see with a purely genetic condition such as Huntington's disease. In Huntington's disease, 100 per cent of the genetic contribution lies in one mutation in one protein-coding gene. But for a condition such as schizophrenia, the 50 per cent genetic contribution isn't due to just one gene, and the same is true of most other conditions in which both genetics and the environment play a role. There could be five genes each contributing 10 per cent of the risk for schizophrenia, or twenty genes each contributing 2.5 per cent. Or any other combination you can think of. This makes it harder to identify the relevant genetic factors, and to prove that sequence changes really do influence the condition being studied.
Notwithstanding these difficulties, more than 80 diseases and traits have been mapped using these methods, generating thousands of candidate regions and variations.[47]{296} Remarkably, nearly 90 per cent of the regions identified across these studies are in junk DNA. About half are in the regions between genes, and the other half are in the junk regions within genes.{297}
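In a case-control genome-wide association study, one common first step is to compare, variant by variant, how often each version of a base turns up in people with the condition and in people without it. The sketch below shows one such comparison for a single variant: an allelic chi-squared test on a 2×2 table of allele counts. The function name and counts are hypothetical, and real studies usually use more elaborate models that also adjust for factors such as ancestry.

```python
import math


def allelic_chi_squared(case_a: int, case_t: int, ctrl_a: int, ctrl_t: int) -> tuple[float, float, float]:
    """2x2 allelic association test at one variant (no continuity correction).

    Returns (chi-squared statistic with 1 degree of freedom, p-value,
    odds ratio for the A allele in cases versus controls).
    """
    n = case_a + case_t + ctrl_a + ctrl_t
    row_case, row_ctrl = case_a + case_t, ctrl_a + ctrl_t
    col_a, col_t = case_a + ctrl_a, case_t + ctrl_t
    chi2 = n * (case_a * ctrl_t - case_t * ctrl_a) ** 2 / (row_case * row_ctrl * col_a * col_t)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * ctrl_t) / (case_t * ctrl_a)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 500 cases and 500 controls give 1,000 alleles per group.
chi2, p_value, odds_ratio = allelic_chi_squared(case_a=600, case_t=400, ctrl_a=500, ctrl_t=500)
print(f"chi2 = {chi2:.1f}, p = {p_value:.1e}, odds ratio = {odds_ratio:.2f}")
```

With these made-up counts the A allele has an odds ratio of 1.5 and a p-value of roughly 7 × 10^-6. Because hundreds of thousands to millions of variants are tested at once, a threshold of 5 × 10^-8 is widely used for genome-wide significance, so even this fairly strong-looking signal would not count as a discovery on its own.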
We have to be very careful about assuming that, because we can detect a variation in DNA that is associated with a disease, the variation must have a role in causing it. Sometimes we may just be looking at guilt by association. The genetic change that really contributes to the condition may be a different variation close by, and our candidate may simply have been carried along for the ride.
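In genetic studies, "carried along for the ride" usually means linkage disequilibrium: variants that sit close together on a chromosome are rarely separated when chromosomes are shuffled between generations, so the base at one position predicts the base at its neighbour. A minimal sketch of the standard r² measure of this, computed from hypothetical counts of individual chromosome copies (phased haplotypes); the function name is illustrative.

```python
def ld_r_squared(n_11: int, n_10: int, n_01: int, n_00: int) -> float:
    """Linkage disequilibrium r^2 between two variants with two alleles each.

    n_xy is the number of chromosome copies carrying allele x at the candidate
    variant and allele y at the nearby variant (1 = allele of interest, 0 = other).
    """
    total = n_11 + n_10 + n_01 + n_00
    p1 = (n_11 + n_10) / total  # frequency of allele 1 at the candidate variant
    q1 = (n_11 + n_01) / total  # frequency of allele 1 at the nearby variant
    if p1 in (0.0, 1.0) or q1 in (0.0, 1.0):
        raise ValueError("both variants must be polymorphic in the sample")
    d = n_11 / total - p1 * q1  # D: excess of the 1-1 combination over chance
    return d * d / (p1 * (1 - p1) * q1 * (1 - q1))


# Hypothetical counts from 1,000 chromosomes: the two "1" alleles almost always travel together.
print(round(ld_r_squared(n_11=380, n_10=20, n_01=20, n_00=580), 2))  # 0.84
```

An r² near 1 means the two variants are almost interchangeable, so an association test at either one picks up nearly the same signal, and the statistics alone cannot say which of them actually matters.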
Cirrhosis of the liver provides an example of guilt by association. One way to assess exposure to cigarette smoke is to measure the levels of carbon monoxide in a person's breath. Ten years ago, if we had measured the levels of this gas in the breath of non-smokers with and without liver disease, we might have found higher concentrations, on average, in the people with the condition. One interpretation (although not the only one) would be that passive smoking increases the risk of liver cirrhosis. But in reality, the carbon monoxide levels are a case of guilt by association. They probably just reflect the fact that many of the patients spent a lot of time in pubs and bars, which is unsurprising, because excessive alcohol consumption is a major risk factor for developing this illness. Until the introduction of smoking bans in many cities, pubs and bars were traditionally pretty smoke-filled environments.
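The pub example is a case of confounding, and a standard way to expose it is to repeat the comparison separately within each level of the suspected hidden factor. A minimal sketch with entirely hypothetical counts: breath carbon monoxide appears to nearly triple the risk of cirrhosis when everyone is pooled, but the apparent effect vanishes once pub-goers and non-pub-goers are compared separately. The class, function and dictionary names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Number of people in a group and how many of them have cirrhosis."""
    people: int
    with_cirrhosis: int

    @property
    def risk(self) -> float:
        return self.with_cirrhosis / self.people


def risk_ratio(high: Group, low: Group) -> float:
    """Risk of cirrhosis in the high-CO group divided by risk in the low-CO group."""
    return high.risk / low.risk


def pool(groups: list[Group]) -> Group:
    """Merge several groups into one, ignoring the strata they came from."""
    return Group(sum(g.people for g in groups), sum(g.with_cirrhosis for g in groups))


# Hypothetical counts (not real data). In these made-up numbers, time in pubs
# drives both breath CO (second-hand smoke) and cirrhosis (alcohol);
# within each stratum, CO level makes no difference to risk.
STRATA = {
    "pub-goers": {"high CO": Group(600, 60), "low CO": Group(200, 20)},
    "non-pub-goers": {"high CO": Group(100, 1), "low CO": Group(700, 7)},
}

# Crude comparison: everyone pooled together.
crude = risk_ratio(pool([s["high CO"] for s in STRATA.values()]),
                   pool([s["low CO"] for s in STRATA.values()]))
print(f"crude risk ratio: {crude:.2f}")

# Stratified comparison: CO levels compared only within each stratum.
for name, groups in STRATA.items():
    print(f"{name}: {risk_ratio(groups['high CO'], groups['low CO']):.2f}")
```

With these counts the crude risk ratio prints as 2.90, while both within-group ratios print as 1.00.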
Even if we exclude guilt by association when analysing the contribution of a genetic variation to a human disorder, we still need to be really careful to test hypotheses about the functional consequences of our findings. Otherwise, we can be badly misled.
The pigmentation variant we met earlier in this chapter actually lies in one of the introns of another gene. Introns are the bits of junk DNA between the protein-coding parts of a gene, which we first met in Chapter 2. This gene is very big, and the variant base pair is in the 86th stretch of junk DNA between amino acid-coding regions. But the gene itself plays no role in the control of pigment levels. So we have a clear precedent for accepting that variations in the junk regions of one gene may be important for effects on other genes.
Obesity is one area in which there has been a great deal of interest in identifying genetic variants linked to physical variation. Nearly 80 different regions in the human genome have been associated with obesity or with other relevant parameters such as body mass index.{298}
In multiple studies, the variation showing the strongest association with obesity was a single base pair change in a candidate protein-coding gene on chromosome 16.[48]{299}{300} Individuals who inherited an A on both copies of this gene tended to be about 3kg (6.6lb) heavier than individuals who inherited a T on both copies. This change was in the junk region between the first two amino acid-coding stretches of the candidate gene. The fact that this association was detected in more than one study was important, as replication increases our confidence that we are looking at a meaningful effect.
[47] This approach to finding disease/trait-associated genes and variants is known as GWAS (genome-wide association studies).