Imagine you meet someone and are extraordinarily smitten by them. You have heard that they love poetry, and you want to sweep them off their feet, but you always skipped literature class in school. A friend gives you a piece of paper with a killer first line from a poem on it. But for some reason your vaguely sociopathic friend has split up the words of the first line among a load of gibberish, and you only have a couple of seconds to find the poetry, say it out loud and win someone’s heart (or at least their attention). Can you do it? Take a very quick look at Figure 17.1 and find out.
Figure 17.1 A glorious line of poetry is in here somewhere. Have a quick look: can you find the words that will win someone’s heart?
This is what our cells do all the time, every moment of every day of our lives. Machinery in the cell analyses a long stretch of apparent gibberish and almost instantaneously finds the hidden words and joins them together. You can take a look at Figure 17.2 to see if you managed to compete with the non-sentient proteins that keep you alive.
Figure 17.2 The words shown in bold and underlined should do the trick for you: one of the most romantic and seductive first lines of poetry in the English language, ‘Had we but world enough and time’, from Andrew Marvell’s ‘To His Coy Mistress’.
In any long stretch of random letters there will also be combinations that spell words just by chance. Use these words by mistake when wooing (does anyone still woo?) the object of your desires and you may ruin your one chance for happiness. Figure 17.3 will show you how.
Figure 17.3 No! Bad combination! With a selection of right and wrong words, the sentiment may be very different, e.g. ‘Had we but had enough to drink’.
By using this slightly bizarre example, we can understand some of the mechanistic challenges that our cells face when splicing RNA molecules properly. If we were designing this as a process, it would have the components shown in Figure 17.4.{339} In addition to the components described in this diagram, it’s important to realise that different cells will handle the same gene differently, depending on the cell type and what is happening to it at any given moment. Consequently, all the stages have to be appropriately regulated and integrated so that the correct protein variants are made to meet the needs of the situation.
Figure 17.4 The sequence, reading from the top, lays out the steps that the splicing machinery has to be able to carry out to join up the appropriate amino acid-coding regions to create the correct mature messenger RNA.
This splicing of long RNAs to create smaller messenger RNAs that carry the information for specific proteins is a really complex process. It’s a very ancient system, and the components and steps have been maintained from yeast right through the entire animal kingdom. It is carried out by a huge conglomeration of molecules called the spliceosome, which forms the splicing machinery. The spliceosome is composed of hundreds of proteins and also some junk RNAs. In this respect it is a little like the ribosomes, the factories that produce proteins, which are also built from a mixture of proteins and junk RNA.{340}
One of the critical stages is that the spliceosome wraps around the intervening sequences that need to be removed from an RNA molecule. It snips them out and then joins up the amino acid-coding regions. Before it can do any of this, the spliceosome has to recognise the intervening regions so that it can bind to them. This recognition is one of the first key steps in an enormously complicated multi-stage process.
The beginnings and ends of these intervening sequences are almost always indicated by particular two-base sequences. Junk RNA molecules in the spliceosome can bind to these two-base sequences in much the same way as the two strands of DNA can pair up in our genes.
But there are only four bases in RNA, which means there are only sixteen possible two-base sequences: four choices for the first base multiplied by four for the second. (Order matters, so AC and CA count as different sequences.) With so few possibilities, we would expect the two-base sequences that mark the beginnings and ends of the intervening sequences to crop up elsewhere in these sequences as well, and also in the amino acid-coding regions. This is indeed the case. So although these two-base sequences are necessary for splicing, they aren’t sufficient on their own to direct the process properly. Other sequences are also required, as indicated in Figure 17.5.
Figure 17.5 Multiple sequences within an RNA molecule interact to drive splicing. The two-base motifs shown are necessary but not in themselves sufficient to regulate all the fine-tuning of this process. Other sites are involved, of varying strengths, as indicated by the different sizes of arrows.
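The arithmetic behind the ‘only sixteen’ argument can be checked with a few lines of Python. This is only a rough sketch, not a model of real RNA: it assumes that the four bases are equally common and that each position is independent of its neighbours, which is not true of genuine sequences. The function names are invented for this illustration, and the pair ‘GU’ is used simply as an example of one two-base sequence.

```python
import itertools
import random

BASES = "ACGU"  # the four RNA bases


def all_two_base_sequences():
    """Return every ordered pair of bases (AC and CA count separately)."""
    return ["".join(pair) for pair in itertools.product(BASES, repeat=2)]


def expected_chance_hits(length):
    """Expected number of times one given pair appears by chance.

    Assumption for illustration: equal base frequencies and independent
    positions, so each of the (length - 1) windows matches with
    probability 1/16.
    """
    windows = max(length - 1, 0)
    return windows / len(BASES) ** 2


def count_pair(sequence, pair):
    """Count how often `pair` occurs in `sequence`, overlaps included."""
    return sum(
        1 for i in range(len(sequence) - 1) if sequence[i:i + 2] == pair
    )


def random_rna(length, seed=0):
    """Build a random RNA string; the seed just makes runs repeatable."""
    rng = random.Random(seed)
    return "".join(rng.choice(BASES) for _ in range(length))


if __name__ == "__main__":
    print(len(all_two_base_sequences()))       # 16 possible pairs
    sequence = random_rna(1601)
    print(expected_chance_hits(len(sequence)))  # 100.0 expected by chance
    print(count_pair(sequence, "GU"))           # observed in this random run
```

Under these simplifying assumptions, a stretch of about 1,600 bases would be expected to contain any given two-base sequence roughly a hundred times by chance alone, which is why the two-base signals cannot direct splicing on their own.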
The other sequences involved in selecting how splicing will take place are found in both the junk intervening regions and the amino acid-coding regions. Some of them influence splicing very strongly, others are more subtle. Some increase the chances of a splice event, others decrease them. They work in complex partnerships and the impact that they have on the final splicing pattern is affected by other things happening in the cell, such as the precise complement of proteins in the spliceosome. The descriptions that are used for these modifying sequences usually include such words as ‘dizzying’ or ‘bewildering’. These are geek speak for ‘unbelievably complicated, way beyond anything we can get our heads around or even design predictive computer algorithms for at the moment.’
We can get clues to the degree of sophistication by looking at a group of genetic diseases. These include a form of blindness called retinitis pigmentosa, which affects about one in 4,000 people. The blindness is progressive, often starting in the teenage years with a decline in night vision, and then becoming steadily worse and more disabling with age. The loss of vision occurs because the cells in the eye that detect light gradually die off.{341} About one in twenty cases is caused by a mutation in the gene for one of five proteins involved in a specific step in splicing.{342}{343}{344}{345} These mutations only cause a deficit in the cells of the retina, and not in all the other cells in the body which also rely on splicing. This shows us that splicing is under complex cell- and gene-specific control, in ways that we haven’t yet been able to understand.
By contrast, there is a very severe form of dwarfism with other unusual features such as dry skin, sparse hair, seizures and learning disabilities. Affected children almost always die before they are four years old.{346} It’s very rare except in the Ohio Amish community, where 8 per cent of the people are carriers. That’s because the mutation that causes this condition was present in the small number of families that founded this community. It isn’t found in other Amish groups such as those in Pennsylvania, which were founded by other families. When the mutation that causes this condition was identified, the researchers first thought that it was changing the amino acid sequence of a splicing protein. But we now know that the change actually disrupts the three-dimensional structure of a junk RNA that forms part of the spliceosome.{347} Unlike the retinitis pigmentosa situation, this defect in the action of the spliceosome causes a very wide-ranging set of symptoms, possibly by causing mis-splicing of lots of different genes.