1
Savisaar R, Hurst LD. Exonic splice regulation imposes strong selection at synonymous sites. Genome Res 2018; 28:1442-1454. [PMID: 30143596 PMCID: PMC6169883 DOI: 10.1101/gr.233999.117] [Received: 01/11/2018] [Accepted: 07/31/2018]
Abstract
What proportion of coding sequence nucleotides have roles in splicing, and how strong is the selection that maintains them? Despite a large body of research into exonic splice regulatory signals, these questions have not been answered. This is because, to our knowledge, previous investigations have not explicitly disentangled the frequency of splice regulatory elements from the strength of the evolutionary constraint under which they evolve. Current data are consistent both with a scenario of weak and diffuse constraint, enveloping large swaths of sequence, and with one of well-defined pockets of strong purifying selection. In the former case, natural selection on exonic splice enhancers (ESEs) might primarily act as a slight modifier of codon usage bias. In the latter, mutations that disrupt ESEs are likely to have large fitness and, potentially, clinical effects. To distinguish between these scenarios, we used several different methods to determine the distribution of selection coefficients for new mutations within ESEs. The analyses converged to suggest that ∼15%-20% of fourfold degenerate sites are part of functional ESEs. Most of these sites are under strong evolutionary constraint. Therefore, exonic splice regulation does not simply impose a weak bias that gently nudges coding sequence evolution in a particular direction. Rather, the selection to preserve these motifs is a strong force that severely constrains the evolution of a substantial proportion of coding nucleotides. Thus synonymous mutations that disrupt ESEs should be considered a potentially common cause of single-locus genetic disorders.
Affiliation(s)
- Rosina Savisaar
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom
2
Savisaar R, Hurst LD. Both Maintenance and Avoidance of RNA-Binding Protein Interactions Constrain Coding Sequence Evolution. Mol Biol Evol 2017; 34:1110-1126. [PMID: 28138077 PMCID: PMC5400389 DOI: 10.1093/molbev/msx061]
Abstract
While the principal force directing coding sequence (CDS) evolution is selection on protein function, to ensure correct gene expression CDSs must also maintain interactions with RNA-binding proteins (RBPs). Understanding how our genes are shaped by these RNA-level pressures is necessary for diagnostics and for improving transgenes. However, the evolutionary impact of the need to maintain RBP interactions remains unresolved. Are coding sequences constrained by the need to specify RBP binding motifs? If so, what proportion of mutations are affected? Might sequence evolution also be constrained by the need not to specify motifs that might attract unwanted binding, for instance because it would interfere with exon definition? Here, we have scanned human CDSs for motifs that have been experimentally determined to be recognized by RBPs. We observe two sets of motifs: those that are enriched relative to a nucleotide-controlled null and those that are depleted. Importantly, the depleted set is enriched for motifs recognized by non-CDS-binding RBPs. Supporting the functional relevance of our observations, we find that motifs that are more enriched are also slower-evolving. The net effect of this selection to preserve motifs is a 2-3% reduction in the overall rate of synonymous evolution in both primates and rodents. Stronger motif depletion, on the other hand, is associated with stronger selection against motif gain in evolution. The challenge faced by our CDSs is therefore not only one of attracting the right RBPs but also of avoiding the wrong ones, all while evolving under selection pressures related to protein structure.
Affiliation(s)
- Rosina Savisaar
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
3
Savisaar R, Hurst LD. Estimating the prevalence of functional exonic splice regulatory information. Hum Genet 2017; 136:1059-1078. [PMID: 28405812 PMCID: PMC5602102 DOI: 10.1007/s00439-017-1798-3] [Received: 01/26/2017] [Accepted: 04/04/2017]
Abstract
In addition to coding information, human exons contain sequences necessary for correct splicing. These elements are known to be under purifying selection and their disruption can cause disease. However, the density of functional exonic splicing information remains profoundly uncertain. Several groups have experimentally investigated how mutations at different exonic positions affect splicing. They have found splice information to be distributed widely in exons, with one estimate putting the proportion of splicing-relevant nucleotides at >90%. These results suggest that splicing could place a major pressure on exon evolution. However, analyses of sequence conservation have concluded that the need to preserve splice regulatory signals only slightly constrains exon evolution, with a resulting decrease in the average human rate of synonymous evolution of only 1–4%. Why do these two lines of research come to such different conclusions? Among other reasons, we suggest that the methods are measuring different things: one assays the density of sites that affect splicing, the other the density of sites whose effects on splicing are visible to selection. In addition, the experimental methods typically consider short exons, thereby enriching for nucleotides close to the splice junction, such sites being enriched for splice-control elements. By contrast, in part owing to correction for nucleotide composition biases and to the assumption that constraint only operates on exon ends, the conservation-based methods can be overly conservative.
Affiliation(s)
- Rosina Savisaar
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK.
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
4
Abstract
Exonic splice enhancers (ESEs) are short nucleotide motifs, enriched near exon ends, that enhance the recognition of the splice site and thus promote splicing. Are intronless genes under selection to avoid these motifs so as not to attract the splicing machinery to an mRNA that should not be spliced, thereby preventing the production of an aberrant transcript? Consistent with this possibility, we find that ESEs in putative recent retrocopies are at a higher density and evolving faster than those in other intronless genes, suggesting that they are being lost. Moreover, intronless genes are less dense in putative ESEs than intron-containing ones. However, this latter difference is likely due to the skewed base composition of intronless sequences, a skew that is in line with the general GC richness of few-exon genes. Indeed, after controlling for such biases, we find that both intronless and intron-containing genes are denser in ESEs than expected by chance. Importantly, nucleotide-controlled analysis of evolutionary rates at synonymous sites in ESEs indicates that the ESEs in intronless genes are under purifying selection in both human and mouse. We conclude that, on the loss of introns, some, but not all, ESE motifs are lost, the remainder having functions beyond a role in splice promotion. These results have implications for the design of intronless transgenes and for understanding the causes of selection on synonymous sites.
Affiliation(s)
- Rosina Savisaar
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Laurence D Hurst
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
5
Abstract
The discovery that many intron-containing genes can be cotranscriptionally spliced has led to an increased understanding of how splicing and transcription are intricately intertwined. Cotranscriptional splicing has been demonstrated in a number of different organisms and has been shown to play roles in coordinating both constitutive and alternative splicing. The nature of cotranscriptional splicing suggests that changes in transcription can dramatically affect splicing, and new evidence suggests that splicing can, in turn, influence transcription. In this chapter, we discuss the mechanisms and consequences of cotranscriptional splicing and introduce some of the tools used to measure this process.
Affiliation(s)
- Evan C Merkhofer
- Molecular Biology Section, Division of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
6
Kapustin Y, Chan E, Sarkar R, Wong F, Vorechovsky I, Winston RM, Tatusova T, Dibb NJ. Cryptic splice sites and split genes. Nucleic Acids Res 2011; 39:5837-44. [PMID: 21470962 PMCID: PMC3152350 DOI: 10.1093/nar/gkr203]
Abstract
We describe a new program called cryptic splice finder (CSF) that can reliably identify cryptic splice sites (css), so providing a useful tool to help investigate splicing mutations in genetic disease. We report that many css are not entirely dormant and are often already active at low levels in normal genes prior to their enhancement in genetic disease. We also report a fascinating correlation between the positions of css and introns, whereby css within the exons of one species frequently match the exact position of introns in equivalent genes from another species. These results strongly indicate that many introns were inserted into css during evolution and they also imply that the splicing information that lies outside some introns can be independently recognized by the splicing machinery and was in place prior to intron insertion. This indicates that non-intronic splicing information had a key role in shaping the split structure of eukaryote genes.
Affiliation(s)
- Yuri Kapustin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, USA.
7
Abstract
In the present study, we examined human-mouse homologous intronless disease and non-disease genes, together with their extent of sequence conservation, tissue expression, and domain and gene ontology composition, to gain insight into their evolutionary and functional attributes. We show that selection has discriminated significantly between the two groups: the disease-associated genes in particular exhibit lower Ka and Ka/Ks, while Ks, although smaller, is not significantly different. Our analyses suggest that the majority of disease-related intronless human genes have homology limited to eukaryotic genomes and that their expression is localized. We also observed that different classes of intronless disease-related genes have experienced diverse selective pressures and are enriched for higher-level functions essential for developmental processes in complex organisms. We expect that these insights will enhance our understanding of the nature of these genes and improve our ability to identify disease-related intronless genes.
Affiliation(s)
- Subhash Mohan Agarwal
- Center for Computational Biology and Bioinformatics, School of Information Technology, Jawaharlal Nehru University, New Delhi 110067, India.
8
Abstract
Sam68 (Src-associated in mitosis, 68 kDa) is a KH domain RNA binding protein implicated in a variety of cellular processes, including alternative pre-mRNA splicing, but its functions are not well understood. Using RNA interference knockdown of Sam68 expression and splicing-sensitive microarrays, we identified a set of alternative exons whose splicing depends on Sam68. Detailed analysis of one newly identified target exon in epsilon sarcoglycan (Sgce) showed that both RNA elements distributed across the adjacent introns and the RNA binding activity of Sam68 are necessary to repress the Sgce exon. Sam68 protein is upregulated upon neuronal differentiation of P19 cells, and many Sam68 RNA targets change in expression and splicing during this process. When Sam68 is knocked down by short hairpin RNAs, many Sam68-dependent splicing changes do not occur and P19 cells fail to differentiate. We also found that the differentiation of primary neuronal progenitor cells from embryonic mouse neocortex is suppressed by Sam68 depletion and promoted by Sam68 overexpression. Thus, Sam68 controls neurogenesis through its effects on a specific set of RNA targets.
9
Akker SA, Misra S, Aslam S, Morgan EL, Smith PJ, Khoo B, Chew SL. Pre-spliceosomal binding of U1 small nuclear ribonucleoprotein (RNP) and heterogenous nuclear RNP E1 is associated with suppression of a growth hormone receptor pseudoexon. Mol Endocrinol 2007; 21:2529-40. [PMID: 17622584 DOI: 10.1210/me.2007-0038]
Abstract
Pseudoexons occur frequently in the human genome. This paper characterizes a pseudoexon in the GH receptor gene. Inappropriate activation of this pseudoexon causes Laron syndrome. Using in vitro splicing assays, pseudoexon silencing was shown to require a combination of a weak 5' pseudosplice site and splicing silencer elements within the pseudoexon. Immunoprecipitation experiments showed that specific binding of heterogeneous nuclear ribonucleoprotein E1 (hnRNP E1) and U1 small nuclear ribonucleoprotein (snRNP) in the pre-spliceosomal complex was associated with silencing of pseudoexon splicing. The possible role of hnRNP E1 was further supported by RNA interference experiments in cultured cells. Immunoprecipitation experiments with three other pseudoexons suggested that pre-spliceosomal binding of U1 snRNP is a potential general mechanism of pseudoexon suppression.
Affiliation(s)
- Scott A Akker
- Department of Endocrinology, 5th Floor, King George V Block, St Bartholomew's Hospital, West Smithfield, London EC1A 7BE, United Kingdom.
10
Niu DK. Protecting exons from deleterious R-loops: a potential advantage of having introns. Biol Direct 2007; 2:11. [PMID: 17459149 PMCID: PMC1863416 DOI: 10.1186/1745-6150-2-11] [Received: 03/26/2007] [Accepted: 04/25/2007]
Abstract
Background: Accumulating evidence indicates that the nascent RNA can invade and pair with one strand of DNA, forming an R-loop structure that threatens the stability of the genome. In addition, the costs and benefits of introns are still debated.
Results: At least three factors are likely required for R-loop formation: 1) sequence complementarity between the nascent RNA and the target DNA, 2) spatial juxtaposition between the nascent RNA and the template DNA, and 3) accessibility of the template DNA and the nascent RNA. The removal of introns from pre-mRNA reduces the complementarity between the RNA and the template DNA and avoids spatial juxtaposition between the nascent RNA and the template DNA. In addition, the secondary structures of group I and group II introns may act as spatial obstacles to the formation of R-loops between nearby exons and the genomic DNA.
Conclusion: Organisms may benefit from introns by avoiding deleterious R-loops. The potential contribution of this benefit in driving intron evolution is discussed. I propose that additional RNA polymerases may inhibit R-loop formation between the preceding nascent RNA and the template DNA. This idea leads to a testable prediction: intermittently transcribed genes and genes with frequently prolonged transcription should have higher intron density.
Reviewers: This article was reviewed by Dr. Eugene V. Koonin, Dr. Alexei Fedorov (nominated by Dr. Laura F Landweber), and Dr. Scott W. Roy (nominated by Dr. Arcady Mushegian).
Affiliation(s)
- Deng-Ke Niu
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China.
11
Wang HF, Hou WR, Niu DK. Strand compositional asymmetries in vertebrate large genes. Mol Biol Rep 2007; 35:163-9. [PMID: 17420956 DOI: 10.1007/s11033-007-9066-6] [Received: 11/08/2006] [Accepted: 02/26/2007]
Abstract
Both transcription-associated and replication-associated strand compositional asymmetries have recently been shown in vertebrate genomes. In this paper, we show that transcription-associated and replication-associated strand compositional asymmetries coexist in most large vertebrate genes, although in most cases the former conceals the latter. Furthermore, we found that the transcription-associated strand compositional asymmetries of housekeeping genes are stronger than those of somatic-cell-expressed genes. Together with other evidence, this suggests that germline transcription-associated strand-asymmetric mutations may be the main cause of the transcription-associated strand compositional asymmetries.
Affiliation(s)
- Hai-Fang Wang
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China