1
Wang S, Chen X. Identification of Protein-Coding Gene Structure and Protein-Related Genes and Their Splicing Sites in Kidney Stone Disease: A Protein Big Data Analysis. Appl Biochem Biotechnol 2023; 195:6020-6031. [PMID: 36763230 DOI: 10.1007/s12010-023-04322-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/10/2023] [Indexed: 02/11/2023]
Abstract
The study of protein-coding gene structure and protein-related genes in kidney stone disease is used for accurate identification of splicing sites and accurate location of gene exon boundaries, which is one of the difficulties and key problems in understanding the genome and discovering new genes. Prediction techniques based on signal characteristics of conserved sequences around splicing sites, such as the weighted array model (WAM), are widely used. On this basis, several other features that can be used for splicing site recognition (such as the base composition of splicing site upstream and downstream sequences, the change of signal and base composition of upstream and downstream sequences with the C + G content of adjacent sequences) were mined further, and a model was developed to describe these features. In this study, a log-linear model that can effectively integrate these features for splicing site recognition was designed, and a SpliceKey programme was developed. The findings reveal that SpliceKey's splicing site identification accuracy is not only much better than the WAM approach, but also better than DGSplice.
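For readers unfamiliar with the weighted array model (WAM) named in this abstract, the following is a minimal Python sketch of the general idea: a position-specific first-order Markov model in which each base is scored conditional on the base before it. It is not the SpliceKey or DGSplice implementation. The function names `train_wam` and `wam_log_prob`, the pseudocount, and the toy donor-site sequences are illustrative assumptions.

```python
import math


def train_wam(sites, alphabet="ACGT", pseudocount=1.0):
    """Estimate a first-order, position-specific Markov model from aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training sites must have the same length")
    # Position 0: plain base frequencies (with pseudocounts).
    first_counts = {b: pseudocount for b in alphabet}
    for site in sites:
        first_counts[site[0]] += 1
    total = sum(first_counts.values())
    first = {b: c / total for b, c in first_counts.items()}
    # Positions 1..length-1: P(base at pos | base at pos - 1).
    cond = []
    for pos in range(1, length):
        counts = {p: {b: pseudocount for b in alphabet} for p in alphabet}
        for site in sites:
            counts[site[pos - 1]][site[pos]] += 1
        cond.append({
            prev: {b: c / sum(row.values()) for b, c in row.items()}
            for prev, row in counts.items()
        })
    return first, cond


def wam_log_prob(seq, model):
    """Log-probability of seq under the trained model."""
    first, cond = model
    log_p = math.log(first[seq[0]])
    for pos in range(1, len(seq)):
        log_p += math.log(cond[pos - 1][seq[pos - 1]][seq[pos]])
    return log_p


# Hypothetical toy donor-site windows, used only to exercise the functions.
toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGT"]
toy_model = train_wam(toy_sites)
```

In practice a site model of this kind is usually compared against a background model as a log-odds score; that step is omitted here for brevity.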
Affiliation(s)
- Shiyu Wang
  - The Second Hospital of Jilin University, Changchun, Jilin Province, China
- Xiangmei Chen
  - The Second Hospital of Jilin University, Changchun, Jilin Province, China
2
Badr E, ElHefnawi M, Heath LS. Computational Identification of Tissue-Specific Splicing Regulatory Elements in Human Genes from RNA-Seq Data. PLoS One 2016; 11:e0166978. [PMID: 27861625 PMCID: PMC5115852 DOI: 10.1371/journal.pone.0166978] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/01/2016] [Accepted: 11/07/2016] [Indexed: 12/24/2022]
Abstract
Alternative splicing is a vital process for regulating gene expression and promoting proteomic diversity. It plays a key role in tissue-specific expressed genes. This specificity is mainly regulated by splicing factors that bind to specific sequences called splicing regulatory elements (SREs). Here, we report a genome-wide analysis to study alternative splicing on multiple tissues, including brain, heart, liver, and muscle. We propose a pipeline to identify differential exons across tissues and hence tissue-specific SREs. In our pipeline, we utilize the DEXSeq package along with our previously reported algorithms. Utilizing the publicly available RNA-Seq data set from the Human BodyMap project, we identified 28,100 differentially used exons across the four tissues. We identified tissue-specific exonic splicing enhancers that overlap with various previously published experimental and computational databases. A complicated exonic enhancer regulatory network was revealed, where multiple exonic enhancers were found across multiple tissues while some were found only in specific tissues. Putative combinatorial exonic enhancers and silencers were discovered as well, which may be responsible for exon inclusion or exclusion across tissues. Some of the exonic enhancers are found to be co-occurring with multiple exonic silencers and vice versa, which demonstrates a complicated relationship between tissue-specific exonic enhancers and silencers.
Affiliation(s)
- Eman Badr
  - Department of Information Technology, Faculty of Computers and Information, Cairo University, Giza, Egypt
- Mahmoud ElHefnawi
  - Center of Excellence for Advanced Sciences, Informatics and Systems Department, National Research Center, Cairo, Egypt
  - Center for Informatics Science, Nile University, Sheikh Zayed City, Egypt
- Lenwood S. Heath
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
3
Badr E, Heath LS. CoSREM: a graph mining algorithm for the discovery of combinatorial splicing regulatory elements. BMC Bioinformatics 2015; 16:285. [PMID: 26337677 PMCID: PMC4559876 DOI: 10.1186/s12859-015-0698-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/29/2015] [Accepted: 08/06/2015] [Indexed: 12/12/2022]
Abstract
BACKGROUND Alternative splicing (AS) is a post-transcriptional regulatory mechanism for gene expression regulation. Splicing decisions are affected by the combinatorial behavior of different splicing factors that bind to multiple binding sites in exons and introns. These binding sites are called splicing regulatory elements (SREs). Here we develop CoSREM (Combinatorial SRE Miner), a graph mining algorithm to discover combinatorial SREs in human exons. Our model does not assume a fixed length of SREs and incorporates experimental evidence as well to increase accuracy. CoSREM is able to identify sets of SREs and is not limited to SRE pairs as are current approaches. RESULTS We identified 37 SRE sets that include both enhancer and silencer elements. We show that our results intersect with previous results, including some that are experimental. We also show that the SRE set GGGAGG and GAGGAC identified by CoSREM may play a role in exon skipping events in several tumor samples. We applied CoSREM to RNA-Seq data for multiple tissues to identify combinatorial SREs which may be responsible for exon inclusion or exclusion across tissues. CONCLUSION The new algorithm can identify different combinations of splicing enhancers and silencers without assuming a predefined size or limiting the algorithm to find only pairs of SREs. Our approach opens new directions to study SREs and the roles that AS may play in diseases and tissue specificity.
Affiliation(s)
- Eman Badr
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA
- Lenwood S Heath
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA
4
Badr E, Heath LS. Identifying splicing regulatory elements with de Bruijn graphs. J Comput Biol 2015; 21:880-97. [PMID: 25393830 DOI: 10.1089/cmb.2014.0183] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/06/2023]
Abstract
Splicing regulatory elements (SREs) are short, degenerate sequences on pre-mRNA molecules that enhance or inhibit the splicing process via the binding of splicing factors, proteins that regulate the functioning of the spliceosome. Existing methods for identifying SREs in a genome are either experimental or computational. Here, we propose a formalism based on de Bruijn graphs that combines genomic structure, word count enrichment analysis, and experimental evidence to identify SREs found in exons. In our approach, SREs are not restricted to a fixed length (i.e., k-mers, for a fixed k). As a result, we identify 2001 putative exonic enhancers and 3080 putative exonic silencers for human genes, with lengths varying from 6 to 15 nucleotides. Many of the predicted SREs overlap with experimentally verified binding sites. Our model provides a novel method to predict variable length putative regulatory elements computationally for further experimental investigation.
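As background for the de Bruijn graph formalism mentioned in this abstract, here is a minimal Python sketch of the generic graph construction only: nodes are (k-1)-mers and each k-mer contributes an edge from its prefix to its suffix. It does not reproduce the authors' enrichment analysis or their use of experimental evidence. The function name `de_bruijn_graph` and the toy sequences are assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(sequences, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical toy input: two overlapping purine-rich words.
toy_graph = de_bruijn_graph(["GAAGAA", "AAGAAG"], k=3)
```

Paths through such a graph spell out words longer than k, which is one way a graph-based method can avoid committing to a single fixed element length.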
Affiliation(s)
- Eman Badr
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia
5
Lo C, Kakaradov B, Lokshtanov D, Boucher C. SeeSite: Characterizing Relationships between Splice Junctions and Splicing Enhancers. IEEE/ACM Trans Comput Biol Bioinform 2014; 11:648-656. [PMID: 26356335 DOI: 10.1109/tcbb.2014.2304294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
RNA splicing is a cellular process driven by the interaction between numerous regulatory sequences and binding sites; however, such interactions have been primarily explored by laboratory methods, since computational tools largely ignore the relationship between different splicing elements. Current computational methods identify either splice sites or other regulatory sequences, such as enhancers and silencers. We present a novel approach for characterizing co-occurring relationships between splice site motifs and splicing enhancers. Our approach relies on an efficient algorithm for approximately solving Consensus Sequence with Outliers, an NP-complete string clustering problem. In particular, we give an algorithm for this problem that outputs near-optimal solutions in polynomial time. To our knowledge, this is the first formulation and computational attempt for detecting co-occurring sequence elements in RNA sequence data. Further, we demonstrate that SeeSite is capable of showing that certain ESEs are preferentially associated with weaker splice sites, and that there exists a co-occurrence relationship with splice site motifs.
6
Brooks AN, Aspden JL, Podgornaia AI, Rio DC, Brenner SE. Identification and experimental validation of splicing regulatory elements in Drosophila melanogaster reveals functionally conserved splicing enhancers in metazoans. RNA 2011; 17:1884-94. [PMID: 21865603 PMCID: PMC3185920 DOI: 10.1261/rna.2696311] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 04/29/2011] [Accepted: 07/08/2011] [Indexed: 05/22/2023]
Abstract
RNA sequence elements involved in the regulation of pre-mRNA splicing have previously been identified in vertebrate genomes by computational methods. Here, we apply such approaches to predict splicing regulatory elements in Drosophila melanogaster and compare them with elements previously found in the human, mouse, and pufferfish genomes. We identified 99 putative exonic splicing enhancers (ESEs) and 231 putative intronic splicing enhancers (ISEs) enriched near weak 5' and 3' splice sites of constitutively spliced introns, distinguishing between those found near short and long introns. We found that a significant proportion (58%) of fly enhancer sequences were previously reported in at least one of the vertebrates. Furthermore, 20% of putative fly ESEs were previously identified as ESEs in human, mouse, and pufferfish; while only two fly ISEs, CTCTCT and TTATAA, were identified as ISEs in all three vertebrate species. Several putative enhancer sequences are similar to characterized binding-site motifs for Drosophila and mammalian splicing regulators. To provide additional evidence for the function of putative ISEs, we separately identified 298 intronic hexamers significantly enriched within sequences phylogenetically conserved among 15 insect species. We found that 73 putative ISEs were among those enriched in conserved regions of the D. melanogaster genome. The functions of nine enhancer sequences were verified in a heterologous splicing reporter, demonstrating that these sequences are sufficient to enhance splicing in vivo. Taken together, these data identify a set of predicted positive-acting splicing regulatory motifs in the Drosophila genome and reveal regulatory sequences that are present in distant metazoan genomes.
Affiliation(s)
- Angela N. Brooks
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Julie L. Aspden
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - Center for Integrative Genomics, University of California, Berkeley, California 94720, USA
- Anna I. Podgornaia
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Donald C. Rio
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - Center for Integrative Genomics, University of California, Berkeley, California 94720, USA
- Steven E. Brenner
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
7
Prasov L, Brown NL, Glaser T. A critical analysis of Atoh7 (Math5) mRNA splicing in the developing mouse retina. PLoS One 2010; 5:e12315. [PMID: 20808762 PMCID: PMC2927423 DOI: 10.1371/journal.pone.0012315] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 06/22/2010] [Accepted: 06/25/2010] [Indexed: 01/22/2023]
Abstract
The Math5 (Atoh7) gene is transiently expressed during retinogenesis by progenitors exiting mitosis, and is essential for ganglion cell (RGC) development. Math5 contains a single exon, and its 1.7 kb mRNA encodes a 149-aa polypeptide. Mouse Math5 mutants have essentially no RGCs or optic nerves. Given the importance of this gene in retinal development, we thoroughly investigated the possibility of Math5 mRNA splicing by Northern blot, 3'RACE, RNase protection assays, and RT-PCR, using RNAs extracted from embryonic eyes and adult cerebellum, or transcribed in vitro from cDNA clones. Because Math5 mRNA contains an elevated G+C content, we used graded concentrations of betaine, an isostabilizing agent that disrupts secondary structure. Although approximately 10% of cerebellar Math5 RNAs are spliced, truncating the polypeptide, our results show few, if any, spliced Math5 transcripts exist in the developing retina (<1%). Rare deleted cDNAs do arise via RT-mediated RNA template switching in vitro, and are selectively amplified during PCR. These data differ starkly from a recent study (Kanadia and Cepko 2010), which concluded that the vast majority of Math5 and other bHLH transcripts are spliced to generate noncoding RNAs. Our findings clarify the architecture of the Math5 gene and its mechanism of action. These results have implications for all members of the bHLH gene family, for any gene that is alternatively spliced, and for the interpretation of all RT-PCR experiments.
Affiliation(s)
- Lev Prasov
  - Departments of Human Genetics and Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Nadean L. Brown
  - Division of Developmental Biology, Department of Pediatrics and Ophthalmology, Cincinnati Children's Research Foundation, University of Cincinnati School of Medicine, Cincinnati, Ohio, United States of America
- Tom Glaser
  - Departments of Human Genetics and Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
8
Searching for splicing motifs. Adv Exp Med Biol 2008; 623:85-106. [PMID: 18380342 DOI: 10.1007/978-0-387-77374-2_6] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.2] [Indexed: 10/20/2022]
Abstract
Intron removal during pre-mRNA splicing in higher eukaryotes requires the accurate identification of the two splice sites at the ends of the exons, or exon definition. The sequences constituting the splice sites provide insufficient information to distinguish true splice sites from the greater number of false splice sites that populate transcripts. Additional information used for exon recognition resides in a large number of positively or negatively acting elements that lie both within exons and in the adjacent introns. The identification of such sequence motifs has progressed rapidly in recent years, such that extensive lists are now available for exonic splicing enhancers and exonic splicing silencers. These motifs have been identified both by empirical experiments and by computational predictions, the validity of the latter being confirmed by experimental verification. Molecular searches have been carried out either by the selection of sequences that bind to splicing factors, or enhance or silence splicing in vitro or in vivo. Computational methods have focused on sequences of 6 or 8 nucleotides that are over- or under-represented in exons, compared to introns or transcripts that do not undergo splicing. These various methods have sought to provide global definitions of motifs, yet the motifs are distinctive to the method used for identification and display little overlap. Astonishingly, at least three-quarters of a typical mRNA would be comprised of these motifs. A present challenge lies in understanding how the cell integrates this surfeit of information to generate what is usually a binary splicing decision.
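The computational strategy summarized in this abstract, looking for short words that are over- or under-represented in exons relative to introns, can be illustrated with a small Python sketch. It computes a pseudocount-smoothed log2 frequency ratio per hexamer and is only a toy version of the idea, not any published method; real analyses add statistical testing and much larger sequence sets. The names `kmer_counts` and `log2_enrichment` and the toy sequences are assumptions.

```python
import math
from collections import Counter


def kmer_counts(sequences, k=6):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def log2_enrichment(foreground, background, k=6, pseudocount=1.0):
    """log2 ratio of smoothed k-mer frequencies, foreground over background."""
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    kmers = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(kmers)
    bg_total = sum(bg.values()) + pseudocount * len(kmers)
    return {
        kmer: math.log2(((fg[kmer] + pseudocount) / fg_total)
                        / ((bg[kmer] + pseudocount) / bg_total))
        for kmer in kmers
    }


# Hypothetical toy sequences standing in for exon and intron sets.
toy_exons = ["GAAGAAGAAGAA"]
toy_introns = ["TTTTTTTTTTTT"]
scores = log2_enrichment(toy_exons, toy_introns)
```

Positive scores mark words relatively more frequent in the foreground set and negative scores mark words relatively depleted there.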
9
Bechtel JM, Rajesh P, Ilikchyan I, Deng Y, Mishra PK, Wang Q, Wu X, Afonin KA, Grose WE, Wang Y, Khuder S, Fedorov A. Calculation of splicing potential from the Alternative Splicing Mutation Database. BMC Res Notes 2008; 1:4. [PMID: 18611287 PMCID: PMC2518266 DOI: 10.1186/1756-0500-1-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 02/06/2008] [Accepted: 02/26/2008] [Indexed: 12/15/2022]
Abstract
BACKGROUND The Alternative Splicing Mutation Database (ASMD) presents a collection of all known mutations inside human exons which affect splicing enhancers and silencers and cause changes in the alternative splicing pattern of the corresponding genes. FINDINGS An algorithm was developed to derive a Splicing Potential (SP) table from the ASMD information. This table characterizes the influence of each oligonucleotide on the splicing effectiveness of the exon containing it. If the SP value for an oligonucleotide is positive, it promotes exon retention, while negative SP values mean the sequence favors exon skipping. The merit of the SP approach is the ability to separate splicing signals from a wide range of sequence motifs enriched in exonic sequences that are attributed to protein-coding properties and/or translation efficiency. Due to its direct derivation from observed splice site selection, SP has an advantage over other computational approaches for predicting alternative splicing. CONCLUSION We show that a vast majority of known exonic splicing enhancers have highly positive cumulative SP values, while known splicing silencers have core motifs with strongly negative cumulative SP values. Our approach allows for computation of the cumulative SP value of any sequence segment and, thus, gives researchers the ability to measure the possible contribution of any sequence to the pattern of splicing.
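To make the notion of a cumulative SP value concrete, the following Python sketch sums per-oligonucleotide scores over every overlapping window of a segment. The abstract does not specify how the cumulative value is computed, so simple summation is an assumption, as are the function name `cumulative_sp`, the oligonucleotide length, and the toy table values (which are not taken from the ASMD).

```python
def cumulative_sp(segment, sp_table, oligo_len):
    """Sum the SP values of all overlapping oligonucleotides in a segment.

    Oligonucleotides missing from the table contribute 0 (an assumption).
    """
    total = 0.0
    for i in range(len(segment) - oligo_len + 1):
        total += sp_table.get(segment[i:i + oligo_len], 0.0)
    return total


# Hypothetical SP table: positive values favor exon retention,
# negative values favor exon skipping.
toy_sp_table = {"GAA": 0.5, "AAG": 0.25, "TAG": -1.0}
toy_score = cumulative_sp("GAAGTAG", toy_sp_table, 3)
```

Under this sketch, a positive total would suggest a segment that on balance promotes exon retention and a negative total one that favors skipping, mirroring the sign convention described in the abstract.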
Affiliation(s)
- Jason M Bechtel
  - Program in Bioinformatics and Proteomics/Genomics, University of Toledo Health Science Campus, Toledo, Ohio 43614, USA
10
Bechtel JM, Rajesh P, Ilikchyan I, Deng Y, Mishra PK, Wang Q, Wu X, Afonin KA, Grose WE, Wang Y, Khuder S, Fedorov A. The Alternative Splicing Mutation Database: a hub for investigations of alternative splicing using mutational evidence. BMC Res Notes 2008; 1:3. [PMID: 18611286 PMCID: PMC2518265 DOI: 10.1186/1756-0500-1-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 02/05/2008] [Accepted: 02/26/2008] [Indexed: 11/22/2022]
Abstract
BACKGROUND Some mutations in the internal regions of exons occur within splicing enhancers and silencers, influencing the pattern of alternative splicing in the corresponding genes. To understand how these sequence changes affect splicing, we created a database of these mutations. FINDINGS The Alternative Splicing Mutation Database (ASMD) serves as a repository for all exonic mutations not associated with splicing junctions that measurably change the pattern of alternative splicing. In this initial published release (version 1.2), only human sequences are present, but the ASMD will grow to include other organisms (see Availability and requirements section for the ASMD web address). This relational database allows users to investigate connections between mutations and features of the surrounding sequences, including flanking sequences, RNA secondary structures and strengths of splice junctions. Splicing effects of the mutations are quantified by the relative presence of alternative mRNA isoforms with and without a given mutation. This measure is further categorized by the accuracy of the experimental methods employed. The database currently contains 170 mutations in 66 exons, yet these numbers increase regularly. We developed an algorithm to derive a table of oligonucleotide Splicing Potential (SP) values from the ASMD dataset. We present the SP concept and tools in detail in our corresponding article. CONCLUSION The current data set demonstrates that mutations affecting splicing are located throughout exons and might be enriched within local RNA secondary structures. Exons from the ASMD have below average splicing junction strength scores, but the difference is small and is judged not to be significant.
Affiliation(s)
- Jason M Bechtel
  - Program in Bioinformatics and Proteomics/Genomics, University of Toledo Health Science Campus, Toledo, Ohio 43614, USA
11
Sada A, Katayama Y, Yamamoto K, Okuyama S, Nakata H, Shimada H, Oshimi K, Mori M, Matsui T. A multicenter analysis of the FIP1L1-αPDGFR fusion gene in Japanese idiopathic hypereosinophilic syndrome: an aberrant splicing skipping the αPDGFR exon 12. Ann Hematol 2007; 86:855-63. [PMID: 17701174 DOI: 10.1007/s00277-007-0357-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 06/16/2007] [Accepted: 07/20/2007] [Indexed: 10/23/2022]
Abstract
To study the clinical characteristics of hypereosinophilic syndrome and chronic eosinophilic leukemia (HES/CEL) in Japan, the clinical data of 29 HES/CEL patients throughout the country were surveyed. Moreover, the involvement of the FIP1L1-alphaPDGFR fusion gene resulting from a cryptic del(4)(q12q12) was examined in 24 cases. The FIP1L1-alphaPDGFR messenger RNA (mRNA) was detected in three patients (13% of patients who fulfilled the WHO criteria and 17% of those who fulfilled the Chusid criteria). One had a novel fusion transcript, which skipped exon 12 of alphaPDGFR. The transcript appears to be generated by a splicing mechanism that is different from the previously reported splicing patterns. In in silico analysis, the exon skipping was not related to a disruption of the exonic splicing enhancers within the exon but was strongly associated with the loss of the vast majority of FIP1L1 intron 8a, where intronic splicing enhancers were accumulated. Unexpectedly, pseudo-chimera DNA fragments with some shared characteristic features were occasionally generated from healthy control samples by reverse transcriptase polymerase chain reaction (RT-PCR). Considering the relatively low incidence of FIP1L1-alphaPDGFR transcript-positive cases, extreme care must therefore be taken when making a diagnosis using RT-PCR before imatinib therapy.
Affiliation(s)
- Akiko Sada
  - Hematology/Oncology, Department of Medicine, Kobe University Graduate School of Medicine, 7-5-1, Kusunoki-cho, Chuo-ku, Kobe 650-0017, Japan
12
Voelker RB, Berglund JA. A comprehensive computational characterization of conserved mammalian intronic sequences reveals conserved motifs associated with constitutive and alternative splicing. Genome Res 2007; 17:1023-33. [PMID: 17525134 PMCID: PMC1899113 DOI: 10.1101/gr.6017807] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Received: 10/12/2006] [Accepted: 04/12/2007] [Indexed: 11/24/2022]
Abstract
Orthologous mammalian introns contain many highly conserved sequences. Of these sequences, many are likely to represent protein binding sites that are under strong positive selection. In order to identify conserved protein binding sites that are important for splicing, we analyzed the composition of intronic sequences that are conserved between human and six eutherian mammals. We focused on all completely conserved sequences of seven or more nucleotides located in the regions adjacent to splice-junctions. We found that these conserved intronic sequences are enriched in specific motifs, and that many of these motifs are statistically associated with either alternative or constitutive splicing. In validation of our methods, we identified several motifs that are known to play important roles in alternative splicing. In addition, we identified several novel motifs containing GCT that are abundant and are associated with alternative splicing. Furthermore, we demonstrate that, for some of these motifs, conservation is a strong indicator of potential functionality since conserved instances are associated with alternative splicing while nonconserved instances are not. A surprising outcome of this analysis was the identification of a large number of AT-rich motifs that are strongly associated with constitutive splicing. Many of these appear to be novel and may represent conserved intronic splicing enhancers (ISEs). Together these data show that conservation provides important insights into the identification and possible roles of cis-acting intronic sequences important for alternative and constitutive splicing.
Affiliation(s)
- Rodger B. Voelker
  - Institute of Molecular Biology, University of Oregon, Eugene, Oregon 97403, USA
- J. Andrew Berglund
  - Institute of Molecular Biology, University of Oregon, Eugene, Oregon 97403, USA
13
Piva F, Principato G. Possible role of nucleotide correlations between human exon junctions. Gene 2007; 393:81-6. [PMID: 17350768 DOI: 10.1016/j.gene.2007.01.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 02/15/2006] [Revised: 01/19/2007] [Accepted: 01/22/2007] [Indexed: 11/19/2022]
Abstract
There is ample evidence that prediction of human splice sites can be refined by analyzing the nucleotides surrounding splice sites. This could mean that exon nucleotides over splice sites harbour information for the splicing process in addition to the coding information to specify amino acids. We analyzed the correlations among the nucleotides lying at the end and at the beginning of all the consecutive human exons to seek relationships among the nucleotides. We have divided the sequences taking into account the phase of interruption. Even though exon sequences are involved in the coding function, we found phase-dependent, specific correlations in the area of exon junctions. These regularities do not give rise to specific motifs, but rather to a phase-specific nucleotide context that could contribute to defining the splice site or help the splicing machinery to join the exon ends. Results provide further evidence that accurate selection of human splice sites likely requires the contribution of exon regulatory sequences.
Affiliation(s)
- Francesco Piva
  - Istituto di Biologia e Genetica, Università Politecnica delle Marche, Via Brecce Bianche, Monte D'Ago, 60131 Ancona, Italy
14
Goren A, Ram O, Amit M, Keren H, Lev-Maor G, Vig I, Pupko T, Ast G. Comparative analysis identifies exonic splicing regulatory sequences--The complex definition of enhancers and silencers. Mol Cell 2006; 22:769-781. [PMID: 16793546 DOI: 10.1016/j.molcel.2006.05.008] [Citation(s) in RCA: 239] [Impact Index Per Article: 12.6] [Received: 12/12/2005] [Revised: 04/06/2006] [Accepted: 05/03/2006] [Indexed: 12/11/2022]
Abstract
Exonic splicing regulatory sequences (ESRs) are cis-acting factor binding sites that regulate constitutive and alternative splicing. A computational method based on the conservation level of wobble positions and the overabundance of sequence motifs between 46,103 human and mouse orthologous exons was developed, identifying 285 putative ESRs. Alternatively spliced exons that are either short in length or contain weak splice sites show the highest conservation level of those ESRs, especially toward the edges of exons. ESRs that are abundant in those subgroups show a different distribution between constitutively and alternatively spliced exons. Representatives of these ESRs and two SR protein binding sites were shown, experimentally, to display variable regulatory effects on alternative splicing, depending on their relative locations in the exon. This finding signifies the delicate positional effect of ESRs on alternative splicing regulation.
Affiliation(s)
- Amir Goren
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Oren Ram
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Maayan Amit
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Hadas Keren
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Galit Lev-Maor
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Ida Vig
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Gil Ast
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
15
Abstract
In addition to protein-coding information, mRNAs harbor regulatory sequences necessary for appropriate processing of their precursors. Goren et al. (2006) and Wang et al. (2006) explore the diversity of these signals and the rules by which they function.
Affiliation(s)
- Roderic Guigó
  - Centre de Regulació Genòmica, Passeig Marítim 37-49, 08003 Barcelona, Spain
16
Matlin AJ, Clark F, Smith CWJ. Understanding alternative splicing: towards a cellular code. Nat Rev Mol Cell Biol 2005; 6:386-98. [PMID: 15956978 DOI: 10.1038/nrm1645] [Citation(s) in RCA: 955] [Impact Index Per Article: 47.8] [Indexed: 12/18/2022]
Abstract
In violation of the 'one gene, one polypeptide' rule, alternative splicing allows individual genes to produce multiple protein isoforms, thereby playing a central part in generating complex proteomes. Alternative splicing also has a largely hidden function in quantitative gene control, by targeting RNAs for nonsense-mediated decay. Traditional gene-by-gene investigations of alternative splicing mechanisms are now being complemented by global approaches. These promise to reveal details of the nature and operation of cellular codes that are constituted by combinations of regulatory elements in pre-mRNA substrates and by cellular complements of splicing regulators, which together determine regulated splicing pathways.
Affiliation(s)
- Arianne J Matlin
- Department of Biochemistry, 80 Tennis Court Road, University of Cambridge, CB2 1GA, UK
|
17
|
Webb CJ, Romfo CM, van Heeckeren WJ, Wise JA. Exonic splicing enhancers in fission yeast: functional conservation demonstrates an early evolutionary origin. Genes Dev 2005; 19:242-54. [PMID: 15625190 PMCID: PMC545887 DOI: 10.1101/gad.1265905] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 09/27/2004] [Accepted: 11/11/2004] [Indexed: 12/17/2022]
Abstract
Discrete sequence elements known as exonic splicing enhancers (ESEs) have been shown to influence both the efficiency of splicing and the profile of mature mRNAs in multicellular eukaryotes. While the existence of ESEs has not been demonstrated previously in unicellular eukaryotes, the factors known to recognize these elements and mediate their communication with the core splicing machinery are conserved and essential in the fission yeast Schizosaccharomyces pombe. Here, we provide evidence that ESE function is conserved through evolution by demonstrating that three exonic splicing enhancers derived from vertebrates (chicken ASLV, mouse IgM, and human cTNT) promote splicing of two distinct S. pombe pre-messenger RNAs (pre-mRNAs). Second, as in extracts from mammalian cells, ESE function in S. pombe is compromised by mutations and increased distance from the 3'-splice site. Third, three-hybrid analyses indicate that the essential SR (serine/arginine-rich) protein Srp2p, but not the dispensable Srp1p, binds specifically to both native and heterologous purine-rich elements; thus, Srp2p is the likely mediator of ESE function in fission yeast. Finally, we have identified five natural purine-rich elements from S. pombe that promote splicing of our reporter pre-mRNAs. Taken together, these results provide strong evidence that the genesis of ESE-mediated splicing occurred early in eukaryotic evolution.
Affiliation(s)
- Christopher J Webb
- School of Medicine, Department of Molecular Biology and Microbiology, Case Western Reserve University, Cleveland, OH 44106-4960, USA
|
18
|
Pozzoli U, Riva L, Menozzi G, Cagliani R, Comi GP, Bresolin N, Giorda R, Sironi M. Over-representation of exonic splicing enhancers in human intronless genes suggests multiple functions in mRNA processing. Biochem Biophys Res Commun 2004; 322:470-6. [PMID: 15325254 DOI: 10.1016/j.bbrc.2004.07.144] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Received: 05/14/2004] [Indexed: 11/24/2022]
Abstract
The human transcriptome is constituted of a great majority of intron-containing and a minority of intron-lacking mRNAs; given the different processing these transcripts undergo, they are expected to carry, intermingled with coding properties, very different editing information. Here we applied a computational approach to compare intronless and intron-containing coding sequences. Hexamer composition comparison allowed the definition of over- and under-represented motifs in intronless genes; surprisingly, experimental testing revealed that intron-lacking coding sequences are enriched rather than depleted in elements with splicing enhancement ability. Similarly, we show evidence that intronless transcripts display a significantly higher frequency of both shuttling and non-shuttling SR protein binding sites compared to intron-containing sequences. These observations suggest that SR proteins (and possibly other splicing factors) play a role in cellular processes distinct from splicing.
Affiliation(s)
- Uberto Pozzoli
- Scientific Institute IRCCS E. Medea, Associazione La Nostra Famiglia, 23842 Bosisio Parini (LC), Italy.
|
19
|
Zhang XHF, Chasin LA. Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev 2004; 18:1241-50. [PMID: 15145827 PMCID: PMC420350 DOI: 10.1101/gad.1195304] [Citation(s) in RCA: 338] [Impact Index Per Article: 16.1] [Received: 02/17/2004] [Accepted: 04/09/2004] [Indexed: 12/23/2022]
Abstract
We have searched for sequence motifs that contribute to the recognition of human pre-mRNA splice sites by comparing the frequency of 8-mers in internal noncoding exons versus unspliced pseudo exons and 5' untranslated regions (UTRs) of transcripts of intronless genes. This type of comparison avoids the isolation of sequences that are distinguished by their protein-coding information. We classified sequence families comprising 2069 putative exonic enhancers and 974 putative exonic silencers. Representatives of each class functioned as enhancers or silencers when inserted into a test exon and assayed in transfected mammalian cells. As a class, the enhancer sequences were more prevalent and the silencer elements less prevalent in all exons compared with introns. A survey of 58 reported exonic splicing mutations showed good agreement between the splicing phenotype and the effect of the mutation on the motifs defined here. The large number of effective sequences implied by these results suggests that sequences that influence splicing may be very abundant in pre-mRNA.
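The 8-mer frequency comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' statistical procedure: the function names `kmer_counts` and `log_ratio`, the pseudocount smoothing, and the log2 ratio score are all assumptions chosen for illustration, and the abstract does not say how significance was assessed.

```python
from collections import Counter
from math import log2


def kmer_counts(seqs, k):
    """Count all overlapping k-mers across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def log_ratio(fg_seqs, bg_seqs, k=8, pseudocount=1.0):
    """Log2 ratio of smoothed k-mer frequency, foreground vs. background.

    Hypothetical scoring scheme: positive scores mark k-mers that are
    relatively more frequent in the foreground set (e.g. exons), negative
    scores mark k-mers more frequent in the background set
    (e.g. pseudo exons).
    """
    fg, bg = kmer_counts(fg_seqs, k), kmer_counts(bg_seqs, k)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    kmers = set(fg) | set(bg)
    n = len(kmers)
    scores = {}
    for km in kmers:
        f = (fg[km] + pseudocount) / (fg_total + pseudocount * n)
        b = (bg[km] + pseudocount) / (bg_total + pseudocount * n)
        scores[km] = log2(f / b)
    return scores
```

Calling `log_ratio(exon_seqs, pseudo_exon_seqs, k=8)` on two lists of sequence strings returns a dictionary of scores; ranking by score gives candidate enhancer-like (high) and silencer-like (low) motifs under these assumptions.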
Affiliation(s)
- Xiang H-F Zhang
- Department of Biological Sciences, MC2433, Columbia University, New York, New York 10027, USA
|
20
|
Sironi M, Menozzi G, Riva L, Cagliani R, Comi GP, Bresolin N, Giorda R, Pozzoli U. Silencer elements as possible inhibitors of pseudoexon splicing. Nucleic Acids Res 2004; 32:1783-91. [PMID: 15034146 PMCID: PMC390338 DOI: 10.1093/nar/gkh341] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.0] [Indexed: 11/12/2022] Open
Abstract
Human pre-mRNAs contain a definite number of exons and several pseudoexons which are located within intronic regions. We applied a computational approach to address the question of how pseudoexons are neglected in favor of exons and to possibly identify sequence elements preventing pseudoexon splicing. A search for possible splicing silencers was carried out on a pseudoexon selection that resembled exons in terms of splice site strength and exon splicing enhancer (ESE) representation; three motifs were retrieved through hexamer composition comparisons. One of these functions as a powerful silencer in transfection-based splicing assays and matches a previously identified silencer sequence with hnRNP H binding ability. The other two motifs are novel and failed to induce skipping of a constitutive exon, indicating that they might act as weak repressors or in synergy with other unidentified elements. All three motifs are enriched in pseudoexons compared with intronic regions and display higher frequencies in intronless gene-coding sequences compared with exons. We consider that a subpopulation of pseudoexons might rely on negative regulators for splicing repression; this hypothesis, if experimentally verified, might improve our understanding of exonic splicing regulatory sequences and provide the identification of a novel mutation target for human genetic diseases.
Affiliation(s)
- Manuela Sironi
- IRCCS E. Medea, Associazione La Nostra Famiglia, 23842 Bosisio Parini, LC, Italy.
|
21
|
Zhang L, Luo L. Splice site prediction with quadratic discriminant analysis using diversity measure. Nucleic Acids Res 2003; 31:6214-20. [PMID: 14576308 PMCID: PMC275452 DOI: 10.1093/nar/gkg805] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.1] [Indexed: 11/14/2022] Open
Abstract
Based on the conservation of nucleotides at splicing sites and the features of base composition and base correlation around these sites, we use the method of increment of diversity combined with quadratic discriminant analysis (IDQD) to study the dependence structure of splicing sites and predict the exons/introns and their boundaries for four model genomes: Caenorhabditis elegans, Arabidopsis thaliana, Drosophila melanogaster and human. The comparison of compositional features between two sequences and the comparison of base dependencies at adjacent or non-adjacent positions of two sequences can be integrated automatically in the increment of diversity (ID). Eight feature variables around a potential splice site are defined in terms of ID. They are integrated in a single formal framework given by IDQD. In our calculations, a 7 (8) base region around the donor (acceptor) sites has been considered in studying the conservation of nucleotides, and sequences of 48 bp on either side of splice sites have been used in studying the compositional and base-correlating features. The windows are enlarged to 16 (donor), 29 (acceptor) and 80 bp (either side) to improve the prediction for human splice sites. The prediction capability of the present method is comparable with that of the leading splice site detector, GeneSplicer.
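As a rough illustration of the increment of diversity (ID) mentioned in this abstract, the Python sketch below uses the form in which the diversity measure is commonly written, D = N log N - sum(n_i log n_i), with ID(X, Y) = D(X + Y) - D(X) - D(Y). The abstract itself does not state the formula, so this definition, the function names, and the use of single-base counts as the feature are assumptions; the eight feature variables and the quadratic discriminant step are not reproduced here.

```python
from collections import Counter
from math import log


def diversity(counts):
    """Diversity measure D = N log N - sum(n_i log n_i) over nonzero counts.

    `counts` maps each category (e.g. a base or a k-mer) to its count.
    Assumed form of the measure; not taken from the abstract.
    """
    n_total = sum(counts.values())
    if n_total == 0:
        return 0.0
    return n_total * log(n_total) - sum(
        n * log(n) for n in counts.values() if n > 0
    )


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two Counter objects.

    Small values mean the two sources have similar composition;
    larger values mean they differ.
    """
    return diversity(x + y) - diversity(x) - diversity(y)
```

For example, `increment_of_diversity(Counter(candidate_window), Counter(reference_sites))` would give one similarity feature for a candidate site, where both argument names are placeholders for sequence strings supplied by the reader.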
Affiliation(s)
- Lirong Zhang
- Laboratory of Theoretical Biophysics, Faculty of Science and Technology, Inner Mongolia University, Hohhot, 010021 China
|
22
|
Fedorov A, Saxonov S, Gilbert W. Regularities of context-dependent codon bias in eukaryotic genes. Nucleic Acids Res 2002; 30:1192-7. [PMID: 11861911 PMCID: PMC101244 DOI: 10.1093/nar/30.5.1192] [Citation(s) in RCA: 79] [Impact Index Per Article: 3.4] [Indexed: 11/12/2022] Open
Abstract
Nucleotides surrounding a codon influence the choice of this particular codon from among the group of possible synonymous codons. The strongest influence on codon usage arises from the nucleotide immediately following the codon and is known as the N1 context. We studied the relative abundance of codons with N1 contexts in genes from four eukaryotes for which the entire genomes have been sequenced: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. For all the studied organisms it was found that 90% of the codons have a statistically significant N1 context-dependent codon bias. The relative abundance of each codon with an N1 context was compared with the relative abundance of the same 4mer oligonucleotide in the whole genome. This comparison showed that in about half of all cases the context-dependent codon bias could not be explained by the sequence composition of the genome. Ranking statistics were applied to compare context-dependent codon biases for codons from different synonymous groups. We found regularities in N1 context-dependent codon bias with respect to the codon nucleotide composition. Codons with the same nucleotides in the second and third positions and the same N1 context have a statistically significant correlation of their relative abundances.
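The N1 context described in this abstract (the nucleotide immediately following a codon) can be tallied with a short Python sketch. The function name `n1_context_counts` and the assumption that the input is a single in-frame coding sequence starting at codon position 1 are illustrative; the abstract's statistical tests and genome-wide 4-mer comparison are not reproduced.

```python
from collections import Counter, defaultdict


def n1_context_counts(cds):
    """Count, for each codon, how often each nucleotide follows it.

    Assumes `cds` is an in-frame coding sequence. The last codon is
    skipped because it has no following nucleotide in the sequence.
    """
    cds = cds.upper()
    counts = defaultdict(Counter)
    for i in range(0, len(cds) - 3, 3):
        codon, n1 = cds[i:i + 3], cds[i + 3]
        counts[codon][n1] += 1
    return counts
```

Dividing each codon's per-context counts by the codon's total count gives the relative abundance of each codon-plus-N1 combination, which is the quantity the abstract compares against genome-wide 4-mer frequencies.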
Affiliation(s)
- Alexei Fedorov
- Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA.
|
23
|
Abstract
Alternative splicing of pre-mRNAs is central to the generation of diversity from the relatively small number of genes in metazoan genomes. Auxiliary cis elements and trans-acting factors are required for the recognition of constitutive and alternatively spliced exons and their inclusion in pre-mRNA. Here, we discuss the regulatory elements that direct alternative splicing and how genome-wide analyses can aid in their identification.
Affiliation(s)
- Andrea N Ladd
- Department of Pathology, Baylor College of Medicine, Houston, TX 77030, USA.
|