1
Cui Y, Zhang C, Cai M. Prediction and feature analysis of intron retention events in plant genome. Comput Biol Chem 2017; 68:219-223. [PMID: 28419974 DOI: 10.1016/j.compbiolchem.2017.04.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 12/30/2016] [Revised: 02/07/2017] [Accepted: 04/11/2017] [Indexed: 12/27/2022]
Abstract
Alternative splicing (AS) is a major contributor to increase the potential informational content of eukaryotic genomes by creating multiple mRNA species and proteins from a single gene. In plants, up to 60% genes are alternatively spliced and the most common type of AS is intron retention (IR). Genomic analyses of IR have illuminated its crucial role in shaping the evolution of genomes, in the control of developmental processes, and in the dynamic regulation of the transcriptome to influence phenotype. To explore the relationship between the sequence feature and the formation mechanism of IR, we statistically analyzed the retained introns and proposed an improved random forest-based hybrid method to predict intron retention events in plant genome. The results indicate that IR has significant relationship with individual introns which have weaker 5' splice sites, lower GC content and less termination codon occurrence. By the method we proposed, 93.48% retained introns can be correctly distinguished from constitutive introns. Strikingly, our study will facilitate a better understanding of underlying mechanisms involved in intron retention.
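Two of the sequence features named in this abstract, GC content and termination codon occurrence, are simple to compute from an intron sequence. The following is a minimal sketch, not the authors' feature extraction: the function names are hypothetical, and counting stop codons in a single chosen reading frame is an assumption, since the abstract does not say how occurrences were counted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard DNA termination codons


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def stop_codon_count(seq: str, frame: int = 0) -> int:
    """Number of stop codons read in the given frame (0, 1, or 2).

    Assumption: codons are read in non-overlapping triplets from `frame`.
    """
    seq = seq.upper()
    return sum(seq[i:i + 3] in STOP_CODONS for i in range(frame, len(seq) - 2, 3))


if __name__ == "__main__":
    intron = "GTAAGTTAGATGA"  # toy sequence, not real data
    print(gc_content(intron))
    print(stop_codon_count(intron, frame=0), stop_codon_count(intron, frame=1))
```

Features like these would then be passed to a classifier; the random-forest hybrid described in the abstract is not reproduced here.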
Affiliation(s)
- Ying Cui
- School of Mechano-Electronic Engineering, Xidian University, Xi'an 710071, China; Center for Polymer Studies and Department of Physics, Boston University, Boston, MA 02215, USA
- Chao Zhang
- School of Mechano-Electronic Engineering, Xidian University, Xi'an 710071, China.
- Meng Cai
- School of Economics and Management, Xidian University, Xi'an 710071, China; Center for Polymer Studies and Department of Physics, Boston University, Boston, MA 02215, USA
2
Lee A, Khiabanian H, Kugelman J, Elliott O, Nagle E, Yu GY, Warren T, Palacios G, Rabadan R. Transcriptome reconstruction and annotation of cynomolgus and African green monkey. BMC Genomics 2014; 15:846. [PMID: 25277458 PMCID: PMC4194418 DOI: 10.1186/1471-2164-15-846] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 04/17/2014] [Accepted: 09/25/2014] [Indexed: 11/10/2022] Open
Abstract
Background Non-human primates (NHPs) and humans share major biological mechanisms, functions, and responses due to their close evolutionary relationship and, as such, provide ideal animal models to study human diseases. RNA expression in NHPs provides specific signatures that are informative of disease mechanisms and therapeutic modes of action. Unlike the human transcriptome, the transcriptomes of major NHP animal models are yet to be comprehensively annotated. Results In this manuscript, employing deep RNA sequencing of seven tissue samples, we characterize the transcriptomes of two commonly used NHP animal models: Cynomolgus macaque (Macaca fascicularis) and African green monkey (Chlorocebus aethiops). We present the Multi-Species Annotation (MSA) pipeline that leverages well-annotated primate species and annotates 99.8% of reconstructed transcripts. We elucidate tissue-specific expression profiles and report 13 experimentally validated novel transcripts in these NHP animal models. Conclusion We report comprehensively annotated transcriptomes of two non-human primates, which we have made publicly available on a customized UCSC Genome Browser interface. The MSA pipeline is also freely available.
Affiliation(s)
- Raul Rabadan
- Department of Biomedical Informatics, Columbia University College of Physicians and Surgeons, New York, New York, NY 10032, USA.
3
Lu J, Li C, Shi C, Balducci J, Huang H, Ji HL, Chang Y, Huang Y. Identification of novel splice variants and exons of human endothelial cell-specific chemotaxic regulator (ECSCR) by bioinformatics analysis. Comput Biol Chem 2012; 41:41-50. [PMID: 23147565 DOI: 10.1016/j.compbiolchem.2012.10.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/16/2012] [Revised: 10/10/2012] [Accepted: 10/11/2012] [Indexed: 01/01/2023]
Abstract
Recent discovery of biological function of endothelial cell-specific chemotaxic regulator (ECSCR), previously known as endothelial cell-specific molecule 2 (ECSM2), in modulating endothelial cell migration, apoptosis, and angiogenesis, has made it an attractive molecule in vascular research. Thus, identification of splice variants of ECSCR could provide new strategies for better understanding its roles in health and disease. In this study, we performed a series of blast searches on the human EST database with known ECSCR cDNA sequence (Variant 1), and identified additional three splice variants (Variants 2-4). When examining the ECSCR gene in the human genome assemblies, we found a large unknown region between Exons 9 and 11. By PCR amplification and sequencing, we partially mapped Exon 10 within this previously unknown region of the ECSCR gene. Taken together, in addition to previously reported human ECSCR, we identified three novel full-length splice variants potentially encoding different protein isoforms. We further defined a total of twelve exons and nearly all exon-intron boundaries of the gene, of which only eight are annotated in current public databases. Our work provides new information on gene structure and alternative splicing of the human ECSCR, which may imply its functional complexity. This undoubtedly opens new opportunities for future investigation of the biological and pathological significance of these ECSCR splice variants.
Affiliation(s)
- Jia Lu
- Department of Obstetrics and Gynecology, Barrow Neurological Institute, St Joseph's Hospital and Medical Center, Phoenix, AZ 85013, USA
4
de Lima Morais DA, Harrison PM. Large-scale evidence for conservation of NMD candidature across mammals. PLoS One 2010; 5:e11695. [PMID: 20657786 PMCID: PMC2908137 DOI: 10.1371/journal.pone.0011695] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 05/12/2010] [Accepted: 06/24/2010] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Alternatively-spliced (AS) forms can vary protein function, intracellular localization and post-translational modifications. AS coupled with mRNA nonsense-mediated decay (NMD) can also control the transcript abundance. Here, we have investigated the genome-scale conservation of alternatively-spliced NMD candidates (AS-NMD candidates), in mammals. METHODOLOGY/PRINCIPAL FINDINGS We mapped >12 million cDNA/EST library transcripts, comprising pooled data from both older and next-generation sequencing techniques, against genomic sequences to annotate AS-NMD candidates generated by in-frame premature termination codons (PTCs), in the human, mouse, rat and cow genomes. In these genomes, we found populations of genes that harbour AS-NMD candidates, varying in number from approximately 149 to 2,051 genes. We discovered that a highly-significant proportion (27%-35%) of AS-NMD candidate genes in mouse, rat and cow, also have human orthologs targeted for NMD. Intron retention was the most abundant type of AS-NMD, ranging from 43% to 67% of genes harbouring an AS-NMD candidate. Groupings of AS-NMD candidate genes either with or without intron retentions also have highly significant AS-NMD conservation, indicating that the trend is not due primarily to conservation of intron retentions. As a subset, the AS-NMD intron retentions are distinguished from non-retained introns by higher GC content, and codon usage similar to the usage in protein-coding sequences. This indicates that most of these alternatively spliced sequences have coded for proteins in the recent evolutionary past. In general, the AS-NMD candidate genes showed a similar pattern of Gene Ontology functional category enrichments in all four species. Genes linked to nucleic-acid interaction and apoptosis, and involved in pathways linked with cancer, were the most common.
Finally, we mapped the AS-NMD candidates to mass spectrometry-derived proteomics data, and gathered evidence of truncated polypeptides for at least 10% of all human AS-NMD candidate transcripts. CONCLUSIONS/SIGNIFICANCE In summary, our analysis provides strong statistical evidence for conservation of functional AS-NMD candidature across Mammalia for a large subset of genes. However, because codon usage of AS-NMD intron retentions is similar to the usage in exons, it is difficult to de-couple conservation of AS-NMD-based regulation from conservation for protein-coding ability, for intron retentions.
Affiliation(s)
- Paul M. Harrison
- Department of Biology, McGill University, Montreal, Quebec, Canada
5
Hiller M, Findeiss S, Lein S, Marz M, Nickel C, Rose D, Schulz C, Backofen R, Prohaska SJ, Reuter G, Stadler PF. Conserved introns reveal novel transcripts in Drosophila melanogaster. Genome Res 2009; 19:1289-300. [PMID: 19458021 DOI: 10.1101/gr.090050.108] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 01/17/2023]
Abstract
Noncoding RNAs that are (like mRNAs) spliced, capped, and polyadenylated have important functions in cellular processes. The inventory of these mRNA-like noncoding RNAs (mlncRNAs), however, is incomplete even in well-studied organisms, and so far, no computational methods exist to predict such RNAs from genomic sequences only. The subclass of these transcripts that is evolutionarily conserved usually has conserved intron positions. We demonstrate here that a genome-wide comparative genomics approach searching for short conserved introns is capable of identifying conserved transcripts with a high specificity. Our approach requires neither an open reading frame nor substantial sequence or secondary structure conservation in the surrounding exons. Thus it identifies spliced transcripts in an unbiased way. After applying our approach to insect genomes, we predict 369 introns outside annotated coding transcripts, of which 131 are confirmed by expressed sequence tags (ESTs) and/or noncoding FlyBase transcripts. Of the remaining 238 novel introns, about half are associated with protein-coding genes, either extending coding or untranslated regions or likely belonging to unannotated coding genes. The remaining 129 introns belong to novel mlncRNAs that are largely unstructured. Using RT-PCR, we verified seven of 12 tested introns in novel mlncRNAs and 11 of 17 introns in novel coding genes. The expression level of all verified mlncRNA transcripts is low but varies during development, which suggests regulation. As conserved introns indicate both purifying selection on the exon-intron structure and conserved expression of the transcript in related species, the novel mlncRNAs are good candidates for functional transcripts.
Affiliation(s)
- Michael Hiller
- Bioinformatics Group, Albert-Ludwigs-University Freiburg, 79110 Freiburg, Germany.
6
Sinha R, Hiller M, Pudimat R, Gausmann U, Platzer M, Backofen R. Improved identification of conserved cassette exons using Bayesian networks. BMC Bioinformatics 2008; 9:477. [PMID: 19014490 PMCID: PMC2621368 DOI: 10.1186/1471-2105-9-477] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 05/28/2008] [Accepted: 11/12/2008] [Indexed: 12/14/2022] Open
Abstract
Background Alternative splicing is a major contributor to the diversity of eukaryotic transcriptomes and proteomes. Currently, large scale detection of alternative splicing using expressed sequence tags (ESTs) or microarrays does not capture all alternative splicing events. Moreover, for many species genomic data is being produced at a far greater rate than corresponding transcript data, hence in silico methods of predicting alternative splicing have to be improved. Results Here, we show that the use of Bayesian networks (BNs) allows accurate prediction of evolutionary conserved exon skipping events. At a stringent false positive rate of 0.5%, our BN achieves an improved true positive rate of 61%, compared to a previously reported 50% on the same dataset using support vector machines (SVMs). Incorporating several novel discriminative features such as intronic splicing regulatory elements leads to the improvement. Features related to mRNA secondary structure increase the prediction performance, corroborating previous findings that secondary structures are important for exon recognition. Random labelling tests rule out overfitting. Cross-validation on another dataset confirms the increased performance. When using the same dataset and the same set of features, the BN matches the performance of an SVM in earlier literature. Remarkably, we could show that about half of the exons which are labelled constitutive but receive a high probability of being alternative by the BN, are in fact alternative exons according to the latest EST data. Finally, we predict exon skipping without using conservation-based features, and achieve a true positive rate of 29% at a false positive rate of 0.5%. Conclusion BNs can be used to achieve accurate identification of alternative exons and provide clues about possible dependencies between relevant features. 
The near-identical performance of the BN and SVM when using the same features shows that good classification depends more on features than on the choice of classifier. Conservation based features continue to be the most informative, and hence distinguishing alternative exons from constitutive ones without using conservation based features remains a challenging problem.
Affiliation(s)
- Rileen Sinha
- Genome Analysis, Leibniz Institute for Age Research, Fritz Lipmann Institute, Jena, Germany.
7
Hiller M, Platzer M. Widespread and subtle: alternative splicing at short-distance tandem sites. Trends Genet 2008; 24:246-55. [PMID: 18394746 DOI: 10.1016/j.tig.2008.03.003] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.1] [Received: 01/30/2008] [Revised: 03/05/2008] [Accepted: 03/06/2008] [Indexed: 12/11/2022]
Abstract
Alternative splicing at donor or acceptor sites located just a few nucleotides apart is widespread in many species. It results in subtle changes in the transcripts and often in the encoded proteins. Several of these tandem splice events contribute to the repertoire of functionally different proteins, whereas many are neutral or deleterious. Remarkably, some of the functional events are differentially spliced in tissues or developmental stages, whereas others exhibit constant splicing ratios, indicating that function is not always associated with differential splicing. Stochastic splice site selection seems to play a major role in these processes. Here, we review recent progress in understanding functional and evolutionary aspects as well as the mechanism of splicing at short-distance tandem sites.
Affiliation(s)
- Michael Hiller
- Bioinformatics Group, Albert-Ludwigs-University Freiburg, 79110 Freiburg, Germany.
8
Hiller M, Szafranski K, Huse K, Backofen R, Platzer M. Selection against tandem splice sites affecting structured protein regions. BMC Evol Biol 2008; 8:89. [PMID: 18366714 PMCID: PMC2279118 DOI: 10.1186/1471-2148-8-89] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 11/27/2007] [Accepted: 03/21/2008] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Alternative selection of splice sites in tandem donors and acceptors is a major mode of alternative splicing. Here, we analyzed whether in-frame tandem sites leading to subtle mRNA insertions/deletions of 3, 6, or 9 nucleotides are under natural selection. RESULTS We found multiple lines of evidence that the human protein coding sequences are under selection against such in-frame tandem splice events, indicating that these events are often deleterious. The strength of selection is not homogeneous within the coding sequence as protein regions that fold into a fixed 3D structure (intrinsically ordered) are under stronger selection, especially against sites with a strong minor splice site. Investigating structures of functional protein domains, we found that tandem acceptors are preferentially located at the domain surface and outside structural elements such as helices and sheets. Using three-species comparisons, we estimate that more than half of all mutations that create NAGNAG acceptors in the coding region have been eliminated by selection. CONCLUSION We estimate that ~2,400 introns are under selection against possessing a tandem site.
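The NAGNAG tandem acceptors mentioned in this abstract follow a fixed six-base pattern (any base, then AG, twice), so candidate sites can be located with a simple motif scan. The sketch below is illustrative only: the function name is hypothetical, it scans one strand of a plain DNA string, and it does not check that a match sits at an annotated intron-exon boundary or reproduce the selection analysis.

```python
import re

# N-A-G-N-A-G, wrapped in a lookahead so overlapping candidates are all reported.
NAGNAG = re.compile(r"(?=([ACGT]AG[ACGT]AG))")


def find_nagnag(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for every NAGNAG occurrence in seq."""
    return [(m.start(), m.group(1)) for m in NAGNAG.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence, not real data: one CAGCAG motif starting at index 3.
    print(find_nagnag("TTTCAGCAGGT"))
```

Choosing between the two AG dinucleotides of such a motif shifts the splice junction by three nucleotides, which is the smallest of the in-frame insertions/deletions the abstract refers to.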
Affiliation(s)
- Michael Hiller
- Bioinformatics Group, Albert-Ludwigs-University Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany.
9
Holste D, Ohler U. Strategies for identifying RNA splicing regulatory motifs and predicting alternative splicing events. PLoS Comput Biol 2008; 4:e21. [PMID: 18225947 PMCID: PMC2217580 DOI: 10.1371/journal.pcbi.0040021] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 01/09/2023] Open
Affiliation(s)
- Dirk Holste
- Uwe Ohler
10
Kurmangaliyev YZ, Gelfand MS. Computational analysis of splicing errors and mutations in human transcripts. BMC Genomics 2008; 9:13. [PMID: 18194514 PMCID: PMC2234086 DOI: 10.1186/1471-2164-9-13] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 04/20/2007] [Accepted: 01/14/2008] [Indexed: 01/10/2023] Open
Abstract
Background Most retained introns found in human cDNAs generated by high-throughput sequencing projects seem to result from underspliced transcripts, and thus they capture intermediate steps of pre-mRNA splicing. On the other hand, mutations in splice sites cause exon skipping of the respective exon or activation of pre-existing cryptic sites. Both types of events reflect properties of the splicing mechanism. Results The retained introns were significantly shorter than constitutive ones, and skipped exons are shorter than exons with cryptic sites. Both donor and acceptor splice sites of retained introns were weaker than splice sites of constitutive introns. The authentic acceptor sites affected by mutations were significantly weaker in exons with activated cryptic sites than in skipped exons. The distance from a mutated splice site to the nearest equivalent site is significantly shorter in cases of activated cryptic sites compared to exon skipping events. The prevalence of retained introns within genes monotonically increased in the 5'-to-3' direction (more retained introns close to the 3'-end), consistent with the model of co-transcriptional splicing. The density of exonic splicing enhancers was higher, and the density of exonic splicing silencers lower in retained introns compared to constitutive ones and in exons with cryptic sites compared to skipped exons. Conclusion Thus the analysis of retained introns in human cDNA, exons skipped due to mutations in splice sites and exons with cryptic sites produced results consistent with the intron definition mechanism of splicing of short introns, co-transcriptional splicing, dependence of splicing efficiency on the splice site strength and the density of candidate exonic splicing enhancers and silencers. These results are consistent with other, recently published analyses.
Affiliation(s)
- Yerbol Z Kurmangaliyev
- Institute for Information Transmission Problems (the Kharkevich Institute) RAS, Bolshoi Karetny pereulok 19, Moscow, 127994, Russia.
11
Abstract
In recent years, genome-wide detection of alternative splicing based on Expressed Sequence Tag (EST) sequence alignments with mRNA and genomic sequences has dramatically expanded our understanding of the role of alternative splicing in functional regulation. This chapter reviews the data, methodology, and technical challenges of these genome-wide analyses of alternative splicing, and briefly surveys some of the uses to which such alternative splicing databases have been put. For example, with proper alternative splicing database schema design, it is possible to query genome-wide for alternative splicing patterns that are specific to particular tissues, disease states (e.g., cancer), gender, or developmental stages. EST alignments can be used to estimate exon inclusion or exclusion level of alternatively spliced exons and evolutionary changes for various species can be inferred from exon inclusion level. Such databases can also help automate design of probes for RT-PCR and microarrays, enabling high throughput experimental measurement of alternative splicing.
12
Leparc GG, Mitra RD. A sensitive procedure to detect alternatively spliced mRNA in pooled-tissue samples. Nucleic Acids Res 2007; 35:e146. [PMID: 18000005 PMCID: PMC2175357 DOI: 10.1093/nar/gkm989] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022] Open
Abstract
One important goal of genomics is to explore the extent of alternative splicing in the transcriptome and generate a comprehensive catalog of splice forms. New computational and experimental approaches have led to an increase in the number of predicted alternatively spliced transcripts; however, validation of these predictions has not kept pace. In this work, we systematically explore different methods for the validation of cassette exons predicted by computational methods or tiling microarrays. Our goal was to find a procedure that is cost effective, sensitive and specific. We examined three ways of priming the reverse transcription (RT) reaction—poly-dT priming, random priming and pooled exon-specific priming. We also examined two strategies for PCR amplification—flanking PCR, which uses primers that hybridize to the constitutive exons flanking the predicted exon, and a semi-nested PCR with a primer that targets the predicted exon. We found that the combination of RT using a pool of gene-specific primers followed by semi-nested PCR resulted in a significant increase in sensitivity over the most commonly used methodology (97% of the test set was detected versus 14%). Our method was also highly specific—no false positives were detected using a test set of true negatives. Finally, we demonstrate that this method is able to detect alternative exons with a high sensitivity from whole-organism RNA, allowing all tissues to be sampled in a single experiment. The protocol developed here is an accurate and cost-effective way to validate predictions of alternative splicing.
Affiliation(s)
- Germán Gastón Leparc
- Department of Genetics and Center for Genome Sciences, Washington University in St Louis, 4444 Forest Park Parkway, Campus Box 8510, St Louis, MO 63108, USA
13
Artamonova II, Gelfand MS. Comparative Genomics and Evolution of Alternative Splicing: The Pessimists' Science. Chem Rev 2007; 107:3407-30. [PMID: 17645315 DOI: 10.1021/cr068304c] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Affiliation(s)
- Irena I Artamonova
- Group of Bioinformatics, Vavilov Institute of General Genetics, RAS, Gubkina 3, Moscow 119991, Russia
14
Leparc GG, Mitra RD. Non-EST-based prediction of novel alternatively spliced cassette exons with cell signaling function in Caenorhabditis elegans and human. Nucleic Acids Res 2007; 35:3192-202. [PMID: 17452356 PMCID: PMC1904267 DOI: 10.1093/nar/gkm187] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/25/2022] Open
Abstract
To better understand the complex role that alternative splicing plays in intracellular signaling, it is important to catalog the numerous splice variants involved in signal transduction. Therefore, we developed PASE (Prediction of Alternative Signaling Exons), a computational tool to identify novel alternative cassette exons that code for kinase phosphorylation or signaling protein-binding sites. We first applied PASE to the Caenorhabditis elegans genome. In this organism, our algorithm had an overall specificity of ≥76.4%, including 33 novel cassette exons that we experimentally verified. We then used PASE to analyze the human genome and made 804 predictions, of which 308 were found as alternative exons in the transcript database. We experimentally tested 384 of the remaining unobserved predictions and discovered 26 novel human exons for a total specificity of ≥41.5% in human. By using a test set of known alternatively spliced signaling exons, we determined that the sensitivity of PASE is ∼70%. GO term analysis revealed that our exon predictions were found in the introns of known signal transduction genes more often than expected by chance, indicating PASE enriches for splice variants that function in signaling pathways. Overall, PASE was able to uncover 59 novel alternative cassette exons in C. elegans and humans through a genome-wide ab initio prediction method that enriches for exons involved in signaling.
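The human figure of ≥41.5% in this abstract can be checked from the counts it quotes: 308 predictions already present in the transcript database plus 26 newly validated exons, out of 804 predictions. The short sketch below redoes that arithmetic; the variable and function names are hypothetical. Note that the quantity the abstract calls specificity is computed here as the fraction of predictions that were confirmed, and it is a lower bound because only 384 of the unobserved predictions were tested.

```python
# Counts quoted in the abstract above.
human_predictions = 804
found_in_transcript_db = 308
novel_validated_human = 26
novel_validated_worm = 33


def confirmed_fraction(confirmed: int, predicted: int) -> float:
    """Fraction of predictions with supporting evidence."""
    return confirmed / predicted


human_lower_bound = confirmed_fraction(
    found_in_transcript_db + novel_validated_human, human_predictions
)
total_novel = novel_validated_worm + novel_validated_human

if __name__ == "__main__":
    print(f"{human_lower_bound:.1%}")  # 41.5%
    print(total_novel)                 # 59
```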
Affiliation(s)
- Robi David Mitra
- To whom correspondence should be addressed. Tel: +1-314-362-2751; Fax: +1-314-362-2156
15
Vukusic I, Grellscheid SN, Wiehe T. Applying genetic programming to the prediction of alternative mRNA splice variants. Genomics 2007; 89:471-9. [PMID: 17276654 DOI: 10.1016/j.ygeno.2007.01.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 08/05/2006] [Revised: 12/22/2006] [Accepted: 01/02/2007] [Indexed: 11/22/2022]
Abstract
Genetic programming (GP) can be used to classify a given gene sequence as either constitutively or alternatively spliced. We describe the principles of GP and apply it to a well-defined data set of alternatively spliced genes. A feature matrix of sequence properties, such as nucleotide composition or exon length, was passed to the GP system "Discipulus." To test its performance we concentrated on cassette exons (SCE) and retained introns (SIR). We analyzed 27,519 constitutively spliced and 9641 cassette exons including their neighboring introns; in addition we analyzed 33,316 constitutively spliced introns compared to 2712 retained introns. We find that the classifier yields highly accurate predictions on the SIR data with a sensitivity of 92.1% and a specificity of 79.2%. Prediction accuracies on the SCE data are lower, 47.3% (sensitivity) and 70.9% (specificity), indicating that alternative splicing of introns can be better captured by sequence properties than that of exons.
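The sensitivity and specificity reported in this abstract are the standard confusion-matrix ratios. As a reminder of how they are derived, here is a minimal sketch using the usual definitions; the function names are hypothetical and the counts in the usage example are invented toy values, not the study's data.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of truly alternative cases that were detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of constitutive cases correctly left unflagged."""
    return tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical toy counts for a retained-intron classifier.
    tp, fn, tn, fp = 9, 1, 8, 2
    print(sensitivity(tp, fn), specificity(tn, fp))
```

With class sizes as unbalanced as those in the abstract (33,316 constitutive versus 2712 retained introns), the two rates are worth reading together, since a moderate specificity can still correspond to many false positives in absolute terms.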
Affiliation(s)
- Ivana Vukusic
- Institut für Genetik, Universität zu Köln, Zülpicher Strasse 47, 50674 Köln, Germany
16
Xia H, Bi J, Li Y. Identification of alternative 5'/3' splice sites based on the mechanism of splice site competition. Nucleic Acids Res 2006; 34:6305-13. [PMID: 17098928 PMCID: PMC1669764 DOI: 10.1093/nar/gkl900] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 04/14/2006] [Revised: 08/01/2006] [Accepted: 10/12/2006] [Indexed: 11/30/2022] Open
Abstract
Alternative splicing plays an important role in regulating gene expression. Currently, most efficient methods use expressed sequence tags or microarray analysis for large-scale detection of alternative splicing. However, it is difficult to detect all alternative splice events with them because of their inherent limitations. Previous computational methods for alternative splicing prediction could only predict particular kinds of alternative splice events. Thus, it would be highly desirable to predict alternative 5'/3' splice sites with various splicing levels using genomic sequences alone. Here, we introduce the competition mechanism of splice sites selection into alternative splice site prediction. This approach allows us to predict not only rarely used but also frequently used alternative splice sites. On a dataset extracted from the AltSplice database, our method correctly classified approximately 70% of the splice sites into alternative and constitutive, as well as approximately 80% of the locations of real competitors for alternative splice sites. It outperforms a method which only considers features extracted from the splice sites themselves. Furthermore, this approach can also predict the changes in activation level arising from mutations in flanking cryptic splice sites of a given splice site. Our approach might be useful for studying alternative splicing in both computational and molecular biology.
Affiliation(s)
- Huiyu Xia
- Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China.
17
Li Y, Bor YC, Misawa Y, Xue Y, Rekosh D, Hammarskjöld ML. An intron with a constitutive transport element is retained in a Tap messenger RNA. Nature 2006; 443:234-7. [PMID: 16971948 DOI: 10.1038/nature05107] [Citation(s) in RCA: 104] [Impact Index Per Article: 5.5] [Received: 05/06/2006] [Accepted: 07/24/2006] [Indexed: 11/09/2022]
Abstract
Alternative splicing is a key factor contributing to genetic diversity and evolution. Intron retention, one form of alternative splicing, is common in plants but rare in higher eukaryotes, because messenger RNAs with retained introns are subject to cellular restriction at the level of cytoplasmic export and expression. Often, retention of internal introns restricts the export of these mRNAs and makes them the targets for degradation by the cellular nonsense-mediated decay machinery if they contain premature stop codons. In fact, many of the database entries for complementary DNAs with retained introns represent them as artefacts that would not affect the proteome. Retroviruses are important model systems in studies of regulation of RNAs with retained introns, because their genomic and mRNAs contain one or more unspliced introns. For example, Mason-Pfizer monkey virus overcomes cellular restrictions by using a cis-acting RNA element known as the constitutive transport element (CTE). The CTE interacts directly with the Tap protein (also known as nuclear RNA export factor 1, encoded by NXF1), which is thought to be a principal export receptor for cellular mRNA, leading to the hypothesis that cellular mRNAs with retained introns use cellular CTE equivalents to overcome restrictions to their expression. Here we show that the Tap gene contains a functional CTE in its alternatively spliced intron 10. Tap mRNA containing this intron is exported to the cytoplasm and is present in polyribosomes. A small Tap protein is encoded by this mRNA and can be detected in human and monkey cells. Our results indicate that Tap regulates expression of its own intron-containing RNA through a CTE-mediated mechanism. Thus, CTEs are likely to be important elements that facilitate efficient expression of mammalian mRNAs with retained introns.
Affiliation(s)
- Ying Li
- Myles H. Thaler Center for AIDS & Human Retrovirus Research and Department of Microbiology, University of Virginia, Charlottesville, Virginia 22908, USA
|
18
|
Allen JE, Salzberg SL. A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons. Algorithms Mol Biol 2006; 1:14. [PMID: 16934144 PMCID: PMC1570466 DOI: 10.1186/1748-7188-1-14] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 04/24/2006] [Accepted: 08/25/2006] [Indexed: 11/30/2022]
Abstract
BACKGROUND: An important challenge in eukaryotic gene prediction is accurate identification of alternatively spliced exons. Functional transcripts can go undetected in gene expression studies when alternative splicing only occurs under specific biological conditions. Non-expression based computational methods support identification of rarely expressed transcripts. RESULTS: A non-expression based statistical method is presented to annotate alternatively spliced exons using a single genome sequence and evidence from cross-species sequence conservation. The computational method is implemented in the program ExAlt and an analysis of prediction accuracy is given for Drosophila melanogaster. CONCLUSION: ExAlt identifies the structure of most alternatively spliced exons in the test set and cross-species sequence conservation is shown to improve the precision of predictions. The software package is available to run on Drosophila genomes to search for new cases of alternative splicing.
Affiliation(s)
- Jonathan E Allen
- Center for Bioinformatics and Computational Biology, University of Maryland Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
- Department of Computer Science, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD 21218, USA
- Steven L Salzberg
- Center for Bioinformatics and Computational Biology, University of Maryland Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
|
19
|
Zavolan M, van Nimwegen E. The types and prevalence of alternative splice forms. Curr Opin Struct Biol 2006; 16:362-7. [PMID: 16713247 DOI: 10.1016/j.sbi.2006.05.002] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 02/01/2006] [Revised: 04/10/2006] [Accepted: 05/02/2006] [Indexed: 01/01/2023]
Abstract
The finding that eukaryotic gene structures are extremely complex prompted the development of new experimental techniques for the accurate measurement of transcription start site usage and of the expression of alternative splice forms. On the computational side, analyses of large databases of splice variants revealed differences in the length, motif composition and selection pressure between constitutive and alternatively spliced exons. Such features are being incorporated into novel computational tools for gene structure prediction. The result of these investigations is a continuously improving catalogue of alternative splice forms. How the expression of these alternative splice forms is regulated remains one of the major open questions.
Affiliation(s)
- Mihaela Zavolan
- Division of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50-70, Basel, CH-4056, Switzerland
|