1
Ura H, Hatanaka H, Togi S, Niida Y. Computational Comparison of Differential Splicing Tools for Targeted RNA Long-Amplicon Sequencing (rLAS). Int J Mol Sci 2025; 26:3220. [PMID: 40244027; PMCID: PMC11989494; DOI: 10.3390/ijms26073220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2025] [Revised: 03/24/2025] [Accepted: 03/27/2025] [Indexed: 04/18/2025] Open
Abstract
RNA sequencing (RNA-Seq) is a powerful technique for the quantification of transcripts and the analysis of alternative splicing. Previously, our laboratory developed the targeted RNA long-amplicon sequencing (rLAS) method, which has the advantage of allowing deep analysis of targeted specific transcripts. The computational tools for analyzing RNA-Seq data have boosted alternative splicing research by detecting and quantifying splicing events. However, the performance of these splicing tools has not yet been investigated for rLAS. Here, we evaluated the performance of four splicing tools (MAJIQ, rMATS, MISO, and SplAdder) using samples with different types of known splicing events (exon-skipping, multiple-exon-skipping, alternative 5' splicing, and alternative 3' splicing). MAJIQ was able to detect all of the types of events, but it was unable to detect one of the exon-skipping events. On the other hand, rMATS was able to detect all of the exon-skipping events. However, rMATS failed to detect other types of events besides exon-skipping events. Both MISO and SplAdder were unable to detect any of the events. These results indicate that MAJIQ presents better performance for the different types of splicing events in rLAS and that rMATS shows better performance for exon-skipping splicing events.
Affiliation(s)
- Hiroki Ura
  - Center for Clinical Genomics, Kanazawa Medical University Hospital, 1-1 Daigaku, Uchinada, Kahoku 920-0923, Ishikawa, Japan
  - Division of Genomic Medicine, Department of Advanced Medicine, Medical Research Institute, Kanazawa Medical University, 1-1 Daigaku, Uchinada, Kahoku 920-0923, Ishikawa, Japan
- Hisayo Hatanaka
  - Center for Clinical Genomics, Kanazawa Medical University Hospital, 1-1 Daigaku, Uchinada, Kahoku 920-0923, Ishikawa, Japan
- Sumihito Togi
  - Center for Clinical Genomics, Kanazawa Medical University Hospital, 1-1 Daigaku, Uchinada, Kahoku 920-0923, Ishikawa, Japan
  - Division of Genomic Medicine, Department of Advanced Medicine, Medical Research Institute, Kanazawa Medical University, 1-1 Daigaku, Uchinada, Kahoku 920-0923, Ishikawa, Japan
- Yo Niida
  - Center for Clinical Genomics, Kanazawa Medical University Hospital, 1-1 Daigaku, Uchinada, Kahoku 920-0923, Ishikawa, Japan
  - Division of Genomic Medicine, Department of Advanced Medicine, Medical Research Institute, Kanazawa Medical University, 1-1 Daigaku, Uchinada, Kahoku 920-0923, Ishikawa, Japan
2
Wang Y, Xie Z, Kutschera E, Adams JI, Kadash-Edmondson KE, Xing Y. rMATS-turbo: an efficient and flexible computational tool for alternative splicing analysis of large-scale RNA-seq data. Nat Protoc 2024; 19:1083-1104. [PMID: 38396040; DOI: 10.1038/s41596-023-00944-2] [Citation(s) in RCA: 47] [Impact Index Per Article: 47.0] [Received: 08/13/2022] [Accepted: 11/02/2023] [Indexed: 02/25/2024]
Abstract
Pre-mRNA alternative splicing is a prevalent mechanism for diversifying eukaryotic transcriptomes and proteomes. Regulated alternative splicing plays a role in many biological processes, and dysregulated alternative splicing is a feature of many human diseases. Short-read RNA sequencing (RNA-seq) is now the standard approach for transcriptome-wide analysis of alternative splicing. Since 2011, our laboratory has developed and maintained Replicate Multivariate Analysis of Transcript Splicing (rMATS), a computational tool for discovering and quantifying alternative splicing events from RNA-seq data. Here we provide a protocol for the contemporary version of rMATS, rMATS-turbo, a fast and scalable re-implementation that maintains the statistical framework and user interface of the original rMATS software, while incorporating a revamped computational workflow with a substantial improvement in speed and data storage efficiency. The rMATS-turbo software scales up to massive RNA-seq datasets with tens of thousands of samples. To illustrate the utility of rMATS-turbo, we describe two representative application scenarios. First, we describe a broadly applicable two-group comparison to identify differential alternative splicing events between two sample groups, including both annotated and novel alternative splicing events. Second, we describe a quantitative analysis of alternative splicing in a large-scale RNA-seq dataset (~1,000 samples), including the discovery of alternative splicing events associated with distinct cell states. We detail the workflow and features of rMATS-turbo that enable efficient parallel processing and analysis of large-scale RNA-seq datasets on a compute cluster. We anticipate that this protocol will help the broad user base of rMATS-turbo make the best use of this software for studying alternative splicing in diverse biological systems.
Affiliation(s)
- Yuanyuan Wang
  - Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA, USA
  - Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Zhijie Xie
  - Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Eric Kutschera
  - Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Jenea I Adams
  - Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA, USA
- Kathryn E Kadash-Edmondson
  - Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Yi Xing
  - Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
3
Ascensão-Ferreira M, Martins-Silva R, Saraiva-Agostinho N, Barbosa-Morais NL. betAS: intuitive analysis and visualization of differential alternative splicing using beta distributions. RNA 2024; 30:337-353. [PMID: 38278530; PMCID: PMC10946425; DOI: 10.1261/rna.079764.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 01/15/2024] [Indexed: 01/28/2024]
Abstract
Next-generation RNA sequencing allows alternative splicing (AS) quantification with unprecedented resolution, with the relative inclusion of an alternative sequence in transcripts being commonly quantified by the proportion of reads supporting it as percent spliced-in (PSI). However, PSI values do not incorporate information about precision, proportional to the respective AS events' read coverage. Beta distributions are suitable to quantify inclusion levels of alternative sequences, using reads supporting their inclusion and exclusion as surrogates for the two distribution shape parameters. Each such beta distribution has the PSI as its mean value and is narrower when the read coverage is higher, facilitating the interpretability of its precision when plotted. We herein introduce a computational pipeline, based on beta distributions accurately modeling PSI values and their precision, to quantitatively and visually compare AS between groups of samples. Our methodology includes a differential splicing significance metric that compromises the magnitude of intergroup differences, the estimation uncertainty in individual samples, and the intragroup variability, being therefore suitable for multiple-group comparisons. To make our approach accessible and clear to both noncomputational and computational biologists, we developed betAS, an interactive web app and user-friendly R package for visual and intuitive differential splicing analysis from read count data.
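The relationship described in this abstract between read counts, PSI, and precision can be sketched in a few lines. This is only an illustration and not the betAS implementation: the function name `psi_beta_summary` and the example read counts are hypothetical, and the sketch assumes the inclusion and exclusion read counts are used directly as the two beta shape parameters, as the abstract suggests.

```python
from math import sqrt


def psi_beta_summary(inclusion_reads, exclusion_reads):
    """Return (mean, standard deviation) of Beta(inclusion, exclusion).

    Assumption: read counts are used directly as the shape parameters.
    Both counts must be positive for the beta distribution to be defined.
    """
    a = float(inclusion_reads)
    b = float(exclusion_reads)
    if a <= 0 or b <= 0:
        raise ValueError("both read counts must be positive")
    n = a + b
    mean = a / n                          # equals PSI = inclusion / total
    variance = (a * b) / (n * n * (n + 1.0))  # standard beta variance
    return mean, sqrt(variance)


# Hypothetical events with the same PSI but a tenfold coverage difference.
low_cov = psi_beta_summary(8, 2)
high_cov = psi_beta_summary(80, 20)
print(low_cov, high_cov)
```

Both hypothetical events have the same mean (the PSI), while the higher-coverage event has the smaller standard deviation, which is the narrowing with coverage that the abstract describes.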
Affiliation(s)
- Mariana Ascensão-Ferreira
  - Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, Lisboa 1649-028, Portugal
- Rita Martins-Silva
  - Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, Lisboa 1649-028, Portugal
- Nuno Saraiva-Agostinho
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Nuno L Barbosa-Morais
  - Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, Lisboa 1649-028, Portugal
4
Kordala AJ, Stoodley J, Ahlskog N, Hanifi M, Garcia Guerra A, Bhomra A, Lim WF, Murray LM, Talbot K, Hammond SM, Wood MJA, Rinaldi C. PRMT inhibitor promotes SMN2 exon 7 inclusion and synergizes with nusinersen to rescue SMA mice. EMBO Mol Med 2023; 15:e17683. [PMID: 37724723; PMCID: PMC10630883; DOI: 10.15252/emmm.202317683] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/07/2023] [Revised: 08/31/2023] [Accepted: 09/01/2023] [Indexed: 09/21/2023] Open
Abstract
Spinal muscular atrophy (SMA) is a leading genetic cause of infant mortality. The advent of approved treatments for this devastating condition has significantly changed SMA patients' life expectancy and quality of life. Nevertheless, these are not without limitations, and research efforts are underway to develop new approaches for improved and long-lasting benefits for patients. Protein arginine methyltransferases (PRMTs) are emerging as druggable epigenetic targets, with several small-molecule PRMT inhibitors already in clinical trials. From a screen of epigenetic molecules, we have identified MS023, a potent and selective type I PRMT inhibitor able to promote SMN2 exon 7 inclusion in preclinical SMA models. Treatment of SMA mice with MS023 results in amelioration of the disease phenotype, with strong synergistic amplification of the positive effect when delivered in combination with the antisense oligonucleotide nusinersen. Moreover, transcriptomic analysis revealed that MS023 treatment has minimal off-target effects, and the added benefit is mainly due to targeting neuroinflammation. Our study warrants further clinical investigation of PRMT inhibition both as a stand-alone and add-on therapy for SMA.
Affiliation(s)
- Anna J Kordala
  - Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK
  - Department of Paediatrics, University of Oxford, Oxford, UK
  - Institute of Developmental and Regenerative Medicine (IDRM), Oxford, UK
- Jessica Stoodley
  - Department of Paediatrics, University of Oxford, Oxford, UK
  - Institute of Developmental and Regenerative Medicine (IDRM), Oxford, UK
- Nina Ahlskog
  - Department of Paediatrics, University of Oxford, Oxford, UK
  - Institute of Developmental and Regenerative Medicine (IDRM), Oxford, UK
- Antonio Garcia Guerra
  - Department of Paediatrics, University of Oxford, Oxford, UK
  - Institute of Developmental and Regenerative Medicine (IDRM), Oxford, UK
- Amarjit Bhomra
  - Department of Paediatrics, University of Oxford, Oxford, UK
  - Institute of Developmental and Regenerative Medicine (IDRM), Oxford, UK
- Wooi Fang Lim
  - Department of Paediatrics, University of Oxford, Oxford, UK
  - Institute of Developmental and Regenerative Medicine (IDRM), Oxford, UK
- Lyndsay M Murray
  - Centre for Discovery Brain Sciences, College of Medicine and Veterinary Medicine, University of Edinburgh, Edinburgh, UK
  - Euan McDonald Centre for Motor Neuron Disease Research, University of Edinburgh, Edinburgh, UK
- Kevin Talbot
  - Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK
  - Kavli Institute for Nanoscience Discovery, University of Oxford, Oxford, UK
- Matthew JA Wood
  - Department of Paediatrics, University of Oxford, Oxford, UK
  - Institute of Developmental and Regenerative Medicine (IDRM), Oxford, UK
  - MDUK Oxford Neuromuscular Centre, Oxford, UK
- Carlo Rinaldi
  - Department of Paediatrics, University of Oxford, Oxford, UK
  - Institute of Developmental and Regenerative Medicine (IDRM), Oxford, UK
  - MDUK Oxford Neuromuscular Centre, Oxford, UK
5
Montero-Hidalgo AJ, Pérez-Gómez JM, Martínez-Fuentes AJ, Gómez-Gómez E, Gahete MD, Jiménez-Vacas JM, Luque RM. Alternative splicing in bladder cancer: potential strategies for cancer diagnosis, prognosis, and treatment. Wiley Interdiscip Rev RNA 2023; 14:e1760. [PMID: 36063028; DOI: 10.1002/wrna.1760] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/09/2022] [Revised: 07/25/2022] [Accepted: 08/05/2022] [Indexed: 05/13/2023]
Abstract
Bladder cancer is the most common malignancy of the urinary tract worldwide. The therapeutic options to tackle this disease comprise surgery, intravesical or systemic chemotherapy, and immunotherapy. Unfortunately, a large number of patients ultimately become resistant to these treatments and develop aggressive metastatic disease with a poor prognosis. Therefore, the identification of novel therapeutic approaches to tackle this devastating pathology is urgently needed. However, a significant limitation is that the progression and drug response of bladder cancer are strongly associated with its intrinsic molecular heterogeneity. In this sense, RNA splicing has recently gained importance as a critical hallmark of cancer, since it can have significant clinical value. In fact, a profound dysregulation of the splicing process has been reported in bladder cancer, especially in the expression of certain key splicing variants and circular RNAs with potential clinical value as diagnostic/prognostic biomarkers or therapeutic targets in this pathology. Indeed, some authors have already demonstrated a profound antitumor effect by targeting some splicing factors (e.g., PTBP1), mRNA splicing variants (e.g., PKM2, HYAL4-v1), and circular RNAs (e.g., circITCH, circMYLK), which illustrates new possibilities to significantly improve the management of this pathology. This review represents the first detailed overview of the splicing process and its alterations in bladder cancer, and highlights opportunities for the development of novel diagnostic/prognostic biomarkers and their clinical potential for the treatment of this devastating cancer type. This article is categorized under: RNA Processing > Splicing Regulation/Alternative Splicing; RNA in Disease and Development > RNA in Disease.
Affiliation(s)
- Antonio J Montero-Hidalgo
  - Maimonides Biomedical Research Institute of Cordoba (IMIBIC), Cordoba, 14004, Spain
  - Department of Cell Biology, Physiology and Immunology, University of Cordoba, Cordoba, 14004, Spain
  - Reina Sofia University Hospital (HURS), Cordoba, 14004, Spain
  - CIBER Physiopathology of Obesity and Nutrition (CIBERobn), Cordoba, 14004, Spain
- Jesús M Pérez-Gómez
  - Maimonides Biomedical Research Institute of Cordoba (IMIBIC), Cordoba, 14004, Spain
  - Department of Cell Biology, Physiology and Immunology, University of Cordoba, Cordoba, 14004, Spain
  - Reina Sofia University Hospital (HURS), Cordoba, 14004, Spain
  - CIBER Physiopathology of Obesity and Nutrition (CIBERobn), Cordoba, 14004, Spain
- Antonio J Martínez-Fuentes
  - Maimonides Biomedical Research Institute of Cordoba (IMIBIC), Cordoba, 14004, Spain
  - Department of Cell Biology, Physiology and Immunology, University of Cordoba, Cordoba, 14004, Spain
  - Reina Sofia University Hospital (HURS), Cordoba, 14004, Spain
  - CIBER Physiopathology of Obesity and Nutrition (CIBERobn), Cordoba, 14004, Spain
- Enrique Gómez-Gómez
  - Maimonides Biomedical Research Institute of Cordoba (IMIBIC), Cordoba, 14004, Spain
  - Reina Sofia University Hospital (HURS), Cordoba, 14004, Spain
  - Urology Service, HURS/IMIBIC, Cordoba, 14004, Spain
- Manuel D Gahete
  - Maimonides Biomedical Research Institute of Cordoba (IMIBIC), Cordoba, 14004, Spain
  - Department of Cell Biology, Physiology and Immunology, University of Cordoba, Cordoba, 14004, Spain
  - Reina Sofia University Hospital (HURS), Cordoba, 14004, Spain
  - CIBER Physiopathology of Obesity and Nutrition (CIBERobn), Cordoba, 14004, Spain
- Juan M Jiménez-Vacas
  - Maimonides Biomedical Research Institute of Cordoba (IMIBIC), Cordoba, 14004, Spain
  - Department of Cell Biology, Physiology and Immunology, University of Cordoba, Cordoba, 14004, Spain
  - Reina Sofia University Hospital (HURS), Cordoba, 14004, Spain
  - CIBER Physiopathology of Obesity and Nutrition (CIBERobn), Cordoba, 14004, Spain
- Raúl M Luque
  - Maimonides Biomedical Research Institute of Cordoba (IMIBIC), Cordoba, 14004, Spain
  - Department of Cell Biology, Physiology and Immunology, University of Cordoba, Cordoba, 14004, Spain
  - Reina Sofia University Hospital (HURS), Cordoba, 14004, Spain
  - CIBER Physiopathology of Obesity and Nutrition (CIBERobn), Cordoba, 14004, Spain
6
Rosenkranz RRE, Ullrich S, Löchli K, Simm S, Fragkostefanakis S. Relevance and Regulation of Alternative Splicing in Plant Heat Stress Response: Current Understanding and Future Directions. Front Plant Sci 2022; 13:911277. [PMID: 35812973; PMCID: PMC9260394; DOI: 10.3389/fpls.2022.911277] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 04/02/2022] [Accepted: 05/26/2022] [Indexed: 05/26/2023]
Abstract
Alternative splicing (AS) is a major mechanism for gene expression in eukaryotes, increasing proteome diversity but also regulating transcriptome abundance. High temperatures have a strong impact on the splicing profile of many genes and therefore AS is considered as an integral part of heat stress response. While many studies have established a detailed description of the diversity of the RNAome under heat stress in different plant species and stress regimes, little is known on the underlying mechanisms that control this temperature-sensitive process. AS is mainly regulated by the activity of splicing regulators. Changes in the abundance of these proteins through transcription and AS, post-translational modifications and interactions with exonic and intronic cis-elements and core elements of the spliceosomes modulate the outcome of pre-mRNA splicing. As a major part of pre-mRNAs are spliced co-transcriptionally, the chromatin environment along with the RNA polymerase II elongation play a major role in the regulation of pre-mRNA splicing under heat stress conditions. Despite its importance, our understanding on the regulation of heat stress sensitive AS in plants is scarce. In this review, we summarize the current status of knowledge on the regulation of AS in plants under heat stress conditions. We discuss possible implications of different pathways based on results from non-plant systems to provide a perspective for researchers who aim to elucidate the molecular basis of AS under high temperatures.
Affiliation(s)
- Sarah Ullrich
  - Molecular Cell Biology of Plants, Goethe University Frankfurt, Frankfurt, Germany
- Karin Löchli
  - Molecular Cell Biology of Plants, Goethe University Frankfurt, Frankfurt, Germany
- Stefan Simm
  - Institute of Bioinformatics, University Medicine Greifswald, Greifswald, Germany
7
Singh P, Ahi EP. The importance of alternative splicing in adaptive evolution. Mol Ecol 2022; 31:1928-1938. [DOI: 10.1111/mec.16377] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/20/2021] [Revised: 01/06/2022] [Accepted: 01/25/2022] [Indexed: 11/26/2022]
Affiliation(s)
- Pooja Singh
  - Department of Biological Sciences, University of Calgary, Calgary, Canada
  - Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
  - Swiss Federal Institute of Aquatic Science and Technology (EAWAG), Kastanienbaum, Switzerland
- Ehsan Pashay Ahi
  - Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki, Finland
8
Zhang Y, Zou D, Zhu T, Xu T, Chen M, Niu G, Zong W, Pan R, Jing W, Sang J, Liu C, Xiong Y, Sun Y, Zhai S, Chen H, Zhao W, Xiao J, Bao Y, Hao L, Zhang Z. Gene Expression Nebulas (GEN): a comprehensive data portal integrating transcriptomic profiles across multiple species at both bulk and single-cell levels. Nucleic Acids Res 2022; 50:D1016-D1024. [PMID: 34591957; PMCID: PMC8728231; DOI: 10.1093/nar/gkab878] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/16/2021] [Revised: 09/15/2021] [Accepted: 09/17/2021] [Indexed: 01/07/2023] Open
Abstract
Transcriptomic profiling is critical to uncovering functional elements from transcriptional and post-transcriptional aspects. Here, we present Gene Expression Nebulas (GEN, https://ngdc.cncb.ac.cn/gen/), an open-access data portal integrating transcriptomic profiles under various biological contexts. GEN features a curated collection of high-quality bulk and single-cell RNA sequencing datasets by using standardized data processing pipelines and a structured curation model. Currently, GEN houses a large number of gene expression profiles from 323 datasets (157 bulk and 166 single-cell), covering 50 500 samples and 15 540 169 cells across 30 species, which are further categorized into six biological contexts. Moreover, GEN integrates a full range of transcriptomic profiles on expression, RNA editing and alternative splicing for 10 bulk datasets, providing opportunities for users to conduct integrative analysis at both transcriptional and post-transcriptional levels. In addition, GEN provides abundant gene annotations based on value-added curation of transcriptomic profiles and delivers online services for data analysis and visualization. Collectively, GEN presents a comprehensive collection of transcriptomic profiles across multiple species, thus serving as a fundamental resource for better understanding genetic regulatory architecture and functional mechanisms from tissues to cells.
Affiliation(s)
- Yuansheng Zhang
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Dong Zou
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
- Tongtong Zhu
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Tianyi Xu
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
- Ming Chen
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Guangyi Niu
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Wenting Zong
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Rong Pan
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Wei Jing
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jian Sang
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Chang Liu
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yujia Xiong
  - Beijing Neurosurgical Institute, Capital Medical University, Beijing 100069, China
- Yubin Sun
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
- Shuang Zhai
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
- Huanxin Chen
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
- Wenming Zhao
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jingfa Xiao
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yiming Bao
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Lili Hao
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
- Zhang Zhang
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
9
Liu L, Amorín R, Moriel P, DiLorenzo N, Lancaster PA, Peñagaricano F. Maternal methionine supplementation during gestation alters alternative splicing and DNA methylation in bovine skeletal muscle. BMC Genomics 2021; 22:780. [PMID: 34717556; PMCID: PMC8557564; DOI: 10.1186/s12864-021-08065-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/02/2021] [Accepted: 09/28/2021] [Indexed: 01/18/2023] Open
Abstract
Background: The evaluation of alternative splicing, including differential isoform expression and differential exon usage, can provide some insights on the transcriptional changes that occur in response to environmental perturbations. Maternal nutrition is considered a major intrauterine regulator of fetal developmental programming. The objective of this study was to assess potential changes in splicing events in the longissimus dorsi muscle of beef calves gestated under control or methionine-rich diets. RNA sequencing and whole-genome bisulfite sequencing were used to evaluate muscle transcriptome and methylome, respectively.
Results: Alternative splicing patterns were significantly altered by maternal methionine supplementation. Most of the altered genes were directly implicated in muscle development, muscle physiology, ATP activities, RNA splicing and DNA methylation, among other functions. Interestingly, there was a significant association between DNA methylation and differential exon usage. Indeed, among the set of genes that showed differential exon usage, significant differences in methylation level were detected between significant and non-significant exons, and between contiguous and non-contiguous introns to significant exons.
Conclusions: Overall, our findings provide evidence that a prenatal diet rich in methyl donors can significantly alter the offspring transcriptome, including changes in isoform expression and exon usage, and some of these changes are mediated by changes in DNA methylation.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-08065-4.
Affiliation(s)
- Lihe Liu
  - Department of Animal and Dairy Sciences, University of Wisconsin-Madison, 1675 Observatory Dr, Madison, WI, 53706, USA
- Rocío Amorín
  - University of Florida Genetics Institute, University of Florida, 32611, Gainesville, FL, USA
- Philipe Moriel
  - Range Cattle Research and Education Center, University of Florida, 33865, Ona, FL, USA
- Nicolás DiLorenzo
  - North Florida Research and Education Center, University of Florida, 32351, Marianna, FL, USA
- Phillip A Lancaster
  - Department of Clinical Sciences, Kansas State University, 66506, Manhattan, KS, USA
- Francisco Peñagaricano
  - Department of Animal and Dairy Sciences, University of Wisconsin-Madison, 1675 Observatory Dr, Madison, WI, 53706, USA
10
|
Cui M, Bai M, Zheng L, Bao Y, Sun L, Yu C, Sun Y, Song Z, Wang G, Yu Z, Li Y, Huang Y. Discovery and Verification of Key Liver Cancer Genes and Alternative Splicing Events Based on Second-Generation Sequencing Data Analysis. Biol Pharm Bull 2021; 44:1433-1444. [PMID: 34602553 DOI: 10.1248/bpb.b21-00241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Hepatocellular carcinoma (HCC) is the most common malignant liver disease in the world. Existing screening and early diagnosis methods are not highly sensitive for HCC, and the disease often progresses to the middle or advanced stages before it is diagnosed. Therefore, finding new and efficient diagnosis and treatment methods has become an urgent problem. We aimed to find and verify new liver cancer markers by combining informatics analysis with experimental exploration to provide new ideas and methods for the diagnosis and treatment of clinical liver cancer. We used two different bioinformatic pipelines to analyze sequencing data of clinical liver cancer samples and identify differentially expressed genes and key variants after combining them with The Cancer Genome Atlas sequencing data. Then, we explored the functions and mechanisms of the key variants to identify potential liver cancer markers. Through bioinformatic analysis of sequencing data, 139 differentially expressed genes were found, including 53 upregulated genes and 86 downregulated genes. Through enrichment and alternative splicing event analysis of sequencing data, we found nine key variants with exon skipping events. Cell viability and proliferation assays showed that metallothionein 1E (MT1E)-203 is a key variant that influences cell proliferation through the p53 cell cycle pathway, and structure prediction indicated that MT1E-203 loses the ability to bind two zinc ions because of exon skipping. MT1E-203 is a potential biomarker for HCC and may play an important role in the diagnosis and treatment of HCC.
Affiliation(s)
- Mengqi Cui
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University
- Miao Bai
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University
- Lihua Zheng
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University
- Yongli Bao
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University
- Luguo Sun
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University
- Chunlei Yu
  - Research Center of Agriculture and Medicine Gene Engineering of Ministry of Education, Northeast Normal University
- Ying Sun
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University
- Zhenbo Song
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University
- Guannan Wang
  - Research Center of Agriculture and Medicine Gene Engineering of Ministry of Education, Northeast Normal University
- Zhenxiang Yu
  - Department of Respiratory Medicine, the First Hospital of Jilin University
- Yuxin Li
  - Research Center of Agriculture and Medicine Gene Engineering of Ministry of Education, Northeast Normal University
- Yanxin Huang
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University
|
11
|
Manz Q, Tsoy O, Fenn A, Baumbach J, Völker U, List M, Kacprowski T. ASimulatoR: splice-aware RNA-Seq data simulation. Bioinformatics 2021; 37:3008-3010. [PMID: 33647976 DOI: 10.1093/bioinformatics/btab142] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/29/2020] [Revised: 02/16/2021] [Accepted: 02/25/2021] [Indexed: 02/02/2023] Open
Abstract
SUMMARY A plethora of tools exists for RNA-Seq data analysis with a focus on alternative splicing (AS). However, appropriate data for their comparative evaluation are missing. The R package ASimulatoR simulates gold standard RNA-Seq datasets with fine-grained control over the distribution of AS events, which allows for evaluating alternative splicing tools, e.g. to study the effect of sequencing depth on the performance of AS event detection. AVAILABILITY AND IMPLEMENTATION ASimulatoR is freely available at https://github.com/biomedbigdata/ASimulatoR as an R package under GPL-3 license.
Affiliation(s)
- Quirin Manz
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany
- Olga Tsoy
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany
- Amit Fenn
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany
- Jan Baumbach
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany
  - Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark
  - Chair of Computational Systems Biology, University of Hamburg, 22607 Hamburg, Germany
- Uwe Völker
  - Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, 17475 Greifswald, Germany
- Markus List
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany
- Tim Kacprowski
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, 38106 Brunswick, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), 38106 Braunschweig, Germany
|
12
|
Muller IB, Meijers S, Kampstra P, van Dijk S, van Elswijk M, Lin M, Wojtuszkiewicz AM, Jansen G, de Jonge R, Cloos J. Computational comparison of common event-based differential splicing tools: practical considerations for laboratory researchers. BMC Bioinformatics 2021; 22:347. [PMID: 34174808 PMCID: PMC8236165 DOI: 10.1186/s12859-021-04263-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/17/2020] [Accepted: 06/11/2021] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Computational tools analyzing RNA-sequencing data have boosted alternative splicing research by identifying and assessing differentially spliced genes. However, common alternative splicing analysis tools differ substantially in their statistical analyses and general performance. This report compares the computational performance (CPU utilization and RAM usage) of three event-level splicing tools: rMATS, MISO, and SUPPA2. Additionally, concordance between tool outputs was investigated. RESULTS Log-linear relations were found between job times and dataset size in all splicing tools and all virtual machine (VM) configurations. MISO had the highest job times for all analyses, irrespective of VM size, while MISO analyses also exceeded maximum CPU utilization on all VM sizes. rMATS and SUPPA2 load averages were relatively low in both size and replicate comparisons, not nearing maximum CPU utilization in the VM simulating the lowest computational power (D2 VM). RAM usage in rMATS and SUPPA2 did not exceed 20% of maximum RAM in both size and replicate comparisons while MISO reached maximum RAM usage in D2 VM analyses for input size. Correlation coefficients of differential splicing analyses showed high correlation (β > 80%) between different tool outputs with the exception of comparisons of retained intron (RI) events between rMATS/MISO and rMATS/SUPPA2 (β < 60%). CONCLUSIONS Prior to RNA-seq analyses, users should consider job time, number of replicates and splice event type of interest to determine the optimal alternative splicing tool. In general, rMATS is superior to both MISO and SUPPA2 in computational performance. Analysis outputs show high concordance between tools, with the exception of RI events.
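The concordance check described in this abstract can be illustrated with a minimal sketch. It assumes that each tool's differential splicing output has already been reduced to a mapping from event ID to a change in inclusion level; the event IDs, the values, and the choice of Pearson correlation are illustrative assumptions, not the authors' pipeline or any tool's real output format.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def concordance(tool_a, tool_b):
    """Correlate two tools on the events that both of them report.

    tool_a, tool_b: dict mapping a (hypothetical) event ID to a delta value.
    Returns (correlation, number_of_shared_events).
    """
    shared = sorted(set(tool_a) & set(tool_b))
    r = pearson([tool_a[e] for e in shared], [tool_b[e] for e in shared])
    return r, len(shared)


# Hypothetical per-event values, made up for illustration only.
tool_a = {"SE_1": 0.30, "SE_2": -0.10, "SE_3": 0.05, "RI_1": 0.20}
tool_b = {"SE_1": 0.28, "SE_2": -0.12, "SE_3": 0.07, "A5_1": 0.40}

r, n_shared = concordance(tool_a, tool_b)
print(f"shared events: {n_shared}, correlation: {r:.3f}")
```

Restricting the comparison to shared events matters because tools differ in which events they report at all; computing the correlation per event type would expose type-specific disagreement of the kind the abstract reports for RI events.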
Affiliation(s)
- Ittai B Muller
  - Department of Clinical Chemistry, Amsterdam UMC - location VUmc, Amsterdam, The Netherlands
- Marry Lin
  - Department of Clinical Chemistry, Amsterdam UMC - location VUmc, Amsterdam, The Netherlands
- Anna M Wojtuszkiewicz
  - Department of Hematology, Cancer Center Amsterdam, Rm CCA 4.24, Amsterdam UMC - location VUmc, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands
- Gerrit Jansen
  - Amsterdam Rheumatology and immunology Center, Amsterdam UMC - location VUmc, Amsterdam, The Netherlands
- Robert de Jonge
  - Department of Clinical Chemistry, Amsterdam UMC - location VUmc, Amsterdam, The Netherlands
- Jacqueline Cloos
  - Department of Hematology, Cancer Center Amsterdam, Rm CCA 4.24, Amsterdam UMC - location VUmc, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands
|
13
|
Cheng C, Liu L, Bao Y, Yi J, Quan W, Xue Y, Sun L, Zhang Y. SUVA: splicing site usage variation analysis from RNA-seq data reveals highly conserved complex splicing biomarkers in liver cancer. RNA Biol 2021; 18:157-171. [PMID: 34152934 DOI: 10.1080/15476286.2021.1940037] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/13/2022] Open
Abstract
Most of the current alternative splicing (AS) analysis tools are powerless to analyse complex splicing. To address this, we developed SUVA (Splice sites Usage Variation Analysis) that decomposes complex splicing events into five types of splice junction pairs. By analysing real and simulated data, SUVA showed higher sensitivity and accuracy in detecting AS events than the compared methods. Notably, SUVA detected extensive complex AS events and screened out 69 highly conserved and dominant AS events associated with cancer. The cancer-associated complex AS events in FN1 and the co-regulated RNA-binding proteins were significantly correlated with patient survival.
Affiliation(s)
- Chao Cheng
  - ABLife BioBigData Institute, Wuhan, Hubei, China
  - Center for Genome Analysis, ABLife Inc., Wuhan, Hubei, China
- Lei Liu
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun, China
- Yongli Bao
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun, China
- Jingwen Yi
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun, China
- Weili Quan
  - ABLife BioBigData Institute, Wuhan, Hubei, China
- Yaqiang Xue
  - ABLife BioBigData Institute, Wuhan, Hubei, China
- Luguo Sun
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun, China
- Yi Zhang
  - ABLife BioBigData Institute, Wuhan, Hubei, China
  - Center for Genome Analysis, ABLife Inc., Wuhan, Hubei, China
|
14
|
Dent CI, Singh S, Mukherjee S, Mishra S, Sarwade RD, Shamaya N, Loo KP, Harrison P, Sureshkumar S, Powell D, Balasubramanian S. Quantifying splice-site usage: a simple yet powerful approach to analyze splicing. NAR Genom Bioinform 2021; 3:lqab041. [PMID: 34017946 PMCID: PMC8121094 DOI: 10.1093/nargab/lqab041] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/18/2020] [Revised: 03/24/2021] [Accepted: 04/28/2021] [Indexed: 02/07/2023] Open
Abstract
RNA splicing, and variations in this process referred to as alternative splicing, are critical aspects of gene regulation in eukaryotes. From environmental responses in plants to being a primary link between genetic variation and disease in humans, splicing differences confer extensive phenotypic changes across diverse organisms (1–3). Regulation of splicing occurs through differential selection of splice sites in a splicing reaction, which results in variation in the abundance of isoforms and/or splicing events. However, genomic determinants that influence splice-site selection remain largely unknown. While traditional approaches for analyzing splicing rely on quantifying variant transcripts (i.e. isoforms) or splicing events (i.e. intron retention, exon skipping etc.) (4), recent approaches focus on analyzing complex/mutually exclusive splicing patterns (5–8). However, none of these approaches explicitly measure individual splice-site usage, which can provide valuable information about splice-site choice and its regulation. Here, we present a simple approach to quantify the empirical usage of individual splice sites reflecting their strength, which determines their selection in a splicing reaction. Splice-site strength/usage, as a quantitative phenotype, allows us to directly link genetic variation with usage of individual splice-sites. We demonstrate the power of this approach in defining the genomic determinants of splice-site choice through GWAS. Our pilot analysis with more than a thousand splice sites hints that sequence divergence in cis rather than trans is associated with variations in splicing among accessions of Arabidopsis thaliana. This approach allows deciphering principles of splicing and has broad implications from agriculture to medicine.
Affiliation(s)
- Craig I Dent
  - School of Biological Sciences, Monash University, VIC 3800, Australia
- Shilpi Singh
  - School of Biological Sciences, Monash University, VIC 3800, Australia
- Shikhar Mishra
  - School of Biological Sciences, Monash University, VIC 3800, Australia
- Rucha D Sarwade
  - School of Biological Sciences, Monash University, VIC 3800, Australia
- Nawar Shamaya
  - School of Biological Sciences, Monash University, VIC 3800, Australia
- Kok Ping Loo
  - School of Biological Sciences, Monash University, VIC 3800, Australia
- Paul Harrison
  - Monash Bioinformatics Platform, Monash University, VIC 3800, Australia
- David Powell
  - Monash Bioinformatics Platform, Monash University, VIC 3800, Australia
|
15
|
Jiang W, Chen L. Alternative splicing: Human disease and quantitative analysis from high-throughput sequencing. Comput Struct Biotechnol J 2020; 19:183-195. [PMID: 33425250 PMCID: PMC7772363 DOI: 10.1016/j.csbj.2020.12.009] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 08/11/2020] [Revised: 11/26/2020] [Accepted: 12/11/2020] [Indexed: 02/07/2023] Open
Abstract
Alternative splicing contributes to the majority of protein diversity in higher eukaryotes by allowing one gene to generate multiple distinct protein isoforms. It adds another regulation layer of gene expression. Up to 95% of human multi-exon genes undergo alternative splicing to encode proteins with different functions. Moreover, around 15% of human hereditary diseases and cancers are associated with alternative splicing. Regulation of alternative splicing is attributed to a set of delicate machineries interacting with each other in aid of important biological processes such as cell development and differentiation. Given the importance of alternative splicing events, their accurate mapping and quantification are paramount for downstream analysis, especially for associating disease with alternative splicing. However, deriving accurate isoform expression from high-throughput RNA-seq data remains a challenging task. In this mini-review, we aim to illustrate I) mechanisms and regulation of alternative splicing, II) alternative splicing associated human disease, III) computational tools for the quantification of isoforms and alternative splicing from RNA-seq.
Affiliation(s)
- Wei Jiang
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, United States
- Liang Chen
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, United States
|
16
|
Merino GA, Conesa A, Fernández EA. A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies. Brief Bioinform 2019; 20:471-481. [PMID: 29040385 DOI: 10.1093/bib/bbx122] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 07/04/2017] [Revised: 08/20/2017] [Indexed: 12/16/2022] Open
Abstract
Over the last few years, RNA-seq has been used to study alterations in alternative splicing related to several diseases. Bioinformatics workflows used to perform these studies can be divided into two groups, those finding changes in the absolute isoform expression and those studying differential splicing. Many computational methods for transcriptomics analysis have been developed, evaluated and compared; however, there are not enough reports of systematic and objective assessment of processing pipelines as a whole. Moreover, comparative studies have been performed considering separately the changes in absolute or relative isoform expression levels. Consequently, no consensus exists about the best practices and appropriate workflows to analyse alternative and differential splicing. To assist the adequate pipeline choice, we present here a benchmarking of nine commonly used workflows to detect differential isoform expression and splicing. We evaluated workflow performance over different experimental scenarios where changes in absolute and relative isoform expression occurred simultaneously. In addition, the effects of the number of isoforms per gene and of the magnitude of the expression change on pipeline performance were also evaluated. Our results suggest that workflow performance is influenced by the number of replicates per condition and the heterogeneity of the conditions. In general, workflows based on DESeq2, DEXSeq, Limma and NOISeq performed well over a wide range of transcriptomics experiments. In particular, we suggest the use of workflows based on Limma when high precision is required, and DESeq2 and DEXSeq pipelines to prioritize sensitivity. When several replicates per condition are available, NOISeq and Limma pipelines are indicated.
Affiliation(s)
- Ana Conesa
  - Microbiology and Cell Sciences Department of the University of Florida at Gainesville, FL, USA
|
17
|
Stark R, Grzelak M, Hadfield J. RNA sequencing: the teenage years. Nat Rev Genet 2019; 20:631-656. [DOI: 10.1038/s41576-019-0150-2] [Citation(s) in RCA: 679] [Impact Index Per Article: 113.2] [Accepted: 06/18/2019] [Indexed: 12/12/2022]
|
18
|
Van den Berge K, Hembach KM, Soneson C, Tiberi S, Clement L, Love MI, Patro R, Robinson MD. RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021255] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Indexed: 11/09/2022]
Abstract
Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq data sets, as well as the performance of the myriad of methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.
Affiliation(s)
- Koen Van den Berge
  - Bioinformatics Institute Ghent and Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
- Katharina M. Hembach
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Charlotte Soneson
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Simone Tiberi
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Lieven Clement
  - Bioinformatics Institute Ghent and Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
- Michael I. Love
  - Department of Biostatistics and Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514, USA
- Rob Patro
  - Department of Computer Science, Stony Brook University, Stony Brook, New York 11794, USA
- Mark D. Robinson
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
|
19
|
Abstract
Alternative splicing is a widespread, essential, and complex component of gene regulation. Apicomplexan parasites have long been recognized to produce alternatively spliced transcripts for some genes and can produce multiple protein products that are essential for parasite growth. Recent approaches are now providing more wide-ranging surveys of the extent of alternative splicing; some indicate that alternative splicing is less widespread than in other model eukaryotes, whereas others suggest levels comparable to those of previously studied groups. In many cases, apicomplexan alternative splicing events appear not to generate multiple alternative proteins but instead produce aberrant or noncoding transcripts. Nonetheless, appropriate regulation of alternative splicing is clearly essential in Plasmodium and Toxoplasma parasites, suggesting a biological role for at least some of the alternative splicing observed. Several studies have now disrupted conserved regulators of alternative splicing and demonstrated lethal effects in apicomplexans. This minireview discusses methods to accurately determine the extent of alternative splicing in Apicomplexa and discusses potential biological roles for this conserved process in a phylum of parasites with compact genomes.
|
20
|
Abstract
Identification of differentially expressed genes has been a high priority task of downstream analyses to further advances in biomedical research. Investigators have been faced with an array of issues in dealing with more complicated experiments and metadata, including batch effects, normalization, temporal dynamics (temporally differential expression), and isoform diversity (isoform-level quantification and differential splicing events). To date, there are no standard approaches to precisely and efficiently analyze these moderate or large-scale experimental designs, especially with combined metadata. In this report, we propose comprehensive analytical pipelines to precisely characterize temporal dynamics in differential expression of genes and other genomic features, i.e., the variability of transcripts, isoforms and exons, by controlling batch effects and other nuisance factors that could have significant confounding effects on the main effects of interest in comparative models and may result in misleading interpretations.
|
21
|
The Development and Use of Scalable Systems for Studying Aberrant Splicing in SF3B1-Mutant CLL. Methods Mol Biol 2018. [PMID: 30350199 DOI: 10.1007/978-1-4939-8876-1_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
The mutational landscape of CLL is now known to include recurrent non-synonymous mutations in SF3B1, a core splicing factor. About 5-10% of newly diagnosed CLL cases harbor these mutations, which are typically limited to HEAT domains in the carboxyl-terminus of the protein. Importantly, the mutations are not specific to CLL but are also present in several unrelated clonal disorders. Analysis of patient samples and cell lines has shown the primary splicing aberration in SF3B1-mutant cells to be the use of novel or "cryptic" 3' splice sites (3SS). Advances in genome-editing and next-generation sequencing (NGS) have allowed development of isogenic models and detailed analysis of changes to the transcriptome with relative ease. In this manuscript, we focus on two relevant methods to study splicing factor mutations in CLL: development of isogenic scalable cell lines and informatics analysis of RNA-Seq datasets.
|
22
|
Norton SS, Vaquero-Garcia J, Lahens NF, Grant GR, Barash Y. Outlier detection for improved differential splicing quantification from RNA-Seq experiments with replicates. Bioinformatics 2018; 34:1488-1497. [PMID: 29236961 PMCID: PMC6454425 DOI: 10.1093/bioinformatics/btx790] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 04/13/2017] [Revised: 11/17/2017] [Accepted: 12/07/2017] [Indexed: 01/20/2023] Open
Abstract
Motivation: A key component in many RNA-Seq-based studies is contrasting multiple replicates from different experimental conditions. In this setup, replicates play a key role as they make it possible to capture the underlying biological variability inherent to the compared conditions, as well as experimental variability. However, what constitutes a 'bad' replicate is not necessarily well defined. Consequently, researchers might discard valuable data or downstream analysis may be hampered by failed experiments. Results: Here we develop a probability model to weigh a given RNA-Seq sample as a representative of an experimental condition when performing alternative splicing analysis. We demonstrate that this model detects outlier samples which are consistently and significantly different compared with other samples from the same condition. Moreover, we show that instead of discarding such samples the proposed weighting scheme can be used to downweight samples and specific splicing variations suspected as outliers, gaining statistical power. These weights can then be used for differential splicing (DS) analysis, where the resulting algorithm offers a generalization of the MAJIQ algorithm. Using both synthetic and real-life data, we perform an extensive evaluation of the improved MAJIQ algorithm in different scenarios involving perturbed samples, mislabeled samples, same condition groups, and different levels of coverage, showing it compares favorably to other tools. Overall, this work offers an outlier detection algorithm that can be combined with any splicing pipeline, a generalized and improved version of MAJIQ for DS detection, and evaluation metrics with matching code and data for DS algorithms. Availability and implementation: Software and data are accessible via majiq.biociphers.org/norton_et_al_2017/. Contact: yosephb@upenn.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Scott S Norton
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jorge Vaquero-Garcia
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, PA, USA
- Nicholas F Lahens
  - Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Gregory R Grant
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yoseph Barash
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, PA, USA
|
23
|
Aguiar D, Cheng LF, Dumitrascu B, Mordelet F, Pai AA, Engelhardt BE. Bayesian nonparametric discovery of isoforms and individual specific quantification. Nat Commun 2018; 9:1681. [PMID: 29703885 PMCID: PMC5923247 DOI: 10.1038/s41467-018-03402-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/10/2017] [Accepted: 02/11/2018] [Indexed: 12/18/2022] Open
Abstract
Most human protein-coding genes can be transcribed into multiple distinct mRNA isoforms. These alternative splicing patterns encourage molecular diversity, and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and occur in different frequencies across tissues and samples. Here, we develop biisq, a Bayesian nonparametric model for isoform discovery and individual specific quantification from short-read RNA-seq data. biisq does not require isoform reference sequences but instead estimates an isoform catalog shared across samples. We use stochastic variational inference for efficient posterior estimates and demonstrate superior precision and recall for simulations compared to state-of-the-art isoform reconstruction methods. biisq shows the most gains for low abundance isoforms, with 36% more isoforms correctly inferred at low coverage versus a multi-sample method and 170% more versus single-sample methods. We estimate isoforms in the GEUVADIS RNA-seq data and validate inferred isoforms by associating genetic variants with isoform ratios. Alternative splicing leads to transcript isoform diversity. Here, Aguiar et al. develop biisq, a Bayesian nonparametric approach to discover and quantify isoforms from RNA-seq data.
Affiliation(s)
- Derek Aguiar
- Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA.
- Li-Fang Cheng
- Department of Electrical Engineering, Princeton University, Princeton, NJ, 08540, USA
- Bianca Dumitrascu
- Lewis-Sigler Institute, Princeton University, Princeton, NJ, 08544, USA
- Fantine Mordelet
- Institute for Genome Sciences and Policy, Duke University, Durham, NC, 27708, USA
- Athma A Pai
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA; RNA Therapeutics Institute, University of Massachusetts Medical School, Worcester, MA, 01605, USA
- Barbara E Engelhardt
- Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA; Center for Statistics and Machine Learning, Princeton University, Princeton, NJ, 08540, USA.
|
24
|
Trincado JL, Entizne JC, Hysenaj G, Singh B, Skalic M, Elliott DJ, Eyras E. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol 2018; 19:40. [PMID: 29571299 PMCID: PMC5866513 DOI: 10.1186/s13059-018-1417-1] [Citation(s) in RCA: 365] [Impact Index Per Article: 52.1] [Received: 11/30/2017] [Accepted: 03/02/2018] [Indexed: 02/08/2023] Open
Abstract
Despite the many approaches to study differential splicing from RNA-seq, many challenges remain unsolved, including computing capacity and sequencing depth requirements. Here we present SUPPA2, a new method that addresses these challenges, and enables streamlined analysis across multiple conditions taking into account biological variability. Using experimental and simulated data, we show that SUPPA2 achieves higher accuracy compared to other methods, especially at low sequencing depth and short read length. We use SUPPA2 to identify novel Transformer2-regulated exons, novel microexons induced during differentiation of bipolar neurons, and novel intron retention events during erythroblast differentiation.
Affiliation(s)
- Gerald Hysenaj
- Institute of Genetic Medicine, Newcastle University, Central Parkway, Newcastle, NE1 3BZ, UK
- Babita Singh
- Pompeu Fabra University, E08003, Barcelona, Spain
- Miha Skalic
- Pompeu Fabra University, E08003, Barcelona, Spain
- David J Elliott
- Institute of Genetic Medicine, Newcastle University, Central Parkway, Newcastle, NE1 3BZ, UK
- Eduardo Eyras
- Pompeu Fabra University, E08003, Barcelona, Spain; Catalan Institution for Research and Advanced Studies, E08010, Barcelona, Spain.
|
25
|
Park E, Pan Z, Zhang Z, Lin L, Xing Y. The Expanding Landscape of Alternative Splicing Variation in Human Populations. Am J Hum Genet 2018; 102:11-26. [PMID: 29304370 PMCID: PMC5777382 DOI: 10.1016/j.ajhg.2017.11.002] [Citation(s) in RCA: 245] [Impact Index Per Article: 35.0] [Received: 08/30/2017] [Accepted: 11/03/2017] [Indexed: 12/16/2022] Open
Abstract
Alternative splicing is a tightly regulated biological process by which the number of gene products for any given gene can be greatly expanded. Genomic variants in splicing regulatory sequences can disrupt splicing and cause disease. Recent developments in sequencing technologies and computational biology have allowed researchers to investigate alternative splicing at an unprecedented scale and resolution. Population-scale transcriptome studies have revealed many naturally occurring genetic variants that modulate alternative splicing and consequently influence phenotypic variability and disease susceptibility in human populations. Innovations in experimental and computational tools such as massively parallel reporter assays and deep learning have enabled the rapid screening of genomic variants for their causal impacts on splicing. In this review, we describe technological advances that have greatly increased the speed and scale at which discoveries are made about the genetic variation of alternative splicing. We summarize major findings from population transcriptomic studies of alternative splicing and discuss the implications of these findings for human genetics and medicine.
Affiliation(s)
- Eddie Park
- Department of Microbiology, Immunology, & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Zhicheng Pan
- Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Zijun Zhang
- Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Lan Lin
- Department of Microbiology, Immunology, & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Yi Xing
- Department of Microbiology, Immunology, & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA 90095, USA.
|
26
|
Du X, Hu C, Yao Y, Sun S, Zhang Y. Analysis and Prediction of Exon Skipping Events from RNA-Seq with Sequence Information Using Rotation Forest. Int J Mol Sci 2017; 18:2691. [PMID: 29231888 PMCID: PMC5751293 DOI: 10.3390/ijms18122691] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/20/2017] [Revised: 11/21/2017] [Accepted: 12/08/2017] [Indexed: 12/14/2022] Open
Abstract
In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, a solution has yet to be found. In this study, given the limitations of machine learning algorithms with RNA-Seq data or genome sequences, a new feature, called RS (RNA-seq and sequence) features, was constructed. These features include RNA-Seq features derived from the RNA-Seq data and sequence features derived from genome sequences. We propose a novel Rotation Forest classifier to predict ES events with the RS features (RotaF-RSES). To validate the efficacy of RotaF-RSES, a dataset from two human tissues was used, and RotaF-RSES achieved an accuracy of 98.4%, a specificity of 99.2%, a sensitivity of 94.1%, and an area under the curve (AUC) of 98.6%. When compared to the other available methods, the results indicate that RotaF-RSES is efficient and can predict ES events with RS features.
Affiliation(s)
- Xiuquan Du
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, China.
- Center of Information Support & Assurance Technology, Anhui University, Hefei 230601, China.
- School of Computer Science and Technology, Anhui University, Hefei 230601, China.
- Changlin Hu
- School of Computer Science and Technology, Anhui University, Hefei 230601, China.
- Yu Yao
- School of Computer Science and Technology, Anhui University, Hefei 230601, China.
- Shiwei Sun
- School of Computer Science and Technology, Anhui University, Hefei 230601, China.
- Yanping Zhang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, China.
- Center of Information Support & Assurance Technology, Anhui University, Hefei 230601, China.
- School of Computer Science and Technology, Anhui University, Hefei 230601, China.
|
27
|
Ding L, Rath E, Bai Y. Comparison of Alternative Splicing Junction Detection Tools Using RNA-Seq Data. Curr Genomics 2017; 18:268-277. [PMID: 28659722 PMCID: PMC5476949 DOI: 10.2174/1389202918666170215125048] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 09/09/2016] [Revised: 11/28/2016] [Accepted: 12/01/2016] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: Alternative splicing (AS) is a posttranscriptional process that produces different transcripts from the same gene and is important to produce diverse protein products in response to environmental stimuli. AS occurs at specific sites on the mRNA sequence, some of which have been defined. Multiple bioinformatics tools have been developed to detect AS from experimental data. OBJECTIVES: The goal of this review is to help researchers use specific tools to aid their research and to develop new AS detection tools based on these previously established tools. METHOD: We selected 15 AS detection tools that were recently published; we classified and delineated them on several aspects. Also, a performance comparison of these tools with the same starting input was conducted. RESULT: We reviewed the following categorized features of the tools: Publication information, working principles, generic and distinct workflows, running platform, input data requirement, sequencing depth dependency, reads mapped to multiple locations, isoform annotation basis, precise detected AS types, and performance benchmarks. CONCLUSION: Through comparisons of these tools, we provide a panorama of the advantages and shortcomings of each tool and their scopes of application.
Affiliation(s)
- Yongsheng Bai
- Department of Biology; The Center for Genomic Advocacy, Indiana State University, Terre Haute, IN, USA
|
28
|
Wu W, Zong J, Wei N, Cheng J, Zhou X, Cheng Y, Chen D, Guo Q, Zhang B, Feng Y. CASH: a constructing comprehensive splice site method for detecting alternative splicing events. Brief Bioinform 2017; 19:905-917. [DOI: 10.1093/bib/bbx034] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 01/04/2017] [Indexed: 01/03/2023] Open
Affiliation(s)
- Wenwu Wu
- The State Key Laboratory of Subtropical Silviculture, Zhejiang A & F University, Lin’an, Hangzhou, China
- Jie Zong
- Novel Bioinformatics Co., Ltd, Shanghai, China
- Ning Wei
- Institute for Nutritional Sciences, Chinese Academy of Sciences (CAS), Shanghai, China
- Jian Cheng
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences (CAS), Shanghai, China
- Xuexia Zhou
- Tianjin Medical University General Hospital, China
- Yuanming Cheng
- Institute for Nutritional Sciences, Chinese Academy of Sciences (CAS), Shanghai, China
- Dai Chen
- Novel Bioinformatics Co., Ltd, Shanghai, China
- Qinghua Guo
- Novel Bioinformatics Co., Ltd, Shanghai, China
- Bo Zhang
- Novel Bioinformatics Co., Ltd, Shanghai, China
- Ying Feng
- Institute for Nutritional Sciences, Chinese Academy of Sciences (CAS), Shanghai, China
|
29
|
Brown JWS, Calixto CPG, Zhang R. High-quality reference transcript datasets hold the key to transcript-specific RNA-sequencing analysis in plants. New Phytol 2017; 213:525-530. [PMID: 27659901 DOI: 10.1111/nph.14208] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 06/14/2016] [Accepted: 08/04/2016] [Indexed: 06/06/2023]
Abstract
SUMMARY: Re-programming of the transcriptome involves both transcription and alternative splicing (AS). Some genes are regulated only at the AS level with no change in expression at the gene level. AS data must be incorporated as an essential aspect of the regulation of gene expression. RNA-sequencing (RNA-seq) can deliver both transcriptional and AS information, but accurate methods to analyse the added complexity in RNA-seq data are needed. The construction of a comprehensive reference transcript dataset (RTD) for a specific plant species, variety or accession, from all available sequence data, will immediately allow more robust analysis of RNA-seq data. RTDs will continually evolve and improve, a process that will be more efficient if resources across a community are shared and pooled.
Affiliation(s)
- John W S Brown
- Plant Sciences, University of Dundee at the James Hutton Institute, Dundee, DD2 5DA, UK
- Cell and Molecular Sciences, The James Hutton Institute, Dundee, DD2 5DA, UK
- Cristiane P G Calixto
- Plant Sciences, University of Dundee at the James Hutton Institute, Dundee, DD2 5DA, UK
- Runxuan Zhang
- Information and Computational Sciences, The James Hutton Institute, Dundee, DD2 5DA, UK
|
30
|
Badr E, ElHefnawi M, Heath LS. Computational Identification of Tissue-Specific Splicing Regulatory Elements in Human Genes from RNA-Seq Data. PLoS One 2016; 11:e0166978. [PMID: 27861625 PMCID: PMC5115852 DOI: 10.1371/journal.pone.0166978] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/01/2016] [Accepted: 11/07/2016] [Indexed: 12/24/2022] Open
Abstract
Alternative splicing is a vital process for regulating gene expression and promoting proteomic diversity. It plays a key role in tissue-specific expressed genes. This specificity is mainly regulated by splicing factors that bind to specific sequences called splicing regulatory elements (SREs). Here, we report a genome-wide analysis to study alternative splicing on multiple tissues, including brain, heart, liver, and muscle. We propose a pipeline to identify differential exons across tissues and hence tissue-specific SREs. In our pipeline, we utilize the DEXSeq package along with our previously reported algorithms. Utilizing the publicly available RNA-Seq data set from the Human BodyMap project, we identified 28,100 differentially used exons across the four tissues. We identified tissue-specific exonic splicing enhancers that overlap with various previously published experimental and computational databases. A complicated exonic enhancer regulatory network was revealed, where multiple exonic enhancers were found across multiple tissues while some were found only in specific tissues. Putative combinatorial exonic enhancers and silencers were discovered as well, which may be responsible for exon inclusion or exclusion across tissues. Some of the exonic enhancers are found to be co-occurring with multiple exonic silencers and vice versa, which demonstrates a complicated relationship between tissue-specific exonic enhancers and silencers.
Affiliation(s)
- Eman Badr
- Department of Information Technology, Faculty of Computers and Information, Cairo University, Giza, Egypt
- Mahmoud ElHefnawi
- Center of Excellence for Advanced Sciences, Informatics and Systems Department, National Research Center, Cairo, Egypt
- Center for Informatics Science, Nile University, Sheikh Zayed City, Egypt
- Lenwood S. Heath
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
|
31
|
Abstract
The recent genomic characterization of cancers has revealed recurrent somatic point mutations and copy number changes affecting genes encoding RNA splicing factors. Initial studies of these 'spliceosomal mutations' suggest that the proteins bearing these mutations exhibit altered splice site and/or exon recognition preferences relative to their wild-type counterparts, resulting in cancer-specific mis-splicing. Such changes in the splicing machinery may create novel vulnerabilities in cancer cells that can be therapeutically exploited using compounds that can influence the splicing process. Further studies to dissect the biochemical, genomic and biological effects of spliceosomal mutations are crucial for the development of cancer therapies targeted at these mutations.
Affiliation(s)
- Heidi Dvinge
- Computational Biology Program, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Eunhee Kim
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Omar Abdel-Wahab
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Leukemia Service, Dept. of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Robert K. Bradley
- Computational Biology Program, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
|
32
|
Nowicka M, Robinson MD. DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics. F1000Res 2016; 5:1356. [PMID: 28105305 PMCID: PMC5200948 DOI: 10.12688/f1000research.8900.2] [Citation(s) in RCA: 122] [Impact Index Per Article: 13.6] [Accepted: 12/01/2016] [Indexed: 02/03/2023] Open
Abstract
There are many instances in genomics data analyses where measurements are made on a multivariate response. For example, alternative splicing can lead to multiple expressed isoforms from the same primary transcript. There are situations where differences (e.g. between normal and disease state) in the relative ratio of expressed isoforms may have significant phenotypic consequences or lead to prognostic capabilities. Similarly, knowledge of single nucleotide polymorphisms (SNPs) that affect splicing, so-called splicing quantitative trait loci (sQTL) will help to characterize the effects of genetic variation on gene expression. RNA sequencing (RNA-seq) has provided an attractive toolbox to carefully unravel alternative splicing outcomes and recently, fast and accurate methods for transcript quantification have become available. We propose a statistical framework based on the Dirichlet-multinomial distribution that can discover changes in isoform usage between conditions and SNPs that affect relative expression of transcripts using these quantifications. The Dirichlet-multinomial model naturally accounts for the differential gene expression without losing information about overall gene abundance and by joint modeling of isoform expression, it has the capability to account for their correlated nature. The main challenge in this approach is to get robust estimates of model parameters with limited numbers of replicates. We approach this by sharing information and show that our method improves on existing approaches in terms of standard statistical performance metrics. The framework is applicable to other multivariate scenarios, such as Poly-A-seq or where beta-binomial models have been applied (e.g., differential DNA methylation). Our method is available as a Bioconductor R package called DRIMSeq.
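As an illustration of the distribution named in this abstract, the following is a minimal sketch of the Dirichlet-multinomial log-probability for a vector of isoform counts from one gene. It is not the DRIMSeq implementation (DRIMSeq is an R/Bioconductor package); the function name, argument names, and example counts are hypothetical and chosen only for this example.

```python
from math import exp, lgamma


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of isoform counts under a Dirichlet-multinomial.

    counts: non-negative integer read counts, one per isoform (hypothetical input).
    alpha:  positive concentration parameters, one per isoform (hypothetical input).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha values must be positive")
    n = sum(counts)
    alpha_sum = sum(alpha)
    # Multinomial coefficient: n! / prod(x_i!)
    log_coef = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    # Normalizer: Gamma(alpha_sum) / Gamma(n + alpha_sum)
    log_norm = lgamma(alpha_sum) - lgamma(n + alpha_sum)
    # Per-isoform terms: Gamma(x_i + alpha_i) / Gamma(alpha_i)
    log_terms = sum(lgamma(c + a) - lgamma(a) for c, a in zip(counts, alpha))
    return log_coef + log_norm + log_terms


if __name__ == "__main__":
    # Illustrative counts for three isoforms of one gene in one sample.
    example_counts = [30, 10, 5]
    example_alpha = [6.0, 2.0, 1.0]
    print(exp(dirichlet_multinomial_logpmf(example_counts, example_alpha)))
```

Smaller concentration parameters allow more sample-to-sample variability in isoform proportions than a plain multinomial would, which is the overdispersion that this kind of model is meant to capture.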
Affiliation(s)
- Malgorzata Nowicka
- Institute for Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland
- Mark D Robinson
- Institute for Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland
|
33
|
Nowicka M, Robinson MD. DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics. F1000Res 2016; 5:1356. [PMID: 28105305 PMCID: PMC5200948 DOI: 10.12688/f1000research.8900.1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Accepted: 12/01/2016] [Indexed: 08/31/2023] Open
Abstract
There are many instances in genomics data analyses where measurements are made on a multivariate response. For example, alternative splicing can lead to multiple expressed isoforms from the same primary transcript. There are situations where differences (e.g. between normal and disease state) in the relative ratio of expressed isoforms may have significant phenotypic consequences or lead to prognostic capabilities. Similarly, knowledge of single nucleotide polymorphisms (SNPs) that affect splicing, so-called splicing quantitative trait loci (sQTL) will help to characterize the effects of genetic variation on gene expression. RNA sequencing (RNA-seq) has provided an attractive toolbox to carefully unravel alternative splicing outcomes and recently, fast and accurate methods for transcript quantification have become available. We propose a statistical framework based on the Dirichlet-multinomial distribution that can discover changes in isoform usage between conditions and SNPs that affect relative expression of transcripts using these quantifications. The Dirichlet-multinomial model naturally accounts for the differential gene expression without losing information about overall gene abundance and by joint modeling of isoform expression, it has the capability to account for their correlated nature. The main challenge in this approach is to get robust estimates of model parameters with limited numbers of replicates. We approach this by sharing information and show that our method improves on existing approaches in terms of standard statistical performance metrics. The framework is applicable to other multivariate scenarios, such as Poly-A-seq or where beta-binomial models have been applied (e.g., differential DNA methylation). Our method is available as a Bioconductor R package called DRIMSeq.
Affiliation(s)
- Malgorzata Nowicka
- Institute for Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland
- Mark D. Robinson
- Institute for Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland
|
34
|
Hartley SW, Mullikin JC. Detection and visualization of differential splicing in RNA-Seq data with JunctionSeq. Nucleic Acids Res 2016; 44:e127. [PMID: 27257077 PMCID: PMC5009739 DOI: 10.1093/nar/gkw501] [Citation(s) in RCA: 72] [Impact Index Per Article: 8.0] [Received: 02/08/2016] [Accepted: 05/24/2016] [Indexed: 12/14/2022] Open
Abstract
Although RNA-Seq data provide unprecedented isoform-level expression information, detection of alternative isoform regulation (AIR) remains difficult, particularly when working with an incomplete transcript annotation. We introduce JunctionSeq, a new method that builds on the statistical techniques used by the well-established DEXSeq package to detect differential usage of both exonic regions and splice junctions. In particular, JunctionSeq is capable of detecting differential usage of novel splice junctions without the need for an additional isoform assembly step, greatly improving performance when the available transcript annotation is flawed or incomplete. JunctionSeq also provides a powerful and streamlined visualization toolset that allows bioinformaticians to quickly and intuitively interpret their results. We tested our method on publicly available data from several experiments performed on the rat pineal gland and Toxoplasma gondii, successfully detecting known and previously validated AIR genes in 19 out of 19 gene-level hypothesis tests. Due to its ability to query novel splice sites, JunctionSeq is still able to detect these differences even when all alternative isoforms for these genes were not included in the transcript annotation. JunctionSeq thus provides a powerful method for detecting alternative isoform regulation even with low-quality annotations. An implementation of JunctionSeq is available as an R/Bioconductor package.
Affiliation(s)
- Stephen W Hartley
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- James C Mullikin
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
|
35
|
Goldstein LD, Cao Y, Pau G, Lawrence M, Wu TD, Seshagiri S, Gentleman R. Prediction and Quantification of Splice Events from RNA-Seq Data. PLoS One 2016; 11:e0156132. [PMID: 27218464 PMCID: PMC4878813 DOI: 10.1371/journal.pone.0156132] [Citation(s) in RCA: 89] [Impact Index Per Article: 9.9] [Received: 06/05/2015] [Accepted: 04/18/2016] [Indexed: 01/24/2023] Open
Abstract
Analysis of splice variants from short read RNA-seq data remains a challenging problem. Here we present a novel method for the genome-guided prediction and quantification of splice events from RNA-seq data, which enables the analysis of unannotated and complex splice events. Splice junctions and exons are predicted from reads mapped to a reference genome and are assembled into a genome-wide splice graph. Splice events are identified recursively from the graph and are quantified locally based on reads extending across the start or end of each splice variant. We assess prediction accuracy based on simulated and real RNA-seq data, and illustrate how different read aligners (GSNAP, HISAT2, STAR, TopHat2) affect prediction results. We validate our approach for quantification based on simulated data, and compare local estimates of relative splice variant usage with those from other methods (MISO, Cufflinks) based on simulated and real RNA-seq data. In a proof-of-concept study of splice variants in 16 normal human tissues (Illumina Body Map 2.0) we identify 249 internal exons that belong to known genes but are not related to annotated exons. Using independent RNA samples from 14 matched normal human tissues, we validate 9/9 of these exons by RT-PCR and 216/249 by paired-end RNA-seq (2 x 250 bp). These results indicate that de novo prediction of splice variants remains beneficial even in well-studied systems. An implementation of our method is freely available as an R/Bioconductor package SGSeq.
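The idea of quantifying splice variants locally, as relative usage within a single splice event, can be sketched as follows. This is a simplified illustration under stated assumptions and not the SGSeq estimator (SGSeq is an R/Bioconductor package): it assumes each variant of an event already has a count of reads spanning its start, and the function name, variant labels, and counts are hypothetical.

```python
def relative_usage(variant_reads):
    """Relative usage of each splice variant within one splice event.

    variant_reads: dict mapping a variant label to the number of reads
    extending across that variant's start (hypothetical input).
    Returns a dict of proportions; NaN for every variant if no reads are present.
    """
    total = sum(variant_reads.values())
    if total == 0:
        # No evidence at this event, so usage is undefined.
        return {name: float("nan") for name in variant_reads}
    return {name: reads / total for name, reads in variant_reads.items()}


if __name__ == "__main__":
    # Illustrative exon-skipping event with two variants.
    event = {"inclusion": 30, "skipping": 10}
    print(relative_usage(event))
```

Because the proportions are computed per event, they do not require reconstructing full-length transcripts, which is the practical appeal of local estimates for unannotated or complex events.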
Affiliation(s)
- Leonard D. Goldstein
- Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, CA, United States of America
- Department of Molecular Biology, Genentech Inc., South San Francisco, CA, United States of America
- Yi Cao
- Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, CA, United States of America
- Gregoire Pau
- Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, CA, United States of America
- Michael Lawrence
- Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, CA, United States of America
- Thomas D. Wu
- Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, CA, United States of America
- Somasekar Seshagiri
- Department of Molecular Biology, Genentech Inc., South San Francisco, CA, United States of America
- Robert Gentleman
- Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, CA, United States of America
|
36
|
Involvement of PARP1 in the regulation of alternative splicing. Cell Discov 2016; 2:15046. [PMID: 27462443 PMCID: PMC4860959 DOI: 10.1038/celldisc.2015.46] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 10/21/2015] [Accepted: 11/11/2015] [Indexed: 12/18/2022] Open
Abstract
Specialized chromatin structures such as nucleosomes with specific histone modifications decorate exons in eukaryotic genomes, suggesting a functional connection between chromatin organization and the regulation of pre-mRNA splicing. Through profiling the functional location of Poly (ADP) ribose polymerase, we observed that it is associated with the nucleosomes at exon/intron boundaries of specific genes, suggestive of a role for this enzyme in alternative splicing. Poly (ADP) ribose polymerase has previously been implicated in the PARylation of splicing factors as well as regulation of the histone modification H3K4me3, a mark critical for co-transcriptional splicing. In light of these studies, we hypothesized that interaction of the chromatin-modifying factor, Poly (ADP) ribose polymerase with nucleosomal structures at exon–intron boundaries, might regulate pre-mRNA splicing. Using genome-wide approaches validated by gene-specific assays, we show that depletion of PARP1 or inhibition of its PARylation activity results in changes in alternative splicing of a specific subset of genes. Furthermore, we observed that PARP1 bound to RNA, splicing factors and chromatin, suggesting that Poly (ADP) ribose polymerase serves as a gene regulatory hub to facilitate co-transcriptional splicing. These studies add another function to the multi-functional protein, Poly (ADP) ribose polymerase, and provide a platform for further investigation of this protein’s function in organizing chromatin during gene regulatory processes.
|
37
|
Vaquero-Garcia J, Barrera A, Gazzara MR, González-Vallinas J, Lahens NF, Hogenesch JB, Lynch KW, Barash Y. A new view of transcriptome complexity and regulation through the lens of local splicing variations. eLife 2016; 5:e11752. [PMID: 26829591 PMCID: PMC4801060 DOI: 10.7554/elife.11752] [Citation(s) in RCA: 282] [Impact Index Per Article: 31.3] [Received: 09/21/2015] [Accepted: 01/31/2016] [Indexed: 12/29/2022] Open
Abstract
Alternative splicing (AS) can critically affect gene function and disease, yet mapping splicing variations remains a challenge. Here, we propose a new approach to define and quantify mRNA splicing in units of local splicing variations (LSVs). LSVs capture previously defined types of alternative splicing as well as more complex transcript variations. Building the first genome wide map of LSVs from twelve mouse tissues, we find complex LSVs constitute over 30% of tissue dependent transcript variations and affect specific protein families. We show the prevalence of complex LSVs is conserved in humans and identify hundreds of LSVs that are specific to brain subregions or altered in Alzheimer's patients. Amongst those are novel isoforms in the Camk2 family and a novel poison exon in Ptbp1, a key splice factor in neurogenesis. We anticipate the approach presented here will advance the ability to relate tissue-specific splice variation to genetic variation, phenotype, and disease.
Affiliation(s)
- Jorge Vaquero-Garcia
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, United States
- Alejandro Barrera
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, United States
- Matthew R Gazzara
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States; Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- Juan González-Vallinas
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, United States
- Nicholas F Lahens
- Department of Pharmacology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- John B Hogenesch
- Department of Pharmacology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- Kristen W Lynch
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States; Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- Yoseph Barash
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, United States
|
38
|
Soneson C, Matthes KL, Nowicka M, Law CW, Robinson MD. Isoform prefiltering improves performance of count-based methods for analysis of differential transcript usage. Genome Biol 2016; 17:12. [PMID: 26813113 PMCID: PMC4729156 DOI: 10.1186/s13059-015-0862-3] [Citation(s) in RCA: 98] [Impact Index Per Article: 10.9] [Received: 10/11/2015] [Accepted: 12/29/2015] [Indexed: 01/08/2023] Open
Abstract
Background: RNA-seq has been a boon to the quantitative analysis of transcriptomes. A notable application is the detection of changes in transcript usage between experimental conditions. For example, discovery of pathological alternative splicing may allow the development of new treatments or better management of patients. From an analysis perspective, there are several ways to approach RNA-seq data to unravel differential transcript usage, such as annotation-based exon-level counting, differential analysis of the percentage spliced in, or quantitative analysis of assembled transcripts. The goal of this research is to compare and contrast current state-of-the-art methods, and to suggest improvements to commonly used work flows. Results: We assess the performance of representative work flows using synthetic data and explore the effect of using non-standard counting bin definitions as input to DEXSeq, a state-of-the-art inference engine. Although the canonical counting provided the best results overall, several non-canonical approaches were as good or better in specific aspects and most counting approaches outperformed the evaluated event- and assembly-based methods. We show that an incomplete annotation catalog can have a detrimental effect on the ability to detect differential transcript usage in transcriptomes with few isoforms per gene and that isoform-level prefiltering can considerably improve false discovery rate control. Conclusion: Count-based methods generally perform well in the detection of differential transcript usage. Controlling the false discovery rate at the imposed threshold is difficult, particularly in complex organisms, but can be improved by prefiltering the annotation catalog.
Affiliation(s)
- Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland.
- Katarina L Matthes
- Division of Chronic Disease Epidemiology, Epidemiology, Biostatistics and Prevention Institute (EPBI), University of Zurich, Hirschengraben 84, Zurich, 8001, Switzerland; Cancer Registry Zurich and Zug, University Hospital Zurich, Vogelsangstrasse 10, Zurich, 8091, Switzerland.
- Malgorzata Nowicka
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland.
- Charity W Law
- Molecular Medicine Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, 3052, Australia.
- Mark D Robinson
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland.
|
39
|
Sun W, Liu Y, Crowley JJ, Chen TH, Zhou H, Chu H, Huang S, Kuan PF, Li Y, Miller DR, Shaw GD, Wu Y, Zhabotynsky V, McMillan L, Zou F, Sullivan PF, de Villena FPM. IsoDOT Detects Differential RNA-isoform Expression/Usage with respect to a Categorical or Continuous Covariate with High Sensitivity and Specificity. J Am Stat Assoc 2015; 110:975-986. [PMID: 26617424 DOI: 10.1080/01621459.2015.1040880] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 10/23/2022]
Abstract
We have developed a statistical method named IsoDOT to assess differential isoform expression (DIE) and differential isoform usage (DIU) using RNA-seq data. Here isoform usage refers to relative isoform expression given the total expression of the corresponding gene. IsoDOT performs two tasks that cannot be accomplished by existing methods: to test DIE/DIU with respect to a continuous covariate, and to test DIE/DIU for one case versus one control. The latter task is not an uncommon situation in practice, e.g., comparing the paternal and maternal alleles of one individual or comparing tumor and normal samples of one cancer patient. Simulation studies demonstrate the high sensitivity and specificity of IsoDOT. We apply IsoDOT to study the effects of haloperidol treatment on the mouse transcriptome and identify a group of genes whose isoform usages respond to haloperidol treatment.
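The distinction the abstract draws between isoform expression and isoform usage can be illustrated with a minimal sketch. This is not IsoDOT itself and does not reproduce its statistical test. It only computes usage as each isoform's share of the total expression of its gene, as defined above. The function name `isoform_usage`, the isoform labels, and the expression values are all hypothetical and chosen for illustration.

```python
from typing import Dict


def isoform_usage(isoform_expression: Dict[str, float]) -> Dict[str, float]:
    """Return each isoform's expression as a fraction of the gene total."""
    total = sum(isoform_expression.values())
    if total <= 0:
        # Usage is undefined for an unexpressed gene; report zeros (an assumption).
        return {name: 0.0 for name in isoform_expression}
    return {name: value / total for name, value in isoform_expression.items()}


# Hypothetical expression values for one gene with two isoforms.
control = {"iso_A": 80.0, "iso_B": 20.0}
treated = {"iso_A": 120.0, "iso_B": 120.0}

usage_control = isoform_usage(control)
usage_treated = isoform_usage(treated)

for name in control:
    print(
        f"{name}: expression {control[name]:.0f} -> {treated[name]:.0f}, "
        f"usage {usage_control[name]:.2f} -> {usage_treated[name]:.2f}"
    )
```

In this made-up example both isoforms increase in expression, yet the usage of iso_A falls while that of iso_B rises, which is why expression and usage are tested as separate quantities.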
Affiliation(s)
- Wei Sun
- Department of Biostatistics, Department of Genetics, UNC Chapel Hill, NC 27599
- Yufeng Liu
- Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, UNC Chapel Hill
- Hua Zhou
- Department of Statistics, NC State University
- Haitao Chu
- Department of Biostatistics, University of Minnesota
- Pei-Fen Kuan
- Department of Applied Mathematics and Statistics, Stony Brook University
- Yuan Li
- Department of Statistics, NC State University
- Darla R Miller
- Department of Genetics, Lineberger Comprehensive Cancer Center, UNC Chapel Hill
- Ginger D Shaw
- Department of Genetics, Lineberger Comprehensive Cancer Center, UNC Chapel Hill
- Yichao Wu
- Department of Statistics, NC State University
- Fei Zou
- Department of Biostatistics, UNC Chapel Hill
- Patrick F Sullivan
- Department of Genetics, Department of Psychiatry, Department of Epidemiology, UNC Chapel Hill
|
40
|
Yu NYL, Hallström BM, Fagerberg L, Ponten F, Kawaji H, Carninci P, Forrest ARR, Hayashizaki Y, Uhlén M, Daub CO. Complementing tissue characterization by integrating transcriptome profiling from the Human Protein Atlas and from the FANTOM5 consortium. Nucleic Acids Res 2015; 43:6787-98. [PMID: 26117540 PMCID: PMC4538815 DOI: 10.1093/nar/gkv608] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Received: 09/15/2014] [Revised: 05/28/2015] [Accepted: 05/29/2015] [Indexed: 12/20/2022] Open
Abstract
Understanding the normal state of human tissue transcriptome profiles is essential for recognizing tissue disease states and identifying disease markers. Recently, the Human Protein Atlas and the FANTOM5 consortium have each published extensive transcriptome data for human samples using Illumina-sequenced RNA-Seq and Heliscope-sequenced CAGE. Here, we report on the first large-scale complex tissue transcriptome comparison between full-length versus 5'-capped mRNA sequencing data. Overall gene expression correlation was high between the 22 corresponding tissues analyzed (R > 0.8). For genes ubiquitously expressed across all tissues, the two data sets showed high genome-wide correlation (91% agreement), with differences observed for a small number of individual genes indicating the need to update their gene models. Among the identified single-tissue enriched genes, up to 75% showed consensus of 7-fold enrichment in the same tissue in both methods, while another 17% exhibited multiple tissue enrichment and/or high expression variety in the other data set, likely dependent on the cell type proportions included in each tissue sample. Our results show that RNA-Seq and CAGE tissue transcriptome data sets are highly complementary for improving gene model annotations and highlight biological complexities within tissue transcriptomes. Furthermore, integration with image-based protein expression data is highly advantageous for understanding expression specificities for many genes.
Affiliation(s)
- Nancy Yiu-Lin Yu
- Department of Biosciences and Nutrition, Karolinska Institute, Huddinge, 14183, Sweden; Science for Life Laboratory, Karolinska Institute, Solna, 17121, Sweden
- Björn M Hallström
- Science for Life Laboratory, KTH-Royal Institute of Technology, Solna, 17121, Sweden
- Linn Fagerberg
- Science for Life Laboratory, KTH-Royal Institute of Technology, Solna, 17121, Sweden
- Fredrik Ponten
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, 751 85, Sweden
- Hideya Kawaji
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama 351-0198, Japan; RIKEN Center for Life Science Technologies (CLST), Division of Genomic Technologies, RIKEN Yokohama Institute, Tsurumi-ku, Yokohama, 230-0045, Japan; RIKEN Omics Science Center, Yokohama, Kanagawa, 230-0045, Japan
- Piero Carninci
- RIKEN Center for Life Science Technologies (CLST), Division of Genomic Technologies, RIKEN Yokohama Institute, Tsurumi-ku, Yokohama, 230-0045, Japan; RIKEN Omics Science Center, Yokohama, Kanagawa, 230-0045, Japan
- Alistair R R Forrest
- RIKEN Center for Life Science Technologies (CLST), Division of Genomic Technologies, RIKEN Yokohama Institute, Tsurumi-ku, Yokohama, 230-0045, Japan; RIKEN Omics Science Center, Yokohama, Kanagawa, 230-0045, Japan
- Yoshihide Hayashizaki
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama 351-0198, Japan; RIKEN Omics Science Center, Yokohama, Kanagawa, 230-0045, Japan
- Mathias Uhlén
- Science for Life Laboratory, KTH-Royal Institute of Technology, Solna, 17121, Sweden
- Carsten O Daub
- Department of Biosciences and Nutrition, Karolinska Institute, Huddinge, 14183, Sweden; Science for Life Laboratory, Karolinska Institute, Solna, 17121, Sweden; RIKEN Center for Life Science Technologies (CLST), Division of Genomic Technologies, RIKEN Yokohama Institute, Tsurumi-ku, Yokohama, 230-0045, Japan; RIKEN Omics Science Center, Yokohama, Kanagawa, 230-0045, Japan
|
41
|
Kanitz A, Gypas F, Gruber AJ, Gruber AR, Martin G, Zavolan M. Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data. Genome Biol 2015. [PMID: 26201343 PMCID: PMC4511015 DOI: 10.1186/s13059-015-0702-5] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.9] [Indexed: 11/14/2022] Open
Abstract
Background: Understanding the regulation of gene expression, including transcription start site usage, alternative splicing, and polyadenylation, requires accurate quantification of expression levels down to the level of individual transcript isoforms. To comparatively evaluate the accuracy of the many methods that have been proposed for estimating transcript isoform abundance from RNA sequencing data, we have used both synthetic data as well as an independent experimental method for quantifying the abundance of transcript ends at the genome-wide level. Results: We found that many tools have good accuracy and yield better estimates of gene-level expression compared to commonly used count-based approaches, but they vary widely in memory and runtime requirements. Nucleotide composition and intron/exon structure have comparatively little influence on the accuracy of expression estimates, which correlates most strongly with transcript/gene expression levels. To facilitate the reproduction and further extension of our study, we provide datasets, source code, and an online analysis tool on a companion website, where developers can upload expression estimates obtained with their own tool to compare them to those inferred by the methods assessed here. Conclusions: As many methods for quantifying isoform abundance with comparable accuracy are available, a user’s choice will likely be determined by factors such as the memory and runtime requirements, as well as the availability of methods for downstream analyses. Sequencing-based methods to quantify the abundance of specific transcript regions could complement validation schemes based on synthetic data and quantitative PCR in future or ongoing assessments of RNA-seq analysis methods.
Affiliation(s)
- Alexander Kanitz
- Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Basel, Switzerland.
- Foivos Gypas
- Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Basel, Switzerland.
- Andreas J Gruber
- Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Basel, Switzerland.
- Andreas R Gruber
- Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Basel, Switzerland.
- Georges Martin
- Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Basel, Switzerland.
- Mihaela Zavolan
- Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Basel, Switzerland.
|
42
|
Yang Bai, Shufan Ji, Qinghua Jiang, Yadong Wang. Identification Exon Skipping Events From High-Throughput RNA Sequencing Data. IEEE Trans Nanobioscience 2015; 14:562-9. [DOI: 10.1109/tnb.2015.2419812] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
|
43
|
Carrara M, Lum J, Cordero F, Beccuti M, Poidinger M, Donatelli S, Calogero RA, Zolezzi F. Alternative splicing detection workflow needs a careful combination of sample prep and bioinformatics analysis. BMC Bioinformatics 2015; 16 Suppl 9:S2. [PMID: 26050971 PMCID: PMC4464605 DOI: 10.1186/1471-2105-16-s9-s2] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022] Open
Abstract
Background: RNA-Seq provides remarkable power in the area of biomarker discovery and disease characterization. Two crucial steps that affect RNA-Seq experiment results are Library Sample Preparation (LSP) and Bioinformatics Analysis (BA). This work describes an evaluation of the combined effect of LSP methods and BA tools in the detection of splice variants. Results: Different LSPs (TruSeq unstranded/stranded, ScriptSeq, NuGEN) allowed the detection of a large common set of splice variants. However, each LSP also detected a small set of unique transcripts that are characterized by a low coverage and/or FPKM. This effect was particularly evident using the low input RNA NuGEN v2 protocol. A benchmark dataset, in which synthetic reads as well as reads generated from standard (Illumina TruSeq 100) and low input (NuGEN) LSPs were spiked-in, was used to evaluate the effect of LSP on the statistical detection of alternative splicing events (AltDE). Statistical detection of AltDE was done using Cuffdiff2 and RSEM-EBSeq as prototypes for splice variant-quantification. DEXSeq was used as the prototype for exon-level analysis. Exon-level analysis performed slightly better than splice variant-quantification approaches, although at most only 50% of the spiked-in transcripts were detected. The performances of both splice variant-quantification and exon-level analysis improved when raising the number of input reads. Conclusion: Data derived from NuGEN v2 were not the ideal input for AltDE, especially when the exon-level approach was used. We observed that both splice variant-quantification and exon-level analysis performances were strongly dependent on the number of input reads. Moreover, the ribosomal RNA depletion protocol was less sensitive in detecting splicing variants, possibly due to the significant percentage of the reads mapping to non-coding transcripts.
|
44
|
Bonfert T, Kirner E, Csaba G, Zimmer R, Friedel CC. ContextMap 2: fast and accurate context-based RNA-seq mapping. BMC Bioinformatics 2015; 16:122. [PMID: 25928589 PMCID: PMC4411664 DOI: 10.1186/s12859-015-0557-5] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 01/23/2015] [Accepted: 03/30/2015] [Indexed: 01/24/2023] Open
Abstract
Background: Mapping of short sequencing reads is a crucial step in the analysis of RNA sequencing (RNA-seq) data. ContextMap is an RNA-seq mapping algorithm that uses a context-based approach to identify the best alignment for each read and allows parallel mapping against several reference genomes. Results: In this article, we present ContextMap 2, a new and improved version of ContextMap. Its key novel features are: (i) a plug-in structure that allows easily integrating novel short read alignment programs with improved accuracy and runtime; (ii) context-based identification of insertions and deletions (indels); (iii) mapping of reads spanning an arbitrary number of exons and indels. ContextMap 2 using Bowtie, Bowtie 2 or BWA was evaluated on both simulated and real-life data from the recently published RGASP study. Conclusions: We show that ContextMap 2 generally combines similar or higher recall compared to other state-of-the-art approaches with significantly higher precision in read placement and junction and indel prediction. Furthermore, runtime was significantly lower than for the best competing approaches. ContextMap 2 is freely available at http://www.bio.ifi.lmu.de/ContextMap.
Affiliation(s)
- Thomas Bonfert
- Institute for Informatics, Ludwig-Maximilians-Universität München, Amalienstr. 17, Munich, 80333, Germany.
- Evelyn Kirner
- Institute for Informatics, Ludwig-Maximilians-Universität München, Amalienstr. 17, Munich, 80333, Germany.
- Gergely Csaba
- Institute for Informatics, Ludwig-Maximilians-Universität München, Amalienstr. 17, Munich, 80333, Germany.
- Ralf Zimmer
- Institute for Informatics, Ludwig-Maximilians-Universität München, Amalienstr. 17, Munich, 80333, Germany.
- Caroline C Friedel
- Institute for Informatics, Ludwig-Maximilians-Universität München, Amalienstr. 17, Munich, 80333, Germany.
|
45
|
Liao JL, Zhou HW, Peng Q, Zhong PA, Zhang HY, He C, Huang YJ. Transcriptome changes in rice (Oryza sativa L.) in response to high night temperature stress at the early milky stage. BMC Genomics 2015; 16:18. [PMID: 25928563 PMCID: PMC4369907 DOI: 10.1186/s12864-015-1222-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 09/16/2014] [Accepted: 01/05/2015] [Indexed: 11/29/2022] Open
Abstract
Background: Rice yield and quality are adversely affected by high temperatures, especially at night; high nighttime temperatures are more harmful to grain weight than high daytime temperatures. Unfortunately, global temperatures are consistently increasing at an alarming rate and the minimum nighttime temperature has increased three times as much as the corresponding maximum daytime temperature over the past few decades. Results: We analyzed the transcriptome profiles for rice grain from heat-tolerant and -sensitive lines in response to high night temperatures at the early milky stage using the Illumina Sequencing method. The analysis results for the sequencing data indicated that 35 transcripts showed different expressions between heat-tolerant and -sensitive rice, and RT-qPCR analyses confirmed the expression patterns of selected transcripts. Functional analysis of the differentially expressed transcripts indicated that 21 genes have functional annotation and their functions are mainly involved in oxidation-reduction (6 genes), metabolic (7 genes), transport (4 genes), transcript regulation (2 genes), defense response (1 gene) and photosynthetic (1 gene) processes. Based on the functional annotation of the differentially expressed genes, the possible process that regulates these differentially expressed transcripts in rice grain responding to high night temperature stress at the early milky stage was further analyzed. This analysis indicated that high night temperature stress disrupts electron transport in the mitochondria, which leads to changes in the concentration of hydrogen ions in the mitochondrial and cellular matrix and influences the activity of enzymes involved in TCA and its secondary metabolism in plant cells. Conclusions: Using Illumina sequencing technology, the differences between the transcriptomes of heat-tolerant and -sensitive rice lines in response to high night temperature stress at the early milky stage were described here for the first time. The candidate transcripts may provide genetic resources that may be useful in the improvement of heat-tolerant characters of rice. The model proposed here is based on differences in expression and transcription between two rice lines. In addition, the model may support future studies on the molecular mechanisms underlying plant responses to high night temperatures.
Affiliation(s)
- Jiang-Lin Liao
- Key Laboratory of Crop Physiology, Ecology and Genetic Breeding (Jiangxi Agricultural University), Ministry of Education, Jiangxi Province, 330045, China; Key Laboratory of Agriculture responding to Climate Change (Jiangxi Agricultural University), Nanchang City, Jiangxi Province, 330045, China.
- Hui-Wen Zhou
- Key Laboratory of Agriculture responding to Climate Change (Jiangxi Agricultural University), Nanchang City, Jiangxi Province, 330045, China.
- Qi Peng
- Key Laboratory of Crop Physiology, Ecology and Genetic Breeding (Jiangxi Agricultural University), Ministry of Education, Jiangxi Province, 330045, China.
- Ping-An Zhong
- Key Laboratory of Agriculture responding to Climate Change (Jiangxi Agricultural University), Nanchang City, Jiangxi Province, 330045, China.
- Hong-Yu Zhang
- Key Laboratory of Agriculture responding to Climate Change (Jiangxi Agricultural University), Nanchang City, Jiangxi Province, 330045, China.
- Chao He
- Key Laboratory of Agriculture responding to Climate Change (Jiangxi Agricultural University), Nanchang City, Jiangxi Province, 330045, China.
- Ying-Jin Huang
- Key Laboratory of Crop Physiology, Ecology and Genetic Breeding (Jiangxi Agricultural University), Ministry of Education, Jiangxi Province, 330045, China; Key Laboratory of Agriculture responding to Climate Change (Jiangxi Agricultural University), Nanchang City, Jiangxi Province, 330045, China.
|
46
|
Liu R, Loraine AE, Dickerson JA. Comparisons of computational methods for differential alternative splicing detection using RNA-seq in plant systems. BMC Bioinformatics 2014; 15:364. [PMID: 25511303 PMCID: PMC4271460 DOI: 10.1186/s12859-014-0364-4] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.7] [Received: 05/31/2014] [Accepted: 10/29/2014] [Indexed: 12/29/2022] Open
Abstract
Background: Alternative Splicing (AS) as a post-transcription regulation mechanism is an important application of RNA-seq studies in eukaryotes. A number of software packages and computational methods have been developed for detecting AS. Most of the methods, however, are designed and tested on animal data, such as human and mouse. Plant genes differ from those of animals in many ways, e.g., the average intron size and preferred AS types. These differences may require different computational approaches and raise questions about their effectiveness on plant data. The goal of this paper is to benchmark existing computational differential splicing (or transcription) detection methods so that biologists can choose the most suitable tools to accomplish their goals. Results: This study compares eight popular publicly available software packages for differential splicing analysis using both simulated and real Arabidopsis thaliana RNA-seq data. All of the software is freely available. The study examines the effect of varying AS ratio, read depth, dispersion pattern, AS types, sample sizes and the influence of annotation. Using real data, the study looks at the consistency between the packages and verifies a subset of the detected AS events using PCR studies. Conclusions: No single method performs the best in all situations. The accuracy of annotation has a major impact on which method should be chosen for AS analysis. DEXSeq performs well in the simulated data when the AS signal is relatively strong and annotation is accurate. Cufflinks achieves a better tradeoff between precision and recall and turns out to be the best one when incomplete annotation is provided. Some methods perform inconsistently for different AS types. Complex AS events that combine several simple AS events impose problems for most methods, especially for MATS. MATS stands out in the analysis of real RNA-seq data when all the AS events being evaluated are simple AS events.
Affiliation(s)
- Ruolin Liu
- Department of Electrical and Computational Engineering, Iowa State University, Howe Hall, Ames, 50011-3060, USA.
- Ann E Loraine
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, North Carolina Research Campus, 600 Laureate Way, Kannapolis, 28081, NC, USA.
- Julie A Dickerson
- Department of Electrical and Computational Engineering, Iowa State University, Howe Hall, Ames, 50011-3060, USA.
|
47
|
Coble DJ, Fleming D, Persia ME, Ashwell CM, Rothschild MF, Schmidt CJ, Lamont SJ. RNA-seq analysis of broiler liver transcriptome reveals novel responses to high ambient temperature. BMC Genomics 2014; 15:1084. [PMID: 25494716 PMCID: PMC4299486 DOI: 10.1186/1471-2164-15-1084] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Received: 08/16/2013] [Accepted: 12/02/2014] [Indexed: 11/25/2022] Open
Abstract
Background: In broilers, high ambient temperature can result in reduced feed consumption, digestive inefficiency, impaired metabolism, and even death. The broiler sector of the U.S. poultry industry incurs approximately $52 million in heat-related losses annually. The objective of this study is to characterize the effects of cyclic high ambient temperature on the transcriptome of a metabolically active organ, the liver. This study provides novel insight into the effects of high ambient temperature on metabolism in broilers, because it is the first reported RNA-seq study to characterize the effect of heat on the transcriptome of a metabolic-related tissue. This information provides a platform for future investigations to further elucidate physiologic responses to high ambient temperature and seek methods to ameliorate the negative impacts of heat. Results: Transcriptome sequencing of the livers of 8 broiler males using Illumina HiSeq 2000 technology resulted in 138 million, 100-base pair single end reads, yielding a total of 13.8 gigabases of sequence. Forty genes were differentially expressed at a significance level of P-value < 0.05 and a fold-change ≥ 2 in response to a week of cyclic high ambient temperature with 27 down-regulated and 13 up-regulated genes. Two gene networks were created from the function-based Ingenuity Pathway Analysis (IPA) of the differentially expressed genes: "Cell Signaling" and "Endocrine System Development and Function". The gene expression differences in the liver transcriptome of the heat-exposed broilers reflected physiological responses to decrease internal temperature, reduce hyperthermia-induced apoptosis, and promote tissue repair. Additionally, the differential gene expression revealed a physiological response to regulate the perturbed cellular calcium levels that can result from high ambient temperature exposure. Conclusions: Exposure to cyclic high ambient temperature results in changes at the metabolic, physiologic, and cellular level that can be characterized through RNA-seq analysis of the liver transcriptome of broilers. The findings highlight specific physiologic mechanisms by which broilers reduce the effects of exposure to high ambient temperature. This information provides a foundation for future investigations into the gene networks involved in the broiler stress response and for development of strategies to ameliorate the negative impacts of heat on animal production and welfare.
Affiliation(s)
- Derrick J Coble
- Department of Animal Science, Iowa State University, Ames, IA 50011 USA
- Damarius Fleming
- Department of Animal Science, Iowa State University, Ames, IA 50011 USA
- Michael E Persia
- Department of Animal Science, Iowa State University, Ames, IA 50011 USA
- Chris M Ashwell
- Department of Poultry Science, North Carolina State University, Raleigh, NC 27695 USA
- Max F Rothschild
- Department of Animal Science, Iowa State University, Ames, IA 50011 USA
- Carl J Schmidt
- Department of Animal and Food Sciences, University of Delaware, Newark, DE 19716 USA
- Susan J Lamont
- Department of Animal Science, Iowa State University, Ames, IA 50011 USA
|
48
|
rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc Natl Acad Sci U S A 2014; 111:E5593-601. [PMID: 25480548 DOI: 10.1073/pnas.1419161111] [Citation(s) in RCA: 1630] [Impact Index Per Article: 148.2] [Indexed: 12/16/2022] Open
Abstract
Ultra-deep RNA sequencing (RNA-Seq) has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We previously developed multivariate analysis of transcript splicing (MATS), a statistical method for detecting differential alternative splicing between two RNA-Seq samples. Here we describe a new statistical model and computer program, replicate MATS (rMATS), designed for detection of differential alternative splicing from replicate RNA-Seq data. rMATS uses a hierarchical model to simultaneously account for sampling uncertainty in individual replicates and variability among replicates. In addition to the analysis of unpaired replicates, rMATS also includes a model specifically designed for paired replicates between sample groups. The hypothesis-testing framework of rMATS is flexible and can assess the statistical significance over any user-defined magnitude of splicing change. The performance of rMATS is evaluated by the analysis of simulated and real RNA-Seq data. rMATS outperformed two existing methods for replicate RNA-Seq data in all simulation settings, and RT-PCR yielded a high validation rate (94%) in an RNA-Seq dataset of prostate cancer cell lines. Our data also provide guiding principles for designing RNA-Seq studies of alternative splicing. We demonstrate that it is essential to incorporate biological replicates in the study design. Of note, pooling RNAs or merging RNA-Seq data from multiple replicates is not an effective approach to account for variability, and the result is particularly sensitive to outliers. The rMATS source code is freely available at rnaseq-mats.sourceforge.net/. As the popularity of RNA-Seq continues to grow, we expect rMATS will be useful for studies of alternative splicing in diverse RNA-Seq projects.
|
49
|
de Klerk E, den Dunnen JT, 't Hoen PAC. RNA sequencing: from tag-based profiling to resolving complete transcript structure. Cell Mol Life Sci 2014; 71:3537-51. [PMID: 24827995 PMCID: PMC4143603 DOI: 10.1007/s00018-014-1637-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 01/27/2014] [Revised: 04/13/2014] [Accepted: 04/28/2014] [Indexed: 12/22/2022]
Abstract
Technological advances in the sequencing field support in-depth characterization of the transcriptome. Here, we review genome-wide RNA sequencing methods used to investigate specific aspects of gene expression and its regulation, from transcription to RNA processing and translation. We discuss tag-based methods for studying transcription, alternative initiation and polyadenylation events, shotgun methods for detection of alternative splicing, full-length RNA sequencing for the determination of complete transcript structures, and targeted methods for studying the process of transcription and translation. With the ensemble of technologies available, it is now possible to obtain a comprehensive view of transcriptome complexity and the regulation of transcript diversity.
Affiliation(s)
- Eleonora de Klerk
- Department of Human Genetics, Leiden University Medical Center, 2300 RC, Leiden, The Netherlands
|
50
|
Angelini C, De Canditiis D, De Feis I. Computational approaches for isoform detection and estimation: good and bad news. BMC Bioinformatics 2014; 15:135. [PMID: 24885830 PMCID: PMC4098781 DOI: 10.1186/1471-2105-15-135] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 12/23/2013] [Accepted: 04/24/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The main goal of whole transcriptome analysis is to correctly identify all expressed transcripts within a specific cell/tissue, at a particular stage and condition, to determine their structures, and to measure their abundances. RNA-seq data promise to allow identification and quantification of the transcriptome at an unprecedented level of resolution and accuracy and at low cost. Several computational methods have been proposed to achieve such purposes. However, it is still not clear which promises are already met and which challenges are still open and require further methodological developments.
RESULTS We carried out a simulation study to assess the performance of five widely used tools: CEM, Cufflinks, iReckon, RSEM, and SLIDE. All of them were used with default parameters. In particular, we considered the effect of three different scenarios: the availability of complete annotation, incomplete annotation, and no annotation at all. Moreover, comparisons were carried out using the methods in three different modes of action. In the first mode, the methods were forced to deal only with those isoforms that are present in the annotation; in the second mode, they were allowed to detect novel isoforms using the annotation as a guide; in the third mode, they operated in a fully data-driven way (although with the support of the alignment on the reference genome). In the latter modality, precision and recall are quite poor. In contrast, results are better with the support of the annotation, even when it is not complete. Finally, abundance estimation error often shows a very skewed distribution. The performance strongly depends on the true abundance of the isoforms. Lowly (and sometimes also moderately) expressed isoforms are poorly detected and estimated. In particular, lowly expressed isoforms are identified mainly if they are provided in the original annotation as potential isoforms.
CONCLUSIONS Both detection and quantification of all isoforms from RNA-seq data are still hard problems, and they are affected by many factors. Overall, performance varies significantly with the mode of action and the type of available annotation. Methods run with complete or partial annotation detect most of the expressed isoforms, even though the number of false positives is often high. Fully data-driven approaches require more attention, at least for complex eukaryotic genomes. Improvements are desirable especially for isoform quantification and for the detection of low-abundance isoforms.
|