1
Mabin JW, Vock IW, Machyna M, Haque N, Thakran P, Zhang A, Rai G, Leibler INM, Inglese J, Simon MD, Hogg JR. Uncovering the isoform-resolution kinetic landscape of nonsense-mediated mRNA decay with EZbakR. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.03.12.642874. [PMID: 40161772 PMCID: PMC11952489 DOI: 10.1101/2025.03.12.642874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Cellular RNA levels are a product of synthesis and degradation kinetics, which can differ among transcripts of the same gene. An important cause of isoform-specific decay is the nonsense-mediated mRNA decay (NMD) pathway, which degrades transcripts with premature termination codons (PTCs) and other features. Understanding NMD functions requires strategies to quantify isoform kinetics; however, current approaches remain limited. Methods like nucleotide-recoding RNA-seq (NR-seq) enable insights into RNA kinetics, but existing bioinformatic tools do not provide robust, isoform-specific degradation rate constant estimates. We extend the EZbakR-suite by implementing a strategy to infer isoform-level kinetics from short-read NR-seq data. This approach uncovers unexpected variability in NMD efficiency among transcripts with conserved PTC-containing exons and rapid decay of a subset of mRNAs lacking PTCs. Our findings highlight the effects of competition between NMD and other decay pathways, provide mechanistic insights into established NMD efficiency correlates, and identify transcript features promoting efficient decay.
Affiliation(s)
- Justin W. Mabin
- Biochemistry and Biophysics Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Isaac W. Vock
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06516, USA
- Martin Machyna
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06516, USA
- Present address: Paul-Ehrlich-Institut, Host-Pathogen-Interactions, 63225 Langen, Germany
- Nazmul Haque
- Biochemistry and Biophysics Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Present address: Ultragenyx, 7000 Shoreline Ct, South San Francisco, CA 94080, USA
- Poonam Thakran
- Biochemistry and Biophysics Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Alexandra Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06516, USA
- Ganesha Rai
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, USA
- James Inglese
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, USA
- Metabolic Medicine Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
- Matthew D. Simon
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06516, USA
- J. Robert Hogg
- Biochemistry and Biophysics Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
2
Chamberlin J, Gillen A, Quinlan A. Improved characterization of 3' single-cell RNA-seq libraries with paired-end avidity sequencing. NAR Genom Bioinform 2024; 6:lqae175. [PMID: 39703419 PMCID: PMC11655283 DOI: 10.1093/nargab/lqae175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Revised: 11/12/2024] [Accepted: 11/30/2024] [Indexed: 12/21/2024]
Abstract
Prevailing poly(dT)-primed 3' single-cell RNA-seq protocols generate barcoded cDNA fragments containing the reverse transcriptase priming site, which in principle is the polyadenylation site. Direct sequencing across this site was historically difficult because of DNA sequencing errors induced by the homopolymeric primer at the 'barcode' end. Here, we evaluate the capability of 'avidity base chemistry' DNA sequencing from Element Biosciences to sequence through the primer and enable accurate paired-end read alignment and precise quantification of polyadenylation sites. We find that the Element Aviti instrument sequences through the thymine homopolymer into the subsequent cDNA sequence without detectable loss of accuracy. The additional sequence enables direct and independent assignment of reads to polyadenylation sites, which bypasses the complexities and limitations of conventional approaches but does not consistently improve read mapping rates compared to single-end alignment. We also characterize low-level artifacts and demonstrate adjustments to adapter trimming and sequence alignment that are necessary regardless of platform, particularly in the context of extended read lengths. Our analyses confirm that Element avidity sequencing is an effective alternative to Illumina sequencing for standard single-cell RNA-seq, particularly for polyadenylation site measurement, but do not rule out the potential for similar performance from other emerging platforms.
Affiliation(s)
- John T Chamberlin
- Department of Biomedical Informatics, University of Utah School of Medicine, 421 Wakara Way #140, Salt Lake City, UT 84112, USA
- Austin E Gillen
- RNA Bioscience Initiative, University of Colorado School of Medicine, 12801 E 17th Ave, Aurora, CO 80045, USA
- Division of Hematology, University of Colorado School of Medicine, 12700 East 19th Ave, Aurora, CO 80045, USA
- Rocky Mountain Regional VA Medical Center, 1700 N Wheeling St, Aurora, CO 80045, USA
- Aaron R Quinlan
- Department of Biomedical Informatics, University of Utah School of Medicine, 421 Wakara Way #140, Salt Lake City, UT 84112, USA
- Department of Human Genetics, University of Utah School of Medicine, 15 N 2030 E, Salt Lake City, UT 84112, USA
3
Chamberlin JT, Gillen AE, Quinlan AR. Improved characterization of single-cell RNA-seq libraries with paired-end avidity sequencing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.10.602909. [PMID: 39026715 PMCID: PMC11257511 DOI: 10.1101/2024.07.10.602909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
Prevailing poly(dT)-primed 3' single-cell RNA-seq protocols generate barcoded cDNA fragments containing the reverse transcriptase priming site, which is expected to be the poly(A) tail or a genomic adenine homopolymer. Direct sequencing across this priming site was historically difficult because of DNA sequencing errors induced by the homopolymeric primer at the 'barcode' end. Here, we evaluate the capability of "avidity base chemistry" DNA sequencing from Element Biosciences to sequence accurately through this homopolymer, and we assess the impact of the additional cDNA sequence on read alignment and on precise quantification of polyadenylation site usage. We find that the Element Aviti instrument sequences through the thymine homopolymer into the subsequent cDNA sequence without detectable loss of accuracy. The resulting paired-end alignments enable direct and independent assignment of reads to polyadenylation sites, which bypasses the complexities and limitations of conventional approaches but does not consistently improve read mapping rates compared to single-end alignment. We also characterize low-level artifacts and arrive at an adjusted adapter trimming and alignment workflow that significantly improves the alignment of sequence data from both Element and Illumina, particularly in the context of extended read lengths. Our analyses confirm that Element avidity sequencing is an effective alternative to Illumina sequencing for standard single-cell RNA-seq, particularly for polyadenylation site analyses, but do not rule out the potential for similar performance from other emerging platforms.
Affiliation(s)
- John T Chamberlin
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
- Austin E Gillen
- RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Division of Hematology, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Rocky Mountain Regional VA Medical Center, Aurora, CO 80045, USA
- Aaron R Quinlan
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
4
Mastini C, Campisi M, Patrucco E, Mura G, Ferreira A, Costa C, Ambrogio C, Germena G, Martinengo C, Peola S, Mota I, Vissio E, Molinaro L, Arigoni M, Olivero M, Calogero R, Prokoph N, Tabbò F, Shoji B, Brugieres L, Geoerger B, Turner SD, Cuesta-Mateos C, D’Aliberti D, Mologni L, Piazza R, Gambacorti-Passerini C, Inghirami GG, Chiono V, Kamm RD, Hirsch E, Koch R, Weinstock DM, Aster JC, Voena C, Chiarle R. Targeting CCR7-PI3Kγ overcomes resistance to tyrosine kinase inhibitors in ALK-rearranged lymphoma. Sci Transl Med 2023; 15:eabo3826. [PMID: 37379367 PMCID: PMC10804420 DOI: 10.1126/scitranslmed.abo3826] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/30/2022] [Accepted: 06/02/2023] [Indexed: 06/30/2023]
Abstract
Anaplastic lymphoma kinase (ALK) tyrosine kinase inhibitors (TKIs) show potent efficacy in several ALK-driven tumors, but the development of resistance limits their long-term clinical impact. Although resistance mechanisms have been studied extensively in ALK-driven non-small cell lung cancer, they are poorly understood in ALK-driven anaplastic large cell lymphoma (ALCL). Here, we identify a survival pathway supported by the tumor microenvironment that activates phosphatidylinositol 3-kinase γ (PI3Kγ) signaling through the C-C motif chemokine receptor 7 (CCR7). We found increased PI3K signaling in patients and ALCL cell lines resistant to ALK TKIs. PI3Kγ expression was predictive of a lack of response to ALK TKIs in patients with ALCL. Expression of CCR7, PI3Kγ, and PI3Kδ was up-regulated during ALK or STAT3 inhibition or degradation, and a constitutively active PI3Kγ isoform cooperated with oncogenic ALK to accelerate lymphomagenesis in mice. In a three-dimensional microfluidic chip, endothelial cells that produce the CCR7 ligands CCL19/CCL21 protected ALCL cells from crizotinib-induced apoptosis. The PI3Kγ/δ inhibitor duvelisib potentiated crizotinib activity against ALCL lines and patient-derived xenografts. Furthermore, genetic deletion of CCR7 blocked central nervous system dissemination and perivascular growth of ALCL in mice treated with crizotinib. Thus, blockade of PI3Kγ or CCR7 signaling together with ALK TKI treatment reduces primary resistance and the survival of persister lymphoma cells in ALCL.
Affiliation(s)
- Cristina Mastini
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino 10126, Italy
- Marco Campisi
- Dana Farber Cancer Institute, Boston, MA 02115, USA
- Department of Pathology, Boston Children’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Department of Mechanical and Aerospace Engineering, Politecnico of Torino, Torino 10129, Italy
- Enrico Patrucco
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino 10126, Italy
- Giulia Mura
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino 10126, Italy
- Antonio Ferreira
- Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Carlotta Costa
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino 10126, Italy
- Chiara Ambrogio
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino 10126, Italy
- Giulia Germena
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino 10126, Italy
- Cinzia Martinengo
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino 10126, Italy
- Silvia Peola
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino 10126, Italy
- Ines Mota
- Department of Pathology, Boston Children’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Elena Vissio
- Department of Oncology, University of Torino, Orbassano, Torino 10043, Italy
- Luca Molinaro
- Department of Medical Science, University of Torino, Torino 10126, Italy
- Maddalena Arigoni
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino 10126, Italy
- Martina Olivero
- Department of Oncology, University of Torino, Orbassano, Torino 10043, Italy
- Candiolo Cancer Institute, FPO-IRCCS, Candiolo, Torino 10060, Italy
- Raffaele Calogero
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino 10126, Italy
- Nina Prokoph
- Division of Cellular and Molecular Pathology, Department of Pathology, University of Cambridge, Addenbrooke’s Hospital, Cambridge CB2 0QQ, UK
- Fabrizio Tabbò
- Department of Pathology, Cornell University, New York, NY 10121, USA
- Brent Shoji
- Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Laurence Brugieres
- Department of Pediatric and Adolescent Oncology, Gustave Roussy Cancer Center, Paris-Saclay University, Villejuif 94805, France
- Birgit Geoerger
- Department of Pediatric and Adolescent Oncology, Gustave Roussy Cancer Center, Paris-Saclay University, Villejuif 94805, France
- Université Paris-Saclay, INSERM U1015, Villejuif 94805, France
- Suzanne D. Turner
- Division of Cellular and Molecular Pathology, Department of Pathology, University of Cambridge, Addenbrooke’s Hospital, Cambridge CB2 0QQ, UK
- Faculty of Medicine, Masaryk University, Brno 601 77, Czech Republic
- Carlos Cuesta-Mateos
- Department of Pre-Clinical Development, Catapult Therapeutics B.V., 8243 RC, Lelystad, Netherlands
- Deborah D’Aliberti
- Department of Medicine and Surgery, University of Milan-Bicocca, Monza 20900, Italy
- Luca Mologni
- Department of Medicine and Surgery, University of Milan-Bicocca, Monza 20900, Italy
- Rocco Piazza
- Department of Medicine and Surgery, University of Milan-Bicocca, Monza 20900, Italy
- Valeria Chiono
- Department of Mechanical and Aerospace Engineering, Politecnico of Torino, Torino 10129, Italy
- Roger D. Kamm
- Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Emilio Hirsch
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino 10126, Italy
- Raphael Koch
- Dana Farber Cancer Institute, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA 02115, USA
- University Medical Center Göttingen, 37075 Göttingen, Germany
- David M. Weinstock
- Dana Farber Cancer Institute, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA 02115, USA
- Jon C. Aster
- Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Claudia Voena
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino 10126, Italy
- Roberto Chiarle
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino 10126, Italy
- Department of Pathology, Boston Children’s Hospital and Harvard Medical School, Boston, MA 02115, USA
5
Hyaluronan nanoscale clustering and Hyaluronan synthase 2 expression are linked to the invasion of child fibroblasts and infantile fibrosarcoma in vitro and in vivo. Sci Rep 2022; 12:19835. [PMID: 36400790 PMCID: PMC9674583 DOI: 10.1038/s41598-022-21952-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2022] [Accepted: 10/06/2022] [Indexed: 11/19/2022]
Abstract
Infantile fibrosarcoma is a rare childhood tumour that originates in the fibrous connective tissue of the long bones, and there is an urgent need to identify novel therapeutic targets for it. This study aims to clarify the role of the extracellular matrix component hyaluronan in the invasion of child fibroblasts and infantile fibrosarcoma into the surrounding environment. Using nanoscale super-resolution STED (stimulated emission depletion) microscopy followed by computational image analysis, we observed, for the first time, that invasive child fibroblasts showed increased nanoscale clustering of hyaluronan at the cell periphery compared with control cells. Hyaluronan was not observed within focal adhesions. Bioinformatic analyses further revealed that the increased nanoscale hyaluronan clustering was accompanied by increased gene expression of hyaluronan synthase 2, reduced expression of hyaluronidase 2 and CD44, and no change in hyaluronan synthase 1 or hyaluronidases 1, 3, 4 and 5. We further observed that expression of the hyaluronan synthase 1, 2 and 3 genes and the hyaluronidase 3 and 5 genes was linked to reduced life expectancy in fibrosarcoma patients. The invasive front of infantile fibrosarcoma tumours also showed increased levels of hyaluronan compared with the tumour centre. Taken together, our findings are consistent with the possibility that hyaluronan synthase 2 increases hyaluronan levels while hyaluronidases 3 and 5 reduce its molecular weight, resulting in nanoscale clustering of hyaluronan at the leading edge of cells, cell invasion and the spread of infantile fibrosarcoma.
6
Srikakulam N, Sridevi G, Pandi G. High-quality reference transcriptome construction improves RNA-seq quantification in Oryza sativa indica. Front Genet 2022; 13:995072. [PMID: 36246658 PMCID: PMC9558114 DOI: 10.3389/fgene.2022.995072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 09/02/2022] [Indexed: 11/13/2022]
Abstract
The Reference Transcriptomic Dataset (RTD) is an accurate and comprehensive collection of transcripts originating from a given organism. It holds the key to precise transcript quantification and downstream analysis of differential expression and regulation. Currently, transcriptome annotations for most crop plants are far from complete. For example, Oryza sativa indica (O. sativa indica) is reported to have 40,759 transcripts in the Ensembl database, with no alternative transcript isoforms or alternative splicing (AS) events. To generate a high-quality RTD, we conducted RNA sequencing of rice leaf samples collected at various time points during Rhizoctonia solani infection. The obtained reads were analyzed using a recently developed computational analysis pipeline to assemble an RTD with increased transcript and AS diversity for O. sativa indica (IndicaRTD). After stringent quality filtering, the newly constructed transcriptome annotation comprised 122,968 non-redundant transcripts from 53,695 genes. This study identified many novel transcripts, relative to the data deposited in Ensembl, that are important for regulating molecular and physiological processes in the plant system. The assembled IndicaRTD should allow fast, high-precision quantification of transcript and gene expression.
Affiliation(s)
- Nagesh Srikakulam
- Laboratory of RNA Biology and Epigenomics, Department of Plant Biotechnology, School of Biotechnology, Madurai Kamaraj University, Madurai, India
- *Correspondence: Nagesh Srikakulam; Gopal Pandi
- Ganapathi Sridevi
- Department of Plant Biotechnology, School of Biotechnology, Madurai Kamaraj University, Madurai, India
- Gopal Pandi
- Laboratory of RNA Biology and Epigenomics, Department of Plant Biotechnology, School of Biotechnology, Madurai Kamaraj University, Madurai, India
7
Guo W, Tzioutziou NA, Stephen G, Milne I, Calixto CPG, Waugh R, Brown JWS, Zhang R. 3D RNA-seq: a powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists. RNA Biol 2021; 18:1574-1587. [PMID: 33345702 PMCID: PMC8594885 DOI: 10.1080/15476286.2020.1858253] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Received: 09/08/2020] [Revised: 11/26/2020] [Accepted: 11/27/2020] [Indexed: 12/19/2022]
Abstract
RNA-sequencing (RNA-seq) analysis of gene expression and alternative splicing should be routine and robust but is often a bottleneck for biologists because of different and complex analysis programs and reliance on specialized bioinformatics skills. We have developed the '3D RNA-seq' App, an R Shiny App and web-based pipeline for the comprehensive analysis of RNA-seq data from any organism. It is an easy-to-use, flexible and powerful tool for analysis of both gene- and transcript-level expression to identify differential gene/transcript expression, differential alternative splicing and differential transcript usage (3D), as well as isoform switching, from RNA-seq data. 3D RNA-seq integrates state-of-the-art differential expression analysis tools and adopts best practice for RNA-seq analysis. The program is designed to be run by biologists with minimal bioinformatics experience (or by bioinformaticians), allowing lab scientists to analyse their own RNA-seq data. It achieves this through a user-friendly graphical interface that automates the data flow through the programs in the pipeline. The comprehensive analysis performed by 3D RNA-seq is extremely rapid and accurate, can handle complex experimental designs, allows user setting of statistical parameters, visualizes the results through graphics and tables, and generates publication-quality figures such as heat-maps, expression profiles and GO enrichment plots. The utility of 3D RNA-seq is illustrated by analysis of data from a time series of cold-treated Arabidopsis plants and from dexamethasone-treated male and female mouse cortex and hypothalamus, which identified dexamethasone-induced sex- and brain region-specific differential gene expression and alternative splicing.
Affiliation(s)
- Wenbin Guo
- Division of Plant Sciences, University of Dundee at the James Hutton Institute, Dundee, UK
- Information and Computational Sciences, The James Hutton Institute, Dundee, UK
- Nikoleta A Tzioutziou
- Division of Plant Sciences, University of Dundee at the James Hutton Institute, Dundee, UK
- Gordon Stephen
- Information and Computational Sciences, The James Hutton Institute, Dundee, UK
- Iain Milne
- Information and Computational Sciences, The James Hutton Institute, Dundee, UK
- Cristiane PG Calixto
- Division of Plant Sciences, University of Dundee at the James Hutton Institute, Dundee, UK
- Robbie Waugh
- Division of Plant Sciences, University of Dundee at the James Hutton Institute, Dundee, UK
- Cell and Molecular Sciences, The James Hutton Institute, Dundee, UK
- John W. S. Brown
- Division of Plant Sciences, University of Dundee at the James Hutton Institute, Dundee, UK
- Cell and Molecular Sciences, The James Hutton Institute, Dundee, UK
- Runxuan Zhang
- Information and Computational Sciences, The James Hutton Institute, Dundee, UK
8
Guo W, Tzioutziou NA, Stephen G, Milne I, Calixto CP, Waugh R, Brown JWS, Zhang R. 3D RNA-seq: a powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists. RNA Biol 2021. [PMID: 33345702 DOI: 10.1101/656686] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 05/10/2023]
Abstract
RNA-sequencing (RNA-seq) analysis of gene expression and alternative splicing should be routine and robust but is often a bottleneck for biologists because of different and complex analysis programs and reliance on specialized bioinformatics skills. We have developed the '3D RNA-seq' App, an R Shiny App and web-based pipeline for the comprehensive analysis of RNA-seq data from any organism. It is an easy-to-use, flexible and powerful tool for analysis of both gene- and transcript-level expression to identify differential gene/transcript expression, differential alternative splicing and differential transcript usage (3D), as well as isoform switching, from RNA-seq data. 3D RNA-seq integrates state-of-the-art differential expression analysis tools and adopts best practice for RNA-seq analysis. The program is designed to be run by biologists with minimal bioinformatics experience (or by bioinformaticians), allowing lab scientists to analyse their own RNA-seq data. It achieves this through a user-friendly graphical interface that automates the data flow through the programs in the pipeline. The comprehensive analysis performed by 3D RNA-seq is extremely rapid and accurate, can handle complex experimental designs, allows user setting of statistical parameters, visualizes the results through graphics and tables, and generates publication-quality figures such as heat-maps, expression profiles and GO enrichment plots. The utility of 3D RNA-seq is illustrated by analysis of data from a time series of cold-treated Arabidopsis plants and from dexamethasone-treated male and female mouse cortex and hypothalamus, which identified dexamethasone-induced sex- and brain region-specific differential gene expression and alternative splicing.
Affiliation(s)
- Wenbin Guo
- Division of Plant Sciences, University of Dundee at the James Hutton Institute, Dundee, UK
- Information and Computational Sciences, The James Hutton Institute, Dundee, UK
- Nikoleta A Tzioutziou
- Division of Plant Sciences, University of Dundee at the James Hutton Institute, Dundee, UK
- Gordon Stephen
- Information and Computational Sciences, The James Hutton Institute, Dundee, UK
- Iain Milne
- Information and Computational Sciences, The James Hutton Institute, Dundee, UK
- Cristiane PG Calixto
- Division of Plant Sciences, University of Dundee at the James Hutton Institute, Dundee, UK
- Robbie Waugh
- Division of Plant Sciences, University of Dundee at the James Hutton Institute, Dundee, UK
- Cell and Molecular Sciences, The James Hutton Institute, Dundee, UK
- John W S Brown
- Division of Plant Sciences, University of Dundee at the James Hutton Institute, Dundee, UK
- Cell and Molecular Sciences, The James Hutton Institute, Dundee, UK
- Runxuan Zhang
- Information and Computational Sciences, The James Hutton Institute, Dundee, UK
9
Zhang Y, Cai Y, Roca X, Kwoh CK, Fullwood MJ. Chromatin loop anchors predict transcript and exon usage. Brief Bioinform 2021; 22:bbab254. [PMID: 34263910 PMCID: PMC8575016 DOI: 10.1093/bib/bbab254] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/01/2021] [Revised: 06/16/2021] [Accepted: 05/25/2021] [Indexed: 11/24/2022]
Abstract
Epigenomics and transcriptomics data from high-throughput sequencing techniques such as RNA-seq and ChIP-seq have been successfully applied to predicting gene transcript expression. However, the locations of chromatin loops in the genome identified by techniques such as Chromatin Interaction Analysis with Paired-End Tag sequencing (ChIA-PET) have never been used for prediction tasks. Here, we developed machine learning models to investigate whether ChIA-PET data could contribute to transcript and exon usage prediction. To do so, we used data for a large set of transcription factors as well as ChIA-PET data. We developed separate gradient boosting tree models for the different tasks using integrated datasets from three cell lines: GM12878, HeLa-S3 and K562. We validated the models via 10-fold cross-validation, chromosome-split validation and cross-cell validation. Our results show that both transcript and splicing-derived exon usage can be effectively predicted, with accuracies of at least 0.7512 and 0.7459, respectively, on all cell lines under all validation schemes. Examining the predictive features, we found that RNA polymerase II ChIA-PET was one of the most important features in both transcript and exon usage prediction, suggesting that chromatin loop anchors are predictive of both transcript and exon usage.
Affiliation(s)
- Yu Zhang
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore
- Yichao Cai
- Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore 117599, Singapore
- Xavier Roca
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore
- Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore
- Melissa Jane Fullwood
- Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Dr, Singapore 117599, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
- Institute of Molecular and Cell Biology, Agency for Science, Technology and Research (A*STAR), 61 Biopolis Dr, Singapore 138673, Singapore
10
Demircioğlu D, Cukuroglu E, Kindermans M, Nandi T, Calabrese C, Fonseca NA, Kahles A, Lehmann KV, Stegle O, Brazma A, Brooks AN, Rätsch G, Tan P, Göke J. A Pan-cancer Transcriptome Analysis Reveals Pervasive Regulation through Alternative Promoters. Cell 2019; 178:1465-1477.e17. [PMID: 31491388 DOI: 10.1016/j.cell.2019.08.018] [Citation(s) in RCA: 131] [Impact Index Per Article: 26.2] [Received: 10/15/2018] [Revised: 12/13/2018] [Accepted: 08/07/2019] [Indexed: 02/08/2023]
Abstract
Most human protein-coding genes are regulated by multiple, distinct promoters, suggesting that the choice of promoter is as important as its level of transcriptional activity. However, while a global change in transcription is recognized as a defining feature of cancer, the contribution of alternative promoters remains largely unexplored. Here, we infer active promoters using RNA-seq data from 18,468 cancer and normal samples, demonstrating that alternative promoters are a major contributor to context-specific regulation of transcription. We find that promoters are deregulated across tissues, cancer types, and patients, affecting known cancer genes and novel candidates. For genes with independently regulated promoters, we demonstrate that promoter activity provides a more accurate predictor of patient survival than gene expression. Our study suggests that a dynamic landscape of active promoters shapes the cancer transcriptome, opening new diagnostic avenues and opportunities to further explore the interplay of regulatory mechanisms with transcriptional aberrations in cancer.
Affiliation(s)
- Deniz Demircioğlu
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore; School of Computing, National University of Singapore, Singapore 117417, Singapore
- Engin Cukuroglu
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
- Martin Kindermans
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
- Tannistha Nandi
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
- Claudia Calabrese
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; Genome Biology Unit, EMBL, Heidelberg, 69117, Germany
- Nuno A Fonseca
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; CIBIO/InBIO - Research Center in Biodiversity and Genetic Resources, Universidade do Porto, Vairão 4485-601, Portugal
- André Kahles
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland; Department of Biology, ETH Zurich, Zurich 8093, Switzerland; Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich 8091, Switzerland
- Kjong-Van Lehmann
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland; Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich 8091, Switzerland
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; Genome Biology Unit, EMBL, Heidelberg, 69117, Germany; Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
- Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Angela N Brooks
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Gunnar Rätsch
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland; Department of Biology, ETH Zurich, Zurich 8093, Switzerland; Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich 8091, Switzerland; Weill Cornell Medical College, New York, NY 10065, USA
- Patrick Tan
- Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore 169857, Singapore; Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore; Cancer Therapeutics and Stratified Oncology, Genome Institute of Singapore, Singapore 138672, Singapore; SingHealth/Duke-NUS Institute of Precision Medicine, National Heart Centre Singapore, Singapore 169856, Singapore; Cellular and Molecular Research, National Cancer Centre, Singapore 169610, Singapore; Singapore Gastric Cancer Consortium, Singapore 119074, Singapore
- Jonathan Göke
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore; Cellular and Molecular Research, National Cancer Centre, Singapore 169610, Singapore.
11
Ma C, Kingsford C. Detecting, Categorizing, and Correcting Coverage Anomalies of RNA-Seq Quantification. Cell Syst 2019; 9:589-599.e7. [PMID: 31786209 DOI: 10.1016/j.cels.2019.10.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/06/2019] [Revised: 07/09/2019] [Accepted: 10/17/2019] [Indexed: 11/13/2022]
Abstract
Because of incomplete reference transcriptomes, incomplete sequencing bias models, or other modeling defects, algorithms to infer isoform expression from RNA sequencing (RNA-seq) sometimes do not accurately model expression. We present a computational method to detect instances where a quantification algorithm could not completely explain the input reads. Our approach identifies regions where the read coverage significantly deviates from expectation. We call these regions "expression anomalies." We further present a method to attribute their cause to either the incompleteness of the reference transcriptome or algorithmic mistakes. We detect anomalies for 30 GEUVADIS and 16 Human Body Map samples. By correcting anomalies when possible, we reduce the number of falsely predicted instances of differential expression. Anomalies that cannot be corrected are suspected to indicate the existence of isoforms unannotated by the reference. We detected 88 common anomalies of this type and found that they tend to have lower-than-expected coverage toward their 3' ends.
Affiliation(s)
- Cong Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA
- Carl Kingsford
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA.
13
Rapazote-Flores P, Bayer M, Milne L, Mayer CD, Fuller J, Guo W, Hedley PE, Morris J, Halpin C, Kam J, McKim SM, Zwirek M, Casao MC, Barakate A, Schreiber M, Stephen G, Zhang R, Brown JWS, Waugh R, Simpson CG. BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq. BMC Genomics 2019; 20:968. [PMID: 31829136 PMCID: PMC6907147 DOI: 10.1186/s12864-019-6243-7] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 06/06/2019] [Accepted: 10/29/2019] [Indexed: 12/15/2022]
Abstract
BACKGROUND The time required to analyse RNA-seq data varies considerably, due to discrete steps for computational assembly, quantification of gene expression and splicing analysis. Recent fast non-alignment tools such as Kallisto and Salmon overcome these problems, but these tools require a high-quality, comprehensive reference transcript dataset (RTD), which is rarely available in plants. RESULTS A high-quality, non-redundant barley gene RTD and database (Barley Reference Transcripts - BaRTv1.0) has been generated. BaRTv1.0 was constructed from a range of tissues, cultivars and abiotic treatments, with transcripts assembled and aligned to the barley cv. Morex reference genome (Mascher et al. Nature; 544: 427-433, 2017). Full-length cDNAs from the barley variety Haruna nijo (Matsumoto et al. Plant Physiol; 156: 20-28, 2011) determined transcript coverage, and high-resolution RT-PCR validated alternatively spliced (AS) transcripts of 86 genes in five different organs and tissues. These methods were used as benchmarks to select an optimal barley RTD. BaRTv1.0-Quantification of Alternatively Spliced Isoforms (QUASI) was also made to overcome inaccurate quantification due to variation in 5' and 3' UTR ends of transcripts. BaRTv1.0-QUASI was used for accurate transcript quantification of RNA-seq data of five barley organs/tissues. This analysis identified 20,972 significant differentially expressed genes, 2791 differentially alternatively spliced genes and 2768 transcripts with differential transcript usage. CONCLUSION A high-confidence barley reference transcript dataset consisting of 60,444 genes with 177,240 transcripts has been generated. Compared to current barley transcripts, BaRTv1.0 transcripts are generally longer, are less fragmented and have improved gene models that are well supported by splice junction reads. Precise transcript quantification using BaRTv1.0 allows routine analysis of gene expression and AS.
Affiliation(s)
- Paulo Rapazote-Flores
- Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK
- Micha Bayer
- Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK
- Linda Milne
- Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK
- John Fuller
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK
- Wenbin Guo
- Division of Plant Sciences, School of Life Sciences, University of Dundee at the James Hutton Institute, Dundee, DD2 5DA, UK
- Pete E Hedley
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK
- Jenny Morris
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK
- Claire Halpin
- Division of Plant Sciences, School of Life Sciences, University of Dundee at the James Hutton Institute, Dundee, DD2 5DA, UK
- Jason Kam
- Division of Plant Sciences, School of Life Sciences, University of Dundee at the James Hutton Institute, Dundee, DD2 5DA, UK
- Present address: Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Gogerddan, Aberystwyth, Ceredigion, SY23 3EB, UK
- Sarah M McKim
- Division of Plant Sciences, School of Life Sciences, University of Dundee at the James Hutton Institute, Dundee, DD2 5DA, UK
- Monika Zwirek
- Division of Plant Sciences, School of Life Sciences, University of Dundee at the James Hutton Institute, Dundee, DD2 5DA, UK
- Present address: MRC Protein Phosphorylation and Ubiquitylation Unit, Sir James Black Centre, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
- M Cristina Casao
- Division of Plant Sciences, School of Life Sciences, University of Dundee at the James Hutton Institute, Dundee, DD2 5DA, UK
- Abdellah Barakate
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK
- Miriam Schreiber
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK
- Gordon Stephen
- Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK
- Runxuan Zhang
- Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK
- John W S Brown
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK
- Division of Plant Sciences, School of Life Sciences, University of Dundee at the James Hutton Institute, Dundee, DD2 5DA, UK
- Robbie Waugh
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK
- Division of Plant Sciences, School of Life Sciences, University of Dundee at the James Hutton Institute, Dundee, DD2 5DA, UK
- Craig G Simpson
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK.
14
Van den Berge K, Hembach KM, Soneson C, Tiberi S, Clement L, Love MI, Patro R, Robinson MD. RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021255] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Indexed: 11/09/2022]
Abstract
Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq data sets, as well as the performance of the myriad of methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.
Affiliation(s)
- Koen Van den Berge
- Bioinformatics Institute Ghent and Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
- Katharina M. Hembach
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Charlotte Soneson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Simone Tiberi
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Lieven Clement
- Bioinformatics Institute Ghent and Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
- Michael I. Love
- Department of Biostatistics and Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514, USA
- Rob Patro
- Department of Computer Science, Stony Brook University, Stony Brook, New York 11794, USA
- Mark D. Robinson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
15
Hsieh PH, Oyang YJ, Chen CY. Effect of de novo transcriptome assembly on transcript quantification. Sci Rep 2019; 9:8304. [PMID: 31165774 PMCID: PMC6549443 DOI: 10.1038/s41598-019-44499-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 11/11/2018] [Accepted: 05/17/2019] [Indexed: 11/08/2022]
Abstract
Correct quantification of transcript expression is essential to understand the functional elements in different physiological conditions. For organisms without a reference transcriptome, de novo transcriptome assembly must be carried out prior to quantification. However, a large number of erroneous contigs produced by the assemblers might result in unreliable estimation. In this regard, this study investigates how assembly quality affects the performance of quantification based on de novo transcriptome assembly. We examined the over-extended and incomplete contigs, and demonstrated that assembly completeness has a strong impact on the estimation of contig abundance. Then we investigated the behavior of the quantifiers with respect to sequence ambiguity, which might be originally present in the transcriptome or accidentally produced by assemblers. The results suggested that the quantifiers often over-estimate the expression of family-collapse contigs and under-estimate the expression of duplicated contigs. For organisms without a reference transcriptome, it remains challenging to detect inaccurate estimation on family-collapse contigs. In contrast, we observed that under-estimation on duplicated contigs can be flagged by analyzing the read proportion of estimated abundance (RPEA) of contigs in the connected component inferred by the quantifiers. In addition, we suggest that quantification results estimated at the connected-component level are more accurate than sequence-level quantification. The analyses conducted in this study provide valuable insights for future development of transcriptome assembly and quantification.
Affiliation(s)
- Ping-Han Hsieh
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, 10617, Taiwan
- Yen-Jen Oyang
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, 10617, Taiwan
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 10617, Taiwan
- Chien-Yu Chen
- Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei, 10617, Taiwan.
- Genome and Systems Biology Program, National Taiwan University and Academia Sinica, Taipei, 10617, Taiwan.
16
Alasoo K, Rodrigues J, Danesh J, Freitag DF, Paul DS, Gaffney DJ. Genetic effects on promoter usage are highly context-specific and contribute to complex traits. eLife 2019; 8:e41673. [PMID: 30618377 PMCID: PMC6349408 DOI: 10.7554/elife.41673] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 09/05/2018] [Accepted: 01/08/2019] [Indexed: 12/12/2022]
Abstract
Genetic variants regulating RNA splicing and transcript usage have been implicated in both common and rare diseases. Although transcript usage quantitative trait loci (tuQTLs) have been mapped across multiple cell types and contexts, it is challenging to distinguish between the main molecular mechanisms controlling transcript usage: promoter choice, splicing and 3' end choice. Here, we analysed RNA-seq data from human macrophages exposed to three inflammatory and one metabolic stimulus. In addition to conventional gene-level and transcript-level analyses, we also directly quantified promoter usage, splicing and 3' end usage. We found that promoters, splicing and 3' ends were predominantly controlled by independent genetic variants enriched in distinct genomic features. Promoter usage QTLs were also 50% more likely to be context-specific than other tuQTLs and constituted 25% of the transcript-level colocalisations with complex traits. Thus, promoter usage might be an underappreciated molecular mechanism mediating complex trait associations in a context-specific manner.
Affiliation(s)
- Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Julia Rodrigues
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- John Danesh
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- British Heart Foundation Centre of Excellence, Division of Cardiovascular Medicine, Addenbrooke’s Hospital, Cambridge, United Kingdom
- National Institute for Health Research Blood and Transplant Unit (NIHR BTRU) in Donor Health and Genomics, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Daniel F Freitag
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- British Heart Foundation Centre of Excellence, Division of Cardiovascular Medicine, Addenbrooke’s Hospital, Cambridge, United Kingdom
- Dirk S Paul
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- British Heart Foundation Centre of Excellence, Division of Cardiovascular Medicine, Addenbrooke’s Hospital, Cambridge, United Kingdom
- Daniel J Gaffney
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
17
Chabbert CD, Eberhart T, Guccini I, Krek W, Kovacs WJ. Correction of gene model annotations improves isoform abundance estimates: the example of ketohexokinase (Khk). F1000Res 2018; 7:1956. [PMID: 31001414 PMCID: PMC6464065 DOI: 10.12688/f1000research.17082.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Accepted: 03/20/2019] [Indexed: 12/13/2022]
Abstract
Next-generation sequencing protocols such as RNA-seq have made the genome-wide characterization of the transcriptome a crucial part of many research projects in biology. Analyses of the resulting data provide key information on gene expression and in certain cases on exon or isoform usage. The emergence of transcript quantification software such as Salmon has enabled researchers to efficiently estimate isoform and gene expression across the genome while tremendously reducing the necessary computational power. Although overall gene expression estimates were shown to be accurate, isoform expression quantification appears to be a more challenging task. Low expression levels and uneven or insufficient coverage were reported as potential explanations for inconsistent estimates. Here, through the example of the ketohexokinase (Khk) gene in mouse, we demonstrate that the use of an incorrect gene annotation can also result in erroneous isoform quantification results. Manual correction of the input Khk gene model provided a much more accurate estimation of relative Khk isoform expression when compared to quantitative PCR (qPCR) measurements. In particular, removal of an unexpressed retained intron and a proper adjustment of the 5’ and 3’ untranslated regions both had a strong impact on the correction of erroneous estimates. Finally, we observed better concordance in isoform quantification between datasets and sequencing strategies when relying on the newly generated Khk annotations. These results highlight the importance of accurate gene models and annotations for correct isoform quantification and reassert the need for orthogonal methods of estimating isoform expression to confirm important findings.
Affiliation(s)
- Tanja Eberhart
- Institute of Molecular Health Sciences, ETH Zurich, Zurich, 8093, Switzerland
- Ilaria Guccini
- Institute of Molecular Health Sciences, ETH Zurich, Zurich, 8093, Switzerland
- Wilhelm Krek
- Institute of Molecular Health Sciences, ETH Zurich, Zurich, 8093, Switzerland
- Werner J Kovacs
- Institute of Molecular Health Sciences, ETH Zurich, Zurich, 8093, Switzerland