51
Jabs S, Biton A, Bécavin C, Nahori MA, Ghozlane A, Pagliuso A, Spanò G, Guérineau V, Touboul D, Giai Gianetto Q, Chaze T, Matondo M, Dillies MA, Cossart P. Impact of the gut microbiota on the m6A epitranscriptome of mouse cecum and liver. Nat Commun 2020; 11:1344. [PMID: 32165618 PMCID: PMC7067863 DOI: 10.1038/s41467-020-15126-x] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 07/15/2019] [Accepted: 02/17/2020] [Indexed: 12/28/2022] Open
Abstract
The intestinal microbiota modulates host physiology and gene expression via mechanisms that are not fully understood. Here we examine whether host epitranscriptomic marks are affected by the gut microbiota. We use methylated RNA-immunoprecipitation and sequencing (MeRIP-seq) to identify N6-methyladenosine (m6A) modifications in mRNA of mice carrying conventional, modified, or no microbiota. We find that variations in the gut microbiota correlate with m6A modifications in the cecum, and to a lesser extent in the liver, affecting pathways related to metabolism, inflammation and antimicrobial responses. We analyze expression levels of several known writer and eraser enzymes, and find that the methyltransferase Mettl16 is downregulated in absence of a microbiota, and one of its target mRNAs, encoding S-adenosylmethionine synthase Mat2a, is less methylated. We furthermore show that Akkermansia muciniphila and Lactobacillus plantarum affect specific m6A modifications in mono-associated mice. Our results highlight epitranscriptomic modifications as an additional level of interaction between commensal bacteria and their host.
Affiliation(s)
- Sabrina Jabs
- Unité des Interactions Bactéries-Cellules, Institut Pasteur, U604 Institut National de la Santé et de la Recherche Médicale, USC 2020 Institut National de la Recherche Agronomique, 25 rue du Dr Roux, F-75015, Paris, France.
- Anne Biton
- Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, 28 rue du Dr Roux, F-75015, Paris, France
- Christophe Bécavin
- Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, 28 rue du Dr Roux, F-75015, Paris, France
- Marie-Anne Nahori
- Unité des Interactions Bactéries-Cellules, Institut Pasteur, U604 Institut National de la Santé et de la Recherche Médicale, USC 2020 Institut National de la Recherche Agronomique, 25 rue du Dr Roux, F-75015, Paris, France
- Amine Ghozlane
- Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, 28 rue du Dr Roux, F-75015, Paris, France
- Alessandro Pagliuso
- Unité des Interactions Bactéries-Cellules, Institut Pasteur, U604 Institut National de la Santé et de la Recherche Médicale, USC 2020 Institut National de la Recherche Agronomique, 25 rue du Dr Roux, F-75015, Paris, France
- Giulia Spanò
- Unité des Interactions Bactéries-Cellules, Institut Pasteur, U604 Institut National de la Santé et de la Recherche Médicale, USC 2020 Institut National de la Recherche Agronomique, 25 rue du Dr Roux, F-75015, Paris, France
- Vincent Guérineau
- Institut de Chimie des Substances Naturelles, CNRS UPR 2301, Université Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette, France
- David Touboul
- Institut de Chimie des Substances Naturelles, CNRS UPR 2301, Université Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette, France
- Quentin Giai Gianetto
- Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, 28 rue du Dr Roux, F-75015, Paris, France
- Unité de spectrométrie de masse et Protéomique, CNRS USR 2000, Institut Pasteur, 28 rue du Dr Roux, F-75015, Paris, France
- Thibault Chaze
- Unité de spectrométrie de masse et Protéomique, CNRS USR 2000, Institut Pasteur, 28 rue du Dr Roux, F-75015, Paris, France
- Mariette Matondo
- Unité de spectrométrie de masse et Protéomique, CNRS USR 2000, Institut Pasteur, 28 rue du Dr Roux, F-75015, Paris, France
- Marie-Agnès Dillies
- Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, 28 rue du Dr Roux, F-75015, Paris, France
- Pascale Cossart
- Unité des Interactions Bactéries-Cellules, Institut Pasteur, U604 Institut National de la Santé et de la Recherche Médicale, USC 2020 Institut National de la Recherche Agronomique, 25 rue du Dr Roux, F-75015, Paris, France.
52
Van den Berge K, Roux de Bézieux H, Street K, Saelens W, Cannoodt R, Saeys Y, Dudoit S, Clement L. Trajectory-based differential expression analysis for single-cell sequencing data. Nat Commun 2020; 11:1201. [PMID: 32139671 PMCID: PMC7058077 DOI: 10.1038/s41467-020-14766-3] [Citation(s) in RCA: 244] [Impact Index Per Article: 61.0] [Received: 04/30/2019] [Accepted: 01/14/2020] [Indexed: 12/31/2022] Open
Abstract
Trajectory inference has radically enhanced single-cell RNA-seq research by enabling the study of dynamic changes in gene expression. Downstream of trajectory inference, it is vital to discover genes that are (i) associated with the lineages in the trajectory, or (ii) differentially expressed between lineages, to illuminate the underlying biological processes. Current data analysis procedures, however, either fail to exploit the continuous resolution provided by trajectory inference, or fail to pinpoint the exact types of differential expression. We introduce tradeSeq, a powerful generalized additive model framework based on the negative binomial distribution that allows flexible inference of both within-lineage and between-lineage differential expression. By incorporating observation-level weights, the model can additionally account for zero inflation. We evaluate the method on simulated datasets and on real datasets from droplet-based and full-length protocols, and show that it yields biological insights through a clear interpretation of the data.
Affiliation(s)
- Koen Van den Berge
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Department of Statistics, University of California, Berkeley, CA, USA
- Hector Roux de Bézieux
- Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA
- Center for Computational Biology, University of California, Berkeley, CA, USA
- Kelly Street
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Wouter Saelens
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Data mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Robrecht Cannoodt
- Data mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Yvan Saeys
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Data mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Sandrine Dudoit
- Department of Statistics, University of California, Berkeley, CA, USA.
- Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA.
- Center for Computational Biology, University of California, Berkeley, CA, USA.
- Lieven Clement
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium.
53
Merino GA, Fernández EA. Differential splicing analysis based on isoforms expression with NBSplice. J Biomed Inform 2020; 103:103378. [PMID: 31972288 DOI: 10.1016/j.jbi.2020.103378] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/14/2019] [Revised: 12/07/2019] [Accepted: 01/13/2020] [Indexed: 01/05/2023]
Abstract
Alternative splicing alterations have been widely related to several human diseases, revealing the importance of their study for the success of translational medicine. Differential splicing (DS) occurrence has been mainly analyzed through exon-based approaches over RNA-seq data. Although these strategies allow identifying differentially spliced genes, they ignore the identity of the affected gene isoforms, which is crucial to understand the underlying pathological processes behind alternative splicing changes. Moreover, although several isoform quantification tools for RNA-seq data have recently been developed, DS tools have not taken advantage of them. Here, the NBSplice R package for differential splicing analysis by means of isoform expression data is presented. It estimates differences in the relative expression of gene transcripts between experimental conditions to infer changes in gene alternative splicing patterns. The developed tool was evaluated using a synthetic RNA-seq dataset with controlled differential splicing. NBSplice accurately predicted DS occurrence, outperforming current methods in terms of accuracy, sensitivity, F-score, and false discovery rate control. The usefulness of our development was demonstrated by the analysis of a real cancer dataset, revealing new differentially spliced genes that could be studied in pursuit of new colorectal cancer biomarkers.
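To make the notion of "relative expression of gene transcripts" concrete, the following minimal sketch computes each isoform's share of its gene's total expression in two conditions and reports the shift. It is only an illustration of the quantity being compared: the function names, the toy counts, and the isoform-to-gene map are hypothetical, and it does not reproduce NBSplice's statistical model or its R interface.

```python
from collections import defaultdict


def relative_expression(iso_counts, iso_to_gene):
    """Return each isoform's share of its parent gene's total count."""
    gene_totals = defaultdict(float)
    for iso, count in iso_counts.items():
        gene_totals[iso_to_gene[iso]] += count
    shares = {}
    for iso, count in iso_counts.items():
        total = gene_totals[iso_to_gene[iso]]
        # A gene with zero total expression gets a share of 0 for all isoforms.
        shares[iso] = count / total if total > 0 else 0.0
    return shares


def usage_shift(control, treated, iso_to_gene):
    """Difference in relative isoform expression (treated minus control)."""
    ctrl = relative_expression(control, iso_to_gene)
    trt = relative_expression(treated, iso_to_gene)
    return {iso: trt[iso] - ctrl[iso] for iso in iso_to_gene}


# Hypothetical toy data: geneA switches its dominant isoform, while geneB only
# changes in overall expression.
iso_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
control = {"tx1": 80.0, "tx2": 20.0, "tx3": 50.0}
treated = {"tx1": 40.0, "tx2": 60.0, "tx3": 100.0}
shift = usage_shift(control, treated, iso_to_gene)
```

In this toy case the single-isoform gene shows no shift even though its total expression doubles, which is the distinction between differential splicing and differential gene expression.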
Affiliation(s)
- Gabriela Alejandra Merino
- Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática (IBB), Universidad Nacional de Entre Ríos, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ruta 11 Km 10.5, E3100XAD Oro Verde, Argentina; Centro de Investigación y Desarrollo en Inmunología y Enfermedades Infecciosas (CIDIE), Universidad Católica de Córdoba, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Av. Armada Argentina 3555, X5016DHK Córdoba, Argentina.
- Elmer Andrés Fernández
- Centro de Investigación y Desarrollo en Inmunología y Enfermedades Infecciosas (CIDIE), Universidad Católica de Córdoba, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Av. Armada Argentina 3555, X5016DHK Córdoba, Argentina; Facultad de Ciencias Exactas, Físicas y Naturales, Universidad Nacional de Córdoba, Av. Vélez Sarsfield 1611, X5016GCA Córdoba, Argentina.
54
Cao M, Zhou W, Breidt FJ, Peers G. Large scale maximum average power multiple inference on time‐course count data with application to RNA‐seq analysis. Biometrics 2019; 76:9-22. [DOI: 10.1111/biom.13144] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/31/2018] [Accepted: 08/28/2019] [Indexed: 11/30/2022]
Affiliation(s)
- Meng Cao
- Department of Statistics, Colorado State University, Fort Collins, Colorado
- Wen Zhou
- Department of Statistics, Colorado State University, Fort Collins, Colorado
- F. Jay Breidt
- Department of Statistics, Colorado State University, Fort Collins, Colorado
- Graham Peers
- Department of Biology, Colorado State University, Fort Collins, Colorado
55
Van den Berge K, Hembach KM, Soneson C, Tiberi S, Clement L, Love MI, Patro R, Robinson MD. RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021255] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Indexed: 11/09/2022]
Abstract
Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq data sets, as well as the performance of the myriad of methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.
Affiliation(s)
- Koen Van den Berge
- Bioinformatics Institute Ghent and Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
- Katharina M. Hembach
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Charlotte Soneson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Simone Tiberi
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Lieven Clement
- Bioinformatics Institute Ghent and Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
- Michael I. Love
- Department of Biostatistics and Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514, USA
- Rob Patro
- Department of Computer Science, Stony Brook University, Stony Brook, New York 11794, USA
- Mark D. Robinson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
56
Gaiti F, Chaligne R, Gu H, Brand RM, Kothen-Hill S, Schulman R, Grigorev K, Risso D, Kim KT, Pastore A, Huang KY, Alonso A, Sheridan C, Omans ND, Biederstedt E, Clement K, Wang L, Felsenfeld JA, Bhavsar EB, Aryee MJ, Allan JN, Furman R, Gnirke A, Wu CJ, Meissner A, Landau DA. Epigenetic evolution and lineage histories of chronic lymphocytic leukaemia. Nature 2019; 569:576-580. [PMID: 31092926 PMCID: PMC6533116 DOI: 10.1038/s41586-019-1198-z] [Citation(s) in RCA: 162] [Impact Index Per Article: 32.4] [Received: 12/21/2017] [Accepted: 04/12/2019] [Indexed: 11/22/2022]
Abstract
Genetic and epigenetic intra-tumoral heterogeneity cooperate to shape the evolutionary course of cancer [1]. Chronic lymphocytic leukaemia (CLL) is a highly informative model for cancer evolution as it undergoes substantial genetic diversification and evolution after therapy [2,3]. The CLL epigenome is also an important disease-defining feature [4,5], and growing populations of cells in CLL diversify by stochastic changes in DNA methylation known as epimutations [6]. However, previous studies using bulk sequencing methods to analyse the patterns of DNA methylation were unable to determine whether epimutations affect CLL populations homogeneously. Here, to measure the epimutation rate at single-cell resolution, we applied multiplexed single-cell reduced-representation bisulfite sequencing to B cells from healthy donors and patients with CLL. We observed that the common clonal origin of CLL results in a consistently increased epimutation rate, with low variability in the cell-to-cell epimutation rate. By contrast, variable epimutation rates across healthy B cells reflect diverse evolutionary ages across the trajectory of B cell differentiation, consistent with epimutations serving as a molecular clock. Heritable epimutation information allowed us to reconstruct lineages at high-resolution with single-cell data, and to apply this directly to patient samples. The CLL lineage tree shape revealed earlier branching and longer branch lengths than in normal B cells, reflecting rapid drift after the initial malignant transformation and a greater proliferative history. Integration of single-cell bisulfite sequencing analysis with single-cell transcriptomes and genotyping confirmed that genetic subclones mapped to distinct clades, as inferred solely on the basis of epimutation information.
Finally, to examine potential lineage biases during therapy, we profiled serial samples during ibrutinib-associated lymphocytosis, and identified clades of cells that were preferentially expelled from the lymph node after treatment, marked by distinct transcriptional profiles. The single-cell integration of genetic, epigenetic and transcriptional information thus charts the lineage history of CLL and its evolution with therapy.
Affiliation(s)
- Federico Gaiti
- New York Genome Center, New York, NY, 10013, USA; Weill Cornell Medicine, New York, NY, 10021, USA
- Ronan Chaligne
- New York Genome Center, New York, NY, 10013, USA; Weill Cornell Medicine, New York, NY, 10021, USA
- Hongcang Gu
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Ryan Matthew Brand
- New York Genome Center, New York, NY, 10013, USA; Weill Cornell Medicine, New York, NY, 10021, USA
- Steven Kothen-Hill
- New York Genome Center, New York, NY, 10013, USA; Weill Cornell Medicine, New York, NY, 10021, USA
- Rafael Schulman
- New York Genome Center, New York, NY, 10013, USA; Weill Cornell Medicine, New York, NY, 10021, USA
- Davide Risso
- Weill Cornell Medicine, New York, NY, 10021, USA; Department of Statistical Sciences, University of Padova, Padova, 35121, Italy
- Kyu-Tae Kim
- New York Genome Center, New York, NY, 10013, USA; Weill Cornell Medicine, New York, NY, 10021, USA
- Alessandro Pastore
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Kevin Y. Huang
- New York Genome Center, New York, NY, 10013, USA; Weill Cornell Medicine, New York, NY, 10021, USA
- Nathaniel D. Omans
- New York Genome Center, New York, NY, 10013, USA; Weill Cornell Medicine, New York, NY, 10021, USA
- Evan Biederstedt
- New York Genome Center, New York, NY, 10013, USA; Weill Cornell Medicine, New York, NY, 10021, USA
- Kendell Clement
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Lili Wang
- Department of Pathology, Massachusetts General Hospital, Boston, MA, 02114, USA; Beckman Research Institute, City of Hope, Monrovia, CA, 91016, USA
- Martin J. Aryee
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA; Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Andreas Gnirke
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Catherine J. Wu
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA; Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Alexander Meissner
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA; Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
- Dan A. Landau
- New York Genome Center, New York, NY, 10013, USA; Weill Cornell Medicine, New York, NY, 10021, USA; Corresponding author: Dan A. Landau, MD, PhD, Weill Cornell Medicine, Belfer Research Building, 413 East 69th Street, New York, NY 10021
57
Love MI, Soneson C, Patro R. Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification. F1000Res 2018; 7:952. [PMID: 30356428 PMCID: PMC6178912 DOI: 10.12688/f1000research.15398.3] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Accepted: 09/27/2018] [Indexed: 12/30/2022] Open
Abstract
Detection of differential transcript usage (DTU) from RNA-seq data is an important bioinformatic analysis that complements differential gene expression analysis. Here we present a simple workflow using a set of existing R/Bioconductor packages for analysis of DTU. We show how these packages can be used downstream of RNA-seq quantification using the Salmon software package. The entire pipeline is fast, benefiting from inference steps by Salmon to quantify expression at the transcript level. The workflow includes live, runnable code chunks for analysis using DRIMSeq and DEXSeq, as well as for performing two-stage testing of DTU using the stageR package, a statistical framework to screen at the gene level and then confirm which transcripts within the significant genes show evidence of DTU. We evaluate these packages and other related packages on a simulated dataset with parameters estimated from real data.
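The screen-then-confirm idea described in the abstract can be sketched in a few lines. This is a deliberately simplified illustration, not the stageR procedure itself: the Benjamini-Hochberg screen, the scaled second-stage level, and the within-gene Bonferroni step are assumptions chosen to keep the example short, and every name and p-value below is hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def two_stage(gene_p, tx_p, alpha=0.05):
    """Screen genes first, then confirm transcripts only in genes that pass."""
    genes = list(gene_p)
    adj = bh_adjust([gene_p[g] for g in genes])
    passed = [g for g, q in zip(genes, adj) if q <= alpha]
    # Assumed second-stage level: alpha scaled by the fraction of genes passing.
    alpha_2 = alpha * len(passed) / len(genes)
    confirmed = {}
    for g in passed:
        k = len(tx_p[g])
        # Assumed within-gene correction: plain Bonferroni over k transcripts.
        confirmed[g] = [t for t, p in tx_p[g].items() if p * k <= alpha_2]
    return confirmed


# Hypothetical gene-level and transcript-level p-values.
gene_p = {"g1": 0.001, "g2": 0.6, "g3": 0.02}
tx_p = {
    "g1": {"t1": 0.0005, "t2": 0.4},
    "g2": {"t3": 0.5, "t4": 0.7},
    "g3": {"t5": 0.03, "t6": 0.01},
}
result = two_stage(gene_p, tx_p)
```

Transcripts of genes that fail the screen are never tested, which is what keeps the second stage focused on the significant genes.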
Affiliation(s)
- Michael I. Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
- Rob Patro
- Department of Computer Science, Stony Brook University, Stony Brook, NY, 11794, USA
58
Love MI, Soneson C, Patro R. Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification. F1000Res 2018; 7:952. [PMID: 30356428 PMCID: PMC6178912 DOI: 10.12688/f1000research.15398.1] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Accepted: 06/22/2018] [Indexed: 12/25/2022] Open
Abstract
Detection of differential transcript usage (DTU) from RNA-seq data is an important bioinformatic analysis that complements differential gene expression analysis. Here we present a simple workflow using a set of existing R/Bioconductor packages for analysis of DTU. We show how these packages can be used downstream of RNA-seq quantification using the Salmon software package. The entire pipeline is fast, benefiting from inference steps by Salmon to quantify expression at the transcript level. The workflow includes live, runnable code chunks for analysis using DRIMSeq and DEXSeq, as well as for performing two-stage testing of DTU using the stageR package, a statistical framework to screen at the gene level and then confirm which transcripts within the significant genes show evidence of DTU. We evaluate these packages and other related packages on a simulated dataset with parameters estimated from real data.
Affiliation(s)
- Michael I. Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
- Rob Patro
- Department of Computer Science, Stony Brook University, Stony Brook, NY, 11794, USA
59
Love MI, Soneson C, Patro R. Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification. F1000Res 2018; 7:952. [PMID: 30356428 PMCID: PMC6178912 DOI: 10.12688/f1000research.15398.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Accepted: 09/10/2018] [Indexed: 09/29/2023] Open
Abstract
Detection of differential transcript usage (DTU) from RNA-seq data is an important bioinformatic analysis that complements differential gene expression analysis. Here we present a simple workflow using a set of existing R/Bioconductor packages for analysis of DTU. We show how these packages can be used downstream of RNA-seq quantification using the Salmon software package. The entire pipeline is fast, benefiting from inference steps by Salmon to quantify expression at the transcript level. The workflow includes live, runnable code chunks for analysis using DRIMSeq and DEXSeq, as well as for performing two-stage testing of DTU using the stageR package, a statistical framework to screen at the gene level and then confirm which transcripts within the significant genes show evidence of DTU. We evaluate these packages and other related packages on a simulated dataset with parameters estimated from real data.
Affiliation(s)
- Michael I. Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
- Rob Patro
- Department of Computer Science, Stony Brook University, Stony Brook, NY, 11794, USA
60
Yi L, Pimentel H, Bray NL, Pachter L. Gene-level differential analysis at transcript-level resolution. Genome Biol 2018; 19:53. [PMID: 29650040 PMCID: PMC5896116 DOI: 10.1186/s13059-018-1419-z] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Received: 10/05/2017] [Accepted: 03/08/2018] [Indexed: 11/23/2022] Open
Abstract
Compared to RNA-sequencing transcript differential analysis, gene-level differential expression analysis is more robust and experimentally actionable. However, the use of gene counts for statistical analysis can mask transcript-level dynamics. We demonstrate that ‘analysis first, aggregation second,’ where the p values derived from transcript analysis are aggregated to obtain gene-level results, increases sensitivity and accuracy. The method we propose can also be applied to transcript compatibility counts obtained from pseudoalignment of reads, which circumvents the need for quantification and is fast, accurate, and model-free. The method generalizes to various levels of biology and we showcase an application to gene ontologies.
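As a rough illustration of "analysis first, aggregation second", the sketch below combines transcript-level p-values into one gene-level p-value using Fisher's method. The abstract does not name the aggregation rule, so Fisher's method is an assumption made here for illustration; the function names and toy p-values are hypothetical, and the method additionally assumes independent p-values strictly greater than zero.

```python
import math
from collections import defaultdict


def fisher_combine(pvals):
    """Combine p-values (each in (0, 1]) with Fisher's method.

    The statistic -2 * sum(log p) follows a chi-square distribution with 2k
    degrees of freedom under the null; for even degrees of freedom the upper
    tail has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # equals statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def aggregate_to_genes(tx_pvals, tx_to_gene):
    """Group transcript-level p-values by gene and combine each group."""
    grouped = defaultdict(list)
    for tx, p in tx_pvals.items():
        grouped[tx_to_gene[tx]].append(p)
    return {gene: fisher_combine(ps) for gene, ps in grouped.items()}


# Hypothetical transcript-level results.
tx_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
tx_pvals = {"tx1": 0.5, "tx2": 0.5, "tx3": 0.03}
gene_pvals = aggregate_to_genes(tx_pvals, tx_to_gene)
```

A gene with a single transcript simply keeps that transcript's p-value, while genes with several transcripts pool the evidence across them.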
Affiliation(s)
- Lynn Yi
- UCLA-Caltech Medical Science Training Program, Los Angeles, CA, USA; Division of Biology and Biological Engineering, Caltech, Pasadena, CA, USA
- Harold Pimentel
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Lior Pachter
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA, USA; Department of Computing and Mathematical Sciences, Caltech, Pasadena, CA, USA
61
Derycke S, Kéver L, Herten K, Van den Berge K, Van Steenberge M, Van Houdt J, Clement L, Poncin P, Parmentier E, Verheyen E. Neurogenomic Profiling Reveals Distinct Gene Expression Profiles Between Brain Parts That Are Consistent in Ophthalmotilapia Cichlids. Front Neurosci 2018; 12:136. [PMID: 29593484 PMCID: PMC5855355 DOI: 10.3389/fnins.2018.00136] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/08/2017] [Accepted: 02/20/2018] [Indexed: 01/22/2023] Open
Abstract
The detection of external and internal cues alters gene expression in the brain which in turn may affect neural networks that underlie behavioral responses. Previous studies have shown that gene expression profiles differ between major brain regions within individuals and between species with different morphologies, cognitive abilities and/or behaviors. A detailed description of gene expression in all macroanatomical brain regions and in species with similar morphologies and behaviors is however lacking. Here, we dissected the brain of two cichlid species into six macroanatomical regions. Ophthalmotilapia nasuta and O. ventralis have similar morphology and behavior and occasionally hybridize in the wild. We use 3′ mRNA sequencing and a stage-wise statistical testing procedure to identify differential gene expression between females that were kept in a social setting with other females. Our results show that gene expression differs substantially between all six brain parts within species: out of 11,577 assessed genes, 8,748 are differentially expressed (DE) in at least one brain part compared to the average expression of the other brain parts. At most 16% of these DE genes have |log2FC| significantly higher than two. Functional differences between brain parts were consistent between species. The majority (61–79%) of genes that are DE in a particular brain part were shared between both species. Only 32 genes show significant differences in fold change across brain parts between species. These genes are mainly linked to transport, transmembrane transport, transcription (and its regulation) and signal transduction. Moreover, statistical equivalence testing reveals that within each comparison, on average 89% of the genes show an equivalent fold change between both species.
The pronounced differences in gene expression between brain parts and the conserved patterns between closely related species with similar morphologies and behavior suggest that unraveling the interactions between genes and behavior will benefit from neurogenomic profiling of distinct brain regions.
Affiliation(s)
- Sofie Derycke
- Operational Direction Taxonomy and Phylogeny, Royal Belgian Institute for Natural Sciences, Brussels, Belgium; Department of Biology, Ghent University, Ghent, Belgium
- Loic Kéver
- Laboratory of Functional and Evolutionary Morphology, University of Liège, Liège, Belgium; Behavioural Biology Unit, Ethology and Animal Psychology, University of Liège, Liège, Belgium
- Koen Herten
- Department of Human Genetics, Genomics Core Facility, KU Leuven, Leuven, Belgium
- Koen Van den Berge
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Maarten Van Steenberge
- Operational Direction Taxonomy and Phylogeny, Royal Belgian Institute for Natural Sciences, Brussels, Belgium; Section Vertebrates, Ichthyology, Royal Museum for Central Africa, Tervuren, Belgium
- Jeroen Van Houdt
- Department of Human Genetics, Genomics Core Facility, KU Leuven, Leuven, Belgium
- Lieven Clement
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Pascal Poncin
- Behavioural Biology Unit, Ethology and Animal Psychology, University of Liège, Liège, Belgium
- Eric Parmentier
- Laboratory of Functional and Evolutionary Morphology, University of Liège, Liège, Belgium
- Erik Verheyen
- Operational Direction Taxonomy and Phylogeny, Royal Belgian Institute for Natural Sciences, Brussels, Belgium