1
|
Tiberi S, Meili J, Cai P, Soneson C, He D, Sarkar H, Avalos-Pacheco A, Patro R, Robinson MD. DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes. Biostatistics 2024; 25:1079-1093. [PMID: 38887902 PMCID: PMC11639160 DOI: 10.1093/biostatistics/kxae017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 03/21/2024] [Accepted: 05/15/2024] [Indexed: 06/20/2024] Open
Abstract
Although transcriptomics data is typically used to analyze mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor-) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples, and rarely allow comparisons between groups of samples (e.g. healthy vs. diseased). Furthermore, this kind of inference is challenging, because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e. reads compatible with multiple transcripts (or genes), and/or with both their spliced and unspliced versions. Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions with respect to the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin, and to the respective splice version. We designed several benchmarks where our approach shows good performance, in terms of sensitivity and error control, vs. state-of-the-art competitors. Importantly, our tool is flexible, and works with both bulk and single-cell RNA-sequencing data. DifferentialRegulation is distributed as a Bioconductor R package.
Affiliation(s)
- Simone Tiberi
- Department of Statistical Sciences, University of Bologna, Via delle Belle Arti 41, Bologna, 40126, Italy
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
| | - Joël Meili
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
| | - Peiying Cai
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
| | - Charlotte Soneson
- Computational Biology Platform, Friedrich Miescher Institute for Biomedical Research and SIB Swiss Institute of Bioinformatics, Fabrikstrasse 24, Basel, 4056, Switzerland
| | - Dongze He
- Department of Cell Biology and Molecular Genetics, University of Maryland, 4062 Campus Drive, College Park, MD 20742, United States
- Center for Bioinformatics and Computational Biology, University of Maryland, 8125 Paint Branch Dr, College Park, MD 20742, United States
| | - Hirak Sarkar
- Department of Computer Science, Princeton University, 35 Olden St, Princeton, NJ 08540, United States
| | - Alejandra Avalos-Pacheco
- Research Unit of Applied Statistics, TU Wien, Wiedner Hauptstraße 8-10/105, Wien 1040, Austria
- Harvard-MIT Center for Regulatory Science, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, United States
| | - Rob Patro
- Center for Bioinformatics and Computational Biology, University of Maryland, 8125 Paint Branch Dr, College Park, MD 20742, United States
- Department of Computer Science, University of Maryland, 8125 Paint Branch Dr, College Park, MD 20742, United States
| | - Mark D Robinson
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
| |
|
2
|
Young AM, Van Buren S, Rashid NU. Differential transcript usage analysis incorporating quantification uncertainty via compositional measurement error regression modeling. Biostatistics 2024; 25:559-576. [PMID: 37040757 PMCID: PMC11017126 DOI: 10.1093/biostatistics/kxad008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2021] [Revised: 12/22/2022] [Accepted: 02/06/2023] [Indexed: 04/13/2023] Open
Abstract
Differential transcript usage (DTU) occurs when the relative expression of multiple transcripts arising from the same gene changes between different conditions. Existing approaches to detect DTU often rely on computational procedures that can have speed and scalability issues as the number of samples increases. Here we propose a new method, CompDTU, that uses compositional regression to model the relative abundance proportions of each transcript that are of interest in DTU analyses. This procedure leverages fast matrix-based computations that make it ideally suited for DTU analysis with larger sample sizes. This method also allows for the testing of and adjustment for multiple categorical or continuous covariates. Additionally, many existing approaches for DTU ignore quantification uncertainty in the expression estimates for each transcript in RNA-seq data. We extend our CompDTU method to incorporate quantification uncertainty, leveraging common output from RNA-seq expression quantification tools, in a novel method, CompDTUme. Through several power analyses, we show that CompDTU has excellent sensitivity and reduces false positive results relative to existing methods. Additionally, CompDTUme results in further improvements in performance over CompDTU with sufficient sample size for genes with high levels of quantification uncertainty, while also maintaining favorable speed and scalability. We motivate our methods using data from the Cancer Genome Atlas Breast Invasive Carcinoma data set, specifically using RNA-seq data from primary tumors for 740 patients with breast cancer. We show greatly reduced computation time from our new methods as well as the ability to detect several novel genes with significant DTU across different breast cancer subtypes.
Affiliation(s)
- Amber M Young
- Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC, 27599, USA
| | - Scott Van Buren
- Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC, 27599, USA
| | - Naim U Rashid
- Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC, 27599, USA and Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 450 West Drive, Chapel Hill, NC, 27599, USA
| |
|
3
|
Jones EF, Haldar A, Oza VH, Lasseigne BN. Quantifying transcriptome diversity: a review. Brief Funct Genomics 2024; 23:83-94. [PMID: 37225889 PMCID: PMC11484519 DOI: 10.1093/bfgp/elad019] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/18/2023] [Revised: 04/14/2023] [Accepted: 05/05/2023] [Indexed: 05/26/2023] Open
Abstract
Following the central dogma of molecular biology, gene expression heterogeneity can aid in predicting and explaining the wide variety of protein products, functions and, ultimately, heterogeneity in phenotypes. There is currently overlapping terminology used to describe the types of diversity in gene expression profiles, and overlooking these nuances can misrepresent important biological information. Here, we describe transcriptome diversity as a measure of the heterogeneity in (1) the expression of all genes within a sample or a single gene across samples in a population (gene-level diversity) or (2) the isoform-specific expression of a given gene (isoform-level diversity). We first overview modulators and quantification of transcriptome diversity at the gene level. Then, we discuss the role alternative splicing plays in driving transcript isoform-level diversity and how it can be quantified. Additionally, we overview computational resources for calculating gene-level and isoform-level diversity for high-throughput sequencing data. Finally, we discuss future applications of transcriptome diversity. This review provides a comprehensive overview of how gene expression diversity arises, and how measuring it determines a more complete picture of heterogeneity across proteins, cells, tissues, organisms and species.
Affiliation(s)
- Emma F Jones
- The Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
| | - Anisha Haldar
- The Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
| | - Vishal H Oza
- The Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
| | - Brittany N Lasseigne
- The Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
| |
|
4
|
Bravo S, Leiva F, Moya J, Guzman O, Vidal R. Unveiling the Role of Dynamic Alternative Splicing Modulation After Infestation with Sea Lice (Caligus rogercresseyi) in Atlantic Salmon. MARINE BIOTECHNOLOGY (NEW YORK, N.Y.) 2023; 25:223-234. [PMID: 36629943 DOI: 10.1007/s10126-023-10196-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/09/2022] [Accepted: 01/04/2023] [Indexed: 05/06/2023]
Abstract
Sea lice are pathogenic marine ectoparasite copepods that represent a severe risk to the worldwide salmon industry. Several transcriptomic investigations have characterized the regulation of gene expression response of Atlantic salmon to sea lice infestation. These studies have focused on the levels of transcript, overlooking the potentially relevant role of alternative splicing (AS), which corresponds to an essential control mechanism of gene expression through RNA processing. In the present study, we performed a genome-wide bioinformatics characterization of differential AS event dynamics in control and infested C. rogercresseyi Atlantic salmon and in resistant and susceptible phenotypes. We identified a significant rise of alternative splicing events and AS genes after infestation and 176 differential alternative splicing events (DASE) from 133 genes. In addition, a higher number of DASE and AS genes were observed among resistant and susceptible phenotypes. Functional annotation of AS genes shows several terms and pathways associated with behavior, RNA splicing, immune response, and RNA binding. Furthermore, three protein-coding genes were identified undergoing differential transcript usage events, among resistant and susceptible phenotypes. Our findings support AS performing a relevant regulatory role in the response of salmonids to sea lice infestation.
Affiliation(s)
- Scarleth Bravo
- Laboratory of Molecular Ecology, Genomics and Evolutionary Studies, Department of Biology, Universidad de Santiago de Chile, Santiago, Chile
| | - Francisco Leiva
- Laboratory of Molecular Ecology, Genomics and Evolutionary Studies, Department of Biology, Universidad de Santiago de Chile, Santiago, Chile
| | - Javier Moya
- Benchmark Animal Health Chile, Santa Rosa 560 Of.26, Puerto Varas, Chile
| | - Osiel Guzman
- IDEVAC SpA, Francisco Bilbao 1129 Of. 306, Osorno, Chile
| | - Rodrigo Vidal
- Laboratory of Molecular Ecology, Genomics and Evolutionary Studies, Department of Biology, Universidad de Santiago de Chile, Santiago, Chile
| |
|
5
|
Wang D, Quesnel-Vallieres M, Jewell S, Elzubeir M, Lynch K, Thomas-Tikhonenko A, Barash Y. A Bayesian model for unsupervised detection of RNA splicing based subtypes in cancers. Nat Commun 2023; 14:63. [PMID: 36599821 PMCID: PMC9813260 DOI: 10.1038/s41467-022-35369-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Accepted: 11/29/2022] [Indexed: 01/06/2023] Open
Abstract
Identification of cancer sub-types is a pivotal step for developing personalized treatment. Specifically, sub-typing based on changes in RNA splicing has been motivated by several recent studies. We thus develop CHESSBOARD, an unsupervised algorithm tailored for RNA splicing data that captures "tiles" in the data, defined by a subset of unique splicing changes in a subset of patients. CHESSBOARD allows for a flexible number of tiles, accounts for uncertainty of splicing quantification, and is able to model missing values as additional signals. We first apply CHESSBOARD to synthetic data to assess its domain specific modeling advantages, followed by analysis of several leukemia datasets. We show detected tiles are reproducible in independent studies, investigate their possible regulatory drivers and probe their relation to known AML mutations. Finally, we demonstrate the potential clinical utility of CHESSBOARD by supplementing mutation based diagnostic assays with discovered splicing profiles to improve drug response correlation.
Affiliation(s)
- David Wang
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Mathieu Quesnel-Vallieres
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - San Jewell
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Moein Elzubeir
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Kristen Lynch
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Andrei Thomas-Tikhonenko
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Yoseph Barash
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Computer and Information Sciences, School of Engineering, University of Pennsylvania, Philadelphia, PA, USA
| |
|
6
|
Karakulak T, Moch H, von Mering C, Kahraman A. Probing Isoform Switching Events in Various Cancer Types: Lessons From Pan-Cancer Studies. Front Mol Biosci 2021; 8:726902. [PMID: 34888349 PMCID: PMC8650491 DOI: 10.3389/fmolb.2021.726902] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/17/2021] [Accepted: 11/01/2021] [Indexed: 12/03/2022] Open
Abstract
Alternative splicing is an essential regulatory mechanism for gene expression in mammalian cells contributing to protein, cellular, and species diversity. In cancer, alternative splicing is frequently disturbed, leading to changes in the expression of alternatively spliced protein isoforms. Advances in sequencing technologies and analysis methods led to new insights into the extent and functional impact of disturbed alternative splicing events. In this review, we give a brief overview of the molecular mechanisms driving alternative splicing, highlight the function of alternative splicing in healthy tissues and describe how alternative splicing is disrupted in cancer. We summarize current available computational tools for analyzing differential transcript usage, isoform switching events, and the pathogenic impact of cancer-specific splicing events. Finally, the strategies of three recent pan-cancer studies on isoform switching events are compared. Their methodological similarities and discrepancies are highlighted and lessons learned from the comparison are listed. We hope that our assessment will lead to new and more robust methods for cancer-specific transcript detection and help to produce more accurate functional impact predictions of isoform switching events.
Affiliation(s)
- Tülay Karakulak
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Swiss Informatics Institute, Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Holger Moch
- Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Faculty of Medicine, University of Zurich, Zurich, Switzerland
| | - Christian von Mering
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Informatics Institute, Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Abdullah Kahraman
- Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Swiss Informatics Institute, Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
7
|
Tekath T, Dugas M. Differential transcript usage analysis of bulk and single-cell RNA-seq data with DTUrtle. Bioinformatics 2021; 37:3781-3787. [PMID: 34469510 PMCID: PMC8570804 DOI: 10.1093/bioinformatics/btab629] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/25/2021] [Revised: 08/17/2021] [Accepted: 08/30/2021] [Indexed: 11/22/2022] Open
Abstract
Motivation: Each year, the number of published bulk and single-cell RNA-seq datasets is growing exponentially. Studies analyzing such data are commonly looking at gene-level differences, while the collected RNA-seq data inherently represents reads of transcript isoform sequences. Utilizing transcriptomic quantifiers, RNA-seq reads can be attributed to specific isoforms, allowing for analysis of transcript-level differences. A differential transcript usage (DTU) analysis is testing for proportional differences in a gene’s transcript composition, and has been of rising interest for many research questions, such as analysis of differential splicing or cell-type identification. Results: We present the R package DTUrtle, the first DTU analysis workflow for both bulk and single-cell RNA-seq datasets, and the first package to conduct a ‘classical’ DTU analysis in a single-cell context. DTUrtle extends established statistical frameworks, offers various result aggregation and visualization options and a novel detection probability score for tagged-end data. It has been successfully applied to bulk and single-cell RNA-seq data of human and mouse, confirming and extending key results. In addition, we present novel potential DTU applications like the identification of cell-type specific transcript isoforms as biomarkers. Availability and implementation: The R package DTUrtle is available at https://github.com/TobiTekath/DTUrtle with extensive vignettes and documentation at https://tobitekath.github.io/DTUrtle/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tobias Tekath
- Institute of Medical Informatics, University Hospital of Münster, Münster, 48149, Germany
| | - Martin Dugas
- Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, 69120, Germany
| |
|
8
|
MOCCASIN: a method for correcting for known and unknown confounders in RNA splicing analysis. Nat Commun 2021; 12:3353. [PMID: 34099673 PMCID: PMC8184769 DOI: 10.1038/s41467-021-23608-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/23/2020] [Accepted: 05/07/2021] [Indexed: 11/09/2022] Open
Abstract
The effects of confounding factors on gene expression analysis have been extensively studied following the introduction of high-throughput microarrays and subsequently RNA sequencing. In contrast, there is a lack of equivalent analysis and tools for RNA splicing. Here we first assess the effect of confounders on both expression and splicing quantifications in two large public RNA-Seq datasets (TARGET, ENCODE). We show quantification of splicing variations are affected at least as much as those of gene expression, revealing unwanted sources of variations in both datasets. Next, we develop MOCCASIN, a method to correct the effect of both known and unknown confounders on RNA splicing quantification and demonstrate MOCCASIN's effectiveness on both synthetic and real data. Code, synthetic and corrected datasets are all made available as resources.
|
9
|
Jones DC, Ruzzo WL. Polee: RNA-Seq analysis using approximate likelihood. NAR Genom Bioinform 2021; 3:lqab046. [PMID: 34056596 PMCID: PMC8152449 DOI: 10.1093/nargab/lqab046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2021] [Revised: 04/11/2021] [Accepted: 05/11/2021] [Indexed: 12/20/2022] Open
Abstract
The analysis of mRNA transcript abundance with RNA-Seq is a central tool in molecular biology research, but often analyses fail to account for the uncertainty in these estimates, which can be significant, especially when trying to disentangle isoforms or duplicated genes. Preserving uncertainty necessitates a full probabilistic model of all the sequencing reads, which quickly becomes intractable, as experiments can consist of billions of reads. To overcome these limitations, we propose a new method of approximating the likelihood function of a sparse mixture model, using a technique we call the Pólya tree transformation. We demonstrate that substituting this approximation for the real thing achieves most of the benefits with a fraction of the computational costs, leading to more accurate detection of differential transcript expression and transcript coexpression.
Affiliation(s)
- Daniel C Jones
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Walter L Ruzzo
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
- Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA 98195-5065, USA
- Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., P.O. Box 19024, Seattle, WA 98109, USA
| |
|
10
|
Gilis J, Vitting-Seerup K, Van den Berge K, Clement L. satuRn: Scalable analysis of differential transcript usage for bulk and single-cell RNA-sequencing applications. F1000Res 2021; 10:374. [PMID: 36762203 PMCID: PMC9892655 DOI: 10.12688/f1000research.51749.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Accepted: 07/26/2022] [Indexed: 11/20/2022] Open
Abstract
Alternative splicing produces multiple functional transcripts from a single gene. Dysregulation of splicing is known to be associated with disease and as a hallmark of cancer. Existing tools for differential transcript usage (DTU) analysis either lack in performance, cannot account for complex experimental designs or do not scale to massive single-cell transcriptome sequencing (scRNA-seq) datasets. We introduce satuRn, a fast and flexible quasi-binomial generalized linear modelling framework that is on par with the best performing DTU methods from the bulk RNA-seq realm, while providing good false discovery rate control, addressing complex experimental designs, and scaling to scRNA-seq applications.
Affiliation(s)
- Jeroen Gilis
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Data Mining and Modeling for Biomedicine, VIB Flemish Institute for Biotechnology, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
| | - Kristoffer Vitting-Seerup
- Department of Biology, Københavns Universitet, Copenhagen, 2200, Denmark
- Biotech Research and Innovation Centre (BRIC), Københavns Universitet, Copenhagen, 2200, Denmark
- Danish Cancer Society Research Center, Copenhagen, 2100, Denmark
- Department of Health Technology, Danish Technical University, Kongens Lyngby, 2800, Denmark
| | - Koen Van den Berge
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
- Department of Statistics, University of California, Berkeley, Berkeley, California, USA
| | - Lieven Clement
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
| |
|
11
|
Gilis J, Vitting-Seerup K, Van den Berge K, Clement L. satuRn: Scalable analysis of differential transcript usage for bulk and single-cell RNA-sequencing applications. F1000Res 2021; 10:374. [PMID: 36762203 PMCID: PMC9892655 DOI: 10.12688/f1000research.51749.1] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Accepted: 04/23/2021] [Indexed: 10/04/2023] Open
Abstract
Alternative splicing produces multiple functional transcripts from a single gene. Dysregulation of splicing is known to be associated with disease and as a hallmark of cancer. Existing tools for differential transcript usage (DTU) analysis either lack in performance, cannot account for complex experimental designs or do not scale to massive scRNA-seq data. We introduce satuRn, a fast and flexible quasi-binomial generalized linear modelling framework that is on par with the best performing DTU methods from the bulk RNA-seq realm, while providing good false discovery rate control, addressing complex experimental designs and scaling to scRNA-seq applications.
Affiliation(s)
- Jeroen Gilis
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Data Mining and Modeling for Biomedicine, VIB Flemish Institute for Biotechnology, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
| | - Kristoffer Vitting-Seerup
- Department of Biology, Københavns Universitet, Copenhagen, 2200, Denmark
- Biotech Research and Innovation Centre (BRIC), Københavns Universitet, Copenhagen, 2200, Denmark
- Danish Cancer Society Research Center, Copenhagen, 2100, Denmark
- Department of Health Technology, Danish Technical University, Kongens Lyngby, 2800, Denmark
| | - Koen Van den Berge
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
- Department of Statistics, University of California, Berkeley, Berkeley, California, USA
| | - Lieven Clement
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
| |
|
12
|
De Marchi T, Pyl PT, Sjöström M, Klasson S, Sartor H, Tran L, Pekar G, Malmström J, Malmström L, Niméus E. Proteogenomic Workflow Reveals Molecular Phenotypes Related to Breast Cancer Mammographic Appearance. J Proteome Res 2021; 20:2983-3001. [PMID: 33855848 PMCID: PMC8155562 DOI: 10.1021/acs.jproteome.1c00243] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/26/2021] [Indexed: 12/21/2022]
Abstract
Proteogenomic approaches have enabled the generation of novel information levels when compared to single omics studies although burdened by extensive experimental efforts. Here, we improved a data-independent acquisition mass spectrometry proteogenomic workflow to reveal distinct molecular features related to mammographic appearances in breast cancer. Our results reveal splicing processes detectable at the protein level and highlight quantitation and pathway complementarity between RNA and protein data. Furthermore, we confirm previously detected enrichments of molecular pathways associated with estrogen receptor-dependent activity and provide novel evidence of epithelial-to-mesenchymal activity in mammography-detected spiculated tumors. Several transcript-protein pairs displayed radically different abundances depending on the overall clinical properties of the tumor. These results demonstrate that there are differentially regulated protein networks in clinically relevant tumor subgroups, which in turn alter both cancer biology and the abundance of biomarker candidates and drug targets.
Affiliation(s)
- Tommaso De Marchi
- Division of Surgery, Oncology, and Pathology, Department of Clinical Sciences, Lund University, Solvegatan 19, Lund SE-223 62, Sweden
| | - Paul Theodor Pyl
- Division of Surgery, Oncology, and Pathology, Department of Clinical Sciences, Lund University, Solvegatan 19, Lund SE-223 62, Sweden
| | - Martin Sjöström
- Division of Surgery, Oncology, and Pathology, Department of Clinical Sciences, Lund University, Solvegatan 19, Lund SE-223 62, Sweden
| | - Stina Klasson
- Department of Plastic and Reconstructive Surgery, Skåne University Hospital, Inga Marie Nilssons gata 47, Malmö SE-20502, Sweden
| | - Hanna Sartor
- Division of Diagnostic Radiology, Department of Translational Medicine, Skåne University Hospital, Entrégatan 7, Lund SE-22185, Sweden
| | - Lena Tran
- Division of Surgery, Oncology, and Pathology, Department of Clinical Sciences, Lund University, Solvegatan 19, Lund SE-223 62, Sweden
| | - Gyula Pekar
- Division of Oncology and Pathology, Department of Clinical Sciences, Lund University, Skåne University Hospital, Lund SE-22185, Sweden
| | - Johan Malmström
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Klinikgatan 32, Lund SE-22184, Sweden
| | - Lars Malmström
- S3IT, University of Zurich, Winterthurerstrasse 190, Zurich CH-8057, Switzerland
- Institute for Computational Science, University of Zurich, Winterthurerstrasse 190, Zurich CH-8057, Switzerland
| | - Emma Niméus
- Division of Surgery, Oncology, and Pathology, Department of Clinical Sciences, Lund University, Solvegatan 19, Lund SE-223 62, Sweden
- Department of Surgery, Skåne University Hospital, Lund 222 42, Sweden
| |
|
13
|
Gerber S, Schratt G, Germain PL. Streamlining differential exon and 3' UTR usage with diffUTR. BMC Bioinformatics 2021; 22:189. [PMID: 33849458 PMCID: PMC8045333 DOI: 10.1186/s12859-021-04114-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/12/2021] [Accepted: 03/30/2021] [Indexed: 12/13/2022] Open
Abstract
Background: Despite the importance of alternative poly-adenylation and 3′ UTR length for a variety of biological phenomena, there are limited means of detecting UTR changes from standard transcriptomic data. Results: We present the diffUTR Bioconductor package which streamlines and improves upon differential exon usage (DEU) analyses, and leverages existing DEU tools and alternative poly-adenylation site databases to enable differential 3′ UTR usage analysis. We demonstrate the diffUTR features and show that it is more flexible and more accurate than state-of-the-art alternatives, both in simulations and in real data. Conclusions: diffUTR enables differential 3′ UTR analysis and more generally facilitates DEU and the exploration of their results. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04114-7.
Affiliation(s)
- Stefan Gerber
- Group of Computational Neurogenomics, D-HEST Institute for Neurosciences, ETH Zürich, Winterthurerstrasse 190, 8057, Zurich, Switzerland; Lab of Systems Neuroscience, D-HEST Institute for Neurosciences, ETH Zürich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
| | - Gerhard Schratt
- Lab of Systems Neuroscience, D-HEST Institute for Neurosciences, ETH Zürich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
| | - Pierre-Luc Germain
- Group of Computational Neurogenomics, D-HEST Institute for Neurosciences, ETH Zürich, Winterthurerstrasse 190, 8057, Zurich, Switzerland; Lab of Statistical Bioinformatics, DMLS, University of Zürich, Winterthurerstrasse 190, 8057, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
| |
|
14
|
Van Buren S, Sarkar H, Srivastava A, Rashid NU, Patro R, Love MI. Compression of quantification uncertainty for scRNA-seq counts. Bioinformatics 2021; 37:1699-1707. [PMID: 33471073 PMCID: PMC8289386 DOI: 10.1093/bioinformatics/btab001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/08/2020] [Revised: 11/16/2020] [Accepted: 01/04/2021] [Indexed: 11/13/2022] Open
Abstract
Motivation: Quantification estimates of gene expression from single-cell RNA-seq (scRNA-seq) data have inherent uncertainty due to reads that map to multiple genes. Many existing scRNA-seq quantification pipelines ignore multi-mapping reads and therefore underestimate expected read counts for many genes. alevin accounts for multi-mapping reads and allows for the generation of ‘inferential replicates’, which reflect quantification uncertainty. Previous methods have shown improved performance when incorporating these replicates into statistical analyses, but storage and use of these replicates increases computation time and memory requirements. Results: We demonstrate that storing only the mean and variance from a set of inferential replicates (‘compression’) is sufficient to capture gene-level quantification uncertainty, while reducing disk storage to as low as 9% of original storage, and memory usage when loading data to as low as 6%. Using these values, we generate ‘pseudo-inferential’ replicates from a negative binomial distribution and propose a general procedure for incorporating these replicates into a proposed statistical testing framework. When applying this procedure to trajectory-based differential expression analyses, we show false positives are reduced by more than a third for genes with high levels of quantification uncertainty. We additionally extend the Swish method to incorporate pseudo-inferential replicates and demonstrate improvements in computation time and memory usage without any loss in performance. Lastly, we show that discarding multi-mapping reads can result in significant underestimation of counts for functionally important genes in a real dataset. Availability and implementation: makeInfReps and splitSwish are implemented in the R/Bioconductor fishpond package available at https://bioconductor.org/packages/fishpond. Analyses and simulated datasets can be found in the paper’s GitHub repo at https://github.com/skvanburen/scUncertaintyPaperCode. Supplementary information: Supplementary data are available at Bioinformatics online.
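As a rough illustration of the compression idea described in this abstract (keep only a mean and a variance, then draw pseudo-inferential replicates from a negative binomial), the following is a minimal Python sketch. It is not the fishpond `makeInfReps` implementation: the function names, the method-of-moments parameterization, and the Poisson fallback used when the variance does not exceed the mean are all assumptions made for illustration.

```python
import math
import random


def nb_params(mean, var):
    """Method-of-moments negative binomial parameters (size, prob).

    Uses mean = size * (1 - prob) / prob and var = mean / prob.
    Returns None when var <= mean, i.e. there is no overdispersion to model.
    """
    if mean <= 0 or var <= mean:
        return None
    return mean * mean / (var - mean), mean / var


def sample_poisson(lam, rng, chunk=30.0):
    """Knuth's multiplication method, applied in chunks of at most `chunk`
    so that exp(-lam) never underflows (a sum of Poissons is Poisson)."""
    total = 0
    while lam > 0:
        step = min(lam, chunk)
        limit = math.exp(-step)
        k, prod = 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        total += k
        lam -= step
    return total


def pseudo_inferential_replicates(mean, var, n_reps, seed=0):
    """Draw n_reps counts for one gene in one cell from the stored mean/variance.

    The negative binomial is sampled as a gamma-Poisson mixture.
    Assumption: fall back to a plain Poisson when var <= mean.
    """
    rng = random.Random(seed)
    params = nb_params(mean, var)
    reps = []
    for _ in range(n_reps):
        if params is None:
            lam = mean
        else:
            size, prob = params
            lam = rng.gammavariate(size, (1.0 - prob) / prob)
        reps.append(sample_poisson(lam, rng))
    return reps


if __name__ == "__main__":
    # Hypothetical stored summary for one gene in one cell.
    print(pseudo_inferential_replicates(mean=10.0, var=20.0, n_reps=5))
```

Only two numbers per gene and cell need to be stored in this scheme, which is where the storage reduction reported in the abstract comes from; the replicates are regenerated on demand.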
Affiliation(s)
- Scott Van Buren
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
| | - Hirak Sarkar
- Department of Computer Science, University of Maryland College Park, MD 20742, USA; Center for Bioinformatics and Computational Biology, University of Maryland College Park, MD 20742, USA
| | - Avi Srivastava
- New York Genome Center, New York, NY 10013, USA; Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
| | - Naim U Rashid
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA; Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Rob Patro
- Department of Computer Science, University of Maryland College Park, MD 20742, USA; Center for Bioinformatics and Computational Biology, University of Maryland College Park, MD 20742, USA
| | - Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
| |
|