1
Aquino J, Witoslawski D, Park S, Holder J, Amei A, Han MV. A novel splicing graph allows a direct comparison between exon-based and splice junction-based approaches to alternative splicing detection. Brief Bioinform 2025; 26:bbaf204. [PMID: 40341920 PMCID: PMC12062524 DOI: 10.1093/bib/bbaf204] [Received: 05/30/2024] [Revised: 11/24/2024] [Accepted: 04/07/2025] [Indexed: 05/11/2025] Open
Abstract
There are primarily two computational approaches to alternative splicing (AS) detection using short reads: splice junction-based and exon-based approaches. Despite their shared goal of addressing the same biological problem, these approaches have not been reconciled before. We devised a novel graph structure and algorithm aimed at mapping between the exonic parts and splicing events detected by the two different methods. Through simulations, we demonstrated disparities in sensitivity and specificity between splice junction-based and exon-based methods. When applied to empirical data, there were large discrepancies in the results, suggesting that the methods are complementary. With the discrepancies localized to individual events and exonic parts, we were able to gain insights into the strengths and weaknesses inherent in each approach. Finally, we integrated the results to generate a comprehensive list of both common and unique AS events detected by both methodologies.
Affiliation(s)
- Jelard Aquino
- School of Life Sciences, University of Nevada, 4505 S Maryland Pkwy, Las Vegas, NV 89154, USA
- Daniel Witoslawski
- School of Life Sciences, University of Nevada, 4505 S Maryland Pkwy, Las Vegas, NV 89154, USA
- Steve Park
- New York Medical College, 40 Sunshine Cottage Road, Valhalla, NY 10595, USA
- Jessica Holder
- School of Life Sciences, University of Nevada, 4505 S Maryland Pkwy, Las Vegas, NV 89154, USA
- Amei Amei
- Department of Mathematical Sciences, University of Nevada, 4505 S Maryland Pkwy, Las Vegas, NV 89154, USA
- Mira V Han
- School of Life Sciences, University of Nevada, 4505 S Maryland Pkwy, Las Vegas, NV 89154, USA

2
Singlan N, Abou Choucha F, Pasquier C. A new Similarity Based Adapted Louvain Algorithm (SIMBA) for active module identification in p-value attributed biological networks. Sci Rep 2025; 15:11360. [PMID: 40175439 PMCID: PMC11965526 DOI: 10.1038/s41598-025-95749-6] [Received: 02/01/2025] [Accepted: 03/24/2025] [Indexed: 04/04/2025] Open
Abstract
Real-world networks, such as biological networks, often exhibit complex structures and have attributes associated with nodes, which leads to significant challenges for analysis and modeling. Community detection algorithms can help identify groups of nodes of particular importance. However, traditional methods focus primarily on topological information, overlooking the importance of attribute-based similarities. This limitation hinders their ability to identify functionally coherent subnetworks. To address this, we propose a new scoring method for graph partitioning on the basis of a novel similarity function between node attributes. We then adapt the Louvain algorithm to optimize this scoring function, enabling the identification of communities that are both densely connected and functionally coherent. Extensive experiments on diverse biological networks, including artificial and real-world datasets, demonstrate the superiority of our approach over state-of-the-art methods. By leveraging both topological and attribute-based information, our approach provides a powerful tool for uncovering biologically meaningful modules and gaining deeper insights into complex biological processes.
Affiliation(s)
- Nina Singlan
- Université Côte d'Azur, CNRS, i3S, 06560, Valbonne, France.

3
Mabin JW, Vock IW, Machyna M, Haque N, Thakran P, Zhang A, Rai G, Leibler INM, Inglese J, Simon MD, Hogg JR. Uncovering the isoform-resolution kinetic landscape of nonsense-mediated mRNA decay with EZbakR. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.03.12.642874. [PMID: 40161772 PMCID: PMC11952489 DOI: 10.1101/2025.03.12.642874] [Indexed: 04/02/2025]
Abstract
Cellular RNA levels are a product of synthesis and degradation kinetics, which can differ among transcripts of the same gene. An important cause of isoform-specific decay is the nonsense-mediated mRNA decay (NMD) pathway, which degrades transcripts with premature termination codons (PTCs) and other features. Understanding NMD functions requires strategies to quantify isoform kinetics; however, current approaches remain limited. Methods like nucleotide-recoding RNA-seq (NR-seq) enable insights into RNA kinetics, but existing bioinformatic tools do not provide robust, isoform-specific degradation rate constant estimates. We extend the EZbakR-suite by implementing a strategy to infer isoform-level kinetics from short-read NR-seq data. This approach uncovers unexpected variability in NMD efficiency among transcripts with conserved PTC-containing exons and rapid decay of a subset of mRNAs lacking PTCs. Our findings highlight the effects of competition between NMD and other decay pathways, provide mechanistic insights into established NMD efficiency correlates, and identify transcript features promoting efficient decay.
Affiliation(s)
- Justin W. Mabin
- Biochemistry and Biophysics Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Isaac W. Vock
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06516, USA
- Martin Machyna
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06516, USA
- Present address: Paul-Ehrlich-Institut, Host-Pathogen-Interactions, 63225 Langen, Germany
- Nazmul Haque
- Biochemistry and Biophysics Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Present address: Ultragenyx, 7000 Shoreline Ct, South San Francisco, CA 94080
- Poonam Thakran
- Biochemistry and Biophysics Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Alexandra Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06516, USA
- Ganesha Rai
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, USA
- James Inglese
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, USA
- Metabolic Medicine Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
- Matthew D. Simon
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06516, USA
- J. Robert Hogg
- Biochemistry and Biophysics Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA

4
Prigozhin DM, Sutherland CA, Rangavajjhala S, Krasileva KV. Majority of the Highly Variable NLRs in Maize Share Genomic Location and Contain Additional Target-Binding Domains. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2025; 38:275-284. [PMID: 39013614 DOI: 10.1094/mpmi-05-24-0047-fi] [Indexed: 07/18/2024]
Abstract
Nucleotide-binding, leucine-rich repeat (LRR) proteins (NLRs) are a major class of immune receptors in plants. NLRs include both conserved and rapidly evolving members; however, their evolutionary trajectory in crops remains understudied. Availability of crop pan-genomes enables analysis of the recent events in the evolution of this highly complex gene family within domesticated species. Here, we investigated the NLR complement of 26 nested association mapping (NAM) founder lines of maize. We found that maize has just four main subfamilies containing rapidly evolving highly variable NLR (hvNLR) receptors. Curiously, three of these phylogenetically distinct hvNLR lineages are located in adjacent clusters on chromosome 10. Members of the same hvNLR clade show variable expression and methylation across lines and tissues, which is consistent with their rapid evolution. By combining sequence diversity analysis and AlphaFold2 computational structure prediction, we predicted ligand-binding sites in the hvNLRs. We also observed novel insertion domains in the LRR regions of two hvNLR subfamilies that likely contribute to target recognition. To make this analysis accessible, we created NLRCladeFinder, a Google Colaboratory notebook that accepts any newly identified NLR sequence, places it in the evolutionary context of the maize pan-NLRome, and provides an updated clade alignment, phylogenetic tree, and sequence diversity information for the gene of interest. Copyright © 2024 The Author(s). This is an open access article distributed under the CC BY 4.0 International license.
Affiliation(s)
- Daniil M Prigozhin
- Molecular Biophysics and Integrated Bioimaging Division, Berkeley Center for Structural Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, U.S.A
- Chandler A Sutherland
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, U.S.A
- Sanjay Rangavajjhala
- Molecular Biophysics and Integrated Bioimaging Division, Berkeley Center for Structural Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, U.S.A
- Ksenia V Krasileva
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, U.S.A

5
Singh NP, Wu EY, Fan J, Love MI, Patro R. Tree-based differential testing using inferential uncertainty for RNA-Seq. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2023.12.25.573288. [PMID: 38234739 PMCID: PMC10793400 DOI: 10.1101/2023.12.25.573288] [Indexed: 01/19/2024]
Abstract
Identifying differentially expressed transcripts poses a crucial yet challenging problem in transcriptomics. Substantial uncertainty is associated with the abundance estimates of certain transcripts which, if ignored, can inflate the number of false positives and, if included, may reduce power. Given a set of RNA-Seq samples, TreeTerminus arranges transcripts in a hierarchical tree structure that encodes different layers of resolution for interpretation of the abundance of transcriptional groups, with uncertainty generally decreasing as one ascends the tree from the leaves. We introduce mehenDi, which utilizes the tree structure from TreeTerminus for differential testing. The nodes output by mehenDi, called the selected nodes, are determined in a data-driven manner to maximize the signal that can be extracted from the data while controlling for the uncertainty associated with estimating the transcript abundances. The selected nodes can include transcripts and inner nodes, with no two nodes having an ancestor/descendant relationship. We evaluated our method on both simulated and experimental datasets, comparing its performance with other tree-based differential methods as well as with uncertainty-aware differential transcript/gene expression methods. Our method detects inner nodes that show a strong signal for differential expression, which would have been overlooked when analyzing the transcripts alone.
Affiliation(s)
- Noor Pratap Singh
- Department of Computer Science, University of Maryland, College Park
- Euphy Y Wu
- Department of Biostatistics, University of North Carolina-Chapel Hill
- Jason Fan
- Department of Computer Science, University of Maryland, College Park
- Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill
- Department of Genetics, University of North Carolina-Chapel Hill
- Rob Patro
- Department of Computer Science, University of Maryland, College Park

6
Kubota N, Chen L, Zheng S. Shiba: a versatile computational method for systematic identification of differential RNA splicing across platforms. Nucleic Acids Res 2025; 53:gkaf098. [PMID: 39997221 PMCID: PMC11851117 DOI: 10.1093/nar/gkaf098] [Received: 05/22/2024] [Accepted: 02/04/2025] [Indexed: 02/26/2025] Open
Abstract
Alternative pre-mRNA splicing (AS) is a fundamental regulatory process that generates transcript diversity and cell type variation. We developed Shiba, a comprehensive method that integrates transcript assembly, splicing event identification, read counting, and differential splicing analysis across RNA-seq platforms. Shiba excels in capturing annotated and unannotated AS events with superior accuracy, sensitivity, and reproducibility. It addresses the often-overlooked issue of junction read imbalance, significantly reducing false positives to aid target prioritization and downstream analyses. Unlike other tools, which require large numbers of biological replicates or suffer from low sensitivity and high false-positive rates, Shiba's statistical framework is agnostic to sample size, as demonstrated by simulated data and its effective application to real n = 1 RNA-seq datasets. To extend its utility to single-cell RNA-seq, we developed scShiba, which applies Shiba's pseudobulk approach to analyze splicing at the cluster level. scShiba successfully revealed AS regulation in developmental dopaminergic neurons and differences between excitatory and inhibitory neurons. Both Shiba and scShiba are available in Docker/Singularity containers and Snakemake pipelines, ensuring reproducibility. With their comprehensive capabilities, Shiba and scShiba enable systematic quantification of alternative splicing events across various platforms, laying a solid foundation for mechanistic exploration of the functional complexity in RNA splicing.
Affiliation(s)
- Naoto Kubota
- Division of Biomedical Sciences, School of Medicine, University of California, Riverside, CA 92521, United States
- Center for RNA Biology and Medicine, University of California, Riverside, CA 92521, United States
- Liang Chen
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, United States
- Sika Zheng
- Division of Biomedical Sciences, School of Medicine, University of California, Riverside, CA 92521, United States
- Center for RNA Biology and Medicine, University of California, Riverside, CA 92521, United States

7
Van Hecke M, Beerenwinkel N, Lootens T, Fostier J, Raedt R, Marchal K. ELLIPSIS: robust quantification of splicing in scRNA-seq. Bioinformatics 2025; 41:btaf028. [PMID: 39936571 PMCID: PMC11878791 DOI: 10.1093/bioinformatics/btaf028] [Received: 08/19/2024] [Revised: 12/09/2024] [Accepted: 02/10/2025] [Indexed: 02/13/2025] Open
Abstract
MOTIVATION Alternative splicing is a tightly regulated biological process that, because of its cell type-specific behavior, calls for analysis at the single-cell level. However, quantifying differential splicing in scRNA-seq is challenging due to low and uneven coverage. To this end, we developed ELLIPSIS, a tool for robust quantification of splicing in scRNA-seq that leverages locally observed read coverage with conservation of flow and intra-cell type similarity properties. It can also quantify novel splicing events, which is particularly important in cancer cells, where many novel splicing events occur. RESULTS Application of ELLIPSIS to simulated data shows that our method robustly estimates Percent Spliced In values and allows reliable detection of differential splicing between cell types. Using ELLIPSIS on glioblastoma scRNA-seq data, we identified genes that are differentially spliced between cancer cells in the tumor core and infiltrating cancer cells found in peripheral tissue. These genes were shown to play a role in, among others, cell migration and motility, cell projection organization, and neuron projection guidance. AVAILABILITY AND IMPLEMENTATION ELLIPSIS quantification tool: https://github.com/MarchalLab/ELLIPSIS.git.
Affiliation(s)
- Marie Van Hecke
- IDLab, Department of Information Technology, Ghent University-IMEC, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent University, 9000 Ghent, Belgium
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zürich, 4056 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, 4051 Basel, Switzerland
- Thibault Lootens
- Cancer Research Institute Ghent (CRIG), Ghent University, 9000 Ghent, Belgium
- 4Brain, Department of Head and Skin, Ghent University, 9000 Ghent, Belgium
- Laboratory of Experimental Cancer Research, Department of Human Structure and Repair, Ghent University, 9000 Ghent, Belgium
- Jan Fostier
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Robrecht Raedt
- Cancer Research Institute Ghent (CRIG), Ghent University, 9000 Ghent, Belgium
- 4Brain, Department of Head and Skin, Ghent University, 9000 Ghent, Belgium
- Kathleen Marchal
- IDLab, Department of Information Technology, Ghent University-IMEC, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent University, 9000 Ghent, Belgium

8
Guo W, Schreiber M, Marosi VB, Bagnaresi P, Jørgensen ME, Braune KB, Chalmers K, Chapman B, Dang V, Dockter C, Fiebig A, Fincher GB, Fricano A, Fuller J, Haaning A, Haberer G, Himmelbach A, Jayakodi M, Jia Y, Kamal N, Langridge P, Li C, Lu Q, Lux T, Mascher M, Mayer KFX, McCallum N, Milne L, Muehlbauer GJ, Nielsen MTS, Padmarasu S, Pedas PR, Pillen K, Pozniak C, Rasmussen MW, Sato K, Schmutzer T, Scholz U, Schüler D, Šimková H, Skadhauge B, Stein N, Thomsen NW, Voss C, Wang P, Wonneberger R, Zhang XQ, Zhang G, Cattivelli L, Spannagl M, Bayer M, Simpson C, Zhang R, Waugh R. A barley pan-transcriptome reveals layers of genotype-dependent transcriptional complexity. Nat Genet 2025; 57:441-450. [PMID: 39901014 PMCID: PMC11821519 DOI: 10.1038/s41588-024-02069-y] [Received: 06/14/2024] [Accepted: 12/20/2024] [Indexed: 02/05/2025]
Abstract
A pan-transcriptome describes the transcriptional and post-transcriptional consequences of genome diversity from multiple individuals within a species. We developed a barley pan-transcriptome using 20 inbred genotypes representing domesticated barley diversity by generating and analyzing short- and long-read RNA-sequencing datasets from multiple tissues. To overcome single-reference bias in transcript quantification, we constructed genotype-specific reference transcript datasets (RTDs) and integrated these into a linear pan-genome framework to create a pan-RTD, allowing transcript categorization as core, shell or cloud. Focusing on the core (expressed in all genotypes), we observed significant transcript abundance variation among tissues and between genotypes driven partly by RNA processing, gene copy number, structural rearrangements and conservation of promoter motifs. Network analyses revealed conserved co-expression module::tissue correlations and frequent functional diversification. To complement the pan-transcriptome, we constructed a comprehensive cultivar (cv.) Morex gene-expression atlas and illustrate how these combined datasets can be used to guide biological inquiry.
Affiliation(s)
- Wenbin Guo
- International Barley Hub (IBH)/James Hutton Institute (JHI), Dundee, Scotland
- Higentec Breeding Innovation (ZheJiang) Co., Ltd., Lishui, China
- Miriam Schreiber
- International Barley Hub (IBH)/James Hutton Institute (JHI), Dundee, Scotland
- Vanda B Marosi
- Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health (PGSB), Neuherberg, Germany
- School of Life Sciences, Technical University of Munich, Freising, Germany
- Paolo Bagnaresi
- Council for Agriculture Research and Economics (CREA) Research Centre for Genomics and Bioinformatics, Fiorenzuola d'Arda, Italy
- CREA Research Centre for Olive, Fruit and Citrus Crops, Forlì, Italy
- Ken Chalmers
- School of Agriculture, Food and Wine, University of Adelaide, Waite Campus, Urrbrae, South Australia, Australia
- Brett Chapman
- Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
- Viet Dang
- Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
- Anne Fiebig
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Geoffrey B Fincher
- School of Agriculture, Food and Wine, University of Adelaide, Waite Campus, Urrbrae, South Australia, Australia
- Agostino Fricano
- Council for Agriculture Research and Economics (CREA) Research Centre for Genomics and Bioinformatics, Fiorenzuola d'Arda, Italy
- John Fuller
- International Barley Hub (IBH)/James Hutton Institute (JHI), Dundee, Scotland
- Allison Haaning
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, USA
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
- Georg Haberer
- Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health (PGSB), Neuherberg, Germany
- Axel Himmelbach
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Murukarthick Jayakodi
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Texas A&M AgriLife Research Center at Dallas, Texas A&M University System, Dallas, TX, USA
- Department of Soil & Crop Sciences, Texas A&M University, College Station, TX, USA
- Yong Jia
- Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
- Nadia Kamal
- Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health (PGSB), Neuherberg, Germany
- Department of Molecular Life Sciences, Computational Plant Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Peter Langridge
- School of Agriculture, Food and Wine, University of Adelaide, Waite Campus, Urrbrae, South Australia, Australia
- Chengdao Li
- Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
- College of Agriculture, Yangtze University, Jinzhou, China
- Department of Primary Industry and Regional Development Western Australia, South Perth, Western Australia, Australia
- Qiongxian Lu
- Carlsberg Research Laboratory (CRL), Copenhagen, Denmark
- Thomas Lux
- Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health (PGSB), Neuherberg, Germany
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Klaus F X Mayer
- Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health (PGSB), Neuherberg, Germany
- Nicola McCallum
- International Barley Hub (IBH)/James Hutton Institute (JHI), Dundee, Scotland
- Linda Milne
- International Barley Hub (IBH)/James Hutton Institute (JHI), Dundee, Scotland
- Gary J Muehlbauer
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, USA
- Sudharsan Padmarasu
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Pai Rosager Pedas
- Carlsberg Research Laboratory (CRL), Copenhagen, Denmark
- DLF, Roskilde, Denmark
- Klaus Pillen
- Chair of Plant Breeding, Institute of Agricultural and Nutritional Sciences, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
- Curtis Pozniak
- Department of Plant Sciences and Crop Development Centre, University of Saskatchewan (USASK), Saskatoon, Saskatchewan, Canada
- Kazuhiro Sato
- Institute of Plant Science and Resources, Okayama University, Kurashiki, Japan
- Kazusa DNA Research Institute, Kisarazu, Japan
- Thomas Schmutzer
- Chair of Plant Breeding, Institute of Agricultural and Nutritional Sciences, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
- Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Danuta Schüler
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Hana Šimková
- Institute of Experimental Botany of the Czech Academy of Sciences, Olomouc, Czech Republic
- Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Chair of Crop Plant Genetics, Institute of Agricultural and Nutritional Sciences, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
- Nina W Thomsen
- Carlsberg Research Laboratory (CRL), Copenhagen, Denmark
- Cynthia Voss
- Carlsberg Research Laboratory (CRL), Copenhagen, Denmark
- Penghao Wang
- Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
- Ronja Wonneberger
- International Barley Hub (IBH)/James Hutton Institute (JHI), Dundee, Scotland
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Xiao-Qi Zhang
- Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
- Guoping Zhang
- College of Agriculture & Biotechnology, Zhejiang University, Hangzhou, China
- Luigi Cattivelli
- Council for Agriculture Research and Economics (CREA) Research Centre for Genomics and Bioinformatics, Fiorenzuola d'Arda, Italy
- Manuel Spannagl
- Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health (PGSB), Neuherberg, Germany
- Micha Bayer
- International Barley Hub (IBH)/James Hutton Institute (JHI), Dundee, Scotland.
- Craig Simpson
- International Barley Hub (IBH)/James Hutton Institute (JHI), Dundee, Scotland.
- Runxuan Zhang
- International Barley Hub (IBH)/James Hutton Institute (JHI), Dundee, Scotland.
- Robbie Waugh
- International Barley Hub (IBH)/James Hutton Institute (JHI), Dundee, Scotland.
- School of Agriculture, Food and Wine, University of Adelaide, Waite Campus, Urrbrae, South Australia, Australia.
- School of Life Sciences, University of Dundee, Dundee, UK.

9
Kubota N, Chen L, Zheng S. Shiba: A versatile computational method for systematic identification of differential RNA splicing across platforms. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.05.30.596331. [PMID: 38895326 PMCID: PMC11185541 DOI: 10.1101/2024.05.30.596331] [Indexed: 06/21/2024]
Abstract
Alternative pre-mRNA splicing (AS) is a fundamental regulatory process that generates transcript diversity and cell type variation. We developed Shiba, a comprehensive method that integrates transcript assembly, splicing event identification, read counting, and differential splicing analysis across RNA-seq platforms. Shiba excels in capturing annotated and unannotated AS events with superior accuracy, sensitivity, and reproducibility. It addresses the often-overlooked issue of junction read imbalance, significantly reducing false positives to aid target prioritization and downstream analyses. Unlike other tools, which require large numbers of biological replicates or suffer from low sensitivity and high false-positive rates, Shiba's statistical framework is agnostic to sample size, as demonstrated by simulated data and its effective application to real n = 1 RNA-seq datasets. To extend its utility to single-cell RNA-seq, we developed scShiba, which applies Shiba's pseudobulk approach to analyze splicing at the cluster level. scShiba successfully revealed AS regulation in developmental dopaminergic neurons and differences between excitatory and inhibitory neurons. Both Shiba and scShiba are available in Docker/Singularity containers and Snakemake pipelines, ensuring reproducibility. With their comprehensive capabilities, Shiba and scShiba enable systematic quantification of alternative splicing events across various platforms, laying a solid foundation for mechanistic exploration of the functional complexity in RNA splicing.
Affiliation(s)
- Naoto Kubota
- Division of Biomedical Sciences, School of Medicine, University of California, Riverside, CA 92521, USA
- Center for RNA Biology and Medicine, University of California, Riverside, CA 92521, USA
- Liang Chen
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Sika Zheng
- Division of Biomedical Sciences, School of Medicine, University of California, Riverside, CA 92521, USA
- Center for RNA Biology and Medicine, University of California, Riverside, CA 92521, USA

10
Zhang Y, Tang L, Zhi S, Hu B, Zuo Z, Ren J, Xie Y, Luo X. M6Allele: a toolkit for detection of allele-specific RNA N6-methyladenosine modifications. Gigascience 2025; 14:giaf040. [PMID: 40388309 PMCID: PMC12087454 DOI: 10.1093/gigascience/giaf040] [Received: 09/04/2024] [Revised: 01/05/2025] [Accepted: 03/07/2025] [Indexed: 05/21/2025] Open
Abstract
BACKGROUND Allelic gene-specific regulatory events are crucial mechanisms in organisms, pivotal to many fundamental biological processes such as embryonic development and chromosome inactivation. Allelic gene imbalance manifests at both RNA expression and epigenetic levels. Recent research has unveiled allele-specific regulation of RNA N6-methyladenosine (m6A), emphasizing the need for its precise identification. However, prevailing approaches primarily focus on screening allele-specific genetic variations associated with m6A, and do not truly identify allelic m6A events. Therefore, a novel algorithm dedicated to identifying allele-specific m6A (ASm6A) signals is still needed to comprehensively understand the regulatory mechanism of ASm6A. FINDINGS To address this limitation, we have developed a meta-analysis approach using hierarchical Bayesian models to accurately detect ASm6A events at the peak level from MeRIP-seq data. For user convenience, we introduce a unified analysis pipeline named M6Allele, streamlining the assessment of significant ASm6A across single and paired samples. Applying M6Allele to MeRIP-seq data from pulmonary fibrosis and lung adenocarcinoma reveals enrichment of ASm6A events in key regulatory genes associated with these diseases, suggesting their potential involvement in disease regulation. CONCLUSIONS Our effort provides a method for precisely identifying ASm6A events at the peak level, elucidates the interplay of m6A with human health and disease genetics, and opens a new perspective for disease research. The M6Allele software is freely available at https://github.com/RenLabBioinformatics/M6Allele under the MIT license.
Affiliation(s)
- Yin Zhang
- Innovation Center of the Sixth Affiliated Hospital, School of Life Sciences, Sun Yat-sen University, Guangzhou 510060, China
| | - Lin Tang
- Innovation Center of the Sixth Affiliated Hospital, School of Life Sciences, Sun Yat-sen University, Guangzhou 510060, China
| | - Shengyao Zhi
- Guangdong Provincial Key Laboratory of Pharmaceutical Bioactive Substances, School of Biosciences and Biopharmaceutics, Guangdong Pharmaceutical University, Guangzhou 510006, China
| | - Bosu Hu
- Innovation Center of the Sixth Affiliated Hospital, School of Life Sciences, Sun Yat-sen University, Guangzhou 510060, China
| | - Zhixiang Zuo
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
| | - Jian Ren
- Innovation Center of the Sixth Affiliated Hospital, School of Life Sciences, Sun Yat-sen University, Guangzhou 510060, China
| | - Yubin Xie
- Institute of Precision Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510060, China
| | - Xiaotong Luo
- Innovation Center of the Sixth Affiliated Hospital, School of Life Sciences, Sun Yat-sen University, Guangzhou 510060, China
- Guangdong Institute of Gastroenterology, Biomedical Innovation Center, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou 510060, China
| |
|
11
|
Li X, Peng L, Wang YP, Zhang W. Open challenges and opportunities in federated foundation models towards biomedical healthcare. BioData Min 2025; 18:2. [PMID: 39755653 DOI: 10.1186/s13040-024-00414-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2024] [Accepted: 12/09/2024] [Indexed: 01/06/2025] Open
Abstract
This survey explores the transformative impact of foundation models (FMs) in artificial intelligence, focusing on their integration with federated learning (FL) in biomedical research. Foundation models such as ChatGPT, LLaMa, and CLIP, which are trained on vast datasets through methods including unsupervised pretraining, self-supervised learning, instructed fine-tuning, and reinforcement learning from human feedback, represent significant advancements in machine learning. These models, with their ability to generate coherent text and realistic images, are crucial for biomedical applications that require processing diverse data forms such as clinical reports, diagnostic images, and multimodal patient interactions. The incorporation of FL with these sophisticated models presents a promising strategy to harness their analytical power while safeguarding the privacy of sensitive medical data. This approach not only enhances the capabilities of FMs in medical diagnostics and personalized treatment but also addresses critical concerns about data privacy and security in healthcare. This survey reviews the current applications of FMs in federated settings, underscores the challenges, and identifies future research directions including scaling FMs, managing data diversity, and enhancing communication efficiency within FL frameworks. The objective is to encourage further research into the combined potential of FMs and FL, laying the groundwork for healthcare innovations.
Affiliation(s)
- Xingyu Li
- Department of Computer Science, Tulane University, New Orleans, LA, USA
| | - Lu Peng
- Department of Computer Science, Tulane University, New Orleans, LA, USA.
| | - Yu-Ping Wang
- Department of Biomedical Engineering, Tulane University, New Orleans, LA, USA
| | - Weihua Zhang
- School of Computer Science, Fudan University, Shanghai, China
| |
|
12
|
Ou J, Liu H, Park S, Green MR, Zhu LJ. InPAS: An R/Bioconductor Package for Identifying Novel Polyadenylation Sites and Alternative Polyadenylation from Bulk RNA-seq Data. Front Biosci (Schol Ed) 2024; 16:21. [PMID: 39736014 DOI: 10.31083/j.fbs1604021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 09/20/2024] [Accepted: 10/10/2024] [Indexed: 12/31/2024]
Abstract
BACKGROUND Alternative cleavage and polyadenylation (APA) is a crucial post-transcriptional gene regulation mechanism that regulates gene expression in eukaryotes by increasing the diversity and complexity of both the transcriptome and proteome. Despite the development of more than a dozen experimental methods over the last decade to identify and quantify APA events, widespread adoption of these methods has been limited by technical, financial, and time constraints. Consequently, APA remains poorly understood in most eukaryotes. However, RNA sequencing (RNA-seq) technology has revolutionized transcriptome profiling and recent studies have shown that RNA-seq data can be leveraged to identify and quantify APA events. RESULTS To fully capitalize on the exponentially growing RNA-seq data, we developed InPAS (Identification of Novel alternative PolyAdenylation Sites), an R/Bioconductor package for accurate identification of novel and known cleavage and polyadenylation sites (CPSs), as well as quantification of APA from RNA-seq data of various experimental designs. Compared to other APA analysis tools, InPAS offers several important advantages, including the ability to detect both novel proximal and distal CPSs, to fine tune positions of CPSs using a naïve Bayes classifier based on flanking sequence features, and to identify APA events from RNA-seq data of complex experimental designs using linear models. We benchmarked the performance of InPAS and other leading tools using simulated and experimental RNA-seq data with matched 3'-end RNA-seq data. Our results reveal that InPAS frequently outperforms existing tools in terms of precision, sensitivity, and specificity. Furthermore, we demonstrate its scalability and versatility by applying it to large, diverse RNA-seq datasets. CONCLUSIONS InPAS is an efficient and robust tool for identifying and quantifying APA events using readily accessible conventional RNA-seq data. Its versatility opens doors to explore APA regulation across diverse eukaryotic systems with various experimental designs. We believe that InPAS will drive APA research forward, deepening our understanding of its role in regulating gene expression, and potentially leading to the discovery of biomarkers or therapeutics for diseases.
Affiliation(s)
- Jianhong Ou
- Department of Molecular, Cell and Cancer Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Regeneration Center, Duke University School of Medicine, Duke University, Durham, NC 27701, USA
| | - Haibo Liu
- Department of Molecular, Cell and Cancer Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Sungmi Park
- Department of Molecular, Cell and Cancer Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Michael R Green
- Department of Molecular, Cell and Cancer Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Lihua Julie Zhu
- Department of Molecular, Cell and Cancer Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Department of Molecular Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| |
|
13
|
Zhang X. Highly effective batch effect correction method for RNA-seq count data. Comput Struct Biotechnol J 2024; 27:58-64. [PMID: 39802213 PMCID: PMC11718288 DOI: 10.1016/j.csbj.2024.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Revised: 12/13/2024] [Accepted: 12/14/2024] [Indexed: 01/16/2025] Open
Abstract
RNA sequencing (RNA-seq) has become a cornerstone of transcriptomics, providing detailed insights into gene expression across diverse biological conditions and sample types. However, RNA-seq data are often confounded by batch effects, systematic non-biological variations that compromise data reliability and obscure true biological differences. To address these challenges, we introduce ComBat-ref, a refined batch effect correction method designed to enhance the statistical power and reliability of differential expression analysis in RNA-seq data. Building on the principles of ComBat-seq, ComBat-ref employs a negative binomial model for count data adjustment but innovates by selecting a reference batch with the smallest dispersion, preserving count data for the reference batch, and adjusting other batches towards the reference batch. Our method demonstrated superior performance in both simulated environments and real-world datasets, including the growth factor receptor network (GFRN) data and NASA GeneLab transcriptomic datasets, significantly improving sensitivity and specificity compared to existing methods. By effectively mitigating batch effects while maintaining high detection power, ComBat-ref provides a robust solution for improving the accuracy and interpretability of RNA-seq data analyses.
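The reference-selection step described in this abstract (choose the batch with the smallest dispersion, then leave its counts untouched) can be illustrated with a rough sketch. This is not the ComBat-ref implementation: the method-of-moments dispersion estimate, the median pooling across genes, and the function names `batch_dispersion` and `pick_reference_batch` are all assumptions made here for illustration, and the adjustment of the other batches towards the reference is not shown.

```python
from statistics import mean, median, variance


def batch_dispersion(counts):
    """Pooled dispersion of one batch (hypothetical helper).

    counts: list of genes, each a list of raw counts across the batch's samples.
    Assumes a negative binomial variance var = mu + phi * mu**2 and solves for
    phi per gene by the method of moments, then takes the median across genes.
    """
    phis = []
    for gene in counts:
        mu = mean(gene)
        if mu == 0:
            continue  # an all-zero gene carries no dispersion information
        phi = (variance(gene) - mu) / mu ** 2
        phis.append(max(phi, 0.0))  # clip under-dispersed estimates to zero
    # With no informative gene the batch should never win the comparison.
    return median(phis) if phis else float("inf")


def pick_reference_batch(batches):
    """Return the name of the batch with the smallest pooled dispersion."""
    return min(batches, key=lambda name: batch_dispersion(batches[name]))


# Toy data: two genes, three samples per batch; "b2" is far noisier than "b1".
batches = {
    "b1": [[10, 12, 8], [100, 90, 110]],
    "b2": [[10, 30, 2], [100, 20, 180]],
}
reference = pick_reference_batch(batches)
```

In this toy input the tighter batch `b1` is selected as the reference; a real analysis would use a proper negative binomial dispersion estimator rather than this moment-based shortcut.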
Affiliation(s)
- Xiaoyu Zhang
- Department of Computer Science and Information Science, California State University San Marcos, 333 S. Twin Oaks Valley Rd, San Marcos, CA 92096, USA
| |
|
14
|
Simon NM, Kim Y, Gribnau J, Bautista DM, Dutton JR, Brem RB. Stem cell transcriptional profiles from mouse subspecies reveal cis-regulatory evolution at translation genes. Heredity (Edinb) 2024; 133:308-316. [PMID: 39164520 PMCID: PMC11527988 DOI: 10.1038/s41437-024-00715-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Revised: 08/06/2024] [Accepted: 08/08/2024] [Indexed: 08/22/2024] Open
Abstract
A key goal of evolutionary genomics is to harness molecular data to draw inferences about selective forces that have acted on genomes. The field progresses in large part through the development of advanced molecular-evolution analysis methods. Here we explored the intersection between classical sequence-based tests for selection and an empirical expression-based approach, using stem cells from Mus musculus subspecies as a model. Using a test of directional, cis-regulatory evolution across genes in pathways, we discovered a unique program of induction of translation genes in stem cells of the Southeast Asian mouse M. m. castaneus relative to its sister taxa. We then mined population-genomic sequences to pursue underlying regulatory mechanisms for this expression divergence, finding robust evidence for alleles unique to M. m. castaneus at the upstream regions of the translation genes. We interpret our data under a model of changes in lineage-specific pressures across Mus musculus in stem cells with high translational capacity. Our findings underscore the rigor of integrating expression and sequence-based methods to generate hypotheses about evolutionary events from long ago.
Affiliation(s)
- Noah M Simon
- Biology of Aging Doctoral Program, Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA, 90089, USA
- Buck Institute for Research on Aging, Novato, CA, 94945, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
| | - Yujin Kim
- Stem Cell Institute, University of Minnesota, Minneapolis, MN, 55455, USA
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN, 55455, USA
| | - Joost Gribnau
- Department of Reproduction and Development, Erasmus MC, PO Box 2040, 3000 CA Rotterdam, Netherlands
| | - Diana M Bautista
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
| | - James R Dutton
- Stem Cell Institute, University of Minnesota, Minneapolis, MN, 55455, USA
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN, 55455, USA
| | - Rachel B Brem
- Buck Institute for Research on Aging, Novato, CA, 94945, USA.
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA, 94720, USA.
| |
|
15
|
Santiago KCL, Shrestha AMS. DNA-protein quasi-mapping for rapid differential gene expression analysis in non-model organisms. BMC Bioinformatics 2024; 25:335. [PMID: 39448913 PMCID: PMC11515663 DOI: 10.1186/s12859-024-05924-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 09/05/2024] [Indexed: 10/26/2024] Open
Abstract
BACKGROUND Conventional differential gene expression analysis pipelines for non-model organisms require computationally expensive transcriptome assembly. We recently proposed an alternative strategy of directly aligning RNA-seq reads to a protein database, and demonstrated drastic improvements in speed, memory usage, and accuracy in identifying differentially expressed genes. RESULT Here we report a further speed-up by replacing DNA-protein alignment with quasi-mapping, making our pipeline > 1000× faster than the assembly-based approach, and still more accurate. We also compare quasi-mapping to other mapping techniques, and show that it is faster but at the cost of sensitivity. CONCLUSION We provide a quick-and-dirty differential gene expression analysis pipeline for non-model organisms without a reference transcriptome, which directly quasi-maps RNA-seq reads to a reference protein database, avoiding computationally expensive transcriptome assembly.
Affiliation(s)
- Kyle Christian L Santiago
- Bioinformatics Lab, Advanced Research Institute for Informatics, Computing, and Networking, De La Salle University Manila, 2401 Taft Avenue, Manila, Philippines
- Department of Software Technology, College of Computer Studies, De La Salle University Manila, 2401 Taft Avenue, Manila, Philippines
| | - Anish M S Shrestha
- Bioinformatics Lab, Advanced Research Institute for Informatics, Computing, and Networking, De La Salle University Manila, 2401 Taft Avenue, Manila, Philippines.
- Department of Software Technology, College of Computer Studies, De La Salle University Manila, 2401 Taft Avenue, Manila, Philippines.
| |
|
16
|
Wang R, Zheng Y, Zhang Z, Song K, Wu E, Zhu X, Wu TP, Ding J. MATES: a deep learning-based model for locus-specific quantification of transposable elements in single cell. Nat Commun 2024; 15:8798. [PMID: 39394211 PMCID: PMC11470080 DOI: 10.1038/s41467-024-53114-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 09/24/2024] [Indexed: 10/13/2024] Open
Abstract
Transposable elements (TEs) are crucial for genetic diversity and gene regulation. Current single-cell quantification methods often align multi-mapping reads to either 'best-mapped' or 'random-mapped' locations and categorize them at the subfamily levels, overlooking the biological necessity for accurate, locus-specific TE quantification. Moreover, these existing methods are primarily designed for and focused on transcriptomics data, which restricts their adaptability to single-cell data of other modalities. To address these challenges, here we introduce MATES, a deep-learning approach that accurately allocates multi-mapping reads to specific loci of TEs, utilizing context from adjacent read alignments flanking the TE locus. When applied to diverse single-cell omics datasets, MATES shows improved performance over existing methods, enhancing the accuracy of TE quantification and aiding in the identification of marker TEs for identified cell populations. This development facilitates the exploration of single-cell heterogeneity and gene regulation through the lens of TEs, offering an effective transposon quantification tool for the single-cell genomics community.
Affiliation(s)
- Ruohan Wang
- School of Computer Science, McGill University, Montreal, Quebec, Canada
- Meakins-Christie Laboratories, Translational Research in Respiratory Diseases Program, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Department of Medicine, McGill University, Montreal, Quebec, Canada
| | - Yumin Zheng
- Meakins-Christie Laboratories, Translational Research in Respiratory Diseases Program, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Department of Medicine, McGill University, Montreal, Quebec, Canada
- Quantitative Life Sciences, Faculty of Medicine & Health Sciences, McGill University, Montreal, Quebec, Canada
| | - Zijian Zhang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Kailu Song
- Meakins-Christie Laboratories, Translational Research in Respiratory Diseases Program, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Department of Medicine, McGill University, Montreal, Quebec, Canada
- Quantitative Life Sciences, Faculty of Medicine & Health Sciences, McGill University, Montreal, Quebec, Canada
| | - Erxi Wu
- Department of Neurosurgery, Baylor College of Medicine, Temple, TX, USA
- College of Medicine and Irma Lerma Rangel College of Pharmacy, Texas A&M University, College Station, TX, USA
- LIVESTRONG Cancer Institutes and Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Neuroscience Institute and Department of Neurosurgery, Baylor Scott & White Health, Temple, TX, USA
| | | | - Tao P Wu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
| | - Jun Ding
- School of Computer Science, McGill University, Montreal, Quebec, Canada.
- Meakins-Christie Laboratories, Translational Research in Respiratory Diseases Program, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada.
- Department of Medicine, McGill University, Montreal, Quebec, Canada.
- Quantitative Life Sciences, Faculty of Medicine & Health Sciences, McGill University, Montreal, Quebec, Canada.
- Mila-Quebec AI Institute, Montreal, Quebec, Canada.
| |
|
17
|
Ueda MT, Inamo J, Miya F, Shimada M, Yamaguchi K, Kochi Y. Functional and dynamic profiling of transcript isoforms reveals essential roles of alternative splicing in interferon response. Cell Genomics 2024; 4:100654. [PMID: 39288763 PMCID: PMC11602592 DOI: 10.1016/j.xgen.2024.100654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 04/04/2024] [Accepted: 08/20/2024] [Indexed: 09/19/2024]
Abstract
Type I interferon (IFN-I) plays an important role in the innate immune response through inducing IFN-I-stimulated genes (ISGs). However, how alternative splicing (AS) events, especially over time, affect their function remains poorly understood. We generated an annotation (113,843 transcripts) for IFN-I-stimulated human B cells called isoISG using high-accuracy long-read sequencing data from PacBio Sequel II/IIe. Transcript isoform profiling using isoISG revealed that isoform switching occurred in the early response to IFN-I so that ISGs would gain functional domains (e.g., C4B) or higher protein production (e.g., IRF3). Conversely, isoforms lacking functional domains increased during the late phase of IFN-I response, mainly due to intron retention events. This suggests that isoform switching both triggers and terminates IFN-I responses at the translation and protein levels. Furthermore, genetic variants influencing the isoform ratio of ISGs were associated with immunological and infectious diseases. AS has essential roles in regulating innate immune response and associated diseases.
Affiliation(s)
- Mahoko Takahashi Ueda
- Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan
| | - Jun Inamo
- Division of Rheumatology, University of Colorado School of Medicine, Aurora, CO, USA; Department of Biomedical Informatics, Center for Health Artificial Intelligence, University of Colorado School of Medicine, Aurora, CO, USA
| | - Fuyuki Miya
- Center for Medical Genetics, Keio University School of Medicine, Tokyo 160-8582, Japan
| | - Mihoko Shimada
- National Center for Global Health and Medicine, Tokyo 162-8655, Japan
| | - Kensuke Yamaguchi
- Biomedical Engineering Research Innovation Center, Institute of Biomaterials and Bioengineering, Tokyo Medical and Dental University, Tokyo 113-8510, Japan; Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan; Department of Allergy and Rheumatology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Yuta Kochi
- Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan; Department of Allergy and Rheumatology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.
| |
|
18
|
Garbulowski M, Hillerton T, Morgan D, Seçilmiş D, Sonnhammer L, Tjärnberg A, Nordling TEM, Sonnhammer ELL. GeneSPIDER2: large scale GRN simulation and benchmarking with perturbed single-cell data. NAR Genom Bioinform 2024; 6:lqae121. [PMID: 39296931 PMCID: PMC11409065 DOI: 10.1093/nargab/lqae121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 08/20/2024] [Accepted: 09/02/2024] [Indexed: 09/21/2024] Open
Abstract
Single-cell data is increasingly used for gene regulatory network (GRN) inference, and benchmarks for this have been developed based on simulated data. However, existing single-cell simulators cannot model the effects of gene perturbations. A further challenge lies in generating large-scale GRNs, a task that often runs into computational and stability issues. We present GeneSPIDER2, an update of the GeneSPIDER MATLAB toolbox for GRN benchmarking, inference, and analysis. Several software modules have improved capabilities and performance, and new functionalities have been added. A major improvement is the ability to generate large GRNs with biologically realistic topological properties in terms of scale-free degree distribution and modularity. Another major addition is a simulation of single-cell data, which is becoming increasingly popular as input for GRN inference. Specifically, we introduced the unique ability to generate single-cell data based on genetic perturbations. Finally, the simulated single-cell data was compared to real single-cell Perturb-seq data from two cell lines, showing that the synthetic and real data exhibit similar properties.
Affiliation(s)
- Mateusz Garbulowski
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, Solna 171 21, Sweden
- Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala 751 85, Sweden
| | - Thomas Hillerton
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, Solna 171 21, Sweden
| | - Daniel Morgan
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, Solna 171 21, Sweden
| | - Deniz Seçilmiş
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, Solna 171 21, Sweden
- Department of Cell and Molecular Biology, Karolinska Institutet, Solna 171 77, Sweden
| | - Lisbet Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, Solna 171 21, Sweden
| | - Andreas Tjärnberg
- Department of Neuro-Science, University of Wisconsin-Madison, Waisman Center, WI 53705, USA
| | - Torbjörn E M Nordling
- Department of Mechanical Engineering, National Cheng Kung University, No. 1 University Road, Tainan City 701, Taiwan
| | - Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, Solna 171 21, Sweden
| |
|
19
|
Ritter AJ, Wallace A, Ronaghi N, Sanford J. junctionCounts: comprehensive alternative splicing analysis and prediction of isoform-level impacts to the coding sequence. NAR Genom Bioinform 2024; 6:lqae093. [PMID: 39131822 PMCID: PMC11310779 DOI: 10.1093/nargab/lqae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 07/15/2024] [Accepted: 07/22/2024] [Indexed: 08/13/2024] Open
Abstract
Alternative splicing (AS) is emerging as an important regulatory process for complex biological processes. Transcriptomic studies therefore commonly involve the identification and quantification of alternative processing events, but the need for predicting the functional consequences of changes to the relative inclusion of alternative events remains largely unaddressed. Many tools exist for the former task, albeit each constrained to its own event type definitions. Few tools exist for the latter task, each with significant limitations. To address these issues we developed junctionCounts, which captures both simple and complex pairwise AS events and quantifies them with straightforward exon-exon and exon-intron junction reads in RNA-seq data, performing competitively among similar tools in terms of sensitivity, false discovery rate and quantification accuracy. Its partner utility, cdsInsertion, identifies transcript coding sequence (CDS) information via in silico translation from annotated start codons, including the presence of premature termination codons. Finally, findSwitchEvents connects AS events with CDS information to predict the impact of individual events on the isoform-level CDS. We used junctionCounts to characterize splicing dynamics and NMD regulation during neuronal differentiation across four primates, demonstrating junctionCounts' capacity to robustly characterize AS in a variety of organisms and to predict its effect on mRNA isoform fate.
Affiliation(s)
- Alexander J Ritter
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Andrew Wallace
- Department of Molecular, Cell and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Neda Ronaghi
- Department of Molecular, Cell and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Jeremy R Sanford
- Department of Molecular, Cell and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| |
|
20
|
Zang XC, Li X, Metcalfe K, Ben-Yehezkel T, Kelley R, Shao M. Anchorage Accurately Assembles Anchor-Flanked Synthetic Long Reads. LIPIcs: Leibniz International Proceedings in Informatics 2024; 312:22. [PMID: 39764549 PMCID: PMC11702288 DOI: 10.4230/lipics.wabi.2024.22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2025]
Abstract
Modern sequencing technologies allow for the addition of short-sequence tags, known as anchors, to both ends of a captured molecule. Anchors are useful in assembling the full-length sequence of a captured molecule as they can be used to accurately determine the endpoints. One representative of such anchor-enabled technology is LoopSeq Solo, a synthetic long read (SLR) sequencing protocol. LoopSeq Solo also achieves ultra-high sequencing depth and high purity of short reads covering the entire captured molecule. Despite the availability of many assembly methods, constructing full-length sequence from these anchor-enabled, ultra-high coverage sequencing data remains challenging due to the complexity of the underlying assembly graphs and the lack of specific algorithms leveraging anchors. We present Anchorage, a novel assembler that performs anchor-guided assembly for ultra-high-depth sequencing data. Anchorage starts with a kmer-based approach for precise estimation of molecule lengths. It then formulates the assembly problem as finding an optimal path that connects the two nodes determined by anchors in the underlying compact de Bruijn graph. The optimality is defined as maximizing the weight of the smallest node while matching the estimated sequence length. Anchorage uses a modified dynamic programming algorithm to efficiently find the optimal path. Through both simulations and real data, we show that Anchorage outperforms existing assembly methods, particularly in the presence of sequencing artifacts. Anchorage fills the gap in assembling anchor-enabled data. We anticipate its broad use as anchor-enabled sequencing technologies become prevalent. Anchorage is freely available at https://github.com/Shao-Group/anchorage; the scripts and documents that can reproduce all experiments in this manuscript are available at https://github.com/Shao-Group/anchorage-test.
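The optimality criterion named in this abstract (maximize the weight of the smallest node on a path between the two anchor nodes) is a bottleneck, or maximin, path problem. The sketch below illustrates only that criterion on a small node-weighted directed graph, using a Dijkstra-style search rather than the modified dynamic programming the abstract mentions. It omits the sequence-length matching constraint entirely, and the function name `maximin_path` and the toy graph are hypothetical, not taken from Anchorage.

```python
import heapq


def maximin_path(graph, weight, source, target):
    """Find a source-to-target path whose smallest node weight is as large as possible.

    graph:  dict mapping node -> iterable of successor nodes (directed).
    weight: dict mapping node -> numeric weight (e.g. k-mer coverage).
    Returns (bottleneck, path), or None if target is unreachable.
    """
    best = {source: weight[source]}   # best bottleneck found so far per node
    parent = {source: None}
    heap = [(-weight[source], source)]  # max-heap via negated bottleneck
    while heap:
        neg_bottleneck, node = heapq.heappop(heap)
        bottleneck = -neg_bottleneck
        if bottleneck < best[node]:
            continue  # stale heap entry, a better route was already found
        if node == target:
            break
        for nxt in graph.get(node, ()):
            candidate = min(bottleneck, weight[nxt])
            if candidate > best.get(nxt, float("-inf")):
                best[nxt] = candidate
                parent[nxt] = node
                heapq.heappush(heap, (-candidate, nxt))
    if target not in best:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return best[target], path[::-1]


# Toy graph: two routes between the anchor nodes "A" and "D".
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
weight = {"A": 10, "B": 2, "C": 7, "D": 9}
result = maximin_path(graph, weight, "A", "D")
```

Here the route through `C` wins because its weakest node has weight 7, whereas the route through `B` is limited by a node of weight 2. Adding the length constraint would require tracking accumulated sequence length alongside each node.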
Affiliation(s)
- Xiaofei Carl Zang
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
| | - Xiang Li
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
| | | | | | | | - Mingfu Shao
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
| |
|
21
|
Paiva I, Seguin J, Grgurina I, Singh AK, Cosquer B, Plassard D, Tzeplaeff L, Le Gras S, Cotellessa L, Decraene C, Gambi J, Alcala-Vida R, Eswaramoorthy M, Buée L, Cassel JC, Giacobini P, Blum D, Merienne K, Kundu TK, Boutillier AL. Dysregulated expression of cholesterol biosynthetic genes in Alzheimer's disease alters epigenomic signatures of hippocampal neurons. Neurobiol Dis 2024; 198:106538. [PMID: 38789057 DOI: 10.1016/j.nbd.2024.106538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 05/18/2024] [Accepted: 05/20/2024] [Indexed: 05/26/2024] Open
Abstract
Aging is the main risk factor for cognitive neurodegenerative diseases such as Alzheimer's disease (AD), with epigenome alterations as a contributing factor. Here, we compared transcriptomic/epigenomic changes in the hippocampus, modified by aging and by tauopathy, an AD-related feature. We show that the cholesterol biosynthesis pathway is severely impaired in hippocampal neurons of tauopathic but not of aged mice, pointing to the vulnerability of these neurons in the disease. At the epigenomic level, histone hyperacetylation was observed at neuronal enhancers associated with glutamatergic regulations only in the tauopathy. Lastly, a treatment of tau mice with the CSP-TTK21 epi-drug that restored expression of key cholesterol biosynthesis genes counteracted hyperacetylation at neuronal enhancers and restored object memory. As acetyl-CoA is the primary substrate of both pathways, these data suggest that the rate of cholesterol biosynthesis in hippocampal neurons may trigger epigenetic-driven changes that may compromise the functions of hippocampal neurons in pathological conditions.
Affiliation(s)
- Isabel Paiva
- University of Strasbourg, Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France; CNRS, UMR7364 - Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France.
| | - Jonathan Seguin
- University of Strasbourg, Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France; CNRS, UMR7364 - Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France
| | - Iris Grgurina
- University of Strasbourg, Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France; CNRS, UMR7364 - Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France
| | - Akash Kumar Singh
- Transcription and Disease Laboratory, Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR), Bangalore, India
| | - Brigitte Cosquer
- University of Strasbourg, Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France; CNRS, UMR7364 - Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France
| | - Damien Plassard
- University of Strasbourg, CNRS UMR7104, Inserm U1258 - GenomEast Platform - IGBMC - Institut de Génétique et de Biologie Moléculaire et Cellulaire, F-67404 Illkirch, France
| | - Laura Tzeplaeff
- University of Strasbourg, Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France; CNRS, UMR7364 - Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France
| | - Stephanie Le Gras
- University of Strasbourg, CNRS UMR7104, Inserm U1258 - GenomEast Platform - IGBMC - Institut de Génétique et de Biologie Moléculaire et Cellulaire, F-67404 Illkirch, France
| | - Ludovica Cotellessa
- University of Lille, Inserm, CHU Lille, Laboratory of Development and Plasticity of the Postnatal Brain, Lille Neuroscience & Cognition, UMR-S1172, FHU 1000 Days for Health, 59000 Lille, France
| | - Charles Decraene
- University of Strasbourg, Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France; CNRS, UMR7364 - Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France
| | - Johanne Gambi
- University of Strasbourg, Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France; CNRS, UMR7364 - Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France
| | - Rafael Alcala-Vida
- University of Strasbourg, Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France; CNRS, UMR7364 - Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France
| | - Muthusamy Eswaramoorthy
- Chemistry and Physics of Materials Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore, India
| | - Luc Buée
- University of Lille, Inserm, CHU Lille, UMR-S1172 LilNCog - Lille Neuroscience & Cognition, Lille, France; Alzheimer and Tauopathies, LabEx DISTALZ, France
| | - Jean-Christophe Cassel
- University of Strasbourg, Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France; CNRS, UMR7364 - Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France
| | - Paolo Giacobini
- University of Lille, Inserm, CHU Lille, Laboratory of Development and Plasticity of the Postnatal Brain, Lille Neuroscience & Cognition, UMR-S1172, FHU 1000 Days for Health, 59000 Lille, France
| | - David Blum
- University of Lille, Inserm, CHU Lille, UMR-S1172 LilNCog - Lille Neuroscience & Cognition, Lille, France; Alzheimer and Tauopathies, LabEx DISTALZ, France
| | - Karine Merienne
- University of Strasbourg, Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France; CNRS, UMR7364 - Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France
| | - Tapas K Kundu
- Transcription and Disease Laboratory, Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR), Bangalore, India
| | - Anne-Laurence Boutillier
- University of Strasbourg, Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France; CNRS, UMR7364 - Laboratoire de Neuroscience Cognitives et Adaptatives (LNCA), Strasbourg F-67000, France.
| |
|
22
|
Zhao Z, Chen Y, Zou X, Lin L, Zhou X, Cheng X, Yang G, Xu Q, Gong L, Li L, Ni T. Pan-cancer transcriptome analysis reveals widespread regulation through alternative tandem transcription initiation. Sci Adv 2024; 10:eadl5606. [PMID: 38985880 PMCID: PMC11235174 DOI: 10.1126/sciadv.adl5606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 06/05/2024] [Indexed: 07/12/2024]
Abstract
Abnormal transcription initiation from alternative first exons has been reported to promote tumorigenesis. However, the prevalence and impact of gene expression regulation mediated by alternative tandem transcription initiation were mostly unknown in cancer. Here, we developed a robust computational method to analyze alternative tandem transcription start site (TSS) usage from standard RNA sequencing data. Applying this method to pan-cancer RNA sequencing datasets, we observed widespread dysregulation of tandem TSS usage in tumors, many of which were independent of changes in overall expression level or alternative first exon usage. We showed that the dynamics of tandem TSS usage were associated with epigenomic modulation. We found that significant 5' untranslated region shortening of the gene TIMM13 contributed to increased protein production, and up-regulation of TIMM13 by CRISPR-mediated transcriptional activation promoted proliferation and migration of lung cancer cells. Our findings suggest that dysregulated tandem TSS usage represents an additional layer of cancer-associated transcriptome alterations.
Affiliation(s)
- Zhaozhao Zhao
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China
- MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai 200438, China
| | - Yu Chen
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China
| | - Xudong Zou
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
| | - Limin Lin
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China
| | - Xiaolan Zhou
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China
| | - Xiaomeng Cheng
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China
| | - Guangrui Yang
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China
| | - Qiushi Xu
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China
| | - Lihai Gong
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
| | - Lei Li
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
| | - Ting Ni
- State Key Laboratory of Genetic Engineering, National Clinical Research Center for Aging and Medicine, Huashan Hospital, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Center for Evolutionary Biology, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences, Fudan University, Shanghai 200438, China
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, School of Life Sciences, Inner Mongolia University, Hohhot 010070, China
| |
|
23
|
Shi Q, Zhang Q, Shao M. Accurate assembly of multiple RNA-seq samples with Aletsch. Bioinformatics 2024; 40:i307-i317. [PMID: 38940157 PMCID: PMC11211816 DOI: 10.1093/bioinformatics/btae215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION: High-throughput RNA sequencing has become indispensable for decoding gene activities, yet the challenge of reconstructing full-length transcripts persists. Traditional single-sample assemblers frequently produce fragmented transcripts, especially in single-cell RNA-seq data. While algorithms designed for assembling multiple samples exist, they encounter various limitations. RESULTS: We present Aletsch, a new assembler for multiple bulk or single-cell RNA-seq samples. Aletsch incorporates several algorithmic innovations, including a "bridging" system that can effectively integrate multiple samples to restore missed junctions in individual samples, and a new graph-decomposition algorithm that leverages "supporting" information across multiple samples to guide the decomposition of complex vertices. A standout feature of Aletsch is its application of a random forest model with 50 well-designed features for scoring transcripts. We demonstrate its robust adaptability across different chromosomes, datasets, and species. Our experiments, conducted on RNA-seq data from several protocols, firmly demonstrate Aletsch's significant outperformance over existing meta-assemblers. As an example, when measured with the partial area under the precision-recall curve (pAUC, constrained by precision), Aletsch surpasses the leading assemblers TransMeta by 22.9%-62.1% and PsiCLASS by 23.0%-175.5% on human datasets. AVAILABILITY AND IMPLEMENTATION: Aletsch is freely available at https://github.com/Shao-Group/aletsch. Scripts that reproduce the experimental results of this manuscript are available at https://github.com/Shao-Group/aletsch-test.
Affiliation(s)
- Qian Shi
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, United States
| | - Qimin Zhang
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, United States
| | - Mingfu Shao
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, United States
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, United States
| |
|
24
|
Popitsch N, Neumann T, von Haeseler A, Ameres SL. Splice_sim: a nucleotide conversion-enabled RNA-seq simulation and evaluation framework. Genome Biol 2024; 25:166. [PMID: 38918865 PMCID: PMC11514792 DOI: 10.1186/s13059-024-03313-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Accepted: 06/17/2024] [Indexed: 06/27/2024] Open
Abstract
Nucleotide conversion RNA sequencing techniques interrogate chemical RNA modifications in cellular transcripts, resulting in mismatch-containing reads. Biases in mapping the resulting reads to reference genomes remain poorly understood. We present splice_sim, a splice-aware RNA-seq simulation and evaluation pipeline that introduces user-defined nucleotide conversions at set frequencies, creates mixture models of converted and unconverted reads, and calculates mapping accuracies per genomic annotation. By simulating nucleotide conversion RNA-seq datasets under realistic experimental conditions, including metabolic RNA labeling and RNA bisulfite sequencing, we measure mapping accuracies of state-of-the-art spliced-read mappers for mouse and human transcripts and derive strategies to prevent biases in the data interpretation.
Affiliation(s)
- Niko Popitsch
- Max Perutz Labs, Vienna Biocenter Campus (VBC), Vienna, A-1030, Austria.
- Max Perutz Labs, Department of Biochemistry and Cell Biology, University of Vienna, Vienna, A-1030, Austria.
| | - Tobias Neumann
- Quantro Therapeutics, Vienna, A-1030, Austria
- Vienna Biocenter PhD Program, a Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, A-1030, Austria
- Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna, Medical University of Vienna, Vienna, A-1030, Austria
| | - Arndt von Haeseler
- Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna, Medical University of Vienna, Vienna, A-1030, Austria
- Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, A-1090, Austria
| | - Stefan L Ameres
- Max Perutz Labs, Vienna Biocenter Campus (VBC), Vienna, A-1030, Austria
- Max Perutz Labs, Department of Biochemistry and Cell Biology, University of Vienna, Vienna, A-1030, Austria
- Institute of Molecular Biotechnology, IMBA, Vienna Biocenter Campus (VBC), Vienna, A-1030, Austria
| |
|
25
|
Gonzalez Gomez C, Rosa-Calatrava M, Fouret J. Optimizing in silico drug discovery: simulation of connected differential expression signatures and applications to benchmarking. Brief Bioinform 2024; 25:bbae299. [PMID: 38935068 PMCID: PMC11210109 DOI: 10.1093/bib/bbae299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 05/19/2024] [Accepted: 06/11/2024] [Indexed: 06/28/2024] Open
Abstract
BACKGROUND: We present a novel simulation method for generating connected differential expression signatures. Traditional methods have struggled with the lack of reliable benchmarking data and biases in drug-disease pair labeling, limiting the rigorous benchmarking of connectivity-based approaches. OBJECTIVE: Our aim is to develop a simulation method based on a statistical framework that allows for adjustable levels of parametrization, especially the connectivity, to generate a pair of interconnected differential signatures. This could help to address the issue of benchmarking data availability for connectivity-based drug repurposing approaches. METHODS: We first detailed the simulation process and how it reflected real biological variability and the interconnectedness of gene expression signatures. Then, we generated several datasets to enable the evaluation of different existing algorithms that compare differential expression signatures, providing insights into their performance and limitations. RESULTS: Our findings demonstrate the ability of our simulation to produce realistic data, as evidenced by correlation analyses and the log2 fold-change distribution of deregulated genes. Benchmarking reveals that methods like extreme cosine similarity and Pearson correlation outperform others in identifying connected signatures. CONCLUSION: Overall, our method provides a reliable tool for simulating differential expression signatures. The data simulated by our tool encompass a wide spectrum of possibilities to challenge and evaluate existing methods to estimate connectivity scores. This may help fill a critical gap in connectivity-based drug repurposing research because reliable benchmarking data are essential for assessing and advancing the development of new algorithms. The simulation tool is available as an R package (GNU General Public License, GPL) at https://github.com/cgonzalez-gomez/cosimu.
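For readers unfamiliar with the connectivity scores named in this abstract, the sketch below computes plain cosine similarity and Pearson correlation between two log2 fold-change signatures. It is only an illustration: the "extreme" cosine variant benchmarked by the authors is not reproduced here, the signature values are made up, and the function names are hypothetical rather than part of the cosimu package.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("signatures must cover the same genes")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("similarity is undefined for an all-zero signature")
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


# Made-up log2 fold changes for the same four genes in two signatures.
disease_signature = [2.0, -1.0, 0.5, -2.0]
drug_signature = [-2.0, 1.0, -0.5, 2.0]  # exact reversal of the disease signature

cos_score = cosine_similarity(disease_signature, drug_signature)
pearson_score = pearson_correlation(disease_signature, drug_signature)
```

Both scores come out close to -1 for this pair, which is the pattern a connectivity-based repurposing screen typically looks for: a drug signature that reverses the disease signature.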
Affiliation(s)
- Catalina Gonzalez Gomez
- CIRI, Centre International de Recherche en Infectiologie, Team VirPath, Univ Lyon, 14 Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, F-15 69007 Lyon, Rhône-Alpes, France
- International Associated Laboratory RespiVir France—Canada, Centre de Recherche en Infectiologie, Faculté de Médecine RTH Laennec 69008 Lyon, Université Claude Bernard Lyon 1, Université de Lyon, INSERM, CNRS, ENS de Lyon, France, Centre Hospitalier Universitaire de Québec - Université Laval, QC G1V 4G2 Québec, Canada
- Nexomis, Faculté de Médecine RTH Laennec, Université Claude Bernard Lyon 1, Université de Lyon, 7 Rue Guillaume Paradin, 69008 Lyon, Rhône-Alpes, France
- Signia Therapeutics, 60 Avenue Rockefeller, 69008 Lyon, Rhône-Alpes, France
| | - Manuel Rosa-Calatrava
- CIRI, Centre International de Recherche en Infectiologie, Team VirPath, Univ Lyon, 14 Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, F-15 69007 Lyon, Rhône-Alpes, France
- International Associated Laboratory RespiVir France—Canada, Centre de Recherche en Infectiologie, Faculté de Médecine RTH Laennec 69008 Lyon, Université Claude Bernard Lyon 1, Université de Lyon, INSERM, CNRS, ENS de Lyon, France, Centre Hospitalier Universitaire de Québec - Université Laval, QC G1V 4G2 Québec, Canada
- Nexomis, Faculté de Médecine RTH Laennec, Université Claude Bernard Lyon 1, Université de Lyon, 7 Rue Guillaume Paradin, 69008 Lyon, Rhône-Alpes, France
- VirNext, Faculté de Médecine RTH Laennec, Université Claude Bernard Lyon 1, Université de Lyon, 7 Rue Guillaume Paradin, 69008 Lyon, Rhône-Alpes, France
| | - Julien Fouret
- CIRI, Centre International de Recherche en Infectiologie, Team VirPath, Univ Lyon, 14 Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, F-15 69007 Lyon, Rhône-Alpes, France
- International Associated Laboratory RespiVir France—Canada, Centre de Recherche en Infectiologie, Faculté de Médecine RTH Laennec 69008 Lyon, Université Claude Bernard Lyon 1, Université de Lyon, INSERM, CNRS, ENS de Lyon, France, Centre Hospitalier Universitaire de Québec - Université Laval, QC G1V 4G2 Québec, Canada
- Nexomis, Faculté de Médecine RTH Laennec, Université Claude Bernard Lyon 1, Université de Lyon, 7 Rue Guillaume Paradin, 69008 Lyon, Rhône-Alpes, France
- Signia Therapeutics, 60 Avenue Rockefeller, 69008 Lyon, Rhône-Alpes, France
| |
|
26
|
Zhang X. Highly Effective Batch Effect Correction Method for RNA-seq Count Data. bioRxiv 2024:2024.05.02.592266. [PMID: 38746101 PMCID: PMC11092589 DOI: 10.1101/2024.05.02.592266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
RNA sequencing (RNA-seq) has become a cornerstone in transcriptomics, offering detailed insights into gene expression across diverse biological conditions and sample types. However, RNA-seq data often suffer from batch effects, which are systematic non-biological differences that compromise data reliability and obscure true biological variation. To address these challenges, we introduce ComBat-ref, a refined method of batch effect correction that enhances the statistical power and reliability of differential expression analysis in RNA-seq data. Building on the foundations of ComBat-seq, ComBat-ref employs a negative binomial model to adjust count data but innovates by using a pooled dispersion parameter for entire batches and preserving count data for the reference batch. Our method demonstrated superior performance in both simulated environments and real datasets, such as the growth factor receptor network (GFRN) data and NASA GeneLab transcriptomic datasets, significantly improving sensitivity and specificity over existing methods. By effectively mitigating batch effects while maintaining high detection power, ComBat-ref proves to be a robust tool for enhancing the accuracy and interpretability of RNA-seq data analyses.
|
27
|
Brooks TG, Lahens NF, Mrčela A, Grant GR. Challenges and best practices in omics benchmarking. Nat Rev Genet 2024; 25:326-339. [PMID: 38216661 DOI: 10.1038/s41576-023-00679-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Accepted: 11/14/2023] [Indexed: 01/14/2024]
Abstract
Technological advances enabling massively parallel measurement of biological features - such as microarrays, high-throughput sequencing and mass spectrometry - have ushered in the omics era, now in its third decade. The resulting complex landscape of analytical methods has naturally fostered the growth of an omics benchmarking industry. Benchmarking refers to the process of objectively comparing and evaluating the performance of different computational or analytical techniques when processing and analysing large-scale biological data sets, such as transcriptomics, proteomics and metabolomics. With thousands of omics benchmarking studies published over the past 25 years, the field has matured to the point where the foundations of benchmarking have been established and well described. However, generating meaningful benchmarking data and properly evaluating performance in this complex domain remains challenging. In this Review, we highlight some common oversights and pitfalls in omics benchmarking. We also establish a methodology to bring the issues that can be addressed into focus and to be transparent about those that cannot: this takes the form of a spreadsheet template of guidelines for comprehensive reporting, intended to accompany publications. In addition, a survey of recent developments in benchmarking is provided as well as specific guidance for commonly encountered difficulties.
Affiliation(s)
- Thomas G Brooks
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
| | - Nicholas F Lahens
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
| | - Antonijo Mrčela
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
| | - Gregory R Grant
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA.
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA.
| |
|
28
|
Ng JCF, Montamat Garcia G, Stewart AT, Blair P, Mauri C, Dunn-Walters DK, Fraternali F. sciCSR infers B cell state transition and predicts class-switch recombination dynamics using single-cell transcriptomic data. Nat Methods 2024; 21:823-834. [PMID: 37932398 PMCID: PMC11093741 DOI: 10.1038/s41592-023-02060-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 02/23/2023] [Accepted: 10/02/2023] [Indexed: 11/08/2023]
Abstract
Class-switch recombination (CSR) is an integral part of B cell maturation. Here we present sciCSR (pronounced 'scissor', single-cell inference of class-switch recombination), a computational pipeline that analyzes CSR events and dynamics of B cells from single-cell RNA sequencing (scRNA-seq) experiments. Validated on both simulated and real data, sciCSR re-analyzes scRNA-seq alignments to differentiate productive heavy-chain immunoglobulin transcripts from germline 'sterile' transcripts. From a snapshot of B cell scRNA-seq data, a Markov state model is built to infer the dynamics and direction of CSR. Applying sciCSR on severe acute respiratory syndrome coronavirus 2 vaccination time-course scRNA-seq data, we observe that sciCSR predicts, using data from an earlier time point in the collected time-course, the isotype distribution of B cell receptor repertoires of subsequent time points with high accuracy (cosine similarity ~0.9). Using processes specific to B cells, sciCSR identifies transitions that are often missed by conventional RNA velocity analyses and can reveal insights into the dynamics of B cell CSR during immune response.
Affiliation(s)
- Joseph C F Ng
- Department of Structural and Molecular Biology, Division of Biosciences and Institute of Structural and Molecular Biology, University College London, London, UK.
| | - Guillem Montamat Garcia
- Division of Infection and Immunity and Institute of Immunity and Transplantation, Royal Free Hospital, University College London, London, UK
| | | | - Paul Blair
- Division of Infection and Immunity and Institute of Immunity and Transplantation, Royal Free Hospital, University College London, London, UK
| | - Claudia Mauri
- Division of Infection and Immunity and Institute of Immunity and Transplantation, Royal Free Hospital, University College London, London, UK
| | | | - Franca Fraternali
- Department of Structural and Molecular Biology, Division of Biosciences and Institute of Structural and Molecular Biology, University College London, London, UK.
| |
|
29
|
Sutherland CA, Prigozhin DM, Monroe JG, Krasileva KV. High allelic diversity in Arabidopsis NLRs is associated with distinct genomic features. EMBO Rep 2024; 25:2306-2322. [PMID: 38528170 PMCID: PMC11093987 DOI: 10.1038/s44319-024-00122-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 03/07/2024] [Accepted: 03/08/2024] [Indexed: 03/27/2024] Open
Abstract
Plants rely on Nucleotide-binding, Leucine-rich repeat Receptors (NLRs) for pathogen recognition. Highly variable NLRs (hvNLRs) show remarkable intraspecies diversity, while their low-variability paralogs (non-hvNLRs) are conserved between ecotypes. At a population level, hvNLRs provide new pathogen-recognition specificities, but the association between allelic diversity and genomic and epigenomic features has not been established. Our investigation of NLRs in Arabidopsis Col-0 has revealed that hvNLRs show higher expression, less gene body cytosine methylation, and closer proximity to transposable elements than non-hvNLRs. hvNLRs show elevated synonymous and nonsynonymous nucleotide diversity and are in chromatin states associated with an increased probability of mutation. Diversifying selection maintains variability at a subset of codons of hvNLRs, while purifying selection maintains conservation at non-hvNLRs. How these features are established and maintained, and whether they contribute to the observed diversity of hvNLRs, is key to understanding the evolution of plant innate immune receptors.
Affiliation(s)
- Chandler A Sutherland
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, 94720, USA
| | - Daniil M Prigozhin
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - J Grey Monroe
- Department of Plant Sciences, University of California Davis, Davis, CA, 95616, USA
| | - Ksenia V Krasileva
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, 94720, USA.
| |
|
30
|
Choudhery S, DeJesus MA, Srinivasan A, Rock J, Schnappinger D, Ioerger TR. A dose-response model for statistical analysis of chemical genetic interactions in CRISPRi screens. PLoS Comput Biol 2024; 20:e1011408. [PMID: 38768228 PMCID: PMC11104602 DOI: 10.1371/journal.pcbi.1011408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 04/22/2024] [Indexed: 05/22/2024] Open
Abstract
An important application of CRISPR interference (CRISPRi) technology is for identifying chemical-genetic interactions (CGIs). Discovery of genes that interact with exposure to antibiotics can yield insights into drug targets and mechanisms of action or resistance. The objective is to identify CRISPRi mutants whose relative abundance is suppressed (or enriched) in the presence of a drug when the target protein is depleted, reflecting synergistic behavior. Different sgRNAs for a given target can induce a wide range of protein depletion and differential effects on growth rate. The effect of sgRNA strength can be partially predicted based on sequence features. However, the actual growth phenotype depends on the sensitivity of cells to depletion of the target protein. For essential genes, sgRNA efficiency can be empirically measured by quantifying effects on growth rate. We observe that the most efficient sgRNAs are not always optimal for detecting synergies with drugs. sgRNA efficiency interacts in a non-linear way with drug sensitivity, producing an effect where the concentration-dependence is maximized for sgRNAs of intermediate strength (and less so for sgRNAs that induce too much or too little target depletion). To capture this interaction, we propose a novel statistical method called CRISPRi-DR (for Dose-Response model) that incorporates both sgRNA efficiencies and drug concentrations in a modified dose-response equation. We use CRISPRi-DR to re-analyze data from a recent CGI experiment in Mycobacterium tuberculosis to identify genes that interact with antibiotics. This approach can be generalized to non-CGI datasets, which we show via a CRISPRi dataset for E. coli growth on different carbon sources. The performance is competitive with the best of several related analytical methods. However, for noisier datasets, some of these methods generate far more significant interactions, likely including many false positives, whereas CRISPRi-DR maintains higher precision, which we observed in both empirical and simulated data.
Affiliation(s)
- Sanjeevani Choudhery
  - Department of Computer Science and Engineering, Texas A&M University, College Station, Texas, United States of America
- Michael A. DeJesus
  - Laboratory of Host-Pathogen Biology, The Rockefeller University, New York, New York, United States of America
- Aarthi Srinivasan
  - Department of Computer Science and Engineering, Texas A&M University, College Station, Texas, United States of America
- Jeremy Rock
  - Laboratory of Host-Pathogen Biology, The Rockefeller University, New York, New York, United States of America
- Dirk Schnappinger
  - Department of Microbiology and Immunology, Weill Cornell Medical College, New York, New York, United States of America
- Thomas R. Ioerger
  - Department of Computer Science and Engineering, Texas A&M University, College Station, Texas, United States of America
31
Liu Z, Ouyang T, Yang Y, Sheng Y, Shi H, Liu Q, Bai Y, Ge Q. The Impact of Blood Sample Processing on Ribonucleic Acid (RNA) Sequencing. Genes (Basel) 2024; 15:502. [PMID: 38674435 PMCID: PMC11050547 DOI: 10.3390/genes15040502] [Received: 02/24/2024] [Revised: 04/12/2024] [Accepted: 04/13/2024] [Indexed: 04/28/2024] Open
Abstract
In gene quantification and expression analysis, issues with sample selection and processing can be serious, as they can easily introduce irrelevant variables and lead to ambiguous results. This study aims to investigate the extent and mechanism of the impact of sample selection and processing on ribonucleic acid (RNA) sequencing. RNA from PBMCs and blood samples was investigated in this study. The integrity of this RNA was measured under different storage times. All the samples underwent high-throughput sequencing for comprehensive evaluation. The differentially expressed genes and their potential functions were analyzed after the samples were placed at room temperature for 0 h, 4 h and 8 h, and different feature changes in these samples were also revealed. The sequencing results showed that the differences in gene expression were higher with an increased storage time, while the total number of genes detected did not change significantly. There were five genes showing gradient patterns over different storage times, all of which were protein-coding genes that had not been mentioned in previous studies. The effect of different storage times on seemingly the same samples was analyzed in the present study. This research, therefore, provides a theoretical basis for the long-term consideration of whether sample processing should be adequately addressed.
Affiliation(s)
- Qinyu Ge
  - State Key Laboratory of Digital Medical Engineering, Southeast University, Nanjing 211189, China; (Z.L.); (T.O.); (Y.Y.); (Y.S.); (H.S.); (Q.L.); (Y.B.)
32
Li H, Khang TF. SIEVE: One-stop differential expression, variability, and skewness analyses using RNA-Seq data. bioRxiv 2024:2024.04.09.588804. [PMID: 38645120 PMCID: PMC11030344 DOI: 10.1101/2024.04.09.588804] [Indexed: 04/23/2024]
Abstract
Motivation: RNA-Seq data analysis is commonly biased towards detecting differentially expressed genes and insufficiently conveys the complexity of gene expression changes between biological conditions. This bias arises because discrete models of RNA-Seq count data cannot fully characterize the mean, variance, and skewness of gene expression distribution using independent model parameters. A unified framework that simultaneously tests for differential expression, variability, and skewness is needed to realize the full potential of RNA-Seq data analysis in a systems biology context. Results: We present SIEVE, a statistical methodology that provides the desired unified framework. SIEVE embraces a compositional data analysis framework that transforms discrete RNA-Seq counts to a continuous form with a distribution that is well-fitted by a skew-normal distribution. Simulation results show that SIEVE controls the false discovery rate and probability of Type II error better than existing methods for differential expression analysis. Analysis of the Mayo RNA-Seq dataset for Alzheimer's disease using SIEVE reveals that a gene set with significant expression difference in mean, standard deviation and skewness between the control and the Alzheimer's disease group strongly predicts a subject's disease state. Furthermore, functional enrichment analysis shows that relying solely on differentially expressed genes detects only a segment of a much broader spectrum of biological aspects associated with Alzheimer's disease. The latter aspects can only be revealed using genes that show differential variability and skewness. Thus, SIEVE enables fresh perspectives for understanding the intricate changes in gene expression that occur in complex diseases. Availability: The SIEVE R package and source codes are available at https://github.com/Divo-Lee/SIEVE.
33
Brooks TG, Lahens NF, Mrčela A, Sarantopoulou D, Nayak S, Naik A, Sengupta S, Choi PS, Grant GR. BEERS2: RNA-Seq simulation through high fidelity in silico modeling. Brief Bioinform 2024; 25:bbae164. [PMID: 38605641 PMCID: PMC11009461 DOI: 10.1093/bib/bbae164] [Received: 07/12/2023] [Revised: 01/26/2024] [Accepted: 03/26/2024] [Indexed: 04/13/2024] Open
Abstract
Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length messenger RNA transcripts with polyA tails) from either customizable input or from CAMPAREE simulated RNA samples. It produces realistic reads of these transcripts as FASTQ, SAM or BAM formats with the SAM or BAM formats containing the true alignment to the reference genome. It also produces true transcript-level quantification values. BEERS2 combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline and is designed to include the effects of polyA selection and RiboZero for ribosomal depletion, hexamer priming sequence biases, GC-content biases in polymerase chain reaction (PCR) amplification, barcode read errors and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.
Affiliation(s)
- Thomas G Brooks
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Nicholas F Lahens
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Antonijo Mrčela
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Dimitra Sarantopoulou
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
  - Current address: National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Soumyashant Nayak
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
  - Current address: Statistics and Mathematics Unit, Indian Statistical Institute, Bengaluru, Karnataka, India
- Amruta Naik
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
  - Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Shaon Sengupta
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
  - Children’s Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Peter S Choi
  - Division of Cancer Pathobiology, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pathology & Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Gregory R Grant
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
34
Barra J, Taverna F, Bong F, Ahmed I, Karakach TK. Error modelled gene expression analysis (EMOGEA) provides a superior overview of time course RNA-seq measurements and low count gene expression. Brief Bioinform 2024; 25:bbae233. [PMID: 38770716 PMCID: PMC11106635 DOI: 10.1093/bib/bbae233] [Received: 12/21/2023] [Revised: 04/03/2024] [Accepted: 04/30/2024] [Indexed: 05/22/2024] Open
Abstract
Temporal RNA-sequencing (RNA-seq) studies of bulk samples provide an opportunity for improved understanding of gene regulation during dynamic phenomena such as development, tumor progression or response to an incremental dose of a pharmacotherapeutic. Moreover, single-cell RNA-seq (scRNA-seq) data implicitly exhibit temporal characteristics because gene expression values recapitulate dynamic processes such as cellular transitions. Unfortunately, temporal RNA-seq data continue to be analyzed by methods that ignore this ordinal structure and yield results that are often difficult to interpret. Here, we present Error Modelled Gene Expression Analysis (EMOGEA), a framework for analyzing RNA-seq data that incorporates measurement uncertainty, while introducing a special formulation for those acquired to monitor dynamic phenomena. This method is specifically suited for RNA-seq studies in which low-count transcripts with small-fold changes lead to significant biological effects. Such transcripts include genes involved in signaling and non-coding RNAs that inherently exhibit low levels of expression. Using simulation studies, we show that this framework down-weights samples that exhibit extreme responses such as batch effects allowing them to be modeled with the rest of the samples and maintain the degrees of freedom originally envisioned for a study. Using temporal experimental data, we demonstrate the framework by extracting a cascade of gene expression waves from a well-designed RNA-seq study of zebrafish embryogenesis and an scRNA-seq study of mouse pre-implantation and provide unique biological insights into the regulation of genes in each wave. For non-ordinal measurements, we show that EMOGEA has a much higher rate of true positive calls and a vanishingly small rate of false negative discoveries compared to common approaches. Finally, we provide two packages in Python and R that are self-contained and easy to use, including test data.
Affiliation(s)
- Jasmine Barra
  - Laboratory of Integrative Multi-Omics Research, Department of Pharmacology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 4R2, Canada
  - Beatrice Hunter Cancer Research Institute, 5743 University Avenue, Suite 98, Halifax, NS, B3H 0A2, Canada
  - Department of Microbiology & Immunology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 4R2, Canada
- Federico Taverna
  - Laboratory of Integrative Multi-Omics Research, Department of Pharmacology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 4R2, Canada
  - Beatrice Hunter Cancer Research Institute, 5743 University Avenue, Suite 98, Halifax, NS, B3H 0A2, Canada
- Fabian Bong
  - Laboratory of Integrative Multi-Omics Research, Department of Pharmacology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 4R2, Canada
  - Beatrice Hunter Cancer Research Institute, 5743 University Avenue, Suite 98, Halifax, NS, B3H 0A2, Canada
- Ibrahim Ahmed
  - Laboratory of Integrative Multi-Omics Research, Department of Pharmacology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 4R2, Canada
  - Beatrice Hunter Cancer Research Institute, 5743 University Avenue, Suite 98, Halifax, NS, B3H 0A2, Canada
- Tobias K Karakach
  - Laboratory of Integrative Multi-Omics Research, Department of Pharmacology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 4R2, Canada
  - Beatrice Hunter Cancer Research Institute, 5743 University Avenue, Suite 98, Halifax, NS, B3H 0A2, Canada
35
Liu X, Chen H, Li Z, Yang X, Jin W, Wang Y, Zheng J, Li L, Xuan C, Yuan J, Yang Y. InPACT: a computational method for accurate characterization of intronic polyadenylation from RNA sequencing data. Nat Commun 2024; 15:2583. [PMID: 38519498 PMCID: PMC10960005 DOI: 10.1038/s41467-024-46875-8] [Received: 06/12/2023] [Accepted: 03/12/2024] [Indexed: 03/25/2024] Open
Abstract
Alternative polyadenylation occurring in introns, termed intronic polyadenylation (IPA), has been implicated in diverse biological processes and diseases, as it can produce noncoding transcripts or transcripts with truncated coding regions. However, a reliable method is required to accurately characterize IPA. Here, we propose a computational method called InPACT, which allows for the precise characterization of IPA from conventional RNA-seq data. InPACT successfully identifies numerous previously unannotated IPA transcripts in human cells, many of which are translated, as evidenced by ribosome profiling data. We have demonstrated that InPACT outperforms other methods in terms of IPA identification and quantification. Moreover, InPACT applied to monocyte activation reveals temporally coordinated IPA events. Further application on single-cell RNA-seq data of human fetal bone marrow reveals the expression of several IPA isoforms in a context-specific manner. Therefore, InPACT represents a powerful tool for the accurate characterization of IPA from RNA-seq data.
Affiliation(s)
- Xiaochuan Liu
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Hao Chen
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Zekun Li
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Xiaoxiao Yang
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
  - Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Wen Jin
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
  - Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Yuting Wang
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
  - Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Jian Zheng
  - Department of Immunology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Long Li
  - Department of Immunology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Chenghao Xuan
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Jiapei Yuan
  - State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, 300020, China
  - Tianjin Institutes of Health Science, Tianjin, 301600, China
- Yang Yang
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Key Laboratory of Inflammatory Biology, The Second Hospital of Tianjin Medical University, Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
  - Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
36
Coxe T, Burks DJ, Singh U, Mittler R, Azad RK. Benchmarking RNA-Seq Aligners at Base-Level and Junction Base-Level Resolution Using the Arabidopsis thaliana Genome. Plants (Basel) 2024; 13:582. [PMID: 38475429 DOI: 10.3390/plants13050582] [Received: 01/17/2024] [Revised: 02/07/2024] [Accepted: 02/16/2024] [Indexed: 03/14/2024]
Abstract
The utmost goal of selecting an RNA-Seq alignment software is to perform accurate alignments with a robust algorithm, which is capable of detecting the various intricacies underlying read-mapping procedures and beyond. Most alignment software tools are typically pre-tuned with human or prokaryotic data, and therefore may not be suitable for applications to other organisms, such as plants. The rapidly growing plant RNA-Seq databases call for the assessment of the alignment tools on curated plant data, which will aid the calibration of these tools for applications to plant transcriptomic data. We therefore focused here on benchmarking RNA-Seq read alignment tools, using simulated data derived from the model organism Arabidopsis thaliana. We assessed the performance of five popular RNA-Seq alignment tools that are currently available, based on their usage (citation count). By introducing annotated single nucleotide polymorphisms (SNPs) from The Arabidopsis Information Resource (TAIR), we recorded alignment accuracy at both base-level and junction base-level resolutions for each alignment tool. In addition to assessing the performance of the alignment tools at their default settings, accuracies were also recorded by varying the values of numerous parameters, including the confidence threshold and the level of SNP introduction. The performances of the aligners were found consistent under various testing conditions at the base-level accuracy; however, the junction base-level assessment produced varying results depending upon the applied algorithm. At the read base-level assessment, the overall performance of the aligner STAR was superior to other aligners, with the overall accuracy reaching over 90% under different test conditions. On the other hand, at the junction base-level assessment, SubRead emerged as the most promising aligner, with an overall accuracy over 80% under most test conditions.
Affiliation(s)
- Tallon Coxe
  - Department of Biological Sciences and BioDiscovery Institute, College of Science, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203-5017, USA
- David J Burks
  - Department of Biological Sciences and BioDiscovery Institute, College of Science, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203-5017, USA
- Utkarsh Singh
  - Texas Academy of Mathematics and Science, University of North Texas, Denton, TX 76203, USA
- Ron Mittler
  - The Division of Plant Science and Technology, and Interdisciplinary Plant Group, College of Agriculture, Food and Natural Resources, Christopher S. Bond Life Sciences Center University of Missouri, 1201 Rollins St., Columbia, MO 65201, USA
  - Department of Surgery, University of Missouri School of Medicine, Columbia, MO 65212, USA
- Rajeev K Azad
  - Department of Biological Sciences and BioDiscovery Institute, College of Science, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203-5017, USA
  - Department of Mathematics, University of North Texas, Denton, TX 76203-5017, USA
37
Choudhery S, DeJesus MA, Srinivasan A, Rock J, Schnappinger D, Ioerger TR. A dose-response model for statistical analysis of chemical genetic interactions in CRISPRi screens. bioRxiv 2024:2023.08.03.551759. [PMID: 37577548 PMCID: PMC10418283 DOI: 10.1101/2023.08.03.551759] [Indexed: 08/15/2023]
Abstract
An important application of CRISPR interference (CRISPRi) technology is for identifying chemical-genetic interactions (CGIs). Discovery of genes that interact with exposure to antibiotics can yield insights into drug targets and mechanisms of action or resistance. The objective is to identify CRISPRi mutants whose relative abundance is suppressed (or enriched) in the presence of a drug when the target protein is depleted, reflecting synergistic behavior. Different sgRNAs for a given target can induce a wide range of protein depletion and differential effects on growth rate. The effect of sgRNA strength can be partially predicted based on sequence features. However, the actual growth phenotype depends on the sensitivity of cells to depletion of the target protein. For essential genes, sgRNA efficiency can be empirically measured by quantifying effects on growth rate. We observe that the most efficient sgRNAs are not always optimal for detecting synergies with drugs. sgRNA efficiency interacts in a non-linear way with drug sensitivity, producing an effect where the concentration-dependence is maximized for sgRNAs of intermediate strength (and less so for sgRNAs that induce too much or too little target depletion). To capture this interaction, we propose a novel statistical method called CRISPRi-DR (for Dose-Response model) that incorporates both sgRNA efficiencies and drug concentrations in a modified dose-response equation. We use CRISPRi-DR to re-analyze data from a recent CGI experiment in Mycobacterium tuberculosis to identify genes that interact with antibiotics. This approach can be generalized to non-CGI datasets, which we show via a CRISPRi dataset for E. coli growth on different carbon sources. The performance is competitive with the best of several related analytical methods. However, for noisier datasets, some of these methods generate far more significant interactions, likely including many false positives, whereas CRISPRi-DR maintains higher precision, which we observed in both empirical and simulated data.
Affiliation(s)
- Sanjeevani Choudhery
  - Department of Computer Science and Engineering, Texas A&M University, College Station, Texas, United States of America
- Michael A. DeJesus
  - Laboratory of Host-Pathogen Biology, The Rockefeller University, New York, New York, United States of America
- Aarthi Srinivasan
  - Department of Computer Science and Engineering, Texas A&M University, College Station, Texas, United States of America
- Jeremy Rock
  - Laboratory of Host-Pathogen Biology, The Rockefeller University, New York, New York, United States of America
- Dirk Schnappinger
  - Department of Microbiology and Immunology, Weill Cornell Medical College, New York, New York, United States of America
- Thomas R. Ioerger
  - Department of Computer Science and Engineering, Texas A&M University, College Station, Texas, United States of America
38
Zayakin P. sRNAflow: A Tool for the Analysis of Small RNA-Seq Data. Noncoding RNA 2024; 10:6. [PMID: 38250806 PMCID: PMC10801628 DOI: 10.3390/ncrna10010006] [Received: 11/21/2023] [Revised: 12/29/2023] [Accepted: 01/15/2024] [Indexed: 01/23/2024] Open
Abstract
The analysis of small RNA sequencing data across a range of biofluids is a significant research area, given the diversity of RNA types that hold potential diagnostic, prognostic, and predictive value. The intricate task of segregating the complex mixture of small RNAs from both human and other species, including bacteria, fungi, and viruses, poses one of the most formidable challenges in the analysis of small RNA sequencing data, currently lacking satisfactory solutions. This study introduces sRNAflow, a user-friendly bioinformatic tool with a web interface designed for the analysis of small RNAs obtained from biological fluids. Tailored to the unique requirements of such samples, the proposed pipeline addresses various challenges, including filtering potential RNAs from reagents and environment, classifying small RNA types, managing small RNA annotation overlap, conducting differential expression assays, analysing isomiRs, and presenting an approach to identify the sources of small RNAs within samples. sRNAflow also encompasses an alternative alignment-free analysis of RNA-seq data, featuring clustering and initial RNA source identification using BLAST. This comprehensive approach facilitates meaningful comparisons of results between different analytical methods.
Affiliation(s)
- Pawel Zayakin
  - Latvian Biomedical Research and Study Centre, LV-1067 Riga, Latvia
  - European Bioinformatics Institute, EMBL-EBI, Hinxton CB10 1SD, UK
39
Li G, Mahajan S, Ma S, Jeffery ED, Zhang X, Bhattacharjee A, Venkatasubramanian M, Weirauch MT, Miraldi ER, Grimes HL, Sheynkman GM, Tilburgs T, Salomonis N. Splicing neoantigen discovery with SNAF reveals shared targets for cancer immunotherapy. Sci Transl Med 2024; 16:eade2886. [PMID: 38232136 PMCID: PMC11517820 DOI: 10.1126/scitranslmed.ade2886] [Received: 08/05/2022] [Accepted: 12/13/2023] [Indexed: 01/19/2024]
Abstract
Immunotherapy has emerged as a crucial strategy to combat cancer by "reprogramming" a patient's own immune system. Although immunotherapy is typically reserved for patients with a high mutational burden, neoantigens produced from posttranscriptional regulation may provide an untapped reservoir of common immunogenic targets for new targeted therapies. To comprehensively define tumor-specific and likely immunogenic neoantigens from patient RNA-Seq, we developed Splicing Neo Antigen Finder (SNAF), an easy-to-use and open-source computational workflow to predict splicing-derived immunogenic MHC-bound peptides (T cell antigen) and unannotated transmembrane proteins with altered extracellular epitopes (B cell antigen). This workflow uses a highly accurate deep learning strategy for immunogenicity prediction (DeepImmuno) in conjunction with new algorithms to rank the tumor specificity of neoantigens (BayesTS) and to predict regulators of mis-splicing (RNA-SPRINT). T cell antigens from SNAF were frequently evidenced as HLA-presented peptides from mass spectrometry (MS) and predict response to immunotherapy in melanoma. Splicing neoantigen burden was attributed to coordinated splicing factor dysregulation. Shared splicing neoantigens were found in up to 90% of patients with melanoma, correlated to overall survival in multiple cancer cohorts, induced T cell reactivity, and were characterized by distinct cells of origin and amino acid preferences. In addition to T cell neoantigens, our B cell focused pipeline (SNAF-B) identified a new class of tumor-specific extracellular neoepitopes, which we termed ExNeoEpitopes. ExNeoEpitope full-length mRNA predictions were tumor specific and were validated using long-read isoform sequencing and in vitro transmembrane localization assays. Therefore, our systematic identification of splicing neoantigens revealed potential shared targets for therapy in heterogeneous cancers.
Affiliation(s)
- Guangyuan Li
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
  - Department of Biomedical Informatics, College of Medicine, University of Cincinnati, OH, 45267 USA
- Shweta Mahajan
  - Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA 45229
- Siyuan Ma
  - Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA 45229
- Erin D. Jeffery
  - Department of Molecular Physiology and Biological Physics, University of Virginia, VA 22903
- Xuan Zhang
  - Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA 45229
- Anukana Bhattacharjee
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Meenakshi Venkatasubramanian
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
  - Department of Computer Science, University of Cincinnati, Cincinnati, OH 45229
- Matthew T. Weirauch
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
  - Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital, Cincinnati, OH 45229
  - Division of Human Genetics, Cincinnati Children’s Hospital, Cincinnati, OH 45229
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229
- Emily R. Miraldi
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
  - Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA 45229
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229
- H. Leighton Grimes
  - Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA 45229
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229
- Gloria M. Sheynkman
  - Department of Molecular Physiology and Biological Physics, University of Virginia, VA 22903
- Tamara Tilburgs
  - Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA 45229
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229
- Nathan Salomonis
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
  - Department of Biomedical Informatics, College of Medicine, University of Cincinnati, OH, 45267 USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229
40
Mestre-Tomás J, Liu T, Pardo-Palacios F, Conesa A. SQANTI-SIM: a simulator of controlled transcript novelty for lrRNA-seq benchmark. Genome Biol 2023; 24:286. [PMID: 38082294 PMCID: PMC10712166 DOI: 10.1186/s13059-023-03127-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Accepted: 11/27/2023] [Indexed: 12/18/2023]
Abstract
Long-read RNA sequencing has emerged as a powerful tool for transcript discovery, even in well-annotated organisms. However, assessing the accuracy of different methods in identifying annotated and novel transcripts remains a challenge. Here, we present SQANTI-SIM, a versatile tool that wraps around popular long-read simulators to allow precise management of transcript novelty based on the structural categories defined by SQANTI3. By selectively excluding specific transcripts from the reference dataset, SQANTI-SIM effectively emulates scenarios involving unannotated transcripts. Furthermore, the tool provides customizable features and supports the simulation of additional types of data, representing the first multi-omics simulation tool for the lrRNA-seq field.
Affiliation(s)
- Jorge Mestre-Tomás
- Institute for Integrative Systems Biology, Spanish National Research Council, Catedrátic Agustín Escardino Benlloch, Paterna, 46980, Spain
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Camino de Vera, Valencia, 46022, Spain
| | - Tianyuan Liu
- Institute for Integrative Systems Biology, Spanish National Research Council, Catedrátic Agustín Escardino Benlloch, Paterna, 46980, Spain
| | - Francisco Pardo-Palacios
- Institute for Integrative Systems Biology, Spanish National Research Council, Catedrátic Agustín Escardino Benlloch, Paterna, 46980, Spain
| | - Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council, Catedrátic Agustín Escardino Benlloch, Paterna, 46980, Spain
41
Bendik J, Kalavacherla S, Webster N, Califano J, Fertig EJ, Ochs MF, Carter H, Guo T. OutSplice: A Novel Tool for the Identification of Tumor-Specific Alternative Splicing Events. BioMedInformatics 2023; 3:853-868. [PMID: 40236985 PMCID: PMC11997874 DOI: 10.3390/biomedinformatics3040053] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 04/17/2025]
Abstract
Protein variation that occurs during alternative splicing has been shown to play a major role in disease onset and oncogenesis. Due to this, we have developed OutSplice, a user-friendly algorithm to classify splicing outliers in tumor samples compared to a distribution of normal samples. Several tools have previously been developed to help uncover splicing events, each coming with varying methodologies, complexities, and features that can make it difficult for a new researcher to use or to determine which tool they should be using. Therefore, we benchmarked several algorithms to determine which may be best for a particular user's needs and demonstrate how OutSplice differs from these methodologies. We find that despite detecting a lower number of genes with significant aberrant events, OutSplice is able to identify those that are biologically impactful. Additionally, we identify 17 genes that contain significant splicing alterations in tumor tissue that were discovered across at least 5 of the tested algorithms, making them good candidates for future studies. Overall, researchers should consider a combined use of OutSplice with other splicing software to help provide additional validation for aberrant splicing events and to narrow down biologically relevant events.
Affiliation(s)
- Joseph Bendik
- Moores Cancer Center, University of California San Diego, San Diego, CA 92037, USA
| | - Sandhya Kalavacherla
- Moores Cancer Center, University of California San Diego, San Diego, CA 92037, USA
| | - Nicholas Webster
- Moores Cancer Center, University of California San Diego, San Diego, CA 92037, USA
| | - Joseph Califano
- Moores Cancer Center, University of California San Diego, San Diego, CA 92037, USA
- Gleiberman Head and Neck Cancer Center, University of California, San Diego, CA 92037, USA
- Department of Otolaryngology-Head and Neck Surgery, University of California San Diego, San Diego, CA 92037, USA
| | - Elana J. Fertig
- Quantitative Sciences Division and Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21224, USA
- Department of Oncology, Johns Hopkins University, Baltimore, MD 21224, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21224, USA
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21224, USA
| | - Michael F. Ochs
- Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ 08628, USA
| | - Hannah Carter
- Moores Cancer Center, University of California San Diego, San Diego, CA 92037, USA
- Division of Medical Genetics, Department of Medicine, University of California San Diego, San Diego, CA 92093, USA
| | - Theresa Guo
- Moores Cancer Center, University of California San Diego, San Diego, CA 92037, USA
- Gleiberman Head and Neck Cancer Center, University of California, San Diego, CA 92037, USA
- Department of Otolaryngology-Head and Neck Surgery, University of California San Diego, San Diego, CA 92037, USA
42
Sutherland CA, Prigozhin DM, Monroe JG, Krasileva KV. High intraspecies allelic diversity in Arabidopsis NLR immune receptors is associated with distinct genomic and epigenomic features. bioRxiv 2023:2023.01.12.523861. [PMID: 36711945 PMCID: PMC9882162 DOI: 10.1101/2023.01.12.523861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
Abstract
Plants rely on Nucleotide-binding, Leucine-rich repeat Receptors (NLRs) for pathogen recognition. Highly variable NLRs (hvNLRs) show remarkable intraspecies diversity, while their low variability paralogs (non-hvNLRs) are conserved between ecotypes. At a population level, hvNLRs provide new pathogen recognition specificities, but the association between allelic diversity and genomic and epigenomic features has not been established. Our investigation of NLRs in Arabidopsis Col-0 has revealed that hvNLRs show higher expression, less gene body cytosine methylation, and closer proximity to transposable elements than non-hvNLRs. hvNLRs show elevated synonymous and nonsynonymous nucleotide diversity and are in chromatin states associated with an increased probability of mutation. Diversifying selection maintains variability at a subset of codons of hvNLRs, while purifying selection maintains conservation at non-hvNLRs. How these features are established and maintained, and whether they contribute to the observed diversity of hvNLRs is key to understanding the evolution of plant innate immune receptors.
Affiliation(s)
- Chandler A Sutherland
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA 94720
| | - Daniil M Prigozhin
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA 94720
| | - J Grey Monroe
- Department of Plant Sciences, University of California Davis, Davis, CA, USA 95616
| | - Ksenia V Krasileva
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA 94720
43
Li H, Khang TF. clrDV: a differential variability test for RNA-Seq data based on the skew-normal distribution. PeerJ 2023; 11:e16126. [PMID: 37790621 PMCID: PMC10544356 DOI: 10.7717/peerj.16126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 08/27/2023] [Indexed: 10/05/2023]
Abstract
Background: Pathological conditions may result in certain genes having expression variance that differs markedly from that of the control. Finding such genes from gene expression data can provide invaluable candidates for therapeutic intervention. Under the dominant paradigm for modeling RNA-Seq gene counts using the negative binomial model, tests of differential variability are challenging to develop, owing to dependence of the variance on the mean. Methods: Here, we describe clrDV, a statistical method for detecting genes that show differential variability between two populations. We present the skew-normal distribution for modeling gene-wise null distribution of centered log-ratio transformation of compositional RNA-seq data. Results: Simulation results show that clrDV has false discovery rate and probability of Type II error that are on par with or superior to existing methodologies. In addition, its run time is faster than its closest competitors, and remains relatively constant for increasing sample size per group. Analysis of a large neurodegenerative disease RNA-Seq dataset using clrDV successfully recovers multiple gene candidates that have been reported to be associated with Alzheimer's disease.
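For readers unfamiliar with the centered log-ratio (clr) transformation named in this abstract, the following is a minimal sketch of the generic transform only, not the clrDV implementation. The function name `clr_transform`, the pseudocount of 0.5, and the toy count matrix are assumptions made for illustration; the abstract does not say how zeros are handled.

```python
import numpy as np


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of a genes x samples count matrix.

    Each sample (column) is treated as a composition. The pseudocount is
    an illustrative assumption to avoid log(0); it is not taken from clrDV.
    """
    counts = np.asarray(counts, dtype=float)
    log_counts = np.log(counts + pseudocount)
    # Subtract the per-sample mean of the logs, i.e. the log of the
    # geometric mean across genes, so every column is centered at zero.
    return log_counts - log_counts.mean(axis=0, keepdims=True)


# Hypothetical toy data: 3 genes (rows) x 2 samples (columns).
example_counts = np.array([[10, 20], [100, 50], [1000, 500]])
clr_values = clr_transform(example_counts)
```

By construction each column of `clr_values` sums to zero, which removes the dependence on sequencing depth; a gene-wise distribution (the skew-normal, in the abstract's case) would then be fitted to these transformed values.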
Affiliation(s)
- Hongxiang Li
- Institute of Mathematical Sciences, Universiti Malaya, Kuala Lumpur, Malaysia
| | - Tsung Fei Khang
- Institute of Mathematical Sciences, Universiti Malaya, Kuala Lumpur, Malaysia
- Universiti Malaya Centre for Data Analytics, Universiti Malaya, Kuala Lumpur, Malaysia
44
Zhang J, Zhang H, Ju Z, Peng Y, Pan Y, Xi W, Wei Y. JCcirc: circRNA full-length sequence assembly through integrated junction contigs. Brief Bioinform 2023; 24:bbad363. [PMID: 37833842 DOI: 10.1093/bib/bbad363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 09/04/2023] [Accepted: 09/20/2023] [Indexed: 10/15/2023]
Abstract
Recent studies have shed light on the potential of circular RNA (circRNA) as a biomarker for disease diagnosis and as a nucleic acid vaccine. The exploration of these functionalities requires correct circRNA full-length sequences; however, existing assembly tools can only correctly assemble some circRNAs, and their performance can be further improved. Here, we introduce a novel feature known as the junction contig (JC), which is an extension of the back-splice junction (BSJ). Leveraging the strengths of both BSJ and JC, we present a novel method called JCcirc (https://github.com/cbbzhang/JCcirc). It enables efficient reconstruction of all types of circRNA full-length sequences and their alternative isoforms using splice graphs and fragment coverage. Our findings demonstrate the superiority of JCcirc over existing methods on human simulation datasets: its average F1 score surpasses that of CircAST by 0.40 and those of both CIRI-full and circRNAfull by 0.13. For circRNAs below 400 bp, 400-800 bp, 800-1200 bp and above 1200 bp, the correct assembly rates are 0.13, 0.09, 0.04 and 0.03 higher, respectively, than those achieved by existing methods. Moreover, JCcirc also outperforms existing assembly tools on datasets from five other model species and on real sequencing datasets. These results show that JCcirc is a robust tool for accurately assembling circRNA full-length sequences, laying the foundation for the functional analysis of circRNAs.
Affiliation(s)
- Jingjing Zhang
- University of Chinese Academy of Sciences, Beijing, China
- Shenzhen Key Laboratory of Intelligent Bioinformatics & Center for High Performance Computing, Shenzhen Institute of Advanced Technology, CAS, Shenzhen, China
| | - Huiling Zhang
- College of Mathematics and Information, South China Agriculture University, Guangzhou, China
| | - Zhen Ju
- University of Chinese Academy of Sciences, Beijing, China
- Shenzhen Key Laboratory of Intelligent Bioinformatics & Center for High Performance Computing, Shenzhen Institute of Advanced Technology, CAS, Shenzhen, China
| | - Yin Peng
- Guangdong Key Laboratory for Genome Stability and Disease Prevention and Regional Immunity and Diseases, Department of Pathology, Shenzhen University School of Medicine, Shenzhen, China
| | - Yi Pan
- Shenzhen Key Laboratory of Intelligent Bioinformatics & Center for High Performance Computing, Shenzhen Institute of Advanced Technology, CAS, Shenzhen, China
| | - Wenhui Xi
- Shenzhen Key Laboratory of Intelligent Bioinformatics & Center for High Performance Computing, Shenzhen Institute of Advanced Technology, CAS, Shenzhen, China
| | - Yanjie Wei
- Shenzhen Key Laboratory of Intelligent Bioinformatics & Center for High Performance Computing, Shenzhen Institute of Advanced Technology, CAS, Shenzhen, China
45
Mestre-Tomás J, Liu T, Pardo-Palacios F, Conesa A. SQANTI-SIM: a simulator of controlled transcript novelty for lrRNA-seq benchmark. bioRxiv 2023:2023.08.23.554392. [PMID: 37662216 PMCID: PMC10473693 DOI: 10.1101/2023.08.23.554392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Long-read RNA-seq has emerged as a powerful tool for transcript discovery, even in well-annotated organisms. However, assessing the accuracy of different methods in identifying annotated and novel transcripts remains a challenge. Here, we present SQANTI-SIM, a versatile utility that wraps around popular long-read simulators to allow precise management of transcript novelty based on the structural categories defined by SQANTI3. By selectively excluding specific transcripts from the reference dataset, SQANTI-SIM effectively emulates scenarios involving unannotated transcripts. Furthermore, the tool provides customizable features and supports the simulation of additional types of data, representing the first multi-omics simulation tool for the lrRNA-seq field. We demonstrate the effectiveness of SQANTI-SIM by benchmarking five transcriptome reconstruction pipelines using the simulated data.
Affiliation(s)
- Jorge Mestre-Tomás
- Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
| | - Tianyuan Liu
- Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
| | - Francisco Pardo-Palacios
- Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
| | - Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
46
Wu EY, Singh NP, Choi K, Zakeri M, Vincent M, Churchill GA, Ackert-Bicknell CL, Patro R, Love MI. SEESAW: detecting isoform-level allelic imbalance accounting for inferential uncertainty. Genome Biol 2023; 24:165. [PMID: 37438847 PMCID: PMC10337143 DOI: 10.1186/s13059-023-03003-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/01/2022] [Accepted: 06/29/2023] [Indexed: 07/14/2023]
Abstract
Detecting allelic imbalance at the isoform level requires accounting for inferential uncertainty, caused by multi-mapping of RNA-seq reads. Our proposed method, SEESAW, uses Salmon and Swish to offer analysis at various levels of resolution, including gene, isoform, and aggregating isoforms to groups by transcription start site. The aggregation strategies strengthen the signal for transcripts with high uncertainty. The SEESAW suite of methods is shown to have higher power than other allelic imbalance methods when there is isoform-level allelic imbalance. We also introduce a new test for detecting imbalance that varies across a covariate, such as time.
Affiliation(s)
- Euphy Y Wu
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
| | - Noor P Singh
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Mohsen Zakeri
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Cheryl L Ackert-Bicknell
- Department of Orthopedics, School of Medicine, University of Colorado, Anschutz Campus, Aurora, CO, USA
| | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
47
Hoffmann M, Schwartz L, Ciora OA, Trummer N, Willruth LL, Jankowski J, Lee HK, Baumbach J, Furth PA, Hennighausen L, List M. circRNA-sponging: a pipeline for extensive analysis of circRNA expression and their role in miRNA sponging. Bioinformatics Advances 2023; 3:vbad093. [PMID: 37485422 PMCID: PMC10359604 DOI: 10.1093/bioadv/vbad093] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/12/2023] [Revised: 06/23/2023] [Accepted: 07/07/2023] [Indexed: 07/25/2023]
Abstract
Motivation: Circular RNAs (circRNAs) are long noncoding RNAs (lncRNAs) often associated with diseases and considered potential biomarkers for diagnosis and treatment. Among other functions, circRNAs have been shown to act as microRNA (miRNA) sponges, preventing the role of miRNAs that repress their targets. However, there is no pipeline to systematically assess the sponging potential of circRNAs. Results: We developed circRNA-sponging, a nextflow pipeline that (i) identifies circRNAs via backsplicing junctions detected in RNA-seq data, (ii) quantifies their expression values in relation to their linear counterparts spliced from the same gene, (iii) performs differential expression analysis, (iv) identifies and quantifies miRNA expression from miRNA-sequencing (miRNA-seq) data, (v) predicts miRNA binding sites on circRNAs, (vi) systematically investigates potential circRNA-miRNA sponging events, (vii) creates a network of competing endogenous RNAs and (viii) identifies potential circRNA biomarkers. We showed the functionality of the circRNA-sponging pipeline using RNA sequencing data from brain tissues, where we identified two distinct types of circRNAs characterized by a specific ratio of the number of the binding site to the length of the transcript. The circRNA-sponging pipeline is the first end-to-end pipeline to identify circRNAs and their sponging systematically with raw total RNA-seq and miRNA-seq files, allowing us to better indicate the functional impact of circRNAs as a routine aspect in transcriptomic research. Availability and implementation: https://github.com/biomedbigdata/circRNA-sponging. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Nico Trummer
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising D-85354, Germany
| | - Lina-Liv Willruth
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising D-85354, Germany
| | - Jakub Jankowski
- Laboratory of Genetics and Physiology, National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Hye Kyung Lee
- Laboratory of Genetics and Physiology, National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Jan Baumbach
- Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Computational BioMedicine Lab, University of Southern Denmark, Odense, Denmark
| | - Priscilla A Furth
- Laboratory of Genetics and Physiology, National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Departments of Oncology & Medicine, Georgetown University, Washington, DC, USA
| | - Lothar Hennighausen
- Institute for Advanced Study, Technical University of Munich, Garching D-85748, Germany
- Laboratory of Genetics and Physiology, National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Markus List
- To whom correspondence should be addressed.
48
Singh NP, Love MI, Patro R. TreeTerminus - creating transcript trees using inferential replicate counts. iScience 2023; 26:106961. [PMID: 37378336 PMCID: PMC10291472 DOI: 10.1016/j.isci.2023.106961] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/29/2023] [Revised: 04/18/2023] [Accepted: 05/22/2023] [Indexed: 06/29/2023]
Abstract
A certain degree of uncertainty is always associated with the transcript abundance estimates. The uncertainty may make many downstream analyses, such as differential testing, difficult for certain transcripts. Conversely, gene-level analysis, though less ambiguous, is often too coarse-grained. We introduce TreeTerminus, a data-driven approach for grouping transcripts into a tree structure where leaves represent individual transcripts and internal nodes represent an aggregation of a transcript set. TreeTerminus constructs trees such that, on average, the inferential uncertainty decreases as we ascend the tree topology. The tree provides the flexibility to analyze data at nodes that are at different levels of resolution in the tree and can be tuned depending on the analysis of interest. We evaluated TreeTerminus on two simulated and two experimental datasets and observed an improved performance compared to transcripts (leaves) and other methods under several different metrics.
Affiliation(s)
- Noor Pratap Singh
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Michael I. Love
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, MD, USA
49
Wajnberg G, Allain EP, Roy JW, Srivastava S, Saucier D, Morin P, Marrero A, O’Connell C, Ghosh A, Lewis SM, Ouellette RJ, Crapoulet N. Application of annotation-agnostic RNA sequencing data analysis tools for biomarker discovery in liquid biopsy. Frontiers in Bioinformatics 2023; 3:1127661. [PMID: 37252342 PMCID: PMC10213969 DOI: 10.3389/fbinf.2023.1127661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 04/17/2023] [Indexed: 05/31/2023]
Abstract
RNA sequencing analysis is an important field in the study of extracellular vesicles (EVs), as these particles contain a variety of RNA species that may have diagnostic, prognostic and predictive value. Many of the bioinformatics tools currently used to analyze EV cargo rely on third-party annotations. Recently, analysis of unannotated expressed RNAs has become of interest, since these may provide complementary information to traditional annotated biomarkers or may help refine biological signatures used in machine learning by including unknown regions. Here we perform a comparative analysis of annotation-free and classical read-summarization tools for the analysis of RNA sequencing data generated for EVs isolated from persons with amyotrophic lateral sclerosis (ALS) and healthy donors. Differential expression analysis and digital-droplet PCR validation of unannotated RNAs also confirmed their existence and demonstrated the usefulness of including such potential biomarkers in transcriptome analysis. We show that find-then-annotate methods perform similarly to standard tools for the analysis of known features, and can also identify unannotated expressed RNAs, two of which were validated as overexpressed in ALS samples. We demonstrate that these tools can therefore be used for a stand-alone analysis or easily integrated into current workflows and may be useful for re-analysis as annotations can be integrated post hoc.
Affiliation(s)
- Eric P. Allain
- Atlantic Cancer Research Institute, Moncton, NB, Canada
- Department of Clinical Genetics, Vitalité Health Network, Dr. Georges-L.-Dumont University Hospital Centre, Moncton, NB, Canada
- Department of Chemistry and Biochemistry, Université de Moncton, Moncton, NB, Canada
- Beatrice Hunter Cancer Research Institute, Halifax, NS, Canada
| | - Jeremy W. Roy
- Atlantic Cancer Research Institute, Moncton, NB, Canada
- Beatrice Hunter Cancer Research Institute, Halifax, NS, Canada
| | - Daniel Saucier
- Department of Chemistry and Biochemistry, Université de Moncton, Moncton, NB, Canada
| | - Pier Morin
- Department of Chemistry and Biochemistry, Université de Moncton, Moncton, NB, Canada
| | - Alier Marrero
- Dr. Georges-L.-Dumont University Hospital Centre, Moncton, NB, Canada
| | - Anirban Ghosh
- Atlantic Cancer Research Institute, Moncton, NB, Canada
| | - Stephen M. Lewis
- Atlantic Cancer Research Institute, Moncton, NB, Canada
- Department of Chemistry and Biochemistry, Université de Moncton, Moncton, NB, Canada
- Beatrice Hunter Cancer Research Institute, Halifax, NS, Canada
| | - Rodney J. Ouellette
- Atlantic Cancer Research Institute, Moncton, NB, Canada
- Department of Chemistry and Biochemistry, Université de Moncton, Moncton, NB, Canada
- Beatrice Hunter Cancer Research Institute, Halifax, NS, Canada
- Dr. Georges-L.-Dumont University Hospital Centre, Moncton, NB, Canada
50
Brooks TG, Lahens NF, Mrčela A, Sarantopoulou D, Nayak S, Naik A, Sengupta S, Choi PS, Grant GR. BEERS2: RNA-Seq simulation through high fidelity in silico modeling. bioRxiv 2023:2023.04.21.537847. [PMID: 37162982 PMCID: PMC10168222 DOI: 10.1101/2023.04.21.537847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking, and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length mRNA transcripts with polyA tails) from either customizable input or from CAMPAREE simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM, or BAM format, with the SAM and BAM outputs containing the true alignment to the reference genome. It also produces true transcript-level quantification values. BEERS2 is designed to include the effects of polyA selection and RiboZero for ribosomal depletion, hexamer priming sequence biases, GC-content biases in PCR amplification, barcode read errors, and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.
Affiliation(s)
- Thomas G Brooks
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
| | - Nicholas F Lahens
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
| | - Antonijo Mrčela
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
| | - Dimitra Sarantopoulou
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Current address: National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
| | - Soumyashant Nayak
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Current address: Statistics and Mathematics Unit, Indian Statistical Institute, Bengaluru, Karnataka, India
| | - Amruta Naik
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Shaon Sengupta
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Peter S Choi
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology & Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Gregory R Grant
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA