1. Vasquez I, Soto-Davila M, Hossain A, Gnanagobal H, Hall JR, Santander J. Dual-seq transcriptomics of Aeromonas salmonicida infection in Atlantic salmon (Salmo salar) primary macrophages reveals lysosome and apoptosis impairments. Fish & Shellfish Immunology 2025; 162:110359. [PMID: 40262690 DOI: 10.1016/j.fsi.2025.110359] [Received: 11/25/2024] [Revised: 04/02/2025] [Accepted: 04/18/2025] [Indexed: 04/24/2025]
Abstract
A. salmonicida subsp. salmonicida is one of the oldest-known marine pathogens, causing furunculosis in freshwater and marine fish species. A. salmonicida causes septicemia and fish death due to systemic shock. Early stages of A. salmonicida infection, including intracellular macrophage infection, are not fully understood. Here, we conducted a dual RNA-seq study and functional analyses in Atlantic salmon primary macrophages infected with A. salmonicida to identify genes relevant to fish cellular immunity and A. salmonicida pathogenesis. At 1 h post-infection (hpi), A. salmonicida modulated the expression of genes associated with inflammation, fatty acid synthesis, and apoptosis. At 2 hpi, A. salmonicida hijacked pathways related to myeloid cell differentiation, cytoskeleton and actin filament organization, lysosome maturation, and apoptosis. In contrast, A. salmonicida upregulated genes encoding hemolysin, aerolysin, type IVa pili, and T3SS effectors. In conclusion, these results suggest that A. salmonicida induces endocytosis, impairs lysosome maturation, and reduces apoptosis.
Affiliation(s)
- Ignacio Vasquez
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences Memorial University of Newfoundland, NL, Canada.
- Manuel Soto-Davila
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences Memorial University of Newfoundland, NL, Canada
- Ahmed Hossain
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences Memorial University of Newfoundland, NL, Canada
- Hajarooba Gnanagobal
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences Memorial University of Newfoundland, NL, Canada
- Jennifer R Hall
- Aquatic Research Cluster, CREAIT Network, Ocean Sciences Centre, Memorial University of Newfoundland, 0 Marine Lab Road, St. John's, NL, A1C 5S7, Canada
- Javier Santander
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences Memorial University of Newfoundland, NL, Canada.
2. Zhao S, Macakova K, Sinson JC, Dai H, Rosenfeld J, Zapata GE, Li S, Ward PA, Wang C, Qu C, Maywald B, Lee B, Eng C, Liu P. Clinical validation of RNA sequencing for Mendelian disorder diagnostics. Am J Hum Genet 2025; 112:779-792. [PMID: 40043707 DOI: 10.1016/j.ajhg.2025.02.006] [Received: 08/19/2024] [Revised: 02/06/2025] [Accepted: 02/06/2025] [Indexed: 03/12/2025]
Abstract
Despite rapid advancements in clinical sequencing, over half of diagnostic evaluations still lack definitive results. RNA sequencing (RNA-seq) has shown promise in research settings for bridging this gap by providing essential functional data for accurate interpretation of diagnostic sequencing results. However, despite advanced research pipelines, clinical translation of diagnostic RNA-seq has not yet been realized. We have developed and validated a clinical diagnostic RNA-seq test for individuals with suspected genetic disorders who have existing or concurrent comprehensive DNA diagnostic testing. This diagnostic RNA-seq test processes RNA samples from fibroblasts or blood and derives clinical interpretations based on the analytical detection of outliers in gene expressions and splicing patterns. The clinical validation involves 130 samples, including 90 negative and 40 positive samples. We developed provisional expression and splicing benchmarks using short-read and long-read RNA-seq data from the GM24385 lymphoblastoid sample produced by the Genome in a Bottle Consortium. For clinical validation, we first established reference ranges for each gene and junction based on expression distributions from our control data. We then evaluated the clinical performance of our outlier-based pipeline using 40 positive samples with previously identified diagnostic findings from the Undiagnosed Diseases Network project. Our study provides a paradigm and necessary resources for independent laboratories to validate a clinical RNA-seq test.
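The outlier-based interpretation described in this abstract can be illustrated with a minimal sketch. It assumes a simple rule in which a gene is flagged when its value in a test sample falls outside a reference range built from control samples. The z-score cutoff of 3, the function names, and the example values are all hypothetical; the abstract does not state which statistics the validated test actually uses.

```python
import statistics


def reference_range(control_values, z_cutoff=3.0):
    """Return (low, high) bounds as mean +/- z_cutoff standard deviations."""
    mean = statistics.mean(control_values)
    sd = statistics.stdev(control_values)  # sample standard deviation
    return mean - z_cutoff * sd, mean + z_cutoff * sd


def is_expression_outlier(value, control_values, z_cutoff=3.0):
    """Flag a test-sample value that falls outside the control reference range."""
    low, high = reference_range(control_values, z_cutoff)
    return value < low or value > high


# Hypothetical log-scale expression values for one gene in five controls.
controls = [10.0, 10.2, 9.8, 10.1, 9.9]
```

In this toy setting, a test value of 7.5 would be flagged while 10.3 would not. The same pattern could be applied per splice junction, again as an assumption about how such a pipeline might be organized.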
Affiliation(s)
- Sen Zhao
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Genetics and Multiomics Laboratory, Baylor College of Medicine, Houston, TX 77030, USA
- Kristina Macakova
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Genetics and Multiomics Laboratory, Baylor College of Medicine, Houston, TX 77030, USA; Graduate Program in Diagnostic Genetics and Genomics, The University of Texas MD Anderson Cancer Center School of Health Professions, Houston, TX 77030, USA
- Jefferson C Sinson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Genetics and Multiomics Laboratory, Baylor College of Medicine, Houston, TX 77030, USA
- Hongzheng Dai
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Baylor Genetics, Houston, TX 77021, USA
- Jill Rosenfeld
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Gladys E Zapata
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Genetics and Multiomics Laboratory, Baylor College of Medicine, Houston, TX 77030, USA
- Shenglan Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Genetics and Multiomics Laboratory, Baylor College of Medicine, Houston, TX 77030, USA
- Patricia A Ward
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Genetics and Multiomics Laboratory, Baylor College of Medicine, Houston, TX 77030, USA
- Christiana Wang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Becky Maywald
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Genetics and Multiomics Laboratory, Baylor College of Medicine, Houston, TX 77030, USA
- Brendan Lee
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Genetics and Multiomics Laboratory, Baylor College of Medicine, Houston, TX 77030, USA
- Christine Eng
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Genetics and Multiomics Laboratory, Baylor College of Medicine, Houston, TX 77030, USA; Baylor Genetics, Houston, TX 77021, USA
- Pengfei Liu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Genetics and Multiomics Laboratory, Baylor College of Medicine, Houston, TX 77030, USA; Baylor Genetics, Houston, TX 77021, USA.
3. Rodrigues ABM, Passetti F, Guimarães ACR. Complementary Strategies to Identify Differentially Expressed Genes in the Choroid Plexus of Patients with Progressive Multiple Sclerosis. Neuroinformatics 2025; 23:10. [PMID: 39836313 DOI: 10.1007/s12021-024-09713-2] [Accepted: 12/28/2024] [Indexed: 01/22/2025]
Abstract
Multiple sclerosis (MS) is a neurological disease causing myelin and axon damage through inflammatory and autoimmune processes. Despite affecting millions worldwide, understanding of its genetic pathways remains limited. The choroid plexus (ChP) has been studied in neurodegenerative processes and diseases like MS due to its dysregulation, yet its role in MS pathophysiology remains unclear. Our work re-evaluates the ChP transcriptome in progressive MS patients and compares gene expression profiles using diverse methodological strategies. RNA-seq data from post-mortem brain tissue of patients and healthy controls (GEO: GSE137619) were used. After evaluation and quality control of these data, transcripts were mapped and quantified against the Homo sapiens reference transcriptome GRCh38/hg38 using three strategies to identify differentially expressed genes in progressive MS patients. Functional analysis of genes revealed their involvement in immune processes, cell adhesion and migration, hormonal actions, amino acid transport, chemokines, metals, and signaling pathways. Our findings can offer valuable insights for progressive MS therapies, suggesting that specific genes influence immune cell recruitment and potential ChP microenvironment changes. Combining complementary approaches maximizes literature coverage, facilitating a deeper understanding of the biological context in progressive MS.
Affiliation(s)
- Fabio Passetti
- Instituto Carlos Chagas - Fiocruz/Paraná, Curitiba, PR, Brazil
- Ana Carolina Ramos Guimarães
- Laboratory for Applied Genomics and Bioinnovations, Instituto Oswaldo Cruz - Fiocruz, Rio de Janeiro, RJ, Brazil.
4. Wang D, Gazzara MR, Jewell S, Wales-McGrath B, Brown CD, Choi PS, Barash Y. A Deep Dive into Statistical Modeling of RNA Splicing QTLs Reveals New Variants that Explain Neurodegenerative Disease. bioRxiv 2024:2024.09.01.610696. [PMID: 39282456 PMCID: PMC11398334 DOI: 10.1101/2024.09.01.610696] [Indexed: 09/22/2024]
Abstract
Genome-wide association studies (GWAS) have identified thousands of putative disease causing variants with unknown regulatory effects. Efforts to connect these variants with splicing quantitative trait loci (sQTLs) have provided functional insights, yet sQTLs reported by existing methods cannot explain many GWAS signals. We show current sQTL modeling approaches can be improved by considering alternative splicing representation, model calibration, and covariate integration. We then introduce MAJIQTL, a new pipeline for sQTL discovery. MAJIQTL includes two new statistical methods: a weighted multiple testing approach for sGene discovery and a model for sQTL effect size inference to improve variant prioritization. By applying MAJIQTL to GTEx, we find significantly more sGenes harboring sQTLs with functional significance. Notably, our analysis implicates the novel variant rs582283 in Alzheimer's disease. Using antisense oligonucleotides, we validate this variant's effect by blocking the implicated YBX3 binding site, leading to exon skipping in the gene MS4A3.
Affiliation(s)
- David Wang
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania
- Matthew R. Gazzara
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania
- San Jewell
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania
- Peter S. Choi
- Department of Pathology & Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania
- Division of Cancer Pathobiology, The Children’s Hospital of Philadelphia
- Yoseph Barash
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania
- Department of Computer and Information Sciences, School of Engineering, University of Pennsylvania
5. Kim CS, Cairns J, Quarantotti V, Kaczkowski B, Wang Y, Konings P, Zhang X. A statistical simulation model to guide the choices of analytical methods in arrayed CRISPR screen experiments. PLoS One 2024; 19:e0307445. [PMID: 39163294 PMCID: PMC11335118 DOI: 10.1371/journal.pone.0307445] [Received: 05/06/2024] [Accepted: 07/03/2024] [Indexed: 08/22/2024]
Abstract
An arrayed CRISPR screen is a high-throughput functional genomic screening method that typically uses 384-well plates, with different gene knockouts in different wells. Despite the variety of computational workflows available, there is currently no systematic way to identify a good workflow for arrayed CRISPR screening data analysis. To guide this choice, we developed a statistical simulation model that mimics the data-generating process of arrayed CRISPR screening experiments. Our model is flexible and can simulate the effects of various experimental factors on phenotypic readouts, such as the effect size of gene editing, as well as biological and technical variation. With two examples, we showed that the simulation model can assist in making a principled choice of normalization and hit-calling methods for arrayed CRISPR data analysis. This simulation model is implemented in an R package and can be downloaded from GitHub.
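The idea of simulating a plate and then testing a hit-calling rule against the known truth can be sketched as follows. This is not the authors' R package. It is a toy Python model that assumes additive Gaussian biological and technical noise, a fixed knockout effect in three hypothetical hit wells, and a robust z-score (median and MAD) as the hit-calling method. Every parameter value here is an illustrative assumption.

```python
import random
import statistics


def simulate_plate(n_wells=384, hit_wells=(5, 50, 200), effect=-6.0,
                   bio_sd=0.5, tech_sd=0.3, seed=0):
    """Simulate one readout per well: knockout effect plus two noise sources."""
    rng = random.Random(seed)
    hits = set(hit_wells)
    readouts = []
    for well in range(n_wells):
        signal = effect if well in hits else 0.0
        noise = rng.gauss(0.0, bio_sd) + rng.gauss(0.0, tech_sd)
        readouts.append(signal + noise)
    return readouts


def call_hits(readouts, z_cutoff=-5.0):
    """Return indices of wells whose robust z-score is below z_cutoff."""
    centre = statistics.median(readouts)
    mad = statistics.median([abs(x - centre) for x in readouts])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [i for i, x in enumerate(readouts) if (x - centre) / scale < z_cutoff]


plate = simulate_plate()
called = call_hits(plate)
```

Because the true hit wells are known in a simulation, `called` can be compared with `hit_wells` to score a workflow, and the effect size or noise levels can be varied to see where a given method starts to fail.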
Affiliation(s)
- Chang Sik Kim
- Data Sciences & Quantitative Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, England
- Jonathan Cairns
- Data Sciences & Quantitative Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, England
- Valentina Quarantotti
- Functional Genomics, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, England
- Bogumil Kaczkowski
- Data Sciences & Quantitative Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, England
- Yinhai Wang
- Data Sciences & Quantitative Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, England
- Peter Konings
- Data Sciences & Quantitative Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, England
- Xiang Zhang
- Data Sciences & Quantitative Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, England
6. Zhou J, Li Q, Deng X, Peng L, Sun J, Zhang Y, Du Y. Comprehensive analysis identifies ubiquitin ligase FBXO42 as a tumor-promoting factor in neuroblastoma. Sci Rep 2024; 14:18697. [PMID: 39134694 PMCID: PMC11319589 DOI: 10.1038/s41598-024-69760-2] [Received: 01/14/2024] [Accepted: 08/08/2024] [Indexed: 08/15/2024]
Abstract
Neuroblastoma, the deadliest solid tumor in children, exhibits alarming mortality rates, particularly among high-risk cases. To enhance survival rates, a more precise risk stratification for patients is imperative. Utilizing proteomic data from 34 cases with or without N-Myc amplification, we identified 28 differentially expressed ubiquitination-related proteins (URGs). From these, a prognostic signature comprising 6 URGs was constructed. A nomogram incorporating clinical-pathological parameters yielded impressive AUC values of 0.88, 0.93, and 0.95 at 1, 3, and 5 years, respectively. Functional experiments targeting the E3 ubiquitin ligase FBXO42, a component of the prognostic signature, revealed its TP53-dependent promotion of neuroblastoma cell proliferation. In conclusion, our ubiquitination-related prognostic model robustly predicts patient outcomes, guiding clinical decisions. Additionally, the newfound pro-proliferative role of FBXO42 offers a novel foundation for understanding the molecular mechanisms of neuroblastoma.
Affiliation(s)
- Jianwu Zhou
- Department of Pediatric Surgical Oncology, Children's Hospital of Chongqing Medical University; and the National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatrics, Chongqing, 400014, People's Republic of China
- Qijun Li
- Laboratory Animal Center, Chongqing Medical University, Chongqing, 400016, People's Republic of China
- Xiaobin Deng
- Department of Pediatric Surgical Oncology, Children's Hospital of Chongqing Medical University; and the National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatrics, Chongqing, 400014, People's Republic of China
- Liang Peng
- Department of Pediatric Surgical Oncology, Children's Hospital of Chongqing Medical University; and the National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatrics, Chongqing, 400014, People's Republic of China
- Jian Sun
- Department of Pediatric Surgical Oncology, Children's Hospital of Chongqing Medical University; and the National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatrics, Chongqing, 400014, People's Republic of China
- Yao Zhang
- Department of Pediatric Surgical Oncology, Children's Hospital of Chongqing Medical University; and the National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatrics, Chongqing, 400014, People's Republic of China
- Yifei Du
- Department of Pediatric Surgical Oncology, Children's Hospital of Chongqing Medical University; and the National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatrics, Chongqing, 400014, People's Republic of China.
7. Carels N. Assessing RNA-Seq Workflow Methodologies Using Shannon Entropy. Biology 2024; 13:482. [PMID: 39056677 PMCID: PMC11274087 DOI: 10.3390/biology13070482] [Received: 05/14/2024] [Revised: 06/20/2024] [Accepted: 06/27/2024] [Indexed: 07/28/2024]
Abstract
RNA-seq faces persistent challenges due to the expanding array of data processing workflows, none of which has yet been standardized. It is imperative to determine which method most effectively preserves biological facts. Here, we used Shannon entropy as a tool for depicting the biological status of a system. We assessed the measurement of Shannon entropy by several RNA-seq workflow approaches, such as DESeq2 and edgeR, and also by combining nine normalization methods with log2 fold change on paired samples of TCGA RNA-seq, representing datasets of 515 patients and spanning 12 different cancer types with 5-year overall survival rates ranging from 20% to 98%. Our analysis revealed that TPM, RLE, and TMM normalization, coupled with a threshold of log2 fold change ≥1 for identifying differentially expressed genes, yielded the best results. We propose that Shannon entropy can serve as an objective metric for refining the optimization of RNA-seq workflows and mRNA sequencing technologies.
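The two quantities named in this abstract, Shannon entropy and a log2 fold change threshold, can be written down in a few lines. The sketch below uses the standard definition H = -Σ p·log2(p) over a normalized set of values and selects genes with log2 fold change ≥ 1 between paired tumor and control values. What exactly the entropy is computed over in the paper is not stated in the abstract, so the inputs, the pseudocount, and the function names are assumptions.

```python
import math


def shannon_entropy(values):
    """Shannon entropy (in bits) of non-negative values normalized to sum to 1."""
    total = sum(values)
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


def upregulated_genes(tumor, control, min_log2fc=1.0, pseudocount=1.0):
    """Return genes whose tumor/control log2 fold change meets the threshold."""
    selected = []
    for gene in tumor:
        ratio = (tumor[gene] + pseudocount) / (control[gene] + pseudocount)
        if math.log2(ratio) >= min_log2fc:
            selected.append(gene)
    return selected


# Hypothetical normalized expression values for one paired sample.
tumor = {"A": 7.0, "B": 3.0, "C": 1.0}
control = {"A": 1.0, "B": 3.0, "C": 3.0}
```

With these toy values only gene A passes the threshold, and four equally weighted values give an entropy of exactly 2 bits, the maximum for four outcomes.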
Affiliation(s)
- Nicolas Carels
- Laboratory of Biological System Modeling, Center of Technological Development in Health (CDTS), Oswaldo Cruz Foundation (Fiocruz), Rio de Janeiro 21040-900, RJ, Brazil
8. Stadler M, Lukauskas S, Bartke T, Müller CL. asteRIa enables robust interaction modeling between chromatin modifications and epigenetic readers. Nucleic Acids Res 2024; 52:6129-6144. [PMID: 38752495 PMCID: PMC11194111 DOI: 10.1093/nar/gkae361] [Received: 01/17/2024] [Revised: 03/15/2024] [Accepted: 04/24/2024] [Indexed: 06/25/2024]
Abstract
Chromatin, the nucleoprotein complex consisting of DNA and histone proteins, plays a crucial role in regulating gene expression by controlling access to DNA. Chromatin modifications are key players in this regulation, as they help to orchestrate DNA transcription, replication, and repair. These modifications recruit epigenetic 'reader' proteins, which mediate downstream events. Most modifications occur in distinctive combinations within a nucleosome, suggesting that epigenetic information can be encoded in combinatorial chromatin modifications. A detailed understanding of how multiple modifications cooperate in recruiting such proteins has, however, remained largely elusive. Here, we integrate nucleosome affinity purification data with high-throughput quantitative proteomics and hierarchical interaction modeling to estimate combinatorial effects of chromatin modifications on protein recruitment. This is facilitated by the computational workflow asteRIa which combines hierarchical interaction modeling, stability-based model selection, and replicate-consistency checks for a stable estimation of Robust Interactions among chromatin modifications. asteRIa identifies several epigenetic reader candidates responding to specific interactions between chromatin modifications. For the polycomb protein CBX8, we independently validate our results using genome-wide ChIP-Seq and bisulphite sequencing datasets. We provide the first quantitative framework for identifying cooperative effects of chromatin modifications on protein binding.
Affiliation(s)
- Mara Stadler
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Department of Statistics, Ludwig-Maximilians-University Munich, 80539 Munich, Germany
- Saulius Lukauskas
- Institute of Functional Epigenetics, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Till Bartke
- Institute of Functional Epigenetics, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Christian L Müller
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Department of Statistics, Ludwig-Maximilians-University Munich, 80539 Munich, Germany
- Center for Computational Mathematics, Flatiron Institute, New York, NY 10010, USA
9. Brooks TG, Lahens NF, Mrčela A, Grant GR. Challenges and best practices in omics benchmarking. Nat Rev Genet 2024; 25:326-339. [PMID: 38216661 DOI: 10.1038/s41576-023-00679-6] [Accepted: 11/14/2023] [Indexed: 01/14/2024]
Abstract
Technological advances enabling massively parallel measurement of biological features - such as microarrays, high-throughput sequencing and mass spectrometry - have ushered in the omics era, now in its third decade. The resulting complex landscape of analytical methods has naturally fostered the growth of an omics benchmarking industry. Benchmarking refers to the process of objectively comparing and evaluating the performance of different computational or analytical techniques when processing and analysing large-scale biological data sets, such as transcriptomics, proteomics and metabolomics. With thousands of omics benchmarking studies published over the past 25 years, the field has matured to the point where the foundations of benchmarking have been established and well described. However, generating meaningful benchmarking data and properly evaluating performance in this complex domain remains challenging. In this Review, we highlight some common oversights and pitfalls in omics benchmarking. We also establish a methodology to bring the issues that can be addressed into focus and to be transparent about those that cannot: this takes the form of a spreadsheet template of guidelines for comprehensive reporting, intended to accompany publications. In addition, a survey of recent developments in benchmarking is provided as well as specific guidance for commonly encountered difficulties.
Affiliation(s)
- Thomas G Brooks
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Nicholas F Lahens
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Antonijo Mrčela
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Gregory R Grant
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA.
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA.
10. Einson J, Minaeva M, Rafi F, Lappalainen T. The impact of genetically controlled splicing on exon inclusion and protein structure. PLoS One 2024; 19:e0291960. [PMID: 38478511 PMCID: PMC10936842 DOI: 10.1371/journal.pone.0291960] [Received: 01/05/2023] [Accepted: 09/08/2023] [Indexed: 03/17/2024]
Abstract
Common variants affecting mRNA splicing are typically identified through splicing quantitative trait locus (sQTL) mapping and have been shown to be enriched for GWAS signals to a similar degree as eQTLs. However, the specific splicing changes induced by these variants have been difficult to characterize, making it more complicated to analyze the effect size and direction of sQTLs, and to determine downstream splicing effects on protein structure. In this study, we catalogue sQTLs using exon percent spliced in (PSI) scores as a quantitative phenotype. PSI is an interpretable metric for identifying exon skipping events and has some advantages over other methods for quantifying splicing from short-read RNA sequencing. In our set of sQTL variants, we find evidence of selective effects based on splicing effect size and effect direction, as well as exon symmetry. Additionally, we utilize AlphaFold2 to predict changes in protein structure associated with sQTLs overlapping GWAS traits, highlighting a potential new use case for this technology for interpreting genetic effects on traits and disorders.
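For readers unfamiliar with the PSI phenotype used here, a minimal sketch of the commonly used definition follows: the fraction of reads supporting exon inclusion out of all reads informative for that exon. The abstract does not give the exact read-counting or normalization scheme used in the study, so the function below, including its handling of exons with no informative reads, is an assumption.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Return PSI in [0, 1], or None when no informative reads are available."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # PSI is undefined without coverage
    return inclusion_reads / total


# Hypothetical read counts for one exon in one sample.
example_psi = percent_spliced_in(inclusion_reads=30, exclusion_reads=10)
```

A PSI near 1 means the exon is almost always included and a PSI near 0 means it is almost always skipped, which is what makes the direction and size of an sQTL effect on PSI straightforward to read.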
Affiliation(s)
- Jonah Einson
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, United States of America
- New York Genome Center, New York, NY, United States of America
- Mariia Minaeva
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Faiza Rafi
- New York Genome Center, New York, NY, United States of America
- Department of Biotechnology, The City College of New York, New York, NY, United States of America
- Tuuli Lappalainen
- New York Genome Center, New York, NY, United States of America
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, United States of America
11. Perelo LW, Gabernet G, Straub D, Nahnsen S. How tool combinations in different pipeline versions affect the outcome in RNA-seq analysis. NAR Genom Bioinform 2024; 6:lqae020. [PMID: 38456178 PMCID: PMC10919883 DOI: 10.1093/nargab/lqae020] [Received: 10/06/2023] [Revised: 01/07/2024] [Accepted: 02/12/2024] [Indexed: 03/09/2024]
Abstract
Data analysis tools are continuously changed and improved over time. To test how these changes influence the comparability between analyses, the outputs of different workflow options of the nf-core/rnaseq pipeline were compared. Five different pipeline settings (STAR+Salmon, STAR+RSEM, STAR+featureCounts, HISAT2+featureCounts, pseudoaligner Salmon) were run on three datasets (human, Arabidopsis, zebrafish) containing spike-ins of the External RNA Control Consortium (ERCC). Fold change ratios and differential expression of genes and spike-ins were used for comparative analyses of the different tool and version settings of the pipeline. An overlap of 85% in differential gene classification between pipelines was observed. Genes interpreted with a bias were mostly those present at lower concentration. The number of isoforms and exons per gene were also determinants. Previous pipeline versions using featureCounts showed a higher sensitivity to detect one-isoform genes like ERCC. To ensure data comparability in long-term analysis series, it is advisable either to stay with the pipeline version the series was initialized with or to run both versions during a transition period to ensure that the target genes are addressed the same way.
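One simple way to quantify agreement in differential gene classification between two pipeline settings is sketched below. It assumes each pipeline yields a per-gene label ("up", "down", or "ns") and reports the fraction of shared genes that receive the same label. The abstract does not define how its 85% overlap was computed, so this metric, the labels, and the gene names are hypothetical.

```python
def classification_overlap(calls_a, calls_b):
    """Fraction of genes present in both results that get the same DE label."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two results have no genes in common")
    agree = sum(1 for gene in shared if calls_a[gene] == calls_b[gene])
    return agree / len(shared)


# Hypothetical per-gene calls from two pipeline settings.
star_salmon = {"g1": "up", "g2": "down", "g3": "ns", "g4": "up"}
hisat2_featurecounts = {"g1": "up", "g2": "down", "g3": "up", "g4": "up"}

overlap = classification_overlap(star_salmon, hisat2_featurecounts)
```

Running the same comparison between an old and a new pipeline version during a transition period would give a concrete number for how much the classification of target genes shifts.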
Affiliation(s)
- Louisa Wessels Perelo
- Quantitative Biology Center (QBiC), University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
- Gisela Gabernet
- Quantitative Biology Center (QBiC), University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
- Daniel Straub
- Quantitative Biology Center (QBiC), University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
- Sven Nahnsen
- Quantitative Biology Center (QBiC), University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
- M3 Research Center, Faculty of Medicine, University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
- Department of Computer Science, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
- Cluster of Excellence iFIT (EXC 2180), Image-Guided and Functionally Instructed Tumor Therapies, University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
12. Tang M, Zhao G, Awais M, Gao X, Meng W, Lin J, Zhao B, Lai Z, Lin Y, Chen Y. Genome-Wide Identification and Expression Analysis Reveals the B3 Superfamily Involved in Embryogenesis and Hormone Responses in Dimocarpus longan Lour. Int J Mol Sci 2023; 25:127. [PMID: 38203301 PMCID: PMC10779397 DOI: 10.3390/ijms25010127] [Received: 10/25/2023] [Revised: 12/17/2023] [Accepted: 12/19/2023] [Indexed: 01/12/2024]
Abstract
B3 family transcription factors play an essential regulatory role in plant growth and development processes. This study performed a comprehensive analysis of the B3 family transcription factor in longan (Dimocarpus longan Lour.), and a total of 75 DlB3 genes were identified. DlB3 genes were unevenly distributed on the 15 chromosomes of longan. Based on the protein domain similarities and functional diversities, the DlB3 family was further clustered into four subgroups (ARF, RAV, LAV, and REM). Bioinformatics and comparative analyses of B3 superfamily expression were conducted in different light and with different temperatures and tissues, and early somatic embryogenesis (SE) revealed its specific expression profile and potential biological functions during longan early SE. The qRT-PCR results indicated that DlB3 family members played a crucial role in longan SE and zygotic embryo development. Exogenous treatments of 2,4-D (2,4-dichlorophenoxyacetic acid), NPA (N-1-naphthylphthalamic acid), and PP333 (paclobutrazol) could significantly inhibit the expression of the DlB3 family. Supplementary ABA (abscisic acid), IAA (indole-3-acetic acid), and GA3 (gibberellin) suppressed the expressions of DlLEC2, DlARF16, DlTEM1, DlVAL2, and DlREM40, but DlFUS3, DlARF5, and DlREM9 showed an opposite trend. Furthermore, subcellular localization indicated that DlLEC2 and DlFUS3 were located in the nucleus, suggesting that they played a role in the nucleus. Therefore, DlB3s might be involved in complex plant hormone signal transduction pathways during longan SE and zygotic embryo development.
Affiliation(s)
- Yuling Lin
- Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou 350002, China; (M.T.); (G.Z.); (M.A.); (X.G.); (W.M.); (J.L.); (B.Z.); (Z.L.)
| | - Yukun Chen
- Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou 350002, China; (M.T.); (G.Z.); (M.A.); (X.G.); (W.M.); (J.L.); (B.Z.); (Z.L.)
|
13
|
Xie Z, Chen C, Ma’ayan A. Dex-Benchmark: datasets and code to evaluate algorithms for transcriptomics data analysis. PeerJ 2023; 11:e16351. [PMID: 37953774 PMCID: PMC10638921 DOI: 10.7717/peerj.16351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 10/04/2023] [Indexed: 11/14/2023] Open
Abstract
Many tools and algorithms are available for analyzing transcriptomics data. These include algorithms for performing sequence alignment, data normalization and imputation, clustering, identifying differentially expressed genes, and performing gene set enrichment analysis. To make the best choice about which tools to use, objective benchmarks can be developed to compare the quality of different algorithms to extract biological knowledge maximally and accurately from these data. The Dexamethasone Benchmark (Dex-Benchmark) resource aims to fill this need by providing the community with datasets and code templates for benchmarking different gene expression analysis tools and algorithms. The resource provides access to a collection of curated RNA-seq, L1000, and ChIP-seq data from dexamethasone treatment as well as genetic perturbations of its known targets. In addition, the website provides Jupyter Notebooks that use these pre-processed curated datasets to demonstrate how to benchmark the different steps in gene expression analysis. By comparing two independent data sources and data types with some expected concordance, we can assess which tools and algorithms best recover such associations. To demonstrate the usefulness of the resource for discovering novel drug targets, we applied it to optimize data processing strategies for the chemical perturbations and CRISPR single gene knockouts from the L1000 transcriptomics data from the Library of Integrated Network Cellular Signatures (LINCS) program, with a focus on understudied proteins from the Illuminating the Druggable Genome (IDG) program. Overall, the Dex-Benchmark resource can be utilized to assess the quality of transcriptomics and other related bioinformatics data analysis workflows. The resource is available from: https://maayanlab.github.io/dex-benchmark.
Affiliation(s)
- Zhuorui Xie
- Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Clara Chen
- Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Avi Ma’ayan
- Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
|
14
|
Zhao D, Liu J, Yu T. Protocol for transcriptome assembly by the TransBorrow algorithm. Biol Methods Protoc 2023; 8:bpad028. [PMID: 38023349 PMCID: PMC10640700 DOI: 10.1093/biomethods/bpad028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 10/21/2023] [Accepted: 10/31/2023] [Indexed: 12/01/2023] Open
Abstract
High-throughput RNA-seq enables comprehensive analysis of the transcriptome for various purposes. However, this technology generally generates massive amounts of sequencing reads with a shorter read length. Consequently, fast, accurate, and flexible tools are needed for assembling raw RNA-seq data into full-length transcripts and quantifying their expression levels. In this protocol, we report TransBorrow, a novel transcriptome assembly software specifically designed for short RNA-seq reads. TransBorrow is employed in conjunction with a splice-aware alignment tool (e.g. Hisat2 and Star) and some other transcriptome assembly tools (e.g. StringTie, Cufflinks, and Scallop). The protocol encompasses all necessary steps, starting from downloading and processing raw sequencing data to assembling the full-length transcripts and quantifying their expressed abundances. The execution time of the protocol may vary depending on the sizes of processed datasets and computational platforms.
Affiliation(s)
- Dengyi Zhao
- School of Mathematics and Statistics, Shandong University, Weihai 264209, China
| | - Juntao Liu
- School of Mathematics and Statistics, Shandong University, Weihai 264209, China
| | - Ting Yu
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
|
15
|
Majidian S, Agustinho DP, Chin CS, Sedlazeck FJ, Mahmoud M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol 2023; 24:221. [PMID: 37798733 PMCID: PMC10552390 DOI: 10.1186/s13059-023-03061-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 11/28/2022] [Accepted: 09/18/2023] [Indexed: 10/07/2023] Open
Abstract
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future challenges. However, they depend on sequencing technology, reference genome, and available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and laborious manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.
Affiliation(s)
- Sina Majidian
- Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
| | - Medhat Mahmoud
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
|
16
|
Luo J, Wu X, Cheng Y, Chen G, Wang J, Song X. Expression quantitative trait locus studies in the era of single-cell omics. Front Genet 2023; 14:1182579. [PMID: 37284065 PMCID: PMC10239882 DOI: 10.3389/fgene.2023.1182579] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/10/2023] [Accepted: 04/26/2023] [Indexed: 06/08/2023] Open
Abstract
Genome-wide association studies have revealed that the regulation of gene expression bridges genetic variants and complex phenotypes. Profiling of the bulk transcriptome coupled with linkage analysis (expression quantitative trait locus (eQTL) mapping) has advanced our understanding of the relationship between genetic variants and gene regulation in the context of complex phenotypes. However, bulk transcriptomics has inherited limitations as the regulation of gene expression tends to be cell-type-specific. The advent of single-cell RNA-seq technology now enables the identification of the cell-type-specific regulation of gene expression through a single-cell eQTL (sc-eQTL). In this review, we first provide an overview of sc-eQTL studies, including data processing and the mapping procedure of the sc-eQTL. We then discuss the benefits and limitations of sc-eQTL analyses. Finally, we present an overview of the current and future applications of sc-eQTL discoveries.
Affiliation(s)
- Jie Luo
- State Key Laboratory for Managing Biotic and Chemical Threats to The Quality and Safety of Agro‐products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Xinyi Wu
- Institute of Vegetables, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Yuan Cheng
- Institute of Vegetables, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Guang Chen
- State Key Laboratory for Managing Biotic and Chemical Threats to The Quality and Safety of Agro‐products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Jian Wang
- State Key Laboratory for Managing Biotic and Chemical Threats to The Quality and Safety of Agro‐products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Xijiao Song
- State Key Laboratory for Managing Biotic and Chemical Threats to The Quality and Safety of Agro‐products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
|
17
|
Bhattacharyya M, Dhar R, Basu S, Das A, Reynolds DM, Dutta TK. Molecular evaluation of the metabolism of estrogenic di(2-ethylhexyl) phthalate in Mycolicibacterium sp. Microb Cell Fact 2023; 22:82. [PMID: 37101185 PMCID: PMC10134610 DOI: 10.1186/s12934-023-02096-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 04/12/2023] [Indexed: 04/28/2023] Open
Abstract
BACKGROUND: Di(2-ethylhexyl) phthalate (DEHP) is a widely detected plasticizer and a priority pollutant of utmost concern for its adverse impact on humans, wildlife and the environment. To eliminate such toxic burden, biological processes are the most promising ways to combat rampant environmental insults under eco-friendly conditions. The present study investigated the biochemical and molecular assessment of the catabolic potential of Mycolicibacterium sp. strain MBM in the assimilation of estrogenic DEHP. RESULTS: A detailed biochemical study revealed an initial hydrolytic pathway of degradation for DEHP followed by the assimilation of hydrolyzed phthalic acid and 2-ethylhexanol to TCA cycle intermediates. Besides the inducible nature of DEHP-catabolic enzymes, strain MBM can efficiently utilize various low- and high-molecular-weight phthalate diesters and can grow under moderately halotolerant conditions. Whole genome sequence analysis exhibited a genome size of 6.2 Mb with a GC content of 66.51% containing 6,878 coding sequences, including multiple genes, annotated as relevant to the catabolism of phthalic acid esters (PAEs). Substantiating the annotated genes through transcriptome assessment followed by RT-qPCR analysis, the possible roles of upregulated genes/gene clusters in the metabolism of DEHP were revealed, reinforcing the biochemical pathway of degradation at the molecular level. CONCLUSIONS: A detailed co-relation of biochemical, genomic, transcriptomic and RT-qPCR analyses highlights the PAE-degrading catabolic machineries in strain MBM. Further, due to functional attributes in the salinity range of both freshwater and seawater, strain MBM may find use as a suitable candidate in the bioremediation of PAEs.
Affiliation(s)
- Mousumi Bhattacharyya
- Department of Microbiology, Bose Institute, EN-80, Sector V, Salt Lake, Kolkata, West Bengal, 700091, India
| | - Rinita Dhar
- Department of Microbiology, Bose Institute, EN-80, Sector V, Salt Lake, Kolkata, West Bengal, 700091, India
| | - Suman Basu
- Department of Microbiology, Bose Institute, EN-80, Sector V, Salt Lake, Kolkata, West Bengal, 700091, India
| | - Avijit Das
- Department of Microbiology, Bose Institute, EN-80, Sector V, Salt Lake, Kolkata, West Bengal, 700091, India
| | - Darren M Reynolds
- Centre for Research in Biosciences, Department of Applied Sciences, University of the West of England, Bristol, BS16 1QY, UK
| | - Tapan K Dutta
- Department of Microbiology, Bose Institute, EN-80, Sector V, Salt Lake, Kolkata, West Bengal, 700091, India.
|
18
|
Shaw TI, Zhao B, Li Y, Wang H, Wang L, Manley B, Stewart PA, Karolak A. Multi-omics approach to identifying isoform variants as therapeutic targets in cancer patients. Front Oncol 2022; 12:1051487. [PMID: 36505834 PMCID: PMC9730332 DOI: 10.3389/fonc.2022.1051487] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/22/2022] [Accepted: 11/07/2022] [Indexed: 11/25/2022] Open
Abstract
Cancer-specific alternatively spliced events (ASE) play a role in cancer pathogenesis and can be targeted by immunotherapy, oligonucleotide therapy, and small molecule inhibition. However, identifying actionable ASE targets remains challenging due to the uncertainty of its protein product, structure impact, and proteoform (protein isoform) function. Here we argue that an integrated multi-omics profiling strategy can overcome these challenges, allowing us to mine this untapped source of targets for therapeutic development. In this review, we will provide an overview of current multi-omics strategies in characterizing ASEs by utilizing the transcriptome, proteome, and state-of-art algorithms for protein structure prediction. We will discuss limitations and knowledge gaps associated with each technology and informatics analytics. Finally, we will discuss future directions that will enable the full integration of multi-omics data for ASE target discovery.
Affiliation(s)
- Timothy I. Shaw
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Bi Zhao
- Department of Machine Learning, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Yuxin Li
- Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital, Memphis, TN, United States
| | - Hong Wang
- Center for Proteomics and Metabolomics, St. Jude Children’s Research Hospital, Memphis, TN, United States
| | - Liang Wang
- Department of Tumor Biology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Brandon Manley
- Department of Genitourinary Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Paul A. Stewart
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Aleksandra Karolak
- Department of Machine Learning, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
|
19
|
Chakraborty S, Hossain A, Cao T, Gnanagobal H, Segovia C, Hill S, Monk J, Porter J, Boyce D, Hall JR, Bindea G, Kumar S, Santander J. Multi-Organ Transcriptome Response of Lumpfish (Cyclopterus lumpus) to Aeromonas salmonicida Subspecies salmonicida Systemic Infection. Microorganisms 2022; 10:2113. [PMID: 36363710 PMCID: PMC9692985 DOI: 10.3390/microorganisms10112113] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/30/2022] [Revised: 10/17/2022] [Accepted: 10/21/2022] [Indexed: 09/10/2023] Open
Abstract
Lumpfish is utilized as a cleaner fish to biocontrol sealice infestations in Atlantic salmon farms. Aeromonas salmonicida, a Gram-negative facultative intracellular pathogen, is the causative agent of furunculosis in several fish species, including lumpfish. In this study, lumpfish were intraperitoneally injected with different doses of A. salmonicida to calculate the LD50. Samples of blood, head-kidney, spleen, and liver were collected at different time points to determine the infection kinetics. We determined that A. salmonicida LD50 is 10² CFU per dose. We found that the lumpfish head-kidney is the primary target organ of A. salmonicida. Triplicate biological samples were collected from head-kidney, spleen, and liver pre-infection and at 3- and 10-days post-infection for RNA-sequencing. The reference genome-guided transcriptome assembly resulted in 6246 differentially expressed genes. The de novo assembly resulted in 403,204 transcripts, which added 1307 novel genes not identified by the reference genome-guided transcriptome. Differential gene expression and gene ontology enrichment analyses suggested that A. salmonicida induces lethal infection in lumpfish by uncontrolled and detrimental blood coagulation, complement activation, inflammation, DNA damage, suppression of the adaptive immune system, and prevention of cytoskeleton formation.
Affiliation(s)
- Setu Chakraborty
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Ahmed Hossain
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Trung Cao
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Hajarooba Gnanagobal
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Cristopher Segovia
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Stephen Hill
- Cold-Ocean Deep-Sea Research Facility, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Jennifer Monk
- Dr. Joe Brown Aquatic Research Building, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Jillian Porter
- Dr. Joe Brown Aquatic Research Building, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Danny Boyce
- Dr. Joe Brown Aquatic Research Building, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Jennifer R. Hall
- Aquatic Research Cluster, CREAIT Network, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Gabriela Bindea
- INSERM, Laboratory of Integrative Cancer Immunology, 75006 Paris, France
- Equipe Labellisée Ligue Contre Le Cancer, 75013 Paris, France
- Centre de Recherche des Cordeliers, Sorbonne Université, Université de Paris, 75006 Paris, France
| | - Surendra Kumar
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Ocean Frontier Institute, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Javier Santander
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
|
20
|
Castaldi PJ, Abood A, Farber CR, Sheynkman GM. Bridging the splicing gap in human genetics with long-read RNA sequencing: finding the protein isoform drivers of disease. Hum Mol Genet 2022; 31:R123-R136. [PMID: 35960994 PMCID: PMC9585682 DOI: 10.1093/hmg/ddac196] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/07/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 02/04/2023] Open
Abstract
Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, the linkage of RNA isoforms to protein-level functions and comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.
Affiliation(s)
- Peter J Castaldi
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
| | - Abdullah Abood
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
| | - Charles R Farber
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
| | - Gloria M Sheynkman
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
- UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22903, USA
|
21
|
Queiroz AL, Dantas E, Ramsamooj S, Murthy A, Ahmed M, Zunica ERM, Liang RJ, Murphy J, Holman CD, Bare CJ, Ghahramani G, Wu Z, Cohen DE, Kirwan JP, Cantley LC, Axelrod CL, Goncalves MD. Blocking ActRIIB and restoring appetite reverses cachexia and improves survival in mice with lung cancer. Nat Commun 2022; 13:4633. [PMID: 35941104 PMCID: PMC9360437 DOI: 10.1038/s41467-022-32135-0] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 05/31/2021] [Accepted: 07/13/2022] [Indexed: 12/30/2022] Open
Abstract
Cancer cachexia is a common, debilitating condition with limited therapeutic options. Using an established mouse model of lung cancer, we find that cachexia is characterized by reduced food intake, spontaneous activity, and energy expenditure accompanied by muscle metabolic dysfunction and atrophy. We identify Activin A as a purported driver of cachexia and treat with ActRIIB-Fc, a decoy ligand for TGF-β/activin family members, together with anamorelin (Ana), a ghrelin receptor agonist, to reverse muscle dysfunction and anorexia, respectively. Ana effectively increases food intake but only the combination of drugs increases lean mass, restores spontaneous activity, and improves overall survival. These beneficial effects are limited to female mice and are dependent on ovarian function. In agreement, high expression of Activin A in human lung adenocarcinoma correlates with unfavorable prognosis only in female patients, despite similar expression levels in both sexes. This study suggests that multimodal, sex-specific, therapies are needed to reverse cachexia.
Affiliation(s)
- Andre Lima Queiroz
- Division of Endocrinology, Department of Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Ezequiel Dantas
- Division of Endocrinology, Department of Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Shakti Ramsamooj
- Division of Endocrinology, Department of Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Anirudh Murthy
- Division of Endocrinology, Department of Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Mujmmail Ahmed
- Division of Endocrinology, Department of Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Roger J Liang
- Division of Endocrinology, Department of Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Jessica Murphy
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY, 10065, USA
- Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
| | - Corey D Holman
- Division of Gastroenterology and Hepatology, Department of Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Curtis J Bare
- Division of Gastroenterology and Hepatology, Department of Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Gregory Ghahramani
- Weill Cornell Graduate School of Medical Sciences, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Zhidan Wu
- Internal Medicine Research Unit, Pfizer Global R&D, Cambridge, MA, USA
| | - David E Cohen
- Division of Gastroenterology and Hepatology, Department of Medicine, Weill Cornell Medicine, New York, NY, 10065, USA
| | - John P Kirwan
- Pennington Biomedical Research Center, Baton Rouge, LA, 70808, USA
| | - Lewis C Cantley
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY, 10065, USA
| | - Marcus D Goncalves
- Division of Endocrinology, Department of Medicine, Weill Cornell Medicine, New York, NY, 10065, USA.
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY, 10065, USA.
|
22
|
Glinos DA, Garborcauskas G, Hoffman P, Ehsan N, Jiang L, Gokden A, Dai X, Aguet F, Brown KL, Garimella K, Bowers T, Costello M, Ardlie K, Jian R, Tucker NR, Ellinor PT, Harrington ED, Tang H, Snyder M, Juul S, Mohammadi P, MacArthur DG, Lappalainen T, Cummings BB. Transcriptome variation in human tissues revealed by long-read sequencing. Nature 2022; 608:353-359. [PMID: 35922509 PMCID: PMC10337767 DOI: 10.1038/s41586-022-05035-y] [Citation(s) in RCA: 160] [Impact Index Per Article: 53.3] [Received: 02/25/2021] [Accepted: 06/28/2022] [Indexed: 12/12/2022]
Abstract
Regulation of transcript structure generates transcript diversity and plays an important role in human disease [1-7]. The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure [8-16]. In this Article, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from Genotype-Tissue Expression (GTEx) tissues and cell lines, complementing the GTEx resource. We identified just over 70,000 novel transcripts for annotated genes, and validated the protein expression of 10% of novel transcripts. We developed a new computational package, LORALS, to analyse the genetic effects of rare and common variants on the transcriptome by allele-specific analysis of long reads. We characterized allele-specific expression and transcript structure events, providing new insights into the specific transcript alterations caused by common and rare genetic variants and highlighting the resolution gained from long-read data. We were able to perturb the transcript structure upon knockdown of PTBP1, an RNA binding protein that mediates splicing, thereby finding genetic regulatory effects that are modified by the cellular environment. Finally, we used this dataset to enhance variant interpretation and study rare variants leading to aberrant splicing patterns.
Affiliation(s)
- Dafni A Glinos
- New York Genome Center, New York, NY, USA.
- Department of Systems Biology, Columbia University, New York, NY, USA.
| | - Garrett Garborcauskas
- Medical and Population Genetics Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Nava Ehsan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Lihua Jiang
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Kathleen L Brown
- New York Genome Center, New York, NY, USA
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Tera Bowers
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ruiqi Jian
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Nathan R Tucker
- Masonic Medical Research Institute, Utica, NY, USA
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Patrick T Ellinor
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Hua Tang
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Michael Snyder
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Sissel Juul
- Oxford Nanopore Technology, New York, NY, USA
| | - Pejman Mohammadi
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Scripps Research Translational Institute, La Jolla, CA, USA
| | - Daniel G MacArthur
- Medical and Population Genetics Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
| | - Tuuli Lappalainen
- New York Genome Center, New York, NY, USA.
- Department of Systems Biology, Columbia University, New York, NY, USA.
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden.
| | - Beryl B Cummings
- Medical and Population Genetics Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
23
|
Liu G, Yang G, Zhao G, Guo C, Zeng Y, Xue Y, Zeng F. Spatial transcriptomic profiling to identify mesoderm progenitors with precision genomic screening and functional confirmation. Cell Prolif 2022; 55:e13298. [PMID: 35906841 PMCID: PMC9528766 DOI: 10.1111/cpr.13298] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/19/2022] [Revised: 06/03/2022] [Accepted: 06/09/2022] [Indexed: 11/30/2022] Open
Abstract
Objectives: Mesoderm, derived from a new layer between epiblast and hypoblast during gastrulation, can differentiate into various tissues, including muscles, bones, kidneys, blood, and the urogenital system. However, systematic elucidation of mesoderm characteristics and specific markers remains a challenge. This study aims to screen and identify candidate genes important for mesoderm development. Materials and Methods: Cells originating from the three germ layers were obtained by laser capture microdissection, followed by microcellular RNA sequencing. Mesoderm-specific differentially expressed genes (DEGs) were identified using a combination of three bioinformatics pipelines. Expression of candidate mesoderm-specific genes was verified by real-time quantitative polymerase chain reaction analysis and immunohistochemistry. Functional analyses were verified by ESCs-EBs differentiation and colony-forming units (CFUs) assay. Results: A total of 1962 differentially expressed mesoderm genes were found, of which 50 were candidate mesoderm-specific DEGs that, according to GO analysis, mainly participate in somite development, formation of the primary germ layer, segmentation, mesoderm development, and the pattern specification process. Representative genes Cdh2, Cdh11, Jag1, T, Fn-1, and Pcdh7 were specifically expressed in mesoderm among the three germ layers. Pcdh7, a membrane-associated gene, has hematopoietic-relevant functions, as identified by ESCs-EBs differentiation and CFUs assay. Conclusions: Spatial transcriptomic profiling with multi-method analysis and confirmation revealed candidate mesoderm progenitors. This approach appears to be efficient and reliable and can be extended to screen and validate candidate genes in various cellular systems.
Affiliation(s)
- Guanghui Liu
- Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Guanheng Yang
- Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Guijun Zhao
- Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Chuanliang Guo
- Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yitao Zeng
- Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yan Xue
- Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Department of Histo-Embryology, Genetics and Developmental Biology, Shanghai Jiao Tong University School of Medicine, Shanghai, China; NHC Key Laboratory of Medical Embryogenesis and Developmental Molecular Biology, Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai, China
| | - Fanyi Zeng
- Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Department of Histo-Embryology, Genetics and Developmental Biology, Shanghai Jiao Tong University School of Medicine, Shanghai, China; NHC Key Laboratory of Medical Embryogenesis and Developmental Molecular Biology, Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai, China; School of Pharmacy, Macau University of Science and Technology, Macau, China
| |
|
24
|
Ringeling FR, Chakraborty S, Vissers C, Reiman D, Patel AM, Lee KH, Hong A, Park CW, Reska T, Gagneur J, Chang H, Spletter ML, Yoon KJ, Ming GL, Song H, Canzar S. Partitioning RNAs by length improves transcriptome reconstruction from short-read RNA-seq data. Nat Biotechnol 2022; 40:741-750. [PMID: 35013600 PMCID: PMC11332977 DOI: 10.1038/s41587-021-01136-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/09/2020] [Accepted: 10/26/2021] [Indexed: 02/06/2023]
Abstract
The accuracy of methods for assembling transcripts from short-read RNA sequencing data is limited by the lack of long-range information. Here we introduce Ladder-seq, an approach that separates transcripts according to their lengths before sequencing and uses the additional information to improve the quantification and assembly of transcripts. Using simulated data, we show that a kallisto algorithm extended to process Ladder-seq data quantifies transcripts of complex genes with substantially higher accuracy than conventional kallisto. For reference-based assembly, a tailored scheme based on the StringTie2 algorithm reconstructs a single transcript with 30.8% higher precision than its conventional counterpart and is more than 30% more sensitive for complex genes. For de novo assembly, a similar scheme based on the Trinity algorithm correctly assembles 78% more transcripts than conventional Trinity while improving precision by 78%. In experimental data, Ladder-seq reveals 40% more genes harboring isoform switches compared to conventional RNA sequencing and unveils widespread changes in isoform usage upon m6A depletion by Mettl14 knockout.
Affiliation(s)
| | | | - Caroline Vissers
- Department of Biochemistry & Biophysics, University of California, San Francisco, San Francisco, CA, USA
| | - Derek Reiman
- Department of Biomedical Engineering, University of Illinois at Chicago, Chicago, IL, USA
| | - Akshay M Patel
- Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Ki-Heon Lee
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Ari Hong
- Center for RNA Research, Institute for Basic Science (IBS), Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Chan-Woo Park
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Tim Reska
- Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Julien Gagneur
- Department of Informatics, Technical University of Munich, Garching, Germany
- Institute of Human Genetics, Technical University of Munich, Munich, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
| | - Hyeshik Chang
- Center for RNA Research, Institute for Basic Science (IBS), Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Maria L Spletter
- Biomedical Center, Department of Physiological Chemistry, Ludwig-Maximilians-Universität München, Martinsried-Planegg, Germany
| | - Ki-Jun Yoon
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Guo-Li Ming
- Department of Neuroscience and Mahoney Institute for Neurosciences, University of Pennsylvania, Philadelphia, PA, USA
| | - Hongjun Song
- Department of Neuroscience and Mahoney Institute for Neurosciences, University of Pennsylvania, Philadelphia, PA, USA
| | - Stefan Canzar
- Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany.
| |
|
25
|
Wang K, Patkar S, Lee JS, Gertz EM, Robinson W, Schischlik F, Crawford DR, Schäffer AA, Ruppin E. Deconvolving Clinically Relevant Cellular Immune Cross-talk from Bulk Gene Expression Using CODEFACS and LIRICS Stratifies Patients with Melanoma to Anti-PD-1 Therapy. Cancer Discov 2022; 12:1088-1105. [PMID: 34983745 PMCID: PMC8983586 DOI: 10.1158/2159-8290.cd-21-0887] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 07/11/2021] [Revised: 11/09/2021] [Accepted: 12/22/2021] [Indexed: 11/16/2022]
Abstract
The tumor microenvironment (TME) is a complex mixture of cell types whose interactions affect tumor growth and clinical outcome. To discover such interactions, we developed CODEFACS (COnfident DEconvolution For All Cell Subsets), a tool deconvolving cell type-specific gene expression in each sample from bulk expression, and LIRICS (Ligand-Receptor Interactions between Cell Subsets), a statistical framework prioritizing clinically relevant ligand-receptor interactions between cell types from the deconvolved data. We first demonstrate the superiority of CODEFACS versus the state-of-the-art deconvolution method CIBERSORTx. Second, analyzing The Cancer Genome Atlas, we uncover cell type-specific ligand-receptor interactions uniquely associated with mismatch-repair deficiency across different cancer types, providing additional insights into their enhanced sensitivity to anti-programmed cell death protein 1 (PD-1) therapy compared with other tumors with high neoantigen burden. Finally, we identify a subset of cell type-specific ligand-receptor interactions in the melanoma TME that stratify survival of patients receiving anti-PD-1 therapy better than some recently published bulk transcriptomics-based methods. SIGNIFICANCE This work presents two new computational methods that can deconvolve a large collection of bulk tumor gene expression profiles into their respective cell type-specific gene expression profiles and identify cell type-specific ligand-receptor interactions predictive of response to immune-checkpoint blockade therapy. This article is highlighted in the In This Issue feature, p. 873.
Affiliation(s)
- Kun Wang
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD
| | - Sushant Patkar
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD
- Department of Computer Science, University of Maryland, College Park, MD
| | - Joo Sang Lee
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD
- Department of Artificial Intelligence & Department of Precision Medicine, School of Medicine, Sungkyunkwan University, Suwon, Republic of Korea
| | - E. Michael Gertz
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD
| | - Welles Robinson
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD
- Department of Computer Science, University of Maryland, College Park, MD
| | - Fiorella Schischlik
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD
| | - David R. Crawford
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD
| | | | - Eytan Ruppin
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD
| |
|
26
|
Liu X, Zhao J, Xue L, Zhao T, Ding W, Han Y, Ye H. A comparison of transcriptome analysis methods with reference genome. BMC Genomics 2022; 23:232. [PMID: 35337265 PMCID: PMC8957167 DOI: 10.1186/s12864-022-08465-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 10/21/2021] [Accepted: 03/08/2022] [Indexed: 11/10/2022] Open
Abstract
Background: The application of RNA-seq technology has become more extensive, and the number of available analysis procedures has increased in recent years. Selecting an appropriate workflow has become an important issue for researchers in the field. Methods: Six popular analytical procedures (pipelines) were compared using four RNA-seq datasets from mouse, human, rat, and macaque. The gene expression value, fold change of gene expression, and statistical significance were evaluated to compare the similarities and differences among the six procedures. qRT-PCR was performed to validate the differentially expressed genes (DEGs) from all six procedures. Results: Cufflinks-Cuffdiff demands the most computing resources and Kallisto-Sleuth the least. Gene expression values, fold changes, and p and q values from differential expression (DE) analysis are highly correlated among procedures that use HTseq for quantification. For genes with medium expression abundance, the expression values determined by the different procedures were similar. Major differences in expression values come from genes with particularly high or low expression levels. HISAT2-StringTie-Ballgown is more sensitive to genes with low expression levels, while Kallisto-Sleuth may only be useful for evaluating genes with medium to high abundance. When the same thresholds for fold change and p value are chosen in DE analysis, StringTie-Ballgown produces the fewest DEGs, while HTseq-DESeq2, -edgeR, or -limma generally produces more DEGs. The performance of Cufflinks-Cuffdiff and Kallisto-Sleuth varies across datasets. For DEGs with medium expression levels, the biological verification rates were similar among all procedures. Conclusion: Results are highly correlated among RNA-seq analysis procedures that use HTseq for quantification. Differences in gene expression values mainly come from genes with particularly high or low expression levels. Moreover, biological validation rates of DEGs from all six procedures were similar for genes with medium expression levels. Investigators can choose analytical procedures according to their available computer resources, or according to whether genes of high or low expression levels are of interest. If computer resources are abundant, one can run multiple procedures and take the intersection of their results to obtain the most reliable DEGs, or combine their results to obtain a more comprehensive DE profile of the transcriptome. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08465-0.
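The closing recommendation of this abstract (intersect DEG lists from several procedures for reliability, or combine them for breadth) can be illustrated with a minimal Python sketch. The gene names and per-procedure DEG sets below are hypothetical placeholders, not results from the paper, and the helper names `consensus_degs` and `combined_degs` are introduced here purely for illustration.

```python
# Hypothetical DEG calls per procedure; gene names are placeholders,
# not data from the study.
deg_calls = {
    "HTseq-DESeq2": {"GeneA", "GeneB", "GeneC", "GeneD"},
    "StringTie-Ballgown": {"GeneA", "GeneB"},
    "Kallisto-Sleuth": {"GeneA", "GeneB", "GeneE"},
}


def consensus_degs(calls):
    """Genes called differentially expressed by every procedure (intersection)."""
    return set.intersection(*calls.values())


def combined_degs(calls):
    """Genes called differentially expressed by at least one procedure (union)."""
    return set.union(*calls.values())


print(sorted(consensus_degs(deg_calls)))
print(sorted(combined_degs(deg_calls)))
```

The intersection trades sensitivity for confidence, while the union does the opposite; which is preferable depends on the downstream use of the gene list.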
Affiliation(s)
- Xu Liu
- Department of Medical Genetics and Developmental Biology, School of Basic Medical Sciences, Capital Medical University, Beijing, China; Beijing Key Laboratory of Neural Regeneration and Repair, Capital Medical University, Beijing, China
| | - Jialu Zhao
- Department of Medical Genetics and Developmental Biology, School of Basic Medical Sciences, Capital Medical University, Beijing, China; Beijing Key Laboratory of Neural Regeneration and Repair, Capital Medical University, Beijing, China; Monogenic Disease Research Center for Neurological Disorders, Beijing Tiantan Hospital, Capital Medical University, Beijing, China; China National Clinical Research Center for Neurological Diseases, Beijing, China
| | - Liting Xue
- Department of Medical Genetics and Developmental Biology, School of Basic Medical Sciences, Capital Medical University, Beijing, China; Beijing Key Laboratory of Neural Regeneration and Repair, Capital Medical University, Beijing, China
| | - Tian Zhao
- Department of Medical Genetics and Developmental Biology, School of Basic Medical Sciences, Capital Medical University, Beijing, China; Beijing Key Laboratory of Neural Regeneration and Repair, Capital Medical University, Beijing, China
| | - Wei Ding
- Department of Medical Genetics and Developmental Biology, School of Basic Medical Sciences, Capital Medical University, Beijing, China
| | - Yuying Han
- Department of Medical Genetics and Developmental Biology, School of Basic Medical Sciences, Capital Medical University, Beijing, China; Beijing Key Laboratory of Neural Regeneration and Repair, Capital Medical University, Beijing, China.
| | - Haihong Ye
- Department of Medical Genetics and Developmental Biology, School of Basic Medical Sciences, Capital Medical University, Beijing, China; Beijing Key Laboratory of Neural Regeneration and Repair, Capital Medical University, Beijing, China.
| |
|
27
|
Functional annotation of regulatory elements in cattle genome reveals the roles of extracellular interaction and dynamic change of chromatin states in rumen development during weaning. Genomics 2022; 114:110296. [PMID: 35143887 DOI: 10.1016/j.ygeno.2022.110296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/06/2021] [Revised: 12/20/2021] [Accepted: 02/01/2022] [Indexed: 12/24/2022]
Abstract
We profiled landscapes of bovine regulatory elements and explored dynamic changes of chromatin states in rumen development during weaning. The regulatory elements (15 chromatin states) and their coordinated activities in cattle were defined through genome-wide profiling of four histone modifications, CTCF-binding, DNA accessibility, DNA methylation, and transcriptome in rumen epithelial tissues. Each chromatin state presented specific enrichment for sequence ontology, methylation, trait-associated variants, transcription, gene expression-associated variants, selection signatures, and evolutionarily conserved elements. During weaning, weak enhancers and flanking active transcriptional start sites (TSS) were the most dynamic chromatin states and occurred in tandem with significant variations in gene expression and DNA methylation, significantly associated with stature, production, and reproduction economic traits. By comparing with in vitro cultured epithelial cells and in vivo rumen tissues, we showed the commonness and uniqueness of these results, especially the roles of cell interactions and mitochondrial activities in tissue development.
|
28
|
Goll JB, Bosinger SE, Jensen TL, Walum H, Grimes T, Tharp GK, Natrajan MS, Blazevic A, Head RD, Gelber CE, Steenbergen KJ, Patel NB, Sanz P, Rouphael NG, Anderson EJ, Mulligan MJ, Hoft DF. The Vacc-SeqQC project: Benchmarking RNA-Seq for clinical vaccine studies. Front Immunol 2022; 13:1093242. [PMID: 36741404 PMCID: PMC9893923 DOI: 10.3389/fimmu.2022.1093242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 12/30/2022] [Indexed: 01/20/2023] Open
Abstract
Introduction: Over the last decade, the field of systems vaccinology has emerged, in which high-throughput transcriptomics and other omics assays are used to probe changes of the innate and adaptive immune system in response to vaccination. The goal of this study was to benchmark key technical and analytical parameters of RNA sequencing (RNA-seq) in the context of a multi-site, double-blind randomized vaccine clinical trial. Methods: We collected longitudinal peripheral blood mononuclear cell (PBMC) samples from 10 subjects before and after vaccination with a live attenuated Francisella tularensis vaccine and performed RNA-Seq at two different sites using aliquots from the same sample to generate two replicate datasets (5 time points for 50 samples each). We evaluated the impact of (i) filtering lowly-expressed genes, (ii) using external RNA controls, (iii) fold change and false discovery rate (FDR) filtering, (iv) read length, and (v) sequencing depth on differentially expressed gene (DEG) concordance between replicate datasets. Using synthetic mRNA spike-ins, we developed a method for empirically establishing minimal read-count thresholds for maintaining fold change accuracy on a per-experiment basis. We defined a reference PBMC transcriptome by pooling sequence data and established the impact of sequencing depth and gene filtering on transcriptome representation. Lastly, we modeled statistical power to detect DEGs for a range of sample sizes, effect sizes, and sequencing depths. Results and Discussion: Our results showed that (i) filtering lowly-expressed genes is recommended to improve fold-change accuracy and inter-site agreement, if possible guided by mRNA spike-ins; (ii) read length did not have a major impact on DEG detection; (iii) applying fold-change cutoffs for DEG detection reduced inter-set agreement and should be used with caution, if at all; (iv) reduction in sequencing depth had a minimal impact on statistical power but reduced the identifiable fraction of the PBMC transcriptome; and (v) after sample size, effect size (i.e., the magnitude of fold change) was the most important driver of statistical power to detect DEGs. The results from this study provide RNA sequencing benchmarks and guidelines for planning future similar vaccine studies.
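The first recommendation above, filtering lowly-expressed genes with a minimal read-count threshold, can be sketched in a few lines of Python. This is an illustration only, not the authors' method: the function name `filter_low_counts`, the thresholds `min_count=10` and `min_samples=2`, and the count table are all assumptions made for the example, whereas the study derives its thresholds empirically from mRNA spike-ins on a per-experiment basis.

```python
def filter_low_counts(counts, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples.

    Both thresholds are placeholder defaults chosen for illustration.
    """
    return {
        gene: values
        for gene, values in counts.items()
        if sum(v >= min_count for v in values) >= min_samples
    }


# Hypothetical raw read counts: gene -> counts across four samples.
counts = {
    "GeneA": [120, 98, 143, 110],
    "GeneB": [0, 3, 1, 0],
    "GeneC": [15, 2, 22, 9],
}

kept = filter_low_counts(counts)
print(sorted(kept))
```

Requiring the threshold in a minimum number of samples, rather than in all of them, keeps genes that are expressed in only one condition or time point.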
Affiliation(s)
- Johannes B Goll
- Department of Biomedical Data Science and Bioinformatics, The Emmes Company, LLC, Rockville, MD, United States
| | - Steven E Bosinger
- Division of Microbiology & Immunology, Emory National Primate Research Center, Emory University, Atlanta, GA, United States; Department of Pathology & Laboratory Medicine, School of Medicine, Emory University, Atlanta, GA, United States; Emory NPRC Genomics Core, Emory National Primate Research Center, Emory University, Atlanta, GA, United States; Emory Vaccine Center, Emory University School of Medicine, Atlanta, GA, United States
| | - Travis L Jensen
- Department of Biomedical Data Science and Bioinformatics, The Emmes Company, LLC, Rockville, MD, United States
| | - Hasse Walum
- Division of Microbiology & Immunology, Emory National Primate Research Center, Emory University, Atlanta, GA, United States
| | - Tyler Grimes
- Department of Biomedical Data Science and Bioinformatics, The Emmes Company, LLC, Rockville, MD, United States
| | - Gregory K Tharp
- Emory NPRC Genomics Core, Emory National Primate Research Center, Emory University, Atlanta, GA, United States
| | - Muktha S Natrajan
- Emory Vaccine Center, Emory University School of Medicine, Atlanta, GA, United States; Hope Clinic of the Emory Vaccine Center, Emory University, Atlanta, GA, United States
| | - Azra Blazevic
- Division of Infectious Diseases, Allergy, and Immunology, Department of Internal Medicine, Saint Louis University School of Medicine, St. Louis, MO, United States
| | - Richard D Head
- McDonnell Genome Institute, Washington University, St. Louis, MO, United States
| | - Casey E Gelber
- Department of Biomedical Data Science and Bioinformatics, The Emmes Company, LLC, Rockville, MD, United States
| | - Kristen J Steenbergen
- Department of Biomedical Data Science and Bioinformatics, The Emmes Company, LLC, Rockville, MD, United States
| | - Nirav B Patel
- Emory NPRC Genomics Core, Emory National Primate Research Center, Emory University, Atlanta, GA, United States
| | - Patrick Sanz
- Office of Biodefense, Research Resources and Translational Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD, United States
| | - Nadine G Rouphael
- Emory Vaccine Center, Emory University School of Medicine, Atlanta, GA, United States; Hope Clinic of the Emory Vaccine Center, Emory University, Atlanta, GA, United States; Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine, Emory University, Atlanta, GA, United States
| | - Evan J Anderson
- Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine, Emory University, Atlanta, GA, United States; Center for Childhood Infections and Vaccines (CCIV) of Children's Healthcare of Atlanta and Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, United States
| | - Mark J Mulligan
- Emory Vaccine Center, Emory University School of Medicine, Atlanta, GA, United States; Hope Clinic of the Emory Vaccine Center, Emory University, Atlanta, GA, United States; Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine, Emory University, Atlanta, GA, United States; New York University Vaccine Center, New York, NY, United States
| | - Daniel F Hoft
- Division of Infectious Diseases, Allergy, and Immunology, Department of Internal Medicine, Saint Louis University School of Medicine, St. Louis, MO, United States; Department of Molecular Microbiology & Immunology, Saint Louis University, St. Louis, MO, United States
| |
|
29
|
Burks DJ, Azad RK. RNA-Seq Data Analysis Pipeline for Plants: Transcriptome Assembly, Alignment, and Differential Expression Analysis. Methods Mol Biol 2022; 2396:47-60. [PMID: 34786675 DOI: 10.1007/978-1-0716-1822-6_5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/13/2023]
Abstract
In this chapter, we describe methods for analyzing RNA-Seq data, presented as a flow along a pipeline beginning with raw data from a sequencer and ending with an output of differentially expressed genes and their functional characterization. The first section covers de novo transcriptome assembly for organisms lacking reference genomes or for those interested in probing against the background of organism-specific transcriptomes assembled from RNA-Seq data. Section 2 covers both gene- and transcript-level quantifications, leading to the third and final section on differential expression analysis between two or more conditions. The pipeline starts with raw sequence reads, followed by quality assessment and preprocessing of the input data to ensure a robust estimate of the transcripts and their differential regulation. The preprocessed data can be inputted into the de novo transcriptome flow to assemble transcripts, functionally annotated using tools such as InterProScan or Blast2Go and then forwarded to differential expression analysis flow, or directly inputted into the differential expression analysis flow if a reference genome is available. An online repository containing sample data has also been made available, as well as custom Python scripts to modify the output of the programs within the pipeline for various downstream analyses.
Affiliation(s)
- David J Burks
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, USA
| | - Rajeev K Azad
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, USA.
- Department of Mathematics, University of North Texas, Denton, TX, USA.
| |
|
30
|
Ujifuku K, Morofuji Y, Masumoto H. RNA Sequencing Data Analysis on the Maser Platform and the Tag-Count Comparison Graphical User Interface. Methods Mol Biol 2022; 2535:157-170. [PMID: 35867230 DOI: 10.1007/978-1-0716-2513-2_13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
The RNA sequencing (RNA-seq) process that allows for comprehensive transcriptome analysis has become increasingly simple. Analysis and interpretation of RNA-seq output data are indispensable for research, but bioinformatics experts are not always available to assist. Currently, however, even a wet-lab specialist can perform the pipeline analysis of RNA-seq described in this chapter using the Maser platform and the Tag-Count Comparison Graphical User Interface (TCC-GUI). These are free of charge for scientific use.
Collapse
Affiliation(s)
- Kenta Ujifuku
- Department of Neurosurgery, Nagasaki University Graduate School of Biomedical Sciences, Nagasaki, Japan.
| | - Yoichi Morofuji
- Department of Neurosurgery, Nagasaki University Hospital, Nagasaki, Japan
| | - Hiroshi Masumoto
- Biomedical research support center, Nagasaki University School of Medicine, Nagasaki, Japan
| |
|
31
|
Mumtaz PT, Taban Q, Bhat B, Ahmad SM, Dar MA, Kashoo ZA, Ganie NA, Shah RA. Expression of lncRNAs in response to bacterial infections of goat mammary epithelial cells reveals insights into mammary gland diseases. Microb Pathog 2021; 162:105367. [PMID: 34963641 DOI: 10.1016/j.micpath.2021.105367] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/25/2021] [Revised: 12/16/2021] [Accepted: 12/17/2021] [Indexed: 10/19/2022]
Abstract
Mastitis, an inflammation of the mammary gland caused by microbial infection, is an economically significant and alarming disease for the dairy sector and policymakers. Transcriptomic and proteomic approaches have been widely employed to identify the underlying molecular mechanisms of bacterial infections in the mammary gland. Numerous differentially expressed mRNAs, miRNAs, and proteins, together with their associated signaling pathways, have been identified during bacterial infection, paving the way for analysis of their biological functions. Long noncoding RNAs (lncRNAs) are important regulators of multiple biological processes. However, little is known regarding their role in bacterial infection of mammary epithelial cells. Hence, RNA sequencing was performed on primary mammary epithelial cells (pMECs) infected with both gram-negative (E. coli) and gram-positive (S. aureus) bacteria. Using a stringent pipeline, 1957 known and 1175 novel lncRNAs were identified, of which 112 were differentially expressed in bacteria-challenged pMECs compared with the control. Additionally, potential targets of the lncRNAs were predicted in cis- and trans-configuration. KEGG analysis revealed that the differentially expressed lncRNAs were associated with at least 15 immune-related pathways. Our study therefore shows that bacterial challenge triggers the expression of lncRNAs associated with immune response and defense mechanisms in goat mammary epithelial cells.
Affiliation(s)
- Peerzada Tajamul Mumtaz
- Division of Animal Biotechnology, Faculty of Veterinary Sciences and Animal Husbandry, Shuhama, SKUAST-K, India; Department of Biochemistry, School of Life Sciences Jaipur National University, India
| | - Qamar Taban
- Division of Animal Biotechnology, Faculty of Veterinary Sciences and Animal Husbandry, Shuhama, SKUAST-K, India
| | - Basharat Bhat
- Division of Animal Biotechnology, Faculty of Veterinary Sciences and Animal Husbandry, Shuhama, SKUAST-K, India
| | - Syed Mudasir Ahmad
- Division of Animal Biotechnology, Faculty of Veterinary Sciences and Animal Husbandry, Shuhama, SKUAST-K, India.
| | - Mashooq Ahmad Dar
- Division of Animal Biotechnology, Faculty of Veterinary Sciences and Animal Husbandry, Shuhama, SKUAST-K, India
| | - Zahid Amin Kashoo
- Division of Veterinary Microbiology, Faculty of Veterinary Sciences and Animal Husbandry, Shuhama, SKUAST-K, India
| | - Nazir A Ganie
- Division of Animal Biotechnology, Faculty of Veterinary Sciences and Animal Husbandry, Shuhama, SKUAST-K, India
| | - Riaz Ahmad Shah
- Division of Animal Biotechnology, Faculty of Veterinary Sciences and Animal Husbandry, Shuhama, SKUAST-K, India
| |
|
32
|
Lahens NF, Brooks TG, Sarantopoulou D, Nayak S, Lawrence C, Mrčela A, Srinivasan A, Schug J, Hogenesch JB, Barash Y, Grant GR. CAMPAREE: a robust and configurable RNA expression simulator. BMC Genomics 2021; 22:692. [PMID: 34563123 PMCID: PMC8467241 DOI: 10.1186/s12864-021-07934-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/03/2021] [Accepted: 08/17/2021] [Indexed: 11/10/2022] Open
Abstract
Background: The accurate interpretation of RNA-Seq data presents a moving target as scientists continue to introduce new experimental techniques and analysis algorithms. Simulated datasets are an invaluable tool to accurately assess the performance of RNA-Seq analysis methods. However, existing RNA-Seq simulators focus on modeling the technical biases and artifacts of sequencing, rather than on simulating the original RNA samples. A first step in simulating RNA-Seq is to simulate RNA. Results: To fill this need, we developed the Configurable And Modular Program Allowing RNA Expression Emulation (CAMPAREE), a simulator using empirical data to simulate diploid RNA samples at the level of individual molecules. We demonstrated CAMPAREE’s use for generating idealized coverage plots from real data, and for adding the ability to generate allele-specific data to existing RNA-Seq simulators that do not natively support this feature. Conclusions: Separating input sample modeling from library preparation/sequencing offers added flexibility for both users and developers to mix-and-match different sample and sequencing simulators to suit their specific needs. Furthermore, the ability to maintain sample and sequencing simulators independently provides greater agility to incorporate new biological findings about transcriptomics and new developments in sequencing technologies. Additionally, by simulating at the level of individual molecules, CAMPAREE has the potential to model molecules transcribed from the same genes as a heterogeneous population of transcripts with different states of degradation and processing (splicing, editing, etc.). CAMPAREE was developed in Python, is open source, and freely available at https://github.com/itmat/CAMPAREE. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07934-2.
Affiliation(s)
- Nicholas F Lahens
- The Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Thomas G Brooks
- The Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Dimitra Sarantopoulou
- The Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Present address: National Institute on Aging, National Institutes of Health, Baltimore, Maryland, USA
| | - Soumyashant Nayak
- Statistics and Mathematics Unit, Indian Statistical Institute, Bengaluru, Karnataka, India
| | - Cris Lawrence
- The Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Antonijo Mrčela
- The Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Anand Srinivasan
- Perelman School of Medicine, Enterprise Research Applications and High Performance Computing, Penn Medicine Academic Computing Services, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Jonathan Schug
- The Institute for Diabetes, Obesity and Metabolism, The Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - John B Hogenesch
- Division of Human Genetics, Department of Pediatrics, Center for Chronobiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
| | - Yoseph Barash
- The Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Gregory R Grant
- The Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; The Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
| |
|
33
|
Creason A, Haan D, Dang K, Chiotti KE, Inkman M, Lamb A, Yu T, Hu Y, Norman TC, Buchanan A, van Baren MJ, Spangler R, Rollins MR, Spellman PT, Rozanov D, Zhang J, Maher CA, Caloian C, Watson JD, Uhrig S, Haas BJ, Jain M, Akeson M, Ahsen ME, Stolovitzky G, Guinney J, Boutros PC, Stuart JM, Ellrott K. A community challenge to evaluate RNA-seq, fusion detection, and isoform quantification methods for cancer discovery. Cell Syst 2021; 12:827-838.e5. [PMID: 34146471 PMCID: PMC8376800 DOI: 10.1016/j.cels.2021.05.021] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/01/2019] [Revised: 09/15/2020] [Accepted: 05/25/2021] [Indexed: 02/03/2023]
Abstract
The accurate identification and quantitation of RNA isoforms present in the cancer transcriptome is key for analyses ranging from the inference of the impacts of somatic variants to pathway analysis to biomarker development and subtype discovery. The ICGC-TCGA DREAM Somatic Mutation Calling in RNA (SMC-RNA) challenge was a crowd-sourced effort to benchmark methods for RNA isoform quantification and fusion detection from bulk cancer RNA sequencing (RNA-seq) data. It concluded in 2018 with a comparison of 77 fusion detection entries and 65 isoform quantification entries on 51 synthetic tumors and 32 cell lines with spiked-in fusion constructs. We report the entries used to build this benchmark, the leaderboard results, and the experimental features associated with the accurate prediction of RNA species. This challenge required submissions to be in the form of containerized workflows, meaning each of the entries described is easily reusable through CWL and Docker containers at https://github.com/SMC-RNA-challenge. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Allison Creason
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - David Haan
- Biomolecular Engineering and UC Santa Cruz Genome Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Kami E Chiotti
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - Matthew Inkman
- The Genome Institute, Washington University School of Medicine, 4444 Forest Park Avenue, St. Louis, MO 63110, USA
| | | | | | - Yin Hu
- Sage Bionetworks, Seattle, WA, USA
| | | | - Alex Buchanan
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - Marijke J van Baren
- Biomolecular Engineering and UC Santa Cruz Genome Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Ryan Spangler
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - M Rick Rollins
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - Paul T Spellman
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - Dmitri Rozanov
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
| | - Jin Zhang
- The Genome Institute, Washington University School of Medicine, 4444 Forest Park Avenue, St. Louis, MO 63110, USA
| | - Christopher A Maher
- The Genome Institute, Washington University School of Medicine, 4444 Forest Park Avenue, St. Louis, MO 63110, USA
| | - Cristian Caloian
- Computational Biology, Ontario Institute for Cancer Research, Toronto, Canada
| | - John D Watson
- Computational Biology, Ontario Institute for Cancer Research, Toronto, Canada
| | - Sebastian Uhrig
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ) and Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
| | - Brian J Haas
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Miten Jain
- Biomolecular Engineering and UC Santa Cruz Genome Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mark Akeson
- Biomolecular Engineering and UC Santa Cruz Genome Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mehmet Eren Ahsen
- Icahn School of Medicine at Mount Sinai, Department of Genetics and Genomic Sciences, One Gustave Levy Place, New York, NY 1498, USA
| | - Gustavo Stolovitzky
- Icahn School of Medicine at Mount Sinai, Department of Genetics and Genomic Sciences, One Gustave Levy Place, New York, NY 1498, USA; IBM T.J. Watson Research Center, 1101 Kitchawan Road, Route 134, Yorktown Heights, NY 10598, USA
| | | | - Paul C Boutros
- Computational Biology, Ontario Institute for Cancer Research, Toronto, Canada; Departments of Medical Biophysics and Pharmacology & Toxicology, University of Toronto, Toronto, Canada; Departments of Human Genetics and Urology, University of California, Los Angeles, Los Angeles, CA, USA
| | - Joshua M Stuart
- Biomolecular Engineering and UC Santa Cruz Genome Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Kyle Ellrott
- Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA.
| |
|
34
|
Singh N. Role of mammalian long non-coding RNAs in normal and neuro oncological disorders. Genomics 2021; 113:3250-3273. [PMID: 34302945 DOI: 10.1016/j.ygeno.2021.07.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/24/2021] [Revised: 07/10/2021] [Accepted: 07/14/2021] [Indexed: 12/09/2022]
Abstract
Long non-coding RNAs (lncRNAs) are expressed at lower levels than protein-coding genes but have a crucial role in gene regulation. LncRNAs are distinct: they are transcribed by RNA polymerase II, and their functionality depends on subcellular localization. Depending on their niche, they specifically interact with DNA, RNA, and proteins to modify chromatin function, regulate transcription at various stages, and participate in the formation of nuclear condensation bodies and in nucleolar organization. LncRNAs may also change the stability and translation of cytoplasmic mRNAs and hamper signaling pathways. Thus, lncRNAs affect physio-pathological states and lead to the development of various disorders, immune responses, and cancer. To date, ~40% of lncRNAs have been reported in the nervous system (NS) and are involved in processes ranging from the early development/differentiation of the NS to synaptogenesis. LncRNA expression patterns in the most common adult and pediatric tumors suggest them as potential biomarkers and provide a rationale for targeting them pharmaceutically. Here, we discuss the mechanisms of lncRNA synthesis, localization, and functions in transcriptional, post-transcriptional, and other forms of gene regulation, methods of lncRNA identification, and their potential therapeutic applications in neuro oncological disorders as explained by molecular mechanisms in other malignant disorders.
Affiliation(s)
- Neetu Singh
- Molecular Biology Unit, Department of Centre for Advance Research, King George's Medical University, Lucknow, Uttar Pradesh 226 003, India.
| |
|
35
|
Cuomo ASE, Alvari G, Azodi CB, McCarthy DJ, Bonder MJ. Optimizing expression quantitative trait locus mapping workflows for single-cell studies. Genome Biol 2021; 22:188. [PMID: 34167583 PMCID: PMC8223300 DOI: 10.1186/s13059-021-02407-x] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 01/20/2021] [Accepted: 06/09/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) has enabled the unbiased, high-throughput quantification of gene expression specific to cell types and states. With the cost of scRNA-seq decreasing and techniques for sample multiplexing improving, population-scale scRNA-seq, and thus single-cell expression quantitative trait locus (sc-eQTL) mapping, is increasingly feasible. Mapping of sc-eQTL provides additional resolution to study the regulatory role of common genetic variants on gene expression across a plethora of cell types and states and promises to improve our understanding of genetic regulation across tissues in both health and disease. RESULTS: While previously established methods for bulk eQTL mapping can, in principle, be applied to sc-eQTL mapping, there are a number of open questions about how best to process scRNA-seq data and adapt bulk methods to optimize sc-eQTL mapping. Here, we evaluate the role of different normalization and aggregation strategies, covariate adjustment techniques, and multiple testing correction methods to establish best practice guidelines. We use both real and simulated datasets across single-cell technologies to systematically assess the impact of these different statistical approaches. CONCLUSION: We provide recommendations for future single-cell eQTL studies that can yield up to twice as many eQTL discoveries as default approaches ported from bulk studies.
Affiliation(s)
- Anna S E Cuomo
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK.
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
| | - Giordano Alvari
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Christina B Azodi
- St. Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
- University of Melbourne, Parkville, Victoria, Australia
| | - Davis J McCarthy
- St. Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia.
- University of Melbourne, Parkville, Victoria, Australia.
| | - Marc Jan Bonder
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.
| |
|
36
|
Hu Y, Fang L, Chen X, Zhong JF, Li M, Wang K. LIQA: long-read isoform quantification and analysis. Genome Biol 2021; 22:182. [PMID: 34140043 PMCID: PMC8212471 DOI: 10.1186/s13059-021-02399-8] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 09/18/2020] [Accepted: 06/04/2021] [Indexed: 11/10/2022] Open
Abstract
Long-read RNA sequencing (RNA-seq) technologies can sequence full-length transcripts, facilitating the exploration of isoform-specific gene expression over short-read RNA-seq. We present LIQA to quantify isoform expression and detect differential alternative splicing (DAS) events using long-read direct mRNA sequencing or cDNA sequencing data. LIQA incorporates base pair quality score and isoform-specific read length information in a survival model to assign different weights across reads, and uses an expectation-maximization algorithm for parameter estimation. We apply LIQA to long-read RNA-seq data from the Universal Human Reference, acute myeloid leukemia, and esophageal squamous epithelial cells and demonstrate its high accuracy in profiling alternative splicing events.
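The expectation-maximization step mentioned in this abstract can be illustrated with a minimal sketch. This is not LIQA's implementation: it omits the survival-model read weights, base quality scores, and read length information, and it simply assumes each read is compatible with a known set of isoforms. The function name `em_isoform_abundance` and the input format are hypothetical, chosen only for illustration.

```python
from collections import Counter


def em_isoform_abundance(read_compat, n_isoforms, n_iter=200, tol=1e-9):
    """Estimate relative isoform abundances from ambiguous read assignments.

    read_compat: list of tuples; each tuple holds the indices of the isoforms
                 a read is compatible with (hypothetical input format).
    Returns a list of abundances that sums to 1.
    """
    # Collapse identical compatibility classes so each is processed once;
    # reads compatible with no isoform are ignored.
    classes = Counter(tuple(sorted(set(c))) for c in read_compat if c)
    n_reads = sum(classes.values())
    if n_reads == 0:
        raise ValueError("no reads with at least one compatible isoform")

    theta = [1.0 / n_isoforms] * n_isoforms  # uniform starting point

    for _ in range(n_iter):
        expected = [0.0] * n_isoforms
        # E-step: split each read across its compatible isoforms in
        # proportion to the current abundance estimates.
        for isoforms, count in classes.items():
            total = sum(theta[i] for i in isoforms)
            for i in isoforms:
                expected[i] += count * theta[i] / total
        # M-step: the new abundances are the normalized expected counts.
        new_theta = [e / n_reads for e in expected]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


if __name__ == "__main__":
    # 30 reads unique to isoform 0, 10 unique to isoform 1, 20 ambiguous.
    reads = [(0,)] * 30 + [(1,)] * 10 + [(0, 1)] * 20
    print(em_isoform_abundance(reads, n_isoforms=2))
```

In this toy input the ambiguous reads are shared between the two isoforms according to the evidence from the uniquely assigned reads, so the estimates settle near 0.75 and 0.25. The sketch also ignores isoform length, which a real quantifier would need to account for.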
Affiliation(s)
- Yu Hu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Xuelian Chen
- Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
| | - Jiang F Zhong
- Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
| | - Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
|
37
|
Sarantopoulou D, Brooks TG, Nayak S, Mrčela A, Lahens NF, Grant GR. Comparative evaluation of full-length isoform quantification from RNA-Seq. BMC Bioinformatics 2021; 22:266. [PMID: 34034652 PMCID: PMC8145802 DOI: 10.1186/s12859-021-04198-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/29/2020] [Accepted: 05/16/2021] [Indexed: 11/18/2022] Open
Abstract
Background: Full-length isoform quantification from RNA-Seq is a key goal in transcriptomics analyses and has been an area of active development since the beginning. The fundamental difficulty stems from the fact that RNA transcripts are long, while RNA-Seq reads are short. Results: Here we use simulated benchmarking data that reflects many properties of real data, including polymorphisms, intron signal and non-uniform coverage, allowing for systematic comparative analyses of isoform quantification accuracy and its impact on differential expression analysis. Genome, transcriptome and pseudo alignment-based methods are included; and a simple approach is included as a baseline control. Conclusions: Salmon, kallisto, RSEM, and Cufflinks exhibit the highest accuracy on idealized data, while on more realistic data they do not perform dramatically better than the simple approach. We determine the structural parameters with the greatest impact on quantification accuracy to be length and sequence compression complexity and not so much the number of isoforms. The effect of incomplete annotation on performance is also investigated. Overall, the tested methods show sufficient divergence from the truth to suggest that full-length isoform quantification and isoform level DE should still be employed selectively. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04198-1.
Affiliation(s)
- Dimitra Sarantopoulou
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA; National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
| | - Thomas G Brooks
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
| | - Soumyashant Nayak
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
| | - Antonijo Mrčela
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
| | - Nicholas F Lahens
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
| | - Gregory R Grant
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA; Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA.
| |
|
38
|
Davies P, Jones M, Liu J, Hebenstreit D. Anti-bias training for (sc)RNA-seq: experimental and computational approaches to improve precision. Brief Bioinform 2021; 22:6265204. [PMID: 33959753 PMCID: PMC8574610 DOI: 10.1093/bib/bbab148] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/08/2021] [Revised: 03/10/2021] [Accepted: 03/26/2021] [Indexed: 12/29/2022] Open
Abstract
RNA-seq, including single cell RNA-seq (scRNA-seq), is plagued by insufficient sensitivity and lack of precision. As a result, the full potential of (sc)RNA-seq is limited. A major factor in this respect is the presence of global bias in most datasets, which affects detection and quantitation of RNA in a length-dependent fashion. In particular, scRNA-seq is affected by technical noise and a high rate of dropouts, where the vast majority of original transcripts are not converted into sequencing reads. We discuss the origins and implications of these biases, bioinformatics approaches to correct for them, and how biases can be exploited to infer characteristics of the sample preparation process, which in turn can be used to improve library preparation.
Affiliation(s)
- Philip Davies
- Daniel Hebenstreit's Research Group, University of Warwick, CV4 7AL Coventry, UK
| | - Matt Jones
- Daniel Hebenstreit's Research Group, University of Warwick, CV4 7AL Coventry, UK
| | - Juntai Liu
- Physics Department, University of Warwick, CV4 7AL Coventry, UK
| | | |
|
39
|
Overbey EG, Saravia-Butler AM, Zhang Z, Rathi KS, Fogle H, da Silveira WA, Barker RJ, Bass JJ, Beheshti A, Berrios DC, Blaber EA, Cekanaviciute E, Costa HA, Davin LB, Fisch KM, Gebre SG, Geniza M, Gilbert R, Gilroy S, Hardiman G, Herranz R, Kidane YH, Kruse CP, Lee MD, Liefeld T, Lewis NG, McDonald JT, Meller R, Mishra T, Perera IY, Ray S, Reinsch SS, Rosenthal SB, Strong M, Szewczyk NJ, Tahimic CG, Taylor DM, Vandenbrink JP, Villacampa A, Weging S, Wolverton C, Wyatt SE, Zea L, Costes SV, Galazka JM. NASA GeneLab RNA-seq consensus pipeline: standardized processing of short-read RNA-seq data. iScience 2021; 24:102361. [PMID: 33870146 PMCID: PMC8044432 DOI: 10.1016/j.isci.2021.102361] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 09/08/2020] [Revised: 10/30/2020] [Accepted: 03/23/2021] [Indexed: 12/15/2022] Open
Abstract
With the development of transcriptomic technologies, we are able to quantify precise changes in gene expression profiles from astronauts and other organisms exposed to spaceflight. Members of NASA GeneLab and GeneLab-associated analysis working groups (AWGs) have developed a consensus pipeline for analyzing short-read RNA-sequencing data from spaceflight-associated experiments. The pipeline includes quality control, read trimming, mapping, and gene quantification steps, culminating in the detection of differentially expressed genes. This data analysis pipeline and the results of its execution using data submitted to GeneLab are now all publicly available through the GeneLab database. We present here the full details and rationale for the construction of this pipeline in order to promote transparency, reproducibility, and reusability of pipeline data; to provide a template for data processing of future spaceflight-relevant datasets; and to encourage cross-analysis of data from other databases with the data available in GeneLab.
Affiliation(s)
- Eliah G. Overbey
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Amanda M. Saravia-Butler
- Logyx, LLC, Mountain View, CA 94043, USA
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Zhe Zhang
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Komal S. Rathi
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Homer Fogle
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- The Bionetics Corporation, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Willian A. da Silveira
- Institute for Global Food Security (IGFS) & School of Biological Sciences, Queen's University Belfast, Belfast, UK
| | - Richard J. Barker
- Department of Botany, University of Wisconsin, Madison, WI 53706, USA
| | - Joseph J. Bass
- MRC Versus Arthritis Centre for Musculoskeletal Ageing Research, Royal Derby Hospital, University of Nottingham & National Institute for Health Research Nottingham Biomedical Research Centre, Derby DE22 3DT, UK
| | - Afshin Beheshti
- KBR, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Daniel C. Berrios
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Elizabeth A. Blaber
- Center for Biotechnology and Interdisciplinary Studies, Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
| | - Egle Cekanaviciute
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Helio A. Costa
- Departments of Pathology, and of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Laurence B. Davin
- Institute of Biological Chemistry, Washington State University, Pullman, WA 99164, USA
| | - Kathleen M. Fisch
- Center for Computational Biology & Bioinformatics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Samrawit G. Gebre
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- KBR, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | | | - Rachel Gilbert
- NASA Postdoctoral Program, Universities Space Research Association, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Simon Gilroy
- Department of Botany, University of Wisconsin, Madison, WI 53706, USA
| | - Gary Hardiman
- Institute for Global Food Security (IGFS) & School of Biological Sciences, Queen's University Belfast, Belfast, UK
- Medical University of South Carolina, Charleston, SC, USA
| | - Raúl Herranz
- Centro de Investigaciones Biológicas Margarita Salas (CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
| | - Yared H. Kidane
- Center for Pediatric Bone Biology and Translational Research, Texas Scottish Rite Hospital for Children, 2222 Welborn St., Dallas, TX 75219, USA
| | - Colin P.S. Kruse
- Los Alamos National Laboratory, Bioscience Division, Los Alamos, NM 87545, USA
| | - Michael D. Lee
- Exobiology Branch, NASA Ames Research Center, Mountain View, CA 94035, USA
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
| | - Ted Liefeld
- Department of Medicine, University of California San Diego, San Diego, CA 92093, USA
| | - Norman G. Lewis
- Institute of Biological Chemistry, Washington State University, Pullman, WA 99164, USA
| | - J. Tyson McDonald
- Department of Radiation Medicine, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Robert Meller
- Department of Neurobiology and Pharmacology, Morehouse School of Medicine, Atlanta, GA 30310, USA
| | - Tejaswini Mishra
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Imara Y. Perera
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695, USA
| | - Shayoni Ray
- NGM Biopharmaceuticals, South San Francisco, CA 94080, USA
| | - Sigrid S. Reinsch
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Sara Brin Rosenthal
- Center for Computational Biology & Bioinformatics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Michael Strong
- National Jewish Health, Center for Genes, Environment, and Health, 1400 Jackson Street, Denver, CO 80206, USA
| | - Nathaniel J. Szewczyk
- Ohio Musculoskeletal and Neurological Institute and Department of Biomedical Sciences, Ohio University, Athens, OH 43147, USA
| | | | - Deanne M. Taylor
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia and the Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | | | - Alicia Villacampa
- Centro de Investigaciones Biológicas Margarita Salas (CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
| | - Silvio Weging
- Institute of Computer Science, Martin-Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, Halle 06120, Germany
| | - Chris Wolverton
- Department of Botany and Microbiology, Ohio Wesleyan University, Delaware, OH, USA
| | - Sarah E. Wyatt
- Department of Environmental and Plant Biology, Ohio University, Athens, OH 45701, USA
- Interdisciplinary Program in Molecular and Cellular Biology, Ohio University, Athens, OH 45701, USA
| | - Luis Zea
- BioServe Space Technologies, Aerospace Engineering Sciences Department, University of Colorado Boulder, Boulder 80303 USA
| | - Sylvain V. Costes
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Jonathan M. Galazka
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| |
|
40
|
Marete A, Ariel O, Ibeagha-Awemu E, Bissonnette N. Identification of Long Non-coding RNA Isolated From Naturally Infected Macrophages and Associated With Bovine Johne's Disease in Canadian Holstein Using a Combination of Neural Networks and Logistic Regression. Front Vet Sci 2021; 8:639053. [PMID: 33969037 PMCID: PMC8100051 DOI: 10.3389/fvets.2021.639053] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/08/2020] [Accepted: 02/15/2021] [Indexed: 01/15/2023] Open
Abstract
Mycobacterium avium ssp. paratuberculosis (MAP) causes chronic enteritis in most ruminants. The pathogen MAP causes Johne's disease (JD), a chronic, incurable, wasting disease. Weight loss, diarrhea, and a gradual drop in milk production characterize the disease's clinical phase, culminating in death. Several studies have characterized long non-coding RNA (lncRNA) in bovine tissues, and a previous study characterized lncRNA in macrophages infected with MAP in vitro. In this study, we aim to characterize the lncRNA in macrophages from cows naturally infected with MAP. From 15 herds, feces and blood samples were collected for each cow older than 24 months, twice yearly over 3–5 years. Paired samples were analyzed by fecal PCR and blood ELISA. We used RNA-seq data to study lncRNA in macrophages from 33 JD(+) and 33 JD(–) dairy cows. We performed RNA-seq analysis using the “new Tuxedo” suite. We characterized lncRNA using logistic regression and multilayered neural networks and used DESeq2 for differential expression analysis and Panther and Reactome classification systems for gene ontology (GO) analysis. The study identified 13,301 lncRNA, 605 of which were novel lncRNA. We found seven genes close to differentially expressed lncRNA, including CCDC174, ERI1, FZD1, TWSG1, ZBTB38, ZNF814, and ZSCAN4. None of the genes associated with susceptibility to JD have been cited in the literature. LncRNA target genes were significantly enriched for biological process GO terms involved in immunity and nucleic acid regulation. These include the MyD88 pathway (TLR5), GO:0043312 (neutrophil degranulation), GO:0002446 (neutrophil-mediated immunity), and GO:0042119 (neutrophil activation). These results identified lncRNA with potential roles in host immunity and potential candidate genes and pathways through which lncRNA might function in response to MAP infection.
Affiliation(s)
- Andrew Marete
- Agriculture and Agri-Food Canada, Sherbrooke Research and Development Centre, Sherbrooke, QC, Canada
| | - Olivier Ariel
- Agriculture and Agri-Food Canada, Sherbrooke Research and Development Centre, Sherbrooke, QC, Canada; Faculty of Science, Sherbrooke University, Sherbrooke, QC, Canada
| | - Eveline Ibeagha-Awemu
- Agriculture and Agri-Food Canada, Sherbrooke Research and Development Centre, Sherbrooke, QC, Canada
| | - Nathalie Bissonnette
- Agriculture and Agri-Food Canada, Sherbrooke Research and Development Centre, Sherbrooke, QC, Canada
| |
|
41
|
Li J, Wang Y, Wang L, Zhu J, Deng J, Tang R, Chen G. Integration of transcriptomic and proteomic analyses for finger millet [Eleusine coracana (L.) Gaertn.] in response to drought stress. PLoS One 2021; 16:e0247181. [PMID: 33596255 PMCID: PMC7888627 DOI: 10.1371/journal.pone.0247181] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/23/2020] [Accepted: 02/02/2021] [Indexed: 11/19/2022] Open
Abstract
Drought is one of the most significant abiotic stresses that affects the growth and productivity of crops worldwide. Finger millet [Eleusine coracana (L.) Gaertn.] is a C4 crop with high nutritional value and drought tolerance. However, the genetic mechanism of drought stress tolerance in finger millet is largely unknown. In this study, transcriptomic (RNA-seq) and proteomic (iTRAQ) technologies were combined to investigate finger millet samples treated with drought at different stages to determine the drought response mechanism. A total of 80,602 differentially expressed genes (DEGs) and 3,009 differentially expressed proteins (DEPs) were identified at the transcriptomic and proteomic levels, respectively. An integrated analysis, which combined transcriptome and proteome data, revealed that 1,305 DEPs matched the corresponding DEGs (named associated DEGs-DEPs) when comparing the control with samples treated with 19 days of drought (N1-N2 comparison group), 1,093 DEGs-DEPs between the control and samples that underwent rehydration treatment for 36 hours (N1-N3 comparison group), and 607 DEGs-DEPs between samples treated with drought for 19 days and samples that underwent rehydration treatment for 36 hours (N2-N3 comparison group). Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis identified 80 DEGs-DEPs in the N1-N2 comparison group, 49 DEGs-DEPs in the N1-N3 comparison group, and 59 DEGs-DEPs in the N2-N3 comparison group, which were associated with drought stress. The drought tolerance-related DEGs-DEPs were enriched in hydrolase activity, glycosyl bond formation, oxidoreductase activity, carbohydrate binding and biosynthesis of unsaturated fatty acids. Co-expression network analysis revealed two candidate DEGs-DEPs which were found to be centrally involved in drought stress response. These results suggested that the coordination of the DEGs-DEPs was essential to the enhanced drought tolerance response in the finger millet.
Affiliation(s)
- Jiguang Li
- Agricultural College, Hunan Agricultural University, Changsha, Hunan, China
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan, China
| | - Yanlan Wang
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan, China
| | - Liqun Wang
- Agricultural College, Hunan Agricultural University, Changsha, Hunan, China
| | - Jianyu Zhu
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan, China
| | - Jing Deng
- Agricultural College, Hunan Agricultural University, Changsha, Hunan, China
| | - Rui Tang
- Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan, China
- * E-mail: (RT); (GC)
| | - Guanghui Chen
- Agricultural College, Hunan Agricultural University, Changsha, Hunan, China
- * E-mail: (RT); (GC)
| |
|
42
|
Sastry AV, Hu A, Heckmann D, Poudel S, Kavvas E, Palsson BO. Independent component analysis recovers consistent regulatory signals from disparate datasets. PLoS Comput Biol 2021; 17:e1008647. [PMID: 33529205 PMCID: PMC7888660 DOI: 10.1371/journal.pcbi.1008647] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 06/10/2020] [Revised: 02/17/2021] [Accepted: 12/18/2020] [Indexed: 01/03/2023]
Abstract
The availability of bacterial transcriptomes has dramatically increased in recent years. This data deluge could result in detailed inference of underlying regulatory networks, but the diversity of experimental platforms and protocols introduces critical biases that could hinder scalable analysis of existing data. Here, we show that the underlying structure of the E. coli transcriptome, as determined by Independent Component Analysis (ICA), is conserved across multiple independent datasets, including both RNA-seq and microarray datasets. We subsequently combined five transcriptomics datasets into a large compendium containing over 800 expression profiles and discovered that its underlying ICA-based structure was still comparable to that of the individual datasets. With this understanding, we expanded our analysis to over 3,000 E. coli expression profiles and predicted three high-impact regulons that respond to oxidative stress, anaerobiosis, and antibiotic treatment. ICA thus enables deep analysis of disparate data to uncover new insights that were not visible in the individual datasets.
Cells adapt to diverse environments by regulating gene expression. Genome-wide measurements of gene expression levels have exponentially increased in recent years, but successful integration and analysis of these datasets are limited. Recently, we showed that independent component analysis (ICA), a signal deconvolution algorithm, can separate a large bacterial gene expression dataset into groups of co-regulated genes. This previous study focused on data generated by a standardized pipeline and did not address whether ICA extracts the same quantitative co-expression signals across expression profiling platforms. In this study, we show that ICA finds similar co-regulation patterns underlying multiple gene expression datasets and can be used as a tool to integrate and interpret diverse datasets. Using a dataset containing over 3,000 expression profiles, we predicted three new regulons and characterized their activities. Since large, standardized expression datasets only exist for a few bacterial strains, these results broaden the possible applications of this tool to better understand transcriptional regulation across a wide range of microbes.
Affiliation(s)
- Anand V. Sastry
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
| | - Alyssa Hu
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
| | - David Heckmann
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
| | - Saugat Poudel
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
| | - Erol Kavvas
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
| | - Bernhard O. Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
| |
|
43
|
Videm P, Kumar A, Zharkov O, Grüning BA, Backofen R. ChiRA: an integrated framework for chimeric read analysis from RNA-RNA interactome and RNA structurome data. Gigascience 2021; 10:giaa158. [PMID: 33511995 PMCID: PMC7844879 DOI: 10.1093/gigascience/giaa158] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/20/2020] [Revised: 11/26/2020] [Accepted: 12/15/2020] [Indexed: 12/15/2022]
Abstract
BACKGROUND With the advances in next-generation sequencing technologies, it is possible to determine RNA-RNA interaction and RNA structure predictions on a genome-wide level. The reads from these experiments usually are chimeric, with each arm generated from one of the interaction partners. Owing to short read lengths, often these sequenced arms ambiguously map to multiple locations. Thus, inferring the origin of these can be quite complicated. Here we present ChiRA, a generic framework for sensitive annotation of these chimeric reads, which in turn can be used to predict the sequenced hybrids. RESULTS Grouping reference loci on the basis of aligned common reads and quantification improved the handling of the multi-mapped reads in contrast to common strategies such as the selection of the longest hit or a random choice among all hits. On benchmark data ChiRA improved the number of correct alignments to the reference up to 3-fold. It is shown that the genes that belong to the common read loci share the same protein families or similar pathways. In published data, ChiRA could detect 3 times more new interactions compared to existing approaches. In addition, ChiRAViz can be used to visualize and filter large chimeric datasets intuitively. CONCLUSION ChiRA tool suite provides a complete analysis and visualization framework along with ready-to-use Galaxy workflows and tutorials for RNA-RNA interactome and structurome datasets. Common read loci built by ChiRA can rescue multi-mapped reads on paralogous genes without requiring any information on gene relations. We showed that ChiRA is sensitive in detecting new RNA-RNA interactions from published RNA-RNA interactome datasets.
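The idea of grouping reference loci by the reads they share can be illustrated with a small sketch. This is not ChiRA's implementation: the function name `group_loci_by_shared_reads`, the input format, and the rule that a single shared read is enough to merge two loci are all assumptions made for illustration.

```python
from collections import defaultdict


def group_loci_by_shared_reads(alignments):
    """Group loci that are linked by at least one multi-mapped read.

    alignments: iterable of (read_id, locus_id) pairs (hypothetical format).
    Returns a sorted list of sorted locus groups.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_locus_of_read = {}
    for read_id, locus in alignments:
        find(locus)  # register the locus even if no other read links it
        if read_id in first_locus_of_read:
            # The same read hits another locus: merge the two loci.
            union(first_locus_of_read[read_id], locus)
        else:
            first_locus_of_read[read_id] = locus

    groups = defaultdict(set)
    for locus in list(parent):
        groups[find(locus)].add(locus)
    return sorted(sorted(group) for group in groups.values())


example = [("r1", "A"), ("r1", "B"), ("r2", "B"), ("r2", "C"), ("r3", "D")]
print(group_loci_by_shared_reads(example))
```

In this toy input, loci A, B, and C end up in one group because reads r1 and r2 chain them together, while D stays alone. A real tool would likely also weigh how many reads support each link.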
Affiliation(s)
- Pavankumar Videm
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
| | - Anup Kumar
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
| | - Oleg Zharkov
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
| | - Björn Andreas Grüning
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
| | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany
| |
|
44
|
Caught between Two Genes: Accounting for Operonic Gene Structure Improves Prokaryotic RNA Sequencing Quantification. mSystems 2021; 6:e01256-20. [PMID: 33436519 PMCID: PMC7901486 DOI: 10.1128/msystems.01256-20] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
Abstract
RNA sequencing (RNA-seq) has matured into a reliable and low-cost assay for transcriptome profiling and has been deployed across a range of systems. The computational tool space for the analysis of RNA-seq data has kept pace with advances in sequencing. Yet tool development has largely centered around the human transcriptome. While eukaryotic and prokaryotic transcriptomes are similar, key differences in transcribed units limit the transfer of wet-lab and computational tools between the two domains. The article by M. Chung, R. S. Adkins, J. S. A. Mattick, K. R. Bradwell, et al. (mSystems 6:e00917-20, 2021, https://doi.org/10.1128/mSystems.00917-20), demonstrates that integrating prokaryote-specific strategies into existing RNA-seq analyses improves read quantification. Unlike in eukaryotes, polycistronic transcripts derived from operons lead to sequencing reads that span multiple neighboring genes. Chung et al. introduce FADU, a software tool that performs a correction for such reads and thereby improves read quantification and biological interpretation of prokaryotic RNA sequencing.
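One way to picture a correction for reads that span neighboring genes is to split each read's count across the genes it overlaps. The sketch below is only an illustration of that idea, not FADU's code: the function `fractional_counts`, the half-open coordinate convention, and the rule of weighting by overlapping bases are assumptions.

```python
def fractional_counts(read, genes):
    """Split one read's count across the genes it overlaps.

    read:  (start, end) interval, half-open (assumed convention).
    genes: dict mapping gene name -> (start, end), half-open.
    Returns a dict of gene name -> fraction of the read's bases on that gene.
    """
    read_start, read_end = read
    read_length = read_end - read_start
    if read_length <= 0:
        raise ValueError("read must have positive length")

    counts = {}
    for name, (gene_start, gene_end) in genes.items():
        # Number of read bases that fall inside this gene.
        overlap = min(read_end, gene_end) - max(read_start, gene_start)
        if overlap > 0:
            counts[name] = overlap / read_length
    return counts


operon = {"geneA": (0, 100), "geneB": (120, 300)}
print(fractional_counts((90, 190), operon))
```

Here a 100-base read that covers the last 10 bases of geneA and the first 70 bases of geneB contributes 0.1 to geneA and 0.7 to geneB, instead of being discarded as ambiguous or counted fully for both.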
|
45
|
Abstract
RNA-seq data can be exploited computationally to detect RNA editing sites efficiently on a genome-wide scale. This chapter introduces a flexible computational framework for detecting RNA editing sites in plant organelles. The framework comprises three major steps: RNA-seq data processing, RNA read alignment, and RNA editing site detection. Each step is discussed in sufficient detail to be implemented by the reader. As a case study, the framework is applied to publicly available sequencing data to detect C-to-U RNA editing sites in the coding sequences of the mitochondrial genome of Nicotiana tabacum.
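The final step, editing site detection, can be sketched as a scan over per-position base counts from aligned reads. The sketch assumes a hypothetical pileup format and illustrative thresholds (`min_depth`, `min_fraction`); none of these come from the chapter. Because RNA-seq reads are sequenced as cDNA, an edited U appears as T.

```python
def call_c_to_u_sites(reference, pileup, min_depth=10, min_fraction=0.1):
    """Report reference C positions where reads support T (i.e. U in RNA).

    reference: genomic sequence as a string (0-based positions).
    pileup:    dict mapping position -> dict of base -> read count
               (hypothetical format, e.g. summarised from an alignment).
    Thresholds are illustrative assumptions, not recommended values.
    Returns a list of (position, edited_fraction) tuples.
    """
    sites = []
    for position, counts in sorted(pileup.items()):
        if reference[position] != "C":
            continue  # only C-to-U editing is considered here
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to judge this position
        fraction = counts.get("T", 0) / depth
        if fraction >= min_fraction:
            sites.append((position, fraction))
    return sites


reference = "ACGC"
pileup = {0: {"A": 30}, 1: {"C": 5, "T": 15}, 3: {"C": 20}}
print(call_c_to_u_sites(reference, pileup))
```

In the toy input, position 1 is a reference C with 15 of 20 reads showing T, so it is reported with an edited fraction of 0.75; position 3 is a C with no T reads and is skipped. Distinguishing true editing from genomic variants or sequencing errors needs further filtering that this sketch omits.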
Affiliation(s)
- Alejandro A Edera
- Facultad de Ciencias Agrarias, IBAM, Universidad Nacional de Cuyo, CONICET, Almirante Brown, Argentina.
| | - M Virginia Sanchez-Puerta
- Facultad de Ciencias Agrarias, IBAM, Universidad Nacional de Cuyo, CONICET, Almirante Brown, Argentina
- Facultad de Ciencias Exactas y Naturales, Universidad Nacional de Cuyo, Mendoza, Argentina
| |
|
46
|
Abstract
MicroRNAs (miRNAs) regulate gene expression by binding to mRNAs. Consequently, they reduce target gene expression levels and expression variability, also known as "noise." Single-cell RNA sequencing (scRNA-seq) technology has been used to study miRNA and mRNA expression in single cells, and has demonstrated its strength in quantifying cell-to-cell variation. Here we describe how to investigate miRNA regulation using data with both mRNA and miRNA expression in single cell format. We show that miRNAs reduce the expression levels and also expression noise of target genes in single cells. Finally, we also discuss potential improvements in experimental design and computational analysis of scRNA-seq in order to reduce or partition the technical noise.
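Expression noise across single cells is often summarised as the squared coefficient of variation (variance divided by the squared mean). The sketch below uses that measure as an assumption; the abstract does not state which noise metric the authors use, and the function name `expression_noise` is hypothetical.

```python
from statistics import mean, pvariance


def expression_noise(values):
    """Squared coefficient of variation of one gene across single cells.

    values: expression values of a gene, one per cell.
    Uses the population variance; this choice of metric is an assumption.
    """
    values = [float(v) for v in values]
    average = mean(values)
    if average == 0:
        raise ValueError("noise is undefined for a gene with zero mean expression")
    return pvariance(values) / (average * average)


cells = [2, 4, 4, 4, 5, 5, 7, 9]
print(expression_noise(cells))
```

For these eight cells the mean is 5 and the population variance is 4, giving a noise value of 0.16. Comparing this quantity between miRNA targets and non-targets at matched mean expression is one way to ask whether targets are less noisy.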
Affiliation(s)
- Wendao Liu
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Noam Shomron
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
| |
|
47
|
Liu Q, Hu Y, Stucky A, Fang L, Zhong JF, Wang K. LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing. BMC Genomics 2020; 21:793. [PMID: 33372596 PMCID: PMC7771079 DOI: 10.1186/s12864-020-07207-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 10/21/2020] [Accepted: 10/29/2020] [Indexed: 12/17/2022]
Abstract
BACKGROUND Long-read RNA-Seq techniques can generate reads that encompass a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-Seq techniques, which typically generate reads of < 150 bp. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data that take into account the high basecalling error rates and the presence of alignment errors. RESULTS In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF detected known gene fusions better than existing computational tools. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset on acute myeloid leukemia and pinpointed, at base resolution, the exact location of a translocation (previously known at cytogenetic resolution), which was further validated by Sanger sequencing. CONCLUSIONS In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-Seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF.
Affiliation(s)
- Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Yu Hu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Andres Stucky
- Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
| | - Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Jiang F Zhong
- Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
|
48
|
Yu R, Yang W, Wang S. Performance evaluation of lossy quality compression algorithms for RNA-seq data. BMC Bioinformatics 2020; 21:321. [PMID: 32689929 PMCID: PMC7372835 DOI: 10.1186/s12859-020-03658-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2020] [Accepted: 07/13/2020] [Indexed: 11/29/2022]
Abstract
Background Recent advancements in high-throughput sequencing technologies have generated an unprecedented amount of genomic data that must be stored, processed, and transmitted over the network for sharing. Lossy genomic data compression, especially of the base quality values of sequencing data, is emerging as an efficient way to handle this challenge due to its superior compression performance compared to lossless compression methods. Many lossy compression algorithms have been developed for and evaluated using DNA sequencing data. However, whether these algorithms can be used on RNA sequencing (RNA-seq) data remains unclear. Results In this study, we evaluated the impact of lossy quality value compression on common RNA-seq data analysis pipelines, including expression quantification, transcriptome assembly, and short variant detection, using RNA-seq data from different species and sequencing platforms. Our study shows that lossy quality value compression could effectively improve RNA-seq data compression. In some cases, lossy algorithms achieved a 1.2- to 3-fold further reduction in the overall RNA-seq data size compared to existing lossless algorithms. However, lossy quality value compression could affect the results of some RNA-seq data processing pipelines, and hence its impact on RNA-seq studies cannot be ignored in some cases. Pipelines using HISAT2 for alignment were most significantly affected by lossy quality value compression, while no effects were observed on pipelines that do not depend on quality values, e.g., STAR-based expression quantification and transcriptome assembly pipelines. Moreover, regardless of whether STAR or HISAT2 was used as the aligner, variant detection results were affected by lossy quality value compression, albeit to a lesser extent when the STAR-based pipeline was used. Our results also show that the impact of lossy quality value compression depends on the compression algorithm being used and, if the algorithm supports multiple compression levels, on the level chosen. Conclusions Lossy quality value compression can be incorporated into existing RNA-seq analysis pipelines to alleviate the data storage and transmission burdens. However, care should be taken in selecting compression tools and levels based on the requirements of the downstream analysis pipelines to avoid introducing adverse effects on the analysis results.
|
49
|
Khan Y, Hammarström D, Rønnestad BR, Ellefsen S, Ahmad R. Increased biological relevance of transcriptome analyses in human skeletal muscle using a model-specific pipeline. BMC Bioinformatics 2020; 21:548. [PMID: 33256614 PMCID: PMC7708234 DOI: 10.1186/s12859-020-03866-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/19/2020] [Accepted: 11/09/2020] [Indexed: 12/12/2022]
Abstract
Background Human skeletal muscle responds to weight-bearing exercise with significant inter-individual differences. Investigation of transcriptome responses could improve our understanding of this variation. However, this requires bioinformatic pipelines to be established and evaluated in study-specific contexts. Skeletal muscle subjected to mechanical stress, such as through resistance training (RT), accumulates RNA due to increased ribosomal biogenesis. When a fixed amount of total RNA is used for RNA-seq library preparations, mRNA counts are thus assessed in different amounts of tissue, potentially invalidating subsequent conclusions. The purpose of this study was to establish a bioinformatic pipeline specific for analysis of RNA-seq data from skeletal muscles, to explore the effects of different normalization strategies, and to identify genes responding to RT in a volume-dependent manner (moderate vs. low volume). To this end, we analyzed RNA-seq data derived from a twelve-week RT intervention, wherein 25 participants performed both low- and moderate-volume leg RT, allocated to the two legs in a randomized manner. Bilateral muscle biopsies were sampled from m. vastus lateralis before and after the intervention, as well as before and after the fifth training session (Week 2). Results Bioinformatic tools were selected based on read quality, observed gene counts, methodological variation between paired observations, and correlations between mRNA abundance and protein expression of myosin heavy chain family proteins. Different normalization strategies were compared to account for global changes in RNA to tissue ratio. After accounting for the amounts of muscle tissue used in library preparation, global mRNA expression increased by 43–53%. At Week 2, this was accompanied by dose-dependent increases for 21 genes in rested-state muscle, most of which were related to the extracellular matrix. In contrast, at Week 12, no readily explainable dose-dependencies were observed. Instead, traditional normalization and non-normalized models resulted in counterintuitive reverse dose-dependency for many genes. Overall, training led to robust transcriptome changes, with the number of differentially expressed genes ranging from 603 to 5110, varying with time point and normalization strategy. Conclusion Optimized selection of bioinformatic tools increases the biological relevance of transcriptome analyses from resistance-trained skeletal muscle. Moreover, normalization procedures need to account for global changes in rRNA and mRNA abundance.
Affiliation(s)
- Yusuf Khan
- Department of Biotechnology, Inland Norway University of Applied Sciences, Holsetgata 22, 2317, Hamar, Norway; Section for Health and Exercise Physiology, Department of Public Health and Sport Sciences, Inland Norway University of Applied Sciences, Lillehammer, Norway
| | - Daniel Hammarström
- Section for Health and Exercise Physiology, Department of Public Health and Sport Sciences, Inland Norway University of Applied Sciences, Lillehammer, Norway; Swedish School of Sport and Health Sciences, Stockholm, Sweden
| | - Bent R Rønnestad
- Section for Health and Exercise Physiology, Department of Public Health and Sport Sciences, Inland Norway University of Applied Sciences, Lillehammer, Norway
| | - Stian Ellefsen
- Section for Health and Exercise Physiology, Department of Public Health and Sport Sciences, Inland Norway University of Applied Sciences, Lillehammer, Norway; Innlandet Hospital Trust, Lillehammer, Norway
| | - Rafi Ahmad
- Department of Biotechnology, Inland Norway University of Applied Sciences, Holsetgata 22, 2317, Hamar, Norway; Faculty of Health Sciences, Institute of Clinical Medicine, UiT - The Arctic University of Norway, Hansine Hansens veg 18, 9019, Tromsø, Norway
| |
|
50
|
Su S, Tian L, Dong X, Hickey PF, Freytag S, Ritchie ME. CellBench: R/Bioconductor software for comparing single-cell RNA-seq analysis methods. Bioinformatics 2020; 36:2288-2290. [PMID: 31778143 PMCID: PMC7141847 DOI: 10.1093/bioinformatics/btz889] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/29/2019] [Revised: 10/03/2019] [Accepted: 11/26/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION Bioinformatic analysis of single-cell gene expression data is a rapidly evolving field. Hundreds of bespoke methods have been developed in the past few years to deal with various aspects of single-cell analysis and consensus on the most appropriate methods to use under different settings is still emerging. Benchmarking the many methods is therefore of critical importance and since analysis of single-cell data usually involves multi-step pipelines, effective evaluation of pipelines involving different combinations of methods is required. Current benchmarks of single-cell methods are mostly implemented with ad-hoc code that is often difficult to reproduce or extend, and exhaustive manual coding of many combinations is infeasible in most instances. Therefore, new software is needed to manage pipeline benchmarking. RESULTS The CellBench R software facilitates method comparisons in either a task-centric or combinatorial way to allow pipelines of methods to be evaluated in an effective manner. CellBench automatically runs combinations of methods, provides facilities for measuring running time and delivers output in tabular form which is highly compatible with tidyverse R packages for summary and visualization. Our software has enabled comprehensive benchmarking of single-cell RNA-seq normalization, imputation, clustering, trajectory analysis and data integration methods using various performance metrics obtained from data with available ground truth. CellBench is also amenable to benchmarking other bioinformatics analysis tasks. AVAILABILITY AND IMPLEMENTATION Available from https://bioconductor.org/packages/CellBench.
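The combinatorial style of benchmarking described here, running every combination of methods across pipeline stages and collecting timed results in tabular form, can be sketched generically. CellBench itself is an R package and this is not its API: the function `run_combinations`, the row layout, and the toy methods are all hypothetical.

```python
import time
from itertools import product


def run_combinations(datasets, stages):
    """Run every combination of methods, one method per pipeline stage.

    datasets: dict of dataset name -> data.
    stages:   list of dicts, each mapping method name -> function,
              applied in order (hypothetical layout).
    Returns a list of row dicts, one per dataset/method combination.
    """
    rows = []
    stage_items = [list(stage.items()) for stage in stages]
    for data_name, data in datasets.items():
        for combo in product(*stage_items):
            result = data
            start = time.perf_counter()
            for _, method in combo:
                result = method(result)  # feed each stage's output to the next
            elapsed = time.perf_counter() - start
            rows.append({
                "data": data_name,
                "methods": tuple(name for name, _ in combo),
                "result": result,
                "seconds": elapsed,
            })
    return rows


# Toy stand-ins for normalization and summary methods.
datasets = {"toy": [1, 2, 3]}
stages = [
    {"identity": lambda xs: xs, "double": lambda xs: [2 * x for x in xs]},
    {"sum": sum, "max": max},
]
for row in run_combinations(datasets, stages):
    print(row["data"], row["methods"], row["result"])
```

Two methods in each of two stages give four pipelines per dataset, and adding a method to any stage extends the table without extra hand-written loops, which is the point of managing combinations programmatically.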
Affiliation(s)
- Shian Su
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, The University of Melbourne, Parkville, VIC 3010, Australia
| | - Luyi Tian
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, The University of Melbourne, Parkville, VIC 3010, Australia
| | - Xueyi Dong
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, The University of Melbourne, Parkville, VIC 3010, Australia
| | - Peter F Hickey
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, The University of Melbourne, Parkville, VIC 3010, Australia
| | - Saskia Freytag
- Epigenetics and Genomics, Harry Perkins Institute of Medical Research, Nedlands, WA 6009, Australia
| | - Matthew E Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia; Department of Medical Biology, The University of Melbourne, Parkville, VIC 3010, Australia; School of Mathematics and Statistics, The University of Melbourne, Parkville, VIC 3010, Australia
| |
|