1
Tang N, Zhou Q, Liu S, Sun H, Li H, Zhang Q, Hao J, Qi C. GSEA analysis identifies potential drug targets and their interaction networks in coronary microcirculation disorders. SLAS Technol 2024:100152. [PMID: 38823582 DOI: 10.1016/j.slast.2024.100152] [Received: 03/03/2024] [Revised: 05/20/2024] [Accepted: 05/29/2024] [Indexed: 06/03/2024]
Abstract
Coronary microcirculation dysfunction (CMD) is one of the main causes of cardiovascular disease. Traditional treatments lack specificity, which makes it difficult to account for differences between patients and to intervene effectively. The complexity and diversity of CMD call for more standardized diagnosis and treatment plans that clarify the best treatment strategy and long-term outcomes. Existing treatments mainly focus on symptom management, including medication, lifestyle intervention, and psychological therapy. However, their efficacy is not consistent across patients, and their long-term efficacy is not yet clear. Gene Set Enrichment Analysis (GSEA) is a bioinformatics method for interpreting gene expression data, in particular for identifying the enrichment of predefined gene sets. To support personalized treatment and improve the quality and effectiveness of interventions, this article used GSEA to study potential drug targets and their interaction networks in CMD. Gene data were first collected from the Coremine medical database, GeneCards, and DrugBank public databases. The data were then preprocessed with filtering methods, and GSEA was applied to the preprocessed gene expression data to identify CMD-related pathways and calculate their enrichment scores. Finally, protein sequence features were extracted by calculating autocorrelation features. To verify the effectiveness of GSEA, the article evaluated four aspects: precision, receiver operating characteristic (ROC) curve, correlation, and potential drug targets, and compared the results with Gene Regulatory Network (GRN) and Random Forest (RF) methods. Compared with the GRN and RF methods, the average precision of GSEA improved by 0.11. The article concludes that GSEA helped identify and explore potential drug targets and their interaction networks, providing new ideas for personalized treatment of CMD.
Affiliation(s)
- Nan Tang
- Department of Cardiology, The Second Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu 221000, China
- Qiang Zhou
- Department of Cardiology, The Second Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu 221000, China
- Shuang Liu
- Department of Cardiology, The Second Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu 221000, China
- Huamei Sun
- Department of Cardiology, The Second Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu 221000, China
- Haoran Li
- Department of Cardiology, The Second Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu 221000, China
- Qingdui Zhang
- Department of Cardiology, The Second Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu 221000, China
- Ji Hao
- Department of Cardiology, The Second Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu 221000, China
- Chunmei Qi
- Department of Cardiology, The Second Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu 221000, China.
2
Candia J, Ferrucci L. Assessment of Gene Set Enrichment Analysis using curated RNA-seq-based benchmarks. PLoS One 2024; 19:e0302696. [PMID: 38753612 PMCID: PMC11098418 DOI: 10.1371/journal.pone.0302696] [Received: 10/31/2023] [Accepted: 04/09/2024] [Indexed: 05/18/2024]
Abstract
Pathway enrichment analysis is a ubiquitous computational biology method to interpret a list of genes (typically derived from the association of large-scale omics data with phenotypes of interest) in terms of higher-level, predefined gene sets that share biological function, chromosomal location, or other common features. Among many tools developed so far, Gene Set Enrichment Analysis (GSEA) stands out as one of the pioneering and most widely used methods. Although originally developed for microarray data, GSEA is nowadays extensively utilized for RNA-seq data analysis. Here, we quantitatively assessed the performance of a variety of GSEA modalities and provide guidance in the practical use of GSEA in RNA-seq experiments. We leveraged harmonized RNA-seq datasets available from The Cancer Genome Atlas (TCGA) in combination with large, curated pathway collections from the Molecular Signatures Database to obtain cancer-type-specific target pathway lists across multiple cancer types. We carried out a detailed analysis of GSEA performance using both gene-set and phenotype permutations combined with four different choices for the Kolmogorov-Smirnov enrichment statistic. Based on our benchmarks, we conclude that the classic/unweighted gene-set permutation approach offered comparable or better sensitivity-vs-specificity tradeoffs across cancer types compared with other, more complex and computationally intensive permutation methods. Finally, we analyzed other large cohorts for thyroid cancer and hepatocellular carcinoma. We utilized a new consensus metric, the Enrichment Evidence Score (EES), which showed a remarkable agreement between pathways identified in TCGA and those from other sources, despite differences in cancer etiology. This finding suggests an EES-based strategy to identify a core set of pathways that may be complemented by an expanded set of pathways for downstream exploratory analysis. 
This work fills the existing gap in current guidelines and benchmarks for the use of GSEA with RNA-seq data and provides a framework to enable detailed benchmarking of other RNA-seq-based pathway analysis tools.
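The classic/unweighted gene-set permutation approach mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' benchmark code or the GSEA software itself: the function names, the two-sided p-value, and the toy gene list are assumptions made for illustration, and real GSEA additionally normalizes scores and corrects for multiple testing.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Classic (unweighted) Kolmogorov-Smirnov running-sum statistic.

    Walk down the ranked list, stepping up by 1/n_hit on set members and
    down by 1/n_miss otherwise; return the signed maximum deviation from 0.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation_pvalue(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical p-value from random gene sets of the same size.

    Simplification (assumption): a two-sided comparison on |ES| is used here.
    """
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = sum(g in gene_set for g in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        null_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, null_set)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical toy input: ten genes ranked by some differential-expression score.
ranked = [f"g{i}" for i in range(1, 11)]
es, p = gene_set_permutation_pvalue(ranked, {"g1", "g2", "g3"})
```

Gene-set permutation only reshuffles set membership, so it needs no access to the sample-level expression matrix; phenotype permutation, by contrast, requires re-ranking the genes for every shuffled labeling.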
Affiliation(s)
- Julián Candia
- Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, United States of America
- Luigi Ferrucci
- Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, United States of America
3
Geistlinger L, Mirzayi C, Zohra F, Azhar R, Elsafoury S, Grieve C, Wokaty J, Gamboa-Tuz SD, Sengupta P, Hecht I, Ravikrishnan A, Gonçalves RS, Franzosa E, Raman K, Carey V, Dowd JB, Jones HE, Davis S, Segata N, Huttenhower C, Waldron L. BugSigDB captures patterns of differential abundance across a broad range of host-associated microbial signatures. Nat Biotechnol 2024; 42:790-802. [PMID: 37697152 PMCID: PMC11098749 DOI: 10.1038/s41587-023-01872-y] [Received: 10/24/2022] [Accepted: 06/20/2023] [Indexed: 09/13/2023]
Abstract
The literature of human and other host-associated microbiome studies is expanding rapidly, but systematic comparisons among published results of host-associated microbiome signatures of differential abundance remain difficult. We present BugSigDB, a community-editable database of manually curated microbial signatures from published differential abundance studies accompanied by information on study geography, health outcomes, host body site and experimental, epidemiological and statistical methods using controlled vocabulary. The initial release of the database contains >2,500 manually curated signatures from >600 published studies on three host species, enabling high-throughput analysis of signature similarity, taxon enrichment, co-occurrence and coexclusion and consensus signatures. These data allow assessment of microbiome differential abundance within and across experimental conditions, environments or body sites. Database-wide analysis reveals experimental conditions with the highest level of consistency in signatures reported by independent studies and identifies commonalities among disease-associated signatures, including frequent introgression of oral pathobionts into the gut.
Affiliation(s)
- Ludwig Geistlinger
- Center for Computational Biomedicine, Harvard Medical School, Boston, MA, USA
- Chloe Mirzayi
- Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
- Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Fatima Zohra
- Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
- Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Rimsha Azhar
- Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
- Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Shaimaa Elsafoury
- Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
- Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Clare Grieve
- Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
- Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Jennifer Wokaty
- Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
- Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Samuel David Gamboa-Tuz
- Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
- Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Pratyay Sengupta
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai, India
- Robert Bosch Centre for Data Science and Artificial Intelligence, Indian Institute of Technology (IIT) Madras, Chennai, India
- Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology (IIT) Madras, Chennai, India
- Aarthi Ravikrishnan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Rafael S Gonçalves
- Center for Computational Biomedicine, Harvard Medical School, Boston, MA, USA
- Eric Franzosa
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Harvard Chan Microbiome in Public Health Center, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Karthik Raman
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai, India
- Robert Bosch Centre for Data Science and Artificial Intelligence, Indian Institute of Technology (IIT) Madras, Chennai, India
- Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology (IIT) Madras, Chennai, India
- Vincent Carey
- Channing Division of Network Medicine, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Jennifer B Dowd
- Leverhulme Centre for Demographic Science, University of Oxford, Oxford, UK
- Heidi E Jones
- Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA
- Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA
- Sean Davis
- Departments of Biomedical Informatics and Medicine, University of Colorado Anschutz School of Medicine, Denver, CO, USA
- Nicola Segata
- Department CIBIO, University of Trento, Trento, Italy
- Istituto Europeo di Oncologia (IEO) IRCCS, Milan, Italy
- Curtis Huttenhower
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Harvard Chan Microbiome in Public Health Center, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Levi Waldron
- Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY, USA.
- Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY, USA.
- Department CIBIO, University of Trento, Trento, Italy.
4
Frost HR. Reconstruction Set Test (RESET): A computationally efficient method for single sample gene set testing based on randomized reduced rank reconstruction error. PLoS Comput Biol 2024; 20:e1012084. [PMID: 38683883 PMCID: PMC11081506 DOI: 10.1371/journal.pcbi.1012084] [Received: 09/29/2023] [Revised: 05/09/2024] [Accepted: 04/17/2024] [Indexed: 05/02/2024]
Abstract
We have developed a new, and analytically novel, single sample gene set testing method called Reconstruction Set Test (RESET). RESET quantifies gene set importance based on the ability of set genes to reconstruct values for all measured genes. RESET is realized using a computationally efficient randomized reduced rank reconstruction algorithm (available via the RESET R package on CRAN) that can effectively detect patterns of differential abundance and differential correlation for self-contained and competitive scenarios. As demonstrated using real and simulated scRNA-seq data, RESET provides superior performance at a lower computational cost relative to other single sample approaches.
Affiliation(s)
- H. Robert Frost
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire, United States of America
5
Peng C, Chen Q, Tan S, Shen X, Jiang C. Generalized reporter score-based enrichment analysis for omics data. Brief Bioinform 2024; 25:bbae116. [PMID: 38546324 PMCID: PMC10976918 DOI: 10.1093/bib/bbae116] [Received: 12/05/2023] [Revised: 01/25/2024] [Accepted: 03/01/2024] [Indexed: 06/15/2024]
Abstract
Enrichment analysis contextualizes biological features in pathways to facilitate a systematic understanding of high-dimensional data and is widely used in biomedical research. The emerging reporter score-based analysis (RSA) method shows more promising sensitivity, as it relies on P-values instead of raw values of features. However, RSA cannot be directly applied to multi-group and longitudinal experimental designs and is often misused due to the lack of a proper tool. Here, we propose the Generalized Reporter Score-based Analysis (GRSA) method for multi-group and longitudinal omics data. A comparison with other popular enrichment analysis methods demonstrated that GRSA had increased sensitivity across multiple benchmark datasets. We applied GRSA to microbiome, transcriptome and metabolome data and discovered new biological insights in omics studies. Finally, we demonstrated the application of GRSA beyond functional enrichment using a taxonomy database. We implemented GRSA in an R package, ReporterScore, integrating with a powerful visualization module and updatable pathway databases, which is available on the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/ReporterScore). We believe that the ReporterScore package will be a valuable asset for broad biomedical research fields.
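To make the idea of scoring pathways from P-values concrete, the following is a minimal sketch of the classic reporter score calculation (P-values converted to Z-scores, aggregated per pathway, then standardized against random feature sets of the same size). It is not the GRSA method or the ReporterScore package API: the function names, the permutation-based background correction, and the toy data are assumptions for illustration only.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def p_to_z(p, eps=1e-12):
    """Map a P-value to a Z-score with the inverse normal CDF; small p gives large z."""
    p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
    return NormalDist().inv_cdf(1 - p)


def reporter_score(pvalues, pathway, n_perm=1000, seed=0):
    """Background-corrected aggregate Z-score for one pathway.

    pvalues: dict mapping feature name -> P-value (hypothetical input format).
    pathway: iterable of feature names; features without a P-value are ignored.
    """
    z = {feature: p_to_z(p) for feature, p in pvalues.items()}
    members = [f for f in pathway if f in z]
    k = len(members)
    if k == 0:
        raise ValueError("pathway has no measured features")
    z_pathway = sum(z[f] for f in members) / sqrt(k)

    # Null distribution: aggregate Z of k features drawn at random.
    rng = random.Random(seed)
    all_z = list(z.values())
    null = [sum(rng.sample(all_z, k)) / sqrt(k) for _ in range(n_perm)]
    sd = stdev(null)
    if sd == 0:
        return 0.0  # degenerate background, e.g. the pathway covers every feature
    return (z_pathway - mean(null)) / sd


# Hypothetical toy data: four strongly significant features, sixteen that are not.
pvals = {f"f{i}": 0.001 for i in range(4)}
pvals.update({f"f{i}": 0.2 + 0.045 * (i - 4) for i in range(4, 20)})
score = reporter_score(pvals, ["f0", "f1", "f2", "f3"])
```

Because the score is built from P-values, the direction of change is lost unless it is tracked separately, which is one reason a dedicated tool is useful for multi-group and longitudinal designs.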
Affiliation(s)
- Chen Peng
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang 310030, China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310009, China
- Qiong Chen
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang 310030, China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310009, China
- Shangjin Tan
- BGI Research, Wuhan, Hubei 430074, China
- BGI Research, Shenzhen, Guangdong 518083, China
- Xiaotao Shen
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Chao Jiang
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang 310030, China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310009, China
- Center for Life Sciences, Shaoxing Institute, Zhejiang University, Shaoxing, Zhejiang 321000, China
6
Buzzao D, Castresana-Aguirre M, Guala D, Sonnhammer ELL. Benchmarking enrichment analysis methods with the disease pathway network. Brief Bioinform 2024; 25:bbae069. [PMID: 38436561 PMCID: PMC10939300 DOI: 10.1093/bib/bbae069] [Received: 09/29/2023] [Revised: 01/10/2024] [Accepted: 02/03/2024] [Indexed: 03/05/2024]
Abstract
Enrichment analysis (EA) is a common approach to gain functional insights from genome-scale experiments. As a consequence, a large number of EA methods have been developed, yet it is unclear from previous studies which method is the best for a given dataset. The main issues with previous benchmarks include the complexity of correctly assigning true pathways to a test dataset, and lack of generality of the evaluation metrics, for which the rank of a single target pathway is commonly used. We here provide a generalized EA benchmark and apply it to the most widely used EA methods, representing all four categories of current approaches. The benchmark employs a new set of 82 curated gene expression datasets from DNA microarray and RNA-Seq experiments for 26 diseases, of which only 13 are cancers. In order to address the shortcomings of the single target pathway approach and to enhance the sensitivity evaluation, we present the Disease Pathway Network, in which related Kyoto Encyclopedia of Genes and Genomes pathways are linked. We introduce a novel approach to evaluate pathway EA by combining sensitivity and specificity to provide a balanced evaluation of EA methods. This approach identifies Network Enrichment Analysis methods as the overall top performers compared with overlap-based methods. By using randomized gene expression datasets, we explore the null hypothesis bias of each method, revealing that most of them produce skewed P-values.
Affiliation(s)
- Davide Buzzao
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Dimitri Guala
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
7
Lardelli M, Baer L, Hin N, Allen A, Pederson SM, Barthelson K. The Use of Zebrafish in Transcriptome Analysis of the Early Effects of Mutations Causing Early Onset Familial Alzheimer's Disease and Other Inherited Neurodegenerative Conditions. J Alzheimers Dis 2024; 99:S367-S381. [PMID: 37742650 DOI: 10.3233/jad-230522] [Indexed: 09/26/2023]
Abstract
The degree to which non-human animals can be used to model Alzheimer's disease is a contentious issue, particularly as there is still widespread disagreement regarding the pathogenesis of this neurodegenerative dementia. The currently popular transgenic models are based on artificial expression of genes mutated in early onset forms of familial Alzheimer's disease (EOfAD). Uncertainty regarding the veracity of these models led us to focus on heterozygous, single mutations of endogenous genes (knock-in models) as these most closely resemble the genetic state of humans with EOfAD, and so incorporate the fewest assumptions regarding pathological mechanism. We have generated a number of lines of zebrafish bearing EOfAD-like and non-EOfAD-like mutations in genes equivalent to human PSEN1, PSEN2, and SORL1. To analyze the young adult brain transcriptomes of these mutants, we exploited the ability of zebrafish to produce very large families of simultaneous siblings composed of a variety of genotypes and raised in a uniform environment. This "intra-family" analysis strategy greatly reduced genetic and environmental "noise" thereby allowing detection of subtle changes in gene sets after bulk RNA sequencing of entire brains. Changes to oxidative phosphorylation were predicted for all EOfAD-like mutations in the three genes studied. Here we describe some of the analytical lessons learned in our program combining zebrafish genome editing with transcriptomics to understand the molecular pathologies of neurodegenerative disease.
Affiliation(s)
- Michael Lardelli
- Alzheimer's Disease Genetics Laboratory, The University of Adelaide, Adelaide, SA, Australia
- Lachlan Baer
- Alzheimer's Disease Genetics Laboratory, The University of Adelaide, Adelaide, SA, Australia
- Nhi Hin
- Alkahest Inc., San Carlos, CA, USA
- Angel Allen
- Alzheimer's Disease Genetics Laboratory, The University of Adelaide, Adelaide, SA, Australia
- Stephen Martin Pederson
- Black Ochre Data Labs, Indigenous Genomics, Telethon Kids Institute, Adelaide, SA, Australia
- John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
- Karissa Barthelson
- Alzheimer's Disease Genetics Laboratory, The University of Adelaide, Adelaide, SA, Australia
- Childhood Dementia Research Group, College of Medicine and Public Health, Flinders Health and Medical Research Institute, Flinders University, Bedford Park, SA, Australia
8
Jablonski KP, Beerenwinkel N. Coherent pathway enrichment estimation by modeling inter-pathway dependencies using regularized regression. Bioinformatics 2023; 39:btad522. [PMID: 37610338 PMCID: PMC10471899 DOI: 10.1093/bioinformatics/btad522] [Received: 11/12/2022] [Revised: 07/04/2023] [Accepted: 08/22/2023] [Indexed: 08/24/2023]
Abstract
MOTIVATION: Gene set enrichment methods are a common tool to improve the interpretability of gene lists as obtained, for example, from differential gene expression analyses. They are based on computing whether dysregulated genes are located in certain biological pathways more often than expected by chance. Gene set enrichment tools rely on pre-existing pathway databases such as KEGG, Reactome, or the Gene Ontology. These databases are increasing in size and in the number of redundancies between pathways, which complicates the statistical enrichment computation.
RESULTS: We address this problem and develop a novel gene set enrichment method, called pareg, which is based on a regularized generalized linear model and directly incorporates dependencies between gene sets related to certain biological functions, for example, due to shared genes, in the enrichment computation. We show that pareg is more robust to noise than competing methods. Additionally, we demonstrate the ability of our method to recover known pathways as well as to suggest novel treatment targets in an exploratory analysis using breast cancer samples from TCGA.
AVAILABILITY AND IMPLEMENTATION: pareg is freely available as an R package on Bioconductor (https://bioconductor.org/packages/release/bioc/html/pareg.html) as well as on https://github.com/cbg-ethz/pareg. The GitHub repository also contains the Snakemake workflows needed to reproduce all results presented here.
Affiliation(s)
- Kim Philipp Jablonski
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4058, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel 4058, Switzerland
9
Frost HR. Reconstruction Set Test (RESET): a computationally efficient method for single sample gene set testing based on randomized reduced rank reconstruction error. bioRxiv 2023:2023.04.03.535366. [PMID: 37066315 PMCID: PMC10104009 DOI: 10.1101/2023.04.03.535366] [Indexed: 04/18/2023]
Abstract
We have developed a new, and analytically novel, single sample gene set testing method called Reconstruction Set Test (RESET). RESET quantifies gene set importance at both the sample level and for the entire dataset based on the ability of set genes to reconstruct values for all measured genes. RESET addresses four important limitations of current techniques: 1) existing single sample methods are designed to detect mean differences and struggle to identify differential correlation patterns, 2) computationally efficient techniques are self-contained methods and cannot directly detect competitive scenarios where set genes differ from non-set genes in the same sample, 3) the scores generated by current methods can only be accurately compared across samples for a single set and not between sets, and 4) the computational cost of even the fastest existing methods can be significant on very large datasets. RESET is realized using a computationally efficient randomized reduced rank reconstruction algorithm (available via the RESET R package on CRAN) that can effectively detect patterns of differential abundance and differential correlation for self-contained and competitive scenarios. As demonstrated using real and simulated scRNA-seq data, RESET provides superior accuracy at a lower computational cost relative to other single sample approaches.
Affiliation(s)
- H. Robert Frost
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755
10
Angel-Velez D, Meese T, Hedia M, Fernandez-Montoro A, De Coster T, Pascottini OB, Van Nieuwerburgh F, Govaere J, Van Soom A, Pavani K, Smits K. Transcriptomics Reveal Molecular Differences in Equine Oocytes Vitrified before and after In Vitro Maturation. Int J Mol Sci 2023; 24:ijms24086915. [PMID: 37108081 PMCID: PMC10138936 DOI: 10.3390/ijms24086915] [Received: 02/28/2023] [Revised: 03/27/2023] [Accepted: 04/04/2023] [Indexed: 04/29/2023]
Abstract
In the last decade, in vitro embryo production in horses has become an established clinical practice, but blastocyst rates from vitrified equine oocytes remain low. Cryopreservation impairs the oocyte developmental potential, which may be reflected in the messenger RNA (mRNA) profile. Therefore, this study aimed to compare the transcriptome profiles of metaphase II equine oocytes vitrified before and after in vitro maturation. To do so, three groups were analyzed with RNA sequencing: (1) fresh in vitro matured oocytes as a control (FR), (2) oocytes vitrified after in vitro maturation (VMAT), and (3) oocytes vitrified immature, warmed, and in vitro matured (VIM). In comparison with fresh oocytes, VIM resulted in 46 differentially expressed (DE) genes (14 upregulated and 32 downregulated), while VMAT showed 36 DE genes (18 in each category). A comparison of VIM vs. VMAT resulted in 44 DE genes (20 upregulated and 24 downregulated). Pathway analyses highlighted cytoskeleton, spindle formation, and calcium and cation ion transport and homeostasis as the main affected pathways in vitrified oocytes. The vitrification of in vitro matured oocytes presented subtle advantages in terms of the mRNA profile over the vitrification of immature oocytes. Therefore, this study provides a new perspective for understanding the impact of vitrification on equine oocytes and can be the basis for further improvements in the efficiency of equine oocyte vitrification.
Affiliation(s)
- Daniel Angel-Velez
- Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- Research Group in Animal Sciences-INCA-CES, Universidad CES, Medellin 050021, Colombia
- Tim Meese
- Laboratory for Pharmaceutical Biotechnology, Faculty of Pharmaceutical Science, Ghent University, 9000 Ghent, Belgium
- Mohamed Hedia
- Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- Department of Theriogenology, Faculty of Veterinary Medicine, Cairo University, Giza 12211, Egypt
- Andrea Fernandez-Montoro
- Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- Tine De Coster
- Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- Osvaldo Bogado Pascottini
- Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- Filip Van Nieuwerburgh
- Laboratory for Pharmaceutical Biotechnology, Faculty of Pharmaceutical Science, Ghent University, 9000 Ghent, Belgium
- Jan Govaere
- Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- Ann Van Soom
- Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- Krishna Pavani
- Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
- Department for Reproductive Medicine, Ghent University Hospital, Corneel Heymanslaan 10, 9000 Gent, Belgium
- Katrien Smits
- Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, 9820 Merelbeke, Belgium
11
Zhao K, Rhee SY. Interpreting omics data with pathway enrichment analysis. Trends Genet 2023; 39:308-319. [PMID: 36750393 DOI: 10.1016/j.tig.2023.01.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/25/2022] [Revised: 11/24/2022] [Accepted: 01/13/2023] [Indexed: 02/09/2023]
Abstract
Pathway enrichment analysis is indispensable for interpreting omics datasets and generating hypotheses. However, the foundations of enrichment analysis remain elusive to many biologists. Here, we discuss best practices in interpreting different types of omics data using pathway enrichment analysis and highlight the importance of considering intrinsic features of various types of omics data. We further explain major components that influence the outcomes of a pathway enrichment analysis, including defining background sets and choosing reference annotation databases. To improve reproducibility, we describe how to standardize reporting methodological details in publications. This article aims to serve as a primer for biologists to leverage the wealth of omics resources and motivate bioinformatics tool developers to enhance the power of pathway enrichment analysis.
Affiliation(s)
- Kangmei Zhao
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94025, USA.
- Seung Yon Rhee
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94025, USA.
12
Whittaker CA, Kucukural A, Gates C, Wilkins OM, Bell GW, Hutchinson JN, Polson SW, Dragon J. Functional Annotation Routines Used by ABRF Bioinformatics Core Facilities - Observations, Comparisons, and Considerations. J Biomol Tech 2023; 34:3fc1f5fe.0b74b9db. [PMID: 37089874 PMCID: PMC10121236 DOI: 10.7171/3fc1f5fe.0b74b9db] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]
Abstract
The functional annotation of gene lists is a common analysis routine required for most genomics experiments, and bioinformatics core facilities must support these analyses. In contrast to methods such as the quantitation of RNA-Seq reads or differential expression analysis, our research group noted a lack of consensus in our preferred approaches to functional annotation. To investigate this observation, we selected 4 experiments that represent a range of experimental designs encountered by our cores and analyzed those data with 6 tools used by members of the Association of Biomolecular Resource Facilities (ABRF) Genomic Bioinformatics Research Group (GBIRG). To facilitate comparisons between tools, we focused on a single biological result for each experiment. These results were represented by a gene set, and we analyzed these gene sets with each tool considered in our study to map the result to the annotation categories presented by each tool. In most cases, each tool produces data that would facilitate identification of the selected biological result for each experiment. For the exceptions, Fisher's exact test parameters could be adjusted to detect the result. Because Fisher's exact test is used by many functional annotation tools, we investigated input parameters and demonstrate that, while background set size is unlikely to have a significant impact on the results, the numbers of differentially expressed genes in an annotation category and the total number of differentially expressed genes under consideration are both critical parameters that may need to be modified during analyses. In addition, we note that differences in the annotation categories tested by each tool, as well as the composition of those categories, can have a significant impact on results.
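The sensitivity to these inputs can be illustrated with a minimal sketch of the one-sided Fisher's exact test, written here as a hypergeometric upper tail. The function name `enrichment_p_value` and every count below are hypothetical and are not taken from the study; the tools it compares may use different alternatives, background definitions, or multiple-testing corrections.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: differentially expressed (DE) genes that fall in the annotation category
    n: total number of DE genes under consideration
    K: genes in the annotation category within the background set
    N: size of the background set
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    # Number of ways to draw n genes with at least k of them in the category.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show how each input is varied.
    base = dict(k=8, n=200, K=100, N=20000)
    print("baseline:          ", enrichment_p_value(**base))
    print("smaller background:", enrichment_p_value(**{**base, "N": 15000}))
    print("more hits (k=12):  ", enrichment_p_value(**{**base, "k": 12}))
    print("longer DE list:    ", enrichment_p_value(**{**base, "n": 400}))
```

Varying one argument at a time, as in the demo, is a simple way to see which of the four counts moves the p-value most for a given dataset.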
Affiliation(s)
- Charles A. Whittaker
- Barbara K. Ostrom (1978) Bioinformatics and Computing Core Facility, Swanson Biotechnology Center, Koch Institute at the Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Alper Kucukural
- Bioinformatics Core, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Chris Gates
- BRCF Bioinformatics Core, University of Michigan, Ann Arbor, Michigan 48109, USA
- Owen Michael Wilkins
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire 03755, USA
- Dartmouth Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire 03756, USA
- George W. Bell
- Bioinformatics and Research Computing, Whitehead Institute, Cambridge, Massachusetts 02142, USA
- John N. Hutchinson
- Harvard T.H. Chan School of Public Health, Department of Biostatistics, Boston, Massachusetts 02115, USA
- Shawn W. Polson
- Bioinformatics Core, Center for Bioinformatics and Computational Biology, University of Delaware, Delaware Biotechnology Institute, Newark, Delaware 19713, USA
- Julie Dragon
- Vermont Integrative Genomics Resource and Vermont Biomedical Research Network Bioinformatic Core, University of Vermont, Burlington, Vermont 05405, USA
13
Ye J, Feng JW, Wu WX, Qi GF, Wang F, Hu J, Hong LZ, Liu SY, Jiang Y. Microarray profiling identifies hsa_circ_0082003 as a novel tumor promoter for papillary thyroid carcinoma. J Endocrinol Invest 2023; 46:509-522. [PMID: 36115894 DOI: 10.1007/s40618-022-01922-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 09/11/2022] [Indexed: 11/30/2022]
Abstract
BACKGROUND Circular RNAs (circRNAs) are non-coding RNAs that have essential regulatory roles in the development of various tumors. This study explored whether circRNAs are involved in the progression of papillary thyroid carcinoma (PTC). METHODS Differentially expressed circRNAs (DECs) in four pairs of PTC and matched normal thyroid tissues were screened using a circRNA microarray. The potential functions of dysregulated circRNAs were predicted by bioinformatic analyses. Reverse transcription quantitative polymerase chain reaction (RT-qPCR) was used to determine hsa_circ_0082003 expression in 80 pairs of PTC and matched normal thyroid tissues. Cell counting kit-8, colony formation, wound healing, and Transwell assays were performed to evaluate the biological functions of hsa_circ_0082003 in PTC cells. The role of hsa_circ_0082003 in PTC tumorigenesis in vivo was validated in nude mice. RESULTS In total, 3150 DECs (2317 upregulated and 833 downregulated) were identified. Pathway enrichment analyses indicated that the dysregulated circRNAs may play roles in PTC development. RT-qPCR validation demonstrated that hsa_circ_0082003 expression was significantly increased in PTC tissues and correlated with poor clinicopathological parameters. Receiver operating characteristic curve analysis showed that hsa_circ_0082003 had good performance for diagnosing PTC and judging whether it was accompanied by lymph node metastasis. Knockdown of hsa_circ_0082003 inhibited PTC cell proliferation, migration, and invasion. Tumor formation assays in vivo showed that downregulation of hsa_circ_0082003 significantly suppressed the growth of PTC. CONCLUSION Hsa_circ_0082003 may serve as a novel diagnostic biomarker and potential therapeutic target for PTC.
Affiliation(s)
- J Ye
- Department of Thyroid Surgery, The Third Affiliated Hospital of Soochow University, Changzhou First People's Hospital, Changzhou, Jiangsu, China
- J-W Feng
- Department of Thyroid Surgery, The Third Affiliated Hospital of Soochow University, Changzhou First People's Hospital, Changzhou, Jiangsu, China
- W-X Wu
- Department of Thyroid Surgery, The Third Affiliated Hospital of Soochow University, Changzhou First People's Hospital, Changzhou, Jiangsu, China
- G-F Qi
- Department of Thyroid Surgery, The Third Affiliated Hospital of Soochow University, Changzhou First People's Hospital, Changzhou, Jiangsu, China
- F Wang
- Department of Thyroid Surgery, The Third Affiliated Hospital of Soochow University, Changzhou First People's Hospital, Changzhou, Jiangsu, China
- J Hu
- Department of Thyroid Surgery, The Third Affiliated Hospital of Soochow University, Changzhou First People's Hospital, Changzhou, Jiangsu, China
- L-Z Hong
- Department of Thyroid Surgery, The Third Affiliated Hospital of Soochow University, Changzhou First People's Hospital, Changzhou, Jiangsu, China
- S-Y Liu
- Department of Thyroid Surgery, The Third Affiliated Hospital of Soochow University, Changzhou First People's Hospital, Changzhou, Jiangsu, China
- Y Jiang
- Department of Thyroid Surgery, The Third Affiliated Hospital of Soochow University, Changzhou First People's Hospital, Changzhou, Jiangsu, China.
14
Lu Y, Pang Z, Xia J. Comprehensive investigation of pathway enrichment methods for functional interpretation of LC-MS global metabolomics data. Brief Bioinform 2023; 24:bbac553. [PMID: 36572652 PMCID: PMC9851290 DOI: 10.1093/bib/bbac553] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 09/23/2022] [Revised: 10/31/2022] [Accepted: 11/15/2022] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Global or untargeted metabolomics is widely used to comprehensively investigate metabolic profiles under various pathophysiological conditions such as inflammations, infections, responses to exposures or interactions with microbial communities. However, biological interpretation of global metabolomics data remains a daunting task. Recent years have seen growing applications of pathway enrichment analysis based on putative annotations of liquid chromatography coupled with mass spectrometry (LC-MS) peaks for functional interpretation of LC-MS-based global metabolomics data. However, due to intricate peak-metabolite and metabolite-pathway relationships, considerable variations are observed among results obtained using different approaches. There is an urgent need to benchmark these approaches to inform the best practices. RESULTS We have conducted a benchmark study of common peak annotation approaches and pathway enrichment methods in current metabolomics studies. Representative approaches, including three peak annotation methods and four enrichment methods, were selected and benchmarked under different scenarios. Based on the results, we have provided a set of recommendations regarding peak annotation, ranking metrics and feature selection. The overall better performance was obtained for the mummichog approach. We have observed that a ~30% annotation rate is sufficient to achieve high recall (~90% based on mummichog), and using semi-annotated data improves functional interpretation. Based on the current platforms and enrichment methods, we further propose an identifiability index to indicate the possibility of a pathway being reliably identified. Finally, we evaluated all methods using 11 COVID-19 and 8 inflammatory bowel diseases (IBD) global metabolomics datasets.
Affiliation(s)
- Yao Lu
- Department of Microbiology and Immunology, McGill University, Quebec, Canada
- Zhiqiang Pang
- Institute of Parasitology, McGill University, Quebec, Canada
- Jianguo Xia
- Department of Microbiology and Immunology, McGill University, Quebec, Canada
- Institute of Parasitology, McGill University, Quebec, Canada
15
Chen JW, Shrestha L, Green G, Leier A, Marquez-Lago TT. The hitchhikers' guide to RNA sequencing and functional analysis. Brief Bioinform 2023; 24:bbac529. [PMID: 36617463 PMCID: PMC9851315 DOI: 10.1093/bib/bbac529] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/11/2022] [Revised: 10/18/2022] [Accepted: 11/07/2022] [Indexed: 01/10/2023] Open
Abstract
DNA and RNA sequencing technologies have revolutionized biology and biomedical sciences, sequencing full genomes and transcriptomes at very high speeds and reasonably low costs. RNA sequencing (RNA-Seq) enables transcript identification and quantification, but once sequencing has concluded researchers can be easily overwhelmed with questions such as how to go from raw data to differential expression (DE), pathway analysis and interpretation. Several pipelines and procedures have been developed to this effect. Even though there is no unique way to perform RNA-Seq analysis, it usually follows these steps: 1) raw reads quality check, 2) alignment of reads to a reference genome, 3) aligned reads' summarization according to an annotation file, 4) DE analysis and 5) gene set analysis and/or functional enrichment analysis. Each step requires researchers to make decisions, and the wide variety of options and resulting large volumes of data often lead to interpretation challenges. There also seems to be insufficient guidance on how best to obtain relevant information and derive actionable knowledge from transcription experiments. In this paper, we explain RNA-Seq steps in detail and outline differences and similarities of different popular options, as well as advantages and disadvantages. We also discuss non-coding RNA analysis, multi-omics, meta-transcriptomics and the use of artificial intelligence methods complementing the arsenal of tools available to researchers. Lastly, we perform a complete analysis from raw reads to DE and functional enrichment analysis, visually illustrating how results are not absolute truths and how algorithmic decisions can greatly impact results and interpretation.
Affiliation(s)
- Jiung-Wen Chen
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Lisa Shrestha
- Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- George Green
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- André Leier
- Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Tatiana T Marquez-Lago
- Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Department of Microbiology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
16
Cousins H, Hall T, Guo Y, Tso L, Tzeng KTH, Cong L, Altman RB. Gene set proximity analysis: expanding gene set enrichment analysis through learned geometric embeddings, with drug-repurposing applications in COVID-19. Bioinformatics 2023; 39:btac735. [PMID: 36394254 PMCID: PMC9805577 DOI: 10.1093/bioinformatics/btac735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Revised: 09/27/2022] [Accepted: 11/16/2022] [Indexed: 11/18/2022] Open
Abstract
MOTIVATION Gene set analysis methods rely on knowledge-based representations of genetic interactions in the form of both gene set collections and protein-protein interaction (PPI) networks. However, explicit representations of genetic interactions often fail to capture complex interdependencies among genes, limiting the analytic power of such methods. RESULTS We propose an extension of gene set enrichment analysis to a latent embedding space reflecting PPI network topology, called gene set proximity analysis (GSPA). Compared with existing methods, GSPA provides improved ability to identify disease-associated pathways in disease-matched gene expression datasets, while improving reproducibility of enrichment statistics for similar gene sets. GSPA is statistically straightforward, reducing to a version of traditional gene set enrichment analysis through a single user-defined parameter. We apply our method to identify novel drug associations with SARS-CoV-2 viral entry. Finally, we validate our drug association predictions through retrospective clinical analysis of claims data from 8 million patients, supporting a role for gabapentin as a risk factor and metformin as a protective factor for severe COVID-19. AVAILABILITY AND IMPLEMENTATION GSPA is available for download as a command-line Python package at https://github.com/henrycousins/gspa. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Henry Cousins
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
- Taryn Hall
- Optum Labs at UnitedHealth Group, Minneapolis, MN 55343, USA
- Yinglong Guo
- Optum Labs at UnitedHealth Group, Minneapolis, MN 55343, USA
- Luke Tso
- Optum Labs at UnitedHealth Group, Minneapolis, MN 55343, USA
- Kathy T H Tzeng
- Optum Labs at UnitedHealth Group, Minneapolis, MN 55343, USA
- Le Cong
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Russ B Altman
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
17
Liu Z, Gao J, Gu R, Shi Y, Hu H, Liu J, Huang J, Zhong C, Zhou W, Yang Y, Gong C. Comprehensive Analysis of Transcriptomics and Genetic Alterations Identifies Potential Mechanisms Underlying Anthracycline Therapy Resistance in Breast Cancer. Biomolecules 2022; 12:biom12121834. [PMID: 36551262 PMCID: PMC9775906 DOI: 10.3390/biom12121834] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/31/2022] [Revised: 12/01/2022] [Accepted: 12/06/2022] [Indexed: 12/14/2022] Open
Abstract
Anthracycline is a mainstay of treatment for breast cancer patients because of its antitumor activity. However, anthracycline resistance is a critical barrier in treating breast cancer. Thus, it is of great importance to uncover the molecular mechanisms underlying anthracycline resistance in breast cancer. Herein, we integrated transcriptome data, genetic alterations data, and clinical data of The Cancer Genome Atlas (TCGA) to identify the molecular mechanisms involved in anthracycline resistance in breast cancer. Two hundred and four upregulated genes and 1376 downregulated genes were characterized between the anthracycline-sensitive and anthracycline-resistant groups. It was found that drug resistance-associated genes such as ABCB5, CYP1A1, and CYP4Z1 were significantly upregulated in the anthracycline-resistant group. The gene set enrichment analysis (GSEA) suggested that the P53 signaling pathway, DNA replication, cysteine, and methionine metabolism pathways were associated with anthracycline sensitivity. Somatic TP53 mutation was a common genetic abnormality observed in the anthracycline-sensitive group, while CDH1 mutation was presented in the anthracycline-resistant group. Immune infiltration patterns were extremely different between the anthracycline-sensitive and anthracycline-resistant groups. Immune-associated chemokines and cytokines, immune regulators, and human leukocyte antigen genes were significantly upregulated in the anthracycline-sensitive group. These results reveal potential molecular mechanisms associated with anthracycline resistance.
Affiliation(s)
- Zihao Liu
- Breast Tumor Center, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510120, China
- Department of Breast and Thyroid Surgery, The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen People’s Hospital, Shenzhen 518020, China
- Jingbo Gao
- Breast Tumor Center, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510120, China
- Ran Gu
- Breast Tumor Center, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510120, China
- Yu Shi
- Breast Tumor Center, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510120, China
- Hong Hu
- Department of Breast and Thyroid Surgery, The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen People’s Hospital, Shenzhen 518020, China
- Jianlan Liu
- Department of Pathology, The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen People’s Hospital, Shenzhen 518020, China
- Jiefeng Huang
- Department of Breast and Thyroid Surgery, The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen People’s Hospital, Shenzhen 518020, China
- Caineng Zhong
- Department of Breast and Thyroid Surgery, The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen People’s Hospital, Shenzhen 518020, China
- Wenbin Zhou
- Department of Breast and Thyroid Surgery, The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen People’s Hospital, Shenzhen 518020, China
- Yaping Yang
- Breast Tumor Center, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510120, China
- Correspondence: (Y.Y.); or (C.G.)
- Chang Gong
- Breast Tumor Center, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510120, China
- Correspondence: (Y.Y.); or (C.G.)
18
Zhou CD, Pettersson A, Plym A, Tyekucheva S, Penney KL, Sesso HD, Kantoff PW, Mucci LA, Stopsack KH. Differences in Prostate Cancer Transcriptomes by Age at Diagnosis: Are Primary Tumors from Older Men Inherently Different? Cancer Prev Res (Phila) 2022; 15:815-825. [PMID: 36125434 PMCID: PMC9722523 DOI: 10.1158/1940-6207.capr-22-0212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 08/03/2022] [Accepted: 09/01/2022] [Indexed: 01/31/2023]
Abstract
Older age at diagnosis is consistently associated with worse clinical outcomes in prostate cancer. We sought to characterize gene expression profiles of prostate tumor tissue by age at diagnosis. We conducted a discovery analysis in The Cancer Genome Atlas prostate cancer dataset (n = 320; 29% of men >65 years at diagnosis), using linear regressions of age at diagnosis and mRNA expression and adjusting for TMPRSS2:ERG fusion status and race. This analysis identified 13 age-related candidate genes at FDR < 0.1, six of which were also found in an analysis additionally adjusted for Gleason score. We then validated the 13 age-related genes in a transcriptome study nested in the Health Professionals Follow-up Study and Physicians' Health Study (n = 374; 53% of men >65 years). Gene expression differences by age in the 13 candidate genes were directionally consistent, and age at diagnosis was weakly associated with the 13-gene score. However, the age-related genes were not consistently associated with risk of metastases and prostate cancer-specific death. Collectively, these findings argue against tumor genomic differences as a main explanation for age-related differences in prostate cancer prognosis. PREVENTION RELEVANCE Older age at diagnosis is consistently associated with worse clinical outcomes in prostate cancer. This study with independent discovery and validation sets and long-term follow-up suggests that prevention of lethal prostate cancer should focus on implementing appropriate screening, staging, and treatment among older men without expecting fundamentally different tumor biology.
Affiliation(s)
- Charlie D. Zhou
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Andreas Pettersson
- Clinical Epidemiology Division, Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden
- Anna Plym
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Department of Urology, Brigham and Women’s Hospital, Boston, MA, USA
- Svitlana Tyekucheva
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Kathryn L. Penney
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Howard D. Sesso
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Division of Preventative Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Philip W. Kantoff
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Convergent Therapeutics Inc., Cambridge, MA, USA
- Lorelei A. Mucci
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Konrad H. Stopsack
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
19
Zeng L, Yang K, Zhang T, Zhu X, Hao W, Chen H, Ge J. Research progress of single-cell transcriptome sequencing in autoimmune diseases and autoinflammatory disease: A review. J Autoimmun 2022; 133:102919. [PMID: 36242821 DOI: 10.1016/j.jaut.2022.102919] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/13/2022] [Revised: 09/16/2022] [Accepted: 09/19/2022] [Indexed: 12/07/2022]
Abstract
Autoimmunity refers to the phenomenon that the body's immune system produces antibodies or sensitized lymphocytes to its own tissues to cause an immune response. Immune disorders caused by autoimmunity can mediate autoimmune diseases. Autoimmune diseases have complicated pathogenesis due to the many types of cells involved, and the mechanism is still unclear. The emergence of single-cell research technology can solve the problem that ordinary transcriptome technology cannot be accurate to cell type. It provides unbiased results through independent analysis of cells in tissues and provides more mRNA information for identifying cell subpopulations, which provides a novel approach to study disruption of immune tolerance and disturbance of pro-inflammatory pathways on a cellular basis. It may fundamentally change the understanding of molecular pathways in the pathogenesis of autoimmune diseases and develop targeted drugs. Single-cell transcriptome sequencing (scRNA-seq) has been widely applied in autoimmune diseases, which provides a powerful tool for demonstrating the cellular heterogeneity of tissues involved in various immune inflammations, identifying pathogenic cell populations, and revealing the mechanism of disease occurrence and development. This review describes the principles of scRNA-seq, introduces common sequencing platforms and practical procedures, and focuses on the progress of scRNA-seq in 41 autoimmune diseases, which include 9 systemic autoimmune diseases and autoinflammatory diseases (rheumatoid arthritis, systemic lupus erythematosus, etc.) and 32 organ-specific autoimmune diseases (5 Skin diseases, 3 Nervous system diseases, 4 Eye diseases, 2 Respiratory system diseases, 2 Circulatory system diseases, 6 Liver, Gallbladder and Pancreas diseases, 2 Gastrointestinal system diseases, 3 Muscle, Bones and joint diseases, 3 Urinary system diseases, 2 Reproductive system diseases). 
This review also prospects the molecular mechanism targets of autoimmune diseases from the multi-molecular level and multi-dimensional analysis combined with single-cell multi-omics sequencing technology (such as scRNA-seq, Single cell ATAC-seq and single cell immune group library sequencing), which provides a reference for further exploring the pathogenesis and marker screening of autoimmune diseases and autoimmune inflammatory diseases in the future.
Affiliation(s)
- Liuting Zeng
- Department of Rheumatology, Peking Union Medical College Hospital, Chinese Academy of Medical Science & Peking Union Medical College, National Clinical Research Center for Dermatologic and Immunologic Diseases, State Key Laboratory of Complex Severe and Rare Diseases, Beijing, China.
- Kailin Yang
- Key Laboratory of Hunan Province for Integrated Traditional Chinese and Western Medicine on Prevention and Treatment of Cardio-Cerebral Diseases, Hunan University of Chinese Medicine, Changsha, China.
- Tianqing Zhang
- Key Laboratory of Hunan Province for Integrated Traditional Chinese and Western Medicine on Prevention and Treatment of Cardio-Cerebral Diseases, Hunan University of Chinese Medicine, Changsha, China
- Xiaofei Zhu
- Key Laboratory of Hunan Province for Integrated Traditional Chinese and Western Medicine on Prevention and Treatment of Cardio-Cerebral Diseases, Hunan University of Chinese Medicine, Changsha, China.
- Wensa Hao
- Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Hua Chen
- Department of Rheumatology, Peking Union Medical College Hospital, Chinese Academy of Medical Science & Peking Union Medical College, National Clinical Research Center for Dermatologic and Immunologic Diseases, State Key Laboratory of Complex Severe and Rare Diseases, Beijing, China.
- Jinwen Ge
- Key Laboratory of Hunan Province for Integrated Traditional Chinese and Western Medicine on Prevention and Treatment of Cardio-Cerebral Diseases, Hunan University of Chinese Medicine, Changsha, China; Hunan Academy of Chinese Medicine, Changsha, China.
|
20
|
Wieder C, Lai RPJ, Ebbels TMD. Single sample pathway analysis in metabolomics: performance evaluation and application. BMC Bioinformatics 2022; 23:481. [PMID: 36376837 PMCID: PMC9664704 DOI: 10.1186/s12859-022-05005-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/22/2022] [Accepted: 10/25/2022] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND Single sample pathway analysis (ssPA) transforms molecular level omics data to the pathway level, enabling the discovery of patient-specific pathway signatures. Compared to conventional pathway analysis, ssPA overcomes the limitations by enabling multi-group comparisons, alongside facilitating numerous downstream analyses such as pathway-based machine learning. While in transcriptomics ssPA is a widely used technique, there is little literature evaluating its suitability for metabolomics. Here we provide a benchmark of established ssPA methods (ssGSEA, GSVA, SVD (PLAGE), and z-score) alongside the evaluation of two novel methods we propose: ssClustPA and kPCA, using semi-synthetic metabolomics data. We then demonstrate how ssPA can facilitate pathway-based interpretation of metabolomics data by performing a case-study on inflammatory bowel disease mass spectrometry data, using clustering to determine subtype-specific pathway signatures. RESULTS While GSEA-based and z-score methods outperformed the others in terms of recall, clustering/dimensionality reduction-based methods provided higher precision at moderate-to-high effect sizes. A case study applying ssPA to inflammatory bowel disease data demonstrates how these methods yield a much richer depth of interpretation than conventional approaches, for example by clustering pathway scores to visualise a pathway-based patient subtype-specific correlation network. We also developed the sspa Python package (freely available at https://pypi.org/project/sspa/), providing implementations of all the methods benchmarked in this study. CONCLUSION This work underscores the value ssPA methods can add to metabolomic studies and provides a useful reference for those wishing to apply ssPA methods to metabolomics data.
Affiliation(s)
- Cecilia Wieder
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, UK
| | - Rachel P J Lai
- Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, UK
| | - Timothy M D Ebbels
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, UK.
|
21
|
Jiménez‐Santos MJ, García‐Martín S, Fustero‐Torre C, Di Domenico T, Gómez‐López G, Al‐Shahrour F. Bioinformatics roadmap for therapy selection in cancer genomics. Mol Oncol 2022; 16:3881-3908. [PMID: 35811332 PMCID: PMC9627786 DOI: 10.1002/1878-0261.13286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 06/22/2022] [Accepted: 07/08/2022] [Indexed: 12/24/2022] Open
Abstract
Tumour heterogeneity is one of the main characteristics of cancer and can be categorised into inter- or intratumour heterogeneity. This heterogeneity has been revealed as one of the key causes of treatment failure and relapse. Precision oncology is an emerging field that seeks to design tailored treatments for each cancer patient according to epidemiological, clinical and omics data. This discipline relies on bioinformatics tools designed to compute scores to prioritise available drugs, with the aim of helping clinicians in treatment selection. In this review, we describe the current approaches for therapy selection depending on which type of tumour heterogeneity is being targeted and the available next-generation sequencing data. We cover intertumour heterogeneity studies and individual treatment selection using genomics variants, expression data or multi-omics strategies. We also describe intratumour dissection through clonal inference and single-cell transcriptomics, in each case providing bioinformatics tools for tailored treatment selection. Finally, we discuss how these therapy selection workflows could be integrated into the clinical practice.
Affiliation(s)
| | | | - Coral Fustero‐Torre
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Tomás Di Domenico
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Gonzalo Gómez‐López
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Fátima Al‐Shahrour
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
|
22
|
Mishra BH, Sievänen H, Raitoharju E, Mononen N, Viikari J, Juonala M, Laaksonen M, Hutri-Kähönen N, Kähönen M, Raitakari OT, Lehtimäki T, Mishra PP. Gene set analysis of transcriptomics data identifies new biological processes associated with early markers of atherosclerosis but not with those of osteoporosis: Atherosclerosis-osteoporosis co/multimorbidity study in the Young Finns Study. Atherosclerosis 2022; 361:1-9. [PMID: 36252457 DOI: 10.1016/j.atherosclerosis.2022.10.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2022] [Revised: 10/06/2022] [Accepted: 10/06/2022] [Indexed: 12/15/2022]
Abstract
AIM We aimed to identify the shared biological processes underlying atherosclerosis-osteoporosis co/multimorbidity. METHODS We performed gene set analysis (GSA) of whole-blood transcriptomic data to identify biological processes shared by the early markers of these two diseases. Early markers of the diseases, carotid intima-media thickness (CIMT) for atherosclerosis and trabecular bone mineral density (BMD) from the distal radius and tibia for osteoporosis, were used to categorize the study participants into cases and controls. Participants with high CIMT (>90th percentile) were defined as cases for subclinical atherosclerosis. Study population-based T-scores for BMD were calculated, and a T-score ≤ -1 was used to define low BMD cases, i.e., an early indicator of osteoporosis. RESULTS We did not identify any gene sets jointly associated with early markers of atherosclerosis and osteoporosis. We identified three novel and replicated 234 gene sets significantly associated with high CIMT at a false discovery rate (FDR) ≤ 0.01. Only two genes, both related to the immune system, were identified as associated with high CIMT by traditional differential gene expression analysis. However, none of the studied gene sets or individual genes were significantly associated with tibial or radial BMD. The three novel CIMT-associated gene sets contained genes involved in copper homeostasis, neural crest cell migration and nicotinate and nicotinamide metabolism. The 234 replicated gene sets in this study are related to the immune system, hypoxia and apoptosis, consistent with the existing literature on atherosclerosis. CONCLUSIONS This study identified novel biological processes associated with high CIMT but not with reduced BMD.
Affiliation(s)
- Binisha H Mishra
- Department of Clinical Chemistry, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; Department of Clinical Chemistry, Fimlab Laboratories, Tampere, Finland.
| | - Harri Sievänen
- The UKK Institute for Health Promotion Research, Tampere, Finland
| | - Emma Raitoharju
- Molecular Epidemiology, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; Tampere University Hospital, Tampere, Finland
| | - Nina Mononen
- Department of Clinical Chemistry, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; Department of Clinical Chemistry, Fimlab Laboratories, Tampere, Finland
| | - Jorma Viikari
- Department of Medicine, University of Turku, Turku, Finland; Division of Medicine, Turku University Hospital, Turku, Finland
| | - Markus Juonala
- Department of Medicine, University of Turku, Turku, Finland; Division of Medicine, Turku University Hospital, Turku, Finland
| | | | - Nina Hutri-Kähönen
- Department of Paediatrics, Tampere University Hospital, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
| | - Mika Kähönen
- Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; Department of Clinical Physiology, Tampere University Hospital, Tampere, Finland
| | - Olli T Raitakari
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland; Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku, Finland; Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
| | - Terho Lehtimäki
- Department of Clinical Chemistry, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; Department of Clinical Chemistry, Fimlab Laboratories, Tampere, Finland
| | - Pashupati P Mishra
- Department of Clinical Chemistry, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; Department of Clinical Chemistry, Fimlab Laboratories, Tampere, Finland
|
23
|
Makrooni MA, O’Shea D, Geeleher P, Seoighe C. Random-effects meta-analysis of effect sizes as a unified framework for gene set analysis. PLoS Comput Biol 2022; 18:e1010278. [PMID: 36197939 PMCID: PMC9576052 DOI: 10.1371/journal.pcbi.1010278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Revised: 10/17/2022] [Accepted: 09/18/2022] [Indexed: 11/06/2022] Open
Abstract
Gene set analysis (GSA) remains a common step in genome-scale studies because it can reveal insights that are not apparent from results obtained for individual genes. Many different computational tools are applied for GSA, which may be sensitive to different types of signals; however, most methods implicitly test whether there are differences in the distribution of the effect of some experimental condition between genes in gene sets of interest. We have developed a unifying framework for GSA that first fits effect size distributions, and then tests for differences in these distributions between gene sets. These differences can be in the proportions of genes that are perturbed or in the sign or size of the effects. Inspired by statistical meta-analysis, we take into account the uncertainty in effect size estimates by reducing the influence of genes with greater uncertainty on the estimation of distribution parameters. We demonstrate, using simulation and by application to real data, that this approach provides significant gains in performance over existing methods. Furthermore, the statistical tests carried out are defined in terms of effect sizes, rather than the results of prior statistical tests measuring these changes, which leads to improved interpretability and greater robustness to variation in sample sizes. The role of gene set analysis is to identify groups of genes that are perturbed in a genomics experiment. There are many tools available for this task and they do not all test for the same types of changes. Here we propose a new way to carry out gene set analysis that involves first working out the distribution of the group effect in the gene set and then comparing this distribution to the equivalent distribution in other genes. Tests performed by existing tools for gene set analysis can be related to different comparisons in these distributions of group effects. 
A unified framework for gene set analysis provides for more explicit null hypotheses against which to test sets of genes for different types of responses to the experimental conditions. These results are more interpretable, because the group effect distributions can be compared visually, providing an indication of how the experimental effect differs between the gene sets.
Affiliation(s)
- Mohammad A. Makrooni
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
| | - Dónal O’Shea
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
| | - Paul Geeleher
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, Tennessee, United States of America
| | - Cathal Seoighe
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
|
24
|
Lee AJ, Mould DL, Crawford J, Hu D, Powers RK, Doing G, Costello JC, Hogan DA, Greene CS. SOPHIE: Generative Neural Networks Separate Common and Specific Transcriptional Responses. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:912-927. [PMID: 36216026 PMCID: PMC10025681 DOI: 10.1016/j.gpb.2022.09.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/22/2022] [Revised: 09/09/2022] [Accepted: 09/30/2022] [Indexed: 11/06/2022]
Abstract
Genome-wide transcriptome profiling identifies genes that are prone to differential expression (DE) across contexts, as well as genes with changes specific to the experimental manipulation. Distinguishing genes that are specifically changed in a context of interest from common differentially expressed genes (DEGs) allows more efficient prediction of which genes are specific to a given biological process under scrutiny. Currently, common DEGs or pathways can only be identified through the laborious manual curation of experiments, an inordinately time-consuming endeavor. Here we pioneer an approach, Specific cOntext Pattern Highlighting In Expression data (SOPHIE), for distinguishing between common and specific transcriptional patterns using a generative neural network to create a background set of experiments from which a null distribution of gene and pathway changes can be generated. We apply SOPHIE to diverse datasets including those from human, human cancer, and bacterial pathogen Pseudomonas aeruginosa. SOPHIE identifies common DEGs in concordance with previously described, manually and systematically determined common DEGs. Further molecular validation indicates that SOPHIE detects highly specific but low-magnitude biologically relevant transcriptional changes. SOPHIE's measure of specificity can complement log2 fold change values generated from traditional DE analyses. For example, by filtering the set of DEGs, one can identify genes that are specifically relevant to the experimental condition of interest. Consequently, these results can inform future research directions. All scripts used in these analyses are available at https://github.com/greenelab/generic-expression-patterns. Users can access https://github.com/greenelab/sophie to run SOPHIE on their own data.
Affiliation(s)
- Alexandra J Lee
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Dallas L Mould
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - Jake Crawford
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Dongbo Hu
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Rani K Powers
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
| | - Georgia Doing
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - James C Costello
- Department of Pharmacology, University of Colorado School of Medicine, Denver, CO 80045, USA
| | - Deborah A Hogan
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA; Center for Health AI, University of Colorado School of Medicine, Denver, CO 80045, USA; Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Denver, CO 80045, USA.
|
25
|
Datasets for gene expression profiles of head and neck squamous cell carcinoma and lung cancer treated or not by PD1/PD-L1 inhibitors. Data Brief 2022; 44:108556. [PMID: 36111282 PMCID: PMC9467865 DOI: 10.1016/j.dib.2022.108556] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/03/2022] [Revised: 08/19/2022] [Accepted: 08/22/2022] [Indexed: 11/22/2022] Open
Abstract
Identification of tumors harboring an overall active immune phenotype may help select patients with advanced head and neck squamous cell carcinomas (HNSCC) and non-small cell lung cancer (NSCLC) who may benefit from immunotherapies. In this context, we generated targeted gene expression profiles in three and two independent cohorts of patients with HNSCC or NSCLC, respectively, treated or not with PD-1/PD-L1 inhibitors. Notably, we generated two datasets including 102 and 82 patients with HNSCC or NSCLC treated with PD-1/PD-L1 inhibitors. Clinical information, including detailed raw survival data, is available for each patient, making it possible to test associations between gene expression data and patient survival (overall and progression-free survival). Moreover, we also generated gene expression datasets of 27 paired HNSCC samples from diagnostic biopsies versus surgically resected specimens, as well as 33 paired HNSCC samples at initial diagnosis (untreated) and at recurrence. These datasets may make it possible to test the stability of a given biomarker across paired samples.
|
26
|
Xu S, Chen Z, Ge L, Ma C, He Q, Liu W, Zhang L, Zhou L. Identification of potential biomarkers and pathogenesis in neutrophil-predominant severe asthma: A comprehensive bioinformatics analysis. Medicine (Baltimore) 2022; 101:e30661. [PMID: 36197221 PMCID: PMC9509178 DOI: 10.1097/md.0000000000030661] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Airway neutrophilia has been associated with asthma severity and asthma exacerbations. This study attempted to identify biomarkers, pathogenesis, and therapeutic molecular targets for severe asthma in neutrophils using bioinformatics analysis. METHODS Fifteen healthy controls and 3 patients with neutrophilic severe asthma were screened from the Gene Expression Omnibus (GEO) database. Based on the analysis of differentially expressed genes (DEGs), functional and pathway enrichment analyses, gene set enrichment analysis, and protein-protein interaction network construction and analysis were performed. Moreover, small-molecule drug candidates were also identified. RESULTS Three hundred and three upregulated and 59 downregulated genes were identified. Gene ontology enrichment analyses primarily implicated inflammatory response, immune response, leukocyte migration, neutrophil chemotaxis, mitogen-activated protein kinase cascade, Jun N-terminal kinase cascade, I-kappaB kinase/nuclear factor-κB, and MyD88-dependent toll-like receptor signaling pathway. Pathway enrichment analyses and gene set enrichment analysis mainly implicated cytokine-cytokine receptor interaction, the TNF signaling pathway, leukocyte transendothelial migration, and the NOD-like receptor signaling pathway. Furthermore, 1 important module and 10 hub genes (CXCL8, TLR2, CXCL1, ICAM1, CXCR4, FPR2, SELL, PTEN, TREM1, and LEP) were identified in the protein-protein interaction network. Moreover, indoprofen, mimosine, STOCK1N-35874, trapidil, iloprost, aminoglutethimide, ajmaline, levobunolol, ethionamide, cefaclor, dimenhydrinate, and bethanechol are potential drugs for the treatment of neutrophil-predominant severe asthma. CONCLUSION This study identified potential biomarkers, pathogenesis, and therapeutic molecular targets for neutrophil-predominant severe asthma.
Affiliation(s)
- Shuanglan Xu
- Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Zi Chen
- Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Linyang Ge
- Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Chenhui Ma
- Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Quan He
- Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Weihua Liu
- Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Liuchao Zhang
- Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Linfu Zhou
- Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Institute of Integrative Medicine, Nanjing Medical University, Nanjing, Jiangsu, China
- *Correspondence: Linfu Zhou, Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital, Nanjing Medical University, 300 Guangzhou Road, Nanjing, Jiangsu 210029, China
|
27
|
Zhong H, Wang Z, Wei X, Liu Y, Huang X, Mo X, Tang W. Prognostic and immunological role of SERPINH1 in pan-cancer. Front Genet 2022; 13:900495. [PMID: 36105106 PMCID: PMC9465257 DOI: 10.3389/fgene.2022.900495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2022] [Accepted: 07/05/2022] [Indexed: 11/13/2022] Open
Abstract
Background: The SERPINH1 gene plays a vital part in tumorigenesis and development, whereas its potential as an immunotherapy target is still unknown. Hence, this research aimed to probe the roles of SERPINH1 in human tumors. Method: Using The Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx) database, Oncomine, and SangerBox software, the pan-cancer expression of SERPINH1 and its correlation were systematically analyzed. SERPINH1 protein information was obtained from the Human Protein Atlas (HPA) database and STRING database. The genomic alterations of SERPINH1 were studied using the cBioPortal database. The influence of SERPINH1 on prognosis was analyzed using Kaplan–Meier plotter. The R package “clusterProfiler” was used for enrichment analysis to detect the role of SERPINH1. The TIMER2 database was used to further analyze the correlation between the immune cell infiltration score of TCGA samples and the expression of SERPINH1. Results: SERPINH1 overexpression was related to worse survival status in pan-cancer. In addition, high expression of SERPINH1 was positively associated with tumor stage and poor prognosis. Moreover, SERPINH1 played an important role in tumor microenvironment and immune regulation. Our study revealed that SERPINH1 expression has a strong correlation with immune cell infiltration, immune regulation, chemokines, and immune checkpoints. Conclusion: Our research found that SERPINH1 was a risk factor and predictor of poor prognosis in various tumors. High expression of SERPINH1 may contribute to tumor immune-suppressive status. Also, SERPINH1 may become a potential immunotherapy target in pan-cancer.
Affiliation(s)
- Huage Zhong
- Division of Colorectal and Anal Surgery, Department of Gastrointestinal Surgery, Guangxi Medical University Cancer Hospital, Nanning, China
- Guangxi Clinical Research Center for Colorectal Cancer, Nanning, China
| | - Zheng Wang
- Centre of Imaging Diagnosis, Affiliated Tumor Hospital of Guangxi Medical University, Nanning, China
| | - Xiaoxia Wei
- Division of Colorectal and Anal Surgery, Department of Gastrointestinal Surgery, Guangxi Medical University Cancer Hospital, Nanning, China
- Guangxi Clinical Research Center for Colorectal Cancer, Nanning, China
| | - Yaning Liu
- Division of Colorectal and Anal Surgery, Department of Gastrointestinal Surgery, Guangxi Medical University Cancer Hospital, Nanning, China
- Guangxi Clinical Research Center for Colorectal Cancer, Nanning, China
| | - Xiaoliang Huang
- Division of Colorectal and Anal Surgery, Department of Gastrointestinal Surgery, Guangxi Medical University Cancer Hospital, Nanning, China
- Guangxi Clinical Research Center for Colorectal Cancer, Nanning, China
- *Correspondence: Weizhong Tang; Xiaoliang Huang
| | - Xianwei Mo
- Division of Colorectal and Anal Surgery, Department of Gastrointestinal Surgery, Guangxi Medical University Cancer Hospital, Nanning, China
- Guangxi Clinical Research Center for Colorectal Cancer, Nanning, China
| | - Weizhong Tang
- Division of Colorectal and Anal Surgery, Department of Gastrointestinal Surgery, Guangxi Medical University Cancer Hospital, Nanning, China
- Guangxi Clinical Research Center for Colorectal Cancer, Nanning, China
- *Correspondence: Weizhong Tang; Xiaoliang Huang
|
28
|
Kagiwada H, Motono C, Horimoto K, Fukui K. Phosprof: pathway analysis database of drug response based on phosphorylation activity measurements. Database (Oxford) 2022; 2022:baac072. [PMID: 35994309 PMCID: PMC9394491 DOI: 10.1093/database/baac072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/15/2022] [Revised: 07/19/2022] [Accepted: 08/17/2022] [Indexed: 06/15/2023]
Abstract
Protein phosphorylation plays a fundamental role in many cellular processes. Proteins are phosphorylated by kinases, which have been studied as drug targets for the treatment of various diseases, particularly cancer. Because kinases have multiple roles in interconnected molecular pathways, their specific regulation is required to enhance the beneficial effects of drugs and reduce their adverse effects. Using our previously developed platform, we measured phosphorylation profiles of MCF7 and K562 cells treated with 94 clinical drugs. These phosphorylation profiles can provide insights into pathway activities and biological functions. Here, we introduce Phosprof, a novel database of drug response based on phosphorylation activity. Phosprof is able to present up- or downregulated phosphorylated signature proteins on pathway maps, significant pathways on the hierarchical tree in signal transduction and commonly perturbed pathways affected by the selected drugs. It also serves as a useful web interface for searching new or known drug profiles based on their molecular similarity with the 94 drugs. Phosprof can be helpful for further investigation of drug responses in terms of phosphorylation by utilizing the various approved drugs whose target phenotypes are known. DATABASE URL: https://phosprof.medals.jp/.
Affiliation(s)
- Harumi Kagiwada
- *Corresponding author: Tel: +81 3 5501 1017; Fax: +81 3 5530 2061; Correspondence may also be addressed to Kazuhiko Fukui. Tel: +81 3 3599 8667; Fax: +81 3 5530 2061;
| | - Chie Motono
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology, 2-4-7, Aomi Koto-ku, Tokyo, Japan
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Katsuhisa Horimoto
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7, Aomi Koto-ku, Tokyo, Japan
| | - Kazuhiko Fukui
- *Corresponding author: Tel: +81 3 5501 1017; Fax: +81 3 5530 2061; Correspondence may also be addressed to Kazuhiko Fukui. Tel: +81 3 3599 8667; Fax: +81 3 5530 2061;
|
29
|
Androulakis IP. Towards a comprehensive assessment of QSP models: what would it take? J Pharmacokinet Pharmacodyn 2022:10.1007/s10928-022-09820-0. [PMID: 35962928 PMCID: PMC9922790 DOI: 10.1007/s10928-022-09820-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/06/2022] [Accepted: 07/15/2022] [Indexed: 10/15/2022]
Abstract
Quantitative Systems Pharmacology (QSP) has emerged as a powerful ensemble of approaches aiming at developing integrated mathematical and computational models elucidating the complex interactions between pharmacology, physiology, and disease. As the field grows and matures, its applications expand beyond the boundaries of research and development and slowly enter the decision making and regulatory arenas. However, widespread acceptance and eventual adoption of a new modeling approach require assessment criteria and quantifiable metrics that establish credibility and increase confidence in model predictions. QSP aims to provide an integrated understanding of pathology in the context of therapeutic interventions. Because of its ambitious nature and the fact that QSP emerged in an uncoordinated manner as a result of activities distributed across organizations and academic institutions, high entropy characterizes the tools, methods, and computational methodologies and approaches used. The eventual acceptance of QSP model predictions as supporting material for an application to a regulatory agency will require that two key aspects are considered: (1) increased confidence in the QSP framework, which drives standardization and assessment; and (2) careful articulation of the expectations. Both rely heavily on our ability to rigorously and consistently assess QSP models. In this manuscript, we wish to discuss the meaning and purpose of such an assessment in the context of QSP model development and elaborate on the differentiating features of QSP that render such an endeavor challenging. We argue that QSP establishes a conceptual, integrative framework rather than a specific and well-defined computational methodology. QSP elicits the use of a wide variety of modeling and computational methodologies optimized with respect to specific applications and available data modalities, which exceed the data structures employed by chemometrics and PK/PD models.
While the range of options fosters creativity and promises to substantially advance our ability to design pharmaceutical interventions rationally and optimally, our expectations of QSP models need to be clearly articulated and agreed on, with assessment emphasizing the scope of QSP studies rather than the methods used. Nevertheless, QSP should not be considered an independent approach, rather one of many in the broader continuum of computational models.
Affiliation(s)
- Ioannis P Androulakis
  - Biomedical Engineering Department and Chemical & Biochemical Engineering Department, Rutgers, The State University of New Jersey, New Brunswick, USA.
|
30
|
Oh S, Geistlinger L, Ramos M, Blankenberg D, van den Beek M, Taroni JN, Carey VJ, Greene CS, Waldron L, Davis S. GenomicSuperSignature facilitates interpretation of RNA-seq experiments through robust, efficient comparison to public databases. Nat Commun 2022; 13:3695. [PMID: 35760813 PMCID: PMC9237024 DOI: 10.1038/s41467-022-31411-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2021] [Accepted: 06/14/2022] [Indexed: 02/04/2023] Open
Abstract
Millions of transcriptomic profiles have been deposited in public archives, yet remain underused for the interpretation of new experiments. We present a method for interpreting new transcriptomic datasets through instant comparison to public datasets without high-performance computing requirements. We apply Principal Component Analysis on 536 studies comprising 44,890 human RNA sequencing profiles and aggregate sufficiently similar loading vectors to form Replicable Axes of Variation (RAV). RAVs are annotated with metadata of originating studies and by gene set enrichment analysis. Functionality to associate new datasets with RAVs, extract interpretable annotations, and provide intuitive visualization are implemented as the GenomicSuperSignature R/Bioconductor package. We demonstrate the efficient and coherent database search, robustness to batch effects and heterogeneous training data, and transfer learning capacity of our method using TCGA and rare diseases datasets. GenomicSuperSignature aids in analyzing new gene expression data in the context of existing databases using minimal computing resources.
Affiliation(s)
- Sehyun Oh
  - Graduate School of Public Health and Health Policy and Institute for Implementation Sciences in Public Health, City University of New York, New York, NY, USA
- Ludwig Geistlinger
  - Center for Computational Biomedicine, Harvard Medical School, Boston, MA, USA
- Marcel Ramos
  - Graduate School of Public Health and Health Policy and Institute for Implementation Sciences in Public Health, City University of New York, New York, NY, USA
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Marius van den Beek
  - The Pennsylvania State University, State College, PA, USA
- Jaclyn N. Taroni
  - Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA, USA
- Vincent J. Carey
  - Channing Division of Network Medicine, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Casey S. Greene
  - Center for Health AI, University of Colorado Anschutz School of Medicine, Denver, CO, USA
- Levi Waldron
  - Graduate School of Public Health and Health Policy and Institute for Implementation Sciences in Public Health, City University of New York, New York, NY, USA
- Sean Davis
  - Center for Health AI, University of Colorado Anschutz School of Medicine, Denver, CO, USA
|
31
|
Cerulo L, Pagnotta SM. massiveGST: A Mann-Whitney-Wilcoxon Gene-Set Test Tool That Gives Meaning to Gene-Set Enrichment Analysis. ENTROPY 2022; 24:e24050739. [PMID: 35626622 PMCID: PMC9140214 DOI: 10.3390/e24050739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/11/2022] [Revised: 05/16/2022] [Accepted: 05/19/2022] [Indexed: 01/27/2023]
Abstract
Gene-set enrichment analysis is the key methodology for obtaining biological information from the statistical results of transcriptomic analyses. Since its introduction, gene-set enrichment analysis methods have obtained more reliable results and a wider range of application. Great attention has been devoted to global tests, in contrast to competitive methods that have been largely ignored, although they appear more flexible because they are independent from the source of gene-profiles. We analyzed the properties of the Mann–Whitney–Wilcoxon test, a competitive method, and adapted its interpretation in the context of enrichment analysis by introducing a Normalized Enrichment Score that summarizes two interpretations: a probability estimate and a location index. Two implementations are presented and compared with relevant literature methods: an R package and an online web tool. Both allow for obtaining tabular and graphical results with attention to reproducible research.
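The "probability estimate" reading of the Mann–Whitney–Wilcoxon statistic mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual interpretation of U divided by the number of (in-set, out-of-set) gene pairs as the estimated probability that a gene in the set outranks a gene outside it. The function name `mww_probability` and the toy gene scores are hypothetical, and this is not the massiveGST implementation or its Normalized Enrichment Score.

```python
def mww_probability(scores, gene_set):
    """Estimate P(score of an in-set gene > score of an out-of-set gene).

    scores   : dict mapping gene name -> per-gene statistic (e.g. a t-value)
    gene_set : set of gene names forming the gene set under test
    Ties count as half a win, as in the standard Mann-Whitney U statistic.
    """
    inside = [s for g, s in scores.items() if g in gene_set]
    outside = [s for g, s in scores.items() if g not in gene_set]
    if not inside or not outside:
        raise ValueError("both the gene set and its complement must be non-empty")

    u = 0.0
    for a in inside:
        for b in outside:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    # U / (n_in * n_out) lies in [0, 1]; 0.5 means no shift in location.
    return u / (len(inside) * len(outside))


# Hypothetical toy data, for illustration only.
toy_scores = {"g1": 3.0, "g2": 2.0, "g3": 1.0, "g4": 0.0}
top_heavy = mww_probability(toy_scores, {"g1", "g2"})
balanced = mww_probability(toy_scores, {"g1", "g4"})
```

A value near 1 suggests the set sits at the top of the ranking, a value near 0 at the bottom; the quadratic pair loop is kept for readability and would normally be replaced by a rank-sum computation for large gene lists.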
Affiliation(s)
- Luigi Cerulo
  - Department of Science and Technology, Università degli Studi del Sannio, 82100 Benevento, Italy
  - Bioinformatics Lab, Biogem, Molecular Biology and Genetics Research Institute, 83031 Ariano Irpino, Italy
- Stefano Maria Pagnotta
  - Department of Science and Technology, Università degli Studi del Sannio, 82100 Benevento, Italy
|
32
|
Mubeen S, Tom Kodamullil A, Hofmann-Apitius M, Domingo-Fernández D. On the influence of several factors on pathway enrichment analysis. Brief Bioinform 2022; 23:bbac143. [PMID: 35453140 PMCID: PMC9116215 DOI: 10.1093/bib/bbac143] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2022] [Revised: 03/21/2022] [Accepted: 03/30/2022] [Indexed: 02/01/2023] Open
Abstract
Pathway enrichment analysis has become a widely used knowledge-based approach for the interpretation of biomedical data. Its popularity has led to an explosion of both enrichment methods and pathway databases. While the elegance of pathway enrichment lies in its simplicity, multiple factors can impact the results of such an analysis, which may not be accounted for. Researchers may fail to give influential aspects their due, resorting instead to popular methods and gene set collections, or default settings. Despite ongoing efforts to establish set guidelines, meaningful results are still hampered by a lack of consensus or gold standards around how enrichment analysis should be conducted. Nonetheless, such concerns have prompted a series of benchmark studies specifically focused on evaluating the influence of various factors on pathway enrichment results. In this review, we organize and summarize the findings of these benchmarks to provide a comprehensive overview on the influence of these factors. Our work covers a broad spectrum of factors, spanning from methodological assumptions to those related to prior biological knowledge, such as pathway definitions and database choice. In doing so, we aim to shed light on how these aspects can lead to insignificant, uninteresting or even contradictory results. Finally, we conclude the review by proposing future benchmarks as well as solutions to overcome some of the challenges, which originate from the outlined factors.
Affiliation(s)
- Sarah Mubeen
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
  - Fraunhofer Center for Machine Learning, Germany
- Alpha Tom Kodamullil
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Martin Hofmann-Apitius
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
- Daniel Domingo-Fernández
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Fraunhofer Center for Machine Learning, Germany
  - Enveda Biosciences, Boulder, CO 80301, USA
|
33
|
Nguyen QP, Hoen AG, Frost HR. CBEA: Competitive balances for taxonomic enrichment analysis. PLoS Comput Biol 2022; 18:e1010091. [PMID: 35584140 PMCID: PMC9154102 DOI: 10.1371/journal.pcbi.1010091] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2021] [Revised: 05/31/2022] [Accepted: 04/08/2022] [Indexed: 12/15/2022] Open
Abstract
Research in human-associated microbiomes often involves the analysis of taxonomic count tables generated via high-throughput sequencing. It is difficult to apply statistical tools as the data is high-dimensional, sparse, and compositional. An approachable way to alleviate high-dimensionality and sparsity is to aggregate variables into pre-defined sets. Set-based analysis is ubiquitous in the genomics literature and has demonstrable impact on improving interpretability and power of downstream analysis. Unfortunately, there is a lack of sophisticated set-based analysis methods specific to microbiome taxonomic data, where current practice often employs abundance summation as a technique for aggregation. This approach prevents comparison across sets of different sizes, does not preserve inter-sample distances, and amplifies protocol bias. Here, we attempt to fill this gap with a new single-sample taxon enrichment method that uses a novel log-ratio formulation based on the competitive null hypothesis commonly used in the enrichment analysis literature. Our approach, titled competitive balances for taxonomic enrichment analysis (CBEA), generates sample-specific enrichment scores as the scaled log-ratio of the subcomposition defined by taxa within a set and the subcomposition defined by its complement. We provide sample-level significance testing by estimating an empirical null distribution of our test statistic with valid p-values. Herein, we demonstrate, using both real data applications and simulations, that CBEA controls for type I error, even under high sparsity and high inter-taxa correlation scenarios. Additionally, CBEA provides informative scores that can be inputs to downstream analyses such as prediction tasks.
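The score described in this abstract, a scaled log-ratio between the taxa inside a set and the taxa in its complement, can be sketched for a single sample as follows. The scaling factor used here is the standard isometric log-ratio balance coefficient, and the pseudocount for zero counts is an assumption added for the illustration; neither is confirmed by the abstract. The function name `balance_score` and the toy counts are hypothetical, and the empirical null distribution used by CBEA for significance testing is not reproduced.

```python
import math


def balance_score(counts, taxon_set, pseudocount=1.0):
    """Scaled log-ratio of geometric means: taxa in the set vs. its complement.

    counts      : dict mapping taxon name -> count in one sample
    taxon_set   : set of taxon names defining the set of interest
    pseudocount : added to every count so zeros have a finite logarithm
                  (an assumption of this sketch)
    """
    inside = [c + pseudocount for t, c in counts.items() if t in taxon_set]
    outside = [c + pseudocount for t, c in counts.items() if t not in taxon_set]
    if not inside or not outside:
        raise ValueError("both the set and its complement must be non-empty")

    # The log of a geometric mean is the arithmetic mean of the logs.
    mean_log_in = sum(math.log(x) for x in inside) / len(inside)
    mean_log_out = sum(math.log(x) for x in outside) / len(outside)

    n_in, n_out = len(inside), len(outside)
    scale = math.sqrt(n_in * n_out / (n_in + n_out))  # ILR balance coefficient
    return scale * (mean_log_in - mean_log_out)


# Hypothetical sample: the set {"a", "b"} dominates the composition.
sample = {"a": 9, "b": 9, "c": 0, "d": 0}
score = balance_score(sample, {"a", "b"})
flat = balance_score({"a": 5, "b": 5, "c": 5, "d": 5}, {"a", "b"})
```

Because the score depends only on ratios, multiplying every (pseudocount-adjusted) value by a constant leaves it unchanged, which is the property that makes log-ratio scores suitable for compositional data.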
Affiliation(s)
- Quang P. Nguyen
  - Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, United States of America
  - Department of Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, United States of America
- Anne G. Hoen
  - Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, United States of America
  - Department of Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, United States of America
- H. Robert Frost
  - Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, United States of America
|
34
|
Carballido JA, Cecchini RL. Differential gene expression in cancer: An overrated analysis? Curr Bioinform 2022. [DOI: 10.2174/1574893617666220422134525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
The search for marker genes associated with different pathologies traditionally begins with some form of differential expression analysis. This step is essential in most functional genomics studies that analyze gene expression data. In the present article, we present a different analysis, starting from the known biological significance of different groups of genes and then assessing the proportion of differentially expressed genes. The analysis is performed in the context of cancer expression data to unveil the true importance of differential expression, approaching it from different research objectives. First, the percentage of differentially expressed genes was found to be generally low in gene sets annotated in KEGG. On the other hand, in the training and prediction process of both statistical and machine learning models, using differentially expressed genes sustainably improves their results.
Affiliation(s)
- Carballido Jessica A.
  - Department of CS and Engineering - Institute for CS and Engineering, CONICET - UNS, Bahía Blanca, Bs. As., Argentina
- Cecchini Rocío L.
  - Department of CS and Engineering - Institute for CS and Engineering, CONICET - UNS, Bahía Blanca, Bs. As., Argentina
|
35
|
Functional Enrichment Analysis of Regulatory Elements. Biomedicines 2022; 10:biomedicines10030590. [PMID: 35327392 PMCID: PMC8945021 DOI: 10.3390/biomedicines10030590] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2022] [Revised: 02/22/2022] [Accepted: 02/25/2022] [Indexed: 01/27/2023] Open
Abstract
Statistical methods for enrichment analysis are important tools to extract biological information from omics experiments. Although these methods have been widely used for the analysis of gene and protein lists, the development of high-throughput technologies for regulatory elements demands dedicated statistical and bioinformatics tools. Here, we present a set of enrichment analysis methods for regulatory elements, including CpG sites, miRNAs, and transcription factors. Statistical significance is determined via a power weighting function for target genes and tested by the Wallenius noncentral hypergeometric distribution model to avoid selection bias. These new methodologies have been applied to the analysis of a set of miRNAs associated with arrhythmia, showing the potential of this tool to extract biological information from a list of regulatory elements. These new methods are available in GeneCodis 4, a web tool able to perform singular and modular enrichment analysis that allows the integration of heterogeneous information.
|
36
|
Lycopene Supplementation to Serum-Free Maturation Medium Improves In Vitro Bovine Embryo Development and Quality and Modulates Embryonic Transcriptomic Profile. Antioxidants (Basel) 2022; 11:antiox11020344. [PMID: 35204226 PMCID: PMC8868338 DOI: 10.3390/antiox11020344] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2022] [Revised: 02/02/2022] [Accepted: 02/08/2022] [Indexed: 02/08/2023] Open
Abstract
Bovine embryos are typically cultured at reduced oxygen tension to lower the impact of oxidative stress on embryo development. However, oocyte in vitro maturation (IVM) is performed at atmospheric oxygen tension since low oxygen during maturation has a negative impact on oocyte developmental competence. Lycopene, a carotenoid, acts as a powerful antioxidant and may protect the oocyte against oxidative stress during maturation at atmospheric oxygen conditions. Here, we assessed the effect of adding 0.2 μM lycopene (antioxidant), 5 μM menadione (pro-oxidant), and their combination on the generation of reactive oxygen species (ROS) in matured oocytes and the subsequent development, quality, and transcriptome of the blastocysts in a bovine in vitro model. ROS fluorescent intensity in matured oocytes was significantly lower in the lycopene group, and the resulting embryos showed a significantly higher blastocyst rate on day 8 and a lower apoptotic cell ratio than all other groups. Transcriptomic analysis disclosed a total of 296 differentially expressed genes (Benjamini–Hochberg-adjusted p < 0.05 and ≥ 1-log2-fold change) between the lycopene and control groups, where pathways associated with cellular function, metabolism, DNA repair, and anti-apoptosis were upregulated in the lycopene group. Lycopene supplementation to serum-free maturation medium neutralized excess ROS during maturation, enhanced blastocyst development and quality, and modulated the transcriptomic landscape.
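The selection rule quoted in this abstract (Benjamini–Hochberg-adjusted p < 0.05 together with a fold-change cutoff of 1 on the log2 scale) can be sketched in a few lines. The sketch assumes the cutoff applies to the absolute log2 fold change, which the abstract does not state explicitly; the function names and the toy results table are hypothetical, and the authors' actual pipeline is not known from the abstract.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(results, alpha=0.05, min_abs_log2fc=1.0):
    """Keep genes with adjusted p < alpha and |log2 fold change| >= cutoff.

    results : dict mapping gene -> (raw p-value, log2 fold change)
    """
    genes = list(results)
    padj = benjamini_hochberg([results[g][0] for g in genes])
    return [
        g for g, q in zip(genes, padj)
        if q < alpha and abs(results[g][1]) >= min_abs_log2fc
    ]


# Hypothetical toy table, for illustration only.
toy = {"A": (0.01, 2.0), "B": (0.04, 0.5), "C": (0.03, -1.5), "D": (0.005, 0.2)}
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
selected = select_degs(toy)
```

In the toy table, genes B and D pass the adjusted p-value threshold but are dropped by the fold-change filter, which shows why the two criteria are usually reported together.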
|
37
|
Identification of MAD2L1 as a Potential Biomarker in Hepatocellular Carcinoma via Comprehensive Bioinformatics Analysis. BIOMED RESEARCH INTERNATIONAL 2022; 2022:9868022. [PMID: 35132379 PMCID: PMC8817109 DOI: 10.1155/2022/9868022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/30/2021] [Revised: 11/19/2021] [Accepted: 01/15/2022] [Indexed: 11/17/2022]
Abstract
Background Hepatocellular carcinoma (HCC) is widely acknowledged as a malignant tumor with rapid progression, high recurrence rate, and poor prognosis. At present, there is a paucity of reliable biomarkers at the clinical level to guide the management of HCC and improve patient outcomes. Our research is aimed at assessing the prognostic value of MAD2L1 in HCC. Methods Four datasets, GSE121248, GSE101685, GSE85598, and GSE62232, were selected from the GEO database to analyze differentially expressed genes (DEGs) between HCC and normal liver tissues. After functional analysis, we constructed a protein-protein interaction network (PPI) for DEGs and identified core genes in this network with high connectivity with other genes. We assessed the relationship between core genes and the pathogenesis and prognosis of HCC. Finally, we explored the gene regulatory signaling mechanisms involved in HCC pathogenesis. Results 145 DEGs were screened from the intersection of the four GEO datasets. MAD2L1 was associated with most genes according to the PPI network and was selected as a candidate gene for further study. Survival analysis suggested that high MAD2L1 expression in HCC correlated with a worse prognosis. In addition, real-time quantitative PCR (RT-qPCR), western blot (WB), and immunohistochemistry (IHC) findings suggested that the expression of MAD2L1 was abnormally increased in HCC tissues and cells compared to paraneoplastic tissues and normal hepatocytes. Conclusion We found that high MAD2L1 expression in HCC was significantly associated with overall patient survival and clinical features. We also explored the potential biological properties of this gene.
|
38
|
Huang JB, Hu BB, He R, He L, Zou C, Man CF, Fan Y. Analysis of N6-Methyladenosine Methylome in Adenocarcinoma of Esophagogastric Junction. Front Genet 2022; 12:787800. [PMID: 35140740 PMCID: PMC8820482 DOI: 10.3389/fgene.2021.787800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2021] [Accepted: 12/30/2021] [Indexed: 11/21/2022] Open
Abstract
Background: From previous studies, we found that there are more than 100 types of RNA modifications in RNA molecules. m6A methylation is the most common. The incidence rate of adenocarcinoma of the esophagogastric junction (AEG) at home and abroad has increased faster than that of stomach cancer at other sites in recent years. Here, we systematically analyze the modification pattern of m6A mRNA in adenocarcinoma at the esophagogastric junction. Methods: m6A sequencing, RNA sequencing, and bioinformatics analysis were used to describe the m6A modification pattern in adenocarcinoma and normal tissues at the esophagogastric junction. Results: In AEG samples, a total of 4,775 new m6A peaks appeared, and 3,054 peaks disappeared. The unique m6A-related genes in AEG are related to cancer-related pathways. There are hypermethylated or hypomethylated m6A peaks in AEG in differentially expressed mRNA transcripts. Conclusion: This study preliminarily constructed the first m6A full transcriptome map of human AEG. This has a guiding role in revealing the mechanism of m6A-mediated gene expression regulation.
|
39
|
Marczyk M, Macioszek A, Tobiasz J, Polanska J, Zyla J. Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies. Front Genet 2021; 12:767358. [PMID: 34956320 PMCID: PMC8696167 DOI: 10.3389/fgene.2021.767358] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Accepted: 11/10/2021] [Indexed: 11/13/2022] Open
Abstract
A typical genome-wide association study (GWAS) analyzes millions of single-nucleotide polymorphisms (SNPs), several of which are in a region of the same gene. To conduct gene set analysis (GSA), information from SNPs needs to be unified at the gene level. A widely used practice is to use only the most relevant SNP per gene; however, there are other methods of integration that could be applied here. Also, the problem of nonrandom association of alleles at two or more loci is often neglected. Here, we tested the impact of incorporation of different integrations and linkage disequilibrium (LD) correction on the performance of several GSA methods. Matched normal and breast cancer samples from The Cancer Genome Atlas database were used to evaluate the performance of six GSA algorithms: Coincident Extreme Ranks in Numerical Observations (CERNO), Gene Set Enrichment Analysis (GSEA), GSEA-SNP, improved GSEA for GWAS (i-GSEA4GWAS), Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA), and Over-Representation Analysis (ORA). Association of SNPs to phenotype was calculated using modified McNemar's test. Results for SNPs mapped to the same gene were integrated using Fisher and Stouffer methods and compared with the minimum p-value method. Four common measures were used to quantify the performance of all combinations of methods. Results of GSA analysis on GWAS were compared to the one performed on gene expression data. Comparing all evaluation metrics across different GSA algorithms, integrations, and LD correction, we highlighted CERNO, and MAGENTA with Stouffer as the most efficient. Applying LD correction increased prioritization and specificity of enrichment outcomes for all tested algorithms. When Fisher or Stouffer were used with LD, sensitivity and reproducibility were also better. Using any integration method was beneficial in comparison with a minimum p-value method in specific combinations. 
The correlation between GSA results from genomic and transcriptomic level was the highest when Stouffer integration was combined with LD correction. We thoroughly evaluated different approaches to GSA in GWAS in terms of performance to guide others to select the most effective combinations. We showed that LD correction and Stouffer integration could increase the performance of enrichment analysis and encourage the usage of these techniques.
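The three SNP-to-gene integration strategies compared in this abstract (minimum p-value, Fisher, and Stouffer) can be sketched with the standard library alone. The sketch assumes independent, unweighted, one-sided p-values strictly between 0 and 1, so it deliberately omits the linkage disequilibrium correction that the study evaluates; the function names and the example p-values are hypothetical.

```python
import math
from statistics import NormalDist


def min_p(pvalues):
    """Minimum p-value method: keep only the most significant SNP."""
    return min(pvalues)


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df under H0."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # half of the chi-square statistic
    # Closed-form chi-square survival function for even degrees of freedom:
    # sf(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer_combine(pvalues):
    """Stouffer's method: average the z-scores and convert back to a p-value."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)


# Hypothetical p-values for SNPs mapped to one gene.
snp_p = [0.04, 0.20, 0.03, 0.50]
gene_level = {
    "min_p": min_p(snp_p),
    "fisher": fisher_combine(snp_p),
    "stouffer": stouffer_combine(snp_p),
}
```

Fisher's method is driven by the smallest p-values, while Stouffer's method responds to a consistent shift across all SNPs; with correlated SNPs both become anti-conservative unless a dependency correction is applied.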
Affiliation(s)
- Michal Marczyk
  - Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
  - Yale Cancer Center, Yale School of Medicine, New Haven, CT, United States
- Agnieszka Macioszek
  - Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Joanna Tobiasz
  - Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Joanna Polanska
  - Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Joanna Zyla
  - Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
|
40
|
Marini F, Ludt A, Linke J, Strauch K. GeneTonic: an R/Bioconductor package for streamlining the interpretation of RNA-seq data. BMC Bioinformatics 2021; 22:610. [PMID: 34949163 PMCID: PMC8697502 DOI: 10.1186/s12859-021-04461-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2021] [Accepted: 10/26/2021] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND The interpretation of results from transcriptome profiling experiments via RNA sequencing (RNA-seq) can be a complex task, where the essential information is distributed among different tabular and list formats: normalized expression values, results from differential expression analysis, and results from functional enrichment analyses. A number of tools and databases are widely used for the purpose of identification of relevant functional patterns, yet often their contextualization within the data and results at hand is not straightforward, especially if these analytic components are not combined together efficiently. RESULTS We developed the GeneTonic software package, which serves as a comprehensive toolkit for streamlining the interpretation of functional enrichment analyses, by fully leveraging the information of expression values in a differential expression context. GeneTonic is implemented in R and Shiny, leveraging packages that enable HTML-based interactive visualizations for executing drilldown tasks seamlessly, viewing the data at a level of increased detail. GeneTonic is integrated with the core classes of existing Bioconductor workflows, and can accept the output of many widely used tools for pathway analysis, making this approach applicable to a wide range of use cases. Users can effectively navigate interlinked components (otherwise available as flat text or spreadsheet tables), bookmark features of interest during the exploration sessions, and obtain at the end a tailored HTML report, thus combining the benefits of both interactivity and reproducibility. CONCLUSION GeneTonic is distributed as an R package in the Bioconductor project (https://bioconductor.org/packages/GeneTonic/) under the MIT license. 
Offering both bird's-eye views of the components of transcriptome data analysis and the detailed inspection of single genes, individual signatures, and their relationships, GeneTonic aims at simplifying the process of interpretation of complex and compelling RNA-seq datasets for many researchers with different expertise profiles.
Affiliation(s)
- Federico Marini
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
  - Center for Thrombosis and Hemostasis (CTH), University Medical Center of the Johannes Gutenberg University Mainz, Langenbeckstr. 1, 55131 Mainz, Germany
- Annekathrin Ludt
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
- Jan Linke
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
  - Center for Thrombosis and Hemostasis (CTH), University Medical Center of the Johannes Gutenberg University Mainz, Langenbeckstr. 1, 55131 Mainz, Germany
- Konstantin Strauch
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
|
41
|
Mubeen S, Bharadhwaj VS, Gadiya Y, Hofmann-Apitius M, Kodamullil AT, Domingo-Fernández D. DecoPath: a web application for decoding pathway enrichment analysis. NAR Genom Bioinform 2021; 3:lqab087. [PMID: 34568823 PMCID: PMC8459727 DOI: 10.1093/nargab/lqab087] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2021] [Revised: 08/31/2021] [Accepted: 09/14/2021] [Indexed: 12/16/2022] Open
Abstract
The past decades have brought a steady growth of pathway databases and enrichment methods. However, the advent of pathway data has not been accompanied by an improvement in interoperability across databases, hampering the use of pathway knowledge from multiple databases for enrichment analysis. While integrative databases have attempted to address this issue, they often do not account for redundant information across resources. Furthermore, the majority of studies that employ pathway enrichment analysis still rely upon a single database or enrichment method, though the use of another could yield differing results. These shortcomings call for approaches that investigate the differences and agreements across databases and methods as their selection in the design of a pathway analysis can be a crucial step in ensuring the results of such an analysis are meaningful. Here we present DecoPath, a web application to assist in the interpretation of the results of pathway enrichment analysis. DecoPath provides an ecosystem to run enrichment analysis or directly upload results and facilitate the interpretation of results with custom visualizations that highlight the consensus and/or discrepancies at the pathway- and gene-levels. DecoPath is available at https://decopath.scai.fraunhofer.de, and its source code and documentation can be found on GitHub at https://github.com/DecoPath/DecoPath.
Affiliation(s)
- Sarah Mubeen
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53115, Germany
  - Fraunhofer Center for Machine Learning, Germany
- Vinay S Bharadhwaj
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53115, Germany
- Yojana Gadiya
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53115, Germany
- Martin Hofmann-Apitius
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53115, Germany
- Alpha T Kodamullil
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Daniel Domingo-Fernández
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
  - Fraunhofer Center for Machine Learning, Germany
  - Enveda Biosciences, Boulder, CO 80301, USA
42
Ramos M, Geistlinger L, Oh S, Schiffer L, Azhar R, Kodali H, de Bruijn I, Gao J, Carey VJ, Morgan M, Waldron L. Multiomic Integration of Public Oncology Databases in Bioconductor. JCO Clin Cancer Inform 2021; 4:958-971. [PMID: 33119407 PMCID: PMC7608653 DOI: 10.1200/cci.19.00119] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Indexed: 01/04/2023]
Abstract
PURPOSE Investigations of the molecular basis for the development, progression, and treatment of cancer increasingly use complementary genomic assays to gather multiomic data, but management and analysis of such data remain complex. The cBioPortal for cancer genomics currently provides multiomic data from > 260 public studies, including The Cancer Genome Atlas (TCGA) data sets, but integration of different data types remains challenging and error prone for computational methods and tools using these resources. Recent advances in data infrastructure within the Bioconductor project enable a novel and powerful approach to creating fully integrated representations of these multiomic, pan-cancer databases. METHODS We provide a set of R/Bioconductor packages for working with TCGA legacy data and cBioPortal data, with special considerations for loading time; efficient representations in and out of memory; analysis platform; and an integrative framework, such as MultiAssayExperiment. Large methylation data sets are provided through out-of-memory data representation to provide responsive loading times and analysis capabilities on machines with limited memory. RESULTS We developed the curatedTCGAData and cBioPortalData R/Bioconductor packages to provide integrated multiomic data sets from the TCGA legacy database and the cBioPortal web application programming interface using the MultiAssayExperiment data structure. This suite of tools provides coordination of diverse experimental assays with clinicopathological data with minimal data management burden, as demonstrated through several greatly simplified multiomic and pan-cancer analyses. CONCLUSION These integrated representations enable analysts and tool developers to apply general statistical and plotting methods to extensive multiomic data through user-friendly commands and documented examples.
Affiliation(s)
- Marcel Ramos
  - Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY; Roswell Park Comprehensive Cancer Center, Buffalo, NY
- Ludwig Geistlinger
  - Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY
- Sehyun Oh
  - Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY
- Lucas Schiffer
  - Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY; Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA
- Rimsha Azhar
  - Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY; Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY
- Hanish Kodali
  - Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY
- Ino de Bruijn
  - Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY
- Jianjiong Gao
  - Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY; Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY
- Vincent J Carey
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Martin Morgan
  - Roswell Park Comprehensive Cancer Center, Buffalo, NY
- Levi Waldron
  - Graduate School of Public Health and Health Policy, City University of New York, New York, NY; Institute for Implementation Science and Population Health, City University of New York, New York, NY
43
Application of Bioinformatics Methods to Identify Key Genes and Functions in Chronic Pelvic Pain. Evidence-Based Complementary and Alternative Medicine 2021; 2021:7257405. [PMID: 34381521 PMCID: PMC8352682 DOI: 10.1155/2021/7257405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Accepted: 07/19/2021] [Indexed: 11/17/2022]
Abstract
Neuropathic pain (NPP) occurs in most patients with chronic pelvic pain (CPP), and the unique physiological characteristics of visceral sensory neurons limit the effectiveness of current analgesic treatment in CPP patients. This study therefore explored the biological characteristics of key genes in CPP using bioinformatics methods. The CPP-related dataset GSE131619 was downloaded from Gene Expression Omnibus to identify the differentially expressed genes (DEGs) between lumbar dorsal root ganglia (DRG) and sacral DRG, and functional enrichment analysis was performed. A protein-protein interaction (PPI) network was constructed to search for subnet modules of specific biological processes, and the genes in the subnet were then enriched by single gene set analysis. A CPP mouse model was established, and the expression of key genes was measured by qPCR. In total, 127 upregulated and 103 downregulated DEGs were identified. Functional enrichment analysis showed that most of the genes involved in signal transduction were involved in the receptor interaction pathway. A subnet module related to neural signal regulation was identified in the PPI network, including CHRNB4, CHRNA3, and CHRNB2. All three genes were associated with neurological or inflammatory activity and were downregulated in the sacral spinal cord of CPP mice. This study identified three key candidate genes for CPP (CHRNB4, CHRNA3, and CHRNB2) that may be involved in the occurrence and development of CPP and may serve as molecular targets for the clinical diagnosis and treatment of CPP.
44
Gene expression analysis method integration and co-expression module detection applied to rare glucide metabolism disorders using ExpHunterSuite. Sci Rep 2021; 11:15062. [PMID: 34301987 PMCID: PMC8302605 DOI: 10.1038/s41598-021-94343-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/19/2021] [Accepted: 07/09/2021] [Indexed: 12/13/2022]
Abstract
High-throughput gene expression analysis is widely used. However, analysis is not straightforward. Multiple approaches should be applied, and methods to combine their results should be implemented and investigated. We present a methodology for the comprehensive analysis of expression data, including co-expression module detection and result integration via data fusion, threshold-based methods, and a Naïve Bayes classifier trained on simulated data. Application to rare-disease model datasets confirms existing knowledge related to immune cell infiltration and suggests novel hypotheses, including the role of calcium channels. Application to simulated and spike-in experiments shows that combining multiple methods using consensus and classifiers leads to optimal results. ExpHunterSuite is implemented as an R/Bioconductor package available from https://bioconductor.org/packages/ExpHunterSuite. It can be applied to model and non-model organisms and can be run modularly in R; it can also be run from the command line, allowing scalability with large datasets. Code and reports for the studies are available from https://github.com/fmjabato/ExpHunterSuiteExamples.
45
Bu D, Luo H, Huo P, Wang Z, Zhang S, He Z, Wu Y, Zhao L, Liu J, Guo J, Fang S, Cao W, Yi L, Zhao Y, Kong L. KOBAS-i: intelligent prioritization and exploratory visualization of biological functions for gene enrichment analysis. Nucleic Acids Res 2021; 49:W317-W325. [PMID: 34086934 PMCID: PMC8265193 DOI: 10.1093/nar/gkab447] [Citation(s) in RCA: 676] [Impact Index Per Article: 225.3] [Received: 03/24/2021] [Revised: 04/24/2021] [Accepted: 05/09/2021] [Indexed: 12/20/2022]
Abstract
Gene set enrichment (GSE) analysis plays an essential role in extracting biological insight from genome-scale experiments. ORA (overrepresentation analysis), FCS (functional class scoring), and PT (pathway topology) approaches are three generations of GSE methods along the timeline of development. Previous versions of KOBAS provided services based on just the ORA method. Here we present version 3.0 of KOBAS, which is named KOBAS-i (short for KOBAS intelligent version). It introduces a novel machine learning-based method we published earlier, CGPS, which incorporates seven FCS tools and two PT tools into a single ensemble score and intelligently prioritizes the relevant biological pathways. In addition, KOBAS has expanded the downstream exploratory visualization for selecting and understanding the enriched results. The tool constructs a novel view, cirFunMap, which presents different enriched terms and their correlations in a landscape. Finally, based on the previous version's framework, KOBAS increased the number of supported species from 1327 to 5944. For an easier local run, it also provides a prebuilt Docker image that requires no installation, as a supplement to the source code version. KOBAS can be freely accessed at http://kobas.cbi.pku.edu.cn, and a mirror site is available at http://bioinfo.org/kobas.
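For readers unfamiliar with the ORA generation of methods mentioned in this abstract, the following is a minimal sketch of an overrepresentation test as it is commonly formulated: an upper-tail hypergeometric probability of the overlap between a hit list and a gene set. It is not taken from KOBAS; the function name `ora_p_value`, its parameters, and the toy gene identifiers are assumptions made for illustration only.

```python
from math import comb


def ora_p_value(n_universe, n_in_set, n_hits, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes in the background (hypothetical parameter name)
    n_in_set:   number of background genes annotated to the gene set
    n_hits:     number of genes in the hit list (e.g. differentially expressed)
    n_overlap:  number of hit-list genes that are also in the gene set
    """
    max_k = min(n_in_set, n_hits)
    total = comb(n_universe, n_hits)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_hits - k)
        for k in range(n_overlap, max_k + 1)
    )
    return tail / total


# Toy usage with made-up gene identifiers (illustrative only).
universe = {f"g{i}" for i in range(100)}
gene_set = {f"g{i}" for i in range(10)}        # 10 annotated genes
hit_list = {f"g{i}" for i in range(5, 25)}     # 20 hits, 5 of them annotated
p = ora_p_value(len(universe), len(gene_set), len(hit_list),
                len(gene_set & hit_list))
print(p)
```

In practice the resulting p-values would be corrected for multiple testing across all gene sets examined; that step is omitted here.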
Affiliation(s)
- Peipei Huo
  - Chinese Academy of Sciences, LuoYang Branch of Institute of Computing Technology, Luoyang, 471000, China
- Zhihao Wang
  - Chinese Academy of Sciences, LuoYang Branch of Institute of Computing Technology, Luoyang, 471000, China
- Shan Zhang
  - Chinese Academy of Sciences, LuoYang Branch of Institute of Computing Technology, Luoyang, 471000, China
- Zihao He
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, ChaoYang District, Beijing 100029, China
- Yang Wu
  - Pervasive Computing Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Lianhe Zhao
  - Pervasive Computing Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Jingjia Liu
  - Cancer Center, Ningbo Institute of Life and Health Industry, University of Chinese Academy of Sciences, Zhejiang 315000, China
- Jincheng Guo
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, ChaoYang District, Beijing 100029, China
- Shuangsang Fang
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, ChaoYang District, Beijing 100029, China
- Wanchen Cao
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, ChaoYang District, Beijing 100029, China
- Lan Yi
  - Pervasive Computing Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Yi Zhao
  - Correspondence may also be addressed to Yi Zhao. Tel: +86 010 62600822;
- Lei Kong
  - To whom correspondence should be addressed. Tel: +86 010 62755206;
46
Cheng X, Yan J, Liu Y, Wang J, Taubert S. eVITTA: a web-based visualization and inference toolbox for transcriptome analysis. Nucleic Acids Res 2021; 49:W207-W215. [PMID: 34019643 PMCID: PMC8218201 DOI: 10.1093/nar/gkab366] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 03/15/2021] [Revised: 04/12/2021] [Accepted: 05/04/2021] [Indexed: 12/12/2022]
Abstract
Transcriptome profiling is essential for gene regulation studies in development and disease. Current web-based tools enable functional characterization of transcriptome data, but most are restricted to applying gene-list-based methods to single datasets, inefficient in leveraging up-to-date and species-specific information, and limited in their visualization options. Additionally, there is no systematic way to explore data stored in the largest transcriptome repository, NCBI GEO. To fill these gaps, we have developed eVITTA (easy Visualization and Inference Toolbox for Transcriptome Analysis; https://tau.cmmt.ubc.ca/eVITTA/). eVITTA provides modules for analysis and exploration of studies published in NCBI GEO (easyGEO), detailed molecular- and systems-level functional profiling (easyGSEA), and customizable comparisons among experimental groups (easyVizR). We tested eVITTA on transcriptomes of SARS-CoV-2 infected human nasopharyngeal swab samples, and identified a downregulation of olfactory signal transducers, in line with the clinical presentation of anosmia in COVID-19 patients. We also analyzed transcriptomes of Caenorhabditis elegans worms with disrupted S-adenosylmethionine metabolism, confirming activation of innate immune responses and feedback induction of one-carbon cycle genes. Collectively, eVITTA streamlines complex computational workflows into an accessible interface, thus filling the gap of an end-to-end platform capable of capturing both broad and granular changes in human and model organism transcriptomes.
Affiliation(s)
- Xuanjin Cheng
  - Centre for Molecular Medicine and Therapeutics, The University of British Columbia, Vancouver, British Columbia, Canada; British Columbia Children's Hospital Research Institute, The University of British Columbia, Vancouver, British Columbia, Canada; Department of Medical Genetics, The University of British Columbia, Vancouver, British Columbia, Canada
- Junran Yan
  - Centre for Molecular Medicine and Therapeutics, The University of British Columbia, Vancouver, British Columbia, Canada; British Columbia Children's Hospital Research Institute, The University of British Columbia, Vancouver, British Columbia, Canada; Graduate Program for Cell and Developmental Biology, The University of British Columbia, Vancouver, British Columbia, Canada
- Yongxing Liu
  - Centre for Molecular Medicine and Therapeutics, The University of British Columbia, Vancouver, British Columbia, Canada; British Columbia Children's Hospital Research Institute, The University of British Columbia, Vancouver, British Columbia, Canada; Department of Medical Genetics, The University of British Columbia, Vancouver, British Columbia, Canada
- Jiahe Wang
  - Centre for Molecular Medicine and Therapeutics, The University of British Columbia, Vancouver, British Columbia, Canada; British Columbia Children's Hospital Research Institute, The University of British Columbia, Vancouver, British Columbia, Canada; Department of Medical Genetics, The University of British Columbia, Vancouver, British Columbia, Canada
- Stefan Taubert
  - Centre for Molecular Medicine and Therapeutics, The University of British Columbia, Vancouver, British Columbia, Canada; British Columbia Children's Hospital Research Institute, The University of British Columbia, Vancouver, British Columbia, Canada; Department of Medical Genetics, The University of British Columbia, Vancouver, British Columbia, Canada; Graduate Program for Cell and Developmental Biology, The University of British Columbia, Vancouver, British Columbia, Canada
47
Angeloni M, Thievessen I, Engel FB, Magni P, Ferrazzi F. Functional genomics meta-analysis to identify gene set enrichment networks in cardiac hypertrophy. Biol Chem 2021; 402:953-972. [PMID: 33951759 DOI: 10.1515/hsz-2020-0378] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/30/2020] [Accepted: 04/19/2021] [Indexed: 12/28/2022]
Abstract
In order to take advantage of the continuously increasing number of transcriptome studies, it is important to develop strategies that integrate multiple expression datasets addressing the same biological question to allow a robust analysis. Here, we propose a meta-analysis framework that integrates enriched pathways identified through the Gene Set Enrichment Analysis (GSEA) approach and calculates for each meta-pathway an empirical p-value. Validation of our approach on benchmark datasets showed comparable or even better performance than existing methods and an increase in robustness with increasing number of integrated datasets. We then applied the meta-analysis framework to 15 functional genomics datasets of physiological and pathological cardiac hypertrophy. Within these datasets we grouped expression sets measured at time points that represent the same hallmarks of heart tissue remodeling ('aggregated time points') and performed meta-analysis on the expression sets assigned to each aggregated time point. To facilitate biological interpretation, results were visualized as gene set enrichment networks. Here, our meta-analysis framework identified well-known biological mechanisms associated with pathological cardiac hypertrophy (e.g., cardiomyocyte apoptosis, cardiac contractile dysfunction, and alteration in energy metabolism). In addition, results highlighted novel, potentially cardioprotective mechanisms in physiological cardiac hypertrophy involving the down-regulation of immune cell response, which are worth further investigation.
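The abstract does not say how the empirical p-value for each meta-pathway is computed. As a generic illustration only, and not the authors' procedure, an empirical p-value is often estimated by comparing an observed score against scores obtained under a null model (for example, from permutations), using an add-one correction so the estimate is never exactly zero. The function name `empirical_p_value` and the example scores below are hypothetical.

```python
def empirical_p_value(observed, null_scores):
    """Add-one empirical p-value: share of null scores >= the observed score.

    observed:    score computed on the real data (hypothetical input)
    null_scores: scores computed under a null model, e.g. from permutations
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    # The +1 in numerator and denominator counts the observed score itself,
    # which keeps the estimate strictly positive.
    return (exceed + 1) / (len(null_scores) + 1)


# Toy usage with made-up null scores (illustrative only).
null = [0.2, 0.5, 1.1, 0.7, 0.3, 0.9, 0.4, 0.6, 0.8]
print(empirical_p_value(1.0, null))
```

With this estimator, the smallest achievable p-value is 1 / (number of null scores + 1), so the resolution depends on how many null scores are generated.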
Affiliation(s)
- Miriam Angeloni
  - Department of Nephropathology, Institute of Pathology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Krankenhausstr. 8-10, D-91054 Erlangen, Germany; Institute of Pathology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Krankenhausstr. 8-10, D-91054 Erlangen, Germany
- Ingo Thievessen
  - Biophysics Group, Department of Physics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Henkestraße 91, D-91052 Erlangen, Germany; Muscle Research Center Erlangen (MURCE), D-91052 Erlangen, Germany
- Felix B Engel
  - Experimental Renal and Cardiovascular Research, Department of Nephropathology, Institute of Pathology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Schwabachanlage 12, D-91054 Erlangen, Germany; Muscle Research Center Erlangen (MURCE), D-91052 Erlangen, Germany
- Paolo Magni
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, via Ferrata 5, I-27100 Pavia, Italy
- Fulvia Ferrazzi
  - Department of Nephropathology, Institute of Pathology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Krankenhausstr. 8-10, D-91054 Erlangen, Germany; Muscle Research Center Erlangen (MURCE), D-91052 Erlangen, Germany
48
Xie C, Jauhari S, Mora A. Popularity and performance of bioinformatics software: the case of gene set analysis. BMC Bioinformatics 2021; 22:191. [PMID: 33858350 PMCID: PMC8050894 DOI: 10.1186/s12859-021-04124-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/23/2020] [Accepted: 04/08/2021] [Indexed: 11/22/2022]
Abstract
Background Gene Set Analysis (GSA) is arguably the method of choice for the functional interpretation of omics results. This paper explores the popularity and the performance of all the GSA methodologies and software published during the 20 years since its inception. "Popularity" is estimated according to each paper's citation counts, while "performance" is based on a comprehensive evaluation of the validation strategies used by papers in the field, as well as the consolidated results from the existing benchmark studies. Results Regarding popularity, data is collected into an online open database ("GSARefDB") which allows browsing bibliographic and method-descriptive information from 503 GSA paper references; regarding performance, we introduce a repository of Jupyter workflows and Shiny apps for automated benchmarking of GSA methods ("GSA-BenchmarKING"). After comparing popularity versus performance, results show discrepancies between the most popular and the best performing GSA methods. Conclusions These results draw attention to the nature of the tool selection procedures followed by researchers and raise doubts regarding the quality of the functional interpretation of biological datasets in current biomedical studies. Suggestions for the future of the functional interpretation field are made, including strategies for education and discussion of GSA tools, better validation and benchmarking practices, reproducibility, and functional re-analysis of previously reported data. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04124-5.
Affiliation(s)
- Chengshu Xie
  - Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health - Chinese Academy of Sciences, Guangzhou, China
- Shaurya Jauhari
  - Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health - Chinese Academy of Sciences, Guangzhou, China
- Antonio Mora
  - Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health - Chinese Academy of Sciences, Guangzhou, China.
49
Katz S, Song J, Webb KP, Lounsbury NW, Bryant CE, Fraser IDC. SIGNAL: A web-based iterative analysis platform integrating pathway and network approaches optimizes hit selection from genome-scale assays. Cell Syst 2021; 12:338-352.e5. [PMID: 33894945 DOI: 10.1016/j.cels.2021.03.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/16/2020] [Revised: 11/25/2020] [Accepted: 03/03/2021] [Indexed: 01/13/2023]
Abstract
Hit selection from high-throughput assays remains a critical bottleneck in realizing the potential of omic-scale studies in biology. Widely used methods such as setting of cutoffs, prioritizing pathway enrichments, or incorporating predicted network interactions offer divergent solutions yet are associated with critical analytical trade-offs. The specific limitations of these individual approaches and the lack of a systematic way by which to integrate their rankings have contributed to limited overlap in the reported results from comparable genome-wide studies and costly inefficiencies in secondary validation efforts. Using comparative analysis of parallel independent studies as a benchmark, we characterize the specific complementary contributions of each approach and demonstrate an optimal framework to integrate these methods. We describe selection by iterative pathway group and network analysis looping (SIGNAL), an integrated, iterative approach that uses both pathway and network methods to optimize gene prioritization. SIGNAL is accessible as a rapid user-friendly web-based application (https://signal.niaid.nih.gov). A record of this paper's transparent peer review is included in the Supplemental information.
Affiliation(s)
- Samuel Katz
  - NIAID, National Institutes of Health, Laboratory of Immune System Biology, Bethesda, MD 20892, USA; University of Cambridge, Department of Veterinary Medicine, Cambridge, UK
- Jian Song
  - NIAID, National Institutes of Health, Laboratory of Immune System Biology, Bethesda, MD 20892, USA
- Kyle P Webb
  - NIAID, National Institutes of Health, Laboratory of Immune System Biology, Bethesda, MD 20892, USA
- Nicolas W Lounsbury
  - NIAID, National Institutes of Health, Laboratory of Immune System Biology, Bethesda, MD 20892, USA
- Clare E Bryant
  - University of Cambridge, Department of Veterinary Medicine, Cambridge, UK
- Iain D C Fraser
  - NIAID, National Institutes of Health, Laboratory of Immune System Biology, Bethesda, MD 20892, USA.
50
Risso D, Pagnotta SM. Per-sample standardization and asymmetric winsorization lead to accurate clustering of RNA-seq expression profiles. Bioinformatics 2021; 37:2356-2364. [PMID: 33560368 PMCID: PMC8388024 DOI: 10.1093/bioinformatics/btab091] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/08/2020] [Revised: 01/27/2021] [Accepted: 02/05/2021] [Indexed: 11/13/2022]
Abstract
MOTIVATION Data transformations are an important step in the analysis of RNA-seq data. Nonetheless, the impact of transformation on the outcome of unsupervised clustering procedures is still unclear. RESULTS Here, we present an Asymmetric Winsorization per Sample Transformation (AWST), which is robust to data perturbations and removes the need for selecting the most informative genes prior to sample clustering. Our procedure leads to robust and biologically meaningful clusters both in bulk and in single-cell applications. AVAILABILITY The AWST method is available at https://github.com/drisso/awst. The code to reproduce the analyses is available at https://github.com/drisso/awst_analysis. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Davide Risso
  - Dept. of Statistical Sciences, Università degli Studi di Padova, Padova, Italy