1
Kioroglou D, Peña-Cearra A, Corraliza AM, Seoane I, Castelo J, Panés J, Gómez-Irwin L, Rodríguez-Lago I, Ortiz de Zarate J, Fuertes M, Martín-Ruiz I, Gonzalez M, Aransay AM, Salas A, Rodríguez H, Anguita J, Abecia L, Marigorta UM. Mitochondrial Dysfunction: Unraveling the Elusive Biology Behind Anti-TNF Response During Ulcerative Colitis. Inflamm Bowel Dis 2025; 31:1366-1379. [PMID: 39946175 PMCID: PMC12069986 DOI: 10.1093/ibd/izaf015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2024] [Indexed: 05/14/2025]
Abstract
BACKGROUND Recent studies hint at mitochondrial genes influencing UC patient response to anti-TNF treatment. We evaluated this hypothesis by following a targeted strategy to identify gene expression that captures the relationship between mitochondrial dysregulation and response to treatment. Our objective was to initially examine this relationship in colon samples and subsequently assess whether the resulting signal persists in the bloodstream. METHODS We analyzed the transcriptome of colon samples from an anti-TNF-treated murine model characterized by impaired mitochondrial activity and treatment resistance. We then transferred the findings that linked mitochondrial dysfunction and compromised treatment response to an anti-TNF-treated UC human cohort. We next matched differential expression in the blood using monocytes from the peripheral blood of controls and IBD patients, and we evaluated a classification process at baseline with whole blood samples from UC patients. RESULTS In human colon samples, the derived gene set from the murine model showed differential expression, primarily enriched metabolic pathways, and exhibited similar classification capacity as genes enriching inflammatory pathways. Moreover, the evaluation of the classification signal using blood samples from UC patients at baseline highlighted the involvement of mitochondrial homeostasis in treatment response. CONCLUSIONS Our results highlight the involvement of metabolic pathways and mitochondrial homeostasis in determining treatment response and their ability to provide promising classification signals with detection levels in both the colon and the bloodstream.
Affiliation(s)
- Dimitrios Kioroglou
- Integrative Genomics Lab, Center for Cooperative Research in Biosciences (CIC bioGUNE), Basque Research and Technology Alliance (BRTA), Bizkaia Technology Park, Derio, Basque Country, Spain
- Ainize Peña-Cearra
- Inflammation and Macrophage Plasticity Laboratory, CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Derio 48160, Spain
- Immunology, Microbiology and Parasitology Department, Faculty of Medicine and Nursery, University of the Basque Country, UPV/EHU, P.O. Box 699, 48080 Bilbao, Spain
- Ana M Corraliza
- Centro de Investigación Biomédica en Red de Enfermedades Hepática y Digestivas (CIBERehd), ISCIII, Barcelona, Spain
- Department of Gastroenterology, IDIBAPS, Hospital Clínic, Barcelona, Spain
- Iratxe Seoane
- Inflammation and Macrophage Plasticity Laboratory, CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Derio 48160, Spain
- Immunology, Microbiology and Parasitology Department, Faculty of Medicine and Nursery, University of the Basque Country, UPV/EHU, P.O. Box 699, 48080 Bilbao, Spain
- Janire Castelo
- Inflammation and Macrophage Plasticity Laboratory, CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Derio 48160, Spain
- Julian Panés
- Centro de Investigación Biomédica en Red de Enfermedades Hepática y Digestivas (CIBERehd), ISCIII, Barcelona, Spain
- Department of Gastroenterology, IDIBAPS, Hospital Clínic, Barcelona, Spain
- Laura Gómez-Irwin
- Departamento de Gastroenterología, Hospital Universitario de Cruces and Biobizkaia Health Research Institute, 48903 Barakaldo, Spain
- Iago Rodríguez-Lago
- Departamento de Gastroenterología, Hospital Universitario de Galdakao and Biobizkaia Health Research Institute, 48960 Galdakao, Spain
- Jone Ortiz de Zarate
- Departamento de Gastroenterología, Hospital Universitario de Basurto, 48013 Bilbao, Bizkaia, Spain
- Miguel Fuertes
- NEIKER-Basque Institute for Agricultural Research and Development, Basque Research and Technology Alliance (BRTA), Bizkaia Science and Technology Park, Building 812L, 48160, Derio, Spain
- Itziar Martín-Ruiz
- Inflammation and Macrophage Plasticity Laboratory, CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Derio 48160, Spain
- Monika Gonzalez
- GAP, CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Derio 48160, Spain
- Ana M Aransay
- Centro de Investigación Biomédica en Red de Enfermedades Hepática y Digestivas (CIBERehd), ISCIII, Barcelona, Spain
- GAP, CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Derio 48160, Spain
- Azucena Salas
- Centro de Investigación Biomédica en Red de Enfermedades Hepática y Digestivas (CIBERehd), ISCIII, Barcelona, Spain
- Department of Gastroenterology, IDIBAPS, Hospital Clínic, Barcelona, Spain
- Héctor Rodríguez
- Inflammation and Macrophage Plasticity Laboratory, CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Derio 48160, Spain
- Juan Anguita
- Inflammation and Macrophage Plasticity Laboratory, CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Derio 48160, Spain
- Ikerbasque, Basque Foundation for Science, 48013 Bilbao, Bizkaia, Spain
- Leticia Abecia
- Inflammation and Macrophage Plasticity Laboratory, CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Derio 48160, Spain
- Immunology, Microbiology and Parasitology Department, Faculty of Medicine and Nursery, University of the Basque Country, UPV/EHU, P.O. Box 699, 48080 Bilbao, Spain
- Urko M Marigorta
- Integrative Genomics Lab, Center for Cooperative Research in Biosciences (CIC bioGUNE), Basque Research and Technology Alliance (BRTA), Bizkaia Technology Park, Derio, Basque Country, Spain
- Ikerbasque, Basque Foundation for Science, 48013 Bilbao, Bizkaia, Spain
2
Henke DM, Renwick A, Zoeller JR, Meena JK, Neill NJ, Bowling EA, Meerbrey KL, Westbrook TF, Simon LM. Bio-primed machine learning to enhance discovery of relevant biomarkers. NPJ Precis Oncol 2025; 9:39. [PMID: 39915634 PMCID: PMC11802771 DOI: 10.1038/s41698-025-00825-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2024] [Accepted: 01/28/2025] [Indexed: 02/09/2025] Open
Abstract
Precision medicine relies on identifying reliable biomarkers for gene dependencies to tailor individualized therapeutic strategies. The advent of high-throughput technologies presents unprecedented opportunities to explore molecular disease mechanisms but also challenges due to high dimensionality and collinearity among features. Traditional statistical methods often fall short in this context, necessitating novel computational approaches that harness the full potential of big data in bioinformatics. Here, we introduce a novel machine learning approach extending the Least Absolute Shrinkage and Selection Operator (LASSO) regression framework to incorporate biological knowledge, such as protein-protein interaction databases, into the regularization process. This bio-primed approach prioritizes variables that are both statistically significant and biologically relevant. Applying our method to multiple dependency datasets, we identified biomarkers which traditional methods overlooked. Our biologically informed LASSO method effectively identifies relevant biomarkers from high-dimensional collinear data, bridging the gap between statistical rigor and biological insight. This method holds promise for advancing personalized medicine by uncovering novel therapeutic targets and understanding the complex interplay of genetic and molecular factors in disease.
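The abstract above does not spell out how prior knowledge enters the regularization. As a minimal sketch only, assuming the prior is encoded as per-feature penalty weights (one common way to bias a LASSO fit, and not necessarily the authors' formulation), a weighted LASSO can be solved by coordinate descent. The function names `soft_threshold` and `weighted_lasso`, the `weights` argument, and the toy data are all hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=100):
    """Minimise (1/2n)||y - X b||^2 + lam * sum_j weights[j] * |b_j|.

    X is a list of rows and y a list of targets. weights[j] is a
    hypothetical per-feature penalty factor: a smaller value stands in
    for stronger prior (e.g. network-based) support for feature j.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # inner product of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                norm += X[i][j] ** 2
            if norm == 0.0:
                beta[j] = 0.0  # an all-zero column carries no signal
                continue
            beta[j] = soft_threshold(rho / n, lam * weights[j]) / (norm / n)
    return beta


# Two perfectly collinear features; only the penalty weights differ.
X = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]

favour_second = weighted_lasso(X, y, lam=0.5, weights=[1.0, 0.2])
favour_first = weighted_lasso(X, y, lam=0.5, weights=[0.2, 1.0])
print(favour_second, favour_first)
```

In this toy case the data cannot distinguish the two collinear features, so the penalty weights alone decide which one keeps a non-zero coefficient. That is the kind of tie-breaking among collinear candidates that a biologically informed penalty is meant to provide.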
Affiliation(s)
- David M Henke
- Molecular Virology & Microbiology, Baylor College of Medicine, Houston, TX, 77030, USA
- Joseph R Zoeller
- Medical Scientist Training Program, Baylor College of Medicine, Houston, TX, 77030, USA
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX, 77030, USA
- Therapeutic Innovation Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Jitendra K Meena
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX, 77030, USA
- Therapeutic Innovation Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Nicholas J Neill
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX, 77030, USA
- Therapeutic Innovation Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Elizabeth A Bowling
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX, 77030, USA
- Therapeutic Innovation Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Kristen L Meerbrey
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX, 77030, USA
- Therapeutic Innovation Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Thomas F Westbrook
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX, 77030, USA
- Therapeutic Innovation Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Lukas M Simon
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX, 77030, USA.
- Therapeutic Innovation Center, Baylor College of Medicine, Houston, TX, 77030, USA.
3
Patsalis C, Iyer G, Brandenburg M, Karnovsky A, Michailidis G. DNEA: an R package for fast and versatile data-driven network analysis of metabolomics data. BMC Bioinformatics 2024; 25:383. [PMID: 39695921 DOI: 10.1186/s12859-024-05994-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Accepted: 11/19/2024] [Indexed: 12/20/2024] Open
Abstract
BACKGROUND Metabolomics is a high-throughput technology that measures small molecule metabolites in cells, tissues or biofluids. Analysis of metabolomics data is a multi-step process that involves data processing, quality control and normalization, followed by statistical and bioinformatics analysis. The latter step often involves pathway analysis to aid biological interpretation of the data. This approach is limited to endogenous metabolites that can be readily mapped to metabolic pathways. An alternative to pathway analysis that can be used for any classes of metabolites, including unknown compounds that are ubiquitous in untargeted metabolomics data, involves defining metabolite-metabolite interactions using experimental data. Our group has developed several network-based methods that use partial correlations of experimentally determined metabolite measurements. These were implemented in CorrelationCalculator and Filigree, two software tools for the analysis of metabolomics data we developed previously. The latter tool implements the Differential Network Enrichment Analysis (DNEA) algorithm. This analysis is useful for building differential networks from metabolomics data containing two experimental groups and identifying differentially enriched metabolic modules. While Filigree is a user-friendly tool, it has certain limitations when used for the analysis of large-scale metabolomics datasets. RESULTS We developed the DNEA R package for the data-driven network analysis of metabolomics data. We present the DNEA workflow and functionality, algorithm enhancements implemented with respect to the package's predecessor, Filigree, and discuss best practices for analyses. We tested the performance of the DNEA R package and illustrated its features using publicly available metabolomics data from the environmental determinants of diabetes in the young. 
To our knowledge, this package is the only publicly available tool designed for the construction of biological networks and subsequent enrichment testing for datasets containing exogenous, secondary, and unknown compounds. This greatly expands the scope of traditional enrichment analysis tools that can be used to analyze a relatively small set of well-annotated metabolites. CONCLUSIONS The DNEA R package is a more flexible and powerful implementation of our previously published software tool, Filigree. The modular structure of the package, along with the parallel processing framework built into the most computationally extensive steps of the algorithm, make it a powerful tool for the analysis of large and complex metabolomics datasets.
Affiliation(s)
- Christopher Patsalis
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Internal Medicine, Hematology/Oncology Division, University of Michigan, Ann Arbor, MI, 48109, USA
- Gayatri Iyer
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Marci Brandenburg
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Taubman Health Sciences Library, University of Michigan, Ann Arbor, MI, 48109, USA
- Alla Karnovsky
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA.
- George Michailidis
- Department of Statistics, University of Florida, Gainesville, FL, 32611, USA.
4
Cai Z, Poulos RC, Aref A, Robinson PJ, Reddel RR, Zhong Q. DeePathNet: A Transformer-Based Deep Learning Model Integrating Multiomic Data with Cancer Pathways. Cancer Res Commun 2024; 4:3151-3164. [PMID: 39530738 PMCID: PMC11652962 DOI: 10.1158/2767-9764.crc-24-0285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2024] [Revised: 10/10/2024] [Accepted: 11/08/2024] [Indexed: 11/16/2024]
Abstract
SIGNIFICANCE DeePathNet integrates cancer-specific biological pathways using transformer-based deep learning for enhanced cancer analysis. It outperforms existing models in predicting drug responses, cancer types, and subtypes. By enabling pathway-level biomarker discovery, DeePathNet represents a significant advancement in cancer research and could lead to more effective treatments.
Affiliation(s)
- Zhaoxiang Cai
- ProCan, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, Australia
- Rebecca C. Poulos
- ProCan, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, Australia
- Adel Aref
- ProCan, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, Australia
- Phillip J. Robinson
- ProCan, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, Australia
- Roger R. Reddel
- ProCan, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, Australia
- Qing Zhong
- ProCan, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, Australia
5
Cousins HC, Nayar G, Altman RB. Computational Approaches to Drug Repurposing: Methods, Challenges, and Opportunities. Annu Rev Biomed Data Sci 2024; 7:15-29. [PMID: 38598857 DOI: 10.1146/annurev-biodatasci-110123-025333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]
Abstract
Drug repurposing refers to the inference of therapeutic relationships between a clinical indication and existing compounds. As an emerging paradigm in drug development, drug repurposing enables more efficient treatment of rare diseases, stratified patient populations, and urgent threats to public health. However, prioritizing well-suited drug candidates from among a nearly infinite number of repurposing options continues to represent a significant challenge in drug development. Over the past decade, advances in genomic profiling, database curation, and machine learning techniques have enabled more accurate identification of drug repurposing candidates for subsequent clinical evaluation. This review outlines the major methodologic classes that these approaches comprise, which rely on (a) protein structure, (b) genomic signatures, (c) biological networks, and (d) real-world clinical data. We propose that realizing the full impact of drug repurposing methodologies requires a multidisciplinary understanding of each method's advantages and limitations with respect to clinical practice.
Affiliation(s)
- Henry C Cousins
- Department of Biomedical Data Science, Stanford University, Stanford, California, USA;
- Gowri Nayar
- Department of Biomedical Data Science, Stanford University, Stanford, California, USA;
- Russ B Altman
- Departments of Genetics, Medicine, and Bioengineering, Stanford University, Stanford, California, USA
- Department of Biomedical Data Science, Stanford University, Stanford, California, USA;
6
Iyer G, Brandenburg M, Patsalis C, Michailidis G, Karnovsky A. CorrelationCalculator and Filigree: Tools for Data-Driven Network Analysis of Metabolomics Data. J Vis Exp 2023:10.3791/65512. [PMID: 38009735 PMCID: PMC11785453 DOI: 10.3791/65512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023] Open
Abstract
A significant challenge in the analysis of omics data is extracting actionable biological knowledge. Metabolomics is no exception. The general problem of relating changes in levels of individual metabolites to specific biological processes is compounded by the large number of unknown metabolites present in untargeted liquid chromatography-mass spectrometry (LC-MS) studies. Further, secondary metabolism and lipid metabolism are poorly represented in existing pathway databases. To overcome these limitations, our group has developed several tools for data-driven network construction and analysis. These include CorrelationCalculator and Filigree. Both tools allow users to build partial correlation-based networks from experimental metabolomics data when the number of metabolites exceeds the number of samples. CorrelationCalculator supports the construction of a single network, while Filigree allows building a differential network utilizing data from two groups of samples, followed by network clustering and enrichment analysis. We will describe the utility and application of both tools for the analysis of real-life metabolomics data.
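For reference, a partial correlation network under a Gaussian model is usually defined as below. This is textbook notation rather than the documented internals of either tool. The symbols S, Ω and λ, and the choice of the graphical lasso as the regularized estimator, are assumptions brought in for illustration, since the abstract only states that the tools handle the case where metabolites outnumber samples.

```latex
% Partial correlation between metabolites i and j given all others,
% read off the precision matrix \Omega = \Sigma^{-1}:
\rho_{ij \mid \text{rest}} = -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}}

% When p > n the sample covariance S is singular, so \Omega has to be
% estimated with regularization, for example the graphical lasso:
\hat{\Omega} = \arg\max_{\Omega \succ 0}
  \left\{ \log\det\Omega - \operatorname{tr}(S\,\Omega)
          - \lambda \lVert \Omega \rVert_{1} \right\}
```

An edge between metabolites i and j is drawn when the estimated ω_ij is non-zero, so the ℓ1 penalty both makes the estimate well defined and controls how sparse the resulting network is.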
Affiliation(s)
- Gayatri Iyer
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor
- Marci Brandenburg
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor; Taubman Health Sciences Library, University of Michigan, Ann Arbor
- Christopher Patsalis
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor
- Alla Karnovsky
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor;
7
Wang R, Chen Y, Chen J, Ma M, Xu M, Liu S. Integration of transcriptomics and metabolomics analysis for unveiling the toxicological profile in the liver of mice exposed to uranium in drinking water. Environ Pollut 2023; 335:122296. [PMID: 37536476 DOI: 10.1016/j.envpol.2023.122296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 04/17/2023] [Accepted: 07/29/2023] [Indexed: 08/05/2023]
Abstract
Uranium is a contaminate in the underground water in many regions of the world, which poses health risks to the local populations through drinking water. Although the health hazards of natural uranium have been concerned for decades, the controversies about its detrimental effects continue at present since it is still unclear how uranium interacts with molecular regulatory networks to generate toxicity. Here, we integrate transcriptomic and metabolomic methods to unveil the molecular mechanism of lipid metabolism disorder induced by uranium. Following exposure to uranium in drinking water for twenty-eight days, aberrant lipid metabolism and lipogenesis were found in the liver, accompanied with aggravated lipid peroxidation and an increase in dead cells. Multi-omics analysis reveals that uranium can promote the biosynthesis of unsaturated fatty acids through dysregulating the metabolism of arachidonic acid (AA), linoleic acid, and glycerophospholipid. Most notably, the disordered metabolism of polyunsaturated fatty acids (PUFAs) like AA may contribute to lipid peroxidation induced by uranium, which in turn triggers ferroptosis in hepatocytes. Our findings highlight disorder of lipid metabolism as an essential toxicological mechanism of uranium in the liver, offering insight into the health risks of uranium in drinking water.
Affiliation(s)
- Ruixia Wang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Yongjiu Chen
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Department of Unit III & Ostomy Service, Gastrointestinal Cancer Center, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Jiahao Chen
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China; School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Minghao Ma
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Ming Xu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China; School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China; University of Chinese Academy of Sciences, Beijing, 100049, China.
- Sijin Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China; University of Chinese Academy of Sciences, Beijing, 100049, China
8
Zhuang Y, Xing F, Ghosh D, Hobbs BD, Hersh CP, Banaei-Kashani F, Bowler RP, Kechris K. Deep learning on graphs for multi-omics classification of COPD. PLoS One 2023; 18:e0284563. [PMID: 37083575 PMCID: PMC10121008 DOI: 10.1371/journal.pone.0284563] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/18/2022] [Accepted: 04/03/2023] [Indexed: 04/22/2023] Open
Abstract
Network approaches have successfully been used to help reveal complex mechanisms of diseases including Chronic Obstructive Pulmonary Disease (COPD). However despite recent advances, we remain limited in our ability to incorporate protein-protein interaction (PPI) network information with omics data for disease prediction. New deep learning methods including convolution Graph Neural Network (ConvGNN) has shown great potential for disease classification using transcriptomics data and known PPI networks from existing databases. In this study, we first reconstructed the COPD-associated PPI network through the AhGlasso (Augmented High-Dimensional Graphical Lasso Method) algorithm based on one independent transcriptomics dataset including COPD cases and controls. Then we extended the existing ConvGNN methods to successfully integrate COPD-associated PPI, proteomics, and transcriptomics data and developed a prediction model for COPD classification. This approach improves accuracy over several conventional classification methods and neural networks that do not incorporate network information. We also demonstrated that the updated COPD-associated network developed using AhGlasso further improves prediction accuracy. Although deep neural networks often achieve superior statistical power in classification compared to other methods, it can be very difficult to explain how the model, especially graph neural network(s), makes decisions on the given features and identifies the features that contribute the most to prediction generally and individually. To better explain how the spectral-based Graph Neural Network model(s) works, we applied one unified explainable machine learning method, SHapley Additive exPlanations (SHAP), and identified CXCL11, IL-2, CD48, KIR3DL2, TLR2, BMP10 and several other relevant COPD genes in subnetworks of the ConvGNN model for COPD prediction. 
Finally, Gene Ontology (GO) enrichment analysis identified glycosaminoglycan, heparin signaling, and carbohydrate derivative signaling pathways significantly enriched in the top important gene/proteins for COPD classifications.
Affiliation(s)
- Yonghua Zhuang
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Biostatistics Shared Resource, University of Colorado Cancer Center, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Department of Pediatrics, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Fuyong Xing
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Debashis Ghosh
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Brian D. Hobbs
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA, United States of America
- Harvard Medical School, Boston, MA, United States of America
- Craig P. Hersh
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA, United States of America
- Harvard Medical School, Boston, MA, United States of America
- Farnoush Banaei-Kashani
- Department of Computer Science and Engineering, University of Colorado Denver, Denver, CO, United States of America
- Katerina Kechris
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
9
Shutta KH, Weighill D, Burkholz R, Guebila M, DeMeo DL, Zacharias HU, Quackenbush J, Altenbuchinger M. DRAGON: Determining Regulatory Associations using Graphical models on multi-Omic Networks. Nucleic Acids Res 2022; 51:e15. [PMID: 36533448 PMCID: PMC9943674 DOI: 10.1093/nar/gkac1157] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/19/2022] [Revised: 11/08/2022] [Accepted: 11/23/2022] [Indexed: 12/23/2022] Open
Abstract
The increasing quantity of multi-omic data, such as methylomic and transcriptomic profiles collected on the same specimen or even on the same cell, provides a unique opportunity to explore the complex interactions that define cell phenotype and govern cellular responses to perturbations. We propose a network approach based on Gaussian Graphical Models (GGMs) that facilitates the joint analysis of paired omics data. This method, called DRAGON (Determining Regulatory Associations using Graphical models on multi-Omic Networks), calibrates its parameters to achieve an optimal trade-off between the network's complexity and estimation accuracy, while explicitly accounting for the characteristics of each of the assessed omics 'layers.' In simulation studies, we show that DRAGON adapts to edge density and feature size differences between omics layers, improving model inference and edge recovery compared to state-of-the-art methods. We further demonstrate in an analysis of joint transcriptome - methylome data from TCGA breast cancer specimens that DRAGON can identify key molecular mechanisms such as gene regulation via promoter methylation. In particular, we identify Transcription Factor AP-2 Beta (TFAP2B) as a potential multi-omic biomarker for basal-type breast cancer. DRAGON is available as open-source code in Python through the Network Zoo package (netZooPy v0.8; netzoo.github.io).
Affiliation(s)
- Rebekka Burkholz
- CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
- Marouen Ben Guebila
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Dawn L DeMeo
- Channing Division of Network Medicine, Brigham and Women’s Hospital, and Department of Medicine, Harvard Medical School, Boston, MA, USA
- Helena U Zacharias
- Department of Internal Medicine I, University Medical Center Schleswig-Holstein, Campus Kiel, Kiel, Germany
- Institute of Clinical Molecular Biology, Kiel University and University Medical Center Schleswig-Holstein, Campus Kiel, Kiel, Germany
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover, Germany
- Michael Altenbuchinger
- To whom correspondence should be addressed. Tel: +49 551 39 61788; Fax: +49 551 39 61783;
10
Goutman SA, Guo K, Savelieff MG, Patterson A, Sakowski SA, Habra H, Karnovsky A, Hur J, Feldman EL. Metabolomics identifies shared lipid pathways in independent amyotrophic lateral sclerosis cohorts. Brain 2022; 145:4425-4439. [PMID: 35088843 PMCID: PMC9762943 DOI: 10.1093/brain/awac025] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 06/29/2021] [Revised: 11/22/2021] [Accepted: 01/05/2022] [Indexed: 11/12/2022] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease lacking effective treatments. This is due, in part, to a complex and incompletely understood pathophysiology. To shed light, we conducted untargeted metabolomics on plasma from two independent cross-sectional ALS cohorts versus control participants to identify recurrent dysregulated metabolic pathways. Untargeted metabolomics was performed on plasma from two ALS cohorts (cohort 1, n = 125; cohort 2, n = 225) and healthy controls (cohort 1, n = 71; cohort 2, n = 104). Individual differential metabolites in ALS cases versus controls were assessed by Wilcoxon, adjusted logistic regression and partial least squares-discriminant analysis, while group lasso explored sub-pathway level differences. Adjustment parameters included age, sex and body mass index. Metabolomics pathway enrichment analysis was performed on metabolites selected using the above methods. Additionally, we conducted a sex sensitivity analysis due to sex imbalance in the cohort 2 control arm. Finally, a data-driven approach, differential network enrichment analysis (DNEA), was performed on a combined dataset to further identify important ALS metabolic pathways. Cohort 2 ALS participants were slightly older than the controls (64.0 versus 62.0 years, P = 0.009). Cohort 2 controls were over-represented in females (68%, P < 0.001). The most concordant cohort 1 and 2 pathways centred heavily on lipid sub-pathways, including complex and signalling lipid species and metabolic intermediates. There were differences in sub-pathways that were enriched in ALS females versus males, including in lipid sub-pathways. Finally, DNEA of the merged metabolite dataset of both ALS and control cohorts identified nine significant subnetworks; three centred on lipids and two encompassed a range of sub-pathways. 
In our analysis, we saw consistent and important shared metabolic sub-pathways in both ALS cohorts, particularly in lipids, further supporting their importance as ALS pathomechanisms and therapeutic targets.
Affiliation(s)
- Stephen A Goutman
  - Department of Neurology, University of Michigan, Ann Arbor, MI, USA
  - NeuroNetwork for Emerging Therapies, University of Michigan, Ann Arbor, MI, USA
- Kai Guo
  - Department of Neurology, University of Michigan, Ann Arbor, MI, USA
  - NeuroNetwork for Emerging Therapies, University of Michigan, Ann Arbor, MI, USA
- Masha G Savelieff
  - NeuroNetwork for Emerging Therapies, University of Michigan, Ann Arbor, MI, USA
- Adam Patterson
  - Department of Neurology, University of Michigan, Ann Arbor, MI, USA
  - NeuroNetwork for Emerging Therapies, University of Michigan, Ann Arbor, MI, USA
- Stacey A Sakowski
  - Department of Neurology, University of Michigan, Ann Arbor, MI, USA
  - NeuroNetwork for Emerging Therapies, University of Michigan, Ann Arbor, MI, USA
- Hani Habra
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Alla Karnovsky
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Junguk Hur
  - Department of Biomedical Sciences, University of North Dakota, Grand Forks, ND, USA
- Eva L Feldman
  - Department of Neurology, University of Michigan, Ann Arbor, MI, USA
  - NeuroNetwork for Emerging Therapies, University of Michigan, Ann Arbor, MI, USA
|
11
|
Zaman A, Bivona TG. Quantitative Framework for Bench-to-Bedside Cancer Research. Cancers (Basel) 2022; 14:5254. [PMID: 36358671 PMCID: PMC9658824 DOI: 10.3390/cancers14215254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 10/20/2022] [Accepted: 10/24/2022] [Indexed: 11/29/2022] Open
Abstract
Bioscience is an interdisciplinary venture. Driven by a quantum shift in the volume of high throughput data and in ready availability of data-intensive technologies, mathematical and quantitative approaches have become increasingly common in bioscience. For instance, a recent shift towards a quantitative description of cells and phenotypes, which is supplanting conventional qualitative descriptions, has generated immense promise and opportunities in the field of bench-to-bedside cancer OMICS, chemical biology and pharmacology. Nevertheless, like any burgeoning field, there remains a lack of a shared and standardized framework for quantitative cancer research. Here, in the context of cancer, we present a basic framework and guidelines for bench-to-bedside quantitative research and therapy. We outline some of the basic concepts and their parallel use cases for chemical-protein interactions. Along with several recommendations for assay setup and conditions, we also catalog applications of these quantitative techniques in some of the most widespread discovery pipelines and analytical methods in the field. We believe adherence to these guidelines will improve experimental design, reduce variability and standardize quantitative datasets.
Affiliation(s)
- Aubhishek Zaman
  - Department of Medicine, University of California, San Francisco, CA 94158, USA
  - UCSF Helen Diller Comprehensive Cancer Center, University of California, San Francisco, CA 94158, USA
- Trever G. Bivona
  - Department of Medicine, University of California, San Francisco, CA 94158, USA
  - UCSF Helen Diller Comprehensive Cancer Center, University of California, San Francisco, CA 94158, USA
  - Chan-Zuckerberg Biohub, San Francisco, CA 94158, USA
|
12
|
Zhou J, Hoen AG, Mcritchie S, Pathmasiri W, Viles WD, Nguyen QP, Madan JC, Dade E, Karagas MR, Gui J. Information enhanced model selection for Gaussian graphical model with application to metabolomic data. Biostatistics 2022; 23:926-948. [PMID: 33720330 PMCID: PMC9608647 DOI: 10.1093/biostatistics/kxab006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/13/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 11/12/2022] Open
Abstract
In light of the low signal-to-noise nature of many large biological data sets, we propose a novel method to learn the structure of association networks using Gaussian graphical models combined with prior knowledge. Our strategy includes two parts. In the first part, we propose a model selection criterion called the structural Bayesian information criterion, in which the prior structure is modeled and incorporated into the Bayesian information criterion. It is shown that the popular extended Bayesian information criterion is a special case of the structural Bayesian information criterion. In the second part, we propose a two-step algorithm to construct the candidate model pool. The algorithm is data-driven and the prior structure is embedded into the candidate model automatically. Theoretical investigation shows that, under some mild conditions, the structural Bayesian information criterion is a consistent model selection criterion for high-dimensional Gaussian graphical models. Simulation studies validate the superiority of the proposed algorithm over the existing ones and show its robustness to model misspecification. Application to relative concentration data from infant feces collected from subjects enrolled in a large molecular epidemiological cohort study validates that metabolic pathway involvement is a statistically significant factor for the conditional dependence between metabolites. Furthermore, new relationships among metabolites are discovered which cannot be identified by conventional methods of pathway analysis. Some of them have been widely recognized in the biological literature.
Affiliation(s)
- Jie Zhou
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, 3 Rope Ferry Road, Hanover, NH 03755, USA
- Anne G Hoen
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
  - Department of Epidemiology, Geisel School of Medicine, Dartmouth College, 3 Rope Ferry Road, Hanover, NH 03755, USA
- Susan Mcritchie
  - Nutrition Research Institute, Department of Nutrition, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, 500 Laureate Way, Kannapolis, NC 28081, USA
- Wimal Pathmasiri
  - Nutrition Research Institute, Department of Nutrition, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, 500 Laureate Way, Kannapolis, NC 28081, USA
- Weston D Viles
  - Department of Mathematics and Statistics, University of Southern Maine, 96 Falmouth St, Portland, ME 04103, USA
- Quang P Nguyen
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
  - Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Juliette C Madan
  - Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Erika Dade
  - Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Margaret R Karagas
  - Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Jiang Gui
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
|
13
|
Wainberg M, Merico D, Keller MC, Fauman EB, Tripathy SJ. Predicting causal genes from psychiatric genome-wide association studies using high-level etiological knowledge. Mol Psychiatry 2022; 27:3095-3106. [PMID: 35411039 DOI: 10.1038/s41380-022-01542-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/01/2021] [Revised: 03/08/2022] [Accepted: 03/21/2022] [Indexed: 12/24/2022]
Abstract
Genome-wide association studies have discovered hundreds of genomic loci associated with psychiatric traits, but the causal genes underlying these associations are often unclear, a research gap that has hindered clinical translation. Here, we present a Psychiatric Omnilocus Prioritization Score (PsyOPS) derived from just three binary features encapsulating high-level assumptions about psychiatric disease etiology - namely, that causal psychiatric disease genes are likely to be mutationally constrained, be specifically expressed in the brain, and overlap with known neurodevelopmental disease genes. To our knowledge, PsyOPS is the first method specifically tailored to prioritizing causal genes at psychiatric GWAS loci. We show that, despite its extreme simplicity, PsyOPS achieves state-of-the-art performance at this task, comparable to a prior domain-agnostic approach relying on tens of thousands of features. Genes prioritized by PsyOPS are substantially more likely than other genes at the same loci to have convergent evidence of direct regulation by the GWAS variant according to both DNA looping assays and expression or splicing quantitative trait locus (QTL) maps. We provide examples of genes hundreds of kilobases away from the lead variant, like GABBR1 for schizophrenia, that are prioritized by all three of PsyOPS, DNA looping and QTLs. Our results underscore the power of incorporating high-level knowledge of trait etiology into causal gene prediction at GWAS loci, and comprise a resource for researchers interested in experimentally characterizing psychiatric gene candidates.
Affiliation(s)
- Michael Wainberg
  - Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health, Toronto, ON, Canada
- Daniele Merico
  - Deep Genomics Inc, Toronto, ON, Canada
  - The Centre for Applied Genomics (TCAG), The Hospital for Sick Children, Toronto, ON, Canada
- Matthew C Keller
  - Department of Psychology and Neuroscience, University of Colorado, Boulder, CO, USA
  - Institute for Behavioral Genetics, University of Colorado, Boulder, CO, USA
- Eric B Fauman
  - Internal Medicine Research Unit, Pfizer Worldwide Research, Development and Medical, Cambridge, MA, USA
- Shreejoy J Tripathy
  - Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health, Toronto, ON, Canada
  - Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
  - Department of Psychiatry, University of Toronto, Toronto, ON, Canada
  - Department of Physiology, University of Toronto, Toronto, ON, Canada
|
14
|
Zhuang Y, Xing F, Ghosh D, Banaei-Kashani F, Bowler RP, Kechris K. An Augmented High-Dimensional Graphical Lasso Method to Incorporate Prior Biological Knowledge for Global Network Learning. Front Genet 2022; 12:760299. [PMID: 35154240 PMCID: PMC8829118 DOI: 10.3389/fgene.2021.760299] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/17/2021] [Accepted: 11/08/2021] [Indexed: 01/21/2023] Open
Abstract
Biological networks are often inferred through Gaussian graphical models (GGMs) using gene or protein expression data only. GGMs identify conditional dependence by estimating a precision matrix between genes or proteins. However, conventional GGM approaches often ignore prior knowledge about protein-protein interactions (PPI). Recently, several groups have extended GGM to weighted graphical Lasso (wGlasso) and network-based gene set analysis (Netgsa) and have demonstrated the advantages of incorporating PPI information. However, these methods are either computationally intractable for large-scale data, or disregard weights in the PPI networks. To address these shortcomings, we extended the Netgsa approach and developed an augmented high-dimensional graphical Lasso (AhGlasso) method to incorporate edge weights in known PPI with omics data for global network learning. This new method outperforms weighted graphical Lasso-based algorithms with respect to computational time in simulated large-scale data settings while achieving better or comparable prediction accuracy of node connections. The total runtime of AhGlasso is approximately five times faster than weighted Glasso methods when the graph size ranges from 1,000 to 3,000 with a fixed sample size (n = 300). The runtime difference between AhGlasso and weighted Glasso increases when the graph size increases. Using proteomic data from a study on chronic obstructive pulmonary disease, we demonstrate that AhGlasso improves protein network inference compared to the Netgsa approach by incorporating PPI information.
Affiliation(s)
- Yonghua Zhuang (corresponding author)
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Fuyong Xing
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Debashis Ghosh
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Farnoush Banaei-Kashani
  - Department of Computer Science and Engineering, University of Colorado Denver, Denver, CO, United States
- Katerina Kechris (corresponding author)
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
|
15
|
Maudsley S, Leysen H, van Gastel J, Martin B. Systems Pharmacology: Enabling Multidimensional Therapeutics. Comprehensive Pharmacology 2022:725-769. [DOI: 10.1016/b978-0-12-820472-6.00017-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/03/2025]
|
16
|
Leysen H, Walter D, Christiaenssen B, Vandoren R, Harputluoğlu İ, Van Loon N, Maudsley S. GPCRs Are Optimal Regulators of Complex Biological Systems and Orchestrate the Interface between Health and Disease. Int J Mol Sci 2021; 22:13387. [PMID: 34948182 PMCID: PMC8708147 DOI: 10.3390/ijms222413387] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/24/2021] [Revised: 12/08/2021] [Accepted: 12/09/2021] [Indexed: 02/06/2023] Open
Abstract
GPCRs arguably represent the most effective current therapeutic targets for a plethora of diseases. GPCRs also possess a pivotal role in the regulation of the physiological balance between healthy and pathological conditions; thus, their importance in systems biology cannot be underestimated. The molecular diversity of GPCR signaling systems is likely to be closely associated with disease-associated changes in organismal tissue complexity and compartmentalization, thus enabling a nuanced GPCR-based capacity to interdict multiple disease pathomechanisms at a systemic level. GPCRs have been long considered as controllers of communication between tissues and cells. This communication involves the ligand-mediated control of cell surface receptors that then direct their stimuli to impact cell physiology. Given the tremendous success of GPCRs as therapeutic targets, considerable focus has been placed on the ability of these therapeutics to modulate diseases by acting at cell surface receptors. In the past decade, however, attention has focused upon how stable multiprotein GPCR superstructures, termed receptorsomes, both at the cell surface membrane and in the intracellular domain dictate and condition long-term GPCR activities associated with the regulation of protein expression patterns, cellular stress responses and DNA integrity management. The ability of these receptorsomes (often in the absence of typical cell surface ligands) to control complex cellular activities implicates them as key controllers of the functional balance between health and disease. A greater understanding of this function of GPCRs is likely to significantly augment our ability to further employ these proteins in a multitude of diseases.
Affiliation(s)
- Hanne Leysen
  - Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
- Deborah Walter
  - Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
- Bregje Christiaenssen
  - Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
- Romi Vandoren
  - Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
- İrem Harputluoğlu
  - Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
  - Department of Chemistry, Middle East Technical University, Çankaya, Ankara 06800, Turkey
- Nore Van Loon
  - Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
- Stuart Maudsley (corresponding author)
  - Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
|
17
|
Demirel HC, Arici MK, Tuncbag N. Computational approaches leveraging integrated connections of multi-omic data toward clinical applications. Mol Omics 2021; 18:7-18. [PMID: 34734935 DOI: 10.1039/d1mo00158b] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/18/2023]
Abstract
In line with the advances in high-throughput technologies, multiple omic datasets have accumulated to study biological systems and diseases coherently. No single omics data type is capable of fully representing cellular activity. The complexity of the biological processes arises from the interactions between omic entities such as genes, proteins, and metabolites. Therefore, multi-omic data integration is crucial but challenging. The impact of the molecular alterations in multi-omic data is not local in the neighborhood of the altered gene or protein; rather, the impact diffuses in the network and changes the functionality of multiple signaling pathways and regulation of the gene expression. Additionally, multi-omic data is high-dimensional and has background noise. Several integrative approaches have been developed to accurately interpret the multi-omic datasets, including machine learning, network-based methods, and their combination. In this review, we overview the most recent integrative approaches and tools with a focus on network-based methods. We then discuss these approaches according to their specific applications, from disease-network and biomarker identification to patient stratification, drug discovery, and repurposing.
Affiliation(s)
- Habibe Cansu Demirel
  - Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
- Muslum Kaan Arici
  - Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
  - Foot and Mouth Diseases Institute, Ministry of Agriculture and Forestry, Ankara, 06044, Turkey
- Nurcan Tuncbag
  - Chemical and Biological Engineering, College of Engineering, Koc University, Istanbul, 34450, Turkey
  - School of Medicine, Koc University, Istanbul, 34450, Turkey
  - Koc University Research Center for Translational Medicine (KUTTAM), Istanbul, Turkey
|
18
|
Yue K, Ma J, Thornton T, Shojaie A. REHE: Fast variance components estimation for linear mixed models. Genet Epidemiol 2021; 45:891-905. [PMID: 34658056 DOI: 10.1002/gepi.22432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2021] [Revised: 06/11/2021] [Accepted: 10/04/2021] [Indexed: 11/07/2022]
Abstract
Linear mixed models are widely used in ecological and biological applications, especially in genetic studies. Reliable estimation of variance components is crucial for using linear mixed models. However, standard methods, such as the restricted maximum likelihood (REML), are computationally inefficient in large samples and may be unstable with small samples. Other commonly used methods, such as the Haseman-Elston (HE) regression, may yield negative estimates of variances. Utilizing regularized estimation strategies, we propose the restricted Haseman-Elston (REHE) regression and REHE with resampling (reREHE) estimators, along with an inference framework for REHE, as fast and robust alternatives that provide nonnegative estimates with comparable accuracy to REML. The merits of REHE are illustrated using real data and benchmark simulation studies.
Affiliation(s)
- Kun Yue
  - Department of Biostatistics, University of Washington, Seattle, Washington, USA
- Jing Ma
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Timothy Thornton
  - Department of Biostatistics, University of Washington, Seattle, Washington, USA
- Ali Shojaie
  - Department of Biostatistics, University of Washington, Seattle, Washington, USA
|
19
|
Park Y, Heider D, Hauschild AC. Integrative Analysis of Next-Generation Sequencing for Next-Generation Cancer Research toward Artificial Intelligence. Cancers (Basel) 2021; 13:3148. [PMID: 34202427 PMCID: PMC8269018 DOI: 10.3390/cancers13133148] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/20/2021] [Revised: 06/16/2021] [Accepted: 06/21/2021] [Indexed: 12/18/2022] Open
Abstract
The rapid improvement of next-generation sequencing (NGS) technologies and their application in large-scale cohorts in cancer research led to common challenges of big data. It opened a new research area incorporating systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models to determine the characteristics of tumors and tumor subtypes. Therefore, various machine learning algorithms were introduced to identify underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis, and we describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNN) in systems biology, and the limitations in NGS biomedical research. To reflect on the various challenges and corresponding computational solutions, we will discuss the following three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.
Affiliation(s)
- Youngjun Park
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
- Dominik Heider
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
- Anne-Christin Hauschild
  - Department of Mathematics and Computer Science, Philipps-University of Marburg, 35032 Marburg, Germany
  - Department of Medical Informatics, University Medical Center Göttingen, 37075 Göttingen, Germany
|
20
|
Hellstern M, Ma J, Yue K, Shojaie A. netgsa: Fast computation and interactive visualization for topology-based pathway enrichment analysis. PLoS Comput Biol 2021; 17:e1008979. [PMID: 34115744 PMCID: PMC8221786 DOI: 10.1371/journal.pcbi.1008979] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/15/2020] [Revised: 06/23/2021] [Accepted: 04/18/2021] [Indexed: 01/26/2023] Open
Abstract
Existing software tools for topology-based pathway enrichment analysis are either computationally inefficient, have undesirable statistical power, or require expert knowledge to leverage the methods' capabilities. To address these limitations, we have overhauled NetGSA, an existing topology-based method, to provide a computationally-efficient user-friendly tool that offers interactive visualization. Pathway enrichment analysis for thousands of genes can be performed in minutes on a personal computer without sacrificing statistical power. The new software also removes the need for expert knowledge by directly curating gene-gene interaction information from multiple external databases. Lastly, by utilizing the capabilities of Cytoscape, the new software also offers interactive and intuitive network visualization.
Affiliation(s)
- Michael Hellstern
  - Department of Biostatistics, University of Washington, Seattle, Washington
- Jing Ma
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Kun Yue
  - Department of Biostatistics, University of Washington, Seattle, Washington
- Ali Shojaie
  - Department of Biostatistics, University of Washington, Seattle, Washington
|
21
|
Shojaie A. Differential Network Analysis: A Statistical Perspective. Wiley Interdisciplinary Reviews. Computational Statistics 2021; 13:e1508. [PMID: 37050915 PMCID: PMC10088462 DOI: 10.1002/wics.1508] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/09/2019] [Accepted: 03/03/2020] [Indexed: 11/06/2022]
Abstract
Networks effectively capture interactions among components of complex systems, and have thus become a mainstay in many scientific disciplines. Growing evidence, especially from biology, suggests that networks undergo changes over time and in response to external stimuli. In biology and medicine, these changes have been found to be predictive of complex diseases. They have also been used to gain insight into mechanisms of disease initiation and progression. Primarily motivated by biological applications, this article provides a review of recent statistical machine learning methods for inferring networks and identifying changes in their structures.
Affiliation(s)
- Ali Shojaie
  - Department of Biostatistics, University of Washington, Seattle WA
|
22
|
Amirmahani F, Ebrahimi N, Molaei F, Faghihkhorasani F, Jamshidi Goharrizi K, Mirtaghi SM, Borjian-Boroujeni M, Hamblin MR. Approaches for the integration of big data in translational medicine: single-cell and computational methods. Ann N Y Acad Sci 2021; 1493:3-28. [PMID: 33410160 DOI: 10.1111/nyas.14544] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/23/2020] [Revised: 10/31/2020] [Accepted: 11/12/2020] [Indexed: 12/11/2022]
Affiliation(s)
- Farzane Amirmahani
  - Genetics Division, Department of Cell and Molecular Biology and Microbiology, Faculty of Science and Technology, University of Isfahan, Isfahan, Iran
- Nasim Ebrahimi
  - Genetics Division, Department of Cell and Molecular Biology and Microbiology, Faculty of Science and Technology, University of Isfahan, Isfahan, Iran
- Fatemeh Molaei
  - Department of Anesthesiology, Faculty of Paramedical, Jahrom University of Medical Sciences, Jahrom, Iran
- Michael R. Hamblin
  - Laser Research Centre, Faculty of Health Science, University of Johannesburg, South Africa
|
23
|
Yeganeh PN, Mostafavi MT. Causal Disturbance Analysis: A Novel Graph Centrality Based Method for Pathway Enrichment Analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1613-1624. [PMID: 30908237 DOI: 10.1109/tcbb.2019.2907246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
Pathway enrichment analysis models (PEM) are the premier methods for interpreting gene expression profiles from high-throughput experiments. PEM often use a priori background knowledge to infer the underlying biological functions and mechanisms. A shortcoming of standard PEM is that they disregard interactions for simplicity, which potentially results in partial and inaccurate inference. In this study, we introduce a graph-based PEM, namely Causal Disturbance Analysis (CADIA), that leverages gene interactions to quantify the topological importance of genes' expression profiles in pathway organization. In particular, CADIA uses a novel graph centrality model, namely Source/Sink, to measure the topological importance. Source/Sink Centrality quantifies a gene's importance as a receiver and a sender of biological information, which allows for prioritizing the genes that are more likely to disturb a pathway's functionality. CADIA infers an enrichment score for a pathway by deriving statistical evidence from Source/Sink centrality of the differentially expressed genes and combines it with classical over-representation analysis. Through real-world experimental and synthetic data evaluations, we show that CADIA can uniquely infer critical pathway enrichments that are not observable through other PEM. Our results indicate that CADIA is sensitive towards topologically central gene-level changes and provides an informative framework for interpreting high-throughput data.
|
24
|
Naderi Yeganeh P, Richardson C, Saule E, Loraine A, Taghi Mostafavi M. Revisiting the use of graph centrality models in biological pathway analysis. BioData Min 2020; 13:5. [PMID: 32549913 PMCID: PMC7296696 DOI: 10.1186/s13040-020-00214-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/08/2019] [Accepted: 05/12/2020] [Indexed: 12/15/2022] Open
Abstract
The use of graph theory models is widespread in biological pathway analyses as it is often desired to evaluate the position of genes and proteins in their interaction networks of the biological systems. In this article, we argue that the common standard graph centrality measures do not sufficiently capture the informative topological organizations of the pathways, and thus, limit the biological inference. While key pathway elements may appear both upstream and downstream in pathways, standard directed graph centralities attribute significant topological importance to the upstream elements and evaluate the downstream elements as having no importance. We present a directed graph framework, Source/Sink Centrality (SSC), to address the limitations of standard models. SSC separately measures the importance of a node in the upstream and the downstream of a pathway, as a sender and a receiver of biological signals, and combines the two terms for evaluating the centrality. To validate SSC, we evaluate the topological position of known human cancer genes and mouse lethal genes in their respective KEGG annotated pathways and show that SSC-derived centralities provide an effective framework for associating higher positional importance to the genes with higher importance from a priori knowledge. While the presented work challenges some of the modeling assumptions in the common pathway analyses, it provides a straightforward methodology to extend the existing models. The SSC extensions can result in more informative topological descriptions of pathways, and thus, more informative biological inference.
Affiliation(s)
- Pourya Naderi Yeganeh
  - Beth Israel Deaconess Medical Center, Harvard Medical School, 330 Brookline Ave., Boston, 02215 MA USA; Department of Computer Science, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, 28223 NC USA
- Christine Richardson
  - Department of Biological Sciences, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, 28223 NC USA
- Erik Saule
  - Department of Computer Science, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, 28223 NC USA
- Ann Loraine
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, 28223 NC USA
- M Taghi Mostafavi
  - Department of Computer Science, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, 28223 NC USA
|
25
|
Djordjilović V, Chiogna M, Romualdi C. Simulating gene silencing through intervention analysis. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
26
|
Linder H, Zhang Y. A pan-cancer integrative pathway analysis of multi-omics data. Quantitative Biology 2020. [DOI: 10.1007/s40484-019-0185-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
27
|
Ma J, Shojaie A, Michailidis G. A comparative study of topology-based pathway enrichment analysis methods. BMC Bioinformatics 2019; 20:546. [PMID: 31684881 PMCID: PMC6829999 DOI: 10.1186/s12859-019-3146-1] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 09/17/2018] [Accepted: 10/02/2019] [Indexed: 02/01/2023]
Abstract
BACKGROUND Pathway enrichment is extensively used in the analysis of omics data to gain biological insights into the functional roles of pre-defined subsets of genes, proteins and metabolites. A large number of methods have been proposed in the literature for this task. The vast majority of these methods use as input expression levels of the biomolecules under study together with their membership in pathways of interest. The latest generation of pathway enrichment methods also leverages information on the topology of the underlying pathways, which, as evidence from their evaluation reveals, leads to improved sensitivity and specificity. Nevertheless, a systematic empirical comparison of such methods is still lacking, making selection of the most suitable method for a specific experimental setting challenging. This comparative study of nine network-based methods for pathway enrichment analysis aims to provide a systematic evaluation of their performance based on three real data sets with different numbers of features (genes/metabolites) and samples. RESULTS The findings highlight both methodological and empirical differences across the nine methods. In particular, certain methods assess pathway enrichment due to differences both across expression levels and in the strength of the interconnectedness of the members of the pathway, while others only leverage differential expression levels. In the more challenging setting involving a metabolomics data set, the results show that methods that utilize both pieces of information (with NetGSA being a prototypical one) exhibit superior statistical power in detecting pathway enrichment. CONCLUSION The analysis reveals that a number of methods perform equally well when testing large pathways, which is the case with genomic data. On the other hand, NetGSA, which takes into consideration both differential expression of the biomolecules in the pathway and changes in the topology, exhibits superior performance when testing small pathways, which is usually the case for metabolomics data.
Affiliation(s)
- Jing Ma
  - Texas A&M University, Department of Statistics, College Station, 77840 USA
  - Fred Hutchinson Cancer Research Center, Public Health Sciences Division, Seattle, 98107 USA
- Ali Shojaie
  - University of Washington, Department of Biostatistics, Seattle, 98105 USA
|
28
|
Navarro SL, Tarkhan A, Shojaie A, Randolph TW, Gu H, Djukovic D, Osterbauer KJ, Hullar MA, Kratz M, Neuhouser ML, Lampe PD, Raftery D, Lampe JW. Plasma metabolomics profiles suggest beneficial effects of a low-glycemic load dietary pattern on inflammation and energy metabolism. Am J Clin Nutr 2019; 110:984-992. [PMID: 31432072 PMCID: PMC6766441 DOI: 10.1093/ajcn/nqz169] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 04/16/2019] [Accepted: 07/02/2019] [Indexed: 02/06/2023]
Abstract
BACKGROUND Low-glycemic load dietary patterns, characterized by consumption of whole grains, legumes, fruits, and vegetables, are associated with reduced risk of several chronic diseases. METHODS Using samples from a randomized, controlled, crossover feeding trial, we evaluated the effects on metabolic profiles of a low-glycemic whole-grain dietary pattern (WG) compared with a dietary pattern high in refined grains and added sugars (RG) for 28 d. LC-MS-based targeted metabolomics analysis was performed on fasting plasma samples from 80 healthy participants (n = 40 men, n = 40 women) aged 18-45 y. Linear mixed models were used to evaluate differences in response between diets for individual metabolites. Kyoto Encyclopedia of Genes and Genomes (KEGG)-defined pathways and 2 novel data-driven analyses were conducted to consider differences at the pathway level. RESULTS There were 121 metabolites with detectable signal in >98% of all plasma samples. Eighteen metabolites were significantly different between diets at day 28 [false discovery rate (FDR) < 0.05]. Inositol, hydroxyphenylpyruvate, citrulline, ornithine, 13-hydroxyoctadecadienoic acid, glutamine, and oxaloacetate were higher after the WG diet than after the RG diet, whereas melatonin, betaine, creatine, acetylcholine, aspartate, hydroxyproline, methylhistidine, tryptophan, cystamine, carnitine, and trimethylamine were lower. Analyses using KEGG-defined pathways revealed statistically significant differences in tryptophan metabolism between diets, with kynurenine and melatonin positively associated with serum C-reactive protein concentrations. Novel data-driven methods at the metabolite and network levels found correlations among metabolites involved in branched-chain amino acid (BCAA) degradation, trimethylamine-N-oxide production, and β oxidation of fatty acids (FDR < 0.1) that differed between diets, with more favorable metabolic profiles detected after the WG diet. 
Higher BCAAs and trimethylamine were positively associated with homeostasis model assessment-insulin resistance. CONCLUSIONS These exploratory metabolomics results support beneficial effects of a low-glycemic load dietary pattern characterized by whole grains, legumes, fruits, and vegetables, compared with a diet high in refined grains and added sugars on inflammation and energy metabolism pathways. This trial was registered at clinicaltrials.gov as NCT00622661.
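The abstract above reports metabolite-level findings at a false discovery rate threshold (FDR < 0.05) but does not say which FDR procedure was used. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up adjustment, the function below (a hypothetical helper named `benjamini_hochberg`, with made-up p-values) shows how raw p-values are converted to adjusted values before thresholding.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for five metabolites (illustration only, not data from the study).
pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(pvalues)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted, significant)
```

With these illustrative inputs, the third and fourth p-values fall below 0.05 before adjustment but not after it, which is the behavior the FDR threshold is meant to enforce.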
Affiliation(s)
- Sandi L Navarro
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; address correspondence to SLN
- Aliasghar Tarkhan
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Ali Shojaie
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Department of Biostatistics, University of Washington, Seattle, WA, USA
- Timothy W Randolph
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Haiwei Gu
  - College of Health Solutions, Arizona State University, Phoenix, AZ, USA
- Danijel Djukovic
  - Northwest Metabolomics Research Center, Department of Anesthesiology and Pain Medicine, University of Washington, Seattle, WA, USA
- Katie J Osterbauer
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Meredith A Hullar
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Mario Kratz
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Marian L Neuhouser
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Paul D Lampe
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Daniel Raftery
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Northwest Metabolomics Research Center, Department of Anesthesiology and Pain Medicine, University of Washington, Seattle, WA, USA
- Johanna W Lampe
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
|
29
|
Ma J, Karnovsky A, Afshinnia F, Wigginton J, Rader DJ, Natarajan L, Sharma K, Porter AC, Rahman M, He J, Hamm L, Shafi T, Gipson D, Gadegbeku C, Feldman H, Michailidis G, Pennathur S. Differential network enrichment analysis reveals novel lipid pathways in chronic kidney disease. Bioinformatics 2019; 35:3441-3452. [PMID: 30887029 PMCID: PMC6748777 DOI: 10.1093/bioinformatics/btz114] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 09/14/2018] [Revised: 01/31/2019] [Accepted: 02/12/2019] [Indexed: 12/28/2022]
Abstract
MOTIVATION Functional enrichment testing methods can reduce data comprising hundreds of altered biomolecules to smaller sets of altered biological 'concepts' that help generate testable hypotheses. This study leveraged differential network enrichment analysis methodology to identify and validate lipid subnetworks that potentially differentiate chronic kidney disease (CKD) by severity or progression. RESULTS We built a partial correlation interaction network, identified highly connected network components, applied network-based gene-set analysis to identify differentially enriched subnetworks, and compared the subnetworks in patients with early-stage versus late-stage CKD. We identified two subnetworks, 'triacylglycerols' and 'cardiolipins-phosphatidylethanolamines (CL-PE)', characterized by lower connectivity and a higher abundance of longer polyunsaturated triacylglycerols in patients with severe CKD (stage ≥4) from the Clinical Phenotyping Resource and Biobank Core. These findings were replicated in an independent cohort, the Chronic Renal Insufficiency Cohort. Using an innovative method for elucidating biological alterations in lipid networks, we demonstrated alterations in triacylglycerols and cardiolipins-phosphatidylethanolamines that precede the clinical outcome of end-stage kidney disease by several years. AVAILABILITY AND IMPLEMENTATION A complete list of NetGSA results in HTML format can be found at http://metscape.ncibi.org/netgsa/12345-022118/cric_cprobe/022118/results_cric_cprobe/main.html. The DNEA is freely available at https://github.com/wiggie/DNEA. A Java wrapper leveraging the cytoscape.js framework is available at http://js.cytoscape.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
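The first step named in this abstract, building a partial correlation network, rests on the difference between marginal and partial correlation. The sketch below is not the DNEA implementation (which handles many features jointly); it is a three-variable illustration using the standard first-order partial correlation formula, with hypothetical function names and made-up values for `lipid_a`, `lipid_b`, and a shared driver `driver`.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up data: both lipids follow a shared driver plus independent noise.
driver = [-1, -1, -1, -1, 1, 1, 1, 1]
lipid_a = [-2, -2, 0, 0, 0, 0, 2, 2]  # driver + noise pattern 1
lipid_b = [-2, 0, -2, 0, 0, 2, 0, 2]  # driver + noise pattern 2

print(pearson(lipid_a, lipid_b))               # marginal correlation
print(partial_corr(lipid_a, lipid_b, driver))  # correlation after removing the driver
```

Here the two lipids are marginally correlated only through the shared driver, so a partial correlation network would draw no direct edge between them. That is the motivation for using partial rather than marginal correlations when defining network edges.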
Affiliation(s)
- Jing Ma
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Alla Karnovsky
  - Department of Computational Medicine & Bioinformatics, Ann Arbor, MI, USA
  - Michigan Regional Comprehensive Metabolomics Resource Core, Ann Arbor, MI, USA
- Farsad Afshinnia
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Janis Wigginton
  - Michigan Regional Comprehensive Metabolomics Resource Core, Ann Arbor, MI, USA
- Daniel J Rader
  - Department of Medicine, Translational-Clinical Research, University of Pennsylvania, Philadelphia, PA, USA
- Loki Natarajan
  - Department of Family Medicine and Public Health, University of California at San Diego, San Diego, CA, USA
- Kumar Sharma
  - Department of Internal Medicine, University of Texas Health at San Antonio, San Antonio, TX, USA
- Anna C Porter
  - Department of Internal Medicine, University of Illinois at Chicago, Chicago, IL, USA
- Mahboob Rahman
  - Department of Internal Medicine, Case-Western Reserve University, Cleveland, OH, USA
- Jiang He
  - Department of Epidemiology, Tulane University School of Medicine, Tulane University, New Orleans, LA, USA
- Lee Hamm
  - School of Medicine, Division of Nephrology and Hypertension, Tulane University, New Orleans, LA, USA
- Tariq Shafi
  - Department of Internal Medicine, Johns Hopkins University, Baltimore, MD, USA
- Debbie Gipson
  - Department of Pediatrics, University of Michigan, Ann Arbor, MI, USA
- Crystal Gadegbeku
  - Department of Internal Medicine, Temple University, Philadelphia, PA, USA
- Harold Feldman
  - Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania, Philadelphia, PA, USA
- George Michailidis
  - Michigan Regional Comprehensive Metabolomics Resource Core, Ann Arbor, MI, USA
  - Department of Statistics and the Informatics Institute, University of Florida, Gainesville, FL, USA
- Subramaniam Pennathur
  - Michigan Regional Comprehensive Metabolomics Resource Core, Ann Arbor, MI, USA
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
  - Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, MI, USA
|
30
|
Wen Bin Goh W, Thalappilly S, Thibault G. Moving beyond the current limits of data analysis in longevity and healthy lifespan studies. Drug Discov Today 2019; 24:2273-2285. [PMID: 31499187 DOI: 10.1016/j.drudis.2019.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2019] [Revised: 08/03/2019] [Accepted: 08/28/2019] [Indexed: 11/19/2022]
Abstract
Living longer with sustainable quality of life is becoming increasingly important in aging populations. Understanding associative biological mechanisms has proven daunting because of multigenicity and population heterogeneity. Although Big Data and Artificial Intelligence (AI) could help, naïve adoption is ill-advised. We hold the view that model organisms are better suited for big-data analytics but might lack relevance because they do not immediately reflect the human condition. Resolving this hurdle and bridging the human-model organism gap will require some finesse. This includes improving signal:noise ratios by appropriate contextualization of high-throughput data, establishing consistency across multiple high-throughput platforms, and adopting supporting technologies that provide useful in silico and in vivo validation strategies.
Affiliation(s)
- Wilson Wen Bin Goh
  - Bio-Data Science and Education Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore
- Subhash Thalappilly
  - Lipid Regulation and Cell Stress Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore
- Guillaume Thibault
  - Lipid Regulation and Cell Stress Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore; Institute of Molecular and Cell Biology, A*STAR, 138673, Singapore
|
31
|
Epigenetic loss of AOX1 expression via EZH2 leads to metabolic deregulations and promotes bladder cancer progression. Oncogene 2019; 39:6265-6285. [PMID: 31383940 DOI: 10.1038/s41388-019-0902-7] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 03/04/2019] [Revised: 04/04/2019] [Accepted: 04/05/2019] [Indexed: 12/24/2022]
Abstract
Advanced Bladder Cancer (BLCA) remains a clinical challenge that lacks effective therapeutic measures. Here, we show that distinct, stage-wise metabolic alterations in BLCA are associated with the loss of function of aldehyde oxidase (AOX1). AOX1 associated metabolites have a high predictive value for advanced BLCA and our findings demonstrate that AOX1 is epigenetically silenced during BLCA progression by the methyltransferase activity of EZH2. Knockdown (KD) of AOX1 in normal bladder epithelial cells re-wires the tryptophan-kynurenine pathway resulting in elevated NADP levels which may increase metabolic flux through the pentose phosphate (PPP) pathway, enabling increased nucleotide synthesis, and promoting cell invasion. Inhibition of NADP synthesis rescues the metabolic effects of AOX1 KD. Ectopic AOX1 expression decreases NADP production, PPP flux and nucleotide synthesis, while decreasing invasion in cell line models and suppressing growth in tumor xenografts. Further gain and loss of AOX1 confirm the EZH2-dependent activation, metabolic deregulation, and tumor growth in BLCA. Our findings highlight the therapeutic potential of AOX1 and provide a basis for the development of prognostic markers for advanced BLCA.
|
32
|
Yan J, Risacher SL, Shen L, Saykin AJ. Network approaches to systems biology analysis of complex disease: integrative methods for multi-omics data. Brief Bioinform 2019; 19:1370-1381. [PMID: 28679163 DOI: 10.1093/bib/bbx066] [Citation(s) in RCA: 135] [Impact Index Per Article: 22.5] [Received: 02/23/2017] [Indexed: 11/14/2022]
Abstract
In the past decade, significant progress has been made in complex disease research across multiple omics layers from genome, transcriptome and proteome to metabolome. There is an increasing awareness of the importance of biological interconnections, and much success has been achieved using systems biology approaches. However, because of the typical focus on one single omics layer at a time, existing systems biology findings explain only a modest portion of complex disease. Recent advances in multi-omics data collection and sharing present us new opportunities for studying complex diseases in a more comprehensive fashion, and yet simultaneously create new challenges considering the unprecedented data dimensionality and diversity. Here, our goal is to review extant and emerging network approaches that can be applied across multiple biological layers to facilitate a more comprehensive and integrative multilayered omics analysis of complex diseases.
Affiliation(s)
- Jingwen Yan
  - Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis, USA
- Shannon L Risacher
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
- Li Shen
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
- Andrew J Saykin
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
|
33
|
Vijayakumar S, Conway M, Lió P, Angione C. Seeing the wood for the trees: a forest of methods for optimization and omic-network integration in metabolic modelling. Brief Bioinform 2019; 19:1218-1235. [PMID: 28575143 DOI: 10.1093/bib/bbx053] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 01/13/2017] [Indexed: 11/13/2022]
Abstract
Metabolic modelling has entered a mature phase with dozens of methods and software implementations available to the practitioner and the theoretician. It is not easy for a modeller to be able to see the wood (or the forest) for the trees. Driven by this analogy, we here present a 'forest' of principal methods used for constraint-based modelling in systems biology. This provides a tree-based view of methods available to prospective modellers, also available in an interactive version at http://modellingmetabolism.net, where it will be kept updated with new methods after the publication of the present manuscript. Our updated classification of existing methods and tools highlights the most promising in the different branches, with the aim of developing a vision of how existing methods could hybridize and become more complex. We then provide the first hands-on tutorial for multi-objective optimization of metabolic models in R. We finally discuss the implementation of multi-view machine learning approaches in poly-omic integration. Throughout this work, we demonstrate the optimization of trade-offs between multiple metabolic objectives, with a focus on omic data integration through machine learning. We anticipate that the combination of a survey, a perspective on multi-view machine learning and a step-by-step R tutorial should be of interest for both the beginner and the advanced user.
Affiliation(s)
- Max Conway
  - Computer Laboratory, University of Cambridge, UK
- Pietro Lió
  - Computer Laboratory, University of Cambridge, UK
- Claudio Angione
  - Department of Computer Science and Information Systems, Teesside University, UK
|
34
|
Jung S. KEDDY: a knowledge-based statistical gene set test method to detect differential functional protein-protein interactions. Bioinformatics 2019; 35:619-627. [PMID: 30101275 DOI: 10.1093/bioinformatics/bty686] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/14/2017] [Revised: 07/18/2018] [Accepted: 08/06/2018] [Indexed: 12/30/2022]
Abstract
MOTIVATION Identifying differential patterns between conditions is a popular approach to understanding the discrepancy between different biological contexts. Although many statistical tests have been proposed for identifying gene sets with differential patterns based on different definitions of differentiality, few methods have been suggested for identifying gene sets with differential functional protein networks, owing to computational complexity. RESULTS We propose a method of Knowledge-based Evaluation of Dependency DifferentialitY (KEDDY), which is a statistical test for differential functional protein networks of a set of genes between two conditions that utilizes known functional protein-protein interaction information. Unlike other approaches focused on differential expression of individual genes or differentiality of individual interactions, KEDDY compares two conditions by evaluating the probability distributions of functional protein networks based on known functional protein-protein interactions. The method has been evaluated and compared with previous methods through simulation studies, where KEDDY achieves significantly better accuracy and speed than the previous method that does not use prior knowledge, and better performance in identifying gene sets with differential interactions than other methods that evaluate changes in gene expression. Applications to cancer data sets show that KEDDY identifies alternative cancer subtype-related differential gene sets compared to other differential expression-based methods, and the results also provide detailed gene regulatory information that drives the differentiality of the gene sets. AVAILABILITY AND IMPLEMENTATION The Java implementation of KEDDY is freely available to non-commercial users at https://sites.google.com/site/sjunggsm/keddy. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sungwon Jung
  - Department of Genome Medicine and Science, Gachon University College of Medicine, Incheon, Republic of Korea; Gachon Institute of Genome Medicine and Science, Gachon University Gil Medical Center, Incheon, Republic of Korea
|
35
|
Enhanced Molecular Appreciation of Psychiatric Disorders Through High-Dimensionality Data Acquisition and Analytics. Methods Mol Biol 2019; 2011:671-723. [PMID: 31273728 DOI: 10.1007/978-1-4939-9554-7_39] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023]
Abstract
The initial diagnosis, molecular investigation, treatment, and posttreatment care of major psychiatric disorders (schizophrenia and bipolar depression) are all still significantly hindered by the current inability to define these disorders in an explicit molecular signaling manner. High-dimensionality data analytics, using large datastreams from transcriptomic, proteomic, or metabolomic investigations, will likely advance both the appreciation of the molecular nature of major psychiatric disorders and simultaneously enhance our ability to more efficiently diagnose and treat these debilitating conditions. High-dimensionality data analysis in psychiatric research has been heterogeneous in aims and methods and limited by insufficient sample sizes, poorly defined case definitions, methodological inhomogeneity, and confounding results. All of these issues combine to constrain the conclusions that can be extracted from them. Here, we discuss possibilities for overcoming methodological challenges through the implementation of transcriptomic, proteomic, or metabolomics signatures in psychiatric diagnosis and offer an outlook for future investigations. To fulfill the promise of intelligent high-dimensionality data-based differential diagnosis in mental disease diagnosis and treatment, future research will need large, well-defined cohorts in combination with state-of-the-art technologies.
|
36
|
Rush STA, Repsilber D. Capturing context-specific regulation in molecular interaction networks. BMC Bioinformatics 2018; 19:539. [PMID: 30577761 PMCID: PMC6303932 DOI: 10.1186/s12859-018-2513-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 08/14/2018] [Accepted: 11/20/2018] [Indexed: 01/20/2023]
Abstract
BACKGROUND Molecular profiles change in response to perturbations. These changes are coordinated into functional modules via regulatory interactions. The genes and their products within a functional module are expected to be differentially expressed in a manner coherent with their regulatory network. This perspective presents a promising approach to increase precision in detecting differential signals as well as for describing differential regulatory signals within the framework of a priori knowledge about the underlying network, and so from a mechanistic point of view. RESULTS We present Coherent Network Expression (CoNE), an effective procedure for identifying differentially activated functional modules in molecular interaction networks. Differential gene expression is chosen as example, and differential signals coherent with the regulatory nature of the network are identified. We apply our procedure to systematically simulated data, comparing its performance to alternative methods. We then take the example case of a transcription regulatory network in the context of particle-induced pulmonary inflammation, recapitulating and proposing additional candidates to previously obtained results. CoNE is conveniently implemented in an R-package along with simulation utilities. CONCLUSION Combining coherent interactions with error control on differential gene expression results in uniformly greater specificity in inference than error control alone, ensuring that captured functional modules constitute real findings.
Affiliation(s)
- Stephen T. A. Rush
  - School of Medical Sciences, Örebro University, Södra Grev Rosengatan, Örebro, Sweden
- Dirk Repsilber
  - School of Medical Sciences, Örebro University, Södra Grev Rosengatan, Örebro, Sweden
|
37
|
Manatakis DV, Raghu VK, Benos PV. piMGM: incorporating multi-source priors in mixed graphical models for learning disease networks. Bioinformatics 2018; 34:i848-i856. [PMID: 30423087 PMCID: PMC6129280 DOI: 10.1093/bioinformatics/bty591] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 12/13/2022]
Abstract
Motivation Learning probabilistic graphs over mixed data is an important way to combine gene expression and clinical disease data. Leveraging the existing, yet imperfect, information in pathway databases for mixed graphical model (MGM) learning is an understudied problem with tremendous potential applications in systems medicine, the problems of which often involve high-dimensional data. Results We present a new method, piMGM, which can learn with accuracy the structure of probabilistic graphs over mixed data by appropriately incorporating priors from multiple experts with different degrees of reliability. We show that piMGM accurately scores the reliability of prior information from a given expert even at low sample sizes. The reliability scores can be used to determine active pathways in healthy and disease samples. We tested piMGM on both simulated and real data from TCGA, and we found that its performance is not affected by unreliable priors. We demonstrate the applicability of piMGM by successfully using prior information to identify pathway components that are important in breast cancer and improve cancer subtype classification. Availability and implementation http://www.benoslab.pitt.edu/manatakisECCB2018.html. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dimitris V Manatakis
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Vineet K Raghu
  - Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA
- Panayiotis V Benos
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA
|
38
|
Abstract
Background Gene set analysis is a valuable tool to summarize high-dimensional gene expression data in terms of biologically relevant sets. This is an active area of research and numerous gene set analysis methods have been developed. Despite this popularity, systematic comparative studies have been limited in scope. Methods In this study we present a semi-synthetic simulation study using real datasets in order to test and compare commonly used methods. Results A software pipeline, Flexible Algorithm for Novel Gene set Simulation (FANGS) develops simulated data based on a prostate cancer dataset where the KRAS and TGF-β pathways were differentially expressed. The FANGS software is compatible with other datasets and pathways. Comparisons of gene set analysis methods are presented for Gene Set Enrichment Analysis (GSEA), Significance Analysis of Function and Expression (SAFE), sigPathway, and Correlation Adjusted Mean RAnk (CAMERA) methods. All gene set analysis methods are tested using gene sets from the MSigDB knowledge base. The false positive rate and power are estimated and presented for comparison. Recommendations are made for the utility of the default settings of methods and each method’s sensitivity towards various effect sizes. Conclusions The results of this study provide empirical guidance to users of gene set analysis methods. The FANGS software is available for researchers for continued methods comparisons. Electronic supplementary material The online version of this article (10.1186/s13040-018-0166-8) contains supplementary material, which is available to authorized users.
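The abstract above compares methods by their estimated false positive rate and power on simulated data where the truly altered gene sets are known. As a rough sketch of that bookkeeping (not the FANGS pipeline itself), the hypothetical helper `fpr_and_power` below takes one p-value per gene set plus a truth label and reports both quantities at a chosen threshold; the inputs are made up for illustration.

```python
def fpr_and_power(pvalues, is_true_signal, alpha=0.05):
    """Estimate false positive rate and power from p-values with known ground truth.

    pvalues: one p-value per tested gene set.
    is_true_signal: True where the gene set was simulated as differentially expressed.
    """
    n_alt = sum(1 for t in is_true_signal if t)
    n_null = len(is_true_signal) - n_alt
    if n_null == 0 or n_alt == 0:
        raise ValueError("need at least one null and one non-null gene set")
    false_pos = sum(1 for p, t in zip(pvalues, is_true_signal) if p < alpha and not t)
    true_pos = sum(1 for p, t in zip(pvalues, is_true_signal) if p < alpha and t)
    return false_pos / n_null, true_pos / n_alt


# Made-up simulation output: three truly altered sets, four null sets.
pvals = [0.01, 0.20, 0.001, 0.03, 0.60, 0.40, 0.70]
truth = [True, True, True, False, False, False, False]
fpr, power = fpr_and_power(pvals, truth)
print(fpr, power)
```

Repeating this over many simulated datasets and averaging would give the kind of per-method estimates the abstract refers to. The single run here only shows the calculation.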
|
39
|
Zhang Y, Linder MH, Shojaie A, Ouyang Z, Shen R, Baggerly KA, Baladandayuthapani V, Zhao H. Dissecting Pathway Disturbances Using Network Topology and Multi-platform Genomics Data. Statistics in Biosciences 2018. [DOI: 10.1007/s12561-017-9193-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
|
40
|
Harrington LX, Way GP, Doherty JA, Greene CS. Functional network community detection can disaggregate and filter multiple underlying pathways in enrichment analyses. Pacific Symposium on Biocomputing 2018; 23:157-167. [PMID: 29218878 PMCID: PMC5760988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Differential expression experiments or other analyses often end in a list of genes. Pathway enrichment analysis is one method to discern important biological signals and patterns from noisy expression data. However, pathway enrichment analysis may perform suboptimally in situations where there are multiple implicated pathways, such as in the case of genes that define subtypes of complex diseases. Our simulation study shows that in this setting, standard overrepresentation analysis identifies many false positive pathways along with the true positives. These false positives hamper investigators' attempts to glean biological insights from enrichment analysis. We develop and evaluate an approach that combines community detection over functional networks with pathway enrichment to reduce false positives. Our simulation study demonstrates that a large reduction in false positives can be obtained with a small decrease in power. Though we hypothesized that multiple communities might underlie previously described subtypes of high-grade serous ovarian cancer and applied this approach, our results do not support this hypothesis. In summary, applying community detection before enrichment analysis may ease interpretation for complex gene sets that represent multiple distinct pathways.
Affiliation(s)
- Lia X Harrington
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover 03784, USA
|