1
Pashaei E, Liu S, Li K, Zang Y, Yang L, Lautenschlaeger T, Huang J, Lu X, Wan J. DiCE: differential centrality-ensemble analysis based on gene expression profiles and protein-protein interaction network. bioRxiv 2025:2025.03.14.638654. [PMID: 40166319 PMCID: PMC11956993 DOI: 10.1101/2025.03.14.638654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Uncovering key genes that drive diseases and cancers is crucial for advancing understanding and developing targeted therapies. Traditional differential expression analysis often relies on arbitrary cutoffs, missing critical genes with subtle expression changes. Some methods incorporate protein-protein interactions (PPIs) but depend on prior disease knowledge. To address these challenges, we developed DiCE (Differential Centrality-Ensemble), a novel approach that combines differential expression with network centrality analysis, independent of prior disease annotations. DiCE identifies candidate genes, refines them with an information gain filter, and reconstructs a condition-specific weighted PPI network. Using centrality measures, DiCE ranks genes based on expression shifts and network influence. Validated on prostate cancer datasets, DiCE identified genes over-represented in key pathways and cancer fitness genes, significantly correlating with disease-free survival (DFS), despite DFS not being used in selection. DiCE offers a comprehensive, unbiased approach to identifying disease-associated genes, advancing biomarker discovery and therapeutic development.
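The ranking step described in this abstract (scoring genes by their position in a weighted, condition-specific PPI network and combining several centrality measures) can be illustrated with a minimal sketch. It is not the DiCE implementation: the gene names, edge weights, the two centralities chosen (weighted degree and eigenvector centrality), and the mean-rank ensemble are all assumptions made for illustration, since the abstract does not specify them.

```python
# Toy illustration only: gene names, edge weights, and the choice of
# centralities are hypothetical and are not taken from the DiCE preprint.

def build_graph(edges):
    """Return an undirected weighted adjacency map {node: {neighbor: weight}}."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def strength(adj):
    """Weighted degree: sum of the edge weights incident to each node."""
    return {n: sum(nbrs.values()) for n, nbrs in adj.items()}

def eigenvector_centrality(adj, iters=200):
    """Power iteration on (A + I); the identity shift avoids oscillation."""
    score = {n: 1.0 for n in adj}
    for _ in range(iters):
        new = {
            n: score[n] + sum(w * score[m] for m, w in adj[n].items())
            for n in adj
        }
        top = max(new.values())
        score = {n: v / top for n, v in new.items()}
    return score

def ranks(scores):
    """Map each node to its rank (1 = highest score; ties broken by name)."""
    ordered = sorted(scores, key=lambda n: (-scores[n], n))
    return {n: i + 1 for i, n in enumerate(ordered)}

def ensemble_ranking(adj):
    """Order nodes by their mean rank across the two centrality measures."""
    per_measure = [ranks(strength(adj)), ranks(eigenvector_centrality(adj))]
    mean_rank = {n: sum(r[n] for r in per_measure) / len(per_measure) for n in adj}
    return sorted(adj, key=lambda n: (mean_rank[n], n))

# Hypothetical condition-specific edge weights (e.g., derived from co-expression).
edges = [("G1", "G2", 0.9), ("G2", "G3", 0.8), ("G2", "G4", 0.7), ("G3", "G4", 0.2)]
ranking = ensemble_ranking(build_graph(edges))
print(ranking)
```

Averaging ranks rather than raw scores keeps measures with different scales comparable; other aggregation schemes would be equally plausible here.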
2
Liu S, Nam HS, Zeng Z, Deng X, Pashaei E, Zang Y, Yang L, Li C, Huang J, Wendt MK, Lu X, Huang R, Wan J. CDHu40: a novel marker gene set of neuroendocrine prostate cancer. Brief Bioinform 2024; 25:bbae471. [PMID: 39318189 PMCID: PMC11422505 DOI: 10.1093/bib/bbae471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 07/22/2024] [Accepted: 09/10/2024] [Indexed: 09/26/2024]
Abstract
Prostate cancer (PCa) is the most prevalent cancer affecting American men. Castration-resistant prostate cancer (CRPC) can emerge during hormone therapy for PCa, manifesting with elevated serum prostate-specific antigen levels, continued disease progression, and/or metastasis to new sites, resulting in a poor prognosis. A subset of CRPC patients shows a neuroendocrine (NE) phenotype, signifying reduced or no reliance on androgen receptor signaling and a particularly unfavorable prognosis. In this study, we incorporated computational approaches based on both gene expression profiles and protein-protein interaction networks. We identified 500 potential marker genes, which are significantly enriched in cell cycle and neuronal processes. The top 40 candidates, collectively named CDHu40, demonstrated superior performance in distinguishing NE PCa (NEPC) from non-NEPC samples based on gene expression profiles. CDHu40 outperformed most of the other published marker sets, excelling particularly at the prognostic level. Notably, some marker genes in CDHu40, absent in the other marker sets, have been reported to be associated with NEPC in the literature, such as DDC, FOLH1, BEX1, MAST1, and CACNA1A. Importantly, elevated CDHu40 scores derived from our predictive model showed a robust correlation with unfavorable survival outcomes in patients, indicating the potential of the CDHu40 score as a promising indicator for predicting the survival prognosis of those patients with the NE phenotype. Motif enrichment analysis on the top candidates suggests that REST and E2F6 may serve as key regulators in NEPC progression.
Affiliation(s)
- Sheng Liu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, 410 W 10th Street, Indianapolis, IN 46202, United States
- Hye Seung Nam
- Borch Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, 575 Stadium Mall Drive, West Lafayette, IN 47907, United States
- Ziyu Zeng
- Department of Biological Sciences, Boler-Parseghian Center for Rare and Neglected Diseases, Harper Cancer Research Institute, University of Notre Dame, 100 Galvin Life Science Center, Notre Dame, IN 46556, United States
- Xuehong Deng
- Borch Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, 575 Stadium Mall Drive, West Lafayette, IN 47907, United States
- Elnaz Pashaei
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, 410 W 10th Street, Indianapolis, IN 46202, United States
- Yong Zang
- Department of Biostatistics & Health Data Science, Indiana University School of Medicine, 410 W 10th Street, Indianapolis, IN 46202, United States
- Lei Yang
- Department of Pediatrics, Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, 1044 W Walnut St, Indianapolis, IN 46202, United States
- Chenglong Li
- Department of Medicinal Chemistry, College of Pharmacy, University of Florida, 1345 Center Dr Room P3-12, Gainesville, FL 32603, United States
- Jiaoti Huang
- Department of Pathology, Duke University School of Medicine, Davison Building, 40 Duke Medicine, Durham, NC 27710, United States
- Michael K Wendt
- Department of Internal Medicine, Division of Hematology and Oncology, University of Iowa, 200 Hawkins Dr, Iowa City, IA 52242, United States
- Holden Comprehensive Cancer Center, University of Iowa, 200 Hawkins Dr, Iowa City, IA, 52242, United States
- Xin Lu
- Department of Biological Sciences, Boler-Parseghian Center for Rare and Neglected Diseases, Harper Cancer Research Institute, University of Notre Dame, 100 Galvin Life Science Center, Notre Dame, IN 46556, United States
- Indiana University Simon Comprehensive Cancer Center, Indiana University School of Medicine, 535 Barnhill Dr, Indianapolis, IN 46202, United States
- Rong Huang
- Borch Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, 575 Stadium Mall Drive, West Lafayette, IN 47907, United States
- Jun Wan
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, 410 W 10th Street, Indianapolis, IN 46202, United States
- Indiana University Simon Comprehensive Cancer Center, Indiana University School of Medicine, 535 Barnhill Dr, Indianapolis, IN 46202, United States
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 W 10th Street, Indianapolis, IN 46202, United States
3
Arici MK, Tuncbag N. Unveiling hidden connections in omics data via pyPARAGON: an integrative hybrid approach for disease network construction. Brief Bioinform 2024; 25:bbae399. [PMID: 39163205 PMCID: PMC11334722 DOI: 10.1093/bib/bbae399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 06/26/2024] [Accepted: 08/07/2024] [Indexed: 08/22/2024]
Abstract
Network inference or reconstruction algorithms play an integral role in successfully analyzing and identifying causal relationships between omics hits for detecting dysregulated and altered signaling components in various contexts, encompassing disease states and drug perturbations. However, accurate representation of signaling networks and identification of context-specific interactions within sparse omics datasets in complex interactomes pose significant challenges in integrative approaches. To address these challenges, we present pyPARAGON (PAgeRAnk-flux on Graphlet-guided network for multi-Omic data integratioN), a novel tool that combines network propagation with graphlets. pyPARAGON enhances accuracy and minimizes the inclusion of nonspecific interactions in signaling networks by utilizing graphlets rather than relying on pairwise connections among proteins. Through comprehensive evaluations on benchmark signaling pathways, we demonstrate that pyPARAGON outperforms state-of-the-art approaches in node propagation and edge inference. Furthermore, pyPARAGON exhibits promising performance in discovering cancer driver networks. Notably, we demonstrate its utility in network-based stratification of patient tumors by integrating phosphoproteomic data from 105 breast cancer tumors with the interactome and demonstrating tumor-specific signaling pathways. Overall, pyPARAGON is a novel tool for analyzing and integrating multi-omic data in the context of signaling networks. pyPARAGON is available at https://github.com/netlab-ku/pyPARAGON.
Affiliation(s)
- Muslum Kaan Arici
- Graduate School of Informatics, Middle East Technical University, Ankara 06800, Turkey
- Nurcan Tuncbag
- Chemical and Biological Engineering, College of Engineering, Koc University, Istanbul 34450, Turkey
- School of Medicine, Koc University, Istanbul 34450, Turkey
- Koc University Research Center for Translational Medicine (KUTTAM), Koc University, Istanbul 34450, Turkey
4
Kim Y, Han Y, Hopper C, Lee J, Joo JI, Gong JR, Lee CK, Jang SH, Kang J, Kim T, Cho KH. A gray box framework that optimizes a white box logical model using a black box optimizer for simulating cellular responses to perturbations. Cell Rep Methods 2024; 4:100773. [PMID: 38744288 PMCID: PMC11133856 DOI: 10.1016/j.crmeth.2024.100773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Revised: 03/19/2024] [Accepted: 04/19/2024] [Indexed: 05/16/2024]
Abstract
Predicting cellular responses to perturbations requires interpretable insights into molecular regulatory dynamics to perform reliable cell fate control, despite the confounding non-linearity of the underlying interactions. There is a growing interest in developing machine learning-based perturbation response prediction models to handle the non-linearity of perturbation data, but their interpretation in terms of molecular regulatory dynamics remains a challenge. Alternatively, for meaningful biological interpretation, logical network models such as Boolean networks are widely used in systems biology to represent intracellular molecular regulation. However, determining the appropriate regulatory logic of large-scale networks remains an obstacle due to the high-dimensional and discontinuous search space. To tackle these challenges, we present a scalable derivative-free optimizer trained by meta-reinforcement learning for Boolean network models. The logical network model optimized by the trained optimizer successfully predicts anti-cancer drug responses of cancer cell lines, while simultaneously providing insight into their underlying molecular regulatory mechanisms.
Affiliation(s)
- Yunseong Kim
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Younghyun Han
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Corbin Hopper
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jonghoon Lee
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jae Il Joo
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jeong-Ryeol Gong
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Chun-Kyung Lee
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Seong-Hoon Jang
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Junsoo Kang
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Taeyoung Kim
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Kwang-Hyun Cho
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea.
5
Liu S, Nam HS, Zeng Z, Deng X, Pashaei E, Zang Y, Yang L, Li C, Huang J, Wendt MK, Lu X, Huang R, Wan J. CDHu40: a novel marker gene set of neuroendocrine prostate cancer (NEPC). bioRxiv 2024:2024.03.28.587205. [PMID: 38585861 PMCID: PMC10996696 DOI: 10.1101/2024.03.28.587205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Prostate cancer (PCa) is the most prevalent cancer affecting American men. Castration-resistant prostate cancer (CRPC) can emerge during hormone therapy for PCa, manifesting with elevated serum prostate-specific antigen (PSA) levels, continued disease progression, and/or metastasis to new sites, resulting in a poor prognosis. A subset of CRPC patients shows a neuroendocrine (NE) phenotype, signifying reduced or no reliance on androgen receptor (AR) signaling and a particularly unfavorable prognosis. In this study, we incorporated computational approaches based on both gene expression profiles and protein-protein interaction (PPI) networks. We identified 500 potential marker genes, which are significantly enriched in cell cycle and neuronal processes. The top 40 candidates, collectively named CDHu40, demonstrated superior performance in distinguishing NE prostate cancer (NEPC) from non-NEPC samples based on gene expression profiles compared to other published marker sets. Notably, some novel marker genes in CDHu40, absent in the other marker sets, have been reported to be associated with NEPC in the literature, such as DDC, FOLH1, BEX1, MAST1, and CACNA1A. Importantly, elevated CDHu40 scores derived from our predictive model showed a robust correlation with unfavorable survival outcomes in patients, indicating the potential of the CDHu40 score as a promising indicator for predicting the survival prognosis of those patients with the NE phenotype. Motif enrichment analysis on the top candidates suggests that REST and E2F6 may serve as key regulators in NEPC progression. Significance: Our study integrates gene expression variances across multiple NEPC studies with a protein-protein interaction network to pinpoint a specific set of NEPC marker genes, namely CDHu40. These genes, and scores based on their expression levels, effectively distinguish NEPC samples and underscore the clinical prognostic significance and potential mechanism.
6
Koo HJ, Pan W. Are trait-associated genes clustered together in a gene network? Genet Epidemiol 2024. [PMID: 38472164 DOI: 10.1002/gepi.22557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 01/25/2024] [Accepted: 02/23/2024] [Indexed: 03/14/2024]
Abstract
Genome-wide association studies (GWAS) have provided an abundance of information about the genetic variants and their loci that are associated with complex traits and diseases. However, due to linkage disequilibrium (LD) and noncoding regions of loci, it remains a challenge to pinpoint the causal genes. Gene network-based approaches, paired with network diffusion methods, have been proposed to prioritize causal genes and to boost statistical power in GWAS based on the assumption that trait-associated genes are clustered in a gene network. Due to the difficulty in mapping trait-associated variants to genes in GWAS, this assumption has never been directly or rigorously tested empirically. On the other hand, whole exome sequencing (WES) data focuses on the protein-coding regions, directly identifying trait-associated genes. In this study, we tested the assumption by leveraging the recently available exome-based association statistics from the UK Biobank WES data along with two types of networks. We found that almost all trait-associated genes were significantly more proximal to each other than randomly selected genes within both networks. These results support the assumption that trait-associated genes are clustered in gene networks, which can be further leveraged to boost the power of GWAS such as by introducing less stringent p-value thresholds.
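The comparison reported in this abstract (trait-associated genes versus randomly selected genes) can be sketched as a permutation test. This is a minimal illustration under stated assumptions: mean pairwise shortest-path distance as the proximity measure, uniform random gene sets of the same size as the null, a connected network, and a toy path graph are all choices made here, not details taken from the paper.

```python
import random
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def mean_pairwise_distance(adj, genes):
    """Average shortest-path distance over all pairs (assumes a connected graph)."""
    total = sum(bfs_distances(adj, u)[v] for u, v in combinations(genes, 2))
    return total / (len(genes) * (len(genes) - 1) / 2)

def proximity_p_value(adj, genes, n_perm=1000, seed=0):
    """Empirical p-value: how often a random gene set is at least as compact."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(adj, genes)
    nodes = sorted(adj)
    hits = sum(
        mean_pairwise_distance(adj, rng.sample(nodes, len(genes))) <= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical network: a path graph 0-1-2-...-9, with genes 0, 1, 2 as the "trait set".
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
observed, p_value = proximity_p_value(adj, [0, 1, 2])
print(observed, p_value)
```

A degree-matched null (sampling random genes with similar connectivity) would be a stricter alternative to the uniform sampling used in this sketch.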
Affiliation(s)
- Hyun Jung Koo
- School of Statistics, University of Minnesota, Minneapolis, Minnesota, USA
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
7
Shah E, Maji P. Multi-View Kernel Learning for Identification of Disease Genes. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2278-2290. [PMID: 37027602 DOI: 10.1109/tcbb.2023.3247033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Gene expression data sets and protein-protein interaction (PPI) networks are two heterogeneous data sources that have been extensively studied, due to their ability to capture the co-expression patterns among genes and their topological connections. Although they depict different traits of the data, both of them tend to group co-functional genes together. This phenomenon agrees with the basic assumption of multi-view kernel learning, according to which different views of the data contain a similar inherent cluster structure. Based on this inference, a new multi-view kernel learning-based disease gene identification algorithm, termed DiGId, is put forward. A novel multi-view kernel learning approach is proposed that aims to learn a consensus kernel, which efficiently captures the heterogeneous information of individual views as well as depicts the underlying inherent cluster structure. Some low-rank constraints are imposed on the learned multi-view kernel, so that it can effectively be partitioned into k or fewer clusters. The learned joint cluster structure is used to curate a set of potential disease genes. Moreover, a novel approach is put forward to quantify the importance of each view. In order to demonstrate the effectiveness of the proposed approach in capturing the relevant information depicted by individual views, an extensive analysis is performed on four different cancer-related gene expression data sets and a PPI network, considering different similarity measures.
8
Han S, Hong J, Yun SJ, Koo HJ, Kim TY. PWN: enhanced random walk on a warped network for disease target prioritization. BMC Bioinformatics 2023; 24:105. [PMID: 36944912 PMCID: PMC10031933 DOI: 10.1186/s12859-023-05227-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 03/13/2023] [Indexed: 03/23/2023]
Abstract
BACKGROUND: Extracting meaningful information from unbiased high-throughput data has been a challenge in diverse areas. Specifically, in the early stages of drug discovery, a considerable amount of data was generated to understand disease biology when identifying disease targets. Several random walk-based approaches have been applied to solve this problem, but they still have limitations. Therefore, we suggest a new method that enhances the effectiveness of high-throughput data analysis with random walks. RESULTS: We developed a new random walk-based algorithm named prioritization with a warped network (PWN), which employs a warped network to achieve enhanced performance. Network warping is based on both internal and external features: graph curvature and prior knowledge. CONCLUSIONS: We showed that these compositive features synergistically increased the resulting performance when applied to random walk algorithms, which led to PWN consistently achieving the best performance among several other known methods. Furthermore, we performed subsequent experiments to analyze the characteristics of PWN.
Affiliation(s)
- Seokjin Han
- Standigm Inc., 70, Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, 06234 Republic of Korea
- Jinhee Hong
- Standigm Inc., 70, Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, 06234 Republic of Korea
- So Jeong Yun
- Standigm Inc., 70, Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, 06234 Republic of Korea
- Hee Jung Koo
- Standigm UK Co., Ltd, 50-60 Station Road, Cambridge, CB1 2JH UK
- Tae Yong Kim
- Standigm Inc., 70, Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, 06234 Republic of Korea
9
Delmas M, Filangi O, Duperier C, Paulhe N, Vinson F, Rodriguez-Mier P, Giacomoni F, Jourdan F, Frainay C. Suggesting disease associations for overlooked metabolites using literature from metabolic neighbors. Gigascience 2023; 12:giad065. [PMID: 37712592 PMCID: PMC10502579 DOI: 10.1093/gigascience/giad065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 06/13/2023] [Accepted: 07/28/2023] [Indexed: 09/16/2023]
Abstract
In human health research, metabolic signatures extracted from metabolomics data have a strong added value for stratifying patients and identifying biomarkers. Nevertheless, one of the main challenges is to interpret and relate these lists of discriminant metabolites to pathological mechanisms. This task requires experts to combine their knowledge with information extracted from databases and the scientific literature. However, we show that most compounds (>99%) in the PubChem database lack annotated literature. This dearth of available information can have a direct impact on the interpretation of metabolic signatures, which is often restricted to a subset of significant metabolites. To suggest potential pathological phenotypes related to overlooked metabolites that lack annotated literature, we extend the "guilt-by-association" principle to literature information by using a Bayesian framework. The underlying assumption is that the literature associated with the metabolic neighbors of a compound can provide valuable insights, or a prior, into its biomedical context. The metabolic neighborhood of a compound can be defined from a metabolic network and corresponds to the metabolites to which it is connected through biochemical reactions. With the proposed approach, we suggest more than 35,000 associations between 1,047 overlooked metabolites and 3,288 diseases (or disease families). All these newly inferred associations are freely available on the FORUM ftp server (see information at https://github.com/eMetaboHUB/Forum-LiteraturePropagation).
Affiliation(s)
- Maxime Delmas
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
- Olivier Filangi
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France
- Christophe Duperier
- Université Clermont Auvergne, INRAE, UNH, Plateforme d’Exploration du Métabolisme, MetaboHUB Clermont, F-63000 Clermont-Ferrand, France
- Nils Paulhe
- Université Clermont Auvergne, INRAE, UNH, Plateforme d’Exploration du Métabolisme, MetaboHUB Clermont, F-63000 Clermont-Ferrand, France
- Florence Vinson
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
- MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, 31300, France
- Pablo Rodriguez-Mier
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
- Franck Giacomoni
- Université Clermont Auvergne, INRAE, UNH, Plateforme d’Exploration du Métabolisme, MetaboHUB Clermont, F-63000 Clermont-Ferrand, France
- Fabien Jourdan
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
- MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, 31300, France
- Clément Frainay
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
10
Zhu Y, Zhang H, Yang Y, Zhang C, Ou-Yang L, Bai L, Deng M, Yi M, Liu S, Wang C. Discovery of pan-cancer related genes via integrative network analysis. Brief Funct Genomics 2022; 21:325-338. [PMID: 35760070 DOI: 10.1093/bfgp/elac012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/01/2022] [Revised: 05/14/2022] [Accepted: 05/25/2022] [Indexed: 01/02/2023]
Abstract
Identification of cancer-related genes is helpful for understanding the pathogenesis of cancer, developing targeted drugs and creating new diagnostic and therapeutic methods. Considering the complexity of the biological laboratory methods, many network-based methods have been proposed to identify cancer-related genes from a global perspective with the increasing availability of high-throughput data. Some studies have focused on the tissue-specific cancer networks. However, cancers from different tissues may share common features, and those methods may ignore the differences and similarities across cancers during model construction. In this work, in order to make full use of global information of the network, we first establish the pan-cancer network via a differential network algorithm, which not only contains heterogeneous data across multiple cancer types but also contains heterogeneous data between tumor samples and normal samples. Second, the node representation vectors are learned by network embedding. In contrast to ranking analysis-based methods, with the help of integrative network analysis, we transform the cancer-related gene identification problem into a binary classification problem. The final results are obtained via ensemble classification. We further applied these methods to the most commonly used gene expression data involving six tissue-specific cancer types. As a result, an integrative pan-cancer network and several biologically meaningful results were obtained. As examples, nine genes were ultimately identified as potential pan-cancer-related genes. Most of these genes have been reported in published studies, thus showing our method's potential for application in identifying driver gene candidates for further biological experimental verification.
Affiliation(s)
- Yuan Zhu
- School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China
- Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence(Fudan University), Ministry of Education, Handan Road, 200433, Shanghai, China
- Houwang Zhang
- Electrical Engineering, City University of HongKong, Kowloon, 999077, HongKong, China
- Yuanhang Yang
- School of Mathematics and Physics, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, The University of Southern Mississippi, Hattiesburg, USA
- Le Ou-Yang
- Guangdong Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, Shenzhen University, Nanhai Avenue, 518060, Shenzhen, China
- Litai Bai
- School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China
- Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
- Minghua Deng
- School of Mathematical Sciences, Peking University, No.5 Yiheyuan Road, 100871, Beijing, China
- Ming Yi
- School of Mathematics and Physics, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Song Liu
- School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China
- Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
- Chao Wang
- Hepatic Surgery Center, Institute of Hepato-Pancreato-Biliary Surgery, Department of Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Jiefang Avenue, 430030, Wuhan, China
11
Law JN, Akers K, Tasnina N, Santina CMD, Deutsch S, Kshirsagar M, Klein-Seetharaman J, Crovella M, Rajagopalan P, Kasif S, Murali TM. Interpretable network propagation with application to expanding the repertoire of human proteins that interact with SARS-CoV-2. Gigascience 2021; 10:giab082. [PMID: 34966926 PMCID: PMC8716363 DOI: 10.1093/gigascience/giab082] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/30/2021] [Revised: 09/21/2021] [Accepted: 11/28/2021] [Indexed: 01/02/2023]
Abstract
BACKGROUND: Network propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction. RESULTS: We design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. CONCLUSIONS: We examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.
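The provenance-tracing idea in this abstract (asking how much each input observation contributes to a prediction's score) can be illustrated for one common propagation scheme. The sketch below assumes a random walk with restart on an unweighted graph, which may differ from the algorithm the authors use; node names, the restart parameter, and the toy path graph are hypothetical. Because the update is linear in the seed vector, propagating each seed separately splits every score into per-seed contributions that sum back to the total.

```python
def propagate(adj, p0, alpha=0.5, iters=200):
    """Random walk with restart: s <- (1 - alpha) * p0 + alpha * W s,
    where W spreads each node's score equally over its neighbors."""
    s = dict(p0)
    for _ in range(iters):
        s = {
            n: (1 - alpha) * p0[n]
            + alpha * sum(s[m] / len(adj[m]) for m in adj[n])
            for n in adj
        }
    return s

def trace_provenance(adj, seeds, alpha=0.5):
    """Return the total scores and the share of each score owed to each seed."""
    share = 1.0 / len(seeds)
    total = propagate(adj, {n: share if n in seeds else 0.0 for n in adj}, alpha)
    per_seed = {
        seed: propagate(adj, {n: share if n == seed else 0.0 for n in adj}, alpha)
        for seed in seeds
    }
    return total, per_seed

# Hypothetical path graph a-b-c-d-e with experimentally supported seeds a and e.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
seeds = ["a", "e"]
total, per_seed = trace_provenance(adj, seeds)

# The seed contributing most to the score of node b.
top_source_for_b = max(seeds, key=lambda s: per_seed[s]["b"])
print(total["b"], top_source_for_b)
```

In this toy graph the largest contribution to node b comes from its direct neighbor a, which mirrors the kind of question the abstract poses about top-ranking predictions.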
Affiliation(s)
- Jeffrey N Law
- Interdisciplinary Ph.D. Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA
- Kyle Akers
- Interdisciplinary Ph.D. Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA
- Nure Tasnina
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Shay Deutsch
- Department of Mathematics, University of California, Los Angeles, CA 90095, USA
- Mark Crovella
- Department of Computer Science, Boston University, Boston, MA 02215, USA
- Simon Kasif
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
12
Demirel HC, Arici MK, Tuncbag N. Computational approaches leveraging integrated connections of multi-omic data toward clinical applications. Mol Omics 2021; 18:7-18. [PMID: 34734935 DOI: 10.1039/d1mo00158b] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/18/2023]
Abstract
In line with the advances in high-throughput technologies, multiple omic datasets have accumulated to study biological systems and diseases coherently. No single omics data type is capable of fully representing cellular activity. The complexity of the biological processes arises from the interactions between omic entities such as genes, proteins, and metabolites. Therefore, multi-omic data integration is crucial but challenging. The impact of the molecular alterations in multi-omic data is not local in the neighborhood of the altered gene or protein; rather, the impact diffuses in the network and changes the functionality of multiple signaling pathways and regulation of the gene expression. Additionally, multi-omic data is high-dimensional and has background noise. Several integrative approaches have been developed to accurately interpret the multi-omic datasets, including machine learning, network-based methods, and their combination. In this review, we overview the most recent integrative approaches and tools with a focus on network-based methods. We then discuss these approaches according to their specific applications, from disease-network and biomarker identification to patient stratification, drug discovery, and repurposing.
Affiliation(s)
- Habibe Cansu Demirel
- Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
- Muslum Kaan Arici
- Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
- Foot and Mouth Diseases Institute, Ministry of Agriculture and Forestry, Ankara, 06044, Turkey
- Nurcan Tuncbag
- Chemical and Biological Engineering, College of Engineering, Koc University, Istanbul, 34450, Turkey
- School of Medicine, Koc University, Istanbul, 34450, Turkey
- Koc University Research Center for Translational Medicine (KUTTAM), Istanbul, Turkey