1
Kumar P, Metzger VT, Purushotham ST, Kedia P, Bologa CG, Lambert CG, Yang JJ. KG2ML: Integrating Knowledge Graphs and Positive Unlabeled Learning for Identifying Disease-Associated Genes. medRxiv 2025:2025.03.17.25323906. [PMID: 40166563 PMCID: PMC11957101 DOI: 10.1101/2025.03.17.25323906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Background: Biomedical knowledge graphs (KGs), such as the Data Distillery Knowledge Graph (DDKG), capture known relationships among entities (e.g., genes, diseases, proteins), providing valuable insights for research. However, these relationships are typically derived from prior studies, leaving potential unknown associations unexplored. Identifying such unknown associations, including previously unknown disease-associated genes, remains a critical challenge in bioinformatics and is crucial for advancing biomedical knowledge. Traditional methods, such as linkage analysis and genome-wide association studies (GWAS), can be time-consuming and resource-intensive. This highlights the need for efficient computational approaches to identify or predict new genes using known disease-gene associations. Recently, network-based methods and KGs, enhanced by advances in machine learning (ML) frameworks, have emerged as promising tools for inferring these unexplored associations. Given the technical limitations of the Neo4j Graph Data Science (GDS) machine learning pipeline, we developed a novel machine learning pipeline called KG2ML (Knowledge Graph to Machine Learning). This pipeline utilizes our Positive and Unlabeled (PU) learning algorithm, PULSNAR (Positive Unlabeled Learning Selected Not At Random), and incorporates path-based feature extraction from ProteinGraphML.
Results: KG2ML was applied to 12 diseases, including Bipolar Disorder, Coronary Artery Disease, and Parkinson's Disease, to infer disease-associated genes not explicitly recorded in DDKG. For several of these diseases, 14 out of the 15 top-ranked genes lacked prior explicit associations in the DDKG but were supported by literature and TINX (Target Importance and Novelty Explorer) evidence. Incorporating PULSNAR-imputed genes as positives enhanced XGBoost classification, demonstrating the potential of PU learning in identifying hidden gene-disease relationships.
Conclusion: The observed improvement in classification performance after the inclusion of PULSNAR-imputed genes as positive examples, along with the subject matter experts' (SME) evaluations of the top 15 imputed genes for 12 diseases, suggests that PU learning can effectively uncover disease-gene associations missing from existing knowledge graphs (KGs). By integrating KG data with ML-based inference, our KG2ML pipeline provides a scalable and interpretable framework to advance biomedical research while addressing the inherent limitations of current KGs.
Affiliation(s)
- Praveen Kumar
  - University of New Mexico (UNM), School of Medicine, Department of Internal Medicine, Translational Informatics Division, Albuquerque, New Mexico, USA
- Vincent T Metzger
  - University of New Mexico (UNM), School of Medicine, Department of Internal Medicine, Translational Informatics Division, Albuquerque, New Mexico, USA
- Swastika T Purushotham
  - University of New Mexico (UNM), School of Medicine, Department of Internal Medicine, Translational Informatics Division, Albuquerque, New Mexico, USA
- Priyansh Kedia
  - University of New Mexico (UNM), School of Medicine, Department of Internal Medicine, Translational Informatics Division, Albuquerque, New Mexico, USA
- Cristian G Bologa
  - University of New Mexico (UNM), School of Medicine, Department of Internal Medicine, Translational Informatics Division, Albuquerque, New Mexico, USA
- Christophe G Lambert
  - University of New Mexico (UNM), School of Medicine, Department of Internal Medicine, Translational Informatics Division, Albuquerque, New Mexico, USA
- Jeremy J Yang
  - University of New Mexico (UNM), School of Medicine, Department of Internal Medicine, Translational Informatics Division, Albuquerque, New Mexico, USA
2
Guo X, Li J, Jiao P, Zhang W, Li T, Wang W. Counterfactual learning for higher-order relation prediction in heterogeneous information networks. Neural Netw 2025; 183:107024. [PMID: 39674095 DOI: 10.1016/j.neunet.2024.107024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Revised: 11/26/2024] [Accepted: 12/04/2024] [Indexed: 12/16/2024]
Abstract
Heterogeneous Information Networks (HINs) play a crucial role in modeling complex social systems, where predicting missing links/relations is a significant task. Existing methods primarily focus on pairwise relations, but real-world scenarios often involve multi-entity interactions. For example, in academic collaboration networks, an interaction occurs between a paper, a conference, and multiple authors. These higher-order relations are prevalent but have been underexplored. Moreover, existing methods often neglect the causal relationship between the global graph structure and the state of relations, limiting their ability to capture the fundamental factors driving relation prediction. In this paper, we propose HINCHOR, an end-to-end model for higher-order relation prediction in HINs. HINCHOR introduces a higher-order structure encoder to capture multi-entity proximity information. Then, it focuses on a counterfactual question: "If the global graph structure were different, would the higher-order relation change?" By presenting a counterfactual data augmentation module, HINCHOR utilizes global structure information to generate counterfactual relations. Through counterfactual learning, HINCHOR estimates causal effects while predicting higher-order relations. The experimental results on four constructed benchmark datasets show that HINCHOR outperforms existing state-of-the-art methods.
Affiliation(s)
- Xuan Guo
  - Tianjin University, Tianjin, 300350, China.
- Jie Li
  - Tianjin University, Tianjin, 300350, China.
- Pengfei Jiao
  - Hangzhou Dianzi University, Hangzhou, 310018, China.
- Wang Zhang
  - Tianjin University, Tianjin, 300350, China.
- Wenjun Wang
  - Tianjin University, Tianjin, 300350, China; Hainan Tropical Ocean University, Sanya, 572022, China; Yazhou Bay Innovation Institute, Hainan Tropical Ocean University, Sanya, 572022, China.
3
Mao S, Liu J. MulitDeepsurv: survival analysis of gastric cancer based on deep learning multimodal fusion models. Biomed Opt Express 2025; 16:126-141. [PMID: 39816158 PMCID: PMC11729289 DOI: 10.1364/boe.541570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2024] [Revised: 11/21/2024] [Accepted: 12/01/2024] [Indexed: 01/18/2025]
Abstract
Gastric cancer is a leading cause of cancer-related deaths globally. As mortality rates continue to rise, predicting cancer survival using multimodal data (including histopathological images, genomic data, and clinical information) has become increasingly crucial. However, extracting effective predictive features from this complex data has posed challenges for survival analysis due to the high dimensionality and heterogeneity of histopathology images and genomic data. Furthermore, existing methods often lack sufficient interaction between intra- and inter-modal features, significantly impacting model performance. To address these challenges, we developed a deep learning-based multimodal feature fusion model, MultiDeepsurv, designed to predict the survival of gastric cancer patients by integrating histopathological images, clinical data, and gene expression data. Our approach includes a two-branch hybrid network, GLFUnet, which leverages the attention mechanism for enhanced pathology image representation learning. Additionally, we employ a graph convolutional neural network (GCN) to extract features from gene expression data and clinical information. To capture the correlations between different modalities, we utilize the SFusion fusion strategy, which employs a self-attention mechanism to learn potential correlations across modalities. Finally, these deeply processed features are fed into Cox regression models for an end-to-end survival analysis. Comprehensive experiments and analyses conducted on a gastric cancer cohort from The Cancer Genome Atlas (TCGA) demonstrate that our proposed MultiDeepsurv model outperforms other methods in terms of prognostic accuracy, with a C-index of 0.806 and an AUC of 0.842.
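The C-index quoted in this abstract measures how often a model ranks pairs of patients in the right order: the patient who dies earlier should receive the higher predicted risk. The following is a minimal sketch of Harrell's concordance index, not the authors' evaluation code. The function name, the toy inputs, and the decision to skip pairs with tied survival times are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (simplified sketch).

    times       -- observed follow-up time per patient
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- predicted risk; higher should mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: tied times are skipped in this sketch
        # 'early' is the patient with the shorter observed time
        early, late = (i, j) if times[i] < times[j] else (j, i)
        if not events[early]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risk_scores[early] > risk_scores[late]:
            concordant += 1.0  # correctly ordered pair
        elif risk_scores[early] == risk_scores[late]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.806 should be read.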
Affiliation(s)
- Songren Mao
  - College of Computer Science and Technology, Taiyuan Normal University, JinZhong 030619, China
- Jie Liu
  - Computer Engineering Department, Taiyuan Institute of Technology, Taiyuan 030008, China
4
Alsaggaf I, Freitas A, Wan C. Predicting the pro-longevity or anti-longevity effect of model organism genes with enhanced Gaussian noise augmentation-based contrastive learning on protein-protein interaction networks. NAR Genom Bioinform 2024; 6:lqae153. [PMID: 39633720 PMCID: PMC11616696 DOI: 10.1093/nargab/lqae153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Revised: 10/23/2024] [Accepted: 10/29/2024] [Indexed: 12/07/2024]
Abstract
Ageing is a highly complex and important biological process that plays major roles in many diseases. Therefore, it is essential to better understand the molecular mechanisms of ageing-related genes. In this work, we proposed a novel enhanced Gaussian noise augmentation-based contrastive learning (EGsCL) framework to predict the pro-longevity or anti-longevity effect of four model organisms' ageing-related genes by exploiting protein-protein interaction (PPI) networks. The experimental results suggest that EGsCL successfully outperformed the conventional Gaussian noise augmentation-based contrastive learning methods and obtained state-of-the-art performance on three model organisms' predictive tasks when merely relying on PPI network data. In addition, we use EGsCL to predict 10 novel pro-/anti-longevity mouse genes and discuss the support for these predictions in the literature.
Affiliation(s)
- Ibrahim Alsaggaf
  - School of Computing and Mathematical Sciences, Birkbeck, University of London, WC1E 7HX, London, UK
- Alex A Freitas
  - School of Computing, University of Kent, CT2 7FS, Canterbury, Kent, UK
- Cen Wan
  - School of Computing and Mathematical Sciences, Birkbeck, University of London, WC1E 7HX, London, UK
5
Zhou Z, Zhang R, Zhou A, Lv J, Chen S, Zou H, Zhang G, Lin T, Wang Z, Zhang Y, Weng S, Han X, Liu Z. Proteomics appending a complementary dimension to precision oncotherapy. Comput Struct Biotechnol J 2024; 23:1725-1739. [PMID: 38689716 PMCID: PMC11058087 DOI: 10.1016/j.csbj.2024.04.044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 04/11/2024] [Accepted: 04/17/2024] [Indexed: 05/02/2024]
Abstract
Recent advances in high-throughput proteomic profiling technologies have facilitated the precise quantification of numerous proteins across multiple specimens concurrently. Researchers have the opportunity to comprehensively analyze the molecular signatures in plentiful medical specimens or disease pattern cell lines. Along with advances in data analysis and integration, proteomics data could be efficiently consolidated and employed to recognize precise elementary molecular mechanisms and decode individual biomarkers, guiding the precision treatment of tumors. Herein, we review a broad array of proteomics technologies and the progress and methods for the integration of proteomics data and further discuss how to better merge proteomics in precision medicine and clinical settings.
Affiliation(s)
- Zhaokai Zhou
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
  - Department of Urology, The First Affiliated Hospital of Zhengzhou University, Henan 450052, China
- Ruiqi Zhang
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Aoyang Zhou
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Jinxiang Lv
  - Department of Gastroenterology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Shuang Chen
  - Center of Reproductive Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Haijiao Zou
  - Center of Reproductive Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Ge Zhang
  - Department of Cardiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Ting Lin
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Zhan Wang
  - Department of Urology, The First Affiliated Hospital of Zhengzhou University, Henan 450052, China
- Yuyuan Zhang
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Siyuan Weng
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Xinwei Han
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
  - Interventional Institute of Zhengzhou University, Zhengzhou, Henan 450052, China
  - Interventional Treatment and Clinical Research Center of Henan Province, Zhengzhou, Henan 450052, China
- Zaoqu Liu
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
  - Interventional Institute of Zhengzhou University, Zhengzhou, Henan 450052, China
  - Interventional Treatment and Clinical Research Center of Henan Province, Zhengzhou, Henan 450052, China
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
6
Huang Y, Zhang H, Lin Z, Wei Y, Xi W. RevGraphVAMP: A protein molecular simulation analysis model combining graph convolutional neural networks and physical constraints. Methods 2024; 229:163-174. [PMID: 38972499 DOI: 10.1016/j.ymeth.2024.06.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Revised: 06/19/2024] [Accepted: 06/24/2024] [Indexed: 07/09/2024]
Abstract
Molecular dynamics (MD) simulation is a crucial research domain within the life sciences, focusing on comprehending the mechanisms of biomolecular interactions at atomic scales. Protein simulation, a critical subfield, often relies on MD, and its trajectory data play a pivotal role in drug discovery. With advances in high-performance computing and deep learning, predicting protein properties from vast trajectory data has become popular and critical, but it poses challenges in extracting features from complicated simulation data and in dimensionality reduction. At the same time, it is essential to provide a meaningful explanation of the biological mechanism behind the dimensionality reduction. To tackle these challenges, we propose a new unsupervised model named RevGraphVAMP to intelligently analyze the simulation trajectory. This model is based on the variational approach for Markov processes (VAMP) and integrates graph convolutional neural networks and physical constraint optimization to enhance the learning performance. Additionally, we introduce an attention mechanism to assess the importance of key interaction regions, facilitating the interpretation of the molecular mechanism. In comparison to other VAMPNets models, our model shows competitive performance and improved accuracy in state transition prediction, as demonstrated through its application to two public datasets and the Shank3-Rap1 complex, which is associated with autism spectrum disorder. Moreover, it enhances dimensionality reduction discrimination across different substates and provides interpretable results for protein structural characterization.
Affiliation(s)
- Ying Huang
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Huiling Zhang
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; College of Mathematics and Informatics, South China Agricultural University, Guangzhou, 510642, China
- Zhenli Lin
  - Department of Ophthalmology, Shenzhen University General Hospital, Shenzhen 518055, China
- Yanjie Wei
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen 518107, China.
- Wenhui Xi
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen 518107, China.
7
Saarinen H, Goldsmith M, Wang RS, Loscalzo J, Maniscalco S. Disease gene prioritization with quantum walks. Bioinformatics 2024; 40:btae513. [PMID: 39171848 PMCID: PMC11361815 DOI: 10.1093/bioinformatics/btae513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 06/23/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024]
Abstract
MOTIVATION: Disease gene prioritization methods assign scores to genes or proteins according to their likely relevance for a given disease based on a provided set of seed genes. This scoring can be used to find new biologically relevant genes or proteins for many diseases. Although methods based on classical random walks have proven to yield competitive results, quantum walk methods have not been explored to this end.
RESULTS: We propose a new algorithm for disease gene prioritization based on continuous-time quantum walks using the adjacency matrix of a protein-protein interaction (PPI) network. We demonstrate the success of our proposed quantum walk method by comparing it to several well-known gene prioritization methods on three disease sets, across seven different PPI networks. In order to compare these methods, we use cross-validation and examine the mean reciprocal ranks of recall and average precision values. We further validate our method by performing an enrichment analysis of the predicted genes for coronary artery disease.
AVAILABILITY AND IMPLEMENTATION: The data and code for the methods can be accessed at https://github.com/markgolds/qdgp.
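The evaluation described in this abstract rests on rank-based metrics. The abstract does not spell out the exact protocol, so the sketch below only illustrates the generic definitions of reciprocal rank, average precision, and a mean reciprocal rank over cross-validation folds. The function names and the gene identifiers (`G1`, `G2`, ...) are hypothetical and are not taken from the linked repository.

```python
def reciprocal_rank(ranked_genes, held_out):
    """1 / position of the first held-out gene in the ranking (0.0 if none appears)."""
    for position, gene in enumerate(ranked_genes, start=1):
        if gene in held_out:
            return 1.0 / position
    return 0.0


def average_precision(ranked_genes, held_out):
    """Mean of precision@k taken at each position k where a held-out gene is found."""
    if not held_out:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, gene in enumerate(ranked_genes, start=1):
        if gene in held_out:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(held_out)


def mean_reciprocal_rank(folds):
    """Average reciprocal rank over (ranking, held-out set) pairs, one per fold."""
    return sum(reciprocal_rank(r, h) for r, h in folds) / len(folds)


# Toy example with hypothetical gene identifiers.
ranking = ["G3", "G1", "G7", "G2"]
held_out = {"G1", "G2"}
rr = reciprocal_rank(ranking, held_out)    # first hit at position 2
ap = average_precision(ranking, held_out)  # hits at positions 2 and 4
```

In a cross-validation setting, each fold hides a subset of known disease genes, the method ranks all remaining genes from the visible seeds, and these metrics score how highly the hidden genes are ranked.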
Affiliation(s)
- Harto Saarinen
  - Algorithmiq Ltd, FI-00160 Helsinki, Finland
  - Department of Mathematics and Statistics, Complex Systems Research Group, University of Turku, FI-20014, Turku, Finland
- Mark Goldsmith
  - Algorithmiq Ltd, FI-00160 Helsinki, Finland
  - Department of Mathematics and Statistics, Complex Systems Research Group, University of Turku, FI-20014, Turku, Finland
- Rui-Sheng Wang
  - Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, United States
- Joseph Loscalzo
  - Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, United States
8
He X, Zhao L, Huang B, Zhang G, Lu Y, Mi D, Sun Y. Integrated analysis of miRNAome and transcriptome reveals that microgravity induces the alterations of critical functional gene modules via the regulation of miRNAs in short-term space-flown C. elegans. Life Sci Space Res 2024; 42:117-132. [PMID: 39067983 DOI: 10.1016/j.lssr.2024.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 06/11/2024] [Accepted: 07/03/2024] [Indexed: 07/30/2024]
Abstract
Microgravity, as a unique hazardous factor encountered in space, can induce a series of harmful effects on living organisms. The impact of microgravity on the pivotal functional gene modules stemming from gene enrichment analysis via the regulation of miRNAs is not fully illustrated. To explore the microgravity-induced alterations in critical functional gene modules via the regulation of miRNAs, in the present study, we proposed a novel bioinformatics algorithm for the integrated analysis of miRNAome and transcriptome from short-term space-flown C. elegans. The samples of C. elegans were exposed to two space conditions, namely spaceflight (SF) and spaceflight control (SC) onboard the International Space Station for 4 days. Additionally, the samples of ground control (GC) were included for comparative analysis. Using the present algorithm, we constructed regulatory networks of functional gene modules annotated from differentially expressed genes (DEGs) and their associated regulatory differentially expressed miRNAs (DEmiRNAs). The results showed that functional gene modules of molting cycle, defense response, fatty acid metabolism, lysosome, and longevity regulating pathway were facilitated by 25 down-regulated DEmiRNAs (e.g., cel-miR-792, cel-miR-65, cel-miR-70, cel-lsy-6, cel-miR-796, etc.) in the SC vs. GC groups, whereas these modules were inhibited by 13 up-regulated DEmiRNAs (e.g., cel-miR-74, cel-miR-229, cel-miR-70, cel-miR-249, cel-miR-85, etc.) in the SF vs. GC groups. These findings indicated that microgravity could significantly alter gene expression patterns and their associated functional gene modules in short-term space-flown C. elegans. Additionally, we identified 34 miRNAs as post-transcriptional regulators that modulated these functional gene modules under microgravity conditions. 
Through the experimental verification, our results demonstrated that microgravity could induce the down-regulation of five critical functional gene modules (i.e., molting cycle, defense response, fatty acid metabolism, lysosome, and longevity regulating pathways) via the regulation of miRNAs in short-term space-flown C. elegans.
Affiliation(s)
- Xinye He
  - Institute of Environmental Systems Biology, College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, Liaoning, PR China
- Lei Zhao
  - Institute of Environmental Systems Biology, College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, Liaoning, PR China.
- Baohang Huang
  - Institute of Environmental Systems Biology, College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, Liaoning, PR China
- Ge Zhang
  - Institute of Environmental Systems Biology, College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, Liaoning, PR China
- Ye Lu
  - Institute of Environmental Systems Biology, College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, Liaoning, PR China
- Dong Mi
  - College of Science, Dalian Maritime University, Dalian, 116026, Liaoning, PR China
- Yeqing Sun
  - Institute of Environmental Systems Biology, College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, Liaoning, PR China.
9
Sajid S, Mashkoor M, Jørgensen MG, Christensen LP, Hansen PR, Franzyk H, Mirza O, Prabhala BK. The Y-ome Conundrum: Insights into Uncharacterized Genes and Approaches for Functional Annotation. Mol Cell Biochem 2024; 479:1957-1968. [PMID: 37610616 DOI: 10.1007/s11010-023-04827-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 08/09/2023] [Indexed: 08/24/2023]
Abstract
The ever-increasing availability of genome sequencing data has revealed a substantial number of uncharacterized genes without known functions across various organisms. The first comprehensive genome sequencing of E. coli K12 revealed that more than 50% of its open reading frames corresponded to transcripts with no known functions. The group of protein-coding genes without a functional description and/or a recognized pathway, beginning with the letter "Y", is classified as the "y-ome". Several efforts have been made to elucidate the functions of these genes and to recognize their role in biological processes. This review provides a brief update on various strategies employed when studying the y-ome, such as high-throughput experimental approaches, comparative omics, metabolic engineering, gene expression analysis, and data integration techniques. Additionally, we highlight recent advancements in functional annotation methods, including the use of machine learning, network analysis, and functional genomics approaches. Novel approaches are required to produce more precise functional annotations across the genome to reduce the number of genes with unknown functions.
Affiliation(s)
- Salvia Sajid
  - Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100, Copenhagen Ø, Denmark
  - Department of Physics, Chemistry, and Pharmacy, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark
- Maliha Mashkoor
  - Department of Surgery, Center for Surgical Sciences, Zealand University Hospital, Lykkebækvej 1, 4600, Køge, Denmark
- Mikkel Girke Jørgensen
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark
- Lars Porskjær Christensen
  - Department of Physics, Chemistry, and Pharmacy, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark
- Paul Robert Hansen
  - Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100, Copenhagen Ø, Denmark
- Henrik Franzyk
  - Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100, Copenhagen Ø, Denmark
- Osman Mirza
  - Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100, Copenhagen Ø, Denmark
- Bala Krishna Prabhala
  - Department of Physics, Chemistry, and Pharmacy, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark.
10
Giudice G, Chen H, Koutsandreas T, Petsalaki E. phuEGO: A Network-Based Method to Reconstruct Active Signaling Pathways From Phosphoproteomics Datasets. Mol Cell Proteomics 2024; 23:100771. [PMID: 38642805 PMCID: PMC11134849 DOI: 10.1016/j.mcpro.2024.100771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 04/08/2024] [Accepted: 04/17/2024] [Indexed: 04/22/2024]
Abstract
Signaling networks are critical for virtually all cell functions. Our current knowledge of cell signaling has been summarized in signaling pathway databases, which, while useful, are highly biased toward well-studied processes and do not capture context-specific network wiring or pathway cross-talk. Mass spectrometry-based phosphoproteomics data can provide a more unbiased view of active cell signaling processes in a given context; however, it suffers from a low signal-to-noise ratio and poor reproducibility across experiments. While progress in methods to extract active signaling signatures from such data has been made, there are still limitations with respect to balancing bias and interpretability. Here we present phuEGO, which combines up-to-three-layer network propagation with ego network decomposition to provide small networks comprising active functional signaling modules. PhuEGO boosts the signal-to-noise ratio from global phosphoproteomics datasets, enriches the resulting networks for functional phosphosites, and allows improved comparison and integration across datasets. We applied phuEGO to five phosphoproteomics datasets from cell lines collected upon infection with SARS-CoV-2. PhuEGO was better able to identify common active functions across datasets and to point to a subnetwork enriched for known COVID-19 targets. Overall, phuEGO provides a flexible tool to the community for the improved functional interpretation of global phosphoproteomics datasets.
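The two building blocks named in this abstract, network propagation and ego network decomposition, can be illustrated on a toy graph. The sketch below uses a generic random walk with restart and a breadth-first ego network; it is not phuEGO's implementation and does not reproduce its multi-layer scheme. The function names, the restart probability of 0.5, and the four-node path graph are assumptions chosen for illustration.

```python
from collections import deque


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate seed signal over an undirected graph given as {node: set(neighbours)}."""
    nodes = list(adj)
    # Restart distribution: uniform over the seed nodes.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(max_iter):
        updated = {n: restart * start[n] for n in nodes}
        for u in nodes:
            neighbours = adj[u]
            if not neighbours:
                # Isolated node: keep its walking mass in place.
                updated[u] += (1.0 - restart) * scores[u]
                continue
            share = (1.0 - restart) * scores[u] / len(neighbours)
            for v in neighbours:
                updated[v] += share
        change = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if change < tol:
            break
    return scores


def ego_network(adj, center, radius=1):
    """Subgraph induced by all nodes within `radius` hops of `center`."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if distance[u] == radius:
            continue  # do not expand beyond the requested radius
        for v in adj[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return {u: {v for v in adj[u] if v in distance} for u in distance}


# Toy path graph A - B - C - D with hypothetical node names.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(graph, seeds={"A"})
module = ego_network(graph, center="B", radius=1)
```

On the toy graph the propagated score decays with distance from the seed, and the ego network of `B` keeps only `A`, `B`, and `C`, which is the kind of small, locally focused module the abstract describes.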
Affiliation(s)
- Girolamo Giudice
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Haoqi Chen
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Thodoris Koutsandreas
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Evangelia Petsalaki
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom.
11
Liu C, Xiao K, Yu C, Lei Y, Lyu K, Tian T, Zhao D, Zhou F, Tang H, Zeng J. A probabilistic knowledge graph for target identification. PLoS Comput Biol 2024; 20:e1011945. [PMID: 38578805 PMCID: PMC11034645 DOI: 10.1371/journal.pcbi.1011945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Revised: 04/22/2024] [Accepted: 02/24/2024] [Indexed: 04/07/2024]
Abstract
Early identification of safe and efficacious disease targets is crucial to alleviating the tremendous cost of drug discovery projects. However, existing experimental methods for identifying new targets are generally labor-intensive and failure-prone. On the other hand, computational approaches, especially machine learning-based frameworks, have shown remarkable application potential in drug discovery. In this work, we propose Progeni, a novel machine learning-based framework for target identification. In addition to fully exploiting the known heterogeneous biological networks from various sources, Progeni integrates literature evidence about the relations between biological entities to construct a probabilistic knowledge graph. Graph neural networks are then employed in Progeni to learn the feature embeddings of biological entities to facilitate the identification of biologically relevant target candidates. A comprehensive evaluation of Progeni demonstrated its superior predictive power over the baseline methods on the target identification task. In addition, our extensive tests showed that Progeni exhibited high robustness to the negative effect of exposure bias, a common phenomenon in recommendation systems, and effectively identified new targets that can be strongly supported by the literature. Moreover, our wet lab experiments successfully validated the biological significance of the top target candidates predicted by Progeni for melanoma and colorectal cancer. All these results suggested that Progeni can identify biologically effective targets and thus provide a powerful and useful tool for advancing the drug discovery process.
Affiliation(s)
- Chang Liu
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Kaimin Xiao
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
  - Joint Graduate Program of Peking-Tsinghua-NIBS, School of Life Sciences, Tsinghua University, Beijing, China
- Cuinan Yu
  - Machine Learning Department, Silexon AI Technology Co., Ltd., Nanjing, Jiangsu Province, China
- Yipin Lei
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Kangbo Lyu
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Tingzhong Tian
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Dan Zhao
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Fengfeng Zhou
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, Jilin Province, China
- Haidong Tang
  - School of Pharmaceutical Sciences, Tsinghua University, Beijing, China
- Jianyang Zeng
  - School of Engineering, Westlake University, Hangzhou, China
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
  - Research Center for Industries of the Future and School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China
12
Ratajczak F, Joblin M, Hildebrandt M, Ringsquandl M, Falter-Braun P, Heinig M. Speos: an ensemble graph representation learning framework to predict core gene candidates for complex diseases. Nat Commun 2023; 14:7206. [PMID: 37938585 PMCID: PMC10632370 DOI: 10.1038/s41467-023-42975-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/27/2023] [Accepted: 10/27/2023] [Indexed: 11/09/2023] Open
Abstract
Understanding phenotype-to-genotype relationships is a grand challenge of 21st century biology with translational implications. The recently proposed "omnigenic" model postulates that effects of genetic variation on traits are mediated by core-genes and -proteins whose activities mechanistically influence the phenotype, whereas peripheral genes encode a regulatory network that indirectly affects phenotypes via core gene products. Here, we develop a positive-unlabeled graph representation-learning ensemble-approach based on a nested cross-validation to predict core-like genes for diverse diseases using Mendelian disorder genes for training. Employing mouse knockout phenotypes for external validations, we demonstrate that core-like genes display several key properties of core genes: Mouse knockouts of genes corresponding to our most confident predictions give rise to relevant mouse phenotypes at rates on par with the Mendelian disorder genes, and all candidates exhibit core gene properties like transcriptional deregulation in disease and loss-of-function intolerance. Moreover, as predicted for core genes, our candidates are enriched for drug targets and druggable proteins. In contrast to Mendelian disorder genes the new core-like genes are enriched for druggable yet untargeted gene products, which are therefore attractive targets for drug development. Interpretation of the underlying deep learning model suggests plausible explanations for our core gene predictions in form of molecular mechanisms and physical interactions. Our results demonstrate the potential of graph representation learning for the interpretation of biological complexity and pave the way for studying core gene properties and future drug development.
Affiliation(s)
- Florin Ratajczak
  - Institute of Network Biology (INET), Molecular Targets and Therapeutics Center (MTTC), Helmholtz Munich, Neuherberg, Germany
- Pascal Falter-Braun
  - Institute of Network Biology (INET), Molecular Targets and Therapeutics Center (MTTC), Helmholtz Munich, Neuherberg, Germany
  - Microbe-Host Interactions, Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Matthias Heinig
  - Institute of Computational Biology (ICB), Helmholtz Munich, Neuherberg, Germany
  - Department of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - German Centre for Cardiovascular Research (DZHK), Munich Heart Association, Partner Site Munich, Berlin, Germany
13
Shin W, Kutmon M, Mina E, van Amelsvoort T, Evelo CT, Ehrhart F. Exploring pathway interactions to detect molecular mechanisms of disease: 22q11.2 deletion syndrome. Orphanet J Rare Dis 2023; 18:335. [PMID: 37872602 PMCID: PMC10594698 DOI: 10.1186/s13023-023-02953-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 10/10/2023] [Indexed: 10/25/2023] Open
Abstract
BACKGROUND 22q11.2 Deletion Syndrome (22q11DS) is a genetic disorder characterized by the deletion of adjacent genes at a location specified as q11.2 of chromosome 22, resulting in an array of clinical phenotypes including autistic spectrum disorder, schizophrenia, congenital heart defects, and immune deficiency. Many characteristics of the disorder are known, such as the phenotypic variability of the disease and the biological processes associated with it; however, the exact and systemic molecular mechanisms between the deleted area and its resulting clinical phenotypic expression, for example that of neuropsychiatric diseases, are not yet fully understood. RESULTS Using previously published transcriptomics data (GEO:GSE59216), we constructed two datasets: one set compares 22q11DS patients experiencing neuropsychiatric diseases versus healthy controls, and the other set 22q11DS patients without neuropsychiatric diseases versus healthy controls. We modified and applied the pathway interaction method, originally proposed by Kelder et al. (2011), on a network created using the WikiPathways pathway repository and the STRING protein-protein interaction database. We identified genes and biological processes that were exclusively associated with the development of neuropsychiatric diseases among the 22q11DS patients. Compared with the 22q11DS patients without neuropsychiatric diseases, patients experiencing neuropsychiatric diseases showed significant overrepresentation of regulated genes involving the natural killer cell function and the PI3K/Akt signalling pathway, with affected genes being closely associated with downregulation of CRK like proto-oncogene adaptor protein. Both the pathway interaction and the pathway overrepresentation analysis observed the disruption of the same biological processes, even though the exact lists of genes collected by the two methods were different. 
CONCLUSIONS Using the pathway interaction method, we were able to detect a molecular network that could possibly explain the development of neuropsychiatric diseases among the 22q11DS patients. This way, our method was able to complement the pathway overrepresentation analysis, by filling the knowledge gaps on how the affected pathways are linked to the original deletion on chromosome 22. We expect our pathway interaction method could be used for problems with similar contexts, where complex genetic mechanisms need to be identified to explain the resulting phenotypic plasticity.
Affiliation(s)
- Woosub Shin
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
- Martina Kutmon
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Eleni Mina
  - Leiden University, Leiden, The Netherlands
- Chris T Evelo
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Friederike Ehrhart
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6229 ER, The Netherlands
  - Psychiatry & Neuropsychology, MHeNs, Maastricht University, Maastricht, The Netherlands
14
Mohseni Behbahani Y, Saighi P, Corsi F, Laine E, Carbone A. LEVELNET to visualize, explore, and compare protein-protein interaction networks. Proteomics 2023; 23:e2200159. [PMID: 37403279 DOI: 10.1002/pmic.202200159] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/16/2022] [Revised: 04/27/2023] [Accepted: 04/28/2023] [Indexed: 07/06/2023]
Abstract
Physical interactions between proteins are central to all biological processes. Yet, the current knowledge of who interacts with whom in the cell and in what manner relies on partial, noisy, and highly heterogeneous data. Thus, there is a need for methods comprehensively describing and organizing such data. LEVELNET is a versatile and interactive tool for visualizing, exploring, and comparing protein-protein interaction (PPI) networks inferred from different types of evidence. LEVELNET helps to break down the complexity of PPI networks by representing them as multi-layered graphs and by facilitating the direct comparison of their subnetworks toward biological interpretation. It focuses primarily on the protein chains whose 3D structures are available in the Protein Data Bank. We showcase some potential applications, such as investigating the structural evidence supporting PPIs associated to specific biological processes, assessing the co-localization of interaction partners, comparing the PPI networks obtained through computational experiments versus homology transfer, and creating PPI benchmarks with desired properties.
Affiliation(s)
- Yasser Mohseni Behbahani
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
- Paul Saighi
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
- Flavia Corsi
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
- Elodie Laine
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
- Alessandra Carbone
  - Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
15
Tian L, Yu T. An integrated deep learning framework for the interpretation of untargeted metabolomics data. Brief Bioinform 2023; 24:bbad244. [PMID: 37369636 DOI: 10.1093/bib/bbad244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 06/02/2023] [Accepted: 06/12/2023] [Indexed: 06/29/2023] Open
Abstract
Untargeted metabolomics is gaining widespread applications. The key aspects of the data analysis include modeling complex activities of the metabolic network, selecting metabolites associated with clinical outcome and finding critical metabolic pathways to reveal biological mechanisms. One key roadblock in data analysis remains poorly addressed: the matching uncertainty between data features and known metabolites. Given the limitations of the experimental technology, the identities of data features cannot be directly revealed in the data. The predominant approach for mapping features to metabolites is to match the mass-to-charge ratio (m/z) of data features to those derived from theoretical values of known metabolites. The relationship between features and metabolites is not one-to-one since some metabolites share molecular composition, and various adduct ions can be derived from the same metabolite. This matching uncertainty causes unreliable metabolite selection and functional analysis results. Here we introduce an integrated deep learning framework for metabolomics data that takes matching uncertainty into consideration. The model is devised with a gradual sparsification neural network based on the known metabolic network and the annotation relationship between features and metabolites. This architecture characterizes metabolomics data and reflects the modular structure of biological systems. Three goals can be achieved simultaneously without requiring much complex inference or additional assumptions: (1) evaluate metabolite importance, (2) infer feature-metabolite matching likelihood and (3) select disease sub-networks. When applied to a COVID metabolomics dataset and an aging mouse brain dataset, our method found metabolic sub-networks that were easily interpretable.
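The m/z matching step described in this abstract, and why it is not one-to-one, can be illustrated with a minimal sketch. This is not the authors' framework: the function name `candidate_matches`, the 10 ppm tolerance, the two adducts, and the three-metabolite table are assumptions chosen for illustration, and the monoisotopic masses are approximate.

```python
# Approximate mass shifts (Da) for two common positive-mode adducts (illustrative subset).
ADDUCTS = {"[M+H]+": 1.007276, "[M+Na]+": 22.989218}

# Approximate monoisotopic masses (Da). Leucine and isoleucine share the
# composition C6H13NO2, so they cannot be told apart by m/z alone.
METABOLITES = {
    "glucose": 180.063388,
    "leucine": 131.094629,
    "isoleucine": 131.094629,
}


def candidate_matches(feature_mz, ppm=10.0):
    """Return every (metabolite, adduct) pair whose theoretical m/z lies
    within `ppm` parts per million of the observed feature m/z."""
    hits = []
    for name, mass in METABOLITES.items():
        for adduct, shift in ADDUCTS.items():
            theoretical = mass + shift
            error_ppm = abs(feature_mz - theoretical) / theoretical * 1e6
            if error_ppm <= ppm:
                hits.append((name, adduct))
    return hits


# One feature, two equally plausible metabolites: the matching uncertainty.
ambiguous = candidate_matches(132.1019)
# A feature explained by a sodium adduct rather than the protonated ion.
sodium_hit = candidate_matches(203.0526)
```

In this toy table the first feature matches both leucine and isoleucine, which is the kind of ambiguity the abstract says makes downstream metabolite selection unreliable.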
Affiliation(s)
- Leqi Tian
  - School of Data Science, The Chinese University of Hong Kong - Shenzhen, Guangdong, China
  - Shenzhen Research Institute of Big Data, Guangdong, China
- Tianwei Yu
  - School of Data Science, The Chinese University of Hong Kong - Shenzhen, Guangdong, China
  - Shenzhen Research Institute of Big Data, Guangdong, China
  - Guangdong Provincial Key Laboratory of Big Data Computing, Guangdong, China
16
Tian L, Wu W, Yu T. Graph Random Forest: A Graph Embedded Algorithm for Identifying Highly Connected Important Features. Biomolecules 2023; 13:1153. [PMID: 37509188 PMCID: PMC10377046 DOI: 10.3390/biom13071153] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/14/2023] [Revised: 06/26/2023] [Accepted: 06/30/2023] [Indexed: 07/30/2023] Open
Abstract
Random Forest (RF) is a widely used machine learning method with good performance on classification and regression tasks. It works well under low sample size situations, which benefits applications in the field of biology. For example, gene expression data often involve much larger numbers of features (p) compared to the size of samples (n). Though the predictive accuracy using RF is often high, there are some problems when selecting important genes using RF. The important genes selected by RF are usually scattered on the gene network, which conflicts with the biological assumption of functional consistency between effective features. To improve feature selection by incorporating external topological information between genes, we propose the Graph Random Forest (GRF) for identifying highly connected important features by involving the known biological network when constructing the forest. The algorithm can identify effective features that form highly connected sub-graphs and achieve equivalent classification accuracy to RF. To evaluate the capability of our proposed method, we conducted simulation experiments and applied the method to two real datasets: non-small cell lung cancer RNA-seq data from The Cancer Genome Atlas and a human embryonic stem cell RNA-seq dataset (GSE93593). The resulting high classification accuracy, connectivity of selected sub-graphs, and interpretable feature selection results suggest the method is a helpful addition to graph-based classification models and feature selection procedures.
Affiliation(s)
- Leqi Tian
  - School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
  - Shenzhen Research Institute of Big Data, Shenzhen 518172, China
- Wenbin Wu
  - School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Tianwei Yu
  - School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
  - Shenzhen Research Institute of Big Data, Shenzhen 518172, China
  - Guangdong Provincial Key Laboratory of Big Data Computing, Shenzhen 518172, China
17
Fasano M, Alberio T. Neurodegenerative disorders: From clinicopathology convergence to systems biology divergence. Handbook of Clinical Neurology 2023; 192:73-86. [PMID: 36796949 DOI: 10.1016/b978-0-323-85538-9.00007-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/16/2023]
Abstract
Neurodegenerative diseases are multifactorial. This means that several genetic, epigenetic, and environmental factors contribute to their emergence. Therefore, for the future management of these highly prevalent diseases, it is necessary to change perspective. If a holistic viewpoint is assumed, the phenotype (the clinicopathological convergence) emerges from the perturbation of a complex system of functional interactions among proteins (systems biology divergence). The systems biology top-down approach starts with the unbiased collection of sets of data generated through one or more -omics techniques and has the aim to identify the networks and the components that participate in the generation of a phenotype (disease), often without any available a priori knowledge. The principle behind the top-down method is that the molecular components that respond similarly to experimental perturbations are somehow functionally related. This allows the study of complex and relatively poorly characterized diseases without requiring extensive knowledge of the processes under investigation. In this chapter, the use of a global approach will be applied to the comprehension of neurodegeneration, with a particular focus on the two most prevalent ones, Alzheimer's and Parkinson's diseases. The final purpose is to distinguish disease subtypes (even with similar clinical manifestations) to launch a future of precision medicine for patients with these disorders.
Affiliation(s)
- Mauro Fasano
  - Department of Science and High Technology, University of Insubria, Busto Arsizio and Como, Italy; Center of Neuroscience, University of Insubria, Busto Arsizio and Como, Italy
- Tiziana Alberio
  - Department of Science and High Technology, University of Insubria, Busto Arsizio and Como, Italy; Center of Neuroscience, University of Insubria, Busto Arsizio and Como, Italy
18
Jagodnik KM, Shvili Y, Bartal A. HetIG-PreDiG: A Heterogeneous Integrated Graph Model for Predicting Human Disease Genes based on gene expression. PLoS One 2023; 18:e0280839. [PMID: 36791052 PMCID: PMC9931161 DOI: 10.1371/journal.pone.0280839] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/12/2022] [Accepted: 01/10/2023] [Indexed: 02/16/2023] Open
Abstract
Graph analytical approaches permit identifying novel genes involved in complex diseases, but are limited by (i) inferring structural network similarity of connected gene nodes, ignoring potentially relevant unconnected nodes; (ii) using homogeneous graphs, missing gene-disease associations' complexity; (iii) relying on disease/gene-phenotype associations' similarities, involving highly incomplete data; (iv) using binary classification, with gene-disease edges as positive training samples, and non-associated gene and disease nodes as negative samples that may include currently unknown disease genes; or (v) reporting predicted novel associations without systematically evaluating their accuracy. Addressing these limitations, we develop the Heterogeneous Integrated Graph for Predicting Disease Genes (HetIG-PreDiG) model that includes gene-gene, gene-disease, and gene-tissue associations. We predict novel disease genes using low-dimensional representation of nodes accounting for network structure, and extending beyond network structure using the developed Gene-Disease Prioritization Score (GDPS) reflecting the degree of gene-disease association via gene co-expression data. For negative training samples, we select non-associated gene and disease nodes with lower GDPS that are less likely to be affiliated. We evaluate the developed model's success in predicting novel disease genes by analyzing the prediction probabilities of gene-disease associations. HetIG-PreDiG successfully predicts (Micro-F1 = 0.95) gene-disease associations, outperforming baseline models, and is validated using published literature, thus advancing our understanding of complex genetic diseases.
Affiliation(s)
- Kathleen M. Jagodnik
  - The School of Business Administration, Bar-Ilan University, Ramat Gan, Israel
  - Department of Psychiatry, Harvard Medical School, Boston, MA, United States of America
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States of America
- Yael Shvili
  - Department of Surgery A, Meir Medical Center, Kfar Sava, Israel
- Alon Bartal
  - The School of Business Administration, Bar-Ilan University, Ramat Gan, Israel
19
Sharma A, Patil SS, Muthu MS, Venkatesan V, Kirubakaran R, Nuvvula S, Arockiam S. Single nucleotide polymorphisms of enamel formation genes and early childhood caries - systematic review, gene-based, gene cluster and meta-analysis. J Indian Soc Pedod Prev Dent 2023; 41:3-15. [PMID: 37282406 DOI: 10.4103/jisppd.jisppd_78_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023] Open
Abstract
Introduction: Genetic polymorphisms of genes regulating amelogenesis can alter susceptibility to Early Childhood Caries (ECC). This systematic review aims to analyze associations between single-nucleotide polymorphisms of enamel formation genes and ECC.
Methods: A search was conducted across the PUBMED, CINAHL, LILACS, SCOPUS, EMBASE, Web of Science, and Genome-Wide Association Studies databases from January 2003 to September 2022. This was supplemented by a hand search. In total, 7124 articles were identified, and the 21 articles that satisfied the inclusion criteria proceeded to data extraction. Quality assessment was done using the Q-Genie tool.
Results: Quantitative synthesis revealed that the homozygous genotype AA of rs12640848 was significantly more frequent in children with ECC, with an odds ratio of 2.36. Gene-based analysis revealed a significant association between ECC and six variants of AMBN, four variants of KLK4, two variants of MMP20, and a single variant each of the MMP9 and MMP13 genes. The Bonferroni-corrected -log10 P value of the amelogenesis gene cluster was 2.25 (0.05/88 = 5.6 × 10⁻⁴). A Search Tool for Retrieval of Interacting Genes and Proteins (STRING) plot, constructed to examine the protein-protein interactions, revealed the presence of four functional clusters. Gene function prediction using the Multiple Association Network Integration Algorithm revealed that physical interaction between these genes was 69.3%.
Conclusion: Polymorphisms of genes regulating amelogenesis can influence the susceptibility to ECC. The AA genotype of rs12640848 may increase the susceptibility to ECC. Gene-based analysis revealed a significant association between multiple polymorphisms of genes regulating amelogenesis and ECC.
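The Bonferroni arithmetic quoted in this abstract (0.05 divided by 88 tests) can be reproduced with a short sketch. Only the two numbers 0.05 and 88 come from the abstract; the variable names and the helper `is_significant` are hypothetical, and the sketch does not reproduce the study's gene-cluster analysis.

```python
import math

alpha = 0.05   # family-wise significance level quoted in the abstract
n_tests = 88   # number of tests quoted in the abstract

# Bonferroni-adjusted per-test threshold: alpha divided by the number of tests.
threshold = alpha / n_tests

# The same threshold on the -log10 scale, as often used when reporting P values.
neg_log10_threshold = -math.log10(threshold)


def is_significant(p_value):
    """True if a single-test P value passes the Bonferroni-adjusted threshold."""
    return p_value < threshold


print(f"threshold = {threshold:.2e}, -log10 = {neg_log10_threshold:.2f}")
```

The computed threshold is about 5.68 × 10⁻⁴, which the abstract reports truncated as 5.6 × 10⁻⁴.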
Affiliation(s)
- Aruna Sharma
  - Department of Pediatric and Preventive Dentistry, Centre for Early Childhood Caries and Research, Sri Ramachandra Institute of Higher Education and Research, Chennai, Tamil Nadu; Department of Pediatric and Preventive Dentistry, Indira Gandhi Institute of Dental Sciences, Sri Balaji Vidyapeeth, Puducherry, India
- Sneha S Patil
  - Department of Environmental Health Engineering, Faculty of Public Health, Sri Ramachandra Institute of Higher Education and Research, Chennai, Tamil Nadu; Department of Pediatric and Preventive Dentistry, Dr. D.Y. Patil Dental College and Hospital, Dr. D.Y. Patil Vidyapeeth, Pune, Maharashtra, India
- M S Muthu
  - Department of Pediatric and Preventive Dentistry, Centre for Early Childhood Caries and Research, Sri Ramachandra Institute of Higher Education and Research, Chennai, Tamil Nadu, India; Centre of Medical and Bio-Allied Health Sciences Research, Ajman University, Ajman, United Arab Emirates
- Vettriselvi Venkatesan
  - Department of Human Genetics, Sri Ramachandra Institute of Higher Education and Research, Chennai, Tamil Nadu, India
- Richard Kirubakaran
  - South Asian Cochrane Network and Centre, Christian Medical College, Vellore, Tamil Nadu, India
- Sivakumar Nuvvula
  - Department of Paediatric and Preventive Dentistry, Narayana Dental College and Hospital, Nellore, Andhra Pradesh, India
- Selva Arockiam
  - Department of Orthodontics, Meenakshi Ammal Dental College and Hospital, Chennai, Tamil Nadu, India
20
Abstract
Thousands of genes are perturbed by cancer, and these disturbances can be seen in transcriptome, methylation, somatic mutation, and copy number variation omics studies. Understanding their connectivity patterns as an omnigenic neighbourhood in a molecular interaction network (interactome) is a key step towards advancing knowledge of the molecular mechanisms underlying cancers. Here, we introduce a unified connectivity line (CLine) to pinpoint omics-specific omnigenic patterns across 15 curated cancers. Taking advantage of the universality of CLine, we distinguish the peripheral and core genes for each omics aspect. We propose a network-based framework, multi-omics periphery and core (MOPC), to combine peripheral and core genes from different omics into a button-like structure. On the basis of network proximity, we provide evidence that core genes tend to be specifically perturbed in one omics, but the peripheral genes are diversely perturbed in multiple omics. And the core of one omics is regulated by multiple omics peripheries. Finally, we take the MOPC as an omnigenic neighbourhood, describe its characteristics, and explore its relative contribution to network-based mechanisms of cancer. We were able to present how multi-omics perturbations percolate through the human interactome and contribute to an integrated periphery and core.
21
Qumsiyeh E, Showe L, Yousef M. GediNET for discovering gene associations across diseases using knowledge based machine learning approach. Sci Rep 2022; 12:19955. [PMID: 36402891 PMCID: PMC9675776 DOI: 10.1038/s41598-022-24421-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 11/15/2022] [Indexed: 11/21/2022] Open
Abstract
The most common approaches to discovering genes associated with specific diseases are based on machine learning and use a variety of feature selection techniques to identify significant genes that can serve as biomarkers for a given disease. More recently, the integration in this process of prior knowledge-based approaches has shown significant promise in the discovery of new biomarkers with potential translational applications. In this study, we developed a novel approach, GediNET, that integrates prior biological knowledge to gene Groups that are shown to be associated with a specific disease such as a cancer. The novelty of GediNET is that it then also allows the discovery of significant associations between that specific disease and other diseases. The initial step in this process involves the identification of gene Groups. The Groups are then subjected to a Scoring component to identify the top performing classification Groups. The top-ranked gene Groups are then used to train a Machine Learning Model. The process of Grouping, Scoring and Modelling (G-S-M) is used by GediNET to identify other diseases that are similarly associated with this signature. GediNET identifies these relationships through Disease-Disease Association (DDA) based machine learning. DDA explores novel associations between diseases and identifies relationships which could be used to further improve approaches to diagnosis, prognosis, and treatment. The GediNET KNIME workflow can be downloaded from: https://github.com/malikyousef/GediNET.git or https://kni.me/w/3kH1SQV_mMUsMTS .
Affiliation(s)
- Emma Qumsiyeh
  - Information Technology Engineering, Al-Quds University, Abu Dis, Palestine
- Louise Showe
  - The Wistar Institute, Philadelphia, PA, 19104, USA
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, 13206, Zefat, Israel
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
22
Gentili M, Martini L, Sponziello M, Becchetti L. Biological Random Walks: multi-omics integration for disease gene prioritization. Bioinformatics 2022; 38:4145-4152. [PMID: 35792834 DOI: 10.1093/bioinformatics/btac446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2021] [Revised: 06/22/2022] [Accepted: 07/05/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Over the past decade, network-based approaches have proven useful in identifying disease modules within the human interactome, often providing insights into key mechanisms and guiding the quest for therapeutic targets. This is all the more important because experimental investigation of potential gene candidates is expensive and thus not always feasible. On the other hand, many sources of biological information exist beyond the interactome, and an important research direction is the design of effective techniques for their integration.
RESULTS: In this work, we introduce the Biological Random Walks (BRW) approach for disease gene prioritization in the human interactome. The proposed approach leverages multiple biological sources within an integrated framework. We perform an extensive, comparative study of BRW's performance against well-established baselines.
AVAILABILITY AND IMPLEMENTATION: All code is publicly available and can be downloaded at https://github.com/LeoM93/BiologicalRandomWalks. We used publicly available datasets; details on their retrieval and preprocessing are provided in the Supplementary Material.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
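For readers unfamiliar with random-walk gene prioritization, the following is a minimal sketch of a plain random walk with restart on a toy network. It is a generic baseline technique, not the BRW method itself (which integrates additional biological sources); the function name, the restart probability of 0.3, and the five-node network are assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that, at each step,
    jumps back to a seed node with probability `restart` and otherwise
    moves to a uniformly chosen neighbour.

    adjacency: dict mapping each node to a non-empty list of neighbours.
    seeds: set of known disease genes (hypothetical labels in this toy example).
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            # Spread the non-restarting mass of node n evenly over its neighbours.
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy undirected network; "A" plays the role of the single known disease gene.
toy_network = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
scores = random_walk_with_restart(toy_network, seeds={"A"})

# Candidate genes ranked by proximity to the seed, highest score first.
ranking = sorted((n for n in scores if n != "A"), key=scores.get, reverse=True)
```

Nodes closer to the seed and better connected to it receive higher scores, which is the intuition behind using such walks to rank candidate disease genes.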
Affiliation(s)
- Michele Gentili
  - Department of Computer, Control, and Management Engineering Antonio Ruberti, Sapienza University of Rome, Rome, Italy
- Leonardo Martini
  - Department of Computer, Control, and Management Engineering Antonio Ruberti, Sapienza University of Rome, Rome, Italy
- Marialuisa Sponziello
  - Translational and Precision Medicine Department, Sapienza University of Rome, Rome, Italy
- Luca Becchetti
  - Department of Computer, Control, and Management Engineering Antonio Ruberti, Sapienza University of Rome, Rome, Italy
23
Zhu Y, Zhang H, Yang Y, Zhang C, Ou-Yang L, Bai L, Deng M, Yi M, Liu S, Wang C. Discovery of pan-cancer related genes via integrative network analysis. Brief Funct Genomics 2022; 21:325-338. [PMID: 35760070 DOI: 10.1093/bfgp/elac012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/01/2022] [Revised: 05/14/2022] [Accepted: 05/25/2022] [Indexed: 01/02/2023] Open
Abstract
Identification of cancer-related genes is helpful for understanding the pathogenesis of cancer, developing targeted drugs and creating new diagnostic and therapeutic methods. Given the complexity of biological laboratory methods and the increasing availability of high-throughput data, many network-based methods have been proposed to identify cancer-related genes from a global perspective. Some studies have focused on tissue-specific cancer networks. However, cancers from different tissues may share common features, and those methods may ignore the differences and similarities across cancers during model construction. In this work, in order to make full use of the global information in the network, we first establish the pan-cancer network via a differential network algorithm; the network contains heterogeneous data not only across multiple cancer types but also between tumor samples and normal samples. Second, the node representation vectors are learned by network embedding. In contrast to ranking analysis-based methods, with the help of integrative network analysis, we transform the cancer-related gene identification problem into a binary classification problem. The final results are obtained via ensemble classification. We further applied these methods to the most commonly used gene expression data involving six tissue-specific cancer types. As a result, an integrative pan-cancer network and several biologically meaningful results were obtained. As examples, nine genes were ultimately identified as potential pan-cancer-related genes. Most of these genes have been reported in published studies, thus showing our method's potential for application in identifying driver gene candidates for further biological experimental verification.
Affiliation(s)
- Yuan Zhu
- School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China
- Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence(Fudan University), Ministry of Education, Handan Road, 200433, Shanghai, China
| | - Houwang Zhang
- Electrical Engineering, City University of HongKong, Kowloon, 999077, HongKong, China
| | - Yuanhang Yang
- School of Mathematics and Physics, China University of Geosciences, Lumo Road, 430074, Wuhan, China
| | - Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, The University of Southern Mississippi, Hattiesburg, USA
| | - Le Ou-Yang
- Guangdong Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, Shenzhen University, Nanhai Avenue, 518060, Shenzhen, China
| | - Litai Bai
- School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China
- Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
| | - Minghua Deng
- School of Mathematical Sciences, Peking University, No.5 Yiheyuan Road, 100871, Beijing, China
| | - Ming Yi
- School of Mathematics and Physics, China University of Geosciences, Lumo Road, 430074, Wuhan, China
| | - Song Liu
- School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China
- Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
| | - Chao Wang
- Hepatic Surgery Center, Institute of Hepato-Pancreato-Biliary Surgery, Department of Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Jiefang Avenue, 430030, Wuhan, China
|
24
|
Network-Based Methods for Approaching Human Pathologies from a Phenotypic Point of View. Genes (Basel) 2022; 13:genes13061081. [PMID: 35741843 PMCID: PMC9222217 DOI: 10.3390/genes13061081] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2022] [Revised: 06/10/2022] [Accepted: 06/14/2022] [Indexed: 01/27/2023]
Abstract
Network and systemic approaches to studying human pathologies are helping us to gain insight into the molecular mechanisms of and potential therapeutic interventions for human diseases, especially for complex diseases where large numbers of genes are involved. The complex human pathological landscape is traditionally partitioned into discrete “diseases”; however, that partition is sometimes problematic, as diseases are highly heterogeneous and can differ greatly from one patient to another. Moreover, for many pathological states, the set of symptoms (phenotypes) manifested by the patient is not enough to diagnose a particular disease. By contrast, phenotypes, by definition, are directly observable and can be closer to the molecular basis of the pathology. These clinical phenotypes are also important for personalised medicine, as they can help stratify patients and design personalised interventions. For these reasons, network and systemic approaches to pathologies are gradually incorporating phenotypic information. This review covers the current landscape of phenotype-centred network approaches to study different aspects of human diseases.
|
25
|
Elastic network modeling of cellular networks unveils sensor and effector genes that control information flow. PLoS Comput Biol 2022; 18:e1010181. [PMID: 35639793 PMCID: PMC9216591 DOI: 10.1371/journal.pcbi.1010181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 06/22/2022] [Accepted: 05/07/2022] [Indexed: 12/03/2022]
Abstract
The high-level organization of the cell is embedded in indirect relationships that connect distinct cellular processes. Existing computational approaches for detecting indirect relationships between genes typically consist of propagating abstract information through network representations of the cell. However, the selection of genes to serve as the source of propagation is inherently biased by prior knowledge. Here, we sought to derive an unbiased view of the high-level organization of the cell by identifying the genes that propagate and receive information most effectively in the cell, and the indirect relationships between these genes. To this aim, we adapted a perturbation-response scanning strategy initially developed for identifying allosteric interactions within proteins. We deployed this strategy onto an elastic network model of the yeast genetic interaction profile similarity network. This network revealed a superior propensity for information propagation relative to simulated networks with similar topology. Perturbation-response scanning identified the major distributors and receivers of information in the network, named effector and sensor genes, respectively. Effectors formed dense clusters centrally integrated into the network, whereas sensors formed loosely connected antenna-shaped clusters and contained genes with previously characterized involvement in signal transduction. We propose that indirect relationships between effector and sensor clusters represent major paths of information flow between distinct cellular processes. Genetic similarity networks for fission yeast and human displayed similarly strong propensities for information propagation and clusters of effector and sensor genes, suggesting that the global architecture enabling indirect relationships is evolutionarily conserved across species. 
Our results demonstrate that elastic network modeling of cellular networks constitutes a promising strategy to probe the high-level organization and cooperativity in the cell.
|
26
|
Erdogan F, Radu TB, Orlova A, Qadree AK, de Araujo ED, Israelian J, Valent P, Mustjoki SM, Herling M, Moriggl R, Gunning PT. JAK-STAT core cancer pathway: An integrative cancer interactome analysis. J Cell Mol Med 2022; 26:2049-2062. [PMID: 35229974 PMCID: PMC8980946 DOI: 10.1111/jcmm.17228] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 10/25/2021] [Revised: 12/14/2021] [Accepted: 12/22/2021] [Indexed: 12/25/2022]
Abstract
Through a comprehensive review and in silico analysis of reported data on STAT-linked diseases, we analysed the communication pathways and interactome of the seven STATs in major cancer categories and proposed rational targeting approaches for therapeutic intervention to disrupt critical pathways and addictions to hyperactive JAK/STAT in neoplastic states. Although all STATs follow a similar molecular activation pathway, STAT1, STAT2, STAT4 and STAT6 exert specific biological profiles associated with a more restricted pattern of activation by cytokines. STAT3 and STAT5A as well as STAT5B have pleiotropic roles in the body and can act as critical oncogenes that promote many processes involved in cancer development. STAT1, STAT3 and STAT5 also possess tumour-suppressive action in certain mutational and cancer-type contexts. Here, we demonstrated member-specific STAT activity in major cancer types. Through systems biology approaches, we found surprising roles for EGFR family members and for the interplay of the sex steroid hormone receptor ESR1 with oncogenic STAT function, and proposed new drug-targeting approaches for oncogenic STAT pathway addiction.
Affiliation(s)
- Fettah Erdogan
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
| | - Tudor Bogdan Radu
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
| | - Anna Orlova
- Institute of Animal Breeding and Genetics, University of Veterinary Medicine, Vienna, Austria
| | - Abdul Khawazak Qadree
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
| | - Elvin Dominic de Araujo
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario, Canada
| | - Johan Israelian
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
| | - Peter Valent
- Division of Hematology and Hemostaseology, Department of Internal Medicine I, Medical University of Vienna, Vienna, Austria
- Ludwig Boltzmann Institute for Hematology and Oncology, Medical University of Vienna, Vienna, Austria
| | - Satu M. Mustjoki
- Translational Immunology Research Program and Department of Clinical Chemistry and Hematology, University of Helsinki, Helsinki, Finland
- Hematology Research Unit, Helsinki University Hospital Comprehensive Cancer Center, Helsinki, Finland
- iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland
| | - Marco Herling
- Department of Hematology, Cellular Therapy, and Hemostaseology, University of Leipzig, Leipzig, Germany
| | - Richard Moriggl
- Institute of Animal Breeding and Genetics, University of Veterinary Medicine, Vienna, Austria
| | - Patrick Thomas Gunning
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
|
27
|
Shah E, Maji P. Scalable Non-Linear Graph Fusion for Prioritizing Cancer-Causing Genes. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1130-1143. [PMID: 32966220 DOI: 10.1109/tcbb.2020.3026219] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/11/2023]
Abstract
In the past few decades, both gene expression data and protein-protein interaction (PPI) networks have been extensively studied, due to their ability to depict important characteristics of disease-associated genes. In this regard, the paper presents a new gene prioritization algorithm to identify and prioritize cancer-causing genes by judiciously integrating the complementary information obtained from the two data sources. The proposed algorithm selects disease-causing genes by maximizing the importance of the selected genes and the functional similarity among them. A new quantitative index is introduced to evaluate the importance of a gene. It considers whether a gene exhibits a differential expression pattern across sick and healthy individuals and has strong connectivity in the PPI network, both of which are important characteristics of a potential biomarker. As disease-associated genes are expected to have similar expression profiles and topological structures, a scalable non-linear graph fusion technique, termed ScaNGraF, is proposed to learn a disease-dependent functional similarity network from the co-expression and common-neighbor-based similarity networks. The proposed ScaNGraF, which is based on a message passing algorithm, efficiently combines the shared and complementary information provided by different data sources at significantly lower computational cost. A new measure, termed DiCoIN, is introduced to evaluate the quality of a learned affinity network. The performance of the proposed graph fusion technique and gene selection algorithm is extensively compared with that of some existing methods, using several cancer data sets.
|
28
|
Nam S, Lee S, Park S, Lee J, Park A, Kim YH, Park T. PATHOME-Drug: a subpathway-based polypharmacology drug-repositioning method. Bioinformatics 2022; 38:444-452. [PMID: 34515762 DOI: 10.1093/bioinformatics/btab566] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/31/2021] [Revised: 06/10/2021] [Accepted: 09/09/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Drug repositioning reveals novel indications for existing drugs, in particular for diseases with no available drugs. Diverse computational drug repositioning methods have been proposed that measure either drug-treated gene expression signatures or the proximity of drug targets and disease proteins found in prior networks. However, these methods do not explain which signaling subparts allow potential drugs to be selected, and do not consider polypharmacology, i.e. multiple targets of a known drug, in specific subparts. RESULTS Here, to address these limitations, we developed a subpathway-based polypharmacology drug repositioning method, PATHOME-Drug, based on drug-associated transcriptomes. Specifically, this tool locates subparts of signaling cascades related to phenotype changes (e.g. disease status changes), and identifies existing approved drugs whose multiple targets are enriched in those subparts. We show that our method performed better at detecting signaling context and specific drugs/compounds than WebGestalt and clusterProfiler, for both real biological and simulated datasets. We believe that our tool can successfully address the current shortage of targeted therapy agents. AVAILABILITY AND IMPLEMENTATION The web-service is available at http://statgen.snu.ac.kr/software/pathome. The source code and data are available at https://github.com/labnams/pathome-drug. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Seungyoon Nam
- Department of Genome Medicine and Science, College of Medicine, Gachon University, 21565 Incheon, Korea; Department of Life Sciences, Gachon University, 13120 Seongnam, Korea; Gachon Institute of Genomic Medicine and Science, Gachon University Gil Medical Center, 21565 Incheon, Korea; Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology, Gachon University, 21999 Incheon, Korea
| | - Sungyoung Lee
- Department of Genomic Medicine, Seoul National University Hospital, 03080 Seoul, Korea; Center for Precision Medicine, Seoul National University Hospital, 03080 Seoul, Korea
| | - Sungjin Park
- Department of Genome Medicine and Science, College of Medicine, Gachon University, 21565 Incheon, Korea; Gachon Institute of Genomic Medicine and Science, Gachon University Gil Medical Center, 21565 Incheon, Korea
| | - Jinhyuk Lee
- Korean Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology, 34141 Daejeon, Korea; Department of Bioinformatics, University of Sciences and Technology, 34113 Daejeon, Korea
| | - Aron Park
- Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology, Gachon University, 21999 Incheon, Korea
| | - Yon Hui Kim
- Department of Biomedical Science, Hanyang University, 04763 Seoul, Korea
| | - Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, 08826 Seoul, Korea; Department of Statistics, Seoul National University, 08826 Seoul, Korea
|
29
|
Le DH. A network-based method for predicting disease-associated enhancers. PLoS One 2021; 16:e0260432. [PMID: 34879086 PMCID: PMC8654176 DOI: 10.1371/journal.pone.0260432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Accepted: 11/09/2021] [Indexed: 11/18/2022]
Abstract
Background Enhancers regulate transcription of target genes, causing a change in expression level. Thus, the aberrant activity of enhancers can lead to diseases. To date, a large number of enhancers have been identified, yet only a small portion of them have been found to be associated with diseases. This raises a pressing need to develop computational methods to predict associations between diseases and enhancers. Results In this study, we predicted associations under the assumption that enhancers sharing target genes could be associated with similar diseases. Thus, we built an enhancer functional interaction network by connecting enhancers that significantly share target genes, then developed a network diffusion method, RWDisEnh, based on a random walk with restart algorithm on networks of diseases and enhancers, to globally measure the degree of association between diseases and enhancers. RWDisEnh performed best when the disease similarities were integrated with the enhancer functional interaction network through known disease-enhancer associations, in the form of a heterogeneous network of diseases and enhancers. It was also superior to another network diffusion method, PageRank with Priors, and to a neighborhood-based one, MaxLink, which simply chooses the closest neighbors of known disease-associated enhancers. Finally, we showed that RWDisEnh could predict novel enhancers that are either directly or indirectly associated with diseases. Conclusions Taken together, RWDisEnh could be a potential method for predicting disease-enhancer associations.
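The random walk with restart named in this abstract can be illustrated with a minimal sketch. This is a generic textbook version on a single undirected network, not the RWDisEnh implementation: the function name, the `restart` value of 0.5, the convergence tolerance, and the toy graph are all assumptions made for illustration, and the heterogeneous disease-enhancer network described in the abstract is not modeled.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return a steady-state visiting probability for every node.

    adj   : dict mapping each node to a list of its neighbours (undirected graph)
    seeds : set of nodes the walker restarts from (e.g. known associations)
    All parameter names and defaults are illustrative assumptions.
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... otherwise it moves to a uniformly chosen neighbour.
        for u in nodes:
            if not adj[u]:
                continue  # isolated nodes pass nothing on in this simple sketch
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph a - b - c - d with a single seed at "a".
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
ranking = sorted(random_walk_with_restart(toy, {"a"}).items(),
                 key=lambda kv: kv[1], reverse=True)
print(ranking)
```

Nodes are then ranked by their steady-state probability, so candidates closer to the seeds in the network receive higher scores.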
Affiliation(s)
- Duc-Hau Le
- School of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam
|
30
|
Rosenthal SB, Willsey HR, Xu Y, Mei Y, Dea J, Wang S, Curtis C, Sempou E, Khokha MK, Chi NC, Willsey AJ, Fisch KM, Ideker T. A convergent molecular network underlying autism and congenital heart disease. Cell Syst 2021; 12:1094-1107.e6. [PMID: 34411509 PMCID: PMC8602730 DOI: 10.1016/j.cels.2021.07.009] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 10/23/2020] [Revised: 05/10/2021] [Accepted: 07/28/2021] [Indexed: 12/29/2022]
Abstract
Patients with neurodevelopmental disorders, including autism, have an elevated incidence of congenital heart disease, but the extent to which these conditions share molecular mechanisms remains unknown. Here, we use network genetics to identify a convergent molecular network underlying autism and congenital heart disease. This network is impacted by damaging genetic variants from both disorders in multiple independent cohorts of patients, pinpointing 101 genes with shared genetic risk. Network analysis also implicates risk genes for each disorder separately, including 27 previously unidentified genes for autism and 46 for congenital heart disease. For 7 genes with shared risk, we create engineered disruptions in Xenopus tropicalis, confirming both heart and brain developmental abnormalities. The network includes a family of ion channels, such as the sodium transporter SCN2A, linking these functions to early heart and brain development. This study provides a road map for identifying risk genes and pathways involved in co-morbid conditions.
Affiliation(s)
- Sara Brin Rosenthal
- Center for Computational Biology & Bioinformatics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Helen Rankin Willsey
- Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Yuxiao Xu
- Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Yuan Mei
- Division of Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Jeanselle Dea
- Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Sheng Wang
- Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94158, USA
| | - Charlotte Curtis
- Division of Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Emily Sempou
- Pediatric Genomics Discovery Program, Department of Pediatrics and Genetics, Yale University School of Medicine, New Haven, CT 06510, USA
| | - Mustafa K Khokha
- Pediatric Genomics Discovery Program, Department of Pediatrics and Genetics, Yale University School of Medicine, New Haven, CT 06510, USA
| | - Neil C Chi
- Division of Cardiology, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Arthur Jeremy Willsey
- Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA; Quantitative Biosciences Institute (QBI), University of California, San Francisco, San Francisco, CA 94158, USA.
| | - Kathleen M Fisch
- Center for Computational Biology & Bioinformatics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA.
| | - Trey Ideker
- Division of Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA.
|
31
|
Kotlyar M, Pastrello C, Ahmed Z, Chee J, Varyova Z, Jurisica I. IID 2021: towards context-specific protein interaction analyses by increased coverage, enhanced annotation and enrichment analysis. Nucleic Acids Res 2021; 50:D640-D647. [PMID: 34755877 PMCID: PMC8728267 DOI: 10.1093/nar/gkab1034] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 09/15/2021] [Revised: 10/13/2021] [Accepted: 11/03/2021] [Indexed: 01/02/2023]
Abstract
Improved bioassays have significantly increased the rate of identifying new protein-protein interactions (PPIs), and the number of detected human PPIs has greatly exceeded early estimates of human interactome size. These new PPIs provide a more complete view of disease mechanisms but precise understanding of how PPIs affect phenotype remains a challenge. It requires knowledge of PPI context (e.g. tissues, subcellular localizations), and functional roles, especially within pathways and protein complexes. The previous IID release focused on PPI context, providing networks with comprehensive tissue, disease, cellular localization, and druggability annotations. The current update adds developmental stages to the available contexts, and provides a way of assigning context to PPIs that could not be previously annotated due to insufficient data or incompatibility with available context categories (e.g. interactions between membrane and cytoplasmic proteins). This update also annotates PPIs with conservation across species, directionality in pathways, membership in large complexes, interaction stability (i.e. stable or transient), and mutation effects. Enrichment analysis is now available for all annotations, and includes multiple options; for example, context annotations can be analyzed with respect to PPIs or network proteins. In addition to tabular view or download, IID provides online network visualization. This update is available at http://ophid.utoronto.ca/iid.
Affiliation(s)
- Max Kotlyar
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada
| | - Chiara Pastrello
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada
| | - Zuhaib Ahmed
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada
| | - Justin Chee
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada
| | - Zofia Varyova
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada
| | - Igor Jurisica
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada; Departments of Medical Biophysics and Computer Science, University of Toronto, Toronto, ON M5S 1A4, Canada; Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia
|
32
|
Petti M, Farina L, Francone F, Lucidi S, Macali A, Palagi L, De Santis M. MOSES: A New Approach to Integrate Interactome Topology and Functional Features for Disease Gene Prediction. Genes (Basel) 2021; 12:1713. [PMID: 34828319 PMCID: PMC8624742 DOI: 10.3390/genes12111713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Revised: 10/16/2021] [Accepted: 10/25/2021] [Indexed: 11/17/2022]
Abstract
Disease gene prediction is to date one of the main computational challenges of precision medicine. It is still uncertain whether disease genes have unique functional properties that distinguish them from non-disease genes or, from a network perspective, whether they are located randomly in the interactome or show specific patterns in the network topology. In this study, we propose a new method for disease gene prediction based on the use of biological knowledge bases (gene-disease associations, gene functional annotations, etc.) and interactome network topology. The proposed algorithm, called MOSES, is based on the definition of two somewhat opposing sets of genes, both disease-specific from different perspectives: warm seeds (i.e., disease genes obtained from databases) and cold seeds (genes far from the disease genes on the interactome and not involved in their biological functions). The application of MOSES to a set of 40 diseases showed that the suggested putative disease genes are significantly enriched in their reference disease. Reassuringly, known and predicted disease genes together tend to form a connected network module on the human interactome, mitigating the scattered distribution of disease genes, which is probably due to both the paucity of disease-gene associations and the incompleteness of the interactome.
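The idea of cold seeds, genes that are both far from the known disease genes on the interactome and functionally unrelated to them, can be sketched as follows. This is only one plausible reading of the abstract and not the MOSES algorithm itself: the function names, the `min_distance` threshold, the use of shortest-path distance, and the shared-annotation test are all assumptions introduced for illustration.

```python
from collections import deque


def distances_from_seeds(adj, warm_seeds):
    """Multi-source BFS: shortest-path distance from each gene to its nearest warm seed."""
    dist = {s: 0 for s in warm_seeds}
    queue = deque(warm_seeds)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist  # genes unreachable from every seed are simply absent


def pick_cold_seeds(adj, annotations, warm_seeds, min_distance=3):
    """Select genes far from the warm seeds that share no annotation with them.

    adj         : dict gene -> set of interacting genes
    annotations : dict gene -> set of functional terms (hypothetical input)
    min_distance: hypothetical cut-off for "far" on the interactome
    """
    dist = distances_from_seeds(adj, warm_seeds)
    # All functional terms carried by at least one warm seed.
    warm_terms = set().union(*(annotations.get(s, set()) for s in warm_seeds))
    return {
        gene
        for gene, d in dist.items()
        if d >= min_distance and not (annotations.get(gene, set()) & warm_terms)
    }
```

In this sketch, genes that cannot be reached from any warm seed are left out rather than treated as infinitely far; whether such genes should count as cold seeds is a design choice the abstract does not settle.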
Affiliation(s)
- Manuela Petti
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, 00185 Rome, Italy; (L.F.); (F.F.); (S.L.); (A.M.); (L.P.); (M.D.S.)
|
33
|
Liu J, Zhu H, Qiu J. Locally Adjust Networks Based on Connectivity and Semantic Similarities for Disease Module Detection. Front Genet 2021; 12:726596. [PMID: 34759955 PMCID: PMC8575408 DOI: 10.3389/fgene.2021.726596] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/17/2021] [Accepted: 09/22/2021] [Indexed: 11/13/2022]
Abstract
For studying the pathogenesis of complex diseases, it is important to identify disease modules at the system level. Since protein-protein interaction (PPI) networks contain many missing and incorrect interactions, most existing methods leave many disease proteins isolated from disease modules. In this paper, we propose an effective disease module identification method, IDMCSS, in which the human PPI network is obtained by adding potentially missing interactions to existing PPI networks and removing potentially incorrect ones. In IDMCSS, a network adjustment strategy is developed to add or remove links around disease proteins based on both topological and semantic information. Next, neighboring proteins of disease proteins are prioritized according to a proposed similarity between each of them and the disease proteins, and the protein with the largest similarity to the disease proteins is added to a candidate disease protein set, one protein at a time. The stopping criterion is set to the boundary of the disease proteins. Finally, the connected subnetwork containing the largest number of disease proteins is selected as the disease module. Experimental results on asthma demonstrate the effectiveness of the method in comparison to existing algorithms for disease module identification. It is also shown that IDMCSS can obtain disease modules involving crucial biological processes of asthma, and that 12 targets for drug intervention can be predicted.
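The last two steps described in this abstract, greedy one-at-a-time expansion of the candidate set followed by selection of the connected subnetwork with the most disease proteins, can be sketched as below. This is not the IDMCSS implementation. The similarity used here (number of links from a candidate to the current set) and the fixed number of additions `n_add` are stand-in assumptions, since the abstract does not define the similarity measure or make the stopping criterion precise; the network adjustment step is omitted entirely.

```python
from collections import deque


def expand_module(adj, disease_proteins, n_add):
    """Greedily add, one at a time, the neighbour best connected to the current set.

    adj   : dict protein -> set of interacting proteins
    n_add : hypothetical stopping rule (number of proteins to add)
    """
    module = set(disease_proteins)
    for _ in range(n_add):
        frontier = {v for u in module for v in adj[u]} - module
        if not frontier:
            break
        # Stand-in similarity: how many current members the candidate touches.
        best = max(sorted(frontier), key=lambda c: len(adj[c] & module))
        module.add(best)
    return module


def largest_disease_component(adj, module, disease_proteins):
    """Return the connected part of `module` holding the most disease proteins."""
    seen, best = set(), set()
    for start in sorted(module):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in module and v not in component:
                    component.add(v)
                    queue.append(v)
        seen |= component
        if len(component & disease_proteins) > len(best & disease_proteins):
            best = component
    return best
```

Sorting the frontier before taking the maximum only makes tie-breaking deterministic; a real similarity measure combining topological and semantic information would replace the lambda.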
Affiliation(s)
- Jia Liu
- State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing, China
| | - Huole Zhu
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, Hefei, China
| | - Jianfeng Qiu
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, Hefei, China
| |
Collapse
|
34
|
Sledzieski S, Singh R, Cowen L, Berger B. D-SCRIPT translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions. Cell Syst 2021; 12:969-982.e6. [PMID: 34536380 PMCID: PMC8586911 DOI: 10.1016/j.cels.2021.08.010] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 04/02/2021] [Revised: 08/01/2021] [Accepted: 08/19/2021] [Indexed: 11/29/2022]
Abstract
We combine advances in neural language modeling and structurally motivated design to develop D-SCRIPT, an interpretable and generalizable deep-learning model, which predicts interaction between two proteins using only their sequence and maintains high accuracy with limited training data and across species. We show that a D-SCRIPT model trained on 38,345 human PPIs enables significantly improved functional characterization of fly proteins compared with the state-of-the-art approach. Evaluating the same D-SCRIPT model on protein complexes with known 3D structure, we find that the inter-protein contact map output by D-SCRIPT has significant overlap with the ground truth. We apply D-SCRIPT to screen for PPIs in cow (Bos taurus) at a genome-wide scale and, focusing on rumen physiology, identify functional gene modules related to metabolism and immune response. The predicted interactions can then be leveraged for function prediction at scale, addressing the genome-to-phenome challenge, especially in species where little data are available.
Affiliation(s)
- Samuel Sledzieski
  - Computer Science and Artificial Intelligence Lab., Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Rohit Singh
  - Computer Science and Artificial Intelligence Lab., Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Lenore Cowen
  - Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Bonnie Berger
  - Computer Science and Artificial Intelligence Lab., Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
35
Zhang XM, Liang L, Liu L, Tang MJ. Graph Neural Networks and Their Current Applications in Bioinformatics. Front Genet 2021; 12:690049. [PMID: 34394185 PMCID: PMC8360394 DOI: 10.3389/fgene.2021.690049] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 04/02/2021] [Accepted: 05/28/2021] [Indexed: 12/22/2022]
Abstract
Graph neural networks (GNNs), as a branch of deep learning in non-Euclidean space, perform particularly well in various tasks that process graph structure data. With the rapid accumulation of biological network data, GNNs have also become an important tool in bioinformatics. In this research, a systematic survey of GNNs and their advances in bioinformatics is presented from multiple perspectives. We first introduce some commonly used GNN models and their basic principles. Then, three representative tasks are proposed based on the three levels of structural information that can be learned by GNNs: node classification, link prediction, and graph generation. Meanwhile, according to the specific applications for various omics data, we categorize and discuss the related studies in three aspects: disease prediction, drug discovery, and biomedical imaging. Based on the analysis, we provide an outlook on the shortcomings of current studies and point out their developing prospect. Although GNNs have achieved excellent results in many biological tasks at present, they still face challenges in terms of low-quality data processing, methodology, and interpretability and have a long road ahead. We believe that GNNs are potentially an excellent method that solves various biological problems in bioinformatics research.
Affiliation(s)
- Xiao-Meng Zhang
  - School of Information, Yunnan Normal University, Kunming, China
- Li Liang
  - School of Information, Yunnan Normal University, Kunming, China
- Lin Liu
  - School of Information, Yunnan Normal University, Kunming, China
  - Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
- Ming-Jing Tang
  - Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
  - School of Life Sciences, Yunnan Normal University, Kunming, China
36
Bello T, Chan M, Golkowski M, Xue AG, Khasnavis N, Ceribelli M, Ong SE, Thomas CJ, Gujral TS. KiRNet: Kinase-centered network propagation of pharmacological screen results. Cell Reports Methods 2021; 1:100007. [PMID: 34296206 PMCID: PMC8294099 DOI: 10.1016/j.crmeth.2021.100007] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/01/2020] [Revised: 02/21/2021] [Accepted: 03/19/2021] [Indexed: 11/29/2022]
Abstract
The ever-increasing size and scale of biological information have popularized network-based approaches as a means to interpret these data. We develop a network propagation method that integrates kinase-inhibitor-focused functional screens with known protein-protein interactions (PPIs). This method, dubbed KiRNet, uses an a priori edge-weighting strategy based on node degree to establish a pipeline from a kinase inhibitor screen to the generation of a predictive PPI subnetwork. We apply KiRNet to uncover molecular regulators of mesenchymal cancer cells driven by overexpression of Frizzled 2 (FZD2). KiRNet produces a network model consisting of 166 high-value proteins. These proteins exhibit FZD2-dependent differential phosphorylation, and genetic knockdown studies validate their role in maintaining a mesenchymal cell state. Finally, analysis of clinical data shows that mesenchymal tumors exhibit significantly higher average expression of the 166 corresponding genes than epithelial tumors for nine different cancer types.
Affiliation(s)
- Thomas Bello
  - Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Molecular and Cellular Biology, University of Washington, Seattle, WA 98195-7275, USA
- Marina Chan
  - Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Martin Golkowski
  - Department of Pharmacology, University of Washington, Seattle, WA 98195-7275, USA
- Andrew G. Xue
  - Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Nithisha Khasnavis
  - Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Michele Ceribelli
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), Bethesda, MD, USA
- Shao-En Ong
  - Department of Pharmacology, University of Washington, Seattle, WA 98195-7275, USA
- Craig J. Thomas
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), Bethesda, MD, USA
- Taranjit S. Gujral
  - Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Molecular and Cellular Biology, University of Washington, Seattle, WA 98195-7275, USA
  - Department of Pharmacology, University of Washington, Seattle, WA 98195-7275, USA
37
Coşkun M, Baggag A, Koyutürk M. Fast computation of Katz index for efficient processing of link prediction queries. Data Min Knowl Discov 2021. [DOI: 10.1007/s10618-021-00754-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
38
Molecular aggregation in liquid water: Laplace spectra and spectral clustering of H-bonded network. J Mol Liq 2021. [DOI: 10.1016/j.molliq.2020.114802] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
39
Lakizadeh A, Hassan Mir-Ashrafi SM. Drug repurposing improvement using a novel data integration framework based on the drug side effect. Informatics in Medicine Unlocked 2021. [DOI: 10.1016/j.imu.2021.100523] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/24/2022]
40
de Weerd HA, Badam TVS, Martínez-Enguita D, Åkesson J, Muthas D, Gustafsson M, Lubovac-Pilav Z. MODifieR: an Ensemble R Package for Inference of Disease Modules from Transcriptomics Networks. Bioinformatics 2020; 36:3918-3919. [PMID: 32271876 DOI: 10.1093/bioinformatics/btaa235] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/04/2019] [Revised: 03/27/2020] [Accepted: 04/02/2020] [Indexed: 12/24/2022]
Abstract
MOTIVATION: Complex diseases are due to the dense interactions of many disease-associated factors that dysregulate genes that in turn form the so-called disease modules, which have been shown to be a powerful concept for understanding pathological mechanisms. There exist many disease module inference methods that rely on somewhat different assumptions, but there is still no gold standard or best-performing method. Hence, there is a need for combining these methods to generate robust disease modules. RESULTS: We developed MODule IdentiFIER (MODifieR), an ensemble R package of nine disease module inference methods from transcriptomics networks. MODifieR uses standardized input and output, making it possible to combine individual modules generated by these methods into more robust disease-specific modules, contributing to a better understanding of complex diseases. AVAILABILITY AND IMPLEMENTATION: MODifieR is available under the GNU GPL license and can be freely downloaded from https://gitlab.com/Gustafsson-lab/MODifieR and as a Docker image from https://hub.docker.com/r/ddeweerd/modifier. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hendrik A de Weerd
  - School of Bioscience, Systems Biology Research Center, Skövde 541 45, Sweden
  - Department of Physics, Chemistry and Biology, Linköping University, Linköping 581 83, Sweden
- Tejaswi V S Badam
  - School of Bioscience, Systems Biology Research Center, Skövde 541 45, Sweden
  - Department of Physics, Chemistry and Biology, Linköping University, Linköping 581 83, Sweden
- David Martínez-Enguita
  - Department of Physics, Chemistry and Biology, Linköping University, Linköping 581 83, Sweden
- Julia Åkesson
  - School of Bioscience, Systems Biology Research Center, Skövde 541 45, Sweden
  - Department of Physics, Chemistry and Biology, Linköping University, Linköping 581 83, Sweden
- Daniel Muthas
  - Translational Science and Experimental Medicine, Early Respiratory, Inflammation and Autoimmunity, BioPharmaceuticals R&D, AstraZeneca, Mölndal 43183, Sweden
- Mika Gustafsson
  - Department of Physics, Chemistry and Biology, Linköping University, Linköping 581 83, Sweden
41
Petti M, Bizzarri D, Verrienti A, Falcone R, Farina L. Connectivity Significance for Disease Gene Prioritization in an Expanding Universe. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:2155-2161. [PMID: 31484130 DOI: 10.1109/tcbb.2019.2938512] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 06/10/2023]
Abstract
A fundamental topic in network medicine is disease genes prioritization. The underlying hypothesis is that disease genes are organized as modules confined within the interactome. Here, we propose a novel algorithm called DiaBLE (DIAMOnD Background Local Expansion) which is a modified version of DIAMOnD, a successful algorithm based on the concept of connectivity significance. Instead of taking the whole interactome as the background model, DiaBLE considers as gene universe the smallest local expansion of the current seeds set at each iteration step. We show that DiaBLE significantly increases the overall DIAMOnD ranking quality of genes prioritization both in terms of cross-validation and biological consistency. Here, we focus on the two algorithms only since a comparative analysis among gene prioritization methods is beyond the scope of this study. Finally, we briefly discuss the improvement of biological insight provided by DiaBLE for two cancers (head and neck squamous cell carcinoma and kidney renal clear cell carcinoma).
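The connectivity significance used to rank candidate genes can be sketched as a hypergeometric tail probability: how likely is it that a node with k links has at least k_s of them on seed genes by chance? The snippet below is only an illustration under stated assumptions, not the authors' implementation. The function names, the toy gene names, and the numbers are hypothetical, and the "local" universe is modeled simply as a smaller universe size with node degrees held fixed.

```python
from math import comb

def connectivity_pvalue(k, k_s, n_universe, n_seeds):
    """P(X >= k_s): probability that a node with k links has at least k_s of
    them on seed genes if its k neighbors were drawn at random, without
    replacement, from a universe of n_universe genes containing n_seeds seeds."""
    total = comb(n_universe, k)
    upper = min(k, n_seeds)
    tail = sum(
        comb(n_seeds, i) * comb(n_universe - n_seeds, k - i)
        for i in range(k_s, upper + 1)
    )
    return tail / total

def rank_candidates(candidates, n_universe, n_seeds):
    """candidates maps gene -> (degree k, links to seeds k_s).
    Returns gene names ordered from most to least significant."""
    scored = {
        gene: connectivity_pvalue(k, k_s, n_universe, n_seeds)
        for gene, (k, k_s) in candidates.items()
    }
    return sorted(scored, key=scored.get)

# Hypothetical toy numbers: 3 seed genes, two candidate genes.
candidates = {"geneA": (2, 2), "geneB": (6, 2)}

# Whole-network background (assumed 10 genes) versus a smaller local
# background (assumed 6 genes) for the same candidate.
p_whole = connectivity_pvalue(2, 2, n_universe=10, n_seeds=3)
p_local = connectivity_pvalue(2, 2, n_universe=6, n_seeds=3)
ranking = rank_candidates(candidates, n_universe=10, n_seeds=3)
```

In this toy case the same two seed links are less surprising against the smaller background, which shows how the choice of gene universe changes the score that drives the ranking.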
42
Wang B, Hu J, Wang Y, Zhang C, Zhou Y, Yu L, Guo X, Gao L, Chen Y. C3: connect separate connected components to form a succinct disease module. BMC Bioinformatics 2020; 21:433. [PMID: 33008305 PMCID: PMC7531168 DOI: 10.1186/s12859-020-03769-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/20/2020] [Accepted: 09/20/2020] [Indexed: 01/08/2023]
Abstract
Background: A precise disease module is conducive to understanding the molecular mechanism of disease causation and identifying drug targets. However, because disease modules are fragmented in the incomplete human interactome, how to determine their connectivity pattern and detect a complete disease neighbourhood on that basis is still an open question. Results: In this paper, we perform an exploratory analysis leading to the important observation that, through a few intermediate nodes, most separate connected components formed by disease-associated proteins can be effectively connected and eventually form a complete disease module. Based on the topological properties of these intermediate nodes, we propose a connect separate connected components (C3) method to detect a succinct disease module by introducing a relatively small number of intermediate nodes, which allows us to obtain purer disease modules than other methods. Then we apply C3 across a large corpus of diseases to validate this connectivity pattern of disease modules. Furthermore, the connectivity of the perturbed genes in multi-omics data such as The Cancer Genome Atlas also fits this pattern. Conclusions: The C3 tool is not only useful in detecting a clearly defined connected disease neighbourhood of 299 diseases and cancers with multi-omics data, but also helpful in better understanding the interconnection of phenotypically related genes in different omics data and studying complex pathological processes.
Affiliation(s)
- Bingbo Wang
  - School of Computer Science and Technology, Xidian University, Xi'an, People's Republic of China
- Jie Hu
  - School of Computer Science and Technology, Xidian University, Xi'an, People's Republic of China
- Yajun Wang
  - School of Humanities and Foreign Languages, Xi'an University of Technology, Xi'an, People's Republic of China
- Chenxing Zhang
  - School of Computer Science and Technology, Xidian University, Xi'an, People's Republic of China
- Yuanjun Zhou
  - School of Computer Science and Technology, Xidian University, Xi'an, People's Republic of China
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi'an, People's Republic of China
- Xingli Guo
  - School of Computer Science and Technology, Xidian University, Xi'an, People's Republic of China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi'an, People's Republic of China
- Yunru Chen
  - The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, People's Republic of China
43
Abstract
Since the initial success of genome-wide association studies (GWAS) in 2005, tens of thousands of genetic variants have been identified for hundreds of human diseases and traits. In a GWAS, genotype information at up to millions of genetic markers is collected from up to hundreds of thousands of individuals, together with their phenotype information. Several scientific goals can be accomplished through the analysis of GWAS data, including the identification of variants, genes, and pathways associated with diseases and traits of interest; the inference of the genetic architecture of these traits; and the development of genetic risk prediction models. In this review, we provide an overview of the statistical challenges in achieving these goals and recent progress in statistical methodology to address these challenges.
Affiliation(s)
- Ning Sun
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA
44
Wang S, Flynn ER, Altman RB. Gaussian Embedding for Large-scale Gene Set Analysis. Nat Mach Intell 2020; 2:387-395. [PMID: 32968711 PMCID: PMC7505077 DOI: 10.1038/s42256-020-0193-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/21/2019] [Accepted: 05/15/2020] [Indexed: 02/08/2023]
Abstract
Gene sets, including protein complexes and signaling pathways, have proliferated greatly, in large part as a result of high-throughput biological data. Leveraging gene sets to gain insight into biological discovery requires computational methods for converting them into a useful form for available machine learning models. Here, we study the problem of embedding gene sets as compact features that are compatible with available machine learning codes. We present Set2Gaussian, a novel network-based gene set embedding approach, which represents each gene set as a multivariate Gaussian distribution rather than a single point in the low-dimensional space, according to the proximity of these genes in a protein-protein interaction network. We demonstrate that Set2Gaussian improves gene set member identification, accurately stratifies tumors, and finds concise gene sets for gene set enrichment analysis. We further show how Set2Gaussian allows us to identify a previously unknown clinical prognostic and predictive subnetwork around NEFM in sarcoma, which we validate in independent cohorts.
Affiliation(s)
- Sheng Wang
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Emily R. Flynn
  - Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305, USA
- Russ B. Altman
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
  - Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
45
Hristov BH, Chazelle B, Singh M. uKIN Combines New and Prior Information with Guided Network Propagation to Accurately Identify Disease Genes. Cell Syst 2020; 10:470-479.e3. [PMID: 32684276 PMCID: PMC7821437 DOI: 10.1016/j.cels.2020.05.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/01/2020] [Revised: 04/24/2020] [Accepted: 05/19/2020] [Indexed: 12/23/2022]
Abstract
Protein interaction networks provide a powerful framework for identifying genes causal for complex genetic diseases. Here, we introduce a general framework, uKIN, that uses prior knowledge of disease-associated genes to guide, within known protein-protein interaction networks, random walks that are initiated from newly identified candidate genes. In large-scale testing across 24 cancer types, we demonstrate that our network propagation approach for integrating both prior and new information not only better identifies cancer driver genes than using either source of information alone but also readily outperforms other state-of-the-art network-based approaches. We also apply our approach to genome-wide association data to identify genes functionally relevant for several complex diseases. Overall, our work suggests that guided network propagation approaches that utilize both prior and new data are a powerful means to identify disease genes. uKIN is freely available for download at: https://github.com/Singh-Lab/uKIN.
Affiliation(s)
- Borislav H Hristov
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Bernard Chazelle
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
46
Peng J, Zhu L, Wang Y, Chen J. Mining Relationships among Multiple Entities in Biological Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:769-776. [PMID: 30872239 DOI: 10.1109/tcbb.2019.2904965] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 06/09/2023]
Abstract
Identifying topological relationships among multiple entities in biological networks is critical to understanding the organizational principles of network functionality. Theoretically, this problem can be solved using minimum Steiner tree (MSTT) algorithms. However, because of the large network size, it remains computationally challenging, and the predictive value of multi-entity topological relationships is still unclear. We present a novel solution called Cluster-based Steiner Tree Miner (CST-Miner) to instantly identify multi-entity topological relationships in biological networks. Given a list of user-specific entities, CST-Miner decomposes a biological network into nested cluster-based subgraphs, on which multiple minimum Steiner trees are identified. By merging all of them into a minimum cost tree, the optimal topological relationships among all the user-specific entities are revealed. Experimental results showed that CST-Miner can finish in nearly log-linear time and that the tree constructed by CST-Miner is close to the global minimum.
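To make the Steiner tree idea concrete, the sketch below connects a set of terminal entities in an unweighted network using a classical shortest-path heuristic: repeatedly attach the terminal that is closest to the tree built so far. This is not CST-Miner and does not perform its cluster-based decomposition. The function names and the toy network are hypothetical, and the result is an approximation rather than a guaranteed minimum tree.

```python
from collections import deque

def path_to_tree(adj, source, tree_nodes):
    """Breadth-first search from `source` to the nearest node already in the
    tree. Returns the path as a list running from that tree node back to
    `source`, or None if the tree cannot be reached."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node in tree_nodes:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path
        for neighbor in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

def steiner_heuristic(adj, terminals):
    """Grow a tree from the first terminal, each time attaching the remaining
    terminal with the shortest connecting path (ties broken alphabetically)."""
    tree_nodes = {terminals[0]}
    tree_edges = set()
    remaining = set(terminals[1:])
    while remaining:
        best = None
        for terminal in sorted(remaining):
            path = path_to_tree(adj, terminal, tree_nodes)
            if path is not None and (best is None or len(path) < len(best)):
                best = path
        if best is None:
            raise ValueError("terminals are not all connected")
        tree_nodes.update(best)
        tree_edges.update(tuple(sorted(edge)) for edge in zip(best, best[1:]))
        remaining -= tree_nodes
    return tree_nodes, tree_edges

# Hypothetical toy network: terminals A, B, C share the hub H; X and Y form a detour.
adj = {
    "A": ["H", "X"], "B": ["H", "Y"], "C": ["H"],
    "H": ["A", "B", "C"], "X": ["A", "Y"], "Y": ["X", "B"],
}
nodes, edges = steiner_heuristic(adj, ["A", "B", "C"])
```

On this toy network the heuristic routes all three terminals through the hub H and ignores the longer detour through X and Y.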
47
Silverman EK, Schmidt HHHW, Anastasiadou E, Altucci L, Angelini M, Badimon L, Balligand JL, Benincasa G, Capasso G, Conte F, Di Costanzo A, Farina L, Fiscon G, Gatto L, Gentili M, Loscalzo J, Marchese C, Napoli C, Paci P, Petti M, Quackenbush J, Tieri P, Viggiano D, Vilahur G, Glass K, Baumbach J. Molecular networks in Network Medicine: Development and applications. Wiley Interdisciplinary Reviews-Systems Biology and Medicine 2020; 12:e1489. [PMID: 32307915 DOI: 10.1002/wsbm.1489] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Received: 11/23/2019] [Revised: 02/29/2020] [Accepted: 03/20/2020] [Indexed: 12/14/2022]
Abstract
Network Medicine applies network science approaches to investigate disease pathogenesis. Many different analytical methods have been used to infer relevant molecular networks, including protein-protein interaction networks, correlation-based networks, gene regulatory networks, and Bayesian networks. Network Medicine applies these integrated approaches to Omics Big Data (including genetics, epigenetics, transcriptomics, metabolomics, and proteomics) using computational biology tools and, thereby, has the potential to provide improvements in the diagnosis, prognosis, and treatment of complex diseases. We discuss briefly the types of molecular data that are used in molecular network analyses, survey the analytical methods for inferring molecular networks, and review efforts to validate and visualize molecular networks. Successful applications of molecular network analysis have been reported in pulmonary arterial hypertension, coronary heart disease, diabetes mellitus, chronic lung diseases, and drug development. Important knowledge gaps in Network Medicine include incompleteness of the molecular interactome, challenges in identifying key genes within genetic association regions, and limited applications to human diseases. This article is categorized under: Models of Systems Properties and Processes > Mechanistic Models; Translational, Genomic, and Systems Medicine > Translational Medicine; Analytical and Computational Methods > Analytical Methods; Analytical and Computational Methods > Computational Methods.
Affiliation(s)
- Edwin K Silverman
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Harald H H W Schmidt
  - Department of Pharmacology and Personalized Medicine, School of Mental Health and Neuroscience, Faculty of Health, Medicine and Life Science, Maastricht University, Maastricht, The Netherlands
- Eleni Anastasiadou
  - Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
- Lucia Altucci
  - Department of Precision Medicine, University of Campania 'Luigi Vanvitelli', Naples, Italy
- Marco Angelini
  - Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
- Lina Badimon
  - Cardiovascular Program-ICCC, IR-Hospital de la Santa Creu i Sant Pau, CiberCV, IIB-Sant Pau, Autonomous University of Barcelona, Barcelona, Spain
- Jean-Luc Balligand
  - Pole of Pharmacology and Therapeutics (FATH), Institute for Clinical and Experimental Research (IREC), UCLouvain, Brussels, Belgium
- Giuditta Benincasa
  - Department of Advanced Clinical and Surgical Sciences, University of Campania "Luigi Vanvitelli", Naples, Italy
- Giovambattista Capasso
  - Department of Translational Medical Sciences, University of Campania "L. Vanvitelli", Naples, Italy
  - BIOGEM, Ariano Irpino, Italy
- Federica Conte
  - Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Rome, Italy
- Antonella Di Costanzo
  - Department of Precision Medicine, University of Campania 'Luigi Vanvitelli', Naples, Italy
- Lorenzo Farina
  - Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
- Giulia Fiscon
  - Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Rome, Italy
- Laurent Gatto
  - de Duve Institute, Brussels, Belgium
  - Institute for Experimental and Clinical Research (IREC), UCLouvain, Brussels, Belgium
- Michele Gentili
  - Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
- Joseph Loscalzo
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
  - Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Cinzia Marchese
  - Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
- Claudio Napoli
  - Department of Advanced Clinical and Surgical Sciences, University of Campania "Luigi Vanvitelli", Naples, Italy
- Paola Paci
  - Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
- Manuela Petti
  - Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
- John Quackenbush
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Paolo Tieri
  - CNR National Research Council of Italy, IAC Institute for Applied Computing, Rome, Italy
- Davide Viggiano
  - BIOGEM, Ariano Irpino, Italy
  - Department of Medicine and Health Sciences, University of Molise, Campobasso, Italy
- Gemma Vilahur
  - Cardiovascular Program-ICCC, IR-Hospital de la Santa Creu i Sant Pau, CiberCV, IIB-Sant Pau, Autonomous University of Barcelona, Barcelona, Spain
- Kimberly Glass
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Jan Baumbach
  - Department of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Maximus-von-Imhof-Forum 3, Freising, Germany
  - Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
48
Yu X, Lai S, Chen H, Chen M. Protein–protein interaction network with machine learning models and multiomics data reveal potential neurodegenerative disease-related proteins. Hum Mol Genet 2020; 29:1378-1387. [DOI: 10.1093/hmg/ddaa065] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/14/2019] [Revised: 12/22/2019] [Accepted: 04/01/2020] [Indexed: 12/18/2022]
Abstract
Research on protein–protein interactions in several model organisms has been accumulating since the development of high-throughput experimental technologies and computational methods. The protein–protein interaction network (PPIN) makes it possible to examine biological processes in a systematic manner and has already been used to predict potential disease-related proteins or drug targets. Based on the topological characteristics of the PPIN, we investigated the application of the random forest classification algorithm to predict proteins that may cause neurodegenerative disease, a set of pathological changes characterized by protein malfunction. By integrating multiomics data, we further showed the validity of our machine learning model and narrowed down the prediction results to several hub proteins that play essential roles in the PPIN. The novel insights into neurodegeneration pathogenesis brought by this computational study can indicate promising directions for future experimental research.
Affiliation(s)
- Xinjian Yu
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Siqi Lai
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Hongjun Chen
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Ming Chen
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
  - James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310058, China
49
Nwadiugwu MC. Gene-Based Clustering Algorithms: Comparison Between Denclue, Fuzzy-C, and BIRCH. Bioinform Biol Insights 2020; 14:1177932220909851. [PMID: 32284672 PMCID: PMC7133071 DOI: 10.1177/1177932220909851] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/29/2020] [Accepted: 02/02/2020] [Indexed: 11/17/2022]
Abstract
The current study compares 3 clustering algorithms that can be used in gene-based bioinformatics research to understand disease networks, protein-protein interaction networks, and gene expression data. Denclue, Fuzzy-C, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) were the 3 gene-based clustering algorithms selected. These algorithms were explored in relation to the subfield of bioinformatics that analyzes omics data, which include but are not limited to genomics, proteomics, metagenomics, transcriptomics, and metabolomics data. The objective was to compare the efficacy of the 3 algorithms and determine their strengths and drawbacks. The review showed that, unlike Denclue and Fuzzy-C, which are more efficient in handling noisy data, BIRCH can handle data sets with outliers and has better time complexity.
Affiliation(s)
- Martin C Nwadiugwu
  - Department of Biomedical Informatics, University of Nebraska Omaha, Omaha, NE, USA
|
50
|
Collins TK, Houghten S. A centrality based multi-objective approach to disease gene association. Biosystems 2020; 193-194:104133. [PMID: 32243908 DOI: 10.1016/j.biosystems.2020.104133] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/31/2019] [Revised: 02/27/2020] [Accepted: 03/23/2020] [Indexed: 01/11/2023]
Abstract
Disease Gene Association finds genes that are involved in the presentation of a given genetic disease. We present a hybrid approach which implements a multi-objective genetic algorithm, where input consists of centrality measures based on various relational biological evidence types merged into a complex network. Multiple objective settings and parameters are studied including the development of a new exchange methodology, safe dealer-based crossover. Successful results with respect to breast cancer and Parkinson's disease compared to previous techniques and popular known databases are shown. In addition, the newly developed methodology is also successfully applied to Alzheimer's disease, further demonstrating its flexibility. Across all three case studies the strongest results were produced by the shortest path-based measures stress and betweenness, either in a single objective parameter setting or when used in conjunction in a multi-objective environment. The new crossover technique achieved the best results when applied to Alzheimer's disease.
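The two shortest path-based measures named in this abstract, stress and betweenness, can be illustrated with a minimal sketch. It assumes an unweighted, undirected network stored as an adjacency dictionary and uses a Brandes-style breadth-first accumulation; the function name, the input format, and the toy node labels are hypothetical and are not taken from the cited paper, which combines such measures inside a multi-objective genetic algorithm that is not reproduced here.

```python
from collections import deque


def shortest_path_centralities(adj):
    """Betweenness and stress for an unweighted, undirected graph.

    adj maps each node to a list of its neighbours (hypothetical input format).
    """
    betweenness = {v: 0.0 for v in adj}
    stress = {v: 0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}    # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        order = []                     # nodes in non-decreasing distance from s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:      # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}  # summed fraction of shortest paths from s through v
        tau = {v: 0 for v in adj}      # shortest-path continuations from v, away from s
        for w in reversed(order):      # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
                tau[v] += 1 + tau[w]
            if w != s:
                betweenness[w] += delta[w]
                stress[w] += sigma[w] * tau[w]
    # Every unordered pair was counted once from each endpoint.
    return ({v: b / 2.0 for v, b in betweenness.items()},
            {v: c // 2 for v, c in stress.items()})


if __name__ == "__main__":
    # Toy network with made-up node labels, for illustration only.
    toy = {"g1": ["g2"], "g2": ["g1", "g3", "g4"], "g3": ["g2"], "g4": ["g2"]}
    bet, st = shortest_path_centralities(toy)
    for node in sorted(toy, key=lambda n: bet[n], reverse=True):
        print(node, bet[node], st[node])
```

Stress counts every shortest path that passes through a node, while betweenness weights each pair of endpoints by the fraction of its shortest paths that do so; the two therefore differ only when a pair has more than one shortest path.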
Affiliation(s)
- Tyler K Collins
  - Computer Science Department, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, Ontario L2S 3A1, Canada
- Sheridan Houghten
  - Computer Science Department, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, Ontario L2S 3A1, Canada
|