1
Pržulj N, Malod-Dognin N. Simplicity within biological complexity. Bioinformatics Advances 2025; 5:vbae164. [PMID: 39927291 PMCID: PMC11805345 DOI: 10.1093/bioadv/vbae164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 10/01/2024] [Accepted: 10/23/2024] [Indexed: 02/11/2025]
Abstract
Motivation: Heterogeneous, interconnected, systems-level, molecular (multi-omic) data have become increasingly available and key in precision medicine. We need to utilize them to better stratify patients into risk groups, discover new biomarkers and targets, repurpose known and discover new drugs to personalize medical treatment. Existing methodologies are limited and a paradigm shift is needed to achieve quantitative and qualitative breakthroughs. Results: In this perspective paper, we survey the literature and argue for the development of a comprehensive, general framework for embedding of multi-scale molecular network data that would enable their explainable exploitation in precision medicine in linear time. Network embedding methods (also called graph representation learning) map nodes to points in low-dimensional space, so that proximity in the learned space reflects the network's topology-function relationships. They have recently achieved unprecedented performance on hard problems of utilizing few omic data in various biomedical applications. However, research thus far has been limited to special variants of the problems and data, with the performance depending on the underlying topology-function network biology hypotheses, the biomedical applications, and evaluation metrics. The availability of multi-omic data, modern graph embedding paradigms and compute power call for a creation and training of efficient, explainable and controllable models, having no potentially dangerous, unexpected behaviour, that make a qualitative breakthrough. We propose to develop a general, comprehensive embedding framework for multi-omic network data, from models to efficient and scalable software implementation, and to apply it to biomedical informatics, focusing on precision medicine and personalized drug discovery. It will lead to a paradigm shift in the computational and biomedical understanding of data and diseases that will open up ways to solve some of the major bottlenecks in precision medicine and other domains.
Affiliation(s)
- Nataša Pržulj
- Computational Biology Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, 00000, United Arab Emirates
- Barcelona Supercomputing Center, Barcelona 08034, Spain
- Department of Computer Science, University College London, London WC1E 6BT, United Kingdom
- ICREA, Pg. Lluís Companys 23, Barcelona 08010, Spain
2
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. Bioinformatics Advances 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation: Not applicable.
Affiliation(s)
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Aydin Wells
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
- Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Deisy Morselli Gysi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
- Department of Physics, Northeastern University, Boston, MA 02115, United States
- Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Wisconsin Institute for Discovery, Madison, WI 53715, United States
- Anaïs Baudot
- Aix Marseille Université, INSERM, MMG, Marseille, France
- Serdar Bozdag
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Department of Mathematics, University of North Texas, Denton, TX 76203, United States
- Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Morgridge Institute for Research, Madison, WI 53715, United States
- Sara J C Gosline
- Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
- Pengfei Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Pietro H Guzzi
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
- Heng Huang
- Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
- Meng Jiang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Ziynet Nesibe Kesimoglu
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Mehmet Koyuturk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
- Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
- Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, England
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
- Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, United States
- Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
- Donna K Slonim
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Hanghang Tong
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Xinan Holly Yang
- Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
- Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
- Haiyuan Yu
- Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
3
Paul D, Saha S, Basu S, Chakraborti T. Computational analysis of pathogen-host interactome for fast and low-risk in-silico drug repurposing in emerging viral threats like Mpox. Sci Rep 2024; 14:18736. [PMID: 39134619 PMCID: PMC11319331 DOI: 10.1038/s41598-024-69617-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 08/07/2024] [Indexed: 08/15/2024]
Abstract
Monkeypox (Mpox), a zoonotic illness triggered by the monkeypox virus (MPXV), poses a significant threat since it may be transmitted and has no cure. This work introduces a computational method to predict Protein-Protein Interactions (PPIs) during MPXV infection. The objective is to discover prospective drug targets and repurpose current potential Food and Drug Administration (FDA) drugs for therapeutic purposes. In this work, ensemble features, comprising 2-5 node graphlet attributes and protein composition-based features are utilized for Deep Learning (DL) models to predict PPIs. The technique that is used here demonstrated an excellent prediction performance for PPI on both the Human Integrated Protein-Protein Interaction Reference (HIPPIE) and MPXV-Human PPI datasets. In addition, the human protein targets for MPXV have been identified accurately along with the detection of possible therapeutic targets. Furthermore, the validation process included conducting docking research studies on potential FDA drugs like Nicotinamide Adenine Dinucleotide and Hydrogen (NADH), Fostamatinib, Glutamic acid, Cannabidiol, Copper, and Zinc in DrugBank identified via research on drug repurposing and the Drug Consensus Score (DCS) for MPXV. This has been achieved by employing the primary crystal structures of MPXV, which are now accessible. The docking study is also supported by Molecular Dynamics (MD) simulation. The results of our study emphasize the effectiveness of using ensemble feature-based PPI prediction to understand the molecular processes involved in viral infection and to aid in the development of repurposed drugs for emerging infectious diseases such as, but not limited to, Mpox. The source code and link to data used in this work is available at: https://github.com/CMATERJU-BIOINFO/In-Silico-Drug-Repurposing-Methodology-To-Suggest-Therapies-For-Emerging-Threats-like-Mpox .
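The abstract above refers to 2-5 node graphlet attributes as input features. As a rough illustration of what such features look like, the sketch below computes only the four orbits of the 2- and 3-node graphlets for each node of a small undirected graph. The function name `graphlet_orbits_2_3` and the toy network are assumptions made for this example; they are not taken from the authors' repository, and the 4- and 5-node graphlets used in the paper are omitted.

```python
from itertools import combinations


def graphlet_orbits_2_3(adj):
    """Count 2- and 3-node graphlet orbits per node (hypothetical helper).

    adj maps each node to the set of its neighbours (undirected, no self-loops).
    Returns {node: [orbit0, orbit1, orbit2, orbit3]} where
      orbit0 = number of edges touching the node (its degree),
      orbit1 = induced 3-node paths with the node at an end,
      orbit2 = induced 3-node paths with the node in the middle,
      orbit3 = triangles containing the node.
    """
    result = {}
    for v, nbrs in adj.items():
        degree = len(nbrs)
        # Pairs of neighbours that are themselves connected close a triangle.
        triangles = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
        # Remaining neighbour pairs form an open path centred on v.
        path_middle = degree * (degree - 1) // 2 - triangles
        # Walks v-u-w with w != v, minus those closed into a triangle
        # (each triangle is reached twice, once via each neighbour).
        path_end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * triangles
        result[v] = [degree, path_end, path_middle, triangles]
    return result


# Toy network (illustrative only): a triangle a-b-c with a pendant node d on c.
toy_network = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

if __name__ == "__main__":
    for node, orbits in sorted(graphlet_orbits_2_3(toy_network).items()):
        print(node, orbits)
```

Vectors of this kind can be concatenated with sequence-derived features before being passed to a classifier; how the paper combines them is not specified in the abstract.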
Affiliation(s)
- Debarati Paul
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Embedded Devices & Intelligent Systems, TCS Research & Innovation, Kolkata, India
- Sovan Saha
- Computer Science and Engineering (Artificial Intelligence and Machine Learning), Techno Main Salt Lake, Kolkata, India
- Subhadip Basu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Tapabrata Chakraborti
- Health Sciences Programme, The Alan Turing Institute, London, UK
- Department of Medical Physics and Biomedical Engineering and UCL Cancer Institute, University College London, London, UK
4
Yazdani K, Mousapour R, Hayes WB. New GO-based measures in multiple network alignment. Bioinformatics 2024; 40:btae476. [PMID: 39082966 PMCID: PMC11310457 DOI: 10.1093/bioinformatics/btae476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 06/11/2024] [Accepted: 07/30/2024] [Indexed: 08/10/2024]
Abstract
MOTIVATION: Protein-protein interaction (PPI) networks provide valuable insights into the function of biological systems. Aligning multiple PPI networks may expose relationships beyond those observable by pairwise comparisons. However, assessing the biological quality of multiple network alignments is a challenging problem. RESULTS: We propose two new measures to evaluate the quality of multiple network alignments using functional information from Gene Ontology (GO) terms. When aligning multiple real PPI networks across species, we observe that both measures are highly correlated with objective quality indicators, such as common orthologs. Additionally, our measures strongly correlate with an alignment's ability to predict novel GO annotations, which is a unique advantage over existing GO-based measures. AVAILABILITY AND IMPLEMENTATION: The scripts and the links to the raw and alignment data can be accessed at https://github.com/kimiayazdani/GO_Measures.git.
Affiliation(s)
- Kimia Yazdani
- Department of Computer Science, University of California, Irvine, CA 92697-3435, United States
- Reza Mousapour
- Department of Computer Engineering, Sharif University of Technology, Tehran 1458889694, Iran
- Wayne B Hayes
- Department of Computer Science, University of California, Irvine, CA 92697-3435, United States
5
Yavuz BR, Arici MK, Demirel HC, Tsai CJ, Jang H, Nussinov R, Tuncbag N. Neurodevelopmental disorders and cancer networks share pathways, but differ in mechanisms, signaling strength, and outcome. NPJ Genom Med 2023; 8:37. [PMID: 37925498 PMCID: PMC10625621 DOI: 10.1038/s41525-023-00377-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 04/10/2023] [Accepted: 10/02/2023] [Indexed: 11/06/2023]
Abstract
Epidemiological studies suggest that individuals with neurodevelopmental disorders (NDDs) are more prone to develop certain types of cancer. Notably, however, the case statistics can be impacted by late discovery of cancer in individuals afflicted with NDDs, such as intellectual disorders, autism, and schizophrenia, which may bias the numbers. As to NDD-associated mutations, in most cases, they are germline while cancer mutations are sporadic, emerging during life. However, somatic mosaicism can spur NDDs, and cancer-related mutations can be germline. NDDs and cancer share proteins, pathways, and mutations. Here we ask (i) exactly which features they share, and (ii) how, despite their commonalities, they differ in clinical outcomes. To tackle these questions, we employed a statistical framework followed by network analysis. Our thorough exploration of the mutations, reconstructed disease-specific networks, pathways, and transcriptome levels and profiles of autism spectrum disorder (ASD) and cancers, point to signaling strength as the key factor: strong signaling promotes cell proliferation in cancer, and weaker (moderate) signaling impacts differentiation in ASD. Thus, we suggest that signaling strength, not activating mutations, can decide clinical outcome.
Affiliation(s)
- Bengi Ruken Yavuz
- Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD, 21702, USA
- M Kaan Arici
- Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Turkey
- Habibe Cansu Demirel
- Graduate School of Sciences and Engineering, Koc University, Istanbul, 34450, Turkey
- Chung-Jung Tsai
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD, 21702, USA
- Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD, 21702, USA
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD, 21702, USA
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, 69978, Israel
- Nurcan Tuncbag
- Chemical and Biological Engineering, College of Engineering, Koc University, Istanbul, Turkey
- School of Medicine, Koc University, Istanbul, 34450, Turkey
- Koc University Research Center for Translational Medicine (KUTTAM), Istanbul, Turkey
6
Gureghian V, Herbst H, Kozar I, Mihajlovic K, Malod-Dognin N, Ceddia G, Angeli C, Margue C, Randic T, Philippidou D, Nomigni MT, Hemedan A, Tranchevent LC, Longworth J, Bauer M, Badkas A, Gaigneaux A, Muller A, Ostaszewski M, Tolle F, Pržulj N, Kreis S. A multi-omics integrative approach unravels novel genes and pathways associated with senescence escape after targeted therapy in NRAS mutant melanoma. Cancer Gene Ther 2023; 30:1330-1345. [PMID: 37420093 PMCID: PMC10581906 DOI: 10.1038/s41417-023-00640-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/24/2023] [Revised: 05/19/2023] [Accepted: 06/21/2023] [Indexed: 07/09/2023]
Abstract
Therapy Induced Senescence (TIS) leads to sustained growth arrest of cancer cells. The associated cytostasis has been shown to be reversible and cells escaping senescence further enhance the aggressiveness of cancers. Chemicals specifically targeting senescent cells, so-called senolytics, constitute a promising avenue for improved cancer treatment in combination with targeted therapies. Understanding how cancer cells evade senescence is needed to optimise the clinical benefits of this therapeutic approach. Here we characterised the response of three different NRAS mutant melanoma cell lines to a combination of CDK4/6 and MEK inhibitors over 33 days. Transcriptomic data show that all cell lines trigger a senescence programme coupled with strong induction of interferons. Kinome profiling revealed the activation of Receptor Tyrosine Kinases (RTKs) and enriched downstream signaling of neurotrophin, ErbB and insulin pathways. Characterisation of the miRNA interactome associates miR-211-5p with resistant phenotypes. Finally, iCell-based integration of bulk and single-cell RNA-seq data identifies biological processes perturbed during senescence and predicts 90 new genes involved in its escape. Overall, our data associate insulin signaling with persistence of a senescent phenotype and suggest a new role for interferon gamma in senescence escape through the induction of EMT and the activation of ERK5 signaling.
Affiliation(s)
- Vincent Gureghian
- Department of Life Sciences and Medicine, University of Luxembourg, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Hailee Herbst
- Department of Life Sciences and Medicine, University of Luxembourg, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Ines Kozar
- Laboratoire National de Santé, Dudelange, Luxembourg
- Gaia Ceddia
- Barcelona Supercomputing Center, 08034, Barcelona, Spain
- Cristian Angeli
- Department of Life Sciences and Medicine, University of Luxembourg, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Christiane Margue
- Department of Life Sciences and Medicine, University of Luxembourg, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Tijana Randic
- Department of Life Sciences and Medicine, University of Luxembourg, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Demetra Philippidou
- Department of Life Sciences and Medicine, University of Luxembourg, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Milène Tetsi Nomigni
- Department of Life Sciences and Medicine, University of Luxembourg, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Ahmed Hemedan
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Leon-Charles Tranchevent
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Joseph Longworth
- Experimental and Molecular Immunology, Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette, Luxembourg
- Mark Bauer
- Department of Life Sciences and Medicine, University of Luxembourg, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Apurva Badkas
- Department of Life Sciences and Medicine, University of Luxembourg, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Anthoula Gaigneaux
- Department of Life Sciences and Medicine, University of Luxembourg, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Arnaud Muller
- LuxGen, TMOH and Bioinformatics platform, Data Integration and Analysis unit, Luxembourg Institute of Health, Esch-sur-Alzette, Luxembourg
- Marek Ostaszewski
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Fabrice Tolle
- Department of Life Sciences and Medicine, University of Luxembourg, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
- Nataša Pržulj
- Barcelona Supercomputing Center, 08034, Barcelona, Spain
- Department of Computer Science, University College London, London, WC1E 6BT, UK
- ICREA, Pg. Lluís Companys 23, 08010, Barcelona, Spain
- Stephanie Kreis
- Department of Life Sciences and Medicine, University of Luxembourg, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg
7
Singh V, Singh V. Characterizing the circadian connectome of Ocimum tenuiflorum using an integrated network theoretic framework. Sci Rep 2023; 13:13108. [PMID: 37567911 PMCID: PMC10421869 DOI: 10.1038/s41598-023-40212-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Accepted: 08/07/2023] [Indexed: 08/13/2023]
Abstract
Across the three domains of life, circadian clock is known to regulate vital physiological processes, like, growth, development, defence etc. by anticipating environmental cues. In this work, we report an integrated network theoretic methodology comprising of random walk with restart and graphlet degree vectors to characterize genome wide core circadian clock and clock associated raw candidate proteins in a plant for which protein interaction information is available. As a case study, we have implemented this framework in Ocimum tenuiflorum (Tulsi); one of the most valuable medicinal plants that has been utilized since ancient times in the management of a large number of diseases. For that, 24 core clock (CC) proteins were mined in 56 template plant genomes to build their hidden Markov models (HMMs). These HMMs were then used to identify 24 core clock proteins in O. tenuiflorum. The local topology of the interologous Tulsi protein interaction network was explored to predict the CC associated raw candidate proteins. Statistical and biological significance of the raw candidates was determined using permutation and enrichment tests. A total of 66 putative CC associated proteins were identified and their functional annotation was performed.
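The abstract above names random walk with restart (RWR) as one of the two ingredients of its methodology. A minimal sketch of the standard RWR iteration, p ← (1 − r)·W·p + r·p₀ with W the degree-normalised adjacency matrix, is given below for orientation. The function name, the restart probability of 0.3, and the toy path graph are assumptions chosen for this example and are not taken from the paper.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    adj     : dict mapping each node to the set of its neighbours (undirected).
    seeds   : set of seed nodes (e.g. known core proteins); restart mass is
              spread uniformly over them.
    restart : probability of jumping back to a seed at every step (assumed value).
    """
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart component.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk component: each node passes its mass equally to its neighbours.
        for u in nodes:
            nbrs = adj[u]
            if not nbrs:
                # Isolated node: return its mass to the seeds so nothing is lost.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * p[u] * p0[n]
                continue
            share = (1.0 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph a-b-c-d with a single seed (illustrative only).
toy_path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

if __name__ == "__main__":
    scores = random_walk_with_restart(toy_path, seeds={"a"})
    for node in sorted(scores, key=scores.get, reverse=True):
        print(node, round(scores[node], 4))
```

Nodes are then ranked by score, so that proteins close to the seeds in the network receive higher values than distant ones.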
Affiliation(s)
- Vikram Singh
- Centre for Computational Biology and Bioinformatics, Central University of Himachal Pradesh, Dharamshala, Himachal Pradesh, 176206, India
- Vikram Singh
- Centre for Computational Biology and Bioinformatics, Central University of Himachal Pradesh, Dharamshala, Himachal Pradesh, 176206, India
8
Ding K, Wang S, Luo Y. Supervised biological network alignment with graph neural networks. Bioinformatics 2023; 39:i465-i474. [PMID: 37387160 PMCID: PMC10311300 DOI: 10.1093/bioinformatics/btad241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
MOTIVATION: Despite the advances in sequencing technology, massive proteins with known sequences remain functionally unannotated. Biological network alignment (NA), which aims to find the node correspondence between species' protein-protein interaction (PPI) networks, has been a popular strategy to uncover missing annotations by transferring functional knowledge across species. Traditional NA methods assumed that topologically similar proteins in PPIs are functionally similar. However, it was recently reported that functionally unrelated proteins can be as topologically similar as functionally related pairs, and a new data-driven or supervised NA paradigm has been proposed, which uses protein function data to discern which topological features correspond to functional relatedness. RESULTS: Here, we propose GraNA, a deep learning framework for the supervised NA paradigm for the pairwise NA problem. Employing graph neural networks, GraNA utilizes within-network interactions and across-network anchor links for learning protein representations and predicting functional correspondence between across-species proteins. A major strength of GraNA is its flexibility to integrate multi-faceted non-functional relationship data, such as sequence similarity and ortholog relationships, as anchor links to guide the mapping of functionally related proteins across species. Evaluating GraNA on a benchmark dataset composed of several NA tasks between different pairs of species, we observed that GraNA accurately predicted the functional relatedness of proteins and robustly transferred functional annotations across species, outperforming a number of existing NA methods. When applied to a case study on a humanized yeast network, GraNA also successfully discovered functionally replaceable human-yeast protein pairs that were documented in previous studies. AVAILABILITY AND IMPLEMENTATION: The code of GraNA is available at https://github.com/luo-group/GraNA.
Affiliation(s)
- Kerr Ding
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
- Sheng Wang
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195, United States
- Yunan Luo
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
9
Bouritsas G, Frasca F, Zafeiriou S, Bronstein MM. Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:657-668. [PMID: 35201983 DOI: 10.1109/tpami.2022.3154319] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 06/14/2023]
Abstract
While Graph Neural Networks (GNNs) have achieved remarkable results in a variety of applications, recent studies exposed important shortcomings in their ability to capture the structure of the underlying graph. It has been shown that the expressive power of standard GNNs is bounded by the Weisfeiler-Leman (WL) graph isomorphism test, from which they inherit proven limitations such as the inability to detect and count graph substructures. On the other hand, there is significant empirical evidence, e.g. in network science and bioinformatics, that substructures are often intimately related to downstream tasks. To this end, we propose "Graph Substructure Networks" (GSN), a topologically-aware message passing scheme based on substructure encoding. We theoretically analyse the expressive power of our architecture, showing that it is strictly more expressive than the WL test, and provide sufficient conditions for universality. Importantly, we do not attempt to adhere to the WL hierarchy; this allows us to retain multiple attractive properties of standard GNNs such as locality and linear network complexity, while being able to disambiguate even hard instances of graph isomorphism. We perform an extensive experimental evaluation on graph classification and regression tasks and obtain state-of-the-art results in diverse real-world settings including molecular graphs and social networks.
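The limitation described above can be seen on a small, well-known example: a 6-cycle and two disjoint triangles are both 2-regular, so WL colour refinement with uniform initial colours cannot tell them apart, whereas per-node triangle counts separate them immediately. The sketch below illustrates this with plain colour refinement. It is not the GSN architecture itself, which injects substructure counts into message passing; the function names and example graphs are assumptions made for this illustration.

```python
from collections import Counter


def triangle_counts(adj):
    """Number of triangles each node takes part in (nodes must be orderable)."""
    return {
        v: sum(1 for u in adj[v] for w in adj[v] if u < w and w in adj[u])
        for v in adj
    }


def wl_histogram(adj, initial_colors, rounds=3):
    """Run WL colour refinement and return the multiset of final colours.

    Each round replaces a node's colour by the pair
    (own colour, sorted colours of its neighbours).
    """
    colors = dict(initial_colors)
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())


# Two 2-regular graphs on six nodes (illustrative only).
hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}

if __name__ == "__main__":
    uniform = {v: 0 for v in range(6)}
    same = wl_histogram(hexagon, uniform) == wl_histogram(two_triangles, uniform)
    print("indistinguishable with uniform colours:", same)
    differ = wl_histogram(hexagon, triangle_counts(hexagon)) != wl_histogram(
        two_triangles, triangle_counts(two_triangles)
    )
    print("distinguishable with triangle counts:", differ)
```

Using substructure counts as extra node information keeps the computation local to each node's neighbourhood, which is consistent with the locality property the abstract emphasises.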
10
Girisha MN, Badiger VP, Pattar S. A comprehensive review of global alignment of multiple biological networks: background, applications and open issues. Network Modeling Analysis in Health Informatics and Bioinformatics 2022; 11:9. [DOI: 10.1007/s13721-022-00353-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2021] [Revised: 12/16/2021] [Accepted: 12/16/2021] [Indexed: 01/03/2025]
11
Corominas GR, Blesa MJ, Blum C. AntNetAlign: Ant colony optimization for Network Alignment. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
|
12
|
Jia M, Van Alboom M, Goubert L, Bracke P, Gabrys B, Musial K. Encoding edge type information in graphlets. PLoS One 2022; 17:e0273609. [PMID: 36026434 PMCID: PMC9416998 DOI: 10.1371/journal.pone.0273609] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/02/2022] [Accepted: 08/11/2022] [Indexed: 11/18/2022]
Abstract
Graph embedding approaches have been attracting increasing attention in recent years mainly due to their universal applicability. They convert network data into a vector space in which the graph structural information and properties are maximally preserved. Most existing approaches, however, ignore the rich information about interactions between nodes, i.e., edge attribute or edge type. Moreover, the learned embeddings suffer from a lack of explainability, and cannot be used to study the effects of typed structures in edge-attributed networks. In this paper, we introduce a framework to embed edge type information in graphlets and generate a Typed-Edge Graphlets Degree Vector (TyE-GDV). Additionally, we extend two combinatorial approaches, i.e., the colored graphlets and heterogeneous graphlets approaches, to edge-attributed networks. Through applying the proposed method to a case study of chronic pain patients, we find not only that the network structure of a patient could indicate his/her perceived pain grade, but also that certain social ties, such as those with friends, colleagues, and healthcare professionals, are more crucial in understanding the impact of chronic pain. Further, we demonstrate that in a node classification task, the edge-type encoded graphlets approaches outperform the traditional graphlet degree vector approach by a significant margin, and that TyE-GDV achieves performance competitive with the combinatorial approaches while being far more efficient in space requirements.
Affiliation(s)
- Mingshan Jia: Complex Adaptive Systems Lab, School of Computer Science, University of Technology Sydney, Sydney, NSW, Australia
- Piet Bracke: Health Psychology Lab, Ghent University, Ghent, Belgium
- Bogdan Gabrys: Complex Adaptive Systems Lab, School of Computer Science, University of Technology Sydney, Sydney, NSW, Australia
- Katarzyna Musial: Complex Adaptive Systems Lab, School of Computer Science, University of Technology Sydney, Sydney, NSW, Australia
|
13
|
Mao R, O’Leary J, Mesbah A, Mittal J. A Deep Learning Framework Discovers Compositional Order and Self-Assembly Pathways in Binary Colloidal Mixtures. JACS AU 2022; 2:1818-1828. [PMID: 36032540 PMCID: PMC9400045 DOI: 10.1021/jacsau.2c00111] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 05/20/2023]
Abstract
Binary colloidal superlattices (BSLs) have demonstrated enormous potential for the design of advanced multifunctional materials that can be synthesized via colloidal self-assembly. However, mechanistic understanding of the three-dimensional self-assembly of BSLs is largely limited due to a lack of tractable strategies for characterizing the many two-component structures that can appear during the self-assembly process. To address this gap, we present a framework for colloidal crystal structure characterization that uses branched graphlet decomposition with deep learning to systematically and quantitatively describe the self-assembly of BSLs at the single-particle level. Branched graphlet decomposition is used to evaluate local structure via high-dimensional neighborhood graphs that quantify both structural order (e.g., body-centered-cubic vs face-centered-cubic) and compositional order (e.g., substitutional defects) of each individual particle. Deep autoencoders are then used to efficiently translate these neighborhood graphs into low-dimensional manifolds from which relationships among neighborhood graphs can be more easily inferred. We demonstrate the framework on in silico systems of DNA-functionalized particles, in which two well-recognized design parameters, particle size ratio and interparticle potential well depth, can be adjusted independently. The framework reveals that binary colloidal mixtures with small interparticle size disparities (i.e., A- and B-type particle radius ratios of r_A/r_B = 0.8 to r_A/r_B = 0.95) can promote the self-assembly of defect-free BSLs much more effectively than systems of identically sized particles, as nearly defect-free BCC-CsCl, FCC-CuAu, and IrV crystals are observed in the former case. The framework additionally reveals that size-disparate colloidal mixtures can undergo nonclassical nucleation pathways where BSLs evolve from dense amorphous precursors, instead of directly nucleating from dilute solution. These findings illustrate that the presented characterization framework can assist in enhancing mechanistic understanding of the self-assembly of binary colloidal mixtures, which in turn can pave the way for engineering the growth of defect-free BSLs.
Affiliation(s)
- Runfang Mao: Department of Chemical and Biomolecular Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Jared O’Leary: Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California 94720, United States
- Ali Mesbah: Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California 94720, United States
- Jeetain Mittal: Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States
|
14
|
Li W, Zhang H, Li M, Han M, Yin Y. MGEGFP: a multi-view graph embedding method for gene function prediction based on adaptive estimation with GCN. Brief Bioinform 2022; 23:6659744. [PMID: 35947989 DOI: 10.1093/bib/bbac333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 07/02/2022] [Accepted: 07/21/2022] [Indexed: 11/14/2022]
Abstract
In recent years, a number of computational approaches have been proposed to effectively integrate multiple heterogeneous biological networks, and have shown impressive performance for inferring gene function. However, the previous methods do not fully represent the critical neighborhood relationship between genes during the feature learning process. Furthermore, it is difficult to accurately estimate the contributions of different views for multi-view integration. In this paper, we propose MGEGFP, a multi-view graph embedding method based on adaptive estimation with Graph Convolutional Network (GCN), to learn high-quality gene representations among multiple interaction networks for function prediction. First, we design a dual-channel GCN encoder to disentangle the view-specific information and the consensus pattern across diverse networks. With the aid of the disentangled representations, we develop a multi-gate module to adaptively estimate the contributions of different views during each reconstruction process and make full use of the multiplexity advantages, where a diversity preservation constraint is designed to prevent the over-fitting problem. To validate the effectiveness of our model, we conduct experiments on networks from the STRING database for both yeast and human datasets, and compare the performance with seven state-of-the-art methods on five evaluation metrics. Moreover, the ablation study demonstrates the important contribution of the designed dual-channel encoder, multi-gate module and the diversity preservation constraint in MGEGFP. The experimental results confirm the superiority of our proposed method and suggest that MGEGFP can be a useful tool for gene function prediction.
Affiliation(s)
- Wei Li: College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
- Han Zhang: College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
- Minghe Li: College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
- Mingjing Han: College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
- Yanbin Yin: Department of Food Science and Technology, University of Nebraska - Lincoln, 1400 R Street, 68588, Nebraska, USA
|
15
|
Pušnik Ž, Mraz M, Zimic N, Moškon M. Review and assessment of Boolean approaches for inference of gene regulatory networks. Heliyon 2022; 8:e10222. [PMID: 36033302 PMCID: PMC9403406 DOI: 10.1016/j.heliyon.2022.e10222] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/09/2022] [Revised: 04/22/2022] [Accepted: 08/03/2022] [Indexed: 10/25/2022]
Abstract
Boolean descriptions of gene regulatory networks can provide an insight into interactions between genes. Boolean networks hold predictive power, are easy to understand, and can be used to simulate the observed networks in different scenarios. We review fundamental and state-of-the-art methods for inference of Boolean networks. We introduce a methodology for a straightforward evaluation of Boolean inference approaches based on the generation of evaluation datasets, application of selected inference methods, and evaluation of performance measures to guide the selection of the best method for a given inference problem. We demonstrate this procedure on the inference methods REVEAL (REVerse Engineering ALgorithm), Best-Fit Extension, MIBNI (Mutual Information-based Boolean Network Inference), GABNI (Genetic Algorithm-based Boolean Network Inference) and ATEN (AND/OR Tree ENsemble algorithm), which infer Boolean descriptions of gene regulatory networks from discretised time series data. Boolean inference approaches tend to perform better in terms of dynamic accuracy, and slightly worse in terms of structural correctness. We believe that the proposed methodology and provided guidelines will help researchers to develop Boolean inference approaches with a good predictive capability while maintaining structural correctness and biological relevance.
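To make the distinction between dynamic accuracy and structural correctness concrete, here is a small sketch of a synchronous Boolean network simulator with one simple way to score dynamics. The three-gene rules, the function names, and the scoring formula (fraction of matching gene states over a trajectory) are illustrative assumptions, not the definitions used in the review.

```python
def simulate(rules, state, steps):
    """Synchronously update all genes `steps` times; return the list of states."""
    trajectory = [dict(state)]
    for _ in range(steps):
        # Every rule reads the previous state, so the update is synchronous.
        state = {gene: rule(state) for gene, rule in rules.items()}
        trajectory.append(state)
    return trajectory


def dynamic_accuracy(observed, predicted):
    """Fraction of (time point, gene) entries on which two trajectories agree."""
    total = matches = 0
    for obs, pred in zip(observed, predicted):
        for gene in obs:
            total += 1
            matches += obs[gene] == pred[gene]
    return matches / total


# Hypothetical "true" network and an inferred one that drops one regulator of B.
true_rules = {
    "A": lambda s: s["C"],
    "B": lambda s: s["A"] and not s["C"],
    "C": lambda s: not s["B"],
}
inferred_rules = {
    "A": lambda s: s["C"],
    "B": lambda s: s["A"],  # structurally wrong: the input from C is missing
    "C": lambda s: not s["B"],
}

start = {"A": True, "B": False, "C": False}
observed = simulate(true_rules, start, 4)
predicted = simulate(inferred_rules, start, 4)
score = dynamic_accuracy(observed, predicted)
```

From this start state the two rule sets produce the same trajectory, so the score is 1.0 even though the inferred wiring of B is wrong; other start states can expose the difference.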
Affiliation(s)
- Žiga Pušnik: University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
- Miha Mraz: University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
- Nikolaj Zimic: University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
- Miha Moškon: University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
|
16
|
Traquete F, Luz J, Cordeiro C, Sousa Silva M, Ferreira AEN. Graph Properties of Mass-Difference Networks for Profiling and Discrimination in Untargeted Metabolomics. Front Mol Biosci 2022; 9:917911. [PMID: 35936789 PMCID: PMC9353772 DOI: 10.3389/fmolb.2022.917911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2022] [Accepted: 06/03/2022] [Indexed: 11/16/2022]
Abstract
Untargeted metabolomics seeks to identify and quantify most metabolites in a biological system. In general, metabolomics results are represented by numerical matrices containing data that represent the intensities of the detected variables. These matrices are subsequently analyzed by methods that seek to extract significant biological information from the data. In mass spectrometry-based metabolomics, if mass is detected with sufficient accuracy, below 1 ppm, it is possible to derive mass-difference networks, which have spectral features as nodes and chemical changes as edges. These networks have previously been used as means to assist formula annotation and to rank the importance of chemical transformations. In this work, we propose a novel role for such networks in untargeted metabolomics data analysis: we demonstrate that their properties as graphs can also be used as signatures for metabolic profiling and class discrimination. For several benchmark examples, we computed six graph properties and we found that the degree profile was consistently the property that allowed for the best performance of several clustering and classification methods, reaching levels that are competitive with the performance using intensity data matrices and traditional pretreatment procedures. Furthermore, we propose two new metrics for the ranking of chemical transformations derived from network properties, which can be applied to sample comparison or clustering. These metrics illustrate how the graph properties of mass-difference networks can highlight the aspects of the information contained in data that are complementary to the information extracted from intensity-based data analysis.
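The construction described above can be sketched as follows: nodes are detected masses, and an edge joins two masses whose difference matches a known chemical transformation within a ppm tolerance. The transformation list, the example masses, the choice to scale the tolerance by the larger mass, and the function names are assumptions made for illustration, not the authors' pipeline.

```python
from itertools import combinations

# Monoisotopic mass differences (Da) for a few common transformations.
TRANSFORMATIONS = {"CH2": 14.01565, "H2": 2.01565, "O": 15.99491}


def mass_difference_network(masses, transformations, tol_ppm=1.0):
    """Return edges (i, j, name) where |m_j - m_i| matches a transformation."""
    edges = []
    for i, j in combinations(range(len(masses)), 2):
        low, high = sorted((masses[i], masses[j]))
        diff = high - low
        tol = high * tol_ppm * 1e-6  # assumption: tolerance relative to the larger mass
        for name, delta in transformations.items():
            if abs(diff - delta) <= tol:
                edges.append((i, j, name))
    return edges


def degree_profile(n_nodes, edges):
    """Degree of each node: the graph property the abstract found most useful."""
    degree = [0] * n_nodes
    for i, j, _ in edges:
        degree[i] += 1
        degree[j] += 1
    return degree


# Hypothetical neutral masses: a base compound, +CH2, +O, and an unrelated feature.
masses = [180.06339, 194.07904, 196.05830, 300.00000]
edges = mass_difference_network(masses, TRANSFORMATIONS)
degrees = degree_profile(len(masses), edges)
```

A per-sample vector such as `degrees` (or its histogram) could then be fed to a clustering or classification method in place of the intensity matrix.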
|
17
|
Wang S, Atkinson GRS, Hayes WB. SANA: cross-species prediction of Gene Ontology GO annotations via topological network alignment. NPJ Syst Biol Appl 2022; 8:25. [PMID: 35859153 PMCID: PMC9300714 DOI: 10.1038/s41540-022-00232-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/13/2019] [Accepted: 05/20/2022] [Indexed: 12/31/2022]
Abstract
Topological network alignment aims to align two networks node-wise in order to maximize the observed common connection (edge) topology between them. The topological alignment of two protein-protein interaction (PPI) networks should thus expose protein pairs with similar interaction partners allowing, for example, the prediction of common Gene Ontology (GO) terms. Unfortunately, no network alignment algorithm based on topology alone has been able to achieve this aim, though those that include sequence similarity have seen some success. We argue that this failure of topology alone is due to the sparsity and incompleteness of the PPI network data of almost all species, which provides the network topology with a small signal-to-noise ratio that is effectively swamped when sequence information is added to the mix. Here we show that the weak signal can be detected using multiple stochastic samples of "good" topological network alignments, which allows us to observe regions of the two networks that are robustly aligned across multiple samples. The resulting network alignment frequency (NAF) strongly correlates with GO-based Resnik semantic similarity and enables the first successful cross-species predictions of GO terms based on topology-only network alignments. Our best predictions have an AUPR of about 0.4, which is competitive with state-of-the-art algorithms, even when there is no observable sequence similarity and no known homology relationship. While our results provide only a "proof of concept" on existing network data, we hypothesize that predicting GO terms from topology-only network alignments will become increasingly practical as the volume and quality of PPI network data increase.
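The idea of pooling many stochastic alignments can be sketched in a few lines. The abstract does not give the exact formula, so this assumes that the network alignment frequency (NAF) of a node pair is the fraction of sampled alignments in which that pair is aligned; the sample alignments and the 0.75 cutoff are made up for illustration.

```python
from collections import Counter


def network_alignment_frequency(alignments):
    """Map each aligned pair (u, v) to the fraction of samples that contain it."""
    counts = Counter()
    for alignment in alignments:
        # Each alignment is a dict from nodes of network 1 to nodes of network 2.
        counts.update(alignment.items())
    n_samples = len(alignments)
    return {pair: count / n_samples for pair, count in counts.items()}


# Four hypothetical alignments, e.g. from independent runs of a stochastic aligner.
samples = [
    {"a1": "b1", "a2": "b2", "a3": "b3"},
    {"a1": "b1", "a2": "b3", "a3": "b2"},
    {"a1": "b1", "a2": "b2", "a3": "b4"},
    {"a1": "b1", "a2": "b2", "a3": "b3"},
]

naf = network_alignment_frequency(samples)
# Keep only pairs that are aligned robustly across samples.
robust = sorted(pair for pair, freq in naf.items() if freq >= 0.75)
```

Pairs with high frequency are the candidates for transferring annotations between species; a pair that appears in only one sample contributes little.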
Affiliation(s)
- Siyue Wang: Department of Computer Science, University of California, Irvine, CA, 92697-3435, USA
- Giles R S Atkinson: Department of Computer Science, University of California, Irvine, CA, 92697-3435, USA
- Wayne B Hayes: Department of Computer Science, University of California, Irvine, CA, 92697-3435, USA
|
18
|
Introduction to the Class of Prefractal Graphs. MATHEMATICS 2022. [DOI: 10.3390/math10142500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Fractals are already firmly rooted in modern science. Research continues on the fractal properties of objects in physics, chemistry, biology and many other scientific fields. Fractal graphs as a discrete representation are used to model and describe the structure of various objects and processes, both natural and artificial. The paper gives an introduction to prefractal graphs. The main definitions and notation are introduced: the concept of a seed, the operations of processing a seed, and the procedure for generating a prefractal graph. Canonical (typical) and non-canonical (special) types of prefractal graphs are considered separately. Important characteristics are proposed and described, namely the preservation of adjacency of edges for different ranks in the trajectory. The definition of subgraph-seeds of different ranks is given separately. Rules for weighting a prefractal graph by natural numbers and intervals are proposed. Separately, the definition of a fractal graph as infinite is given, and the differences between the concepts of fractal and prefractal graphs are described. At the end of the work, the authors' previously published works are listed, indicating the main groundwork already laid, along with a list of directions for new research. This work is the beginning of a cycle of works on the study of the properties and characteristics of fractal and prefractal graphs.
|
19
|
Li Q, Milenkovic T. Supervised Prediction of Aging-Related Genes From a Context-Specific Protein Interaction Subnetwork. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2484-2498. [PMID: 33929964 DOI: 10.1109/tcbb.2021.3076961] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Human aging is linked to many prevalent diseases. The aging process is highly influenced by genetic factors. Hence, it is important to identify human aging-related genes. We focus on supervised prediction of such genes. Gene expression-based methods for this purpose study genes in isolation from each other. While protein-protein interaction (PPI) network-based methods for this purpose account for interactions between genes' protein products, current PPI network data are context-unspecific, spanning different biological conditions. Instead, here, we focus on an aging-specific subnetwork of the entire PPI network, obtained by integrating aging-specific gene expression data and PPI network data. The potential of such data integration has been recognized but mostly in the context of cancer. So, we are the first to propose a supervised learning framework for predicting aging-related genes from an aging-specific PPI subnetwork. In a systematic and comprehensive evaluation, we find that in many of the evaluation tests: (i) using an aging-specific subnetwork indeed yields more accurate aging-related gene predictions than using the entire network, and (ii) predictive methods from our framework that have not previously been used for supervised prediction of aging-related genes outperform existing prominent methods for the same purpose. These results justify the need for our framework.
|
20
|
Wang S, Chen X, Frederisy BJ, Mbakogu BA, Kanne AD, Khosravi P, Hayes WB. On the current failure-but bright future-of topology-driven biological network alignment. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022; 131:1-44. [PMID: 35871888 DOI: 10.1016/bs.apcsb.2022.05.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/26/2023]
Abstract
Since the function of a protein is defined by its interaction partners, and since we expect similar interaction patterns across species, the alignment of protein-protein interaction (PPI) networks between species, based on network topology alone, should uncover functionally related proteins across species. Surprisingly, despite the publication of more than fifty algorithms aimed at performing PPI network alignment, few have demonstrated a statistically significant link between network topology and functional similarity, and none have demonstrated that orthologs can be recovered using network topology alone. We find that the major contributing factors to this surprising failure are: (i) edge densities in most currently available experimental PPI networks are demonstrably too low to expect topological network alignment to succeed; (ii) in the few cases where the edge densities are high enough, some measures of topological similarity easily uncover functionally similar proteins while others do not; and (iii) most network alignment algorithms to date perform poorly at optimizing even their own topological objective functions, hampering their ability to use topology effectively. We demonstrate that SANA, the Simulated Annealing Network Aligner, significantly outperforms existing aligners at optimizing their own objective functions, even achieving near-optimal solutions when the optimal solution is known. We offer the first demonstration of global network alignments based on topology alone that align functionally similar proteins with p-values in some cases below 10^-300. We predict that topological network alignment has a bright future as edge densities increase toward the value where good alignments become possible. We demonstrate that when enough common topology is present at high enough edge densities (for example, in the recent, partly synthetic networks of the Integrated Interaction Database), topological network alignment easily recovers most orthologs, paving the way toward high-throughput functional prediction based on topology-driven network alignment.
Affiliation(s)
- Siyue Wang: Department of Computer Science, University of California, Irvine, CA, United States
- Xiaoyin Chen: Department of Computer Science, University of California, Irvine, CA, United States
- Brent J Frederisy: Department of Computer Science, University of California, Irvine, CA, United States
- Benedict A Mbakogu: Department of Computer Science, University of California, Irvine, CA, United States
- Amy D Kanne: Department of Computer Science, University of California, Irvine, CA, United States
- Pasha Khosravi: Department of Computer Science, University of California, Irvine, CA, United States
- Wayne B Hayes: Department of Computer Science, University of California, Irvine, CA, United States
|
21
|
Gu S, Jiang M, Guzzi PH, Milenković T. Modeling multi-scale data via a network of networks. Bioinformatics 2022; 38:2544-2553. [PMID: 35238343 PMCID: PMC9048659 DOI: 10.1093/bioinformatics/btac133] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/25/2021] [Revised: 02/01/2022] [Accepted: 02/28/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION Node label prediction and graph label prediction are prominent network science tasks. Data analyzed in these tasks are sometimes related: entities represented by nodes in a higher-level (higher scale) network can themselves be modeled as networks at a lower level. We argue that systems involving such entities should be integrated with a 'network of networks' (NoN) representation. Then, we ask whether entity label prediction using multi-level NoN data via our proposed approaches is more accurate than using each of single-level node and graph data alone, i.e. than traditional node label prediction on the higher-level network and graph label prediction on the lower-level networks. To obtain data, we develop the first synthetic NoN generator and construct a real biological NoN. We evaluate accuracy of considered approaches when predicting artificial labels from the synthetic NoNs and proteins' functions from the biological NoN. RESULTS For the synthetic NoNs, our NoN approaches outperform or are as good as node- and network-level ones depending on the NoN properties. For the biological NoN, our NoN approaches outperform the single-level approaches for just under half of the protein functions, and for 30% of the functions, only our NoN approaches make meaningful predictions, while node- and network-level ones achieve random accuracy. So, NoN-based data integration is important. AVAILABILITY AND IMPLEMENTATION The software and data are available at https://nd.edu/~cone/NoNs. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shawn Gu: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Meng Jiang: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Pietro Hiram Guzzi: Department of Surgical and Medical Sciences, University Magna Graecia of Catanzaro, Catanzaro 88100, Italy
- Tijana Milenković: Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
|
22
|
Mohr B, Shmilovich K, Kleinwächter IS, Schneider D, Ferguson AL, Bereau T. Data-driven discovery of cardiolipin-selective small molecules by computational active learning. Chem Sci 2022; 13:4498-4511. [PMID: 35656132 PMCID: PMC9019913 DOI: 10.1039/d2sc00116k] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/06/2022] [Accepted: 02/24/2022] [Indexed: 12/23/2022]
Abstract
Subtle variations in the lipid composition of mitochondrial membranes can have a profound impact on mitochondrial function. The inner mitochondrial membrane contains the phospholipid cardiolipin, which has been demonstrated to act as a biomarker for a number of diverse pathologies. Small molecule dyes capable of selectively partitioning into cardiolipin membranes enable visualization and quantification of the cardiolipin content. Here we present a data-driven approach that combines a deep learning-enabled active learning workflow with coarse-grained molecular dynamics simulations and alchemical free energy calculations to discover small organic compounds able to selectively permeate cardiolipin-containing membranes. By employing transferable coarse-grained models we efficiently navigate the all-atom design space corresponding to small organic molecules with molecular weight less than ≈500 Da. After direct simulation of only 0.42% of our coarse-grained search space we identify molecules with considerably increased levels of cardiolipin selectivity compared to a widely used cardiolipin probe 10-N-nonyl acridine orange. Our accumulated simulation data enables us to derive interpretable design rules linking coarse-grained structure to cardiolipin selectivity. The findings are corroborated by fluorescence anisotropy measurements of two compounds conforming to our defined design rules. Our findings highlight the potential of coarse-grained representations and multiscale modelling for materials discovery and design.
Affiliation(s)
- Bernadette Mohr: Van't Hoff Institute for Molecular Sciences and Informatics Institute, University of Amsterdam, Amsterdam 1098 XH, The Netherlands
- Kirill Shmilovich: Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, USA
- Isabel S Kleinwächter: Department of Chemistry - Biochemistry, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
- Dirk Schneider: Department of Chemistry - Biochemistry, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
- Andrew L Ferguson: Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, USA
- Tristan Bereau: Van't Hoff Institute for Molecular Sciences and Informatics Institute, University of Amsterdam, Amsterdam 1098 XH, The Netherlands; Max Planck Institute for Polymer Research, 55128 Mainz, Germany
|
23
|
Abstract
DNA microarrays are widely used to investigate gene expression. Even though the classical analysis of microarray data is based on the study of differentially expressed genes, it is well known that genes do not act individually. Network analysis can be applied to study association patterns of the genes in a biological system. Moreover, it finds wide application in differential coexpression analysis between different systems. Network based coexpression studies have for example been used in (complex) disease gene prioritization, disease subtyping, and patient stratification. In this chapter we provide an overview of the methods and tools used to create networks from microarray data and describe multiple methods on how to analyze a single network or a group of networks. The described methods range from topological metrics, functional group identification to data integration strategies, topological pathway analysis as well as graphical models.
Affiliation(s)
- Alisa Pavel: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Angela Serra: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Luca Cattelani: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Antonio Federico: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Dario Greco: Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland; Institute of Biotechnology, University of Helsinki, Helsinki, Finland
|
24
|
Wang Y, Chen L, Jo J, Wang Y. Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:623-632. [PMID: 34587021 DOI: 10.1109/tvcg.2021.3114765] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
We present Joint t-Stochastic Neighbor Embedding (Joint t-SNE), a technique to generate comparable projections of multiple high-dimensional datasets. Although t-SNE has been widely employed to visualize high-dimensional datasets from various domains, it is limited to projecting a single dataset. When a series of high-dimensional datasets, such as datasets changing over time, is projected independently using t-SNE, misaligned layouts are obtained. Even items with identical features across datasets are projected to different locations, making the technique unsuitable for comparison tasks. To tackle this problem, we introduce edge similarity, which captures the similarities between two adjacent time frames based on the Graphlet Frequency Distribution (GFD). We then integrate a novel loss term into the t-SNE loss function, which we call vector constraints, to preserve the vectors between projected points across the projections, allowing these points to serve as visual landmarks for direct comparisons between projections. Using synthetic datasets whose ground-truth structures are known, we show that Joint t-SNE outperforms existing techniques, including Dynamic t-SNE, in terms of local coherence error, Kullback-Leibler divergence, and neighborhood preservation. We also showcase a real-world use case to visualize and compare the activation of different layers of a neural network.
|
25
|
Zambrana C, Xenos A, Böttcher R, Malod-Dognin N, Pržulj N. Network neighbors of viral targets and differentially expressed genes in COVID-19 are drug target candidates. Sci Rep 2021; 11:18985. [PMID: 34556735 PMCID: PMC8460804 DOI: 10.1038/s41598-021-98289-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2021] [Accepted: 08/23/2021] [Indexed: 12/12/2022] Open
Abstract
The COVID-19 pandemic is raging. It revealed the importance of rapid scientific advancement towards understanding and treating new diseases. To address this challenge, we adapt an explainable artificial intelligence algorithm for data fusion and utilize it on new omics data on viral-host interactions, human protein interactions, and drugs to better understand SARS-CoV-2 infection mechanisms and predict new drug-target interactions for COVID-19. We discover that in the human interactome, the human proteins targeted by SARS-CoV-2 proteins and the genes that are differentially expressed after the infection have common neighbors central in the interactome that may be key to the disease mechanisms. We uncover 185 new drug-target interactions targeting 49 of these key genes and suggest re-purposing of 149 FDA-approved drugs, including drugs targeting VEGF and nitric oxide signaling, whose pathways coincide with the observed COVID-19 symptoms. Our integrative methodology is universal and can enable insight into this and other serious diseases.
Affiliation(s)
- Noël Malod-Dognin
- Barcelona Supercomputing Center, Barcelona, Spain
- Department of Computer Science, University College London, London, WC1E 6BT, UK
- Nataša Pržulj
- Barcelona Supercomputing Center, Barcelona, Spain
- Department of Computer Science, University College London, London, WC1E 6BT, UK
- ICREA, Pg. Lluís Companys 23, Barcelona, Spain
|
26
|
Nelson CJ, Bonner S. Neuronal Graphs: A Graph Theory Primer for Microscopic, Functional Networks of Neurons Recorded by Calcium Imaging. Front Neural Circuits 2021; 15:662882. [PMID: 34177469 PMCID: PMC8222695 DOI: 10.3389/fncir.2021.662882] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2021] [Accepted: 04/13/2021] [Indexed: 11/13/2022] Open
Abstract
Connected networks are a fundamental structure of neurobiology. Understanding these networks will help us elucidate the neural mechanisms of computation. Mathematically speaking these networks are "graphs"-structures containing objects that are connected. In neuroscience, the objects could be regions of the brain, e.g., fMRI data, or be individual neurons, e.g., calcium imaging with fluorescence microscopy. The formal study of graphs, graph theory, can provide neuroscientists with a large bank of algorithms for exploring networks. Graph theory has already been applied in a variety of ways to fMRI data but, more recently, has begun to be applied at the scales of neurons, e.g., from functional calcium imaging. In this primer we explain the basics of graph theory and relate them to features of microscopic functional networks of neurons from calcium imaging-neuronal graphs. We explore recent examples of graph theory applied to calcium imaging and we highlight some areas where researchers new to the field could go awry.
Affiliation(s)
- Carl J. Nelson
- School of Physics and Astronomy, University of Glasgow, Glasgow, United Kingdom
- Stephen Bonner
- School of Computing, Newcastle University, Newcastle upon Tyne, United Kingdom
|
27
|
Barot M, Gligorijević V, Cho K, Bonneau R. NetQuilt: Deep Multispecies Network-based Protein Function Prediction using Homology-informed Network Similarity. Bioinformatics 2021; 37:2414-2422. [PMID: 33576802 PMCID: PMC8388039 DOI: 10.1093/bioinformatics/btab098] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2020] [Revised: 02/04/2021] [Accepted: 02/09/2021] [Indexed: 02/02/2023] Open
Abstract
Motivation Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to protein functional annotation use sequence similarity to transfer knowledge between species. These approaches cannot produce accurate predictions for proteins without homologues of known function, as many functions require cellular context for meaningful prediction. To supply this context, network-based methods use protein-protein interaction (PPI) networks as a source of information for inferring protein function and have demonstrated promising results in function prediction. However, most of these methods are tied to a network for a single species, and many species lack biological networks. Results In this work, we integrate sequence and network information across multiple species by computing IsoRank similarity scores to create a meta-network profile of the proteins of multiple species. We use this integrated multispecies meta-network as input to train a maxout neural network with Gene Ontology terms as target labels. Our multispecies approach takes advantage of more training examples, and consequently leads to significant improvements in function prediction performance compared to two network-based methods, a deep learning sequence-based method and the BLAST annotation method used in the Critical Assessment of Functional Annotation. We are able to demonstrate that our approach performs well even in cases where a species has no network information available: when an organism’s PPI network is left out we can use our multi-species method to make predictions for the left-out organism with good performance. Availability and implementation The code is freely available at https://github.com/nowittynamesleft/NetQuilt. The data, including sequences, PPI networks and GO annotations are available at https://string-db.org/.
Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Meet Barot
- Center for Data Science, New York University, New York, 10011, USA
- Kyunghyun Cho
- Center for Data Science, New York University, New York, 10011, USA
- Richard Bonneau
- Center for Data Science, New York University, New York, 10011, USA
- Center for Computational Biology, Flatiron Institute, New York, 10010, USA
|
28
|
Gu S, Milenković T. Data-driven biological network alignment that uses topological, sequence, and functional information. BMC Bioinformatics 2021; 22:34. [PMID: 33514304 PMCID: PMC7847157 DOI: 10.1186/s12859-021-03971-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2020] [Accepted: 01/15/2021] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND Network alignment (NA) can transfer functional knowledge between species' conserved biological network regions. Traditional NA assumes that it is topological similarity (isomorphic-like matching) between network regions that corresponds to the regions' functional relatedness. However, we recently found that functionally unrelated proteins are as topologically similar as functionally related proteins. So, we redefined NA as a data-driven method called TARA, which learns from network and protein functional data what kind of topological relatedness (rather than similarity) between proteins corresponds to their functional relatedness. TARA used topological information (within each network) but not sequence information (between proteins across networks). Yet, TARA yielded higher protein functional prediction accuracy than existing NA methods, even those that used both topological and sequence information. RESULTS Here, we propose TARA++ that is also data-driven, like TARA and unlike other existing methods, but that uses across-network sequence information on top of within-network topological information, unlike TARA. To deal with the within-and-across-network analysis, we adapt social network embedding to the problem of biological NA. TARA++ outperforms protein functional prediction accuracy of existing methods. CONCLUSIONS As such, combining research knowledge from different domains is promising. Overall, improvements in protein functional prediction have biomedical implications, for example allowing researchers to better understand how cancer progresses or how humans age.
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, Eck Institute for Global Health, Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, 46556, USA
- Tijana Milenković
- Department of Computer Science and Engineering, Eck Institute for Global Health, Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, 46556, USA
|
29
|
O'Leary J, Mao R, Pretti EJ, Paulson JA, Mittal J, Mesbah A. Deep learning for characterizing the self-assembly of three-dimensional colloidal systems. SOFT MATTER 2021; 17:989-999. [PMID: 33284930 DOI: 10.1039/d0sm01853h] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Creating a systematic framework to characterize the structural states of colloidal self-assembly systems is crucial for unraveling the fundamental understanding of these systems' stochastic and non-linear behavior. The most accurate characterization methods create high-dimensional neighborhood graphs that may not provide useful information about structures unless these are well-defined reference crystalline structures. Dimensionality reduction methods are thus required to translate the neighborhood graphs into a low-dimensional space that can be easily interpreted and used to characterize non-reference structures. We investigate a framework for colloidal system state characterization that employs deep learning methods to reduce the dimensionality of neighborhood graphs. The framework next uses agglomerative hierarchical clustering techniques to partition the low-dimensional space and assign physically meaningful classifications to the resulting partitions. We first demonstrate the proposed colloidal self-assembly state characterization framework on a three-dimensional in silico system of 500 multi-flavored colloids that self-assemble under isothermal conditions. We next investigate the generalizability of the characterization framework by applying the framework to several independent self-assembly trajectories, including a three-dimensional in silico system of 2052 colloidal particles that undergo evaporation-induced self-assembly.
Affiliation(s)
- Jared O'Leary
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720, USA
|
30
|
Fiscarelli AM, Brust MR, Bouffanais R, Piyatumrong A, Danoy G, Bouvry P. Interplay between success and patterns of human collaboration: case study of a Thai Research Institute. Sci Rep 2021; 11:318. [PMID: 33431924 PMCID: PMC7801490 DOI: 10.1038/s41598-020-79447-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2020] [Accepted: 12/02/2020] [Indexed: 11/09/2022] Open
Abstract
Networks of collaboration are notoriously complex and the mechanisms underlying their evolution, although of high interest, are still not fully understood. In particular, collaboration networks can be used to model the interactions between scientists and analyze the circumstances that lead to successful research. This task is not trivial and conventional metrics, based on number of publications and number of citations of individual authors, may not be sufficient to provide a deep insight into the factors driving scientific success. However, network analysis techniques based on centrality measures and measures of the structural properties of the network are promising to that effect. In recent years, it has become evident that most successful research works are achieved by teams rather than individual researchers. Therefore, researchers have developed a keen interest in the dynamics of social groups. In this study, we use real world data from a Thai computer technology research center, where researchers collaborate on different projects and team up to produce a range of artifacts. For each artifact, a score that measures quality of research is available and shared between the researchers that contributed to its creation, according to their percentage of contribution. We identify several measures to quantify productivity and quality of work, as well as centrality measures and structural measures. We find that, at individual level, centrality metrics are linked to high productivity and quality of work, suggesting that researchers who cover strategic positions in the network of collaboration are more successful. At the team level, we show that the evolution in time of structural measures are also linked to high productivity and quality of work. This result suggests that variables such as team size, turnover rate, team compactness and team openness are critical factors that must be taken into account for the success of a team. 
The key findings of this study indicate that the success of a research institute needs to be assessed in the context of not just researcher or team level, but also on how the researchers engage in collaboration as well as on how teams evolve.
Affiliation(s)
- Antonio Maria Fiscarelli
- Luxembourg Centre for Contemporary and Digital History (C2DH), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Matthias R Brust
- Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Roland Bouffanais
- Department of Mechanical Engineering, University of Ottawa, Ottawa, Canada
- Apivadee Piyatumrong
- NSTDA Supercomputer Center (ThaiSC), National Electronics and Computer Technology Center (NECTEC), Pathum Thani, Thailand
- Grégoire Danoy
- Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Department of Computer Science (FSTM/DCS), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Pascal Bouvry
- Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Department of Computer Science (FSTM/DCS), University of Luxembourg, Esch-sur-Alzette, Luxembourg
|
31
|
Bibi N, Hupp T, Kamal MA, Rashid S. Elucidation of PLK1 Linked Biomarkers in Oesophageal Cancer Cell Lines: A Step Towards Novel Signaling Pathways by p53 and PLK1-Linked Functions Crosstalk. Protein Pept Lett 2021; 28:340-358. [PMID: 32875973 DOI: 10.2174/0929866527999200901201837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2020] [Revised: 06/30/2020] [Accepted: 07/03/2020] [Indexed: 02/08/2023]
Abstract
BACKGROUND Oesophageal adenocarcinoma (OAC) is the most frequent cause of cancer death. POLO-like kinase 1 (PLK1) is overexpressed in a broad spectrum of tumors and has prognostic value in many cancers including esophageal cancer, suggesting its potential as a therapeutic target. p53, the guardian of the genome, is one of the most important tumor suppressors and represses the promoter of PLK1, whereas tumor cells with inactive p53 are arrested in mitosis due to DNA damage. PLK1 expression has been linked to the elevated p53 expression and has been shown to act as a biomarker that predicts poor prognosis in OAC. OBJECTIVES The aim of the present study was identification of PLK1 associated phosphorylation targets in p53 mutant and p53 normal cells to explore the downstream signaling events. METHODS Here we develop a proof-of-concept phospho-proteomics approach to identify possible biomarkers that can be used to identify mutant p53 or wild-type p53 pathways. We treated PLK1 asynchronously followed by mass spectrometry data analysis. Protein networking and motif analysis tools were used to identify the significant clusters and potential biomarkers. RESULTS We investigated approximately 1300 potential PLK1-dependent phosphopeptides by LC-MS/MS. In total, 2216 and 1155 high confidence phosphosites were identified in CP-A (p53+) and OE33 (p53-) cell lines owing to PLK1 inhibition. Further clustering and motif assessment uncovered many significant biomarkers with known and novel links to PLK1. CONCLUSION Taken together, our study suggests that PLK1 may serve as a potential therapeutic target in human OAC. The data highlight the efficacy and specificity of small molecule PLK1 kinase inhibitors to identify novel signaling pathways in vivo.
Affiliation(s)
- Nousheen Bibi
- Department of Bioinformatics, Shaheed Benazir Bhutto Women University, Peshawar, Pakistan
- Ted Hupp
- Edinburgh Cancer Research Center, University of Edinburgh, Scotland, United Kingdom
- Mohammad Amjad Kamal
- West China School of Nursing / Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu 610041, Sichuan, Saudi Arabia
- Sajid Rashid
- National Center for Bioinformatics, Quaid-i-Azam University, Islamabad, Pakistan
|
32
|
Maharaj S, Qian T, Ohiba Z, Hayes W. Common Neighbors Extension of the Sticky Model for PPI Networks Evaluated by Global and Local Graphlet Similarity. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:16-26. [PMID: 32809943 DOI: 10.1109/tcbb.2020.3017374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
The structure of protein-protein interaction (PPI) networks has been studied for over a decade. Many theoretical models have been proposed to model PPI network structure, but continuing noise and incompleteness in these networks make conclusions about their structure difficult. Using newer, larger networks from Sept. 2018 BioGRID and Jan. 2019 IID, we show the joint distribution of degree products and common neighbors has a greater impact on PPI edge connectivity than their individual distributions, and introduce two new models (CN and STICKY-CN) for PPI networks employing these features. Since graphlet-based measures are believed to be among the most discerning and sensitive network comparison tools available, we assess their overall global and local fits to PPI networks using Graphlet Kernel (GK). We fit 10 theoretical models to nine BioGRID networks and twelve Integrated Interactive Database (IID) networks and find: (1) STICKY and STICKY-CN are the overall globally best fitting models according to GK, (2) Hyperbolic Geometric Graph model is a better fit than any STICKY-based model on 4 species, (3) though STICKY-CN provides a better local fit than the STICKY model, the CN model provides the greatest local fit over most species. We conclude that the inclusion of CN into STICKY-CN makes it the best overall fit for PPI networks as it is a good fit locally and globally.
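The two pairwise quantities named in this abstract, the degree product and the number of common neighbors of a node pair, can be illustrated with a minimal sketch. It is not the CN or STICKY-CN model itself; the function name `pair_features` and the toy edge list are hypothetical, and the graph is assumed to be simple and undirected.

```python
from itertools import combinations


def pair_features(edges):
    """Map each node pair (u, v), u < v, to (degree product, common-neighbor count).

    Assumes a simple undirected graph given as an iterable of (u, v) pairs.
    """
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    feats = {}
    for u, v in combinations(sorted(nbrs), 2):
        degree_product = len(nbrs[u]) * len(nbrs[v])
        common_neighbors = len(nbrs[u] & nbrs[v])
        feats[(u, v)] = (degree_product, common_neighbors)
    return feats


# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to C.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
features = pair_features(toy_edges)
print(features[("A", "B")])  # (4, 1): degrees 2 and 2, one shared neighbor (C)
```

A model that uses the joint distribution would condition the edge probability on both values together rather than on each one separately.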
|
33
|
Doria-Belenguer S, Youssef MK, Böttcher R, Malod-Dognin N, Pržulj N. Probabilistic graphlets capture biological function in probabilistic molecular networks. Bioinformatics 2020; 36:i804-i812. [PMID: 33381834 DOI: 10.1093/bioinformatics/btaa812] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/08/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Molecular interactions have been successfully modeled and analyzed as networks, where nodes represent molecules and edges represent the interactions between them. These networks revealed that molecules with similar local network structure also have similar biological functions. The most sensitive measures of network structure are based on graphlets. However, graphlet-based methods thus far are only applicable to unweighted networks, whereas real-world molecular networks may have weighted edges that can represent the probability of an interaction occurring in the cell. This information is commonly discarded when applying thresholds to generate unweighted networks, which may lead to information loss. RESULTS We introduce probabilistic graphlets as a tool for analyzing the local wiring patterns of probabilistic networks. To assess their performance compared to unweighted graphlets, we generate synthetic networks based on different well-known random network models and edge probability distributions and demonstrate that probabilistic graphlets outperform their unweighted counterparts in distinguishing network structures. Then we model different real-world molecular interaction networks as weighted graphs with probabilities as weights on edges and we analyze them with our new weighted graphlets-based methods. We show that due to their probabilistic nature, probabilistic graphlet-based methods more robustly capture biological information in these data, while simultaneously showing a higher sensitivity to identify condition-specific functions compared to their unweighted graphlet-based method counterparts. AVAILABILITY AND IMPLEMENTATION Our implementation of probabilistic graphlets is available at https://github.com/Serdobe/Probabilistic_Graphlets. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
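As a rough illustration of counting a local wiring pattern on edge probabilities instead of thresholded edges, the sketch below computes the expected number of triangles that a node takes part in. It assumes edges occur independently, which may not match the definition used in the paper; the function name `expected_triangles` and the toy probabilities are hypothetical and are not taken from the linked repository.

```python
from itertools import combinations


def expected_triangles(node, prob):
    """Expected number of triangles containing `node`.

    `prob` maps frozenset({u, v}) to the probability that edge u-v exists.
    Assumption: edges are independent, so a triangle's probability is the
    product of its three edge probabilities.
    """
    neighbors = sorted({x for edge in prob if node in edge for x in edge if x != node})
    total = 0.0
    for v, w in combinations(neighbors, 2):
        p_vw = prob.get(frozenset((v, w)), 0.0)  # a missing edge has probability 0
        total += prob[frozenset((node, v))] * prob[frozenset((node, w))] * p_vw
    return total


# Hypothetical toy probabilistic network.
toy_prob = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.5,
    frozenset(("b", "c")): 0.8,
    frozenset(("a", "d")): 1.0,
}
print(expected_triangles("a", toy_prob))  # approximately 0.36
```

Thresholding the same network at 0.6 would drop the a-c edge and report no triangle at all, which is the kind of information loss the abstract refers to.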
Affiliation(s)
- Sergio Doria-Belenguer
- Barcelona Supercomputing Center, Barcelona 08034, Spain
- Universitat Politècnica de Catalunya (UPC), Barcelona 08034, Spain
- Markus K Youssef
- Barcelona Supercomputing Center, Barcelona 08034, Spain
- Universitat Politècnica de Catalunya (UPC), Barcelona 08034, Spain
- René Böttcher
- Barcelona Supercomputing Center, Barcelona 08034, Spain
- Noël Malod-Dognin
- Barcelona Supercomputing Center, Barcelona 08034, Spain
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Nataša Pržulj
- Barcelona Supercomputing Center, Barcelona 08034, Spain
- Department of Computer Science, University College London, London WC1E 6BT, UK
- ICREA, Barcelona 08010, Spain
|
34
|
Huang K, Xiao C, Glass LM, Zitnik M, Sun J. SkipGNN: predicting molecular interactions with skip-graph networks. Sci Rep 2020; 10:21092. [PMID: 33273494 PMCID: PMC7713130 DOI: 10.1038/s41598-020-77766-9] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2020] [Accepted: 11/17/2020] [Indexed: 11/17/2022] Open
Abstract
Molecular interaction networks are powerful resources for molecular discovery. They are increasingly used with machine learning methods to predict biologically meaningful interactions. While deep learning on graphs has dramatically advanced the prediction prowess, current graph neural network (GNN) methods are mainly optimized for prediction on the basis of direct similarity between interacting nodes. In biological networks, however, similarity between nodes that do not directly interact has proved incredibly useful in the last decade across a variety of interaction networks. Here, we present SkipGNN, a graph neural network approach for the prediction of molecular interactions. SkipGNN predicts molecular interactions by not only aggregating information from direct interactions but also from second-order interactions, which we call skip similarity. In contrast to existing GNNs, SkipGNN receives neural messages from two-hop neighbors as well as immediate neighbors in the interaction network and non-linearly transforms the messages to obtain useful information for prediction. To inject skip similarity into a GNN, we construct a modified version of the original network, called the skip graph. We then develop an iterative fusion scheme that optimizes a GNN using both the skip graph and the original graph. Experiments on four interaction networks, including drug–drug, drug–target, protein–protein, and gene–disease interactions, show that SkipGNN achieves superior and robust performance. Furthermore, we show that unlike popular GNNs, SkipGNN learns biologically meaningful embeddings and performs especially well on noisy, incomplete interaction networks.
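The notion of second-order (two-hop) interactions described above can be sketched as follows. This is a minimal illustration that assumes a skip edge joins two nodes whenever they share at least one direct neighbor; the exact skip-graph construction in SkipGNN may differ, and the function name `build_skip_graph` and the toy edges are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_skip_graph(edges):
    """Return undirected skip edges (u, v), u < v, for nodes sharing a neighbor.

    Assumption: an unweighted, undirected interaction network given as an
    iterable of (u, v) pairs; self-loops are ignored.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)

    skip_edges = set()
    # Every pair of nodes adjacent to the same hub is two hops apart via that hub.
    for adjacent in neighbors.values():
        for u, v in combinations(sorted(adjacent), 2):
            skip_edges.add((u, v))
    return skip_edges


# Hypothetical drug-target toy network.
toy_edges = [("drugA", "protein1"), ("drugB", "protein1"), ("drugB", "protein2")]
print(sorted(build_skip_graph(toy_edges)))
# [('drugA', 'drugB'), ('protein1', 'protein2')]
```

In this toy case the two drugs become skip neighbors because they share a target, even though they never interact directly.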
Affiliation(s)
- Kexin Huang
- Health Data Science, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Cao Xiao
- Analytics Center of Excellence, IQVIA, Cambridge, MA, USA
- Lucas M Glass
- Analytics Center of Excellence, IQVIA, Cambridge, MA, USA
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard University, Boston, MA, USA
- Jimeng Sun
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
|
35
|
Klimm F, Toledo EM, Monfeuga T, Zhang F, Deane CM, Reinert G. Functional module detection through integration of single-cell RNA sequencing data with protein-protein interaction networks. BMC Genomics 2020; 21:756. [PMID: 33138772 PMCID: PMC7607865 DOI: 10.1186/s12864-020-07144-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2019] [Accepted: 10/12/2020] [Indexed: 12/14/2022] Open
Abstract
Background Recent advances in single-cell RNA sequencing have allowed researchers to explore transcriptional function at a cellular level. In particular, single-cell RNA sequencing reveals that there exist clusters of cells with similar gene expression profiles, representing different transcriptional states. Results In this study, we present scPPIN, a method for integrating single-cell RNA sequencing data with protein–protein interaction networks that detects active modules in cells of different transcriptional states. We achieve this by clustering RNA-sequencing data, identifying differentially expressed genes, constructing node-weighted protein–protein interaction networks, and finding the maximum-weight connected subgraphs with an exact Steiner-tree approach. As case studies, we investigate two RNA-sequencing data sets from human liver spheroids and human adipose tissue, respectively. With scPPIN we expand the output of differential expressed genes analysis with information from protein interactions. We find that different transcriptional states have different subnetworks of the protein–protein interaction networks significantly enriched which represent biological pathways. In these pathways, scPPIN identifies proteins that are not differentially expressed but have a crucial biological function (e.g., as receptors) and therefore reveals biology beyond a standard differential expressed gene analysis. Conclusions The introduced scPPIN method can be used to systematically analyse differentially expressed genes in single-cell RNA sequencing data by integrating it with protein interaction data. The detected modules that characterise each cluster help to identify and hypothesise a biological function associated to those cells. Our analysis suggests the participation of unexpected proteins in these pathways that are undetectable from the single-cell RNA sequencing data alone. The techniques described here are applicable to other organisms and tissues. 
Supplementary Information The online version contains supplementary material available at (doi:10.1186/s12864-020-07144-2).
Affiliation(s)
- Florian Klimm
- Department of Mathematics, Imperial College London, London, SW7 2AZ, UK
- Mitochondrial Biology Unit, University of Cambridge, Cambridge, CB2 0XY, UK
- Enrique M Toledo
- Discovery Technology and Genomics, Novo Nordisk Research Centre Oxford, Oxford, OX3 7FZ, UK
- Thomas Monfeuga
- Discovery Technology and Genomics, Novo Nordisk Research Centre Oxford, Oxford, OX3 7FZ, UK
- Fang Zhang
- Discovery Technology and Genomics, Novo Nordisk Research Centre Oxford, Oxford, OX3 7FZ, UK
- Gesine Reinert
- Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
|
36
|
|
37
|
Delgado-Chaves FM, Gómez-Vela F, Divina F, García-Torres M, Rodriguez-Baena DS. Computational Analysis of the Global Effects of Ly6E in the Immune Response to Coronavirus Infection Using Gene Networks. Genes (Basel) 2020; 11:E831. [PMID: 32708319 PMCID: PMC7397019 DOI: 10.3390/genes11070831] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2020] [Revised: 06/26/2020] [Accepted: 07/13/2020] [Indexed: 12/21/2022] Open
Abstract
Gene networks have arisen as a promising tool in the comprehensive modeling and analysis of complex diseases. Particularly in viral infections, the understanding of the host-pathogen mechanisms, and the immune response to these, is considered a major goal for the rational design of appropriate therapies. For this reason, the use of gene networks may well encourage therapy-associated research in the context of the coronavirus pandemic, orchestrating experimental scrutiny and reducing costs. In this work, gene co-expression networks were reconstructed from RNA-Seq expression data with the aim of analyzing the time-resolved effects of gene Ly6E in the immune response against the coronavirus responsible for murine hepatitis (MHV). Through the integration of differential expression analyses and reconstructed networks exploration, significant differences in the immune response to virus were observed in Ly6EΔHSC compared to wild type animals. Results show that Ly6E ablation at hematopoietic stem cells (HSCs) leads to a progressive impaired immune response in both liver and spleen. Specifically, depletion of the normal leukocyte mediated immunity and chemokine signaling is observed in the liver of Ly6EΔHSC mice. On the other hand, the immune response in the spleen, which seemed to be mediated by an intense chromatin activity in the normal situation, is replaced by ECM remodeling in Ly6EΔHSC mice. These findings, which require further experimental characterization, could be extrapolated to other coronaviruses and motivate the efforts towards novel antiviral approaches.
|
38
|
Abstract
In this study, we deal with the problem of biological network alignment (NA), which aims to find a node mapping between species' molecular networks that uncovers similar network regions, thus allowing for the transfer of functional knowledge between the aligned nodes. We provide evidence that current NA methods, which assume that topologically similar nodes (i.e., nodes whose network neighborhoods are isomorphic-like) have high functional relatedness, do not actually end up aligning functionally related nodes. That is, we show that the current topological similarity assumption does not hold well. Consequently, we argue that a paradigm shift is needed with how the NA problem is approached. So, we redefine NA as a data-driven framework, called TARA (data-driven NA), which attempts to learn the relationship between topological relatedness and functional relatedness without assuming that topological relatedness corresponds to topological similarity. TARA makes no assumptions about what nodes should be aligned, distinguishing it from existing NA methods. Specifically, TARA trains a classifier to predict whether two nodes from different networks are functionally related based on their network topological patterns (features). We find that TARA is able to make accurate predictions. TARA then takes each pair of nodes that are predicted as related to be part of an alignment. Like traditional NA methods, TARA uses this alignment for the across-species transfer of functional knowledge. TARA as currently implemented uses topological but not protein sequence information for functional knowledge transfer. In this context, we find that TARA outperforms existing state-of-the-art NA methods that also use topological information, WAVE and SANA, and even outperforms or complements a state-of-the-art NA method that uses both topological and sequence information, PrimAlign. 
Hence, adding sequence information to TARA, which is our future work, is likely to further improve its performance. The software and data are available at http://www.nd.edu/~cone/TARA/.
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
|
39
|
Abstract
MOTIVATION The structure of chromatin impacts gene expression. Its alteration has been shown to coincide with the occurrence of cancer. A key challenge is in understanding the role of chromatin structure (CS) in cellular processes and its implications in diseases. RESULTS We propose a comparative pipeline to analyze CSs and apply it to study chronic lymphocytic leukemia (CLL). We model the chromatin of the affected and control cells as networks and analyze the network topology by state-of-the-art methods. Our results show that CSs are a rich source of new biological and functional information about DNA elements and cells that can complement protein-protein and co-expression data. Importantly, we show the existence of structural markers of cancer-related DNA elements in the chromatin. Surprisingly, CLL driver genes are characterized by specific local wiring patterns not only in the CS network of CLL cells, but also of healthy cells. This allows us to successfully predict new CLL-related DNA elements. Importantly, this shows that we can identify cancer-related DNA elements in other cancer types by investigating the CS network of the healthy cell of origin, a key new insight paving the road to new therapeutic strategies. This gives us an opportunity to exploit chromosome conformation data in healthy cells to predict new drivers. AVAILABILITY AND IMPLEMENTATION Our predicted CLL genes and RNAs are provided as a free resource to the community at https://life.bsc.es/iconbi/chromatin/index.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- N Malod-Dognin
- Department of Life Sciences, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Department of Computer Science, University College London, London WC1E 6BT, UK
- V Pancaldi
- Department of Life Sciences, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Centre de Recherches en Cancérologie de Toulouse (CRCT), Toulouse 31037, France
- University Paul Sabatier III, Toulouse 31330, France
- A Valencia
- Department of Life Sciences, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- ICREA, Pg. Lluís Companys 23, Barcelona 08010, Spain
- Coordination Node, Spanish National Bioinformatics Institute, ELIXIR-Spain (INB, ELIXIR-ES), Madrid 28029, Spain
- N Pržulj
- Department of Life Sciences, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Department of Computer Science, University College London, London WC1E 6BT, UK
- ICREA, Pg. Lluís Companys 23, Barcelona 08010, Spain
|
40
|
Newaz K, Ghalehnovi M, Rahnama A, Antsaklis PJ, Milenković T. Network-based protein structural classification. ROYAL SOCIETY OPEN SCIENCE 2020; 7:191461. [PMID: 32742675 PMCID: PMC7353965 DOI: 10.1098/rsos.191461] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/23/2019] [Accepted: 05/05/2020] [Indexed: 06/11/2023]
Abstract
Experimental determination of protein function is resource-consuming. As an alternative, computational prediction of protein function has received attention. In this context, protein structural classification (PSC) can help, by allowing for determining structural classes of currently unclassified proteins based on their features, and then relying on the fact that proteins with similar structures have similar functions. Existing PSC approaches rely on sequence-based or direct three-dimensional (3D) structure-based protein features. By contrast, we first model 3D structures of proteins as protein structure networks (PSNs). Then, we use network-based features for PSC. We propose the use of graphlets, state-of-the-art features in many research areas of network science, in the task of PSC. Moreover, because graphlets can deal only with unweighted PSNs, and because accounting for edge weights when constructing PSNs could improve PSC accuracy, we also propose a deep learning framework that automatically learns network features from weighted PSNs. When evaluated on a large set of approximately 9400 CATH and approximately 12 800 SCOP protein domains (spanning 36 PSN sets), the best of our proposed approaches are superior to existing PSC approaches in terms of accuracy, with comparable running times. Our data and code are available at https://doi.org/10.5281/zenodo.3787922.
Affiliation(s)
- Khalique Newaz
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN 46556, USA
- Eck institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Mahboobeh Ghalehnovi
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Arash Rahnama
- Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Panos J. Antsaklis
- Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN 46556, USA
- Eck institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
|
41
|
Ni P, Wang J, Zhong P, Li Y, Wu FX, Pan Y. Constructing Disease Similarity Networks Based on Disease Module Theory. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:906-915. [PMID: 29993782 DOI: 10.1109/tcbb.2018.2817624] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Quantifying the associations between diseases now plays an important role in modern biology and medicine. Discovering associations between diseases could help us gain deeper insights into the pathogenic mechanisms of complex diseases, and thus could lead to improvements in disease diagnosis, drug repositioning, and drug development. Due to the growing body of high-throughput biological data, a number of methods have been developed for computing similarity between diseases during the past decade. However, these methods rarely consider the interconnections of genes related to each disease in the protein-protein interaction network (PPIN). Recently, the disease module theory has been proposed, which states that disease-related genes or proteins tend to interact with each other in the same neighborhood of a PPIN. In this study, we propose a new method called ModuleSim to measure associations between diseases by using disease-gene association data and PPIN data based on disease module theory. The experimental results show that by considering the interactions between disease modules and their modularity, the disease similarity calculated by ModuleSim has a significant correlation with the disease classification of Disease Ontology (DO). Furthermore, ModuleSim outperforms four other popular methods, all of which use disease-gene association data and PPIN data to measure disease-disease associations. In addition, the disease similarity network constructed by ModuleSim suggests that ModuleSim is capable of finding potential associations between diseases.
|
42
|
System Network Complexity: Network Evolution Subgraphs of System State Series. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE 2020. [DOI: 10.1109/tetci.2018.2848293] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
43
|
Kernel Differential Subgraph Analysis to Reveal the Key Period Affecting Glioblastoma. Biomolecules 2020; 10:biom10020318. [PMID: 32079293 PMCID: PMC7072688 DOI: 10.3390/biom10020318] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2019] [Revised: 02/05/2020] [Accepted: 02/10/2020] [Indexed: 12/26/2022] Open
Abstract
Glioblastoma (GBM) is a fast-growing type of malignant primary brain tumor. To explore the mechanisms in GBM, complex biological networks are used to reveal crucial changes among different biological states, which reflect the development of living organisms. It is critical to discover the kernel differential subgraph (KDS) that leads to drastic changes. However, identifying the KDS is similar to the Steiner Tree problem, which is NP-hard. In this paper, we developed a criterion to explore the KDS (CKDS), which considered the connectivity and scale of the KDS, the topological difference of nodes and the functional relevance between genes in the KDS. The CKDS algorithm was applied to simulated datasets and three single-cell RNA sequencing (scRNA-seq) datasets including GBM, fetal human cortical neurons (FHCN) and neural differentiation. Then we performed network topology and functional enrichment analyses on the extracted KDSs. Compared with state-of-the-art methods, the CKDS algorithm performed better at discovering the KDSs on simulated datasets. In the GBM and FHCN, seventeen genes (one biomarker, nine regulatory genes, one driver gene, six therapeutic targets) and KEGG pathways in KDSs were strongly supported by literature mining as being highly interrelated with GBM. Moreover, focusing on GBM, there were fifteen genes (including ten regulatory genes, three driver genes, one biomarker, one therapeutic target) and KEGG pathways found in the KDS of the neural differentiation process from activated neural stem cells (aNSC) to neural progenitor cells (NPC), while few genes and no pathway were found in the period from NPC to astrocytes (Ast). These experiments indicated that the process from aNSC to NPC is a key differentiation period affecting the development of GBM. Therefore, the CKDS algorithm provides a unique perspective in identifying cell-type-specific genes and KDSs.
|
44
|
Zhang Y, Chen Y, Hu T. PANDA: Prioritization of autism-genes using network-based deep-learning approach. Genet Epidemiol 2020; 44:382-394. [PMID: 32039500 DOI: 10.1002/gepi.22282] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2019] [Revised: 12/31/2019] [Accepted: 01/27/2020] [Indexed: 11/06/2022]
Abstract
Understanding the genetic background of complex diseases and disorders plays an essential role in the promising precision medicine. The evaluation of candidate genes, however, requires time-consuming and expensive experiments given a large number of possibilities. Thus, computational methods have seen increasing applications in predicting gene-disease associations. We proposed a bioinformatics framework, Prioritization of Autism-genes using Network-based Deep-learning Approach (PANDA). Our approach aims to identify autism-genes across the human genome based on patterns of gene-gene interactions and topological similarity of genes in the interaction network. PANDA trains a graph deep learning classifier using the input of the human molecular interaction network and predicts and ranks the probability of autism association of every node (gene) in the network. PANDA was able to achieve a high classification accuracy of 89%, outperforming three other commonly used machine learning algorithms. Moreover, the gene prioritization ranking list produced by PANDA was evaluated and validated using an independent large-scale exome-sequencing study. The top 10% of PANDA-ranked genes were found significantly enriched for autism association.
Affiliation(s)
- Yu Zhang
- Department of Computer Science, Memorial University, St. John's, Newfoundland and Labrador, Canada
- Yuanzhu Chen
- Department of Computer Science, Memorial University, St. John's, Newfoundland and Labrador, Canada
- Ting Hu
- Department of Computer Science, Memorial University, St. John's, Newfoundland and Labrador, Canada.,School of Computing, Queen's University, Kingston, Ontario, Canada
|
45
|
Hayes WB. An Introductory Guide to Aligning Networks Using SANA, the Simulated Annealing Network Aligner. Methods Mol Biol 2020; 2074:263-284. [PMID: 31583643 DOI: 10.1007/978-1-4939-9873-9_18] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Sequence alignment has had an enormous impact on our understanding of biology, evolution, and disease. The alignment of biological networks holds similar promise. Biological networks generally model interactions between biomolecules such as proteins, genes, metabolites, or mRNAs. There is strong evidence that the network topology (the "structure" of the network) is correlated with the functions performed, so that network topology can be used to help predict or understand function. However, unlike sequence comparison and alignment, which is an essentially solved problem, network comparison and alignment is an NP-complete problem for which heuristic algorithms must be used. Here we introduce SANA, the Simulated Annealing Network Aligner. SANA is one of many algorithms proposed for the arena of biological network alignment. In the context of global network alignment, SANA stands out for its speed, memory efficiency, ease-of-use, and flexibility in the arena of producing alignments between two or more networks. SANA produces better alignments in minutes on a laptop than most other algorithms can produce in hours or days of CPU time on large server-class machines. We walk the user through how to use SANA for several types of biomolecular networks.
Affiliation(s)
- Wayne B Hayes
- Department of Computer Science, University of California, Irvine, CA, USA.
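To make the idea of annealing-based alignment in the abstract above concrete, here is a minimal sketch of simulated annealing over node mappings. It is not SANA's implementation: the function names, the swap move, the geometric cooling schedule, the objective (number of conserved edges), and the restriction to two networks of equal size are all assumptions chosen for illustration.

```python
import math
import random


def conserved_edges(edges1, edges2_set, mapping):
    """Count edges of network 1 whose images under `mapping` are edges of network 2."""
    return sum(
        1 for u, v in edges1
        if frozenset((mapping[u], mapping[v])) in edges2_set
    )


def anneal_alignment(n, edges1, edges2, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Toy annealer (hypothetical, not SANA): both networks have nodes 0..n-1."""
    rng = random.Random(seed)
    edges2_set = {frozenset(e) for e in edges2}
    mapping = list(range(n))          # mapping[u] = node of network 2 aligned to u
    rng.shuffle(mapping)              # start from a random one-to-one alignment
    score = conserved_edges(edges1, edges2_set, mapping)
    best, best_score = mapping[:], score
    for step in range(steps):
        # Geometric cooling from t_start towards t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(n), 2)
        mapping[i], mapping[j] = mapping[j], mapping[i]      # propose a swap
        new_score = conserved_edges(edges1, edges2_set, mapping)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_score >= score or rng.random() < math.exp((new_score - score) / temp):
            score = new_score
            if score > best_score:
                best, best_score = mapping[:], score
        else:
            mapping[i], mapping[j] = mapping[j], mapping[i]  # reject: undo the swap
    return best, best_score


# Align a 5-node cycle to itself; at most 5 edges can be conserved.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
best, best_score = anneal_alignment(5, cycle, cycle)
print(best, best_score)
```

The sketch tracks the best alignment seen so far, so the returned score is never lower than that of the random starting alignment, although a heuristic of this kind gives no guarantee of reaching the optimum.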
|
46
|
Windels SFL, Malod-Dognin N, Pržulj N. Graphlet Laplacians for topology-function and topology-disease relationships. Bioinformatics 2019; 35:5226-5234. [PMID: 31192358 DOI: 10.1093/bioinformatics/btz455] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2018] [Revised: 05/08/2019] [Accepted: 06/10/2019] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION Laplacian matrices capture the global structure of networks and are widely used to study biological networks. However, the local structure of the network around a node can also capture biological information. Local wiring patterns are typically quantified by counting how often a node touches different graphlets (small, connected, induced sub-graphs). Currently available graphlet-based methods do not consider whether nodes are in the same network neighbourhood. To combine graphlet-based topological information and membership of nodes to the same network neighbourhood, we generalize the Laplacian to the Graphlet Laplacian, by considering a pair of nodes to be 'adjacent' if they simultaneously touch a given graphlet. RESULTS We utilize Graphlet Laplacians to generalize spectral embedding, spectral clustering and network diffusion. Applying Graphlet Laplacian-based spectral embedding, we visually demonstrate that Graphlet Laplacians capture biological functions. This result is quantified by applying Graphlet Laplacian-based spectral clustering, which uncovers clusters enriched in biological functions dependent on the underlying graphlet. We explain the complementarity of biological functions captured by different Graphlet Laplacians by showing that they capture different local topologies. Finally, diffusing pan-cancer gene mutation scores based on different Graphlet Laplacians, we find complementary sets of cancer-related genes. Hence, we demonstrate that Graphlet Laplacians capture topology-function and topology-disease relationships in biological networks. AVAILABILITY AND IMPLEMENTATION http://www0.cs.ucl.ac.uk/staff/natasa/graphlet-laplacian/index.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sam F L Windels
- Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom
- Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom.,Barcelona Supercomputing Center, Barcelona, 08034, Spain.,ICREA, Pg. Lluís Companys 23, Barcelona, 08010, Spain
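The abstract above describes treating two nodes as 'adjacent' when they simultaneously touch a given graphlet. A minimal sketch of that idea for one graphlet, the triangle, follows. The function names are hypothetical, and the degree matrix here is the plain row sum; the paper's own normalization may differ.

```python
def triangle_adjacency(n, edges):
    """Return an n x n matrix whose (u, v) entry counts triangles containing both u and v."""
    neighbors = [set() for _ in range(n)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    adj = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in neighbors[u]:
            # Each common neighbor w of u and v closes one triangle {u, v, w}.
            adj[u][v] = len(neighbors[u] & neighbors[v])
    return adj


def laplacian(adj):
    """Build L = D - A, with D the diagonal matrix of row sums (an assumed normalization)."""
    n = len(adj)
    return [
        [(sum(adj[u]) if u == v else 0) - adj[u][v] for v in range(n)]
        for u in range(n)
    ]


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
tri = triangle_adjacency(4, edges)
lap = laplacian(tri)
for row in lap:
    print(row)
```

In this example node 3 touches no triangle, so its row and column in the triangle-based Laplacian are zero even though it has an edge in the original network, which illustrates how a graphlet-based Laplacian can differ from the ordinary one.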
|
47
|
Tantardini M, Ieva F, Tajoli L, Piccardi C. Comparing methods for comparing networks. Sci Rep 2019; 9:17557. [PMID: 31772246 PMCID: PMC6879644 DOI: 10.1038/s41598-019-53708-y] [Citation(s) in RCA: 73] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2019] [Accepted: 10/25/2019] [Indexed: 11/17/2022] Open
Abstract
With the impressive growth of available data and the flexibility of network modelling, the problem of devising effective quantitative methods for the comparison of networks arises. Plenty of such methods have been designed to accomplish this task: most of them deal with undirected and unweighted networks only, but a few are capable of handling directed and/or weighted networks too, thus properly exploiting richer information. In this work, we contribute to the effort of comparing the different methods for comparing networks and providing a guide for the selection of an appropriate one. First, we review and classify a collection of network comparison methods, highlighting the criteria they are based on and their advantages and drawbacks. The set includes methods requiring known node-correspondence, such as DeltaCon and Cut Distance, as well as methods not requiring a priori known node-correspondence, such as alignment-based, graphlet-based, and spectral methods, and the recently proposed Portrait Divergence and NetLSD. We test the above methods on synthetic networks and we assess their usability and the meaningfulness of the results they provide. Finally, we apply the methods to two real-world datasets, the European Air Transportation Network and the FAO Trade Network, in order to discuss the results that can be drawn from this type of analysis.
Affiliation(s)
- Francesca Ieva
- MOX - Modelling and Scientific Computing Lab, Department of Mathematics, Politecnico di Milano, Via Bonardi 9, 20133, Milano, Italy.,CADS - Center for Analysis, Decisions and Society, Human Technopole, 20157, Milano, Italy
- Lucia Tajoli
- Department of Management, Economics and Industrial Engineering, Politecnico di Milano, Via Lambruschini 4/b, 20156, Milano, Italy
- Carlo Piccardi
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133, Milano, Italy.
|
48
|
Delgado-Chaves FM, Gómez-Vela F, García-Torres M, Divina F, Vázquez Noguera JL. Computational Inference of Gene Co-Expression Networks for the identification of Lung Carcinoma Biomarkers: An Ensemble Approach. Genes (Basel) 2019; 10:E962. [PMID: 31766738 PMCID: PMC6947459 DOI: 10.3390/genes10120962] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2019] [Revised: 10/28/2019] [Accepted: 10/31/2019] [Indexed: 12/22/2022] Open
Abstract
Gene networks (GNs) have emerged in recent years as a useful tool for the analysis of different diseases in the field of biomedicine. In particular, GNs have been widely applied for the study and analysis of different types of cancer. In this context, lung carcinoma is among the most common cancer types and its short life expectancy is partly due to late diagnosis. For this reason, lung cancer biomarkers that can be easily measured are in high demand in biomedical research. In this work, we present an application of gene co-expression networks in the modelling of lung cancer gene regulatory networks, which ultimately served for the discovery of new biomarkers. For this, a robust GN inference was performed from microarray data concomitantly using three different co-expression measures. Results identified a major cluster of genes involved in SRP-dependent co-translational protein targeting to membrane, as well as a set of 28 genes that were exclusively found in networks generated from cancer samples. Amongst potential biomarkers, genes NCKAP1L and DMD are highlighted due to their implications in a considerable portion of lung and bronchus primary carcinomas. These findings demonstrate the potential of GN reconstruction in the rational prediction of biomarkers.
Affiliation(s)
- Fernando M. Delgado-Chaves
- Division of Computer Science, Pablo de Olavide University, 41013 Seville, Spain; (F.M.D.-C.); (M.G.-T.); (F.D.)
- Francisco Gómez-Vela
- Division of Computer Science, Pablo de Olavide University, 41013 Seville, Spain; (F.M.D.-C.); (M.G.-T.); (F.D.)
- Miguel García-Torres
- Division of Computer Science, Pablo de Olavide University, 41013 Seville, Spain; (F.M.D.-C.); (M.G.-T.); (F.D.)
- Federico Divina
- Division of Computer Science, Pablo de Olavide University, 41013 Seville, Spain; (F.M.D.-C.); (M.G.-T.); (F.D.)
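As a rough illustration of how a co-expression network like the one in the abstract above can be built from expression profiles, the sketch below links two genes when the absolute Pearson correlation of their profiles reaches a threshold. This is a single-measure simplification: the abstract reports combining three co-expression measures without naming them here, and the gene labels, values, and the 0.8 threshold are made up for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expr, threshold=0.8):
    """Return gene pairs whose absolute correlation is at least `threshold` (assumed cutoff)."""
    genes = sorted(expr)
    return [
        (g, h)
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
        if abs(pearson(expr[g], expr[h])) >= threshold
    ]


# Hypothetical expression values for three genes across four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
print(coexpression_edges(expr))
```

An ensemble variant would compute edges under several measures and keep only those supported by more than one, which is one plausible reading of using the measures concomitantly.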
|
49
|
Li Q, Milenkovic T. Supervised prediction of aging-related genes from a context-specific protein interaction subnetwork. 2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM) 2019:130-137. [DOI: 10.1109/bibm47256.2019.8983063] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/03/2025]
|
50
|
Gligorijevic V, Barot M, Bonneau R. deepNF: deep network fusion for protein function prediction. Bioinformatics 2019; 34:3873-3881. [PMID: 29868758 PMCID: PMC6223364 DOI: 10.1093/bioinformatics/bty440] [Citation(s) in RCA: 131] [Impact Index Per Article: 21.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2017] [Accepted: 05/28/2018] [Indexed: 01/10/2023] Open
Abstract
Motivation The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that encounter difficulty in capturing complex and highly non-linear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. Results We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting gene ontology terms of varying type and specificity. Availability and implementation deepNF is freely available at: https://github.com/VGligorijevic/deepNF. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vladimir Gligorijevic
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Meet Barot
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Richard Bonneau
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA.,Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA.,Center for Data Science, New York University, New York, NY, USA
|