1
|
Chang M, Ahn J, Kang BG, Yoon S. Cross-modal embedding integrator for disease-gene/protein association prediction using a multi-head attention mechanism. Pharmacol Res Perspect 2024; 12:e70034. [PMID: 39560053 PMCID: PMC11574662 DOI: 10.1002/prp2.70034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2024] [Revised: 09/07/2024] [Accepted: 10/28/2024] [Indexed: 11/20/2024] Open
Abstract
Knowledge graphs, powerful tools that explicitly transfer knowledge to machines, have significantly advanced new knowledge inferences. Discovering unknown relationships between diseases and genes/proteins in biomedical knowledge graphs can lead to the identification of disease development mechanisms and new treatment targets. Generating high-quality representations of biomedical entities is essential for successfully predicting disease-gene/protein associations. We developed a computational model that predicts disease-gene/protein associations using the Precision Medicine Knowledge Graph, a biomedical knowledge graph. Embeddings of biomedical entities were generated using two different methods-a large language model (LLM) and the knowledge graph embedding (KGE) algorithm. The LLM utilizes information obtained from massive amounts of text data, whereas the KGE algorithm relies on graph structures. We developed a disease-gene/protein association prediction model, "Cross-Modal Embedding Integrator (CMEI)," by integrating embeddings from different modalities using a multi-head attention mechanism. The area under the receiver operating characteristic curve of CMEI was 0.9662 (± 0.0002) in predicting disease-gene/protein associations. In conclusion, we developed a computational model that effectively predicts disease-gene/protein associations. CMEI may contribute to the identification of disease development mechanisms and new treatment targets.
Collapse
Affiliation(s)
- Munyoung Chang
- Education and Research Program for Future ICT Pioneers, Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
| | - Junyong Ahn
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, South Korea
| | - Bong Gyun Kang
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, South Korea
| | - Sungroh Yoon
- Education and Research Program for Future ICT Pioneers, Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, South Korea
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
| |
Collapse
|
2
|
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. BIOINFORMATICS ADVANCES 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation Not applicable.
Collapse
Affiliation(s)
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Aydin Wells
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
| | - Deisy Morselli Gysi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
- Department of Physics, Northeastern University, Boston, MA 02115, United States
| | - Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
| | - T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
| | - Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Wisconsin Institute for Discovery, Madison, WI 53715, United States
| | - Anaïs Baudot
- Aix Marseille Université, INSERM, MMG, Marseille, France
| | - Serdar Bozdag
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Department of Mathematics, University of North Texas, Denton, TX 76203, United States
| | - Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Morgridge Institute for Research, Madison, WI 53715, United States
| | - Sara J C Gosline
- Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
| | - Pengfei Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Pietro H Guzzi
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
| | - Heng Huang
- Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
| | - Meng Jiang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Ziynet Nesibe Kesimoglu
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
| | - Mehmet Koyuturk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
| | - Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
| | - Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
| | - Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, England
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
| | - Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
| | - Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, United States
| | - Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
| | - Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
| | - Donna K Slonim
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Hanghang Tong
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
| | - Xinan Holly Yang
- Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
| | - Haiyuan Yu
- Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
| |
Collapse
|
3
|
Arici MK, Tuncbag N. Performance Assessment of the Network Reconstruction Approaches on Various Interactomes. Front Mol Biosci 2021; 8:666705. [PMID: 34676243 PMCID: PMC8523993 DOI: 10.3389/fmolb.2021.666705] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2021] [Accepted: 07/14/2021] [Indexed: 01/04/2023] Open
Abstract
Beyond the list of molecules, there is a necessity to collectively consider multiple sets of omic data and to reconstruct the connections between the molecules. Especially, pathway reconstruction is crucial to understanding disease biology because abnormal cellular signaling may be pathological. The main challenge is how to integrate the data together in an accurate way. In this study, we aim to comparatively analyze the performance of a set of network reconstruction algorithms on multiple reference interactomes. We first explored several human protein interactomes, including PathwayCommons, OmniPath, HIPPIE, iRefWeb, STRING, and ConsensusPathDB. The comparison is based on the coverage of each interactome in terms of cancer driver proteins, structural information of protein interactions, and the bias toward well-studied proteins. We next used these interactomes to evaluate the performance of network reconstruction algorithms including all-pair shortest path, heat diffusion with flux, personalized PageRank with flux, and prize-collecting Steiner forest (PCSF) approaches. Each approach has its own merits and weaknesses. Among them, PCSF had the most balanced performance in terms of precision and recall scores when 28 pathways from NetPath were reconstructed using the listed algorithms. Additionally, the reference interactome affects the performance of the network reconstruction approaches. The coverage and disease- or tissue-specificity of each interactome may vary, which may result in differences in the reconstructed networks.
Collapse
Affiliation(s)
- M Kaan Arici
- Graduate School of Informatics, Middle East Technical University, Ankara, Turkey.,Foot and Mouth Diseases Institute, Ministry of Agriculture and Forestry, Ankara, Turkey
| | - Nurcan Tuncbag
- Chemical and Biological Engineering, College of Engineering, Koc University, Istanbul, Turkey.,School of Medicine, Koc University, Istanbul, Turkey
| |
Collapse
|
4
|
Sledzieski S, Singh R, Cowen L, Berger B. D-SCRIPT translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions. Cell Syst 2021; 12:969-982.e6. [PMID: 34536380 PMCID: PMC8586911 DOI: 10.1016/j.cels.2021.08.010] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2021] [Revised: 08/01/2021] [Accepted: 08/19/2021] [Indexed: 11/29/2022]
Abstract
We combine advances in neural language modeling and structurally motivated design to develop D-SCRIPT, an interpretable and generalizable deep-learning model, which predicts interaction between two proteins using only their sequence and maintains high accuracy with limited training data and across species. We show that a D-SCRIPT model trained on 38,345 human PPIs enables significantly improved functional characterization of fly proteins compared with the state-of-the-art approach. Evaluating the same D-SCRIPT model on protein complexes with known 3D structure, we find that the inter-protein contact map output by D-SCRIPT has significant overlap with the ground truth. We apply D-SCRIPT to screen for PPIs in cow (Bos taurus) at a genome-wide scale and focusing on rumen physiology, identify functional gene modules related to metabolism and immune response. The predicted interactions can then be leveraged for function prediction at scale, addressing the genome-to-phenome challenge, especially in species where little data are available.
Collapse
Affiliation(s)
- Samuel Sledzieski
- Computer Science and Artificial Intelligence Lab., Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Rohit Singh
- Computer Science and Artificial Intelligence Lab., Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, USA.
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Lab., Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
| |
Collapse
|
5
|
Devkota K, Murphy JM, Cowen LJ. GLIDE: combining local methods and diffusion state embeddings to predict missing interactions in biological networks. Bioinformatics 2021; 36:i464-i473. [PMID: 32657369 PMCID: PMC7355260 DOI: 10.1093/bioinformatics/btaa459] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
Motivation One of the core problems in the analysis of biological networks is the link prediction problem. In particular, existing interactions networks are noisy and incomplete snapshots of the true network, with many true links missing because those interactions have not yet been experimentally observed. Methods to predict missing links have been more extensively studied for social than for biological networks; it was recently argued that there is some special structure in protein–protein interaction (PPI) network data that might mean that alternate methods may outperform the best methods for social networks. Based on a generalization of the diffusion state distance, we design a new embedding-based link prediction method called global and local integrated diffusion embedding (GLIDE). GLIDE is designed to effectively capture global network structure, combined with alternative network type-specific customized measures that capture local network structure. We test GLIDE on a collection of three recently curated human biological networks derived from the 2016 DREAM disease module identification challenge as well as a classical version of the yeast PPI network in rigorous cross validation experiments. Results We indeed find that different local network structure is dominant in different types of biological networks. We find that the simple local network measures are dominant in the highly connected network core between hub genes, but that GLIDE’s global embedding measure adds value in the rest of the network. For example, we make GLIDE-based link predictions from genes known to be involved in Crohn’s disease, to genes that are not known to have an association, and make some new predictions, finding support in other network data and the literature. Availability and implementation GLIDE can be downloaded at https://bitbucket.org/kap_devkota/glide. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
| | - James M Murphy
- Department of Mathematics, Tufts University, Medford, MA 02155, USA
| | - Lenore J Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
| |
Collapse
|
6
|
Gu S, Milenković T. Data-driven biological network alignment that uses topological, sequence, and functional information. BMC Bioinformatics 2021; 22:34. [PMID: 33514304 PMCID: PMC7847157 DOI: 10.1186/s12859-021-03971-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2020] [Accepted: 01/15/2021] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND Network alignment (NA) can transfer functional knowledge between species' conserved biological network regions. Traditional NA assumes that it is topological similarity (isomorphic-like matching) between network regions that corresponds to the regions' functional relatedness. However, we recently found that functionally unrelated proteins are as topologically similar as functionally related proteins. So, we redefined NA as a data-driven method called TARA, which learns from network and protein functional data what kind of topological relatedness (rather than similarity) between proteins corresponds to their functional relatedness. TARA used topological information (within each network) but not sequence information (between proteins across networks). Yet, TARA yielded higher protein functional prediction accuracy than existing NA methods, even those that used both topological and sequence information. RESULTS Here, we propose TARA++ that is also data-driven, like TARA and unlike other existing methods, but that uses across-network sequence information on top of within-network topological information, unlike TARA. To deal with the within-and-across-network analysis, we adapt social network embedding to the problem of biological NA. TARA++ outperforms protein functional prediction accuracy of existing methods. CONCLUSIONS As such, combining research knowledge from different domains is promising. Overall, improvements in protein functional prediction have biomedical implications, for example allowing researchers to better understand how cancer progresses or how humans age.
Collapse
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, Eck Institute for Global Health, Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Tijana Milenković
- Department of Computer Science and Engineering, Eck Institute for Global Health, Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, 46556, USA.
| |
Collapse
|
7
|
Wang XW, Chen Y, Liu YY. Link Prediction through Deep Generative Model. iScience 2020; 23:101626. [PMID: 33103070 PMCID: PMC7575873 DOI: 10.1016/j.isci.2020.101626] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2020] [Revised: 06/19/2020] [Accepted: 09/24/2020] [Indexed: 02/04/2023] Open
Abstract
Inferring missing links based on the currently observed network is known as link prediction, which has tremendous real-world applications in biomedicine, e-commerce, social media, and criminal intelligence. Numerous methods have been proposed to solve the link prediction problem. Yet, many of these methods are designed for undirected networks only and based on domain-specific heuristics. Here we developed a new link prediction method based on deep generative models, which does not rely on any domain-specific heuristic and works for general undirected or directed complex networks. Our key idea is to represent the adjacency matrix of a network as an image and then learn hierarchical feature representations of the image by training a deep generative model. Those features correspond to structural patterns in the network at different scales, from small subgraphs to mesoscopic communities. When applied to various real-world networks from different domains, our method shows overall superior performance against existing methods. A novel link prediction method based on deep generative models is developed This method works for general undirected or directed complex networks Leveraging structural patterns at different scales, this method outperforms others
Collapse
Affiliation(s)
- Xu-Wen Wang
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
| | - Yize Chen
- Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98195, USA
| | - Yang-Yu Liu
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
| |
Collapse
|
8
|
Abstract
In this study, we deal with the problem of biological network alignment (NA), which aims to find a node mapping between species' molecular networks that uncovers similar network regions, thus allowing for the transfer of functional knowledge between the aligned nodes. We provide evidence that current NA methods, which assume that topologically similar nodes (i.e., nodes whose network neighborhoods are isomorphic-like) have high functional relatedness, do not actually end up aligning functionally related nodes. That is, we show that the current topological similarity assumption does not hold well. Consequently, we argue that a paradigm shift is needed with how the NA problem is approached. So, we redefine NA as a data-driven framework, called TARA (data-driven NA), which attempts to learn the relationship between topological relatedness and functional relatedness without assuming that topological relatedness corresponds to topological similarity. TARA makes no assumptions about what nodes should be aligned, distinguishing it from existing NA methods. Specifically, TARA trains a classifier to predict whether two nodes from different networks are functionally related based on their network topological patterns (features). We find that TARA is able to make accurate predictions. TARA then takes each pair of nodes that are predicted as related to be part of an alignment. Like traditional NA methods, TARA uses this alignment for the across-species transfer of functional knowledge. TARA as currently implemented uses topological but not protein sequence information for functional knowledge transfer. In this context, we find that TARA outperforms existing state-of-the-art NA methods that also use topological information, WAVE and SANA, and even outperforms or complements a state-of-the-art NA method that uses both topological and sequence information, PrimAlign. Hence, adding sequence information to TARA, which is our future work, is likely to further improve its performance. The software and data are available at http://www.nd.edu/~cone/TARA/.
Collapse
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
| |
Collapse
|
9
|
Liu C, Ma Y, Zhao J, Nussinov R, Zhang YC, Cheng F, Zhang ZK. Computational network biology: Data, models, and applications. PHYSICS REPORTS 2020; 846:1-66. [DOI: 10.1016/j.physrep.2019.12.004] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/03/2025]
|
10
|
Yue Z, Nguyen T, Zhang E, Zhang J, Chen JY. WIPER: Weighted in-Path Edge Ranking for biomolecular association networks. QUANTITATIVE BIOLOGY 2019; 7:313-326. [PMID: 38525413 PMCID: PMC10959292 DOI: 10.1007/s40484-019-0180-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2019] [Revised: 08/02/2019] [Accepted: 08/08/2019] [Indexed: 10/25/2022]
Abstract
Background In network biology researchers generate biomolecular networks with candidate genes or proteins experimentally-derived from high-throughput data and known biomolecular associations. Current bioinformatics research focuses on characterizing candidate genes/proteins, or nodes, with network characteristics, e.g., betweenness centrality. However, there have been few research reports to characterize and prioritize biomolecular associations ("edges"), which can represent gene regulatory events essential to biological processes. Method We developed Weighted In-Path Edge Ranking (WIPER), a new computational algorithm which can help evaluate all biomolecular interactions/associations ("edges") in a network model and generate a rank order of every edge based on their in-path traversal scores and statistical significance test result. To validate whether WIPER worked as we designed, we tested the algorithm on synthetic network models. Results Our results showed WIPER can reliably discover both critical "well traversed in-path edges", which are statistically more traversed than normal edges, and "peripheral in-path edges", which are less traversed than normal edges. Compared with other simple measures such as betweenness centrality, WIPER provides better biological interpretations. In the case study of analyzing postanal pig hearts gene expression, WIPER highlighted new signaling pathways suggestive of cardiomyocyte regeneration and proliferation. In the case study of Alzheimer's disease genetic disorder association, WIPER reports SRC:APP, AR:APP, APP:FYN, and APP:NES edges (gene-gene associations) both statistically and biologically important from PubMed co-citation. Conclusion We believe that WIPER will become an essential software tool to help biologists discover and validate essential signaling/regulatory events from high-throughput biology data in the context of biological networks. Availability The free WIPER API is described at discovery.informatics.uab.edu/wiper/.
Collapse
Affiliation(s)
- Zongliang Yue
- Informatics Institute, School of Medicine, University of Alabama, Birmingham, AL 35233, USA
| | - Thanh Nguyen
- Informatics Institute, School of Medicine, University of Alabama, Birmingham, AL 35233, USA
| | - Eric Zhang
- Department of Biomedical Engineering, University of Alabama, Birmingham, AL 35233, USA
| | - Jianyi Zhang
- Department of Biomedical Engineering, University of Alabama, Birmingham, AL 35233, USA
| | - Jake Y. Chen
- Informatics Institute, School of Medicine, University of Alabama, Birmingham, AL 35233, USA
- Department of Biomedical Engineering, University of Alabama, Birmingham, AL 35233, USA
- Department of Computer Science, University of Alabama, Birmingham, AL 35233, USA
| |
Collapse
|
11
|
Rossi RA, Zhou R, Ahmed NK. Estimation of Graphlet Counts in Massive Networks. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:44-57. [PMID: 29994543 DOI: 10.1109/tnnls.2018.2826529] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Graphlets are induced subgraphs of a large network and are important for understanding and modeling complex networks. Despite their practical importance, graphlets have been severely limited to applications and domains with relatively small graphs. Most previous work has focused on exact algorithms; however, it is often too expensive to compute graphlets exactly in massive networks with billions of edges, and finding an approximate count is usually sufficient for many applications. In this paper, we propose an unbiased graphlet estimation framework that is: (a) fast with large speedups compared to the state of the art; (b) parallel with nearly linear speedups; (c) accurate with less than 1% relative error; (d) scalable and space efficient for massive networks with billions of edges; and (e) effective for a variety of real-world settings as well as estimating global and local graphlet statistics (e.g., counts). On 300 networks from 20 domains, we obtain <1% relative error for all graphlets. This is vastly more accurate than the existing methods while using less data. Moreover, it takes a few seconds on billion edge graphs (as opposed to days/weeks). These are by far the largest graphlet computations to date.
Collapse
|
12
|
Zhu Y, Li Y, Liu J, Qin L, Yu JX. Discovering large conserved functional components in global network alignment by graph matching. BMC Genomics 2018; 19:670. [PMID: 30255780 PMCID: PMC6157291 DOI: 10.1186/s12864-018-5027-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Aligning protein-protein interaction (PPI) networks is very important to discover the functionally conserved sub-structures between different species. In recent years, the global PPI network alignment problem has been extensively studied aiming at finding the one-to-one alignment with the maximum matching score. However, finding large conserved components remains challenging due to its NP-hardness. RESULTS We propose a new graph matching method GMAlign for global PPI network alignment. It first selects some pairs of important proteins as seeds, followed by a gradual expansion to obtain an initial matching, and then it refines the current result to obtain an optimal alignment result iteratively based on the vertex cover. We compare GMAlign with the state-of-the-art methods on the PPI network pairs obtained from the largest BioGRID dataset and validate its performance. The results show that our algorithm can produce larger size of alignment, and can find bigger and denser common connected subgraphs as well for the first time. Meanwhile, GMAlign can achieve high quality biological results, as measured by functional consistency and semantic similarity of the Gene Ontology terms. Moreover, we also show that GMAlign can achieve better results which are structurally and biologically meaningful in the detection of large conserved biological pathways between species. CONCLUSIONS GMAlign is a novel global network alignment tool to discover large conserved functional components between PPI networks. It also has many potential biological applications such as conserved pathway and protein complex discovery across species. The GMAlign software and datasets are avaialbile at https://github.com/yzlwhu/GMAlign .
Collapse
Affiliation(s)
- Yuanyuan Zhu
- School of Computer Science, Wuhan University, Bayi Road, Wuhan, 430072 China
| | - Yuezhi Li
- School of Computer Science, Wuhan University, Bayi Road, Wuhan, 430072 China
| | - Juan Liu
- School of Computer Science, Wuhan University, Bayi Road, Wuhan, 430072 China
| | - Lu Qin
- Centre of Quantum Computation and Intelligent Systems, University of Technology, Sydney, Australia
| | - Jeffrey Xu Yu
- The Chinese University of Hong Kong, Hong Kong, China
| |
Collapse
|
13
|
Gu S, Johnson J, Faisal FE, Milenković T. From homogeneous to heterogeneous network alignment via colored graphlets. Sci Rep 2018; 8:12524. [PMID: 30131590 PMCID: PMC6104050 DOI: 10.1038/s41598-018-30831-w] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2018] [Accepted: 08/07/2018] [Indexed: 11/19/2022] Open
Abstract
Network alignment (NA) compares networks with the goal of finding a node mapping that uncovers highly similar (conserved) network regions. Existing NA methods are homogeneous, i.e., they can deal only with networks containing nodes and edges of one type. Due to increasing amounts of heterogeneous network data with nodes or edges of different types, we extend three recent state-of-the-art homogeneous NA methods, WAVE, MAGNA++, and SANA, to allow for heterogeneous NA for the first time. We introduce several algorithmic novelties. Namely, these existing methods compute homogeneous graphlet-based node similarities and then find high-scoring alignments with respect to these similarities, while simultaneously maximizing the amount of conserved edges. Instead, we extend homogeneous graphlets to their heterogeneous counterparts, which we then use to develop a new measure of heterogeneous node similarity. Also, we extend S3, a state-of-the-art measure of edge conservation for homogeneous NA, to its heterogeneous counterpart. Then, we find high-scoring alignments with respect to our heterogeneous node similarity and edge conservation measures. In evaluations on synthetic and real-world biological networks, our proposed heterogeneous NA methods lead to higher-quality alignments and better robustness to noise in the data than their homogeneous counterparts. The software and data from this work is available at https://nd.edu/~cone/colored_graphlets/.
Collapse
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - John Johnson
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Fazle E Faisal
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- Eck Institute for Global Health and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA.
- Eck Institute for Global Health and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, 46556, USA.
| |
Collapse
|
14
|
Yerneni S, Khan IK, Wei Q, Kihara D. IAS: Interaction Specific GO Term Associations for Predicting Protein-Protein Interaction Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1247-1258. [PMID: 26415209 DOI: 10.1109/tcbb.2015.2476809] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Proteins carry out their function in a cell through interactions with other proteins. A large scale protein-protein interaction (PPI) network of an organism provides static yet an essential structure of interactions, which is valuable clue for understanding the functions of proteins and pathways. PPIs are determined primarily by experimental methods; however, computational PPI prediction methods can supplement or verify PPIs identified by experiment. Here, we developed a novel scoring method for predicting PPIs from Gene Ontology (GO) annotations of proteins. Unlike existing methods that consider functional similarity as an indication of interaction between proteins, the new score, named the protein-protein Interaction Association Score (IAS), was computed from GO term associations of known interacting protein pairs in 49 organisms. IAS was evaluated on PPI data of six organisms and found to outperform existing GO term-based scoring methods. Moreover, consensus scoring methods that combine different scores further improved performance of PPI prediction.
Collapse
|
15
|
GRAFENE: Graphlet-based alignment-free network approach integrates 3D structural and sequence (residue order) data to improve protein structural comparison. Sci Rep 2017; 7:14890. [PMID: 29097661 PMCID: PMC5668259 DOI: 10.1038/s41598-017-14411-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2016] [Accepted: 10/11/2017] [Indexed: 12/26/2022] Open
Abstract
Initial protein structural comparisons were sequence-based. Since amino acids that are distant in the sequence can be close in the 3-dimensional (3D) structure, 3D contact approaches can complement sequence approaches. Traditional 3D contact approaches study 3D structures directly and are alignment-based. Instead, 3D structures can be modeled as protein structure networks (PSNs). Then, network approaches can compare proteins by comparing their PSNs. These can be alignment-based or alignment-free. We focus on the latter. Existing network alignment-free approaches have drawbacks: 1) They rely on naive measures of network topology. 2) They are not robust to PSN size. They cannot integrate 3) multiple PSN measures or 4) PSN data with sequence data, although this could improve comparison because the different data types capture complementary aspects of the protein structure. We address this by: 1) exploiting well-established graphlet measures via a new network alignment-free approach, 2) introducing normalized graphlet measures to remove the bias of PSN size, 3) allowing for integrating multiple PSN measures, and 4) using ordered graphlets to combine the complementary PSN data and sequence (specifically, residue order) data. We compare synthetic networks and real-world PSNs more accurately and faster than existing network (alignment-free and alignment-based), 3D contact, or sequence approaches.
Collapse
|
16
|
Huang L, Liao L, Wu CH. Evolutionary analysis and interaction prediction for protein-protein interaction network in geometric space. PLoS One 2017; 12:e0183495. [PMID: 28886027 PMCID: PMC5590856 DOI: 10.1371/journal.pone.0183495] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2017] [Accepted: 08/05/2017] [Indexed: 01/26/2023] Open
Abstract
Prediction of protein-protein interaction (PPI) remains a central task in systems biology. With more PPIs identified, forming PPI networks, it has become feasible and also imperative to study PPIs at the network level, such as evolutionary analysis of the networks, for better understanding of PPI networks and for more accurate prediction of pairwise PPIs by leveraging the information gained at the network level. In this work we developed a novel method that enables us to incorporate evolutionary information into geometric space to improve PPI prediction, which in turn can be used to select and evaluate various evolutionary models. The method is tested with cross-validation using human PPI network and yeast PPI network data. The results show that the accuracy of PPI prediction measured by ROC score is increased by up to 14.6%, as compared to a baseline without using evolutionary information. The results also indicate that our modified evolutionary model DANEOsf—combining a gene duplication/neofunctionalization model and scale-free model—has a better fitness and prediction efficacy for these two PPI networks. The improved PPI prediction performance may suggest that our DANEOsf evolutionary model can uncover the underlying evolutionary mechanism for these two PPI networks better than other tested models. Consequently, of particular importance is that our method offers an effective way to select evolutionary models that best capture the underlying evolutionary mechanisms, evaluating the fitness of evolutionary models from the perspective of PPI prediction on real PPI networks.
Collapse
Affiliation(s)
- Lei Huang
- Department of Computer and Information Sciences, University of Delaware, Newark, DE, United States of America
| | - Li Liao
- Department of Computer and Information Sciences, University of Delaware, Newark, DE, United States of America
- * E-mail:
| | - Cathy H. Wu
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America
| |
Collapse
|
17
|
Yoo B, Faisal FE, Chen H, Milenkovic T. Improving Identification of Key Players in Aging via Network De-Noising and Core Inference. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1056-1069. [PMID: 26529776 DOI: 10.1109/tcbb.2015.2495170] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Current "ground truth" knowledge about human aging has been obtained by transferring aging-related knowledge from well-studied model species via sequence homology or by studying human gene expression data. Since proteins function by interacting with each other, analyzing protein-protein interaction (PPI) networks in the context of aging is promising. Unlike existing static network research of aging, since cellular functioning is dynamic, we recently integrated the static human PPI network with aging-related gene expression data to form dynamic, age-specific networks. Then, we predicted as key players in aging those proteins whose network topologies significantly changed with age. Since current networks are noisy , here, we use link prediction to de-noise the human network and predict improved key players in aging from the de-noised data. Indeed, de-noising gives more significant overlap between the predicted data and the "ground truth" aging-related data. Yet, we obtain novel predictions, which we validate in the literature. Also, we improve the predictions by an alternative strategy: removing "redundant" edges from the age-specific networks and using the resulting age-specific network "cores" to study aging. We produce new knowledge from dynamic networks encompassing multiple data types, via network de-noising or core inference, complementing the existing knowledge obtained from sequence or expression data.
Collapse
|
18
|
Chen B, Li F, Chen S, Hu R, Chen L. Link prediction based on non-negative matrix factorization. PLoS One 2017; 12:e0182968. [PMID: 28854195 PMCID: PMC5576740 DOI: 10.1371/journal.pone.0182968] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2017] [Accepted: 07/27/2017] [Indexed: 02/02/2023] Open
Abstract
With the rapid expansion of internet, the complex networks has become high-dimensional, sparse and redundant. Besides, the problem of link prediction in such networks has also obatined increasingly attention from different types of domains like information science, anthropology, sociology and computer sciences. It makes requirements for effective link prediction techniques to extract the most essential and relevant information for online users in internet. Therefore, this paper attempts to put forward a link prediction algorithm based on non-negative matrix factorization. In the algorithm, we reconstruct the correlation between different types of matrix through the projection of high-dimensional vector space to a low-dimensional one, and then use the similarity between the column vectors of the weight matrix as the scoring matrix. The experiment results demonstrate that the algorithm not only reduces data storage space but also effectively makes the improvements of the prediction performance during the process of sustaining a low time complexity.
Collapse
Affiliation(s)
- Bolun Chen
- College of Computer Engineering, Huaiyin Institute of Technology, Huaian, China
- School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
| | - Fenfen Li
- College of Computer Engineering, Huaiyin Institute of Technology, Huaian, China
| | - Senbo Chen
- School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- * E-mail:
| | - Ronglin Hu
- College of Computer Engineering, Huaiyin Institute of Technology, Huaian, China
| | - Ling Chen
- Department of Computer Science, Yangzhou University, Yangzhou, China
- State Key Lab of Novel Software Tech, Nanjing University, Nanjing, China
| |
Collapse
|
19
|
Wang T, He XS, Zhou MY, Fu ZQ. Link Prediction in Evolving Networks Based on Popularity of Nodes. Sci Rep 2017; 7:7147. [PMID: 28769053 PMCID: PMC5540936 DOI: 10.1038/s41598-017-07315-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2017] [Accepted: 06/26/2017] [Indexed: 01/26/2023] Open
Abstract
Link prediction aims to uncover the underlying relationship behind networks, which could be utilized to predict missing edges or identify the spurious edges. The key issue of link prediction is to estimate the likelihood of potential links in networks. Most classical static-structure based methods ignore the temporal aspects of networks, limited by the time-varying features, such approaches perform poorly in evolving networks. In this paper, we propose a hypothesis that the ability of each node to attract links depends not only on its structural importance, but also on its current popularity (activeness), since active nodes have much more probability to attract future links. Then a novel approach named popularity based structural perturbation method (PBSPM) and its fast algorithm are proposed to characterize the likelihood of an edge from both existing connectivity structure and current popularity of its two endpoints. Experiments on six evolving networks show that the proposed methods outperform state-of-the-art methods in accuracy and robustness. Besides, visual results and statistical analysis reveal that the proposed methods are inclined to predict future edges between active nodes, rather than edges between inactive nodes.
Collapse
Affiliation(s)
- Tong Wang
- Department of Electronic Science and Technology, University of Science and Technology of China, Hefei, 230027, P. R. China
| | - Xing-Sheng He
- Department of Electronic Science and Technology, University of Science and Technology of China, Hefei, 230027, P. R. China
| | - Ming-Yang Zhou
- Guangdong Province Key Laboratory of Popular High Performance Computers, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, P. R. China. .,Physics Department, University of Fribourg, Chemin du Musée 3, Fribourg, CH-1700, Switzerland.
| | - Zhong-Qian Fu
- Department of Electronic Science and Technology, University of Science and Technology of China, Hefei, 230027, P. R. China
| |
Collapse
|
20
|
Abstract
MOTIVATION Network alignment (NA) aims to find a node mapping that conserves similar regions between compared networks. NA is applicable to many fields, including computational biology, where NA can guide the transfer of biological knowledge from well- to poorly-studied species across aligned network regions. Existing NA methods can only align static networks. However, most complex real-world systems evolve over time and should thus be modeled as dynamic networks. We hypothesize that aligning dynamic network representations of evolving systems will produce superior alignments compared to aligning the systems' static network representations, as is currently done. RESULTS For this purpose, we introduce the first ever dynamic NA method, DynaMAGNA ++. This proof-of-concept dynamic NA method is an extension of a state-of-the-art static NA method, MAGNA++. Even though both MAGNA++ and DynaMAGNA++ optimize edge as well as node conservation across the aligned networks, MAGNA++ conserves static edges and similarity between static node neighborhoods, while DynaMAGNA++ conserves dynamic edges (events) and similarity between evolving node neighborhoods. For this purpose, we introduce the first ever measure of dynamic edge conservation and rely on our recent measure of dynamic node conservation. Importantly, the two dynamic conservation measures can be optimized with any state-of-the-art NA method and not just MAGNA++. We confirm our hypothesis that dynamic NA is superior to static NA, on synthetic and real-world networks, in computational biology and social domains. DynaMAGNA++ is parallelized and has a user-friendly graphical interface. AVAILABILITY AND IMPLEMENTATION http://nd.edu/∼cone/DynaMAGNA++/ . CONTACT tmilenko@nd.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- V Vijayan
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
| | - D Critchlow
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
- Department of Physics and Astronomy, Austin Peay State University, Clarksville, Tennessee, TN, USA
| | - T Milenković
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
| |
Collapse
|
21
|
Zhang W, Xu J, Li Y, Zou X. A new two-stage method for revealing missing parts of edges in protein-protein interaction networks. PLoS One 2017; 12:e0177029. [PMID: 28493910 PMCID: PMC5426645 DOI: 10.1371/journal.pone.0177029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2016] [Accepted: 04/20/2017] [Indexed: 12/24/2022] Open
Abstract
With the increasing availability of high-throughput data, various computational methods have recently been developed for understanding the cell through protein-protein interaction (PPI) networks at a systems level. However, due to the incompleteness of the original PPI networks those efforts have been significantly hindered. In this paper, we propose a two stage method to predict underlying links between two originally unlinked protein pairs. First, we measure gene expression and gene functional similarly between unlinked protein pairs on Saccharomyces cerevisiae benchmark network and obtain new constructed networks. Then, we select the significant part of the new predicted links by analyzing the difference between essential proteins that have been identified based on the new constructed networks and the original network. Furthermore, we validate the performance of the new method by using the reliable and comprehensive PPI dataset obtained from the STRING database and compare the new proposed method with four other random walk-based methods. Comparing the results indicates that the new proposed strategy performs well in predicting underlying links. This study provides a general paradigm for predicting new interactions between protein pairs and offers new insights into identifying essential proteins.
Collapse
Affiliation(s)
- Wei Zhang
- School of Science, East China Jiaotong University, Nanchang 330013, China
- * E-mail: (WZ); (XFZ)
| | - Jia Xu
- School of Mechatronic Engineering, East China Jiaotong University, Nanchang 330013, China
| | - Yuanyuan Li
- School of Mathematics and Statistics, Wuhan Institute of Technology in Wuhan, Wuhan, 430072, China
| | - Xiufen Zou
- School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China
- * E-mail: (WZ); (XFZ)
| |
Collapse
|
22
|
Yang J, Yang T, Wu D, Lin L, Yang F, Zhao J. The integration of weighted human gene association networks based on link prediction. BMC SYSTEMS BIOLOGY 2017; 11:12. [PMID: 28137253 PMCID: PMC5282786 DOI: 10.1186/s12918-017-0398-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/15/2016] [Accepted: 01/25/2017] [Indexed: 12/27/2022]
Abstract
Background Physical and functional interplays between genes or proteins have important biological meaning for cellular functions. Some efforts have been made to construct weighted gene association meta-networks by integrating multiple biological resources, where the weight indicates the confidence of the interaction. However, it is found that these existing human gene association networks share only quite limited overlapped interactions, suggesting their incompleteness and noise. Results Here we proposed a workflow to construct a weighted human gene association network using information of six existing networks, including two weighted specific PPI networks and four gene association meta-networks. We applied link prediction algorithm to predict possible missing links of the networks, cross-validation approach to refine each network and finally integrated the refined networks to get the final integrated network. Conclusions The common information among the refined networks increases notably, suggesting their higher reliability. Our final integrated network owns much more links than most of the original networks, meanwhile its links still keep high functional relevance. Being used as background network in a case study of disease gene prediction, the final integrated network presents good performance, implying its reliability and application significance. Our workflow could be insightful for integrating and refining existing gene association data. Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0398-0) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Jian Yang
- Department of Mathematics, Logistical Engineering University, Chongqing, China
| | - Tinghong Yang
- Department of Mathematics, Logistical Engineering University, Chongqing, China
| | - Duzhi Wu
- Department of Mathematics, Logistical Engineering University, Chongqing, China
| | - Limei Lin
- Department of Mathematics, Logistical Engineering University, Chongqing, China
| | - Fan Yang
- Department of Mathematics, Logistical Engineering University, Chongqing, China
| | - Jing Zhao
- Department of Mathematics, Logistical Engineering University, Chongqing, China. .,Institute of Interdisciplinary Complex Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China.
| |
Collapse
|
23
|
SCOUT: simultaneous time segmentation and community detection in dynamic networks. Sci Rep 2016; 6:37557. [PMID: 27881879 PMCID: PMC5121586 DOI: 10.1038/srep37557] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2016] [Accepted: 11/01/2016] [Indexed: 11/24/2022] Open
Abstract
Many evolving complex real-world systems can be modeled via dynamic networks. An important problem in dynamic network research is community detection, which finds groups of topologically related nodes. Typically, this problem is approached by assuming either that each time point has a distinct community organization or that all time points share a single community organization. The reality likely lies between these two extremes. To find the compromise, we consider community detection in the context of the problem of segment detection, which identifies contiguous time periods with consistent network structure. Consequently, we formulate a combined problem of segment community detection (SCD), which simultaneously partitions the network into contiguous time segments with consistent community organization and finds this community organization for each segment. To solve SCD, we introduce SCOUT, an optimization framework that explicitly considers both segmentation quality and partition quality. SCOUT addresses limitations of existing methods that can be adapted to solve SCD, which consider only one of segmentation quality or partition quality. In a thorough evaluation, SCOUT outperforms the existing methods in terms of both accuracy and computational complexity. We apply SCOUT to biological network data to study human aging.
Collapse
|
24
|
Huang XT, Zhu Y, Chan LLH, Zhao Z, Yan H. An integrative C. elegans protein-protein interaction network with reliability assessment based on a probabilistic graphical model. MOLECULAR BIOSYSTEMS 2016; 12:85-92. [PMID: 26555698 DOI: 10.1039/c5mb00417a] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
In Caenorhabditis elegans, a large number of protein-protein interactions (PPIs) are identified by different experiments. However, a comprehensive weighted PPI network, which is essential for signaling pathway inference, is not yet available in this model organism. Therefore, we firstly construct an integrative PPI network in C. elegans with 12,951 interactions involving 5039 proteins from seven molecular interaction databases. Then, a reliability score based on a probabilistic graphical model (RSPGM) is proposed to assess PPIs. It assumes that the random number of interactions between two proteins comes from the Bernoulli distribution to avoid multi-links. The main parameter of the RSPGM score contains a few latent variables which can be considered as several common properties between two proteins. Validations on high-confidence yeast datasets show that RSPGM provides more accurate evaluation than other approaches, and the PPIs in the reconstructed PPI network have higher biological relevance than that in the original network in terms of gene ontology, gene expression, essentiality and the prediction of known protein complexes. Furthermore, this weighted integrative PPI network in C. elegans is employed on inferring interaction path of the canonical Wnt/β-catenin pathway as well. Most genes on the inferred interaction path have been validated to be Wnt pathway components. Therefore, RSPGM is essential and effective for evaluating PPIs and inferring interaction path. Finally, the PPI network with RSPGM scores can be queried and visualized on a user interactive website, which is freely available at .
Collapse
Affiliation(s)
- Xiao-Tai Huang
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
| | - Yuan Zhu
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China and School of Automation, China University of Geosciences, Wuhan, China.
| | - Leanne Lai Hang Chan
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
| | - Zhongying Zhao
- Department of Biology, Faculty of Science, Hong Kong Baptist University, Hong Kong, China
| | - Hong Yan
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
| |
Collapse
|
25
|
Hulovatyy Y, Chen H, Milenković T. Exploring the structure and function of temporal networks with dynamic graphlets. Bioinformatics 2015; 31:i171-80. [PMID: 26072480 PMCID: PMC4765862 DOI: 10.1093/bioinformatics/btv227] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION With increasing availability of temporal real-world networks, how to efficiently study these data? One can model a temporal network as a single aggregate static network, or as a series of time-specific snapshots, each being an aggregate static network over the corresponding time window. Then, one can use established methods for static analysis on the resulting aggregate network(s), but losing in the process valuable temporal information either completely, or at the interface between different snapshots, respectively. Here, we develop a novel approach for studying a temporal network more explicitly, by capturing inter-snapshot relationships. RESULTS We base our methodology on well-established graphlets (subgraphs), which have been proven in numerous contexts in static network research. We develop new theory to allow for graphlet-based analyses of temporal networks. Our new notion of dynamic graphlets is different from existing dynamic network approaches that are based on temporal motifs (statistically significant subgraphs). The latter have limitations: their results depend on the choice of a null network model that is required to evaluate the significance of a subgraph, and choosing a good null model is non-trivial. Our dynamic graphlets overcome the limitations of the temporal motifs. Also, when we aim to characterize the structure and function of an entire temporal network or of individual nodes, our dynamic graphlets outperform the static graphlets. Clearly, accounting for temporal information helps. We apply dynamic graphlets to temporal age-specific molecular network data to deepen our limited knowledge about human aging. AVAILABILITY AND IMPLEMENTATION http://www.nd.edu/∼cone/DG.
Collapse
Affiliation(s)
- Y Hulovatyy
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications, and ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| | - H Chen
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications, and ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| | - T Milenković
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications, and ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| |
Collapse
|
26
|
Crawford J, Sun Y, Milenković T. Fair evaluation of global network aligners. Algorithms Mol Biol 2015; 10:19. [PMID: 26060505 PMCID: PMC4460690 DOI: 10.1186/s13015-015-0050-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2014] [Accepted: 05/10/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Analogous to genomic sequence alignment, biological network alignment identifies conserved regions between networks of different species. Then, function can be transferred from well- to poorly-annotated species between aligned network regions. Network alignment typically encompasses two algorithmic components: node cost function (NCF), which measures similarities between nodes in different networks, and alignment strategy (AS), which uses these similarities to rapidly identify high-scoring alignments. Different methods use both different NCFs and different ASs. Thus, it is unclear whether the superiority of a method comes from its NCF, its AS, or both. We already showed on state-of-the-art methods, MI-GRAAL and IsoRankN, that combining NCF of one method and AS of another method can give a new superior method. Here, we evaluate MI-GRAAL against a newer approach, GHOST, by mixing-and-matching the methods' NCFs and ASs to potentially further improve alignment quality. While doing so, we approach important questions that have not been asked systematically thus far. First, we ask how much of the NCF information should come from protein sequence data compared to network topology data. Existing methods determine this parameter more-less arbitrarily, which could affect alignment quality. Second, when topological information is used in NCF, we ask how large the size of the neighborhoods of the compared nodes should be. Existing methods assume that the larger the neighborhood size, the better. RESULTS Our findings are as follows. MI-GRAAL's NCF is superior to GHOST's NCF, while the performance of the methods' ASs is data-dependent. Thus, for data on which GHOST's AS is superior to MI-GRAAL's AS, the combination of MI-GRAAL's NCF and GHOST's AS represents a new superior method. Also, which amount of sequence information is used within NCF does not affect alignment quality, while the inclusion of topological information is crucial for producing good alignments. Finally, larger neighborhood sizes are preferred, but often, it is the second largest size that is superior. Using this size instead of the largest one would decrease computational complexity. CONCLUSION Taken together, our results represent general recommendations for a fair evaluation of network alignment methods and in particular of two-stage NCF-AS approaches.
Collapse
|
27
|
Zhu L, Deng SP, Huang DS. A Two-Stage Geometric Method for Pruning Unreliable Links in Protein-Protein Networks. IEEE Trans Nanobioscience 2015; 14:528-34. [PMID: 25861086 DOI: 10.1109/tnb.2015.2420754] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Protein-protein interactions (PPIs) play essential roles for determining the outcomes of most of the cellular functions of the cell. Although the experimentally detected high-throughput PPI data promise new opportunities for the study of many biological mechanisms including cellular metabolism and protein functions, experimentally detected PPIs have high levels of false positive rate. Therefore, it is of high practical value to develop novel computational tools for pruning low-confidence PPIs. In this paper, we propose a new geometric approach called Leave-One-Out Logistic Metric Embedding (LOO-LME) for assessing the reliability of interactions. Unlike previous approaches which mainly seek to preserve the noisy topological information of the PPI networks in the embedding space, LOO-LME first transforms the learning task into an equivalent discriminant form, then directly deals with the uncertainty in PPI networks using a leave-one-out-style approach. The experimental results show that LOO-LME substantially outperforms previous methods on PPI assessment problems. LOO-LME could thus facilitate further graph-based studies of PPIs and may help infer their hidden underlying biological knowledge.
Collapse
|
28
|
Segura J, Sorzano COS, Cuenca-Alba J, Aloy P, Carazo JM. Using neighborhood cohesiveness to infer interactions between protein domains. Bioinformatics 2015; 31:2545-52. [DOI: 10.1093/bioinformatics/btv188] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2014] [Accepted: 03/28/2015] [Indexed: 01/18/2023] Open
|
29
|
Sun Y, Crawford J, Tang J, Milenković T. Simultaneous Optimization of both Node and Edge Conservation in Network Alignment via WAVE. LECTURE NOTES IN COMPUTER SCIENCE 2015. [DOI: 10.1007/978-3-662-48221-6_2] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
|
30
|
Faisal FE, Zhao H, Milenkovic T. Global Network Alignment in the Context of Aging. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015; 12:40-52. [PMID: 26357077 DOI: 10.1109/tcbb.2014.2326862] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Analogous to sequence alignment, network alignment (NA) can be used to transfer biological knowledge across species between conserved network regions. NA faces two algorithmic challenges: 1) Which cost function to use to capture "similarities" between nodes in different networks? 2) Which alignment strategy to use to rapidly identify "high-scoring" alignments from all possible alignments? We "break down" existing state-of-the-art methods that use both different cost functions and different alignment strategies to evaluate each combination of their cost functions and alignment strategies. We find that a combination of the cost function of one method and the alignment strategy of another method beats the existing methods. Hence, we propose this combination as a novel superior NA method. Then, since human aging is hard to study experimentally due to long lifespan, we use NA to transfer aging-related knowledge from well annotated model species to poorly annotated human. By doing so, we produce novel human aging-related knowledge, which complements currently available knowledge about aging that has been obtained mainly by sequence alignment. We demonstrate significant similarity between topological and functional properties of our novel predictions and those of known aging-related genes. We are the first to use NA to learn more about aging.
Collapse
|
31
|
|