1
|
Menor-Flores M, Vega-Rodríguez MA. A protein-protein interaction network aligner study in the multi-objective domain. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 250:108188. [PMID: 38657382 DOI: 10.1016/j.cmpb.2024.108188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/10/2023] [Revised: 04/14/2024] [Accepted: 04/17/2024] [Indexed: 04/26/2024]
Abstract
BACKGROUND AND OBJECTIVE The protein-protein interaction (PPI) network alignment has proven to be an efficient technique in the diagnosis and prevention of certain diseases. However, the difficulty in maximizing, at the same time, the two qualities that measure the goodness of alignments (topological and biological quality) has led aligners to produce very different alignments. Thus making a comparative study among alignments of such different qualities a big challenge. Multi-objective optimization is a computer method, which is very powerful in this kind of contexts because both conflicting qualities are considered together. Analysing the alignments of each PPI network aligner with multi-objective methodologies allows you to visualize a bigger picture of the alignments and their qualities, obtaining very interesting conclusions. This paper proposes a comprehensive PPI network aligner study in the multi-objective domain. METHODS Alignments from each aligner and all aligners together were studied and compared to each other via Pareto dominance methodologies. The best alignments produced by each aligner and all aligners together for five different alignment scenarios were displayed in Pareto front graphs. Later, the aligners were ranked according to the topological, biological, and combined quality of their alignments. Finally, the aligners were also ranked based on their average runtimes. RESULTS Regarding aligners constructing the best overall alignments, we found that SAlign, BEAMS, SANA, and HubAlign are the best options. Additionally, the alignments of best topological quality are produced by: SANA, SAlign, and HubAlign aligners. On the contrary, the aligners returning the alignments of best biological quality are: BEAMS, TAME, and WAVE. However, if there are time constraints, it is recommended to select SAlign to obtain high topological quality alignments and PISwap or SAlign aligners for high biological quality alignments. CONCLUSIONS The use of the SANA aligner is recommended for obtaining the best alignments of topological quality, BEAMS for alignments of the best biological quality, and SAlign for alignments of the best combined topological and biological quality. Simultaneously, SANA and BEAMS have above-average runtimes. Therefore, it is suggested, if necessary due to time restrictions, to choose other, faster aligners like SAlign or PISwap whose alignments are also of high quality.
Collapse
Affiliation(s)
- Manuel Menor-Flores
- Escuela Politécnica, Universidad de Extremadura,(1) Campus Universitario s/n, 10003 Cáceres, Spain.
| | - Miguel A Vega-Rodríguez
- Escuela Politécnica, Universidad de Extremadura,(1) Campus Universitario s/n, 10003 Cáceres, Spain.
| |
Collapse
|
2
|
Hayes WB. Exact p-values for global network alignments via combinatorial analysis of shared GO terms : REFANGO: Rigorous Evaluation of Functional Alignments of Networks using Gene Ontology. J Math Biol 2024; 88:50. [PMID: 38551701 PMCID: PMC10980677 DOI: 10.1007/s00285-024-02058-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2020] [Revised: 01/21/2024] [Accepted: 02/05/2024] [Indexed: 04/01/2024]
Abstract
Network alignment aims to uncover topologically similar regions in the protein-protein interaction (PPI) networks of two or more species under the assumption that topologically similar regions tend to perform similar functions. Although there exist a plethora of both network alignment algorithms and measures of topological similarity, currently no "gold standard" exists for evaluating how well either is able to uncover functionally similar regions. Here we propose a formal, mathematically and statistically rigorous method for evaluating the statistical significance of shared GO terms in a global, 1-to-1 alignment between two PPI networks. Given an alignment in which k aligned protein pairs share a particular GO term g, we use a combinatorial argument to precisely quantify the p-value of that alignment with respect to g compared to a random alignment. The p-value of the alignment with respect to all GO terms, including their inter-relationships, is approximated using the Empirical Brown's Method. We note that, just as with BLAST's p-values, this method is not designed to guide an alignment algorithm towards a solution; instead, just as with BLAST, an alignment is guided by a scoring matrix or function; the p-values herein are computed after the fact, providing independent feedback to the user on the biological quality of the alignment that was generated by optimizing the scoring function. Importantly, we demonstrate that among all GO-based measures of network alignments, ours is the only one that correlates with the precision of GO annotation predictions, paving the way for network alignment-based protein function prediction.
Collapse
Affiliation(s)
- Wayne B Hayes
- Department of Computer Science, UC Irvine, Irvine, USA.
| |
Collapse
|
3
|
Menor-Flores M, Vega-Rodríguez MA. Boosting-based ensemble of global network aligners for PPI network alignment. EXPERT SYSTEMS WITH APPLICATIONS 2023; 230:120671. [DOI: 10.1016/j.eswa.2023.120671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/03/2025]
|
4
|
Corominas GR, Blesa MJ, Blum C. AntNetAlign: Ant colony optimization for Network Alignment. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
|
5
|
Carrasco-Santano I, Vega-Rodríguez MA. MOMEA: Multi-Objective Mutation-based Evolutionary Algorithm for the alignment of protein networks. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
|
6
|
Wang S, Chen X, Frederisy BJ, Mbakogu BA, Kanne AD, Khosravi P, Hayes WB. On the current failure-but bright future-of topology-driven biological network alignment. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022; 131:1-44. [PMID: 35871888 DOI: 10.1016/bs.apcsb.2022.05.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
Abstract
Since the function of a protein is defined by its interaction partners, and since we expect similar interaction patterns across species, the alignment of protein-protein interaction (PPI) networks between species, based on network topology alone, should uncover functionally related proteins across species. Surprisingly, despite the publication of more than fifty algorithms aimed at performing PPI network alignment, few have demonstrated a statistically significant link between network topology and functional similarity, and none have demonstrated that orthologs can be recovered using network topology alone. We find that the major contributing factors to this surprising failure are: (i) edge densities in most currently available experimental PPI networks are demonstrably too low to expect topological network alignment to succeed; (ii) in the few cases where the edge densities are high enough, some measures of topological similarity easily uncover functionally similar proteins while others do not; and (iii) most network alignment algorithms to date perform poorly at optimizing even their own topological objective functions, hampering their ability to use topology effectively. We demonstrate that SANA-the Simulated Annealing Network Aligner-significantly outperforms existing aligners at optimizing their own objective functions, even achieving near-optimal solutions when the optimal solution is known. We offer the first demonstration of global network alignments based on topology alone that align functionally similar proteins with p-values in some cases below 10-300. We predict that topological network alignment has a bright future as edge densities increase toward the value where good alignments become possible. We demonstrate that when enough common topology is present at high enough edge densities-for example in the recent, partly synthetic networks of the Integrated Interaction Database-topological network alignment easily recovers most orthologs, paving the way toward high-throughput functional prediction based on topology-driven network alignment.
Collapse
Affiliation(s)
- Siyue Wang
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Xiaoyin Chen
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Brent J Frederisy
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Benedict A Mbakogu
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Amy D Kanne
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Pasha Khosravi
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Wayne B Hayes
- Department of Computer Science, University of California, Irvine, CA, United States.
| |
Collapse
|
7
|
Ma L, Shao Z, Li L, Huang J, Wang S, Lin Q, Li J, Gong M, Nandi AK. Heuristics and metaheuristics for biological network alignment: A review. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.08.156] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
8
|
Menor-Flores M, Vega-Rodríguez MA. Decomposition-based multi-objective optimization approach for PPI network alignment. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108527] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
9
|
Ma L, Wang S, Lin Q, Li J, You Z, Huang J, Gong M. Multi-Neighborhood Learning for Global Alignment in Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2598-2611. [PMID: 32305933 DOI: 10.1109/tcbb.2020.2985838] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
The global alignment of biological networks (GABN) aims to find an optimal alignment between proteins across species, such that both the biological structures and the topological structures of the proteins are maximally conserved. The research on GABN has attracted great attention due to its applications on species evolution, orthology detection and genetic analyses. Most of the existing methods for GABN are difficult to obtain a good tradeoff between the conservation of the biological structures and topological structures. In this paper, we propose a multi-neighborhood learning method for solving GABN (called as CLMNA). CLMNA first models GABN as an optimization of a weighted similarity which evaluates the conserved biological and topological similarities of an alignment, and then it combines a first-proximity, second-proximity and individual-aware proximity learning algorithm to solve the modeled problem. Finally, systematic experiments on 10 pairs of biological networks across 5 species show the superiority of CLMNA over the state-of-the-art network alignment algorithms. They also validate the effectiveness of CLMNA as a refinement method on improving the performance of the compared algorithms.
Collapse
|
10
|
Mahdipour E, Ghasemzadeh M. The protein-protein interaction network alignment using recurrent neural network. Med Biol Eng Comput 2021; 59:2263-2286. [PMID: 34529185 DOI: 10.1007/s11517-021-02428-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2021] [Accepted: 08/05/2021] [Indexed: 11/29/2022]
Abstract
The main challenge of biological network alignment is that the problem of finding the alignments in two graphs is NP-hard. The discovery of protein-protein interaction (PPI) networks is of great importance in bioinformatics due to their utilization in identifying the cellular pathways, finding new medicines, and disease recognition. In this regard, we describe the network alignment method in the form of a classification problem for the very first time and introduce a deep network that finds the alignment of nodes present in the two networks. We call this method RENA, which means Network Alignment using REcurrent neural network. The proposed solution consists of three steps; in the first phase, we obtain the sequence and topological similarities from the networks' structure. For the second phase, the dataset needed for the transformation of the problem into a classification problem is created from obtained features. In the third phase, we predict the nodes' alignment between two networks using deep learning. We used Biogrid dataset for RENA evaluation. The RENA method is compared with three classification approaches of support vector machine, K-nearest neighbors, and linear discriminant analysis. The experimental results demonstrate the efficiency of the RENA method and 100% accuracy in PPI network alignment prediction.
Collapse
Affiliation(s)
- Elham Mahdipour
- Computer Engineering Department at Khavaran Institute of Higher Education, Mashhad, Iran.
| | | |
Collapse
|
11
|
Woo HM, Yoon BJ. MONACO: accurate biological network alignment through optimal neighborhood matching between focal nodes. Bioinformatics 2021; 37:1401-1410. [PMID: 33165517 DOI: 10.1093/bioinformatics/btaa962] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2019] [Revised: 10/19/2020] [Accepted: 11/02/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Alignment of protein-protein interaction networks can be used for the unsupervised prediction of functional modules, such as protein complexes and signaling pathways, that are conserved across different species. To date, various algorithms have been proposed for biological network alignment, many of which attempt to incorporate topological similarity between the networks into the alignment process with the goal of constructing accurate and biologically meaningful alignments. Especially, random walk models have been shown to be effective for quantifying the global topological relatedness between nodes that belong to different networks by diffusing node-level similarity along the interaction edges. However, these schemes are not ideal for capturing the local topological similarity between nodes. RESULTS In this article, we propose MONACO, a novel and versatile network alignment algorithm that finds highly accurate pairwise and multiple network alignments through the iterative optimal matching of 'local' neighborhoods around focal nodes. Extensive performance assessment based on real networks as well as synthetic networks, for which the ground truth is known, demonstrates that MONACO clearly and consistently outperforms all other state-of-the-art network alignment algorithms that we have tested, in terms of accuracy, coherence and topological quality of the aligned network regions. Furthermore, despite the sharply enhanced alignment accuracy, MONACO remains computationally efficient and it scales well with increasing size and number of networks. AVAILABILITY AND IMPLEMENTATION Matlab implementation is freely available at https://github.com/bjyoontamu/MONACO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Hyun-Myung Woo
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.,TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77845, USA.,Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
| |
Collapse
|
12
|
Wang Y, Jeong H, Yoon BJ, Qian X. ClusterM: a scalable algorithm for computational prediction of conserved protein complexes across multiple protein interaction networks. BMC Genomics 2020; 21:615. [PMID: 33208103 PMCID: PMC7677834 DOI: 10.1186/s12864-020-07010-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023] Open
Abstract
Background The current computational methods on identifying conserved protein complexes across multiple Protein-Protein Interaction (PPI) networks suffer from the lack of explicit modeling of the desired topological properties within conserved protein complexes as well as their scalability. Results To overcome those issues, we propose a scalable algorithm—ClusterM—for identifying conserved protein complexes across multiple PPI networks through the integration of network topology and protein sequence similarity information. ClusterM overcomes the computational barrier that existed in previous methods, where the complexity escalates exponentially when handling an increasing number of PPI networks; and it is able to detect conserved protein complexes with both topological separability and cohesive protein sequence conservation. On two independent compendiums of PPI networks from Saccharomyces cerevisiae (Sce, yeast), Drosophila melanogaster (Dme, fruit fly), Caenorhabditis elegans (Cel, worm), and Homo sapiens (Hsa, human), we demonstrate that ClusterM outperforms other state-of-the-art algorithms by a significant margin and is able to identify de novo conserved protein complexes across four species that are missed by existing algorithms. Conclusions ClusterM can better capture the desired topological property of a typical conserved protein complex, which is densely connected within the complex while being well-separated from the rest of the networks. Furthermore, our experiments have shown that ClusterM is highly scalable and efficient when analyzing multiple PPI networks.
Collapse
|
13
|
Ma CY, Liao CS. A review of protein-protein interaction network alignment: From pathway comparison to global alignment. Comput Struct Biotechnol J 2020; 18:2647-2656. [PMID: 33033584 PMCID: PMC7533294 DOI: 10.1016/j.csbj.2020.09.011] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2020] [Revised: 09/01/2020] [Accepted: 09/05/2020] [Indexed: 12/13/2022] Open
Abstract
Network alignment provides a comprehensive way to discover the similar parts between molecular systems of different species based on topological and biological similarity. With such a strong basis, one can do comparative studies at a systems level in the field of computational biology. In this survey paper, we focus on protein-protein interaction networks and review some representative algorithms for network alignment in the past two decades as well as the state-of-the-art aligners. We also introduce the most popular evaluation measures in the literature to benchmark the performance of these approaches. Finally, we address several future challenges and the possible ways to conquer the existing problems of biological network alignment.
Collapse
Affiliation(s)
- Cheng-Yu Ma
- Chang Gung Memorial Hospital, No. 5, Fu-Hsing St., Kuei Shan Dist., Taoyuan City 33305, Taiwan, ROC
| | - Chung-Shou Liao
- National Tsing Hua University, No. 101, Section 2, Kuang-Fu Rd., Hsinchu City 30013, Taiwan, ROC
| |
Collapse
|
14
|
Woo HM, Jeong H, Yoon BJ. NAPAbench 2: A network synthesis algorithm for generating realistic protein-protein interaction (PPI) network families. PLoS One 2020; 15:e0227598. [PMID: 31986158 PMCID: PMC6984706 DOI: 10.1371/journal.pone.0227598] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2019] [Accepted: 12/23/2019] [Indexed: 11/18/2022] Open
Abstract
Comparative network analysis provides effective computational means for gaining novel insights into the structural and functional compositions of biological networks. In recent years, various methods have been developed for biological network alignment, whose main goal is to identify important similarities and critical differences between networks in terms of their topology and composition. A major impediment to advancing network alignment techniques has been the lack of gold-standard benchmarks that can be used for accurate and comprehensive performance assessment of such algorithms. The original NAPAbench (network alignment performance assessment benchmark) was developed to address this problem, and it has been widely utilized by many researchers for the development, evaluation, and comparison of novel network alignment techniques. In this work, we introduce NAPAbench 2-a major update of the original NAPAbench that was introduced in 2012. NAPAbench 2 includes a completely redesigned network synthesis algorithm that can generate protein-protein interaction (PPI) network families whose characteristics closely match those of the latest real PPI networks. Furthermore, the network synthesis algorithm comes with an intuitive GUI that allows users to easily generate PPI network families with an arbitrary number of networks of any size, according to a flexible user-defined phylogeny. In addition, NAPAbench 2 provides updated benchmark datasets-created using the redesigned network synthesis algorithm-which can be used for comprehensive performance assessment of network alignment algorithms and their scalability.
Collapse
Affiliation(s)
- Hyun-Myung Woo
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, United States of America
| | - Hyundoo Jeong
- Department of Mechatronics Engineering, Incheon National University, Incheon, Republic of Korea
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, United States of America
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, United States of America
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States of America
- * E-mail:
| |
Collapse
|
15
|
Hayes WB. An Introductory Guide to Aligning Networks Using SANA, the Simulated Annealing Network Aligner. Methods Mol Biol 2020; 2074:263-284. [PMID: 31583643 DOI: 10.1007/978-1-4939-9873-9_18] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Sequence alignment has had an enormous impact on our understanding of biology, evolution, and disease. The alignment of biological networks holds similar promise. Biological networks generally model interactions between biomolecules such as proteins, genes, metabolites, or mRNAs. There is strong evidence that the network topology-the "structure" of the network-is correlated with the functions performed, so that network topology can be used to help predict or understand function. However, unlike sequence comparison and alignment-which is an essentially solved problem-network comparison and alignment is an NP-complete problem for which heuristic algorithms must be used.Here we introduce SANA, the Simulated Annealing Network Aligner. SANA is one of many algorithms proposed for the arena of biological network alignment. In the context of global network alignment, SANA stands out for its speed, memory efficiency, ease-of-use, and flexibility in the arena of producing alignments between two or more networks. SANA produces better alignments in minutes on a laptop than most other algorithms can produce in hours or days of CPU time on large server-class machines. We walk the user through how to use SANA for several types of biomolecular networks.
Collapse
Affiliation(s)
- Wayne B Hayes
- Department of Computer Science, University of California, Irvine, CA, USA.
| |
Collapse
|
16
|
Wen Bin Goh W, Thalappilly S, Thibault G. Moving beyond the current limits of data analysis in longevity and healthy lifespan studies. Drug Discov Today 2019; 24:2273-2285. [PMID: 31499187 DOI: 10.1016/j.drudis.2019.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Revised: 08/03/2019] [Accepted: 08/28/2019] [Indexed: 11/19/2022]
Abstract
Living longer with sustainable quality of life is becoming increasingly important in aging populations. Understanding associative biological mechanisms have proven daunting, because of multigenicity and population heterogeneity. Although Big Data and Artificial Intelligence (AI) could help, naïve adoption is ill advised. We hold the view that model organisms are better suited for big-data analytics but might lack relevance because they do not immediately reflect the human condition. Resolving this hurdle and bridging the human-model organism gap will require some finesse. This includes improving signal:noise ratios by appropriate contextualization of high-throughput data, establishing consistency across multiple high-throughput platforms, and adopting supporting technologies that provide useful in silico and in vivo validation strategies.
Collapse
Affiliation(s)
- Wilson Wen Bin Goh
- Bio-Data Science and Education Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore.
| | - Subhash Thalappilly
- Lipid Regulation and Cell Stress Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore
| | - Guillaume Thibault
- Lipid Regulation and Cell Stress Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore; Institute of Molecular and Cell Biology, A*STAR, 138673, Singapore.
| |
Collapse
|
17
|
Rantala M, Niemistö H, Karhela T, Sierla S, Vyatkin V. Applying graph matching techniques to enhance reuse of plant design information. COMPUT IND 2019. [DOI: 10.1016/j.compind.2019.01.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
|
18
|
Guzzi PH, Milenkovic T. Survey of local and global biological network alignment: the need to reconcile the two sides of the same coin. Brief Bioinform 2019; 19:472-481. [PMID: 28062413 DOI: 10.1093/bib/bbw132] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2016] [Indexed: 12/23/2022] Open
Abstract
Analogous to genomic sequence alignment that allows for across-species transfer of biological knowledge between conserved sequence regions, biological network alignment can be used to guide the knowledge transfer between conserved regions of molecular networks of different species. Hence, biological network alignment can be used to redefine the traditional notion of a sequence-based homology to a new notion of network-based homology. Analogous to genomic sequence alignment, there exist local and global biological network alignments. Here, we survey prominent and recent computational approaches of each network alignment type and discuss their (dis)advantages. Then, as it was recently shown that the two approach types are complementary, in the sense that they capture different slices of cellular functioning, we discuss the need to reconcile the two network alignment types and present a recent first step in this direction. We conclude with some open research problems on this topic and comment on the usefulness of network alignment in other domains besides computational biology.
Collapse
Affiliation(s)
- Pietro Hiram Guzzi
- Department of Surgical and Medical Sciences, University Magna Graecia, Catanzaro, 88100 Italy
| | - Tijana Milenkovic
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications (iCeNSA), ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| |
Collapse
|
19
|
Hayes WB, Mamano N. SANA NetGO: a combinatorial approach to using Gene Ontology (GO) terms to score network alignments. Bioinformatics 2019; 34:1345-1352. [PMID: 29228175 DOI: 10.1093/bioinformatics/btx716] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2017] [Accepted: 12/04/2017] [Indexed: 01/05/2023] Open
Abstract
Motivation Gene Ontology (GO) terms are frequently used to score alignments between protein-protein interaction (PPI) networks. Methods exist to measure GO similarity between proteins in isolation, but proteins in a network alignment are not isolated: each pairing is dependent on every other via the alignment itself. Existing measures fail to take into account the frequency of GO terms across networks, instead imposing arbitrary rules on when to allow GO terms. Results Here we develop NetGO, a new measure that naturally weighs infrequent, informative GO terms more heavily than frequent, less informative GO terms, without arbitrary cutoffs, instead downweighting GO terms according to their frequency in the networks being aligned. This is a global measure applicable only to alignments, independent of pairwise GO measures, in the same sense that the edge-based EC or S3 scores are global measures of topological similarity independent of pairwise topological similarities. We demonstrate the superiority of NetGO in alignments of predetermined quality and show that NetGO correlates with alignment quality better than any existing GO-based alignment measures. We also demonstrate that NetGO provides a measure of taxonomic similarity between species, consistent with existing taxonomic measuresa feature not shared with existing GObased network alignment measures. Finally, we re-score alignments produced by almost a dozen aligners from a previous study and show that NetGO does a better job at separating good alignments from bad ones. Availability and implementation Available as part of SANA. Contact whayes@uci.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Wayne B Hayes
- Department of Computer Science, University of California, Irvine, CA 92697-3435, USA
| | - Nil Mamano
- Department of Computer Science, University of California, Irvine, CA 92697-3435, USA
| |
Collapse
|
20
|
Djeddi WE, Yahia SB, Nguifo EM. A Novel Computational Approach for Global Alignment for Multiple Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:2060-2066. [PMID: 29994444 DOI: 10.1109/tcbb.2018.2808529] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Due to the rapid progress of biological networks for modeling biological systems, a lot of biomolecular networks have been producing more and more protein-protein interaction (PPI) data. Analyzing protein-protein interaction networks aims to find regions of topological and functional (dis)similarities between molecular networks of different species. The study of PPI networks has the potential to teach us as much about life process and diseases at the molecular level. Although few methods have been developed for multiple PPI network alignment and thus, new network alignment methods are of a compelling need. In this paper, we propose a novel algorithm for a global alignment of multiple protein-protein interaction networks called MAPPIN. The latter relies on information available for the proteins in the networks, such as sequence, function, and network topology. Our algorithm is perfectly designed to exploit current multi-core CPU architectures, and has been extensively tested on a real data (eight species). Our experimental results show that MAPPIN significantly outperforms NetCoffee in terms of coverage. Nevertheless, MAPPIN is handicapped by the time required to load the gene annotation file. An extensive comparison versus the pioneering PPI methods also show that MAPPIN is often efficient in terms of coverage, mean entropy, or mean normalized.
Collapse
|
21
|
Vijayan V, Milenkovic T. Multiple Network Alignment via MultiMAGNA+. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1669-1682. [PMID: 28829315 DOI: 10.1109/tcbb.2017.2740381] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Network alignment (NA) aims to find a node mapping that identifies topologically or functionally similar network regions between molecular networks of different species. Analogous to genomic sequence alignment, NA can be used to transfer biological knowledge from well- to poorly-studied species between aligned network regions. Pairwise NA (PNA) finds similar regions between two networks while multiple NA (MNA) can align more than two networks. We focus on MNA. Existing MNA methods aim to maximize total similarity over all aligned nodes (node conservation). Then, they evaluate alignment quality by measuring the amount of conserved edges, but only after the alignment is constructed. Directly optimizing edge conservation during alignment construction in addition to node conservation may result in superior alignments. Thus, we present a novel MNA method called multiMAGNA++ that can achieve this. Indeed, multiMAGNA++ outperforms or is on par with existing MNA methods, while often completing faster than existing methods. That is, multiMAGNA++ scales well to larger network data and can be parallelized effectively. During method evaluation, we also introduce new MNA quality measures to allow for more fair MNA method comparison compared to the existing alignment quality measures. The multiMAGNA++ code is available on the method's web page at http://nd.edu/~cone/multiMAGNA++/.
Collapse
|
22
|
Huang J, Gong M, Ma L. A Global Network Alignment Method Using Discrete Particle Swarm Optimization. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:705-718. [PMID: 27775534 DOI: 10.1109/tcbb.2016.2618380] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Molecular interactions data increase exponentially with the advance of biotechnology. This makes it possible and necessary to comparatively analyze the different data at a network level. Global network alignment is an important network comparison approach to identify conserved subnetworks and get insight into evolutionary relationship across species. Network alignment which is analogous to subgraph isomorphism is known to be an NP-hard problem. In this paper, we introduce a novel heuristic Particle-Swarm-Optimization based Network Aligner (PSONA), which optimizes a weighted global alignment model considering both protein sequence similarity and interaction conservations. The particle statuses and status updating rules are redefined in a discrete form by using permutation. A seed-and-extend strategy is employed to guide the searching for the superior alignment. The proposed initialization method "seeds" matches with high sequence similarity into the alignment, which guarantees the functional coherence of the mapping nodes. A greedy local search method is designed as the "extension" procedure to iteratively optimize the edge conservations. PSONA is compared with several state-of-art methods on ten network pairs combined by five species. The experimental results demonstrate that the proposed aligner can map the proteins with high functional coherence and can be used as a booster to effectively refine the well-studied aligners.
Collapse
|
23
|
Vyas R, Bapat S, Goel P, Karthikeyan M, Tambe SS, Kulkarni BD. Application of Genetic Programming (GP) Formalism for Building Disease Predictive Models from Protein-Protein Interactions (PPI) Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:27-37. [PMID: 28113781 DOI: 10.1109/tcbb.2016.2621042] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Protein-protein interactions (PPIs) play a vital role in the biological processes involved in the cell functions and disease pathways. The experimental methods known to predict PPIs require tremendous efforts and the results are often hindered by the presence of a large number of false positives. Herein, we demonstrate the use of a new Genetic Programming (GP) based Symbolic Regression (SR) approach for predicting PPIs related to a disease. In a case study, a dataset consisting of one hundred and thirty five PPI complexes related to cancer was used to construct a generic PPI predicting model with good PPI prediction accuracy and generalization ability. A high correlation coefficient(CC) of 0.893, low root mean square error (RMSE) and mean absolute percentage error (MAPE) values of 478.221 and 0.239, respectively were achieved for both the training and test set outputs. To validate the discriminatory nature of the model, it was applied on a dataset of diabetes complexes where it yielded significantly low CC values. Thus, the GP model developed here serves a dual purpose: (a)a predictor of the binding energy of cancer related PPI complexes, and (b)a classifier for discriminating PPI complexes related to cancer from those of other diseases.
Collapse
|
24
|
Elmsallati A, Msalati A, Kalita J. Index-Based Network Aligner of Protein-Protein Interaction Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:330-336. [PMID: 28113986 DOI: 10.1109/tcbb.2016.2613098] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Network Alignment over graph-structured data has received considerable attention in many recent applications. Global network alignment tries to uniquely find the best mapping for a node in one network to only one node in another network. The mapping is performed according to some matching criteria that depend on the nature of data. In molecular biology, functional orthologs, protein complexes, and evolutionary conserved pathways are some examples of information uncovered by global network alignment. Current techniques for global network alignment suffer from several drawbacks, e.g., poor performance and high memory requirements. We address these problems by proposing IBNAL, Indexes-Based Network ALigner, for better alignment quality and faster results. To accelerate the alignment step, IBNAL makes use of a novel clique-based index and is able to align large networks in seconds. IBNAL produces a higher topological quality alignment and comparable biological match in alignment relative to other state-of-the-art aligners even though topological fit is primarily used to match nodes. IBNAL's results confirm and give another evidence that homology information is more likely to be encoded in network topology than sequence information.
Collapse
|
25
|
Mir A, Naghibzadeh M, Saadati N. INDEX: Incremental depth extension approach for protein-protein interaction networks alignment. Biosystems 2017; 162:24-34. [PMID: 28860070 DOI: 10.1016/j.biosystems.2017.08.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2016] [Revised: 05/29/2017] [Accepted: 08/17/2017] [Indexed: 12/11/2022]
Abstract
High-throughput methods have provided us with a large amount of data pertaining to protein-protein interaction networks. The alignment of these networks enables us to better understand biological systems. Given the fact that the alignment of networks is computationally intractable, it is important to introduce a more efficient and accurate algorithm which finds as large as possible similar areas among networks. This paper proposes a new algorithm named INDEX for the global alignment of protein-protein interaction networks. INDEX has multiple phases. First, it computes topological and biological scores of proteins and creates the initial alignment based on the proposed matching score strategy. Using networks topologies and aligned proteins, it then selects a set of high scoring proteins in each phase and extends new alignments around them until final alignment is obtained. Proposing a new alignment strategy, detailed consideration of matching scores, and growth of the alignment core has led INDEX to obtain a larger common connected subgraph with a much greater number of edges compared with previous methods. Regarding other measures such as edge correctness, symmetric substructure score, and runtime, the proposed algorithm performed considerably better than existing popular methods. Our results show that INDEX can be a promising method for identifying functionally conserved interactions. AVAILABILITY The INDEX executable file is available at https://github.com/a-mir/index/.
Collapse
Affiliation(s)
- Abolfazl Mir
- Department of Computer Software Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran.
| | - Mahmoud Naghibzadeh
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Nayyereh Saadati
- Department of Internal Medicine, Ghaem Hospital, Mashhad University of Medical Sciences, Mashhad, Iran
| |
Collapse
|
26
|
Abstract
Paralleling the increasing availability of protein-protein interaction (PPI) network data, several network alignment methods have been proposed. Network alignments have been used to uncover functionally conserved network parts and to transfer annotations. However, due to the computational intractability of the network alignment problem, aligners are heuristics providing divergent solutions and no consensus exists on a gold standard, or which scoring scheme should be used to evaluate them. We comprehensively evaluate the alignment scoring schemes and global network aligners on large scale PPI data and observe that three methods, HUBALIGN, L-GRAAL and NATALIE, regularly produce the most topologically and biologically coherent alignments. We study the collective behaviour of network aligners and observe that PPI networks are almost entirely aligned with a handful of aligners that we unify into a new tool, Ulign. Ulign enables complete alignment of two networks, which traditional global and local aligners fail to do. Also, multiple mappings of Ulign define biologically relevant soft clusterings of proteins in PPI networks, which may be used for refining the transfer of annotations across networks. Hence, PPI networks are already well investigated by current aligners, so to gain additional biological insights, a paradigm shift is needed. We propose such a shift come from aligning all available data types collectively rather than any particular data type in isolation from others.
Collapse
|
27
|
Mamano N, Hayes WB. SANA: simulated annealing far outperforms many other search algorithms for biological network alignment. Bioinformatics 2017; 33:2156-2164. [DOI: 10.1093/bioinformatics/btx090] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2016] [Accepted: 02/08/2017] [Indexed: 11/14/2022] Open
|
28
|
Gong M, Peng Z, Ma L, Huang J. Global Biological Network Alignment by Using Efficient Memetic Algorithm. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:1117-1129. [PMID: 28055895 DOI: 10.1109/tcbb.2015.2511741] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
High-throughput experimental screening techniques have resulted in a large number of biological network data such as protein-protein interactions (PPI) data. The analysis of these data can enhance our understanding of cellular processes. PPI network alignment is one of the comparative analysis methods for analyzing biological networks. Research on PPI networks can identify conserved subgraphs and help us to understand evolutionary trajectories across species. Some evolutionary algorithms have been proposed for coping with PPI network alignment, but most of them are limited by the lower search efficiency due to the lack of the priori knowledge. In this paper, we propose a memetic algorithm, denoted as MeAlgn, to solve the biological network alignment by optimizing an objective function which introduces topological structure and sequence similarities. MeAlign combines genetic algorithm with a local search refinement. The genetic algorithm is to find interesting alignment solution regions, and the local search is to find optimal solutions around the regions. The proposed algorithm first develops a coarse similarity score matrix for initialization and then it uses a specific neighborhood heuristic local search strategy to find an optimal alignment. In MeAlign, the information of topological structure and sequence similarities is used to guide our mapping. Experimental results demonstrate that our algorithm can achieve a better mapping than some state-of-the-art algorithms and it makes a better balance between the network topology and nodes sequence similarities.
Collapse
|
29
|
Meng L, Striegel A, Milenković T. Local versus global biological network alignment. Bioinformatics 2016; 32:3155-3164. [PMID: 27357169 PMCID: PMC5048063 DOI: 10.1093/bioinformatics/btw348] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2015] [Revised: 02/18/2016] [Accepted: 05/23/2016] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Network alignment (NA) aims to find regions of similarities between species' molecular networks. There exist two NA categories: local (LNA) and global (GNA). LNA finds small highly conserved network regions and produces a many-to-many node mapping. GNA finds large conserved regions and produces a one-to-one node mapping. Given the different outputs of LNA and GNA, when a new NA method is proposed, it is compared against existing methods from the same category. However, both NA categories have the same goal: to allow for transferring functional knowledge from well- to poorly-studied species between conserved network regions. So, which one to choose, LNA or GNA? To answer this, we introduce the first systematic evaluation of the two NA categories. RESULTS We introduce new measures of alignment quality that allow for fair comparison of the different LNA and GNA outputs, as such measures do not exist. We provide user-friendly software for efficient alignment evaluation that implements the new and existing measures. We evaluate prominent LNA and GNA methods on synthetic and real-world biological networks. We study the effect on alignment quality of using different interaction types and confidence levels. We find that the superiority of one NA category over the other is context-dependent. Further, when we contrast LNA and GNA in the application of learning novel protein functional knowledge, the two produce very different predictions, indicating their complementarity. Our results and software provide guidelines for future NA method development and evaluation. AVAILABILITY AND IMPLEMENTATION Software: http://www.nd.edu/~cone/LNA_GNA CONTACT: : tmilenko@nd.eduSupplementary information: Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Lei Meng
- Department of Computer Science and Engineering ECK Institute of Global Health and Interdisciplinary Center for Network Science and Applications
| | - Aaron Striegel
- Department of Computer Science and Engineering Wireless Institute, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Tijana Milenković
- Department of Computer Science and Engineering ECK Institute of Global Health and Interdisciplinary Center for Network Science and Applications
| |
Collapse
|
30
|
Elmsallati A, Clark C, Kalita J. Global Alignment of Protein-Protein Interaction Networks: A Survey. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:689-705. [PMID: 26336140 DOI: 10.1109/tcbb.2015.2474391] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
In this paper, we survey algorithms that perform global alignment of networks or graphs. Global network alignment aligns two or more given networks to find the best mapping from nodes in one network to nodes in other networks. Since graphs are a common method of data representation, graph alignment has become important with many significant applications. Protein-protein interactions can be modeled as networks and aligning these networks of protein interactions has many applications in biological research. In this survey, we review algorithms for global pairwise alignment highlighting various proposed approaches, and classify them based on their methodology. Evaluation metrics that are used to measure the quality of the resulting alignments are also surveyed. We discuss and present a comparison between selected aligners on the same datasets and evaluate using the same evaluation metrics. Finally, a quick overview of the most popular databases of protein interaction networks is presented focusing on datasets that have been used recently.
Collapse
|
31
|
Natalie 2.0: Sparse Global Network Alignment as a Special Case of Quadratic Assignment. ALGORITHMS 2015. [DOI: 10.3390/a8041035] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
|