1
|
Song J, Song Z, Gong Y, Ge L, Lou W. Advancing cancer driver gene identification through an integrative network and pathway approach. J Biomed Inform 2024; 158:104729. [PMID: 39306314 DOI: 10.1016/j.jbi.2024.104729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2024] [Revised: 08/29/2024] [Accepted: 09/16/2024] [Indexed: 09/26/2024]
Abstract
OBJECTIVE Cancer is a complex genetic disease characterized by the accumulation of various mutations, with driver genes playing a crucial role in cancer initiation and progression. Distinguishing driver genes from passenger mutations is essential for understanding cancer biology and discovering therapeutic targets. However, the majority of existing methods ignore the mutational heterogeneity and commonalities among patients, which hinders the identification of driver genes more effectively. METHODS This study introduces MCSdriver, a novel computational model that integrates network and pathway information to prioritize the identification of cancer driver genes. MCSdriver employs a bidirectional random walk algorithm to quantify the mutual exclusivity and functional relationships between mutated genes within patient cohorts. It calculates similarity scores based on a mutual exclusivity-weighted network and pathway coverage patterns, accounting for patient-specific heterogeneity and molecular profile similarity. RESULTS This approach enhances the accuracy and quality of driver gene identification. MCSdriver demonstrates superior performance in identifying cancer driver genes across four cancer types from The Cancer Genome Atlas, showing a higher F-score, Recall and Precision compared to existing ranking list-based and module-based models. CONCLUSION The MCSdriver model not only outperforms other models in identifying known cancer driver genes but also effectively identifies novel driver genes involved in cancer-related biological processes. The model's consideration of patient-specific heterogeneity and similarity in molecular profiles significantly enhances the accuracy and quality of driver gene identification. Validation through Gene Ontology enrichment analysis and literature mining further underscores its potential application value in personalized cancer therapy, offering a promising tool for advancing our understanding and treatment of cancer.
Collapse
Affiliation(s)
- Junrong Song
- The School of Information, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China; Yunnan Key Laboratory of Service Computing, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China.
| | - Zhiming Song
- The School of Information, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China; Yunnan Key Laboratory of Service Computing, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China
| | - Yuanli Gong
- The School of Information, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China
| | - Lichang Ge
- The School of Information, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China
| | - Wenlu Lou
- Yunnan Key Laboratory of Service Computing, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China; The School of Business, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China
| |
Collapse
|
2
|
Menor-Flores M, Vega-Rodríguez MA. A protein-protein interaction network aligner study in the multi-objective domain. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 250:108188. [PMID: 38657382 DOI: 10.1016/j.cmpb.2024.108188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/10/2023] [Revised: 04/14/2024] [Accepted: 04/17/2024] [Indexed: 04/26/2024]
Abstract
BACKGROUND AND OBJECTIVE The protein-protein interaction (PPI) network alignment has proven to be an efficient technique in the diagnosis and prevention of certain diseases. However, the difficulty in maximizing, at the same time, the two qualities that measure the goodness of alignments (topological and biological quality) has led aligners to produce very different alignments. Thus making a comparative study among alignments of such different qualities a big challenge. Multi-objective optimization is a computer method, which is very powerful in this kind of contexts because both conflicting qualities are considered together. Analysing the alignments of each PPI network aligner with multi-objective methodologies allows you to visualize a bigger picture of the alignments and their qualities, obtaining very interesting conclusions. This paper proposes a comprehensive PPI network aligner study in the multi-objective domain. METHODS Alignments from each aligner and all aligners together were studied and compared to each other via Pareto dominance methodologies. The best alignments produced by each aligner and all aligners together for five different alignment scenarios were displayed in Pareto front graphs. Later, the aligners were ranked according to the topological, biological, and combined quality of their alignments. Finally, the aligners were also ranked based on their average runtimes. RESULTS Regarding aligners constructing the best overall alignments, we found that SAlign, BEAMS, SANA, and HubAlign are the best options. Additionally, the alignments of best topological quality are produced by: SANA, SAlign, and HubAlign aligners. On the contrary, the aligners returning the alignments of best biological quality are: BEAMS, TAME, and WAVE. However, if there are time constraints, it is recommended to select SAlign to obtain high topological quality alignments and PISwap or SAlign aligners for high biological quality alignments. CONCLUSIONS The use of the SANA aligner is recommended for obtaining the best alignments of topological quality, BEAMS for alignments of the best biological quality, and SAlign for alignments of the best combined topological and biological quality. Simultaneously, SANA and BEAMS have above-average runtimes. Therefore, it is suggested, if necessary due to time restrictions, to choose other, faster aligners like SAlign or PISwap whose alignments are also of high quality.
Collapse
Affiliation(s)
- Manuel Menor-Flores
- Escuela Politécnica, Universidad de Extremadura,(1) Campus Universitario s/n, 10003 Cáceres, Spain.
| | - Miguel A Vega-Rodríguez
- Escuela Politécnica, Universidad de Extremadura,(1) Campus Universitario s/n, 10003 Cáceres, Spain.
| |
Collapse
|
3
|
Hayes WB. Exact p-values for global network alignments via combinatorial analysis of shared GO terms : REFANGO: Rigorous Evaluation of Functional Alignments of Networks using Gene Ontology. J Math Biol 2024; 88:50. [PMID: 38551701 PMCID: PMC10980677 DOI: 10.1007/s00285-024-02058-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2020] [Revised: 01/21/2024] [Accepted: 02/05/2024] [Indexed: 04/01/2024]
Abstract
Network alignment aims to uncover topologically similar regions in the protein-protein interaction (PPI) networks of two or more species under the assumption that topologically similar regions tend to perform similar functions. Although there exist a plethora of both network alignment algorithms and measures of topological similarity, currently no "gold standard" exists for evaluating how well either is able to uncover functionally similar regions. Here we propose a formal, mathematically and statistically rigorous method for evaluating the statistical significance of shared GO terms in a global, 1-to-1 alignment between two PPI networks. Given an alignment in which k aligned protein pairs share a particular GO term g, we use a combinatorial argument to precisely quantify the p-value of that alignment with respect to g compared to a random alignment. The p-value of the alignment with respect to all GO terms, including their inter-relationships, is approximated using the Empirical Brown's Method. We note that, just as with BLAST's p-values, this method is not designed to guide an alignment algorithm towards a solution; instead, just as with BLAST, an alignment is guided by a scoring matrix or function; the p-values herein are computed after the fact, providing independent feedback to the user on the biological quality of the alignment that was generated by optimizing the scoring function. Importantly, we demonstrate that among all GO-based measures of network alignments, ours is the only one that correlates with the precision of GO annotation predictions, paving the way for network alignment-based protein function prediction.
Collapse
Affiliation(s)
- Wayne B Hayes
- Department of Computer Science, UC Irvine, Irvine, USA.
| |
Collapse
|
4
|
Menor-Flores M, Vega-Rodríguez MA. Boosting-based ensemble of global network aligners for PPI network alignment. EXPERT SYSTEMS WITH APPLICATIONS 2023; 230:120671. [DOI: 10.1016/j.eswa.2023.120671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/03/2025]
|
5
|
Girisha MN, Badiger VP, Pattar S. A comprehensive review of global alignment of multiple biological networks: background, applications and open issues. NETWORK MODELING ANALYSIS IN HEALTH INFORMATICS AND BIOINFORMATICS 2022; 11:9. [DOI: 10.1007/s13721-022-00353-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/26/2021] [Revised: 12/16/2021] [Accepted: 12/16/2021] [Indexed: 01/03/2025]
|
6
|
Corominas GR, Blesa MJ, Blum C. AntNetAlign: Ant colony optimization for Network Alignment. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
|
7
|
Carrasco-Santano I, Vega-Rodríguez MA. MOMEA: Multi-Objective Mutation-based Evolutionary Algorithm for the alignment of protein networks. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
|
8
|
Wang S, Chen X, Frederisy BJ, Mbakogu BA, Kanne AD, Khosravi P, Hayes WB. On the current failure-but bright future-of topology-driven biological network alignment. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022; 131:1-44. [PMID: 35871888 DOI: 10.1016/bs.apcsb.2022.05.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
Abstract
Since the function of a protein is defined by its interaction partners, and since we expect similar interaction patterns across species, the alignment of protein-protein interaction (PPI) networks between species, based on network topology alone, should uncover functionally related proteins across species. Surprisingly, despite the publication of more than fifty algorithms aimed at performing PPI network alignment, few have demonstrated a statistically significant link between network topology and functional similarity, and none have demonstrated that orthologs can be recovered using network topology alone. We find that the major contributing factors to this surprising failure are: (i) edge densities in most currently available experimental PPI networks are demonstrably too low to expect topological network alignment to succeed; (ii) in the few cases where the edge densities are high enough, some measures of topological similarity easily uncover functionally similar proteins while others do not; and (iii) most network alignment algorithms to date perform poorly at optimizing even their own topological objective functions, hampering their ability to use topology effectively. We demonstrate that SANA-the Simulated Annealing Network Aligner-significantly outperforms existing aligners at optimizing their own objective functions, even achieving near-optimal solutions when the optimal solution is known. We offer the first demonstration of global network alignments based on topology alone that align functionally similar proteins with p-values in some cases below 10-300. We predict that topological network alignment has a bright future as edge densities increase toward the value where good alignments become possible. We demonstrate that when enough common topology is present at high enough edge densities-for example in the recent, partly synthetic networks of the Integrated Interaction Database-topological network alignment easily recovers most orthologs, paving the way toward high-throughput functional prediction based on topology-driven network alignment.
Collapse
Affiliation(s)
- Siyue Wang
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Xiaoyin Chen
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Brent J Frederisy
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Benedict A Mbakogu
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Amy D Kanne
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Pasha Khosravi
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Wayne B Hayes
- Department of Computer Science, University of California, Irvine, CA, United States.
| |
Collapse
|
9
|
Ma L, Shao Z, Li L, Huang J, Wang S, Lin Q, Li J, Gong M, Nandi AK. Heuristics and metaheuristics for biological network alignment: A review. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.08.156] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
10
|
Menor-Flores M, Vega-Rodríguez MA. Decomposition-based multi-objective optimization approach for PPI network alignment. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108527] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
11
|
Abstract
Since the large-scale experimental characterization of protein–protein interactions (PPIs) is not possible for all species, several computational PPI prediction methods have been developed that harness existing data from other species. While PPI network prediction has been extensively used in eukaryotes, microbial network inference has lagged behind. However, bacterial interactomes can be built using the same principles and techniques; in fact, several methods are better suited to bacterial genomes. These predicted networks allow systems-level analyses in species that lack experimental interaction data. This review describes the current network inference and analysis techniques and summarizes the use of computationally-predicted microbial interactomes to date.
Collapse
|
12
|
Pairwise Biological Network Alignment Based on Discrete Bat Algorithm. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:5548993. [PMID: 34777564 PMCID: PMC8580637 DOI: 10.1155/2021/5548993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/19/2021] [Revised: 09/29/2021] [Accepted: 10/13/2021] [Indexed: 11/18/2022]
Abstract
The development of high-throughput technology has provided a reliable technical guarantee for an increased amount of available data on biological networks. Network alignment is used to analyze these data to identify conserved functional network modules and understand evolutionary relationships across species. Thus, an efficient computational network aligner is needed for network alignment. In this paper, the classic bat algorithm is discretized and applied to the network alignment. The bat algorithm initializes the population randomly and then searches for the optimal solution iteratively. Based on the bat algorithm, the global pairwise alignment algorithm BatAlign is proposed. In BatAlign, the individual velocity and the position are represented by a discrete code. BatAlign uses a search algorithm based on objective function that uses the number of conserved edges as the objective function. The similarity between the networks is used to initialize the population. The experimental results showed that the algorithm was able to match proteins with high functional consistency and reach a relatively high topological quality.
Collapse
|
13
|
Mahdipour E, Ghasemzadeh M. The protein-protein interaction network alignment using recurrent neural network. Med Biol Eng Comput 2021; 59:2263-2286. [PMID: 34529185 DOI: 10.1007/s11517-021-02428-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2021] [Accepted: 08/05/2021] [Indexed: 11/29/2022]
Abstract
The main challenge of biological network alignment is that the problem of finding the alignments in two graphs is NP-hard. The discovery of protein-protein interaction (PPI) networks is of great importance in bioinformatics due to their utilization in identifying the cellular pathways, finding new medicines, and disease recognition. In this regard, we describe the network alignment method in the form of a classification problem for the very first time and introduce a deep network that finds the alignment of nodes present in the two networks. We call this method RENA, which means Network Alignment using REcurrent neural network. The proposed solution consists of three steps; in the first phase, we obtain the sequence and topological similarities from the networks' structure. For the second phase, the dataset needed for the transformation of the problem into a classification problem is created from obtained features. In the third phase, we predict the nodes' alignment between two networks using deep learning. We used Biogrid dataset for RENA evaluation. The RENA method is compared with three classification approaches of support vector machine, K-nearest neighbors, and linear discriminant analysis. The experimental results demonstrate the efficiency of the RENA method and 100% accuracy in PPI network alignment prediction.
Collapse
Affiliation(s)
- Elham Mahdipour
- Computer Engineering Department at Khavaran Institute of Higher Education, Mashhad, Iran.
| | | |
Collapse
|
14
|
Woo HM, Yoon BJ. MONACO: accurate biological network alignment through optimal neighborhood matching between focal nodes. Bioinformatics 2021; 37:1401-1410. [PMID: 33165517 DOI: 10.1093/bioinformatics/btaa962] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2019] [Revised: 10/19/2020] [Accepted: 11/02/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Alignment of protein-protein interaction networks can be used for the unsupervised prediction of functional modules, such as protein complexes and signaling pathways, that are conserved across different species. To date, various algorithms have been proposed for biological network alignment, many of which attempt to incorporate topological similarity between the networks into the alignment process with the goal of constructing accurate and biologically meaningful alignments. Especially, random walk models have been shown to be effective for quantifying the global topological relatedness between nodes that belong to different networks by diffusing node-level similarity along the interaction edges. However, these schemes are not ideal for capturing the local topological similarity between nodes. RESULTS In this article, we propose MONACO, a novel and versatile network alignment algorithm that finds highly accurate pairwise and multiple network alignments through the iterative optimal matching of 'local' neighborhoods around focal nodes. Extensive performance assessment based on real networks as well as synthetic networks, for which the ground truth is known, demonstrates that MONACO clearly and consistently outperforms all other state-of-the-art network alignment algorithms that we have tested, in terms of accuracy, coherence and topological quality of the aligned network regions. Furthermore, despite the sharply enhanced alignment accuracy, MONACO remains computationally efficient and it scales well with increasing size and number of networks. AVAILABILITY AND IMPLEMENTATION Matlab implementation is freely available at https://github.com/bjyoontamu/MONACO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Hyun-Myung Woo
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.,TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77845, USA.,Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
| |
Collapse
|
15
|
Erten C, Houdjedj A, Kazan H. Ranking cancer drivers via betweenness-based outlier detection and random walks. BMC Bioinformatics 2021; 22:62. [PMID: 33568049 PMCID: PMC7877041 DOI: 10.1186/s12859-021-03989-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2020] [Accepted: 01/31/2021] [Indexed: 12/04/2022] Open
Abstract
Background Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. A key remaining problem in cancer genomics is the identification of driver genes. Results We propose BetweenNet, a computational approach that integrates genomic data with a protein-protein interaction network to identify cancer driver genes. BetweenNet utilizes a measure based on betweenness centrality on patient specific networks to identify the so-called outlier genes that correspond to dysregulated genes for each patient. Setting up the relationship between the mutated genes and the outliers through a bipartite graph, it employs a random-walk process on the graph, which provides the final prioritization of the mutated genes. We compare BetweenNet against state-of-the art cancer gene prioritization methods on lung, breast, and pan-cancer datasets. Conclusions Our evaluations show that BetweenNet is better at recovering known cancer genes based on multiple reference databases. Additionally, we show that the GO terms and the reference pathways enriched in BetweenNet ranked genes and those that are enriched in known cancer genes overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods.
Collapse
Affiliation(s)
- Cesim Erten
- Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
| | - Aissa Houdjedj
- Electrical and Computer Engineering Graduate Program, Antalya Bilim University, Antalya, Turkey
| | - Hilal Kazan
- Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey.
| |
Collapse
|
16
|
Alcalá A, Alberich R, Llabrés M, Rosselló F, Valiente G. AligNet: alignment of protein-protein interaction networks. BMC Bioinformatics 2020; 21:265. [PMID: 33203353 PMCID: PMC7672851 DOI: 10.1186/s12859-020-3502-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2019] [Accepted: 04/16/2020] [Indexed: 11/23/2022] Open
Abstract
Background All molecular functions and biological processes are carried out by groups of proteins that interact with each other. Metaproteomic data continuously generates new proteins whose molecular functions and relations must be discovered. A widely accepted structure to model functional relations between proteins are protein-protein interaction networks (PPIN), and their analysis and alignment has become a key ingredient in the study and prediction of protein-protein interactions, protein function, and evolutionary conserved assembly pathways of protein complexes. Several PPIN aligners have been proposed, but attaining the right balance between network topology and biological information is one of the most difficult and key points in the design of any PPIN alignment algorithm. Results Motivated by the challenge of well-balanced and efficient algorithms, we have designed and implemented AligNet, a parameter-free pairwise PPIN alignment algorithm aimed at bridging the gap between topologically efficient and biologically meaningful matchings. A comparison of the results obtained with AligNet and with the best aligners shows that AligNet achieves indeed a good balance between topological and biological matching. Conclusion In this paper we present AligNet, a new pairwise global PPIN aligner that produces biologically meaningful alignments, by achieving a good balance between structural matching and protein function conservation, and more efficient computations than state-of-the-art tools.
Collapse
Affiliation(s)
- Adrià Alcalá
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122, Spain.,Balearic Islands Health Research Institute (IdISBa), Palma de Mallorca, E-07010, Spain
| | - Ricardo Alberich
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122, Spain.,Balearic Islands Health Research Institute (IdISBa), Palma de Mallorca, E-07010, Spain
| | - Mercè Llabrés
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122, Spain. .,Balearic Islands Health Research Institute (IdISBa), Palma de Mallorca, E-07010, Spain.
| | - Francesc Rosselló
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122, Spain.,Balearic Islands Health Research Institute (IdISBa), Palma de Mallorca, E-07010, Spain
| | - Gabriel Valiente
- Algorithms, Bioinformatics, Complexity and Formal Methods Research Group, Technical University of Catalonia, Barcelona, E-08034, Spain
| |
Collapse
|
17
|
Llabrés M, Riera G, Rosselló F, Valiente G. Alignment of biological networks by integer linear programming: virus-host protein-protein interaction networks. BMC Bioinformatics 2020; 21:434. [PMID: 33203352 PMCID: PMC7671827 DOI: 10.1186/s12859-020-03733-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2020] [Accepted: 09/03/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The alignment of protein-protein interaction networks was recently formulated as an integer quadratic programming problem, along with a linearization that can be solved by integer linear programming software tools. However, the resulting integer linear program has a huge number of variables and constraints, rendering it of no practical use. RESULTS We present a compact integer linear programming reformulation of the protein-protein interaction network alignment problem, which can be solved using state-of-the-art mathematical modeling and integer linear programming software tools, along with empirical results showing that small biological networks, such as virus-host protein-protein interaction networks, can be aligned in a reasonable amount of time on a personal computer and the resulting alignments are structurally coherent and biologically meaningful. CONCLUSIONS The implementation of the integer linear programming reformulation using current mathematical modeling and integer linear programming software tools provided biologically meaningful alignments of virus-host protein-protein interaction networks.
Collapse
Affiliation(s)
- Mercè Llabrés
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122 Spain
- Balearic Islands Health Research Institute, Palma de Mallorca, E-07010 Spain
| | - Gabriel Riera
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122 Spain
- Balearic Islands Health Research Institute, Palma de Mallorca, E-07010 Spain
| | - Francesc Rosselló
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122 Spain
- Balearic Islands Health Research Institute, Palma de Mallorca, E-07010 Spain
| | - Gabriel Valiente
- Algorithms, Bioinformatics, Complexity and Formal Methods Research Group, Technical University of Catalonia, Barcelona, E-08034 Spain
| |
Collapse
|
18
|
Ayub U, Haider I, Naveed H. SAlign-a structure aware method for global PPI network alignment. BMC Bioinformatics 2020; 21:500. [PMID: 33148180 PMCID: PMC7640460 DOI: 10.1186/s12859-020-03827-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2020] [Accepted: 10/20/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND High throughput experiments have generated a significantly large amount of protein interaction data, which is being used to study protein networks. Studying complete protein networks can reveal more insight about healthy/disease states than studying proteins in isolation. Similarly, a comparative study of protein-protein interaction (PPI) networks of different species reveals important insights which may help in disease analysis and drug design. The study of PPI network alignment can also helps in understanding the different biological systems of different species. It can also be used in transfer of knowledge across different species. Different aligners have been introduced in the last decade but developing an accurate and scalable global alignment algorithm that can ensures the biological significance alignment is still challenging. RESULTS This paper presents a novel global pairwise network alignment algorithm, SAlign, which uses topological and biological information in the alignment process. The proposed algorithm incorporates sequence and structural information for computing biological scores, whereas previous algorithms only use sequence information. The alignment based on the proposed technique shows that the combined effect of structure and sequence results in significantly better pairwise alignments. We have compared SAlign with state-of-art algorithms on the basis of semantic similarity of alignment and the number of aligned nodes on multiple PPI network pairs. The results of SAlign on the network pairs which have high percentage of proteins with available structure are 3-63% semantically better than all existing techniques. Furthermore, it also aligns 5-14% more nodes of these network pairs as compared to existing aligners. The results of SAlign on other PPI network pairs are comparable or better than all existing techniques. We also introduce [Formula: see text], a Monte Carlo based alignment algorithm, that produces multiple network alignments with similar semantic similarity. This helps the user to pick biologically meaningful alignments. CONCLUSION The proposed algorithm has the ability to find the alignments that are more biologically significant/relevant as compared to the alignments of existing aligners. Furthermore, the proposed method is able to generate alternate alignments that help in studying different genes/proteins of the specie.
Collapse
Affiliation(s)
- Umair Ayub
- Department of Computing, National University of Computer and Emerging Sciences, Islamabad, 40100, Pakistan.,Computational Biology Research Lab, Islamabad, 40100, Pakistan
| | - Imran Haider
- Department of Computing, National University of Computer and Emerging Sciences, Islamabad, 40100, Pakistan.,Computational Biology Research Lab, Islamabad, 40100, Pakistan
| | - Hammad Naveed
- Department of Computing, National University of Computer and Emerging Sciences, Islamabad, 40100, Pakistan. .,Computational Biology Research Lab, Islamabad, 40100, Pakistan.
| |
Collapse
|
19
|
Wang Y, Jeong H, Yoon BJ, Qian X. ClusterM: a scalable algorithm for computational prediction of conserved protein complexes across multiple protein interaction networks. BMC Genomics 2020; 21:615. [PMID: 33208103 PMCID: PMC7677834 DOI: 10.1186/s12864-020-07010-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023] Open
Abstract
Background The current computational methods on identifying conserved protein complexes across multiple Protein-Protein Interaction (PPI) networks suffer from the lack of explicit modeling of the desired topological properties within conserved protein complexes as well as their scalability. Results To overcome those issues, we propose a scalable algorithm—ClusterM—for identifying conserved protein complexes across multiple PPI networks through the integration of network topology and protein sequence similarity information. ClusterM overcomes the computational barrier that existed in previous methods, where the complexity escalates exponentially when handling an increasing number of PPI networks; and it is able to detect conserved protein complexes with both topological separability and cohesive protein sequence conservation. On two independent compendiums of PPI networks from Saccharomyces cerevisiae (Sce, yeast), Drosophila melanogaster (Dme, fruit fly), Caenorhabditis elegans (Cel, worm), and Homo sapiens (Hsa, human), we demonstrate that ClusterM outperforms other state-of-the-art algorithms by a significant margin and is able to identify de novo conserved protein complexes across four species that are missed by existing algorithms. Conclusions ClusterM can better capture the desired topological property of a typical conserved protein complex, which is densely connected within the complex while being well-separated from the rest of the networks. Furthermore, our experiments have shown that ClusterM is highly scalable and efficient when analyzing multiple PPI networks.
Collapse
|
20
|
Kazemi E, Grossglauser M. MPGM: Scalable and Accurate Multiple Network Alignment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:2040-2052. [PMID: 31056510 DOI: 10.1109/tcbb.2019.2914050] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Protein-protein interaction (PPI) network alignment is a canonical operation to transfer biological knowledge among species. The alignment of PPI-networks has many applications, such as the prediction of protein function, detection of conserved network motifs, and the reconstruction of species' phylogenetic relationships. A good multiple-network alignment (MNA), by considering the data related to several species, provides a deep understanding of biological networks and system-level cellular processes. With the massive amounts of available PPI data and the increasing number of known PPI networks, the problem of MNA is gaining more attention in the systems-biology studies. In this paper, we introduce a new scalable and accurate algorithm, called MPGM, for aligning multiple networks. The MPGM algorithm has two main steps: (i) SeedGeneration and (ii) MultiplePercolation. In the first step, to generate an initial set of seed tuples, the SeedGeneration algorithm uses only protein sequence similarities. In the second step, to align remaining unmatched nodes, the MultiplePercolation algorithm uses network structures and the seed tuples generated from the first step. We show that, with respect to different evaluation criteria, MPGM outperforms the other state-of-the-art algorithms. In addition, we guarantee the performance of MPGM under certain classes of network models. We introduce a sampling-based stochastic model for generating k correlated networks. We prove that for this model if a sufficient number of seed tuples are available, the MultiplePercolation algorithm correctly aligns almost all the nodes. Our theoretical results are supported by experimental evaluations over synthetic networks.
Collapse
|
21
|
Ma CY, Liao CS. A review of protein-protein interaction network alignment: From pathway comparison to global alignment. Comput Struct Biotechnol J 2020; 18:2647-2656. [PMID: 33033584 PMCID: PMC7533294 DOI: 10.1016/j.csbj.2020.09.011] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2020] [Revised: 09/01/2020] [Accepted: 09/05/2020] [Indexed: 12/13/2022] Open
Abstract
Network alignment provides a comprehensive way to discover the similar parts between molecular systems of different species based on topological and biological similarity. With such a strong basis, one can do comparative studies at a systems level in the field of computational biology. In this survey paper, we focus on protein-protein interaction networks and review some representative algorithms for network alignment in the past two decades as well as the state-of-the-art aligners. We also introduce the most popular evaluation measures in the literature to benchmark the performance of these approaches. Finally, we address several future challenges and the possible ways to conquer the existing problems of biological network alignment.
Collapse
Affiliation(s)
- Cheng-Yu Ma
- Chang Gung Memorial Hospital, No. 5, Fu-Hsing St., Kuei Shan Dist., Taoyuan City 33305, Taiwan, ROC
| | - Chung-Shou Liao
- National Tsing Hua University, No. 101, Section 2, Kuang-Fu Rd., Hsinchu City 30013, Taiwan, ROC
| |
Collapse
|
22
|
Credible seed identification for large-scale structural network alignment. Data Min Knowl Discov 2020. [DOI: 10.1007/s10618-020-00699-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
23
|
Abstract
In this study, we deal with the problem of biological network alignment (NA), which aims to find a node mapping between species' molecular networks that uncovers similar network regions, thus allowing for the transfer of functional knowledge between the aligned nodes. We provide evidence that current NA methods, which assume that topologically similar nodes (i.e., nodes whose network neighborhoods are isomorphic-like) have high functional relatedness, do not actually end up aligning functionally related nodes. That is, we show that the current topological similarity assumption does not hold well. Consequently, we argue that a paradigm shift is needed with how the NA problem is approached. So, we redefine NA as a data-driven framework, called TARA (data-driven NA), which attempts to learn the relationship between topological relatedness and functional relatedness without assuming that topological relatedness corresponds to topological similarity. TARA makes no assumptions about what nodes should be aligned, distinguishing it from existing NA methods. Specifically, TARA trains a classifier to predict whether two nodes from different networks are functionally related based on their network topological patterns (features). We find that TARA is able to make accurate predictions. TARA then takes each pair of nodes that are predicted as related to be part of an alignment. Like traditional NA methods, TARA uses this alignment for the across-species transfer of functional knowledge. TARA as currently implemented uses topological but not protein sequence information for functional knowledge transfer. In this context, we find that TARA outperforms existing state-of-the-art NA methods that also use topological information, WAVE and SANA, and even outperforms or complements a state-of-the-art NA method that uses both topological and sequence information, PrimAlign. Hence, adding sequence information to TARA, which is our future work, is likely to further improve its performance. The software and data are available at http://www.nd.edu/~cone/TARA/.
Collapse
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
| |
Collapse
|
24
|
Woo HM, Jeong H, Yoon BJ. NAPAbench 2: A network synthesis algorithm for generating realistic protein-protein interaction (PPI) network families. PLoS One 2020; 15:e0227598. [PMID: 31986158 PMCID: PMC6984706 DOI: 10.1371/journal.pone.0227598] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2019] [Accepted: 12/23/2019] [Indexed: 11/18/2022] Open
Abstract
Comparative network analysis provides effective computational means for gaining novel insights into the structural and functional compositions of biological networks. In recent years, various methods have been developed for biological network alignment, whose main goal is to identify important similarities and critical differences between networks in terms of their topology and composition. A major impediment to advancing network alignment techniques has been the lack of gold-standard benchmarks that can be used for accurate and comprehensive performance assessment of such algorithms. The original NAPAbench (network alignment performance assessment benchmark) was developed to address this problem, and it has been widely utilized by many researchers for the development, evaluation, and comparison of novel network alignment techniques. In this work, we introduce NAPAbench 2-a major update of the original NAPAbench that was introduced in 2012. NAPAbench 2 includes a completely redesigned network synthesis algorithm that can generate protein-protein interaction (PPI) network families whose characteristics closely match those of the latest real PPI networks. Furthermore, the network synthesis algorithm comes with an intuitive GUI that allows users to easily generate PPI network families with an arbitrary number of networks of any size, according to a flexible user-defined phylogeny. In addition, NAPAbench 2 provides updated benchmark datasets-created using the redesigned network synthesis algorithm-which can be used for comprehensive performance assessment of network alignment algorithms and their scalability.
Collapse
Affiliation(s)
- Hyun-Myung Woo
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, United States of America
| | - Hyundoo Jeong
- Department of Mechatronics Engineering, Incheon National University, Incheon, Republic of Korea
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, United States of America
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, United States of America
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States of America
- * E-mail:
| |
Collapse
|
25
|
Hayes WB. An Introductory Guide to Aligning Networks Using SANA, the Simulated Annealing Network Aligner. Methods Mol Biol 2020; 2074:263-284. [PMID: 31583643 DOI: 10.1007/978-1-4939-9873-9_18] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Sequence alignment has had an enormous impact on our understanding of biology, evolution, and disease. The alignment of biological networks holds similar promise. Biological networks generally model interactions between biomolecules such as proteins, genes, metabolites, or mRNAs. There is strong evidence that the network topology-the "structure" of the network-is correlated with the functions performed, so that network topology can be used to help predict or understand function. However, unlike sequence comparison and alignment-which is an essentially solved problem-network comparison and alignment is an NP-complete problem for which heuristic algorithms must be used.Here we introduce SANA, the Simulated Annealing Network Aligner. SANA is one of many algorithms proposed for the arena of biological network alignment. In the context of global network alignment, SANA stands out for its speed, memory efficiency, ease-of-use, and flexibility in the arena of producing alignments between two or more networks. SANA produces better alignments in minutes on a laptop than most other algorithms can produce in hours or days of CPU time on large server-class machines. We walk the user through how to use SANA for several types of biomolecular networks.
Collapse
Affiliation(s)
- Wayne B Hayes
- Department of Computer Science, University of California, Irvine, CA, USA.
| |
Collapse
|
26
|
Wen Bin Goh W, Thalappilly S, Thibault G. Moving beyond the current limits of data analysis in longevity and healthy lifespan studies. Drug Discov Today 2019; 24:2273-2285. [PMID: 31499187 DOI: 10.1016/j.drudis.2019.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Revised: 08/03/2019] [Accepted: 08/28/2019] [Indexed: 11/19/2022]
Abstract
Living longer with sustainable quality of life is becoming increasingly important in aging populations. Understanding associative biological mechanisms have proven daunting, because of multigenicity and population heterogeneity. Although Big Data and Artificial Intelligence (AI) could help, naïve adoption is ill advised. We hold the view that model organisms are better suited for big-data analytics but might lack relevance because they do not immediately reflect the human condition. Resolving this hurdle and bridging the human-model organism gap will require some finesse. This includes improving signal:noise ratios by appropriate contextualization of high-throughput data, establishing consistency across multiple high-throughput platforms, and adopting supporting technologies that provide useful in silico and in vivo validation strategies.
Collapse
Affiliation(s)
- Wilson Wen Bin Goh
- Bio-Data Science and Education Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore.
| | - Subhash Thalappilly
- Lipid Regulation and Cell Stress Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore
| | - Guillaume Thibault
- Lipid Regulation and Cell Stress Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore; Institute of Molecular and Cell Biology, A*STAR, 138673, Singapore.
| |
Collapse
|
27
|
Kalecky K, Cho YR. PrimAlign: PageRank-inspired Markovian alignment for large biological networks. Bioinformatics 2019; 34:i537-i546. [PMID: 29949962 PMCID: PMC6022567 DOI: 10.1093/bioinformatics/bty288] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open
Abstract
Motivation Cross-species analysis of large-scale protein–protein interaction (PPI) networks has played a significant role in understanding the principles deriving evolution of cellular organizations and functions. Recently, network alignment algorithms have been proposed to predict conserved interactions and functions of proteins. These approaches are based on the notion that orthologous proteins across species are sequentially similar and that topology of PPIs between orthologs is often conserved. However, high accuracy and scalability of network alignment are still a challenge. Results We propose a novel pairwise global network alignment algorithm, called PrimAlign, which is modeled as a Markov chain and iteratively transited until convergence. The proposed algorithm also incorporates the principles of PageRank. This approach is evaluated on tasks with human, yeast and fruit fly PPI networks. The experimental results demonstrate that PrimAlign outperforms several prevalent methods with statistically significant differences in multiple evaluation measures. PrimAlign, which is multi-platform, achieves superior performance in runtime with its linear asymptotic time complexity. Further evaluation is done with synthetic networks and results suggest that popular topological measures do not reflect real precision of alignments. Availability and implementation The source code is available at http://web.ecs.baylor.edu/faculty/cho/PrimAlign. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Karel Kalecky
- Institute of Biomedical Studies, Baylor University, Waco, TX, USA
| | - Young-Rae Cho
- Department of Computer Science, Baylor University, Waco, TX, USA
| |
Collapse
|
28
|
Abstract
BACKGROUND Biological networks describes the mechanisms which govern cellular functions. Temporal networks show how these networks evolve over time. Studying the temporal progression of network topologies is of utmost importance since it uncovers how a network evolves and how it resists to external stimuli and internal variations. Two temporal networks have co-evolving subnetworks if the evolving topologies of these subnetworks remain similar to each other as the network topology evolves over a period of time. In this paper, we consider the problem of identifying co-evolving subnetworks given a pair of temporal networks, which aim to capture the evolution of molecules and their interactions over time. Although this problem shares some characteristics of the well-known network alignment problems, it differs from existing network alignment formulations as it seeks a mapping of the two network topologies that is invariant to temporal evolution of the given networks. This is a computationally challenging problem as it requires capturing not only similar topologies between two networks but also their similar evolution patterns. RESULTS We present an efficient algorithm, Tempo, for solving identifying co-evolving subnetworks with two given temporal networks. We formally prove the correctness of our method. We experimentally demonstrate that Tempo scales efficiently with the size of network as well as the number of time points, and generates statistically significant alignments-even when evolution rates of given networks are high. Our results on a human aging dataset demonstrate that Tempo identifies novel genes contributing to the progression of Alzheimer's, Huntington's and Type II diabetes, while existing methods fail to do so. CONCLUSIONS Studying temporal networks in general and human aging specifically using Tempo enables us to identify age related genes from non age related genes successfully. More importantly, Tempo takes the network alignment problem one huge step forward by moving beyond the classical static network models.
Collapse
Affiliation(s)
- Rasha Elhesha
- University of Florida, CISE Department, Gainesville, Florida, 32611, US
| | - Aisharjya Sarkar
- University of Florida, CISE Department, Gainesville, Florida, 32611, US
| | - Christina Boucher
- University of Florida, CISE Department, Gainesville, Florida, 32611, US
| | - Tamer Kahveci
- University of Florida, CISE Department, Gainesville, Florida, 32611, US.
| |
Collapse
|
29
|
Nelson W, Zitnik M, Wang B, Leskovec J, Goldenberg A, Sharan R. To Embed or Not: Network Embedding as a Paradigm in Computational Biology. Front Genet 2019; 10:381. [PMID: 31118945 PMCID: PMC6504708 DOI: 10.3389/fgene.2019.00381] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2019] [Accepted: 04/09/2019] [Indexed: 12/20/2022] Open
Abstract
Current technology is producing high throughput biomedical data at an ever-growing rate. A common approach to interpreting such data is through network-based analyses. Since biological networks are notoriously complex and hard to decipher, a growing body of work applies graph embedding techniques to simplify, visualize, and facilitate the analysis of the resulting networks. In this review, we survey traditional and new approaches for graph embedding and compare their application to fundamental problems in network biology with using the networks directly. We consider a broad variety of applications including protein network alignment, community detection, and protein function prediction. We find that in all of these domains both types of approaches are of value and their performance depends on the evaluation measures being used and the goal of the project. In particular, network embedding methods outshine direct methods according to some of those measures and are, thus, an essential tool in bioinformatics research.
Collapse
Affiliation(s)
- Walter Nelson
- Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
| | - Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, United States
| | - Bo Wang
- Department of Computer Science, Stanford University, Stanford, CA, United States
- Peter Munk Cardiac Center, University Health Network, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| | - Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, United States
- Chan Zuckerberg Biohub, San Francisco, CA, United States
| | - Anna Goldenberg
- Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| |
Collapse
|
30
|
Hayes WB, Mamano N. SANA NetGO: a combinatorial approach to using Gene Ontology (GO) terms to score network alignments. Bioinformatics 2019; 34:1345-1352. [PMID: 29228175 DOI: 10.1093/bioinformatics/btx716] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2017] [Accepted: 12/04/2017] [Indexed: 01/05/2023] Open
Abstract
Motivation Gene Ontology (GO) terms are frequently used to score alignments between protein-protein interaction (PPI) networks. Methods exist to measure GO similarity between proteins in isolation, but proteins in a network alignment are not isolated: each pairing is dependent on every other via the alignment itself. Existing measures fail to take into account the frequency of GO terms across networks, instead imposing arbitrary rules on when to allow GO terms. Results Here we develop NetGO, a new measure that naturally weighs infrequent, informative GO terms more heavily than frequent, less informative GO terms, without arbitrary cutoffs, instead downweighting GO terms according to their frequency in the networks being aligned. This is a global measure applicable only to alignments, independent of pairwise GO measures, in the same sense that the edge-based EC or S3 scores are global measures of topological similarity independent of pairwise topological similarities. We demonstrate the superiority of NetGO in alignments of predetermined quality and show that NetGO correlates with alignment quality better than any existing GO-based alignment measures. We also demonstrate that NetGO provides a measure of taxonomic similarity between species, consistent with existing taxonomic measuresa feature not shared with existing GObased network alignment measures. Finally, we re-score alignments produced by almost a dozen aligners from a previous study and show that NetGO does a better job at separating good alignments from bad ones. Availability and implementation Available as part of SANA. Contact whayes@uci.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Wayne B Hayes
- Department of Computer Science, University of California, Irvine, CA 92697-3435, USA
| | - Nil Mamano
- Department of Computer Science, University of California, Irvine, CA 92697-3435, USA
| |
Collapse
|
31
|
Jing F, Zhang SW, Zhang S. Brief Survey of Biological Network Alignment and a Variant with Incorporation of Functional Annotations. Curr Bioinform 2018. [DOI: 10.2174/1574893612666171020103747] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background:Biological network alignment has been widely studied in the context of protein-protein interaction (PPI) networks, metabolic networks and others in bioinformatics. The topological structure of networks and genomic sequence are generally used by existing methods for achieving this task.Objective and Method:Here we briefly survey the methods generally used for this task and introduce a variant with incorporation of functional annotations based on similarity in Gene Ontology (GO). Making full use of GO information is beneficial to provide insights into precise biological network alignment.Results and Conclusion:We analyze the effect of incorporation of GO information to network alignment. Finally, we make a brief summary and discuss future directions about this topic.
Collapse
Affiliation(s)
- Fang Jing
- Key Laboratory of Information Fusion Technology of Ministry of Education, College of Automation, Northwestern Polytechnical University, Xi'an 710072, China
| | - Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, College of Automation, Northwestern Polytechnical University, Xi'an 710072, China
| | - Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
| |
Collapse
|
32
|
Djeddi WE, Yahia SB, Nguifo EM. A Novel Computational Approach for Global Alignment for Multiple Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:2060-2066. [PMID: 29994444 DOI: 10.1109/tcbb.2018.2808529] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Due to the rapid progress of biological networks for modeling biological systems, a lot of biomolecular networks have been producing more and more protein-protein interaction (PPI) data. Analyzing protein-protein interaction networks aims to find regions of topological and functional (dis)similarities between molecular networks of different species. The study of PPI networks has the potential to teach us as much about life process and diseases at the molecular level. Although few methods have been developed for multiple PPI network alignment and thus, new network alignment methods are of a compelling need. In this paper, we propose a novel algorithm for a global alignment of multiple protein-protein interaction networks called MAPPIN. The latter relies on information available for the proteins in the networks, such as sequence, function, and network topology. Our algorithm is perfectly designed to exploit current multi-core CPU architectures, and has been extensively tested on a real data (eight species). Our experimental results show that MAPPIN significantly outperforms NetCoffee in terms of coverage. Nevertheless, MAPPIN is handicapped by the time required to load the gene annotation file. An extensive comparison versus the pioneering PPI methods also show that MAPPIN is often efficient in terms of coverage, mean entropy, or mean normalized.
Collapse
|
33
|
Zhu Y, Li Y, Liu J, Qin L, Yu JX. Discovering large conserved functional components in global network alignment by graph matching. BMC Genomics 2018; 19:670. [PMID: 30255780 PMCID: PMC6157291 DOI: 10.1186/s12864-018-5027-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Aligning protein-protein interaction (PPI) networks is very important to discover the functionally conserved sub-structures between different species. In recent years, the global PPI network alignment problem has been extensively studied aiming at finding the one-to-one alignment with the maximum matching score. However, finding large conserved components remains challenging due to its NP-hardness. RESULTS We propose a new graph matching method GMAlign for global PPI network alignment. It first selects some pairs of important proteins as seeds, followed by a gradual expansion to obtain an initial matching, and then it refines the current result to obtain an optimal alignment result iteratively based on the vertex cover. We compare GMAlign with the state-of-the-art methods on the PPI network pairs obtained from the largest BioGRID dataset and validate its performance. The results show that our algorithm can produce larger size of alignment, and can find bigger and denser common connected subgraphs as well for the first time. Meanwhile, GMAlign can achieve high quality biological results, as measured by functional consistency and semantic similarity of the Gene Ontology terms. Moreover, we also show that GMAlign can achieve better results which are structurally and biologically meaningful in the detection of large conserved biological pathways between species. CONCLUSIONS GMAlign is a novel global network alignment tool to discover large conserved functional components between PPI networks. It also has many potential biological applications such as conserved pathway and protein complex discovery across species. The GMAlign software and datasets are avaialbile at https://github.com/yzlwhu/GMAlign .
Collapse
Affiliation(s)
- Yuanyuan Zhu
- School of Computer Science, Wuhan University, Bayi Road, Wuhan, 430072 China
| | - Yuezhi Li
- School of Computer Science, Wuhan University, Bayi Road, Wuhan, 430072 China
| | - Juan Liu
- School of Computer Science, Wuhan University, Bayi Road, Wuhan, 430072 China
| | - Lu Qin
- Centre of Quantum Computation and Intelligent Systems, University of Technology, Sydney, Australia
| | - Jeffrey Xu Yu
- The Chinese University of Hong Kong, Hong Kong, China
| |
Collapse
|
34
|
Elmsallati A, Msalati A, Kalita J. Index-Based Network Aligner of Protein-Protein Interaction Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:330-336. [PMID: 28113986 DOI: 10.1109/tcbb.2016.2613098] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Network Alignment over graph-structured data has received considerable attention in many recent applications. Global network alignment tries to uniquely find the best mapping for a node in one network to only one node in another network. The mapping is performed according to some matching criteria that depend on the nature of data. In molecular biology, functional orthologs, protein complexes, and evolutionary conserved pathways are some examples of information uncovered by global network alignment. Current techniques for global network alignment suffer from several drawbacks, e.g., poor performance and high memory requirements. We address these problems by proposing IBNAL, Indexes-Based Network ALigner, for better alignment quality and faster results. To accelerate the alignment step, IBNAL makes use of a novel clique-based index and is able to align large networks in seconds. IBNAL produces a higher topological quality alignment and comparable biological match in alignment relative to other state-of-the-art aligners even though topological fit is primarily used to match nodes. IBNAL's results confirm and give another evidence that homology information is more likely to be encoded in network topology than sequence information.
Collapse
|
35
|
Alkan F, Erten C. RedNemo: topology-based PPI network reconstruction via repeated diffusion with neighborhood modifications. Bioinformatics 2017; 33:537-544. [PMID: 27797764 DOI: 10.1093/bioinformatics/btw655] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2016] [Accepted: 10/12/2016] [Indexed: 01/28/2023] Open
Abstract
Motivation Analysis of protein-protein interaction (PPI) networks provides invaluable insight into several systems biology problems. High-throughput experimental techniques together with computational methods provide large-scale PPI networks. However, a major issue with these networks is their erroneous nature; they contain false-positive interactions and usually many more false-negatives. Recently, several computational methods have been proposed for network reconstruction based on topology, where given an input PPI network the goal is to reconstruct the network by identifying false-positives/-negatives as correctly as possible. Results We observe that the existing topology-based network reconstruction algorithms suffer several shortcomings. An important issue is regarding the scalability of their computational requirements, especially in terms of execution times, with the network sizes. They have only been tested on small-scale networks thus far and when applied on large-scale networks of popular PPI databases, the executions require unreasonable amounts of time, or may even crash without producing any output for some instances even after several months of execution. We provide an algorithm, RedNemo, for the topology-based network reconstruction problem. It provides more accurate networks than the alternatives as far as biological qualities measured in terms of most metrics based on gene ontology annotations. The recovery of a high-confidence network modified via random edge removals and rewirings is also better with RedNemo than with the alternatives under most of the experimented removal/rewiring ratios. Furthermore, through extensive tests on databases of varying sizes, we show that RedNemo achieves these results with much better running time performances. Availability and Implementation Supplementary material including source code, useful scripts, experimental data and the results are available at http://webprs.khas.edu.tr/~cesim/RedNemo.tar.gz. Contact cesim@khas.edu.tr. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Ferhat Alkan
- Center for Non-coding RNA in Technology and Health.,Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Grønnegardsvej 3, Frederiksberg, DK1870, Denmark
| | - Cesim Erten
- Department of Computer Engineering, Kadir Has University, Cibali, 34083 Istanbul, Turkey
| |
Collapse
|
36
|
Dopazo J, Erten C. Graph-theoretical comparison of normal and tumor networks in identifying BRCA genes. BMC SYSTEMS BIOLOGY 2017; 11:110. [PMID: 29166896 PMCID: PMC5700672 DOI: 10.1186/s12918-017-0495-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/01/2017] [Accepted: 11/13/2017] [Indexed: 12/18/2022]
Abstract
BACKGROUND Identification of driver genes related to certain types of cancer is an important research topic. Several systems biology approaches have been suggested, in particular for the identification of breast cancer (BRCA) related genes. Such approaches usually rely on differential gene expression and/or mutational landscape data. In some cases interaction network data is also integrated to identify cancer-related modules computationally. RESULTS We provide a framework for the comparative graph-theoretical analysis of networks integrating the relevant gene expression, mutations, and potein-protein interaction network data. The comparisons involve a graph-theoretical analysis of normal and tumor network pairs across all instances of a given set of breast cancer samples. The network measures under consideration are based on appropriate formulations of various centrality measures: betweenness, clustering coefficients, degree centrality, random walk distances, graph-theoretical distances, and Jaccard index centrality. CONCLUSIONS Among all the studied centrality-based graph-theoretical properties, we show that a betweenness-based measure differentiates BRCA genes across all normal versus tumor network pairs, than the rest of the popular centrality-based measures. The AUROC and AUPR values of the gene lists ordered with respect to the measures under study as compared to NCBI BioSystems pathway and the COSMIC database of cancer genes are the largest with the betweenness-based differentiation, followed by the measure based on degree centrality. In order to test the robustness of the suggested measures in prioritizing cancer genes, we further tested the two most promising measures, those based on betweenness and degree centralities, on randomly rewired networks. We show that both measures are quite resilient to noise in the input interaction network. We also compared the same measures against a state-of-the-art alternative disease gene prioritization method, MUFFFINN. We show that both our graph-theoretical measures outperform MUFFINN prioritizations in terms of ROC and precions/recall analysis. Finally, we filter the ordered list of the best measure, the betweenness-based differentiation, via a maximum-weight independent set formulation and investigate the top 50 genes in regards to literature verification. We show that almost all genes in the list are verified by the breast cancer literature and three genes are presented as novel genes that may potentialy be BRCA-related but missing in literature.
Collapse
Affiliation(s)
- Joaquin Dopazo
- Clinical Bioinformatics Research Area, Fundación Progreso y Salud, Hospital Virgen del Rocío, Sevilla, Spain
| | - Cesim Erten
- Computer Engineering, Antalya Bilim University, Antalya, Turkey.
| |
Collapse
|
37
|
Mir A, Naghibzadeh M, Saadati N. INDEX: Incremental depth extension approach for protein-protein interaction networks alignment. Biosystems 2017; 162:24-34. [PMID: 28860070 DOI: 10.1016/j.biosystems.2017.08.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2016] [Revised: 05/29/2017] [Accepted: 08/17/2017] [Indexed: 12/11/2022]
Abstract
High-throughput methods have provided us with a large amount of data pertaining to protein-protein interaction networks. The alignment of these networks enables us to better understand biological systems. Given the fact that the alignment of networks is computationally intractable, it is important to introduce a more efficient and accurate algorithm which finds as large as possible similar areas among networks. This paper proposes a new algorithm named INDEX for the global alignment of protein-protein interaction networks. INDEX has multiple phases. First, it computes topological and biological scores of proteins and creates the initial alignment based on the proposed matching score strategy. Using networks topologies and aligned proteins, it then selects a set of high scoring proteins in each phase and extends new alignments around them until final alignment is obtained. Proposing a new alignment strategy, detailed consideration of matching scores, and growth of the alignment core has led INDEX to obtain a larger common connected subgraph with a much greater number of edges compared with previous methods. Regarding other measures such as edge correctness, symmetric substructure score, and runtime, the proposed algorithm performed considerably better than existing popular methods. Our results show that INDEX can be a promising method for identifying functionally conserved interactions. AVAILABILITY The INDEX executable file is available at https://github.com/a-mir/index/.
Collapse
Affiliation(s)
- Abolfazl Mir
- Department of Computer Software Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran.
| | - Mahmoud Naghibzadeh
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Nayyereh Saadati
- Department of Internal Medicine, Ghaem Hospital, Mashhad University of Medical Sciences, Mashhad, Iran
| |
Collapse
|
38
|
Xu W, Cao Y, Xie Z, He H, He S, Hong H, Bo X, Li F. NFPscanner: a webtool for knowledge-based deciphering of biomedical networks. BMC Bioinformatics 2017; 18:262. [PMID: 28521733 PMCID: PMC5437514 DOI: 10.1186/s12859-017-1673-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2016] [Accepted: 05/03/2017] [Indexed: 12/05/2022] Open
Abstract
Background Many biological pathways have been created to represent different types of knowledge, such as genetic interactions, metabolic reactions, and gene-regulating and physical-binding relationships. Biologists are using a wide range of omics data to elaborately construct various context-specific differential molecular networks. However, they cannot easily gain insight into unfamiliar gene networks with the tools that are currently available for pathways resource and network analysis. They would benefit from the development of a standardized tool to compare functions of multiple biological networks quantitatively and promptly. Results To address this challenge, we developed NFPscanner, a web server for deciphering gene networks with pathway associations. Adapted from a recently reported knowledge-based framework called network fingerprint, NFPscanner integrates the annotated pathways of 7 databases, 4 algorithms, and 2 graphical visualization modules into a webtool. It implements 3 types of network analysis:Fingerprint: Deciphering gene networks and highlighting inherent pathway modules Alignment: Discovering functional associations by finding optimized node mapping between 2 gene networks Enrichment: Calculating and visualizing gene ontology (GO) and pathway enrichment for genes in networks
Users can upload gene networks to NFPscanner through the web interface and then interactively explore the networks’ functions. Conclusions NFPscanner is open-source software for non-commercial use, freely accessible at http://biotech.bmi.ac.cn/nfs. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1673-1) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Wenjian Xu
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China
| | - Yang Cao
- Tianjin Institute of Health & Environmental Medicine, 1 Dali Road, Heping District, Tianjin, 300050, China
| | - Ziwei Xie
- Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, 1037 Luoyu Road, Wuhan, 430074, Hubei, China
| | - Haochen He
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China
| | - Song He
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China
| | - Hao Hong
- Department of Biomedical Engineering, National University of Defense Technology, 109 Deya Road, Kaifu District, Changsha, 410073, Hunan, China
| | - Xiaochen Bo
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China.
| | - Fei Li
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China.
| |
Collapse
|
39
|
Abstract
Paralleling the increasing availability of protein-protein interaction (PPI) network data, several network alignment methods have been proposed. Network alignments have been used to uncover functionally conserved network parts and to transfer annotations. However, due to the computational intractability of the network alignment problem, aligners are heuristics providing divergent solutions and no consensus exists on a gold standard, or which scoring scheme should be used to evaluate them. We comprehensively evaluate the alignment scoring schemes and global network aligners on large scale PPI data and observe that three methods, HUBALIGN, L-GRAAL and NATALIE, regularly produce the most topologically and biologically coherent alignments. We study the collective behaviour of network aligners and observe that PPI networks are almost entirely aligned with a handful of aligners that we unify into a new tool, Ulign. Ulign enables complete alignment of two networks, which traditional global and local aligners fail to do. Also, multiple mappings of Ulign define biologically relevant soft clusterings of proteins in PPI networks, which may be used for refining the transfer of annotations across networks. Hence, PPI networks are already well investigated by current aligners, so to gain additional biological insights, a paradigm shift is needed. We propose such a shift come from aligning all available data types collectively rather than any particular data type in isolation from others.
Collapse
|
40
|
Mamano N, Hayes WB. SANA: simulated annealing far outperforms many other search algorithms for biological network alignment. Bioinformatics 2017; 33:2156-2164. [DOI: 10.1093/bioinformatics/btx090] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2016] [Accepted: 02/08/2017] [Indexed: 11/14/2022] Open
|
41
|
Kazemi E, Hassani H, Grossglauser M, Pezeshgi Modarres H. PROPER: global protein interaction network alignment through percolation matching. BMC Bioinformatics 2016; 17:527. [PMID: 27955623 PMCID: PMC5153870 DOI: 10.1186/s12859-016-1395-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2016] [Accepted: 11/29/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The alignment of protein-protein interaction (PPI) networks enables us to uncover the relationships between different species, which leads to a deeper understanding of biological systems. Network alignment can be used to transfer biological knowledge between species. Although different PPI-network alignment algorithms were introduced during the last decade, developing an accurate and scalable algorithm that can find alignments with high biological and structural similarities among PPI networks is still challenging. RESULTS In this paper, we introduce a new global network alignment algorithm for PPI networks called PROPER. Compared to other global network alignment methods, our algorithm shows higher accuracy and speed over real PPI datasets and synthetic networks. We show that the PROPER algorithm can detect large portions of conserved biological pathways between species. Also, using a simple parsimonious evolutionary model, we explain why PROPER performs well based on several different comparison criteria. CONCLUSIONS We highlight that PROPER has high potential in further applications such as detecting biological pathways, finding protein complexes and PPI prediction. The PROPER algorithm is available at http://proper.epfl.ch .
Collapse
Affiliation(s)
- Ehsan Kazemi
- School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland.
| | - Hamed Hassani
- Department of Computer Science, ETHZ, Zurich, Switzerland
| | | | | |
Collapse
|
42
|
Gong M, Peng Z, Ma L, Huang J. Global Biological Network Alignment by Using Efficient Memetic Algorithm. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:1117-1129. [PMID: 28055895 DOI: 10.1109/tcbb.2015.2511741] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
High-throughput experimental screening techniques have resulted in a large number of biological network data such as protein-protein interactions (PPI) data. The analysis of these data can enhance our understanding of cellular processes. PPI network alignment is one of the comparative analysis methods for analyzing biological networks. Research on PPI networks can identify conserved subgraphs and help us to understand evolutionary trajectories across species. Some evolutionary algorithms have been proposed for coping with PPI network alignment, but most of them are limited by the lower search efficiency due to the lack of the priori knowledge. In this paper, we propose a memetic algorithm, denoted as MeAlgn, to solve the biological network alignment by optimizing an objective function which introduces topological structure and sequence similarities. MeAlign combines genetic algorithm with a local search refinement. The genetic algorithm is to find interesting alignment solution regions, and the local search is to find optimal solutions around the regions. The proposed algorithm first develops a coarse similarity score matrix for initialization and then it uses a specific neighborhood heuristic local search strategy to find an optimal alignment. In MeAlign, the information of topological structure and sequence similarities is used to guide our mapping. Experimental results demonstrate that our algorithm can achieve a better mapping than some state-of-the-art algorithms and it makes a better balance between the network topology and nodes sequence similarities.
Collapse
|
43
|
Kursawe J, Bardenet R, Zartman JJ, Baker RE, Fletcher AG. Robust cell tracking in epithelial tissues through identification of maximum common subgraphs. J R Soc Interface 2016; 13:20160725. [PMID: 28334699 PMCID: PMC5134023 DOI: 10.1098/rsif.2016.0725] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2016] [Accepted: 10/17/2016] [Indexed: 11/30/2022] Open
Abstract
Tracking of cells in live-imaging microscopy videos of epithelial sheets is a powerful tool for investigating fundamental processes in embryonic development. Characterizing cell growth, proliferation, intercalation and apoptosis in epithelia helps us to understand how morphogenetic processes such as tissue invagination and extension are locally regulated and controlled. Accurate cell tracking requires correctly resolving cells entering or leaving the field of view between frames, cell neighbour exchanges, cell removals and cell divisions. However, current tracking methods for epithelial sheets are not robust to large morphogenetic deformations and require significant manual interventions. Here, we present a novel algorithm for epithelial cell tracking, exploiting the graph-theoretic concept of a 'maximum common subgraph' to track cells between frames of a video. Our algorithm does not require the adjustment of tissue-specific parameters, and scales in sub-quadratic time with tissue size. It does not rely on precise positional information, permitting large cell movements between frames and enabling tracking in datasets acquired at low temporal resolution due to experimental constraints such as phototoxicity. To demonstrate the method, we perform tracking on the Drosophila embryonic epidermis and compare cell-cell rearrangements to previous studies in other tissues. Our implementation is open source and generally applicable to epithelial tissues.
Collapse
Affiliation(s)
- Jochen Kursawe
- Mathematical Institute, University of Oxford, Andrew Wiles Building, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, UK
| | - Rémi Bardenet
- CNRS and CRIStAL, Université de Lille, 59651 Villeneuve d'Ascq, France
| | - Jeremiah J Zartman
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, 205D McCourtney Hall of Molecular Science and Engineering, Notre Dame, IN 46556, USA
| | - Ruth E Baker
- Mathematical Institute, University of Oxford, Andrew Wiles Building, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, UK
| | - Alexander G Fletcher
- School of Mathematics and Statistics, University of Sheffield, Hicks Building, Hounsfield Road, Sheffield S3 7RH, UK
- Bateson Centre, University of Sheffield, Sheffield S10 2TN, UK
| |
Collapse
|
44
|
SUMONA: A supervised method for optimizing network alignment. Comput Biol Chem 2016; 63:41-51. [DOI: 10.1016/j.compbiolchem.2016.03.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2016] [Revised: 03/17/2016] [Accepted: 03/17/2016] [Indexed: 11/20/2022]
|
45
|
Elmsallati A, Clark C, Kalita J. Global Alignment of Protein-Protein Interaction Networks: A Survey. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:689-705. [PMID: 26336140 DOI: 10.1109/tcbb.2015.2474391] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
In this paper, we survey algorithms that perform global alignment of networks or graphs. Global network alignment aligns two or more given networks to find the best mapping from nodes in one network to nodes in other networks. Since graphs are a common method of data representation, graph alignment has become important with many significant applications. Protein-protein interactions can be modeled as networks and aligning these networks of protein interactions has many applications in biological research. In this survey, we review algorithms for global pairwise alignment highlighting various proposed approaches, and classify them based on their methodology. Evaluation metrics that are used to measure the quality of the resulting alignments are also surveyed. We discuss and present a comparison between selected aligners on the same datasets and evaluate using the same evaluation metrics. Finally, a quick overview of the most popular databases of protein interaction networks is presented focusing on datasets that have been used recently.
Collapse
|
46
|
Emmert-Streib F, Dehmer M, Shi Y. Fifty years of graph matching, network alignment and network comparison. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.01.074] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
|
47
|
Faisal FE, Meng L, Crawford J, Milenković T. The post-genomic era of biological network alignment. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2015; 2015:3. [PMID: 28194172 PMCID: PMC5270500 DOI: 10.1186/s13637-015-0022-9] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/21/2015] [Accepted: 05/18/2015] [Indexed: 11/10/2022]
Abstract
Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by genomic sequence alignment. Here, we review computational challenges behind the network alignment problem, existing approaches for solving the problem, ways of evaluating their alignment quality, and the approaches' biomedical applications. We discuss recent innovative efforts of improving the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of principles of life, evolution, disease, and therapeutics.
Collapse
Affiliation(s)
- Fazle E Faisal
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Lei Meng
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Joseph Crawford
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| |
Collapse
|
48
|
Natalie 2.0: Sparse Global Network Alignment as a Special Case of Quadratic Assignment. ALGORITHMS 2015. [DOI: 10.3390/a8041035] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
|
49
|
Chuanchao Zhang, Juan Liu, Qianqian Shi, Tao Zeng, Chen L. Identification of phenotypic networks based on whole transcriptome by comparative network decomposition. 2015 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM) 2015:189-194. [DOI: 10.1109/bibm.2015.7359679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/03/2025]
|
50
|
Dohrmann J, Puchin J, Singh R. Global multiple protein-protein interaction network alignment by combining pairwise network alignments. BMC Bioinformatics 2015; 16 Suppl 13:S11. [PMID: 26423128 PMCID: PMC4597059 DOI: 10.1186/1471-2105-16-s13-s11] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND A wealth of protein interaction data has become available in recent years, creating an urgent need for powerful analysis techniques. In this context, the problem of finding biologically meaningful correspondences between different protein-protein interaction networks (PPIN) is of particular interest. The PPIN of a species can be compared with that of other species through the process of PPIN alignment. Such an alignment can provide insight into basic problems like species evolution and network component function determination, as well as translational problems such as target identification and elucidation of mechanisms of disease spread. Furthermore, multiple PPINs can be aligned simultaneously, expanding the analytical implications of the result. While there are several pairwise network alignment algorithms, few methods are capable of multiple network alignment. RESULTS We propose SMAL, a MNA algorithm based on the philosophy of scaffold-based alignment. SMAL is capable of converting results from any global pairwise alignment algorithms into a MNA in linear time. Using this method, we have built multiple network alignments based on combining pairwise alignments from a number of publicly available (pairwise) network aligners. We tested SMAL using PPINs of eight species derived from the IntAct repository and employed a number of measures to evaluate performance. Additionally, as part of our experimental investigations, we compared the effectiveness of SMAL while aligning up to eight input PPINs, and examined the effect of scaffold network choice on the alignments. CONCLUSIONS A key advantage of SMAL lies in its ability to create MNAs through the use of pairwise network aligners for which native MNA implementations do not exist. Experiments indicate that the performance of SMAL was comparable to that of the native MNA implementation of established methods such as IsoRankN and SMETANA. However, in terms of computational time, SMAL was significantly faster. SMAL was also able to retain many important characteristics of the native pairwise alignments, such as the number of aligned nodes and edges, as well as the functional and homologene similarity of aligned nodes. The speed, flexibility and the ability to retain prior correspondences as new networks are aligned, makes SMAL a compelling choice for alignment of multiple large networks.
Collapse
|