51
|
Dohrmann J, Puchin J, Singh R. Global multiple protein-protein interaction network alignment by combining pairwise network alignments. BMC Bioinformatics 2015; 16 Suppl 13:S11. [PMID: 26423128 PMCID: PMC4597059 DOI: 10.1186/1471-2105-16-s13-s11] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 01/27/2023]
Abstract
BACKGROUND A wealth of protein interaction data has become available in recent years, creating an urgent need for powerful analysis techniques. In this context, the problem of finding biologically meaningful correspondences between different protein-protein interaction networks (PPIN) is of particular interest. The PPIN of a species can be compared with that of other species through the process of PPIN alignment. Such an alignment can provide insight into basic problems like species evolution and network component function determination, as well as translational problems such as target identification and elucidation of mechanisms of disease spread. Furthermore, multiple PPINs can be aligned simultaneously, expanding the analytical implications of the result. While there are several pairwise network alignment algorithms, few methods are capable of multiple network alignment (MNA). RESULTS We propose SMAL, an MNA algorithm based on the philosophy of scaffold-based alignment. SMAL is capable of converting results from any global pairwise alignment algorithm into an MNA in linear time. Using this method, we have built multiple network alignments based on combining pairwise alignments from a number of publicly available (pairwise) network aligners. We tested SMAL using PPINs of eight species derived from the IntAct repository and employed a number of measures to evaluate performance. Additionally, as part of our experimental investigations, we compared the effectiveness of SMAL while aligning up to eight input PPINs, and examined the effect of scaffold network choice on the alignments. CONCLUSIONS A key advantage of SMAL lies in its ability to create MNAs through the use of pairwise network aligners for which native MNA implementations do not exist. Experiments indicate that the performance of SMAL was comparable to that of the native MNA implementation of established methods such as IsoRankN and SMETANA. However, in terms of computational time, SMAL was significantly faster. SMAL was also able to retain many important characteristics of the native pairwise alignments, such as the number of aligned nodes and edges, as well as the functional and homologene similarity of aligned nodes. The speed, flexibility and ability to retain prior correspondences as new networks are aligned make SMAL a compelling choice for alignment of multiple large networks.
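The scaffold idea in the abstract above can be illustrated with a small sketch. It assumes each pairwise alignment is stored as a dictionary from a scaffold-network node to its partner in one other network. The function name `merge_on_scaffold` and the toy node labels are hypothetical, and this is not SMAL's actual implementation, only one way such a merge could run in time linear in the number of aligned pairs.

```python
from collections import defaultdict


def merge_on_scaffold(pairwise):
    """Combine pairwise alignments that share one scaffold network.

    pairwise: dict mapping a species name to a dict
              {scaffold_node: aligned_node_in_that_species}.
    Returns a dict {scaffold_node: {species: aligned_node}}.
    Each aligned pair is visited once, so the cost is linear
    in the total number of pairs.
    """
    clusters = defaultdict(dict)
    for species, mapping in pairwise.items():
        for scaffold_node, other_node in mapping.items():
            clusters[scaffold_node][species] = other_node
    return dict(clusters)


# Hypothetical toy data: a human scaffold aligned to yeast and worm.
pairwise = {
    "yeast": {"h1": "y1", "h2": "y7"},
    "worm": {"h1": "w3", "h3": "w9"},
}
multi = merge_on_scaffold(pairwise)
```

Here `multi["h1"]` groups the yeast and worm partners of scaffold node `h1`, while `h3` keeps only its worm partner because the yeast alignment did not cover it.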
|
52
|
Thompson D, Regev A, Roy S. Comparative analysis of gene regulatory networks: from network reconstruction to evolution. Annu Rev Cell Dev Biol 2015; 31:399-428. [PMID: 26355593 DOI: 10.1146/annurev-cellbio-100913-012908] [Citation(s) in RCA: 95] [Impact Index Per Article: 9.5] [Indexed: 11/09/2022]
Abstract
Regulation of gene expression is central to many biological processes. Although reconstruction of regulatory circuits from genomic data alone is therefore desirable, this remains a major computational challenge. Comparative approaches that examine the conservation and divergence of circuits and their components across strains and species can help reconstruct circuits as well as provide insights into the evolution of gene regulatory processes and their adaptive contribution. In recent years, advances in genomic and computational tools have led to a wealth of methods for such analysis at the sequence, expression, pathway, module, and entire network level. Here, we review computational methods developed to study transcriptional regulatory networks using comparative genomics, from sequence to functional data. We highlight how these methods use evolutionary conservation and divergence to reliably detect regulatory components as well as estimate the extent and rate of divergence. Finally, we discuss the promise and open challenges in linking regulatory divergence to phenotypic divergence and adaptation.
Affiliation(s)
- Dawn Thompson
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
|
53
|
GreedyPlus: An Algorithm for the Alignment of Interface Interaction Networks. Sci Rep 2015; 5:12074. [PMID: 26165520 PMCID: PMC4499810 DOI: 10.1038/srep12074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2015] [Accepted: 06/15/2015] [Indexed: 11/08/2022]
Abstract
The increasing ease and accuracy of protein-protein interaction detection has resulted in the ability to map the interactomes of multiple species. We now have an opportunity to compare species to better understand how interactomes evolve. As DNA and protein sequence alignment algorithms were required for comparative genomics, network alignment algorithms are required for comparative interactomics. A number of network alignment methods have been developed for protein-protein interaction networks, where proteins are represented as vertices linked by edges if they interact. Recently, protein interactions have been mapped at the level of amino acid positions, which can be represented as an interface-interaction network (IIN), where vertices represent binding sites, such as protein domains and short sequence motifs. However, current algorithms are not designed to align these networks and generally fail to do so in practice. We present a greedy algorithm, GreedyPlus, for IIN alignment, combining data from diverse sources, including network, protein and binding site properties, to identify putative orthologous relationships between interfaces in available worm and yeast data. GreedyPlus is fast and simple, allowing for easy customization of behaviour, yet still capable of generating biologically meaningful network alignments.
|
54
|
Clark C, Kalita J. A multiobjective memetic algorithm for PPI network alignment. Bioinformatics 2015; 31:1988-1998. [PMID: 25667548 DOI: 10.1093/bioinformatics/btv063] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 07/28/2014] [Accepted: 01/27/2015] [Indexed: 01/03/2025]
Abstract
MOTIVATION There recently has been great interest in aligning protein-protein interaction (PPI) networks to identify potentially orthologous proteins between species. It is thought that the topological information contained in these networks will yield better orthology predictions than sequence similarity alone. Recent work has found that existing aligners have difficulty making use of both topological and sequence similarity when aligning, with either one or the other being better matched. This can be at least partially attributed to the fact that existing aligners try to combine these two potentially conflicting objectives into a single objective. RESULTS We present Optnetalign, a multiobjective memetic algorithm for the problem of PPI network alignment that uses extremely efficient swap-based local search, mutation and crossover operations to create a population of alignments. This algorithm optimizes the conflicting goals of topological and sequence similarity using the concept of Pareto dominance, exploring the tradeoff between the two objectives as it runs. This allows us to produce many high-quality candidate alignments in a single run. Our algorithm produces alignments that are much better compromises between topological and biological match quality than previous work, while better characterizing the diversity of possible good alignments between two networks. Our aligner's results have several interesting implications for future research on alignment evaluation, the design of network alignment objectives and the interpretation of alignment results. AVAILABILITY AND IMPLEMENTATION The C++ source code to our program, along with compilation and usage instructions, is available at https://github.com/crclark/optnetaligncpp/
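The notion of Pareto dominance mentioned in the abstract above can be sketched in a few lines. The sketch assumes each candidate alignment is summarized by a tuple of two scores to be maximized (for example topological and sequence similarity). The function names and the score values are hypothetical and are not taken from the authors' C++ code.

```python
def dominates(a, b):
    """Return True if score tuple a Pareto-dominates b (maximization).

    a dominates b when a is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical (topological score, sequence score) pairs.
scores = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)]
front = pareto_front(scores)
```

In this toy data `(0.5, 0.5)` is dropped because `(0.6, 0.6)` is better on both objectives; the three remaining points are different tradeoffs, none of which is better than another on both scores.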
Affiliation(s)
- Connor Clark
- Department of Computer Science, University of Colorado Colorado Springs, Colorado Springs, CO 80918, USA
- Jugal Kalita
- Department of Computer Science, University of Colorado Colorado Springs, Colorado Springs, CO 80918, USA
|
55
|
Crawford J, Sun Y, Milenković T. Fair evaluation of global network aligners. Algorithms Mol Biol 2015; 10:19. [PMID: 26060505 PMCID: PMC4460690 DOI: 10.1186/s13015-015-0050-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 08/30/2014] [Accepted: 05/10/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Analogous to genomic sequence alignment, biological network alignment identifies conserved regions between networks of different species. Then, function can be transferred from well- to poorly-annotated species between aligned network regions. Network alignment typically encompasses two algorithmic components: node cost function (NCF), which measures similarities between nodes in different networks, and alignment strategy (AS), which uses these similarities to rapidly identify high-scoring alignments. Different methods use both different NCFs and different ASs. Thus, it is unclear whether the superiority of a method comes from its NCF, its AS, or both. We already showed on state-of-the-art methods, MI-GRAAL and IsoRankN, that combining NCF of one method and AS of another method can give a new superior method. Here, we evaluate MI-GRAAL against a newer approach, GHOST, by mixing-and-matching the methods' NCFs and ASs to potentially further improve alignment quality. While doing so, we approach important questions that have not been asked systematically thus far. First, we ask how much of the NCF information should come from protein sequence data compared to network topology data. Existing methods determine this parameter more or less arbitrarily, which could affect alignment quality. Second, when topological information is used in NCF, we ask how large the size of the neighborhoods of the compared nodes should be. Existing methods assume that the larger the neighborhood size, the better. RESULTS Our findings are as follows. MI-GRAAL's NCF is superior to GHOST's NCF, while the performance of the methods' ASs is data-dependent. Thus, for data on which GHOST's AS is superior to MI-GRAAL's AS, the combination of MI-GRAAL's NCF and GHOST's AS represents a new superior method. Also, the amount of sequence information used within NCF does not affect alignment quality, while the inclusion of topological information is crucial for producing good alignments. Finally, larger neighborhood sizes are preferred, but often, it is the second largest size that is superior. Using this size instead of the largest one would decrease computational complexity. CONCLUSION Taken together, our results represent general recommendations for a fair evaluation of network alignment methods and in particular of two-stage NCF-AS approaches.
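The question of how much NCF information should come from sequence versus topology is often expressed as a single weighting parameter. The sketch below assumes a simple convex combination controlled by a hypothetical parameter `alpha`. The function name and the example similarity values are illustrative assumptions, and the actual NCFs of MI-GRAAL and GHOST are more involved.

```python
def node_similarity(topo_sim, seq_sim, alpha):
    """Blend topological and sequence similarity for one node pair.

    alpha = 1.0 uses topology only, alpha = 0.0 uses sequence only.
    Both similarities are assumed to be already scaled to [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * topo_sim + (1.0 - alpha) * seq_sim


# Hypothetical similarities for one candidate node pair,
# evaluated at three settings of the weighting parameter.
blended = {a: node_similarity(0.8, 0.4, a) for a in (0.0, 0.5, 1.0)}
```

Sweeping `alpha` over a grid, as in the dictionary above, is one simple way to study how sensitive alignment quality is to this parameter instead of fixing it arbitrarily.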
|
56
|
Davis D, Yaveroğlu ÖN, Malod-Dognin N, Stojmirovic A, Pržulj N. Topology-function conservation in protein-protein interaction networks. Bioinformatics 2015; 31:1632-9. [PMID: 25609797 PMCID: PMC4426845 DOI: 10.1093/bioinformatics/btv026] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 09/10/2014] [Revised: 12/05/2014] [Accepted: 01/11/2015] [Indexed: 12/30/2022]
Abstract
MOTIVATION Proteins underlie the functioning of a cell and the wiring of proteins in the protein-protein interaction network (PIN) relates to their biological functions. Proteins with similar wiring in the PIN (topology around them) have been shown to have similar functions. This property has been successfully exploited for predicting protein functions. Topological similarity is also used to guide network alignment algorithms that find similarly wired proteins between PINs of different species; these similarities are used to transfer annotation across PINs, e.g. from model organisms to human. To refine these functional predictions and annotation transfers, we need to gain insight into the variability of the topology-function relationships. For example, a function may be significantly associated with specific topologies, while another function may be weakly associated with several different topologies. Also, the topology-function relationships may differ between different species. RESULTS To improve our understanding of topology-function relationships and of their conservation among species, we develop a statistical framework that is built upon canonical correlation analysis. Using the graphlet degrees to represent the wiring around proteins in PINs and gene ontology (GO) annotations to describe their functions, our framework: (i) characterizes statistically significant topology-function relationships in a given species, and (ii) uncovers the functions that have conserved topology in PINs of different species, which we term topologically orthologous functions. We apply our framework to PINs of yeast and human, identifying seven biological process and two cellular component GO terms to be topologically orthologous for the two organisms.
Affiliation(s)
- Darren Davis
- California Institute of Telecommunications and Technology (Calit2), University of California Irvine, Irvine, CA, USA, Department of Computing, Imperial College London, London, UK, National Center for Biotechnology Information (NCBI), Bethesda, MD, USA and Janssen Research and Development, LLC, Spring House, PA, USA
- Ömer Nebil Yaveroğlu
- California Institute of Telecommunications and Technology (Calit2), University of California Irvine, Irvine, CA, USA, Department of Computing, Imperial College London, London, UK, National Center for Biotechnology Information (NCBI), Bethesda, MD, USA and Janssen Research and Development, LLC, Spring House, PA, USA
- Noël Malod-Dognin
- California Institute of Telecommunications and Technology (Calit2), University of California Irvine, Irvine, CA, USA, Department of Computing, Imperial College London, London, UK, National Center for Biotechnology Information (NCBI), Bethesda, MD, USA and Janssen Research and Development, LLC, Spring House, PA, USA
- Aleksandar Stojmirovic
- California Institute of Telecommunications and Technology (Calit2), University of California Irvine, Irvine, CA, USA, Department of Computing, Imperial College London, London, UK, National Center for Biotechnology Information (NCBI), Bethesda, MD, USA and Janssen Research and Development, LLC, Spring House, PA, USA
- Nataša Pržulj
- California Institute of Telecommunications and Technology (Calit2), University of California Irvine, Irvine, CA, USA, Department of Computing, Imperial College London, London, UK, National Center for Biotechnology Information (NCBI), Bethesda, MD, USA and Janssen Research and Development, LLC, Spring House, PA, USA
|
57
|
Vijayan V, Saraph V, Milenković T. MAGNA++: Maximizing Accuracy in Global Network Alignment via both node and edge conservation. Bioinformatics 2015; 31:2409-11. [PMID: 25792552 DOI: 10.1093/bioinformatics/btv161] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.7] [Received: 11/20/2014] [Accepted: 03/14/2015] [Indexed: 11/13/2022]
Abstract
MOTIVATION Network alignment aims to find conserved regions between different networks. Existing methods aim to maximize total similarity over all aligned nodes (i.e. node conservation). Then, they evaluate alignment quality by measuring the amount of conserved edges, but only after the alignment is constructed. Thus, we recently introduced MAGNA (Maximizing Accuracy in Global Network Alignment) to directly maximize edge conservation while producing alignments and showed its superiority over the existing methods. Here, we extend the original MAGNA with several important algorithmic advances into a new MAGNA++ framework. RESULTS MAGNA++ introduces several novelties: (i) it simultaneously maximizes any one of three different measures of edge conservation (including our recent superior [Formula: see text] measure) and any desired node conservation measure, which further improves alignment quality compared with maximizing only node conservation or only edge conservation; (ii) it speeds up the original MAGNA algorithm by parallelizing it to automatically use all available resources, as well as by reimplementing the edge conservation measures more efficiently; (iii) it provides a friendly graphical user interface for easy use by domain (e.g. biological) scientists; and (iv) at the same time, MAGNA++ offers source code for easy extensibility by computational scientists. AVAILABILITY AND IMPLEMENTATION http://www.nd.edu/~cone/MAGNA++/
Affiliation(s)
- V Vijayan
- Department of Computer Science and Engineering, ECK Institute for Global Health, Interdisciplinary Center for Network Science and Application, University of Notre Dame, IN 46556, USA
- V Saraph
- Department of Computer Science, Brown University, Providence, RI 02912, USA
- T Milenković
- Department of Computer Science and Engineering, ECK Institute for Global Health, Interdisciplinary Center for Network Science and Application, University of Notre Dame, IN 46556, USA
|
58
|
Alkan F, Erten C. SiPAN: simultaneous prediction and alignment of protein-protein interaction networks. Bioinformatics 2015; 31:2356-63. [PMID: 25788620 DOI: 10.1093/bioinformatics/btv160] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 11/18/2014] [Accepted: 03/14/2015] [Indexed: 01/18/2023]
Abstract
MOTIVATION Network prediction as applied to protein-protein interaction (PPI) networks has received considerable attention within the last decade. Because of the limitations of experimental techniques for interaction detection and network construction, several computational methods for PPI network reconstruction and growth have been suggested. Such methods usually limit the scope of study to a single network, employing data based on genomic context, structure, domain, sequence information or existing network topology. Incorporating multiple species network data for network reconstruction and growth entails the design of novel models encompassing both network reconstruction and network alignment, since the goal of network alignment is to provide functionally orthologous proteins from multiple networks and such orthology information can be used in guiding interolog transfers. However, such an approach raises the classical chicken or egg problem; alignment methods assume error-free networks, whereas network prediction via orthology works effectively if the functionally orthologous proteins are determined with high precision. Thus to resolve this intertwinement, we propose a framework to handle both problems simultaneously, that of SImultaneous Prediction and Alignment of Networks (SiPAN). RESULTS We present an algorithm that solves the SiPAN problem in accordance with its simultaneous nature. Bearing the same name as the defined problem itself, the SiPAN algorithm employs state-of-the-art alignment and topology-based interaction confidence construction algorithms, which are used as benchmark methods for comparison purposes as well. To demonstrate the effectiveness of the proposed network reconstruction via SiPAN, we consider two scenarios; one that preserves the network sizes and the other where the network sizes are increased. Through extensive tests on real-world biological data, we show that the network qualities of SiPAN reconstructions are as good as those of original networks and in some cases SiPAN networks are even better, especially for the former scenario. An alternative state-of-the-art network reconstruction algorithm, random walk with resistance, produces networks considerably worse than the original networks and those reproduced via SiPAN in both cases. AVAILABILITY AND IMPLEMENTATION Freely available at http://webprs.khas.edu.tr/~cesim/SiPAN.tar.gz.
Affiliation(s)
- Ferhat Alkan
- Center for Non-Coding RNA in Technology and Health, Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Grønnegardsvej 3, DK-1870 Frederiksberg, Denmark and Department of Computer Engineering, Kadir Has University, Cibali, Istanbul 34083, Turkey
- Cesim Erten
- Department of Computer Engineering, Kadir Has University, Cibali, Istanbul 34083, Turkey
|
59
|
Malod-Dognin N, Pržulj N. L-GRAAL: Lagrangian graphlet-based network aligner. Bioinformatics 2015; 31:2182-9. [PMID: 25725498 PMCID: PMC4481854 DOI: 10.1093/bioinformatics/btv130] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Received: 07/29/2014] [Accepted: 02/25/2015] [Indexed: 12/31/2022]
Abstract
Motivation: Discovering and understanding patterns in networks of protein–protein interactions (PPIs) is a central problem in systems biology. Alignments between these networks aid functional understanding as they uncover important information, such as evolutionarily conserved pathways, protein complexes and functional orthologs. A few methods have been proposed for global PPI network alignments, but because of the NP-completeness of the underlying sub-graph isomorphism problem, producing topologically and biologically accurate alignments remains a challenge. Results: We introduce a novel global network alignment tool, Lagrangian GRAphlet-based ALigner (L-GRAAL), which directly optimizes both the protein and the interaction functional conservations, using a novel alignment search heuristic based on integer programming and Lagrangian relaxation. We compare L-GRAAL with the state-of-the-art network aligners on the largest available PPI networks from BioGRID and observe that L-GRAAL uncovers the largest common sub-graphs between the networks, as measured by edge-correctness and symmetric sub-structures scores, which allow transferring more functional information across networks. We assess the biological quality of the protein mappings using the semantic similarity of their Gene Ontology annotations and observe that L-GRAAL best uncovers functionally conserved proteins. Furthermore, we introduce for the first time a measure of the semantic similarity of the mapped interactions and show that L-GRAAL also best uncovers functionally conserved interactions. In addition, we illustrate on the PPI networks of baker's yeast and human the ability of L-GRAAL to predict new PPIs. Finally, L-GRAAL's results are the first to show that topological information is more important than sequence information for uncovering functionally conserved interactions. Availability and implementation: L-GRAAL is coded in C++. Software is available at: http://bio-nets.doc.ic.ac.uk/L-GRAAL/. Contact: n.malod-dognin@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
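The two topological scores named in the abstract above, edge correctness (EC) and the symmetric sub-structure score (S3), can be computed for a given alignment as sketched below. The sketch uses the commonly cited definitions: EC is the fraction of edges of the first network mapped onto edges of the second, and S3 divides the same count by the number of edges in the union of the aligned sub-graphs. It assumes undirected simple graphs without self-loops and an injective mapping that covers every node of the first network. The function name and toy graphs are hypothetical and are not taken from the L-GRAAL software.

```python
def conservation_scores(edges1, edges2, mapping):
    """Return (EC, S3) for an alignment of graph 1 onto graph 2.

    edges1, edges2: lists of undirected edges given as 2-tuples.
    mapping: dict from every node of graph 1 to a distinct node of graph 2.
    """
    e2 = {frozenset(e) for e in edges2}
    # Edges of graph 1 whose mapped endpoints are also adjacent in graph 2.
    conserved = sum(
        1 for u, v in edges1
        if frozenset((mapping[u], mapping[v])) in e2
    )
    # Edges of graph 2 induced on the image of the mapping.
    image = set(mapping.values())
    induced = sum(1 for e in e2 if e <= image)
    ec = conserved / len(edges1)
    s3 = conserved / (len(edges1) + induced - conserved)
    return ec, s3


# Hypothetical toy graphs: a path a-b-c aligned onto a triangle x-y-z
# that has one extra pendant edge z-w outside the aligned region.
edges1 = [("a", "b"), ("b", "c")]
edges2 = [("x", "y"), ("y", "z"), ("x", "z"), ("z", "w")]
mapping = {"a": "x", "b": "y", "c": "z"}
ec, s3 = conservation_scores(edges1, edges2, mapping)
```

In this toy case both edges of the path are conserved, so EC is 1, while S3 is lower because the aligned region of the second graph contains an extra edge (x-z) with no counterpart in the first graph.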
Affiliation(s)
- Nataša Pržulj
- Department of Computing, Imperial College London, London, UK
|
60
|
Sun Y, Crawford J, Tang J, Milenković T. Simultaneous Optimization of both Node and Edge Conservation in Network Alignment via WAVE. Lecture Notes in Computer Science 2015. [DOI: 10.1007/978-3-662-48221-6_2] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 12/12/2022]
|
61
|
Hu J, Reinert K. LocalAli: an evolutionary-based local alignment approach to identify functionally conserved modules in multiple networks. Bioinformatics 2014; 31:363-72. [PMID: 25282642 DOI: 10.1093/bioinformatics/btu652] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 11/14/2022]
Abstract
MOTIVATION Sequence and protein interaction data are important for understanding the underlying molecular mechanisms of organisms. Local network alignment is one of the key systematic ways of predicting protein functions, identifying functional modules and understanding the phylogeny from these data. Most existing tools, however, have limitations, mainly concerning scoring scheme, speed and scalability. Therefore, there are growing demands for sophisticated network evolution models and efficient local alignment algorithms. RESULTS We developed a fast and scalable local network alignment tool called LocalAli for the identification of functionally conserved modules in multiple networks. In this algorithm, we first proposed a new framework to reconstruct the evolutionary history of conserved modules based on a maximum-parsimony evolutionary model. By relying on this model, LocalAli facilitates interpretation of resulting local alignments in terms of conserved modules, which have evolved from a common ancestral module through a series of evolutionary events. A meta-heuristic method, simulated annealing, was used to search for the optimal or near-optimal inner nodes (i.e. ancestral modules) of the evolutionary tree. To evaluate the performance and the statistical significance, LocalAli was tested on 26 real datasets and 1040 randomly generated datasets. The results suggest that LocalAli outperforms all existing algorithms in terms of coverage, consistency and scalability, while retaining a high precision in the identification of functionally coherent subnetworks. AVAILABILITY The source code and test datasets are freely available for download under the GNU GPL v3 license at https://code.google.com/p/localali/. CONTACT jialu.hu@fu-berlin.de or knut.reinert@fu-berlin.de. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jialu Hu
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany
- Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany
|