1
|
Hayes WB. Exact p-values for global network alignments via combinatorial analysis of shared GO terms : REFANGO: Rigorous Evaluation of Functional Alignments of Networks using Gene Ontology. J Math Biol 2024; 88:50. [PMID: 38551701 PMCID: PMC10980677 DOI: 10.1007/s00285-024-02058-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2020] [Revised: 01/21/2024] [Accepted: 02/05/2024] [Indexed: 04/01/2024]
Abstract
Network alignment aims to uncover topologically similar regions in the protein-protein interaction (PPI) networks of two or more species under the assumption that topologically similar regions tend to perform similar functions. Although there exist a plethora of both network alignment algorithms and measures of topological similarity, currently no "gold standard" exists for evaluating how well either is able to uncover functionally similar regions. Here we propose a formal, mathematically and statistically rigorous method for evaluating the statistical significance of shared GO terms in a global, 1-to-1 alignment between two PPI networks. Given an alignment in which k aligned protein pairs share a particular GO term g, we use a combinatorial argument to precisely quantify the p-value of that alignment with respect to g compared to a random alignment. The p-value of the alignment with respect to all GO terms, including their inter-relationships, is approximated using the Empirical Brown's Method. We note that, just as with BLAST's p-values, this method is not designed to guide an alignment algorithm towards a solution; instead, just as with BLAST, an alignment is guided by a scoring matrix or function; the p-values herein are computed after the fact, providing independent feedback to the user on the biological quality of the alignment that was generated by optimizing the scoring function. Importantly, we demonstrate that among all GO-based measures of network alignments, ours is the only one that correlates with the precision of GO annotation predictions, paving the way for network alignment-based protein function prediction.
Collapse
Affiliation(s)
- Wayne B Hayes
- Department of Computer Science, UC Irvine, Irvine, USA.
| |
Collapse
|
2
|
Piña JS, Orozco-Arias S, Tobón-Orozco N, Camargo-Forero L, Tabares-Soto R, Guyot R. G-SAIP: Graphical Sequence Alignment Through Parallel Programming in the Post-Genomic Era. Evol Bioinform Online 2023; 19:11769343221150585. [PMID: 36703866 PMCID: PMC9871978 DOI: 10.1177/11769343221150585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2022] [Accepted: 12/23/2022] [Indexed: 01/22/2023] Open
Abstract
A common task in bioinformatics is to compare DNA sequences to identify similarities between organisms at the sequence level. An approach to such comparison is the dot-plots, a 2-dimensional graphical representation to analyze DNA or protein alignments. Dot-plots alignment software existed before the sequencing revolution, and now there is an ongoing limitation when dealing with large-size sequences, resulting in very long execution times. High-Performance Computing (HPC) techniques have been successfully used in many applications to reduce computing times, but so far, very few applications for graphical sequence alignment using HPC have been reported. Here, we present G-SAIP (Graphical Sequence Alignment in Parallel), a software capable of spawning multiple distributed processes on CPUs, over a supercomputing infrastructure to speed up the execution time for dot-plot generation up to 1.68× compared with other current fastest tools, improve the efficiency for comparative structural genomic analysis, phylogenetics because the benefits of pairwise alignments for comparison between genomes, repetitive structure identification, and assembly quality checking.
Collapse
Affiliation(s)
- Johan S. Piña
- Department of Data Science, People
Contact, Manizales, Caldas, Colombia,Department of Computer Science,
Universidad Autónoma de Manizales, Manizales, Caldas, Colombia,Johan S. Piña, Department of Computer
Science, Universidad Autónoma de Manizales, Antigua estación del ferrocarril,
Manizales, Caldas 170004, Colombia.
| | - Simon Orozco-Arias
- Department of Computer Science,
Universidad Autónoma de Manizales, Manizales, Caldas, Colombia,Department of Systems and Informatics,
Universidad de Caldas, Manizales, Caldas, Colombia
| | - Nicolas Tobón-Orozco
- Department of Computer Science,
Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
| | | | - Reinel Tabares-Soto
- Department of Electronics and
Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
| | - Romain Guyot
- Department of Electronics and
Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia,Institut de Recherche pour le
Développement, CIRAD, University of Montpellier, Montpellier, France
| |
Collapse
|
3
|
Wang S, Chen X, Frederisy BJ, Mbakogu BA, Kanne AD, Khosravi P, Hayes WB. On the current failure-but bright future-of topology-driven biological network alignment. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022; 131:1-44. [PMID: 35871888 DOI: 10.1016/bs.apcsb.2022.05.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
Abstract
Since the function of a protein is defined by its interaction partners, and since we expect similar interaction patterns across species, the alignment of protein-protein interaction (PPI) networks between species, based on network topology alone, should uncover functionally related proteins across species. Surprisingly, despite the publication of more than fifty algorithms aimed at performing PPI network alignment, few have demonstrated a statistically significant link between network topology and functional similarity, and none have demonstrated that orthologs can be recovered using network topology alone. We find that the major contributing factors to this surprising failure are: (i) edge densities in most currently available experimental PPI networks are demonstrably too low to expect topological network alignment to succeed; (ii) in the few cases where the edge densities are high enough, some measures of topological similarity easily uncover functionally similar proteins while others do not; and (iii) most network alignment algorithms to date perform poorly at optimizing even their own topological objective functions, hampering their ability to use topology effectively. We demonstrate that SANA-the Simulated Annealing Network Aligner-significantly outperforms existing aligners at optimizing their own objective functions, even achieving near-optimal solutions when the optimal solution is known. We offer the first demonstration of global network alignments based on topology alone that align functionally similar proteins with p-values in some cases below 10-300. We predict that topological network alignment has a bright future as edge densities increase toward the value where good alignments become possible. We demonstrate that when enough common topology is present at high enough edge densities-for example in the recent, partly synthetic networks of the Integrated Interaction Database-topological network alignment easily recovers most orthologs, paving the way toward high-throughput functional prediction based on topology-driven network alignment.
Collapse
Affiliation(s)
- Siyue Wang
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Xiaoyin Chen
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Brent J Frederisy
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Benedict A Mbakogu
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Amy D Kanne
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Pasha Khosravi
- Department of Computer Science, University of California, Irvine, CA, United States
| | - Wayne B Hayes
- Department of Computer Science, University of California, Irvine, CA, United States.
| |
Collapse
|
4
|
Ma L, Shao Z, Li L, Huang J, Wang S, Lin Q, Li J, Gong M, Nandi AK. Heuristics and metaheuristics for biological network alignment: A review. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.08.156] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
5
|
Menor-Flores M, Vega-Rodríguez MA. Decomposition-based multi-objective optimization approach for PPI network alignment. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108527] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
6
|
Biological sequence analysis. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00003-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
|
7
|
Pairwise Biological Network Alignment Based on Discrete Bat Algorithm. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:5548993. [PMID: 34777564 PMCID: PMC8580637 DOI: 10.1155/2021/5548993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/19/2021] [Revised: 09/29/2021] [Accepted: 10/13/2021] [Indexed: 11/18/2022]
Abstract
The development of high-throughput technology has provided a reliable technical guarantee for an increased amount of available data on biological networks. Network alignment is used to analyze these data to identify conserved functional network modules and understand evolutionary relationships across species. Thus, an efficient computational network aligner is needed for network alignment. In this paper, the classic bat algorithm is discretized and applied to the network alignment. The bat algorithm initializes the population randomly and then searches for the optimal solution iteratively. Based on the bat algorithm, the global pairwise alignment algorithm BatAlign is proposed. In BatAlign, the individual velocity and the position are represented by a discrete code. BatAlign uses a search algorithm based on objective function that uses the number of conserved edges as the objective function. The similarity between the networks is used to initialize the population. The experimental results showed that the algorithm was able to match proteins with high functional consistency and reach a relatively high topological quality.
Collapse
|
8
|
Ma L, Wang S, Lin Q, Li J, You Z, Huang J, Gong M. Multi-Neighborhood Learning for Global Alignment in Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2598-2611. [PMID: 32305933 DOI: 10.1109/tcbb.2020.2985838] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
The global alignment of biological networks (GABN) aims to find an optimal alignment between proteins across species, such that both the biological structures and the topological structures of the proteins are maximally conserved. The research on GABN has attracted great attention due to its applications on species evolution, orthology detection and genetic analyses. Most of the existing methods for GABN are difficult to obtain a good tradeoff between the conservation of the biological structures and topological structures. In this paper, we propose a multi-neighborhood learning method for solving GABN (called as CLMNA). CLMNA first models GABN as an optimization of a weighted similarity which evaluates the conserved biological and topological similarities of an alignment, and then it combines a first-proximity, second-proximity and individual-aware proximity learning algorithm to solve the modeled problem. Finally, systematic experiments on 10 pairs of biological networks across 5 species show the superiority of CLMNA over the state-of-the-art network alignment algorithms. They also validate the effectiveness of CLMNA as a refinement method on improving the performance of the compared algorithms.
Collapse
|
9
|
Vittadello ST, Stumpf MPH. Model comparison via simplicial complexes and persistent homology. ROYAL SOCIETY OPEN SCIENCE 2021; 8:211361. [PMID: 34659787 PMCID: PMC8511761 DOI: 10.1098/rsos.211361] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/31/2021] [Accepted: 09/16/2021] [Indexed: 05/21/2023]
Abstract
In many scientific and technological contexts, we have only a poor understanding of the structure and details of appropriate mathematical models. We often, therefore, need to compare different models. With available data we can use formal statistical model selection to compare and contrast the ability of different mathematical models to describe such data. There is, however, a lack of rigorous methods to compare different models a priori. Here, we develop and illustrate two such approaches that allow us to compare model structures in a systematic way by representing models as simplicial complexes. Using well-developed concepts from simplicial algebraic topology, we define a distance between models based on their simplicial representations. Employing persistent homology with a flat filtration provides for alternative representations of the models as persistence intervals, which represent model structure, from which the model distances are also obtained. We then expand on this measure of model distance to study the concept of model equivalence to determine the conceptual similarity of models. We apply our methodology for model comparison to demonstrate an equivalence between a positional-information model and a Turing-pattern model from developmental biology, constituting a novel observation for two classes of models that were previously regarded as unrelated.
Collapse
Affiliation(s)
- Sean T. Vittadello
- School of BioSciences and School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria 3010, Australia
| | - Michael P. H. Stumpf
- School of BioSciences and School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria 3010, Australia
| |
Collapse
|
10
|
Ovens K, Eames BF, McQuillan I. Comparative Analyses of Gene Co-expression Networks: Implementations and Applications in the Study of Evolution. Front Genet 2021; 12:695399. [PMID: 34484293 PMCID: PMC8414652 DOI: 10.3389/fgene.2021.695399] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022] Open
Abstract
Similarities and differences in the associations of biological entities among species can provide us with a better understanding of evolutionary relationships. Often the evolution of new phenotypes results from changes to interactions in pre-existing biological networks and comparing networks across species can identify evidence of conservation or adaptation. Gene co-expression networks (GCNs), constructed from high-throughput gene expression data, can be used to understand evolution and the rise of new phenotypes. The increasing abundance of gene expression data makes GCNs a valuable tool for the study of evolution in non-model organisms. In this paper, we cover motivations for why comparing these networks across species can be valuable for the study of evolution. We also review techniques for comparing GCNs in the context of evolution, including local and global methods of graph alignment. While some protein-protein interaction (PPI) bioinformatic methods can be used to compare co-expression networks, they often disregard highly relevant properties, including the existence of continuous and negative values for edge weights. Also, the lack of comparative datasets in non-model organisms has hindered the study of evolution using PPI networks. We also discuss limitations and challenges associated with cross-species comparison using GCNs, and provide suggestions for utilizing co-expression network alignments as an indispensable tool for evolutionary studies going forward.
Collapse
Affiliation(s)
- Katie Ovens
- Augmented Intelligence & Precision Health Laboratory (AIPHL), Research Institute of the McGill University Health Centre, Montreal, QC, Canada
| | - B. Frank Eames
- Department of Anatomy, Physiology, & Pharmacology, University of Saskatchewan, Saskatoon, SK, Canada
| | - Ian McQuillan
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
| |
Collapse
|
11
|
Racedo S, Portnoy I, Vélez JI, San-Juan-Vergara H, Sanjuan M, Zurek E. A new pipeline for structural characterization and classification of RNA-Seq microbiome data. BioData Min 2021; 14:31. [PMID: 34243809 PMCID: PMC8268467 DOI: 10.1186/s13040-021-00266-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2020] [Accepted: 06/16/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. The identification of dependencies within these systems requires the analysis and assimilation of the underlying interaction patterns between all the variables that make up that system. However, this task poses a challenge when considering the compositional nature of the data coming from DNA-sequencing experiments because traditional interaction metrics (e.g., correlation) produce unreliable results when analyzing relative fractions instead of absolute abundances. The compositionality-associated challenges extend to the classification task, as it usually involves the characterization of the interactions between the principal descriptive variables of the datasets. The classification of new samples/patients into binary categories corresponding to dissimilar biological settings or phenotypes (e.g., control and cases) could help researchers in the development of treatments/drugs. RESULTS Here, we develop and exemplify a new approach, applicable to compositional data, for the classification of new samples into two groups with different biological settings. We propose a new metric to characterize and quantify the overall correlation structure deviation between these groups and a technique for dimensionality reduction to facilitate graphical representation. We conduct simulation experiments with synthetic data to assess the proposed method's classification accuracy. Moreover, we illustrate the performance of the proposed approach using Operational Taxonomic Unit (OTU) count tables obtained through 16S rRNA gene sequencing data from two microbiota experiments. Also, compare our method's performance with that of two state-of-the-art methods. CONCLUSIONS Simulation experiments show that our method achieves a classification accuracy equal to or greater than 98% when using synthetic data. Finally, our method outperforms the other classification methods with real datasets from gene sequencing experiments.
Collapse
Affiliation(s)
| | - Ivan Portnoy
- Universidad del Norte, Barranquilla, Colombia.
- Productivity and Innovation Department, Universidad de la Costa, Calle 58 # 55-56, Barranquilla, Colombia.
| | | | | | | | | |
Collapse
|
12
|
Melkus G, Rucevskis P, Celms E, Čerāns K, Freivalds K, Kikusts P, Lace L, Opmanis M, Rituma D, Viksna J. Network motif-based analysis of regulatory patterns in paralogous gene pairs. J Bioinform Comput Biol 2021; 18:2040008. [PMID: 32698721 DOI: 10.1142/s0219720020400089] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
Abstract
Current high-throughput experimental techniques make it feasible to infer gene regulatory interactions at the whole-genome level with reasonably good accuracy. Such experimentally inferred regulatory networks have become available for a number of simpler model organisms such as S. cerevisiae, and others. The availability of such networks provides an opportunity to compare gene regulatory processes at the whole genome level, and in particular, to assess similarity of regulatory interactions for homologous gene pairs either from the same or from different species. We present here a new technique for analyzing the regulatory interaction neighborhoods of paralogous gene pairs. Our central focus is the analysis of S. cerevisiae gene interaction graphs, which are of particular interest due to the ancestral whole-genome duplication (WGD) that allows to distinguish between paralogous transcription factors that are traceable to this duplication event and other paralogues. Similar analysis is also applied to E. coli and C. elegans networks. We compare paralogous gene pairs according to the presence and size of bi-fan arrays, classically associated in the literature with gene duplication, within other network motifs. We further extend this framework beyond transcription factor comparison to obtain topology-based similarity metrics based on the overlap of interaction neighborhoods applicable to most genes in a given organism. We observe that our network divergence metrics show considerably larger similarity between paralogues, especially those traceable to WGD. This is the case for both yeast and C. elegans, but not for E. coli regulatory network. While there is no obvious cross-species link between metrics, different classes of paralogues show notable differences in interaction overlap, with traceable duplications tending toward higher overlap compared to genes with shared protein families. Our findings indicate that divergence in paralogous interaction networks reflects a shared genetic origin, and that our approach may be useful for investigating structural similarity in the interaction networks of paralogous genes.
Collapse
Affiliation(s)
- Gatis Melkus
- Institute of Mathematics and Computer Science, University of Latvia, Rainis blvd. 29, Riga, LV-1459, Latvia
| | - Peteris Rucevskis
- Institute of Mathematics and Computer Science, University of Latvia, Rainis blvd. 29, Riga, LV-1459, Latvia
| | - Edgars Celms
- Institute of Mathematics and Computer Science, University of Latvia, Rainis blvd. 29, Riga, LV-1459, Latvia
| | - Kārlis Čerāns
- Institute of Mathematics and Computer Science, University of Latvia, Rainis blvd. 29, Riga, LV-1459, Latvia
| | - Karlis Freivalds
- Institute of Mathematics and Computer Science, University of Latvia, Rainis blvd. 29, Riga, LV-1459, Latvia
| | - Paulis Kikusts
- Institute of Mathematics and Computer Science, University of Latvia, Rainis blvd. 29, Riga, LV-1459, Latvia
| | - Lelde Lace
- Institute of Mathematics and Computer Science, University of Latvia, Rainis blvd. 29, Riga, LV-1459, Latvia
| | - Mārtiņš Opmanis
- Institute of Mathematics and Computer Science, University of Latvia, Rainis blvd. 29, Riga, LV-1459, Latvia
| | - Darta Rituma
- Institute of Mathematics and Computer Science, University of Latvia, Rainis blvd. 29, Riga, LV-1459, Latvia
| | - Juris Viksna
- Institute of Mathematics and Computer Science, University of Latvia, Rainis blvd. 29, Riga, LV-1459, Latvia
| |
Collapse
|
13
|
Dondi R, Hosseinzadeh MM, Guzzi PH. A novel algorithm for finding top-k weighted overlapping densest connected subgraphs in dual networks. APPLIED NETWORK SCIENCE 2021; 6:40. [PMID: 34124340 PMCID: PMC8179714 DOI: 10.1007/s41109-021-00381-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/26/2021] [Accepted: 05/20/2021] [Indexed: 06/12/2023]
Abstract
The use of networks for modelling and analysing relations among data is currently growing. Recently, the use of a single networks for capturing all the aspects of some complex scenarios has shown some limitations. Consequently, it has been proposed to use Dual Networks (DN), a pair of related networks, to analyse complex systems. The two graphs in a DN have the same set of vertices and different edge sets. Common subgraphs among these networks may convey some insights about the modelled scenarios. For instance, the detection of the Top-k Densest Connected subgraphs, i.e. a set k subgraphs having the largest density in the conceptual network which are also connected in the physical network, may reveal set of highly related nodes. After proposing a formalisation of the approach, we propose a heuristic to find a solution, since the problem is computationally hard. A set of experiments on synthetic and real networks is also presented to support our approach.
Collapse
Affiliation(s)
- Riccardo Dondi
- Department of Science, University of Bergamo, Bergamo, Italy
| | | | - Pietro H. Guzzi
- Department of Surgical and Medical Sciences, Magna Graecia University, Catanzaro, Italy
| |
Collapse
|
14
|
Alcalá A, Alberich R, Llabrés M, Rosselló F, Valiente G. AligNet: alignment of protein-protein interaction networks. BMC Bioinformatics 2020; 21:265. [PMID: 33203353 PMCID: PMC7672851 DOI: 10.1186/s12859-020-3502-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2019] [Accepted: 04/16/2020] [Indexed: 11/23/2022] Open
Abstract
Background All molecular functions and biological processes are carried out by groups of proteins that interact with each other. Metaproteomic data continuously generates new proteins whose molecular functions and relations must be discovered. A widely accepted structure to model functional relations between proteins are protein-protein interaction networks (PPIN), and their analysis and alignment has become a key ingredient in the study and prediction of protein-protein interactions, protein function, and evolutionary conserved assembly pathways of protein complexes. Several PPIN aligners have been proposed, but attaining the right balance between network topology and biological information is one of the most difficult and key points in the design of any PPIN alignment algorithm. Results Motivated by the challenge of well-balanced and efficient algorithms, we have designed and implemented AligNet, a parameter-free pairwise PPIN alignment algorithm aimed at bridging the gap between topologically efficient and biologically meaningful matchings. A comparison of the results obtained with AligNet and with the best aligners shows that AligNet achieves indeed a good balance between topological and biological matching. Conclusion In this paper we present AligNet, a new pairwise global PPIN aligner that produces biologically meaningful alignments, by achieving a good balance between structural matching and protein function conservation, and more efficient computations than state-of-the-art tools.
Collapse
Affiliation(s)
- Adrià Alcalá
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122, Spain.,Balearic Islands Health Research Institute (IdISBa), Palma de Mallorca, E-07010, Spain
| | - Ricardo Alberich
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122, Spain.,Balearic Islands Health Research Institute (IdISBa), Palma de Mallorca, E-07010, Spain
| | - Mercè Llabrés
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122, Spain. .,Balearic Islands Health Research Institute (IdISBa), Palma de Mallorca, E-07010, Spain.
| | - Francesc Rosselló
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122, Spain.,Balearic Islands Health Research Institute (IdISBa), Palma de Mallorca, E-07010, Spain
| | - Gabriel Valiente
- Algorithms, Bioinformatics, Complexity and Formal Methods Research Group, Technical University of Catalonia, Barcelona, E-08034, Spain
| |
Collapse
|
15
|
Llabrés M, Riera G, Rosselló F, Valiente G. Alignment of biological networks by integer linear programming: virus-host protein-protein interaction networks. BMC Bioinformatics 2020; 21:434. [PMID: 33203352 PMCID: PMC7671827 DOI: 10.1186/s12859-020-03733-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2020] [Accepted: 09/03/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The alignment of protein-protein interaction networks was recently formulated as an integer quadratic programming problem, along with a linearization that can be solved by integer linear programming software tools. However, the resulting integer linear program has a huge number of variables and constraints, rendering it of no practical use. RESULTS We present a compact integer linear programming reformulation of the protein-protein interaction network alignment problem, which can be solved using state-of-the-art mathematical modeling and integer linear programming software tools, along with empirical results showing that small biological networks, such as virus-host protein-protein interaction networks, can be aligned in a reasonable amount of time on a personal computer and the resulting alignments are structurally coherent and biologically meaningful. CONCLUSIONS The implementation of the integer linear programming reformulation using current mathematical modeling and integer linear programming software tools provided biologically meaningful alignments of virus-host protein-protein interaction networks.
Collapse
Affiliation(s)
- Mercè Llabrés
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122 Spain
- Balearic Islands Health Research Institute, Palma de Mallorca, E-07010 Spain
| | - Gabriel Riera
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122 Spain
- Balearic Islands Health Research Institute, Palma de Mallorca, E-07010 Spain
| | - Francesc Rosselló
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, E-07122 Spain
- Balearic Islands Health Research Institute, Palma de Mallorca, E-07010 Spain
| | - Gabriel Valiente
- Algorithms, Bioinformatics, Complexity and Formal Methods Research Group, Technical University of Catalonia, Barcelona, E-08034 Spain
| |
Collapse
|
16
|
Abstract
In this study, we deal with the problem of biological network alignment (NA), which aims to find a node mapping between species' molecular networks that uncovers similar network regions, thus allowing for the transfer of functional knowledge between the aligned nodes. We provide evidence that current NA methods, which assume that topologically similar nodes (i.e., nodes whose network neighborhoods are isomorphic-like) have high functional relatedness, do not actually end up aligning functionally related nodes. That is, we show that the current topological similarity assumption does not hold well. Consequently, we argue that a paradigm shift is needed with how the NA problem is approached. So, we redefine NA as a data-driven framework, called TARA (data-driven NA), which attempts to learn the relationship between topological relatedness and functional relatedness without assuming that topological relatedness corresponds to topological similarity. TARA makes no assumptions about what nodes should be aligned, distinguishing it from existing NA methods. Specifically, TARA trains a classifier to predict whether two nodes from different networks are functionally related based on their network topological patterns (features). We find that TARA is able to make accurate predictions. TARA then takes each pair of nodes that are predicted as related to be part of an alignment. Like traditional NA methods, TARA uses this alignment for the across-species transfer of functional knowledge. TARA as currently implemented uses topological but not protein sequence information for functional knowledge transfer. In this context, we find that TARA outperforms existing state-of-the-art NA methods that also use topological information, WAVE and SANA, and even outperforms or complements a state-of-the-art NA method that uses both topological and sequence information, PrimAlign. Hence, adding sequence information to TARA, which is our future work, is likely to further improve its performance. The software and data are available at http://www.nd.edu/~cone/TARA/.
Collapse
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
| |
Collapse
|
17
|
Milano M, Milenković T, Cannataro M, Guzzi PH. L-HetNetAligner: A novel algorithm for Local Alignment of Heterogeneous Biological Networks. Sci Rep 2020; 10:3901. [PMID: 32127586 PMCID: PMC7054427 DOI: 10.1038/s41598-020-60737-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2018] [Accepted: 02/11/2020] [Indexed: 11/10/2022] Open
Abstract
Networks are largely used for modelling and analysing a wide range of biological data. As a consequence, many different research efforts have resulted in the introduction of a large number of algorithms for analysis and comparison of networks. Many of these algorithms can deal with networks with a single class of nodes and edges, also referred to as homogeneous networks. Recently, many different approaches tried to integrate into a single model the interplay of different molecules. A possible formalism to model such a scenario comes from node/edge coloured networks (also known as heterogeneous networks) implemented as node/ edge-coloured graphs. Therefore, the need for the introduction of algorithms able to compare heterogeneous networks arises. We here focus on the local comparison of heterogeneous networks, and we formulate it as a network alignment problem. To the best of our knowledge, the local alignment of heterogeneous networks has not been explored in the past. We here propose L-HetNetAligner a novel algorithm that receives as input two heterogeneous networks (node-coloured graphs) and builds a local alignment of them. We also implemented and tested our algorithm. Our results confirm that our method builds high-quality alignments. The following website *contains Supplementary File 1 material and the code.
Collapse
Affiliation(s)
- Marianna Milano
- Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, 88040, Italy
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana, USA
| | - Mario Cannataro
- Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, 88040, Italy
- Data Analytics Research Center, University of Catanzaro, Catanzaro, Italy
| | - Pietro Hiram Guzzi
- Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, 88040, Italy.
- Data Analytics Research Center, University of Catanzaro, Catanzaro, Italy.
| |
Collapse
|
18
|
Hayes WB. An Introductory Guide to Aligning Networks Using SANA, the Simulated Annealing Network Aligner. Methods Mol Biol 2020; 2074:263-284. [PMID: 31583643 DOI: 10.1007/978-1-4939-9873-9_18] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Sequence alignment has had an enormous impact on our understanding of biology, evolution, and disease. The alignment of biological networks holds similar promise. Biological networks generally model interactions between biomolecules such as proteins, genes, metabolites, or mRNAs. There is strong evidence that the network topology-the "structure" of the network-is correlated with the functions performed, so that network topology can be used to help predict or understand function. However, unlike sequence comparison and alignment-which is an essentially solved problem-network comparison and alignment is an NP-complete problem for which heuristic algorithms must be used.Here we introduce SANA, the Simulated Annealing Network Aligner. SANA is one of many algorithms proposed for the arena of biological network alignment. In the context of global network alignment, SANA stands out for its speed, memory efficiency, ease-of-use, and flexibility in the arena of producing alignments between two or more networks. SANA produces better alignments in minutes on a laptop than most other algorithms can produce in hours or days of CPU time on large server-class machines. We walk the user through how to use SANA for several types of biomolecular networks.
Collapse
Affiliation(s)
- Wayne B Hayes
- Department of Computer Science, University of California, Irvine, CA, USA.
| |
Collapse
|
19
|
Maskey S, Cho YR. LePrimAlign: local entropy-based alignment of PPI networks to predict conserved modules. BMC Genomics 2019; 20:964. [PMID: 31874635 PMCID: PMC6929407 DOI: 10.1186/s12864-019-6271-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open
Abstract
Background Cross-species analysis of protein-protein interaction (PPI) networks provides an effective means of detecting conserved interaction patterns. Identifying such conserved substructures between PPI networks of different species increases our understanding of the principles deriving evolution of cellular organizations and their functions in a system level. In recent years, network alignment techniques have been applied to genome-scale PPI networks to predict evolutionary conserved modules. Although a wide variety of network alignment algorithms have been introduced, developing a scalable local network alignment algorithm with high accuracy is still challenging. Results We present a novel pairwise local network alignment algorithm, called LePrimAlign, to predict conserved modules between PPI networks of three different species. The proposed algorithm exploits the results of a pairwise global alignment algorithm with many-to-many node mapping. It also applies the concept of graph entropy to detect initial cluster pairs from two networks. Finally, the initial clusters are expanded to increase the local alignment score that is formulated by a combination of intra-network and inter-network scores. The performance comparison with state-of-the-art approaches demonstrates that the proposed algorithm outperforms in terms of accuracy of identified protein complexes and quality of alignments. Conclusion The proposed method produces local network alignment of higher accuracy in predicting conserved modules even with large biological networks at a reduced computational cost.
Collapse
Affiliation(s)
- Sawal Maskey
- Department of Computer Science, Baylor University, One Bear Place #97141, Waco, 76798, TX, USA
| | - Young-Rae Cho
- Department of Computer Science, Baylor University, One Bear Place #97141, Waco, 76798, TX, USA. .,Bioinformatics Program, Baylor University, One Bear Place #97141, Waco, 76798, TX, USA.
| |
Collapse
|
20
|
Form and relationship of the social networks of the New Testament. SOCIAL NETWORK ANALYSIS AND MINING 2019. [DOI: 10.1007/s13278-019-0577-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
21
|
Meerzaman D, Dunn BK. Value of Collaboration among Multi-Domain Experts in Analysis of High-Throughput Genomics Data. Cancer Res 2019; 79:5140-5145. [PMID: 31337654 DOI: 10.1158/0008-5472.can-19-0769] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2019] [Revised: 05/30/2019] [Accepted: 07/11/2019] [Indexed: 12/18/2022]
Abstract
The recent explosion and ease of access to large-scale genomics data is intriguing. However, serious obstacles exist to the optimal management of the entire spectrum from data production in the laboratory through bioinformatic analysis to statistical evaluation and ultimately clinical interpretation. Beyond the multitude of technical issues, what stands out the most is the absence of adequate communication among the specialists in these domains. Successful interdisciplinary collaborations along the genomics pipeline extending from laboratory experiments to bioinformatic analyses to clinical application are notable in large scale, well managed projects such as The Cancer Genome Atlas. However, in certain settings in which the various experts perform their specialized research activities in isolation, the siloed approach to their research contributes to the generation of questionable genomic interpretations. Such situations are particularly concerning when the ultimate endpoint involves genetic/genomic interpretations that are intended for clinical applications. In spite of the fact that clinicians express interest in gaining a better understanding of clinical genomic applications, the lack of communication from upstream experts leaves them with a serious level of discomfort in applying such genomic knowledge to patient care. This discomfort is especially evident among healthcare providers who are not trained as geneticists, in particular primary care physicians. We offer some initiatives that have potential to address this problem, with emphasis on improved and ongoing communication among all the experts in these fields, constituting a comprehensive genomic "pipeline" from laboratory to patient.
Collapse
Affiliation(s)
- Daoud Meerzaman
- NCI, Center for Biomedical Informatics and Information Technology, Bethesda, Maryland.
| | | |
Collapse
|
22
|
Wen Bin Goh W, Thalappilly S, Thibault G. Moving beyond the current limits of data analysis in longevity and healthy lifespan studies. Drug Discov Today 2019; 24:2273-2285. [PMID: 31499187 DOI: 10.1016/j.drudis.2019.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Revised: 08/03/2019] [Accepted: 08/28/2019] [Indexed: 11/19/2022]
Abstract
Living longer with sustainable quality of life is becoming increasingly important in aging populations. Understanding associative biological mechanisms have proven daunting, because of multigenicity and population heterogeneity. Although Big Data and Artificial Intelligence (AI) could help, naïve adoption is ill advised. We hold the view that model organisms are better suited for big-data analytics but might lack relevance because they do not immediately reflect the human condition. Resolving this hurdle and bridging the human-model organism gap will require some finesse. This includes improving signal:noise ratios by appropriate contextualization of high-throughput data, establishing consistency across multiple high-throughput platforms, and adopting supporting technologies that provide useful in silico and in vivo validation strategies.
Collapse
Affiliation(s)
- Wilson Wen Bin Goh
- Bio-Data Science and Education Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore.
| | - Subhash Thalappilly
- Lipid Regulation and Cell Stress Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore
| | - Guillaume Thibault
- Lipid Regulation and Cell Stress Research Group, School of Biological Sciences, Nanyang Technological University, 637551, Singapore; Institute of Molecular and Cell Biology, A*STAR, 138673, Singapore.
| |
Collapse
|
23
|
Fan J, Cannistra A, Fried I, Lim T, Schaffner T, Crovella M, Hescott B, Leiserson MDM. Functional protein representations from biological networks enable diverse cross-species inference. Nucleic Acids Res 2019; 47:e51. [PMID: 30847485 PMCID: PMC6511848 DOI: 10.1093/nar/gkz132] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2018] [Revised: 01/09/2019] [Accepted: 02/18/2019] [Indexed: 12/31/2022] Open
Abstract
Transferring knowledge between species is key for many biological applications, but is complicated by divergent and convergent evolution. Many current approaches for this problem leverage sequence and interaction network data to transfer knowledge across species, exemplified by network alignment methods. While these techniques do well, they are limited in scope, creating metrics to address one specific problem or task. We take a different approach by creating an environment where multiple knowledge transfer tasks can be performed using the same protein representations. Specifically, our kernel-based method, MUNK, integrates sequence and network structure to create functional protein representations, embedding proteins from different species in the same vector space. First we show proteins in different species that are close in MUNK-space are functionally similar. Next, we use these representations to share knowledge of synthetic lethal interactions between species. Importantly, we find that the results using MUNK-representations are at least as accurate as existing algorithms for these tasks. Finally, we generalize the notion of a phenolog ('orthologous phenotype') to use functionally similar proteins (i.e. those with similar representations). We demonstrate the utility of this broadened notion by using it to identify known phenologs and novel non-obvious ones supported by current research.
Collapse
Affiliation(s)
- Jason Fan
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA
| | | | - Inbar Fried
- University of North Carolina Medical School, USA
| | - Tim Lim
- Department of Computer Science, Boston University, USA
| | | | - Mark Crovella
- Department of Computer Science, Boston University, USA
| | - Benjamin Hescott
- College of Computer and Information Science, Northeastern University, USA
| | - Mark D M Leiserson
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA
| |
Collapse
|
24
|
Rantala M, Niemistö H, Karhela T, Sierla S, Vyatkin V. Applying graph matching techniques to enhance reuse of plant design information. COMPUT IND 2019. [DOI: 10.1016/j.compind.2019.01.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
|
25
|
Nelson W, Zitnik M, Wang B, Leskovec J, Goldenberg A, Sharan R. To Embed or Not: Network Embedding as a Paradigm in Computational Biology. Front Genet 2019; 10:381. [PMID: 31118945 PMCID: PMC6504708 DOI: 10.3389/fgene.2019.00381] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2019] [Accepted: 04/09/2019] [Indexed: 12/20/2022] Open
Abstract
Current technology is producing high throughput biomedical data at an ever-growing rate. A common approach to interpreting such data is through network-based analyses. Since biological networks are notoriously complex and hard to decipher, a growing body of work applies graph embedding techniques to simplify, visualize, and facilitate the analysis of the resulting networks. In this review, we survey traditional and new approaches for graph embedding and compare their application to fundamental problems in network biology with using the networks directly. We consider a broad variety of applications including protein network alignment, community detection, and protein function prediction. We find that in all of these domains both types of approaches are of value and their performance depends on the evaluation measures being used and the goal of the project. In particular, network embedding methods outshine direct methods according to some of those measures and are, thus, an essential tool in bioinformatics research.
Collapse
Affiliation(s)
- Walter Nelson
- Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
| | - Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, United States
| | - Bo Wang
- Department of Computer Science, Stanford University, Stanford, CA, United States
- Peter Munk Cardiac Center, University Health Network, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| | - Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, United States
- Chan Zuckerberg Biohub, San Francisco, CA, United States
| | - Anna Goldenberg
- Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| |
Collapse
|
26
|
Guzzi PH, Milenkovic T. Survey of local and global biological network alignment: the need to reconcile the two sides of the same coin. Brief Bioinform 2019; 19:472-481. [PMID: 28062413 DOI: 10.1093/bib/bbw132] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2016] [Indexed: 12/23/2022] Open
Abstract
Analogous to genomic sequence alignment that allows for across-species transfer of biological knowledge between conserved sequence regions, biological network alignment can be used to guide the knowledge transfer between conserved regions of molecular networks of different species. Hence, biological network alignment can be used to redefine the traditional notion of a sequence-based homology to a new notion of network-based homology. Analogous to genomic sequence alignment, there exist local and global biological network alignments. Here, we survey prominent and recent computational approaches of each network alignment type and discuss their (dis)advantages. Then, as it was recently shown that the two approach types are complementary, in the sense that they capture different slices of cellular functioning, we discuss the need to reconcile the two network alignment types and present a recent first step in this direction. We conclude with some open research problems on this topic and comment on the usefulness of network alignment in other domains besides computational biology.
Collapse
Affiliation(s)
- Pietro Hiram Guzzi
- Department of Surgical and Medical Sciences, University Magna Graecia, Catanzaro, 88100 Italy
| | - Tijana Milenkovic
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications (iCeNSA), ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| |
Collapse
|
27
|
Jing F, Zhang SW, Zhang S. Brief Survey of Biological Network Alignment and a Variant with Incorporation of Functional Annotations. Curr Bioinform 2018. [DOI: 10.2174/1574893612666171020103747] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background:Biological network alignment has been widely studied in the context of protein-protein interaction (PPI) networks, metabolic networks and others in bioinformatics. The topological structure of networks and genomic sequence are generally used by existing methods for achieving this task.Objective and Method:Here we briefly survey the methods generally used for this task and introduce a variant with incorporation of functional annotations based on similarity in Gene Ontology (GO). Making full use of GO information is beneficial to provide insights into precise biological network alignment.Results and Conclusion:We analyze the effect of incorporation of GO information to network alignment. Finally, we make a brief summary and discuss future directions about this topic.
Collapse
Affiliation(s)
- Fang Jing
- Key Laboratory of Information Fusion Technology of Ministry of Education, College of Automation, Northwestern Polytechnical University, Xi'an 710072, China
| | - Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, College of Automation, Northwestern Polytechnical University, Xi'an 710072, China
| | - Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
| |
Collapse
|
28
|
Shen T, Zhang Z, Chen Z, Gu D, Liang S, Xu Y, Li R, Wei Y, Liu Z, Yi Y, Xie X. A genome-scale metabolic network alignment method within a hypergraph-based framework using a rotational tensor-vector product. Sci Rep 2018; 8:16376. [PMID: 30401914 PMCID: PMC6219566 DOI: 10.1038/s41598-018-34692-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2018] [Accepted: 10/23/2018] [Indexed: 12/14/2022] Open
Abstract
Biological network alignment aims to discover important similarities and differences and thus find a mapping between topological and/or functional components of different biological molecular networks. Then, the mapped components can be considered to correspond to both their places in the network topology and their biological attributes. Development and evolution of biological network alignment methods has been accelerated by the rapidly increasing availability of such biological networks, yielding a repertoire of tens of methods based upon graph theory. However, most biological processes, especially the metabolic reactions, are more sophisticated than simple pairwise interactions and contain three or more participating components. Such multi-lateral relations are not captured by graphs, and computational methods to overcome this limitation are currently lacking. This paper introduces hypergraphs and association hypergraphs to describe metabolic networks and their potential alignments, respectively. Within this framework, metabolic networks are aligned by identifying the maximal Z-eigenvalue of a symmetric tensor. A shifted higher-order power method was utilized to identify a solution. A rotational strategy has been introduced to accelerate the tensor-vector product by 250-fold on average and reduce the storage cost by up to 1,000-fold. The algorithm was implemented on a spark-based distributed computation cluster to significantly increase the convergence rate further by 50- to 80-fold. The parameters have been explored to understand their impact on alignment accuracy and speed. In particular, the influence of initial value selection on the stationary point has been simulated to ensure an accurate approximation of the global optimum. This framework was demonstrated by alignments among the genome-wide metabolic networks of Escherichia coli MG-1655 and Halophilic archaeon DL31. To our knowledge, this is the first genome-wide metabolic network alignment at both the metabolite level and the enzyme level. These results demonstrate that it can supply quite a few valuable insights into metabolic networks. First, this method can access the driving force of organic reactions through the chemical evolution of metabolic network. Second, this method can incorporate the chemical information of enzymes and structural changes of compounds to offer new way defining reaction class and module, such as those in KEGG. Third, as a vertex-focused treatment, this method can supply novel structural and functional annotation for ill-defined molecules. The related source code is available on request.
Collapse
Affiliation(s)
- Tie Shen
- Key Laboratory of Information and Computing Science Guizhou Province, Guizhou Normal University, Guiyang, Guizhou, China.
| | - Zhengdong Zhang
- College of Mathematics and Information Science, Guiyang University, Guiyang, Guizhou, China
| | - Zhen Chen
- College of Mathematical Science, Guizhou Normal University, Guiyang, Guizhou, China
| | - Dagang Gu
- College of Mathematics and Information Science, Guiyang University, Guiyang, Guizhou, China
| | - Shen Liang
- College of Mathematics and Information Science, Guiyang University, Guiyang, Guizhou, China
| | - Yang Xu
- Key Laboratory of Information and Computing Science Guizhou Province, Guizhou Normal University, Guiyang, Guizhou, China
| | - Ruiyuan Li
- Key Laboratory of Information and Computing Science Guizhou Province, Guizhou Normal University, Guiyang, Guizhou, China
| | - Yimin Wei
- School of Mathematics Sciences and Key Laboratory of Mathematics for Nonlinear Sciences, Fudan University, Shanghai, China
| | - Zhijie Liu
- Key Laboratory of Information and Computing Science Guizhou Province, Guizhou Normal University, Guiyang, Guizhou, China
| | - Yin Yi
- Key Laboratory of State Forestry Administration on Biodiversity Conservation in Karst of Southwest Areas China, Guizhou Normal University, Guiyang, Guizhou, China.
| | - Xiaoyao Xie
- Key Laboratory of Information and Computing Science Guizhou Province, Guizhou Normal University, Guiyang, Guizhou, China.
| |
Collapse
|
29
|
Zhu Y, Li Y, Liu J, Qin L, Yu JX. Discovering large conserved functional components in global network alignment by graph matching. BMC Genomics 2018; 19:670. [PMID: 30255780 PMCID: PMC6157291 DOI: 10.1186/s12864-018-5027-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Aligning protein-protein interaction (PPI) networks is very important to discover the functionally conserved sub-structures between different species. In recent years, the global PPI network alignment problem has been extensively studied aiming at finding the one-to-one alignment with the maximum matching score. However, finding large conserved components remains challenging due to its NP-hardness. RESULTS We propose a new graph matching method GMAlign for global PPI network alignment. It first selects some pairs of important proteins as seeds, followed by a gradual expansion to obtain an initial matching, and then it refines the current result to obtain an optimal alignment result iteratively based on the vertex cover. We compare GMAlign with the state-of-the-art methods on the PPI network pairs obtained from the largest BioGRID dataset and validate its performance. The results show that our algorithm can produce larger size of alignment, and can find bigger and denser common connected subgraphs as well for the first time. Meanwhile, GMAlign can achieve high quality biological results, as measured by functional consistency and semantic similarity of the Gene Ontology terms. Moreover, we also show that GMAlign can achieve better results which are structurally and biologically meaningful in the detection of large conserved biological pathways between species. CONCLUSIONS GMAlign is a novel global network alignment tool to discover large conserved functional components between PPI networks. It also has many potential biological applications such as conserved pathway and protein complex discovery across species. The GMAlign software and datasets are avaialbile at https://github.com/yzlwhu/GMAlign .
Collapse
Affiliation(s)
- Yuanyuan Zhu
- School of Computer Science, Wuhan University, Bayi Road, Wuhan, 430072 China
| | - Yuezhi Li
- School of Computer Science, Wuhan University, Bayi Road, Wuhan, 430072 China
| | - Juan Liu
- School of Computer Science, Wuhan University, Bayi Road, Wuhan, 430072 China
| | - Lu Qin
- Centre of Quantum Computation and Intelligent Systems, University of Technology, Sydney, Australia
| | - Jeffrey Xu Yu
- The Chinese University of Hong Kong, Hong Kong, China
| |
Collapse
|
30
|
Henkel R, Hoehndorf R, Kacprowski T, Knüpfer C, Liebermeister W, Waltemath D. Notions of similarity for systems biology models. Brief Bioinform 2018; 19:77-88. [PMID: 27742665 PMCID: PMC5862271 DOI: 10.1093/bib/bbw090] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2016] [Revised: 08/28/2016] [Indexed: 01/23/2023] Open
Abstract
Systems biology models are rapidly increasing in complexity, size and numbers. When building large models, researchers rely on software tools for the retrieval, comparison, combination and merging of models, as well as for version control. These tools need to be able to quantify the differences and similarities between computational models. However, depending on the specific application, the notion of 'similarity' may greatly vary. A general notion of model similarity, applicable to various types of models, is still missing. Here we survey existing methods for the comparison of models, introduce quantitative measures for model similarity, and discuss potential applications of combined similarity measures. To frame model comparison as a general problem, we describe a theoretical approach to defining and computing similarities based on a combination of different model aspects. The six aspects that we define as potentially relevant for similarity are underlying encoding, references to biological entities, quantitative behaviour, qualitative behaviour, mathematical equations and parameters and network structure. We argue that future similarity measures will benefit from combining these model aspects in flexible, problem-specific ways to mimic users' intuition about model similarity, and to support complex model searches in databases.
Collapse
Affiliation(s)
| | | | | | | | | | - Dagmar Waltemath
- Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Rostock, Germany
| |
Collapse
|
31
|
Elmsallati A, Msalati A, Kalita J. Index-Based Network Aligner of Protein-Protein Interaction Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:330-336. [PMID: 28113986 DOI: 10.1109/tcbb.2016.2613098] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Network Alignment over graph-structured data has received considerable attention in many recent applications. Global network alignment tries to uniquely find the best mapping for a node in one network to only one node in another network. The mapping is performed according to some matching criteria that depend on the nature of data. In molecular biology, functional orthologs, protein complexes, and evolutionary conserved pathways are some examples of information uncovered by global network alignment. Current techniques for global network alignment suffer from several drawbacks, e.g., poor performance and high memory requirements. We address these problems by proposing IBNAL, Indexes-Based Network ALigner, for better alignment quality and faster results. To accelerate the alignment step, IBNAL makes use of a novel clique-based index and is able to align large networks in seconds. IBNAL produces a higher topological quality alignment and comparable biological match in alignment relative to other state-of-the-art aligners even though topological fit is primarily used to match nodes. IBNAL's results confirm and give another evidence that homology information is more likely to be encoded in network topology than sequence information.
Collapse
|
32
|
Mohammadi S, Gleich DF, Kolda TG, Grama A. Triangular Alignment (TAME): A Tensor-Based Approach for Higher-Order Network Alignment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1446-1458. [PMID: 27483461 DOI: 10.1109/tcbb.2016.2595583] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Network alignment has extensive applications in comparative interactomics. Traditional approaches aim to simultaneously maximize the number of conserved edges and the underlying similarity of aligned entities. We propose a novel formulation of the network alignment problem that extends topological similarity to higher-order structures and provides a new objective function that maximizes the number of aligned substructures. This objective function corresponds to an integer programming problem, which is NP-hard. Consequently, we identify a closely related surrogate function whose maximization results in a tensor eigenvector problem. Based on this formulation, we present an algorithm called Triangular AlignMEnt (TAME), which attempts to maximize the number of aligned triangles across networks. Using a case study on the NAPAbench dataset, we show that triangular alignment is capable of producing mappings with high node correctness. We further evaluate our method by aligning yeast and human interactomes. Our results indicate that TAME outperforms the state-of-art alignment methods in terms of conserved triangles. In addition, we show that the number of conserved triangles is more significantly correlated, compared to the conserved edge, with node correctness and co-expression of edges. Our formulation and resulting algorithms can be easily extended to arbitrary motifs.
Collapse
|
33
|
Xu W, Cao Y, Xie Z, He H, He S, Hong H, Bo X, Li F. NFPscanner: a webtool for knowledge-based deciphering of biomedical networks. BMC Bioinformatics 2017; 18:262. [PMID: 28521733 PMCID: PMC5437514 DOI: 10.1186/s12859-017-1673-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2016] [Accepted: 05/03/2017] [Indexed: 12/05/2022] Open
Abstract
Background Many biological pathways have been created to represent different types of knowledge, such as genetic interactions, metabolic reactions, and gene-regulating and physical-binding relationships. Biologists are using a wide range of omics data to elaborately construct various context-specific differential molecular networks. However, they cannot easily gain insight into unfamiliar gene networks with the tools that are currently available for pathways resource and network analysis. They would benefit from the development of a standardized tool to compare functions of multiple biological networks quantitatively and promptly. Results To address this challenge, we developed NFPscanner, a web server for deciphering gene networks with pathway associations. Adapted from a recently reported knowledge-based framework called network fingerprint, NFPscanner integrates the annotated pathways of 7 databases, 4 algorithms, and 2 graphical visualization modules into a webtool. It implements 3 types of network analysis:Fingerprint: Deciphering gene networks and highlighting inherent pathway modules Alignment: Discovering functional associations by finding optimized node mapping between 2 gene networks Enrichment: Calculating and visualizing gene ontology (GO) and pathway enrichment for genes in networks
Users can upload gene networks to NFPscanner through the web interface and then interactively explore the networks’ functions. Conclusions NFPscanner is open-source software for non-commercial use, freely accessible at http://biotech.bmi.ac.cn/nfs. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1673-1) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Wenjian Xu
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China
| | - Yang Cao
- Tianjin Institute of Health & Environmental Medicine, 1 Dali Road, Heping District, Tianjin, 300050, China
| | - Ziwei Xie
- Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, 1037 Luoyu Road, Wuhan, 430074, Hubei, China
| | - Haochen He
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China
| | - Song He
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China
| | - Hao Hong
- Department of Biomedical Engineering, National University of Defense Technology, 109 Deya Road, Kaifu District, Changsha, 410073, Hunan, China
| | - Xiaochen Bo
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China.
| | - Fei Li
- Department of Biotechnology, Beijing Institute of Radiation Medicine, 27 Taiping Street, Haidian District, Beijing, 100850, China.
| |
Collapse
|
34
|
Abstract
Paralleling the increasing availability of protein-protein interaction (PPI) network data, several network alignment methods have been proposed. Network alignments have been used to uncover functionally conserved network parts and to transfer annotations. However, due to the computational intractability of the network alignment problem, aligners are heuristics providing divergent solutions and no consensus exists on a gold standard, or which scoring scheme should be used to evaluate them. We comprehensively evaluate the alignment scoring schemes and global network aligners on large scale PPI data and observe that three methods, HUBALIGN, L-GRAAL and NATALIE, regularly produce the most topologically and biologically coherent alignments. We study the collective behaviour of network aligners and observe that PPI networks are almost entirely aligned with a handful of aligners that we unify into a new tool, Ulign. Ulign enables complete alignment of two networks, which traditional global and local aligners fail to do. Also, multiple mappings of Ulign define biologically relevant soft clusterings of proteins in PPI networks, which may be used for refining the transfer of annotations across networks. Hence, PPI networks are already well investigated by current aligners, so to gain additional biological insights, a paradigm shift is needed. We propose such a shift come from aligning all available data types collectively rather than any particular data type in isolation from others.
Collapse
|
35
|
|
36
|
Mamano N, Hayes WB. SANA: simulated annealing far outperforms many other search algorithms for biological network alignment. Bioinformatics 2017; 33:2156-2164. [DOI: 10.1093/bioinformatics/btx090] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2016] [Accepted: 02/08/2017] [Indexed: 11/14/2022] Open
|
37
|
Kazemi E, Hassani H, Grossglauser M, Pezeshgi Modarres H. PROPER: global protein interaction network alignment through percolation matching. BMC Bioinformatics 2016; 17:527. [PMID: 27955623 PMCID: PMC5153870 DOI: 10.1186/s12859-016-1395-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2016] [Accepted: 11/29/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The alignment of protein-protein interaction (PPI) networks enables us to uncover the relationships between different species, which leads to a deeper understanding of biological systems. Network alignment can be used to transfer biological knowledge between species. Although different PPI-network alignment algorithms were introduced during the last decade, developing an accurate and scalable algorithm that can find alignments with high biological and structural similarities among PPI networks is still challenging. RESULTS In this paper, we introduce a new global network alignment algorithm for PPI networks called PROPER. Compared to other global network alignment methods, our algorithm shows higher accuracy and speed over real PPI datasets and synthetic networks. We show that the PROPER algorithm can detect large portions of conserved biological pathways between species. Also, using a simple parsimonious evolutionary model, we explain why PROPER performs well based on several different comparison criteria. CONCLUSIONS We highlight that PROPER has high potential in further applications such as detecting biological pathways, finding protein complexes and PPI prediction. The PROPER algorithm is available at http://proper.epfl.ch .
Collapse
Affiliation(s)
- Ehsan Kazemi
- School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland.
| | - Hamed Hassani
- Department of Computer Science, ETHZ, Zurich, Switzerland
| | | | | |
Collapse
|
38
|
Gong M, Peng Z, Ma L, Huang J. Global Biological Network Alignment by Using Efficient Memetic Algorithm. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:1117-1129. [PMID: 28055895 DOI: 10.1109/tcbb.2015.2511741] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
High-throughput experimental screening techniques have resulted in a large number of biological network data such as protein-protein interactions (PPI) data. The analysis of these data can enhance our understanding of cellular processes. PPI network alignment is one of the comparative analysis methods for analyzing biological networks. Research on PPI networks can identify conserved subgraphs and help us to understand evolutionary trajectories across species. Some evolutionary algorithms have been proposed for coping with PPI network alignment, but most of them are limited by the lower search efficiency due to the lack of the priori knowledge. In this paper, we propose a memetic algorithm, denoted as MeAlgn, to solve the biological network alignment by optimizing an objective function which introduces topological structure and sequence similarities. MeAlign combines genetic algorithm with a local search refinement. The genetic algorithm is to find interesting alignment solution regions, and the local search is to find optimal solutions around the regions. The proposed algorithm first develops a coarse similarity score matrix for initialization and then it uses a specific neighborhood heuristic local search strategy to find an optimal alignment. In MeAlign, the information of topological structure and sequence similarities is used to guide our mapping. Experimental results demonstrate that our algorithm can achieve a better mapping than some state-of-the-art algorithms and it makes a better balance between the network topology and nodes sequence similarities.
Collapse
|
39
|
Meng L, Striegel A, Milenković T. Local versus global biological network alignment. Bioinformatics 2016; 32:3155-3164. [PMID: 27357169 PMCID: PMC5048063 DOI: 10.1093/bioinformatics/btw348] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2015] [Revised: 02/18/2016] [Accepted: 05/23/2016] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Network alignment (NA) aims to find regions of similarities between species' molecular networks. There exist two NA categories: local (LNA) and global (GNA). LNA finds small highly conserved network regions and produces a many-to-many node mapping. GNA finds large conserved regions and produces a one-to-one node mapping. Given the different outputs of LNA and GNA, when a new NA method is proposed, it is compared against existing methods from the same category. However, both NA categories have the same goal: to allow for transferring functional knowledge from well- to poorly-studied species between conserved network regions. So, which one to choose, LNA or GNA? To answer this, we introduce the first systematic evaluation of the two NA categories. RESULTS We introduce new measures of alignment quality that allow for fair comparison of the different LNA and GNA outputs, as such measures do not exist. We provide user-friendly software for efficient alignment evaluation that implements the new and existing measures. We evaluate prominent LNA and GNA methods on synthetic and real-world biological networks. We study the effect on alignment quality of using different interaction types and confidence levels. We find that the superiority of one NA category over the other is context-dependent. Further, when we contrast LNA and GNA in the application of learning novel protein functional knowledge, the two produce very different predictions, indicating their complementarity. Our results and software provide guidelines for future NA method development and evaluation. AVAILABILITY AND IMPLEMENTATION Software: http://www.nd.edu/~cone/LNA_GNA CONTACT: : tmilenko@nd.eduSupplementary information: Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Lei Meng
- Department of Computer Science and Engineering ECK Institute of Global Health and Interdisciplinary Center for Network Science and Applications
| | - Aaron Striegel
- Department of Computer Science and Engineering Wireless Institute, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Tijana Milenković
- Department of Computer Science and Engineering ECK Institute of Global Health and Interdisciplinary Center for Network Science and Applications
| |
Collapse
|
40
|
Sarajlić A, Malod-Dognin N, Yaveroğlu ÖN, Pržulj N. Graphlet-based Characterization of Directed Networks. Sci Rep 2016; 6:35098. [PMID: 27734973 PMCID: PMC5062067 DOI: 10.1038/srep35098] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2016] [Accepted: 09/26/2016] [Indexed: 01/22/2023] Open
Abstract
We are flooded with large-scale, dynamic, directed, networked data. Analyses requiring exact comparisons between networks are computationally intractable, so new methodologies are sought. To analyse directed networks, we extend graphlets (small induced sub-graphs) and their degrees to directed data. Using these directed graphlets, we generalise state-of-the-art network distance measures (RGF, GDDA and GCD) to directed networks and show their superiority for comparing directed networks. Also, we extend the canonical correlation analysis framework that enables uncovering the relationships between the wiring patterns around nodes in a directed network and their expert annotations. On directed World Trade Networks (WTNs), our methodology allows uncovering the core-broker-periphery structure of the WTN, predicting the economic attributes of a country, such as its gross domestic product, from its wiring patterns in the WTN for up-to ten years in the future. It does so by enabling us to track the dynamics of a country's positioning in the WTN over years. On directed metabolic networks, our framework yields insights into preservation of enzyme function from the network wiring patterns rather than from sequence data. Overall, our methodology enables advanced analyses of directed networked data from any area of science, allowing domain-specific interpretation of a directed network's topology.
Collapse
Affiliation(s)
- Anida Sarajlić
- Department of Computing, Imperial College London, SW7 2AZ London, UK
| | - Noël Malod-Dognin
- Department of Computer Science, University College London, WC1E 6BT London, UK
| | | | - Nataša Pržulj
- Department of Computer Science, University College London, WC1E 6BT London, UK
| |
Collapse
|
41
|
|
42
|
Elmsallati A, Clark C, Kalita J. Global Alignment of Protein-Protein Interaction Networks: A Survey. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:689-705. [PMID: 26336140 DOI: 10.1109/tcbb.2015.2474391] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
In this paper, we survey algorithms that perform global alignment of networks or graphs. Global network alignment aligns two or more given networks to find the best mapping from nodes in one network to nodes in other networks. Since graphs are a common method of data representation, graph alignment has become important with many significant applications. Protein-protein interactions can be modeled as networks and aligning these networks of protein interactions has many applications in biological research. In this survey, we review algorithms for global pairwise alignment highlighting various proposed approaches, and classify them based on their methodology. Evaluation metrics that are used to measure the quality of the resulting alignments are also surveyed. We discuss and present a comparison between selected aligners on the same datasets and evaluate using the same evaluation metrics. Finally, a quick overview of the most popular databases of protein interaction networks is presented focusing on datasets that have been used recently.
Collapse
|
43
|
Abstract
Interactions among biological entities contain more information than purely the similarities between the entities. For example, interactions between genes, and gene products, can be more informative than the sequence similarities of the genes involved. However, the study of biological networks and their evolution in particular is still in its infancy. Simplified theoretical models of the development of biological networks from a starting state exist, but the problem of finding a distance between existing biological networks, with an unknown history, has seen less research. Metrics for network distance can also be used to measure the fit between theoretically derived networks and their real-world counterpart. In this article, we present a useful model of biological network distance and demonstrate an implementation using simulated gene regulatory networks. We compared our method with existing methods for network alignment and showed that we are much better able to identify evolutionary changes in biological networks. In particular, we can recover the evolutionary trees that describe the relationship between these networks.
Collapse
Affiliation(s)
- Martin McGrane
- 1 School of Information Technologies, The University of Sydney , Sydney, New South Wales, Australia
| | | |
Collapse
|
44
|
Emmert-Streib F, Dehmer M, Shi Y. Fifty years of graph matching, network alignment and network comparison. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.01.074] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
|
45
|
Abstract
Systems medicine promotes a range of approaches and strategies to study human health and disease at a systems level with the aim of improving the overall well-being of (healthy) individuals, and preventing, diagnosing, or curing disease. In this chapter we discuss how bioinformatics critically contributes to systems medicine. First, we explain the role of bioinformatics in the management and analysis of data. In particular we show the importance of publicly available biological and clinical repositories to support systems medicine studies. Second, we discuss how the integration and analysis of multiple types of omics data through integrative bioinformatics may facilitate the determination of more predictive and robust disease signatures, lead to a better understanding of (patho)physiological molecular mechanisms, and facilitate personalized medicine. Third, we focus on network analysis and discuss how gene networks can be constructed from omics data and how these networks can be decomposed into smaller modules. We discuss how the resulting modules can be used to generate experimentally testable hypotheses, provide insight into disease mechanisms, and lead to predictive models. Throughout, we provide several examples demonstrating how bioinformatics contributes to systems medicine and discuss future challenges in bioinformatics that need to be addressed to enable the advancement of systems medicine.
Collapse
Affiliation(s)
- Ulf Schmitz
- Dept of Systems Biology & Bioinformatics, University of Rostock, Rostock, Germany
| | - Olaf Wolkenhauer
- Dept of Systems Biology & Bioinformatics, University of Rostock, Rostock, Germany
| |
Collapse
|
46
|
Penfold CA, Millar JBA, Wild DL. Inferring orthologous gene regulatory networks using interspecies data fusion. Bioinformatics 2015; 31:i97-105. [PMID: 26072515 PMCID: PMC4765882 DOI: 10.1093/bioinformatics/btv267] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
Abstract
Motivation: The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in one framework, network information is directly propagated between species; in the second hierarchical approach, network information is propagated via an unobserved ‘hypernetwork’. In both frameworks, information about network similarity is captured via graph kernels, with the networks additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression. Results: Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than standalone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, while the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, the use of Saccharomyces cerevisiae data and networks to inform inference of networks in the budding yeast Schizosaccharomyces pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase. Availability and implementation: MATLAB code is available from http://go.warwick.ac.uk/systemsbiology/software/. Contact:d.l.wild@warwick.ac.uk Supplementary information:Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Christopher A Penfold
- Warwick Systems Biology Centre and Biomedical Cell Biology, Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK
| | - Jonathan B A Millar
- Warwick Systems Biology Centre and Biomedical Cell Biology, Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK
| | - David L Wild
- Warwick Systems Biology Centre and Biomedical Cell Biology, Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK
| |
Collapse
|
47
|
Gligorijević V, Malod-Dognin N, Pržulj N. Fuse: multiple network alignment via data fusion. Bioinformatics 2015; 32:1195-203. [DOI: 10.1093/bioinformatics/btv731] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2014] [Accepted: 10/09/2015] [Indexed: 02/07/2023] Open
|
48
|
Malek M, Ibragimov R, Albrecht M, Baumbach J. CytoGEDEVO-global alignment of biological networks with Cytoscape. Bioinformatics 2015; 32:1259-61. [PMID: 26669930 DOI: 10.1093/bioinformatics/btv732] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2015] [Accepted: 12/09/2015] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION In the systems biology era, high-throughput omics technologies have enabled the unraveling of the interplay of some biological entities on a large scale (e.g. genes, proteins, metabolites or RNAs). Huge biological networks have emerged, where nodes correspond to these entities and edges between them model their relations. Protein-protein interaction networks, for instance, show the physical interactions of proteins in an organism. The comparison of such networks promises additional insights into protein and cell function as well as knowledge-transfer across species. Several computational approaches have been developed previously to solve the network alignment (NA) problem, but only a few concentrate on the usability of the implemented tools for the evaluation of protein-protein interactions by the end users (biologists and medical researchers). RESULTS We have created CytoGEDEVO, a Cytoscape app for visual and user-assisted NA. It extends the previous GEDEVO methodology for global pairwise NAs with new graphical and functional features. Our main focus was on the usability, even by non-programmers and the interpretability of the NA results with Cytoscape. AVAILABILITY AND IMPLEMENTATION CytoGEDEVO is publicly available from the Cytoscape app store at http://apps.cytoscape.org/apps/cytogedevo In addition, we provide stand-alone command line executables, source code, documentation and step-by-step user instructions at http://cytogedevo.compbio.sdu.dk CONTACT malek@tugraz.at SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Maximilian Malek
- Saarland University, 66123 Saarbrücken, Germany Institute for Knowledge Discovery, Graz University of Technology, 8010 Graz, Austria
| | - Rashid Ibragimov
- Saarland University, 66123 Saarbrücken, Germany Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
| | - Mario Albrecht
- Institute for Knowledge Discovery, Graz University of Technology, 8010 Graz, Austria BioTechMed-Graz, Graz, Austria
| | | |
Collapse
|
49
|
Faisal FE, Meng L, Crawford J, Milenković T. The post-genomic era of biological network alignment. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2015; 2015:3. [PMID: 28194172 PMCID: PMC5270500 DOI: 10.1186/s13637-015-0022-9] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/21/2015] [Accepted: 05/18/2015] [Indexed: 11/10/2022]
Abstract
Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by genomic sequence alignment. Here, we review computational challenges behind the network alignment problem, existing approaches for solving the problem, ways of evaluating their alignment quality, and the approaches' biomedical applications. We discuss recent innovative efforts of improving the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of principles of life, evolution, disease, and therapeutics.
Collapse
Affiliation(s)
- Fazle E Faisal
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Lei Meng
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Joseph Crawford
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| |
Collapse
|
50
|
Poot-Hernandez AC, Rodriguez-Vazquez K, Perez-Rueda E. The alignment of enzymatic steps reveals similar metabolic pathways and probable recruitment events in Gammaproteobacteria. BMC Genomics 2015; 16:957. [PMID: 26578309 PMCID: PMC4647829 DOI: 10.1186/s12864-015-2113-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2015] [Accepted: 10/19/2015] [Indexed: 11/29/2022] Open
Abstract
Background It is generally accepted that gene duplication followed by functional divergence is one of the main sources of metabolic diversity. In this regard, there is an increasing interest in the development of methods that allow the systematic identification of these evolutionary events in metabolism. Here, we used a method not based on biomolecular sequence analysis to compare and identify common and variable routes in the metabolism of 40 Gammaproteobacteria species. Method The metabolic maps deposited in the KEGG database were transformed into linear Enzymatic Step Sequences (ESS) by using the breadth-first search algorithm. These ESS represent subsequent enzymes linked to each other, where their catalytic activities are encoded in the Enzyme Commission numbers. The ESS were compared in an all-against-all (pairwise comparisons) approach by using a dynamic programming algorithm, leaving only a set of significant pairs. Results and conclusion From these comparisons, we identified a set of functionally conserved enzymatic steps in different metabolic maps, in which cell wall components and fatty acid and lysine biosynthesis were included. In addition, we found that pathways associated with biosynthesis share a higher proportion of similar ESS than degradation pathways and secondary metabolism pathways. Also, maps associated with the metabolism of similar compounds contain a high proportion of similar ESS, such as those maps from nucleotide metabolism pathways, in particular the inosine monophosphate pathway. Furthermore, diverse ESS associated with the low part of the glycolysis pathway were identified as functionally similar to multiple metabolic pathways. In summary, our comparisons may help to identify similar reactions in different metabolic pathways and could reinforce the patchwork model in the evolution of metabolism in Gammaproteobacteria. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-2113-0) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Augusto Cesar Poot-Hernandez
- Departamento de Microbiología Molecular, Instituto de Biotecnología, UNAM, Av. Universidad 2001, Cuernavaca, Morelos, CP 62210, México. .,Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, UNAM, Ciudad Universitaria, CP 04510, México D.F., México.
| | - Katya Rodriguez-Vazquez
- Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, UNAM, Ciudad Universitaria, CP 04510, México D.F., México.
| | - Ernesto Perez-Rueda
- Departamento de Microbiología Molecular, Instituto de Biotecnología, UNAM, Av. Universidad 2001, Cuernavaca, Morelos, CP 62210, México.
| |
Collapse
|