1
Guo Y, Liu L, Lin A. Improving the identification of cancer driver modules using deep features learned from multi-omics data. Comput Biol Med 2025; 184:109322. [PMID: 39522132 DOI: 10.1016/j.compbiomed.2024.109322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 10/14/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
Identifying cancer driver modules or pathways is crucial to understanding the fundamental mechanisms of cancer occurrence and progression. The rapid accumulation of cancer omics data provides unprecedented opportunities to study driver modules in cancer, and many computational methods have been developed in recent years. However, most existing methods are limited in the types of cancer omics data they consider and cannot effectively learn informative omics features for the integrated identification of driver modules. In this paper, we introduce a new integrated framework that accurately identifies cancer driver modules by integrating the protein-protein interaction network, the transcriptional regulatory network, and gene expression and mutation data in cancer. We first develop a series of methods to learn deep features of the functional connectivity between genes in each type of omics data and then construct an integrated gene functional coherence network. We then present a two-step module mining method to efficiently identify cancer driver modules from the integrated gene functional coherence network. Systematic experiments on three cancer types demonstrate that the proposed framework obtains more significant driver modules than most existing methods, and some of the identified driver modules are associated with clinical survival phenotypes.
Affiliation(s)
- Yang Guo
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
- Lingling Liu
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
- Aofeng Lin
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
2
Hayes WB. Exact p-values for global network alignments via combinatorial analysis of shared GO terms: REFANGO: Rigorous Evaluation of Functional Alignments of Networks using Gene Ontology. J Math Biol 2024; 88:50. [PMID: 38551701 PMCID: PMC10980677 DOI: 10.1007/s00285-024-02058-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2020] [Revised: 01/21/2024] [Accepted: 02/05/2024] [Indexed: 04/01/2024]
Abstract
Network alignment aims to uncover topologically similar regions in the protein-protein interaction (PPI) networks of two or more species under the assumption that topologically similar regions tend to perform similar functions. Although there exists a plethora of both network alignment algorithms and measures of topological similarity, no "gold standard" currently exists for evaluating how well either is able to uncover functionally similar regions. Here we propose a formal, mathematically and statistically rigorous method for evaluating the statistical significance of shared GO terms in a global, 1-to-1 alignment between two PPI networks. Given an alignment in which k aligned protein pairs share a particular GO term g, we use a combinatorial argument to precisely quantify the p-value of that alignment with respect to g compared to a random alignment. The p-value of the alignment with respect to all GO terms, including their inter-relationships, is approximated using the Empirical Brown's Method. We note that, just as with BLAST's p-values, this method is not designed to guide an alignment algorithm towards a solution; instead, as with BLAST, an alignment is guided by a scoring matrix or function, and the p-values herein are computed after the fact, providing independent feedback to the user on the biological quality of the alignment that was generated by optimizing the scoring function. Importantly, we demonstrate that among all GO-based measures of network alignments, ours is the only one that correlates with the precision of GO annotation predictions, paving the way for network alignment-based protein function prediction.
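The single-term calculation can be sketched under one simplifying assumption: that a "random alignment" is a uniformly random injective map from G1's proteins into G2's. Under that model the number of aligned pairs sharing term g follows a hypergeometric distribution, which the Python sketch that follows computes exactly and cross-checks by simulation. The function names and parameters (n1, n2, m1, m2) are illustrative, and this is not claimed to be the paper's own formula, which additionally combines inter-related GO terms with the Empirical Brown's Method.

```python
import math
import random


def hypergeom_tail(k, n2, m1, m2):
    """Exact P(X >= k) for X ~ Hypergeometric(population n2, m2 successes, m1 draws).

    Assumed model: G2 has n2 proteins, m2 of which carry GO term g; the m1
    g-annotated proteins of G1 are sent to m1 distinct G2 proteins chosen
    uniformly at random; X is the number of aligned pairs that share g.
    """
    total = math.comb(n2, m1)
    favourable = sum(
        math.comb(m2, x) * math.comb(n2 - m2, m1 - x)
        for x in range(max(k, 0), min(m1, m2) + 1)
    )
    return favourable / total


def monte_carlo_tail(k, n1, n2, annotated1, annotated2, trials=5000, seed=0):
    """Estimate the same tail by sampling uniformly random injective maps G1 -> G2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        image = rng.sample(range(n2), n1)  # image[u] is the G2 partner of G1 node u
        shared = sum(1 for u in annotated1 if image[u] in annotated2)
        if shared >= k:
            hits += 1
    return hits / trials


# Hypothetical sizes: 100 vs 120 proteins, g annotates 10 and 15 of them.
exact = hypergeom_tail(3, n2=120, m1=10, m2=15)
approx = monte_carlo_tail(3, n1=100, n2=120,
                          annotated1=set(range(10)), annotated2=set(range(15)))
print(f"exact {exact:.4f}, simulated {approx:.4f}")
```

The exact and simulated values should agree to within sampling error; a small p-value means that k shared pairs would rarely arise from a random alignment.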
Affiliation(s)
- Wayne B Hayes
- Department of Computer Science, UC Irvine, Irvine, USA
3
Li L, Dannenfelser R, Zhu Y, Hejduk N, Segarra S, Yao V. Joint embedding of biological networks for cross-species functional alignment. Bioinformatics 2023; 39:btad529. [PMID: 37632792 PMCID: PMC10477935 DOI: 10.1093/bioinformatics/btad529] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/23/2022] [Revised: 07/12/2023] [Accepted: 08/24/2023] [Indexed: 08/28/2023]
Abstract
MOTIVATION Model organisms are widely used to better understand the molecular causes of human disease. While sequence similarity greatly aids this cross-species transfer, sequence similarity does not imply functional similarity, and thus several current approaches incorporate protein-protein interactions to help map findings between species. Existing transfer methods formulate the alignment problem either as a matching problem that pits network features against known orthology or, more recently, as a joint embedding problem. RESULTS We propose a novel state-of-the-art joint embedding solution: Embeddings to Network Alignment (ETNA). ETNA generates individual network embeddings based on network topological structure and then uses a Natural Language Processing-inspired cross-training approach to align the two embeddings using sequence-based orthologs. The final embedding preserves both within- and between-species gene functional relationships, and we demonstrate that it captures both pairwise and group functional relevance. In addition, ETNA's embeddings can be used to transfer genetic interactions across species and identify phenotypic alignments, laying the groundwork for potential opportunities for drug repurposing and translational studies. AVAILABILITY AND IMPLEMENTATION https://github.com/ylaboratory/ETNA.
Affiliation(s)
- Lechuan Li
- Department of Computer Science, Rice University, Houston, TX 77005, United States
- Ruth Dannenfelser
- Department of Computer Science, Rice University, Houston, TX 77005, United States
- Yu Zhu
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, United States
- Nathaniel Hejduk
- Department of Computer Science, Rice University, Houston, TX 77005, United States
- Santiago Segarra
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, United States
- Vicky Yao
- Department of Computer Science, Rice University, Houston, TX 77005, United States
4
Ding K, Wang S, Luo Y. Supervised biological network alignment with graph neural networks. Bioinformatics 2023; 39:i465-i474. [PMID: 37387160 PMCID: PMC10311300 DOI: 10.1093/bioinformatics/btad241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
MOTIVATION Despite the advances in sequencing technology, a vast number of proteins with known sequences remain functionally unannotated. Biological network alignment (NA), which aims to find the node correspondence between species' protein-protein interaction (PPI) networks, has been a popular strategy to uncover missing annotations by transferring functional knowledge across species. Traditional NA methods assumed that topologically similar proteins in PPI networks are functionally similar. However, it was recently reported that functionally unrelated proteins can be as topologically similar as functionally related pairs, and a new data-driven or supervised NA paradigm has been proposed, which uses protein function data to discern which topological features correspond to functional relatedness. RESULTS Here, we propose GraNA, a deep learning framework for the supervised NA paradigm applied to the pairwise NA problem. Employing graph neural networks, GraNA utilizes within-network interactions and across-network anchor links to learn protein representations and predict functional correspondence between proteins across species. A major strength of GraNA is its flexibility to integrate multi-faceted non-functional relationship data, such as sequence similarity and ortholog relationships, as anchor links to guide the mapping of functionally related proteins across species. Evaluating GraNA on a benchmark dataset composed of several NA tasks between different pairs of species, we observed that GraNA accurately predicted the functional relatedness of proteins and robustly transferred functional annotations across species, outperforming a number of existing NA methods. When applied to a case study on a humanized yeast network, GraNA also successfully discovered functionally replaceable human-yeast protein pairs that were documented in previous studies. AVAILABILITY AND IMPLEMENTATION The code of GraNA is available at https://github.com/luo-group/GraNA.
Affiliation(s)
- Kerr Ding
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
- Sheng Wang
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195, United States
- Yunan Luo
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
5
Liu F, Jiang X, Yang J, Tao J, Zhang M. A chronotherapeutics-applicable multi-target therapeutics based on AI: Example of therapeutic hypothermia. Brief Bioinform 2022; 23:6694809. [PMID: 36088545 PMCID: PMC9487598 DOI: 10.1093/bib/bbac365] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/25/2022] [Revised: 07/15/2022] [Accepted: 08/03/2022] [Indexed: 11/24/2022]
Abstract
Nowadays, the complexity of disease mechanisms and the inadequacy of single-target therapies in restoring the biological system have inevitably prompted a strategy of multi-target therapeutics in which each target is analyzed individually. However, this strategy is not suited to dealing with conflicts between targets or between drugs. With the release of high-precision AI tools for protein structure prediction, large-scale high-precision protein structure prediction and docking have become possible. In this article, we propose a multi-target drug discovery method, using therapeutic hypothermia (TH) as an example. First, we performed protein structure prediction for all protein targets of each group with AlphaFold2 and RoseTTAFold. Then, we used QuickVina 2 for molecular docking between the proteins and drugs. After docking, we used PageRank to rank single drugs and drug combinations of each group. ePharmaLib was used to predict side-effect targets. Because different targets carry different weights, the method can effectively avoid inhibiting beneficial proteins while inhibiting harmful ones. It could therefore minimize conflicts between different doses and be friendly to chronotherapeutics. In addition, the method has potential in precision medicine owing to its high compatibility with bioinformatics, and it promotes the development of pharmacogenomics and bioinfo-pharmacology.
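The PageRank ranking step can be sketched as a weighted power iteration. The graph here is hypothetical: edges run from target nodes to drug nodes, with weights standing in for binding strength from docking, so a drug that binds many strongly weighted targets accumulates score. The abstract does not say how the authors built or weighted their graph, so this illustrates PageRank itself, not their pipeline.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Weighted PageRank by power iteration.

    edges: dict mapping node -> dict of {successor: positive weight}.
    Nodes with no outgoing edges ("dangling") spread their score uniformly.
    """
    nodes = set(edges)
    for succ in edges.values():
        nodes.update(succ)
    nodes = sorted(nodes)
    n = len(nodes)
    out_weight = {v: sum(edges.get(v, {}).values()) for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        # teleportation plus redistributed dangling mass, shared by every node
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, succ in edges.items():
            if out_weight[u] == 0:
                continue
            for v, w in succ.items():
                new[v] += damping * rank[u] * w / out_weight[u]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical target -> drug graph; weights are illustrative binding strengths.
graph = {
    "T1": {"drugA": 3.0, "drugB": 1.0},
    "T2": {"drugA": 2.0},
    "T3": {"drugB": 1.0, "drugC": 1.0},
}
scores = pagerank(graph)
drugs = sorted((v for v in scores if v.startswith("drug")), key=scores.get, reverse=True)
print(drugs)  # ['drugA', 'drugB', 'drugC']
```

Redistributing dangling mass keeps the scores summing to 1, so they can be compared across iterations and graphs of the same size.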
Affiliation(s)
- Fei Liu
- Department of Emergency Medicine, Second Affiliated Hospital of Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Institute of Emergency Medicine, Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Key Laboratory of The Diagnosis and Treatment of Severe Trauma and Burn of Zhejiang Province, Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Xiangkang Jiang
- Department of Emergency Medicine, Second Affiliated Hospital of Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Institute of Emergency Medicine, Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Key Laboratory of The Diagnosis and Treatment of Severe Trauma and Burn of Zhejiang Province, Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Jingyuan Yang
- Department of Emergency Medicine, Second Affiliated Hospital of Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Institute of Emergency Medicine, Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Key Laboratory of The Diagnosis and Treatment of Severe Trauma and Burn of Zhejiang Province, Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Jiawei Tao
- Department of Emergency Medicine, Second Affiliated Hospital of Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Institute of Emergency Medicine, Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Key Laboratory of The Diagnosis and Treatment of Severe Trauma and Burn of Zhejiang Province, Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Mao Zhang
- Department of Emergency Medicine, Second Affiliated Hospital of Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Institute of Emergency Medicine, Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Key Laboratory of The Diagnosis and Treatment of Severe Trauma and Burn of Zhejiang Province, Zhejiang University, Hangzhou 310009, Zhejiang Province, China
6
Wang S, Chen X, Frederisy BJ, Mbakogu BA, Kanne AD, Khosravi P, Hayes WB. On the current failure - but bright future - of topology-driven biological network alignment. Adv Protein Chem Struct Biol 2022; 131:1-44. [PMID: 35871888 DOI: 10.1016/bs.apcsb.2022.05.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/26/2023]
Abstract
Since the function of a protein is defined by its interaction partners, and since we expect similar interaction patterns across species, the alignment of protein-protein interaction (PPI) networks between species, based on network topology alone, should uncover functionally related proteins across species. Surprisingly, despite the publication of more than fifty algorithms aimed at performing PPI network alignment, few have demonstrated a statistically significant link between network topology and functional similarity, and none have demonstrated that orthologs can be recovered using network topology alone. We find that the major contributing factors to this surprising failure are: (i) edge densities in most currently available experimental PPI networks are demonstrably too low to expect topological network alignment to succeed; (ii) in the few cases where the edge densities are high enough, some measures of topological similarity easily uncover functionally similar proteins while others do not; and (iii) most network alignment algorithms to date perform poorly at optimizing even their own topological objective functions, hampering their ability to use topology effectively. We demonstrate that SANA, the Simulated Annealing Network Aligner, significantly outperforms existing aligners at optimizing their own objective functions, even achieving near-optimal solutions when the optimal solution is known. We offer the first demonstration of global network alignments based on topology alone that align functionally similar proteins with p-values in some cases below 10⁻³⁰⁰. We predict that topological network alignment has a bright future as edge densities increase toward the value where good alignments become possible.
We demonstrate that when enough common topology is present at high enough edge densities (for example, in the recent, partly synthetic networks of the Integrated Interaction Database), topological network alignment easily recovers most orthologs, paving the way toward high-throughput functional prediction based on topology-driven network alignment.
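Two quantities in this argument, a network's edge density and how well an alignment conserves edges, are simple to compute. This sketch uses edge correctness (EC), the fraction of G1's edges that an alignment maps onto edges of G2, as an assumed example of a topological objective; the chapter considers several measures of topological similarity, and no density threshold from it is reproduced here.

```python
def edge_density(num_nodes, edges):
    """Fraction of possible undirected edges present: |E| / (|V|(|V|-1)/2)."""
    distinct = {frozenset(e) for e in edges}
    possible = num_nodes * (num_nodes - 1) / 2
    return len(distinct) / possible


def edge_correctness(edges1, edges2, alignment):
    """Fraction of G1 edges whose endpoints map onto an edge of G2.

    alignment: dict sending every G1 node to a distinct G2 node (1-to-1).
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    conserved = sum(1 for e in e1 if frozenset(alignment[x] for x in e) in e2)
    return conserved / len(e1)


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(edge_density(4, cycle))                                   # 0.6666666666666666
print(edge_correctness(cycle, cycle, {0: 0, 1: 2, 2: 1, 3: 3}))  # 0.5
```

An aligner such as SANA searches over alignments to maximize an objective of this kind; the sketch only scores a given alignment.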
Affiliation(s)
- Siyue Wang
- Department of Computer Science, University of California, Irvine, CA, United States
- Xiaoyin Chen
- Department of Computer Science, University of California, Irvine, CA, United States
- Brent J Frederisy
- Department of Computer Science, University of California, Irvine, CA, United States
- Benedict A Mbakogu
- Department of Computer Science, University of California, Irvine, CA, United States
- Amy D Kanne
- Department of Computer Science, University of California, Irvine, CA, United States
- Pasha Khosravi
- Department of Computer Science, University of California, Irvine, CA, United States
- Wayne B Hayes
- Department of Computer Science, University of California, Irvine, CA, United States
7
Ma L, Shao Z, Li L, Huang J, Wang S, Lin Q, Li J, Gong M, Nandi AK. Heuristics and metaheuristics for biological network alignment: A review. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.08.156] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
8
Xue XY, Chen Z, Hu Y, Nie D, Zhao H, Mao XG. Protein-protein interaction network of E. coli K-12 has significant high-dimensional cavities: New insights from algebraic topological studies. FEBS Open Bio 2022; 12:1406-1418. [PMID: 35560988 PMCID: PMC9249336 DOI: 10.1002/2211-5463.13437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2021] [Revised: 04/21/2022] [Accepted: 05/11/2022] [Indexed: 11/08/2022]
Abstract
As a model system, Escherichia coli (E. coli) has been used to study various life processes. A dramatic paradigm shift has occurred in recent years, from the study of single proteins toward the study of dynamically interacting proteins, especially protein-protein interaction (PPI) networks. However, despite the importance of PPI networks, little is known about the intrinsic nature of their structure, especially their high-dimensional topological properties. Using the general hypergeometric distribution, we reconstruct a statistically reliable combined PPI network of E. coli (E.coli-PPI-Network) from several datasets. In contrast to traditional graph analysis, we introduced algebraic topology to analyze the topological structures of the E.coli-PPI-Network, including high-dimensional cavities and cycles. Random networks with the same numbers of nodes and edges (RandomNet), or scale-free networks with the same degree distribution (RandomNet-SameDD), were produced as controls. We discovered that the E.coli-PPI-Network had special algebraic topological structures, exhibiting more high-dimensional cavities and cycles than RandomNets or, importantly, RandomNet-SameDD. Based on these results, we defined the degree of involved q-dimensional cycles of proteins (q-DCprotein) in the network, a novel concept that relies on the integral structure of the network and differs from the traditional node degree or hubs. Finally, the top proteins ranked by their 1-DCprotein were identified. In conclusion, by introducing mathematical and computational techniques, we discovered novel algebraic topological properties of the E.coli-PPI-Network, which has special high-dimensional cavities and cycles, and thereby revealed certain intrinsic rules of information flow underlying bacterial biology.
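Cavities and cycles of this kind are counted by Betti numbers. A minimal sketch, assuming the network is turned into its clique complex and homology is taken with GF(2) coefficients, computes b0 (connected components) and b1 (independent 1-dimensional cycles not filled in by triangles) from boundary-matrix ranks: b0 = |V| - rank ∂1 and b1 = |E| - rank ∂1 - rank ∂2. It does not reproduce the authors' q-DCprotein measure or their hypergeometric network reconstruction; the function names are hypothetical.

```python
def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as integer bitmasks (XOR basis)."""
    basis = {}  # leading bit -> basis vector with that leading bit
    for vec in vectors:
        while vec:
            lead = vec.bit_length() - 1
            if lead not in basis:
                basis[lead] = vec
                break
            vec ^= basis[lead]
    return len(basis)


def betti_0_1(nodes, edges):
    """b0 and b1 of the clique complex of a simple undirected graph, GF(2) coefficients."""
    nodes = sorted(nodes)
    vidx = {v: i for i, v in enumerate(nodes)}
    edge_list = sorted({tuple(sorted(e)) for e in edges})
    eidx = {e: i for i, e in enumerate(edge_list)}
    adj = {v: set() for v in nodes}
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    # 2-simplices of the clique complex are the triangles of the graph
    triangles = {tuple(sorted((u, v, w)))
                 for u, v in edge_list for w in adj[u] & adj[v]}
    # boundary of edge uv is u + v; boundary of triangle abc is ab + ac + bc
    d1 = [(1 << vidx[u]) | (1 << vidx[v]) for u, v in edge_list]
    d2 = [(1 << eidx[(a, b)]) | (1 << eidx[(a, c)]) | (1 << eidx[(b, c)])
          for a, b, c in triangles]
    r1, r2 = gf2_rank(d1), gf2_rank(d2)
    return len(nodes) - r1, len(edge_list) - r1 - r2


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(betti_0_1(range(4), square))             # (1, 1): one unfilled 4-cycle
print(betti_0_1(range(4), square + [(0, 2)]))  # (1, 0): the diagonal fills it
```

Comparing such counts against degree-preserving random graphs is the kind of control the abstract describes with RandomNet-SameDD.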
Affiliation(s)
- Xiao-Yan Xue
- Department of Pharmacology, School of Pharmacy, Fourth Military Medical University, Xi'an, Shaanxi Province, People's Republic of China
- Zhou Chen
- Department of Pharmacology, School of Pharmacy, Fourth Military Medical University, Xi'an, Shaanxi Province, People's Republic of China
- Yue Hu
- Department of Pharmacology, School of Pharmacy, Fourth Military Medical University, Xi'an, Shaanxi Province, People's Republic of China
- Dan Nie
- Department of Pharmacology, School of Pharmacy, Fourth Military Medical University, Xi'an, Shaanxi Province, People's Republic of China
- Hui Zhao
- Department of Pharmacology, School of Pharmacy, Fourth Military Medical University, Xi'an, Shaanxi Province, People's Republic of China
- Xing-Gang Mao
- Department of Neurosurgery, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi Province, 710032, People's Republic of China
9
Zhao Y, Wang J, Guo M, Zhang X, Yu G. Cross-Species Protein Function Prediction with Asynchronous-Random Walk. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1439-1450. [PMID: 31562099 DOI: 10.1109/tcbb.2019.2943342] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]
Abstract
Protein function prediction is a fundamental task in the post-genomic era. Available functional annotations of proteins are incomplete, and the annotations of two homologous species are complementary to each other. However, how to effectively leverage the mutually complementary annotations of different species to further boost prediction performance is still not well studied. In this paper, we propose a cross-species protein function prediction approach that performs Asynchronous Random Walk on a heterogeneous network (AsyRW). AsyRW first constructs a heterogeneous network that integrates multiple functional association networks derived from different biological data, established homology relationships between proteins from different species, known annotations of proteins, and the Gene Ontology (GO). To account for the intrinsic intra- and inter-species structure of proteins and that of GO, AsyRW quantifies the individual walk length of each network node using a gravity-like theory, and then performs an asynchronous random walk with these individual lengths to predict associations between proteins and GO terms. Experiments on annotations archived in different years show that individual walk lengths and the asynchronous random walk can effectively leverage the complementary annotations of different species, and that AsyRW performs significantly better than other related and competitive methods. The code for AsyRW is available at: http://mlda.swu.edu.cn/codes.php?name=AsyRW.
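Propagation over such a heterogeneous network can be sketched with a standard random walk with restart (RWR) over a graph holding proteins from two species and GO terms. This is a swapped-in technique: RWR uses one global restart probability, whereas AsyRW gives each node its own walk length via a gravity-like rule, which is not reproduced here. The node names, weights and the `restart` value in the example are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary RWR scores.

    adj: dict node -> dict {neighbour: positive weight}; every neighbour must
    also be a key. seeds: set of start nodes (e.g. the query protein).
    """
    nodes = list(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        new = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            out = sum(adj[u].values())
            if out == 0:
                # a walker with nowhere to go jumps back to the seeds
                for v in nodes:
                    new[v] += (1 - restart) * p[u] * p0[v]
                continue
            for v, w in adj[u].items():
                new[v] += (1 - restart) * p[u] * w / out
        delta = sum(abs(new[v] - p[v]) for v in nodes)
        p = new
        if delta < tol:
            break
    return p


# Hypothetical network: P* are species-1 proteins, Q1 a species-2 homolog of P1.
net = {
    "P1": {"P2": 1.0, "Q1": 1.0},
    "P2": {"P1": 1.0, "P3": 1.0},
    "P3": {"P2": 1.0, "GO:B": 1.0},
    "Q1": {"P1": 1.0, "GO:A": 1.0},
    "GO:A": {"Q1": 1.0},
    "GO:B": {"P3": 1.0},
    "GO:C": {},
}
scores = random_walk_with_restart(net, {"P1"})
print({t: round(scores[t], 4) for t in ("GO:A", "GO:B", "GO:C")})
```

Here GO:A, reached through the homology edge to Q1, should outscore GO:B, which sits three hops away inside the same species, illustrating how annotations of one species can inform the other.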
10
Woo HM, Yoon BJ. MONACO: accurate biological network alignment through optimal neighborhood matching between focal nodes. Bioinformatics 2021; 37:1401-1410. [PMID: 33165517 DOI: 10.1093/bioinformatics/btaa962] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/02/2019] [Revised: 10/19/2020] [Accepted: 11/02/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION Alignment of protein-protein interaction networks can be used for the unsupervised prediction of functional modules, such as protein complexes and signaling pathways, that are conserved across different species. To date, various algorithms have been proposed for biological network alignment, many of which attempt to incorporate topological similarity between the networks into the alignment process with the goal of constructing accurate and biologically meaningful alignments. In particular, random walk models have been shown to be effective for quantifying the global topological relatedness between nodes that belong to different networks by diffusing node-level similarity along the interaction edges. However, these schemes are not ideal for capturing the local topological similarity between nodes. RESULTS In this article, we propose MONACO, a novel and versatile network alignment algorithm that finds highly accurate pairwise and multiple network alignments through the iterative optimal matching of 'local' neighborhoods around focal nodes. Extensive performance assessment based on real networks as well as synthetic networks, for which the ground truth is known, demonstrates that MONACO clearly and consistently outperforms all other state-of-the-art network alignment algorithms that we have tested, in terms of accuracy, coherence and topological quality of the aligned network regions. Furthermore, despite the sharply enhanced alignment accuracy, MONACO remains computationally efficient and scales well with increasing size and number of networks. AVAILABILITY AND IMPLEMENTATION Matlab implementation is freely available at https://github.com/bjyoontamu/MONACO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
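The global, diffusion-based scoring that this abstract contrasts MONACO with can be sketched in the style of IsoRank: a node pair (i, j) scores highly when the neighbours of i are paired with high-scoring neighbours of j, blended with a prior similarity (for example, sequence scores) by a weight alpha. This illustrates the random-walk family, not MONACO's local neighborhood matching; the function name and parameters are assumptions.

```python
def isorank_like(adj1, adj2, prior, alpha=0.8, tol=1e-12, max_iter=1000):
    """Iterate R = alpha * (neighbour diffusion of R) + (1 - alpha) * E.

    adj1, adj2: dict node -> set of neighbours (undirected, no isolated nodes
    assumed); prior: dict (i, j) -> non-negative node-level similarity.
    """
    pairs = [(i, j) for i in adj1 for j in adj2]
    z = sum(prior.get(p, 0.0) for p in pairs)
    e = {p: prior.get(p, 0.0) / z for p in pairs}
    r = dict(e)
    for _ in range(max_iter):
        new = {}
        for i, j in pairs:
            # spread each neighbour pair's score over that pair's degree product
            s = sum(r[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                    for u in adj1[i] for v in adj2[j])
            new[(i, j)] = alpha * s + (1 - alpha) * e[(i, j)]
        total = sum(new.values())
        new = {p: x / total for p, x in new.items()}
        delta = sum(abs(new[p] - r[p]) for p in pairs)
        r = new
        if delta < tol:
            break
    return r


path = {0: {1}, 1: {0, 2}, 2: {1}}
uniform = {(i, j): 1.0 for i in path for j in path}
scores = isorank_like(path, path, uniform)
print(max(scores, key=scores.get))  # (1, 1): the two path centres match best
```

With a uniform prior the topology alone decides the ranking; supplying sequence-based prior scores would bias the diffusion toward homologous pairs.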
Affiliation(s)
- Hyun-Myung Woo
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77845, USA
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
11
Gu S, Milenković T. Data-driven biological network alignment that uses topological, sequence, and functional information. BMC Bioinformatics 2021; 22:34. [PMID: 33514304 PMCID: PMC7847157 DOI: 10.1186/s12859-021-03971-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/09/2020] [Accepted: 01/15/2021] [Indexed: 11/15/2022]
Abstract
BACKGROUND Network alignment (NA) can transfer functional knowledge between species' conserved biological network regions. Traditional NA assumes that it is topological similarity (isomorphic-like matching) between network regions that corresponds to the regions' functional relatedness. However, we recently found that functionally unrelated proteins are as topologically similar as functionally related proteins. So we redefined NA as a data-driven method called TARA, which learns from network and protein functional data what kind of topological relatedness (rather than similarity) between proteins corresponds to their functional relatedness. TARA used topological information (within each network) but not sequence information (between proteins across networks). Yet TARA yielded higher protein functional prediction accuracy than existing NA methods, even those that used both topological and sequence information. RESULTS Here, we propose TARA++, which is data-driven like TARA and unlike other existing methods, but which, unlike TARA, uses across-network sequence information on top of within-network topological information. To handle this within-and-across-network analysis, we adapt social network embedding to the problem of biological NA. TARA++ outperforms existing methods in protein functional prediction accuracy. CONCLUSIONS As such, combining research knowledge from different domains is promising. Overall, improvements in protein functional prediction have biomedical implications, for example allowing researchers to better understand how cancer progresses or how humans age.
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, Eck Institute for Global Health, Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, 46556, USA
- Tijana Milenković
- Department of Computer Science and Engineering, Eck Institute for Global Health, Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, 46556, USA
12
Ma CY, Liao CS. A review of protein-protein interaction network alignment: From pathway comparison to global alignment. Comput Struct Biotechnol J 2020; 18:2647-2656. [PMID: 33033584 PMCID: PMC7533294 DOI: 10.1016/j.csbj.2020.09.011] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/21/2020] [Revised: 09/01/2020] [Accepted: 09/05/2020] [Indexed: 12/13/2022]
Abstract
Network alignment provides a comprehensive way to discover similar parts of the molecular systems of different species based on topological and biological similarity. On this basis, one can carry out comparative studies at the systems level in computational biology. In this survey paper, we focus on protein-protein interaction networks and review some representative network alignment algorithms from the past two decades as well as state-of-the-art aligners. We also introduce the most popular evaluation measures in the literature for benchmarking these approaches. Finally, we discuss several future challenges and possible ways to overcome the existing problems of biological network alignment.
Affiliation(s)
- Cheng-Yu Ma
- Chang Gung Memorial Hospital, No. 5, Fu-Hsing St., Kuei Shan Dist., Taoyuan City 33305, Taiwan, ROC
- Chung-Shou Liao
- National Tsing Hua University, No. 101, Section 2, Kuang-Fu Rd., Hsinchu City 30013, Taiwan, ROC
13
Abstract
In this study, we deal with the problem of biological network alignment (NA), which aims to find a node mapping between species' molecular networks that uncovers similar network regions, thus allowing for the transfer of functional knowledge between the aligned nodes. We provide evidence that current NA methods, which assume that topologically similar nodes (i.e., nodes whose network neighborhoods are isomorphic-like) have high functional relatedness, do not actually end up aligning functionally related nodes. That is, we show that the current topological similarity assumption does not hold well. Consequently, we argue that a paradigm shift is needed with how the NA problem is approached. So, we redefine NA as a data-driven framework, called TARA (data-driven NA), which attempts to learn the relationship between topological relatedness and functional relatedness without assuming that topological relatedness corresponds to topological similarity. TARA makes no assumptions about what nodes should be aligned, distinguishing it from existing NA methods. Specifically, TARA trains a classifier to predict whether two nodes from different networks are functionally related based on their network topological patterns (features). We find that TARA is able to make accurate predictions. TARA then takes each pair of nodes that are predicted as related to be part of an alignment. Like traditional NA methods, TARA uses this alignment for the across-species transfer of functional knowledge. TARA as currently implemented uses topological but not protein sequence information for functional knowledge transfer. In this context, we find that TARA outperforms existing state-of-the-art NA methods that also use topological information, WAVE and SANA, and even outperforms or complements a state-of-the-art NA method that uses both topological and sequence information, PrimAlign. 
Hence, adding sequence information to TARA, which is our future work, is likely to further improve its performance. The software and data are available at http://www.nd.edu/~cone/TARA/.
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States of America
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, United States of America
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN, United States of America
14
Vijayan V, Gu S, Krebs ET, Meng L, Milenković T. Pairwise Versus Multiple Global Network Alignment. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 8:41961-41974. [PMID: 33747670 PMCID: PMC7971151 DOI: 10.1109/access.2020.2976487] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/12/2023]
Abstract
Biological network alignment (NA) aims to identify similar regions between molecular networks of different species. NA can be local or global. Following the recent trend in the NA field, we focus on global NA, which can be pairwise (PNA) or multiple (MNA). PNA produces aligned node pairs between two networks. MNA produces aligned node clusters between more than two networks. Recently, the focus has shifted from PNA to MNA, because MNA captures conserved regions between more networks than PNA (and MNA is thus hypothesized to yield higher-quality alignments), though at higher computational complexity. The issue is that, due to the different outputs of PNA and MNA, a PNA method is only compared to other PNA methods, and an MNA method is only compared to other MNA methods. PNA must be compared against MNA to evaluate whether MNA indeed yields higher-quality alignments, as only this would justify MNA's higher computational complexity. We introduce a framework that allows for this. We evaluate eight prominent PNA and MNA methods on synthetic and real-world biological networks, using topological and functional alignment quality measures. We compare PNA against MNA in both a pairwise (native to PNA) and a multiple (native to MNA) manner. PNA is expected to perform better under the pairwise evaluation framework, and indeed this is what we find. MNA is expected to perform better under the multiple evaluation framework. Surprisingly, we find that this does not always hold: depending on the choice of evaluation test, PNA is often better than MNA in this framework.
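To make the mismatch between PNA and MNA outputs concrete, here is a minimal Python sketch of two conversions one might use to put both kinds of output on a common footing: inducing node pairs between two chosen networks from MNA clusters, and merging several PNA outputs into clusters with union-find. Both conversions, the `(network, node)` tuple encoding, and the function names are illustrative assumptions; the abstract does not specify how the authors' framework performs these mappings.

```python
def mna_to_pairwise(clusters, net_a, net_b):
    """Induce a pairwise alignment between net_a and net_b from MNA clusters:
    every (net_a node, net_b node) combination inside one cluster is an aligned pair."""
    pairs = set()
    for cluster in clusters:
        a_nodes = [n for net, n in cluster if net == net_a]
        b_nodes = [n for net, n in cluster if net == net_b]
        pairs.update((u, v) for u in a_nodes for v in b_nodes)
    return pairs

def pna_to_clusters(pairwise_alignments):
    """Merge PNA outputs, keyed by (net_a, net_b), into node clusters via union-find.
    Nodes are encoded as (network, node) tuples so names cannot collide across networks."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    for (net_a, net_b), pairs in pairwise_alignments.items():
        for u, v in pairs:
            union((net_a, u), (net_b, v))

    # Group every seen node under its root representative.
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return [frozenset(g) for g in groups.values()]
```

Union-find merges transitively, so a chain of pairwise matches (a with b, b with c) yields one cluster even when a and c were never aligned directly; this is one reason clusters built from PNA output can differ from those an MNA method would produce.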
Affiliation(s)
- Vipin Vijayan
- Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Shawn Gu
- Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Eric T Krebs
- Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Lei Meng
- Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Tijana Milenković
- Center for Network and Data Science, Department of Computer Science and Engineering, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
15
Flores J, García E, Pedroche F, Romance M. Parametric controllability of the personalized PageRank: Classic model vs biplex approach. CHAOS (WOODBURY, N.Y.) 2020; 30:023115. [PMID: 32113253 DOI: 10.1063/1.5128567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2019] [Accepted: 01/20/2020] [Indexed: 06/10/2023]
Abstract
Measures of centrality in networks defined by means of matrix algebra, like PageRank-type centralities, have been used for over 70 years. Recently, new extensions of PageRank have been formulated, and these may include a personalization (or teleportation) vector. It is accepted that one of the key issues for any centrality measure formulation is the extent to which its variability can be controlled. In this paper, we compare the limits of variability of two centrality measures for complex networks that we call classic PageRank (PR) and biplex approach PageRank (BPR). Both centrality measures depend on the so-called damping parameter α, which controls the amount of teleportation. Our first result is that the intersection of the intervals of variation of both centrality measures is always a nonempty set. Our second result is that when α is lower than 0.48 (and, therefore, the ranking is highly affected by teleportation effects), the upper limits of PR are more controllable than the upper limits of BPR; on the contrary, when α is greater than 0.5 (recall that the usual PageRank algorithm uses the value 0.85), the upper limits of PR are less controllable than the upper limits of BPR, under certain mild assumptions on the local structure of the graph. Regarding the lower limits of variability, we give a result for small values of α. We illustrate the results with some analytical networks and also with a real Facebook network.
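For reference, the classic personalized PageRank that the abstract calls PR can be computed by power iteration on x = α Pᵀx + (1 - α)v, where P is the row-normalized adjacency matrix, v is the personalization (teleportation) vector, and α is the damping parameter. The Python sketch below assumes that formulation and redirects the mass of dangling nodes (nodes without out-links) along v, which is one common convention among several; the biplex variant (BPR) studied in the paper is not reproduced.

```python
def personalized_pagerank(adj, v, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for x = alpha * P^T x + (1 - alpha) * v.

    adj maps each node to its list of out-neighbours; v maps each node to its
    teleportation weight (weights summing to 1). Mass sitting on dangling nodes
    is redistributed according to v, an assumed convention.
    """
    nodes = list(adj)
    x = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(max_iter):
        # Teleportation term.
        new = {u: (1.0 - alpha) * v[u] for u in nodes}
        dangling = 0.0
        for u in nodes:
            out = adj[u]
            if out:
                # Follow a uniformly chosen out-link with probability alpha.
                share = alpha * x[u] / len(out)
                for w in out:
                    new[w] += share
            else:
                dangling += alpha * x[u]
        for u in nodes:
            new[u] += dangling * v[u]
        # Stop once the L1 change between iterates is negligible.
        if sum(abs(new[u] - x[u]) for u in nodes) < tol:
            return new
        x = new
    return x
```

Lowering α shifts weight toward v: in a directed 3-cycle with all teleportation on node 0, α = 0.3 gives node 0 a score of 0.7/0.973 ≈ 0.719, whereas α close to 1 pulls the scores back toward the uniform value 1/3 of the plain cycle.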
Affiliation(s)
- Julio Flores
- Department of Applied Mathematics, Rey Juan Carlos University, 28933 Madrid, Spain
- Esther García
- Department of Applied Mathematics, Rey Juan Carlos University, 28933 Madrid, Spain
- Francisco Pedroche
- Institut de Matemàtica Multidisciplinària, Universitat Politècnica de València, València, Spain
- Miguel Romance
- Department of Applied Mathematics, Rey Juan Carlos University, 28933 Madrid, Spain
16
Nguyen ND, Blaby IK, Wang D. ManiNetCluster: a novel manifold learning approach to reveal the functional links between gene networks. BMC Genomics 2019; 20:1003. [PMID: 31888454 PMCID: PMC6936142 DOI: 10.1186/s12864-019-6329-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 01/18/2023]
Abstract
BACKGROUND The coordination of genomic functions is a critical and complex process across biological systems such as phenotypes or states (e.g., time, disease, organism, environmental perturbation). Understanding how the complexity of genomic function relates to these states remains a challenge. To address this, we have developed a novel computational method, ManiNetCluster, which simultaneously aligns and clusters gene networks (e.g., co-expression) to systematically reveal the links of genomic function between different conditions. Specifically, ManiNetCluster employs manifold learning to uncover and match local and non-linear structures among networks, and identifies cross-network functional links. RESULTS We demonstrated that ManiNetCluster aligns orthologous genes from their developmental expression profiles across model organisms better than state-of-the-art methods (p-value < 2.2×10⁻¹⁶). This indicates potential non-linear interactions of evolutionarily conserved genes across species in development. Furthermore, we applied ManiNetCluster to time-series transcriptome data measured in the green alga Chlamydomonas reinhardtii to discover the genomic functions linking various metabolic processes between the light and dark periods of a diurnally cycling culture. We identified a number of genes putatively regulating processes across each lighting regime. CONCLUSIONS ManiNetCluster provides a novel computational tool to uncover the genes linking various functions from different networks, providing new insight into how gene functions coordinate across different conditions. ManiNetCluster is publicly available as an R package at https://github.com/daifengwanglab/ManiNetCluster.
Affiliation(s)
- Nam D Nguyen
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA
- Ian K Blaby
- Biology Department, Brookhaven National Laboratory, Upton, NY 11973, USA; US Department of Energy, Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, 4720, CA, USA.
- Daifeng Wang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, 53726, WI, USA; Waisman Center, University of Wisconsin-Madison, Madison, 53705, WI, USA.
17
Maskey S, Cho YR. LePrimAlign: local entropy-based alignment of PPI networks to predict conserved modules. BMC Genomics 2019; 20:964. [PMID: 31874635 PMCID: PMC6929407 DOI: 10.1186/s12864-019-6271-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Background Cross-species analysis of protein-protein interaction (PPI) networks provides an effective means of detecting conserved interaction patterns. Identifying such conserved substructures between PPI networks of different species increases our understanding of the principles driving the evolution of cellular organizations and their functions at the system level. In recent years, network alignment techniques have been applied to genome-scale PPI networks to predict evolutionarily conserved modules. Although a wide variety of network alignment algorithms have been introduced, developing a scalable local network alignment algorithm with high accuracy is still challenging. Results We present a novel pairwise local network alignment algorithm, called LePrimAlign, to predict conserved modules between PPI networks of three different species. The proposed algorithm exploits the results of a pairwise global alignment algorithm with many-to-many node mapping. It also applies the concept of graph entropy to detect initial cluster pairs from two networks. Finally, the initial clusters are expanded to increase the local alignment score, which is formulated as a combination of intra-network and inter-network scores. A performance comparison with state-of-the-art approaches demonstrates that the proposed algorithm outperforms them in terms of the accuracy of identified protein complexes and the quality of alignments. Conclusion The proposed method produces local network alignments of higher accuracy in predicting conserved modules, even for large biological networks, at a reduced computational cost.
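LePrimAlign uses graph entropy to seed its cluster pairs. One formulation of graph entropy, from entropy-based graph clustering, takes for each vertex v the fraction p(v) of its neighbours that lie inside a candidate cluster C, defines the vertex entropy e(v) = -p log2 p - (1 - p) log2(1 - p), and sums e(v) over all vertices; lower entropy means C is more cleanly separated from the rest of the network. The Python sketch below assumes that formulation and a simplified add-only greedy growth from a seed vertex. It is not LePrimAlign's procedure, which additionally draws on the global alignment results and a combined intra-/inter-network score that are not modelled here; the function names are hypothetical.

```python
import math

def vertex_entropy(adj, v, cluster):
    """Binary entropy of the fraction of v's neighbours that lie inside the cluster."""
    nbrs = adj[v]
    if not nbrs:
        return 0.0
    p = len(nbrs & cluster) / len(nbrs)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def graph_entropy(adj, cluster):
    """Sum of vertex entropies over the whole graph for one candidate cluster."""
    return sum(vertex_entropy(adj, v, cluster) for v in adj)

def grow_cluster(adj, seed):
    """Greedy sketch: start from the seed and its neighbours, then keep adding
    boundary vertices as long as each addition lowers the graph entropy."""
    cluster = {seed} | adj[seed]
    best = graph_entropy(adj, cluster)
    improved = True
    while improved:
        improved = False
        boundary = set().union(*(adj[u] for u in cluster)) - cluster
        for v in sorted(boundary):
            h = graph_entropy(adj, cluster | {v})
            if h < best:
                cluster, best, improved = cluster | {v}, h, True
    return cluster, best
```

On two triangles joined by a single edge, growing from a vertex of one triangle stops at that triangle, because adding the bridge vertex from the other triangle raises the entropy.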
Affiliation(s)
- Sawal Maskey
- Department of Computer Science, Baylor University, One Bear Place #97141, Waco, 76798, TX, USA
- Young-Rae Cho
- Department of Computer Science, Baylor University, One Bear Place #97141, Waco, 76798, TX, USA; Bioinformatics Program, Baylor University, One Bear Place #97141, Waco, 76798, TX, USA.