1
Abbas MN, Broneske D, Saake G. A multi-objective evolutionary algorithm for detecting protein complexes in PPI networks using gene ontology. Sci Rep 2025; 15:16855. [PMID: 40374682 DOI: 10.1038/s41598-025-01667-y] [Received: 03/13/2025] [Accepted: 05/07/2025] [Indexed: 05/17/2025]
Abstract
Detecting protein complexes is crucial in computational biology for understanding cellular mechanisms and facilitating drug discovery. Evolutionary algorithms (EAs) have proven effective in uncovering protein complexes within protein-protein interaction (PPI) networks. However, their integration with functional insights from gene ontology (GO) annotations remains underexplored. This paper makes two primary contributions. First, it proposes a novel multi-objective optimization model for detecting protein complexes, which treats the task as a problem with inherently conflicting objectives based on biological data. Second, it introduces a gene ontology-based mutation operator, termed the Functional Similarity-Based Protein Translocation Operator ([Formula: see text]). This operator strengthens the collaboration between the canonical model and the GO-informed mutation strategy, thereby improving the algorithm's performance. To the best of our knowledge, this is the first effort to incorporate the biological characteristics of PPIs into both the problem formulation and the design of intricate perturbation strategies. We assess the effectiveness of the proposed multi-objective evolutionary algorithm through experiments on two widely recognized PPI networks and two standard complex datasets provided by the Munich Information Center for Protein Sequences (MIPS). To further assess the robustness of our algorithm, we create artificial networks by introducing different noise levels into the original Saccharomyces cerevisiae (yeast) PPI networks. This allows us to evaluate how perturbations in protein interactions affect the algorithm's performance compared with other approaches. The experimental results show that our algorithm outperforms several state-of-the-art methods in accurately identifying protein complexes. Moreover, the findings emphasize the substantial advantages of incorporating our heuristic perturbation operator, which significantly improves the quality of the detected complexes over other evolutionary algorithm-based methods.
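To make the "inherently conflicting objectives" framing concrete, here is a minimal sketch of Pareto dominance, the usual selection criterion in multi-objective evolutionary algorithms. It is not the authors' model or their GO-based operator: it assumes each candidate solution has already been scored on two objectives to be maximized (a hypothetical topological score and a hypothetical GO functional-similarity score), and it simply keeps the non-dominated candidates.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective (maximization)
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Tuple[float, ...]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (topological score, GO functional-similarity score) per candidate clustering.
candidates = [(0.8, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9)]
print(pareto_front(candidates))  # [0, 1, 3]
```

Candidate 2 is dropped because candidate 1 is better on both objectives; the other three each trade one objective against the other, which is the set a multi-objective EA would retain.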
Affiliation(s)
- Mustafa N Abbas
- Databases and Software Engineering, Otto-von-Guericke-University, Magdeburg, Germany.
- David Broneske
- German Centre for Higher Education Research and Science Studies, Hannover, Germany
- Gunter Saake
- Databases and Software Engineering, Otto-von-Guericke-University, Magdeburg, Germany
2
Wang R, Ma H, Wang C. An Ensemble Learning Framework for Detecting Protein Complexes From PPI Networks. Front Genet 2022; 13:839949. [PMID: 35281831 PMCID: PMC8908451 DOI: 10.3389/fgene.2022.839949] [Received: 12/20/2021] [Accepted: 01/31/2022] [Indexed: 11/14/2022]
Abstract
Detecting protein complexes is one of the keys to understanding the principles of cellular organization and processes. With advances in high-throughput experiments and computing science, it has become possible to detect protein complexes by computational methods. However, most computational methods are based on either unsupervised learning or supervised learning. Unsupervised learning-based methods do not need training datasets, but they can detect only protein complexes with one or a few topological structures. Supervised learning-based methods can detect protein complexes with different topological structures. However, they are usually based on a single type of training model, and a single model generalizes poorly. Therefore, we propose an Ensemble Learning Framework for Detecting Protein Complexes (ELF-DPC) within protein-protein interaction (PPI) networks to address these challenges. ELF-DPC first constructs a weighted PPI network by combining topological and biological information. Second, it mines protein complex cores using a protein complex core mining strategy that we designed. Third, it obtains an ensemble learning model by integrating structural modularity and a trained voting regressor model. Finally, it extends the protein complex cores and forms protein complexes using a graph heuristic search strategy. The experimental results demonstrate that ELF-DPC outperforms twelve state-of-the-art approaches. Moreover, functional enrichment analysis illustrates that ELF-DPC can detect biologically meaningful protein complexes. The code and dataset are freely available for download from https://github.com/RongquanWang/ELF-DPC.
Affiliation(s)
- Rongquan Wang
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Huimin Ma
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- *Correspondence: Huimin Ma
- Caixia Wang
- School of International Economics, China Foreign Affairs University, Beijing, China
3
A New Framework for Discovering Protein Complex and Disease Association via Mining Multiple Databases. Interdiscip Sci 2021; 13:683-692. [PMID: 33905111 DOI: 10.1007/s12539-021-00432-9] [Received: 01/14/2021] [Revised: 03/31/2021] [Accepted: 04/09/2021] [Indexed: 10/21/2022]
Abstract
One important challenge in the post-genomic era is to explore disease mechanisms by efficiently integrating different types of biological data. In fact, a disease is usually caused by multiple gene products, such as protein complexes, rather than by a single gene. It is therefore meaningful to discover protein communities in the protein-protein interaction network and use them to infer disease-disease associations. In this article, we propose a new framework that combines protein-protein networks, disease-gene associations and disease-complex pairs to cluster protein complexes and infer disease associations. Complexes discovered by our approach are superior in quality (Sn, PPV and ACC) and in clustering quantity to those of four other popular methods on three PPI networks. A systematic analysis shows that disease pairs sharing more protein complexes (such as Glucose and Lipid Metabolic Disorders) are more similar, and that overlapping proteins may play different roles in different diseases. These findings can provide clinical scholars and medical practitioners with new ideas on disease identification and treatment.
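Sn, PPV and ACC are usually computed from a contingency table between reference complexes and predicted clusters. The sketch below uses the commonly cited definitions (Sn from row maxima, PPV from column maxima normalized by column totals, ACC as their geometric mean); the abstract does not state which exact variant the authors used, so treat this as an assumption. The function name and toy data are hypothetical.

```python
import math
from typing import List, Set, Tuple


def sn_ppv_acc(reference: List[Set[str]], predicted: List[Set[str]]) -> Tuple[float, float, float]:
    """Return (Sn, PPV, ACC) for predicted clusters against reference complexes."""
    if not reference or not predicted:
        return 0.0, 0.0, 0.0
    # T[i][j]: number of proteins shared by reference complex i and predicted cluster j.
    T = [[len(r & p) for p in predicted] for r in reference]
    # Sn: best-match overlaps of all reference complexes divided by their total size.
    ref_total = sum(len(r) for r in reference)
    sn = sum(max(row) for row in T) / ref_total if ref_total else 0.0
    # PPV: best reference match of each cluster, normalized by the column totals of T.
    columns = list(zip(*T))
    col_total = sum(sum(col) for col in columns)
    ppv = sum(max(col) for col in columns) / col_total if col_total else 0.0
    # ACC: geometric mean of Sn and PPV.
    return sn, ppv, math.sqrt(sn * ppv)


reference = [{"A", "B", "C"}, {"D", "E"}]
predicted = [{"A", "B"}, {"C", "D", "E"}]
sn, ppv, acc = sn_ppv_acc(reference, predicted)
```

For this toy input T = [[2, 1], [0, 2]], so Sn = 4/5, PPV = 4/5 and ACC = 0.8, up to floating-point rounding.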
4
HFADE-FMD: a hybrid approach of fireworks algorithm and differential evolution strategies for functional module detection in protein-protein interaction networks. Appl Intell 2021. [DOI: 10.1007/s10489-020-01791-4] [Indexed: 02/02/2023]
5
Wu Z, Liao Q, Fan S, Liu B. idenPC-CAP: Identify protein complexes from weighted RNA-protein heterogeneous interaction networks using co-assemble partner relation. Brief Bioinform 2020; 22:6041167. [PMID: 33333549 DOI: 10.1093/bib/bbaa372] [Received: 09/30/2020] [Revised: 11/07/2020] [Accepted: 11/20/2020] [Indexed: 12/18/2022]
Abstract
Protein complexes play important roles in most cellular processes. The available genome-wide protein-protein interaction (PPI) data make it possible for computational methods to identify protein complexes from PPI networks. However, PPI datasets usually contain a high proportion of false positive noise. Moreover, different types of biomolecules in a living cell cooperate to form a unified interaction network. Because previous computational methods focus only on PPIs and ignore other types of biomolecule interactions, their predicted protein complexes often contain many false positive proteins. In this study, we develop a novel computational method, idenPC-CAP, to identify protein complexes from the RNA-protein heterogeneous interaction network consisting of RNA-RNA interactions, RNA-protein interactions and PPIs. By considering interactions among proteins and RNAs, the new method reduces the proportion of false positive proteins in predicted protein complexes. The experimental results demonstrate that idenPC-CAP outperforms other state-of-the-art methods in this field.
Affiliation(s)
- Zhourun Wu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qing Liao
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Shixi Fan
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
6
Hu L, Yuan X, Liu X, Xiong S, Luo X. Efficiently Detecting Protein Complexes from Protein Interaction Networks via Alternating Direction Method of Multipliers. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1922-1935. [PMID: 29994334 DOI: 10.1109/tcbb.2018.2844256] [Indexed: 06/08/2023]
Abstract
Protein complexes are crucial for improving our understanding of the mechanisms employed by proteins. Various computational algorithms have thus been proposed to detect protein complexes from protein interaction networks. However, given the massive protein interactome data obtained by high-throughput technologies, existing algorithms, especially those that additionally consider biological information about proteins, either perform their tasks with low efficiency or suffer from limited effectiveness. To address this issue, this work proposes to detect protein complexes from a protein interaction network with both high efficiency and high effectiveness. To do so, the original detection task is first formulated as an optimization problem according to the intuitive properties of protein complexes. The alternating direction method of multipliers (ADMM) framework is then applied to decompose this optimization problem into several subtasks, which can be solved separately and in parallel. An algorithm implementing this solution is then developed. Experimental results on five large protein interaction networks demonstrate that our algorithm outperforms state-of-the-art protein complex detection algorithms in terms of both effectiveness and efficiency. Moreover, as the number of parallel processes increases, one can expect even higher computational efficiency from the proposed algorithm with no compromise in effectiveness.
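The abstract does not give the paper's objective function, so the following is only a generic illustration of how ADMM splits a problem into subtasks that can be solved independently. It uses a toy consensus problem (finding one shared value that minimizes the sum of squared distances to several data points), with each term handled by its own local update; `consensus_admm`, `rho` and the iteration count are assumptions for illustration.

```python
from typing import List


def consensus_admm(a: List[float], rho: float = 1.0, iters: int = 200) -> float:
    """Minimize sum_i 0.5 * (z - a_i)^2 by giving each term its own local copy x_i
    and enforcing the consensus constraint x_i = z with scaled-form ADMM."""
    if not a:
        raise ValueError("need at least one data point")
    n = len(a)
    u = [0.0] * n  # scaled dual variables, one per subproblem
    z = 0.0        # shared consensus variable
    for _ in range(iters):
        # Local step (independent per i, so parallelizable): closed-form minimizer of
        # 0.5 * (x_i - a_i)^2 + (rho / 2) * (x_i - z + u_i)^2.
        x = [(a[i] + rho * (z - u[i])) / (1.0 + rho) for i in range(n)]
        # Global step: average the local estimates shifted by their duals.
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Dual step: accumulate each subproblem's disagreement with the consensus.
        u = [u[i] + x[i] - z for i in range(n)]
    return z


shared = consensus_admm([1.0, 2.0, 3.0, 6.0])
```

The exact minimizer here is the mean of the data (3.0), and `shared` converges to it. The local step is the part that could be distributed across parallel processes; the averaging and dual updates act as the synchronization step.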
7
CDAP: An Online Package for Evaluation of Complex Detection Methods. Sci Rep 2019; 9:12751. [PMID: 31485005 PMCID: PMC6726630 DOI: 10.1038/s41598-019-49225-7] [Received: 01/16/2019] [Accepted: 08/21/2019] [Indexed: 01/21/2023]
Abstract
Methods for detecting protein complexes from protein-protein interaction networks are among the most critical computational approaches. Numerous methods have been proposed in this area, so it is necessary to evaluate them. Various metrics have been proposed to compare these methods. Nevertheless, new metrics are needed that evaluate methods both qualitatively and quantitatively. In addition, there is no tool for the comprehensive comparison of such methods. In this paper, a new criterion is introduced that can fully evaluate protein complex detection algorithms. We introduce CDAP (Complex Detection Analyzer Package), an online package for comparing protein complex detection methods. CDAP can quickly rank the performance of methods based on previously defined as well as newly introduced criteria in various settings (4 PPI datasets and 3 gold standards). It can integrate various methods and apply several filters to the results. CDAP can be easily extended to include new datasets, gold standards, and methods. Furthermore, users can compare the results of a custom method with the results of existing methods. Thus, the authors of future papers can use CDAP to compare their methods with previous ones. A case study was conducted on YGR198W, a well-known protein, and the detected clusters were compared with the known complexes of this protein.
8
Li S, Cao Y, Li L, Zhang H, Lu X, Bo C, Kong X, Liu Z, Chen L, Liu P, Jiao Y, Wang J, Ning S, Wang L. Building the drug-GO function network to screen significant candidate drugs for myasthenia gravis. PLoS One 2019; 14:e0214857. [PMID: 30947317 PMCID: PMC6448860 DOI: 10.1371/journal.pone.0214857] [Received: 09/27/2018] [Accepted: 03/22/2019] [Indexed: 12/17/2022]
Abstract
Myasthenia gravis (MG) is an autoimmune disease. In recent years, considerable evidence has indicated that Gene Ontology (GO) functions, especially GO biological processes, have important effects on the mechanisms and treatments of different diseases. However, the roles of GO functions in the pathogenesis and treatment of MG have not been well studied. This study aimed to uncover the potentially important roles of risk-related GO functions and to screen significant candidate drugs related to GO functions for MG. Based on MG risk genes, 238 risk GO functions and 42 drugs were identified. By constructing a GO function network, we discovered that positive regulation of NF-kappaB transcription factor activity (GO:0051092) may be one of the most important GO functions in the mechanism of MG. Furthermore, we built a drug-GO function network to help evaluate the latent relationships between drugs and GO functions. Based on the drug-GO function network, 5 candidate drugs showing promise for treating MG were identified. Indeed, 2 of the 5 candidate drugs have been investigated for treating MG. Through functional enrichment analysis, we found that the mechanisms linking the 5 candidate drugs and their associated GO functions may involve two vital pathways, hsa05332 (graft-versus-host disease) and hsa04940 (type I diabetes mellitus). Interestingly, most of the processes in these two pathways were consistent. Our study not only reveals a new perspective on the mechanisms and novel treatment strategies of MG but also provides strong support for research on GO functions.
Affiliation(s)
- Shuang Li
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China
- Yuze Cao
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China
- Department of Neurology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, China
- Lei Li
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China
- Huixue Zhang
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xiaoyu Lu
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China
- Chunrui Bo
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xiaotong Kong
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China
- Zhaojun Liu
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China
- Lixia Chen
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China
- Peifang Liu
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China
- Yang Jiao
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China
- Jianjian Wang
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China
- * E-mail: (LW); (SN); (JW)
- Shangwei Ning
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- * E-mail: (LW); (SN); (JW)
- Lihua Wang
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang Province, China
- * E-mail: (LW); (SN); (JW)
9
Performance evaluation measures for protein complex prediction. Genomics 2018; 111:1483-1492. [PMID: 30312661 DOI: 10.1016/j.ygeno.2018.10.003] [Received: 08/10/2018] [Revised: 09/25/2018] [Accepted: 10/04/2018] [Indexed: 02/01/2023]
Abstract
Protein complexes play a dominant role in cellular organization and function. Prediction of protein complexes from the network of physical interactions between proteins (PPI networks) has thus become one of the important research areas. Recently, many computational approaches have been developed to identify these complexes. Various performance assessment measures have been proposed for evaluating the efficiency of these methods. However, there are many inconsistencies in the definitions and usage of the measures across the literature. To address this issue, we have gathered and presented the most important performance evaluation measures and developed a tool, named CompEvaluator, to critically assess the protein complex prediction methods. The tool and documentation are publicly available at https://sourceforge.net/projects/compevaluator/files/.
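Two measures that frequently appear in this literature are the neighborhood-affinity overlap score between a predicted cluster and a reference complex, and the precision, recall and F-measure computed from the matches that score defines. The sketch below assumes the commonly used matching threshold of 0.2; it is illustrative only and is not the CompEvaluator implementation.

```python
from typing import List, Set, Tuple


def overlap_score(a: Set[str], b: Set[str]) -> float:
    """Neighborhood affinity |a & b|^2 / (|a| * |b|), in [0, 1]."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def precision_recall_f(
    predicted: List[Set[str]], reference: List[Set[str]], threshold: float = 0.2
) -> Tuple[float, float, float]:
    # A predicted cluster counts as correct if it matches at least one reference complex.
    hits_pred = sum(any(overlap_score(p, r) >= threshold for r in reference) for p in predicted)
    # A reference complex counts as recovered if at least one predicted cluster matches it.
    hits_ref = sum(any(overlap_score(p, r) >= threshold for p in predicted) for r in reference)
    precision = hits_pred / len(predicted) if predicted else 0.0
    recall = hits_ref / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


predicted = [{"A", "B", "C"}, {"X", "Y"}]
reference = [{"A", "B", "C", "D"}, {"E", "F"}]
print(overlap_score(predicted[0], reference[0]))  # 0.75
print(precision_recall_f(predicted, reference))   # (0.5, 0.5, 0.5)
```

Different tools define these measures slightly differently (for example, by matching one-to-one instead of allowing any match), which is one source of the inconsistencies the abstract describes.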
10
Detection of Protein Complexes Based on Penalized Matrix Decomposition in a Sparse Protein-Protein Interaction Network. Molecules 2018; 23:molecules23061460. [PMID: 29914123 PMCID: PMC6100434 DOI: 10.3390/molecules23061460] [Received: 05/21/2018] [Revised: 06/11/2018] [Accepted: 06/12/2018] [Indexed: 01/20/2023]
Abstract
High-throughput technology has generated large-scale protein interaction data, which are crucial to our understanding of biological organisms. Many complex identification algorithms have been developed to determine protein complexes. However, these methods are suitable only for dense protein interaction networks, because their capabilities decrease rapidly when applied to sparse protein–protein interaction (PPI) networks. In this study, based on penalized matrix decomposition (PMD), a novel method of penalized matrix decomposition for the identification of protein complexes (i.e., PMDpc) was developed to detect protein complexes in the human protein interaction network. This method mainly consists of three steps. First, the adjacency matrix of the protein interaction network is normalized. Second, the normalized matrix is decomposed into three factor matrices; by imposing appropriate constraints on the factor matrices, PMDpc can detect protein complexes in sparse PPI networks. Finally, the results of our method are compared with those of other methods on the human PPI network. Experimental results show that our method not only outperforms classical algorithms, such as CFinder, ClusterONE, RRW, HC-PIN, and PCE-FR, but also achieves ideal overall performance in terms of a composite score consisting of F-measure, accuracy (ACC), and the maximum matching ratio (MMR).
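The abstract says only that the adjacency matrix is "normalized" before decomposition. One common choice, assumed here rather than taken from the paper, is symmetric degree normalization D^(-1/2) A D^(-1/2), where D is the diagonal degree matrix. A plain-Python sketch:

```python
import math
from typing import List


def symmetric_normalize(adj: List[List[float]]) -> List[List[float]]:
    """Return D^(-1/2) A D^(-1/2) for a symmetric adjacency matrix A.

    Isolated proteins (degree 0) keep all-zero rows and columns.
    """
    n = len(adj)
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in adj]
    return [
        [inv_sqrt_deg[i] * adj[i][j] * inv_sqrt_deg[j] for j in range(n)]
        for i in range(n)
    ]


# Path A - B - C: degrees are 1, 2, 1.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
normalized = symmetric_normalize(path)
```

Each entry becomes A_ij / sqrt(d_i * d_j), so an edge between two high-degree hub proteins contributes less than an edge between two low-degree proteins.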
11
MTGO: PPI Network Analysis Via Topological and Functional Module Identification. Sci Rep 2018; 8:5499. [PMID: 29615773 PMCID: PMC5882952 DOI: 10.1038/s41598-018-23672-0] [Received: 11/01/2017] [Accepted: 02/28/2018] [Indexed: 11/08/2022]
Abstract
Protein-protein interaction (PPI) networks are viable tools for understanding cell functions, disease machinery, and drug design/repositioning. Interpreting a PPI network, however, is particularly challenging because of network complexity. Several algorithms have been proposed for automatic PPI interpretation, first by considering only the network topology, and later by integrating Gene Ontology (GO) terms as node similarity attributes. Here we present MTGO (Module detection via Topological information and GO knowledge), a novel functional module identification approach. MTGO reveals the biomolecular machinery underpinning PPI networks by leveraging both biological knowledge and topological properties. In particular, it directly exploits GO terms during the module assembling process and labels each module with its best-fit GO term, easing its functional interpretation. MTGO achieves considerably better results than other state-of-the-art algorithms (including recent GO-based ones) when searching for small or sparse functional modules, while providing comparable or better results in all other cases. MTGO correctly identifies molecular complexes and literature-consistent processes in an experimentally derived PPI network of myocardial infarction. A software version of MTGO is freely available for non-commercial purposes at https://gitlab.com/d1vella/MTGO.
12
Lei X, Zhang Y, Cheng S, Wu FX, Pedrycz W. Topology potential based seed-growth method to identify protein complexes on dynamic PPI data. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2017.10.013] [Indexed: 10/18/2022]
13
Hernandez C, Mella C, Navarro G, Olivera-Nappa A, Araya J. Protein complex prediction via dense subgraphs and false positive analysis. PLoS One 2017; 12:e0183460. [PMID: 28937982 PMCID: PMC5609739 DOI: 10.1371/journal.pone.0183460] [Received: 10/14/2016] [Accepted: 08/04/2017] [Indexed: 01/04/2023]
Abstract
Many proteins work together with others in groups called complexes in order to achieve a specific function. Discovering protein complexes is important for understanding biological processes and predicting protein functions in living organisms. Large-scale, high-throughput techniques have made it possible to compile protein-protein interaction networks (PPI networks), which have been used in several computational approaches for detecting protein complexes. These predictions might guide future experimental biological research. Some approaches are topology-based, predicting highly connected groups of proteins to be complexes; some propose clustering algorithms based on partitioning or on overlapping clusters, for networks modeled as unweighted or weighted graphs; and others use the density of clusters and information based on protein functionality. However, some schemes still require considerable processing time, or the quality of their results could be improved. Furthermore, most of the results obtained with computational tools are not accompanied by an analysis of false positives. We propose an effective and efficient mining algorithm for discovering highly connected subgraphs, which is our basis for defining protein complexes. Our representation is based on transforming the PPI network into a directed acyclic graph that reduces the number of represented edges and the search space for discovering subgraphs. Our approach handles both weighted and unweighted PPI networks. We compare our best alternative with state-of-the-art approaches on PPI networks from Saccharomyces cerevisiae (yeast) and Homo sapiens (human) in terms of clustering, biological metrics and execution times, using three gold standards for yeast and two for human. Furthermore, we analyze the predicted complexes classified as false positives by searching the PDBe (Protein Data Bank in Europe) database to identify matching protein complexes that have been purified and structurally characterized.
Our analysis shows that more than 50 yeast protein complexes and more than 300 human protein complexes that our prediction method classified as false positives, i.e., not described in the gold standard complex databases, in fact contain protein complexes that have been structurally characterized and documented in PDBe. We also found that some of these protein complexes have recently been classified as part of a Periodic Table of Protein Complexes. The latest version of our software is publicly available at http://doi.org/10.6084/m9.figshare.5297314.v1.
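The abstract does not specify how the directed acyclic graph is built. A standard construction with the stated properties orients every undirected edge from the lower-ranked to the higher-ranked endpoint under a total order (here degree, then node name), so each edge is stored once and no cycle can form; dense structures such as triangles can then be enumerated without duplicates. This sketch illustrates that general idea under that assumption and is not the authors' algorithm.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def orient_by_degree(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Orient each undirected edge from its lower-ranked to its higher-ranked endpoint.

    Rank is (degree, name), a total order, so the result is acyclic and each
    undirected edge is stored exactly once. Self-loops are ignored.
    """
    edges = [(u, v) for u, v in edges if u != v]
    degree: Dict[str, int] = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    def rank(node: str) -> Tuple[int, str]:
        return degree[node], node

    dag: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        low, high = (u, v) if rank(u) < rank(v) else (v, u)
        dag[low].add(high)
    return dict(dag)


def triangles(dag: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Report each triangle once, as (u, v, w) with u -> v, u -> w and v -> w."""
    found = []
    for u, outs in dag.items():
        for v in outs:
            for w in outs & dag.get(v, set()):
                found.append((u, v, w))
    return found


ppi = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
dag = orient_by_degree(ppi)
print(triangles(dag))  # [('A', 'B', 'C')]
```

Storing each edge once halves the adjacency data, and because every triangle is discovered only from its lowest-ranked vertex, no duplicate checks are needed.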
Affiliation(s)
- Cecilia Hernandez
- Computer Science, University of Concepción, Concepción, Chile
- Center for Biotechnology and Bioengineering (CeBiB), Department of Computer Science, University of Chile, Santiago, Chile
- * E-mail:
- Carlos Mella
- Computer Science, University of Concepción, Concepción, Chile
- Gonzalo Navarro
- Center for Biotechnology and Bioengineering (CeBiB), Department of Computer Science, University of Chile, Santiago, Chile
- Alvaro Olivera-Nappa
- Center for Biotechnology and Bioengineering (CeBiB), Department of Chemical Engineering and Biotechnology, University of Chile, Santiago, Chile
- Jaime Araya
- Computer Science, University of Concepción, Concepción, Chile
14
Zhou H, Liu J, Li J, Duan W. A density-based approach for detecting complexes in weighted PPI networks by semantic similarity. PLoS One 2017; 12:e0180570. [PMID: 28704455 PMCID: PMC5507511 DOI: 10.1371/journal.pone.0180570] [Received: 01/03/2017] [Accepted: 06/16/2017] [Indexed: 11/23/2022]
Abstract
Protein complex detection in PPI networks plays an important role in analyzing biological processes. A new algorithm, DBGPWN, is proposed for predicting complexes in PPI networks. First, a method based on gene ontology is used to measure semantic similarities between interacting proteins, and the similarity values are used as edge weights. Then, a density-based graph partitioning algorithm is developed to find clusters in the weighted PPI networks, and the identified clusters are considered to be dense and similar. Experimental results demonstrate that our approach performs well compared with algorithms such as MCL, CMC, MCODE, RNSC, CORE, ClusterOne and FGN.
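As an illustration of the two steps described (GO-similarity edge weights, then density-based clustering), the sketch below assumes the semantic similarities are already computed and stored per protein pair, and grows a cluster greedily from a seed while its weighted density stays above a threshold. The greedy growth rule, the `grow_cluster` name and the `min_density` value are assumptions for illustration, not DBGPWN's actual partitioning procedure.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Set

# Hypothetical input: GO semantic similarity per interacting pair, keyed by frozenset({u, v}).
Weights = Dict[FrozenSet[str], float]


def weighted_density(cluster: Set[str], w: Weights) -> float:
    """Sum of edge weights inside the cluster divided by the number of possible pairs."""
    n = len(cluster)
    if n < 2:
        return 0.0
    total = sum(w.get(frozenset(pair), 0.0) for pair in combinations(cluster, 2))
    return total / (n * (n - 1) / 2)


def grow_cluster(seed: Set[str], w: Weights, min_density: float = 0.5) -> Set[str]:
    """Greedily add the neighbor giving the densest cluster, while density >= min_density."""
    cluster = set(seed)
    while True:
        # Proteins outside the cluster with at least one weighted edge into it.
        neighbors = {x for edge in w if edge & cluster for x in edge} - cluster
        best: Optional[str] = None
        best_density = min_density
        for v in sorted(neighbors):  # sorted for deterministic tie-breaking
            d = weighted_density(cluster | {v}, w)
            if d >= best_density and (best is None or d > best_density):
                best, best_density = v, d
        if best is None:
            return cluster
        cluster.add(best)


similarity: Weights = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"A", "C"}): 0.8,
    frozenset({"B", "C"}): 0.7,
    frozenset({"C", "D"}): 0.1,
}
print(sorted(grow_cluster({"A", "B"}, similarity)))  # ['A', 'B', 'C']
```

Adding C keeps the density at about 0.8, whereas adding D would drop it to about 0.42, below the assumed threshold, so growth stops at {A, B, C}.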
Affiliation(s)
- HongFang Zhou
- School of Computer Science and Engineering, Xi'an University of Technology, Xi’an, China
- Jie Liu
- School of Computer Science and Engineering, Xi'an University of Technology, Xi’an, China
- JunHuai Li
- School of Computer Science and Engineering, Xi'an University of Technology, Xi’an, China
- WenCong Duan
- School of Computer Science and Engineering, Xi'an University of Technology, Xi’an, China
15
Maddi AMA, Eslahchi C. Discovering overlapped protein complexes from weighted PPI networks by removing inter-module hubs. Sci Rep 2017; 7:3247. [PMID: 28607455 PMCID: PMC5468366 DOI: 10.1038/s41598-017-03268-w] [Received: 07/27/2016] [Accepted: 04/26/2017] [Indexed: 12/21/2022]
Abstract
Detecting known protein complexes and predicting undiscovered protein complexes from protein-protein interaction (PPI) networks helps us understand the principles of cell organization and its functions. However, the experimental discovery of protein complexes still needs further exploration, so computational methods are useful for overcoming experimental limitations. Nevertheless, extracting protein complexes from a PPI network is often nontrivial. Two major constraints are the large amount of noise in PPI networks and the neglect of the different times at which interactions occur. In this paper, an efficient algorithm, Inter Module Hub Removal Clustering (IMHRC), is developed based on inter-module hub removal in the weighted PPI network; it can detect overlapping complexes. By removing some of the inter-module hubs and module hubs, IMHRC eliminates a large amount of noise in the dataset and implicitly considers the different occurrence times of the interactions in the network. The performance of IMHRC was evaluated on several benchmark datasets, and the results were compared with those of several state-of-the-art models. The protein complexes discovered by IMHRC show significantly better agreement with real complexes than those of other current methods. Our algorithm provides an accurate and scalable method for detecting and predicting protein complexes from PPI networks.
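The core intuition of hub removal (deleting proteins that bridge several modules so that the remaining graph falls apart into cohesive groups) can be sketched with a simple degree cutoff followed by a connected-components search. IMHRC's actual criteria for identifying inter-module hubs and module hubs, and how it handles weights and overlaps, are not given in the abstract, so the `max_degree` rule and the function name here are simplifying assumptions.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def modules_after_hub_removal(
    edges: Iterable[Tuple[str, str]], max_degree: int
) -> Tuple[Set[str], List[Set[str]]]:
    """Drop proteins with degree > max_degree, then return the hubs and the
    connected components (size >= 2) of what remains."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    hubs = {node for node, nbrs in adj.items() if len(nbrs) > max_degree}
    seen: Set[str] = set(hubs)  # hubs are never visited, which disconnects the modules
    modules: List[Set[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        component: Set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:  # breadth-first search over non-hub proteins
            node = queue.popleft()
            component.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(component) >= 2:
            modules.append(component)
    return hubs, modules


# Two triangles joined only through a hypothetical hub protein H.
ppi = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F"), ("D", "F"),
       ("H", "A"), ("H", "B"), ("H", "D"), ("H", "E")]
hubs, modules = modules_after_hub_removal(ppi, max_degree=3)
```

Without H, nothing links the two triangles, so each comes back as its own module. A fuller method would also decide whether to reattach removed hubs to one or more modules, which is one way overlapping complexes could be recovered.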
Affiliation(s)
- A M A Maddi
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, 1983963113, Iran
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, 193955746, Iran
- Ch Eslahchi
- Department of Computer Sciences, Faculty of Mathematics, Shahid Beheshti University, Tehran, 1983963113, Iran.
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, 193955746, Iran.
16
Vella D, Zoppis I, Mauri G, Mauri P, Di Silvestre D. From protein-protein interactions to protein co-expression networks: a new perspective to evaluate large-scale proteomic data. EURASIP J Bioinform Syst Biol 2017; 2017:6. [PMID: 28477207 PMCID: PMC5359264 DOI: 10.1186/s13637-017-0059-z] [Received: 11/18/2016] [Accepted: 03/09/2017] [Indexed: 12/19/2022]
Abstract
The reductionist approach of dissecting biological systems into their constituents was successful in the early stage of molecular biology in elucidating the chemical basis of several biological processes. This knowledge helped biologists understand the complexity of biological systems, showing that most biological functions do not arise from individual molecules and that the emergent properties of biological systems cannot be explained or predicted by investigating individual molecules without considering their relations. Thanks to improvements in current -omics technologies and a growing understanding of molecular relationships, more and more studies are evaluating biological systems through approaches based on graph theory. Genomic and proteomic data are often combined with protein-protein interaction (PPI) networks, whose structure is routinely analyzed by algorithms and tools to characterize hubs/bottlenecks and topological, functional, and disease modules. On the other hand, co-expression networks represent a complementary approach that enables system-level evaluation, including for organisms that lack PPI information. Based on these premises, we introduce the reader to PPI and co-expression networks, including aspects of their reconstruction and analysis. In particular, we discuss the new idea of evaluating large-scale proteomic data by means of co-expression networks and present some application examples. We show how these networks can be used to infer biological knowledge, with special attention devoted to topological and module analysis.
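A protein co-expression network is commonly built by correlating protein abundance profiles across samples and keeping the pairs whose correlation is strong. The sketch below assumes Pearson correlation with an absolute-value cutoff of 0.8; both the measure and the threshold are illustrative choices rather than settings taken from the review, and the profiles are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def coexpression_edges(
    profiles: Dict[str, Sequence[float]], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Connect two proteins when |r| between their abundance profiles is >= threshold."""
    edges = []
    for p, q in combinations(sorted(profiles), 2):
        r = pearson(profiles[p], profiles[q])
        if abs(r) >= threshold:
            edges.append((p, q, r))
    return edges


# Hypothetical abundance of four proteins across four samples.
profiles = {
    "P1": [1, 2, 3, 4],
    "P2": [2, 4, 6, 8],
    "P3": [4, 3, 2, 1],
    "P4": [1, 3, 1, 3],
}
edges = coexpression_edges(profiles)
```

Keeping strongly negative correlations (P3 against P1 and P2) as edges is itself a design choice; some analyses keep only positive correlations or use a different association measure.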
Affiliation(s)
- Danila Vella: Institute for Biomedical Technologies - National Research Council (ITB-CNR), 93 Fratelli Cervi, Segrate, Milan, Italy; Department of Computer Science, Systems and Communication (DiSCo), University of Milano-Bicocca, 336 Viale Sarca, Milan, Italy.
- Italo Zoppis: Department of Computer Science, Systems and Communication (DiSCo), University of Milano-Bicocca, 336 Viale Sarca, Milan, Italy.
- Giancarlo Mauri: Department of Computer Science, Systems and Communication (DiSCo), University of Milano-Bicocca, 336 Viale Sarca, Milan, Italy.
- Pierluigi Mauri: Institute for Biomedical Technologies - National Research Council (ITB-CNR), 93 Fratelli Cervi, Segrate, Milan, Italy.
- Dario Di Silvestre: Institute for Biomedical Technologies - National Research Council (ITB-CNR), 93 Fratelli Cervi, Segrate, Milan, Italy.
17
Shen X, Yi L, Jiang X, Zhao Y, Hu X, He T, Yang J. Neighbor affinity based algorithm for discovering temporal protein complex from dynamic PPI network. Methods 2016; 110:90-96. [PMID: 27320204 DOI: 10.1016/j.ymeth.2016.06.010] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/14/2016] [Revised: 05/31/2016] [Accepted: 06/14/2016] [Indexed: 12/13/2022]
Abstract
Detecting temporal protein complexes would greatly further our knowledge of the dynamic features and molecular mechanisms of cellular activities. Most existing clustering algorithms for discovering protein complexes are based on static protein interaction networks, in which the inherent dynamics are often overlooked. We propose a novel algorithm, DPC-NADPIN (Discovering Protein Complexes based on Neighbor Affinity and Dynamic Protein Interaction Network), to identify temporal protein complexes from time-course protein interaction networks. Inspired by the idea that the more tightly a protein's neighbors inside a module are connected, the more likely the protein is to belong to that module, DPC-NADPIN first merges each protein with a high clustering coefficient and its neighbors into an initial cluster; the initial cluster then grows into a protein complex by appending neighboring proteins according to how the affinity among neighbors inside the cluster compares with that outside the cluster. In our experiments, DPC-NADPIN proves reasonable and performs better at discovering protein complexes than the following state-of-the-art algorithms: Hunter, MCODE, CFinder, SPICI, and ClusterONE. It also obtains many protein complexes with strong biological significance, which provide useful biological knowledge to researchers in the field. Moreover, we find that proteins are assembled coordinately into protein complexes with temporal and spatial characteristics, thereby performing specific biological functions.
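A minimal Python sketch of the seed-and-extend idea described in this abstract might look as follows. It works on a single static snapshot (the original algorithm operates on a time-course network), and it replaces the paper's neighbor-affinity criterion with a hypothetical rule: a frontier protein joins the cluster when it has more neighbors inside the cluster than outside. The parameter `cc_threshold` and all function names are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected PPI snapshot as {protein: set(neighbours)}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def clustering_coefficient(graph, v):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    nbrs = graph[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def grow_complexes(graph, cc_threshold=0.5):
    """Seed-and-extend clustering in the spirit of DPC-NADPIN (simplified, hypothetical)."""
    complexes = []
    seeds = sorted(v for v in graph if clustering_coefficient(graph, v) >= cc_threshold)
    for seed in seeds:
        # Initial cluster: the seed protein plus all of its neighbours.
        cluster = {seed} | graph[seed]
        changed = True
        while changed:
            changed = False
            frontier = set().union(*(graph[u] for u in cluster)) - cluster
            for v in sorted(frontier):
                inside = len(graph[v] & cluster)
                outside = len(graph[v] - cluster)
                # Assumed affinity rule, not the paper's exact criterion.
                if inside > outside:
                    cluster.add(v)
                    changed = True
        if cluster not in complexes:  # skip clusters already found from another seed
            complexes.append(cluster)
    return complexes


# Two triangles joined by a single bridge edge C-D.
snapshot = build_graph([("A", "B"), ("B", "C"), ("A", "C"),
                        ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")])
print(grow_complexes(snapshot))
```

In this toy snapshot the bridge proteins C and D have a low clustering coefficient and more links outside each candidate cluster than inside, so the two triangles stay separate. Handling the dynamic network would mean running such a procedure per time point.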
Affiliation(s)
- Xianjun Shen: School of Computer, Central China Normal University, Wuhan, China.
- Li Yi: School of Computer, Central China Normal University, Wuhan, China.
- Xingpeng Jiang: School of Computer, Central China Normal University, Wuhan, China.
- Yanli Zhao: School of Computer, Central China Normal University, Wuhan, China.
- Xiaohua Hu: School of Computer, Central China Normal University, Wuhan, China; College of Computing and Informatics, Drexel University, Philadelphia, USA.
- Tingting He: School of Computer, Central China Normal University, Wuhan, China.
- Jincai Yang: School of Computer, Central China Normal University, Wuhan, China.
18
Using contrast patterns between true complexes and random subgraphs in PPI networks to predict unknown protein complexes. Sci Rep 2016; 6:21223. [PMID: 26868667 PMCID: PMC4751475 DOI: 10.1038/srep21223] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 10/08/2015] [Accepted: 01/19/2016] [Indexed: 02/02/2023]
Abstract
Most protein complex detection methods use unsupervised techniques to cluster densely connected nodes in a protein-protein interaction (PPI) network, even though many true complexes are not dense subgraphs. Supervised methods have been proposed recently, but they do not explain why a group of proteins is predicted as a complex, and they have not investigated how to detect new complexes of one species by training the model on the PPI data of another species. We propose a novel supervised method to address these issues. The key idea is to discover emerging patterns (EPs), a type of contrast pattern that can clearly distinguish true complexes from random subgraphs in a PPI network. An integrative score of EPs is defined to measure how likely a subgraph of proteins is to form a complex. New complexes can thus be grown from our seed proteins by iteratively updating this score. The performance of our method is tested on eight benchmark PPI datasets and compared with seven unsupervised methods, two supervised methods, and one semi-supervised method under five standards for assessing the quality of the predicted complexes. The results show that in most cases our method achieves better performance, sometimes significantly so.
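To make the notion of an emerging pattern concrete, the sketch below treats each subgraph as a set of discretized feature labels, computes the growth rate of an itemset (its support among true complexes divided by its support among random subgraphs), and keeps itemsets whose growth rate reaches a threshold. The feature labels, `min_growth`, `max_len`, and the scoring rule in `ep_score` (a sum of the positive-class supports of matched EPs) are hypothetical stand-ins: the abstract does not give the paper's actual features or integrative score, and the iterative growth from seed proteins is omitted here.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (feature sets) containing every item of itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine_emerging_patterns(positives, negatives, min_growth=1.5, max_len=2):
    """Return {itemset: growth_rate} for itemsets enriched in true complexes."""
    items = sorted(set().union(*positives))
    patterns = {}
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s_pos = support(itemset, positives)
            if s_pos == 0:
                continue
            s_neg = support(itemset, negatives)
            # Growth rate: infinite when the pattern never occurs in random subgraphs.
            growth = float("inf") if s_neg == 0 else s_pos / s_neg
            if growth >= min_growth:
                patterns[itemset] = growth
    return patterns


def ep_score(features, patterns, positives):
    """Hypothetical integrative score: sum of positive supports of the EPs a subgraph matches."""
    return sum(support(ep, positives) for ep in patterns if ep <= features)


# Hypothetical discretized features of true complexes and random subgraphs.
true_complexes = [{"dense", "small", "high_go"},
                  {"dense", "large", "high_go"},
                  {"sparse", "small", "high_go"}]
random_subgraphs = [{"sparse", "small", "low_go"},
                    {"sparse", "large", "low_go"},
                    {"dense", "large", "low_go"}]
eps = mine_emerging_patterns(true_complexes, random_subgraphs)
print(ep_score({"dense", "small", "high_go"}, eps, true_complexes))
print(ep_score({"sparse", "large", "low_go"}, eps, true_complexes))
```

Exhaustive enumeration is only practical for a handful of features; real EP miners prune the search space rather than testing every itemset.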