1
Wang W, Meng X, Xiang J, Dino Bedru H, Li M. Dopcc: Detecting Overlapping Protein Complexes via Multi-Metrics and Co-Core Attachment Method. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:2000-2010. [PMID: 39018215 DOI: 10.1109/tcbb.2024.3429546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/19/2024]
Abstract
Identification of protein complexes is an important problem in systems biology and is crucial to understanding cellular organization and inferring protein functions. Recently, many computational methods have been proposed to detect protein complexes from protein-protein interaction (PPI) networks. However, most of these methods focus only on local information about proteins in the PPI network, which makes them easily affected by noise in the network. Meanwhile, detecting protein complexes remains challenging, especially in overlapping cases. To address these issues, we propose a new method, named Dopcc, to detect overlapping protein complexes by constructing a multi-metrics network according to different strategies. First, we adopt the Jaccard coefficient to measure the neighbor similarity between proteins and denoise the PPI network. Then, we propose a new strategy, integrating hierarchical compressing with network embedding, to capture the high-order structural similarity between proteins. Further, a new co-core attachment strategy is proposed to detect overlapping protein complexes from multi-metrics. The experimental results show that our proposed method, Dopcc, outperforms eight other state-of-the-art methods in terms of F-measure, MMR, and Composite Score on two yeast datasets.
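The Jaccard-based denoising step mentioned in this abstract can be illustrated with a minimal sketch. It is not the Dopcc implementation: the function name `jaccard_denoise`, the use of closed neighborhoods (each protein counted in its own neighbor set), and the `threshold` parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def jaccard_denoise(edges, threshold=0.1):
    """Keep edges whose endpoints have Jaccard neighbor similarity >= threshold.

    `edges` is an iterable of (u, v) pairs; `threshold` is a hypothetical
    cut-off chosen for illustration, not a value taken from the paper.
    """
    edges = list(edges)
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    kept = []
    for u, v in edges:
        # Closed neighborhoods (assumption): each endpoint is counted in its
        # own set, so a directly connected pair always shares two members.
        nu = neighbors[u] | {u}
        nv = neighbors[v] | {v}
        score = len(nu & nv) / len(nu | nv)
        if score >= threshold:
            kept.append((u, v, score))
    return kept
```

Edges whose endpoints share few neighbors receive low scores and are dropped, which is the usual intuition behind using neighborhood similarity to filter spurious interactions.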
2
Xia S, Li D, Deng X, Liu Z, Zhu H, Liu Y, Li D. Integration of protein sequence and protein-protein interaction data by hypergraph learning to identify novel protein complexes. Brief Bioinform 2024; 25:bbae274. [PMID: 38851299 PMCID: PMC11162299 DOI: 10.1093/bib/bbae274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 05/22/2024] [Accepted: 05/24/2024] [Indexed: 06/10/2024]
Abstract
Protein-protein interactions (PPIs) are the basis of many important biological processes, with protein complexes being the key forms implementing these interactions. Understanding protein complexes and their functions is critical for elucidating mechanisms of life processes, disease diagnosis and treatment and drug development. However, experimental methods for identifying protein complexes have many limitations. Therefore, it is necessary to use computational methods to predict protein complexes. Protein sequences can indicate the structure and biological functions of proteins, while also determining their binding abilities with other proteins, influencing the formation of protein complexes. Integrating these characteristics to predict protein complexes is very promising, but currently there is no effective framework that can utilize both protein sequence and PPI network topology for complex prediction. To address this challenge, we have developed HyperGraphComplex, a method based on hypergraph variational autoencoder that can capture expressive features from protein sequences without feature engineering, while also considering topological properties in PPI networks, to predict protein complexes. Experiment results demonstrated that HyperGraphComplex achieves satisfactory predictive performance when compared with state-of-art methods. Further bioinformatics analysis shows that the predicted protein complexes have similar attributes to known ones. Moreover, case studies corroborated the remarkable predictive capability of our model in identifying protein complexes, including 3 that were not only experimentally validated by recent studies but also exhibited high-confidence structural predictions from AlphaFold-Multimer. 
We believe that the HyperGraphComplex algorithm and our provided proteome-wide high-confidence protein complex prediction dataset will help elucidate how proteins regulate cellular processes in the form of complexes, and facilitate disease diagnosis and treatment and drug development. Source codes are available at https://github.com/LiDlab/HyperGraphComplex.
Affiliation(s)
- Simin Xia
- School of Basic Medical Sciences, Anhui Medical University, 81 Meishan Road, Shushan District, Hefei 230032, China
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, 38 Life Science Park, Changping District, Beijing 102206, China
- Dianke Li
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, 38 Life Science Park, Changping District, Beijing 102206, China
- State Key Laboratory of Farm Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, 2 Yuanmingyuan West Road, Haidian District, Beijing 100193, China
- Xinru Deng
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, 38 Life Science Park, Changping District, Beijing 102206, China
- Zhongyang Liu
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, 38 Life Science Park, Changping District, Beijing 102206, China
- Huaqing Zhu
- School of Basic Medical Sciences, Anhui Medical University, 81 Meishan Road, Shushan District, Hefei 230032, China
- Yuan Liu
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, 38 Life Science Park, Changping District, Beijing 102206, China
- Dong Li
- School of Basic Medical Sciences, Anhui Medical University, 81 Meishan Road, Shushan District, Hefei 230032, China
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, 38 Life Science Park, Changping District, Beijing 102206, China
3
Pan Y, Li R, Li W, Lv L, Guan J, Zhou S. HPC-Atlas: Computationally Constructing A Comprehensive Atlas of Human Protein Complexes. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:976-990. [PMID: 37730114 PMCID: PMC10928439 DOI: 10.1016/j.gpb.2023.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 04/23/2023] [Accepted: 05/08/2023] [Indexed: 09/22/2023]
Abstract
A fundamental principle of biology is that proteins tend to form complexes to play important roles in the core functions of cells. For a complete understanding of human cellular functions, it is crucial to have a comprehensive atlas of human protein complexes. Unfortunately, we still lack such a comprehensive atlas of experimentally validated protein complexes, which prevents us from gaining a complete understanding of the compositions and functions of human protein complexes, as well as the underlying biological mechanisms. To fill this gap, we built Human Protein Complexes Atlas (HPC-Atlas), as far as we know, the most accurate and comprehensive atlas of human protein complexes available to date. We integrated two latest protein interaction networks, and developed a novel computational method to identify nearly 9000 protein complexes, including many previously uncharacterized complexes. Compared with the existing methods, our method achieved outstanding performance on both testing and independent datasets. Furthermore, with HPC-Atlas we identified 751 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-affected human protein complexes, and 456 multifunctional proteins that contain many potential moonlighting proteins. These results suggest that HPC-Atlas can serve as not only a computing framework to effectively identify biologically meaningful protein complexes by integrating multiple protein data sources, but also a valuable resource for exploring new biological findings. The HPC-Atlas webserver is freely available at http://www.yulpan.top/HPC-Atlas.
Affiliation(s)
- Yuliang Pan
- Department of Computer Science and Technology, College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
- Ruiyi Li
- Translational Medical Center for Stem Cell Therapy, Shanghai East Hospital, School of Medicine, Tongji University, Shanghai 200120, China
- Wengen Li
- Department of Computer Science and Technology, College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
- Liuzhenghao Lv
- Department of Computer Science and Technology, College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
- Jihong Guan
- Department of Computer Science and Technology, College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China.
- Shuigeng Zhou
- Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China.
4
Wang W, Meng X, Xiang J, Shuai Y, Bedru HD, Li M. CACO: A Core-Attachment Method With Cross-Species Functional Ortholog Information to Detect Human Protein Complexes. IEEE J Biomed Health Inform 2023; 27:4569-4578. [PMID: 37399160 DOI: 10.1109/jbhi.2023.3289490] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/05/2023]
Abstract
Protein complexes play an essential role in living cells. Detecting protein complexes is crucial to understanding protein functions and treating complex diseases. Because experimental approaches consume considerable time and resources, many computational approaches have been proposed to detect protein complexes. However, most of them are based only on protein-protein interaction (PPI) networks and suffer heavily from the noise in those networks. Therefore, we propose a novel core-attachment method, named CACO, to detect human protein complexes by integrating functional information from other species via protein ortholog relations. First, CACO constructs a cross-species ortholog relation matrix and transfers GO terms from other species as a reference to evaluate the confidence of PPIs. Then, a PPI filter strategy is adopted to clean the PPI network, and thus a weighted clean PPI network is constructed. Finally, a new effective core-attachment algorithm is proposed to detect protein complexes from the weighted PPI network. Compared with thirteen other state-of-the-art methods, CACO outperforms all of them in terms of F-measure and Composite Score, showing that integrating ortholog information and the proposed core-attachment algorithm is effective in detecting protein complexes.
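The general core-attachment idea referred to in this abstract can be sketched as follows. This is a generic illustration, not CACO's algorithm: the function name `attach_to_core` and the rule "attach a protein if it interacts with more than half of the core" (the `min_fraction` parameter) are assumptions chosen to show the pattern, and the edge weights that CACO uses are ignored.

```python
def attach_to_core(adj, core, min_fraction=0.5):
    """Extend a complex core with peripheral proteins.

    `adj` maps each protein to the set of its interaction partners.
    A neighbor of the core is attached when it interacts with more than
    `min_fraction` of the core members (hypothetical rule for illustration).
    """
    core = set(core)
    # Candidate attachments: every protein adjacent to the core but not in it.
    candidates = set().union(*(adj[p] for p in core)) - core
    attachments = {
        p for p in candidates
        if len(adj[p] & core) / len(core) > min_fraction
    }
    return core | attachments
```

Because a peripheral protein can pass the attachment test for several cores, this style of method naturally yields overlapping complexes.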
5
Palukuri MV, Patil RS, Marcotte EM. Molecular complex detection in protein interaction networks through reinforcement learning. BMC Bioinformatics 2023; 24:306. [PMID: 37532987 PMCID: PMC10394916 DOI: 10.1186/s12859-023-05425-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Accepted: 07/20/2023] [Indexed: 08/04/2023]
Abstract
BACKGROUND Proteins often assemble into higher-order complexes to perform their biological functions. Such protein-protein interactions (PPI) are often experimentally measured for pairs of proteins and summarized in a weighted PPI network, to which community detection algorithms can be applied to define the various higher-order protein complexes. Current methods include unsupervised and supervised approaches, often assuming that protein complexes manifest only as dense subgraphs. Utilizing supervised approaches, the focus is not on how to find them in a network, but only on learning which subgraphs correspond to complexes, currently solved using heuristics. However, learning to walk trajectories on a network to identify protein complexes leads naturally to a reinforcement learning (RL) approach, a strategy not extensively explored for community detection. Here, we develop and evaluate a reinforcement learning pipeline for community detection on weighted protein-protein interaction networks to detect new protein complexes. The algorithm is trained to calculate the value of different subgraphs encountered while walking on the network to reconstruct known complexes. A distributed prediction algorithm then scales the RL pipeline to search for novel protein complexes on large PPI networks. RESULTS The reinforcement learning pipeline is applied to a human PPI network consisting of 8k proteins and 60k PPI, which results in 1,157 protein complexes. The method demonstrated competitive accuracy with improved speed compared to previous algorithms. We highlight protein complexes such as C4orf19, C18orf21, and KIAA1522 which are currently minimally characterized. Additionally, the results suggest TMC04 be a putative additional subunit of the KICSTOR complex and confirm the involvement of C15orf41 in a higher-order complex with HIRA, CDAN1, ASF1A, and by 3D structural modeling. 
CONCLUSIONS Reinforcement learning offers several distinct advantages for community detection, including scalability and knowledge of the walk trajectories defining those communities. Applied to currently available human protein interaction networks, this method had comparable accuracy with other algorithms and notable savings in computational time, and in turn, led to clear predictions of protein function and interactions for several uncharacterized human proteins.
Affiliation(s)
- Meghana V Palukuri
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA.
- Oden Institute for Computational Engineering and Sciences, University of Texas, Austin, TX, 78712, USA.
- Ridhi S Patil
- Department of Biomedical Engineering, University of Texas, Austin, TX, 78712, USA.
- Edward M Marcotte
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, 78712, USA.
- Oden Institute for Computational Engineering and Sciences, University of Texas, Austin, TX, 78712, USA.
6
Pan Y, Wang Y, Guan J, Zhou S. PCGAN: a generative approach for protein complex identification from protein interaction networks. Bioinformatics 2023; 39:btad473. [PMID: 37531266 PMCID: PMC10457665 DOI: 10.1093/bioinformatics/btad473] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/20/2022] [Revised: 07/23/2023] [Accepted: 08/01/2023] [Indexed: 08/04/2023]
Abstract
MOTIVATION Protein complexes are groups of polypeptide chains linked by non-covalent protein-protein interactions, which play important roles in biological systems and perform numerous functions, including DNA transcription, mRNA translation, and signal transduction. In the past decade, a number of computational methods have been developed to identify protein complexes from protein interaction networks by mining dense subnetworks or subgraphs. RESULTS In this article, different from the existing works, we propose a novel approach for this task based on generative adversarial networks, which is called PCGAN, meaning identifying Protein Complexes by GAN. With the help of some real complexes as training samples, our method can learn a model to generate new complexes from a protein interaction network. To effectively support model training and testing, we construct two more comprehensive and reliable protein interaction networks and a larger gold standard complex set by merging existing ones of the same organism (including human and yeast). Extensive comparison studies indicate that our method is superior to existing protein complex identification methods in terms of various performance metrics. Furthermore, functional enrichment analysis shows that the identified complexes are of high biological significance, which indicates that these generated protein complexes are very possibly real complexes. AVAILABILITY AND IMPLEMENTATION https://github.com/yul-pan/PCGAN.
Affiliation(s)
- Yuliang Pan
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Yang Wang
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Jihong Guan
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Shuigeng Zhou
- Shanghai Key Laboratory of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
7
Chen H, Cai Y, Ji C, Selvaraj G, Wei D, Wu H. AdaPPI: identification of novel protein functional modules via adaptive graph convolution networks in a protein-protein interaction network. Brief Bioinform 2023; 24:bbac523. [PMID: 36526282 DOI: 10.1093/bib/bbac523] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/16/2022] [Revised: 10/10/2022] [Accepted: 11/02/2022] [Indexed: 12/23/2022]
Abstract
Identifying unknown protein functional modules, such as protein complexes and biological pathways, from protein-protein interaction (PPI) networks provides biologists with an opportunity to efficiently understand cellular function and organization. Finding complex nonlinear relationships in underlying functional modules may involve a long chain of PPIs and poses great challenges in a PPI network with an unevenly sparse and dense node distribution. To overcome these challenges, we propose AdaPPI, an adaptive convolution graph network in PPI networks to predict protein functional modules. We first suggest an attributed graph node representation algorithm. It can effectively integrate protein gene ontology attributes and network topology, and adaptively aggregates low- or high-order graph structural information according to the node distribution by considering graph node smoothness. Based on the obtained node representations, core cliques and expansion algorithms are applied to find functional modules in PPI networks. Comprehensive performance evaluations and case studies indicate that the framework significantly outperforms state-of-the-art methods. We also present potential functional modules based on their confidence.
8
Omranian S, Nikoloski Z, Grimm DG. Computational identification of protein complexes from network interactions: Present state, challenges, and the way forward. Comput Struct Biotechnol J 2022; 20:2699-2712. [PMID: 35685359 PMCID: PMC9166428 DOI: 10.1016/j.csbj.2022.05.049] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/21/2022] [Revised: 05/25/2022] [Accepted: 05/25/2022] [Indexed: 01/05/2023]
9
Wang R, Ma H, Wang C. An Ensemble Learning Framework for Detecting Protein Complexes From PPI Networks. Front Genet 2022; 13:839949. [PMID: 35281831 PMCID: PMC8908451 DOI: 10.3389/fgene.2022.839949] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/20/2021] [Accepted: 01/31/2022] [Indexed: 11/14/2022]
Abstract
Detecting protein complexes is one of the keys to understanding the principles of cellular organization and processes. With the development of high-throughput experiments and computing science, it has become possible to detect protein complexes by computational methods. However, most computational methods are based on either unsupervised learning or supervised learning. Unsupervised learning-based methods do not need training datasets, but they can detect protein complexes with only one or a few topological structures. Supervised learning-based methods can detect protein complexes with different topological structures. However, they are usually based on a single type of training model, and the generalization of a single model is poor. Therefore, we propose an Ensemble Learning Framework for Detecting Protein Complexes (ELF-DPC) within protein-protein interaction (PPI) networks to address these challenges. ELF-DPC first constructs a weighted PPI network by combining topological and biological information. Second, it mines protein complex cores using the protein complex core mining strategy we designed. Third, it obtains an ensemble learning model by integrating structural modularity and a trained voting regressor model. Finally, it extends the protein complex cores and forms protein complexes by a graph heuristic search strategy. The experimental results demonstrate that ELF-DPC performs better than twelve state-of-the-art approaches. Moreover, functional enrichment analysis illustrates that ELF-DPC can detect biologically meaningful protein complexes. The code/dataset is available for free download from https://github.com/RongquanWang/ELF-DPC.
Affiliation(s)
- Rongquan Wang
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Huimin Ma
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- *Correspondence: Huimin Ma,
- Caixia Wang
- School of International Economics, China Foreign Affairs University, Beijing, China
10
Yu Y, Kong D. Protein complexes detection based on node local properties and gene expression in PPI weighted networks. BMC Bioinformatics 2022; 23:24. [PMID: 34991441 PMCID: PMC8734347 DOI: 10.1186/s12859-021-04543-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/14/2021] [Accepted: 12/20/2021] [Indexed: 12/30/2022]
Abstract
BACKGROUND Identifying protein complexes from protein-protein interaction (PPI) networks is a crucial task, and many related algorithms have been developed. Most algorithms employ only the direct neighbors of nodes and ignore resource allocation and second-order neighbors. The effective use of such information is crucial to protein complex detection. RESULT Based on this observation, we propose a new method that combines node resource allocation and gene expression information to weight the protein network (NRAGE-WPN), in which protein complexes are detected based on core-attachment and second-order neighbors. CONCLUSIONS In comparisons with eleven methods on yeast and human PPI networks, the experimental results demonstrate that this algorithm not only performs better than the other methods in 75% of cases in terms of f-measure+, but also achieves an ideal overall performance in terms of a composite score consisting of five performance measures. This identification method is simple and can accurately identify more complexes.
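For readers unfamiliar with resource allocation as a node-similarity measure, the sketch below computes the standard resource allocation index from the link-prediction literature: each common neighbor contributes the reciprocal of its degree. Whether NRAGE-WPN uses exactly this form is an assumption; the function name `resource_allocation` and the adjacency-dictionary input are illustrative only, and the gene expression component of the weighting is not shown.

```python
def resource_allocation(adj, u, v):
    """Standard resource allocation index between proteins u and v.

    `adj` maps each protein to the set of its interaction partners.
    Each common neighbor z contributes 1 / degree(z), so shared hubs
    count for less than shared low-degree proteins.
    """
    return sum(1.0 / len(adj[z]) for z in adj[u] & adj[v])
```

A score of this kind could serve as one factor of an edge weight, with low-degree shared partners giving stronger evidence that two proteins belong to the same complex.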
Affiliation(s)
- Yang Yu
- Software College, Shenyang Normal University, Shenyang, 110034, People's Republic of China.
- Dezhou Kong
- Software College, Shenyang Normal University, Shenyang, 110034, People's Republic of China
11
Wang X, Zhang N, Zhao Y, Wang J. A New Method for Recognizing Protein Complexes Based on Protein Interaction Networks and GO Terms. Front Genet 2021; 12:792265. [PMID: 34966415 PMCID: PMC8711776 DOI: 10.3389/fgene.2021.792265] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/10/2021] [Accepted: 11/10/2021] [Indexed: 01/29/2023]
Abstract
Motivation: A protein complex is a group of proteins that interact with each other. Protein–protein interaction (PPI) networks are composed of multiple protein complexes. It is very difficult to recognize protein complexes from PPI data because of the noise in those data. Results: We proposed a new method, called Topology and Semantic Similarity Network (TSSN), which constructs the PPI network based on topological structure characteristics and biological characteristics. Experiments show that TSSN can filter the noise in PPI data. We also proposed a new algorithm, called Neighbor Nodes of Proteins (NNP), for recognizing protein complexes by considering their topology information. Experiments show that the algorithm identifies more protein complexes, and does so more accurately. The recognition of protein complexes is vital in research on evolution analysis. Availability and implementation: https://github.com/bioinformatical-code/NNP.
Affiliation(s)
- Xiaoting Wang
- School of Computer Science, Inner Mongolia University, and with Ecological Big Data Engineering Research Center of the Ministry of Education, Hohhot, China
- Nan Zhang
- School of Computer Science, Inner Mongolia University, and with Ecological Big Data Engineering Research Center of the Ministry of Education, Hohhot, China
- Yulan Zhao
- School of Computer Science, Inner Mongolia University, and with Ecological Big Data Engineering Research Center of the Ministry of Education, Hohhot, China
- Juan Wang
- School of Computer Science, Inner Mongolia University, and with Ecological Big Data Engineering Research Center of the Ministry of Education, Hohhot, China
12
Omranian S, Angeleska A, Nikoloski Z. Efficient and accurate identification of protein complexes from protein-protein interaction networks based on the clustering coefficient. Comput Struct Biotechnol J 2021; 19:5255-5263. [PMID: 34630943 PMCID: PMC8479235 DOI: 10.1016/j.csbj.2021.09.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/14/2021] [Revised: 09/13/2021] [Accepted: 09/13/2021] [Indexed: 12/23/2022]
Abstract
We provide a family of efficient network algorithms for protein complex identification. The parameter-free family outperforms existing approaches on different networks. It exactly recovered ~35% of protein complexes in a pan-plant PPI network. We examined the effect of network perturbations on predicted protein complexes.
Identification of protein complexes from protein-protein interaction (PPI) networks is a key problem in PPI mining, solved by parameter-dependent approaches that suffer from small recall rates. Here we introduce GCC-v, a family of efficient, parameter-free algorithms to accurately predict protein complexes using the (weighted) clustering coefficient of proteins in PPI networks. Through comparative analyses with gold standards and PPI networks from Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens, we demonstrate that GCC-v outperforms twelve state-of-the-art approaches for identification of protein complexes with respect to twelve performance measures in at least 85.71% of scenarios. We also show that GCC-v results in the exact recovery of ∼35% of protein complexes in a pan-plant PPI network and discover 144 new protein complexes in Arabidopsis thaliana, with high support from GO semantic similarity. Our results indicate that findings from GCC-v are robust to network perturbations, which has direct implications to assess the impact of the PPI network quality on the predicted protein complexes.
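The quantity at the centre of this abstract, the clustering coefficient of a protein, can be computed as in the sketch below. It shows only the standard unweighted local clustering coefficient, not the GCC-v algorithms themselves; the function name `clustering_coefficient` and the adjacency-dictionary representation are assumptions made for illustration.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Unweighted local clustering coefficient of `node`.

    `adj` maps each protein to the set of its interaction partners.
    The value is the fraction of neighbor pairs that also interact
    with each other; proteins with fewer than two neighbors get 0.0.
    """
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))
```

A protein whose partners mostly interact with one another scores close to 1, which is why this measure is a natural signal for densely connected complexes.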
Affiliation(s)
- Sara Omranian
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany; Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam, Germany
- Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany; Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam, Germany