1
Abbas MN, Broneske D, Saake G. A multi-objective evolutionary algorithm for detecting protein complexes in PPI networks using gene ontology. Sci Rep 2025; 15:16855. [PMID: 40374682 DOI: 10.1038/s41598-025-01667-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2025] [Accepted: 05/07/2025] [Indexed: 05/17/2025]
Abstract
Detecting protein complexes is crucial in computational biology for understanding cellular mechanisms and facilitating drug discovery. Evolutionary algorithms (EAs) have proven effective in uncovering protein complexes within networks of protein-protein interactions (PPIs). However, their integration with functional insights from gene ontology (GO) annotations remains underexplored. This paper presents two primary contributions: First, it proposes a novel multi-objective optimization model for detecting protein complexes, conceptualizing the task as a problem with inherently conflicting objectives based on biological data. Second, it introduces an innovative gene ontology-based mutation operator, termed the Functional Similarity-Based Protein Translocation Operator ([Formula: see text]). This operator enhances collaboration between the canonical model and the GO-informed mutation strategy, thereby improving the algorithm's performance. As far as we know, this is the initial effort to incorporate the biological characteristics of PPIs into both the problem formulation and the development of intricate perturbation strategies. We assess the effectiveness of the proposed multi-objective evolutionary algorithm through experiments conducted on two widely recognized PPI networks and two standard complex datasets provided by the Munich Information Center for Protein Sequences (MIPS). To further assess the robustness of our algorithm, we create artificial networks by introducing different noise levels into the original Saccharomyces cerevisiae (yeast) PPI networks. This allows us to evaluate how perturbations in protein interactions affect the algorithm's performance compared to other approaches. The experimental results highlight that our algorithm outperforms several state-of-the-art methods in accurately identifying protein complexes. 
Moreover, the findings emphasize the substantial advantages of incorporating our heuristic perturbation operator, which significantly improves the quality of the detected complexes over other evolutionary algorithm-based methods.
Affiliation(s)
- Mustafa N Abbas
  - Databases and Software Engineering, Otto-von-Guericke-University, Magdeburg, Germany
- David Broneske
  - German Centre for Higher Education Research and Science Studies, Hannover, Germany
- Gunter Saake
  - Databases and Software Engineering, Otto-von-Guericke-University, Magdeburg, Germany
2
Wang S, Cui H, Qu Y, Zhang Y. Multi-source biological knowledge-guided hypergraph spatiotemporal subnetwork embedding for protein complex identification. Brief Bioinform 2024; 26:bbae718. [PMID: 39814560 PMCID: PMC11735048 DOI: 10.1093/bib/bbae718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2024] [Revised: 12/19/2024] [Accepted: 12/30/2024] [Indexed: 01/18/2025]
Abstract
Identifying biologically significant protein complexes from protein-protein interaction (PPI) networks and understanding their roles are essential for elucidating protein functions, life processes, and disease mechanisms. Current methods typically rely on static PPI networks and model PPI data as pairwise relationships, which presents several limitations. Firstly, static PPI networks do not adequately represent the scopes and temporal dynamics of protein interactions. Secondly, a large amount of available biological resources have not been fully integrated. Moreover, PPIs in biological systems are not merely one-to-one relationships but involve higher order non-pairwise interactions. To alleviate these issues, we propose HGST, a multi-source biological knowledge-guided hypergraph spatiotemporal subnetwork (subnet) embedding method for identifying biologically significant protein complexes from PPI networks. HGST initially constructs spatiotemporal PPI subnets using the scopes and temporal dynamics of proteins derived from multi-source biological knowledge, treating them as dynamic networks through fine-grained spatiotemporal partitioning. The spatiotemporal subnets are then transformed into hypergraphs, which model higher order non-pairwise relationships via hypergraph embedding. Simultaneously, fine-grained amino acid sequence features and coarse-grained gene ontology attributes are introduced for multi-dimensional feature fusion. Finally, protein complexes are identified from the reweighted subnets based on fused feature representations using the core-attachment strategy. Evaluations on four real PPI datasets demonstrate that HGST achieves competitive performance. Furthermore, a series of biological analyses confirm the high biological significance of the complexes identified by HGST. The source code is available at https://github.com/qifen37/HGST.
Affiliation(s)
- Shilong Wang
  - Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, 116026, Dalian, Liaoning, China
- Hai Cui
  - Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, 116026, Dalian, Liaoning, China
- Yanchen Qu
  - Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, 116026, Dalian, Liaoning, China
- Yijia Zhang
  - Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, 116026, Dalian, Liaoning, China
3
Santos MVC, Feltrin AS, Costa-Amaral IC, Teixeira LR, Perini JA, Martins DC, Larentis AL. Network Analysis of Biomarkers Associated with Occupational Exposure to Benzene and Malathion. Int J Mol Sci 2023; 24:ijms24119415. [PMID: 37298367 DOI: 10.3390/ijms24119415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 04/21/2023] [Accepted: 05/03/2023] [Indexed: 06/12/2023]
Abstract
Complex diseases are associated with the effects of multiple genes, proteins, and biological pathways. In this context, the tools of Network Medicine are compatible as a platform to systematically explore not only the molecular complexity of a specific disease but may also lead to the identification of disease modules and pathways. Such an approach enables us to gain a better understanding of how environmental chemical exposures affect the function of human cells, providing better perceptions about the mechanisms involved and helping to monitor/prevent exposure and disease to chemicals such as benzene and malathion. We selected differentially expressed genes for exposure to benzene and malathion. The construction of interaction networks was carried out using GeneMANIA and STRING. Topological properties were calculated using MCODE, BiNGO, and CentiScaPe, and a Benzene network composed of 114 genes and 2415 interactions was obtained. After topological analysis, five networks were identified. In these subnets, the most interconnected nodes were identified as: IL-8, KLF6, KLF4, JUN, SERTAD1, and MT1H. In the Malathion network, composed of 67 proteins and 134 interactions, HRAS and STAT3 were the most interconnected nodes. Path analysis, combined with various types of high-throughput data, reflects biological processes more clearly and comprehensively than analyses involving the evaluation of individual genes. We emphasize the central roles played by several important hub genes obtained by exposure to benzene and malathion.
Affiliation(s)
- Marcus Vinicius C Santos
  - Studies Center of Worker's Health and Human Ecology (CESTEH), Sergio Arouca National School of Public Health (ENSP), Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro 21041-210, RJ, Brazil
- Arthur S Feltrin
  - Center for Mathematics, Computation and Cognition, Federal University of ABC, Santo André 09210-580, SP, Brazil
- Isabele C Costa-Amaral
  - Studies Center of Worker's Health and Human Ecology (CESTEH), Sergio Arouca National School of Public Health (ENSP), Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro 21041-210, RJ, Brazil
- Liliane R Teixeira
  - Studies Center of Worker's Health and Human Ecology (CESTEH), Sergio Arouca National School of Public Health (ENSP), Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro 21041-210, RJ, Brazil
- Jamila A Perini
  - Research Laboratory of Pharmaceutical Sciences (LAPESF), State University of Rio de Janeiro (West Zone-UERJ-ZO), Rio de Janeiro 23070-200, RJ, Brazil
- David C Martins
  - Center for Mathematics, Computation and Cognition, Federal University of ABC, Santo André 09210-580, SP, Brazil
- Ariane L Larentis
  - Studies Center of Worker's Health and Human Ecology (CESTEH), Sergio Arouca National School of Public Health (ENSP), Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro 21041-210, RJ, Brazil
4
Wang H, Luo J, Li A, Su X, Fang C, Xie L, Wu Y, Wen F, Liu Y, Wang T, Zhong Y, Ma L. Proteomic and phosphorylated proteomic landscape of injured lung in juvenile septic rats with therapeutic application of umbilical cord mesenchymal stem cells. Front Immunol 2022; 13:1034821. [PMID: 36341346 PMCID: PMC9635340 DOI: 10.3389/fimmu.2022.1034821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 10/10/2022] [Indexed: 02/05/2023]
Abstract
Acute lung injury (ALI) is the most common complication of sepsis. Intravenous injection of human umbilical cord mesenchymal stem cells (HUMSCs) can regulate the level of circulating endothelial cytokines and alleviate lung injury in juvenile septic rats. In this study, we performed, for the first time, proteomic and phosphorylated proteomic analysis of lung tissue of juvenile septic rats after HUMSCs intervention, and screened the potential proteins and pathways underlying the therapeutic effect of HUMSCs. The 4D proteome quantitative technique was used to quantitatively analyze the lung tissues of septic rats 24 hours (3 biological samples) and 24 hours after HUMSCs intervention (3 biological samples). A total of 213 proteins were identified as differentially expressed proteins, and 971 phosphorylation sites changed significantly. Based on the public database, we analyzed the functional enrichment of these proteins and phosphorylated proteins. In addition, analysis of the protein-protein interaction network with various algorithms suggested that Tenascin-C may be the key differential protein and the ECM receptor interaction pathway may be the main signaling pathway. Phosphorylation analysis showed that the tight junction pathway was closely related to the immune inflammatory reaction, and EGFR had the most interactions, which suggests it may be the key differential phosphorylated protein. Finally, 123 conserved motifs of serine phosphorylation sites (pS) and 17 conserved motifs of threonine phosphorylation sites (pT) were identified by motif analysis of phosphorylation sites. Together, the proteomic and phosphorylated proteomic results revealed potential new therapeutic targets of HUMSCs in alleviating lung injury in juvenile septic rats.
Affiliation(s)
- Hongwu Wang
  - Department of Pediatrics, The Second Affiliated Hospital of Shantou University Medical College, Shantou, China
  - Department of Hematology and Oncology, Shenzhen Children's Hospital of China Medical University, Shenzhen, China
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen, China
- Junlin Luo
  - Department of Pediatrics, The Second Affiliated Hospital of Shantou University Medical College, Shantou, China
- Aijia Li
  - Department of Pediatrics, The Second Affiliated Hospital of Shantou University Medical College, Shantou, China
- Xing Su
  - Department of Pediatrics, The Second Affiliated Hospital of Shantou University Medical College, Shantou, China
- Chuiqin Fang
  - Department of Pediatrics, The Second Affiliated Hospital of Shantou University Medical College, Shantou, China
- Lichun Xie
  - Department of Hematology and Oncology, Shenzhen Children's Hospital of China Medical University, Shenzhen, China
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen, China
  - Department of Pediatrics, The Third Affiliated Hospital of Guangzhou Medical University (The Women and Children’s Medical Hospital of Guangzhou Medical University), Guangzhou, China
- Yi Wu
  - Department of Pediatrics, The Second Affiliated Hospital of Shantou University Medical College, Shantou, China
- Feiqiu Wen
  - Department of Hematology and Oncology, Shenzhen Children's Hospital of China Medical University, Shenzhen, China
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen, China
  - Department of Hematology and Oncology, Shenzhen Public Service Platform of Molecular Medicine in Pediatric Hematology and Oncology, Shenzhen, China
- Yufeng Liu
  - Department of Pediatrics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Tianyou Wang
  - Department of Hematology and Oncology, Beijing Children’s Hospital, Capital Medical University, Beijing, China
- Yong Zhong
  - Department of Pediatrics, The Southeast General Hospital of Dongguan, Dongguan, China
- Lian Ma
  - Department of Hematology and Oncology, Shenzhen Children's Hospital of China Medical University, Shenzhen, China
  - Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen, China
  - Department of Pediatrics, The Third Affiliated Hospital of Guangzhou Medical University (The Women and Children’s Medical Hospital of Guangzhou Medical University), Guangzhou, China
  - Department of Hematology and Oncology, Shenzhen Public Service Platform of Molecular Medicine in Pediatric Hematology and Oncology, Shenzhen, China
5
Wu Z, Liao Q, Fan S, Liu B. idenPC-CAP: Identify protein complexes from weighted RNA-protein heterogeneous interaction networks using co-assemble partner relation. Brief Bioinform 2020; 22:6041167. [PMID: 33333549 DOI: 10.1093/bib/bbaa372] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/30/2020] [Revised: 11/07/2020] [Accepted: 11/20/2020] [Indexed: 12/18/2022]
Abstract
Protein complexes play important roles in most cellular processes. The available genome-wide protein-protein interaction (PPI) data make it possible for computational methods to identify protein complexes from PPI networks. However, PPI datasets usually contain a large ratio of false positive noise. Moreover, different types of biomolecules in a living cell cooperate to form a union interaction network. Because previous computational methods focus only on PPIs and ignore other types of biomolecule interactions, their predicted protein complexes often contain many false positive proteins. In this study, we develop a novel computational method, idenPC-CAP, to identify protein complexes from the RNA-protein heterogeneous interaction network consisting of RNA-RNA interactions, RNA-protein interactions and PPIs. By considering interactions among proteins and RNAs, the new method reduces the ratio of false positive proteins in predicted protein complexes. The experimental results demonstrate that idenPC-CAP outperforms the other state-of-the-art methods in this field.
Affiliation(s)
- Zhourun Wu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qing Liao
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Shixi Fan
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
6
Huang S, Zheng F, Liu L, Meng S, Cai W, Zhang C, Dai W, Liu D, Hong X, Tang D, Dai Y. Integrated proteome and phosphoproteome analyses of peripheral blood mononuclear cells in primary Sjögren syndrome patients. Aging (Albany NY) 2020; 13:1071-1095. [PMID: 33290261 PMCID: PMC7835054 DOI: 10.18632/aging.202233] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/18/2020] [Accepted: 10/27/2020] [Indexed: 12/18/2022]
Abstract
Primary Sjögren syndrome (pSS) is a common autoimmune disease. Here, we performed the first proteome and phosphoproteome analyses of peripheral blood mononuclear cells in pSS patients to obtain a comprehensive profile and identify the potential crucial proteins and pathways for the screening and evaluation of pSS patients. Peripheral blood mononuclear cells from 8 pSS-confirmed patients (American-European Consensus Group Criteria, 2002) and 10 normal controls were selected. Label-free quantitative proteomics was utilized to obtain quantitative information. In total, 787 proteins were identified as differentially expressed proteins, and 175 phosphosites on 123 proteins were identified as differentially phosphorylated proteins. We performed functional enrichment analyses with these proteins and phosphoproteins based on public database. Furthermore, protein-protein interaction network analyses were performed by using multiple algorithms. Using module and hub protein analyses, we identified 16 modules for the proteins, 2 clusters for the phosphoproteins and selected the top 10 hub proteins. Finally, we identified 22 motifs using motif analysis of the phosphosites and found 17 newly identified motifs, while 6 motifs were experimentally verified for known protein kinases. The findings distinguished pSS patients from normal controls at the peripheral blood mononuclear cells level and revealed potential candidates for use in pSS diagnosis.
Affiliation(s)
- Shaoying Huang
  - Department of Clinical Medical Research Center, Guangdong Provincial Engineering Research Center of Autoimmune Disease Precision Medicine, Shenzhen People’s Hospital, The First Affiliated Hospital Southern University of Science and Technology, The Second Clinical Medical College of Jinan University, Shenzhen 518020, Guangdong, China
- Fengping Zheng
  - Department of Clinical Medical Research Center, Guangdong Provincial Engineering Research Center of Autoimmune Disease Precision Medicine, Shenzhen People’s Hospital, The First Affiliated Hospital Southern University of Science and Technology, The Second Clinical Medical College of Jinan University, Shenzhen 518020, Guangdong, China
- Lixiong Liu
  - Department of Rheumatology and Immunology, The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen People’s Hospital, Shenzhen 518020, Guangdong, China
- Shuhui Meng
  - Department of Clinical Medical Research Center, Guangdong Provincial Engineering Research Center of Autoimmune Disease Precision Medicine, Shenzhen People’s Hospital, The First Affiliated Hospital Southern University of Science and Technology, The Second Clinical Medical College of Jinan University, Shenzhen 518020, Guangdong, China
- Wanxia Cai
  - Department of Clinical Medical Research Center, Guangdong Provincial Engineering Research Center of Autoimmune Disease Precision Medicine, Shenzhen People’s Hospital, The First Affiliated Hospital Southern University of Science and Technology, The Second Clinical Medical College of Jinan University, Shenzhen 518020, Guangdong, China
- Cantong Zhang
  - Department of Clinical Medical Research Center, Guangdong Provincial Engineering Research Center of Autoimmune Disease Precision Medicine, Shenzhen People’s Hospital, The First Affiliated Hospital Southern University of Science and Technology, The Second Clinical Medical College of Jinan University, Shenzhen 518020, Guangdong, China
- Weier Dai
  - College of Natural Science, University of Texas at Austin, Austin, TX 78712, USA
- Dongzhou Liu
  - Department of Rheumatology and Immunology, The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen People’s Hospital, Shenzhen 518020, Guangdong, China
- Xiaoping Hong
  - Department of Rheumatology and Immunology, The Second Clinical Medical College of Jinan University, The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen People’s Hospital, Shenzhen 518020, Guangdong, China
- Donge Tang
  - Department of Clinical Medical Research Center, Guangdong Provincial Engineering Research Center of Autoimmune Disease Precision Medicine, Shenzhen People’s Hospital, The First Affiliated Hospital Southern University of Science and Technology, The Second Clinical Medical College of Jinan University, Shenzhen 518020, Guangdong, China
- Yong Dai
  - Department of Clinical Medical Research Center, Guangdong Provincial Engineering Research Center of Autoimmune Disease Precision Medicine, Shenzhen People’s Hospital, The First Affiliated Hospital Southern University of Science and Technology, The Second Clinical Medical College of Jinan University, Shenzhen 518020, Guangdong, China
  - Guangxi Key Laboratory of Metabolic Disease Research, Nephrology Department of Guilin, Guilin 541002, China
7
Wu Z, Liao Q, Liu B. A comprehensive review and evaluation of computational methods for identifying protein complexes from protein-protein interaction networks. Brief Bioinform 2020; 21:1531-1548. [PMID: 31631226 DOI: 10.1093/bib/bbz085] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 04/08/2019] [Revised: 06/17/2019] [Accepted: 06/17/2019] [Indexed: 01/03/2025]
Abstract
Protein complexes are the fundamental units for many cellular processes. Identifying protein complexes accurately is critical for understanding the functions and organizations of cells. With the increment of genome-scale protein-protein interaction (PPI) data for different species, various computational methods focus on identifying protein complexes from PPI networks. In this article, we give a comprehensive and updated review on the state-of-the-art computational methods in the field of protein complex identification, especially focusing on the newly developed approaches. The computational methods are organized into three categories, including cluster-quality-based methods, node-affinity-based methods and ensemble clustering methods. Furthermore, the advantages and disadvantages of different methods are discussed, and then, the performance of 17 state-of-the-art methods is evaluated on two widely used benchmark data sets. Finally, the bottleneck problems and their potential solutions in this important field are discussed.
Affiliation(s)
- Zhourun Wu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qing Liao
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
8
Dai C, He J, Hu K, Ding Y. Identifying essential proteins in dynamic protein networks based on an improved h-index algorithm. BMC Med Inform Decis Mak 2020; 20:110. [PMID: 32552708 PMCID: PMC7371468 DOI: 10.1186/s12911-020-01141-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2019] [Accepted: 06/01/2020] [Indexed: 11/30/2022]
Abstract
BACKGROUND: The essential proteins in protein networks play an important role in complex cellular functions and in protein evolution. Therefore, the identification of essential proteins in a network can help to explain the structure, function, and dynamics of basic cellular networks. The existing dynamic protein networks regard the protein components as the same at all time points; however, the role of proteins can vary over time. METHODS: To improve the accuracy of identifying essential proteins, an improved h-index algorithm based on the attenuation coefficient method is proposed in this paper. This method incorporates previously neglected node information to improve the accuracy of the essential protein search. After an appropriate attenuation coefficient is chosen, measures such as monotonicity, SN, SP, PPV and NPV are tested for different essential protein search algorithms. RESULTS: The experimental results show that the algorithm proposed in this paper can ensure the accuracy of the found proteins while identifying more essential proteins. CONCLUSIONS: The described experiments show that this method is more effective than other similar methods in identifying essential proteins in dynamic protein networks. This study can better explain the mechanism of life activities and provide a theoretical basis for the research and development of targeted drugs.
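For orientation, the classic h-index of a network node (the largest h such that the node has at least h neighbours of degree h or more) can be sketched as follows. This is only the standard baseline measure, not the improved attenuation-coefficient variant the abstract proposes, whose details are not given here; the function name `node_h_index` and the toy network are illustrative assumptions.

```python
def node_h_index(adj, node):
    """Classic h-index of `node`: the largest h such that at least h
    neighbours of `node` have degree >= h.

    `adj` maps each protein to the set of its interaction partners
    (hypothetical toy representation of a PPI network).
    """
    # Degrees of the neighbours, largest first.
    degrees = sorted((len(adj[n]) for n in adj[node]), reverse=True)
    h = 0
    for rank, degree in enumerate(degrees, start=1):
        if degree >= rank:
            h = rank
        else:
            break
    return h


# Toy undirected network (illustrative only, not real PPI data).
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
    "E": set(),
}

scores = {protein: node_h_index(toy_network, protein) for protein in toy_network}
```

Ranking proteins by such a score and taking the top candidates is the usual way a centrality measure is turned into an essential-protein predictor.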
Affiliation(s)
- Caiyan Dai
  - College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine University, Nanjing, 210000, China
- Ju He
  - College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine University, Nanjing, 210000, China
- Kongfa Hu
  - College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine University, Nanjing, 210000, China
- Youwei Ding
  - College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine University, Nanjing, 210000, China
9
Zhang Z, Luo Y, Hu S, Li X, Wang L, Zhao B. A novel method to predict essential proteins based on tensor and HITS algorithm. Hum Genomics 2020; 14:14. [PMID: 32252824 PMCID: PMC7137323 DOI: 10.1186/s40246-020-00263-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/14/2019] [Accepted: 03/05/2020] [Indexed: 11/13/2022]
Abstract
BACKGROUND: Essential proteins are an important part of the cell and closely related to the life activities of the cell. Hitherto, Protein-Protein Interaction (PPI) networks have been adopted by many computational methods to predict essential proteins. Most of the current approaches focus mainly on the topological structure of PPI networks. However, those methods relying solely on the PPI network have low detection accuracy for essential proteins. Therefore, it is necessary to integrate the PPI network with other biological information to identify essential proteins. RESULTS: In this paper, we propose a novel random walk method for identifying essential proteins, called HEPT. A three-dimensional tensor is constructed first by combining the PPI network of Saccharomyces cerevisiae with multiple biological data such as gene ontology annotations and protein domains. Then, based on the newly constructed tensor, we extend the Hyperlink-Induced Topic Search (HITS) algorithm from a two-dimensional to a three-dimensional tensor model that can be utilized to infer essential proteins. Unlike in existing state-of-the-art methods, both the importance of proteins and the types of interactions contribute to the essential protein prediction. To evaluate the performance of the HEPT method, proteins are ranked in descending order of the ranking scores computed by our method and by other competitive methods. After that, a certain number of the top-ranked proteins are selected as candidates for essential proteins. According to the list of known essential proteins, the number of true essential proteins among the candidates is used to judge the performance of each method. Experimental results show that our method achieves better prediction performance than nine other state-of-the-art methods in identifying essential proteins.
CONCLUSIONS: The analysis and experimental results indicate that HEPT effectively improves the prediction accuracy of essential proteins through the use of the HITS algorithm and the combination of network topology with gene ontology annotations and protein domains, which provides new insight into multi-data source fusion.
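As background for the abstract above, the classic two-dimensional HITS iteration can be sketched as below. It illustrates only the standard hub/authority update on an ordinary directed graph, not the three-dimensional tensor extension (HEPT) that the paper describes; the function name `hits`, the iteration count, and the toy edge list are assumptions made for illustration.

```python
import math


def hits(edges, iterations=50):
    """Classic HITS by power iteration on a list of directed (src, dst) edges.

    Returns two dicts: hub scores and authority scores, each L2-normalised.
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of nodes pointing to it.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum of (new) authority scores of nodes it points to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        # Normalise both score vectors so the values stay bounded.
        for scores in (new_auth, new_hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
        auth, hub = new_auth, new_hub
    return hub, auth


# Toy directed graph (illustrative only).
toy_edges = [("a", "c"), ("b", "c"), ("c", "d")]
hub_scores, auth_scores = hits(toy_edges)
top_authority = max(auth_scores, key=auth_scores.get)
```

In the toy graph, node `c` receives links from two nodes and ends up with the highest authority score, while `a` and `b` share the highest hub score.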
Affiliation(s)
- Zhihong Zhang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Yingchun Luo
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
  - Department of Ultrasound, Hunan Province Women and Children’s Hospital, Changsha, 410008 China
- Sai Hu
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Xueyong Li
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Lei Wang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Bihai Zhao
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
  - Hunan Provincial Key Laboratory of Nutrition and Quality Control of Aquatic Animals, Department of Biological and Environmental Engineering, Changsha University, Changsha, 410022 China
10
CDAP: An Online Package for Evaluation of Complex Detection Methods. Sci Rep 2019; 9:12751. [PMID: 31485005 PMCID: PMC6726630 DOI: 10.1038/s41598-019-49225-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/16/2019] [Accepted: 08/21/2019] [Indexed: 01/21/2023]
Abstract
Methods for detecting protein complexes from protein-protein interaction networks are among the most critical computational approaches. Numerous methods have been proposed in this area, so it is necessary to evaluate them. Various metrics have been proposed in order to compare these methods. Nevertheless, it is essential to define new metrics that evaluate methods both qualitatively and quantitatively. In addition, there is no tool for the comprehensive comparison of such methods. In this paper, a new criterion is introduced that can fully evaluate protein complex detection algorithms. We introduce CDAP (Complex Detection Analyzer Package), an online package for comparing protein complex detection methods. CDAP can quickly rank the performance of methods based on previously defined as well as newly introduced criteria in various settings (4 PPI datasets and 3 gold standards). It can integrate various methods and apply several filters to the results. CDAP can be easily extended to include new datasets, gold standards, and methods. Furthermore, the user can compare the results of a custom method with the results of existing methods. Thus, the authors of future papers can use CDAP for comparing their method with the previous ones. A case study is done on YGR198W, a well-known protein, and the detected clusters are compared to the known complexes of this protein.
Collapse
|
11
|
Xu B, Li K, Zheng W, Liu X, Zhang Y, Zhao Z, He Z. Protein complexes identification based on go attributed network embedding. BMC Bioinformatics 2018; 19:535. [PMID: 30572820 PMCID: PMC6302388 DOI: 10.1186/s12859-018-2555-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/22/2018] [Accepted: 11/30/2018] [Indexed: 01/19/2023] Open
Abstract
Background Identifying protein complexes from protein-protein interaction (PPI) networks is one of the most important tasks in proteomics. Existing computational methods try to incorporate a variety of biological evidence to enhance the quality of predicted complexes. However, it is still a challenge to integrate different types of biological information into the complex discovery process under a unified framework. Recently, attributed network embedding methods have proved to be remarkably effective in generating vector representations for nodes in the network. In the transformed vector space, both the topological proximity and the node attribute affinity between different nodes are preserved. Therefore, such attributed network embedding methods provide us with a unified framework to integrate various biological evidence into the protein complex identification process. Results In this article, we propose a new method called GANE to predict protein complexes based on Gene Ontology (GO) attributed network embedding. Firstly, it learns the vector representation for each protein from a GO attributed PPI network. Based on the pair-wise vector representation similarity, a weighted adjacency matrix is constructed. Secondly, it uses the clique mining method to generate candidate cores. Consequently, seed cores are obtained by ranking candidate cores based on their densities on the weighted adjacency matrix and removing redundant cores. For each seed core, its attachments are the proteins with a correlation score that is larger than a given threshold. The combination of a seed core and its attachment proteins is reported as a predicted protein complex by the GANE algorithm. For performance evaluation, we compared GANE with six protein complex identification methods on five yeast PPI networks. Experimental results show that GANE performs better than the competing algorithms in terms of different evaluation metrics.
Conclusions GANE provides a framework that integrates many valuable and different kinds of biological information into the task of protein complex identification. The protein vector representation learned from our attributed PPI network can also be used in other tasks, such as PPI prediction and disease gene prediction. Electronic supplementary material The online version of this article (10.1186/s12859-018-2555-x) contains supplementary material, which is available to authorized users.
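The two scoring steps named in the GANE abstract (a similarity-weighted adjacency matrix, then ranking candidate cores by density on it) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which similarity or density definition is used, so cosine similarity and mean pairwise weight are assumptions here, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_adjacency(edges, vectors):
    """Weight each existing PPI edge by the similarity of its endpoint vectors."""
    return {frozenset((a, b)): cosine(vectors[a], vectors[b]) for a, b in edges}


def weighted_density(core, weights):
    """Mean edge weight over all protein pairs in the core (assumed density)."""
    pairs = list(combinations(sorted(core), 2))
    if not pairs:
        return 0.0
    return sum(weights.get(frozenset(p), 0.0) for p in pairs) / len(pairs)


def rank_cores(cores, weights):
    """Order candidate cores from densest to sparsest."""
    return sorted(cores, key=lambda c: weighted_density(c, weights), reverse=True)
```

In this sketch, pairs without a PPI edge contribute a weight of zero, so sparse candidate cores are penalised.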
Affiliation(s)
- Bo Xu
- School of Software Technology, Dalian University of Technology, No.321 Tuqiang Road, Economic Development Zone, Dalian, 116024, China.,Key Laboratory for Ubiquitous Network and Service Software of Liaoning, Dalian, 116000, China
| | - Kun Li
- School of Software Technology, Dalian University of Technology, No.321 Tuqiang Road, Economic Development Zone, Dalian, 116024, China
| | - Wei Zheng
- College of Computer Science and Technology, Dalian University of Technology, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, China.,College of software, Dalian JiaoTong University, Dalian, 116000, China
| | - Xiaoxia Liu
- College of Computer Science and Technology, Dalian University of Technology, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, China
| | - Yijia Zhang
- College of Computer Science and Technology, Dalian University of Technology, No.2 Linggong Road, Ganjingzi District, Dalian, 116024, China
| | - Zhehuan Zhao
- School of Software Technology, Dalian University of Technology, No.321 Tuqiang Road, Economic Development Zone, Dalian, 116024, China.,Key Laboratory for Ubiquitous Network and Service Software of Liaoning, Dalian, 116000, China
| | - Zengyou He
- School of Software Technology, Dalian University of Technology, No.321 Tuqiang Road, Economic Development Zone, Dalian, 116024, China.,Key Laboratory for Ubiquitous Network and Service Software of Liaoning, Dalian, 116000, China
| |
|
12
|
Abdulateef AH, Attea BA, Rashid AN, Al-Ani M. A new evolutionary algorithm with locally assisted heuristic for complex detection in protein interaction networks. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.09.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022]
|
13
|
Performance evaluation measures for protein complex prediction. Genomics 2018; 111:1483-1492. [PMID: 30312661 DOI: 10.1016/j.ygeno.2018.10.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/10/2018] [Revised: 09/25/2018] [Accepted: 10/04/2018] [Indexed: 02/01/2023]
Abstract
Protein complexes play a dominant role in cellular organization and function. Prediction of protein complexes from the network of physical interactions between proteins (PPI networks) has thus become one of the important research areas. Recently, many computational approaches have been developed to identify these complexes. Various performance assessment measures have been proposed for evaluating the efficiency of these methods. However, there are many inconsistencies in the definitions and usage of the measures across the literature. To address this issue, we have gathered and presented the most important performance evaluation measures and developed a tool, named CompEvaluator, to critically assess the protein complex prediction methods. The tool and documentation are publicly available at https://sourceforge.net/projects/compevaluator/files/.
|
14
|
Kashyap S, Kumar S, Agarwal V, Misra DP, Phadke SR, Kapoor A. Protein protein interaction network analysis of differentially expressed genes to understand involved biological processes in coronary artery disease and its different severity. GENE REPORTS 2018; 12:50-60. [DOI: 10.1016/j.genrep.2018.05.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
|
15
|
Wang R, Liu G, Wang C, Su L, Sun L. Predicting overlapping protein complexes based on core-attachment and a local modularity structure. BMC Bioinformatics 2018; 19:305. [PMID: 30134824 PMCID: PMC6106838 DOI: 10.1186/s12859-018-2309-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 03/26/2018] [Accepted: 07/30/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In recent decades, detecting protein complexes (PCs) from protein-protein interaction networks (PPINs) has been an active area of research. There are a large number of excellent graph clustering methods that work very well for identifying PCs. However, most existing methods overlook the inherent core-attachment organization of PCs. These methods therefore have three major limitations that deserve attention. Firstly, many methods ignore the importance of seed selection, in particular the impact of using overlapping nodes as seed nodes. Thus, there may be false predictions. Secondly, PCs are generally supposed to be dense subgraphs. However, the subgraphs with high local modularity structure usually correspond to PCs. Thirdly, a number of available methods lack a noise-handling mechanism and miss some peripheral proteins. In summary, all these challenging issues are very important for predicting more biologically meaningful overlapping PCs. RESULTS In this paper, to overcome these weaknesses, we propose a clustering method based on core-attachment and local modularity structure, named CALM, to detect overlapping PCs from noisy weighted PPINs. Firstly, we identify overlapping nodes and seed nodes. Secondly, we calculate the support function between a node and a cluster. In CALM, a cluster, which initially consists of only a seed node, is extended by adding its direct neighboring nodes recursively according to the support function, until this cluster forms a locally optimal modularity subgraph. Thirdly, we repeat this process for the remaining seed nodes. Finally, merging and removing procedures are carried out to obtain the final predicted clusters. The experimental results show that CALM outperforms other classical methods and achieves ideal overall performance. Furthermore, CALM can match more complexes with a higher accuracy and provide a better one-to-one mapping with reference complexes in all test datasets.
Additionally, CALM is robust against PPINs with a high rate of noise. CONCLUSIONS By considering core-attachment and local modularity structure, CALM could detect PCs much more effectively than some representative methods. In short, CALM could potentially identify previously undiscovered overlapping PCs with various densities and high modularity.
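The seed-expansion step described in the CALM abstract can be sketched as a greedy loop that adds direct neighbours until a local modularity score stops improving. This is only an illustration under stated assumptions: the abstract does not define CALM's support function or its modularity measure, so the ratio of internal edges to internal plus boundary edges is used as a stand-in, the graph is unweighted, and both function names are hypothetical.

```python
def local_modularity(cluster, adj):
    """Internal edges divided by internal plus boundary edges (assumed measure)."""
    inside_twice = 0  # each internal edge is seen from both endpoints
    boundary = 0
    for u in cluster:
        for v in adj[u]:
            if v in cluster:
                inside_twice += 1
            else:
                boundary += 1
    inside = inside_twice // 2
    total = inside + boundary
    return inside / total if total else 0.0


def grow_cluster(seed, adj):
    """Greedily add the neighbour that most improves the score; stop at a local optimum."""
    cluster = {seed}
    score = local_modularity(cluster, adj)
    while True:
        frontier = {v for u in cluster for v in adj[u]} - cluster
        best, best_score = None, score
        for v in sorted(frontier):  # sorted for deterministic tie-breaking
            candidate = local_modularity(cluster | {v}, adj)
            if candidate > best_score:
                best, best_score = v, candidate
        if best is None:
            return cluster
        cluster.add(best)
        score = best_score
```

On two triangles joined by a single bridge edge, growing from a node in either triangle stops at that triangle, because crossing the bridge lowers the score.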
Affiliation(s)
- Rongquan Wang
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
| | - Guixia Liu
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
| | - Caixia Wang
- School of International Economics, China Foreign Affairs University, 24 Zhanlanguan Road, Xicheng District, Beijing, 100037 China
| | - Lingtao Su
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
| | - Liyan Sun
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
| |
|
16
|
Zaki N, Alashwal H. Improving the Detection of Protein Complexes by Predicting Novel Missing Interactome Links in the Protein-Protein Interaction Network. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2018; 2018:5041-5044. [PMID: 30441473 DOI: 10.1109/embc.2018.8513476] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
Identifying protein complexes within protein-protein interaction (PPI) networks is a crucial task in computational biology that facilitates a better understanding of the cellular mechanisms observed in various organisms. Datasets of predicted PPIs have been determined using high-throughput experimental technology. However, the datasets typically contain many spurious interactions. It is essential that these interactions, observed in the given datasets, are validated before they are employed to predict protein complexes. This paper describes the identification of missing interactome links in the PPI network as a way of improving the detection of protein complexes. The missing links have been identified by extracting several topological features. These are subsequently employed in conjunction with a two-class boosted decision-tree classifier to develop a machine-learning model that is capable of distinguishing between existing and non-existing interactome links. The model was trained on a PPI network that consisted of 1,622 proteins and 9,074 interactions, then tested on another PPI network that consisted of 1,430 proteins and 6,531 interactions. All 6,531 interactions were identified with a precision of 0.994 and a recall of 1. The model was also able to detect 37 novel interactions that were then validated using the STRING database of known and predicted PPIs. The detection of the protein complexes using ClusterONE was improved by the inclusion of the 37 novel interactions.
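The precision and recall figures quoted above follow the usual definitions for link prediction, which a short sketch makes concrete. It assumes undirected interactions compared as unordered protein pairs; the function name and the toy data in the usage note are illustrative and do not come from the paper.

```python
def precision_recall(predicted, reference):
    """Precision and recall of predicted undirected links against reference links."""
    pred = {frozenset(edge) for edge in predicted}
    ref = {frozenset(edge) for edge in reference}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall
```

For example, if a model predicts four links and three of them are the only three reference links, the recall is 1 while the precision is 0.75; the extra predicted link lowers precision but may also be a candidate novel interaction.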
|
17
|
Attea BA, Abdullah QZ. Improving the performance of evolutionary-based complex detection models in protein–protein interaction networks. Soft comput 2018. [DOI: 10.1007/s00500-017-2593-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
18
|
Pellegrini M, Baglioni M, Geraci F. Protein complex prediction for large protein protein interaction networks with the Core&Peel method. BMC Bioinformatics 2016; 17:372. [PMID: 28185552 PMCID: PMC5123419 DOI: 10.1186/s12859-016-1191-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 02/01/2023] Open
Abstract
Background Biological networks play an increasingly important role in the exploration of functional modularity and cellular organization at a systemic level. Quite often the first tools used to analyze these networks are clustering algorithms. We concentrate here on the specific task of predicting protein complexes (PC) in large protein-protein interaction networks (PPIN). Currently, many state-of-the-art algorithms work well for networks of small or moderate size. However, their performance on much larger networks, which are becoming increasingly common in modern proteome-wise studies, needs to be re-assessed. Results and discussion We present a new fast algorithm for clustering large sparse networks: Core&Peel, which runs essentially in time and storage O(a(G)m+n) for a network G of n nodes and m arcs, where a(G) is the arboricity of G (which is roughly proportional to the maximum average degree of any induced subgraph in G). We evaluated Core&Peel on five PPI networks of large size and one of medium size from both yeast and Homo sapiens, comparing its performance against those of ten state-of-the-art methods. We demonstrate that Core&Peel consistently outperforms the ten competitors in its ability to identify known protein complexes and in the functional coherence of its predictions. Our method is remarkably robust, being quite insensitive to the injection of random interactions. Core&Peel is also empirically efficient, attaining the second best running time over large networks among the tested algorithms. Conclusions Our algorithm Core&Peel pushes forward the state of the art in PPIN clustering, providing an algorithmic solution with polynomial running time that attains experimentally demonstrable good output quality and speed on challenging large real networks. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1191-6) contains supplementary material, which is available to authorized users.
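The abstract does not spell out how Core&Peel works internally. As background only, the sketch below shows the standard k-core decomposition, computed by repeatedly peeling a minimum-degree node, which is the kind of sparse-graph primitive the method's name and its arboricity-based bound suggest. It should not be read as the authors' algorithm, and the function name is hypothetical. This simple version runs in quadratic time; bucket-based variants achieve linear time.

```python
def core_numbers(adj):
    """Return the core number of each node by repeatedly peeling a minimum-degree node."""
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Pick the remaining node of smallest current degree (name breaks ties).
        v = min(remaining, key=lambda x: (degree[x], x))
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core
```

Nodes with a high core number sit inside densely connected regions, which makes them natural starting points when searching for dense subgraphs such as candidate complexes.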
Affiliation(s)
- Marco Pellegrini
- Laboratory for Integrative Systems Medicine - Istituto di Informatica e Telematica and Istituto di Fisiologia Clinica del CNR, via Moruzzi 1, Pisa, 56124, Italy.
| | - Miriam Baglioni
- Laboratory for Integrative Systems Medicine - Istituto di Informatica e Telematica and Istituto di Fisiologia Clinica del CNR, via Moruzzi 1, Pisa, 56124, Italy
| | - Filippo Geraci
- Laboratory for Integrative Systems Medicine - Istituto di Informatica e Telematica and Istituto di Fisiologia Clinica del CNR, via Moruzzi 1, Pisa, 56124, Italy
| |
|
19
|
Comparative genomic analysis of novel Acinetobacter symbionts: A combined systems biology and genomics approach. Sci Rep 2016; 6:29043. [PMID: 27378055 PMCID: PMC4932630 DOI: 10.1038/srep29043] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 03/21/2016] [Accepted: 06/08/2016] [Indexed: 12/20/2022] Open
Abstract
The increasing trend of antibiotic resistance in Acinetobacter drastically limits the range of therapeutic agents required to treat multidrug resistant (MDR) infections. This study focused on analysis of novel Acinetobacter strains using a genomics and systems biology approach. Here we used a network theory method for pathogenic and non-pathogenic Acinetobacter spp. to identify the key regulatory proteins (hubs) in each strain. We identified nine key regulatory proteins, guaA, guaB, rpsB, rpsI, rpsL, rpsE, rpsC, rplM and trmD, which have functional roles as hubs in a hierarchical scale-free fractal protein-protein interaction network. Two key hubs (guaA and guaB) were important for insect-associated strains, and comparative analysis identified guaA as more important than guaB due to its role in effective module regulation. rpsI played a significant role in all the novel strains, while rplM was unique to sheep-associated strains. rpsM, rpsB and rpsI were involved in the regulation of overall network topology across all Acinetobacter strains analyzed in this study. Future analysis will investigate whether these hubs are useful as drug targets for treating Acinetobacter infections.
|
20
|
Zhang Y, Lin H, Yang Z, Wang J. Construction of dynamic probabilistic protein interaction networks for protein complex identification. BMC Bioinformatics 2016; 17:186. [PMID: 27117946 PMCID: PMC4847341 DOI: 10.1186/s12859-016-1054-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 09/08/2015] [Accepted: 04/14/2016] [Indexed: 11/10/2022] Open
Abstract
Background Recently, high-throughput experimental techniques have generated a large amount of protein-protein interaction (PPI) data, which can be used to construct large, complex PPI networks for numerous organisms. Systems biology attempts to understand cellular organization and function by analyzing these PPI networks. However, most studies still focus on static PPI networks, which neglect the dynamic information of PPIs. Results Gene expression data under different time points and conditions can reveal the dynamic information of proteins. In this study, we used an active probability-based method to distinguish the active level of proteins at different active time points. We constructed dynamic probabilistic protein networks (DPPN) to integrate dynamic information of proteins into static PPI networks. Based on DPPN, we subsequently proposed a novel method to identify protein complexes, which could effectively exploit the topological structure as well as the dynamic information of DPPN. We used three different yeast PPI datasets and gene expression data to construct three DPPNs. When applied to the three DPPNs, many well-characterized protein complexes were accurately identified by this method. Conclusion The shift from static PPI networks to dynamic PPI networks is essential to accurately identify protein complexes. This method can not only be applied to identify protein complexes but also establishes a framework to integrate dynamic information into static networks for other applications, such as pathway analysis.
Affiliation(s)
- Yijia Zhang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, 116023, China.
| | - Hongfei Lin
- College of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, 116023, China
| | - Zhihao Yang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, 116023, China
| | - Jian Wang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, 116023, China
| |
|
21
|
Hanna EM, Zaki N, Amin A. Detecting Protein Complexes in Protein Interaction Networks Modeled as Gene Expression Biclusters. PLoS One 2015; 10:e0144163. [PMID: 26641660 PMCID: PMC4671556 DOI: 10.1371/journal.pone.0144163] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/13/2015] [Accepted: 11/13/2015] [Indexed: 12/13/2022] Open
Abstract
Developing suitable methods for the detection of protein complexes in protein interaction networks continues to be an intriguing area of research. The importance of this objective originates from the fact that protein complexes are key players in most cellular processes. The more complexes we identify, the better we can understand normal as well as abnormal molecular events. To date, various computational methods have been designed for this purpose. However, despite their notable performance, questions arise regarding potential ways to improve them, as well as guidelines for introducing novel approaches. A closer look suggests that the way in which protein interaction networks are initially viewed should be adjusted. These networks are dynamic in reality, and it is necessary to consider this fact to enhance the detection of protein complexes. In this paper, we present "DyCluster", a framework to model the dynamic aspect of protein interaction networks by incorporating gene expression data, through biclustering techniques, prior to applying complex-detection algorithms. The experimental results show that DyCluster leads to higher numbers of correctly detected complexes with better evaluation scores. The high accuracy achieved by DyCluster in detecting protein complexes is a valid argument in favor of the proposed method. DyCluster is also able to detect biologically meaningful protein groups. The code and datasets used in the study are downloadable from https://github.com/emhanna/DyCluster.
Affiliation(s)
| | - Nazar Zaki
- Intelligent Systems, College of Info. Tech., UAEU, Al Ain 17551, UAE
| | - Amr Amin
- Department of Biology, College of Science, UAEU, Al Ain 15551, UAE
- Faculty of Science, Cairo University, Cairo, Egypt
| |
|
22
|
Characterization of protein complexes and subcomplexes in protein-protein interaction databases. Biochem Res Int 2015; 2015:245075. [PMID: 25722891 PMCID: PMC4334629 DOI: 10.1155/2015/245075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/30/2014] [Revised: 01/05/2015] [Accepted: 01/06/2015] [Indexed: 12/24/2022] Open
Abstract
The identification and characterization of protein complexes implicated in protein-protein interaction data are crucial to the understanding of the molecular events under normal and abnormal physiological conditions. This paper provides a novel characterization of subcomplexes in protein interaction databases, stressing definition and representation issues, quantification, biological validation, network metrics, motifs, modularity, and gene ontology (GO) terms. The paper introduces the concept of "nested group" as a way to represent subcomplexes and estimates that around 15% of those nested groups with the higher Jaccard index may be a result of data artifacts in protein interaction databases, while a number of them can be found in biologically important modular structures or dynamic structures. We also found that network centralities, enrichment in essential proteins, GO terms related to regulation, imperfect 5-clique motifs, and higher GO homogeneity can be used to identify proteins in nested complexes.
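The Jaccard index used to score nested groups is simple to compute, as the following minimal sketch shows. The abstract does not give the paper's formal definition of a nested group, so this sketch assumes it means a pair of complexes in which one member set is a proper subset of the other; the function names and toy identifiers are hypothetical.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two member sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nested_pairs(complexes):
    """List (inner, outer, jaccard) for every complex properly contained in another."""
    found = []
    for (inner, inner_members), (outer, outer_members) in permutations(complexes.items(), 2):
        if set(inner_members) < set(outer_members):  # proper subset (assumed definition)
            found.append((inner, outer, jaccard(inner_members, outer_members)))
    return sorted(found, key=lambda item: item[2], reverse=True)
```

For a properly nested pair the index reduces to the size ratio of the inner to the outer complex, so values close to 1 flag near-duplicate entries, which is the kind of case the abstract attributes partly to data artifacts.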
|
23
|
Wang H, Wang C, Zhang L, Lu Y, Duan Q, Gong Z, Liang A, Song H, Wang L. Analysis of the protein-protein interaction networks of differentially expressed genes in pulmonary embolism. Mol Med Rep 2014; 11:2527-33. [PMID: 25434468 PMCID: PMC4337743 DOI: 10.3892/mmr.2014.3006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/06/2014] [Accepted: 10/24/2014] [Indexed: 12/27/2022] Open
Abstract
The aim of the present study was to explore the function and interaction of differentially expressed genes (DEGs) in pulmonary embolism (PE). The gene expression profile GSE13535 was downloaded from the Gene Expression Omnibus database. The DEGs 2 and 18 h post-PE initiation were identified using the affy package in R software. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of the DEGs were analyzed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) online analytical tools. In addition, protein-protein interaction (PPI) networks of the DEGs were constructed using the Search Tool for the Retrieval of Interacting Genes/Proteins. The PPI network at 18 h was modularized using ClusterONE, and a functional enrichment analysis of the DEGs in the top three modules was performed with DAVID. Overall, 80 and 346 DEGs were identified 2 and 18 h after PE initiation, respectively. The KEGG pathways, including chemokine signaling and toll-like receptor signaling, were shown to be significantly enriched. The five highest degree nodes in the PPI networks at 2 or 18 h were screened. The module analysis of the PPI network at 18 h revealed 11 hub nodes. A Gene Ontology terms analysis demonstrated that the DEGs in the top three modules were associated with the inflammatory, defense and immune responses. The results of the present study suggest that the DEGs identified, including chemokine-related genes TFPI2 and TNF, may be potential target genes for the treatment of PE. The chemokine signaling pathway, inflammatory response and immune response were explored, and it may be suggested that these pathways have important roles in PE.
Affiliation(s)
- Hao Wang
- Department of Family Medicine, Tongji Hospital, School of Medicine, Tongji University, Shanghai 200065, P.R. China
| | - Chen Wang
- Department of Family Medicine, Tongji Hospital, School of Medicine, Tongji University, Shanghai 200065, P.R. China
| | - Lei Zhang
- Department of Family Medicine, Tongji Hospital, School of Medicine, Tongji University, Shanghai 200065, P.R. China
| | - Yinghua Lu
- Department of Family Medicine, Tongji Hospital, School of Medicine, Tongji University, Shanghai 200065, P.R. China
| | - Qianglin Duan
- Department of Cardiology, Tongji Hospital, School of Medicine, Tongji University, Shanghai 200065, P.R. China
| | - Zhu Gong
- Department of Cardiology, Tongji Hospital, School of Medicine, Tongji University, Shanghai 200065, P.R. China
| | - Aibin Liang
- Department of Hematology, Tongji Hospital, School of Medicine, Tongji University, Shanghai 200065, P.R. China
| | - Haoming Song
- Department of Cardiology, Tongji Hospital, School of Medicine, Tongji University, Shanghai 200065, P.R. China
| | - Lemin Wang
- Department of Cardiology, Tongji Hospital, School of Medicine, Tongji University, Shanghai 200065, P.R. China
| |
|
24
|
Nafis S, Kalaiarasan P, Brojen Singh RK, Husain M, Bamezai RNK. Apoptosis regulatory protein-protein interaction demonstrates hierarchical scale-free fractal network. Brief Bioinform 2014; 16:675-99. [PMID: 25256288 DOI: 10.1093/bib/bbu036] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 05/26/2014] [Accepted: 08/21/2014] [Indexed: 12/29/2022] Open
Abstract
Dysregulation or inhibition of apoptosis favors cancer and many other diseases. Understanding the network interactions of the genes involved in the apoptotic pathway is therefore essential in the search for targets of therapeutic intervention. Here we applied network theory methods to 25 experimentally validated apoptosis regulatory proteins and identified important genes for apoptosis regulation, which demonstrated a hierarchical scale-free fractal protein-protein interaction network. TP53, BRCA1, UBIQ and CASP3 were recognized as four key regulators. BRCA1 and UBIQ were also individually found to control highly clustered modules and play an important role in the stability of the overall network. The connection among the BRCA1, UBIQ and TP53 proteins was found to be important for regulation, which controlled their own respective communities and the overall network topology. The feedback loop regulation motif was identified among NPM1, BRCA1 and TP53, and these crucial motif topologies were also reflected in high frequency. The propagation of the perturbed signal from hubs was found to be active up to some distance, after which propagation started decreasing, and TP53 was the most efficient signal propagator. From the functional enrichment analysis, most of the apoptosis regulatory genes associated with cardiovascular diseases and highly expressed in brain tissues were identified. Apart from TP53, BRCA1 was observed to regulate apoptosis by influencing motif, propagation of signals and module regulation, reflecting their biological significance. In future, biochemical investigation of the observed hub-interacting partners could provide further understanding about their role in the pathophysiology of cancer.
|
25
|
Hanna EM, Zaki N. Detecting protein complexes in protein interaction networks using a ranking algorithm with a refined merging procedure. BMC Bioinformatics 2014; 15:204. [PMID: 24944073 PMCID: PMC4230023 DOI: 10.1186/1471-2105-15-204] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 01/15/2014] [Accepted: 06/10/2014] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Developing suitable methods for the identification of protein complexes remains an active research area. It is important since it allows a better understanding of cellular functions as well as malfunctions, and it consequently leads to producing more effective cures for diseases. In this context, various computational approaches were introduced to complement high-throughput experimental methods, which typically involve large datasets, are expensive in terms of time and cost, and are usually subject to spurious interactions. RESULTS In this paper, we propose ProRank+, a method which detects protein complexes in protein interaction networks. The presented approach is mainly based on a ranking algorithm which sorts proteins according to their importance in the interaction network, and a merging procedure which refines the detected complexes in terms of their protein members. ProRank+ was compared to several state-of-the-art approaches in order to show its effectiveness. It was able to detect more protein complexes with higher quality scores. CONCLUSIONS The experimental results achieved by ProRank+ show its ability to detect protein complexes in protein interaction networks. Eventually, the method could potentially identify previously undiscovered protein complexes. The datasets and source codes are freely available for academic purposes at http://faculty.uaeu.ac.ae/nzaki/Research.htm.
Affiliation(s)
- Eileen Marie Hanna
- College of Information Technology, United Arab Emirates University (UAEU), Al Ain 17551, United Arab Emirates.
|
26
|
Fadhal E, Mwambene EC, Gamieldien J. Modelling human protein interaction networks as metric spaces has potential in disease research and drug target discovery. BMC Syst Biol 2014; 8:68. [PMID: 24929653 PMCID: PMC4088370 DOI: 10.1186/1752-0509-8-68] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/07/2014] [Accepted: 06/04/2014] [Indexed: 01/06/2023]
Abstract
Background By formally modelling human protein interaction networks (PINs) as metric spaces and classifying proteins into zones based on their distance from the topological centre, we recently showed that hub proteins are primarily centrally located. We also showed that zones closest to the network centre are enriched for critically important proteins and are functionally very specialised for specific 'housekeeping' functions. We proposed that proteins closest to the network centre may present good therapeutic targets. Here, we present multiple pieces of novel functional evidence that provide strong support for this hypothesis. Results We found that the human PINs have a highly connected signalling core, with the majority of proteins involved in signalling located in the two zones closest to the topological centre. The majority of essential, disease-related, tumour suppressor, oncogenic and approved drug target proteins were found to be centrally located. Similarly, the majority of proteins consistently expressed in 13 types of cancer are located in zones closest to the centre. Proteins from zones 1 and 2 were also found to comprise the majority of proteins in key KEGG pathways such as MAPK signalling, the cell cycle, apoptosis and pathways in cancer, with very similar patterns seen in pathways that lead to cancers such as melanoma and glioma, and to non-neoplastic diseases such as measles, inflammatory bowel disease and Alzheimer's disease. Conclusions Based on the diversity of evidence uncovered, we propose that, when considered holistically, proteins located centrally in the human PINs that also have similar functions to existing drug targets are good candidate targets for novel therapeutics. Similarly, since disease pathways are dominated by centrally located proteins, candidates shortlisted in genome-scale disease studies can be further prioritised and contextualised based on whether they occupy central positions in the human PINs.
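The zoning idea in this abstract can be illustrated with a minimal sketch. The abstract does not define the "topological centre", so the code below assumes it is the graph-theoretic centre (the node or nodes of minimum eccentricity under shortest-path distance) and that a protein's zone is its hop distance to the nearest centre node. The function names and the toy network are hypothetical, and the graph is assumed to be connected.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path (hop) distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def zones_from_centre(adj):
    """Assumed definition: centre = nodes of minimum eccentricity;
    zone of a node = hop distance to the nearest centre node."""
    all_dist = {n: bfs_distances(adj, n) for n in adj}
    ecc = {n: max(d.values()) for n, d in all_dist.items()}
    radius = min(ecc.values())
    centre = sorted(n for n, e in ecc.items() if e == radius)
    zones = {n: min(all_dist[c][n] for c in centre) for n in adj}
    return centre, zones

# Hypothetical toy network: H is the highest-degree protein.
adj = {
    "H": {"A", "B", "C"},
    "A": {"H", "E"},
    "B": {"H"},
    "C": {"H", "D"},
    "D": {"C"},
    "E": {"A"},
}
centre, zones = zones_from_centre(adj)
```

In this toy network the highest-degree node is also the centre, which mirrors the abstract's observation that hubs tend to sit centrally; on real PINs that relationship is the empirical finding, not something the code guarantees.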
Affiliation(s)
- Junaid Gamieldien
- South African National Bioinformatics Institute/ MRC Unit for Bioinformatics Capacity Development, University of the Western Cape, Bellville 7530, South Africa.
|
27
|
Fadhal E, Gamieldien J, Mwambene EC. Protein interaction networks as metric spaces: a novel perspective on distribution of hubs. BMC Syst Biol 2014; 8:6. [PMID: 24438364 PMCID: PMC3902029 DOI: 10.1186/1752-0509-8-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 02/18/2013] [Accepted: 01/07/2014] [Indexed: 02/02/2023]
Abstract
Background In the post-genomic era, a central and overarching question in the analysis of protein-protein interaction networks continues to be whether biological characteristics and functions of proteins, such as lethality, physiological malfunctions and malignancy, are intimately linked to the topological role proteins play in the network as a mathematical structure. One of the key features that has implicitly been presumed is the existence of hubs, highly connected proteins considered to play a crucial role in biological networks. We explore the structure of protein interaction networks of a number of organisms as metric spaces and show that hubs are non-randomly positioned and, from a distance point of view, centrally located. Results By analysing how the human functional protein interaction network, the human signalling network, and the Saccharomyces cerevisiae, Arabidopsis thaliana and Escherichia coli protein-protein interaction networks from various databases are distributed as metric spaces, we found that proteins interact radially through a central node, high-degree proteins coagulate in the centre of the network, and those far away from the centre have low degree. We further found that the distribution of proteins from the centre follows a hierarchy of importance and has biological significance. Conclusions We conclude that, structurally, protein interaction networks are mathematical entities that share properties between organisms but not necessarily with other networks that follow a power law.
We therefore conclude that (i) if there are hubs defined by degree, they are not distributed randomly; (ii) zones closest to the centre of the network are enriched for critically important proteins and are also functionally very specialised for specific 'housekeeping' functions; (iii) proteins closest to the network centre are functionally less dispensable and may present good targets for therapy development; and (iv) network biology requires its own network theory modelled on actual biological evidence, and simply adopting theories from the social sciences may be misleading.
Affiliation(s)
- Eric C Mwambene
- Department of Mathematics and Applied Mathematics, University of the Western Cape, P/Bag X17, Bellville, South Africa.
|
28
|
Tang X, Feng Q, Wang J, He Y, Pan Y. Clustering based on multiple biological information: approach for predicting protein complexes. IET Syst Biol 2013; 7:223-30. [PMID: 24067423 PMCID: PMC8687320 DOI: 10.1049/iet-syb.2012.0052] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 11/23/2012] [Revised: 03/23/2013] [Accepted: 04/15/2013] [Indexed: 04/05/2024]
Abstract
Protein complexes are a cornerstone of many biological processes. Protein-protein interaction (PPI) data enable a number of computational methods for predicting protein complexes. However, the incompleteness of the PPI data significantly lowers the accuracy of these methods. In the current work, the authors develop a novel method named clustering based on multiple biological information (CMBI) to discover protein complexes by integrating multiple biological resources, including gene expression profiles, essential protein information and PPI data. First, CMBI defines the functional similarity of each pair of interacting proteins based on the edge-clustering coefficient and the Pearson correlation coefficient. Second, CMBI selects essential proteins as seeds to build the protein complexes. A redundancy-filtering procedure is performed to eliminate redundant complexes. In addition to the essential proteins, CMBI also uses other proteins as seeds to expand protein complexes. To check the performance of CMBI, the authors compare the complexes discovered by CMBI with those found by other techniques by matching the predicted complexes against the reference complexes. The authors subsequently use GO::TermFinder to analyse the complexes predicted by the various methods. Finally, the effect of parameters T and R is investigated. The results from GO functional enrichment and matching analyses show that CMBI performs significantly better than the state-of-the-art methods.
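The two ingredients of CMBI's functional similarity can be sketched as follows. The abstract names the edge-clustering coefficient and the Pearson correlation coefficient but gives neither their exact form nor how they are combined, so this sketch makes assumptions: the edge-clustering coefficient is taken as the number of common neighbours of an edge's endpoints divided by min(deg(u) - 1, deg(v) - 1), the Pearson coefficient is computed over hypothetical expression profiles, and the two values are reported separately. All names and data are illustrative.

```python
from math import sqrt

def edge_clustering_coefficient(adj, u, v):
    """Assumed form: triangles through edge (u, v) divided by the maximum
    possible number, min(deg(u) - 1, deg(v) - 1). Returns 0.0 if that is 0."""
    common = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / denom if denom > 0 else 0.0

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

# Hypothetical toy PPI network and expression profiles.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
expr = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.0]}

ecc_ab = edge_clustering_coefficient(adj, "A", "B")
ecc_cd = edge_clustering_coefficient(adj, "C", "D")
pcc_ab = pearson(expr["A"], expr["B"])
```

Edge A-B lies in a triangle and its endpoints are co-expressed, so both measures are high; edge C-D lies in no triangle, so its topological score is zero regardless of expression.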
Affiliation(s)
- Xiwei Tang
- School of Information Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- School of Information Science and Engineering, Hunan First Normal University, Changsha 410205, People's Republic of China
- Qilong Feng
- School of Information Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Yiming He
- School of Information Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Yi Pan
- School of Information Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Department of Computer Science, Georgia State University, Atlanta, GA 30302-4110, USA
|
29
|
Wu M, Xie Z, Li X, Kwoh CK, Zheng J. Identifying protein complexes from heterogeneous biological data. Proteins 2013; 81:2023-33. [PMID: 23852772 DOI: 10.1002/prot.24365] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 01/03/2013] [Revised: 06/03/2013] [Accepted: 06/17/2013] [Indexed: 12/27/2022]
Abstract
With the increasing availability of diverse biological information for proteins, integration of heterogeneous data becomes more useful for many problems in proteomics, such as annotating protein functions and predicting novel protein-protein interactions. In this paper, we present an integrative approach called InteHC (Integrative Hierarchical Clustering) to identify protein complexes from multiple data sources. Although integrating multiple sources can effectively improve the coverage of the currently incomplete protein interactome (the false-negative issue), it can also introduce false-positive interactions that hurt the performance of protein complex prediction. Our proposed InteHC method addresses both issues to facilitate accurate protein complex prediction and can be summarized in three steps. First, for each individual source/feature, InteHC computes a matrix of affinity scores for protein pairs, indicating their propensity to interact or to belong to the same complex. Second, InteHC computes a final score matrix, which is the weighted sum of the affinity scores from the individual sources. In particular, the weights, which indicate the reliability of the individual sources, are learned from a supervised model (i.e., a linear ranking SVM). Finally, a hierarchical clustering algorithm is run on the final score matrix to generate clusters as predicted protein complexes. In our experiments, we compared the results obtained by our hierarchical clustering on each individual feature with those predicted by InteHC on the combined matrix. We observed that integration of heterogeneous data significantly benefits the identification of protein complexes. Moreover, a comprehensive comparison demonstrates that InteHC performs much better than 14 state-of-the-art approaches. All the experimental data and results can be downloaded from http://www.ntu.edu.sg/home/zhengjie/data/InteHC.
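The second and third steps described above (weighted sum of per-source affinity matrices, then hierarchical clustering) can be sketched in a few lines. This is an illustration under stated assumptions, not the InteHC implementation: the weights are fixed by hand rather than learned with a ranking SVM, average linkage with a similarity cut-off stands in for whatever linkage and stopping rule the paper uses, and the function names, weights, threshold and toy matrices are all hypothetical.

```python
from itertools import combinations

def combine_scores(matrices, weights):
    """Weighted sum of per-source affinity matrices (square lists of lists)."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights))
             for j in range(n)] for i in range(n)]

def average_linkage(score, threshold):
    """Agglomerative clustering on a similarity matrix: repeatedly merge the
    two clusters with the highest mean pairwise score, stopping once the
    best available score falls below `threshold` (assumed stopping rule)."""
    clusters = [[i] for i in range(len(score))]

    def mean_score(a, b):
        return sum(score[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), best = max(
            (((a, b), mean_score(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1])
        if best < threshold:
            break
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # safe: y > x for pairs from combinations()
    return sorted(clusters)

# Hypothetical toy sources for four proteins: binary PPI and co-expression.
ppi = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
coexpr = [[0, 0.8, 0.6, 0.1], [0.8, 0, 0.7, 0.2],
          [0.6, 0.7, 0, 0.1], [0.1, 0.2, 0.1, 0]]

final = combine_scores([ppi, coexpr], weights=[0.6, 0.4])  # hand-set weights
complexes = average_linkage(final, threshold=0.5)
```

With these toy inputs, proteins 0, 1 and 2 merge into one predicted complex and protein 3 stays a singleton, because its combined affinity to the others stays far below the cut-off.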
Affiliation(s)
- Min Wu
- School of Computer Engineering, Nanyang Technological University, Singapore
|
30
|
Zaki N, Efimov D, Berengueres J. Protein complex detection using interaction reliability assessment and weighted clustering coefficient. BMC Bioinformatics 2013; 14:163. [PMID: 23688127 PMCID: PMC3680028 DOI: 10.1186/1471-2105-14-163] [Citation(s) in RCA: 84] [Impact Index Per Article: 7.0] [Received: 12/27/2012] [Accepted: 05/09/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Predicting protein complexes from protein-protein interaction data is becoming a fundamental problem in computational biology. The identification and characterization of protein complexes are crucial to understanding molecular events under normal and abnormal physiological conditions. Large datasets of experimentally detected protein-protein interactions have been produced using high-throughput experimental techniques. However, such experimental data are liable to contain a large number of spurious interactions. Therefore, it is essential to validate these interactions before exploiting them to predict protein complexes. RESULTS In this paper, we propose a novel graph mining algorithm (PEWCC) to identify such protein complexes. The algorithm first assesses the reliability of the interaction data and then predicts protein complexes based on the concept of the weighted clustering coefficient. To demonstrate the effectiveness of the proposed method, the performance of PEWCC was compared to that of several other methods. PEWCC detected more matched complexes than any of the state-of-the-art methods, with higher quality scores. CONCLUSIONS The higher accuracy achieved by PEWCC in detecting protein complexes is a valid argument in favor of the proposed method. The datasets and programs are freely available at http://faculty.uaeu.ac.ae/nzaki/Research.htm.
Affiliation(s)
- Nazar Zaki
- Intelligent Systems, College of Information Technology, UAEU, Al Ain, UAE.
|