1. Li G, Luo X, Hu Z, Wu J, Peng W, Liu J, Zhu X. Essential proteins discovery based on dominance relationship and neighborhood similarity centrality. Health Inf Sci Syst 2023; 11:55. [PMID: 37981988 PMCID: PMC10654316 DOI: 10.1007/s13755-023-00252-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 10/13/2023] [Indexed: 11/21/2023] Open
Abstract
Essential proteins play a vital role in the development and reproduction of cells, and identifying them helps to explain the basic requirements for cell survival. Because biological experimental methods for discovering essential proteins are time-consuming, costly and inefficient, computational methods have gained increasing attention. Early computational methods identified essential proteins mainly through centrality measures on protein-protein interaction (PPI) networks, but the many false positives in PPI networks limit their identification rate. In this study, a purified PPI network is first introduced to reduce the impact of these false positives. Second, by analyzing the similarity relationship between a protein and its neighbors in the PPI network, a new centrality called neighborhood similarity centrality (NSC) is proposed. Third, a protein subcellular localization score and an ortholog score are calculated from subcellular localization and orthologous data, respectively. Fourth, an analysis of a large number of multi-feature fusion methods reveals a special relationship among features, called the dominance relationship, and a novel model based on this relationship is proposed. Finally, NSC, the subcellular localization score, and the ortholog score are fused by the dominance relationship model to give a new method called NSO. To verify the performance of NSO, it is compared with seven representative methods (ION, NCCO, E_POC, SON, JDC, PeC, WDC) on yeast datasets. The experimental results show that NSO has a higher identification rate than the other methods.
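The abstract does not give the NSC formula, so the following is only a minimal sketch of the general idea of scoring a protein by how similar it is to its neighbors. It assumes Jaccard similarity over closed neighborhoods as a stand-in for the paper's measure; the function name and the toy network are hypothetical.

```python
def neighborhood_similarity_centrality(adjacency):
    """Score each node by its summed Jaccard similarity with its neighbors.

    Assumption: Jaccard similarity over closed neighborhoods (a node plus
    its neighbors) stands in for the similarity measure used by NSC,
    which the abstract does not define.
    """
    scores = {}
    for node, neighbors in adjacency.items():
        closed_u = set(neighbors) | {node}
        total = 0.0
        for other in neighbors:
            closed_v = set(adjacency[other]) | {other}
            total += len(closed_u & closed_v) / len(closed_u | closed_v)
        scores[node] = total
    return scores


# Toy undirected network: a triangle A-B-C with a pendant node D attached to C.
toy_ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
nsc_scores = neighborhood_similarity_centrality(toy_ppi)
ranking = sorted(nsc_scores, key=nsc_scores.get, reverse=True)
```

In this toy network the hub-like node C ranks first and the pendant node D ranks last; a real pipeline would combine such a score with the localization and ortholog scores described above.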
Affiliation(s)
- Gaoshi Li
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Xinlong Luo
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Zhipeng Hu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Jingli Wu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
- Jiafei Liu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Xiaoshu Zhu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- School of Computer and Information Security & School of Software Engineering, Guilin University of Electronic Science and Technology, Guilin, China

2. Zhao H, Liu G, Cao X. A seed expansion-based method to identify essential proteins by integrating protein-protein interaction sub-networks and multiple biological characteristics. BMC Bioinformatics 2023; 24:452. [PMID: 38036960 PMCID: PMC10688502 DOI: 10.1186/s12859-023-05583-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Accepted: 11/24/2023] [Indexed: 12/02/2023] Open
Abstract
BACKGROUND The identification of essential proteins is of great significance in biology and pathology. However, protein-protein interaction (PPI) data obtained through high-throughput technology include a high number of false positives. To overcome this limitation, numerous computational algorithms based on biological characteristics and topological features have been proposed to identify essential proteins. RESULTS In this paper, we propose a novel method named SESN for identifying essential proteins. It is a seed expansion method based on PPI sub-networks and multiple biological characteristics. Firstly, SESN utilizes gene expression data to construct PPI sub-networks. Secondly, seed expansion is performed simultaneously in each sub-network, and the expansion process is based on the topological features of predicted essential proteins. Thirdly, the error correction mechanism is based on multiple biological characteristics and the entire PPI network. Finally, SESN analyzes the impact of each biological characteristic, including protein complex, gene expression data, GO annotations, and subcellular localization, and adopts the biological data with the best experimental results. The output of SESN is a set of predicted essential proteins. CONCLUSIONS The analysis of each component of SESN indicates the effectiveness of all components. We conduct comparison experiments using three datasets from two species, and the experimental results demonstrate that SESN achieves superior performance compared to other methods.
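The first SESN step, building PPI sub-networks from gene expression data, can be illustrated with a small sketch. The abstract does not state the rule SESN uses, so the code assumes a simple one: a protein counts as active at a time point when its expression value reaches a fixed threshold, and each sub-network keeps only interactions between active proteins. All names and values are hypothetical.

```python
def expression_subnetworks(edges, expression, threshold):
    """Build one sub-network (edge list) per expression time point.

    Assumption: a protein is 'active' at a time point when its expression
    value is at least `threshold`; SESN's actual criterion is not given
    in the abstract.
    """
    n_points = len(next(iter(expression.values())))
    subnetworks = []
    for t in range(n_points):
        active = {p for p, values in expression.items() if values[t] >= threshold}
        # Keep an interaction only if both partners are active at time t.
        kept = [(u, v) for u, v in edges if u in active and v in active]
        subnetworks.append(kept)
    return subnetworks


# Toy data: three proteins measured at two time points.
edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3")]
expression = {"P1": [0.9, 0.2], "P2": [0.8, 0.7], "P3": [0.1, 0.6]}
subnets = expression_subnetworks(edges, expression, threshold=0.5)
```

Seed expansion would then run inside each of these sub-networks rather than on the full, noisier network.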
Affiliation(s)
- He Zhao
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Guixia Liu
- College of Computer Science and Technology, Jilin University, Changchun, China.
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China.
- Xintian Cao
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China

3. Han Y, Liu M, Wang Z. Key protein identification by integrating protein complex information and multi-biological features. Mathematical Biosciences and Engineering: MBE 2023; 20:18191-18206. [PMID: 38052554 DOI: 10.3934/mbe.2023808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
Identifying key proteins based on protein-protein interaction networks has emerged as a prominent area of research in bioinformatics. However, current methods exhibit certain limitations, such as the omission of subcellular localization information and the disregard for the impact of topological structure noise on the reliability of key protein identification. Moreover, the influence of proteins outside a complex but interacting with proteins inside the complex on complex participation tends to be overlooked. Addressing these shortcomings, this paper presents a novel method for key protein identification that integrates protein complex information with multiple biological features. This approach offers a comprehensive evaluation of protein importance by considering subcellular localization centrality, topological centrality weighted by gene ontology (GO) similarity and complex participation centrality. Experimental results, including traditional statistical metrics, jackknife methodology metric and key protein overlap or difference, demonstrate that the proposed method not only achieves higher accuracy in identifying key proteins compared to nine classical methods but also exhibits robustness across diverse protein-protein interaction networks.
Affiliation(s)
- Yongyin Han
- School of Computer Science and Technology, China University of Mining and Technology, China
- Xuzhou College of Industrial Technology, China
- Maolin Liu
- School of Computer Science and Technology, China University of Mining and Technology, China
- Zhixiao Wang
- School of Computer Science and Technology, China University of Mining and Technology, China

4. Kim S, Lee J, Park J, Choi S, Bui DC, Kim JE, Shin J, Kim H, Choi GJ, Lee YW, Chang PS, Son H. Genetic and Transcriptional Regulatory Mechanisms of Lipase Activity in the Plant Pathogenic Fungus Fusarium graminearum. Microbiol Spectr 2023; 11:e0528522. [PMID: 37093014 PMCID: PMC10269793 DOI: 10.1128/spectrum.05285-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Accepted: 03/30/2023] [Indexed: 04/25/2023] Open
Abstract
Lipases, which catalyze the hydrolysis of long-chain triglycerides, diglycerides, and monoglycerides into free fatty acids and glycerol, participate in various biological pathways in fungi. In this study, we examined the biological functions and regulatory mechanisms of fungal lipases via two approaches. First, we performed a systemic functional characterization of 86 putative lipase-encoding genes in the plant-pathogenic fungus Fusarium graminearum. The phenotypes were assayed for vegetative growth, asexual and sexual reproduction, stress responses, pathogenicity, mycotoxin production, and lipase activity. Most mutants were normal in the assessed phenotypes, implying overlapping roles for lipases in F. graminearum. In particular, FgLip1 and Fgl1 were revealed as core extracellular lipases in F. graminearum. Second, we examined the lipase activity of previously constructed transcription factor (TF) mutants of F. graminearum and identified three TFs and one histone acetyltransferase that significantly affect lipase activity. The relative transcript levels of FgLIP1 and FGL1 were markedly reduced or enhanced in these TF mutants. Among them, Gzzc258 was identified as a key lipase regulator that is also involved in the induction of lipase activity during sexual reproduction. To our knowledge, this study is the first comprehensive functional analysis of fungal lipases and provides significant insights into the genetic and regulatory mechanisms underlying lipases in fungi.
IMPORTANCE Fusarium graminearum is an economically important plant-pathogenic fungus that causes Fusarium head blight (FHB) on wheat and barley. Here, we constructed a gene knockout mutant library of 86 putative lipase-encoding genes and established a comprehensive phenotypic database of the mutants. Among them, we found that FgLip1 and Fgl1 act as core extracellular lipases in this pathogen. Moreover, several putative transcription factors (TFs) that regulate the lipase activities in F. graminearum were identified. The disruption mutants of F. graminearum-lipase regulatory TFs all showed defects in sexual reproduction, which implies a strong relationship between sexual development and lipase activity in this fungus. These findings provide valuable insights into the genetic mechanisms regulating lipase activity as well as its importance to the developmental stages of this plant-pathogenic fungus.
Affiliation(s)
- Sieun Kim
- Department of Agricultural Biotechnology, Seoul National University, Seoul, Republic of Korea
- Juno Lee
- Department of Agricultural Biotechnology, Seoul National University, Seoul, Republic of Korea
- Jiyeun Park
- Department of Agricultural Biotechnology, Seoul National University, Seoul, Republic of Korea
- Soyoung Choi
- Department of Agricultural Biotechnology, Seoul National University, Seoul, Republic of Korea
- Duc-Cuong Bui
- Department of Pathology, University of Texas Medical Branch, Galveston, Texas, USA
- Jung-Eun Kim
- Research Institute of Climate Change and Agriculture, National Institute of Horticultural and Herbal Science, Jeju, Republic of Korea
- Jiyoung Shin
- Division of Bioresources Bank, Honam National Institute of Biological Resources, Mokpo, Republic of Korea
- Hun Kim
- Center for Eco-friendly New Materials, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea
- Gyung Ja Choi
- Center for Eco-friendly New Materials, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea
- Yin-Won Lee
- Department of Agricultural Biotechnology, Seoul National University, Seoul, Republic of Korea
- Pahn-Shick Chang
- Department of Agricultural Biotechnology, Seoul National University, Seoul, Republic of Korea
- Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- Center for Food and Bioconvergence, Seoul National University, Seoul, Republic of Korea
- Center for Agricultural Microorganism and Enzyme, Seoul National University, Seoul, Republic of Korea
- Hokyoung Son
- Department of Agricultural Biotechnology, Seoul National University, Seoul, Republic of Korea
- Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea

5. Pérez Rodríguez F, Valdés-Santiago L, Noé García-Chávez J, Luis Castro-Guillén J, Ruiz-Herrera J. Analysis of gene expression related to polyamine concentration and dimorphism induced in ornithine decarboxylase (odc) and spermidine synthase (spd) Ustilago maydis mutants. Fungal Genet Biol 2023; 166:103792. [PMID: 36996931 DOI: 10.1016/j.fgb.2023.103792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 03/20/2023] [Accepted: 03/24/2023] [Indexed: 03/30/2023]
Abstract
Polyamines are ubiquitous small organic cations, and their roles as regulators of several cellular processes are widely recognized. They are implicated in the key stages of the fungal life cycle. Ustilago maydis is a phytopathogenic fungus, the causal agent of common smut of maize, and a model system for understanding dimorphism and virulence. U. maydis grows in yeast form at pH 7 and can develop its mycelial form in vitro at pH 3. Δodc mutants, which are unable to synthesize polyamines, grew as yeast at pH 3 with a low putrescine concentration and required a high putrescine concentration to complete the dimorphic transition. Δspd mutants required spermidine to grow and could not form mycelium at pH 3. In this work, the increased expression of the mating genes mfa1 and mfa2 in Δodc mutants was related to high putrescine concentration. Global gene expression analysis comparisons of Δodc and Δspd U. maydis mutants indicated that 2,959 genes were differentially expressed in the presence of exogenous putrescine at pH 7 and 475 genes at pH 3, whereas in the Δspd mutant the expression of 1,426 genes was affected by exogenous spermine concentration at pH 7 and 11 genes at pH 3. Additionally, we identified 28 transcriptional modules with correlated expression during seven tested conditions: mutant genotype, morphology (yeast and mycelium), pH, and putrescine or spermidine concentration. Furthermore, significant differences in transcript levels were noted for genes in modules relating to pH and genotype genes involved in ribosome biogenesis, mitochondrial oxidative phosphorylation, N-glycan synthesis, and glycosylphosphatidylinositol (GPI)-anchor. In summary, our results offer a valuable tool for the identification of potential factors involved in phenomena related to polyamines and dimorphism.

6. Chen H, Cai Y, Ji C, Selvaraj G, Wei D, Wu H. AdaPPI: identification of novel protein functional modules via adaptive graph convolution networks in a protein-protein interaction network. Brief Bioinform 2023; 24:6918779. [PMID: 36526282 DOI: 10.1093/bib/bbac523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 10/10/2022] [Accepted: 11/02/2022] [Indexed: 12/23/2022] Open
Abstract
Identifying unknown protein functional modules, such as protein complexes and biological pathways, from protein-protein interaction (PPI) networks, provides biologists with an opportunity to efficiently understand cellular function and organization. Finding complex nonlinear relationships in underlying functional modules may involve a long-chain of PPI and pose great challenges in a PPI network with an unevenly sparse and dense node distribution. To overcome these challenges, we propose AdaPPI, an adaptive convolution graph network in PPI networks to predict protein functional modules. We first suggest an attributed graph node presentation algorithm. It can effectively integrate protein gene ontology attributes and network topology, and adaptively aggregates low- or high-order graph structural information according to the node distribution by considering graph node smoothness. Based on the obtained node representations, core cliques and expansion algorithms are applied to find functional modules in PPI networks. Comprehensive performance evaluations and case studies indicate that the framework significantly outperforms state-of-the-art methods. We also presented potential functional modules based on their confidence.

7. Wang X, Zhang Y, Zhou P, Liu X. A supervised protein complex prediction method with network representation learning and gene ontology knowledge. BMC Bioinformatics 2022; 23:300. [PMID: 35879648 PMCID: PMC9317086 DOI: 10.1186/s12859-022-04850-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/07/2022] [Accepted: 07/18/2022] [Indexed: 11/29/2022] Open
Abstract
Background: Protein complexes are essential for biologists to understand cell organization and function effectively. In recent years, predicting complexes from protein–protein interaction (PPI) networks through computational methods has become a research hotspot, and many methods for protein complex prediction have been proposed. However, how to use the information of known protein complexes remains a fundamental problem that urgently needs to be solved. Results: To solve this problem, we propose a supervised learning method based on network representation learning and gene ontology knowledge, which can fully use the information of known protein complexes to predict new protein complexes. This method first constructs a weighted PPI network based on gene ontology knowledge and topology information, reducing the network's noise. On this basis, the topological information of known protein complexes is extracted as features, and the supervised learning model SVCC is trained on these features. The SVCC model is also used to predict candidate protein complexes from the protein interaction network. Then, we use the network representation learning method to obtain the vector representation of each protein complex and train a random forest model. Finally, we use the random forest model to classify the candidate protein complexes to obtain the final predicted protein complexes. We evaluate the performance of the proposed method on two publicly available PPI data sets. Conclusions: Experimental results show that our method can effectively improve the performance of protein complex recognition compared with existing methods. In addition, we analyze the biological significance of protein complexes predicted by our method and other methods. The results show that the protein complexes predicted by our method have high biological significance.
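The first step above, weighting PPI edges by gene ontology knowledge and topology, can be sketched as follows. This is an illustration under stated assumptions, not the paper's weighting scheme: each edge weight is a blend of the Jaccard similarity of the two proteins' other interaction partners and the Jaccard similarity of their GO term sets, with a hypothetical mixing parameter `alpha`. The GO identifiers are toy placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weight_edges(edges, go_terms, alpha=0.5):
    """Weight each edge by blended topological and GO-term similarity.

    Assumption: the blend alpha * topology + (1 - alpha) * GO is a
    stand-in; the abstract does not specify the actual formula.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    weights = {}
    for u, v in edges:
        # Topological part: shared interaction partners, excluding the pair itself.
        topo = jaccard(neighbors[u] - {v}, neighbors[v] - {u})
        # Functional part: overlap of annotated GO terms.
        func = jaccard(go_terms.get(u, set()), go_terms.get(v, set()))
        weights[(u, v)] = alpha * topo + (1 - alpha) * func
    return weights


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
go_terms = {
    "A": {"GO:1", "GO:2"},
    "B": {"GO:1", "GO:2"},
    "C": {"GO:2"},
    "D": {"GO:3"},
}
weights = weight_edges(edges, go_terms)
```

Edges with no shared partners and no shared annotations, such as C-D here, receive weight 0 and are natural candidates for removal as noise.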
Affiliation(s)
- Xiaoxu Wang
- School of Information Science and Technology, Dalian Maritime University, Dalian, 116024, Liaoning, China
- Yijia Zhang
- School of Information Science and Technology, Dalian Maritime University, Dalian, 116024, Liaoning, China
- Peixuan Zhou
- School of Information Science and Technology, Dalian Maritime University, Dalian, 116024, Liaoning, China
- Xiaoxia Liu
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA.

8. Omranian S, Nikoloski Z, Grimm DG. Computational identification of protein complexes from network interactions: Present state, challenges, and the way forward. Comput Struct Biotechnol J 2022; 20:2699-2712. [PMID: 35685359 PMCID: PMC9166428 DOI: 10.1016/j.csbj.2022.05.049] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/21/2022] [Revised: 05/25/2022] [Accepted: 05/25/2022] [Indexed: 01/05/2023] Open

9. Wang R, Ma H, Wang C. An Ensemble Learning Framework for Detecting Protein Complexes From PPI Networks. Front Genet 2022; 13:839949. [PMID: 35281831 PMCID: PMC8908451 DOI: 10.3389/fgene.2022.839949] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/20/2021] [Accepted: 01/31/2022] [Indexed: 11/14/2022] Open
Abstract
Detecting protein complexes is one of the keys to understanding cellular organization and processes principles. With high-throughput experiments and computing science development, it has become possible to detect protein complexes by computational methods. However, most computational methods are based on either unsupervised learning or supervised learning. Unsupervised learning-based methods do not need training datasets, but they can only detect one or several topological protein complexes. Supervised learning-based methods can detect protein complexes with different topological structures. However, they are usually based on a type of training model, and the generalization of a single model is poor. Therefore, we propose an Ensemble Learning Framework for Detecting Protein Complexes (ELF-DPC) within protein-protein interaction (PPI) networks to address these challenges. The ELF-DPC first constructs the weighted PPI network by combining topological and biological information. Second, it mines protein complex cores using the protein complex core mining strategy we designed. Third, it obtains an ensemble learning model by integrating structural modularity and a trained voting regressor model. Finally, it extends the protein complex cores and forms protein complexes by a graph heuristic search strategy. The experimental results demonstrate that ELF-DPC performs better than the twelve state-of-the-art approaches. Moreover, functional enrichment analysis illustrated that ELF-DPC could detect biologically meaningful protein complexes. The code/dataset is available for free download from https://github.com/RongquanWang/ELF-DPC.
Affiliation(s)
- Rongquan Wang
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Huimin Ma
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- *Correspondence: Huimin Ma,
- Caixia Wang
- School of International Economics, China Foreign Affairs University, Beijing, China

10. Kong W, Wong BJH, Gao H, Guo T, Liu X, Du X, Wong L, Goh WWB. PROTREC: A probability-based approach for recovering missing proteins based on biological networks. J Proteomics 2022; 250:104392. [PMID: 34626823 DOI: 10.1016/j.jprot.2021.104392] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/13/2021] [Revised: 08/30/2021] [Accepted: 09/02/2021] [Indexed: 12/18/2022]
Abstract
A novel network-based approach for predicting missing proteins (MPs) is proposed here. This approach, PROTREC (short for PROtein RECovery), dominates existing network-based methods - such as Functional Class Scoring (FCS), Hypergeometric Enrichment (HE), and Gene Set Enrichment Analysis (GSEA) - across a variety of proteomics datasets derived from different proteomics data acquisition paradigms: Higher PROTREC scores are much more closely correlated with higher recovery rates of MPs across sample replicates. The PROTREC score, unlike methods reporting p-values, can be directly interpreted as the probability that an unreported protein in a proteomic screen is actually present in the sample being screened. SIGNIFICANCE: Mass spectrometry (MS) has developed rapidly in recent years; however, an obvious proportion of proteins is still undetected, leading to missing protein problems. A few existing protein recovery methods are based on biological networks, but the performance is not satisfactory. We propose a new protein recovery method, PROTREC, a Bayesian-inspired approach based on biological networks, which shows exceptional performance across multiple validation strategies. It does not rely on peptide information, so it avoids the ambiguity issue that most protein assembly methods face.
Affiliation(s)
- Weijia Kong
- School of Biological Sciences, Nanyang Technological University, Singapore; Department of Computer Science, National University of Singapore, Singapore
- Huanhuan Gao
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Zhejiang, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Zhejiang Province, China
- Tiannan Guo
- Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Zhejiang, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Zhejiang Province, China
- Xianming Liu
- Bruker (Beijing) Scientific Technology Co., Ltd, Shanghai, China
- Xiaoxian Du
- Bruker (Beijing) Scientific Technology Co., Ltd, Shanghai, China
- Limsoon Wong
- Department of Computer Science, National University of Singapore, Singapore
- Wilson Wen Bin Goh
- School of Biological Sciences, Nanyang Technological University, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore.

11. Wang R, Ma H, Wang C. An Improved Memetic Algorithm for Detecting Protein Complexes in Protein Interaction Networks. Front Genet 2022; 12:794354. [PMID: 34970305 PMCID: PMC8712950 DOI: 10.3389/fgene.2021.794354] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/13/2021] [Accepted: 11/22/2021] [Indexed: 11/13/2022] Open
Abstract
Identifying the protein complexes in protein-protein interaction (PPI) networks is essential for understanding cellular organization and biological processes. To address the high false positive/negative rates of PPI networks and detect protein complexes with multiple topological structures, we developed a novel improved memetic algorithm (IMA). IMA first combines the topological and biological properties to obtain a weighted PPI network with reduced noise. Next, it integrates various clustering results to construct the initial populations. Furthermore, a fitness function is designed based on the five topological properties of the protein complexes. Finally, we describe the rest of our IMA method, which primarily consists of four steps: selection operator, recombination operator, local optimization strategy, and updating the population operator. In particular, IMA is a combination of genetic algorithm and a local optimization strategy, which has a strong global search ability, and searches for local optimal solutions effectively. The experimental results demonstrate that IMA performs much better than the base methods and existing state-of-the-art techniques. The source code and datasets of the IMA can be found at https://github.com/RongquanWang/IMA.
Affiliation(s)
- Rongquan Wang
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Huimin Ma
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Caixia Wang
- School of International Economics, China Foreign Affairs University, Beijing, China

12. Palukuri MV, Marcotte EM. Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks. PLoS One 2022; 16:e0262056. [PMID: 34972161 PMCID: PMC8719692 DOI: 10.1371/journal.pone.0262056] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/13/2021] [Accepted: 12/15/2021] [Indexed: 12/12/2022] Open
Abstract
Characterization of protein complexes, i.e. sets of proteins assembling into a single larger physical entity, is important, as such assemblies play many essential roles in cells such as gene regulation. From networks of protein-protein interactions, potential protein complexes can be identified computationally through the application of community detection methods, which flag groups of entities interacting with each other in certain patterns. Most community detection algorithms tend to be unsupervised and assume that communities are dense network subgraphs, which is not always true, as protein complexes can exhibit diverse network topologies. The few existing supervised machine learning methods are serial and can potentially be improved in terms of accuracy and scalability by using better-suited machine learning models and parallel algorithms. Here, we present Super.Complex, a distributed, supervised AutoML-based pipeline for overlapping community detection in weighted networks. We also propose three new evaluation measures for the outstanding issue of comparing sets of learned and known communities satisfactorily. Super.Complex learns a community fitness function from known communities using an AutoML method and applies this fitness function to detect new communities. A heuristic local search algorithm finds maximally scoring communities, and a parallel implementation can be run on a computer cluster for scaling to large networks. On a yeast protein-interaction network, Super.Complex outperforms 6 other supervised and 4 unsupervised methods. Application of Super.Complex to a human protein-interaction network with ~8k nodes and ~60k edges yields 1,028 protein complexes, with 234 complexes linked to SARS-CoV-2, the COVID-19 virus, with 111 uncharacterized proteins present in 103 learned complexes. Super.Complex is generalizable with the ability to improve results by incorporating domain-specific features. 
Learned community characteristics can also be transferred from existing applications to detect communities in a new application with no known communities. Code and interactive visualizations of learned human protein complexes are freely available at: https://sites.google.com/view/supercomplex/super-complex-v3-0.
Affiliation(s)
- Meghana Venkata Palukuri
- Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, Texas, United States of America
| | - Edward M. Marcotte
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas, United States of America
| |
|
13
|
Zhu X, He X, Kuang L, Chen Z, Lancine C. A Novel Collaborative Filtering Model-Based Method for Identifying Essential Proteins. Front Genet 2021; 12:763153. [PMID: 34745230 PMCID: PMC8566338 DOI: 10.3389/fgene.2021.763153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 09/13/2021] [Indexed: 11/19/2022] Open
Abstract
Considering that traditional biological experiments are expensive and time consuming, it is important to develop effective computational models to infer potential essential proteins. In this manuscript, a novel collaborative filtering model-based method called CFMM was proposed, in which, an updated protein–domain interaction (PDI) network was constructed first by applying collaborative filtering algorithm on the original PDI network, and then, through integrating topological features of PDI networks with biological features of proteins, a calculative method was designed to infer potential essential proteins based on an improved PageRank algorithm. The novelties of CFMM lie in construction of an updated PDI network, application of the commodity-customer-based collaborative filtering algorithm, and introduction of the calculation method based on an improved PageRank algorithm, which ensured that CFMM can be applied to predict essential proteins without relying entirely on known protein–domain associations. Simulation results showed that CFMM can achieve reliable prediction accuracies of 92.16, 83.14, 71.37, 63.87, 55.84, and 52.43% in the top 1, 5, 10, 15, 20, and 25% predicted candidate key proteins based on the DIP database, which are remarkably higher than 14 competitive state-of-the-art predictive models as a whole, and in addition, CFMM can achieve satisfactory predictive performances based on different databases with various evaluation measurements, which further indicated that CFMM may be a useful tool for the identification of essential proteins in the future.
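The top-percentage accuracies quoted above can be read as precision among the highest-ranked candidates. The following is a minimal Python sketch of that kind of evaluation, assuming a dictionary of protein scores and a set of known essential proteins. The names `top_percent_precision`, `scores`, and `essential` are hypothetical, and the toy data are illustrative only, not taken from the paper.

```python
import math

def top_percent_precision(scores, essential, percent):
    """Fraction of the top `percent`% ranked proteins that are known essential.

    scores:    dict mapping protein id -> predicted score (higher = more essential)
    essential: set of protein ids known to be essential (hypothetical gold standard)
    percent:   size of the candidate list, as a percentage of all scored proteins
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, math.ceil(len(ranked) * percent / 100))  # at least one candidate
    top = ranked[:k]
    return sum(1 for p in top if p in essential) / k

# Toy example: four proteins, two of which are essential.
scores = {"P1": 0.9, "P2": 0.8, "P3": 0.4, "P4": 0.1}
essential = {"P1", "P3"}
print(top_percent_precision(scores, essential, 50))  # top 2 are P1, P2 -> 0.5
```

Ties in the scores are broken by insertion order here; a real evaluation would need to state its tie-breaking rule.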
Affiliation(s)
- Xianyou Zhu
- College of Computer Science and Technology, Hengyang Normal University, Hengyang, China; Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang, China
| | - Xin He
- College of Computer, Xiangtan University, Xiangtan, China
| | - Linai Kuang
- College of Computer, Xiangtan University, Xiangtan, China
| | - Zhiping Chen
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
| | - Camara Lancine
- The Social Sciences and Management University of Bamako, Bamako, Mali
| |
|
14
|
He Z, Chen W, Wei X, Liu Y. On the statistical significance of communities from weighted graphs. Sci Rep 2021; 11:20304. [PMID: 34645850 PMCID: PMC8514603 DOI: 10.1038/s41598-021-99175-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2021] [Accepted: 09/21/2021] [Indexed: 11/09/2022] Open
Abstract
Community detection is a fundamental procedure in the analysis of network data. Despite decades of research, there is still no consensus on the definition of a community. To analytically test the realness of a candidate community in weighted networks, we present a general formulation from a significance testing perspective. In this new formulation, the edge-weight is modeled as a censored observation due to the noisy characteristics of real networks. In particular, the edge-weights of missing links are incorporated as well, which are specified to be zeros based on the assumption that they are truncated or unobserved. Thereafter, the community significance assessment issue is formulated as a two-sample test problem on censored data. More precisely, the Logrank test is employed to conduct the significance testing on two sets of augmented edge-weights: internal weight set and external weight set. The presented approach is evaluated on both weighted networks and un-weighted networks. The experimental results show that our method can outperform prior widely used evaluation metrics on the task of individual community validation.
Affiliation(s)
- Zengyou He
- School of Software, Dalian University of Technology, Dalian, 116024, China; Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian, 116024, China
| | - Wenfang Chen
- School of Software, Dalian University of Technology, Dalian, 116024, China
| | - Xiaoqi Wei
- School of Software, Dalian University of Technology, Dalian, 116024, China
| | - Yan Liu
- School of Software, Dalian University of Technology, Dalian, 116024, China
| |
|
15
|
Omranian S, Angeleska A, Nikoloski Z. Efficient and accurate identification of protein complexes from protein-protein interaction networks based on the clustering coefficient. Comput Struct Biotechnol J 2021; 19:5255-5263. [PMID: 34630943 PMCID: PMC8479235 DOI: 10.1016/j.csbj.2021.09.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/14/2021] [Revised: 09/13/2021] [Accepted: 09/13/2021] [Indexed: 12/23/2022] Open
Abstract
Provided a family of efficient network algorithms for protein complex identification. The parameter-free family outperforms existing approaches on different networks. It exactly recovered ~35% of protein complexes in a pan-plant PPI network. We examined the effect of network perturbations on predicted protein complexes.
Identification of protein complexes from protein-protein interaction (PPI) networks is a key problem in PPI mining, solved by parameter-dependent approaches that suffer from small recall rates. Here we introduce GCC-v, a family of efficient, parameter-free algorithms to accurately predict protein complexes using the (weighted) clustering coefficient of proteins in PPI networks. Through comparative analyses with gold standards and PPI networks from Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens, we demonstrate that GCC-v outperforms twelve state-of-the-art approaches for identification of protein complexes with respect to twelve performance measures in at least 85.71% of scenarios. We also show that GCC-v results in the exact recovery of ∼35% of protein complexes in a pan-plant PPI network and discover 144 new protein complexes in Arabidopsis thaliana, with high support from GO semantic similarity. Our results indicate that findings from GCC-v are robust to network perturbations, which has direct implications to assess the impact of the PPI network quality on the predicted protein complexes.
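For orientation, the quantity this abstract builds on can be sketched in a few lines of Python. This is only the standard unweighted local clustering coefficient of a node, not the GCC-v algorithm itself, whose details are not given in the abstract. The adjacency-dictionary representation and the name `clustering_coefficient` are assumptions made for illustration.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Local clustering coefficient: fraction of neighbor pairs that are linked.

    adj: dict mapping each node to the set of its neighbors (undirected graph).
    """
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: undefined cases are reported as 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

# Toy network: triangle A-B-C plus a pendant protein D attached to A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(clustering_coefficient(adj, "B"))  # both neighbors of B are linked -> 1.0
```

In the toy network, protein A has three neighbors of which only one pair (B, C) interacts, so its coefficient is 1/3.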
Affiliation(s)
- Sara Omranian
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany; Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam, Germany
| | | | - Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476 Potsdam, Germany; Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam, Germany
| |
|
16
|
Palukuri MV, Marcotte EM. Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks. bioRxiv: the preprint server for biology 2021. [PMID: 34189530 PMCID: PMC8240683 DOI: 10.1101/2021.06.22.449395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Characterization of protein complexes, i.e. sets of proteins assembling into a single larger physical entity, is important, as such assemblies play many essential roles in cells such as gene regulation. From networks of protein-protein interactions, potential protein complexes can be identified computationally through the application of community detection methods, which flag groups of entities interacting with each other in certain patterns. Most community detection algorithms tend to be unsupervised and assume that communities are dense network subgraphs, which is not always true, as protein complexes can exhibit diverse network topologies. The few existing supervised machine learning methods are serial and can potentially be improved in terms of accuracy and scalability by using better-suited machine learning models and parallel algorithms. Here, we present Super.Complex, a distributed, supervised AutoML-based pipeline for overlapping community detection in weighted networks. We also propose three new evaluation measures for the outstanding issue of comparing sets of learned and known communities satisfactorily. Super.Complex learns a community fitness function from known communities using an AutoML method and applies this fitness function to detect new communities. A heuristic local search algorithm finds maximally scoring communities, and a parallel implementation can be run on a computer cluster for scaling to large networks. On a yeast protein-interaction network, Super.Complex outperforms 6 other supervised and 4 unsupervised methods. Application of Super.Complex to a human protein-interaction network with ~8k nodes and ~60k edges yields 1,028 protein complexes, with 234 complexes linked to SARS-CoV-2, the COVID-19 virus, with 111 uncharacterized proteins present in 103 learned complexes. Super.Complex is generalizable with the ability to improve results by incorporating domain-specific features. 
Learned community characteristics can also be transferred from existing applications to detect communities in a new application with no known communities. Code and interactive visualizations of learned human protein complexes are freely available at: https://sites.google.com/view/supercomplex/super-complex-v3-0.
|
17
|
Yao H, Guan J, Liu T. Denoising Protein-Protein interaction network via variational graph auto-encoder for protein complex detection. J Bioinform Comput Biol 2021; 18:2040010. [PMID: 32698725 DOI: 10.1142/s0219720020400107] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/16/2022]
Abstract
Identifying protein complexes is an important issue in computational biology, as it benefits the understanding of cellular functions and the design of drugs. In the past decades, many computational methods have been proposed by mining dense subgraphs in Protein-Protein Interaction Networks (PINs). However, the high rate of false positive/negative interactions in PINs prevents accurately detecting complexes directly from the raw PINs. In this paper, we propose a denoising approach for protein complex detection by using variational graph auto-encoder. First, we embed a PIN to vector space by a stacked graph convolutional network (GCN), then decide which interactions in the PIN are credible. If the probability of an interaction being credible is less than a threshold, we delete the interaction. In such a way, we reconstruct a reliable PIN. Following that, we detect protein complexes in the reconstructed PIN by using several typical detection methods, including CPM, Coach, DPClus, GraphEntropy, IPCA and MCODE, and compare the results with those obtained directly from the original PIN. We conduct the empirical evaluation on four yeast PPI datasets (Gavin, Krogan, DIP and Wiphi) and two human PPI datasets (Reactome and Reactomekb), against two yeast complex benchmarks (CYC2008 and MIPS) and three human complex benchmarks (REACT, REACT_uniprotkb and CORE_COMPLEX_human), respectively. Experimental results show that with the reconstructed PINs obtained by our denoising approach, complex detection performance can get obviously boosted, in most cases by over 5%, sometimes even by 200%. Furthermore, we compare our approach with two existing denoising methods (RWS and RedNemo) while varying different matching rates on separate complex distributions. Our results show that in most cases (over 2/3), the proposed approach outperforms the existing methods.
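The pruning step described above (delete an interaction when its credibility probability falls below a threshold) can be sketched as follows. This is a minimal illustration, not the authors' code: the probabilities would come from the trained auto-encoder, whereas the values below are made up, and the function name `denoise_edges` and the default threshold of 0.5 are assumptions.

```python
def denoise_edges(edge_prob, threshold=0.5):
    """Keep only interactions whose credibility probability reaches the threshold.

    edge_prob: dict mapping (protein_a, protein_b) -> probability that the
               interaction is credible (hypothetical model output).
    threshold: interactions with probability below this value are deleted.
    """
    return {edge for edge, p in edge_prob.items() if p >= threshold}

# Made-up probabilities for three interactions.
edge_prob = {
    ("P1", "P2"): 0.92,
    ("P2", "P3"): 0.31,
    ("P1", "P3"): 0.50,
}
reliable = denoise_edges(edge_prob, threshold=0.5)
print(sorted(reliable))  # [('P1', 'P2'), ('P1', 'P3')]
```

Any complex detection method can then be run on the retained edge set instead of the raw network.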
Affiliation(s)
- Heng Yao
- Department of Computer Science and Technology, Tongji University, 4800 Cao'an Road, Shanghai 201804, P. R. China; Key Laboratory of Embedded System and Service Computing (Tongji University), Ministry of Education, Shanghai, P. R. China; Shanghai Electronic Transactions and Information Service, Collaborative Innovation Center, Shanghai, P. R. China
| | - Jihong Guan
- Department of Computer Science and Technology, Tongji University, 4800 Cao'an Road, Shanghai 201804, P. R. China; Key Laboratory of Embedded System and Service Computing (Tongji University), Ministry of Education, Shanghai, P. R. China; Shanghai Electronic Transactions and Information Service, Collaborative Innovation Center, Shanghai, P. R. China
| | - Tianying Liu
- Department of Computer Science and Technology, Tongji University, 4800 Cao'an Road, Shanghai 201804, P. R. China; Key Laboratory of Embedded System and Service Computing (Tongji University), Ministry of Education, Shanghai, P. R. China; Shanghai Electronic Transactions and Information Service, Collaborative Innovation Center, Shanghai, P. R. China
| |
|
18
|
Zhu J, Zheng Z, Yang M, Fung GPC, Huang C. Protein Complexes Detection Based on Semi-Supervised Network Embedding Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:797-803. [PMID: 31581089 DOI: 10.1109/tcbb.2019.2944809] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]
Abstract
A protein complex is a group of associated polypeptide chains that plays essential roles in biological processes. Given a graph representing a protein-protein interaction (PPI) network, it is critical but non-trivial to detect protein complexes, the subsets of proteins that are tightly coupled, from it. Network embedding is a technique to learn low-dimensional representations of vertices in networks. It has proved quite useful for community detection in social networks in recent years. However, unlike social networks, PPI networks do not contain rich metadata, so existing network embedding methods cannot capture the structure of PPI networks well enough to improve protein complex detection significantly. We propose a semi-supervised network embedding model by adopting graph convolutional networks to detect densely connected subgraphs effectively. We compare the performance of our model with state-of-the-art approaches on three popular PPI networks with various data sizes and densities. The experimental results show that our approach significantly outperforms other approaches on all three PPI networks.
|
19
|
Omranian S, Angeleska A, Nikoloski Z. PC2P: Parameter-free network-based prediction of protein complexes. Bioinformatics 2021; 37:73-81. [PMID: 33416831 PMCID: PMC8034538 DOI: 10.1093/bioinformatics/btaa1089] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/06/2020] [Revised: 12/17/2020] [Accepted: 12/30/2020] [Indexed: 11/12/2022] Open
Abstract
Motivation: Prediction of protein complexes from protein–protein interaction (PPI) networks is an important problem in systems biology, as they control different cellular functions. The existing solutions employ algorithms for network community detection that identify dense subgraphs in PPI networks. However, gold standards in yeast and human indicate that protein complexes can also induce sparse subgraphs, introducing further challenges in protein complex prediction.
Results: To address this issue, we formalize protein complexes as biclique spanned subgraphs, which include both sparse and dense subgraphs. We then cast the problem of protein complex prediction as a network partitioning into biclique spanned subgraphs with removal of minimum number of edges, called coherent partition. Since finding a coherent partition is a computationally intractable problem, we devise a parameter-free greedy approximation algorithm, termed Protein Complexes from Coherent Partition (PC2P), based on key properties of biclique spanned subgraphs. Through comparison with nine contenders, we demonstrate that PC2P: (i) successfully identifies modular structure in networks, as a prerequisite for protein complex prediction, (ii) outperforms the existing solutions with respect to a composite score of five performance measures on 75% and 100% of the analyzed PPI networks and gold standards in yeast and human, respectively, and (iii,iv) does not compromise GO semantic similarity and enrichment score of the predicted protein complexes. Therefore, our study demonstrates that clustering of networks in terms of biclique spanned subgraphs is a promising framework for detection of complexes in PPI networks.
Availability and implementation: https://github.com/SaraOmranian/PC2P.
Supplementary information: Supplementary data are available at Bioinformatics online.
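To make the notion of a biclique spanned subgraph concrete, here is a small Python check. It reads "biclique spanned" as "the vertices can be split into two non-empty groups with every cross-group edge present", which is an assumption based on the term rather than a definition quoted from the paper. Under that reading, a graph is biclique spanned exactly when its complement is disconnected, and the sketch tests that condition. It is not the PC2P algorithm, and the name `is_biclique_spanned` is hypothetical.

```python
def is_biclique_spanned(nodes, edges):
    """True if the graph contains a spanning complete bipartite subgraph.

    Equivalent test used here: the complement graph is disconnected.
    nodes: iterable of vertex ids; edges: iterable of (u, v) pairs (undirected).
    """
    nodes = list(nodes)
    if len(nodes) < 2:
        return False  # two non-empty sides are required
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    all_nodes = set(nodes)
    # Depth-first search in the complement graph, starting from any vertex.
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for w in all_nodes - adj[u] - seen:  # non-neighbors not yet visited
            seen.add(w)
            stack.append(w)
    return len(seen) < len(nodes)

# A star is sparse but biclique spanned; a path on four vertices is not.
star = is_biclique_spanned("cabd", [("c", "a"), ("c", "b"), ("c", "d")])
path = is_biclique_spanned("abcd", [("a", "b"), ("b", "c"), ("c", "d")])
print(star, path)  # True False
```

The star example illustrates the point made in the abstract: a sparse subgraph can still satisfy the structural condition, whereas density-based methods would overlook it.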
Affiliation(s)
- Sara Omranian
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476, Potsdam, Germany; Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, 14476, Potsdam, Germany
| | | | - Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476, Potsdam, Germany; Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, 14476, Potsdam, Germany; Centre of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
| |
|
20
|
Zeng M, Li M, Fei Z, Wu FX, Li Y, Pan Y, Wang J. A Deep Learning Framework for Identifying Essential Proteins by Integrating Multiple Types of Biological Information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:296-305. [PMID: 30736002 DOI: 10.1109/tcbb.2019.2897679] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 06/09/2023]
Abstract
Computational methods including centrality and machine learning-based methods have been proposed to identify essential proteins for understanding the minimum requirements of the survival and evolution of a cell. In centrality methods, researchers are required to design a score function which is based on prior knowledge, yet is usually not sufficient to capture the complexity of biological information. In machine learning-based methods, some selected biological features cannot represent the complete properties of biological information as they lack a computational framework to automatically select features. To tackle these problems, we propose a deep learning framework to automatically learn biological features without prior knowledge. We use node2vec technique to automatically learn a richer representation of protein-protein interaction (PPI) network topologies than a score function. Bidirectional long short term memory cells are applied to capture non-local relationships in gene expression data. For subcellular localization information, we exploit a high dimensional indicator vector to characterize their feature. To evaluate the performance of our method, we tested it on PPI network of S. cerevisiae. Our experimental results demonstrate that the performance of our method is better than traditional centrality methods and is superior to existing machine learning-based methods. To explore which of the three types of biological information is the most vital element, we conduct an ablation study by removing each component in turn. Our results show that the PPI network embedding contributes most to the improvement. In addition, gene expression profiles and subcellular localization information are also helpful to improve the performance in identification of essential proteins.
|
21
|
Patra S, Mohapatra A. Protein complex prediction in interaction network based on network motif. Comput Biol Chem 2020; 89:107399. [PMID: 33152665 DOI: 10.1016/j.compbiolchem.2020.107399] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/15/2020] [Revised: 08/07/2020] [Accepted: 10/01/2020] [Indexed: 11/28/2022]
Abstract
The enormous size of Protein-Protein Interaction (PPI) networks demands efficient computational methods to extract biologically significant protein complexes. A wide variety of algorithms have been proposed to predict protein complexes from PPI networks. However, it is still a challenging task to detect protein complexes with high accuracy and manageable sensitivity. In this manuscript, a novel complex prediction algorithm based on Network Motif (CPNM) is proposed. This algorithm addresses the role of proteins in the embeddings of network motifs. These roles are used to define feature vectors and feature weights of proteins. Based on these features, a neighborhood search technique predicts protein complexes, considering both the inherent organization of proteins and the dense regions in PPI networks. The performance of the proposed algorithm is evaluated using various evaluation metrics such as Precision, Recall, F-measure, Sensitivity, PPV, and Accuracy. The results indicate that the proposed algorithm outperforms most of the competing algorithms, such as MCODE, DPClus, RNSC, COACH, ClusterONE, CMC and PROCODE, on the PPI networks of Saccharomyces cerevisiae and Homo sapiens.
Affiliation(s)
- Sabyasachi Patra
- Bioinformatics Lab, Department of Computer Science, IIIT, Bhubaneswar, India.
| | - Anjali Mohapatra
- Bioinformatics Lab, Department of Computer Science, IIIT, Bhubaneswar, India.
| |
|
22
|
He Z, Zhao C, Liang H, Xu B, Zou Q. Protein Complexes Identification with Family-Wise Error Rate Control. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:2062-2073. [PMID: 31027047 DOI: 10.1109/tcbb.2019.2912602] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/09/2023]
Abstract
The detection of protein complexes from protein-protein interaction network is a fundamental issue in bioinformatics and systems biology. To solve this problem, numerous methods have been proposed from different angles in the past decades. However, the study on detecting statistically significant protein complexes still has not received much attention. Although there are a few methods available in the literature for identifying statistically significant protein complexes, none of these methods can provide a more strict control on the error rate of a protein complex in terms of family-wise error rate (FWER). In this paper, we propose a new detection method SSF that is capable of controlling the FWER of each reported protein complex. More precisely, we first present a p-value calculation method based on Fisher's exact test to quantify the association between each protein and a given candidate protein complex. Consequently, we describe the key modules of the SSF algorithm: a seed expansion procedure for significant protein complexes search and a set cover strategy for redundancy elimination. The experimental results on five benchmark data sets show that: (1) our method can achieve the highest precision; (2) it outperforms three competing methods in terms of normalized mutual information (NMI) and F1 score in most cases.
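A one-sided Fisher's exact test reduces to a hypergeometric tail probability, which the sketch below computes in plain Python. The abstract does not give the contingency table used by SSF, so the setup here is an assumption: out of `N` other proteins in the network, `K` belong to the candidate complex, the protein under test has `n` neighbors, and `k` of those neighbors lie inside the complex. The function name `hypergeom_tail` is hypothetical.

```python
from math import comb  # requires Python 3.8+

def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked, n drawn).

    Under the assumed setup this is the one-sided Fisher's exact p-value for
    a protein having at least k of its n neighbors inside a complex of size K.
    """
    total = comb(N, n)
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total

# Toy numbers: 10 other proteins, 4 in the complex, 3 neighbors, all 3 inside.
p_value = hypergeom_tail(N=10, K=4, n=3, k=3)
print(round(p_value, 4))  # 4/120 -> 0.0333
```

A small p-value indicates that the protein's neighbors are concentrated in the candidate complex more than expected by chance. Controlling the family-wise error rate across many such tests requires an additional correction that this sketch does not include.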
|
23
|
Ma CY, Liao CS. A review of protein-protein interaction network alignment: From pathway comparison to global alignment. Comput Struct Biotechnol J 2020; 18:2647-2656. [PMID: 33033584 PMCID: PMC7533294 DOI: 10.1016/j.csbj.2020.09.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/21/2020] [Revised: 09/01/2020] [Accepted: 09/05/2020] [Indexed: 12/13/2022] Open
Abstract
Network alignment provides a comprehensive way to discover the similar parts between molecular systems of different species based on topological and biological similarity. With such a strong basis, one can do comparative studies at a systems level in the field of computational biology. In this survey paper, we focus on protein-protein interaction networks and review some representative algorithms for network alignment in the past two decades as well as the state-of-the-art aligners. We also introduce the most popular evaluation measures in the literature to benchmark the performance of these approaches. Finally, we address several future challenges and the possible ways to conquer the existing problems of biological network alignment.
Affiliation(s)
- Cheng-Yu Ma
- Chang Gung Memorial Hospital, No. 5, Fu-Hsing St., Kuei Shan Dist., Taoyuan City 33305, Taiwan, ROC
| | - Chung-Shou Liao
- National Tsing Hua University, No. 101, Section 2, Kuang-Fu Rd., Hsinchu City 30013, Taiwan, ROC
| |
|
24
|
Moi D, Kilchoer L, Aguilar PS, Dessimoz C. Scalable phylogenetic profiling using MinHash uncovers likely eukaryotic sexual reproduction genes. PLoS Comput Biol 2020; 16:e1007553. [PMID: 32697802 PMCID: PMC7423146 DOI: 10.1371/journal.pcbi.1007553] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/20/2019] [Revised: 08/12/2020] [Accepted: 05/18/2020] [Indexed: 01/09/2023] Open
Abstract
Phylogenetic profiling is a computational method to predict genes involved in the same biological process by identifying protein families which tend to be jointly lost or retained across the tree of life. Phylogenetic profiling has customarily been more widely used with prokaryotes than eukaryotes, because the method is thought to require many diverse genomes. There are now many eukaryotic genomes available, but these are considerably larger, and typical phylogenetic profiling methods require at least quadratic time as a function of the number of genes. We introduce a fast, scalable phylogenetic profiling approach entitled HogProf, which leverages hierarchical orthologous groups for the construction of large profiles and locality-sensitive hashing for efficient retrieval of similar profiles. We show that the approach outperforms Enhanced Phylogenetic Tree, a phylogeny-based method, and use the tool to reconstruct networks and query for interactors of the kinetochore complex as well as conserved proteins involved in sexual reproduction: Hap2, Spo11 and Gex1. HogProf enables large-scale phylogenetic profiling across the three domains of life, and will be useful to predict biological pathways among the hundreds of thousands of eukaryotic species that will become available in the coming few years. HogProf is available at https://github.com/DessimozLab/HogProf.
Genes that are involved in the same biological process tend to co-evolve. This property is exploited by the technique of phylogenetic profiling, which identifies co-evolving (and therefore likely functionally related) genes through patterns of correlated gene retention and loss in evolution and across species. However, conventional methods for computing and clustering these correlated genes do not scale with increasing numbers of genomes. HogProf is a novel phylogenetic profiling tool built on probabilistic data structures. It allows the user to construct searchable databases containing the evolutionary history of hundreds of thousands of protein families. Such fast detection of coevolution takes advantage of the rapidly increasing amount of genomic data publicly available, and can uncover unknown biological networks and guide in-vivo research and experimentation. We have applied our tool to describe the biological networks underpinning sexual reproduction in eukaryotes.
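The idea behind comparing profiles with MinHash can be sketched in plain Python. This is a generic, unweighted MinHash over sets of taxa in which a protein family is present, written for illustration only. It is not HogProf's implementation, and the function names, the choice of `blake2b` as the hash, and the toy profiles are all assumptions.

```python
import hashlib

def minhash_signature(items, num_perm=128):
    """MinHash signature of a non-empty set of strings.

    Each of the num_perm seeded hash functions contributes the minimum
    hash value it assigns to any item in the set.
    """
    signature = []
    for seed in range(num_perm):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{x}".encode(), digest_size=8).digest(),
                "big",
            )
            for x in items
        ))
    return signature

def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)

# Toy presence profiles: the taxa in which each hypothetical family is retained.
family_a = {"human", "mouse", "yeast", "fly"}
family_b = {"human", "mouse", "yeast", "fly"}
family_c = {"ecoli", "bsubtilis"}

sig_a, sig_b, sig_c = (minhash_signature(f) for f in (family_a, family_b, family_c))
print(estimate_jaccard(sig_a, sig_b))  # identical profiles -> 1.0
```

Because signatures have a fixed length regardless of profile size, they can be indexed with locality-sensitive hashing so that similar profiles are retrieved without comparing every pair, which is the scaling property the abstract emphasizes.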
Affiliation(s)
- David Moi
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Laurent Kilchoer
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Pablo S. Aguilar
- Instituto de Investigaciones Biotecnologicas (IIBIO), Universidad Nacional de San Martín, Buenos Aires, Argentina
- Instituto de Fisiología, Biología Molecular y Neurociencias (IFIBYNE-CONICET), Buenos Aires, Argentina
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Genetics, Evolution, and Environment, University College London, London, United Kingdom
- Department of Computer Science, University College London, London, United Kingdom
| |
|
25
|
SabziNezhad A, Jalili S. DPCT: A Dynamic Method for Detecting Protein Complexes From TAP-Aware Weighted PPI Network. Front Genet 2020; 11:567. [PMID: 32676097 PMCID: PMC7333736 DOI: 10.3389/fgene.2020.00567] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/25/2020] [Accepted: 05/11/2020] [Indexed: 12/13/2022] Open
Abstract
Detecting protein complexes from the protein-protein interaction (PPI) network is the essence of discovering the rules of the cellular world. There is a large amount of PPI data available, generated from high throughput experimental data. The enormous size of the data persuaded us to use computational methods instead of experimental methods to detect protein complexes. In past years, many researchers have presented algorithms to detect protein complexes. Most of these algorithms use static PPI networks. Recent research has shown that cellular systems are dynamic, so the PPI network is not static over time. In this paper, we introduce DPCT to detect protein complexes from dynamic PPI networks. In the proposed method, TAP and GO data are used to make a weighted PPI network and to reduce the noise of the PPI network. Gene expression data are also used to make dynamic subnetworks from the PPI network. A memetic algorithm is used to bicluster gene expression data and to create a dynamic subnetwork for each bicluster. Experimental results show that DPCT can detect protein complexes with better correctness than state-of-the-art detection algorithms. The source code and datasets of DPCT used can be found at https://github.com/alisn72/DPCT.
Affiliation(s)
- Ali SabziNezhad
- Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
- Saeed Jalili
- Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
26
Yao H, Shi Y, Guan J, Zhou S. Accurately Detecting Protein Complexes by Graph Embedding and Combining Functions with Interactions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:777-787. [PMID: 30736004 DOI: 10.1109/tcbb.2019.2897769] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/09/2023]
Abstract
Identifying protein complexes is helpful for understanding cellular functions and designing drugs. In the last decades, many computational methods have been proposed based on detecting dense subgraphs or subnetworks in Protein-Protein Interaction Networks (PINs). However, the high rate of false positive/negative interactions in PINs prevents satisfactory detection results from being obtained directly from PINs, because most existing methods rely mainly on topological information to partition the network. In this paper, we propose a new approach for protein complex detection that merges topological information of PINs with functional information of proteins. We first split proteins into a number of protein groups from the perspective of protein functions by using FunCat data. Then, for each of the resulting protein groups, we calculate two protein-protein similarity matrices, one computed by graph embedding over a PIN and the other by using GO terms, and combine these two matrices to get an integrated similarity matrix. Following that, we cluster the proteins in each group based on the corresponding integrated similarity matrix, and obtain a number of small protein clusters. We map these clusters of proteins onto the PIN, and get a number of connected subgraphs. After a round of merging of overlapping subgraphs, we finally obtain the detected complexes. We conduct empirical evaluation on four PPI datasets (Collins, Gavin, Krogan, and Wiphi) with two complex benchmarks (CYC2008 and MIPS). Experimental results show that our method performs better than the state-of-the-art methods.
27
Martha-Paz AM, Eide D, Mendoza-Cózatl D, Castro-Guerrero NA, Aréchiga-Carvajal ET. Zinc uptake in the Basidiomycota: Characterization of zinc transporters in Ustilago maydis. Mol Membr Biol 2019; 35:39-50. [PMID: 31617434 PMCID: PMC6816022 DOI: 10.1080/09687688.2019.1667034] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/28/2019] [Revised: 08/12/2019] [Accepted: 09/03/2019] [Indexed: 10/25/2022]
Abstract
At present, the planet faces a change in the composition and bioavailability of nutrients. Zinc deficiency is a widespread problem throughout the world. It is imperative to understand the mechanisms that organisms use to adapt to the deficiency of this micronutrient. In the Ascomycetes fungi, the ZIP family of proteins is one of the most important for zinc transport and includes the high zinc affinity Zrt1p and low zinc affinity Zrt2p transporters. After identification and characterization of ZRT1/ZRT2-like genes in Ustilago maydis, we conclude that they encode high and low zinc affinity transporters, with no apparent iron transport activity. These conclusions were supported by gene deletion in Ustilago and by the functional characterization of ZRT1/ZRT2-like genes, measuring the intracellular zinc content over a range of zinc availability. The functional complementation of the S. cerevisiae ZRT1Δ ZRT2Δ mutant with U. maydis genes supports this as well. The U. maydis ZRT2 gene was found to be regulated by pH through the Rim101 pathway, thus providing novel insights into how this Basidiomycota fungus can adapt to different levels of Zn availability.
Affiliation(s)
- Adriana M. Martha-Paz
- Universidad Autónoma de Nuevo León, UANL, Facultad Ciencias Biológicas, LMYF, Unidad de Manipulación Genética, Unidad C. Av. Universidad S/N Cd. Universitaria, San Nicolás de los Garza, Nuevo León. México. C.P. 66451
- David Eide
- University of Wisconsin-Madison. Department of Nutritional Sciences. 1415 Linden Drive, Madison WI, USA 53706
- David Mendoza-Cózatl
- Division of Plant Sciences, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA 65211
- Norma A. Castro-Guerrero
- Division of Plant Sciences, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA 65211
- Elva T. Aréchiga-Carvajal
- Universidad Autónoma de Nuevo León, UANL, Facultad Ciencias Biológicas, LMYF, Unidad de Manipulación Genética, Unidad C. Av. Universidad S/N Cd. Universitaria, San Nicolás de los Garza, Nuevo León. México. C.P. 66451
28
A Computational Framework for Predicting Direct Contacts and Substructures within Protein Complexes. Biomolecules 2019; 9:biom9110656. [PMID: 31717703 PMCID: PMC6921016 DOI: 10.3390/biom9110656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2019] [Revised: 10/20/2019] [Accepted: 10/23/2019] [Indexed: 11/17/2022] Open
Abstract
Understanding the physical arrangement of subunits within protein complexes potentially provides valuable clues about how the subunits work together and how the complexes function. The majority of recent research focuses on identifying protein complexes as a whole and seldom studies the inner structures within complexes. In this study, we propose a computational framework to predict direct contacts and substructures within protein complexes. In this framework, we first train a supervised learning model of l2-regularized logistic regression to learn the patterns of direct and indirect interactions within complexes, from which physical subunit interaction networks are predicted. Then, to infer substructures within complexes, we apply a graph clustering method (i.e., maximum modularity clustering (MMC)) and a gene ontology (GO) semantic similarity based functional clustering on partially- and fully-connected networks, respectively. Computational results show that the proposed framework performs fairly well in cross-validation and independent tests at detecting direct contacts between subunits. Functional analyses further demonstrate the rationality of partitioning the subunits into substructures via the MMC algorithm and functional clustering.
29
Wang R, Wang C, Sun L, Liu G. A seed-extended algorithm for detecting protein complexes based on density and modularity with topological structure and GO annotations. BMC Genomics 2019; 20:637. [PMID: 31390979 PMCID: PMC6686515 DOI: 10.1186/s12864-019-5956-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/27/2019] [Accepted: 07/04/2019] [Indexed: 12/28/2022] Open
Abstract
Background The detection of protein complexes is of great significance for researching mechanisms underlying complex diseases and developing new drugs. Thus, various computational algorithms have been proposed for protein complex detection. However, most of these methods are based on only topological information and are sensitive to the reliability of interactions. As a result, their performance is affected by false-positive interactions in PPINs. Moreover, these methods consider only density and modularity and ignore protein complexes with various densities and modularities. Results To address these challenges, we propose an algorithm to exploit protein complexes in PPINs by a Seed-Extended algorithm based on Density and Modularity with Topological structure and GO annotations, named SE-DMTG to improve the accuracy of protein complex detection. First, we use common neighbors and GO annotations to construct a weighted PPIN. Second, we define a new seed selection strategy to select seed nodes. Third, we design a new fitness function to detect protein complexes with various densities and modularities. We compare the performance of SE-DMTG with that of thirteen state-of-the-art algorithms on several real datasets. Conclusion The experimental results show that SE-DMTG not only outperforms some classical algorithms in yeast PPINs in terms of the F-measure and Jaccard but also achieves an ideal performance in terms of functional enrichment. Furthermore, we apply SE-DMTG to PPINs of several other species and demonstrate the outstanding accuracy and matching ratio in detecting protein complexes compared with other algorithms.
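The first step described in this abstract, weighting a PPIN by common neighbors and GO annotations, can be sketched roughly as follows. This is a minimal illustration and not the SE-DMTG weighting itself: the abstract does not give the formula, so the two Jaccard coefficients, their equal-weight average, the function names, and the toy data are all assumptions.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard coefficient of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weight_ppin(edges, go_terms):
    """Assign each interaction a weight in [0, 1].

    Assumed scheme (not taken from the paper): the mean of
    (1) the Jaccard overlap of the two proteins' closed neighborhoods and
    (2) the Jaccard overlap of their GO annotation sets.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    weights = {}
    for u, v in edges:
        topo = jaccard(neighbors[u] | {u}, neighbors[v] | {v})
        func = jaccard(go_terms.get(u, set()), go_terms.get(v, set()))
        weights[(u, v)] = 0.5 * topo + 0.5 * func
    return weights


# Hypothetical toy network: A, B, C form a triangle; D hangs off C.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
go_terms = {
    "A": {"GO:1", "GO:2"},
    "B": {"GO:1", "GO:2"},
    "C": {"GO:2"},
    "D": {"GO:9"},
}
weights = weight_ppin(edges, go_terms)
```

In this toy network the A-B interaction, which is supported by both topology and shared annotations, receives the highest weight, while the C-D interaction, which has no shared annotation, receives the lowest. A seed-selection or fitness function would then operate on these weights.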
Affiliation(s)
- Rongquan Wang
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
- Caixia Wang
- School of International Economics, China Foreign Affairs University, 24 Zhanlanguan Road, Xicheng District, Beijing, 100037, China
- Liyan Sun
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
- Guixia Liu
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
30
Xu B, Guan J, Wang Y, Wang Z. Essential Protein Detection by Random Walk on Weighted Protein-Protein Interaction Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:377-387. [PMID: 28504946 DOI: 10.1109/tcbb.2017.2701824] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/07/2023]
Abstract
Essential proteins are critical to the development and survival of cells. Identification of essential proteins is helpful for understanding the minimal set of required genes in a living cell and for designing new drugs. To detect essential proteins, various computational methods have been proposed based on protein-protein interaction (PPI) networks. However, protein interaction data obtained by high-throughput experiments usually contain high false positives, which negatively impacts the accuracy of essential protein detection. Moreover, most existing studies focused on the local information of proteins in PPI networks, while ignoring the influence of indirect protein interactions on essentiality. In this paper, we propose a novel method, called Essentiality Ranking (EssRank in short), to boost the accuracy of essential protein detection. To deal with the inaccuracy of PPI data, confidence scores of interactions are evaluated by integrating various biological information. Weighted edge clustering coefficient (WECC), considering both interaction confidence scores and network topology, is proposed to calculate edge weights in PPI networks. The weight of each node is evaluated by the sum of WECC values of its linking edges. A random walk method, making use of both direct and indirect protein interactions, is then employed to calculate protein essentiality iteratively. Experimental results on the yeast PPI network show that EssRank outperforms most existing methods, including the most commonly-used centrality measures (SC, DC, BC, CC, IC, and EC), topology based methods (DMNC and NC) and the data integrating method IEW.
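The pipeline this abstract describes (confidence-weighted edge clustering coefficients, node weights as sums over incident edges, then a random walk) might look roughly like the sketch below. The exact WECC and random-walk definitions are not given in the abstract, so the forms used here are assumptions: edge weight is taken as confidence times the standard edge clustering coefficient, and the walk is a PageRank-style walk with restart. The function names, the `restart` parameter, and the toy data are hypothetical.

```python
from collections import defaultdict


def edge_clustering_coefficient(neighbors, u, v):
    """Triangles through edge (u, v) divided by the maximum possible number."""
    possible = min(len(neighbors[u]) - 1, len(neighbors[v]) - 1)
    if possible <= 0:
        return 0.0
    return len(neighbors[u] & neighbors[v]) / possible


def rank_proteins(confidence, restart=0.15, iterations=100):
    """Score proteins by a random walk with restart on a weighted PPI network.

    `confidence` maps an undirected edge (u, v) to a score in [0, 1].
    Assumed forms: edge weight = confidence * edge clustering coefficient,
    node weight = sum of incident edge weights, restart distribution
    proportional to node weight.
    """
    neighbors = defaultdict(set)
    for u, v in confidence:
        neighbors[u].add(v)
        neighbors[v].add(u)

    edge_weight = defaultdict(dict)
    for (u, v), conf in confidence.items():
        w = conf * edge_clustering_coefficient(neighbors, u, v)
        edge_weight[u][v] = w
        edge_weight[v][u] = w

    nodes = sorted(neighbors)
    node_weight = {n: sum(edge_weight[n].values()) for n in nodes}
    total = sum(node_weight.values())
    if total > 0:
        prior = {n: node_weight[n] / total for n in nodes}
    else:
        prior = {n: 1.0 / len(nodes) for n in nodes}

    scores = dict(prior)
    for _ in range(iterations):
        updated = {n: restart * prior[n] for n in nodes}
        for n in nodes:
            out = node_weight[n]
            if out == 0:
                # No weighted edges: hand this node's mass back to the prior.
                for m in nodes:
                    updated[m] += (1 - restart) * scores[n] * prior[m]
                continue
            for m, w in edge_weight[n].items():
                updated[m] += (1 - restart) * scores[n] * w / out
        scores = updated
    return scores


# Hypothetical toy network: a triangle A-B-C plus a pendant protein D.
confidence = {("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "C"): 1.0, ("C", "D"): 1.0}
scores = rank_proteins(confidence)
```

Because the pendant edge C-D lies in no triangle, its assumed weight is zero, so D ends up ranked below the three triangle proteins. The scores always sum to one, which makes them directly comparable across proteins.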
31
Haque M, Sarmah R, Bhattacharyya DK. A common neighbor based technique to detect protein complexes in PPI networks. J Genet Eng Biotechnol 2019; 16:227-238. [PMID: 30647726 PMCID: PMC6296598 DOI: 10.1016/j.jgeb.2017.10.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/05/2016] [Revised: 09/26/2017] [Accepted: 10/05/2017] [Indexed: 01/15/2023]
Abstract
Detection of protein complexes by analyzing and understanding PPI networks is an important task and critical to all aspects of cell biology. We present a technique called PROtein COmplex DEtection based on common neighborhood (PROCODE) that considers the inherent organization of protein complexes as well as the regions with heavy interactions in PPI networks to detect protein complexes. Initially, the cores of the protein complexes are detected based on neighborhoods in the PPI network. Then a merging strategy based on density is used to attach proteins and protein complexes to the core protein complexes to form biologically meaningful structures. The protein complexes predicted by PROCODE were evaluated and analyzed using four PPI network datasets, of which three were from budding yeast and one from human. Our proposed technique was compared with some existing techniques using standard benchmark complexes, and PROCODE was found to match very well with actual protein complexes in the benchmark data. The detected complexes were on par with existing biological evidence and knowledge.
32
Ray SS, Misra S. Genetic algorithm for assigning weights to gene expressions using functional annotations. Comput Biol Med 2018; 104:149-162. [PMID: 30472497 DOI: 10.1016/j.compbiomed.2018.11.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/31/2018] [Revised: 11/13/2018] [Accepted: 11/13/2018] [Indexed: 12/17/2022]
Abstract
A method, named genetic algorithm for assigning weights to gene expressions using functional annotations (GAAWGEFA), is developed to assign proper weights to the gene expressions at each time point. The weights are estimated using functional annotations of the genes in a genetic algorithm framework. The method captures gene similarity better than other existing methods because it takes advantage of the existing functional annotations of the genes. The weight combination for the expressions at different time points is determined by maximizing the fitness function of GAAWGEFA in terms of the positive predictive value (PPV) for the top 10,000 gene pairs. The performance of the proposed method is primarily compared with biweight midcorrelation (BICOR) and original expression values for six Saccharomyces cerevisiae datasets and one Bacillus subtilis dataset. The utility of GAAWGEFA is shown in predicting the functions of 48 unclassified genes (using a p-value cutoff of 10⁻¹³) from Saccharomyces cerevisiae microarray data where the expressions are weighted using GAAWGEFA and are clustered using the k-medoids algorithm. The related code along with various parameters is available at http://sampa.droppages.com/GAAWGEFA.html.
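The fitness quantity mentioned in this abstract, the PPV over the top-ranked gene pairs for a given set of time-point weights, can be illustrated with a small sketch. Several details are assumptions, since the abstract does not specify them: similarity is taken to be a weighted Pearson correlation, a pair counts as a true positive when the two genes share at least one annotation, and all names and data below are hypothetical.

```python
from itertools import combinations
from math import sqrt


def weighted_correlation(x, y, w):
    """Pearson correlation of x and y with per-time-point weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def ppv_top_pairs(expression, annotations, weights, top_k):
    """Fraction of the top_k most similar gene pairs that share an annotation."""
    scored = [
        (weighted_correlation(expression[a], expression[b], weights), a, b)
        for a, b in combinations(sorted(expression), 2)
    ]
    scored.sort(reverse=True)
    top = scored[:top_k]
    hits = sum(1 for _, a, b in top if annotations[a] & annotations[b])
    return hits / len(top) if top else 0.0


# Hypothetical toy data: four genes measured at three time points.
expression = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [2.0, 4.0, 6.0],
    "g3": [3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0],
}
annotations = {
    "g1": {"ribosome"},
    "g2": {"ribosome"},
    "g3": {"transport"},
    "g4": {"transport"},
}
uniform = [1.0, 1.0, 1.0]
ppv_top1 = ppv_top_pairs(expression, annotations, uniform, top_k=1)
ppv_top3 = ppv_top_pairs(expression, annotations, uniform, top_k=3)
```

A genetic algorithm in this setting would treat the `weights` vector as the chromosome and `ppv_top_pairs` as the fitness to maximize; the search itself is omitted here.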
Affiliation(s)
- Shubhra Sankar Ray
- Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata, 700108, India.
- Sampa Misra
- Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata, 700108, India.
33
Liu X, Yang Z, Sang S, Zhou Z, Wang L, Zhang Y, Lin H, Wang J, Xu B. Identifying protein complexes based on node embeddings obtained from protein-protein interaction networks. BMC Bioinformatics 2018; 19:332. [PMID: 30241459 PMCID: PMC6150962 DOI: 10.1186/s12859-018-2364-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/08/2018] [Accepted: 09/09/2018] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Protein complexes are one of the keys to deciphering the behavior of a cell system. During the past decade, most computational approaches used to identify protein complexes have been based on discovering densely connected subgraphs in protein-protein interaction (PPI) networks. However, many true complexes are not dense subgraphs, and these approaches show limited performance in detecting protein complexes from PPI networks. RESULTS To solve these problems, in this paper we propose a supervised learning method based on network node embeddings which utilizes the informative properties of known complexes to guide the search process for new protein complexes. First, node embeddings are obtained from the human protein interaction network. Then the protein interactions are weighted through the similarities between node embeddings. After that, the supervised learning method is used to detect protein complexes. Then the random forest model is used to filter the candidate complexes in order to obtain the final predicted complexes. Experimental results on real human and yeast protein interaction networks show that our method effectively improves the performance of protein complex detection. CONCLUSIONS We provided a new method for identifying protein complexes from human and yeast protein interaction networks, which has great potential to benefit the field of protein complex detection.
Affiliation(s)
- Xiaoxia Liu
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Zhihao Yang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China.
- Shengtian Sang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Ziwei Zhou
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Lei Wang
- Beijing Institute of Health Administration and Medical Information, Beijing, 100850, People's Republic of China.
- Yin Zhang
- Beijing Institute of Health Administration and Medical Information, Beijing, 100850, People's Republic of China
- Hongfei Lin
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Jian Wang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Bo Xu
- School of Software Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
34
Janani S, Ramyachitra D, Ranjani Rani R. PCD-DPPI: Protein complex detection from dynamic PPI using shuffled frog-leaping algorithm. GENE REPORTS 2018. [DOI: 10.1016/j.genrep.2018.06.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
35
Ding Z, Kihara D. Computational Methods for Predicting Protein-Protein Interactions Using Various Protein Features. CURRENT PROTOCOLS IN PROTEIN SCIENCE 2018; 93:e62. [PMID: 29927082 PMCID: PMC6097941 DOI: 10.1002/cpps.62] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Indexed: 12/24/2022]
Abstract
Understanding protein-protein interactions (PPIs) in a cell is essential for learning protein functions, pathways, and mechanism of diseases. PPIs are also important targets for developing drugs. Experimental methods, both small-scale and large-scale, have identified PPIs in several model organisms. However, results cover only a part of PPIs of organisms; moreover, there are many organisms whose PPIs have not yet been investigated. To complement experimental methods, many computational methods have been developed that predict PPIs from various characteristics of proteins. Here we provide an overview of literature reports to classify computational PPI prediction methods that consider different features of proteins, including protein sequence, genomes, protein structure, function, PPI network topology, and those which integrate multiple methods. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Ziyun Ding
- Department of Biological Science, Purdue University, West Lafayette, IN, 47907 USA
- Daisuke Kihara
- Department of Biological Science, Purdue University, West Lafayette, IN, 47907 USA
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907 USA
- Corresponding author: DK; Phone: 1-765-496-2284 (DK)
36
Liu W, Ma L, Jeon B, Chen L, Chen B. A Network Hierarchy-Based method for functional module detection in protein-protein interaction networks. J Theor Biol 2018; 455:26-38. [PMID: 29981337 DOI: 10.1016/j.jtbi.2018.06.026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/28/2018] [Revised: 06/27/2018] [Accepted: 06/29/2018] [Indexed: 02/02/2023]
Abstract
In the post-genomic era, one of the important tasks is to identify protein complexes and functional modules from high-throughput protein-protein interaction data, so that we can systematically analyze and understand the molecular functions and biological processes of cells. Although many functional module detection methods have been proposed, how to design correct and efficient functional module detection algorithms is still a challenging and important scientific problem in computational biology. In this paper, we present a novel Network Hierarchy-Based method to detect functional modules in PPI networks (named NHB-FMD). NHB-FMD first constructs the hierarchy tree corresponding to the PPI network and then encodes the tree so that a genetic algorithm can be employed to obtain the hierarchy tree with maximum likelihood. After that, functional module partitioning is performed based on this tree, and the best partitioning is selected as the result. Experimental results on real PPI networks show that the proposed algorithm not only significantly outperforms the state-of-the-art methods but also detects protein modules more effectively and accurately.
Affiliation(s)
- Wei Liu
- College of Information Engineering of Yangzhou University, Yangzhou 225127, China; The Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huaiyin Institute of Technology, Huaiyin 223002, China; School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South Korea.
- Liangyu Ma
- College of Information Engineering of Yangzhou University, Yangzhou 225127, China
- Byeungwoo Jeon
- School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South Korea
- Ling Chen
- College of Information Engineering of Yangzhou University, Yangzhou 225127, China
- Bolun Chen
- The Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huaiyin Institute of Technology, Huaiyin 223002, China
37
Zhong J, Sun Y, Peng W, Xie M, Yang J, Tang X. XGBFEMF: An XGBoost-Based Framework for Essential Protein Prediction. IEEE Trans Nanobioscience 2018; 17:243-250. [DOI: 10.1109/tnb.2018.2842219] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Indexed: 11/08/2022]
38
Lei X, Zhao J, Fujita H, Zhang A. Predicting essential proteins based on RNA-Seq, subcellular localization and GO annotation datasets. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.03.027] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Indexed: 11/24/2022]
39
Sharma P, Bhattacharyya D, Kalita J. Detecting protein complexes based on a combination of topological and biological properties in protein-protein interaction network. J Genet Eng Biotechnol 2018; 16:217-226. [PMID: 30647725 PMCID: PMC6296571 DOI: 10.1016/j.jgeb.2017.11.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/01/2017] [Revised: 11/01/2017] [Accepted: 11/17/2017] [Indexed: 01/04/2023]
Abstract
Protein complexes are known to play a major role in controlling cellular activity in a living being. Identifying complexes from raw protein-protein interactions (PPIs) is an important area of research. Earlier work has been limited mostly to yeast. Such protein complex identification methods, when applied to large human PPIs, often perform poorly. We introduce a novel method called CSC to detect protein complexes. The method is evaluated in terms of positive predictive value, sensitivity and accuracy using datasets from the model organism yeast and from humans. CSC outperforms several other competing algorithms for both organisms. Further, we present a framework to establish the usefulness of CSC in analyzing the influence of a given disease gene in a complex, both topologically and biologically, considering eight major association factors.
Affiliation(s)
- Pooja Sharma
- Department of Computer Science & Engineering, Tezpur University Napaam, Tezpur 784028, Assam, India
- D.K. Bhattacharyya
- Department of Computer Science & Engineering, Tezpur University Napaam, Tezpur 784028, Assam, India
- J.K. Kalita
- Department of Computer Science, University of Colorado at Colorado Springs, CO 80933-7150, USA
40
Li G, Luo J, Xiao Z, Liang C. MTMO: an efficient network-centric algorithm for subtree counting and enumeration. QUANTITATIVE BIOLOGY 2018. [DOI: 10.1007/s40484-018-0140-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
41
MTGO: PPI Network Analysis Via Topological and Functional Module Identification. Sci Rep 2018; 8:5499. [PMID: 29615773 PMCID: PMC5882952 DOI: 10.1038/s41598-018-23672-0] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 11/01/2017] [Accepted: 02/28/2018] [Indexed: 11/08/2022] Open
Abstract
Protein-protein interaction (PPI) networks are viable tools to understand cell functions, disease machinery, and drug design/repositioning. Interpreting a PPI network, however, is a particularly challenging task because of network complexity. Several algorithms have been proposed for automatic PPI interpretation, at first by solely considering the network topology, and later by integrating Gene Ontology (GO) terms as node similarity attributes. Here we present MTGO - Module detection via Topological information and GO knowledge, a novel functional module identification approach. MTGO reveals the bimolecular machinery underpinning PPI networks by leveraging both biological knowledge and topological properties. In particular, it directly exploits GO terms during the module assembling process, and labels each module with its best fit GO term, easing its functional interpretation. MTGO shows substantially better results than other state-of-the-art algorithms (including recent GO-based ones) when searching for small or sparse functional modules, while providing comparable or better results in all other cases. MTGO correctly identifies molecular complexes and literature-consistent processes in an experimentally derived PPI network of myocardial infarction. A software version of MTGO is freely available for non-commercial purposes at https://gitlab.com/d1vella/MTGO.
42
Liu X, Yang Z, Zhou Z, Sun Y, Lin H, Wang J, Xu B. The impact of protein interaction networks’ characteristics on computational complex detection methods. J Theor Biol 2018; 439:141-151. [DOI: 10.1016/j.jtbi.2017.12.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/31/2017] [Revised: 11/29/2017] [Accepted: 12/03/2017] [Indexed: 11/25/2022]
43
Cao B, Deng S, Luo J, Ding P, Wang S. Identification of overlapping protein complexes by fuzzy K-medoids clustering algorithm in yeast protein-protein interaction networks. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2018. [DOI: 10.3233/jifs-17026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Affiliation(s)
- Buwen Cao
- School of Information Science and Engineering, Hunan City University, Yiyang, China
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Shuguang Deng
- College of Communication and Electronic Engineering, Hunan City University, Yiyang, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Pingjian Ding
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Shulin Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
44
CPredictor3.0: detecting protein complexes from PPI networks with expression data and functional annotations. BMC SYSTEMS BIOLOGY 2017; 11:135. [PMID: 29322927 PMCID: PMC5763309 DOI: 10.1186/s12918-017-0504-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
BACKGROUND Effectively predicting protein complexes not only helps to understand the structures and functions of proteins and their complexes, but is also useful for diagnosing disease and developing new drugs. Up to now, many methods have been developed to detect complexes by mining dense subgraphs from static protein-protein interaction (PPI) networks, while ignoring the value of other biological information and the dynamic properties of cellular systems. RESULTS In this paper, based on our previous works CPredictor and CPredictor2.0, we present a new method for predicting complexes from PPI networks with both gene expression data and protein functional annotations, which is called CPredictor3.0. This new method follows the viewpoint that proteins in the same complex should have roughly similar functions and be active at the same time and place in cellular systems. We first detect active proteins by using gene expression data from different time points and cluster proteins by using gene ontology (GO) functional annotations, respectively. Then, for each time point, we intersect the set of active proteins generated from expression data with each protein cluster generated from functional annotations. Each resulting unique set indicates a cluster of proteins that have similar function(s) and are active at that time point. Following that, we map each cluster of active proteins of similar function onto a static PPI network, and get a series of induced connected subgraphs. We treat these subgraphs as candidate complexes. Finally, by expanding and merging these candidate complexes, the predicted complexes are obtained. We evaluate CPredictor3.0 and compare it with a number of existing methods on several PPI networks and benchmark complex datasets. The experimental results show that CPredictor3.0 achieves the highest F1-measure, which indicates that CPredictor3.0 outperforms these existing methods overall.
CONCLUSION CPredictor3.0 can serve as a promising tool of protein complex prediction.
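The candidate-generation steps described in this abstract (intersect active proteins with functional clusters at each time point, then take induced connected subgraphs of the PPI network) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy data, and the `min_size` filter are assumptions made for illustration, and the final expanding and merging steps are omitted because the abstract does not specify them.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Connected components of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    adj = defaultdict(set)
    for u, v in edges:
        # Keep only edges whose two endpoints are both in the node set.
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def candidate_complexes(active_by_time, function_clusters, ppi_edges, min_size=2):
    """Candidate complexes: induced connected subgraphs of proteins that are
    active at the same time point and fall in the same functional cluster.
    `min_size` is a hypothetical filter, not taken from the abstract."""
    candidates = set()  # a set keeps each resulting protein set only once
    for active in active_by_time.values():
        for cluster in function_clusters:
            shared = set(active) & set(cluster)
            for comp in connected_components(shared, ppi_edges):
                if len(comp) >= min_size:
                    candidates.add(comp)
    return candidates


# Toy inputs (hypothetical protein names and time points).
active_by_time = {"t1": {"A", "B", "C", "D"}, "t2": {"C", "D", "E"}}
function_clusters = [{"A", "B", "E"}, {"C", "D"}]
ppi_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
result = candidate_complexes(active_by_time, function_clusters, ppi_edges)
```

On this toy input the sketch yields two candidates, {A, B} and {C, D}; protein E is active at t2 but has no active, functionally similar neighbor, so it forms no candidate.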
|
45
|
Brown NA, Evans J, Mead A, Hammond‐Kosack KE. A spatial temporal analysis of the Fusarium graminearum transcriptome during symptomless and symptomatic wheat infection. MOLECULAR PLANT PATHOLOGY 2017; 18:1295-1312. [PMID: 28466509 PMCID: PMC5697668 DOI: 10.1111/mpp.12564] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 02/14/2017] [Revised: 04/10/2017] [Accepted: 04/24/2017] [Indexed: 05/20/2023]
Abstract
Fusarium head blight of wheat is one of the most serious and hazardous crop diseases worldwide. Here, a transcriptomic investigation of Fusarium graminearum reveals a new model for symptomless and symptomatic wheat infection. The predicted metabolic state and secretome of F. graminearum were distinct within symptomless and symptomatic wheat tissues. Transcripts for genes involved in the biosynthesis of the mycotoxin, deoxynivalenol, plus other characterized and putative secondary metabolite clusters increased in abundance in symptomless tissue. Transcripts encoding for genes of distinct groups of putative secreted effectors increased within either symptomless or symptomatic tissue. Numerous pathogenicity-associated gene transcripts and transcripts representing PHI-base mutations that impacted on virulence increased in symptomless tissue. In contrast, hydrolytic carbohydrate-active enzyme (CAZyme) and lipase gene transcripts exhibited a different pattern of expression, resulting in elevated transcript abundance during the development of disease symptoms. Genome-wide comparisons with existing datasets confirmed that, within the wheat floral tissue, at a single time point, different phases of infection co-exist, which are spatially separated and reminiscent of both early and late infection. This study provides novel insights into the combined spatial temporal coordination of functionally characterized and hypothesized virulence strategies.
Affiliation(s)
- Neil A. Brown
- Department of Biointeractions and Crop Protection, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK
| | - Jess Evans
- Computational and Analytical Sciences, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK
| | - Andrew Mead
- Computational and Analytical Sciences, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK
| | - Kim E. Hammond‐Kosack
- Department of Biointeractions and Crop Protection, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK
| |
|
46
|
Ou-Yang L, Yan H, Zhang XF. A multi-network clustering method for detecting protein complexes from multiple heterogeneous networks. BMC Bioinformatics 2017; 18:463. [PMID: 29219066 PMCID: PMC5773919 DOI: 10.1186/s12859-017-1877-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 08/24/2023] Open
Abstract
Background The accurate identification of protein complexes is important for the understanding of cellular organization. Up to now, computational methods for protein complex detection have mostly focused on mining clusters from protein-protein interaction (PPI) networks. However, PPI data collected by high-throughput experimental techniques are known to be quite noisy. It is hard to achieve reliable prediction results by simply applying computational methods to PPI data. Behind protein interactions, there are protein domains that interact with each other. Therefore, based on domain-protein associations, the joint analysis of PPIs and domain-domain interactions (DDI) has the potential to obtain better performance in protein complex detection. As traditional computational methods are designed to detect protein complexes from a single PPI network, it is necessary to design a new algorithm that can effectively utilize the information inherent in multiple heterogeneous networks. Results In this paper, we introduce a novel multi-network clustering algorithm to detect protein complexes from multiple heterogeneous networks. Unlike existing protein complex identification algorithms that focus on the analysis of a single PPI network, our model can jointly exploit the information inherent in PPI and DDI data to achieve more reliable prediction results. Extensive experimental results on real-world data sets demonstrate that our method can predict protein complexes more accurately than other state-of-the-art protein complex identification algorithms. Conclusions In this work, we demonstrate that the joint analysis of the PPI network and the DDI network can help to improve the accuracy of protein complex detection.
Affiliation(s)
- Le Ou-Yang
- College of Information Engineering & Shenzhen Key Laboratory of Media Security, Shenzhen University, Nanhai Ave 3688, Shenzhen, 518060, China
| | - Hong Yan
- College of Information Engineering & Shenzhen Key Laboratory of Media Security, Shenzhen University, Nanhai Ave 3688, Shenzhen, 518060, China; Department of Electronic and Engineering, City University of Hong Kong, Tat Chee Avenue, Hong Kong, China
| | - Xiao-Fei Zhang
- School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China.
| |
|
47
|
|
48
|
Finding optimum width of discretization for gene expressions using functional annotations. Comput Biol Med 2017; 90:59-67. [DOI: 10.1016/j.compbiomed.2017.09.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/06/2017] [Revised: 09/14/2017] [Accepted: 09/14/2017] [Indexed: 12/20/2022]
|
49
|
MOHAMMADI-JENGHARA MOSLEM, EBRAHIMPOUR-KOMLEH HOSSEIN. EXTRACTION OF CO-BEHAVING GENES BY SIMILARITY ENSEMBLES. J BIOL SYST 2017. [DOI: 10.1142/s021833901750022x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Microarray technology is used as a source of data for a wide range of biology studies. Useful biological information can be extracted from the analysis of microarray data, namely, the impact of a particular gene expression on the expression of other genes or the determination of expressed genes under different conditions. The purpose of this paper is to find co-behavioral genes in different data sets for different times and conditions. In other words, genes with similar behavior, same increase or decrease, under different medical, stress, and time conditions in terms of expression are determined. Multi-valued discretization of expression data was used for extracting genes with identical behavior. The algorithm proposed in this study is based on data and methods ensemble. The data ensemble technique was used to extract candidate genes with identical behavior. Other methods were also applied on all the data sets; as a result, many co-behavioral candidate genes with different similarity and correlation values were identified. Finally, the ultimate output was created from the ensemble of different methods. By applying the algorithm on yeast gene expression data, meaningful relations among genes were extracted.
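The idea of grouping genes whose discretized expression behaves identically can be illustrated with a small Python sketch. The abstract does not give the discretization rule, so the three-level scheme (increase, decrease, no change), the `threshold` value, and all function names below are assumptions made for illustration; the data and method ensemble steps are not covered.

```python
from collections import defaultdict


def discretize_changes(profile, threshold=0.5):
    """Map each step between consecutive measurements to +1 (increase),
    -1 (decrease), or 0 (no change beyond the assumed threshold)."""
    symbols = []
    for prev, cur in zip(profile, profile[1:]):
        delta = cur - prev
        if delta > threshold:
            symbols.append(1)
        elif delta < -threshold:
            symbols.append(-1)
        else:
            symbols.append(0)
    return tuple(symbols)


def co_behaving_groups(expression, threshold=0.5):
    """Group genes that share exactly the same discretized pattern."""
    groups = defaultdict(list)
    for gene, profile in expression.items():
        groups[discretize_changes(profile, threshold)].append(gene)
    # Only patterns shared by at least two genes count as co-behaving.
    return [sorted(genes) for genes in groups.values() if len(genes) > 1]


# Toy expression profiles (hypothetical gene names and values).
expression = {
    "g1": [1.0, 2.0, 2.1, 0.5],
    "g2": [0.2, 1.5, 1.4, 0.1],
    "g3": [3.0, 2.0, 2.0, 3.5],
}
groups = co_behaving_groups(expression)
```

Here g1 and g2 both rise, stay flat, then fall, so they form one group, while g3 follows the opposite pattern and is left ungrouped.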
|
50
|
Hernandez C, Mella C, Navarro G, Olivera-Nappa A, Araya J. Protein complex prediction via dense subgraphs and false positive analysis. PLoS One 2017; 12:e0183460. [PMID: 28937982 PMCID: PMC5609739 DOI: 10.1371/journal.pone.0183460] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/14/2016] [Accepted: 08/04/2017] [Indexed: 01/04/2023] Open
Abstract
Many proteins work together with others in groups called complexes in order to achieve a specific function. Discovering protein complexes is important for understanding biological processes and predicting protein functions in living organisms. Large-scale and throughput techniques have made it possible to compile protein-protein interaction networks (PPI networks), which have been used in several computational approaches for detecting protein complexes. Those predictions might guide future biologic experimental research. Some approaches are topology-based, where highly connected proteins are predicted to be complexes; some propose different clustering algorithms using partitioning, overlaps among clusters for networks modeled with unweighted or weighted graphs; and others use density of clusters and information based on protein functionality. However, some schemes still require much processing time or the quality of their results can be improved. Furthermore, most of the results obtained with computational tools are not accompanied by an analysis of false positives. We propose an effective and efficient mining algorithm for discovering highly connected subgraphs, which is our base for defining protein complexes. Our representation is based on transforming the PPI network into a directed acyclic graph that reduces the number of represented edges and the search space for discovering subgraphs. Our approach considers weighted and unweighted PPI networks. We compare our best alternative using PPI networks from Saccharomyces cerevisiae (yeast) and Homo sapiens (human) with state-of-the-art approaches in terms of clustering, biological metrics and execution times, as well as three gold standards for yeast and two for human. Furthermore, we analyze false positive predicted complexes by searching the PDBe (Protein Data Bank in Europe) database in order to identify matching protein complexes that have been purified and structurally characterized.
Our analysis shows that more than 50 yeast protein complexes and more than 300 human protein complexes found to be false positives according to our prediction method, i.e., not described in the gold standard complex databases, in fact contain protein complexes that have been characterized structurally and documented in PDBe. We also found that some of these protein complexes have recently been classified as part of a Periodic Table of Protein Complexes. The latest version of our software is publicly available at http://doi.org/10.6084/m9.figshare.5297314.v1.
Affiliation(s)
- Cecilia Hernandez
- Computer Science, University of Concepción, Concepción, Chile
- Center for Biotechnology and Bioengineering (CeBiB), Department of Computer Science, University of Chile, Santiago, Chile
| | - Carlos Mella
- Computer Science, University of Concepción, Concepción, Chile
| | - Gonzalo Navarro
- Center for Biotechnology and Bioengineering (CeBiB), Department of Computer Science, University of Chile, Santiago, Chile
| | - Alvaro Olivera-Nappa
- Center for Biotechnology and Bioengineering (CeBiB), Department of Chemical Engineering and Biotechnology, University of Chile, Santiago, Chile
| | - Jaime Araya
- Computer Science, University of Concepción, Concepción, Chile
| |
|