1
Hou X, Huan M, Zhang Y, Zhang J, Lei Z, Zhang X, Xu Y, Mao S. Protein polysaccharide molecules combine with exercise to promote bone and joint injury repair: protein kinase signaling pathway. Int J Biol Macromol 2025; 309:142960. [PMID: 40210062 DOI: 10.1016/j.ijbiomac.2025.142960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2025] [Revised: 03/25/2025] [Accepted: 04/06/2025] [Indexed: 04/12/2025]
Abstract
Bone and joint injuries are common disorders of the motor system that seriously affect patients' quality of life. This study explores how protein polysaccharide molecules combined with exercise promote bone and joint injury repair, and the protein kinase signaling pathway mechanism underlying that effect. Protein polysaccharide complexes were constructed through molecular docking in order to analyze their construction characteristics and the properties of the associated protein kinases. Relevant signaling pathways were then constructed, and a support vector machine was applied to encode the potential kinase signaling pathways and analyze them predictively. Subsequently, the repair of bone and joint injuries was quantitatively evaluated in combination with exercise. The results indicate that the constructed protein polysaccharide complex can effectively activate the related protein kinase signaling pathways and enhance the repair of bone and joint injuries. When the complex was combined with exercise, the healing time of bone injuries was significantly shortened and the recovery of joint function was significantly improved. Therefore, combining protein polysaccharide molecules with exercise significantly promotes the repair of bone and joint injuries, mainly by activating the signaling pathways of specific protein kinases.
Affiliation(s)
- Xingchen Hou
  - School of Physical Education and Health Sciences, Mudanjiang Normal University, Mudanjiang, Heilongjiang 157011, China
- Meng Huan
  - Physical Education and Research Department, Mudanjiang Medical University, Mudanjiang, Heilongjiang 157011, China
- Youming Zhang
  - School of Physical Education and Health Sciences, Mudanjiang Normal University, Mudanjiang, Heilongjiang 157011, China
- Jianhui Zhang
  - School of Physical Education, Shanxi University of Finance and Economics, Taiyuan, Shanxi 030006, China
- Zhang Lei
  - College of Life Science and Technology, Mudanjiang Normal University, Mudanjiang, Heilongjiang 157011, China
- Xiangyu Zhang
  - School of Acupuncture-Moxibustion and Tuina, Beijing University of Chinese Medicine, Beijing 100029, China
- Yin Xu
  - Tennis Teaching and Research Section, Beijing Sport University, Beijing 100084, China
- Shukai Mao
  - School of Physical Education, Southwest Medical University, Luzhou, Sichuan 646699, China
2
Wu C, Lin B, Zhang J, Gao R, Song R, Liu ZP. AttentionEP: Predicting essential proteins via fusion of multiscale features by attention mechanisms. Comput Struct Biotechnol J 2024; 23:4315-4323. [PMID: 39697678 PMCID: PMC11652892 DOI: 10.1016/j.csbj.2024.11.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2024] [Revised: 11/17/2024] [Accepted: 11/25/2024] [Indexed: 12/20/2024] Open
Abstract
Identifying essential proteins is of utmost importance in the field of biomedical research due to their essential functions in cellular activities and their involvement in mechanisms related to diseases. In this research, a novel approach called AttentionEP for predicting essential proteins (EP) is introduced by attention mechanisms. This method leverages both cross-attention and self-attention frameworks, focusing on enhancing prediction accuracy through the integration of features across diverse scales. Spatial characteristics of proteins are obtained from the protein-protein interaction (PPI) network by employing Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). Following this, Bidirectional Long Short-Term Memory networks (BiLSTM) are employed to derive temporal features from gene expression datasets. Furthermore, spatial characteristics are derived by integrating data on subcellular localization with the application of Deep Neural Networks (DNN). In order to effectively integrate features across multiple scales, initial steps involve the application of self-attention techniques to derive essential insights from each unique data set. Following this, mechanisms involving self-attention and cross-attention are employed to enhance the interaction between diverse information sources. To identify essential proteins, a classifier based on the ResNet architecture is developed. The findings from the experiments indicate that the method introduced here shows superior performance in identifying essential proteins, recording an Area Under the Curve (AUC) value of 0.9433. This approach shows a considerable advantage over established techniques. The findings of this study provide a significant advancement in the comprehension of critical proteins, revealing promising potential for applications in the development of therapeutics and addressing various diseases.
Affiliation(s)
- Chuanyan Wu
  - School of Intelligent Engineering, Shandong Management University, No.3500 Dingxiang Road, Jinan, Shandong, 250357, China
- Bentao Lin
  - School of Intelligent Engineering, Shandong Management University, No.3500 Dingxiang Road, Jinan, Shandong, 250357, China
- Jialin Zhang
  - School of Control Science and Engineering, Shandong University, No.17923 Jingshi Road, Jinan, Shandong, 250061, China
- Rui Gao
  - School of Control Science and Engineering, Shandong University, No.17923 Jingshi Road, Jinan, Shandong, 250061, China
- Rui Song
  - School of Control Science and Engineering, Shandong University, No.17923 Jingshi Road, Jinan, Shandong, 250061, China
- Zhi-Ping Liu
  - School of Control Science and Engineering, Shandong University, No.17923 Jingshi Road, Jinan, Shandong, 250061, China
3
Lu P, Tian J. ACDMBI: A deep learning model based on community division and multi-source biological information fusion predicts essential proteins. Comput Biol Chem 2024; 112:108115. [PMID: 38865861 DOI: 10.1016/j.compbiolchem.2024.108115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 05/15/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024]
Abstract
Accurately identifying essential proteins is vital for drug research and disease diagnosis. Traditional centrality methods and machine learning approaches often face challenges in accurately discerning essential proteins, primarily relying on information derived from protein-protein interaction (PPI) networks. Despite attempts by some researchers to integrate biological data and PPI networks for predicting essential proteins, designing effective integration methods remains a challenge. In response to these challenges, this paper presents the ACDMBI model, specifically designed to overcome the aforementioned issues. ACDMBI is comprised of two key modules: feature extraction and classification. In terms of capturing relevant information, we draw insights from three distinct data sources. Initially, structural features of proteins are extracted from the PPI network through community division. Subsequently, these features are further optimized using Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). Moving forward, protein features are extracted from gene expression data utilizing Bidirectional Long Short-Term Memory networks (BiLSTM) and a multi-head self-attention mechanism. Finally, protein features are derived by mapping subcellular localization data to a one-dimensional vector and processing it through fully connected layers. In the classification phase, we integrate features extracted from three different data sources, crafting a multi-layer deep neural network (DNN) for protein classification prediction. Experimental results on brewing yeast data showcase the ACDMBI model's superior performance, with AUC reaching 0.9533 and AUPR reaching 0.9153. Ablation experiments further reveal that the effective integration of features from diverse biological information significantly boosts the model's performance.
Affiliation(s)
- Pengli Lu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
- Jialong Tian
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
4
Xiao H, Zou Y, Wang J, Wan S. A Review for Artificial Intelligence Based Protein Subcellular Localization. Biomolecules 2024; 14:409. [PMID: 38672426 PMCID: PMC11048326 DOI: 10.3390/biom14040409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 03/21/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024] Open
Abstract
Proteins need to be located in appropriate spatiotemporal contexts to carry out their diverse biological functions. Mislocalized proteins may lead to a broad range of diseases, such as cancer and Alzheimer's disease. Knowing where a target protein resides within a cell will give insights into tailored drug design for a disease. As the gold validation standard, the conventional wet lab uses fluorescent microscopy imaging, immunoelectron microscopy, and fluorescent biomarker tags for protein subcellular location identification. However, the booming era of proteomics and high-throughput sequencing generates tons of newly discovered proteins, making protein subcellular localization by wet-lab experiments a mission impossible. To tackle this concern, in the past decades, artificial intelligence (AI) and machine learning (ML), especially deep learning methods, have made significant progress in this research area. In this article, we review the latest advances in AI-based method development in three typical types of approaches, including sequence-based, knowledge-based, and image-based methods. We also elaborately discuss existing challenges and future directions in AI-based method development in this research field.
Affiliation(s)
- Hanyu Xiao
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Yijin Zou
  - College of Veterinary Medicine, China Agricultural University, Beijing 100193, China
- Jieqiong Wang
  - Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Shibiao Wan
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
5
Sun J, Pan L, Li B, Wang H, Yang B, Li W. A Construction Method of Dynamic Protein Interaction Networks by Using Relevant Features of Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2790-2801. [PMID: 37030714 DOI: 10.1109/tcbb.2023.3264241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Essential proteins play an important role in various life activities and are considered to be a vital part of the organism. Gene expression data are an important dataset to construct dynamic protein-protein interaction networks (DPIN). The existing methods for the construction of DPINs generally utilize all features (or the features in a cycle) of the gene expression data. However, the features observed from successive time points tend to be highly correlated, and thus there are some redundant and irrelevant features in the gene expression data, which will influence the quality of the constructed network and the predictive performance of essential proteins. To address this problem, we propose a construction method of DPINs by using selected relevant features rather than continuous and periodic features. We adopt an improved unsupervised feature selection method based on Laplacian algorithm to remove irrelevant and redundant features from gene expression data, then integrate the chosen relevant features into the static protein-protein interaction network (SPIN) to construct a more concise and effective DPIN (FS-DPIN). To evaluate the effectiveness of the FS-DPIN, we apply 15 network-based centrality methods on the FS-DPIN and compare the results with those on the SPIN and the existing DPINs. Then the predictive performance of the 15 centrality methods is validated in terms of sensitivity, specificity, positive predictive value, negative predictive value, F-measure, accuracy, Jackknife and AUPRC. The experimental results show that the FS-DPIN is superior to the existing DPINs in the identification accuracy of essential proteins.
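The evaluation described in this abstract scores a ranking produced by a centrality method against a list of known essential proteins. As a rough illustration only, the sketch below treats the top `k` ranked proteins as predicted essential and computes six of the listed statistics. The function names, the toy protein labels, and the top-`k` cut-off convention are assumptions made for this example and are not taken from the paper.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def top_k_metrics(ranked, essential, k):
    """Score a centrality ranking by treating its top-k proteins as predicted essential.

    ranked    -- proteins ordered from highest to lowest centrality score
    essential -- set of proteins known to be essential (the gold standard)
    k         -- number of top-ranked proteins to label as essential
    """
    predicted = set(ranked[:k])
    universe = set(ranked)

    tp = len(predicted & essential)                # essential and predicted
    fp = len(predicted) - tp                       # predicted but not essential
    fn = len((universe - predicted) & essential)   # essential but missed
    tn = len(universe) - tp - fp - fn              # correctly left out

    sensitivity = safe_ratio(tp, tp + fn)
    specificity = safe_ratio(tn, tn + fp)
    ppv = safe_ratio(tp, tp + fp)
    npv = safe_ratio(tn, tn + fn)
    f_measure = safe_ratio(2 * ppv * sensitivity, ppv + sensitivity)
    accuracy = safe_ratio(tp + tn, len(universe))

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Toy ranking of six hypothetical proteins, three of which are essential.
metrics = top_k_metrics(["a", "b", "c", "d", "e", "f"], {"a", "c", "e"}, k=2)
```

The Jackknife curve and AUPRC mentioned in the abstract would be obtained by repeating this kind of count over every value of `k` rather than a single cut-off.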
6
Liu P, Liu C, Mao Y, Guo J, Liu F, Cai W, Zhao F. Identification of essential proteins based on edge features and the fusion of multiple-source biological information. BMC Bioinformatics 2023; 24:203. [PMID: 37198530 DOI: 10.1186/s12859-023-05315-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/10/2023] [Accepted: 04/30/2023] [Indexed: 05/19/2023] Open
Abstract
BACKGROUND A major current focus in the analysis of protein-protein interaction (PPI) data is how to identify essential proteins. The availability of massive PPI data warrants the design of efficient computational methods for identifying essential proteins. Previous studies have achieved considerable performance. However, because PPI data are noisy and structurally complex, it remains a challenge to further improve the performance of identification methods. METHODS This paper proposes an identification method, named CTF, which identifies essential proteins based on edge features, including h-quasi-cliques and uv-triangle graphs, and on the fusion of multiple-source information. We first design an edge-weight function, named EWCT, for computing the topological scores of proteins based on quasi-cliques and triangle graphs. Then, we generate an edge-weighted PPI network using EWCT and dynamic PPI data. Finally, we compute the essentiality of proteins by fusing the topological scores with three scores of biological information. RESULTS We evaluated the performance of the CTF method by comparison with 16 other methods, such as MON, PeC, TEGS, and LBCC. The experimental results on three datasets of Saccharomyces cerevisiae show that CTF outperforms the state-of-the-art methods. Moreover, our results indicate that fusing other biological information helps improve identification accuracy.
Affiliation(s)
- Peiqiang Liu
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Chang Liu
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Yanyan Mao
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
  - College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao, China
- Junhong Guo
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Fanshu Liu
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Wangmin Cai
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Feng Zhao
  - School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
7
de Menezes TA, Aburjaile FF, Quintanilha-Peixoto G, Tomé LMR, Fonseca PLC, Mendes-Pereira T, Araújo DS, Melo TS, Kato RB, Delabie JHC, Ribeiro SP, Brenig B, Azevedo V, Drechsler-Santos ER, Andrade BS, Góes-Neto A. Unraveling the Secrets of a Double-Life Fungus by Genomics: Ophiocordyceps australis CCMB661 Displays Molecular Machinery for Both Parasitic and Endophytic Lifestyles. J Fungi (Basel) 2023; 9:jof9010110. [PMID: 36675931 PMCID: PMC9864599 DOI: 10.3390/jof9010110] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/06/2022] [Revised: 12/31/2022] [Accepted: 01/09/2023] [Indexed: 01/15/2023] Open
Abstract
Ophiocordyceps australis (Ascomycota, Hypocreales, Ophiocordycipitaceae) is a classic entomopathogenic fungus that parasitizes ants (Hymenoptera, Ponerinae, Ponerini). Nonetheless, according to our results, this fungal species also exhibits a complete set of genes coding for plant cell wall degrading Carbohydrate-Active enZymes (CAZymes), enabling a full endophytic stage and, consequently, its dual ability to both parasitize insects and live inside plant tissue. The main objective of our study was the sequencing and full characterization of the genome of the fungal strain of O. australis (CCMB661) and its predicted secretome. The assembled genome had a total length of 30.31 Mb, N50 of 92.624 bp, GC content of 46.36%, and 8,043 protein-coding genes, 175 of which encoded CAZymes. In addition, the primary genes encoding proteins and critical enzymes during the infection process and those responsible for the host-pathogen interaction have been identified, including proteases (Pr1, Pr4), aminopeptidases, chitinases (Cht2), adhesins, lectins, lipases, and behavioral manipulators, such as enterotoxins, Protein Tyrosine Phosphatases (PTPs), and Glycoside Hydrolases (GHs). Our findings indicate that the presence of genes coding for Mad2 and GHs in O. australis may facilitate the infection process in plants, suggesting interkingdom colonization. Furthermore, our study elucidated the pathogenicity mechanisms for this Ophiocordyceps species, which still is scarcely studied.
Affiliation(s)
- Thaís Almeida de Menezes
  - Department of Biological Sciences, Universidade Estadual de Feira de Santana, Av. Transnordestina, s/n, Novo Horizonte, Feira de Santana 44036-900, BA, Brazil
- Flávia Figueira Aburjaile
  - Laboratory of Integrative Bioinformatics, Preventive Veterinary Medicine Department, Veterinary School, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, MG, Brazil
- Gabriel Quintanilha-Peixoto
  - Laboratory of Molecular and Computational Biology of Fungi, Institute of Biological Sciences, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Pampulha, Belo Horizonte 31270-901, MG, Brazil
- Luiz Marcelo Ribeiro Tomé
  - Laboratory of Molecular and Computational Biology of Fungi, Institute of Biological Sciences, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Pampulha, Belo Horizonte 31270-901, MG, Brazil
- Paula Luize Camargos Fonseca
  - Laboratory of Molecular and Computational Biology of Fungi, Institute of Biological Sciences, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Pampulha, Belo Horizonte 31270-901, MG, Brazil
- Thairine Mendes-Pereira
  - Laboratory of Molecular and Computational Biology of Fungi, Institute of Biological Sciences, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Pampulha, Belo Horizonte 31270-901, MG, Brazil
- Daniel Silva Araújo
  - Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA
- Tarcisio Silva Melo
  - Department of Biological Sciences, Universidade Estadual de Feira de Santana, Av. Transnordestina, s/n, Novo Horizonte, Feira de Santana 44036-900, BA, Brazil
- Rodrigo Bentes Kato
  - Laboratory of Molecular and Computational Biology of Fungi, Institute of Biological Sciences, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Pampulha, Belo Horizonte 31270-901, MG, Brazil
- Jacques Hubert Charles Delabie
  - Laboratory of Myrmecology, Centro de Pesquisa do Cacau, Ilhéus 45600-000, BA, Brazil
  - Department of Agricultural and Environmental Sciences, Universidade Estadual de Santa Cruz, Ilhéus 45600-970, BA, Brazil
- Sérvio Pontes Ribeiro
  - Laboratory of Ecology of Diseases and Forests, Nucleus of Biological Science, Campus Morro do Cruzeiro, Universidade Federal de Ouro Preto, Ouro Preto 35402-163, MG, Brazil
- Bertram Brenig
  - Institute of Veterinary Medicine, Burckhardtweg, University of Göttingen, 37073 Göttingen, Germany
- Vasco Azevedo
  - Laboratory of Cellular and Molecular Genetics, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, MG, Brazil
- Bruno Silva Andrade
  - Department of Biological Sciences, Universidade Federal do Sudoeste da Bahia, Av. José Moreira Sobrinho, s/n, Jequiezinho, Jequié 45205-490, BA, Brazil
- Aristóteles Góes-Neto
  - Laboratory of Molecular and Computational Biology of Fungi, Institute of Biological Sciences, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Pampulha, Belo Horizonte 31270-901, MG, Brazil
  - Correspondence: Tel.: +55-31-3409-3050
8
Merino GA, Saidi R, Milone DH, Stegmayer G, Martin MJ. Hierarchical deep learning for predicting GO annotations by integrating protein knowledge. Bioinformatics 2022; 38:4488-4496. [PMID: 35929781 PMCID: PMC9524999 DOI: 10.1093/bioinformatics/btac536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2021] [Revised: 07/18/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Experimental testing and manual curation are the most precise ways for assigning Gene Ontology (GO) terms describing protein functions. However, they are expensive, time-consuming and cannot cope with the exponential growth of data generated by high-throughput sequencing methods. Hence, researchers need reliable computational systems to help fill the gap with automatic function prediction. The results of the last Critical Assessment of Function Annotation challenge revealed that GO-terms prediction remains a very challenging task. Recent developments on deep learning are significantly breaking out the frontiers leading to new knowledge in protein research thanks to the integration of data from multiple sources. However, deep models hitherto developed for functional prediction are mainly focused on sequence data and have not achieved breakthrough performances yet. RESULTS We propose DeeProtGO, a novel deep-learning model for predicting GO annotations by integrating protein knowledge. DeeProtGO was trained for solving 18 different prediction problems, defined by the three GO sub-ontologies, the type of proteins, and the taxonomic kingdom. Our experiments reported higher prediction quality when more protein knowledge is integrated. We also benchmarked DeeProtGO against state-of-the-art methods on public datasets, and showed it can effectively improve the prediction of GO annotations. AVAILABILITY AND IMPLEMENTATION DeeProtGO and a case of use are available at https://github.com/gamerino/DeeProtGO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gabriela A Merino
  - Bioengineering and Bioinformatics Research and Development Institute (IBB), FI-UNER, CONICET, Oro Verde 3100, Argentina
  - Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB101SD, UK
- Rabie Saidi
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB101SD, UK
- Diego H Milone
  - Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina
- Georgina Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina
- Maria J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB101SD, UK
9
Identifying essential proteins from protein-protein interaction networks based on influence maximization. BMC Bioinformatics 2022; 23:339. [PMID: 35974329 PMCID: PMC9380286 DOI: 10.1186/s12859-022-04874-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/27/2022] [Accepted: 08/03/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Essential proteins are indispensable to the development and survival of cells. The identification of essential proteins not only helps in understanding the minimal requirements for cell survival, but also has practical significance in disease diagnosis, drug design and medical treatment. With the rapid accumulation of protein-protein interaction (PPI) data, computationally identifying essential proteins from protein-protein interaction networks (PINs) has become increasingly popular. To date, a number of approaches for essential protein identification based on PINs have been developed. RESULTS In this paper, we propose a new and effective approach called iMEPP to identify essential proteins from PINs by fusing multiple types of biological data and applying the influence maximization mechanism to the PINs. Concretely, we first integrate PPI data, gene expression data and Gene Ontology to construct weighted PINs, to alleviate the impact of the high false-positive rate in the raw PPI data. Then, we define the influence scores of nodes in PINs with both orthologous data and PIN topological information. Finally, we develop an influence discount algorithm to identify essential proteins based on the influence maximization mechanism. CONCLUSIONS We applied our method to identifying essential proteins from the Saccharomyces cerevisiae PIN. Experiments show that our iMEPP method outperforms the existing methods, which validates its effectiveness and advantage.
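The abstract does not spell out the discount rule used by iMEPP, so the following is only a generic sketch of the discount idea from the influence maximization literature, not the authors' algorithm. It uses plain node degree as the influence score (an assumption; iMEPP combines orthology and weighted topology) and the simple "single discount" heuristic: once a node is selected, each of its unselected neighbours loses one unit of score, so later picks are steered away from the neighbourhood of earlier ones. The function name and toy graph are hypothetical.

```python
def single_discount_ranking(adjacency, k):
    """Pick k nodes greedily, discounting the scores of neighbours of chosen nodes.

    adjacency -- dict mapping each node to the set of its neighbours (undirected)
    k         -- number of nodes to select
    """
    # Initial influence score: plain degree (a stand-in for a richer score).
    score = {node: len(neighbours) for node, neighbours in adjacency.items()}
    selected = []
    remaining = set(adjacency)

    while remaining and len(selected) < k:
        # Sorting first makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=lambda node: score[node])
        selected.append(best)
        remaining.remove(best)
        # Discount: an edge to an already selected node no longer counts.
        for neighbour in adjacency[best]:
            if neighbour in remaining:
                score[neighbour] -= 1

    return selected


# Toy network: a dense cluster around A and a separate small star around E.
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"F", "G"},
    "F": {"E"},
    "G": {"E"},
}
top_two = single_discount_ranking(toy_network, k=2)
```

On this toy network a plain degree ranking would break the three-way tie for second place among B, C and E arbitrarily, whereas the discounted ranking prefers E because B and C have already lost credit for their edge to A.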
10
Zhu X, Zhu Y, Tan Y, Chen Z, Wang L. An Iterative Method for Predicting Essential Proteins Based on Multifeature Fusion and Linear Neighborhood Similarity. Front Aging Neurosci 2022; 13:799500. [PMID: 35140599 PMCID: PMC8819145 DOI: 10.3389/fnagi.2021.799500] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/21/2021] [Accepted: 12/02/2021] [Indexed: 11/13/2022] Open
Abstract
Growing evidence has demonstrated that many biological processes are inseparable from the participation of key proteins. In this paper, a novel iterative method called linear neighborhood similarity-based protein multifeatures fusion (LNSPF) is proposed to identify potential key proteins based on multifeature fusion. In LNSPF, an original protein-protein interaction (PPI) network is first constructed from known protein-protein interaction data downloaded from benchmark databases, and topological features are extracted from it. Next, gene expression data of proteins are used to transform the original PPI network into a weighted PPI network based on the linear neighborhood similarity. After that, subcellular localization and homologous information of proteins are integrated to extract functional features for proteins. Then, based on both the functional and topological features obtained above, an iterative method is designed and carried out to predict potential key proteins. Finally, to evaluate the predictive performance of LNSPF, extensive experiments were conducted, and comparisons between LNSPF and 15 state-of-the-art competitive methods demonstrated that LNSPF achieves satisfactory recognition accuracy, which is markedly better than that achieved by each competing method.
Affiliation(s)
- Xianyou Zhu
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
- Yaocan Zhu
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Yihong Tan
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Zhiping Chen
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Lei Wang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
11
Predicting Essential Proteins Based on Integration of Local Fuzzy Fractal Dimension and Subcellular Location Information. Genes (Basel) 2022; 13:genes13020173. [PMID: 35205217 PMCID: PMC8872415 DOI: 10.3390/genes13020173] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/18/2021] [Revised: 01/08/2022] [Accepted: 01/12/2022] [Indexed: 11/17/2022] Open
Abstract
Essential proteins are indispensable to cells’ survival and development. Prediction and analysis of essential proteins are crucial for uncovering the mechanisms of cells. With the help of computer science and high-throughput technologies, forecasting essential proteins by protein–protein interaction (PPI) networks has become more efficient than traditional approaches (expensive experimental methods are generally used). Many computational algorithms were employed to predict the essential proteins; however, they have various restrictions. To improve the prediction accuracy, by introducing the Local Fuzzy Fractal Dimension (LFFD) of complex networks into the analysis of the PPI network, we propose a novel algorithm named LDS, which combines the LFFD of the PPI network with the protein subcellular location information. By testing the proposed LDS algorithm on three different yeast PPI networks, the experimental results show that LDS outperforms some state-of-the-art essential protein-prediction techniques.
12
Liu Y, Liang H, Zou Q, He Z. Significance-Based Essential Protein Discovery. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:633-642. [PMID: 32750873 DOI: 10.1109/tcbb.2020.3004364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The identification of essential proteins is an important problem in bioinformatics. During the past decades, many centrality measures and algorithms have been proposed to address this issue. However, existing methods still suffer from the following drawbacks: (1) the lack of a context-free and readily interpretable quantification of their centrality values; (2) the difficulty of specifying a proper threshold for their centrality values; (3) the inability to control the quality of reported essential proteins in a statistically sound manner. To overcome the limitations of existing solutions, we tackle the essential protein discovery problem from a significance testing perspective. More precisely, the essential protein discovery problem is formulated as a multiple hypothesis testing problem, where the null hypothesis for each protein is that it is not an essential protein. To quantify the statistical significance of each protein, we present a p-value calculation method in which both the degree and the local clustering coefficient are used as the test statistic and the Erdős-Rényi model is employed as the random graph model. After calculating the p-value for each protein, the false discovery rate is used as the error rate in the multiple testing correction procedure. Our significance-based essential protein discovery method, named SigEP, is tested on both simulated networks and real PPI networks. The experimental results show that our method achieves better performance than the competing algorithms.
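A simplified sketch of the significance-testing idea is given here, under stated assumptions rather than as the SigEP procedure itself. It uses only the degree as the test statistic (SigEP also uses the local clustering coefficient), estimates the edge probability of the Erdős-Rényi model from the observed edge density, and applies the standard Benjamini-Hochberg step-up rule for false discovery rate control. Under G(n, p), the degree of a node follows a Binomial(n - 1, p) distribution, so the p-value of an observed degree d is the upper tail P(X >= d). All function names and the toy network are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def degree_p_values(edges):
    """P-value of each node's degree under an Erdos-Renyi null model.

    edges -- list of distinct undirected edges (u, v) with no self-loops
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Edge probability estimated as the observed edge density (an assumption).
    p_edge = len(edges) / comb(n, 2)
    return {node: binomial_upper_tail(degree[node], n - 1, p_edge) for node in nodes}


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the set of items rejected by the Benjamini-Hochberg procedure."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            cutoff = rank  # largest rank whose p-value passes its threshold
    return {name for name, _ in ranked[:cutoff]}


# Toy network: one hub "H" joined to nine otherwise unconnected proteins.
toy_edges = [("H", leaf) for leaf in "ABCDEFGHI" if leaf != "H"] + [("H", "J")]
p_values = degree_p_values(toy_edges)
significant = benjamini_hochberg(p_values, alpha=0.05)
```

In this toy case only the hub is reported, since a degree of 9 is extremely unlikely under the null model while a degree of 1 is not.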
|
13
|
Meng X, Li W, Peng X, Li Y, Li M. Protein interaction networks: centrality, modularity, dynamics, and applications. Frontiers of Computer Science 2021; 15:156902. [DOI: 10.1007/s11704-020-8179-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/07/2018] [Accepted: 08/12/2020] [Indexed: 01/03/2025]
|
14
|
Wang N, Zeng M, Li Y, Wu FX, Li M. Essential Protein Prediction Based on node2vec and XGBoost. J Comput Biol 2021; 28:687-700. [PMID: 34152838 DOI: 10.1089/cmb.2020.0543] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022]
Abstract
Essential proteins are vital to the survival of organisms and cells. Identification of essential proteins lays a solid foundation for understanding protein functions and discovering drug targets. Traditional biological experiments are expensive and time-consuming. Recently, many computational methods have been proposed. However, noise in protein-protein interaction (PPI) networks affects the efficiency of essential protein prediction. It is necessary to construct a credible PPI network by using other useful biological information to reduce the effects of this noise. In this article, we propose a model, Ess-NEXG, to identify essential proteins, which integrates biological information, including orthologous information, subcellular localization information, RNA-Seq information, and the PPI network. In our model, first, we construct a credible weighted PPI network by using different types of biological information. Second, we extract the topological features of proteins in the constructed weighted PPI network by using the node2vec technique. Last, we use eXtreme Gradient Boosting (XGBoost) to predict essential proteins from these topological features. Extensive results show that our model performs better than other computational methods.
Affiliation(s)
- Nian Wang: School of Computer Science and Engineering, Central South University, Changsha, P.R. China
- Min Zeng: School of Computer Science and Engineering, Central South University, Changsha, P.R. China
- Yiming Li: School of Computer Science and Engineering, Central South University, Changsha, P.R. China
- Fang-Xiang Wu: Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Canada; Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Canada
- Min Li: School of Computer Science and Engineering, Central South University, Changsha, P.R. China
|
15
|
Zhong J, Tang C, Peng W, Xie M, Sun Y, Tang Q, Xiao Q, Yang J. A novel essential protein identification method based on PPI networks and gene expression data. BMC Bioinformatics 2021; 22:248. [PMID: 33985429 PMCID: PMC8120700 DOI: 10.1186/s12859-021-04175-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 08/07/2020] [Accepted: 05/06/2021] [Indexed: 02/08/2023]
Abstract
Background Some proposed methods for identifying essential proteins have better results by using biological information. Gene expression data is generally used to identify essential proteins. However, gene expression data is prone to fluctuations, which may affect the accuracy of essential protein identification. Therefore, we propose an essential protein identification method based on gene expression and the PPI network data to calculate the similarity of "active" and "inactive" state of gene expression in a cluster of the PPI network. Our experiments show that the method can improve the accuracy in predicting essential proteins. Results In this paper, we propose a new measure named JDC, which is based on the PPI network data and gene expression data. The JDC method offers a dynamic threshold method to binarize gene expression data. After that, it combines the degree centrality and Jaccard similarity index to calculate the JDC score for each protein in the PPI network. We benchmark the JDC method on four organisms respectively, and evaluate our method by using ROC analysis, modular analysis, jackknife analysis, overlapping analysis, top analysis, and accuracy analysis. The results show that the performance of JDC is better than DC, IC, EC, SC, BC, CC, NC, PeC, and WDC. We compare JDC with both NF-PIN and TS-PIN methods, which predict essential proteins through active PPI networks constructed from dynamic gene expression. Conclusions We demonstrate that the new centrality measure, JDC, is more efficient than state-of-the-art prediction methods with same input. The main ideas behind JDC are as follows: (1) Essential proteins are generally densely connected clusters in the PPI network. (2) Binarizing gene expression data can screen out fluctuations in gene expression profiles. (3) The essentiality of the protein depends on the similarity of "active" and "inactive" state of gene expression in a cluster of the PPI network.
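The two ingredients named in this abstract, binarized gene expression and Jaccard similarity over PPI neighbours, can be sketched as follows. This is an illustration only and not the published JDC formula: the abstract does not give the dynamic threshold or the exact way the Jaccard index is combined with degree centrality, so the per-gene mean threshold and the sum over neighbours used here are assumptions, and the names `binarize`, `jaccard` and `neighbour_similarity_score` are hypothetical.

```python
def binarize(profile):
    """Return the set of time points where expression exceeds the profile mean.

    The mean is an illustrative per-gene threshold (an assumption),
    standing in for the dynamic threshold mentioned in the abstract.
    """
    mean = sum(profile) / len(profile)
    return {t for t, x in enumerate(profile) if x > mean}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbour_similarity_score(adj, expression):
    """Sum, over each protein's neighbours, of the Jaccard similarity of
    their binarized "active" time points.

    Summing over neighbours makes the score grow with degree, so it mixes
    degree centrality with co-activity (an assumed combination).
    """
    active = {gene: binarize(profile) for gene, profile in expression.items()}
    return {
        v: sum(jaccard(active[v], active[u]) for u in nbrs)
        for v, nbrs in adj.items()
    }


# Toy network: A interacts with B and C; B is co-active with A, C is not.
adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
expression = {
    "A": [1, 5, 5, 1],
    "B": [0, 4, 4, 0],
    "C": [4, 0, 0, 4],
}
scores = neighbour_similarity_score(adj, expression)
print(scores)
```

Binarizing first means small fluctuations around the threshold do not change the score, which is the robustness argument made in point (2) of the abstract.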
Affiliation(s)
- Jiancheng Zhong: School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China; Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Changsha, 410083, China
- Chao Tang: School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Wei Peng: College of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, Yunnan, China
- Minzhu Xie: School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Yusui Sun: School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Qiang Tang: College of Engineering and Design, Hunan Normal University, Changsha, 410081, China
- Qiu Xiao: School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Jiahong Yang: School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
|
16
|
Ahmed NM, Chen L, Li B, Liu W, Dai C. A random walk-based method for detecting essential proteins by integrating the topological and biological features of PPI network. Soft Comput 2021. [DOI: 10.1007/s00500-021-05780-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/01/2022]
|
17
|
CEGSO: Boosting Essential Proteins Prediction by Integrating Protein Complex, Gene Expression, Gene Ontology, Subcellular Localization and Orthology Information. Interdiscip Sci 2021; 13:349-361. [PMID: 33772722 DOI: 10.1007/s12539-021-00426-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/02/2020] [Revised: 02/04/2021] [Accepted: 03/05/2021] [Indexed: 01/13/2023]
Abstract
Essential proteins are assumed to be an indispensable element in sustaining normal physiological function and crucial to drug design and disease diagnosis. The discovery of essential proteins is of great importance in revealing the molecular mechanisms and biological processes. Owing to the tedious biological experiment, many numerical methods have been developed to discover key proteins by mining the features of the high throughput data. Appropriate integration of differential biological information based on protein-protein interaction (PPI) network has been proven useful in predicting essential proteins. The main intention of this research is to provide a comprehensive study and a review on identifying essential proteins by integrating multi-source data and provide guidance for researchers. Detailed analysis and comparison of current essential protein prediction algorithms have been carried out and tested on benchmark PPI networks. In addition, based on the previous method TEGS (short for the network Topology, gene Expression, Gene ontology, and Subcellular localization), we improve the performance of predicting essential proteins by incorporating known protein complex information, the gene expression profile, Gene Ontology (GO) terms information, subcellular localization information, and protein's orthology data into the PPI network, named CEGSO. The simulation results show that CEGSO achieves more accurate and robust results than other compared methods under different test datasets with various evaluation measurements.
|
18
|
Yusuf SM, Zhang F, Zeng M, Li M. DeepPPF: A deep learning framework for predicting protein family. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.11.062] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/27/2022]
|
19
|
Zeng M, Li M, Fei Z, Wu FX, Li Y, Pan Y, Wang J. A Deep Learning Framework for Identifying Essential Proteins by Integrating Multiple Types of Biological Information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:296-305. [PMID: 30736002 DOI: 10.1109/tcbb.2019.2897679] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 06/09/2023]
Abstract
Computational methods including centrality and machine learning-based methods have been proposed to identify essential proteins for understanding the minimum requirements of the survival and evolution of a cell. In centrality methods, researchers are required to design a score function which is based on prior knowledge, yet is usually not sufficient to capture the complexity of biological information. In machine learning-based methods, some selected biological features cannot represent the complete properties of biological information as they lack a computational framework to automatically select features. To tackle these problems, we propose a deep learning framework to automatically learn biological features without prior knowledge. We use node2vec technique to automatically learn a richer representation of protein-protein interaction (PPI) network topologies than a score function. Bidirectional long short term memory cells are applied to capture non-local relationships in gene expression data. For subcellular localization information, we exploit a high dimensional indicator vector to characterize their feature. To evaluate the performance of our method, we tested it on PPI network of S. cerevisiae. Our experimental results demonstrate that the performance of our method is better than traditional centrality methods and is superior to existing machine learning-based methods. To explore which of the three types of biological information is the most vital element, we conduct an ablation study by removing each component in turn. Our results show that the PPI network embedding contributes most to the improvement. In addition, gene expression profiles and subcellular localization information are also helpful to improve the performance in identification of essential proteins.
|
20
|
Chen X, Xu M, An Y. Identifying the essential nodes in network pharmacology based on multilayer network combined with random walk algorithm. J Biomed Inform 2020; 114:103666. [PMID: 33352331 DOI: 10.1016/j.jbi.2020.103666] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/20/2020] [Revised: 12/11/2020] [Accepted: 12/12/2020] [Indexed: 11/15/2022]
Abstract
Compared with the general complex network, the multilayer network is more suitable for the description of reality. It can be used as a tool of network pharmacology to analyze the mechanism of drug action from an overall perspective. Combined with a random walk algorithm, it measures the importance of nodes across the entire network rather than within a single layer. Here a four-layer network was constructed from data about the action process of prescriptions, consisting of ingredients, target proteins, metabolic pathways and diseases. The random walk algorithm was used to calculate the betweenness centrality of the protein-layer nodes and rank their importance. Using this method, we screened out the top 10% of proteins that play a key role in treatment. The prescription Xiaochaihu Decoction was taken as an example to validate our method. The selected proteins were compared with those that have been validated to be associated with the treated diseases. The results showed that the accuracy of the method was no less than that of the topology-based method on a single-layer network. The applicability of our method was confirmed with another prescription, Yupingfeng Decoction. Our study demonstrated that a multilayer network combined with a random walk algorithm is an effective method for pre-screening vital target proteins related to prescriptions.
Affiliation(s)
- Xianlai Chen: Big Data Institute, Central South University, Changsha, Hunan, China
- Mingyue Xu: Big Data Institute, Central South University, Changsha, Hunan, China
- Ying An: Big Data Institute, Central South University, Changsha, Hunan, China
|
21
|
Zhang W, Xu J, Zou X. Predicting Essential Proteins by Integrating Network Topology, Subcellular Localization Information, Gene Expression Profile and GO Annotation Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:2053-2061. [PMID: 31095490 DOI: 10.1109/tcbb.2019.2916038] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/09/2023]
Abstract
Essential proteins are indispensable for maintaining normal cellular functions. Identification of essential proteins from protein-protein interaction (PPI) networks has become a hot topic in recent years. Traditional approaches based on biological experiments are time-consuming and expensive. Although many computational methods have been developed in recent years, their prediction accuracy is still unsatisfactory. In this research, by introducing protein subcellular localization information, we define a new measurement for characterizing a protein's subcellular localization essentiality, and we develop a new data fusion based method for identifying essential proteins, named TEGS, which integrates network topology, gene expression profile, GO annotation information, and protein subcellular localization information. To demonstrate the efficiency of the proposed method TEGS, we evaluate its performance on two Saccharomyces cerevisiae datasets and compare it with seven other state-of-the-art methods (DC, BC, NC, PeC, WDC, SON, and TEO) in terms of true predicted number, jackknife curve, and precision-recall curve. Simulation results show that TEGS outperforms the other compared methods in identifying essential proteins. The source code of TEGS is freely available at https://github.com/wzhangwhu/TEGS.
|
22
|
Athira K, Gopakumar G. An integrated method for identifying essential proteins from multiplex network model of protein-protein interactions. J Bioinform Comput Biol 2020; 18:2050020. [PMID: 32795133 DOI: 10.1142/s0219720020500201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
Abstract
Cell survival requires the presence of essential proteins. Detection of essential proteins is relevant not only because of the critical biological functions they perform but also the role played by them as a drug target against pathogens. Several computational techniques are in place to identify essential proteins based on protein-protein interaction (PPI) network. Essential protein detection using only physical interaction data of proteins is challenging due to its inherent uncertainty. Hence, in this work, we propose a multiplex network-based framework that incorporates multiple protein interaction data from their physical, coexpression and phylogenetic profiles. An extended version termed as multiplex eigenvector centrality (MEC) is used to identify essential proteins from this network. The methodology integrates the score obtained from the multiplex analysis with subcellular localization and Gene Ontology information and is implemented using Saccharomyces cerevisiae datasets. The proposed method outperformed many recent essential protein prediction techniques in the literature.
Affiliation(s)
- K Athira: Department of Computer Science and Engineering, National Institute of Technology Calicut, Kozhikkode, Kerala 673601, India
- G Gopakumar: Department of Computer Science and Engineering, National Institute of Technology Calicut, Kozhikkode, Kerala 673601, India
|
23
|
Li G, Li M, Wang J, Li Y, Pan Y. United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1451-1458. [PMID: 30596582 DOI: 10.1109/tcbb.2018.2889978] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 05/06/2023]
Abstract
Identifying essential proteins plays an important role in disease study, drug design, and understanding the minimal requirement for cellular life. Computational methods for essential proteins discovery overcome the disadvantages of biological experimental methods that are often time-consuming, expensive, and inefficient. The topological features of protein-protein interaction (PPI) networks are often used to design computational prediction methods, such as Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), and Neighborhood Centrality (NC). However, the prediction accuracies of these individual methods still have space to be improved. Studies show that additional information, such as orthologous relations, helps discover essential proteins. Many researchers have proposed different methods by combining multiple information sources to gain improvement of prediction accuracy. In this study, we find that essential proteins appear in triangular structure in PPI network significantly more often than nonessential ones. Based on this phenomenon, we propose a novel pure centrality measure, so-called Neighborhood Closeness Centrality (NCC). Accordingly, we develop a new combination model, Extended Pareto Optimality Consensus model, named EPOC, to fuse NCC and Orthology information and a novel essential proteins identification method, NCCO, is fully proposed. Compared with seven existing classic centrality methods (DC, BC, IC, CC, SC, EC, and NC) and three consensus methods (PeC, ION, and CSC), our results on S.cerevisiae and E.coli datasets show that NCCO has clear advantages. As a consensus method, EPOC also yields better performance than the random walk model.
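The observation that essential proteins sit in triangles more often than nonessential ones suggests comparing a plain degree count with a per-node triangle count. The sketch below shows only those two topological quantities; it is not the NCC measure or the EPOC model, whose formulas the abstract does not give. The function names and the toy graph are hypothetical.

```python
from itertools import combinations


def degree_centrality(adj):
    """Degree Centrality (DC): the number of neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def triangle_counts(adj):
    """Number of triangles each node participates in.

    For node v, a triangle exists for every pair of its neighbours
    (a, b) that are themselves connected.
    """
    return {
        v: sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        for v, nbrs in adj.items()
    }


# Toy undirected graph: nodes 1, 2, 3 form a triangle; node 4 hangs off node 1.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(degree_centrality(adj))
print(triangle_counts(adj))
```

In this toy graph node 4 contributes to the degree of node 1 but to none of its triangles, which is the kind of distinction a triangle-aware measure can exploit and a pure degree count cannot.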
|
24
|
Li X, Li W, Zeng M, Zheng R, Li M. Network-based methods for predicting essential genes or proteins: a survey. Brief Bioinform 2020; 21:566-583. [PMID: 30776072 DOI: 10.1093/bib/bbz017] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 10/31/2018] [Revised: 01/21/2019] [Accepted: 01/22/2019] [Indexed: 01/03/2025]
Abstract
Genes that are thought to be critical for the survival of organisms or cells are called essential genes. The prediction of essential genes and their products (essential proteins) is of great value in exploring the mechanisms of complex diseases, studying the minimal genome required by living cells and developing new drug targets. As laboratory methods are often complicated, costly and time-consuming, a great many computational methods have been proposed to identify essential genes/proteins at the network level, building on a deeper understanding of network biology and the rapid development of biotechnologies. By analyzing the topological characteristics of essential genes/proteins in protein-protein interaction networks (PINs), integrating biological information and considering the dynamic features of PINs, network-based methods have proved effective in the identification of essential genes/proteins. In this paper, we survey the advanced methods for network-based prediction of essential genes/proteins and present the challenges and directions for future research.
Affiliation(s)
- Xingyi Li: School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Wenkai Li: School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Min Zeng: School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Ruiqing Zheng: School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Min Li: School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
|
25
|
Yan C, Wu FX, Wang J, Duan G. PESM: predicting the essentiality of miRNAs based on gradient boosting machines and sequences. BMC Bioinformatics 2020; 21:111. [PMID: 32183740 PMCID: PMC7079416 DOI: 10.1186/s12859-020-3426-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/28/2019] [Accepted: 02/21/2020] [Indexed: 11/16/2022]
Abstract
Background: MicroRNAs (miRNAs) are small noncoding RNA molecules that act as direct posttranscriptional regulators of mRNA targets. Studies have indicated that miRNAs play key roles in complex diseases by taking part in many biological processes, such as cell growth and cell death. Therefore, in order to improve the effectiveness of disease diagnosis and treatment, it is appealing to develop advanced computational methods for predicting the essentiality of miRNAs. Result: In this study, we propose a method (PESM) to predict miRNA essentiality based on gradient boosting machines and miRNA sequences. First, PESM extracts the sequence and structural features of miRNAs. Then it uses gradient boosting machines to predict the essentiality of miRNAs. We conduct 5-fold cross-validation to assess the prediction performance of our method. The area under the receiver operating characteristic curve (AUC), F-measure and accuracy (ACC) are used as the metrics to evaluate the prediction performance. We also compare PESM with three competing methods: miES, Gaussian Naive Bayes and Support Vector Machine. Conclusion: The experimental results show that PESM achieves better prediction performance (AUC: 0.9117, F-measure: 0.8572, ACC: 0.8516) than the three competing methods. In addition, the relative importance of the features further shows that the newly added features help improve prediction performance.
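The three evaluation metrics named in this abstract have standard definitions, sketched below in plain Python. This is not code from PESM; the function names and the toy labels and scores are hypothetical, and the 0.5 decision threshold used to turn scores into class predictions is an assumption.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive is scored
    above a randomly chosen negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f_measure_and_accuracy(labels, predictions):
    """Return (F-measure, accuracy) for binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return f1, (tp + tn) / len(pairs)


# Toy example: 1 = essential, 0 = nonessential.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold
print(auc(labels, scores))
print(f_measure_and_accuracy(labels, predictions))
```

AUC is computed from the ranking of scores alone, whereas F-measure and accuracy depend on the chosen threshold, which is one reason studies such as this one report all three.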
Affiliation(s)
- Cheng Yan: Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China; School of Computer and Information, Qiannan Normal University for Nationalities, Longshan Road, DuYun, 558000, China
- Fang-Xiang Wu: Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Jianxin Wang: Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China
- Guihua Duan: Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China
|
26
|
Jia K, Zhou Y, Cui Q. Quantifying Gene Essentiality Based on the Context of Cellular Components. Front Genet 2020; 10:1342. [PMID: 32038710 PMCID: PMC6985572 DOI: 10.3389/fgene.2019.01342] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/12/2019] [Accepted: 12/09/2019] [Indexed: 11/26/2022]
Abstract
Different genes have their protein products localized in various subcellular compartments. The diversity in protein localization may serve as a gene characteristic, revealing gene essentiality from a subcellular perspective. To measure this diversity, we introduced a Subcellular Diversity Index (SDI) based on the Gene Ontology-Cellular Component Ontology (GO-CCO) and a semantic similarity measure of GO terms. Analyses revealed that SDI of human genes was well correlated with some known measures of gene essentiality, including protein–protein interaction (PPI) network topology measurements, dN/dS ratio, homologous gene number, expression level and tissue specificity. In addition, SDI had a good performance in predicting human essential genes (AUC = 0.702) and drug target genes (AUC = 0.704), and drug targets with higher SDI scores tended to cause more side-effects. The results suggest that SDI could be used to identify novel drug targets and to guide the filtering of drug targets with fewer potential side effects. Finally, we developed a user-friendly online database for querying SDI score for genes across eight species, and the predicted probabilities of human drug target based on SDI. The online database of SDI is available at: http://www.cuilab.cn/sdi.
Affiliation(s)
- Kaiwen Jia: Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, Beijing, China
- Yuan Zhou: Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, Beijing, China
- Qinghua Cui: Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, Beijing, China
|
27
|
Song F, Cui C, Gao L, Cui Q. miES: predicting the essentiality of miRNAs with machine learning and sequence features. Bioinformatics 2019; 35:1053-1054. [PMID: 30165607 DOI: 10.1093/bioinformatics/bty738] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 06/07/2018] [Revised: 08/06/2018] [Accepted: 08/23/2018] [Indexed: 01/20/2023]
Abstract
MOTIVATION: MicroRNAs (miRNAs) are a class of small noncoding RNA molecules that regulate gene expression at the post-transcriptional level and play important roles in health and disease. To dissect the critical miRNAs in the miRNAome, the essentiality of miRNAs must be predicted; however, bioinformatics methods for this purpose are limited. RESULTS: Here we propose miES, a novel algorithm for the prioritization of miRNA essentiality. miES implements a machine learning strategy based on learning from positive and unlabeled samples. miES uses sequence features of known essential miRNAs and performs a miRNAome-wide search for new essential miRNAs. miES achieves an AUC of 0.9 in 5-fold cross validation. Moreover, experiments further show that the miES score is significantly correlated with some established biological metrics for miRNA importance, such as miRNA conservation, miRNA disease spectrum width (DSW) and expression level. AVAILABILITY AND IMPLEMENTATION: The R source code is available at the download page of the web server, http://www.cuilab.cn/mies. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fei Song: School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
- Chunmei Cui: Department of Biomedical Informatics, Center for Noncoding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing, China
- Lin Gao: School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
- Qinghua Cui: Department of Biomedical Informatics, Center for Noncoding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing, China; Center of Bioinformatics, Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
|
28
|
Li G, Li M, Peng W, Li Y, Pan Y, Wang J. A novel extended Pareto Optimality Consensus model for predicting essential proteins. J Theor Biol 2019; 480:141-149. [PMID: 31398315 DOI: 10.1016/j.jtbi.2019.08.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/07/2019] [Revised: 08/02/2019] [Accepted: 08/06/2019] [Indexed: 12/11/2022]
Abstract
Essential proteins have vital functions; when they are destroyed, cells die or stop reproducing. Therefore, it is very important to identify essential proteins among a large number of other proteins. Because biological experimental methods are time-consuming, expensive, and inefficient, computational methods have become increasingly popular for recognizing them. In the early stages, these methods relied mainly on protein-protein interaction (PPI) information, which limited their discovery capacity. Researchers have since developed novel methods that fuse multiple information sources to improve prediction accuracy. Several observations motivate this work: essential proteins are more important and more conserved during evolution; their neighbors in PPI networks are likely to be essential; and PPI data contain many false positives. Whether a protein is essential can therefore be assessed by the importance of the protein itself, the relevance of its neighbors and the reliability of its PPIs. The importance of neighbors and the reliability of PPIs can be further integrated into a neighborhood feature. In the study, orthologous information, edge-clustering coefficient and gene expression information are used to measure the importance of a protein itself, the importance of the neighbors and the reliability of PPIs, respectively. Then, a novel extended POC model, E_POC, is proposed to fuse the above information to discover essential proteins, and a weighted PPI network is constructed. The proteins ranked high according to their weights are treated as candidate essential proteins. E_POC outperforms the existing classical methods on S. cerevisiae and E. coli data.
Affiliation(s)
- Gaoshi Li: School of Computer Science and Engineering, Central South University, Changsha 410083, China; Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi 541004, China
- Min Li: School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Wei Peng: Computer Center/Faculty of Information Engineering and Automation of Kunming University of Science and Technology, Kunming, Yunnan 650093, China
- Yaohang Li: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Yi Pan: Department of Computer Science, Georgia State University, Atlanta, GA 30302-4110, USA
- Jianxin Wang: School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
29
|
Shen Y, Ding Y, Tang J, Zou Q, Guo F. Critical evaluation of web-based prediction tools for human protein subcellular localization. Brief Bioinform 2019; 21:1628-1640. [DOI: 10.1093/bib/bbz106] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 05/07/2019] [Revised: 07/23/2019] [Accepted: 07/27/2019] [Indexed: 11/12/2022]
Abstract
Human protein subcellular localization has an important research value in biological processes, also in elucidating protein functions and identifying drug targets. Over the past decade, a number of protein subcellular localization prediction tools have been designed and made freely available online. The purpose of this paper is to summarize the progress of research on the subcellular localization of human proteins in recent years, including commonly used data sets proposed by the predecessors and the performance of all selected prediction tools against the same benchmark data set. We carry out a systematic evaluation of several publicly available subcellular localization prediction methods on various benchmark data sets. Among them, we find that mLASSO-Hum and pLoc-mHum provide a statistically significant improvement in performance, as measured by the value of accuracy, relative to the other methods. Meanwhile, we build a new data set using the latest version of Uniprot database and construct a new GO-based prediction method HumLoc-LBCI in this paper. Then, we test all selected prediction tools on the new data set. Finally, we discuss the possible development directions of human protein subcellular localization. Availability: The codes and data are available from http://www.lbci.cn/syn/.
Affiliation(s)
- Yinan Shen
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yijie Ding
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jijun Tang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
  - School of Computational Science and Engineering, University of South Carolina, Columbia, U.S
  - Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China

30
Cui C, Shi B, Shi J, Zhou Y, Cui Q. Defining the Importance Score of Human MicroRNAs and Their Single Nucleotide Mutants Using Random Forest Regression and Sequence Data. Advanced Theory and Simulations 2019. [DOI: 10.1002/adts.201900083] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
Affiliation(s)
- Chunmei Cui
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Bing Shi
  - Department of Cardiology, Beijing Military General Hospital, Beijing 100700, China
- Jiangcheng Shi
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Yuan Zhou
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
- Qinghua Cui
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, China
  - Center of Bioinformatics, Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China

31
Zhang Z, Ruan J, Gao J, Wu FX. Predicting essential proteins from protein-protein interactions using order statistics. J Theor Biol 2019; 480:274-283. [PMID: 31251944 DOI: 10.1016/j.jtbi.2019.06.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/26/2018] [Revised: 03/24/2019] [Accepted: 06/24/2019] [Indexed: 12/11/2022]
Abstract
Many computational methods have been proposed to predict essential proteins from protein-protein interaction (PPI) networks. However, it is still challenging to improve the prediction accuracy. In this study, we propose a new method, esPOS (essential proteins Predictor using Order Statistics) to predict essential proteins from PPI networks. Firstly, we refine the networks by using gene expression information and subcellular localization information. Secondly, we design some new features, which combine the protein predicted secondary structure with PPI network. We show that these new features are useful to predict essential proteins. Thirdly, we optimize these features by using a greedy method, and combine the optimized features by order statistic method. Our method achieves the prediction accuracy of 0.76-0.79 on two network datasets. The proposed method is available at https://sourceforge.net/projects/espos/.
Affiliation(s)
- Zhaopeng Zhang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Jishou Ruan
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Jianzhao Gao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Fang-Xiang Wu
  - Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada

32
Zhang F, Song H, Zeng M, Li Y, Kurgan L, Li M. DeepFunc: A Deep Learning Framework for Accurate Prediction of Protein Functions from Protein Sequences and Interactions. Proteomics 2019; 19:e1900019. [PMID: 30941889 DOI: 10.1002/pmic.201900019] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Received: 01/12/2019] [Revised: 03/18/2019] [Indexed: 01/06/2023]
Abstract
Annotation of protein functions plays an important role in understanding life at the molecular level. High-throughput sequencing produces massive numbers of raw proteins sequences and only about 1% of them have been manually annotated with functions. Experimental annotations of functions are expensive, time-consuming and do not keep up with the rapid growth of the sequence numbers. This motivates the development of computational approaches that predict protein functions. A novel deep learning framework, DeepFunc, is proposed which accurately predicts protein functions from protein sequence- and network-derived information. More precisely, DeepFunc uses a long and sparse binary vector to encode information concerning domains, families, and motifs collected from the InterPro tool that is associated with the input protein sequence. This vector is processed with two neural layers to obtain a low-dimensional vector which is combined with topological information extracted from protein-protein interactions (PPIs) and functional linkages. The combined information is processed by a deep neural network that predicts protein functions. DeepFunc is empirically and comparatively tested on a benchmark testing dataset and the Critical Assessment of protein Function Annotation algorithms (CAFA) 3 dataset. The experimental results demonstrate that DeepFunc outperforms current methods on the testing dataset and that it secures the highest Fmax = 0.54 and AUC = 0.94 on the CAFA3 dataset.
Affiliation(s)
- Fuhao Zhang
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China
- Hong Song
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China
- Min Zeng
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China
- Yaohang Li
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China
  - Department of Computer Science, Old Dominion University, Norfolk, VA, 23529, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China

33
Ijaq J, Malik G, Kumar A, Das PS, Meena N, Bethi N, Sundararajan VS, Suravajhala P. A model to predict the function of hypothetical proteins through a nine-point classification scoring schema. BMC Bioinformatics 2019; 20:14. [PMID: 30621574 PMCID: PMC6325861 DOI: 10.1186/s12859-018-2554-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/05/2018] [Accepted: 11/30/2018] [Indexed: 12/14/2022]
Abstract
Background: Hypothetical proteins [HP] are those that are predicted to be expressed in an organism, but no evidence of their existence is known. In the recent past, annotation and curation efforts have helped overcome the challenge in understanding their diverse functions. Techniques to decipher sequence-structure-function relationship, especially in terms of functional modelling of the HPs have been developed by researchers, but using the features as classifiers for HPs has not been attempted. With the rise in number of annotation strategies, next-generation sequencing methods have provided further understanding the functions of HPs.
Results: In our previous work, we developed a six-point classification scoring schema with annotation pertaining to protein family scores, orthology, protein interaction/association studies, bidirectional best BLAST hits, sorting signals, known databases and visualizers which were used to validate protein interactions. In this study, we introduced three more classifiers to our annotation system, viz. pseudogenes linked to HPs, homology modelling and non-coding RNAs associated to HPs. We discuss the challenges and performance of these classifiers using machine learning heuristics with an improved accuracy from Perceptron (81.08 to 97.67), Naive Bayes (54.05 to 96.67), Decision tree J48 (67.57 to 97.00), and SMO_npolyk (59.46 to 96.67).
Conclusion: With the introduction of three new classification features, the performance of the nine-point classification scoring schema has an improved accuracy to functionally annotate the HPs.
Affiliation(s)
- Johny Ijaq
  - Department of Biotechnology, Osmania University, Hyderabad, 500007 India
  - Bioclues.org, Kukatpally, Hyderabad, 500072 India
- Girik Malik
  - Department of Pediatrics, The Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, The Ohio State University, Columbus, OH USA
  - Bioclues.org, Kukatpally, Hyderabad, 500072 India
  - Labrynthe, New Delhi, India
- Anuj Kumar
  - Bioclues.org, Kukatpally, Hyderabad, 500072 India
  - Advanced Center for Computational and Applied Biotechnology, Uttarakhand Council for Biotechnology, Dehradun, 248007 India
- Partha Sarathi Das
  - Bioclues.org, Kukatpally, Hyderabad, 500072 India
  - Department of Microbiology, Bioinformatics Infrastructure Facility, Vidyasagar University, Midnapore, India
- Narendra Meena
  - Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Statue Circle, RJ 302001 India
- Neeraja Bethi
  - Department of Biotechnology, Osmania University, Hyderabad, 500007 India
- Prashanth Suravajhala
  - Bioclues.org, Kukatpally, Hyderabad, 500072 India
  - Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Statue Circle, RJ 302001 India

34
Elahi A, Babamir SM. Identification of essential proteins based on a new combination of topological and biological features in weighted protein-protein interaction networks. IET Syst Biol 2018; 12:247-257. [PMID: 30472688 PMCID: PMC8687241 DOI: 10.1049/iet-syb.2018.5024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/29/2018] [Revised: 04/23/2018] [Accepted: 04/30/2018] [Indexed: 02/01/2023]
Abstract
The identification of essential proteins in protein-protein interaction (PPI) networks is not only important in understanding the process of cellular life but also useful in diagnosis and drug design. The network topology-based centrality measures are sensitive to noise of network. Moreover, these measures cannot detect low-connectivity essential proteins. The authors have proposed a new method using a combination of topological centrality measures and biological features based on statistical analyses of essential proteins and protein complexes. With incomplete PPI networks, they face the challenge of false-positive interactions. To remove these interactions, the PPI networks are weighted by gene ontology. Furthermore, they use a combination of classifiers, including the newly proposed measures and traditional weighted centrality measures, to improve the precision of identification. This combination is evaluated using the logistic regression model in terms of significance levels. The proposed method has been implemented and compared to both previous and more recent efficient computational methods using six statistical standards. The results show that the proposed method is more precise in identifying essential proteins than the previous methods. This level of precision was obtained through the use of four different data sets: YHQ-W, YMBD-W, YDIP-W and YMIPS-W.
Affiliation(s)
- Abdolkarim Elahi
  - Department of Software Engineering, University of Kashan, Kashan, Iran

35
Shen Y, Tang J, Guo F. Identification of protein subcellular localization via integrating evolutionary and physicochemical information into Chou's general PseAAC. J Theor Biol 2018; 462:230-239. [PMID: 30452958 DOI: 10.1016/j.jtbi.2018.11.012] [Citation(s) in RCA: 101] [Impact Index Per Article: 14.4] [Received: 09/17/2018] [Revised: 11/07/2018] [Accepted: 11/15/2018] [Indexed: 01/07/2023]
Abstract
Identifying the location of proteins in a cell plays an important role in understanding their functions, such as drug design, therapeutic target discovery and biological research. However, the traditional subcellular localization experiments are time-consuming, laborious and small scale. With the development of next-generation sequencing technology, the number of proteins has grown exponentially, which lays the foundation of the computational method for identifying protein subcellular localization. Although many methods for predicting subcellular localization of proteins have been proposed, most of them are limited to single-location. In this paper, we propose a multi-kernel SVM to predict subcellular localization of both multi-location and single-location proteins. First, we make use of the evolutionary information extracted from position specific scoring matrix (PSSM) and physicochemical properties of proteins, by Chou's general PseAAC and other efficient functions. Then, we propose a multi-kernel support vector machine (SVM) model to identify multi-label protein subcellular localization. As a result, our method has a good performance on predicting subcellular localization of proteins. It achieves an average precision of 0.7065 and 0.6889 on two human datasets, respectively. All results are higher than those achieved by other existing methods. Therefore, we provide an efficient system via a novel perspective to study the protein subcellular localization.
Affiliation(s)
- Yinan Shen
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China
- Jijun Tang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China
  - School of Computational Science and Engineering, University of South Carolina, Columbia, USA
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China

36
A systematic survey of centrality measures for protein-protein interaction networks. BMC Systems Biology 2018; 12:80. [PMID: 30064421 PMCID: PMC6069823 DOI: 10.1186/s12918-018-0598-2] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 11/23/2017] [Accepted: 06/22/2018] [Indexed: 12/12/2022]
Abstract
Background: Numerous centrality measures have been introduced to identify “central” nodes in large networks. The availability of a wide range of measures for ranking influential nodes leaves the user to decide which measure may best suit the analysis of a given network. The choice of a suitable measure is furthermore complicated by the impact of the network topology on ranking influential nodes by centrality measures. To approach this problem systematically, we examined the centrality profile of nodes of yeast protein-protein interaction networks (PPINs) in order to detect which centrality measure is succeeding in predicting influential proteins. We studied how different topological network features are reflected in a large set of commonly used centrality measures.
Results: We used yeast PPINs to compare 27 common centrality measures. The measures characterize and assort influential nodes of the networks. We applied principal component analysis (PCA) and hierarchical clustering and found that the most informative measures depend on the network’s topology. Interestingly, some measures had a high level of contribution in comparison to others in all PPINs, namely Latora closeness, Decay, Lin, Freeman closeness, Diffusion, Residual closeness and Average distance centralities.
Conclusions: The choice of a suitable set of centrality measures is crucial for inferring important functional properties of a network. We concluded that undertaking data reduction using unsupervised machine learning methods helps to choose appropriate variables (centrality measures). Hence, we proposed identifying the contribution proportions of the centrality measures with PCA as a prerequisite step of network analysis before inferring functional consequences, e.g., essentiality of a node.
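To make the idea of a per-node "centrality profile" concrete, the sketch below computes two of the simplest measures, degree and normalised Freeman closeness, on a toy undirected network. It assumes closeness is defined as (number of reachable nodes) / (sum of shortest-path distances); the four-node path network and all function names are hypothetical, and the PCA and clustering steps described in the abstract are not reproduced here.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def freeman_closeness(adj, node):
    """Normalised closeness: reachable nodes / sum of their distances."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def degree_centrality(adj, node):
    """Unnormalised degree: the number of direct interaction partners."""
    return len(adj[node])


# Hypothetical toy network: a simple path P1 - P2 - P3 - P4.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

# Centrality profile: one row per node, one column per measure.
profile = {n: (degree_centrality(adj, n), freeman_closeness(adj, n)) for n in adj}

# Rank nodes by closeness, highest first.
ranking = sorted(adj, key=lambda n: profile[n][1], reverse=True)
```

A table like `profile`, extended to many measures, is the kind of input on which a dimensionality-reduction step such as PCA could then be run to see which measures carry the most information for a given network.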

37
Lei X, Yang X. A new method for predicting essential proteins based on participation degree in protein complex and subgraph density. PLoS One 2018; 13:e0198998. [PMID: 29894517 PMCID: PMC5997351 DOI: 10.1371/journal.pone.0198998] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/24/2018] [Accepted: 05/30/2018] [Indexed: 12/11/2022]
Abstract
Essential proteins are crucial to living cells. Identification of essential proteins from protein-protein interaction (PPI) networks can be applied to pathway analysis and function prediction; furthermore, it can contribute to disease diagnosis and drug design. Some experimental and computational methods have been designed to identify essential proteins; however, the prediction precision remains to be improved. In this paper, we propose a new method for identifying essential proteins based on Participation degree of a protein in protein Complexes and Subgraph Density, named PCSD. To test the performance of PCSD, four PPI datasets (DIP, Krogan, MIPS and Gavin) are used to conduct experiments. The experimental results demonstrate that PCSD achieves better performance in predicting essential proteins than competing methods including DC, SC, EC, IC, LAC, NC, WDC, PeC and UDoNC. Compared with the most recent method, LBCC, PCSD correctly predicts more essential proteins among given numbers of top-ranked proteins on the DIP dataset, which indicates that PCSD is very effective in discovering essential proteins in most cases.
Affiliation(s)
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi’an, China
- Xiaoqin Yang
  - School of Computer Science, Shaanxi Normal University, Xi’an, China