1
Wu C, Lin B, Zhang J, Gao R, Song R, Liu ZP. AttentionEP: Predicting essential proteins via fusion of multiscale features by attention mechanisms. Comput Struct Biotechnol J 2024; 23:4315-4323. [PMID: 39697678; PMCID: PMC11652892; DOI: 10.1016/j.csbj.2024.11.039] [Received: 08/25/2024; Revised: 11/17/2024; Accepted: 11/25/2024; Indexed: 12/20/2024]
Abstract
Identifying essential proteins is of utmost importance in biomedical research because of their essential functions in cellular activities and their involvement in disease mechanisms. In this research, a novel attention-based approach for predicting essential proteins (EP), called AttentionEP, is introduced. The method leverages both cross-attention and self-attention frameworks, aiming to improve prediction accuracy by integrating features across diverse scales. Spatial characteristics of proteins are obtained from the protein-protein interaction (PPI) network by employing Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). Bidirectional Long Short-Term Memory networks (BiLSTM) are then employed to derive temporal features from gene expression datasets. Furthermore, spatial characteristics are derived from subcellular localization data using Deep Neural Networks (DNN). To integrate features across multiple scales effectively, self-attention is first applied to extract essential information from each individual data source. Self-attention and cross-attention mechanisms are then employed to enhance the interaction between the diverse information sources. To identify essential proteins, a classifier based on the ResNet architecture is developed. Experimental results indicate that the proposed method shows superior performance in identifying essential proteins, with an Area Under the Curve (AUC) of 0.9433, a considerable advantage over established techniques. The findings advance the understanding of essential proteins and show promising potential for applications in therapeutic development and the treatment of various diseases.
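For readers unfamiliar with the fusion step, the following is a minimal NumPy sketch of scaled dot-product cross-attention, in which features from one source (here a PPI-derived embedding) attend over features from another (here per-time-point expression features). It is not the AttentionEP implementation: the shapes, the random projection matrices, and the function names are hypothetical, and a trained model would learn the projections.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Scaled dot-product attention in which one feature source attends to another.

    query_feats:   (n_q, d_q) features of the querying source
    context_feats: (n_k, d_k) features of the attended source
    w_q, w_k, w_v: projection matrices (random here; learned in a real model)
    Passing the same array as both inputs gives plain self-attention.
    """
    q = query_feats @ w_q                      # (n_q, d)
    k = context_feats @ w_k                    # (n_k, d)
    v = context_feats @ w_v                    # (n_k, d_v)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_q, n_k)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights


# Hypothetical shapes: one protein's PPI embedding (e.g. from a GCN/GAT)
# attends over 12 per-time-point expression features (e.g. BiLSTM outputs).
rng = np.random.default_rng(0)
d = 16
ppi_emb = rng.normal(size=(1, 32))
expr_steps = rng.normal(size=(12, 8))
w_q = rng.normal(size=(32, d))
w_k = rng.normal(size=(8, d))
w_v = rng.normal(size=(8, d))

fused, attn = cross_attention(ppi_emb, expr_steps, w_q, w_k, w_v)
print(fused.shape, attn.shape)  # (1, 16) (1, 12)
```

Calling `cross_attention(x, x, ...)` with the same array for both inputs reduces it to self-attention, the other mechanism the abstract mentions.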
Affiliation(s)
- Chuanyan Wu
- School of Intelligent Engineering, Shandong Management University, No.3500 Dingxiang Road, Jinan, Shandong, 250357, China
- Bentao Lin
- School of Intelligent Engineering, Shandong Management University, No.3500 Dingxiang Road, Jinan, Shandong, 250357, China
- Jialin Zhang
- School of Control Science and Engineering, Shandong University, No.17923 Jingshi Road, Jinan, Shandong, 250061, China
- Rui Gao
- School of Control Science and Engineering, Shandong University, No.17923 Jingshi Road, Jinan, Shandong, 250061, China
- Rui Song
- School of Control Science and Engineering, Shandong University, No.17923 Jingshi Road, Jinan, Shandong, 250061, China
- Zhi-Ping Liu
- School of Control Science and Engineering, Shandong University, No.17923 Jingshi Road, Jinan, Shandong, 250061, China
2
Pan L, Wang H, Yang B, Li W. A protein network refinement method based on module discovery and biological information. BMC Bioinformatics 2024; 25:157. [PMID: 38643108; PMCID: PMC11031909; DOI: 10.1186/s12859-024-05772-z] [Received: 02/21/2024; Accepted: 04/10/2024; Indexed: 04/22/2024]
Abstract
BACKGROUND The identification of essential proteins can help in understanding the minimum requirements for cell survival and development, in discovering drug targets, and in preventing disease. Node ranking methods are currently a common way to identify essential proteins, but the poor data quality of the underlying protein interaction network (PIN) has somewhat limited the accuracy with which these methods identify essential proteins. Researchers have therefore constructed refined networks by considering certain biological properties of interacting protein pairs, to improve the performance of node ranking methods on the PIN. Studies show that proteins in a complex are more likely to be essential than proteins outside complexes. However, modularity is usually ignored by PIN refinement methods. METHODS Based on this, we propose a network refinement method based on module discovery and biological information. First, the maximal connected subgraph of the PIN is extracted and divided into modules using the Fast-unfolding algorithm; then, critical modules are detected according to the orthologous information, subcellular localization information, and topological information within each module; finally, a more refined network (CM-PIN) is constructed from the identified critical modules. RESULTS To evaluate the effectiveness of the proposed method, we used 12 typical node ranking methods (LAC, DC, DMNC, NC, TP, LID, CC, BC, PR, LR, PeC, WDC) to compare the overall performance of the CM-PIN with that of the S-PIN, D-PIN, and RD-PIN. The experimental results showed that the CM-PIN performed best in terms of the number of essential proteins identified, the precision-recall curve, the jackknife method, and other criteria, and can help to identify essential proteins more accurately.
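The first step of the CM-PIN pipeline, extracting the maximal connected subgraph, can be sketched in plain Python with a breadth-first search. The function name and the toy protein identifiers `P1` to `P5` are hypothetical; the module division, critical-module detection, and biological scoring steps are not reproduced here.

```python
from collections import deque


def largest_connected_subgraph(edges):
    """Return (nodes, edges) of the largest connected component of an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) > len(best):
            best = component

    # Keep only edges whose endpoints both lie in the largest component.
    sub_edges = [(u, v) for u, v in edges if u in best and v in best]
    return best, sub_edges


# Hypothetical toy network: a triangle P1-P2-P3 and a separate pair P4-P5.
ppi = [("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P4", "P5")]
nodes, sub_edges = largest_connected_subgraph(ppi)
print(sorted(nodes))  # ['P1', 'P2', 'P3']
```

For the subsequent module division, one option is `networkx.community.louvain_communities` (NetworkX 2.8 or later), which implements the Louvain method, also known as fast unfolding; whether it matches the authors' exact setup is an assumption.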
Affiliation(s)
- Li Pan
- Hunan Institute of Science and Technology, Yueyang, 414006, China
- Hunan Engineering Research Center of Multimodal Health Sensing and Intelligent Analysis, Yueyang, 414006, China
- Haoyue Wang
- Hunan Institute of Science and Technology, Yueyang, 414006, China
- Bo Yang
- Hunan Institute of Science and Technology, Yueyang, 414006, China
- Hunan Engineering Research Center of Multimodal Health Sensing and Intelligent Analysis, Yueyang, 414006, China
- Wenbin Li
- Hunan Institute of Science and Technology, Yueyang, 414006, China
3
Sun J, Pan L, Li B, Wang H, Yang B, Li W. A Construction Method of Dynamic Protein Interaction Networks by Using Relevant Features of Gene Expression Data. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2790-2801. [PMID: 37030714; DOI: 10.1109/tcbb.2023.3264241] [Indexed: 06/19/2023]
Abstract
Essential proteins play an important role in various life activities and are considered a vital part of the organism. Gene expression data are an important source for constructing dynamic protein-protein interaction networks (DPINs). Existing DPIN construction methods generally use all features (or the features within one cycle) of the gene expression data. However, features observed at successive time points tend to be highly correlated, so gene expression data contain redundant and irrelevant features, which degrade the quality of the constructed network and the prediction of essential proteins. To address this problem, we propose a DPIN construction method that uses selected relevant features rather than continuous and periodic features. We adopt an improved unsupervised feature selection method based on the Laplacian algorithm to remove irrelevant and redundant features from the gene expression data, and then integrate the selected relevant features into the static protein-protein interaction network (SPIN) to construct a more concise and effective DPIN (FS-DPIN). To evaluate the effectiveness of the FS-DPIN, we apply 15 network-based centrality methods to it and compare the results with those on the SPIN and existing DPINs. The predictive performance of the 15 centrality methods is then validated in terms of sensitivity, specificity, positive predictive value, negative predictive value, F-measure, accuracy, jackknife, and AUPRC. The experimental results show that the FS-DPIN outperforms existing DPINs in the identification accuracy of essential proteins.
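The feature selection above builds on a Laplacian-based criterion. As a rough illustration, the sketch below computes the standard Laplacian Score of He, Cai and Niyogi (2005), treating proteins as samples and expression time points as features; lower scores mark time points that better preserve local neighborhood structure. The authors' improved variant is not described in the abstract, so this is not their method, and the choices `k=5`, `t=10.0`, and keeping six time points are hypothetical.

```python
import numpy as np


def laplacian_score(X, k=5, t=1.0):
    """Laplacian Score of each column (feature) of X; lower = more locality-preserving.

    X: (n_samples, n_features), e.g. proteins x expression time points.
    k: neighbours in the kNN graph; t: heat-kernel width.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)

    # Symmetric kNN graph with heat-kernel weights; self excluded via +inf diagonal.
    masked = dist2.copy()
    np.fill_diagonal(masked, np.inf)
    S = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(masked[i])[:k]:
            S[i, j] = S[j, i] = np.exp(-dist2[i, j] / t)

    d = S.sum(axis=1)              # degree vector
    L = np.diag(d) - S             # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f_tilde = f - (f @ d) / d.sum()   # remove the degree-weighted mean
        num = f_tilde @ L @ f_tilde
        den = f_tilde @ (d * f_tilde)     # equals f_tilde^T D f_tilde
        scores[r] = num / den if den > 0 else np.inf
    return scores


rng = np.random.default_rng(1)
expr = rng.normal(size=(40, 12))           # hypothetical: 40 proteins x 12 time points
scores = laplacian_score(expr, k=5, t=10.0)
keep = np.argsort(scores)[:6]              # keep the 6 lowest-scoring time points
```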
4
Liu P, Liu C, Mao Y, Guo J, Liu F, Cai W, Zhao F. Identification of essential proteins based on edge features and the fusion of multiple-source biological information. BMC Bioinformatics 2023; 24:203. [PMID: 37198530; DOI: 10.1186/s12859-023-05315-y] [Received: 02/10/2023; Accepted: 04/30/2023; Indexed: 05/19/2023]
Abstract
BACKGROUND A major current focus in the analysis of protein-protein interaction (PPI) data is how to identify essential proteins. With massive PPI data now available, efficient computational methods for identifying essential proteins are needed. Previous studies have achieved considerable performance. However, because PPI data are highly noisy and structurally complex, further improving the performance of identification methods remains a challenge. METHODS This paper proposes an identification method, named CTF, which identifies essential proteins based on edge features, including h-quasi-cliques and uv-triangle graphs, and on the fusion of multiple-source information. We first design an edge-weight function, named EWCT, for computing the topological scores of proteins based on quasi-cliques and triangle graphs. Then, we generate an edge-weighted PPI network using EWCT and dynamic PPI data. Finally, we compute the essentiality of proteins by fusing the topological scores with three scores of biological information. RESULTS We evaluated the performance of CTF by comparing it with 16 other methods, such as MON, PeC, TEGS, and LBCC. Experimental results on three Saccharomyces cerevisiae datasets show that CTF outperforms the state-of-the-art methods. Moreover, the results indicate that fusing additional biological information helps improve identification accuracy.
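CTF's last step fuses a topological score with three biological scores. The abstract does not give the fusion rule, so the sketch below assumes a simple weighted sum of min-max-normalized scores; the weights, score names, and values are hypothetical and only illustrate how such a late-fusion ranking can be formed.

```python
import numpy as np


def min_max(x):
    """Rescale a score vector to [0, 1]; a constant vector maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def fuse_scores(score_vectors, weights):
    """Weighted sum of min-max-normalized score vectors (an assumed late-fusion scheme)."""
    if len(score_vectors) != len(weights):
        raise ValueError("need exactly one weight per score vector")
    fused = np.zeros(len(score_vectors[0]))
    for scores, w in zip(score_vectors, weights):
        fused += w * min_max(scores)
    return fused


# Hypothetical scores for four proteins.
topological = [3, 10, 5, 1]
orthology = [0.2, 0.9, 0.1, 0.0]
localization = [0.5, 1.0, 0.0, 0.25]
expression = [0.3, 0.8, 0.6, 0.1]

fused = fuse_scores([topological, orthology, localization, expression],
                    weights=[0.4, 0.2, 0.2, 0.2])
ranking = np.argsort(-fused)   # protein indices, most essential first
print(ranking[0])  # 1
```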
Affiliation(s)
- Peiqiang Liu
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Chang Liu
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Yanyan Mao
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao, China
- Junhong Guo
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Fanshu Liu
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Wangmin Cai
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
- Feng Zhao
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
5
Ni X, Geng B, Zheng H, Shi J, Hu G, Gao J. Accurate Estimation of Single-Cell Differentiation Potency Based on Network Topology and Gene Ontology Information. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3255-3262. [PMID: 34529570; DOI: 10.1109/tcbb.2021.3112951] [Indexed: 06/13/2023]
Abstract
One important task in single-cell analysis is quantifying the differentiation potential of single cells. Although various single-cell potency measures have been proposed, they are based on individual biological sources and are therefore not robust or reliable. Combining multiple sources into a relatively reliable and robust measure of differentiation potential remains a challenge. In this paper, we propose a New Centrality measure with Gene ontology information (NCG) to estimate single-cell potency. NCG combines network topological properties with the edge clustering coefficient and with gene function information in the form of gene ontology functional similarity scores. NCG distinguishes pluripotent from non-pluripotent cells with high accuracy, correctly ranks different cell types by differentiation potency, tracks changes during the differentiation process, and constructs the lineage trajectory from human myoblasts into skeletal muscle cells. These results indicate that NCG is a reliable and robust measure of single-cell potency. NCG is expected to be a useful tool for identifying novel stem or progenitor cell phenotypes from single-cell RNA-Seq data. The source code and datasets are available at https://github.com/Xinzhe-Ni/NCG.
6
Lu H, Shang C, Zou S, Cheng L, Yang S, Wang L. A Novel Method for Predicting Essential Proteins by Integrating Multidimensional Biological Attribute Information and Topological Properties. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220304201507] [Indexed: 11/22/2022]
Abstract
Background:
Essential proteins are indispensable for maintaining life activities and play key roles in synthetic biology. Identifying essential proteins by computational methods has become a hot topic in recent years because of its efficiency.
Objective:
Identifying essential proteins is of great significance and practical use in synthetic biology and in the study of drug targets and human disease genes.
Method:
In this paper, a method called EOP (Edge clustering coefficient-Orthologous-Protein) is proposed to infer potential essential proteins by combining multidimensional biological attribute information of proteins with topological properties of the protein-protein interaction network.
Results:
Simulation results on the yeast protein interaction network show that EOP identifies more essential proteins than the 12 other methods (DC, IC, EC, SC, BC, CC, NC, LAC, PEC, CoEWC, POEM, DWE). In particular, compared with DC (degree centrality), its sensitivity (SN) is 9% higher; among the top 1% of candidate proteins, its recognition rate is 34% higher, and among the top 5%, 10%, 15%, 20%, and 25% of candidate proteins, its recognition rate is 36%, 22%, 15%, 11%, and 8% higher, respectively.
Conclusion:
Experimental results show that our method achieves satisfactory predictions and may provide a reference for future research.
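EOP's name indicates that it builds on the edge clustering coefficient (ECC). The sketch below shows the ECC and the NC centrality (the sum of ECC over a protein's incident edges) as commonly defined for PPI networks; NC also appears among the comparison methods above. How EOP combines ECC with orthologous and other biological information is not reproduced, and the toy network is hypothetical.

```python
def edge_clustering_coefficients(adj):
    """ECC(u, v) = z_uv / min(d_u - 1, d_v - 1), with z_uv the triangles through edge (u, v)."""
    ecc = {}
    for u in adj:
        for v in adj[u]:
            z = len(adj[u] & adj[v])                       # common neighbours
            denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
            ecc[(u, v)] = z / denom if denom > 0 else 0.0  # pendant edges score 0
    return ecc


def nc_centrality(adj):
    """NC(u): sum of ECC over all edges incident to u."""
    ecc = edge_clustering_coefficients(adj)
    return {u: sum(ecc[(u, v)] for v in adj[u]) for u in adj}


# Hypothetical network: triangle A-B-C plus a pendant protein D attached to A.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(nc_centrality(adj))  # {'A': 2.0, 'B': 2.0, 'C': 2.0, 'D': 0.0}
```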
Affiliation(s)
- Hanyu Lu
- College of Big Data and Information Engineering, Guizhou University, Guizhou, China
- Chen Shang
- College of Big Data and Information Engineering, Guizhou University, Guizhou, China
- Sai Zou
- College of Big Data and Information Engineering, Guizhou University, Guizhou, China
- Lihong Cheng
- College of Foreign Languages, Dalian Jiaotong University, China
- Shikong Yang
- College of Big Data and Information Engineering, Guizhou University, Guizhou, China
- Lei Wang
- College of Computer Engineering and Applied Mathematics, Changsha University, China
7
Gao J, Zheng S, Yao M, Wu P. Precise estimation of residue relative solvent accessible area from Cα atom distance matrix using a deep learning method. Bioinformatics 2021; 38:94-98. [PMID: 34450651; DOI: 10.1093/bioinformatics/btab616] [Received: 04/29/2021; Revised: 08/12/2021; Accepted: 08/24/2021; Indexed: 02/03/2023]
Abstract
MOTIVATION The solvent accessible surface is an essential structural property related to protein structure and protein function. Relative solvent accessible area (RSA) is a standard measure of the degree to which a residue is exposed on the protein surface or buried inside the protein. However, this computation fails when residue information is missing. RESULTS In this article, we propose a novel method for estimating RSA from the Cα atom distance matrix using deep learning (EAGERER). EAGERER achieves Pearson correlation coefficients of 0.921-0.928 on two independent test datasets. We empirically demonstrate that EAGERER yields better Pearson correlation coefficients than existing RSA estimators, such as coordination number, half sphere exposure, and SphereCon. To the best of our knowledge, EAGERER is the first method to estimate the solvent accessible area from limited information with a deep learning model. It could be useful for protein structure and protein function prediction. AVAILABILITY AND IMPLEMENTATION The method is freely available at https://github.com/cliffgao/EAGERER. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
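EAGERER takes the Cα distance matrix as input. As a small illustration of that representation, the sketch below builds the matrix from Cα coordinates and derives a simple coordination number, one of the baseline estimators named in the abstract. The 10 Å cutoff and the toy straight-line chain (consecutive Cα atoms about 3.8 Å apart) are assumptions; the deep learning model itself is not shown.

```python
import numpy as np


def ca_distance_matrix(coords):
    """Pairwise Euclidean distances (Å) between Cα atoms; coords has shape (n_residues, 3)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def coordination_numbers(dist, cutoff=10.0):
    """Count residues whose Cα lies within `cutoff` Å of each residue (self excluded)."""
    return (dist < cutoff).sum(axis=1) - 1


# Hypothetical toy chain: four Cα atoms on a line, 3.8 Å apart.
coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]]
dist = ca_distance_matrix(coords)
print(coordination_numbers(dist).tolist())  # [2, 3, 3, 2]
```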
Affiliation(s)
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Shuangjia Zheng
- School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510006, China
- Mengting Yao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Peikun Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
8
He X, Kuang L, Chen Z, Tan Y, Wang L. Method for Identifying Essential Proteins by Key Features of Proteins in a Novel Protein-Domain Network. Front Genet 2021; 12:708162. [PMID: 34267785; PMCID: PMC8276041; DOI: 10.3389/fgene.2021.708162] [Received: 05/11/2021; Accepted: 05/31/2021; Indexed: 11/21/2022]
Abstract
In recent years, because traditional biological experiments have low accuracy and high costs, an increasing number of computational models have been proposed to infer potential essential proteins. In this paper, a novel prediction method called KFPM is proposed. First, a novel protein-domain heterogeneous network is established by combining known protein-protein interactions with known associations between proteins and domains. Next, based on key topological characteristics extracted from the newly constructed protein-domain network and functional characteristics extracted from multiple types of biological information, a new computational method based on an improved PageRank algorithm is designed to integrate these features effectively and infer potential essential proteins. Finally, to evaluate the performance of KFPM, we compared it with 13 state-of-the-art prediction methods. Experimental results show that, among the top 1%, 5%, and 10% of candidate proteins predicted by KFPM, the prediction accuracy reaches 96.08%, 83.14%, and 70.59%, respectively, significantly outperforming all 13 competing methods. This suggests that KFPM may be a useful tool for predicting potential essential proteins.
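KFPM ranks proteins with an improved PageRank algorithm whose modifications are not detailed in the abstract. For reference, the sketch below implements standard personalized PageRank by power iteration; the optional `prior` vector stands in, hypothetically, for biological feature scores that bias the random jumps. The four-protein star network is a toy example.

```python
import numpy as np


def pagerank(adj_matrix, alpha=0.85, prior=None, tol=1e-10, max_iter=1000):
    """Personalized PageRank by power iteration on a (possibly weighted) adjacency matrix."""
    A = np.asarray(adj_matrix, dtype=float)
    n = A.shape[0]
    # Teleport distribution: uniform, or the normalized prior scores.
    p = np.full(n, 1.0 / n) if prior is None else np.asarray(prior, float) / np.sum(prior)

    # Row-stochastic transition matrix; dangling nodes jump according to p.
    out_deg = A.sum(axis=1)
    P = np.zeros_like(A)
    nz = out_deg > 0
    P[nz] = A[nz] / out_deg[nz, None]
    P[~nz] = p

    r = p.copy()
    for _ in range(max_iter):
        r_new = alpha * (r @ P) + (1 - alpha) * p
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r


# Toy star network: protein 0 interacts with proteins 1, 2 and 3.
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]])
uniform = pagerank(A)
biased = pagerank(A, prior=[0.0, 1.0, 0.0, 0.0])
print(int(np.argmax(uniform)))  # 0
```

With a uniform prior the hub receives the highest score; concentrating the prior on one protein raises that protein's score relative to its otherwise equivalent neighbors.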
Affiliation(s)
- Xin He
- College of Computer, Xiangtan University, Xiangtan, China
- Linai Kuang
- College of Computer, Xiangtan University, Xiangtan, China
- Zhiping Chen
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Yihong Tan
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Lei Wang
- College of Computer, Xiangtan University, Xiangtan, China
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
9
Ahmed NM, Chen L, Li B, Liu W, Dai C. A random walk-based method for detecting essential proteins by integrating the topological and biological features of PPI network. Soft Comput 2021. [DOI: 10.1007/s00500-021-05780-8] [Indexed: 12/01/2022]