1. Shen X, Zhang S, Long J, Chen C, Wang M, Cui Z, Chen B, Tan T. A Highly Sensitive Model Based on Graph Neural Networks for Enzyme Key Catalytic Residue Prediction. J Chem Inf Model 2023; 63:4277-4290. [PMID: 37399293 DOI: 10.1021/acs.jcim.3c00273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/05/2023]
Abstract
Determining the catalytic site of enzymes is a great help for understanding the relationship between protein sequence, structure, and function, which provides the basis and targets for designing, modifying, and enhancing enzyme activity. The unique local spatial configuration bound to the substrate at the active center of the enzyme determines the catalytic ability of enzymes and plays an important role in the catalytic site prediction. As a suitable tool, the graph neural network can better understand and identify the residue sites with unique local spatial configurations due to its remarkable ability to characterize the three-dimensional structural features of proteins. Consequently, a novel model for predicting enzyme catalytic sites has been developed, which incorporates a uniquely designed adaptive edge-gated graph attention neural network (AEGAN). This model is capable of effectively handling sequential and structural characteristics of proteins at various levels, and the extracted features enable an accurate description of the local spatial configuration of the enzyme active site by sampling the local space around candidate residues and special design of amino acid physical and chemical properties. To evaluate its performance, the model was compared with existing catalytic site prediction models using different benchmark datasets and achieved the best results on each benchmark dataset. The model exhibited a sensitivity of 0.9659, accuracy of 0.9226, and area under the precision-recall curve (AUPRC) of 0.9241 on the independent test set constructed for evaluation. Furthermore, the F1-score of this model is nearly four times higher than that of the best-performing similar model in previous studies. This research can serve as a valuable tool to help researchers understand protein sequence-structure-function relationships while facilitating the characterization of novel enzymes of unknown function.
Affiliation(s)
- Xiaowei Shen
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Shiding Zhang
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Jianyu Long
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Changjing Chen
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Meng Wang
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Ziheng Cui
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Biqiang Chen
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
- Tianwei Tan
  - National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China
2. Feehan R, Franklin MW, Slusky JSG. Machine learning differentiates enzymatic and non-enzymatic metals in proteins. Nat Commun 2021; 12:3712. [PMID: 34140507 PMCID: PMC8211803 DOI: 10.1038/s41467-021-24070-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 02/16/2021] [Accepted: 06/02/2021] [Indexed: 11/09/2022] [Open Access]
Abstract
Metalloenzymes are 40% of all enzymes and can perform all seven classes of enzyme reactions. Because of the physicochemical similarities between the active sites of metalloenzymes and inactive metal binding sites, it is challenging to differentiate between them. Yet distinguishing these two classes is critical for the identification of both native and designed enzymes. Because of similarities between catalytic and non-catalytic metal binding sites, finding physicochemical features that distinguish these two types of metal sites can indicate aspects that are critical to enzyme function. In this work, we develop the largest structural dataset of enzymatic and non-enzymatic metalloprotein sites to date. We then use a decision-tree ensemble machine learning model to classify metals bound to proteins as enzymatic or non-enzymatic with 92.2% precision and 90.1% recall. Our model scores electrostatic and pocket lining features as more important than pocket volume, despite the fact that volume is the most quantitatively different feature between enzyme and non-enzymatic sites. Finally, we find our model has overall better performance in a side-to-side comparison against other methods that differentiate enzymatic from non-enzymatic sequences. We anticipate that our model's ability to correctly identify which metal sites are responsible for enzymatic activity could enable identification of new enzymatic mechanisms and de novo enzyme design.
Affiliation(s)
- Ryan Feehan
  - Center for Computational Biology, The University of Kansas, Lawrence, KS, USA
- Meghan W Franklin
  - Center for Computational Biology, The University of Kansas, Lawrence, KS, USA
- Joanna S G Slusky
  - Center for Computational Biology, The University of Kansas, Lawrence, KS, USA
  - Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, USA
3. Yan W, Hu G, Liang Z, Zhou J, Yang Y, Chen J, Shen B. Node-Weighted Amino Acid Network Strategy for Characterization and Identification of Protein Functional Residues. J Chem Inf Model 2018; 58:2024-2032. [PMID: 30107728 DOI: 10.1021/acs.jcim.8b00146] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 12/15/2022]
Abstract
The study of functional residues (FRs) is essential for understanding protein functions and biological processes. The amino acid network (AAN) has become an emerging paradigm for studying FRs during the past decade. Current AAN models ignore the heterogeneity of nodes and treat amino acids in the AAN as the same. However, the properties of each amino acid node are of fundamental importance. We here proposed a node-weighted AAN strategy termed the node-weighted amino acid contact energy network (NACEN) to characterize and predict three types of FRs, namely, hot spots, catalytic residues, and allosteric residues. We first constructed NACENs with their nodes weighted based on structural, sequence, physicochemical, and dynamical properties of the amino acids and then characterized the FRs with the NACEN parameters. We finally built machine learning predictors to identify each type of FR. The results revealed that residues characterized with NACEN parameters are more distinguishable between FRs and non-FRs than those with unweighted network ones. With few features for classification, NACEN yields comparable performance for FR identification and provides residue level prediction for allosteric regulation. The proposed strategy can be easily implemented to other functional residue identification. An R package is also provided for NACEN construction and analysis at http://sysbio.suda.edu.cn/NACEN/index.html.
Affiliation(s)
- Wenying Yan
  - Center for Systems Biology, Soochow University, Suzhou 215006, China
- Guang Hu
  - Center for Systems Biology, Soochow University, Suzhou 215006, China
- Zhongjie Liang
  - Center for Systems Biology, Soochow University, Suzhou 215006, China
- Jianhong Zhou
  - Center for Systems Biology, Soochow University, Suzhou 215006, China
- Yang Yang
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Jiajia Chen
  - School of Chemistry, Biology and Material Engineering, Suzhou University of Science and Technology, Suzhou 215011, China
- Bairong Shen
  - Center for Systems Biology, Soochow University, Suzhou 215006, China
4. Song J, Li F, Takemoto K, Haffari G, Akutsu T, Chou KC, Webb GI. PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework. J Theor Biol 2018; 443:125-137. [DOI: 10.1016/j.jtbi.2018.01.023] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 09/12/2017] [Revised: 01/17/2018] [Accepted: 01/18/2018] [Indexed: 10/18/2022]
5. CRHunter: integrating multifaceted information to predict catalytic residues in enzymes. Sci Rep 2016; 6:34044. [PMID: 27665935 PMCID: PMC5036049 DOI: 10.1038/srep34044] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/06/2016] [Accepted: 09/07/2016] [Indexed: 11/08/2022] [Open Access]
Abstract
A variety of algorithms have been developed for catalytic residue prediction based on either feature- or template-based methodology. However, no studies have systematically compared these two strategies and further considered whether their combination could improve the prediction performance. Herein, we developed an integrative algorithm named CRHunter by simultaneously using the complementarity between feature- and template-based methodologies and that between structural and sequence information. Several novel structural features were generated by the Delaunay triangulation and Laplacian transformation of enzyme structures. Combining these features with traditional descriptors, we invented two support vector machine feature predictors based on both structural and sequence information. Furthermore, we established two template predictors using structure and profile alignments. Evaluated on datasets with different levels of homology, our feature predictors achieve relatively stable performance, whereas our template predictors yield poor results when the homological relationships become weak. Nevertheless, the hybrid algorithm CRHunter consistently achieves optimal performance among all our predictors. We also illustrate that our methodology can be applied to the predicted structures of enzymes. Compared with state-of-the-art methods, CRHunter yields comparable or better performance on various datasets. Finally, the application of this algorithm to structural genomics targets sheds light on solved protein structures with unknown functions.
6. Han L, Zhang YJ, Zhang L, Cui X, Yu J, Zhang Z, Liu MS. Operating mechanism and molecular dynamics of pheromone-binding protein ASP1 as influenced by pH. PLoS One 2014; 9:e110565. [PMID: 25337796 PMCID: PMC4206424 DOI: 10.1371/journal.pone.0110565] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 12/19/2013] [Accepted: 09/23/2014] [Indexed: 11/24/2022] [Open Access]
Abstract
Odorant binding protein (OBP) is a vital component of the olfactory sensation system. It performs the specific role of ferrying odorant molecules to odorant receptors. OBP helps insects and other types of animals to sense and transport stimulus molecules. However, the molecular details of how OBPs bind or release their odorant ligands are unclear. For some OBPs, the system's pH level is reported to affect the ligands' binding or unbinding capability. In this work we investigated the operating mechanism and molecular dynamics of the bee antennal pheromone-binding protein ASP1 under varying pH conditions. We found that conformational flexibility is the key factor for regulating the interaction of ASP1 and its ligands, and the odorant binds to ASP1 at low pH conditions. Dynamics, once triggered by pH changes, play the key role in coupling the global conformational changes with the odorant release. In ASP1, the C-terminus, the N-terminus, helix α2 and the region ranging from helices α4 to α5 form a cavity with a novel 'entrance' of binding. These are the major regions that respond to pH change and regulate the ligand release. Clearly there are processes of dynamics and hydrogen bond network propagation in ASP1 in response to pH stimuli. These findings lead to an understanding of the mechanism and dynamics of odorant-OBP interaction in OBP, and will benefit chemosensory-related biotech and agriculture research and development.
Affiliation(s)
- Lei Han
  - Centre for Cancer Molecular Diagnosis, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin, China
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
- Yong-Jun Zhang
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China
- Long Zhang
  - Key Lab for Biological Control of the Ministry of Agriculture, China Agricultural University, Beijing, China
- Xu Cui
  - Beijing Computing Center, Beijing, China
- Jinpu Yu
  - Centre for Cancer Molecular Diagnosis, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin, China
- Ziding Zhang
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
- Ming S. Liu
  - CSIRO - Computational Informatics & Digital Productivity Flagship, Private Bag 10, Clayton South, Australia
7. Improved prediction of residue flexibility by embedding optimized amino acid grouping into RSA-based linear models. Amino Acids 2014; 46:2665-80. [DOI: 10.1007/s00726-014-1817-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/16/2014] [Accepted: 07/21/2014] [Indexed: 11/26/2022]
8. Zhou Y, Liu S, Song J, Zhang Z. Structural propensities of human ubiquitination sites: accessibility, centrality and local conformation. PLoS One 2013; 8:e83167. [PMID: 24349449 PMCID: PMC3859641 DOI: 10.1371/journal.pone.0083167] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/04/2013] [Accepted: 10/30/2013] [Indexed: 12/03/2022] [Open Access]
Abstract
The existence and function of most proteins in the human proteome are regulated by the ubiquitination process. To date, tens of thousands of human ubiquitination sites have been identified from high-throughput proteomic studies. However, the mechanism of ubiquitination site selection remains elusive because of the complicated sequence pattern flanking the ubiquitination sites. In this study, we perform a systematic analysis of 1,330 ubiquitination sites in 505 protein structures and quantify the significantly high accessibility and unexpectedly high centrality of human ubiquitination sites. Further analysis suggests that the higher centrality of ubiquitination sites is associated with the multi-functionality of ubiquitination sites, among which protein-protein interaction sites are common targets of ubiquitination. Moreover, we demonstrate that ubiquitination sites are flanked by residues with non-random local conformation. Finally, we provide quantitative and unambiguous evidence that most of the structural propensities contain specific information about ubiquitination site selection that is not represented by the sequence pattern. Therefore, the hypothesis about the structural level of the ubiquitination site selection mechanism has been substantially approved.
Affiliation(s)
- Yuan Zhou
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
- Sixue Liu
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
- Jiangning Song
  - National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- Ziding Zhang
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
9. Chen Z, Wang Y, Zhai YF, Song J, Zhang Z. ZincExplorer: an accurate hybrid method to improve the prediction of zinc-binding sites from protein sequences. Mol Biosyst 2013; 9:2213-22. [DOI: 10.1039/c3mb70100j] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 11/21/2022]