1
|
Zhang L, Liu T. ATP-Pred: Prediction of Protein-ATP Binding Residues via Fusion of Residue-Level Embeddings and Kolmogorov-Arnold Network. J Chem Inf Model 2025; 65:3812-3826. [PMID: 40119803 DOI: 10.1021/acs.jcim.5c00016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/24/2025]
Abstract
Accurately identifying protein-ATP binding residues is essential for understanding biological processes and designing drugs. However, current sequence-based methods have limitations, such as difficulties in extracting discriminative features and the need for more efficient algorithms. Additionally, methods based on multiple sequence alignments often face challenges in handling large-scale predictions. To address these issues, we developed ATP-Pred, a sequence-based method for predicting ATP-binding residues in proteins. This model applies transfer learning by using two recently developed pretrain protein language models, Ankh and ProstT5, to extract residue-level embeddings that capture protein functionality. ATP-Pred also integrates a CNN-BiLSTM network and a Kolmogorov-Arnold network to build the prediction model. To handle data imbalance, we introduced a weighted focal loss function. Experimental results on three independent test data sets showed that ATP-Pred outperforms most existing methods. Its generalizability was further validated on four protein-mononucleotide binding residue data sets, where it delivered promising results. These findings suggest that ATP-Pred is a robust and reliable predictor.
Collapse
Affiliation(s)
- Lingrong Zhang
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
| | - Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
| |
Collapse
|
2
|
Chang YY, Liu YC, Jhang WE, Wei SS, Ou YY. CaBind_MCNN: Identifying Potential Calcium Channel Blocker Targets by Predicting Calcium-Binding Sites in Ion Channels and Ion Transporters Using Protein Language Models and Multiscale Feature Extraction. J Chem Inf Model 2025; 65:2145-2157. [PMID: 39916334 PMCID: PMC11863380 DOI: 10.1021/acs.jcim.4c02252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2024] [Revised: 01/03/2025] [Accepted: 01/09/2025] [Indexed: 02/25/2025]
Abstract
Calcium ions (Ca2+) are crucial for various physiological processes, including neurotransmission and cardiac function. Dysregulation of Ca2+ homeostasis can lead to serious health conditions such as cardiac arrhythmias and hypertension. Ion channels and transporters play a vital role in maintaining cellular Ca2+ balance by facilitating Ca2+ transport across cell membranes. Accurate prediction of Ca2+ binding sites within these proteins is essential for understanding their function and identifying potential therapeutic targets, particularly for developing novel calcium channel blockers (CCBs). This study introduces CaBind_MCNN, an innovative computational model that leverages pretrained protein language models (PLMs) and a multiscale feature extraction approach to predict Ca2+ binding sites in ion channels and transporter proteins. Our method integrates embeddings from the ProtTrans PLM with a convolutional neural network (CNN)-based multiwindow scanning approach, capable of capturing diverse sequence features relevant to Ca2+ binding. The model, trained on a curated data set of 27 calcium-binding protein sequences, achieves high accuracy with an area under the curve (AUC) of 0.9886, significantly outperforming some existing methods. These results demonstrate the potential of CaBind_MCNN to enhance drug discovery efforts by identifying potential CCB targets and advancing the development of novel therapies for calcium-related disorders.
Collapse
Affiliation(s)
- Yan-Yun Chang
- Department
of Computer Science and Engineering, Yuan
Ze University, Chung-Li 32003, Taiwan
| | - Yu-Chen Liu
- Department
of Computer Science and Engineering, Yuan
Ze University, Chung-Li 32003, Taiwan
| | - Wei-En Jhang
- Department
of Computer Science and Engineering, Yuan
Ze University, Chung-Li 32003, Taiwan
| | - Sin-Siang Wei
- Department
of Computer Science and Engineering, Yuan
Ze University, Chung-Li 32003, Taiwan
| | - Yu-Yen Ou
- Department
of Computer Science and Engineering, Yuan
Ze University, Chung-Li 32003, Taiwan
- Graduate
Program in Biomedical Informatics, Yuan
Ze University, Chung-Li 32003, Taiwan
| |
Collapse
|
3
|
Zhou J, Xiao Y, Tang Q, Yan Y, Liu D, Zhang H. De novo design protein binders for MBP and GST tags. Biochem Biophys Res Commun 2025; 748:151322. [PMID: 39827550 DOI: 10.1016/j.bbrc.2025.151322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2024] [Revised: 11/24/2024] [Accepted: 01/11/2025] [Indexed: 01/22/2025]
Abstract
Maltose-binding protein (MBP) and glutathione S-transferase (GST) are widely used solubility-enhancing protein tags, typically employed to address various issues related to protein expression and purification. The detection of these tags are usually achieved through binding of corresponding antibodies. Designing low-cost binders as alternatives to antibodies is of great significance. This study employed a de novo design approach, starting with a large number of protein scaffolds and screening out 6 candidate binders targeting MBP and 4 candidate binders targeting GST based on scoring functions. Flow cytometry low-affinity selection and biolayer interferometry (BLI) quantitative results showed that MBP and GST can interact strongly with one or several binders, exhibiting nanomolar binding. Among them, LZMB3 has a binding dissociation constant (KD) of 54.05 ± 1.46 nM, while LJGB3 and LJGB4 have KD values of 105.4 ± 1.812 nM and 437.9 ± 17.69 nM, respectively.
Collapse
Affiliation(s)
- Jinlong Zhou
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, MOE Key Laboratory of Molecular Biophysics, Wuhan, 430074, China
| | - Yue Xiao
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, MOE Key Laboratory of Molecular Biophysics, Wuhan, 430074, China
| | - Qian Tang
- School of Environmental Science and Engineering, Huazhong University of Science and Technology, No. 1037 Luoyu Road, Wuhan, 430074, China
| | - Yunjun Yan
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, MOE Key Laboratory of Molecular Biophysics, Wuhan, 430074, China
| | - Dongqi Liu
- School of Environmental Science and Engineering, Huazhong University of Science and Technology, No. 1037 Luoyu Road, Wuhan, 430074, China.
| | - Houjin Zhang
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, MOE Key Laboratory of Molecular Biophysics, Wuhan, 430074, China.
| |
Collapse
|
4
|
Lv SQ, Zeng X, Su GP, Du WF, Li Y, Wen ML. Improving Identification of Drug-Target Binding Sites Based on Structures of Targets Using Residual Graph Transformer Network. Biomolecules 2025; 15:221. [PMID: 40001524 PMCID: PMC11853427 DOI: 10.3390/biom15020221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2024] [Revised: 01/28/2025] [Accepted: 01/28/2025] [Indexed: 02/27/2025] Open
Abstract
Improving identification of drug-target binding sites can significantly aid in drug screening and design, thereby accelerating the drug development process. However, due to challenges such as insufficient fusion of multimodal information from targets and imbalanced datasets, enhancing the performance of drug-target binding sites prediction models remains exceptionally difficult. Leveraging structures of targets, we proposed a novel deep learning framework, RGTsite, which employed a Residual Graph Transformer Network to improve the identification of drug-target binding sites. First, a residual 1D convolutional neural network (1D-CNN) and the pre-trained model ProtT5 were employed to extract the local and global sequence features from the target, respectively. These features were then combined with the physicochemical properties of amino acid residues to serve as the vertex features in graph. Next, the edge features were incorporated, and the residual graph transformer network (GTN) was applied to extract the more comprehensive vertex features. Finally, a fully connected network was used to classify whether the vertex was a binding site. Experimental results showed that RGTsite outperformed the existing state-of-the-art methods in key evaluation metrics, such as F1-score (F1) and Matthews Correlation Coefficient (MCC), across multiple benchmark datasets. Additionally, we conducted interpretability analysis for RGTsite through the real-world cases, and the results confirmed that RGTsite can effectively identify drug-target binding sites in practical applications.
Collapse
Affiliation(s)
- Shuang-Qing Lv
- Faculty of Surveying and Information Engineering, West Yunnan University of Applied Sciences, Dali 671000, China;
| | - Xin Zeng
- College of Mathematics and Computer Science, Dali University, Dali 671003, China; (X.Z.)
| | - Guang-Peng Su
- College of Mathematics and Computer Science, Dali University, Dali 671003, China; (X.Z.)
| | - Wen-Feng Du
- College of Mathematics and Computer Science, Dali University, Dali 671003, China; (X.Z.)
| | - Yi Li
- College of Mathematics and Computer Science, Dali University, Dali 671003, China; (X.Z.)
| | - Meng-Liang Wen
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming 650000, China
| |
Collapse
|
5
|
Vural O, Jololian L. Machine learning approaches for predicting protein-ligand binding sites from sequence data. FRONTIERS IN BIOINFORMATICS 2025; 5:1520382. [PMID: 39963299 PMCID: PMC11830693 DOI: 10.3389/fbinf.2025.1520382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2024] [Accepted: 01/10/2025] [Indexed: 02/20/2025] Open
Abstract
Proteins, composed of amino acids, are crucial for a wide range of biological functions. Proteins have various interaction sites, one of which is the protein-ligand binding site, essential for molecular interactions and biochemical reactions. These sites enable proteins to bind with other molecules, facilitating key biological functions. Accurate prediction of these binding sites is pivotal in computational drug discovery, helping to identify therapeutic targets and facilitate treatment development. Machine learning has made significant contributions to this field by improving the prediction of protein-ligand interactions. This paper reviews studies that use machine learning to predict protein-ligand binding sites from sequence data, focusing on recent advancements. The review examines various embedding methods and machine learning architectures, addressing current challenges and the ongoing debates in the field. Additionally, research gaps in the existing literature are highlighted, and potential future directions for advancing the field are discussed. This study provides a thorough overview of sequence-based approaches for predicting protein-ligand binding sites, offering insights into the current state of research and future possibilities.
Collapse
Affiliation(s)
- Orhun Vural
- Department of Electrical and Computer Engineering, The University of Alabama at Birmingham, Birmingham, AL, United States
| | | |
Collapse
|
6
|
Le VT, Malik MS, Lin YJ, Liu YC, Chang YY, Ou YY. ATP_mCNN: Predicting ATP binding sites through pretrained language models and multi-window neural networks. Comput Biol Med 2025; 185:109541. [PMID: 39653625 DOI: 10.1016/j.compbiomed.2024.109541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2024] [Revised: 11/20/2024] [Accepted: 12/05/2024] [Indexed: 01/26/2025]
Abstract
Adenosine triphosphate plays a vital role in providing energy and enabling key cellular processes through interactions with binding proteins. The increasing amount of protein sequence data necessitates computational methods for identifying binding sites. However, experimental identification of adenosine triphosphate-binding residues remains challenging. To address the challenge, we developed a multi-window convolutional neural network architecture taking pre-trained protein language model embeddings as input features. In particular, multiple parallel convolutional layers scan for motifs localized to different window sizes. Max pooling extracts salient features concatenated across windows into a final multi-scale representation for residue-level classification. On benchmark datasets, our model achieves an area under the ROC curve of 0.95, significantly improving on prior sequence-based models and outperforming convolutional neural network baselines. This demonstrates the utility of pre-trained language models and multi-window convolutional neural networks for advanced sequence-based prediction of adenosine triphosphate-binding residues. Our approach provides a promising new direction for elucidating binding mechanisms and interactions from primary structure.
Collapse
Affiliation(s)
- Van-The Le
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
| | - Muhammad-Shahid Malik
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan; Department of Computer Sciences, Karakoram International University, Gilgit-Baltistan, 15100, Pakistan
| | - Yi-Jing Lin
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
| | - Yu-Chen Liu
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
| | - Yan-Yun Chang
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
| | - Yu-Yen Ou
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan; Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li, 32003, Taiwan.
| |
Collapse
|
7
|
Lee J, Bang D, Kim S. Residue-Level Multiview Deep Learning for ATP Binding Site Prediction and Applications in Kinase Inhibitors. J Chem Inf Model 2025; 65:50-61. [PMID: 39690486 DOI: 10.1021/acs.jcim.4c01255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2024]
Abstract
Accurate identification of adenosine triphosphate (ATP) binding sites is crucial for understanding cellular functions and advancing drug discovery, particularly in targeting kinases for cancer treatment. Existing methods face significant challenges due to their reliance on time-consuming precomputed features and the heavily imbalanced nature of binding site data without further investigations on their utility in drug discovery. To address these limitations, we introduced Multiview-ATPBind and ResiBoost. Multiview-ATPBind is an end-to-end deep learning model that integrates one-dimensional (1D) sequence and three-dimensional (3D) structural information for rapid and precise residue-level pocket-ligand interaction predictions. Additionally, ResiBoost is a novel residue-level boosting algorithm designed to mitigate data imbalance by enhancing the prediction of rare positive binding residues. Our approach outperforms state-of-the-art models on benchmark data sets, showing significant improvements in balanced metrics with both experimental and AI-predicted structures. Furthermore, our model seamlessly transfers to predicting binding sites and enhancing docking simulations for kinase inhibitors, including imatinib and dasatinib, underscoring the potential of our method in drug discovery applications.
Collapse
Affiliation(s)
- Jaechan Lee
- Department of Computer Science and Engineering, Seoul National University, Seoul 08826, Republic of Korea
- AIGENDRUG Co., Ltd., Seoul 08826, Republic of Korea
| | - Dongmin Bang
- AIGENDRUG Co., Ltd., Seoul 08826, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Republic of Korea
| | - Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Seoul 08826, Republic of Korea
- AIGENDRUG Co., Ltd., Seoul 08826, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Republic of Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Republic of Korea
| |
Collapse
|
8
|
le Maire A, Bourguet W. What Structural Biology Tells Us About the Mode of Action and Detection of Toxicants. Annu Rev Pharmacol Toxicol 2025; 65:529-546. [PMID: 39107041 DOI: 10.1146/annurev-pharmtox-061724-080642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/09/2024]
Abstract
The study of the adverse effects of chemical substances on living organisms is an old and intense field of research. However, toxicological and environmental health sciences have long been dominated by descriptive approaches that enable associations or correlations but relatively few robust causal links and molecular mechanisms. Recent achievements have shown that structural biology approaches can bring this added value to the field. By providing atomic-level information, structural biology is a powerful tool to decipher the mechanisms by which toxicants bind to and alter the normal function of essential cell components, causing adverse effects. Here, using endocrine-disrupting chemicals as illustrative examples, we describe recent advances in the structure-based understanding of their modes of action and how this knowledge can be exploited to develop computational tools aimed at predicting properties of large collections of compounds.
Collapse
Affiliation(s)
- Albane le Maire
- Centre de Biologie Structurale (CBS), Univ Montpellier, CNRS, Inserm, Montpellier, France; ,
| | - William Bourguet
- Centre de Biologie Structurale (CBS), Univ Montpellier, CNRS, Inserm, Montpellier, France; ,
| |
Collapse
|
9
|
Essien C, Wang N, Yu Y, Alqarghuli S, Qin Y, Manshour N, He F, Xu D. Predicting the location of coordinated metal ion-ligand binding sites using geometry-aware graph neural networks. Comput Struct Biotechnol J 2024; 27:137-148. [PMID: 39840139 PMCID: PMC11750443 DOI: 10.1016/j.csbj.2024.12.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2024] [Revised: 12/15/2024] [Accepted: 12/20/2024] [Indexed: 01/23/2025] Open
Abstract
More than 50 % of proteins bind to metal ions. Interactions between metal ions and proteins, especially coordinated interactions, are essential for biological functions, such as maintaining protein structure and signal transport. Physiological metal-ion binding prediction is pivotal for both elucidating the biological functions of proteins and for the design of new drugs. However, accurately predicting these interactions remains challenging. In this study, we proposed GPred, a novel structure-based method that transforms the 3-dimensional structure of a protein into a point cloud representation and then designs a geometry-aware graph neural network to learn the local structural properties of each amino acid residue under specific ligand-binding supervision. We trained our model to predict the location of coordinated binding sites for five essential metal ions: Zn2+, Ca2+, Mg2+, Mn2+, and Fe2+. We further demonstrated the versatility of GPred by applying transfer learning to predict the binding sites of 2 heavy metal ions, that is, cadmium (Cd2+) and mercury (Hg2+). We achieved greater than 19.62 %, 14.32 %, 36.62 %, and 40.69 % improvement in the area under the precision-recall curve (AUPR) of Zn2+, Ca2+, Mg2+, Mn2+, and Fe2+, respectively, when compared with 6 current accessible state-of-the-art sequence-based or structure-based tools. We also validated the proposed approach on protein structures predicted by AlphaFold2, and its performance was similar to experimental protein structures. In both cases, achieving a low false discovery rate for proteins without annotated ion-binding sites was demonstrated. © 2017 Elsevier Inc. All rights reserved.
Collapse
Affiliation(s)
- Clement Essien
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Ning Wang
- School of Information Science and Technology, Northeast Normal University, Changchun, Jilin, China
| | - Yang Yu
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Salhuldin Alqarghuli
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Yongfang Qin
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Negin Manshour
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Fei He
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- School of Information Science and Technology, Northeast Normal University, Changchun, Jilin, China
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| |
Collapse
|
10
|
Zhao Y, He S, Xing Y, Li M, Cao Y, Wang X, Zhao D, Bo X. A Point Cloud Graph Neural Network for Protein-Ligand Binding Site Prediction. Int J Mol Sci 2024; 25:9280. [PMID: 39273227 PMCID: PMC11394757 DOI: 10.3390/ijms25179280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2024] [Revised: 08/25/2024] [Accepted: 08/26/2024] [Indexed: 09/15/2024] Open
Abstract
Predicting protein-ligand binding sites is an integral part of structural biology and drug design. A comprehensive understanding of these binding sites is essential for advancing drug innovation, elucidating mechanisms of biological function, and exploring the nature of disease. However, accurately identifying protein-ligand binding sites remains a challenging task. To address this, we propose PGpocket, a geometric deep learning-based framework to improve protein-ligand binding site prediction. Initially, the protein surface is converted into a point cloud, and then the geometric and chemical properties of each point are calculated. Subsequently, the point cloud graph is constructed based on the inter-point distances, and the point cloud graph neural network (GNN) is applied to extract and analyze the protein surface information to predict potential binding sites. PGpocket is trained on the scPDB dataset, and its performance is verified on two independent test sets, Coach420 and HOLO4K. The results show that PGpocket achieves a 58% success rate on the Coach420 dataset and a 56% success rate on the HOLO4K dataset. These results surpass competing algorithms, demonstrating PGpocket's advancement and practicality for protein-ligand binding site prediction.
Collapse
Affiliation(s)
- Yanpeng Zhao
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Song He
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Yuting Xing
- Defense Innovation Institute, Beijing 100071, China
| | - Mengfan Li
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Yang Cao
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Xuanze Wang
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Dongsheng Zhao
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Xiaochen Bo
- Academy of Military Medical Sciences, Beijing 100850, China
| |
Collapse
|
11
|
Yuan Q, Tian C, Song Y, Ou P, Zhu M, Zhao H, Yang Y. GPSFun: geometry-aware protein sequence function predictions with language models. Nucleic Acids Res 2024; 52:W248-W255. [PMID: 38738636 PMCID: PMC11223820 DOI: 10.1093/nar/gkae381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2024] [Revised: 04/22/2024] [Accepted: 04/26/2024] [Indexed: 05/14/2024] Open
Abstract
Knowledge of protein function is essential for elucidating disease mechanisms and discovering new drug targets. However, there is a widening gap between the exponential growth of protein sequences and their limited function annotations. In our prior studies, we have developed a series of methods including GraphPPIS, GraphSite, LMetalSite and SPROF-GO for protein function annotations at residue or protein level. To further enhance their applicability and performance, we now present GPSFun, a versatile web server for Geometry-aware Protein Sequence Function annotations, which equips our previous tools with language models and geometric deep learning. Specifically, GPSFun employs large language models to efficiently predict 3D conformations of the input protein sequences and extract informative sequence embeddings. Subsequently, geometric graph neural networks are utilized to capture the sequence and structure patterns in the protein graphs, facilitating various downstream predictions including protein-ligand binding sites, gene ontologies, subcellular locations and protein solubility. Notably, GPSFun achieves superior performance to state-of-the-art methods across diverse tasks without requiring multiple sequence alignments or experimental protein structures. GPSFun is freely available to all users at https://bio-web1.nscc-gz.cn/app/GPSFun with user-friendly interfaces and rich visualizations.
Collapse
Affiliation(s)
- Qianmu Yuan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| | - Chong Tian
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| | - Yidong Song
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| | - Peihua Ou
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| | - Mingming Zhu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| | - Huiying Zhao
- Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
| |
Collapse
|
12
|
Jahn LR, Marquet C, Heinzinger M, Rost B. Protein embeddings predict binding residues in disordered regions. Sci Rep 2024; 14:13566. [PMID: 38866950 PMCID: PMC11169622 DOI: 10.1038/s41598-024-64211-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2024] [Accepted: 06/06/2024] [Indexed: 06/14/2024] Open
Abstract
The identification of protein binding residues helps to understand their biological processes as protein function is often defined through ligand binding, such as to other proteins, small molecules, ions, or nucleotides. Methods predicting binding residues often err for intrinsically disordered proteins or regions (IDPs/IDPRs), often also referred to as molecular recognition features (MoRFs). Here, we presented a novel machine learning (ML) model trained to specifically predict binding regions in IDPRs. The proposed model, IDBindT5, leveraged embeddings from the protein language model (pLM) ProtT5 to reach a balanced accuracy of 57.2 ± 3.6% (95% confidence interval). Assessed on the same data set, this did not differ at the 95% CI from the state-of-the-art (SOTA) methods ANCHOR2 and DeepDISOBind that rely on expert-crafted features and evolutionary information from multiple sequence alignments (MSAs). Assessed on other data, methods such as SPOT-MoRF reached higher MCCs. IDBindT5's SOTA predictions are much faster than other methods, easily enabling full-proteome analyses. Our findings emphasize the potential of pLMs as a promising approach for exploring and predicting features of disordered proteins. The model and a comprehensive manual are publicly available at https://github.com/jahnl/binding_in_disorder .
Collapse
Affiliation(s)
- Laura R Jahn
- School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics and Computational Biology, TUM (Technical University of Munich), 85748, Garching/Munich, Germany
| | - Céline Marquet
- School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics and Computational Biology, TUM (Technical University of Munich), 85748, Garching/Munich, Germany.
| | - Michael Heinzinger
- School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics and Computational Biology, TUM (Technical University of Munich), 85748, Garching/Munich, Germany
| | - Burkhard Rost
- School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics and Computational Biology, TUM (Technical University of Munich), 85748, Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany
- TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
| |
Collapse
|
13
|
Xia Y, Pan X, Shen HB. A comprehensive survey on protein-ligand binding site prediction. Curr Opin Struct Biol 2024; 86:102793. [PMID: 38447285 DOI: 10.1016/j.sbi.2024.102793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2023] [Revised: 02/18/2024] [Accepted: 02/18/2024] [Indexed: 03/08/2024]
Abstract
Protein-ligand binding site prediction is critical for protein function annotation and drug discovery. Biological experiments are time-consuming and require significant equipment, materials, and labor resources. Developing accurate and efficient computational methods for protein-ligand interaction prediction is essential. Here, we summarize the key challenges associated with ligand binding site (LBS) prediction and introduce recently published methods from their input features, computational algorithms, and ligand types. Furthermore, we investigate the specificity of allosteric site identification as a particular LBS type. Finally, we discuss the prospective directions for machine learning-based LBS prediction in the near future.
Collapse
Affiliation(s)
- Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| |
Collapse
|
14
|
Yuan Q, Tian C, Yang Y. Genome-scale annotation of protein binding sites via language model and geometric deep learning. eLife 2024; 13:RP93695. [PMID: 38630609 PMCID: PMC11023698 DOI: 10.7554/elife.93695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/19/2024] Open
Abstract
Revealing protein binding sites with other molecules, such as nucleic acids, peptides, or small ligands, sheds light on disease mechanism elucidation and novel drug design. With the explosive growth of proteins in sequence databases, how to accurately and efficiently identify these binding sites from sequences becomes essential. However, current methods mostly rely on expensive multiple sequence alignments or experimental protein structures, limiting their genome-scale applications. Besides, these methods haven't fully explored the geometry of the protein structures. Here, we propose GPSite, a multi-task network for simultaneously predicting binding residues of DNA, RNA, peptide, protein, ATP, HEM, and metal ions on proteins. GPSite was trained on informative sequence embeddings and predicted structures from protein language models, while comprehensively extracting residual and relational geometric contexts in an end-to-end manner. Experiments demonstrate that GPSite substantially surpasses state-of-the-art sequence-based and structure-based approaches on various benchmark datasets, even when the structures are not well-predicted. The low computational cost of GPSite enables rapid genome-scale binding residue annotations for over 568,000 sequences, providing opportunities to unveil unexplored associations of binding sites with molecular functions, biological processes, and genetic variants. The GPSite webserver and annotation database can be freely accessed at https://bio-web1.nscc-gz.cn/app/GPSite.
Collapse
Affiliation(s)
- Qianmu Yuan
- School of Computer Science and Engineering, Sun Yat-sen UniversityGuangzhouChina
| | - Chong Tian
- School of Computer Science and Engineering, Sun Yat-sen UniversityGuangzhouChina
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen UniversityGuangzhouChina
| |
Collapse
|
15
|
Wu JS, Liu Y, Ge F, Yu DJ. Prediction of protein-ATP binding residues using multi-view feature learning via contextual-based co-attention network. Comput Biol Med 2024; 172:108227. [PMID: 38460308 DOI: 10.1016/j.compbiomed.2024.108227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2023] [Revised: 01/17/2024] [Accepted: 02/25/2024] [Indexed: 03/11/2024]
Abstract
Accurately predicting protein-ATP binding residues is critical for protein function annotation and drug discovery. Computational methods dedicated to the prediction of binding residues based on protein sequence information have exhibited notable advancements in predictive accuracy. Nevertheless, these methods continue to grapple with several formidable challenges, including limited means of extracting more discriminative features and inadequate algorithms for integrating protein and residue information. To address the problems, we propose ATP-Deep, a novel protein-ATP binding residues predictor. ATP-Deep harnesses the capabilities of unsupervised pre-trained language models and incorporates domain-specific evolutionary context information from homologous sequences. It further refines the embedding at the residue level through integration with corresponding protein-level information and employs a contextual-based co-attention mechanism to adeptly fuse multiple sources of features. The performance evaluation results on the benchmark datasets reveal that ATP-Deep achieves an AUC of 0.954 and 0.951, respectively, surpassing the performance of the state-of-the-art model. These findings underscore the effectiveness of assimilating protein-level information and deploying a contextual-based co-attention mechanism grounded in context to bolster the prediction performance of protein-ATP binding residues.
Collapse
Affiliation(s)
- Jia-Shun Wu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
| | - Yan Liu
- School of Information Engineering, Yangzhou University, 196 West Huayang, Yangzhou, 225100, China
| | - Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, China
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China.
| |
Collapse
|
16
|
Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024] Open
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Collapse
Affiliation(s)
- Pengzhen Jia
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
| | - Fuhao Zhang
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
| | - Chaojin Wu
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
| | - Min Li
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
| |
Collapse
|
17
|
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time -consuming. Accurately predicting the interaction between drugs and targets will likely change how the drug is discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods, focusing on sequence and structure to study protein-ligand interactions, are examined. Therefore, this paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are subsequently utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are proposed. Furthermore, delving into the diverse applications of protein-ligand interaction models in drug research is presented. Lastly, the current challenges and future directions in this field are addressed.
Collapse
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| |
Collapse
|
18
|
Zhang L, Xiao K, Wang X, Kong L. A novel fusion technology utilizing complex network and sequence information for FAD-binding site identification. Anal Biochem 2024; 685:115401. [PMID: 37981176 DOI: 10.1016/j.ab.2023.115401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2023] [Revised: 11/08/2023] [Accepted: 11/14/2023] [Indexed: 11/21/2023]
Abstract
Flavin adenine dinucleotide (FAD) binding sites play an increasingly important role as useful targets for inhibiting bacterial infections. To reveal protein topological structural information as a reasonable complement for the identification FAD-binding sites, we designed a novel fusion technology according to sequence and complex network. The specially designed feature vectors were combined and fed into CatBoost for model construction. Moreover, due to the minority class (positive samples) is more significant for biological researches, a random under-sampling technique was applied to solve the imbalance. Compared with the previous methods, our methods achieved the best results for two independent test datasets. Especially, the MCC obtained by FADsite and FADsite_seq were 14.37 %-53.37 % and 21.81 %-60.81 % higher than the results of existing methods on Test6; and they showed improvements ranging from 6.03 % to 21.96 % and 19.77 %-35.70 % on Test4. Meanwhile, statistical tests show that our methods significantly differ from the state-of-the-art methods and the cross-entropy loss shows that our methods have high certainty. The excellent results demonstrated the effectiveness of using sequence and complex network information in identifying FAD-binding sites. It may be complementary to other biological studies. The data and resource codes are available at https://github.com/Kangxiaoneuq/FADsite.
Collapse
Affiliation(s)
- Lichao Zhang
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China; Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China
| | - Kang Xiao
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
| | - Xueting Wang
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
| | - Liang Kong
- Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China; School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao, PR China.
| |
Collapse
|
19
|
Rao B, Yu X, Bai J, Hu J. E2EATP: Fast and High-Accuracy Protein-ATP Binding Residue Prediction via Protein Language Model Embedding. J Chem Inf Model 2024; 64:289-300. [PMID: 38127815 DOI: 10.1021/acs.jcim.3c01298] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2023]
Abstract
Identifying the ATP-binding sites of proteins is fundamentally important to uncover the mechanisms of protein functions and explore drug discovery. Many computational methods are proposed to predict ATP-binding sites. However, due to the limitation of the quality of feature representation, the prediction performance still has a big room for improvement. In this study, we propose an end-to-end deep learning model, E2EATP, to dig out more discriminative information from a protein sequence for improving the ATP-binding site prediction performance. Concretely, we employ a pretrained deep learning-based protein language model (ESM2) to automatically extract high-latent discriminative representations of protein sequences relevant for protein functions. Based on ESM2, we design a residual convolutional neural network to train a protein-ATP binding site prediction model. Furthermore, a weighted focal loss function is used to reduce the negative impact of imbalanced data on the model training stage. Experimental results on the two independent testing data sets demonstrate that E2EATP could achieve higher Matthew's correlation coefficient and AUC values than most existing state-of-the-art prediction methods. The speed (about 0.05 s per protein) of E2EATP is much faster than the other existing prediction methods. Detailed data analyses show that the major advantage of E2EATP lies at the utilization of the pretrained protein language model that extracts more discriminative information from the protein sequence only. The standalone package of E2EATP is freely available for academic at https://github.com/jun-csbio/e2eatp/.
Collapse
Affiliation(s)
- Bing Rao
- School of Information and Electrical Engineering, Hangzhou City University, Hangzhou 310015, China
| | - Xuan Yu
- Glasgow College, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Jie Bai
- School of Information and Electrical Engineering, Hangzhou City University, Hangzhou 310015, China
| | - Jun Hu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| |
Collapse
|
20
|
Habeeb M, You HW, Umapathi M, Ravikumar KK, Hariyadi, Mishra S. Strategies of Artificial intelligence tools in the domain of nanomedicine. J Drug Deliv Sci Technol 2024; 91:105157. [DOI: 10.1016/j.jddst.2023.105157] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2025]
|
21
|
Fang Y, Jiang Y, Wei L, Ma Q, Ren Z, Yuan Q, Wei DQ. DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model. Bioinformatics 2023; 39:btad718. [PMID: 38015872 PMCID: PMC10723037 DOI: 10.1093/bioinformatics/btad718] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2023] [Revised: 11/04/2023] [Accepted: 11/27/2023] [Indexed: 11/30/2023] Open
Abstract
MOTIVATION Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information. RESULTS In this study, DeepProSite is presented as a new framework for identifying protein binding site that utilizes protein structure and sequence information. DeepProSite first generates protein structures from ESMFold and sequence representations from pretrained language models. It then uses Graph Transformer and formulates binding site predictions as graph node classifications. In predicting protein-protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of residue is established as the implementation of the proposed DeepProSite. AVAILABILITY AND IMPLEMENTATION The datasets and source codes can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The proposed DeepProSite can be accessed at https://inner.wei-group.net/DeepProSite/.
Collapse
Affiliation(s)
- Yitian Fang
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
- Peng Cheng Laboratory, Shenzhen 518055, China
| | - Yi Jiang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Leyi Wei
- School of Software, Shandong University, Jinan, Shandong 250100, China
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | | | - Qianmu Yuan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
- Peng Cheng Laboratory, Shenzhen 518055, China
| |
Collapse
|
22
|
Essien C, Jiang L, Wang D, Xu D. Prediction of Protein Ion-Ligand Binding Sites with ELECTRA. Molecules 2023; 28:6793. [PMID: 37836636 PMCID: PMC10574437 DOI: 10.3390/molecules28196793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Revised: 09/15/2023] [Accepted: 09/19/2023] [Indexed: 10/15/2023] Open
Abstract
Interactions between proteins and ions are essential for various biological functions like structural stability, metabolism, and signal transport. Given that more than half of all proteins bind to ions, it is becoming crucial to identify ion-binding sites. The accurate identification of protein-ion binding sites helps us to understand proteins' biological functions and plays a significant role in drug discovery. While several computational approaches have been proposed, this remains a challenging problem due to the small size and high versatility of metals and acid radicals. In this study, we propose IonPred, a sequence-based approach that employs ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) to predict ion-binding sites using only raw protein sequences. We successfully fine-tuned our pretrained model to predict the binding sites for nine metal ions (Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, and K+) and four acid radical ion ligands (CO32-, SO42-, PO43-, NO2-). IonPred surpassed six current state-of-the-art tools by over 44.65% and 28.46%, respectively, in the F1 score and MCC when compared on an independent test dataset. Our method is more computationally efficient than existing tools, producing prediction results for a hundred sequences for a specific ion in under ten minutes.
Collapse
Affiliation(s)
| | | | | | - Dong Xu
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA; (C.E.); (L.J.); (D.W.)
| |
Collapse
|
23
|
Li P, Liu ZP. GeoBind: segmentation of nucleic acid binding interface on protein surface with geometric deep learning. Nucleic Acids Res 2023; 51:e60. [PMID: 37070217 PMCID: PMC10250245 DOI: 10.1093/nar/gkad288] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2023] [Revised: 03/21/2023] [Accepted: 04/06/2023] [Indexed: 04/19/2023] Open
Abstract
Unveiling the nucleic acid binding sites of a protein helps reveal its regulatory functions in vivo. Current methods encode protein sites from the handcrafted features of their local neighbors and recognize them via a classification, which are limited in expressive ability. Here, we present GeoBind, a geometric deep learning method for predicting nucleic binding sites on protein surface in a segmentation manner. GeoBind takes the whole point clouds of protein surface as input and learns the high-level representation based on the aggregation of their neighbors in local reference frames. Testing GeoBind on benchmark datasets, we demonstrate GeoBind is superior to state-of-the-art predictors. Specific case studies are performed to show the powerful ability of GeoBind to explore molecular surfaces when deciphering proteins with multimer formation. To show the versatility of GeoBind, we further extend GeoBind to five other types of ligand binding sites prediction tasks and achieve competitive performances.
Collapse
Affiliation(s)
- Pengpai Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| |
Collapse
|
24
|
Yousefi N, Yazdani-Jahromi M, Tayebi A, Kolanthai E, Neal CJ, Banerjee T, Gosai A, Balasubramanian G, Seal S, Ozmen Garibay O. BindingSite-AugmentedDTA: enabling a next-generation pipeline for interpretable prediction models in drug repurposing. Brief Bioinform 2023; 24:7140297. [PMID: 37096593 DOI: 10.1093/bib/bbad136] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2022] [Revised: 03/02/2022] [Accepted: 03/16/2023] [Indexed: 04/26/2023] Open
Abstract
While research into drug-target interaction (DTI) prediction is fairly mature, generalizability and interpretability are not always addressed in the existing works in this field. In this paper, we propose a deep learning (DL)-based framework, called BindingSite-AugmentedDTA, which improves drug-target affinity (DTA) predictions by reducing the search space of potential-binding sites of the protein, thus making the binding affinity prediction more efficient and accurate. Our BindingSite-AugmentedDTA is highly generalizable as it can be integrated with any DL-based regression model, while it significantly improves their prediction performance. Also, unlike many existing models, our model is highly interpretable due to its architecture and self-attention mechanism, which can provide a deeper understanding of its underlying prediction mechanism by mapping attention weights back to protein-binding sites. The computational results confirm that our framework can enhance the prediction performance of seven state-of-the-art DTA prediction algorithms in terms of four widely used evaluation metrics, including concordance index, mean squared error, modified squared correlation coefficient ($r^2_m$) and the area under the precision curve. We also contribute to three benchmark drug-traget interaction datasets by including additional information on 3D structure of all proteins contained in those datasets, which include the two most commonly used datasets, namely Kiba and Davis, as well as the data from IDG-DREAM drug-kinase binding prediction challenge. Furthermore, we experimentally validate the practical potential of our proposed framework through in-lab experiments. The relatively high agreement between computationally predicted and experimentally observed binding interactions supports the potential of our framework as the next-generation pipeline for prediction models in drug repurposing.
Collapse
Affiliation(s)
- Niloofar Yousefi
- Industrial Engineering and Management Systems, University of Central Florida, 32816, 4000 Central Florida Blvd., Orlando, FL, USA
| | - Mehdi Yazdani-Jahromi
- Computer Science, University of Central Florida, 32816, 4000 Central Florida Blvd., Orlando, FL, USA
| | - Aida Tayebi
- Industrial Engineering and Management Systems, University of Central Florida, 32816, 4000 Central Florida Blvd., Orlando, FL, USA
| | - Elayaraja Kolanthai
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd., Orlando 32816, FL, USA
| | - Craig J Neal
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd., Orlando 32816, FL, USA
| | - Tanumoy Banerjee
- Department of Mechanical Engineering and Mechanics, Lehigh University, Bethlehem 18015, PA, USA
| | | | - Ganesh Balasubramanian
- Department of Mechanical Engineering and Mechanics, Lehigh University, Bethlehem 18015, PA, USA
| | - Sudipta Seal
- College of Medicine, Bionix Cluster, University of Central Florida, 4000 Central Florida Blvd., Orlando 32816, FL, USA
- Advanced Materials Processing and Analysis Center, Department of Materials Science and Engineering, University of Central Florida, 4000 Central Florida Blvd., Orlando 32816, FL, USA
| | - Ozlem Ozmen Garibay
- Industrial Engineering and Management Systems, University of Central Florida, 32816, 4000 Central Florida Blvd., Orlando, FL, USA
| |
Collapse
|
25
|
Xia Y, Pan X, Shen HB. LigBind: identifying binding residues for over 1000 ligands with relation-aware graph neural networks. J Mol Biol 2023; 435:168091. [PMID: 37054909 DOI: 10.1016/j.jmb.2023.168091] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2022] [Revised: 03/22/2023] [Accepted: 04/05/2023] [Indexed: 04/15/2023]
Abstract
Identifying the interactions between proteins and ligands is significant for drug discovery and design. Considering the diverse binding patterns of ligands, the ligand-specific methods are trained per ligand to predict binding residues. However, most of the existing ligand-specific methods ignore shared binding preferences among various ligands and generally only cover a limited number of ligands with a sufficient number of known binding proteins. In this study, we propose a relation-aware framework LigBind with graph-level pre-training to enhance the ligand-specific binding residue predictions for 1159 ligands, which can effectively cover the ligands with a few known binding proteins. LigBind first pre-trains a graph neural network-based feature extractor for ligand-residue pairs and relation-aware classifiers for similar ligands. Then, LigBind is fine-tuned with ligand-specific binding data, where a domain adaptive neural network is designed to automatically leverage the diversity and similarity of various ligand-binding patterns for accurate binding residue prediction. We construct ligand-specific benchmark datasets of 1159 ligands and 16 unseen ligands, which are used to evaluate the effectiveness of LigBind. The results demonstrate the LigBind's efficacy on the large-scale ligand-specific benchmark datasets, and generalizes well to unseen ligands. LigBind also enables accurate identification of the ligand-binding residues in the main protease, papain-like protease and the RNA-dependent RNA polymerase of SARS-CoV-2. The webserver and source codes of LigBind are available at http://www.csbio.sjtu.edu.cn/bioinf/LigBind/ and https://github.com/YYingXia/LigBind/ for academic use.
Collapse
Affiliation(s)
- Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| |
Collapse
|
26
|
Chatterjee A, Walters R, Shafi Z, Ahmed OS, Sebek M, Gysi D, Yu R, Eliassi-Rad T, Barabási AL, Menichetti G. Improving the generalizability of protein-ligand binding predictions with AI-Bind. Nat Commun 2023; 14:1989. [PMID: 37031187 PMCID: PMC10082765 DOI: 10.1038/s41467-023-37572-z] [Citation(s) in RCA: 53] [Impact Index Per Article: 26.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2022] [Accepted: 03/23/2023] [Indexed: 04/10/2023] Open
Abstract
Identifying novel drug-target interactions is a critical and rate-limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, here we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. Here we introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training to improve binding predictions for novel proteins and ligands. We validate AI-Bind predictions via docking simulations and comparison with recent experimental evidence, and step up the process of interpreting machine learning prediction of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. AI-Bind is a high-throughput approach to identify drug-target combinations with the potential of becoming a powerful tool in drug discovery.
Collapse
Affiliation(s)
- Ayan Chatterjee
- Network Science Institute, Northeastern University, Boston, MA, USA
| | - Robin Walters
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Zohair Shafi
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Omair Shafi Ahmed
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Michael Sebek
- Network Science Institute, Northeastern University, Boston, MA, USA
- Department of Physics, Northeastern University, Boston, MA, USA
| | - Deisy Gysi
- Network Science Institute, Northeastern University, Boston, MA, USA
- Department of Physics, Northeastern University, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Rose Yu
- Department of Computer Science and Engineering, University of California, San Diego, CA, USA
| | - Tina Eliassi-Rad
- Network Science Institute, Northeastern University, Boston, MA, USA
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Santa Fe Institute, Santa Fe, NM, USA
- The Institute for Experiential AI, Northeastern University, Boston, MA, USA
| | - Albert-László Barabási
- Network Science Institute, Northeastern University, Boston, MA, USA
- Department of Physics, Northeastern University, Boston, MA, USA
- Department of Network and Data Science, Central European University, Budapest, Hungary
| | - Giulia Menichetti
- Network Science Institute, Northeastern University, Boston, MA, USA.
- Department of Physics, Northeastern University, Boston, MA, USA.
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
| |
Collapse
|
27
|
Yu T, Boob AG, Volk MJ, Liu X, Cui H, Zhao H. Machine learning-enabled retrobiosynthesis of molecules. Nat Catal 2023. [DOI: 10.1038/s41929-022-00909-w] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/18/2023]
|
28
|
Giri N, Cheng J. Improving Protein-Ligand Interaction Modeling with cryo-EM Data, Templates, and Deep Learning in 2021 Ligand Model Challenge. Biomolecules 2023; 13:132. [PMID: 36671518 PMCID: PMC9855343 DOI: 10.3390/biom13010132] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2022] [Revised: 01/04/2023] [Accepted: 01/06/2023] [Indexed: 01/11/2023] Open
Abstract
Elucidating protein-ligand interaction is crucial for studying the function of proteins and compounds in an organism and critical for drug discovery and design. The problem of protein-ligand interaction is traditionally tackled by molecular docking and simulation, which is based on physical forces and statistical potentials and cannot effectively leverage cryo-EM data and existing protein structural information in the protein-ligand modeling process. In this work, we developed a deep learning bioinformatics pipeline (DeepProLigand) to predict protein-ligand interactions from cryo-EM density maps of proteins and ligands. DeepProLigand first uses a deep learning method to predict the structure of proteins from cryo-EM maps, which is averaged with a reference (template) structure of the proteins to produce a combined structure to add ligands. The ligands are then identified and added into the structure to generate a protein-ligand complex structure, which is further refined. The method based on the deep learning prediction and template-based modeling was blindly tested in the 2021 EMDataResource Ligand Challenge and was ranked first in fitting ligands to cryo-EM density maps. These results demonstrate that the deep learning bioinformatics approach is a promising direction for modeling protein-ligand interactions on cryo-EM data using prior structural information.
Collapse
Affiliation(s)
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| |
Collapse
|
29
|
Luo Z, Lou L, Qiu W, Xu Z, Xiao X. Predicting N6-Methyladenosine Sites in Multiple Tissues of Mammals through Ensemble Deep Learning. Int J Mol Sci 2022; 23:15490. [PMID: 36555143 PMCID: PMC9778682 DOI: 10.3390/ijms232415490] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2022] [Revised: 12/03/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022] Open
Abstract
N6-methyladenosine (m6A) is the most abundant within eukaryotic messenger RNA modification, which plays an essential regulatory role in the control of cellular functions and gene expression. However, it remains an outstanding challenge to detect mRNA m6A transcriptome-wide at base resolution via experimental approaches, which are generally time-consuming and expensive. Developing computational methods is a good strategy for accurate in silico detection of m6A modification sites from the large amount of RNA sequence data. Unfortunately, the existing computational models are usually only for m6A site prediction in a single species, without considering the tissue level of species, while most of them are constructed based on low-confidence level data generated by an m6A antibody immunoprecipitation (IP)-based sequencing method, thereby restricting reliability and generalizability of proposed models. Here, we review recent advances in computational prediction of m6A sites and construct a new computational approach named im6APred using ensemble deep learning to accurately identify m6A sites based on high-confidence level data in multiple tissues of mammals. Our model im6APred builds upon a comprehensive evaluation of multiple classification methods, including four traditional classification algorithms and three deep learning methods and their ensembles. The optimal base-classifier combinations are then chosen by five-fold cross-validation test to achieve an effective stacked model. Our model im6APred can produce the area under the receiver operating characteristic curve (AUROC) in the range of 0.82-0.91 on independent tests, indicating that our model has the ability to learn general methylation rules on RNA bases and generalize to m6A transcriptome-wide identification. Moreover, AUROCs in the range of 0.77-0.96 were achieved using cross-species/tissues validation on the benchmark dataset, demonstrating differences in predictive performance at the tissue level and the need for constructing tissue-specific models for m6A site prediction.
Collapse
Affiliation(s)
| | | | | | - Zhaochun Xu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
| | - Xuan Xiao
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
| |
Collapse
|
30
|
Xia Y, Xia C, Pan X, Shen H. BindWeb: A web server for ligand binding residue and pocket prediction from protein structures. Protein Sci 2022; 31:e4462. [PMID: 36190332 PMCID: PMC9667820 DOI: 10.1002/pro.4462] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2022] [Revised: 09/27/2022] [Accepted: 09/28/2022] [Indexed: 12/13/2022]
Abstract
Knowledge of protein-ligand interactions is beneficial for biological process analysis and drug design. Given the complexity of the interactions and the inadequacy of experimental data, accurate ligand binding residue and pocket prediction remains challenging. In this study, we introduce an easy-to-use web server BindWeb for ligand-specific and ligand-general binding residue and pocket prediction from protein structures. BindWeb integrates a graph neural network GraphBind with a hybrid convolutional neural network and bidirectional long short-term memory network DELIA to identify binding residues. Furthermore, BindWeb clusters the predicted binding residues to binding pockets with mean shift clustering. The experimental results and case study demonstrate that BindWeb benefits from the complementarity of two base methods. BindWeb is freely available for academic use at http://www.csbio.sjtu.edu.cn/bioinf/BindWeb/.
Collapse
Affiliation(s)
- Ying Xia
- Institute of Image Processing and Pattern RecognitionShanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of ChinaShanghaiChina
| | - Chunqiu Xia
- Institute of Image Processing and Pattern RecognitionShanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of ChinaShanghaiChina
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern RecognitionShanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of ChinaShanghaiChina
| | - Hong‐Bin Shen
- Institute of Image Processing and Pattern RecognitionShanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of ChinaShanghaiChina
| |
Collapse
|
31
|
Yuan Q, Chen S, Wang Y, Zhao H, Yang Y. Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning. Brief Bioinform 2022; 23:6770088. [PMID: 36274238 DOI: 10.1093/bib/bbac444] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2022] [Revised: 09/02/2022] [Accepted: 09/17/2022] [Indexed: 12/14/2022] Open
Abstract
More than one-third of the proteins contain metal ions in the Protein Data Bank. Correct identification of metal ion-binding residues is important for understanding protein functions and designing novel drugs. Due to the small size and high versatility of metal ions, it remains challenging to computationally predict their binding sites from protein sequence. Existing sequence-based methods are of low accuracy due to the lack of structural information, and time-consuming owing to the usage of multi-sequence alignment. Here, we propose LMetalSite, an alignment-free sequence-based predictor for binding sites of the four most frequently seen metal ions in BioLiP (Zn2+, Ca2+, Mg2+ and Mn2+). LMetalSite leverages the pretrained language model to rapidly generate informative sequence representations and employs transformer to capture long-range dependencies. Multi-task learning is adopted to compensate for the scarcity of training data and capture the intrinsic similarities between different metal ions. LMetalSite was shown to surpass state-of-the-art structure-based methods by more than 19.7, 14.4, 36.8 and 12.6% in area under the precision recall on the four independent tests, respectively. Further analyses indicated that the self-attention modules are effective to learn the structural contexts of residues from protein sequence. We provide the data sets, source codes and trained models of LMetalSite at https://github.com/biomed-AI/LMetalSite.
Collapse
Affiliation(s)
- Qianmu Yuan
- School of Computer Science and Engineering at Sun Yat-sen University, Guangzhou 510000, China
| | - Sheng Chen
- School of Computer Science and Engineering at Sun Yat-sen University, Guangzhou 510000, China
| | - Yu Wang
- Peng Cheng National Laboratory at Shenzhen, Guangzhou 510000, China
| | - Huiying Zhao
- Sun Yat-sen Memorial Hospital at Sun Yat-sen University, Guangzhou 510000, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China, and Key Laboratory of Machine Intelligence and Advanced Computing of MOE, Sun Yat-sen University, Guangzhou 510000, China
| |
Collapse
|
32
|
Gervasoni S, Malloci G, Bosin A, Vargiu AV, Zgurskaya HI, Ruggerone P. Recognition of quinolone antibiotics by the multidrug efflux transporter MexB of Pseudomonas aeruginosa. Phys Chem Chem Phys 2022; 24:16566-16575. [PMID: 35766032 PMCID: PMC9278589 DOI: 10.1039/d2cp00951j] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
The drug/proton antiporter MexB is the engine of the major efflux pump MexAB-OprM in Pseudomonas aeruginosa. This protein is known to transport a large variety of compounds, including antibiotics, thus conferring a multi-drug resistance phenotype. Due to the difficulty of producing co-crystals, only two X-ray structures of MexB in a complex with ligands are available to date, and mechanistic aspects are largely hypothesized based on the body of data collected for the homologous protein AcrB of Escherichia coli. In particular, a recent study (Ornik-Cha, Wilhelm, Kobylka et al., Nat. Commun., 2021, 12, 6919) reported a co-crystal structure of AcrB in a complex with levofloxacin, an antibiotic belonging to the important class of (fluoro)-quinolones. In this work, we performed a systematic ensemble docking campaign coupled to the cluster analysis and molecular-mechanics optimization of docking poses to study the interaction between 36 quinolone antibiotics and MexB. We additionally investigated surface complementarity between each molecule and the transporter and thoroughly assessed the computational protocol adopted against the known experimental data. Our study reveals different binding preferences of the investigated compounds towards the sub-sites of the large deep binding pocket of MexB, supporting the hypothesis that MexB substrates oscillate between different binding modes with similar affinity. Interestingly, small changes in the molecular structure translate into significant differences in MexB–quinolone interactions. All the predicted binding modes are available for download and visualization at the following link: https://www.dsf.unica.it/dock/mexb/quinolones. Putative binding modes (BMs) of quinolones to the bacterial efflux transporter MexB were identified. Multiple interaction patterns are possible, supporting the hypothesis that substrates oscillate between different BMs with similar affinity.![]()
Collapse
Affiliation(s)
- Silvia Gervasoni
- Department of Physics, University of Cagliari, Citt. Universitaria, I-09042 Monserrato (Cagliari), Italy.
| | - Giuliano Malloci
- Department of Physics, University of Cagliari, Citt. Universitaria, I-09042 Monserrato (Cagliari), Italy.
| | - Andrea Bosin
- Department of Physics, University of Cagliari, Citt. Universitaria, I-09042 Monserrato (Cagliari), Italy.
| | - Attilio V Vargiu
- Department of Physics, University of Cagliari, Citt. Universitaria, I-09042 Monserrato (Cagliari), Italy.
| | - Helen I Zgurskaya
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73072, USA
| | - Paolo Ruggerone
- Department of Physics, University of Cagliari, Citt. Universitaria, I-09042 Monserrato (Cagliari), Italy.
| |
Collapse
|
33
|
Aja PM, Awoke JN, Agu PC, Adegboyega AE, Ezeh EM, Igwenyi IO, Orji OU, Ani OG, Ale BA, Ibiam UA. Hesperidin abrogates bisphenol A endocrine disruption through binding with fibroblast growth factor 21 (FGF-21), α-amylase and α-glucosidase: an in silico molecular study. J Genet Eng Biotechnol 2022; 20:84. [PMID: 35648239 PMCID: PMC9160168 DOI: 10.1186/s43141-022-00370-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Accepted: 05/20/2022] [Indexed: 12/03/2022]
Abstract
Background Fibroblast growth factor 21 (FGF-21), alpha-amylase, and alpha-glucosidase are key proteins implicated in metabolic dysregulations. Bisphenol A (BPA) is an environmental toxicant known to cause endocrine dysregulations. Hesperidin from citrus is an emerging flavonoid for metabolic diseases management. Through computational approach, we investigated the potentials of hesperidin in abrogating BPA interference in metabolism. The 3D crystal structure of the proteins (FGF-21, α-amylase, and α-glucosidase) and the ligands (BPA and hesperidin) were retrieved from the PDB and PubChem database respectively. Using Autodock plugin Pyrx, molecular docking of the ligands and individual proteins were performed to ascertain their binding affinities and their potentials to compete for the same binding site. Validation of the docking study was considered as the ability of the ligands to bind at the same site of each proteins. The docking poses were visualized using UCSF Chimera and Discovery Studio 2020, respectively to reveal each of the protein-ligands interactions within the binding pockets. Using SwissAdme and AdmeSar servers, we further investigated hesperidin’s ADMET profile. Hesperidin used was purchased commercially. Results Hesperidin and BPA competitively bound to the same site on each protein. Interestingly, hesperidin had greater binding affinities (Kcal/mol) − 5.80, − 9.60, and − 9.60 than BPA (Kcal/mol) − 4.40, − 7.20, − 7.10 for FGF-21, α-amylase, and α-glucosidase respectively. Visualizations of the binding poses showed that hesperidin interacted with stronger bonds than BPA within the proteins’ pockets. Although hesperidin violated Lipinski rule of five, this however can be optimized through structural modifications. Conclusions Hesperidin may be an emerging natural product with promising therapeutic potentials against metabolic and endocrine derangement.
Collapse
Affiliation(s)
- P M Aja
- Department of Biochemistry, Faculty of Science, Ebonyi State University, Abakaliki, Nigeria
| | - J N Awoke
- Department of Biochemistry, Faculty of Science, Ebonyi State University, Abakaliki, Nigeria. .,Interdisciplinary Biomedical Research Centre, School of Science and Technology, Nottingham Trent University, Nottingham, UK.
| | - P C Agu
- Department of Biochemistry, Faculty of Science, Ebonyi State University, Abakaliki, Nigeria
| | - A E Adegboyega
- Department of Biochemistry, Faculty of Medical Sciences, University of Jos/Jaris Computational Biology Centre, Jos, Nigeria
| | - E M Ezeh
- Department of Chemical Engineering, Nnamdi Azikiwe University, Awka, Nigeria
| | - I O Igwenyi
- Department of Biochemistry, Faculty of Science, Ebonyi State University, Abakaliki, Nigeria
| | - O U Orji
- Department of Biochemistry, Faculty of Science, Ebonyi State University, Abakaliki, Nigeria
| | - O G Ani
- Nutrition and Exercise Physiology, University of Missouri, Columbia, United States of America
| | - B A Ale
- Department of Biochemistry, University of Nigeria Nsukka, Nsukka, Nigeria
| | - U A Ibiam
- Department of Biochemistry, Faculty of Science, Ebonyi State University, Abakaliki, Nigeria
| |
Collapse
|
34
|
Yan X, Lu Y, Li Z, Wei Q, Gao X, Wang S, Wu S, Cui S. PointSite: A Point Cloud Segmentation Tool for Identification of Protein Ligand Binding Atoms. J Chem Inf Model 2022; 62:2835-2845. [PMID: 35621730 DOI: 10.1021/acs.jcim.1c01512] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Accurate identification of ligand binding sites (LBS) on a protein structure is critical for understanding protein function and designing structure-based drugs. As the previous pocket-centric methods are usually based on the investigation of pseudo-surface-points outside the protein structure, they cannot fully take advantage of the local connectivity of atoms within the protein, as well as the global 3D geometrical information from all the protein atoms. In this paper, we propose a novel point clouds segmentation method, PointSite, for accurate identification of protein ligand binding atoms, which performs protein LBS identification at the atom-level in a protein-centric manner. Specifically, we first transfer the original 3D protein structure to point clouds and then conduct segmentation through Submanifold Sparse Convolution based U-Net. With the fine-grained atom-level binding atoms representation and enhanced feature learning, PointSite can outperform previous methods in atom Intersection over Union (atom-IoU) by a large margin. Furthermore, our segmented binding atoms, that is, atoms with high probability predicted by our model can work as a filter on predictions achieved by previous pocket-centric approaches, which significantly decreases the false-positive of LBS candidates. Besides, we further directly extend PointSite trained on bound proteins for LBS identification on unbound proteins, which demonstrates the superior generalization capacity of PointSite. Through cascaded filter and reranking aided by the segmented atoms, state-of-the-art performance can be achieved over various canonical benchmarks, CAMEO hard targets, and unbound proteins in terms of the commonly used DCA criteria.
Collapse
Affiliation(s)
- Xu Yan
- The Chinese University of Hongkong (Shenzhen) & Future Network of Intelligence Institute, Shenzhen 518172, China
| | - Yingfeng Lu
- The Chinese University of Hongkong (Shenzhen) & Future Network of Intelligence Institute, Shenzhen 518172, China
| | - Zhen Li
- The Chinese University of Hongkong (Shenzhen) & Future Network of Intelligence Institute, Shenzhen 518172, China
| | - Qing Wei
- The Chinese University of Hongkong (Shenzhen) & Future Network of Intelligence Institute, Shenzhen 518172, China
| | - Xin Gao
- King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
| | - Sheng Wang
- Shanghai Zelixir Biotech Company Ltd., Shanghai 200030, China.,CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Song Wu
- Shenzhen University, Shenzhen 518060, China
| | - Shuguang Cui
- The Chinese University of Hongkong (Shenzhen) & Future Network of Intelligence Institute, Shenzhen 518172, China
| |
Collapse
|
35
|
A Comprehensive Review of Computation-Based Metal-Binding Prediction Approaches at the Residue Level. BIOMED RESEARCH INTERNATIONAL 2022; 2022:8965712. [PMID: 35402609 PMCID: PMC8989566 DOI: 10.1155/2022/8965712] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/02/2022] [Accepted: 03/04/2022] [Indexed: 12/29/2022]
Abstract
Clear evidence has shown that metal ions strongly connect and delicately tune the dynamic homeostasis in living bodies. They have been proved to be associated with protein structure, stability, regulation, and function. Even small changes in the concentration of metal ions can shift their effects from natural beneficial functions to harmful. This leads to degenerative diseases, malignant tumors, and cancers. Accurate characterizations and predictions of metalloproteins at the residue level promise informative clues to the investigation of intrinsic mechanisms of protein-metal ion interactions. Compared to biophysical or biochemical wet-lab technologies, computational methods provide open web interfaces of high-resolution databases and high-throughput predictors for efficient investigation of metal-binding residues. This review surveys and details 18 public databases of metal-protein binding. We collect a comprehensive set of 44 computation-based methods and classify them into four categories, namely, learning-, docking-, template-, and meta-based methods. We analyze the benchmark datasets, assessment criteria, feature construction, and algorithms. We also compare several methods on two benchmark testing datasets and include a discussion about currently publicly available predictive tools. Finally, we summarize the challenges and underlying limitations of the current studies and propose several prospective directions concerning the future development of the related databases and methods.
Collapse
|
36
|
Zhou X, Song H, Li J. Residue-Frustration-Based Prediction of Protein-Protein Interactions Using Machine Learning. J Phys Chem B 2022; 126:1719-1727. [PMID: 35170967 DOI: 10.1021/acs.jpcb.1c10525] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
The study of protein-protein interactions (PPIs) is important in understanding the function of proteins. However, it is still a challenge to investigate the transient protein-protein interaction by experiments. Hence, the computational prediction for protein-protein interactions draws growing attention. Statistics-based features have been widely used in the studies of protein structure prediction and protein folding. Due to the scarcity of experimental data of PPI, it is difficult to construct a conventional statistical feature for PPI prediction, and the application of statistics-based features is very limited in this field. In this paper, we explored the application of frustration, a statistical potential, in PPI prediction. By comparing the energetic contribution of the extra stabilization energy from a given residue pair in the native protein with the statistics of the energies, we obtained the residue pair's frustration index. By calculating the number of residue pairs with a high frustration index, the highly frustrated density, a residue-frustration-based feature, was then obtained to describe the tendency of residues to be involved in PPI. Highly frustrated density, as well as structure-based features, were then used to describe protein residues and combined with the long short-term memory (LSTM) neural network to predict PPI residue pairs. Our model correctly predicted 75% dimers when only the top 2‰ residue pairs were selected in each dimer. Our model, which considers the statistics-based features, is significantly different from the models based on the chemical features of residues. We found that frustration can effectively describe the tendency of residue to be involved in PPI. Frustration-based features can replace chemical features to combine with machine learning and realize the better performance of PPI prediction. It reveals the great potential of statistical potential such as frustration in PPI prediction.
Collapse
Affiliation(s)
- Xiaozhou Zhou
- Zhejiang Province Key Laboratory of Quantum Technology and Device, Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Haoyu Song
- Zhejiang Province Key Laboratory of Quantum Technology and Device, Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, Zhejiang, China
| | - Jingyuan Li
- Zhejiang Province Key Laboratory of Quantum Technology and Device, Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, Zhejiang, China
| |
Collapse
|
37
|
Dhakal A, McKay C, Tanner JJ, Cheng J. Artificial intelligence in the prediction of protein-ligand interactions: recent advances and future directions. Brief Bioinform 2022; 23:bbab476. [PMID: 34849575 PMCID: PMC8690157 DOI: 10.1093/bib/bbab476] [Citation(s) in RCA: 112] [Impact Index Per Article: 37.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2021] [Revised: 09/28/2021] [Accepted: 10/15/2021] [Indexed: 12/13/2022] Open
Abstract
New drug production, from target identification to marketing approval, takes over 12 years and can cost around $2.6 billion. Furthermore, the COVID-19 pandemic has unveiled the urgent need for more powerful computational methods for drug discovery. Here, we review the computational approaches to predicting protein-ligand interactions in the context of drug discovery, focusing on methods using artificial intelligence (AI). We begin with a brief introduction to proteins (targets), ligands (e.g. drugs) and their interactions for nonexperts. Next, we review databases that are commonly used in the domain of protein-ligand interactions. Finally, we survey and analyze the machine learning (ML) approaches implemented to predict protein-ligand binding sites, ligand-binding affinity and binding pose (conformation) including both classical ML algorithms and recent deep learning methods. After exploring the correlation between these three aspects of protein-ligand interaction, it has been proposed that they should be studied in unison. We anticipate that our review will aid exploration and development of more accurate ML-based prediction strategies for studying protein-ligand interactions.
Collapse
Affiliation(s)
- Ashwin Dhakal
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
| | - Cole McKay
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, USA
| | - John J Tanner
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, USA
- Department of Chemistry, University of Missouri, Columbia, MO, 65211, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
| |
Collapse
|
38
|
Littmann M, Heinzinger M, Dallago C, Weissenow K, Rost B. Protein embeddings and deep learning predict binding residues for various ligand classes. Sci Rep 2021; 11:23916. [PMID: 34903827 PMCID: PMC8668950 DOI: 10.1038/s41598-021-03431-4] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2021] [Accepted: 12/02/2021] [Indexed: 01/27/2023] Open
Abstract
One important aspect of protein function is the binding of proteins to ligands, including small molecules, metal ions, and macromolecules such as DNA or RNA. Despite decades of experimental progress many binding sites remain obscure. Here, we proposed bindEmbed21, a method predicting whether a protein residue binds to metal ions, nucleic acids, or small molecules. The Artificial Intelligence (AI)-based method exclusively uses embeddings from the Transformer-based protein Language Model (pLM) ProtT5 as input. Using only single sequences without creating multiple sequence alignments (MSAs), bindEmbed21DL outperformed MSA-based predictions. Combination with homology-based inference increased performance to F1 = 48 ± 3% (95% CI) and MCC = 0.46 ± 0.04 when merging all three ligand classes into one. All results were confirmed by three independent data sets. Focusing on very reliably predicted residues could complement experimental evidence: For the 25% most strongly predicted binding residues, at least 73% were correctly predicted even when ignoring the problem of missing experimental annotations. The new method bindEmbed21 is fast, simple, and broadly applicable-neither using structure nor MSAs. Thereby, it found binding residues in over 42% of all human proteins not otherwise implied in binding and predicted about 6% of all residues as binding to metal ions, nucleic acids, or small molecules.
Collapse
Affiliation(s)
- Maria Littmann
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
| | - Michael Heinzinger
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
| | - Christian Dallago
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
| | - Konstantin Weissenow
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
| | - Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology, I12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, Garching, 85748, Munich, Germany
- TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
- Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West, 168th Street, New York, NY, 10032, USA
| |
Collapse
|
39
|
Xia Y, Xia CQ, Pan X, Shen HB. GraphBind: protein structural context embedded rules learned by hierarchical graph neural networks for recognizing nucleic-acid-binding residues. Nucleic Acids Res 2021; 49:e51. [PMID: 33577689 PMCID: PMC8136796 DOI: 10.1093/nar/gkab044] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2020] [Accepted: 02/09/2021] [Indexed: 11/24/2022] Open
Abstract
Knowledge of the interactions between proteins and nucleic acids is the basis of understanding various biological activities and designing new drugs. How to accurately identify the nucleic-acid-binding residues remains a challenging task. In this paper, we propose an accurate predictor, GraphBind, for identifying nucleic-acid-binding residues on proteins based on an end-to-end graph neural network. Considering that binding sites often behave in highly conservative patterns on local tertiary structures, we first construct graphs based on the structural contexts of target residues and their spatial neighborhood. Then, hierarchical graph neural networks (HGNNs) are used to embed the latent local patterns of structural and bio-physicochemical characteristics for binding residue recognition. We comprehensively evaluate GraphBind on DNA/RNA benchmark datasets. The results demonstrate the superior performance of GraphBind than state-of-the-art methods. Moreover, GraphBind is extended to other ligand-binding residue prediction to verify its generalization capability. Web server of GraphBind is freely available at http://www.csbio.sjtu.edu.cn/bioinf/GraphBind/.
Collapse
Affiliation(s)
- Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Chun-Qiu Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.,School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
| |
Collapse
|
40
|
Sequence-Based Prediction of Transmembrane Protein Crystallization Propensity. Interdiscip Sci 2021; 13:693-702. [PMID: 34143353 DOI: 10.1007/s12539-021-00448-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2020] [Revised: 05/31/2021] [Accepted: 06/04/2021] [Indexed: 10/21/2022]
Abstract
Transmembrane proteins play a vital role in cell life activities. There are several techniques to determine transmembrane protein structures and X-ray crystallography is the primary methodology. However, due to the special properties of transmembrane proteins, it is still hard to determine their structures by X-ray crystallography technique. To reduce experimental consumption and improve experimental efficiency, it is of great significance to develop computational methods for predicting the crystallization propensity of transmembrane proteins. In this work, we proposed a sequence-based machine learning method, namely Prediction of TransMembrane protein Crystallization propensity (PTMC), to predict the propensity of transmembrane protein crystallization. First, we obtained several general sequence features and the specific encoded features of relative solvent accessibility and hydrophobicity. Second, feature selection was employed to filter out redundant and irrelevant features, and the optimal feature subset is composed of hydrophobicity, amino acid composition and relative solvent accessibility. Finally, we chose extreme gradient boosting by comparing with other several machine learning methods. Comparative results on the independent test set indicate that PTMC outperforms state-of-the-art sequence-based methods in terms of sensitivity, specificity, accuracy, Matthew's Correlation Coefficient (MCC) and Area Under the receiver operating characteristic Curve (AUC). In comparison with two competitors, Bcrystal and TMCrys, PTMC achieves an improvement by 0.132 and 0.179 for sensitivity, 0.014 and 0.127 for specificity, 0.037 and 0.192 for accuracy, 0.128 and 0.362 for MCC, and 0.027 and 0.125 for AUC, respectively.
Collapse
|
41
|
Hu J, Zheng LL, Bai YS, Zhang KW, Yu DJ, Zhang GJ. Accurate prediction of protein-ATP binding residues using position-specific frequency matrix. Anal Biochem 2021; 626:114241. [PMID: 33971164 DOI: 10.1016/j.ab.2021.114241] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2021] [Revised: 04/27/2021] [Accepted: 05/01/2021] [Indexed: 10/21/2022]
Abstract
Knowledge of protein-ATP interaction can help for protein functional annotation and drug discovery. Accurately identifying protein-ATP binding residues is an important but challenging task to gain the knowledge of protein-ATP interactions, especially for the case where only protein sequence information is given. In this study, we propose a novel method, named DeepATPseq, to predict protein-ATP binding residues without using any information about protein three-dimension structure or sequence-derived structural information. In DeepATPseq, the HHBlits-generated position-specific frequency matrix (PSFM) profile is first employed to extract the feature information of each residue. Then, for each residue, the PSFM-based feature is fed into two prediction models, which are generated by the algorithms of deep convolutional neural network (DCNN) and support vector machine (SVM) separately. The final ATP-binding probability of the corresponding residue is calculated by the weighted sum of the outputted values of DCNN-based and SVM-based models. Experimental results on the independent validation data set demonstrate that DeepATPseq could achieve an accuracy of 77.71%, covering 57.42% of all ATP-binding residues, while achieving a Matthew's correlation coefficient value (0.655) that is significantly higher than that of existing sequence-based methods and comparable to that of the state-of-the-art structure-based predictors. Detailed data analysis show that the major advantage of DeepATPseq lies at the combination utilization of DCNN and SVM that helps dig out more discriminative information from the PSFM profiles. The online server and standalone package of DeepATPseq are freely available at: https://jun-csbio.github.io/DeepATPseq/for academic use.
Collapse
Affiliation(s)
- Jun Hu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China.
| | - Lin-Lin Zheng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
| | - Yan-Song Bai
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
| | - Ke-Wen Zhang
- College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology,Xiaolingwei 200, Nanjing, 210094, China.
| | - Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China.
| |
Collapse
|
42
|
Identification of ligand-binding residues using protein sequence profile alignment and query-specific support vector machine model. Anal Biochem 2020; 604:113799. [DOI: 10.1016/j.ab.2020.113799] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2020] [Revised: 05/23/2020] [Accepted: 05/26/2020] [Indexed: 12/23/2022]
|