1. Wang ZC, Zeng Y, Sun JY, Chen XQ, Wu HC, Li YY, Mu YG, Zheng LZ, Gao ZB, Li WF. An efficient deep learning-based strategy to screen inhibitors for GluN1/GluN3A receptor. Acta Pharmacol Sin 2025:10.1038/s41401-025-01513-x. [PMID: 40069493 DOI: 10.1038/s41401-025-01513-x] [Received: 12/02/2024] [Accepted: 02/12/2025] [Indexed: 03/15/2025]
Abstract
The GluN1/GluN3A receptor, a unique excitatory glycine receptor recently identified in the central nervous system, challenges traditional perspectives of N-methyl-D-aspartate (NMDA) receptor diversity and glycinergic signaling. Its role in emotional regulation positions it as a potential therapeutic target for neuropsychiatric disorders. However, pharmacological research on GluN1/GluN3A receptors remains at an early stage. Traditional high-throughput screening methods for ion channel drug discovery often lack efficiency, particularly when applied to large compound libraries. To address this concern, we designed a deep learning-based strategy that balances efficiency and accuracy for identifying GluN1/GluN3A inhibitors. First, a sequence-based scoring function was developed to rapidly screen a library containing 18 million compounds, reducing the pool to approximately 10^5 candidates. Next, two complex-based scoring functions, IGModel and RTMScore, were employed to precisely score and rank the remaining candidates. Finally, an active molecule with an IC50 of 2.87 ± 0.80 μM for the GluN1/GluN3A receptor was confirmed through whole-cell voltage-clamp electrophysiology. This study also presents a paradigm for integrating deep learning into rapid and precise high-throughput screening.
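The two-stage funnel described in this abstract (a fast scorer prunes the library, then slower scorers rank the survivors) can be sketched roughly as follows. This is an illustration only, not the authors' pipeline: `funnel_screen`, `fast_score`, `precise_scores`, and `keep` are hypothetical names, and combining the precise scorers by average rank is an assumption, since the abstract does not say how the IGModel and RTMScore results were merged.

```python
import heapq

def funnel_screen(compounds, fast_score, precise_scores, keep):
    """Two-stage screen: cheap filter first, then rank survivors.

    compounds      : iterable of hashable compound identifiers
    fast_score     : callable, cheap score (higher is better)
    precise_scores : list of callables, expensive scores (higher is better)
    keep           : number of compounds passed to the second stage
    """
    # Stage 1: keep only the `keep` best compounds by the cheap score.
    shortlist = heapq.nlargest(keep, compounds, key=fast_score)

    # Stage 2 (assumed consensus scheme): average the rank each
    # expensive scorer assigns to every shortlisted compound.
    avg_rank = {c: 0.0 for c in shortlist}
    for score in precise_scores:
        ordered = sorted(shortlist, key=score, reverse=True)
        for rank, c in enumerate(ordered):
            avg_rank[c] += rank / len(precise_scores)

    # Best (lowest) average rank first.
    return sorted(shortlist, key=lambda c: avg_rank[c])


# Toy usage with made-up scores standing in for real models.
fast = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
slow1 = {"a": 1.0, "b": 3.0, "d": 2.0}
slow2 = {"a": 1.5, "b": 2.5, "d": 2.0}
ranked = funnel_screen(fast, fast.get, [slow1.get, slow2.get], keep=3)
```

The point of the design is that the expensive scorers are only ever called on the shortlist, so their cost scales with `keep` rather than with the library size.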
Affiliation(s)
- Ze-Chen Wang
  - School of Physics, Shandong University, Jinan, 250100, China
- Yue Zeng
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Department of Pharmacology, School of Pharmacy, Fudan University, Shanghai, 200032, China
- Jin-Yuan Sun
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Xue-Qin Chen
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- Hao-Chen Wu
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- Yang-Yang Li
  - School of Physics, Shandong University, Jinan, 250100, China
- Yu-Guang Mu
  - School of Biological Science, Nanyang Technological University, Singapore, 637551, Singapore
- Liang-Zhen Zheng
  - Shenzhen Zelixir Biotech Co. Ltd, Hengtaiyu Park, Shenzhen, 518107, China
- Zhao-Bing Gao
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan, 528400, China
- Wei-Feng Li
  - School of Physics, Shandong University, Jinan, 250100, China
2. Sim J, Kim D, Kim B, Choi J, Lee J. Recent advances in AI-driven protein-ligand interaction predictions. Curr Opin Struct Biol 2025; 92:103020. [PMID: 39999605 DOI: 10.1016/j.sbi.2025.103020] [Received: 09/30/2024] [Revised: 01/23/2025] [Accepted: 01/31/2025] [Indexed: 02/27/2025]
Abstract
Structure-based drug discovery is a fundamental approach in modern drug development, leveraging computational models to predict protein-ligand interactions. AI-driven methodologies are significantly improving key aspects of the field, including ligand binding site prediction, protein-ligand binding pose estimation, scoring function development, and virtual screening. In this review, we summarize the recent AI-driven advances in various protein-ligand interaction prediction tasks. Traditional docking methods based on empirical scoring functions often lack accuracy, whereas AI models, including graph neural networks, mixture density networks, transformers, and diffusion models, have enhanced predictive performance. Ligand binding site prediction has been refined using geometric deep learning and sequence-based embeddings, aiding in the identification of potential druggable target sites. Binding pose prediction has evolved with sampling-based and regression-based models, as well as protein-ligand co-generation frameworks. AI-powered scoring functions now integrate physical constraints and deep learning techniques to improve binding affinity estimation, leading to more robust virtual screening strategies. Despite these advances, generalization across diverse protein-ligand pairs remains a challenge. As AI technologies continue to evolve, they are expected to revolutionize molecular docking and affinity prediction, increasing both the accuracy and efficiency of structure-based drug discovery.
Affiliation(s)
- Jaemin Sim
  - Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 08826, Republic of Korea
- Dongwoo Kim
  - College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
- Bomin Kim
  - College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
- Jieun Choi
  - Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 08826, Republic of Korea
- Juyong Lee
  - Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 08826, Republic of Korea
  - College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
  - Research Institute of Pharmaceutical Science, College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
  - Arontier Co., Seoul, 06735, Republic of Korea
3. Beck AG, Fine J, Lam YH, Sherer EC, Regalado EL, Aggarwal P. Dedenser: A Python Package for Clustering and Downsampling Chemical Libraries. J Chem Inf Model 2025; 65:1053-1060. [PMID: 39883037 DOI: 10.1021/acs.jcim.4c01980] [Indexed: 01/31/2025]
Abstract
The screening of chemical libraries is an essential starting point in the drug discovery process. While some researchers desire a more thorough screening of drug targets against a narrower scope of molecules, it is not uncommon for diverse screening sets to be favored during the early stages of drug discovery. However, a cost burden is associated with the screening of molecules, with potential drawbacks if particular areas of chemical space are needlessly overrepresented. To facilitate triaged sampling of chemical libraries and other collections of molecules, we have developed Dedenser, a tool for the downsampling of chemical clusters. Dedenser functions by reducing the membership of clusters within chemical point clouds while maintaining the initial topology or distribution in chemical space. Dedenser is a Python package that utilizes Hierarchical Density-Based Spatial Clustering of Applications with Noise to first identify clusters present in 3D chemical point clouds and then downsamples by applying Poisson disk sampling to clusters based on either their volume or density in chemical space. A command line interface tool and graphic user interface are available with Dedenser, which allow for the generation of chemical point clouds, using Mordred for QSAR descriptor calculations and uniform manifold approximation and projection for 3D embedding, as well as visualization. We hope that Dedenser will serve the community by enabling quick access to reduced collections of molecules that are representative of larger sets and selecting even distributions of molecules within clusters rather than single representative molecules from clusters. All code for Dedenser is open source and available at https://github.com/MSDLLCpapers/dedenser.
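The downsampling step described above can be illustrated with a minimal sketch. This is not Dedenser's code and does not use its API: it assumes the points already belong to one cluster (the HDBSCAN step is omitted), and it applies a simple greedy "dart throwing" variant of Poisson disk sampling restricted to the existing points. The function name `poisson_disk_downsample` and the `radius` value are hypothetical.

```python
import random
from math import dist

def poisson_disk_downsample(points, radius, seed=0):
    """Return indices of a subset of `points` in which no two kept
    points are closer than `radius` (greedy Poisson-disk-style thinning).

    points : list of coordinate tuples, e.g. a 3D chemical embedding
    radius : minimum allowed distance between kept points (assumed value)
    seed   : seed for the random visiting order
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)  # visit candidates in random order

    kept = []
    for i in order:
        # Accept a point only if it is far enough from every kept point.
        if all(dist(points[i], points[j]) >= radius for j in kept):
            kept.append(i)
    return sorted(kept)


# Toy usage: thin a dense 5 x 5 grid lying in the z = 0 plane.
grid = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
kept = poisson_disk_downsample(grid, radius=2.0)
```

Because every rejected point lies within `radius` of some kept point, the thinned set still covers the whole cluster while dense regions lose most of their members, which is the behavior the abstract describes as preserving the distribution in chemical space.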
Affiliation(s)
- Armen G Beck
  - Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Jonathan Fine
  - Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Yu-Hong Lam
  - Modeling and Informatics, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Edward C Sherer
  - Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Erik L Regalado
  - Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
- Pankaj Aggarwal
  - Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, New Jersey 07065, United States
4. Vural O, Jololian L. Machine learning approaches for predicting protein-ligand binding sites from sequence data. Frontiers in Bioinformatics 2025; 5:1520382. [PMID: 39963299 PMCID: PMC11830693 DOI: 10.3389/fbinf.2025.1520382] [Received: 10/31/2024] [Accepted: 01/10/2025] [Indexed: 02/20/2025] Open
Abstract
Proteins, composed of amino acids, are crucial for a wide range of biological functions. Proteins have various interaction sites, one of which is the protein-ligand binding site, essential for molecular interactions and biochemical reactions. These sites enable proteins to bind with other molecules, facilitating key biological functions. Accurate prediction of these binding sites is pivotal in computational drug discovery, helping to identify therapeutic targets and facilitate treatment development. Machine learning has made significant contributions to this field by improving the prediction of protein-ligand interactions. This paper reviews studies that use machine learning to predict protein-ligand binding sites from sequence data, focusing on recent advancements. The review examines various embedding methods and machine learning architectures, addressing current challenges and the ongoing debates in the field. Additionally, research gaps in the existing literature are highlighted, and potential future directions for advancing the field are discussed. This study provides a thorough overview of sequence-based approaches for predicting protein-ligand binding sites, offering insights into the current state of research and future possibilities.
Affiliation(s)
- Orhun Vural
  - Department of Electrical and Computer Engineering, The University of Alabama at Birmingham, Birmingham, AL, United States
5. Utgés JS, Barton GJ. Comparative evaluation of methods for the prediction of protein-ligand binding sites. J Cheminform 2024; 16:126. [PMID: 39529176 PMCID: PMC11552181 DOI: 10.1186/s13321-024-00923-z] [Received: 08/02/2024] [Accepted: 10/28/2024] [Indexed: 11/16/2024] Open
Abstract
The accurate identification of protein-ligand binding sites is of critical importance in understanding and modulating protein function. Accordingly, ligand binding site prediction has remained a research focus for over three decades with over 50 methods developed and a change of paradigm from geometry-based to machine learning. In this work, we collate 13 ligand binding site predictors, spanning 30 years, focusing on the latest machine learning-based methods such as VN-EGNN, IF-SitePred, GrASP, PUResNet, and DeepPocket and compare them to the established P2Rank, PRANK and fpocket and earlier methods like PocketFinder, Ligsite and Surfnet. We benchmark the methods against the human subset of our new curated reference dataset, LIGYSIS. LIGYSIS is a comprehensive protein-ligand complex dataset comprising 30,000 proteins with bound ligands which aggregates biologically relevant unique protein-ligand interfaces across biological units of multiple structures from the same protein. LIGYSIS is an improvement for testing methods over earlier datasets like sc-PDB, PDBbind, binding MOAD, COACH420 and HOLO4K which either include 1:1 protein-ligand complexes or consider asymmetric units. Re-scoring of fpocket predictions by PRANK and DeepPocket displays the highest recall (60%) whilst IF-SitePred presents the lowest recall (39%). We demonstrate the detrimental effect that redundant prediction of binding sites has on performance as well as the beneficial impact of stronger pocket scoring schemes, with improvements up to 14% in recall (IF-SitePred) and 30% in precision (Surfnet). Finally, we propose top-N+2 recall as the universal benchmark metric for ligand binding site prediction and urge authors to share not only the source code of their methods, but also of their benchmark.
Scientific contributions: This study conducts the largest benchmark of ligand binding site prediction methods to date, comparing 13 original methods and 15 variants using 10 informative metrics. The LIGYSIS dataset is introduced, which aggregates biologically relevant protein-ligand interfaces across multiple structures of the same protein. The study highlights the detrimental effect of redundant binding site prediction and demonstrates significant improvement in recall and precision through stronger scoring schemes. Finally, top-N+2 recall is proposed as a universal benchmark metric for ligand binding site prediction, with a recommendation for open-source sharing of both methods and benchmarks.
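The abstract names top-N+2 recall without defining it. A minimal sketch under one common reading follows: for a protein with N observed sites, only the N+2 highest-ranked predicted pockets are considered, and an observed site counts as recovered if any of them lies within a distance threshold of it. That reading, the centre-to-centre distance criterion, the 4.0 Å default, and the function name are all assumptions for illustration, not taken from the paper.

```python
from math import dist

def top_n_plus_2_recall(proteins, threshold=4.0):
    """Fraction of observed sites recovered by the top N+2 predictions.

    proteins  : list of (true_sites, ranked_predictions) pairs, where each
                element is a list of 3D centre coordinates and the
                predictions are sorted best first
    threshold : maximum centre-to-centre distance for a hit (assumed value)
    """
    hits = total = 0
    for true_sites, ranked_predictions in proteins:
        # Only the N+2 best-ranked pockets are allowed to score.
        candidates = ranked_predictions[: len(true_sites) + 2]
        for site in true_sites:
            total += 1
            if any(dist(site, pred) <= threshold for pred in candidates):
                hits += 1
    return hits / total if total else 0.0


# Toy usage: one observed site per protein, so three predictions count.
example = [
    # Correct pocket is ranked 3rd: inside the top N+2, counts as a hit.
    ([(0, 0, 0)], [(10, 0, 0), (20, 0, 0), (1, 0, 0), (0, 0, 0)]),
    # Correct pocket is ranked 4th: outside the top N+2, counts as a miss.
    ([(0, 0, 0)], [(10, 0, 0), (20, 0, 0), (30, 0, 0), (0, 0, 0)]),
]
print(top_n_plus_2_recall(example))  # 0.5
```

Capping the number of predictions per protein is what penalises methods that emit many redundant pockets, since extra predictions beyond rank N+2 cannot add hits.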
Affiliation(s)
- Javier S Utgés
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
- Geoffrey J Barton
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
6. Luo Q, Wang S, Li HY, Zheng L, Mu Y, Guo J. Benchmarking reverse docking through AlphaFold2 human proteome. Protein Sci 2024; 33:e5167. [PMID: 39276010 PMCID: PMC11400627 DOI: 10.1002/pro.5167] [Received: 05/24/2024] [Revised: 08/21/2024] [Accepted: 08/24/2024] [Indexed: 09/16/2024]
Abstract
Predicting the binding of ligands to the human proteome via reverse-docking methods enables an understanding of a ligand's interactions with potential protein targets in the human body, thereby facilitating drug repositioning and the evaluation of potential off-target effects or toxic side effects of drugs. In this study, we constructed 11 reverse docking pipelines by integrating site prediction tools (PointSite and SiteMap), docking programs (Glide and AutoDock Vina), and scoring functions (Glide, AutoDock Vina, RTMScore, DeepRMSD, and OnionNet-SFCT), and then thoroughly benchmarked their predictive capabilities. The results show that the Glide_SFCT (PS) pipeline exhibited the best target prediction performance based on the atomic structure models in the AlphaFold2 human proteome. It achieved a success rate of 27.8% when considering the top 100 ranked predictions. This pipeline effectively narrows the range of potential targets within the human proteome, laying a foundation for drug target prediction, off-target assessment, and toxicity prediction, ultimately boosting drug development. By facilitating these critical aspects of drug discovery and development, our work has the potential to accelerate the identification of new therapeutic agents and improve drug safety.
Affiliation(s)
- Qing Luo
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
- Sheng Wang
  - Shanghai Zelixir Biotech Company Ltd., China
- Hoi Yeung Li
  - School of Biological Sciences, Nanyang Technological University, Singapore
- Liangzhen Zheng
  - Shenzhen Zelixir Biotech Company Ltd., China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yuguang Mu
  - School of Biological Sciences, Nanyang Technological University, Singapore
- Jingjing Guo
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
7. Lam HYI, Guan JS, Ong XE, Pincket R, Mu Y. Protein language models are performant in structure-free virtual screening. Brief Bioinform 2024; 25:bbae480. [PMID: 39327890 PMCID: PMC11427677 DOI: 10.1093/bib/bbae480] [Received: 05/24/2024] [Revised: 08/17/2024] [Accepted: 09/12/2024] [Indexed: 09/28/2024] Open
Abstract
Hitherto, virtual screening (VS) has typically been performed using a structure-based drug design paradigm. Such methods typically require the use of molecular docking on high-resolution three-dimensional structures of a target protein, a computationally intensive and time-consuming exercise. This work demonstrates that by employing protein language models and molecular graphs as inputs to a novel graph-to-transformer cross-attention mechanism, a screening power comparable to state-of-the-art structure-based models can be achieved. The implications thereof include highly expedited VS due to the greatly reduced compute required to run this model, and the ability to perform early stages of computer-aided drug design in the complete absence of 3D protein structures.
Affiliation(s)
- Hilbert Yuen In Lam
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
  - MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
- Jia Sheng Guan
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
- Xing Er Ong
  - MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
- Robbe Pincket
  - Heliovision, Asstraat 5, 3000 Leuven, Leuven, Kingdom of Belgium
- Yuguang Mu
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
  - MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
8. Zhao Y, He S, Xing Y, Li M, Cao Y, Wang X, Zhao D, Bo X. A Point Cloud Graph Neural Network for Protein-Ligand Binding Site Prediction. Int J Mol Sci 2024; 25:9280. [PMID: 39273227 PMCID: PMC11394757 DOI: 10.3390/ijms25179280] [Received: 08/03/2024] [Revised: 08/25/2024] [Accepted: 08/26/2024] [Indexed: 09/15/2024] Open
Abstract
Predicting protein-ligand binding sites is an integral part of structural biology and drug design. A comprehensive understanding of these binding sites is essential for advancing drug innovation, elucidating mechanisms of biological function, and exploring the nature of disease. However, accurately identifying protein-ligand binding sites remains a challenging task. To address this, we propose PGpocket, a geometric deep learning-based framework to improve protein-ligand binding site prediction. Initially, the protein surface is converted into a point cloud, and then the geometric and chemical properties of each point are calculated. Subsequently, the point cloud graph is constructed based on the inter-point distances, and the point cloud graph neural network (GNN) is applied to extract and analyze the protein surface information to predict potential binding sites. PGpocket is trained on the scPDB dataset, and its performance is verified on two independent test sets, Coach420 and HOLO4K. The results show that PGpocket achieves a 58% success rate on the Coach420 dataset and a 56% success rate on the HOLO4K dataset. These results surpass competing algorithms, demonstrating PGpocket's advancement and practicality for protein-ligand binding site prediction.
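The graph-construction step mentioned above (connecting surface points according to inter-point distance) can be sketched as a simple radius graph. This is an illustration, not PGpocket's implementation: the abstract does not say whether a distance cutoff or a nearest-neighbour rule is used, so the cutoff rule, the `cutoff` value, and the function name `radius_graph` are assumptions.

```python
from math import dist

def radius_graph(points, cutoff):
    """Build an undirected graph over a point cloud.

    points : list of 3D coordinate tuples (e.g. protein surface points)
    cutoff : two points are connected when their distance is <= cutoff
             (assumed rule and value)

    Returns a list of edges (i, j) with i < j.
    """
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= cutoff:
                edges.append((i, j))
    return edges


# Toy usage: the first two points are neighbours, the third is isolated.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
edges = radius_graph(cloud, cutoff=1.5)
```

The all-pairs loop is quadratic in the number of points; for a real protein surface a spatial index such as a k-d tree would normally replace it, with per-point geometric and chemical features attached as node attributes.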
Affiliation(s)
- Yanpeng Zhao
  - Academy of Military Medical Sciences, Beijing 100850, China
- Song He
  - Academy of Military Medical Sciences, Beijing 100850, China
- Yuting Xing
  - Defense Innovation Institute, Beijing 100071, China
- Mengfan Li
  - Academy of Military Medical Sciences, Beijing 100850, China
- Yang Cao
  - Academy of Military Medical Sciences, Beijing 100850, China
- Xuanze Wang
  - Academy of Military Medical Sciences, Beijing 100850, China
- Dongsheng Zhao
  - Academy of Military Medical Sciences, Beijing 100850, China
- Xiaochen Bo
  - Academy of Military Medical Sciences, Beijing 100850, China
9. Wu J, Wang Y, Cai W, Chen D, Peng X, Dong H, Li J, Liu H, Shi S, Tang S, Li Z, Sui H, Wang Y, Wu C, Zhang Y, Fu X, Yin Y. Ribosomal translation of fluorinated non-canonical amino acids for de novo biologically active fluorinated macrocyclic peptides. Chem Sci 2024:d4sc04061a. [PMID: 39129776 PMCID: PMC11310889 DOI: 10.1039/d4sc04061a] [Received: 06/20/2024] [Accepted: 07/25/2024] [Indexed: 08/13/2024] Open
Abstract
Fluorination has emerged as a promising strategy in medicinal chemistry to improve the pharmacological profiles of drug candidates. Similarly, incorporating fluorinated non-canonical amino acids into macrocyclic peptides expands chemical diversity and enhances their pharmacological properties, from improved metabolic stability to enhanced cell permeability and target interactions. However, only a limited number of fluorinated non-canonical amino acids, which are canonical amino acid analogs, have been incorporated into macrocyclic peptides by ribosomes for de novo construction and target-based screening of fluorinated macrocyclic peptides. In this study, we report the ribosomal translation of a series of distinct fluorinated non-canonical amino acids, including mono- to tri-fluorinated variants, as well as fluorinated L-amino acids, D-amino acids, β-amino acids, etc. This enabled the de novo discovery of fluorinated macrocyclic peptides with high affinity for EphA2, and particularly the identification of those exhibiting broad-spectrum activity against Gram-negative bacteria by targeting the BAM complex. This study not only expands the scope of ribosomally translatable fluorinated amino acids but also underscores the versatility of fluorinated macrocyclic peptides as potent therapeutic agents.
Affiliation(s)
- Junjie Wu
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University Qingdao 266237 China
- Yuchan Wang
  - College of Life Sciences, Fujian Normal University Fuzhou 350117 China
- Wenfeng Cai
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University Qingdao 266237 China
- Danyan Chen
  - College of Life Sciences, Fujian Normal University Fuzhou 350117 China
- Xiangda Peng
  - Shanghai Zelixir Biotech Company Ltd Shanghai 200030 China
- Huilei Dong
  - College of Chemistry and Chemical Engineering, Xiamen University Xiamen 361005 China
- Jinjing Li
  - College of Chemistry and Chemical Engineering, Xiamen University Xiamen 361005 China
- Hongtan Liu
  - College of Chemistry and Chemical Engineering, Xiamen University Xiamen 361005 China
- Shuting Shi
  - College of Life Sciences, Fujian Normal University Fuzhou 350117 China
- Sen Tang
  - College of Life Sciences, Fujian Normal University Fuzhou 350117 China
- Zhifeng Li
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University Qingdao 266237 China
- Haiyan Sui
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University Qingdao 266237 China
- Yan Wang
  - College of Life Sciences, Fujian Normal University Fuzhou 350117 China
- Chuanliu Wu
  - College of Chemistry and Chemical Engineering, Xiamen University Xiamen 361005 China
- Youming Zhang
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University Qingdao 266237 China
- Xinmiao Fu
  - College of Life Sciences, Fujian Normal University Fuzhou 350117 China
- Yizhen Yin
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University Qingdao 266237 China
  - Shandong Research Institute of Industrial Technology Jinan 250101 China
10. Wang X, Xu K, Zeng X, Linghu K, Zhao B, Yu S, Wang K, Yu S, Zhao X, Zeng W, Wang K, Zhou J. Machine learning-assisted substrate binding pocket engineering based on structural information. Brief Bioinform 2024; 25:bbae381. [PMID: 39101501 PMCID: PMC11299021 DOI: 10.1093/bib/bbae381] [Received: 02/26/2024] [Revised: 05/25/2024] [Accepted: 07/23/2024] [Indexed: 08/06/2024] Open
Abstract
Engineering enzyme-substrate binding pockets is the most efficient approach for modifying catalytic activity, but is limited if the substrate binding sites are indistinct. Here, we developed a 3D convolutional neural network, DUnet, for predicting protein-ligand binding sites. The network integrates DenseNet, UNet, and self-attention for extracting features and recovering sample size. We enlarged the dataset by data augmentation, and on the SC6K, COACH420, and BU48 validation datasets the model achieved success rates of 48.4%, 35.5%, and 43.6% at a precision of ≥50%, and 52%, 47.6%, and 58.1% when the distance between the predicted and real centers is ≤4 Å. The substrate binding sites of Klebsiella variicola acid phosphatase (KvAP) and Bacillus anthracis proline 4-hydroxylase (BaP4H) were predicted using DUnet, which showed highly competitive performance: 53.8% and 56% of the predicted binding sites critically affected the catalysis of KvAP and BaP4H, respectively. Virtual saturation mutagenesis was applied based on the predicted binding sites of KvAP, and the 10 top-ranked single mutations that contributed to stronger enzyme-substrate binding varied when the predicted sites differed. The advantage of DUnet for predicting key residues responsible for enzyme activity further improved the success rate of virtual mutagenesis. This study highlights the significance of correctly predicting key binding sites for enzyme engineering.
Affiliation(s)
- Xinglong Wang
  - School of Food Science and Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Kangjie Xu
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Xuan Zeng
  - Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Kai Linghu
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Beichen Zhao
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Shangyang Yu
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Kun Wang
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Shuyao Yu
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Xinyi Zhao
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Weizhu Zeng
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Kai Wang
  - Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Jingwen Zhou
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
11. Zhou R, Fan J, Li S, Zeng W, Chen Y, Zheng X, Chen H, Liao J. LVPocket: integrated 3D global-local information to protein binding pockets prediction with transfer learning of protein structure classification. J Cheminform 2024; 16:79. [PMID: 38972994 PMCID: PMC11229186 DOI: 10.1186/s13321-024-00871-8] [Received: 12/08/2023] [Accepted: 06/12/2024] [Indexed: 07/09/2024] Open
Abstract
BACKGROUND Previous deep learning methods for predicting protein binding pockets mainly employed 3D convolution, yet an abundance of convolution operations may lead the model to excessively prioritize local information, thus overlooking global information. Moreover, it is essential to account for the influence of diverse protein folding structural classes, because proteins in different structural classes exhibit varying biological functions, whereas those within the same structural class share similar functional attributes. RESULTS We proposed LVPocket, a novel method that captures both local and global information of protein structure through the integration of Transformer encoders, which helps the model achieve better performance in binding pocket prediction. We then tailored prediction models for four distinct structural classes of proteins using transfer learning. The four fine-tuned models were trained from the baseline LVPocket model, which was trained on the sc-PDB dataset. LVPocket exhibits superior performance on three independent datasets compared to current state-of-the-art methods. Additionally, the fine-tuned models outperform the baseline model. SCIENTIFIC CONTRIBUTION We present a novel model structure for predicting protein binding pockets that addresses the problem of relying on extensive convolutional computation while neglecting global information about protein structures. Furthermore, we tackle the impact of different protein folding structures on binding pocket prediction tasks through the application of transfer learning methods.
Affiliation(s)
- Ruifeng Zhou
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
| | - Jing Fan
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
| | - Sishu Li
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
| | - Wenjie Zeng
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
| | - Yilun Chen
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
| | - Xiaoshan Zheng
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
| | - Hongyang Chen
- Research Center for Graph Computing, Zhejiang Lab, Hangzhou, 311121, Zhejiang, People's Republic of China.
| | - Jun Liao
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China.
- Zhejiang Lab, Hangzhou, 311121, Zhejiang, People's Republic of China.
12
Mishra S, Rout M, Singh MK, Dehury B, Pati S. Classical molecular dynamics simulation identifies catechingallate as a promising antiviral polyphenol against MPOX palmitoylated surface protein. Comput Biol Chem 2024; 110:108070. [PMID: 38678726 DOI: 10.1016/j.compbiolchem.2024.108070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 04/04/2024] [Accepted: 04/06/2024] [Indexed: 05/01/2024]
Abstract
The cumulative global prevalence of the emergent monkeypox (MPX) infection in non-endemic countries has been declared a global public health predicament. The lack of effective MPX-specific treatments sets the baseline for the current study. This work explores the use of known antiviral polyphenols against MPX viral infection and characterises their mode of interaction with the target F13 protein, which plays a crucial role in the formation of enveloped virions. Herein, we have employed the state-of-the-art machine learning-based AlphaFold2 to predict the three-dimensional structure of F13, followed by molecular docking and all-atom molecular dynamics (MD) simulations to investigate the differential mode of F13-polyphenol interactions. Our extensive computational approach identifies six potent polyphenols, Rutin, Epicatechingallate, Catechingallate, Quercitrin, Isoquercitrin and Hyperoside, exhibiting higher binding affinity towards F13, buried inside a positively charged binding groove. Intermolecular contact analysis of the docked and MD-simulated complexes reveals three important residues, Asp134, Ser137 and Ser321, that are involved in ligand binding through hydrogen bonds. Our findings suggest that ligand binding induces minor conformational changes in F13 that affect the conformation of the binding site. Concomitantly, essential dynamics of the six MD-simulated complexes reveals Catechin gallate, a known antiviral agent, as a promising polyphenol targeting the F13 protein, dominated by a dense network of hydrophobic contacts. However, the biological activities of these polyphenols need to be confirmed through in vitro and in vivo assays, which may pave the way for the development of novel antiviral drugs.
Affiliation(s)
- Sarbani Mishra
- Bioinformatics Division, ICMR-Regional Medical Research Centre, Nalco Square, Chandrasekharpur, Bhubaneswar, Odisha 751023, India
| | - Madhusmita Rout
- Bioinformatics Division, ICMR-Regional Medical Research Centre, Nalco Square, Chandrasekharpur, Bhubaneswar, Odisha 751023, India
| | - Mahender Kumar Singh
- Data Science Laboratory, National Brain Research Centre, Gurgaon, Haryana 122052, India
| | - Budheswar Dehury
- Bioinformatics Division, ICMR-Regional Medical Research Centre, Nalco Square, Chandrasekharpur, Bhubaneswar, Odisha 751023, India; Department of Bioinformatics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Manipal 576104, India.
| | - Sanghamitra Pati
- Bioinformatics Division, ICMR-Regional Medical Research Centre, Nalco Square, Chandrasekharpur, Bhubaneswar, Odisha 751023, India.
13
Ravichandran A, Araque JC, Lawson JW. Predicting the functional state of protein kinases using interpretable graph neural networks from sequence and structural data. Proteins 2024; 92:623-636. [PMID: 38083830 DOI: 10.1002/prot.26641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Revised: 10/13/2023] [Accepted: 11/09/2023] [Indexed: 04/13/2024]
Abstract
Protein kinases are central to cellular activities and are actively pursued as drug targets for several conditions including cancer and autoimmune diseases. Despite the availability of a large structural database for kinases, methodologies to elucidate the structure-function relationship of these proteins (without manual intervention) are lacking. Such techniques are essential in structural biology and to accelerate drug discovery efforts. Here, we implement an interpretable graph neural network (GNN) framework for classifying the functionally active and inactive states of a large set of protein kinases using only their tertiary structure and amino acid sequence. We show that the GNN models can classify kinase structures with high accuracy (>97%). We implement Gradient-weighted Class Activation Mapping for graphs (Graph Grad-CAM) to automatically identify structurally important residues and residue-residue contacts of the kinases without any a priori input. We show that the motifs identified through the Graph Grad-CAM methodology are functionally critical, consistent with the existing kinase literature. Notably, the highly conserved DFG and HRD motifs of the well-known hydrophobic spine are identified by the interpretable framework, in addition to some of the lesser-known motifs. Further, using Grad-CAM maps as the vector embedding of the protein structures, we identify subtle differences in the crystal structures among different sub-classes of kinases in the Protein Data Bank (PDB). Frameworks such as the one implemented here, for high-throughput identification of protein structure-function relationships, are essential in designing targeted small-molecule therapies as well as in engineering new proteins for novel applications.
Affiliation(s)
- Ashwin Ravichandran
- KBR Inc., Intelligent Systems Division, NASA Ames Research Center, Moffett Field, California, USA
| | - Juan C Araque
- KBR Inc., Intelligent Systems Division, NASA Ames Research Center, Moffett Field, California, USA
| | - John W Lawson
- Intelligent Systems Division, NASA Ames Research Center, Moffett Field, California, USA
14
Rout M, Dey S, Mishra S, Panda S, Singh MK, Sinha R, Dehury B, Pati S. Machine learning and classical MD simulation to identify inhibitors against the P37 envelope protein of monkeypox virus. J Biomol Struct Dyn 2024; 42:3935-3948. [PMID: 37221882 DOI: 10.1080/07391102.2023.2216290] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/22/2023] [Accepted: 05/16/2023] [Indexed: 05/25/2023]
Abstract
The monkeypox virus (MPXV) outbreak is a serious public health concern that requires international attention. P37 of MPXV plays a pivotal role in DNA replication and is one of the promising targets for antiviral drug design. In this study, we intend to screen potential analogs of existing FDA-approved drugs for MPXV against P37 using state-of-the-art machine learning and computational biophysical techniques. A P37 structure predicted by AlphaFold2 and optimized by all-atom molecular dynamics simulations is used for molecular docking and binding free energy calculations. Similar to members of the Phospholipase-D family, the predicted P37 structure adopts a β-α-β-α-β sandwich fold, harbouring a strongly conserved HxKxxxxD motif. The binding pocket comprises Tyr48, Lys86, His115, Lys117, Ser130, Asn132, Trp280, Asn240, His325, Lys327 and Tyr346, which form strong hydrogen bonds and dense hydrophobic contacts with the screened analogs, and is surrounded by positively charged patches. Loops connecting the two domains and the C-terminal region exhibit a high degree of flexibility. In some structural ensembles, the partial disorder in the C-terminal region is presumed to be due to its low confidence score, acquired during structure prediction. The transition from loop to β-strands (244-254 aa) in P37-Cidofovir and its analog complexes calls for further investigation. MD simulations support the accuracy of the molecular docking results, indicating the potential of the analogs as potent binders of P37. Taken together, our results provide a better understanding of molecular recognition and dynamics of ligand-bound states of P37, offering opportunities for the development of new antivirals against MPXV. However, in vitro and in vivo assays are still needed to confirm these results. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Madhusmita Rout
- Bioinformatics Division, ICMR-Regional Medical Research Centre, Nalco Square, Bhubaneswar, Odisha, India
| | - Suchanda Dey
- Biomics and Biodiversity Lab, Siksha 'O' Anusandhan (deemed to be) University, Bhubaneswar, Odisha, India
| | - Sarbani Mishra
- Bioinformatics Division, ICMR-Regional Medical Research Centre, Nalco Square, Bhubaneswar, Odisha, India
| | - Sunita Panda
- Mycology Division, ICMR-Regional Medical Research Centre, Nalco Square, Bhubaneswar, Odisha, India
| | - Mahender Kumar Singh
- Data Science Laboratory, National Brain Research Centre, Gurgaon, Haryana, India
| | - Rohan Sinha
- Computer Science, National Institute of Technology Patna, Patna, India
| | - Budheswar Dehury
- Bioinformatics Division, ICMR-Regional Medical Research Centre, Nalco Square, Bhubaneswar, Odisha, India
| | - Sanghamitra Pati
- Bioinformatics Division, ICMR-Regional Medical Research Centre, Nalco Square, Bhubaneswar, Odisha, India
15
Sun R, Zheng P, Chen P, Wu D, Zheng J, Liu X, Hu Y. Enhancing the Catalytic Efficiency of D-lactonohydrolase through the Synergy of Tunnel Engineering, Evolutionary Analysis, and Force-Field Calculations. Chemistry 2024; 30:e202304164. [PMID: 38217521 DOI: 10.1002/chem.202304164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 01/11/2024] [Accepted: 01/12/2024] [Indexed: 01/15/2024]
Abstract
Computational design makes enzyme evolution, and the use of enzymes in biocatalysis, faster and more efficient. In this study, a synergistic approach integrating tunnel engineering, evolutionary analysis, and force-field calculations has been employed to enhance the catalytic activity of D-lactonohydrolase (D-Lac), a pivotal enzyme involved in the resolution of racemic pantolactone during the production of vitamin B5. The best mutant, N96S/A271E/F274Y/F308G (M3), was obtained, and its catalytic efficiency (kcat/KM) was nearly 23-fold higher than that of the wild-type. The M3 whole-cell converted 20 % of DL-pantolactone into D-pantoic acid (D-PA, >99 % e.e.) with a conversion rate of 47 % and a space-time yield of 107.1 g L⁻¹ h⁻¹, demonstrating its great potential for industrial-scale D-pantothenic acid production. Molecular dynamics (MD) simulations revealed that the reduction in steric hindrance within the substrate tunnel and the conformational reconstruction of the distal loop resulted in a more favourable "catalytic" conformation, making it easier for the substrate and enzyme to enter their pre-reaction state. This study illustrates the potential of the distal residue on the pivotal loop at the entrance of the D-Lac substrate tunnel as a novel modification hotspot capable of reshaping energy patterns and consequently influencing the enzymatic activity.
Affiliation(s)
- Ruobin Sun
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi, 214122, P. R. China
| | - Pu Zheng
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi, 214122, P. R. China
| | - Pengcheng Chen
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi, 214122, P. R. China
| | - Dan Wu
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi, 214122, P. R. China
| | - Jiangmei Zheng
- Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi, 214122, P. R. China
| | - Xueyu Liu
- Hangzhou Xinfu Technology Co., Ltd., Hangzhou, 311301, P. R. China
| | - Yunxiang Hu
- Hangzhou Xinfu Technology Co., Ltd., Hangzhou, 311301, P. R. China
16
Shen T, Liu F, Wang Z, Sun J, Bu Y, Meng J, Chen W, Yao K, Mu Y, Li W, Zhao G, Wang S, Wei Y, Zheng L. zPoseScore model for accurate and robust protein-ligand docking pose scoring in CASP15. Proteins 2023; 91:1837-1849. [PMID: 37606194 DOI: 10.1002/prot.26573] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/14/2023] [Revised: 07/20/2023] [Accepted: 07/31/2023] [Indexed: 08/23/2023]
Abstract
We introduce a deep learning-based ligand pose scoring model called zPoseScore for predicting protein-ligand complexes in the 15th Critical Assessment of Protein Structure Prediction (CASP15). Our contributions are threefold: first, we generate six training and evaluation data sets by employing advanced data augmentation and sampling methods. Second, we redesign the "zFormer" module, inspired by AlphaFold2's Evoformer, to efficiently describe protein-ligand interactions. This module enables the extraction of protein-ligand paired features that lead to accurate predictions. Finally, we develop the zPoseScore framework with zFormer for scoring and ranking ligand poses, allowing for atomic-level protein-ligand feature encoding and fusion to output refined ligand poses and ligand per-atom deviations. Our results demonstrate excellent performance on various testing data sets, achieving Pearson's correlation R = 0.783 and 0.659 for ranking docking decoys generated based on experimental and predicted protein structures of CASF-2016 protein-ligand complexes. Additionally, we obtain an averaged local distance difference test (lDDT pli = 0.558) of AIchemy LIG2 in CASP15 for de novo protein-ligand complex structure predictions. Detailed analysis shows that accurate ligand binding site prediction and side-chain orientation are crucial for achieving better prediction performance. Our proposed model is one of the most accurate protein-ligand pose prediction models and could serve as a valuable tool in small molecule drug discovery.
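For readers unfamiliar with the metric quoted in this abstract, the following is a minimal sketch of how a Pearson correlation coefficient between predicted scores and reference values can be computed. The function name `pearson_r` and the toy numbers are illustrative assumptions; they are not taken from the zPoseScore paper or its code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Hypothetical helper for illustration only; not part of zPoseScore.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up example: predicted pose scores vs. reference pose quality values.
predicted = [0.9, 0.7, 0.4, 0.2]
reference = [0.8, 0.75, 0.3, 0.1]
print(round(pearson_r(predicted, reference), 3))
```

A value near 1 means the predicted scores rise and fall linearly with the reference values; a value near 0 means no linear relationship.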
Affiliation(s)
- Tao Shen
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
| | - Fuxu Liu
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
| | - Zechen Wang
- School of Physics, Shandong University, Jinan, Shandong, China
| | - Jinyuan Sun
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Yifan Bu
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
| | - Jintao Meng
- Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
| | - Weihua Chen
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
| | - Keyi Yao
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
| | - Yuguang Mu
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| | - Weifeng Li
- School of Physics, Shandong University, Jinan, Shandong, China
| | - Guoping Zhao
- Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
| | - Sheng Wang
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
| | - Yanjie Wei
- Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
| | - Liangzhen Zheng
- Shanghai Zelixir Biotech Company Ltd., Shanghai, China
- Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
17
Li S, Tian T, Zhang Z, Zou Z, Zhao D, Zeng J. PocketAnchor: Learning structure-based pocket representations for protein-ligand interaction prediction. Cell Syst 2023; 14:692-705.e6. [PMID: 37516103 DOI: 10.1016/j.cels.2023.05.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/07/2022] [Revised: 11/25/2022] [Accepted: 05/19/2023] [Indexed: 07/31/2023]
Abstract
Protein-ligand interactions are essential for cellular activities and drug discovery processes. Appropriately and effectively representing protein features is of vital importance for developing computational approaches, especially data-driven methods, for predicting protein-ligand interactions. However, existing approaches may not fully investigate the features of the ligand-occupying regions in the protein pockets. Here, we design a structure-based protein representation method, named PocketAnchor, for capturing the local environmental and spatial features of protein pockets to facilitate protein-ligand interaction-related learning tasks. We define "anchors" as probe points reaching into the cavities and those located near the surface of proteins, and we design a specific message passing strategy for gathering local information from the atoms and surface neighboring these anchors. Comprehensive evaluation of our method demonstrated its successful applications in pocket detection and binding affinity prediction, which indicated that our anchor-based approach can provide effective protein feature representations for improving the prediction of protein-ligand interactions.
Affiliation(s)
- Shuya Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
| | - Tingzhong Tian
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
| | - Ziting Zhang
- Department of Automation, Tsinghua University, Beijing 100084, China; MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
| | - Ziheng Zou
- Silexon AI Technology, Nanjing, Jiangsu Province 210023, China
| | - Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China.
| | - Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China.
18
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| | - Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
19
Rout M, Mishra S, Dey S, Singh MK, Dehury B, Pati S. Exploiting the potential of natural polyphenols as antivirals against monkeypox envelope protein F13 using machine learning and all-atoms MD simulations. Comput Biol Med 2023; 162:107116. [PMID: 37302336 PMCID: PMC10239311 DOI: 10.1016/j.compbiomed.2023.107116] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/03/2023] [Revised: 05/12/2023] [Accepted: 05/30/2023] [Indexed: 06/13/2023]
Abstract
The re-emergence of monkeypox (MPX) in the era of the COVID-19 pandemic is a new global menace. Although the disease is usually mild, MPX can lead to severe health deterioration. The role of the envelope protein F13 as a critical component in the production of extracellular viral particles makes it a crucial drug target. Polyphenols, which exhibit antiviral properties, have been acclaimed as an effective alternative to traditional treatment methods for the management of viral diseases. To facilitate the development of potent MPX-specific therapeutics, we have employed state-of-the-art machine learning techniques to predict a highly accurate 3-dimensional structure of F13 and to identify binding hotspots on the protein surface. Additionally, we have applied a high-throughput virtual screening methodology to 57 potent natural polyphenols with antiviral activities, followed by all-atom molecular dynamics (MD) simulations, to substantiate the mode of interaction of the F13-polyphenol complexes. The structure-based virtual screening based on Glide SP, XP and MM/GBSA scores enables the selection of six potent polyphenols with higher binding affinity towards F13. Non-bonded contact analysis of the pre- and post-MD complexes points to the critical role of the Glu143, Asp134, Asn345, Ser321 and Tyr320 residues in polyphenol recognition, which is well supported by per-residue decomposition analysis. Close observation of the structural ensembles from MD suggests that the binding groove of F13 is mostly hydrophobic in nature. Taken together, this structure-based analysis provides a lead on Myricetin and Demethoxycurcumin, which may act as potent inhibitors of F13. In conclusion, our study provides new insights into the molecular recognition and dynamics of F13-polyphenol bound states, offering new promise for the development of antivirals to combat monkeypox. However, further in vitro and in vivo experiments are necessary to validate these results.
Affiliation(s)
- Madhusmita Rout
- Bioinformatics Division, ICMR-Regional Medical Research Centre, Nalco Square, Chandrasekharpur, Bhubaneswar, 751023, Odisha, India
| | - Sarbani Mishra
- Bioinformatics Division, ICMR-Regional Medical Research Centre, Nalco Square, Chandrasekharpur, Bhubaneswar, 751023, Odisha, India
| | - Suchanda Dey
- Biomics and Biodiversity Lab, Siksha 'O' Anusandhan (deemed to be) University, Kalinga Nagar, Ghatikia, Bhubaneswar, 751003, Odisha, India
| | - Mahender Kumar Singh
- Data Science Laboratory, National Brain Research Centre, Gurgaon, Haryana, 122052, India
| | - Budheswar Dehury
- Bioinformatics Division, ICMR-Regional Medical Research Centre, Nalco Square, Chandrasekharpur, Bhubaneswar, 751023, Odisha, India.
| | - Sanghamitra Pati
- Bioinformatics Division, ICMR-Regional Medical Research Centre, Nalco Square, Chandrasekharpur, Bhubaneswar, 751023, Odisha, India.
20
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 79] [Impact Index Per Article: 39.5] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, because big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), Generative Adversarial Network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Zailiang Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Ekaterina Merkurjev
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Lu Ke
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Long Chen
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Jian Jiang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yueying Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Jie Liu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Bengong Zhang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
21
Julca I, Mutwil-Anderwald D, Manoj V, Khan Z, Lai SK, Yang LK, Beh IT, Dziekan J, Lim YP, Lim SK, Low YW, Lam YI, Tjia S, Mu Y, Tan QW, Nuc P, Choo LM, Khew G, Shining L, Kam A, Tam JP, Bozdech Z, Schmidt M, Usadel B, Kanagasundaram Y, Alseekh S, Fernie A, Li HY, Mutwil M. Genomic, transcriptomic, and metabolomic analysis of Oldenlandia corymbosa reveals the biosynthesis and mode of action of anti-cancer metabolites. J Integr Plant Biol 2023. [PMID: 36807520 DOI: 10.1111/jipb.13469] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/13/2022] [Accepted: 02/18/2023] [Indexed: 06/18/2023]
Abstract
Plants accumulate a vast array of secondary metabolites, which constitute a natural resource for pharmaceuticals. Oldenlandia corymbosa belongs to the Rubiaceae family, and has been used in traditional medicine to treat different diseases, including cancer. However, the active metabolites of the plant, their biosynthetic pathway and mode of action in cancer are unknown. To fill these gaps, we exposed this plant to eight different stress conditions and combined different omics data capturing gene expression, metabolic profiles, and anti-cancer activity. Our results show that O. corymbosa extracts are active against breast cancer cell lines and that ursolic acid is responsible for this activity. Moreover, we assembled a high-quality genome and uncovered two genes involved in the biosynthesis of ursolic acid. Finally, we also revealed that ursolic acid causes mitotic catastrophe in cancer cells and identified three high-confidence protein binding targets by Cellular Thermal Shift Assay (CETSA) and reverse docking. Altogether, these results constitute a valuable resource to further characterize the biosynthesis of active metabolites in the Oldenlandia group, while the mode of action of ursolic acid will allow us to further develop this valuable compound.
Affiliation(s)
- Irene Julca
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | | | - Vaishnervi Manoj
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - Zahra Khan
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - Soak Kuan Lai
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - Lay K Yang
- Shared Analytics, Singapore Institute of Food and Biotechnology Innovation (SIFBI), Agency for Science, Technology and Research (A*STAR), Singapore, 138671, Singapore
| | - Ing T Beh
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - Jerzy Dziekan
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - Yoon P Lim
- Department of Biochemistry, National University of Singapore, Singapore, 117596, Singapore
| | - Shen K Lim
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
- Department of Biochemistry, National University of Singapore, Singapore, 117596, Singapore
| | - Yee W Low
- Singapore Botanic Gardens, Singapore, 259569, Singapore
| | - Yuen I Lam
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - Seth Tjia
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - Yuguang Mu
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - Qiao W Tan
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - Przemyslaw Nuc
- Department of Gene Expression, Faculty of Biology, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, Poznan, 61-614, Poland
| | - Le M Choo
- Singapore Botanic Gardens, Singapore, 259569, Singapore
| | - Gillian Khew
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
- Singapore Botanic Gardens, Singapore, 259569, Singapore
| | - Loo Shining
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - Antony Kam
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - James P Tam
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - Zbynek Bozdech
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | | | - Bjoern Usadel
- IBG-4 Bioinformatics, Forschungszentrum Jülich, Jülich, 52428, Germany
| | - Yoganathan Kanagasundaram
- Shared Analytics, Singapore Institute of Food and Biotechnology Innovation (SIFBI), Agency for Science, Technology and Research (A*STAR), Singapore, 138671, Singapore
| | - Saleh Alseekh
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, Potsdam-Golm, 14476, Germany
- Center of Plant Systems Biology and Biotechnology, Plovdiv, 4000, Bulgaria
| | - Alisdair Fernie
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, Potsdam-Golm, 14476, Germany
- Center of Plant Systems Biology and Biotechnology, Plovdiv, 4000, Bulgaria
| | - Hoi Y Li
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
| | - Marek Mutwil
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
|
22
|
Li M, Wang Y, Guo C, Wang S, Zheng L, Bu Y, Ding K. The claim of primacy of human gut Bacteroides ovatus in dietary cellobiose degradation. Gut Microbes 2023; 15:2227434. [PMID: 37349961 PMCID: PMC10291918 DOI: 10.1080/19490976.2023.2227434] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/18/2022] [Accepted: 06/14/2023] [Indexed: 06/24/2023] Open
Abstract
The demonstration of a cellulose-degrading bacterium from the human gut changed the view that humans cannot degrade cellulose. However, the investigation of cellulose degradation by the human gut microbiota at the molecular level has not been completed so far. Here, we used cellobiose as a model, which promoted the growth of key human gut members such as Bacteroides ovatus (BO), to clarify the molecular mechanism. Our results showed that a new polysaccharide utilization locus (PUL) from BO was involved in cellobiose capture and degradation. Further, two new cell-surface cellulases, BACOVA_02626GH5 and BACOVA_02630GH5, were found to degrade cellobiose into glucose. The predicted structures of BACOVA_02626GH5 and BACOVA_02630GH5 were highly homologous with cellulases from soil bacteria, and the catalytic residues, two glutamates, were highly conserved. In a murine experiment, we observed that cellobiose reshaped the composition of the gut microbiota and probably modified the metabolic function of the bacteria. Taken together, our findings further highlight the evidence that cellulose can be degraded by human gut microbes and provide new insight into cellulose research.
Affiliation(s)
- Meixia Li
- Glycochemistry and Glycobiology Lab, Key Laboratory of Receptor Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, P. R. China
| | - Yeqing Wang
- Glycochemistry and Glycobiology Lab, Key Laboratory of Receptor Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, P. R. China
| | - Ciliang Guo
- Glycochemistry and Glycobiology Lab, Key Laboratory of Receptor Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, P. R. China
- University of Chinese Academy of Science, Beijing, P. R. China
| | | | | | - Yifan Bu
- Zelixir Biotech, Shanghai, P. R. China
| | - Kan Ding
- Glycochemistry and Glycobiology Lab, Key Laboratory of Receptor Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, P. R. China
- University of Chinese Academy of Science, Beijing, P. R. China
- Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Science, SSIP Healthcare and Medicine Demonstration Zone, Zhongshan, P. R. China
|
23
|
Dehnavi A, Nazem F, Ghasemi F, Fassihi A, Rasti R. A GU-Net-based architecture predicting ligand–protein-binding atoms. J Med Signals Sens 2023. [DOI: 10.4103/jmss.jmss_142_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
|
24
|
Wu M, Zhang Y. Combining bioinformatics, network pharmacology and artificial intelligence to predict the mechanism of celastrol in the treatment of type 2 diabetes. Front Endocrinol (Lausanne) 2022; 13:1030278. [PMID: 36339449 PMCID: PMC9627222 DOI: 10.3389/fendo.2022.1030278] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/28/2022] [Accepted: 10/03/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Type 2 diabetes (T2D) is a common chronic disease with many serious complications. Celastrol can prevent and treat type 2 diabetes by reversing insulin resistance in a number of ways. However, the specific mechanisms by which celastrol prevents and treats T2D are not well understood. The aim of this study was to explore the key gene targets and potential signaling pathway mechanisms of celastrol for the treatment of T2D.
Methods: GSE184050 was downloaded from the Gene Expression Omnibus online database. Blood samples from patients with T2D and healthy individuals were analyzed to identify differentially expressed genes (DEGs), and a protein-protein interaction (PPI) network was constructed. Key gene analysis of DEGs was performed using the MCODE and Hubba plugins in Cytoscape, and intersections were taken to obtain hub genes, which were displayed using a Venn diagram. Enrichment analysis was then performed via the ClueGo plugin in Cytoscape and validated using Gene Set Enrichment Analysis. The targets of celastrol were then analyzed by pharmacophore-based network pharmacology and intersected to identify its therapeutic targets; enrichment analysis was performed on all targets, and the results were intersected to obtain the signaling pathways for celastrol treatment. The protein structures of the therapeutic targets were predicted using the artificial intelligence tool AlphaFold2. Finally, molecular docking was used to verify whether celastrol could be successfully docked to the predicted targets.
Results: A total of 618 DEGs were obtained, and 9 hub genes for T2D were identified by the MCODE and Hubba plugins, including ADAMTS15, ADAMTS7, ADAMTSL1, SEMA5B, ADAMTS8, THBS2, HBB, HBD and HBG2. The DEG-enriched signaling pathways mainly included the ferroptosis and TGF-beta signaling pathways. A total of 228 target genes were annotated by pharmacophore target analysis, and the therapeutic targets were identified, including S100A11, RBP3, HBB, BMP7 and IQUB; 9 therapeutic signaling pathways were obtained by intersection. The protein structures of the therapeutic targets were successfully predicted by AlphaFold2, and the binding of celastrol to them was validated by molecular docking.
Conclusion: Celastrol may prevent and treat T2D through key target genes, such as HBB, as well as signaling pathways, such as the TGF-beta signaling pathway and the type II diabetes mellitus pathway.
Affiliation(s)
- Ming Wu
- Postgraduate Training Base in Shanghai Gongli Hospital, Ningxia Medical University, Shanghai, China
| | - Yan Zhang
- Department of Orthopedics, Gongli Hospital of Pudong New Area, Shanghai, China
|
25
|
Eguida M, Rognan D. Estimating the Similarity between Protein Pockets. Int J Mol Sci 2022; 23:12462. [PMID: 36293316 PMCID: PMC9604425 DOI: 10.3390/ijms232012462] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/06/2022] [Revised: 10/15/2022] [Accepted: 10/16/2022] [Indexed: 10/28/2023] Open
Abstract
With the exponential increase in publicly available protein structures, the comparison of protein binding sites naturally emerged as a scientific topic to explain observations or generate hypotheses for ligand design, notably to predict ligand selectivity for on- and off-targets, explain polypharmacology, and design target-focused libraries. The current review summarizes the state-of-the-art computational methods applied to pocket detection and comparison as well as structural druggability estimates. The major strengths and weaknesses of current pocket descriptors, alignment methods, and similarity search algorithms are presented. Lastly, an exhaustive survey of both retrospective and prospective applications in diverse medicinal chemistry scenarios illustrates the capability of the existing methods and the hurdles that still need to be overcome for more accurate predictions.
Affiliation(s)
| | - Didier Rognan
- Laboratoire d’Innovation Thérapeutique, UMR7200 CNRS-Université de Strasbourg, 67400 Illkirch, France
|