1
Xue B, Xu Y, Huang R, Zhu Q. Novel target identification towards drug repurposing based on biological activity profiles. PLoS One 2025; 20:e0319865. [PMID: 40327632 PMCID: PMC12054903 DOI: 10.1371/journal.pone.0319865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Accepted: 02/09/2025] [Indexed: 05/08/2025] Open
Abstract
Rare diseases affect more than 30 million individuals, most of whom face limited treatment options, which heightens the urgency for innovative therapeutic solutions. Addressing these medical challenges necessitates an exploration of novel treatment modalities. Among these, drug repurposing emerges as a promising avenue, offering both potential and risk mitigation. To this end, we focused on developing predictive models that use computational techniques to uncover latent relationships between gene targets and chemical compounds for drug repurposing. Building upon our previous investigation, in which we identified gene targets for compounds from the Tox21 in vitro assays, we expanded the effort to a systematic prediction of potential targets for drug repurposing, employing machine learning models built on diverse algorithms such as Support Vector Classifier, K-Nearest Neighbors, Random Forest, and Extreme Gradient Boosting. These models were trained on comprehensive biological activity profile data to predict the relationship between 143 gene targets and over 6000 compounds. Our models demonstrated high accuracy (>0.75), and their predictions were further validated using public experimental datasets. Furthermore, several findings were evaluated via case studies. By elucidating these connections, we aim to streamline the drug repurposing process, ultimately catalyzing the discovery of more effective therapeutic interventions for rare diseases.
Affiliation(s)
- Binghan Xue
  - Division of Rare Disease Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States of America
- Yanji Xu
  - Division of Rare Disease Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States of America
- Ruili Huang
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States of America
- Qian Zhu
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States of America
2
Wang L, Ji B, Zhai J, Wang J. Advancing promiscuous aggregating inhibitor analysis with intelligent machine learning classification. Brief Bioinform 2025; 26:bbaf205. [PMID: 40329861 PMCID: PMC12056367 DOI: 10.1093/bib/bbaf205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2025] [Revised: 03/09/2025] [Accepted: 03/28/2025] [Indexed: 05/08/2025] Open
Abstract
Small molecules have been playing a crucial role in drug discovery; however, some exhibit nonspecific inhibitory effects during hit screening due to the formation of colloidal aggregators. Such false positives often lead to significant research costs and time investment. Therefore, to identify potential aggregating compounds efficiently and accurately at an early stage of drug discovery, we employed several machine learning techniques to develop classification models for identifying promiscuous aggregating inhibitors. Using a training dataset of 10 000 aggregators and 10 000 nonaggregators, models were trained by combining four different molecular representations with various machine learning algorithms. We found that the best-performing model is the one that employs path-based FP2 fingerprints in conjunction with the cubic support vector machine algorithm, which achieved the highest accuracy and area under the receiver operating characteristic curve values for both the validation and test datasets while maintaining high sensitivity and specificity levels (>0.93). Additionally, we have proposed a new model interpretation method, global sensitivity analysis (GSA), to complement the well-recognized SHapley Additive exPlanations analysis. Several comparative studies have shown that GSA is a time-efficient and accurate approach for identifying crucial descriptors that contribute to model prediction, especially in the scenario where the dataset contains a substantial number of data entries with a limited set of descriptors. Our models as well as GSA findings can provide useful guidance on screening library design to minimize false positives.
Affiliation(s)
- Luxuan Wang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St., Pittsburgh, PA 15261, United States
- Beihong Ji
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St., Pittsburgh, PA 15261, United States
- Jingchen Zhai
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St., Pittsburgh, PA 15261, United States
- Junmei Wang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St., Pittsburgh, PA 15261, United States
3
Kairys V, Baranauskiene L, Kazlauskiene M, Zubrienė A, Petrauskas V, Matulis D, Kazlauskas E. Recent advances in computational and experimental protein-ligand affinity determination techniques. Expert Opin Drug Discov 2024; 19:649-670. [PMID: 38715415 DOI: 10.1080/17460441.2024.2349169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]
Abstract
INTRODUCTION: Modern drug discovery revolves around designing ligands that target a chosen biomolecule, typically a protein. For this, the evaluation of the affinities of putative ligands is crucial. This has given rise to a multitude of dedicated computational and experimental methods that are constantly being developed and improved. AREAS COVERED: In this review, the authors reassess both the industry mainstays and the newest trends among the methods for protein–small-molecule affinity determination. They discuss both computational affinity predictions and experimental techniques, describing their basic principles, main limitations, and advantages. Together, this serves as an initial guide to the currently most popular and cutting-edge ligand-binding assays employed in rational drug design. EXPERT OPINION: Affinity determination methods continue to develop toward miniaturization, high throughput, and in-cell application. Moreover, the availability of data analysis tools has been constantly increasing. Nevertheless, cross-verification of data using at least two different techniques and careful interpretation of results remain of utmost importance.
Affiliation(s)
- Visvaldas Kairys
  - Department of Bioinformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Lina Baranauskiene
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Asta Zubrienė
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Vytautas Petrauskas
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Daumantas Matulis
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Egidijus Kazlauskas
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
4
Wang L, He X, Ji B, Han F, Niu T, Cai L, Zhai J, Hao D, Wang J. Geometry Optimization Algorithms in Conjunction with the Machine Learning Potential ANI-2x Facilitate the Structure-Based Virtual Screening and Binding Mode Prediction. Biomolecules 2024; 14:648. [PMID: 38927052 PMCID: PMC11201553 DOI: 10.3390/biom14060648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Revised: 05/27/2024] [Accepted: 05/29/2024] [Indexed: 06/28/2024] Open
Abstract
Structure-based virtual screening utilizes molecular docking to explore and analyze ligand-macromolecule interactions, which is crucial for identifying and developing potential drug candidates. Although several widely used docking programs are available, the accurate prediction of binding affinity and binding mode still presents challenges. In this study, we introduce a novel protocol that combines our in-house geometry optimization algorithm, the conjugate gradient with backtracking line search (CG-BS), which is capable of restraining and constraining rotatable torsional angles and other geometric parameters, with a highly accurate machine learning potential, ANI-2x, known for molecular energy predictions that closely resemble those of the ωB97X/6-31G(d) model. By integrating this protocol with binding pose prediction using Glide, we conducted additional structural optimization and potential energy prediction on 11 small molecule-macromolecule and 12 peptide-macromolecule systems. We observed that ANI-2x/CG-BS greatly improved the docking power, not only optimizing binding poses more effectively, particularly when the RMSD of the binding pose predicted by Glide exceeded around 5 Å, but also achieving a 26% higher success rate than Glide docking in identifying native-like binding poses at the top rank. As for the scoring and ranking powers, ANI-2x/CG-BS demonstrated enhanced performance over Glide docking in predicting and ranking hundreds or thousands of ligands. For example, Pearson's and Spearman's correlation coefficients increased markedly from 0.24 and 0.14 with Glide docking to 0.85 and 0.69, respectively, with the addition of ANI-2x/CG-BS for optimizing and ranking small molecules binding to the bacterial ribosomal aminoacyl-tRNA receptor. These results suggest that ANI-2x/CG-BS holds considerable potential for integration into virtual screening pipelines owing to its enhanced docking performance.
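The scoring and ranking powers quoted in this abstract are reported as Pearson's and Spearman's correlation coefficients between predicted and reference values. As a rough illustration of what those two numbers measure, the following is a minimal pure-Python sketch. The function names and the five data points are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical predicted scores and reference binding energies (kcal/mol).
predicted = [-9.1, -7.4, -8.2, -6.0, -5.5]
experimental = [-10.0, -7.0, -8.5, -6.8, -5.0]

print(f"Pearson r    = {pearson(predicted, experimental):.3f}")
print(f"Spearman rho = {spearman(predicted, experimental):.3f}")
```

Pearson's coefficient rewards a linear relationship between scores and reference values (scoring power), while Spearman's depends only on the ordering (ranking power), which is why the two can differ substantially for the same method.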
Affiliation(s)
- Luxuan Wang
  - Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Xibing He
  - Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Beihong Ji
  - Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Fengyang Han
  - Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Taoyu Niu
  - Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Lianjin Cai
  - Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Jingchen Zhai
  - Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Dongxiao Hao
  - School of Electronics and Information Engineering, Ankang University, Ankang 725000, China
- Junmei Wang
  - Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
5
Guo L, Wang J. GSScore: a novel Graphormer-based shell-like scoring method for protein-ligand docking. Brief Bioinform 2024; 25:bbae201. [PMID: 38706316 PMCID: PMC11070652 DOI: 10.1093/bib/bbae201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 02/05/2024] [Accepted: 04/16/2024] [Indexed: 05/07/2024] Open
Abstract
Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery. However, owing to the complexity and high cost of experimental methods, there is a great demand for computational approaches, such as protein-ligand docking, to recognize PLI patterns. In recent years, a growing number of machine learning models have been developed to directly predict the root mean square deviation (RMSD) of a ligand docking pose with reference to its native binding pose. However, new scoring methods are urgently needed for more accurate RMSD prediction. We present a new deep learning-based scoring method for RMSD prediction of protein-ligand docking poses based on a Graphormer method and a Shell-like graph architecture, named GSScore. To recognize near-native conformations from a set of poses, GSScore takes atoms as nodes and then represents the protein-ligand docking interface as multiple bipartite graphs within different shell ranges. Benefiting from the Graphormer and the Shell-like graph architecture, GSScore can effectively capture the subtle differences between energetically favorable near-native conformations and unfavorable non-native poses without extra information. GSScore was extensively evaluated on diverse test sets, including a subset of PDBBind version 2019, CASF2016, and DUD-E, and obtained significant improvements over existing methods in terms of RMSE, Pearson correlation coefficient ($R$), Spearman correlation coefficient, and docking power.
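The quantity such models are trained to predict, the RMSD of a docking pose relative to the native pose, can be computed directly when both sets of coordinates are known. The sketch below is a minimal illustration under stated assumptions: both poses list the same atoms in the same order, coordinates are in ångströms in the receptor frame, and no superposition or symmetry correction is applied. The function name and coordinates are hypothetical and do not come from GSScore.

```python
from math import sqrt


def pose_rmsd(pose, reference):
    """RMSD between two poses given as lists of (x, y, z) tuples.

    Assumes identical atom ordering and a shared coordinate frame;
    no alignment or symmetry handling is attempted.
    """
    if len(pose) != len(reference):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(pose, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return sqrt(squared / len(pose))


# Hypothetical two-atom ligand: the docked pose is the native pose
# translated by 2 Å along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]

print(pose_rmsd(docked, native))
```

A commonly used convention treats poses within about 2 Å RMSD of the native pose as near-native, although the threshold applied in any particular study should be checked against that study.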
Affiliation(s)
- Linyuan Guo
  - School of Computer Science and Engineering, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
6
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how drugs are discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods that use sequence and structure to study protein-ligand interactions are examined. The paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches for representing proteins and ligands. Sequence-based and structure-based classification criteria are then used to categorize and summarize both the classical machine learning models and the deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are discussed. Furthermore, the diverse applications of protein-ligand interaction models in drug research are presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shuyuan Li
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Kong Meng
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shaorui Sun
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
7
Cai L, Han F, Ji B, He X, Wang L, Niu T, Zhai J, Wang J. In Silico Screening of Natural Flavonoids against 3-Chymotrypsin-like Protease of SARS-CoV-2 Using Machine Learning and Molecular Modeling. Molecules 2023; 28:8034. [PMID: 38138524 PMCID: PMC10745665 DOI: 10.3390/molecules28248034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 11/30/2023] [Accepted: 12/07/2023] [Indexed: 12/24/2023] Open
Abstract
The "long-COVID syndrome" has posed significant challenges due to a lack of validated therapeutic options. We developed a novel multi-step virtual screening strategy to reliably identify inhibitors of the 3-chymotrypsin-like protease of SARS-CoV-2 from abundant flavonoids, which represent a promising source of antiviral and immune-boosting nutrients. We identified 57 interacting residues as contributors to the protein-ligand binding pocket. Their energy interaction profiles constituted the input features for machine learning (ML) models. The consensus of 25 classifiers trained using various ML algorithms attained 93.9% accuracy and a 6.4% false-positive rate. The consensus of 10 regression models for binding energy prediction also achieved a low root-mean-square error of 1.18 kcal/mol. We first screened out 120 flavonoid hits and retained 50 drug-like hits after predefined ADMET filtering to ensure bioavailability and safety profiles. Furthermore, molecular dynamics simulations prioritized nine bioactive flavonoids as promising anti-SARS-CoV-2 agents exhibiting both high structural stability (root-mean-square deviation < 5 Å for 218 ns) and low MM/PBSA binding free energy (<-6 kcal/mol). Among them, KB-2 (PubChem-CID, 14630497) and 9-O-Methylglyceofuran (PubChem-CID, 44257401) displayed excellent binding affinity and desirable pharmacokinetic capabilities. These compounds have great potential to serve as oral nutraceuticals with therapeutic and prophylactic properties as care strategies for patients with long-COVID syndrome.
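The abstract reports accuracy and false-positive rate for a consensus of classifiers but does not state how the consensus is formed. One simple possibility, assumed here purely for illustration, is a strict majority vote over binary predictions. The helper names and the toy votes below are hypothetical.

```python
def majority_vote(votes):
    """Combine per-classifier 0/1 predictions into one consensus label per sample.

    `votes` holds one prediction list per classifier. A sample is labeled 1
    only when strictly more than half of the classifiers vote 1, so ties
    (possible with an even number of classifiers) resolve to 0.
    """
    n_models = len(votes)
    return [1 if 2 * sum(column) > n_models else 0 for column in zip(*votes)]


def accuracy_and_fpr(y_true, y_pred):
    """Return (accuracy, false-positive rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    accuracy = (true_pos + true_neg) / len(pairs)
    negatives = false_pos + true_neg
    fpr = false_pos / negatives if negatives else 0.0
    return accuracy, fpr


# Three hypothetical classifiers scoring six compounds (1 = predicted binder).
votes = [
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
]
labels = [1, 0, 1, 0, 0, 0]

consensus = majority_vote(votes)
accuracy, fpr = accuracy_and_fpr(labels, consensus)
print(consensus, accuracy, fpr)
```

Other consensus schemes, such as averaging predicted probabilities or requiring unanimous agreement, would trade accuracy against false-positive rate differently.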
Affiliation(s)
- Junmei Wang
  - School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
8
Ge H, Ji B, Fang J, Wang J, Li J, Wang J. Discovery of Potent and Selective CB2 Agonists Utilizing a Function-Based Computational Screening Protocol. ACS Chem Neurosci 2023; 14:3941-3958. [PMID: 37823773 PMCID: PMC10623575 DOI: 10.1021/acschemneuro.3c00580] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/07/2023] [Accepted: 09/22/2023] [Indexed: 10/13/2023] Open
Abstract
The identification of agonists and antagonists remains a great challenge in computer-aided drug design. In this work, we developed a computational protocol that enables us to design and screen novel chemicals that are likely to serve as selective CB2 agonists. The principle of this protocol is that, by calculating the ligand-residue interaction profile (LRIP) of a ligand binding to a specific target, the agonist or antagonist function of a compound can be determined after statistical analysis and free energy calculations. This computational protocol was successfully applied in CB2 agonist development starting from a lead compound, and a success rate of 70% was achieved. The functions of the synthesized derivatives were determined by in vitro functional assays. Moreover, the identified potent CB2 agonists and antagonists strongly interact with the key residues identified using already known potent CB2 agonists and antagonists. The analysis of the interaction profile of compound 6, a potent agonist, showed strong interactions with F2.61, I186, and F2.64, while compound 39, a potent antagonist, showed strong interactions with L17, W6.48, V6.51, and C7.42. Still, some residues, including V3.32, T3.33, S7.39, F183, W5.43, and I3.29, are hotspots for both CB2 agonists and antagonists. More significantly, we identified three hotspot residues in the loop: I186 for agonists, L17 for antagonists, and F183 for both. These hotspot residues are typically not considered in CB1/CB2 rational ligand design. In conclusion, LRIP is a useful concept for rationally designing a compound to possess a certain function.
Affiliation(s)
- Haixia Ge
  - School of Life Sciences, Huzhou University, Huzhou 313000, China
- Beihong Ji
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Jiahui Fang
  - Chinese Academy of Sciences Key Laboratory of Receptor Research, National Center for Drug Screening, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Jiayang Wang
  - School of Life Sciences, Huzhou University, Huzhou 313000, China
- Jing Li
  - Chinese Academy of Sciences Key Laboratory of Receptor Research, National Center for Drug Screening, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Junmei Wang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
9
Guo L, Qiu T, Wang J. ViTScore: A Novel Three-Dimensional Vision Transformer Method for Accurate Prediction of Protein-Ligand Docking Poses. IEEE Trans Nanobioscience 2023; 22:734-743. [PMID: 37159314 DOI: 10.1109/tnb.2023.3274640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery, and due to the complexity and high cost of experimental methods, there is a great demand for computational approaches, such as protein-ligand docking, to decipher PLI patterns. One of the most challenging aspects of protein-ligand docking is identifying near-native conformations from a set of poses, but traditional scoring functions still have limited accuracy. Therefore, new scoring methods are urgently needed for both methodological and practical reasons. We present a novel deep learning-based scoring function for ranking protein-ligand docking poses based on Vision Transformer (ViT), named ViTScore. To recognize near-native poses from a set of poses, ViTScore voxelizes the protein-ligand interaction pocket into a 3D grid labeled by the occupancy contribution of atoms in different physicochemical classes. This allows ViTScore to capture the subtle differences between spatially and energetically favorable near-native poses and unfavorable non-native poses without needing extra information. ViTScore then outputs a prediction of the root mean square deviation (RMSD) of a docking pose with reference to the native binding pose. ViTScore was extensively evaluated on diverse test sets, including PDBbind2019 and CASF2016, and obtained significant improvements over existing methods in terms of RMSE, R, and docking power. The results demonstrate that ViTScore is a promising scoring function for protein-ligand docking that can accurately identify near-native poses from a set of poses. Additionally, ViTScore can be used to identify potential drug targets and to design new drugs with improved efficacy and safety.
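The voxelization step mentioned in this abstract, mapping pocket atoms onto a 3D grid with one channel per physicochemical class, can be illustrated with a deliberately simplified sketch. It assumes a cubic box, hard per-voxel atom counts instead of the smooth occupancy contribution the abstract refers to, and two made-up atom classes. None of the names, box sizes, or classes below are taken from ViTScore.

```python
def voxelize(atoms, center, box=8.0, resolution=2.0,
             classes=("hydrophobic", "polar")):
    """Count atoms per voxel, with one 3D grid (channel) per atom class.

    `atoms` is a list of (x, y, z, atom_class) tuples in ångströms.
    The cubic box of edge `box` is centered on `center`; atoms outside
    the box or with an unknown class are ignored.
    """
    n = int(box / resolution)  # voxels per edge
    grid = {
        c: [[[0.0] * n for _ in range(n)] for _ in range(n)]
        for c in classes
    }
    origin = [c - box / 2 for c in center]
    for x, y, z, atom_class in atoms:
        index = [int((v - o) // resolution) for v, o in zip((x, y, z), origin)]
        if atom_class in grid and all(0 <= i < n for i in index):
            i, j, k = index
            grid[atom_class][i][j][k] += 1.0
    return grid


# Hypothetical pocket atoms; the third lies outside the 8 Å box.
atoms = [
    (0.5, 0.5, 0.5, "polar"),
    (-3.5, -3.5, -3.5, "hydrophobic"),
    (10.0, 0.0, 0.0, "polar"),
]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
print(grid["polar"][2][2][2], grid["hydrophobic"][0][0][0])
```

A grid like this can then be split into patches and passed to a transformer, in the same way a 2D image is split into patches for a standard Vision Transformer.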
10
Zhang X, Shen C, Wang T, Deng Y, Kang Y, Li D, Hou T, Pan P. ML-PLIC: a web platform for characterizing protein-ligand interactions and developing machine learning-based scoring functions. Brief Bioinform 2023; 24:bbad295. [PMID: 37738401 DOI: 10.1093/bib/bbad295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 07/17/2023] [Accepted: 07/31/2023] [Indexed: 09/24/2023] Open
Abstract
Cracking the entangling code of protein-ligand interaction (PLI) is of great importance to structure-based drug design and discovery. Different physical and biochemical representations can be used to describe PLI such as energy terms and interaction fingerprints, which can be analyzed by machine learning (ML) algorithms to create ML-based scoring functions (MLSFs). Here, we propose the ML-based PLI capturer (ML-PLIC), a web platform that automatically characterizes PLI and generates MLSFs to identify the potential binders of a specific protein target through virtual screening (VS). ML-PLIC comprises five modules, including Docking for ligand docking, Descriptors for PLI generation, Modeling for MLSF training, Screening for VS and Pipeline for the integration of the aforementioned functions. We validated the MLSFs constructed by ML-PLIC in three benchmark datasets (Directory of Useful Decoys-Enhanced, Active as Decoys and TocoDecoy), demonstrating accuracy outperforming traditional docking tools and competitive performance to the deep learning-based SF, and provided a case study of the Serine/threonine-protein kinase WEE1 in which MLSFs were developed by using the ML-based VS pipeline in ML-PLIC. Underpinning the latest version of ML-PLIC is a powerful platform that incorporates physical and biological knowledge about PLI, leveraging PLI characterization and MLSF generation into the design of structure-based VS pipeline. The ML-PLIC web platform is now freely available at http://cadd.zju.edu.cn/plic/.
Affiliation(s)
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Tianyue Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yafeng Deng
  - Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dan Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
11
Kim Y, Yoon T, Park WB, Na S. Predicting mechanical properties of silk from its amino acid sequences via machine learning. J Mech Behav Biomed Mater 2023; 140:105739. [PMID: 36871478 DOI: 10.1016/j.jmbbm.2023.105739] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/20/2022] [Revised: 02/12/2023] [Accepted: 02/21/2023] [Indexed: 02/25/2023]
Abstract
Silk fiber is increasingly sought after for its superior mechanical properties, biocompatibility, and eco-friendliness, making it promising as a base material for various applications. One characteristic of protein fibers such as silk is that their mechanical properties depend significantly on the amino acid sequence. Numerous studies have sought to determine the specific relationship between the amino acid sequence of silk and its mechanical properties, yet this relationship remains to be clarified. Other fields have adopted machine learning (ML) to establish a relationship between inputs, such as the ratio of different material compositions, and the resulting mechanical properties. We propose a method to convert the amino acid sequence into numerical values for input and succeeded in predicting the mechanical properties of silk from its amino acid sequences. Our study sheds light on predicting the mechanical properties of silk fiber from the respective amino acid sequences.
12
Structure-based virtual screening of chemical libraries as potential MELK inhibitors and their therapeutic evaluation against breast cancer. Chem Biol Interact 2023; 376:110443. [PMID: 36893906 DOI: 10.1016/j.cbi.2023.110443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 02/15/2023] [Accepted: 03/04/2023] [Indexed: 03/09/2023]
Abstract
New targeted therapy for triple-negative breast cancer (TNBC) is urgently needed, as advanced disease responds poorly to conventional chemotherapy. Genomic and proteomic studies are currently investigating new genes and proteins as promising therapeutic targets. One such therapeutic target is a cell cycle regulatory kinase, Maternal Embryonic Leucine Zipper Kinase (MELK), which is overexpressed in TNBC and correlated with cancer development. We performed molecular docking for virtual screening of chemical libraries (phytochemicals and synthetic drugs) against the MELK protein structure and identified 8 phytoconstituents (isoxanthorin, emodin, gamma-coniceine, quercetin, tenuazonic acid, isoliquiritigenin, kaempferol, and nobiletin) and 8 synthetic drugs (tetrahydrofolic acid, alfuzosin, lansoprazole, ketorolac, ketoprofen, variolin B, orantinib, and firestein) as potential hits interacting with the active site residues of MELK, based on bound poses, hydrogen bonds, hydrophobic interactions, and MM/GBSA binding free energies. ADME and drug-likeness prediction further identified a few hits with high drug-likeness properties, which were then tested for anti-tumorigenic potential. Two phytochemicals, isoliquiritigenin and emodin, demonstrated growth inhibitory effects on TNBC MDA-MB-231 cells, while a much lower effect was observed on non-tumorigenic MCF-10A mammary epithelial cells. Treatment with both molecules downregulated MELK expression, induced cell cycle arrest, caused accumulation of DNA damage, and enhanced apoptosis. The study identified isoliquiritigenin and emodin as potential MELK inhibitors and provides a basis for subsequent experimental validation and drug development against cancer.
|
13
|
Wang Z, Zheng L, Wang S, Lin M, Wang Z, Kong AWK, Mu Y, Wei Y, Li W. A fully differentiable ligand pose optimization framework guided by deep learning and a traditional scoring function. Brief Bioinform 2023; 24:6887112. [PMID: 36502369 DOI: 10.1093/bib/bbac520] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 08/09/2022] [Revised: 10/17/2022] [Accepted: 10/31/2022] [Indexed: 12/14/2022] Open
Abstract
Recently reported machine learning- and deep learning-based scoring functions (SFs) have shown exciting performance in predicting protein-ligand binding affinities, with promising application prospects. However, distinguishing between highly similar ligand conformations, including the native binding pose (the global energy minimum state), remains challenging, even though doing so could greatly enhance docking. In this work, we propose a fully differentiable, end-to-end framework for ligand pose optimization based on a hybrid SF called DeepRMSD+Vina, which combines a multi-layer perceptron (DeepRMSD) with the traditional AutoDock Vina SF. DeepRMSD+Vina, which combines (1) the root mean square deviation (RMSD) of the docking pose with respect to the native pose and (2) the AutoDock Vina score, is fully differentiable and is thus capable of optimizing the ligand binding pose toward the lowest-energy conformation. Evaluated on the CASF-2016 docking power dataset, DeepRMSD+Vina reaches a success rate of 94.4%, which outperforms most SFs reported to date. We evaluated the ligand conformation optimization framework in practical molecular docking scenarios (redocking and cross-docking tasks), revealing the high potential of this framework in drug design and discovery. Structural analysis shows that this framework can identify key physical interactions in protein-ligand binding, such as hydrogen bonding. Our work provides a paradigm for optimizing ligand conformations based on deep learning algorithms. The DeepRMSD+Vina model and the optimization framework are available in the GitHub repository https://github.com/zchwang/DeepRMSD-Vina_Optimization.
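A docking-power success rate of the kind quoted in this abstract can be illustrated with a minimal sketch. It assumes the common convention that a complex counts as a success when the top-ranked pose (lowest score) lies within 2 Å RMSD of the native pose; the cutoff, the `(score, rmsd)` tuple layout, and the function name are illustrative assumptions, not details taken from the abstract or from the authors' repository.

```python
def docking_success_rate(complexes, cutoff=2.0):
    """Fraction of complexes whose best-scored pose is near-native.

    complexes: list of pose lists; each pose is a (score, rmsd_to_native)
        tuple, where a lower score means a better-ranked pose (assumption).
    cutoff: RMSD threshold in angstroms (2.0 is an assumed convention).
    """
    if not complexes:
        raise ValueError("need at least one complex")
    hits = 0
    for poses in complexes:
        # Take the pose that the scoring function ranks first.
        _, top_rmsd = min(poses, key=lambda pose: pose[0])
        if top_rmsd <= cutoff:
            hits += 1
    return hits / len(complexes)


# Hypothetical toy data: two complexes with two poses each.
toy = [
    [(-9.1, 0.8), (-7.0, 5.2)],  # top-ranked pose is near-native
    [(-6.5, 4.1), (-6.0, 1.0)],  # top-ranked pose is far from native
]
rate = docking_success_rate(toy)
```

In this toy case only the first complex counts as a success, so `rate` is 0.5.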
Affiliation(s)
- Zechen Wang
  - School of Physics, Shandong University, Jinan, Shandong 250100, China
- Liangzhen Zheng
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
  - Shanghai Zelixir Biotech Company Ltd., Shanghai 200030, China
- Sheng Wang
  - Shanghai Zelixir Biotech Company Ltd., Shanghai 200030, China
- Mingzhi Lin
  - Shanghai Zelixir Biotech Company Ltd., Shanghai 200030, China
- Zhihao Wang
  - School of Physics, Shandong University, Jinan, Shandong 250100, China
- Adams Wai-Kin Kong
  - Rolls-Royce Corporate Lab, Nanyang Technological University, Singapore 637551, Singapore
- Yuguang Mu
  - School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
- Yanjie Wei
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
- Weifeng Li
  - School of Physics, Shandong University, Jinan, Shandong 250100, China
|
14
|
Qu X, Dong L, Zhang J, Si Y, Wang B. Systematic Improvement of the Performance of Machine Learning Scoring Functions by Incorporating Features of Protein-Bound Water Molecules. J Chem Inf Model 2022; 62:4369-4379. [PMID: 36083808 DOI: 10.1021/acs.jcim.2c00916] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/30/2022]
Abstract
Water molecules at the ligand-protein interfaces play crucial roles in the binding of the ligands, but the behavior of protein-bound water is largely ignored in many currently used machine learning (ML)-based scoring functions (SFs). In an attempt to improve the prediction performance of existing ML-based SFs, we estimated the water distribution with a HydraMap (HM) method and then incorporated the features extracted from protein-bound waters obtained in this way into three ML-based SFs: RF-Score, ECIF, and PLEC. It was found that a combination of HM-based features can consistently improve the performance of all three SFs, including their scoring, ranking, and docking power. HydraMap-based features show consistently good performance with both crystal structures and docked structures, demonstrating their robustness for SFs. Overall, HM-based features, which are a statistical representation of hydration sites at protein-ligand interfaces, are expected to improve the prediction performance for diverse SFs.
Affiliation(s)
- Xiaoyang Qu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
- Lina Dong
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
- Jinyan Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
- Yubing Si
  - College of Chemistry, Zhengzhou University, Zhengzhou 450001, P. R. China
- Binju Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
|
15
|
Small dataset solves big problem: An outlier-insensitive binary classifier for inhibitory potency prediction. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109242] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/16/2022]
|
16
|
Zhai J, He X, Sun Y, Wan Z, Ji B, Liu S, Li S, Wang J. In silico binding affinity prediction for metabotropic glutamate receptors using both endpoint free energy methods and a machine learning-based scoring function. Phys Chem Chem Phys 2022; 24:18291-18305. [PMID: 35880533 PMCID: PMC9460939 DOI: 10.1039/d2cp01727j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Metabotropic glutamate receptors (mGluRs) play an important role in regulating glutamate signaling pathways, which are involved in neuropathy and peripheral homeostasis. mGluR4, which belongs to the Group III mGluRs, is the most widely distributed mGluR in the periphery. Regulation of this receptor has been shown to be involved in diabetes, colorectal carcinoma, and many other diseases. However, the application of structure-based drug design to identify small molecules that regulate mGluR4 is limited by the absence of a resolved mGluR4 protein structure. In this work, we first built a homology model of mGluR4 based on a crystal structure of mGluR8, and then conducted hierarchical virtual screening (HVS) to identify possible active ligands for mGluR4. The HVS protocol consists of three hierarchical filters: Glide docking, molecular dynamics (MD) simulation, and binding free energy calculation. We successfully prioritized active ligands of mGluR4 from a set of screening compounds using HVS. The active ligands predicted from binding affinities cover almost all the experimentally determined active ligands, with only one ligand missed. The correlation between the measured and predicted binding affinities is significantly improved for the MM-PB/GBSA-WSAS methods compared to the Glide docking method. More importantly, we identified hotspots for ligand binding and found that SER157 and GLY158 tend to contribute to the selectivity of mGluR4 ligands, while ALA154 and ALA155 could account for ligand selectivity toward mGluR8. We also identified five other key residues that are critical for ligand potency. The differences between the binding profiles of mGluR4 and mGluR8 can guide the development of more potent and selective modulators. Moreover, we evaluated the performance of IPSF, a novel type of scoring function trained by a machine learning algorithm on residue-ligand interaction profiles, in guiding drug lead optimization. The cross-validation root-mean-square errors (RMSEs) are much smaller than those of the endpoint methods, and the correlation coefficients are comparable to those of the best endpoint methods for both mGluRs. Thus, the machine learning-based IPSF can be applied to guide lead optimization even when the total number of actives and inactives is small, a typical scenario in drug discovery projects.
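The two evaluation metrics named in this abstract, RMSE and the correlation coefficient between measured and predicted affinities, can be written out as a minimal sketch. The function names and the toy numbers are illustrative assumptions; the cross-validation splitting itself is outside the scope of this example, and Pearson's r is assumed as the correlation measure.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between two equal-length sequences."""
    n = len(measured)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / n)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical binding free energies (kcal/mol), for illustration only.
measured = [-8.0, -7.0, -6.0]
predicted = [-8.0, -7.0, -4.0]
error = rmse(measured, predicted)
corr = pearson_r(measured, predicted)
```

A low RMSE and a high correlation measure different things: the first penalizes absolute deviation, the second only rewards a consistent linear trend, which is why the abstract reports both.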
Affiliation(s)
- Jingchen Zhai
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Xibing He
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Yuchen Sun
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Zhuoya Wan
  - Department of Pharmaceutical Sciences and Center for Pharmacogenetics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Beihong Ji
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Shuhan Liu
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Song Li
  - Department of Pharmaceutical Sciences and Center for Pharmacogenetics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Junmei Wang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
|
17
|
Yang ZY, Ye ZF, Xiao YJ, Hsieh CY, Zhang SY. SPLDExtraTrees: robust machine learning approach for predicting kinase inhibitor resistance. Brief Bioinform 2022; 23:6543900. [PMID: 35262669 DOI: 10.1093/bib/bbac050] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/18/2021] [Revised: 01/17/2022] [Accepted: 01/31/2022] [Indexed: 12/25/2022] Open
Abstract
Drug resistance is a major threat to global health and a significant concern throughout clinical treatment and drug development. Mutations in proteins involved in drug binding are a common cause of adaptive drug resistance. Quantitative estimates of how mutations affect the interaction between a drug and its target protein are therefore of vital significance for drug development and clinical practice. Computational methods that rely on molecular dynamics simulations, Rosetta protocols, and machine learning have been shown to be capable of predicting ligand affinity changes upon protein mutation. However, severely limited sample sizes and heavy noise induce overfitting and generalization issues, which have impeded the wide adoption of machine learning for studying drug resistance. In this paper, we propose a robust machine learning method, termed SPLDExtraTrees, which can accurately predict ligand binding affinity changes upon protein mutation and identify resistance-causing mutations. Specifically, the proposed method ranks training data following a scheme that starts with easy-to-learn samples and gradually incorporates harder and more diverse samples into the training, and then iterates between sample weight recalculations and model updates. In addition, we calculate additional physics-based structural features to provide the machine learning model with valuable domain knowledge on proteins for these data-limited predictive tasks. The experiments substantiate the capability of the proposed method for predicting kinase inhibitor resistance under three scenarios, and it achieves predictive accuracy comparable to that of molecular dynamics and Rosetta methods at a much lower computational cost.
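The alternation between sample weight recalculation and model updates described in this abstract can be sketched in a few lines. The example below is a generic hard-threshold self-paced learning loop on a toy one-parameter linear model. It is not the authors' SPLDExtraTrees: there are no ExtraTrees and no diversity term, and the threshold schedule, function names, and data are all assumptions made for illustration.

```python
def fit_slope(xs, ys, weights):
    """Weighted least-squares slope for the model y = a * x (no intercept)."""
    num = sum(w * x * y for x, y, w in zip(xs, ys, weights))
    den = sum(w * x * x for x, w in zip(xs, weights))
    return num / den if den else 0.0


def self_paced_fit(xs, ys, threshold, growth=2.0, rounds=5):
    """Alternate between selecting easy samples and refitting the model.

    threshold: largest squared error a sample may have to be used in the
        current round (assumed schedule: multiplied by `growth` each round,
        so harder samples are admitted gradually).
    """
    weights = [1.0] * len(xs)
    slope = fit_slope(xs, ys, weights)  # initial fit on all samples
    for _ in range(rounds):
        losses = [(y - slope * x) ** 2 for x, y in zip(xs, ys)]
        # Weight update: keep only samples the current model finds easy.
        weights = [1.0 if loss <= threshold else 0.0 for loss in losses]
        if any(weights):
            # Model update: refit on the currently selected samples.
            slope = fit_slope(xs, ys, weights)
        threshold *= growth
    return slope, weights


# Toy data following y = 2x, with one noisy outlier as the last point.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 40.0]
slope, weights = self_paced_fit(xs, ys, threshold=20.0)
```

With these toy inputs the loop first trusts only the easiest sample, then admits the other two clean points, and the outlier stays excluded for all five rounds, so the fitted slope returns to 2. Running more rounds would eventually raise the threshold enough to admit the outlier, which is why the schedule matters in practice.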
Affiliation(s)
- Zi-Yi Yang
  - Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China
- Zhao-Feng Ye
  - Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China
- Yi-Jia Xiao
  - Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China
  - Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
- Chang-Yu Hsieh
  - Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China
- Sheng-Yu Zhang
  - Tencent Quantum Laboratory, Shenzhen, 518057, Guangdong, China
|
18
|
Wang DD, Chan MT, Yan H. Structure-based protein-ligand interaction fingerprints for binding affinity prediction. Comput Struct Biotechnol J 2021; 19:6291-6300. [PMID: 34900139 PMCID: PMC8637032 DOI: 10.1016/j.csbj.2021.11.018] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/30/2021] [Revised: 11/09/2021] [Accepted: 11/13/2021] [Indexed: 11/17/2022] Open
Abstract
Binding affinity prediction (BAP) using protein–ligand complex structures is crucial to computer-aided drug design, but remains a challenging problem. To achieve efficient and accurate BAP, machine-learning scoring functions (SFs) based on a wide range of descriptors have been developed. Among those descriptors, protein–ligand interaction fingerprints (IFPs) are competitive because of their simple representations, detailed profiles of key interactions, and easy integration with machine-learning algorithms. In this paper, we adopt a building-block-based taxonomy to review a broad range of IFP models, and compare representative IFP-based SFs in target-specific and generic scoring tasks. Atom-pair-counts-based and substructure-based IFPs show great potential in these tasks.
Affiliation(s)
- Debby D Wang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, 516 Jungong Rd, Shanghai 200093, China
- Moon-Tong Chan
  - School of Science and Technology, Hong Kong Metropolitan University, 30 Good Shepherd St, Ho Man Tin, Hong Kong
- Hong Yan
  - Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
|
19
|
Xiong G, Shen C, Yang Z, Jiang D, Liu S, Lu A, Chen X, Hou T, Cao D. Featurization strategies for protein–ligand interactions and their applications in scoring function development. WIREs Computational Molecular Science 2021. [DOI: 10.1002/wcms.1567] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022]
Affiliation(s)
- Guoli Xiong
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
- Chao Shen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Ziyi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
- Dejun Jiang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Shao Liu
  - Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, China
- Aiping Lu
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
- Xiang Chen
  - Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, China
- Tingjun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
|
20
|
Bao J, He X, Zhang JZH. DeepBSP-a Machine Learning Method for Accurate Prediction of Protein-Ligand Docking Structures. J Chem Inf Model 2021; 61:2231-2240. [PMID: 33979150 DOI: 10.1021/acs.jcim.1c00334] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Indexed: 12/16/2022]
Abstract
In recent years, machine-learning-based scoring functions have significantly improved scoring power. However, many of these methods do not perform well in distinguishing the native structure from docked decoy poses because their training data lack decoy structural information. Here, we developed a machine-learning model, named DeepBSP, that directly predicts the root mean square deviation (RMSD) of a ligand docking pose with reference to its native binding pose. Unlike the binding affinity, the RMSD between a docking pose and its native structure can be determined straightforwardly. Trained on a generated data set with 11,925 native complexes and more than 165,000 docked poses, our model shows excellent docking power on our test set and on the CASF-2016 docking decoy set compared to other major scoring functions. Thus, by combining molecular docking runs that generate many poses with the application of DeepBSP, one can more accurately predict the binding pose that is closest to the native complex structure. The DeepBSP model should be very useful for picking out near-native poses from the many poses generated by a docking application.
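The quantity this abstract treats as the training label, the RMSD of a docked pose with respect to the native pose, and the pose-selection step that follows from it can be sketched as below. This is a minimal illustration under stated assumptions: atoms are matched by index, no superposition or symmetry correction is applied, and the predicted values passed to `pick_pose` stand in for the output of a model such as DeepBSP (the function names are hypothetical).

```python
import math


def rmsd(pose, native):
    """RMSD between two poses given as equal-length lists of (x, y, z).

    Atoms are compared in place and matched by index (assumption: both
    poses list the same atoms in the same order, in the receptor frame).
    """
    if not pose or len(pose) != len(native):
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_p, atom_n in zip(pose, native)
        for a, b in zip(atom_p, atom_n)
    )
    return math.sqrt(squared / len(pose))


def pick_pose(predicted_rmsds):
    """Index of the pose with the smallest predicted RMSD."""
    if not predicted_rmsds:
        raise ValueError("need at least one pose")
    return min(range(len(predicted_rmsds)), key=lambda i: predicted_rmsds[i])


# Toy two-atom ligand: the docked pose is shifted by 1 angstrom along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
label = rmsd(docked, native)

# Hypothetical model outputs for three candidate poses.
best = pick_pose([3.2, 0.9, 1.7])
```

Here `label` is 1.0 because every atom is displaced by exactly 1 Å, and `best` is 1, the index of the candidate with the lowest predicted RMSD. Computing the label needs only coordinates, which is the point the abstract makes when contrasting RMSD with experimentally measured binding affinity.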
Affiliation(s)
- Jingxiao Bao
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- Xiao He
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
- John Z H Zhang
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China
|