1
|
Poudel P, Miteva MA, Alexov E. Strategies for in Silico Drug Discovery to Modulate Macromolecular Interactions Altered by Mutations. FRONT BIOSCI-LANDMRK 2025; 30:26339. [PMID: 40302318 DOI: 10.31083/fbl26339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Revised: 09/22/2024] [Accepted: 10/09/2024] [Indexed: 05/02/2025]
Abstract
Most human diseases have genetic components, frequently single nucleotide variants (SNVs), which alter the wild-type characteristics of macromolecules and their interactions. A straightforward approach for correcting such SNV-related alterations is to seek small molecules, potential drugs, that can eliminate disease-causing effects. Certain disorders are caused by altered protein-protein interactions, for example, Snyder-Robinson syndrome, the therapy for which focuses on the development of small molecules that restore the wild-type homodimerization of spermine synthase. Other disorders originate from altered protein-nucleic acid interactions, as in the case of cancer; in these cases, the elimination of disease-causing effects requires small molecules that eliminate the effect of the mutation and restore wild-type p53-DNA affinity. Overall, especially for complex diseases, pathogenic mutations frequently alter macromolecular interactions. This effect can be direct, i.e., the alteration of wild-type affinity and specificity, or indirect via alterations in the concentration of the binding partners. Here, we outline progress made in methods and strategies to computationally identify small molecules capable of altering macromolecular interactions in a desired manner, reducing or increasing the binding affinity, and eliminating the disease-causing effect. When applicable, we provide examples of the outlined general strategy. Successful cases are presented at the end of the work.
Affiliation(s)
- Pitambar Poudel
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Maria A Miteva
  - Université Paris Cité, CNRS UMR 8038 CiTCoM, Inserm, U1268 MCTR Paris, France
- Emil Alexov
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
|
2
|
Zhang X, Zhang M, Li Y, Deng P. Identification of Potential Selective PAK4 Inhibitors Through Shape and Protein Conformation Ensemble Screening and Electrostatic-Surface-Matching Optimization. Curr Issues Mol Biol 2025; 47:29. [PMID: 39852144 PMCID: PMC11764389 DOI: 10.3390/cimb47010029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2024] [Revised: 12/31/2024] [Accepted: 01/03/2025] [Indexed: 01/26/2025] Open
Abstract
P21-activated kinase 4 (PAK4) plays a crucial role in the proliferation and metastasis of various cancers. However, developing selective PAK4 inhibitors remains challenging due to the high homology within the PAK family. Therefore, developing highly selective PAK4 inhibitors is critical to overcoming the limitations of existing inhibitors. We analyzed the structural differences in the binding pockets of PAK1 and PAK4 by combining cross-docking and molecular dynamics simulations to identify key binding regions and unique structural features of PAK4. We then performed screening using shape and protein conformation ensembles, followed by a re-evaluation of the docking results with deep-learning-driven GNINA to identify the candidate molecule, STOCK7S-56165. Based on this, we applied a fragment-replacement strategy under electrostatic-surface-matching conditions to obtain Compd 26. This optimization significantly improved electrostatic interactions and reduced binding energy, highlighting its potential for selectivity. Our findings provide a novel approach for developing selective PAK4 inhibitors and lay the theoretical foundation for future anticancer drug design.
Affiliation(s)
- Xiaoxuan Zhang
  - College of Pharmacy, Chongqing Medical University, Chongqing 400016, China
  - Chongqing Research Center for Pharmaceutical Engineering, Chongqing 400016, China
- Meile Zhang
  - College of Pharmacy, Chongqing Medical University, Chongqing 400016, China
  - Chongqing Research Center for Pharmaceutical Engineering, Chongqing 400016, China
- Yihao Li
  - College of Pharmacy, Chongqing Medical University, Chongqing 400016, China
  - Chongqing Research Center for Pharmaceutical Engineering, Chongqing 400016, China
- Ping Deng
  - College of Pharmacy, Chongqing Medical University, Chongqing 400016, China
  - Chongqing Research Center for Pharmaceutical Engineering, Chongqing 400016, China
  - Chongqing Key Research Laboratory for Quality Evaluation and Safety Research of APIs, Chongqing 400016, China
|
3
|
Gómez-Sacristán P, Simeon S, Tran-Nguyen VK, Patil S, Ballester PJ. Inactive-enriched machine-learning models exploiting patent data improve structure-based virtual screening for PDL1 dimerizers. J Adv Res 2025; 67:185-196. [PMID: 38280715 PMCID: PMC11725107 DOI: 10.1016/j.jare.2024.01.024] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/30/2023] [Revised: 12/01/2023] [Accepted: 01/21/2024] [Indexed: 01/29/2024] Open
Abstract
INTRODUCTION: Small-molecule Programmed Cell Death Protein 1/Programmed Death-Ligand 1 (PD1/PDL1) inhibition via PDL1 dimerization has the potential to lead to inexpensive drugs with better cancer patient outcomes and milder side effects. However, this therapeutic approach has proven challenging, with only one PDL1 dimerizer reaching early clinical trials so far. There is hence a need for fast and accurate methods to develop alternative PDL1 dimerizers. OBJECTIVES: We aim to show that structure-based virtual screening (SBVS) based on PDL1-specific machine-learning (ML) scoring functions (SFs) is a powerful drug design tool for detecting PD1/PDL1 inhibitors via PDL1 dimerization. METHODS: By incorporating the latest MLSF advances, we generated and evaluated PDL1-specific MLSFs (classifiers and inactive-enriched regressors) on two demanding test sets. RESULTS: 60 PDL1-specific MLSFs (30 classifiers and 30 regressors) were generated. Our large-scale analysis provides highly predictive PDL1-specific MLSFs that benefitted from training with large volumes of docked inactives and from enabling inactive-enriched regression. CONCLUSION: PDL1-specific MLSFs strongly outperformed generic SFs of various types on this target and are released here without restrictions.
Affiliation(s)
- Saw Simeon
  - Centre de Recherche en Cancérologie de Marseille, Marseille 13009, France
- Sachin Patil
  - NanoBio Laboratory, Widener University, Chester, PA 19013, USA
- Pedro J Ballester
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
|
4
|
Yang Z, Zhong W, Lv Q, Dong T, Chen G, Chen CYC. Interaction-Based Inductive Bias in Graph Neural Networks: Enhancing Protein-Ligand Binding Affinity Predictions From 3D Structures. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:8191-8208. [PMID: 38739515 DOI: 10.1109/tpami.2024.3400515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Inductive bias in machine learning (ML) is the set of assumptions describing how a model makes predictions. Different ML-based methods for protein-ligand binding affinity (PLA) prediction have different inductive biases, leading to different levels of generalization capability and interpretability. Intuitively, the inductive bias of an ML-based model for PLA prediction should fit in with biological mechanisms relevant for binding to achieve good predictions with meaningful reasons. To this end, we propose an interaction-based inductive bias to restrict neural networks to functions relevant for binding with two assumptions: 1) A protein-ligand complex can be naturally expressed as a heterogeneous graph with covalent and non-covalent interactions; 2) The predicted PLA is the sum of pairwise atom-atom affinities determined by non-covalent interactions. The interaction-based inductive bias is embodied by an explainable heterogeneous interaction graph neural network (EHIGN) for explicitly modeling pairwise atom-atom interactions to predict PLA from 3D structures. Extensive experiments demonstrate that EHIGN achieves better generalization capability than other state-of-the-art ML-based baselines in PLA prediction and structure-based virtual screening. More importantly, comprehensive analyses of distance-affinity, pose-affinity, and substructure-affinity relations suggest that the interaction-based inductive bias can guide the model to learn atomic interactions that are consistent with physical reality. As a case study to demonstrate practical usefulness, our method is tested for predicting the efficacy of Nirmatrelvir against SARS-CoV-2 variants. EHIGN successfully recognizes the changes in the efficacy of Nirmatrelvir for different SARS-CoV-2 variants with meaningful reasons.
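The second assumption in this abstract (predicted affinity as a sum of pairwise atom-atom terms) can be illustrated with a minimal Python sketch. It is not EHIGN: in the paper the pairwise terms are learned by a heterogeneous graph neural network, whereas the function `pairwise_affinity`, its linear distance decay, and the 5.0 Å cutoff below are hypothetical stand-ins chosen only to show the additive decomposition and why it is easy to inspect.

```python
import math


def pairwise_affinity(distance, cutoff=5.0):
    """Hypothetical per-pair term: decays linearly to zero at the cutoff.

    A learned model would replace this fixed function; it is an
    illustrative assumption, not the published EHIGN scoring term.
    """
    if distance >= cutoff:
        return 0.0
    return 1.0 - distance / cutoff


def predict_affinity(protein_atoms, ligand_atoms, cutoff=5.0):
    """Sum pairwise protein-ligand terms and keep each contribution.

    Atoms are (x, y, z) coordinate tuples. Returning the per-pair
    contributions is what makes an additive model inspectable.
    """
    total = 0.0
    contributions = {}
    for i, p in enumerate(protein_atoms):
        for j, l in enumerate(ligand_atoms):
            term = pairwise_affinity(math.dist(p, l), cutoff)
            if term > 0.0:
                contributions[(i, j)] = term
                total += term
    return total, contributions


if __name__ == "__main__":
    protein = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    ligand = [(2.5, 0.0, 0.0)]
    score, parts = predict_affinity(protein, ligand)
    print(score, parts)
```

Because the total is a plain sum, each entry of `contributions` can be read as the share of the prediction attributed to one atom pair, which is the kind of distance-affinity analysis the abstract refers to.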
|
5
|
Liu D, Song T, Wang S. MM-DRPNet: A multimodal dynamic radial partitioning network for enhanced protein-ligand binding affinity prediction. Comput Struct Biotechnol J 2024; 23:4396-4405. [PMID: 39737077 PMCID: PMC11683220 DOI: 10.1016/j.csbj.2024.11.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Revised: 11/23/2024] [Accepted: 11/30/2024] [Indexed: 01/01/2025] Open
Abstract
Accurate prediction of drug-target binding affinity remains a fundamental challenge in contemporary drug discovery. Despite significant advances in computational methods for protein-ligand binding affinity prediction, current approaches still face substantial limitations in prediction accuracy. Moreover, the prevalent methodologies often overlook critical three-dimensional (3D) structural information, thereby constraining their practical utility in computer-aided drug design (CADD). Here we present MM-DRPNet, a multimodal deep learning framework that enhances binding affinity prediction by integrating protein-ligand structural information with interaction features and physicochemical properties. The core innovation lies in our dynamic radial partitioning (DRP) algorithm, which adaptively segments 3D space based on complex-specific interaction patterns, surpassing traditional fixed partitioning methods in capturing spatial interactions. MM-DRPNet further incorporates molecular topological features to comprehensively model both structural and spatial relationships. Extensive evaluations on benchmark datasets demonstrate that MM-DRPNet significantly outperforms state-of-the-art methods across multiple metrics, with ablation studies confirming the substantial contribution of each architectural component. Source code for MM-DRPNet is freely available for download at https://github.com/Bigrock-dd/MMDRPv1.
Affiliation(s)
- Dayan Liu
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, Shandong, China
- Tao Song
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, Shandong, China
- Shudong Wang
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, Shandong, China
|
6
|
Jones D, Zhang X, Bennion BJ, Pinge S, Xu W, Kang J, Khaleghi B, Moshiri N, Allen JE, Rosing TS. HDBind: encoding of molecular structure with hyperdimensional binary representations. Sci Rep 2024; 14:29025. [PMID: 39578580 PMCID: PMC11584749 DOI: 10.1038/s41598-024-80009-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2024] [Accepted: 11/14/2024] [Indexed: 11/24/2024] Open
Abstract
Traditional methods for identifying "hit" molecules from a large collection of potential drug-like candidates rely on biophysical theory to compute approximations to the Gibbs free energy of the binding interaction between the drug and its protein target. These approaches have a significant limitation in that they require exceptional computing capabilities for even relatively small collections of molecules. Increasingly large and complex state-of-the-art deep learning approaches have gained popularity with the promise to improve the productivity of drug design, notorious for its numerous failures. However, as deep learning models increase in their size and complexity, their acceleration at the hardware level becomes more challenging. Hyperdimensional Computing (HDC) has recently gained attention in the computer hardware community due to its algorithmic simplicity relative to deep learning approaches. The HDC learning paradigm, which represents data with high-dimension binary vectors, allows the use of low-precision binary vector arithmetic to create models of the data that can be learned without the need for the gradient-based optimization required in many conventional machine learning and deep learning methods. This algorithmic simplicity allows for acceleration in hardware that has been previously demonstrated in a range of application areas (computer vision, bioinformatics, mass spectrometry, remote sensing, edge devices, etc.). To the best of our knowledge, our work is the first to consider HDC for the task of fast and efficient screening of modern drug-like compound libraries. We also propose the first HDC graph-based encoding methods for molecular data, demonstrating consistent and substantial improvement over previous work. We compare our approaches to alternative approaches on the well-studied MoleculeNet dataset and the recently proposed LIT-PCBA dataset derived from high quality PubChem assays. We demonstrate our methods on multiple target hardware platforms, including Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), showing at least an order of magnitude improvement in energy efficiency versus even our smallest neural network baseline model with a single hidden layer. Our work thus motivates further investigation into molecular representation learning to develop ultra-efficient pre-screening tools. We make our code publicly available at https://github.com/LLNL/hdbind.
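The HDC paradigm described in this abstract (binary hypervectors, learning without gradients) can be sketched in a few lines of Python. This is a generic illustration of hyperdimensional classification, not the HDBind encoder: the token-based encoding, the class name `HDClassifier`, the dimensionality `DIM`, and the toy substructure strings are all assumptions made for the example.

```python
import random

DIM = 2048  # hypervector dimensionality; an illustrative, hypothetical choice


def random_hv(rng):
    """Draw a random binary hypervector."""
    return [rng.randint(0, 1) for _ in range(DIM)]


def bundle(hvs):
    """Combine hypervectors by per-dimension majority vote (ties become 1)."""
    n = len(hvs)
    return [1 if 2 * sum(bits) >= n else 0 for bits in zip(*hvs)]


def hamming_similarity(a, b):
    """Fraction of positions at which two binary vectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


class HDClassifier:
    """Gradient-free classifier: one bundled prototype per class."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.item_memory = {}  # token -> fixed random hypervector
        self.prototypes = {}   # label -> bundled class hypervector

    def encode(self, tokens):
        """Encode a molecule, given as substructure tokens, by bundling."""
        hvs = []
        for token in tokens:
            if token not in self.item_memory:
                self.item_memory[token] = random_hv(self.rng)
            hvs.append(self.item_memory[token])
        return bundle(hvs)

    def fit(self, samples):
        """Build each class prototype by bundling its encoded examples."""
        by_label = {}
        for tokens, label in samples:
            by_label.setdefault(label, []).append(self.encode(tokens))
        self.prototypes = {lab: bundle(hvs) for lab, hvs in by_label.items()}

    def predict(self, tokens):
        """Return the label whose prototype is most similar to the query."""
        hv = self.encode(tokens)
        return max(
            self.prototypes,
            key=lambda lab: hamming_similarity(hv, self.prototypes[lab]),
        )


if __name__ == "__main__":
    clf = HDClassifier(seed=0)
    clf.fit([
        (["c1ccccc1", "C(=O)O"], "active"),
        (["CCO", "CCN"], "inactive"),
    ])
    print(clf.predict(["c1ccccc1", "C(=O)O"]))
```

Training here is a single pass of bit counting and comparison, which is the property that makes this family of models attractive for low-precision hardware such as FPGAs.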
Affiliation(s)
- Derek Jones
  - Department of Computer Science and Engineering, University of California-San Diego, La Jolla, CA, USA
  - Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA, USA
- Xiaohua Zhang
  - Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA, USA
- Brian J Bennion
  - Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA, USA
- Sumukh Pinge
  - Department of Computer Science and Engineering, University of California-San Diego, La Jolla, CA, USA
- Weihong Xu
  - Department of Computer Science and Engineering, University of California-San Diego, La Jolla, CA, USA
- Jaeyoung Kang
  - Department of Computer Science and Engineering, University of California-San Diego, La Jolla, CA, USA
- Behnam Khaleghi
  - Department of Computer Science and Engineering, University of California-San Diego, La Jolla, CA, USA
- Niema Moshiri
  - Department of Computer Science and Engineering, University of California-San Diego, La Jolla, CA, USA
- Jonathan E Allen
  - Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA, USA
- Tajana S Rosing
  - Department of Computer Science and Engineering, University of California-San Diego, La Jolla, CA, USA
|
7
|
Seo S, Kim H, Lee J, Choi S, Park S. Exploring the potential of compound-protein complex structure-free models in virtual screening using BlendNet. Brief Bioinform 2024; 26:bbae712. [PMID: 39804143 PMCID: PMC11726592 DOI: 10.1093/bib/bbae712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Revised: 11/13/2024] [Accepted: 12/27/2024] [Indexed: 01/16/2025] Open
Abstract
Identifying new compounds that interact with a target is a crucial time-limiting step in the initial phases of drug discovery. Compound-protein complex structure-based affinity prediction models can expedite this process; however, their dependence on high-quality three-dimensional (3D) complex structures limits their practical application. Prediction models that do not require 3D complex structures for binding-affinity estimation offer a theoretically attractive alternative; however, accurately predicting affinity without interaction information presents significant challenges. We introduce BlendNet, a framework that employs a knowledge transfer strategy to improve affinity prediction accuracy by learning the interdependent relationships between compounds and proteins without relying on 3D complex structures. Compared with state-of-the-art models for affinity prediction, BlendNet demonstrated superior performance across various cold-start cases. The ability of BlendNet to interpret compound-protein interactions without utilizing complex structure data highlights its potential to accelerate and streamline drug development.
Affiliation(s)
- Sangmin Seo
  - Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
  - UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
- Hwanhee Kim
  - Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
- Jieun Lee
  - Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
- Seungyeon Choi
  - Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
- Sanghyun Park
  - Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
|
8
|
Dhayalan A, Prajapati A, Yogisharadhya R, Chanda MM, Shivachandra SB. Anti-quorum sensing and anti-biofilm activities of Pasteurella multocida strains. Microb Pathog 2024; 197:107085. [PMID: 39481691 DOI: 10.1016/j.micpath.2024.107085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Revised: 10/10/2024] [Accepted: 10/25/2024] [Indexed: 11/02/2024]
Abstract
A total of 52 Pasteurella multocida strains of capsular serogroups A, B and D were screened for anti-quorum sensing activity against Chromobacterium violaceum. Of these, 12 strains of serogroup A were found to possess anti-quorum sensing activity. Based on the zone of pigment inhibition, inhibitory activity was highest for strain NIVEDIPm9 and lowest for strain NIVEDIPm30. Further, the cell-free extract of strain NIVEDIPm9 showed the highest anti-biofilm activity in the reference E. coli strain and concentration-dependent degradation of the C6-AHL molecule. Whole-genome sequence annotation of strain NIVEDIPm9 predicted the presence of four metallo-β-lactamase (MBL) fold metallo-hydrolase proteins. In docking studies, the MBL1 and MBL3 proteins showed high binding affinity for the autoinducer signalling molecule OH-C10 AHL, with binding energies of -6.3 and -6.2 kcal/mol, respectively. An interaction study of VAF and quorum sensing molecules showed that the OmpA and HgbA proteins were stimulated by all ten molecules (C4-AHLs, C6-AHLs, C10-AHLs, C14-AHLs, 3-oxo-C10-AHLs, 3OH-C10-HSL, C8-HSL, C10-HSL, C12-HSL, C14-HSL), while the toxA gene was stimulated by the OH-C10-AHL molecule and the sodC gene was stimulated by none. In conclusion, we describe the anti-quorum sensing activities of diverse P. multocida strains causing pasteurellosis in livestock.
Affiliation(s)
- Arul Dhayalan
  - ICAR-National Institute of Veterinary Epidemiology and Disease Informatics (NIVEDI), Post Box No. 6450, Yelahanka, Bengaluru, 560064, Karnataka, India
- Awadhesh Prajapati
  - Bihar Veterinary College, Bihar Animal Sciences University, Patna, 800014, Bihar, India
- Revanaiah Yogisharadhya
  - ICAR-Krishi Vigyan Kendra (KVK), ICAR-Research Complex for NEH Region, Hailakandi, 788152, Assam, India
- Mohammed Mudassar Chanda
  - ICAR-National Institute of Veterinary Epidemiology and Disease Informatics (NIVEDI), Post Box No. 6450, Yelahanka, Bengaluru, 560064, Karnataka, India
- Sathish Bhadravati Shivachandra
  - ICAR-National Institute of Veterinary Epidemiology and Disease Informatics (NIVEDI), Post Box No. 6450, Yelahanka, Bengaluru, 560064, Karnataka, India
|
9
|
Lam HYI, Guan JS, Ong XE, Pincket R, Mu Y. Protein language models are performant in structure-free virtual screening. Brief Bioinform 2024; 25:bbae480. [PMID: 39327890 PMCID: PMC11427677 DOI: 10.1093/bib/bbae480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Revised: 08/17/2024] [Accepted: 09/12/2024] [Indexed: 09/28/2024] Open
Abstract
Hitherto, virtual screening (VS) has typically been performed using a structure-based drug design paradigm. Such methods typically require molecular docking on high-resolution three-dimensional structures of a target protein, a computationally intensive and time-consuming exercise. This work demonstrates that by employing protein language models and molecular graphs as inputs to a novel graph-to-transformer cross-attention mechanism, a screening power comparable to state-of-the-art structure-based models can be achieved. The implications thereof include highly expedited VS due to the greatly reduced compute required to run this model, and the ability to perform early stages of computer-aided drug design in the complete absence of 3D protein structures.
Affiliation(s)
- Hilbert Yuen In Lam
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
  - MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
- Jia Sheng Guan
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
- Xing Er Ong
  - MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
- Robbe Pincket
  - Heliovision, Asstraat 5, 3000 Leuven, Leuven, Kingdom of Belgium
- Yuguang Mu
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Dr, Singapore 637551, Singapore, Republic of Singapore
  - MagMol Pte. Ltd., 68 Circular Road, #02-01, Singapore 049422, Singapore, Republic of Singapore
|
10
|
Liu S, Yu J, Ni N, Wang Z, Chen M, Li Y, Xu C, Ding Y, Zhang J, Yao X, Liu H. Versatile Framework for Drug-Target Interaction Prediction by Considering Domain-Specific Features. J Chem Inf Model 2024; 64:5646-5656. [PMID: 38976879 DOI: 10.1021/acs.jcim.4c00403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Predicting drug-target interactions (DTIs) is one of the crucial tasks in drug discovery, but traditional wet-lab experiments are costly and time-consuming. Recently, deep learning has emerged as a promising tool for accelerating DTI prediction due to its powerful performance. However, the models trained on limited known DTI data struggle to generalize effectively to novel drug-target pairs. In this work, we propose a strategy to train an ensemble of models by capturing both domain-generic and domain-specific features (E-DIS) to learn diverse domain features and adapt them to out-of-distribution data. Multiple experts were trained on different domains to capture and align domain-specific information from various distributions without accessing any data from unseen domains. E-DIS provides a comprehensive representation of proteins and ligands by capturing diverse features. Experimental results on four benchmark data sets in both in-domain and cross-domain settings demonstrated that E-DIS significantly improved model performance and domain generalization compared to existing methods. Our approach presents a significant advancement in DTI prediction by combining domain-generic and domain-specific features, enhancing the generalization ability of the DTI prediction model.
Affiliation(s)
- Shuo Liu
  - School of Pharmacy, Lanzhou University, Gansu 730000, China
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Jialiang Yu
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Ningxi Ni
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Zidong Wang
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Mengyun Chen
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Yuquan Li
  - College of Chemistry and Chemical Engineering, Lanzhou University, Gansu 730000, China
- Chen Xu
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Yahao Ding
  - Huawei Technologies Co., Ltd., Hangzhou 310000, China
- Jun Zhang
  - Changping Laboratory, Beijing 102200, China
- Xiaojun Yao
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
- Huanxiang Liu
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
|
11
|
Li X, Shen C, Zhu H, Yang Y, Wang Q, Yang J, Huang N. A High-Quality Data Set of Protein-Ligand Binding Interactions Via Comparative Complex Structure Modeling. J Chem Inf Model 2024; 64:2454-2466. [PMID: 38181418 DOI: 10.1021/acs.jcim.3c01170] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/07/2024]
Abstract
High-quality protein-ligand complex structures provide the basis for understanding the nature of noncovalent binding interactions at the atomic level and enable structure-based drug design. However, experimentally determined complex structures are scarce compared with the vast chemical space. In this study, we addressed this issue by constructing the BindingNet data set via comparative complex structure modeling, which contains 69,816 modeled high-quality protein-ligand complex structures with experimental binding affinity data. BindingNet provides valuable insights for investigating protein-ligand interactions, allowing visual inspection and interpretation of the structure-activity relationships of structural analogues. It can also be used for evaluating machine-learning-based scoring functions. Our results indicate that machine learning models trained on BindingNet could reduce the bias caused by buried solvent-accessible surface area, as we previously found for models trained on the PDBbind data set. We also discussed strategies to improve BindingNet and its potential use for benchmarking molecular docking methods and ligand binding free energy calculation approaches. BindingNet complements PDBbind in constructing a sufficient and unbiased protein-ligand binding data set and is freely available at http://bindingnet.huanglab.org.cn.
Affiliation(s)
- Xuelian Li
  - National Institute of Biological Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Cheng Shen
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Hui Zhu
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
- Yujian Yang
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Qing Wang
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Jincai Yang
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Niu Huang
  - National Institute of Biological Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
|
12
|
Qu X, Dong L, Luo D, Si Y, Wang B. Water Network-Augmented Two-State Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2263-2274. [PMID: 37433009 DOI: 10.1021/acs.jcim.3c00567] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 07/13/2023]
Abstract
Water network rearrangement from the ligand-unbound state to the ligand-bound state is known to have significant effects on protein-ligand binding interactions, but most current machine learning-based scoring functions overlook these effects. In this study, we endeavor to construct a comprehensive and realistic deep learning model by incorporating water network information into both the ligand-unbound and ligand-bound states. In particular, extended connectivity interaction features were integrated into the graph representation, and a graph transformer operator was employed to extract features of the ligand-unbound and ligand-bound states. Through these efforts, we developed a water network-augmented two-state model called ECIFGraph::HM-Holo-Apo. Our new model exhibits satisfactory performance in terms of scoring, ranking, docking, screening, and reverse screening power tests on the CASF-2016 benchmark. In addition, it can achieve superior performance in large-scale docking-based virtual screening tests on the DEKOIS2.0 data set. Our study highlights that the use of a water network-augmented two-state model can be an effective strategy to bolster the robustness and applicability of machine learning-based scoring functions, particularly for targets with hydrophilic or solvent-exposed binding pockets.
Affiliation(s)
- Xiaoyang Qu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Lina Dong
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Ding Luo
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yubing Si
  - College of Chemistry, Zhengzhou University, Zhengzhou 450001, P. R. China
- Binju Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
|
13
|
Zhang X, Gao H, Wang H, Chen Z, Zhang Z, Chen X, Li Y, Qi Y, Wang R. PLANET: A Multi-objective Graph Neural Network Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2205-2220. [PMID: 37319418 DOI: 10.1021/acs.jcim.3c00253] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Indexed: 06/17/2023]
Abstract
Predicting protein-ligand binding affinity is a central issue in drug design. Various deep learning models have been published in recent years, where many of them rely on 3D protein-ligand complex structures as input and tend to focus on the single task of reproducing binding affinity. In this study, we have developed a graph neural network model called PLANET (Protein-Ligand Affinity prediction NETwork). This model takes the graph-represented 3D structure of the binding pocket on the target protein and the 2D chemical structure of the ligand molecule as input. It was trained through a multi-objective process with three related tasks, including deriving the protein-ligand binding affinity, protein-ligand contact map, and ligand distance matrix. Besides the protein-ligand complexes with known binding affinity data retrieved from the PDBbind database, a large number of non-binder decoys were also added to the training data for deriving the final model of PLANET. When tested on the CASF-2016 benchmark, PLANET exhibited a scoring power comparable to the best result yielded by other deep learning models as well as a reasonable ranking power and docking power. In virtual screening trials conducted on the DUD-E benchmark, PLANET's performance was notably better than several deep learning and machine learning models. As on the LIT-PCBA benchmark, PLANET achieved comparable accuracy as the conventional docking program Glide, but it only spent less than 1% of Glide's computation time to finish the same job because PLANET did not need exhaustive conformational sampling. Considering the decent accuracy and efficiency of PLANET in binding affinity prediction, it may become a useful tool for conducting large-scale virtual screening.
Affiliation(s)
- Xiangying Zhang
  - Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Haotian Gao
  - Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Haojie Wang
  - Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Zhihang Chen
  - Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Zhe Zhang
  - Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Xinchong Chen
  - Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Yan Li
  - Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Yifei Qi
  - Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
- Renxiao Wang
  - Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
14
Menchon G, Maveyraud L, Czaplicki G. Molecular Dynamics as a Tool for Virtual Ligand Screening. Methods Mol Biol 2024; 2714:33-83. [PMID: 37676592 DOI: 10.1007/978-1-0716-3441-7_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Rational drug design is essential for new drugs to emerge, especially when the structure of a target protein or nucleic acid is known. To that purpose, high-throughput virtual ligand screening campaigns aim at discovering computationally new binding molecules or fragments to modulate particular biomolecular interactions or biological activities, related to a disease process. The structure-based virtual ligand screening process primarily relies on docking methods which allow predicting the binding of a molecule to a biological target structure with a correct conformation and the best possible affinity. The docking method itself is not sufficient as it suffers from several and crucial limitations (lack of full protein flexibility information, no solvation and ion effects, poor scoring functions, and unreliable molecular affinity estimation). At the interface of computer techniques and drug discovery, molecular dynamics (MD) allows introducing protein flexibility before or after a docking protocol, refining the structure of protein-drug complexes in the presence of water, ions, and even in membrane-like environments, describing more precisely the temporal evolution of the biological complex and ranking these complexes with more accurate binding energy calculations. In this chapter, we describe the up-to-date MD, which plays the role of supporting tools in the virtual ligand screening (VS) process. Without a doubt, using docking in combination with MD is an attractive approach in structure-based drug discovery protocols nowadays.
It has proved its efficiency through many examples in the literature and is a powerful method to significantly reduce the amount of required wet experimentations (Tarcsay et al, J Chem Inf Model 53:2990-2999, 2013; Barakat et al, PLoS One 7:e51329, 2012; De Vivo et al, J Med Chem 59:4035-4061, 2016; Durrant, McCammon, BMC Biol 9:71-79, 2011; Galeazzi, Curr Comput Aided Drug Des 5:225-240, 2009; Hospital et al, Adv Appl Bioinforma Chem 8:37-47, 2015; Jiang et al, Molecules 20:12769-12786, 2015; Kundu et al, J Mol Graph Model 61:160-174, 2015; Mirza et al, J Mol Graph Model 66:99-107, 2016; Moroy et al, Future Med Chem 7:2317-2331, 2015; Naresh et al, J Mol Graph Model 61:272-280, 2015; Nichols et al, J Chem Inf Model 51:1439-1446, 2011; Nichols et al, Methods Mol Biol 819:93-103, 2012; Okimoto et al, PLoS Comput Biol 5:e1000528, 2009; Rodriguez-Bussey et al, Biopolymers 105:35-42, 2016; Sliwoski et al, Pharmacol Rev 66:334-395, 2014).
Affiliation(s)
- Grégory Menchon
  - Inserm U1242, Oncogenesis, Stress and Signaling (OSS), Université de Rennes 1, Rennes, France
- Laurent Maveyraud
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III - Paul Sabatier (UT3), Toulouse, France
- Georges Czaplicki
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III - Paul Sabatier (UT3), Toulouse, France
15
Xia S, Chen E, Zhang Y. Integrated Molecular Modeling and Machine Learning for Drug Design. J Chem Theory Comput 2023; 19:7478-7495. [PMID: 37883810 PMCID: PMC10653122 DOI: 10.1021/acs.jctc.3c00814] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/26/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Modern therapeutic development often involves several stages that are interconnected, and multiple iterations are usually required to bring a new drug to the market. Computational approaches have increasingly become an indispensable part of helping reduce the time and cost of the research and development of new drugs. In this Perspective, we summarize our recent efforts on integrating molecular modeling and machine learning to develop computational tools for modulator design, including a pocket-guided rational design approach based on AlphaSpace to target protein-protein interactions, delta machine learning scoring functions for protein-ligand docking as well as virtual screening, and state-of-the-art deep learning models to predict calculated and experimental molecular properties based on molecular mechanics optimized geometries. Meanwhile, we discuss remaining challenges and promising directions for further development and use a retrospective example of FDA approved kinase inhibitor Erlotinib to demonstrate the use of these newly developed computational tools.
Affiliation(s)
- Song Xia
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Eric Chen
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
16
Tran-Nguyen VK, Junaid M, Simeon S, Ballester PJ. A practical guide to machine-learning scoring for structure-based virtual screening. Nat Protoc 2023; 18:3460-3511. [PMID: 37845361 DOI: 10.1038/s41596-023-00885-w] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 02/08/2022] [Accepted: 07/03/2023] [Indexed: 10/18/2023]
Abstract
Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol , can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
Affiliation(s)
- Muhammad Junaid
  - Centre de Recherche en Cancérologie de Marseille, Marseille, France
- Saw Simeon
  - Centre de Recherche en Cancérologie de Marseille, Marseille, France
17
Shen C, Zhang X, Hsieh CY, Deng Y, Wang D, Xu L, Wu J, Li D, Kang Y, Hou T, Pan P. A generalized protein-ligand scoring framework with balanced scoring, docking, ranking and screening powers. Chem Sci 2023; 14:8129-8146. [PMID: 37538816 PMCID: PMC10395315 DOI: 10.1039/d3sc02044d] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 04/20/2023] [Accepted: 07/03/2023] [Indexed: 08/05/2023]
Abstract
Applying machine learning algorithms to protein-ligand scoring functions has aroused widespread attention in recent years due to the high predictive accuracy and affordable computational cost. Nevertheless, most machine learning-based scoring functions are only applicable to a specific task, e.g., binding affinity prediction, binding pose prediction or virtual screening, suggesting that the development of a scoring function with balanced performance in all critical tasks remains a grand challenge. To this end, we propose a novel parameterization strategy by introducing an adjustable binding affinity term that represents the correlation between the predicted outcomes and experimental data into the training of mixture density network. The resulting residue-atom distance likelihood potential not only retains the superior docking and screening power over all the other state-of-the-art approaches, but also achieves a remarkable improvement in scoring and ranking performance. We emphatically explore the impacts of several key elements on prediction accuracy as well as the task preference, and demonstrate that the performance of scoring/ranking and docking/screening tasks of a certain model could be well balanced through an appropriate manner. Overall, our study highlights the potential utility of our innovative parameterization strategy as well as the resulting scoring framework in future structure-based drug design.
Affiliation(s)
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - School of Public Health, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Dong Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Jian Wu
  - School of Public Health, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dan Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
18
Shiota K, Suma A, Ogawa H, Yamaguchi T, Iida A, Hata T, Matsushita M, Akutsu T, Tateno M. AQDnet: Deep Neural Network for Protein-Ligand Docking Simulation. ACS Omega 2023; 8:23925-23935. [PMID: 37426216 PMCID: PMC10324054 DOI: 10.1021/acsomega.3c02411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 05/31/2023] [Indexed: 07/11/2023]
Abstract
We have developed an innovative system, AI QM Docking Net (AQDnet), which utilizes the three-dimensional structure of protein-ligand complexes to predict binding affinity. This system is novel in two respects: first, it significantly expands the training dataset by generating thousands of diverse ligand configurations for each protein-ligand complex and subsequently determining the binding energy of each configuration through quantum computation. Second, we have devised a method that incorporates the atom-centered symmetry function (ACSF), highly effective in describing molecular energies, for the prediction of protein-ligand interactions. These advancements have enabled us to effectively train a neural network to learn the protein-ligand quantum energy landscape (P-L QEL). Consequently, we have achieved a 92.6% top 1 success rate in the CASF-2016 docking power, placing first among all models assessed in the CASF-2016, thus demonstrating the exceptional docking performance of our model.
Affiliation(s)
- Koji Shiota
  - Innovation to Implementation Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., Takatsuki, Osaka 569-1125, Japan
- Akira Suma
  - Innovation to Implementation Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., Takatsuki, Osaka 569-1125, Japan
- Hiroyuki Ogawa
  - Innovation to Implementation Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., Takatsuki, Osaka 569-1125, Japan
- Takuya Yamaguchi
  - Innovation to Implementation Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., Takatsuki, Osaka 569-1125, Japan
- Akio Iida
  - Innovation to Implementation Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., Takatsuki, Osaka 569-1125, Japan
- Takahiro Hata
  - Innovation to Implementation Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., Takatsuki, Osaka 569-1125, Japan
- Mutsuyoshi Matsushita
  - Innovation to Implementation Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., Takatsuki, Osaka 569-1125, Japan
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Masaru Tateno
  - Innovation to Implementation Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., Takatsuki, Osaka 569-1125, Japan
19
Tran-Nguyen VK, Ballester PJ. Beware of Simple Methods for Structure-Based Virtual Screening: The Critical Importance of Broader Comparisons. J Chem Inf Model 2023; 63:1401-1405. [PMID: 36848585 PMCID: PMC10015451 DOI: 10.1021/acs.jcim.3c00218] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 03/01/2023]
Abstract
We discuss how data unbiasing and simple methods such as protein-ligand Interaction FingerPrint (IFP) can overestimate virtual screening performance. We also show that IFP is strongly outperformed by target-specific machine-learning scoring functions, which were not considered in a recent report concluding that simple methods were better than machine-learning scoring functions at virtual screening.
Affiliation(s)
- Pedro J Ballester
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, U.K.
20
Blanes-Mira C, Fernández-Aguado P, de Andrés-López J, Fernández-Carvajal A, Ferrer-Montiel A, Fernández-Ballester G. Comprehensive Survey of Consensus Docking for High-Throughput Virtual Screening. Molecules 2022; 28:175. [PMID: 36615367 PMCID: PMC9821981 DOI: 10.3390/molecules28010175] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/29/2022] [Revised: 12/19/2022] [Accepted: 12/21/2022] [Indexed: 12/28/2022]
Abstract
The rapid advances of 3D techniques for the structural determination of proteins and the development of numerous computational methods and strategies have led to identifying highly active compounds in computer drug design. Molecular docking is a method widely used in high-throughput virtual screening campaigns to filter potential ligands targeted to proteins. A great variety of docking programs are currently available, which differ in the algorithms and approaches used to predict the binding mode and the affinity of the ligand. All programs heavily rely on scoring functions to accurately predict ligand binding affinity, and despite differences in performance, none of these docking programs is preferable to the others. To overcome this problem, consensus scoring methods improve the outcome of virtual screening by averaging the rank or score of individual molecules obtained from different docking programs. The successful application of consensus docking in high-throughput virtual screening highlights the need to optimize the predictive power of molecular docking methods.
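The rank-averaging idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `consensus_rank`, the program labels, and the scores are hypothetical, and the sketch assumes every program has scored the same set of molecules and that lower scores are better.

```python
from collections import defaultdict


def consensus_rank(score_tables, lower_is_better=True):
    """Average each molecule's rank across several docking programs.

    score_tables maps a program label to a {molecule_id: score} dict.
    Assumption: every program scored the same molecules.
    Returns (molecule_id, mean_rank) pairs, best consensus rank first.
    """
    rank_sums = defaultdict(float)
    for scores in score_tables.values():
        # Rank 1 is the best-scoring molecule for this program.
        ordered = sorted(scores, key=scores.get, reverse=not lower_is_better)
        for rank, molecule in enumerate(ordered, start=1):
            rank_sums[molecule] += rank
    n_programs = len(score_tables)
    mean_ranks = {mol: total / n_programs for mol, total in rank_sums.items()}
    # Sort by mean rank, breaking ties by molecule id for determinism.
    return sorted(mean_ranks.items(), key=lambda item: (item[1], item[0]))


# Hypothetical scores (kcal/mol-like, more negative is better).
example = {
    "program_a": {"mol1": -9.0, "mol2": -7.5, "mol3": -8.2},
    "program_b": {"mol1": -8.1, "mol2": -8.4, "mol3": -6.0},
}
print(consensus_rank(example))
```

Averaging ranks rather than raw scores sidesteps the fact that different programs report scores on different scales; averaging normalized scores is the other variant the abstract alludes to.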
21
The Impact of Software Used and the Type of Target Protein on Molecular Docking Accuracy. Molecules 2022; 27:9041. [PMID: 36558174 PMCID: PMC9788237 DOI: 10.3390/molecules27249041] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/11/2022] [Revised: 12/05/2022] [Accepted: 12/16/2022] [Indexed: 12/24/2022]
Abstract
The modern development of computer technology and different in silico methods have had an increasing impact on the discovery and development of new drugs. Molecular docking techniques are the most widely used in silico methods in drug discovery. Currently, the time and financial costs for the initial hit identification can be significantly reduced due to the ability to perform high-throughput virtual screening of large compound libraries in a short time. However, the selection of potential hit compounds still remains more of a random process, because there is still no consensus on what the binding energy and ligand efficiency (LE) of a potentially active compound should be. In the best cases, only 20-30% of compounds identified by molecular docking are active in biological tests. In this work, we evaluated the impact of the docking software used as well as the type of the target protein on the molecular docking results and their accuracy using an example of the three most popular programs and five target proteins related to neurodegenerative diseases. In addition, we attempted to determine the "reliable range" of the binding energy and LE that would allow selecting compounds with biological activity in the desired concentration range.
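For readers unfamiliar with ligand efficiency, the following Python sketch shows one commonly used definition, the negative binding energy divided by the number of heavy (non-hydrogen) atoms. The abstract does not state which LE definition the authors used, so this formula, the function name, and the example values are assumptions made for illustration.

```python
def ligand_efficiency(binding_energy_kcal_mol, heavy_atom_count):
    """Ligand efficiency as -dG / N_heavy (kcal/mol per heavy atom).

    Assumption: this is the common heavy-atom-normalised definition;
    the cited work may use a different variant.
    """
    if heavy_atom_count <= 0:
        raise ValueError("heavy_atom_count must be positive")
    return -binding_energy_kcal_mol / heavy_atom_count


# Hypothetical docking result: -9.0 kcal/mol for a 30-heavy-atom ligand.
print(round(ligand_efficiency(-9.0, 30), 3))
```

Normalising by size keeps a large ligand from looking better than a small one simply because it makes more contacts, which is why LE is often reported alongside the raw binding energy.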
22
McMillan AE, Wu WWX, Nichols PL, Wanner BM, Bode JW. A vending machine for drug-like molecules - automated synthesis of virtual screening hits. Chem Sci 2022; 13:14292-14299. [PMID: 36545137 PMCID: PMC9749103 DOI: 10.1039/d2sc05182f] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/17/2022] [Accepted: 10/27/2022] [Indexed: 12/24/2022]
Abstract
As a result of high false positive rates in virtual screening campaigns, prospective hits must be synthesised for validation. When done manually, this is a time consuming and laborious process. Large "on-demand" virtual libraries (>7 × 10^12 members), suitable for preparation using capsule-based automated synthesis and commercial building blocks, were evaluated to determine their structural novelty. One sub-library, constructed from iSnAP capsules, aldehydes and amines, contains unique scaffolds with drug-like physicochemical properties. Virtual screening hits from this iSnAP library were prepared in an automated fashion for evaluation against Aedes aegypti and Phytophthora infestans. In comparison to manual workflows, this approach provided a 10-fold improvement in user efficiency. A streamlined method of relative stereochemical assignment was also devised to augment the rapid synthesis. User efficiency was further improved to 100-fold by downscaling and parallelising capsule-based chemistry on 96-well plates equipped with filter bases. This work demonstrates that automated synthesis consoles can enable the rapid and reliable preparation of attractive virtual screening hits from large virtual libraries.
Affiliation(s)
- Angus E. McMillan
  - Laboratory for Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, Zürich 8093, Switzerland
- Wilson W. X. Wu
  - Laboratory for Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, Zürich 8093, Switzerland
- Paula L. Nichols
  - Laboratory for Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, Zürich 8093, Switzerland
  - Synple Chem AG, Kemptpark 18, Kemptthal 8310, Switzerland
- Jeffrey W. Bode
  - Laboratory for Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, Zürich 8093, Switzerland
23
Xu M, Shen C, Yang J, Wang Q, Huang N. Systematic Investigation of Docking Failures in Large-Scale Structure-Based Virtual Screening. ACS Omega 2022; 7:39417-39428. [PMID: 36340123 PMCID: PMC9632257 DOI: 10.1021/acsomega.2c05826] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/07/2022] [Accepted: 10/07/2022] [Indexed: 06/16/2023]
Abstract
In recent years, large-scale structure-based virtual screening has attracted increasing levels of interest for identification of novel compounds corresponding to potential drug targets. It is critical to understand the strengths and weaknesses of docking algorithms to increase the success rate in practical applications. Here, we systematically investigated the docking successes and failures of two representative docking programs: UCSF DOCK 3.7 and AutoDock Vina. DOCK 3.7 performed better in early enrichment on the Directory of Useful Decoys: Enhanced (DUD-E) data set, although both docking methods were roughly comparable in overall enrichment performance. DOCK 3.7 also showed superior computational efficiency. Intriguingly, the Vina scoring function showed a bias toward compounds with higher molecular weights. Both the tested docking approaches yielded incorrectly predicted ligand binding poses caused by the limitations of torsion sampling. Based on a careful analysis of docking results from six representative cases, we propose the reasons underlying docking failures; furthermore, we provide a few solutions, representing practical guidance for large-scale virtual screening campaigns and future docking algorithm development.
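"Early enrichment" in this abstract refers to how many known actives appear near the top of a ranked screening list. A minimal Python sketch of the standard enrichment factor follows; it is a generic illustration rather than the evaluation code used in the study, and the function name, the default 1% cutoff, and the toy data are assumptions.

```python
import math


def enrichment_factor(scores, is_active, fraction=0.01, lower_is_better=True):
    """Enrichment factor: active rate in the top fraction / overall active rate.

    scores and is_active are parallel sequences (is_active holds 0/1 or bools).
    """
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and is_active must be non-empty and equal length")
    total_actives = sum(bool(flag) for flag in is_active)
    if total_actives == 0:
        raise ValueError("at least one active is required")
    # Number of molecules in the top slice (at least one).
    n_top = max(1, math.ceil(fraction * n))
    order = sorted(range(n), key=lambda i: scores[i], reverse=not lower_is_better)
    hits = sum(bool(is_active[i]) for i in order[:n_top])
    return (hits / n_top) / (total_actives / n)


# Toy library: 10 molecules, 2 actives, one of them in the top 20%.
toy_scores = [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(toy_scores, toy_labels, fraction=0.2))
```

A value above 1 means the top slice is richer in actives than a random selection would be; benchmark studies typically report this at small fractions such as 1%.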
Affiliation(s)
- Min Xu
  - College of Life Sciences, Beijing Normal University, No. 19 Xinjiekouwai Street, Beijing 100875, China
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Cheng Shen
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
  - Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, No. 9, Dongdan Santiao, Dongcheng District, Beijing 100730, China
- Jincai Yang
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Qing Wang
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
  - School of Pharmaceutical Science and Technology, Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072, China
- Niu Huang
  - National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
24
Guterres H, Park S, Zhang H, Perone T, Kim J, Im W. CHARMM‐GUI high‐throughput simulator for efficient evaluation of protein–ligand interactions with different force fields. Protein Sci 2022. [DOI: 10.1002/pro.4413] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Affiliation(s)
- Hugo Guterres
  - Departments of Biological Sciences, Chemistry, Bioengineering, and Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania, USA
- Sang‐Jun Park
  - Departments of Biological Sciences, Chemistry, Bioengineering, and Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania, USA
- Han Zhang
  - Departments of Biological Sciences, Chemistry, Bioengineering, and Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania, USA
- Thomas Perone
  - Departments of Biological Sciences, Chemistry, Bioengineering, and Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania, USA
- Jongtaek Kim
  - Department of Physics and Chemistry, Korea Air Force Academy, Cheongju, South Korea
- Wonpil Im
  - Departments of Biological Sciences, Chemistry, Bioengineering, and Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania, USA
25
Yang C, Chen EA, Zhang Y. Protein-Ligand Docking in the Machine-Learning Era. Molecules 2022; 27:4568. [PMID: 35889440 PMCID: PMC9323102 DOI: 10.3390/molecules27144568] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 07/03/2022] [Accepted: 07/14/2022] [Indexed: 11/16/2022]
Abstract
Molecular docking plays a significant role in early-stage drug discovery, from structure-based virtual screening (VS) to hit-to-lead optimization, and its capability and predictive power is critically dependent on the protein-ligand scoring function. In this review, we give a broad overview of recent scoring function development, as well as the docking-based applications in drug discovery. We outline the strategies and resources available for structure-based VS and discuss the assessment and development of classical and machine learning protein-ligand scoring functions. In particular, we highlight the recent progress of machine learning scoring function ranging from descriptor-based models to deep learning approaches. We also discuss the general workflow and docking protocols of structure-based VS, such as structure preparation, binding site detection, docking strategies, and post-docking filter/re-scoring, as well as a case study on the large-scale docking-based VS test on the LIT-PCBA data set.
Affiliation(s)
- Chao Yang
- Department of Chemistry, New York University, New York, NY 10003, USA
| | - Eric Anthony Chen
- Department of Chemistry, New York University, New York, NY 10003, USA
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, NY 10003, USA
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
26
|
Tran-Nguyen VK, Simeon S, Junaid M, Ballester PJ. Structure-based virtual screening for PDL1 dimerizers: Evaluating generic scoring functions. Curr Res Struct Biol 2022; 4:206-210. [PMID: 35769111 PMCID: PMC9234010 DOI: 10.1016/j.crstbi.2022.06.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/14/2022] [Revised: 05/14/2022] [Accepted: 06/02/2022] [Indexed: 10/31/2022] Open
Abstract
The interaction between PD1 and its ligand PDL1 has been shown to render tumor cells resistant to apoptosis and promote tumor progression. An innovative mechanism to inhibit the PD1/PDL1 interaction is PDL1 dimerization induced by small-molecule PDL1 binders. Structure-based virtual screening is a promising approach to discovering such small-molecule PD1/PDL1 inhibitors. Here we investigate which type of generic scoring functions is most suitable to tackle this problem. We consider CNN-Score, an ensemble of convolutional neural networks, as the representative of machine-learning scoring functions. We also evaluate Smina, a commonly used classical scoring function, and IFP, a top structural fingerprint similarity scoring function. These three types of scoring functions were evaluated on two test sets sharing the same set of small-molecule PD1/PDL1 inhibitors, but using different types of inactives: either true inactives (molecules with no in vitro PD1/PDL1 inhibition activity) or assumed inactives (property-matched decoy molecules generated from each active). On both test sets, CNN-Score performed much better than Smina, which in turn strongly outperformed IFP. The fact that the latter was the case, despite precluding any possibility of exploiting decoy bias, demonstrates the predictive value of CNN-Score for PDL1. These results suggest that re-scoring Smina-docked molecules with CNN-Score is a promising structure-based virtual screening method to discover new small-molecule inhibitors of this therapeutic target.
Affiliation(s)
- Viet-Khoa Tran-Nguyen
- Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France
- CNRS, UMR7258, Marseille, F-13009, France
- Institut Paoli-Calmettes, Marseille, F-13009, France
- Aix-Marseille University, UM 105, F-13284, Marseille, France
| | - Saw Simeon
- Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France
- CNRS, UMR7258, Marseille, F-13009, France
- Institut Paoli-Calmettes, Marseille, F-13009, France
- Aix-Marseille University, UM 105, F-13284, Marseille, France
| | - Muhammad Junaid
- Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France
- CNRS, UMR7258, Marseille, F-13009, France
- Institut Paoli-Calmettes, Marseille, F-13009, France
- Aix-Marseille University, UM 105, F-13284, Marseille, France
| | - Pedro J. Ballester
- Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France
- CNRS, UMR7258, Marseille, F-13009, France
- Institut Paoli-Calmettes, Marseille, F-13009, France
- Aix-Marseille University, UM 105, F-13284, Marseille, France
| |
|
27
|
Volkov M, Turk JA, Drizard N, Martin N, Hoffmann B, Gaston-Mathé Y, Rognan D. On the Frustration to Predict Binding Affinities from Protein-Ligand Structures with Deep Neural Networks. J Med Chem 2022; 65:7946-7958. [PMID: 35608179 DOI: 10.1021/acs.jmedchem.2c00487] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Indexed: 02/08/2023]
Abstract
Accurate prediction of binding affinities from protein-ligand atomic coordinates remains a major challenge in early stages of drug discovery. Using modular message passing graph neural networks describing both the ligand and the protein in their free and bound states, we unambiguously evidence that an explicit description of protein-ligand noncovalent interactions does not provide any advantage with respect to ligand or protein descriptors. Simple models, inferring binding affinities of test samples from that of the closest ligands or proteins in the training set, already exhibit good performances, suggesting that memorization largely dominates true learning in the deep neural networks. The current study suggests considering only noncovalent interactions while omitting their protein and ligand atomic environments. Removing all hidden biases probably requires much denser protein-ligand training matrices and a coordinated effort of the drug design community to solve the necessary protein-ligand structures.
Affiliation(s)
- Mikhail Volkov
- Laboratoire d'innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, 74 route du Rhin, Illkirch 67400, France
| | - Didier Rognan
- Laboratoire d'innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, 74 route du Rhin, Illkirch 67400, France
| |
|
28
|
Yang C, Zhang Y. Delta Machine Learning to Improve Scoring-Ranking-Screening Performances of Protein-Ligand Scoring Functions. J Chem Inf Model 2022; 62:2696-2712. [PMID: 35579568 DOI: 10.1021/acs.jcim.2c00485] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Indexed: 01/01/2023]
Abstract
Protein-ligand scoring functions are widely used in structure-based drug design for fast evaluation of protein-ligand interactions, and it is of strong interest to develop scoring functions with machine-learning approaches. In this work, by expanding the training set, developing physically meaningful features, employing our recently developed linear empirical scoring function Lin_F9 (Yang, C. J. Chem. Inf. Model. 2021, 61, 4630-4644) as the baseline, and applying extreme gradient boosting (XGBoost) with Δ-machine learning, we have further improved the robustness and applicability of machine-learning scoring functions. Besides the top performances for scoring-ranking-screening power tests of the CASF-2016 benchmark, the new scoring function ΔLin_F9XGB also achieves superior scoring and ranking performances in different structure types that mimic real docking applications. The scoring powers of ΔLin_F9XGB for locally optimized poses, flexible redocked poses, and ensemble docked poses of the CASF-2016 core set achieve Pearson's correlation coefficient (R) values of 0.853, 0.839, and 0.813, respectively. In addition, the large-scale docking-based virtual screening test on the LIT-PCBA data set demonstrates the reliability and robustness of ΔLin_F9XGB in virtual screening application. The ΔLin_F9XGB scoring function and its code are freely available on the web at (https://yzhang.hpc.nyu.edu/Delta_LinF9_XGB).
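The Δ-machine learning idea in this abstract (a fixed baseline score plus a learned correction fitted to the baseline's residuals) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `baseline` stands in for Lin_F9, a one-variable least-squares fit stands in for XGBoost, and the two-element feature tuples are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_delta_model(baseline, features, targets):
    """Fit a correction to the baseline's residuals; return the combined predictor.

    `baseline` is a hypothetical stand-in for an empirical scoring function.
    The correction is a simple linear fit on the second feature, used here
    in place of the gradient-boosted model named in the abstract.
    """
    residuals = [t - baseline(f) for f, t in zip(features, targets)]
    slope, intercept = fit_line([f[1] for f in features], residuals)

    def predict(f):
        # Final score = baseline score + learned correction (the "delta").
        return baseline(f) + slope * f[1] + intercept

    return predict


# Toy data: the baseline only sees feature 0; the residual depends on feature 1.
baseline = lambda f: 2.0 * f[0]
features = [(1.0, 0.0), (2.0, 1.0), (3.0, 2.0), (4.0, 4.0)]
targets = [2.0 * a + 3.0 * b + 1.0 for a, b in features]
delta_model = fit_delta_model(baseline, features, targets)
```

The point of the design is that the learned part only has to model what the baseline misses, so the combined score keeps the baseline's behaviour wherever the correction is small.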
Affiliation(s)
- Chao Yang
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
29
|
Pan X, Wang H, Zhang Y, Wang X, Li C, Ji C, Zhang JZH. AA-Score: a New Scoring Function Based on Amino Acid-Specific Interaction for Molecular Docking. J Chem Inf Model 2022; 62:2499-2509. [PMID: 35452230 DOI: 10.1021/acs.jcim.1c01537] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/18/2022]
Abstract
The protein-ligand scoring function plays an important role in computer-aided drug discovery and is heavily used in virtual screening and lead optimization. In this study, we developed a new empirical protein-ligand scoring function with amino acid-specific interaction components for hydrogen bond, van der Waals, and electrostatic interactions. In addition, hydrophobic, π-stacking, π-cation, and metal-ligand interactions are also included in the new scoring function. To better evaluate the performance of the AA-Score, we generated several new test sets for evaluation of scoring, ranking, and docking performances, respectively. Extensive tests show that AA-Score performs well on scoring, docking, and ranking as compared to other widely used traditional scoring functions. The performance improvement of AA-Score benefits from the decomposition of individual interaction into amino acid-specific types. To facilitate applications, we developed an easy-to-use tool to analyze protein-ligand interaction fingerprint and predict binding affinity using the AA-Score. The source code and associated running examples can be found at https://github.com/xundrug/AA-Score-Tool.
Affiliation(s)
- Xiaolin Pan
- Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - Hao Wang
- Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - Yueqing Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - Xingyu Wang
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - Cuiyu Li
- Advanced Computing East China Sub-center, Suma Technology Co., Ltd., Kunshan 215300, China
| | - Changge Ji
- Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| | - John Z H Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Department of Chemistry, New York University, New York 10003, United States
- Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan Shanxi 030006, China
| |
|
30
|
Konc J, Lešnik S, Škrlj B, Sova M, Proj M, Knez D, Gobec S, Janežič D. ProBiS-Dock: A Hybrid Multitemplate Homology Flexible Docking Algorithm Enabled by Protein Binding Site Comparison. J Chem Inf Model 2022; 62:1573-1584. [PMID: 35289616 DOI: 10.1021/acs.jcim.1c01176] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
Abstract
The protein data bank (PDB) is a rich source of protein ligand structures, but ligands are not explicitly used in current docking algorithms. We have developed ProBiS-Dock, a docking algorithm complementary to the ProBiS-Dock Database (J. Chem. Inf. Model. 2021, 61, 4097-4107) that treats small molecules and proteins as fully flexible entities and allows conformational changes in both after ligand binding. A new scoring function is described that consists of a binding site-specific scoring function (ProBiS-Score) and a general statistical scoring function. ProBiS-Dock enables rapid docking of small molecules to proteins and has been successfully validated in silico against standard benchmarks. It enables rapid search for new active ligands by leveraging existing knowledge in the PDB. The potential of the software for drug development has been confirmed in vitro by the discovery of new inhibitors of human indoleamine 2,3-dioxygenase 1, an enzyme that is an attractive target for cancer therapy and catalyzes the first rate-determining step of l-tryptophan metabolism via the kynurenine pathway. The software is freely available to academic users at http://insilab.org/probisdock.
Affiliation(s)
- Janez Konc
- National Institute of Chemistry, Theory Department, Hajdrihova 19, SI-1001 Ljubljana, Slovenia
| | - Samo Lešnik
- National Institute of Chemistry, Theory Department, Hajdrihova 19, SI-1001 Ljubljana, Slovenia
| | - Blaž Škrlj
- National Institute of Chemistry, Theory Department, Hajdrihova 19, SI-1001 Ljubljana, Slovenia
- Jozef Stefan International Postgraduate School, Jamova cesta 39, SI-1000 Ljubljana, Slovenia
- Jozef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia
| | - Matej Sova
- Faculty of Pharmacy, The Chair of Pharmaceutical Chemistry, Aškerčeva cesta 7, SI-1000 Ljubljana, Slovenia
| | - Matic Proj
- Faculty of Pharmacy, The Chair of Pharmaceutical Chemistry, Aškerčeva cesta 7, SI-1000 Ljubljana, Slovenia
| | - Damijan Knez
- Faculty of Pharmacy, The Chair of Pharmaceutical Chemistry, Aškerčeva cesta 7, SI-1000 Ljubljana, Slovenia
| | - Stanislav Gobec
- Faculty of Pharmacy, The Chair of Pharmaceutical Chemistry, Aškerčeva cesta 7, SI-1000 Ljubljana, Slovenia
| | - Dušanka Janežič
- Faculty of Mathematics, Natural Sciences and Information Technologies, Glagoljaška ulica 8, SI-6000 Koper, Slovenia
| |
|
31
|
Spiegel J, Senderowitz H. A Comparison between Enrichment Optimization Algorithm (EOA)-Based and Docking-Based Virtual Screening. Int J Mol Sci 2021; 23:43. [PMID: 35008467 PMCID: PMC8744642 DOI: 10.3390/ijms23010043] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/24/2021] [Revised: 12/18/2021] [Accepted: 12/19/2021] [Indexed: 12/30/2022] Open
Abstract
Virtual screening (VS) is a well-established method in the initial stages of many drug and material design projects. VS is typically performed using structure-based approaches such as molecular docking, or various ligand-based approaches. Most docking tools were designed to be as global as possible, and consequently only require knowledge on the 3D structure of the biotarget. In contrast, many ligand-based approaches (e.g., 3D-QSAR and pharmacophore) require prior development of project-specific predictive models. Depending on the type of model (e.g., classification or regression), predictive ability is typically evaluated using metrics of performance on either the training set (e.g., Q²_CV) or the test set (e.g., specificity, selectivity or Q²_F1/F2/F3). However, none of these metrics were developed with VS in mind, and consequently, their ability to reliably assess the performances of a model in the context of VS is at best limited. With this in mind we have recently reported the development of the enrichment optimization algorithm (EOA). EOA derives QSAR models in the form of multiple linear regression (MLR) equations for VS by optimizing an enrichment-based metric in the space of the descriptors. Here we present an improved version of the algorithm which better handles active compounds and which also takes into account information on inactive (either known inactive or decoy) compounds. We compared the improved EOA in small-scale VS experiments with three common docking tools, namely, Glide-SP, GOLD and AutoDock Vina, employing five molecular targets (acetylcholinesterase, human immunodeficiency virus type 1 protease, MAP kinase p38 alpha, urokinase-type plasminogen activator, and trypsin I). We found that EOA consistently outperformed all docking tools in terms of the area under the ROC curve (AUC) and EF1% metrics that measured the overall and initial success of the VS process, respectively. This was the case when the docking metrics were calculated based on a consensus approach and when they were calculated based on two different sets of single crystal structures. Finally, we propose that EOA could be combined with molecular docking to derive target-specific scoring functions.
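The two virtual-screening metrics named in this abstract, EF1% and the area under the ROC curve, can be computed from a ranked list of scores and activity labels. The sketch below is illustrative only and assumes that a higher score means "predicted more active" and that labels are 1 for actives and 0 for inactives or decoys; the function names are hypothetical and not taken from the paper.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor in the top `fraction` of the score-ranked list.

    EF = (active rate in the top slice) / (active rate in the whole set).
    """
    n = len(scores)
    n_top = max(1, int(round(n * fraction)))
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (total_actives / n)


def roc_auc(scores, labels):
    """ROC AUC as the probability that a random active outscores a random inactive.

    Ties count as half a win (Mann-Whitney formulation).
    """
    actives = [s for s, l in zip(scores, labels) if l == 1]
    inactives = [s for s, l in zip(scores, labels) if l == 0]
    if not actives or not inactives:
        raise ValueError("both actives and inactives are required")
    wins = sum(
        1.0 if a > d else 0.5 if a == d else 0.0
        for a in actives
        for d in inactives
    )
    return wins / (len(actives) * len(inactives))


# Toy screen: 100 compounds, the 5 best-scored ones are the actives.
toy_scores = [100.0 - i for i in range(100)]
toy_labels = [1 if i < 5 else 0 for i in range(100)]
toy_ef1 = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
toy_auc = roc_auc(toy_scores, toy_labels)
```

EF1% rewards early recognition (only the top slice matters), while AUC summarises the whole ranking, which is why the abstract describes them as measuring initial and overall success respectively.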
Affiliation(s)
| | - Hanoch Senderowitz
- Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel
| |
|
32
|
Dong L, Qu X, Zhao Y, Wang B. Prediction of Binding Free Energy of Protein-Ligand Complexes with a Hybrid Molecular Mechanics/Generalized Born Surface Area and Machine Learning Method. ACS OMEGA 2021; 6:32938-32947. [PMID: 34901645 PMCID: PMC8655939 DOI: 10.1021/acsomega.1c04996] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 09/09/2021] [Accepted: 11/10/2021] [Indexed: 06/14/2023]
Abstract
Accurate prediction of protein-ligand binding free energies is important in enzyme engineering and drug discovery. The molecular mechanics/generalized Born surface area (MM/GBSA) approach is widely used to estimate ligand-binding affinities, but its performance heavily relies on the accuracy of its energy components. A hybrid strategy combining MM/GBSA and machine learning (ML) has been developed to predict the binding free energies of protein-ligand systems. Based on the MM/GBSA energy terms and several features associated with protein-ligand interactions, our ML-based scoring function, GXLE, shows much better performance than MM/GBSA without entropy. In particular, the good transferability of the GXLE model is highlighted by its good performance in ranking power for prediction of the binding affinity of different ligands for either the docked structures or crystal structures. The GXLE scoring function and its code are freely available and can be used to correct the binding free energies computed by MM/GBSA.
Affiliation(s)
- Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
| | - Xiaoyang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
| | - Yuan Zhao
- The Key Laboratory of Natural Medicine and Immuno-Engineering, Henan University, Kaifeng 475004, P. R. China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
| |
|
33
|
Hall-Swan S, Devaurs D, Rigo MM, Antunes DA, Kavraki LE, Zanatta G. DINC-COVID: A webserver for ensemble docking with flexible SARS-CoV-2 proteins. Comput Biol Med 2021; 139:104943. [PMID: 34717233 PMCID: PMC8518241 DOI: 10.1016/j.compbiomed.2021.104943] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/26/2021] [Revised: 09/27/2021] [Accepted: 10/11/2021] [Indexed: 12/16/2022]
Abstract
An unprecedented research effort has been undertaken in response to the ongoing COVID-19 pandemic. This has included the determination of hundreds of crystallographic structures of SARS-CoV-2 proteins, and numerous virtual screening projects searching large compound libraries for potential drug inhibitors. Unfortunately, these initiatives have had very limited success in producing effective inhibitors against SARS-CoV-2 proteins. A reason might be an often overlooked factor in these computational efforts: receptor flexibility. To address this issue we have implemented a computational tool for ensemble docking with SARS-CoV-2 proteins. We have extracted representative ensembles of protein conformations from the Protein Data Bank and from in silico molecular dynamics simulations. Twelve pre-computed ensembles of SARS-CoV-2 protein conformations have now been made available for ensemble docking via a user-friendly webserver called DINC-COVID (dinc-covid.kavrakilab.org). We have validated DINC-COVID using data on tested inhibitors of two SARS-CoV-2 proteins, obtaining good correlations between docking-derived binding energies and experimentally-determined binding affinities. Some of the best results have been obtained on a dataset of large ligands resolved via room temperature crystallography, and therefore capturing alternative receptor conformations. In addition, we have shown that the ensembles available in DINC-COVID capture different ranges of receptor flexibility, and that this diversity is useful in finding alternative binding modes of ligands. Overall, our work highlights the importance of accounting for receptor flexibility in docking studies, and provides a platform for the identification of new inhibitors against SARS-CoV-2 proteins.
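Ensemble docking, as described in this abstract, docks each ligand against several receptor conformations rather than one rigid structure. A minimal sketch of the bookkeeping follows; it is not the DINC-COVID implementation. The `dock` callable is a hypothetical placeholder for any docking engine that returns a binding energy (lower is better), and keeping the best-scoring conformation per ligand is one common aggregation choice assumed here.

```python
def ensemble_dock(ligands, conformations, dock):
    """Dock every ligand against every receptor conformation.

    `dock(ligand, conformation)` is a hypothetical callable returning a
    binding energy (lower = better). Returns, for each ligand, the best
    energy found and the conformation that produced it.
    """
    results = {}
    for ligand in ligands:
        scored = [(dock(ligand, conf), conf) for conf in conformations]
        best_energy, best_conf = min(scored, key=lambda pair: pair[0])
        results[ligand] = (best_energy, best_conf)
    return results


# Toy energies standing in for real docking runs (illustrative values only).
toy_energies = {
    ("ligA", "conf1"): -6.0,
    ("ligA", "conf2"): -8.5,
    ("ligB", "conf1"): -7.2,
    ("ligB", "conf2"): -5.1,
}
toy_results = ensemble_dock(
    ["ligA", "ligB"],
    ["conf1", "conf2"],
    lambda ligand, conf: toy_energies[(ligand, conf)],
)
```

In the toy data the two ligands prefer different conformations, which is the situation a single rigid receptor would miss.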
Affiliation(s)
- Sarah Hall-Swan
- Department of Computer Science, Rice University, Houston, 77005, Texas, United States
| | - Didier Devaurs
- MRC Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
| | - Mauricio M. Rigo
- Department of Computer Science, Rice University, Houston, 77005, Texas, United States
| | - Dinler A. Antunes
- Department of Computer Science, Rice University, Houston, 77005, Texas, United States
- Department of Biology and Biochemistry, University of Houston, Houston, 77005, Texas, United States
- Corresponding author. Department of Computer Science, Rice University, Houston, 77005, Texas, United States
| | - Lydia E. Kavraki
- Department of Computer Science, Rice University, Houston, 77005, Texas, United States (corresponding author)
| | - Geancarlo Zanatta
- Department of Physics, Federal University of Ceará, Fortaleza, CE, Brazil (corresponding author)
| |
|
34
|
Xu Z, Wauchope OR, Frank AT. Navigating Chemical Space by Interfacing Generative Artificial Intelligence and Molecular Docking. J Chem Inf Model 2021; 61:5589-5600. [PMID: 34633194 DOI: 10.1021/acs.jcim.1c00746] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/29/2022]
Abstract
Here, we report the implementation and application of a simple, structure-aware framework to generate target-specific screening libraries. Our approach combines advances in generative artificial intelligence (AI) with conventional molecular docking to explore chemical space conditioned on the unique physicochemical properties of the active site of a biomolecular target. As a demonstration, we used our framework, which we refer to as sample-and-dock, to construct focused libraries for cyclin-dependent kinase type-2 (CDK2) and the active site of the main protease (Mpro) of the SARS-CoV-2 virus. We envision that the sample-and-dock framework could be used to generate theoretical maps of the chemical space specific to a given target and so provide information about its molecular recognition characteristics.
Affiliation(s)
- Ziqiao Xu
- Chemistry Department, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
| | - Orrette R Wauchope
- Department of Natural Sciences, City University of New York, Baruch College, New York, New York 10010, United States
| | - Aaron T Frank
- Biophysics Program, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States
| |
|
35
|
Gupta A, Zhou HX. Machine Learning-Enabled Pipeline for Large-Scale Virtual Drug Screening. J Chem Inf Model 2021; 61:4236-4244. [PMID: 34399578 DOI: 10.1021/acs.jcim.1c00710] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 01/15/2023]
Abstract
Virtual screening is receiving renewed attention in drug discovery, but progress is hampered by challenges on two fronts: handling the ever-increasing sizes of libraries of drug-like compounds and separating true positives from false positives. Here, we developed a machine learning-enabled pipeline for large-scale virtual screening that promises breakthroughs on both fronts. By clustering compounds according to molecular properties and limited docking against a drug target, the full library was trimmed by 10-fold; the remaining compounds were then screened individually by docking; and finally, a dense neural network was trained to classify the hits into true and false positives. As illustration, we screened for inhibitors against RPN11, the deubiquitinase subunit of the proteasome, and a drug target for breast cancer.
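The library-trimming step in this abstract (cluster the compounds, dock only a limited number, keep roughly a tenth of the library) can be sketched as follows. The abstract does not give the selection rule, so this sketch assumes one: dock a single representative per cluster and keep the clusters whose representatives score best. The `dock` callable, the cluster dictionary, and `keep_fraction` are all hypothetical.

```python
def trim_library(clusters, dock, keep_fraction=0.1):
    """Keep the members of the best-scoring clusters.

    `clusters` maps a cluster id to its list of compounds. Only the first
    member of each cluster is docked (the "limited docking" step, as assumed
    here); `dock(compound)` returns an energy where lower is better.
    """
    rep_scores = {
        cid: dock(members[0]) for cid, members in clusters.items() if members
    }
    n_keep = max(1, int(len(rep_scores) * keep_fraction))
    best_clusters = sorted(rep_scores, key=rep_scores.get)[:n_keep]
    # The surviving compounds would then be docked individually.
    return [compound for cid in best_clusters for compound in clusters[cid]]


# Toy library: 10 clusters of 2 compounds; the toy score is the cluster index.
toy_clusters = {i: [f"c{i}_0", f"c{i}_1"] for i in range(10)}
toy_kept = trim_library(toy_clusters, lambda name: float(name[1]))
```

With ten equally sized clusters and `keep_fraction=0.1`, one cluster survives, which mirrors the 10-fold reduction mentioned in the abstract; the later classification of hits into true and false positives is not covered by this sketch.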
Affiliation(s)
- Aayush Gupta
- Department of Chemistry, University of Illinois at Chicago, Chicago, Illinois 60607, United States
| | - Huan-Xiang Zhou
- Department of Chemistry, University of Illinois at Chicago, Chicago, Illinois 60607, United States
- Department of Physics, University of Illinois at Chicago, Chicago, Illinois 60607, United States
| |
|