1
Pushkaran AC, Arabi AA. Accurate prediction of DNA-Intercalator binding energies: Ensemble of short or long molecular dynamics simulations? Int J Biol Macromol 2025; 306:141408. [PMID: 39993670 DOI: 10.1016/j.ijbiomac.2025.141408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2024] [Revised: 01/29/2025] [Accepted: 02/21/2025] [Indexed: 02/26/2025]
Abstract
Despite the wide use of molecular dynamics (MD) simulations for binding energy predictions in biomolecular systems, results from single MD simulations are non-reproducible and often deviate from experimental values, even when longer simulations are used. This study addresses these limitations using ensemble MD simulations for the formation of DNA-intercalator complexes. Twenty-five replicas of short (10 ns) and long (100 ns) MD simulations were performed on different intercalators binding into DNA. The MM/PBSA and MM/GBSA binding energies of the Doxorubicin intercalating into DNA, including entropy and deformation energy corrections, are -7.3 ± 2.0 kcal/mol and -8.9 ± 1.6 kcal/mol, using 25 replicas of 100 ns. These values were closely reproduced even with shorter simulations of 10 ns, where the energies, averaged over 25 replicas, are -7.6 ± 2.4 kcal/mol (MM/PBSA) and -8.3 ± 2.9 kcal/mol (MM/GBSA). In both cases, the energies align well with the experimental range of -7.7 ± 0.3 to -9.9 ± 0.1 kcal/mol. This shows that reproducibility and accuracy of the binding energies depend more on the number of replicas than on the simulation length. The study was repeated for the DNA-Proflavine system, where the corrected MM/PBSA and MM/GBSA binding energies, averaged over 25 replicas of 10 ns each, are -5.6 ± 1.4 and -5.3 ± 2.3 kcal/mol, respectively. These are congruent with the experimental range of -5.9 to -7.1 kcal/mol. Bootstrap analyses revealed that 6 replicas of 100 ns or 8 replicas of 10 ns provide a good balance between computational efficiency and accuracy within 1.0 kcal/mol from experimental values.
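The bootstrap analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes one binding energy per replica, resamples k replicas with replacement, and reports how the spread of the k-replica mean shrinks as k grows. The function name `bootstrap_mean_spread`, the resample count, and the synthetic energies are illustrative assumptions and are not taken from the paper.

```python
import random
import statistics

def bootstrap_mean_spread(values, k, n_boot=2000, seed=0):
    """Return (mean, std) of the mean of k replicas resampled with replacement."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(values, k=k)) for _ in range(n_boot)]
    return statistics.fmean(means), statistics.stdev(means)

# Hypothetical per-replica binding energies in kcal/mol (synthetic, NOT data from the paper).
_rng = random.Random(42)
replica_energies = [_rng.gauss(-7.5, 2.0) for _ in range(25)]

if __name__ == "__main__":
    for k in (2, 4, 8, 16, 25):
        mean, spread = bootstrap_mean_spread(replica_energies, k)
        print(f"k={k:2d}  mean={mean:6.2f}  spread={spread:4.2f} kcal/mol")
```

Under these assumptions, the smallest k whose spread falls below a chosen tolerance (for example 1.0 kcal/mol) gives a rough estimate of how many replicas are worth running.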
Affiliation(s)
- Anju Choorakottayil Pushkaran
  - Department of Biochemistry and Molecular Biology, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, P.O. Box: 15551, United Arab Emirates
- Alya A Arabi
  - Department of Biochemistry and Molecular Biology, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, P.O. Box: 15551, United Arab Emirates
2
Yue Y, Cheng Y, Marquet C, Xiao C, Guo J, Li S, He S. Meta-Learning Enables Complex Cluster-Specific Few-Shot Binding Affinity Prediction for Protein-Protein Interactions. J Chem Inf Model 2025; 65:580-588. [PMID: 39772708 DOI: 10.1021/acs.jcim.4c01607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2025]
Abstract
Predicting protein-protein interaction (PPI) binding affinities in unseen protein complex clusters is essential for elucidating complex protein interactions and for the targeted screening of peptide- or protein-based drugs. We introduce MCGLPPI++, a meta-learning framework designed to improve the adaptability of pretrained geometric models in such scenarios. To effectively boost the meta-learning optimization by injecting prior intersample distribution knowledge, three specially designed training sample cluster splitting patterns based on protein interaction interfaces are introduced. Additionally, MCGLPPI++ is equipped with an independent energy component which explicitly models interface nonbonded interaction energies closely related to the strengths of PPIs. To validate our approach, we curate a new data set featuring a challenging test cluster of T-cell receptors binding to antigenic peptide-MHC molecules (TCR-pMHC). Experimental results show that geometric models enhanced by the MCGLPPI++ framework achieve significantly more robust binding affinity predictions after fine-tuning on a few samples from this novel cluster compared to their vanilla counterparts, which demonstrates the effectiveness of the framework.
Affiliation(s)
- Yang Yue
  - School of Computer Science, The University of Birmingham, Edgbaston, Birmingham B15 2TT, U.K.
- Yihua Cheng
  - School of Computer Science, The University of Birmingham, Edgbaston, Birmingham B15 2TT, U.K.
- Céline Marquet
  - Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching 85748, Munich, Germany
- Chenguang Xiao
  - School of Computer Science, The University of Birmingham, Edgbaston, Birmingham B15 2TT, U.K.
- Jingjing Guo
  - Centre of Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao SAR 999078, China
- Shu Li
  - Centre of Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao SAR 999078, China
- Shan He
  - School of Computer Science, The University of Birmingham, Edgbaston, Birmingham B15 2TT, U.K.
3
Bhati A, Wan S, Coveney PV. Equilibrium and Nonequilibrium Ensemble Methods for Accurate, Precise and Reproducible Absolute Binding Free Energy Calculations. J Chem Theory Comput 2025; 21:440-462. [PMID: 39680850 PMCID: PMC11736689 DOI: 10.1021/acs.jctc.4c01389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Revised: 11/28/2024] [Accepted: 12/02/2024] [Indexed: 12/18/2024]
Abstract
Free energy calculations for protein-ligand complexes have become widespread in recent years owing to several conceptual, methodological and technological advances. Central among these is the use of ensemble methods which permits accurate, precise and reproducible predictions and is necessary for uncertainty quantification. Absolute binding free energies (ABFEs) are challenging to predict using alchemical methods and their routine application in drug discovery has remained out of reach until now. Here, we apply ensemble alchemical ABFE methods to a large data set comprising 219 ligand-protein complexes and obtain statistically robust results with high accuracy (<1 kcal/mol). We compare equilibrium and nonequilibrium methods for ABFE predictions at large scale and provide a systematic critical assessment of each method. The equilibrium method is more accurate, precise, faster, computationally more cost-effective and requires a much simpler protocol, making it preferable for large scale and blind applications. We find that the calculated free energy distributions are non-normal and discuss the consequences. We recommend a definitive protocol to perform ABFE calculations optimally. Using this protocol, it is possible to perform thousands of ABFE calculations within a few hours on modern exascale machines.
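The abstract's remark that calculated free energy distributions are non-normal suggests a simple diagnostic: the sample skewness and excess kurtosis of an ensemble of estimates, both of which are zero for a normal distribution. The sketch below uses population moments. The helper names and the example values are hypothetical and do not reproduce the authors' analysis.

```python
import statistics

def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def excess_kurtosis(values):
    """Population kurtosis minus 3, so a normal distribution gives about 0."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 4 for v in values) / len(values) - 3.0

# Hypothetical ensemble of free energy estimates in kcal/mol (illustrative only).
ensemble = [-9.8, -9.5, -9.4, -9.3, -9.1, -9.0, -8.9, -8.2, -7.1, -5.6]

if __name__ == "__main__":
    print(f"skewness        = {skewness(ensemble):.2f}")
    print(f"excess kurtosis = {excess_kurtosis(ensemble):.2f}")
```

Values far from zero would indicate that error bars built on a normal assumption may mislead, in which case resampling-based intervals are one common alternative.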
Affiliation(s)
- Agastya P. Bhati
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, United Kingdom
- Shunzhou Wan
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, United Kingdom
- Peter V. Coveney
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, United Kingdom
  - Computational Science Laboratory, Institute for Informatics, Faculty of Science, University of Amsterdam, Amsterdam 1012, The Netherlands
  - Advanced Research Computing Centre, University College London, London WC1H 9BT, United Kingdom
4
Yue Y, Li S, Cheng Y, Wang L, Hou T, Zhu Z, He S. Integration of molecular coarse-grained model into geometric representation learning framework for protein-protein complex property prediction. Nat Commun 2024; 15:9629. [PMID: 39511202 PMCID: PMC11544137 DOI: 10.1038/s41467-024-53583-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 10/16/2024] [Indexed: 11/15/2024] Open
Abstract
Structure-based machine learning algorithms have been utilized to predict the properties of protein-protein interaction (PPI) complexes, such as binding affinity, which is critical for understanding biological mechanisms and disease treatments. While most existing algorithms represent PPI complex graph structures at the atom-scale or residue-scale, these representations can be computationally expensive or may not sufficiently integrate finer chemical-plausible interaction details for improving predictions. Here, we introduce MCGLPPI, a geometric representation learning framework that combines graph neural networks (GNNs) with MARTINI molecular coarse-grained (CG) models to predict PPI overall properties accurately and efficiently. Extensive experiments on three types of downstream PPI property prediction tasks demonstrate that at the CG-scale, MCGLPPI achieves competitive performance compared with the counterparts at the atom- and residue-scale, but with only a third of computational resource consumption. Furthermore, CG-scale pre-training on protein domain-domain interaction structures enhances its predictive capabilities for PPI tasks. MCGLPPI offers an effective and efficient solution for PPI overall property predictions, serving as a promising tool for the large-scale analysis of biomolecular interactions.
Affiliation(s)
- Yang Yue
  - School of Computer Science, The University of Birmingham, Edgbaston, Birmingham, UK
- Shu Li
  - Macao Polytechnic University, Macao, China
- Yihua Cheng
  - School of Computer Science, The University of Birmingham, Edgbaston, Birmingham, UK
- Lie Wang
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, Institute of Immunology, Zhejiang University School of Medicine, Hangzhou, China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Zexuan Zhu
  - National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
- Shan He
  - School of Computer Science, The University of Birmingham, Edgbaston, Birmingham, UK
  - Macao Polytechnic University, Macao, China
5
Loeffler HH, Wan S, Klähn M, Bhati AP, Coveney PV. Optimal Molecular Design: Generative Active Learning Combining REINVENT with Precise Binding Free Energy Ranking Simulations. J Chem Theory Comput 2024; 20. [PMID: 39225482 PMCID: PMC11428133 DOI: 10.1021/acs.jctc.4c00576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 08/08/2024] [Accepted: 08/08/2024] [Indexed: 09/04/2024]
Abstract
Active learning (AL) is a specific instance of sequential experimental design and uses machine learning to intelligently choose the next data point or batch of molecular structures to be evaluated. In this sense, it closely mimics the iterative design-make-test-analysis cycle of laboratory experiments to find optimized compounds for a given design task. Here, we describe an AL protocol which combines generative molecular AI, using REINVENT, and physics-based absolute binding free energy molecular dynamics simulation, using ESMACS, to discover new ligands for two different target proteins, 3CLpro and TNKS2. We have deployed our generative active learning (GAL) protocol on Frontier, the world's only exa-scale machine. We show that the protocol can find higher-scoring molecules compared to the baseline, a surrogate ML docking model for 3CLpro and compounds with experimentally determined binding affinities for TNKS2. The ligands found are also chemically diverse and occupy a different chemical space than the baseline. We vary the batch sizes that are put forward for free energy assessment in each GAL cycle to assess the impact on their efficiency on the GAL protocol and recommend their optimal values in different scenarios. Overall, we demonstrate a powerful capability of the combination of physics-based and AI methods which yields effective chemical space sampling at an unprecedented scale and is of immediate and direct relevance to modern, data-driven drug discovery.
Affiliation(s)
- Hannes H. Loeffler
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Mölndal 431 83, Sweden
- Shunzhou Wan
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Marco Klähn
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Mölndal 431 83, Sweden
- Agastya P. Bhati
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Peter V. Coveney
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
  - Advanced Research Computing Centre, University College London, London WC1H 0AJ, U.K.
  - Institute for Informatics, Faculty of Science, University of Amsterdam, Amsterdam 1098XH, The Netherlands
6
Wan S, Bhati AP, Wade AD, Coveney PV. Ensemble-Based Approaches Ensure Reliability and Reproducibility. J Chem Inf Model 2023; 63:6959-6963. [PMID: 37965695 PMCID: PMC10685440 DOI: 10.1021/acs.jcim.3c01654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Indexed: 11/16/2023]
Abstract
It is increasingly widely recognized that ensemble-based approaches are required to achieve reliability, accuracy, and precision in molecular dynamics calculations. The purpose of the present article is to address a frequently raised question: what is the optimal way to perform ensemble simulation to calculate quantities of interest?
Affiliation(s)
- Shunzhou Wan
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Agastya P. Bhati
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Alexander D. Wade
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Peter V. Coveney
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
  - Advanced Research Computing Centre, University College London, London WC1H 0AJ, U.K.
  - Institute for Informatics, Faculty of Science, University of Amsterdam, 1098XH Amsterdam, The Netherlands
7
Wan S, Bhati AP, Coveney PV. Comparison of Equilibrium and Nonequilibrium Approaches for Relative Binding Free Energy Predictions. J Chem Theory Comput 2023; 19:7846-7860. [PMID: 37862058 PMCID: PMC10653111 DOI: 10.1021/acs.jctc.3c00842] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/02/2023] [Indexed: 10/21/2023]
Abstract
Alchemical relative binding free energy calculations have recently found important applications in drug optimization. A series of congeneric compounds are generated from a preidentified lead compound, and their relative binding affinities to a protein are assessed in order to optimize candidate drugs. While methods based on equilibrium thermodynamics have been extensively studied, an approach based on nonequilibrium methods has recently been reported together with claims of its superiority. However, these claims pay insufficient attention to the basis and reliability of both methods. Here we report a comparative study of the two approaches across a large data set, comprising more than 500 ligand transformations spanning in excess of 300 ligands binding to a set of 14 diverse protein targets. Ensemble methods are essential to quantify the uncertainty in these calculations, not only for the reasons already established in the equilibrium approach but also to ensure that the nonequilibrium calculations reside within their domain of validity. If and only if ensemble methods are applied, we find that the nonequilibrium method can achieve accuracy and precision comparable to those of the equilibrium approach. Compared to the equilibrium method, the nonequilibrium approach can reduce computational costs but introduces higher computational complexity and longer wall clock times. There are, however, cases where the standard length of a nonequilibrium transition is not sufficient, necessitating a complete rerun of the entire set of transitions. This significantly increases the computational cost and proves to be highly inconvenient during large-scale applications. Our findings provide a key set of recommendations that should be adopted for the reliable implementation of nonequilibrium approaches to relative binding free energy calculations in ligand-protein systems.
Affiliation(s)
- Shunzhou Wan
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Agastya P. Bhati
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Peter V. Coveney
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
  - Advanced Research Computing Centre, University College London, London WC1H 0AJ, U.K.
  - Computational Science Laboratory, Institute for Informatics, Faculty of Science, University of Amsterdam, Amsterdam 1012 WP, Netherlands
8
The performance of ensemble-based free energy protocols in computing binding affinities to ROS1 kinase. Sci Rep 2022; 12:10433. [PMID: 35729177 PMCID: PMC9211793 DOI: 10.1038/s41598-022-13319-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/16/2022] [Accepted: 05/23/2022] [Indexed: 11/08/2022] Open
Abstract
Optimization of binding affinities for compounds to their target protein is a primary objective in drug discovery. Herein we report on a collaborative study that evaluates a set of compounds binding to ROS1 kinase. We use ESMACS (enhanced sampling of molecular dynamics with approximation of continuum solvent) and TIES (thermodynamic integration with enhanced sampling) protocols to rank the binding free energies. The predicted binding free energies from ESMACS simulations show good correlations with experimental data for subsets of the compounds. Consistent binding free energy differences are generated for TIES and ESMACS. Although an unexplained overestimation exists, we obtain excellent statistical rankings across the set of compounds from the TIES protocol, with a Pearson correlation coefficient of 0.90 between calculated and experimental activities.
9
Wade A, Bhati AP, Wan S, Coveney PV. Alchemical Free Energy Estimators and Molecular Dynamics Engines: Accuracy, Precision, and Reproducibility. J Chem Theory Comput 2022; 18:3972-3987. [PMID: 35609233 PMCID: PMC9202356 DOI: 10.1021/acs.jctc.2c00114] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/02/2022] [Indexed: 11/28/2022]
Abstract
The binding free energy between a ligand and its target protein is an essential quantity to know at all stages of the drug discovery pipeline. Assessing this value computationally can offer insight into where efforts should be focused in the pursuit of effective therapeutics to treat a myriad of diseases. In this work, we examine the computation of alchemical relative binding free energies with an eye for assessing reproducibility across popular molecular dynamics packages and free energy estimators. The focus of this work is on 54 ligand transformations from a diverse set of protein targets: MCL1, PTP1B, TYK2, CDK2, and thrombin. These targets are studied with three popular molecular dynamics packages: OpenMM, NAMD2, and NAMD3 alpha. Trajectories collected with these packages are used to compare relative binding free energies calculated with thermodynamic integration and free energy perturbation methods. The resulting binding free energies show good agreement between molecular dynamics packages with an average mean unsigned error between them of 0.50 kcal/mol. The correlation between packages is very good, with the lowest Spearman's, Pearson's and Kendall's tau correlation coefficients being 0.92, 0.91, and 0.76, respectively. Agreement between thermodynamic integration and free energy perturbation is shown to be very good when using ensemble averaging.
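The agreement statistics quoted in the abstract (mean unsigned error plus Pearson, Spearman, and Kendall's tau correlations) can be sketched in plain Python as follows. This is an illustrative implementation only: it uses the tau-a variant of Kendall's coefficient, and the two value lists are hypothetical placeholders standing in for results from two MD packages, not data from the study.

```python
import math

def mean_unsigned_error(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ranks(v):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

# Hypothetical relative binding free energies (kcal/mol) from two packages.
package_a = [-1.2, 0.4, 2.1, -0.3, 1.5, -2.0]
package_b = [-0.9, 0.1, 2.6, -0.6, 1.1, -1.7]

if __name__ == "__main__":
    print("MUE     ", mean_unsigned_error(package_a, package_b))
    print("Pearson ", pearson(package_a, package_b))
    print("Spearman", spearman(package_a, package_b))
    print("Kendall ", kendall_tau(package_a, package_b))
```

Reporting a rank-based coefficient alongside Pearson's is useful because ligand ranking, rather than absolute agreement, is often what matters in lead optimization.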
Affiliation(s)
- Alexander D. Wade
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, UK
- Agastya P. Bhati
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, UK
- Shunzhou Wan
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, UK
- Peter V. Coveney
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, UK
  - Informatics Institute, University of Amsterdam, Amsterdam 1098XH, The Netherlands
  - Advanced Research Computing Centre, University College London, London WC1H 0AJ, UK