1
Hahn DF, Gapsys V, de Groot BL, Mobley DL, Tresadern G. Current State of Open Source Force Fields in Protein-Ligand Binding Affinity Predictions. J Chem Inf Model 2024; 64:5063-5076. [PMID: 38895959 PMCID: PMC11234369 DOI: 10.1021/acs.jcim.4c00417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 04/23/2024] [Accepted: 04/25/2024] [Indexed: 06/21/2024]
Abstract
In drug discovery, the in silico prediction of binding affinity is one of the major means to prioritize compounds for synthesis. Alchemical relative binding free energy (RBFE) calculations based on molecular dynamics (MD) simulations are nowadays a popular approach for the accurate affinity ranking of compounds. MD simulations rely on empirical force field parameters, which strongly influence the accuracy of the predicted affinities. Here, we evaluate the ability of six different small-molecule force fields to predict experimental protein-ligand binding affinities in RBFE calculations on a set of 598 ligands and 22 protein targets. The public force fields OpenFF Parsley and Sage, GAFF, and CGenFF show comparable accuracy, while OPLS3e is significantly more accurate. However, a consensus approach using Sage, GAFF, and CGenFF leads to accuracy comparable to OPLS3e. While Parsley and Sage are performing comparably based on aggregated statistics across the whole dataset, there are differences in terms of outliers. Analysis of the force field reveals that improved parameters lead to significant improvement in the accuracy of affinity predictions on subsets of the dataset involving those parameters. Lower accuracy can not only be attributed to the force field parameters but is also dependent on input preparation and sampling convergence of the calculations. Especially large perturbations and nonconverged simulations lead to less accurate predictions. The input structures, Gromacs force field files, as well as the analysis Python notebooks are available on GitHub.
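The consensus idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the consensus is an unweighted arithmetic mean of the per-force-field relative binding free energies (ΔΔG) for each ligand pair, which the abstract does not specify. The function names are hypothetical and the numbers in the usage example are invented for illustration, not taken from the paper.

```python
import math


def consensus_ddg(predictions):
    """Average per-pair ddG predictions (kcal/mol) across force fields.

    predictions: dict mapping a force-field name to a list of ddG values,
    one per ligand pair, with every list in the same pair order.
    Assumption: the consensus is a plain unweighted mean.
    """
    columns = list(predictions.values())
    if not columns:
        raise ValueError("need at least one force field")
    n_pairs = len(columns[0])
    if any(len(col) != n_pairs for col in columns):
        raise ValueError("all force fields must cover the same ligand pairs")
    return [sum(col[i] for col in columns) / len(columns) for i in range(n_pairs)]


def rmse(predicted, experimental):
    """Root-mean-square error between predicted and experimental values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Invented example values (kcal/mol) for three ligand pairs.
    per_force_field = {
        "force_field_a": [0.5, -1.2, 2.0],
        "force_field_b": [0.9, -0.8, 1.4],
        "force_field_c": [0.1, -1.0, 1.8],
    }
    experiment = [0.4, -1.1, 1.5]
    consensus = consensus_ddg(per_force_field)
    print("consensus ddG:", consensus)
    print("consensus RMSE:", rmse(consensus, experiment))
```

Averaging in this way only reduces error when the individual force fields err in partly independent directions; a weighted scheme would be an equally plausible reading of "consensus".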
Affiliation(s)
- David F. Hahn
  - Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse 2340, Belgium
- Vytautas Gapsys
  - Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse 2340, Belgium
  - Computational Biomolecular Dynamics Group, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, Göttingen 37077, Germany
- Bert L. de Groot
  - Computational Biomolecular Dynamics Group, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, Göttingen 37077, Germany
- David L. Mobley
  - Department of Chemistry, University of California, Irvine, California 92697, United States
  - Department of Pharmaceutical Sciences, University of California, Irvine, California 92697, United States
- Gary Tresadern
  - Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse 2340, Belgium
2
Kurniawan J, Ishida T. Comparing Supervised Learning and Rigorous Approach for Predicting Protein Stability upon Point Mutations in Difficult Targets. J Chem Inf Model 2023; 63:6778-6788. [PMID: 37897811 DOI: 10.1021/acs.jcim.3c00750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/30/2023]
Abstract
Accurate prediction of protein stability upon a point mutation has important applications in drug discovery and personalized medicine. It remains a challenging issue in computational biology. Existing computational prediction methods, which range from mechanistic to supervised learning approaches, have experienced limited progress over the last few decades. This stagnation is largely due to their heavy reliance on both the quantity and quality of the training data. This is evident in recent state-of-the-art methods that continue to yield substantial errors on two challenging blind test sets: frataxin and p53, with average root-mean-square errors exceeding 3 and 1.5 kcal/mol, respectively, which is still above the theoretical 1 kcal/mol prediction barrier. Rigorous approaches, on the other hand, offer greater potential for accuracy without relying on training data but are computationally demanding and require both wild-type and mutant structure information. Although they showed high accuracy for conserving mutations, their performance is still limited for charge-changing mutation cases. This might be due to the lack of an available mutant structure, often represented by a simplified capped peptide. The recent advances in protein structure prediction methods now make it possible to obtain structures comparable to experimental ones, including complete mutant structure information. In this work, we compare the performance of supervised learning-based methods and rigorous approaches for predicting protein stability on point mutations in difficult targets: frataxin and p53. The rigorous alchemical method significantly surpasses state-of-the-art techniques in terms of both the root-mean-squared error and Pearson correlation coefficient in these two challenging blind test sets. 
Additionally, we propose an improved alchemical method that employs the pmx double-system/single-box approach to accurately predict the folding free energy change upon both conserving and charge-changing mutations. The enhanced protocol can accurately predict both types of mutations, thereby outperforming existing state-of-the-art methods in overall performance.
Affiliation(s)
- Jason Kurniawan
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan
- Takashi Ishida
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan
3
Jackson V, Hermann J, Tynan CJ, Rolfe DJ, Corey RA, Duncan AL, Noriega M, Chu A, Kalli AC, Jones EY, Sansom MSP, Martin-Fernandez ML, Seiradake E, Chavent M. The guidance and adhesion protein FLRT2 dimerizes in cis via dual small-X3-small transmembrane motifs. Structure 2022; 30:1354-1365.e5. [PMID: 35700726 DOI: 10.1016/j.str.2022.05.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/02/2021] [Revised: 03/03/2022] [Accepted: 05/18/2022] [Indexed: 10/18/2022]
Abstract
Fibronectin Leucine-rich Repeat Transmembrane (FLRT 1-3) proteins are a family of broadly expressed single-spanning transmembrane receptors that play key roles in development. Their extracellular domains mediate homotypic cell-cell adhesion and heterotypic protein interactions with other receptors to regulate cell adhesion and guidance. These in trans FLRT interactions determine the formation of signaling complexes of varying complexity and function. Whether FLRTs also interact at the surface of the same cell, in cis, remains unknown. Here, molecular dynamics simulations reveal two dimerization motifs in the FLRT2 transmembrane helix. Single particle tracking experiments show that these Small-X3-Small motifs synergize with a third dimerization motif encoded in the extracellular domain to permit the cis association and co-diffusion patterns of FLRT2 receptors on cells. These results may point to a competitive switching mechanism between in cis and in trans interactions, which suggests that homotypic FLRT interaction mirrors the functionalities of classic adhesion molecules.
Affiliation(s)
- Verity Jackson
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 5RJ, UK
- Julia Hermann
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 5RJ, UK
- Christopher J Tynan
  - Central Laser Facility, Research Complex at Harwell, Science and Technology Facilities Council, Harwell Campus, Didcot, OX11 0FA, UK
- Daniel J Rolfe
  - Central Laser Facility, Research Complex at Harwell, Science and Technology Facilities Council, Harwell Campus, Didcot, OX11 0FA, UK
- Robin A Corey
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 5RJ, UK
- Anna L Duncan
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 5RJ, UK
- Maxime Noriega
  - Institut de Pharmacologie et Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, 205 route de Narbonne, 31400 Toulouse, France
- Amy Chu
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 5RJ, UK
- Antreas C Kalli
  - Leeds Institute of Cardiovascular and Metabolic Medicine, School of Medicine and Astbury Center for Structural Molecular Biology, University of Leeds, Leeds, LS2 9NL, UK
- E Yvonne Jones
  - Division of Structural Biology, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- Mark S P Sansom
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 5RJ, UK
- Marisa L Martin-Fernandez
  - Central Laser Facility, Research Complex at Harwell, Science and Technology Facilities Council, Harwell Campus, Didcot, OX11 0FA, UK
- Elena Seiradake
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 5RJ, UK
- Matthieu Chavent
  - Institut de Pharmacologie et Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, 205 route de Narbonne, 31400 Toulouse, France
4
Gapsys V, Hahn DF, Tresadern G, Mobley DL, Rampp M, de Groot BL. Pre-Exascale Computing of Protein-Ligand Binding Free Energies with Open Source Software for Drug Design. J Chem Inf Model 2022; 62:1172-1177. [PMID: 35191702 PMCID: PMC8924919 DOI: 10.1021/acs.jcim.1c01445] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/31/2022]
Abstract
Nowadays, drug design projects benefit from highly accurate protein-ligand binding free energy predictions based on molecular dynamics simulations. While such calculations have been computationally expensive in the past, we now demonstrate that workflows built on open source software packages can efficiently leverage pre-exascale computing resources to screen hundreds of compounds in a matter of days. We report our results of free energy calculations on a large set of pharmaceutically relevant targets assembled to reflect industrial drug discovery projects.
Affiliation(s)
- Vytautas Gapsys
  - Computational Biomolecular Dynamics Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
- David F. Hahn
  - Computational Chemistry, Janssen Research and Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340 Beerse, Belgium
- Gary Tresadern
  - Computational Chemistry, Janssen Research and Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340 Beerse, Belgium
- David L. Mobley
  - Department of Pharmaceutical Sciences, University of California, Irvine, California 92697, United States
- Markus Rampp
  - Max-Planck Computing and Data Facility, Giessenbachstrasse 2, 85748 Garching, Germany
- Bert L. de Groot
  - Computational Biomolecular Dynamics Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
5
Accurate absolute free energies for ligand-protein binding based on non-equilibrium approaches. Commun Chem 2021; 4:61. [PMID: 36697634 PMCID: PMC9814727 DOI: 10.1038/s42004-021-00498-y] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 05/18/2020] [Accepted: 03/24/2021] [Indexed: 01/28/2023] Open
Abstract
The accurate calculation of the binding free energy for arbitrary ligand-protein pairs is a considerable challenge in computer-aided drug discovery. Recently, it has been demonstrated that current state-of-the-art molecular dynamics (MD) based methods are capable of making highly accurate predictions. Conventional MD-based approaches rely on the first principles of statistical mechanics and assume equilibrium sampling of the phase space. In the current work we demonstrate that accurate absolute binding free energies (ABFE) can also be obtained via theoretically rigorous non-equilibrium approaches. Our investigation of ligands binding to bromodomains and T4 lysozyme reveals that both equilibrium and non-equilibrium approaches converge to the same results. The non-equilibrium approach achieves the same level of accuracy and convergence as an equilibrium free energy perturbation (FEP) method enhanced by Hamiltonian replica exchange. We also compare uni- and bi-directional non-equilibrium approaches and demonstrate that considering the work distributions from both forward and reverse directions provides substantial accuracy gains. In summary, non-equilibrium ABFE calculations are shown to yield reliable and well-converged estimates of protein-ligand binding affinity.
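As background to the uni-directional estimates this abstract refers to, a one-sided free energy estimate from non-equilibrium work values is commonly written with the Jarzynski equality, ΔF = -kT ln⟨exp(-W/kT)⟩. The sketch below is a generic illustration of that identity and is not claimed to be the protocol used in the paper; the function name and the default kT (about 0.5925 kcal/mol, roughly 298 K) are assumptions.

```python
import math


def jarzynski_free_energy(works, kt=0.5925):
    """One-sided free energy estimate from non-equilibrium work values.

    works: work values from repeated switching runs in ONE direction,
           in the same energy units as kt.
    kt:    thermal energy; the default is an assumed value of about
           0.5925 kcal/mol (roughly 298 K).
    """
    if not works:
        raise ValueError("need at least one work value")
    # Evaluate log(mean(exp(-W/kT))) with the log-sum-exp trick so that
    # strongly negative or positive work values do not overflow.
    scaled = [-w / kt for w in works]
    largest = max(scaled)
    log_mean = largest + math.log(
        sum(math.exp(s - largest) for s in scaled) / len(scaled)
    )
    return -kt * log_mean
```

Because the exponential average is dominated by rare low-work trajectories, this estimate never exceeds the mean work and converges slowly when dissipation is large, which is one common explanation for why combining forward and reverse work distributions tends to help.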
6
Schultz AJ, Kofke DA. Identifying and estimating bias in overlap-sampling free-energy calculations. Molecular Simulation 2021. [DOI: 10.1080/08927022.2020.1758695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Andrew J. Schultz
  - Department of Chemical and Biological Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
- David A. Kofke
  - Department of Chemical and Biological Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
7
Aldeghi M, de Groot BL, Gapsys V. Accurate Calculation of Free Energy Changes upon Amino Acid Mutation. Methods Mol Biol 2019; 1851:19-47. [PMID: 30298390 DOI: 10.1007/978-1-4939-8736-8_2] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 12/04/2022]
Abstract
Molecular dynamics based free energy calculations allow for a robust and accurate evaluation of free energy changes upon amino acid mutation in proteins. In this chapter we cover the basic theoretical concepts important for the use of calculations utilizing the non-equilibrium alchemical switching methodology. We further provide a detailed step-by-step protocol for estimating the effect of a single amino acid mutation on protein thermostability. In addition, the potential caveats and solutions to some frequently encountered issues concerning the non-equilibrium alchemical free energy calculations are discussed. The protocol comprises details for the hybrid structure/topology generation required for alchemical transitions, equilibrium simulation setup, and description of the fast non-equilibrium switching. Subsequently, the analysis of the obtained results is described. The steps in the protocol are complemented with an illustrative practical application: a destabilizing mutation in the Trp cage mini protein. The concepts that are described are generally applicable. The shown example makes use of the pmx software package for the free energy calculations using Gromacs as a molecular dynamics engine. Finally, we discuss how the current protocol can readily be adapted to carry out charge-changing or multiple mutations at once, as well as large-scale mutational scans.
Affiliation(s)
- Matteo Aldeghi
  - Max Planck Institute for Biophysical Chemistry, Computational Biomolecular Dynamics Group, Am Fassberg, 11, 37077, Göttingen, Germany
- Bert L de Groot
  - Max Planck Institute for Biophysical Chemistry, Computational Biomolecular Dynamics Group, Am Fassberg, 11, 37077, Göttingen, Germany
- Vytautas Gapsys
  - Max Planck Institute for Biophysical Chemistry, Computational Biomolecular Dynamics Group, Am Fassberg, 11, 37077, Göttingen, Germany
8
Chelli R. Local Sampling in Steered Monte Carlo Simulations Decreases Dissipation and Enhances Free Energy Estimates via Nonequilibrium Work Theorems. J Chem Theory Comput 2012; 8:4040-52. [DOI: 10.1021/ct300348w] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 01/24/2023]
Affiliation(s)
- Riccardo Chelli
  - Dipartimento di Chimica, Università di Firenze, Via della Lastruccia 3, I-50019 Sesto Fiorentino, Italy
  - European Laboratory for Nonlinear Spectroscopy (LENS), Via Nello Carrara 1, I-50019 Sesto Fiorentino, Italy
9
Kim I, Allen TW. Bennett's acceptance ratio and histogram analysis methods enhanced by umbrella sampling along a reaction coordinate in configurational space. J Chem Phys 2012; 136:164103. [PMID: 22559466 DOI: 10.1063/1.3701766] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022] Open
Abstract
Free energy perturbation, a method for computing the free energy difference between two states, is often combined with non-Boltzmann biased sampling techniques in order to accelerate the convergence of free energy calculations. Here we present a new extension of the Bennett acceptance ratio (BAR) method by combining it with umbrella sampling (US) along a reaction coordinate in configurational space. In this approach, which we call Bennett acceptance ratio with umbrella sampling (BAR-US), the conditional histogram of energy difference (a mapping of the 3N-dimensional configurational space via a reaction coordinate onto 1D energy difference space) is weighted for marginalization with the associated population density along a reaction coordinate computed by US. This procedure produces marginal histograms of energy difference, from forward and backward simulations, with higher overlap in energy difference space, rendering free energy difference estimations using BAR statistically more reliable. In addition to BAR-US, two histogram analysis methods, termed Bennett overlapping histograms with US (BOH-US) and Bennett-Hummer (linear) least square with US (BHLS-US), are employed as consistency and convergence checks for free energy difference estimation by BAR-US. The proposed methods (BAR-US, BOH-US, and BHLS-US) are applied to a 1-dimensional asymmetric model potential, as has been used previously to test free energy calculations from non-equilibrium processes. We then consider the more stringent test of a 1-dimensional strongly (but linearly) shifted harmonic oscillator, which exhibits no overlap between two states when sampled using unbiased Brownian dynamics. We find that the efficiency of the proposed methods is enhanced over the original Bennett's methods (BAR, BOH, and BHLS) through fast uniform sampling of energy difference space via US in configurational space. 
We apply the proposed methods to the calculation of the electrostatic contribution to the absolute solvation free energy (excess chemical potential) of water. We then address the controversial issue of ion selectivity in the K(+) ion channel, KcsA. We have calculated the relative binding affinity of K(+) over Na(+) within a binding site of the KcsA channel for which different, though adjacent, K(+) and Na(+) configurations exist, ideally suited to these US-enhanced methods. Our studies demonstrate that the significant improvements in free energy calculations obtained using the proposed methods can have serious consequences for elucidating biological mechanisms and for the interpretation of experimental data.
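For orientation, the plain Bennett acceptance ratio (BAR) estimator that this abstract builds on can be sketched as follows. This is the standard BAR self-consistency condition solved by bisection, not the BAR-US, BOH-US, or BHLS-US methods proposed in the paper, and it includes no umbrella-sampling reweighting. The function names, argument names, and the bisection solver are assumptions made for this illustration.

```python
import math


def _fermi(x):
    """Numerically stable evaluation of 1 / (1 + exp(x))."""
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar_free_energy(w_forward, w_reverse, beta=1.0, tol=1e-10):
    """Solve the BAR equation for the free energy difference dF (state 0 -> 1).

    w_forward: energy differences (or work values) sampled in the forward
               direction, 0 -> 1.
    w_reverse: the same quantity sampled in the reverse direction, 1 -> 0.
    beta:      inverse thermal energy 1/kT, in matching units.
    """
    n_f, n_r = len(w_forward), len(w_reverse)
    if n_f == 0 or n_r == 0:
        raise ValueError("need samples from both directions")
    m = math.log(n_f / n_r)

    def imbalance(df):
        # BAR condition: the two Fermi-weighted sums are equal at the solution.
        lhs = sum(_fermi(beta * (w - df) + m) for w in w_forward)
        rhs = sum(_fermi(beta * (w + df) - m) for w in w_reverse)
        return lhs - rhs  # increases monotonically with df

    # Bracket the root, widening the interval until the sign changes.
    candidates = list(w_forward) + [-w for w in w_reverse]
    lo, hi = min(candidates), max(candidates)
    step = max(hi - lo, 1.0)
    while imbalance(lo) > 0:
        lo -= step
    while imbalance(hi) < 0:
        hi += step

    # Bisection with a hard cap on the number of iterations.
    for _ in range(200):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `bar_free_energy([3.0], [-1.0])` solves to a value of 2.0 within the tolerance, halfway between the forward value and the negated reverse value, as expected for one sample per direction. The estimate is only reliable when the forward and reverse distributions overlap, which is the limitation the abstract's umbrella-sampling extension is aimed at.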
Affiliation(s)
- Ilsoo Kim
  - Department of Chemistry, University of California, One Shields Avenue, Davis, California 95616, USA
10
Minh DDL, Chodera JD. Estimating equilibrium ensemble averages using multiple time slices from driven nonequilibrium processes: theory and application to free energies, moments, and thermodynamic length in single-molecule pulling experiments. J Chem Phys 2011; 134:024111. [PMID: 21241084 DOI: 10.1063/1.3516517] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
Abstract
Recently discovered identities in statistical mechanics have enabled the calculation of equilibrium ensemble averages from realizations of driven nonequilibrium processes, including single-molecule pulling experiments and analogous computer simulations. Challenges in collecting large data sets motivate the pursuit of efficient statistical estimators that maximize use of available information. Along these lines, Hummer and Szabo developed an estimator that combines data from multiple time slices along a driven nonequilibrium process to compute the potential of mean force. Here, we generalize their approach, pooling information from multiple time slices to estimate arbitrary equilibrium expectations. Our expression may be combined with estimators of path-ensemble averages, including existing optimal estimators that use data collected by unidirectional and bidirectional protocols. We demonstrate the estimator by calculating free energies, moments of the polymer extension, the thermodynamic metric tensor, and the thermodynamic length in a model single-molecule pulling experiment. Compared to estimators that only use individual time slices, our multiple time-slice estimators yield substantially smoother estimates and achieve lower variance for higher-order moments.
Affiliation(s)
- David D L Minh
  - Biosciences Division, Argonne National Laboratory, 9700 S. Cass Ave., Argonne, Illinois 60439, USA
11
Nicolini P, Frezzato D, Chelli R. Exploiting Configurational Freezing in Nonequilibrium Monte Carlo Simulations. J Chem Theory Comput 2011; 7:582-93. [DOI: 10.1021/ct100568n] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Affiliation(s)
- Paolo Nicolini
  - Dipartimento di Chimica, Università di Firenze, Via della Lastruccia 3, I-50019 Sesto Fiorentino, Italy
- Diego Frezzato
  - Dipartimento di Scienze Chimiche, Università di Padova, Via Marzolo 1, I-35131 Padova, Italy
- Riccardo Chelli
  - Dipartimento di Chimica, Università di Firenze, Via della Lastruccia 3, I-50019 Sesto Fiorentino, Italy
  - European Laboratory for Nonlinear Spectroscopy (LENS), Via Nello Carrara 1, I-50019 Sesto Fiorentino, Italy
12