1
|
The SAMPL9 host-guest blind challenge: an overview of binding free energy predictive accuracy. Phys Chem Chem Phys 2024; 26:9207-9225. [PMID: 38444308 PMCID: PMC10954238 DOI: 10.1039/d3cp05111k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2023] [Accepted: 02/03/2024] [Indexed: 03/07/2024]
Abstract
We report the results of the SAMPL9 host-guest blind challenge for predicting binding free energies. The challenge focused on macrocycles from pillar[n]-arene and cyclodextrin host families, including WP6, and bCD and HbCD. A variety of methods were used by participants to submit binding free energy predictions. A machine learning approach based on molecular descriptors achieved the highest accuracy (RMSE of 2.04 kcal mol-1) among the ranked methods in the WP6 dataset. Interestingly, predictions for WP6 obtained via docking tended to outperform all methods (RMSE of 1.70 kcal mol-1), most of which are MD based and computationally more expensive. In general, methods applying force fields achieved better correlation with experiments for WP6 opposed to the machine learning and docking models. In the cyclodextrin-phenothiazine challenge, the ATM approach emerged as the top performing method with RMSE less than 1.86 kcal mol-1. Correlation metrics of ranked methods in this dataset were relatively poor compared to WP6. We also highlight several lessons learned to guide future work and help improve studies on the systems discussed. For example, WP6 may be present in other microstates other than its -12 state in the presence of certain guests. Machine learning approaches can be used to fine tune or help train force fields for certain chemistry (i.e. WP6-G4). Certain phenothiazines occupy distinct primary and secondary orientations, some of which were considered individually for accurate binding free energies. The accuracy of predictions from certain methods while starting from a single binding pose/orientation demonstrates the sensitivity of calculated binding free energies to the orientation, and in some cases the likely dominant orientation for the system. Computational and experimental results suggest that guest phenothiazine core traverses both the secondary and primary faces of the cyclodextrin hosts, a bulky cationic side chain will primarily occupy the primary face, and the phenothiazine core substituent resides at the larger secondary face.
Collapse
|
2
|
A Fast, Convenient, Polarizable Electrostatic Model for Molecular Dynamics. J Chem Theory Comput 2024; 20:1293-1305. [PMID: 38240687 PMCID: PMC10867846 DOI: 10.1021/acs.jctc.3c01171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/17/2024]
Abstract
We present an efficient polarizable electrostatic model, utilizing typed, atom-centered polarizabilities and the fast direct approximation, designed for efficient use in molecular dynamics (MD) simulations. The model provides two convenient approaches for assigning partial charges in the context of atomic polarizabilities. One is a generalization of RESP, called RESP-dPol, and the other, AM1-BCC-dPol, is an adaptation of the widely used AM1-BCC method. Both are designed to accurately replicate gas-phase quantum mechanical electrostatic potentials. Benchmarks of this polarizable electrostatic model against gas-phase dipole moments, molecular polarizabilities, bulk liquid densities, and static dielectric constants of organic liquids show good agreement with the reference values. Of note, the model yields markedly more accurate dielectric constants of organic liquids, relative to a matched nonpolarizable force field. MD simulations with this method, which is currently parametrized for molecules containing elements C, N, O, and H, run only about 3.6-fold slower than fixed charge force fields, while simulations with the self-consistent mutual polarization average 4.5-fold slower. Our results suggest that RESP-dPol and AM1-BCC-dPol afford improved accuracy relative to fixed charge force fields and are good starting points for developing general, affordable, and transferable polarizable force fields. The software implementing these approaches has been designed to utilize the force field fitting frameworks developed and maintained by the Open Force Field Initiative, setting the stage for further exploration of this approach to polarizable force field development.
Collapse
|
3
|
PopShift: A Thermodynamically Sound Approach to Estimate Binding Free Energies by Accounting for Ligand-Induced Population Shifts from a Ligand-Free Markov State Model. J Chem Theory Comput 2024; 20:1036-1050. [PMID: 38291966 PMCID: PMC10867841 DOI: 10.1021/acs.jctc.3c00870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 02/01/2024]
Abstract
Obtaining accurate binding free energies from in silico screens has been a long-standing goal for the computational chemistry community. However, accuracy and computational cost are at odds with one another, limiting the utility of methods that perform this type of calculation. Many methods achieve massive scale by explicitly or implicitly assuming that the target protein adopts a single structure, or undergoes limited fluctuations around that structure, to minimize computational cost. Others simulate each protein-ligand complex of interest, accepting lower throughput in exchange for better predictions of binding affinities. Here, we present the PopShift framework for accounting for the ensemble of structures a protein adopts and their relative probabilities. Protein degrees of freedom are enumerated once, and then arbitrarily many molecules can be screened against this ensemble. Specifically, we use Markov state models (MSMs) as a compressed representation of a protein's thermodynamic ensemble. We start with a ligand-free MSM and then calculate how addition of a ligand shifts the populations of each protein conformational state based on the strength of the interaction between that protein conformation and the ligand. In this work we use docking to estimate the affinity between a given protein structure and ligand, but any estimator of binding affinities could be used in the PopShift framework. We test PopShift on the classic benchmark pocket T4 Lysozyme L99A. We find that PopShift is more accurate than common strategies, such as docking to a single structure and traditional ensemble docking─producing results that compare favorably with alchemical binding free energy calculations in terms of RMSE but not correlation─and may have a more favorable computational cost profile in some applications. In addition to predicting binding free energies and ligand poses, PopShift also provides insight into how the probability of different protein structures is shifted upon addition of various concentrations of ligand, providing a platform for predicting affinities and allosteric effects of ligand binding. Therefore, we expect PopShift will be valuable for hit finding and for providing insight into phenomena like allostery.
Collapse
|
4
|
Abstract
DNA-encoded libraries (DELs) provide the means to make and screen millions of diverse compounds against a target of interest in a single experiment. However, despite producing large volumes of binding data at a relatively low cost, the DEL selection process is susceptible to noise, necessitating computational follow-up to increase signal-to-noise ratios. In this work, we present a set of informatics tools to employ data from prior DEL screen(s) to gain information about which building blocks are most likely to be productive when designing new DELs for the same target. We demonstrate that similar building blocks have similar probabilities of forming compounds that bind. We then build a model from the inference that the combined behavior of individual building blocks is predictive of whether an overall compound binds. We illustrate our approach on a set of three-cycle OpenDEL libraries screened against soluble epoxide hydrolase (sEH) and report performance of more than an order of magnitude greater than random guessing on a holdout set, demonstrating that our model can serve as a baseline for comparison against other machine learning models on DEL data. Lastly, we provide a discussion on how we believe this informatics workflow could be applied to benefit researchers in their specific DEL campaigns.
Collapse
|
5
|
A transferable double exponential potential for condensed phase simulations of small molecules. DIGITAL DISCOVERY 2023; 2:1178-1187. [PMID: 38013814 PMCID: PMC10408570 DOI: 10.1039/d3dd00070b] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Accepted: 07/07/2023] [Indexed: 11/29/2023]
Abstract
The Lennard-Jones potential is the most widely-used function for the description of non-bonded interactions in transferable force fields for the condensed phase. This is not because it has an optimal functional form, but rather it is a legacy resulting from when computational expense was a major consideration and this potential was particularly convenient numerically. At present, it persists because the effort that would be required to re-write molecular modelling software and train new force fields has, until now, been prohibitive. Here, we present Smirnoff-plugins as a flexible framework to extend the Open Force Field software stack to allow custom force field functional forms. We deploy Smirnoff-plugins with the automated Open Force Field infrastructure to train a transferable, small molecule force field based on the recently-proposed double exponential functional form, on over 1000 experimental condensed phase properties. Extensive testing of the resulting force field shows improvements in transfer free energies, with acceptable conformational energetics, run times and convergence properties compared to state-of-the-art Lennard-Jones based force fields.
Collapse
|
6
|
PopShift: A thermodynamically sound approach to estimate binding free energies by accounting for ligand-induced population shifts from a ligand-free MSM. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.14.549110. [PMID: 37503302 PMCID: PMC10370083 DOI: 10.1101/2023.07.14.549110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]
Abstract
Obtaining accurate binding free energies from in silico screens has been a longstanding goal for the computational chemistry community. However, accuracy and computational cost are at odds with one another, limiting the utility of methods that perform this type of calculation. Many methods achieve massive scale by explicitly or implicitly assuming that the target protein adopts a single structure, or undergoes limited fluctuations around that structure, to minimize computational cost. Others simulate each protein-ligand complex of interest, accepting lower throughput in exchange for better predictions of binding affinities. Here, we present the PopShift framework for accounting for the ensemble of structures a protein adopts and their relative probabilities. Protein degrees of freedom are enumerated once, and then arbitrarily many molecules can be screened against this ensemble. Specifically, we use Markov state models (MSMs) as a compressed representation of a protein's thermodynamic ensemble. We start with a ligand-free MSM and then calculate how addition of a ligand shifts the populations of each protein conformational state based on the strength of the interaction between that protein conformation and the ligand. In this work we use docking to estimate the affinity between a given protein structure and ligand, but any estimator of binding affinities could be used in the PopShift framework. We test PopShift on the classic benchmark pocket T4 Lysozyme L99A. We find that PopShift is more accurate than common strategies, such as docking to a single structure and traditional ensemble docking-producing results that compare favorably with alchemical binding free energy calculations in terms of RMSE but not correlation - and may have a more favorable computational cost profile in some applications. In addition to predicting binding free energies and ligand poses, PopShift also provides insight into how the probability of different protein structures is shifted upon addition of various concentrations of ligand, providing a platform for predicting affinities and allosteric effects of ligand binding. Therefore, we expect PopShift will be valuable for hit finding and for providing insight into phenomena like allostery.
Collapse
|
7
|
Broadening the Scope of Binding Free Energy Calculations Using a Separated Topologies Approach. J Chem Theory Comput 2023; 19:5058-5076. [PMID: 37487138 PMCID: PMC10413862 DOI: 10.1021/acs.jctc.3c00282] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2023] [Indexed: 07/26/2023]
Abstract
Binding free energy calculations predict the potency of compounds to protein binding sites in a physically rigorous manner and see broad application in prioritizing the synthesis of novel drug candidates. Relative binding free energy (RBFE) calculations have emerged as an industry-standard approach to achieve highly accurate rank-order predictions of the potency of related compounds; however, this approach requires that the ligands share a common scaffold and a common binding mode, restricting the methods' domain of applicability. This is a critical limitation since complex modifications to the ligands, especially core hopping, are very common in drug design. Absolute binding free energy (ABFE) calculations are an alternate method that can be used for ligands that are not congeneric. However, ABFE suffers from a known problem of long convergence times due to the need to sample additional degrees of freedom within each system, such as sampling rearrangements necessary to open and close the binding site. Here, we report on an alternative method for RBFE, called Separated Topologies (SepTop), which overcomes the issues in both of the aforementioned methods by enabling large scaffold changes between ligands with a convergence time comparable to traditional RBFE. Instead of only mutating atoms that vary between two ligands, this approach performs two absolute free energy calculations at the same time in opposite directions, one for each ligand. Defining the two ligands independently allows the comparison of the binding of diverse ligands without the artificial constraints of identical poses or a suitable atom-atom mapping. This approach also avoids the need to sample the unbound state of the protein, making it more efficient than absolute binding free energy calculations. Here, we introduce an implementation of SepTop. We developed a general and efficient protocol for running SepTop, and we demonstrated the method on four diverse, pharmaceutically relevant systems. We report the performance of the method, as well as our practical insights into the strengths, weaknesses, and challenges of applying this method in an industrial drug design setting. We find that the accuracy of the approach is sufficiently high to rank order ligands with an accuracy comparable to traditional RBFE calculations while maintaining the additional flexibility of SepTop.
Collapse
|
8
|
Development and Benchmarking of Open Force Field 2.0.0: The Sage Small Molecule Force Field. J Chem Theory Comput 2023; 19:3251-3275. [PMID: 37167319 PMCID: PMC10269353 DOI: 10.1021/acs.jctc.3c00039] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2023] [Indexed: 05/13/2023]
Abstract
We introduce the Open Force Field (OpenFF) 2.0.0 small molecule force field for drug-like molecules, code-named Sage, which builds upon our previous iteration, Parsley. OpenFF force fields are based on direct chemical perception, which generalizes easily to highly diverse sets of chemistries based on substructure queries. Like the previous OpenFF iterations, the Sage generation of OpenFF force fields was validated in protein-ligand simulations to be compatible with AMBER biopolymer force fields. In this work, we detail the methodology used to develop this force field, as well as the innovations and improvements introduced since the release of Parsley 1.0.0. One particularly significant feature of Sage is a set of improved Lennard-Jones (LJ) parameters retrained against condensed phase mixture data, the first refit of LJ parameters in the OpenFF small molecule force field line. Sage also includes valence parameters refit to a larger database of quantum chemical calculations than previous versions, as well as improvements in how this fitting is performed. Force field benchmarks show improvements in general metrics of performance against quantum chemistry reference data such as root-mean-square deviations (RMSD) of optimized conformer geometries, torsion fingerprint deviations (TFD), and improved relative conformer energetics (ΔΔE). We present a variety of benchmarks for these metrics against our previous force fields as well as in some cases other small molecule force fields. Sage also demonstrates improved performance in estimating physical properties, including comparison against experimental data from various thermodynamic databases for small molecule properties such as ΔHmix, ρ(x), ΔGsolv, and ΔGtrans. Additionally, we benchmarked against protein-ligand binding free energies (ΔGbind), where Sage yields results statistically similar to previous force fields. All the data is made publicly available along with complete details on how to reproduce the training results at https://github.com/openforcefield/openff-sage.
Collapse
|
9
|
To Design Scalable Free Energy Perturbation Networks, Optimal Is Not Enough. J Chem Inf Model 2023; 63:1776-1793. [PMID: 36878475 PMCID: PMC10547263 DOI: 10.1021/acs.jcim.2c01579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/08/2023]
Abstract
Drug discovery is accelerated with computational methods such as alchemical simulations to estimate ligand affinities. In particular, relative binding free energy (RBFE) simulations are beneficial for lead optimization. To use RBFE simulations to compare prospective ligands in silico, researchers first plan the simulation experiment, using graphs where nodes represent ligands and graph edges represent alchemical transformations between ligands. Recent work demonstrated that optimizing the statistical architecture of these perturbation graphs improves the accuracy of the predicted changes in the free energy of ligand binding. Therefore, to improve the success rate of computational drug discovery, we present the open-source software package High Information Mapper (HiMap)─a new take on its predecessor, Lead Optimization Mapper (LOMAP). HiMap removes heuristics decisions from design selection and instead finds statistically optimal graphs over ligands clustered with machine learning. Beyond optimal design generation, we present theoretical insights for designing alchemical perturbation maps. Some of these results include that for n number of nodes, the precision of perturbation maps is stable at n·ln(n) edges. This result indicates that even an "optimal" graph can result in unexpectedly high errors if a plan includes too few alchemical transformations for the given number of ligands and edges. And, as a study compares more ligands, the performance of even optimal graphs will deteriorate with linear scaling of the edge count. In this sense, ensuring an A- or D-optimal topology is not enough to produce robust errors. We additionally find that optimal designs will converge more rapidly than radial and LOMAP designs. Moreover, we derive bounds for how clustering reduces cost for designs with a constant expected relative error per cluster, invariant of the size of the design. These results inform how to best design perturbation maps for computational drug discovery and have broader implications for experimental design.
Collapse
|
10
|
Enhanced Grand Canonical Sampling of Occluded Water Sites Using Nonequilibrium Candidate Monte Carlo. J Chem Theory Comput 2023; 19:1050-1062. [PMID: 36692215 PMCID: PMC9933432 DOI: 10.1021/acs.jctc.2c00823] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2022] [Indexed: 01/25/2023]
Abstract
Water molecules play a key role in many biomolecular systems, particularly when bound at protein-ligand interfaces. However, molecular simulation studies on such systems are hampered by the relatively long time scales over which water exchange between a protein and solvent takes place. Grand canonical Monte Carlo (GCMC) is a simulation technique that avoids this issue by attempting the insertion and deletion of water molecules within a given structure. The approach is constrained by low acceptance probabilities for insertions in congested systems, however. To address this issue, here, we combine GCMC with nonequilibium candidate Monte Carlo (NCMC) to yield a method that we refer to as grand canonical nonequilibrium candidate Monte Carlo (GCNCMC), in which the water insertions and deletions are carried out in a gradual, nonequilibrium fashion. We validate this new approach by comparing GCNCMC and GCMC simulations of bulk water and three protein binding sites. We find that not only is the efficiency of the water sampling improved by GCNCMC but that it also results in increased sampling of ligand conformations in a protein binding site, revealing new water-mediated ligand-binding geometries that are not observed using alternative enhanced sampling techniques.
Collapse
|
11
|
Development and benchmarking of an open, self-consistent force field for proteins and small molecules from the open force field initiative. Biophys J 2023; 122:421a. [PMID: 36784152 DOI: 10.1016/j.bpj.2022.11.2280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/12/2023] Open
|
12
|
Prioritizing Small Sets of Molecules for Synthesis through in-silico Tools: A Comparison of Common Ranking Methods. ChemMedChem 2023; 18:e202200425. [PMID: 36240514 PMCID: PMC9868080 DOI: 10.1002/cmdc.202200425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Revised: 10/10/2022] [Indexed: 01/26/2023]
Abstract
Prioritizing molecules for synthesis is a key role of computational methods within medicinal chemistry. Multiple tools exist for ranking molecules, from the cheap and popular molecular docking methods to more computationally expensive molecular-dynamics (MD)-based methods. It is often questioned whether the accuracy of the more rigorous methods justifies the higher computational cost and associated calculation time. Here, we compared the performance on ranking the binding of small molecules for seven scoring functions from five docking programs, one end-point method (MM/GBSA), and two MD-based free energy methods (PMX, FEP+). We investigated 16 pharmaceutically relevant targets with a total of 423 known binders. The performance of docking methods for ligand ranking was strongly system dependent. We observed that MD-based methods predominantly outperformed docking algorithms and MM/GBSA calculations. Based on our results, we recommend the application of MD-based free energy methods for prioritization of molecules for synthesis in lead optimization, whenever feasible.
Collapse
|
13
|
Molecular-dynamics simulation methods for macromolecular crystallography. Acta Crystallogr D Struct Biol 2023; 79:50-65. [PMID: 36601807 PMCID: PMC9815100 DOI: 10.1107/s2059798322011871] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2022] [Accepted: 12/13/2022] [Indexed: 12/31/2022] Open
Abstract
It is investigated whether molecular-dynamics (MD) simulations can be used to enhance macromolecular crystallography (MX) studies. Historically, protein crystal structures have been described using a single set of atomic coordinates. Because conformational variation is important for protein function, researchers now often build models that contain multiple structures. Methods for building such models can fail, however, in regions where the crystallographic density is difficult to interpret, for example at the protein-solvent interface. To address this limitation, a set of MD-MX methods that combine MD simulations of protein crystals with conventional modeling and refinement tools have been developed. In an application to a cyclic adenosine monophosphate-dependent protein kinase at room temperature, the procedure improved the interpretation of ambiguous density, yielding an alternative water model and a revised protein model including multiple conformations. The revised model provides mechanistic insights into the catalytic and regulatory interactions of the enzyme. The same methods may be used in other MX studies to seek mechanistic insights.
Collapse
|
14
|
Collaborative Assessment of Molecular Geometries and Energies from the Open Force Field. J Chem Inf Model 2022; 62:6094-6104. [PMID: 36433835 PMCID: PMC9873353 DOI: 10.1021/acs.jcim.2c01185] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Abstract
Force fields form the basis for classical molecular simulations, and their accuracy is crucial for the quality of, for instance, protein-ligand binding simulations in drug discovery. The huge diversity of small-molecule chemistry makes it a challenge to build and parameterize a suitable force field. The Open Force Field Initiative is a combined industry and academic consortium developing a state-of-the-art small-molecule force field. In this report, industry members of the consortium worked together to objectively evaluate the performance of the force fields (referred to here as OpenFF) produced by the initiative on a combined public and proprietary dataset of 19,653 relevant molecules selected from their internal research and compound collections. This evaluation was important because it was completely blind; at most partners, none of the molecules or data were used in force field development or testing prior to this work. We compare the Open Force Field "Sage" version 2.0.0 and "Parsley" version 1.3.0 with GAFF-2.11-AM1BCC, OPLS4, and SMIRNOFF99Frosst. We analyzed force-field-optimized geometries and conformer energies compared to reference quantum mechanical data. We show that OPLS4 performs best, and the latest Open Force Field release shows a clear improvement compared to its predecessors. The performance of established force fields such as GAFF-2.11 was generally worse. While OpenFF researchers were involved in building the benchmarking infrastructure used in this work, benchmarking was done entirely in-house within industrial organizations and the resulting assessment is reported here. This work assesses the force field performance using separate benchmarking steps, external datasets, and involving external research groups. This effort may also be unique in terms of the number of different industrial partners involved, with 10 different companies participating in the benchmark efforts.
Collapse
|
15
|
Abstract
The development of accurate transferable force fields is key to realizing the full potential of atomistic modeling in the study of biological processes such as protein-ligand binding for drug discovery. State-of-the-art transferable force fields, such as those produced by the Open Force Field Initiative, use modern software engineering and automation techniques to yield accuracy improvements. However, force field torsion parameters, which must account for many stereoelectronic and steric effects, are considered to be less transferable than other force field parameters and are therefore often targets for bespoke parametrization. Here, we present the Open Force Field QCSubmit and BespokeFit software packages that, when combined, facilitate the fitting of torsion parameters to quantum mechanical reference data at scale. We demonstrate the use of QCSubmit for simplifying the process of creating and archiving large numbers of quantum chemical calculations, by generating a dataset of 671 torsion scans for druglike fragments. We use BespokeFit to derive individual torsion parameters for each of these molecules, thereby reducing the root-mean-square error in the potential energy surface from 1.1 kcal/mol, using the original transferable force field, to 0.4 kcal/mol using the bespoke version. Furthermore, we employ the bespoke force fields to compute the relative binding free energies of a congeneric series of inhibitors of the TYK2 protein, and demonstrate further improvements in accuracy, compared to the base force field (MUE reduced from 0.560.390.77 to 0.420.280.59 kcal/mol and R2 correlation improved from 0.720.350.87 to 0.930.840.97).
Collapse
|
16
|
Absolute Binding Free Energy Calculations for Buried Water Molecules. J Chem Theory Comput 2022; 18:6482-6499. [PMID: 36197451 PMCID: PMC9873352 DOI: 10.1021/acs.jctc.2c00658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]
Abstract
Water often plays a key role in mediating protein-ligand interactions. Understanding contributions from active-site water molecules to binding thermodynamics of a ligand is important in predicting binding free energies for ligand optimization. In this work, we tested a non-equilibrium switching method for absolute binding free energy calculations on water molecules in binding sites of 13 systems. We discuss the lessons we learned about identified issues that affected our calculations and ways to address them. This work fits with our larger focus on how to do accurate ligand binding free energy calculations when water rearrangements are very slow, such as rearrangements due to ligand modification (as in relative free energy calculations) or ligand binding (as in absolute free energy calculations). The method studied in this work can potentially be used to account for limited water sampling via providing endpoint corrections to free energy calculations using our calculated binding free energy of water.
Collapse
|
17
|
An overview of the SAMPL8 host-guest binding challenge. J Comput Aided Mol Des 2022; 36:707-734. [PMID: 36229622 PMCID: PMC9596595 DOI: 10.1007/s10822-022-00462-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2022] [Accepted: 06/21/2022] [Indexed: 11/23/2022]
Abstract
The SAMPL series of challenges aim to focus the community on specific modeling challenges, while testing and hopefully driving progress of computational methods to help guide pharmaceutical drug discovery. In this study, we report on the results of the SAMPL8 host–guest blind challenge for predicting absolute binding affinities. SAMPL8 focused on two host–guest datasets, one involving the cucurbituril CB8 (with a series of common drugs of abuse) and another involving two different Gibb deep-cavity cavitands. The latter dataset involved a previously featured deep cavity cavitand (TEMOA) as well as a new variant (TEETOA), both binding to a series of relatively rigid fragment-like guests. Challenge participants employed a reasonably wide variety of methods, though many of these were based on molecular simulations, and predictive accuracy was mixed. As in some previous SAMPL iterations (SAMPL6 and SAMPL7), we found that one approach to achieve greater accuracy was to apply empirical corrections to the binding free energy predictions, taking advantage of prior data on binding to these hosts. Another approach which performed well was a hybrid MD-based approach with reweighting to a force matched QM potential. In the cavitand challenge, an alchemical method using the AMOEBA-polarizable force field achieved the best success with RMSE less than 1 kcal/mol, while another alchemical approach (ATM/GAFF2-AM1BCC/TIP3P/HREM) had RMSE less than 1.75 kcal/mol. The work discussed here also highlights several important lessons; for example, retrospective studies of reference calculations demonstrate the sensitivity of predicted binding free energies to ethyl group sampling and/or guest starting pose, providing guidance to help improve future studies on these systems.
Collapse
|
18
|
Enhancing sampling of water rehydration upon ligand binding using variants of grand canonical Monte Carlo. J Comput Aided Mol Des 2022; 36:767-779. [PMID: 36198874 PMCID: PMC9869699 DOI: 10.1007/s10822-022-00479-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2022] [Accepted: 09/15/2022] [Indexed: 01/26/2023]
Abstract
Water plays an important role in mediating protein-ligand interactions. Water rearrangement upon a ligand binding or modification can be very slow and beyond typical timescales used in molecular dynamics (MD) simulations. Thus, inadequate sampling of slow water motions in MD simulations often impairs the accuracy of the accuracy of ligand binding free energy calculations. Previous studies suggest grand canonical Monte Carlo (GCMC) outperforms normal MD simulations for water sampling, thus GCMC has been applied to help improve the accuracy of ligand binding free energy calculations. However, in prior work we observed protein and/or ligand motions impaired how well GCMC performs at water rehydration, suggesting more work is needed to improve this method to handle water sampling. In this work, we applied GCMC in 21 protein-ligand systems to assess the performance of GCMC for rehydrating buried water sites. While our results show that GCMC can rapidly rehydrate all selected water sites for most systems, it fails in five systems. In most failed systems, we observe protein/ligand motions, which occur in the absence of water, combine to close water sites and block instantaneous GCMC water insertion moves. For these five failed systems, we both extended our GCMC simulations and tested a new technique named grand canonical nonequilibrium candidate Monte Carlo (GCNCMC). GCNCMC combines GCMC with the nonequilibrium candidate Monte Carlo (NCMC) sampling technique to improve the probability of a successful water insertion/deletion. Our results show that GCNCMC and extended GCMC can rehydrate all target water sites for three of the five problematic systems and GCNCMC is more efficient than GCMC in two out of the three systems. In one system, only GCNCMC can rehydrate all target water sites, while GCMC fails. Both GCNCMC and GCMC fail in one system. This work suggests this new GCNCMC method is promising for water rehydration especially when protein/ligand motions may block water insertion/removal.
Collapse
|
19
|
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks [Article v0.1]. LIVING JOURNAL OF COMPUTATIONAL MOLECULAR SCIENCE 2022; 4:1497. [PMID: 36382113 PMCID: PMC9662604 DOI: 10.33011/livecoms.4.1.1497] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/01/2023]
Abstract
Free energy calculations are rapidly becoming indispensable in structure-enabled drug discovery programs. As new methods, force fields, and implementations are developed, assessing their expected accuracy on real-world systems (benchmarking) becomes critical to provide users with an assessment of the accuracy expected when these methods are applied within their domain of applicability, and developers with a way to assess the expected impact of new methodologies. These assessments require construction of a benchmark-a set of well-prepared, high quality systems with corresponding experimental measurements designed to ensure the resulting calculations provide a realistic assessment of expected performance when these methods are deployed within their domains of applicability. To date, the community has not yet adopted a common standardized benchmark, and existing benchmark reports suffer from a myriad of issues, including poor data quality, limited statistical power, and statistically deficient analyses, all of which can conspire to produce benchmarks that are poorly predictive of real-world performance. Here, we address these issues by presenting guidelines for (1) curating experimental data to develop meaningful benchmark sets, (2) preparing benchmark inputs according to best practices to facilitate widespread adoption, and (3) analysis of the resulting predictions to enable statistically meaningful comparisons among methods and force fields. We highlight challenges and open questions that remain to be solved in these areas, as well as recommendations for the collection of new datasets that might optimally serve to measure progress as methods become systematically more reliable. Finally, we provide a curated, versioned, open, standardized benchmark set adherent to these standards (PLBenchmarks) and an open source toolkit for implementing standardized best practices assessments (arsenic) for the community to use as a standardized assessment tool. While our main focus is free energy methods based on molecular simulations, these guidelines should prove useful for assessment of the rapidly growing field of machine learning methods for affinity prediction as well.
Collapse
|
20
|
Improving Force Field Accuracy by Training against Condensed-Phase Mixture Properties. J Chem Theory Comput 2022; 18:3577-3592. [PMID: 35533269 DOI: 10.1021/acs.jctc.1c01268] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Developing a sufficiently accurate classical force field representation of molecules is key to realizing the full potential of molecular simulations as a route to gaining a fundamental insight into a broad spectrum of chemical and biological phenomena. This is only possible, however, if the many complex interactions between molecules of different species in the system are accurately captured by the model. Historically, the intermolecular van der Waals (vdW) interactions have primarily been trained against densities and enthalpies of vaporization of pure (single-component) systems, with occasional usage of hydration free energies. In this study, we demonstrate how including physical property data of binary mixtures can better inform these parameters, encoding more information about the underlying physics of the system in complex chemical mixtures. To demonstrate this, we retrain a select number of Lennard-Jones parameters describing the vdW interactions of the OpenFF 1.0.0 (Parsley) fixed charge force field against training sets composed of densities and enthalpies of mixing for binary liquid mixtures as well as densities and enthalpies of vaporization of pure liquid systems and assess the performance of each of these combinations. We show that retraining against the mixture data improves the force field's ability to reproduce mixture properties, including solvation free energies, correcting some systematic errors that exist when training vdW interactions against properties of pure systems only.
Collapse
|
21
|
Open Force Field Evaluator: An Automated, Efficient, and Scalable Framework for the Estimation of Physical Properties from Molecular Simulation. J Chem Theory Comput 2022; 18:3566-3576. [PMID: 35507313 DOI: 10.1021/acs.jctc.1c01111] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Developing accurate classical force field representations of molecules is key to realizing the full potential of molecular simulations, both as a powerful route to gaining fundamental insights into a broad spectrum of chemical and biological phenomena and for predicting physicochemical and mechanical properties of substances. The Open Force Field Consortium is an industry-funded open science effort to this end, developing open-source tools for rapidly generating new high-quality small-molecule force fields. An integral aspect of this is the parameterization and assessment of force fields against high-quality, condensed-phase physical property data, curated from open data sources such as the NIST ThermoML Archive, alongside quantum chemical data. The quantity of such experimental data in open data archives alone would require an onerous amount of human and computational resources to both curate and estimate manually, especially when estimations must be obtained for numerous sets of force field parameters. Here, we present an entirely automated, highly scalable framework for evaluating physical properties and their gradients in terms of force field parameters. It is written as a modular and extensible Python framework, which employs an intelligent multiscale estimation approach that allows for the automated estimation of properties from simulation and cached simulation data, and a pluggable API for estimation of new properties. In this study, we demonstrate the utility of the framework by benchmarking the OpenFF 1.0.0 small-molecule force field and GAFF 1.8 and GAFF 2.1 force fields against a test set of binary density and enthalpy of mixing measurements curated using the framework utilities. Further, we demonstrate the framework's utility as part of force field optimization by using it alongside ForceBalance, a framework for systematic force field optimization, to retrain a set of nonbonded van der Waals parameters against a training set of density and enthalpy of vaporization measurements.
Collapse
|
22
|
Abstract
Water often plays a key role in protein structure, molecular recognition, and mediating protein-ligand interactions. Thus, free energy calculations must adequately sample water motions, which often proves challenging in typical MD simulation time scales. Thus, the accuracy of methods relying on MD simulations ends up limited by slow water sampling. Particularly, as a ligand is removed or modified, bulk water may not have time to fill or rearrange in the binding site. In this work, we focus on several molecular dynamics (MD) simulation-based methods attempting to help rehydrate buried water sites: BLUES, using nonequilibrium candidate Monte Carlo (NCMC); grand, using grand canonical Monte Carlo (GCMC); and normal MD. We assess the accuracy and efficiency of these methods in rehydrating target water sites. We selected a range of systems with varying numbers of waters in the binding site, as well as those where water occupancy is coupled to the identity or binding mode of the ligand. We analyzed the rehydration of buried water sites in binding pockets using both clustering of trajectories and direct analysis of electron density maps. Our results suggest both BLUES and grand enhance water sampling relative to normal MD and grand is more robust than BLUES, but also that water sampling remains a major challenge for all of the methods tested. The lessons we learned for these methods and systems are discussed.
Collapse
|
23
|
Pre-Exascale Computing of Protein-Ligand Binding Free Energies with Open Source Software for Drug Design. J Chem Inf Model 2022; 62:1172-1177. [PMID: 35191702 PMCID: PMC8924919 DOI: 10.1021/acs.jcim.1c01445] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Nowadays, drug design projects benefit from highly accurate protein-ligand binding free energy predictions based on molecular dynamics simulations. While such calculations have been computationally expensive in the past, we now demonstrate that workflows built on open source software packages can efficiently leverage pre-exascale computing resources to screen hundreds of compounds in a matter of days. We report our results of free energy calculations on a large set of pharmaceutically relevant targets assembled to reflect industrial drug discovery projects.
Collapse
|
24
|
Building an open, self-consistent force field for biopolymers and small molecules with the open force field initiative infrastructure. Biophys J 2022. [DOI: 10.1016/j.bpj.2021.11.1367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022] Open
|
25
|
Enhancing Paraoxon Binding to Organophosphorus Hydrolase Active Site. Int J Mol Sci 2021; 22:12624. [PMID: 34884430 PMCID: PMC8657610 DOI: 10.3390/ijms222312624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2021] [Revised: 11/12/2021] [Accepted: 11/14/2021] [Indexed: 11/16/2022] Open
Abstract
Organophosphorus hydrolase (OPH) is a metalloenzyme that can hydrolyze organophosphorus agents resulting in products that are generally of reduced toxicity. The best OPH substrate found to date is diethyl p-nitrophenyl phosphate (paraoxon). Most structural and kinetic studies assume that the binding orientation of paraoxon is identical to that of diethyl 4-methylbenzylphosphonate, which is the only substrate analog co-crystallized with OPH. In the current work, we used a combined docking and molecular dynamics (MD) approach to predict the likely binding mode of paraoxon. Then, we used the predicted binding mode to run MD simulations on the wild type (WT) OPH complexed with paraoxon, and OPH mutants complexed with paraoxon. Additionally, we identified three hot-spot residues (D253, H254, and I255) involved in the stability of the OPH active site. We then experimentally assayed single and double mutants involving these residues for paraoxon binding affinity. The binding free energy calculations and the experimental kinetics of the reactions between each OPH mutant and paraoxon show that mutated forms D253E, D253E-H254R, and D253E-I255G exhibit enhanced substrate binding affinity over WT OPH. Interestingly, our experimental results show that the substrate binding affinity of the double mutant D253E-H254R increased by 19-fold compared to WT OPH.
Collapse
|
26
|
Automated high throughput pK a and distribution coefficient measurements of pharmaceutical compounds for the SAMPL8 blind prediction challenge. J Comput Aided Mol Des 2021; 35:1141-1155. [PMID: 34714468 DOI: 10.1007/s10822-021-00427-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2021] [Accepted: 10/13/2021] [Indexed: 11/28/2022]
Abstract
The goal of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) challenge is to improve the accuracy of current computational models to estimate free energy of binding, deprotonation, distribution and other associated physical properties that are useful for the design of new pharmaceutical products. New experimental datasets of physicochemical properties provide opportunities for prospective evaluation of computational prediction methods. Here, aqueous pKa and a range of bi-phasic logD values for a variety of pharmaceutical compounds were determined through a streamlined automated process to be utilized in the SAMPL8 physical property challenge. The goal of this paper is to provide an in-depth review of the experimental methods utilized to create a comprehensive data set for the blind prediction challenge. The significance of this work involves the use of high throughput experimentation equipment and instrumentation to produce acid dissociation constants for twenty-three drug molecules, as well as distribution coefficients for eleven of those molecules.
Collapse
|
27
|
Alchemical absolute protein-ligand binding free energies for drug design. Chem Sci 2021; 12:13958-13971. [PMID: 34760182 PMCID: PMC8549785 DOI: 10.1039/d1sc03472c] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2021] [Accepted: 09/23/2021] [Indexed: 12/13/2022] Open
Abstract
The recent advances in relative protein-ligand binding free energy calculations have shown the value of alchemical methods in drug discovery. Accurately assessing absolute binding free energies, although highly desired, remains a challenging endeavour, mostly limited to small model cases. Here, we demonstrate accurate first principles based absolute binding free energy estimates for 128 pharmaceutically relevant targets. We use a novel rigorous method to generate protein-ligand ensembles for the ligand in its decoupled state. Not only do the calculations deliver accurate protein-ligand binding affinity estimates, but they also provide detailed physical insight into the structural determinants of binding. We identify subtle rotamer rearrangements between apo and holo states of a protein that are crucial for binding. When compared to relative binding free energy calculations, obtaining absolute binding free energies is considerably more challenging in large part due to the need to explicitly account for the protein in its apo state. In this work we present several approaches to obtain apo state ensembles for accurate absolute ΔG calculations, thus outlining protocols for prospective application of the methods for drug discovery.
Collapse
|
28
|
Development and Benchmarking of Open Force Field v1.0.0-the Parsley Small-Molecule Force Field. J Chem Theory Comput 2021; 17:6262-6280. [PMID: 34551262 PMCID: PMC8511297 DOI: 10.1021/acs.jctc.1c00571] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
We present a methodology for defining and optimizing a general force field for classical molecular simulations, and we describe its use to derive the Open Force Field 1.0.0 small-molecule force field, codenamed Parsley. Rather than using traditional atom typing, our approach is built on the SMIRKS-native Open Force Field (SMIRNOFF) parameter assignment formalism, which handles increases in the diversity and specificity of the force field definition without needlessly increasing the complexity of the specification. Parameters are optimized with the ForceBalance tool, based on reference quantum chemical data that include torsion potential energy profiles, optimized gas-phase structures, and vibrational frequencies. These quantum reference data are computed and are maintained with QCArchive, an open-source and freely available distributed computing and database software ecosystem. In this initial application of the method, we present essentially a full optimization of all valence parameters and report tests of the resulting force field against compounds and data types outside the training set. These tests show improvements in optimized geometries and conformational energetics and demonstrate that Parsley's accuracy for liquid properties is similar to that of other general force fields, as is accuracy on binding free energies. We find that this initial Parsley force field affords accuracy similar to that of other general force fields when used to calculate relative binding free energies spanning 199 protein-ligand systems. Additionally, the resulting infrastructure allows us to rapidly optimize an entirely new force field with minimal human intervention.
Collapse
|
29
|
Temperature artifacts in protein structures bias ligand-binding predictions. Chem Sci 2021; 12:11275-11293. [PMID: 34667539 PMCID: PMC8447925 DOI: 10.1039/d1sc02751d] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2021] [Accepted: 07/09/2021] [Indexed: 12/14/2022] Open
Abstract
X-ray crystallography is the gold standard to resolve conformational ensembles that are significant for protein function, ligand discovery, and computational methods development. However, relevant conformational states may be missed at common cryogenic (cryo) data-collection temperatures but can be populated at room temperature. To assess the impact of temperature on making structural and computational discoveries, we systematically investigated protein conformational changes in response to temperature and ligand binding in a structural and computational workhorse, the T4 lysozyme L99A cavity. Despite decades of work on this protein, shifting to RT reveals new global and local structural changes. These include uncovering an apo helix conformation that is hidden at cryo but relevant for ligand binding, and altered side chain and ligand conformations. To evaluate the impact of temperature-induced protein and ligand changes on the utility of structural information in computation, we evaluated how temperature can mislead computational methods that employ cryo structures for validation. We find that when comparing simulated structures just to experimental cryo structures, hidden successes and failures often go unnoticed. When using structural information in ligand binding predictions, both coarse docking and rigorous binding free energy calculations are influenced by temperature effects. The trend that cryo artifacts limit the utility of structures for computation holds across five distinct protein classes. Our results suggest caution when consulting cryogenic structural data alone, as temperature artifacts can conceal errors and prevent successful computational predictions, which can mislead the development and application of computational methods in discovering bioactive molecules.
Collapse
|
30
|
Evaluation of log P, pK a, and log D predictions from the SAMPL7 blind challenge. J Comput Aided Mol Des 2021; 35:771-802. [PMID: 34169394 PMCID: PMC8224998 DOI: 10.1007/s10822-021-00397-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2021] [Accepted: 06/05/2021] [Indexed: 12/16/2022]
Abstract
The Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) challenges focuses the computational modeling community on areas in need of improvement for rational drug design. The SAMPL7 physical property challenge dealt with prediction of octanol-water partition coefficients and pKa for 22 compounds. The dataset was composed of a series of N-acylsulfonamides and related bioisosteres. 17 research groups participated in the log P challenge, submitting 33 blind submissions total. For the pKa challenge, 7 different groups participated, submitting 9 blind submissions in total. Overall, the accuracy of octanol-water log P predictions in the SAMPL7 challenge was lower than octanol-water log P predictions in SAMPL6, likely due to a more diverse dataset. Compared to the SAMPL6 pKa challenge, accuracy remains unchanged in SAMPL7. Interestingly, here, though macroscopic pKa values were often predicted with reasonable accuracy, there was dramatically more disagreement among participants as to which microscopic transitions produced these values (with methods often disagreeing even as to the sign of the free energy change associated with certain transitions), indicating far more work needs to be done on pKa prediction methods.
Collapse
|
31
|
Abstract
Binding free energy calculations have become increasingly valuable to drive decision making in drug discovery projects. However, among other issues, inadequate sampling can reduce accuracy, limiting the value of the technique. In this paper, we apply absolute binding free energy calculations to ligands binding to T4 lysozyme L99A and HSP90 using equilibrium and nonequilibrium approaches. We highlight sampling problems encountered in these systems, such as slow side chain rearrangements and slow changes of water placement upon ligand binding. These same types of challenges are also likely to show up in other protein-ligand systems, and we propose some strategies to diagnose and test for such problems in alchemical free energy calculations. We also explore similarities and differences in how the equilibrium and the nonequilibrium approaches handle these problems. Our results show the large amount of work still to be done to make free energy calculations robust and reliable and provide insight for future research in this area.
Collapse
|
32
|
A Benchmark of Electrostatic Method Performance in Relative Binding Free Energy Calculations. J Chem Inf Model 2021; 61:1048-1052. [PMID: 33686853 DOI: 10.1021/acs.jcim.0c01424] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
Abstract
Relative free energy calculations are fast becoming a critical part of early stage pharmaceutical design, making it important to know how to obtain the best performance with these calculations in applications that could span hundreds of calculations and molecules. In this work, we compared two different treatments of long-range electrostatics, Particle Mesh Ewald (PME) and Reaction Field (RF), in relative binding free energy calculations using a nonequilibrium switching protocol. We found simulations using RF achieve comparable results to those using PME but gain more efficiency when using CPU and similar performance using GPU. The results from this work encourage more use of RF in molecular simulations.
Collapse
|
33
|
Improving small molecule force fields by identifying and characterizing small molecules with inconsistent parameters. J Comput Aided Mol Des 2021; 35:271-284. [PMID: 33506360 PMCID: PMC8162916 DOI: 10.1007/s10822-020-00367-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2020] [Accepted: 12/01/2020] [Indexed: 01/07/2023]
Abstract
Many molecular simulation methods use force fields to help model and simulate molecules and their behavior in various environments. Force fields are sets of functions and parameters used to calculate the potential energy of a chemical system as a function of the atomic coordinates. Despite the widespread use of force fields, their inadequacies are often thought to contribute to systematic errors in molecular simulations. Furthermore, different force fields tend to give varying results on the same systems with the same simulation settings. Here, we present a pipeline for comparing the geometries of small molecule conformers. We aimed to identify molecules or chemistries that are particularly informative for future force field development because they display inconsistencies between force fields. We applied our pipeline to a subset of the eMolecules database, and highlighted molecules that appear to be parameterized inconsistently across different force fields. We then identified over-represented functional groups in these molecule sets. The molecules and moieties identified by this pipeline may be particularly helpful for future force field parameterization.
Collapse
|
34
|
Enhancing water sampling of buried binding sites using nonequilibrium candidate Monte Carlo. J Comput Aided Mol Des 2021; 35:167-177. [PMID: 32968887 PMCID: PMC7904576 DOI: 10.1007/s10822-020-00344-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2020] [Accepted: 09/16/2020] [Indexed: 11/26/2022]
Abstract
Water molecules can be found interacting with the surface and within cavities in proteins. However, water exchange between bulk and buried hydration sites can be slow compared to simulation timescales, thus leading to the inefficient sampling of the locations of water. This can pose problems for free energy calculations for computer-aided drug design. Here, we apply a hybrid method that combines nonequilibrium candidate Monte Carlo (NCMC) simulations and molecular dynamics (MD) to enhance sampling of water in specific areas of a system, such as the binding site of a protein. Our approach uses NCMC to gradually remove interactions between a selected water molecule and its environment, then translates the water to a new region, before turning the interactions back on. This approach of gradual removal of interactions, followed by a move and then reintroduction of interactions, allows the environment to relax in response to the proposed water translation, improving acceptance of moves and thereby accelerating water exchange and sampling. We validate this approach on several test systems including the ligand-bound MUP-1 and HSP90 proteins with buried crystallographic waters removed. We show that our BLUES (NCMC/MD) method enhances water sampling relative to normal MD when applied to these systems. Thus, this approach provides a strategy to improve water sampling in molecular simulations which may be useful in practical applications in drug discovery and biomolecular design.
Collapse
|
35
|
Overview of the SAMPL6 pK a challenge: evaluating small molecule microscopic and macroscopic pK a predictions. J Comput Aided Mol Des 2021; 35:131-166. [PMID: 33394238 PMCID: PMC7904668 DOI: 10.1007/s10822-020-00362-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2020] [Accepted: 11/17/2020] [Indexed: 01/01/2023]
Abstract
The prediction of acid dissociation constants (pKa) is a prerequisite for predicting many other properties of a small molecule, such as its protein-ligand binding affinity, distribution coefficient (log D), membrane permeability, and solubility. The prediction of each of these properties requires knowledge of the relevant protonation states and solution free energy penalties of each state. The SAMPL6 pKa Challenge was the first time that a separate challenge was conducted for evaluating pKa predictions as part of the Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) exercises. This challenge was motivated by significant inaccuracies observed in prior physical property prediction challenges, such as the SAMPL5 log D Challenge, caused by protonation state and pKa prediction issues. The goal of the pKa challenge was to assess the performance of contemporary pKa prediction methods for drug-like molecules. The challenge set was composed of 24 small molecules that resembled fragments of kinase inhibitors, a number of which were multiprotic. Eleven research groups contributed blind predictions for a total of 37 pKa distinct prediction methods. In addition to blinded submissions, four widely used pKa prediction methods were included in the analysis as reference methods. Collecting both microscopic and macroscopic pKa predictions allowed in-depth evaluation of pKa prediction performance. This article highlights deficiencies of typical pKa prediction evaluation approaches when the distinction between microscopic and macroscopic pKas is ignored; in particular, we suggest more stringent evaluation criteria for microscopic and macroscopic pKa predictions guided by the available experimental data. Top-performing submissions for macroscopic pKa predictions achieved RMSE of 0.7-1.0 pKa units and included both quantum chemical and empirical approaches, where the total number of extra or missing macroscopic pKas predicted by these submissions were fewer than 8 for 24 molecules. A large number of submissions had RMSE spanning 1-3 pKa units. Molecules with sulfur-containing heterocycles or iodo and bromo groups were less accurately predicted on average considering all methods evaluated. For a subset of molecules, we utilized experimentally-determined microstates based on NMR to evaluate the dominant tautomer predictions for each macroscopic state. Prediction of dominant tautomers was a major source of error for microscopic pKa predictions, especially errors in charged tautomers. The degree of inaccuracy in pKa predictions observed in this challenge is detrimental to the protein-ligand binding affinity predictions due to errors in dominant protonation state predictions and the calculation of free energy corrections for multiple protonation states. Underestimation of ligand pKa by 1 unit can lead to errors in binding free energy errors up to 1.2 kcal/mol. The SAMPL6 pKa Challenge demonstrated the need for improving pKa prediction methods for drug-like molecules, especially for challenging moieties and multiprotic molecules.
Collapse
|
36
|
Abstract
Sampling multiple binding modes of a ligand in a single molecular dynamics simulation is difficult. A given ligand may have many internal degrees of freedom, along with many different ways it might orient itself in a binding site or across several binding sites, all of which might be separated by large energy barriers. We have developed a novel Monte Carlo move called molecular darting (MolDarting) to reversibly sample between predefined binding modes of a ligand. Here, we couple this with nonequilibrium candidate Monte Carlo (NCMC) to improve acceptance of moves. We apply this technique to a simple dipeptide system, a ligand binding to T4 lysozyme L99A, and ligand binding to HIV integrase to test this new method. We observe significant increases in acceptance compared to uniformly sampling the internal and rotational/translational degrees of freedom in these systems.
Collapse
|
37
|
Structural and Molecular Dynamics of Mycobacterium tuberculosis Malic Enzyme, a Potential Anti-TB Drug Target. ACS Infect Dis 2021; 7:174-188. [PMID: 33356117 DOI: 10.1021/acsinfecdis.0c00735] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
Abstract
Tuberculosis (TB) is the most lethal bacterial infectious disease worldwide. It is notoriously difficult to treat, requiring a cocktail of antibiotics administered over many months. The dense, waxy outer membrane of the TB-causing agent, Mycobacterium tuberculosis (Mtb), acts as a formidable barrier against uptake of antibiotics. Subsequently, enzymes involved in maintaining the integrity of the Mtb cell wall are promising drug targets. Recently, we demonstrated that Mtb lacking malic enzyme (MEZ) has altered cell wall lipid composition and attenuated uptake by macrophages. These results suggest that MEZ contributes to lipid biosynthesis by providing reductants in the form of NAD(P)H. Here, we present the X-ray crystal structure of MEZ to 3.6 Å. We use biochemical assays to demonstrate MEZ is dimeric in solution and to evaluate the effects of pH and allosteric regulators on its kinetics and thermal stability. To assess the interactions between MEZ and its substrate malate and cofactors, Mn2+ and NAD(P)+, we ran a series of molecular dynamics (MD) simulations. First, the MD analysis corroborates our empirical observations that MEZ is unusually flexible, which persists even with the addition of substrate and cofactors. Second, the MD simulations reveal that dimeric MEZ subunits alternate between open and closed states, and that MEZ can stably bind its NAD(P)+ cofactor in multiple conformations, including an inactive, compact NAD+ form. Together the structure of MEZ and insights from its dynamics can be harnessed to inform the design of MEZ inhibitors that target Mtb and not human malic enzyme homologues.
Collapse
|
38
|
SAMPL7 Host-Guest Challenge Overview: assessing the reliability of polarizable and non-polarizable methods for binding free energy calculations. J Comput Aided Mol Des 2021; 35:1-35. [PMID: 33392951 PMCID: PMC8121194 DOI: 10.1007/s10822-020-00363-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2020] [Accepted: 11/24/2020] [Indexed: 12/15/2022]
Abstract
The SAMPL challenges focus on testing and driving progress of computational methods to help guide pharmaceutical drug discovery. However, assessment of methods for predicting binding affinities is often hampered by computational challenges such as conformational sampling, protonation state uncertainties, variation in test sets selected, and even lack of high quality experimental data. SAMPL blind challenges have thus frequently included a component focusing on host-guest binding, which removes some of these challenges while still focusing on molecular recognition. Here, we report on the results of the SAMPL7 blind prediction challenge for host-guest affinity prediction. In this study, we focused on three different host-guest categories-a familiar deep cavity cavitand series which has been featured in several prior challenges (where we examine binding of a series of guests to two hosts), a new series of cyclodextrin derivatives which are monofunctionalized around the rim to add amino acid-like functionality (where we examine binding of two guests to a series of hosts), and binding of a series of guests to a new acyclic TrimerTrip host which is related to previous cucurbituril hosts. Many predictions used methods based on molecular simulations, and overall success was mixed, though several methods stood out. As in SAMPL6, we find that one strategy for achieving reasonable accuracy here was to make empirical corrections to binding predictions based on previous data for host categories which have been studied well before, though this can be of limited value when new systems are included. Additionally, we found that alchemical free energy methods using the AMOEBA polarizable force field had considerable success for the two host categories in which they participated. The new TrimerTrip system was also found to introduce some sampling problems, because multiple conformations may be relevant to binding and interconvert only slowly. Overall, results in this challenge tentatively suggest that further investigation of polarizable force fields for these challenges may be warranted.
Collapse
|
39
|
Abstract
Background: Force fields are used in a wide variety of contexts for classical molecular simulation, including studies on protein-ligand binding, membrane permeation, and thermophysical property prediction. The quality of these studies relies on the quality of the force fields used to represent the systems. Methods: Focusing on small molecules of fewer than 50 heavy atoms, our aim in this work is to compare nine force fields: GAFF, GAFF2, MMFF94, MMFF94S, OPLS3e, SMIRNOFF99Frosst, and the Open Force Field Parsley, versions 1.0, 1.1, and 1.2. On a dataset comprising 22,675 molecular structures of 3,271 molecules, we analyzed force field-optimized geometries and conformer energies compared to reference quantum mechanical (QM) data. Results: We show that while OPLS3e performs best, the latest Open Force Field Parsley release is approaching a comparable level of accuracy in reproducing QM geometries and energetics for this set of molecules. Meanwhile, the performance of established force fields such as MMFF94S and GAFF2 is generally somewhat worse. We also find that the series of recent Open Force Field versions provide significant increases in accuracy. Conclusions: This study provides an extensive test of the performance of different molecular mechanics force fields on a diverse molecule set, and highlights two (OPLS3e and OpenFF 1.2) that perform better than the others tested on the present comparison. Our molecule set and results are available for other researchers to use in testing.
Collapse
|
40
|
Kinetics and free energy of ligand dissociation using weighted ensemble milestoning. J Chem Phys 2020; 153:154117. [PMID: 33092382 DOI: 10.1063/5.0021953] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
We consider the recently developed weighted ensemble milestoning (WEM) scheme [D. Ray and I. Andricioaei, J. Chem. Phys. 152, 234114 (2020)] and test its capability of simulating ligand-receptor dissociation dynamics. We performed WEM simulations on the following host-guest systems: Na+/Cl- ion pair and 4-hydroxy-2-butanone ligand with FK506 binding protein. As a proof of principle, we show that the WEM formalism reproduces the Na+/Cl- ion pair dissociation timescale and the free energy profile obtained from long conventional MD simulation. To increase the accuracy of WEM calculations applied to kinetics and thermodynamics in protein-ligand binding, we introduced a modified WEM scheme called weighted ensemble milestoning with restraint release (WEM-RR), which can increase the number of starting points per milestone without adding additional computational cost. WEM-RR calculations obtained a ligand residence time and binding free energy in agreement with experimental and previous computational results. Moreover, using the milestoning framework, the binding time and rate constants, dissociation constants, and committor probabilities could also be calculated at a low computational cost. We also present an analytical approach for estimating the association rate constant (kon) when binding is primarily diffusion driven. We show that the WEM method can efficiently calculate multiple experimental observables describing ligand-receptor binding/unbinding and is a promising candidate for computer-aided inhibitor design.
Collapse
|
41
|
Insights on small molecule binding to the Hv1 proton channel from free energy calculations with molecular dynamics simulations. Sci Rep 2020; 10:13587. [PMID: 32788614 PMCID: PMC7423955 DOI: 10.1038/s41598-020-70369-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2019] [Accepted: 06/23/2020] [Indexed: 02/07/2023] Open
Abstract
Hv1 is a voltage-gated proton channel whose main function is to facilitate extrusion of protons from the cell. The development of effective channel blockers for Hv1 can lead to new therapeutics for the treatment of maladies related to Hv1 dysfunction. Although the mechanism of proton permeation in Hv1 remains to be elucidated, a series of small molecules have been discovered to inhibit Hv1. Here, we computed relative binding free energies of a prototypical Hv1 blocker on a model of human Hv1 in an open state. We used alchemical free energy perturbation techniques based on atomistic molecular dynamics simulations. The results support our proposed open state model and shed light on the preferred tautomeric state of the channel blocker. This work lays the groundwork for future studies on adapting the blocker molecule for more effective inhibition of Hv1.
Collapse
|
42
|
Assessing the accuracy of octanol-water partition coefficient predictions in the SAMPL6 Part II log P Challenge. J Comput Aided Mol Des 2020; 34:335-370. [PMID: 32107702 PMCID: PMC7138020 DOI: 10.1007/s10822-020-00295-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2019] [Accepted: 01/24/2020] [Indexed: 12/12/2022]
Abstract
The SAMPL Challenges aim to focus the biomolecular and physical modeling community on issues that limit the accuracy of predictive modeling of protein-ligand binding for rational drug design. In the SAMPL5 log D Challenge, designed to benchmark the accuracy of methods for predicting drug-like small molecule transfer free energies from aqueous to nonpolar phases, participants found it difficult to make accurate predictions due to the complexity of protonation state issues. In the SAMPL6 log P Challenge, we asked participants to make blind predictions of the octanol-water partition coefficients of neutral species of 11 compounds and assessed how well these methods performed absent the complication of protonation state effects. This challenge builds on the SAMPL6 p[Formula: see text] Challenge, which asked participants to predict p[Formula: see text] values of a superset of the compounds considered in this log P challenge. Blind prediction sets of 91 prediction methods were collected from 27 research groups, spanning a variety of quantum mechanics (QM) or molecular mechanics (MM)-based physical methods, knowledge-based empirical methods, and mixed approaches. There was a 50% increase in the number of participating groups and a 20% increase in the number of submissions compared to the SAMPL5 log D Challenge. Overall, the accuracy of octanol-water log P predictions in SAMPL6 Challenge was higher than cyclohexane-water log D predictions in SAMPL5, likely because modeling only the neutral species was necessary for log P and several categories of method benefited from the vast amounts of experimental octanol-water log P data. There were many highly accurate methods: 10 diverse methods achieved RMSE less than 0.5 log P units. These included QM-based methods, empirical methods, and mixed methods with physical modeling supported with empirical corrections. A comparison of physical modeling methods showed that QM-based methods outperformed MM-based methods. The average RMSE of the most accurate five MM-based, QM-based, empirical, and mixed approach methods based on RMSE were 0.92 ± 0.13, 0.48 ± 0.06, 0.47 ± 0.05, and 0.50 ± 0.06, respectively.
Collapse
|
43
|
Fragment Pose Prediction Using Non-equilibrium Candidate Monte Carlo and Molecular Dynamics Simulations. J Chem Theory Comput 2020; 16:2778-2794. [PMID: 32167763 DOI: 10.1021/acs.jctc.9b01096] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
Part of early stage drug discovery involves determining how molecules may bind to the target protein. Through understanding where and how molecules bind, chemists can begin to build ideas on how to design improvements to increase binding affinities. In this retrospective study, we compare how computational approaches like docking, molecular dynamics (MD) simulations, and a non-equilibrium candidate Monte Carlo (NCMC)-based method (NCMC + MD) perform in predicting binding modes for a set of 12 fragment-like molecules, which bind to soluble epoxide hydrolase. We evaluate each method's effectiveness in identifying the dominant binding mode and finding additional binding modes (if any). Then, we compare our predicted binding modes to experimentally obtained X-ray crystal structures. We dock each of the 12 small molecules into the apo-protein crystal structure and then run simulations up to 1 μs each. Small and fragment-like molecules likely have smaller energy barriers separating different binding modes by virtue of relatively fewer and weaker interactions relative to drug-like molecules and thus likely undergo more rapid binding mode transitions. We expect, thus, to see more rapid transitions between binding modes in our study. Following this, we build Markov State Models to define our stable ligand binding modes. We investigate if adequate sampling of ligand binding modes and transitions between them can occur at the microsecond timescale using traditional MD or a hybrid NCMC+MD simulation approach. Our findings suggest that even with small fragment-like molecules, we fail to sample all the crystallographic binding modes using microsecond MD simulations, but using NCMC+MD, we have better success in sampling the crystal structure while obtaining the correct populations.
Collapse
|
44
|
Sampling Conformational Changes of Bound Ligands Using Nonequilibrium Candidate Monte Carlo and Molecular Dynamics. J Chem Theory Comput 2020; 16:1854-1865. [PMID: 32058713 DOI: 10.1021/acs.jctc.9b01066] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
Abstract
Flexible ligands often have multiple binding modes or bound conformations that differ by rotation of a portion of the molecule around internal rotatable bonds. Knowledge of these binding modes is important for understanding the interactions stabilizing the ligand in the binding pocket, and other studies indicate it is important for calculating accurate binding affinities. In this work, we use a hybrid molecular dynamics (MD)/nonequilibrium candidate Monte Carlo (NCMC) method to sample the different binding modes of several flexible ligands and also to estimate the population distribution of the modes. The NCMC move proposal is divided into three parts. The flexible part of the ligand is alchemically turned off by decreasing the electrostatics and steric interactions gradually, followed by rotating the rotatable bond by a random angle and then slowly turning the ligand back on to its fully interacting state. The alchemical steps prior to and after the move proposal help the surrounding protein and water atoms in the binding pocket relax around the proposed ligand conformation and increase move acceptance rates. The protein-ligand system is propagated using classical MD in between the NCMC proposals. Using this MD/NCMC method, we were able to correctly reproduce the different binding modes of inhibitors binding to two kinase targets-c-Jun N-terminal kinase-1 and cyclin-dependent kinase 2-at a much lower computational cost compared to conventional MD and umbrella sampling. This method is available as a part of the BLUES software package.
Collapse
|
45
|
D3R Grand Challenge 4: ligand similarity and MM-GBSA-based pose prediction and affinity ranking for BACE-1 inhibitors. J Comput Aided Mol Des 2020; 34:163-177. [PMID: 31781990 PMCID: PMC8208075 DOI: 10.1007/s10822-019-00249-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2019] [Accepted: 11/06/2019] [Indexed: 01/05/2023]
Abstract
The Drug Design Data Resource (D3R) Grand Challenges present an opportunity to assess, in the context of a blind predictive challenge, the accuracy and the limits of tools and methodologies designed to help guide pharmaceutical drug discovery projects. Here, we report the results of our participation in the D3R Grand Challenge 4 (GC4), which focused on predicting the binding poses and affinity ranking for compounds targeting the [Formula: see text]-amyloid precursor protein (BACE-1). Our ligand similarity-based protocol using HYBRID (OpenEye Scientific Software) successfully identified poses close to the native binding mode for most of the ligands with less than 2 Å RMSD accuracy. Furthermore, we compared the performance of our HYBRID-based approach to that of AutoDock Vina and DOCK 6 and found that using a reference ligand to guide the docking process is a better strategy for pose prediction and helped HYBRID to perform better here. We also conducted end-point free energy estimates on molecules dynamics based ensembles of protein-ligand complexes using molecular mechanics combined with generalized Born surface area method (MM-GBSA). We found that the binding affinity ranking based on MM-GBSA scores have poor correlation with the experimental values. Finally, the main lessons from our participation in D3R GC4 are: (i) the generation of the macrocyclic conformers is a key step for successful pose prediction, (ii) the protonation states of the BACE-1 binding site should be treated carefully, (iii) the MM-GBSA method could not discriminate well between different predicted binding poses, and (iv) the MM-GBSA method does not perform well at predicting protein-ligand binding affinities here.
Collapse
|
46
|
Optimal designs for pairwise calculation: An application to free energy perturbation in minimizing prediction variability. J Comput Chem 2020; 41:247-257. [PMID: 31721260 PMCID: PMC6917845 DOI: 10.1002/jcc.26095] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2019] [Revised: 09/04/2019] [Accepted: 10/07/2019] [Indexed: 01/18/2023]
Abstract
Pairwise-based methods such as the free energy perturbation (FEP) method have been widely deployed to compute the binding free energy differences between two similar host-guest complexes. The calculated pairwise free energy difference is either directly adopted or transformed to absolute binding free energy for molecule rank ordering. We investigated, through both analytic derivations and simulations, how the selection of pairs in the experiment could impact the overall prediction precision. Our studies showed that (1) the estimated absolute binding free energy ( Δ G ^ ) derived from calculated pairwise differences (ΔΔG) through weighted least squares fitting is more precise in prediction than the pairwise difference values when the number of pairs is more than the number of ligands and (2) prediction precision is influenced by both the total number of pairs and the specifically selected pairs, the latter being critically important when the number of calculated pairs is limited. Furthermore, we applied optimal experimental design in pair selection and found that the optimally selected pairs can outperform randomly selected pairs in prediction precision. In an illustrative example, we showed that, upon weighing ligand structure similarity into design optimization, the weighted optimal designs are more efficient than the literature reported designs. This work provides a new approach to assess retrospective pairwise-based prediction results, and a method to design new prospective pairwise-based experiments for molecular lead optimization. © 2019 Wiley Periodicals, Inc.
Collapse
|
47
|
The SAMPL6 SAMPLing challenge: assessing the reliability and efficiency of binding free energy calculations. J Comput Aided Mol Des 2020; 34:601-633. [PMID: 31984465 DOI: 10.1007/s10822-020-00290-5] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2019] [Accepted: 01/13/2020] [Indexed: 12/22/2022]
Abstract
Approaches for computing small molecule binding free energies based on molecular simulations are now regularly being employed by academic and industry practitioners to study receptor-ligand systems and prioritize the synthesis of small molecules for ligand design. Given the variety of methods and implementations available, it is natural to ask how the convergence rates and final predictions of these methods compare. In this study, we describe the concept and results for the SAMPL6 SAMPLing challenge, the first challenge from the SAMPL series focusing on the assessment of convergence properties and reproducibility of binding free energy methodologies. We provided parameter files, partial charges, and multiple initial geometries for two octa-acid (OA) and one cucurbit[8]uril (CB8) host-guest systems. Participants submitted binding free energy predictions as a function of the number of force and energy evaluations for seven different alchemical and physical-pathway (i.e., potential of mean force and weighted ensemble of trajectories) methodologies implemented with the GROMACS, AMBER, NAMD, or OpenMM simulation engines. To rank the methods, we developed an efficiency statistic based on bias and variance of the free energy estimates. For the two small OA binders, the free energy estimates computed with alchemical and potential of mean force approaches show relatively similar variance and bias as a function of the number of energy/force evaluations, with the attach-pull-release (APR), GROMACS expanded ensemble, and NAMD double decoupling submissions obtaining the greatest efficiency. The differences between the methods increase when analyzing the CB8-quinine system, where both the guest size and correlation times for system dynamics are greater. For this system, nonequilibrium switching (GROMACS/NS-DS/SB) obtained the overall highest efficiency. Surprisingly, the results suggest that specifying force field parameters and partial charges is insufficient to generally ensure reproducibility, and we observe differences between seemingly converged predictions ranging approximately from 0.3 to 1.0 kcal/mol, even with almost identical simulations parameters and system setup (e.g., Lennard-Jones cutoff, ionic composition). Further work will be required to completely identify the exact source of these discrepancies. Among the conclusions emerging from the data, we found that Hamiltonian replica exchange-while displaying very small variance-can be affected by a slowly-decaying bias that depends on the initial population of the replicas, that bidirectional estimators are significantly more efficient than unidirectional estimators for nonequilibrium free energy calculations for systems considered, and that the Berendsen barostat introduces non-negligible artifacts in expanded ensemble simulations.
Collapse
|
48
|
Best Practices for Alchemical Free Energy Calculations [Article v1.0]. LIVING JOURNAL OF COMPUTATIONAL MOLECULAR SCIENCE 2020; 2:18378. [PMID: 34458687 PMCID: PMC8388617 DOI: 10.33011/livecoms.2.1.18378] [Citation(s) in RCA: 108] [Impact Index Per Article: 27.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
Alchemical free energy calculations are a useful tool for predicting free energy differences associated with the transfer of molecules from one environment to another. The hallmark of these methods is the use of "bridging" potential energy functions representing alchemical intermediate states that cannot exist as real chemical species. The data collected from these bridging alchemical thermodynamic states allows the efficient computation of transfer free energies (or differences in transfer free energies) with orders of magnitude less simulation time than simulating the transfer process directly. While these methods are highly flexible, care must be taken in avoiding common pitfalls to ensure that computed free energy differences can be robust and reproducible for the chosen force field, and that appropriate corrections are included to permit direct comparison with experimental data. In this paper, we review current best practices for several popular application domains of alchemical free energy calculations performed with equilibrium simulations, in particular relative and absolute small molecule binding free energy calculations to biomolecular targets.
Collapse
|
49
|
Octanol-water partition coefficient measurements for the SAMPL6 blind prediction challenge. J Comput Aided Mol Des 2019; 34:405-420. [PMID: 31858363 DOI: 10.1007/s10822-019-00271-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2019] [Accepted: 12/07/2019] [Indexed: 11/24/2022]
Abstract
Partition coefficients describe the equilibrium partitioning of a single, defined charge state of a solute between two liquid phases in contact, typically a neutral solute. Octanol-water partition coefficients ([Formula: see text]), or their logarithms (log P), are frequently used as a measure of lipophilicity in drug discovery. The partition coefficient is a physicochemical property that captures the thermodynamics of relative solvation between aqueous and nonpolar phases, and therefore provides an excellent test for physics-based computational models that predict properties of pharmaceutical relevance such as protein-ligand binding affinities or hydration/solvation free energies. The SAMPL6 Part II octanol-water partition coefficient prediction challenge used a subset of kinase inhibitor fragment-like compounds from the SAMPL6 [Formula: see text] prediction challenge in a blind experimental benchmark. Following experimental data collection, the partition coefficient dataset was kept blinded until all predictions were collected from participating computational chemistry groups. A total of 91 submissions were received from 27 participating research groups. This paper presents the octanol-water log P dataset for this SAMPL6 Part II partition coefficient challenge, which consisted of 11 compounds (six 4-aminoquinazolines, two benzimidazole, one pyrazolo[3,4-d]pyrimidine, one pyridine, one 2-oxoquinoline substructure containing compounds) with log P values in the range of 1.95-4.09. We describe the potentiometric log P measurement protocol used to collect this dataset using a Sirius T3, discuss the limitations of this experimental approach, and share suggestions for future log P data collection efforts for the evaluation of computational methods.
Collapse
|
50
|
Comparison of affinity ranking using AutoDock-GPU and MM-GBSA scores for BACE-1 inhibitors in the D3R Grand Challenge 4. J Comput Aided Mol Des 2019; 33:1011-1020. [PMID: 31691919 PMCID: PMC7027993 DOI: 10.1007/s10822-019-00240-w] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2019] [Accepted: 10/21/2019] [Indexed: 11/25/2022]
Abstract
Molecular docking has been successfully used in computer-aided molecular design projects for the identification of ligand poses within protein binding sites. However, relying on docking scores to rank different ligands with respect to their experimental affinities might not be sufficient. It is believed that the binding scores calculated using molecular mechanics combined with the Poisson-Boltzman surface area (MM-PBSA) or generalized Born surface area (MM-GBSA) can predict binding affinities more accurately. In this perspective, we decided to take part in Stage 2 of the Drug Design Data Resource (D3R) Grand Challenge 4 (GC4) to compare the performance of a quick scoring function, AutoDock4, to that of MM-GBSA in predicting the binding affinities of a set of [Formula: see text]-Amyloid Cleaving Enzyme 1 (BACE-1) ligands. Our results show that re-scoring docking poses using MM-GBSA did not improve the correlation with experimental affinities. We further did a retrospective analysis of the results and found that our MM-GBSA protocol is sensitive to details in the protein-ligand system: (i) neutral ligands are more adapted to MM-GBSA calculations than charged ligands, (ii) predicted binding affinities depend on the initial conformation of the BACE-1 receptor, (iii) protonating the aspartyl dyad of BACE-1 correctly results in more accurate binding affinity predictions.
Collapse
|