1
On the NS-DSSB unidirectional estimates in the SAMPL6 SAMPLing challenge. J Comput Aided Mol Des 2021; 35:1055-1065. [PMID: 34625885 PMCID: PMC8523005 DOI: 10.1007/s10822-021-00419-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/17/2020] [Accepted: 09/27/2021] [Indexed: 12/03/2022]
Abstract
In the context of the recent SAMPL6 SAMPLing challenge (Rizzi et al. 2020 in J Comput Aided Mol Des 34:601–633) aimed at assessing convergence properties and reproducibility of molecular dynamics binding free energy methodologies, we propose a simple explanation of the severe errors observed in the nonequilibrium switch double-system-single-box (NS-DSSB) approach when using unidirectional estimates. At the same time, we suggest a straightforward and minimal modification of the NS-DSSB protocol for obtaining reliable unidirectional estimates for the process where the ligand is decoupled in the bound state and recoupled in the bulk.
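For orientation, the usual unidirectional nonequilibrium estimate is the Jarzynski exponential work average, ΔF = −kT ln⟨exp(−W/kT)⟩. The Python sketch below works in units of kT and draws hypothetical Gaussian work values constructed to satisfy the Crooks relation. It illustrates how the estimate behaves as the work spread grows; it is not the NS-DSSB protocol or the modification proposed in the paper.

```python
import math
import random

def jarzynski_free_energy(works_kT):
    """Unidirectional Jarzynski estimate of Delta F (kT) from work values (kT).

    A log-sum-exp shift by the minimum work keeps exp(-W) from underflowing.
    """
    w_min = min(works_kT)
    n = len(works_kT)
    s = sum(math.exp(-(w - w_min)) for w in works_kT)
    return w_min - math.log(s / n)

def gaussian_works(delta_f, sigma, n, rng):
    """Hypothetical forward works for a Gaussian distribution obeying Crooks:
    mean = Delta F + sigma**2 / 2 (all quantities in kT)."""
    return [rng.gauss(delta_f + 0.5 * sigma ** 2, sigma) for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    true_df = 5.0
    for sigma in (0.5, 1.0, 3.0):
        works = gaussian_works(true_df, sigma, 1000, rng)
        est = jarzynski_free_energy(works)
        print(f"sigma={sigma}: estimate={est:.2f} kT (true {true_df} kT)")
```

With a narrow work distribution the estimate lands close to the true value. As σ grows, the average becomes dominated by rare low-work trajectories, and a finite sample tends to overestimate ΔF.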
2
A deep learning approach for the blind logP prediction in SAMPL6 challenge. J Comput Aided Mol Des 2020; 34:535-542. [PMID: 32002779 PMCID: PMC8689685 DOI: 10.1007/s10822-020-00292-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/16/2019] [Accepted: 01/17/2020] [Indexed: 12/14/2022]
Abstract
The water-octanol partition coefficient serves as a measure of the lipophilicity of a molecule and is important in drug discovery. A novel method for the computational prediction of the logarithm of the partition coefficient (logP) has been developed using molecular fingerprints and a deep neural network. The machine learning model was trained on a dataset of 12,000 molecules and tested on 2000 molecules. In this article, we present our results for the blind prediction of logP in the SAMPL6 challenge. While the best submission achieved an RMSE of 0.41 logP units, our submission had an RMSE of 0.61 logP units. Overall, we ranked in the top quarter of the 92 submissions. Our results show that the deep learning model can be used as a fast, accurate, and robust method for high-throughput prediction of the logP of small molecules.
3
SAMPL6 logP challenge: machine learning and quantum mechanical approaches. J Comput Aided Mol Des 2020; 34:495-510. [PMID: 32002780 PMCID: PMC10817701 DOI: 10.1007/s10822-020-00287-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/16/2019] [Accepted: 01/08/2020] [Indexed: 10/25/2022]
Abstract
Two types of approaches were used to predict the logP values of 11 molecules as part of the SAMPL6 logP blind prediction challenge: (a) approaches that combine quantitative structure-activity relationships, quantum mechanical electronic structure methods, and machine learning, and (b) electronic structure vertical solvation approaches. Using electronic structures optimized with density functional theory (DFT), several molecular descriptors were calculated for each molecule, including van der Waals areas and volumes, HOMO/LUMO energies, dipole moments, polarizabilities, and electrophilic and nucleophilic superdelocalizabilities. A multilinear regression model and a partial least squares model were trained on a set of 97 molecules. In addition, descriptors were generated using the Molecular Operating Environment (MOE) and used to create additional machine learning models. The electronic structure vertical solvation approaches considered include DFT and the domain-based local pair natural orbital methods combined with the solvated variant of the correlation consistent composite approach.
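The multilinear regression step amounts to an ordinary least-squares fit of logP on a descriptor matrix. This dependency-free sketch solves the normal equations directly; the descriptor values used to exercise it are hypothetical, and the partial least squares and MOE-based models are not reproduced.

```python
def fit_multilinear(X, y):
    """Ordinary least squares for y ~ b0 + sum_j b_j * x_j via the normal equations.

    X: list of descriptor rows (hypothetical, e.g. dipole moment, polarizability).
    Returns the coefficient list [b0, b1, ..., bp].
    """
    # Prepend a column of ones so b0 acts as the intercept.
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Normal equations: (A^T A) b = A^T y.
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    M = [ata[i] + [aty[i]] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b

def predict(b, row):
    """Apply fitted coefficients to one descriptor row."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))
```

In practice `numpy.linalg.lstsq` would be the more robust choice, because forming AᵀA squares the condition number of the descriptor matrix.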
4
Multi-phase Boltzmann weighting: accounting for local inhomogeneity in molecular simulations of water-octanol partition coefficients in the SAMPL6 challenge. J Comput Aided Mol Des 2020; 34:471-483. [PMID: 32060677 PMCID: PMC8750956 DOI: 10.1007/s10822-020-00285-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/15/2019] [Accepted: 01/08/2020] [Indexed: 02/08/2023]
Abstract
Accurately computing partition coefficients is a pivotal part of drug discovery. Specifically, octanol-water partition coefficients provide information about the hydrophobicity of drug-like molecules and serve as a de facto proxy for membrane permeability. However, one challenge in computing partition coefficients is the need to capture various microscopic environments. These include regions of largely bulk solvent (i.e., either water or octanol), regions where octanol is saturated with water, and areas of higher salt concentration. Tautomeric effects also require consideration. Thus, we present a Boltzmann weighting approach that incorporates transfer free energies across varying microscopic media, as well as varying tautomeric states, to compute partition coefficients in the SAMPL6 challenge.
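The abstract does not spell out the weighting formula, so the following is only a generic sketch. It assumes each state (a tautomer or a microenvironment) has a free energy in each phase relative to a common reference, combines the states per phase with a Boltzmann (log-sum-exp) sum, and takes the transfer free energy as the difference of the two combined values. The function names and inputs are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def boltzmann_free_energy(energies, temperature=298.15):
    """Combine state free energies (kcal/mol) into one effective free energy,
    G_eff = -RT ln sum_i exp(-G_i / RT), using a log-sum-exp shift."""
    rt = R_KCAL * temperature
    g_min = min(energies)
    return g_min - rt * math.log(sum(math.exp(-(g - g_min) / rt) for g in energies))

def effective_transfer_free_energy(g_water, g_octanol, temperature=298.15):
    """Water -> octanol transfer free energy when several states contribute.

    g_water[i], g_octanol[i]: free energy of state i in each phase, relative to
    a common reference (hypothetical inputs, kcal/mol).
    """
    return (boltzmann_free_energy(g_octanol, temperature)
            - boltzmann_free_energy(g_water, temperature))
```

The low-energy states dominate each sum, and log P then follows as −ΔG_transfer/(RT ln 10).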
5
A remark on the efficiency of the double-system/single-box nonequilibrium approach in the SAMPL6 SAMPLing challenge. J Comput Aided Mol Des 2020; 34:635-639. [PMID: 32277315 DOI: 10.1007/s10822-020-00312-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/04/2019] [Accepted: 03/30/2020] [Indexed: 11/25/2022]
Abstract
The alchemical nonequilibrium switching technique was one of several methods in the top tier of performance in the recent SAMPL6 SAMPLing challenge in both accuracy and efficiency. In this paper, in the context of nonequilibrium alchemical switching, we compare the efficiency of the double-system/single-box (DSSB) approach (used in the SAMPL6 challenges) to the standard single-system/double-box (SSDB) method. Exploiting the Crooks theorem in a simple but effective test case, we show analytically that the DSSB approach is almost twice as efficient as SSDB for slow, near-equilibrium switching, but gives essentially no gain over the conventional SSDB approach when the variance of the work distribution exceeds a few [Formula: see text], with the potential to produce artifacts and entanglements if not judiciously implemented.
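A common analytically tractable case, assumed here and not necessarily the test case used in the paper, is a Gaussian work distribution. Combined with the Crooks theorem, it gives closed-form estimators and ties the mean dissipation directly to the work variance. The sketch works in units of kT and does not reproduce the paper's DSSB versus SSDB efficiency analysis.

```python
import statistics

def gaussian_unidirectional(works_kT):
    """Second-order cumulant estimate, exact for Gaussian work:
    Delta F = <W> - var(W) / 2 (all in kT)."""
    return statistics.fmean(works_kT) - 0.5 * statistics.variance(works_kT)

def gaussian_bidirectional(forward_kT, reverse_kT):
    """For Gaussian forward and reverse works with equal variance, Crooks places
    Delta F midway between <W_F> and -<W_R>."""
    return 0.5 * (statistics.fmean(forward_kT) - statistics.fmean(reverse_kT))

def mean_dissipation(sigma_kT):
    """Mean dissipated work <W> - Delta F = sigma**2 / 2 for Gaussian work (kT)."""
    return 0.5 * sigma_kT ** 2
```

Because the mean dissipation grows as σ²/2, unidirectional estimates degrade quickly as the work distribution broadens.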
6
Standard state free energies, not pKas, are ideal for describing small molecule protonation and tautomeric states. J Comput Aided Mol Des 2020; 34:561-573. [PMID: 32052350 DOI: 10.1007/s10822-020-00280-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/21/2019] [Accepted: 01/08/2020] [Indexed: 12/14/2022]
Abstract
The pKa is the standard measure used to describe the aqueous proton affinity of a compound, indicating the proton concentration (pH) at which two protonation states (e.g. A- and AH) have equal free energy. However, compounds can have additional protonation states (e.g. AH2+), and may assume multiple tautomeric forms, with the protons in different positions (microstates). Macroscopic pKas give the pH where the molecule changes its total number of protons, while microscopic pKas identify the tautomeric states involved. Because tautomers have the same number of protons, the free energy difference between them, and hence their relative probability, is pH independent, so no pKa connects them. The question arises: What is the best way to describe the protonation equilibria of a complex molecule in any pH range? Knowing the number of protons and the relative free energy of all microstates at a single pH, ∆G°, provides all the information needed to determine the free energy, and thus the probability, of each microstate at each pH. Microstate probabilities as a function of pH generate titration curves that highlight the low-energy, observable microstates, which can then be compared with experiment. A network description connecting microstates as nodes makes it straightforward to test the thermodynamic consistency of microstate free energies. The utility of this analysis is illustrated by a description of one molecule from the SAMPL6 Blind pKa Prediction Challenge. Analysis of microstate ∆G°s also provides a more compact way to archive and compare the pH-dependent behavior of compounds with multiple protonatable sites.
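The bookkeeping described above can be sketched in a few lines of Python. Free energies are expressed in units of RT ln 10, so one unit equals one pKa unit, and the example microstates are hypothetical. The sign convention assumed here is that each extra proton shifts a microstate's free energy by +(pH − pH_ref), so more-protonated states are disfavored at high pH.

```python
def microstate_populations(states, ph, ph_ref=0.0):
    """Equilibrium populations of microstates at a given pH.

    states: list of (n_protons, dG_ref) pairs, with dG_ref the free energy of
    each microstate at ph_ref in units of RT*ln(10) (pKa units).
    """
    # Shift each state's free energy by its proton count times the pH change.
    energies = [g + n * (ph - ph_ref) for n, g in states]
    e_min = min(energies)
    # Boltzmann weight exp(-G/RT) equals 10**(-G) in these units.
    weights = [10.0 ** (-(e - e_min)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]
```

For a monoprotic acid with pKa 5, the states `(0, 0.0)` and `(1, -5.0)` are equally populated at pH 5. Two tautomers with the same proton count keep a fixed population ratio at every pH, which is why no pKa connects them.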
7
Predicting octanol/water partition coefficients for the SAMPL6 challenge using the SM12, SM8, and SMD solvation models. J Comput Aided Mol Des 2020; 34:575-588. [PMID: 32002781 DOI: 10.1007/s10822-020-00293-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/29/2019] [Accepted: 01/17/2020] [Indexed: 10/25/2022]
Abstract
Blind predictions of octanol/water partition coefficients at 298 K for 11 kinase inhibitor fragment-like compounds were made for the SAMPL6 challenge. We used the conventional, "untrained", free energy-based approach, wherein the octanol/water partition coefficient was computed directly from the difference between the solvation free energies in water and 1-octanol. We additionally proposed and used two different forms of a "trained" approach. Physically, the goal of the trained approach is to relate the partition coefficient computed using pure 1-octanol to that using water-saturated 1-octanol. In the first approach, we assumed that the partition coefficients using water-saturated 1-octanol and pure 1-octanol are linearly correlated. In the second approach, we assumed that the solvation free energy in water-saturated 1-octanol can be written as a linear combination of the solvation free energies in pure water and 1-octanol. In all cases, the solvation free energies were computed using electronic structure calculations with the SM12, SM8, and SMD universal solvation models. In the context of the present study, our results in general do not support the additional effort of the trained approach.
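Both routes described above reduce to simple arithmetic on solvation free energies. The sketch assumes free energies in kcal/mol at 298.15 K. The coefficients `a` and `b` for the second trained form are hypothetical placeholders for fitted values, which the abstract does not report; the first trained form (a linear map between the two log P values) would be an analogous one-line fit and is not shown.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def log_p_from_solvation(dg_water, dg_octanol, temperature=298.15):
    """'Untrained' route: log P = (dG_solv,water - dG_solv,octanol) / (RT ln 10).

    Inputs in kcal/mol; a more favorable octanol solvation gives a positive log P.
    """
    return (dg_water - dg_octanol) / (R_KCAL * temperature * math.log(10.0))

def wet_octanol_solvation(dg_water, dg_octanol, a, b):
    """Second 'trained' form: dG(wet octanol) = a * dG(water) + b * dG(dry octanol).

    a and b are hypothetical coefficients to be fitted against reference data.
    """
    return a * dg_water + b * dg_octanol
```

The trained prediction is then `log_p_from_solvation(dg_water, wet_octanol_solvation(dg_water, dg_octanol, a, b))`.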
8
The SAMPL6 SAMPLing challenge: assessing the reliability and efficiency of binding free energy calculations. J Comput Aided Mol Des 2020; 34:601-633. [PMID: 31984465 DOI: 10.1007/s10822-020-00290-5] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Received: 10/11/2019] [Accepted: 01/13/2020] [Indexed: 12/22/2022]
Abstract
Approaches for computing small molecule binding free energies based on molecular simulations are now regularly being employed by academic and industry practitioners to study receptor-ligand systems and prioritize the synthesis of small molecules for ligand design. Given the variety of methods and implementations available, it is natural to ask how the convergence rates and final predictions of these methods compare. In this study, we describe the concept and results for the SAMPL6 SAMPLing challenge, the first challenge from the SAMPL series focusing on the assessment of convergence properties and reproducibility of binding free energy methodologies. We provided parameter files, partial charges, and multiple initial geometries for two octa-acid (OA) and one cucurbit[8]uril (CB8) host-guest systems. Participants submitted binding free energy predictions as a function of the number of force and energy evaluations for seven different alchemical and physical-pathway (i.e., potential of mean force and weighted ensemble of trajectories) methodologies implemented with the GROMACS, AMBER, NAMD, or OpenMM simulation engines. To rank the methods, we developed an efficiency statistic based on bias and variance of the free energy estimates. For the two small OA binders, the free energy estimates computed with alchemical and potential of mean force approaches show relatively similar variance and bias as a function of the number of energy/force evaluations, with the attach-pull-release (APR), GROMACS expanded ensemble, and NAMD double decoupling submissions obtaining the greatest efficiency. The differences between the methods increase when analyzing the CB8-quinine system, where both the guest size and correlation times for system dynamics are greater. For this system, nonequilibrium switching (GROMACS/NS-DS/SB) obtained the overall highest efficiency. 
Surprisingly, the results suggest that specifying force field parameters and partial charges is insufficient to generally ensure reproducibility, and we observe differences between seemingly converged predictions ranging from approximately 0.3 to 1.0 kcal/mol, even with almost identical simulation parameters and system setup (e.g., Lennard-Jones cutoff, ionic composition). Further work will be required to completely identify the exact source of these discrepancies. Among the conclusions emerging from the data, we found that Hamiltonian replica exchange, while displaying very small variance, can be affected by a slowly decaying bias that depends on the initial population of the replicas; that bidirectional estimators are significantly more efficient than unidirectional estimators for nonequilibrium free energy calculations for the systems considered; and that the Berendsen barostat introduces non-negligible artifacts in expanded ensemble simulations.
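The standard bidirectional estimator for nonequilibrium work data is the Bennett acceptance ratio (BAR). The sketch below solves the BAR equation by bisection in units of kT. The challenge submissions may have used other bidirectional estimators, and the Gaussian work samples in the demo are hypothetical.

```python
import math
import random

def _fermi(x):
    """Numerically stable 1 / (1 + exp(x))."""
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def bar_free_energy(w_forward, w_reverse, tol=1e-10):
    """Bennett acceptance ratio estimate of Delta F; all quantities in kT.

    Solves sum_F fermi(M + W_F - dF) = sum_R fermi(-M + W_R + dF), with
    M = ln(n_F / n_R), by bisection. The left-minus-right imbalance increases
    monotonically with dF, so the root is unique.
    """
    m = math.log(len(w_forward) / len(w_reverse))

    def imbalance(df):
        lhs = sum(_fermi(m + w - df) for w in w_forward)
        rhs = sum(_fermi(-m + w + df) for w in w_reverse)
        return lhs - rhs

    # Bracket wide enough that the imbalance is negative at lo and positive at hi.
    lo = min(min(w_forward), -max(w_reverse)) - 50.0
    hi = max(max(w_forward), -min(w_reverse)) + 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if imbalance(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # Hypothetical Gaussian works consistent with the Crooks relation (kT).
    rng = random.Random(0)
    true_df, sigma = 3.0, 2.0
    wf = [rng.gauss(true_df + sigma ** 2 / 2, sigma) for _ in range(2000)]
    wr = [rng.gauss(-true_df + sigma ** 2 / 2, sigma) for _ in range(2000)]
    print(f"BAR estimate: {bar_free_energy(wf, wr):.3f} kT (true {true_df} kT)")
```

Unlike the exponential average, BAR weights forward and reverse samples where their work distributions overlap, which is why it typically needs far fewer switches for the same precision.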
9
The SAMPL6 challenge on predicting octanol-water partition coefficients from EC-RISM theory. J Comput Aided Mol Des 2020; 34:453-461. [PMID: 31981015 PMCID: PMC7125249 DOI: 10.1007/s10822-020-00283-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/15/2019] [Accepted: 01/08/2020] [Indexed: 12/14/2022]
Abstract
Results are reported for octanol–water partition coefficients (log P) of the neutral states of drug-like molecules provided during the SAMPL6 (Statistical Assessment of Modeling of Proteins and Ligands) blind prediction challenge, obtained by applying the “embedded cluster reference interaction site model” (EC-RISM) as a solvation model for quantum-chemical calculations. Following the strategy outlined in earlier SAMPL challenges, we first train 1- and 2-parameter water-free (“dry”) and water-saturated (“wet”) models for n-octanol solvation Gibbs energies against experimental values from the “Minnesota Solvation Database” (MNSOL), yielding a root mean square error (RMSE) of 1.5 kcal mol−1 for the best-performing 2-parameter wet model, while the optimal water model developed for the pKa part of the SAMPL6 challenge is kept unchanged (RMSE 1.6 kcal mol−1 for neutral compounds from a model trained on both neutral and ionic species). Applying these models to the blind prediction set yields a log P RMSE of less than 0.5 for our best model (2-parameter, wet). Further analysis reveals that a single compound, SM15, is responsible for most of the error; without it, the RMSE drops to 0.2. Since this is the only compound in the challenge dataset with a hydroxyl group, we investigate other alcohols for which Gibbs energies of solvation in both water and n-octanol are available in the MNSOL database, to demonstrate a systematic cause of error and to discuss strategies for improvement.
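How one compound can dominate an RMSE over 11 molecules is easy to check numerically. The sketch uses made-up values (ten compounds off by 0.2 log units and one off by 1.5), not the actual SAMPL6 data.

```python
import math

def rmse(predicted, experimental):
    """Root mean square error between paired predictions and measurements."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    sq = sum((p - e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(sq / len(predicted))

def rmse_without(predicted, experimental, index):
    """RMSE recomputed after dropping one compound, to gauge its influence."""
    keep = [i for i in range(len(predicted)) if i != index]
    return rmse([predicted[i] for i in keep], [experimental[i] for i in keep])
```

With the hypothetical errors above, the full RMSE is √(2.65/11) ≈ 0.49, and dropping the outlier leaves 0.2: squaring makes the single 1.5 error contribute more than the other ten combined.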
10
SAMPL6 host-guest binding affinities and binding poses from spherical-coordinates-biased simulations. J Comput Aided Mol Des 2020; 34:589-600. [PMID: 31974852 DOI: 10.1007/s10822-020-00294-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/19/2019] [Accepted: 01/17/2020] [Indexed: 10/25/2022]
Abstract
Host-guest binding is a challenging problem in computer simulation. The prediction of binding affinities between hosts and guests is an important part of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) challenges. In this work, the volume-based variant of well-tempered metadynamics is employed to calculate the binding affinities of the host-guest systems in the SAMPL6 challenge. By biasing the spherical coordinates describing the relative position of the host and the guest, the bias induced by the initial configuration vanishes and all possible binding poses are explored. The agreement between the predictions and the experimental results, together with the observation of new binding poses, indicates that the volume-based technique is a promising candidate for calculating binding free energies and searching for binding poses.
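The well-tempered ingredient can be sketched on a single collective variable. Each new Gaussian hill is scaled down by exp(−V(s)/(k_B ΔT)), and the free energy is recovered as F(s) = −((T + ΔT)/ΔT) V(s) up to a constant. This generic 1D sketch does not implement the volume-based variant or the three spherical coordinates used in the paper; the hill height `w0` (kcal/mol), width `sigma`, and bias temperature `delta_t` (K) are hypothetical parameters.

```python
import math

KB_KCAL = 1.987204e-3  # Boltzmann constant, kcal/(mol K)

class WellTemperedBias1D:
    """Minimal 1D well-tempered metadynamics bias (generic scheme)."""

    def __init__(self, w0, sigma, delta_t, temperature):
        self.w0, self.sigma = w0, sigma
        self.delta_t, self.temperature = delta_t, temperature
        self.hills = []  # list of (center, height) pairs

    def bias(self, s):
        """Current bias V(s): sum of all deposited Gaussian hills."""
        return sum(h * math.exp(-((s - c) ** 2) / (2.0 * self.sigma ** 2))
                   for c, h in self.hills)

    def deposit(self, s):
        """Add a hill at the current CV value s; its height shrinks where bias exists."""
        height = self.w0 * math.exp(-self.bias(s) / (KB_KCAL * self.delta_t))
        self.hills.append((s, height))
        return height

    def free_energy(self, s):
        """Estimate F(s) = -((T + dT) / dT) * V(s), up to an additive constant."""
        return -(self.temperature + self.delta_t) / self.delta_t * self.bias(s)
```

Repeated deposition at the same point yields progressively smaller hills, so the bias converges instead of growing without bound as in standard metadynamics.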
11
LogP prediction performance with the SMD solvation model and the M06 density functional family for SAMPL6 blind prediction challenge molecules. J Comput Aided Mol Des 2020; 34:511-522. [PMID: 31939103 DOI: 10.1007/s10822-020-00278-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/16/2019] [Accepted: 01/08/2020] [Indexed: 12/13/2022]
Abstract
This work presents a quantum mechanical model for predicting octanol-water partition coefficients of small protein-kinase inhibitor fragments as part of the SAMPL6 LogP Prediction Challenge. The model calculates solvation free energy differences using the M06-2X functional with SMD implicit solvation and the def2-SVP basis set. This model was identified as dqxk4 in the SAMPL6 Challenge and was the third highest performing model in the physical methods category, with a root mean squared error (RMSE) of 0.49 log units for the 11 compounds in the SAMPL6 blind prediction set. We also collaboratively investigated the use of empirical models to address model deficiencies for halogenated compounds at minimal additional computational cost. A mixed model combining the dqxk4 physical and hdpuj empirical models showed improved performance, with an RMSE of 0.34 log units on the SAMPL6 dataset. This collaborative mixed-model approach shows how empirical models can be leveraged to expediently improve performance in chemical spaces that are difficult for ab initio methods to simulate.
12
A blind SAMPL6 challenge: insight into the octanol-water partition coefficients of drug-like molecules via a DFT approach. J Comput Aided Mol Des 2020; 34:463-470. [PMID: 31939104 DOI: 10.1007/s10822-020-00284-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/15/2019] [Accepted: 01/08/2020] [Indexed: 01/30/2023]
Abstract
In this study, quantum mechanical methods were used to predict the solvation energies of a series of drug-like molecules in both water and octanol, in the context of the SAMPL6 n-octanol/water partition coefficient challenge. In pharmaceutical design, the n-octanol/water partition coefficient, LogP, describes a drug's hydrophobicity and membrane permeability; thus, a well-established theoretical method that rapidly determines the hydrophobicity of a drug facilitates drug design. In this study, the solvation free energies were obtained via six different methodologies (B3LYP, M06-2X, and ωB97XD functionals with 6-311+G** and 6-31G* basis sets) by treating the environment implicitly; the chosen methodology (B3LYP/6-311+G**) was later used to evaluate ΔGsolv with explicit water as the solvent. We optimized each conformer separately in each solvent, and our calculations show that the stability of the conformers is highly dependent on the solvent environment. Comparing implicitly and explicitly solvated systems, we find that the interaction of one explicit water molecule with the drug molecule at the proper location leads to more accurate LogP predictions.
13
A comparison of molecular representations for lipophilicity quantitative structure-property relationships with results from the SAMPL6 logP Prediction Challenge. J Comput Aided Mol Des 2020; 34:523-534. [PMID: 31933037 DOI: 10.1007/s10822-020-00279-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/16/2019] [Accepted: 01/08/2020] [Indexed: 12/20/2022]
Abstract
Effective representation of a molecule is required to develop useful quantitative structure-property relationships (QSPR) for accurate prediction of chemical properties. The octanol-water partition coefficient logP, a measure of lipophilicity, is an important property for pharmacological and toxicological endpoints used in the pharmaceutical and regulatory spheres. We compare physicochemical descriptors, structural keys, and circular fingerprints in their ability to effectively represent a chemical space and characterise molecular features that correlate with lipophilicity. Exploratory landscape continuity analyses revealed that whole-molecule physicochemical descriptors could map together compounds that were similar in both molecular features and logP, indicating higher potential for use in logP QSPRs compared to the substructural approach of structural keys and circular fingerprints. Indeed, logP QSPR models parameterised by physicochemical descriptors consistently performed with the lowest error. Our best performing model was a stochastic gradient descent-optimised multilinear regression with 1438 descriptors, returning an internal benchmark RMSE of 1.03 log units. This corroborates the well-established notion that lipophilicity is an additive, whole-molecule property. We externally tested the model by participating in the 2019 SAMPL6 logP Prediction Challenge and blindly predicting logP for 11 protein kinase inhibitor fragment-like molecules. Our model returned an RMSE of 0.49 log units, placing eighth overall and third in the empirical methods category (submission ID 'hdpuj'). Permutation feature importance analyses revealed that physicochemical descriptors could characterise predictive molecular features highly relevant to the kinase inhibitor fragment-like molecules.
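Permutation feature importance can be computed for any fitted model: shuffle one descriptor column, re-predict, and record how much the error grows. The sketch is model-agnostic and dependency-free; the toy model and data are hypothetical, and scikit-learn's `sklearn.inspection.permutation_importance` offers a full implementation.

```python
import random

def rmse(pred, obs):
    """Root mean square error."""
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each descriptor column is shuffled.

    model: any callable mapping one descriptor row (list of floats) to a prediction.
    Larger values mean the model relies more heavily on that descriptor.
    """
    rng = random.Random(seed)
    baseline = rmse([model(row) for row in X], y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with column j replaced by its shuffled values.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            increases.append(rmse([model(r) for r in X_perm], y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A descriptor the model ignores scores exactly zero, because shuffling it leaves every prediction unchanged.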
14
Prediction of the n-octanol/water partition coefficients in the SAMPL6 blind challenge from MST continuum solvation calculations. J Comput Aided Mol Des 2019; 34:443-451. [PMID: 31776809 DOI: 10.1007/s10822-019-00262-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/15/2019] [Accepted: 11/21/2019] [Indexed: 12/20/2022]
Abstract
The IEFPCM/MST continuum solvation model is used for the blind prediction of the n-octanol/water partition coefficients of a set of 11 fragment-like small molecules within the SAMPL6 Part II Partition Coefficient Challenge. The partition coefficient of the neutral species (log P) was determined using an extended parametrization of the B3LYP/6-31G(d) version of the Miertus-Scrocco-Tomasi continuum solvation model in n-octanol. Comparison with the experimental partition coefficients yielded a root-mean-square error (rmse) of 0.78 (log P units), which agrees with the accuracy reported for our method (rmse = 0.80) for nitrogen-containing heterocyclic compounds. Of the 91 sets of log P values submitted by the participants, our submission is among those with an rmse < 1 and among the four best-ranked physical methods. The largest errors involve three compounds: the two with the largest positive deviations (SM13 and SM08) and the one with the largest negative deviation (SM15). Here we report the potentiometric determination of the log P of SM13, giving a value of 3.62 ± 0.02, which agrees better with most empirical predictions than the experimental value reported in SAMPL6. In addition, including several conformations of SM08 significantly improved our results. With these refinements, the overall error drops to 0.51 (log P units), which supports the reliability of the IEFPCM/MST model for predicting the partitioning of neutral compounds.
15
Use of molecular dynamics fingerprints (MDFPs) in SAMPL6 octanol-water log P blind challenge. J Comput Aided Mol Des 2019; 34:393-403. [PMID: 31745704 DOI: 10.1007/s10822-019-00252-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 09/15/2019] [Accepted: 11/12/2019] [Indexed: 11/26/2022]
Abstract
The in silico prediction of partition coefficients is an important task in computer-aided drug discovery. In particular, the octanol-water partition coefficient is used as a surrogate for lipophilicity. Various computational approaches have been proposed, ranging from simple group-contribution techniques based on the 2D topology of a molecule to rigorous methods based on molecular dynamics (MD) or quantum chemistry. To balance accuracy and computational cost, we recently developed MD fingerprints (MDFPs), in which the information from MD simulations is encoded in a floating-point vector that can be used as input for machine learning (ML). The MDFP-ML approach was shown to perform similarly to rigorous methods while being substantially more efficient. Here, we present the application of MDFP-ML to the prediction of octanol-water partition coefficients in the SAMPL6 blind challenge. The underlying computational pipeline is freely available in the form of the MDFPtools package.
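In the spirit of MDFPs, each simulated property's time series can be reduced to a few summary statistics and concatenated into a fixed-length vector that any ML model can consume. This sketch assumes mean, standard deviation, and median per property and uses hypothetical property names; the exact feature set of the MDFPtools package may differ.

```python
import statistics

def md_fingerprint(series_by_property):
    """Encode MD time series as a fixed-length floating-point vector.

    series_by_property: dict mapping a property name (hypothetical, e.g.
    "solute_water_LJ", "SASA", "Rgyr") to its list of per-frame values.
    Properties are processed in sorted-name order so the layout is stable;
    each contributes [mean, population std, median].
    """
    features = []
    for name in sorted(series_by_property):
        values = series_by_property[name]
        features += [statistics.fmean(values),
                     statistics.pstdev(values),
                     statistics.median(values)]
    return features
```

Sorting the property names keeps the feature order identical across molecules, which matters when the vectors are stacked into a training matrix.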
16
SAMPL6 blind predictions of water-octanol partition coefficients using nonequilibrium alchemical approaches. J Comput Aided Mol Des 2019; 34:371-384. [PMID: 31624982 DOI: 10.1007/s10822-019-00233-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/09/2019] [Accepted: 10/03/2019] [Indexed: 12/13/2022]
Abstract
In this paper, we compute, by means of a nonequilibrium alchemical technique, the water-octanol partition coefficients (LogP) for a series of drug-like compounds in the context of the SAMPL6 challenge initiative. Our blind predictions are based on three of the most popular non-polarizable force fields, CGenFF, GAFF2, and OPLS-AA, and are critically compared to other MD-based predictions produced using free energy perturbation or thermodynamic integration approaches with stratification. The proposed nonequilibrium method emerges as a reliable tool for LogP prediction, systematically ranking among the top-performing submissions in all force field classes for at least two of the various indicators, such as the Pearson or Kendall correlation coefficients or the mean unsigned error. Contrary to the widespread equilibrium approaches, which yielded apparently very disparate results in the SAMPL6 challenge, all our independent prediction sets, irrespective of the adopted force field and of the adopted estimate (unidirectional or bidirectional), are mutually moderately to strongly correlated.
17
Force matching as a stepping stone to QM/MM CB[8] host/guest binding free energies: a SAMPL6 cautionary tale. J Comput Aided Mol Des 2018; 32:983-999. [PMID: 30276502 PMCID: PMC6867086 DOI: 10.1007/s10822-018-0165-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/22/2018] [Accepted: 09/14/2018] [Indexed: 10/28/2022]
Abstract
Use of quantum mechanical/molecular mechanical (QM/MM) methods in binding free energy calculations, particularly in the SAMPL challenge, often fails to achieve improvement over standard additive (MM) force fields. Frequently, the implementation is through the use of reference potentials, the so-called "indirect approach", which inherently relies on sufficient overlap between the MM and QM/MM configurational spaces. This overlap is generally poor, particularly when free energy perturbation is used to perform the MM to QM/MM free energy correction at the end states of interest (e.g., bound and unbound states). However, by using MM parameters that best reproduce forces obtained at the desired QM level of theory, it is possible to lessen the configurational disparity between MM and QM/MM. To this end, we sought to use force matching to generate MM parameters for the SAMPL6 CB[8] host-guest binding challenge, classically compute binding free energies, and apply energetic end-state corrections to obtain QM/MM binding free energy differences. For the standard set of 11 molecules and the bonus set (including three additional challenge molecules), error statistics such as the root mean square error (RMSE) were moderately poor (5.5 and 5.4 kcal/mol). Correlation statistics, however, were in the top two for both standard and bonus set submissions ([Formula: see text] of 0.42 and 0.26, [Formula: see text] of 0.64 and 0.47, respectively). The combination of high RMSE and moderate correlation strongly indicated the presence of systematic error. Identifiable issues were ameliorated for two of the guest molecules, reducing the error and pointing to strong prospects for the future use of this methodology.
18
Overview of the SAMPL6 host-guest binding affinity prediction challenge. J Comput Aided Mol Des 2018; 32:937-963. [PMID: 30415285 PMCID: PMC6301044 DOI: 10.1007/s10822-018-0170-6] [Citation(s) in RCA: 88] [Impact Index Per Article: 14.7] [Received: 07/31/2018] [Accepted: 10/07/2018] [Indexed: 10/27/2022]
Abstract
Accurately predicting the binding affinities of small organic molecules to biological macromolecules can greatly accelerate drug discovery by reducing the number of compounds that must be synthesized to realize desired potency and selectivity goals. Unfortunately, the process of assessing the accuracy of current computational approaches to affinity prediction against binding data to biological macromolecules is frustrated by several challenges, such as slow conformational dynamics, multiple titratable groups, and the lack of high-quality blinded datasets. Over the last several SAMPL blind challenge exercises, host-guest systems have emerged as a practical and effective way to circumvent these challenges in assessing the predictive performance of current-generation quantitative modeling tools, while still providing systems capable of possessing tight binding affinities. Here, we present an overview of the SAMPL6 host-guest binding affinity prediction challenge, which featured three supramolecular hosts: octa-acid (OA), the closely related tetra-endo-methyl-octa-acid (TEMOA), and cucurbit[8]uril (CB8), along with 21 small organic guest molecules. A total of 119 entries were received from ten participating groups employing a variety of methods that spanned from electronic structure and movable type calculations in implicit solvent to alchemical and potential of mean force strategies using empirical force fields with explicit solvent models. While empirical models tended to obtain better performance than first-principle methods, it was not possible to identify a single approach that consistently provided superior results across all host-guest systems and statistical metrics. Moreover, the accuracy of the methodologies generally displayed a substantial dependence on the system considered, emphasizing the need for host diversity in blind evaluations. 
Several entries exploited previous experimental measurements of similar host-guest systems in an effort to improve their physics-based predictions via some form of rudimentary machine learning; while this strategy succeeded in reducing systematic errors, it did not improve statistical correlation. Comparison to previous rounds of the host-guest binding free energy challenge highlights an overall improvement in the correlation of the affinity predictions for the OA and TEMOA systems, but a surprising lack of improvement in root mean square error over the past several challenge rounds. The data suggest that further refinement of force field parameters, as well as improved treatment of chemical effects (e.g., buffer salt conditions, protonation states), may be required to further enhance predictive accuracy.
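The observation that empirical corrections reduced systematic error without improving correlation follows from how the two metrics behave: RMSE penalizes a constant offset, whereas Pearson's R is unchanged by one. The Python sketch below uses made-up binding free energies (not SAMPL6 data) and the simplest possible correction, subtracting the mean signed error; the helper names are hypothetical, and the participants' actual corrections may have been more elaborate.

```python
import math


def rmse(pred, expt):
    """Root mean square error between predicted and experimental values."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def remove_mean_offset(pred, expt):
    """Shift every prediction by the mean signed error (a purely systematic correction)."""
    mse = sum(p - e for p, e in zip(pred, expt)) / len(pred)
    return [p - mse for p in pred]


# Hypothetical binding free energies in kcal/mol, chosen only for illustration.
expt = [-5.0, -6.2, -7.1, -8.4, -9.0]
pred = [-7.1, -7.9, -9.8, -10.2, -11.6]
corrected = remove_mean_offset(pred, expt)

print(f"RMSE: {rmse(pred, expt):.2f} -> {rmse(corrected, expt):.2f} kcal/mol")
print(f"R:    {pearson_r(pred, expt):.3f} -> {pearson_r(corrected, expt):.3f}")
```

Only the RMSE changes. A correction that also rescales the predictions by a positive factor would likewise leave R untouched, since Pearson's R is invariant to any positive linear transformation.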
19
SAMPL6 challenge results from pKa predictions based on a general Gaussian process model. J Comput Aided Mol Des 2018; 32:1165-1177. [PMID: 30324305 PMCID: PMC6438616 DOI: 10.1007/s10822-018-0169-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/02/2018] [Accepted: 09/28/2018] [Indexed: 12/14/2022]
Abstract
A variety of fields would benefit from accurate pKa predictions, especially drug design, because a change in ionization state can alter a molecule's physicochemical properties. Participants in the recent SAMPL6 blind challenge were asked to submit predictions of microscopic and macroscopic pKas for 24 drug-like small molecules. We recently built a general model for predicting pKas using Gaussian process regression trained on physical and chemical features of each ionizable group. Our pipeline takes a molecular graph and uses the OpenEye Toolkits to calculate features describing the removal of a proton. These features are fed into a Scikit-learn Gaussian process to predict microscopic pKas, which are then used to determine macroscopic pKas analytically. Our Gaussian process is trained on a set of 2700 macroscopic pKas from monoprotic and select diprotic molecules. Here, we share our results for microscopic and macroscopic predictions in the SAMPL6 challenge. Overall, we ranked in the middle of the pack compared to other participants, but our fairly good agreement with experiment is still promising considering that the challenge molecules are chemically diverse and often polyprotic, while our training set is predominantly monoprotic. Of particular importance to us when building this model was to include an uncertainty estimate, based on the chemistry of the molecule, that would reflect the likely accuracy of our prediction. Our model reports large uncertainties for molecules whose chemistry appears to lie outside our domain of applicability, and its quantile-quantile plots show good agreement, indicating that it can predict its own accuracy. The challenge highlighted several ways to improve our model, including adding more polyprotic molecules to our training set and considering more carefully which functional groups we do or do not identify as ionizable.
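The abstract states that microscopic pKas are converted to macroscopic pKas analytically but does not give the formula. For the simplest case, one protonated microstate that can lose a proton from any of several sites, the macroscopic acid constant is the sum of the microscopic ones, so pKa(macro) = -log10 Σ 10^(-pKa,i). The Python sketch below covers only that case, with a hypothetical function name; general polyprotic molecules need a full microstate treatment that this does not attempt.

```python
import math


def macro_pka_parallel(micro_pkas):
    """Macroscopic pKa for one deprotonation step that starts from a single
    protonated microstate and can end in several deprotonated microstates.

    Each microscopic Ka_i = 10**(-pKa_i); the macroscopic Ka is their sum,
    so pKa_macro = -log10(sum_i 10**(-pKa_i)).
    """
    ka_total = sum(10.0 ** (-pka) for pka in micro_pkas)
    return -math.log10(ka_total)


# Hypothetical example: two equivalent sites, each with microscopic pKa 4.0.
print(round(macro_pka_parallel([4.0, 4.0]), 3))  # 3.699, i.e. 4.0 - log10(2)
```

The macroscopic value always lies at or below the most acidic microscopic pKa, because each additional site adds to the total acid constant.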
20
An explicit-solvent hybrid QM and MM approach for predicting pKa of small molecules in SAMPL6 challenge. J Comput Aided Mol Des 2018; 32:1191-1201. [PMID: 30276503 PMCID: PMC6342563 DOI: 10.1007/s10822-018-0167-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/04/2018] [Accepted: 09/25/2018] [Indexed: 12/30/2022]
Abstract
In this work we have developed a hybrid QM and MM approach to predict the pKa of small drug-like molecules in explicit solvent. The gas-phase free energy of deprotonation is calculated at the M06-2X density functional level of theory with Pople basis sets. The solvation free energy difference between the acid and its conjugate base is calculated at the MD level using thermodynamic integration. We applied this method to the 24 drug-like molecules in the SAMPL6 blind pKa prediction challenge and achieved an overall RMSE of 2.4 pKa units. Our results show that the protocol needs further optimization before it can serve as an alternative to well-established full quantum mechanical or empirical pKa prediction methods.
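Combining a gas-phase deprotonation free energy with solvation free energies is the usual direct thermodynamic cycle: ΔG_aq = ΔG_gas + ΔG_solv(A⁻) + ΔG_solv(H⁺) - ΔG_solv(HA), and pKa = ΔG_aq / (RT ln 10). The Python sketch below assumes all inputs are already in kcal/mol on a consistent 1 M standard state; the function name and example numbers are hypothetical, and the proton solvation free energy must come from the literature rather than from this code.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def pka_from_cycle(dg_gas, dg_solv_acid, dg_solv_base, dg_solv_proton, temperature=298.15):
    """pKa from a direct thermodynamic cycle for HA -> A- + H+.

    All free energies are in kcal/mol and assumed to share a 1 M standard state:
        dG_aq = dG_gas + dG_solv(A-) + dG_solv(H+) - dG_solv(HA)
        pKa   = dG_aq / (RT ln 10)
    """
    dg_aq = dg_gas + dg_solv_base + dg_solv_proton - dg_solv_acid
    return dg_aq / (R_KCAL * temperature * math.log(10))


# Hypothetical inputs (kcal/mol), for illustration only. In the hybrid scheme,
# dg_gas would come from the DFT step and the acid/base solvation difference from TI.
pka = pka_from_cycle(dg_gas=340.0, dg_solv_acid=-10.0, dg_solv_base=-70.0, dg_solv_proton=-265.9)
print(f"pKa = {pka:.2f}")
```

Since RT ln 10 is only about 1.36 kcal/mol at 298 K, an error of roughly 3.3 kcal/mol anywhere in the cycle already shifts the predicted pKa by about 2.4 units.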
21
SAMPL6 host-guest challenge: binding free energies via a multistep approach. J Comput Aided Mol Des 2018; 32:1097-1115. [PMID: 30225724 DOI: 10.1007/s10822-018-0159-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 06/02/2018] [Accepted: 08/31/2018] [Indexed: 12/14/2022]
Abstract
In this contribution to the SAMPL6 host-guest binding challenge, a combination of molecular dynamics and quantum mechanical methods was used to blindly predict the binding free energies of various guest molecules to cucurbit[8]uril (CB8), octa-acid (OA), and tetramethyl octa-acid (TEMOA) hosts in aqueous solution. Poses for the host-guest systems were generated via molecular dynamics (MD) simulations and clustering analyses. The binding free energies of the structures obtained from cluster analyses of the MD trajectories were calculated with three methods: MMPBSA, and density functional theory (DFT) with the resolution-of-the-identity (RI) approximation and an implicit model of the aqueous solvent, either with Grimme's dispersion correction (RI-B3PW91-D3) or without it (RI-B3PW91). For the OA and TEMOA systems, MMPBSA and RI-B3PW91-D3 could qualitatively rank the binding energies of small molecules, albeit with overbinding by 7 and 37 kcal/mol, respectively, whereas RI-B3PW91 gave the poorest results, indicating the importance of dispersion correction for binding free energy calculations. Owing to the complexity of the CB8 systems, all of the methods tested showed poor correlation with the experimental results. Other quantum mechanical approaches used to calculate binding free energies included DFT without the RI approximation, truncated basis sets to reduce computational cost (memory, disk space, CPU time), and a corrected dielectric constant to account for ionic strength within the implicit solvation model.
22
Blinded predictions of standard binding free energies: lessons learned from the SAMPL6 challenge. J Comput Aided Mol Des 2018; 32:1047-1058. [PMID: 30159717 DOI: 10.1007/s10822-018-0154-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/02/2018] [Accepted: 08/24/2018] [Indexed: 10/28/2022]
Abstract
In the context of the SAMPL6 challenges, a series of blinded predictions of standard binding free energies was made with the SOMD software for a dataset of 27 host-guest systems featuring two octa-acid hosts (OA and TEMOA) and a cucurbituril ring host (CB8). Three different models were used: ModelA computes the free energy of binding with a double annihilation technique; ModelB additionally accounts for long-range dispersion and standard state corrections; ModelC additionally introduces an empirical correction term derived from a regression analysis of SAMPL5 predictions previously made with SOMD. Each model was evaluated with two setups: in the buffer setup, the ionic strength explicitly matches that of the binding assays, whereas the no-buffer setup merely neutralizes the host-guest net charge with counter-ions. ModelC/no-buffer shows the lowest mean unsigned error (MUE) for the overall dataset (MUE 1.29 < 1.39 < 1.50 kcal mol⁻¹, 95% CI), while explicit modelling of the buffer significantly improves results only for the CB8 host. Correlation with experimental data ranges from excellent for TEMOA (R² 0.91 < 0.94 < 0.96) to poor for CB8 (R² 0.04 < 0.12 < 0.23). Further investigations indicate a pronounced dependence of the binding free energies on the modelled ionic strength, and variable reproducibility of the binding free energies across different simulation packages.
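The abstract reports MUE as "lower < estimate < upper" with a 95% confidence interval but does not say how the interval was obtained. A common choice is a percentile bootstrap over the host-guest systems; the Python sketch below assumes that approach, with hypothetical function names and made-up data.

```python
import random


def mue(pred, expt):
    """Mean unsigned (absolute) error."""
    return sum(abs(p - e) for p, e in zip(pred, expt)) / len(pred)


def bootstrap_ci(pred, expt, stat, n_boot=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for stat(pred, expt).

    Systems are resampled with replacement; returns (lower, estimate, upper).
    """
    rng = random.Random(seed)
    n = len(pred)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(stat([pred[i] for i in idx], [expt[i] for i in idx]))
    samples.sort()
    lower = samples[int(alpha / 2 * n_boot)]
    upper = samples[int((1 - alpha / 2) * n_boot) - 1]
    return lower, stat(pred, expt), upper


# Hypothetical predicted and experimental binding free energies (kcal/mol).
pred = [-6.1, -7.8, -5.2, -9.4, -4.9, -8.3, -6.6, -7.0]
expt = [-5.4, -6.9, -6.0, -8.1, -4.2, -9.5, -5.8, -6.3]

lo, est, hi = bootstrap_ci(pred, expt, mue)
print(f"MUE {lo:.2f} < {est:.2f} < {hi:.2f} kcal/mol (95% CI)")
```

The same helper works for any paired statistic, for example R², by passing a different `stat` function.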
23
Predicting ligand binding affinity using on- and off-rates for the SAMPL6 SAMPLing challenge. J Comput Aided Mol Des 2018; 32:1001-1012. [PMID: 30141102 DOI: 10.1007/s10822-018-0149-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 05/29/2018] [Accepted: 08/09/2018] [Indexed: 12/19/2022]
Abstract
Interest in ligand binding kinetics has been growing rapidly, as ligand residence time is being found in more and more systems to be the crucial factor governing drug efficacy. Many enhanced sampling methods, working by very different mechanisms, have been developed with the goal of predicting ligand binding rates ($k_{\mathrm{on}}$) and/or ligand unbinding rates ($k_{\mathrm{off}}$) through explicit simulation of ligand binding pathways. Although there is not yet a blind challenge for ligand binding kinetics, here we take advantage of experimental measurements and rigorously computed benchmarks to compare estimates of the dissociation constant calculated as the ratio of two rates: $K_D = k_{\mathrm{off}}/k_{\mathrm{on}}$. These rates were determined using a new enhanced sampling method based on the weighted ensemble framework that we call "REVO": Reweighting of Ensembles by Variance Optimization. REVO is a further development of the WExplore enhanced sampling method in which trajectory cloning and merging steps are guided not by the definition of sampling regions, but by maximizing trajectory variance. Here we obtain estimates of $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$ that are consistent across multiple simulations, with an average log10-scale standard deviation of 0.28 for on-rates and 0.56 for off-rates, which is well within an order of magnitude and far better than in previous applications of the WExplore algorithm. Our rank ordering of the three host-guest pairs agrees with the reference calculations; however, our predicted binding free energies ($\Delta G$) were systematically lower than the reference by an average of 4.2 kcal/mol. Using tree network visualizations of the trajectories in the REVO algorithm, and conformation space networks for each system, we analyze the results of our sampling and hypothesize sources of the discrepancy between our $\Delta G$ values and the reference.
We also motivate the direct inclusion of $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$ challenges in future iterations of SAMPL, to further develop the field of ligand binding kinetics prediction and modeling.
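The kinetic route to affinity rests on two standard relations: K_D = k_off/k_on and ΔG° = RT ln(K_D/C°), with C° = 1 M. The Python sketch below implements both, plus the log10-scale spread used to report replicate consistency; the function names and rates are hypothetical, and the sketch says nothing about how REVO itself estimates the rates.

```python
import math
import statistics

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def binding_free_energy(k_on, k_off, temperature=298.15, c_standard=1.0):
    """Standard binding free energy (kcal/mol) from kinetic rate constants.

    k_on is in 1/(M s) and k_off in 1/s, so K_D = k_off / k_on is in M.
    dG = RT ln(K_D / C0), with the standard concentration C0 = 1 M by default.
    """
    k_d = k_off / k_on
    return R_KCAL * temperature * math.log(k_d / c_standard)


def log10_spread(rates):
    """Sample standard deviation of log10(rate) over replicate simulations."""
    return statistics.stdev([math.log10(r) for r in rates])


# Hypothetical rates, for illustration only.
print(f"dG = {binding_free_energy(k_on=1e9, k_off=1e3):.2f} kcal/mol")  # K_D = 1 uM, about -8.19
print(f"log10 spread of k_off replicates: {log10_spread([2e3, 8e2, 1.5e3]):.2f}")
```

Because ΔG depends on the logarithm of the rate ratio, a one-unit error in log10 of either rate shifts ΔG by RT ln 10, about 1.36 kcal/mol at 298 K.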
24
High accuracy quantum-chemistry-based calculation and blind prediction of macroscopic pKa values in the context of the SAMPL6 challenge. J Comput Aided Mol Des 2018; 32:1139-1149. [PMID: 30141103 DOI: 10.1007/s10822-018-0145-7] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 05/30/2018] [Accepted: 08/02/2018] [Indexed: 12/14/2022]
Abstract
Recent advances in low-cost quantum chemical methods have made the prediction of conformational preferences and physicochemical properties of medium-sized drug-like molecules routinely feasible, with significant potential to advance drug discovery. In the context of the SAMPL6 challenge, macroscopic pKa values were blindly predicted for a set of 24 such molecules. In this paper we present two similar quantum-chemistry-based approaches that rely on high-accuracy calculation of standard reaction free energies and the subsequent determination of pKa values via a linear free energy relationship. Both approaches use extensive conformational sampling and apply hybrid and double-hybrid density functional theory with continuum solvation to calculate free energies. The blindly calculated macroscopic pKa values were in excellent agreement with experiment.
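In a linear free energy relationship (LFER), computed reaction free energies are mapped onto the pKa scale with a relationship calibrated on acids of known pKa. The abstract does not give the form of the relationship; the Python sketch below assumes the common straight-line choice fitted by ordinary least squares, with hypothetical function names and made-up calibration data. A practical benefit of such a fit is that the intercept absorbs any constant term shared by every molecule, so that term never has to be computed explicitly.

```python
def fit_lfer(dg_values, pka_values):
    """Ordinary least-squares fit of pKa = slope * dG + intercept.

    dg_values: computed reaction free energies (kcal/mol) for reference acids.
    pka_values: their experimental pKa values.
    """
    n = len(dg_values)
    mean_x = sum(dg_values) / n
    mean_y = sum(pka_values) / n
    sxx = sum((x - mean_x) ** 2 for x in dg_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dg_values, pka_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_pka(dg, slope, intercept):
    """Map a computed reaction free energy onto the pKa scale."""
    return slope * dg + intercept


# Hypothetical calibration set: reaction free energies (kcal/mol) and experimental pKa.
train_dg = [8.0, 10.5, 12.0, 14.2, 16.8]
train_pka = [3.1, 4.6, 5.4, 7.0, 8.7]
slope, intercept = fit_lfer(train_dg, train_pka)
print(f"pKa = {slope:.3f} * dG + {intercept:.3f}")
print(f"predicted pKa for dG = 11.0: {predict_pka(11.0, slope, intercept):.2f}")
```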
25
Absolute and relative pKa predictions via a DFT approach applied to the SAMPL6 blind challenge. J Comput Aided Mol Des 2018; 32:1179-1189. [PMID: 30128926 DOI: 10.1007/s10822-018-0150-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/01/2018] [Accepted: 08/09/2018] [Indexed: 12/25/2022]
Abstract
In this work, quantum mechanical methods were used to predict the microscopic and macroscopic pKa values of a set of 24 molecules as part of the SAMPL6 blind challenge. The SMD solvation model was employed with M06-2X and different basis sets to evaluate three pKa calculation schemes (direct, vertical, and adiabatic). The adiabatic scheme is the most accurate approach (RMSE = 1.40 pKa units) and correlates highly with experiment (R² = 0.93). Applying a linear correction to this approach lowers the RMSE to 0.73 pKa units. Additionally, we consider including an explicit solvent representation and multiple lower-energy conformations to improve the predictions for outliers. Adding three explicit water molecules can reduce the error relative to experiment by 2-4 pKa units, whereas including multiple local-minimum conformations does not necessarily improve the pKa prediction.
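The abstract does not say how the multiple conformations were combined. One standard option is Boltzmann weighting of the conformer free energies of each protonation state, sketched below in Python under that assumption, with hypothetical function names and made-up energies. Because RT is about 0.59 kcal/mol at 298 K, conformers more than a few RT above the minimum carry little weight, so adding them can leave the ensemble free energy, and hence the pKa, nearly unchanged.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def ensemble_free_energy(conformer_g, temperature=298.15):
    """Boltzmann-combine conformer free energies (kcal/mol) into one value.

    G = -RT ln sum_i exp(-G_i / RT), evaluated relative to the lowest conformer
    so that the exponentials cannot overflow.
    """
    rt = R_KCAL * temperature
    g_min = min(conformer_g)
    total = sum(math.exp(-(g - g_min) / rt) for g in conformer_g)
    return g_min - rt * math.log(total)


def boltzmann_weights(conformer_g, temperature=298.15):
    """Equilibrium population of each conformer at the given temperature."""
    rt = R_KCAL * temperature
    g_min = min(conformer_g)
    factors = [math.exp(-(g - g_min) / rt) for g in conformer_g]
    total = sum(factors)
    return [f / total for f in factors]


# Hypothetical relative free energies of three conformers (kcal/mol).
conformers = [0.0, 0.8, 2.5]
print([round(w, 3) for w in boltzmann_weights(conformers)])
print(f"ensemble G = {ensemble_free_energy(conformers):.3f} kcal/mol")
```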