1
Medrano Sandonas L, Van Rompaey D, Fallani A, Hilfiker M, Hahn D, Perez-Benito L, Verhoeven J, Tresadern G, Kurt Wegner J, Ceulemans H, Tkatchenko A. Dataset for quantum-mechanical exploration of conformers and solvent effects in large drug-like molecules. Sci Data 2024; 11:742. [PMID: 38972891 PMCID: PMC11228031 DOI: 10.1038/s41597-024-03521-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 06/13/2024] [Indexed: 07/09/2024] [Open access]
Abstract
We here introduce the Aquamarine (AQM) dataset, an extensive quantum-mechanical (QM) dataset that contains the structural and electronic information of 59,783 low- and high-energy conformers of 1,653 molecules with a total number of atoms ranging from 2 to 92 (mean: 50.9), and containing up to 54 (mean: 28.2) non-hydrogen atoms. To gain insights into the solvent effects as well as collective dispersion interactions for drug-like molecules, we have performed QM calculations supplemented with a treatment of many-body dispersion (MBD) interactions of structures and properties in the gas phase and implicit water. Thus, AQM contains over 40 global and local physicochemical properties (including ground-state and response properties) per conformer computed at the tightly converged PBE0+MBD level of theory for gas-phase molecules, whereas PBE0+MBD with the modified Poisson-Boltzmann (MPB) model of water was used for solvated molecules. By addressing both molecule-solvent and dispersion interactions, the AQM dataset can serve as a challenging benchmark for state-of-the-art machine learning methods for property modeling and de novo generation of large (solvated) molecules with pharmaceutical and biological relevance.
Affiliation(s)
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
  - Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany
- Dries Van Rompaey
  - Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Alessio Fallani
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
  - Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Mathias Hilfiker
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- David Hahn
  - Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Laura Perez-Benito
  - Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Jonas Verhoeven
  - Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Gary Tresadern
  - Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Joerg Kurt Wegner
  - Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
  - Drug Discovery Data Sciences (D3S), Johnson & Johnson Innovative Medicine, 301 Binney Street, MA 02142, Cambridge, USA
- Hugo Ceulemans
  - Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
2
Walton-Raaby M, Floen T, García-Díez G, Mora-Diez N. Calculating the Aqueous pKa of Phenols: Predictions for Antioxidants and Cannabinoids. Antioxidants (Basel) 2023; 12:1420. [PMID: 37507958 PMCID: PMC10376140 DOI: 10.3390/antiox12071420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 06/30/2023] [Accepted: 07/10/2023] [Indexed: 07/30/2023] [Open access]
Abstract
We aim to develop a theoretical methodology for the accurate aqueous pKa prediction of structurally complex phenolic antioxidants and cannabinoids. In this study, five functionals (M06-2X, B3LYP, BHandHLYP, PBE0, and TPSS) and two solvent models (SMD and PCM) were combined with the 6-311++G(d,p) basis set to predict pKa values for twenty structurally simple phenols. None of the direct calculations produced good results. However, the correlations between the calculated Gibbs energy difference of each acid and its conjugate base, ΔG°aq(BA) = ΔG°aq(A⁻) − ΔG°aq(HA), and the experimental aqueous pKa values had superior predictive accuracy, which was also tested relative to an independent set of ten molecules of which six were structurally complex phenols. New correlations were built with twenty-seven phenols (including the phenols with experimental pKa values from the test set), which were used to make predictions. The best correlation equations used the PCM method and produced mean absolute errors of 0.26-0.27 pKa units and R² values of 0.957-0.960. The average range of predictions for the potential antioxidants (cannabinoids) was 0.15 (0.25) pKa units, which indicates good agreement between our methodologies. The new correlation equations could be used to make pKa predictions for other phenols in water and potentially in other solvents where they might be more soluble.
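The correlation approach described in this abstract amounts to a linear fit of experimental pKa values against the calculated Gibbs energy difference ΔG°aq(BA). The following is a minimal sketch of such a fit, not the authors' code: the function names (`fit_line`, `predict_pka`, `mean_absolute_error`) are hypothetical, an ordinary least-squares fit is assumed, and no values from the paper are reproduced.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept.

    Here x would hold calculated Gibbs energy differences and y the
    matching experimental pKa values (assumption for illustration).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_pka(delta_g, slope, intercept):
    """Apply a fitted correlation equation to a new Gibbs energy difference."""
    return slope * delta_g + intercept


def mean_absolute_error(observed, predicted):
    """Mean absolute deviation between observed and predicted values."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))
```

In use, one would fit on a training set of phenols with known pKa values, then call `predict_pka` for new compounds and report `mean_absolute_error` on a held-out set.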
Affiliation(s)
- Max Walton-Raaby
  - Department of Chemistry, Thompson Rivers University, Kamloops, BC V2C 0C8, Canada
  - Department of Chemistry, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Tyler Floen
  - Department of Chemistry, Thompson Rivers University, Kamloops, BC V2C 0C8, Canada
- Nelaine Mora-Diez
  - Department of Chemistry, Thompson Rivers University, Kamloops, BC V2C 0C8, Canada
3
Reyna-Luna J, Soriano-Agueda L, Vera CJ, Franco-Pérez M. Insights into the coordination chemistry of antineoplastic doxorubicin with 3d-transition metal ions Zn2+, Cu2+, and VO2+: a study using well-calibrated thermodynamic cycles and chemical interaction quantum chemistry models. J Comput Aided Mol Des 2023:10.1007/s10822-023-00506-4. [PMID: 37245168 DOI: 10.1007/s10822-023-00506-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 05/19/2023] [Indexed: 05/29/2023]
Abstract
We present a computational strategy based on thermodynamic cycles to predict and describe the chemical equilibrium between the 3d-transition metal ions Zn2+, Cu2+, and VO2+ and the widely used antineoplastic drug doxorubicin. Our method involves benchmarking a theoretical protocol to compute gas-phase quantities using DLPNO Coupled-Cluster calculations as reference, followed by estimating solvation contributions to the reaction Gibbs free energies using both explicit partial (micro)solvation steps for charged solutes and neutral coordination complexes, as well as a continuum solvation procedure for all solutes involved in the complexation process. We rationalized the stability of these doxorubicin-metal complexes by inspecting quantities obtained from the topology of their electron densities, particularly the bond critical points and non-covalent interaction index. Our approach allowed us to identify representative species in solution phase, infer the most likely complexation process for each case, and identify key intramolecular interactions involved in the stability of these compounds. To the best of our knowledge, this is the first study reporting thermodynamic constants for the complexation of doxorubicin with transition metal ions. Unlike other methods, our procedure is computationally affordable for medium-sized systems and provides valuable insights even with limited experimental data. Furthermore, it can be extended to describe the complexation process between 3d-transition metal ions and other bioactive ligands.
Affiliation(s)
- Julieta Reyna-Luna
  - Departamento de Física y Química Teórica, Facultad de Química, Universidad Nacional Autónoma de México, Cd. Universitaria, 04510, Ciudad de Mexico, México
- Luis Soriano-Agueda
  - Donostia International Physics Center (DIPC), 20018, Donostia, Euskadi, Spain
- Christiaan Jardinez Vera
  - Laboratorio de Modelado y Simulación Computacional en Nanomedicina, Escuela Superior de Apan, Universidad Autónoma del Estado de Hidalgo, Carretera Apan-Calpulalpan S/N, Colonia, 43920, Chimalpa Tlalayote, Hgo, México
- Marco Franco-Pérez
  - Departamento de Física y Química Teórica, Facultad de Química, Universidad Nacional Autónoma de México, Cd. Universitaria, 04510, Ciudad de Mexico, México
4
Wu J, Kang Y, Pan P, Hou T. Machine learning methods for pKa prediction of small molecules: Advances and challenges. Drug Discov Today 2022; 27:103372. [PMID: 36167281 DOI: 10.1016/j.drudis.2022.103372] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/20/2022] [Revised: 08/15/2022] [Accepted: 09/21/2022] [Indexed: 11/27/2022]
Abstract
The acid-base dissociation constant (pKa) is a fundamental property influencing many ADMET properties of small molecules. However, rapid and accurate pKa prediction remains a great challenge. In this review, we outline the current advances in machine-learning-based QSAR models for pKa prediction, including descriptor-based and graph-based approaches, and summarize their pros and cons. Moreover, we highlight the current challenges and future directions regarding experimental data, crucial factors influencing pKa, and in silico prediction tools. We hope that this review can provide practical guidance for follow-up studies.
Affiliation(s)
- Jialu Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
5
Mayr F, Wieder M, Wieder O, Langer T. Improving Small Molecule pKa Prediction Using Transfer Learning With Graph Neural Networks. Front Chem 2022; 10:866585. [PMID: 35721000 PMCID: PMC9204323 DOI: 10.3389/fchem.2022.866585] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/31/2022] [Accepted: 04/04/2022] [Indexed: 11/13/2022] [Open access]
Abstract
Enumerating protonation states and calculating microstate pKa values of small molecules is an important yet challenging task for lead optimization and molecular modeling. Commercial and non-commercial solutions have notable limitations such as restrictive and expensive licenses, high CPU/GPU hour requirements, or the need for expert knowledge to set up and use. We present a graph neural network model that is trained on 714,906 calculated microstate pKa predictions from molecules obtained from the ChEMBL database. The model is fine-tuned on a set of 5,994 experimental pKa values significantly improving its performance on two challenging test sets. Combining the graph neural network model with Dimorphite-DL, an open-source program for enumerating ionization states, we have developed the open-source Python package pkasolver, which is able to generate and enumerate protonation states and calculate pKa values with high accuracy.
6
Morency M, Néron S, Iftimie R, Wuest JD. Predicting pKa Values of Quinols and Related Aromatic Compounds with Multiple OH Groups. J Org Chem 2021; 86:14444-14460. [PMID: 34613729 DOI: 10.1021/acs.joc.1c01279] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
Quinonoid compounds play central roles as redox-active agents in photosynthesis and respiration and are also promising replacements for inorganic materials currently used in batteries. To design new quinonoid compounds and predict their state of protonation and redox behavior under various conditions, their pKa values must be known. Methods that can predict the pKa values of simple phenols cannot reliably handle complex analogues in which multiple OH groups are present and may form intramolecular hydrogen bonds. We have therefore developed a straightforward method based on a linear relationship between experimental pKa values and calculated differences in energy between quinols and their deprotonated forms. Simple adjustments allow reliable predictions of pKa values when intramolecular hydrogen bonds are present. Our approach has been validated by showing that predicted and experimental values for over 100 quinols and related compounds differ by an average of only 0.3 units. This accuracy makes it possible to select proper pKa values when experimental data vary, predict the acidity of quinols and related compounds before they are made, and determine the sites and orders of deprotonation in complex structures with multiple OH groups.
Affiliation(s)
- Mathieu Morency
  - Département de Chimie, Université de Montréal, Montréal, Québec H2V 0B3, Canada
- Sébastien Néron
  - Département de Chimie, Université de Montréal, Montréal, Québec H2V 0B3, Canada
- Radu Iftimie
  - Département de Chimie, Université de Montréal, Montréal, Québec H2V 0B3, Canada
- James D Wuest
  - Département de Chimie, Université de Montréal, Montréal, Québec H2V 0B3, Canada
7
8
SAMPL7 blind challenge: quantum-mechanical prediction of partition coefficients and acid dissociation constants for small drug-like molecules. J Comput Aided Mol Des 2021; 35:841-851. [PMID: 34164769 DOI: 10.1007/s10822-021-00402-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/30/2021] [Accepted: 06/17/2021] [Indexed: 02/02/2023]
Abstract
The physicochemical properties of a drug molecule determine the therapeutic effectiveness of the drug. Thus, the development of fast and accurate theoretical approaches for the prediction of such properties is inevitable. Our participation in the SAMPL7 challenge is based on the estimation of logP coefficients and pKa values of small drug-like sulfonamide derivatives. Quantum mechanical calculations were carried out to calculate the free energy of solvation and the transfer energy of 22 drug-like compounds in different environments (water and n-octanol) by employing the SMD solvation model. For logP calculations, we studied eleven different methodologies to calculate the transfer free energies; the lowest RMSE value was obtained for the M06L/def2-TZVP//M06L/def2-SVP level of theory. For pKa, we employed an isodesmic reaction scheme within the macro pKa framework; this was based on selecting reference molecules similar to the SAMPL7 challenge molecules. Consequently, well-correlated pKa values were obtained with the M062X/6-311+G(2df,2p)//M052X/6-31+G(d,p) level of theory.
9
Bergazin TD, Tielker N, Zhang Y, Mao J, Gunner MR, Francisco K, Ballatore C, Kast SM, Mobley DL. Evaluation of log P, pKa, and log D predictions from the SAMPL7 blind challenge. J Comput Aided Mol Des 2021; 35:771-802. [PMID: 34169394 PMCID: PMC8224998 DOI: 10.1007/s10822-021-00397-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 04/21/2021] [Accepted: 06/05/2021] [Indexed: 12/16/2022]
Abstract
The Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) challenges focus the computational modeling community on areas in need of improvement for rational drug design. The SAMPL7 physical property challenge dealt with prediction of octanol-water partition coefficients and pKa for 22 compounds. The dataset was composed of a series of N-acylsulfonamides and related bioisosteres. 17 research groups participated in the log P challenge, submitting 33 blind submissions in total. For the pKa challenge, 7 different groups participated, submitting 9 blind submissions in total. Overall, the accuracy of octanol-water log P predictions in the SAMPL7 challenge was lower than octanol-water log P predictions in SAMPL6, likely due to a more diverse dataset. Compared to the SAMPL6 pKa challenge, accuracy remains unchanged in SAMPL7. Interestingly, here, though macroscopic pKa values were often predicted with reasonable accuracy, there was dramatically more disagreement among participants as to which microscopic transitions produced these values (with methods often disagreeing even as to the sign of the free energy change associated with certain transitions), indicating far more work needs to be done on pKa prediction methods.
Affiliation(s)
- Nicolas Tielker
  - Physikalische Chemie III, Technische Universität Dortmund, Otto-Hahn-Str. 4a, 44227, Dortmund, Germany
- Yingying Zhang
  - Department of Physics, The Graduate Center, City University of New York, New York, 10016, USA
- Junjun Mao
  - Department of Physics, City College of New York, New York, 10031, USA
- M R Gunner
  - Department of Physics, The Graduate Center, City University of New York, New York, 10016, USA
  - Department of Physics, City College of New York, New York, 10031, USA
- Karol Francisco
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, 92093-0756, USA
- Carlo Ballatore
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, 92093-0756, USA
- Stefan M Kast
  - Physikalische Chemie III, Technische Universität Dortmund, Otto-Hahn-Str. 4a, 44227, Dortmund, Germany
- David L Mobley
  - Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, CA, 92697, USA
  - Department of Chemistry, University of California, Irvine, Irvine, CA, 92697, USA
10
Işık M, Rustenburg AS, Rizzi A, Gunner MR, Mobley DL, Chodera JD. Overview of the SAMPL6 pKa challenge: evaluating small molecule microscopic and macroscopic pKa predictions. J Comput Aided Mol Des 2021; 35:131-166. [PMID: 33394238 PMCID: PMC7904668 DOI: 10.1007/s10822-020-00362-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/16/2020] [Accepted: 11/17/2020] [Indexed: 01/01/2023]
Abstract
The prediction of acid dissociation constants (pKa) is a prerequisite for predicting many other properties of a small molecule, such as its protein-ligand binding affinity, distribution coefficient (log D), membrane permeability, and solubility. The prediction of each of these properties requires knowledge of the relevant protonation states and solution free energy penalties of each state. The SAMPL6 pKa Challenge was the first time that a separate challenge was conducted for evaluating pKa predictions as part of the Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) exercises. This challenge was motivated by significant inaccuracies observed in prior physical property prediction challenges, such as the SAMPL5 log D Challenge, caused by protonation state and pKa prediction issues. The goal of the pKa challenge was to assess the performance of contemporary pKa prediction methods for drug-like molecules. The challenge set was composed of 24 small molecules that resembled fragments of kinase inhibitors, a number of which were multiprotic. Eleven research groups contributed blind predictions for a total of 37 distinct pKa prediction methods. In addition to blinded submissions, four widely used pKa prediction methods were included in the analysis as reference methods. Collecting both microscopic and macroscopic pKa predictions allowed in-depth evaluation of pKa prediction performance. This article highlights deficiencies of typical pKa prediction evaluation approaches when the distinction between microscopic and macroscopic pKas is ignored; in particular, we suggest more stringent evaluation criteria for microscopic and macroscopic pKa predictions guided by the available experimental data.
Top-performing submissions for macroscopic pKa predictions achieved RMSE of 0.7-1.0 pKa units and included both quantum chemical and empirical approaches, where the total number of extra or missing macroscopic pKas predicted by these submissions was fewer than 8 for 24 molecules. A large number of submissions had RMSE spanning 1-3 pKa units. Molecules with sulfur-containing heterocycles or iodo and bromo groups were less accurately predicted on average considering all methods evaluated. For a subset of molecules, we utilized experimentally-determined microstates based on NMR to evaluate the dominant tautomer predictions for each macroscopic state. Prediction of dominant tautomers was a major source of error for microscopic pKa predictions, especially errors in charged tautomers. The degree of inaccuracy in pKa predictions observed in this challenge is detrimental to the protein-ligand binding affinity predictions due to errors in dominant protonation state predictions and the calculation of free energy corrections for multiple protonation states. Underestimation of ligand pKa by 1 unit can lead to errors in binding free energy of up to 1.2 kcal/mol. The SAMPL6 pKa Challenge demonstrated the need for improving pKa prediction methods for drug-like molecules, especially for challenging moieties and multiprotic molecules.
Affiliation(s)
- Mehtap Işık
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
  - Tri-Institutional PhD Program in Chemical Biology, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY, 10065, USA
- Ariën S Rustenburg
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
  - Graduate Program in Physiology, Biophysics, and Systems Biology, Weill Cornell Medical College, New York, NY, 10065, USA
- Andrea Rizzi
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY, 10065, USA
- M R Gunner
  - Department of Physics, City College of New York, New York, NY, 10031, USA
- David L Mobley
  - Department of Pharmaceutical Sciences and Department of Chemistry, University of California, Irvine, Irvine, CA, 92697, USA
- John D Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
11
Malloum A, Fifen JJ, Conradie J. Determination of the absolute solvation free energy and enthalpy of the proton in solutions. J Mol Liq 2021. [DOI: 10.1016/j.molliq.2020.114919] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 01/16/2023]
12
Farafonov VS, Lebed AV, Mchedlov-Petrossyan NO. Computing pKa Shifts Using Traditional Molecular Dynamics: Example of Acid-Base Indicator Dyes in Organized Solutions. J Chem Theory Comput 2020; 16:5852-5865. [PMID: 32786914 DOI: 10.1021/acs.jctc.0c00231] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
A compound's acidity constant (Ka) in a given medium determines its protonation state and, thus, its behavior and physicochemical properties. Therefore, it is among the key characteristics considered during the design of new compounds for the needs of advanced technology, medicine, and biological research, a notable example being pH sensors. The computational prediction of Ka for weak acids and bases in homogeneous solvents is presently rather well developed. However, this is not the case for more complex media, such as microheterogeneous solutions. The constant-pH molecular dynamics (MD) method is a notable contribution to the solution of the problem, but it is not commonly used. Here, we develop an approach for predicting Ka changes of weak small-molecule acids upon transfer from water to colloid solutions by means of traditional classical molecular dynamics. The approach is based on free energy (ΔG) computations and requires limited experimental data input during calibration. It was successfully tested on a series of pH-sensitive acid-base indicator dyes in micellar solutions of surfactants. The difficulty of finite-size effects affecting ΔG computation between states with different total charges is taken into account by evaluating relevant corrections; their impact on the results is discussed and is found to be non-negligible (0.1-0.4 pKa units). A marked bias is found in the ΔG values of acid deprotonation, as computed from MD, which is apparently caused by force-field issues. It is hypothesized to affect the constant-pH MD and reaction ensemble MD methods as well. Consequently, for these methods, a preliminary calibration is suggested.
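Free-energy differences and pKa shifts such as those quoted in this abstract are linked by the standard thermodynamic relation ΔpKa = ΔΔG / (RT ln 10). The sketch below only illustrates that unit conversion; it is not the authors' workflow, the function names are hypothetical, and kcal/mol and a default temperature of 298.15 K are assumed.

```python
import math

# Molar gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def pka_shift(delta_delta_g_kcal, temperature_k=298.15):
    """Convert a deprotonation free-energy difference (kcal/mol) to a pKa shift."""
    return delta_delta_g_kcal / (R_KCAL * temperature_k * math.log(10))


def free_energy_shift(delta_pka, temperature_k=298.15):
    """Convert a pKa shift to the corresponding free-energy difference (kcal/mol)."""
    return delta_pka * R_KCAL * temperature_k * math.log(10)
```

Under these assumptions one pKa unit corresponds to roughly 1.36 kcal/mol at room temperature, so corrections of a few tenths of a pKa unit map to a few tenths of a kcal/mol.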
Affiliation(s)
- Vladimir S Farafonov
  - V. N. Karazin Kharkiv National University, 4 Svoboda Square, Kharkiv 61022, Ukraine
- Alexander V Lebed
  - V. N. Karazin Kharkiv National University, 4 Svoboda Square, Kharkiv 61022, Ukraine
13
Yang Q, Li Y, Yang J, Liu Y, Zhang L, Luo S, Cheng J. Holistic Prediction of the pKa in Diverse Solvents Based on a Machine‐Learning Approach. Angew Chem Int Ed Engl 2020; 59:19282-19291. [DOI: 10.1002/anie.202008528] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 06/16/2020] [Revised: 07/13/2020] [Indexed: 12/12/2022]
Affiliation(s)
- Qi Yang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
- Yao Li
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
- Jin‐Dong Yang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
- Yidi Liu
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
- Long Zhang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
- Sanzhong Luo
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
- Jin‐Pei Cheng
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
14
Yang Q, Li Y, Yang J, Liu Y, Zhang L, Luo S, Cheng J. Holistic Prediction of the pKa in Diverse Solvents Based on a Machine‐Learning Approach. Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.202008528] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/12/2022]
Affiliation(s)
- Qi Yang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
- Yao Li
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
- Jin‐Dong Yang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
- Yidi Liu
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
- Long Zhang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
- Sanzhong Luo
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
- Jin‐Pei Cheng
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, 100084 Beijing, China
15
Zanetti-Polzi L, Daidone I, Amadei A. Fully Atomistic Multiscale Approach for pKa Prediction. J Phys Chem B 2020; 124:4712-4722. [PMID: 32427481 DOI: 10.1021/acs.jpcb.0c01752] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/29/2023]
Abstract
The ionization state of titratable amino acids strongly affects protein structure and function in a large number of biological processes. It is therefore essential to be able to characterize the pKa of ionizable groups inside proteins and to understand its microscopic determinants in order to gain insights into many functional properties of proteins. Considerable effort has been devoted to the development of theoretical approaches for the prediction of deprotonation free energies, yet the accurate theoretical/computational calculation of pKa values is recognized as a current challenge. A methodology based on a hybrid quantum/classical approach is here proposed for the computation of deprotonation free energies. The method is applied to calculate the pKa of formic acid, methylammonium, and methanethiol, providing results in good agreement with the corresponding experimental estimates. The pKa is also calculated for aspartic acid and lysine as single residues in solution and for three aspartic/glutamic acids inside a well-characterized protein: hen egg white lysozyme. While for small molecules the method is able to deal with multiple protonation states of all titratable groups, this becomes computationally very expensive for proteins. The calculated pKa values for the single amino acids (except for the zwitterionic aspartic acid) and inside the protein display a systematic shift with respect to the experimental values, which suggests that the fine balance between hydrophobic and polar interactions might not be accurately reproduced by the usual classical force-fields, thus affecting the computation of deprotonation free energies. The calculated pKa shifts inside the protein are in good agreement with the corresponding experimental ones (within 1 pKa unit), well reproducing the pKa changes due to the protein environment even in the case of large pKa shifts.
Affiliation(s)
- Isabella Daidone
  - Department of Physical and Chemical Sciences, University of L'Aquila, Via Vetoio, I-67010 L'Aquila, Italy
- Andrea Amadei
  - Department of Chemical and Technological Sciences, University of Rome "Tor Vergata", Via della Ricerca Scientifica, I-00185 Rome, Italy
16
Hunt P, Hosseini-Gerami L, Chrien T, Plante J, Ponting DJ, Segall M. Predicting pKa Using a Combination of Semi-Empirical Quantum Mechanics and Radial Basis Function Methods. J Chem Inf Model 2020; 60:2989-2997. [PMID: 32357002 DOI: 10.1021/acs.jcim.0c00105] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022]
Abstract
The acid dissociation constant (pKa) has an important influence on molecular properties crucial to compound development in synthesis, formulation, and optimization of absorption, distribution, metabolism, and excretion properties. We present a method that combines quantum mechanical calculations, at a semi-empirical level of theory, with machine learning to accurately predict pKa for a diverse range of mono- and polyprotic compounds. The resulting model has been tested on two external data sets, one specifically used to test pKa prediction methods (SAMPL6) and the second covering known drugs containing basic functionalities. Both sets were predicted with excellent accuracy (root-mean-square errors of 0.7-1.0 log units), comparable to other methodologies that use a much higher level of theory at a much higher computational cost.
Affiliation(s)
- Peter Hunt
- Optibrium Ltd., F5-6 Blenheim House, Cambridge Innovation Park, Denny End Road, Cambridge CB25 9PB, U.K
- Layla Hosseini-Gerami
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
- Tomas Chrien
- Optibrium Ltd., F5-6 Blenheim House, Cambridge Innovation Park, Denny End Road, Cambridge CB25 9PB, U.K
- Jeffrey Plante
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, U.K
- David J Ponting
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, U.K
- Matthew Segall
- Optibrium Ltd., F5-6 Blenheim House, Cambridge Innovation Park, Denny End Road, Cambridge CB25 9PB, U.K
17
Prasad S, Brooks BR. A deep learning approach for the blind logP prediction in SAMPL6 challenge. J Comput Aided Mol Des 2020; 34:535-542. [PMID: 32002779 PMCID: PMC8689685 DOI: 10.1007/s10822-020-00292-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/16/2019] [Accepted: 01/17/2020] [Indexed: 12/14/2022]
Abstract
The water-octanol partition coefficient serves as a measure of the lipophilicity of a molecule and is important in the field of drug discovery. A novel method for computational prediction of the logarithm of the partition coefficient (logP) has been developed using molecular fingerprints and a deep neural network. The machine learning model was trained on a dataset of 12,000 molecules and tested on 2,000 molecules. In this article, we present our results for the blind prediction of logP for the SAMPL6 challenge. While the best submission achieved an RMSE of 0.41 logP units, our submission had an RMSE of 0.61 logP units. Overall, we ranked in the top quarter of the 92 submissions that were made. Our results show that the deep learning model can be used as a fast, accurate and robust method for high-throughput prediction of logP of small molecules.
Affiliation(s)
- Samarjeet Prasad
- Biophysics and Biophysical Chemistry, The Johns Hopkins University, School of Medicine, Baltimore, MD, 21205, USA.
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20814, USA.
- Bernard R Brooks
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20814, USA
18
Krämer A, Hudson PS, Jones MR, Brooks BR. Multi-phase Boltzmann weighting: accounting for local inhomogeneity in molecular simulations of water-octanol partition coefficients in the SAMPL6 challenge. J Comput Aided Mol Des 2020; 34:471-483. [PMID: 32060677 PMCID: PMC8750956 DOI: 10.1007/s10822-020-00285-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/15/2019] [Accepted: 01/08/2020] [Indexed: 02/08/2023]
Abstract
Accurately computing partition coefficients is a pivotal part of drug discovery. Specifically, octanol-water partition coefficients can provide insight into the hydrophobicity of drug-like molecules, as well as a de facto representation of membrane permeability. However, one challenge facing the computation of partition coefficients is the need to capture various microscopic environments. These include regions of largely bulk solvent (i.e., either water or octanol), regions where octanol is saturated with water, and regions of higher salt concentration. Tautomeric effects also require consideration. Thus, we present a Boltzmann weighting approach that incorporates transfer free energies across varying microscopic media, as well as varying tautomeric states, to compute partition coefficients in the SAMPL6 challenge.
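As a rough illustration of how Boltzmann weighting can fold several states into one partition coefficient, the Python sketch below combines per-state free energies in each phase through a partition sum and converts the resulting transfer free energy into logP. It is a generic textbook construction, not the authors' exact multi-phase scheme: treating every listed environment or tautomer as an accessible state of its phase, the kcal/mol units, the 298.15 K default, and all function names are assumptions.

```python
import math

# Gas constant in kcal/(mol K); the unit choice is an assumption.
R_KCAL = 1.987204e-3


def boltzmann_free_energy(free_energies, temperature=298.15):
    """Effective free energy of a set of states: -RT ln sum_i exp(-G_i / RT).

    The minimum is subtracted before exponentiating for numerical stability.
    """
    rt = R_KCAL * temperature
    g_min = min(free_energies)
    z = sum(math.exp(-(g - g_min) / rt) for g in free_energies)
    return g_min - rt * math.log(z)


def log_p(water_states, octanol_states, temperature=298.15):
    """logP from per-state free energies (kcal/mol) in each phase.

    Each list holds one free energy per tautomer or microscopic environment,
    all on a common reference scale. A positive result means the solute
    prefers octanol.
    """
    rt = R_KCAL * temperature
    g_water = boltzmann_free_energy(water_states, temperature)
    g_octanol = boltzmann_free_energy(octanol_states, temperature)
    return (g_water - g_octanol) / (rt * math.log(10.0))
```

In this sketch, adding a second, equally stable state on the octanol side raises logP by log10(2), which shows how extra accessible states shift the partitioning even when no single state becomes more favorable.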
Affiliation(s)
- Andreas Krämer
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA.
- Phillip S Hudson
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Department of Chemistry, University of South Florida, Tampa, FL, 33620, USA
- Michael R Jones
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Bernard R Brooks
- Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
19
Standard state free energies, not pKas, are ideal for describing small molecule protonation and tautomeric states. J Comput Aided Mol Des 2020; 34:561-573. [PMID: 32052350 DOI: 10.1007/s10822-020-00280-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/21/2019] [Accepted: 01/08/2020] [Indexed: 12/14/2022]
Abstract
The pKa is the standard measure used to describe the aqueous proton affinity of a compound, indicating the proton concentration (pH) at which two protonation states (e.g. A- and AH) have equal free energy. However, compounds can have additional protonation states (e.g. AH2+), and may assume multiple tautomeric forms, with the protons in different positions (microstates). Macroscopic pKas give the pH where the molecule changes its total number of protons, while microscopic pKas identify the tautomeric states involved. As tautomers have the same number of protons, the free energy difference between them and their relative probability are pH independent, so there is no pKa connecting them. The question arises: What is the best way to describe protonation equilibria of a complex molecule in any pH range? Knowing the number of protons and the relative free energy of all microstates at a single pH, ∆G°, provides all the information needed to determine the free energy, and thus the probability, of each microstate at each pH. Microstate probabilities as a function of pH generate titration curves that highlight the low-energy, observable microstates, which can then be compared with experiment. A network description connecting microstates as nodes makes it straightforward to test the thermodynamic consistency of microstate free energies. The utility of this analysis is illustrated by a description of one molecule from the SAMPL6 Blind pKa Prediction Challenge. Analysis of microstate ∆G°s also provides a more compact way to archive and compare the pH-dependent behavior of compounds with multiple protonatable sites.
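The idea that proton counts plus free energies at one reference pH fix the populations at every pH can be sketched in a few lines of Python. The sketch assumes the usual pH dependence G_i(pH) = ∆G°_i + n_i RT ln(10) (pH - pH_ref) followed by Boltzmann normalization; the function name, the dictionary layout, the kcal/mol units, and the reference pH of 0 are illustrative assumptions rather than the paper's notation.

```python
import math

# Gas constant in kcal/(mol K); the unit choice is an assumption.
R_KCAL = 1.987204e-3


def microstate_probabilities(states, ph, ph_ref=0.0, temperature=298.15):
    """Probability of each microstate at a given pH.

    states maps a microstate name to (n_protons, dg_ref), where dg_ref is
    the relative free energy in kcal/mol at ph_ref. Each bound proton costs
    an extra RT ln(10) per pH unit above ph_ref.
    """
    rt = R_KCAL * temperature
    ln10 = math.log(10.0)
    g = {name: dg_ref + n * rt * ln10 * (ph - ph_ref)
         for name, (n, dg_ref) in states.items()}
    g_min = min(g.values())  # shift by the minimum for numerical stability
    weights = {name: math.exp(-(val - g_min) / rt) for name, val in g.items()}
    z = sum(weights.values())
    return {name: w / z for name, w in weights.items()}


# Hypothetical monoprotic acid with pKa 4.0, referenced to pH 0.
rt_ln10 = R_KCAL * 298.15 * math.log(10.0)
acid = {"A-": (0, 0.0), "AH": (1, -4.0 * rt_ln10)}
titration = {ph: microstate_probabilities(acid, ph)["AH"] for ph in range(0, 9)}
```

For this two-state example the protonated and deprotonated forms are equally populated at pH 4, recovering the conventional pKa. Tautomers with the same n_i keep a constant population ratio at every pH, which is the reason no pKa connects them.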