1
Draper MR, Waterman A, Dannatt JE, Patel P. Integrating multiscale and machine learning approaches towards the SAMPL9 log P challenge. Phys Chem Chem Phys 2024; 26:7907-7919. [PMID: 38376855 PMCID: PMC10938873 DOI: 10.1039/d3cp04140a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2024]
Abstract
The partition coefficient (log P) is an important physicochemical property that provides information regarding a molecule's pharmacokinetics, toxicity, and bioavailability. Methods to accurately predict the partition coefficient have the potential to accelerate drug design. In an effort to test current methods and explore new computational techniques, the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) has established a blind prediction challenge. The ninth iteration of the challenge was to predict the toluene-water partition coefficient (log Ptol/w) of sixteen drug molecules. Herein, three approaches are reported, broadly under the categories of quantum mechanics (QM), molecular mechanics (MM), and data-driven machine learning (ML). The three blind submissions yield mean unsigned errors (MUE) ranging from 1.53 to 2.93 log Ptol/w units. The MUEs were reduced to 1.00 log Ptol/w units for the QM methods. While MM and ML methods outperformed DFT approaches for challenge molecules with fewer rotational degrees of freedom, they suffered for the larger molecules in this dataset. Overall, DFT functionals paired with a triple-ζ basis set were the simplest and most effective tool to obtain quantitatively accurate partition coefficients.
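As a rough illustration of how physics-based (QM or MM) routes like those named in the abstract typically arrive at a partition coefficient, the sketch below converts two solvation free energies into log P and scores a set of predictions with the mean unsigned error. The function names, the kcal/mol unit convention, and the 298.15 K default are assumptions made for illustration; none of them are taken from the paper.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL = 1.987204e-3


def log_p_from_solvation(dg_water, dg_organic, temperature=298.15):
    """Return log P (organic/water) from solvation free energies in kcal/mol.

    A more negative free energy in the organic phase gives a positive log P.
    """
    return (dg_water - dg_organic) / (R_KCAL * temperature * math.log(10))


def mean_unsigned_error(predicted, experimental):
    """Mean unsigned error (MUE) between two equally long sequences."""
    if len(predicted) != len(experimental):
        raise ValueError("sequences must have the same length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical free energies (kcal/mol), not values from the study.
    print(round(log_p_from_solvation(-5.0, -7.0), 2))
    print(mean_unsigned_error([1.0, 2.0], [2.0, 0.0]))
```

At room temperature one log P unit corresponds to roughly 1.36 kcal/mol of transfer free energy, which gives a sense of how small an energy error produces an MUE of about one log unit.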
Affiliation(s)
- Michael R Draper
- Chemistry Department, University of Dallas, Irving, Texas, 75062, USA.
- Asa Waterman
- Chemistry Department, University of Dallas, Irving, Texas, 75062, USA.
- Prajay Patel
- Chemistry Department, University of Dallas, Irving, Texas, 75062, USA.
2
Zamora WJ, Viayna A, Pinheiro S, Curutchet C, Bisbal L, Ruiz R, Ràfols C, Luque FJ. Prediction of toluene/water partition coefficients in the SAMPL9 blind challenge: assessment of machine learning and IEF-PCM/MST continuum solvation models. Phys Chem Chem Phys 2023. [PMID: 37376995 DOI: 10.1039/d3cp01428b] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 06/29/2023]
Abstract
In recent years, the use of partition systems other than the widely used biphasic n-octanol/water has received increased attention as a way to gain insight into the molecular features that dictate the lipophilicity of compounds. Thus, the difference between n-octanol/water and toluene/water partition coefficients has proven to be a valuable descriptor to study the propensity of molecules to form intramolecular hydrogen bonds and exhibit chameleon-like properties that modulate solubility and permeability. In this context, this study reports the experimental toluene/water partition coefficients (log Ptol/w) for a series of 16 drugs that were selected as an external test set in the framework of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) blind challenge. This external set has been used by the computational community to calibrate their methods in the current edition (SAMPL9) of this contest. Furthermore, the study also investigates the performance of two computational strategies for the prediction of log Ptol/w. The first relies on the development of two machine learning (ML) models, which combine a selection of 11 molecular descriptors with either multiple linear regression (MLR) or random forest regression (RFR), fitted to a dataset of 252 experimental log Ptol/w values. The second consists of the parametrization of the IEF-PCM/MST continuum solvation model from B3LYP/6-31G(d) calculations to predict the solvation free energies of 163 compounds in toluene and benzene. The performance of the ML and IEF-PCM/MST models has been calibrated against external test sets, including the compounds that define the SAMPL9 log Ptol/w challenge. The results are used to discuss the merits and weaknesses of the two computational approaches.
Affiliation(s)
- William J Zamora
- CBio3 Laboratory, School of Chemistry, University of Costa Rica, San Pedro, San José, Costa Rica.
- Laboratory of Computational Toxicology and Artificial Intelligence (LaToxCIA), Biological Testing Laboratory (LEBi), University of Costa Rica, San Pedro, San José, Costa Rica
- Advanced Computing Lab (CNCA), National High Technology Center (CeNAT), Pavas, San José, Costa Rica
- Antonio Viayna
- Departament de Nutrició, Ciències de l'Alimentació i Gastronomia, Facultat de Farmàcia i Ciències de l'Alimentació, Universitat de Barcelona (UB), Av. Prat de la Riba 171, 08921 Santa Coloma de Gramenet, Spain.
- Institut de Biomedicina (IBUB), Universitat de Barcelona (UB), Barcelona, Spain
- Institut de Química Teòrica i Computacional (IQTC-UB), Universitat de Barcelona (UB), Barcelona, Spain
- Silvana Pinheiro
- CBio3 Laboratory, School of Chemistry, University of Costa Rica, San Pedro, San José, Costa Rica.
- Laboratory of Computational Toxicology and Artificial Intelligence (LaToxCIA), Biological Testing Laboratory (LEBi), University of Costa Rica, San Pedro, San José, Costa Rica
- Carles Curutchet
- Institut de Química Teòrica i Computacional (IQTC-UB), Universitat de Barcelona (UB), Barcelona, Spain
- Departament de Farmàcia i Tecnologia Farmacèutica, i Fisicoquímica, Facultat de Farmàcia i Ciències de l'Alimentació, Universitat de Barcelona (UB), Av. Joan XXIII 27-31, 08028, Barcelona, Spain
- Laia Bisbal
- Institut de Biomedicina (IBUB), Universitat de Barcelona (UB), Barcelona, Spain
- Departament d'Enginyeria Química i Química Analítica, Universitat de Barcelona (UB), Martí i Franquès 1-11, 08028 Barcelona, Spain.
- Rebeca Ruiz
- Pion Inc., Forest Row Business Park, Forest Row RH18 5DW, UK
- Clara Ràfols
- Institut de Biomedicina (IBUB), Universitat de Barcelona (UB), Barcelona, Spain
- Departament d'Enginyeria Química i Química Analítica, Universitat de Barcelona (UB), Martí i Franquès 1-11, 08028 Barcelona, Spain.
- F Javier Luque
- Departament de Nutrició, Ciències de l'Alimentació i Gastronomia, Facultat de Farmàcia i Ciències de l'Alimentació, Universitat de Barcelona (UB), Av. Prat de la Riba 171, 08921 Santa Coloma de Gramenet, Spain.
- Institut de Biomedicina (IBUB), Universitat de Barcelona (UB), Barcelona, Spain
- Institut de Química Teòrica i Computacional (IQTC-UB), Universitat de Barcelona (UB), Barcelona, Spain
3
Ruiz R, Zamora WJ, Ràfols C, Bosch E. Molecular characteristics of several drugs evaluated from solvent/water partition measurements: Solvation parameters and intramolecular hydrogen bond indicator. Eur J Pharm Sci 2022; 168:106066. [PMID: 34767947 DOI: 10.1016/j.ejps.2021.106066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/29/2021] [Revised: 11/03/2021] [Accepted: 11/05/2021] [Indexed: 11/03/2022]
Abstract
A wide set of well-known drugs, most of them included in Abraham's reference database and covering a wide variety of chemical structures and therapeutic functionalities, was chosen in order to determine some molecular properties from solvent/water partition measurements. Partition data between aqueous solutions and four different solvents (n-dodecane, toluene, chloroform and n-octanol) were measured and reported. From them, Abraham's molecular descriptors of the selected compounds (A, B and S, accounting for hydrogen bond donor ability, hydrogen bond acceptor ability and dipolarity/polarizability, respectively) were estimated. The A and B values derived from the experimental measurements agree closely with the tabulated ones, showing the suitability of the procedure for obtaining reliable values for new molecules. However, the S values obtained differ from those previously reported for several compounds. Moreover, values for a new indicator of the propensity to form intramolecular hydrogen bonds (Δlog Poct-tol) were estimated from the experimental data and also calculated both from Abraham's model and from the molecular structures (SMD). The quality of both series of calculated descriptors was evaluated against the experimental values, and satisfactory results were obtained in both instances. Thus, Abraham's approach is useful when molecular descriptors are available, but very good estimates can be achieved by SMD, which requires only the drug's molecular structure.
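The indicator named in the abstract can be illustrated with a few lines of code. The sketch below assumes the usual reading of the subscript, namely that Δlog Poct-tol is the n-octanol/water log P minus the toluene/water log P; the function names and the compound labels are hypothetical, and the numbers are placeholders, not measurements from the study.

```python
def delta_log_p_oct_tol(log_p_oct, log_p_tol):
    """Difference between octanol/water and toluene/water log P values."""
    return log_p_oct - log_p_tol


def rank_by_delta(measurements):
    """Sort compounds by ascending delta log P(oct-tol).

    `measurements` maps a compound label to a (log_p_oct, log_p_tol) pair.
    """
    deltas = {
        name: delta_log_p_oct_tol(oct_value, tol_value)
        for name, (oct_value, tol_value) in measurements.items()
    }
    return sorted(deltas.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Placeholder values for two hypothetical compounds.
    data = {"cmpd_A": (3.0, 1.5), "cmpd_B": (2.0, 1.75)}
    for name, delta in rank_by_delta(data):
        print(name, delta)
```

Smaller differences are commonly read as a hint that hydrogen bond donors are shielded, for example by an intramolecular hydrogen bond, but any cutoff would have to come from the experimental calibration rather than from this sketch.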
Affiliation(s)
- Rebeca Ruiz
- Pion Inc., Forest Row Business Park, Forest Row RH18 5DW, UK
- William J Zamora
- School of Chemistry and Faculty of Pharmacy, University of Costa Rica, San Pedro, San José, Costa Rica; Advanced Computing Lab (CNCA), National High Technology Center (CeNAT), Pavas, San José, Costa Rica
- Clara Ràfols
- Departament d'Enginyeria Química i Química Analítica and Institut de Biomedicina (IBUB), Universitat de Barcelona, Martí i Franquès 1-11, 08028 Barcelona, Spain.
- Elisabeth Bosch
- Departament d'Enginyeria Química i Química Analítica and Institut de Biomedicina (IBUB), Universitat de Barcelona, Martí i Franquès 1-11, 08028 Barcelona, Spain
4
Reetz MT, König G. n-Butanol: An Ecologically and Economically Viable Extraction Solvent for Isolating Polar Products from Aqueous Solutions. European J Org Chem 2021. [DOI: 10.1002/ejoc.202100829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Affiliation(s)
- Manfred T. Reetz
- Max-Planck-Institut für Kohlenforschung Kaiser-Wilhelm-Platz 1 45470 Mülheim an der Ruhr Germany
- Tianjin Institute of Industrial Biotechnology Chinese Academy of Sciences Tianjin China
- Gerhard König
- Centre for Enzyme Innovation University of Portsmouth St Michael's Building Portsmouth PO1 2DT United Kingdom
5
Lopez K, Pinheiro S, Zamora WJ. Multiple linear regression models for predicting the n-octanol/water partition coefficients in the SAMPL7 blind challenge. J Comput Aided Mol Des 2021; 35:923-931. [PMID: 34251523 PMCID: PMC8273033 DOI: 10.1007/s10822-021-00409-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/05/2021] [Accepted: 07/05/2021] [Indexed: 01/19/2023]
Abstract
A multiple linear regression model called MLR-3 is used for predicting the experimental n-octanol/water partition coefficient (log PN) of 22 N-sulfonamides proposed by the organizers of the SAMPL7 blind challenge. The MLR-3 method was trained with 82 molecules, including drug-like sulfonamides and small organic molecules, which resembled the main functional groups present in the challenge dataset. Our model, submitted as "TFE-MLR", presented a root-mean-square error of 0.58 and a mean absolute error of 0.41 in log P units, achieving the highest accuracy among empirical methods and also among all ranked submissions. Overall, the results support the appropriateness of the multiple linear regression approach MLR-3 for computing the n-octanol/water partition coefficient of sulfonamide-bearing compounds. In this context, the outstanding performance of empirical methodologies, where 75% of the ranked submissions achieved root-mean-square errors < 1 log P unit, supports the suitability of these strategies for obtaining accurate and fast predictions of physicochemical properties such as partition coefficients of bioorganic compounds.
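To make the workflow in the abstract concrete, the sketch below fits a generic multiple linear regression by ordinary least squares and reports the two error metrics quoted above (RMSE and MAE). It is not the MLR-3 model: the descriptors, the toy data, and every function name are assumptions for illustration, and the solver is a plain normal-equations implementation chosen so the example needs only the standard library.

```python
import math


def fit_mlr(X, y):
    """Ordinary least squares with an intercept, solved via normal equations.

    X is a list of descriptor rows, y the matching target values.
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + list(r) for r in X]
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(coef, X):
    """Apply fitted coefficients to new descriptor rows."""
    return [coef[0] + sum(c * x for c, x in zip(coef[1:], r)) for r in X]


def rmse(pred, obs):
    """Root-mean-square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


if __name__ == "__main__":
    # Synthetic training data with two made-up descriptors per molecule.
    X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    y_train = [1.0, 3.0, -2.0, 0.0]
    coef = fit_mlr(X_train, y_train)
    fitted = predict(coef, X_train)
    print(coef, rmse(fitted, y_train), mae(fitted, y_train))
```

In a blind challenge the metrics would be computed on held-out molecules rather than on the training set, since training-set errors say little about predictive accuracy.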
Affiliation(s)
- Kenneth Lopez
- School of Chemistry, University of Costa Rica, San Pedro, San José, Costa Rica
- Silvana Pinheiro
- Institute of Exact and Natural Sciences, Federal University of Pará, Belém, Pará, 66075-110, Brazil
- William J Zamora
- School of Chemistry, University of Costa Rica, San Pedro, San José, Costa Rica.
- Advanced Computing Lab (CNCA), National High Technology Center (CeNAT-CONARE), Pavas, San José, Costa Rica.
6
Donyapour N, Dickson A. Predicting partition coefficients for the SAMPL7 physical property challenge using the ClassicalGSG method. J Comput Aided Mol Des 2021; 35:819-830. [PMID: 34181200 PMCID: PMC8295205 DOI: 10.1007/s10822-021-00400-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/25/2021] [Accepted: 06/17/2021] [Indexed: 02/02/2023]
Abstract
The prediction of log P values is one part of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) blind challenges. Here, we use a molecular graph representation method called Geometric Scattering for Graphs (GSG) to transform atomic attributes to molecular features. The atomic attributes used here are parameters from classical molecular force fields, including partial charges and Lennard-Jones interaction parameters. The molecular features from GSG are used as inputs to neural networks that are trained using a "master" dataset comprised of over 41,000 unique log P values. The specific molecular targets in the SAMPL7 log P prediction challenge were unique in that they all contained a sulfonyl moiety. This motivated a set of ClassicalGSG submissions where predictors were trained on different subsets of the master dataset that are filtered according to chemical types and/or the presence of the sulfonyl moiety. We find that our ranked prediction obtained 5th place with an RMSE of 0.77 log P units and an MAE of 0.62, while one of our non-ranked predictions achieved first place among all submissions with an RMSE of 0.55 and an MAE of 0.44. After the conclusion of the challenge, we also examined the performance of open-source force field parameters that allow for an end-to-end log P predictor model: General AMBER Force Field (GAFF), Universal Force Field (UFF), Merck Molecular Force Field 94 (MMFF94) and Ghemical. We find that ClassicalGSG models trained with atomic attributes from MMFF94 can yield more accurate predictions compared to those trained with CGenFF atomic attributes.
Affiliation(s)
- Nazanin Donyapour
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA
- Alex Dickson
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA.
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA.
7
Işık M, Bergazin TD, Fox T, Rizzi A, Chodera JD, Mobley DL. Assessing the accuracy of octanol-water partition coefficient predictions in the SAMPL6 Part II log P Challenge. J Comput Aided Mol Des 2020; 34:335-370. [PMID: 32107702 PMCID: PMC7138020 DOI: 10.1007/s10822-020-00295-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 11/07/2019] [Accepted: 01/24/2020] [Indexed: 12/12/2022]
Abstract
The SAMPL Challenges aim to focus the biomolecular and physical modeling community on issues that limit the accuracy of predictive modeling of protein-ligand binding for rational drug design. In the SAMPL5 log D Challenge, designed to benchmark the accuracy of methods for predicting drug-like small molecule transfer free energies from aqueous to nonpolar phases, participants found it difficult to make accurate predictions due to the complexity of protonation state issues. In the SAMPL6 log P Challenge, we asked participants to make blind predictions of the octanol-water partition coefficients of neutral species of 11 compounds and assessed how well these methods performed absent the complication of protonation state effects. This challenge builds on the SAMPL6 pKa Challenge, which asked participants to predict pKa values of a superset of the compounds considered in this log P challenge. Blind prediction sets of 91 prediction methods were collected from 27 research groups, spanning a variety of quantum mechanics (QM) or molecular mechanics (MM)-based physical methods, knowledge-based empirical methods, and mixed approaches. There was a 50% increase in the number of participating groups and a 20% increase in the number of submissions compared to the SAMPL5 log D Challenge. Overall, the accuracy of octanol-water log P predictions in the SAMPL6 Challenge was higher than that of cyclohexane-water log D predictions in SAMPL5, likely because modeling only the neutral species was necessary for log P and several categories of method benefited from the vast amounts of experimental octanol-water log P data. There were many highly accurate methods: 10 diverse methods achieved RMSE less than 0.5 log P units. These included QM-based methods, empirical methods, and mixed methods with physical modeling supported with empirical corrections. A comparison of physical modeling methods showed that QM-based methods outperformed MM-based methods. The average RMSEs of the five most accurate MM-based, QM-based, empirical, and mixed-approach methods were 0.92 ± 0.13, 0.48 ± 0.06, 0.47 ± 0.05, and 0.50 ± 0.06, respectively.
Affiliation(s)
- Mehtap Işık
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA.
- Tri-Institutional PhD Program in Chemical Biology, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY, 10065, USA.
- Thomas Fox
- Computational Chemistry, Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397, Biberach, Germany
- Andrea Rizzi
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, 10065, USA
- John D Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- David L Mobley
- Department of Pharmaceutical Sciences, University of California, Irvine, CA, 92697, USA
- Department of Chemistry, University of California, Irvine, CA, 92697, USA