1. Suarez AG, Göller AH, Beck ME, Gheta SKO, Meier K. Comparative assessment of physics-based in silico methods to calculate relative solubilities. J Comput Aided Mol Des 2024; 38:36. [PMID: 39470860 DOI: 10.1007/s10822-024-00576-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Accepted: 10/11/2024] [Indexed: 11/01/2024]
Abstract
Relative solubility, i.e. whether a given molecule is more soluble in one solvent than in others, is a critical parameter for pharmaceutical and agricultural formulation development, chemical synthesis, materials science, and environmental chemistry. In silico predictions of this crucial variable can help reduce the number of experiments and solvent waste and support synthesis optimization. In this study, we evaluate the performance of different physics-based methods for predicting relative solubilities. Our assessment involves quantum mechanics-based COSMO-RS and molecular dynamics-based free energy methods using OPLS4, the open-source OpenFF Sage, and GAFF force fields, spanning over 200 solvent-solute combinations. Our investigation highlights the important role of compound multimerization, an effect which must be accounted for to obtain accurate relative solubility predictions. The performance landscape of these methods is varied, with significant differences in precision depending on both the method used and the solute considered, thereby offering an improved understanding of the predictive power of physics-based methods in chemical research.
Affiliation(s)
- Adiran Garaizar Suarez
  - Bayer AG, Pharmaceuticals, Structural Biology & Computational Design, Wuppertal, Germany
  - Bayer AG, Crop Science, Data Science, Monheim, Germany
- Andreas H Göller
  - Bayer AG, Pharmaceuticals, Structural Biology & Computational Design, Wuppertal, Germany
- Katharina Meier
  - Bayer AG, Pharmaceuticals, Structural Biology & Computational Design, Wuppertal, Germany
2. Ramos MC, White AD. Predicting small molecules solubility on endpoint devices using deep ensemble neural networks. Digital Discovery 2024; 3:786-795. [PMID: 38638648 PMCID: PMC11022985 DOI: 10.1039/d3dd00217a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 03/07/2024] [Indexed: 04/20/2024]
Abstract
Aqueous solubility is a valuable yet challenging property to predict. Computing solubility using first-principles methods requires accounting for the competing effects of entropy and enthalpy, resulting in long computations for relatively poor accuracy. Data-driven approaches, such as deep learning, offer improved accuracy and computational efficiency but typically lack uncertainty quantification. Additionally, ease of use remains a concern for any computational technique, resulting in the sustained popularity of group-based contribution methods. In this work, we addressed these problems with a deep learning model with predictive uncertainty that runs on a static website (without a server). This approach moves computing needs onto the website visitor without requiring installation, removing the need to pay for and maintain servers. Our model achieves satisfactory results in solubility prediction. Furthermore, we demonstrate how to create molecular property prediction models that balance uncertainty and ease of use. The code is available at https://github.com/ur-whitelab/mol.dev, and the model is useable at https://mol.dev.
Affiliation(s)
- Mayk Caldas Ramos
  - Chemical Engineer Department, University of Rochester, Rochester, NY 14642, USA
- Andrew D White
  - Chemical Engineer Department, University of Rochester, Rochester, NY 14642, USA
3. Llompart P, Minoletti C, Baybekov S, Horvath D, Marcou G, Varnek A. Will we ever be able to accurately predict solubility? Sci Data 2024; 11:303. [PMID: 38499581 PMCID: PMC10948805 DOI: 10.1038/s41597-024-03105-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 02/29/2024] [Indexed: 03/20/2024] Open
Abstract
Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performances, but their reliability may be deceiving when used prospectively. This study investigates the origins of these discrepancies, following three directions: a historical perspective, an analysis of the aqueous solubility dataverse and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performances. We also propose a workflow to curate aqueous solubility data, aiming at producing useful models for bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public usage because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute and data sources. The models obtained herein and the quality-assessed datasets are publicly available.
Affiliation(s)
- P Llompart
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
  - IDD/CADD, Sanofi, Vitry-Sur-Seine, France
- S Baybekov
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- D Horvath
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- G Marcou
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- A Varnek
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
4. pH-dependent solubility prediction for optimized drug absorption and compound uptake by plants. J Comput Aided Mol Des 2023; 37:129-145. [PMID: 36797399 DOI: 10.1007/s10822-023-00496-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/17/2022] [Accepted: 01/31/2023] [Indexed: 02/18/2023]
Abstract
Aqueous solubility is the most important physicochemical property for agrochemical and drug candidates and a prerequisite for uptake, distribution, transport, and finally the bioavailability in living species. We here present the first-ever direct machine learning models for pH-dependent solubility in water. For this, we combined almost 300000 data points from 11 solubility assays performed over 24 years and over one million data points from lipophilicity and melting point experiments. Data were split into three pH classes (acidic, neutral and basic), representing the conditions of stomach and intestinal tract for animals and humans, and phloem and xylem for plants. We find that multi-task neural networks using ECFP-6 fingerprints outperform baseline random forests and single-task neural networks on the individual tasks. Our final model, with three solubility tasks using the pH-class combined data from different assays and five helper tasks, results in root mean square errors of 0.56 log units overall (acidic 0.61; neutral 0.52; basic 0.54) and Spearman rank correlations of 0.83 (acidic 0.78; neutral 0.86; basic 0.86), making it a valuable tool for profiling of compounds in pharmaceutical and agrochemical research. The model allows for the prediction of compound pH profiles with mean and median RMSE per molecule of 0.62 and 0.56 log units.
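The two metrics quoted in this abstract, root mean square error and Spearman rank correlation, can be illustrated with a minimal Python sketch. The function names and the log-solubility values below are hypothetical and chosen only for illustration; they are not taken from the cited study.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(predicted, observed):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rp, ro = average_ranks(predicted), average_ranks(observed)
    n = len(rp)
    mp, mo = sum(rp) / n, sum(ro) / n
    cov = sum((a - mp) * (b - mo) for a, b in zip(rp, ro))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_o = sum((b - mo) ** 2 for b in ro)
    return cov / math.sqrt(var_p * var_o)

# Hypothetical log-solubility values, for illustration only.
observed = [-4.2, -3.1, -5.0, -2.4, -3.8]
predicted = [-4.0, -3.5, -4.6, -2.9, -3.6]
print(f"RMSE = {rmse(predicted, observed):.2f} log units")
print(f"Spearman = {spearman(predicted, observed):.2f}")
```

RMSE measures how far predictions are from measurements in log units, while the Spearman coefficient only checks whether compounds are ranked in the right order, which is why the abstract reports both.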
5. Ahmad W, Tayara H, Chong KT. Attention-Based Graph Neural Network for Molecular Solubility Prediction. ACS Omega 2023; 8:3236-3244. [PMID: 36713733 PMCID: PMC9878542 DOI: 10.1021/acsomega.2c06702] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 10/18/2022] [Accepted: 12/23/2022] [Indexed: 06/18/2023]
Abstract
Drug discovery (DD) research is aimed at the discovery of new medications. Solubility is an important physicochemical property in drug development. Active pharmaceutical ingredients (APIs) are essential substances for high drug efficacy. During DD research, aqueous solubility (AS) is a key physicochemical attribute required for API characterization. High-precision in silico solubility prediction reduces the experimental cost and time of drug development. Several artificial tools have been employed for solubility prediction using machine learning and deep learning techniques. This study aims to create different deep learning models that can predict the solubility of a wide range of molecules using the largest currently available solubility data set. Simplified molecular-input line-entry system (SMILES) strings were used as the molecular representation, and models were developed using simple graph convolution, graph isomorphism network, graph attention network, and AttentiveFP network. Based on the performance of the models, the AttentiveFP-based network model was finally selected. The model was trained and tested on 9943 compounds. The model outperformed on 62 anticancer compounds with Pearson correlation R² and root-mean-square error values of 0.52 and 0.61, respectively. AS prediction can be improved by improving the graph algorithms or adding more molecular properties.
Affiliation(s)
- Waqar Ahmad
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
6. Xiouras C, Cameli F, Quilló GL, Kavousanakis ME, Vlachos DG, Stefanidis GD. Applications of Artificial Intelligence and Machine Learning Algorithms to Crystallization. Chem Rev 2022; 122:13006-13042. [PMID: 35759465 DOI: 10.1021/acs.chemrev.2c00141] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/28/2022]
Abstract
Artificial intelligence and specifically machine learning applications are nowadays used in a variety of scientific applications and cutting-edge technologies, where they have a transformative impact. Such an assembly of statistical and linear algebra methods making use of large data sets is becoming more and more integrated into chemistry and crystallization research workflows. This review aims to present, for the first time, a holistic overview of machine learning and cheminformatics applications as a novel, powerful means to accelerate the discovery of new crystal structures, predict key properties of organic crystalline materials, simulate, understand, and control the dynamics of complex crystallization process systems, as well as contribute to high throughput automation of chemical process development involving crystalline materials. We critically review the advances in these new, rapidly emerging research areas, raising awareness in issues such as the bridging of machine learning models with first-principles mechanistic models, data set size, structure, and quality, as well as the selection of appropriate descriptors. At the same time, we propose future research at the interface of applied mathematics, chemistry, and crystallography. Overall, this review aims to increase the adoption of such methods and tools by chemists and scientists across industry and academia.
Affiliation(s)
- Christos Xiouras
  - Chemical Process R&D, Crystallization Technology Unit, Janssen R&D, Turnhoutseweg 30, 2340 Beerse, Belgium
- Fabio Cameli
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States
- Gustavo Lunardon Quilló
  - Chemical Process R&D, Crystallization Technology Unit, Janssen R&D, Turnhoutseweg 30, 2340 Beerse, Belgium
  - Chemical and BioProcess Technology and Control, Department of Chemical Engineering, Faculty of Engineering Technology, KU Leuven, Gebroeders de Smetstraat 1, 9000 Ghent, Belgium
- Mihail E Kavousanakis
  - School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
- Dionisios G Vlachos
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States
- Georgios D Stefanidis
  - School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
  - Laboratory for Chemical Technology, Ghent University; Tech Lane Ghent Science Park 125, B-9052 Ghent, Belgium
7. Grebner C, Matter H, Hessler G. Artificial Intelligence in Compound Design. Methods Mol Biol 2021; 2390:349-382. [PMID: 34731477 DOI: 10.1007/978-1-0716-1787-8_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
Artificial intelligence has seen an incredibly fast development in recent years. Many novel technologies for property prediction of drug molecules as well as for the design of novel molecules were introduced by different research groups. These artificial intelligence-based design methods can be applied for suggesting novel chemical motifs in lead generation or scaffold hopping as well as for optimization of desired property profiles during lead optimization. In lead generation, broad sampling of the chemical space for identification of novel motifs is required, while in the lead optimization phase, a detailed exploration of the chemical neighborhood of a current lead series is advantageous. These different requirements for successful design outcomes render different combinations of artificial intelligence technologies useful. Overall, we observe that a combination of different approaches with tailored scoring and evaluation schemes appears beneficial for efficient artificial intelligence-based compound design.
Affiliation(s)
- Christoph Grebner
  - Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
- Hans Matter
  - Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
- Gerhard Hessler
  - Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
8. Kolmar SS, Grulke CM. The effect of noise on the predictive limit of QSAR models. J Cheminform 2021; 13:92. [PMID: 34823605 PMCID: PMC8613965 DOI: 10.1186/s13321-021-00571-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/15/2021] [Accepted: 11/14/2021] [Indexed: 01/09/2023] Open
Abstract
A key challenge in the field of Quantitative Structure Activity Relationships (QSAR) is how to effectively treat experimental error in the training and evaluation of computational models. It is often assumed in the field of QSAR that models cannot produce predictions which are more accurate than their training data. Additionally, it is implicitly assumed, by necessity, that data points in test sets or validation sets do not contain error, and that each data point is a population mean. This work proposes the hypothesis that QSAR models can make predictions which are more accurate than their training data and that the error-free test set assumption leads to a significant misevaluation of model performance. This work used 8 datasets with six different common QSAR endpoints, because different endpoints should have different amounts of experimental error associated with varying complexity of the measurements. Up to 15 levels of simulated Gaussian distributed random error were added to the datasets, and models were built on the error-laden datasets using five different algorithms. The models were trained on the error-laden data, evaluated on error-laden test sets, and evaluated on error-free test sets. The results show that for each level of added error, the RMSE for evaluation on the error-free test sets was always better. The results support the hypothesis that, at least under the conditions of Gaussian distributed random error, QSAR models can make predictions which are more accurate than their training data, and that the evaluation of models on error-laden test and validation sets may give a flawed measure of model performance. These results have implications for how QSAR models are evaluated, especially for disciplines where experimental error is very large, such as in computational toxicology.
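The evaluation idea in this abstract (train on error-laden labels, then score against both error-laden and error-free test labels) can be sketched on synthetic data. This is only an illustration under strong assumptions: a one-descriptor linear "endpoint", ordinary least squares as the model, and hypothetical function names. It does not reproduce the datasets or algorithms used in the cited study.

```python
import math
import random

def true_value(x):
    """Assumed noise-free endpoint, a simple linear function of one descriptor."""
    return 2.0 * x - 1.0

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def rmse(predicted, reference):
    """Root mean square error between predictions and reference values."""
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(reference))

def noise_experiment(sigma, n_train=500, n_test=500, seed=0):
    """Return (RMSE vs. error-laden test labels, RMSE vs. error-free test labels)."""
    rng = random.Random(seed)
    x_train = [rng.uniform(-3.0, 3.0) for _ in range(n_train)]
    x_test = [rng.uniform(-3.0, 3.0) for _ in range(n_test)]
    # Training labels carry simulated Gaussian experimental error.
    y_train_noisy = [true_value(x) + rng.gauss(0.0, sigma) for x in x_train]
    y_test_clean = [true_value(x) for x in x_test]
    y_test_noisy = [y + rng.gauss(0.0, sigma) for y in y_test_clean]
    slope, intercept = fit_line(x_train, y_train_noisy)
    predictions = [slope * x + intercept for x in x_test]
    return rmse(predictions, y_test_noisy), rmse(predictions, y_test_clean)

for sigma in (0.5, 1.0):
    noisy, clean = noise_experiment(sigma)
    print(f"sigma={sigma}: RMSE vs noisy test={noisy:.3f}, vs clean test={clean:.3f}")
```

In this toy setting the fitted line averages out the random error, so its RMSE against the error-free labels is smaller than against the error-laden ones, which mirrors the qualitative conclusion stated in the abstract.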
Affiliation(s)
- Scott S Kolmar
  - Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Christopher M Grulke
  - Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
9. Hu P, Jiao Z, Zhang Z, Wang Q. Development of Solubility Prediction Models with Ensemble Learning. Ind Eng Chem Res 2021. [DOI: 10.1021/acs.iecr.1c02142] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/31/2023]
Affiliation(s)
- Pingfan Hu
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
- Zeren Jiao
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
- Zhuoran Zhang
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
- Qingsheng Wang
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
10. Zhang R, Li X, Zhang X, Qin H, Xiao W. Machine learning approaches for elucidating the biological effects of natural products. Nat Prod Rep 2021; 38:346-361. [PMID: 32869826 DOI: 10.1039/d0np00043d] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Indexed: 12/15/2022]
Abstract
Covering: 2000 to 2020. Machine learning (ML) is an efficient tool for the prediction of bioactivity and the study of structure-activity relationships. Over the past decade, an emerging trend for combining these approaches with the study of natural products (NPs) has developed in order to manage the challenge of the discovery of bioactive NPs. In the present review, we will introduce the basic principles and protocols for using the ML approach to investigate the bioactivity of NPs, citing a series of practical examples regarding the study of anti-microbial, anti-cancer, and anti-inflammatory NPs, etc. ML algorithms manage a variety of classification and regression problems associated with bioactive NPs, from those that are linear to non-linear and from pure compounds to plant extracts. Inspired by cases reported in the literature and our own experience, a number of key points have been emphasized for reducing modeling errors, including dataset preparation and applicability domain analysis.
Affiliation(s)
- Ruihan Zhang
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Xiaoli Li
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Xingjie Zhang
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Huayan Qin
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
- Weilie Xiao
  - Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China
11. Wu Z, Zhu M, Kang Y, Leung ELH, Lei T, Shen C, Jiang D, Wang Z, Cao D, Hou T. Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets. Brief Bioinform 2020; 22:6032614. [PMID: 33313673 DOI: 10.1093/bib/bbaa321] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 09/17/2020] [Revised: 10/09/2020] [Accepted: 10/19/2020] [Indexed: 12/18/2022] Open
Abstract
Although a wide variety of machine learning (ML) algorithms have been utilized to learn quantitative structure-activity relationships (QSARs), there is no agreed single best algorithm for QSAR learning. Therefore, a comprehensive understanding of the performance characteristics of popular ML algorithms used in QSAR learning is highly desirable. In this study, five linear algorithms [linear function Gaussian process regression (linear-GPR), linear function support vector machine (linear-SVM), partial least squares regression (PLSR), multiple linear regression (MLR) and principal component regression (PCR)], three analogizers [radial basis function support vector machine (rbf-SVM), K-nearest neighbor (KNN) and radial basis function Gaussian process regression (rbf-GPR)], six symbolists [extreme gradient boosting (XGBoost), Cubist, random forest (RF), multiple adaptive regression splines (MARS), gradient boosting machine (GBM), and classification and regression tree (CART)] and two connectionists [principal component analysis artificial neural network (pca-ANN) and deep neural network (DNN)] were employed to learn the regression-based QSAR models for 14 public data sets comprising nine physicochemical properties and five toxicity endpoints. The results show that rbf-SVM, rbf-GPR, XGBoost and DNN generally illustrate better performances than the other algorithms. The overall performances of different algorithms can be ranked from the best to the worst as follows: rbf-SVM > XGBoost > rbf-GPR > Cubist > GBM > DNN > RF > pca-ANN > MARS > linear-GPR ≈ KNN > linear-SVM ≈ PLSR > CART ≈ PCR ≈ MLR. In terms of prediction accuracy and computational efficiency, SVM and XGBoost are recommended to the regression learning for small data sets, and XGBoost is an excellent choice for large data sets. We then investigated the performances of the ensemble models by integrating the predictions of multiple ML algorithms. 
The results illustrate that the ensembles of two or three algorithms in different categories can indeed improve the predictions of the best individual ML algorithms.
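The abstract does not state how the predictions of multiple algorithms were integrated. As one simple possibility, the sketch below averages the predictions of two models element-wise; the function names and all numbers are hypothetical and were picked so that the two models err in opposite directions.

```python
import math

def rmse(predicted, reference):
    """Root mean square error between predictions and reference values."""
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(reference))

def average_ensemble(*prediction_lists):
    """Element-wise mean of the predictions from several models."""
    return [sum(values) / len(values) for values in zip(*prediction_lists)]

# Hypothetical endpoint values and predictions, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
model_a = [1.4, 1.7, 3.5, 3.6]
model_b = [0.7, 2.4, 2.6, 4.5]
ensemble = average_ensemble(model_a, model_b)

for name, preds in (("model A", model_a), ("model B", model_b), ("ensemble", ensemble)):
    print(f"{name}: RMSE = {rmse(preds, observed):.3f}")
```

Because squared error is convex, the mean squared error of an averaged prediction can never exceed the mean of the individual models' mean squared errors. Beating the best single model, as in this contrived example, additionally requires that the models' errors partly cancel, which is one reason combining algorithms from different categories can help.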
Affiliation(s)
- Zhenxing Wu
  - College of Pharmaceutical Sciences, Hangzhou Institute of Innovative Medicine, Zhejiang University, P. R. China
- Minfeng Zhu
  - Xiangya School of Pharmaceutical Sciences, Central South University, P. R. China
- Yu Kang
  - College of Pharmaceutical Sciences, Hangzhou Institute of Innovative Medicine, Zhejiang University, P. R. China
- Elaine Lai-Han Leung
  - State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, P. R. China
- Tailong Lei
  - College of Pharmaceutical Sciences, Hangzhou Institute of Innovative Medicine, Zhejiang University, P. R. China
- Chao Shen
  - College of Pharmaceutical Sciences, Hangzhou Institute of Innovative Medicine, Zhejiang University, P. R. China
- Dejun Jiang
  - College of Pharmaceutical Sciences, Hangzhou Institute of Innovative Medicine, Zhejiang University, P. R. China
- Zhe Wang
  - College of Pharmaceutical Sciences, Hangzhou Institute of Innovative Medicine, Zhejiang University, P. R. China
- Tingjun Hou
  - Peking University, China. He is currently a professor in the College of Pharmaceutical Sciences, Zhejiang University, China
12. Alp Tokat T, Türkmenoğlu B, Güzel Y, Kızılcan DŞ. Investigation of 3D pharmacophore of N-benzyl benzamide molecules of melanogenesis inhibitors using a new descriptor Klopman index: uncertainties in model. J Mol Model 2019; 25:247. [PMID: 31342175 DOI: 10.1007/s00894-019-4120-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/22/2019] [Accepted: 07/03/2019] [Indexed: 12/21/2022]
Abstract
We used a new descriptor called the Klopman index in our software of the "molecular comparative electron topology" (MCET) method to reduce the uncertainty resulting from the descriptors used in QSAR studies. The 3D pharmacophore model (3D-PhaM), which can demonstrate three-dimensional interaction between the ligand and receptor (L-R), is only possible with local reactive descriptors (LRD). The Klopman index, containing both Coulombic and frontier orbital interactions of atoms of the ligand, is a good LRD. Molecular conformers having the best matching atoms with the template conformer can be selected as one of the most suitable spatial structures for interaction with the receptor, and the LRD values of the atoms in this conformer serve to determine 3D-PhaM. The 3D-PhaM of the N-benzyl benzamide derivatives, such as the melanogenesis inhibitor, was determined by ligand-based MCET and confirmed by the structure-based FlexX docking method. For compounds of the training set (42) and the external cross validation test set (6), the Q² (0.862) and R² (0.913) of the statistical parameters were calculated, respectively, and were checked by r_m² (0.85) of the stringent validation.
Affiliation(s)
- Tuğba Alp Tokat
  - Department of Chemistry, Faculty of Science, Erciyes University, 38039, Kayseri, Turkey
- Burçin Türkmenoğlu
  - Department of Chemistry, Faculty of Science, Erciyes University, 38039, Kayseri, Turkey
- Yahya Güzel
  - Department of Chemistry, Faculty of Science, Erciyes University, 38039, Kayseri, Turkey
- Dilek Şeyma Kızılcan
  - Department of Chemistry, Faculty of Science, Erciyes University, 38039, Kayseri, Turkey
13. Reker D, Bernardes GJL, Rodrigues T. Computational advances in combating colloidal aggregation in drug discovery. Nat Chem 2019; 11:402-418. [PMID: 30988417 DOI: 10.1038/s41557-019-0234-9] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 05/15/2018] [Accepted: 02/21/2019] [Indexed: 02/07/2023]
Abstract
Small molecule effectors are essential for drug discovery. Specific molecular recognition, reversible binding and dose-dependency are usually key requirements to ensure utility of a novel chemical entity. However, artefactual frequent-hitter and assay interference compounds may divert lead optimization and screening programmes towards attrition-prone chemical matter. Colloidal aggregates are the prime source of false positive readouts, either through protein sequestration or protein-scaffold mimicry. Nevertheless, assessment of colloidal aggregation remains somewhat overlooked and under-appreciated. In this Review, we discuss the impact of aggregation in drug discovery by analysing select examples from the literature and publicly-available datasets. We also examine and comment on technologies used to experimentally identify these potentially problematic entities. We focus on evidence-based computational filters and machine learning algorithms that may be swiftly deployed to flag chemical matter and mitigate the impact of aggregates in discovery programmes. We highlight the tools that can be used to scrutinize libraries, and identify and eliminate these problematic compounds.
Affiliation(s)
- Daniel Reker
  - Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Division of Gastroenterology, Hepatology and Endoscopy, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - MIT-IBM Watson AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Gonçalo J L Bernardes
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, UK
  - Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal
- Tiago Rodrigues
  - Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal
14. Cortés-Ciriano I, Bender A. Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Prediction Errors for Deep Neural Networks. J Chem Inf Model 2018; 59:1269-1281. [DOI: 10.1021/acs.jcim.8b00542] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 02/05/2023]
Affiliation(s)
- Isidro Cortés-Ciriano
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
15. Raevsky OA, Polianczyk DE, Grigorev VY, Raevskaja OE, Dearden JC. In silico Prediction of Aqueous Solubility: a Comparative Study of Local and Global Predictive Models. Mol Inform 2015; 34:417-30. [PMID: 27490387 DOI: 10.1002/minf.201400144] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 10/23/2014] [Accepted: 03/05/2015] [Indexed: 11/07/2022]
Abstract
32 Quantitative Structure-Property Relationship (QSPR) models were constructed for prediction of aqueous intrinsic solubility of liquid and crystalline chemicals. Data sets contained 1022 liquid and 2615 crystalline compounds. Multiple Linear Regression (MLR), Support Vector Machine (SVM) and Random Forest (RF) methods were used to construct global models, and k-nearest neighbour (kNN), Arithmetic Mean Property (AMP) and Local Regression Property (LoReP) were used to construct local models. A set of the best QSPR models was obtained: for liquid chemicals with RMSE (root mean square error) of prediction in the range 0.50-0.60 log unit; for crystalline chemicals 0.80-0.90 log unit. In the case of global models the large number of descriptors makes mechanistic interpretation difficult. The local models use only one or two descriptors, so that a medicinal chemist working with sets of structurally-related chemicals can readily estimate their solubility. However, construction of stable local models requires the presence of closely related neighbours for each chemical considered. It is probable that a consensus of global and local QSPR models will be the optimal approach for construction of stable predictive QSPR models with mechanistic interpretation.
Affiliation(s)
- Oleg A Raevsky
- Department of Computer-Aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Science, 142432, Russia, Chernogolovka, Severniy proezd 1 phone: +7 496 52 21867.
| | - Daniel E Polianczyk
- Department of Computer-Aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Science, 142432, Russia, Chernogolovka, Severniy proezd 1 phone: +7 496 52 21867
| | - Veniamin Yu Grigorev
- Department of Computer-Aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Science, 142432, Russia, Chernogolovka, Severniy proezd 1 phone: +7 496 52 21867
| | - Olga E Raevskaja
- Department of Computer-Aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Science, 142432, Russia, Chernogolovka, Severniy proezd 1 phone: +7 496 52 21867
| | - John C Dearden
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L3 3AF, UK
| |
|
16
|
Kew W, Mitchell JBO. Greedy and Linear Ensembles of Machine Learning Methods Outperform Single Approaches for QSPR Regression Problems. Mol Inform 2015; 34:634-47. [DOI: 10.1002/minf.201400122] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/05/2014] [Accepted: 01/20/2015] [Indexed: 12/20/2022]
|
17
|
Cortés-Ciriano I, Ain QU, Subramanian V, Lenselink EB, Méndez-Lucio O, IJzerman AP, Wohlfahrt G, Prusis P, Malliavin TE, van Westen GJP, Bender A. Polypharmacology modelling using proteochemometrics (PCM): recent methodological developments, applications to target families, and future prospects. MedChemComm 2015. [DOI: 10.1039/c4md00216d] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Indexed: 11/21/2022]
Abstract
Proteochemometric (PCM) modelling is a computational method to model the bioactivity of multiple ligands against multiple related protein targets simultaneously.
Affiliation(s)
- Isidro Cortés-Ciriano
- Unité de Bioinformatique Structurale
- Institut Pasteur and CNRS UMR 3825
- Structural Biology and Chemistry Department
- 75 724 Paris
- France
| | - Qurrat Ul Ain
- Unilever Centre for Molecular Informatics
- Department of Chemistry
- CB2 1EW Cambridge
- UK
| | | | - Eelke B. Lenselink
- Division of Medicinal Chemistry
- Leiden Academic Centre for Drug Research
- Leiden
- The Netherlands
| | - Oscar Méndez-Lucio
- Unilever Centre for Molecular Informatics
- Department of Chemistry
- CB2 1EW Cambridge
- UK
| | - Adriaan P. IJzerman
- Division of Medicinal Chemistry
- Leiden Academic Centre for Drug Research
- Leiden
- The Netherlands
| | - Gerd Wohlfahrt
- Computer-Aided Drug Design
- Orion Pharma
- FIN-02101 Espoo
- Finland
| | - Peteris Prusis
- Computer-Aided Drug Design
- Orion Pharma
- FIN-02101 Espoo
- Finland
| | - Thérèse E. Malliavin
- Unité de Bioinformatique Structurale
- Institut Pasteur and CNRS UMR 3825
- Structural Biology and Chemistry Department
- 75 724 Paris
- France
| | - Gerard J. P. van Westen
- European Molecular Biology Laboratory
- European Bioinformatics Institute
- Wellcome Trust Genome Campus
- Hinxton
- UK
| | - Andreas Bender
- Unilever Centre for Molecular Informatics
- Department of Chemistry
- CB2 1EW Cambridge
- UK
| |
|
18
|
Cortes-Ciriano I, van Westen GJ, Lenselink EB, Murrell DS, Bender A, Malliavin T. Proteochemometric modeling in a Bayesian framework. J Cheminform 2014; 6:35. [PMID: 25045403 PMCID: PMC4083135 DOI: 10.1186/1758-2946-6-35] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 03/05/2014] [Accepted: 06/18/2014] [Indexed: 11/10/2022] [Open Access]
Abstract
Proteochemometrics (PCM) is an approach to predictive bioactivity modeling that models the relationship between protein and chemical information. Gaussian Processes (GP), based on Bayesian inference, provide the most objective estimation of the uncertainty of the predictions, thus permitting the evaluation of the applicability domain (AD) of the model. Furthermore, the experimental error on bioactivity measurements can be used as input for this probabilistic model. In this study, we apply GP implemented with a panel of kernels to three different (and multispecies) PCM datasets. The first dataset consisted of information from 8 human and rat adenosine receptors with 10,999 small molecule ligands and their binding affinity. The second consisted of the catalytic activity of four dengue virus NS3 proteases on 56 small peptides. Finally, we gathered bioactivity information of small molecule ligands on 91 aminergic GPCRs from 9 different species, leading to a dataset of 24,593 datapoints with a matrix completeness of only 2.43%. GP models trained on these datasets are statistically sound, at the same level of statistical significance as Support Vector Machines (SVM), with R₀² values on the external dataset ranging from 0.68 to 0.92, and RMSEP values close to the experimental error. Furthermore, the best GP models obtained with the normalized polynomial and radial kernels provide confidence intervals for the predictions in agreement with the cumulative Gaussian distribution. GP models were also interpreted on the basis of individual targets and of ligand descriptors. In the dengue dataset, the model interpretation in terms of the amino-acid positions in the tetra-peptide ligands gave biologically meaningful results.
Affiliation(s)
- Isidro Cortes-Ciriano
- Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825; Département de Biologie Structurale et Chimie
| | - Gerard Jp van Westen
- ChEMBL Group, European Molecular Biology Laboratory European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Hinxton, Cambridge, UK
| | - Eelke Bart Lenselink
- Division of Medicinal Chemistry, Leiden Academic Center for Drug Research, Leiden, The Netherlands
| | - Daniel S Murrell
- Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Andreas Bender
- Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Thérèse Malliavin
- Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825; Département de Biologie Structurale et Chimie
| |
|
19
|
Hao M, Li Y, Wang Y, Zhang S. Prediction of P2Y12 antagonists using a novel genetic algorithm-support vector machine coupled approach. Anal Chim Acta 2011; 690:53-63. [DOI: 10.1016/j.aca.2011.02.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 11/11/2010] [Revised: 01/26/2011] [Accepted: 02/01/2011] [Indexed: 12/15/2022]
|
20
|
Hao M, Li Y, Wang Y, Zhang S. A classification study of respiratory syncytial virus (RSV) inhibitors by variable selection with random forest. Int J Mol Sci 2011; 12:1259-80. [PMID: 21541057 PMCID: PMC3083704 DOI: 10.3390/ijms12021259] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 12/13/2010] [Revised: 02/10/2011] [Accepted: 02/11/2011] [Indexed: 12/29/2022] [Open Access]
Abstract
Experimental pEC50 values for 216 selective respiratory syncytial virus (RSV) inhibitors are used to develop classification models as a potential screening tool for a large library of target compounds. A variable selection algorithm coupled with random forests (VS-RF) is used to extract the physicochemical features most relevant to RSV inhibition. Based on the selected small set of descriptors, four other widely used approaches, i.e., support vector machine (SVM), Gaussian process (GP), linear discriminant analysis (LDA) and k nearest neighbors (kNN) routines, are also employed and compared with the VS-RF method in terms of several rigorous evaluation criteria. The results indicate that the VS-RF model is a powerful tool for classification of RSV inhibitors, producing the highest overall accuracy of 94.34% for the external prediction set, significantly outperforming the other four methods, whose average accuracy is 80.66%. The proposed model, with excellent prediction capacity in both internal and external validation, should be important for screening and optimization of potential RSV inhibitors prior to chemical synthesis in drug development.
Affiliation(s)
- Ming Hao
- School of Chemical Engineering, Dalian University of Technology, Dalian, Liaoning 116012, China
| | - Yan Li
- School of Chemical Engineering, Dalian University of Technology, Dalian, Liaoning 116012, China
- Author to whom correspondence should be addressed; Tel.: +86-411-84986062; Fax: +86-411-84986063
| | - Yonghua Wang
- Center of Bioinformatics, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Shuwei Zhang
- School of Chemical Engineering, Dalian University of Technology, Dalian, Liaoning 116012, China
| |
|
21
|
Rathke F, Hansen K, Brefeld U, Müller KR. StructRank: A New Approach for Ligand-Based Virtual Screening. J Chem Inf Model 2010; 51:83-92. [DOI: 10.1021/ci100308f] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 02/05/2023]
Affiliation(s)
- Fabian Rathke
- Department of Machine Learning, University of Technology, Berlin, Germany, Department of Image and Pattern Analysis, University of Heidelberg, Germany, and Yahoo! Research, Avinguda Diagonal 177, 08018 Barcelona, Spain
| | - Katja Hansen
- Department of Machine Learning, University of Technology, Berlin, Germany, Department of Image and Pattern Analysis, University of Heidelberg, Germany, and Yahoo! Research, Avinguda Diagonal 177, 08018 Barcelona, Spain
| | - Ulf Brefeld
- Department of Machine Learning, University of Technology, Berlin, Germany, Department of Image and Pattern Analysis, University of Heidelberg, Germany, and Yahoo! Research, Avinguda Diagonal 177, 08018 Barcelona, Spain
| | - Klaus-Robert Müller
- Department of Machine Learning, University of Technology, Berlin, Germany, Department of Image and Pattern Analysis, University of Heidelberg, Germany, and Yahoo! Research, Avinguda Diagonal 177, 08018 Barcelona, Spain
| |
|
22
|
Cao D, Liang Y, Xu Q, Yun Y, Li H. Toward better QSAR/QSPR modeling: simultaneous outlier detection and variable selection using distribution of model features. J Comput Aided Mol Des 2010; 25:67-80. [DOI: 10.1007/s10822-010-9401-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 01/23/2010] [Accepted: 11/03/2010] [Indexed: 10/18/2022]
|
23
|
Kramer C, Beck B, Clark T. Insolubility classification with accurate prediction probabilities using a MetaClassifier. J Chem Inf Model 2010; 50:404-14. [PMID: 20088498 DOI: 10.1021/ci900377e] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022]
Abstract
Insolubility is a crucial issue in drug design because insoluble compounds are often measured to be inactive although they might be active if they were soluble. We provide and analyze various insolubility classification models based on a recently published data set and compounds measured in-house at Boehringer-Ingelheim. The 2D descriptor sets from pharmacophore fingerprints and MOE and the 3D descriptor sets from ParaSurf and VolSurf were examined in conjunction with support vector machines, Bayesian regularized neural networks, and random forests. We introduce a classifier-fusion strategy, called metaclassifier, which improves upon the best single prediction and at the same time avoids descriptor selection, a potential source of overfitting. The metaclassifier strategy is compared to the simpler fusion strategies of maximum vote and highest probability picking. A prediction accuracy of 72.6% on a three-class model is achieved with the metaclassifier, with nearly perfect separation of soluble and insoluble compounds and predictions as good as our calculated maximum possible agreement with experiment.
Affiliation(s)
- Christian Kramer
- Department of Lead Discovery, Boehringer-Ingelheim Pharma GmbH & Co. KG, Biberach, Germany
| | | | | |
|
24
|
Sakiyama Y. The use of machine learning and nonlinear statistical tools for ADME prediction. Expert Opin Drug Metab Toxicol 2010; 5:149-69. [PMID: 19239395 DOI: 10.1517/17425250902753261] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Indexed: 01/31/2023]
Abstract
Absorption, distribution, metabolism and excretion (ADME)-related failure of drug candidates is a major issue for the pharmaceutical industry today. Prediction of ADME by in silico tools has now become an inevitable paradigm to reduce cost and enhance efficiency in pharmaceutical research. Recently, machine learning and nonlinear statistical tools have been widely applied to predict routine ADME end points. To achieve accurate and reliable predictions, it is a prerequisite to understand the concepts, mechanisms and limitations of these tools. Here, we have devised a small synthetic nonlinear data set to help understand the mechanism of machine learning by 2D visualisation. We applied six new machine learning methods to four different data sets. The methods include the Naive Bayes classifier, classification and regression tree, random forest, Gaussian process, support vector machine and k nearest neighbour. The results demonstrated that ensemble learning and kernel machines displayed greater accuracy of prediction than classical methods irrespective of the data set size. The importance of interaction with the engineering field is also addressed. The results described here provide insights into the mechanism of machine learning, which will enable appropriate usage in the future.
Affiliation(s)
- Yojiro Sakiyama
- Pharmacokinetics Dynamics Metabolism, Pfizer Global Research and Development, Sandwich Laboratories, Kent, UK.
| |
|
25
|
Obrezanova O, Segall MD. Gaussian Processes for Classification: QSAR Modeling of ADMET and Target Activity. J Chem Inf Model 2010; 50:1053-61. [DOI: 10.1021/ci900406x] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Indexed: 12/30/2022]
Affiliation(s)
- Olga Obrezanova
- Optibrium Ltd., 7226 IQ Cambridge, Beach Drive, Cambridge, CB25 9TL, United Kingdom
| | - Matthew D. Segall
- Optibrium Ltd., 7226 IQ Cambridge, Beach Drive, Cambridge, CB25 9TL, United Kingdom
| |
|
26
|
Fechner N, Jahn A, Hinselmann G, Zell A. Estimation of the applicability domain of kernel-based machine learning models for virtual screening. J Cheminform 2010; 2:2. [PMID: 20222949 PMCID: PMC2851576 DOI: 10.1186/1758-2946-2-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 12/15/2009] [Accepted: 03/11/2010] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: The virtual screening of large compound databases is an important application of structure-activity relationship models. Due to the high structural diversity of these data sets, it is impossible for machine learning based QSAR models, which rely on a specific training set, to give reliable results for all compounds. Thus, it is important to consider the subset of the chemical space in which the model is applicable. The approaches to this problem that have been published so far mostly use vectorial descriptor representations to define this domain of applicability of the model. Unfortunately, these cannot be extended easily to structured kernel-based machine learning models. For this reason, we propose three approaches to estimate the domain of applicability of a kernel-based QSAR model.
RESULTS: We evaluated three kernel-based applicability domain estimations using three different structured kernels on three virtual screening tasks. Each experiment consisted of the training of a kernel-based QSAR model using support vector regression and the ranking of a disjoint screening data set according to the predicted activity. For each prediction, the applicability of the model for the respective compound is quantitatively described using a score obtained by an applicability domain formulation. The suitability of the applicability domain estimation is evaluated by comparing the model performance on the subsets of the screening data sets obtained by different thresholds for the applicability scores. This comparison indicates that it is possible to separate the part of the chemical space in which the model gives reliable predictions from the part consisting of structures too dissimilar to the training set to apply the model successfully. A closer inspection reveals that the virtual screening performance of the model is considerably improved if half of the molecules, those with the lowest applicability scores, are omitted from the screening.
CONCLUSION: The proposed applicability domain formulations for kernel-based QSAR models can successfully identify compounds for which no reliable predictions can be expected from the model. The resulting reduction of the search space and the elimination of some of the active compounds should not be considered a drawback, because the results indicate that, in most cases, these omitted ligands would not be found by the model anyway.
Affiliation(s)
- Nikolas Fechner
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany
| | - Andreas Jahn
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany
| | - Georg Hinselmann
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany
| | - Andreas Zell
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany
| |
|
27
|
Rupp M, Schroeter T, Steri R, Zettl H, Proschak E, Hansen K, Rau O, Schwarz O, Müller-Kuhrt L, Schubert-Zsilavecz M, Müller KR, Schneider G. From Machine Learning to Natural Product Derivatives that Selectively Activate Transcription Factor PPARγ. ChemMedChem 2010; 5:191-4. [DOI: 10.1002/cmdc.200900469] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Indexed: 11/08/2022]
|
28
|
Gedeck P, Kramer C, Ertl P. Computational analysis of structure-activity relationships. Progress in Medicinal Chemistry 2010; 49:113-60. [PMID: 20855040 DOI: 10.1016/s0079-6468(10)49004-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Affiliation(s)
- Peter Gedeck
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Forum 1, Novartis Campus, CH-4056 Basel, Switzerland
| | | | | |
|
29
|
The importance of the accuracy of the experimental data for the prediction of solubility. Journal of the Serbian Chemical Society 2010. [DOI: 10.2298/jsc090809022e] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/27/2022]
Abstract
Aqueous solubility is an important factor influencing several aspects of the pharmacokinetic profile of a drug. Numerous publications present different methodologies for the development of reliable computational models for the prediction of solubility from structure. The quality of such models can be significantly affected by the accuracy of the employed experimental solubility data. In this work, the importance of the accuracy of the experimental solubility data used for model training was investigated. Three data sets were used as training sets: Data Set 1, containing solubility data collected from various literature sources using a few criteria (n = 319); Data Set 2, created by substituting 28 values from Data Set 1 with uniformly determined experimental data from one laboratory (n = 319); and Data Set 3, created by adding to Data Set 2 a further 56 compounds whose solubility was also determined under uniform conditions in the same laboratory (n = 375). The selection of the most significant descriptors was performed by the heuristic method, using one-parameter and multi-parameter analysis. The correlations between the most significant descriptors and solubility were established using multi-linear regression analysis (MLR) for all three investigated data sets. Notable differences were observed between the equations corresponding to different data sets, suggesting that models updated with new experimental data need to be additionally optimized. It was successfully shown that the inclusion of uniform experimental data consistently leads to an improvement in the correlation coefficients. These findings contribute to an emerging consensus that improving the reliability of solubility prediction requires data sets that include many diverse compounds whose solubility was measured under standardized conditions.
|
30
|
Segall M, Champness E, Obrezanova O, Leeding C. Beyond Profiling: Using ADMET Models to Guide Decisions. Chem Biodivers 2009; 6:2144-51. [DOI: 10.1002/cbdv.200900148] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Indexed: 11/11/2022]
|
31
|
Kramer C, Heinisch T, Fligge T, Beck B, Clark T. A Consistent Dataset of Kinetic Solubilities for Early-Phase Drug Discovery. ChemMedChem 2009; 4:1529-36. [DOI: 10.1002/cmdc.200900205] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
|
32
|
Hansen K, Mika S, Schroeter T, Sutter A, ter Laak A, Steger-Hartmann T, Heinrich N, Müller KR. Benchmark Data Set for in Silico Prediction of Ames Mutagenicity. J Chem Inf Model 2009; 49:2077-81. [PMID: 19702240 DOI: 10.1021/ci900161g] [Citation(s) in RCA: 223] [Impact Index Per Article: 13.9] [Indexed: 02/08/2023]
Affiliation(s)
- Katja Hansen
- University of Technology, Berlin, Germany, idalab GmbH, Berlin, Germany, and Bayer Schering Pharma AG, Berlin, Germany
| | - Sebastian Mika
- University of Technology, Berlin, Germany, idalab GmbH, Berlin, Germany, and Bayer Schering Pharma AG, Berlin, Germany
| | - Timon Schroeter
- University of Technology, Berlin, Germany, idalab GmbH, Berlin, Germany, and Bayer Schering Pharma AG, Berlin, Germany
| | - Andreas Sutter
- University of Technology, Berlin, Germany, idalab GmbH, Berlin, Germany, and Bayer Schering Pharma AG, Berlin, Germany
| | - Antonius ter Laak
- University of Technology, Berlin, Germany, idalab GmbH, Berlin, Germany, and Bayer Schering Pharma AG, Berlin, Germany
| | - Thomas Steger-Hartmann
- University of Technology, Berlin, Germany, idalab GmbH, Berlin, Germany, and Bayer Schering Pharma AG, Berlin, Germany
| | - Nikolaus Heinrich
- University of Technology, Berlin, Germany, idalab GmbH, Berlin, Germany, and Bayer Schering Pharma AG, Berlin, Germany
| | - Klaus-Robert Müller
- University of Technology, Berlin, Germany, idalab GmbH, Berlin, Germany, and Bayer Schering Pharma AG, Berlin, Germany
| |
|
33
|
Fechner N, Jahn A, Hinselmann G, Zell A. Atomic local neighborhood flexibility incorporation into a structured similarity measure for QSAR. J Chem Inf Model 2009; 49:549-60. [PMID: 19434895 DOI: 10.1021/ci800329r] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]
Abstract
In this work, we introduce a new method to account for geometry in a structural similarity measure by approximating the conformational space of a molecule. Our idea is to break down the molecular conformation into the local conformations of neighbor atoms with respect to core atoms. This local geometry can be implicitly accessed through the trajectories of the neighboring atoms, which arise from rotatable bonds. In our approach, the physicochemical atomic similarity, which can be used in structured similarity measures, is augmented by a local flexibility similarity, which gives a rough estimate of the similarity of the local conformational space. We incorporated this new way of encoding flexibility into the optimal assignment molecular similarity approach, which can be used as a pseudokernel in support vector machines. The impact of the local flexibility was evaluated on several published QSAR data sets. This led to an improvement of the model quality on 9 out of 10 data sets compared to the unmodified optimal assignment kernel.
Affiliation(s)
- Nikolas Fechner
- Center of Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany.
| | | | | | | |
|
34
|
Hansen K, Rathke F, Schroeter T, Rast G, Fox T, Kriegl JM, Mika S. Bias-Correction of Regression Models: A Case Study on hERG Inhibition. J Chem Inf Model 2009; 49:1486-96. [DOI: 10.1021/ci9000794] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 01/13/2023]
Affiliation(s)
- Katja Hansen
- University of Technology, Berlin, Germany, Departments of Drug Discovery Support and Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach a.d. Riss, Germany, and idalab GmbH, Berlin, Germany
| | - Fabian Rathke
- University of Technology, Berlin, Germany, Departments of Drug Discovery Support and Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach a.d. Riss, Germany, and idalab GmbH, Berlin, Germany
| | - Timon Schroeter
- University of Technology, Berlin, Germany, Departments of Drug Discovery Support and Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach a.d. Riss, Germany, and idalab GmbH, Berlin, Germany
| | - Georg Rast
- University of Technology, Berlin, Germany, Departments of Drug Discovery Support and Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach a.d. Riss, Germany, and idalab GmbH, Berlin, Germany
| | - Thomas Fox
- University of Technology, Berlin, Germany, Departments of Drug Discovery Support and Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach a.d. Riss, Germany, and idalab GmbH, Berlin, Germany
| | - Jan M. Kriegl
- University of Technology, Berlin, Germany, Departments of Drug Discovery Support and Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach a.d. Riss, Germany, and idalab GmbH, Berlin, Germany
| | - Sebastian Mika
- University of Technology, Berlin, Germany, Departments of Drug Discovery Support and Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach a.d. Riss, Germany, and idalab GmbH, Berlin, Germany
| |
|
35
|
Gaussian process: an alternative approach for QSAM modeling of peptides. Amino Acids 2009; 38:199-212. [DOI: 10.1007/s00726-008-0228-1] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Received: 08/01/2008] [Accepted: 12/18/2008] [Indexed: 10/21/2022]
|
36
|
Tetko IV, Sushko I, Pandey AK, Zhu H, Tropsha A, Papa E, Öberg T, Todeschini R, Fourches D, Varnek A. Critical Assessment of QSAR Models of Environmental Toxicity against Tetrahymena pyriformis: Focusing on Applicability Domain and Overfitting by Variable Selection. J Chem Inf Model 2008; 48:1733-46. [DOI: 10.1021/ci800151m] [Citation(s) in RCA: 282] [Impact Index Per Article: 16.6] [Indexed: 01/13/2023]
Affiliation(s)
- Igor V. Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Bioinformatics and Systems Biology, Neuherberg D-85764, Germany, Institute of Bioorganic & Petrochemistry, National Ukrainian Academy of Sciences, Kyiv-94 02660, Ukraine, Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products and Carolina Exploratory Center for Cheminformatics Research, School of Pharmacy, CB 7360, University of North Carolina at Chapel Hill, Chapel Hill,
| | - Iurii Sushko
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Bioinformatics and Systems Biology, Neuherberg D-85764, Germany, Institute of Bioorganic & Petrochemistry, National Ukrainian Academy of Sciences, Kyiv-94 02660, Ukraine, Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products and Carolina Exploratory Center for Cheminformatics Research, School of Pharmacy, CB 7360, University of North Carolina at Chapel Hill, Chapel Hill,
| | - Anil Kumar Pandey
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Bioinformatics and Systems Biology, Neuherberg D-85764, Germany, Institute of Bioorganic & Petrochemistry, National Ukrainian Academy of Sciences, Kyiv-94 02660, Ukraine, Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products and Carolina Exploratory Center for Cheminformatics Research, School of Pharmacy, CB 7360, University of North Carolina at Chapel Hill, Chapel Hill,
| | - Hao Zhu, Alexander Tropsha, Ester Papa, Tomas Öberg, Roberto Todeschini, Denis Fourches, Alexandre Varnek
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Bioinformatics and Systems Biology, Neuherberg D-85764, Germany, Institute of Bioorganic & Petrochemistry, National Ukrainian Academy of Sciences, Kyiv-94 02660, Ukraine, Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products and Carolina Exploratory Center for Cheminformatics Research, School of Pharmacy, CB 7360, University of North Carolina at Chapel Hill, Chapel Hill,
|
37
|
Lamanna C, Bellini M, Padova A, Westerberg G, Maccari L. Straightforward Recursive Partitioning Model for Discarding Insoluble Compounds in the Drug Discovery Process. J Med Chem 2008; 51:2891-7. [DOI: 10.1021/jm701407x] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Indexed: 11/29/2022]
Affiliation(s)
| | - Marta Bellini
- Siena Biotech S.p.A., Via Fiorentina 1, 53100, Siena, Italy
| | | | | | - Laura Maccari
- Siena Biotech S.p.A., Via Fiorentina 1, 53100, Siena, Italy
|
38
|
Schwaighofer A, Schroeter T, Mika S, Hansen K, ter Laak A, Lienau P, Reichel A, Heinrich N, Müller KR. A Probabilistic Approach to Classifying Metabolic Stability. J Chem Inf Model 2008; 48:785-96. [DOI: 10.1021/ci700142c] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Affiliation(s)
- Anton Schwaighofer, Timon Schroeter, Sebastian Mika, Katja Hansen, Antonius ter Laak, Philip Lienau, Andreas Reichel, Nikolaus Heinrich, Klaus-Robert Müller
- Fraunhofer FIRST, Kekuléstraße 7, 12489 Berlin, Germany, Technische Universität Berlin, Department of Computer Science, Franklinstraße 28/29, 10587 Berlin, Germany, idalab GmbH, Sophienstraße 24, 10178 Berlin, Germany, and Research Laboratories of Bayer Schering Pharma, Müllerstraße 178, 13342 Berlin, Germany
|
39
|
Kramer C, Beck B, Clark T. In silico prediction of aqueous solubility – classification models. Chem Cent J 2008. [PMCID: PMC4236042 DOI: 10.1186/1752-153x-2-s1-p23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
40
|
Automatic QSAR modeling of ADME properties: blood-brain barrier penetration and aqueous solubility. J Comput Aided Mol Des 2008; 22:431-40. [PMID: 18273554 DOI: 10.1007/s10822-008-9193-8] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Received: 10/19/2007] [Accepted: 01/30/2008] [Indexed: 10/22/2022]
Abstract
In this article, we present an automatic model generation process for building QSAR models using Gaussian Processes, a powerful machine learning modeling method. We describe the stages of the process that ensure models are built and validated within a rigorous framework: descriptor calculation, splitting data into training, validation and test sets, descriptor filtering, application of modeling techniques and selection of the best model. We apply this automatic process to data sets of blood-brain barrier penetration and aqueous solubility and compare the resulting automatically generated models with 'manually' built models using external test sets. The results demonstrate the effectiveness of the automatic model generation process for two types of data sets commonly encountered in building ADME QSAR models, a small set of in vivo data and a large set of physico-chemical data.
|
41
|
Schroeter TS, Schwaighofer A, Mika S, Ter Laak A, Suelzle D, Ganzer U, Heinrich N, Müller KR. Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules. J Comput Aided Mol Des 2007; 21:651-64. [DOI: 10.1007/s10822-007-9160-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 03/12/2007] [Accepted: 06/11/2007] [Indexed: 11/29/2022]
|
42
|
Schroeter TS, Schwaighofer A, Mika S, Ter Laak A, Suelzle D, Ganzer U, Heinrich N, Müller KR. Predicting Lipophilicity of Drug-Discovery Molecules using Gaussian Process Models. ChemMedChem 2007; 2:1265-7. [PMID: 17576646 DOI: 10.1002/cmdc.200700041] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Affiliation(s)
- Timon S Schroeter
- Intelligent Data Analysis Group, Fraunhofer FIRST, Kekulestrasse 7, 12489 Berlin, Germany.
|
43
|
Johnson SR, Chen XQ, Murphy D, Gudmundsson O. A Computational Model for the Prediction of Aqueous Solubility That Includes Crystal Packing, Intrinsic Solubility, and Ionization Effects. Mol Pharm 2007; 4:513-23. [PMID: 17539661 DOI: 10.1021/mp070030+] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
The optimization of aqueous solubility is an important step along the route to bringing a new therapeutic to market. We describe the development of an empirical computational model to rank the pH-dependent aqueous solubility of drug candidates. The model consists of three core components to describe aqueous solubility. The first is a multivariate QSAR model for the prediction of the intrinsic solubility of the neutral solute. The second facet of the approach is the consideration of ionization using a predicted pKa and the Henderson-Hasselbalch equation. The third aspect of the model is a novel method for assessing the effects of crystal packing on solubility through a series of short molecular dynamics simulations of an actual or hypothetical small molecule crystal structure at escalating temperatures. The model also includes a Monte Carlo error function that considers the variability of each of the underlying components of the model to estimate the 90% confidence interval of estimation.
|
44
|
Schroeter T, Schwaighofer A, Mika S, Laak AT, Suelzle D, Ganzer U, Heinrich N, Müller KR. Machine Learning Models for Lipophilicity and Their Domain of Applicability. Mol Pharm 2007; 4:524-38. [PMID: 17637064 DOI: 10.1021/mp0700413] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022]
Abstract
Unfavorable lipophilicity and water solubility cause many drug failures; therefore these properties have to be taken into account early on in lead discovery. Commercial tools for predicting lipophilicity usually have been trained on small and neutral molecules, and are thus often unable to accurately predict in-house data. Using a modern Bayesian machine learning algorithm (a Gaussian process model), this study constructs a log D7 model based on 14,556 drug discovery compounds of Bayer Schering Pharma. Performance is compared with support vector machines, decision trees, ridge regression, and four commercial tools. In a blind test on 7013 new measurements from the last months (including compounds from new projects) 81% were predicted correctly within 1 log unit, compared to only 44% achieved by commercial software. Additional evaluations using public data are presented. We consider error bars for each method (model based error bars, ensemble based, and distance based approaches), and investigate how well they quantify the domain of applicability of each model.
|
45
|
Obrezanova O, Csanyi G, Gola JMR, Segall MD. Gaussian Processes: A Method for Automatic QSAR Modeling of ADME Properties. J Chem Inf Model 2007; 47:1847-57. [PMID: 17602549 DOI: 10.1021/ci7000633] [Citation(s) in RCA: 138] [Impact Index Per Article: 7.7] [Indexed: 11/28/2022]
Abstract
In this article, we discuss the application of the Gaussian Process method for the prediction of absorption, distribution, metabolism, and excretion (ADME) properties. On the basis of a Bayesian probabilistic approach, the method is widely used in the field of machine learning but has rarely been applied in quantitative structure-activity relationship and ADME modeling. The method is suitable for modeling nonlinear relationships, does not require subjective determination of the model parameters, works for a large number of descriptors, and is inherently resistant to overtraining. The performance of Gaussian Processes compares well with and often exceeds that of artificial neural networks. Due to these features, the Gaussian Processes technique is eminently suitable for automatic model generation, one of the demands of modern drug discovery. Here, we describe the basic concept of the method in the context of regression problems and illustrate its application to the modeling of several ADME properties: blood-brain barrier, hERG inhibition, and aqueous solubility at pH 7.4. We also compare Gaussian Processes with other modeling techniques.
Affiliation(s)
- Olga Obrezanova
- BioFocus DPI, 127 Cambridge Science Park, Milton Road, Cambridge, CB4 0GD, United Kingdom.
|
46
|
Chapter 29 Computational Models for ADME. ANNUAL REPORTS IN MEDICINAL CHEMISTRY 2007. [DOI: 10.1016/s0065-7743(07)42029-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
|