1
Dutschmann TM, Schlenker V, Baumann K. Chemoinformatic regression methods and their applicability domain. Mol Inform 2024; 43:e202400018. [PMID: 38803302 DOI: 10.1002/minf.202400018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 03/24/2024] [Accepted: 03/25/2024] [Indexed: 05/29/2024]
Abstract
The growing interest in chemoinformatic model uncertainty calls for a summary of the most widely used regression techniques and how to estimate their reliability. Regression models learn a mapping from the space of explanatory variables to the space of continuous output values. Among other limitations, the predictive performance of the model is restricted by the training data used for model fitting. Identification of unusual objects by outlier detection methods can improve model performance. Additionally, proper model evaluation necessitates defining the limitations of the model, often called the applicability domain. Comparable to certain classifiers, some regression techniques come with built-in methods or augmentations to quantify their (un)certainty, while others rely on generic procedures. The theoretical background of their working principles and how to deduce specific and general definitions for their domain of applicability shall be explained.
Affiliation(s)
- Thomas-Martin Dutschmann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Valerie Schlenker
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Knut Baumann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
2
Zhu T, Tao C. Prediction models with multiple machine learning algorithms for POPs: The calculation of PDMS-air partition coefficient from molecular descriptor. Journal of Hazardous Materials 2022; 423:127037. [PMID: 34530267 DOI: 10.1016/j.jhazmat.2021.127037] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/16/2021] [Revised: 08/21/2021] [Accepted: 08/23/2021] [Indexed: 06/13/2023]
Abstract
The polydimethylsiloxane-air partition coefficient (K_PDMS-air) is a key parameter for passive sampling to measure POPs concentrations. In this study, 13 QSPR models were developed to predict K_PDMS-air, with two descriptor selection methods (MLR and RF) and seven algorithms (MLR, LASSO, ANN, SVM, kNN, RF and GBDT). All models were based on a data set of 244 POPs from 13 different categories. Model evaluation parameters calculated from the training and test sets were used for internal and external verification. Notably, R²_adj, Q²_BOOT and Q²_ext are 0.995, 0.980 and 0.951, respectively, for the GBDT model, showing remarkable superiority in fitting, robustness and predictability compared with the other models. Mechanistic interpretation revealed that molecular size, branching and bond types were the main internal factors affecting the partition process. Unlike existing QSPR models based on a single category of compounds, the models developed herein cover multiple classes of compounds, so their application domain is more comprehensive. Therefore, the obtained models can fill the data gap of missing experimental K_PDMS-air values for compounds in the application range, and help researchers better understand the distribution behavior of POPs from the perspective of molecular structure.
Affiliation(s)
- Tengyi Zhu
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China
- Cuicui Tao
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China
3
Kuz’min V, Artemenko A, Ognichenko L, Hromov A, Kosinskaya A, Stelmakh S, Sessions ZL, Muratov EN. Simplex representation of molecular structure as universal QSAR/QSPR tool. Struct Chem 2021; 32:1365-1392. [PMID: 34177203 PMCID: PMC8218296 DOI: 10.1007/s11224-021-01793-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/15/2021] [Accepted: 05/07/2021] [Indexed: 10/24/2022]
Abstract
We review the development and application of the Simplex approach for the solution of various QSAR/QSPR problems. The general concept of the simplex method and its varieties are described. The advantages of utilizing this methodology, especially for the interpretation of QSAR/QSPR models, are presented in comparison to other fragmentary methods of molecular structure representation. The utility of SiRMS is demonstrated not only in the standard QSAR/QSPR applications, but also for mixtures, polymers, materials, and other complex systems. In addition to many different types of biological activity (antiviral, antimicrobial, antitumor, psychotropic, analgesic, etc.), toxicity and bioavailability, the review examines the simulation of important properties, such as water solubility, lipophilicity, as well as luminescence, and thermodynamic properties (melting and boiling temperatures, critical parameters, etc.). This review focuses on the stereochemical description of molecules within the simplex approach and details the possibilities of universal molecular stereo-analysis and stereochemical configuration description, along with stereo-isomerization mechanism and molecular fragment "topography" identification.
Affiliation(s)
- Victor Kuz’min
  - Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
- Anatoly Artemenko
  - Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
- Luidmyla Ognichenko
  - Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
- Alexander Hromov
  - Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
- Anna Kosinskaya
  - Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
  - Department of Medical Chemistry, Odessa National Medical University, Odessa, 65082 Ukraine
- Sergij Stelmakh
  - Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
- Zoe L. Sessions
  - UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599 USA
- Eugene N. Muratov
  - UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599 USA
  - Department of Pharmaceutical Sciences, Federal University of Paraiba, Joao Pessoa, PB 58059 Brazil
4
Prasad S, Brooks BR. A deep learning approach for the blind logP prediction in SAMPL6 challenge. J Comput Aided Mol Des 2020; 34:535-542. [PMID: 32002779 PMCID: PMC8689685 DOI: 10.1007/s10822-020-00292-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/16/2019] [Accepted: 01/17/2020] [Indexed: 12/14/2022]
Abstract
The water-octanol partition coefficient serves as a measure of the lipophilicity of a molecule and is important in the field of drug discovery. A novel method for computational prediction of the logarithm of the partition coefficient (logP) has been developed using molecular fingerprints and a deep neural network. The machine learning model was trained on a dataset of 12,000 molecules and tested on 2000 molecules. In this article, we present our results for the blind prediction of logP for the SAMPL6 challenge. While the best submission achieved an RMSE of 0.41 logP units, our submission had an RMSE of 0.61 logP units. Overall, we ranked in the top quarter of the 92 submissions that were made. Our results show that the deep learning model can be used as a fast, accurate and robust method for high-throughput prediction of logP of small molecules.
Affiliation(s)
- Samarjeet Prasad
  - Biophysics and Biophysical Chemistry, The Johns Hopkins University, School of Medicine, Baltimore, MD, 21205, USA
  - Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20814, USA
- Bernard R Brooks
  - Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20814, USA
5
Klimenko K, Kuz'min V, Ognichenko L, Gorb L, Shukla M, Vinas N, Perkins E, Polishchuk P, Artemenko A, Leszczynski J. Novel enhanced applications of QSPR models: Temperature dependence of aqueous solubility. J Comput Chem 2016; 37:2045-51. [PMID: 27338156 DOI: 10.1002/jcc.24424] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 01/09/2016] [Revised: 04/22/2016] [Accepted: 05/17/2016] [Indexed: 11/09/2022]
Abstract
A model to predict aqueous solubility at different temperatures is proposed, based on quantitative structure-property relationship (QSPR) methodology. The prediction consists of two steps. The first predicts the value of the parameter k in the linear equation lg S_w = kT + c, where S_w is the solubility and T is the temperature. The second step uses the Random Forest technique to create a high-efficiency QSPR model. The performance of the model is assessed using cross-validation and external test set prediction. The predictive capacity of the developed model is compared with the COSMO-RS approximation, which has quantum chemical and thermodynamic foundations. The comparison shows slightly better predictive ability for the QSPR model presented in this publication. © 2016 Wiley Periodicals, Inc.
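As an illustration of the linear relation quoted in this abstract, the sketch below fits k and c for a single compound by ordinary least squares and then evaluates the line at another temperature. It is only a minimal sketch of the equation itself, not the authors' QSPR or Random Forest procedure; the function names `fit_line` and `predict_lg_sw` are hypothetical, and the temperature and solubility values are invented for demonstration.

```python
def fit_line(temps, lg_sw):
    """Least-squares fit of lg S_w = k*T + c; returns (k, c)."""
    n = len(temps)
    mean_t = sum(temps) / n
    mean_y = sum(lg_sw) / n
    # Sum of squared deviations of T and cross-deviations of T and lg S_w
    sxx = sum((t - mean_t) ** 2 for t in temps)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(temps, lg_sw))
    k = sxy / sxx              # slope: temperature coefficient
    c = mean_y - k * mean_t    # intercept
    return k, c


def predict_lg_sw(k, c, temp):
    """Evaluate the fitted line at temperature `temp` (same units as the fit)."""
    return k * temp + c


# Invented example data for one compound (assumption: T in kelvin)
temps = [283.15, 293.15, 303.15, 313.15]
lg_sw = [-2.30, -2.20, -2.10, -2.00]

k, c = fit_line(temps, lg_sw)
print(f"k = {k:.4f}, c = {c:.4f}")
print(f"lg S_w at 298.15 K = {predict_lg_sw(k, c, 298.15):.3f}")
```

In the paper's setting, k would be predicted from molecular structure rather than fitted to measured points; the fit here only shows what the two parameters mean.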
Affiliation(s)
- Kyrylo Klimenko
  - Department of Molecular Structure and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa, 65080, Ukraine
  - Laboratoire de Chemoinformatique (UMR 7140 CNRS/UniStra), Université de Strasbourg, 1, rue B. Pascal, Strasbourg, 67000, France
- Victor Kuz'min
  - Department of Molecular Structure and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa, 65080, Ukraine
- Liudmila Ognichenko
  - Department of Molecular Structure and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa, 65080, Ukraine
- Manoj Shukla
  - US Army Engineer Research and Development Center, Vicksburg, Mississippi, 39180
- Natalia Vinas
  - US Army Engineer Research and Development Center, Vicksburg, Mississippi, 39180
- Edward Perkins
  - US Army Engineer Research and Development Center, Vicksburg, Mississippi, 39180
- Pavel Polishchuk
  - Institute of Molecular and Translational Medicine, Palacky University Olomouc, Hnevotínská 1333/5, Olomouc, 779 00, Czech Republic
- Anatoly Artemenko
  - Department of Molecular Structure and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa, 65080, Ukraine
- Jerzy Leszczynski
  - Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Jackson State University, Jackson, Mississippi, 39217
6
Computational assessment of environmental hazards of nitroaromatic compounds: influence of the type and position of aromatic ring substituents on toxicity. Struct Chem 2015. [DOI: 10.1007/s11224-015-0715-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/22/2022]
7
Polishchuk PG, Samoylenko GV, Khristova TM, Krysko OL, Kabanova TA, Kabanov VM, Kornylov AY, Klimchuk O, Langer T, Andronati SA, Kuz'min VE, Krysko AA, Varnek A. Design, Virtual Screening, and Synthesis of Antagonists of αIIbβ3 as Antiplatelet Agents. J Med Chem 2015; 58:7681-94. [PMID: 26367138 DOI: 10.1021/acs.jmedchem.5b00865] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 01/16/2023]
Abstract
This article describes design, virtual screening, synthesis, and biological tests of novel αIIbβ3 antagonists, which inhibit platelet aggregation. Two types of αIIbβ3 antagonists were developed: those binding either closed or open form of the protein. At the first step, available experimental data were used to build QSAR models and ligand- and structure-based pharmacophore models and to select the most appropriate tool for ligand-to-protein docking. Virtual screening of publicly available databases (BioinfoDB, ZINC, Enamine data sets) with developed models resulted in no hits. Therefore, small focused libraries for two types of ligands were prepared on the basis of pharmacophore models. Their screening resulted in four potential ligands for open form of αIIbβ3 and four ligands for its closed form followed by their synthesis and in vitro tests. Experimental measurements of affinity for αIIbβ3 and ability to inhibit ADP-induced platelet aggregation (IC50) showed that two designed ligands for the open form 4c and 4d (IC50 = 6.2 nM and 25 nM, respectively) and one for the closed form 12b (IC50 = 11 nM) were more potent than commercial antithrombotic Tirofiban (IC50 = 32 nM).
Affiliation(s)
- Pavel G Polishchuk
  - A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, Odessa 65080, Ukraine
- Georgiy V Samoylenko
  - A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, Odessa 65080, Ukraine
- Tetiana M Khristova
  - A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, Odessa 65080, Ukraine
  - Laboratory of Chemoinformatics (UMR 7140 CNRS/UniStra), University of Strasbourg, 1, rue B. Pascal, Strasbourg 67000, France
- Olga L Krysko
  - A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, Odessa 65080, Ukraine
- Tatyana A Kabanova
  - A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, Odessa 65080, Ukraine
- Vladimir M Kabanov
  - A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, Odessa 65080, Ukraine
- Alexander Yu Kornylov
  - A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, Odessa 65080, Ukraine
- Olga Klimchuk
  - Laboratory of Chemoinformatics (UMR 7140 CNRS/UniStra), University of Strasbourg, 1, rue B. Pascal, Strasbourg 67000, France
- Thierry Langer
  - Department of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Althanstraße 14, 1090 Vienna, Austria
- Sergei A Andronati
  - A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, Odessa 65080, Ukraine
- Victor E Kuz'min
  - A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, Odessa 65080, Ukraine
- Andrei A Krysko
  - A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, Odessa 65080, Ukraine
- Alexandre Varnek
  - Laboratory of Chemoinformatics (UMR 7140 CNRS/UniStra), University of Strasbourg, 1, rue B. Pascal, Strasbourg 67000, France
8
Yilmaz H, Sizochenko N, Rasulev B, Toropov A, Guzel Y, Kuz'min V, Leszczynska D, Leszczynski J. Amino substituted nitrogen heterocycle ureas as kinase insert domain containing receptor (KDR) inhibitors: Performance of structure–activity relationship approaches. J Food Drug Anal 2015; 23:168-175. [PMID: 28911371 PMCID: PMC9351780 DOI: 10.1016/j.jfda.2015.03.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022] Open
Abstract
A quantitative structure–activity relationship (QSAR) study was performed on a set of amino-substituted nitrogen heterocyclic urea derivatives. Two novel approaches were applied: (1) the simplified molecular input-line entry systems (SMILES) based optimal descriptors approach; and (2) the fragment-based simplex representation of molecular structure (SiRMS) approach. Comparison with the classic scheme of building up the model and balance of correlation (BC) for optimal descriptors approach shows that the BC scheme provides more robust predictions than the classic scheme for the considered pIC50 of the heterocyclic urea derivatives. Comparison of the SMILES-based optimal descriptors and SiRMS approaches has confirmed good performance of both techniques in prediction of kinase insert domain containing receptor (KDR) inhibitory activity, expressed as a logarithm of inhibitory concentration (pIC50) of studied compounds.
Affiliation(s)
- Hayriye Yilmaz
  - Kayseri Vocational School, Biomedical Devices and Technologies, Erciyes University, 38039, Kayseri, Turkey
  - Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, Jackson, MS, 39217, USA
- Natalia Sizochenko
  - Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, Jackson, MS, 39217, USA
  - Odessa I.I. Mechnikov National University, Department of Chemistry, Dvoryanskaya Street, 2, 65082, Odessa, Ukraine
- Bakhtiyor Rasulev
  - Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, Jackson, MS, 39217, USA
- Andrey Toropov
  - Laboratory of Environmental Chemistry and Toxicology, IRCCS-Istituto di Ricerche Farmacologiche Mario Negri, 20156, Via La Masa 19, Milano, Italy
- Yahya Guzel
  - Department of Chemistry, Faculty of Science, Erciyes University, 38039, Kayseri, Turkey
- Viktor Kuz'min
  - Odessa I.I. Mechnikov National University, Department of Chemistry, Dvoryanskaya Street, 2, 65082, Odessa, Ukraine
- Danuta Leszczynska
  - Department of Civil and Environmental Engineering, Jackson State University, Jackson, MS, 39217, USA
- Jerzy Leszczynski
  - Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, Jackson, MS, 39217, USA
9
Kolumbin O, Ognichenko L, Artemenko A, Polischuk P, Kulinsky M, Muratov E, Kuz’min V, Bobeica V. Nonexperimental Screening of the Water Solubility, Lipophilicity, Bioavailability, Mutagenicity and Toxicity of Various Pesticides with QSAR Models Aid. Chemistry Journal of Moldova 2013. [DOI: 10.19261/cjm.2013.08(1).12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open