1
Mora JR, Marquez EA, Pérez-Pérez N, Contreras-Torres E, Perez-Castillo Y, Agüero-Chapin G, Martinez-Rios F, Marrero-Ponce Y, Barigye SJ. Rethinking the applicability domain analysis in QSAR models. J Comput Aided Mol Des 2024; 38:9. [PMID: 38351144 DOI: 10.1007/s10822-024-00550-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Accepted: 02/05/2024] [Indexed: 02/16/2024]
Abstract
Notwithstanding the wide adoption of the OECD principles (or best practices) for QSAR modeling, disparities between in silico predictions and experimental results are frequent, suggesting that model predictions are often too optimistic. Of these OECD principles, applicability domain (AD) estimation has been recognized in several reports in the literature as one of the most challenging, implying that the reliability measures attached to model predictions are often themselves unreliable. Applying tree-based error analysis workflows to 5 QSAR models reported in the literature and available in the QsarDB repository, i.e., androgen receptor bioactivity (agonists, antagonists, and binders) and membrane permeability (highest membrane permeability and intrinsic permeability), we demonstrate that predictions erroneously tagged as reliable (AD prediction errors) overwhelmingly correspond to instances in subspaces (cohorts) with the highest prediction error rates, highlighting the inhomogeneity of the AD space. In this sense, we call for more stringent AD analysis guidelines that require the incorporation of model error analysis schemes, to provide critical insight into the reliability of the underlying AD algorithms. Additionally, any selected AD method should be rigorously validated to demonstrate its suitability for the model space over which it is applied. These steps will ultimately contribute to more accurate estimations of the reliability of model predictions. Finally, error analysis may also be useful in "rational" model refinement, in that data expansion efforts and model retraining are focused on cohorts with the highest error rates.
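To make the error-analysis idea concrete, the sketch below locates a high-error cohort with a depth-one error tree over a single descriptor. It is a minimal illustration under stated assumptions, not the authors' workflow: `best_error_split`, the descriptor values, and the error flags are hypothetical, and a practical analysis would grow a deeper tree over many descriptors.

```python
from typing import List, Tuple


def error_rate(flags: List[bool]) -> float:
    """Fraction of instances in a cohort whose prediction was wrong."""
    return sum(flags) / len(flags) if flags else 0.0


def best_error_split(x: List[float], wrong: List[bool]) -> Tuple[float, float, float]:
    """Depth-1 'error tree' on one descriptor (hypothetical helper).

    Tries every midpoint between sorted distinct descriptor values and
    returns (threshold, error rate at or below it, error rate above it)
    for the split whose two cohorts differ most in error rate.
    """
    values = sorted(set(x))
    if len(values) < 2:
        raise ValueError("need at least two distinct descriptor values")
    best, best_gap = (values[0], 0.0, 0.0), -1.0
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [w for xi, w in zip(x, wrong) if xi <= t]
        right = [w for xi, w in zip(x, wrong) if xi > t]
        gap = abs(error_rate(left) - error_rate(right))
        if gap > best_gap:
            best, best_gap = (t, error_rate(left), error_rate(right)), gap
    return best


# Toy data: one descriptor value per compound and whether the model erred on it.
descriptor = [1.0, 2.0, 3.0, 4.0, 8.0, 9.0, 10.0, 11.0]
is_wrong = [False, False, False, False, True, True, False, True]

print(best_error_split(descriptor, is_wrong))  # (6.0, 0.0, 0.75)
```

Here the compounds above the threshold form a cohort with a 75% error rate against 0% below it; if an AD method had tagged those compounds as reliable, that would be the kind of AD prediction error the abstract describes.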
Affiliation(s)
- Jose R Mora
  - Departamento de Ingeniería Química, Universidad San Francisco de Quito (USFQ), Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y Vía Interoceánica, Quito, 170901, Ecuador
- Edgar A Marquez
  - Grupo de Investigaciones en Química y Biología, Departamento de Química y Biología, Facultad de Ciencias Básicas, Universidad del Norte, Carrera 51B, Km 5, vía Puerto Colombia, Barranquilla, 081007, Colombia
  - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Cátedras Conacyt, Ensenada, Baja California, México
- Noel Pérez-Pérez
  - Colegio de Ciencias e Ingenierías "El Politécnico", Universidad San Francisco de Quito (USFQ), Quito, Ecuador
- Ernesto Contreras-Torres
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito, Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17, Quito, 1200-841, Ecuador
- Yunierkis Perez-Castillo
  - Bio-Chemoinformatics Research Group, Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito, 170504, Ecuador
- Guillermin Agüero-Chapin
  - CIIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n, Porto, 4450-208, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, Porto, 4169-007, Portugal
- Felix Martinez-Rios
  - Facultad de Ingeniería, Universidad Panamericana, CDMX, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México, 03920, México
- Yovani Marrero-Ponce
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito, Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17, Quito, 1200-841, Ecuador
  - Facultad de Ingeniería, Universidad Panamericana, CDMX, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México, 03920, México
  - Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador
- Stephen J Barigye
  - Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid (UAM), Madrid, 28049, Spain

2
Kundu S. A mathematically rigorous algorithm to define, compute and assess relevance of the probable dissociation constants in characterizing a biochemical network. Sci Rep 2024; 14:3507. [PMID: 38347039 PMCID: PMC10861591 DOI: 10.1038/s41598-024-53231-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 01/30/2024] [Indexed: 02/15/2024] [Open access]
Abstract
Metabolism results from enzymatic and non-enzymatic interactions of several molecules, is easily parameterized with the dissociation constant, and occurs via biochemical networks. The dissociation constant is an empirically determined parameter and cannot be used directly to investigate in silico models of biochemical networks. Here, we develop and present an algorithm to define, compute and assess the relevance of the probable dissociation constant for every reaction of a biochemical network. The reactants and reactions of this network are modelled by a stoichiometry number matrix. The algorithm computes the null space and then serially generates subspaces by combinatorially summing the spanning vectors that are non-trivial and unique. This is done until the terms of each row either monotonically diverge or form an alternating sequence whose terms can be partitioned into subsets with almost the same number of oppositely signed terms. For a selected null space-generated subspace, the algorithm utilizes several statistical and mathematical descriptors to select and bin terms from each row into distinct outcome-specific subsets. The terms of each subset are summed, mapped to the real-valued open interval [Formula: see text] and used to populate a reaction-specific outcome vector. The p1-norm for this vector is then the probable dissociation constant for this reaction. These steps are continued until every reaction of a modelled network is unambiguously annotated. The assertions presented are complemented by computational studies of a biochemical network for aerobic glycolysis. The fundamental premise of this work is that every row of a null space-generated subspace is a valid reaction and can, therefore, be modelled as a reaction-specific sequence vector with a dimension that corresponds to the cardinality of the subspace after excluding all trivial and redundant vectors.
A major finding of this study is that the row-wise sum or the sum of the terms contained in each reaction-specific sequence vector is mapped unambiguously to a positive real number. This means that the probable dissociation constants, for all reactions, can be directly computed from the stoichiometry number matrix and are suitable indicators of outcome for every reaction of the modelled biochemical network. Additionally, we find that the unambiguous annotation for a biochemical network will require a minimum number of iterations and will determine computational complexity.
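The opening steps of the algorithm, computing the null space of the stoichiometry number matrix and combinatorially summing its spanning vectors, can be sketched with exact rational arithmetic. This assumes rows are reactants and columns are reactions, so each null-space vector balances every reactant; `subset_sums` is one literal reading of "combinatorially summing the spanning vectors that are non-trivial and unique", and the toy network is hypothetical. The later binning, mapping to the open interval, and norm computation are not reproduced.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Tuple


def null_space(s: List[List[int]]) -> List[List[Fraction]]:
    """Basis of {v : S v = 0}, computed by exact reduced row echelon form."""
    a = [[Fraction(x) for x in row] for row in s]
    m, n = len(a), len(a[0])
    pivots: List[int] = []
    r = 0
    for c in range(n):
        if r == m:
            break
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, m) if a[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: column c is a free variable
        a[r], a[p] = a[p], a[r]
        piv = a[r][c]
        a[r] = [x / piv for x in a[r]]
        # Eliminate column c from every other row.
        for i in range(m):
            if i != r and a[i][c] != 0:
                f = a[i][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        # Set one free variable to 1 and solve the pivot variables.
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, pc in enumerate(pivots):
            v[pc] = -a[row][free]
        basis.append(v)
    return basis


def subset_sums(basis: List[List[Fraction]]) -> List[Tuple[Fraction, ...]]:
    """Sums of every non-empty subset of basis vectors, dropping zero and duplicate vectors."""
    seen, out = set(), []
    for k in range(1, len(basis) + 1):
        for combo in combinations(basis, k):
            v = tuple(sum(col) for col in zip(*combo))
            if any(v) and v not in seen:
                seen.add(v)
                out.append(v)
    return out


# Hypothetical toy network. Rows: metabolites A, B, C.
# Columns: R1 (-> A), R2 (A -> B), R3 (B -> C), R4 (C ->), R5 (A -> C).
S = [[1, -1, 0, 0, -1],
     [0, 1, -1, 0, 0],
     [0, 0, 1, -1, 1]]

basis = null_space(S)
print([[int(x) for x in v] for v in basis])  # [[1, 1, 1, 1, 0], [0, -1, -1, 0, 1]]
print([[int(x) for x in v] for v in subset_sums(basis)])
# [[1, 1, 1, 1, 0], [0, -1, -1, 0, 1], [1, 0, 0, 1, 1]]
```

Reading the same position across all generated vectors gives the row for one reaction, which corresponds to the per-reaction sequence the abstract goes on to analyse.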
Affiliation(s)
- Siddhartha Kundu
  - Department of Biochemistry, All India Institute of Medical Sciences, Ansari Nagar, New Delhi, 110029, India

3
Martinez-Mayorga K, Rosas-Jiménez JG, Gonzalez-Ponce K, López-López E, Neme A, Medina-Franco JL. The pursuit of accurate predictive models of the bioactivity of small molecules. Chem Sci 2024; 15:1938-1952. [PMID: 38332817 PMCID: PMC10848664 DOI: 10.1039/d3sc05534e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 01/09/2024] [Indexed: 02/10/2024] [Open access]
Abstract
Property prediction is a key interest in chemistry. For several decades there has been a continued and incremental development of mathematical models to predict properties. As more data are generated and accumulated, there seem to be more opportunities to develop models with increased accuracy. The same is true when one considers the large developments in machine and deep learning models. However, alongside these opportunities, issues and challenges remain, and with more data new challenges emerge, such as the quality, quantity, and reliability of the data, and model reproducibility. Herein, we discuss the status of the accuracy of predictive models and present the authors' perspective on the direction of the field, emphasizing good practices. We focus on predictive models of bioactive properties of small molecules relevant to drug discovery, agrochemistry, food chemistry, natural product research, and related fields.
Affiliation(s)
- Karina Martinez-Mayorga
  - Institute of Chemistry, Merida Unit, National Autonomous University of Mexico, Merida-Tetiz Highway, Km. 4.5, Ucu, Yucatan, Mexico
  - Institute for Applied Mathematics and Systems, Merida Research Unit, National Autonomous University of Mexico, Sierra Papacal, Merida, Yucatan, Mexico
- José G Rosas-Jiménez
  - Department of Theoretical Biophysics, IMPRS on Cellular Biophysics, Max-von-Laue Strasse 3, Frankfurt am Main, 60438, Germany
- Karla Gonzalez-Ponce
  - Institute of Chemistry, Merida Unit, National Autonomous University of Mexico, Merida-Tetiz Highway, Km. 4.5, Ucu, Yucatan, Mexico
- Edgar López-López
  - Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City, 07000, Mexico
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City, 04510, Mexico
- Antonio Neme
  - Institute for Applied Mathematics and Systems, Merida Research Unit, National Autonomous University of Mexico, Sierra Papacal, Merida, Yucatan, Mexico
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City, 04510, Mexico

4
Kotli M, Piir G, Maran U. Pesticide effect on earthworm lethality via interpretable machine learning. J Hazard Mater 2024; 461:132577. [PMID: 37793249 DOI: 10.1016/j.jhazmat.2023.132577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 09/15/2023] [Accepted: 09/16/2023] [Indexed: 10/06/2023]
Abstract
Earthworms are among the most important animals (invertebrates) for soil health. Many chemical substances released into nature for agricultural development, such as pesticides, may have unwanted effects on these organisms. However, it is essential first to understand the extent of the impact of chemicals on soil health and then to make the proper decisions for regulatory or commercial purposes. We hypothesize that there is an expressible quantitative structure-activity relationship (QSAR) between the structure of pesticide compounds and their acute toxicity to the earthworm species Eisenia fetida. Describing this relationship allows for a better assessment of the impact of chemicals on this earthworm. To describe it, a dataset of chemicals was collected from open-access sources to develop a mathematical model. A novel approach, combining a genetic algorithm and Bayesian optimization, was used to select structural features for the model and to optimize model parameters. The final QSAR classification model was created with the Random Forest algorithm and exhibited good prediction accuracy of 0.78 on the training set and 0.80 on the test set. The model representation follows FAIR principles and is available on QsarDB.org.
Affiliation(s)
- Mihkel Kotli
  - University of Tartu, Institute of Chemistry, Tartu, Estonia
- Geven Piir
  - University of Tartu, Institute of Chemistry, Tartu, Estonia
- Uko Maran
  - University of Tartu, Institute of Chemistry, Tartu, Estonia

5
Piir G, Sild S, Maran U. Interpretable machine learning for the identification of estrogen receptor agonists, antagonists, and binders. Chemosphere 2024; 347:140671. [PMID: 37951393 DOI: 10.1016/j.chemosphere.2023.140671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 10/25/2023] [Accepted: 11/07/2023] [Indexed: 11/14/2023]
Abstract
An abnormal hormonal activity or exposure to endocrine-disrupting chemicals (EDCs) can cause endocrine system malfunction. Among the many interactions EDCs can affect is the disruption of estrogen signalling, which can lead to adverse health effects such as cancer, osteoporosis, neurodegenerative diseases, cardiovascular disease, insulin resistance, and obesity. Knowing which chemicals can act as EDCs is a significant advantage and a practical necessity. New Approach Methodologies (NAM) computational models offer a quick and cost-effective solution for preliminary hazard assessment of chemicals without animal testing. Therefore, a machine learning approach was used to investigate the relationships between estrogen receptor (ER) activity and chemical structure to identify chemicals that can interact with ER. For this purpose, the consolidated in vitro assay data from the ToxCast/Tox21 projects were used to develop Random Forest classification models for ER binding, agonists, and antagonists. The overall classification prediction accuracy reaches up to 82%, depending on whether the model predicted agonists, antagonists, or compounds that bind to the active site. Given the imbalance in endocrine disruption data, the derived models are good candidates for deprioritising chemicals and reducing animal testing. The interpretation of theoretical molecular descriptors of the models was consistent with the molecular interactions known in the ligand binding pocket. The estimated class probabilities enabled the analysis of the applicability domain of the developed models and the assessment of the predictions' reliability, followed by guidelines for interpreting prediction results. The models are openly accessible and usable at QsarDB.org (http://dx.doi.org/10.15152/QDB.259) according to the FAIR (Findable, Accessible, Interoperable, Reusable) principles.
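One way class probabilities can drive a reliability assessment is sketched below, assuming a Random Forest whose class probability is approximated by the fraction of trees voting for each class. The helper names and the 0.8/0.6 cut-offs are hypothetical and are not the guidelines published with the models.

```python
from typing import Dict, List, Tuple


def vote_probabilities(tree_votes: List[str]) -> Dict[str, float]:
    """Class probabilities estimated as the fraction of trees voting for each class."""
    counts: Dict[str, int] = {}
    for label in tree_votes:
        counts[label] = counts.get(label, 0) + 1
    return {label: c / len(tree_votes) for label, c in counts.items()}


def reliability(probs: Dict[str, float], high: float = 0.8,
                low: float = 0.6) -> Tuple[str, float, str]:
    """Return (predicted class, its probability, reliability category).

    The cut-offs `high` and `low` are hypothetical, not the published guidelines.
    """
    label, p = max(probs.items(), key=lambda kv: kv[1])
    if p >= high:
        category = "reliable"
    elif p >= low:
        category = "borderline"
    else:
        category = "unreliable"
    return label, p, category


print(reliability(vote_probabilities(["binder"] * 9 + ["non-binder"])))
# ('binder', 0.9, 'reliable')
print(reliability(vote_probabilities(["binder"] * 11 + ["non-binder"] * 9)))
# ('binder', 0.55, 'unreliable')
```

Raising `high` accepts fewer predictions as reliable; in practice the cut-offs would have to be checked against the model's observed error rates in each probability band.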
Affiliation(s)
- Geven Piir
  - Institute of Chemistry, University of Tartu, Ravila 14A, Tartu, 50411, Estonia
- Sulev Sild
  - Institute of Chemistry, University of Tartu, Ravila 14A, Tartu, 50411, Estonia
- Uko Maran
  - Institute of Chemistry, University of Tartu, Ravila 14A, Tartu, 50411, Estonia

6
Fliszkiewicz B, Sajdak M. Fragments quantum descriptors in classification of bio-accumulative compounds. J Mol Graph Model 2023; 125:108584. [PMID: 37611341 DOI: 10.1016/j.jmgm.2023.108584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 07/24/2023] [Accepted: 07/29/2023] [Indexed: 08/25/2023]
Abstract
The aim of this research is to assess the applicability of calculated quantum properties of molecular fragments as molecular descriptors in a machine learning classification task. The research is based on the bio-concentration and QM9-extended databases. A number of compounds with results from quantum-chemical calculations conducted with the Psi4 quantum chemistry package were also added to the quantum properties database. Classification results are compared with a baseline of random guesses and with predictions obtained with traditional RDKit-generated molecular descriptors. The chosen classification metrics show that results obtained with fragment quantum descriptors fall between the baseline results and those provided by molecular descriptors widely applied in cheminformatics. According to the results, applying principal component analysis causes a drop in classification metrics.
Affiliation(s)
- Bartłomiej Fliszkiewicz
  - Department of New Technologies and Chemistry, Military University of Technology, Kaliskiego 2, Warsaw, 00-908, Poland
- Marcin Sajdak
  - Faculty of Energy and Environmental Engineering, Silesian University of Technology, Akademicka 2A, Gliwice, 44-109, Poland
  - School of Chemical Engineering, University of Birmingham, S W Campus, Birmingham, B15 TT, United Kingdom

7
Lasfar R, Tóth G. Patch seriation to visualize data and model parameters. J Cheminform 2023; 15:78. [PMID: 37689697 PMCID: PMC10492365 DOI: 10.1186/s13321-023-00757-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 08/31/2023] [Indexed: 09/11/2023] [Open access]
Abstract
We developed a new seriation merit function for enhancing the visual information of data matrices. A local similarity matrix is calculated, in which the average similarity of neighbouring objects is computed in a limited variable space, and a global function is constructed to maximize the local similarities and cluster them into patches by simple row and column ordering. The method is effective at identifying data clusters when the similarity of objects is caused by some of the variables and these variables differ between the distinct clusters. The method can be used in the presence of missing data and also on data arrays with more than two dimensions. We show the feasibility of the method on different data sets: QSAR, chemical, material science, food science, cheminformatics and environmental data in two- and three-dimensional cases. The method can be used during the development and interpretation of artificial neural network models by seriating different features of the models. It helps to identify interpretable models by elucidating clusters of objects, variables and hidden layer neurons.
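A generic seriation objective, the summed similarity of rows that end up adjacent after reordering, is sketched below to show what a merit function optimized "by simple row and column ordering" looks like. This is an assumption-laden stand-in, not the authors' patch merit function: it uses all variables rather than a limited variable space, reorders rows only (columns could be handled the same way on the transposed matrix), and searches exhaustively, which only works for tiny matrices.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Negative squared Euclidean distance: larger values mean more similar rows."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def merit(rows: List[Sequence[float]], order: Sequence[int]) -> float:
    """Sum of similarities between rows that end up adjacent in `order`."""
    return sum(similarity(rows[i], rows[j]) for i, j in zip(order, order[1:]))


def best_order(rows: List[Sequence[float]]) -> Tuple[Tuple[int, ...], float]:
    """Exhaustive search over row orderings; only feasible for a handful of rows."""
    best = max(permutations(range(len(rows))), key=lambda o: merit(rows, o))
    return best, merit(rows, best)


# Toy 4 x 2 data matrix whose rows form two obvious groups: {0, 2} and {1, 3}.
data = [[0, 0], [5, 5], [0, 1], [5, 4]]
print(best_order(data))  # ((0, 2, 3, 1), -36)
```

The best ordering places each group's rows next to each other, which is the visual "patch" effect seriation aims for; realistic matrices need heuristic search instead of enumerating permutations.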
Affiliation(s)
- Rita Lasfar
  - Institute of Chemistry, Eötvös Loránd University, Pázmány sétány 1/a, Budapest, 1117, Hungary
- Gergely Tóth
  - Institute of Chemistry, Eötvös Loránd University, Pázmány sétány 1/a, Budapest, 1117, Hungary

8
Venkatraman V. FP-MAP: an extensive library of fingerprint-based molecular activity prediction tools. Front Chem 2023; 11:1239467. [PMID: 37649967 PMCID: PMC10462816 DOI: 10.3389/fchem.2023.1239467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 07/31/2023] [Indexed: 09/01/2023] [Open access]
Abstract
Discovering new drugs for disease treatment is challenging, requiring a multidisciplinary effort as well as time and resources. With a view to improving hit discovery and lead compound identification, machine learning (ML) approaches are being increasingly used in the decision-making process. Although a number of ML-based studies have been published, most report only a fraction of the wider range of bioactivities, with each model typically focusing on a particular disease. This study introduces FP-MAP, an extensive atlas of fingerprint-based prediction models that covers a diverse range of activities, including neglected tropical diseases (caused by viral, bacterial and parasitic pathogens) as well as other targets implicated in diseases such as Alzheimer's. To arrive at the best predictive models, the performance of ≈4,000 classification/regression models was evaluated on different bioactivity data sets using 12 different molecular fingerprints. The best-performing models, which achieved test set AUC values of 0.62-0.99, have been integrated into an easy-to-use graphical user interface that can be downloaded from https://gitlab.com/vishsoft/fpmap.
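The AUC values quoted in this abstract can be read as the probability that a model scores a randomly chosen active above a randomly chosen inactive. The sketch below computes ROC AUC from that pairwise definition (equivalent to the Mann-Whitney U statistic divided by the number of active/inactive pairs); the labels and scores are hypothetical, and the quadratic loop is for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random active (label 1) is scored
    above a random inactive (label 0); tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels (1 = active) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.1]
print(round(roc_auc(labels, scores), 3))  # 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which frames the 0.62-0.99 range reported for the best models.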
Affiliation(s)
- Vishwesh Venkatraman
  - Department of Chemistry, Norwegian University of Science and Technology, Trondheim, Norway

9
Cronin MTD, Belfield SJ, Briggs KA, Enoch SJ, Firman JW, Frericks M, Garrard C, Maccallum PH, Madden JC, Pastor M, Sanz F, Soininen I, Sousoni D. Making in silico predictive models for toxicology FAIR. Regul Toxicol Pharmacol 2023; 140:105385. [PMID: 37037390 DOI: 10.1016/j.yrtph.2023.105385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Revised: 02/18/2023] [Accepted: 04/07/2023] [Indexed: 04/12/2023]
Abstract
In silico predictive models for toxicology include quantitative structure-activity relationship (QSAR) and physiologically based kinetic (PBK) approaches to predict physico-chemical and ADME properties, toxicological effects and internal exposure. Such models are used to fill data gaps as part of chemical risk assessment. There is a growing need to ensure in silico predictive models for toxicology are available for use and reproducible. This paper describes how the FAIR (Findable, Accessible, Interoperable, Reusable) principles, developed for data sharing, have been applied to in silico predictive models. In particular, this investigation has focussed on how the FAIR principles could be applied to improve regulatory acceptance of predictions from such models. Eighteen principles have been developed that cover all aspects of FAIR. It is intended that FAIRification of in silico predictive models for toxicology will increase their use and acceptance.
Affiliation(s)
- Mark T D Cronin
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, UK
- Samuel J Belfield
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, UK
- Katharine A Briggs
  - Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Holbeck, Leeds, LS11 5PS, UK
- Steven J Enoch
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, UK
- James W Firman
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, UK
- Markus Frericks
  - BASF SE, APD/ET - Li 444, Speyerer St 2, 67117, Limburgerhof, Germany
- Clare Garrard
  - ELIXIR, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Peter H Maccallum
  - ELIXIR, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Judith C Madden
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, UK
- Manuel Pastor
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Dept. of Medicine and Life Sciences (MELIS), Universitat Pompeu Fabra, Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
- Ferran Sanz
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Dept. of Medicine and Life Sciences (MELIS), Universitat Pompeu Fabra, Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
- Inari Soininen
  - Synapse Research Management Partners SL, Calle Velazquez 94, planta 1, 28006, Madrid, Spain
- Despoina Sousoni
  - ELIXIR, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK

10
Lowe CN, Charest N, Ramsland C, Chang DT, Martin TM, Williams AJ. Transparency in Modeling through Careful Application of OECD's QSAR/QSPR Principles via a Curated Water Solubility Data Set. Chem Res Toxicol 2023; 36:465-478. [PMID: 36877669 PMCID: PMC10357388 DOI: 10.1021/acs.chemrestox.2c00379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2023]
Abstract
The need for careful assembly, training, and validation of quantitative structure-activity/property models (QSAR/QSPR) is more significant than ever as data sets become larger and sophisticated machine learning tools become increasingly ubiquitous and accessible to the scientific community. Regulatory agencies such as the United States Environmental Protection Agency must carefully scrutinize each aspect of a resulting QSAR/QSPR model to determine its potential use in environmental exposure and hazard assessment. Herein, we revisit the goals of the Organisation for Economic Cooperation and Development (OECD) in our application and discuss the validation principles for structure-activity models. We apply these principles to a model for predicting water solubility of organic compounds derived using random forest regression, a common machine learning approach in the QSA/PR literature. Using public sources, we carefully assembled and curated a data set consisting of 10,200 unique chemical structures with associated water solubility measurements. This data set was then used as a focal narrative to methodically consider the OECD's QSA/PR principles and how they can be applied to random forests. Despite some expert, mechanistically informed supervision of descriptor selection to enhance model interpretability, we achieved a model of water solubility with comparable performance to previously published models (5-fold cross-validated R² of 0.81 and RMSE of 0.98). We hope this work will catalyze a necessary conversation around the importance of cautiously modernizing and explicitly leveraging OECD principles while pursuing state-of-the-art machine learning approaches to derive QSA/PR models suitable for regulatory consideration.
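One common way to obtain cross-validated R² and RMSE figures like those above is to pool the out-of-fold predictions from k-fold cross-validation; the paper may aggregate differently. The sketch below shows that bookkeeping for 5 folds, with an ordinary least-squares line standing in for the authors' random forest; the data, the round-robin fold assignment, and the helper names are assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """Ordinary least-squares line y = a + b*x (a stand-in for a random forest)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return lambda v: a + b * v


def kfold_r2_rmse(x: Sequence[float], y: Sequence[float], k: int = 5) -> Tuple[float, float]:
    """Pooled out-of-fold R^2 and RMSE from k-fold cross-validation.

    Folds are assigned round-robin (index % k) for determinism; in practice
    the data would normally be shuffled before splitting.
    """
    n = len(x)
    preds: List[float] = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, k):  # indices held out in this fold
            preds[i] = model(x[i])
    my = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Toy data: a linear trend with alternating +/-1 noise.
xs = list(range(10))
ys = [2 * xi + 1 + (1 if xi % 2 == 0 else -1) for xi in xs]
r2, rmse = kfold_r2_rmse(xs, ys)
print(round(r2, 3), round(rmse, 3))
```

Every compound is predicted exactly once by a model that never saw it, so the pooled R² and RMSE estimate performance on unseen data rather than goodness of fit.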
Affiliation(s)
- Charles N. Lowe
  - Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Nathaniel Charest
  - ORAU Student Services Contractor to Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Christian Ramsland
  - ORAU Student Services Contractor to Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Daniel T. Chang
  - Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Todd M. Martin
  - Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Antony J. Williams
  - Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States

11
Tullius Scotti M, Herrera-Acevedo C, Barros de Menezes RP, Martin HJ, Muratov EN, Ítalo de Souza Silva Á, Faustino Albuquerque E, Ferreira Calado L, Coy-Barrera E, Scotti L. MolPredictX: Online Biological Activity Predictions by Machine Learning Models. Mol Inform 2022; 41:e2200133. [PMID: 35961924 DOI: 10.1002/minf.202200133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 08/12/2022] [Indexed: 01/05/2023]
Abstract
Here we report the development of MolPredictX, an innovative and freely accessible web interface for biological activity predictions of query molecules. MolPredictX utilizes in-house QSAR models to provide 27 qualitative predictions (active or inactive) and quantitative probabilities for bioactivity against parasitic (Trypanosoma and Leishmania), viral (Dengue, SARS-CoV and Hepatitis C), pathogenic yeast (Candida albicans), and bacterial (Salmonella enterica and Escherichia coli) targets, as well as Alzheimer's disease enzymes. In this article, we introduce the methodology and usability of this webtool, highlighting its potential role in the development of new drugs against a variety of diseases. MolPredictX is undergoing continuous development and is freely available at https://www.molpredictx.ufpb.br/.
Affiliation(s)
- Marcus Tullius Scotti
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil
- Chonny Herrera-Acevedo
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil
  - Department of Chemical Engineering, Universidad ECCI, Carrera 19 # 49-20, 111311, Bogotá D.C., Colombia
- Renata Priscila Barros de Menezes
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil
- Holli-Joi Martin
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Eugene N Muratov
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Ávilla Ítalo de Souza Silva
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil
- Emmanuella Faustino Albuquerque
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil
- Lucas Ferreira Calado
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil
- Ericsson Coy-Barrera
  - Bioorganic Chemistry Laboratory, Facultad de Ciencias Básicas y Aplicadas, Universidad Militar Nueva Granada, Cajicá, 250247, Colombia
- Luciana Scotti
  - Programa de Pós-Graduação de Produtos Naturais e Sintéticos Bioativos, Universidade Federal da Paraíba, 58051-900, João Pessoa-PB, Brazil

12
Király P, Kiss R, Kovács D, Ballaj A, Tóth G. The Relevance of Goodness-of-fit, Robustness and Prediction Validation Categories of OECD-QSAR Principles with Respect to Sample Size and Model Type. Mol Inform 2022; 41:e2200072. [PMID: 35773201 PMCID: PMC9787734 DOI: 10.1002/minf.202200072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/31/2022] [Accepted: 06/30/2022] [Indexed: 12/30/2022]
Abstract
We investigated the relevance of the validation principles for Quantitative Structure-Activity Relationship models issued by the Organisation for Economic Co-operation and Development (OECD). We checked the goodness-of-fit, robustness and predictivity categories in linear and nonlinear models using benchmark datasets. Most of our conclusions are drawn from the sample-size dependence of the different validation parameters. We found that the goodness-of-fit parameters misleadingly overestimate model performance on small samples. In the case of neural network and support vector models, the feasibility of the goodness-of-fit parameters may often be questioned. We propose using the simplest y-scrambling method to estimate chance correlation. We found that the leave-one-out and leave-many-out cross-validation parameters can be rescaled to each other in all models, so the computationally feasible method should be chosen depending on the model type. We assessed the interdependence of the validation parameters by calculating their rank correlations. Goodness of fit and robustness correlate quite well across sample sizes for linear models, so one of the two approaches might be redundant. In the rank correlation between internal and external validation parameters, we found that the assignment of well- and poorly-modellable data to the training or the test set causes negative correlations.
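Y-scrambling, which this abstract recommends for estimating chance correlation, refits the same model after randomly permuting the response values; a genuine model should fit far better than any scrambled one. The sketch below applies it to a one-descriptor least-squares model on hypothetical data; the helper names, the 200 iterations, and the seed are arbitrary choices for illustration.

```python
import random
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares line of y on x (equals Pearson r squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def y_scrambling(x: Sequence[float], y: Sequence[float],
                 n_iter: int = 200, seed: int = 0) -> List[float]:
    """R^2 values obtained after repeatedly shuffling the responses.

    Shuffling destroys any real structure-activity link, so these values show
    how well the same model form fits by chance alone.
    """
    rng = random.Random(seed)
    shuffled = list(y)
    scores = []
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        scores.append(r_squared(x, shuffled))  # refit on scrambled responses
    return scores


xs = list(range(20))
ys = [2 * v + (1 if v % 3 == 0 else -1) for v in xs]
real = r_squared(xs, ys)
chance = y_scrambling(xs, ys)
print(f"real R2 = {real:.3f}, max scrambled R2 = {max(chance):.3f}")
```

If the scrambled R² values came close to the real one, the apparent fit could be a chance correlation, which is the situation y-scrambling is meant to expose.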
Affiliation(s)
- Péter Király
- Institute of Chemistry, Loránd Eötvös University, Pázmány S. 1/A, 1117 Budapest, Hungary
- Ramóna Kiss
- Institute of Chemistry, Loránd Eötvös University, Pázmány S. 1/A, 1117 Budapest, Hungary
- Dániel Kovács
- Institute of Chemistry, Loránd Eötvös University, Pázmány S. 1/A, 1117 Budapest, Hungary
- Amine Ballaj
- Institute of Chemistry, Loránd Eötvös University, Pázmány S. 1/A, 1117 Budapest, Hungary
- Gergely Tóth
- Institute of Chemistry, Loránd Eötvös University, Pázmány S. 1/A, 1117 Budapest, Hungary
13
Oja M, Sild S, Piir G, Maran U. Intrinsic Aqueous Solubility: Mechanistically Transparent Data-Driven Modeling of Drug Substances. Pharmaceutics 2022; 14:pharmaceutics14102248. [PMID: 36297685 PMCID: PMC9611068 DOI: 10.3390/pharmaceutics14102248] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/08/2022] [Revised: 10/12/2022] [Accepted: 10/18/2022] [Indexed: 11/07/2022]
Abstract
Intrinsic aqueous solubility is a foundational property for understanding the chemical, technological, pharmaceutical, and environmental behavior of drug substances. Despite years of solubility research, molecular structure-based prediction of the intrinsic aqueous solubility of drug substances is still under active investigation. This paper describes the authors’ systematic data-driven modelling in which two fit-for-purpose training data sets for intrinsic aqueous solubility were collected and curated, and three quantitative structure–property relationships were derived to make predictions for the most recent solubility challenge. All three models perform well individually, while being mechanistically transparent and easy to understand. Molecular descriptors involved in the models are related to the following key steps in the solubility process: dissociation of the molecule from the crystal, formation of a cavity in the solvent, and insertion of the molecule into the solvent. A consensus modeling approach with these models remarkably improved prediction capability and reduced the number of strong outliers by more than half. The performance and outliers of the second solubility challenge predictions were analyzed retrospectively. All developed models have been published in the QsarDB.org repository according to FAIR principles and can be used without restrictions for exploring, downloading, and making predictions.
Affiliation(s)
- Uko Maran
- Correspondence: Tel.: +372-7-375-254; Fax: +372-7-375-264
14
Bertato L, Chirico N, Papa E. Predicting the Bioconcentration Factor in Fish from Molecular Structures. Toxics 2022; 10:toxics10100581. [PMID: 36287860 PMCID: PMC9610932 DOI: 10.3390/toxics10100581] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/28/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 05/14/2023]
Abstract
The bioconcentration factor (BCF) is one of the metrics used to evaluate the potential of a substance to bioaccumulate in aquatic organisms. In this work, linear and non-linear regression QSARs were developed for the prediction of log BCF using different computational approaches, starting from a large and structurally heterogeneous dataset. The new MLR-OLS and ANN regression models show good fit, with R² values of 0.62 and 0.70, respectively, and comparable external predictivity, with R²ext of 0.64 and 0.65 (RMSEext of 0.78 and 0.76), respectively. Furthermore, linear and non-linear classification models were developed using the regulatory threshold BCF > 2000. A class-balanced subset was used to develop classification models, which were applied to chemicals not used to create the QSARs. These classification models are characterized by external and internal accuracy of up to 84% and 90%, respectively, and sensitivity and specificity of up to 90% and 80%, respectively. The QSARs presented in this work are validated according to regulatory requirements, and their quality is in line with other tools available for the same endpoint and dataset, with the advantage of low complexity and easy application through the software QSAR-ME Profiler. These QSARs can be used as alternatives to, or in combination with, existing models to support bioaccumulation assessment procedures.
15
Pavel A, Saarimäki LA, Möbus L, Federico A, Serra A, Greco D. The potential of a data centred approach & knowledge graph data representation in chemical safety and drug design. Comput Struct Biotechnol J 2022; 20:4837-4849. [PMID: 36147662 PMCID: PMC9464643 DOI: 10.1016/j.csbj.2022.08.061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 08/26/2022] [Accepted: 08/26/2022] [Indexed: 11/20/2022]
Abstract
Big Data pervades nearly all areas of life sciences, yet the analysis of large integrated data sets remains a major challenge. Moreover, the field of life sciences is highly fragmented and, consequently, so are its data, knowledge, and standards. This, in turn, makes integrated data analysis and knowledge gathering across sub-fields a demanding task. At the same time, the integration of various research angles and data types is crucial for modelling the complexity of organisms and biological processes in a holistic manner. This is especially valid in the context of drug development and chemical safety assessment, where computational methods can meet the urgent need for fast, effective, and sustainable approaches. Such computational methods, however, require the development of methodologies suitable for an integrated and data centred Big Data view. Here we discuss Knowledge Graphs (KG) as a solution for a data centred analysis approach to drug and chemical development and safety assessment. KGs are knowledge bases, data analysis engines, and knowledge discovery systems all in one, allowing them to serve purposes ranging from simple data retrieval through meta-analysis to complex predictive and knowledge discovery systems. Therefore, KGs have immense potential to advance the data centred approach and the reusability and informativeness of data. Furthermore, they can increase the power of analyses and the complexity of the processes that can be modelled, all while providing knowledge in a natively human-understandable network data model.
Affiliation(s)
- Alisa Pavel
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
- Laura A Saarimäki
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
- Lena Möbus
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
- Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
- Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland
- Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland; BioMediTech Institute, Tampere University, Tampere, Finland; Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere, Finland; Institute of Biotechnology, University of Helsinki, Helsinki, Finland
16
Toots KM, Sild S, Leis J, Acree WE, Maran U. Machine Learning Quantitative Structure–Property Relationships as a Function of Ionic Liquid Cations for the Gas-Ionic Liquid Partition Coefficient of Hydrocarbons. Int J Mol Sci 2022; 23:ijms23147534. [PMID: 35886881 PMCID: PMC9323540 DOI: 10.3390/ijms23147534] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/26/2022] [Revised: 06/27/2022] [Accepted: 06/30/2022] [Indexed: 02/01/2023]
Abstract
Ionic liquids (ILs) are known for their unique characteristics as solvents and electrolytes. New ILs are therefore being developed and adapted as innovative chemical environments for different applications in which their properties need to be understood at the molecular level. Computational data-driven methods provide a means of understanding properties at the molecular level, and quantitative structure–property relationships (QSPRs) provide the framework for this. This framework is commonly used to study the properties of molecules in ILs as an environment. The converse situation, where the property is considered as a function of the ionic liquid, has not been explored. The aim of the present study was to supplement this perspective with new knowledge and to develop QSPRs that would allow the understanding of molecular interactions in ionic liquids based on the structure of the cationic moiety. A wide range of applications in electrochemistry and in separation and extraction chemistry depends on the partitioning of solutes between the ionic liquid and the surrounding environment, which is characterized by the gas-ionic liquid partition coefficient. To model this property as a function of the structure of the cationic counterpart, a series of ionic liquids with a common bis-(trifluoromethylsulfonyl)-imide anion, [Tf2N]−, was selected for benzene, hexane and cyclohexane. MLR, SVR and GPR machine learning approaches were used to derive data-driven models, and their performance was compared. Cross-validation coefficients of determination in the range 0.71–0.93, along with other performance statistics, indicated high accuracy of the models for all data series and machine learning methods. The analysis and interpretation of descriptors revealed that higher lipophilicity and dispersion interaction capability and lower polarity of the cations generally induce a higher partition coefficient for benzene, hexane, cyclohexane and hydrocarbons in general.
The applicability domain analysis showed that there were no highly influential outliers and that the models are applicable to a wide selection of cation families of variable size, polarity and aliphatic or aromatic nature.
Affiliation(s)
- Karl Marti Toots
- Department of Chemistry, University of Tartu, 14a Ravila Street, 50411 Tartu, Estonia
- Sulev Sild
- Department of Chemistry, University of Tartu, 14a Ravila Street, 50411 Tartu, Estonia
- Jaan Leis
- Department of Chemistry, University of Tartu, 14a Ravila Street, 50411 Tartu, Estonia
- William E. Acree
- Department of Chemistry, University of North Texas, 1155 Union Circle Drive #305070, Denton, TX 76203, USA
- Uko Maran
- Department of Chemistry, University of Tartu, 14a Ravila Street, 50411 Tartu, Estonia
- Correspondence:
17
Advanced Analytical Tools for the Estimation of Gut Permeability of Compounds of Pharmaceutical Interest. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12031326] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023]
Abstract
The present study aims at developing quantitative structure–activity relationship (QSAR) models for the determination of the gut permeability of 228 pharmacological drugs at different pH conditions (3, 5, 7.4, 9, intrinsic). Accordingly, five different datasets (reflecting the diverse permeability shown by the compounds at the different pH values) were handled, with the aim of discriminating compounds as low-permeable or high-permeable. To achieve this goal, molecular descriptors were computed for all the investigated compounds, and classification models were then built by means of partial least squares discriminant analysis (PLS-DA). A high predictive capability was achieved for all models, with correct classification rates in external validation between 80% and 96%. To test whether a reduction in the molecular descriptors would improve predictions and provide information about the most relevant variables, a feature selection approach, covariance selection, was used to select the most relevant subsets of predictors. This led to a slight improvement in predictive accuracy and indicated that the most relevant descriptors for discriminating the investigated compounds into low- and high-permeable were associated with their 2D and 3D structures.
18
Toots KM, Sild S, Leis J, Acree WE Jr, Maran U. The quantitative structure-property relationships for the gas-ionic liquid partition coefficient of a large variety of organic compounds in three ionic liquids. J Mol Liq 2021. [DOI: 10.1016/j.molliq.2021.117573] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/29/2022]
19
Machine Learning Applied to the Modeling of Pharmacological and ADMET Endpoints. Methods Mol Biol 2021. [PMID: 34731464 DOI: 10.1007/978-1-0716-1787-8_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2023]
Abstract
The well-known concept of quantitative structure-activity relationships (QSAR) has been gaining significant interest in recent years. Data, descriptors, and algorithms are the main pillars for building useful models that support more efficient drug discovery processes with in silico methods. Significant advances in all three areas are the reason for the renewed interest in these models. In this book chapter we review various machine learning (ML) approaches that make use of measured in vitro/in vivo data for many compounds. We put these in context with other digital drug discovery methods and present some application examples.
20
Web-Based Quantitative Structure-Activity Relationship Resources Facilitate Effective Drug Discovery. Top Curr Chem (Cham) 2021; 379:37. [PMID: 34554348 DOI: 10.1007/s41061-021-00349-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/12/2021] [Accepted: 08/17/2021] [Indexed: 12/28/2022]
Abstract
Traditional drug discovery effectively contributes to the treatment of many diseases but is limited by high costs and long development cycles. Quantitative structure-activity relationship (QSAR) methods were introduced to evaluate the activity of compounds virtually, which saves much of the cost of determining the activities of compounds experimentally. Over the past two decades, many web tools for QSAR modeling with various features have been developed to facilitate the use of QSAR methods. These web tools significantly lower the barrier to using QSAR and indirectly promote drug discovery. However, there are few comprehensive summaries of these QSAR tools, and researchers may have difficulty determining which tool to use. Hence, we systematically surveyed the mainstream web tools for QSAR modeling. This work may guide researchers in choosing appropriate web tools for developing QSAR models, and may also help in developing more bioinformatics tools based on these existing resources. We also hope to make non-specialists more aware of QSAR methods and to expand their use.
21
Combined Naïve Bayesian, Chemical Fingerprints and Molecular Docking Classifiers to Model and Predict Androgen Receptor Binding Data for Environmentally- and Health-Sensitive Substances. Int J Mol Sci 2021; 22:ijms22136695. [PMID: 34206613 PMCID: PMC8267747 DOI: 10.3390/ijms22136695] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/06/2021] [Revised: 06/18/2021] [Accepted: 06/20/2021] [Indexed: 12/15/2022]
Abstract
Many chemicals that enter the environment, the food chain, and the human body can disrupt androgen-dependent pathways and mimic hormones, and may therefore be responsible for multiple diseases, from reproductive disorders to tumors. Thus, modeling and predicting androgen receptor activity is an important area of research. The aim of the current study was to find a method or combination of methods to predict compounds that can bind to and/or disrupt the androgen receptor, and thereby guide decision making and further analysis. A stepwise procedure started with the analysis of protein structures from human, chimpanzee, and rat, followed by docking and subsequent ligand- and statistics-based techniques that gradually improved classification. The best methods used multivariate logistic regression of combinations of chimpanzee protein structural docking scores, extended connectivity fingerprints, and naïve Bayesian classifiers of known binders and non-binders. Combination or consensus methods included data from a variety of procedures to improve the final model accuracy.
22
Kovács D, Király P, Tóth G. Sample-size dependence of validation parameters in linear regression models and in QSAR. SAR QSAR Environ Res 2021; 32:247-268. [PMID: 33749419 DOI: 10.1080/1062936x.2021.1890208] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/09/2020] [Accepted: 02/10/2021] [Indexed: 06/12/2023]
Abstract
The dependence of statistical validation parameters on the size of the sample used to fit multivariate linear models was investigated. We observed that R² and related internal parameters were misleading, as they overestimated the goodness-of-fit of models at small sample sizes. Cross-validation metrics showed correct trends. It was possible to scale the leave-one-out and the leave-many-out results to be nearly identical by correcting for the degrees of freedom of the models. y- and x-randomized validation parameters were calculated, and the methods provided nearly identical results. We suggest using the simplest methods in both cases. The external parameters followed correct trends with respect to the sample size, but their sensitivity differed. We plotted the Roy-Ojha metrics in 2D and coloured them according to other external parameters to provide an easy classification of models. Rank correlations were calculated between the performance parameters. Up to a certain sample size, goodness-of-fit and robustness were distinguishable, but above it the parameters were redundant. The external-internal pairs were weakly correlated. Our data show that all three aspects of validation are necessary at small sample sizes, but the internal check of robustness is not informative above a given sample size.
Affiliation(s)
- D Kovács
- Institute of Chemistry, Loránd Eötvös University, Budapest, Hungary
- P Király
- Institute of Chemistry, Loránd Eötvös University, Budapest, Hungary
- G Tóth
- Institute of Chemistry, Loránd Eötvös University, Budapest, Hungary
23
Piir G, Sild S, Maran U. Binary and multi-class classification for androgen receptor agonists, antagonists and binders. Chemosphere 2021; 262:128313. [PMID: 33182081 DOI: 10.1016/j.chemosphere.2020.128313] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 05/12/2020] [Revised: 08/24/2020] [Accepted: 09/10/2020] [Indexed: 06/11/2023]
Abstract
Androgens and the androgen receptor regulate a variety of biological effects in the human body. Impaired functioning of the androgen receptor may cause adverse health effects ranging from cancer to infertility. Therefore, it is important to determine whether new chemicals have any binding activity and act as androgen agonists or antagonists before commercial use. Due to the large number of chemicals that require experimental testing, computational methods are a viable alternative. Therefore, the aim of the present study was to develop predictive QSAR models for classifying compounds according to their activity at the androgen receptor. A large data set of chemicals from the CoMPARA project was used for this purpose, and random forest classification models were developed for androgen binding, agonistic, and antagonistic activity. In addition, a unique effort was made to develop a multi-class approach that discriminates between inactive compounds, agonists and antagonists simultaneously. For the evaluation set, the classification models predicted agonists with 80% accuracy; for antagonists and binders, the respective values were 72% and 78%. Combining agonists, antagonists and inactive compounds into a multi-class approach added complexity to the modelling task and resulted in 64% prediction accuracy for the evaluation set. Considering the size of the training data sets and their imbalance, the achieved evaluation accuracy is very good. The final classification models are available for exploring and predicting at the QsarDB repository (https://doi.org/10.15152/QDB.236).
Affiliation(s)
- Geven Piir
- University of Tartu, Institute of Chemistry, Ravila 14A, Tartu, 50411, Estonia
- Sulev Sild
- University of Tartu, Institute of Chemistry, Ravila 14A, Tartu, 50411, Estonia
- Uko Maran
- University of Tartu, Institute of Chemistry, Ravila 14A, Tartu, 50411, Estonia
24
Fayet G, Rotureau P. Chemoinformatics for the Safety of Energetic and Reactive Materials at Ineris. Mol Inform 2020; 41:e2000190. [PMID: 33283975 DOI: 10.1002/minf.202000190] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/31/2020] [Accepted: 12/06/2020] [Indexed: 11/07/2022]
Abstract
The characterization of the physical hazards of substances provides key information for managing the risks associated with their use, storage and transport. With decades of work in this area, Ineris develops and implements cutting-edge experimental facilities allowing such characterizations at different scales and under various conditions to study all of the dreaded accident scenarios. This review presents the efforts Ineris has more recently engaged in the field of chemoinformatics to develop and use new predictive methods for the anticipation and management of industrial risks associated with energetic and reactive materials, as a complement to experiments. An overview of the methods used for the development of Quantitative Structure-Property Relationships for physical hazards is presented and discussed with regard to the specific features of this class of properties. A review of models developed at Ineris is also provided, from the first tentative models on the explosivity of nitro compounds to the successful application to the flammability of organic mixtures. The use of QSPR models is then discussed. Good practices for the robust use of QSPR models are recalled, with specific comments related to physical hazards, notably for regulatory purposes. Dissemination and training efforts engaged by Ineris are also presented. The potential offered by these predictive methods for in silico design and for the development of new intrinsically safer technologies in safety-by-design strategies is then discussed. Lastly, challenges and perspectives for extending the application of chemoinformatics in the field of safety, and in particular to the physical hazards of energetic and reactive substances, are proposed.
Affiliation(s)
- Guillaume Fayet
- Ineris, Accidental Risk Division, Parc Technologique Alata, 60550, Verneuil-en-Halatte, France
- Patricia Rotureau
- Ineris, Accidental Risk Division, Parc Technologique Alata, 60550, Verneuil-en-Halatte, France
25
Zukić S, Maran U. Modelling of antiproliferative activity measured in HeLa cervical cancer cells in a series of xanthene derivatives. SAR QSAR Environ Res 2020; 31:905-921. [PMID: 33236957 DOI: 10.1080/1062936x.2020.1839131] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/13/2020] [Accepted: 10/15/2020] [Indexed: 06/11/2023]
Abstract
Cancer remains one of the leading causes of death in humans, and new drug substances are therefore being developed. Thus, the anti-cancer activity of xanthene derivatives has become an important topic in the development of new and potent anti-cancer drug substances. A previously published novel series of xanthen-3-one and xanthen-1,8-dione derivatives was synthesized in one of our laboratories and showed anti-proliferative activity in HeLa cancer cell lines. This series serves as a good basis for developing a quantitative structure-activity relationship (QSAR) to study the relations between anti-proliferative activity and chemical structure. A QSAR model has been derived that relies only on two-dimensional molecular descriptors, providing mechanistic insight into the anti-proliferative activity of xanthene derivatives. The model is validated internally and externally, and additionally with the set of inactive compounds of the original data, confirming the model's applicability for the design and discovery of novel xanthene derivatives. The QSAR model is available at the QsarDB repository (http://dx.doi.org/10.15152/QDB.237).
Affiliation(s)
- S Zukić
- Department of Pharmaceutical Chemistry, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
- U Maran
- Department of Chemistry, University of Tartu, Tartu, Estonia
26
Abstract
At the end of her academic career, the author summarizes the main aspects of QSAR modeling, giving comments and suggestions based on her 23 years of experience in QSAR research on environmental topics. The focus is mainly on Multiple Linear Regression, particularly Ordinary Least Squares, using a Genetic Algorithm for variable selection from various theoretical molecular descriptors, but the comments can also be useful for other QSAR methods. The need for rigorous validation, including external validation, and for an applicability domain check to guarantee the predictivity and reliability of QSAR models is particularly highlighted. The approach discussed is the “predictive” one, based on chemometrics, and is usefully applied to the prioritization of environmental pollutants. All the discussed points and the author's ideas are implemented in the software QSARINS, as a legacy to the QSAR community.
27
Schaduangrat N, Lampa S, Simeon S, Gleeson MP, Spjuth O, Nantasenamat C. Towards reproducible computational drug discovery. J Cheminform 2020; 12:9. [PMID: 33430992 PMCID: PMC6988305 DOI: 10.1186/s13321-020-0408-x] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 07/17/2019] [Accepted: 01/02/2020] [Indexed: 12/11/2022]
Abstract
The reproducibility of experiments has been a long-standing impediment to further scientific progress. Computational methods have been instrumental in drug discovery efforts owing to their multifaceted use for data collection, pre-processing, analysis and inference. This article provides in-depth coverage of the reproducibility of computational drug discovery. This review explores the following topics: (1) the current state of the art in reproducible research, (2) research documentation (e.g. electronic laboratory notebooks, Jupyter notebooks, etc.), (3) the science of reproducible research (i.e. comparison and contrast with related concepts such as replicability, reusability and reliability), (4) model development in computational drug discovery, (5) computational issues in model development and deployment, and (6) use case scenarios for streamlining the computational drug discovery protocol. In computational disciplines, it has become common practice to share the data and code used for numerical calculations, not only to facilitate reproducibility but also to foster collaboration (i.e. to drive the project further by introducing new ideas, growing the data, augmenting the code, etc.). It is therefore inevitable that the field of computational drug design will adopt an open approach towards the collection, curation and sharing of data and code.
Affiliation(s)
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, 10700, Bangkok, Thailand
- Samuel Lampa
- Department of Pharmaceutical Biosciences, Uppsala University, 751 24, Uppsala, Sweden
- Saw Simeon
- Interdisciplinary Graduate Program in Bioscience, Faculty of Science, Kasetsart University, 10900, Bangkok, Thailand
- Matthew Paul Gleeson
- Department of Biomedical Engineering, Faculty of Engineering, King Mongkut's Institute of Technology Ladkrabang, 10520, Bangkok, Thailand
- Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, 751 24, Uppsala, Sweden
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, 10700, Bangkok, Thailand
28
Nantasenamat C. Best Practices for Constructing Reproducible QSAR Models. Methods in Pharmacology and Toxicology 2020. [DOI: 10.1007/978-1-0716-0150-1_3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/21/2023]
29
Abstract
The prediction of toxicological endpoints has gained broad acceptance; it is widely applied in the early stages of drug discovery as well as to impurities arising in the production of generic or equivalent products. In this work, we describe methodologies for predicting the toxicological endpoints of compounds, with a particular focus on secondary metabolites. Case studies include toxicity prediction for natural compound databases with anti-diabetic, anti-malaria and anti-HIV properties.
30
Ida T, Nishida M, Hori Y. Revisiting Formic Acid Decomposition by a Graph-Theoretical Approach. J Phys Chem A 2019; 123:9579-9586. [DOI: 10.1021/acs.jpca.9b05994] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Affiliation(s)
- Tomonori Ida
- Graduate School of Natural Science and Technology, Kanazawa University, Kakuma, Kanazawa 920-1192, Japan
- Manami Nishida
- Graduate School of Natural Science and Technology, Kanazawa University, Kakuma, Kanazawa 920-1192, Japan
- Yuta Hori
- Center for Computational Sciences, University of Tsukuba, Tsukuba 305-8577, Japan
| |

31
Diukendjieva A, Tsakovska I, Alov P, Pencheva T, Pajeva I, Worth AP, Madden JC, Cronin MT. Advances in the prediction of gastrointestinal absorption: Quantitative Structure-Activity Relationship (QSAR) modelling of PAMPA permeability. Computational Toxicology 2019. [DOI: 10.1016/j.comtox.2018.12.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]

32
Vighi M, Barsi A, Focks A, Grisoni F. Predictive models in ecotoxicology: Bridging the gap between scientific progress and regulatory applicability-Remarks and research needs. Integrated Environmental Assessment and Management 2019; 15:345-351. [PMID: 30821044 DOI: 10.1002/ieam.4136] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/22/2018] [Accepted: 02/18/2019] [Indexed: 06/09/2023]
Abstract
This paper concludes a special series of 7 articles (4 on toxicokinetic-toxicodynamic [TK-TD] models and 3 on quantitative structure-activity relationship [QSAR] models) published in previous issues of Integrated Environmental Assessment and Management (IEAM). The present paper summarizes the special series articles and highlights their contribution to the topic of increasing the regulatory applicability of effect models. For both TK-TD and QSAR approaches, we then describe the main research needs. The use of TK-TD models for describing sublethal effects must be better developed, particularly through the improvement of the dynamic energy budget (DEBtox) approach. The potential of TK-TD models for moving from lower (molecular) to higher (population) hierarchical levels is highlighted as a promising research line. Some relevant issues to improve the acceptance of QSAR models at the regulatory level are also described, such as increased transparency of the performance assessment and of the modeling algorithms, model documentation, relevance of the chosen target for regulatory needs, and improved mechanistic interpretability. Integr Environ Assess Manag 2019;00:000-000. © 2019 SETAC.
Affiliation(s)
- Marco Vighi
- IMDEA Water Institute, Alcalá de Henares (Madrid), Spain
- Alpar Barsi
- Dutch Board for the Authorisation of Plant Protection Products and Biocides (Ctgb), Ede, Netherlands
- Andreas Focks
- Wageningen University & Research, Wageningen, Netherlands
- Francesca Grisoni
- University of Milano-Bicocca, Department of Earth and Environmental Sciences, Milano, Italy

33
Oja M, Sild S, Maran U. Logistic Classification Models for pH–Permeability Profile: Predicting Permeability Classes for the Biopharmaceutical Classification System. J Chem Inf Model 2019; 59:2442-2455. [DOI: 10.1021/acs.jcim.8b00833] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/30/2022]
Affiliation(s)
- Mare Oja
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu 50411, Estonia
- Sulev Sild
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu 50411, Estonia
- Uko Maran
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu 50411, Estonia

34
Kazmi SR, Jun R, Yu MS, Jung C, Na D. In silico approaches and tools for the prediction of drug metabolism and fate: A review. Comput Biol Med 2019; 106:54-64. [PMID: 30682640 DOI: 10.1016/j.compbiomed.2019.01.008] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 09/19/2018] [Revised: 01/14/2019] [Accepted: 01/14/2019] [Indexed: 01/08/2023]
Abstract
The fate of administered drugs is largely influenced by their metabolism. For example, endogenous enzyme-catalyzed conversion of drugs may result in therapeutic inactivation or activation or may transform the drugs into toxic chemical compounds. This highlights the importance of drug metabolism in drug discovery and development, and accounts for the wide variety of experimental technologies that provide insights into the fate of drugs. In view of the high cost of traditional drug development, a number of computational approaches have been developed for predicting the metabolic fate of drug candidates, allowing for screening of large numbers of chemical compounds and then identifying a small number of promising candidates. In this review, we introduce in silico approaches and tools that have been developed to predict drug metabolism and fate, and assess their potential to facilitate the virtual discovery of promising drug candidates. We also provide a brief description of various recent models for predicting different aspects of enzyme-drug reactions and provide a list of recent in silico tools used for drug metabolism prediction.
Affiliation(s)
- Sayada Reemsha Kazmi
- School of Integrative Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Ren Jun
- School of Integrative Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Myeong-Sang Yu
- School of Integrative Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Chanjin Jung
- School of Integrative Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Dokyun Na
- School of Integrative Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea

35
Piir G, Kahn I, García-Sosa AT, Sild S, Ahte P, Maran U. Best Practices for QSAR Model Reporting: Physical and Chemical Properties, Ecotoxicity, Environmental Fate, Human Health, and Toxicokinetics Endpoints. Environmental Health Perspectives 2018; 126:126001. [PMID: 30561225 PMCID: PMC6371683 DOI: 10.1289/ehp3264] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 12/19/2017] [Revised: 10/19/2018] [Accepted: 11/07/2018] [Indexed: 05/31/2023]
Abstract
BACKGROUND Quantitative and qualitative structure–activity relationships (QSARs) have been used to understand chemical behavior for almost a century. The main source of QSAR models is the scientific literature, but the open question is how well these models are documented. OBJECTIVES The main aim of this study was to critically analyze the publication practices of QSARs with regard to transparency, potential reproducibility, and independent verification. The focus was on the level of technical completeness of the published QSARs. METHODS A total of 1,533 QSAR articles reporting 79 individual endpoints, mostly in environmental and health science, were reviewed. The QSAR parameters required for technical completeness were grouped into five categories: chemical structures, experimental endpoint values, descriptor values, mathematical representation of the model, and predicted endpoint values. The data were summarized and discussed using Circos plots. RESULTS Altogether, 42.5% of the reviewed articles were found to be potentially reproducible. The potential reproducibility for different endpoint groups varied; the respective rates were 39% for physical and chemical properties, 52% for ecotoxicity, 56% for environmental fate, 30% for human health, and 32% for toxicokinetics. The reproducibility of QSARs is discussed and placed in the context of the reproducibility of the experimental methods. Included are 65 references to open QSAR datasets as examples of models restored from scientific articles. DISCUSSION Strikingly poor documentation of QSARs was observed, which reduces the transparency, availability, and consequently, the application of research results in scientific, industrial, and regulatory areas. A list of the components needed to ensure the best practices for QSAR reporting is provided, allowing long-term use and preservation of the models. 
This list also allows an assessment of the reproducibility of models by interested parties such as journal editors, reviewers, regulators, evaluators, and potential users. https://doi.org/10.1289/EHP3264.
Affiliation(s)
- Geven Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Iiris Kahn
- Department of Chemistry and Biotechnology, Tallinn University of Technology, Tallinn, Estonia
- Sulev Sild
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Priit Ahte
- Department of Chemistry and Biotechnology, Tallinn University of Technology, Tallinn, Estonia
- Uko Maran
- Institute of Chemistry, University of Tartu, Tartu, Estonia

36
pH-permeability profiles for drug substances: Experimental detection, comparison with human intestinal absorption and modelling. Eur J Pharm Sci 2018; 123:429-440. [PMID: 30100533 DOI: 10.1016/j.ejps.2018.07.014] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 06/21/2017] [Revised: 06/19/2018] [Accepted: 07/04/2018] [Indexed: 01/05/2023]
Abstract
The influence of pH on human intestinal absorption is frequently not considered in early drug discovery studies in the modelling and subsequent prediction of intestinal absorption for drug candidates. To bridge this gap, in this study, experimental membrane permeability data were measured for current and former drug substances with a parallel artificial membrane permeability assay (PAMPA) at different pH values (3, 5, 7.4 and 9). The presented data are in good agreement with human intestinal absorption, showing a clear influence of pH on the efficiency of intestinal absorption. For the measured data, simple and general quantitative structure-activity relationships (QSARs) were developed for each pH that makes it possible to predict the pH profiles for passive membrane permeability (i.e., a pH-permeability profile), and these predictions coincide well with the experimental data. QSARs are also proposed for the data series of highest and intrinsic membrane permeability. The molecular descriptors in the models were analysed and mechanistically related to the interaction pattern of permeability in membranes. In addition to the regression models, classification models are also proposed. All models were successfully validated and blind tested with external data. The models are available in the QsarDB repository (http://dx.doi.org/10.15152/QDB.203).

37
Patel M, Chilton ML, Sartini A, Gibson L, Barber C, Covey-Crump L, Przybylak KR, Cronin MTD, Madden JC. Assessment and Reproducibility of Quantitative Structure–Activity Relationship Models by the Nonexpert. J Chem Inf Model 2018; 58:673-682. [DOI: 10.1021/acs.jcim.7b00523] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/20/2022]
Affiliation(s)
- Mukesh Patel
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, England
- Martyn L. Chilton
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, England
- Andrea Sartini
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, England
- Laura Gibson
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, England
- Chris Barber
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, England
- Liz Covey-Crump
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, England
- Katarzyna R. Przybylak
- School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, England
- Mark T. D. Cronin
- School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, England
- Judith C. Madden
- School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, England

38
Gramatica P, Papa E, Sangion A. QSAR modeling of cumulative environmental end-points for the prioritization of hazardous chemicals. Environmental Science: Processes & Impacts 2018; 20:38-47. [PMID: 29226926 DOI: 10.1039/c7em00519a] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 06/07/2023]
Abstract
The hazard of chemicals in the environment is inherently related to the molecular structure and derives simultaneously from various chemical properties/activities/reactivities. Models based on Quantitative Structure Activity Relationships (QSARs) are useful to screen, rank and prioritize chemicals that may have an adverse impact on humans and the environment. This paper reviews a selection of QSAR models (based on theoretical molecular descriptors) developed for cumulative multivariate endpoints, which were derived by mathematical combination of multiple effects and properties. The cumulative end-points provide an integrated holistic point of view to address environmentally relevant properties of chemicals.
Affiliation(s)
- Paola Gramatica
- QSAR Research Unit on Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences (DiSTA), University of Insubria, Varese, Italy

39
Nolte TM, Pinto-Gil K, Hendriks AJ, Ragas AMJ, Pastor M. Quantitative structure-activity relationships for primary aerobic biodegradation of organic chemicals in pristine surface waters: starting points for predicting biodegradation under acclimatization. Environmental Science: Processes & Impacts 2018; 20:157-170. [PMID: 29192704 DOI: 10.1039/c7em00375g] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 05/27/2023]
Abstract
Microbial biomass and acclimation can affect the removal of organic chemicals in natural surface waters. In order to account for these effects and develop more robust models for biodegradation, we have compiled and curated removal data for un-acclimated (pristine) surface waters on which we developed quantitative structure-activity relationships (QSARs). Global analysis of the very heterogeneous dataset including neutral, anionic, cationic and zwitterionic chemicals (N = 233) using a random forest algorithm showed that useful predictions were possible (Qext2 = 0.4-0.5) though relatively large standard errors were associated (SDEP ∼0.7). Classification of the chemicals based on speciation state and metabolic pathway showed that biodegradation is influenced by the two, and that the dependence of biodegradation on chemical characteristics is non-linear. Class-specific QSAR analysis indicated that shape and charge distribution determine the biodegradation of neutral chemicals (R2 ∼ 0.6), e.g. through membrane permeation or binding to P450 enzymes, whereas the average biodegradation of charged chemicals is 1 to 2 orders of magnitude lower, for which degradation depends more directly on cellular uptake (R2 ∼ 0.6). Further analysis showed that specific chemical classes such as peptides and organic halogens are relatively less biodegradable in pristine surface waters, resulting in the need for the microbial consortia to acclimate. Additional literature data was used to verify an acclimation model (based on Monod-type kinetics) capable of extrapolating QSAR predictions to acclimating conditions such as in water treatment, downstream lakes and large rivers under μg L-1 to mg L-1 concentrations. The framework developed, despite being based on multiple assumptions, is promising and needs further validation using experimentation with more standardised and homogenised conditions as well as adequate characterization of the inoculum used.
Affiliation(s)
- Tom M Nolte
- Department of Environmental Science, Institute for Water and Wetland Research, Radboud University Nijmegen, P. O. Box 9010, 6500 GL Nijmegen, The Netherlands

40
Gozalbes R, Vicente de Julián-Ortiz J. Applications of Chemoinformatics in Predictive Toxicology for Regulatory Purposes, Especially in the Context of the EU REACH Legislation. International Journal of Quantitative Structure-Property Relationships 2018. [DOI: 10.4018/ijqspr.2018010101] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/08/2022]
Abstract
Chemoinformatics methodologies such as QSAR/QSPR have been used for decades in drug discovery projects, especially for finding new compounds with therapeutic properties and for optimizing the ADME properties of chemical series. The application of computational techniques in predictive toxicology is much more recent, and these techniques are attracting increasing interest because of new legal requirements imposed by national and international regulations. In the pharmaceutical field, the US Food and Drug Administration (FDA) supports the use of predictive models for regulatory decision-making when assessing the genotoxic and carcinogenic potential of drug impurities. In Europe, the REACH legislation promotes the use of QSAR to reduce the large amount of animal testing needed to demonstrate the safety of new chemical entities subject to registration, provided the models meet specific conditions that ensure their quality and predictive power. In this review, the authors summarize the state of the art of in silico methods for regulatory purposes, with special emphasis on QSAR models.

41
Viira B, García-Sosa AT, Maran U. Chemical structure and correlation analysis of HIV-1 NNRT and NRT inhibitors and database-curated, published inhibition constants with chemical structure in diverse datasets. J Mol Graph Model 2017; 76:205-223. [PMID: 28738270 DOI: 10.1016/j.jmgm.2017.06.019] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/24/2017] [Revised: 06/18/2017] [Accepted: 06/19/2017] [Indexed: 01/26/2023]
Abstract
Human immunodeficiency virus (HIV-1) reverse transcriptase is a major target for designing anti-HIV drugs. Developed inhibitors are divided into non-nucleoside analog reverse-transcriptase inhibitors (NNRTIs) and nucleoside analog reverse-transcriptase inhibitors (NRTIs) depending on their mechanism. Given that many inhibitors have been studied and for many of them binding affinity constants have been calculated, it is beneficial to analyze the chemical landscape of these families of inhibitors and correlate these inhibition constants with molecular structure descriptors. For this, the HIV-1 RT data was retrieved from the ChEMBL database, carefully curated, and original literature verified, grouped into NRTIs and NNRTIs, analyzed using a hierarchical scaffold classification method and modelled with best multi-linear regression approach. Analysis of the HIV-1 NNRTIs subset results in ten different common structural parent types of oxazepanone, piperazinone, pyrazine, oxazinanone, diazinanone, pyridine, pyrrole, diazepanone, thiazole, and triazine. The same analysis for HIV-1 NRTIs groups structures into four different parent types of uracil, pyrimide, pyrimidione, and imidazole. Each scaffold tree corresponding to the parent types has been carefully analyzed and examined, and changes in chemical structure favorable to potency and stability are highlighted. For both subsets, descriptive and predictive QSAR models are derived, discussed and externally validated, revealing general trends in relationships between molecular structure and binding affinity constants in structurally diverse datasets. Data and QSAR models are available at the QsarDB repository (http://dx.doi.org/10.15152/QDB.202).
Affiliation(s)
- Birgit Viira
- Institute of Chemistry, University of Tartu, Tartu 50411, Estonia
- Uko Maran
- Institute of Chemistry, University of Tartu, Tartu 50411, Estonia

42
Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluskal T, Rojas-Chertó M, Spjuth O, Torrance G, Evelo CT, Guha R, Steinbeck C. The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminform 2017; 9:33. [PMID: 29086040 PMCID: PMC5461230 DOI: 10.1186/s13321-017-0220-4] [Citation(s) in RCA: 210] [Impact Index Per Article: 30.0] [Received: 10/25/2016] [Accepted: 05/16/2017] [Indexed: 12/15/2022]
Abstract
Background The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, however, the code base has grown significantly, resulting in many complex interdependencies among components and poor performance of many algorithms. Results We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvements to existing functionality that have led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. Conclusions This paper highlights our continued efforts to provide a community driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer reviewed publishing platform for scientific computing software. CDK 2.0 provides new features and improved performance. Electronic supplementary material The online version of this article (doi:10.1186/s13321-017-0220-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6200 MD, Maastricht, The Netherlands
- Jonathan Alvarsson
- Department of Pharmaceutical Biosciences, Uppsala University, 751 24, Uppsala, Sweden
- Arvid Berg
- Department of Pharmaceutical Biosciences, Uppsala University, 751 24, Uppsala, Sweden
- Lars Carlsson
- AstraZeneca, Innovative Medicines & Early Development, Quantitative Biology, Mölndal, Sweden
- Stefan Kuhn
- Department of Informatics, University of Leicester, Leicester, UK
- Tomáš Pluskal
- Whitehead Institute for Biomedical Research, 455 Main Street, Cambridge, MA, 02142, USA
- Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, 751 24, Uppsala, Sweden
- Chris T Evelo
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6200 MD, Maastricht, The Netherlands
- Rajarshi Guha
- National Center for Advancing Translational Sciences, 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Christoph Steinbeck
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University, Lessingstr. 8, 07743, Jena, Germany

43
Önlü S, Türker Saçan M. Impact of geometry optimization methods on QSAR modelling: A case study for predicting human serum albumin binding affinity. SAR and QSAR in Environmental Research 2017; 28:491-509. [PMID: 28705017 DOI: 10.1080/1062936x.2017.1343253] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/20/2017] [Accepted: 06/09/2017] [Indexed: 06/07/2023]
Abstract
Quantitative structure-activity relationship (QSAR) modelling is a major tool employed in the prediction of various endpoints. However, the current QSAR literature lacks a full understanding of the impact of quantum chemical calculation methods on the estimation of molecular descriptors and on model performance. Here, we provide a comprehensive analysis of the quantitative effects of different geometry optimization methods (semi-empirical, ab initio Hartree-Fock and density functional theory) on the molecular descriptors. Using experimental data on binding affinity to human serum albumin (HSA), we comparatively investigated the influence of employing descriptors derived from the three calculation methods on the QSAR models. We propose a 4-descriptor QSAR model in line with the OECD validation principles for the prediction of drug binding affinity to HSA (log KHSA) as a potential tool for drug development. We also confirm the prediction capability of the proposed model on a heterogeneous external set of chemicals. Furthermore, we recommend an activity-independent rational approach for selecting the geometry optimization method to improve QSAR model development.
Affiliation(s)
- S Önlü
- Boğaziçi University, Institute of Environmental Sciences, Hisar Campus, Istanbul, Turkey
- M Türker Saçan
- Boğaziçi University, Institute of Environmental Sciences, Hisar Campus, Istanbul, Turkey

44
Nolte TM, Ragas AMJ. A review of quantitative structure-property relationships for the fate of ionizable organic chemicals in water matrices and identification of knowledge gaps. Environmental Science: Processes & Impacts 2017; 19:221-246. [PMID: 28296985 DOI: 10.1039/c7em00034k] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 05/25/2023]
Abstract
Many organic chemicals are ionizable by nature. After use and release into the environment, various fate processes determine their concentrations, and hence exposure to aquatic organisms. In the absence of suitable data, such fate processes can be estimated using Quantitative Structure-Property Relationships (QSPRs). In this review we compiled available QSPRs from the open literature and assessed their applicability towards ionizable organic chemicals. Using quantitative and qualitative criteria we selected the 'best' QSPRs for sorption, (a)biotic degradation, and bioconcentration. The results indicate that many suitable QSPRs exist, but some critical knowledge gaps remain. Specifically, future focus should be directed towards the development of QSPR models for biodegradation in wastewater and sediment systems, direct photolysis and reaction with singlet oxygen, as well as additional reactive intermediates. Adequate QSPRs for bioconcentration in fish exist, but more accurate assessments can be achieved using pharmacologically based toxicokinetic (PBTK) models. No adequate QSPRs exist for bioconcentration in non-fish species. Due to the high variability of chemical and biological species as well as environmental conditions in QSPR datasets, accurate predictions for specific systems and inter-dataset conversions are problematic, for which standardization is needed. For all QSPR endpoints, additional data requirements involve supplementing the current chemical space covered and accurately characterizing the test systems used.
Affiliation(s)
- Tom M Nolte
- Department of Environmental Science, Institute for Water and Wetland Research, Radboud University Nijmegen, P.O. Box 9010, 6500 GL Nijmegen, The Netherlands
- Ad M J Ragas
- Department of Environmental Science, Institute for Water and Wetland Research, Radboud University Nijmegen, P.O. Box 9010, 6500 GL Nijmegen, The Netherlands

45
Tetko IV, Maran U, Tropsha A. Public (Q)SAR Services, Integrated Modeling Environments, and Model Repositories on the Web: State of the Art and Perspectives for Future Development. Mol Inform 2016; 36. [PMID: 27778468 DOI: 10.1002/minf.201600082] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 06/08/2016] [Accepted: 10/03/2016] [Indexed: 01/08/2023]
Abstract
Thousands of (Quantitative) Structure-Activity Relationships (Q)SAR models have been described in peer-reviewed publications; however, this way of sharing seldom makes models available for the use by the research community outside of the developer's laboratory. Conversely, on-line models allow broad dissemination and application representing the most effective way of sharing the scientific knowledge. Approaches for sharing and providing on-line access to models range from web services created by individual users and laboratories to integrated modeling environments and model repositories. This emerging transition from the descriptive and informative, but "static", and for the most part, non-executable print format to interactive, transparent and functional delivery of "living" models is expected to have a transformative effect on modern experimental research in areas of scientific and regulatory use of (Q)SAR models.
Affiliation(s)
- Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany; BigChem GmbH, Ingolstädter Landstraße 1, b. 60w, D-85764 Neuherberg, Germany
- Uko Maran
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu, 50411, Estonia
- Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA; Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya St. 18, 420008, Kazan, Russia

46
Oja M, Maran U. Quantitative structure-permeability relationships at various pH values for neutral and amphoteric drugs and drug-like compounds. SAR and QSAR in Environmental Research 2016; 27:813-832. [PMID: 27748631 DOI: 10.1080/1062936x.2016.1238408] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 06/16/2016] [Accepted: 09/15/2016] [Indexed: 06/06/2023]
Abstract
Human intestinal absorption is a key property for orally administered drugs and is dependent on pH. This study focuses on neutral and amphoteric compounds and their membrane permeabilities across the range of pH values found in the human intestine. The membrane permeability values for 15 neutral and 60 amphoteric compounds at pH 3, 5, 7.4 and 9 were measured using the parallel artificial membrane permeability assay (PAMPA). For each data series the quantitative structure-permeability relationships were developed and analysed. The results show that the membrane permeability of neutral compounds is attributed to a single structural characteristic, the hydrogen bond donor ability. Amphoteric compounds are more complex because of their chemical constitution, and therefore require three-parameter models to describe and predict membrane permeability. Analysis of the models for amphoteric compounds reveals that membrane permeability depends on multiple structural characteristics: the partition coefficient, hydrogen bond properties and the shape of the molecules. In addition to conventional validation strategies, two external compounds (isradipine and omeprazole) were tested and revealed very good agreement of pH profiles between experimental and predicted membrane permeability for all of the developed models. Selected QSAR models are available at the QsarDB repository (http://dx.doi.org/10.15152/QDB.184).
Affiliation(s)
- Mare Oja
- Institute of Chemistry, University of Tartu, Tartu, Estonia
- Uko Maran
- Institute of Chemistry, University of Tartu, Tartu, Estonia
47
Mangiatordi GF, Alberga D, Altomare CD, Carotti A, Catto M, Cellamare S, Gadaleta D, Lattanzi G, Leonetti F, Pisani L, Stefanachi A, Trisciuzzi D, Nicolotti O. Mind the Gap! A Journey towards Computational Toxicology. Mol Inform 2016; 35:294-308. [PMID: 27546034 DOI: 10.1002/minf.201501017] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 11/06/2015] [Accepted: 03/23/2016] [Indexed: 11/11/2022]
Abstract
Computational methods have advanced toxicology towards the development of target-specific models based on a clear cause-effect rationale. However, the predictive potential of these models presents strengths and weaknesses. On the good side, in silico models are valuable cheap alternatives to in vitro and in vivo experiments. On the other, the unconscious use of in silico methods can mislead end-users with elusive results. The focus of this review is on the basic scientific and regulatory recommendations in the derivation and application of computational models. Attention is paid to examine the interplay between computational toxicology and drug discovery and development. Avoiding the easy temptation of an overoptimistic future, we report our view on what can, or cannot, realistically be done. Indeed, studies of safety/toxicity represent a key element of chemical prioritization programs carried out by chemical industries, and primarily by pharmaceutical companies.
Affiliation(s)
- Giuseppe Felice Mangiatordi
- Dipartimento di Farmacia-Scienze del Farmaco, Università di Bari 'Aldo Moro', Via Orabona, 4, 70126, Bari, Italy
- Domenico Alberga
- Dipartimento Interateneo di Fisica 'M. Merlin', Università di Bari 'Aldo Moro', Via Orabona, 4, 70126, Bari, Italy
- Cosimo Damiano Altomare
- Dipartimento di Farmacia-Scienze del Farmaco, Università di Bari 'Aldo Moro', Via Orabona, 4, 70126, Bari, Italy
- Angelo Carotti
- Dipartimento di Farmacia-Scienze del Farmaco, Università di Bari 'Aldo Moro', Via Orabona, 4, 70126, Bari, Italy
- Marco Catto
- Dipartimento di Farmacia-Scienze del Farmaco, Università di Bari 'Aldo Moro', Via Orabona, 4, 70126, Bari, Italy
- Saverio Cellamare
- Dipartimento di Farmacia-Scienze del Farmaco, Università di Bari 'Aldo Moro', Via Orabona, 4, 70126, Bari, Italy
- Domenico Gadaleta
- Dipartimento di Farmacia-Scienze del Farmaco, Università di Bari 'Aldo Moro', Via Orabona, 4, 70126, Bari, Italy
- Gianluca Lattanzi
- Dipartimento Interateneo di Fisica 'M. Merlin', Università di Bari 'Aldo Moro', Via Orabona, 4, 70126, Bari, Italy
- Francesco Leonetti
- Dipartimento di Farmacia-Scienze del Farmaco, Università di Bari 'Aldo Moro', Via Orabona, 4, 70126, Bari, Italy
- Leonardo Pisani
- Dipartimento di Farmacia-Scienze del Farmaco, Università di Bari 'Aldo Moro', Via Orabona, 4, 70126, Bari, Italy
- Angela Stefanachi
- Dipartimento di Farmacia-Scienze del Farmaco, Università di Bari 'Aldo Moro', Via Orabona, 4, 70126, Bari, Italy
- Daniela Trisciuzzi
- Dipartimento di Farmacia-Scienze del Farmaco, Università di Bari 'Aldo Moro', Via Orabona, 4, 70126, Bari, Italy
- Orazio Nicolotti
- Dipartimento di Farmacia-Scienze del Farmaco, Università di Bari 'Aldo Moro', Via Orabona, 4, 70126, Bari, Italy
48
Oja M, Maran U. Quantitative structure-permeability relationships at various pH values for acidic and basic drugs and drug-like compounds. SAR QSAR Environ Res 2015; 26:701-719. [PMID: 26383235 DOI: 10.1080/1062936x.2015.1085896] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/19/2015] [Accepted: 08/19/2015] [Indexed: 06/05/2023]
Abstract
Absorption in gastrointestinal tract compartments varies and is largely influenced by pH. Therefore, considering pH in studies and analyses of membrane permeability provides an opportunity to gain a better understanding of the behaviour of compounds and to obtain good permeability estimates for prediction purposes. This study concentrates on relationships between the chemical structure and membrane permeability of acidic and basic drugs and drug-like compounds. The membrane permeability of 36 acidic and 61 basic compounds was measured using the parallel artificial membrane permeability assay (PAMPA) at pH 3, 5, 7.4 and 9. Descriptive and/or predictive single-parameter quantitative structure-permeability relationships were derived for all pH values. For acidic compounds, membrane permeability is mainly influenced by hydrogen bond donor properties, as revealed by models with r² > 0.8 for pH 3 and pH 5. For basic compounds, the best (r² > 0.7) structure-permeability relationships are obtained with the octanol-water distribution coefficient for pH 7.4 and pH 9, indicating the importance of partition properties. In addition to the validation set, the prediction quality of the developed models was tested with folic acid and astemizole, showing good matches between experimental and calculated membrane permeabilities at key pHs. Selected QSAR models are available at the QsarDB repository (http://dx.doi.org/10.15152/QDB.166).
Affiliation(s)
- M Oja
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu 50411, Estonia
- U Maran
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu 50411, Estonia