1
Backenköhler M, Groß J, Wolf V, Volkamer A. Guided Docking as a Data Generation Approach Facilitates Structure-Based Machine Learning on Kinases. J Chem Inf Model 2024; 64:4009-4020. [PMID: 38751014 DOI: 10.1021/acs.jcim.4c00055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2024]
Abstract
Drug discovery pipelines nowadays rely on machine learning models to explore and evaluate large chemical spaces. While including 3D structural information is considered beneficial, structural models are hindered by the availability of protein-ligand complex structures. Exemplified for kinase drug discovery, we address this issue by generating kinase-ligand complex data using template docking for the kinase compound subset of available ChEMBL assay data. To evaluate the benefit of the created complex data, we use it to train a structure-based E(3)-invariant graph neural network. Our evaluation shows that binding affinities can be predicted with significantly higher precision by models that take synthetic binding poses into account compared to ligand- or drug-target interaction models alone.
Affiliation(s)
- Michael Backenköhler
  - Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
- Joschka Groß
  - Modeling and Simulation, Saarland University, Saarbrücken 66123, Germany
- Verena Wolf
  - Modeling and Simulation, Saarland University, Saarbrücken 66123, Germany
- Andrea Volkamer
  - Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
  - Structural Bioinformatics and in Silico Toxicology, Institute of Physiology, Universitätsmedizin Berlin, Berlin 10117, Germany
2
Du Y. Binding Curve Viewer: Visualizing the Equilibrium and Kinetics of Protein-Ligand Binding and Competitive Binding. J Chem Inf Model 2024; 64:4180-4192. [PMID: 38720179 PMCID: PMC11134506 DOI: 10.1021/acs.jcim.4c00130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 04/21/2024] [Accepted: 04/25/2024] [Indexed: 05/28/2024]
Abstract
Understanding the thermodynamics and kinetics of the protein-ligand interaction is essential for biologists and pharmacologists. To visualize the equilibrium and kinetics of the binding reaction with 1:1 stoichiometry and no cooperativity, we obtained the exact relationship of the concentration of the protein-ligand complex and the time in the second-order binding process and numerically simulated the process of competitive binding. First, two common concerns in measuring protein-ligand interactions were focused on how to avoid the titration regime and how to establish the appropriate incubation time. Then, we gave examples of how the commonly used experimental conditions of [L]0 ≫ [P]0 and [I]0 ≫ [P]0 affected the estimation of the kinetic and thermodynamic properties. Theoretical inhibition curves were calculated, and the apparent IC50 and IC50 were estimated accordingly under predefined conditions. Using the estimated apparent IC50, we compared the apparent Ki and Ki calculated by using the Cheng-Prusoff equation, Lin-Riggs equation, and Wang's group equation. We also applied our tools to simulate high-throughput screening and compare the results of real experiments. The visualization tool for simulating the saturation experiment, kinetic experiments of binding and competitive binding, and inhibition curve, "Binding Curve Viewer," is available at www.eplatton.net/binding-curve-viewer.
Affiliation(s)
- Yu Du
  - Department of Clinical Laboratory, The Second Affiliated Hospital of Jiaxing University, Huancheng North Road 1518, Jiaxing, Zhejiang 314000, China
  - The Key Laboratory, The Second Affiliated Hospital of Jiaxing University, Huancheng North Road 1518, Jiaxing, Zhejiang 314000, China
3
Wossnig L, Furtmann N, Buchanan A, Kumar S, Greiff V. Best practices for machine learning in antibody discovery and development. Drug Discov Today 2024; 29:104025. [PMID: 38762089 DOI: 10.1016/j.drudis.2024.104025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 04/25/2024] [Accepted: 05/13/2024] [Indexed: 05/20/2024]
Abstract
In the past 40 years, therapeutic antibody discovery and development have advanced considerably, with machine learning (ML) offering a promising way to speed up the process by reducing costs and the number of experiments required. Recent progress in ML-guided antibody design and development (D&D) has been hindered by the diversity of data sets and evaluation methods, which makes it difficult to conduct comparisons and assess utility. Establishing standards and guidelines will be crucial for the wider adoption of ML and the advancement of the field. This perspective critically reviews current practices, highlights common pitfalls and proposes method development and evaluation guidelines for various ML-based techniques in therapeutic antibody D&D. Addressing challenges across the ML process, best practices are recommended for each stage to enhance reproducibility and progress.
Affiliation(s)
- Leonard Wossnig
  - LabGenius Ltd, The Biscuit Factory, 100 Drummond Road, London SE16 4DG, UK; Department of Computer Science, University College London, 66-72 Gower St, London WC1E 6EA, UK
- Norbert Furtmann
  - R&D Large Molecules Research Platform, Sanofi Deutschland GmbH, Industriepark Höchst, Frankfurt Am Main, Germany
- Andrew Buchanan
  - Biologics Engineering, R&D, AstraZeneca, Cambridge CB2 0AA, UK
- Sandeep Kumar
  - Computational Protein Design and Modeling Group, Computational Science, Moderna Therapeutics, 200 Technology Square, Cambridge, MA 02139, USA
- Victor Greiff
  - Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, Norway
4
Schmitz B, Frieg B, Homeyer N, Jessen G, Gohlke H. Extracting binding energies and binding modes from biomolecular simulations of fragment binding to endothiapepsin. Arch Pharm (Weinheim) 2024; 357:e2300612. [PMID: 38319801 DOI: 10.1002/ardp.202300612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 12/18/2023] [Accepted: 01/10/2024] [Indexed: 02/08/2024]
Abstract
Fragment-based drug discovery (FBDD) aims to discover a set of small binding fragments that may be subsequently linked together. Therefore, in-depth knowledge of the individual fragments' structural and energetic binding properties is essential. In addition to experimental techniques, the direct simulation of fragment binding by molecular dynamics (MD) simulations became popular to characterize fragment binding. However, former studies showed that long simulation times and high computational demands per fragment are needed, which limits applicability in FBDD. Here, we performed short, unbiased MD simulations of direct fragment binding to endothiapepsin, a well-characterized model system of pepsin-like aspartic proteases. To evaluate the strengths and limitations of short MD simulations for the structural and energetic characterization of fragment binding, we predicted the fragments' absolute free energies and binding poses based on the direct simulations of fragment binding and compared the predictions to experimental data. The predicted absolute free energies are in fair agreement with the experiment. Combining the MD data with binding mode predictions from molecular docking approaches helped to correctly identify the most promising fragments for further chemical optimization. Importantly, all computations and predictions were done within 5 days, suggesting that MD simulations may become a viable tool in FBDD projects.
Affiliation(s)
- Birte Schmitz
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Benedikt Frieg
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), and Institute of Biological Information Processing (IBI-7: Structural Biochemistry), Forschungszentrum Jülich, Jülich, Germany
- Nadine Homeyer
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Gisela Jessen
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Holger Gohlke
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), and Institute of Biological Information Processing (IBI-7: Structural Biochemistry), Forschungszentrum Jülich, Jülich, Germany
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich, Jülich, Germany
5
Landrum GA, Riniker S. Combining IC50 or Ki Values from Different Sources Is a Source of Significant Noise. J Chem Inf Model 2024; 64:1560-1567. [PMID: 38394344 PMCID: PMC10934815 DOI: 10.1021/acs.jcim.4c00049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 02/08/2024] [Accepted: 02/13/2024] [Indexed: 02/25/2024]
Abstract
As part of the ongoing quest to find or construct large data sets for use in validating new machine learning (ML) approaches for bioactivity prediction, it has become distressingly common for researchers to combine literature IC50 data generated using different assays into a single data set. It is well-known that there are many situations where this is a scientifically risky thing to do, even when the assays are against exactly the same target, but the risks of assays being incompatible are even higher when pulling data from large collections of literature data like ChEMBL. Here, we estimate the amount of noise present in combined data sets using cases where measurements for the same compound are reported in multiple assays against the same target. This approach shows that IC50 assays selected using minimal curation settings have poor agreement with each other: almost 65% of the points differ by more than 0.3 log units, 27% differ by more than one log unit, and the correlation between the assays, as measured by Kendall's τ, is only 0.51. Requiring that most of the assay metadata in ChEMBL matches ("maximal curation") in order to combine two assays improves the situation (48% of the points differ by more than 0.3 log units, 13% by more than one log unit, and Kendall's τ is 0.71) at the expense of having smaller data sets. Surprisingly, our analysis shows similar amounts of noise when combining data from different literature Ki assays. We suggest that good scientific practice requires careful curation when combining data sets from different assays and hope that our maximal curation strategy will help to improve the quality of the data that are being used to build and validate ML models for bioactivity prediction. To help achieve this, the code and ChEMBL queries that we used for the maximal curation approach are available as open-source software in our GitHub repository, https://github.com/rinikerlab/overlapping_assays.
Affiliation(s)
- Gregory A. Landrum
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Sereina Riniker
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
6
Pecina A, Fanfrlík J, Lepšík M, Řezáč J. SQM2.20: Semiempirical quantum-mechanical scoring function yields DFT-quality protein-ligand binding affinity predictions in minutes. Nat Commun 2024; 15:1127. [PMID: 38321025 PMCID: PMC10847445 DOI: 10.1038/s41467-024-45431-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 01/24/2024] [Indexed: 02/08/2024] Open
Abstract
Accurate estimation of protein-ligand binding affinity is the cornerstone of computer-aided drug design. We present a universal physics-based scoring function, named SQM2.20, addressing key terms of binding free energy using semiempirical quantum-mechanical computational methods. SQM2.20 incorporates the latest methodological advances while remaining computationally efficient even for systems with thousands of atoms. To validate it rigorously, we have compiled and made available the PL-REX benchmark dataset consisting of high-resolution crystal structures and reliable experimental affinities for ten diverse protein targets. Comparative assessments demonstrate that SQM2.20 outperforms other scoring methods and reaches a level of accuracy similar to much more expensive DFT calculations. In the PL-REX dataset, it achieves excellent correlation with experimental data (average R2 = 0.69) and exhibits consistent performance across all targets. In contrast to DFT, SQM2.20 provides affinity predictions in minutes, making it suitable for practical applications in hit identification or lead optimization.
Affiliation(s)
- Adam Pecina
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Jindřich Fanfrlík
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Martin Lepšík
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Jan Řezáč
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
7
Stefan SM, Rafehi M. Medicinal polypharmacology: Exploration and exploitation of the polypharmacolome in modern drug development. Drug Dev Res 2024; 85:e22125. [PMID: 37920929 DOI: 10.1002/ddr.22125] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/24/2023] [Revised: 09/23/2023] [Accepted: 10/12/2023] [Indexed: 11/04/2023]
Abstract
At the core of complex and multifactorial human diseases, such as cancer, metabolic syndrome, or neurodegeneration, are multiple players that cross-talk in robust biological networks which are intrinsically resilient to alterations. These multifactorial diseases are characterized by sophisticated feedback mechanisms which manifest cellular imbalance and resistance to drug therapy. By adhering to the specificity paradigm ("one target-one drug concept"), research focused for many years on drugs with very narrow mechanisms of action. This narrow focus promoted therapy ineffectiveness and resistance. However, modern drug discovery has evolved over the last years, increasingly emphasizing integral strategies for the development of clinically effective drugs. These integral strategies include the controlled engagement of multiple targets to overcome therapy resistance. Apart from the additive or even synergistic effects in therapy, multitarget drugs harbor molecular-structural attributes to explore orphan targets of which intrinsic substrates/physiological role(s) and/or modulators are unknown for future therapy purposes. We designated this multidisciplinary and translational research field between medicinal chemistry, chemical biology, and molecular pharmacology as 'medicinal polypharmacology'. Medicinal polypharmacology emerged as an alternative approach to common single-targeted pharmacology, stretching from basic drug and target identification processes to clinical evaluation of multitarget drugs, and the exploration and exploitation of the 'polypharmacolome' is at the forefront of modern drug development research.
Affiliation(s)
- Sven Marcel Stefan
  - Drug Development and Chemical Biology, Lübeck Institute of Experimental Dermatology (LIED), University of Lübeck and University Medical Center Schleswig-Holstein, Lübeck, Germany
  - Translational Neurodegeneration Research and Neuropathology Lab, Department of Pathology, Section of Neuropathology and Oslo University Hospital, University of Oslo, Oslo, Norway
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, New South Wales, Australia
- Muhammad Rafehi
  - Department of Medical Education, Augsburg University Medicine, Augsburg, Germany
  - Institute of Clinical Pharmacology, University Medical Center Göttingen, Göttingen, Germany
8
Schindler CEM, Kuhn D, Hartung IV. The experiment is the limit. Nat Rev Chem 2023; 7:752-753. [PMID: 37880428 DOI: 10.1038/s41570-023-00552-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2023]
Affiliation(s)
- Daniel Kuhn
  - Discovery & Development Technologies, Merck KGaA, Darmstadt, Germany
- Ingo V Hartung
  - Discovery & Development Technologies, Merck KGaA, Darmstadt, Germany
9
Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023; 22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Accepted: 07/20/2023] [Indexed: 09/13/2023]
Abstract
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Affiliation(s)
- Katherine R Duncan
  - Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
- Somayah S Elsayed
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Neha Garg
  - School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
- Justin J J van der Hooft
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
- Nathaniel I Martin
  - Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
- David Meijer
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Barbara R Terlouw
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Friederike Biermann
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
  - LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Kai Blin
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Marina Gorostiola González
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
  - ONCODE institute, Leiden, The Netherlands
- Eric J N Helfrich
  - Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
  - LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Florian Huber
  - Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
- Stefan Leopold-Messer
  - Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
- Kohulan Rajan
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
- Tristan de Rond
  - School of Chemical Sciences, University of Auckland, Auckland, New Zealand
- Jeffrey A van Santen
  - Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Maria Sorokina
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany
  - Pharmaceuticals R&D, Bayer AG, Berlin, Germany
- Marcy J Balunas
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Mehdi A Beniddir
  - Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
- Doris A van Bergeijk
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Laura M Carroll
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Chase M Clark
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Chao Du
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Francesca Grisoni
  - Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Willem Jespers
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Olga V Kalinina
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
  - Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Hyunwoo Kim
  - College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
- Tiago F Leao
  - Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
- Joleen Masschelein
  - Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium
  - Department of Biology, KU Leuven, Heverlee, Belgium
- Evan R Rees
  - Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Raphael Reher
  - Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany
  - Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
- Daniel Reker
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
  - Duke Microbiome Center, Duke University, Durham, NC, USA
- Philippe Schwaller
  - Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Michael A Skinnider
  - Adapsyn Bioscience, Hamilton, Ontario, Canada
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Allison S Walker
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Egon L Willighagen
  - Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Barbara Zdrazil
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
- Nadine Ziemert
  - Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
- Pierre Guyomard
  - Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
- Andrea Volkamer
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
  - In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- William H Gerwick
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Hyun Uk Kim
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
- Rolf Müller
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
  - Department of Pharmacy, Saarland University, Saarbrücken, Germany
  - German Center for infection research (DZIF), Braunschweig, Germany
  - Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
- Gilles P van Wezel
  - Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
  - Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
- Gerard J P van Westen
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Anna K H Hirsch
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
  - Department of Pharmacy, Saarland University, Saarbrücken, Germany
  - German Center for infection research (DZIF), Braunschweig, Germany
  - Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
- Roger G Linington
  - Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Serina L Robinson
  - Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland
- Marnix H Medema
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Institute of Biology, Leiden University, Leiden, The Netherlands
10
Ross GA, Lu C, Scarabelli G, Albanese SK, Houang E, Abel R, Harder ED, Wang L. The maximal and current accuracy of rigorous protein-ligand binding free energy calculations. Commun Chem 2023; 6:222. [PMID: 37838760 PMCID: PMC10576784 DOI: 10.1038/s42004-023-01019-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Accepted: 10/02/2023] [Indexed: 10/16/2023] Open
Abstract
Computational techniques can speed up the identification of hits and accelerate the development of candidate molecules for drug discovery. Among techniques for predicting relative binding affinities, the most consistently accurate is free energy perturbation (FEP), a class of rigorous physics-based methods. However, uncertainty remains about how accurate FEP is and can ever be. Here, we present what we believe to be the largest publicly available dataset of proteins and congeneric series of small molecules, and assess the accuracy of the leading FEP workflow. To ascertain the limit of achievable accuracy, we also survey the reproducibility of experimental relative affinity measurements. We find a wide variability in experimental accuracy and a correspondence between binding and functional assays. When careful preparation of protein and ligand structures is undertaken, FEP can achieve accuracy comparable to experimental reproducibility. Throughout, we highlight reliable protocols that can help maximize the accuracy of FEP in prospective studies.
Affiliation(s)
- Gregory A Ross
  - Schrödinger Inc, New York, NY, USA
  - Isomorphic Labs, London, UK
- Chao Lu
  - Schrödinger Inc, New York, NY, USA
11
Santillo MF, Sprando RL. Predicting binding between 55 cannabinoids and 4,799 biological targets by in silico methods. J Appl Toxicol 2023; 43:1476-1487. [PMID: 37101313 DOI: 10.1002/jat.4478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 04/11/2023] [Accepted: 04/22/2023] [Indexed: 04/28/2023]
Abstract
Recently, there has been an increase in cannabis-derived products being marketed as foods, dietary supplements, and other consumer products. Cannabis contains over a hundred cannabinoids, many of which have unknown physiological effects. Since there are large numbers of cannabinoids, and many are not commercially available for in vitro testing, an in silico tool (Chemotargets Clarity software) was used to predict binding between 55 cannabinoids and 4,799 biological targets (enzymes, ion channels, receptors, and transporters). This tool relied on quantitative structure activity relationships (QSAR), structural similarity, and other approaches to predict binding. From this screening, 827 cannabinoid-target binding pairs were predicted, which included 143 unique targets. Many cannabinoids sharing core structures (cannabinoid "types") had similar binding profiles, whereas most cannabinoids containing carboxylic acid groups were similar without regards to their core structure. For some of the binding predictions (43), in vitro binding data were available, and they agreed well with in silico binding data (median fourfold difference in binding concentrations). Finally, clinical adverse effects associated with 22 predicted targets were identified from an online database (Clarivate Off-X), providing important insights on potential human health hazards. Overall, in silico biological target predictions are a rapid means to identify potential hazards due to cannabinoid-target interactions, and the data can be used to prioritize subsequent in vitro and in vivo testing.
Affiliation(s)
- Michael F Santillo
  - Division of Toxicology, Office of Applied Research and Safety Assessment, Center for Food Safety and Applied Nutrition, US Food and Drug Administration, Laurel, Maryland, USA
- Robert L Sprando
  - Division of Toxicology, Office of Applied Research and Safety Assessment, Center for Food Safety and Applied Nutrition, US Food and Drug Administration, Laurel, Maryland, USA
|
12
|
Chen L, Fan Z, Chang J, Yang R, Hou H, Guo H, Zhang Y, Yang T, Zhou C, Sui Q, Chen Z, Zheng C, Hao X, Zhang K, Cui R, Zhang Z, Ma H, Ding Y, Zhang N, Lu X, Luo X, Jiang H, Zhang S, Zheng M. Sequence-based drug design as a concept in computational drug design. Nat Commun 2023; 14:4217. [PMID: 37452028 PMCID: PMC10349078 DOI: 10.1038/s41467-023-39856-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/18/2022] [Accepted: 06/27/2023] [Indexed: 07/18/2023] Open
Abstract
Drug development based on target proteins has been a successful approach in recent decades. However, the conventional structure-based drug design (SBDD) pipeline is a complex, human-engineered process with multiple independently optimized steps. Here, we propose a sequence-to-drug concept for computational drug design based on protein sequence information by end-to-end differentiable learning. We validate this concept in three stages. First, we design TransformerCPI2.0 as a core tool for the concept, which demonstrates generalization ability across proteins and compounds. Second, we interpret the binding knowledge that TransformerCPI2.0 learned. Finally, we use TransformerCPI2.0 to discover new hits for challenging drug targets, and identify a new target for an existing drug based on an inverse application of the concept. Overall, this proof-of-concept study shows that the sequence-to-drug concept adds a perspective on drug design. It can serve as an alternative method to SBDD, particularly for proteins that do not yet have high-quality 3D structures available.
Affiliation(s)
- Lifan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Zisheng Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China
| | - Jie Chang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Ruirui Yang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China
| | - Hui Hou
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Hao Guo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Yinghui Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Tianbiao Yang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Chenmao Zhou
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Qibang Sui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Zhengyang Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Chen Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xinyue Hao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Keke Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
| | - Rongrong Cui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Zehong Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Hudson Ma
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Yiluan Ding
- Department of Analytical Chemistry, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Naixia Zhang
- Department of Analytical Chemistry, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xiaojie Lu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 1 Sub-lane Xiangshan, Hangzhou, 310024, China
| | - Sulin Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Jiangsu, Nanjing, 210023, China
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, No. 393 Huaxia Middle Road, Shanghai, 200031, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 1 Sub-lane Xiangshan, Hangzhou, 310024, China
|
13
|
Losev TV, Gerasimov IS, Panova MV, Lisov AA, Abdyusheva YR, Rusina PV, Zaletskaya E, Stroganov OV, Medvedev MG, Novikov FN. Quantum Mechanical-Cluster Approach to Solve the Bioisosteric Replacement Problem in Drug Design. J Chem Inf Model 2023; 63:1239-1248. [PMID: 36763797 DOI: 10.1021/acs.jcim.2c01212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
Abstract
Bioisosteres are molecules that differ in substituents but still have very similar shapes. Bioisosteric replacements are ubiquitous in modern drug design, where they are used to alter metabolism, change bioavailability, or modify activity of the lead compound. Prediction of relative affinities of bioisosteres with computational methods is a long-standing task; however, the very shape closeness makes bioisosteric substitutions almost intractable for computational methods that use standard force fields. Here, we design a quantum mechanical (QM)-cluster approach based on the GFN2-xTB semi-empirical quantum-chemical method and apply it to a set of H → F bioisosteric replacements. The proposed methodology enables advanced prediction of biological activity change upon bioisosteric substitution of -H with -F, with a standard deviation of 0.60 kcal/mol, surpassing the ChemPLP scoring function (0.83 kcal/mol), and making QM-based ΔΔG estimation comparable to the ∼0.42 kcal/mol standard deviation of in vitro experiments. The speed of the method and lack of tunable parameters make it affordable in current drug research.
Affiliation(s)
- Timofey V Losev
- N.D. Zelinsky Institute of Organic Chemistry of Russian Academy of Sciences, Leninsky prospect 47, 119991 Moscow, Russian Federation
- Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1/3, 119991 Moscow, Russian Federation
- A.N. Nesmeyanov Institute of Organoelement Compounds of Russian Academy of Sciences, Vavilov Str. 28, 119991 Moscow, Russian Federation
| | - Igor S Gerasimov
- N.D. Zelinsky Institute of Organic Chemistry of Russian Academy of Sciences, Leninsky prospect 47, 119991 Moscow, Russian Federation
- Department of Chemistry, Kyungpook National University, Daegu 41566, South Korea
| | - Maria V Panova
- N.D. Zelinsky Institute of Organic Chemistry of Russian Academy of Sciences, Leninsky prospect 47, 119991 Moscow, Russian Federation
| | - Alexey A Lisov
- N.D. Zelinsky Institute of Organic Chemistry of Russian Academy of Sciences, Leninsky prospect 47, 119991 Moscow, Russian Federation
| | - Yana R Abdyusheva
- N.D. Zelinsky Institute of Organic Chemistry of Russian Academy of Sciences, Leninsky prospect 47, 119991 Moscow, Russian Federation
- National Research University Higher School of Economics, Myasnitskaya Street 20, 101000 Moscow, Russian Federation
| | - Polina V Rusina
- N.D. Zelinsky Institute of Organic Chemistry of Russian Academy of Sciences, Leninsky prospect 47, 119991 Moscow, Russian Federation
| | - Eugenia Zaletskaya
- N.D. Zelinsky Institute of Organic Chemistry of Russian Academy of Sciences, Leninsky prospect 47, 119991 Moscow, Russian Federation
- National Research University Higher School of Economics, Myasnitskaya Street 20, 101000 Moscow, Russian Federation
| | - Oleg V Stroganov
- BioMolTech Corp., 226 York Mills Rd, Toronto, Ontario M2L 1L1, Canada
| | - Michael G Medvedev
- N.D. Zelinsky Institute of Organic Chemistry of Russian Academy of Sciences, Leninsky prospect 47, 119991 Moscow, Russian Federation
| | - Fedor N Novikov
- N.D. Zelinsky Institute of Organic Chemistry of Russian Academy of Sciences, Leninsky prospect 47, 119991 Moscow, Russian Federation
- National Research University Higher School of Economics, Myasnitskaya Street 20, 101000 Moscow, Russian Federation
|
14
|
Breznik M, Ge Y, Bluck JP, Briem H, Hahn DF, Christ CD, Mortier J, Mobley DL, Meier K. Prioritizing Small Sets of Molecules for Synthesis through in-silico Tools: A Comparison of Common Ranking Methods. ChemMedChem 2023; 18:e202200425. [PMID: 36240514 PMCID: PMC9868080 DOI: 10.1002/cmdc.202200425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Revised: 10/10/2022] [Indexed: 01/26/2023]
Abstract
Prioritizing molecules for synthesis is a key role of computational methods within medicinal chemistry. Multiple tools exist for ranking molecules, from the cheap and popular molecular docking methods to more computationally expensive molecular-dynamics (MD)-based methods. It is often questioned whether the accuracy of the more rigorous methods justifies the higher computational cost and associated calculation time. Here, we compared the performance on ranking the binding of small molecules for seven scoring functions from five docking programs, one end-point method (MM/GBSA), and two MD-based free energy methods (PMX, FEP+). We investigated 16 pharmaceutically relevant targets with a total of 423 known binders. The performance of docking methods for ligand ranking was strongly system dependent. We observed that MD-based methods predominantly outperformed docking algorithms and MM/GBSA calculations. Based on our results, we recommend the application of MD-based free energy methods for prioritization of molecules for synthesis in lead optimization, whenever feasible.
Affiliation(s)
- Marko Breznik
- Computational Molecular Design, Pharmaceuticals, R&D, Bayer AG, 13342 Berlin, Germany
| | - Yunhui Ge
- Department of Pharmaceutical Sciences, University of California, Irvine, CA 92697, USA
| | - Joseph P. Bluck
- Computational Molecular Design, Pharmaceuticals, R&D, Bayer AG, 13342 Berlin, Germany
| | - Hans Briem
- Computational Molecular Design, Pharmaceuticals, R&D, Bayer AG, 13342 Berlin, Germany
| | - David F. Hahn
- Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse B-2340, Belgium
| | - Clara D. Christ
- Molecular Design, Pharmaceuticals, R&D, Bayer AG, 13342 Berlin, Germany
| | - Jérémie Mortier
- Computational Molecular Design, Pharmaceuticals, R&D, Bayer AG, 13342 Berlin, Germany
| | - David L. Mobley
- Department of Pharmaceutical Sciences, University of California, Irvine, CA 92697, USA
- Department of Chemistry, University of California, Irvine, CA 92697, USA
| | - Katharina Meier
- Computational Life Science Technology Functions, Crop Science, R&D, Bayer AG, 40789 Monheim, Germany
|
15
|
Rodríguez-Pérez R, Trunzer M, Schneider N, Faller B, Gerebtzoff G. Multispecies Machine Learning Predictions of In Vitro Intrinsic Clearance with Uncertainty Quantification Analyses. Mol Pharm 2023; 20:383-394. [PMID: 36437712 DOI: 10.1021/acs.molpharmaceut.2c00680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
In pharmaceutical research, compounds are optimized for metabolic stability to avoid overly fast elimination of the drug. Intrinsic clearance (CLint) measured in liver microsomes or hepatocytes is an important parameter during lead optimization. In this work, machine learning models were developed to relate the compound structure to microsomal metabolic stability and predict CLint for new compounds. A multitask (MT) learning architecture was introduced to model the CLint of six species simultaneously, giving as a result a multispecies machine learning model. MT graph neural network (MT-GNN) regression was identified as the top-performing method, and an ensemble of 10 MT-GNN models was evaluated prospectively. Geometric mean fold errors were consistently smaller than 2-fold. Moreover, high precision values were obtained in the prediction of "high" (>300 μL/min/mg) and "low" (<100 μL/min/mg) CLint compounds. Precision values ranged from 80 to 94% for low CLint predictions and from 75 to 97% for high CLint predictions, depending on the species. Uncertainty on experimental values and model predictions was systematically quantified. Experimental variability (aleatoric uncertainty) of all historical Novartis in vitro clearance experiments was analyzed. Interestingly, MT-GNN models' performance approached assays' experimental variability. Moreover, uncertainty estimation in predictions (epistemic uncertainty) enabled identifying predictions associated with lower and higher error. Taken together, our manuscript combines a multispecies deep learning model and large-scale uncertainty analyses to improve CLint predictions and facilitate early informed decisions for compound prioritization.
Affiliation(s)
| | - Markus Trunzer
- Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland
| | - Nadine Schneider
- Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland
| | - Bernard Faller
- Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland
| | - Grégori Gerebtzoff
- Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland
|
16
|
Mlčochová H, Ratih R, Michalcová L, Wätzig H, Glatz Z, Stein M. Comparison of mobility shift affinity capillary electrophoresis and capillary electrophoresis frontal analysis for binding constant determination between human serum albumin and small drugs. Electrophoresis 2022; 43:1724-1734. [DOI: 10.1002/elps.202100320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2021] [Revised: 05/30/2022] [Accepted: 06/06/2022] [Indexed: 11/08/2022]
Affiliation(s)
- Hana Mlčochová
- Institute of Medicinal and Pharmaceutical Chemistry, TU Braunschweig, Braunschweig, Lower Saxony, Germany
- Department of Biochemistry, Faculty of Science, Masaryk University, Brno, Czech Republic
| | - Ratih Ratih
- Institute of Medicinal and Pharmaceutical Chemistry, TU Braunschweig, Braunschweig, Lower Saxony, Germany
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Surabaya, Surabaya, East Java, Indonesia
| | - Lenka Michalcová
- Institute of Medicinal and Pharmaceutical Chemistry, TU Braunschweig, Braunschweig, Lower Saxony, Germany
- Department of Biochemistry, Faculty of Science, Masaryk University, Brno, Czech Republic
| | - Hermann Wätzig
- Institute of Medicinal and Pharmaceutical Chemistry, TU Braunschweig, Braunschweig, Lower Saxony, Germany
| | - Zdeněk Glatz
- Department of Biochemistry, Faculty of Science, Masaryk University, Brno, Czech Republic
| | - Matthias Stein
- Institute of Medicinal and Pharmaceutical Chemistry, TU Braunschweig, Braunschweig, Lower Saxony, Germany
|
17
|
Kwapien K, Nittinger E, He J, Margreitter C, Voronov A, Tyrchan C. Implications of Additivity and Nonadditivity for Machine Learning and Deep Learning Models in Drug Design. ACS Omega 2022; 7:26573-26581. [PMID: 35936431 PMCID: PMC9352238 DOI: 10.1021/acsomega.2c02738] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/03/2022] [Accepted: 07/08/2022] [Indexed: 05/20/2023]
Abstract
Matched molecular pairs (MMPs) are nowadays a commonly applied concept in drug design. They are used in many computational tools for structure-activity relationship analysis, biological activity prediction, or optimization of physicochemical properties. However, until now it has not been shown in a rigorous way that MMPs, that is, changing only one substituent between two molecules, can be predicted with higher accuracy and precision in contrast to any other chemical compound pair. It is expected that any model should be able to predict such a defined change with high accuracy and reasonable precision. In this study, we examine the predictability of four classical properties relevant for drug design ranging from simple physicochemical parameters (log D and solubility) to more complex cell-based ones (permeability and clearance), using different data sets and machine learning algorithms. Our study confirms that additive data are the easiest to predict, which highlights the importance of recognition of nonadditivity events and the challenging complexity of predicting properties in case of scaffold hopping. Despite deep learning being well suited to model nonlinear events, these methods do not seem to be an exception to this observation. Though they are in general performing better than classical machine learning methods, this leaves the field with a still standing challenge.
Affiliation(s)
- Karolina Kwapien
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 83, Sweden
| | - Eva Nittinger
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 83, Sweden
| | - Jiazhen He
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 431 83, Sweden
| | | | - Alexey Voronov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 431 83, Sweden
| | - Christian Tyrchan
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 83, Sweden
|
18
|
Ring replacement recommender: Ring modifications for improving biological activity. Eur J Med Chem 2022; 238:114483. [DOI: 10.1016/j.ejmech.2022.114483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Revised: 05/16/2022] [Accepted: 05/17/2022] [Indexed: 11/19/2022]
|
19
|
Yu J, Wang D, Zheng M. Uncertainty quantification: Can we trust artificial intelligence in drug discovery? iScience 2022; 25:104814. [PMID: 35996575 PMCID: PMC9391523 DOI: 10.1016/j.isci.2022.104814] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/25/2022] Open
Abstract
The problem of human trust is one of the most fundamental problems in applied artificial intelligence in drug discovery. In silico models have been widely used to accelerate the process of drug discovery in recent years. However, most of these models can only give reliable predictions within a limited chemical space that the training set covers (applicability domain). Predictions of samples falling outside the applicability domain are unreliable and sometimes dangerous for the drug-design decision-making process. Uncertainty quantification accordingly has drawn great attention to enable autonomous drug designing. By quantifying the confidence level of model predictions, the reliability of the predictions can be quantitatively represented to assist researchers in their molecular reasoning and experimental design. Here we summarize the state-of-the-art approaches to uncertainty quantification and underline how they can be used for drug design and discovery projects. Furthermore, we also outline four representative application scenarios of uncertainty quantification in drug discovery.
|
20
|
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Frontiers in Bioinformatics 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
| | - Garrett M. Morris
- Department of Statistics, University of Oxford, Oxford, United Kingdom
| | - Philip C. Biggin
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
|
21
|
He K. Pharmacological affinity fingerprints derived from bioactivity data for the identification of designer drugs. J Cheminform 2022; 14:35. [PMID: 35672835 PMCID: PMC9171973 DOI: 10.1186/s13321-022-00607-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/02/2022] [Accepted: 05/05/2022] [Indexed: 12/15/2022] Open
Abstract
Facing the continuous emergence of new psychoactive substances (NPS) and their threat to public health, more effective methods for NPS prediction and identification are critical. In this study, the pharmacological affinity fingerprints (Ph-fp) of NPS compounds were predicted by Random Forest classification models using bioactivity data from the ChEMBL database. The binary Ph-fp is the vector consisting of a compound’s activity against a list of molecular targets reported to be responsible for the pharmacological effects of NPS. Their performance in similarity searching and unsupervised clustering was assessed and compared to the 2D structure fingerprints Morgan and MACCS (1024-bit ECFP4 and 166-bit SMARTS-based MACCS implementation of RDKit). The performance in retrieving compounds according to their pharmacological categorizations is influenced by the predicted active assay counts in Ph-fp and the choice of similarity metric. Overall, the comparative unsupervised clustering analysis suggests the use of a classification model with Morgan fingerprints as input for the construction of Ph-fp. This combination gives satisfactory clustering performance based on external and internal clustering validation indices.
|
22
|
Identification of Kukoamine A, Zeaxanthin, and Clexane as New Furin Inhibitors. Int J Mol Sci 2022; 23:ijms23052796. [PMID: 35269938 PMCID: PMC8911046 DOI: 10.3390/ijms23052796] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/24/2022] [Revised: 02/23/2022] [Accepted: 02/25/2022] [Indexed: 02/01/2023] Open
Abstract
The endogenous protease furin is a key protein in many different diseases, such as cancer and infections. For this reason, a wide range of studies has focused on targeting furin from a therapeutic point of view. Our main objective consisted of identifying new compounds that could enlarge the furin inhibitor arsenal; secondarily, we assayed their adjuvant effect in combination with a known furin inhibitor, CMK, which prevents SARS-CoV-2 S protein cleavage by means of that inhibition. Virtual screening was carried out to identify potential furin inhibitors. The inhibition of physiological and purified recombinant furin by screening selected compounds, Clexane, and these drugs in combination with CMK was assayed in fluorogenic tests by using a specific furin substrate. The effects of the selected inhibitors from virtual screening on cell viability (293T HEK cell line) were assayed by means of flow cytometry. Through virtual screening, Zeaxanthin and Kukoamine A were selected as the main potential furin inhibitors. In fluorogenic assays, these two compounds and Clexane inhibited both physiological and recombinant furin in a dose-dependent way. In addition, these compounds increased physiological furin inhibition by CMK, showing an adjuvant effect. In conclusion, we identified Kukoamine A, Zeaxanthin, and Clexane as new furin inhibitors. In addition, these drugs were able to increase furin inhibition by CMK, so they could also increase its efficiency in preventing S protein proteolysis, which is essential for SARS-CoV-2 cell infection.
|
23
|
Tresadern G, Tatikola K, Cabrera J, Wang L, Abel R, van Vlijmen H, Geys H. The Impact of Experimental and Calculated Error on the Performance of Affinity Predictions. J Chem Inf Model 2022; 62:703-717. [DOI: 10.1021/acs.jcim.1c01214] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]
Affiliation(s)
- Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, B-2340 Beerse, Belgium
| | - Kanaka Tatikola
- Nonclinical Statistics, Janssen Research & Development, 920 Route 202 South, Raritan, New Jersey 08869, United States
| | - Javier Cabrera
- Department of Statistics, Rutgers University, New Brunswick, New Jersey 08901-8554, United States
| | - Lingle Wang
- Schrödinger, Inc., New York, New York 10036, United States
| | - Robert Abel
- Schrödinger, Inc., New York, New York 10036, United States
| | - Herman van Vlijmen
- Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, B-2340 Beerse, Belgium
| | - Helena Geys
- Nonclinical Statistics, Janssen Research & Development, Turnhoutseweg 30, B-2340 Beerse, Belgium
|
24
|
Jiménez-Luna J, Skalic M, Weskamp N. Benchmarking Molecular Feature Attribution Methods with Activity Cliffs. J Chem Inf Model 2022; 62:274-283. [PMID: 35019265 DOI: 10.1021/acs.jcim.1c01163] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]
Abstract
Feature attribution techniques are popular choices within the explainable artificial intelligence toolbox, as they can help elucidate which parts of the provided inputs used by an underlying supervised-learning method are considered relevant for a specific prediction. In the context of molecular design, these approaches typically involve the coloring of molecular graphs, whose presentation to medicinal chemists can be useful for making a decision of which compounds to synthesize or prioritize. The consistency of the highlighted moieties alongside expert background knowledge is expected to contribute to the understanding of machine-learning models in drug design. Quantitative evaluation of such coloring approaches, however, has so far been limited to substructure identification tasks. We here present an approach that is based on maximum common substructure algorithms applied to experimentally-determined activity cliffs. Using the proposed benchmark, we found that molecule coloring approaches in conjunction with classical machine-learning models tend to outperform more modern, graph-neural-network alternatives. The provided benchmark data are fully open sourced, which we hope will facilitate the testing of newly developed molecular feature attribution techniques.
Affiliation(s)
- José Jiménez-Luna
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093 Zurich, Switzerland
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
| | - Miha Skalic
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
| | - Nils Weskamp
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
| |
|
25
|
Leveraging nonstructural data to predict structures and affinities of protein-ligand complexes. Proc Natl Acad Sci U S A 2021; 118:2112621118. [PMID: 34921117 PMCID: PMC8713799 DOI: 10.1073/pnas.2112621118] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Accepted: 10/15/2021] [Indexed: 01/02/2023]
Abstract
Structure-based drug design depends on the ability to predict both the three-dimensional structures of candidate molecules bound to their targets and the associated binding affinities. We demonstrate that one can substantially improve the accuracy of these predictions using easily obtained data about completely different molecules that bind to the same target without requiring any target-bound structures of these molecules. The approach we developed to integrate physical and data-driven modeling may find a variety of applications in the rapidly growing field of artificial intelligence for drug discovery. Over the past five decades, tremendous effort has been devoted to computational methods for predicting properties of ligands—i.e., molecules that bind macromolecular targets. Such methods, which are critical to rational drug design, fall into two categories: physics-based methods, which directly model ligand interactions with the target given the target’s three-dimensional (3D) structure, and ligand-based methods, which predict ligand properties given experimental measurements for similar ligands. Here, we present a rigorous statistical framework to combine these two sources of information. We develop a method to predict a ligand’s pose—the 3D structure of the ligand bound to its target—that leverages a widely available source of information: a list of other ligands that are known to bind the same target but for which no 3D structure is available. This combination of physics-based and ligand-based modeling improves pose prediction accuracy across all major families of drug targets. Using the same framework, we develop a method for virtual screening of drug candidates, which outperforms standard physics-based and ligand-based virtual screening methods. Our results suggest broad opportunities to improve prediction of various ligand properties by combining diverse sources of information through customized machine-learning approaches.
|
26
|
Bai X, Yin Y. Exploration and augmentation of pharmacological space via adversarial auto-encoder model for facilitating kinase-centric drug development. J Cheminform 2021; 13:95. [PMID: 34872613 PMCID: PMC8650415 DOI: 10.1186/s13321-021-00574-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/10/2020] [Accepted: 11/20/2021] [Indexed: 11/10/2022]
Abstract
Predicting compound-protein interactions (CPIs) is of great importance for drug discovery and repositioning, yet still challenging mainly due to the sparse nature of CPI matrixes, resulting in poor generalization performance. Hence, unlike typical CPI prediction models focused on representation learning or model selection, we propose a deep neural network-based strategy, PCM-AAE, that re-explores and augments the pharmacological space of kinase inhibitors by introducing the adversarial auto-encoder model (AAE) to improve the generalization of the prediction model. To complete the data space, we constructed Ensemble of PCM-AAE (EPA), an ensemble model that quickly and accurately yields quantitative predictions of binding affinity between any human kinase and inhibitor. In rigorous internal validation, EPA showed excellent performance, consistently outperforming the model trained with the imbalanced set, especially for targets with relatively fewer training data points. Improved prediction accuracy of EPA for external datasets enhances its generalization ability, making it possible to gracefully handle previously unseen kinases and inhibitors. EPA showed promising potential when directly applied to virtual screening and off-target prediction, exhibiting its practicality in hit prediction. Our strategy is expected to facilitate kinase-centric drug development, as well as to solve more challenging prediction problems with insufficient data points.
Affiliation(s)
- Xinyu Bai
- Department of Pathology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100191, China
- Institute of Systems Biomedicine, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100191, People's Republic of China
| | - Yuxin Yin
- Department of Pathology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100191, China
- Institute of Systems Biomedicine, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100191, People's Republic of China
- Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
| |
|
27
|
Thomas M, Boardman A, Garcia-Ortegon M, Yang H, de Graaf C, Bender A. Applications of Artificial Intelligence in Drug Design: Opportunities and Challenges. Methods Mol Biol 2021; 2390:1-59. [PMID: 34731463 DOI: 10.1007/978-1-0716-1787-8_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Artificial intelligence (AI) has undergone rapid development in recent years and has been successfully applied to real-world problems such as drug design. In this chapter, we review recent applications of AI to problems in drug design including virtual screening, computer-aided synthesis planning, and de novo molecule generation, with a focus on the limitations of the application of AI therein and opportunities for improvement. Furthermore, we discuss the broader challenges imposed by AI in translating theoretical practice to real-world drug design; including quantifying prediction uncertainty and explaining model behavior.
Affiliation(s)
- Morgan Thomas
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Andrew Boardman
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Miguel Garcia-Ortegon
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
| | - Hongbin Yang
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK.
| |
|
28
|
Kolmar SS, Grulke CM. The effect of noise on the predictive limit of QSAR models. J Cheminform 2021; 13:92. [PMID: 34823605 PMCID: PMC8613965 DOI: 10.1186/s13321-021-00571-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/15/2021] [Accepted: 11/14/2021] [Indexed: 01/09/2023]
Abstract
A key challenge in the field of Quantitative Structure Activity Relationships (QSAR) is how to effectively treat experimental error in the training and evaluation of computational models. It is often assumed in the field of QSAR that models cannot produce predictions which are more accurate than their training data. Additionally, it is implicitly assumed, by necessity, that data points in test sets or validation sets do not contain error, and that each data point is a population mean. This work proposes the hypothesis that QSAR models can make predictions which are more accurate than their training data and that the error-free test set assumption leads to a significant misevaluation of model performance. This work used eight datasets with six different common QSAR endpoints, because different endpoints should have different amounts of experimental error associated with varying complexity of the measurements. Up to 15 levels of simulated Gaussian distributed random error were added to the datasets, and models were built on the error-laden datasets using five different algorithms. The models were trained on the error-laden data, evaluated on error-laden test sets, and evaluated on error-free test sets. The results show that for each level of added error, the RMSE for evaluation on the error-free test sets was always better. The results support the hypothesis that, at least under the conditions of Gaussian distributed random error, QSAR models can make predictions which are more accurate than their training data, and that the evaluation of models on error-laden test and validation sets may give a flawed measure of model performance. These results have implications for how QSAR models are evaluated, especially for disciplines where experimental error is very large, such as in computational toxicology.
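The evaluation design described in this abstract can be illustrated with a toy simulation. The sketch below is not the authors' protocol: it assumes a hypothetical linear "true" endpoint, a plain least-squares fit, and a single descriptor, and only shows why RMSE measured against an error-laden test set is larger than RMSE measured against the error-free values.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def noise_experiment(sigma, n=200, seed=0):
    """Train on noisy labels; score against noisy and error-free test labels."""
    rng = random.Random(seed)

    def true_fn(x):
        # Hypothetical error-free endpoint, chosen only for illustration.
        return 2.0 * x + 1.0

    x_train = [rng.uniform(0, 10) for _ in range(n)]
    x_test = [rng.uniform(0, 10) for _ in range(n)]
    y_train_noisy = [true_fn(x) + rng.gauss(0, sigma) for x in x_train]
    y_test_clean = [true_fn(x) for x in x_test]
    y_test_noisy = [y + rng.gauss(0, sigma) for y in y_test_clean]

    a, b = fit_line(x_train, y_train_noisy)
    pred = [a * x + b for x in x_test]
    return rmse(pred, y_test_noisy), rmse(pred, y_test_clean)


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        noisy, clean = noise_experiment(sigma)
        print(f"sigma={sigma}: RMSE vs noisy test={noisy:.3f}, vs clean test={clean:.3f}")
```

In this toy setting the RMSE against the error-free test set also falls below the noise level `sigma` of the training labels, which mirrors the hypothesis stated in the abstract; real QSAR data and nonlinear learners may behave differently.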
Affiliation(s)
- Scott S Kolmar
- Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA.
| | - Christopher M Grulke
- Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
| |
|
29
|
Yang ZY, Fu L, Lu AP, Liu S, Hou TJ, Cao DS. Semi-automated workflow for molecular pair analysis and QSAR-assisted transformation space expansion. J Cheminform 2021; 13:86. [PMID: 34774096 PMCID: PMC8590336 DOI: 10.1186/s13321-021-00564-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/08/2021] [Accepted: 10/30/2021] [Indexed: 12/01/2022]
Abstract
In the process of drug discovery, the optimization of lead compounds has always been a challenge faced by pharmaceutical chemists. Matched molecular pair analysis (MMPA), a promising tool to efficiently extract and summarize the relationship between structural transformation and property change, is suitable for local structural optimization tasks. In particular, the integration of MMPA with QSAR modeling can further strengthen the utility of MMPA in molecular optimization navigation. In this study, a new semi-automated procedure based on KNIME was developed to support MMPA on both large- and small-scale datasets, including molecular preparation, QSAR model construction, applicability domain evaluation, and MMP calculation and application. Two examples covering regression and classification tasks were provided to gain a better understanding of the importance of MMPA, and they also demonstrated the reliability and utility of this MMPA-by-QSAR pipeline.
Affiliation(s)
- Zi-Yi Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
- Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
| | - Li Fu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
- Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, 999077, SAR, People's Republic of China
| | - Shao Liu
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, People's Republic of China
| | - Ting-Jun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China.
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
- Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Changsha, 410013, Hunan, China
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, 999077, SAR, People's Republic of China
| |
|
30
|
Machine Learning Applied to the Modeling of Pharmacological and ADMET Endpoints. Methods Mol Biol 2021. [PMID: 34731464 DOI: 10.1007/978-1-0716-1787-8_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2023]
Abstract
The well-known concept of quantitative structure-activity relationships (QSAR) has been gaining significant interest in the recent years. Data, descriptors, and algorithms are the main pillars to build useful models that support more efficient drug discovery processes with in silico methods. Significant advances in all three areas are the reason for the regained interest in these models. In this book chapter we review various machine learning (ML) approaches that make use of measured in vitro/in vivo data of many compounds. We put these in context with other digital drug discovery methods and present some application examples.
|
31
|
Mervin LH, Trapotsi MA, Afzal AM, Barrett IP, Bender A, Engkvist O. Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty. J Cheminform 2021; 13:62. [PMID: 34412708 PMCID: PMC8375213 DOI: 10.1186/s13321-021-00539-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Accepted: 07/30/2021] [Indexed: 11/24/2022]
Abstract
Measurements of protein–ligand interactions have reproducibility limits due to experimental errors. Any model based on such assays will consequently have such unavoidable errors influencing its performance, which should ideally be factored into modelling and output predictions, such as the actual standard deviation of experimental measurements (σ) or the associated comparability of activity values between the aggregated heterogeneous activity units (i.e., Ki versus IC50 values) during dataset assimilation. However, experimental errors are usually a neglected aspect of model generation. In order to improve upon the current state-of-the-art, we herein present a novel approach toward predicting protein–ligand interactions using a Probabilistic Random Forest (PRF) classifier. The PRF algorithm was applied toward in silico protein target prediction across ~550 tasks from ChEMBL and PubChem. Predictions were evaluated by taking into account various scenarios of experimental standard deviations in both training and test sets, and performance was assessed using fivefold stratified shuffled splits for validation. The largest benefit in incorporating the experimental deviation in PRF was observed for data points close to the binary threshold boundary, when such information was not considered in any way in the original RF algorithm. For example, in cases when σ ranged between 0.4 and 0.6 log units and when ideal probability estimates were between 0.4 and 0.6, the PRF outperformed RF with a median absolute error margin of ~17%. In comparison, the baseline RF outperformed PRF for cases with high confidence to belong to the active class (far from the binary decision threshold), although the RF models gave errors smaller than the experimental uncertainty, which could indicate that they were overtrained and/or over-confident.
Finally, the PRF models trained with putative inactives decreased the performance compared to PRF models without putative inactives and this could be because putative inactives were not assigned an experimental pXC50 value, and therefore they were considered inactives with a low uncertainty (which in practice might not be true). In conclusion, PRF can be useful for target prediction models in particular for data where class boundaries overlap with the measurement uncertainty, and where a substantial part of the training data is located close to the classification threshold.
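The idea of replacing a hard active/inactive label with a probability that reflects measurement uncertainty can be sketched as follows. This is an illustration only, assuming Gaussian measurement error with standard deviation σ around the measured pXC50; the function name, the threshold of 5.0, and the example values are hypothetical and are not taken from the paper.

```python
import math


def probability_active(pxc50, threshold, sigma):
    """P(true activity >= threshold), assuming Gaussian error of width sigma.

    This is the normal cumulative distribution function evaluated at
    (pxc50 - threshold) / sigma.
    """
    z = (pxc50 - threshold) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


if __name__ == "__main__":
    # Hypothetical measurements around an assumed activity threshold of 5.0.
    for value in (4.0, 4.8, 5.0, 5.2, 6.0):
        p = probability_active(value, threshold=5.0, sigma=0.5)
        print(f"pXC50={value:.1f} -> P(active)={p:.3f}")
```

Measurements far from the threshold map to probabilities near 0 or 1, while those within about one σ of it receive intermediate values, which is the region where the abstract reports the largest benefit.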
Affiliation(s)
- Lewis H Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK.
| | - Maria-Anna Trapotsi
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
| | - Avid M Afzal
- Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
| | - Ian P Barrett
- Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
| | - Andreas Bender
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
| |
|
32
|
Garcia de Lomana M, Morger A, Norinder U, Buesen R, Landsiedel R, Volkamer A, Kirchmair J, Mathea M. ChemBioSim: Enhancing Conformal Prediction of In Vivo Toxicity by Use of Predicted Bioactivities. J Chem Inf Model 2021; 61:3255-3272. [PMID: 34153183 PMCID: PMC8317154 DOI: 10.1021/acs.jcim.1c00451] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/20/2021] [Indexed: 02/07/2023]
Abstract
Computational methods such as machine learning approaches have a strong track record of success in predicting the outcomes of in vitro assays. In contrast, their ability to predict in vivo endpoints is more limited due to the high number of parameters and processes that may influence the outcome. Recent studies have shown that the combination of chemical and biological data can yield better models for in vivo endpoints. The ChemBioSim approach presented in this work aims to enhance the performance of conformal prediction models for in vivo endpoints by combining chemical information with (predicted) bioactivity assay outcomes. Three in vivo toxicological endpoints, capturing genotoxic (MNT), hepatic (DILI), and cardiological (DICC) issues, were selected for this study due to their high relevance for the registration and authorization of new compounds. Since the sparsity of available biological assay data is challenging for predictive modeling, predicted bioactivity descriptors were introduced instead. Thus, a machine learning model for each of the 373 collected biological assays was trained and applied on the compounds of the in vivo toxicity data sets. Besides the chemical descriptors (molecular fingerprints and physicochemical properties), these predicted bioactivities served as descriptors for the models of the three in vivo endpoints. For this study, a workflow based on a conformal prediction framework (a method for confidence estimation) built on random forest models was developed. Furthermore, the most relevant chemical and bioactivity descriptors for each in vivo endpoint were preselected with lasso models. The incorporation of bioactivity descriptors increased the mean F1 scores of the MNT model from 0.61 to 0.70 and for the DICC model from 0.72 to 0.82 while the mean efficiencies increased by roughly 0.10 for both endpoints. In contrast, for the DILI endpoint, no significant improvement in model performance was observed. 
Besides pure performance improvements, an analysis of the most important bioactivity features allowed detection of novel and less intuitive relationships between the predicted biological assay outcomes used as descriptors and the in vivo endpoints. This study presents how the prediction of in vivo toxicity endpoints can be improved by the incorporation of biological information (which is not necessarily captured by chemical descriptors) in an automated workflow, without adding experimental workload for the generation of bioactivity descriptors, because predicted outcomes of bioactivity assays were utilized. All bioactivity CP models for deriving the predicted bioactivities, as well as the in vivo toxicity CP models, can be freely downloaded from https://doi.org/10.5281/zenodo.4761225.
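For readers unfamiliar with conformal prediction, the following is a minimal sketch of a class-conditional (Mondrian) inductive conformal classifier. It is a generic textbook formulation, not the ChemBioSim workflow: the nonconformity score is assumed here to be one minus the predicted class probability, and all scores, class names, and the significance level are hypothetical.

```python
def p_value(calibration_scores, test_score):
    """Conformal p-value: share of calibration scores at least as nonconforming."""
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_by_class, test_score_by_class, epsilon):
    """Return every class whose p-value exceeds the significance level epsilon."""
    return {
        label
        for label, score in test_score_by_class.items()
        if p_value(calibration_by_class[label], score) > epsilon
    }


if __name__ == "__main__":
    # Hypothetical calibration nonconformity scores, kept separately per class.
    calibration = {
        "toxic": [0.1, 0.2, 0.3, 0.4],
        "nontoxic": [0.1, 0.2, 0.3, 0.4],
    }
    # Hypothetical test compound: the model predicts P(toxic) = 0.75,
    # so its nonconformity is 0.25 for "toxic" and 0.75 for "nontoxic".
    test_scores = {"toxic": 0.25, "nontoxic": 0.75}
    print(prediction_set(calibration, test_scores, epsilon=0.2))
```

A prediction set containing exactly one class counts as an efficient (single-label) prediction; the efficiency reported for such models is commonly the fraction of test compounds that receive a single-label set.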
Affiliation(s)
- Marina Garcia de Lomana
- BASF SE, Ludwigshafen am Rhein 67063, Germany
- Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Vienna 1090, Austria
| | - Andrea Morger
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin 10117, Germany
| | - Ulf Norinder
- MTM Research Centre, School of Science and Technology, Örebro University, Örebro SE-70182, Sweden
| | - Andrea Volkamer
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin 10117, Germany
| | - Johannes Kirchmair
- Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Vienna 1090, Austria
| |
|
33
|
Nonadditivity in public and inhouse data: implications for drug design. J Cheminform 2021; 13:47. [PMID: 34215341 PMCID: PMC8254291 DOI: 10.1186/s13321-021-00525-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/10/2020] [Accepted: 06/09/2021] [Indexed: 11/10/2022]
Abstract
Numerous ligand-based drug discovery projects are based on structure-activity relationship (SAR) analysis, such as Free-Wilson (FW) or matched molecular pair (MMP) analysis. Intrinsically they assume linearity and additivity of substituent contributions. These techniques are challenged by nonadditivity (NA) in protein-ligand binding where the change of two functional groups in one molecule results in much higher or lower activity than expected from the respective single changes. Identifying nonlinear cases and possible underlying explanations is crucial for a drug design project since it might influence which lead to follow. By systematically analyzing all AstraZeneca (AZ) inhouse compound data and publicly available ChEMBL25 bioactivity data, we show significant NA events in almost every second assay among the inhouse and once in every third assay in public data sets. Furthermore, 9.4% of all compounds of the AZ database and 5.1% from public sources display significant additivity shifts indicating important SAR features or fundamental measurement errors. Using NA data in combination with machine learning showed that nonadditive data is challenging to predict and even the addition of nonadditive data into training did not result in an increase in predictivity. Overall, NA analysis should be applied on a regular basis in many areas of computational chemistry and can further improve rational drug design.
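The nonadditivity described here is commonly quantified on a double-transformation cycle of four compounds: a parent, the two single-change analogues, and the analogue carrying both changes. The sketch below uses that standard cycle formula; the activity values, the per-measurement uncertainty `sigma`, and the two-standard-deviation cutoff are hypothetical choices for illustration, not values from the paper.

```python
def nonadditivity(p_parent, p_a, p_b, p_ab):
    """Nonadditivity of a double-transformation cycle, in log activity units.

    Zero means the two changes contribute independently (perfect additivity).
    """
    return p_ab - p_a - p_b + p_parent


def is_significant(na, sigma, n_sd=2.0):
    """Flag a cycle whose nonadditivity exceeds n_sd standard deviations.

    The cycle combines four measurements; with independent errors of width
    sigma each, the nonadditivity has standard deviation 2 * sigma.
    """
    return abs(na) > n_sd * 2.0 * sigma


if __name__ == "__main__":
    # Hypothetical pActivity values: parent, analogue A, analogue B, analogue AB.
    additive = nonadditivity(6.0, 7.0, 6.5, 7.5)
    nonadditive = nonadditivity(6.0, 7.0, 6.5, 9.0)
    print(additive, is_significant(additive, sigma=0.3))
    print(nonadditive, is_significant(nonadditive, sigma=0.3))
```

Because the uncertainty of a cycle is twice that of a single measurement, apparent nonadditivity below roughly that level may reflect assay noise rather than a genuine SAR feature.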
|
34
|
Lin Z, Zou J, Liu S, Peng C, Li Z, Wan X, Fang D, Yin J, Gobbo G, Chen Y, Ma J, Wen S, Zhang P, Yang M. A Cloud Computing Platform for Scalable Relative and Absolute Binding Free Energy Predictions: New Opportunities and Challenges for Drug Discovery. J Chem Inf Model 2021; 61:2720-2732. [PMID: 34086476 DOI: 10.1021/acs.jcim.0c01329] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 01/14/2023]
Abstract
Free energy perturbation (FEP) has become widely used in drug discovery programs for binding affinity prediction between candidate compounds and their biological targets. However, limitations of FEP applications also exist, including, but not limited to, high cost, long waiting time, limited scalability, and breadth of application scenarios. To overcome these problems, we have developed XFEP, a scalable cloud computing platform for both relative and absolute free energy predictions using optimized simulation protocols. XFEP enables large-scale FEP calculations in a more efficient, scalable, and affordable way, for example, the evaluation of 5000 compounds can be performed in 1 week using 50-100 GPUs with a computing cost roughly equivalent to the cost for the synthesis of only one new compound. By combining these capabilities with artificial intelligence techniques for goal-directed molecule generation and evaluation, new opportunities can be explored for FEP applications in the drug discovery stages of hit identification, hit-to-lead, and lead optimization based not only on structure exploitation within the given chemical series but also including evaluation and comparison of completely unrelated molecules during structure exploration in a larger chemical space. XFEP provides the basis for scalable FEP applications to become more widely used in drug discovery projects and to speed up the drug discovery process from hit identification to preclinical candidate compound nomination.
Affiliation(s)
- Zhixiong Lin
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Junjie Zou
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Shuai Liu
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- XtalPi Inc., 245 Main Street, Cambridge, Massachusetts 02142, United States
| | - Chunwang Peng
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Zhipeng Li
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Xiao Wan
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Dong Fang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Jian Yin
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Gianpaolo Gobbo
- XtalPi Inc., 245 Main Street, Cambridge, Massachusetts 02142, United States
| | - Yongpan Chen
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Jian Ma
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Shuhao Wen
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- XtalPi Inc., 245 Main Street, Cambridge, Massachusetts 02142, United States
| | - Peiyu Zhang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Mingjun Yang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| |
|
35
|
Matsumoto K, Miyao T, Funatsu K. Ranking-Oriented Quantitative Structure-Activity Relationship Modeling Combined with Assay-Wise Data Integration. ACS Omega 2021; 6:11964-11973. [PMID: 34056351 PMCID: PMC8154010 DOI: 10.1021/acsomega.1c00463] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/26/2021] [Accepted: 04/21/2021] [Indexed: 05/15/2023]
Abstract
In ligand-based drug design, quantitative structure-activity relationship (QSAR) models play an important role in activity prediction. One of the major end points of QSAR models is half-maximal inhibitory concentration (IC50). Experimental IC50 data from various research groups have been accumulated in publicly accessible databases, providing an opportunity for us to use such data in predictive QSAR models. In this study, we focused on using a ranking-oriented QSAR model as a predictive model because relative potency strength within the same assay is solid information that is not based on any mechanical assumptions. We conducted rigorous validation using the ChEMBL database and previously reported data sets. Ranking support vector machine (ranking-SVM) models trained on compounds from similar assays were as good as support vector regression (SVR) with the Tanimoto kernel trained on compounds from all the assays. For effective data integration with ranking-SVM, compounds should be integrated only from assays that are similar in terms of their compounds. For SVR with the Tanimoto kernel, all compounds from different assays can be incorporated.
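The Tanimoto kernel mentioned in this abstract is the Tanimoto (Jaccard) similarity between binary fingerprints, used as the kernel function of the regression model. A minimal sketch follows, assuming fingerprints are represented as sets of on-bit indices; the example fingerprints are hypothetical, and returning 1.0 for two empty fingerprints is a convention chosen here, not something stated in the paper.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as on-bit index sets."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Assumed convention: two empty fingerprints are treated as identical.
        return 1.0
    return len(a & b) / union


def gram_matrix(fingerprints):
    """Pairwise Tanimoto kernel matrix for a list of fingerprints."""
    return [[tanimoto(x, y) for y in fingerprints] for x in fingerprints]


if __name__ == "__main__":
    # Hypothetical fingerprints: each set holds the indices of bits set to 1.
    fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
    for row in gram_matrix(fps):
        print([round(v, 2) for v in row])
```

A matrix built this way could be supplied to any kernel method that accepts a precomputed kernel; which SVR implementation the authors used is not stated in the abstract.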
Affiliation(s)
- Katsuhisa Matsumoto
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
| | - Tomoyuki Miyao
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
| | - Kimito Funatsu
- Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
- Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- E-mail: . Phone: +81-3-5841-7751. Fax: +81-3-5841-7771
| |
|
36
|
Tarasova O, Poroikov V. Machine Learning in Discovery of New Antivirals and Optimization of Viral Infections Therapy. Curr Med Chem 2021; 28:7840-7861. [PMID: 33949929 DOI: 10.2174/0929867328666210504114351] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/10/2020] [Revised: 02/13/2021] [Accepted: 02/24/2021] [Indexed: 11/22/2022]
Abstract
Nowadays, computational approaches play an important role in the design of new drug-like compounds and optimization of pharmacotherapeutic treatment of diseases. The emerging growth of viral infections, including those caused by the Human Immunodeficiency Virus (HIV), Ebola virus, recently detected coronavirus, and some others, leads to many newly infected people with a high risk of death or severe complications. A huge amount of chemical, biological, clinical data is at the disposal of the researchers. Therefore, there are many opportunities to find the relationships between the particular features of chemical data and the antiviral activity of biologically active compounds based on machine learning approaches. Biological and clinical data can also be used for building models to predict relationships between viral genotype and drug resistance, which might help determine the clinical outcome of treatment. In the current study, we consider machine-learning approaches in the antiviral research carried out during the past decade. We overview in detail the application of machine-learning methods for the design of new potential antiviral agents and vaccines, drug resistance prediction, and analysis of virus-host interactions. Our review also covers the perspectives of using the machine-learning approaches for antiviral research, including Dengue, Ebola viruses, Influenza A, Human Immunodeficiency Virus, coronaviruses, and some others.
Affiliation(s)
- Olga Tarasova
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
| | - Vladimir Poroikov
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
| |
|
37
|
Kawai K, Tomonou M, Machida Y, Karuo Y, Tarui A, Sato K, Ikeda Y, Kinashi T, Omote M. Effect of Learning Dataset for Identification of Active Molecules: A Case Study of Integrin αIIbβ3 Inhibitors. Mol Inform 2021; 40:e2060040. [PMID: 33738924 DOI: 10.1002/minf.202060040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/29/2021] [Accepted: 01/30/2021] [Indexed: 01/13/2023]
Abstract
Efficient in silico approaches are needed to identify strong integrin αIIbβ3 inhibitors through a small number of measurements. To address the challenge, we investigated the effect of learning dataset on the classification performance of machine learning models focusing on weak and inactive compounds. The structure and activity information of the compounds were obtained from ChEMBL, and pCHEMBL values were used to classify them as active, inactive, or weak. Datasets with various imbalance levels from active:inactive=1 : 1 to 1 : 1000 were used for the machine learning. The prediction scores of the weak samples were found to lie between the predictive values of active and inactive compounds. In addition, another dataset that consists of 149 actives and 6.9 million inactives was screened; the results indicated that the number of positive predictions decreased for models trained with a higher number of inactives. Although there is a trade-off between false positives and false negatives, for determination of compounds with strong activity using a reduced number of measurements, it is better to use a large number of inactives for learning and identifying compounds that score higher than the weak samples.
Affiliation(s)
- Kentaro Kawai
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
| | - Mami Tomonou
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
| | - Yume Machida
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
| | - Yukiko Karuo
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
| | - Atsushi Tarui
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
| | - Kazuyuki Sato
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
| | - Yoshiki Ikeda
- Department of Molecular Genetics, Institute of Biomedical Science, Kansai Medical University, 2-5-1 Shin-machi, Hirakata, Osaka, 573-1010, Japan
| | - Tatsuo Kinashi
- Department of Molecular Genetics, Institute of Biomedical Science, Kansai Medical University, 2-5-1 Shin-machi, Hirakata, Osaka, 573-1010, Japan
| | - Masaaki Omote
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
| |
|
38
|
Jiménez-Luna J, Skalic M, Weskamp N, Schneider G. Coloring Molecules with Explainable Artificial Intelligence for Preclinical Relevance Assessment. J Chem Inf Model 2021; 61:1083-1094. [PMID: 33629843 DOI: 10.1021/acs.jcim.0c01344] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 01/06/2023]
Abstract
Graph neural networks are able to solve certain drug discovery tasks such as molecular property prediction and de novo molecule generation. However, these models are considered "black-box" and "hard-to-debug". This study aimed to improve modeling transparency for rational molecular design by applying the integrated gradients explainable artificial intelligence (XAI) approach for graph neural network models. Models were trained for predicting plasma protein binding, hERG channel inhibition, passive permeability, and cytochrome P450 inhibition. The proposed methodology highlighted molecular features and structural elements that are in agreement with known pharmacophore motifs, correctly identified property cliffs, and provided insights into unspecific ligand-target interactions. The developed XAI approach is fully open-sourced and can be used by practitioners to train new models on other clinically relevant endpoints.
Affiliation(s)
- José Jiménez-Luna
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8049 Zurich, Switzerland
| | - Miha Skalic
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
| | - Nils Weskamp
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8049 Zurich, Switzerland
| |
|
39
|
Bieniek M, Bhati AP, Wan S, Coveney PV. TIES 20: Relative Binding Free Energy with a Flexible Superimposition Algorithm and Partial Ring Morphing. J Chem Theory Comput 2021; 17:1250-1265. [PMID: 33486956 PMCID: PMC7876800 DOI: 10.1021/acs.jctc.0c01179] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/10/2020] [Indexed: 12/14/2022]
Abstract
The TIES (Thermodynamic Integration with Enhanced Sampling) protocol is a formally exact alchemical approach in computational chemistry to the calculation of relative binding free energies. The validity of TIES relies on the correctness of matching atoms across compared pairs of ligands, laying the foundation for the transformation along an alchemical pathway. We implement a flexible topology superimposition algorithm which uses an exhaustive joint-traversal for computing the largest common component(s). The algorithm is employed to enable matching and morphing of partial rings in the TIES protocol along with a validation study using 55 transformations and five different proteins from our previous work. We find that TIES 20 with the RESP charge system, using the new superimposition algorithm, reproduces the previous results with mean unsigned error of 0.75 kcal/mol with respect to the experimental data. Enabling the morphing of partial rings decreases the size of the alchemical region in the dual-topology transformations resulting in a significant improvement in the prediction precision. We find that increasing the ensemble size from 5 to 20 replicas per λ window only has a minimal impact on the accuracy. However, the non-normal nature of the relative free energy distributions underscores the importance of ensemble simulation. We further compare the results with the AM1-BCC charge system and show that it improves agreement with the experimental data by slightly over 10%. This improvement is partly due to AM1-BCC affecting only the charges of the atoms local to the mutation, which translates to even fewer morphed atoms, consequently reducing issues with sampling and therefore ensemble averaging. TIES 20, in conjunction with the enablement of ring morphing, reduces the size of the alchemical region and significantly improves the precision of the predicted free energies.
Affiliation(s)
- Mateusz K. Bieniek
- Centre for Computational Science, Department of Chemistry, University College London, 20 Gordon Street, London WC1H 0AJ, United Kingdom
| | - Agastya P. Bhati
- Centre for Computational Science, Department of Chemistry, University College London, 20 Gordon Street, London WC1H 0AJ, United Kingdom
| | - Shunzhou Wan
- Centre for Computational Science, Department of Chemistry, University College London, 20 Gordon Street, London WC1H 0AJ, United Kingdom
| | - Peter V. Coveney
- Centre for Computational Science, Department of Chemistry, University College London, 20 Gordon Street, London WC1H 0AJ, United Kingdom
| |
|
40
|
Rufer AC. Drug discovery for enzymes. Drug Discov Today 2021; 26:875-886. [PMID: 33454380 DOI: 10.1016/j.drudis.2021.01.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/25/2020] [Revised: 12/21/2020] [Accepted: 01/07/2021] [Indexed: 02/06/2023]
Abstract
Enzymes are essential, physiological catalysts involved in all processes of life, including metabolism, cellular signaling and motility, as well as cell growth and division. They are attractive drug targets because of the presence of defined substrate-binding pockets, which can be exploited as binding sites for pharmaceutical enzyme inhibitors. Understanding the reaction mechanisms of enzymes and the molecular mode of action of enzyme inhibitors is indispensable for the discovery and development of potent, efficacious, and safe novel drugs. The combination of classical concepts of enzymology with new experimental and data analysis methods opens new routes for drug discovery.
Affiliation(s)
- Arne Christian Rufer
- Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 065/208A, 4070 Basel, Switzerland.
| |
|
41
|
Matter H, Buning C, Stefanescu DD, Ruf S, Hessler G. Using Graph Databases to Investigate Trends in Structure–Activity Relationship Networks. J Chem Inf Model 2020; 60:6120-6134. [DOI: 10.1021/acs.jcim.0c00947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Hans Matter
- Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Industriepark Höchst, D-65926 Frankfurt am Main, Germany
| | - Christian Buning
- Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Industriepark Höchst, D-65926 Frankfurt am Main, Germany
| | - Dan Dragos Stefanescu
- Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Industriepark Höchst, D-65926 Frankfurt am Main, Germany
| | - Sven Ruf
- Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Industriepark Höchst, D-65926 Frankfurt am Main, Germany
| | - Gerhard Hessler
- Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Industriepark Höchst, D-65926 Frankfurt am Main, Germany
| |
|
42
|
Smit IA, Afzal AM, Allen CHG, Svensson F, Hanser T, Bender A. Systematic Analysis of Protein Targets Associated with Adverse Events of Drugs from Clinical Trials and Postmarketing Reports. Chem Res Toxicol 2020; 34:365-384. [PMID: 33351593 DOI: 10.1021/acs.chemrestox.0c00294] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/26/2022]
Abstract
Adverse drug reactions (ADRs) are undesired effects of medicines that can harm patients and are a significant source of attrition in drug development. ADRs are anticipated by routinely screening drugs against secondary pharmacology protein panels. However, there is still a lack of quantitative information on the links between these off-target proteins and the reporting of ADRs in humans. Here, we present a systematic analysis of associations between measured and predicted in vitro bioactivities of drugs and adverse events (AEs) in humans from two sources of data: the Side Effect Resource, derived from clinical trials, and the Food and Drug Administration Adverse Event Reporting System, derived from postmarketing surveillance. The ratio of a drug's therapeutic unbound plasma concentration over the drug's in vitro potency against a given protein was used to select proteins most likely to be relevant to in vivo effects. In examining individual target bioactivities as predictors of AEs, we found a trade-off between the positive predictive value and the fraction of drugs with AEs that can be detected. However, considering sets of multiple targets for the same AE can help identify a greater fraction of AE-associated drugs. Of the 45 targets with statistically significant associations to AEs, 30 are included on existing safety target panels. The remaining 15 targets include 9 carbonic anhydrases, of which CA5B is significantly associated with cholestatic jaundice. We include the full quantitative data on associations between measured and predicted in vitro bioactivities and AEs in humans in this work, which can be used to make a more informed selection of safety profiling targets.
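The exposure-to-potency ratio described in this abstract comes down to simple arithmetic, sketched below. The concentrations, target names, and the cutoff of 1.0 are hypothetical illustrations; the abstract does not state which threshold the authors applied.

```python
def exposure_potency_ratio(unbound_plasma_nm: float, potency_nm: float) -> float:
    """Therapeutic unbound plasma concentration divided by in vitro potency.

    Both values must be in the same unit (nM is assumed here).
    """
    if potency_nm <= 0:
        raise ValueError("potency must be a positive concentration")
    return unbound_plasma_nm / potency_nm


# Hypothetical in vitro potencies (IC50, nM) of one drug against two proteins.
potencies_nm = {"target_A": 50.0, "target_B": 5000.0}
unbound_plasma_nm = 200.0  # hypothetical unbound plasma concentration (nM)

ratios = {
    name: exposure_potency_ratio(unbound_plasma_nm, potency)
    for name, potency in potencies_nm.items()
}

# Illustrative cutoff only: keep proteins the drug could plausibly engage in vivo.
relevant = [name for name, ratio in ratios.items() if ratio >= 1.0]

print(ratios)    # {'target_A': 4.0, 'target_B': 0.04}
print(relevant)  # ['target_A']
```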
Affiliation(s)
- Ines A Smit
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Avid M Afzal
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Chad H G Allen
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Fredrik Svensson
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Thierry Hanser
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, United Kingdom
| | - Andreas Bender
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| |
|
43
|
Bender A, Cortés-Ciriano I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: Ways to make an impact, and why we are not there yet. Drug Discov Today 2020; 26:511-524. [PMID: 33346134 DOI: 10.1016/j.drudis.2020.12.009] [Citation(s) in RCA: 88] [Impact Index Per Article: 22.0] [Received: 07/22/2020] [Revised: 09/07/2020] [Accepted: 12/11/2020] [Indexed: 12/30/2022]
Abstract
Although artificial intelligence (AI) has had a profound impact on areas such as image recognition, comparable advances in drug discovery are rare. This article quantifies the stages of drug discovery in which improvements in the time taken, success rate or affordability will have the most profound overall impact on bringing new drugs to market. Changes in clinical success rates will have the most profound impact on improving success in drug discovery; in other words, the quality of decisions regarding which compound to take forward (and how to conduct clinical trials) are more important than speed or cost. Although current advances in AI focus on how to make a given compound, the question of which compound to make, using clinical efficacy and safety-related end points, has received significantly less attention. As a consequence, current proxy measures and available data cannot fully utilize the potential of AI in drug discovery, in particular when it comes to drug efficacy and safety in vivo. Thus, addressing the questions of which data to generate and which end points to model will be key to improving clinically relevant decision-making in the future.
Affiliation(s)
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road CB2 1EW, UK; Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca, Cambridge, UK.
| | - Isidro Cortés-Ciriano
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK.
| |
|
44
|
Marchand JR, Knehans T, Caflisch A, Vitalis A. An ABSINTH-Based Protocol for Predicting Binding Affinities between Proteins and Small Molecules. J Chem Inf Model 2020; 60:5188-5202. [PMID: 32897071 DOI: 10.1021/acs.jcim.0c00558] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
Abstract
The core task in computational drug discovery is to accurately predict binding free energies in receptor-ligand systems for large libraries of putative binders. Here, the ABSINTH implicit solvent model and force field are extended to describe small, organic molecules and their interactions with proteins. We show that an automatic pipeline based on partitioning arbitrary molecules into substructures corresponding to model compounds with known free energies of solvation can be combined with the CHARMM general force field into a method that is successful at the two important challenges a scoring function faces in virtual screening work flows: it ranks known binders with correlation values rivaling that of comparable state-of-the-art methods and it enriches true binders in a set of decoys. Our protocol introduces innovative modifications to common virtual screening workflows, notably the use of explicit ions as competitors and the integration over multiple protein and ligand species differing in their protonation states. We demonstrate the value of modifications to both the protocol and ABSINTH itself. We conclude by discussing the limitations of high-throughput implicit methods such as the one proposed here.
Affiliation(s)
- Jean-Rémy Marchand
- Department of Biochemistry, University of Zürich, CH 8057 Zürich, Switzerland
| | - Tim Knehans
- Department of Biochemistry, University of Zürich, CH 8057 Zürich, Switzerland
| | - Amedeo Caflisch
- Department of Biochemistry, University of Zürich, CH 8057 Zürich, Switzerland
| | - Andreas Vitalis
- Department of Biochemistry, University of Zürich, CH 8057 Zürich, Switzerland
| |
|
45
|
Stecula A, Hussain MS, Viola RE. Discovery of Novel Inhibitors of a Critical Brain Enzyme Using a Homology Model and a Deep Convolutional Neural Network. J Med Chem 2020; 63:8867-8875. [DOI: 10.1021/acs.jmedchem.0c00473] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/24/2022]
Affiliation(s)
- Adrian Stecula
- Atomwise Inc., San Francisco, California 94103, United States
| | - Muhammad S. Hussain
- Department of Chemistry and Biochemistry, University of Toledo, Toledo, Ohio 43606, United States
| | - Ronald E. Viola
- Department of Chemistry and Biochemistry, University of Toledo, Toledo, Ohio 43606, United States
| |
|
46
|
Watson OP, Cortes-Ciriano I, Taylor AR, Watson JA. A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery. Bioinformatics 2020; 35:4656-4663. [PMID: 31070704 PMCID: PMC6853675 DOI: 10.1093/bioinformatics/btz293] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/10/2018] [Revised: 03/22/2019] [Accepted: 04/17/2019] [Indexed: 02/07/2023] Open
Abstract
Motivation: Artificial intelligence, trained via machine learning (e.g. neural nets, random forests) or computational statistical algorithms (e.g. support vector machines, ridge regression), holds much promise for the improvement of small-molecule drug discovery. However, small-molecule structure-activity data are high dimensional with low signal-to-noise ratios and proper validation of predictive methods is difficult. It is poorly understood which, if any, of the currently available machine learning algorithms will best predict new candidate drugs.
Results: The quantile-activity bootstrap is proposed as a new model validation framework using quantile splits on the activity distribution function to construct training and testing sets. In addition, we propose two novel rank-based loss functions which penalize only the out-of-sample predicted ranks of high-activity molecules. The combination of these methods was used to assess the performance of neural nets, random forests, support vector machines (regression) and ridge regression applied to 25 diverse high-quality structure-activity datasets publicly available on ChEMBL. Model validation based on random partitioning of available data favours models that overfit and ‘memorize’ the training set, namely random forests and deep neural nets. Partitioning based on quantiles of the activity distribution correctly penalizes extrapolation of models onto structurally different molecules outside of the training data. Simpler, traditional statistical methods such as ridge regression can outperform state-of-the-art machine learning methods in this setting. In addition, our new rank-based loss functions give considerably different results from mean squared error highlighting the necessity to define model optimality with respect to the decision task at hand.
Availability and implementation: All software and data are available as Jupyter notebooks found at https://github.com/owatson/QuantileBootstrap.
Supplementary information: Supplementary data are available at Bioinformatics online.
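A quantile split on the activity distribution can be sketched in a few lines. This is a single illustrative split, not the authors' quantile-activity bootstrap: the function name, the 0.8 cut, the toy activities, and the choice to hold out the most active compounds for testing are all assumptions made for this example.

```python
def quantile_split(activities: list[float], q: float) -> tuple[list[int], list[int]]:
    """Split compound indices at the q-quantile of the activity values.

    Compounds below the cut form the training set and the most active
    compounds form the test set (an assumed direction for this sketch).
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    cut = round(q * len(order))
    return sorted(order[:cut]), sorted(order[cut:])


# Hypothetical pIC50 values for ten compounds.
pic50 = [5.1, 7.8, 6.0, 8.4, 5.6, 6.9, 7.2, 4.9, 6.3, 9.0]

train_idx, test_idx = quantile_split(pic50, 0.8)
print(train_idx)  # [0, 1, 2, 4, 5, 6, 7, 8]
print(test_idx)   # [3, 9]
```

Unlike a random partition, every test compound here is more active than any training compound, so a model is scored on extrapolation rather than interpolation.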
Affiliation(s)
| | - Isidro Cortes-Ciriano
- Goring on Thames, Evariste Technologies Ltd., RG8 9AL, UK; Department of Chemistry, Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
| | - Aimee R Taylor
- Department of Epidemiology, Center for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Infectious Disease Microbiome Program, Broad Institute, Cambridge, MA 02142, USA
| | - James A Watson
- Nuffield Department of Medicine, Centre for Tropical Medicine and Global Health, University of Oxford, Oxford OX3 7LF, UK; Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
| |
|
47
|
Burggraaff L, van Vlijmen HWT, IJzerman AP, van Westen GJP. Quantitative prediction of selectivity between the A1 and A2A adenosine receptors. J Cheminform 2020; 12:33. [PMID: 33431012 PMCID: PMC7222572 DOI: 10.1186/s13321-020-00438-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/10/2019] [Accepted: 05/04/2020] [Indexed: 11/10/2022] Open
Abstract
The development of drugs is often hampered due to off-target interactions leading to adverse effects. Therefore, computational methods to assess the selectivity of ligands are of high interest. Currently, selectivity is often deduced from bioactivity predictions of a ligand for multiple targets (individual machine learning models). Here we show that modeling selectivity directly, by using the affinity difference between two drug targets as output value, leads to more accurate selectivity predictions. We test multiple approaches on a dataset consisting of ligands for the A1 and A2A adenosine receptors (among others classification, regression, and we define different selectivity classes). Finally, we present a regression model that predicts selectivity between these two drug targets by directly training on the difference in bioactivity, modeling the selectivity-window. The quality of this model was good as shown by the performances for fivefold cross-validation: ROC A1AR-selective 0.88 ± 0.04 and ROC A2AAR-selective 0.80 ± 0.07. To increase the accuracy of this selectivity model even further, inactive compounds were identified and removed prior to selectivity prediction by a combination of statistical models and structure-based docking. As a result, selectivity between the A1 and A2A adenosine receptors was predicted effectively using the selectivity-window model. The approach presented here can be readily applied to other selectivity cases.
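The idea of using the affinity difference between two targets as the modeled value can be sketched as follows. The ligand names, the pKi values, and the one-log-unit class threshold are hypothetical; the abstract does not specify how the selectivity classes were delimited.

```python
def selectivity_window(p_affinity_a1: float, p_affinity_a2a: float) -> float:
    """Affinity difference (log units) between the A1 and A2A receptors.

    Positive values favour A1, negative values favour A2A.
    """
    return p_affinity_a1 - p_affinity_a2a


def selectivity_class(window: float, threshold: float = 1.0) -> str:
    """Assign a coarse class from the window; the threshold is an assumption."""
    if window >= threshold:
        return "A1-selective"
    if window <= -threshold:
        return "A2A-selective"
    return "non-selective"


# Hypothetical ligands with (pKi at A1, pKi at A2A).
ligands = {
    "ligand_1": (8.5, 6.0),
    "ligand_2": (6.0, 7.5),
    "ligand_3": (7.0, 6.5),
}

for name, (p_a1, p_a2a) in ligands.items():
    window = selectivity_window(p_a1, p_a2a)
    print(name, window, selectivity_class(window))
```

A regression model trained on the window directly predicts one value per ligand, instead of subtracting the outputs of two separately trained affinity models.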
Affiliation(s)
- Lindsey Burggraaff
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
| | - Herman W T van Vlijmen
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands; Janssen Research & Development, Turnhoutseweg 30, Beerse, Belgium
| | - Adriaan P IJzerman
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
| | - Gerard J P van Westen
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands.
| |
|
48
|
Awale M, Riniker S, Kramer C. Matched Molecular Series Analysis for ADME Property Prediction. J Chem Inf Model 2020; 60:2903-2914. [PMID: 32369360 DOI: 10.1021/acs.jcim.0c00269] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/19/2022]
Abstract
Generation and prioritization of new molecules are the most central part of the drug design process. Matched molecular series analysis (MMSA) has recently been proposed as a formal approach that captures both of these key elements of design. In order to better understand the power of MMSA and its specific limitations, we here evaluate its performance as an ADME property prediction tool. We use four large and diverse inhouse data sets, logD, microsomal clearance, CYP2C9, and CYP3A4 inhibition. MMSA follows the concept of parallel structure-activity relationship (SAR), where if two identical substituent series on different scaffolds show similarity in their property profiles, SAR from one series can be transferred to the other series. We test four different similarity metrics to identify pairs of molecular series where information can be transferred. We find that the best prediction performance is achieved by a combination of centered root-mean-square deviation (cRMSD) and a network score approach previously published by Keefer et al. However, cRMSD alone strikes the best balance between accuracy and the number of predictions that can be made. We identify statistical metrics that allow estimating when MMSA predictions will work, similar to the well-known applicability domain concept in machine learning. MMSA achieves a prediction accuracy that is comparable to a standard machine-learning model and matched molecular pair analysis. In contrast to machine learning, however, it is very easy to understand where MMSA predictions are coming from. Finally, to prospectively test the power of MMSA, we retested compounds that were strong outliers in the initial predictions and show how the MMSA model can help to identify erroneous data points.
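A centered RMSD between two property profiles can be sketched as below. The abstract does not give the formula, so this follows the usual reading of "centered": each series is shifted to zero mean before the RMSD is taken, which makes the score insensitive to a constant offset between scaffolds. The series values are hypothetical.

```python
import math


def centered_rmsd(series_a: list[float], series_b: list[float]) -> float:
    """RMSD between two equally long property profiles after mean-centering each."""
    if len(series_a) != len(series_b) or not series_a:
        raise ValueError("series must be non-empty and of equal length")
    mean_a = sum(series_a) / len(series_a)
    mean_b = sum(series_b) / len(series_b)
    squared = [
        ((a - mean_a) - (b - mean_b)) ** 2
        for a, b in zip(series_a, series_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical logD values for the same three substituents on different scaffolds.
scaffold_1 = [1.0, 2.0, 3.0]
scaffold_2 = [2.0, 3.0, 4.0]  # same trend, shifted by a constant
scaffold_3 = [3.0, 2.0, 1.0]  # opposite trend

print(centered_rmsd(scaffold_1, scaffold_2))  # 0.0
print(centered_rmsd(scaffold_1, scaffold_3))
```

A value near zero marks two series with parallel structure-activity behaviour, which is the situation in which transferring information from one series to the other is plausible.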
Affiliation(s)
- Mahendra Awale
- Computer-Aided Drug Design/Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
| | - Sereina Riniker
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| | - Christian Kramer
- Computer-Aided Drug Design/Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland
| |
|
49
|
Sheridan RP, Karnachi P, Tudor M, Xu Y, Liaw A, Shah F, Cheng AC, Joshi E, Glick M, Alvarez J. Experimental Error, Kurtosis, Activity Cliffs, and Methodology: What Limits the Predictivity of Quantitative Structure-Activity Relationship Models? J Chem Inf Model 2020; 60:1969-1982. [PMID: 32207612 DOI: 10.1021/acs.jcim.9b01067] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 01/08/2023]
Abstract
Given a particular descriptor/method combination, some quantitative structure-activity relationship (QSAR) datasets are very predictive by random-split cross-validation while others are not. Recent literature in modelability suggests that the limiting issue for predictivity is in the data, not the QSAR methodology, and the limits are due to activity cliffs. Here, we investigate, on in-house data, the relative usefulness of experimental error, distribution of the activities, and activity cliff metrics in determining how predictive a dataset is likely to be. We include unmodified in-house datasets, datasets that should be perfectly predictive based only on the chemical structure, datasets where the distribution of activities is manipulated, and datasets that include a known amount of added noise. We find that activity cliff metrics determine predictivity better than the other metrics we investigated, whatever the type of dataset, consistent with the modelability literature. However, such metrics cannot distinguish real activity cliffs due to large uncertainties in the activities. We also show that a number of modern QSAR methods, and some alternative descriptors, are equally bad at predicting the activities of compounds on activity cliffs, consistent with the assumptions behind "modelability." Finally, we relate time-split predictivity with random-split predictivity and show that different coverages of chemical space are at least as important as uncertainty in activity and/or activity cliffs in limiting predictivity.
Affiliation(s)
- Robert P Sheridan
- Computational and Structural Chemistry, Merck & Company Inc., Kenilworth, New Jersey 07033, United States
- Prabha Karnachi
- Computational and Structural Chemistry, Merck & Company Inc., Kenilworth, New Jersey 07033, United States
- Matthew Tudor
- Computational and Structural Chemistry, Merck & Company Inc., West Point, Pennsylvania 19486, United States
- Yuting Xu
- Biometrics Research, Merck & Company Inc., Rahway, New Jersey 07065, United States
- Andy Liaw
- Biometrics Research, Merck & Company Inc., Rahway, New Jersey 07065, United States
- Falgun Shah
- Computational and Structural Chemistry, Merck & Company Inc., West Point, Pennsylvania 19486, United States
- Alan C Cheng
- Computational and Structural Chemistry, Merck & Company Inc., South San Francisco, California 94080, United States
- Elizabeth Joshi
- Pharmacokinetics, Pharmacodynamics & Drug Metabolism, Merck & Company Inc., West Point, Pennsylvania 19486, United States
- Meir Glick
- Computational and Structural Chemistry, Merck & Company Inc., Boston, Massachusetts 02115, United States
- Juan Alvarez
- Computational and Structural Chemistry, Merck & Company Inc., Boston, Massachusetts 02115, United States
|
50
|
Drakakis G, Cortés-Ciriano I, Alexander-Dann B, Bender A. Elucidating Compound Mechanism of Action and Predicting Cytotoxicity Using Machine Learning Approaches, Taking Prediction Confidence into Account. Curr Protoc Chem Biol 2020; 11:e73. [PMID: 31483099 DOI: 10.1002/cpch.73] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/06/2023]
Abstract
The modes of action (MoAs) of drugs frequently are unknown, because many are small molecules initially identified from phenotypic screens, giving rise to the need to elucidate their MoAs. In addition, the high attrition rate for candidate drugs in preclinical studies due to intolerable toxicity has motivated the development of computational approaches to predict drug candidate (cyto)toxicity as early as possible in the drug-discovery process. Here, we provide detailed instructions for capitalizing on bioactivity predictions to elucidate the MoAs of small molecules and infer their underlying phenotypic effects. We illustrate how these predictions can be used to infer the underlying antidepressive effects of marketed drugs. We also provide the necessary functionalities to model cytotoxicity data using single and ensemble machine-learning algorithms. Finally, we give detailed instructions on how to calculate confidence intervals for individual predictions using the conformal prediction framework. © 2019 by John Wiley & Sons, Inc.
Affiliation(s)
- Georgios Drakakis
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Ben Alexander-Dann
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
|