1
Gangwal A, Lavecchia A. Unleashing the power of generative AI in drug discovery. Drug Discov Today 2024; 29:103992. [PMID: 38663579 DOI: 10.1016/j.drudis.2024.103992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 03/22/2024] [Accepted: 04/18/2024] [Indexed: 05/04/2024]
Abstract
Artificial intelligence (AI) is revolutionizing drug discovery by enhancing precision, reducing timelines and costs, and enabling AI-driven computer-aided drug design. This review focuses on recent advancements in deep generative models (DGMs) for de novo drug design, exploring diverse algorithms and their profound impact. It critically analyses the challenges that are intricately interwoven into these technologies, proposing strategies to unlock their full potential. It features case studies of both successes and failures in advancing drugs to clinical trials with AI assistance. Last, it outlines a forward-looking plan for optimizing DGMs in de novo drug design, thereby fostering faster and more cost-effective drug development.
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule 424001, Maharashtra, India
- Antonio Lavecchia
- "Drug Discovery" Laboratory, Department of Pharmacy, University of Naples Federico II, I-80131 Naples, Italy.
2
van Tilborg D, Brinkmann H, Criscuolo E, Rossen L, Özçelik R, Grisoni F. Deep learning for low-data drug discovery: Hurdles and opportunities. Curr Opin Struct Biol 2024; 86:102818. [PMID: 38669740 DOI: 10.1016/j.sbi.2024.102818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 03/27/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024]
Abstract
Deep learning is becoming increasingly relevant in drug discovery, from de novo design to protein structure prediction and synthesis planning. However, it is often challenged by the small data regimes typical of certain drug discovery tasks. In such scenarios, deep learning approaches, which are notoriously 'data-hungry', might fail to live up to their promise. Developing novel approaches to leverage the power of deep learning in low-data scenarios is sparking great attention, and future developments are expected to propel the field further. This mini-review provides an overview of recent low-data-learning approaches in drug discovery, analyzing their hurdles and advantages. Finally, we venture to provide a forecast of future research directions in low-data learning for drug discovery.
Affiliation(s)
- Derek van Tilborg
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands. https://twitter.com/DerekvTilborg
- Helena Brinkmann
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/hlnbrkmnn
- Emanuele Criscuolo
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/emanuelecriscu9
- Luke Rossen
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/molecular_ml
- Rıza Özçelik
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands. https://twitter.com/Rza_ozcelik
- Francesca Grisoni
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands.
3
Arora S, Chettri S, Percha V, Kumar D, Latwal M. Artifical intelligence: a virtual chemist for natural product drug discovery. J Biomol Struct Dyn 2024; 42:3826-3835. [PMID: 37232451 DOI: 10.1080/07391102.2023.2216295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 05/12/2023] [Indexed: 05/27/2023]
Abstract
Nature is full of a bundle of medicinal substances and its product perceived as a prerogative structure to collaborate with protein drug targets. The natural product's (NPs) structure heterogeneity and eccentric characteristics inspired scientists to work on natural product-inspired medicine. To gear NP drug-finding artificial intelligence (AI) to confront and excavate unexplored opportunities. Natural product-inspired drug discoveries based on AI to act as an innovative tool for molecular design and lead discovery. Various models of machine learning produce quickly synthesizable mimetics of the natural products templates. The invention of novel natural products mimetics by computer-assisted technology provides a feasible strategy to get the natural product with defined bio-activities. AI's hit rate makes its high importance by improving trial patterns such as dose selection, trial life span, efficacy parameters, and biomarkers. Along these lines, AI methods can be a successful tool in a targeted way to formulate advanced medicinal applications for natural products. 'Prediction of future of natural product based drug discovery is not magic, actually it's artificial intelligence' Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Shefali Arora
- Department of Chemistry, University of Petroleum and Energy Studies, Dehradun, Uttarakhand, India
- Sukanya Chettri
- Department of Chemistry, University of Petroleum and Energy Studies, Dehradun, Uttarakhand, India
- Versha Percha
- Department of Pharmaceutical Chemistry, Dolphin(PG) Institute of Biomedical and Natural Sciences, Dehradun, Uttarakhand, India
- Deepak Kumar
- Department of Pharmaceutical Chemistry, Dolphin(PG) Institute of Biomedical and Natural Sciences, Dehradun, Uttarakhand, India
- Mamta Latwal
- Department of Chemistry, University of Petroleum and Energy Studies, Dehradun, Uttarakhand, India
4
Kong Y, Zhou C, Tan D, Xu X, Li Z, Cheng J. Discovery of Potential Neonicotinoid Insecticides by an Artificial Intelligence Generative Model and Structure-Based Virtual Screening. J Agric Food Chem 2024; 72:5145-5152. [PMID: 38419506 DOI: 10.1021/acs.jafc.3c06895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2024]
Abstract
The identification of neonicotinoid insecticides bearing novel scaffolds is of great importance for pesticide discovery. Here, artificial intelligence-based tools and virtual screening strategy were integrated to discover potential leads of neonicotinoid insecticides. A deep generative model was successfully constructed using a recurrent neural network combined with transfer learning. The model evaluation showed that the pretrained model could accurately grasp the SMILES grammar of drug-like molecules and generate potential neonicotinoid compounds after transfer learning. The generated molecules were evaluated by hierarchical virtual screening, hits were subjected to a similarity search, and the most similar structures were purchased for the bioassay. Compounds A2 and A5 displayed 52.5 and 50.3% mortality rates against Aphis craccivora at 100 mg/L, respectively. The docking study indicated that these two compounds have similar binding modes to neonicotinoids, which were verified by further molecular dynamics simulations.
Affiliation(s)
- Yijin Kong
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Cong Zhou
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Du Tan
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Xiaoyong Xu
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Zhong Li
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Jiagao Cheng
- Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
5
Gangwal A, Ansari A, Ahmad I, Azad AK, Kumarasamy V, Subramaniyan V, Wong LS. Generative artificial intelligence in drug discovery: basic framework, recent advances, challenges, and opportunities. Front Pharmacol 2024; 15:1331062. [PMID: 38384298 PMCID: PMC10879372 DOI: 10.3389/fphar.2024.1331062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 01/17/2024] [Indexed: 02/23/2024] [Open Access]
Abstract
There are two main ways to discover or design small drug molecules. The first involves fine-tuning existing molecules or commercially successful drugs through quantitative structure-activity relationships and virtual screening. The second approach involves generating new molecules through de novo drug design or inverse quantitative structure-activity relationship. Both methods aim to get a drug molecule with the best pharmacokinetic and pharmacodynamic profiles. However, bringing a new drug to market is an expensive and time-consuming endeavor, with the average cost being estimated at around $2.5 billion. One of the biggest challenges is screening the vast number of potential drug candidates to find one that is both safe and effective. The development of artificial intelligence in recent years has been phenomenal, ushering in a revolution in many fields. The field of pharmaceutical sciences has also significantly benefited from multiple applications of artificial intelligence, especially drug discovery projects. Artificial intelligence models are finding use in molecular property prediction, molecule generation, virtual screening, synthesis planning, repurposing, among others. Lately, generative artificial intelligence has gained popularity across domains for its ability to generate entirely new data, such as images, sentences, audios, videos, novel chemical molecules, etc. Generative artificial intelligence has also delivered promising results in drug discovery and development. This review article delves into the fundamentals and framework of various generative artificial intelligence models in the context of drug discovery via de novo drug design approach. Various basic and advanced models have been discussed, along with their recent applications. 
The review also explores recent examples and advances in the generative artificial intelligence approach, as well as the challenges and ongoing efforts to fully harness the potential of generative artificial intelligence in generating novel drug molecules in a faster and more affordable manner. Some clinical-level assets generated from generative artificial intelligence have also been discussed in this review to show the ever-increasing application of artificial intelligence in drug discovery through commercial partnerships.
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
- Azim Ansari
- Computer Aided Drug Design Center, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
- Iqrar Ahmad
- Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Dhule, India
- Abul Kalam Azad
- Faculty of Pharmacy, University College of MAIWP International, Batu Caves, Malaysia
- Vinoth Kumarasamy
- Department of Parasitology and Medical Entomology, Faculty of Medicine, Universiti Kebangsaan Malaysia, Cheras, Malaysia
- Vetriselvan Subramaniyan
- Pharmacology Unit, Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Selangor, Malaysia
- School of Bioengineering and Biosciences, Lovely Professional University, Phagwara, Punjab, India
- Ling Shing Wong
- Faculty of Health and Life Sciences, INTI International University, Nilai, Malaysia
6
Matos GDR, Pak S, Rizzo RC. Descriptor-Driven de Novo Design Algorithms for DOCK6 Using RDKit. J Chem Inf Model 2023; 63:5803-5822. [PMID: 37698425 PMCID: PMC10694857 DOI: 10.1021/acs.jcim.3c01031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/13/2023]
Abstract
Structure-based methods that employ principles of de novo design can be used to construct small organic molecules from scratch using pre-existing fragment libraries to sample chemical space and are an important class of computational algorithms for drug-lead discovery. Here, we present a powerful new design method for DOCK6 that employs a Descriptor-Driven De Novo strategy (termed D3N) in which user-defined cheminformatics descriptors (and their target ranges) are calculated at each layer of growth using the open-source toolkit RDKit. The objective is to tailor ligand growth toward desirable regions of chemical space. The approach was extensively validated through: (1) comparison of cheminformatics descriptors computed using the new DOCK6/RDKit interface versus the standard Python/RDKit installation, (2) examination of descriptor distributions generated using D3N growth under different conditions (target ranges and environments), and (3) construction of ligands with very tight (pinpoint) descriptor ranges using clinically relevant compounds as a reference. Our testing confirms that the new DOCK6/RDKit integration is robust, showcases how the new D3N routines can be used to direct sampling around user-defined chemical spaces, and highlights the utility of on-the-fly descriptor calculations for ligand design to important drug targets.
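As a reading aid, the idea of checking descriptor target ranges at each layer of growth can be illustrated with a toy sketch. This is not DOCK6 or RDKit code and not the authors' D3N implementation: the "molecule" is just a list of fragment labels, the two descriptor functions and their values are hypothetical stand-ins, and pruning on upper bounds during growth (with the full range applied only at the end) is a simplifying assumption.

```python
from typing import Callable, Dict, List, Tuple

Molecule = List[str]  # toy representation: a list of fragment labels

# Hypothetical per-fragment contributions; not real fragment masses.
TOY_WEIGHT = {"C": 12.0, "N": 14.0, "O": 16.0}


def toy_weight(mol: Molecule) -> float:
    """Stand-in for a molecular-weight-like descriptor."""
    return sum(TOY_WEIGHT[f] for f in mol)


def toy_heteroatoms(mol: Molecule) -> float:
    """Stand-in for a heteroatom-count descriptor."""
    return float(sum(1 for f in mol if f != "C"))


def grow(
    seed: Molecule,
    fragments: List[str],
    layers: int,
    descriptors: Dict[str, Callable[[Molecule], float]],
    ranges: Dict[str, Tuple[float, float]],
) -> List[Molecule]:
    """Grow layer by layer, recomputing descriptors after every addition."""
    frontier = [seed]
    for _ in range(layers):
        next_frontier = []
        for mol in frontier:
            for frag in fragments:
                candidate = mol + [frag]
                # Assumption: prune as soon as any descriptor exceeds its upper bound.
                if all(fn(candidate) <= ranges[name][1] for name, fn in descriptors.items()):
                    next_frontier.append(candidate)
        frontier = next_frontier
    # After the last layer, keep only candidates inside the full target ranges.
    return [
        mol
        for mol in frontier
        if all(ranges[name][0] <= fn(mol) <= ranges[name][1] for name, fn in descriptors.items())
    ]


descriptors = {"weight": toy_weight, "heteroatoms": toy_heteroatoms}
ranges = {"weight": (36.0, 42.0), "heteroatoms": (0.0, 1.0)}
designs = grow(["C"], ["C", "N", "O"], layers=2, descriptors=descriptors, ranges=ranges)
```

With these toy settings, candidates carrying two heteroatoms are discarded at the layer where they appear, so they are never extended further.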
Affiliation(s)
- Guilherme Duarte Ramos Matos
- Department of Applied Mathematics & Statistics, Stony Brook University, Stony Brook, New York 11794, USA
- Instituto de Química, Universidade de Brasília, Brasília, Distrito Federal, 70910-900, Brazil
- Steven Pak
- Department of Pharmacological Sciences, Stony Brook University, Stony Brook, New York, 11794, USA
- Robert C. Rizzo
- Department of Applied Mathematics & Statistics, Stony Brook University, Stony Brook, New York 11794, USA
- Institute of Chemical Biology & Drug Discovery, Stony Brook University, Stony Brook, New York 11794, USA
- Laufer Center for Physical & Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, USA
7
Bae B, Bae H, Nam H. LOGICS: Learning optimal generative distribution for designing de novo chemical structures. J Cheminform 2023; 15:77. [PMID: 37674239 PMCID: PMC10483765 DOI: 10.1186/s13321-023-00747-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 08/23/2023] [Indexed: 09/08/2023] [Open Access]
Abstract
In recent years, the field of computational drug design has made significant strides in the development of artificial intelligence (AI) models for the generation of de novo chemical compounds with desired properties and biological activities, such as enhanced binding affinity to target proteins. These high-affinity compounds have the potential to be developed into more potent therapeutics for a broad spectrum of diseases. Due to the lack of data required for the training of deep generative models, however, some of these approaches have fine-tuned their molecular generators using data obtained from a separate predictor. While these studies show that generative models can produce structures with the desired target properties, it remains unclear whether the diversity of the generated structures and the span of their chemical space align with the distribution of the intended target molecules. In this study, we present a novel generative framework, LOGICS, a framework for Learning Optimal Generative distribution Iteratively for designing target-focused Chemical Structures. We address the exploration-exploitation dilemma, which weighs the choice between exploring new options and exploiting current knowledge. To tackle this issue, we incorporate experience memory and employ a layered tournament selection approach to refine the fine-tuning process. The proposed method was applied to the binding affinity optimization of two target proteins of different protein classes, κ-opioid receptors, and PIK3CA, and the quality and the distribution of the generative molecules were evaluated. The results showed that LOGICS outperforms competing state-of-the-art models and generates more diverse de novo chemical structures with optimized properties. The source code is available at the GitHub repository ( https://github.com/GIST-CSBL/LOGICS ).
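For readers unfamiliar with the two ingredients named in this abstract, the following sketch shows a generic experience memory combined with plain tournament selection. It is an illustration only, not the LOGICS code: the abstract does not describe how the "layered" variant works, so a single-layer tournament is used instead, and the function names, scores, and capacity are all hypothetical.

```python
import random
from typing import Dict, List


def update_memory(memory: Dict[str, float], batch: Dict[str, float], capacity: int) -> Dict[str, float]:
    """Merge newly scored molecules into the memory and keep the top `capacity` by score."""
    merged = {**memory, **batch}
    top = sorted(merged.items(), key=lambda item: item[1], reverse=True)[:capacity]
    return dict(top)


def tournament_select(
    pool: Dict[str, float],
    n_winners: int,
    tournament_size: int,
    rng: random.Random,
) -> List[str]:
    """Run `n_winners` tournaments; each draws random contestants and keeps the best one."""
    names = list(pool)
    winners = []
    for _ in range(n_winners):
        contestants = rng.sample(names, min(tournament_size, len(names)))
        winners.append(max(contestants, key=pool.__getitem__))
    return winners


rng = random.Random(0)
# Hypothetical predicted scores for three generated SMILES strings.
memory = update_memory({}, {"CCO": 0.1, "c1ccccc1": 0.9, "CCN": 0.5}, capacity=2)
selected = tournament_select(memory, n_winners=3, tournament_size=2, rng=rng)
```

Smaller tournaments let weaker candidates win more often (more exploration), while larger ones favour the current best (more exploitation), which is the trade-off the abstract refers to.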
Affiliation(s)
- Bongsung Bae
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
- Haelee Bae
- AI Graduate School, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
- Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea.
- AI Graduate School, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea.
- Center for AI-Applied High Efficiency Drug Discovery (AHEDD), Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea.
8
Chenthamarakshan V, Hoffman SC, Owen CD, Lukacik P, Strain-Damerell C, Fearon D, Malla TR, Tumber A, Schofield CJ, Duyvesteyn HME, Dejnirattisai W, Carrique L, Walter TS, Screaton GR, Matviiuk T, Mojsilovic A, Crain J, Walsh MA, Stuart DI, Das P. Accelerating drug target inhibitor discovery with a deep generative foundation model. Sci Adv 2023; 9:eadg7865. [PMID: 37343087 DOI: 10.1126/sciadv.adg7865] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/20/2023] [Accepted: 05/17/2023] [Indexed: 06/23/2023]
Abstract
Inhibitor discovery for emerging drug-target proteins is challenging, especially when target structure or active molecules are unknown. Here, we experimentally validate the broad utility of a deep generative framework trained at-scale on protein sequences, small molecules, and their mutual interactions, unbiased toward any specific target. We performed a protein sequence-conditioned sampling on the generative foundation model to design small-molecule inhibitors for two dissimilar targets: the spike protein receptor-binding domain (RBD) and the main protease from SARS-CoV-2. Despite using only the target sequence information during the model inference, micromolar-level inhibition was observed in vitro for two candidates out of four synthesized for each target. The most potent spike RBD inhibitor exhibited activity against several variants in live virus neutralization assays. These results establish that a single, broadly deployable generative foundation model for accelerated inhibitor discovery is effective and efficient, even in the absence of target structure or binder information.
Affiliation(s)
- Samuel C Hoffman
- IBM Research, Thomas J. Watson Research Center, Yorktown Heights, New York, NY, USA
- C David Owen
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK
- Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
- Petra Lukacik
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK
- Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
- Claire Strain-Damerell
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK
- Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
- Daren Fearon
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK
- Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
- Tika R Malla
- Chemistry Research Laboratory, Department of Chemistry and the Ineos Oxford Institute for Antimicrobial Research, University of Oxford, 12 Mansfield Road, OX1 3TA Oxford, UK
- Anthony Tumber
- Chemistry Research Laboratory, Department of Chemistry and the Ineos Oxford Institute for Antimicrobial Research, University of Oxford, 12 Mansfield Road, OX1 3TA Oxford, UK
- Christopher J Schofield
- Chemistry Research Laboratory, Department of Chemistry and the Ineos Oxford Institute for Antimicrobial Research, University of Oxford, 12 Mansfield Road, OX1 3TA Oxford, UK
- Helen M E Duyvesteyn
- Division of Structural Biology, University of Oxford, The Wellcome Centre for Human Genetics, Headington, Oxford, UK
- Wanwisa Dejnirattisai
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Loic Carrique
- Division of Structural Biology, University of Oxford, The Wellcome Centre for Human Genetics, Headington, Oxford, UK
- Thomas S Walter
- Division of Structural Biology, University of Oxford, The Wellcome Centre for Human Genetics, Headington, Oxford, UK
- Gavin R Screaton
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Jason Crain
- IBM Research Europe, Hartree Centre, Daresbury WA4 4AD, UK
- Department of Biochemistry, University of Oxford, Oxford OX1 3QU, UK
- Martin A Walsh
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK
- Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
- David I Stuart
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK
- Division of Structural Biology, University of Oxford, The Wellcome Centre for Human Genetics, Headington, Oxford, UK
- Payel Das
- IBM Research, Thomas J. Watson Research Center, Yorktown Heights, New York, NY, USA
9
Ballarotto M, Willems S, Stiller T, Nawa F, Marschner JA, Grisoni F, Merk D. De Novo Design of Nurr1 Agonists via Fragment-Augmented Generative Deep Learning in Low-Data Regime. J Med Chem 2023. [PMID: 37256819 DOI: 10.1021/acs.jmedchem.3c00485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Generative neural networks trained on SMILES can design innovative bioactive molecules de novo. These so-called chemical language models (CLMs) have typically been trained on tens of template molecules for fine-tuning. However, it is challenging to apply CLM to orphan targets with few known ligands. We have fine-tuned a CLM with a single potent Nurr1 agonist as template in a fragment-augmented fashion and obtained novel Nurr1 agonists using sampling frequency for design prioritization. Nanomolar potency and binding affinity of the top-ranking design and its structural novelty compared to available Nurr1 ligands highlight its value as an early chemical tool and as a lead for Nurr1 agonist development, as well as the applicability of CLM in very low-data scenarios.
Affiliation(s)
- Marco Ballarotto
- Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
- Department of Pharmaceutical Sciences, Università degli Studi di Perugia, 06123 Perugia, Italy
- Sabine Willems
- Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
- Tanja Stiller
- Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
- Felix Nawa
- Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
- Julian A Marschner
- Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
- Francesca Grisoni
- Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, 5612AZ Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584CB Utrecht, The Netherlands
- Daniel Merk
- Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
10
Moret M, Pachon Angona I, Cotos L, Yan S, Atz K, Brunner C, Baumgartner M, Grisoni F, Schneider G. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat Commun 2023; 14:114. [PMID: 36611029 PMCID: PMC9825622 DOI: 10.1038/s41467-022-35692-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 10/26/2021] [Accepted: 12/19/2022] [Indexed: 01/09/2023] [Open Access]
Abstract
Generative chemical language models (CLMs) can be used for de novo molecular structure generation by learning from a textual representation of molecules. Here, we show that hybrid CLMs can additionally leverage the bioactivity information available for the training compounds. To computationally design ligands of phosphoinositide 3-kinase gamma (PI3Kγ), a collection of virtual molecules was created with a generative CLM. This virtual compound library was refined using a CLM-based classifier for bioactivity prediction. This second hybrid CLM was pretrained with patented molecular structures and fine-tuned with known PI3Kγ ligands. Several of the computer-generated molecular designs were commercially available, enabling fast prescreening and preliminary experimental validation. A new PI3Kγ ligand with sub-micromolar activity was identified, highlighting the method's scaffold-hopping potential. Chemical synthesis and biochemical testing of two of the top-ranked de novo designed molecules and their derivatives corroborated the model's ability to generate PI3Kγ ligands with medium to low nanomolar activity for hit-to-lead expansion. The most potent compounds led to pronounced inhibition of PI3K-dependent Akt phosphorylation in a medulloblastoma cell model, demonstrating efficacy of PI3Kγ ligands in PI3K/Akt pathway repression in human tumor cells. The results positively advocate hybrid CLMs for virtual compound screening and activity-focused molecular design.
Affiliation(s)
- Michael Moret
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Irene Pachon Angona
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Leandro Cotos
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Shen Yan
- University of Zurich, University Children's Hospital, Children's Research Center, Pediatric Molecular Neuro-Oncology Research, Lengghalde 5, 8008, Zurich, Switzerland
- Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Cyrill Brunner
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Martin Baumgartner
- University of Zurich, University Children's Hospital, Children's Research Center, Pediatric Molecular Neuro-Oncology Research, Lengghalde 5, 8008, Zurich, Switzerland
- Francesca Grisoni
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland; Eindhoven University of Technology, Institute for Complex Molecular Systems and Eindhoven Artificial Intelligence Systems Institute, Department of Biomedical Engineering, Groene Loper 7, 5612AZ, Eindhoven, The Netherlands; Center for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, 3584 CB, The Netherlands.
- Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland; ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore, 138602, Singapore.
11
Ogawa K, Sakamoto D, Hosoki R. Computer Science Technology in Natural Products Research: A Review of Its Applications and Implications. Chem Pharm Bull (Tokyo) 2023; 71:486-494. [PMID: 37394596 DOI: 10.1248/cpb.c23-00039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Computational approaches to drug development are rapidly growing in popularity and have been used to produce significant results. Recent developments in information science have expanded databases and chemical informatics knowledge relating to natural products. Natural products have long been well-studied, and a large number of unique structures and remarkable active substances have been reported. Analyzing accumulated natural product knowledge using emerging computational science techniques is expected to yield more new discoveries. In this article, we discuss the current state of natural product research using machine learning. The basic concepts and frameworks of machine learning are summarized. Natural product research that utilizes machine learning is described in terms of the exploration of active compounds, automatic compound design, and application to spectral data. In addition, efforts to develop drugs for intractable diseases will be addressed. Lastly, we discuss key considerations for applying machine learning in this field. This paper aims to promote progress in natural product research by presenting the current state of computational science and chemoinformatics approaches in terms of its applications, strengths, limitations, and implications for the field.
Affiliation(s)
- Keiko Ogawa
- Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University
- Daiki Sakamoto
- Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University
- Rumiko Hosoki
- Laboratory of Regulatory Science, College of Pharmaceutical Sciences, Ritsumeikan University
12
Kumar R, Sharma A, Alexiou A, Ashraf GM. Artificial Intelligence in De novo Drug Design: Are We Still There? Curr Top Med Chem 2022; 22:2483-2492. [PMID: 36263480 DOI: 10.2174/1568026623666221017143244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 09/06/2022] [Accepted: 09/15/2022] [Indexed: 01/20/2023]
Abstract
BACKGROUND: The artificial intelligence (AI)-assisted design of drug candidates with novel structures and desired properties has received significant attention in the recent past, as have related areas of forward prediction that aim to discover chemical matter worth synthesizing and investigating experimentally. OBJECTIVES: The purpose behind developing AI-driven models is to explore the broader chemical space and suggest new drug candidate scaffolds with promising therapeutic value. Moreover, it is anticipated that such AI-based models may not only significantly reduce cost and time but also decrease the attrition rate of drug candidates that fail to reach the desirable endpoints at the final stages of drug development. In an attempt to develop AI-based models for de novo drug design, numerous methods have been proposed by various study groups by applying machine learning and deep learning algorithms to chemical datasets. However, there are many challenges in obtaining accurate predictions, and real breakthroughs in de novo drug design are still scarce. METHODS: In this review, we explore the recent trends in developing AI-based models for de novo drug design to assess the current status, challenges, and opportunities in the field. CONCLUSION: The consistently improving AI algorithms and the abundance of curated training chemical data indicate that AI-based de novo drug design should perform better than the current models. Improvements in performance are warranted to obtain better outcomes in the form of potential drug candidates that can perform well under in vivo conditions, especially in the case of more complex diseases.
Affiliation(s)
- Rajnish Kumar
- Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Uttar Pradesh, India
| | - Anju Sharma
- Department of Applied Science, Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
| | - Athanasios Alexiou
- Novel Global Community Educational Foundation, Hebersham, 2770 NSW, Australia; AFNP Med Austria, 1010 Wien, Austria
| | - Ghulam Md Ashraf
- Pre-Clinical Research Unit (PCRU), King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
|
13
|
Korshunova M, Huang N, Capuzzi S, Radchenko DS, Savych O, Moroz YS, Wells CI, Willson TM, Tropsha A, Isayev O. Generative and reinforcement learning approaches for the automated de novo design of bioactive compounds. Commun Chem 2022; 5:129. [PMID: 36697952 PMCID: PMC9814657 DOI: 10.1038/s42004-022-00733-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 07/18/2021] [Accepted: 09/12/2022] [Indexed: 01/28/2023] [Open access]
Abstract
Deep generative neural networks have been used increasingly in computational chemistry for de novo design of molecules with desired properties. Many deep learning approaches employ reinforcement learning for optimizing the target properties of the generated molecules. However, the success of this approach is often hampered by the problem of sparse rewards, as the majority of the generated molecules are, as expected, predicted to be inactive. We propose several technical innovations to address this problem and improve the balance between exploration and exploitation modes in reinforcement learning. In a proof-of-concept study, we demonstrate the application of the deep generative recurrent neural network architecture enhanced by several proposed technical tricks to design inhibitors of the epidermal growth factor receptor (EGFR) and further experimentally validate their potency. The proposed technical solutions are expected to substantially improve the success rate of finding novel bioactive compounds for specific biological targets using generative and reinforcement learning approaches.
Affiliation(s)
- Maria Korshunova
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA, USA; Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Niles Huang
- Department of Biochemistry, University of Oxford, Oxford, UK
| | - Stephen Capuzzi
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Dmytro S Radchenko
- Enamine Ltd, 78 Chervonotkatska Street, Kyiv, 02094, Ukraine; Taras Shevchenko National University of Kyiv, Volodymyrska Street 60, Kyiv, 01601, Ukraine
| | - Olena Savych
- Enamine Ltd, 78 Chervonotkatska Street, Kyiv, 02094, Ukraine
| | - Yuriy S Moroz
- Taras Shevchenko National University of Kyiv, Volodymyrska Street 60, Kyiv, 01601, Ukraine; Chemspace LLC, Chervonotkatska Street 85, Suite 1, Kyiv, 02094, Ukraine
| | - Carrow I Wells
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Timothy M Willson
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA, USA; Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
|
14
|
Li C, Wang C, Sun M, Zeng Y, Yuan Y, Gou Q, Wang G, Guo Y, Pu X. Correlated RNN Framework to Quickly Generate Molecules with Desired Properties for Energetic Materials in the Low Data Regime. J Chem Inf Model 2022; 62:4873-4887. [PMID: 35998331 DOI: 10.1021/acs.jcim.2c00997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2023]
Abstract
Motivated by the challenges of deep learning in the low data regime and the urgent demand for intelligent design of highly energetic materials, we explore a correlated deep learning framework, which consists of three recurrent neural networks (RNNs) correlated by the transfer learning strategy, to efficiently generate new energetic molecules with a high detonation velocity when very limited data are available. To avoid dependence on an external big data set, data augmentation by fragment shuffling of 303 energetic compounds is utilized to produce 500,000 molecules to pretrain the RNN, through which the model can learn sufficient structural knowledge. Then the pretrained RNN is fine-tuned on the 303 energetic compounds to generate 7153 molecules similar to the energetic compounds. To more reliably screen the molecules with a high detonation velocity, SMILES enumeration augmentation coupled with the pretrained knowledge is utilized to build an RNN-based prediction model, through which R2 is boosted from 0.4446 to 0.9572. The comparable performance with the transfer learning strategy based on an existing big database (ChEMBL) to produce the energetic molecules and drug-like ones further supports the effectiveness and generality of our strategy in the low data regime. High-precision quantum mechanics calculations further confirm that 35 new molecules present a higher detonation velocity and lower synthetic accessibility than the classic explosive RDX, along with good thermal stability. In particular, three new molecules are comparable to caged CL-20 in detonation velocity. All the source code and the data set are freely available at https://github.com/wangchenghuidream/RNNMGM.
Affiliation(s)
- Chuan Li
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Chenghui Wang
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Ming Sun
- College of Chemistry, Sichuan University, Chengdu 610064, China
| | - Yan Zeng
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Yuan Yuan
- College of Management, Southwest University for Nationalities, Chengdu 610041, China
| | - Qiaolin Gou
- College of Chemistry, Sichuan University, Chengdu 610064, China
| | - Guangchuan Wang
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu 610064, China
| | - Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu 610064, China
|
15
|
Bung N, Krishnan SR, Roy A. An In Silico Explainable Multiparameter Optimization Approach for De Novo Drug Design against Proteins from the Central Nervous System. J Chem Inf Model 2022; 62:2685-2695. [PMID: 35581002 DOI: 10.1021/acs.jcim.2c00462] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023]
Abstract
The aim of drug design and development is to produce a drug that can inhibit the target protein and possess a balanced physicochemical and toxicity profile. Traditionally, this is a multistep process where different parameters such as activity and physicochemical and pharmacokinetic properties are optimized sequentially, which often leads to a high attrition rate during later stages of drug design and development. We have developed a deep learning-based de novo drug design method that can design novel small molecules by optimizing target specificity as well as multiple parameters (including late-stage parameters) in a single step. All possible combinations of parameters were optimized to understand the effect of each parameter on the other parameters. An explainable predictive model was used to identify the molecular fragments responsible for the property being optimized. The proposed method was applied against the human 5-hydroxytryptamine receptor 1B (5-HT1B), a protein from the central nervous system (CNS). Various physicochemical properties specific to CNS drugs were considered along with the target specificity and blood-brain barrier permeability (BBBP), which poses an additional challenge for CNS drug delivery. The contribution of each parameter toward molecule design was identified by analyzing the properties of small molecules generated from optimization of all possible parameter combinations. The final optimized generative model was able to design inhibitors similar to known inhibitors of 5-HT1B. In addition, the functional groups of the generated small molecules that guide the BBBP predictive model were identified through feature attribution techniques.
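As a rough illustration of what optimizing "all possible combinations of parameters" involves, the sketch below enumerates every non-empty subset of a small set of objectives and scores a molecule on each subset. The parameter names, the example scores, and the equal-weight averaging are assumptions made for illustration; the abstract does not state how the objectives are combined.

```python
from itertools import combinations

# Hypothetical objective names; the abstract mentions target specificity,
# physicochemical properties and BBBP but does not give an exact list.
PARAMETERS = ["target_specificity", "logP", "BBBP"]


def parameter_combinations(parameters):
    """Return every non-empty subset of the parameters, smallest subsets first."""
    subsets = []
    for size in range(1, len(parameters) + 1):
        subsets.extend(combinations(parameters, size))
    return subsets


def combined_reward(scores, active):
    """Average the per-parameter scores over the active subset.

    Assumes each score is already normalised to [0, 1]; equal weighting
    is an illustrative choice, not the published method.
    """
    return sum(scores[name] for name in active) / len(active)


if __name__ == "__main__":
    # Made-up scores for a single generated molecule.
    scores = {"target_specificity": 0.8, "logP": 0.5, "BBBP": 0.2}
    for subset in parameter_combinations(PARAMETERS):
        print(subset, round(combined_reward(scores, subset), 3))
```

With three objectives there are seven non-empty subsets, so the number of optimization runs grows as 2^n - 1 with the number of parameters n.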
Affiliation(s)
- Navneet Bung
- TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad 500081, India
| | - Arijit Roy
- TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad 500081, India
|
16
|
Tao Xue H, Stanley-Baker M, Wai Kin Kong A, Leung Li H, Wen Bin Goh W. Data considerations for predictive modeling applied to the discovery of bioactive natural products. Drug Discov Today 2022; 27:2235-2243. [DOI: 10.1016/j.drudis.2022.05.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Revised: 03/21/2022] [Accepted: 05/10/2022] [Indexed: 11/29/2022]
|
17
|
A Consensus Compound/Bioactivity Dataset for Data-Driven Drug Design and Chemogenomics. Molecules 2022; 27:2513. [PMID: 35458710 PMCID: PMC9028877 DOI: 10.3390/molecules27082513] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/02/2022] [Revised: 03/31/2022] [Accepted: 04/10/2022] [Indexed: 02/01/2023] [Open access]
Abstract
Publicly available compound and bioactivity databases provide an essential basis for data-driven applications in life-science research and drug design. By analyzing several bioactivity repositories, we discovered differences in compound and target coverage that argue for the combined use of data from multiple sources. Using data from ChEMBL, PubChem, IUPHAR/BPS, BindingDB, and Probes & Drugs, we assembled a consensus dataset focusing on small molecules with bioactivity on human macromolecular targets. This allowed an improved coverage of compound space and targets, and an automated comparison and curation of structural and bioactivity data to reveal potentially erroneous entries and increase confidence. The consensus dataset comprises more than 1.1 million compounds with over 10.9 million bioactivity data points with annotations on assay type and bioactivity confidence, providing a useful ensemble for computational applications in drug design and chemogenomics.
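The automated comparison step described above can be pictured with a small sketch: records for the same compound and target are grouped across sources, and pairs whose reported values disagree by more than a threshold are flagged for review. The record layout, the compound keys, the activity values, and the threshold of 1.0 log unit are all hypothetical; the abstract does not describe the actual curation rules.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (compound key, target, source database, pActivity).
RECORDS = [
    ("CPD-1", "T1", "ChEMBL", 7.1),
    ("CPD-1", "T1", "BindingDB", 7.3),
    ("CPD-2", "T1", "ChEMBL", 5.0),
    ("CPD-2", "T1", "PubChem", 8.0),
    ("CPD-3", "T2", "IUPHAR/BPS", 6.4),
]


def build_consensus(records, max_spread=1.0):
    """Merge per-source activities and flag compound/target pairs that disagree.

    max_spread is an assumed tolerance in log units, chosen for illustration.
    """
    grouped = defaultdict(dict)
    for compound, target, source, value in records:
        grouped[(compound, target)].setdefault(source, []).append(value)

    consensus = {}
    for key, by_source in grouped.items():
        # One mean value per source, so a source with many repeats
        # does not dominate the comparison.
        per_source = [mean(values) for values in by_source.values()]
        spread = max(per_source) - min(per_source)
        consensus[key] = {
            "sources": sorted(by_source),
            "mean_value": mean(per_source),
            "flagged": len(per_source) > 1 and spread > max_spread,
        }
    return consensus


if __name__ == "__main__":
    for key, entry in build_consensus(RECORDS).items():
        print(key, entry)
```

Entries reported by a single source are never flagged in this sketch, since there is nothing to compare them against; they simply carry lower confidence.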
|
18
|
Creanza TM, Lamanna G, Delre P, Contino M, Corriero N, Saviano M, Mangiatordi GF, Ancona N. DeLA-Drug: A Deep Learning Algorithm for Automated Design of Druglike Analogues. J Chem Inf Model 2022; 62:1411-1424. [PMID: 35294184 DOI: 10.1021/acs.jcim.2c00205] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 12/12/2022]
Abstract
In this paper, we present a deep learning algorithm for automated design of druglike analogues (DeLA-Drug), a recurrent neural network (RNN) model composed of two long short-term memory (LSTM) layers and conceived for data-driven generation of similar-to-bioactive compounds. DeLA-Drug captures the syntax of SMILES strings of more than 1 million compounds belonging to the ChEMBL28 database and, by employing a new strategy called sampling with substitutions (SWS), generates molecules starting from a single user-defined query compound. Remarkably, the algorithm preserves druglikeness and synthetic accessibility of the known bioactive compounds present in the ChEMBL28 repository. The absence of any time-demanding fine-tuning procedure enables DeLA-Drug to perform a fast generation of focused libraries for further high-throughput screening and makes it a suitable tool for performing de novo design even in low-data regimes. To provide a concrete idea of its applicability, DeLA-Drug was applied to the cannabinoid receptor subtype 2 (CB2R), a known target involved in different pathological conditions such as cancer and neurodegeneration. DeLA-Drug, available as a free web platform (http://www.ba.ic.cnr.it/softwareic/deladrugportal/), can help medicinal chemists interested in generating analogues of compounds already available in their laboratories and, for this reason, good candidates for an easy and low-cost synthesis.
Affiliation(s)
- Teresa Maria Creanza
- CNR─Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, Via Amendola 122/o, 70126 Bari, Italy
| | - Giuseppe Lamanna
- Chemistry Department, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125 Bari, Italy; CNR─Institute of Crystallography, Via Amendola 122/o, 70126 Bari, Italy
| | - Pietro Delre
- Chemistry Department, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125 Bari, Italy; CNR─Institute of Crystallography, Via Amendola 122/o, 70126 Bari, Italy
| | - Marialessandra Contino
- Department of Pharmacy─Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125 Bari, Italy
| | - Nicola Corriero
- CNR─Institute of Crystallography, Via Amendola 122/o, 70126 Bari, Italy
| | - Michele Saviano
- CNR─Institute of Crystallography, Via Amendola 122/o, 70126 Bari, Italy
| | - Nicola Ancona
- CNR─Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, Via Amendola 122/o, 70126 Bari, Italy
|
19
|
Sumita M, Terayama K, Suzuki N, Ishihara S, Tamura R, Chahal MK, Payne DT, Yoshizoe K, Tsuda K. De novo creation of a naked eye-detectable fluorescent molecule based on quantum chemical computation and machine learning. Science Advances 2022; 8:eabj3906. [PMID: 35263133 PMCID: PMC8906732 DOI: 10.1126/sciadv.abj3906] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/11/2021] [Accepted: 01/19/2022] [Indexed: 06/14/2023]
Abstract
Designing fluorescent molecules requires considering multiple interrelated molecular properties, as opposed to properties that correlate straightforwardly with molecular structure, such as light absorption. In this study, we have used a de novo molecule generator (DNMG) coupled with quantum chemical computation (QC) to develop fluorescent molecules, which are garnering significant attention in various disciplines. Using massively parallel computation (1024 cores, 5 days), the DNMG produced 3643 candidate molecules. We selected one unreported molecule and seven reported molecules and synthesized them. Photoluminescence spectrum measurements demonstrated that the DNMG can successfully design fluorescent molecules with 75% accuracy (n = 6/8) and create an unreported molecule that emits fluorescence detectable by the naked eye.
Affiliation(s)
- Masato Sumita
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
| | - Kei Terayama
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Kanagawa 230-0045, Japan
- Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku, Kyoto 606-8507, Japan
- Medical Sciences Innovation Hub Program, RIKEN Cluster for Science, Technology and Innovation Hub, Tsurumi-ku, Kanagawa 230-0045, Japan
| | - Naoya Suzuki
- Materials Science and Engineering, Osaka Prefecture University, 1-1 Gakuen-cho, Nakaku, Sakai, Osaka 599-8531, Japan
| | - Shinsuke Ishihara
- International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
| | - Ryo Tamura
- International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
- Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Chiba 277-8561, Japan
| | - Mandeep K. Chahal
- Department of Chemistry, University of Southampton, University Road, Highfield, Southampton SO17 1BJ, UK
| | - Daniel T. Payne
- International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
- International Center for Young Scientists (ICYS), National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
| | - Kazuki Yoshizoe
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Research Institute for Information Technology (RIIT), Kyushu University, 744 Motooka, Nishi-ku, Fukuoka City, Fukuoka 819-0395, Japan
| | - Koji Tsuda
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
- Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Chiba 277-8561, Japan
|
20
|
Bian Y, Xie XQ. Artificial Intelligent Deep Learning Molecular Generative Modeling of Scaffold-Focused and Cannabinoid CB2 Target-Specific Small-Molecule Sublibraries. Cells 2022; 11:915. [PMID: 35269537 PMCID: PMC8909864 DOI: 10.3390/cells11050915] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/27/2022] [Revised: 02/26/2022] [Accepted: 02/26/2022] [Indexed: 02/01/2023] [Open access]
Abstract
Design and generation of high-quality target- and scaffold-specific small molecules is an important strategy for the discovery of unique and potent bioactive drug molecules. To achieve this goal, the authors developed the deep-learning molecule generation model (DeepMGM) and applied it to the de novo molecular generation of scaffold-focused small-molecule libraries. In this study, a recurrent neural network (RNN) using long short-term memory (LSTM) units was trained with drug-like molecules to yield a general model (g-DeepMGM). Sampling practices on indole and purine scaffolds illustrate the feasibility of creating scaffold-focused chemical libraries based on machine intelligence. Subsequently, a target-specific model (t-DeepMGM) for cannabinoid receptor 2 (CB2) was constructed through transfer learning on known CB2 ligands. The sampled molecules show properties similar to those of the reported active molecules. Finally, a discriminator was trained and attached to the DeepMGM to form an in silico molecular design-test cycle. Medicinal chemistry synthesis and biological validation were performed to further investigate the generation outcome, and XIE9137 was identified as a potential allosteric modulator of CB2. This study demonstrates how recent progress in deep learning intelligence can benefit drug discovery, especially in de novo molecular design and chemical library generation.
Affiliation(s)
- Yuemin Bian
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, Pharmacometrics & System Pharmacology PharmacoAnalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research (CDAR), University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Xiang-Qun Xie
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, Pharmacometrics & System Pharmacology PharmacoAnalytics, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research (CDAR), University of Pittsburgh, Pittsburgh, PA 15261, USA
- Drug Discovery Institute, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Departments of Computational Biology and Structural Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15261, USA
|
21
|
Moret M, Grisoni F, Katzberger P, Schneider G. Perplexity-Based Molecule Ranking and Bias Estimation of Chemical Language Models. J Chem Inf Model 2022; 62:1199-1206. [PMID: 35191696 PMCID: PMC8924923 DOI: 10.1021/acs.jcim.2c00079] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 02/07/2023]
Abstract
Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as the simplified molecular input line entry system (SMILES) strings. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score allows identifying and ranking the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare "greedy" (beam search) with "explorative" (multinomial sampling) methods for SMILES generation, certain advantages of multinomial sampling become apparent. Additionally, perplexity scoring is performed to identify undesired model biases introduced during model training and allows the development of a new ranking system to remove those undesired biases.
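To make the perplexity score mentioned above concrete, here is a minimal sketch using the standard definition: the exponential of the average negative log-probability that the model assigned to each token of a sequence. The per-token probabilities and candidate strings are invented for illustration; in practice they would come from a trained CLM, and the paper's exact normalisation may differ.

```python
import math


def perplexity(token_probs):
    """Perplexity of one sequence, given the probability the model gave each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def rank_by_perplexity(candidates):
    """Sort (smiles, token_probs) pairs so the lowest perplexity comes first."""
    return sorted(candidates, key=lambda item: perplexity(item[1]))


if __name__ == "__main__":
    # Hypothetical per-token probabilities for two generated SMILES strings.
    candidates = [
        ("C1CC1", [0.5, 0.2, 0.4, 0.3, 0.5]),
        ("CCO", [0.9, 0.8, 0.9]),
    ]
    for smiles, probs in rank_by_perplexity(candidates):
        print(smiles, round(perplexity(probs), 3))
```

A lower value means the model found the sequence more probable, which is why sorting in ascending order puts the designs the model is most confident about at the top.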
Affiliation(s)
- Michael Moret
- Department of Chemistry and Applied Biosciences, ETH Zurich, RETHINK, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland
| | - Francesca Grisoni
- Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Groene Loper 7, Eindhoven 5612AZ, Netherlands; Center for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, Utrecht 3584 CB, The Netherlands
| | - Paul Katzberger
- Department of Chemistry and Applied Biosciences, ETH Zurich, RETHINK, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, RETHINK, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland; ETH Singapore SEC Ltd., 1 CREATE Way, #06-01 CREATE Tower, Singapore 138602, Singapore
|
22
|
Saldívar-González FI, Aldas-Bulos VD, Medina-Franco JL, Plisson F. Natural product drug discovery in the artificial intelligence era. Chem Sci 2022; 13:1526-1546. [PMID: 35282622 PMCID: PMC8827052 DOI: 10.1039/d1sc04471k] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 08/13/2021] [Accepted: 12/10/2021] [Indexed: 12/19/2022] [Open access]
Abstract
Natural products (NPs) are primarily recognized as privileged structures that interact with protein drug targets. Their unique characteristics and structural diversity continue to amaze scientists developing NP-inspired medicines, even though the pharmaceutical industry has largely given up on them. High-performance computer hardware, extensive storage, accessible software and affordable online education have democratized the use of artificial intelligence (AI) in many sectors and research areas. The last decades have introduced natural language processing and machine learning algorithms, two subfields of AI, to tackle NP drug discovery challenges and open up opportunities. In this article, we review and discuss the rational applications of AI approaches developed to assist in discovering bioactive NPs and capturing the molecular "patterns" of these privileged structures for combinatorial design or target selectivity.
Affiliation(s)
- F I Saldívar-González
- DIFACQUIM Research Group, School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510 Mexico, Mexico
| | - V D Aldas-Bulos
- Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN, Irapuato, Guanajuato, Mexico
| | - J L Medina-Franco
- DIFACQUIM Research Group, School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510 Mexico, Mexico
| | - F Plisson
- CONACYT - Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN, Irapuato, Guanajuato, Mexico
|
23
|
Sharma S, Shen T, Chitranshi N, Gupta V, Basavarajappa D, Mirzaei M, You Y, Krezel W, Graham SL, Gupta V. Retinoid X Receptor: Cellular and Biochemical Roles of Nuclear Receptor with a Focus on Neuropathological Involvement. Mol Neurobiol 2022; 59:2027-2050. [PMID: 35015251 PMCID: PMC9015987 DOI: 10.1007/s12035-021-02709-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 08/19/2021] [Accepted: 12/21/2021] [Indexed: 12/13/2022]
Abstract
Retinoid X receptors (RXRs) represent a subgroup of the nuclear receptor superfamily with particularly high evolutionary conservation of the ligand-binding domain. The receptor exists in α, β, and γ isotypes that form homo-/heterodimeric complexes with other permissive and non-permissive receptors. While research has identified the biochemical roles of several nuclear receptor family members, the roles of RXRs in various neurological disorders remain relatively under-investigated. RXR acts as a ligand-regulated transcription factor, modulating the expression of genes that play a critical role in mediating several developmental, metabolic, and biochemical processes. Cumulative evidence indicates that abnormal RXR signalling affects neuronal stress and neuroinflammatory networks in several neuropathological conditions. Protective effects of targeting RXRs through pharmacological ligands have been established in various cell and animal models of neuronal injury including Alzheimer disease, Parkinson disease, glaucoma, multiple sclerosis, and stroke. This review summarises the existing knowledge about the roles of RXR, its interacting partners, and ligands in CNS disorders. Future research will determine the importance of structural and functional heterogeneity amongst various RXR isotypes as well as elucidate functional links between RXR homo- or heterodimers and specific physiological conditions to increase drug targeting efficiency in pathological conditions.
Affiliation(s)
- Samridhi Sharma
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, NSW, Australia.
| | - Ting Shen
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, NSW, Australia
| | - Nitin Chitranshi
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, NSW, Australia
| | - Veer Gupta
- School of Medicine, Deakin University, Melbourne, VIC, Australia
| | - Devaraj Basavarajappa
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, NSW, Australia
| | - Mehdi Mirzaei
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, NSW, Australia
| | - Yuyi You
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, NSW, Australia; Save Sight Institute, University of Sydney, Sydney, NSW, Australia
| | - Wojciech Krezel
- Institut de Génétique Et de Biologie Moléculaire Et Cellulaire, INSERM U1258, CNRS UMR 7104, Unistra, 67404, Illkirch-Graffenstaden, France
| | - Stuart L Graham
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, NSW, Australia; Save Sight Institute, University of Sydney, Sydney, NSW, Australia
| | - Vivek Gupta
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, NSW, Australia.
|
24
|
Abstract
Artificial intelligence (AI) offers new possibilities for hit and lead finding in medicinal chemistry. Several instances of AI have been used for prospective de novo drug design. Among these, chemical language models have been shown to perform well in various experimental scenarios. In this study, we provide a hands-on introduction to chemical language modeling. A technique based on recurrent neural networks is discussed in detail, together with a step-by-step guide to applying this AI method for focused compound library design. The program code is freely available at github.com/ETHmodlab/de_novo_design_RNN.
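A first step in most chemical language modeling workflows is turning SMILES strings into token sequences that a recurrent network can consume. The sketch below shows one common way to do this with a regular expression. The token pattern and helper names are assumptions for illustration and are not taken from the referenced repository.

```python
import re

# Assumed token pattern: bracket atoms, two-letter halogens, %NN ring closures,
# then any other single character. Real tokenizers may cover more cases.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into tokens, keeping multi-character atoms together."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map every token seen in the training strings to an integer index."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(smiles, vocab):
    """Convert a SMILES string into the list of vocabulary indices an RNN would read."""
    return [vocab[tok] for tok in tokenize(smiles)]


if __name__ == "__main__":
    training = ["CCO", "ClCC[NH3+]", "c1ccccc1Br"]
    vocab = build_vocab(training)
    print(tokenize("ClCC[NH3+]"))
    print(encode("CCO", vocab))
```

Treating `Cl`, `Br`, and bracket atoms as single tokens keeps the model from having to learn that, for example, `C` followed by `l` is one atom rather than two symbols.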
Affiliation(s)
- Francesca Grisoni
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland.
- Eindhoven University of Technology, Department of Biomedical Engineering, Eindhoven, Netherlands.
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland.
| |
|
25
|
Vijayan RSK, Kihlberg J, Cross JB, Poongavanam V. Enhancing preclinical drug discovery with artificial intelligence. Drug Discov Today 2021; 27:967-984. [PMID: 34838731 DOI: 10.1016/j.drudis.2021.11.023] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/14/2021] [Revised: 10/15/2021] [Accepted: 11/19/2021] [Indexed: 12/14/2022]
Abstract
Artificial intelligence (AI) is becoming an integral part of drug discovery. It has the potential to deliver across the drug discovery and development value chain, starting from target identification and reaching through clinical development. In this review, we provide an overview of current AI technologies and a glimpse of how AI is reimagining preclinical drug discovery by highlighting examples where AI has made a real impact. Considering the excitement and hyperbole surrounding AI in drug discovery, we aim to present a realistic view by discussing both opportunities and challenges in adopting AI in drug discovery.
Affiliation(s)
- R S K Vijayan
- Institute for Applied Cancer Science, MD Anderson Cancer Center, Houston, TX, USA
| | - Jan Kihlberg
- Department of Chemistry-BMC, Uppsala University, Uppsala, Sweden
| | - Jason B Cross
- Institute for Applied Cancer Science, MD Anderson Cancer Center, Houston, TX, USA.
| | | |
|
26
|
Sousa T, Correia J, Pereira V, Rocha M. Generative Deep Learning for Targeted Compound Design. J Chem Inf Model 2021; 61:5343-5361. [PMID: 34699719 DOI: 10.1021/acs.jcim.0c01496] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Indexed: 01/10/2023]
Abstract
In the past few years, de novo molecular design has increasingly used generative models from the emergent field of deep learning to propose novel compounds that are likely to possess desired properties or activities. De novo molecular design finds applications in fields ranging from drug discovery and materials science to biotechnology. A panoply of deep generative models, including architectures such as recurrent neural networks, autoencoders, and generative adversarial networks, can be trained on existing data sets to generate novel compounds. Typically, the new compounds follow the same underlying statistical distributions of properties exhibited by the training data set. Additionally, different optimization strategies, including transfer learning, Bayesian optimization, reinforcement learning, and conditional generation, can direct the generation process toward desired aims regarding biological activity, synthesis, or chemical features. Given the recent emergence of these technologies and their relevance, this work presents a systematic and critical review of deep generative models and related optimization methods for targeted compound design, and of their applications.
Affiliation(s)
- Tiago Sousa
- Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
| | - João Correia
- Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
| | - Vítor Pereira
- Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
| | - Miguel Rocha
- Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
| |
|
27
|
Tong X, Liu X, Tan X, Li X, Jiang J, Xiong Z, Xu T, Jiang H, Qiao N, Zheng M. Generative Models for De Novo Drug Design. J Med Chem 2021; 64:14011-14027. [PMID: 34533311 DOI: 10.1021/acs.jmedchem.1c00927] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Indexed: 01/10/2023]
Abstract
Artificial intelligence (AI) is booming. Among various AI approaches, generative models have received much attention in recent years. Inspired by these successes, researchers are now applying generative model techniques to de novo drug design, which has been considered as the "holy grail" of drug discovery. In this Perspective, we first focus on describing models such as recurrent neural network, autoencoder, generative adversarial network, transformer, and hybrid models with reinforcement learning. Next, we summarize the applications of generative models to drug design, including generating various compounds to expand the compound library and designing compounds with specific properties, and we also list a few publicly available molecular design tools based on generative models which can be used directly to generate molecules. In addition, we also introduce current benchmarks and metrics frequently used for generative models. Finally, we discuss the challenges and prospects of using generative models to aid drug design.
Affiliation(s)
- Xiaochu Tong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Xiaohong Liu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Xiaoqin Tan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Jiaxin Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Zhaoping Xiong
- Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen 518100, China
| | | | - Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Nan Qiao
- Laboratory of Health Intelligence, Huawei Technologies Co., Ltd, Shenzhen 518100, China
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| |
|
28
|
Moret M, Helmstädter M, Grisoni F, Schneider G, Merk D. Beam-Search zum automatisierten Entwurf und Scoring neuer ROR-Liganden mithilfe maschineller Intelligenz. Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202104405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2023]
Affiliation(s)
- Michael Moret
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
| | - Moritz Helmstädter
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, Max-von-Laue-Straße 9, 60438 Frankfurt, Germany
| | - Francesca Grisoni
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- Eindhoven University of Technology, Institute for Complex Molecular Systems, Department of Biomedical Engineering, Groene Loper 7, 5612AZ Eindhoven, Netherlands
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore 138602, Singapore
| | - Daniel Merk
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, Max-von-Laue-Straße 9, 60438 Frankfurt, Germany
- LMU Munich, Department of Pharmacy, Butenandtstraße 7, 81377 Munich, Germany
| |
|
29
|
Moret M, Helmstädter M, Grisoni F, Schneider G, Merk D. Beam Search for Automated Design and Scoring of Novel ROR Ligands with Machine Intelligence. Angew Chem Int Ed Engl 2021; 60:19477-19482. [PMID: 34165856 PMCID: PMC8457062 DOI: 10.1002/anie.202104405] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 03/30/2021] [Revised: 06/02/2021] [Indexed: 01/10/2023]
Abstract
Chemical language models enable de novo drug design without the requirement for explicit molecular construction rules. While such models have been applied to generate novel compounds with desired bioactivity, the actual prioritization and selection of the most promising computational designs remains challenging. Herein, we leveraged the probabilities learnt by chemical language models with the beam search algorithm as a model-intrinsic technique for automated molecule design and scoring. Prospective application of this method yielded novel inverse agonists of retinoic acid receptor-related orphan receptors (RORs). Each design was synthesizable in three reaction steps and presented low-micromolar to nanomolar potency towards RORγ. This model-intrinsic sampling technique eliminates the strict need for external compound scoring functions, thereby further extending the applicability of generative artificial intelligence to data-driven drug discovery.
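The idea of using beam search as a model-intrinsic design and scoring technique can be illustrated with a minimal sketch. This is not the authors' implementation: the `TOY_MODEL` table, its probabilities, and the `beam_search` helper are hypothetical stand-ins for a trained chemical language model, and the tokens are simplified SMILES-like characters. The sketch only shows how the model's own next-token probabilities can rank complete strings without an external scoring function.

```python
import math

# Hypothetical toy "chemical language model": next-token probabilities that
# depend only on the previous token. A real model would be a trained neural
# network; these numbers are invented for illustration.
TOY_MODEL = {
    "^": {"C": 0.6, "N": 0.3, "O": 0.1},            # "^" marks sequence start
    "C": {"C": 0.45, "O": 0.25, "N": 0.1, "$": 0.2},  # "$" marks sequence end
    "N": {"C": 0.7, "$": 0.3},
    "O": {"C": 0.4, "$": 0.6},
}


def beam_search(model, width=3, max_len=6, start="^", end="$"):
    """Return finished strings with their log-probabilities, best first."""
    beams = [(0.0, start)]  # (cumulative log-probability, token string)
    finished = []
    for _ in range(max_len):
        # Extend every open beam by every possible next token.
        candidates = []
        for logp, seq in beams:
            for tok, p in model[seq[-1]].items():
                candidates.append((logp + math.log(p), seq + tok))
        # Keep only the `width` most probable extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for logp, seq in candidates[:width]:
            if seq.endswith(end):
                finished.append((logp, seq[1:-1]))  # strip start/end markers
            else:
                beams.append((logp, seq))
        if not beams:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    return finished


results = beam_search(TOY_MODEL)
for logp, smiles in results:
    print(f"{smiles}\tp={math.exp(logp):.4f}")
```

The log-probability attached to each finished string serves as its score, so generation and ranking come from the same model. The beam width trades exploration against cost: a wider beam keeps more partial strings alive at each step.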
Affiliation(s)
- Michael Moret
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
| | - Moritz Helmstädter
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, Max-von-Laue-Strasse 9, 60438 Frankfurt, Germany
| | - Francesca Grisoni
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- Eindhoven University of Technology, Institute for Complex Molecular Systems, Department of Biomedical Engineering, Groene Loper 7, 5612AZ Eindhoven, Netherlands
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore 138602, Singapore
| | - Daniel Merk
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, Max-von-Laue-Strasse 9, 60438 Frankfurt, Germany
- LMU Munich, Department of Pharmacy, Butenandtstrasse 7, 81377 Munich, Germany
| |
|
30
|
Serrano Nájera G, Narganes Carlón D, Crowther DJ. TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery. Sci Rep 2021; 11:15747. [PMID: 34344904 PMCID: PMC8333311 DOI: 10.1038/s41598-021-94897-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/16/2021] [Accepted: 07/08/2021] [Indexed: 02/07/2023]
Abstract
Target identification and prioritisation are prominent first steps in modern drug discovery. Traditionally, individual scientists have used their expertise to manually interpret scientific literature and prioritise opportunities. However, increasing publication rates and the wider routine coverage of human genes by omic-scale research make it difficult to maintain meaningful overviews from which to identify promising new trends. Here we propose an automated yet flexible pipeline that identifies trends in the scientific corpus which align with the specific interests of a researcher and facilitate an initial prioritisation of opportunities. Using a procedure based on co-citation networks and machine learning, genes and diseases are first parsed from PubMed articles using a novel named entity recognition system together with publication date and supporting information. Then recurrent neural networks are trained to predict the publication dynamics of all human genes. For a user-defined therapeutic focus, genes generating more publications or citations are identified as high-interest targets. We also used topic detection routines to help understand why a gene is trendy and implement a system to propose the most prominent review articles for a potential target. This TrendyGenes pipeline detects emerging targets and pathways and provides a new way to explore the literature for individual researchers, pharmaceutical companies and funding agencies.
Affiliation(s)
- Guillermo Serrano Nájera
- Division of Cell and Developmental Biology, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
| | - David Narganes Carlón
- Division of Cell and Developmental Biology, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
- Division of Population Health and Genomics, Ninewells Hospital, School of Medicine, University of Dundee, Dundee, DD1 9SY, UK
- Exscientia Ltd, Dundee One, River Court, 5 West Victoria Dock Road, Dundee, DD1 3JT, UK
| | - Daniel J Crowther
- Exscientia Ltd, Dundee One, River Court, 5 West Victoria Dock Road, Dundee, DD1 3JT, UK.
| |
|
31
|
Gallego V, Naveiro R, Roca C, Ríos Insua D, Campillo NE. AI in drug development: a multidisciplinary perspective. Mol Divers 2021; 25:1461-1479. [PMID: 34251580 PMCID: PMC8342381 DOI: 10.1007/s11030-021-10266-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/21/2021] [Accepted: 06/29/2021] [Indexed: 01/09/2023]
Abstract
The introduction of a new drug to the commercial market follows a long and complex process that typically spans several years and entails large monetary costs due to a high attrition rate. Because of this, there is an urgent need to improve this process using innovative technologies such as artificial intelligence (AI). Different AI tools are being applied to support all four steps of the drug development process (basic research for drug discovery; pre-clinical phase; clinical phase; and postmarketing). Some of the main tasks where AI has proven useful include identifying molecular targets, searching for hit and lead compounds, synthesising drug-like compounds and predicting ADME-Tox. This review, on the one hand, brings a mathematical view of some of the key AI methods used in drug development closer to medicinal chemists and, on the other hand, brings the drug development process and the use of different models closer to mathematicians. Emphasis is placed on two aspects not mentioned in similar surveys, namely Bayesian approaches and their applications to molecular modelling, and the eventual use of the methods to actually support decisions, with the aim of promoting synergy between the two communities.
Affiliation(s)
- Víctor Gallego
- Institute of Mathematical Sciences (ICMAT-CSIC), Nicolás Cabrera 13-15, 28049, Madrid, Spain
| | - Roi Naveiro
- Institute of Mathematical Sciences (ICMAT-CSIC), Nicolás Cabrera 13-15, 28049, Madrid, Spain
| | - Carlos Roca
- AItenea Biotech S.L. Parque Científico de Madrid, Faraday, 7, 28049, Madrid, Spain
| | - David Ríos Insua
- ICMAT-CSIC and Dept. of Statistics and OR, U. Compl. Madrid, Madrid, Spain
| | - Nuria E Campillo
- CIB-Margarita Salas (CSIC), Ramiro de Maeztu, 9, 28040, Madrid, Spain.
| |
|
32
|
Friedrich L, Cingolani G, Ko Y, Iaselli M, Miciaccia M, Perrone MG, Neukirch K, Bobinger V, Merk D, Hofstetter RK, Werz O, Koeberle A, Scilimati A, Schneider G. Learning from Nature: From a Marine Natural Product to Synthetic Cyclooxygenase-1 Inhibitors by Automated De Novo Design. Adv Sci (Weinh) 2021; 8:e2100832. [PMID: 34176236 PMCID: PMC8373093 DOI: 10.1002/advs.202100832] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/01/2021] [Revised: 05/16/2021] [Indexed: 05/03/2023]
Abstract
The repertoire of natural products offers tremendous opportunities for chemical biology and drug discovery. Natural product-inspired synthetic molecules represent an ecologically and economically sustainable alternative to the direct utilization of natural products. De novo design with machine intelligence bridges the gap between the worlds of bioactive natural products and synthetic molecules. On employing the compound Marinopyrrole A from marine Streptomyces as a design template, the algorithm constructs innovative small molecules that can be synthesized in three steps, following the computationally suggested synthesis route. Computational activity prediction reveals cyclooxygenase (COX) as a putative target of both Marinopyrrole A and the de novo designs. The molecular designs are experimentally confirmed as selective COX-1 inhibitors with nanomolar potency. X-ray structure analysis reveals the binding of the most selective compound to COX-1. This molecular design approach provides a blueprint for natural product-inspired hit and lead identification for drug discovery with machine intelligence.
Affiliation(s)
- Lukas Friedrich
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland
| | - Gino Cingolani
- Department of Biochemistry and Molecular Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, 1020 Locust Street, Philadelphia, PA 19107, USA
| | - Ying-Hui Ko
- Department of Biochemistry and Molecular Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, 1020 Locust Street, Philadelphia, PA 19107, USA
| | - Mariaclara Iaselli
- Department of Pharmacy – Pharmaceutical Sciences, University of Bari, Via E. Orabona 4, Bari 70125, Italy
| | - Morena Miciaccia
- Department of Pharmacy – Pharmaceutical Sciences, University of Bari, Via E. Orabona 4, Bari 70125, Italy
| | - Maria Grazia Perrone
- Department of Pharmacy – Pharmaceutical Sciences, University of Bari, Via E. Orabona 4, Bari 70125, Italy
| | - Konstantin Neukirch
- Michael Popp Institute and Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innsbruck 6020, Austria
| | - Veronika Bobinger
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland
| | - Daniel Merk
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland
- Institute of Pharmaceutical Chemistry, Goethe-University, Max-von-Laue Straße 9, Frankfurt am Main 60438, Germany
| | - Robert Klaus Hofstetter
- Department of Pharmaceutical/Medicinal Chemistry, Friedrich-Schiller-University Jena, Philosophenweg 14, Jena 07743, Germany
| | - Oliver Werz
- Department of Pharmaceutical/Medicinal Chemistry, Friedrich-Schiller-University Jena, Philosophenweg 14, Jena 07743, Germany
| | - Andreas Koeberle
- Michael Popp Institute and Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innsbruck 6020, Austria
| | - Antonio Scilimati
- Department of Pharmacy – Pharmaceutical Sciences, University of Bari, Via E. Orabona 4, Bari 70125, Italy
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland
- ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore 138602, Singapore
| |
|
33
|
|
34
|
Grisoni F, Huisman BJH, Button AL, Moret M, Atz K, Merk D, Schneider G. Combining generative artificial intelligence and on-chip synthesis for de novo drug design. Sci Adv 2021; 7:eabg3338. [PMID: 34117066 PMCID: PMC8195470 DOI: 10.1126/sciadv.abg3338] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 12/27/2020] [Accepted: 04/23/2021] [Indexed: 05/24/2023]
Abstract
Automating the molecular design-make-test-analyze cycle accelerates hit and lead finding for drug discovery. Using deep learning for molecular design and a microfluidics platform for on-chip chemical synthesis, liver X receptor (LXR) agonists were generated from scratch. The computational pipeline was tuned to explore the chemical space of known LXRα agonists and generate novel molecular candidates. To ensure compatibility with automated on-chip synthesis, the chemical space was confined to the virtual products obtainable from 17 one-step reactions. Twenty-five de novo designs were successfully synthesized in flow. In vitro screening of the crude reaction products revealed 17 (68%) hits, with up to 60-fold LXR activation. The batch resynthesis, purification, and retesting of 14 of these compounds confirmed that 12 of them were potent LXR agonists. These results support the suitability of the proposed design-make-test-analyze framework as a blueprint for automated drug design with artificial intelligence and miniaturized bench-top synthesis.
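The success rates quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above (25 synthesized designs, 17 crude hits, 14 compounds retested, 12 confirmed). The `rate` helper is an illustrative name, and the confirmation rate is derived here; the abstract does not state it.

```python
def rate(successes, total):
    """Return successes/total as a percentage rounded to one decimal place."""
    return round(100 * successes / total, 1)


crude_hit_rate = rate(17, 25)     # hits among the 25 designs synthesized in flow
confirmation_rate = rate(12, 14)  # confirmed agonists among the 14 retested

print(f"crude hit rate: {crude_hit_rate}%")
print(f"confirmation rate after resynthesis: {confirmation_rate}%")
```

The first figure reproduces the 68% reported in the abstract. The second covers only the 14 compounds selected for batch resynthesis, so it should not be read as a rate over all 25 designs.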
Affiliation(s)
- Francesca Grisoni
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland.
- Eindhoven University of Technology, Department of Biomedical Engineering, Eindhoven, Netherlands
| | - Berend J H Huisman
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
| | - Alexander L Button
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
- University of Lausanne, Department of Computational Biology, Lausanne, Switzerland
| | - Michael Moret
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
| | - Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
| | - Daniel Merk
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland.
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, Frankfurt, Germany
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland.
- ETH Singapore SEC Ltd, Singapore, Singapore
| |
|
35
|
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 95] [Impact Index Per Article: 31.7] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, accompanied by long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as indispensable tools to draw meaningful insights and improve decision making in drug discovery. Thanks to advances in AI and ML algorithms, AI/ML-driven solutions now have an unprecedented potential to accelerate CNS drug discovery with a better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state of the art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and implementation into therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may resolve these difficulties.
Affiliation(s)
- Sezen Vatansever
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Daniel Wacker
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - H. Ümit Kaniskan
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Jian Jin
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Ming-Ming Zhou
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| |
|
36
|
Porras G, Chassagne F, Lyles JT, Marquez L, Dettweiler M, Salam AM, Samarakoon T, Shabih S, Farrokhi DR, Quave CL. Ethnobotany and the Role of Plant Natural Products in Antibiotic Drug Discovery. Chem Rev 2021; 121:3495-3560. [PMID: 33164487 PMCID: PMC8183567 DOI: 10.1021/acs.chemrev.0c00922] [Citation(s) in RCA: 124] [Impact Index Per Article: 41.3] [Indexed: 02/07/2023]
Abstract
The crisis of antibiotic resistance necessitates creative and innovative approaches, from chemical identification and analysis to the assessment of bioactivity. Plant natural products (NPs) represent a promising source of antibacterial lead compounds that could help fill the drug discovery pipeline in response to the growing antibiotic resistance crisis. The major strength of plant NPs lies in their rich and unique chemodiversity, their worldwide distribution and ease of access, their various antibacterial modes of action, and the proven clinical effectiveness of plant extracts from which they are isolated. While many studies have tried to summarize NPs with antibacterial activities, a comprehensive review with rigorous selection criteria has never been performed. In this work, the literature from 2012 to 2019 was systematically reviewed to highlight plant-derived compounds with antibacterial activity by focusing on their growth inhibitory activity. A total of 459 compounds are included in this Review, of which 50.8% are phenolic derivatives, 26.6% are terpenoids, 5.7% are alkaloids, and 17% are classified as other metabolites. A selection of 183 compounds is further discussed regarding their antibacterial activity, biosynthesis, structure-activity relationship, mechanism of action, and potential as antibiotics. Emerging trends in the field of antibacterial drug discovery from plants are also discussed. This Review brings to the forefront key findings on the antibacterial potential of plant NPs for consideration in future antibiotic discovery and development efforts.
Affiliation(s)
- Gina Porras
- Center for the Study of Human Health, Emory University, 1557 Dickey Dr., Atlanta, Georgia 30322
| | - François Chassagne
- Center for the Study of Human Health, Emory University, 1557 Dickey Dr., Atlanta, Georgia 30322
| | - James T. Lyles
- Center for the Study of Human Health, Emory University, 1557 Dickey Dr., Atlanta, Georgia 30322
| | - Lewis Marquez
- Molecular and Systems Pharmacology Program, Laney Graduate School, Emory University, 615 Michael St., Whitehead 115, Atlanta, Georgia 30322
| | - Micah Dettweiler
- Department of Dermatology, Emory University, 615 Michael St., Whitehead 105L, Atlanta, Georgia 30322
| | - Akram M. Salam
- Molecular and Systems Pharmacology Program, Laney Graduate School, Emory University, 615 Michael St., Whitehead 115, Atlanta, Georgia 30322
| | - Tharanga Samarakoon
- Emory University Herbarium, Emory University, 1462 Clifton Rd NE, Room 102, Atlanta, Georgia 30322
| | - Sarah Shabih
- Center for the Study of Human Health, Emory University, 1557 Dickey Dr., Atlanta, Georgia 30322
| | - Darya Raschid Farrokhi
- Center for the Study of Human Health, Emory University, 1557 Dickey Dr., Atlanta, Georgia 30322
| | - Cassandra L. Quave
- Center for the Study of Human Health, Emory University, 1557 Dickey Dr., Atlanta, Georgia 30322
- Emory University Herbarium, Emory University, 1462 Clifton Rd NE, Room 102, Atlanta, Georgia 30322
- Department of Dermatology, Emory University, 615 Michael St., Whitehead 105L, Atlanta, Georgia 30322
- Molecular and Systems Pharmacology Program, Laney Graduate School, Emory University, 615 Michael St., Whitehead 115, Atlanta, Georgia 30322
| |
|
37
|
Zhang R, Li X, Zhang X, Qin H, Xiao W. Machine learning approaches for elucidating the biological effects of natural products. Nat Prod Rep 2021; 38:346-361. [PMID: 32869826 DOI: 10.1039/d0np00043d] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Indexed: 12/15/2022]
Abstract
Covering: 2000 to 2020. Machine learning (ML) is an efficient tool for the prediction of bioactivity and the study of structure-activity relationships. Over the past decade, an emerging trend for combining these approaches with the study of natural products (NPs) has developed in order to manage the challenge of the discovery of bioactive NPs. In the present review, we will introduce the basic principles and protocols for using the ML approach to investigate the bioactivity of NPs, citing a series of practical examples regarding the study of anti-microbial, anti-cancer, and anti-inflammatory NPs, etc. ML algorithms manage a variety of classification and regression problems associated with bioactive NPs, from those that are linear to non-linear and from pure compounds to plant extracts. Inspired by cases reported in the literature and our own experience, a number of key points have been emphasized for reducing modeling errors, including dataset preparation and applicability domain analysis.
Affiliation(s)
- Ruihan Zhang
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China.
| | - Xiaoli Li
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China.
| | - Xingjie Zhang
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China.
| | - Huayan Qin
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China.
| | - Weilie Xiao
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan Research & Development Center for Natural Products, School of Chemical Science and Technology, Yunnan University, 2 Rd Cuihubei, P. R. China.
| |
|
38
|
Abstract
Molecular descriptors encode a variety of molecular representations for computer-assisted drug discovery. Here, we focus on the Weighted Holistic Atom Localization and Entity Shape (WHALES) descriptors, which were originally designed for scaffold hopping from natural products to synthetic molecules. WHALES descriptors capture molecular shape and partial charges simultaneously. We introduce the key aspects of the WHALES concept and provide a step-by-step guide on how to use these descriptors for virtual compound screening and scaffold hopping. The results presented can be reproduced by using the code freely available from github.com/ETHmodlab/scaffold_hopping_whales.
Affiliation(s)
- Francesca Grisoni
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Zurich, Switzerland.
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Zurich, Switzerland
| |
|
39
|
Polykovskiy D, Zhebrak A, Sanchez-Lengeling B, Golovanov S, Tatanov O, Belyaev S, Kurbanov R, Artamonov A, Aladinskiy V, Veselov M, Kadurin A, Johansson S, Chen H, Nikolenko S, Aspuru-Guzik A, Zhavoronkov A. Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. Front Pharmacol 2020; 11:565644. [PMID: 33390943 PMCID: PMC7775580 DOI: 10.3389/fphar.2020.565644] [Citation(s) in RCA: 197] [Impact Index Per Article: 49.3] [Received: 05/25/2020] [Accepted: 10/26/2020] [Indexed: 01/06/2023] Open
Abstract
Generative models are becoming a tool of choice for exploring the molecular space. These models learn on a large training dataset and produce novel molecular structures with similar properties. Generated structures can be utilized for virtual screening or training semi-supervised predictive models in the downstream tasks. While there are plenty of generative models, it is unclear how to compare and rank them. In this work, we introduce a benchmarking platform called Molecular Sets (MOSES) to standardize training and comparison of molecular generative models. MOSES provides training and testing datasets, and a set of metrics to evaluate the quality and diversity of generated structures. We have implemented and compared several molecular generation models and suggest using our results as reference points for further advancements in generative chemistry research. The platform and source code are available at https://github.com/molecularsets/moses.
Affiliation(s)
- Mark Veselov
- Insilico Medicine Hong Kong Ltd., Pak Shek Kok, Hong Kong
| | - Artur Kadurin
- Insilico Medicine Hong Kong Ltd., Pak Shek Kok, Hong Kong
| | - Simon Johansson
- Molecular AI, DiscoverySciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Hongming Chen
- Molecular AI, DiscoverySciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Sergey Nikolenko
- Insilico Medicine Hong Kong Ltd., Pak Shek Kok, Hong Kong
- Neuromation OU, Tallinn, Estonia
- Computer Science Department, National Research University Higher School of Economics, St. Petersburg, Russia
| | - Alán Aspuru-Guzik
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- CIFAR AI Chair, Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), Toronto, ON, Canada
| |
|
40
|
Chen Y, Kirchmair J. Cheminformatics in Natural Product-based Drug Discovery. Mol Inform 2020; 39:e2000171. [PMID: 32725781 PMCID: PMC7757247 DOI: 10.1002/minf.202000171] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 05/12/2020] [Accepted: 07/28/2020] [Indexed: 12/20/2022]
Abstract
This review seeks to provide a timely survey of the scope and limitations of cheminformatics methods in natural product-based drug discovery. Following an overview of data resources of chemical, biological and structural information on natural products, we discuss, among other aspects, in silico methods for (i) data curation and natural products dereplication, (ii) analysis, visualization, navigation and comparison of the chemical space, (iii) quantification of natural product-likeness, (iv) prediction of the bioactivities (virtual screening, target prediction), ADME and safety profiles (toxicity) of natural products, (v) natural products-inspired de novo design and (vi) prediction of natural products prone to cause interference with biological assays. Among the many methods discussed are rule-based, similarity-based, shape-based, pharmacophore-based and network-based approaches, docking and machine learning methods.
Affiliation(s)
- Ya Chen
- Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146 Hamburg, Germany
| | - Johannes Kirchmair
- Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146 Hamburg, Germany
- Department of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
| |
|
41
|
Zhao L, Ciallella HL, Aleksunes LM, Zhu H. Advancing computer-aided drug discovery (CADD) by big data and data-driven machine learning modeling. Drug Discov Today 2020; 25:1624-1638. [PMID: 32663517 PMCID: PMC7572559 DOI: 10.1016/j.drudis.2020.07.005] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Received: 02/19/2020] [Revised: 06/26/2020] [Accepted: 07/06/2020] [Indexed: 02/06/2023]
Abstract
Advancing a new drug to market requires substantial investments in time as well as financial resources. Crucial bioactivities for drug candidates, including their efficacy, pharmacokinetics (PK), and adverse effects, need to be investigated during drug development. With advancements in chemical synthesis and biological screening technologies over the past decade, a large amount of biological data points for millions of small molecules have been generated and are stored in various databases. These accumulated data, combined with new machine learning (ML) approaches, such as deep learning, have shown great potential to provide insights into relevant chemical structures to predict in vitro, in vivo, and clinical outcomes, thereby advancing drug discovery and development in the big data era.
Affiliation(s)
- Linlin Zhao
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
| | - Heather L Ciallella
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
| | - Lauren M Aleksunes
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ 08854, USA
| | - Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA; Department of Chemistry, Rutgers University, Camden, NJ 08102, USA.
| |
|
42
|
Schneider P, Welin M, Svensson B, Walse B, Schneider G. Virtual Screening and Design with Machine Intelligence Applied to Pim-1 Kinase Inhibitors. Mol Inform 2020; 39:e2000109. [PMID: 33448694 PMCID: PMC7539333 DOI: 10.1002/minf.202000109] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/08/2020] [Accepted: 06/17/2020] [Indexed: 12/17/2022]
Abstract
Ligand-based virtual screening of large compound collections, combined with fast bioactivity determination, facilitates the discovery of bioactive molecules with desired properties. Here, chemical-similarity-based machine learning and label-free differential scanning fluorimetry were used to rapidly identify new ligands of the anticancer target Pim-1 kinase. The three-dimensional crystal structure complex of human Pim-1 with ligand bound revealed an ATP-competitive binding mode. Generative de novo design with a recurrent neural network additionally suggested innovative molecular scaffolds. Results corroborate the validity of the chemical similarity principle for rapid ligand prototyping, suggesting the complementarity of similarity-based and generative computational approaches.
Affiliation(s)
- Petra Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland; inSili.com GmbH, Segantinisteig 3, 8049, Zurich, Switzerland
| | - Martin Welin
- SARomics Biostructures AB, Medicon Village, SE-223 81, Lund, Sweden
| | - Bo Svensson
- SARomics Biostructures AB, Medicon Village, SE-223 81, Lund, Sweden
| | - Björn Walse
- SARomics Biostructures AB, Medicon Village, SE-223 81, Lund, Sweden
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| |
|
43
|
Öztürk H, Özgür A, Schwaller P, Laino T, Ozkirimli E. Exploring chemical space using natural language processing methodologies for drug discovery. Drug Discov Today 2020; 25:689-705. [DOI: 10.1016/j.drudis.2020.01.020] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 09/17/2019] [Revised: 12/20/2019] [Accepted: 01/28/2020] [Indexed: 01/06/2023]
|
44
|
Yuan Q, Santana-Bonilla A, Zwijnenburg MA, Jelfs KE. Molecular generation targeting desired electronic properties via deep generative models. Nanoscale 2020; 12:6744-6758. [PMID: 32163074 DOI: 10.1039/c9nr10687a] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 06/10/2023]
Abstract
As we seek to discover new functional materials, we need ways to explore the vast chemical space of precursor building blocks, not only generating large numbers of possible building blocks to investigate, but trying to find non-obvious options, that we might not suggest by chemical experience alone. Artificial intelligence techniques provide a possible avenue to generate large numbers of organic building blocks for functional materials, and can even do so from very small initial libraries of known building blocks. Specifically, we demonstrate the application of deep recurrent neural networks for the exploration of the chemical space of building blocks for a test case of donor-acceptor oligomers with specific electronic properties. The recurrent neural network learned how to produce novel donor-acceptor oligomers by trading off between selected atomic substitutions, such as halogenation or methylation, and molecular features such as the oligomer's size. The electronic and structural properties of the generated oligomers can be tuned by sampling from different subsets of the training database, which enabled us to enrich the library of donor-acceptors towards desired properties. We generated approximately 1700 new donor-acceptor oligomers with a recurrent neural network tuned to target oligomers with a HOMO-LUMO gap <2 eV and a dipole moment <2 Debye, which could have potential application in organic photovoltaics.
Affiliation(s)
- Qi Yuan
- Department of Chemistry, Molecular Sciences Research Hub, White City Campus, Imperial College London, Wood Lane, London, W12 0BZ, UK.
| | - Alejandro Santana-Bonilla
- Department of Chemistry, Molecular Sciences Research Hub, White City Campus, Imperial College London, Wood Lane, London, W12 0BZ, UK.
| | - Martijn A Zwijnenburg
- Department of Chemistry, University College London, 20 Gordon Street, London WC1H 0AJ, UK
| | - Kim E Jelfs
- Department of Chemistry, Molecular Sciences Research Hub, White City Campus, Imperial College London, Wood Lane, London, W12 0BZ, UK.
| |
|
45
|
|
46
|
Grisoni F, Moret M, Lingwood R, Schneider G. Bidirectional Molecule Generation with Recurrent Neural Networks. J Chem Inf Model 2020; 60:1175-1183. [PMID: 31904964 DOI: 10.1021/acs.jcim.9b00943] [Citation(s) in RCA: 84] [Impact Index Per Article: 21.0] [Indexed: 02/08/2023]
Abstract
Recurrent neural networks (RNNs) are able to generate de novo molecular designs using simplified molecular input line entry systems (SMILES) string representations of the chemical structure. RNN-based structure generation is usually performed unidirectionally, by growing SMILES strings from left to right. However, there is no natural start or end of a small molecule, and SMILES strings are intrinsically nonunivocal representations of molecular graphs. These properties motivate bidirectional structure generation. Here, bidirectional generative RNNs for SMILES-based molecule design are introduced. To this end, two established bidirectional methods were implemented, and a new method for SMILES string generation and data augmentation is introduced: the bidirectional molecule design by alternate learning (BIMODAL). These three bidirectional strategies were compared to the unidirectional forward RNN approach for SMILES string generation, in terms of the (i) novelty, (ii) scaffold diversity, and (iii) chemical-biological relevance of the computer-generated molecules. The results positively advocate bidirectional strategies for SMILES-based molecular de novo design, with BIMODAL showing superior results to the unidirectional forward RNN for most of the criteria in the tested conditions. The code of the methods and the pretrained models can be found at URL https://github.com/ETHmodlab/BIMODAL.
Affiliation(s)
- Francesca Grisoni
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
| | - Michael Moret
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
| | - Robin Lingwood
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
| |
|
47
|
Schneider P, Walters WP, Plowright AT, Sieroka N, Listgarten J, Goodnow RA, Fisher J, Jansen JM, Duca JS, Rush TS, Zentgraf M, Hill JE, Krutoholow E, Kohler M, Blaney J, Funatsu K, Luebkemann C, Schneider G. Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discov 2019. [DOI: 10.1038/s41573-019-0050-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/29/2022]
|
48
|
Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discov 2019; 19:353-364. [DOI: 10.1038/s41573-019-0050-3] [Citation(s) in RCA: 222] [Impact Index Per Article: 44.4] [Accepted: 10/28/2019] [Indexed: 12/17/2022]
|
49
|
Cova TFGG, Pais AACC. Deep Learning for Deep Chemistry: Optimizing the Prediction of Chemical Patterns. Front Chem 2019; 7:809. [PMID: 32039134 PMCID: PMC6988795 DOI: 10.3389/fchem.2019.00809] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 07/16/2019] [Accepted: 11/11/2019] [Indexed: 12/14/2022] Open
Abstract
Computational Chemistry is currently a synergistic assembly between ab initio calculations, simulation, machine learning (ML) and optimization strategies for describing, solving and predicting chemical data and related phenomena. These include accelerated literature searches, analysis and prediction of physical and quantum chemical properties, transition states, chemical structures, chemical reactions, and also new catalysts and drug candidates. The generalization of scalability to larger chemical problems, rather than specialization, is now the main principle for transforming chemical tasks in multiple fronts, for which systematic and cost-effective solutions have benefited from ML approaches, including those based on deep learning (e.g. quantum chemistry, molecular screening, synthetic route design, catalysis, drug discovery). The latter class of ML algorithms is capable of combining raw input into layers of intermediate features, enabling bench-to-bytes designs with the potential to transform several chemical domains. In this review, the most exciting developments concerning the use of ML in a range of different chemical scenarios are described. A range of different chemical problems and respective rationalization, that have hitherto been inaccessible due to the lack of suitable analysis tools, is thus detailed, evidencing the breadth of potential applications of these emerging multidimensional approaches. Focus is given to the models, algorithms and methods proposed to facilitate research on compound design and synthesis, materials design, prediction of binding, molecular activity, and soft matter behavior. 
The information produced by pairing Chemistry and ML, through data-driven analyses, neural network predictions and monitoring of chemical systems, allows (i) prompting the ability to understand the complexity of chemical data, (ii) streamlining and designing experiments, (iii) discovering new molecular targets and materials, and also (iv) planning or rethinking forthcoming chemical challenges. In fact, optimization engulfs all these tasks directly.
Affiliation(s)
- Tânia F. G. G. Cova
- Coimbra Chemistry Centre, CQC, Department of Chemistry, Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
| | - Alberto A. C. C. Pais
- Coimbra Chemistry Centre, CQC, Department of Chemistry, Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
| |
|
50
|
Bruns D, Merk D, Santhana Kumar K, Baumgartner M, Schneider G. Synthetic Activators of Cell Migration Designed by Constructive Machine Learning. ChemistryOpen 2019; 8:1303-1308. [PMID: 31660283 PMCID: PMC6807213 DOI: 10.1002/open.201900222] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/02/2019] [Revised: 09/11/2019] [Indexed: 11/26/2022] Open
Abstract
Constructive machine learning aims to create examples from its learned domain which are likely to exhibit similar properties. Here, a recurrent neural network was trained with the chemical structures of known cell-migration modulators. This machine learning model was used to generate new molecules that mimic the training compounds. Two top-scoring designs were synthesized, and tested for functional activity in a phenotypic spheroid cell migration assay. These computationally generated small molecules significantly increased the migration of medulloblastoma cells. The results further corroborate the applicability of constructive machine learning to the de novo design of druglike molecules with desired properties.
Affiliation(s)
- Dominique Bruns
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
| | - Daniel Merk
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
| | - Karthiga Santhana Kumar
- Pediatric Neuro-Oncology Research Group, Department of Oncology, Children's Research Center, University Children's Hospital Zurich, Lengghalde 5, CH-8008 Zurich, Switzerland
| | - Martin Baumgartner
- Pediatric Neuro-Oncology Research Group, Department of Oncology, Children's Research Center, University Children's Hospital Zurich, Lengghalde 5, CH-8008 Zurich, Switzerland
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
| |
|