1. Isigkeit L, Hörmann T, Schallmayer E, Scholz K, Lillich FF, Ehrler JHM, Hufnagel B, Büchner J, Marschner JA, Pabel J, Proschak E, Merk D. Automated design of multi-target ligands by generative deep learning. Nat Commun 2024; 15:7946. [PMID: 39261471; PMCID: PMC11390726; DOI: 10.1038/s41467-024-52060-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 08/23/2024] [Indexed: 09/13/2024] [Open Access]
Abstract
Generative deep learning models enable data-driven de novo design of molecules with tailored features. Chemical language models (CLM) trained on string representations of molecules such as SMILES have been successfully employed to design new chemical entities with experimentally confirmed activity on intended targets. Here, we probe the application of CLM to generate multi-target ligands for designed polypharmacology. We capitalize on the ability of CLM to learn from small fine-tuning sets of molecules and successfully bias the model towards designing drug-like molecules with similarity to known ligands of target pairs of interest. Designs obtained from CLM after pooled fine-tuning are predicted active on both proteins of interest and comprise pharmacophore elements of ligands for both targets in one molecule. Synthesis and testing of twelve computationally favored CLM designs for six target pairs reveal modulation of at least one intended protein by all selected designs with up to double-digit nanomolar potency and confirm seven compounds as designed dual ligands. These results corroborate CLM for multi-target de novo design as a source of innovation in drug discovery.
Affiliation(s)
- Laura Isigkeit
  - Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
- Tim Hörmann
  - Ludwig-Maximilians-Universität München, Department of Pharmacy, 81377, Munich, Germany
- Espen Schallmayer
  - Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
- Katharina Scholz
  - Ludwig-Maximilians-Universität München, Department of Pharmacy, 81377, Munich, Germany
- Felix F Lillich
  - Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
  - Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, 60596, Frankfurt, Germany
- Johanna H M Ehrler
  - Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
- Benedikt Hufnagel
  - Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
- Jasmin Büchner
  - Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
- Julian A Marschner
  - Ludwig-Maximilians-Universität München, Department of Pharmacy, 81377, Munich, Germany
- Jörg Pabel
  - Ludwig-Maximilians-Universität München, Department of Pharmacy, 81377, Munich, Germany
- Ewgenij Proschak
  - Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
  - Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, 60596, Frankfurt, Germany
- Daniel Merk
  - Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
  - Ludwig-Maximilians-Universität München, Department of Pharmacy, 81377, Munich, Germany
2. Bhowmik D, Zhang P, Fox Z, Irle S, Gounley J. Enhancing molecular design efficiency: Uniting language models and generative networks with genetic algorithms. Patterns (New York, N.Y.) 2024; 5:100947. [PMID: 38645768; PMCID: PMC11026973; DOI: 10.1016/j.patter.2024.100947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/14/2023] [Accepted: 02/08/2024] [Indexed: 04/23/2024]
Abstract
This study examines the effectiveness of generative models in drug discovery, material science, and polymer science, aiming to overcome constraints associated with traditional inverse design methods relying on heuristic rules. Generative models generate synthetic data resembling real data, enabling deep learning model training without extensive labeled datasets. They prove valuable in creating virtual libraries of molecules for material science and facilitating drug discovery by generating molecules with specific properties. While generative adversarial networks (GANs) are explored for these purposes, mode collapse restricts their efficacy, limiting novel structure variability. To address this, we introduce a masked language model (LM) inspired by natural language processing. Although LMs alone can have inherent limitations, we propose a hybrid architecture combining LMs and GANs to efficiently generate new molecules, demonstrating superior performance over standalone masked LMs, particularly for smaller population sizes. This hybrid LM-GAN architecture enhances efficiency in optimizing properties and generating novel samples.
Affiliation(s)
- Debsindhu Bhowmik
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Pei Zhang
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Zachary Fox
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Stephan Irle
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- John Gounley
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
3. Liu L, Zhao X, Huang X. Generating Potential RET-Specific Inhibitors Using a Novel LSTM Encoder-Decoder Model. Int J Mol Sci 2024; 25:2357. [PMID: 38397034; PMCID: PMC10889381; DOI: 10.3390/ijms25042357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 02/11/2024] [Accepted: 02/13/2024] [Indexed: 02/25/2024] [Open Access]
Abstract
The receptor tyrosine kinase RET (rearranged during transfection) plays a vital role in various cell signaling pathways and is a critical factor in the development of the nervous system. Abnormal activation of the RET kinase can lead to several cancers, including thyroid cancer and non-small-cell lung cancer. However, most RET kinase inhibitors are multi-kinase inhibitors. Therefore, the development of an effective RET-specific inhibitor continues to present a significant challenge. To address this issue, we built a molecular generation model based on fragment-based drug design (FBDD) and a long short-term memory (LSTM) encoder-decoder structure to generate receptor-specific molecules with novel scaffolds. Remarkably, our model was trained with a molecular assembly accuracy of 98.4%. Leveraging the pre-trained model, we rapidly generated a RET-specific-candidate active-molecule library by transfer learning. Virtual screening based on our molecular generation model was performed, combined with molecular dynamics simulation and binding energy calculation, to discover specific RET inhibitors, and five novel molecules were selected. Further analyses indicated that two of these molecules have good binding affinities and synthesizability, exhibiting high selectivity. Overall, this investigation demonstrates the capacity of our model to generate novel receptor-specific molecules and provides a rapid method to discover potential drugs.
Affiliation(s)
- Xi Zhao
  - Institute of Theoretical Chemistry, College of Chemistry, Jilin University, Changchun 130061, China
- Xuri Huang
  - Institute of Theoretical Chemistry, College of Chemistry, Jilin University, Changchun 130061, China
4. Bae B, Bae H, Nam H. LOGICS: Learning optimal generative distribution for designing de novo chemical structures. J Cheminform 2023; 15:77. [PMID: 37674239; PMCID: PMC10483765; DOI: 10.1186/s13321-023-00747-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 08/23/2023] [Indexed: 09/08/2023] [Open Access]
Abstract
In recent years, the field of computational drug design has made significant strides in the development of artificial intelligence (AI) models for the generation of de novo chemical compounds with desired properties and biological activities, such as enhanced binding affinity to target proteins. These high-affinity compounds have the potential to be developed into more potent therapeutics for a broad spectrum of diseases. Due to the lack of data required for the training of deep generative models, however, some of these approaches have fine-tuned their molecular generators using data obtained from a separate predictor. While these studies show that generative models can produce structures with the desired target properties, it remains unclear whether the diversity of the generated structures and the span of their chemical space align with the distribution of the intended target molecules. In this study, we present a novel generative framework, LOGICS, a framework for Learning Optimal Generative distribution Iteratively for designing target-focused Chemical Structures. We address the exploration-exploitation dilemma, which weighs the choice between exploring new options and exploiting current knowledge. To tackle this issue, we incorporate experience memory and employ a layered tournament selection approach to refine the fine-tuning process. The proposed method was applied to the binding affinity optimization of two target proteins of different protein classes, κ-opioid receptors, and PIK3CA, and the quality and the distribution of the generative molecules were evaluated. The results showed that LOGICS outperforms competing state-of-the-art models and generates more diverse de novo chemical structures with optimized properties. The source code is available at the GitHub repository ( https://github.com/GIST-CSBL/LOGICS ).
Affiliation(s)
- Bongsung Bae
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
- Haelee Bae
  - AI Graduate School, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
- Hojung Nam
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
  - AI Graduate School, Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
  - Center for AI-Applied High Efficiency Drug Discovery (AHEDD), Gwangju Institute of Science and Technology (GIST), Buk-Gu, Gwangju, 61005, Republic of Korea
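The LOGICS abstract above mentions a layered tournament selection step for choosing molecules during fine-tuning. As a rough illustration only, the sketch below shows plain (single-layer) tournament selection, which is a simpler technique than the layered variant the authors describe. The molecule names, scores, and the parameters `k` and `seed` are hypothetical and do not come from the paper.

```python
import random


def tournament_select(population, fitness, k, rng):
    """Draw k distinct contenders at random and return the fittest one."""
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)


def select_many(population, fitness, n, k=3, seed=0):
    """Run n independent tournaments (the same item may win more than once)."""
    rng = random.Random(seed)
    return [tournament_select(population, fitness, k, rng) for _ in range(n)]


# Hypothetical toy data: molecule IDs with made-up predicted affinity scores.
scores = {"mol_a": 5.1, "mol_b": 7.4, "mol_c": 6.2, "mol_d": 8.0, "mol_e": 4.3}
population = list(scores)

chosen = select_many(population, scores.get, n=4, k=3)
print(chosen)

# With k equal to the population size, every tournament returns the global best.
best_only = select_many(population, scores.get, n=2, k=len(population))
print(best_only)
```

A smaller `k` keeps more diversity in the selected set, while a larger `k` pushes selection toward the current best candidates, which is one way to picture the exploration-exploitation trade-off the abstract refers to.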
5. Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816; PMCID: PMC10999174; DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
6. Blanchard AE, Bhowmik D, Fox Z, Gounley J, Glaser J, Akpa BS, Irle S. Adaptive language model training for molecular design. J Cheminform 2023; 15:59. [PMID: 37291633; DOI: 10.1186/s13321-023-00719-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/09/2022] [Accepted: 04/03/2023] [Indexed: 06/10/2023] [Open Access]
Abstract
The vast size of chemical space necessitates computational approaches to automate and accelerate the design of molecular sequences to guide experimental efforts for drug discovery. Genetic algorithms provide a useful framework to incrementally generate molecules by applying mutations to known chemical structures. Recently, masked language models have been applied to automate the mutation process by leveraging large compound libraries to learn commonly occurring chemical sequences (i.e., using tokenization) and predict rearrangements (i.e., using mask prediction). Here, we consider how language models can be adapted to improve molecule generation for different optimization tasks. We use two different generation strategies for comparison, fixed and adaptive. The fixed strategy uses a pre-trained model to generate mutations; the adaptive strategy trains the language model on each new generation of molecules selected for target properties during optimization. Our results show that the adaptive strategy allows the language model to more closely fit the distribution of molecules in the population. Therefore, for enhanced fitness optimization, we suggest the use of the fixed strategy during an initial phase followed by the use of the adaptive strategy. We demonstrate the impact of adaptive training by searching for molecules that optimize both heuristic metrics, drug-likeness and synthesizability, as well as predicted protein binding affinity from a surrogate model. Our results show that the adaptive strategy provides a significant improvement in fitness optimization compared to the fixed pre-trained model, empowering the application of language models to molecular design tasks.
Affiliation(s)
- Andrew E Blanchard
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Debsindhu Bhowmik
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Zachary Fox
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- John Gounley
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Jens Glaser
  - National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Belinda S Akpa
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
  - Chemical & Biomolecular Engineering, University of Tennessee, Knoxville, TN, 37996, USA
- Stephan Irle
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
7. Ballarotto M, Willems S, Stiller T, Nawa F, Marschner JA, Grisoni F, Merk D. De Novo Design of Nurr1 Agonists via Fragment-Augmented Generative Deep Learning in Low-Data Regime. J Med Chem 2023. [PMID: 37256819; DOI: 10.1021/acs.jmedchem.3c00485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Generative neural networks trained on SMILES can design innovative bioactive molecules de novo. These so-called chemical language models (CLMs) have typically been trained on tens of template molecules for fine-tuning. However, it is challenging to apply CLM to orphan targets with few known ligands. We have fine-tuned a CLM with a single potent Nurr1 agonist as template in a fragment-augmented fashion and obtained novel Nurr1 agonists using sampling frequency for design prioritization. Nanomolar potency and binding affinity of the top-ranking design and its structural novelty compared to available Nurr1 ligands highlight its value as an early chemical tool and as a lead for Nurr1 agonist development, as well as the applicability of CLM in very low-data scenarios.
Affiliation(s)
- Marco Ballarotto
  - Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
  - Department of Pharmaceutical Sciences, Università degli Studi di Perugia, 06123 Perugia, Italy
- Sabine Willems
  - Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
- Tanja Stiller
  - Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
- Felix Nawa
  - Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
- Julian A Marschner
  - Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
- Francesca Grisoni
  - Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, 5612AZ Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584CB Utrecht, The Netherlands
- Daniel Merk
  - Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
8. van Tilborg D, Alenicheva A, Grisoni F. Exposing the Limitations of Molecular Machine Learning with Activity Cliffs. J Chem Inf Model 2022; 62:5938-5951. [PMID: 36456532; PMCID: PMC9749029; DOI: 10.1021/acs.jcim.2c01073] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Received: 08/23/2022] [Indexed: 12/03/2022]
Abstract
Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties, such as bioactivity, with high accuracy. However, activity cliffs (pairs of molecules that are highly similar in their structure but exhibit large differences in potency) have received limited attention for their effect on model performance. Not only are these edge cases informative for molecule discovery and optimization, but models that are well equipped to accurately predict the potency of activity cliffs also have increased potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked a total of 24 machine and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. Our findings highlight large case-by-case differences in performance, advocating for (a) the inclusion of dedicated "activity-cliff-centered" metrics during model development and evaluation and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community toward addressing the pressing but overlooked limitation of molecular machine learning models posed by activity cliffs.
Affiliation(s)
- Derek van Tilborg
  - Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612AZ Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584CB Utrecht, The Netherlands
- Francesca Grisoni
  - Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612AZ Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584CB Utrecht, The Netherlands
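The abstract above defines an activity cliff as a pair of structurally similar molecules with a large potency difference. The following is a minimal sketch of that definition, not the MoleculeACE implementation: fingerprints are modelled as plain sets of "on" bits, and the compound names, potencies, and the cutoffs `sim_cutoff` and `fold_cutoff` are illustrative assumptions.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_activity_cliffs(compounds, sim_cutoff=0.9, fold_cutoff=10.0):
    """Return name pairs that are similar in structure but far apart in potency.

    compounds maps a name to (set of on-bits, potency in nM).
    """
    cliffs = []
    for (n1, (fp1, p1)), (n2, (fp2, p2)) in combinations(compounds.items(), 2):
        fold = max(p1, p2) / min(p1, p2)  # potency ratio, always >= 1
        if tanimoto(fp1, fp2) >= sim_cutoff and fold >= fold_cutoff:
            cliffs.append((n1, n2))
    return cliffs


# Hypothetical toy compounds: (fingerprint bits, potency in nM).
compounds = {
    "cpd_1": (set(range(20)), 5.0),
    "cpd_2": (set(range(19)), 800.0),    # near-identical to cpd_1, far weaker
    "cpd_3": (set(range(10, 30)), 6.0),  # structurally different
    "cpd_4": (set(range(20)), 7.0),      # similar to cpd_1, similar potency
}

cliffs = find_activity_cliffs(compounds)
print(cliffs)
```

In this toy set, `cpd_2` forms a cliff with both `cpd_1` and `cpd_4`, while `cpd_1` and `cpd_4` are similar in both structure and potency and therefore do not.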
9. Jang SH, Sivakumar D, Mudedla SK, Choi J, Lee S, Jeon M, Bvs SK, Hwang J, Kang M, Shin EG, Lee KM, Jung KY, Kim JS, Wu S. PCW-A1001, AI-assisted de novo design approach to design a selective inhibitor for FLT-3(D835Y) in acute myeloid leukemia. Front Mol Biosci 2022; 9:1072028. [PMID: 36504722; PMCID: PMC9732455; DOI: 10.3389/fmolb.2022.1072028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 11/16/2022] [Indexed: 11/27/2022] [Open Access]
Abstract
Treating acute myeloid leukemia (AML) by targeting FMS-like tyrosine kinase 3 (FLT-3) is considered an effective treatment strategy. By using AI-assisted hit optimization, we discovered a novel and highly selective compound with desired drug-like properties with which to target the FLT-3 (D835Y) mutant. In the current study, we applied an AI-assisted de novo design approach to identify a novel inhibitor of FLT-3 (D835Y). A recurrent neural network containing long short-term memory (LSTM) cells was implemented to generate potential candidates related to our in-house hit compound (PCW-1001). A total of 10,416 hits were generated from 20 epochs, and the generated hits were further filtered using various toxicity and synthetic feasibility filters. Based on the docking and free energy ranking, the top three compounds were selected for synthesis and screening. Of these three compounds, PCW-A1001 proved to be highly selective for the FLT-3 (D835Y) mutant, with an IC50 of 764 nM, whereas the IC50 for FLT-3 WT was 2.54 μM.
Affiliation(s)
- Minsung Kang
  - Division of Radiation Biomedical Research, Korea Institute of Radiological and Medical Sciences, Seoul, South Korea
- Eun Gyeong Shin
  - Therapeutics & Biotechnology Division, Korea Research Institute of Chemical Technology, Daejeon, South Korea
  - Department of Medicinal Chemistry and Pharmacology, University of Science & Technology, Daejeon, South Korea
- Kyu Myung Lee
  - Therapeutics & Biotechnology Division, Korea Research Institute of Chemical Technology, Daejeon, South Korea
- Kwan-Young Jung
  - Therapeutics & Biotechnology Division, Korea Research Institute of Chemical Technology, Daejeon, South Korea
  - Department of Medicinal Chemistry and Pharmacology, University of Science & Technology, Daejeon, South Korea
- Jae-Sung Kim
  - Division of Radiation Biomedical Research, Korea Institute of Radiological and Medical Sciences, Seoul, South Korea
- Sangwook Wu
  - R&D Center, PharmCADD, Busan, South Korea
  - Department of Physics, Pukyong National University, Busan, South Korea
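The two IC50 values reported in the abstract above are given in different units (764 nM for the D835Y mutant, 2.54 μM for wild type), so the selectivity ratio is easy to misread. The short sketch below converts both to nanomolar and computes the fold difference; the helper name `to_nanomolar` and the unit table are assumptions made for illustration.

```python
# Conversion factors to nanomolar; "uM" stands in for micromolar (μM).
UNIT_TO_NM = {"nM": 1.0, "uM": 1e3, "mM": 1e6}


def to_nanomolar(value, unit):
    """Convert a concentration to nanomolar."""
    return value * UNIT_TO_NM[unit]


ic50_mutant = to_nanomolar(764, "nM")       # FLT-3 (D835Y), from the abstract
ic50_wild_type = to_nanomolar(2.54, "uM")   # FLT-3 WT, from the abstract

# Fold selectivity: how many times more potent the compound is on the mutant.
fold_selectivity = ic50_wild_type / ic50_mutant
print(f"{fold_selectivity:.1f}-fold")
```

On these numbers the compound is roughly 3.3-fold more potent against the mutant than against the wild-type kinase.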
10. Kumar R, Sharma A, Alexiou A, Ashraf GM. Artificial Intelligence in De novo Drug Design: Are We Still There? Curr Top Med Chem 2022; 22:2483-2492. [PMID: 36263480; DOI: 10.2174/1568026623666221017143244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 09/06/2022] [Accepted: 09/15/2022] [Indexed: 01/20/2023]
Abstract
BACKGROUND: The artificial intelligence (AI)-assisted design of drug candidates with novel structures and desired properties has received significant attention in the recent past, as have related areas of forward prediction that aim to discover chemical matter worth synthesizing and investigating experimentally. OBJECTIVES: The purpose behind developing AI-driven models is to explore the broader chemical space and suggest new drug candidate scaffolds with promising therapeutic value. Moreover, it is anticipated that such AI-based models may not only significantly reduce cost and time but also decrease the attrition rate of drug candidates that fail to reach the desirable endpoints at the final stages of drug development. In an attempt to develop AI-based models for de novo drug design, numerous methods have been proposed by various study groups by applying machine learning and deep learning algorithms to chemical datasets. However, there are many challenges in obtaining accurate predictions, and real breakthroughs in de novo drug design are still scarce. METHODS: In this review, we explore the recent trends in developing AI-based models for de novo drug design to assess the current status, challenges, and opportunities in the field. CONCLUSION: The consistently improved AI algorithms and the abundance of curated training chemical data indicate that AI-based de novo drug design should perform better than the current models. Improvements in performance are warranted to obtain better outcomes in the form of potential drug candidates that perform well under in vivo conditions, especially in the case of more complex diseases.
Affiliation(s)
- Rajnish Kumar
  - Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Uttar Pradesh, India
- Anju Sharma
  - Department of Applied Science, Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
- Athanasios Alexiou
  - Novel Global Community Educational Foundation, Hebersham, 2770 NSW, Australia
  - AFNP Med Austria, 1010 Wien, Austria
- Ghulam Md Ashraf
  - Pre-Clinical Research Unit (PCRU), King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
  - Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
11. Li C, Wang C, Sun M, Zeng Y, Yuan Y, Gou Q, Wang G, Guo Y, Pu X. Correlated RNN Framework to Quickly Generate Molecules with Desired Properties for Energetic Materials in the Low Data Regime. J Chem Inf Model 2022; 62:4873-4887. [PMID: 35998331; DOI: 10.1021/acs.jcim.2c00997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2023]
Abstract
Motivated by the challenge of deep learning in the low-data regime and the urgent demand for intelligent design of highly energetic materials, we explore a correlated deep learning framework, which consists of three recurrent neural networks (RNNs) correlated by a transfer learning strategy, to efficiently generate new energetic molecules with a high detonation velocity when very limited data are available. To avoid dependence on an external big data set, data augmentation by fragment shuffling of 303 energetic compounds is used to produce 500,000 molecules to pretrain an RNN, through which the model can learn sufficient structural knowledge. The pretrained RNN is then fine-tuned on the 303 energetic compounds to generate 7153 molecules similar to the energetic compounds. To screen molecules with a high detonation velocity more reliably, SMILES enumeration augmentation coupled with the pretrained knowledge is used to build an RNN-based prediction model, through which R² is boosted from 0.4446 to 0.9572. The comparable performance with the transfer learning strategy based on an existing big database (ChEMBL) to produce the energetic molecules and drug-like ones further supports the effectiveness and generality of our strategy in the low-data regime. High-precision quantum mechanics calculations further confirm that 35 new molecules present a higher detonation velocity and lower synthetic accessibility than the classic explosive RDX, along with good thermal stability. In particular, three new molecules are comparable to caged CL-20 in detonation velocity. All the source code and the data set are freely available at https://github.com/wangchenghuidream/RNNMGM.
Affiliation(s)
- Chuan Li
  - College of Computer Science, Sichuan University, Chengdu 610064, China
- Chenghui Wang
  - College of Computer Science, Sichuan University, Chengdu 610064, China
- Ming Sun
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Yan Zeng
  - College of Computer Science, Sichuan University, Chengdu 610064, China
- Yuan Yuan
  - College of Management, Southwest University for Nationalities, Chengdu 610041, China
- Qiaolin Gou
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Guangchuan Wang
  - College of Computer Science, Sichuan University, Chengdu 610064, China
- Yanzhi Guo
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Xuemei Pu
  - College of Chemistry, Sichuan University, Chengdu 610064, China
| |
|
12
|
Andrews J, Gkountouna O, Blaisten-Barojas E. Forecasting molecular dynamics energetics of polymers in solution from supervised machine learning. Chem Sci 2022; 13:7021-7033. [PMID: 35774160 PMCID: PMC9200117 DOI: 10.1039/d2sc01216b] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/28/2022] [Accepted: 05/24/2022] [Indexed: 11/21/2022] Open
Abstract
Machine learning techniques including neural networks are popular tools for chemical, physical and materials applications searching for viable alternative methods in the analysis of structure and energetics of systems ranging from crystals to biomolecules. Efforts are less abundant for prediction of kinetics and dynamics. Here we explore the ability of three well established recurrent neural network architectures to reproduce and forecast the energetics of a liquid solution of ethyl acetate containing a macromolecular polymer-lipid aggregate at ambient conditions. Data models from three recurrent neural networks, ERNN, LSTM and GRU, are trained and tested on half-million-point time series of the macromolecular aggregate potential energy and its interaction energy with the solvent obtained from molecular dynamics simulations. Our exhaustive analyses show that the recurrent neural network architectures investigated generate data models that reproduce the time series excellently, although their capability of yielding short or long term energetics forecasts with the expected statistical distributions of the time points is limited. We propose an in silico protocol that extracts time patterns from the original series and uses these patterns to create an ensemble of artificial network models trained on an ensemble of time series seeded by the additional time patterns. The energetics forecasts improve, predicting a band of forecasted time series with a spread of values consistent with the span of the molecular dynamics energy fluctuations. Although the distribution of points from the band of energy forecasts is not optimal, the proposed in silico protocol provides useful estimates of the fate of the solvated macromolecular aggregate. Given the growing application of artificial networks in materials design, the data-based protocol presented here expands the realm of science areas where supervised machine learning serves as a decision making tool, helping the simulation practitioner assess when long simulations are worth continuing.
Affiliation(s)
- James Andrews
- Center for Simulation and Modeling, George Mason University Fairfax Virginia 22030 USA
- Department of Computational and Data Sciences, George Mason University Fairfax Virginia 22030 USA
| | - Olga Gkountouna
- Department of Computational and Data Sciences, George Mason University Fairfax Virginia 22030 USA
| | - Estela Blaisten-Barojas
- Center for Simulation and Modeling, George Mason University Fairfax Virginia 22030 USA
- Department of Computational and Data Sciences, George Mason University Fairfax Virginia 22030 USA
| |
|
13
|
Flam-Shepherd D, Zhu K, Aspuru-Guzik A. Language models can learn complex molecular distributions. Nat Commun 2022; 13:3293. [PMID: 35672310 PMCID: PMC9174447 DOI: 10.1038/s41467-022-30839-x] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 12/06/2021] [Accepted: 05/16/2022] [Indexed: 11/09/2022] Open
Abstract
Deep generative models of molecules have grown immensely in popularity; trained on relevant datasets, these models are used to search through chemical space. The downstream utility of generative models for the inverse design of novel functional compounds depends on their ability to learn a training distribution of molecules. The simplest example is a language model that takes the form of a recurrent neural network and generates molecules using a string representation. Since their initial use, subsequent work has shown that language models are very capable; in particular, recent research has demonstrated their utility in the low data regime. In this work, we investigate the capacity of simple language models to learn more complex distributions of molecules. For this purpose, we introduce several challenging generative modeling tasks by compiling larger, more complex distributions of molecules, and we evaluate the ability of language models on each task. The results demonstrate that language models are powerful generative models, capable of adeptly learning complex molecular distributions. Language models can accurately generate distributions of the highest scoring penalized LogP molecules in ZINC15, multi-modal molecular distributions, and the largest molecules in PubChem. The results highlight the limitations of some of the most popular and recent graph generative models, many of which cannot scale to these molecular distributions.
Affiliation(s)
- Daniel Flam-Shepherd
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada.
- Vector Institute for Artificial Intelligence, Toronto, ON, M5S 1M1, Canada.
| | - Kevin Zhu
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada
| | - Alán Aspuru-Guzik
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada.
- Vector Institute for Artificial Intelligence, Toronto, ON, M5S 1M1, Canada.
- Department of Chemistry, University of Toronto, Toronto, ON, M5G 1Z8, Canada.
- Canadian Institute for Advanced Research, Toronto, ON, M5G 1Z8, Canada.
| |
|
14
|
Godinez WJ, Ma EJ, Chao AT, Pei L, Skewes-Cox P, Canham SM, Jenkins JL, Young JM, Martin EJ, Guiguemde WA. Design of potent antimalarials with generative chemistry. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00448-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/08/2023]
|
15
|
Deep Learning Applied to Ligand-Based De Novo Drug Design. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021; 2390:273-299. [PMID: 34731474 DOI: 10.1007/978-1-0716-1787-8_12] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 10/19/2022]
Abstract
In recent years, the application of deep generative models to suggest virtual compounds has become a new and powerful tool in drug discovery projects. The idea behind this review is to offer an updated view on de novo design approaches based on artificial intelligence (AI) algorithms, with a particular focus on ligand-based methods. We start this review with a brief overview of the most relevant de novo design approaches developed before the use of AI techniques. We then describe the neural network architectures most commonly employed today in ligand-based de novo design, together with an up-to-date list of more than 100 deep generative models found in the literature (2017-2020). To show how deep generative approaches are applied in a drug discovery context, we report all currently available studies in which generated compounds have been synthesized and their biological activity tested. Finally, we discuss what we envisage as beneficial future directions for further application of deep generative models in de novo drug design.
|
16
|
Melo MCR, Maasch JRMA, de la Fuente-Nunez C. Accelerating antibiotic discovery through artificial intelligence. Commun Biol 2021; 4:1050. [PMID: 34504303 PMCID: PMC8429579 DOI: 10.1038/s42003-021-02586-0] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Received: 02/18/2021] [Accepted: 07/16/2021] [Indexed: 02/07/2023] Open
Abstract
By targeting invasive organisms, antibiotics insert themselves into the ancient struggle of the host-pathogen evolutionary arms race. As pathogens evolve tactics for evading antibiotics, therapies decline in efficacy and must be replaced, distinguishing antibiotics from most other forms of drug development. Together with a slow and expensive antibiotic development pipeline, the proliferation of drug-resistant pathogens drives urgent interest in computational methods that promise to expedite candidate discovery. Strides in artificial intelligence (AI) have encouraged its application to multiple dimensions of computer-aided drug design, with increasing application to antibiotic discovery. This review describes AI-facilitated advances in the discovery of both small molecule antibiotics and antimicrobial peptides. Beyond the essential prediction of antimicrobial activity, emphasis is also given to antimicrobial compound representation, determination of drug-likeness traits, antimicrobial resistance, and de novo molecular design. Given the urgency of the antimicrobial resistance crisis, we analyze uptake of open science best practices in AI-driven antibiotic discovery and argue for openness and reproducibility as a means of accelerating preclinical research. Finally, trends in the literature and areas for future inquiry are discussed, as artificially intelligent enhancements to drug discovery at large offer many opportunities for future applications in antibiotic development.
Affiliation(s)
- Marcelo C R Melo
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
| | - Jacqueline R M A Maasch
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA.
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA.
| |
|
17
|
Moret M, Helmstädter M, Grisoni F, Schneider G, Merk D. Beam‐Search zum automatisierten Entwurf und Scoring neuer ROR‐Liganden mithilfe maschineller Intelligenz. Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202104405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2023]
Affiliation(s)
- Michael Moret
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
| | - Moritz Helmstädter
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, Max-von-Laue-Straße 9, 60438 Frankfurt, Germany
| | - Francesca Grisoni
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- Eindhoven University of Technology, Institute for Complex Molecular Systems, Department of Biomedical Engineering, Groene Loper 7, 5612AZ Eindhoven, Netherlands
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore 138602, Singapore
| | - Daniel Merk
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, Max-von-Laue-Straße 9, 60438 Frankfurt, Germany
- LMU München, Department of Pharmacy, Butenandtstraße 7, 81377 München, Germany
| |
|
18
|
Moret M, Helmstädter M, Grisoni F, Schneider G, Merk D. Beam Search for Automated Design and Scoring of Novel ROR Ligands with Machine Intelligence. Angew Chem Int Ed Engl 2021; 60:19477-19482. [PMID: 34165856 PMCID: PMC8457062 DOI: 10.1002/anie.202104405] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 03/30/2021] [Revised: 06/02/2021] [Indexed: 01/10/2023]
Abstract
Chemical language models enable de novo drug design without the requirement for explicit molecular construction rules. While such models have been applied to generate novel compounds with desired bioactivity, the actual prioritization and selection of the most promising computational designs remains challenging. Herein, we leveraged the probabilities learnt by chemical language models with the beam search algorithm as a model-intrinsic technique for automated molecule design and scoring. Prospective application of this method yielded novel inverse agonists of retinoic acid receptor-related orphan receptors (RORs). Each design was synthesizable in three reaction steps and presented low-micromolar to nanomolar potency towards RORγ. This model-intrinsic sampling technique eliminates the strict need for external compound scoring functions, thereby further extending the applicability of generative artificial intelligence to data-driven drug discovery.
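The idea of using beam search as a model-intrinsic design and scoring technique can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `TOY_MODEL` is a hypothetical table of next-token probabilities that conditions only on the previous token, whereas a real chemical language model conditions on the full SMILES prefix, and `beam_search`, `width`, and `max_len` are illustrative names, not the authors' implementation. Each completed string is ranked by its cumulative log-probability, so the model's own likelihood serves as the score.

```python
import math

# Hypothetical toy "language model": next-token probabilities given the
# previous token. "^" marks the start and "$" the end of a string.
TOY_MODEL = {
    "^": {"C": 0.6, "N": 0.3, "O": 0.1},
    "C": {"C": 0.5, "O": 0.2, "N": 0.1, "$": 0.2},
    "N": {"C": 0.7, "$": 0.3},
    "O": {"C": 0.4, "$": 0.6},
}


def beam_search(model, width=3, max_len=4):
    """Return finished strings as (log_probability, string), best first."""
    beams = [(0.0, ["^"])]  # (cumulative log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        # Extend every open beam by every possible next token.
        candidates = []
        for logp, seq in beams:
            for token, prob in model[seq[-1]].items():
                candidates.append((logp + math.log(prob), seq + [token]))
        # Keep only the `width` most probable extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for logp, seq in candidates[:width]:
            if seq[-1] == "$":
                # Strip the start and end markers before reporting.
                finished.append((logp, "".join(seq[1:-1])))
            else:
                beams.append((logp, seq))
        if not beams:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    return finished


if __name__ == "__main__":
    for logp, string in beam_search(TOY_MODEL):
        print(f"{string}\tlog p = {logp:.3f}")
```

Widening the beam or raising `max_len` returns more completed strings at a higher computational cost; the ranking itself needs no external scoring function.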
Affiliation(s)
- Michael Moret
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
| | - Moritz Helmstädter
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, Max-von-Laue-Strasse 9, 60438 Frankfurt, Germany
| | - Francesca Grisoni
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- Eindhoven University of Technology, Institute for Complex Molecular Systems, Department of Biomedical Engineering, Groene Loper 7, 5612AZ Eindhoven, Netherlands
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore 138602, Singapore
| | - Daniel Merk
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, Max-von-Laue-Strasse 9, 60438 Frankfurt, Germany
- LMU Munich, Department of Pharmacy, Butenandtstrasse 7, 81377 Munich, Germany
| |
|
19
|
|
20
|
Grisoni F, Huisman BJH, Button AL, Moret M, Atz K, Merk D, Schneider G. Combining generative artificial intelligence and on-chip synthesis for de novo drug design. SCIENCE ADVANCES 2021; 7:eabg3338. [PMID: 34117066 PMCID: PMC8195470 DOI: 10.1126/sciadv.abg3338] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 12/27/2020] [Accepted: 04/23/2021] [Indexed: 05/24/2023]
Abstract
Automating the molecular design-make-test-analyze cycle accelerates hit and lead finding for drug discovery. Using deep learning for molecular design and a microfluidics platform for on-chip chemical synthesis, liver X receptor (LXR) agonists were generated from scratch. The computational pipeline was tuned to explore the chemical space of known LXRα agonists and generate novel molecular candidates. To ensure compatibility with automated on-chip synthesis, the chemical space was confined to the virtual products obtainable from 17 one-step reactions. Twenty-five de novo designs were successfully synthesized in flow. In vitro screening of the crude reaction products revealed 17 (68%) hits, with up to 60-fold LXR activation. The batch resynthesis, purification, and retesting of 14 of these compounds confirmed that 12 of them were potent LXR agonists. These results support the suitability of the proposed design-make-test-analyze framework as a blueprint for automated drug design with artificial intelligence and miniaturized bench-top synthesis.
Affiliation(s)
- Francesca Grisoni
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland.
- Eindhoven University of Technology, Department of Biomedical Engineering, Eindhoven, Netherlands
| | - Berend J H Huisman
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
| | - Alexander L Button
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
- University of Lausanne, Department of Computational Biology, Lausanne, Switzerland
| | - Michael Moret
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
| | - Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
| | - Daniel Merk
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland.
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, Frankfurt, Germany
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland.
- ETH Singapore SEC Ltd, Singapore, Singapore
| |
|
21
|
Serafim MSM, Dos Santos Júnior VS, Gertrudes JC, Maltarollo VG, Honorio KM. Machine learning techniques applied to the drug design and discovery of new antivirals: a brief look over the past decade. Expert Opin Drug Discov 2021; 16:961-975. [PMID: 33957833 DOI: 10.1080/17460441.2021.1918098] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Abstract
Introduction: Drug design and discovery of new antivirals will always be extremely important in medicinal chemistry, taking into account known viral diseases and new ones that are yet to come. Although machine learning (ML) has been shown to improve predictions of the biological potential of chemicals and to accelerate the discovery of drugs over the past decade, new methods and their combinations have improved its performance and established promising perspectives regarding ML in the search for new antivirals. Areas covered: The authors consider some interesting areas that deal with different ML techniques applied to antivirals. Recent innovative studies on ML and antivirals were selected and analyzed in detail. The authors also provide a brief look from the past to the present to detect advances and bottlenecks in the area. Expert opinion: Classical ML techniques made it possible to boost the search for antivirals. With the emergence of new algorithms and the improvement of old approaches, promising results will be achieved every day, as we have observed in the case of SARS-CoV-2. Recent experience has shown that it is possible to use ML to discover new antiviral candidates through virtual screening and drug repurposing.
Affiliation(s)
- Mateus Sá Magalhães Serafim
- Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
| | | | - Jadson Castro Gertrudes
- Departamento de Computação, Instituto de Ciências Exatas e Biológicas, Universidade Federal de Ouro Preto (UFOP), Ouro Preto, Brazil
| | - Vinícius Gonçalves Maltarollo
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
| | - Kathia Maria Honorio
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil; Centro de Ciências Naturais e Humanas, Universidade Federal do ABC (UFABC), Santo André, Brazil
| |
|
22
|
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, with long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as indispensable tools to draw meaningful insights and improve decision making in drug discovery. Thanks to advances in AI and ML algorithms, AI/ML-driven solutions now have an unprecedented potential to accelerate CNS drug discovery with a better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state of the art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and implementation into therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may provide resolutions to these difficulties.
Affiliation(s)
- Sezen Vatansever
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Daniel Wacker
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - H. Ümit Kaniskan
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Jian Jin
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Ming‐Ming Zhou
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| |
|
23
|
Kim H, Kim E, Lee I, Bae B, Park M, Nam H. Artificial Intelligence in Drug Discovery: A Comprehensive Review of Data-driven and Machine Learning Approaches. BIOTECHNOL BIOPROC E 2021; 25:895-930. [PMID: 33437151 PMCID: PMC7790479 DOI: 10.1007/s12257-020-0049-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/13/2020] [Revised: 05/27/2020] [Accepted: 06/03/2020] [Indexed: 02/07/2023]
Abstract
As expenditure on drug development increases exponentially, the overall drug discovery process requires a sustainable revolution. Since artificial intelligence (AI) is leading the fourth industrial revolution, AI can be considered a viable solution for unstable drug research and development. Generally, AI is applied to fields with sufficient data such as computer vision and natural language processing, but there are many efforts to revolutionize the existing drug discovery process by applying AI. This review provides a comprehensive, organized summary of recent research trends in the AI-guided drug discovery process, including target identification, hit identification, ADMET prediction, lead optimization, and drug repositioning. The main data sources in each field are also summarized in this review. In addition, an in-depth analysis of the remaining challenges and limitations is provided, along with proposals for promising future directions in each of the aforementioned areas.
Affiliation(s)
- Hyunho Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Eunyoung Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Ingoo Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Bongsung Bae
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Minsu Park
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| |
|
24
|
Automated design and optimization of multitarget schizophrenia drug candidates by deep learning. Eur J Med Chem 2020; 204:112572. [DOI: 10.1016/j.ejmech.2020.112572] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/20/2020] [Revised: 06/04/2020] [Accepted: 06/11/2020] [Indexed: 01/20/2023]
|
25
|
Capecchi A, Reymond JL. Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning. Biomolecules 2020; 10:E1385. [PMID: 32998475 PMCID: PMC7600738 DOI: 10.3390/biom10101385] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/02/2020] [Revised: 09/22/2020] [Accepted: 09/25/2020] [Indexed: 12/20/2022] Open
Abstract
Microbial natural products (NPs) are an important source of drugs; however, their structural diversity remains poorly understood. Here we used our recently reported MinHashed Atom Pair fingerprint with a diameter of four bonds (MAP4), a fingerprint suitable for molecules of very different sizes, to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin. To visualize NPAtlas by MAP4 similarity, we used the dimensionality reduction method tree map (TMAP). The resulting interactive map organizes molecules by physico-chemical properties and compound families such as peptides and glycosides. Remarkably, the map separates bacterial and fungal NPs from one another, revealing that these two compound families are intrinsically different despite their related biosynthetic pathways. We used these differences to train a machine learning model capable of distinguishing between NPs of bacterial and fungal origin.
Affiliation(s)
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| |
|
26
|
Cai C, Wang S, Xu Y, Zhang W, Tang K, Ouyang Q, Lai L, Pei J. Transfer Learning for Drug Discovery. J Med Chem 2020; 63:8683-8694. [PMID: 32672961 DOI: 10.1021/acs.jmedchem.9b02147] [Citation(s) in RCA: 144] [Impact Index Per Article: 36.0] [Indexed: 01/05/2023]
Abstract
The data sets available to train models for in silico drug discovery efforts are often small. Indeed, the sparse availability of labeled data is a major barrier to artificial-intelligence-assisted drug discovery. One solution to this problem is to develop algorithms that can cope with relatively heterogeneous and scarce data. Transfer learning is a type of machine learning that can leverage existing, generalizable knowledge from other related tasks to enable learning of a separate task with a small set of data. Deep transfer learning is the most commonly used type of transfer learning in the field of drug discovery. This Perspective provides an overview of transfer learning and related applications to drug discovery to date. Furthermore, it provides outlooks on the future development of transfer learning for drug discovery.
Affiliation(s)
- Chenjing Cai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, P. R. China
| | - Shiwei Wang
- PTN Graduate Program, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, P. R. China
| | - Youjun Xu
- BNLMS and Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, P. R. China
| | - Weilin Zhang
- Beijing Intelligent Pharma Technology Co., Ltd., Beijing 100083, P. R. China
| | - Ke Tang
- Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, P. R. China
| | - Qi Ouyang
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, P. R. China; The State Key Laboratory for Artificial Microstructures and Mesoscopic Physics, School of Physics, Peking University, Beijing 100871, P. R. China
| | - Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, P. R. China; BNLMS and Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, P. R. China
| | - Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, P. R. China
| |
|
27
|
Amabilino S, Pogány P, Pickett SD, Green DVS. Guidelines for Recurrent Neural Network Transfer Learning-Based Molecular Generation of Focused Libraries. J Chem Inf Model 2020; 60:5699-5713. [DOI: 10.1021/acs.jcim.0c00343] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 12/20/2022]
Affiliation(s)
- Silvia Amabilino
- School of Chemistry, University of Bristol, Cantock’s Close, Bristol BS8 1TS, United Kingdom
| | - Peter Pogány
- Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Herts SG1 2NY, United Kingdom
| | - Stephen D. Pickett
- Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Herts SG1 2NY, United Kingdom
| | - Darren V. S. Green
- Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Herts SG1 2NY, United Kingdom
| |
|
28
|
Mansbach RA, Leus IV, Mehla J, Lopez CA, Walker JK, Rybenkov VV, Hengartner NW, Zgurskaya HI, Gnanakaran S. Machine Learning Algorithm Identifies an Antibiotic Vocabulary for Permeating Gram-Negative Bacteria. J Chem Inf Model 2020; 60:2838-2847. [PMID: 32453589 DOI: 10.1021/acs.jcim.0c00352] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 02/08/2023]
Abstract
Drug discovery faces a crisis. The industry has used up the "obvious" space in which to find novel drugs for biomedical applications, and productivity is declining. One strategy to combat this is rational approaches to expand the search space without relying on chemical intuition, to avoid rediscovery of similar spaces. In this work, we present proof of concept of an approach to rationally identify a "chemical vocabulary" related to a specific drug activity of interest without employing known rules. We focus on the pressing concern of multidrug resistance in Pseudomonas aeruginosa by searching for submolecules that promote compound entry into this bacterium. By synergizing theory, computation, and experiment, we validate our approach, explain the molecular mechanism behind identified fragments promoting compound entry, and select candidate compounds from an external library that display good permeation ability.
Affiliation(s)
- Rachael A Mansbach
- Department of Theoretical Biology and Biophysics, Los Alamos National Lab, MS-K710, P.O. Box 1663, Los Alamos, New Mexico 87545-0001, United States
| | - Inga V Leus
- Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Parkway, SLSRC, Rm 1000, Norman, Oklahoma 73019-5251, United States
| | - Jitender Mehla
- Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Parkway, SLSRC, Rm 1000, Norman, Oklahoma 73019-5251, United States
| | - Cesar A Lopez
- Department of Theoretical Biology and Biophysics, Los Alamos National Lab, MS-K710, P.O. Box 1663, Los Alamos, New Mexico 87545-0001, United States
| | - John K Walker
- Pharmacology and Physiological Science, School of Medicine, Saint Louis University, Schwitalla Hall, Room M362, St. Louis, Missouri 63104, United States
| | - Valentin V Rybenkov
- Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Parkway, SLSRC, Rm 1000, Norman, Oklahoma 73019-5251, United States
| | - Nicolas W Hengartner
- Department of Theoretical Biology and Biophysics, Los Alamos National Lab, MS-K710, P.O. Box 1663, Los Alamos, New Mexico 87545-0001, United States
| | - Helen I Zgurskaya
- Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Parkway, SLSRC, Rm 1000, Norman, Oklahoma 73019-5251, United States
| | - S Gnanakaran
- Department of Theoretical Biology and Biophysics, Los Alamos National Lab, MS-K710, P.O. Box 1663, Los Alamos, New Mexico 87545-0001, United States
| |
|
29
|
Li X, Xu Y, Yao H, Lin K. Chemical space exploration based on recurrent neural networks: applications in discovering kinase inhibitors. J Cheminform 2020; 12:42. [PMID: 33430983 PMCID: PMC7278228 DOI: 10.1186/s13321-020-00446-3] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 03/26/2020] [Accepted: 06/04/2020] [Indexed: 01/10/2023] Open
Abstract
With the rise of artificial intelligence (AI) in drug discovery, de novo molecular generation provides new ways to explore chemical space. However, because de novo molecular generation methods rely on abundant known molecules, generated molecules may have a problem of novelty. Novelty is important in highly competitive areas of medicinal chemistry, such as the discovery of kinase inhibitors. In this study, de novo molecular generation based on recurrent neural networks was applied to discover a new chemical space of kinase inhibitors. During the application, the practicality was evaluated, and new inspiration was found. With the successful discovery of one potent Pim1 inhibitor and two lead compounds that inhibit CDK4, AI-based molecular generation shows potentials in drug discovery and development.
Affiliation(s)
- Xuanyi Li
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China
| | - Yinqiu Xu
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China
| | - Hequan Yao
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China.
| | - Kejiang Lin
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China.
| |
|
30
|
Arús-Pous J, Patronov A, Bjerrum EJ, Tyrchan C, Reymond JL, Chen H, Engkvist O. SMILES-based deep generative scaffold decorator for de-novo drug design. J Cheminform 2020; 12:38. [PMID: 33431013 PMCID: PMC7260788 DOI: 10.1186/s13321-020-00441-8] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 02/04/2020] [Accepted: 05/16/2020] [Indexed: 12/21/2022] Open
Abstract
Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorically obtaining a large number of scaffolds with their respective decorations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described: First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered only to allow those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. 
We envision that this architecture will become a useful addition to the already existent architectures for de novo molecular generation.
Affiliation(s)
- Josep Arús-Pous
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden; Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Atanas Patronov
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
| | - Esben Jannik Bjerrum
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
| | - Christian Tyrchan
- Medicinal Chemistry, Respiratory Inflammation, and Autoimmune (RIA), BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Hongming Chen
- Chemistry and Chemical Biology Centre, Guangzhou Regenerative Medicine and Health-Guangdong Laboratory, Guangzhou, China
| | - Ola Engkvist
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
| |
|
31
|
|
32
|
de Souza Neto LR, Moreira-Filho JT, Neves BJ, Maidana RLBR, Guimarães ACR, Furnham N, Andrade CH, Silva FP. In silico Strategies to Support Fragment-to-Lead Optimization in Drug Discovery. Front Chem 2020; 8:93. [PMID: 32133344 PMCID: PMC7040036 DOI: 10.3389/fchem.2020.00093] [Citation(s) in RCA: 100] [Impact Index Per Article: 25.0] [Received: 10/25/2019] [Accepted: 01/30/2020] [Indexed: 12/16/2022] Open
Abstract
Fragment-based drug (or lead) discovery (FBDD or FBLD) has developed in the last two decades to become a successful key technology in the pharmaceutical industry for early stage drug discovery and development. The FBDD strategy consists of screening low molecular weight compounds against macromolecular targets (usually proteins) of clinical relevance. These small molecular fragments can bind at one or more sites on the target and act as starting points for the development of lead compounds. In developing the fragments attractive features that can translate into compounds with favorable physical, pharmacokinetics and toxicity (ADMET-absorption, distribution, metabolism, excretion, and toxicity) properties can be integrated. Structure-enabled fragment screening campaigns use a combination of screening by a range of biophysical techniques, such as differential scanning fluorimetry, surface plasmon resonance, and thermophoresis, followed by structural characterization of fragment binding using NMR or X-ray crystallography. Structural characterization is also used in subsequent analysis for growing fragments of selected screening hits. The latest iteration of the FBDD workflow employs a high-throughput methodology of massively parallel screening by X-ray crystallography of individually soaked fragments. In this review we will outline the FBDD strategies and explore a variety of in silico approaches to support the follow-up fragment-to-lead optimization of either: growing, linking, and merging. These fragment expansion strategies include hot spot analysis, druggability prediction, SAR (structure-activity relationships) by catalog methods, application of machine learning/deep learning models for virtual screening and several de novo design methods for proposing synthesizable new compounds. Finally, we will highlight recent case studies in fragment-based drug discovery where in silico methods have successfully contributed to the development of lead compounds.
Affiliation(s)
- Lauro Ribeiro de Souza Neto
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - José Teófilo Moreira-Filho
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
| | - Bruno Junior Neves
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
- Laboratory of Cheminformatics, Centro Universitário de Anápolis – UniEVANGÉLICA, Anápolis, Brazil
| | - Rocío Lucía Beatriz Riveros Maidana
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Ana Carolina Ramos Guimarães
- Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Nicholas Furnham
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Carolina Horta Andrade
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
| | - Floriano Paes Silva
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| |
|
33
|
Capecchi A, Zhang A, Reymond JL. Populating Chemical Space with Peptides Using a Genetic Algorithm. J Chem Inf Model 2020; 60:121-132. [PMID: 31868369 DOI: 10.1021/acs.jcim.9b01014] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/13/2022]
Abstract
In drug discovery, one uses chemical space as a concept to organize molecules according to their structures and properties. One often would like to generate new possible molecules at a specific location in the chemical space marked by a molecule of interest. Herein, we report the peptide design genetic algorithm (PDGA, code available at https://github.com/reymond-group/PeptideDesignGA ), a computational tool capable of producing peptide sequences of various topologies (linear, cyclic/polycyclic, or dendritic) in proximity of any molecule of interest in a chemical space defined by macromolecule extended atom-pair fingerprint (MXFP), an atom-pair fingerprint describing molecular shape and pharmacophores. We show that the PDGA generates high-similarity analogues of bioactive peptides with diverse peptide chain topologies and of nonpeptide target molecules. We illustrate the chemical space accessible by the PDGA with an interactive 3D map of the MXFP property space available at http://faerun.gdb.tools/ . The PDGA should be generally useful to generate peptides at any location in the chemical space.
Affiliation(s)
- Alice Capecchi
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Alain Zhang
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| |
|
34
|
Arús-Pous J, Johansson SV, Prykhodko O, Bjerrum EJ, Tyrchan C, Reymond JL, Chen H, Engkvist O. Randomized SMILES strings improve the quality of molecular generative models. J Cheminform 2019; 11:71. [PMID: 33430971 PMCID: PMC6873550 DOI: 10.1186/s13321-019-0393-0] [Citation(s) in RCA: 126] [Impact Index Per Article: 25.2] [Received: 07/30/2019] [Accepted: 11/09/2019] [Indexed: 12/22/2022] Open
Abstract
Recurrent Neural Networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings, have shown the capacity to create large chemical spaces of valid and meaningful structures. Herein we perform an extensive benchmark on models trained with subsets of GDB-13 of different sizes (1 million, 10,000 and 1000), with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations. To guide the benchmarks new metrics were developed that define how well a model has generalized the training set. The generated chemical space is evaluated with respect to its uniformity, closedness and completeness. Results show that models that use LSTM cells trained with 1 million randomized SMILES, a non-unique molecular string representation, are able to generalize to larger chemical spaces than the other approaches and they represent more accurately the target chemical space. Specifically, a model was trained with randomized SMILES that was able to generate almost all molecules from GDB-13 with a quasi-uniform probability. Models trained with smaller samples show an even bigger improvement when trained with randomized SMILES models. Additionally, models were trained on molecules obtained from ChEMBL and illustrate again that training with randomized SMILES lead to models having a better representation of the drug-like chemical space. Namely, the model trained with randomized SMILES was able to generate at least double the amount of unique molecules with the same distribution of properties comparing to one trained with canonical SMILES.
Affiliation(s)
- Josep Arús-Pous
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden.
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland.
| | | | - Oleksii Prykhodko
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
| | | | - Christian Tyrchan
- Medicinal Chemistry, BioPharmaceuticals Early RIA, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Hongming Chen
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
| |
|
35
|
Xia Z, Karpov P, Popowicz G, Tetko IV. Focused Library Generator: case of Mdmx inhibitors. J Comput Aided Mol Des 2019; 34:769-782. [PMID: 31677002 DOI: 10.1007/s10822-019-00242-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/18/2019] [Accepted: 10/22/2019] [Indexed: 01/18/2023]
Abstract
We present a Focused Library Generator that is able to create from scratch new molecules with desired properties. After training the Generator on the ChEMBL database, transfer learning was used to switch the generator to producing new Mdmx inhibitors that are a promising class of anticancer drugs. Lilly medicinal chemistry filters, molecular docking, and a QSAR IC50 model were used to refine the output of the Generator. Pharmacophore screening and molecular dynamics (MD) simulations were then used to further select putative ligands. Finally, we identified five promising hits with equivalent or even better predicted binding free energies and IC50 values than known Mdmx inhibitors. The source code of the project is available on https://github.com/bigchem/online-chem.
Affiliation(s)
- Zhonghua Xia
- Institute of Structural Biology, Helmholtz Zentrum München - Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
| | - Pavel Karpov
- Institute of Structural Biology, Helmholtz Zentrum München - Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- BigChem GmbH, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
| | - Grzegorz Popowicz
- Institute of Structural Biology, Helmholtz Zentrum München - Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
| | - Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München - Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany.
- BigChem GmbH, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany.
| |
|
36
|
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 350] [Impact Index Per Article: 70.0] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI which have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles, alongside some application notes, of the various machine learning algorithms, the current state-of-the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| | - Yifei Wang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| | - Ryan Byrne
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
| | - Shengyong Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| |
|