1
Tropsha A, Martin HJ, Cherkasov A. The Six Ds of Exponentials and drug discovery: A path toward reversing Eroom's law. Drug Discov Today 2025; 30:104341. [PMID: 40122449 PMCID: PMC12043357 DOI: 10.1016/j.drudis.2025.104341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2025] [Revised: 03/09/2025] [Accepted: 03/18/2025] [Indexed: 03/25/2025]
Abstract
Many technological sectors underwent recent exponential growth because of digital disruption, a phenomenon Peter Diamandis characterized as the 'Six Ds of Exponentials': digitization, deception, disruption, demonetization, dematerialization, and democratization. In contrast, drug discovery has been marked by rising costs and modest growth, if any, of annual drug approvals. We argue that the exponential growth of drug discovery can also be achieved through digital disruption brought by data expansion, mature artificial intelligence (AI), automation of experiments, public-private partnerships, and open science. We detected the emergence of all 'Six Ds of Exponentials' within modern drug discovery and discuss how each of the 'Six Ds' can further empower the field and forcefully address the societal demand for novel, potent, affordable, and accessible medicines.
Affiliation(s)
- Alexander Tropsha
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Holli-Joi Martin
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Artem Cherkasov
  - Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC, Canada.
2
Bellenger J, Koos MRM, Avery M, Bundesmann M, Ciszewski G, Khunte B, Leverett C, Ostner G, Ryder TF, Farley KA. An Automated Purification Workflow Coupled with Material-Sparing High-Throughput 1H NMR for Parallel Medicinal Chemistry. ACS Med Chem Lett 2024; 15:1635-1644. [PMID: 39291006 PMCID: PMC11403749 DOI: 10.1021/acsmedchemlett.4c00245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 07/12/2024] [Accepted: 07/13/2024] [Indexed: 09/19/2024] Open
Abstract
In medicinal chemistry, purification and characterization of organic compounds is an ever-growing challenge, with an increasing number of compounds being synthesized at a decreased scale of preparation. In response to this trend, we developed a parallel medicinal chemistry (PMC)-tailored platform, coupling automated purification to mass spectrometry (MS) and nuclear magnetic resonance spectroscopy (NMR) on a range of synthetic scales (∼3.0-75.0 μmol). Here, the generation and acquisition of 1.7 mm NMR samples is fully integrated into a high-throughput automated workflow, processing 36 000 compounds yearly. Utilizing dead volume, which is inaccessible in conventional liquid handling, NMR samples are generated on as little as 10 μg without consuming material prioritized for biological assays. As miniaturized PMC synthesis becomes the industry standard, we can now obtain quality NMR spectra from limited material. Paired with automated structure verification, this platform has the potential to allow NMR to become as important for high-throughput analysis as ultrahigh performance liquid chromatography (UPLC)-MS.
Affiliation(s)
- Justin Bellenger
  - Medicine Design, Pfizer Inc., 445 Eastern Point Rd, Groton, Connecticut 06340, United States
- Martin R M Koos
  - Medicine Design, Pfizer Inc., 445 Eastern Point Rd, Groton, Connecticut 06340, United States
- Melissa Avery
  - Medicine Design, Pfizer Inc., 445 Eastern Point Rd, Groton, Connecticut 06340, United States
- Mark Bundesmann
  - Medicine Design, Pfizer Inc., 445 Eastern Point Rd, Groton, Connecticut 06340, United States
- Gregory Ciszewski
  - Medicine Design, Pfizer Inc., 445 Eastern Point Rd, Groton, Connecticut 06340, United States
- Bhagyashree Khunte
  - Medicine Design, Pfizer Inc., 445 Eastern Point Rd, Groton, Connecticut 06340, United States
- Carolyn Leverett
  - Medicine Design, Pfizer Inc., 445 Eastern Point Rd, Groton, Connecticut 06340, United States
- Gregory Ostner
  - Medicine Design, Pfizer Inc., 445 Eastern Point Rd, Groton, Connecticut 06340, United States
- Tim F Ryder
  - Medicine Design, Pfizer Inc., 445 Eastern Point Rd, Groton, Connecticut 06340, United States
- Kathleen A Farley
  - Medicine Design, Pfizer Inc., 445 Eastern Point Rd, Groton, Connecticut 06340, United States
3
Mervin L, Voronov A, Kabeshov M, Engkvist O. QSARtuna: An Automated QSAR Modeling Platform for Molecular Property Prediction in Drug Design. J Chem Inf Model 2024; 64:5365-5374. [PMID: 38950185 DOI: 10.1021/acs.jcim.4c00457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/03/2024]
Abstract
Machine-learning (ML) and deep-learning (DL) approaches to predict the molecular properties of small molecules are increasingly deployed within the design-make-test-analyze (DMTA) drug design cycle to predict molecular properties of interest. Despite this uptake, there are only a few automated packages to aid their development and deployment that also support uncertainty estimation, model explainability, and other key aspects of model usage. This represents a key unmet need within the field, and the large number of molecular representations and algorithms (and associated parameters) means it is nontrivial to robustly optimize, evaluate, reproduce, and deploy models. Here, we present QSARtuna, a molecule property prediction modeling pipeline, written in Python and utilizing the Optuna, Scikit-learn, RDKit, and ChemProp packages, which enables the efficient and automated comparison between molecular representations and machine learning models. The platform was developed by considering the increasingly important aspect of model uncertainty quantification and explainability by design. We provide details for our framework and provide illustrative examples to demonstrate the capability of the software when applied to simple molecular property, reaction/reactivity prediction, and DNA encoded library enrichment classification. We hope that the release of QSARtuna will further spur innovation in automatic ML modeling and provide a platform for education of best practices in molecular property modeling. The code for the QSARtuna framework is made freely available via GitHub.
Affiliation(s)
- Lewis Mervin
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, United Kingdom
- Alexey Voronov
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 412 96, Sweden
- Mikhail Kabeshov
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 412 96, Sweden
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 412 96, Sweden
  - Department of Computer Science and Engineering, University of Gothenburg, Chalmers University of Technology, Gothenburg 412 96, Sweden
4
Dodds M, Guo J, Löhr T, Tibo A, Engkvist O, Janet JP. Sample efficient reinforcement learning with active learning for molecular design. Chem Sci 2024; 15:4146-4160. [PMID: 38487235 PMCID: PMC10935729 DOI: 10.1039/d3sc04653b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Accepted: 02/07/2024] [Indexed: 03/17/2024] Open
Abstract
Reinforcement learning (RL) is a powerful and flexible paradigm for searching for solutions in high-dimensional action spaces. However, bridging the gap between playing computer games with thousands of simulated episodes and solving real scientific problems with complex and involved environments (up to actual laboratory experiments) requires improvements in terms of sample efficiency to make the most of expensive information. The discovery of new drugs is a major commercial application of RL, motivated by the very large nature of the chemical space and the need to perform multiparameter optimization (MPO) across different properties. In silico methods, such as virtual library screening (VS) and de novo molecular generation with RL, show great promise in accelerating this search. However, incorporation of increasingly complex computational models in these workflows requires increasing sample efficiency. Here, we introduce an active learning system linked with an RL model (RL-AL) for molecular design, which aims to improve the sample-efficiency of the optimization process. We identify and characterize unique challenges combining RL and AL, investigate the interplay between the systems, and develop a novel AL approach to solve the MPO problem. Our approach greatly expedites the search for novel solutions relative to baseline-RL for simple ligand- and structure-based oracle functions, with a 5-66-fold increase in hits generated for a fixed oracle budget and a 4-64-fold reduction in computational time to find a specific number of hits. Furthermore, compounds discovered through RL-AL display substantial enrichment of a multi-parameter scoring objective, indicating superior efficacy in curating high-scoring compounds, without a reduction in output diversity. This significant acceleration improves the feasibility of oracle functions that have largely been overlooked in RL due to high computational costs, for example free energy perturbation methods, and in principle is applicable to any RL domain.
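The idea of spending a fixed oracle budget through a cheap surrogate can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the RL-AL method of the paper: the integer candidate pool, the `expensive_oracle` scoring function, the nearest-neighbour surrogate, and the greedy acquisition rule are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random


def expensive_oracle(x: int) -> float:
    """Hypothetical stand-in for a costly scoring function; best score at x = 70."""
    return -abs(x - 70)


def surrogate_predict(x: int, labeled: dict) -> float:
    """Cheap surrogate: reuse the oracle score of the nearest labeled candidate."""
    nearest = min(labeled, key=lambda seen: abs(seen - x))
    return labeled[nearest]


def active_learning_search(pool, budget, batch_size=5, seed=0):
    """Label at most `budget` candidates, choosing each batch with the surrogate."""
    rng = random.Random(seed)
    labeled = {}  # candidate -> oracle score
    # Seed round: label a random batch so the surrogate has data to work with.
    for x in rng.sample(pool, batch_size):
        labeled[x] = expensive_oracle(x)
    # Keep acquiring batches while a full batch still fits in the budget.
    while len(labeled) + batch_size <= budget:
        unlabeled = [x for x in pool if x not in labeled]
        if not unlabeled:
            break
        # Greedy acquisition: send the candidates the surrogate ranks highest.
        ranked = sorted(unlabeled,
                        key=lambda x: surrogate_predict(x, labeled),
                        reverse=True)
        for x in ranked[:batch_size]:
            labeled[x] = expensive_oracle(x)
    return labeled


pool = list(range(100))
labeled = active_learning_search(pool, budget=30)
best = max(labeled, key=labeled.get)
print(f"oracle calls: {len(labeled)}, best candidate: {best}, score: {labeled[best]}")
```

The point of the sketch is the accounting: every oracle call is recorded in `labeled`, so the number of expensive evaluations never exceeds `budget`, while the surrogate is queried freely.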
Affiliation(s)
- Michael Dodds
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Jeff Guo
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Thomas Löhr
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Alessandro Tibo
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
- Jon Paul Janet
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
5
Heifetz A. Accelerating COVID-19 Drug Discovery with High-Performance Computing. Methods Mol Biol 2024; 2716:405-411. [PMID: 37702951 DOI: 10.1007/978-1-0716-3449-3_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
The recent COVID-19 pandemic has served as a timely reminder that the existing drug discovery is a laborious, expensive, and slow process. Never has there been such global demand for a therapeutic treatment to be identified as a matter of such urgency. Unfortunately, this is a scenario likely to repeat itself in future, so it is of interest to explore ways in which to accelerate drug discovery at pandemic speed. Computational methods naturally lend themselves to this because they can be performed rapidly if sufficient computational resources are available. Recently, high-performance computing (HPC) technologies have led to remarkable achievements in computational drug discovery and yielded a series of new platforms, algorithms, and workflows. The application of artificial intelligence (AI) and machine learning (ML) approaches is also a promising and relatively new avenue to revolutionize the drug design process and therefore reduce costs. In this review, I describe how molecular dynamics simulations (MD) were successfully integrated with ML and adapted to HPC to form a powerful tool to study inhibitors for four of the COVID-19 target proteins. The emphasis of this review is on the strategy that was used with an explanation of each of the steps in the accelerated drug discovery workflow. For specific technical details, the reader is directed to the relevant research publications.
6
Tao W, Liu Y, Lin X, Song B, Zeng X. Prediction of multi-relational drug-gene interaction via Dynamic hyperGraph Contrastive Learning. Brief Bioinform 2023; 24:bbad371. [PMID: 37864294 DOI: 10.1093/bib/bbad371] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/28/2023] [Revised: 09/11/2023] [Accepted: 09/29/2023] [Indexed: 10/22/2023] Open
Abstract
Drug-gene interaction prediction occupies a crucial position in various areas of drug discovery, such as drug repurposing, lead discovery and off-target detection. Previous studies show good performance, but they are limited to exploring the binding interactions and ignoring the other interaction relationships. Graph neural networks have emerged as promising approaches owing to their powerful capability of modeling correlations under drug-gene bipartite graphs. Despite the widespread adoption of graph neural network-based methods, many of them experience performance degradation in situations where high-quality and sufficient training data are unavailable. Unfortunately, in practical drug discovery scenarios, interaction data are often sparse and noisy, which may lead to unsatisfactory results. To undertake the above challenges, we propose a novel Dynamic hyperGraph Contrastive Learning (DGCL) framework that exploits local and global relationships between drugs and genes. Specifically, graph convolutions are adopted to extract explicit local relations among drugs and genes. Meanwhile, the cooperation of dynamic hypergraph structure learning and hypergraph message passing enables the model to aggregate information in a global region. With flexible global-level messages, a self-augmented contrastive learning component is designed to constrain hypergraph structure learning and enhance the discrimination of drug/gene representations. Experiments conducted on three datasets show that DGCL is superior to eight state-of-the-art methods and notably gains a 7.6% performance improvement on the DGIdb dataset. Further analyses verify the robustness of DGCL for alleviating data sparsity and over-smoothing issues.
Affiliation(s)
- Wen Tao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082 Hunan, China
- Yuansheng Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082 Hunan, China
- Xuan Lin
  - School of Computer Science, Xiangtan University, Xiangtan, 411105 Hunan, China
  - Key Laboratory of Intelligent Computing and Information Processing, Ministry of Education (Xiangtan University), Xiangtan, 411105 Hunan, China
- Bosheng Song
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082 Hunan, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082 Hunan, China
7
Handa K, Wright P, Yoshimura S, Kageyama M, Iijima T, Bender A. Prediction of Compound Plasma Concentration-Time Profiles in Mice Using Random Forest. Mol Pharm 2023. [PMID: 37096989 DOI: 10.1021/acs.molpharmaceut.3c00071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2023]
Abstract
Pharmacokinetic (PK) parameters such as clearance (CL) and volume of distribution (Vd) have been the subject of previous in silico predictive models. However, having information of the concentration over time profile explicitly can provide additional value like time above MIC or AUC, etc., to understand both the efficacy and safety-related aspects of a compound. In this work, we developed machine learning models for plasma concentration-time profiles after both i.v. and p.o. dosing for a series of 17 in-house projects. For explanatory variables, MACCS Keys chemical descriptors as well as in silico and experimental in vitro PK parameters were used. The predictive accuracy of random forest (RF), message passing neural network, 2-compartment models using estimated CL and Vdss, and an average model (as a control experiment) was investigated using 5-fold cross-validation (5-fold CV) and leave-one-project-out validation (LOPO-V). The predictive accuracy of RF in 5-fold CV for i.v. and p.o. plasma concentration-time profiles was the best among the models studied, with an RMSE for i.v. dosing at 0.08, 1, and 8 h of 0.245, 0.474, and 0.462, respectively, and an RMSE for p.o. dosing at 0.25, 1, and 8 h of 0.500, 0.612, and 0.509, respectively. Furthermore, by investigating the importance of the in vitro PK parameters using the Gini index, we observed that the general prior knowledge in ADME research was reflected well in the respective feature importance of in vitro parameters such as predicted human Vd (hVd) for the initial distribution, mouse intrinsic CL and unbound fraction of mouse plasma for the elimination process, and Caco2 permeability for the absorption process. Also, this model is the first model that can predict twin peaks in the concentration-time profile much better than a baseline compartment model. Because of its combination of sufficient accuracy and speed of prediction, we found the model to be fit-for-purpose for practical lead optimization.
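Leave-one-project-out validation differs from ordinary k-fold CV in that every compound from one project is held out together. The sketch below shows the splitting logic and an RMSE computed for an average-value baseline; the project labels and target values are invented for illustration and are not data from the study, and the helper names are assumptions rather than the authors' code.

```python
import math
from collections import defaultdict


def leave_one_project_out(project_ids):
    """Yield (held_out_project, train_idx, test_idx), holding out each project once."""
    by_project = defaultdict(list)
    for idx, pid in enumerate(project_ids):
        by_project[pid].append(idx)
    for held_out in sorted(by_project):
        test_idx = by_project[held_out]
        train_idx = [i for i, pid in enumerate(project_ids) if pid != held_out]
        yield held_out, train_idx, test_idx


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy data: one target value per compound, grouped by project.
projects = ["A", "A", "B", "B", "C"]
values = [1.0, 3.0, 2.0, 4.0, 6.0]

scores = {}
for held_out, train_idx, test_idx in leave_one_project_out(projects):
    # Average model: predict the training mean for every held-out compound.
    train_mean = sum(values[i] for i in train_idx) / len(train_idx)
    y_true = [values[i] for i in test_idx]
    scores[held_out] = rmse(y_true, [train_mean] * len(y_true))

for project, score in scores.items():
    print(f"held-out project {project}: RMSE = {score:.3f}")
```

Because no compound from the held-out project appears in training, this kind of split probes how a model transfers to an unseen chemical series, which random 5-fold CV does not test.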
Affiliation(s)
- Koichi Handa
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-shi, Tokyo 191-8512, Japan
- Peter Wright
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
- Saki Yoshimura
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-shi, Tokyo 191-8512, Japan
- Michiharu Kageyama
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-shi, Tokyo 191-8512, Japan
- Takeshi Iijima
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-shi, Tokyo 191-8512, Japan
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
8
Jaume-Santero F, Bornet A, Valery A, Naderi N, Vicente Alvarez D, Proios D, Yazdani A, Bournez C, Fessard T, Teodoro D. Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios. J Chem Inf Model 2023; 63:1914-1924. [PMID: 36952584 PMCID: PMC10091402 DOI: 10.1021/acs.jcim.2c01407] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 03/25/2023]
Abstract
The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, deep neural networks initially designed for language translation have been used to accurately predict a wide range of chemical reactions. Among models suited for the task of language translation, the recently introduced molecular transformer reached impressive performance in terms of forward-synthesis and retrosynthesis predictions. In this study, we first present an analysis of the performance of transformer models for product, reactant, and reagent prediction tasks under different scenarios of data availability and data augmentation. We find that the impact of data augmentation depends on the prediction task and on the metric used to evaluate the model performance. Second, we probe the contribution of different combinations of input formats, tokenization schemes, and embedding strategies to model performance. We find that less stable input settings generally lead to better performance. Lastly, we validate the superiority of round-trip accuracy over simpler evaluation metrics, such as top-k accuracy, using a committee of human experts and show a strong agreement for predictions that pass the round-trip test. This demonstrates the usefulness of more elaborate metrics in complex predictive scenarios and highlights the limitations of direct comparisons to a predefined database, which may include a limited number of chemical reaction pathways.
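The contrast between top-k accuracy and round-trip accuracy can be made concrete with a toy example. This is a sketch under one common reading of the two metrics, not the evaluation code of the study: the retrosynthesis suggestions and the forward model are hypothetical lookup tables, and the reaction strings are placeholders rather than real SMILES.

```python
def top_k_accuracy(predictions, references, k):
    """Fraction of products whose database reactants appear in the top-k suggestions."""
    hits = sum(1 for product, ref in references.items()
               if ref in predictions[product][:k])
    return hits / len(references)


def round_trip_accuracy(predictions, forward_model):
    """Fraction of products recovered when the top suggestion is fed to a forward model."""
    hits = sum(1 for product, candidates in predictions.items()
               if forward_model.get(candidates[0]) == product)
    return hits / len(predictions)


# Hypothetical database: one recorded reactant set per product.
references = {"P1": "A+B", "P2": "C+D"}

# Hypothetical ranked retrosynthesis suggestions for each product.
retro_predictions = {"P1": ["A+B", "A+X"], "P2": ["E+F", "C+Y"]}

# Hypothetical forward model: reactants -> predicted product.
# "E+F" is a valid alternative route to P2 that the database does not record.
forward_model = {"A+B": "P1", "C+D": "P2", "E+F": "P2"}

print(top_k_accuracy(retro_predictions, references, k=1))    # 0.5
print(round_trip_accuracy(retro_predictions, forward_model))  # 1.0
```

In this toy case the suggestion for P2 is counted as wrong by top-1 accuracy only because it differs from the single recorded route, while the round-trip check accepts it, which is the limitation of direct database comparison that the abstract describes.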
Affiliation(s)
- Fernando Jaume-Santero
  - Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
  - Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
- Alban Bornet
  - Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
  - Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
- Nona Naderi
  - Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- David Vicente Alvarez
  - Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
  - Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
- Dimitrios Proios
  - Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
- Anthony Yazdani
  - Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
- Douglas Teodoro
  - Department of Radiology and Medical Informatics, University of Geneva, 1205 Geneva, Switzerland
  - Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, 1227 Geneva, Switzerland
  - Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
9
Tysinger EP, Rai BK, Sinitskiy AV. Can We Quickly Learn to "Translate" Bioactive Molecules with Transformer Models? J Chem Inf Model 2023; 63:1734-1744. [PMID: 36914216 DOI: 10.1021/acs.jcim.2c01618] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 03/16/2023]
Abstract
Meaningful exploration of the chemical space of druglike molecules in drug design is a highly challenging task due to a combinatorial explosion of possible modifications of molecules. In this work, we address this problem with transformer models, a type of machine learning (ML) model originally developed for machine translation. By training transformer models on pairs of similar bioactive molecules from the public ChEMBL data set, we enable them to learn medicinal-chemistry-meaningful, context-dependent transformations of molecules, including those absent from the training set. By retrospective analysis on the performance of transformer models on ChEMBL subsets of ligands binding to COX2, DRD2, or HERG protein targets, we demonstrate that the models can generate structures identical or highly similar to most active ligands, despite the models having not seen any ligands active against the corresponding protein target during training. Our work demonstrates that human experts working on hit expansion in drug design can easily and quickly employ transformer models, originally developed to translate texts from one natural language to another, to "translate" from known molecules active against a given protein target to novel molecules active against the same target.
Affiliation(s)
- Emma P Tysinger
  - Machine Learning and Computational Sciences, Pfizer Worldwide Research, Development, and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
- Brajesh K Rai
  - Machine Learning and Computational Sciences, Pfizer Worldwide Research, Development, and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
- Anton V Sinitskiy
  - Machine Learning and Computational Sciences, Pfizer Worldwide Research, Development, and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
10
Urbina F, Lowden CT, Culberson JC, Ekins S. MegaSyn: Integrating Generative Molecular Design, Automated Analog Designer, and Synthetic Viability Prediction. ACS Omega 2022; 7:18699-18713. [PMID: 35694522 PMCID: PMC9178760 DOI: 10.1021/acsomega.2c01404] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/08/2022] [Accepted: 05/11/2022] [Indexed: 05/04/2023]
Abstract
Generative machine learning models have become widely adopted in drug discovery and other fields to produce new molecules and explore molecular space, with the goal of discovering novel compounds with optimized properties. These generative models are frequently combined with transfer learning or scoring of the physicochemical properties to steer generative design, yet often, they are not capable of addressing a wide variety of potential problems, as well as converge into similar molecular space when combined with a scoring function for the desired properties. In addition, these generated compounds may not be synthetically feasible, reducing their capabilities and limiting their usefulness in real-world scenarios. Here, we introduce a suite of automated tools called MegaSyn representing three components: a new hill-climb algorithm, which makes use of SMILES-based recurrent neural network (RNN) generative models, analog generation software, and retrosynthetic analysis coupled with fragment analysis to score molecules for their synthetic feasibility. We show that by deconstructing the targeted molecules and focusing on substructures, combined with an ensemble of generative models, MegaSyn generally performs well for the specific tasks of generating new scaffolds as well as targeted analogs, which are likely synthesizable and druglike. We now describe the development, benchmarking, and testing of this suite of tools and propose how they might be used to optimize molecules or prioritize promising lead compounds using these RNN examples provided by multiple test case examples.
Affiliation(s)
- Fabio Urbina
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Christopher T. Lowden
  - Workflow Informatics Corporation, 9316 Bramden Court, Wake Forest, North Carolina 27587, United States
- J. Christopher Culberson
  - Workflow Informatics Corporation, 9316 Bramden Court, Wake Forest, North Carolina 27587, United States
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
11
From traditional to data-driven medicinal chemistry: a case study. Drug Discov Today 2022; 27:2065-2070. [PMID: 35452790 DOI: 10.1016/j.drudis.2022.04.017] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/09/2022] [Revised: 04/08/2022] [Accepted: 04/13/2022] [Indexed: 12/20/2022]
Abstract
Artificial intelligence (AI) and data science are beginning to impact drug discovery. It usually takes considerable time and effort until new scientific concepts or technologies make a transition from conceptual stages to practical applicability and until experience values are gathered. Especially for computational approaches, demonstrating measurable impact on drug discovery projects is not a trivial task. A pilot study at Daiichi Sankyo Company has attempted to integrate data-driven approaches into practical medicinal chemistry and quantify the impact, as reported herein. Although the organization and focal points of early-phase drug discovery naturally vary at different pharmaceutical companies, the results of this pilot study indicate the significant potential of data-driven medicinal chemistry and suggest new models for internal training of next-generation medicinal chemists. Keywords: medicinal chemistry; drug discovery; chemoinformatics; data science; data-driven R&D.