Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Winter R, Montanari F, Noé F, Clevert DA. Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem Sci 2019;10:1692-1701. [PMID: 30842833 PMCID: PMC6368215 DOI: 10.1039/c8sc04175j] [Citation(s) in RCA: 233] [Impact Index Per Article: 46.6] [Received: 09/19/2018] [Accepted: 11/17/2018] [Indexed: 12/23/2022] Open

Cited by Other Article(s)

Ferraz-Caetano J, Teixeira F, Cordeiro MNDS. Data-driven, explainable machine learning model for predicting volatile organic compounds' standard vaporization enthalpy. Chemosphere 2024;359:142257. [PMID: 38719116 DOI: 10.1016/j.chemosphere.2024.142257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 04/18/2024] [Accepted: 05/04/2024] [Indexed: 05/21/2024]

Duan Y, Yang X, Zeng X, Wang W, Deng Y, Cao D. Enhancing Molecular Property Prediction through Task-Oriented Transfer Learning: Integrating Universal Structural Insights and Domain-Specific Knowledge. J Med Chem 2024;67:9575-9586. [PMID: 38748846 DOI: 10.1021/acs.jmedchem.4c00692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2024]

Xia X, Liu Y, Zheng C, Zhang X, Wu Q, Gao X, Zeng X, Su Y. Evolutionary Multiobjective Molecule Optimization in an Implicit Chemical Space. J Chem Inf Model 2024. [PMID: 38870455 DOI: 10.1021/acs.jcim.4c00031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2024]

Aksamit N, Hou J, Li Y, Ombuki-Berman B. Integrating transformers and many-objective optimization for drug design. BMC Bioinformatics 2024;25:208. [PMID: 38849719 PMCID: PMC11161990 DOI: 10.1186/s12859-024-05822-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Accepted: 05/30/2024] [Indexed: 06/09/2024] Open

Tran TTV, Tayara H, Chong KT. AMPred-CNN: Ames mutagenicity prediction model based on convolutional neural networks. Comput Biol Med 2024;176:108560. [PMID: 38754218 DOI: 10.1016/j.compbiomed.2024.108560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 04/15/2024] [Accepted: 05/05/2024] [Indexed: 05/18/2024]

Zhang VY, O'Connor SL, Welsh WJ, James MH. Machine learning models to predict ligand binding affinity for the orexin 1 receptor. Artificial Intelligence Chemistry 2024;2:100040. [PMID: 38476266 PMCID: PMC10927255 DOI: 10.1016/j.aichem.2023.100040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2024]

Das M, Ghosh A, Sunoj RB. Advances in machine learning with chemical language models in molecular property and reaction outcome predictions. J Comput Chem 2024;45:1160-1176. [PMID: 38299229 DOI: 10.1002/jcc.27315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 01/06/2024] [Accepted: 01/09/2024] [Indexed: 02/02/2024]

Abstract

Molecular properties and reactions form the foundation of chemical space. Over the years, innumerable molecules have been synthesized; a smaller fraction of them found immediate applications, while a larger proportion served as a testimony to the creative and empirical nature of the domain of chemical science. With increasing emphasis on sustainable practices, it is desirable that a target set of molecules is synthesized, preferably through fewer empirical attempts rather than a larger library, to realize an active candidate. On this front, predictive endeavors using machine learning (ML) models built on available data acquire timely significance. Prediction of molecular properties and reaction outcomes remains one of the burgeoning applications of ML in chemical science. Among several methods of encoding molecular samples for ML models, the ones that employ language-like representations are gaining steady popularity. Such representations would additionally help adopt well-developed natural language processing (NLP) models for chemical applications. Given this advantageous background, herein we describe several successful chemical applications of NLP focusing on molecular property and reaction outcome predictions. From relatively simpler recurrent neural networks (RNNs) to complex models like transformers, different network architectures have been leveraged for tasks such as de novo drug design, catalyst generation, and forward and retro-synthesis predictions. The chemical language model (CLM) provides promising avenues toward a broad range of applications in a time- and cost-effective manner. While we showcase an optimistic outlook of CLMs, attention is also placed on the persisting challenges in the reaction domain, which would optimistically be addressed by advanced algorithms tailored to chemical language and by the increased availability of high-quality datasets.

Du W, Zhao L, Wu R, Huang B, Liu S, Liu Y, Huang H, Shi G. Predicting drug-Protein interaction with deep learning framework for molecular graphs and sequences: Potential candidates against SAR-CoV-2. PLoS One 2024;19:e0299696. [PMID: 38728335 PMCID: PMC11086825 DOI: 10.1371/journal.pone.0299696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 02/14/2024] [Indexed: 05/12/2024] Open

Qiu X, Wang H, Tan X, Fang Z. G-K BertDTA: A graph representation learning and semantic embedding-based framework for drug-target affinity prediction. Comput Biol Med 2024;173:108376. [PMID: 38552281 DOI: 10.1016/j.compbiomed.2024.108376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 03/21/2024] [Accepted: 03/24/2024] [Indexed: 04/17/2024]

Abstract

Developing new drugs is costly, time-consuming, and risky. Drug-target affinity (DTA), indicating the binding capability between drugs and target proteins, is a crucial indicator for drug development. Accurately predicting interaction strength between new drug-target pairs by analyzing previous experiments aids in screening potential drug molecules, repurposing them, and developing safe and effective medicines. Existing computational models for DTA prediction rely on strings or single-graph neural networks, lacking consideration of protein structure and molecular semantic information, leading to limited accuracy. Our experiments demonstrate that string-based methods may overlook protein conformations, causing a high root mean square error (RMSE) of 3.584 in affinity due to a lack of spatial context. Single-graph networks also underperform on topology features, with a 6% lower confidence interval (CI) for activity classification. Absent semantic information also limits generalization across diverse compounds, resulting in an 18% increment in RMSE and 5% in misclassifications within the quantification study, restricting potential drug discovery. To address these limitations, we propose G-K BertDTA, a novel framework for accurate DTA prediction incorporating protein features, molecular semantic features, and molecular structural information. In this proposed model, we represent drugs as graphs, with a GIN employed to learn the molecular topological information. For the extraction of protein structural features, we utilize a DenseNet architecture. A knowledge-based BERT semantic model is incorporated to obtain rich pre-trained semantic embeddings, thereby enhancing the feature information. We extensively evaluated our proposed approach on the publicly available benchmark datasets (i.e., KIBA and Davis), and experimental results demonstrate the promising performance of our method, which consistently outperforms previous state-of-the-art approaches. Code is available at https://github.com/AmbitYuki/G-K-BertDTA.

Schieferdecker S, Rottach F, Vock E. In Silico Prediction of Oral Acute Rodent Toxicity Using Consensus Machine Learning. J Chem Inf Model 2024;64:3114-3122. [PMID: 38498695 DOI: 10.1021/acs.jcim.4c00056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]

Lin J, He Y, Ru C, Long W, Li M, Wen Z. Advancing Adverse Drug Reaction Prediction with Deep Chemical Language Model for Drug Safety Evaluation. Int J Mol Sci 2024;25:4516. [PMID: 38674100 PMCID: PMC11050562 DOI: 10.3390/ijms25084516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 04/15/2024] [Accepted: 04/16/2024] [Indexed: 04/28/2024] Open

Heyndrickx W, Mervin L, Morawietz T, Sturm N, Friedrich L, Zalewski A, Pentina A, Humbeck L, Oldenhof M, Niwayama R, Schmidtke P, Fechner N, Simm J, Arany A, Drizard N, Jabal R, Afanasyeva A, Loeb R, Verma S, Harnqvist S, Holmes M, Pejo B, Telenczuk M, Holway N, Dieckmann A, Rieke N, Zumsande F, Clevert DA, Krug M, Luscombe C, Green D, Ertl P, Antal P, Marcus D, Do Huu N, Fuji H, Pickett S, Acs G, Boniface E, Beck B, Sun Y, Gohier A, Rippmann F, Engkvist O, Göller AH, Moreau Y, Galtier MN, Schuffenhauer A, Ceulemans H. MELLODDY: Cross-pharma Federated Learning at Unprecedented Scale Unlocks Benefits in QSAR without Compromising Proprietary Information. J Chem Inf Model 2024;64:2331-2344. [PMID: 37642660 PMCID: PMC11005050 DOI: 10.1021/acs.jcim.3c00799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Indexed: 08/31/2023]

Affiliation(s)

Wouter Heyndrickx Janssen Pharmaceutica NV, Turnhoutseweg 30, Beerse 2340, Belgium
Lewis Mervin AstraZeneca R&D, Biomedical Campus, 1 Francis Crick Ave, Cambridge CB2 0SL, U.K.
Tobias Morawietz Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
Noé Sturm Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
Lukas Friedrich Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
Adam Zalewski Amgen Research (Munich) GmbH, Staffelseestraße 2, Munich 81477, Germany
Anastasia Pentina Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
Lina Humbeck BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
Martijn Oldenhof KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
Ritsuya Niwayama Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
Peter Schmidtke Discngine, Avenue Ledru Rollin 79, Paris 75012, France
Nikolas Fechner Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
Jaak Simm KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
Adam Arany KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
Nicolas Drizard Iktos, 65 rue de Prony, Paris 75017, France
Rama Jabal Iktos, 65 rue de Prony, Paris 75017, France
Arina Afanasyeva Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
Regis Loeb KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
Shlok Verma GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
Simon Harnqvist GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
Matthew Holmes GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
Balazs Pejo Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
Maria Telenczuk Owkin, 12 Rue Martel, Paris 75010, France
Nicholas Holway Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
Arne Dieckmann Bayer AG, API Production, Product Supply, Pharmaceuticals, Ernst-Schering-Straße 14, Bergkamen 59192, Germany
Nicola Rieke NVIDIA GmbH, Floessergasse 2, Munich 81369, Germany
Friederike Zumsande Amgen Research (Munich) GmbH, Staffelseestraße 2, Munich 81477, Germany
Djork-Arné Clevert Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
Michael Krug Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
Christopher Luscombe GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
Darren Green GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
Peter Ertl Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
Peter Antal Budapest University of Technology and Economics, Department of Measurement and Information Systems, Műegyetem rkp. 3, Budapest 1111, Hungary
David Marcus GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
Nicolas Do Huu Iktos, 65 rue de Prony, Paris 75017, France
Hideyoshi Fuji Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
Stephen Pickett GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
Gergely Acs Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
Eric Boniface Substra Foundation - Labelia Labs, 4 rue Voltaire, Nantes 44000, France
Bernd Beck BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
Yax Sun Amgen Research, 1 Amgen Center Drive, Thousand Oaks, California 92130, United States
Arnaud Gohier Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
Friedrich Rippmann Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
Ola Engkvist AstraZeneca, Molecular AI, Discovery Sciences, R&D, Pepparedsleden 1, Mölndal 431 50, Sweden
Andreas H. Göller Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
Yves Moreau KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
Mathieu N. Galtier Owkin, 4 Rue Voltaire, Nantes 44000, France
Ansgar Schuffenhauer Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
Hugo Ceulemans Janssen Pharmaceutica NV, Turnhoutseweg 30, Beerse 2340, Belgium

Matsukiyo Y, Yamanaka C, Yamanishi Y. De Novo Generation of Chemical Structures of Inhibitor and Activator Candidates for Therapeutic Target Proteins by a Transformer-Based Variational Autoencoder and Bayesian Optimization. J Chem Inf Model 2024;64:2345-2355. [PMID: 37768595 DOI: 10.1021/acs.jcim.3c00824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/29/2023]

Ferraz-Caetano J, Teixeira F, Cordeiro MNDS. Explainable Supervised Machine Learning Model To Predict Solvation Gibbs Energy. J Chem Inf Model 2024;64:2250-2262. [PMID: 37603608 PMCID: PMC11005042 DOI: 10.1021/acs.jcim.3c00544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Indexed: 08/23/2023]

Svensson E, Hoedt PJ, Hochreiter S, Klambauer G. HyperPCM: Robust Task-Conditioned Modeling of Drug-Target Interactions. J Chem Inf Model 2024;64:2539-2553. [PMID: 38185877 PMCID: PMC11005051 DOI: 10.1021/acs.jcim.3c01417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 01/09/2024]

Pang C, Qiao J, Zeng X, Zou Q, Wei L. Deep Generative Models in De Novo Drug Molecule Generation. J Chem Inf Model 2024;64:2174-2194. [PMID: 37934070 DOI: 10.1021/acs.jcim.3c01496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]

Yi J, Shi S, Fu L, Yang Z, Nie P, Lu A, Wu C, Deng Y, Hsieh C, Zeng X, Hou T, Cao D. OptADMET: a web-based tool for substructure modifications to improve ADMET properties of lead compounds. Nat Protoc 2024;19:1105-1121. [PMID: 38263521 DOI: 10.1038/s41596-023-00942-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Accepted: 10/27/2023] [Indexed: 01/25/2024]

Chen L, Jiang J, Dou B, Feng H, Liu J, Zhu Y, Zhang B, Zhou T, Wei GW. Machine learning study of the extended drug-target interaction network informed by pain related voltage-gated sodium channels. Pain 2024;165:908-921. [PMID: 37851391 PMCID: PMC11021136 DOI: 10.1097/j.pain.0000000000003089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 09/09/2023] [Indexed: 10/19/2023]

Wang C, Ong HH, Chiba S, Rajapakse JC. GLDM: hit molecule generation with constrained graph latent diffusion model. Brief Bioinform 2024;25:bbae142. [PMID: 38581415 PMCID: PMC10998532 DOI: 10.1093/bib/bbae142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Revised: 03/08/2024] [Accepted: 03/03/2024] [Indexed: 04/08/2024] Open

Chang J, Ye JC. Bidirectional generation of structure and properties through a single molecular foundation model. Nat Commun 2024;15:2323. [PMID: 38485914 PMCID: PMC10940637 DOI: 10.1038/s41467-024-46440-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2023] [Accepted: 02/27/2024] [Indexed: 03/18/2024] Open

Hunklinger A, Hartog P, Šícho M, Godin G, Tetko IV. The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS joint compound solubility challenge. SLAS Discovery: Advancing Life Sciences R & D 2024;29:100144. [PMID: 38316342 DOI: 10.1016/j.slasd.2024.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 01/06/2024] [Accepted: 01/22/2024] [Indexed: 02/07/2024]

Kaufman B, Williams EC, Underkoffler C, Pederson R, Mardirossian N, Watson I, Parkhill J. COATI: Multimodal Contrastive Pretraining for Representing and Traversing Chemical Space. J Chem Inf Model 2024;64:1145-1157. [PMID: 38316665 DOI: 10.1021/acs.jcim.3c01753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]

Abstract

Creating a successful small molecule drug is a challenging multiparameter optimization problem in an effectively infinite space of possible molecules. Generative models have emerged as powerful tools for traversing data manifolds composed of images, sounds, and text and offer an opportunity to dramatically improve the drug discovery and design process. To create generative optimization methods that are more useful than brute-force molecular generation and filtering via virtual screening, we propose that four integrated features are necessary: large, quantitative data sets of molecular structure and activity, an invertible vector representation of realistic accessible molecules, smooth and differentiable regressors that quantify uncertainty, and algorithms to simultaneously optimize properties of interest. Over the course of 12 months, Terray Therapeutics has collected a data set of 2 billion quantitative binding measurements of small molecules to therapeutic targets, which directly motivates multiparameter generative optimization of molecules conditioned on these data. To this end, we present contrastive optimization for accelerated therapeutic inference (COATI), a pretrained, multimodal encoder-decoder model of druglike chemical space. COATI is constructed without any human biasing of features, using contrastive learning from text and 3D representations of molecules to allow for downstream use with structural models. We demonstrate that COATI possesses many of the desired properties of universal molecular embedding: fixed-dimension, invertibility, autoencoding, accurate regression, and low computation cost. Finally, we present a novel metadynamics algorithm for generative optimization using a small subset of our proprietary data collected for a model protein, carbonic anhydrase, designing molecules that satisfy the multiparameter optimization task of potency, solubility, and drug likeness. This work sets the stage for fully integrated generative molecular design and optimization for small molecules.

Yoshikai Y, Mizuno T, Nemoto S, Kusuhara H. Difficulty in chirality recognition for Transformer architectures learning chemical structures from string representations. Nat Commun 2024;15:1197. [PMID: 38365821 PMCID: PMC10873378 DOI: 10.1038/s41467-024-45102-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 01/11/2024] [Indexed: 02/18/2024] Open

Johnson TA, Abrahamsson DP. Quantification of chemicals in non-targeted analysis without analytical standards - Understanding the mechanism of electrospray ionization and making predictions. Current Opinion in Environmental Science & Health 2024;37:100529. [PMID: 38312491 PMCID: PMC10836048 DOI: 10.1016/j.coesh.2023.100529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]

Führer F, Gruber A, Diedam H, Göller AH, Menz S, Schneckener S. A deep neural network: mechanistic hybrid model to predict pharmacokinetics in rat. J Comput Aided Mol Des 2024;38:7. [PMID: 38294570 DOI: 10.1007/s10822-023-00547-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Accepted: 12/21/2023] [Indexed: 02/01/2024]

Du H, Wei GW, Hou T. Multiscale topology in interactomic network: from transcriptome to antiaddiction drug repurposing. Brief Bioinform 2024;25:bbae054. [PMID: 38499497 PMCID: PMC10948341 DOI: 10.1093/bib/bbae054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 01/05/2024] [Accepted: 01/25/2024] [Indexed: 03/20/2024] Open

Yi JC, Yang ZY, Zhao WT, Yang ZJ, Zhang XC, Wu CK, Lu AP, Cao DS. ChemMORT: an automatic ADMET optimization platform using deep learning and multi-objective particle swarm optimization. Brief Bioinform 2024;25:bbae008. [PMID: 38385872 PMCID: PMC10883642 DOI: 10.1093/bib/bbae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/17/2023] [Accepted: 01/02/2024] [Indexed: 02/23/2024] Open

Viganò EL, Ballabio D, Roncaglioni A. Artificial Intelligence and Machine Learning Methods to Evaluate Cardiotoxicity following the Adverse Outcome Pathway Frameworks. Toxics 2024;12:87. [PMID: 38276722 PMCID: PMC10820364 DOI: 10.3390/toxics12010087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 01/15/2024] [Accepted: 01/17/2024] [Indexed: 01/27/2024]

Abstract

Cardiovascular disease is a leading global cause of mortality. The potential cardiotoxic effects of chemicals from different classes, such as environmental contaminants, pesticides, and drugs can significantly contribute to effects on health. The same chemical can induce cardiotoxicity in different ways, following various Adverse Outcome Pathways (AOPs). In addition, the potential synergistic effects between chemicals further complicate the issue. In silico methods have become essential for tackling the problem from different perspectives, reducing the need for traditional in vivo testing, and saving valuable resources in terms of time and money. Artificial intelligence (AI) and machine learning (ML) are among today's advanced approaches for evaluating chemical hazards. They can serve, for instance, as a first-tier component of Integrated Approaches to Testing and Assessment (IATA). This study employed ML and AI to assess interactions between chemicals and specific biological targets within the AOP networks for cardiotoxicity, starting with molecular initiating events (MIEs) and progressing through key events (KEs). We explored methods to encode chemical information in a suitable way for ML and AI. We started with commonly used approaches in Quantitative Structure-Activity Relationship (QSAR) methods, such as molecular descriptors and different types of fingerprint. We then increased the complexity of encoders, incorporating graph-based methods, auto-encoders, and character embeddings employed in neural language processing. We also developed a multimodal neural network architecture, capable of considering the complementary nature of different chemical representations simultaneously. The potential of this approach, compared to more conventional architectures designed to handle a single encoder, becomes apparent when the amount of data increases.

Colliandre L, Muller C. Bayesian Optimization in Drug Discovery. Methods Mol Biol 2024;2716:101-136. [PMID: 37702937 DOI: 10.1007/978-1-0716-3449-3_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]

Olmedo DA, Durant-Archibold AA, López-Pérez JL, Medina-Franco JL. Design and Diversity Analysis of Chemical Libraries in Drug Discovery. Comb Chem High Throughput Screen 2024;27:502-515. [PMID: 37409545 DOI: 10.2174/1386207326666230705150110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 05/30/2023] [Accepted: 05/30/2023] [Indexed: 07/07/2023]

Schimunek J, Seidl P, Elez K, Hempel T, Le T, Noé F, Olsson S, Raich L, Winter R, Gokcan H, Gusev F, Gutkin EM, Isayev O, Kurnikova MG, Narangoda CH, Zubatyuk R, Bosko IP, Furs KV, Karpenko AD, Kornoushenko YV, Shuldau M, Yushkevich A, Benabderrahmane MB, Bousquet-Melou P, Bureau R, Charton B, Cirou BC, Gil G, Allen WJ, Sirimulla S, Watowich S, Antonopoulos N, Epitropakis N, Krasoulis A, Itsikalis V, Theodorakis S, Kozlovskii I, Maliutin A, Medvedev A, Popov P, Zaretckii M, Eghbal-Zadeh H, Halmich C, Hochreiter S, Mayr A, Ruch P, Widrich M, Berenger F, Kumar A, Yamanishi Y, Zhang KYJ, Bengio E, Bengio Y, Jain MJ, Korablyov M, Liu CH, Marcou G, Glaab E, Barnsley K, Iyengar SM, Ondrechen MJ, Haupt VJ, Kaiser F, Schroeder M, Pugliese L, Albani S, Athanasiou C, Beccari A, Carloni P, D'Arrigo G, Gianquinto E, Goßen J, Hanke A, Joseph BP, Kokh DB, Kovachka S, Manelfi C, Mukherjee G, Muñiz-Chicharro A, Musiani F, Nunes-Alves A, Paiardi G, Rossetti G, Sadiq SK, Spyrakis F, Talarico C, Tsengenes A, Wade RC, Copeland C, Gaiser J, Olson DR, Roy A, Venkatraman V, Wheeler TJ, Arthanari H, Blaschitz K, Cespugli M, Durmaz V, Fackeldey K, Fischer PD, Gorgulla C, Gruber C, Gruber K, Hetmann M, Kinney JE, Padmanabha Das KM, Pandita S, Singh A, Steinkellner G, Tesseyre G, Wagner G, Wang ZF, Yust RJ, Druzhilovskiy DS, Filimonov DA, Pogodin PV, Poroikov V, Rudik AV, Stolbov LA, Veselovsky AV, De Rosa M, De Simone G, Gulotta MR, Lombino J, Mekni N, Perricone U, Casini A, Embree A, Gordon DB, Lei D, Pratt K, Voigt CA, Chen KY, Jacob Y, Krischuns T, Lafaye P, Zettor A, Rodríguez ML, White KM, Fearon D, Von Delft F, Walsh MA, Horvath D, Brooks CL, Falsafi B, Ford B, García-Sastre A, Yup Lee S, Naffakh N, Varnek A, Klambauer G, Hermans TM. A community effort in SARS-CoV-2 drug discovery. Mol Inform 2024;43:e202300262. [PMID: 37833243 DOI: 10.1002/minf.202300262] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/29/2023] [Revised: 10/13/2023] [Accepted: 10/13/2023] [Indexed: 10/15/2023]

Pérez-Correa I, Giunta PD, Mariño FJ, Francesconi JA. Transformer-Based Representation of Organic Molecules for Potential Modeling of Physicochemical Properties. J Chem Inf Model 2023;63:7676-7688. [PMID: 38062559 DOI: 10.1021/acs.jcim.3c01548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2023]

Wang F, Pasin D, Skinnider MA, Liigand J, Kleis JN, Brown D, Oler E, Sajed T, Gautam V, Harrison S, Greiner R, Foster LJ, Dalsgaard PW, Wishart DS. Deep Learning-Enabled MS/MS Spectrum Prediction Facilitates Automated Identification Of Novel Psychoactive Substances. Anal Chem 2023;95:18326-18334. [PMID: 38048435 PMCID: PMC10733899 DOI: 10.1021/acs.analchem.3c02413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 11/10/2023] [Accepted: 11/13/2023] [Indexed: 12/06/2023]

Affiliation(s)

Fei Wang Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada Alberta Machine Intelligence Institute, Edmonton, Alberta T5J 3B1, Canada
Daniel Pasin Section of Forensic Chemistry, Department of Forensic Medicine, University of Copenhagen, Copenhagen 2100, Denmark
Michael A. Skinnider Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, United States Ludwig Institute for Cancer Research, Princeton University, Princeton, New Jersey 08544, United States
Jaanus Liigand Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada Institute of Chemistry, University of Tartu, Tartu 50411, Estonia
Jan-Niklas Kleis Institute of Forensic Medicine, Forensic Toxicology, Johannes Gutenberg University Mainz, Mainz 55131, Germany
David Brown Forensic Science Laboratory, ChemCentre, Bentley, Western Australia 6102, Australia School of Molecular and Life Sciences, Curtin University, Bentley, Western Australia 6009, Australia
Eponine Oler Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
Tanvir Sajed Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
Vasuk Gautam Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
Stephen Harrison Forensic Science Laboratory, ChemCentre, Bentley, Western Australia 6102, Australia
Russell Greiner Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada Alberta Machine Intelligence Institute, Edmonton, Alberta T5J 3B1, Canada
Leonard J. Foster Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, British Columbia V6T 2A1, Canada
Petur Weihe Dalsgaard Section of Forensic Chemistry, Department of Forensic Medicine, University of Copenhagen, Copenhagen 2100, Denmark
David S. Wishart Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, Alberta T6G 1C9, Canada Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta T6G 2C8, Canada Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States

Wang R, Feng H, Wei GW. ChatGPT in Drug Discovery: A Case Study on Anticocaine Addiction Drug Development with Chatbots. J Chem Inf Model 2023;63:7189-7209. [PMID: 37956228 PMCID: PMC11021135 DOI: 10.1021/acs.jcim.3c01429] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/15/2023]

Wang Z, Feng Z, Li Y, Li B, Wang Y, Sha C, He M, Li X. BatmanNet: bi-branch masked graph transformer autoencoder for molecular representation. Brief Bioinform 2023;25:bbad400. [PMID: 38033291 PMCID: PMC10783874 DOI: 10.1093/bib/bbad400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/02/2023] [Accepted: 10/17/2023] [Indexed: 12/02/2023] Open

Shen C, Luo J, Xia K. Molecular geometric deep learning. CELL REPORTS METHODS 2023;3:100621. [PMID: 37875121 PMCID: PMC10694498 DOI: 10.1016/j.crmeth.2023.100621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 06/16/2023] [Accepted: 09/28/2023] [Indexed: 10/26/2023]

Yu L, He X, Fang X, Liu L, Liu J. Deep Learning with Geometry-Enhanced Molecular Representation for Augmentation of Large-Scale Docking-Based Virtual Screening. J Chem Inf Model 2023;63:6501-6514. [PMID: 37882338 DOI: 10.1021/acs.jcim.3c01371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2023]

Ilnicka A, Schneider G. Designing molecules with autoencoder networks. NATURE COMPUTATIONAL SCIENCE 2023;3:922-933. [PMID: 38177601 DOI: 10.1038/s43588-023-00548-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 10/03/2023] [Indexed: 01/06/2024]

Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023;22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Accepted: 07/20/2023] [Indexed: 09/13/2023]

Affiliation(s)

Michael W Mullowney Duchossois Family Institute, The University of Chicago, Chicago, IL, USA
Katherine R Duncan Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
Somayah S Elsayed Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
Neha Garg School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
Justin J J van der Hooft Bioinformatics Group, Wageningen University, Wageningen, The Netherlands Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
Nathaniel I Martin Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
David Meijer Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
Barbara R Terlouw Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
Friederike Biermann Bioinformatics Group, Wageningen University, Wageningen, The Netherlands Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
Kai Blin The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
Janani Durairaj Biozentrum, University of Basel, Basel, Switzerland
Marina Gorostiola González Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands ONCODE institute, Leiden, The Netherlands
Eric J N Helfrich Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
Florian Huber Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
Stefan Leopold-Messer Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
Kohulan Rajan Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
Tristan de Rond School of Chemical Sciences, University of Auckland, Auckland, New Zealand
Jeffrey A van Santen Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
Maria Sorokina Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany Pharmaceuticals R&D, Bayer AG, Berlin, Germany
Marcy J Balunas Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
Mehdi A Beniddir Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
Doris A van Bergeijk Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
Laura M Carroll Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
Chase M Clark Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
Djork-Arné Clevert WRDM - Machine Learning Research, Pfizer, Berlin, Germany
Chris A Dejong Adapsyn Bioscience, Hamilton, Ontario, Canada
Chao Du Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
Scarlet Ferrinho Chemistry Department, University of St Andrews, St Andrews, UK
Francesca Grisoni Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
Albert Hofstetter Laboratory of Physical Chemistry, ETH Zürich, Zürich, Switzerland
Willem Jespers Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
Olga V Kalinina Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany Center for Bioinformatics, Saarland University, Saarbrücken, Germany
Satria A Kautsar Department of Chemistry, Scripps Research, FL, USA
Hyunwoo Kim College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
Tiago F Leao Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
Joleen Masschelein Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium Department of Biology, KU Leuven, Heverlee, Belgium
Evan R Rees Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
Raphael Reher Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
Daniel Reker Department of Biomedical Engineering, Duke University, Durham, NC, USA Duke Microbiome Center, Duke University, Durham, NC, USA
Philippe Schwaller Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Marwin Segler Microsoft Research, Cambridge, UK
Michael A Skinnider Adapsyn Bioscience, Hamilton, Ontario, Canada Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
Allison S Walker Department of Chemistry, Vanderbilt University, Nashville, TN, USA Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
Egon L Willighagen Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
Barbara Zdrazil European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
Nadine Ziemert Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
Rebecca J M Goss Chemistry Department, University of St Andrews, St Andrews, UK
Pierre Guyomard Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
Andrea Volkamer Center for Bioinformatics, Saarland University, Saarbrücken, Germany In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
William H Gerwick Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
Hyun Uk Kim Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Rolf Müller Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany Department of Pharmacy, Saarland University, Saarbrücken, Germany German Center for infection research (DZIF), Braunschweig, Germany Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
Gilles P van Wezel Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
Gerard J P van Westen Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
Anna K H Hirsch Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany Department of Pharmacy, Saarland University, Saarbrücken, Germany German Center for infection research (DZIF), Braunschweig, Germany Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
Roger G Linington Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
Serina L Robinson Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland
Marnix H Medema Bioinformatics Group, Wageningen University, Wageningen, The Netherlands Institute of Biology, Leiden University, Leiden, The Netherlands

Wang R, Feng H, Wei GW. ChatGPT in Drug Discovery: A Case Study on Anti-Cocaine Addiction Drug Development with Chatbots. ARXIV 2023:arXiv:2308.06920v2. [PMID: 37645039 PMCID: PMC10462169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]

Dollar O, Joshi N, Pfaendtner J, Beck DAC. Efficient 3D Molecular Design with an E(3) Invariant Transformer VAE. J Phys Chem A 2023;127:7844-7852. [PMID: 37670244 DOI: 10.1021/acs.jpca.3c04188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2023]

Feng H, Wang R, Zhan CG, Wei GW. Multiobjective Molecular Optimization for Opioid Use Disorder Treatment Using Generative Network Complex. J Med Chem 2023;66:12479-12498. [PMID: 37623046 PMCID: PMC11037444 DOI: 10.1021/acs.jmedchem.3c01053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/26/2023]

Cremer J, Medrano Sandonas L, Tkatchenko A, Clevert DA, De Fabritiis G. Equivariant Graph Neural Networks for Toxicity Prediction. Chem Res Toxicol 2023;36. [PMID: 37690056 PMCID: PMC10583285 DOI: 10.1021/acs.chemrestox.3c00032] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/03/2023] [Indexed: 09/12/2023]

Abstract

Predictive modeling of toxicity is a crucial step in the drug discovery pipeline. It can help filter out molecules with a high probability of failing in the early stages of de novo drug design. Thus, several machine learning (ML) models have been developed to predict the toxicity of molecules by combining classical ML techniques or deep neural networks with well-known molecular representations such as fingerprints or 2D graphs. But the more natural, accurate representation of molecules is expected to be defined in physical 3D space, as in ab initio methods. Recent studies successfully used equivariant graph neural networks (EGNNs) for representation learning based on 3D structures to predict quantum-mechanical properties of molecules. Inspired by this, we investigated the performance of EGNNs to construct reliable ML models for toxicity prediction. We used the equivariant transformer (ET) model in TorchMD-NET for this. Eleven toxicity data sets taken from MoleculeNet, TDCommons, and ToxBenchmark have been considered to evaluate the capability of ET for toxicity prediction. Our results show that ET adequately learns 3D representations of molecules that can successfully correlate with toxicity activity, achieving good accuracies on most data sets comparable to state-of-the-art models. We also test a physicochemical property, namely, the total energy of a molecule, to inform the toxicity prediction with a physical prior. However, our work suggests that these two properties cannot be related. We also provide an attention weight analysis for helping to understand the toxicity prediction in 3D space and thus increase the explainability of the ML model. In summary, our findings offer promising insights considering 3D geometry information via EGNNs and provide a straightforward way to integrate molecular conformers into ML-based pipelines for predicting and investigating toxicity in physical space. We expect that in the future, especially for larger, more diverse data sets, EGNNs will be an essential tool in this domain.

Zhang Y, Menke J, He J, Nittinger E, Tyrchan C, Koch O, Zhao H. Similarity-based pairing improves efficiency of siamese neural networks for regression tasks and uncertainty quantification. J Cheminform 2023;15:75. [PMID: 37649050 PMCID: PMC10469421 DOI: 10.1186/s13321-023-00744-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 08/10/2023] [Indexed: 09/01/2023] Open

Boldini D, Grisoni F, Kuhn D, Friedrich L, Sieber SA. Practical guidelines for the use of gradient boosting for molecular property prediction. J Cheminform 2023;15:73. [PMID: 37641120 PMCID: PMC10464382 DOI: 10.1186/s13321-023-00743-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 08/09/2023] [Indexed: 08/31/2023] Open

Lanini J, Santarossa G, Sirockin F, Lewis R, Fechner N, Misztela H, Lewis S, Maziarz K, Stanley M, Segler M, Stiefl N, Schneider N. PREFER: A New Predictive Modeling Framework for Molecular Discovery. J Chem Inf Model 2023;63:4497-4504. [PMID: 37487018 DOI: 10.1021/acs.jcim.3c00523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/26/2023]

Gu Y, Li J, Kang H, Zhang B, Zheng S. Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning. Molecules 2023;28:5982. [PMID: 37630234 PMCID: PMC10459669 DOI: 10.3390/molecules28165982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Revised: 07/27/2023] [Accepted: 08/03/2023] [Indexed: 08/27/2023] Open

Abstract

Ligand-based virtual screening (LBVS) is a promising approach for rapid and low-cost screening of potentially bioactive molecules in the early stage of drug discovery. Compared with traditional similarity-based machine learning methods, deep learning frameworks for LBVS can more effectively extract high-order molecule structure representations from molecular fingerprints or structures. However, the 3D conformation of a molecule largely influences its bioactivity and physical properties, and has rarely been considered in previous deep learning-based LBVS methods. Moreover, the relative bioactivity benchmark dataset is still lacking. To address these issues, we introduce a novel end-to-end deep learning architecture trained from molecular conformers for LBVS. We first extracted molecule conformers from multiple public molecular bioactivity data and consolidated them into a large-scale bioactivity benchmark dataset, which in total includes millions of endpoints and molecules corresponding to 954 targets. Then, we devised a deep learning-based LBVS method called EquiVS to learn molecule representations from conformers for bioactivity prediction. Specifically, a graph convolutional network (GCN) and an equivariant graph neural network (EGNN) are sequentially stacked to learn high-order molecule-level and conformer-level representations, followed by attention-based deep multiple-instance learning (MIL) to aggregate these representations and then predict the potential bioactivity for the query molecule on a given target. We conducted various experiments to validate the data quality of our benchmark dataset, and confirmed EquiVS achieved better performance compared with 10 traditional machine learning or deep learning-based LBVS methods. Further ablation studies demonstrate the significant contribution of molecular conformation for bioactivity prediction, as well as the reasonability and non-redundancy of deep learning architecture in EquiVS. Finally, a model interpretation case study on CDK2 shows the potential of EquiVS in optimal conformer discovery. The overall study shows that our proposed benchmark dataset and EquiVS method have promising prospects in virtual screening applications.

Yamanaka C, Uki S, Kaitoh K, Iwata M, Yamanishi Y. De novo drug design based on patient gene expression profiles via deep learning. Mol Inform 2023;42:e2300064. [PMID: 37475603 DOI: 10.1002/minf.202300064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 05/25/2023] [Accepted: 07/20/2023] [Indexed: 07/22/2023]

Litsa EE, Chenthamarakshan V, Das P, Kavraki LE. An end-to-end deep learning framework for translating mass spectra to de-novo molecules. Commun Chem 2023;6:132. [PMID: 37353554 DOI: 10.1038/s42004-023-00932-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2021] [Accepted: 06/13/2023] [Indexed: 06/25/2023] Open

Dost K, Pullar-Strecker Z, Brydon L, Zhang K, Hafner J, Riddle PJ, Wicker JS. Combatting over-specialization bias in growing chemical databases. J Cheminform 2023;15:53. [PMID: 37208694 DOI: 10.1186/s13321-023-00716-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Accepted: 03/25/2023] [Indexed: 05/21/2023] Open

Abstract

BACKGROUND

Predicting in advance the behavior of new chemical compounds can support the design process of new products by directing the research toward the most promising candidates and ruling out others. Such predictive models can be data-driven using Machine Learning or based on researchers' experience and depend on the collection of past results. In either case: models (or researchers) can only make reliable assumptions about compounds that are similar to what they have seen before. Therefore, consequent usage of these predictive models shapes the dataset and causes a continuous specialization shrinking the applicability domain of all trained models on this dataset in the future, and increasingly harming model-based exploration of the space.

PROPOSED SOLUTION

In this paper, we propose CANCELS (CounterActiNg Compound spEciaLization biaS), a technique that helps to break the dataset specialization spiral. Aiming for a smooth distribution of the compounds in the dataset, we identify areas in the space that fall short and suggest additional experiments that help bridge the gap. Thereby, we generally improve the dataset quality in an entirely unsupervised manner and create awareness of potential flaws in the data. CANCELS does not aim to cover the entire compound space and hence retains a desirable degree of specialization to a specified research domain.

RESULTS

An extensive set of experiments on the use-case of biodegradation pathway prediction not only reveals that the bias spiral can indeed be observed but also that CANCELS produces meaningful results. Additionally, we demonstrate that mitigating the observed bias is crucial: not only can it intervene in the continuous specialization process, but it also significantly improves a predictor's performance while reducing the number of required experiments. Overall, we believe that CANCELS can support researchers in their experimentation process to not only better understand their data and potential flaws, but also to grow the dataset in a sustainable way. All code is available under github.com/KatDost/Cancels.
