1. Tan Z, Lin K, Zhao Y, Zhou T. Generative discovery of safer chemical alternatives using diffusion modeling: A case study in green solvent design for cyclohexane/benzene extractive distillation. J Environ Sci (China) 2025; 154:390-401. [PMID: 40049881 DOI: 10.1016/j.jes.2024.08.014] [Received: 06/05/2024] [Revised: 08/11/2024] [Accepted: 08/12/2024] [Indexed: 05/13/2025]
Abstract
Over the past century, advancements in chemistry have significantly propelled human innovation, enhancing both industrial and consumer products. However, this rapid progression has resulted in chemical pollution increasingly surpassing planetary boundaries, as production and release rates have outpaced our monitoring capabilities. To catalyze more impactful efforts, this study transitions from traditional chemical assessment to inverse chemical design, introducing a generative graph latent diffusion model aimed at discovering safer alternatives. In a case study on the design of green solvents for cyclohexane/benzene extractive distillation, we constructed a design database encompassing functional, environmental hazard, and process constraints. Virtual screening of the previous design dataset revealed distinct trade-off trends between these design requirements. Based on the screening outcomes, an unconstrained generative model was developed, which covered a broader chemical space and demonstrated superior capabilities for structural interpolation and extrapolation. To further optimize molecular generation towards desired properties, a multi-objective latent diffusion method was applied, yielding 19 candidate molecules. Of these, 7 were identified in PubChem as the most viable green solvent candidates, while the remaining 12 were identified as potential novel candidates. Overall, this study effectively designed green solvent candidates for safer and more sustainable industrial production, setting a promising precedent for the development of environmentally friendly alternatives in other areas of chemical research.
Affiliation(s)
- Zhichao Tan
  - The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China
- Kunsen Lin
  - The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China; Engineering Research Center of Polymer Green Recycling of Ministry of Education, College of Environmental Science and Engineering, Fujian Normal University, Fuzhou 350007, China
- Youcai Zhao
  - The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China; Tianfu Yongxing Laboratory, Chengdu 610000, China
- Tao Zhou
  - The State Key Laboratory of Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China

2. Piazza L, Srinivasan S, Tuccinardi T, Bajorath J. Transforming molecular cores, substituents, and combinations into structurally diverse compounds using chemical language models. Eur J Med Chem 2025; 291:117615. [PMID: 40222164 DOI: 10.1016/j.ejmech.2025.117615] [Received: 02/19/2025] [Revised: 03/19/2025] [Accepted: 04/07/2025] [Indexed: 04/15/2025]
Abstract
Transformer-based chemical language models (CLMs) were derived to generate structurally and topologically diverse embeddings of core structure fragments, substituents, or core/substituent combinations in chemically proper compounds, representing a design task that is difficult to address using conventional structure generation methods. To this end, CLM variants were challenged to learn different fragment-to-compound mappings in the absence of structural rules or any other fragment linking or synthetic information. The resulting alternative models were found to have high syntactic fidelity, but displayed notable differences in their ability to generate valid candidate compounds containing test fragments, with a clear preference for a model variant processing core/substituent combinations. However, the majority of valid candidate compounds generated with all models were distinct from training data and structurally novel. In addition, the CLMs exhibited high chemical diversification capacity and often generated structures with new topologies not encountered during training. Furthermore, all models produced large numbers of close structural analogues of known bioactive compounds covering a large target space, thus indicating the relevance of newly generated candidates for pharmaceutical research. As a part of our study, the new methodology and all data are made publicly available.
Affiliation(s)
- Lisa Piazza
  - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany; Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126, Pisa, Italy
- Sanjana Srinivasan
  - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany
- Tiziano Tuccinardi
  - Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126, Pisa, Italy
- Jürgen Bajorath
  - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany

3. Li T, Chen YT, Zhang XB, Du RR, Ma LN, Lan YQ. Asymmetric heterogeneous catalysis using crystalline porous materials. Chem Soc Rev 2025. [PMID: 40384435 DOI: 10.1039/d4cs00538d] [Indexed: 05/20/2025]
Abstract
Asymmetric catalysis has emerged as a pivotal strategy in the synthesis of chiral compounds, offering significant advantages in selectivity and efficiency. In recent years, heterogeneous catalysis has become a focal point in the fields of organic synthesis and materials science due to continuous advancements in science and technology, especially the use of crystalline porous materials (CPMs) as catalysts. This review summarizes recent advances in using CPMs, such as metal-organic frameworks (MOFs), covalent organic frameworks (COFs) and zeolites, as promising supports for asymmetric catalysts. These materials provide high surface areas, tunable porosity, and the ability to host active catalytic sites, which enhance reaction rates and selectivity. In this review, we summarize the stereostructural properties of chiral CPMs to guide the future design of asymmetric heterogeneous catalysts and the study of catalytic mechanisms. Moreover, we discuss various strategies for incorporating catalytic moieties into these frameworks, including direct synthesis, post-synthesis modification and induced synthesis methods. Additionally, we highlight recent examples where CPMs have been successfully applied in asymmetric transformations, examining their mechanistic insights and the role of substrate diffusion in achieving high enantioselectivity. This review concludes with a perspective on the challenges and future directions in this rapidly evolving field, emphasizing the need for further integration of advanced artificial intelligence techniques and design principles to optimize the synthesis and catalytic performance of chiral CPMs.
Affiliation(s)
- Teng Li
  - Guangdong Provincial Key Laboratory of Carbon Dioxide Resource Utilization, School of Chemistry, South China Normal University, Guangzhou, 510006, P. R. China
- Yan-Ting Chen
  - Guangdong Provincial Key Laboratory of Carbon Dioxide Resource Utilization, School of Chemistry, South China Normal University, Guangzhou, 510006, P. R. China
- Xiao-Bin Zhang
  - Guangdong Provincial Key Laboratory of Carbon Dioxide Resource Utilization, School of Chemistry, South China Normal University, Guangzhou, 510006, P. R. China
- Rong-Rong Du
  - Guangdong Provincial Key Laboratory of Carbon Dioxide Resource Utilization, School of Chemistry, South China Normal University, Guangzhou, 510006, P. R. China
- Lin-Na Ma
  - Guangdong Provincial Key Laboratory of Carbon Dioxide Resource Utilization, School of Chemistry, South China Normal University, Guangzhou, 510006, P. R. China
- Ya-Qian Lan
  - Guangdong Provincial Key Laboratory of Carbon Dioxide Resource Utilization, School of Chemistry, South China Normal University, Guangzhou, 510006, P. R. China

4. Seal S, Mahale M, García-Ortegón M, Joshi CK, Hosseini-Gerami L, Beatson A, Greenig M, Shekhar M, Patra A, Weis C, Mehrjou A, Badré A, Paisley B, Lowe R, Singh S, Shah F, Johannesson B, Williams D, Rouquie D, Clevert DA, Schwab P, Richmond N, Nicolaou CA, Gonzalez RJ, Naven R, Schramm C, Vidler LR, Mansouri K, Walters WP, Wilk DD, Spjuth O, Carpenter AE, Bender A. Machine Learning for Toxicity Prediction Using Chemical Structures: Pillars for Success in the Real World. Chem Res Toxicol 2025; 38:759-807. [PMID: 40314361 DOI: 10.1021/acs.chemrestox.5c00033] [Indexed: 05/03/2025]
Abstract
Machine learning (ML) is increasingly valuable for predicting molecular properties and toxicity in drug discovery. However, toxicity-related end points have always been challenging to evaluate experimentally with respect to in vivo translation due to the required resources for human and animal studies; this has impacted data availability in the field. ML can augment or even potentially replace traditional experimental processes depending on the project phase and specific goals of the prediction. For instance, models can be used to select promising compounds for on-target effects or to deselect those with undesirable characteristics (e.g., off-target or ineffective due to unfavorable pharmacokinetics). However, reliance on ML is not without risks, due to biases stemming from nonrepresentative training data, incompatible choice of algorithm to represent the underlying data, or poor model building and validation approaches. This might lead to inaccurate predictions, misinterpretation of the confidence in ML predictions, and ultimately suboptimal decision-making. Hence, understanding the predictive validity of ML models is of utmost importance to enable faster drug development timelines while improving the quality of decisions. This perspective emphasizes the need to enhance the understanding and application of machine learning models in drug discovery, focusing on well-defined data sets for toxicity prediction based on small molecule structures. We focus on five crucial pillars for success with ML-driven molecular property and toxicity prediction: (1) data set selection, (2) structural representations, (3) model algorithm, (4) model validation, and (5) translation of predictions to decision-making. Understanding these key pillars will foster collaboration and coordination between ML researchers and toxicologists, which will help to advance drug discovery and development.
Affiliation(s)
- Srijit Seal
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
- Manas Mahale
  - Department of Pharmaceutical Chemistry, Bombay College of Pharmacy, Mumbai 400098, India
- Chaitanya K Joshi
  - Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, U.K.
- Alex Beatson
  - Axiom Bio, San Francisco, California 94107, United States
- Matthew Greenig
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
- Mrinal Shekhar
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Adrien Badré
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Brianna Paisley
  - Eli Lilly & Company, Indianapolis, Indiana 46285, United States
- Shantanu Singh
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Falgun Shah
  - Non Clinical Drug Safety, Merck Inc., West Point, Pennsylvania 19486, United States
- David Rouquie
  - Toxicology Data Science, Bayer SAS Crop Science Division, Valbonne Sophia-Antipolis 06560, France
- Djork-Arné Clevert
  - Pfizer, Worldwide Research, Development and Medical, Machine Learning & Computational Sciences, Berlin 10922, Germany
- Christos A Nicolaou
  - Computational Drug Design, Digital Science & Innovation, Novo Nordisk US R&D, Lexington, Massachusetts 02421, United States
- Raymond J Gonzalez
  - Non Clinical Drug Safety, Merck Inc., West Point, Pennsylvania 19486, United States
- Russell Naven
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Kamel Mansouri
  - NIH/NIEHS/DTT/NICEATM, Research Triangle Park, North Carolina 27709, United States
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala 751 24, Sweden
  - Phenaros Pharmaceuticals AB, Uppsala 75239, Sweden
- Anne E Carpenter
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Andreas Bender
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
  - College of Medicine and Health Sciences, Khalifa University of Science and Technology, Abu Dhabi 127788, United Arab Emirates

5. Duke R, Yang CH, Ganapathysubramanian B, Risko C. Evaluating Molecular Similarity Measures: Do Similarity Measures Reflect Electronic Structure Properties? J Chem Inf Model 2025; 65:4311-4319. [PMID: 40299458 DOI: 10.1021/acs.jcim.5c00175] [Indexed: 04/30/2025]
Abstract
The rapid adoption of big data, machine learning (ML), and generative artificial intelligence (AI) in chemical discovery has heightened the importance of quantifying molecular similarity. Molecular similarity, commonly assessed as the distance between molecular fingerprints, is integral to applications such as database curation, diversity analysis, and property prediction. AI tools frequently rely on these similarity measures to cluster molecules under the assumption that structurally similar molecules exhibit similar properties. However, this assumption is not universally valid, particularly for continuous properties like electronic structure properties. Despite the prevalence of fingerprint-based similarity measures, their evaluation has largely depended on biological activity data sets and qualitative metrics, limiting their relevance for nonbiological domains. To address this gap, we propose a framework to evaluate the correlation between molecular similarity measures and molecular properties. Our approach builds on the concept of neighborhood behavior and incorporates kernel density estimation (KDE) analysis to quantify how well similarity measures capture property relationships. Using a data set of over 350 million molecule pairs with electronic structure, redox, and optical properties, we systematically evaluate the correlation between several molecular fingerprint generators, distance functions, and these properties. Both the curated data set and the evaluation framework are publicly available.
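As a rough illustration of the fingerprint-based similarity this abstract refers to, the sketch below computes a Tanimoto (Jaccard) similarity and the corresponding distance between two fingerprints, each represented as a set of on-bit indices. The abstract does not name the fingerprint generators or distance functions that were evaluated; Tanimoto is swapped in here only because it is a widely used choice, and the function names and example bit sets are hypothetical.

```python
def tanimoto_similarity(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    shared = len(fp_a & fp_b)   # bits set in both fingerprints
    total = len(fp_a | fp_b)    # bits set in at least one fingerprint
    return shared / total


def tanimoto_distance(fp_a: set, fp_b: set) -> float:
    """Distance form of the same measure: 1 - similarity."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


# Hypothetical on-bit indices for two molecules (not taken from the paper's data set).
mol_a = {1, 4, 7, 9}
mol_b = {1, 4, 8, 9, 12}

similarity = tanimoto_similarity(mol_a, mol_b)
distance = tanimoto_distance(mol_a, mol_b)
print(f"similarity={similarity:.2f}, distance={distance:.2f}")
```

A small distance between two fingerprints says only that the molecules share substructure bits; whether their properties are also close is the question the evaluation framework described in the abstract is meant to test.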
Affiliation(s)
- Rebekah Duke
  - Department of Chemistry and Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, United States
- Chih-Hsuan Yang
  - Department of Mechanical Engineering and Translational AI Research and Education Center, Iowa State University, Ames, Iowa 50011, United States
- Baskar Ganapathysubramanian
  - Department of Mechanical Engineering and Translational AI Research and Education Center, Iowa State University, Ames, Iowa 50011, United States
- Chad Risko
  - Department of Chemistry and Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, United States

6. Wang Z, You F. Leveraging generative models with periodicity-aware, invertible and invariant representations for crystalline materials design. Nature Computational Science 2025:10.1038/s43588-025-00797-7. [PMID: 40346195 DOI: 10.1038/s43588-025-00797-7] [Received: 10/03/2024] [Accepted: 03/25/2025] [Indexed: 05/11/2025]
Abstract
Designing periodicity-aware, invariant and invertible representations provides an opportunity for the inverse design of crystalline materials with desired properties by generative models. This objective requires optimizing representations and refining the architecture of generative models, yet its feasibility remains uncertain, given current progress in molecular inverse generation. In this Perspective, we highlight the progress of various methods for designing representations and generative schemes for crystalline materials, discuss the challenges in the field and propose a roadmap for future developments.
Affiliation(s)
- Zhilong Wang
  - Cornell University AI for Science Institute, Cornell University, Ithaca, NY, USA
  - College of Engineering, Cornell University, Ithaca, NY, USA
  - Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
- Fengqi You
  - Cornell University AI for Science Institute, Cornell University, Ithaca, NY, USA
  - College of Engineering, Cornell University, Ithaca, NY, USA
  - Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA

7. Haddad R, Litsa EE, Liu Z, Yu X, Burkhardt D, Bhisetti G. Targeted molecular generation with latent reinforcement learning. Sci Rep 2025; 15:15202. [PMID: 40307420 PMCID: PMC12043925 DOI: 10.1038/s41598-025-99785-0] [Received: 12/23/2024] [Accepted: 04/22/2025] [Indexed: 05/02/2025]
Abstract
Computational methods for generating molecules with specific physicochemical properties or biological activity can greatly assist drug discovery efforts. Deep learning generative models constitute a significant step in that direction. We introduce a novel approach that uses a reinforcement learning paradigm, proximal policy optimization, to optimize molecules in the latent space of a pretrained generative model. Working in the latent space of a generative model lets us bypass the need to explicitly define chemical rules when computationally designing molecules. Molecules are generated by navigating the latent space to identify regions that correspond to molecules with desired properties. Proximal policy optimization is a state-of-the-art policy gradient algorithm capable of operating in continuous high-dimensional spaces in a sample-efficient manner. We have paired our optimization framework with the latent spaces of two different autoencoder architectures, showing that the method is agnostic to the underlying architecture. We present results on commonly used benchmarks for molecule optimization that demonstrate that our method has comparable or even superior performance to state-of-the-art approaches. We additionally show how our method can generate molecules that contain a pre-specified substructure while simultaneously optimizing for molecular properties, a task highly relevant to real drug discovery scenarios.
Affiliation(s)
- Zhen Liu
  - Cellarity, Inc, Somerville, USA
  - Carnegie Mellon University, Pittsburgh, USA

8. Choi J, Nam G, Choi J, Jung Y. A Perspective on Foundation Models in Chemistry. JACS Au 2025; 5:1499-1518. [PMID: 40313808 PMCID: PMC12042027 DOI: 10.1021/jacsau.4c01160] [Received: 11/30/2024] [Revised: 02/07/2025] [Accepted: 02/07/2025] [Indexed: 05/03/2025]
Abstract
Foundation models are an emerging paradigm in artificial intelligence (AI), with successful examples like ChatGPT transforming daily workflows. Generally, foundation models are large-scale, pretrained models capable of adapting to various downstream tasks by leveraging extensive data and model scaling. Their success has inspired researchers to develop foundation models for a wide range of chemical challenges, from materials discovery to understanding structure-property relationships, areas where conventional machine learning (ML) models often face limitations. In addition, foundation models hold promise for addressing persistent ML challenges in chemistry, such as data scarcity and poor generalization. In this perspective, we review recent progress in the development of foundation models in chemistry across applications of varying scope. We also discuss emerging trends and provide an outlook on promising approaches for advancing foundation models in chemistry.
Affiliation(s)
- Junyoung Choi
  - Department of Chemical and Biological Engineering, and Institute of Chemical Processes, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Gunwook Nam
  - Department of Chemical and Biological Engineering, and Institute of Chemical Processes, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Jaesik Choi
  - Graduate School of Artificial Intelligence, KAIST Daejeon, 291 Daehak-ro, N24, Yuseong-gu, Daejeon 34141, Republic of Korea
- Yousung Jung
  - Department of Chemical and Biological Engineering, and Institute of Chemical Processes, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
  - Institute of Engineering Research, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea

9. Sun S, Huggins DJ. Comparing Molecules Generated by MMPDB and REINVENT4 with Ideas from Drug Discovery Design Teams. J Chem Inf Model 2025; 65:4219-4231. [PMID: 40207451 DOI: 10.1021/acs.jcim.5c00250] [Indexed: 04/11/2025]
Abstract
This study compares molecules designed by drug discovery project teams from the Sanders Tri-Institutional Therapeutics Discovery Institute with molecules generated by two computational tools: MMPDB and REINVENT4. Seven different test cases with diverse chemotypes are studied in order to explore the potential of these computational tools in complementing human expertise in the early stages of drug discovery. By comparing the molecular structures and properties generated by MMPDB and REINVENT4 to those designed by project design teams, we aim to assess the value of such tools. The results indicate that MMPDB and REINVENT4 cover regions of chemical space larger than those covered by ideas from the drug discovery project teams. However, the chemical spaces covered by the two methods are quite different, and neither method completely covers the chemical space identified by the drug discovery project teams. Thus, the computational methods are complementary to one another and to drug discovery project team ideation. Effective application of generative molecule design tools has the potential to accelerate the identification of novel therapeutic candidates by expanding the chemical space explored during drug discovery and enabling optimal exploration.
Affiliation(s)
- Shan Sun
  - Sanders Tri-Institutional Therapeutics Discovery Institute, New York, New York 10021, United States
- David J Huggins
  - Sanders Tri-Institutional Therapeutics Discovery Institute, New York, New York 10021, United States
  - Department of Physiology and Biophysics, Weill Cornell Medical College of Cornell University, New York, New York 10065, United States

10. Mroz AM, Basford AR, Hastedt F, Jayasekera IS, Mosquera-Lois I, Sedgwick R, Ballester PJ, Bocarsly JD, Antonio Del Río Chanona E, Evans ML, Frost JM, Ganose AM, Greenaway RL, Kuok Mimi Hii K, Li Y, Misener R, Walsh A, Zhang D, Jelfs KE. Cross-disciplinary perspectives on the potential for artificial intelligence across chemistry. Chem Soc Rev 2025. [PMID: 40278836 PMCID: PMC12024683 DOI: 10.1039/d5cs00146c] [Received: 02/07/2025] [Indexed: 04/26/2025]
Abstract
From accelerating simulations and exploring chemical space to experimental planning and integrating automation within experimental labs, artificial intelligence (AI) is changing the landscape of chemistry. We are seeing a significant increase in the number of publications leveraging these powerful data-driven insights and models to accelerate all aspects of chemical research. Examples include how we represent molecules and materials to computer algorithms for predictive and generative models, as well as the physical mechanisms by which we perform experiments in the lab for automation. Here, we present ten diverse perspectives on the impact of AI from contributors with a range of backgrounds, spanning experimental chemistry, computational chemistry, computer science, and engineering, and across different areas of chemistry, including drug discovery, catalysis, chemical automation, chemical physics, and materials chemistry. The ten perspectives presented here cover a range of themes, including AI for computation, facilitating discovery, supporting experiments, and enabling technologies for transformation. We highlight and discuss imminent challenges and ways in which we are redefining problems to accelerate the impact of chemical research via AI.
Affiliation(s)
- Austin M Mroz
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK
  - I-X Centre for AI in Science, Imperial College London, London W12 0BZ, UK
- Annabel R Basford
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK
- Friedrich Hastedt
  - Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK
- Ruby Sedgwick
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
- Pedro J Ballester
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
- Joshua D Bocarsly
  - Department of Chemistry and Texas Center for Superconductivity, University of Houston, Houston, USA
- Matthew L Evans
  - UCLouvain, Institute of Condensed Matter and Nanosciences (IMCN), Chemin des Étoiles 8, Louvain-la-Neuve 1348, Belgium
  - Matgenix SRL, A6K Advanced Engineering Center, Charleroi, Belgium
  - Datalab Industries Ltd, King's Lynn, Norfolk, UK
- Jarvist M Frost
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK
- Alex M Ganose
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK
- Yingzhen Li
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
- Ruth Misener
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
- Aron Walsh
  - Department of Materials, Imperial College London, London SW7 2AZ, UK
- Dandan Zhang
  - I-X Centre for AI in Science, Imperial College London, London W12 0BZ, UK
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
- Kim E Jelfs
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK

11. Edaugal J, Zhang D, Liu D, Glezakou VA, Sun N. Solvent Screening for Separation Processes Using Machine Learning and High-Throughput Technologies. Chem & Bio Engineering 2025; 2:210-228. [PMID: 40302870 PMCID: PMC12035567 DOI: 10.1021/cbe.4c00170] [Received: 11/07/2024] [Revised: 02/13/2025] [Accepted: 02/16/2025] [Indexed: 05/02/2025]
Abstract
As the chemical industry shifts toward sustainable practices, there is a growing initiative to replace conventional fossil-derived solvents with environmentally friendly alternatives such as ionic liquids (ILs) and deep eutectic solvents (DESs). Artificial intelligence (AI) plays a key role in the discovery and design of novel solvents and the development of green processes. This review explores the latest advancements in AI-assisted solvent screening with a specific focus on machine learning (ML) models for physicochemical property prediction and separation process design. Additionally, this paper highlights recent progress in the development of automated high-throughput (HT) platforms for solvent screening. Finally, this paper discusses the challenges and prospects of ML-driven HT strategies for green solvent design and optimization. To this end, this review provides key insights to advance solvent screening strategies for future chemical and separation processes.
Affiliation(s)
- Justin P. Edaugal
  - Advanced Biofuels and Bioproducts Process Development Unit, Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Emeryville, California 94608, United States
- Difan Zhang
  - Physical and Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Dupeng Liu
  - Advanced Biofuels and Bioproducts Process Development Unit, Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Emeryville, California 94608, United States
- Ning Sun
  - Advanced Biofuels and Bioproducts Process Development Unit, Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Emeryville, California 94608, United States

12. Sil S, Datta I, Basu S. Use of AI-methods over MD simulations in the sampling of conformational ensembles in IDPs. Front Mol Biosci 2025; 12:1542267. [PMID: 40264953 PMCID: PMC12011600 DOI: 10.3389/fmolb.2025.1542267] [Received: 12/09/2024] [Accepted: 03/17/2025] [Indexed: 04/24/2025]
Abstract
Intrinsically Disordered Proteins (IDPs) challenge traditional structure-function paradigms by existing as dynamic ensembles rather than stable tertiary structures. Capturing these ensembles is critical to understanding their biological roles, yet Molecular Dynamics (MD) simulations, though accurate and widely used, are computationally expensive and struggle to sample rare, transient states. Artificial intelligence (AI) offers a transformative alternative, with deep learning (DL) enabling efficient and scalable conformational sampling. They leverage large-scale datasets to learn complex, non-linear, sequence-to-structure relationships, allowing for the modeling of conformational ensembles in IDPs without the constraints of traditional physics-based approaches. Such DL approaches have been shown to outperform MD in generating diverse ensembles with comparable accuracy. Most models rely primarily on simulated data for training and experimental data serves a critical role in validation, aligning the generated conformational ensembles with observable physical and biochemical properties. However, challenges remain, including dependence on data quality, limited interpretability, and scalability for larger proteins. Hybrid approaches combining AI and MD can bridge the gaps by integrating statistical learning with thermodynamic feasibility. Future directions include incorporating physics-based constraints and learning experimental observables into DL frameworks to refine predictions and enhance applicability. AI-driven methods hold significant promise in IDP research, offering novel insights into protein dynamics and therapeutic targeting while overcoming the limitations of traditional MD simulations.
Affiliation(s)
- Souradeep Sil
- Department of Genetics, Osmania University, Hyderabad, India
| | - Ishita Datta
- Department of Genetics and Plant Breeding, Banaras Hindu University, Varanasi, India
| | - Sankar Basu
- Department of Microbiology, Asutosh College (Affiliated with University of Calcutta), Kolkata, India
| |
|
13
|
Chen LY, Li YP. Uncertainty quantification with graph neural networks for efficient molecular design. Nat Commun 2025; 16:3262. [PMID: 40188130 PMCID: PMC11972353 DOI: 10.1038/s41467-025-58503-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Accepted: 03/21/2025] [Indexed: 04/07/2025] Open
Abstract
Optimizing molecular design across expansive chemical spaces presents unique challenges, especially in maintaining predictive accuracy under domain shifts. This study integrates uncertainty quantification (UQ), directed message passing neural networks (D-MPNNs), and genetic algorithms (GAs) to address these challenges. We systematically evaluate whether UQ-enhanced D-MPNNs can effectively optimize broad, open-ended chemical spaces and identify the most effective implementation strategies. Using benchmarks from the Tartarus and GuacaMol platforms, our results show that UQ integration via probabilistic improvement optimization (PIO) enhances optimization success in most cases, supporting more reliable exploration of chemically diverse regions. In multi-objective tasks, PIO proves especially advantageous, balancing competing objectives and outperforming uncertainty-agnostic approaches. This work provides practical guidelines for integrating UQ in computational-aided molecular design (CAMD).
Affiliation(s)
- Lung-Yi Chen
- Department of Chemical Engineering, National Taiwan University, Taipei, Taiwan, ROC
| | - Yi-Pei Li
- Department of Chemical Engineering, National Taiwan University, Taipei, Taiwan, ROC.
- Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), Taipei, Taiwan, ROC.
| |
|
14
|
Hassen AK, Šícho M, van Aalst YJ, Huizenga MCW, Reynolds DNR, Luukkonen S, Bernatavicius A, Clevert DA, Janssen APA, van Westen GJP, Preuss M. Generate what you can make: achieving in-house synthesizability with readily available resources in de novo drug design. J Cheminform 2025; 17:41. [PMID: 40155970 PMCID: PMC11954305 DOI: 10.1186/s13321-024-00910-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 09/28/2024] [Indexed: 04/01/2025] Open
Abstract
Computer-Aided Synthesis Planning (CASP) and CASP-based approximated synthesizability scores have rarely been used as generation objectives in Computer-Aided Drug Design despite facilitating the in-silico generation of synthesizable molecules. However, these synthesizability approaches are disconnected from the reality of small laboratory drug design, where building block resources are limited, thus making the notion of in-house synthesizability with already available resources highly desirable. In this work, we show a successful in-house de novo drug design workflow generating active and in-house synthesizable ligands of monoglyceride lipase (MGLL). First, we demonstrate the successful transfer of CASP from 17.4 million commercial building blocks to a small laboratory setting of roughly 6000 building blocks with only a decrease of -12% in CASP success when accepting two reaction-steps longer synthesis routes on average. Next, we present a rapidly retrainable in-house synthesizability score, successfully capturing our in-house synthesizability without relying on external building block resources. We show that including our in-house synthesizability score in a multi-objective de novo drug design workflow, alongside a simple QSAR model, provides thousands of potentially active and easily in-house synthesizable molecules. Finally, we experimentally evaluate the synthesis and biochemical activity of three de novo candidates using their CASP-suggested synthesis routes employing only in-house building blocks. We find one candidate with evident activity, suggesting potential new ligand ideas for MGLL inhibitors while showcasing the usefulness of our in-house synthesizability score for de novo drug design.
Scientific contribution: Our core scientific contribution is the introduction of in-house de novo drug design, which enables the practical application of generative methods in small laboratories by utilizing a limited stock of available building blocks. Our fast-to-adapt workflow for in-house synthesizability scoring requires minimal computational retraining costs while supporting a high diversity of generated structures. We highlight the practicality of our approach through a comprehensive in-vitro case study that relies entirely on in-house resources, including in-silico generation, synthesis planning, and activity evaluation.
Affiliation(s)
- Alan Kai Hassen
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands.
- Machine Learning Research, Pfizer Research and Development, Berlin, Germany.
| | - Martin Šícho
- Leiden Academic Centre of Drug Research, Leiden University, Leiden, The Netherlands
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Prague, Czech Republic
| | - Yorick J van Aalst
- Leiden Academic Centre of Drug Research, Leiden University, Leiden, The Netherlands
| | | | - Darcy N R Reynolds
- Leiden Institute of Chemistry, Leiden University, Leiden, The Netherlands
| | - Sohvi Luukkonen
- Leiden Academic Centre of Drug Research, Leiden University, Leiden, The Netherlands
| | - Andrius Bernatavicius
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
- Leiden Academic Centre of Drug Research, Leiden University, Leiden, The Netherlands
| | - Djork-Arné Clevert
- Machine Learning Research, Pfizer Research and Development, Berlin, Germany
| | | | - Gerard J P van Westen
- Leiden Academic Centre of Drug Research, Leiden University, Leiden, The Netherlands.
| | - Mike Preuss
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands.
| |
|
15
|
Nadkarni I, Martínez Cordeiro JP, Aluru NR. Molecular Denoising Using Diffusion Models with Physics-Informed Priors. J Phys Chem Lett 2025; 16:3078-3085. [PMID: 40101967 DOI: 10.1021/acs.jpclett.5c00274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2025]
Abstract
Denoising Diffusion Probabilistic Models (DDPMs) are powerful generative models that have demonstrated superior performance in a variety of tasks and applications in material science and molecular graph modeling. Inspired by nonequilibrium statistical mechanics, these models iteratively degrade data through a forward diffusion process and then restore it by learning the time-reversal of the forward process. Despite their success, a significant drawback of DDPMs is their reliance on numerous iterations to generate high-quality samples, resulting in slow sampling. In this Letter, we introduce a strategy to improve DDPMs for atomistic systems by leveraging the thermodynamics of the data by deriving physics-informed priors. Drawing on principles from statistical mechanics, we derive physics-informed parameters for the prior distribution to initialize the Markov chain closer to the true data distribution. This strategy shortens the Markov chain, thereby improving the model's training efficiency and accelerating the sampling process. We demonstrate the effectiveness of our method in denoising noisy radial distribution functions obtained from a single atomic configuration of diverse Lennard-Jones and multiatomic liquids.
Affiliation(s)
- Ishan Nadkarni
- Walker Department of Mechanical Engineering, Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin 78712, Texas, United States
| | - J P Martínez Cordeiro
- Walker Department of Mechanical Engineering, Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin 78712, Texas, United States
| | - Narayana R Aluru
- Walker Department of Mechanical Engineering, Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin 78712, Texas, United States
| |
|
16
|
Croitoru A, Kumar A, Lambry JC, Lee J, Sharif S, Yu W, MacKerell AD, Aleksandrov A. Increasing the Accuracy and Robustness of the CHARMM General Force Field with an Expanded Training Set. J Chem Theory Comput 2025; 21:3044-3065. [PMID: 40033678 PMCID: PMC11938330 DOI: 10.1021/acs.jctc.5c00046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Small molecule empirical force fields (FFs), including the CHARMM General Force Field (CGenFF), are designed to have wide coverage of organic molecules and to rapidly assign parameters to molecules not explicitly included in the FF. Assignment of parameters to new molecules in CGenFF is based on a trained bond-angle-dihedral charge increment linear interpolation scheme for the partial atomic charges along with bonded parameters assigned based on analogy using a rules-based penalty score scheme associated with atom types and chemical connectivity. Accordingly, the accuracy of CGenFF is related to the extent of the training set of available parameters. In the present study that training set is extended by 1390 molecules selected to represent connectivities new to CGenFF training compounds. Quantum mechanical (QM) data for optimized geometries, bond, valence angle, and dihedral angle potential energy scans, interactions with water, molecular dipole moments, and electrostatic potentials were used as target data. The resultant bonded parameters and partial atomic charges were used to train a new version of the CGenFF program, v5.0, which was used to generate parameters for a validation set of molecules, including drug-like molecules approved by the FDA, which were then benchmarked against both experimental and QM data. CGenFF v5.0 shows overall improvements with respect to QM intramolecular geometries, vibrations, dihedral potential energy scans, dipole moments and interactions with water. Tests of pure solvent properties of 216 molecules show small improvements versus the previous release of CGenFF v2.5.1 reflecting the high quality of the Lennard-Jones parameters that were explicitly optimized during the initial optimization of both the CGenFF and the CHARMM36 force field. 
CGenFF v5.0 represents an improvement that is anticipated to more accurately model intramolecular geometries and strain energies as well as noncovalent interactions of drug-like and other organic molecules.
Affiliation(s)
- Anastasia Croitoru
- Laboratoire d’Optique et Biosciences (CNRS UMR7645, INSERM U1182), Ecole Polytechnique, Institut polytechnique de Paris, F-91128 Palaiseau, France
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, USA
| | - Anmol Kumar
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, USA
| | - Jean-Christophe Lambry
- Laboratoire d’Optique et Biosciences (CNRS UMR7645, INSERM U1182), Ecole Polytechnique, Institut polytechnique de Paris, F-91128 Palaiseau, France
| | - Jihyeon Lee
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, USA
| | - Suliman Sharif
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, USA
| | - Wenbo Yu
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, USA
| | - Alexander D. MacKerell
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, USA
| | - Alexey Aleksandrov
- Laboratoire d’Optique et Biosciences (CNRS UMR7645, INSERM U1182), Ecole Polytechnique, Institut polytechnique de Paris, F-91128 Palaiseau, France
| |
|
17
|
Ishida S, Sato T, Honma T, Terayama K. Large language models open new way of AI-assisted molecule design for chemists. J Cheminform 2025; 17:36. [PMID: 40128788 PMCID: PMC11934680 DOI: 10.1186/s13321-025-00984-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Accepted: 03/07/2025] [Indexed: 03/26/2025] Open
Abstract
Recent advancements in artificial intelligence (AI)-based molecular design methodologies have offered synthetic chemists new ways to design functional molecules with their desired properties. While various AI-based molecule generators have significantly advanced toward practical applications, their effective use still requires specialized knowledge and skills concerning AI techniques. Here, we develop a large language model (LLM)-powered chatbot, ChatChemTS, that assists users in designing new molecules using an AI-based molecule generator through only chat interactions, including automated construction of reward functions for the specified properties. Our study showcases the utility of ChatChemTS through de novo design cases involving chromophores and anticancer drugs (epidermal growth factor receptor inhibitors), exemplifying single- and multiobjective molecule optimization scenarios, respectively. ChatChemTS is provided as an open-source package on GitHub at https://github.com/molecule-generator-collection/ChatChemTS.
Scientific contribution: ChatChemTS is an open-source application that assists users in utilizing an AI-based molecule generator, ChemTSv2, solely through chat interactions. This study demonstrates that LLMs possess the potential to utilize advanced software, such as AI-based molecular generators, which require specialized knowledge and technical skills.
Affiliation(s)
- Shoichi Ishida
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan.
- MolNavi LLC, #402 Wizard building 1-4-3 Sengen-cho Nishi-ku, Yokohama, Kanagawa, 220-0072, Japan.
| | - Tomohiro Sato
- RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
| | - Teruki Honma
- RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
| | - Kei Terayama
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan.
- MolNavi LLC, #402 Wizard building 1-4-3 Sengen-cho Nishi-ku, Yokohama, Kanagawa, 220-0072, Japan.
- RIKEN Center for Advanced Intelligence Project, 1-4-1, Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan.
- MDX Research Center for Element Strategy, Institute of Science Tokyo, 4259, Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa, 226-8501, Japan.
| |
|
18
|
Exner TE, Dokler J, Friedrichs S, Seitz C, Bleken FL, Friis J, Hagelien TF, Mercuri F, Costa AL, Furxhi I, Sarimveis H, Afantitis A, Marvuglia A, Larrea-Gallegos GM, Serchi T, Serra A, Greco D, Nymark P, Himly M, Wiench K, Watzek N, Schillinger EK, Gavillet J, Lynch I, Karwath A, Haywood AL, Gkoutos GV, Hischier R. Going digital to boost safe and sustainable materials innovation markets. The digital safe-and-sustainability-by-design innovation approach of the PINK project. Comput Struct Biotechnol J 2025; 29:110-124. [PMID: 40241813 PMCID: PMC12002836 DOI: 10.1016/j.csbj.2025.03.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2025] [Revised: 03/10/2025] [Accepted: 03/11/2025] [Indexed: 04/18/2025] Open
Abstract
In this innovation report, we present the vision of the PINK project to foster Safe-and-Sustainable-by-Design (SSbD) advanced materials and chemicals (AdMas&Chems) development by integrating state-of-the-art computational modelling, simulation tools and data resources. PINK proposes a novel approach for the use of the SSbD Framework, whose innovative approach is based on the application of a multi-objective optimisation procedure for the criteria of functionality, safety, sustainability and cost efficiency. At the core is the PINK open innovation platform, a distributed system that integrates all relevant modelling resources enriched with advanced data visualisation and an AI-driven decision support system. Data and modelling tools from the, in large parts, independently developed areas of functional design, safety assessment, life cycle assessment & costing are brought together based on a newly created Interoperability Framework. The PINK In Silico Hub, as the user Interface to the platform, finally guides the user through the complete AdMas&Chems development process from idea creation to market introduction. Guided by two Developmental Case Studies, the process of building of the PINK Platform is iterative, ensuring industry readiness to implement and apply it. Additionally, the Industrial Demonstrator programme will be introduced as part of the final project phase, which allows industry partners and especially small and medium enterprises (SMEs) to become part of the PINK consortium. Feedback from the Demonstrators as well as other stakeholder-engagement activities and collaborations will shape the platform's final look and feel and, even more important, activities to assure long-term technical sustainability.
Affiliation(s)
- Thomas E. Exner
- Seven Past Nine d.o.o., Hribljane, Cerknica 1380, Slovenia
- Seven Past Nine GmbH., Rebacker 6, Schopfheim 79650, Germany
| | - Joh Dokler
- Seven Past Nine d.o.o., Hribljane, Cerknica 1380, Slovenia
| | | | | | | | - Jesper Friis
- SINTEF AS, Strindvegen 4, Trondheim 7034, Norway
| | | | - Francesco Mercuri
- Istituto per lo Studio dei Materiali Nanostrutturati (ISMN), Consiglio Nazionale delle Ricerche, Via P. Gobetti 101, Bologna 40128, Italy
| | - Anna L. Costa
- Istituto di Scienza, Tecnologia e Sostenibilità per lo Sviluppo dei Materiali Ceramici. Consiglio Nazionale delle Ricerche (CNR-ISSMC), Via Granarolo 64, Faenza 48018, Italy
| | - Irini Furxhi
- Istituto di Scienza, Tecnologia e Sostenibilità per lo Sviluppo dei Materiali Ceramici. Consiglio Nazionale delle Ricerche (CNR-ISSMC), Via Granarolo 64, Faenza 48018, Italy
| | - Haralambos Sarimveis
- School of Chemical Engineering, National Technical University of Athens, 9 Heroon Polytechniou, Athens 15780, Greece
| | | | - Antonino Marvuglia
- Luxembourg Institute of Science and Technology, 5, avenue des Hauts-Fourneaux, Esch-sur-Alzette 4362, Luxembourg
| | - Gustavo M. Larrea-Gallegos
- Luxembourg Institute of Science and Technology, 5, avenue des Hauts-Fourneaux, Esch-sur-Alzette 4362, Luxembourg
| | - Tommaso Serchi
- Luxembourg Institute of Science and Technology, 5, avenue des Hauts-Fourneaux, Esch-sur-Alzette 4362, Luxembourg
| | - Angela Serra
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Faculty of Medicine and Health Technology, Tampere University, Tampere 33520, Finland
- Division of Pharmaceutical Biosciences, Faculty of Pharmacy, University of Helsinki, Helsinki 00790, Finland
| | - Dario Greco
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Faculty of Medicine and Health Technology, Tampere University, Tampere 33520, Finland
- Division of Pharmaceutical Biosciences, Faculty of Pharmacy, University of Helsinki, Helsinki 00790, Finland
| | - Penny Nymark
- Institute for Environmental Medicine, Karolinska Institutet, Nobels Väg 5, Stockholm 17177, Sweden
| | - Martin Himly
- Department of Biosciences & Medical Biology, Paris Lodron Universität Salzburg, Hellbrunnerstrasse 34, Salzburg 5020, Austria
| | - Karin Wiench
- BASF SE, Carl Bosch Str. 38, Ludwigshafen am Rhein 67056, Germany
| | - Nico Watzek
- BASF SE, Carl Bosch Str. 38, Ludwigshafen am Rhein 67056, Germany
| | | | - Jérôme Gavillet
- Innovative Advanced Materials Initiative, Rue de Ransbeek 310, Bruxelles 1120, Belgium
| | - Iseult Lynch
- School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
- Centre for Environmental Research and Justice, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
| | - Andreas Karwath
- Department of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
- Centre for Health Data Science, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
| | - Alexe L. Haywood
- Department of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
- Centre for Health Data Science, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
| | - Georgios V. Gkoutos
- Centre for Environmental Research and Justice, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
- Department of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
- Centre for Health Data Science, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
| | - Roland Hischier
- Advancing Life Cycle Assessment Group, Swiss Federal Laboratories for Materials Science and Technology, Lerchenfeldstrasse 5, St. Gallen 9014, Switzerland
| |
|
19
|
Chen Z, Meng Z, He T, Li H, Cao J, Xu L, Xiao H, Zhang Y, He X, Fang G. Crystal Structure Prediction Meets Artificial Intelligence. J Phys Chem Lett 2025; 16:2581-2591. [PMID: 40029992 DOI: 10.1021/acs.jpclett.4c03727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2025]
Abstract
Crystal structure prediction (CSP) represents a fundamental research frontier in computational materials science and chemistry, aiming to predict thermodynamically stable periodic structures from given chemical compositions. Traditional methods often face challenges such as high computational costs and local minima trapping. Recently, artificial intelligence methods, represented by generative adversarial networks (GANs), variational autoencoders (VAEs), diffusion models, and large language models (LLMs), have revolutionized the traditional prediction paradigm. These computational frameworks efficiently extract chemical rules and structural features from crystal databases, significantly reducing computational costs while maintaining prediction accuracy. This Perspective systematically evaluates the advantages and limitations of various generative models, explores their synergies with conventional approaches, and discusses their future prospects in accelerating materials discovery and development, providing new insights for future research directions.
Affiliation(s)
- Zian Chen
- College of Chemistry and Materials Engineering, Wenzhou University, Wenzhou 325035, China
| | - Zijun Meng
- College of Chemistry and Materials Engineering, Wenzhou University, Wenzhou 325035, China
| | - Tao He
- College of Chemistry and Materials Engineering, Wenzhou University, Wenzhou 325035, China
| | - Haichao Li
- College of Chemistry and Materials Engineering, Wenzhou University, Wenzhou 325035, China
| | - Jian Cao
- College of Chemistry and Materials Engineering, Wenzhou University, Wenzhou 325035, China
| | - Lina Xu
- College of Chemistry and Materials Engineering, Wenzhou University, Wenzhou 325035, China
| | - Hongping Xiao
- College of Chemistry and Materials Engineering, Wenzhou University, Wenzhou 325035, China
| | - Yueyu Zhang
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China
| | - Xiao He
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing 401120, China
- New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Shanghai 200062, China
| | - Guoyong Fang
- College of Chemistry and Materials Engineering, Wenzhou University, Wenzhou 325035, China
| |
|
20
|
Park J, Ahn J, Choi J, Kim J. Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-Directed Molecular Generation. J Chem Inf Model 2025; 65:2283-2296. [PMID: 39988822 PMCID: PMC11898073 DOI: 10.1021/acs.jcim.4c01669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2024] [Revised: 02/11/2025] [Accepted: 02/12/2025] [Indexed: 02/25/2025]
Abstract
Optimizing techniques for discovering molecular structures with desired properties is crucial in artificial intelligence (AI)-based drug discovery. Combining deep generative models with reinforcement learning has emerged as an effective strategy for generating molecules with specific properties. Despite its potential, this approach is ineffective in exploring the vast chemical space and optimizing particular chemical properties. To overcome these limitations, we present Mol-AIR, a reinforcement learning-based framework using adaptive intrinsic rewards for effective goal-directed molecular generation. Mol-AIR leverages the strengths of both history-based and learning-based intrinsic rewards by exploiting random distillation network and counting-based strategies. In benchmark tests, Mol-AIR demonstrates improved performance over existing approaches in generating molecules having the desired properties, including penalized LogP, QED, and celecoxib similarity, without any prior knowledge. We believe that Mol-AIR represents a significant advancement in drug discovery, offering a more efficient path to discovering novel therapeutics.
Affiliation(s)
- Jinyeong Park
- Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
| | - Jaegyoon Ahn
- Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
| | - Jonghwan Choi
- Division of Software, Hallym University, Chuncheon-si, Kangwon-do 24252, Republic of Korea
| | - Jibum Kim
- Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
- Center for Brain-Machine Interface, Incheon National University, Incheon 22012, Republic of Korea
| |
|
21
|
Rizzi A, Mandelli D. High performance-oriented computer aided drug design approaches in the exascale era. Expert Opin Drug Discov 2025:1-10. [PMID: 39953911 DOI: 10.1080/17460441.2025.2468289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Revised: 01/25/2025] [Accepted: 02/13/2025] [Indexed: 02/17/2025]
Abstract
INTRODUCTION In 2023, the first exascale supercomputer was opened to the public in the US. With a demonstrated 1.1 exaflops of performance, Frontier represents an unprecedented breakthrough in high-performance computing (HPC). Currently, more (and more powerful) machines are being installed worldwide. Computer-aided drug design (CADD) is one of the fields of computational science that can greatly benefit from exascale computing for the benefit of the whole society. However, scaling CADD approaches to exploit exascale machines requires new algorithmic and software solutions. AREAS COVERED Here, the authors consider physics-based and machine learning (ML)-aided techniques for the design of small molecule binders capable of leveraging modern parallel computer architectures. Specifically, the authors focus on HPC-oriented large-scale applications from the past 3 years that were enabled by (pre)exascale supercomputers by running on up to thousands of accelerated nodes. EXPERT OPINION In the area of ML, exascale computers can enable the training of generative models with unprecedented predictive power to design novel ligands, provided large amounts of high-quality data are available. Exascale computers could also unlock the potential of accurate ML-aided physics-based methods to boost the success rate of structure-based drug design campaigns. Currently, however, methodological developments are still required to allow routine large-scale applications of such rigorous approaches.
Affiliation(s)
- Andrea Rizzi
- Computational Biomedicine (INM-9), Forschungszentrum Jülich Gmbh, Wilhelm-Johnen Straße, Jülich, Germany
- Atomistic Simulations, Italian Institute of Technology, via Morego, Genova, Italy
| | - Davide Mandelli
- Computational Biomedicine (INM-9), Forschungszentrum Jülich Gmbh, Wilhelm-Johnen Straße, Jülich, Germany
| |
|
22
|
Reymond JL. Chemical space as a unifying theme for chemistry. J Cheminform 2025; 17:6. [PMID: 39825400 PMCID: PMC11740331 DOI: 10.1186/s13321-025-00954-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2024] [Accepted: 01/09/2025] [Indexed: 01/20/2025] Open
Abstract
Chemistry has diversified from a basic understanding of the elements to studying millions of highly diverse molecules and materials, which together are conceptualized as the chemical space. A map of this chemical space where distances represent similarities between compounds can represent the mutual relationships between different subfields of chemistry and help the discipline to be viewed and understood globally.
Affiliation(s)
- Jean-Louis Reymond
  - Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland.
23
Jin T, Singla V, Hsu HH, Savoie BM. Large property models: a new generative machine-learning formulation for molecules. Faraday Discuss 2025; 256:104-119. [PMID: 39660390 DOI: 10.1039/d4fd00113c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2024]
Abstract
Generative models for the inverse design of molecules with particular properties have been heavily hyped, but have yet to demonstrate significant gains over machine-learning-augmented expert intuition. A major challenge of such models is their limited accuracy in predicting molecules with targeted properties in the data-scarce regime, which is the regime typical of the prized outliers that it is hoped inverse models will discover. For example, activity data for a drug target or stability data for a material may only number in the tens to hundreds of samples, which is insufficient to learn an accurate and reasonably general property-to-structure inverse mapping from scratch. We've hypothesized that the property-to-structure mapping becomes unique when a sufficient number of properties are supplied to the models during training. This hypothesis has several important corollaries if true. It would imply that data-scarce properties can be completely determined using a set of more accessible molecular properties. It would also imply that a generative model trained on multiple properties would exhibit an accuracy phase transition after achieving a sufficient size, a process analogous to what has been observed in the context of large language models. To interrogate these behaviors, we have built the first transformers trained on the property-to-molecular-graph task, which we dub "large property models" (LPMs). A key ingredient is supplementing these models during training with relatively basic but abundant chemical property data. The motivation for the large-property-model paradigm, the model architectures, and case studies are presented here.
Affiliation(s)
- Tianfan Jin
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana, USA
- Veerupaksh Singla
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana, USA
- Hsuan-Hao Hsu
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana, USA
- Brett M Savoie
  - Department of Chemical and Biomolecular Engineering, The University of Notre Dame, Notre Dame, Indiana, USA.
24
Cheng AH, Ser CT, Skreta M, Guzmán-Cordero A, Thiede L, Burger A, Aldossary A, Leong SX, Pablo-García S, Strieth-Kalthoff F, Aspuru-Guzik A. Spiers Memorial Lecture: How to do impactful research in artificial intelligence for chemistry and materials science. Faraday Discuss 2025; 256:10-60. [PMID: 39400305 DOI: 10.1039/d4fd00153b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2024]
Abstract
Machine learning has been pervasively touching many fields of science. Chemistry and materials science are no exception. While machine learning has been making a great impact, it is still not reaching its full potential or maturity. In this perspective, we first outline current applications across a diversity of problems in chemistry. Then, we discuss how machine learning researchers view and approach problems in the field. Finally, we provide our considerations for maximizing impact when researching machine learning for chemistry.
Affiliation(s)
- Austin H Cheng
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada.
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Cher Tian Ser
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada.
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Marta Skreta
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Andrés Guzmán-Cordero
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
  - Tinbergen Institute, University of Amsterdam, Amsterdam, Netherlands
- Luca Thiede
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Andreas Burger
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Shi Xuan Leong
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada.
  - School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, Singapore 63737, Singapore
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada.
  - Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
  - Acceleration Consortium, Toronto, Ontario M5G 1X6, Canada
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Canada
  - Department of Materials Science and Engineering, University of Toronto, Canada
  - Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), Canada
25
Reidenbach D, Krishnapriyan AS. CoarsenConf: Equivariant Coarsening with Aggregated Attention for Molecular Conformer Generation. J Chem Inf Model 2025; 65:22-30. [PMID: 39688534 PMCID: PMC11733938 DOI: 10.1021/acs.jcim.4c01001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2024] [Revised: 10/24/2024] [Accepted: 11/11/2024] [Indexed: 12/18/2024]
Abstract
Molecular conformer generation (MCG) is an important task in cheminformatics and drug discovery. The ability to efficiently generate low-energy 3D structures can avoid expensive quantum mechanical simulations, leading to accelerated virtual screenings and enhanced structural exploration. Several generative models have been developed for MCG, but many struggle to consistently produce high-quality conformers for meaningful downstream applications. To address these issues, we introduce CoarsenConf, which coarse-grains molecular graphs based on torsional angles and integrates them into an SE(3)-equivariant hierarchical variational autoencoder. Through equivariant coarse-graining, we aggregate the fine-grained atomic coordinates of subgraphs connected via rotatable bonds, creating a variable-length coarse-grained latent representation. Our model uses a novel aggregated attention mechanism to restore fine-grained coordinates from the coarse-grained latent representation, enabling efficient generation of accurate conformers. Furthermore, we evaluate the chemical and biochemical quality of our generated conformers on multiple downstream applications, including property prediction and large-scale oracle-based protein docking. Overall, CoarsenConf generates more accurate conformer ensembles compared to prior generative models.
Affiliation(s)
- Danny Reidenbach
  - Department of Chemical Engineering, Department of Computer Science, University of California Berkeley, Berkeley, California 94720, United States
  - NVIDIA, Santa Clara, California 95051, United States
- Aditi S. Krishnapriyan
  - Department of Chemical Engineering, Department of Computer Science, University of California Berkeley, Berkeley, California 94720, United States
26
Liu T, Hwang L, Burley S, Nitsche C, Southan C, Walters W, Gilson M. BindingDB in 2024: a FAIR knowledgebase of protein-small molecule binding data. Nucleic Acids Res 2025; 53:D1633-D1644. [PMID: 39574417 PMCID: PMC11701568 DOI: 10.1093/nar/gkae1075] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/12/2024] [Revised: 10/16/2024] [Accepted: 10/23/2024] [Indexed: 01/18/2025]
Abstract
BindingDB (bindingdb.org) is a public, web-accessible database of experimentally measured binding affinities between small molecules and proteins, which supports diverse applications including medicinal chemistry, biochemical pathway annotation, training of artificial intelligence models and computational chemistry methods development. This update reports significant growth and enhancements since our last review in 2016. Of note, the database now contains 2.9 million binding measurements spanning 1.3 million compounds and thousands of protein targets. This growth is largely attributable to our unique focus on curating data from US patents, which has yielded a substantial influx of novel binding data. Recent improvements include a remake of the website following responsive web design principles, enhanced search and filtering capabilities, new data download options and webservices and establishment of a long-term data archive replicated across dispersed sites. We also discuss BindingDB's positioning relative to related resources, its open data sharing policies, insights gleaned from the dataset and plans for future growth and development.
Affiliation(s)
- Tiqing Liu
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
- Linda Hwang
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
- Stephen K Burley
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Rutgers Cancer Institute, Robert Wood Johnson Medical School, New Brunswick, NJ 08903, USA; Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA; Rutgers Artificial Intelligence and Data Science (RAD) Collaboratory, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Carmen I Nitsche
  - Cambridge Crystallographic Data Centre, Inc., Boston, MA 02108, USA
- Christopher Southan
  - Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh, EH8 9XD, UK
- Michael K Gilson
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093, USA
27
Hupatz H, Rahu I, Wang WC, Peets P, Palm EH, Kruve A. Critical review on in silico methods for structural annotation of chemicals detected with LC/HRMS non-targeted screening. Anal Bioanal Chem 2025; 417:473-493. [PMID: 39138659 DOI: 10.1007/s00216-024-05471-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/15/2024]
Abstract
Non-targeted screening with liquid chromatography coupled to high-resolution mass spectrometry (LC/HRMS) is increasingly leveraging in silico methods, including machine learning, to obtain candidate structures for structural annotation of LC/HRMS features and their further prioritization. Candidate structures are commonly retrieved based on the tandem mass spectral information either from spectral or structural databases; however, the vast majority of the detected LC/HRMS features remain unannotated, constituting what we refer to as a part of the unknown chemical space. Recently, the exploration of this chemical space has become accessible through generative models. Furthermore, the evaluation of the candidate structures benefits from the complementary empirical analytical information such as retention time, collision cross section values, and ionization type. In this critical review, we provide an overview of the current approaches for retrieving and prioritizing candidate structures. These approaches come with their own set of advantages and limitations, as we showcase in the example of structural annotation of ten known and ten unknown LC/HRMS features. We emphasize that these limitations stem from both experimental and computational considerations. Finally, we highlight three key considerations for the future development of in silico methods.
Affiliation(s)
- Henrik Hupatz
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
  - Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden
- Ida Rahu
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden.
- Wei-Chieh Wang
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
- Pilleriin Peets
  - Institute of Biodiversity, Faculty of Biological Science, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743, Jena, Germany
- Emma H Palm
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367, Belvaux, Luxembourg
- Anneli Kruve
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden.
  - Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden.
  - Department of Environmental Science, Stockholm University, Svante Arrhenius Väg 8, 114 18, Stockholm, Sweden.
28
Cui Z, Qi C, Zhou T, Yu Y, Wang Y, Zhang Z, Zhang Y, Wang W, Liu Y. Artificial intelligence and food flavor: How AI models are shaping the future and revolutionary technologies for flavor food development. Compr Rev Food Sci Food Saf 2025; 24:e70068. [PMID: 39783879 DOI: 10.1111/1541-4337.70068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 10/16/2024] [Accepted: 11/04/2024] [Indexed: 01/12/2025]
Abstract
The food flavor science, traditionally reliant on experimental methods, is now entering a promising era with the help of artificial intelligence (AI). By integrating existing technologies with AI, researchers can explore and develop new flavor substances in a digital environment, saving time and resources. More and more research will use AI and big data to enhance product flavor, improve product quality, meet consumer needs, and drive the industry toward a smarter and more sustainable future. In this review, we elaborate on the mechanisms of flavor recognition and their potential impact on nutritional regulation. With the increase of data accumulation and the development of internet information technology, food flavor databases and food ingredient databases have made great progress. These databases provide detailed information on the nutritional content, flavor molecules, and chemical properties of various food compounds, providing valuable data support for the rapid evaluation of flavor components and the construction of screening technology. With the popularization of AI in various fields, the field of food flavor has also ushered in new development opportunities. This review explores the mechanisms of flavor recognition and the role of AI in enhancing food flavor analysis through high-throughput omics data and screening technologies. AI algorithms offer a pathway to scientifically improve product formulations, thereby enhancing flavor and customized meals. Furthermore, it discusses the safety challenges of integrating AI into the food flavor industry.
Affiliation(s)
- Zhiyong Cui
  - Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
- Chengliang Qi
  - Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
- Tianxing Zhou
  - Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
  - Department of Bioinformatics, Faculty of Science, The University of Melbourne, Melbourne, Victoria, Australia
- Yanyang Yu
  - Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
- Yueming Wang
  - Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
- Zhiwei Zhang
  - Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
- Yin Zhang
  - Key Laboratory of Meat Processing of Sichuan, Chengdu University, Chengdu, China
- Wenli Wang
  - Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
- Yuan Liu
  - Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
  - School of Food Science and Engineering, Ningxia University, Yinchuan, China
29
Kulichenko M, Nebgen B, Lubbers N, Smith JS, Barros K, Allen AEA, Habib A, Shinkle E, Fedik N, Li YW, Messerly RA, Tretiak S. Data Generation for Machine Learning Interatomic Potentials and Beyond. Chem Rev 2024; 124:13681-13714. [PMID: 39572011 DOI: 10.1021/acs.chemrev.4c00572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2024]
Abstract
The field of data-driven chemistry is undergoing an evolution, driven by innovations in machine learning models for predicting molecular properties and behavior. Recent strides in ML-based interatomic potentials have paved the way for accurate modeling of diverse chemical and structural properties at the atomic level. The key determinant defining MLIP reliability remains the quality of the training data. A paramount challenge lies in constructing training sets that capture specific domains in the vast chemical and structural space. This Review navigates the intricate landscape of essential components and integrity of training data that ensure the extensibility and transferability of the resulting models. We delve into the details of active learning, discussing its various facets and implementations. We outline different types of uncertainty quantification applied to atomistic data acquisition and the correlations between estimated uncertainty and true error. The role of atomistic data samplers in generating diverse and informative structures is highlighted. Furthermore, we discuss data acquisition via modified and surrogate potential energy surfaces as an innovative approach to diversify training data. The Review also provides a list of publicly available data sets that cover essential domains of chemical space.
Affiliation(s)
- Maksim Kulichenko
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Benjamin Nebgen
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Nicholas Lubbers
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Justin S Smith
  - NVIDIA Corporation, Santa Clara, California 95051, United States
- Kipton Barros
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Alice E A Allen
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Adela Habib
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Emily Shinkle
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Nikita Fedik
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Ying Wai Li
  - Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Richard A Messerly
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Sergei Tretiak
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
  - Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
30
Wu Y, Su T, Du B, Hu S, Xiong J, Pan D. Kolmogorov-Arnold Network Made Learning Physics Laws Simple. J Phys Chem Lett 2024; 15:12393-12400. [PMID: 39656192 DOI: 10.1021/acs.jpclett.4c02589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/20/2024]
Abstract
In recent years, contrastive learning has gained widespread adoption in machine learning applications to physical systems primarily due to its distinctive cross-modal capabilities and scalability. Building on the foundation of Kolmogorov-Arnold Networks (KANs) [Liu, Z. et al. Kan: Kolmogorov-arnold networks. arXiv 2024, 2404.19756], we introduce a novel contrastive learning framework, Kolmogorov-Arnold Contrastive Crystal Property Pretraining (KCCP), which integrates the principles of CLIP and KAN to establish robust correlations between crystal structures and their physical properties. During the training process, we conducted a comparative analysis between Multilayer Perceptron (MLP) and KAN, revealing that KAN significantly outperforms MLP in both accuracy and convergence speed for this task. By extending the capabilities of contrastive learning to the realm of physical systems, KCCP offers a promising approach for constructing cross-data structural and cross-modal physical models, representing an area of considerable potential.
Affiliation(s)
- Yue Wu
  - Materials Genome Institute, Shanghai University, 200444 Shanghai, China
- Tianhao Su
  - Materials Genome Institute, Shanghai University, 200444 Shanghai, China
- Bingsheng Du
  - Yunnan Province Crystalline Silicon Material Technology Innovation Center, Yunnan Tongwei High Purity Crystalline Silicon Co., Ltd., Baoshan, Yunnan 678000, China
- Shunbo Hu
  - Materials Genome Institute, Shanghai University, 200444 Shanghai, China
  - Institute for the Conservation of Cultural Heritage, School of Cultural Heritage and Information Management, Shanghai University, 200444 Shanghai, China
  - Ministry of Education Key Laboratory of Silicate Cultural Relics Conservation, Shanghai University, 200444 Shanghai, China
- Jie Xiong
  - Materials Genome Institute, Shanghai University, 200444 Shanghai, China
- Deng Pan
  - Materials Genome Institute, Shanghai University, 200444 Shanghai, China
  - Ministry of Education Key Laboratory of Silicate Cultural Relics Conservation, Shanghai University, 200444 Shanghai, China
31
Yadav MK, Dahiya V, Tripathi MK, Chaturvedi N, Rashmi M, Ghosh A, Raj VS. Unleashing the future: The revolutionary role of machine learning and artificial intelligence in drug discovery. Eur J Pharmacol 2024; 985:177103. [PMID: 39515559 DOI: 10.1016/j.ejphar.2024.177103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Revised: 10/23/2024] [Accepted: 11/05/2024] [Indexed: 11/16/2024]
Abstract
Drug discovery is a complex and multifaceted process aimed at identifying new therapeutic compounds with the potential to treat various diseases. Traditional methods of drug discovery are often time-consuming, expensive, and characterized by low success rates. Because of this, there is an urgent need to improve the drug development process using new technologies. The integration of the current state-of-art of artificial intelligence (AI) and machine learning (ML) approaches with conventional methods will enhance the efficiency and effectiveness of pharmaceutical research. This review highlights the transformative impact of AI and ML in drug discovery, discussing current applications, challenges, and future directions in harnessing these technologies to accelerate the development of innovative therapeutics. We have discussed the latest developments in AI and ML technologies to streamline several stages of drug discovery, from target identification and validation to lead optimization and preclinical studies.
Affiliation(s)
- Manoj Kumar Yadav
  - Department of Biomedical Engineering, SRM University Delhi-NCR, Sonepat, Haryana, India.
- Vandana Dahiya
  - Department of Biomedical Engineering, SRM University Delhi-NCR, Sonepat, Haryana, India
- Navaneet Chaturvedi
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Mayank Rashmi
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Arabinda Ghosh
  - Department of Molecular Biology and Bioinformatics, Tripura University, Suryamaninagar, Tripura, India
- V Samuel Raj
  - Center for Drug Design Discovery and Development (C4D), SRM University Delhi-NCR, Sonepat, Haryana, India.
32
Nangia AK. Molecular tweaking by generative cheminformatics and ligand-protein structures for rational drug discovery. Bioorg Chem 2024; 153:107920. [PMID: 39489080 DOI: 10.1016/j.bioorg.2024.107920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2024] [Accepted: 10/23/2024] [Indexed: 11/05/2024]
Abstract
The purpose of this review is two-fold: (1) to summarize artificial intelligence and machine learning approaches and document the role of ligand-protein structures in directing drug discovery; (2) to present examples of drugs from the recent literature (past decade) of case studies where such strategies have been applied to accelerate the discovery pipeline. Compared to 50 years ago when drug discovery was largely a synthetic chemist driven research exercise, today a holistic approach needs to be adopted with seamless integration between synthetic and medicinal chemistry, supramolecular complexes, computations, artificial intelligence, machine learning, structural biology, chemical biology, diffraction analytical tools, drugs databases, and pharmacology. The urgency for an integrated and collaborative platform to accelerate drug discovery in an academic setting is emphasized.
Affiliation(s)
- Ashwini K Nangia
  - School of Chemistry, University of Hyderabad, Hyderabad 500 046, India.
33
López-Pérez K, Avellaneda-Tamayo JF, Chen L, López-López E, Juárez-Mercado KE, Medina-Franco JL, Miranda-Quintana RA. Molecular similarity: Theory, applications, and perspectives. Artificial Intelligence Chemistry 2024; 2:100077. [PMID: 40124654 PMCID: PMC11928018 DOI: 10.1016/j.aichem.2024.100077] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 03/25/2025]
Abstract
Molecular similarity pervades much of our understanding and rationalization of chemistry. This has become particularly evident in the current data-intensive era of chemical research, with similarity measures serving as the backbone of many Machine Learning (ML) supervised and unsupervised procedures. Here, we present a discussion on the role of molecular similarity in drug design, chemical space exploration, chemical "art" generation, molecular representations, and many more. We also discuss more recent topics in molecular similarity, like the ability to efficiently compare large molecular libraries.
Affiliation(s)
- Kenneth López-Pérez
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA
- Juan F. Avellaneda-Tamayo
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- Lexin Chen
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA
- Edgar López-López
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
  - Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Section 14-740, Mexico City 07000, Mexico
- K. Eurídice Juárez-Mercado
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- José L. Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
34
He Y, Liu F, Min W, Liu G, Wu Y, Wang Y, Yan X, Yan B. De novo Design of Biocompatible Nanomaterials Using Quasi-SMILES and Recurrent Neural Networks. ACS Applied Materials & Interfaces 2024. [PMID: 39567202 DOI: 10.1021/acsami.4c15600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2024]
Abstract
Screening nanomaterials (NMs) with desired properties from the extensive chemical space presents significant challenges. The potential toxicity of NMs further limits their applications in biological systems. Traditional methods struggle with these complexities, but generative models offer a possible solution to producing new molecules without prior knowledge. However, converting complex 3D nanostructures into computer-readable formats remains a critical prerequisite. To overcome these challenges, we proposed an innovative deep-learning framework for the de novo design of biocompatible NMs. This framework comprises two predictive models and a generative model, utilizing a Quasi-SMILES representation to encode three-dimensional structural information on NMs. Our generative model successfully created 289 new NMs not previously seen in the training set. The predictive models identified a particularly promising NM characterized by high cellular uptake and low toxicity. This NM was successfully synthesized, and its predicted properties were experimentally validated. Our approach advances the application of artificial intelligence in NM design and provides a practical solution for balancing functionality and toxicity in NMs.
Affiliation(s)
- Ying He
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
| | - Fang Liu
- Department of Plastic Surgery, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, Shandong 250014, PR China
- Jinan Clinical Research Center for Tissue Engineering Skin Regeneration and Wound Repair, Jinan Shandong 250014, PR China
| | - Weicui Min
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
| | - Guohong Liu
- School of Health, Guangzhou Vocational University of Science and Technology, Guangzhou 510555, China
| | - Yinbao Wu
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Yan Wang
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Xiliang Yan
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Bing Yan
- Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
| |
|
35
|
P de Oliveira SH, Pedawi A, Kenyon V, van den Bedem H. NGT: Generative AI with Synthesizability Guarantees Discovers MC2R Inhibitors from a Tera-Scale Virtual Screen. J Med Chem 2024; 67:19417-19427. [PMID: 39471377 DOI: 10.1021/acs.jmedchem.4c01763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2024]
Abstract
Commercially available, synthesis-on-demand virtual libraries contain upward of trillions of readily synthesizable compounds for drug discovery campaigns. These libraries are a critical resource for rapid cycles of in silico discovery, property optimization and in vitro validation. However, as these libraries continue to grow exponentially in size, traditional search strategies encounter significant limitations. Here we present NeuralGenThesis (NGT), an efficient reinforcement learning approach to generate compounds from ultralarge libraries that satisfy user-specified constraints. Our method first trains a generative model over a virtual library and subsequently trains a normalizing flow to learn a distribution over latent space that decodes constraint-satisfying compounds. NGT allows multiple constraints simultaneously without dictating how molecular properties are calculated. Using NGT, we generated potent and selective inhibitors for the melanocortin-2 receptor (MC2R) from a three trillion compound library. NGT offers a powerful and scalable solution for navigating ultralarge virtual libraries, accelerating drug discovery efforts.
Affiliation(s)
| | - Aryan Pedawi
- Atomwise Inc, San Francisco, California 94108, United States
| | - Victor Kenyon
- Atomwise Inc, San Francisco, California 94108, United States
| | - Henry van den Bedem
- Atomwise Inc, San Francisco, California 94108, United States
- Department of Bioengineering & Therapeutic Sciences, University of California, San Francisco, California 94143, United States
| |
|
36
|
Bernatavicius A, Šícho M, Janssen APA, Hassen AK, Preuss M, van Westen GJP. AlphaFold Meets De Novo Drug Design: Leveraging Structural Protein Information in Multitarget Molecular Generative Models. J Chem Inf Model 2024; 64:8113-8122. [PMID: 39475544 PMCID: PMC11558674 DOI: 10.1021/acs.jcim.4c00309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 10/15/2024] [Accepted: 10/15/2024] [Indexed: 11/12/2024]
Abstract
Recent advancements in deep learning and generative models have significantly expanded the applications of virtual screening for drug-like compounds. Here, we introduce a multitarget transformer model, PCMol, that leverages the latent protein embeddings derived from AlphaFold2 as a means of conditioning a de novo generative model on different targets. Incorporating rich protein representations allows the model to capture their structural relationships, enabling the chemical space interpolation of active compounds and target-side generalization to new proteins based on embedding similarities. In this work, we benchmark against other existing target-conditioned transformer models to illustrate the validity of using AlphaFold protein representations over raw amino acid sequences. We show that low-dimensional projections of these protein embeddings cluster appropriately based on target families and that model performance declines when these representations are intentionally corrupted. We also show that the PCMol model generates diverse, potentially active molecules for a wide array of proteins, including those with sparse ligand bioactivity data. The generated compounds display higher similarity to known active ligands of held-out targets and have comparable molecular docking scores while maintaining novelty. Additionally, we demonstrate the important role of data augmentation in bolstering the performance of generative models in low-data regimes. The software package and AlphaFold protein embeddings are freely available at https://github.com/CDDLeiden/PCMol.
Affiliation(s)
- Andrius Bernatavicius
- Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
- Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
| | - Martin Šícho
- Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28 Prague, Czech Republic
| | - Antonius P. A. Janssen
- Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
- Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
| | - Alan Kai Hassen
- Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
| | - Mike Preuss
- Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
| | - Gerard J. P. van Westen
- Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands
| |
|
37
|
Cheng B. Response Matching for Generating Materials and Molecules. J Chem Theory Comput 2024; 20:9259-9266. [PMID: 39365029 PMCID: PMC11500275 DOI: 10.1021/acs.jctc.4c00998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2024] [Revised: 09/22/2024] [Accepted: 09/24/2024] [Indexed: 10/05/2024]
Abstract
Diffusion models have recently emerged as powerful tools for the generation of new molecular and material structures. The key insight is that the noise in these models is related to the response of the atoms to displacement, and the denoising step is thus analogous to the geometry relaxation of atomistic systems starting from a random structure. Building on this, we present a generative method called Response Matching (RM), which leverages the fact that each stable material or molecule exists at the minimum of its potential energy surface. Any perturbation induces a response in energy and stress, driving the structure back to equilibrium. Matching this response is closely related to score matching in diffusion models. Another important aspect of state-of-the-art diffusion models is the incorporation of physical symmetries such as translation, rotation, and periodicity. RM employs a machine learning interatomic potential and random structure search as the denoising model, inherently respecting these symmetries and exploiting the locality of atomic interactions. RM handles both molecules and bulk materials under the same framework. Its efficiency and generalization are demonstrated on three systems: a small organic molecular data set, stable crystals from the Materials Project, and one-shot learning on a single diamond configuration.
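The link between an energy response and score matching mentioned in the abstract can be illustrated with a textbook identity. This is a sketch under the assumption that structures are sampled from a Boltzmann distribution at temperature $T$; the symbols $E$, $\mathbf{F}$, $k_B$ and $T$ are introduced here for illustration and are not taken from the paper, whose own derivation may differ.

```latex
% Assumption: configurations x follow a Boltzmann distribution with energy E(x).
\begin{align}
  p(\mathbf{x}) &\propto \exp\!\left(-\frac{E(\mathbf{x})}{k_B T}\right), \\
  \nabla_{\mathbf{x}} \log p(\mathbf{x})
    &= -\frac{\nabla_{\mathbf{x}} E(\mathbf{x})}{k_B T}
     = \frac{\mathbf{F}(\mathbf{x})}{k_B T}.
\end{align}
% The score (left-hand side) is proportional to the force F = -grad E,
% so following the score resembles relaxing the geometry along the forces.
```

Under this assumption, a model that predicts forces on perturbed structures provides, up to the factor $1/(k_B T)$, the same quantity a score-based denoiser learns.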
Affiliation(s)
- Bingqing Cheng
- Department of Chemistry, University of California, Berkeley, California 94720, United States
- The Institute of Science and Technology Austria, Am Campus 1, 3400 Klosterneuburg, Austria
| |
|
38
|
Cheng G, Gong XG, Yin WJ. An approach for full space inverse materials design by combining universal machine learning potential, universal property model, and optimization algorithm. Sci Bull (Beijing) 2024; 69:3066-3074. [PMID: 39142945 DOI: 10.1016/j.scib.2024.07.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Revised: 05/12/2024] [Accepted: 06/03/2024] [Indexed: 08/16/2024]
Abstract
We present a full space inverse materials design (FSIMD) approach that fully automates the materials design for target physical properties without the need to provide the atomic composition, chemical stoichiometry, and crystal structure in advance. Here, we used density functional theory reference data to train a universal machine learning potential (UPot) and transfer learning to train a universal bulk modulus model (UBmod). Both UPot and UBmod were able to cover materials systems composed of any element among 42 elements. Interfaced with optimization algorithm and enhanced sampling, the FSIMD approach is applied to find the materials with the largest cohesive energy and the largest bulk modulus, respectively. NaCl-type ZrC was found to be the material with the largest cohesive energy. For bulk modulus, diamond was identified to have the largest value. The FSIMD approach is also applied to design materials with other multi-objective properties with accuracy limited principally by the amount, reliability, and diversity of the training data. The FSIMD approach provides a new way for inverse materials design with other functional properties for practical applications.
Affiliation(s)
- Guanjian Cheng
- College of Energy, Soochow Institute for Energy and Materials InnovationS (SIEMIS), and Jiangsu Provincial Key Laboratory for Advanced Carbon Materials and Wearable Energy Technologies, Soochow University, Suzhou 215006, China; Shanghai Qi Zhi Institute, Shanghai 200232, China
| | - Xin-Gao Gong
- Key Laboratory for Computational Physical Sciences (MOE), Institute of Computational Physical Sciences, Fudan University, Shanghai 200438, China; Shanghai Qi Zhi Institute, Shanghai 200232, China
| | - Wan-Jian Yin
- College of Energy, Soochow Institute for Energy and Materials InnovationS (SIEMIS), and Jiangsu Provincial Key Laboratory for Advanced Carbon Materials and Wearable Energy Technologies, Soochow University, Suzhou 215006, China; Shanghai Qi Zhi Institute, Shanghai 200232, China.
| |
|
39
|
Wang K, Huang Y, Wang Y, You Q, Wang L. Recent advances from computer-aided drug design to artificial intelligence drug design. RSC Med Chem 2024; 15:d4md00522h. [PMID: 39493228 PMCID: PMC11523840 DOI: 10.1039/d4md00522h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Accepted: 10/09/2024] [Indexed: 11/05/2024] Open
Abstract
Computer-aided drug design (CADD), a cornerstone of modern drug discovery, can predict how a molecular structure relates to its activity and interacts with its target using structure-based and ligand-based methods. Fueled by ever-increasing data availability and continuous model optimization, artificial intelligence drug design (AIDD), as an enhanced iteration of CADD, has thrived in the past decade. AIDD demonstrates unprecedented opportunities in protein folding, property prediction, and molecular generation. It can also facilitate target identification, high-throughput screening (HTS), and synthetic route prediction. With AIDD involved, the process of drug discovery is greatly accelerated. Notably, AIDD offers the potential to explore uncharted territories of chemical space beyond current knowledge. In this perspective, we began by briefly outlining the main workflows and components of CADD. Then through showcasing exemplary cases driven by AIDD in recent years, we describe the evolving role of artificial intelligence (AI) in drug discovery from three distinct stages, that is, chemical library screening, linker generation, and de novo molecular generation. In this process, we attempted to draw comparisons between the features of CADD and AIDD.
Affiliation(s)
- Keran Wang
- State Key Laboratory of Natural Medicines and, Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University Nanjing 210009 China +86 025 83271351 +86 15261483858
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University Nanjing 210009 China
| | - Yanwen Huang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University Beijing 100191 China
| | - Yan Wang
- Department of Urology, Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine Shanghai 201203 China +86 13122152007
| | - Qidong You
- State Key Laboratory of Natural Medicines and, Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University Nanjing 210009 China +86 025 83271351 +86 15261483858
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University Nanjing 210009 China
| | - Lei Wang
- State Key Laboratory of Natural Medicines and, Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University Nanjing 210009 China +86 025 83271351 +86 15261483858
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University Nanjing 210009 China
| |
|
40
|
Hu J, Wu P, Li Y, Li Q, Wang S, Liu Y, Qian K, Yang G. Discovering Photoswitchable Molecules for Drug Delivery with Large Language Models and Chemist Instruction Training. Pharmaceuticals (Basel) 2024; 17:1300. [PMID: 39458941 PMCID: PMC11510428 DOI: 10.3390/ph17101300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2024] [Revised: 09/23/2024] [Accepted: 09/27/2024] [Indexed: 10/28/2024] Open
Abstract
Background: As large language models continue to expand in size and diversity, their substantial potential and the relevance of their applications are increasingly being acknowledged. The rapid advancement of these models also holds profound implications for the long-term design of stimulus-responsive materials used in drug delivery. Methods: The large model used Hugging Face's Transformers package with BigBird, Gemma, and GPT NeoX architectures. Pre-training used the PubChem dataset, and fine-tuning used QM7b. Chemist instruction training was based on Direct Preference Optimization. Drug Likeness, Synthetic Accessibility, and PageRank Scores were used to filter molecules. All computational chemistry simulations were performed using ORCA and Time-Dependent Density-Functional Theory. Results: To optimize large models for extensive dataset processing and comprehensive learning akin to a chemist's intuition, the integration of deeper chemical insights is imperative. Our study initially compared the performance of BigBird, Gemma, GPT NeoX, and others, specifically focusing on the design of photoresponsive drug delivery molecules. We gathered excitation energy data through computational chemistry tools and further investigated light-driven isomerization reactions as a critical mechanism in drug delivery. Additionally, we explored the effectiveness of incorporating human feedback into reinforcement learning to imbue large models with chemical intuition, enhancing their understanding of relationships involving -N=N- groups in the photoisomerization transitions of photoresponsive molecules. Conclusions: We implemented an efficient design process based on structural knowledge and data, driven by large language model technology, to obtain a candidate dataset of specific photoswitchable molecules. However, the lack of specialized domain datasets remains a challenge for maximizing model performance.
Affiliation(s)
- Junjie Hu
- Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK; (J.H.); (Q.L.); (S.W.)
| | - Peng Wu
- School of Chemistry and Chemical Engineering, Ningxia University, Yinchuan 750014, China;
| | - Yulin Li
- Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong;
| | - Qi Li
- Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK; (J.H.); (Q.L.); (S.W.)
| | - Shiyi Wang
- Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK; (J.H.); (Q.L.); (S.W.)
| | - Yang Liu
- Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Third Hospital of Shanxi Medical University, Tongji Shanxi Hospital, Taiyuan 030032, China;
| | - Kun Qian
- Department of Information and Intelligence Development, Zhongshan Hospital, Fudan University, 180 Fenglin Road, Shanghai 200032, China
| | - Guang Yang
- Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK; (J.H.); (Q.L.); (S.W.)
- National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK
- Cardiovascular Research Centre, Royal Brompton Hospital, London SW3 6NP, UK
- School of Biomedical Engineering & Imaging Sciences, King’s College London, London WC2R 2LS, UK
| |
|
41
|
Kneiding H, Balcells D. Augmenting genetic algorithms with machine learning for inverse molecular design. Chem Sci 2024:d4sc02934h. [PMID: 39296997 PMCID: PMC11404003 DOI: 10.1039/d4sc02934h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Accepted: 09/09/2024] [Indexed: 09/21/2024] Open
Abstract
Evolutionary and machine learning methods have been successfully applied to the generation of molecules and materials exhibiting desired properties. The combination of these two paradigms in inverse design tasks can yield powerful methods that explore massive chemical spaces more efficiently, improving the quality of the generated compounds. However, such synergistic approaches are still an incipient area of research and appear underexplored in the literature. This perspective covers different ways of incorporating machine learning approaches into evolutionary learning frameworks, with the overall goal of increasing the optimization efficiency of genetic algorithms. In particular, machine learning surrogate models for faster fitness function evaluation, discriminator models to control population diversity on-the-fly, machine learning based crossover operations, and evolution in latent space are discussed. The further potential of these synergistic approaches in generative tasks is also assessed, outlining promising directions for future developments.
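One of the combinations listed in the abstract, a surrogate model that stands in for an expensive fitness function, can be sketched as follows. This is a toy illustration only: the bit-string genome, the `true_fitness` stand-in, the `LinearSurrogate` class, and all parameter values are hypothetical and are not taken from the perspective.

```python
import random


def true_fitness(genome):
    # Hypothetical stand-in for an expensive property calculation: count of 1-bits.
    return sum(genome)


class LinearSurrogate:
    """Cheap model: one weight per bit, fitted from genomes already evaluated."""

    def __init__(self, n_bits):
        self.weights = [0.0] * n_bits

    def fit(self, genomes, scores):
        for i in range(len(self.weights)):
            on = [s for g, s in zip(genomes, scores) if g[i] == 1]
            off = [s for g, s in zip(genomes, scores) if g[i] == 0]
            if on and off:
                # Weight = mean score with the bit set minus mean score without it.
                self.weights[i] = sum(on) / len(on) - sum(off) / len(off)

    def predict(self, genome):
        return sum(w for w, bit in zip(self.weights, genome) if bit)


def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genome, rng, rate=0.05):
    return [1 - bit if rng.random() < rate else bit for bit in genome]


def run_ga(n_bits=20, pop_size=20, generations=15,
           n_candidates=60, n_verified=5, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    # The archive holds every genome scored with the expensive function.
    archive = {tuple(g): true_fitness(g) for g in population}
    initial_best = max(archive.values())
    surrogate = LinearSurrogate(n_bits)

    for _ in range(generations):
        surrogate.fit([list(g) for g in archive], list(archive.values()))
        # Generate many offspring and rank them with the cheap surrogate.
        candidates = []
        for _ in range(n_candidates):
            parent_a, parent_b = rng.sample(population, 2)
            candidates.append(mutate(crossover(parent_a, parent_b, rng), rng))
        candidates.sort(key=surrogate.predict, reverse=True)
        # Only the top-ranked few receive a true (expensive) evaluation.
        for g in candidates[:n_verified]:
            key = tuple(g)
            if key not in archive:
                archive[key] = true_fitness(g)
        ranked = sorted(archive, key=archive.get, reverse=True)
        population = [list(g) for g in ranked[:pop_size]]

    return initial_best, max(archive.values()), len(archive)


if __name__ == "__main__":
    start, best, n_true_evals = run_ga()
    print(f"best fitness {start} -> {best} using {n_true_evals} true evaluations")
```

The design choice illustrated here is that 60 offspring are proposed per generation but at most 5 are passed to the expensive function, so the surrogate acts as a pre-filter rather than a replacement for the true evaluation.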
Affiliation(s)
- Hannes Kneiding
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo P.O. Box 1033, Blindern 0315 Oslo Norway
| | - David Balcells
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo P.O. Box 1033, Blindern 0315 Oslo Norway
| |
|
42
|
Schmid SP, Schlosser L, Glorius F, Jorner K. Catalysing (organo-)catalysis: Trends in the application of machine learning to enantioselective organocatalysis. Beilstein J Org Chem 2024; 20:2280-2304. [PMID: 39290209 PMCID: PMC11406055 DOI: 10.3762/bjoc.20.196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Accepted: 08/09/2024] [Indexed: 09/19/2024] Open
Abstract
Organocatalysis has established itself as a third pillar of homogeneous catalysis, besides transition metal catalysis and biocatalysis, as its use for enantioselective reactions has gathered significant interest over the last decades. Concurrent with this development, machine learning (ML) has been increasingly applied in the chemical domain to efficiently uncover hidden patterns in data and accelerate scientific discovery. While the uptake of ML in organocatalysis has been comparatively slow, the last two decades have shown an increased interest from the community. This review gives an overview of the work in the field of ML in organocatalysis. The review starts by giving a short primer on ML for experimental chemists, before discussing its application for predicting the selectivity of organocatalytic transformations. Subsequently, we review ML employed for privileged catalysts, before focusing on its application for catalyst and reaction design. In conclusion, we give our view on current challenges and future directions for this field, drawing inspiration from the application of ML to other scientific domains.
Affiliation(s)
- Stefan P Schmid
- Institute of Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich CH-8093, Switzerland
| | - Leon Schlosser
- Organisch-Chemisches Institut, Universität Münster, 48149 Münster, Germany
| | - Frank Glorius
- Organisch-Chemisches Institut, Universität Münster, 48149 Münster, Germany
| | - Kjell Jorner
- Institute of Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich CH-8093, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, ETH Zurich, Zurich CH-8093, Switzerland
| |
|
43
|
Ramasamy N, Raj AJLP, Akula VV, Nagarasampatti Palani K. Leveraging experimental and computational tools for advancing carbon capture adsorbents research. Environ Sci Pollut Res Int 2024; 31:55069-55098. [PMID: 39225926 DOI: 10.1007/s11356-024-34838-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Accepted: 08/24/2024] [Indexed: 09/04/2024]
Abstract
CO2 emissions have been steadily increasing and are a major contributor to climate change, compelling nations to take decisive action quickly. If emissions are left unchecked, the average global temperature increase could reach 1.5 °C by 2035, with a significant impact on the environment. Several strategies have been explored, of which carbon capture is considered the most suitable for rapid deployment. Among the different carbon capture solutions, adsorption is considered both practical and sustainable for scale-up. However, adsorbents with satisfactory performance are typically developed through an experimental approach. This trial-and-error method is costly and time-consuming, and success is not guaranteed. Machine learning (ML) and other computational tools offer an alternative to this approach and are accessible to everyone. Often, materials research focuses on maximizing performance under simulated conditions. The aim of this study is to present a holistic view of progress in materials research for carbon capture and of the various tools available in this regard. Thus, in this review, we first present the workflow for carbon capture material development and then describe the machine learning and computational tools available to support researchers at each stage of the process. The most popular application of ML models is predicting material performance, and we recommend that ML approaches be used wherever possible so that experimentation can be focused on the later stages of research and development.
Affiliation(s)
- Niranjan Ramasamy
- Department of Chemical Engineering, Rajalakshmi Engineering College, Chennai, India
| | | | - Vedha Varshini Akula
- Department of Chemical Engineering, Sri Venkateswara College of Engineering, Sriperumbudur, 602117, Kancheepuram, India
| | - Kavitha Nagarasampatti Palani
- Department of Chemical Engineering, Sri Venkateswara College of Engineering, Sriperumbudur, 602117, Kancheepuram, India.
| |
|
44
|
Sultan A, Sieg J, Mathea M, Volkamer A. Transformers for Molecular Property Prediction: Lessons Learned from the Past Five Years. J Chem Inf Model 2024; 64:6259-6280. [PMID: 39136669 DOI: 10.1021/acs.jcim.4c00747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, from using simple physical and chemical properties and molecular fingerprints in statistical models and classical machine learning to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pretraining data, optimal architecture selections, and promising pretraining objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to enhance the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.
Affiliation(s)
- Afnan Sultan
- Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
| | | | | | - Andrea Volkamer
- Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
| |
|
45
|
Kalasin S, Surareungchai W. Artificial intelligence-aiding lab-on-a-chip workforce designed oral [3.1.0] bi and [4.2.0] tricyclic catalytic interceptors inhibiting multiple SARS-CoV-2 protomers assisted by double-shell deep learning. RSC Adv 2024; 14:26897-26910. [PMID: 39193274 PMCID: PMC11347926 DOI: 10.1039/d4ra03965c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Accepted: 08/20/2024] [Indexed: 08/29/2024] Open
Abstract
While each massive pandemic has claimed the lives of millions of vulnerable populations over the centuries, one limitation exists: that the Edisonian approach (human-directed with trial errors) relies on repurposing pharmaceuticals, designing drugs, and herbal remedies with the violation of Lipinski's rule of five druglikeness. It may lead to adverse health effects with long-term health multimorbidity. Nevertheless, declining birth rates and aging populations will likely cause a shift in society due to a shortage of a scientific workforce to defend against the next pandemic incursion. The challenge of combating the ongoing post-COVID-19 pandemic has been exacerbated by the lack of gold standard drugs to deactivate multiple SARS-CoV-2 protein targets. Meanwhile, there are three FDA-approved antivirals, Remdesivir, Molnupiravir, and Paxlovid, with moderate clinical efficacy and drug resistance. There is a pressing need for additional antivirals and prepared omics technology to combat the current and future devastating coronavirus pandemics. While there is a limitation of existing contemporary inhibitors to deactivate viral RNA replication with minimal rotational bonds, one strategy is to create Lipinski inhibitors with less than 10 rotational bonds and precise halogen bond placement to destabilize multiple viral protomers. This work describes the efforts to design gold-standard oral inhibitors of bi- and tri-cyclic catalytic interceptors with electrophilic heads using double-shell deep learning. Here, KS1 with and KS2 compounds designed by lab-on-a-chip technology attain 5-fold novel filtered-Lipinski, GHOSE, VEBER, EGAN, and MUEGGE druglikeness. The graph neural network (GNN) relies on module-initiation, expansion, relabeling atom index, and termination (METORITE) iterations, while the deep neural network (DNN) engages pinning, extraction, convolution, pooling, and flattening (PROOF) operations. 
The cyclic compound's specific halogen atom location enhances the nitrile catalytic head, which deactivates several viral protein targets. Initiating this lab-on-a-chip that is not susceptible to the aging process for creating clinical compounds can leverage a new path to many valuable drugs with speedy oral drug discovery, especially to defend the loss of vulnerable population and prevent multimorbidity that is susceptible to hidden viral persistence in the continuing aging times.
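The druglikeness criteria named in the abstract (Lipinski's rule of five plus a cap on rotatable bonds) can be expressed as a simple filter. The sketch below assumes the descriptors have already been computed elsewhere (for example with a cheminformatics toolkit); the `Descriptors` class, the function names, and the two example molecules with their approximate values are illustrative and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float        # molecular weight in g/mol
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int


def lipinski_violations(d):
    # Lipinski's rule of five: MW <= 500, logP <= 5, donors <= 5, acceptors <= 10.
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_filters(d, max_rotatable_bonds=10):
    # Assumption: require zero Lipinski violations and strictly fewer than
    # `max_rotatable_bonds` rotatable bonds, following the abstract's wording.
    # The Veber criterion is usually stated as 10 or fewer, and many workflows
    # tolerate one Lipinski violation, so adjust both to the screening policy.
    return lipinski_violations(d) == 0 and d.rotatable_bonds < max_rotatable_bonds


# Illustrative inputs with approximate, aspirin-like values.
small = Descriptors(mol_weight=180.2, logp=1.2, h_bond_donors=1,
                    h_bond_acceptors=4, rotatable_bonds=3)
# A made-up large, flexible molecule that breaks several limits.
large = Descriptors(mol_weight=812.0, logp=6.3, h_bond_donors=3,
                    h_bond_acceptors=12, rotatable_bonds=14)

if __name__ == "__main__":
    for name, mol in [("small", small), ("large", large)]:
        print(name, lipinski_violations(mol), passes_filters(mol))
```

Keeping the violation count separate from the pass/fail decision makes it easy to swap in a different policy without touching the thresholds.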
Affiliation(s)
- Surachate Kalasin
- Faculty of Science and Nanoscience & Nanotechnology Graduate Program, King Mongkut's University of Technology Thonburi 10140 Thailand
| | - Werasak Surareungchai
- Faculty of Science and Nanoscience & Nanotechnology Graduate Program, King Mongkut's University of Technology Thonburi 10140 Thailand
- Pilot Plant Research and Development Laboratory, King Mongkut's University of Technology Thonburi 10150 Bangkok Thailand
- School of Bioresource and Technology, King Mongkut's University of Technology Thonburi 10150 Bangkok Thailand
- Analytical Sciences and National Doping Test Institute, Mahidol University Bangkok 10400 Thailand
| |
|
46
|
Gryn'ova G, Bereau T, Müller C, Friederich P, Wade RC, Nunes-Alves A, Soares TA, Merz K. EDITORIAL: Chemical Compound Space Exploration by Multiscale High-Throughput Screening and Machine Learning. J Chem Inf Model 2024; 64:5737-5738. [PMID: 39129448 DOI: 10.1021/acs.jcim.4c01300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Affiliation(s)
- Ganna Gryn'ova
- School of Chemistry, University of Birmingham, Birmingham B15 2TT, United Kingdom
- Tristan Bereau
- Institute for Theoretical Physics, Heidelberg University, Heidelberg 69120, Germany
- Carolin Müller
- Computer-Chemistry-Center, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nägelsbachstraße 25, Erlangen 91052, Germany
- Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Kaiserstr. 12, Karlsruhe 76131, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Kaiserstr. 12, Karlsruhe 76131, Germany
- Rebecca C Wade
- Molecular and Cellular Modeling Group, Heidelberg Institute for Theoretical Studies (HITS), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany
- Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Im Neuenheimer Feld 329, Heidelberg 69120, Germany
- Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Im Neuenheimer Feld 205, Heidelberg 69120, Germany
- Ariane Nunes-Alves
- Institute of Chemistry, Technische Universität Berlin, Berlin 10623, Germany
- Thereza A Soares
- Department of Chemistry, FFCLRP, University of São Paulo, Ribeirão Preto 14040-901, Brazil
- Hylleraas Centre for Quantum Molecular Sciences, University of Oslo, Oslo 0315, Norway
- Kenneth Merz
- Department of Chemistry, Michigan State University, Michigan 48824, United States

47
Qin T, Wang Y, Kong M, Zhong H, Wu T, Xi Z, Qian Z, Li K, Cai Y, Wu J, Li W. Identification of potential PIM-2 inhibitors via ligand-based generative models, molecular docking and molecular dynamics simulations. Mol Divers 2024; 28:2245-2262. [PMID: 38954072 DOI: 10.1007/s11030-024-10916-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Accepted: 06/11/2024] [Indexed: 07/04/2024]
Abstract
Proviral Integrations of Moloney-2 (PIM-2) kinase is a promising target for various cancers and other diseases, and its inhibitors hold potential for treating related diseases. However, there is currently no clinically available PIM-2 inhibitor. In this study, we constructed an artificial intelligence-based generative model for de novo PIM-2 inhibitor design and performed molecular docking and molecular dynamics (MD) simulations, with the aim of developing an efficient PIM-2 inhibitor generative model and discovering potential PIM-2 inhibitors. First, we designed a generative model based on a Bi-directional Long Short-Term Memory (BiLSTM) framework combined with a transfer learning strategy and generated a new PIM-2 small molecule library using existing active drug databases. The generated compound library was then virtually screened by molecular docking and scaffold similarity comparison, identifying 10 initial hit compounds with better performance. Next, using the inhibitor in the crystal structure as a positive control, we performed two rounds of MD simulations, with lengths of 100 ns and 500 ns, respectively, to study the dynamic stability of the protein-ligand systems of the 10 compounds with PIM-2. We analyzed the interactions with key hinge region residues, binding free energies, and changes in the ATP pocket size. The generative model demonstrates good molecular generation capability and can generate efficient novel molecules with physicochemical properties similar to those of active PIM-2 drugs. Among the 10 initially selected hit compounds, 5 compounds, C3 (-29.69 kcal/mol), C4 (-33.31 kcal/mol), C5 (-28.59 kcal/mol), C8 (-34.68 kcal/mol), and C9 (-25.88 kcal/mol), have higher binding energies with PIM-2 than the positive drug 3YR (-26.18 kcal/mol). The MD simulation results are consistent with the docking analysis: the complexes of these compounds with PIM-2 have lower and more stable RMSD values than the complex of the reported positive drug 3YR with PIM-2. They can form long-term stable interactions with the active site and the hinge region of PIM-2, which suggests these compounds are likely to have potent inhibitory effects on PIM-2. This study provides an efficient generative model for PIM-2 inhibitor research and discovers 5 potential novel PIM-2 inhibitors.
Affiliation(s)
- Tianli Qin
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- The Eye Hospital, School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China
- Yijian Wang
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Miaomiao Kong
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Hongliang Zhong
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, 325000, Zhejiang, China
- Tao Wu
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Zixuan Xi
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Zhenyong Qian
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, 325000, Zhejiang, China
- Ke Li
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, 325000, Zhejiang, China
- Yuepiao Cai
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, 325000, China
- Jianzhang Wu
- The Eye Hospital, School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, 325000, Zhejiang, China
- Wulan Li
- The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China

48
Jin T, Zhao Q, Schofield AB, Savoie BM. Deductive machine learning models for product identification. Chem Sci 2024; 15:11995-12005. [PMID: 39092129 PMCID: PMC11290435 DOI: 10.1039/d3sc04909d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Accepted: 06/09/2024] [Indexed: 08/04/2024] Open
Abstract
Deductive solution strategies are required in prediction scenarios that are underdetermined, when contradictory information is available, or more generally wherever one-to-many non-functional mappings occur. In contrast, most contemporary machine learning (ML) in the chemical sciences is inductive learning from example, with a fixed set of features. Chemical workflows are replete with situations requiring deduction, including many aspects of lab automation and spectral interpretation. Here, a general strategy is described for designing and training machine learning models capable of deduction that consists of combining individual inductive models into a larger deductive network. The training and testing of these models is demonstrated on the task of deducing reaction products from a mixture of spectral sources. The resulting models can distinguish between intended and unintended reaction outcomes and identify starting material based on a mixture of spectral sources. The models also perform well on tasks that they were not directly trained on, like performing structural inference using real rather than simulated spectral inputs, predicting minor products from named organic chemistry reactions, identifying reagents and isomers as plausible impurities, and handling missing or conflicting information. A new dataset of 1 124 043 simulated spectra that were generated to train these models is also distributed with this work. These findings demonstrate that deductive bottlenecks for chemical problems are not fundamentally insuperable for ML models.
Affiliation(s)
- Tianfan Jin
- Department of Chemical Engineering, Purdue University, West Lafayette, USA
- Qiyuan Zhao
- Department of Chemical Engineering, Purdue University, West Lafayette, USA
- Andrew B Schofield
- Department of Chemical Engineering, Purdue University, West Lafayette, USA
- Brett M Savoie
- Department of Chemical Engineering, Purdue University, West Lafayette, USA

49
Hu J, Wu P, Wang S, Wang B, Yang G. A Human Feedback Strategy for Photoresponsive Molecules in Drug Delivery: Utilizing GPT-2 and Time-Dependent Density Functional Theory Calculations. Pharmaceutics 2024; 16:1014. [PMID: 39204359 PMCID: PMC11359544 DOI: 10.3390/pharmaceutics16081014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 07/11/2024] [Accepted: 07/19/2024] [Indexed: 09/04/2024] Open
Abstract
Photoresponsive drug delivery stands as a pivotal frontier in smart drug administration, leveraging the non-invasive, stable, and finely tunable nature of light-triggered methodologies. The generative pre-trained transformer (GPT) has been employed to generate molecular structures. In our study, we harnessed GPT-2 on the QM7b dataset to refine a UV-GPT model with adapters, enabling the generation of molecules responsive to UV light excitation. Utilizing the Coulomb matrix as a molecular descriptor, we predicted the excitation wavelengths of these molecules. Furthermore, we validated the excited state properties through quantum chemical simulations. Based on the results of these calculations, we summarized some tips for chemical structures and integrated them into the alignment of large-scale language models within the reinforcement learning from human feedback (RLHF) framework. The synergy of these findings underscores the successful application of GPT technology in this critical domain.
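The abstract above names the Coulomb matrix as its molecular descriptor without giving details. As a minimal sketch of the commonly used definition (diagonal entries 0.5·Z_i^2.4, off-diagonal entries Z_i·Z_j / |R_i − R_j|, in atomic units), the following may help; the function name `coulomb_matrix`, the pure-Python implementation, and the H2 example geometry are illustrative assumptions, not taken from the paper, which does not state how its matrices were ordered or padded.

```python
import math


def coulomb_matrix(charges, coords):
    """Build a Coulomb matrix from nuclear charges and Cartesian coordinates.

    charges: list of nuclear charges Z_i
    coords:  list of (x, y, z) positions, assumed to be in bohr (atomic units)
    Returns a list-of-lists n x n matrix.
    """
    if len(charges) != len(coords):
        raise ValueError("charges and coords must have the same length")
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: 0.5 * Z_i^2.4 (a fit to free-atom energies)
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: nuclear Coulomb repulsion between atoms i and j
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Hypothetical example: H2 with an assumed bond length of 1.4 bohr
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
```

The matrix is symmetric by construction. In practice it is often sorted or zero-padded to a fixed size before being passed to a regression model; whether the authors did so is not stated in the abstract.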
Affiliation(s)
- Junjie Hu
- Faculty of Medicine, Imperial College London, London SW7 2AZ, UK
- Peng Wu
- School of Chemistry and Chemical Engineering, Ningxia University, Yinchuan 750014, China
- Shiyi Wang
- Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK
- Binju Wang
- College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Guang Yang
- Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK
- National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK
- Cardiovascular Research Centre, Royal Brompton Hospital, London SW3 6NP, UK
- School of Biomedical Engineering & Imaging Sciences, King's College London, London WC2R 2LS, UK

50
Bakkers MJG, Ritschel T, Tiemessen M, Dijkman J, Zuffianò AA, Yu X, van Overveld D, Le L, Voorzaat R, van Haaren MM, de Man M, Tamara S, van der Fits L, Zahn R, Juraszek J, Langedijk JPM. Efficacious human metapneumovirus vaccine based on AI-guided engineering of a closed prefusion trimer. Nat Commun 2024; 15:6270. [PMID: 39054318 PMCID: PMC11272930 DOI: 10.1038/s41467-024-50659-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Accepted: 07/12/2024] [Indexed: 07/27/2024] Open
Abstract
The prefusion conformation of human metapneumovirus fusion protein (hMPV Pre-F) is critical for eliciting the most potent neutralizing antibodies and is the preferred immunogen for an efficacious vaccine against hMPV respiratory infections. Here we show that an additional cleavage event in the F protein allows closure and correct folding of the trimer. We therefore engineered the F protein to undergo double cleavage, which enabled screening for Pre-F stabilizing substitutions at the natively folded protomer interfaces. To identify these substitutions, we developed an AI convolutional classifier that successfully predicts complex polar interactions often overlooked by physics-based methods and visual inspection. The combination of additional processing, stabilization of interface regions and stabilization of the membrane-proximal stem, resulted in a Pre-F protein vaccine candidate without the need for a heterologous trimerization domain that exhibited high expression yields and thermostability. Cryo-EM analysis shows the complete ectodomain structure, including the stem, and a specific interaction of the newly identified cleaved C-terminus with the adjacent protomer. Importantly, the protein induces high and cross-neutralizing antibody responses resulting in near complete protection against hMPV challenge in cotton rats, making the highly stable, double-cleaved hMPV Pre-F trimer an attractive vaccine candidate.
Affiliation(s)
- Mark J G Bakkers
- Janssen Vaccines & Prevention BV, Leiden, The Netherlands
- ForgeBio B.V., Amsterdam, The Netherlands
- Tina Ritschel
- Janssen Vaccines & Prevention BV, Leiden, The Netherlands
- J&J Innovative Medicine Technology, R&D, New Brunswick, NJ, USA
- Jacobus Dijkman
- Janssen Vaccines & Prevention BV, Leiden, The Netherlands
- Van 't Hoff Institute for Molecular Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Machine Learning Lab, Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
- Angelo A Zuffianò
- Janssen Vaccines & Prevention BV, Leiden, The Netherlands
- Promaton BV, Amsterdam, The Netherlands
- Xiaodi Yu
- Structural & Protein Science, Janssen Research and Development, Spring House, PA, 19044, USA
- Lam Le
- Janssen Vaccines & Prevention BV, Leiden, The Netherlands
- Martijn de Man
- Janssen Vaccines & Prevention BV, Leiden, The Netherlands
- Sem Tamara
- Janssen Vaccines & Prevention BV, Leiden, The Netherlands
- Roland Zahn
- Janssen Vaccines & Prevention BV, Leiden, The Netherlands
- Jarek Juraszek
- Janssen Vaccines & Prevention BV, Leiden, The Netherlands
- Johannes P M Langedijk
- Janssen Vaccines & Prevention BV, Leiden, The Netherlands
- ForgeBio B.V., Amsterdam, The Netherlands