1
Mahajan S, Li Y. Toward Molecular Simulation Guided Design of Next-Generation Membranes: Challenges and Opportunities. LANGMUIR : THE ACS JOURNAL OF SURFACES AND COLLOIDS 2025. [PMID: 40375598 DOI: 10.1021/acs.langmuir.4c05181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/18/2025]
Abstract
Membranes provide energy-efficient solutions for separating ions from water, ion-ion separation, neutral or charged molecules, and mixed gases. Understanding the fundamental mechanisms and design principles for these separation challenges has significant applications in the food and agriculture, energy, pharmaceutical, and electronics industries and environmental remediation. In situ experimental probes to explore Angstrom-nanometer length-scale and pico-nanosecond time-scale phenomena remain limited. Currently, molecular simulations such as density functional theory, ab initio molecular dynamics (MD), all-atom MD, and coarse-grained MD provide physics-based predictive models to study these phenomena. The status of molecular simulations to study transport mechanisms and state-of-the-art membrane separation is discussed. Furthermore, limitations and open challenges in molecular simulations are discussed. Finally, the importance of molecular simulations in generating data sets for machine learning and exploration of membrane design space is addressed.
Affiliation(s)
- Subhamoy Mahajan
  - Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Ying Li
  - Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
2
Mroz AM, Basford AR, Hastedt F, Jayasekera IS, Mosquera-Lois I, Sedgwick R, Ballester PJ, Bocarsly JD, Antonio Del Río Chanona E, Evans ML, Frost JM, Ganose AM, Greenaway RL, Kuok Mimi Hii K, Li Y, Misener R, Walsh A, Zhang D, Jelfs KE. Cross-disciplinary perspectives on the potential for artificial intelligence across chemistry. Chem Soc Rev 2025. [PMID: 40278836 PMCID: PMC12024683 DOI: 10.1039/d5cs00146c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2025] [Indexed: 04/26/2025]
Abstract
From accelerating simulations and exploring chemical space, to experimental planning and integrating automation within experimental labs, artificial intelligence (AI) is changing the landscape of chemistry. We are seeing a significant increase in the number of publications leveraging these powerful data-driven insights and models to accelerate all aspects of chemical research. Examples range from how we represent molecules and materials to computer algorithms for predictive and generative models, to the physical mechanisms by which we perform experiments in the lab for automation. Here, we present ten diverse perspectives on the impact of AI from authors with backgrounds in experimental chemistry, computational chemistry, computer science, and engineering, and across different areas of chemistry, including drug discovery, catalysis, chemical automation, chemical physics, and materials chemistry. The ten perspectives presented here cover a range of themes, including AI for computation, facilitating discovery, supporting experiments, and enabling technologies for transformation. We highlight and discuss imminent challenges and ways in which we are redefining problems to accelerate the impact of chemical research via AI.
Affiliation(s)
- Austin M Mroz
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK.
  - I-X Centre for AI in Science, Imperial College London, London W12 0BZ, UK
- Annabel R Basford
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK.
- Friedrich Hastedt
  - Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK
- Ruby Sedgwick
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
- Pedro J Ballester
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
- Joshua D Bocarsly
  - Department of Chemistry and Texas Center for Superconductivity, University of Houston, Houston, USA
- Matthew L Evans
  - UCLouvain, Institute of Condensed Matter and Nanosciences (IMCN), Chemin des Étoiles 8, Louvain-la-Neuve 1348, Belgium
  - Matgenix SRL, A6K Advanced Engineering Center, Charleroi, Belgium
  - Datalab Industries Ltd, King's Lynn, Norfolk, UK
- Jarvist M Frost
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK.
- Alex M Ganose
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK.
- Yingzhen Li
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
- Ruth Misener
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
- Aron Walsh
  - Department of Materials, Imperial College London, London SW7 2AZ, UK
- Dandan Zhang
  - I-X Centre for AI in Science, Imperial College London, London W12 0BZ, UK
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
- Kim E Jelfs
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK.
3
Song W, Sun H. Local reaction condition optimization via machine learning. J Mol Model 2025; 31:143. [PMID: 40266356 DOI: 10.1007/s00894-025-06365-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2025] [Accepted: 03/31/2025] [Indexed: 04/24/2025]
Abstract
CONTEXT: Reaction condition optimization addresses shared requirements across academia and industry, particularly in chemistry, pharmaceutical development, and fine chemical engineering. This review examines recent progress and persistent challenges in machine learning-guided optimization of localized reaction conditions, with an emphasis on three core aspects: dataset, condition representation, and optimization methods, as well as the main issues in each related stage. The review explores challenges such as dataset scarcity, data quality, and the "completeness trap" in the dataset preparation stage, summarizes the limitations of current molecular representation techniques in the condition representation stage, and discusses the search efficiency challenges of optimization methods in the optimization stage. METHODS: The review analyzes the molecular representation techniques and identifies them as the primary bottleneck in advancing localized reaction condition optimization. It further examines existing optimization methodologies. Among them, Bayesian optimization and active learning emerge as the most commonly applied approaches in this field, utilizing incremental learning mechanisms and human-in-the-loop strategies to minimize experimental data requirements while mitigating molecular representation limitations. The review concludes that advancements in molecular representation techniques are essential for developing more efficient optimization methods in the future.
Affiliation(s)
- Wenhuan Song
  - School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China.
- Honggang Sun
  - School of Mechanical, Electrical & Information Engineering, Shandong University, Weihai, 264209, China
4
Torren-Peraire P, Verhoeven J, Herman D, Ceulemans H, Tetko IV, Wegner JK. Improving route development using convergent retrosynthesis planning. J Cheminform 2025; 17:26. [PMID: 40016850 PMCID: PMC11869726 DOI: 10.1186/s13321-025-00953-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Accepted: 01/09/2025] [Indexed: 03/01/2025] Open
Abstract
Retrosynthesis consists of recursively breaking down a target molecule to produce a synthesis route composed of readily accessible building blocks. In recent years, computer-aided synthesis planning methods have allowed a greater exploration of potential synthesis routes, combining state-of-the-art machine-learning methods with chemical knowledge. However, these methods are generally developed to produce individual routes from a singular product to a set of proposed building blocks and are not designed to leverage potential shared paths between targets. These methods do not necessarily encompass real-world use cases in medicinal chemistry, where one seeks to synthesize sets of target compounds in a library mode, looking for maximal convergence into a shared retrosynthetic path going via advanced key intermediate compounds. Using a graph-based processing pipeline, we explore Johnson & Johnson Electronic Laboratory Notebooks (J&J ELN) and publicly available datasets to identify complex routes with multiple target molecules sharing common intermediates, producing convergent synthesis routes. We find that over 70% of all reactions are involved in convergent synthesis, covering over 80% of all projects in the case of J&J ELN data. Scientific contribution: We introduce a novel planning approach to develop convergent synthesis routes, which can search multiple products and intermediates simultaneously guided by state-of-the-art machine learning single-step retrosynthesis models, enhancing the overall efficiency and practical applicability of retrosynthetic planning. We evaluate the multi-step synthesis planning approach using the extracted convergent routes and observe that solvability is generally high across those routes, being able to identify a convergent route for over 80% of the test routes and showing an individual compound solvability of over 90%. We find that by using a convergent search approach, we can synthesize almost 30% more compounds simultaneously for J&J ELN as compared to using an individual search, while providing an increased use of common intermediates.
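As a rough illustration of what "multiple target molecules sharing common intermediates" means in practice, the following minimal sketch groups targets by the intermediates their routes pass through. It is not the authors' graph-based pipeline: the function name `shared_intermediates`, the dictionary layout, and the toy molecule labels are all assumptions made for this example.

```python
from collections import defaultdict


def shared_intermediates(routes):
    """Return intermediates used by two or more targets.

    `routes` maps a target label to the set of intermediates on its route
    (a hypothetical, simplified stand-in for a real route graph).
    """
    usage = defaultdict(set)
    for target, intermediates in routes.items():
        for molecule in intermediates:
            usage[molecule].add(target)
    # Keep only intermediates that at least two targets converge on.
    return {
        molecule: sorted(targets)
        for molecule, targets in usage.items()
        if len(targets) > 1
    }


# Hypothetical toy data: three targets, one intermediate common to all.
routes = {
    "target_A": {"int_1", "int_2"},
    "target_B": {"int_2", "int_3"},
    "target_C": {"int_2", "int_4"},
}
convergent = shared_intermediates(routes)
print(convergent)
```

In this toy case only `int_2` is reported, since it is the single intermediate that more than one target route passes through.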
Affiliation(s)
- Paula Torren-Peraire
  - In-Silico Discovery, Research & Development, Johnson & Johnson, Beerse, 2340, Belgium.
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - Deutsches Forschungszentrum Für Gesundheit Und Umwelt (GmbH), Neuherberg, 86764, Germany.
- Jonas Verhoeven
  - In-Silico Discovery, Research & Development, Johnson & Johnson, Beerse, 2340, Belgium
- Dorota Herman
  - In-Silico Discovery, Research & Development, Johnson & Johnson, Beerse, 2340, Belgium
- Hugo Ceulemans
  - In-Silico Discovery, Research & Development, Johnson & Johnson, Beerse, 2340, Belgium
- Igor V Tetko
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - Deutsches Forschungszentrum Für Gesundheit Und Umwelt (GmbH), Neuherberg, 86764, Germany
- Jörg K Wegner
  - In-Silico Discovery, Research & Development, Johnson & Johnson, Cambridge, 02142, US
5
Klahn P. How Should we Teach Medicinal Chemistry in Higher Education to Prepare Students for a Future Career as Medicinal Chemists and Drug Designers? - A Teacher's Perspective. ChemMedChem 2025; 20:e202400791. [PMID: 39564941 PMCID: PMC11733470 DOI: 10.1002/cmdc.202400791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Indexed: 11/21/2024]
Abstract
Over the past two decades, the multidisciplinary field of medicinal chemistry has undergone several conceptual and technology-driven paradigm changes with significant impact on the skill set medicinal chemists need to acquire during their education. Considering the need for academic medicinal chemistry teaching, this article aims to identify important skills, competences, and basic knowledge as general learning outcomes based on an analysis of the relevant stakeholders, and concludes with effective teaching strategies that prepare students for a future career as medicinal chemists and drug designers.
Affiliation(s)
- Philipp Klahn
  - Department of Chemistry and Molecular Biology, Division of Organic and Medicinal Chemistry, University of Gothenburg, Medicinaregatan 7B, Natrium, Göteborg 413 90, Sweden
6
Zhang X, Lin H, Zhang M, Zhou Y, Ma J. A data-driven group retrosynthesis planning model inspired by neurosymbolic programming. Nat Commun 2025; 16:192. [PMID: 39747027 PMCID: PMC11695995 DOI: 10.1038/s41467-024-55374-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 12/10/2024] [Indexed: 01/04/2025] Open
Abstract
Deep generative models have garnered significant attention for their efficiency in drug discovery, yet the synthesis of proposed molecules remains a challenge. Retrosynthetic planning, a part of computer-assisted synthesis planning, addresses this challenge by recursively decomposing molecules using symbolic rules and machine-trained scoring functions. However, current methods often treat each molecule independently, missing the opportunity to utilize shared synthesis patterns and repeat pathways, which may contribute from known synthesis routes to newly emerging, similar molecules, a notable challenge with AI-generated small molecules. Our investigation reveals reusable synthesis patterns that augment the reaction template library, resulting in progressively decreasing marginal inference time as the algorithm processes more molecules. Nevertheless, expanding the library enlarges the search space, necessitating investigation into methods for effective prediction of reactions in retrosynthesis search. Inspired by human learning, our algorithm, akin to neurosymbolic programming, builds upon commonly used multi-step concepts such as cascade and complementary reactions and can evolve from practical experiences, enhancing the prediction model for fundamental and compositional reaction templates. The evolutionary process involves wake, abstraction, and dreaming phases, alternately extending the reaction template library and refining models for more efficient retrosynthesis. Our algorithm outperforms existing methods, discovers chemistry patterns, and significantly reduces inference time in retrosynthetic planning for a group of similar molecules, showcasing its potential in validating results from generative models.
Affiliation(s)
- Xuefeng Zhang
  - Institute for Artificial Intelligence, Peking University, Beijing, China
- Haowei Lin
  - Institute for Artificial Intelligence, Peking University, Beijing, China
- Muhan Zhang
  - Institute for Artificial Intelligence, Peking University, Beijing, China
- Yuan Zhou
  - Yau Mathematical Sciences Center, Tsinghua University, Beijing, China.
  - Beijing Institute of Mathematical Sciences and Applications, Beijing, China.
  - Department of Mathematical Sciences, Tsinghua University, Beijing, China.
- Jianzhu Ma
  - Department of Electronic Engineering, Tsinghua University, Beijing, China.
  - Institute for AI Industry Research, Tsinghua University, Beijing, China.
7
Madushanka A, Laird E, Clark C, Kraka E. SmartCADD: AI-QM Empowered Drug Discovery Platform with Explainability. J Chem Inf Model 2024; 64:6799-6813. [PMID: 39177478 DOI: 10.1021/acs.jcim.4c00720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2024]
Abstract
Artificial intelligence (AI) has emerged as a pivotal force in enhancing productivity across various sectors, with its impact being profoundly felt within the pharmaceutical and biotechnology domains. Despite AI's rapid adoption, its integration into scientific research faces resistance due to myriad challenges: the opaqueness of AI models, the intricate nature of their implementation, and the issue of data scarcity. In response to these impediments, we introduce SmartCADD, an innovative, open-source virtual screening platform that combines deep learning, computer-aided drug design (CADD), and quantum mechanics methodologies within a user-friendly Python framework. SmartCADD is engineered to streamline the construction of comprehensive virtual screening workflows that incorporate a variety of formerly independent techniques, ranging from ADMET property predictions, de novo 2D and 3D pharmacophore modeling, and molecular docking to the integration of explainable AI mechanisms. This manuscript highlights the foundational principles, key functionalities, and the unique integrative approach of SmartCADD. Furthermore, we demonstrate its efficacy through a case study focused on the identification of promising lead compounds for HIV inhibition. By democratizing access to advanced AI and quantum mechanics tools, SmartCADD stands as a catalyst for progress in pharmaceutical research and development, heralding a new era of innovation and efficiency.
Affiliation(s)
- Ayesh Madushanka
  - Department of Chemistry, Southern Methodist University, Dallas, Texas 75205, United States
- Eli Laird
  - Department of Computer Science, Southern Methodist University, Dallas, Texas 75205, United States
- Corey Clark
  - Department of Computer Science, Southern Methodist University, Dallas, Texas 75205, United States
- Elfi Kraka
  - Department of Chemistry, Southern Methodist University, Dallas, Texas 75205, United States
8
Jang H, Seo S, Park S, Kim BJ, Choi GW, Choi J, Park C. De novo drug design through gradient-based regularized search in information-theoretically controlled latent space. J Comput Aided Mol Des 2024; 38:32. [PMID: 39190191 PMCID: PMC11349835 DOI: 10.1007/s10822-024-00571-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 07/31/2024] [Indexed: 08/28/2024]
Abstract
Over the last decade, automatic chemical design frameworks for discovering molecules with drug-like properties have significantly progressed. Among them, the variational autoencoder (VAE) is a cutting-edge approach that models the tractable latent space of the molecular space. In particular, the usage of a VAE along with a property estimator has attracted considerable interest because it enables gradient-based optimization of a given molecule. However, although successful results have been achieved experimentally, the theoretical background and prerequisites for the correct operation of this method have not yet been clarified. In view of the above, we theoretically analyze and rigorously reconstruct the entire framework. From the perspective of parameterized distributions and information theory, we first describe how the previous model overcomes the limitations of the beta VAE in discovering molecules with the desired properties. Furthermore, we describe the prerequisites for training the above model. Next, from the log-likelihood perspective of each term, we reformulate the objectives for exploring latent space to generate drug-like molecules. The distributional constraints defined in this study steer the search away from invalid molecules. We demonstrate that our model can discover a novel chemical compound targeting BCL-2 family proteins in a de novo approach. Through the theoretical analysis and practical implementation, the importance of the aforementioned prerequisites and constraints for operating the model was verified.
Affiliation(s)
- Hyosoon Jang
  - Graduate School of AI, POSTECH, 77 Cheongam-Ro, Pohang, 37673, Gyeongbuk, Republic of Korea
- Sangmin Seo
  - Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, Seoul, 03722, Republic of Korea
- Sanghyun Park
  - Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, Seoul, 03722, Republic of Korea
- Byung Ju Kim
  - UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
- Geon-Woo Choi
  - Department of Medical Bigdata Convergence, Kangwon National University, 1 Kangwondaehak-gil, Chuncheon, 24341, Gangwon-do, Republic of Korea
- Jonghwan Choi
  - College of Information Science, Hallym University, 1 Hallymdaehak-gil, Chuncheon, 24252, Gangwon-do, Republic of Korea.
- Chihyun Park
  - Department of Medical Bigdata Convergence, Kangwon National University, 1 Kangwondaehak-gil, Chuncheon, 24341, Gangwon-do, Republic of Korea.
  - Department of Computer Science and Engineering, Kangwon National University, 1 Kangwondaehak-gil, Chuncheon, 24341, Gangwon-do, Republic of Korea.
9
Zhang X, Liu J, Yang F, Zhang Q, Yang Z, Shah HA. Planning biosynthetic pathways of target molecules based on metabolic reaction prediction and AND-OR tree search. Comput Biol Chem 2024; 111:108106. [PMID: 38833912 DOI: 10.1016/j.compbiolchem.2024.108106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 05/06/2024] [Accepted: 05/13/2024] [Indexed: 06/06/2024]
Abstract
The bioretrosynthesis problem is to predict synthetic routes from substrates for given natural products (NPs). However, the huge number of metabolic reactions leads to a combinatorial explosion of the search space, which makes the search highly time-consuming and costly. Here, we propose a framework called BioRetro to predict bioretrosynthesis pathways using a one-step bioretrosynthesis network, termed HybridMLP, combined with AND-OR tree heuristic search. The HybridMLP predicts precursors that will produce the target NPs, while the AND-OR tree generates the iterative multi-step biosynthetic pathways. The one-step bioretrosynthesis prediction experiments are conducted on the MetaNetX dataset using HybridMLP, which achieves top-1, top-5, and top-10 accuracies of 46.5%, 74.6%, and 81.6%, respectively. This performance demonstrates the effectiveness of HybridMLP in one-step bioretrosynthesis. In addition, evaluation on two benchmark datasets reveals that BioRetro can significantly improve the speed and success rate in predicting biosynthesis pathways. BioRetro is further shown to find synthetic pathways for compounds such as ginsenoside F1 that use the same substrates as reported but different enzymes, which may be novel candidate enzymes with better catalytic performance.
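For readers unfamiliar with the top-1, top-5, and top-10 figures quoted in this abstract, the sketch below shows one common way such top-k accuracy is computed: a prediction counts as a hit if the true precursor appears among the k highest-ranked candidates. This is a generic illustration, not the BioRetro evaluation code; the function name `top_k_accuracy` and the toy labels are assumptions.

```python
def top_k_accuracy(ranked_predictions, ground_truth, k):
    """Fraction of examples whose true answer is among the first k candidates.

    `ranked_predictions` is a list of candidate lists, best candidate first;
    `ground_truth` holds the reference answer for each example.
    """
    hits = sum(
        1
        for candidates, truth in zip(ranked_predictions, ground_truth)
        if truth in candidates[:k]
    )
    return hits / len(ground_truth)


# Hypothetical toy data: four targets, three ranked candidates each.
ranked_predictions = [
    ["a", "b", "c"],
    ["x", "y", "z"],
    ["p", "q", "r"],
    ["m", "n", "o"],
]
ground_truth = ["a", "y", "r", "k"]

top1 = top_k_accuracy(ranked_predictions, ground_truth, k=1)
top2 = top_k_accuracy(ranked_predictions, ground_truth, k=2)
top3 = top_k_accuracy(ranked_predictions, ground_truth, k=3)
print(top1, top2, top3)
```

Because a larger k can only add hits, top-k accuracy is non-decreasing in k, which matches the ordering of the three reported values.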
Affiliation(s)
- Xiaolei Zhang
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China
- Juan Liu
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China.
- Feng Yang
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China
- Qiang Zhang
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China
- Zhihui Yang
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China
- Hayat Ali Shah
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China
10
Han Y, Xu X, Hsieh CY, Ding K, Xu H, Xu R, Hou T, Zhang Q, Chen H. Retrosynthesis prediction with an iterative string editing model. Nat Commun 2024; 15:6404. [PMID: 39080274 PMCID: PMC11289138 DOI: 10.1038/s41467-024-50617-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 07/09/2024] [Indexed: 08/02/2024] Open
Abstract
Retrosynthesis is a crucial task in drug discovery and organic synthesis, where artificial intelligence (AI) is increasingly employed to expedite the process. However, existing approaches employ token-by-token decoding methods to translate target molecule strings into corresponding precursors, exhibiting unsatisfactory performance and limited diversity. As chemical reactions typically induce local molecular changes, reactants and products often overlap significantly. Inspired by this fact, we propose reframing single-step retrosynthesis prediction as a molecular string editing task, iteratively refining target molecule strings to generate precursor compounds. Our proposed approach involves a fragment-based generative editing model that uses explicit sequence editing operations. Additionally, we design an inference module with reposition sampling and sequence augmentation to enhance both prediction accuracy and diversity. Extensive experiments demonstrate that our model generates high-quality and diverse results, achieving superior performance with a promising top-1 accuracy of 60.8% on the standard benchmark dataset USPTO-50K.
Affiliation(s)
- Yuqiang Han
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China
- Xiaoyang Xu
  - Polytechnic Institute, Zhejiang University, Hangzhou, 310015, China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310018, China
- Keyan Ding
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China
- Hongxia Xu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310018, China
  - Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 311121, China
- Renjun Xu
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310018, China.
- Qiang Zhang
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China.
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China.
- Huajun Chen
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China.
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, 311200, China.
  - Zhejiang-University-Ant-Group Joint Center for Knowledge Graphs, Hangzhou, 310000, China.
  - Hangzhou Institute of Medicine Chinese Academy of Science, Hangzhou, 310023, China.
11
Li J, Lin K, Pei J, Lai L. Challenging Complexity with Simplicity: Rethinking the Role of Single-Step Models in Computer-Aided Synthesis Planning. J Chem Inf Model 2024; 64:5470-5479. [PMID: 38940765 DOI: 10.1021/acs.jcim.4c00432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
Computer-assisted synthesis planning has become increasingly important in drug discovery. While deep-learning models have shown remarkable progress in achieving high accuracies for single-step retrosynthetic predictions, their performances in retrosynthetic route planning need to be checked. This study compares the intricate single-step models with a straightforward template enumeration approach for retrosynthetic route planning on a real-world drug molecule data set. Despite the superior single-step accuracy of advanced models, the template enumeration method with a heuristic-based retrosynthesis knowledge score was found to surpass them in efficiency in searching the reaction space, achieving a higher or comparable solve rate within the same time frame. This counterintuitive result underscores the importance of efficiency and retrosynthesis knowledge in retrosynthesis route planning and suggests that future research should incorporate a simple template enumeration as a benchmark. It also suggests that this simple yet effective strategy should be considered alongside more complex models to better cater to the practical needs of computer-assisted synthesis planning in drug discovery.
Affiliation(s)
- Junren Li
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Kangjie Lin
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Jianfeng Pei
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Luhua Lai
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
12
Saigiridharan L, Hassen AK, Lai H, Torren-Peraire P, Engkvist O, Genheden S. AiZynthFinder 4.0: developments based on learnings from 3 years of industrial application. J Cheminform 2024; 16:57. [PMID: 38778382 PMCID: PMC11112899 DOI: 10.1186/s13321-024-00860-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 05/15/2024] [Indexed: 05/25/2024] Open
Abstract
We present an updated overview of the AiZynthFinder package for retrosynthesis planning. Since the first version was released in 2020, we have added a substantial number of new features based on user feedback. Feature enhancements include policies for filter reactions, support for any one-step retrosynthesis model, a scoring framework and several additional search algorithms. To exemplify the typical use-cases of the software and highlight some learnings, we perform a large-scale analysis on several hundred thousand target molecules from diverse sources. This analysis looks at, for instance, route shape, stock usage and exploitation of reaction space, and points out strengths and weaknesses of our retrosynthesis approach. The software is released as open-source for educational purposes as well as to provide a reference implementation of the core algorithms for synthesis prediction. We hope that releasing the software as open-source will further facilitate innovation in developing novel methods for synthetic route prediction. AiZynthFinder is a fast, robust and extensible open-source software and can be downloaded from https://github.com/MolecularAI/aizynthfinder.
Affiliation(s)
- Alan Kai Hassen
  - Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
- Helen Lai
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Paula Torren-Peraire
  - Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Zentrum München, Neuherberg, Germany
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Samuel Genheden
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
13
|
Ju W, Fang Z, Gu Y, Liu Z, Long Q, Qiao Z, Qin Y, Shen J, Sun F, Xiao Z, Yang J, Yuan J, Zhao Y, Wang Y, Luo X, Zhang M. A Comprehensive Survey on Deep Graph Representation Learning. Neural Netw 2024; 173:106207. [PMID: 38442651 DOI: 10.1016/j.neunet.2024.106207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 01/23/2024] [Accepted: 02/21/2024] [Indexed: 03/07/2024]
Abstract
Graph representation learning aims to effectively encode high-dimensional sparse graph-structured data into low-dimensional dense vectors, which is a fundamental task that has been widely studied in a range of fields, including machine learning and data mining. Classic graph embedding methods follow the basic idea that the embedding vectors of interconnected nodes in the graph can still maintain a relatively close distance, thereby preserving the structural information between the nodes in the graph. However, this approach is sub-optimal because (i) traditional methods have limited model capacity, which limits learning performance; (ii) existing techniques typically rely on unsupervised learning strategies and fail to couple with the latest learning paradigms; and (iii) representation learning and downstream tasks depend on each other and should be jointly enhanced. With the remarkable success of deep learning, deep graph representation learning has shown great potential and advantages over shallow (traditional) methods, and a large number of deep graph representation learning techniques, especially graph neural networks, have been proposed in the past decade. In this survey, we comprehensively review current deep graph representation learning algorithms by proposing a new taxonomy of the existing state-of-the-art literature. Specifically, we systematically summarize the essential components of graph representation learning and categorize existing approaches by graph neural network architecture and by the most recent advanced learning paradigms. Moreover, this survey also presents practical and promising applications of deep graph representation learning. Last but not least, we state new perspectives and suggest challenging directions that deserve further investigation.
Affiliation(s)
- Wei Ju
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Zheng Fang
- School of Intelligence Science and Technology, Peking University, Beijing, 100871, China
- Yiyang Gu
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Zequn Liu
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Qingqing Long
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100086, China
- Ziyue Qiao
- Artificial Intelligence Thrust, The Hong Kong University of Science and Technology, Guangzhou, 511453, China
- Yifang Qin
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Jianhao Shen
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Fang Sun
- Department of Computer Science, University of California, Los Angeles, 90095, USA
- Zhiping Xiao
- Department of Computer Science, University of California, Los Angeles, 90095, USA
- Junwei Yang
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Jingyang Yuan
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Yusheng Zhao
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
- Yifan Wang
- School of Information Technology & Management, University of International Business and Economics, Beijing, 100029, China
- Xiao Luo
- Department of Computer Science, University of California, Los Angeles, 90095, USA
- Ming Zhang
- School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing, 100871, China
|
14
|
Siebenmorgen T, Menezes F, Benassou S, Merdivan E, Didi K, Mourão ASD, Kitel R, Liò P, Kesselheim S, Piraud M, Theis FJ, Sattler M, Popowicz GM. MISATO: machine learning dataset of protein-ligand complexes for structure-based drug discovery. Nature Computational Science 2024; 4:367-378. [PMID: 38730184 PMCID: PMC11136668 DOI: 10.1038/s43588-024-00627-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 04/11/2024] [Indexed: 05/12/2024]
Abstract
Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.
Affiliation(s)
- Till Siebenmorgen
- Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
- TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
- Filipe Menezes
- Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
- TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
- Sabrina Benassou
- Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany
- Kieran Didi
- Computer Laboratory, Cambridge University, Cambridge, UK
- André Santos Dias Mourão
- Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
- TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
- Radosław Kitel
- Faculty of Chemistry, Jagiellonian University, Krakow, Poland
- Pietro Liò
- Computer Laboratory, Cambridge University, Cambridge, UK
- Stefan Kesselheim
- Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany
- Marie Piraud
- Helmholtz AI, Helmholtz Munich, Neuherberg, Germany
- Fabian J Theis
- Helmholtz AI, Helmholtz Munich, Neuherberg, Germany
- Computational Health Center, Institute of Computational Biology, Helmholtz Munich, Neuherberg, Germany
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Michael Sattler
- Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
- TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
- Grzegorz M Popowicz
- Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
- TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
|
15
|
Harada S, Takenaka H, Ito T, Kanda H, Nemoto T. Valence-isomer selective cycloaddition reaction of cycloheptatrienes-norcaradienes. Nat Commun 2024; 15:2309. [PMID: 38485991 PMCID: PMC10940685 DOI: 10.1038/s41467-024-46523-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 02/29/2024] [Indexed: 03/18/2024] Open
Abstract
The rapid and precise creation of complex molecules while controlling multiple selectivities is the principal objective in synthetic chemistry. Combining data science and organic synthesis to achieve this goal is an emerging trend, but few examples of successful reaction designs are reported. We develop an artificial neural network regression model using bond orbital data to predict chemical reactivities. Actual experimental verification confirms cycloheptatriene-selective [6 + 2]-cycloaddition utilizing nitroso compounds and norcaradiene-selective [4 + 2]-cycloaddition reactions employing benzynes. Additionally, a one-pot asymmetric synthesis is achieved by telescoping the enantioselective dearomatization of non-activated benzenes and cycloadditions. Computational studies provide a rational explanation for the seemingly anomalous occurrence of thermally prohibited suprafacial [6 + 2]-cycloaddition without photoirradiation.
Affiliation(s)
- Shingo Harada
- Graduate School of Pharmaceutical Sciences, Chiba University, Chiba, 260-8675, Japan
- Hiroki Takenaka
- Graduate School of Pharmaceutical Sciences, Chiba University, Chiba, 260-8675, Japan
- Tsubasa Ito
- Graduate School of Pharmaceutical Sciences, Chiba University, Chiba, 260-8675, Japan
- Haruki Kanda
- Graduate School of Pharmaceutical Sciences, Chiba University, Chiba, 260-8675, Japan
- Tetsuhiro Nemoto
- Graduate School of Pharmaceutical Sciences, Chiba University, Chiba, 260-8675, Japan
|
16
|
Kanda H, Okabe A, Harada S, Nemoto T. Systematic Studies of Functional Group Tolerance and Chemoselectivity in Carbene-Mediated Intramolecular Cyclopropanation and Intermolecular C-H Functionalization. Chem Pharm Bull (Tokyo) 2024; 72:313-318. [PMID: 38494725 DOI: 10.1248/cpb.c24-00022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Generating reliable data on functional group compatibility and chemoselectivity is essential for evaluating the practicality of chemical reactions and predicting retrosynthetic routes. In this context, we performed systematic studies using a functional group evaluation kit including 26 kinds of additives to assess the functional group tolerance of carbene-mediated reactions. Our findings revealed that some intermolecular heteroatom-hydrogen insertion reactions proceed faster than intramolecular cyclopropanation reactions. Lewis basic functionalities inhibited rhodium-catalyzed C-H functionalization of indoles. While performing these studies, we observed an unexpected C-H functionalization of a 1-naphthol variant used as an additive.
Affiliation(s)
- Haruki Kanda
- Graduate School of Pharmaceutical Sciences, Chiba University
- Ayaka Okabe
- Graduate School of Pharmaceutical Sciences, Chiba University
- Shingo Harada
- Graduate School of Pharmaceutical Sciences, Chiba University
|
17
|
Verma S, Paliwal S. Recent Developments and Applications of Biocatalytic and Chemoenzymatic Synthesis for the Generation of Diverse Classes of Drugs. Curr Pharm Biotechnol 2024; 25:448-467. [PMID: 37885105 DOI: 10.2174/0113892010238984231019085154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 08/26/2023] [Accepted: 09/19/2023] [Indexed: 10/28/2023]
Abstract
Biocatalytic and chemoenzymatic biosynthesis are powerful methods of organic chemistry that use enzymes to execute selective reactions and allow the efficient production of organic compounds. The advantages of these approaches include high selectivity, mild reaction conditions, and the ability to work with complex substrates. The use of chemoenzymatic techniques for the synthesis of complicated compounds has lately increased dramatically in organic chemistry. Under this paradigm, biocatalytic technologies and modern synthetic methods are used synergistically in a multi-step approach to a target molecule. Chemoenzymatic techniques are promising for simplifying access to essential bioactive compounds because of the remarkable regio- and stereoselectivity of enzymatic transformations and the reaction diversity of modern organic chemistry. Enzyme kits may include ready-to-use, reproducible biocatalysts. Their use opens up new avenues for the synthesis of active therapeutic compounds and aids in drug development by synthesizing active components to construct scaffolds in a targeted and preparative manner. This study summarizes current breakthroughs as well as notable instances of biocatalytic and chemoenzymatic synthesis. To assist organic chemists in the use of enzymes for synthetic applications, it also provides some basic guidelines for selecting the most appropriate enzyme for a targeted reaction while keeping aspects like cofactor requirement, solvent tolerance, use of whole cell or isolated enzymes, and commercial availability in mind.
Affiliation(s)
- Swati Verma
- Department of Pharmacy, ITS College of Pharmacy, Muradnagar, Ghaziabad, India
- Department of Pharmacy, Banasthali Vidyapith, Banasthali, 304022, Rajasthan, India
- Sarvesh Paliwal
- Department of Pharmacy, Banasthali Vidyapith, Banasthali, 304022, Rajasthan, India
|
18
|
Ryu G, Kim GB, Yu T, Lee SY. Deep learning for metabolic pathway design. Metab Eng 2023; 80:130-141. [PMID: 37734652 DOI: 10.1016/j.ymben.2023.09.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 09/17/2023] [Accepted: 09/19/2023] [Indexed: 09/23/2023]
Abstract
The establishment of a bio-based circular economy is imperative in tackling the climate crisis and advancing sustainable development. In this realm, the creation of microbial cell factories is central to generating a variety of chemicals and materials. The design of metabolic pathways is crucial in shaping these microbial cell factories, especially when it comes to producing chemicals with yet-to-be-discovered biosynthetic routes. To aid in navigating the complexities of chemical and metabolic domains, computer-supported tools for metabolic pathway design have emerged. In this paper, we evaluate how digital strategies can be employed for pathway prediction and enzyme discovery. Additionally, we touch upon the recent strides made in using deep learning techniques for metabolic pathway prediction. These computational tools and strategies streamline the design of metabolic pathways, facilitating the development of microbial cell factories. Leveraging the capabilities of deep learning in metabolic pathway design is profoundly promising, potentially hastening the advent of a bio-based circular economy.
Affiliation(s)
- Gahyeon Ryu
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea
- Gi Bae Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea
- Taeho Yu
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea
- Sang Yup Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea; BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141, Republic of Korea; Graduate School of Engineering Biology, KAIST, Daejeon, 34141, Republic of Korea
|
19
|
Hadfield TE, Scantlebury J, Deane CM. Exploring the ability of machine learning-based virtual screening models to identify the functional groups responsible for binding. J Cheminform 2023; 15:84. [PMID: 37726844 PMCID: PMC10509074 DOI: 10.1186/s13321-023-00755-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 08/25/2023] [Indexed: 09/21/2023] Open
Abstract
Many recently proposed structure-based virtual screening models appear to be able to accurately distinguish high affinity binders from non-binders. However, several recent studies have shown that they often do so by exploiting ligand-specific biases in the dataset, rather than identifying favourable intermolecular interactions in the input protein-ligand complex. In this work we propose a novel approach for assessing the extent to which machine learning-based virtual screening models are able to identify the functional groups responsible for binding. To sidestep the difficulty in establishing the ground truth importance of each atom of a large scale set of protein-ligand complexes, we propose a protocol for generating synthetic data. Each ligand in the dataset is surrounded by a randomly sampled point cloud of pharmacophores, and the label assigned to the synthetic protein-ligand complex is determined by a 3-dimensional deterministic binding rule. This allows us to precisely quantify the ground truth importance of each atom and compare it to the model generated attributions. Using our generated datasets, we demonstrate that a recently proposed deep learning-based virtual screening model, PointVS, identified the most important functional groups with 39% more efficiency than a fingerprint-based random forest, suggesting that it would generalise more effectively to new examples. In addition, we found that ligand-specific biases, such as those present in widely used virtual screening datasets, substantially impaired the ability of all ML models to identify the most important functional groups. We have made our synthetic data generation framework available to facilitate the benchmarking of new virtual screening models. Code is available at https://github.com/tomhadfield95/synthVS .
Affiliation(s)
- Thomas E Hadfield
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
- Jack Scantlebury
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
- Charlotte M Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
|
20
|
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
|
21
|
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022] Open
Abstract
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is more far-reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Connor W Coley
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
|