1
Huo Z, Xie X, Tong R. Machine Learning for Developing Sustainable Polymers. Chemistry 2025:e202500718. [PMID: 40266984 DOI: 10.1002/chem.202500718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2025] [Revised: 04/20/2025] [Accepted: 04/22/2025] [Indexed: 04/25/2025]
Abstract
Sustainable polymers from renewable resources have been gaining importance due to their recyclability and reduced environmental impact. However, their development through conventional trial-and-error methods remains inefficient and resource-intensive. Machine learning (ML) has emerged as a powerful tool in polymer science, enabling rapid prediction and discovery of new chemicals and materials. In this review, we examine emerging trends in ML applications for sustainable polymer development, focusing on catalyst discovery, property optimization, and new polymer design. We analyze unique challenges in applying ML to sustainable polymers and evaluate proposed solutions, providing insights for future development in this rapidly evolving field.
Affiliation(s)
- Ziyu Huo
  - Department of Chemical Engineering, Virginia Polytechnic Institute and State University, 635 Prices Fork Road, Blacksburg, Virginia, 24061, USA
- Xiaoyu Xie
  - Department of Chemical Engineering, Virginia Polytechnic Institute and State University, 635 Prices Fork Road, Blacksburg, Virginia, 24061, USA
- Rong Tong
  - Department of Chemical Engineering, Virginia Polytechnic Institute and State University, 635 Prices Fork Road, Blacksburg, Virginia, 24061, USA
2
Chen J, Huang X, Hua C, He Y, Schwaller P. A multi-modal transformer for predicting global minimum adsorption energy. Nat Commun 2025; 16:3232. [PMID: 40185724 PMCID: PMC11971357 DOI: 10.1038/s41467-025-58499-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2024] [Accepted: 03/17/2025] [Indexed: 04/07/2025] Open
Abstract
The fast assessment of the global minimum adsorption energy (GMAE) between catalyst surfaces and adsorbates is crucial for large-scale catalyst screening. However, multiple adsorption sites and numerous possible adsorption configurations for each surface/adsorbate combination make it prohibitively expensive to calculate the GMAE through density functional theory (DFT). Thus, we designed a multi-modal transformer called AdsMT to rapidly predict the GMAE based on surface graphs and adsorbate feature vectors without site-binding information. The AdsMT model effectively captures the intricate relationships between adsorbates and surface atoms through the cross-attention mechanism, hence avoiding the enumeration of adsorption configurations. Three diverse benchmark datasets were introduced, providing a foundation for further research on the challenging GMAE prediction task. Our AdsMT framework demonstrates excellent performance by adopting the tailored graph encoder and transfer learning, achieving mean absolute errors of 0.09, 0.14, and 0.39 eV on the three benchmarks, respectively. Beyond GMAE prediction, AdsMT's cross-attention scores offer an interpretable way to identify the most energetically favorable adsorption sites. Additionally, uncertainty quantification was integrated into our models to enhance the trustworthiness of the predictions.
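The cross-attention idea mentioned in the abstract can be illustrated with a minimal, generic sketch. This is not the AdsMT implementation: it is plain single-head scaled dot-product attention in pure Python, assuming a hypothetical adsorbate feature vector as the query and hypothetical per-atom surface embeddings as keys and values. The function name `cross_attention` and all numbers are illustrative assumptions.

```python
import math


def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    query  : list[float], e.g. a (hypothetical) adsorbate feature vector
    keys   : list[list[float]], one (hypothetical) embedding per surface atom
    values : list[list[float]], one value vector per surface atom
    Returns (weights, context): attention weights over atoms and the
    weighted sum of the value vectors.
    """
    d = len(query)
    # Scaled dot product between the query and each key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Numerically stable softmax over surface atoms.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context


# Toy inputs (illustrative only): three surface atoms, two-dimensional embeddings.
adsorbate = [1.0, 0.0]
surface_atoms = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = cross_attention(adsorbate, surface_atoms, surface_atoms)
most_attended_atom = weights.index(max(weights))
```

In this toy setting the atom whose embedding aligns best with the query (index 0) receives the largest weight, which is the kind of signal an attention-based interpretation of adsorption sites would inspect.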
Affiliation(s)
- Junwu Chen
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Xu Huang
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Department of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Cheng Hua
  - Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai, China
- Yulian He
  - Department of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, China
  - University of Michigan-Shanghai Jiao Tong University Joint Institute (UM-SJTU JI), Shanghai, China
- Philippe Schwaller
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
3
Phan TL, Weinbauer K, Laffitte MEG, Pan Y, Merkle D, Andersen JL, Fagerberg R, Flamm C, Stadler PF. SynTemp: Efficient Extraction of Graph-Based Reaction Rules from Large-Scale Reaction Databases. J Chem Inf Model 2025; 65:2882-2896. [PMID: 40019281 PMCID: PMC11938280 DOI: 10.1021/acs.jcim.4c01795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Revised: 02/12/2025] [Accepted: 02/17/2025] [Indexed: 03/01/2025]
Abstract
Reaction templates are graphs that represent the reaction center as well as the surrounding context in order to specify salient features of chemical reactions. They are subgraphs of imaginary transition states, which are equivalent to double pushout graph rewriting rules and thus can be applied directly to predict reaction outcomes at the structural formula level. We introduce here SynTemp, a framework designed to extract and hierarchically cluster reaction templates from large-scale reaction data repositories. Rule inference is implemented as a robust graph-theoretic approach, which first computes an atom-atom mapping (AAM) as a consensus over partial predictions from multiple state-of-the-art tools, then augments the raw AAM with mechanistically relevant hydrogen atoms, and finally extracts the reaction center extended by relevant context. SynTemp achieves an exceptional accuracy of 99.5% and a success rate of 71.23% in obtaining AAMs on the chemical reaction dataset. Hierarchical clustering of the extended reaction centers based on topological features results in a library of 311 transformation rules explaining 86% of the reaction dataset.
Affiliation(s)
- Tieu-Long Phan
  - Bioinformatics Group, Department of Computer Science & Interdisciplinary Center for Bioinformatics & School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, D-04107 Leipzig, Germany
  - Department of Mathematics and Computer Science, University of Southern Denmark, DK-5230 Odense M, Denmark
- Klaus Weinbauer
  - Bioinformatics Group, Department of Computer Science & Interdisciplinary Center for Bioinformatics & School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, D-04107 Leipzig, Germany
  - Machine Learning Research Unit, TU Wien Informatics, A-1040 Wien, Austria
- Marcos E. González Laffitte
  - Bioinformatics Group, Department of Computer Science & Interdisciplinary Center for Bioinformatics & School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, D-04107 Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Leipzig University, D-04103 Leipzig, Germany
- Yingjie Pan
  - Department of Mathematics and Computer Science, University of Southern Denmark, DK-5230 Odense M, Denmark
  - Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Daniel Merkle
  - Department of Mathematics and Computer Science, University of Southern Denmark, DK-5230 Odense M, Denmark
  - Faculty of Technology, Bielefeld University, Postfach 10 01 31, D-33501 Bielefeld, Germany
- Jakob L. Andersen
  - Department of Mathematics and Computer Science, University of Southern Denmark, DK-5230 Odense M, Denmark
- Rolf Fagerberg
  - Department of Mathematics and Computer Science, University of Southern Denmark, DK-5230 Odense M, Denmark
- Christoph Flamm
  - Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Peter F. Stadler
  - Bioinformatics Group, Department of Computer Science & Interdisciplinary Center for Bioinformatics & School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, D-04107 Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
  - Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá CO-111321, Colombia
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, DK-1870 Frederiksberg, Denmark
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, New Mexico 87501, United States
4
Ruiz-Moreno AJ, Del Castillo-Izquierdo Á, Tamargo-Rubio I, Fu J. MicrobeRX: a tool for enzymatic-reaction-based metabolite prediction in the gut microbiome. Microbiome 2025; 13:78. [PMID: 40108657 PMCID: PMC11921629 DOI: 10.1186/s40168-025-02070-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2024] [Accepted: 02/23/2025] [Indexed: 03/22/2025]
Abstract
BACKGROUND: The gut microbiome functions as a metabolic organ, producing numerous enzymes that influence host health; however, their substrates and metabolites remain largely unknown. RESULTS: We present MicrobeRX, an enzyme-based metabolite prediction tool that employs 5487 human reactions and 4030 unique microbial reactions from 6286 genome-scale models, as well as 3650 drug metabolic reactions from the DrugBank database (v.5.1.12). MicrobeRX includes additional analysis modules for metabolite visualization and enzymatic and taxonomic analyses. When we applied MicrobeRX to 1083 orally administered drugs that have been approved in at least one jurisdiction at some point in time (DrugBank), it predicted metabolites with physicochemical properties and structures similar to metabolites found in biosamples (from MiMeDB). It also outperformed another existing metabolite prediction tool (BioTransformer 3.0) in terms of predictive potential, molecular diversity, reduction of redundant predictions, and enzyme annotation. CONCLUSIONS: Our analysis revealed both unique and overlapping metabolic capabilities in human and microbial metabolism and chemo- and taxa-specific microbial biotransformations. MicrobeRX bridges the genomic and chemical spaces of the gut microbiome, making it a valuable tool for unlocking the chemical potential of the gut microbiome in human health, the food and pharmaceutical industries, and environmental safety.
Affiliation(s)
- Angel J Ruiz-Moreno
  - Department of Genetics, University Medical Center Groningen, Groningen, 9713GZ, The Netherlands
  - Department of Pediatrics, University Medical Center Groningen, Groningen, 9713GZ, The Netherlands
- Ángela Del Castillo-Izquierdo
  - Department of Genetics, University Medical Center Groningen, Groningen, 9713GZ, The Netherlands
  - Department of Medical Microbiology, University Medical Center Groningen, Groningen, 9713GZ, The Netherlands
- Isabel Tamargo-Rubio
  - Department of Genetics, University Medical Center Groningen, Groningen, 9713GZ, The Netherlands
- Jingyuan Fu
  - Department of Genetics, University Medical Center Groningen, Groningen, 9713GZ, The Netherlands
  - Department of Pediatrics, University Medical Center Groningen, Groningen, 9713GZ, The Netherlands
5
Yin X, Wang X, Wu Z, Li Q, Kang Y, Deng Y, Luo P, Liu H, Shi G, Wang Z, Yao X, Hsieh CY, Hou T. Syn-MolOpt: a synthesis planning-driven molecular optimization method using data-derived functional reaction templates. J Cheminform 2025; 17:27. [PMID: 40025591 PMCID: PMC11874783 DOI: 10.1186/s13321-025-00975-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Accepted: 02/18/2025] [Indexed: 03/04/2025] Open
Abstract
Molecular optimization is a crucial step in drug development, involving structural modifications to improve the desired properties of drug candidates. Although many deep-learning-based molecular optimization algorithms have been proposed and may perform well on benchmarks, they usually do not pay sufficient attention to the synthesizability of molecules, resulting in optimized compounds that are difficult to synthesize. To address this issue, we first developed a general pipeline capable of constructing a functional reaction template library specific to any property for which a predictive model can be built. Based on these functional templates, we introduced Syn-MolOpt, a synthesis planning-oriented molecular optimization method. During optimization, functional reaction templates steer the process towards specific properties by effectively transforming relevant structural fragments. In four diverse tasks, including two toxicity-related (GSK3β-Mutagenicity and GSK3β-hERG) and two metabolism-related (GSK3β-CYP3A4 and GSK3β-CYP2C19) multi-property molecular optimizations, Syn-MolOpt outperformed three benchmark models (Modof, HierG2G, and SynNet), highlighting its efficacy and adaptability. Additionally, visualization of the synthetic routes for molecules optimized by Syn-MolOpt confirms the effectiveness of functional reaction templates in molecular optimization. Notably, Syn-MolOpt's robust performance in scenarios with limited scoring accuracy demonstrates its potential for real-world molecular optimization applications. By considering both optimization and synthesizability, Syn-MolOpt promises to be a valuable tool in molecular optimization. Scientific contribution: Syn-MolOpt takes into account both molecular optimization and synthesis, allowing for the design of property-specific functional reaction template libraries for the properties to be optimized, and providing reference synthesis routes for the optimized compounds while optimizing the targeted properties. Syn-MolOpt's universal workflow makes it suitable for various types of molecular optimization tasks.
Affiliation(s)
- Xiaodan Yin
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - Liangzhu Laboratory, Zhejiang University, Hangzhou, 311121, China
- Xiaorui Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Zhenxing Wu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Qin Li
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Yu Kang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, China
- Pei Luo
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- Huanxiang Liu
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Guqin Shi
  - Shanghai Qilu Pharmaceutical R&D Center, 576 Libing Road, Pudong New Area District, Shanghai, 310115, China
- Zheng Wang
  - Shanghai Qilu Pharmaceutical R&D Center, 576 Libing Road, Pudong New Area District, Shanghai, 310115, China
- Xiaojun Yao
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Chang-Yu Hsieh
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
6
Ali M, Mizuno Y, Akiyama S, Nagata Y, Komatsuzaki T. Enumeration Approach to Atom-to-Atom Mapping Accelerated by Ising Computing. J Chem Inf Model 2025; 65:1901-1910. [PMID: 39893651 PMCID: PMC11863377 DOI: 10.1021/acs.jcim.4c01871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Revised: 11/30/2024] [Accepted: 12/27/2024] [Indexed: 02/04/2025]
Abstract
Chemical reactions are regarded as transformations of chemical structures, and the question of which atoms in the reactants correspond to which atoms in the products has attracted chemists for a long time. Atom-to-atom mapping (AAM) is a procedure that establishes such correspondence(s) between the atoms of reactants and products in a chemical reaction. Currently, automatic AAM tools play a pivotal role in various chemoinformatics tasks. However, achieving accurate automatic AAM for complex or unknown reactions within a reasonable computation time remains a significant challenge due to the combinatorial nature of the problem and the difficulty in applying appropriate reaction rules. In this study, we propose a rule-free AAM algorithm, which enumerates all atom-to-atom correspondences that minimize the number of bond cleavages and formations during the reaction. To reduce the computational burden associated with the combinatorial optimization (i.e., minimizing bond changes), we introduce Ising computing, a computing paradigm that has gained significant attention for its efficiency in solving hard combinatorial optimization problems. We found that our Ising computing framework outperforms conventional combinatorial optimization algorithms in terms of computation times, making it feasible to solve the AAM problem without reaction rules in an acceptable time. Furthermore, our AAM algorithm successfully found the correct AAM solution for all problems in a benchmark data set. In contrast, conventional AAM algorithms based on chemical heuristics failed for several problems. Specifically, these algorithms either failed to find the optimal solution in terms of bond changes, or they identified only one optimal solution, which was incorrect when multiple optimal solutions exist. These results emphasize the importance of enumerating all optimal correspondences that minimize bond changes, which is effectively achieved by our Ising-computing framework.
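The objective described in this abstract, enumerating every atom-to-atom correspondence that minimizes the number of bond cleavages and formations, can be illustrated by brute force on a toy example. The sketch below is not the authors' Ising-computing method: it simply tries every element-preserving permutation, ignores bond orders and hydrogens, and uses hypothetical names (`bond_changes`, `enumerate_optimal_mappings`). It is only practical for a handful of atoms.

```python
from itertools import permutations


def bond_changes(mapping, reactant_bonds, product_bonds):
    """Count bonds broken plus bonds formed under a given atom mapping.

    mapping[i] is the product atom index assigned to reactant atom i.
    Bonds are unordered atom pairs; bond orders are ignored (assumption).
    """
    mapped = {frozenset((mapping[a], mapping[b])) for a, b in reactant_bonds}
    # Symmetric difference = bonds only in reactants (broken)
    # plus bonds only in products (formed).
    return len(mapped ^ product_bonds)


def enumerate_optimal_mappings(r_elems, r_bonds, p_elems, p_bonds):
    """Return (minimum bond changes, list of all mappings achieving it)."""
    product_bonds = {frozenset(b) for b in p_bonds}
    best_cost, best = None, []
    for perm in permutations(range(len(p_elems))):
        # Only element-preserving correspondences are valid.
        if any(r_elems[i] != p_elems[j] for i, j in enumerate(perm)):
            continue
        cost = bond_changes(perm, r_bonds, product_bonds)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [perm]
        elif cost == best_cost:
            best.append(perm)
    return best_cost, best


# Toy reaction (heavy atoms only, illustrative): C-O-C loses one C-O bond.
cost, mappings = enumerate_optimal_mappings(
    ["C", "O", "C"], [(0, 1), (1, 2)],
    ["C", "O", "C"], [(0, 1)],
)
print(cost, mappings)
```

For this toy input the two carbons are interchangeable, so two mappings tie at a single bond change. A method that returns only one optimum would have to pick between them arbitrarily, which is the failure mode the abstract attributes to heuristic mappers.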
Affiliation(s)
- Mohammad Ali
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Kita 13, Nishi 8, Kita-ku, Sapporo 060-8628, Hokkaido, Japan
  - Statistics Discipline, Khulna University, Sher-E-Bangla Road, Khulna 9208, Bangladesh
- Yuta Mizuno
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Kita 13, Nishi 8, Kita-ku, Sapporo 060-8628, Hokkaido, Japan
  - Research Institute for Electronic Science, Hokkaido University, Kita 20, Nishi 10, Kita-ku, Sapporo 001-0020, Hokkaido, Japan
  - Institute for Chemical Reaction Design and Discovery, Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo 001-0021, Hokkaido, Japan
- Seiji Akiyama
  - Institute for Chemical Reaction Design and Discovery, Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo 001-0021, Hokkaido, Japan
  - ERATO Maeda Artificial Intelligence for Chemical Reaction Design and Discovery Project, Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo 001-0021, Hokkaido, Japan
- Yuuya Nagata
  - Institute for Chemical Reaction Design and Discovery, Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo 001-0021, Hokkaido, Japan
  - ERATO Maeda Artificial Intelligence for Chemical Reaction Design and Discovery Project, Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo 001-0021, Hokkaido, Japan
- Tamiki Komatsuzaki
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Kita 13, Nishi 8, Kita-ku, Sapporo 060-8628, Hokkaido, Japan
  - Research Institute for Electronic Science, Hokkaido University, Kita 20, Nishi 10, Kita-ku, Sapporo 001-0020, Hokkaido, Japan
  - Institute for Chemical Reaction Design and Discovery, Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo 001-0021, Hokkaido, Japan
  - SANKEN, Osaka University, 8-1 Mihogaoka, Osaka 567-0047, Ibaraki, Japan
7
Hua PX, Huang Z, Xu ZY, Zhao Q, Ye CY, Wang YF, Xu YH, Fu Y, Ding H. An active representation learning method for reaction yield prediction with small-scale data. Commun Chem 2025; 8:42. [PMID: 39929993 PMCID: PMC11811124 DOI: 10.1038/s42004-025-01434-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Accepted: 01/27/2025] [Indexed: 02/13/2025] Open
Abstract
Reaction optimization plays an essential role in chemical research and industrial production. When exploring a large reaction system, a practical issue is how to reduce the heavy experimental load of finding high-yield conditions. In this paper, we present an efficient machine learning tool called "RS-Coreset", where the key idea is to take advantage of deep representation learning techniques to guide an interactive procedure for representing the full reaction space. Our proposed tool uses only small-scale data, about 2.5% to 5% of the instances, to predict the yields across the reaction space. We validate the performance on three public datasets and achieve state-of-the-art results. Moreover, we apply this tool to assist the realistic exploration of Lewis base-boryl radical-enabled dechlorinative coupling reactions in our lab. The tool helps us effectively predict the yields and even discover several feasible reaction combinations that were overlooked in previous articles.
Affiliation(s)
- Peng-Xiang Hua
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Zhen Huang
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Zhe-Yuan Xu
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Qiang Zhao
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Chen-Yang Ye
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Yi-Feng Wang
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Yun-He Xu
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Yao Fu
  - Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Hu Ding
  - School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, 230026, China
8
Nakamura S, Yasuo N, Sekijima M. Molecular optimization using a conditional transformer for reaction-aware compound exploration with reinforcement learning. Commun Chem 2025; 8:40. [PMID: 39922979 PMCID: PMC11807120 DOI: 10.1038/s42004-025-01437-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Accepted: 01/28/2025] [Indexed: 02/10/2025] Open
Abstract
Designing molecules with desirable properties is a critical endeavor in drug discovery. Because of recent advances in deep learning, molecular generative models have been developed. However, existing compound exploration models often disregard the important issue of ensuring the feasibility of organic synthesis. To address this issue, we propose TRACER, a framework that integrates molecular property optimization with synthetic pathway generation. The model can predict the product derived from a given reactant via a conditional transformer under the constraints of a reaction type. Molecular optimization results using an activity prediction model targeting DRD2, AKT1, and CXCR4 revealed that TRACER effectively generated compounds with high scores. The transformer model, which recognizes entire structures, captures the complexity of organic synthesis and enables navigation of a vast chemical space while considering real-world reactivity constraints.
Affiliation(s)
- Shogo Nakamura
  - Department of Life Science and Technology, Institute of Science Tokyo, 4259-J3-23, Nagatsuta-cho, Midori-ku, Yokohama, 226-8501, Kanagawa, Japan
- Nobuaki Yasuo
  - Academy for Convergence of Materials and Informatics (TAC-MI), Institute of Science Tokyo, S6-23, Ookayama, Meguro-ku, 152-8550, Tokyo, Japan
- Masakazu Sekijima
  - Department of Computer Science, Institute of Science Tokyo, 4259-J3-23, Nagatsuta-cho, Midori-ku, Yokohama, 226-8501, Kanagawa, Japan
9
Ramos MC, Collison CJ, White AD. A review of large language models and autonomous agents in chemistry. Chem Sci 2025; 16:2514-2572. [PMID: 39829984 PMCID: PMC11739813 DOI: 10.1039/d4sc03921a] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/15/2024] [Accepted: 12/03/2024] [Indexed: 01/22/2025] Open
Abstract
Large language models (LLMs) have emerged as powerful tools in chemistry, significantly impacting molecule design, property prediction, and synthesis optimization. This review highlights LLM capabilities in these domains and their potential to accelerate scientific discovery through automation. We also review LLM-based autonomous agents: LLMs with a broader set of tools to interact with their surrounding environment. These agents perform diverse tasks such as paper scraping, interfacing with automated laboratories, and synthesis planning. As agents are an emerging topic, we extend the scope of our review of agents beyond chemistry and discuss agents across scientific domains. This review covers the recent history, current capabilities, and design of LLMs and autonomous agents, addressing specific challenges, opportunities, and future directions in chemistry. Key challenges include data quality and integration, model interpretability, and the need for standard benchmarks, while future directions point towards more sophisticated multi-modal agents and enhanced collaboration between agents and experimental methods. Because the field moves quickly, a repository has been built to keep track of the latest studies: https://github.com/ur-whitelab/LLMs-in-science.
Affiliation(s)
- Mayk Caldas Ramos
  - FutureHouse Inc., San Francisco, CA, USA
  - Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
- Christopher J Collison
  - School of Chemistry and Materials Science, Rochester Institute of Technology, Rochester, NY, USA
- Andrew D White
  - FutureHouse Inc., San Francisco, CA, USA
  - Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
10
Ue T, Sato A, Miyao T. Analog Accessibility Score (AAscore) for Rational Compound Selection. J Chem Inf Model 2024; 64:9350-9360. [PMID: 39639743 DOI: 10.1021/acs.jcim.4c01691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2024]
Abstract
Various in silico scores have been proposed to objectively assess the characteristics and properties of a compound. However, there is still no score that represents the analog accessibility of a compound. Such a score would be valuable for selecting compounds proposed by virtual screening or for prioritizing hit compounds for the hit-to-lead phase. This study proposes an analog accessibility score (AAscore), where retrosynthesis prediction and forward product prediction models were utilized to generate virtual analogs. The AAscore is defined as the number of unique analogs and virtual synthetic routes. To evaluate the AAscore in terms of the number of actually synthesized analog compounds, analog compounds were prepared by using the compound-core relationship (CCR) method. The AAscore showed little correlation with the number of CCR-based analogs. Furthermore, AAscores were found to be significantly influenced by the number of candidate reactants extracted from a reactant database. A case study targeting compounds active against carbonic anhydrase 2 showed that the AAscore could identify compounds that were synthesized into analogs.
Affiliation(s)
- Takato Ue
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Akinori Sato
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
  - Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Tomoyuki Miyao
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
  - Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
11
Nana Teukam YG, Kwate Dassi L, Manica M, Probst D, Schwaller P, Laino T. Language models can identify enzymatic binding sites in protein sequences. Comput Struct Biotechnol J 2024; 23:1929-1937. [PMID: 38736695 PMCID: PMC11087710 DOI: 10.1016/j.csbj.2024.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/05/2024] [Accepted: 04/05/2024] [Indexed: 05/14/2024] Open
Abstract
Recent advances in language modeling have had a tremendous impact on how we handle sequential data in science. Language architectures have emerged as a hotbed of innovation and creativity in natural language processing over the last decade, and have since gained prominence in modeling proteins and chemical processes, elucidating structural relationships from textual/sequential data. Surprisingly, some of these relationships refer to three-dimensional structural features, raising important questions on the dimensionality of the information encoded within sequential data. Here, we demonstrate that the unsupervised application of a language model architecture to a language representation of bio-catalyzed chemical reactions can capture the signal underlying substrate-binding site atomic interactions. This allows us to identify the three-dimensional binding site position in unknown protein sequences. The language representation comprises a reaction simplified molecular-input line-entry system (SMILES) string for the substrate and products, and amino acid sequence information for the enzyme. This approach can recover, with no supervision, 52.13% of the binding site when considering co-crystallized substrate-enzyme structures as ground truth, vastly outperforming other attention-based models.
Affiliation(s)
- Loïc Kwate Dassi
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Matteo Manica
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Daniel Probst
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
- Philippe Schwaller
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
- Teodoro Laino
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland

12
Vangala SR, Krishnan SR, Bung N, Nandagopal D, Ramasamy G, Kumar S, Sankaran S, Srinivasan R, Roy A. Suitability of large language models for extraction of high-quality chemical reaction dataset from patent literature. J Cheminform 2024; 16:131. [PMID: 39593165 PMCID: PMC11590295 DOI: 10.1186/s13321-024-00928-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Accepted: 11/10/2024] [Indexed: 11/28/2024] Open
Abstract
With the advent of artificial intelligence (AI), it is now possible to design diverse and novel molecules from previously unexplored chemical space. However, a challenge for chemists is the synthesis of such molecules. Recently, there have been attempts to develop AI models for retrosynthesis prediction, which rely on the availability of a high-quality training dataset. In this work, we explore the suitability of large language models (LLMs) for extraction of high-quality chemical reaction data from patent documents. A comparative study on the same set of patents from an earlier study showed that the proposed automated approach can enhance the current datasets by addition of 26% new reactions. Several challenges were identified during reaction mining, and for some of them alternative solutions were proposed. A detailed analysis was also performed wherein several wrong entries were identified in the previously curated dataset. Reactions extracted using the proposed pipeline over a larger patent dataset can improve the accuracy and efficiency of synthesis prediction models in the future.
Scientific contribution: In this work we evaluated the suitability of large language models for mining a high-quality chemical reaction dataset from patent literature. We showed that the proposed approach can significantly improve the quantity of the reaction database by identifying more chemical reactions and improve the quality of the reaction database by correcting previous errors/false positives.
Affiliation(s)
- Sarveswara Rao Vangala
  - TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
- Navneet Bung
  - TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
- Dhandapani Nandagopal
  - TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
- Gomathi Ramasamy
  - TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
- Satyam Kumar
  - TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
- Sridharan Sankaran
  - TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
- Rajgopal Srinivasan
  - TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
- Arijit Roy
  - TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India

13
Restrepo G. Spaces of mathematical chemistry. Theory Biosci 2024; 143:237-251. [PMID: 39259256 PMCID: PMC11604753 DOI: 10.1007/s12064-024-00425-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Accepted: 08/22/2024] [Indexed: 09/12/2024]
Abstract
In an effort to expand the domain of mathematical chemistry and inspire research beyond the realms of graph theory and quantum chemistry, we explore five mathematical chemistry spaces and their interconnectedness. These spaces comprise the chemical space, which encompasses substances and reactions; the space of reaction conditions, spanning the physical and chemical aspects involved in chemical reactions; the space of reaction grammars, which encapsulates the rules for creating and breaking chemical bonds; the space of substance properties, covering all documented measurements regarding substances; and the space of substance representations, composed of the various ontologies for characterising substances.
Affiliation(s)
- Guillermo Restrepo
  - Max Planck Institute for Mathematics in the Sciences, Inselstr. 22, Leipzig, 04103, Saxony, Germany
  - Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstr. 16-18, Leipzig, 04107, Saxony, Germany
  - School of Applied Sciences and Engineering, EAFIT University, Carrera 49 No 7 Sur-50, Medellin, 050022, Antioquia, Colombia

14
Chen LY, Li YP. Machine learning-guided strategies for reaction conditions design and optimization. Beilstein J Org Chem 2024; 20:2476-2492. [PMID: 39376489 PMCID: PMC11457048 DOI: 10.3762/bjoc.20.212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Accepted: 09/19/2024] [Indexed: 10/09/2024] Open
Abstract
This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as the data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.
Affiliation(s)
- Lung-Yi Chen
  - Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
- Yi-Pei Li
  - Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
  - Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Sec. 2, Academia Road, Taipei 11529, Taiwan

15
Sato A, Asahara R, Miyao T. Chemical Graph-Based Transformer Models for Yield Prediction of High-Throughput Cross-Coupling Reaction Datasets. ACS Omega 2024; 9:40907-40919. [PMID: 39372005 PMCID: PMC11447720 DOI: 10.1021/acsomega.4c06113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 08/28/2024] [Accepted: 09/03/2024] [Indexed: 10/08/2024]
Abstract
The chemical reaction yield is an important factor to determine the reaction conditions. Recently, many data-driven models for yield prediction using high-throughput experimentation datasets have been reported. In this study, we propose a neural network architecture based on the chemical graphs of the reaction components to predict the reaction yield. The proposed model is the sequential combination of a message-passing neural network and a transformer encoder (MPNN-Transformer). The reaction components are converted to molecular matrices by the first network, followed by the interplay of the reaction components in the second network after adding the embeddings of the compound roles in the chemical reaction. The predictive ability of the proposed models was compared with state-of-the-art yield prediction models using two high-throughput experimental datasets: the Buchwald-Hartwig cross-coupling (BHC) and Suzuki-Miyaura cross-coupling (SMC) reaction datasets. Overall, the MPNN-Transformer models showed high prediction accuracy for the BHC reaction datasets and some of the extrapolation-oriented SMC reaction datasets. These models also performed well when the training dataset size was relatively large. Furthermore, analyzing the poorly predicted reactions for the BHC reaction dataset revealed a limitation of the data-driven yield prediction approach based on the chemical structural similarity.
Affiliation(s)
- Akinori Sato
  - Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Ryosuke Asahara
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Tomoyuki Miyao
  - Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan

16
Kim T, Lee S, Kwak Y, Choi MS, Park J, Hwang SJ, Kim SG. READRetro: natural product biosynthesis predicting with retrieval-augmented dual-view retrosynthesis. New Phytol 2024; 243:2512-2527. [PMID: 39081009 DOI: 10.1111/nph.20012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 07/08/2024] [Indexed: 08/23/2024]
Abstract
Plants, as sessile organisms, produce various secondary metabolites to interact with the environment. These chemicals have fascinated the plant science community because of their ecological significance and notable biological activity. However, predicting the complete biosynthetic pathways from target molecules to metabolic building blocks remains a challenge. Here, we propose retrieval-augmented dual-view retrosynthesis (READRetro) as a practical bio-retrosynthesis tool to predict the biosynthetic pathways of plant natural products. Conventional bio-retrosynthesis models have been limited in their ability to predict biosynthetic pathways for natural products. READRetro was optimized for the prediction of complex metabolic pathways by incorporating cutting-edge deep learning architectures, an ensemble approach, and two retrievers. Evaluation of single- and multi-step retrosynthesis showed that each component of READRetro significantly improved its ability to predict biosynthetic pathways. READRetro was also able to propose the known pathways of secondary metabolites such as monoterpene indole alkaloids and the unknown pathway of menisdaurilide, demonstrating its applicability to real-world bio-retrosynthesis of plant natural products. For researchers interested in the biosynthesis and production of secondary metabolites, a user-friendly website (https://readretro.net) and the open-source code of READRetro have been made available.
Affiliation(s)
- Taein Kim
  - Department of Biological Sciences, KAIST, Daejeon, 34141, Korea
- Seul Lee
  - Kim Jaechul Graduate School of AI, KAIST, Daejeon, 34141, Korea
- Yejin Kwak
  - Department of BioMedical Convergence Engineering, Pusan National University, Yangsan, 50612, Korea
- Min-Soo Choi
  - Department of Biological Sciences, KAIST, Daejeon, 34141, Korea
- Jeongbin Park
  - Department of BioMedical Convergence Engineering, Pusan National University, Yangsan, 50612, Korea
- Sung Ju Hwang
  - Kim Jaechul Graduate School of AI, KAIST, Daejeon, 34141, Korea
  - School of Computing, KAIST, Daejeon, 34141, Korea
- Sang-Gyu Kim
  - Department of Biological Sciences, KAIST, Daejeon, 34141, Korea

17
van Gerwen P, Briling KR, Bunne C, Somnath VR, Laplaza R, Krause A, Corminboeuf C. 3DReact: Geometric Deep Learning for Chemical Reactions. J Chem Inf Model 2024; 64:5771-5785. [PMID: 39007724 PMCID: PMC11323278 DOI: 10.1021/acs.jcim.4c00104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 07/03/2024] [Accepted: 07/08/2024] [Indexed: 07/16/2024]
Abstract
Geometric deep learning models, which incorporate the relevant molecular symmetries within the neural network architecture, have considerably improved the accuracy and data efficiency of predictions of molecular properties. Building on this success, we introduce 3DReact, a geometric deep learning model to predict reaction properties from three-dimensional structures of reactants and products. We demonstrate that the invariant version of the model is sufficient for existing reaction data sets. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS, and Proparg-21-TS data sets in different atom-mapping regimes. We show that, compared to existing models for reaction property prediction, 3DReact offers a flexible framework that exploits atom-mapping information, if available, as well as geometries of reactants and products (in an invariant or equivariant fashion). Accordingly, it performs systematically well across different data sets, atom-mapping regimes, as well as both interpolation and extrapolation tasks.
Affiliation(s)
- Puck van Gerwen
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Ksenia R. Briling
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Charlotte Bunne
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Vignesh Ram Somnath
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Ruben Laplaza
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Andreas Krause
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

18
Zhu K, Huang M, Wang Y, Gu Y, Li W, Liu G, Tang Y. MetaPredictor: in silico prediction of drug metabolites based on deep language models with prompt engineering. Brief Bioinform 2024; 25:bbae374. [PMID: 39082648 PMCID: PMC11289679 DOI: 10.1093/bib/bbae374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 07/02/2024] [Accepted: 07/16/2024] [Indexed: 08/03/2024] Open
Abstract
Metabolic processes can transform a drug into metabolites with different properties that may affect its efficacy and safety. Therefore, investigation of the metabolic fate of a drug candidate is of great significance for drug discovery. Computational methods have been developed to predict drug metabolites, but most of them suffer from two main obstacles: the lack of model generalization due to restrictions on metabolic transformation rules or specific enzyme families, and high rate of false-positive predictions. Here, we presented MetaPredictor, a rule-free, end-to-end and prompt-based method to predict possible human metabolites of small molecules including drugs as a sequence translation problem. We innovatively introduced prompt engineering into deep language models to enrich domain knowledge and guide decision-making. The results showed that using prompts that specify the sites of metabolism (SoMs) can steer the model to propose more accurate metabolite predictions, achieving a 30.4% increase in recall and a 16.8% reduction in false positives over the baseline model. The transfer learning strategy was also utilized to tackle the limited availability of metabolic data. For the adaptation to automatic or non-expert prediction, MetaPredictor was designed as a two-stage schema consisting of automatic identification of SoMs followed by metabolite prediction. Compared to four available drug metabolite prediction tools, our method showed comparable performance on the major enzyme families and better generalization that could additionally identify metabolites catalyzed by less common enzymes. The results indicated that MetaPredictor could provide a more comprehensive and accurate prediction of drug metabolism through the effective combination of transfer learning and prompt-based learning strategies.
Affiliation(s)
- Keyun Zhu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Mengting Huang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yimeng Wang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yaxin Gu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China

19
Zeng T, Jin Z, Zheng S, Yu T, Wu R. Developing BioNavi for Hybrid Retrosynthesis Planning. JACS Au 2024; 4:2492-2502. [PMID: 39055138 PMCID: PMC11267531 DOI: 10.1021/jacsau.4c00228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 06/18/2024] [Accepted: 06/20/2024] [Indexed: 07/27/2024]
Abstract
Illuminating synthetic pathways is essential for producing valuable chemicals, such as bioactive molecules. Chemical and biological syntheses are crucial, and their integration often leads to more efficient and sustainable pathways. Despite the rapid development of retrosynthesis models, few of them consider both chemical and biological syntheses, hindering the pathway design for high-value chemicals. Here, we propose BioNavi, which incorporates multitask learning and reaction templates into a deep learning-driven model to design hybrid synthesis pathways in a more interpretable manner. BioNavi outperforms existing approaches on different data sets, achieving a 75% hit rate in replicating reported biosynthetic pathways and displaying superior ability in designing hybrid synthesis pathways. Additional case studies further illustrate the potential application of BioNavi in de novo pathway design. The enhanced web server (http://biopathnavi.qmclab.com/bionavi/) simplifies input operations and implements step-by-step exploration according to user experience. We show that BioNavi is a handy navigator for designing synthetic pathways for various chemicals.
Affiliation(s)
- Tao Zeng
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P. R. China
- Zhehao Jin
  - Center for Synthetic Biochemistry, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (CAS), Shenzhen 518055, P. R. China
- Shuangjia Zheng
  - Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
- Tao Yu
  - Center for Synthetic Biochemistry, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (CAS), Shenzhen 518055, P. R. China
- Ruibo Wu
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P. R. China

20
Phan TL, Weinbauer K, Gärtner T, Merkle D, Andersen JL, Fagerberg R, Stadler PF. Reaction rebalancing: a novel approach to curating reaction databases. J Cheminform 2024; 16:82. [PMID: 39030583 PMCID: PMC11264917 DOI: 10.1186/s13321-024-00875-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 06/24/2024] [Indexed: 07/21/2024] Open
Abstract
PURPOSE: Reaction databases are a key resource for a wide variety of applications in computational chemistry and biochemistry, including Computer-aided Synthesis Planning (CASP) and the large-scale analysis of metabolic networks. The full potential of these resources can only be realized if datasets are accurate and complete. Missing co-reactants and co-products, i.e., unbalanced reactions, however, are the rule rather than the exception. The curation and correction of such incomplete entries is thus an urgent need.
METHODS: The SynRBL framework addresses this issue with a dual strategy: a rule-based method for non-carbon compounds, using atomic symbols and counts for prediction, alongside a Maximum Common Subgraph (MCS)-based technique for carbon compounds, aimed at aligning reactants and products to infer missing entities.
RESULTS: The rule-based method exceeded 99% accuracy, while MCS-based accuracy varied from 81.19 to 99.33%, depending on reaction properties. Furthermore, an applicability domain and a machine learning scoring function were devised to quantify prediction confidence. The overall efficacy of this framework was delineated through its success rate and accuracy metrics, which spanned from 89.83 to 99.75% and 90.85 to 99.05%, respectively.
CONCLUSION: The SynRBL framework offers a novel solution for recalibrating chemical reactions, significantly enhancing reaction completeness. With rigorous validation, it achieved groundbreaking accuracy in reaction rebalancing. This sets the stage for future improvement, in particular of atom-atom mapping techniques as well as of downstream tasks such as automated synthesis planning.
SCIENTIFIC CONTRIBUTION: SynRBL features a novel computational approach to correcting unbalanced entries in chemical reaction databases. By combining heuristic rules for inferring non-carbon compounds and common subgraph searches to address carbon unbalance, SynRBL successfully addresses most instances of this problem, which affects the majority of data in most large-scale resources. Compared to alternative solutions, SynRBL achieves a dramatic increase in both success rate and accuracy, and provides the first freely available open source solution for this problem.
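The idea of detecting an unbalanced reaction from atomic symbols and counts can be illustrated with a minimal sketch. This is not the SynRBL implementation: the helper names `atom_counts`, `side_counts`, and `imbalance` are hypothetical, and the parser assumes flat molecular formulas without brackets, charges, or isotopes.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "C2", "H", "Cl3".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Count atoms in a flat molecular formula such as 'C2H6O' (illustrative parser)."""
    counts = Counter()
    for symbol, number in _TOKEN.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


def side_counts(formulas) -> Counter:
    """Sum the atom counts of every species on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total += atom_counts(formula)
    return total


def imbalance(reactants, products):
    """Return (atoms missing on the product side, atoms missing on the reactant side)."""
    left = side_counts(reactants)
    right = side_counts(products)
    # Counter subtraction keeps only positive differences.
    return dict(left - right), dict(right - left)


# Esterification recorded without its water by-product:
# acetic acid + ethanol -> ethyl acetate
missing_on_products, missing_on_reactants = imbalance(["C2H4O2", "C2H6O"], ["C4H8O2"])
print(missing_on_products)   # {'H': 2, 'O': 1}
print(missing_on_reactants)  # {}
```

Here the leftover atoms (two hydrogens and one oxygen) point to a missing water molecule. Mapping such a residue to a concrete species requires additional rules, and the carbon-containing cases described in the abstract need structural alignment, which this sketch does not attempt.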
Affiliation(s)
- Tieu-Long Phan
  - Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics and School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
  - Department of Mathematics and Computer Science, University of Southern Denmark, 5230, Odense M, Denmark
- Klaus Weinbauer
  - Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics and School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
  - Machine Learning Research Unit, TU Wien Informatics, Erzherzog-Johann-Platz 1 (FB02), A-1040, Wien, Austria
- Thomas Gärtner
  - Machine Learning Research Unit, TU Wien Informatics, Erzherzog-Johann-Platz 1 (FB02), A-1040, Wien, Austria
- Daniel Merkle
  - Department of Mathematics and Computer Science, University of Southern Denmark, 5230, Odense M, Denmark
  - Faculty of Technology, Bielefeld University, Postfach 100131, 33501, Bielefeld, Germany
- Jakob L Andersen
  - Department of Mathematics and Computer Science, University of Southern Denmark, 5230, Odense M, Denmark
- Rolf Fagerberg
  - Department of Mathematics and Computer Science, University of Southern Denmark, 5230, Odense M, Denmark
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics and School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany
  - Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090, Wien, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870, Frederiksberg, Denmark
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA

21
Chen S, Noh J, Jang J, Kim S, Gu GH, Jung Y. Reaction Templates: Bridging Synthesis Knowledge and Artificial Intelligence. Acc Chem Res 2024; 57:1964-1972. [PMID: 38924502 DOI: 10.1021/acs.accounts.4c00261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/28/2024]
Abstract
Conspectus: The field of chemical research boasts a long history of developing software to automate synthesis planning and reaction prediction. Early software relied heavily on expert systems, requiring significant effort to encode vast amounts of synthesis knowledge into a computer-readable format. However, recent advancements in deep learning have shifted the focus toward AI models, offering improved prediction capabilities. Despite these advancements, current AI models often lack the integration of known synthesis rules and intuitions, creating a gap that hinders interpretability and future development of the models. To bridge this gap, our research group has been actively working on incorporating reaction templates into deep learning models, achieving promising results across various applications.
In this Account, we present our latest works to incorporate the known synthesis knowledge into the deep learning models through the utilization of reaction templates. We begin by highlighting the limitations of early computer programs heavily reliant on hand-coded rules. These programs, while providing a foundation for the field, presented limitations in scalability and adaptability. We then introduce SMARTS (SMILES arbitrary target specification), a popular Python-readable format for representing chemical reactions. This format of reaction encoding facilitates the quick integration of synthesis knowledge into AI models built using the Python language. With the SMARTS-based reaction templates, we introduce our recent efforts of developing an AI model for reaction-based molecule optimization. Subsequently, we discuss the recent efforts to automate the extraction of reaction templates from vast chemical reaction databases. This approach eliminates the previously required manual effort of encoding knowledge, a process that could be time-consuming and prone to error when dealing with large data sets. By customizing the automated extraction algorithm, we have developed powerful AI models for specific tasks such as retrosynthesis (LocalRetro), reaction outcome prediction (LocalTransform), and atom-to-atom mapping (LocalMapper). These models, aligned with the intuition of chemists, demonstrate the effectiveness of incorporating reaction templates into deep learning frameworks.
Looking toward the future, we believe that utilizing reaction templates to connect known chemical knowledge and AI models holds immense potential for various applications. Not only can this approach significantly benefit future AI models focused on challenging tasks like reaction mechanism labeling and prediction, but we anticipate it can also extend its reach to the realm of inorganic synthesis. By integrating synthesis knowledge, we can not only achieve improved performance but also enhance the interpretability of AI models, paving the way for further advancements in AI-powered chemical synthesis.
Collapse
Affiliation(s)
- Shuan Chen
- Department of Chemical and Biological Engineering, and Institute of Chemical Process, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
| | - Juhwan Noh
- Chemical Data-Driven Research Center, Korea Research Institute of Chemical Technology (KRICT), 141 Gajeong-ro, Yuseong-gu, Daejeon 34114, South Korea
| | - Jidon Jang
- Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology (KRICT), 141 Gajeong-ro, Yuseong-gu, Daejeon 34114, South Korea
| | - Seongmin Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291, Daehak-ro, Yuseong-gu, Daejeon 34141, South Korea
| | - Geun Ho Gu
- Department of Energy Engineering, Korea Institute of Energy Technology (KENTECH), 21 Kentech-gil, Naju, Jeonnam 58330, South Korea
| | - Yousung Jung
- Department of Chemical and Biological Engineering, and Institute of Chemical Process, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Institute of Engineering Research, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
| |
|
22
|
Shi Z, Wang D, Li Y, Deng R, Lin J, Liu C, Li H, Wang R, Zhao M, Mao Z, Yuan Q, Liao X, Ma H. REME: an integrated platform for reaction enzyme mining and evaluation. Nucleic Acids Res 2024; 52:W299-W305. [PMID: 38769057 PMCID: PMC11223788 DOI: 10.1093/nar/gkae405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2024] [Revised: 04/16/2024] [Accepted: 05/01/2024] [Indexed: 05/22/2024] Open
Abstract
A key challenge in pathway design is finding proper enzymes that can be engineered to catalyze a non-natural reaction. Although existing tools can identify potential enzymes based on similar reactions, these tools encounter several issues. Firstly, the calculated similar reactions may not even have the same reaction type. Secondly, the associated enzymes are often numerous and identifying the most promising candidate enzymes is difficult due to the lack of data for evaluation. Thirdly, existing web tools do not provide interactive functions that enable users to fine-tune results based on their expertise. Here, we present REME (https://reme.biodesign.ac.cn/), the first integrated web platform for reaction enzyme mining and evaluation. Combining atom-to-atom mapping, atom type change identification, and reaction similarity calculation enables quick ranking and visualization of reactions similar to an objective non-natural reaction. Additional functionality enables users to filter similar reactions by their specified functional groups and candidate enzymes can be further filtered (e.g. by organisms) or expanded by Enzyme Commission number (EC) or sequence homology. Afterward, enzyme attributes (such as kcat, Km, optimal temperature and pH) can be assessed with deep learning-based methods, facilitating the swift identification of potential enzymes that can catalyze the non-natural reaction.
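As an illustrative aside on the similarity-ranking step described in this abstract: the abstract does not say which similarity measure REME uses, so the sketch below assumes a Tanimoto coefficient over hypothetical sets of reaction features (for example, labels for atom type changes). The feature labels and the `rank_similar` helper are invented for illustration and are not REME's API.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def rank_similar(query: Set[str],
                 candidates: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank candidate reactions by similarity to the query, best first."""
    scored = [(name, tanimoto(query, feats)) for name, feats in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical feature labels; a real system would derive them from
    # atom-to-atom mapping of the reaction.
    query = {"C-OH>C=O", "hydride_transfer"}
    known = {
        "rxn_A": {"C-OH>C=O", "hydride_transfer"},
        "rxn_B": {"C-OH>C=O", "phosphorylation"},
        "rxn_C": {"C-N_cleavage"},
    }
    for name, score in rank_similar(query, known):
        print(f"{name}: {score:.2f}")
```

A set-based score like this is cheap enough to rank many known reactions at once; a filter on functional groups or organisms, as the abstract mentions, could be applied to the ranked list afterward.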
Affiliation(s)
- Zhenkun Shi
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
| | - Dehang Wang
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
| | - Yang Li
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- University of Chinese Academy of Sciences, Beijing 101408, PR China
| | - Rui Deng
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
| | - Jiawei Lin
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
| | - Cui Liu
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
| | - Haoran Li
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
| | - Ruoyu Wang
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
| | - Muqiang Zhao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
| | - Zhitao Mao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
| | - Qianqian Yuan
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
| | - Xiaoping Liao
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
- Haihe Laboratory of Synthetic Biology, Tianjin 300308, PR China
| | - Hongwu Ma
- Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
| |
|
23
|
Chen LY, Li YP. AutoTemplate: enhancing chemical reaction datasets for machine learning applications in organic chemistry. J Cheminform 2024; 16:74. [PMID: 38937840 PMCID: PMC11212196 DOI: 10.1186/s13321-024-00869-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2024] [Accepted: 06/09/2024] [Indexed: 06/29/2024] Open
Abstract
This paper presents AutoTemplate, an innovative data preprocessing protocol, addressing the crucial need for high-quality chemical reaction datasets in the realm of machine learning applications in organic chemistry. Recent advances in artificial intelligence have expanded the application of machine learning in chemistry, particularly in yield prediction, retrosynthesis, and reaction condition prediction. However, the effectiveness of these models hinges on the integrity of chemical reaction datasets, which are often plagued by inconsistencies like missing reactants, incorrect atom mappings, and outright erroneous reactions. AutoTemplate introduces a two-stage approach to refine these datasets. The first stage involves extracting meaningful reaction transformation rules and formulating generic reaction templates using a simplified SMARTS representation. This simplification broadens the applicability of templates across various chemical reactions. The second stage is template-guided reaction curation, where these templates are systematically applied to validate and correct the reaction data. This process effectively amends missing reactant information, rectifies atom-mapping errors, and eliminates incorrect data entries. A standout feature of AutoTemplate is its capability to concurrently identify and correct false chemical reactions. It operates on the premise that most reactions in datasets are accurate, using these as templates to guide the correction of flawed entries. The protocol demonstrates its efficacy across a range of chemical reactions, significantly enhancing dataset quality. This advancement provides a more robust foundation for developing reliable machine learning models in chemistry, thereby improving the accuracy of forward and retrosynthetic predictions. 
AutoTemplate marks a significant progression in the preprocessing of chemical reaction datasets, bridging a vital gap and facilitating more precise and efficient machine learning applications in organic synthesis. SCIENTIFIC CONTRIBUTION: The proposed automated preprocessing tool for chemical reaction data aims to identify errors within chemical databases. Specifically, if the errors involve atom mapping or the absence of reactant types, corrections can be systematically applied using reaction templates, ultimately elevating the overall quality of the database.
Affiliation(s)
- Lung-Yi Chen
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617, Taiwan
| | - Yi-Pei Li
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617, Taiwan.
- Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Sec. 2, Academia Road, Taipei, 11529, Taiwan.
| |
|
24
|
Keto A, Guo T, Underdue M, Stuyver T, Coley CW, Zhang X, Krenske EH, Wiest O. Data-Efficient, Chemistry-Aware Machine Learning Predictions of Diels-Alder Reaction Outcomes. J Am Chem Soc 2024; 146:16052-16061. [PMID: 38822795 DOI: 10.1021/jacs.4c03131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/03/2024]
Abstract
The application of machine learning models to the prediction of reaction outcomes currently needs large and/or highly featurized data sets. We show that a chemistry-aware model, NERF, which mimics the bonding changes that occur during reactions, allows for highly accurate predictions of the outcomes of Diels-Alder reactions using a relatively small training set, with no pretraining and no additional features. We establish a diverse data set of 9537 intramolecular, hetero-, aromatic, and inverse electron demand Diels-Alder reactions. This data set is used to train a NERF model, and the performance is compared against state-of-the-art classification and generative machine learning models across low- and high-data regimes, with and without pretraining. The predictive accuracy (regio- and site selectivity in the major product) achieved by NERF exceeds 90% when as little as 40% of the data set is used for training. Another high-performing model, Chemformer, requires a larger training data set (>45%) and pretraining to reach 90% Top-1 accuracy. Accurate predictions of less-represented reaction subclasses, such as those involving heteroatomic or aromatic substrates, require higher percentages of training data. We also show how NERF can use small amounts of additional training data to quickly learn new systems and improve its overall understanding of reactivity. Synthetic chemists stand to benefit as this model can be rapidly expanded and tailored to areas of chemistry corresponding to the low-data regime.
Affiliation(s)
- Angus Keto
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Taicheng Guo
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
| | - Morgan Underdue
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
| | - Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Xiangliang Zhang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
| | - Elizabeth H Krenske
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Olaf Wiest
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
| |
|
25
|
Luong KD, Singh A. Application of Transformers in Cheminformatics. J Chem Inf Model 2024; 64:4392-4409. [PMID: 38815246 PMCID: PMC11167597 DOI: 10.1021/acs.jcim.3c02070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 06/01/2024]
Abstract
By accelerating time-consuming processes with high efficiency, computing has become an essential part of many modern chemical pipelines. Machine learning is a class of computing methods that can discover patterns within chemical data and utilize this knowledge for a wide variety of downstream tasks, such as property prediction or substance generation. The complex and diverse chemical space requires complex machine learning architectures with great learning power. Recently, learning models based on transformer architectures have revolutionized multiple domains of machine learning, including natural language processing and computer vision. Naturally, there have been ongoing endeavors in adopting these techniques to the chemical domain, resulting in a surge of publications within a short period. The diversity of chemical structures, use cases, and learning models necessitate a comprehensive summarization of existing works. In this paper, we review recent innovations in adapting transformers to solve learning problems in chemistry. Because chemical data is diverse and complex, we structure our discussion based on chemical representations. Specifically, we highlight the strengths and weaknesses of each representation, the current progress of adapting transformer architectures, and future directions.
Affiliation(s)
- Kha-Dinh Luong
- Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
| | - Ambuj Singh
- Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
| |
|
26
|
Das M, Ghosh A, Sunoj RB. Advances in machine learning with chemical language models in molecular property and reaction outcome predictions. J Comput Chem 2024; 45:1160-1176. [PMID: 38299229 DOI: 10.1002/jcc.27315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2023] [Revised: 01/06/2024] [Accepted: 01/09/2024] [Indexed: 02/02/2024]
Abstract
Molecular properties and reactions form the foundation of chemical space. Over the years, innumerable molecules have been synthesized, a smaller fraction of them found immediate applications, while a larger proportion served as a testimony to creative and empirical nature of the domain of chemical science. With increasing emphasis on sustainable practices, it is desirable that a target set of molecules are synthesized preferably through a fewer empirical attempts instead of a larger library, to realize an active candidate. In this front, predictive endeavors using machine learning (ML) models built on available data acquire high timely significance. Prediction of molecular property and reaction outcome remain one of the burgeoning applications of ML in chemical science. Among several methods of encoding molecular samples for ML models, the ones that employ language like representations are gaining steady popularity. Such representations would additionally help adopt well-developed natural language processing (NLP) models for chemical applications. Given this advantageous background, herein we describe several successful chemical applications of NLP focusing on molecular property and reaction outcome predictions. From relatively simpler recurrent neural networks (RNNs) to complex models like transformers, different network architecture have been leveraged for tasks such as de novo drug design, catalyst generation, forward and retro-synthesis predictions. The chemical language model (CLM) provides promising avenues toward a broad range of applications in a time and cost-effective manner. While we showcase an optimistic outlook of CLMs, attention is also placed on the persisting challenges in reaction domain, which would optimistically be addressed by advanced algorithms tailored to chemical language and with increased availability of high-quality datasets.
Affiliation(s)
- Manajit Das
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
| | - Ankit Ghosh
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
| | - Raghavan B Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai, India
| |
|
27
|
van Gerwen P, Briling KR, Calvino Alonso Y, Franke M, Corminboeuf C. Benchmarking machine-readable vectors of chemical reactions on computed activation barriers. DIGITAL DISCOVERY 2024; 3:932-943. [PMID: 38756222 PMCID: PMC11094696 DOI: 10.1039/d3dd00175j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/06/2023] [Accepted: 02/28/2024] [Indexed: 05/18/2024]
Abstract
In recent years, there has been a surge of interest in predicting computed activation barriers, to enable the acceleration of the automated exploration of reaction networks. Consequently, various predictive approaches have emerged, ranging from graph-based models to methods based on the three-dimensional structure of reactants and products. In tandem, many representations have been developed to predict experimental targets, which may hold promise for barrier prediction as well. Here, we bring together all of these efforts and benchmark various methods (Morgan fingerprints, the DRFP, the CGR representation-based Chemprop, SLATMd, B2Rl2, EquiReact and language model BERT + RXNFP) for the prediction of computed activation barriers on three diverse datasets.
Affiliation(s)
- Puck van Gerwen
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
| | - Ksenia R Briling
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
| | - Yannick Calvino Alonso
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
| | - Malte Franke
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
| | - Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
| |
|
28
|
Schlosser L, Rana D, Pflüger P, Katzenburg F, Glorius F. EnTdecker - A Machine Learning-Based Platform for Guiding Substrate Discovery in Energy Transfer Catalysis. J Am Chem Soc 2024; 146:13266-13275. [PMID: 38695558 DOI: 10.1021/jacs.4c01352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/16/2024]
Abstract
Due to the magnitude of chemical space, the discovery of novel substrates in energy transfer (EnT) catalysis remains a daunting task. Experimental and computational strategies to identify compounds that successfully undergo EnT-mediated reactions are limited by their time and cost efficiency. To accelerate the discovery process in EnT catalysis, we herein present the EnTdecker platform, which facilitates the large-scale virtual screening of potential substrates using machine-learning (ML) based predictions of their excited state properties. To achieve this, a data set is created containing more than 34,000 molecules aiming to cover a vast fraction of synthetically relevant compound space for EnT catalysis. Using this data predictive models are trained, and their aptitude for an in-lab application is demonstrated by rediscovering successful substrates from literature as well as experimental validation through luminescence-based screening. By reducing the computational effort needed to obtain excited state properties, the EnTdecker platform represents a tool to efficiently guide substrate selection and increase the experimental success rate for EnT catalysis. Moreover, through an easy-to-use web application, EnTdecker is made publicly accessible under entdecker.uni-muenster.de.
Affiliation(s)
- Leon Schlosser
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
| | - Debanjan Rana
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
| | - Philipp Pflüger
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
| | - Felix Katzenburg
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
| | - Frank Glorius
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
| |
|
29
|
Rana D, Pflüger PM, Hölter NP, Tan G, Glorius F. Standardizing Substrate Selection: A Strategy toward Unbiased Evaluation of Reaction Generality. ACS CENTRAL SCIENCE 2024; 10:899-906. [PMID: 38680564 PMCID: PMC11046462 DOI: 10.1021/acscentsci.3c01638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/29/2023] [Revised: 03/14/2024] [Accepted: 03/18/2024] [Indexed: 05/01/2024]
Abstract
With over 10,000 new reaction protocols arising every year, only a handful of these procedures transition from academia to application. A major reason for this gap stems from the lack of comprehensive knowledge about a reaction's scope, i.e., to which substrates the protocol can or cannot be applied. Even though chemists invest substantial effort to assess the scope of new protocols, the resulting scope tables involve significant biases, reducing their expressiveness. Herein we report a standardized substrate selection strategy designed to mitigate these biases and evaluate the applicability, as well as the limits, of any chemical reaction. Unsupervised learning is utilized to map the chemical space of industrially relevant molecules. Subsequently, potential substrate candidates are projected onto this universal map, enabling the selection of a structurally diverse set of substrates with optimal relevance and coverage. By testing our methodology on different chemical reactions, we were able to demonstrate its effectiveness in finding general reactivity trends by using a few highly representative examples. The developed methodology empowers chemists to showcase the unbiased applicability of novel methodologies, facilitating their practical applications. We hope that this work will trigger interdisciplinary discussions about biases in synthetic chemistry, leading to improved data quality.
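To make the idea of picking a structurally diverse substrate set from a chemical-space map concrete, here is a minimal sketch. It is not the authors' selection procedure: it swaps in a generic greedy max-min (farthest-point) selection, and it assumes the candidates have already been projected to 2D coordinates. The coordinates and the `maxmin_select` name are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def maxmin_select(points: Sequence[Point], k: int, start: int = 0) -> List[int]:
    """Greedily pick k indices so each new pick is far from all earlier picks."""
    if not points or k <= 0:
        return []
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < min(k, len(points)):
        remaining = [i for i in range(len(points)) if i not in chosen]
        nxt = max(remaining, key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


if __name__ == "__main__":
    # Hypothetical 2D map coordinates for five candidate substrates:
    # two near-duplicates at the origin, two near (5, 5), one at (10, 0).
    coords = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
    print(maxmin_select(coords, k=3))
```

The greedy rule avoids picking several near-identical substrates from one crowded region of the map, which is the kind of selection bias the abstract describes; it does not, by itself, weight regions by industrial relevance.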
Affiliation(s)
- Debanjan Rana
- Universität Münster, Organisch-Chemisches Institut, Corrensstraße 36, 48149 Münster, Germany
| | - Philipp M. Pflüger
- Universität Münster, Organisch-Chemisches Institut, Corrensstraße 36, 48149 Münster, Germany
| | - Niklas P. Hölter
- Universität Münster, Organisch-Chemisches Institut, Corrensstraße 36, 48149 Münster, Germany
| | - Guangying Tan
- Universität Münster, Organisch-Chemisches Institut, Corrensstraße 36, 48149 Münster, Germany
| | - Frank Glorius
- Universität Münster, Organisch-Chemisches Institut, Corrensstraße 36, 48149 Münster, Germany
| |
|
30
|
Zhang C, Arun A, Lapkin AA. Completing and Balancing Database Excerpted Chemical Reactions with a Hybrid Mechanistic-Machine Learning Approach. ACS OMEGA 2024; 9:18385-18399. [PMID: 38680356 PMCID: PMC11044172 DOI: 10.1021/acsomega.4c00262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/11/2024] [Revised: 03/31/2024] [Accepted: 04/03/2024] [Indexed: 05/01/2024]
Abstract
Computer-aided synthesis planning (CASP) development of reaction routes requires an understanding of complete reaction structures. However, most reactions in the current databases are missing reaction coparticipants. Although reaction prediction and atom mapping tools can predict major reaction participants and trace atom rearrangements in reactions, they fail to identify the missing molecules to complete reactions. This is because these approaches are data-driven models trained on the current reaction databases, which comprise incomplete reactions. In this work, a workflow was developed to tackle the reaction completion challenge. This includes a heuristic-based method to identify balanced reactions from reaction databases and complete some imbalanced reactions by adding candidate molecules. A machine learning masked language model (MLM) was trained to learn from simplified molecular input line entry system (SMILES) sentences of these completed reactions. The model predicted missing molecules for the incomplete reactions, a workflow analogous to predicting missing words in sentences. The model is promising for the prediction of small- and middle-sized missing molecules in incomplete reaction records. The workflow combining both the heuristic and machine learning methods completed more than half of the entire reaction space.
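The notion of a "balanced" reaction record in this abstract can be illustrated with a simple atom count. The sketch below is a simplified stand-in, not the authors' heuristic: it assumes each species is given as a plain molecular formula (no parentheses, charges, or isotopes) rather than SMILES, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches an element symbol followed by an optional count, e.g. "C2" or "O".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C2H6O'."""
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(formulas: Iterable[str]) -> Counter:
    """Sum atom counts over all species on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total += atom_counts(formula)
    return total


def imbalance(reactants: Iterable[str],
              products: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (atoms missing from products, atoms missing from reactants)."""
    left, right = side_counts(reactants), side_counts(products)
    # Counter subtraction keeps only positive differences.
    return left - right, right - left


if __name__ == "__main__":
    # Esterification recorded without its water by-product.
    missing_products, missing_reactants = imbalance(
        ["C2H4O2", "C2H6O"],  # acetic acid + ethanol
        ["C4H8O2"],           # ethyl acetate
    )
    print(dict(missing_products))   # atoms unaccounted for on the product side
    print(dict(missing_reactants))  # empty when nothing is missing on the left
```

In this example the leftover atoms (two H and one O) correspond to water, which shows how a small candidate molecule could be proposed to complete an imbalanced record. Larger or ambiguous gaps are where the abstract's masked language model comes in.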
Affiliation(s)
- Chonghuan Zhang
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
| | - Adarsh Arun
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- Cambridge Centre for Advanced Research and Education in Singapore, CARES Ltd., 1 CREATE Way, CREATE Tower #05-05, Singapore 138602 Singapore
- Chemical Data Intelligence (CDI) Pte., Ltd., 9 Raffles Place #26-01, Republic Plaza, Singapore 048619 Singapore
| | - Alexei A. Lapkin
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
- Cambridge Centre for Advanced Research and Education in Singapore, CARES Ltd., 1 CREATE Way, CREATE Tower #05-05, Singapore 138602 Singapore
- Chemical Data Intelligence (CDI) Pte., Ltd., 9 Raffles Place #26-01, Republic Plaza, Singapore 048619 Singapore
| |
|
31
|
Westerlund AM, Manohar Koki S, Kancharla S, Tibo A, Saigiridharan L, Kabeshov M, Mercado R, Genheden S. Do Chemformers Dream of Organic Matter? Evaluating a Transformer Model for Multistep Retrosynthesis. J Chem Inf Model 2024; 64:3021-3033. [PMID: 38602390 DOI: 10.1021/acs.jcim.3c01685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/12/2024]
Abstract
Synthesis planning of new pharmaceutical compounds is a well-known bottleneck in modern drug design. Template-free methods, such as transformers, have recently been proposed as an alternative to template-based methods for single-step retrosynthetic predictions. Here, we trained and evaluated a transformer model, called the Chemformer, for retrosynthesis predictions within drug discovery. The proprietary data set used for training comprised ∼18 M reactions from literature, patents, and electronic lab notebooks. Chemformer was evaluated for the purpose of both single-step and multistep retrosynthesis. We found that the single-step performance of Chemformer was especially good on reaction classes common in drug discovery, with most reaction classes showing a top-10 round-trip accuracy above 0.97. Moreover, Chemformer reached a higher round-trip accuracy compared to that of a template-based model. By analyzing multistep retrosynthesis experiments, we observed that Chemformer found synthetic routes, leading to commercial starting materials for 95% of the target compounds, an increase of more than 20% compared to the template-based model on a proprietary compound data set. In addition to this, we discovered that Chemformer suggested novel disconnections corresponding to reaction templates, which are not included in the template-based model. These findings were further supported by a publicly available ChEMBL compound data set. The conclusions drawn from this work allow for the design of a synthesis planning tool where template-based and template-free models work in harmony to optimize retrosynthetic recommendations.
Affiliation(s)
- Annie M Westerlund
- Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
| | - Siva Manohar Koki
- Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
| | - Supriya Kancharla
- Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
| | - Alessandro Tibo
- Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
| | | | - Mikhail Kabeshov
- Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
| | - Rocío Mercado
- Department of Computer Science and Engineering, Chalmers University of Technology, 412 96 Göteborg, Sweden
| | - Samuel Genheden
- Department of Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Mölndal, Sweden
| |
|
32
|
Astero M, Rousu J. Learning symmetry-aware atom mapping in chemical reactions through deep graph matching. J Cheminform 2024; 16:46. [PMID: 38650016 PMCID: PMC11036715 DOI: 10.1186/s13321-024-00841-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2024] [Accepted: 04/07/2024] [Indexed: 04/25/2024] Open
Abstract
Accurate atom mapping, which establishes correspondences between atoms in reactants and products, is a crucial step in analyzing chemical reactions. In this paper, we present a novel end-to-end approach that formulates the atom mapping problem as a deep graph matching task. Our proposed model, AMNet (Atom Matching Network), utilizes molecular graph representations and employs various atom and bond features using graph neural networks to capture the intricate structural characteristics of molecules, ensuring precise atom correspondence predictions. Notably, AMNet incorporates the consideration of molecule symmetry, enhancing accuracy while simultaneously reducing computational complexity. The integration of the Weisfeiler-Lehman isomorphism test for symmetry identification refines the model's predictions. Furthermore, our model maps the entire atom set in a chemical reaction, offering a comprehensive approach beyond focusing solely on the main molecules in reactions. We evaluated AMNet's performance on a subset of USPTO reaction datasets, addressing various tasks, including assessing the impact of molecular symmetry identification, understanding the influence of feature selection on AMNet performance, and comparing its performance with the state-of-the-art method. The result reveals an average accuracy of 97.3% on mapped atoms, with 99.7% of reactions correctly mapped when the correct mapped atom is within the top 10 predicted atoms. Scientific contribution: The paper introduces a novel end-to-end deep graph matching model for atom mapping, utilizing molecular graph representations to capture structural characteristics effectively. It enhances accuracy by integrating symmetry detection through the Weisfeiler-Lehman test, reducing the number of possible mappings and improving efficiency. Unlike previous methods, it maps the entire reaction, not just main components, providing a comprehensive view. Additionally, by integrating efficient graph matching techniques, it reduces computational complexity, making atom mapping more feasible.
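To make the symmetry-identification idea in this abstract concrete, here is a minimal sketch of one-dimensional Weisfeiler-Lehman colour refinement on a molecular graph. It is not AMNet's implementation; the function name `wl_symmetry_classes`, the input format (a list of atom labels plus index pairs for bonds), and the toy molecules are assumptions made for illustration. Atoms that end with the same colour are only candidates for symmetry equivalence, since the WL test is a necessary but not sufficient check.

```python
from collections import defaultdict


def wl_symmetry_classes(atom_labels, bonds, max_rounds=None):
    """1-dimensional Weisfeiler-Lehman colour refinement (illustrative sketch).

    atom_labels: initial label per atom, e.g. element symbols.
    bonds: iterable of (i, j) atom-index pairs, treated as undirected.
    Returns one integer colour per atom; equal colours mark atoms that
    the refinement could not tell apart.
    """
    n = len(atom_labels)
    neighbours = defaultdict(list)
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Start from colours determined by the atom labels alone.
    palette = {lab: k for k, lab in enumerate(sorted(set(atom_labels)))}
    colours = [palette[lab] for lab in atom_labels]

    rounds = n if max_rounds is None else max_rounds
    for _ in range(rounds):
        # Signature = own colour plus the sorted multiset of neighbour colours.
        signatures = [
            (colours[i], tuple(sorted(colours[j] for j in neighbours[i])))
            for i in range(n)
        ]
        palette = {sig: k for k, sig in enumerate(sorted(set(signatures)))}
        new_colours = [palette[sig] for sig in signatures]
        if len(set(new_colours)) == len(set(colours)):
            break  # the partition stopped refining
        colours = new_colours
    return colours


# Heavy-atom skeleton of propane: the two terminal carbons share a colour.
propane = wl_symmetry_classes(["C", "C", "C"], [(0, 1), (1, 2)])
# Heavy-atom skeleton of ethanol: every atom gets its own colour.
ethanol = wl_symmetry_classes(["C", "C", "O"], [(0, 1), (1, 2)])
```

Grouping atoms by colour before matching shrinks the set of mappings that need to be scored, which is the kind of reduction in candidate mappings the abstract attributes to symmetry detection.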
Affiliation(s)
- Maryam Astero
- Computer Science, Aalto University, Konemiehentie 2, 02150, Espoo, Finland.
| | - Juho Rousu
- Computer Science, Aalto University, Konemiehentie 2, 02150, Espoo, Finland.
| |
|
33
|
Ding Y, Qiang B, Chen Q, Liu Y, Zhang L, Liu Z. Exploring Chemical Reaction Space with Machine Learning Models: Representation and Feature Perspective. J Chem Inf Model 2024; 64:2955-2970. [PMID: 38489239 DOI: 10.1021/acs.jcim.4c00004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2024]
Abstract
Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of representations and the design of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce a new frontier in chemical reaction pretraining, holding promise as an innovative yet unexplored avenue.
Affiliation(s)
- Yuheng Ding
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
| | - Bo Qiang
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
| | - Qixuan Chen
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
| | - Yiqiao Liu
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
| | - Liangren Zhang
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
| | - Zhenming Liu
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
| |
|
34
|
Strieth-Kalthoff F, Szymkuć S, Molga K, Aspuru-Guzik A, Glorius F, Grzybowski BA. Artificial Intelligence for Retrosynthetic Planning Needs Both Data and Expert Knowledge. J Am Chem Soc 2024. [PMID: 38598363 DOI: 10.1021/jacs.4c00338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]
Abstract
Rapid advancements in artificial intelligence (AI) have enabled breakthroughs across many scientific disciplines. In organic chemistry, the challenge of planning complex multistep chemical syntheses should conceptually be well-suited for AI. Yet, the development of AI synthesis planners trained solely on reaction-example-data has stagnated and is not on par with the performance of "hybrid" algorithms combining AI with expert knowledge. This Perspective examines possible causes of these shortcomings, extending beyond the established reasoning of insufficient quantities of reaction data. Drawing attention to the intricacies and data biases that are specific to the domain of synthetic chemistry, we advocate augmenting the unique capabilities of AI with the knowledge base and the reasoning strategies of domain experts. By actively involving synthetic chemists, who are the end users of any synthesis planning software, in the development process, we envision bridging the gap between computer algorithms and the intricate nature of chemical synthesis.
Affiliation(s)
- Felix Strieth-Kalthoff
- University of Toronto, Department of Chemistry and Department of Computer Science, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
- University of Toronto, Department of Computer Science, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada
| | - Sara Szymkuć
- Allchemy, 2145 45th Street #201, Highland, Indiana 46322, United States
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
| | - Karol Molga
- Allchemy, 2145 45th Street #201, Highland, Indiana 46322, United States
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
| | - Alán Aspuru-Guzik
- University of Toronto, Department of Chemistry and Department of Computer Science, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
- University of Toronto, Department of Computer Science, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave., Toronto, Ontario M5G 1M1, Canada
- University of Toronto, Department of Chemical Engineering and Applied Chemistry, 200 College St., Toronto, Ontario M5S 3E5, Canada
- University of Toronto, Department of Materials Science and Engineering, 184 College St., Toronto, Ontario M5S 3E4, Canada
| | - Frank Glorius
- Universität Münster, Organisch-Chemisches Institut, Corrensstr. 36, 48149 Münster, Germany
| | - Bartosz A Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, ul. Kasprzaka 44/52, Warsaw 01-224, Poland
- IBS Center for Algorithmic and Robotized Synthesis, CARS, UNIST 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan 689-798, South Korea
- Department of Chemistry, UNIST, 50, UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan 689-798, South Korea
| |
|
35
|
Hartog PBR, Krüger F, Genheden S, Tetko IV. Using test-time augmentation to investigate explainable AI: inconsistencies between method, model and human intuition. J Cheminform 2024; 16:39. [PMID: 38576047 PMCID: PMC10993590 DOI: 10.1186/s13321-024-00824-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 03/09/2024] [Indexed: 04/06/2024] Open
Abstract
Stakeholders of machine learning models desire explainable artificial intelligence (XAI) to produce human-understandable and consistent interpretations. In computational toxicity, augmentation of text-based molecular representations has been used successfully for transfer learning on downstream tasks. Augmentations of molecular representations can also be used at inference to compare differences between multiple representations of the same ground-truth. In this study, we investigate the robustness of eight XAI methods using test-time augmentation for a molecular-representation model in the field of computational toxicity prediction. We report significant differences between explanations for different representations of the same ground-truth, and show that randomized models have similar variance. We hypothesize that text-based molecular representations in this and past research reflect tokenization more than learned parameters. Furthermore, we see a greater variance between in-domain predictions than out-of-domain predictions, indicating XAI measures something other than learned parameters. Finally, we investigate the relative importance given to expert-derived structural alerts and find similar importance given regardless of applicability domain, randomization and varying training procedures. We therefore caution future research against validating its methods using a similar comparison to human intuition without further investigation. SCIENTIFIC CONTRIBUTION: In this research we critically investigate XAI through test-time augmentation, contrasting previous assumptions about using expert validation and showing inconsistencies within models for identical representations. SMILES augmentation has been used to increase model accuracy, but was here adapted from the field of image test-time augmentation to be used as an independent indication of the consistency within SMILES-based molecular representation models.
Affiliation(s)
- Peter B R Hartog
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 83, Mölndal, Sweden.
- Institute of Structural Biology, Helmholtz Munich, Munich, 85764, Germany.
| | - Fabian Krüger
- Institute of Structural Biology, Helmholtz Munich, Munich, 85764, Germany
| | - Samuel Genheden
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 83, Mölndal, Sweden
| | - Igor V Tetko
- Institute of Structural Biology, Helmholtz Munich, Munich, 85764, Germany
| |
|
36
|
Dobbelaere MR, Lengyel I, Stevens CV, Van Geem KM. Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices. J Cheminform 2024; 16:37. [PMID: 38553720 PMCID: PMC10980627 DOI: 10.1186/s13321-024-00834-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 03/23/2024] [Indexed: 04/02/2024] Open
Abstract
The challenge of devising pathways for organic synthesis remains a central issue in the field of medicinal chemistry. Over the span of six decades, computer-aided synthesis planning has given rise to a plethora of potent tools for formulating synthetic routes. Nevertheless, a significant expert task still looms: determining the appropriate solvent, catalyst, and reagents when provided with a set of reactants to achieve and optimize the desired product for a specific step in the synthesis process. Typically, chemists identify key functional groups and rings that exert crucial influences at the reaction center, classify reactions into categories, and may assign them names. This research introduces Rxn-INSIGHT, an open-source algorithm based on the bond-electron matrix approach, with the purpose of automating this endeavor. Rxn-INSIGHT not only streamlines the process but also facilitates extensive querying of reaction databases, effectively replicating the thought processes of an organic chemist. The core functions of the algorithm encompass the classification and naming of reactions, extraction of functional groups, rings, and scaffolds from the involved chemical entities. The provision of reaction condition recommendations based on the similarity and prevalence of reactions eventually arises as a side application. The performance of our rule-based model has been rigorously assessed against a carefully curated benchmark dataset, exhibiting an accuracy rate exceeding 90% in reaction classification and surpassing 95% in reaction naming. Notably, it has been discerned that a pivotal factor in selecting analogous reactions lies in the analysis of ring structures participating in the reactions. An examination of ring structures within the USPTO chemical reaction database reveals that with just 35 unique rings, a remarkable 75% of all rings found in nearly 1 million products can be encompassed. Furthermore, Rxn-INSIGHT is proficient in suggesting appropriate choices for solvents, catalysts, and reagents in entirely novel reactions, all within the span of a second, utilizing nothing more than an everyday laptop.
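The ring-coverage figure quoted in this abstract (35 unique rings accounting for 75% of ring occurrences) is a cumulative-frequency statistic. The sketch below shows how such a number could be computed in principle; it is not taken from Rxn-INSIGHT, the helper name `rings_needed_for_coverage` is hypothetical, and the ring identifiers are toy strings standing in for whatever canonical ring representation a real pipeline would extract.

```python
from collections import Counter


def rings_needed_for_coverage(ring_ids, target=0.75):
    """Return (number of distinct rings, achieved fraction) needed so that
    the most frequent rings cover at least `target` of all occurrences."""
    counts = Counter(ring_ids)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    covered = 0
    # Walk rings from most to least frequent, accumulating their share.
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return rank, covered / total
    return len(counts), 1.0


# Toy data: ten ring occurrences spread over four distinct ring systems.
toy_rings = ["benzene"] * 6 + ["pyridine"] * 2 + ["furan", "thiophene"]
n_rings, fraction = rings_needed_for_coverage(toy_rings, target=0.75)
```

On the toy list, the two most common rings already exceed the 75% threshold, which mirrors the skewed distribution the abstract describes for the much larger USPTO product set.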
Affiliation(s)
- Maarten R Dobbelaere
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
| | - István Lengyel
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium
- ChemInsights LLC, Dover, DE, 19901, USA
| | - Christian V Stevens
- SynBioC Research Group, Department of Green Chemistry and Technology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
| | - Kevin M Van Geem
- Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium.
| |
|
37
|
Chen S, An S, Babazade R, Jung Y. Precise atom-to-atom mapping for organic reactions via human-in-the-loop machine learning. Nat Commun 2024; 15:2250. [PMID: 38480709 PMCID: PMC10937625 DOI: 10.1038/s41467-024-46364-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 02/20/2024] [Indexed: 03/17/2024] Open
Abstract
Atom-to-atom mapping (AAM) is the task of identifying the position of each atom in the molecules before and after a chemical reaction, which is important for understanding the reaction mechanism. As more machine learning (ML) models have recently been developed for retrosynthesis and reaction outcome prediction, the quality of these models has become highly dependent on the quality of the AAM in reaction datasets. Although there are algorithms using graph theory or unsupervised learning to label the AAM for reaction datasets, existing methods map the atoms based on substructure alignments instead of chemistry knowledge. Here, we present LocalMapper, an ML model that learns correct AAM from chemist-labeled reactions via human-in-the-loop machine learning. We show that LocalMapper can predict the AAM for 50 K reactions with 98.5% calibrated accuracy by learning from only 2% of the human-labeled reactions from the entire dataset. More importantly, the confident predictions given by LocalMapper, which cover 97% of 50 K reactions, show 100% accuracy for 3,000 randomly sampled reactions. In an out-of-distribution experiment, LocalMapper shows favorable performance over other existing methods. We expect LocalMapper can be used to generate more precise reaction AAM and improve the quality of future ML-based reaction prediction models.
Affiliation(s)
- Shuan Chen
- Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea
- Department of Chemical and Biological Engineering, Seoul National University, Seoul, South Korea
| | - Sunggi An
- Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea
- Department of Chemical and Biological Engineering, Seoul National University, Seoul, South Korea
| | | | - Yousung Jung
- Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea.
- Department of Chemical and Biological Engineering, Seoul National University, Seoul, South Korea.
- Institute of Chemical Processes, Seoul National University, Seoul, South Korea.
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, South Korea.
| |
|
38
|
Kaufman B, Williams EC, Underkoffler C, Pederson R, Mardirossian N, Watson I, Parkhill J. COATI: Multimodal Contrastive Pretraining for Representing and Traversing Chemical Space. J Chem Inf Model 2024; 64:1145-1157. [PMID: 38316665 DOI: 10.1021/acs.jcim.3c01753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
Creating a successful small molecule drug is a challenging multiparameter optimization problem in an effectively infinite space of possible molecules. Generative models have emerged as powerful tools for traversing data manifolds composed of images, sounds, and text and offer an opportunity to dramatically improve the drug discovery and design process. To create generative optimization methods that are more useful than brute-force molecular generation and filtering via virtual screening, we propose that four integrated features are necessary: large, quantitative data sets of molecular structure and activity, an invertible vector representation of realistic accessible molecules, smooth and differentiable regressors that quantify uncertainty, and algorithms to simultaneously optimize properties of interest. Over the course of 12 months, Terray Therapeutics has collected a data set of 2 billion quantitative binding measurements of small molecules to therapeutic targets, which directly motivates multiparameter generative optimization of molecules conditioned on these data. To this end, we present contrastive optimization for accelerated therapeutic inference (COATI), a pretrained, multimodal encoder-decoder model of druglike chemical space. COATI is constructed without any human biasing of features, using contrastive learning from text and 3D representations of molecules to allow for downstream use with structural models. We demonstrate that COATI possesses many of the desired properties of universal molecular embedding: fixed-dimension, invertibility, autoencoding, accurate regression, and low computation cost. Finally, we present a novel metadynamics algorithm for generative optimization using a small subset of our proprietary data collected for a model protein, carbonic anhydrase, designing molecules that satisfy the multiparameter optimization task of potency, solubility, and drug likeness. This work sets the stage for fully integrated generative molecular design and optimization for small molecules.
Affiliation(s)
- Benjamin Kaufman
- Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
| | - Edward C Williams
- Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
| | - Carl Underkoffler
- Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
| | - Ryan Pederson
- Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
| | - Narbe Mardirossian
- Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
| | - Ian Watson
- Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
| | - John Parkhill
- Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
| |
|
39
|
Zhang D, Wang Z, Oberschelp C, Bradford E, Hellweg S. Enhanced Deep-Learning Model for Carbon Footprints of Chemicals. ACS Sustainable Chemistry & Engineering 2024; 12:2700-2708. [PMID: 38389904 PMCID: PMC10880087 DOI: 10.1021/acssuschemeng.3c07038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 01/17/2024] [Accepted: 01/17/2024] [Indexed: 02/24/2024]
Abstract
Millions of chemicals have been designed; however, their product carbon footprints (PCFs) are largely unknown, leaving questions about their sustainability. This general lack of PCF data is because the data needed for comprehensive environmental analyses are typically not available in the early molecular design stages. Several predictive tools have been developed to estimate the PCF of chemicals, but they are applicable to only a narrow range of common chemicals and have limited predictive ability. Here, we propose FineChem 2, which is based on a novel transformer framework and first-hand industry data, for accurately predicting the PCF of chemicals. Compared to previous tools, FineChem 2 demonstrates significantly better predictive power, and its applicability domains are improved by ∼75% on a diverse set of chemicals on the global market, including the high-production-volume chemicals identified by regulators, daily chemicals, and chemical additives in food and plastics. In addition, through better interpretability from the attention mechanism, FineChem 2 may successfully identify PCF-intensive substructures and critical raw materials of chemicals, providing insights into the design of more sustainable molecules and processes. Therefore, we highlight FineChem 2 for estimating the PCF of chemicals, contributing to advancements in the sustainable transition of the global chemical industry.
Affiliation(s)
- Dachuan Zhang
- National Centre of Competence in Research (NCCR) Catalysis, Ecological Systems Design, Institute of Environmental Engineering, ETH Zürich, Zürich 8093, Switzerland
| | - Zhanyun Wang
- National Centre of Competence in Research (NCCR) Catalysis, Ecological Systems Design, Institute of Environmental Engineering, ETH Zürich, Zürich 8093, Switzerland
- Technology and Society Laboratory, Empa-Swiss Federal Laboratories for Materials Science and Technology, St. Gallen CH-9014, Switzerland
| | - Christopher Oberschelp
- National Centre of Competence in Research (NCCR) Catalysis, Ecological Systems Design, Institute of Environmental Engineering, ETH Zürich, Zürich 8093, Switzerland
| | - Eric Bradford
- National Centre of Competence in Research (NCCR) Catalysis, Ecological Systems Design, Institute of Environmental Engineering, ETH Zürich, Zürich 8093, Switzerland
| | - Stefanie Hellweg
- National Centre of Competence in Research (NCCR) Catalysis, Ecological Systems Design, Institute of Environmental Engineering, ETH Zürich, Zürich 8093, Switzerland
| |
|
40
|
King-Smith E, Faber FA, Reilly U, Sinitskiy AV, Yang Q, Liu B, Hyek D, Lee AA. Predictive Minisci late stage functionalization with transfer learning. Nat Commun 2024; 15:426. [PMID: 38225239 PMCID: PMC10789750 DOI: 10.1038/s41467-023-42145-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 10/01/2023] [Indexed: 01/17/2024] Open
Abstract
Structural diversification of lead molecules is a key component of drug discovery to explore chemical space. Late-stage functionalizations (LSFs) are versatile methodologies capable of installing functional handles on richly decorated intermediates to deliver numerous diverse products in a single reaction. Predicting the regioselectivity of LSF is still an open challenge in the field. Numerous efforts from chemoinformatics and machine learning (ML) groups have made strides in this area. However, it is arduous to isolate and characterize the multitude of LSF products generated, limiting available data and hindering pure ML approaches. We report the development of an approach that combines a message passing neural network and 13C NMR-based transfer learning to predict the atom-wise probabilities of functionalization for Minisci and P450-based functionalizations. We validated our model both retrospectively and with a series of prospective experiments, showing that it accurately predicts the outcomes of Minisci-type and P450 transformations and outperforms the well-established Fukui-based reactivity indices and other machine learning reactivity-based algorithms.
Affiliation(s)
- Emma King-Smith
- Cavendish Laboratory, University of Cambridge, Cambridge, UK
| | - Felix A Faber
- Cavendish Laboratory, University of Cambridge, Cambridge, UK
| | - Usa Reilly
- Development & Medical, Pfizer Worldwide Research, Groton, CT, USA
| | - Anton V Sinitskiy
- Machine Learning Computational Sciences, Pfizer Worldwide Research, Cambridge, MA, USA
| | - Qingyi Yang
- Development & Medical, Pfizer Worldwide Research, Cambridge, MA, USA
| | - Bo Liu
- Spectrix Analytic Services, LLC., North Haven, CT, USA
| | - Dennis Hyek
- Spectrix Analytic Services, LLC., North Haven, CT, USA
| | - Alpha A Lee
- Cavendish Laboratory, University of Cambridge, Cambridge, UK.
| |
|
41
|
Voinarovska V, Kabeshov M, Dudenko D, Genheden S, Tetko IV. When Yield Prediction Does Not Yield Prediction: An Overview of the Current Challenges. J Chem Inf Model 2024; 64:42-56. [PMID: 38116926 PMCID: PMC10778086 DOI: 10.1021/acs.jcim.3c01524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 12/21/2023]
Abstract
Machine Learning (ML) techniques face significant challenges when predicting advanced chemical properties, such as yield, feasibility of chemical synthesis, and optimal reaction conditions. These challenges stem from the high-dimensional nature of the prediction task and the myriad essential variables involved, ranging from reactants and reagents to catalysts, temperature, and purification processes. Successfully developing a reliable predictive model not only holds the potential for optimizing high-throughput experiments but can also elevate existing retrosynthetic predictive approaches and bolster a plethora of applications within the field. In this review, we systematically evaluate the efficacy of current ML methodologies in chemoinformatics, shedding light on their milestones and inherent limitations. Additionally, a detailed examination of a representative case study provides insights into the prevailing issues related to data availability and transferability in the discipline.
Affiliation(s)
- Varvara Voinarovska
- Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- TUM Graduate School, Faculty of Chemistry, Technical University of Munich, 85748 Garching, Germany
| | - Mikhail Kabeshov
- Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
| | - Dmytro Dudenko
- Enamine Ltd., 78 Chervonotkatska str., 02094 Kyiv, Ukraine
| | - Samuel Genheden
- Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
| | - Igor V. Tetko
- Molecular Targets and Therapeutics Center, Helmholtz Munich − Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Institute of Structural Biology, 85764 Neuherberg, Germany
| |
|
42
|
Heid E, Probst D, Green WH, Madsen GKH. EnzymeMap: curation, validation and data-driven prediction of enzymatic reactions. Chem Sci 2023; 14:14229-14242. [PMID: 38098707 PMCID: PMC10718068 DOI: 10.1039/d3sc02048g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 11/21/2023] [Indexed: 12/17/2023] Open
Abstract
Enzymatic reactions are an ecofriendly, selective, and versatile addition, and sometimes even an alternative, to organic reactions for the synthesis of chemical compounds such as pharmaceuticals or fine chemicals. To identify suitable reactions, computational models to predict the activity of enzymes on non-native substrates, to perform retrosynthetic pathway searches, or to predict the outcomes of reactions including regio- and stereoselectivity are becoming increasingly important. However, current approaches are substantially hindered by the limited amount of available data, especially if balanced and atom-mapped reactions are needed and if the models feature machine learning components. We therefore constructed a high-quality dataset (EnzymeMap) by developing a large set of correction and validation algorithms for recorded reactions in the literature and showcase its significant positive impact on machine learning models of retrosynthesis, forward prediction, and regioselectivity prediction, outperforming previous approaches by a large margin. Our dataset allows for deep learning models of enzymatic reactions with unprecedented accuracy, and is freely available online.
Affiliation(s)
- Esther Heid
- Institute of Materials Chemistry, TU Wien 1060 Vienna Austria
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
| | | | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
| | | |
|
43
|
Kim GB, Kim JY, Lee JA, Norsigian CJ, Palsson BO, Lee SY. Functional annotation of enzyme-encoding genes using deep learning with transformer layers. Nat Commun 2023; 14:7370. [PMID: 37963869 PMCID: PMC10645960 DOI: 10.1038/s41467-023-43216-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 07/28/2023] [Accepted: 11/03/2023] [Indexed: 11/16/2023] Open
Abstract
Functional annotation of open reading frames in microbial genomes remains substantially incomplete. Enzymes constitute the most prevalent functional gene class in microbial genomes and can be described by their specific catalytic functions using the Enzyme Commission (EC) number. Consequently, the ability to predict EC numbers could substantially reduce the number of un-annotated genes. Here we present a deep learning model, DeepECtransformer, which utilizes transformer layers as a neural network architecture to predict EC numbers. Using the extensively studied Escherichia coli K-12 MG1655 genome, DeepECtransformer predicted EC numbers for 464 un-annotated genes. We experimentally validated the enzymatic activities predicted for three proteins (YgfF, YciO, and YjdM). Further examination of the neural network's reasoning process revealed that the trained neural network relies on functional motifs of enzymes to predict EC numbers. Thus, DeepECtransformer is a method that facilitates the functional annotation of uncharacterized genes.
Affiliation(s)
- Gi Bae Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
| | - Ji Yeon Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
| | - Jong An Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
| | - Charles J Norsigian
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | - Bernhard O Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA
- Novo Nordisk Foundation Center for Biosustainability, 2800, Kongens Lyngby, Denmark
| | - Sang Yup Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea.
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea.
- BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141, Republic of Korea.
| |
|
44
|
Toniato A, Vaucher AC, Lehmann MM, Luksch T, Schwaller P, Stenta M, Laino T. Fast Customization of Chemical Language Models to Out-of-Distribution Data Sets. Chem Mater 2023; 35:8806-8815. [PMID: 38027545 PMCID: PMC10653079 DOI: 10.1021/acs.chemmater.3c01406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 10/09/2023] [Accepted: 10/09/2023] [Indexed: 12/01/2023]
Abstract
The world is on the verge of a new industrial revolution, and language models are poised to play a pivotal role in this transformative era. Their ability to offer intelligent insights and forecasts has made them a valuable asset for businesses seeking a competitive advantage. The chemical industry, in particular, can benefit significantly from harnessing their power. Since as early as 2016, language models have been applied to tasks such as predicting reaction outcomes or retrosynthetic routes. While such models have demonstrated impressive abilities, the lack of publicly available data sets with universal coverage is often the limiting factor for achieving even higher accuracies. This makes it imperative for organizations to incorporate proprietary data sets into their model training processes to improve their performance. So far, however, these data sets frequently remain untapped as there are no established criteria for model customization. In this work, we report a successful methodology for retraining language models on reaction outcome prediction and single-step retrosynthesis tasks, using proprietary, nonpublic data sets. We report a considerable boost in accuracy by combining patent and proprietary data in a multidomain learning formulation. This exercise, inspired by a real-world use case, enables us to formulate guidelines that can be adopted in different corporate settings to customize chemical language models easily.
Affiliation(s)
- Alessandra Toniato
- IBM Research Europe, Rüschlikon 8803, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
| | - Alain C. Vaucher
- IBM Research Europe, Rüschlikon 8803, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
| | | | | | - Philippe Schwaller
- IBM Research Europe, Rüschlikon 8803, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
| | - Marco Stenta
- Syngenta Crop Protection AG, Stein 4332, Switzerland
| | - Teodoro Laino
- IBM Research Europe, Rüschlikon 8803, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), 8093 Zürich, Switzerland
| |
|
45
|
Ryu G, Kim GB, Yu T, Lee SY. Deep learning for metabolic pathway design. Metab Eng 2023; 80:130-141. [PMID: 37734652 DOI: 10.1016/j.ymben.2023.09.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 09/17/2023] [Accepted: 09/19/2023] [Indexed: 09/23/2023]
Abstract
The establishment of a bio-based circular economy is imperative in tackling the climate crisis and advancing sustainable development. In this realm, the creation of microbial cell factories is central to generating a variety of chemicals and materials. The design of metabolic pathways is crucial in shaping these microbial cell factories, especially when it comes to producing chemicals with yet-to-be-discovered biosynthetic routes. To aid in navigating the complexities of chemical and metabolic domains, computer-supported tools for metabolic pathway design have emerged. In this paper, we evaluate how digital strategies can be employed for pathway prediction and enzyme discovery. Additionally, we touch upon the recent strides made in using deep learning techniques for metabolic pathway prediction. These computational tools and strategies streamline the design of metabolic pathways, facilitating the development of microbial cell factories. Leveraging the capabilities of deep learning in metabolic pathway design is profoundly promising, potentially hastening the advent of a bio-based circular economy.
Affiliation(s)
- Gahyeon Ryu
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea
| | - Gi Bae Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea
| | - Taeho Yu
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea
| | - Sang Yup Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, KAIST, Daejeon, 34141, Republic of Korea; BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141, Republic of Korea; Graduate School of Engineering Biology, KAIST, Daejeon, 34141, Republic of Korea.
| |
|
46
|
Wang X, Hsieh CY, Yin X, Wang J, Li Y, Deng Y, Jiang D, Wu Z, Du H, Chen H, Li Y, Liu H, Wang Y, Luo P, Hou T, Yao X. Generic Interpretable Reaction Condition Predictions with Open Reaction Condition Datasets and Unsupervised Learning of Reaction Center. Research (Wash D C) 2023; 6:0231. [PMID: 37849643 PMCID: PMC10578430 DOI: 10.34133/research.0231] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/01/2023] [Accepted: 08/29/2023] [Indexed: 10/19/2023]
Abstract
Effective synthesis planning powered by deep learning (DL) can significantly accelerate the discovery of new drugs and materials. However, most DL-assisted synthesis planning methods offer little or no capability to recommend suitable reaction conditions (RCs) for their reaction predictions. Currently, the prediction of RCs with a DL framework is hindered by several factors, including: (a) lack of a standardized dataset for benchmarking, (b) lack of a general prediction model with powerful representation, and (c) lack of interpretability. To address these issues, we first created 2 standardized RC datasets covering a broad range of reaction classes and then proposed a powerful and interpretable Transformer-based RC predictor named Parrot. Through careful design of the model architecture, pretraining method, and training strategy, Parrot improved the overall top-3 prediction accuracy on catalysts, solvents, and other reagents by as much as 13.44%, compared to the best previous model on a newly curated dataset. Additionally, the mean absolute error of the predicted temperatures was reduced by about 4 °C. Furthermore, Parrot manifests strong generalization capacity with superior cross-chemical-space prediction accuracy. Attention analysis indicates that Parrot effectively captures crucial chemical information and exhibits a high level of interpretability in the prediction of RCs. The proposed model Parrot exemplifies how a modern neural network architecture, when appropriately pretrained, can be versatile in making reliable, generalizable, and interpretable recommendations for RCs even when the underlying training dataset may still be limited in diversity.
Affiliation(s)
- Xiaorui Wang
- Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Xiaodan Yin
- Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Yuquan Li
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, 730000, China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
| | - Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Hongming Chen
- Center of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou 510530, China
| | - Yun Li
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, 730000, China
| | - Huanxiang Liu
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
| | - Yuwei Wang
- College of Pharmacy, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi, 712044, China
| | - Pei Luo
- Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Xiaojun Yao
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
| |
|
47
|
Schrier J, Norquist AJ, Buonassisi T, Brgoch J. In Pursuit of the Exceptional: Research Directions for Machine Learning in Chemical and Materials Science. J Am Chem Soc 2023; 145:21699-21716. [PMID: 37754929 DOI: 10.1021/jacs.3c04783] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 09/28/2023]
Abstract
Exceptional molecules and materials with one or more extraordinary properties are both technologically valuable and fundamentally interesting, because they often involve new physical phenomena or new compositions that defy expectations. Historically, exceptionality has been achieved through serendipity, but recently, machine learning (ML) and automated experimentation have been widely proposed to accelerate target identification and synthesis planning. In this Perspective, we argue that the data-driven methods commonly used today are well-suited for optimization but not for the realization of new exceptional materials or molecules. Finding such outliers should be possible using ML, but only by shifting away from using traditional ML approaches that tweak the composition, crystal structure, or reaction pathway. We highlight case studies of high-Tc oxide superconductors and superhard materials to demonstrate the challenges of ML-guided discovery and discuss the limitations of automation for this task. We then provide six recommendations for the development of ML methods capable of exceptional materials discovery: (i) Avoid the tyranny of the middle and focus on extrema; (ii) When data are limited, qualitative predictions that provide direction are more valuable than interpolative accuracy; (iii) Sample what can be made and how to make it and defer optimization; (iv) Create room (and look) for the unexpected while pursuing your goal; (v) Try to fill-in-the-blanks of input and output space; (vi) Do not confuse human understanding with model interpretability. We conclude with a description of how these recommendations can be integrated into automated discovery workflows, which should enable the discovery of exceptional molecules and materials.
Affiliation(s)
- Joshua Schrier
- Department of Chemistry, Fordham University, The Bronx, New York 10458, United States
| | - Alexander J Norquist
- Department of Chemistry, Haverford College, Haverford, Pennsylvania 19041, United States
| | - Tonio Buonassisi
- Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Jakoah Brgoch
- Department of Chemistry and Texas Center for Superconductivity, University of Houston, Houston, Texas 77204, United States
| |
|
48
|
Orsi M, Probst D, Schwaller P, Reymond JL. Alchemical analysis of FDA approved drugs. Digit Discov 2023; 2:1289-1296. [PMID: 38013905 PMCID: PMC10561545 DOI: 10.1039/d3dd00039g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 08/29/2023] [Indexed: 11/29/2023]
Abstract
Chemical space maps help visualize similarities within molecular sets. However, there are many different molecular similarity measures resulting in a confusing number of possible comparisons. To overcome this limitation, we exploit the fact that tools designed for reaction informatics also work for alchemical processes that do not obey Lavoisier's principle, such as the transmutation of lead into gold. We start by using the differential reaction fingerprint (DRFP) to create tree-maps (TMAPs) representing the chemical space of pairs of drugs selected as being similar according to various molecular fingerprints. We then use the Transformer-based RXNMapper model to understand structural relationships between drugs, and its confidence score to distinguish between pairs related by chemically feasible transformations and pairs related by alchemical transmutations. This analysis reveals a diversity of structural similarity relationships that are otherwise difficult to analyze simultaneously. We exemplify this approach by visualizing FDA-approved drugs, EGFR inhibitors, and polymyxin B analogs.
Affiliation(s)
- Markus Orsi
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Daniel Probst
- Ecole Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | | | - Jean-Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| |
|
49
|
Yan Y, Zhao Y, Yao H, Feng J, Liang L, Han W, Xu X, Pu C, Zang C, Chen L, Li Y, Liu H, Lu T, Chen Y, Zhang Y. RPBP: Deep Retrosynthesis Reaction Prediction Based on Byproducts. J Chem Inf Model 2023; 63:5956-5970. [PMID: 37724339 DOI: 10.1021/acs.jcim.3c00274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/20/2023]
Abstract
Retrosynthesis prediction is crucial in organic synthesis and drug discovery, aiding chemists in designing efficient synthetic routes for target molecules. Data-driven deep retrosynthesis prediction has gained importance due to new algorithms and enhanced computing power. Although existing models show certain predictive power on the USPTO-50K benchmark data set, none considers the effects of byproducts during the prediction process, which may be due to the lack of byproduct information in the benchmark data set. Here, we propose a novel two-stage retrosynthesis reaction prediction framework based on byproducts called RPBP. First, RPBP predicts the byproduct involved in the reaction based on the product molecule. Then, it predicts the reactants end-to-end from the product and the predicted byproduct. Unlike other methods that first identify the potential reaction center and then predict reactant molecules, RPBP considers additional information from byproducts, such as reaction reagents, conditions, and sites. Interestingly, adding byproducts reduces model learning complexity in natural language processing (NLP). Our RPBP model achieves 54.7% and 66.6% top-1 retrosynthesis prediction accuracy when the reaction class is unknown and known, respectively. It outperforms existing methods for known-class reactions, thanks to the rich chemical information in byproducts. The prediction of four kinase drugs from the literature demonstrates the model's practicality and potential to accelerate drug discovery.
Affiliation(s)
- Yingchao Yan
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Yang Zhao
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Huifeng Yao
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Jie Feng
- State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing 210009, China
| | - Li Liang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Weijie Han
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Xiaohe Xu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Chengtao Pu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Chengdong Zang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Lingfeng Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Yuanyuan Li
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Tao Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing 210009, China
| | - Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| |
|
50
|
Kreutter D, Reymond JL. Multistep retrosynthesis combining a disconnection aware triple transformer loop with a route penalty score guided tree search. Chem Sci 2023; 14:9959-9969. [PMID: 37736648 PMCID: PMC10510629 DOI: 10.1039/d3sc01604h] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/27/2023] [Accepted: 08/30/2023] [Indexed: 09/23/2023] Open
Abstract
Computer-aided synthesis planning (CASP) aims to automatically learn organic reactivity from literature and perform retrosynthesis of unseen molecules. CASP systems must learn reactions sufficiently precisely to propose realistic disconnections, while avoiding overfitting to leave room for diverse options, and explore possible routes so as to allow short synthetic sequences to emerge. Herein we report an open-source CASP tool proposing original solutions to both challenges. First, we use a triple transformer loop (TTL) predicting starting materials (T1), reagents (T2), and products (T3) to explore various disconnection sites defined by combining systematic, template-based, and transformer-based tagging procedures. Second, we integrate TTL into a multistep tree search algorithm (TTLA) prioritizing sequences using a route penalty score (RPScore) considering the number of steps, their confidence score, and the simplicity of all intermediates along the route. Our approach favours short synthetic routes to commercial starting materials, as exemplified by retrosynthetic analyses of recently approved drugs.
Affiliation(s)
- David Kreutter
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| |
|