1
van Gerwen P, Briling KR, Calvino Alonso Y, Franke M, Corminboeuf C. Benchmarking machine-readable vectors of chemical reactions on computed activation barriers. Digital Discovery 2024; 3:932-943. [PMID: 38756222 PMCID: PMC11094696 DOI: 10.1039/d3dd00175j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 02/28/2024] [Indexed: 05/18/2024]
Abstract
In recent years, there has been a surge of interest in predicting computed activation barriers in order to accelerate the automated exploration of reaction networks. Consequently, various predictive approaches have emerged, ranging from graph-based models to methods based on the three-dimensional structure of reactants and products. In tandem, many representations have been developed to predict experimental targets, which may hold promise for barrier prediction as well. Here, we bring together all of these efforts and benchmark various methods (Morgan fingerprints, the DRFP, the CGR representation-based Chemprop, SLATM_d, B²R²_l, EquiReact and the language model BERT + RXNFP) for the prediction of computed activation barriers on three diverse datasets.
Affiliation(s)
- Puck van Gerwen
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Ksenia R Briling
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Yannick Calvino Alonso
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Malte Franke
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
2
Astero M, Rousu J. Learning symmetry-aware atom mapping in chemical reactions through deep graph matching. J Cheminform 2024; 16:46. [PMID: 38650016 PMCID: PMC11036715 DOI: 10.1186/s13321-024-00841-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 04/07/2024] [Indexed: 04/25/2024]
Abstract
Accurate atom mapping, which establishes correspondences between atoms in reactants and products, is a crucial step in analyzing chemical reactions. In this paper, we present a novel end-to-end approach that formulates the atom mapping problem as a deep graph matching task. Our proposed model, AMNet (Atom Matching Network), utilizes molecular graph representations and employs various atom and bond features using graph neural networks to capture the intricate structural characteristics of molecules, ensuring precise atom correspondence predictions. Notably, AMNet accounts for molecular symmetry, enhancing accuracy while simultaneously reducing computational complexity. The integration of the Weisfeiler-Lehman isomorphism test for symmetry identification refines the model's predictions. Furthermore, our model maps the entire atom set in a chemical reaction, offering a comprehensive approach beyond focusing solely on the main molecules in reactions. We evaluated AMNet's performance on a subset of USPTO reaction datasets, addressing various tasks, including assessing the impact of molecular symmetry identification, understanding the influence of feature selection on AMNet performance, and comparing its performance with the state-of-the-art method. The results show an average accuracy of 97.3% on mapped atoms, with 99.7% of reactions correctly mapped when the correct mapped atom is within the top 10 predicted atoms.
Scientific contribution: The paper introduces a novel end-to-end deep graph matching model for atom mapping, utilizing molecular graph representations to capture structural characteristics effectively. It enhances accuracy by integrating symmetry detection through the Weisfeiler-Lehman test, reducing the number of possible mappings and improving efficiency. Unlike previous methods, it maps the entire reaction, not just main components, providing a comprehensive view. Additionally, by integrating efficient graph matching techniques, it reduces computational complexity, making atom mapping more feasible.
Affiliation(s)
- Maryam Astero
- Computer Science, Aalto University, Konemiehentie 2, 02150, Espoo, Finland.
- Juho Rousu
- Computer Science, Aalto University, Konemiehentie 2, 02150, Espoo, Finland.
3
Chen S, An S, Babazade R, Jung Y. Precise atom-to-atom mapping for organic reactions via human-in-the-loop machine learning. Nat Commun 2024; 15:2250. [PMID: 38480709 PMCID: PMC10937625 DOI: 10.1038/s41467-024-46364-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 02/20/2024] [Indexed: 03/17/2024]
Abstract
Atom-to-atom mapping (AAM) is the task of identifying the position of each atom in the molecules before and after a chemical reaction, which is important for understanding the reaction mechanism. As more machine learning (ML) models are developed for retrosynthesis and reaction outcome prediction, the quality of these models depends strongly on the quality of the AAM in reaction datasets. Although there are algorithms that use graph theory or unsupervised learning to label the AAM for reaction datasets, existing methods map the atoms based on substructure alignments instead of chemistry knowledge. Here, we present LocalMapper, an ML model that learns correct AAM from chemist-labeled reactions via human-in-the-loop machine learning. We show that LocalMapper can predict the AAM for 50K reactions with 98.5% calibrated accuracy by learning from only 2% of the human-labeled reactions from the entire dataset. More importantly, the confident predictions given by LocalMapper, which cover 97% of the 50K reactions, show 100% accuracy for 3,000 randomly sampled reactions. In an out-of-distribution experiment, LocalMapper shows favorable performance over other existing methods. We expect that LocalMapper can be used to generate more precise reaction AAM and improve the quality of future ML-based reaction prediction models.
Affiliation(s)
- Shuan Chen
- Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea
- Department of Chemical and Biological Engineering, Seoul National University, Seoul, South Korea
- Sunggi An
- Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea
- Department of Chemical and Biological Engineering, Seoul National University, Seoul, South Korea
- Yousung Jung
- Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea.
- Department of Chemical and Biological Engineering, Seoul National University, Seoul, South Korea.
- Institute of Chemical Processes, Seoul National University, Seoul, South Korea.
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, South Korea.
4
Schwaller P, Hoover B, Reymond JL, Strobelt H, Laino T. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Science Advances 2021; 7(15):eabe4166. [PMID: 33827815 PMCID: PMC8026122 DOI: 10.1126/sciadv.abe4166] [Citation(s) in RCA: 62] [Impact Index Per Article: 20.7] [Received: 08/20/2020] [Accepted: 02/03/2021] [Indexed: 05/07/2023]
Abstract
Humans use different domain languages to represent, explore, and communicate scientific concepts. During the last few hundred years, chemists compiled the language of chemical synthesis inferring a series of "reaction rules" from knowing how atoms rearrange during a chemical transformation, a process called atom-mapping. Atom-mapping is a laborious experimental task and, when tackled with computational methods, requires continuous annotation of chemical reactions and the extension of logically consistent directives. Here, we demonstrate that Transformer Neural Networks learn atom-mapping information between products and reactants without supervision or human labeling. Using the Transformer attention weights, we build a chemically agnostic, attention-guided reaction mapper and extract coherent chemical grammar from unannotated sets of reactions. Our method shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom-mapping. It provides the missing link between data-driven and rule-based approaches for numerous chemical reaction tasks.
Affiliation(s)
- Philippe Schwaller
- IBM Research Europe, CH-8803 Rüschlikon, Switzerland.
- Department of Chemistry and Biochemistry, University of Bern, Switzerland
- Benjamin Hoover
- MIT-IBM Watson AI Lab, IBM Research Cambridge, Cambridge, MA 02142, USA
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Switzerland
- Hendrik Strobelt
- MIT-IBM Watson AI Lab, IBM Research Cambridge, Cambridge, MA 02142, USA
- Teodoro Laino
- IBM Research Europe, CH-8803 Rüschlikon, Switzerland
5
Automatic mapping of atoms across both simple and complex chemical reactions. Nat Commun 2019; 10:1434. [PMID: 30926819 PMCID: PMC6441094 DOI: 10.1038/s41467-019-09440-2] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 08/12/2018] [Accepted: 03/01/2019] [Indexed: 11/08/2022]
Abstract
Mapping atoms across chemical reactions is important for substructure searches, automatic extraction of reaction rules, identification of metabolic pathways, and more. Unfortunately, the existing mapping algorithms can deal adequately only with relatively simple reactions but not those in which expert chemists would benefit from a computer's help. Here we report how a combination of algorithmics and expert chemical knowledge significantly improves the performance of atom mapping, allowing the machine to deal with even the most mechanistically complex chemical and biochemical transformations. The key feature of our approach is the use of a few judiciously chosen reaction templates that generate plausible "intermediate" atom assignments, which then guide a graph-theoretical algorithm towards the chemically correct isomorphic mappings. The algorithm performs significantly better than the available state-of-the-art reaction mappers, suggesting its use in database curation, mechanism assignments, and - above all - machine extraction of reaction rules underlying modern synthesis-planning programs.
6
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Affiliation(s)
- Martin Krallinger
- Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
- Obdulia Rabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Anália Lourenço
- ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain
- Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain
- CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Julen Oyarzabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Alfonso Valencia
- Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain
- Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/ Baldiri Reixac 10, Barcelona E-08028, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
7
Hadadi N, Hafner J, Soh KC, Hatzimanikatis V. Reconstruction of biological pathways and metabolic networks from in silico labeled metabolites. Biotechnol J 2017; 12. [DOI: 10.1002/biot.201600464] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/04/2016] [Revised: 11/21/2016] [Accepted: 11/28/2016] [Indexed: 12/13/2022]
Affiliation(s)
- Noushin Hadadi
- Laboratory of Computational Systems Biotechnology (LCSB); Swiss Federal Institute of Technology (EPFL); Lausanne Switzerland
- Jasmin Hafner
- Laboratory of Computational Systems Biotechnology (LCSB); Swiss Federal Institute of Technology (EPFL); Lausanne Switzerland
- Keng Cher Soh
- Laboratory of Computational Systems Biotechnology (LCSB); Swiss Federal Institute of Technology (EPFL); Lausanne Switzerland
- Vassily Hatzimanikatis
- Laboratory of Computational Systems Biotechnology (LCSB); Swiss Federal Institute of Technology (EPFL); Lausanne Switzerland
8
Lynch MF, Willett P. Information retrieval research in the Department of Information Studies, University of Sheffield: 1965-1985. J Inf Sci 2016. [DOI: 10.1177/016555158701300405] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/15/2022]
Abstract
This paper discusses research which was carried out at the Department of Information Studies, University of Sheffield in the period 1965 to 1985 into storage and retrieval techniques for databases of textual and chemical structure data. The research includes the development of methods for the automatic production of printed subject indexes and for the indexing and retrieval of chemical structures and chemical reactions, the variety generation method for the analysis, characterization and storage of data in a range of types of textual database, the prediction of biological activity in chemical compounds, and the design of document retrieval systems.
Affiliation(s)
- Michael F. Lynch
- Department of Information Studies, The University of Sheffield, Western Bank, Sheffield S10 2TN, United Kingdom
- P. Willett
- Department of Information Studies, The University of Sheffield, Western Bank, Sheffield S10 2TN, United Kingdom
9
Mann M, Nahar F, Schnorr N, Backofen R, Stadler PF, Flamm C. Atom mapping with constraint programming. Algorithms Mol Biol 2014; 9:23. [PMID: 25484913 PMCID: PMC4256833 DOI: 10.1186/s13015-014-0023-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 04/17/2014] [Accepted: 10/30/2014] [Indexed: 11/18/2022]
Abstract
Chemical reactions are rearrangements of chemical bonds. Each atom in an educt molecule thus appears again in a specific position of one of the reaction products. This bijection between educt and product atoms is not reported by chemical reaction databases, however, so that the “Atom Mapping Problem” of finding this bijection is left as an important computational task for many practical applications in computational chemistry and systems biology. Elementary chemical reactions feature a cyclic imaginary transition state (ITS) that imposes additional restrictions on the bijection between educt and product atoms that are not taken into account by previous approaches. We demonstrate that Constraint Programming is well-suited to solving the Atom Mapping Problem in this setting. The performance of our approach is evaluated for a manually curated subset of chemical reactions from the KEGG database featuring various ITS cycle layouts and reaction mechanisms.
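As a rough illustration of the bijection this abstract describes, the following Python sketch checks that a candidate mapping is an element-preserving bijection and lists the bonds it breaks and forms. The toy reaction (H2 + Cl2 -> 2 HCl), the atom indexing, and the helper names `is_valid_mapping` and `bond_changes` are assumptions made for this example; the sketch does not implement the constraint-programming model of the paper.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Bond = FrozenSet[int]


def is_valid_mapping(mapping: Dict[int, int],
                     educt_elements: List[str],
                     product_elements: List[str]) -> bool:
    """Return True if `mapping` is a bijection that preserves element types."""
    n = len(educt_elements)
    if n != len(product_elements):
        return False
    # Every educt atom must be mapped, and every product atom must be hit once.
    if set(mapping.keys()) != set(range(n)):
        return False
    if set(mapping.values()) != set(range(n)):
        return False
    return all(educt_elements[i] == product_elements[j]
               for i, j in mapping.items())


def bond_changes(mapping: Dict[int, int],
                 educt_bonds: List[Tuple[int, int]],
                 product_bonds: List[Tuple[int, int]]) -> Tuple[Set[Bond], Set[Bond]]:
    """Return (broken, formed) bonds, both expressed in product atom indices."""
    mapped = {frozenset((mapping[a], mapping[b])) for a, b in educt_bonds}
    product = {frozenset(b) for b in product_bonds}
    return mapped - product, product - mapped


# Hypothetical toy reaction: H-H + Cl-Cl -> H-Cl + H-Cl
educt_elements = ["H", "H", "Cl", "Cl"]
educt_bonds = [(0, 1), (2, 3)]
product_elements = ["H", "Cl", "H", "Cl"]
product_bonds = [(0, 1), (2, 3)]

# Candidate mapping: educt atom index -> product atom index
mapping = {0: 0, 1: 2, 2: 1, 3: 3}

valid = is_valid_mapping(mapping, educt_elements, product_elements)
broken, formed = bond_changes(mapping, educt_bonds, product_bonds)
print(valid, sorted(map(sorted, broken)), sorted(map(sorted, formed)))
```

In this toy case the two broken and two formed bonds alternate around a four-membered cycle (0-2-3-1-0 in product numbering), a simple instance of the cyclic pattern the abstract refers to.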
10
Warr WA. A Short Review of Chemical Reaction Database Systems, Computer-Aided Synthesis Design, Reaction Prediction and Synthetic Feasibility. Mol Inform 2014; 33:469-76. [DOI: 10.1002/minf.201400052] [Citation(s) in RCA: 85] [Impact Index Per Article: 8.5] [Received: 04/06/2014] [Accepted: 04/22/2014] [Indexed: 11/09/2022]
11
Kraut H, Eiblmaier J, Grethe G, Löw P, Matuszczyk H, Saller H. Algorithm for Reaction Classification. J Chem Inf Model 2013; 53:2884-95. [DOI: 10.1021/ci400442f] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 01/01/2023]
Affiliation(s)
- Hans Kraut
- InfoChem GmbH, Landsberger Strasse 408/V, D-81241, Munich, Bavaria, Germany
- Josef Eiblmaier
- InfoChem GmbH, Landsberger Strasse 408/V, D-81241, Munich, Bavaria, Germany
- Guenter Grethe
- 352 Channing Way, Alameda, California 94502-7409, United States
- Peter Löw
- InfoChem GmbH, Landsberger Strasse 408/V, D-81241, Munich, Bavaria, Germany
- Heinz Matuszczyk
- InfoChem GmbH, Landsberger Strasse 408/V, D-81241, Munich, Bavaria, Germany
- Heinz Saller
- InfoChem GmbH, Landsberger Strasse 408/V, D-81241, Munich, Bavaria, Germany
12
Chen WL, Chen DZ, Taylor KT. Automatic reaction mapping and reaction center detection. Wiley Interdisciplinary Reviews: Computational Molecular Science 2013. [DOI: 10.1002/wcms.1140] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Indexed: 01/06/2023]
13
First EL, Gounaris CE, Floudas CA. Stereochemically Consistent Reaction Mapping and Identification of Multiple Reaction Mechanisms through Integer Linear Optimization. J Chem Inf Model 2011; 52:84-92. [DOI: 10.1021/ci200351b] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 11/29/2022]
Affiliation(s)
- Eric L. First
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Chrysanthos E. Gounaris
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Christodoulos A. Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
14
Warr WA. Representation of chemical structures. Wiley Interdisciplinary Reviews: Computational Molecular Science 2011. [DOI: 10.1002/wcms.36] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Indexed: 11/09/2022]
15
Dogane I, Takabatake T, Bersohn M. Computer-executed synthesis planning, a progress report. 2010. [DOI: 10.1002/recl.19921110606] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/06/2022]
16
Abstract
Automated reaction mapping is a fundamental first step in the analysis of chemical reactions and opens the door to the development of sophisticated chemical kinetic tools. This article formulates the reaction mapping problem as an optimization problem. The problem is shown to be NP-Complete for general graphs. Five algorithms based on canonical graph naming and enumerative combinatoric techniques are developed to solve the problem. Unlike previous formulations based on limited configurations or classifications, our algorithms are uniquely capable of mapping any reaction that can be represented as a set of chemical graphs optimally. This is due to the direct use of Graph Isomorphism as the basis for these algorithms as opposed to the more commonly used Maximum Common Subgraph. Experimental results on chemical and biological reaction databases demonstrate the efficiency of our algorithms.
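To make the optimization view concrete, here is a minimal brute-force sketch in Python. It assumes the objective is the number of bonds broken or formed, which the abstract does not state, and it simply enumerates all element-preserving bijections; it is not one of the five algorithms the abstract mentions, and the function names and the toy reaction are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Set, FrozenSet, Tuple

Bond = FrozenSet[int]


def mapping_cost(perm: Tuple[int, ...],
                 educt_bonds: List[Tuple[int, int]],
                 product_bonds: Set[Bond]) -> int:
    """Count bonds broken plus bonds formed under the mapping i -> perm[i]."""
    mapped = {frozenset((perm[a], perm[b])) for a, b in educt_bonds}
    return len(mapped ^ product_bonds)  # symmetric difference


def best_mapping(educt_elements: List[str],
                 product_elements: List[str],
                 educt_bonds: List[Tuple[int, int]],
                 product_bonds: List[Tuple[int, int]]
                 ) -> Optional[Tuple[int, Tuple[int, ...]]]:
    """Exhaustively search element-preserving bijections for the lowest cost."""
    if len(educt_elements) != len(product_elements):
        raise ValueError("educts and products must have the same number of atoms")
    product_bond_set = {frozenset(b) for b in product_bonds}
    best = None
    for perm in permutations(range(len(product_elements))):
        # Skip assignments that would change an atom's element.
        if any(educt_elements[i] != product_elements[j]
               for i, j in enumerate(perm)):
            continue
        cost = mapping_cost(perm, educt_bonds, product_bond_set)
        if best is None or cost < best[0]:
            best = (cost, perm)
    return best


# Hypothetical toy reaction on heavy-atom skeletons:
# C0-C1-O2 plus a lone C3  ->  C1-O0-C2-C3 (one new C-O bond)
educt_elements = ["C", "C", "O", "C"]
educt_bonds = [(0, 1), (1, 2)]
product_elements = ["O", "C", "C", "C"]
product_bonds = [(0, 1), (0, 2), (2, 3)]

best_cost, best_perm = best_mapping(
    educt_elements, product_elements, educt_bonds, product_bonds)
print(best_cost, best_perm)
```

The enumeration grows factorially with the number of atoms of each element, which is one way to see why the canonical-naming and combinatorial techniques described in the abstract are needed for realistic reactions.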
17
Willett P. From chemical documentation to chemoinformatics: 50 years of chemical information science. J Inf Sci 2008. [DOI: 10.1177/0165551507084631] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
This paper summarizes the historical development of the discipline that is now called 'chemoinformatics'. It shows how this has evolved, principally as a result of technological developments in chemistry and biology during the past decade, from long-established techniques for the modelling and searching of chemical molecules. A total of 30 papers, the earliest dating back to 1957, are briefly summarized to highlight some of the key publications and to show the development of the discipline.
19
Willett P. Searching Techniques for Databases of Two- and Three-Dimensional Chemical Structures. J Med Chem 2005; 48:4183-99. [PMID: 15974568 DOI: 10.1021/jm0582165] [Citation(s) in RCA: 126] [Impact Index Per Article: 6.6] [Indexed: 11/29/2022]
Affiliation(s)
- Peter Willett
- Krebs Institute for Biomolecular Research and Department of Information Studies, University of Sheffield, Western Bank, Sheffield, UK.
20
Weise A. Computeranalyse chemischer Reaktionsgleichungen mit Stöchiometriekorrektur. 2004. [DOI: 10.1002/prac.19803220510] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
21
Wang K, Wang L, Yuan Q, Luo S, Yao J, Yuan S, Zheng C, Brandt J. Construction of a generic reaction knowledge base by reaction data mining. J Mol Graph Model 2002; 19:427-33, 469. [PMID: 11552691 DOI: 10.1016/s1093-3263(00)00102-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
Abstract
As synthesis by combinatorial chemistry and high throughput screening have become well-established strategies in the drug discovery process, chemists face increased challenges in managing large amounts of data and using these data to design more diverse and focused libraries. As synthesis is an intuitive and empirical process, however, the classical approaches to computer-assisted synthesis planning do not fully satisfy the needs of the synthetic chemist. We describe a novel computational technique for extracting reaction data and building a generic reaction knowledge base (GRKB) to provide chemists with useful and well-organized knowledge. The method consists of three key steps: (1) the automatic recognition of reaction centers, (2) the definition of a hierarchy of reaction patterns, and (3) the organization of the generic reaction knowledge. Significant reaction knowledge has been discovered via mining a subset of the InfoChem Reaction database. A frame system has been constructed to store and retrieve the GRKB. Applications of this GRKB to synthesis planning are illustrated.
Affiliation(s)
- K Wang
- Laboratory of Computer Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 354 Fenglin Lu, Shanghai 200032, China
22
Nowak G. Approximate reasoning of the chemical reactivity for computer simulation of chemical reactions. 1997. [DOI: 10.1016/s0097-8485(97)00020-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
24
Jauffret P, Tonnelier C, Hanser T, Kaufmann G, Wolff R. Machine learning of generic reactions: 2. toward an advanced computer representation of chemical reactions. 1990. [DOI: 10.1016/0898-5529(90)90060-l] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022]
25
Tree-structured maximal common subgraph searching. An example of parallel computation with a single sequential processor. 1989. [DOI: 10.1016/0898-5529(89)90046-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
26
Funatsu K, Endo T, Kotera N, Sasaki SI. Automatic recognition of reaction site in organic chemical reactions. 1988. [DOI: 10.1016/0898-5529(88)90008-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]