1. Flores JE, Prymolenna AV, Lewis LA, Winans NM, Eder EK, Kew W, Young RP, Bramer LM. nmRanalysis: An Open-Source Web Application for Semi-automated NMR Metabolite Profiling. Anal Chem 2025; 97:7037-7046. [PMID: 40129367 DOI: 10.1021/acs.analchem.4c05104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2025]
Abstract
Though data acquisition and initial signal preprocessing of nuclear magnetic resonance (NMR) spectra have achieved high degrees of automation, downstream processing, specifically the profiling of spectra, has bottlenecked the overall NMR analysis workflow. Several efforts have been made to mitigate this bottleneck, but these solutions often trade an increase in automation for limitations elsewhere. In this work, we introduce nmRanalysis, a user-friendly web application that integrates the strengths of existing profiling tools for a more automated profiling workflow. nmRanalysis additionally incorporates novel features, including a machine-learning-driven recommender system for metabolite identification, further increasing the utility of nmRanalysis over the individual tools that it incorporates.
Affiliation(s)
- Javier E Flores
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Anastasiya V Prymolenna
  - Environmental and Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Logan A Lewis
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Natalie M Winans
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Elizabeth K Eder
  - Environmental and Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- William Kew
  - Environmental and Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Robert P Young
  - Environmental and Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Lisa M Bramer
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
2. Basnet BB, Zhou ZY, Wei B, Wang H. Advances in AI-based strategies and tools to facilitate natural product and drug development. Crit Rev Biotechnol 2025:1-32. [PMID: 40159111 DOI: 10.1080/07388551.2025.2478094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2024] [Revised: 02/11/2025] [Accepted: 02/16/2025] [Indexed: 04/02/2025]
Abstract
Natural products and their derivatives have been important for treating diseases in humans, animals, and plants. However, discovering new structures from natural sources is still challenging. In recent years, artificial intelligence (AI) has greatly aided the discovery and development of natural products and drugs. AI helps to connect genetic data to chemical structures and vice versa, repurpose known natural products, predict metabolic pathways, and design and optimize metabolite biosynthesis. More recently, the emergence and improvement of neural networks such as deep learning and of ensemble automated web-based bioinformatics platforms have sped up the discovery process. Meanwhile, AI also improves the identification and structure elucidation of unknown compounds from raw data such as mass spectrometry and nuclear magnetic resonance. This article reviews these AI-driven methods and tools, highlighting their practical applications and offering guidance for efficient natural product discovery and drug development.
Affiliation(s)
- Buddha Bahadur Basnet
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
  - Central Department of Biotechnology, Tribhuvan University, Kathmandu, Nepal
- Zhen-Yi Zhou
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
- Bin Wei
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
- Hong Wang
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
  - Key Laboratory of Marine Fishery Resources Exploitment, Utilization of Zhejiang Province, Zhejiang University of Technology, Hangzhou, China
3. Lücken L, Mitschke N, Dittmar T, Blasius B. Network Flow Methods for NMR-Based Compound Identification. Anal Chem 2025; 97:4832-4840. [PMID: 39998390 PMCID: PMC11912116 DOI: 10.1021/acs.analchem.4c01652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 01/28/2025] [Accepted: 02/06/2025] [Indexed: 02/26/2025]
Abstract
In this work, we introduce a novel method for compound identification in mixtures based on nuclear magnetic resonance spectra. Contrary to many other methods, our approach can be used without peak-picking the mixture spectrum and simultaneously optimizes the fit of all individual compound spectra in a given library. At the core of the method, a minimum cost flow problem is solved on a network consisting of nodes that represent spectral peaks of the library compounds and the mixture. We show that our approach can outperform other popular algorithms by applying it to a standard compound identification task for 2D 1H,13C HSQC spectra of artificial mixtures and a natural sample using a library of 501 compounds. Moreover, our method retrieves individual compound concentrations with at least semiquantitative accuracy for artificial mixtures with up to 34 compounds. A software implementation of the minimum cost flow method is available on GitHub (https://github.com/GeoMetabolomics-ICBM/mcfNMR).
Affiliation(s)
- Leonhard Lücken
  - Institute for Chemistry and Biology of the Marine Environment (ICBM), Carl von Ossietzky Universität Oldenburg, Ammerländer Heerstraße 114-118, 26129 Oldenburg, Germany
- Nico Mitschke
  - Institute for Chemistry and Biology of the Marine Environment (ICBM), Carl von Ossietzky Universität Oldenburg, Ammerländer Heerstraße 114-118, 26129 Oldenburg, Germany
- Thorsten Dittmar
  - Institute for Chemistry and Biology of the Marine Environment (ICBM), Carl von Ossietzky Universität Oldenburg, Ammerländer Heerstraße 114-118, 26129 Oldenburg, Germany
  - Helmholtz Institute for Functional Marine Biodiversity, Carl von Ossietzky Universität Oldenburg, Ammerländer Heerstraße 114-118, 26129 Oldenburg, Germany
- Bernd Blasius
  - Institute for Chemistry and Biology of the Marine Environment (ICBM), Carl von Ossietzky Universität Oldenburg, Ammerländer Heerstraße 114-118, 26129 Oldenburg, Germany
  - Helmholtz Institute for Functional Marine Biodiversity, Carl von Ossietzky Universität Oldenburg, Ammerländer Heerstraße 114-118, 26129 Oldenburg, Germany
4. Tian Z, Dai Y, Hu F, Shen Z, Xu H, Zhang H, Xu J, Hu Y, Diao Y, Li H. Enhancing Chemical Reaction Monitoring with a Deep Learning Model for NMR Spectra Image Matching to Target Compounds. J Chem Inf Model 2024; 64:5624-5633. [PMID: 38979856 DOI: 10.1021/acs.jcim.4c00522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
In the synthetic laboratory, researchers typically rely on nuclear magnetic resonance (NMR) spectra to elucidate structures of synthesized products and confirm whether they match the desired target compounds. As chemical synthesis technology evolves toward intelligence and continuity, efficient computer-assisted structure elucidation (CASE) techniques are required to replace time-consuming manual analysis and provide the necessary speed. However, current CASE methods typically aim to derive precise chemical structures from spectroscopic data, yet they suffer from drawbacks such as low accuracy, high computational cost, and reliance on chemical libraries. In meticulously designed chemical synthesis reactions, researchers prioritize confirming the attainment of the target product based on NMR spectra, rather than focusing on identifying the specific product obtained. For this purpose, we innovatively developed a binary classification model, termed as MatCS, to directly predict the relationship between NMR spectra image (including 1H NMR and 13C NMR) and the molecular structure of the target compound. After evaluating various feature extraction methods, MatCS employs a combination of the Graph Attention Networks and Graph Convolutional Networks to learn the structural features of molecular graphs and the pretrained ResNet101 network with a Convolutional Block Attention Module to extract features from NMR spectra images. The results show that on a challenging Testsim data set, which poses difficulty in distinguishing spectra of similar molecular structures, MatCS achieves comprehensive evaluation metrics with an F1-score of 0.81 and an AUC value of 0.87. Simultaneously, it exhibited commendable performance on an external SDBS data set containing experimental NMR spectra, showcasing substantial potential for structural verification tasks in real automated chemical synthesis.
Affiliation(s)
- ZiJing Tian
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yan Dai
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Feng Hu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- ZiHao Shen
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- HongLing Xu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- HongWen Zhang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- JinHang Xu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- YuTing Hu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- YanYan Diao
  - Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, Shanghai 200062, China
  - Lingang Laboratory, Shanghai 200031, China
- HongLin Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
  - Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, Shanghai 200062, China
  - Lingang Laboratory, Shanghai 200031, China
5. Lu XY, Wu HP, Ma H, Li H, Li J, Liu YT, Pan ZY, Xie Y, Wang L, Ren B, Liu GK. Deep Learning-Assisted Spectrum-Structure Correlation: State-of-the-Art and Perspectives. Anal Chem 2024; 96:7959-7975. [PMID: 38662943 DOI: 10.1021/acs.analchem.4c01639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Spectrum-structure correlation is playing an increasingly crucial role in spectral analysis and has undergone significant development in recent decades. With the advancement of spectrometers, high-throughput detection has triggered explosive growth of spectral data, and the extension of research from small molecules to biomolecules brings with it a massive chemical space. Facing the evolving landscape of spectrum-structure correlation, conventional chemometrics is ill-equipped, and deep-learning-assisted chemometrics is rapidly emerging as a flourishing approach with a superior ability to extract latent features and make precise predictions. In this review, the molecular and spectral representations and fundamental knowledge of deep learning are first introduced. We then summarize how deep learning has helped establish the correlation between spectrum and molecular structure over the past 5 years, by empowering spectral prediction (i.e., forward structure-spectrum correlation) and further enabling library matching and de novo molecular generation (i.e., inverse spectrum-structure correlation). Finally, we highlight the most important open issues that persist, along with corresponding potential solutions. With the fast development of deep learning, an ultimate solution for establishing spectrum-structure correlation is expected soon, which would trigger substantial development in various disciplines.
Affiliation(s)
- Xin-Yu Lu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hao-Ping Wu
  - State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
- Hao Ma
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hui Li
  - Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, P. R. China
- Jia Li
  - Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, P. R. China
- Yan-Ti Liu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Zheng-Yan Pan
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yi Xie
  - School of Informatics, Xiamen University, Xiamen 361005, P. R. China
- Lei Wang
  - Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, P. R. China
- Bin Ren
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Guo-Kun Liu
  - State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
6. Xue X, Sun H, Yang M, Liu X, Hu HY, Deng Y, Wang X. Advances in the Application of Artificial Intelligence-Based Spectral Data Interpretation: A Perspective. Anal Chem 2023; 95:13733-13745. [PMID: 37688541 DOI: 10.1021/acs.analchem.3c02540] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 09/11/2023]
Abstract
The interpretation of spectral data, including mass, nuclear magnetic resonance, infrared, and ultraviolet-visible spectra, is critical for obtaining molecular structural information. The development of advanced sensing technology has multiplied the amount of available spectral data. Chemical experts must use basic principles corresponding to the spectral information generated by molecular fragments and functional groups. This is a time-consuming process that requires a solid professional knowledge base. In recent years, the rapid development of computer science and its applications in cheminformatics and the emergence of computer-aided expert systems have greatly reduced the difficulty in analyzing large quantities of data. For expert systems, however, the problem-solving strategy must be known in advance or extracted by human experts and translated into algorithms. Gratifyingly, the development of artificial intelligence (AI) methods has shown great promise for solving such problems. Traditional algorithms, including the latest neural network algorithms, have shown great potential for both extracting useful information and processing massive quantities of data. This Perspective highlights recent innovations covering all of the emerging AI-based spectral interpretation techniques. In addition, the main limitations and current obstacles are presented, and the corresponding directions for further research are proposed. Moreover, this Perspective gives the authors' personal outlook on the development and future applications of spectral interpretation.
Affiliation(s)
- Xi Xue
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Hanyu Sun
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Minjian Yang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - Beijing Key Laboratory of Active Substances Discovery and Drugability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P. R. China
- Xue Liu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hai-Yu Hu
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd. Beijing 100080, China
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaojian Wang
  - State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
  - CarbonSilicon AI Technology Co., Ltd. Beijing 100080, China
7. Peng Z, Zhang Y, Ai Z, Pandiselvam R, Guo J, Kothakota A, Liu Y. Current physical techniques for the degradation of aflatoxins in food and feed: Safety evaluation methods, degradation mechanisms and products. Compr Rev Food Sci Food Saf 2023; 22:4030-4052. [PMID: 37306549 DOI: 10.1111/1541-4337.13197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 05/16/2023] [Accepted: 05/26/2023] [Indexed: 06/13/2023]
Abstract
Aflatoxins are the most toxic natural mycotoxins discovered so far, posing a serious menace to the food safety and trading economy of the world, especially developing countries. How to effectively detoxify has persistently occupied a place on the list of "global hot-point" concerns. Among the developed detoxification methods, physical methods, as the authoritative techniques for aflatoxins degradation, could rapidly induce irreversible denaturation of aflatoxins. This review presents a brief overview of aflatoxins detection and degradation product structure identification methods. Four main safety evaluation methods for aflatoxins and degradation product toxicity assessment are highlighted combined with an update on research of aflatoxins decontamination in the last decade. Furthermore, the latest applications, degradation mechanisms and products of physical aflatoxin decontamination techniques including microwave heating, irradiation, pulsed light, cold plasma and ultrasound are discussed in detail. Regulatory issues related to "detoxification" are also explained. Finally, we put forward the challenges and future work in studying aflatoxin degradation based on the existing research. The purpose of supplying this information is to help researchers have a deeper understanding on the degradation of aflatoxins, break through the existing bottleneck, and further improve and innovate the detoxification methods of aflatoxins.
Affiliation(s)
- Zekang Peng
  - College of Engineering, China Agricultural University, Beijing, China
- Yue Zhang
  - College of Engineering, China Agricultural University, Beijing, China
- Ziping Ai
  - College of Engineering, China Agricultural University, Beijing, China
- Ravi Pandiselvam
  - Division of Physiology, Biochemistry and Post-Harvest Technology, ICAR-Central Plantation Crops Research Institute, Kasaragod, Kerala, India
- Jiale Guo
  - College of Engineering, China Agricultural University, Beijing, China
- Anjineyulu Kothakota
  - Agro-Processing & Technology Division, CSIR-National Institute for Interdisciplinary Science and Technology (NIIST), Trivandrum, Kerala, India
- Yanhong Liu
  - College of Engineering, China Agricultural University, Beijing, China
8. Judge MT, Ebbels TMD. Problems, principles and progress in computational annotation of NMR metabolomics data. Metabolomics 2022; 18:102. [PMID: 36469142 PMCID: PMC9722819 DOI: 10.1007/s11306-022-01962-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/04/2022] [Accepted: 11/18/2022] [Indexed: 12/08/2022]
Abstract
BACKGROUND: Compound identification remains a critical bottleneck in the process of exploiting Nuclear Magnetic Resonance (NMR) metabolomics data, especially for 1H 1-dimensional (1H 1D) data. As databases of reference compound spectra have grown, workflows have evolved to rely heavily on their search functions to facilitate this process by generating lists of potential metabolites found in complex mixture data, facilitating annotation and identification. However, approaches for validating and communicating annotations are most often guided by expert knowledge, and therefore are highly variable despite repeated efforts to align practices and define community standards.
AIM OF REVIEW: This review is aimed at broadening the application of automated annotation tools by discussing the key ideas of spectral matching and beginning to describe a set of terms to classify this information, thus advancing standards for communicating annotation confidence. Additionally, we hope that this review will facilitate the growing collaboration between chemical data scientists, software developers and the NMR metabolomics community aiding development of long-term software solutions.
KEY SCIENTIFIC CONCEPTS OF REVIEW: We begin with a brief discussion of the typical untargeted NMR identification workflow. We differentiate between annotation (hypothesis generation, filtering), and identification (hypothesis testing, verification), and note the utility of different NMR data features for annotation. We then touch on three parts of annotation: (1) generation of queries, (2) matching queries to reference data, and (3) scoring and confidence estimation of potential matches for verification. In doing so, we highlight existing approaches to automated and semi-automated annotation from the perspective of the structural information they utilize, as well as how this information can be represented computationally.
Affiliation(s)
- Michael T Judge
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College, 131 Sir Alexander Fleming Building, South Kensington Campus, London, UK
- Timothy M D Ebbels
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College, 131 Sir Alexander Fleming Building, South Kensington Campus, London, UK
9. Sumita M, Terayama K, Tamura R, Tsuda K. QCforever: A Quantum Chemistry Wrapper for Everyone to Use in Black-Box Optimization. J Chem Inf Model 2022; 62:4427-4434. [PMID: 36074116 PMCID: PMC9518232 DOI: 10.1021/acs.jcim.2c00812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Indexed: 11/29/2022]
Abstract
To obtain observable physical or molecular properties such as ionization potential and fluorescent wavelength with quantum chemical (QC) computation, a multi-step computation operated by a human is required. Hence, automating the multi-step computational process and making it a black box that anybody can handle are important for effective database construction and fast, realistic material design through the framework of black-box optimization, where machine learning algorithms are introduced as a predictor. Here, we propose a Python library, QCforever, to automate the computation of some molecular properties and chemical phenomena induced by molecules. This tool requires only a molecule file to provide the molecule's observable properties, automating both the computation of molecular properties (ionization potential, fluorescence, etc.) and the output analysis that provides multiple values for evaluating a molecule. By incorporating the tool in black-box optimization, we can explore molecules that have the properties we desire within the limitations of QC computation.
Affiliation(s)
- Masato Sumita
  - RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
  - International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, Tsukuba 305-0044, Japan
- Kei Terayama
  - RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
  - Graduate School of Medical Life Science, Yokohama City University, Tsurumi-ku, Yokohama 230-0045, Japan
- Ryo Tamura
  - RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
  - International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, Tsukuba 305-0044, Japan
  - Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8561, Japan
  - Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba 305-0047, Japan
- Koji Tsuda
  - RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
  - Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8561, Japan
  - Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba 305-0047, Japan
10. Sridharan B, Mehta S, Pathak Y, Priyakumar UD. Deep Reinforcement Learning for Molecular Inverse Problem of Nuclear Magnetic Resonance Spectra to Molecular Structure. J Phys Chem Lett 2022; 13:4924-4933. [PMID: 35635003 DOI: 10.1021/acs.jpclett.2c00624] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/15/2023]
Abstract
Spectroscopy is the study of how matter interacts with electromagnetic radiation. The spectra of any molecule are highly information-rich, yet the inverse relation of spectra to the corresponding molecular structure is still an unsolved problem. Nuclear magnetic resonance (NMR) spectroscopy is one such critical technique in the scientist's toolkit to characterize molecules. In this work, a novel machine learning framework is proposed that attempts to solve this inverse problem by navigating the chemical space to find the correct structure given an NMR spectrum. The proposed framework uses a combination of online Monte Carlo tree search (MCTS) and a set of graph convolution networks to build a molecule iteratively. Our method can predict the structure of the molecule ∼80% of the time in its top 3 guesses for molecules with <10 heavy atoms. We believe that the proposed framework is a significant step in solving the inverse design problem of NMR spectra.
Affiliation(s)
- Bhuvanesh Sridharan
  - Centre for Computational Natural Science and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Sarvesh Mehta
  - Centre for Computational Natural Science and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Yashaswi Pathak
  - Centre for Computational Natural Science and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- U Deva Priyakumar
  - Centre for Computational Natural Science and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
11. Singh K, Münchmeyer J, Weber L, Leser U, Bande A. Graph Neural Networks for Learning Molecular Excitation Spectra. J Chem Theory Comput 2022; 18:4408-4417. [PMID: 35671364 DOI: 10.1021/acs.jctc.2c00255] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022]
Abstract
Machine learning (ML) approaches have demonstrated the ability to predict molecular spectra at a fraction of the computational cost of traditional theoretical chemistry methods while maintaining high accuracy. Graph neural networks (GNNs) are particularly promising in this regard, but different types of GNNs have not yet been systematically compared. In this work, we benchmark and analyze five different GNNs for the prediction of excitation spectra from the QM9 dataset of organic molecules. We compare the GNN performance in the obvious runtime measurements, prediction accuracy, and analysis of outliers in the test set. Moreover, through TMAP clustering and statistical analysis, we are able to highlight clear hotspots of high prediction errors as well as optimal spectra prediction for molecules with certain functional groups. This in-depth benchmarking and subsequent analysis protocol lays down a recipe for comparing different ML methods and evaluating dataset quality.
Affiliation(s)
- Kanishka Singh
- Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Hahn-Meitner-Platz 1, Berlin 10409, Germany.,Institute of Chemistry and Biochemistry, Freie Universität Berlin, Arnimallee 22, Berlin 14195, Germany
| | - Jannes Münchmeyer
- Deutsches GeoForschungsZentrum GFZ, Telegrafenberg, 14473 Potsdam, Germany.,Humboldt-Universität zu Berlin, Unter den Linden 6, 10117 Berlin, Germany
| | - Leon Weber
- Humboldt-Universität zu Berlin, Unter den Linden 6, 10117 Berlin, Germany.,Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Robert-Rössle-Straße 10, Berlin 13125, Germany
| | - Ulf Leser
- Humboldt-Universität zu Berlin, Unter den Linden 6, 10117 Berlin, Germany
| | - Annika Bande
- Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Hahn-Meitner-Platz 1, Berlin 10409, Germany
|
12
|
Deep Learning-Based Method for Compound Identification in NMR Spectra of Mixtures. Molecules 2022; 27:3653. [PMID: 35744782 PMCID: PMC9227391 DOI: 10.3390/molecules27123653] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/18/2022] [Revised: 06/03/2022] [Accepted: 06/05/2022] [Indexed: 11/16/2022]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is highly unbiased and reproducible, which makes it a powerful tool for analyzing mixtures of small molecules. However, compound identification in NMR spectra of mixtures is highly challenging because of chemical shift variations of the same compound in different mixtures and peak overlap among molecules. Here, we present a pseudo-Siamese convolutional neural network method (pSCNN) to identify compounds in mixtures for NMR spectroscopy. A data augmentation method was implemented that superposes several NMR spectra sampled from a spectral database and adds random noise. The augmented dataset was split and used to train, validate and test the pSCNN model. Two experimental NMR datasets (flavor mixtures and an additional flavor mixture) were acquired to benchmark its performance in real applications. The results show that the proposed method performs well on the augmented test set (ACC = 99.80%, TPR = 99.70% and FPR = 0.10%), the flavor mixtures dataset (ACC = 97.62%, TPR = 96.44% and FPR = 2.29%) and the additional flavor mixture dataset (ACC = 91.67%, TPR = 100.00% and FPR = 10.53%). We have demonstrated that the translational invariance of convolutional neural networks can solve the chemical shift variation problem in NMR spectra. In summary, pSCNN is an off-the-shelf method to identify compounds in mixtures for NMR spectroscopy because of its accuracy in compound identification and robustness to chemical shift variation.
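The augmentation step and the reported metrics in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weight range, and the noise level are assumptions chosen for illustration, and spectra are plain lists of intensities on a shared chemical shift axis.

```python
import random

def augment_mixture(library, n_components, noise_sd=0.01, rng=None):
    """Build one synthetic mixture by summing randomly chosen library
    spectra (with random weights) and adding Gaussian noise per point."""
    rng = rng or random.Random()
    names = rng.sample(sorted(library), n_components)
    mixture = [0.0] * len(library[names[0]])
    for name in names:
        weight = rng.uniform(0.2, 1.0)  # hypothetical relative concentration
        for i, value in enumerate(library[name]):
            mixture[i] += weight * value
    mixture = [v + rng.gauss(0.0, noise_sd) for v in mixture]
    return mixture, names

def confusion_rates(y_true, y_pred):
    """Return (ACC, TPR, FPR) for binary labels given as 0/1 sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    acc = (tp + tn) / len(y_true)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return acc, tpr, fpr
```

Each call to `augment_mixture` yields one labeled training example: the mixture spectrum plus the names of the compounds it contains.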
|
13
|
Migdadi L, Telfah A, Hergenröder R, Wöhler C. Novelty detection for metabolic dynamics established on breast cancer tissue using 2D NMR TOCSY spectra. Comput Struct Biotechnol J 2022; 20:2965-2977. [PMID: 35782733 PMCID: PMC9213235 DOI: 10.1016/j.csbj.2022.05.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Revised: 05/26/2022] [Accepted: 05/26/2022] [Indexed: 11/30/2022] Open
Abstract
Most metabolic profiling approaches focus only on identifying pre-known metabolites in the NMR TOCSY spectrum using configured parameters. However, little work addresses automating the detection of new metabolites that might appear during the dynamic evolution of biological cells. Novelty detection is a category of machine learning that is used to identify data that emerge during the test phase and were not considered during the training phase. We propose a novelty detection system for detecting novel metabolites in the 2D NMR TOCSY spectrum of a breast cancer-tissue sample. We build one- and multi-class recognition systems using different classifiers such as Kernel Null Foley-Sammon Transform, Kernel Density Estimation, and Support Vector Data Description. The training models were constructed based on different sizes of training data and are used in the novelty detection procedure. Multiple evaluation measures were applied to test the performance of the novelty detection methods. Depending on the training data size, all classifiers were able to achieve a 0% false positive rate, a 0% total misclassification error, and a 100% true positive rate. The median total time for the novelty detection process varies between 1.5 and 20 seconds, depending on the classifier and the amount of training data. The results of our novel metabolic profiling method demonstrate its suitability, robustness and speed in automated metabolic research.
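As a rough illustration of the one-class idea described in the abstract above, the sketch below uses a Gaussian kernel density estimate over 2D peak coordinates and flags a test peak as novel when its density falls below the lowest density seen on the training peaks. The bandwidth, the thresholding rule, and all names are assumptions made for illustration; the paper's classifiers and features may differ.

```python
import math

def kde_density(point, train, bandwidth):
    """Gaussian kernel density estimate at a 2D point."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(train))
    total = 0.0
    for tx, ty in train:
        sq_dist = (point[0] - tx) ** 2 + (point[1] - ty) ** 2
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return norm * total

def fit_threshold(train, bandwidth):
    """Lowest density that any training point receives (assumed cutoff)."""
    return min(kde_density(p, train, bandwidth) for p in train)

def is_novel(point, train, bandwidth, threshold):
    """A point is flagged as novel when its density is below the cutoff."""
    return kde_density(point, train, bandwidth) < threshold

# Hypothetical cross-peak coordinates (ppm, ppm) of known metabolites.
known_peaks = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.05, 1.0)]
cutoff = fit_threshold(known_peaks, bandwidth=0.1)
```

A stricter or looser cutoff (for example a quantile of the training densities) trades false positives against missed novel peaks.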
Key Words
- 2D NMR TOCSY
- ATP, Adenosine Triphosphate
- AUC, Area under Curve
- BMRB, Biological Magnetic Resonance Data Bank
- Breast cancer
- Chemometrics
- Classification
- HMDB, Human Metabolome Database
- KDE, Kernel Density Estimation
- KNFST, Kernel Null Foley–Sammon Transform
- Machine learning
- Metabolic profiling
- Metabolomics
- NMR, Nuclear Magnetic Resonance
- Novelty detection
- ROC, Receiver Operating Characteristic
- SVDD, Support Vector Data Description
- TOCSY, Total Correlation Spectroscopy
Affiliation(s)
- Lubaba Migdadi
- Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V, 44139 Dortmund, Germany
- Image Analysis Group, TU Dortmund, 44227 Dortmund, Germany
| | - Ahmad Telfah
- Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V, 44139 Dortmund, Germany
| | - Roland Hergenröder
- Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V, 44139 Dortmund, Germany
|
14
|
Seifert NA, Prozument K, Davis MJ. Computational optimal transport for molecular spectra: The semi-discrete case. J Chem Phys 2022; 156:134117. [PMID: 35395885 DOI: 10.1063/5.0087385] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022] Open
Abstract
Comparing a discrete molecular spectrum to a continuous molecular spectrum in a quantitative manner is a challenging problem, for example, when attempting to fit a theoretical stick spectrum to a continuous spectrum. In this paper, the use of computational optimal transport is investigated for such a problem. In the optimal transport literature, the comparison of a discrete and a continuous spectrum is referred to as semi-discrete optimal transport and is a situation where a metric such as least-squares may be difficult to define except under special conditions. The merits of an optimal transport approach for this problem are investigated using the transport distance defined for the semi-discrete case. A tutorial on semi-discrete optimal transport for molecular spectra is included in this paper, and several well-chosen synthetic spectra are investigated to demonstrate the utility of computational optimal transport for the semi-discrete case. Among several types of investigations, we include calculations showing how the frequency resolution of the continuous spectrum affects the transport distance between a discrete and a continuous spectrum. We also use the transport distance to measure the distance between a continuous experimental electronic absorption spectrum of SO2 and a theoretical stick spectrum for the same system. The comparison of the theoretical and experimental SO2 spectra also allows us to suggest a theoretical value for the band origin that is closer to the observed band origin than previous theoretical values.
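For one-dimensional spectra, the transport (Wasserstein-1) distance between two unit-mass distributions equals the integral of the absolute difference of their cumulative distributions. The sketch below uses that identity to compare a stick spectrum with a continuous spectrum sampled on a uniform grid. It is a numerical approximation written for illustration, not the paper's semi-discrete algorithm, and the function name and rectangle-rule integration are assumptions.

```python
from bisect import bisect_right

def transport_distance_1d(sticks, grid, density):
    """Approximate W1 between a stick spectrum and a sampled continuous one.

    sticks  : list of (position, intensity) pairs
    grid    : uniformly spaced, increasing frequency values
    density : continuous-spectrum intensities sampled on `grid`
    Both spectra are normalized to unit total intensity first.
    """
    dx = grid[1] - grid[0]
    ordered = sorted(sticks)
    positions = [p for p, _ in ordered]
    stick_total = sum(h for _, h in ordered)
    dens_total = sum(density) * dx

    # Cumulative normalized intensity of the sticks, in position order.
    cum_sticks, running = [], 0.0
    for _, height in ordered:
        running += height / stick_total
        cum_sticks.append(running)

    distance, cdf_cont = 0.0, 0.0
    for x, d in zip(grid, density):
        cdf_cont += d * dx / dens_total
        k = bisect_right(positions, x)       # number of sticks at or below x
        cdf_stick = cum_sticks[k - 1] if k else 0.0
        distance += abs(cdf_stick - cdf_cont) * dx
    return distance
```

Because the sum runs over the sampling grid, a coarser grid changes the result, which loosely mirrors the abstract's point that the frequency resolution of the continuous spectrum affects the transport distance.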
Affiliation(s)
- Nathan A Seifert
- Chemical Sciences and Engineering Division, Argonne National Laboratory, Lemont, Illinois 60439, USA
| | - Kirill Prozument
- Chemical Sciences and Engineering Division, Argonne National Laboratory, Lemont, Illinois 60439, USA
| | - Michael J Davis
- Chemical Sciences and Engineering Division, Argonne National Laboratory, Lemont, Illinois 60439, USA
|
15
|
Sridharan B, Goel M, Priyakumar UD. Modern Machine Learning for Tackling Inverse Problems in Chemistry: Molecular Design to Realization. Chem Commun (Camb) 2022; 58:5316-5331. [DOI: 10.1039/d1cc07035e] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
The discovery of new molecules and materials helps expand the horizons of novel and innovative real-life applications. In the pursuit of finding molecules with desired properties, chemists have traditionally relied...
|
16
|
Molecular search by NMR spectrum based on evaluation of matching between spectrum and molecule. Sci Rep 2021; 11:20998. [PMID: 34697368 PMCID: PMC8546062 DOI: 10.1038/s41598-021-00488-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/31/2021] [Accepted: 10/13/2021] [Indexed: 11/17/2022] Open
Abstract
Inferring molecular structures from experimentally measured nuclear magnetic resonance (NMR) spectra is an important task in many chemistry applications. Herein, we present a novel method implementing an automated molecular search by NMR spectrum. Given a query spectrum and a pool of candidate molecules, the matching score of each candidate molecule with respect to the query spectrum is evaluated by introducing a molecule-to-spectrum estimation procedure. The candidate molecule with the highest matching score is selected. This procedure does not require any prior knowledge of the corresponding molecular structure nor laborious manual efforts by chemists. We demonstrate the effectiveness of the proposed method on molecular search using 13C NMR spectra.
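The search loop described in the abstract above (score every candidate against the query, keep the best) can be sketched as follows. The scoring rule here, a negative mean absolute difference between sorted peak lists with a penalty for unmatched peaks, is a stand-in chosen for illustration and is not the paper's learned matching score. The shift values are toy numbers, not reference data, and `predict_shifts` stands for any hypothetical molecule-to-spectrum estimator.

```python
def matching_score(query_shifts, predicted_shifts, mismatch_penalty=10.0):
    """Higher is better. Compares sorted peak lists (ppm) pairwise and
    penalizes peaks that have no counterpart in the other list."""
    q, p = sorted(query_shifts), sorted(predicted_shifts)
    if not q or not p:
        return float("-inf")
    error = sum(abs(a - b) for a, b in zip(q, p))  # zip stops at the shorter list
    error += mismatch_penalty * abs(len(q) - len(p))
    return -error / max(len(q), len(p))

def rank_candidates(query_shifts, candidates, predict_shifts):
    """Return (score, candidate) pairs sorted from best to worst match."""
    scored = [(matching_score(query_shifts, predict_shifts(m)), m)
              for m in candidates]
    return sorted(scored, reverse=True)

# Toy 13C shift table standing in for a molecule-to-spectrum predictor.
TOY_SHIFTS = {
    "ethanol": [18.3, 58.0],
    "acetone": [30.8, 206.6],
    "acetic acid": [20.8, 177.2],
}
ranking = rank_candidates([18.0, 58.2], list(TOY_SHIFTS), TOY_SHIFTS.get)
```

With these toy values the first entry of `ranking` is the candidate whose predicted peaks lie closest to the query peaks.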
|
17
|
Kikuchi J, Yamada S. The exposome paradigm to predict environmental health in terms of systemic homeostasis and resource balance based on NMR data science. RSC Adv 2021; 11:30426-30447. [PMID: 35480260 PMCID: PMC9041152 DOI: 10.1039/d1ra03008f] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/18/2021] [Accepted: 08/31/2021] [Indexed: 12/22/2022] Open
Abstract
The environment, from microbial ecosystems to recycled resources, fluctuates dynamically due to many physical, chemical and biological factors, the profile of which reflects changes in overall state, such as environmental illness caused by a collapse of homeostasis. To evaluate and predict environmental health in terms of systemic homeostasis and resource balance, a comprehensive understanding of these factors requires an approach based on the "exposome paradigm", namely the totality of exposure to all substances. Furthermore, in considering sustainable development to meet global population growth, it is important to gain an understanding of both the circulation of biological resources and waste recycling in human society. From this perspective, natural environment, agriculture, aquaculture, wastewater treatment in industry, biomass degradation and biodegradable materials design are at the forefront of current research. In this respect, nuclear magnetic resonance (NMR) offers tremendous advantages in the analysis of samples of molecular complexity, such as crude bio-extracts, intact cells and tissues, fibres, foods, feeds, fertilizers and environmental samples. Here we outline examples to promote an understanding of recent applications of solution-state, solid-state, time-domain NMR and magnetic resonance imaging (MRI) to the complex evaluation of organisms, materials and the environment. We also describe useful databases and informatics tools, as well as machine learning techniques for NMR analysis, demonstrating that NMR data science can be used to evaluate the exposome in both the natural environment and human society towards a sustainable future.
Affiliation(s)
- Jun Kikuchi
- Environmental Metabolic Analysis Research Team, RIKEN Center for Sustainable Resource Science 1-7-22 Suehiro-cho, Tsurumi-ku Yokohama 230-0045 Japan
- Graduate School of Bioagricultural Sciences, Nagoya University Furo-cho, Chikusa-ku Nagoya 464-8601 Japan
- Graduate School of Medical Life Science, Yokohama City University 1-7-29 Suehiro-cho, Tsurumi-ku Yokohama 230-0045 Japan
| | - Shunji Yamada
- Environmental Metabolic Analysis Research Team, RIKEN Center for Sustainable Resource Science 1-7-22 Suehiro-cho, Tsurumi-ku Yokohama 230-0045 Japan
- Prediction Science Laboratory, RIKEN Cluster for Pioneering Research 7-1-26 Minatojima-minami-machi, Chuo-ku Kobe 650-0047 Japan
- Data Assimilation Research Team, RIKEN Center for Computational Science 7-1-26 Minatojima-minami-machi, Chuo-ku Kobe 650-0047 Japan
|
18
|
Ma B, Terayama K, Matsumoto S, Isaka Y, Sasakura Y, Iwata H, Araki M, Okuno Y. Structure-Based de Novo Molecular Generator Combined with Artificial Intelligence and Docking Simulations. J Chem Inf Model 2021; 61:3304-3313. [PMID: 34242036 DOI: 10.1021/acs.jcim.1c00679] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 12/30/2022]
Abstract
Recently, molecular generation models based on deep learning have attracted significant attention in drug discovery. However, most existing molecular generation models have serious limitations in the context of drug design wherein they do not sufficiently consider the effect of the three-dimensional (3D) structure of the target protein in the generation process. In this study, we developed a new deep learning-based molecular generator, SBMolGen, that integrates a recurrent neural network, a Monte Carlo tree search, and docking simulations. The results of an evaluation using four target proteins (two kinases and two G protein-coupled receptors) showed that the generated molecules had a better binding affinity score (docking score) than the known active compounds, and the generated molecules possessed a broader chemical space distribution. SBMolGen not only generates novel binding active molecules but also presents 3D docking poses with target proteins, which will be useful in subsequent drug design. The code is available at https://github.com/clinfo/SBMolGen.
Affiliation(s)
- Biao Ma
- Center for Cluster Development and Coordination, Foundation for Biomedical Research and Innovation at Kobe, 1-5-2, Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan.,RIKEN Center for Computational Science, 7-1-26 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
| | - Kei Terayama
- Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku, Kyoto 606-8507, Japan.,Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Shigeyuki Matsumoto
- Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku, Kyoto 606-8507, Japan
| | - Yuta Isaka
- Center for Cluster Development and Coordination, Foundation for Biomedical Research and Innovation at Kobe, 1-5-2, Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan.,RIKEN Center for Computational Science, 7-1-26 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
| | - Yoko Sasakura
- Center for Cluster Development and Coordination, Foundation for Biomedical Research and Innovation at Kobe, 1-5-2, Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan.,RIKEN Center for Computational Science, 7-1-26 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
| | - Hiroaki Iwata
- Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku, Kyoto 606-8507, Japan
| | - Mitsugu Araki
- Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku, Kyoto 606-8507, Japan
| | - Yasushi Okuno
- Center for Cluster Development and Coordination, Foundation for Biomedical Research and Innovation at Kobe, 1-5-2, Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan.,RIKEN Center for Computational Science, 7-1-26 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan.,Graduate School of Medicine, Kyoto University, 53 Shogoin-Kawaharacho, Sakyo-ku, Kyoto 606-8507, Japan
|
19
|
Kurotani A, Kakiuchi T, Kikuchi J. Solubility Prediction from Molecular Properties and Analytical Data Using an In-phase Deep Neural Network (Ip-DNN). ACS Omega 2021; 6:14278-14287. [PMID: 34124451 PMCID: PMC8190808 DOI: 10.1021/acsomega.1c01035] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/25/2021] [Accepted: 04/28/2021] [Indexed: 06/12/2023]
Abstract
Materials informatics is an emerging field that allows us to predict the properties of materials and has been applied in various research and development fields, such as materials science. In particular, solubility factors such as the Hansen and Hildebrand solubility parameters (HSPs and SP, respectively) and Log P are important values for understanding the physical properties of various substances. In this study, we succeeded in establishing a solubility prediction tool using a unique machine learning method called the in-phase deep neural network (ip-DNN), which starts exclusively from the analytical input data (e.g., NMR information, refractive index, and density) to predict solubility by predicting intermediate elements, such as molecular components and molecular descriptors, in a multiple-step method. To improve the accuracy of the prediction, intermediate regression models were employed when performing in-phase machine learning. In addition, we developed a website dedicated to the established solubility prediction method, which is freely available at "http://dmar.riken.jp/matsolca/".
Affiliation(s)
- Atsushi Kurotani
- RIKEN Center for Sustainable Resource Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Toshifumi Kakiuchi
- AGC Yokohama Technical Center, 1-1 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Jun Kikuchi
- RIKEN Center for Sustainable Resource Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Graduate School of Bioagricultural Sciences, Nagoya University, 1 Furo-cho, Chikusa-ku, Nagoya, Aichi 464-0810, Japan
|
20
|
Ito K, Xu X, Kikuchi J. Improved Prediction of Carbonless NMR Spectra by the Machine Learning of Theoretical and Fragment Descriptors for Environmental Mixture Analysis. Anal Chem 2021; 93:6901-6906. [PMID: 33929838 DOI: 10.1021/acs.analchem.1c00756] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
As the first multidimensional NMR approach, 2D J-resolved (2DJ) spectroscopy is distinguished by signal resolution and detection sensitivity with remarkable advantages for the exhaustive evaluation of complex mixtures and environmental samples due to its carbonless feature without the requirement of 13C connectivity. Generally, the 2DJ signal assignment of metabolic mixtures is problematic in spite of references to experimental NMR databases, owing to the existence of metabolic "dark matter." In this study, a new method to predict 2DJ spectra was developed with a combination of quantum mechanical (QM) computation and machine learning (ML). The predictive accuracy of J-coupling constants was evaluated using validated data. The root-mean-square deviation (RMSD) for QM computation was 3.52 Hz, while the RMSD for QM + ML was 1.21 Hz, indicating a substantial increase in predictive accuracy. The proposed model was applied to predict the 2DJ spectra of 60 standard substances and 55 components of seawater. Furthermore, two practical environmental samples were used to evaluate the robustness of the constructed predictive model. A J-coupling tree and J-split spectra produced from QM + ML of aliphatic moieties had good consistency with the experimental data, as compared with the theoretical data produced by QM computation. The predicted J-coupling tree for the J-coupling multiplet analysis of freely rotating bonds in the complex mixture, which is traditionally difficult, was interpretable. In addition, in silico identification of the J-split 1H NMR signals, which was independent of experimental databases, aided in the discovery of new components in a mixture.
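The accuracy figures quoted above are root-mean-square deviations between predicted and reference J-coupling constants. A minimal sketch of that metric follows; the function name is an assumption and no values from the paper are reproduced.

```python
import math

def rmsd(predicted, reference):
    """Root-mean-square deviation between paired values (e.g. J in Hz)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))
```

Computing `rmsd` once for the QM-only predictions and once for the QM + ML predictions against the same validated couplings gives the kind of comparison reported in the abstract.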
Affiliation(s)
- Kengo Ito
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.,Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Xiangru Xu
- Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
| | - Jun Kikuchi
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.,Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.,Graduate School of Bioagricultural Sciences, Nagoya University, 1 Furo-cho, Chikusa-ku, Nagoya, Aichi 464-0810, Japan
|
21
|
Terayama K, Sumita M, Tamura R, Tsuda K. Black-Box Optimization for Automated Discovery. Acc Chem Res 2021; 54:1334-1346. [PMID: 33635621 DOI: 10.1021/acs.accounts.0c00713] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 01/06/2023]
Abstract
In chemistry and materials science, researchers and engineers discover, design, and optimize chemical compounds or materials with their professional knowledge and techniques. At the highest level of abstraction, this process is formulated as black-box optimization. For instance, the trial-and-error process of synthesizing various molecules for better material properties can be regarded as optimizing a black-box function describing the relation between a chemical formula and its properties. Various black-box optimization algorithms have been developed in the machine learning and statistics communities. Recently, a number of researchers have reported successful applications of such algorithms to chemistry. They include the design of photofunctional molecules and medical drugs, optimization of thermal emission materials and high Li-ion conductive solid electrolytes, and discovery of a new phase in inorganic thin films for solar cells. There are a wide variety of algorithms available for black-box optimization, such as Bayesian optimization, reinforcement learning, and active learning. Practitioners need to select an appropriate algorithm or, in some cases, develop novel algorithms to meet their demands. It is also necessary to determine how to best combine machine learning techniques with quantum mechanics- and molecular mechanics-based simulations, and experiments. In this Account, we give an overview of recent studies regarding automated discovery, design, and optimization based on black-box optimization. The Account covers the following algorithms: Bayesian optimization to optimize the chemical or physical properties, an optimization method using a quantum annealer, best-arm identification, gray-box optimization, and reinforcement learning. In addition, we introduce active learning and boundless objective-free exploration, which may not fall into the category of black-box optimization. Data quality and quantity are key for the success of these automated discovery techniques. As laboratory automation and robotics are put forward, automated discovery algorithms would be able to match human performance at least in some domains in the near future.
Affiliation(s)
- Kei Terayama
- Graduate School of Medical Life Science, Yokohama City University, Tsurumi-ku 230-0045, Japan
- RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
- Medical Sciences Innovation Hub Program, RIKEN, Yokohama 230-0045, Japan
- Graduate School of Medicine, Kyoto University, Sakyo-ku 606-8507, Japan
| | - Masato Sumita
- RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
- International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, Tsukuba 305-0044, Japan
| | - Ryo Tamura
- RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
- International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, Tsukuba 305-0044, Japan
- Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8561, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba 305-0047, Japan
| | - Koji Tsuda
- RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
- Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8561, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba 305-0047, Japan
|
22
|
Signal Deconvolution and Generative Topographic Mapping Regression for Solid-State NMR of Multi-Component Materials. Int J Mol Sci 2021; 22:1086. [PMID: 33499371 PMCID: PMC7865946 DOI: 10.3390/ijms22031086] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/25/2020] [Revised: 01/15/2021] [Accepted: 01/17/2021] [Indexed: 01/19/2023] Open
Abstract
Solid-state nuclear magnetic resonance (ssNMR) spectroscopy provides information on native structures and the dynamics for predicting and designing the physical properties of multi-component solid materials. However, such an analysis is difficult because of the broad and overlapping spectra of these materials. Therefore, signal deconvolution and prediction are great challenges for their ssNMR analysis. We examined signal deconvolution methods using a short-time Fourier transform (STFT) and a non-negative tensor/matrix factorization (NTF, NMF), and methods for predicting NMR signals and physical properties using generative topographic mapping regression (GTMR). We demonstrated the applications for macromolecular samples involved in cellulose degradation, plastics, and microalgae such as Euglena gracilis. During cellulose degradation, 13C cross-polarization (CP)-magic angle spinning spectra were separated into signals of cellulose, proteins, and lipids by STFT and NTF. GTMR accurately predicted cellulose degradation for catabolic products such as acetate and CO2. Using these methods, the 1H anisotropic spectrum of poly-ε-caprolactone was separated into the signals of crystalline and amorphous solids. Forward prediction and inverse prediction of GTMR were used to compute STFT-processed NMR signals from the physical properties of polylactic acid. These signal deconvolution and prediction methods for ssNMR spectra of macromolecules can resolve the problem of overlapping spectra and support macromolecular characterization and material design.
|