1. Lu XY, Wu HP, Ma H, Li H, Li J, Liu YT, Pan ZY, Xie Y, Wang L, Ren B, Liu GK. Deep Learning-Assisted Spectrum-Structure Correlation: State-of-the-Art and Perspectives. Anal Chem 2024; 96:7959-7975. [PMID: 38662943 DOI: 10.1021/acs.analchem.4c01639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Spectrum-structure correlation plays an increasingly crucial role in spectral analysis and has developed significantly in recent decades. With advances in spectrometers, high-throughput detection has triggered explosive growth of spectral data, and the extension of research from small molecules to biomolecules brings a massive chemical space. Facing this evolving landscape, conventional chemometrics is ill-equipped, and deep learning-assisted chemometrics has rapidly emerged as a flourishing approach with a superior ability to extract latent features and make precise predictions. In this review, molecular and spectral representations and the fundamentals of deep learning are first introduced. We then summarize how deep learning has helped establish the correlation between spectrum and molecular structure over the past 5 years, by empowering spectral prediction (i.e., forward structure-spectrum correlation) and further enabling library matching and de novo molecular generation (i.e., inverse spectrum-structure correlation). Finally, we highlight the most important open issues that persist, together with potential solutions. With the fast development of deep learning, an ultimate solution for establishing spectrum-structure correlation is expected soon, which would trigger substantial development in various disciplines.
Affiliation(s)
- Xin-Yu Lu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hao-Ping Wu
- State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
- Hao Ma
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hui Li
- Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, P. R. China
- Jia Li
- Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, P. R. China
- Yan-Ti Liu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Zheng-Yan Pan
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yi Xie
- School of Informatics, Xiamen University, Xiamen 361005, P. R. China
- Lei Wang
- Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, P. R. China
- Bin Ren
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Guo-Kun Liu
- State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
2. Vadaddi SM, Zhao Q, Savoie BM. Graph to Activation Energy Models Easily Reach Irreducible Errors but Show Limited Transferability. J Phys Chem A 2024; 128:2543-2555. [PMID: 38517281 DOI: 10.1021/acs.jpca.3c07240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Activation energy characterization of competing reactions is a costly but crucial step for understanding the kinetic relevance of distinct reaction pathways, product yields, and myriad other properties of reacting systems. The standard methodology for activation energy characterization has historically been a transition state search using the highest level of theory that can be afforded. However, recently, several groups have popularized the idea of predicting activation energies directly based on nothing more than the reactant and product graphs, a sufficiently complex neural network, and a broad enough data set. Here, we have revisited this task using the recently developed Reaction Graph Depth 1 (RGD1) transition state data set and several newly developed graph attention architectures. All of these new architectures achieve similar state-of-the-art results of ∼4 kcal/mol mean absolute error on withheld testing sets of reactions but poor performance on external testing sets composed of reactions with differing mechanisms, reaction molecularity, or reactant size distribution. Limited transferability is also shown to be shared by other contemporary graph to activation energy architectures through a series of case studies. We conclude that an array of standard graph architectures can already achieve results comparable to the irreducible error of available reaction data sets but that out-of-distribution performance remains poor.
Affiliation(s)
- Sai Mahit Vadaddi
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Qiyuan Zhao
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Brett M Savoie
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
3. Specht T, Arweiler J, Stüber J, Münnemann K, Hasse H, Jirasek F. Automated nuclear magnetic resonance fingerprinting of mixtures. Magn Reson Chem 2024; 62:286-297. [PMID: 37515509 DOI: 10.1002/mrc.5381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 06/30/2023] [Accepted: 07/03/2023] [Indexed: 07/31/2023]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is a powerful tool for qualitative and quantitative analysis. However, for complex mixtures, determining the speciation from NMR spectra can be tedious and sometimes even unfeasible. On the other hand, identifying and quantifying structural groups in a mixture from NMR spectra is much easier than doing the same for components. We call this group-based approach "NMR fingerprinting." In this work, we show that NMR fingerprinting can even be performed in an automated way, without expert knowledge, based only on standard NMR spectra, namely, 13C, 1H, and 13C DEPT NMR spectra. Our approach is based on the machine-learning method of support vector classification (SVC), which was trained here on thousands of labeled pure-component NMR spectra from open-source data banks. We demonstrate the applicability of the automated NMR fingerprinting using test mixtures whose spectra were taken using a simple benchtop NMR spectrometer. The results from the NMR fingerprinting agree remarkably well with the ground truth, which was known from the gravimetric preparation of the samples. To facilitate the application of the method, we provide an interactive website (https://nmr-fingerprinting.de), where spectral information can be uploaded and which returns the NMR fingerprint. The NMR fingerprinting can be used in many ways, for example, for process monitoring or thermodynamic modeling using group-contribution methods, or simply as a first step in species analysis.
Affiliation(s)
- Thomas Specht
- Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, Germany
- Justus Arweiler
- Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, Germany
- Johannes Stüber
- Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, Germany
- Kerstin Münnemann
- Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, Germany
- Hans Hasse
- Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, Germany
- Fabian Jirasek
- Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, Germany
4. Roach J, Mital R, Haffner JJ, Colwell N, Coats R, Palacios HM, Liu Z, Godinho JLP, Ness M, Peramuna T, McCall LI. Microbiome metabolite quantification methods enabling insights into human health and disease. Methods 2024; 222:81-99. [PMID: 38185226 DOI: 10.1016/j.ymeth.2023.12.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 10/27/2023] [Accepted: 12/13/2023] [Indexed: 01/09/2024] Open
Abstract
Many of the health-associated impacts of the microbiome are mediated by its chemical activity, producing and modifying small molecules (metabolites). Thus, microbiome metabolite quantification has a central role in efforts to elucidate and measure microbiome function. In this review, we cover general considerations when designing experiments to quantify microbiome metabolites, including sample preparation, data acquisition and data processing, since these are critical to downstream data quality. We then discuss data analysis and experimental steps to demonstrate that a given metabolite feature is of microbial origin. We further discuss techniques used to quantify common microbial metabolites, including short-chain fatty acids (SCFA), secondary bile acids (BAs), tryptophan derivatives, N-acyl amides and trimethylamine N-oxide (TMAO). Lastly, we conclude with challenges and future directions for the field.
Affiliation(s)
- Jarrod Roach
- Department of Chemistry and Biochemistry, University of Oklahoma
- Rohit Mital
- Department of Biology, University of Oklahoma
- Jacob J Haffner
- Department of Anthropology, University of Oklahoma; Laboratories of Molecular Anthropology and Microbiome Research, University of Oklahoma
- Nathan Colwell
- Department of Chemistry and Biochemistry, University of Oklahoma
- Randy Coats
- Department of Chemistry and Biochemistry, University of Oklahoma
- Horvey M Palacios
- Department of Anthropology, University of Oklahoma; Laboratories of Molecular Anthropology and Microbiome Research, University of Oklahoma
- Zongyuan Liu
- Department of Chemistry and Biochemistry, University of Oklahoma
- Monica Ness
- Department of Chemistry and Biochemistry, University of Oklahoma
- Thilini Peramuna
- Department of Chemistry and Biochemistry, University of Oklahoma
- Laura-Isobel McCall
- Department of Chemistry and Biochemistry, University of Oklahoma; Laboratories of Molecular Anthropology and Microbiome Research, University of Oklahoma; Department of Chemistry and Biochemistry, San Diego State University.
5. Hu G, Qiu M. Machine learning-assisted structure annotation of natural products based on MS and NMR data. Nat Prod Rep 2023; 40:1735-1753. [PMID: 37519196 DOI: 10.1039/d3np00025g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/01/2023]
Abstract
Covering: up to March 2023. Machine learning (ML) has emerged as a popular tool for analyzing the structures of natural products (NPs). This review summarizes recent advances in ML-assisted mass spectrometry (MS) and nuclear magnetic resonance (NMR) data analysis for establishing the chemical structures of NPs. First, ML-based MS/MS analyses that rely on library matching are discussed, in which ML algorithms are used to calculate similarity, predict MS/MS fragments, and form molecular fingerprints. Then, ML-assisted MS/MS structural annotation without library matching is reviewed. Furthermore, cases in which ML algorithms assist NMR-based structural studies of NPs are discussed from four perspectives: NMR prediction, functional group identification, structural categorization, and quantum chemical calculation. Finally, the review concludes with a discussion of the challenges and trends associated with ML-based structure establishment of NPs.
Affiliation(s)
- Guilin Hu
- State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China.
- University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Minghua Qiu
- State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China.
- University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China
6. De Grazia G, Cucinotta L, Sciarrone D, Donato P, Trovato E, Riad N, Hattab ME, Mondello L, Rotondo A. Preparative three-dimensional GC and nuclear magnetic resonance for the isolation and identification of two sesquiterpene ethers from Dictyota Dichotoma. J Sep Sci 2023; 46:e2300261. [PMID: 37386802 DOI: 10.1002/jssc.202300261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 05/23/2023] [Accepted: 06/15/2023] [Indexed: 07/01/2023]
Abstract
Separation science plays a crucial role in the isolation of novel compounds contained in complex matrices. Yet their rational employment needs preliminary structure elucidation, which usually requires sufficient aliquots of grade substances to characterize the molecule by nuclear magnetic resonance experiments. In this study, two peculiar oxa-tricycloundecane ethers were isolated by means of preparative multidimensional gas chromatography from the brown alga species Dictyota dichotoma (Huds.) Lam., aiming to assign their 3D structures. Density functional theory simulations were carried out to select the correct configurational species matching the experimental NMR data (in terms of enantiomeric couples). In this case, the theoretical approach was crucial because proton signal overlap and spectral overcrowding prevented any other unambiguous structural information from being obtained. Only after the correct relative configuration had been identified through density functional theory data matching was it possible to verify an enhanced self-consistency with the experimental data, confirming the stereochemistry. The results obtained further pave the way toward structure elucidation of highly asymmetric molecules, whose configuration cannot be inferred by other means or strategies.
Affiliation(s)
- Gemma De Grazia
- Department of Chemical, Biological, Pharmaceutical and Environmental Sciences, University of Messina, Messina, Italy
- Lorenzo Cucinotta
- Department of Chemical, Biological, Pharmaceutical and Environmental Sciences, University of Messina, Messina, Italy
- Traceability Unit, Research and Innovation Centre, Fondazione Edmund Mach, San Michele all'Adige, Trento, Italy
- Danilo Sciarrone
- Department of Chemical, Biological, Pharmaceutical and Environmental Sciences, University of Messina, Messina, Italy
- Paola Donato
- Department of Biomedical, Dental, Morphological and Functional Imaging Sciences, University of Messina, Messina, Italy
- Emanuela Trovato
- Department of Chemical, Biological, Pharmaceutical and Environmental Sciences, University of Messina, Messina, Italy
- Nacera Riad
- Laboratory of Natural Products Chemistry and Biomolecules, Faculty of Sciences, University Blida 1, Blida, Algeria
- Mohamed El Hattab
- Laboratory of Natural Products Chemistry and Biomolecules, Faculty of Sciences, University Blida 1, Blida, Algeria
- Luigi Mondello
- Department of Chemical, Biological, Pharmaceutical and Environmental Sciences, University of Messina, Messina, Italy
- Department of Chemical, Biological, Pharmaceutical and Environmental Sciences, Chromaleont S.R.L., University of Messina, Messina, Italy
- Archimede Rotondo
- Department of Biomedical, Dental, Morphological and Functional Imaging Sciences, University of Messina, Messina, Italy
7. Yao L, Yang M, Song J, Yang Z, Sun H, Shi H, Liu X, Ji X, Deng Y, Wang X. Conditional Molecular Generation Net Enables Automated Structure Elucidation Based on 13C NMR Spectra and Prior Knowledge. Anal Chem 2023; 95:5393-5401. [PMID: 36926883 DOI: 10.1021/acs.analchem.2c05817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Structure elucidation of unknown compounds based on nuclear magnetic resonance (NMR) remains a challenging problem in both synthetic organic and natural product chemistry. Library matching has been an efficient method to assist structure elucidation. However, it is limited by the coverage of libraries. In addition, prior knowledge such as molecular fragments is neglected. To solve the problem, we propose a conditional molecular generation net (CMGNet) to allow input of multiple sources of information. CMGNet not only uses 13C NMR spectrum data as input but molecular formulas and fragments of molecules are also employed as input conditions. Our model applies large-scale pretraining for molecular understanding and fine-tuning on two NMR spectral data sets of different granularity levels to accommodate structure elucidation tasks. CMGNet generates structures based on 13C NMR data, molecular formula, and fragment information, with a recovery rate of 94.17% in the top 10 recommendations. In addition, the generative model performed well in the generation of various classes of compounds and in the structural revision task. CMGNet has a deep understanding of molecular connectivities from 13C NMR, molecular formula, and fragments, paving the way for a new paradigm of deep learning-assisted inverse problem-solving.
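The "recovery rate in the top 10 recommendations" quoted in this abstract is a top-k metric. The following is a minimal sketch of how such a figure could be computed, not the authors' evaluation code. The function name `top_k_recovery_rate`, the toy SMILES strings, and the assumption that structures are compared as already-canonicalized strings are all illustrative choices that do not come from the paper.

```python
from typing import Sequence


def top_k_recovery_rate(
    ranked_candidates: Sequence[Sequence[str]],
    true_structures: Sequence[str],
    k: int = 10,
) -> float:
    """Fraction of queries whose true structure appears in the first k candidates.

    Assumes every structure is given as a canonical string, so that plain
    string equality means "same molecule" (an assumption of this sketch).
    """
    if len(ranked_candidates) != len(true_structures):
        raise ValueError("one candidate list is needed per query")
    if not true_structures:
        raise ValueError("at least one query is required")
    hits = sum(
        truth in candidates[:k]
        for candidates, truth in zip(ranked_candidates, true_structures)
    )
    return hits / len(true_structures)


# Toy data invented for illustration only.
candidates = [
    ["CCO", "COC"],            # truth "CCO" is ranked first
    ["CC(=O)O", "OCC=O"],      # truth "OCC=O" is ranked second
    ["c1ccccc1", "C1CCCCC1"],  # truth "Cc1ccccc1" is not proposed at all
]
truths = ["CCO", "OCC=O", "Cc1ccccc1"]

rate_top1 = top_k_recovery_rate(candidates, truths, k=1)
rate_top2 = top_k_recovery_rate(candidates, truths, k=2)
```

In this toy case one of three structures is recovered at k = 1 and two of three at k = 2, which shows why a top-10 figure is always at least as high as the corresponding top-1 figure.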
Affiliation(s)
- Lin Yao
- CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Minjian Yang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Jianfei Song
- CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Zhuo Yang
- CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Hanyu Sun
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Hui Shi
- CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
- Xue Liu
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China
- Xiangyang Ji
- Department of Automation, Tsinghua University, Beijing 100084, China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China; Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaojian Wang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China; CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China
8. Judge MT, Ebbels TMD. Problems, principles and progress in computational annotation of NMR metabolomics data. Metabolomics 2022; 18:102. [PMID: 36469142 PMCID: PMC9722819 DOI: 10.1007/s11306-022-01962-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/04/2022] [Accepted: 11/18/2022] [Indexed: 12/08/2022]
Abstract
BACKGROUND: Compound identification remains a critical bottleneck in the process of exploiting Nuclear Magnetic Resonance (NMR) metabolomics data, especially for 1H 1-dimensional (1H 1D) data. As databases of reference compound spectra have grown, workflows have evolved to rely heavily on their search functions to facilitate this process by generating lists of potential metabolites found in complex mixture data, facilitating annotation and identification. However, approaches for validating and communicating annotations are most often guided by expert knowledge, and therefore are highly variable despite repeated efforts to align practices and define community standards.

AIM OF REVIEW: This review is aimed at broadening the application of automated annotation tools by discussing the key ideas of spectral matching and beginning to describe a set of terms to classify this information, thus advancing standards for communicating annotation confidence. Additionally, we hope that this review will facilitate the growing collaboration between chemical data scientists, software developers and the NMR metabolomics community, aiding development of long-term software solutions.

KEY SCIENTIFIC CONCEPTS OF REVIEW: We begin with a brief discussion of the typical untargeted NMR identification workflow. We differentiate between annotation (hypothesis generation, filtering), and identification (hypothesis testing, verification), and note the utility of different NMR data features for annotation. We then touch on three parts of annotation: (1) generation of queries, (2) matching queries to reference data, and (3) scoring and confidence estimation of potential matches for verification. In doing so, we highlight existing approaches to automated and semi-automated annotation from the perspective of the structural information they utilize, as well as how this information can be represented computationally.
Affiliation(s)
- Michael T Judge
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College, 131 Sir Alexander Fleming Building, South Kensington Campus, London, UK
- Timothy M D Ebbels
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College, 131 Sir Alexander Fleming Building, South Kensington Campus, London, UK.
9. Li C, Cong Y, Deng W. Identifying molecular functional groups of organic compounds by deep learning of NMR data. Magn Reson Chem 2022; 60:1061-1069. [PMID: 35674984 DOI: 10.1002/mrc.5292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2021] [Revised: 06/02/2022] [Accepted: 06/06/2022] [Indexed: 06/15/2023]
Abstract
We preprocess the raw nuclear magnetic resonance (NMR) spectrum and extract key features by using two different methodologies, called equidistant sampling and peak sampling for subsequent substructure pattern recognition. We also provide a strategy to address the imbalance issue frequently encountered in statistical modeling of NMR data set and establish two conventional support vector machine (SVM) and K-nearest neighbor (KNN) models to assess the capability of two feature selections, respectively. Our results in this study show that the models using the selected features of peak sampling outperform those using equidistant sampling. Then we build the recurrent neural network (RNN) model trained by data collected from peak sampling. Furthermore, we illustrate the easier optimization of hyperparameters and the better generalization ability of the RNN deep learning model by detailed comparison with traditional machine learning SVM and KNN models.
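The abstract names two feature-extraction schemes, equidistant sampling and peak sampling, without defining them. The sketch below shows one plausible reading of each, under stated assumptions: equidistant sampling keeps intensities at evenly spaced indices, and peak sampling keeps the strongest strict local maxima. Both definitions, the function names, and the toy spectrum are illustrative assumptions, not the authors' procedure.

```python
from typing import List, Sequence, Tuple


def equidistant_sampling(spectrum: Sequence[float], n_points: int) -> List[float]:
    """Keep n_points intensities taken at evenly spaced indices."""
    if not 1 <= n_points <= len(spectrum):
        raise ValueError("n_points must be between 1 and len(spectrum)")
    if n_points == 1:
        return [spectrum[0]]
    step = (len(spectrum) - 1) / (n_points - 1)
    return [spectrum[round(i * step)] for i in range(n_points)]


def peak_sampling(spectrum: Sequence[float], n_peaks: int) -> List[Tuple[int, float]]:
    """Keep the n_peaks strongest strict local maxima as (index, intensity) pairs."""
    peaks = [
        (i, spectrum[i])
        for i in range(1, len(spectrum) - 1)
        if spectrum[i - 1] < spectrum[i] > spectrum[i + 1]
    ]
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n_peaks]
    return sorted(strongest)  # restore chemical-shift (index) order


# Toy spectrum with two peaks, at indices 2 and 6.
spectrum = [0.0, 1.0, 5.0, 1.0, 0.0, 2.0, 8.0, 2.0, 0.0]

coarse = equidistant_sampling(spectrum, 3)  # samples indices 0, 4 and 8
peaks = peak_sampling(spectrum, 2)
```

On this toy spectrum the three evenly spaced samples all fall on the baseline, while peak sampling retains both signals. That is one intuitive reason a peak-based feature set could carry more information per feature, consistent with the comparison the abstract reports.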
Affiliation(s)
- Chongcan Li
- School of Mathematics and Statistics, Gansu Key Laboratory of Applied Mathematics and Complex Systems, Lanzhou University, Lanzhou, China
- Yong Cong
- College of Chemistry and Chemical Engineering, State Key Laboratory of Applied Organic Chemistry, Key Laboratory of Nonferrous Metals Chemistry and Resources Utilization, Lanzhou University, Lanzhou, China
- Weihua Deng
- School of Mathematics and Statistics, Gansu Key Laboratory of Applied Mathematics and Complex Systems, Lanzhou University, Lanzhou, China
10. Deep Learning-Based Method for Compound Identification in NMR Spectra of Mixtures. Molecules 2022; 27:3653. [PMID: 35744782 PMCID: PMC9227391 DOI: 10.3390/molecules27123653] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/18/2022] [Revised: 06/03/2022] [Accepted: 06/05/2022] [Indexed: 11/16/2022]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is highly unbiased and reproducible, which provides us a powerful tool to analyze mixtures consisting of small molecules. However, the compound identification in NMR spectra of mixtures is highly challenging because of chemical shift variations of the same compound in different mixtures and peak overlapping among molecules. Here, we present a pseudo-Siamese convolutional neural network method (pSCNN) to identify compounds in mixtures for NMR spectroscopy. A data augmentation method was implemented for the superposition of several NMR spectra sampled from a spectral database with random noises. The augmented dataset was split and used to train, validate and test the pSCNN model. Two experimental NMR datasets (flavor mixtures and additional flavor mixture) were acquired to benchmark its performance in real applications. The results show that the proposed method can achieve good performances in the augmented test set (ACC = 99.80%, TPR = 99.70% and FPR = 0.10%), the flavor mixtures dataset (ACC = 97.62%, TPR = 96.44% and FPR = 2.29%) and the additional flavor mixture dataset (ACC = 91.67%, TPR = 100.00% and FPR = 10.53%). We have demonstrated that the translational invariance of convolutional neural networks can solve the chemical shift variation problem in NMR spectra. In summary, pSCNN is an off-the-shelf method to identify compounds in mixtures for NMR spectroscopy because of its accuracy in compound identification and robustness to chemical shift variation.
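The augmentation step described in this abstract (superposing several database spectra and adding random noise) can be sketched as follows. This is a hedged illustration rather than the pSCNN code: the function name `augment_mixture`, the uniform weight range of 0.2 to 1.0, the Gaussian noise model, and the toy library are all assumptions introduced here.

```python
import random
from typing import List, Sequence, Tuple


def augment_mixture(
    library: Sequence[Sequence[float]],
    n_components: int,
    noise_sd: float,
    rng: random.Random,
) -> Tuple[List[float], List[int]]:
    """Build one synthetic mixture spectrum from a library of pure spectra.

    Picks n_components distinct library spectra, sums them with random
    weights, and adds zero-mean Gaussian noise to every point.
    Returns the mixture and the sorted indices of the chosen components.
    """
    chosen = sorted(rng.sample(range(len(library)), n_components))
    mixture = [0.0] * len(library[0])
    for idx in chosen:
        weight = rng.uniform(0.2, 1.0)  # assumed relative-concentration range
        for j, intensity in enumerate(library[idx]):
            mixture[j] += weight * intensity
    noisy = [value + rng.gauss(0.0, noise_sd) for value in mixture]
    return noisy, chosen


# Toy library: three "pure" spectra, each with a single peak (illustrative only).
library = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
mixture, components = augment_mixture(library, 2, 0.01, random.Random(0))
```

The returned component indices serve as labels, so each call yields one training pair (mixture spectrum, present compounds). Passing a seeded `random.Random` keeps the synthetic data set reproducible.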
11. Xu Z, Gu S, Li Y, Wu J, Zhao Y. Recognition-Enabled Automated Analyte Identification via 19F NMR. Anal Chem 2022; 94:8285-8292. [PMID: 35622989 DOI: 10.1021/acs.analchem.2c00642] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/19/2022]
Abstract
Nuclear magnetic resonance (NMR) is an indispensable tool for structural elucidation and noninvasive analysis. Automated identification of analytes with NMR is highly pursued in metabolism research and disease diagnosis; however, this process is often complicated by the signal overlap and the sample matrix. We herein report a detection scheme based on 19F NMR spectroscopy and dynamic recognition, which effectively simplifies the detection signal and mitigates the influence of the matrix on the detection. It is demonstrated that this approach can not only detect and differentiate capsaicin and dihydrocapsaicin in complex real-world samples but also quantify the ibuprofen content in sustained-release capsules. Based on the 19F signals obtained in the detection using a set of three 19F probes, automated analyte identification is achieved, effectively reducing the odds of misrecognition caused by structural similarity.
Affiliation(s)
- Zhenchuang Xu
- Key Laboratory of Organofluorine Chemistry, Shanghai Institute of Organic Chemistry, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 345 Ling-Ling Road, Shanghai 200032, China
- Siyi Gu
- Key Laboratory of Organofluorine Chemistry, Shanghai Institute of Organic Chemistry, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 345 Ling-Ling Road, Shanghai 200032, China
- Yipeng Li
- Key Laboratory of Organofluorine Chemistry, Shanghai Institute of Organic Chemistry, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 345 Ling-Ling Road, Shanghai 200032, China
- Jian Wu
- Instrumental Analysis Center, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Ling-Ling Road, Shanghai 200032, China
- Yanchuan Zhao
- Key Laboratory of Organofluorine Chemistry, Shanghai Institute of Organic Chemistry, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 345 Ling-Ling Road, Shanghai 200032, China; Key Laboratory of Energy Regulation Materials, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Ling-Ling Road, Shanghai 200032, China