1
Liu Y, De Vijlder T, Bittremieux W, Laukens K, Heyndrickx W. Current and future deep learning algorithms for tandem mass spectrometry (MS/MS)-based small molecule structure elucidation. Rapid Commun Mass Spectrom 2025; 39 Suppl 1:e9120. [PMID: 33955607 DOI: 10.1002/rcm.9120] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 01/13/2021] [Revised: 04/13/2021] [Accepted: 04/29/2021] [Indexed: 06/12/2023]
Abstract
RATIONALE Structure elucidation of small molecules has been one of the cornerstone applications of mass spectrometry for decades. Despite the increasing availability of software tools, structure elucidation from tandem mass spectrometry (MS/MS) data remains a challenging task, leaving many spectra unidentified. However, as an increasing number of reference MS/MS spectra are being curated at a repository scale and shared on public servers, there is an exciting opportunity to develop powerful new deep learning (DL) models for automated structure elucidation. ARCHITECTURES Recent early-stage DL frameworks mostly follow a "two-step approach" that first predicts molecular descriptors and then translates MS/MS spectra to database structures. The related architectures could suffer from: (1) computational complexity because of the separate training of descriptor-specific classifiers, (2) the high-dimensional nature of mass spectral data and information loss due to data preprocessing, and (3) the low substructure coverage and class imbalance of predefined molecular fingerprints. Inspired by successful DL frameworks employed in drug discovery, we have conceptualized and designed hypothetical DL architectures to tackle the above issues. For (1), we recommend multitask learning to achieve better performance with fewer classifiers by grouping structurally related descriptors. For (2) and (3), we introduce feature engineering to extract condensed and higher-order information from spectra and structure data. For instance, encoding spectra with subtrees and pre-calculated spectral patterns adds peak interactions to the model input. Encoding structures with graph convolutional networks incorporates connectivity within a molecule. The joint embedding of spectra and structures can enable simultaneous spectral library and molecular database search. 
CONCLUSIONS In principle, given enough training data, adapted DL architectures, optimal hyperparameters and computing power, DL frameworks can predict small molecule structures, completely or at least partially, from MS/MS spectra. However, their performance and general applicability should be fairly evaluated against classical machine learning frameworks.
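The "two-step approach" described above ends with a database lookup in which a predicted fingerprint is compared against candidate structures. A minimal sketch of that second step, assuming fingerprints are represented as sets of on-bit indices, that per-bit probabilities from hypothetical classifiers are thresholded at 0.5, and that candidates are ranked by Tanimoto similarity (a common choice, not necessarily the one used by the frameworks reviewed):

```python
def binarize(probabilities, threshold=0.5):
    """Return the set of bit indices whose predicted probability reaches the threshold."""
    return {i for i, p in enumerate(probabilities) if p >= threshold}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_candidates(probabilities, candidates, threshold=0.5):
    """Rank candidate structures by similarity to the predicted fingerprint.

    candidates: dict mapping a (hypothetical) structure name to its set of on-bits.
    """
    predicted = binarize(probabilities, threshold)
    scored = [(tanimoto(predicted, bits), name) for name, bits in candidates.items()]
    # Highest similarity first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


if __name__ == "__main__":
    probs = [0.9, 0.1, 0.8, 0.7, 0.2]  # hypothetical per-bit classifier outputs
    db = {"A": {0, 2, 3}, "B": {0, 1}, "C": {2, 3, 4}}
    print(rank_candidates(probs, db))  # ['A', 'C', 'B']
```

Here the predicted on-bits are {0, 2, 3}, so candidate A matches exactly (1.0), C shares two of four bits in the union (0.5) and B one of four (0.25). Real fingerprints have thousands of bits, and probabilistic scoring that uses the per-bit probabilities directly is a common refinement of this hard-threshold version.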
Affiliation(s)
- Wout Bittremieux
- University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), University of Antwerp, Antwerp, Belgium
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, San Diego, CA, USA
- Kris Laukens
- University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), University of Antwerp, Antwerp, Belgium
2
Zheng F, You L, Zhao X, Lu X, Xu G. Predicting Tandem Mass Spectra of Small Molecules Using Graph Embedding of Precursor-Product Ion Pair Graph. Anal Chem 2024; 96:19190-19195. [PMID: 39575948 DOI: 10.1021/acs.analchem.4c04375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/11/2024]
Abstract
Liquid chromatography-mass spectrometry (LC-MS)-based metabolite identification relies heavily on high-quality MS/MS data, and MS/MS prediction is one way to supply such data. However, prediction accuracy, resolution, and correlation with chemical structures remain unresolved. In this study, we developed an MS/MS prediction method, PPGB-MS2, which recasts MS/MS prediction as fragment intensity prediction. The concept of precursor-product ion pair graph bags (PPGBs) was introduced to represent fragments, achieving a uniform representation of precursor and product ion structures and of MS/MS fragmentation information, so that chemical structure information is retained before it is passed to machine learning models. Owing to the PPGB representation, graph neural networks (GNNs) can be used to predict MS/MS fragment intensities. The system was trained and evaluated using [M+H]+ and [M-H]- data acquired on an Agilent QTOF 6530 in the NIST 20 tandem MS database. The average cosine similarity on the test set was 0.71, higher than that of classical MS/MS prediction methods. PPGB-MS2 also achieves high-resolution MS/MS prediction because it effectively manages the correspondence between fragments and structures.
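The 0.71 figure above is an average cosine similarity between predicted and experimental spectra. A minimal sketch of one common way to compute such a score, assuming peaks are (m/z, intensity) pairs and that both spectra are summed into fixed-width m/z bins (the 0.01 Da default is an assumption; the abstract does not state how PPGB-MS2 aligns peaks):

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins keyed by bin index."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def spectral_cosine(peaks_a, peaks_b, bin_width=0.01):
    """Cosine similarity between two spectra after binning on a shared m/z grid."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    # Only bins populated in both spectra contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


predicted = [(100.0, 1.0), (150.0, 1.0)]     # hypothetical predicted peaks
experimental = [(100.0, 1.0), (200.0, 1.0)]  # hypothetical measured peaks
score = spectral_cosine(predicted, experimental)  # 0.5: one of two equal peaks shared
```

Fixed bins can split two nearly identical peaks that straddle a bin edge; tolerance-based peak matching and intensity scaling (for example square-root intensities) are common variants that this sketch leaves out.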
Affiliation(s)
- Fujian Zheng
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Lei You
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Xinjie Zhao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Xin Lu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
- Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
3
Liu J, Bao C, Zhang J, Han Z, Fang H, Lu H. Artificial intelligence with mass spectrometry-based multimodal molecular profiling methods for advancing therapeutic discovery of infectious diseases. Pharmacol Ther 2024; 263:108712. [PMID: 39241918 DOI: 10.1016/j.pharmthera.2024.108712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 07/22/2024] [Accepted: 09/03/2024] [Indexed: 09/09/2024]
Abstract
Infectious diseases, driven by a diverse array of pathogens, can swiftly undermine public health systems. Accurate diagnosis and treatment of infectious diseases, centered on identifying biomarkers and elucidating disease mechanisms, require more versatile and practical analytical approaches. Mass spectrometry (MS)-based molecular profiling methods can deliver a wealth of information on a range of functional molecules, including nucleic acids, proteins, and metabolites. While MS-driven omics analyses can yield vast datasets, the complexity and multi-dimensionality of MS data can significantly hinder the identification and characterization of functional molecules within specific biological processes and events. Artificial intelligence (AI) is a potent complementary tool that can substantially enhance the processing and interpretation of MS data. In this context, AI applications reduce spurious signals, improve precision, establish standardized analytical frameworks, and increase data integration efficiency. This critical review emphasizes the pivotal roles of MS-based omics strategies in biomarker discovery and in elucidating infectious diseases. Additionally, the review underscores the transformative potential of AI techniques to enhance the utility of MS-based molecular profiling in the field of infectious diseases by refining the quality and practicality of data produced by omics analyses. In conclusion, we advocate a forward-looking strategy that integrates AI with MS-based molecular profiling. This integration aims to transform the analytical landscape and the performance of biological molecule characterization, potentially down to the single-cell level. 
Such advancements are anticipated to propel the development of AI-driven predictive models, thus improving the monitoring of diagnostics and therapeutic discovery for the ongoing challenge related to infectious diseases.
Affiliation(s)
- Jingjing Liu
- School of Chinese Medicine, Hong Kong Traditional Chinese Medicine Phenome Research Center, State Key Laboratory of Environmental and Biological Analysis, Hong Kong Baptist University, Hong Kong 999077, China
- Chaohui Bao
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Jiaxin Zhang
- School of Chinese Medicine, Hong Kong Traditional Chinese Medicine Phenome Research Center, State Key Laboratory of Environmental and Biological Analysis, Hong Kong Baptist University, Hong Kong 999077, China
- Zeguang Han
- Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China.
- Hai Fang
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China.
- Haitao Lu
- School of Chinese Medicine, Hong Kong Traditional Chinese Medicine Phenome Research Center, State Key Laboratory of Environmental and Biological Analysis, Hong Kong Baptist University, Hong Kong 999077, China; Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China; Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China.
4
Russo FF, Nowatzky Y, Jaeger C, Parr MK, Benner P, Muth T, Lisec J. Machine learning methods for compound annotation in non-targeted mass spectrometry-A brief overview of fingerprinting, in silico fragmentation and de novo methods. Rapid Commun Mass Spectrom 2024; 38:e9876. [PMID: 39180507 DOI: 10.1002/rcm.9876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 07/03/2024] [Accepted: 07/12/2024] [Indexed: 08/26/2024]
Abstract
Non-targeted screenings (NTS) are essential tools in different fields, such as forensics, health and environmental sciences. NTSs often employ mass spectrometry (MS) methods because of their high throughput and sensitivity compared with, for example, nuclear magnetic resonance-based methods. Because identifying mass spectral signals, a process called annotation, is labour-intensive, supporting tools based on machine learning (ML) have been developed for it. However, both the diversity of mass spectral signals and the sheer number of ML tools developed for compound annotation make it difficult for researchers to maintain a comprehensive overview of the field. In this work, we illustrate which ML-based methods are available for compound annotation in non-targeted MS experiments and provide a nuanced comparison of the ML models used in MS data analysis, unravelling their unique features and performance metrics. Through this overview we help researchers apply these tools judiciously in their daily research. This review also offers a detailed exploration of methods and datasets to show gaps in current methods and promising target areas, offering a starting point for developers intending to improve existing methodologies.
Affiliation(s)
- Francesco F Russo
- Department of Analytical Chemistry and Reference Materials, Organic Trace Analysis and Food Analysis, Bundesanstalt für Materialforschung und -prüfung (BAM), Berlin, Germany
- Yannek Nowatzky
- eScience, Bundesanstalt für Materialforschung und -prüfung, Berlin, Germany
- Carsten Jaeger
- Department of Analytical Chemistry and Reference Materials, Environmental Analysis, Bundesanstalt für Materialforschung und -prüfung (BAM), Berlin, Germany
- Maria K Parr
- Institute of Pharmacy, Pharmaceutical and Medicinal Chemistry (Pharmaceutical Analyses), Freie Universität, Berlin, Germany
- Phillipp Benner
- eScience, Bundesanstalt für Materialforschung und -prüfung, Berlin, Germany
- Thilo Muth
- Department MF 2, Domain Specific Data Competence Centre, Robert Koch Institut, Berlin, Germany
- Jan Lisec
- Department of Analytical Chemistry and Reference Materials, Organic Trace Analysis and Food Analysis, Bundesanstalt für Materialforschung und -prüfung (BAM), Berlin, Germany
5
Hu G, Qiu M. Machine learning-assisted structure annotation of natural products based on MS and NMR data. Nat Prod Rep 2023; 40:1735-1753. [PMID: 37519196 DOI: 10.1039/d3np00025g] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 08/01/2023]
Abstract
Covering: up to March 2023
Machine learning (ML) has emerged as a popular tool for analyzing the structures of natural products (NPs). This review summarizes recent advances in ML-assisted mass spectrometry (MS) and nuclear magnetic resonance (NMR) data analysis for establishing the chemical structures of NPs. First, ML-based MS/MS analyses that rely on library matching are discussed, in which ML algorithms are used to calculate spectral similarity, predict MS/MS fragments, and predict molecular fingerprints. Then, ML-assisted MS/MS structural annotation without library matching is reviewed. Furthermore, applications of ML algorithms to NMR-based structural studies of NPs are discussed from four perspectives: NMR prediction, functional group identification, structural categorization and quantum chemical calculation. Finally, the review concludes with a discussion of the challenges and trends associated with ML-based structure establishment of NPs.
Affiliation(s)
- Guilin Hu
- State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China.
- University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Minghua Qiu
- State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China.
- University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China
6
Gaudêncio SP, Bayram E, Lukić Bilela L, Cueto M, Díaz-Marrero AR, Haznedaroglu BZ, Jimenez C, Mandalakis M, Pereira F, Reyes F, Tasdemir D. Advanced Methods for Natural Products Discovery: Bioactivity Screening, Dereplication, Metabolomics Profiling, Genomic Sequencing, Databases and Informatic Tools, and Structure Elucidation. Mar Drugs 2023; 21:md21050308. [PMID: 37233502 DOI: 10.3390/md21050308] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 03/23/2023] [Revised: 05/11/2023] [Accepted: 05/12/2023] [Indexed: 05/27/2023]
Abstract
Natural Products (NP) are essential for the discovery of novel drugs and products for numerous biotechnological applications. The NP discovery process is expensive and time-consuming, having as major hurdles dereplication (early identification of known compounds) and structure elucidation, particularly the determination of the absolute configuration of metabolites with stereogenic centers. This review comprehensively focuses on recent technological and instrumental advances, highlighting the development of methods that alleviate these obstacles, paving the way for accelerating NP discovery towards biotechnological applications. Herein, we emphasize the most innovative high-throughput tools and methods for advancing bioactivity screening, NP chemical analysis, dereplication, metabolite profiling, metabolomics, genome sequencing and/or genomics approaches, databases, bioinformatics, chemoinformatics, and three-dimensional NP structure elucidation.
Affiliation(s)
- Susana P Gaudêncio
- Associate Laboratory i4HB-Institute for Health and Bioeconomy, NOVA School of Science and Technology, NOVA University Lisbon, 2819-516 Caparica, Portugal
- UCIBIO-Applied Molecular Biosciences Unit, Chemistry Department, NOVA School of Science and Technology, NOVA University of Lisbon, 2819-516 Caparica, Portugal
- Engin Bayram
- Institute of Environmental Sciences, Room HKC-202, Hisar Campus, Bogazici University, Bebek, Istanbul 34342, Turkey
- Lada Lukić Bilela
- Department of Biology, Faculty of Science, University of Sarajevo, 71000 Sarajevo, Bosnia and Herzegovina
- Mercedes Cueto
- Instituto de Productos Naturales y Agrobiología-CSIC, 38206 La Laguna, Spain
- Ana R Díaz-Marrero
- Instituto de Productos Naturales y Agrobiología-CSIC, 38206 La Laguna, Spain
- Instituto Universitario de Bio-Orgánica (IUBO), Universidad de La Laguna, 38206 La Laguna, Spain
- Berat Z Haznedaroglu
- Institute of Environmental Sciences, Room HKC-202, Hisar Campus, Bogazici University, Bebek, Istanbul 34342, Turkey
- Carlos Jimenez
- CICA- Centro Interdisciplinar de Química e Bioloxía, Departamento de Química, Facultade de Ciencias, Universidade da Coruña, 15071 A Coruña, Spain
- Manolis Mandalakis
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, HCMR Thalassocosmos, 71500 Gournes, Crete, Greece
- Florbela Pereira
- LAQV, REQUIMTE, Chemistry Department, NOVA School of Science and Technology, NOVA University of Lisbon, 2819-516 Caparica, Portugal
- Fernando Reyes
- Fundación MEDINA, Avda. del Conocimiento 34, 18016 Armilla, Spain
- Deniz Tasdemir
- GEOMAR Centre for Marine Biotechnology (GEOMAR-Biotech), Research Unit Marine Natural Products Chemistry, GEOMAR Helmholtz Centre for Ocean Research Kiel, Am Kiel-Kanal 44, 24106 Kiel, Germany
- Faculty of Mathematics and Natural Science, Kiel University, Christian-Albrechts-Platz 4, 24118 Kiel, Germany
7
Chen CC, Mondal K, Vervliet P, Covaci A, O'Brien EP, Rockne KJ, Drummond JL, Hanley L. Logistic Regression Analysis of LC-MS/MS Data of Monomers Eluted from Aged Dental Composites: A Supervised Machine-Learning Approach. Anal Chem 2023; 95:5205-5213. [PMID: 36917068 DOI: 10.1021/acs.analchem.2c04362] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 03/16/2023]
Abstract
Compound identification by database searching that matches experimental with library mass spectra is commonly used in mass spectrometric (MS) data analysis. Vendor software often outputs scores that represent the quality of each spectral match for the identified compounds. However, software-generated identification results can differ drastically depending on the initial search parameters. Machine learning is applied here to provide a statistical evaluation of software-generated compound identification results from experimental tandem MS data. This was accomplished using the logistic regression algorithm to assign an identification probability value to each identified compound. Logistic regression is usually used for classification, but here it is used to generate identification probabilities without setting a classification threshold. Liquid chromatography coupled with quadrupole-time-of-flight tandem MS was used to analyze the organic monomers leached from resin-based dental composites in a simulated oral environment. The collected tandem MS data were processed with vendor software, followed by statistical evaluation of these results using logistic regression. The identification probability assigned to each compound provides confidence in the identification beyond that provided by database matching alone. A total of 21 distinct monomers were identified among all samples, including five intact monomers and chemical degradation products of bisphenol A glycidyl methacrylate (BisGMA), oligomers of bisphenol-A ethoxylate methacrylate (BisEMA), triethylene glycol dimethacrylate (TEGDMA), and urethane dimethacrylate (UDMA). The logistic regression model can be used to evaluate any database-matched liquid chromatography-tandem MS result by training a new model on analytical standards of compounds present in a chosen database and then generating identification probabilities for candidates from unknown data with the new model.
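The core idea above is to turn a match score into a probability of correct identification without applying a threshold. A minimal sketch in plain Python, assuming a single hypothetical feature (a library match score scaled to 0 to 1) and training labels from analytical standards (1 = correct identification, 0 = incorrect). The features, data and fitting procedure here are illustrative assumptions, not the authors' model; in practice a library implementation such as scikit-learn's `LogisticRegression` would normally replace the hand-written gradient descent:

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(features, labels, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            # Residual between the predicted probability and the 0/1 label.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(d):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def identification_probability(x, weights, bias):
    """Probability that a match with feature vector x is correct (no threshold applied)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical training data from standards: [match score], 1 = correct identification.
train_x = [[0.90], [0.85], [0.80], [0.95], [0.70], [0.30], [0.40], [0.50], [0.20], [0.60]]
train_y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
w, b = fit_logistic(train_x, train_y)
```

After fitting, `identification_probability([0.9], w, b)` returns a value above 0.5 and `identification_probability([0.3], w, b)` a value below it; the continuous probability, rather than a yes/no label, is what gets reported per compound.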
Affiliation(s)
- Chien-Chia Chen
- Chemistry, University of Illinois Chicago, Chicago, Illinois 60607, United States
- Karabi Mondal
- Materials and Environmental Engineering, University of Illinois Chicago, Chicago, Illinois 60607, United States
- Adrian Covaci
- Toxicological Center, University of Antwerp, 2610 Wilrijk, Belgium
- Evan P O'Brien
- Materials and Environmental Engineering, University of Illinois Chicago, Chicago, Illinois 60607, United States
- Karl J Rockne
- Materials and Environmental Engineering, University of Illinois Chicago, Chicago, Illinois 60607, United States
- James L Drummond
- Restorative Dentistry, University of Illinois Chicago, Chicago, Illinois 60607, United States
- Luke Hanley
- Chemistry, University of Illinois Chicago, Chicago, Illinois 60607, United States
8
Gertner DS, Violi JP, Bishop DP, Padula MP. Lipid Spectrum Generator: A Simple Script for the Generation of Accurate In Silico Lipid Fragmentation Spectra. Anal Chem 2023; 95:2909-2916. [PMID: 36692449 DOI: 10.1021/acs.analchem.2c04518] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/25/2023]
Abstract
Due to the complexity of lipids in nature, the use of in silico generated spectral libraries to identify lipid species from mass spectral data has become an integral part of many lipidomic workflows. However, many in silico libraries are either limited in usability or their capacity to represent lipid species. Here, we introduce Lipid Spectrum Generator, an open-source in silico spectral library generator specifically designed to aid in the identification of lipids in liquid chromatography-tandem mass spectrometry analysis.
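In silico spectral libraries of this kind pair a precursor m/z with rule-based product ions. A minimal sketch of that idea, assuming singly charged [M+H]+ or [M-H]- precursors and a hypothetical list of neutral-loss rules with relative intensities. The 500.0 Da neutral mass and the rules are placeholders; they are not Lipid Spectrum Generator's actual fragmentation rules or interface:

```python
PROTON_MASS = 1.007276  # Da, mass of a proton
H2O_MASS = 18.010565  # Da, monoisotopic mass of water


def precursor_mz(neutral_mass, polarity="positive"):
    """m/z of the singly charged [M+H]+ (positive) or [M-H]- (negative) ion."""
    if polarity == "positive":
        return neutral_mass + PROTON_MASS
    if polarity == "negative":
        return neutral_mass - PROTON_MASS
    raise ValueError(f"unknown polarity: {polarity!r}")


def in_silico_spectrum(neutral_mass, loss_rules, polarity="positive"):
    """Build a simple library spectrum from neutral-loss rules.

    loss_rules: list of (neutral loss in Da, relative intensity) pairs (hypothetical).
    Returns the precursor m/z and a list of (product m/z, intensity) sorted by m/z.
    """
    precursor = precursor_mz(neutral_mass, polarity)
    # Each rule subtracts a neutral loss from the precursor; a loss of 0 keeps the precursor.
    peaks = [(round(precursor - loss, 4), intensity) for loss, intensity in loss_rules]
    return round(precursor, 4), sorted(peaks)


# Placeholder example: arbitrary 500.0 Da neutral, precursor retained plus one water loss.
precursor, peaks = in_silico_spectrum(500.0, [(0.0, 40.0), (H2O_MASS, 100.0)])
```

This yields a precursor at m/z 501.0073 with product ions at m/z 482.9967 (water loss, intensity 100) and 501.0073 (intensity 40). A real lipid library would enumerate class-specific headgroup and acyl-chain rules for each lipid species.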
Affiliation(s)
- David S Gertner
- School of Life Sciences and Proteomics Core Facility, Faculty of Science, University of Technology Sydney, Ultimo 2007, Australia
- Jake P Violi
- School of Life Sciences and Proteomics Core Facility, Faculty of Science, University of Technology Sydney, Ultimo 2007, Australia
- David P Bishop
- School of Mathematical and Physical Sciences, Faculty of Science, University of Technology Sydney, Ultimo 2007, Australia
- Matthew P Padula
- School of Life Sciences and Proteomics Core Facility, Faculty of Science, University of Technology Sydney, Ultimo 2007, Australia
9
Yesiltepe Y, Govind N, Metz TO, Renslow RS. An initial investigation of accuracy required for the identification of small molecules in complex samples using quantum chemical calculated NMR chemical shifts. J Cheminform 2022; 14:64. [PMID: 36138446 PMCID: PMC9499888 DOI: 10.1186/s13321-022-00587-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2020] [Accepted: 02/06/2022] [Indexed: 11/24/2022]
Abstract
The majority of primary and secondary metabolites in nature have yet to be identified, representing a major challenge for metabolomics studies that currently require reference libraries from analyses of authentic compounds. Using currently available analytical methods, complete chemical characterization of metabolomes is infeasible for both technical and economic reasons. For example, unambiguous identification of metabolites is limited by the availability of authentic chemical standards, which, for the majority of molecules, do not exist. Computationally predicted or calculated data are a viable solution to expand the currently limited metabolite reference libraries, if such methods are shown to be sufficiently accurate. For example, determining nuclear magnetic resonance (NMR) spectroscopy spectra in silico has shown promise in the identification and delineation of metabolite structures. Many researchers have been taking advantage of density functional theory (DFT), a computationally inexpensive yet reputable method for the prediction of carbon and proton NMR spectra of metabolites. However, such methods are expected to have some error in predicted 13C and 1H NMR spectra with respect to experimentally measured values. This leads us to the question: what accuracy is required in predicted 13C and 1H NMR chemical shifts for confident metabolite identification? Using the set of 11,716 small molecules found in the Human Metabolome Database (HMDB), we simulated both experimental and theoretical NMR chemical shift databases. We investigated the level of accuracy required for identification of metabolites in simulated pure and impure samples by matching predicted chemical shifts to experimental data. We found 90% or more of molecules in simulated pure samples can be successfully identified when errors of 1H and 13C chemical shifts in water are below 0.6 and 7.1 ppm, respectively, and below 0.5 and 4.6 ppm in chloroform solvation, respectively. 
In simulated complex mixtures, as the complexity of the mixture increased, greater accuracy of the calculated chemical shifts was required, as expected. However, if the number of molecules in the mixture is known, e.g., when NMR is combined with MS and sample complexity is low, the likelihood of confident molecular identification increased by 90%.
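The tolerances quoted above (0.6 ppm for 1H and 7.1 ppm for 13C in water) can be read as acceptance windows when matching calculated to measured shifts. A minimal sketch, assuming a hypothetical matching rule: sorted predicted and observed shifts are paired one-to-one, every pair must fall within its nucleus's tolerance, and passing candidates are ranked by tolerance-scaled mean absolute error. This illustrates the idea only and is not the authors' matching protocol:

```python
def within_tolerance(predicted, observed, tolerance):
    """True if sorted predicted and observed shifts pair up within tolerance (ppm)."""
    if len(predicted) != len(observed):
        return False
    return all(abs(p - o) <= tolerance for p, o in zip(sorted(predicted), sorted(observed)))


def mean_abs_error(predicted, observed):
    """Mean absolute error (ppm) between sorted predicted and observed shifts."""
    pairs = list(zip(sorted(predicted), sorted(observed)))
    if not pairs:
        return 0.0
    return sum(abs(p - o) for p, o in pairs) / len(pairs)


def rank_nmr_candidates(observed_h, observed_c, candidates, tol_h=0.6, tol_c=7.1):
    """Return candidates whose 1H and 13C shifts all match, best first.

    candidates: dict mapping a name to (predicted 1H shifts, predicted 13C shifts).
    """
    scored = []
    for name, (pred_h, pred_c) in candidates.items():
        if within_tolerance(pred_h, observed_h, tol_h) and within_tolerance(pred_c, observed_c, tol_c):
            # Scale each nucleus by its tolerance so 1H and 13C errors are comparable.
            score = mean_abs_error(pred_h, observed_h) / tol_h + mean_abs_error(pred_c, observed_c) / tol_c
            scored.append((score, name))
    return [name for _, name in sorted(scored)]
```

With observed shifts 1H = [1.2, 3.5] and 13C = [20.0, 60.0], a candidate predicted at 1H = [2.0, 3.5] is rejected (0.8 ppm exceeds 0.6 ppm) while candidates within both windows are returned in order of closeness. In a mixture, shifts from several molecules overlap, which is why the abstract reports that tighter accuracy is needed as complexity grows.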
Affiliation(s)
- Yasemin Yesiltepe
- The Gene and Linda Voiland School of Chemical Engineering and Bioengineering, Washington State University, Pullman, WA, USA
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA
- Niranjan Govind
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA
- Thomas O Metz
- The Gene and Linda Voiland School of Chemical Engineering and Bioengineering, Washington State University, Pullman, WA, USA
- Ryan S Renslow
- The Gene and Linda Voiland School of Chemical Engineering and Bioengineering, Washington State University, Pullman, WA, USA.
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA.
10
Ljoncheva M, Stepišnik T, Kosjek T, Džeroski S. Machine learning for identification of silylated derivatives from mass spectra. J Cheminform 2022; 14:62. [PMID: 36109826 PMCID: PMC9476372 DOI: 10.1186/s13321-022-00636-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2022] [Accepted: 07/31/2022] [Indexed: 11/10/2022]
Abstract
Motivation
Compound structure identification increasingly relies on sophisticated computational tools, among which machine learning tools are a recent addition that is quickly gaining importance. These tools, of which the method titled Compound Structure Identification: Input Output Kernel Regression (CSI:IOKR) is an excellent example, have been used to elucidate compound structures from mass spectral (MS) data with high accuracy, confidence and speed. They have, however, largely focused on data from liquid chromatography coupled to tandem mass spectrometry (LC–MS).
Gas chromatography coupled to mass spectrometry (GC–MS) is an alternative which offers several advantages as compared to LC–MS, including higher data reproducibility. Of special importance is the substantial compound coverage offered by GC–MS, further expanded by derivatization procedures, such as silylation, which can improve the volatility, thermal stability and chromatographic peak shape of semi-volatile analytes. Despite these advantages and the increasing size of compound databases and MS libraries, GC–MS data have not yet been used by machine learning approaches to compound structure identification.
Results
This study presents a successful application of the CSI:IOKR machine learning method for the identification of environmental contaminants from GC–MS spectra. We use CSI:IOKR as an alternative to exhaustive search of MS libraries, independent of instrumental platform and data processing software. We use a comprehensive dataset of GC–MS spectra of trimethylsilyl derivatives and their molecular structures, derived from a large commercially available MS library, to train a model that maps between spectra and molecular structures. We test the learned model on a different dataset of GC–MS spectra of trimethylsilyl derivatives of environmental contaminants, generated in-house and made publicly available. The results show that 37% (resp. 50%) of the tested compounds are correctly ranked among the top 10 (resp. 20) candidate compounds suggested by the model. Even though spectral comparisons with reference standards or de novo structural elucidations are necessary to validate the predictions, machine learning provides efficient candidate prioritization and reduces the time spent on compound annotation.
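The 37% and 50% figures are top-k accuracies: the share of test spectra whose correct structure appears among the first 10 or 20 ranked candidates. A minimal sketch of that evaluation metric, with hypothetical candidate lists; it reproduces only the evaluation, not CSI:IOKR's candidate scoring:

```python
def top_k_accuracy(rankings, true_ids, k):
    """Fraction of queries whose true compound appears among the first k candidates.

    rankings: one ranked candidate list per query spectrum (best first).
    true_ids: the correct compound identifier for each query, in the same order.
    """
    if len(rankings) != len(true_ids) or not true_ids:
        raise ValueError("need one ranking per query and at least one query")
    hits = sum(1 for ranking, truth in zip(rankings, true_ids) if truth in ranking[:k])
    return hits / len(true_ids)


# Hypothetical rankings for four query spectra.
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["j", "k", "l"]]
truths = ["a", "f", "x", "k"]  # "x" was never retrieved
```

For these lists the top-1, top-2 and top-3 accuracies are 0.25, 0.5 and 0.75; a query whose true compound is missing from the candidate set (here "x") counts as a miss at every k.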
11
King E, Overstreet R, Nguyen J, Ciesielski D. Augmentation of MS/MS Libraries with Spectral Interpolation for Improved Identification. J Chem Inf Model 2022; 62:3724-3733. [PMID: 35905451 PMCID: PMC9400100 DOI: 10.1021/acs.jcim.2c00620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Tandem mass spectrometry (MS/MS) is a primary tool for the identification of small molecules and metabolites, and the resulting spectra are most commonly identified by matching them with spectra in MS/MS reference libraries. The high degree of variability in MS/MS spectrum acquisition techniques and parameters creates a significant challenge for building standardized reference libraries. Here we present a method to improve the usefulness of existing MS/MS libraries by augmenting available experimental spectra data sets with statistically interpolated spectra at unreported collision energies. We find that highly accurate spectral approximations can be interpolated from as few as three experimental spectra and that the interpolated spectra will be consistent with true spectra gathered from the same instrument as the experimental spectra. Supplementing existing spectral databases with interpolated spectra yields consistent improvements to identification accuracy on a range of instruments and precursor types. Applying this method yields significant improvements (∼10% more spectra correctly identified) on large data sets (2000–10 000 spectra), indicating this is a quick yet adept tool for improving spectral matching in situations where available reference libraries are not yet sufficient. We also find improvements in matching spectra across instrument types (between an Agilent Q-TOF and an Orbitrap Elite), at high collision energies (50–90 eV), and with smaller data sets available through MassBank.
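To make the idea of filling in unreported collision energies concrete, here is a minimal Python sketch that linearly interpolates binned peak intensities between the two nearest measured collision energies and then scores spectra with cosine similarity. It is an illustration under stated assumptions, not the authors' statistical interpolation model: the function names, the pre-binned `{m/z: intensity}` representation, and all example values are hypothetical.

```python
import math

def interpolate_spectrum(spectra, target_ce):
    """Estimate a spectrum at target_ce (eV) from spectra measured at other energies.

    spectra maps collision energy -> {m/z: intensity}; peaks are assumed to be
    pre-binned so the same fragment uses the same m/z key at every energy.
    """
    energies = sorted(spectra)
    if not energies or not energies[0] <= target_ce <= energies[-1]:
        raise ValueError("target_ce must lie within the measured energy range")
    if len(energies) == 1:
        return dict(spectra[energies[0]])
    # Pick the two measured energies that bracket the target energy.
    for lo, hi in zip(energies, energies[1:]):
        if lo <= target_ce <= hi:
            break
    w = (target_ce - lo) / (hi - lo)
    peaks = set(spectra[lo]) | set(spectra[hi])
    # Linear interpolation per peak; a peak absent at one energy counts as zero.
    return {mz: (1 - w) * spectra[lo].get(mz, 0.0) + w * spectra[hi].get(mz, 0.0)
            for mz in peaks}

def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra given as {m/z: intensity}."""
    dot = sum(v * b.get(mz, 0.0) for mz, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical library entry measured at 10, 20 and 40 eV; estimate 30 eV.
library = {
    10: {100.0: 0.2, 150.0: 1.0},
    20: {100.0: 0.6, 150.0: 0.5},
    40: {100.0: 1.0, 150.0: 0.1, 80.0: 0.4},
}
estimated = interpolate_spectrum(library, 30)
query = {100.0: 0.75, 150.0: 0.3, 80.0: 0.25}  # hypothetical experimental spectrum
print(sorted(estimated.items()))
print(round(cosine_similarity(query, estimated), 3))
```

Per-peak linear interpolation is the simplest possible choice; real spectra would also need m/z tolerance matching and intensity normalization before a step like this.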
Affiliation(s)
- Ethan King
- Computing and Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Richard Overstreet
- Signature Science and Technology Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Julia Nguyen
- Computing and Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Danielle Ciesielski
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
12
Tian Z, Liu F, Li D, Fernie AR, Chen W. Strategies for structure elucidation of small molecules based on LC–MS/MS data from complex biological samples. Comput Struct Biotechnol J 2022; 20:5085-5097. [PMID: 36187931 PMCID: PMC9489805 DOI: 10.1016/j.csbj.2022.09.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/18/2022] [Revised: 09/03/2022] [Accepted: 09/03/2022] [Indexed: 11/06/2022]
Abstract
LC–MS/MS is a major analytical platform for metabolomics, which has recently become a focus of research in the life and environmental sciences. However, structure elucidation of small molecules based on LC–MS/MS data remains a major challenge in the chemical and biological interpretation of untargeted metabolomics datasets. In recent years, several strategies for structure elucidation using LC–MS/MS data from complex biological samples have been proposed; these strategies can be broadly divided into two types, one based on structure annotation of mass spectra and the other on retention time prediction. These strategies have helped many scientists conduct research in metabolite-related fields and are indispensable for the development of future tools. Here, we summarize the characteristics of the current tools and strategies for structure elucidation of small molecules based on LC–MS/MS data, and further discuss directions and perspectives for improving the power of these tools and strategies.
13
Wang F, Liigand J, Tian S, Arndt D, Greiner R, Wishart DS. CFM-ID 4.0: More Accurate ESI-MS/MS Spectral Prediction and Compound Identification. Anal Chem 2021; 93:11692-11700. [PMID: 34403256 PMCID: PMC9064193 DOI: 10.1021/acs.analchem.1c01465] [Citation(s) in RCA: 180] [Impact Index Per Article: 45.0] [Indexed: 01/03/2023]
Abstract
In the field of metabolomics, mass spectrometry (MS) is the method most commonly used for identifying and annotating metabolites. As this typically involves matching a given MS spectrum against an experimentally acquired reference spectral library, this approach is limited by the coverage and size of such libraries (which typically number in the thousands). These experimental libraries can be greatly extended by predicting the MS spectra of known chemical structures (which number in the millions) to create computational reference spectral libraries. To facilitate the generation of predicted spectral reference libraries, we developed CFM-ID, a computer program that can accurately predict the ESI-MS/MS spectrum of a given compound structure. CFM-ID is one of the best-performing methods for compound-to-mass-spectrum prediction and also one of the top tools for in silico mass-spectrum-to-compound identification. This work improves CFM-ID's ability to predict ESI-MS/MS spectra from compounds by (1) learning parameters from features based on the molecular topology, (2) adding a new approach to ring cleavage that models such cleavage as a sequence of simple chemical bond dissociations, and (3) expanding its hand-written rule-based predictor to cover more chemical classes, including acylcarnitines, acylcholines, flavonols, flavones, flavanones, and flavonoid glycosides. We demonstrate that this new version of CFM-ID (version 4.0) is significantly more accurate than previous CFM-ID versions in terms of both ESI-MS/MS spectral prediction and compound identification. CFM-ID 4.0 is available at http://cfmid4.wishartlab.com/ as a web server, and Docker images can be downloaded at https://hub.docker.com/r/wishartlab/cfmid.
Affiliation(s)
- Fei Wang
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Alberta Machine Intelligence Institute, Edmonton, AB T5J 3B1, Canada
- Jaanus Liigand
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Institute of Chemistry, University of Tartu, Tartu 50411, Estonia
- Siyang Tian
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
- David Arndt
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Russell Greiner
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Department of Psychiatry, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Alberta Machine Intelligence Institute, Edmonton, AB T5J 3B1, Canada
- David S Wishart
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
14
Shah HA, Liu J, Yang Z, Feng J. Review of Machine Learning Methods for the Prediction and Reconstruction of Metabolic Pathways. Front Mol Biosci 2021; 8:634141. [PMID: 34222327 PMCID: PMC8247443 DOI: 10.3389/fmolb.2021.634141] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 11/27/2020] [Accepted: 06/01/2021] [Indexed: 11/13/2022]
Abstract
Prediction and reconstruction of metabolic pathways play significant roles in many fields such as genetic engineering, metabolic engineering, and drug discovery, and are becoming some of the most active research topics in synthetic biology. With the increase in related data and the development of machine learning techniques, many machine learning-based methods have been proposed for the prediction or reconstruction of metabolic pathways. Machine learning techniques are showing state-of-the-art performance in handling the rapidly increasing volume of data in synthetic biology. To support researchers in this field, we briefly review the research progress of metabolic pathway reconstruction and prediction based on machine learning. Some challenging issues in the reconstruction of metabolic pathways are also discussed in this paper.
Affiliation(s)
- Hayat Ali Shah
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
- Juan Liu
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
- Zhihui Yang
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
- Jing Feng
- Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
15
Krettler CA, Thallinger GG. A map of mass spectrometry-based in silico fragmentation prediction and compound identification in metabolomics. Brief Bioinform 2021; 22:6184408. [PMID: 33758925 DOI: 10.1093/bib/bbab073] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 11/26/2020] [Revised: 01/29/2021] [Accepted: 02/12/2021] [Indexed: 12/27/2022]
Abstract
Metabolomics, the comprehensive study of the metabolome, and lipidomics, the large-scale study of pathways and networks of cellular lipids, are major driving forces in enabling personalized medicine. However, complicated and error-prone data analysis remains a bottleneck, especially for identifying novel metabolites. Comparing experimental mass spectra to curated databases containing reference spectra has been the gold standard for the identification of compounds, but constructing such databases is a costly and time-consuming task. Many software applications try to circumvent this process by using cutting-edge advances in computational methods, including quantum chemistry and machine learning, to simulate mass spectra through theoretical, so-called in silico, fragmentation of compounds. Other solutions concentrate directly on experimental spectra and try to identify structural properties by investigating recurring patterns and the relationships between them. The considerable progress made in the field allows recent approaches to provide valuable clues that expedite the annotation of experimental mass spectra. This review sheds light on the individual strengths and weaknesses of these tools and attempts to evaluate them, especially in view of lipidomics, when considering complex mixtures found in biological samples as well as inter-instrument variability of mass spectrometers.
Affiliation(s)
- Christoph A Krettler
- Institute of Biomedical Informatics, Graz University of Technology, Stremayrgasse 16/I, 8010, Graz, Austria; Omics Center Graz, BioTechMed-Graz, Stiftingtalstrasse 24, 8010, Graz, Austria
- Gerhard G Thallinger
- Institute of Biomedical Informatics, Graz University of Technology, Stremayrgasse 16/I, 8010, Graz, Austria; Omics Center Graz, BioTechMed-Graz, Stiftingtalstrasse 24, 8010, Graz, Austria
16
Li Y, Kuhn M, Gavin AC, Bork P. Identification of metabolites from tandem mass spectra with a machine learning approach utilizing structural features. Bioinformatics 2020; 36:1213-1218. [PMID: 31605112 PMCID: PMC7703789 DOI: 10.1093/bioinformatics/btz736] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/20/2019] [Revised: 07/30/2019] [Accepted: 09/25/2019] [Indexed: 01/11/2023]
Abstract
Motivation Untargeted mass spectrometry (MS/MS) is a powerful method for detecting metabolites in biological samples. However, fast and accurate identification of metabolite structures from MS/MS spectra is still a great challenge. Results We present a new analysis method, called SubFragment-Matching (SF-Matching), that is based on the hypothesis that molecules with similar structural features will exhibit similar fragmentation patterns. We combine information on fragmentation patterns of molecules with shared substructures and then use random forest models to predict whether a given structure can yield a certain fragmentation pattern. These models can then be used to score candidate molecules for a given mass spectrum. For rapid identification, we pre-compute such scores for common biological molecular structure databases. Using benchmarking datasets, we find that our method performs similarly to CSI:FingerID and that very high accuracies can be achieved by combining our method with CSI:FingerID. Rarefaction analysis of the training dataset shows that the performance of our method will increase as more experimental data become available. Availability and implementation SF-Matching is available from http://www.bork.embl.de/Docu/sf_matching. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuanyue Li
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Michael Kuhn
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Anne-Claude Gavin
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Molecular Medicine Partnership Unit (MMPU), 69117 Heidelberg, Germany
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Molecular Medicine Partnership Unit (MMPU), 69117 Heidelberg, Germany; Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany; Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
17
Liu FJ, Jiang Y, Li P, Liu YD, Xin GZ, Yao ZP, Li HJ. Diagnostic fragmentation-assisted mass spectral networking coupled with in silico dereplication for deep annotation of steroidal alkaloids in medicinal Fritillariae Bulbus. JOURNAL OF MASS SPECTROMETRY : JMS 2020; 55:e4528. [PMID: 32559823 DOI: 10.1002/jms.4528] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/02/2020] [Revised: 04/03/2020] [Accepted: 04/07/2020] [Indexed: 06/11/2023]
Abstract
Fully characterizing the chemical constituents of an herbal medicine remains a challenging task. Molecular networking (MN) organizes tandem mass spectrometry (MS/MS) data from complex samples by mass spectral similarity, but it suffers from low coverage and accuracy of compound annotation because of the limited size of available databases and the difficulty of differentiating similar chemical scaffolds. In this work, an enhanced MN-based strategy named diagnostic fragmentation-assisted molecular networking coupled with in silico dereplication (DFMN-ISD) was introduced to overcome these obstacles: the rule-based fragmentation patterns provide insights into similar chemical scaffolds, the in silico candidates generated from metabolic reactions expand the available natural product databases, and the in silico annotation method facilitates further dereplication of candidates by computing their fragmentation trees. As a case study, this approach was applied to globally profile the steroidal alkaloids in Fritillariae Bulbus, a commonly used antitussive and expectorant herbal medicine. In total, 325 steroidal alkaloids were discovered, including 106 cis-D/E-cevanines, 142 trans-D/E-cevanines, 29 jervines, 23 veratramines, and 25 verazines, and 10 of them were confirmed with available reference standards. Approximately 70% of the putative steroidal alkaloids have never been reported in previous publications, demonstrating the benefit of the DFMN-ISD approach for the comprehensive characterization of chemicals in a complex plant organism.
Affiliation(s)
- Feng-Jie Liu
- State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing, 210009, China
- Yan Jiang
- College of Chemical Engineering, Nanjing Forestry University, Nanjing, China
- Ping Li
- State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing, 210009, China
- Yang-Dan Liu
- School of Pharmacy, Shenyang Pharmaceutical University, Shenyang, China
- Gui-Zhong Xin
- State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing, 210009, China
- Zhong-Ping Yao
- State Key Laboratory of Chemical Biology and Drug Discovery, Food Safety and Technology Research Centre and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Kowloon, China
- State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation) and Shenzhen Key Laboratory of Food Biological Safety Control, Shenzhen Research Institute of Hong Kong Polytechnic University, Shenzhen, China
- Hui-Jun Li
- State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing, 210009, China
18
19
Hwang H, Jeong HK, Lee HK, Park GW, Lee JY, Lee SY, Kang YM, An HJ, Kang JG, Ko JH, Kim JY, Yoo JS. Machine Learning Classifies Core and Outer Fucosylation of N-Glycoproteins Using Mass Spectrometry. Sci Rep 2020; 10:318. [PMID: 31941975 PMCID: PMC6962204 DOI: 10.1038/s41598-019-57274-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 11/15/2018] [Accepted: 12/27/2019] [Indexed: 12/14/2022]
Abstract
Protein glycosylation is known to be involved in biological processes such as cell recognition, growth, differentiation, and apoptosis. Fucosylation of glycoproteins plays an important role in the structural stability and function of N-linked glycoproteins. Although many biological and clinical studies of protein fucosylation by fucosyltransferases have been reported, structural classification of fucosylated N-glycoproteins into core or outer isoforms remains a challenge. Here, we report for the first time the classification of N-glycopeptides as core- and outer-fucosylated types using tandem mass spectrometry (MS/MS) and machine learning algorithms such as the deep neural network (DNN) and support vector machine (SVM). Training and test sets of more than 800 MS/MS spectra of N-glycopeptides from immunoglobulin gamma and alpha-1-acid glycoprotein standards were selected for classification of the fucosylation types using supervised learning models. The best-performing model had an accuracy of more than 99% against manual characterization and area under the curve values greater than 0.99, which were calculated from probability scores on target and decoy datasets. Finally, this model was applied to classify fucosylated N-glycoproteins from human plasma. A total of 82 N-glycopeptides, with 54 core-, 24 outer-, and 4 dual-fucosylation types derived from 54 glycoproteins, were classified as the same type by both the DNN and SVM. Specifically, outer fucosylation was dominant in tri- and tetra-antennary N-glycopeptides, while core fucosylation was dominant in the mono-antennary, bi-antennary, and hybrid types of N-glycoproteins in human plasma. Thus, machine learning methods can be combined with MS/MS to distinguish between different isoforms of fucosylated N-glycopeptides.
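The area under the curve above summarizes how well probability scores separate target from decoy spectra. One standard way to compute it is the Mann-Whitney formulation: the probability that a randomly chosen target score exceeds a randomly chosen decoy score, with ties counted as one half. A minimal Python sketch follows; the function name and scores are hypothetical, and the study does not state which AUC implementation it used.

```python
def roc_auc(target_scores, decoy_scores):
    """ROC AUC: probability that a random target outscores a random decoy.

    Uses the Mann-Whitney pairwise formulation; ties count as one half.
    """
    if not target_scores or not decoy_scores:
        raise ValueError("need at least one target and one decoy score")
    wins = 0.0
    for t in target_scores:
        for d in decoy_scores:
            if t > d:
                wins += 1.0
            elif t == d:
                wins += 0.5
    return wins / (len(target_scores) * len(decoy_scores))

# Hypothetical classifier probabilities for target and decoy spectra.
targets = [0.90, 0.80, 0.70]
decoys = [0.10, 0.75, 0.20]
print(roc_auc(targets, decoys))
```

The pairwise loop is O(n·m); for large score sets, a rank-based computation gives the same value faster.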
Affiliation(s)
- Heeyoun Hwang
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea
- Hoi Keun Jeong
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea
- Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, 34134, Republic of Korea
- Hyun Kyoung Lee
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea
- Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, 34134, Republic of Korea
- Gun Wook Park
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea
- Ju Yeon Lee
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea
- Soo Youn Lee
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea
- Young-Mook Kang
- Drug Information Platform Center, Korea Research Institute of Chemical Technology, Daejeon, 34114, Korea
- Hyun Joo An
- Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, 34134, Republic of Korea
- Asia Glycomics Reference Site, Chungnam National University, Daejeon, 34134, Republic of Korea
- Jeong Gu Kang
- Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, 34141, Republic of Korea
- Jeong-Heon Ko
- Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, 34141, Republic of Korea
- Department of Biomolecular Science, Korea University of Science and Technology (UST), Daejeon, 34113, Republic of Korea
- Jin Young Kim
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea
- Jong Shin Yoo
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju, 28119, Republic of Korea
- Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, 34134, Republic of Korea
20
Nguyen DH, Nguyen CH, Mamitsuka H. Recent advances and prospects of computational methods for metabolite identification: a review with emphasis on machine learning approaches. Brief Bioinform 2019; 20:2028-2043. [PMID: 30099485 PMCID: PMC6954430 DOI: 10.1093/bib/bby066] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 04/30/2018] [Revised: 06/14/2018] [Accepted: 07/03/2018] [Indexed: 12/25/2022]
Abstract
MOTIVATION Metabolomics involves the study of a great number of metabolites, which are small molecules present in biological systems. They perform many important functions, such as energy transport, signaling, serving as building blocks of cells, and inhibition/catalysis. Understanding the biochemical characteristics of metabolites is an essential part of metabolomics for expanding our knowledge of biological systems. It is also key to the development of many applications and areas such as biotechnology, biomedicine, and pharmaceuticals. However, the identification of metabolites remains a challenging task in metabolomics, with a huge number of potentially interesting but unknown metabolites. The standard method for identifying metabolites is mass spectrometry (MS) preceded by a separation technique. Over many decades, many techniques with different approaches have been proposed for the MS-based metabolite identification task, which can be divided into the following four groups: mass spectra databases, in silico fragmentation, fragmentation trees, and machine learning. In this review paper, we thoroughly survey currently available tools for metabolite identification, with a focus on in silico fragmentation and machine learning-based approaches. We also give an in-depth discussion of advanced machine learning methods, which can lead to further improvement on this task.
Affiliation(s)
- Dai Hai Nguyen
- Department of machine learning and bioinformatics, Bioinformatics Center, Kyoto University, Uji, Japan
- Canh Hao Nguyen
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan
- Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan
- Department of Computer Science, Aalto University, Otakaari, FI, Finland
21
Ni Z, Goracci L, Cruciani G, Fedorova M. Computational solutions in redox lipidomics - Current strategies and future perspectives. Free Radic Biol Med 2019; 144:110-123. [PMID: 31035005 DOI: 10.1016/j.freeradbiomed.2019.04.027] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 02/28/2019] [Revised: 04/15/2019] [Accepted: 04/23/2019] [Indexed: 12/31/2022]
Abstract
The high chemical diversity of lipids allows them to perform multiple biological functions ranging from serving as structural building blocks of biological membranes to regulation of metabolism and signal transduction. In addition to the native lipidome, lipid species derived from enzymatic and non-enzymatic modifications (the epilipidome) make the overall picture even more complex, as their functions are still largely unknown. Oxidized lipids represent the fraction of the epilipidome that has attracted considerable scientific attention because of their apparent involvement in the onset and development of numerous human disorders. The development of high-throughput analytical methods such as liquid chromatography coupled online to mass spectrometry makes it possible to address epilipidome diversity in complex biological samples. However, the main bottleneck of redox lipidomics, the branch of lipidomics dealing with the characterization of oxidized lipids, remains the lack of optimal computational tools for robust, accurate and specific identification of already discovered and yet unknown modified lipids. Here we discuss the main principles of high-throughput identification of lipids and their modified forms and review the main software tools currently available in redox lipidomics. Different levels of confidence for software-assisted identification of the redox lipidome are defined, and necessary steps toward optimal computational solutions are proposed.
Affiliation(s)
- Zhixu Ni
- Institute of Bioanalytical Chemistry, Faculty of Chemistry and Mineralogy, University of Leipzig, Germany; Center for Biotechnology and Biomedicine, University of Leipzig, Deutscher Platz 5, Leipzig, Germany
- Laura Goracci
- Department of Chemistry, Biology and Biotechnology, University of Perugia, via Elce di Sotto 8, 06123 Perugia, Italy; Consortium for Computational Molecular and Materials Sciences (CMS), via Elce di Sotto 8, 06123 Perugia, Italy
- Gabriele Cruciani
- Department of Chemistry, Biology and Biotechnology, University of Perugia, via Elce di Sotto 8, 06123 Perugia, Italy; Consortium for Computational Molecular and Materials Sciences (CMS), via Elce di Sotto 8, 06123 Perugia, Italy
- Maria Fedorova
- Institute of Bioanalytical Chemistry, Faculty of Chemistry and Mineralogy, University of Leipzig, Germany; Center for Biotechnology and Biomedicine, University of Leipzig, Deutscher Platz 5, Leipzig, Germany
22
Moumbock AFA, Ntie-Kang F, Akone SH, Li J, Gao M, Telukunta KK, Günther S. An overview of tools, software, and methods for natural product fragment and mass spectral analysis. PHYSICAL SCIENCES REVIEWS 2019. [DOI: 10.1515/psr-2018-0126] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/13/2023]
Abstract
One major challenge in natural product (NP) discovery is determining the chemical structure of unknown metabolites using automated software tools from GC–mass spectrometry (MS) or liquid chromatography–MS/MS data alone. This chapter reviews the existing spectral libraries and predictive computational tools used in MS-based untargeted metabolomics, which is currently a hot topic in NP structure elucidation. We begin by focusing on spectral databases and the general workflow of MS annotation. We then describe software and tools used in MS, particularly those used to predict fragmentation patterns, mass spectral classifiers, and tools for fragmentation tree analysis. We conclude the chapter by looking at more advanced approaches implemented in tools for competitive fragmentation modeling and quantum chemical approaches.
23
Colby SM, Thomas DG, Nuñez JR, Baxter DJ, Glaesemann KR, Brown JM, Pirrung MA, Govind N, Teeguarden JG, Metz TO, Renslow RS. ISiCLE: A Quantum Chemistry Pipeline for Establishing in Silico Collision Cross Section Libraries. Anal Chem 2019; 91:4346-4356. [PMID: 30741529 PMCID: PMC6526953 DOI: 10.1021/acs.analchem.8b04567] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Indexed: 12/11/2022]
Abstract
High-throughput, comprehensive, and confident identifications of metabolites and other chemicals in biological and environmental samples will revolutionize our understanding of the role these chemically diverse molecules play in biological systems. Despite recent technological advances, metabolomics studies still result in the detection of a disproportionate number of features that cannot be confidently assigned to a chemical structure. This inadequacy is driven by the single most significant limitation in metabolomics, the reliance on reference libraries constructed by analysis of authentic reference materials with limited commercial availability. To this end, we have developed the in silico chemical library engine (ISiCLE), a high-performance computing-friendly cheminformatics workflow for generating libraries of chemical properties. In the instantiation described here, we predict probable three-dimensional molecular conformers (i.e., conformational isomers) using chemical identifiers as input, from which collision cross sections (CCS) are derived. The approach employs first-principles simulation, distinguished by the use of molecular dynamics, quantum chemistry, and ion mobility calculations, to generate structures and chemical property libraries, all without training data. Importantly, optimization of ISiCLE included a refactoring of the popular MOBCAL code for trajectory-based mobility calculations, improving its computational efficiency by over 2 orders of magnitude. Calculated CCS values were validated against 1983 experimentally measured CCS values and compared to previously reported CCS calculation approaches. Average calculated CCS error for the validation set is 3.2% using standard parameters, outperforming other density functional theory (DFT)-based methods and machine learning methods (e.g., MetCCS). An online database is introduced for sharing both calculated and experimental CCS values (metabolomics.pnnl.gov), initially including a CCS library with over 1 million entries. Finally, three successful applications of molecule characterization using calculated CCS are described, including providing evidence for the presence of an environmental degradation product, the separation of molecular isomers, and an initial characterization of complex blinded mixtures of exposure chemicals. This work represents a method to address the limitations of small molecule identification and offers an alternative to generating chemical identification libraries experimentally by analyzing authentic reference materials. All code is available at github.com/pnnl.
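The 3.2% figure above is an average CCS prediction error. Assuming it is a mean absolute percentage error relative to the experimental values (the abstract does not give the exact definition), it can be computed as in this minimal Python sketch; the function name and the CCS values in Å² are hypothetical.

```python
def mean_abs_percent_error(calculated, experimental):
    """Mean absolute percentage error of calculated vs. experimental CCS values."""
    if not calculated or len(calculated) != len(experimental):
        raise ValueError("need two non-empty lists of equal length")
    if any(e <= 0 for e in experimental):
        raise ValueError("experimental CCS values must be positive")
    # Relative error of each prediction, averaged and expressed as a percentage.
    total = sum(abs(c - e) / e for c, e in zip(calculated, experimental))
    return 100.0 * total / len(calculated)

# Hypothetical calculated and experimental CCS values in square angstroms.
calc = [150.0, 200.0, 181.0]
expt = [145.0, 210.0, 180.0]
print(f"{mean_abs_percent_error(calc, expt):.2f}%")
```

Using the absolute value keeps over- and under-predictions from cancelling; a signed mean error would instead reveal systematic bias.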
Affiliation(s)
- Sean M. Colby
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Dennis G. Thomas
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Jamie R. Nuñez
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Douglas J. Baxter
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Kurt R. Glaesemann
- Communications and Information Technology Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Joseph M. Brown
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Meg A. Pirrung
- National Security Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Niranjan Govind
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Justin G. Teeguarden
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon 97331, United States
- Thomas O. Metz
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Ryan S. Renslow
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
24
Ivanova BB, Spiteller M. On the [2+2] cycloaddition reaction of configurationally locked polyenes – An experimental and theoretical study. J Mol Struct 2018. [DOI: 10.1016/j.molstruc.2018.05.064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
25
Pereira F, Aires-de-Sousa J. Computational Methodologies in the Exploration of Marine Natural Product Leads. Mar Drugs 2018; 16:md16070236. [PMID: 30011882 PMCID: PMC6070892 DOI: 10.3390/md16070236] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Received: 06/14/2018] [Revised: 07/02/2018] [Accepted: 07/06/2018] [Indexed: 12/18/2022]
Abstract
Computational methodologies are assisting the exploration of marine natural products (MNPs) to make the discovery of new leads more efficient, to repurpose known MNPs, to target new metabolites on the basis of genome analysis, to reveal mechanisms of action, and to optimize leads. In silico efforts in drug discovery of NPs have mainly focused on two tasks: dereplication and prediction of bioactivities. The exploration of new chemical spaces and the application of predicted spectral data must be included in new approaches to select species, extracts, and growth conditions with maximum probabilities of medicinal chemistry novelty. In this review, the most relevant current computational dereplication methodologies are highlighted. Structure-based (SB) and ligand-based (LB) chemoinformatics approaches have become essential tools for the virtual screening of NPs, either in small datasets of isolated compounds or in large-scale databases. The most common LB techniques include Quantitative Structure–Activity Relationships (QSAR), estimation of drug likeness, prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties, similarity searching, and pharmacophore identification. Analogously, molecular dynamics, docking and binding cavity analysis have been used in SB approaches. Their significance and achievements are the main focus of this review.
Affiliation(s)
- Florbela Pereira
- LAQV and REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal.
- Joao Aires-de-Sousa
- LAQV and REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal.
26
Giles C, Takechi R, Lam V, Dhaliwal SS, Mamo JCL. Contemporary lipidomic analytics: opportunities and pitfalls. Prog Lipid Res 2018; 71:86-100. [PMID: 29959947 DOI: 10.1016/j.plipres.2018.06.003] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 12/15/2017] [Revised: 05/18/2018] [Accepted: 06/26/2018] [Indexed: 01/08/2023]
Abstract
Recent advances in analytical techniques have greatly enhanced the depth of coverage; however, lipidomic studies are still restricted to analysing only a subset of known lipids. Numerous complementary techniques are used for investigation of cellular lipidomes, including mass spectrometry (MS), nuclear magnetic resonance and vibrational spectroscopy. The development of electrospray ionization (ESI) MS has accelerated lipidomics research in the past two decades, and ESI-MS is now one of the most widely used techniques. The versatility of ESI-MS systems allows development of methods to detect and quantify a large diversity of lipid species and classes. However, highly targeted and specific approaches can preclude global analysis of many lipid classes. Indeed, experimental procedures are generally optimised for the lipid species or lipid class of interest. Therefore, careful consideration of experimental procedures is required for characterisation of biological lipidomes. The current review will describe lipidomic approaches for investigating tissue lipid physiology. The main steps in a lipidomics workflow will be discussed, including sample preparation, accurate quantitation of lipid species and statistical modelling.
Affiliation(s)
- Corey Giles
- Curtin Health Innovation Research Institute, Curtin University, WA, Australia; School of Public Health, Faculty of Health Sciences, Curtin University, WA, Australia
- Ryusuke Takechi
- Curtin Health Innovation Research Institute, Curtin University, WA, Australia; School of Public Health, Faculty of Health Sciences, Curtin University, WA, Australia
- Virginie Lam
- Curtin Health Innovation Research Institute, Curtin University, WA, Australia; School of Public Health, Faculty of Health Sciences, Curtin University, WA, Australia
- Satvinder S Dhaliwal
- Curtin Health Innovation Research Institute, Curtin University, WA, Australia; School of Public Health, Faculty of Health Sciences, Curtin University, WA, Australia
- John C L Mamo
- Curtin Health Innovation Research Institute, Curtin University, WA, Australia; School of Public Health, Faculty of Health Sciences, Curtin University, WA, Australia
27
Godzien J, Gil de la Fuente A, Otero A, Barbas C. Metabolite Annotation and Identification. Comprehensive Analytical Chemistry 2018. [DOI: 10.1016/bs.coac.2018.07.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 12/19/2022]
28
Organosilver(I) and organozinc(II) catalysed synthesis of quaterphenyls – Experimental and theoretical treatment. J Organomet Chem 2017. [DOI: 10.1016/j.jorganchem.2017.09.035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/17/2023]
29
Hufsky F, Böcker S. Mining molecular structure databases: Identification of small molecules based on fragmentation mass spectrometry data. Mass Spectrometry Reviews 2017; 36:624-633. [PMID: 26763615 DOI: 10.1002/mas.21489] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Received: 04/21/2015] [Accepted: 12/18/2015] [Indexed: 06/05/2023]
Abstract
Mass spectrometry (MS) is a key technology for the analysis of small molecules. For the identification and structural elucidation of novel molecules, new approaches beyond straightforward spectral comparison are required. In this review, we will cover computational methods that help with the identification of small molecules by analyzing fragmentation MS data. We focus on the four main approaches to mining a database of metabolite structures, namely rule-based fragmentation spectrum prediction, combinatorial fragmentation, competitive fragmentation modeling, and molecular fingerprint prediction. © 2016 Wiley Periodicals, Inc. Mass Spec Rev 36:624-633, 2017.
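Of the four approaches listed in this abstract, molecular fingerprint prediction is the simplest to illustrate. A fingerprint is predicted from the MS/MS spectrum and compared with the fingerprints of candidate structures from a database. The Python sketch below makes two assumptions:

- Fingerprints are represented as sets of "on" bit indices.
- Candidates are ranked by plain Tanimoto similarity. Published tools use more elaborate, probability-aware scoring.

The candidate names and bit sets are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_candidates(predicted: set[int], candidates: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Rank database candidates by similarity to the fingerprint predicted from a spectrum."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    # Highest similarity first; ties broken alphabetically for reproducibility.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical predicted fingerprint and candidate structures.
predicted_fp = {1, 4, 7, 9}
database = {
    "candidate_A": {1, 4, 7, 9, 12},
    "candidate_B": {1, 2, 3},
    "candidate_C": {4, 7, 9},
}
for name, score in rank_candidates(predicted_fp, database):
    print(name, round(score, 3))
```

Here candidate_A ranks first (4 shared bits out of 5 in the union), ahead of candidate_C (3 of 4) and candidate_B (1 of 6).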
Affiliation(s)
- Franziska Hufsky
- Lehrstuhl für Bioinformatik, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, Jena, 07743, Germany
- Bioinformatik für Hochdurchsatzverfahren, Friedrich-Schiller-Universität Jena, Leutragraben 1, Jena, 07743, Germany
- Sebastian Böcker
- Lehrstuhl für Bioinformatik, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, Jena, 07743, Germany
30
Edmands WMB, Petrick L, Barupal DK, Scalbert A, Wilson MJ, Wickliffe JK, Rappaport SM. compMS2Miner: An Automatable Metabolite Identification, Visualization, and Data-Sharing R Package for High-Resolution LC-MS Data Sets. Anal Chem 2017; 89:3919-3928. [PMID: 28225587 PMCID: PMC6338221 DOI: 10.1021/acs.analchem.6b02394] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 02/07/2023]
Abstract
A long-standing challenge of untargeted metabolomic profiling by ultrahigh-performance liquid chromatography-high-resolution mass spectrometry (UHPLC-HRMS) is the efficient transition from unknown mass spectral features to confident metabolite annotations. The compMS2Miner (Comprehensive MS2 Miner) package was developed in the R language to facilitate rapid, comprehensive feature annotation using peak-picker output and MS2 data files as inputs. The number of MS2 spectra that can be collected during a metabolomic profiling experiment far exceeds what can be interpreted by painstaking manual inspection in a practical amount of time; therefore, a degree of software workflow autonomy is required for broad-scale metabolite annotation. CompMS2Miner integrates many useful tools in a single workflow for metabolite annotation and also provides a means to overview the MS2 data with a web application GUI, compMS2Explorer (Comprehensive MS2 Explorer), that also facilitates data-sharing and transparency. The automatable compMS2Miner workflow consists of the following steps: (i) matching unknown MS1 features to precursor MS2 scans, (ii) filtration of spectral noise (dynamic noise filter), (iii) generation of composite mass spectra by multiple similar spectrum signal summation and redundant/contaminant spectra removal, (iv) interpretation of possible fragment ion substructure using an internal database, (v) annotation of unknowns with chemical and spectral databases with prediction of mammalian biotransformation metabolites, wrapper functions for in silico fragmentation software, nearest neighbor chemical similarity scoring, random forest based retention time prediction, text-mining based false positive removal/true positive ranking, chemical taxonomic prediction and differential evolution based global annotation score optimization, and (vi) network graph visualizations, data curation, and sharing, which are made possible via the compMS2Explorer application.
Metabolite identities and comments can also be recorded using an interactive table within compMS2Explorer. The utility of the package is illustrated with a data set of blood serum samples from 7 diet-induced obese (DIO) and 7 nonobese (NO) C57BL/6J mice, which were also treated with an antibiotic (streptomycin) to knock down the gut microbiota. The results of fully autonomous and objective usage of compMS2Miner are presented here. All automatically annotated spectra output by the workflow are provided in the Supporting Information and can alternatively be explored as publicly available compMS2Explorer applications for both positive and negative modes (https://wmbedmands.shinyapps.io/compMS2_mouseSera_POS and https://wmbedmands.shinyapps.io/compMS2_mouseSera_NEG). The workflow provided rapid annotation of a diversity of endogenous and gut microbially derived metabolites affected by both diet and antibiotic treatment, which conformed to previously published reports. Composite spectra (n = 173) were autonomously matched to entries of the MassBank of North America (MoNA) spectral repository. These experimental and virtual (LipidBlast) spectra corresponded to 29 common endogenous compound classes (e.g., 51 lysophosphatidylcholine spectra) and were then used to calculate the ranking capability of 7 individual scoring metrics. The average of the 7 individual scoring metrics gave the best ranking performance, with a weighted average rank of 3 for the MoNA-matched spectra, despite the potential risk of false-positive annotations arising from automation. In several cases, minor structural differences, such as the relative positions of carbon-carbon double bonds, affected the rank of the correct MoNA-annotated metabolite.
The latest release and an example workflow are available in the package vignette (https://github.com/WMBEdmands/compMS2Miner), and a version of the published application is available on the shinyapps.io site (https://wmbedmands.shinyapps.io/compMS2Example).
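The abstract reports that averaging the individual scoring metrics gave the best ranking. A small consensus-ranking sketch can illustrate the idea. It is written in Python rather than the package's R code, and it rests on these assumptions:

- Each candidate's metric scores are already scaled to [0, 1], with higher meaning better.
- The scores are combined by an unweighted mean.

compMS2Miner's actual metrics and normalization may differ. The candidate names and scores are hypothetical.

```python
from statistics import mean


def consensus_ranking(scores: dict[str, list[float]]) -> list[str]:
    """Order candidates by the mean of their individual metric scores (higher is better)."""
    return sorted(scores, key=lambda name: mean(scores[name]), reverse=True)


def rank_of(true_name: str, scores: dict[str, list[float]]) -> int:
    """1-based rank of the correct annotation in the consensus ordering."""
    return consensus_ranking(scores).index(true_name) + 1


# Hypothetical candidates, each with three metric scores already scaled to [0, 1].
candidate_scores = {
    "LPC 16:0": [0.9, 0.7, 0.8],
    "LPC 18:1": [0.6, 0.9, 0.7],
    "PC 34:1": [0.4, 0.5, 0.3],
}
print(consensus_ranking(candidate_scores))
print(rank_of("LPC 18:1", candidate_scores))
```

Averaging the rank of the known-correct annotation over many spectra with `rank_of` gives an average-rank figure of the kind the abstract reports.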
Affiliation(s)
- William M. B. Edmands
- Rappaport Lab, UC Berkeley, School of Public Health, GL81 Koshland Hall, Berkeley, California 94720, United States
- Lauren Petrick
- Rappaport Lab, UC Berkeley, School of Public Health, GL81 Koshland Hall, Berkeley, California 94720, United States
- Dinesh K. Barupal
- Metabolomics FiehnLab, NIH West-Coast Metabolomics Center (WCMC), University of California Davis, Davis, California 95616, United States
- Augustin Scalbert
- International Agency for Research on Cancer (IARC), Nutrition and Metabolism Section (NME), Biomarkers Group (BMA), 150 Cours Albert Thomas, F-69372 Lyon Cedex 08, France
- Mark J. Wilson
- Department of Global Environmental Health Sciences, Tulane University, 1440 Canal Street, Suite 2100 No. 8360, New Orleans, Louisiana 70112, United States
- Jeffrey K. Wickliffe
- Department of Global Environmental Health Sciences, Tulane University, 1440 Canal Street, Suite 2100 No. 8360, New Orleans, Louisiana 70112, United States
- Stephen M. Rappaport
- Rappaport Lab, UC Berkeley, School of Public Health, GL81 Koshland Hall, Berkeley, California 94720, United States
31
Metz TO, Baker ES, Schymanski EL, Renslow RS, Thomas DG, Causon TJ, Webb IK, Hann S, Smith RD, Teeguarden JG. Integrating ion mobility spectrometry into mass spectrometry-based exposome measurements: what can it add and how far can it go? Bioanalysis 2017; 9:81-98. [PMID: 27921453 PMCID: PMC5674211 DOI: 10.4155/bio-2016-0244] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 09/16/2016] [Accepted: 10/12/2016] [Indexed: 01/01/2023]
Abstract
Measuring the exposome remains a challenge due to the range and number of anthropogenic molecules that are encountered in our daily lives, as well as the complex systemic responses to these exposures. One option for improving the coverage, dynamic range and throughput of measurements is to incorporate ion mobility spectrometry (IMS) into current MS-based analytical methods. The implementation of IMS in exposomics studies will lead to more frequent observations of previously undetected chemicals and metabolites. LC-IMS-MS will provide increased overall measurement dynamic range, enabling detection of lower-abundance molecules. Alternatively, the throughput of IMS-MS alone will provide the opportunity to analyze many thousands of longitudinal samples over lifetimes of exposure, capturing evidence of transitory accumulations of chemicals or metabolites. The volume of data corresponding to these new chemical observations will almost certainly outpace the generation of reference data needed for their confident identification. In this perspective, we briefly review the state-of-the-art in measuring the exposome, and discuss the potential use of IMS-MS and the physico-chemical property of collisional cross section in both exposure assessment and molecular identification.
Affiliation(s)
- Thomas O Metz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Erin S Baker
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Emma L Schymanski
- Eawag, Swiss Federal Institute of Aquatic Science & Technology, Dübendorf, Switzerland
- Ryan S Renslow
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Dennis G Thomas
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Tim J Causon
- Division of Analytical Chemistry, Department of Chemistry, University of Natural Resources & Life Sciences (BOKU Vienna), Vienna, Austria
- Ian K Webb
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Stephan Hann
- Division of Analytical Chemistry, Department of Chemistry, University of Natural Resources & Life Sciences (BOKU Vienna), Vienna, Austria
- Richard D Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Justin G Teeguarden
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Department of Environmental & Molecular Toxicology, Oregon State University, Corvallis, OR, USA
32
Ivanova B, Spiteller M. Collision-induced thermochemistry of reactions of dissociation of glycyl-homopeptides – An experimental and theoretical analysis. Biopolymers 2016; 107:80-89. [DOI: 10.1002/bip.22996] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/26/2016] [Revised: 09/24/2016] [Accepted: 09/30/2016] [Indexed: 02/04/2023]
Affiliation(s)
- Bojidarka Ivanova
- Lehrstuhl für Analytische Chemie, Institut für Umweltforschung, Fakultät für Chemie und Chemische Biologie, Universität Dortmund, Otto-Hahn-Straße 6, 44221 Dortmund, North Rhine-Westphalia, Germany
- Michael Spiteller
- Lehrstuhl für Analytische Chemie, Institut für Umweltforschung, Fakultät für Chemie und Chemische Biologie, Universität Dortmund, Otto-Hahn-Straße 6, 44221 Dortmund, North Rhine-Westphalia, Germany
33
Allen F, Pon A, Greiner R, Wishart D. Computational Prediction of Electron Ionization Mass Spectra to Assist in GC/MS Compound Identification. Anal Chem 2016; 88:7689-97. [PMID: 27381172 DOI: 10.1021/acs.analchem.6b01622] [Citation(s) in RCA: 104] [Impact Index Per Article: 11.6] [Indexed: 11/29/2022]
Abstract
We describe a tool, competitive fragmentation modeling for electron ionization (CFM-EI) that, given a chemical structure (e.g., in SMILES or InChI format), computationally predicts an electron ionization mass spectrum (EI-MS) (i.e., the type of mass spectrum commonly generated by gas chromatography mass spectrometry). The predicted spectra produced by this tool can be used for putative compound identification, complementing measured spectra in reference databases by expanding the range of compounds able to be considered when availability of measured spectra is limited. The tool extends CFM-ESI, a recently developed method for computational prediction of electrospray tandem mass spectra (ESI-MS/MS), but unlike CFM-ESI, CFM-EI can handle odd-electron ions and isotopes and incorporates an artificial neural network. Tests on EI-MS data from the NIST database demonstrate that CFM-EI is able to model fragmentation likelihoods in low-resolution EI-MS data, producing predicted spectra whose dot product scores are significantly better than full enumeration "bar-code" spectra. CFM-EI also outperformed previously reported results for MetFrag, MOLGEN-MS, and Mass Frontier on one compound identification task. It also outperformed MetFrag in a range of other compound identification tasks involving a much larger data set, containing both derivatized and nonderivatized compounds. While replicate EI-MS measurements of chemical standards are still a more accurate point of comparison, CFM-EI's predictions provide a much-needed alternative when no reference standard is available for measurement. CFM-EI is available at https://sourceforge.net/projects/cfm-id/ for download and http://cfmid.wishartlab.com as a web service.
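The abstract evaluates predicted spectra by their dot product scores against measured spectra. The sketch below shows one common form of that score. It assumes peaks are summed into nominal (unit) m/z bins, which is reasonable for low-resolution EI-MS, and compared by the cosine of raw intensity vectors. The exact binning and intensity or m/z weighting used in the CFM-EI evaluation may differ. The example peak lists are hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks: list[tuple[float, float]]) -> dict[int, float]:
    """Sum intensities into nominal (unit) m/z bins."""
    binned: dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz)] += intensity
    return dict(binned)


def dot_product_score(a: list[tuple[float, float]], b: list[tuple[float, float]]) -> float:
    """Normalized dot product (cosine similarity) of two peak lists; 1.0 means identical shape."""
    va, vb = bin_spectrum(a), bin_spectrum(b)
    # Only m/z bins present in both spectra contribute to the numerator.
    numerator = sum(va[m] * vb[m] for m in va.keys() & vb.keys())
    norm = math.sqrt(sum(x * x for x in va.values())) * math.sqrt(sum(x * x for x in vb.values()))
    return numerator / norm if norm else 0.0


# Hypothetical measured and predicted EI spectra as (m/z, intensity) pairs.
measured = [(41.0, 30.0), (43.1, 100.0), (58.0, 20.0)]
predicted = [(43.0, 90.0), (58.0, 40.0), (71.0, 10.0)]
print(round(dot_product_score(measured, predicted), 4))
```

Normalizing by both vector lengths makes the score independent of absolute intensity scale, so a prediction is judged only on relative peak heights.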
Affiliation(s)
- Felicity Allen
- Department of Computing Science, University of Alberta, Edmonton T6G 2E8, Canada
- Allison Pon
- Department of Computing Science, University of Alberta, Edmonton T6G 2E8, Canada
- Russ Greiner
- Department of Computing Science, University of Alberta, Edmonton T6G 2E8, Canada
- David Wishart
- Department of Computing Science, University of Alberta, Edmonton T6G 2E8, Canada
34
Yi L, Dong N, Yun Y, Deng B, Ren D, Liu S, Liang Y. Chemometric methods in data processing of mass spectrometry-based metabolomics: A review. Anal Chim Acta 2016; 914:17-34. [PMID: 26965324 DOI: 10.1016/j.aca.2016.02.001] [Citation(s) in RCA: 173] [Impact Index Per Article: 19.2] [Received: 08/25/2015] [Revised: 01/28/2016] [Accepted: 02/01/2016] [Indexed: 01/03/2023]
Abstract
This review focuses on recent and potential advances in chemometric methods for data processing in metabolomics, especially for data generated by mass spectrometric techniques. Metabolomics is increasingly regarded as a valuable and promising biotechnology rather than an ambitious advancement. Herein, we outline significant developments in metabolomics, especially in combination with modern chemical analysis techniques and dedicated statistical and chemometric data-analysis strategies. Advanced techniques for preprocessing raw data, identifying metabolites, selecting variables, and modeling are illustrated. We believe that insights from these developments will help narrow the gap between the original dataset and current biological knowledge. We also discuss the limitations and perspectives of extracting information from high-throughput datasets.
Affiliation(s)
- Lunzhao Yi
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming, 650500, China
- Naiping Dong
- Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, 999077, China
- Yonghuan Yun
- College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, China
- Baichuan Deng
- College of Animal Science, South China Agricultural University, Guangzhou, 510642, China
- Dabing Ren
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming, 650500, China
- Shao Liu
- Xiangya Hospital, Central South University, Changsha, 410008, China
- Yizeng Liang
- College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, China
35
Sumner LW, Lei Z, Nikolau BJ, Saito K. Modern plant metabolomics: advanced natural product gene discoveries, improved technologies, and future prospects. Nat Prod Rep 2015; 32:212-29. [PMID: 25342293 DOI: 10.1039/c4np00072b] [Citation(s) in RCA: 149] [Impact Index Per Article: 14.9] [Indexed: 12/22/2022]
Abstract
Plant metabolomics has matured and modern plant metabolomics has accelerated gene discoveries and the elucidation of a variety of plant natural product biosynthetic pathways. This review covers the approximate period of 2000 to 2014, and highlights specific examples of the discovery and characterization of novel genes and enzymes associated with the biosynthesis of natural products such as flavonoids, glucosinolates, terpenoids, and alkaloids. Additional examples of the integration of metabolomics with genome-based functional characterizations of plant natural products that are important to modern pharmaceutical technology are also reviewed. This article also provides a substantial review of recent technical advances in mass spectrometry imaging, nuclear magnetic resonance imaging, integrated LC-MS-SPE-NMR for metabolite identifications, and X-ray crystallography of microgram quantities for structural determinations. The review closes with a discussion on the future prospects of metabolomics related to crop species and herbal medicine.
Affiliation(s)
- Lloyd W Sumner
- The Samuel Roberts Noble Foundation, Plant Biology Division, 2510 Sam Noble Parkway, Ardmore, OK, USA.
36
Gaudêncio SP, Pereira F. Dereplication: racing to speed up the natural products discovery process. Nat Prod Rep 2015; 32:779-810. [PMID: 25850681 DOI: 10.1039/c4np00134f] [Citation(s) in RCA: 177] [Impact Index Per Article: 17.7] [Indexed: 12/23/2022]
Abstract
Covering: 1993 to July 2014. To alleviate the dereplication holdup, a major bottleneck in natural products discovery, scientists have been adding tools to their "bag of tricks" with the aim of making the drug discovery process faster, more accurate and more efficient. Consequently, dereplication has become a hot topic, with a publication boom since 2012, blending multidisciplinary fields in new ways that provide important conceptual and/or methodological advances and open up pioneering research prospects in this field.
Affiliation(s)
- Susana P Gaudêncio
- LAQV, REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal.
37
Vaniya A, Fiehn O. Using fragmentation trees and mass spectral trees for identifying unknown compounds in metabolomics. Trends Analyt Chem 2015; 69:52-61. [PMID: 26213431 PMCID: PMC4509603 DOI: 10.1016/j.trac.2015.04.002] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Indexed: 01/18/2023]
Abstract
Identification of unknown metabolites is the bottleneck in advancing metabolomics, leaving interpretation of metabolomics results ambiguous. The chemical diversity of metabolism is vast, making structure identification arduous and time consuming. Currently, comprehensive analysis of mass spectra in metabolomics is limited to library matching, but tandem mass spectral libraries are small compared to the large number of compounds found in the biosphere, including xenobiotics. Resolving this bottleneck requires richer data acquisition and better computational tools. Multi-stage mass spectrometry (MSn) trees show promise in this regard. Fragmentation trees explore the fragmentation process, generate fragmentation rules and aid in sub-structure identification, while mass spectral trees delineate the dependencies in multi-stage MS of collision-induced dissociations. This review covers advances over the past 10 years in the use of these trees as tools for metabolite identification, including the algorithms, software and databases used to build and implement fragmentation trees and mass spectral annotations.
Affiliation(s)
- Arpana Vaniya
- University of California Davis, Department of Chemistry, One Shields Avenue, Davis, CA 95616, USA
- University of California Davis, West Coast Metabolomics Center, Genome Center, 451 Health Sciences Drive, Davis, CA 95616, USA
- Oliver Fiehn
- University of California Davis, West Coast Metabolomics Center, Genome Center, 451 Health Sciences Drive, Davis, CA 95616, USA
- King Abdulaziz University, Biochemistry Department, Jeddah, Saudi Arabia
38
Nielsen KF, Larsen TO. The importance of mass spectrometric dereplication in fungal secondary metabolite analysis. Front Microbiol 2015; 6:71. [PMID: 25741325 PMCID: PMC4330896 DOI: 10.3389/fmicb.2015.00071] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 11/27/2014] [Accepted: 01/20/2015] [Indexed: 11/13/2022]
Abstract
Now that we have entered the genomic era, it is evident that the biosynthetic potential of filamentous fungi is much larger than was thought even a decade ago. Fungi harbor many cryptic gene clusters encoding the biosynthesis of polyketides, non-ribosomal peptides, and terpenoids, which can all undergo extensive modifications by tailoring enzymes, thus potentially providing a large array of products from a single pathway. Elucidating the full chemical profile of a fungal species is a challenging exercise, even with elemental composition provided by high-resolution mass spectrometry (HRMS) used in combination with chemical databases (e.g., AntiBase) to dereplicate known compounds. This has led to a continuous effort to improve chromatographic separation in conjunction with improvements in HRMS detection. Major improvements have also occurred with 2D chromatography, ion mobility, MS/MS and MS3, stable isotope labeling feeding experiments, classic UV/Vis, and especially automated data-mining and metabolomics software approaches, as the sheer amount of data generated is now the major challenge. This review will focus on the development and implementation of dereplication strategies and will highlight the importance of each stage of the process, from sample preparation to chromatographic separation and finally toward both manual and more targeted methods for automated dereplication of fungal natural products using state-of-the-art MS instrumentation.
Affiliation(s)
- Kristian F Nielsen
- Department of Systems Biology, Technical University of Denmark, Kongens Lyngby Denmark
- Thomas O Larsen
- Department of Systems Biology, Technical University of Denmark, Kongens Lyngby Denmark
39
Nikolskiy I, Siuzdak G, Patti GJ. Discriminating precursors of common fragments for large-scale metabolite profiling by triple quadrupole mass spectrometry. Bioinformatics 2015; 31:2017-23. [PMID: 25691443 DOI: 10.1093/bioinformatics/btv085] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 06/19/2014] [Accepted: 02/05/2015] [Indexed: 11/14/2022]
Abstract
MOTIVATION The goal of large-scale metabolite profiling is to compare the relative concentrations of as many metabolites extracted from biological samples as possible. This is typically accomplished by measuring the abundances of thousands of ions with high-resolution and high mass accuracy mass spectrometers. Although the data from these instruments provide a comprehensive fingerprint of each sample, identifying the structures of the thousands of detected ions is still challenging and time intensive. An alternative, less-comprehensive approach is to use triple quadrupole (QqQ) mass spectrometry to analyze predetermined sets of metabolites (typically fewer than several hundred). This is done using authentic standards to develop QqQ experiments that specifically detect only the targeted metabolites, with the advantage that the need for ion identification after profiling is eliminated. RESULTS Here, we propose a framework to extend the application of QqQ mass spectrometers to large-scale metabolite profiling. We aim to provide a foundation for designing QqQ multiple reaction monitoring (MRM) experiments for each of the 82 696 metabolites in the METLIN metabolite database. First, we identify common fragmentation products from the experimental fragmentation data in METLIN. Then, we model the likelihoods of each precursor structure in METLIN producing each common fragmentation product. With these likelihood estimates, we select ensembles of common fragmentation products that minimize our uncertainty about metabolite identities. We demonstrate encouraging performance and, based on our results, we suggest how our method can be integrated with future work to develop large-scale MRM experiments. AVAILABILITY AND IMPLEMENTATION Our predictions, Supplementary results, and the code for estimating likelihoods and selecting ensembles of fragmentation reactions are made available on the lab website at http://pattilab.wustl.edu/FragPred.
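The selection step in this abstract chooses ensembles of common fragmentation products that minimize uncertainty about metabolite identities. A greedy, entropy-based sketch can illustrate it. This is not the authors' published implementation; it rests on these assumptions:

- The prior over candidate precursors is uniform.
- Each fragment is an independent detected/not-detected event with a per-precursor likelihood.
- Fragments are added one at a time, each chosen to minimize the expected posterior entropy.

The likelihood matrix is hypothetical. Enumerating all outcomes is exponential in the number of chosen fragments, so this only suits small ensembles.

```python
import itertools
import math


def expected_entropy(likelihoods: list[list[float]], chosen: list[int]) -> float:
    """Expected posterior entropy (bits) over precursors after observing the chosen fragments.

    likelihoods[i][f] is the assumed probability that precursor i yields fragment f.
    """
    n = len(likelihoods)
    total = 0.0
    for outcome in itertools.product((0, 1), repeat=len(chosen)):
        # Joint probability P(precursor i, outcome) under a uniform prior 1/n.
        joint = []
        for row in likelihoods:
            p = 1.0
            for f, seen in zip(chosen, outcome):
                p *= row[f] if seen else 1.0 - row[f]
            joint.append(p / n)
        p_outcome = sum(joint)
        if p_outcome == 0.0:
            continue
        entropy = -sum((q / p_outcome) * math.log2(q / p_outcome) for q in joint if q > 0)
        total += p_outcome * entropy
    return total


def greedy_select(likelihoods: list[list[float]], k: int) -> list[int]:
    """Greedily pick up to k fragment indices that most reduce expected uncertainty."""
    n_fragments = len(likelihoods[0])
    chosen: list[int] = []
    for _ in range(min(k, n_fragments)):
        remaining = [f for f in range(n_fragments) if f not in chosen]
        best = min(remaining, key=lambda f: expected_entropy(likelihoods, chosen + [f]))
        chosen.append(best)
    return chosen


# Hypothetical likelihoods: 4 precursors (rows) x 3 common fragments (columns).
L = [
    [0.9, 0.9, 0.1],
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.9],
    [0.1, 0.1, 0.9],
]
print(greedy_select(L, 2))
```

In this toy matrix, fragments 0 and 2 split the precursors the same way, so choosing both adds little. The greedy step should instead pair one of them with fragment 1, which separates the precursors within each pair.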
Affiliation(s)
- Igor Nikolskiy
- Department of Genetics, Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA, Scripps Center for Metabolomics and Mass Spectrometry, Departments of Chemistry, Molecular and Computational Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, USA and Department of Chemistry, Washington University, St. Louis, MO 63130, USA
- Gary Siuzdak
- Department of Genetics, Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA, Scripps Center for Metabolomics and Mass Spectrometry, Departments of Chemistry, Molecular and Computational Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, USA and Department of Chemistry, Washington University, St. Louis, MO 63130, USA
- Gary J Patti
- Department of Genetics, Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA, Scripps Center for Metabolomics and Mass Spectrometry, Departments of Chemistry, Molecular and Computational Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, USA and Department of Chemistry, Washington University, St. Louis, MO 63130, USA
40
Miller JH, Schrom BT, Kangas LJ. Artificial neural network for charge prediction in metabolite identification by mass spectrometry. Methods Mol Biol 2015; 1260:89-100. [PMID: 25502377 DOI: 10.1007/978-1-4939-2239-0_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/12/2022]
Abstract
Collision-induced dissociation (CID) is widely used in mass spectrometry to identify biologically important molecules by revealing information about their internal structure. Interpretation of experimental CID spectra always involves comparison with some form of in silico spectra of candidate molecules. Knowledge of how charge is distributed among fragments is an important part of CID simulations that generate in silico spectra from the chemical structure of the precursor ions entering the collision chamber. In this chapter we describe a method to obtain this knowledge by machine learning.
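To make the charge-prediction task concrete, here is a deliberately minimal Python sketch. It uses a single logistic neuron, the smallest possible neural network, to predict which of two complementary fragments retains the charge. This is an illustration only, not the chapter's network architecture or feature set. The descriptor (a difference between the two fragments), the synthetic training data, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_charge_neuron(features, labels, lr=0.5, epochs=500):
    """Fit one logistic neuron by batch gradient descent on cross-entropy loss.

    features[k]: hypothetical descriptor differences (fragment A minus fragment B)
    labels[k]:   1 if fragment A retained the charge, else 0
    """
    n_feat = len(features[0])
    m = len(features)
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(features, labels):
            # Gradient of cross-entropy w.r.t. the pre-activation is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def charge_probability(w, b, x):
    """Predicted probability that fragment A carries the charge."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy, synthetic training data: one descriptor difference per fragment pair.
X = [[-2.0], [-1.0], [1.0], [2.0]]
Y = [0, 0, 1, 1]
w, b = train_charge_neuron(X, Y)
print(round(charge_probability(w, b, [1.5]), 3))
```

A real charge-distribution model would use chemically meaningful descriptors and a hidden layer. The training loop would keep the same shape.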
Affiliation(s)
- J H Miller
- Washington State University Tri-Cities, Richland, WA 99354, USA
41
Yi L, Dong N, Yun Y, Deng B, Liu S, Zhang Y, Liang Y. WITHDRAWN: Recent advances in chemometric methods for plant metabolomics: A review. Biotechnol Adv 2014:S0734-9750(14)00183-9. [PMID: 25461504 DOI: 10.1016/j.biotechadv.2014.11.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/15/2014] [Revised: 11/17/2014] [Accepted: 11/18/2014] [Indexed: 12/17/2022]
Abstract
This article has been withdrawn at the request of the author(s) and/or editor. The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at http://www.elsevier.com/locate/withdrawalpolicy.
Affiliation(s)
- Lunzhao Yi
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming 650500, China.
- Naiping Dong
- Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong 999077, Hong Kong, China
- Yonghuan Yun
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Baichuan Deng
- Department of Chemistry, University of Bergen, Bergen N-5007, Norway
- Shao Liu
- Xiangya Hospital, Central South University, Changsha 410008, China
- Yi Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yizeng Liang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
42
Taylor R, Miller RH, Miller RD, Porter M, Dalgleish J, Prince JT. Automated structural classification of lipids by machine learning. Bioinformatics 2014; 31:621-5. [DOI: 10.1093/bioinformatics/btu723] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
43
Shen H, Dührkop K, Böcker S, Rousu J. Metabolite identification through multiple kernel learning on fragmentation trees. Bioinformatics 2014; 30:i157-64. [PMID: 24931979 PMCID: PMC4058957 DOI: 10.1093/bioinformatics/btu275] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Indexed: 02/01/2023]
Abstract
Motivation: Metabolite identification from tandem mass spectrometric data is a key task in metabolomics. Various computational methods have been proposed for the identification of metabolites from tandem mass spectra. Fragmentation tree methods explore the space of possible ways in which the metabolite can fragment, and base the metabolite identification on scoring of these fragmentation trees. Machine learning methods have been used to map mass spectra to molecular fingerprints; predicted fingerprints, in turn, can be used to score candidate molecular structures. Results: Here, we combine fragmentation tree computations with kernel-based machine learning to predict molecular fingerprints and identify molecular structures. We introduce a family of kernels capturing the similarity of fragmentation trees, and combine these kernels using recently proposed multiple kernel learning approaches. Experiments on two large reference datasets show that the new methods significantly improve molecular fingerprint prediction accuracy. These improvements result in better metabolite identification, doubling the number of metabolites ranked at the top position of the candidate list. Contact: huibin.shen@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online.
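Multiple kernel learning combines several base kernel matrices, such as different fragmentation-tree similarity kernels, into one weighted kernel before fingerprint prediction. The minimal Python sketch below shows one simple alignment-based weighting scheme. Each base kernel is weighted by its centered alignment with the ideal kernel y y^T of a single fingerprint bit. This sketch does not claim to match the specific MKL variants used in the paper. It also assumes the kernel matrices are already computed (given as nested lists). The function names are hypothetical.

```python
import math


def frobenius(a, b):
    """Frobenius inner product <A, B> of two square matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def center(k):
    """Center a kernel matrix in feature space: H K H with H = I - 11^T/n."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def alignment(k1, k2):
    """Centered kernel alignment between two kernel matrices."""
    c1, c2 = center(k1), center(k2)
    denom = math.sqrt(frobenius(c1, c1) * frobenius(c2, c2))
    return frobenius(c1, c2) / denom if denom > 0 else 0.0


def combine_kernels(kernels, labels):
    """Weight base kernels by their alignment with the ideal kernel y y^T
    of a single fingerprint bit (labels in {-1, +1}); return the weights
    and the weighted-sum kernel."""
    target = [[yi * yj for yj in labels] for yi in labels]
    raw = [max(alignment(k, target), 0.0) for k in kernels]
    total = sum(raw)
    # Fall back to uniform weights if no kernel aligns positively.
    weights = [r / total for r in raw] if total > 0 else [1.0 / len(kernels)] * len(kernels)
    n = len(labels)
    combined = [[sum(wt * k[i][j] for wt, k in zip(weights, kernels))
                 for j in range(n)] for i in range(n)]
    return weights, combined
```

The combined kernel would then be passed to a kernel method (for example, one SVM per fingerprint bit). The predicted fingerprints would score candidate structures, as the abstract describes.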
Affiliation(s)
- Huibin Shen
- Department of Information and Computer Science, Aalto University, Espoo, Finland, Helsinki Institute for Information Technology, Espoo, Finland and Chair for Bioinformatics, Friedrich Schiller University Jena, Jena, Germany
- Kai Dührkop
- Department of Information and Computer Science, Aalto University, Espoo, Finland, Helsinki Institute for Information Technology, Espoo, Finland and Chair for Bioinformatics, Friedrich Schiller University Jena, Jena, Germany
- Sebastian Böcker
- Department of Information and Computer Science, Aalto University, Espoo, Finland, Helsinki Institute for Information Technology, Espoo, Finland and Chair for Bioinformatics, Friedrich Schiller University Jena, Jena, Germany
- Juho Rousu
- Department of Information and Computer Science, Aalto University, Espoo, Finland, Helsinki Institute for Information Technology, Espoo, Finland and Chair for Bioinformatics, Friedrich Schiller University Jena, Jena, Germany
44
Wang Y, Kora G, Bowen BP, Pan C. MIDAS: A Database-Searching Algorithm for Metabolite Identification in Metabolomics. Anal Chem 2014; 86:9496-503. [DOI: 10.1021/ac5014783] [Citation(s) in RCA: 82] [Impact Index Per Article: 7.5] [Indexed: 01/26/2023]
Affiliation(s)
- Yingfeng Wang
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Guruprasad Kora
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Benjamin P. Bowen
- Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Chongle Pan
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
45
Dührkop K, Hufsky F, Böcker S. Molecular Formula Identification Using Isotope Pattern Analysis and Calculation of Fragmentation Trees. Mass Spectrom (Tokyo) 2014; 3:S0037. [PMID: 26819880 DOI: 10.5702/massspectrometry.s0037] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 04/01/2014] [Accepted: 05/14/2014] [Indexed: 11/23/2022]
Abstract
We present the results of a fully automated de novo approach for identifying molecular formulas in the CASMI 2013 contest. Only results for Category 1 (molecular formula identification) were submitted. Our approach combines isotope pattern analysis and fragmentation pattern analysis and is completely independent of any spectral or structural database. We correctly identified the molecular formula for ten of the twelve challenges, making ours the best automated method competing in this category.
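Isotope pattern analysis compares the measured M, M+1, M+2 ... intensities with patterns predicted for candidate formulas. The simplified Python sketch below predicts a pattern at nominal-mass resolution by convolving per-element isotope distributions. It uses IUPAC representative natural abundances for C, H, N and O and ranks candidates by a plain sum of squared differences. That score is our assumption and is much cruder than the likelihood-based scoring in the authors' tools. The function names are hypothetical.

```python
# Relative natural abundances at nominal mass offsets +0, +1, +2
# (IUPAC representative isotopic compositions).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}


def convolve(a, b, max_len):
    """Discrete convolution of two distributions, truncated to max_len peaks."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < max_len:
                out[i + j] += x * y
    return out


def isotope_pattern(formula, n_peaks=3):
    """Relative intensities of M, M+1, ... for a formula such as
    {"C": 6, "H": 12, "O": 6}, normalized to sum to 1."""
    pattern = [1.0]
    for element, count in formula.items():
        for _ in range(count):
            pattern = convolve(pattern, ISOTOPES[element], n_peaks)
    pattern += [0.0] * (n_peaks - len(pattern))
    total = sum(pattern)
    return [p / total for p in pattern]


def pattern_distance(formula, observed):
    """Sum of squared differences to an observed pattern (lower is better)."""
    predicted = isotope_pattern(formula, len(observed))
    total = sum(observed)
    return sum((p - o / total) ** 2 for p, o in zip(predicted, observed))
```

Candidate formulas sharing a precursor mass would be sorted by `pattern_distance`. A full method would also weigh mass deviations and fragmentation-tree scores.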
Affiliation(s)
- Kai Dührkop
- Lehrstuhl für Bioinformatik, Friedrich-Schiller-Universität Jena
- Franziska Hufsky
- Lehrstuhl für Bioinformatik, Friedrich-Schiller-Universität Jena
- Sebastian Böcker
- Lehrstuhl für Bioinformatik, Friedrich-Schiller-Universität Jena
46
Hufsky F, Scheubert K, Böcker S. New kids on the block: novel informatics methods for natural product discovery. Nat Prod Rep 2014; 31:807-17. [DOI: 10.1039/c3np70101h] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Indexed: 11/21/2022]
47
47
Hufsky F, Scheubert K, Böcker S. Computational mass spectrometry for small-molecule fragmentation. Trends Analyt Chem 2014. [DOI: 10.1016/j.trac.2013.09.008] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.5] [Indexed: 11/26/2022]
48
Nikolskiy I, Mahieu NG, Chen YJ, Tautenhahn R, Patti GJ. An untargeted metabolomic workflow to improve structural characterization of metabolites. Anal Chem 2013; 85:7713-9. [PMID: 23829391 DOI: 10.1021/ac400751j] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Indexed: 12/25/2022]
Abstract
Mass spectrometry-based metabolomics relies on MS(2) data for structural characterization of metabolites. To obtain the high-quality MS(2) data necessary to support metabolite identifications, ions of interest must be isolated cleanly for fragmentation. Here, we show that metabolomic MS(2) data frequently contain contaminating ions that prevent structural identification. Although narrow isolation windows can minimize contaminating MS(2) fragments, even narrow windows are not always selective enough, and they can complicate data analysis by removing isotopic patterns from MS(2) spectra. Moreover, narrow windows can significantly reduce sensitivity. In this work, we introduce a novel, two-part approach for performing metabolomic identifications that addresses these issues. First, we collect MS(2) scans with less stringent isolation settings to improve sensitivity at the expense of specificity. Then, by evaluating MS(2) fragment intensities as a function of retention time and of the precursor mass targeted for MS(2) analysis, we obtain deconvolved MS(2) spectra that are consistent with those of pure standards and can therefore be used for metabolite identification. The value of our approach is highlighted with metabolic extracts from brain, liver, astrocytes, and nerve tissue, and performance is evaluated using pure metabolite standards in combination with simulations based on raw MS(2) data from the METLIN metabolite database. An R package implementing the algorithms used in our workflow is available on our laboratory website (http://pattilab.wustl.edu/decoms2.php).
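The core intuition is that fragments originating from a given precursor should rise and fall with that precursor over consecutive scans. The heavily simplified Python sketch below captures only that retention-time part. It keeps fragments whose intensity profiles correlate (Pearson) with the targeted precursor's MS1 profile. This is not the algorithm in the authors' R package. The abstract's method also models the targeted precursor mass, which this sketch ignores. The threshold `min_r`, the dict layout, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length intensity profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-elution information
    return sxy / math.sqrt(sxx * syy)


def deconvolve_ms2(precursor_profile, fragment_profiles, min_r=0.9):
    """Return {fragment m/z: apex intensity} for fragments whose intensity
    across consecutive MS2 scans co-varies with the precursor's MS1 intensity."""
    return {mz: max(profile) for mz, profile in fragment_profiles.items()
            if pearson(precursor_profile, profile) >= min_r}


# Toy profiles over five scans (hypothetical values).
precursor = [1.0, 5.0, 10.0, 5.0, 1.0]
fragments = {
    100.0: [2.0, 10.0, 20.0, 10.0, 2.0],  # co-elutes with the precursor
    150.0: [10.0, 8.0, 3.0, 8.0, 10.0],   # contaminant with a different profile
}
print(deconvolve_ms2(precursor, fragments))  # → {100.0: 20.0}
```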
Affiliation(s)
- Igor Nikolskiy
- Department of Chemistry, Washington University School of Medicine, St. Louis, Missouri 63108, United States
49
Kind T, Liu KH, Lee DY, DeFelice B, Meissen JK, Fiehn O. LipidBlast in silico tandem mass spectrometry database for lipid identification. Nat Methods 2013; 10:755-8. [PMID: 23817071 PMCID: PMC3731409 DOI: 10.1038/nmeth.2551] [Citation(s) in RCA: 715] [Impact Index Per Article: 59.6] [Received: 11/06/2012] [Accepted: 05/13/2013] [Indexed: 12/17/2022]
Abstract
Current tandem mass spectral libraries for lipid annotation in metabolomics are limited in size and diversity. We provide a freely available, computer-generated in silico tandem mass spectral library of 212,516 MS/MS spectra covering 119,200 compounds from 26 lipid compound classes, including phospholipids, glycerolipids, bacterial lipoglycans and plant glycolipids. Platform independence is demonstrated using tandem mass spectra from 40 different mass spectrometer types, including low-resolution and high-resolution instruments.
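An in silico library of this kind can be generated by applying class-specific fragmentation templates to many precursor masses. The minimal Python sketch below shows template-based generation. It assumes each rule is either a fixed product ion or a neutral loss from the precursor. The rule format, relative intensities, function names and example precursor m/z are hypothetical and do not reproduce LipidBlast's actual templates. The phosphocholine ion (m/z 184.0733) and trimethylamine loss (59.0735 Da) serve only as illustrative positive-mode phosphatidylcholine rules.

```python
from dataclasses import dataclass


@dataclass
class FragmentRule:
    kind: str             # "ion": fixed product m/z; "loss": neutral loss from precursor
    value: float          # product m/z (ion) or neutral mass in Da (loss)
    rel_intensity: float  # illustrative relative intensity (0-999 scale)


def simulate_spectrum(precursor_mz, rules):
    """Apply a class template to one precursor m/z; return sorted (m/z, intensity) peaks."""
    peaks = []
    for rule in rules:
        mz = rule.value if rule.kind == "ion" else precursor_mz - rule.value
        if 0 < mz <= precursor_mz:  # drop peaks that cannot arise from this precursor
            peaks.append((round(mz, 4), rule.rel_intensity))
    return sorted(peaks)


# Hypothetical phosphatidylcholine template (positive mode, [M+H]+ precursors).
PC_POSITIVE_TEMPLATE = [
    FragmentRule("ion", 184.0733, 999.0),  # phosphocholine headgroup ion
    FragmentRule("loss", 59.0735, 50.0),   # loss of trimethylamine
    FragmentRule("loss", 0.0, 100.0),      # surviving precursor ion
]

print(simulate_spectrum(760.5851, PC_POSITIVE_TEMPLATE))
```

Looping such templates over every enumerated species in a lipid class yields one spectrum per compound. Adduct-specific templates would multiply the number of spectra per compound.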
Affiliation(s)
- Tobias Kind
- Metabolics Core, UC Davis Genome Center, University of California, Davis, Davis, California, USA.
50
Molecular Formula Identification with SIRIUS. Metabolites 2013; 3:506-16. [PMID: 24958003 PMCID: PMC3901276 DOI: 10.3390/metabo3020506] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 04/08/2013] [Revised: 06/03/2013] [Accepted: 06/04/2013] [Indexed: 01/06/2023]
Abstract
We present results of the SIRIUS2 submission to the 2012 CASMI contest. Only results for Category 1 (molecular formula identification) were submitted. The SIRIUS method and the parameters used are briefly described, followed by a detailed analysis of the results and a discussion of cases where SIRIUS2 failed to find the correct molecular formula. SIRIUS2 returns consistently high-quality results, except for fragmentation pattern analysis of time-of-flight data. We then discuss possibilities for further improving SIRIUS2.