1
Meekel N, Kruve A, Lamoree MH, Been FM. Machine Learning-based Classification for the Prioritization of Potentially Hazardous Chemicals with Structural Alerts in Nontarget Screening. Environmental Science & Technology 2025; 59:5056-5065. [PMID: 40051380 PMCID: PMC11924234 DOI: 10.1021/acs.est.4c10498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2024] [Revised: 02/17/2025] [Accepted: 02/19/2025] [Indexed: 03/19/2025]
Abstract
Nontarget screening (NTS) with liquid chromatography high-resolution mass spectrometry (LC-HRMS) is commonly used to detect unknown organic micropollutants in the environment. One of the main challenges in NTS is the prioritization of relevant LC-HRMS features. A novel prioritization strategy based on structural alerts to select NTS features that correspond to potentially hazardous chemicals is presented here. This strategy leverages raw tandem mass spectra (MS2) and machine learning models to predict the probability that NTS features correspond to chemicals with structural alerts. The models were trained on fragments and neutral losses from the experimental MS2 data. The feasibility of this approach is evaluated for two groups: aromatic amines and organophosphorus structural alerts. The neural network classification model for organophosphorus structural alerts achieved an Area Under the Curve of the Receiver Operating Characteristics (AUC-ROC) of 0.97 and a true positive rate of 0.65 on the test set. The random forest model for the classification of aromatic amines achieved an AUC-ROC value of 0.82 and a true positive rate of 0.58 on the test set. The models were successfully applied to prioritize LC-HRMS features in surface water samples, showcasing the high potential to develop and implement this approach further.
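The abstract above reports both an AUC-ROC and a true positive rate for each model. A minimal sketch of how the two metrics differ is given below; the labels, scores, and the 0.5 decision threshold are made up for illustration and are not taken from the paper. AUC-ROC is computed here as the fraction of positive/negative pairs that the scores rank correctly, while the true positive rate depends on the chosen threshold.

```python
def true_positive_rate(y_true, scores, threshold=0.5):
    """Fraction of actual positives whose score reaches the threshold."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    if not positives:
        raise ValueError("need at least one positive example")
    return sum(s >= threshold for s in positives) / len(positives)


def auc_roc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = feature has the structural alert) and model scores.
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.6, 0.4, 0.45, 0.2, 0.1]

example_auc = auc_roc(example_labels, example_scores)
example_tpr = true_positive_rate(example_labels, example_scores, threshold=0.5)
```

In this toy data the ranking is nearly perfect, yet one positive falls below the threshold, which is one way a high AUC-ROC can coexist with a more modest true positive rate.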
Affiliation(s)
- Nienke Meekel
  - KWR Water Research Institute, P.O. Box 1072, Nieuwegein 3430 BB, The Netherlands
  - Chemistry for Environment and Health, Amsterdam Institute for Life and Environment (A-LIFE), Vrije Universiteit, De Boelelaan 1085, Amsterdam 1081 HV, The Netherlands
- Anneli Kruve
  - Department of Materials and Environmental Chemistry, Stockholm University, Stockholm SE-106 91, Sweden
  - Department of Environmental Science, Stockholm University, Stockholm SE-106 91, Sweden
- Marja H. Lamoree
  - Chemistry for Environment and Health, Amsterdam Institute for Life and Environment (A-LIFE), Vrije Universiteit, De Boelelaan 1085, Amsterdam 1081 HV, The Netherlands
- Frederic M. Been
  - KWR Water Research Institute, P.O. Box 1072, Nieuwegein 3430 BB, The Netherlands
  - Chemistry for Environment and Health, Amsterdam Institute for Life and Environment (A-LIFE), Vrije Universiteit, De Boelelaan 1085, Amsterdam 1081 HV, The Netherlands
2
Barnabé A, Delcourt V, Loup B, Montanuy W, Trévisiol S, Popot MA, Garcia P, Bailly-Chouriberry L. Convolutional Neural Networks Assisted Peak Classification in Targeted LC-HRMS/MS for Equine Doping Control Screening Analyses. Anal Chem 2025; 97:3236-3241. [PMID: 39901649 DOI: 10.1021/acs.analchem.4c03608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2025]
Abstract
Doping control screening analyses usually involve visual inspection of extracted ion chromatograms (EIC) by a trained analytical chemist, followed by further investigations if needed. This task is both highly repetitive and time-consuming, given the hundreds of compounds and metabolites to be screened in tens of thousands of samples per year. With the recent widespread adoption of machine learning in analytical chemistry and the training of high-performance convolutional neural networks (CNN), these operations can be automated with high accuracy and throughput. Applying this technology to doping control is challenging as the false negative rate (FNR) shall be equal to zero. In this study, we demonstrated that implementing a deep learning strategy for chromatogram classification in equine doping control can be feasible and accurate. We illustrated our findings with a CNN scoring model combined with a linear discriminant analysis (LDA) classifier trained on chromatogram images from our ultra-high-pressure liquid chromatography coupled to high-resolution tandem mass spectrometry (UHPLC-HRMS/MS)-based biotherapeutics screening method. We expect that artificial intelligence (AI) will be a valuable tool for doping control laboratories in the near future.
Affiliation(s)
- Agnès Barnabé
  - GIE LCH, Laboratoire des Courses Hippiques, 15 rue de Paradis, 91370 Verrières-le-Buisson, France
- Vivian Delcourt
  - GIE LCH, Laboratoire des Courses Hippiques, 15 rue de Paradis, 91370 Verrières-le-Buisson, France
- Benoit Loup
  - GIE LCH, Laboratoire des Courses Hippiques, 15 rue de Paradis, 91370 Verrières-le-Buisson, France
- William Montanuy
  - GIE LCH, Laboratoire des Courses Hippiques, 15 rue de Paradis, 91370 Verrières-le-Buisson, France
- Stéphane Trévisiol
  - GIE LCH, Laboratoire des Courses Hippiques, 15 rue de Paradis, 91370 Verrières-le-Buisson, France
- Marie-Agnès Popot
  - GIE LCH, Laboratoire des Courses Hippiques, 15 rue de Paradis, 91370 Verrières-le-Buisson, France
- Patrice Garcia
  - GIE LCH, Laboratoire des Courses Hippiques, 15 rue de Paradis, 91370 Verrières-le-Buisson, France
3
Huang J, Li Y, Meng B, Zhang Y, Wei Y, Dai X, An D, Zhao Y, Fang X. ProteoNet: A CNN-based framework for analyzing proteomics MS-RGB images. iScience 2024; 27:111362. [PMID: 39679296 PMCID: PMC11638609 DOI: 10.1016/j.isci.2024.111362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 06/15/2024] [Accepted: 11/07/2024] [Indexed: 12/17/2024] Open
Abstract
Proteomics is crucial in clinical research, yet the clinical application of proteomic data remains challenging. Transforming proteomic mass spectrometry (MS) data into red, green, and blue color (MS-RGB) image formats and applying deep learning (DL) techniques has shown great potential to enhance analysis efficiency. However, current DL models often fail to extract subtle, crucial features from MS-RGB data. To address this, we developed ProteoNet, a deep learning framework that refines MS-RGB data analysis. ProteoNet incorporates semantic partitioning, adaptive average pooling, and weighted factors into the Convolutional Neural Network (CNN) model, thus enhancing data analysis accuracy. Our experiments with proteomics data from urine, blood, and tissue samples related to liver, kidney, and thyroid diseases demonstrate that ProteoNet outperforms existing models in accuracy. ProteoNet also provides a direct conversion method for MS-RGB data, enabling a seamless workflow. Moreover, its compatibility with various CNN architectures, including lightweight models like MobileNetV2, underscores its scalability and clinical potential.
Affiliation(s)
- Jinze Huang
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing 100029, China
- Yimin Li
  - College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
- Bo Meng
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing 100029, China
- Yong Zhang
  - Institutes for Systems Genetics, West China Hospital, Sichuan University, Chengdu 610041, China
- Yaoguang Wei
  - College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
- Xinhua Dai
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing 100029, China
- Dong An
  - College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
- Yang Zhao
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing 100029, China
  - College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
- Xiang Fang
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing 100029, China
4
Zhang H, Yang Q, Xie T, Wang Y, Zhang Z, Lu H. MSBERT: Embedding Tandem Mass Spectra into Chemically Rational Space by Mask Learning and Contrastive Learning. Anal Chem 2024; 96:16599-16608. [PMID: 39397717 DOI: 10.1021/acs.analchem.4c02426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2024]
Abstract
Tandem mass spectrometry (MS/MS) is a powerful technique for chemical analysis in many areas of science. The vast MS/MS spectral data generated in liquid chromatography-mass spectrometry (LC-MS) experiments require efficient analysis and interpretation methods for the following compound identification. In this study, we propose MSBERT based on self-supervised learning strategies to embed MS/MS spectra into reasonable embeddings for efficient compound identification. It adopts the transformer encoder as the backbone for mask learning and uses the same spectra with different masks for contrastive learning. MSBERT is trained on the GNPS data set and tested on the GNPS data set, the MoNA data set, and the MTBLS1572 data set. It exhibits enhanced library matching and analogous compound searching capabilities compared to existing methods. The recalls at 1, 5, and 10 on a GNPS test subset with structures not in the training set are 0.7871, 0.8950, and 0.9080, respectively. The results are better than those of Spec2Vec with 0.6898, 0.8276, and 0.8620, and DreaMS with 0.7158, 0.8327, and 0.8635. The rationality of embeddings is demonstrated by t-SNE visualization, structural similarity, spectra clustering, compound identification, and analogous compound searching. A user-friendly web server is provided for efficient spectral analysis, and the source code for MSBERT is available at https://github.com/zhanghailiangcsu/MSBERT.
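The recall at 1, 5, and 10 figures quoted above measure how often the correct compound appears among the top-k library matches for a query spectrum. The sketch below illustrates that metric under stated assumptions: spectra are already embedded as plain vectors, ranking uses cosine similarity, and the tiny library, queries, and compound labels are invented for illustration. It is not MSBERT's code, and the paper's exact evaluation protocol may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_at_k(queries, library, k):
    """Share of queries whose true compound is among the k best library hits.

    queries: list of (embedding, true_compound_id)
    library: list of (embedding, compound_id)
    """
    hits = 0
    for query_embedding, true_id in queries:
        ranked = sorted(
            library,
            key=lambda entry: cosine(query_embedding, entry[0]),
            reverse=True,
        )
        top_ids = [compound_id for _, compound_id in ranked[:k]]
        hits += true_id in top_ids
    return hits / len(queries)


# Hypothetical 2-dimensional embeddings; real spectral embeddings are much larger.
toy_library = [([1.0, 0.0], "A"), ([0.0, 1.0], "B"), ([1.0, 1.0], "C")]
toy_queries = [([0.9, 0.1], "A"), ([0.6, 0.5], "A"), ([0.1, 0.9], "B")]

toy_recall_at_1 = recall_at_k(toy_queries, toy_library, k=1)
toy_recall_at_2 = recall_at_k(toy_queries, toy_library, k=2)
```

Recall can only stay the same or grow as k increases, which matches the ordering of the three values reported in the abstract.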
Affiliation(s)
- Hailiang Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Qiong Yang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Ting Xie
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Yue Wang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Zhimin Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
5
Beck A, Muhoberac M, Randolph CE, Beveridge CH, Wijewardhane PR, Kenttämaa HI, Chopra G. Recent Developments in Machine Learning for Mass Spectrometry. ACS Measurement Science Au 2024; 4:233-246. [PMID: 38910862 PMCID: PMC11191731 DOI: 10.1021/acsmeasuresciau.3c00060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 12/27/2023] [Accepted: 01/22/2024] [Indexed: 06/25/2024]
Abstract
Statistical analysis and modeling of mass spectrometry (MS) data have a long and rich history with several modern MS-based applications using statistical and chemometric methods. Recently, machine learning (ML) has experienced a renaissance due to advents in computational hardware and the development of new algorithms for artificial neural networks (ANN) and deep learning architectures. Moreover, recent successes of new ANN and deep learning architectures in several areas of science, engineering, and society have further strengthened the ML field. Importantly, modern ML methods and architectures have enabled new approaches for tasks related to MS that are now widely adopted in several popular MS-based subdisciplines, such as mass spectrometry imaging and proteomics. Herein, we aim to provide an introductory summary of the practical aspects of ML methodology relevant to MS. Additionally, we seek to provide an up-to-date review of the most recent developments in ML integration with MS-based techniques while also providing critical insights into the future direction of the field.
Affiliation(s)
- Armen G. Beck
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Matthew Muhoberac
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Caitlin E. Randolph
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Connor H. Beveridge
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Prageeth R. Wijewardhane
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Hilkka I. Kenttämaa
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Gaurav Chopra
  - Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
  - Department of Computer Science (by courtesy), Purdue University, West Lafayette, Indiana 47907, United States
  - Purdue Institute for Drug Discovery, Purdue Institute for Cancer Research, Regenstrief Center for Healthcare Engineering, Purdue Institute for Inflammation, Immunology and Infectious Disease, Purdue Institute for Integrative Neuroscience, West Lafayette, Indiana 47907, United States
6
Bui-Thi D, Liu Y, Lippens JL, Laukens K, De Vijlder T. TransExION: a transformer based explainable similarity metric for comparing IONS in tandem mass spectrometry. J Cheminform 2024; 16:61. [PMID: 38807166 PMCID: PMC11134763 DOI: 10.1186/s13321-024-00858-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 05/12/2024] [Indexed: 05/30/2024] Open
Abstract
Small molecule identification is a crucial task in analytical chemistry and life sciences. One of the most commonly used technologies to elucidate small molecule structures is mass spectrometry. Spectral library search of product ion spectra (MS/MS) is a popular strategy to identify or find structural analogues. This approach relies on the assumption that spectral similarity and structural similarity are correlated. However, popular spectral similarity measures, usually calculated based on identical fragment matches between the MS/MS spectra, do not always accurately reflect the structural similarity. In this study, we propose TransExION, a Transformer based Explainable similarity metric for IONS. TransExION detects related fragments between MS/MS spectra through their mass difference and uses these to estimate spectral similarity. These related fragments can be nearly identical, but can also share a substructure. TransExION also provides a post-hoc explanation of its estimation, which can be used to support scientists in evaluating the spectral library search results and thus in structure elucidation of unknown molecules. Our model has a Transformer based architecture and it is trained on the data derived from GNPS MS/MS libraries. The experimental results show that it improves existing spectral similarity measures in searching and interpreting structural analogues as well as in molecular networking. SCIENTIFIC CONTRIBUTION: We propose a transformer-based spectral similarity metric that improves the comparison of small molecule tandem mass spectra. We provide a post hoc explanation that can serve as a good starting point for unknown spectra annotation based on database spectra.
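The abstract above contrasts identical fragment matches with fragments related through a mass difference. The sketch below illustrates only that distinction and is not TransExION's model: it lists every pairwise m/z difference between two peak lists and labels a pair "identical" when the difference is within a tolerance. The function name, the 0.01 tolerance, and the example m/z values are assumptions made for illustration.

```python
def fragment_mass_differences(mzs_a, mzs_b, tol=0.01):
    """Pair every fragment of spectrum A with every fragment of spectrum B.

    Returns tuples (mz_a, mz_b, delta, kind), where delta = mz_b - mz_a and
    kind is "identical" if |delta| <= tol, otherwise "shifted".
    """
    pairs = []
    for mz_a in mzs_a:
        for mz_b in mzs_b:
            delta = mz_b - mz_a
            kind = "identical" if abs(delta) <= tol else "shifted"
            pairs.append((mz_a, mz_b, delta, kind))
    return pairs


# Hypothetical fragment m/z values for two spectra.
spectrum_a = [91.0542, 119.0491]
spectrum_b = [91.0545, 133.0648]

all_pairs = fragment_mass_differences(spectrum_a, spectrum_b)
identical_pairs = [p for p in all_pairs if p[3] == "identical"]

# A shift close to 14.01565 Da (the monoisotopic mass of CH2) may hint that
# two fragments share a substructure and differ by one methylene unit.
ch2_shifted = [p for p in all_pairs if abs(p[2] - 14.01565) < 0.005]
```

A score based only on `identical_pairs` would ignore the second pair entirely, which is the limitation of identical-match similarity measures that the abstract points to.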
Affiliation(s)
- Danh Bui-Thi
  - Computer Science Department, University of Antwerp, Middelheimlaan 1, 2020, Antwerp, Belgium
- Youzhong Liu
  - Therapeutic Development and Supply, Janssen Pharmaceutica N.V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Jennifer L Lippens
  - Therapeutic Development and Supply, Janssen Pharmaceutica N.V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Kris Laukens
  - Computer Science Department, University of Antwerp, Middelheimlaan 1, 2020, Antwerp, Belgium
- Thomas De Vijlder
  - Therapeutic Development and Supply, Janssen Pharmaceutica N.V., Turnhoutseweg 30, 2340, Beerse, Belgium
7
Leshuk TC, Young ZW, Wilson B, Chen ZQ, Smith DA, Lazaris G, Gopanchuk M, McLay S, Seelemann CA, Paradis T, Bekele A, Guest R, Massara H, White T, Zubot W, Letinski DJ, Redman AD, Allen DG, Gu F. A Light Touch: Solar Photocatalysis Detoxifies Oil Sands Process-Affected Waters Prior to Significant Treatment of Naphthenic Acids. ACS ES&T Water 2024; 4:1483-1497. [PMID: 38633367 PMCID: PMC11019557 DOI: 10.1021/acsestwater.3c00616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 01/25/2024] [Accepted: 01/25/2024] [Indexed: 04/19/2024]
Abstract
Environmental reclamation of Canada's oil sands tailings ponds is among the single largest water treatment challenges globally. The toxicity of oil sands process-affected water (OSPW) has been associated with its dissolved organics, a complex mixture of naphthenic acid fraction components (NAFCs). Here, we evaluated solar treatment with buoyant photocatalysts (BPCs) as a passive advanced oxidation process (P-AOP) for OSPW remediation. Photocatalysis fully degraded naphthenic acids (NAs) and acid extractable organics (AEO) in 3 different OSPW samples. However, classical NAs and AEO, traditionally considered among the principal toxicants in OSPW, were not correlated with OSPW toxicity herein. Instead, nontarget petroleomic analysis revealed that low-polarity organosulfur compounds, composing <10% of the total AEO, apparently accounted for the majority of waters' toxicity to fish, as described by a model of tissue partitioning. These findings have implications for OSPW release, for which a less extensive but more selective treatment may be required than previously expected.
Affiliation(s)
- Timothy M. C. Leshuk
  - H2nanO Inc., Kitchener, Ontario, Canada N2R 1E8
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada M5S 3E5
  - Department of Chemical Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
  - Waterloo Institute for Nanotechnology, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
- Zachary W. Young
  - H2nanO Inc., Kitchener, Ontario, Canada N2R 1E8
  - Department of Chemical Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
  - Waterloo Institute for Nanotechnology, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
- Brad Wilson
  - H2nanO Inc., Kitchener, Ontario, Canada N2R 1E8
  - Stantec, Waterloo, Ontario, Canada N2L 0A4
- Zi Qi Chen
  - H2nanO Inc., Kitchener, Ontario, Canada N2R 1E8
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada M5S 3E5
  - Department of Chemical Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
  - Waterloo Institute for Nanotechnology, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
- Danielle A. Smith
  - H2nanO Inc., Kitchener, Ontario, Canada N2R 1E8
  - Department of Chemical Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
  - Waterloo Institute for Nanotechnology, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
  - P&P Optica, Waterloo, Ontario, Canada N2V 2C3
- Greg Lazaris
  - H2nanO Inc., Kitchener, Ontario, Canada N2R 1E8
  - Department of Chemical Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
  - Waterloo Institute for Nanotechnology, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
  - Department of Mining and Materials Engineering, McGill University, Montreal, Quebec, Canada H3A 0C5
- Mary Gopanchuk
  - H2nanO Inc., Kitchener, Ontario, Canada N2R 1E8
  - Department of Chemical Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
  - Waterloo Institute for Nanotechnology, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
- Sean McLay
  - H2nanO Inc., Kitchener, Ontario, Canada N2R 1E8
  - Department of Chemical Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
  - Waterloo Institute for Nanotechnology, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
- Corin A. Seelemann
  - H2nanO Inc., Kitchener, Ontario, Canada N2R 1E8
  - Department of Chemical Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
  - Waterloo Composite Biomaterials Systems Lab, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
  - Waterloo Institute for Nanotechnology, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
- Theo Paradis
  - Canadian Natural Resources Ltd., Calgary, Alberta, Canada T2P 4J8
- Asfaw Bekele
  - Imperial Oil Ltd., Calgary, Alberta, Canada T2C 5N1
  - ExxonMobil Biomedical Sciences, Inc., Annandale, New Jersey 08801, United States
- Rodney Guest
  - Suncor Energy Inc., Calgary, Alberta, Canada T2P 3E3
- Hafez Massara
  - Suncor Energy Inc., Calgary, Alberta, Canada T2P 3E3
  - Trans-Northern Pipelines Inc., Richmond Hill, Ontario, Canada L4B 3P6
- Todd White
  - Teck Resources Ltd., Vancouver, British Columbia, Canada V6C 0B3
- Warren Zubot
  - Syncrude Canada Ltd., Fort McMurray, Alberta, Canada T9H 0B6
- Daniel J. Letinski
  - ExxonMobil Biomedical Sciences, Inc., Annandale, New Jersey 08801, United States
- Aaron D. Redman
  - ExxonMobil Biomedical Sciences, Inc., Annandale, New Jersey 08801, United States
- D. Grant Allen
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada M5S 3E5
- Frank Gu
  - H2nanO Inc., Kitchener, Ontario, Canada N2R 1E8
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada M5S 3E5
  - Department of Chemical Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
  - Waterloo Institute for Nanotechnology, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
8
Hu A, Liu Q, Ouyang J. Identification and characterization of the metabolites of moscatilin in mouse, rat, dog, monkey and human hepatocytes by LC-Orbitrap-MS/MS combined with diagnostic fragment ions and accurate mass measurements. Biomed Chromatogr 2023; 37:e5573. [PMID: 36529812 DOI: 10.1002/bmc.5573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 12/07/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Moscatilin, a bibenzyl derivative from the stem of Dendrobium loddigesii, has been shown to have anticancer activity. The aim of this study was to identify and characterize the possible in vitro metabolites of moscatilin generated from hepatocytes. The metabolites generated in the hepatocytes of mouse, rat, dog, monkey and human were identified and characterized employing ultra-high-performance liquid chromatography coupled with quadrupole Orbitrap tandem mass spectrometry (LC-Orbitrap-MS/MS) based on diagnostic fragment ions and accurate mass measurements. A total of 18 metabolites were identified, among which seven were phase I and 11 were phase II metabolites. The plausible structures of the metabolites and the probable biotransformation pathways were proposed based on the diagnostic fragment ions, chemical formula and mass fragmentation pattern, as well as the accurate masses. The majority of phase I metabolites were generated by demethylation and hydroxylation, while phase II metabolites were mainly generated by glucuronidation, glutathione conjugation and sulfation. Our study first expounded the metabolites of moscatilin in mouse, rat, dog, monkey and human hepatocytes and provided a foundation for a further pharmacokinetic and toxicity study. More importantly, LC-Orbitrap-MS/MS combined with diagnostic fragment ions and accurate mass measurements has been proved to be an effective method for the rapid identification of bibenzyl derivatives and their metabolites.
Affiliation(s)
- Aizhen Hu
  - Clinical Research Center, Chongqing Public Health Medical Center, Chongqing, China
- Qingwang Liu
  - Institute of Health and Medical Technology, Hefei Institute of Physical Science, Chinese Academy of Sciences, Hefei, Anhui Province, China
- Jing Ouyang
  - Clinical Research Center, Chongqing Public Health Medical Center, Chongqing, China
9
Liu H, Guo S, Xi S. A high-resolution accurate mass approach to identification of graveoline metabolites using ultra-high-performance liquid chromatography combined with a photo diode array detector and quadrupole/time-of-flight tandem mass spectrometry. Biomed Chromatogr 2023; 37:e5511. [PMID: 36100977 DOI: 10.1002/bmc.5511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 09/08/2022] [Accepted: 09/11/2022] [Indexed: 12/15/2022]
Abstract
Graveoline is a biologically active ingredient extracted from Ruta graveolens. The current work aimed to investigate the in vitro metabolism of graveoline using rat or human liver microsomes and hepatocytes. Graveoline (20 μM) was incubated with nicotinamide adenine dinucleotide phosphate-supplemented rat and human liver microsomes as well as hepatocytes. LC coupled to a photo diode array detector and quadrupole/time-of-flight tandem mass spectrometry was used to detect and identify the metabolites. The structures of the metabolites were identified by accurate mass, elemental composition, and indicative fragment ions. A total of 12 metabolites, comprising 6 phase I and 6 phase II metabolites, were obtained. The metabolic pathways included demethylenation, demethylation, hydroxylation, glucuronidation, and glutathione conjugation. The metabolite (M10) produced by opening the ring of the methylenedioxyphenyl moiety was detected as the most abundant in both liver microsomes and hepatocytes, mainly catalyzed by CYP1A2, 2C8, 2C9, 2C19, 2D6, 3A4, and 3A5. This study provides valuable information on the in vitro metabolism of graveoline, which is indispensable for further development and safety evaluation of this compound.
Affiliation(s)
- Hao Liu
  - Department of Physical Education, Taiyuan Institute of Technology, Taiyuan, China
- Siyuan Guo
  - Department of Physical Education, Taiyuan Institute of Technology, Taiyuan, China
- Shuyi Xi
  - Department of Physical Education, Taiyuan Institute of Technology, Taiyuan, China
10
de Jonge NF, Mildau K, Meijer D, Louwen JJR, Bueschl C, Huber F, van der Hooft JJJ. Good practices and recommendations for using and benchmarking computational metabolomics metabolite annotation tools. Metabolomics 2022; 18:103. [PMID: 36469190 PMCID: PMC9722809 DOI: 10.1007/s11306-022-01963-y] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 05/16/2022] [Accepted: 11/18/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND Untargeted metabolomics approaches based on mass spectrometry obtain comprehensive profiles of complex biological samples. However, on average only 10% of the molecules can be annotated. This low annotation rate hampers biochemical interpretation and effective comparison of metabolomics studies. Furthermore, de novo structural characterization of mass spectral data remains a complicated and time-intensive process. Recently, the field of computational metabolomics has gained traction and novel methods have started to enable large-scale and reliable metabolite annotation. Molecular networking and machine learning-based in-silico annotation tools have been shown to greatly assist metabolite characterization in diverse fields such as clinical metabolomics and natural product discovery. AIM OF REVIEW We highlight recent advances in computational metabolite annotation workflows with a special focus on their evaluation and comparison with other tools. Whilst the progress is substantial and promising, we also argue that inconsistencies in benchmarking different tools hamper users from selecting the most appropriate and promising method for their research. We summarize benchmarking strategies of the different tools and outline several recommendations for benchmarking and comparing novel tools. KEY SCIENTIFIC CONCEPTS OF REVIEW This review focuses on recent advances in mass spectral library-based and machine learning-supported metabolite annotation workflows. We discuss large-scale library matching and analogue search, the current bloom of mass spectral similarity scores, and how molecular networking has changed the field. In addition, the potentials and challenges of machine learning-supported metabolite annotation workflows are highlighted. 
Overall, recent developments in computational metabolomics have started to fundamentally change metabolomics workflows, and we expect that as a community we will be able to overcome current method performance ambiguities and annotation bottlenecks.
Affiliation(s)
- Niek F. de Jonge
  - Bioinformatics Group, Wageningen University, Wageningen, the Netherlands
- Kevin Mildau
  - Department of Analytical Chemistry, Biochemical Network Analysis Lab, University of Vienna, Vienna, Austria
- David Meijer
  - Bioinformatics Group, Wageningen University, Wageningen, the Netherlands
- Joris J. R. Louwen
  - Bioinformatics Group, Wageningen University, Wageningen, the Netherlands
- Christoph Bueschl
  - Department of Analytical Chemistry, Biochemical Network Analysis Lab, University of Vienna, Vienna, Austria
- Florian Huber
  - Centre for Digitalization and Digitality (ZDD), University of Applied Sciences Düsseldorf, Düsseldorf, Germany
- Justin J. J. van der Hooft
  - Bioinformatics Group, Wageningen University, Wageningen, the Netherlands
  - Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
11
Bittremieux W, Wang M, Dorrestein PC. The critical role that spectral libraries play in capturing the metabolomics community knowledge. Metabolomics 2022; 18:94. [PMID: 36409434 PMCID: PMC10284100 DOI: 10.1007/s11306-022-01947-y] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 08/01/2022] [Accepted: 10/19/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments. AIM OF REVIEW We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries. KEY SCIENTIFIC CONCEPTS OF REVIEW This review focuses on the current state of spectral libraries for untargeted LC-MS/MS based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future.
Affiliation(s)
- Wout Bittremieux
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
- Mingxun Wang
- Department of Computer Science, University of California Riverside, Riverside, CA, 92507, USA
- Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
12
Petrick LM, Shomron N. AI/ML-driven advances in untargeted metabolomics and exposomics for biomedical applications. Cell Reports Physical Science 2022; 3:100978. [PMID: 35936554 PMCID: PMC9354369 DOI: 10.1016/j.xcrp.2022.100978] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 06/15/2023]
Abstract
Metabolomics describes a high-throughput approach for measuring a repertoire of metabolites and small molecules in biological samples. One utility of untargeted metabolomics, unbiased global analysis of the metabolome, is to detect key metabolites as contributors to, or readouts of, human health and disease. In this perspective, we discuss how artificial intelligence (AI) and machine learning (ML) have promoted major advances in untargeted metabolomics workflows and facilitated pivotal findings in the areas of disease screening and diagnosis. We contextualize applications of AI and ML to the emerging field of high-resolution mass spectrometry (HRMS) exposomics, which unbiasedly detects endogenous metabolites and exogenous chemicals in human tissue to characterize exposure linked with disease outcomes. We discuss the state of the science and suggest potential opportunities for using AI and ML to improve data quality, rigor, detection, and chemical identification in untargeted metabolomics and exposomics studies.
Affiliation(s)
- Lauren M. Petrick
- The Bert Strassburger Metabolic Center, Sheba Medical Center, Tel-Hashomer, Israel
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Institute for Exposomics Research, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Noam Shomron
- Faculty of Medicine, Edmond J. Safra Center for Bioinformatics, Sagol School of Neuroscience, Center for Nanoscience and Nanotechnology, Center for Innovation Laboratories (TILabs), Tel Aviv University, Tel Aviv, Israel
13
Wu X, Sun S, Wu X, Sun Z. Identification of the metabolites of methylophiopogonanone A by ultra-high-performance liquid chromatography combined with high-resolution mass spectrometry. Rapid Communications in Mass Spectrometry 2022; 36:e9304. [PMID: 35347765 DOI: 10.1002/rcm.9304] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/21/2022] [Revised: 03/21/2022] [Accepted: 03/26/2022] [Indexed: 06/14/2023]
Abstract
RATIONALE Methylophiopogonanone A (MOA) is a naturally occurring homoisoflavonoid from the Chinese herb Ophiopogon japonicus, which has been demonstrated to attenuate myocardial apoptosis. However, the metabolism of MOA remains unknown. The goal of the present work was to investigate the in vitro metabolism of MOA using liver microsomes and hepatocytes. METHODS The metabolites were generated by incubating MOA with rat, monkey and human liver microsomes or hepatocytes. The resulting samples were analyzed by using a quadrupole-orbitrap high-resolution mass spectrometer. The metabolites were identified through the measurements of the exact mass, elemental composition and product ions. RESULTS A total of 15 metabolites were detected and identified. Among these metabolites, M7 (demethylenation) was the most abundant metabolite in liver microsomes, while M6 (hydroxylation) was the predominant metabolite in hepatocytes, and glucuronidation metabolites (M9 and M10) were also the main metabolites in hepatocytes. The metabolic pathways of MOA included hydroxylation, demethylenation, glucuronidation, methylation, sulfation and glutathione conjugation. CONCLUSIONS This study for the first time provides valuable data on the metabolites of MOA, which will be of great importance for a better understanding of its disposition and to predict human pharmacokinetics.
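The exact-mass step mentioned in this abstract can be illustrated with a minimal sketch: compute the monoisotopic mass shift expected for a few of the listed biotransformations and check whether an observed metabolite mass matches one of them within a ppm tolerance. The function names, the 5 ppm default tolerance, and the parent mass used in the usage note are assumptions made for illustration and are not taken from the study; the elemental changes are the commonly used textbook values.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "S": 31.97207117}

# Net elemental change per biotransformation (textbook values,
# assumed here for illustration; not reported in the abstract).
SHIFTS = {
    "hydroxylation": {"O": 1},
    "methylation": {"C": 1, "H": 2},
    "glucuronidation": {"C": 6, "H": 8, "O": 6},
    "sulfation": {"S": 1, "O": 3},
}


def mass_shift(name):
    """Monoisotopic mass added to the parent by the named transformation."""
    return sum(MONO[element] * count for element, count in SHIFTS[name].items())


def match_shift(parent_mass, metabolite_mass, tol_ppm=5.0):
    """Return the transformations whose expected mass lies within tol_ppm
    of the observed metabolite mass (hypothetical helper)."""
    hits = []
    for name in SHIFTS:
        expected = parent_mass + mass_shift(name)
        error_ppm = abs(metabolite_mass - expected) / expected * 1e6
        if error_ppm <= tol_ppm:
            hits.append(name)
    return hits
```

For a hypothetical parent with a neutral monoisotopic mass of 300.0 Da, `match_shift(300.0, 300.0 + mass_shift("sulfation"))` returns `["sulfation"]`. In practice, elemental composition and product ions would still be needed to confirm an assignment, as the abstract describes.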
Affiliation(s)
- Xiaowen Wu
- Department of Pharmacy, The First People's Hospital of Lianyungang, Lianyungang, Jiangsu Province, China
- Shuai Sun
- Department of Pharmacy, The First People's Hospital of Lianyungang, Lianyungang, Jiangsu Province, China
- Xiaoyi Wu
- Department of Pharmacy, The First People's Hospital of Lianyungang, Lianyungang, Jiangsu Province, China
- Zengxian Sun
- Department of Pharmacy, The First People's Hospital of Lianyungang, Lianyungang, Jiangsu Province, China
14
Shrivastava AD, Swainston N, Samanta S, Roberts I, Wright Muelas M, Kell DB. MassGenie: A Transformer-Based Deep Learning Method for Identifying Small Molecules from Their Mass Spectra. Biomolecules 2021; 11:1793. [PMID: 34944436 PMCID: PMC8699281 DOI: 10.3390/biom11121793] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 10/24/2021] [Revised: 11/14/2021] [Accepted: 11/27/2021] [Indexed: 12/15/2022] Open
Abstract
The 'inverse problem' of mass spectrometric molecular identification ('given a mass spectrum, calculate/predict the 2D structure of the molecule whence it came') is largely unsolved, and is especially acute in metabolomics where many small molecules remain unidentified. This is largely because the number of experimentally available electrospray mass spectra of small molecules is quite limited. However, the forward problem ('calculate a small molecule's likely fragmentation and hence at least some of its mass spectrum from its structure alone') is much more tractable, because the strengths of different chemical bonds are roughly known. This kind of molecular identification problem may be cast as a language translation problem in which the source language is a list of high-resolution mass spectral peaks and the 'translation' a representation (for instance in SMILES) of the molecule. It is thus suitable for attack using the deep neural networks known as transformers. We here present MassGenie, a method that uses a transformer-based deep neural network, trained on ~6 million chemical structures with augmented SMILES encoding and their paired molecular fragments as generated in silico, explicitly including the protonated molecular ion. This architecture (containing some 400 million elements) is used to predict the structure of a molecule from the various fragments that may be expected to be observed when some of its bonds are broken. Despite being given essentially no detailed nor explicit rules about molecular fragmentation methods, isotope patterns, rearrangements, neutral losses, and the like, MassGenie learns the effective properties of the mass spectral fragment and valency space, and can generate candidate molecular structures that are very close or identical to those of the 'true' molecules. We also use VAE-Sim, a previously published variational autoencoder, to generate candidate molecules that are 'similar' to the top hit. 
In addition to using the 'top hits' directly, we can produce a rank order of these by 'round-tripping' candidate molecules and comparing them with the true molecules, where known. As a proof of principle, we confine ourselves to positive electrospray mass spectra from molecules with a molecular mass of 500Da or lower, including those in the last CASMI challenge (for which the results are known), getting 49/93 (53%) precisely correct. The transformer method, applied here for the first time to mass spectral interpretation, works extremely effectively both for mass spectra generated in silico and on experimentally obtained mass spectra from pure compounds. It seems to act as a Las Vegas algorithm, in that it either gives the correct answer or simply states that it cannot find one. The ability to create and to 'learn' millions of fragmentation patterns in silico, and therefrom generate candidate structures (that do not have to be in existing libraries) directly, thus opens up entirely the field of de novo small molecule structure prediction from experimental mass spectra.
Affiliation(s)
- Aditya Divyakant Shrivastava
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Department of Computer Science and Engineering, Nirma University, Ahmedabad 382481, India
- Neil Swainston
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Mellizyme Biotechnology Ltd., Liverpool Science Park IC1, 131 Mount Pleasant, Liverpool L3 5TF, UK
- Soumitra Samanta
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Ivayla Roberts
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Marina Wright Muelas
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Douglas B. Kell
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Mellizyme Biotechnology Ltd., Liverpool Science Park IC1, 131 Mount Pleasant, Liverpool L3 5TF, UK
- Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kongens Lyngby, Denmark
15
Huber F, van der Burg S, van der Hooft JJJ, Ridder L. MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra. J Cheminform 2021; 13:84. [PMID: 34715914 PMCID: PMC8556919 DOI: 10.1186/s13321-021-00558-4] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 07/09/2021] [Accepted: 09/25/2021] [Indexed: 11/18/2022] Open
Abstract
Mass spectrometry data is one of the key sources of information in many workflows in medicine and across the life sciences. Mass fragmentation spectra are generally considered to be characteristic signatures of the chemical compound they originate from, yet the chemical structure itself usually cannot be easily deduced from the spectrum. Often, spectral similarity measures are used as a proxy for structural similarity but this approach is strongly limited by a generally poor correlation between both metrics. Here, we propose MS2DeepScore: a novel Siamese neural network to predict the structural similarity between two chemical structures solely based on their MS/MS fragmentation spectra. Using a cleaned dataset of > 100,000 mass spectra of about 15,000 unique known compounds, we trained MS2DeepScore to predict structural similarity scores for spectrum pairs with high accuracy. In addition, sampling different model varieties through Monte-Carlo Dropout is used to further improve the predictions and assess the model's prediction uncertainty. On 3600 spectra of 500 unseen compounds, MS2DeepScore is able to identify highly-reliable structural matches and to predict Tanimoto scores for pairs of molecules based on their fragment spectra with a root mean squared error of about 0.15. Furthermore, the prediction uncertainty estimate can be used to select a subset of predictions with a root mean squared error of about 0.1. Furthermore, we demonstrate that MS2DeepScore outperforms classical spectral similarity measures in retrieving chemically related compound pairs from large mass spectral datasets, thereby illustrating its potential for spectral library matching. Finally, MS2DeepScore can also be used to create chemically meaningful mass spectral embeddings that could be used to cluster large numbers of spectra. 
Added to the recently introduced unsupervised Spec2Vec metric, we believe that machine learning-supported mass spectral similarity measures have great potential for a range of metabolomics data processing pipelines.
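The two quantities this abstract reports, Tanimoto scores between molecules and the root mean squared error of the predicted scores, can be sketched in a few lines. This is a minimal illustration only: the abstract does not state which molecular fingerprint was used, so fingerprints are represented here as plain sets of on-bit indices, and the function names are hypothetical.

```python
import math


def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as
    collections of on-bit indices (assumed representation)."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def rmse(predicted, reference):
    """Root mean squared error between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))
```

As a usage example, `tanimoto({1, 2, 3}, {2, 3, 4})` gives 0.5 (two shared bits out of four distinct bits). A model of the kind described would be evaluated by calling `rmse` on its predicted scores and the fingerprint-derived Tanimoto scores for the same spectrum pairs.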
Affiliation(s)
- Florian Huber
- Netherlands eScience Center, 1098 XG, Amsterdam, The Netherlands
- Lars Ridder
- Netherlands eScience Center, 1098 XG, Amsterdam, The Netherlands
16
Comprehensive Large-Scale Integrative Analysis of Omics Data To Accelerate Specialized Metabolite Discovery. mSystems 2021; 6:e0072621. [PMID: 34427506 PMCID: PMC8407348 DOI: 10.1128/msystems.00726-21] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 11/20/2022] Open
Abstract
Microbial specialized metabolites are key mediators in host-microbiome interactions. Most of the chemical space produced by the microbiome currently remains unexplored and uncharacterized. This situation calls for new and improved methods to exploit the growing publicly available genomic and metabolomic data sets and connect the outcomes to structural and functional knowledge inferred from transcriptomics and proteomics experiments. Here, we first describe currently available approaches that support the comprehensive mining of metabolomics and genomics data. Next, we provide our vision on how to move forward toward the automated linking of omics data of specialized metabolites to their structures, biosynthesis pathways, producers, and functions.