1. Zhao T, Low B, Shen Q, Wang Y, Hidalgo Delgado D, Chau KNM, Pang Z, Li X, Xia J, Li XF, Huan T. Exposome-Scale Investigation of Cl-/Br-Containing Chemicals Using High-Resolution Mass Spectrometry, Multistage Machine Learning, and Cloud Computing. Anal Chem 2025. [PMID: 40401576 DOI: 10.1021/acs.analchem.5c00503] [Indexed: 05/23/2025]
Abstract
Over 70% of organic halogens, representing chlorine- and bromine-containing disinfection byproducts (Cl-/Br-DBPs), remain unidentified after 50 years of research. This work introduces a streamlined, cloud-based exposomics workflow that integrates high-resolution mass spectrometry (HRMS) analysis, multistage machine learning, and cloud computing for efficient analysis and characterization of Cl-/Br-DBPs. In particular, the multistage machine learning structure uses progressively different heavy isotopic peaks at each layer and captures the distinct isotopic characteristics of nonhalogenated compounds and of Cl-/Br-compounds at different halogenation levels. This approach enables the recognition of 22 types of Cl-/Br-compounds with up to 6 Br and 8 Cl atoms. To address the data imbalance among classes, particularly the limited number of heavily chlorinated and brominated compounds, data perturbation is used to generate hypothetical (synthetic) molecular formulas containing multiple Cl and Br atoms for data augmentation. To benefit environmental chemists with limited computational experience or hardware access, these innovations are incorporated into HalogenFinder (http://www.halogenfinder.com/), a user-friendly, web-based platform for Cl-/Br-compound characterization, with statistical analysis support via MetaboAnalyst. In benchmarking, HalogenFinder outperformed two established tools, achieving a higher recognition rate for 277 authentic Cl-/Br-compounds and uniquely determining the number of Cl/Br atoms. In laboratory tests of DBP mixtures, it identified 72 Cl-/Br-DBPs with proposed structures, eight of which were confirmed with chemical standards. A retrospective analysis of 2022 finished-water HRMS data revealed temporal trends in Cl-DBP features. These results demonstrate HalogenFinder's effectiveness in advancing Cl-/Br-compound identification for environmental science and exposomics.
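HalogenFinder's classifiers learn isotopic signatures. These signatures follow from the natural abundances of 35Cl/37Cl and 79Br/81Br. The function below is a minimal sketch, not HalogenFinder's code. It convolves those abundances to predict the M, M+2, M+4, ... peak pattern contributed by a given number of Cl and Br atoms. It assumes nominal mass shifts and ignores the isotopes of C, H, N, O and S, which a real classifier would also see.

```python
from itertools import product

# Representative natural abundances (IUPAC); keys are nominal mass shifts in Da.
CL = {0: 0.7576, 2: 0.2424}  # 35Cl, 37Cl
BR = {0: 0.5069, 2: 0.4931}  # 79Br, 81Br


def convolve(a, b):
    """Combine two isotope distributions keyed by nominal mass shift."""
    out = {}
    for (shift_a, p_a), (shift_b, p_b) in product(a.items(), b.items()):
        out[shift_a + shift_b] = out.get(shift_a + shift_b, 0.0) + p_a * p_b
    return out


def halogen_pattern(n_cl, n_br):
    """Relative M, M+2, M+4, ... intensities (tallest peak = 1.0) from Cl/Br alone."""
    dist = {0: 1.0}
    for _ in range(n_cl):
        dist = convolve(dist, CL)
    for _ in range(n_br):
        dist = convolve(dist, BR)
    top = max(dist.values())
    return {shift: p / top for shift, p in sorted(dist.items())}


for n_cl, n_br in [(1, 0), (2, 0), (0, 1), (1, 1)]:
    pattern = halogen_pattern(n_cl, n_br)
    print(f"Cl{n_cl}Br{n_br}:", {f"M+{s}": round(v, 3) for s, v in pattern.items()})
```

With these values:

- One Cl gives an M+2 peak about a third the height of M.
- One Br gives M and M+2 peaks of nearly equal height.
- One Cl plus one Br makes M+2 the tallest peak.

Each added halogen extends the pattern by one more +2 Da peak. This is why heavier isotopic peaks carry information about higher halogenation levels.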
Affiliation(s)
- Tingting Zhao
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
- Brian Low
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
- Qiming Shen
- Division of Analytical and Environmental Toxicology, Department of Laboratory Medicine and Pathology, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta T6G 2G3, Canada
- Yukai Wang
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
- David Hidalgo Delgado
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
- K N Minh Chau
- Division of Analytical and Environmental Toxicology, Department of Laboratory Medicine and Pathology, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta T6G 2G3, Canada
- Zhiqiang Pang
- Institute of Parasitology, Faculty of Agricultural and Environmental Sciences, McGill University, Sainte-Anne-de-Bellevue, Quebec H9X 3V9, Canada
- Xiaoxiao Li
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Jianguo Xia
- Institute of Parasitology, Faculty of Agricultural and Environmental Sciences, McGill University, Sainte-Anne-de-Bellevue, Quebec H9X 3V9, Canada
- Department of Microbiology and Immunology, School of Biomedical Sciences, McGill University, Montreal, Quebec H3A 2B4, Canada
- Xing-Fang Li
- Division of Analytical and Environmental Toxicology, Department of Laboratory Medicine and Pathology, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta T6G 2G3, Canada
- Tao Huan
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
2. Kong F, Shen T, Li Y, Bashar A, Bird SS, Fiehn O. Denoising Search doubles the number of metabolite and exposome annotations in human plasma using an Orbitrap Astral mass spectrometer. Nat Methods 2025; 22:1008-1016. [PMID: 40155721 DOI: 10.1038/s41592-025-02646-x] [Received: 07/17/2024] [Accepted: 02/24/2025] [Indexed: 04/01/2025]
Abstract
Chemical exposures may affect human metabolism and contribute to the etiology of neurodegenerative disorders such as Alzheimer's disease. Identifying these small metabolites involves matching experimental spectra to reference spectra in databases. However, environmental chemicals and physiologically active metabolites are usually present at low concentrations in human specimens. Noise ions can substantially degrade spectral quality, leading to false negatives and reduced identification rates. To address this challenge, the Spectral Denoising algorithm removes both chemical and electronic noise. Spectral Denoising outperformed alternative methods in benchmarking studies on 240 tested metabolites. It enabled high-confidence compound identifications at concentrations that were, on average, 35-fold lower than previously achievable. Spectral Denoising proved highly robust against varying levels of both chemical and electronic noise, even when noise ions were more than 150-fold more intense than true fragment ions. For human plasma samples from patients with Alzheimer's disease analyzed on the Orbitrap Astral mass spectrometer, Denoising Search detected 2.5-fold more annotated compounds than analysis on the Exploris 240 Orbitrap instrument, including drug metabolites, household and industrial chemicals, and pesticides.
Affiliation(s)
- Fanzhou Kong
- Chemistry Department, University of California Davis, Davis, CA, USA
- West Coast Metabolomics Center, University of California Davis, Davis, CA, USA
- Tong Shen
- West Coast Metabolomics Center, University of California Davis, Davis, CA, USA
- Yuanyue Li
- West Coast Metabolomics Center, University of California Davis, Davis, CA, USA
- Oliver Fiehn
- West Coast Metabolomics Center, University of California Davis, Davis, CA, USA
3. Qiang H, Wang F, Lu W, Xing X, Kim H, Merette SAM, Ayres LB, Oler E, AbuSalim JE, Roichman A, Neinast M, Cordova RA, Lee WD, Herbst E, Gupta V, Neff S, Hiebert-Giesbrecht M, Young A, Gautam V, Tian S, Wang B, Röst H, Greiner R, Chen L, Johnston CW, Foster LJ, Shapiro AM, Wishart DS, Rabinowitz JD, Skinnider MA. Language model-guided anticipation and discovery of unknown metabolites. bioRxiv 2024:2024.11.13.623458. [PMID: 39605668 PMCID: PMC11601323 DOI: 10.1101/2024.11.13.623458] [Indexed: 11/29/2024]
Abstract
Despite decades of study, large parts of the mammalian metabolome remain unexplored. Mass spectrometry-based metabolomics routinely detects thousands of small molecule-associated peaks within human tissues and biofluids, but typically only a small fraction of these can be identified, and structure elucidation of novel metabolites remains a low-throughput endeavor. Biochemical large language models have transformed the interpretation of DNA, RNA, and protein sequences, but have not yet had a comparable impact on understanding small molecule metabolism. Here, we present an approach that leverages chemical language models to discover previously uncharacterized metabolites. We introduce DeepMet, a chemical language model that learns the latent biosynthetic logic embedded within the structures of known metabolites and exploits this understanding to anticipate the existence of as yet undiscovered metabolites. Prospective chemical synthesis of metabolites predicted to exist by DeepMet directs their targeted discovery. Integrating DeepMet with tandem mass spectrometry (MS/MS) data enables automated metabolite discovery within complex tissues. We harness DeepMet to discover several dozen structurally diverse mammalian metabolites. Our work demonstrates the potential for language models to accelerate the mapping of the metabolome.
4. Pang Z, Lu Y, Zhou G, Hui F, Xu L, Viau C, Spigelman A, MacDonald P, Wishart D, Li S, Xia J. MetaboAnalyst 6.0: towards a unified platform for metabolomics data processing, analysis and interpretation. Nucleic Acids Res 2024; 52:W398-W406. [PMID: 38587201 PMCID: PMC11223798 DOI: 10.1093/nar/gkae253] [Received: 01/31/2024] [Revised: 03/14/2024] [Accepted: 03/26/2024] [Indexed: 04/09/2024]
Abstract
We introduce MetaboAnalyst version 6.0 as a unified platform for processing, analyzing, and interpreting data from targeted as well as untargeted metabolomics studies using liquid chromatography-mass spectrometry (LC-MS). The two main objectives in developing version 6.0 are to support tandem MS (MS2) data processing and annotation, and to support the analysis of data from exposomics studies and related experiments. Key features of MetaboAnalyst 6.0 include: (i) a significantly enhanced Spectra Processing module with support for MS2 data and the asari algorithm; (ii) an MS2 Peak Annotation module based on comprehensive MS2 reference databases with fragment-level annotation; (iii) a new Statistical Analysis module dedicated to handling complex study designs with multiple factors or phenotypic descriptors; (iv) a Causal Analysis module for estimating metabolite-phenotype causal relations based on two-sample Mendelian randomization; and (v) a Dose-Response Analysis module for benchmark dose calculations. In addition, we have improved MetaboAnalyst's visualization functions, updated its compound database and metabolite sets, and significantly expanded its pathway analysis support to around 130 species. MetaboAnalyst 6.0 is freely available at https://www.metaboanalyst.ca.
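The Causal Analysis module is described only as being based on two-sample Mendelian randomization. One standard estimator in that setting is the inverse-variance weighted (IVW) estimate, computed from per-variant summary statistics. The sketch below uses the textbook fixed-effect form. It is not necessarily what MetaboAnalyst computes, and the function name and example numbers are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) two-sample MR estimate.

    Per variant j, the Wald ratio beta_outcome[j] / beta_exposure[j] is pooled
    with first-order weights beta_exposure[j]**2 / se_outcome[j]**2.
    Returns (causal estimate, standard error).
    """
    num = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical summary statistics for three genetic variants (instruments).
bx = [0.10, 0.20, 0.30]  # variant -> metabolite effects
by = [0.05, 0.10, 0.15]  # variant -> phenotype effects
se = [0.01, 0.02, 0.01]  # standard errors of the variant -> phenotype effects
estimate, std_err = ivw_estimate(bx, by, se)
print(f"IVW estimate = {estimate:.3f} (SE {std_err:.4f})")
```

In this toy data, every variant's phenotype effect is exactly half its metabolite effect. The pooled estimate is therefore 0.5. Real analyses would add heterogeneity checks and robust estimators (for example, MR-Egger or weighted median) alongside IVW.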
Affiliation(s)
- Zhiqiang Pang
- Institute of Parasitology, McGill University, Sainte-Anne-de-Bellevue, Quebec, Canada
- Yao Lu
- Department of Microbiology and Immunology, McGill University, Montreal, Quebec, Canada
- Guangyan Zhou
- Institute of Parasitology, McGill University, Sainte-Anne-de-Bellevue, Quebec, Canada
- Fiona Hui
- Institute of Parasitology, McGill University, Sainte-Anne-de-Bellevue, Quebec, Canada
- Lei Xu
- Institute of Parasitology, McGill University, Sainte-Anne-de-Bellevue, Quebec, Canada
- Charles Viau
- Institute of Parasitology, McGill University, Sainte-Anne-de-Bellevue, Quebec, Canada
- Aliya F Spigelman
- Department of Pharmacology and Alberta Diabetes Institute, University of Alberta, Edmonton, Alberta, Canada
- Patrick E MacDonald
- Department of Pharmacology and Alberta Diabetes Institute, University of Alberta, Edmonton, Alberta, Canada
- David S Wishart
- Departments of Biological Sciences and Computing Science, University of Alberta, Edmonton, Alberta, Canada
- Shuzhao Li
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- University of Connecticut School of Medicine, Farmington, CT, USA
- Jianguo Xia
- Institute of Parasitology, McGill University, Sainte-Anne-de-Bellevue, Quebec, Canada
- Department of Microbiology and Immunology, McGill University, Montreal, Quebec, Canada
5. Pang Z, Xu L, Viau C, Lu Y, Salavati R, Basu N, Xia J. MetaboAnalystR 4.0: a unified LC-MS workflow for global metabolomics. Nat Commun 2024; 15:3675. [PMID: 38693118 PMCID: PMC11063062 DOI: 10.1038/s41467-024-48009-6] [Received: 09/15/2023] [Accepted: 04/18/2024] [Indexed: 05/03/2024]
Abstract
The wide application of liquid chromatography-mass spectrometry (LC-MS) in untargeted metabolomics demands an easy-to-use, comprehensive computational workflow for efficient and reproducible data analysis. However, current tools were primarily developed to perform specific tasks in LC-MS-based metabolomics data analysis. Here we introduce MetaboAnalystR 4.0 as a streamlined pipeline covering raw spectra processing, compound identification, statistical analysis, and functional interpretation. The key features of MetaboAnalystR 4.0 include an auto-optimized feature detection and quantification algorithm for LC-MS1 spectra processing, efficient MS2 spectra deconvolution and compound identification for data-dependent or data-independent acquisition, and more accurate functional interpretation through integrated spectral annotation. Comprehensive validation studies using LC-MS1 and MS2 spectra obtained from standard mixtures, dilution series, and clinical metabolomics samples have shown excellent performance with good computing efficiency across a wide range of common tasks, such as peak picking, spectral deconvolution, and compound identification. Together with its existing statistical analysis utilities, MetaboAnalystR 4.0 represents a significant step toward a unified, end-to-end workflow for LC-MS-based global metabolomics in the open-source R environment.
Affiliation(s)
- Zhiqiang Pang
- Faculty of Agricultural and Environmental Sciences, McGill University, Ste-Anne-de-Bellevue, QC, Canada
- Lei Xu
- Faculty of Agricultural and Environmental Sciences, McGill University, Ste-Anne-de-Bellevue, QC, Canada
- Charles Viau
- Faculty of Agricultural and Environmental Sciences, McGill University, Ste-Anne-de-Bellevue, QC, Canada
- Yao Lu
- Department of Microbiology and Immunology, McGill University, Montreal, QC, Canada
- Reza Salavati
- Faculty of Agricultural and Environmental Sciences, McGill University, Ste-Anne-de-Bellevue, QC, Canada
- Niladri Basu
- Faculty of Agricultural and Environmental Sciences, McGill University, Ste-Anne-de-Bellevue, QC, Canada
- Jianguo Xia
- Faculty of Agricultural and Environmental Sciences, McGill University, Ste-Anne-de-Bellevue, QC, Canada
- Department of Microbiology and Immunology, McGill University, Montreal, QC, Canada
6. An L, Chen B, Zhang Y, Li H, Huang R, Li F, Tang Y. Compound Similarity Network as a Novel Data Mining Strategy for High-Throughput Investigation of Degradation Pathways of Organic Pollutants in Industrial Wastewater Treatment. Anal Chem 2024; 96:3951-3959. [PMID: 38377587 DOI: 10.1021/acs.analchem.3c05983] [Indexed: 02/22/2024]
Abstract
Identification of degradation products and pathways is crucial for investigating emerging pollutants and evaluating wastewater treatment methods. Nontargeted analysis is a powerful tool for comprehensively investigating the degradation pathways of organic pollutants in real-world wastewater samples. However, it often generates large data sets, making it difficult to locate the information of interest. Herein, to efficiently establish linkages among compounds in the same degradation pathways, we introduce a compound similarity network (CSN) as a novel data mining strategy for LC-MS-based nontargeted analysis of complex wastewater samples. Unlike molecular networks, which cluster compounds based on MS/MS spectral similarity, our CSN strategy uses molecular fingerprints to establish linkages among compounds and is thus spectra-independent. The effectiveness of CSN was demonstrated by nontargeted identification of degradation pathways and products of organic pollutants in leather industrial wastewater that underwent laboratory-scale activated carbon adsorption (ACD) and ozonation treatments. Using CSN to interpret nontargeted data, we tentatively annotated 4324 compounds in the untreated leather industrial wastewater, 3246 after ACD, and 3777 after ACD/ozonation. We located 145 potential degradation pathways of organic pollutants in the ACD/ozonation process using CSN and validated 7 pathways with 15 chemical standards. CSN also revealed 5 clusters of emerging pollutants, from which 3 compounds were selected for in vitro cytotoxicity testing to evaluate their potential biohazards as new pollutants. As CSN offers an efficient way to connect large numbers of compounds and to find multiple degradation pathways in a high-throughput manner, we anticipate that it will find wide application in nontargeted analysis of diverse environmental samples.
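Tanimoto similarity on fingerprint bits illustrates the CSN idea of linking compounds by structural rather than spectral similarity. This is a minimal sketch under several assumptions:

- Fingerprints are toy sets of "on" bit indices; in practice they would come from a cheminformatics toolkit.
- The 0.7 threshold is arbitrary, not the paper's setting.
- Clusters are simply the connected components of the thresholded graph.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_network(fingerprints, threshold=0.7):
    """Link compounds whose similarity meets the threshold; return (adjacency, clusters)."""
    names = list(fingerprints)
    adjacency = {n: set() for n in names}
    for x, y in combinations(names, 2):
        if tanimoto(fingerprints[x], fingerprints[y]) >= threshold:
            adjacency[x].add(y)
            adjacency[y].add(x)
    # Connected components via iterative depth-first search.
    seen, clusters = set(), []
    for start in names:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return adjacency, clusters


# Hypothetical fingerprints: C bridges A and B, which are not directly linked.
fps = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {1, 2, 3, 4, 5}, "D": {10, 11}}
adjacency, clusters = similarity_network(fps, threshold=0.7)
print(clusters)
```

A and B (similarity 0.6) fall below the threshold. Both are linked to C, however, so all three land in one cluster. This is the kind of chaining that lets a network connect a parent compound to successive degradation products.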
Affiliation(s)
- Lirong An
- Analytical & Testing Center, Key Laboratory of Green Chemistry & Technology of Ministry of Education, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, China
- Bin Chen
- Analytical & Testing Center, Key Laboratory of Green Chemistry & Technology of Ministry of Education, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, China
- Yuchen Zhang
- Sichuan Provincial Key Laboratory of Universities on Environmental Science and Engineering, MOE Key Laboratory of Deep Earth Science and Engineering, College of Architecture and Environment, Sichuan University, Chengdu, Sichuan 610065, China
- Hailiang Li
- Analytical & Testing Center, Key Laboratory of Green Chemistry & Technology of Ministry of Education, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, China
- Rongfu Huang
- Sichuan Provincial Key Laboratory of Universities on Environmental Science and Engineering, MOE Key Laboratory of Deep Earth Science and Engineering, College of Architecture and Environment, Sichuan University, Chengdu, Sichuan 610065, China
- Feng Li
- Analytical & Testing Center, Key Laboratory of Green Chemistry & Technology of Ministry of Education, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, China
- Yanan Tang
- Analytical & Testing Center, Key Laboratory of Green Chemistry & Technology of Ministry of Education, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, China
7. Zhao T, Xing S, Yu H, Huan T. De Novo Cleaning of Chimeric MS/MS Spectra for LC-MS/MS-Based Metabolomics. Anal Chem 2023; 95:13018-13028. [PMID: 37603462 DOI: 10.1021/acs.analchem.3c00736] [Indexed: 08/23/2023]
Abstract
The purity of tandem mass spectrometry (MS/MS) spectra is essential to MS/MS-based metabolite annotation and unknown exploration. This work presents a de novo approach to cleaning chimeric MS/MS spectra generated in liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based metabolomics. The assumption is that true fragments and their precursors are well correlated across the samples in a study, while false or contaminant fragments are largely independent. Using data simulation, this work first investigates the negative effects of chimeric MS/MS spectra on spectral similarity analysis and molecular networking. Next, the characteristics of true and false fragments in chimeric MS/MS spectra were investigated using MS/MS spectra of chemical standards. We identified three fragment peak attributes that indicate whether a peak is a false fragment: (1) intensity ratio fluctuation, (2) appearance rate, and (3) relative intensity. Using these attributes, we tested three machine learning models and identified XGBoost as the best, achieving an area under the precision-recall curve of 0.98 and a clear separation between true and false fragments. Based on the trained model, we constructed an automated bioinformatic platform, DNMS2Purifier (short for de novo MS2Purifier), for metabolic features from metabolomics studies. DNMS2Purifier recognizes and processes chimeric MS/MS spectra without additional sample analysis or library confirmation. DNMS2Purifier was evaluated on a metabolomics data set generated with different MS/MS precursor isolation windows. It successfully captured the increase in the number of false fragments as the isolation window widened. DNMS2Purifier was also compared to MS2Purifier, an existing MS/MS spectral cleaning tool that relies on additional data-independent acquisition (DIA) analysis. Results indicated that DNMS2Purifier uniquely recognizes false fragments, complementing the earlier DIA-based approach.
Finally, DNMS2Purifier was demonstrated in a real experimental metabolomics study, where it improved MS/MS spectral quality and led to a higher spectral match ratio and better molecular networking outcomes.
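The three attributes named above can be computed from a fragment's behavior across samples. The definitions below are hypothetical readings of the three names:

- "Intensity ratio fluctuation": the coefficient of variation of the fragment/precursor intensity ratio.
- "Appearance rate": the fraction of samples in which the fragment is detected.
- "Relative intensity": the fragment's mean intensity relative to the base peak.

DNMS2Purifier's exact formulas may differ, and the XGBoost classifier trained on such features is not shown.

```python
from statistics import mean, pstdev


def fragment_attributes(precursor, fragment, base_peak):
    """Hypothetical versions of the three fragment attributes, across n samples.

    precursor[i]: MS1 intensity of the precursor in sample i (assumed > 0)
    fragment[i]:  intensity of this fragment in the MS/MS spectrum of sample i (0 if absent)
    base_peak[i]: base-peak intensity of that MS/MS spectrum
    """
    present = [i for i, f in enumerate(fragment) if f > 0]
    appearance_rate = len(present) / len(fragment)
    ratios = [fragment[i] / precursor[i] for i in present]
    # A true fragment should scale with its precursor, so its fragment/precursor
    # ratio should fluctuate little (low coefficient of variation).
    if len(ratios) > 1:
        ratio_cv = pstdev(ratios) / mean(ratios)
    else:
        ratio_cv = float("inf")  # fluctuation cannot be estimated from < 2 samples
    relative_intensity = mean(fragment[i] / base_peak[i] for i in present) if present else 0.0
    return {"ratio_cv": ratio_cv, "appearance_rate": appearance_rate,
            "relative_intensity": relative_intensity}


precursor = [100.0, 200.0, 400.0]
base_peak = [50.0, 100.0, 200.0]
true_frag = fragment_attributes(precursor, [10.0, 20.0, 40.0], base_peak)
false_frag = fragment_attributes(precursor, [30.0, 0.0, 5.0], base_peak)
print(true_frag)
print(false_frag)
```

The made-up "true" fragment tracks its precursor exactly and appears in every sample. The "false" one fluctuates strongly and is missing from one sample. A trained classifier would learn to separate these two kinds of feature vectors.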
Affiliation(s)
- Tingting Zhao
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
- Shipei Xing
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
- Huaxu Yu
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
- Tao Huan
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
8. Jia Z, Qiu Q, He R, Zhou T, Chen L. Identification of Metabolite Interference Is Necessary for Accurate LC-MS Targeted Metabolomics Analysis. Anal Chem 2023; 95:7985-7992. [PMID: 37155916 DOI: 10.1021/acs.analchem.3c00804] [Indexed: 05/10/2023]
Abstract
Targeted metabolomics has been broadly used for metabolite measurement due to its good quantitative linearity and simple metabolite annotation workflow. However, metabolite interference, the phenomenon in which one metabolite generates a peak in another metabolite's MRM setting (Q1/Q3) at a close retention time (RT), may lead to inaccurate metabolite annotation and quantification. Isomeric metabolites that share the same precursor and product ions may interfere with each other. Beyond these, we found other metabolite interferences resulting from the inadequate mass resolution of triple-quadrupole mass spectrometry and from in-source fragmentation of metabolite ions. Characterizing targeted metabolomics data from 334 metabolite standards revealed that about 75% of the metabolites generated measurable signals in at least one other metabolite's MRM setting. Different chromatography techniques can resolve 65-85% of these interfering signals among standards. Metabolite interference analysis combined with manual inspection of cell lysate and serum data suggested that about 10% of ∼180 annotated metabolites were mis-annotated or mis-quantified. These results highlight that a thorough investigation of metabolite interference is necessary for accurate metabolite measurement in targeted metabolomics.
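A first-pass screen for the isomer case is to compare MRM settings directly. This sketch flags transition pairs whose Q1, Q3 and retention time all fall within tolerances. The ±0.5 m/z window (roughly unit resolution) and the ±0.3 min RT window are assumptions. The screen cannot catch interferences from in-source fragmentation; the study characterized those by analyzing authentic standards.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    name: str
    q1: float  # precursor m/z selected in Q1
    q3: float  # product m/z selected in Q3
    rt: float  # expected retention time, min


def candidate_interferences(transitions, mz_tol=0.5, rt_tol=0.3):
    """Return pairs whose Q1, Q3 and RT all fall within the given tolerances."""
    pairs = []
    for i, a in enumerate(transitions):
        for b in transitions[i + 1:]:
            if (abs(a.q1 - b.q1) <= mz_tol and abs(a.q3 - b.q3) <= mz_tol
                    and abs(a.rt - b.rt) <= rt_tol):
                pairs.append((a.name, b.name))
    return pairs


# Leucine and isoleucine share the 132.1 -> 86.1 transition; RT values are made up.
panel = [
    Transition("leucine", 132.1, 86.1, 5.2),
    Transition("isoleucine", 132.1, 86.1, 5.0),
    Transition("glutamate", 148.1, 84.1, 2.1),
]
print(candidate_interferences(panel))
```

Tightening `rt_tol` below the RT gap removes the flag. This mirrors the abstract's point that chromatographic separation resolves many, but not all, interfering signals.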
Affiliation(s)
- Zhikun Jia
- Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism & Integrative Biology, Fudan University, Shanghai 200433, China
- Qiongju Qiu
- Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism & Integrative Biology, Fudan University, Shanghai 200433, China
- Ruiping He
- Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism & Integrative Biology, Fudan University, Shanghai 200433, China
- Tianyu Zhou
- Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism & Integrative Biology, Fudan University, Shanghai 200433, China
- Shanghai Qi Zhi Institute, Shanghai 200030, China
- Li Chen
- Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism & Integrative Biology, Fudan University, Shanghai 200433, China
- Shanghai Qi Zhi Institute, Shanghai 200030, China
9. Xing S, Shen S, Xu B, Li X, Huan T. BUDDY: molecular formula discovery via bottom-up MS/MS interrogation. Nat Methods 2023:10.1038/s41592-023-01850-x. [PMID: 37055660 DOI: 10.1038/s41592-023-01850-x] [Received: 08/03/2022] [Accepted: 03/15/2023] [Indexed: 04/15/2023]
Abstract
A substantial fraction of metabolic features remains undetermined in mass spectrometry (MS)-based metabolomics, and molecular formula annotation is the starting point for unraveling their chemical identities. Here we present bottom-up tandem MS (MS/MS) interrogation, a method for de novo formula annotation. Our approach prioritizes MS/MS-explainable formula candidates, implements machine-learned ranking and offers false discovery rate estimation. Compared with mathematically exhaustive formula enumeration, our approach shrinks the formula candidate space by 42.8% on average. Annotation accuracy was systematically benchmarked on reference MS/MS libraries and real metabolomics datasets. Applied to 155,321 recurrent unidentified spectra, our approach confidently annotated >5,000 novel molecular formulae absent from chemical databases. Beyond the level of individual metabolic features, we combined bottom-up MS/MS interrogation with global optimization to refine formula annotations while revealing peak interrelationships. This approach allowed the systematic annotation of 37 fatty acid amide molecules in human fecal data. All bioinformatics pipelines are available in a standalone software, BUDDY ( https://github.com/HuanLab/BUDDY ).
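BUDDY is benchmarked against "mathematically exhaustive formula enumeration". For CHNO-only formulas, that baseline can be sketched as below. The element limits and the 5 ppm tolerance are arbitrary assumptions. The sketch applies no chemical rules and no MS/MS-based pruning, so it only produces the kind of baseline candidate space that bottom-up MS/MS interrogation is reported to shrink.

```python
from itertools import product

# Monoisotopic masses (Da) of 12C, 1H, 14N and 16O.
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def formula_string(c, h, n, o):
    """Hill-style formula string, omitting zero counts and counts of 1."""
    parts = []
    for element, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count:
            parts.append(element + (str(count) if count > 1 else ""))
    return "".join(parts)


def enumerate_formulas(neutral_mass, ppm=5.0, max_c=30, max_h=60, max_n=5, max_o=15):
    """List every CHNO formula within `ppm` of `neutral_mass`.

    For each (C, N, O) combination only the nearest hydrogen count can fit,
    because 1 Da is far larger than a few-ppm tolerance at these masses.
    """
    tol = neutral_mass * ppm * 1e-6
    hits = []
    for c, n, o in product(range(1, max_c + 1), range(max_n + 1), range(max_o + 1)):
        residual = neutral_mass - (c * MASS["C"] + n * MASS["N"] + o * MASS["O"])
        h = round(residual / MASS["H"])
        if 0 <= h <= max_h and abs(residual - h * MASS["H"]) <= tol:
            hits.append(formula_string(c, h, n, o))
    return hits


# Neutral monoisotopic mass of glucose, C6H12O6.
hits = enumerate_formulas(180.0633881)
print(hits)
```

Adding elements such as P, S or halogens, or raising the mass, makes this candidate list grow quickly. That growth is why pruning with MS/MS evidence matters.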
Affiliation(s)
- Shipei Xing
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver, British Columbia, Canada
- Sam Shen
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver, British Columbia, Canada
- Banghua Xu
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver, British Columbia, Canada
- Xiaoxiao Li
- Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Tao Huan
- Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver, British Columbia, Canada
10. Guo J, Yu H, Xing S, Huan T. Addressing big data challenges in mass spectrometry-based metabolomics. Chem Commun (Camb) 2022; 58:9979-9990. [PMID: 35997016 DOI: 10.1039/d2cc03598g] [Indexed: 11/21/2022]
Abstract
Advancements in computer science and software engineering have greatly facilitated mass spectrometry (MS)-based untargeted metabolomics. Nowadays, gigabytes of metabolomics data are routinely generated from MS platforms, containing condensed structural and quantitative information from thousands of metabolites. Manual data processing is almost impossible due to the large data size. Therefore, in the "omics" era, we are faced with new challenges, the big data challenges of how to accurately and efficiently process the raw data, extract the biological information, and visualize the results from the gigantic amount of collected data. Although important, proposing solutions to address these big data challenges requires broad interdisciplinary knowledge, which can be challenging for many metabolomics practitioners. Our laboratory in the Department of Chemistry at the University of British Columbia is committed to combining analytical chemistry, computer science, and statistics to develop bioinformatics tools that address these big data challenges. In this Feature Article, we elaborate on the major big data challenges in metabolomics, including data acquisition, feature extraction, quantitative measurements, statistical analysis, and metabolite annotation. We also introduce our recently developed bioinformatics solutions for these challenges. Notably, all of the bioinformatics tools and source codes are freely available on GitHub (https://www.github.com/HuanLab), along with revised and regularly updated content.
Affiliation(s)
- Jian Guo
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, BC V6T 1Z1, Canada
- Huaxu Yu
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, BC V6T 1Z1, Canada
- Shipei Xing
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, BC V6T 1Z1, Canada
- Tao Huan
- Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, BC V6T 1Z1, Canada
11. Chen L, Lu W, Wang L, Xing X, Chen Z, Teng X, Zeng X, Muscarella AD, Shen Y, Cowan A, McReynolds MR, Kennedy BJ, Lato AM, Campagna SR, Singh M, Rabinowitz JD. Metabolite discovery through global annotation of untargeted metabolomics data. Nat Methods 2021; 18:1377-1385. [PMID: 34711973 PMCID: PMC8733904 DOI: 10.1038/s41592-021-01303-3] [Received: 01/07/2021] [Accepted: 09/16/2021] [Indexed: 11/08/2022]
Abstract
Liquid chromatography-high-resolution mass spectrometry (LC-MS)-based metabolomics aims to identify and quantify all metabolites, but most LC-MS peaks remain unidentified. Here we present a global network optimization approach, NetID, to annotate untargeted LC-MS metabolomics data. The approach aims to generate, for all experimentally observed ion peaks, annotations that match the measured masses, retention times and (when available) tandem mass spectrometry fragmentation patterns. Peaks are connected based on mass differences reflecting adduction, fragmentation, isotopes, or feasible biochemical transformations. Global optimization generates a single network linking most observed ion peaks, enhances peak assignment accuracy, and produces chemically informative peak-peak relationships, including for peaks lacking tandem mass spectrometry spectra. Applying this approach to yeast and mouse data, we identified five previously unrecognized metabolites (thiamine derivatives and N-glucosyl-taurine). Isotope tracer studies indicate active flux through these metabolites. Thus, NetID applies existing metabolomic knowledge and global optimization to substantially improve annotation coverage and accuracy in untargeted metabolomics datasets, facilitating metabolite discovery.
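NetID connects peaks whose m/z differences match known adduct, isotope or biochemical mass differences. That edge-building step can be sketched as below. The four mass differences and the 0.002 Da tolerance are illustrative assumptions. NetID uses a much larger set of differences and then applies global optimization to select a consistent set of annotations; this sketch omits that step.

```python
from itertools import combinations

# A few mass differences (Da) of the kinds used to link peaks:
# an isotope, two biochemical transformations, and an adduct pair.
MASS_DIFFERENCES = {
    "13C isotope": 1.003355,
    "+CH2 (methylation)": 14.015650,
    "+O (oxidation)": 15.994915,
    "[M+Na]+ vs [M+H]+": 21.981944,
}


def link_peaks(mzs, tol=0.002):
    """Return (lighter index, heavier index, label) for every pair whose m/z
    difference matches a listed mass difference within `tol` Da."""
    edges = []
    for i, j in combinations(range(len(mzs)), 2):
        lo, hi = (i, j) if mzs[i] <= mzs[j] else (j, i)
        diff = mzs[hi] - mzs[lo]
        for label, delta in MASS_DIFFERENCES.items():
            if abs(diff - delta) <= tol:
                edges.append((lo, hi, label))
    return edges


# Hypothetical peaks: glucose [M+H]+, its 13C isotope peak, and [M+Na]+.
peaks = [181.070664, 182.074019, 203.052609]
edges = link_peaks(peaks)
for edge in edges:
    print(edge)
```

In real data, one m/z difference often matches several explanations, and the per-pair matches can contradict each other. Resolving those conflicts network-wide is the role of NetID's global optimization.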
Affiliation(s)
- Li Chen
- Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, China
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Wenyun Lu
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Department of Chemistry, Princeton University, Princeton, NJ, USA
- Lin Wang
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Department of Chemistry, Princeton University, Princeton, NJ, USA
- Xi Xing
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Department of Chemistry, Princeton University, Princeton, NJ, USA
- Ziyang Chen
- Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, China
- Department of Molecular Biology, Princeton University, Princeton, NJ, USA
- Xin Teng
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Xianfeng Zeng
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Department of Chemistry, Princeton University, Princeton, NJ, USA
- Antonio D Muscarella
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Yihui Shen
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Alexis Cowan
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Department of Molecular Biology, Princeton University, Princeton, NJ, USA
- Melanie R McReynolds
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Department of Chemistry, Princeton University, Princeton, NJ, USA
- Brandon J Kennedy
- Lotus Separations, LLC, Department of Chemistry, Princeton University, Princeton, NJ, USA
- Ashley M Lato
- Department of Chemistry, The University of Tennessee at Knoxville, Knoxville, TN, USA
- Shawn R Campagna
- Department of Chemistry, The University of Tennessee at Knoxville, Knoxville, TN, USA
- Mona Singh
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Department of Computer Science, Princeton University, Princeton, NJ, USA
- Joshua D Rabinowitz
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Department of Chemistry, Princeton University, Princeton, NJ, USA
- Department of Molecular Biology, Princeton University, Princeton, NJ, USA
- Ludwig Institute for Cancer Research, Princeton Branch, Princeton, NJ, USA
12
Guo J, Shen S, Xing S, Yu H, Huan T. ISFrag: De Novo Recognition of In-Source Fragments for Liquid Chromatography-Mass Spectrometry Data. Anal Chem 2021; 93:10243-10250. [PMID: 34270210 DOI: 10.1021/acs.analchem.1c01644] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 12/26/2022]
Abstract
In-source fragmentation (ISF) is a naturally occurring phenomenon during electrospray ionization (ESI) in liquid chromatography-mass spectrometry (LC-MS) analysis. ISF leads to false metabolite annotation in untargeted metabolomics, prompting misinterpretation of the underlying biological mechanisms. Conventional metabolomic data cleaning mainly focuses on the annotation of adducts and isotopes, and the recognition of ISF features is mainly based on common neutral losses and the LC coelution pattern. In this work, we recognized three increasingly important patterns of ISF features: (1) coelution with their precursor ions, (2) presence in the tandem MS (MS2) spectra of their precursor ions, and (3) MS2 fragmentation patterns similar to those of their precursor ions. Based on these patterns, we developed an R package, ISFrag, to comprehensively recognize all possible ISF features from LC-MS data generated in full-scan, data-dependent acquisition, and data-independent acquisition modes without the assistance of common neutral loss information or an MS2 spectral library. In tests with metabolite standards, ISFrag achieved 100% correct recognition of level 1 ISF features and over 80% correct recognition of level 2 ISF features. Further application of ISFrag to untargeted metabolomics data allowed us to identify, at the omics scale, ISF features that can potentially cause false metabolite annotation. With the help of ISFrag, we performed a systematic investigation of how ISF features are influenced by different MS parameters, including capillary voltage, end plate offset, ion energy, and "collision energy". Our results show that while increasing energies can increase the number of both real metabolic features and ISF features, the percentage of ISF features might not necessarily increase. Finally, using ISFrag, we created an ISF pathway to visualize the relationships between multiple ISF features that belong to the same precursor ion.
ISFrag is freely available on GitHub (https://github.com/HuanLab/ISFrag).
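The three ISF patterns described in this abstract can be checked in sequence, each one stricter than the last. ISFrag itself is an R package and its actual scoring is not reproduced here; the Python sketch below only illustrates the idea. The dictionary fields (`mz`, `rt`, `eic`, `ms2`), every threshold, the greedy cosine matching, and the mapping of patterns to levels (level 3 = coelution only, level 2 = patterns 1 and 2, level 1 = all three) are assumptions for illustration, not ISFrag's API or defaults.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two EIC intensity traces sampled at the same scans (assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cosine_similarity(query: Spectrum, ref: Spectrum, tol: float = 0.01) -> float:
    """Cosine score; each query peak is matched to the closest unused ref peak within tol (Da)."""
    used, dot = set(), 0.0
    for mz_q, int_q in query:
        best = None
        for j, (mz_r, _) in enumerate(ref):
            if j in used or abs(mz_q - mz_r) > tol:
                continue
            if best is None or abs(mz_q - mz_r) < abs(mz_q - ref[best][0]):
                best = j
        if best is not None:
            used.add(best)
            dot += int_q * ref[best][1]
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_r = math.sqrt(sum(i * i for _, i in ref))
    return dot / (norm_q * norm_r) if norm_q and norm_r else 0.0


def isf_level(cand: Dict, prec: Dict, rt_tol: float = 0.02, corr_min: float = 0.9,
              mz_tol: float = 0.01, cos_min: float = 0.7) -> Optional[int]:
    """Hypothetical level assignment: 3 = pattern 1 only, 2 = patterns 1-2, 1 = all three."""
    if cand["mz"] >= prec["mz"]:
        return None  # an in-source fragment must be lighter than its precursor
    # Pattern 1: coelution (close apex RT and similar EIC shape).
    if abs(cand["rt"] - prec["rt"]) > rt_tol or pearson(cand["eic"], prec["eic"]) < corr_min:
        return None
    # Pattern 2: the candidate m/z appears in the precursor's MS2 spectrum.
    if not any(abs(mz - cand["mz"]) <= mz_tol for mz, _ in prec["ms2"]):
        return 3
    # Pattern 3: the candidate's own MS2 resembles the precursor's MS2.
    if cand["ms2"] and cosine_similarity(cand["ms2"], prec["ms2"], mz_tol) >= cos_min:
        return 1
    return 2


precursor = {"mz": 300.10, "rt": 5.00, "eic": [1, 5, 20, 5, 1],
             "ms2": [(300.10, 10.0), (182.08, 100.0), (136.06, 40.0), (91.05, 20.0)]}
candidate = {"mz": 182.08, "rt": 5.01, "eic": [2, 10, 40, 10, 2],
             "ms2": [(182.08, 100.0), (136.06, 40.0), (91.05, 20.0)]}
print(isf_level(candidate, precursor))  # 1
```

Ordering the checks from coelution to MS2 similarity means a feature lacking an MS2 spectrum of its own can still be flagged at a lower level, which is consistent with the abstract's point that ISF recognition should not depend on a spectral library.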
Affiliation(s)
- Jian Guo
- Department of Chemistry, Faculty of Science, University of British Columbia, 2036 Main Mall, Vancouver, V6T 1Z1 British Columbia Canada
- Sam Shen
- Department of Chemistry, Faculty of Science, University of British Columbia, 2036 Main Mall, Vancouver, V6T 1Z1 British Columbia Canada
- Shipei Xing
- Department of Chemistry, Faculty of Science, University of British Columbia, 2036 Main Mall, Vancouver, V6T 1Z1 British Columbia Canada
- Huaxu Yu
- Department of Chemistry, Faculty of Science, University of British Columbia, 2036 Main Mall, Vancouver, V6T 1Z1 British Columbia Canada
- Tao Huan
- Department of Chemistry, Faculty of Science, University of British Columbia, 2036 Main Mall, Vancouver, V6T 1Z1 British Columbia Canada