1
Wang X, Abiead YE, Acharya DD, Brown CJ, Clevenger K, Hu J, Kretsch A, Menegatti C, Xiong Q, Bittremieux W, Wang M. MS-RT: A Method for Evaluating MS/MS Clustering Performance for Metabolomics Data. J Proteome Res 2025; 24:1778-1790. [PMID: 40042915 DOI: 10.1021/acs.jproteome.4c00881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2025]
Abstract
The clustering of tandem mass spectra (MS/MS) is a crucial computational step to deduplicate repeated acquisitions in data-dependent experiments. This technique is essential in untargeted metabolomics, particularly with high-throughput mass spectrometers capable of generating hundreds of MS/MS spectra per second. Despite advancements in MS/MS clustering algorithms in proteomics, their performance in metabolomics has not been extensively evaluated due to the lack of database search tools with false discovery rate control for molecule identification. To bridge this gap, this study introduces the MS1-retention time (MS-RT) method to assess MS/MS clustering performance in metabolomics data sets. Here, we validate MS-RT by comparing MS-RT to established proteomics clustering evaluation approaches that utilize database search identifications. Additionally, we evaluate the performance of several MS/MS clustering tools on metabolomics data sets, highlighting their advantages and drawbacks. This MS-RT method and the MS/MS clustering tool benchmarking will provide valuable real world practical recommendations for tools and set the stage for future advancements in metabolomics MS/MS clustering.
Affiliation(s)
- Xianghu Wang
  - Department of Computer Science and Engineering, University of California Riverside, 900 University Avenue, Riverside, California 92521, United States
- Yasin El Abiead
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9255 Pharmacy Lane, San Diego, California 92093, United States
- Deepa D Acharya
  - Integrated Discovery and Bioprocess, Crop Health R&D, Corteva Agriscience, 9330 Zionsville Road, Indianapolis, Indiana 46268, United States
- Christopher J Brown
  - Regulatory Science, Corteva Agriscience, 9330 Zionsville Road, Indianapolis, Indiana 46268, United States
- Ken Clevenger
  - Integrated Discovery and Bioprocess, Crop Health R&D, Corteva Agriscience, 9330 Zionsville Road, Indianapolis, Indiana 46268, United States
- Jie Hu
  - Data Science, Corteva Agriscience, 9330 Zionsville Road, Indianapolis, Indiana 46268, United States
- Ashley Kretsch
  - Integrated Discovery and Bioprocess, Crop Health R&D, Corteva Agriscience, 9330 Zionsville Road, Indianapolis, Indiana 46268, United States
- Carla Menegatti
  - Integrated Discovery and Bioprocess, Crop Health R&D, Corteva Agriscience, 9330 Zionsville Road, Indianapolis, Indiana 46268, United States
- Quanbo Xiong
  - Integrated Discovery and Bioprocess, Crop Health R&D, Corteva Agriscience, 9330 Zionsville Road, Indianapolis, Indiana 46268, United States
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, Middelheimlaan 1, 2020 Antwerpen, Belgium
- Mingxun Wang
  - Department of Computer Science and Engineering, University of California Riverside, 900 University Avenue, Riverside, California 92521, United States
2
Breeur M, Stepaniants G, Keski-Rahkonen P, Rigollet P, Viallon V. Optimal transport for automatic alignment of untargeted metabolomic data. eLife 2024; 12:RP91597. [PMID: 38896449 PMCID: PMC11186628 DOI: 10.7554/elife.91597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024] Open
Abstract
Untargeted metabolomic profiling through liquid chromatography-mass spectrometry (LC-MS) measures a vast array of metabolites within biospecimens, advancing drug development, disease diagnosis, and risk prediction. However, the low throughput of LC-MS poses a major challenge for biomarker discovery, annotation, and experimental comparison, necessitating the merging of multiple datasets. Current data pooling methods encounter practical limitations due to their vulnerability to data variations and hyperparameter dependence. Here, we introduce GromovMatcher, a flexible and user-friendly algorithm that automatically combines LC-MS datasets using optimal transport. By capitalizing on feature intensity correlation structures, GromovMatcher delivers superior alignment accuracy and robustness compared to existing approaches. This algorithm scales to thousands of features requiring minimal hyperparameter tuning. Manually curated datasets for validating alignment algorithms are limited in the field of untargeted metabolomics, and hence we develop a dataset split procedure to generate pairs of validation datasets to test the alignments produced by GromovMatcher and other methods. Applying our method to experimental patient studies of liver and pancreatic cancer, we discover shared metabolic features related to patient alcohol intake, demonstrating how GromovMatcher facilitates the search for biomarkers associated with lifestyle risk factors linked to several cancer types.
Affiliation(s)
- Marie Breeur
  - Nutrition and Metabolism Branch, International Agency for Research on Cancer, Lyon, France
- George Stepaniants
  - Massachusetts Institute of Technology, Department of Mathematics, Boston, United States
- Pekka Keski-Rahkonen
  - Nutrition and Metabolism Branch, International Agency for Research on Cancer, Lyon, France
- Philippe Rigollet
  - Massachusetts Institute of Technology, Department of Mathematics, Boston, United States
- Vivian Viallon
  - Nutrition and Metabolism Branch, International Agency for Research on Cancer, Lyon, France
3
Hao J, Chen Y, Wang Y, An N, Bai P, Zhu Q, Feng Y. [Alignment method for metabolite chromatographic peaks using an N-acyl glycine retention index system]. Se Pu 2024; 42:159-163. [PMID: 38374596 PMCID: PMC10877472 DOI: 10.3724/sp.j.1123.2023.07015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Indexed: 02/21/2024] Open
Abstract
Peak alignment is a crucial data-processing step in untargeted metabolomics analysis that aims to integrate metabolite data from multiple liquid chromatography-mass spectrometry (LC-MS) batches for enhanced comparability and reliability. However, slight variations in the chromatographic separation conditions can result in retention time (RT) shifts between consecutive analyses, adversely affecting peak alignment accuracy. In this study, we present a retention index (RI)-based chromatographic peak-shift correction (CPSC) strategy to address RT shifts and align chromatographic peaks for metabolomics studies. A series of N-acyl glycine homologues (C2-C23) was synthesized as calibrants, and an LC RI system was established. This system effectively corrected RT shifts arising from variations in flow rate, gradient elution, instrument systems, and chromatographic columns. Leveraging the RI system, we successfully adjusted the RT of raw data to mitigate RT shifts and then implemented the Joint Aligner algorithm for peak alignment. We assessed the accuracy of the RI-based CPSC strategy using pooled human fecal samples as a test model. Notably, the application of the RI-based CPSC strategy to a long-term dataset spanning 157 d as an illustration revealed a significant enhancement in peak alignment accuracy from 15.5% to 80.9%, indicating its ability to substantially improve peak-alignment precision in multibatch LC-MS analyses.
4
Domżał B, Nawrocka EK, Gołowicz D, Ciach MA, Miasojedow B, Kazimierczuk K, Gambin A. Magnetstein: An Open-Source Tool for Quantitative NMR Mixture Analysis Robust to Low Resolution, Distorted Lineshapes, and Peak Shifts. Anal Chem 2024; 96:188-196. [PMID: 38117933 PMCID: PMC10782418 DOI: 10.1021/acs.analchem.3c03594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 11/30/2023] [Accepted: 11/30/2023] [Indexed: 12/22/2023]
Abstract
1H NMR spectroscopy is a powerful tool for analyzing mixtures including determining the concentrations of individual components. When signals from multiple compounds overlap, this task requires computational solutions. They are typically based on peak-picking and the comparison of obtained peak lists with libraries of individual components. This can fail if peaks are not sufficiently resolved or when peak positions differ between the library and the mixture. In this paper, we present Magnetstein, a quantification algorithm rooted in the optimal transport theory that makes it robust to unexpected frequency shifts and overlapping signals. Thanks to this, Magnetstein can quantitatively analyze difficult spectra with the estimation trueness an order of magnitude higher than that of commercial tools. Furthermore, the method is easier to use than other approaches, having only two parameters with default values applicable to a broad range of experiments and requiring little to no preprocessing of the spectra.
Affiliation(s)
- Barbara Domżał
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, Warsaw 02-097, Poland
- Ewa Klaudia Nawrocka
  - Centre of New Technologies, University of Warsaw, Banacha 2C, Warsaw 02-097, Poland
- Dariusz Gołowicz
  - Institute of Physical Chemistry, Polish Academy of Sciences, Kasprzaka 44/52, Warsaw 01-224, Poland
- Michał Aleksander Ciach
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, Warsaw 02-097, Poland
- Błażej Miasojedow
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, Warsaw 02-097, Poland
- Anna Gambin
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, Warsaw 02-097, Poland
5
Hao JD, Chen YY, Wang YZ, An N, Bai PR, Zhu QF, Feng YQ. Novel Peak Shift Correction Method Based on the Retention Index for Peak Alignment in Untargeted Metabolomics. Anal Chem 2023; 95:13330-13337. [PMID: 37609864 DOI: 10.1021/acs.analchem.3c02583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Peak alignment is a crucial step in liquid chromatography-mass spectrometry (LC-MS)-based large-scale untargeted metabolomics workflows, as it enables the integration of metabolite peaks across multiple samples, which is essential for accurate data interpretation. Slight differences or fluctuations in chromatographic separation conditions, however, can cause the chromatographic retention time (RT) shift between consecutive analyses, ultimately affecting the accuracy of peak alignment between samples. Here, we introduce a novel RT shift correction method based on the retention index (RI) and apply it to peak alignment. We synthesized a series of N-acyl glycine (C2-C23) homologues via the amidation reaction between glycine with normal saturated fatty acids (C2-C23) as calibrants able to respond proficiently in both mass spectrometric positive- and negative-ion modes. Using these calibrants, we established an N-acyl glycine RI system. This RI system is capable of covering a broad chromatographic space and addressing chromatographic RT shift caused by variations in flow rate, gradient elution, instrument systems, and LC separation columns. Moreover, based on the RI system, we developed a peak shift correction model to enhance peak alignment accuracy. Applying the model resulted in a significant improvement in the accuracy of peak alignment from 15.5 to 80.9% across long-term data spanning a period of 157 days. To facilitate practical application, we developed a Python-based program, which is freely available at https://github.com/WHU-Fenglab/RI-based-CPSC.
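The abstract does not give the correction formula, so the following is only a minimal sketch of how a retention-index-based RT correction could work, assuming a linear retention index (100 times the calibrant carbon number, interpolated linearly between neighbouring calibrants). The function names `interpolate` and `correct_rt` and all calibrant retention times are hypothetical and are not taken from the authors' program.

```python
from bisect import bisect_right


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    if len(xs) < 2 or len(xs) != len(ys):
        raise ValueError("need at least two matching calibration points")
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("value lies outside the calibrated range")
    # Index of the right-hand calibrant bracketing x.
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def correct_rt(rt, batch_calib_rt, ref_calib_rt, carbons):
    """Map an RT from a shifted batch onto the reference batch via RI."""
    # Assumed RI scale: 100 * carbon number of each calibrant.
    calib_ri = [100.0 * c for c in carbons]
    ri = interpolate(rt, batch_calib_rt, calib_ri)       # RT -> RI (batch)
    corrected = interpolate(ri, calib_ri, ref_calib_rt)  # RI -> RT (reference)
    return ri, corrected


# Hypothetical calibrant RTs (minutes) for the C2, C4, and C6 homologues.
carbons = [2, 4, 6]
ref_calib_rt = [1.0, 3.0, 5.0]    # reference batch
batch_calib_rt = [1.2, 3.6, 6.0]  # batch with RT drift

ri, corrected = correct_rt(2.4, batch_calib_rt, ref_calib_rt, carbons)
print(round(ri, 6), round(corrected, 6))
```

In this toy case the peak elutes halfway between the C2 and C4 calibrants in the drifted batch, so it is mapped to the midpoint between the same two calibrants in the reference batch. Peak alignment would then run on the corrected retention times.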
Affiliation(s)
- Jun-Di Hao
  - Department of Chemistry, Wuhan University, Wuhan 430072, China
- Yao-Yu Chen
  - Department of Chemistry, Wuhan University, Wuhan 430072, China
- Yan-Zhen Wang
  - Department of Chemistry, Wuhan University, Wuhan 430072, China
- Na An
  - Department of Chemistry, Wuhan University, Wuhan 430072, China
- Pei-Rong Bai
  - Department of Chemistry, Wuhan University, Wuhan 430072, China
- Quan-Fei Zhu
  - School of Public Health, Wuhan University, Wuhan 430071, China
- Yu-Qi Feng
  - Department of Chemistry, Wuhan University, Wuhan 430072, China
  - Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan 430071, China
6
Skoraczyński G, Gambin A, Miasojedow B. Alignstein: Optimal transport for improved LC-MS retention time alignment. Gigascience 2022; 11:giac101. [PMID: 36329619 PMCID: PMC9633278 DOI: 10.1093/gigascience/giac101] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/10/2022] [Revised: 08/24/2022] [Accepted: 09/30/2022] [Indexed: 11/06/2022] Open
Abstract
BACKGROUND: Reproducibility of liquid chromatography separation is limited by retention time drift. As a result, measured signals lack correspondence over replicates of the liquid chromatography-mass spectrometry (LC-MS) experiments. Correction of these errors is named retention time alignment and needs to be performed before further quantitative analysis. Despite the availability of numerous alignment algorithms, their accuracy is limited (e.g., for retention time drift that swaps analytes' elution order).
RESULTS: We present the Alignstein, an algorithm for LC-MS retention time alignment. It correctly finds correspondence even for swapped signals. To achieve this, we implemented the generalization of the Wasserstein distance to compare multidimensional features without any reduction of the information or dimension of the analyzed data. Moreover, Alignstein by design requires neither a reference sample nor prior signal identification. We validate the algorithm on publicly available benchmark datasets obtaining competitive results. Finally, we show that it can detect the information contained in the tandem mass spectrum by the spatial properties of chromatograms.
CONCLUSIONS: We show that the use of optimal transport effectively overcomes the limitations of existing algorithms for statistical analysis of mass spectrometry datasets. The algorithm's source code is available at https://github.com/grzsko/Alignstein.
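For readers unfamiliar with the Wasserstein distance mentioned in the abstract, the sketch below computes its simplest one-dimensional form between two intensity profiles sampled on the same evenly spaced grid. This is an illustration only: Alignstein uses a generalized, multidimensional variant, and the function name `wasserstein_1d` and the toy profiles are assumptions made for this example.

```python
from itertools import accumulate


def wasserstein_1d(p, q, spacing=1.0):
    """1-D Wasserstein-1 distance between two profiles on a shared grid.

    Both profiles are normalized to unit mass; the distance is then the
    area between their cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("profiles must share the same grid")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("profiles must have positive total intensity")
    cdf_p = accumulate(v / total_p for v in p)
    cdf_q = accumulate(v / total_q for v in q)
    return spacing * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


# Toy profiles: the same peak, shifted by two grid points.
peak_a = [0, 1, 0, 0]
peak_b = [0, 0, 0, 1]
print(wasserstein_1d(peak_a, peak_b))  # 2.0: all mass moves two grid steps
```

Unlike a point-wise comparison, the distance grows with the size of the shift, which is the property that makes optimal transport attractive for comparing signals affected by retention time drift.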
Affiliation(s)
- Grzegorz Skoraczyński
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097 Warsaw, Poland
- Anna Gambin
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097 Warsaw, Poland
- Błażej Miasojedow
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097 Warsaw, Poland