1
Climaco Pinto R, Karaman I, Lewis MR, Hällqvist J, Kaluarachchi M, Graça G, Chekmeneva E, Durainayagam B, Ghanbari M, Ikram MA, Zetterberg H, Griffin J, Elliott P, Tzoulaki I, Dehghan A, Herrington D, Ebbels T. Finding Correspondence between Metabolomic Features in Untargeted Liquid Chromatography-Mass Spectrometry Metabolomics Datasets. Anal Chem 2022; 94:5493-5503. [PMID: 35360896 PMCID: PMC9008693 DOI: 10.1021/acs.analchem.1c03592] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 01/12/2023]
Abstract
Integration of multiple datasets can greatly enhance bioanalytical studies, for example, by increasing power to discover and validate biomarkers. In liquid chromatography–mass spectrometry (LC–MS) metabolomics, it is especially hard to combine untargeted datasets since the majority of metabolomic features are not annotated and thus cannot be matched by chemical identity. Typically, the information available for each feature is retention time (RT), mass-to-charge ratio (m/z), and feature intensity (FI). Pairs of features from the same metabolite in separate datasets can exhibit small but significant differences, making matching very challenging. Current methods to address this issue are too simple or rely on assumptions that cannot be met in all cases. We present a method to find feature correspondence between two similar LC–MS metabolomics experiments or batches using only the features’ RT, m/z, and FI. We demonstrate the method on both real and synthetic datasets, using six orthogonal validation strategies to gauge the matching quality. In our main example, 4953 features were uniquely matched; of 604 manually annotated features, 585 (96.8%) were matched correctly. In a second example, 2324 features could be uniquely matched, with 79 (90.8%) out of 87 annotated features correctly matched. Most of the missed annotated matches are between features that behave very differently from the modeled inter-dataset shifts of RT, m/z, and FI. In a third example with simulated data with 4755 features per dataset, 99.6% of the matches were correct. Finally, the results of matching three other dataset pairs using our method are compared with a published alternative method, metabCombiner, showing the advantages of our approach. The method can be applied using M2S (Match 2 Sets), a free, open-source MATLAB toolbox, available at https://github.com/rjdossan/M2S.
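
To make the idea of matching on RT, m/z, and FI alone more concrete, here is a minimal Python sketch. It is not the M2S algorithm (M2S is a MATLAB toolbox and, per the abstract, models inter-dataset shifts); it only illustrates a simple tolerance-window match followed by a greedy one-to-one assignment. The function name `match_features`, the tolerance values, and the scoring rule are all assumptions made for illustration.

```python
import numpy as np

def match_features(ref, tgt, rt_tol=0.1, mz_tol=0.005, logfi_tol=1.0):
    """Greedy one-to-one matching of features by RT, m/z and log10 FI.

    ref, tgt: array-likes of shape (n, 3) with columns [RT, m/z, log10 FI].
    The tolerances are hypothetical defaults, not values from the paper.
    Returns a sorted list of (ref_index, tgt_index) pairs.
    """
    ref = np.asarray(ref, dtype=float)
    tgt = np.asarray(tgt, dtype=float)
    tol = np.array([rt_tol, mz_tol, logfi_tol])

    # Absolute differences in each dimension, scaled by that dimension's tolerance.
    scaled = np.abs(ref[:, None, :] - tgt[None, :, :]) / tol
    # A pair is a candidate only if it is within tolerance in all three dimensions.
    candidate = np.all(scaled <= 1.0, axis=2)
    # Euclidean distance in tolerance-scaled units serves as the penalty.
    score = np.sqrt((scaled ** 2).sum(axis=2))

    ref_idx, tgt_idx = np.nonzero(candidate)
    order = np.argsort(score[ref_idx, tgt_idx], kind="stable")

    pairs, used_ref, used_tgt = [], set(), set()
    # Visit candidates from best to worst, using each feature at most once.
    for i, j in zip(ref_idx[order], tgt_idx[order]):
        i, j = int(i), int(j)
        if i not in used_ref and j not in used_tgt:
            pairs.append((i, j))
            used_ref.add(i)
            used_tgt.add(j)
    return sorted(pairs)
```

In this sketch a feature with several candidates keeps only its closest partner, so the output contains unique matches only. A real workflow would first estimate and remove systematic shifts between the two datasets before applying any tolerance window.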
Affiliation(s)
- Rui Climaco Pinto
  - Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W12 0BZ, U.K.
  - UK Dementia Research Institute, Imperial College London, London W12 0BZ, U.K.
- Ibrahim Karaman
  - Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W12 0BZ, U.K.
  - UK Dementia Research Institute, Imperial College London, London W12 0BZ, U.K.
- Matthew R Lewis
  - MRC-NIHR National Phenome Centre, Department of Metabolism, Digestion and Reproduction, Imperial College London, London SW7 2AZ, U.K.
  - Section of Bioanalytical Chemistry, Department of Metabolism, Digestion and Reproduction, Imperial College London, London SW7 2AZ, U.K.
- Jenny Hällqvist
  - Centre for Translational Omics, Great Ormond Street Hospital, University College London, London WC1N 1EH, U.K.
  - Department of Clinical and Movement Neurosciences, Queen Square Institute of Neurology, University College London, London WC1N 3BG, U.K.
- Manuja Kaluarachchi
  - UK Dementia Research Institute, Imperial College London, London W12 0BZ, U.K.
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, London SW7 2AZ, U.K.
- Gonçalo Graça
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, London SW7 2AZ, U.K.
- Elena Chekmeneva
  - MRC-NIHR National Phenome Centre, Department of Metabolism, Digestion and Reproduction, Imperial College London, London SW7 2AZ, U.K.
  - Section of Bioanalytical Chemistry, Department of Metabolism, Digestion and Reproduction, Imperial College London, London SW7 2AZ, U.K.
- Brenan Durainayagam
  - Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W12 0BZ, U.K.
  - UK Dementia Research Institute, Imperial College London, London W12 0BZ, U.K.
- Mohsen Ghanbari
  - Department of Epidemiology, Erasmus University Medical Center, 3015 GD Rotterdam, The Netherlands
- M Arfan Ikram
  - Department of Epidemiology, Erasmus University Medical Center, 3015 GD Rotterdam, The Netherlands
- Henrik Zetterberg
  - Department of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, The Sahlgrenska Academy at University of Gothenburg, 431 41 Mölndal, Sweden
  - Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, 413 45 Mölndal, Sweden
  - Department of Neurodegenerative Disease, University College London, Queen Square, London WC1N 3BG, U.K.
  - UK Dementia Research Institute, University College London, London WC1N 3BG, U.K.
- Julian Griffin
  - UK Dementia Research Institute, Imperial College London, London W12 0BZ, U.K.
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, London SW7 2AZ, U.K.
- Paul Elliott
  - Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W12 0BZ, U.K.
  - UK Dementia Research Institute, Imperial College London, London W12 0BZ, U.K.
- Ioanna Tzoulaki
  - Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W12 0BZ, U.K.
  - Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, 451 10 Ioannina, Greece
- Abbas Dehghan
  - Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W12 0BZ, U.K.
  - UK Dementia Research Institute, Imperial College London, London W12 0BZ, U.K.
  - Department of Epidemiology, Erasmus University Medical Center, 3015 GD Rotterdam, The Netherlands
- David Herrington
  - Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina 27101, United States
- Timothy Ebbels
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, London SW7 2AZ, U.K.
2
Habra H, Kachman M, Bullock K, Clish C, Evans CR, Karnovsky A. metabCombiner: Paired Untargeted LC-HRMS Metabolomics Feature Matching and Concatenation of Disparately Acquired Data Sets. Anal Chem 2021; 93:5028-5036. [PMID: 33724799 PMCID: PMC9906987 DOI: 10.1021/acs.analchem.0c03693] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 02/01/2023]
Abstract
LC-HRMS experiments detect thousands of compounds, with only a small fraction of them identified in most studies. Traditional data processing pipelines contain an alignment step to assemble the measurements of overlapping features across samples into a unified table. However, data sets acquired under nonidentical conditions are not amenable to this process, mostly due to significant alterations in chromatographic retention times. Alignment of features between disparately acquired LC-MS metabolomics data could aid collaborative compound identification efforts and enable meta-analyses of expanded data sets. Here, we describe metabCombiner, a new computational pipeline for matching known and unknown features in a pair of untargeted LC-MS data sets and concatenating their abundances into a combined table of intersecting feature measurements. metabCombiner groups features by mass-to-charge (m/z) values to generate a search space of possible feature pair alignments, fits a spline through a set of selected retention time ordered pairs, and ranks alignments by m/z, mapped retention time, and relative abundance similarity. We evaluated this workflow on a pair of plasma metabolomics data sets acquired with different gradient elution methods, achieving a mean absolute retention time prediction error of roughly 0.06 min and a weighted per-compound matching accuracy of approximately 90%. We further demonstrate the utility of this method by comprehensively mapping features in urine and muscle metabolomics data sets acquired from different laboratories. metabCombiner has the potential to bridge the gap between otherwise incompatible metabolomics data sets and is available as an R package at https://github.com/hhabra/metabCombiner and Bioconductor.
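
The three steps named in this abstract (grouping by m/z, mapping retention times through a fitted curve, and ranking candidate alignments) can be sketched in Python as follows. This is an illustration under stated assumptions, not the metabCombiner R package: piecewise-linear interpolation with `np.interp` stands in for the spline fit, and the function name `rank_alignments`, the weights, the m/z tolerance, and the score formula are all hypothetical.

```python
import numpy as np

def rank_alignments(x, y, anchors, mz_tol=0.005,
                    mz_weight=100.0, rt_weight=1.0, q_weight=0.5):
    """Rank candidate feature pairs between two datasets, best first.

    x, y: array-likes of shape (n, 3) with columns
          [m/z, RT, abundance quantile in 0..1].
    anchors: (rt_x, rt_y) pairs assumed to be the same compound,
             with distinct rt_x values.
    Returns a list of (score, index_in_x, index_in_y).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.asarray(sorted(anchors), dtype=float)

    # Map RTs of dataset x onto the RT scale of dataset y.
    # Piecewise-linear stand-in for a spline; values outside the
    # anchor range are clamped to the end points by np.interp.
    mapped_rt = np.interp(x[:, 1], a[:, 0], a[:, 1])

    results = []
    for i in range(len(x)):
        # Search space: features in y whose m/z is within the tolerance.
        close = np.nonzero(np.abs(y[:, 0] - x[i, 0]) <= mz_tol)[0]
        for j in close:
            # Hypothetical penalty combining the three kinds of disagreement.
            penalty = (mz_weight * abs(x[i, 0] - y[j, 0])
                       + rt_weight * abs(mapped_rt[i] - y[j, 1])
                       + q_weight * abs(x[i, 2] - y[j, 2]))
            results.append((1.0 / (1.0 + penalty), i, int(j)))
    # Higher score means a more plausible alignment.
    return sorted(results, key=lambda r: -r[0])
```

Restricting the search to an m/z window keeps the candidate set small, while the RT mapping lets datasets acquired with different gradients be compared on one time axis. The relative weights decide how strongly each disagreement counts and would need tuning for real data.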
Affiliation(s)
- Hani Habra
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109, United States
- Maureen Kachman
  - Michigan Regional Comprehensive Metabolomics Resource Core, University of Michigan, 1000 Wall Street, Ann Arbor, Michigan 48105, United States
- Kevin Bullock
  - Metabolomics Platform, Broad Institute, Cambridge, Massachusetts 02142, United States
- Clary Clish
  - Metabolomics Platform, Broad Institute, Cambridge, Massachusetts 02142, United States
- Charles R Evans
  - Michigan Regional Comprehensive Metabolomics Resource Core, University of Michigan, 1000 Wall Street, Ann Arbor, Michigan 48105, United States
- Alla Karnovsky
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109, United States
  - Michigan Regional Comprehensive Metabolomics Resource Core, University of Michigan, 1000 Wall Street, Ann Arbor, Michigan 48105, United States
3
Mak TD, Goudarzi M, Laiakis EC, Stein SE. Disparate Metabolomics Data Reassembler: A Novel Algorithm for Agglomerating Incongruent LC-MS Metabolomics Datasets. Anal Chem 2020; 92:5231-5239. [PMID: 32118408 PMCID: PMC10926180 DOI: 10.1021/acs.analchem.9b05763] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
In the past decade, the field of LC-MS-based metabolomics has transformed from an obscure specialty into a major "-omics" platform for studying metabolic processes and biomolecular characterization. However, as a whole the field is still very fractured, as the nature of the instrumentation and the information produced by the platform essentially creates incompatible "islands" of datasets. This lack of data coherency results in the inability to accumulate a critical mass of metabolomics data that has enabled other -omics platforms to make impactful discoveries and meaningful advances. As such, we have developed a novel algorithm, called Disparate Metabolomics Data Reassembler (DIMEDR), which attempts to bridge the inconsistencies between incongruent LC-MS metabolomics datasets of the same biological sample type. A single "primary" dataset is postprocessed via traditional means of peak identification, alignment, and grouping. DIMEDR utilizes this primary dataset as a progenitor template by which data from subsequent disparate datasets are reassembled and integrated into a unified framework that maximizes spectral feature similarity across all samples. This is accomplished by a novel procedure for universal retention time correction and comparison via identification of ubiquitous features in the initial primary dataset, which are subsequently utilized as endogenous internal standards during integration. For demonstration purposes, two human and two mouse urine metabolomics datasets from four unrelated studies acquired over 4 years were unified via DIMEDR, which enabled meaningful analysis across otherwise incomparable and unrelated datasets.
Affiliation(s)
- Tytus D. Mak
  - Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899-8632
  - Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, New Research Building E504/508, 3970 Reservoir Rd, NW, Washington, DC 20057
- Maryam Goudarzi
  - Department of Cellular & Molecular Medicine, Cleveland Clinic Lerner Research Institute, Building NN1, Room 28, 9500 Euclid Ave, Cleveland, OH 44195
- Evagelia C. Laiakis
  - Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, New Research Building E504/508, 3970 Reservoir Rd, NW, Washington, DC 20057
- Stephen E. Stein
  - Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899-8632
4
Doppler M, Kluger B, Bueschl C, Steiner B, Buerstmayr H, Lemmens M, Krska R, Adam G, Schuhmacher R. Stable Isotope-Assisted Plant Metabolomics: Investigation of Phenylalanine-Related Metabolic Response in Wheat Upon Treatment With the Fusarium Virulence Factor Deoxynivalenol. Frontiers in Plant Science 2019; 10:1137. [PMID: 31736983 PMCID: PMC6831647 DOI: 10.3389/fpls.2019.01137] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 05/15/2019] [Accepted: 08/20/2019] [Indexed: 05/03/2023]
Abstract
The major Fusarium mycotoxin deoxynivalenol (DON) is a virulence factor in wheat and has also been shown to induce defense responses in host plant tissue. In this study, global and tracer labeling with 13C were combined to annotate the overall metabolome of wheat spikes and to evaluate the response of phenylalanine-related pathways upon treatment with DON. At anthesis, spikes of resistant and susceptible cultivars as well as two related near isogenic wheat lines (NILs) differing in the presence/absence of the major resistance QTL Fhb1 were treated with 1 mg DON or water (control), and samples were collected at 0, 12, 24, 48, and 96 h after treatment (hat). A total of 172 Phe-derived wheat constituents were detected with our untargeted approach employing 13C-labeled phenylalanine and subsequently annotated as flavonoids, lignans, coumarins, benzoic acid derivatives, hydroxycinnamic acid amides (HCAAs), as well as peptides. Ninety-six hours after the DON treatment, up to 30% of the metabolites biosynthesized from Phe showed significantly increased levels compared to the control samples. Major metabolic changes included the formation of precursors of compounds implicated in cell wall reinforcement and presumed antifungal compounds. In addition, dipeptides, which presumably are products of proteolytic degradation of truncated proteins generated in the presence of the toxin, were also significantly more abundant upon DON treatment. An in-depth comparison of the two NILs with correlation clustering of time course profiles revealed some 70 DON-responsive Phe derivatives. While several flavonoids had constitutively different abundance levels between the two NILs differing in resistance, other Phe-derived metabolites such as HCAAs and hydroxycinnamoyl quinates were affected differently in the two NILs after treatment with DON. Our results suggest a strong activation of the general phenylpropanoid pathway and that coumaroyl-CoA is mainly diverted towards HCAAs in the presence of Fhb1, whereas the metabolic route to monolignol(-conjugates), lignans, and lignin seems to be favored in the absence of the Fhb1 resistance quantitative trait locus.
Affiliation(s)
- Maria Doppler
  - Department of Agrobiotechnology (IFA-Tulln), Institute of Bioanalytics and Agro-Metabolomics, University of Natural Resources and Life Sciences, Vienna (BOKU), Tulln, Austria
- Bernhard Kluger
  - Department of Agrobiotechnology (IFA-Tulln), Institute of Bioanalytics and Agro-Metabolomics, University of Natural Resources and Life Sciences, Vienna (BOKU), Tulln, Austria
- Christoph Bueschl
  - Department of Agrobiotechnology (IFA-Tulln), Institute of Bioanalytics and Agro-Metabolomics, University of Natural Resources and Life Sciences, Vienna (BOKU), Tulln, Austria
- Barbara Steiner
  - Department of Agrobiotechnology (IFA-Tulln), Institute for Biotechnology in Plant Production, University of Natural Resources and Life Sciences, Vienna (BOKU), Tulln, Austria
- Hermann Buerstmayr
  - Department of Agrobiotechnology (IFA-Tulln), Institute for Biotechnology in Plant Production, University of Natural Resources and Life Sciences, Vienna (BOKU), Tulln, Austria
- Marc Lemmens
  - Department of Agrobiotechnology (IFA-Tulln), Institute for Biotechnology in Plant Production, University of Natural Resources and Life Sciences, Vienna (BOKU), Tulln, Austria
- Rudolf Krska
  - Department of Agrobiotechnology (IFA-Tulln), Institute of Bioanalytics and Agro-Metabolomics, University of Natural Resources and Life Sciences, Vienna (BOKU), Tulln, Austria
  - School of Biological Sciences, Institute for Global Food Security, Queen’s University Belfast, Belfast, United Kingdom
- Gerhard Adam
  - Department of Applied Genetics and Cell Biology (DAGZ), University of Natural Resources and Life Sciences, Vienna (BOKU), Tulln, Austria
- Rainer Schuhmacher
  - Department of Agrobiotechnology (IFA-Tulln), Institute of Bioanalytics and Agro-Metabolomics, University of Natural Resources and Life Sciences, Vienna (BOKU), Tulln, Austria