1
Zhang L, Zhang X, Leach JM, Rahman AKMF, Howell CR, Yi N. Bayesian compositional generalized linear mixed models for disease prediction using microbiome data. BMC Bioinformatics 2025; 26:98. [PMID: 40188058 PMCID: PMC11971746 DOI: 10.1186/s12859-025-06114-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2024] [Accepted: 03/12/2025] [Indexed: 04/07/2025]
Abstract
The primary goal of predictive modeling for compositional microbiome data is to better understand and predict disease susceptibility based on the relative abundance of microbial species. Current approaches in this area often assume a high-dimensional sparse setting, where only a small subset of microbiome features is considered relevant to the outcome. However, in real-world data, both large and small effects frequently coexist, and acknowledging the contribution of smaller effects can significantly enhance predictive performance. To address this challenge, we developed Bayesian Compositional Generalized Linear Mixed Models for Analyzing Microbiome Data (BCGLMM). BCGLMM is capable of identifying both moderate taxa effects and the cumulative impact of numerous minor taxa, which are often overlooked in conventional models. With a sparsity-inducing prior, the structured regularized horseshoe prior, BCGLMM effectively collaborates phylogenetically related moderate effects. The random effect term efficiently captures sample-related minor effects by incorporating sample similarities within its variance-covariance matrix. We fitted the proposed models using Markov Chain Monte Carlo (MCMC) algorithms with rstan. The performance of the proposed method was evaluated through extensive simulation studies, demonstrating its superiority with higher prediction accuracy compared to existing methods. We then applied the proposed method on American Gut Data to predict inflammatory bowel disease (IBD). To ensure reproducibility, the code and data used in this paper are available at https://github.com/Li-Zhang28/BCGLMM .
Affiliation(s)
- Li Zhang
  - Biostatistics and Bioinformatics Facility, Fox Chase Cancer Center, Philadelphia, PA, USA.
- Xinyan Zhang
  - School of Data Science and Analytics, Kennesaw State University, Kennesaw, GA, USA
- Justin M Leach
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- A K M F Rahman
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Carrie R Howell
  - Department of Medicine, Division of Preventive Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Nengjun Yi
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA.
2
Bucket Fuser: Statistical Signal Extraction for 1D 1H NMR Metabolomic Data. Metabolites 2022; 12:812. [PMID: 36144216 PMCID: PMC9501206 DOI: 10.3390/metabo12090812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 08/20/2022] [Accepted: 08/25/2022] [Indexed: 11/17/2022]
Abstract
Untargeted metabolomics is a promising tool for identifying novel disease biomarkers and unraveling underlying pathomechanisms. Nuclear magnetic resonance (NMR) spectroscopy is particularly suited for large-scale untargeted metabolomics studies due to its high reproducibility and cost effectiveness. Here, one-dimensional (1D) 1H NMR experiments offer good sensitivity at reasonable measurement times. Their subsequent data analysis requires sophisticated data preprocessing steps, including the extraction of NMR features corresponding to specific metabolites. We developed a novel 1D NMR feature extraction procedure, called Bucket Fuser (BF), which is based on a regularized regression framework with fused group LASSO terms. The performance of the BF procedure was demonstrated using three independent NMR datasets and was benchmarked against existing state-of-the-art NMR feature extraction methods. BF dynamically constructs NMR metabolite features, the widths of which can be adjusted via a regularization parameter. BF consistently improved metabolite signal extraction, as demonstrated by our correlation analyses with absolutely quantified metabolites. It also yielded a higher proportion of statistically significant metabolite features in our differential metabolite analyses. The BF algorithm is computationally efficient and it can deal with small sample sizes. In summary, the Bucket Fuser algorithm, which is available as a supplementary python code, facilitates the fast and dynamic extraction of 1D NMR signals for the improved detection of metabolic biomarkers.
3
Forouzandeh A, Rutar A, Kalmady SV, Greiner R. Analyzing biomarker discovery: Estimating the reproducibility of biomarker sets. PLoS One 2022; 17:e0252697. [PMID: 35901020 PMCID: PMC9333302 DOI: 10.1371/journal.pone.0252697] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/18/2021] [Accepted: 06/29/2022] [Indexed: 11/19/2022]
Abstract
Many researchers try to understand a biological condition by identifying biomarkers. This is typically done using univariate hypothesis testing over a labeled dataset, declaring a feature to be a biomarker if there is a significant statistical difference between its values for the subjects with different outcomes. However, such sets of proposed biomarkers are often not reproducible – subsequent studies often fail to identify the same sets. Indeed, there is often only a very small overlap between the biomarkers proposed in pairs of related studies that explore the same phenotypes over the same distribution of subjects. This paper first defines the Reproducibility Score for a labeled dataset as a measure (taking values between 0 and 1) of the reproducibility of the results produced by a specified fixed biomarker discovery process for a given distribution of subjects. We then provide ways to reliably estimate this score by defining algorithms that produce an over-bound and an under-bound for this score for a given dataset and biomarker discovery process, for the case of univariate hypothesis testing on dichotomous groups. We confirm that these approximations are meaningful by providing empirical results on a large number of datasets and show that these predictions match known reproducibility results. To encourage others to apply this technique to analyze their biomarker sets, we have also created a publicly available website, https://biomarker.shinyapps.io/BiomarkerReprod/, that produces these Reproducibility Score approximations for any given dataset (with continuous or discrete features and binary class labels).
Affiliation(s)
- Amir Forouzandeh
  - Department of Computing Science, University of Alberta, Edmonton, Canada
- Alex Rutar
  - Department of Pure Math, University of Waterloo, Waterloo, ON, Canada
- Sunil V. Kalmady
  - Department of Computing Science, University of Alberta, Edmonton, Canada
  - Canadian VIGOUR Centre, University of Alberta, Edmonton, Canada
- Russell Greiner
  - Department of Computing Science, University of Alberta, Edmonton, Canada
  - Alberta Machine Intelligence Institute, Edmonton, Canada
4
A Longitudinal 1H NMR-Based Metabolic Profile Analysis of Urine from Hospitalized Premature Newborns Receiving Enteral and Parenteral Nutrition. Metabolites 2022; 12:255. [PMID: 35323698 PMCID: PMC8952338 DOI: 10.3390/metabo12030255] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/08/2022] [Revised: 03/04/2022] [Accepted: 03/10/2022] [Indexed: 12/24/2022]
Abstract
Preterm newborns are extremely vulnerable to morbidities, complications, and death. Preterm birth is a global public health problem due to its socioeconomic burden. Nurturing preterm newborns is a critical medical issue because they have limited nutrient stores and it is difficult to establish enteral feeding, which leads to inadequate growth frequently associated with poor neurodevelopmental outcomes. Parenteral nutrition (PN) provides nutrients to preterm newborns, but its biochemical effects are not completely known. To study the effect of PN treatment on preterm newborns, an untargeted metabolomic 1H nuclear magnetic resonance (NMR) assay was performed on 107 urine samples from 34 hospitalized patients. Multivariate data (Principal Component Analysis, PCA, Orthogonal partial least squares discriminant analysis OPLS-DA, parallel factor analysis PARAFAC-2) and univariate analyses were used to identify the association of specific spectral data with different nutritional types (NTs) and gestational ages. Our results revealed changes in the metabolic profile related to the NT, with the tricarboxylic acid cycle and galactose metabolic pathways being the most impacted pathways. Low citrate and succinate levels, despite higher glucose relative urinary concentrations, seem to constitute the metabolic profile found in the studied critically ill preterm newborns who received PN, indicating an energetic dysfunction that must be taken into account for better nutritional management.
5
Music of metagenomics-a review of its applications, analysis pipeline, and associated tools. Funct Integr Genomics 2021; 22:3-26. [PMID: 34657989 DOI: 10.1007/s10142-021-00810-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2021] [Revised: 09/25/2021] [Accepted: 10/03/2021] [Indexed: 10/20/2022]
Abstract
This humble effort highlights the intricate details of metagenomics in a simple, poetic, and rhythmic way. The paper enforces the significance of the research area, provides details about major analytical methods, examines the taxonomy and assembly of genomes, emphasizes some tools, and concludes by celebrating the richness of the ecosystem populated by the "metagenome."
6
Monti GS, Filzmoser P. Robust logistic zero-sum regression for microbiome compositional data. Adv Data Anal Classif 2021. [DOI: 10.1007/s11634-021-00465-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
We introduce the Robust Logistic Zero-Sum Regression (RobLZS) estimator, which can be used for a two-class problem with high-dimensional compositional covariates. Since the log-contrast model is employed, the estimator is able to do feature selection among the compositional parts. The proposed method attains robustness by minimizing a trimmed sum of deviances. A comparison of the performance of the RobLZS estimator with a non-robust counterpart and with other sparse logistic regression estimators is conducted via Monte Carlo simulation studies. Two microbiome data applications are considered to investigate the stability of the estimators to the presence of outliers. Robust Logistic Zero-Sum Regression is available as an R package that can be downloaded at https://github.com/giannamonti/RobZS.
7
Schultheiss UT, Kosch R, Kotsis F, Altenbuchinger M, Zacharias HU. Chronic Kidney Disease Cohort Studies: A Guide to Metabolome Analyses. Metabolites 2021; 11:460. [PMID: 34357354 PMCID: PMC8304377 DOI: 10.3390/metabo11070460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/15/2021] [Revised: 07/08/2021] [Accepted: 07/12/2021] [Indexed: 12/14/2022]
Abstract
Kidney diseases still pose one of the biggest challenges for global health, and their heterogeneity and often high comorbidity load seriously hinders the unraveling of their underlying pathomechanisms and the delivery of optimal patient care. Metabolomics, the quantitative study of small organic compounds, called metabolites, in a biological specimen, is gaining more and more importance in nephrology research. Conducting a metabolomics study in human kidney disease cohorts, however, requires thorough knowledge about the key workflow steps: study planning, sample collection, metabolomics data acquisition and preprocessing, statistical/bioinformatics data analysis, and results interpretation within a biomedical context. This review provides a guide for future metabolomics studies in human kidney disease cohorts. We will offer an overview of important a priori considerations for metabolomics cohort studies, available analytical as well as statistical/bioinformatics data analysis techniques, and subsequent interpretation of metabolic findings. We will further point out potential research questions for metabolomics studies in the context of kidney diseases and summarize the main results and data availability of important studies already conducted in this field.
Affiliation(s)
- Ulla T. Schultheiss
  - Institute of Genetic Epidemiology, Faculty of Medicine and Medical Center, University of Freiburg, 79106 Freiburg, Germany
  - Department of Medicine IV—Nephrology and Primary Care, Faculty of Medicine and Medical Center, University of Freiburg, 79106 Freiburg, Germany
- Robin Kosch
  - Computational Biology, University of Hohenheim, 70599 Stuttgart, Germany
- Fruzsina Kotsis
  - Institute of Genetic Epidemiology, Faculty of Medicine and Medical Center, University of Freiburg, 79106 Freiburg, Germany
  - Department of Medicine IV—Nephrology and Primary Care, Faculty of Medicine and Medical Center, University of Freiburg, 79106 Freiburg, Germany
- Michael Altenbuchinger
  - Institute of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
- Helena U. Zacharias
  - Department of Internal Medicine I, University Medical Center Schleswig-Holstein, Campus Kiel, 24105 Kiel, Germany
  - Institute of Clinical Molecular Biology, Kiel University and University Medical Center Schleswig-Holstein, Campus Kiel, 24105 Kiel, Germany
8
Altenbuchinger M, Weihs A, Quackenbush J, Grabe HJ, Zacharias HU. Gaussian and Mixed Graphical Models as (multi-)omics data analysis tools. Biochimica et Biophysica Acta. Gene Regulatory Mechanisms 2020; 1863:194418. [PMID: 31639475 PMCID: PMC7166149 DOI: 10.1016/j.bbagrm.2019.194418] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 05/31/2019] [Revised: 08/21/2019] [Accepted: 08/21/2019] [Indexed: 11/30/2022]
Abstract
Gaussian Graphical Models (GGMs) are tools to infer dependencies between biological variables. Popular applications are the reconstruction of gene, protein, and metabolite association networks. GGMs are an exploratory research tool that can be useful to discover interesting relations between genes (functional clusters) or to identify therapeutically interesting genes, but do not necessarily infer a network in the mechanistic sense. Although GGMs are well investigated from a theoretical and applied perspective, important extensions are not well known within the biological community. GGMs assume, for instance, multivariate normal distributed data. If this assumption is violated Mixed Graphical Models (MGMs) can be the better choice. In this review, we provide the theoretical foundations of GGMs, present extensions such as MGMs or multi-class GGMs, and illustrate how those methods can provide insight in biological mechanisms. We summarize several applications and present user-friendly estimation software. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
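The core idea behind a GGM, that an edge reflects conditional rather than marginal dependence, can be illustrated with a minimal Python sketch. It is not taken from the review: the simulated chain `a -> b -> c`, the sample size, and all variable names are assumptions chosen for illustration. It uses the standard identity that partial correlations are the rescaled, sign-flipped off-diagonal entries of the precision (inverse covariance) matrix. Plain matrix inversion only works when there are more samples than variables; high-dimensional omics data typically need a regularized estimator instead.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical chain a -> b -> c: a and c are related only through b.
a = rng.normal(size=n)
b = a + rng.normal(size=n)
c = b + rng.normal(size=n)
data = np.column_stack([a, b, c])

# Precision matrix = inverse of the sample covariance (requires n > p).
precision = np.linalg.inv(np.cov(data, rowvar=False))

# Partial correlation: rho_ij = -P_ij / sqrt(P_ii * P_jj).
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

# Marginal correlations for comparison.
marginal_corr = np.corrcoef(data, rowvar=False)

print("marginal corr(a, c):", round(marginal_corr[0, 2], 3))
print("partial  corr(a, c | b):", round(partial_corr[0, 2], 3))
```

In this toy setting `a` and `c` are clearly correlated marginally, while their partial correlation given `b` is close to zero, so a GGM would draw no edge between them.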
Affiliation(s)
- Michael Altenbuchinger
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
- Antoine Weihs
  - Department of Psychiatry and Psychotherapy, University Medicine Greifswald, 17475 Greifswald, Germany
- John Quackenbush
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Hans Jörgen Grabe
  - Department of Psychiatry and Psychotherapy, University Medicine Greifswald, 17475 Greifswald, Germany; German Center for Neurodegenerative Diseases DZNE, Site Rostock/Greifswald, 17475 Greifswald, Germany
- Helena U Zacharias
  - Department of Psychiatry and Psychotherapy, University Medicine Greifswald, 17475 Greifswald, Germany.
9
Lausser L, Szekely R, Klimmek A, Schmid F, Kestler HA. Constraining classifiers in molecular analysis: invariance and robustness. J R Soc Interface 2020; 17:20190612. [PMID: 32019472 PMCID: PMC7061712 DOI: 10.1098/rsif.2019.0612] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/03/2020] [Accepted: 01/09/2020] [Indexed: 12/02/2022]
Abstract
Analysing molecular profiles requires the selection of classification models that can cope with the high dimensionality and variability of these data. Also, improper reference point choice and scaling pose additional challenges. Often model selection is somewhat guided by ad hoc simulations rather than by sophisticated considerations on the properties of a categorization model. Here, we derive and report four linked linear concept classes/models with distinct invariance properties for high-dimensional molecular classification. We can further show that these concept classes also form a half-order of complexity classes in terms of Vapnik-Chervonenkis dimensions, which also implies increased generalization abilities. We implemented support vector machines with these properties. Surprisingly, we were able to attain comparable or even superior generalization abilities to the standard linear one on the 27 investigated RNA-Seq and microarray datasets. Our results indicate that a priori chosen invariant models can replace ad hoc robustness analysis by interpretable and theoretically guaranteed properties in molecular categorization.
Affiliation(s)
- Ludwig Lausser
  - Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Robin Szekely
  - Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Attila Klimmek
  - Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Florian Schmid
  - Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Hans A. Kestler
  - Institute of Medical Systems Biology, Ulm University, Ulm, Germany
  - Leibniz Institute on Aging, Jena, Germany
10
A multi-source data integration approach reveals novel associations between metabolites and renal outcomes in the German Chronic Kidney Disease study. Sci Rep 2019; 9:13954. [PMID: 31562371 PMCID: PMC6764972 DOI: 10.1038/s41598-019-50346-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 01/21/2019] [Accepted: 09/09/2019] [Indexed: 01/25/2023]
Abstract
Omics data facilitate the gain of novel insights into the pathophysiology of diseases and, consequently, their diagnosis, treatment, and prevention. To this end, omics data are integrated with other data types, e.g., clinical, phenotypic, and demographic parameters of categorical or continuous nature. We exemplify this data integration issue for a chronic kidney disease (CKD) study, comprising complex clinical, demographic, and one-dimensional 1H nuclear magnetic resonance metabolic variables. Routine analysis screens for associations of single metabolic features with clinical parameters while accounting for confounders typically chosen by expert knowledge. This knowledge can be incomplete or unavailable. We introduce a framework for data integration that intrinsically adjusts for confounding variables. We give its mathematical and algorithmic foundation, provide a state-of-the-art implementation, and evaluate its performance by sanity checks and predictive performance assessment on independent test data. Particularly, we show that discovered associations remain significant after variable adjustment based on expert knowledge. In contrast, we illustrate that associations discovered in routine univariate screening approaches can be biased by incorrect or incomplete expert knowledge. Our data integration approach reveals important associations between CKD comorbidities and metabolites, including novel associations of the plasma metabolite trimethylamine-N-oxide with cardiac arrhythmia and infarction in CKD stage 3 patients.
11
Emwas AH, Roy R, McKay RT, Tenori L, Saccenti E, Gowda GAN, Raftery D, Alahmari F, Jaremko L, Jaremko M, Wishart DS. NMR Spectroscopy for Metabolomics Research. Metabolites 2019; 9:E123. [PMID: 31252628 PMCID: PMC6680826 DOI: 10.3390/metabo9070123] [Citation(s) in RCA: 609] [Impact Index Per Article: 101.5] [Received: 05/19/2019] [Revised: 06/14/2019] [Accepted: 06/18/2019] [Indexed: 12/14/2022]
Abstract
Over the past two decades, nuclear magnetic resonance (NMR) has emerged as one of the three principal analytical techniques used in metabolomics (the other two being gas chromatography coupled to mass spectrometry (GC-MS) and liquid chromatography coupled with single-stage mass spectrometry (LC-MS)). The relative ease of sample preparation, the ability to quantify metabolite levels, the high level of experimental reproducibility, and the inherently nondestructive nature of NMR spectroscopy have made it the preferred platform for long-term or large-scale clinical metabolomic studies. These advantages, however, are often outweighed by the fact that most other analytical techniques, including both LC-MS and GC-MS, are inherently more sensitive than NMR, with lower limits of detection typically being 10 to 100 times better. This review is intended to introduce readers to the field of NMR-based metabolomics and to highlight both the advantages and disadvantages of NMR spectroscopy for metabolomic studies. It will also explore some of the unique strengths of NMR-based metabolomics, particularly with regard to isotope selection/detection, mixture deconvolution via 2D spectroscopy, automation, and the ability to noninvasively analyze native tissue specimens. Finally, this review will highlight a number of emerging NMR techniques and technologies that are being used to strengthen its utility and overcome its inherent limitations in metabolomic applications.
Affiliation(s)
- Abdul-Hamid Emwas
  - Core Labs, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Raja Roy
  - Centre of Biomedical Research, Formerly, Centre of Biomedical Magnetic Resonance, Sanjay Gandhi Post-Graduate Institute of Medical Sciences Campus, Uttar Pradesh 226014, India
- Ryan T McKay
  - Department of Chemistry, University of Alberta, Edmonton, AB T6G 2W2, Canada
- Leonardo Tenori
  - Department of Experimental and Clinical Medicine, University of Florence, Largo Brambilla 3, 50134 Florence, Italy
- Edoardo Saccenti
  - Laboratory of Systems and Synthetic Biology Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, The Netherlands
- G A Nagana Gowda
  - Northwest Metabolomics Research Center, Department of Anesthesiology and Pain Medicine, University of Washington, 850 Republican St., Seattle, WA 98109, USA
- Daniel Raftery
  - Northwest Metabolomics Research Center, Department of Anesthesiology and Pain Medicine, University of Washington, 850 Republican St., Seattle, WA 98109, USA
  - Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue, Seattle, WA 98109, USA
- Fatimah Alahmari
  - Department of NanoMedicine Research, Institute for Research and Medical Consultations (IRMC), Imam Abdulrahman bin Faisal University, Dammam 31441, Saudi Arabia
- Lukasz Jaremko
  - Division of Biological and Environmental Sciences and Engineering (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Mariusz Jaremko
  - Division of Biological and Environmental Sciences and Engineering (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- David S Wishart
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E8, Canada
12
Sun Y, Saito K, Iiji R, Saito Y. Application of Ion Chromatography Coupled with Mass Spectrometry for Human Serum and Urine Metabolomics. SLAS Discovery 2019; 24:778-786. [DOI: 10.1177/2472555219850082] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/29/2023]
Abstract
Biomarkers that indicate the presence or severity of organ damage caused by diseases and toxicities are useful diagnostic tools. Metabolomics platforms using chromatography coupled with mass spectrometry (MS) have been widely used for biomarker screening. In this study, we aimed to establish a novel metabolomics platform using ion chromatography coupled with MS (IC-MS) for human biofluids. We found that ethylenediaminetetraacetic acid (EDTA) plasma is not suitable for IC-MS metabolomics platforms because of the desensitization of MS. IC-MS enabled detection of 131 polar metabolites in human serum and urine from healthy volunteers. Pathway analysis demonstrated that the metabolites detectable using our platform were composed of a broad spectrum of organic acids with carboxylic moieties. These metabolites were significantly associated with pathways such as the tricarboxylic acid (TCA) cycle; glyoxylate and dicarboxylate metabolism; alanine, aspartate, and glutamate metabolism; butanoate metabolism; and the pentose phosphate pathway. Moreover, comparison of serum and urine samples showed that four metabolites (4-hydroxybutyric acid, aspartic acid, lactic acid, and γ-glutamyl glutamine) were abundant in serum, whereas 62 metabolites, including phosphoric acid, vanillylmandelic acid, and N-tiglylglycine, were abundant in urine. In addition, allantoin and uric acid were abundant in male serum, whereas no gender-associated differences were found for polar metabolites in urine. Our results demonstrate that the present established IC-MS metabolomics platform can be applied for analysis of human serum and urine as well as detection of a broad spectrum of polar metabolites in human biofluids.
Affiliation(s)
- Yuchen Sun
  - Division of Medical Safety Science, National Institute of Health Sciences, Kanagawa, Japan
- Kosuke Saito
  - Division of Medical Safety Science, National Institute of Health Sciences, Kanagawa, Japan
- Ryota Iiji
  - Division of Medical Safety Science, National Institute of Health Sciences, Kanagawa, Japan
- Yoshiro Saito
  - Division of Medical Safety Science, National Institute of Health Sciences, Kanagawa, Japan
13
Zacharias HU, Altenbuchinger M, Gronwald W. Statistical Analysis of NMR Metabolic Fingerprints: Established Methods and Recent Advances. Metabolites 2018; 8:E47. [PMID: 30154338 PMCID: PMC6161311 DOI: 10.3390/metabo8030047] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/02/2018] [Revised: 08/01/2018] [Accepted: 08/18/2018] [Indexed: 01/02/2023]
Abstract
In this review, we summarize established and recent bioinformatic and statistical methods for the analysis of NMR-based metabolomics. Data analysis of NMR metabolic fingerprints exhibits several challenges, including unwanted biases, high dimensionality, and typically low sample numbers. Common analysis tasks comprise the identification of differential metabolites and the classification of specimens. However, analysis results strongly depend on the preprocessing of the data, and there is no consensus yet on how to remove unwanted biases and experimental variance prior to statistical analysis. Here, we first review established and new preprocessing protocols and illustrate their pros and cons, including different data normalizations and transformations. Second, we give a brief overview of state-of-the-art statistical analysis in NMR-based metabolomics. Finally, we discuss a recent development in statistical data analysis, where data normalization becomes obsolete. This method, called zero-sum regression, builds metabolite signatures whose estimation as well as predictions are independent of prior normalization.
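The normalization independence attributed to zero-sum regression rests on a simple identity, sketched below in Python. This is an illustrative check rather than the authors' implementation; the data, the coefficient values, and all variable names are hypothetical. When the coefficients of a linear model on log-transformed features sum to zero, multiplying a sample by any positive factor (a dilution or normalization constant, say) adds the same constant to every log-feature, and that constant cancels in the linear predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical positive intensities: 5 samples x 4 features.
X = rng.uniform(1.0, 10.0, size=(5, 4))

# Hypothetical coefficients constrained to sum to zero.
beta = np.array([0.7, -0.2, -0.9, 0.4])

# Arbitrary per-sample scaling factors (e.g. dilution or normalization).
scale = rng.uniform(0.1, 5.0, size=(5, 1))

pred_raw = np.log(X) @ beta
pred_scaled = np.log(scale * X) @ beta

# log(c * x) = log(c) + log(x), and log(c) * sum(beta) = 0.
same_under_zero_sum = np.allclose(pred_raw, pred_scaled)

# Without the zero-sum constraint the predictions depend on the scaling.
beta_free = np.array([0.7, -0.2, -0.9, 0.9])
differs_without = not np.allclose(
    np.log(X) @ beta_free, np.log(scale * X) @ beta_free
)

print(same_under_zero_sum, differs_without)
```

The sketch only verifies the invariance for fixed coefficients; estimating such coefficients requires a solver that enforces the sum-to-zero constraint, which is outside the scope of this example.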
Affiliation(s)
- Helena U Zacharias
  - Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany.
- Michael Altenbuchinger
  - Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Am Biopark 9, 93053 Regensburg, Germany.
- Wolfram Gronwald
  - Institute of Functional Genomics, University of Regensburg, Am Biopark 9, 93053 Regensburg, Germany.
14
Hertel J, Rotter M, Frenzel S, Zacharias HU, Krumsiek J, Rathkolb B, Hrabe de Angelis M, Rabstein S, Pallapies D, Brüning T, Grabe HJ, Wang-Sattler R. Dilution correction for dynamically influenced urinary analyte data. Anal Chim Acta 2018; 1032:18-31. [PMID: 30143216 DOI: 10.1016/j.aca.2018.07.068] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 02/28/2018] [Revised: 06/29/2018] [Accepted: 07/25/2018] [Indexed: 01/03/2023]
Abstract
Urinary analyte data has to be corrected for the sample specific dilution as the dilution varies intra- and interpersonally dramatically, leading to non-comparable concentration measures. Most methods of dilution correction utilized nowadays like probabilistic quotient normalization or total spectra normalization result in a division of the raw data by a dilution correction factor. Here, however, we show that the implicit assumption behind the application of division, log-linearity between the urinary flow rate and the raw urinary concentration, does not hold for analytes which are not in steady state in blood. We explicate the physiological reason for this short-coming in mathematical terms and demonstrate the empirical consequences via simulations and on multiple time-point metabolomic data, showing the insufficiency of division-based normalization procedures to account for the complex non-linear analyte specific dependencies on the urinary flow rate. By reformulating normalization as a regression problem, we propose an analyte specific way to remove the dilution variance via a flexible non-linear regression methodology which then was shown to be more effective in comparison to division-based normalization procedures. In the progress, we developed several, easily applicable methods of normalization diagnostics to decide on the method of dilution correction in a given sample. On the way, we identified furthermore the time-span since last urination as an important variance factor in urinary metabolome data which is until now completely neglected. In conclusion, we present strong theoretical and empirical evidence that normalization has to be analyte specific in dynamically influenced data. Accordingly, we developed a normalization methodology for removing the dilution variance in urinary data respecting the single analyte kinetics.
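For orientation, the division-based approach that the abstract criticizes can be sketched in a few lines of Python. This is a simplified version of probabilistic quotient normalization, not the authors' code and not their proposed regression method; the function name `pqn_normalize`, the simulated profile, and the dilution factors are all hypothetical. Each sample is divided by a single factor, the median of its feature-wise quotients against a reference spectrum.

```python
import numpy as np

def pqn_normalize(X):
    """Simplified probabilistic quotient normalization (rows = samples)."""
    reference = np.median(X, axis=0)         # reference spectrum across samples
    quotients = X / reference                # feature-wise quotients per sample
    factors = np.median(quotients, axis=1)   # one dilution factor per sample
    return X / factors[:, None], factors

rng = np.random.default_rng(2)
base = rng.uniform(1.0, 10.0, size=8)        # hypothetical undiluted profile
dilution = np.array([0.5, 1.0, 2.0, 4.0])    # hypothetical dilution factors
X = dilution[:, None] * base                 # samples differing only by dilution

X_norm, factors = pqn_normalize(X)
print(np.allclose(X_norm, X_norm[0]))
```

In this idealized case every analyte scales identically with dilution, so a single divisor per sample removes the dilution variance completely. The abstract's argument is that this assumption fails for analytes that are not in steady state in blood, which is why the authors propose an analyte-specific correction instead.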
Affiliation(s)
- Johannes Hertel
  - Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Germany.
- Markus Rotter
  - Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, Germany; Institute of Epidemiology, Helmholtz Zentrum München, Germany
- Stefan Frenzel
  - Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Germany
- Jan Krumsiek
  - Institute of Computational Biology, Helmholtz Zentrum München, Germany; Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, USA
- Birgit Rathkolb
  - German Center for Diabetes Research (DZD), München, Germany; Chair for Molecular Animal Breeding and Biotechnology, Gene Center and Department of Veterinary Sciences, and Center for Innovative Medical Models (CiMM), Ludwig Maximilian University of Munich, Germany; German Mouse Clinic (GMC), Institute of Experimental Genetics, Helmholtz Zentrum München, Germany
- Martin Hrabe de Angelis
  - German Center for Diabetes Research (DZD), München, Germany; Institute of Experimental Genetics, Helmholtz Zentrum München, Germany; Chair of Experimental Genetics, Center of Life and Food Sciences Weihenstephan, Technische Universität München, Germany
- Sylvia Rabstein
  - Institute for Prevention and Occupational Medicine of the German Social Accident Insurance, Institute of the Ruhr-Universität Bochum (IPA), Germany
- Dirk Pallapies
  - Institute for Prevention and Occupational Medicine of the German Social Accident Insurance, Institute of the Ruhr-Universität Bochum (IPA), Germany
- Thomas Brüning
  - Institute for Prevention and Occupational Medicine of the German Social Accident Insurance, Institute of the Ruhr-Universität Bochum (IPA), Germany
- Hans J Grabe
  - Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Germany; German Center for Neurodegenerative Diseases (DZNE), Site Rostock/Greifswald, Germany
- Rui Wang-Sattler
  - Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, Germany; Institute of Epidemiology, Helmholtz Zentrum München, Germany; German Center for Diabetes Research (DZD), München, Germany