Reference Citation Analysis

For: Tang ZZ, Chen G. Zero-inflated generalized Dirichlet multinomial regression model for microbiome compositional data analysis. Biostatistics 2019;20:698-713. [PMID: 29939212 PMCID: PMC7410344 DOI: 10.1093/biostatistics/kxy025] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 12/20/2017] [Revised: 04/26/2018] [Accepted: 05/06/2018] [Indexed: 12/19/2022] Open

Cited by Other Article(s)

Deek RA, Ma S, Lewis J, Li H. Statistical and computational methods for integrating microbiome, host genomics, and metabolomics data. eLife 2024;13:e88956. [PMID: 38832759 PMCID: PMC11149933 DOI: 10.7554/elife.88956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 05/10/2024] [Indexed: 06/05/2024] Open

Chi J, Ye J, Zhou Y. A GLM-based zero-inflated generalized Poisson factor model for analyzing microbiome data. Front Microbiol 2024;15:1394204. [PMID: 38873138 PMCID: PMC11173601 DOI: 10.3389/fmicb.2024.1394204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 05/20/2024] [Indexed: 06/15/2024] Open

Ozminkowski S, Solís‐Lemus C. Identifying microbial drivers in biological phenotypes with a Bayesian network regression model. Ecol Evol 2024;14:e11039. [PMID: 38774136 PMCID: PMC11106058 DOI: 10.1002/ece3.11039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 01/29/2024] [Accepted: 02/03/2024] [Indexed: 05/24/2024] Open

Chi J, Ye J, Zhou Y. Mapping QTL controlling count traits with excess zeros and ones using a zero-and-one-inflated generalized Poisson regression model. Biom J 2024;66:e2200342. [PMID: 38616336 DOI: 10.1002/bimj.202200342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 11/26/2023] [Accepted: 12/08/2023] [Indexed: 04/16/2024]

Koslovsky MD. A Bayesian zero-inflated Dirichlet-multinomial regression model for multivariate compositional count data. Biometrics 2023;79:3239-3251. [PMID: 36896642 DOI: 10.1111/biom.13853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 02/23/2023] [Indexed: 03/11/2023]

Ribaud M, Gabriel E, Hughes J, Soubeyrand S. Identifying potential significant factors impacting zero-inflated proportion data. Stat Med 2023;42:3467-3486. [PMID: 37290435 DOI: 10.1002/sim.9814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 04/03/2023] [Accepted: 05/19/2023] [Indexed: 06/10/2023]

Boshuizen HC, Te Beest DE. Pitfalls in the statistical analysis of microbiome amplicon sequencing data. Mol Ecol Resour 2023;23:539-548. [PMID: 36330663 DOI: 10.1111/1755-0998.13730] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/16/2022] [Accepted: 10/27/2022] [Indexed: 11/06/2022]

Aldirawi H, Morales FG. Univariate and Multivariate Statistical Analysis of Microbiome Data: An Overview. Appl Microbiol 2023. [DOI: 10.3390/applmicrobiol3020023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]

Zhao X, Zhang J, Lin W. Clustering multivariate count data via Dirichlet-multinomial network fusion. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2022.107634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]

Jiang R, Zhan X, Wang T. A Flexible Zero-Inflated Poisson-Gamma Model with Application to Microbiome Sequence Count Data. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2151447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]

Li Z, Yu X, Guo H, Lee T, Hu J. A maximum-type microbial differential abundance test with application to high-dimensional microbiome data analyses. Front Cell Infect Microbiol 2022;12:988717. [PMID: 36389165 PMCID: PMC9650337 DOI: 10.3389/fcimb.2022.988717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Accepted: 10/04/2022] [Indexed: 12/03/2022] Open

Love CJ, Gubert C, Kodikara S, Kong G, Lê Cao KA, Hannan AJ. Microbiota DNA isolation, 16S rRNA amplicon sequencing, and bioinformatic analysis for bacterial microbiome profiling of rodent fecal samples. STAR Protoc 2022;3:101772. [PMID: 36313541 PMCID: PMC9597187 DOI: 10.1016/j.xpro.2022.101772] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/27/2022] Open

Identification of microbial features in multivariate regression under false discovery rate control. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]

Jensen AJ, Kelly RP, Anderson EC, Satterthwaite WH, Shelton AO, Ward EJ. Introducing zoid: A mixture model and R package for modeling proportional data with zeros and ones in ecology. Ecology 2022;103:e3804. [PMID: 35804486 DOI: 10.1002/ecy.3804] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/24/2022] [Revised: 04/26/2022] [Accepted: 04/29/2022] [Indexed: 11/08/2022]

Verster A, Petronella N, Green J, Matias F, Brooks SPJ. A Bayesian method for identifying associations between response variables and bacterial community composition. PLoS Comput Biol 2022;18:e1010108. [PMID: 35793382 PMCID: PMC9307184 DOI: 10.1371/journal.pcbi.1010108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2021] [Revised: 07/22/2022] [Accepted: 04/14/2022] [Indexed: 11/18/2022] Open

Abstract

Determining associations between intestinal bacteria and continuously measured physiological outcomes is important for understanding the bacteria-host relationship but is not straightforward since abundance data (compositional data) are not normally distributed. To address this issue, we developed a fully Bayesian linear regression model (BRACoD; Bayesian Regression Analysis of Compositional Data) with physiological measurements (continuous data) as a function of a matrix of relative bacterial abundances. Bacteria can be classified as operational taxonomic units or by taxonomy (genus, family, etc.). Bacteria associated with the physiological measurement were identified using a Bayesian variable selection method: Stochastic Search Variable Selection. The output is a list of inclusion probabilities ([Formula: see text]) and coefficients that indicate the strength of the association ([Formula: see text]) for each bacterial taxon. Tests with simulated communities showed that adopting a cut point value of [Formula: see text] ≥ 0.3 for identifying included bacteria optimized the true positive rate (TPR) while maintaining a false positive rate (FPR) of ≤ 5%. At this point, the chances of identifying non-contributing bacteria were low and all well-established contributors were included. Comparison with other methods showed that BRACoD (at [Formula: see text] ≥ 0.3) had higher precision and a higher TPR than a commonly used center log transformed LASSO procedure (clr-LASSO) as well as higher TPR than an off-the-shelf Spike and Slab method after center log transformation (clr-SS). BRACoD was also less likely to include non-contributing bacteria that merely correlate with contributing bacteria. Analysis of a rat microbiome experiment identified 47 operational taxonomic units that contributed to fecal butyrate levels. Of these, 31 were positively and 16 negatively associated with butyrate. Consistent with their known role in butyrate metabolism, most of these fell within the Lachnospiraceae and Ruminococcaceae. We conclude that BRACoD provides a more precise and accurate method for determining bacteria associated with a continuous physiological outcome compared to clr-LASSO. It is more sensitive than a generalized clr-SS algorithm, although it has a higher FPR. Its ability to distinguish genuine contributors from correlated bacteria makes it better suited to discriminating bacteria that directly contribute to an outcome. The algorithm corrects for the distortions arising from compositional data, making it appropriate for analysis of microbiome data.
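
The cut-point rule described in this abstract (include a taxon when its inclusion probability is at least 0.3, then judge the rule by TPR and FPR on simulated data) can be sketched as follows. This is only an illustration and not the BRACoD implementation: the function names, taxon labels, and probabilities are all hypothetical, and the inclusion probabilities are assumed to have been produced by some earlier variable-selection step.

```python
def select_taxa(inclusion_probs, cutoff=0.3):
    """Return the taxa whose inclusion probability is at least `cutoff`."""
    return [name for name, p in inclusion_probs.items() if p >= cutoff]


def rates(selected, true_contributors, all_taxa):
    """Compute (TPR, FPR) for a selection against a known simulated truth."""
    selected = set(selected)
    truth = set(true_contributors)
    negatives = set(all_taxa) - truth
    tpr = len(selected & truth) / len(truth) if truth else float("nan")
    fpr = len(selected & negatives) / len(negatives) if negatives else float("nan")
    return tpr, fpr


# Hypothetical inclusion probabilities for four simulated taxa.
example_probs = {"taxon_a": 0.92, "taxon_b": 0.31, "taxon_c": 0.12, "taxon_d": 0.04}
example_truth = {"taxon_a", "taxon_b"}  # assumed true contributors in the simulation

chosen = select_taxa(example_probs, cutoff=0.3)
print(chosen, rates(chosen, example_truth, example_probs))
```

Sweeping `cutoff` over a grid and recomputing the two rates is one way such a cut point could be chosen on simulated communities.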

Wu Q, O’Malley J, Datta S, Gharaibeh RZ, Jobin C, Karagas MR, Coker MO, Hoen AG, Christensen BC, Madan JC, Li Z. MarZIC: A Marginal Mediation Model for Zero-Inflated Compositional Mediators with Applications to Microbiome Data. Genes (Basel) 2022;13:1049. [PMID: 35741811 PMCID: PMC9223163 DOI: 10.3390/genes13061049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Revised: 06/06/2022] [Accepted: 06/07/2022] [Indexed: 12/15/2022] Open

Zeng Y, Pang D, Zhao H, Wang T. A Zero-inflated Logistic Normal Multinomial Model for Extracting Microbial Compositions. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2044827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

Alenazi A. A review of compositional data analysis and recent advances. Commun Stat Theory Methods 2021. [DOI: 10.1080/03610926.2021.2014890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

Tang M, Wu Q, Yang S, Tian G. Dirichlet composition distribution for compositional data with zero components: An application to fluorescence in situ hybridization (FISH) detection of chromosome. Biom J 2021;64:714-732. [PMID: 34914842 PMCID: PMC9300144 DOI: 10.1002/bimj.202000334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2020] [Revised: 08/24/2021] [Accepted: 08/31/2021] [Indexed: 11/26/2022]

Ostner J, Carcy S, Müller CL. tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data. Front Genet 2021;12:766405. [PMID: 34950190 PMCID: PMC8689185 DOI: 10.3389/fgene.2021.766405] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/29/2021] [Accepted: 11/01/2021] [Indexed: 12/11/2022] Open

Chen B, Xu W. Functional response regression model on correlated longitudinal microbiome sequencing data. Stat Methods Med Res 2021;31:361-371. [PMID: 34866471 PMCID: PMC8829735 DOI: 10.1177/09622802211061634] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]

Correcting the Estimation of Viral Taxa Distributions in Next-Generation Sequencing Data after Applying Artificial Neural Networks. Genes (Basel) 2021;12:genes12111755. [PMID: 34828361 PMCID: PMC8624964 DOI: 10.3390/genes12111755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2021] [Revised: 10/25/2021] [Accepted: 10/27/2021] [Indexed: 11/16/2022] Open

Srinivasan A, Xue L, Zhan X. Compositional knockoff filter for high-dimensional regression analysis of microbiome data. Biometrics 2021;77:984-995. [PMID: 32683674 PMCID: PMC7831267 DOI: 10.1111/biom.13336] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/23/2019] [Revised: 06/29/2020] [Accepted: 07/09/2020] [Indexed: 01/10/2023]

Stevens BR, Roesch L, Thiago P, Russell JT, Pepine CJ, Holbert RC, Raizada MK, Triplett EW. Depression phenotype identified by using single nucleotide exact amplicon sequence variants of the human gut microbiome. Mol Psychiatry 2021;26:4277-4287. [PMID: 31988436 DOI: 10.1038/s41380-020-0652-5] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 07/25/2019] [Revised: 01/13/2020] [Accepted: 01/16/2020] [Indexed: 12/15/2022]

Zhou C, Zhao H, Wang T. Transformation and differential abundance analysis of microbiome data incorporating phylogeny. Bioinformatics 2021;37:4652-4660. [PMID: 34302462 DOI: 10.1093/bioinformatics/btab543] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/06/2021] [Revised: 05/31/2021] [Accepted: 07/22/2021] [Indexed: 12/13/2022] Open

Abstract

MOTIVATION

Microbiome data have proven extremely useful for understanding microbial communities and their impacts in health and disease. Although microbiome analysis methods and standards are evolving rapidly, obtaining meaningful and interpretable results from microbiome studies still requires careful statistical treatment. In particular, many existing and emerging methods for differential abundance analysis fail to account for the fact that microbiome data are high-dimensional and sparse, compositional, negatively and positively correlated, and phylogenetically structured. To better describe microbiome data and improve the power of differential abundance testing, there is still a great need for the continued development of appropriate statistical methodology.

RESULTS

In this paper, we propose a model-based approach for microbiome data transformation, and a phylogenetically informed procedure for differential abundance (DA) testing based on the transformed data. First, we extend the Dirichlet-tree multinomial (DTM) to zero-inflated DTM (ZIDTM) for multivariate modeling of microbial counts, addressing data sparsity, and correlation and phylogeny among bacterial taxa. Then, within this framework and using a Bayesian formulation, we introduce posterior mean transformation to convert raw counts into nonzero relative abundances that sum to one, accounting for the compositional nature of microbiome data. Second, using the transformed data, we propose adaptive analysis of composition of microbiomes (adaANCOM) for DA testing by constructing log-ratios adaptively on the tree for each taxon, greatly reducing the computational complexity of ANCOM in high dimensions. Finally, we present extensive simulation studies, an analysis of HMP data across 18 body sites and 2 visits, and an application to a gut microbiome and malnutrition study, to investigate the performance of posterior mean transformation and adaANCOM. Comparisons with ANCOM and other DA testing procedures show that adaANCOM controls the false discovery rate well, allows for easy interpretation of the results, and is computationally efficient for high-dimensional problems.

AVAILABILITY

The developed R package is available at https://github.com/ZRChao/adaANCOM. For replicability purposes, scripts for our simulations and data analysis are available at https://github.com/ZRChao/Papers_supplementary.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
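
The idea of a posterior mean transformation mentioned in this abstract (turning raw counts, including zeros, into strictly positive relative abundances that sum to one) can be illustrated with a much simpler model than the one the paper uses. The sketch below assumes a plain Dirichlet prior on a single multinomial sample, not the zero-inflated Dirichlet-tree multinomial; the function name and the prior values are hypothetical.

```python
def posterior_mean_composition(counts, alpha):
    """Posterior mean of multinomial proportions under a Dirichlet(alpha) prior.

    With counts x_1..x_K and prior parameters alpha_1..alpha_K > 0, the
    posterior is Dirichlet(x + alpha), whose mean for taxon j is
    (x_j + alpha_j) / (sum(x) + sum(alpha)).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("all Dirichlet parameters must be positive")
    total = sum(counts) + sum(alpha)
    return [(x + a) / total for x, a in zip(counts, alpha)]


# Hypothetical sample with one unobserved taxon; alpha = 0.5 is an assumed prior.
raw_counts = [10, 0, 5]
composition = posterior_mean_composition(raw_counts, [0.5, 0.5, 0.5])
print(composition)
```

Because every prior parameter is positive, the zero count maps to a small positive proportion, which is what makes log-ratio constructions on the transformed data well defined.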

Shuler K, Verbanic S, Chen IA, Lee J. A Bayesian nonparametric analysis for zero‐inflated multivariate count data with application to microbiome study. J R Stat Soc Ser C Appl Stat 2021. [DOI: 10.1111/rssc.12493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

Fiksel J, Datta A, Amouzou A, Zeger S. Generalized Bayes Quantification Learning under Dataset Shift. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1909599] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]

Data Analysis Strategies for Microbiome Studies in Human Populations-a Systematic Review of Current Practice. mSystems 2021;6:6/1/e01154-20. [PMID: 33622856 PMCID: PMC8573962 DOI: 10.1128/msystems.01154-20] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022] Open

Abstract

Reproducibility is a major issue in microbiome studies, which is partly caused by a lack of consensus about data analysis strategies. The complex nature of microbiome data, which are high-dimensional, zero-inflated, and compositional, makes them challenging to analyze, as they often violate assumptions of classic statistical methods. With advances in human microbiome research, research questions and study designs increase in complexity so that more sophisticated data analysis concepts are applied. To improve current practice of the analysis of microbiome studies, it is important to understand what kind of research questions are asked and which tools are used to answer these questions. We conducted a systematic literature review considering all publications focusing on the analysis of human microbiome data from June 2018 to June 2019. Of 1,444 studies screened, 419 fulfilled the inclusion criteria. Information about research questions, study designs, and analysis strategies was extracted. The results confirmed the expected shift to more advanced research questions, as one-third of the studies analyzed clustered data. Although heterogeneity in the methods used was found at every stage of the analysis process, it was largest for differential abundance testing. Especially if the underlying data structure was clustered, we identified a lack of use of methods that appropriately addressed the underlying data structure while taking into account additional dependencies in the data. Our results confirm considerable heterogeneity in analysis strategies among microbiome studies; increasingly complex research questions require better guidance for analysis strategies.

IMPORTANCE The human microbiome has emerged as an important factor in the development of health and disease. Growing interest in this topic has led to an increasing number of studies investigating the human microbiome using high-throughput sequencing methods. However, the development of suitable analytical methods for analyzing microbiome data has not kept pace with the rapid progression in the field. It is crucial to understand current practice to identify the scope for development. Our results highlight the need for an extensive evaluation of the strengths and shortcomings of existing methods in order to guide the choice of proper analysis strategies. We have identified where new methods could be designed to address more advanced research questions while taking into account the complex structure of the data.

Li Z, Tian L, O’Malley AJ, Karagas MR, Hoen AG, Christensen BC, Madan JC, Wu Q, Gharaibeh RZ, Jobin C, Li H. IFAA: Robust Association Identification and Inference for Absolute Abundance in Microbiome Analyses. J Am Stat Assoc 2021;116:1595-1608. [PMID: 35241863 PMCID: PMC8890673 DOI: 10.1080/01621459.2020.1860770] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/10/2019] [Revised: 09/30/2020] [Accepted: 12/03/2020] [Indexed: 12/15/2022]

Deek RA, Li H. A Zero-Inflated Latent Dirichlet Allocation Model for Microbiome Studies. Front Genet 2021;11:602594. [PMID: 33552122 PMCID: PMC7862749 DOI: 10.3389/fgene.2020.602594] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/03/2020] [Accepted: 12/29/2020] [Indexed: 11/13/2022] Open

Hu YJ, Lane A, Satten GA. A rarefaction-based extension of the LDM for testing presence-absence associations in the microbiome. Bioinformatics 2021;37:1652-1657. [PMID: 33479757 PMCID: PMC8289387 DOI: 10.1093/bioinformatics/btab012] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/28/2020] [Revised: 12/16/2020] [Accepted: 01/05/2021] [Indexed: 12/13/2022] Open

Abstract

MOTIVATION

Many methods for testing association between the microbiome and covariates of interest (e.g., clinical outcomes, environmental factors) assume that these associations are driven by changes in the relative abundance of taxa. However, these associations may also result from changes in which taxa are present and which are absent. Analyses of such presence-absence associations face a unique challenge: confounding by library size (total sample read count), which occurs when library size is associated with covariates in the analysis. It is known that rarefaction (subsampling to a common library size) controls this bias, but at the potential cost of information loss as well as the introduction of a stochastic component into the analysis. Currently, there is a need for robust and efficient methods for testing presence-absence associations in the presence of such confounding, both at the community level and at the individual-taxon level, that avoid the drawbacks of rarefaction.

RESULTS

We have previously developed the linear decomposition model (LDM) that unifies the community-level and taxon-level tests into one framework. Here we present an extension of the LDM for testing presence-absence associations. The extended LDM is a non-stochastic approach that repeatedly applies the LDM to all rarefied taxa count tables, averages the residual sum-of-squares (RSS) terms over the rarefaction replicates, and then forms an F-statistic based on these average RSS terms. We show that this approach compares favorably to averaging the F-statistic from R rarefaction replicates, which can only be calculated stochastically. The flexible nature of the LDM allows discrete or continuous traits or interactions to be tested while allowing confounding covariates to be adjusted for. Our simulations indicate that our proposed method is robust to any systematic differences in library size and has better power than alternative approaches. We illustrate our method using an analysis of data on inflammatory bowel disease (IBD) in which cases have systematically smaller library sizes than controls.

AVAILABILITY

The R package LDM is available on GitHub at https://github.com/yijuanhu/LDM in formats appropriate for Macintosh or Windows.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
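
The rarefaction step this abstract refers to (subsampling each sample's reads, without replacement, to a common library size before recording presence or absence) can be sketched as below. This covers only the subsampling and the presence-absence coding; it is not the LDM or its averaging of residual sums of squares, and the function names, counts, and depth are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's taxa counts."""
    total = sum(counts)
    if depth > total:
        raise ValueError("rarefaction depth exceeds the sample's library size")
    # Expand the count vector into one entry per read, labelled by taxon index.
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    sampled = Counter(rng.sample(reads, depth))
    return [sampled.get(taxon, 0) for taxon in range(len(counts))]


def presence_absence(counts):
    """Code each taxon as 1 if it has at least one read, else 0."""
    return [1 if c > 0 else 0 for c in counts]


# Hypothetical sample with four taxa, rarefied to an assumed depth of 20 reads.
rng = random.Random(0)
rarefied = rarefy([50, 3, 0, 47], depth=20, rng=rng)
print(rarefied, presence_absence(rarefied))
```

Repeating the call with different seeds shows the stochastic component the abstract mentions: a rare taxon may be present in one rarefied replicate and absent in the next.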

Chen B, Xu W. Generalized estimating equation modeling on correlated microbiome sequencing data with longitudinal measures. PLoS Comput Biol 2020;16:e1008108. [PMID: 32898133 PMCID: PMC7500673 DOI: 10.1371/journal.pcbi.1008108] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/27/2020] [Revised: 09/18/2020] [Accepted: 06/30/2020] [Indexed: 11/19/2022] Open

Liu T, Zhao H, Wang T. An empirical Bayes approach to normalization and differential abundance testing for microbiome data. BMC Bioinformatics 2020;21:225. [PMID: 32493208 PMCID: PMC7268703 DOI: 10.1186/s12859-020-03552-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/18/2019] [Accepted: 05/18/2020] [Indexed: 12/14/2022] Open

Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Prog Mol Biol Transl Sci 2020;171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 02/07/2023]

Abstract

Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that concepts of correlation and association analyses have been shifted by introducing network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach. Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in the analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and the future direction of association analysis in microbiome and multiomics studies.

Cullen CM, Aneja KK, Beyhan S, Cho CE, Woloszynek S, Convertino M, McCoy SJ, Zhang Y, Anderson MZ, Alvarez-Ponce D, Smirnova E, Karstens L, Dorrestein PC, Li H, Sen Gupta A, Cheung K, Powers JG, Zhao Z, Rosen GL. Emerging Priorities for Microbiome Research. Front Microbiol 2020;11:136. [PMID: 32140140 PMCID: PMC7042322 DOI: 10.3389/fmicb.2020.00136] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 08/13/2019] [Accepted: 01/21/2020] [Indexed: 12/12/2022] Open

Abstract

Microbiome research has increased dramatically in recent years, driven by advances in technology and significant reductions in the cost of analysis. Such research has unlocked a wealth of data, which has yielded tremendous insight into the nature of the microbial communities, including their interactions and effects, both within a host and in an external environment as part of an ecological community. Understanding the role of microbiota, including their dynamic interactions with their hosts and other microbes, can enable the engineering of new diagnostic techniques and interventional strategies that can be used in a diverse spectrum of fields, spanning from ecology and agriculture to medicine and from forensics to exobiology. From June 19 to 23, 2017, the NIH and NSF jointly held an Innovation Lab on Quantitative Approaches to Biomedical Data Science Challenges in our Understanding of the Microbiome. This review is inspired by some of the topics that arose as priority areas from this unique, interactive workshop. The goal of this review is to summarize the Innovation Lab's findings by introducing the reader to emerging challenges, exciting potential, and current directions in microbiome research. The review is broken into five key topic areas: (1) interactions between microbes and the human body, (2) evolution and ecology of microbes, including the role played by the environment and microbe-microbe interactions, (3) analytical and mathematical methods currently used in microbiome research, (4) leveraging knowledge of microbial composition and interactions to develop engineering solutions, and (5) interventional approaches and engineered microbiota that may be enabled by selectively altering microbial composition. As such, this review seeks to arm the reader with a broad understanding of the priorities and challenges in microbiome research today and provide inspiration for future investigation and multi-disciplinary collaboration.

Affiliation(s)

Chad M. Cullen: School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States
Kawalpreet K. Aneja: The School District of Philadelphia, Philadelphia, PA, United States
Sinem Beyhan: Department of Infectious Diseases, J. Craig Venter Institute, La Jolla, CA, United States
Clara E. Cho: Department of Nutrition, Dietetics and Food Sciences, Utah State University, Logan, UT, United States
Stephen Woloszynek: Ecological and Evolutionary Signal-processing and Informatics Laboratory (EESI), Electrical and Computer Engineering, Drexel University, Philadelphia, PA, United States; College of Medicine, Drexel University, Philadelphia, PA, United States
Matteo Convertino: Nexus Group, Faculty of Information Science and Technology, Gi-CoRE Station for Big Data & Cybersecurity, Hokkaido University, Sapporo, Japan
Sophie J. McCoy: Department of Biological Science, Florida State University, Tallahassee, FL, United States
Yanyan Zhang: Department of Civil Engineering, New Mexico State University, Las Cruces, NM, United States
Matthew Z. Anderson: Department of Microbiology, The Ohio State University, Columbus, OH, United States; Department of Microbial Infection and Immunity, The Ohio State University, Columbus, OH, United States
David Alvarez-Ponce: Department of Biology, University of Nevada, Reno, Reno, NV, United States
Ekaterina Smirnova: Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, United States
Lisa Karstens: Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, United States; Department of Obstetrics and Gynecology, Oregon Health & Science University, Portland, OR, United States
Pieter C. Dorrestein: Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, San Diego, CA, United States
Hongzhe Li: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
Ananya Sen Gupta: Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA, United States
Kevin Cheung: Department of Dermatology, The University of Iowa, Iowa City, IA, United States
Jennifer Gloeckner Powers: Department of Dermatology, The University of Iowa, Iowa City, IA, United States
Zhengqiao Zhao: Ecological and Evolutionary Signal-processing and Informatics Laboratory (EESI), Electrical and Computer Engineering, Drexel University, Philadelphia, PA, United States
Gail L. Rosen: School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States; Ecological and Evolutionary Signal-processing and Informatics Laboratory (EESI), Electrical and Computer Engineering, Drexel University, Philadelphia, PA, United States

Jiang D, Armour CR, Hu C, Mei M, Tian C, Sharpton TJ, Jiang Y. Microbiome Multi-Omics Network Analysis: Statistical Considerations, Limitations, and Opportunities. Front Genet 2019;10:995. [PMID: 31781153 PMCID: PMC6857202 DOI: 10.3389/fgene.2019.00995] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Received: 02/13/2019] [Accepted: 09/18/2019] [Indexed: 12/21/2022] Open

Song Y, Zhao H, Wang T. An adaptive independence test for microbiome community data. Biometrics 2019;76:414-426. [DOI: 10.1111/biom.13154] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/11/2018] [Accepted: 09/16/2019] [Indexed: 11/29/2022]

Tang ZZ, Chen G. Robust and Powerful Differential Composition Tests for Clustered Microbiome Data. Statistics in Biosciences 2019. [DOI: 10.1007/s12561-019-09251-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 02/07/2023]

Tang ZZ, Chen G, Hong Q, Huang S, Smith HM, Shah RD, Scholz M, Ferguson JF. Multi-Omic Analysis of the Microbiome and Metabolome in Healthy Subjects Reveals Microbiome-Dependent Relationships Between Diet and Metabolites. Front Genet 2019;10:454. [PMID: 31164901 PMCID: PMC6534069 DOI: 10.3389/fgene.2019.00454] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 01/29/2019] [Accepted: 04/30/2019] [Indexed: 12/22/2022] Open