1
He M, Zhao N, Satten GA. MIDASim: a fast and simple simulator for realistic microbiome data. Microbiome 2024; 12:135. [PMID: 39039570 PMCID: PMC11264979 DOI: 10.1186/s40168-024-01822-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Accepted: 04/22/2024] [Indexed: 07/24/2024]
Abstract
BACKGROUND Advances in sequencing technology have led to the discovery of associations between the human microbiota and many diseases, conditions, and traits. With the increasing availability of microbiome data, many statistical methods have been developed for studying these associations. The growing number of newly developed methods highlights the need for simple, rapid, and reliable methods to simulate realistic microbiome data, which is essential for validating and evaluating the performance of these methods. However, generating realistic microbiome data is challenging due to the complex nature of microbiome data, which feature correlation between taxa, sparsity, overdispersion, and compositionality. Current methods for simulating microbiome data are deficient in their ability to capture these important features of microbiome data, or can require exorbitant computational time. METHODS We develop MIDASim (MIcrobiome DAta Simulator), a fast and simple approach for simulating realistic microbiome data that reproduces the distributional and correlation structure of a template microbiome dataset. MIDASim is a two-step approach. The first step generates correlated binary indicators that represent the presence-absence status of all taxa, and the second step generates relative abundances and counts for the taxa that are considered to be present in step 1, utilizing a Gaussian copula to account for the taxon-taxon correlations. In the second step, MIDASim can operate in both a nonparametric and parametric mode. In the nonparametric mode, the Gaussian copula uses the empirical distribution of relative abundances for the marginal distributions. In the parametric mode, a generalized gamma distribution is used in place of the empirical distribution. RESULTS We demonstrate improved performance of MIDASim relative to other existing methods using gut and vaginal data.
MIDASim showed superior performance by PERMANOVA and in terms of alpha diversity and beta dispersion in either parametric or nonparametric mode. We also show how MIDASim in parametric mode can be used to assess the performance of methods for finding differentially abundant taxa in a compositional model. CONCLUSIONS MIDASim is easy to implement, flexible, and suitable for most microbiome data simulation situations. MIDASim has three major advantages. First, MIDASim performs better in reproducing the distributional features of real data compared to other methods, at both the presence-absence level and the relative-abundance level. MIDASim-simulated data are more similar to the template data than data simulated by competing methods, as quantified using a variety of measures. Second, MIDASim makes few distributional assumptions for the relative abundances, and thus can easily accommodate complex distributional features in real data. Third, MIDASim is computationally efficient and can be used to simulate large microbiome datasets.
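The two-step idea described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MIDASim implementation: the function names (`cholesky`, `copula_uniforms`, `simulate_sample`) are hypothetical, the two correlation matrices are assumed to be supplied by the caller rather than estimated from the template, and only the nonparametric mode (empirical margins) is shown.

```python
import math
import random
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def copula_uniforms(chol, rng):
    """Draw one vector of correlated uniforms from a Gaussian copula."""
    z = [rng.gauss(0.0, 1.0) for _ in chol]
    corr_z = [sum(chol[i][k] * z[k] for k in range(i + 1))
              for i in range(len(chol))]
    return [NormalDist().cdf(v) for v in corr_z]


def simulate_sample(template, corr_presence, corr_abundance, rng):
    """Simulate one sample of relative abundances from a template.

    template: list of samples, each a list of relative abundances per taxon.
    corr_presence, corr_abundance: correlation matrices (assumed inputs).
    """
    n_taxa = len(template[0])
    prevalence = [sum(s[j] > 0 for s in template) / len(template)
                  for j in range(n_taxa)]
    nonzero = [sorted(s[j] for s in template if s[j] > 0)
               for j in range(n_taxa)]

    # Step 1: correlated presence-absence indicators, matching each
    # taxon's observed prevalence in the template.
    u1 = copula_uniforms(cholesky(corr_presence), rng)
    present = [u1[j] < prevalence[j] for j in range(n_taxa)]

    # Step 2: abundances for present taxa, using the empirical
    # distribution of non-zero template values as the margin.
    u2 = copula_uniforms(cholesky(corr_abundance), rng)
    raw = []
    for j in range(n_taxa):
        if present[j]:
            idx = min(int(u2[j] * len(nonzero[j])), len(nonzero[j]) - 1)
            raw.append(nonzero[j][idx])
        else:
            raw.append(0.0)

    # Renormalise so the simulated sample is compositional.
    total = sum(raw)
    return [v / total for v in raw] if total > 0 else raw
```

Calling `simulate_sample` repeatedly with a seeded `random.Random` gives reproducible draws. Swapping the empirical quantile lookup for a fitted parametric quantile function would correspond to the parametric mode mentioned in the abstract.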
Affiliation(s)
- Mengyu He
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30329, USA
- Ni Zhao
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, 21205, USA
- Glen A Satten
  - Department of Gynecology and Obstetrics, Emory University, Atlanta, GA, 30329, USA
2
Bommana S, Hu YJ, Kama M, Wang R, Kodimerla R, Jijakli K, Read TD, Dean D. Unique microbial diversity, community composition, and networks among Pacific Islander endocervical and vaginal microbiomes with and without Chlamydia trachomatis infection in Fiji. mBio 2024; 15:e0306323. [PMID: 38117091 PMCID: PMC10790706 DOI: 10.1128/mbio.03063-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 11/15/2023] [Indexed: 12/21/2023] Open
Abstract
IMPORTANCE Chlamydia trachomatis (Ct) is the most common sexually transmitted bacterium globally. Endocervical and vaginal microbiome interactions are rarely examined within the context of Ct or among vulnerable populations. We evaluated 258 vaginal and 92 paired endocervical samples from Fijian women using metagenomic shotgun sequencing. Over 37% of the microbiomes could not be classified into sub-community state types (subCSTs). We, therefore, developed subCSTs IV-D0, IV-D1, IV-D2, and IV-E (dominated primarily by Gardnerella vaginalis) to improve classification. Among paired microbiomes, the endocervix had a significantly higher alpha diversity and, independently, higher diversity for high-risk human papilloma virus (HPV) genotypes compared to low-risk and no HPV. Ct-infected endocervical networks had smaller clusters without interactions with potentially beneficial Lactobacillus spp. Overall, these data suggest that G. vaginalis may generate polymicrobial biofilms that predispose to and/or promote Ct and possibly HPV persistence and pathogenicity. Our findings expand on the existing repertoire of endocervical and vaginal microbiomes and fill in knowledge gaps regarding Pacific Islanders.
Affiliation(s)
- Sankhya Bommana
  - Department of Pediatrics, University of California San Francisco, Oakland, California, USA
- Yi-Juan Hu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, USA
- Mike Kama
  - Ministry of Health and Medical Services, Suva, Fiji
- Ruohong Wang
  - Department of Pediatrics, University of California San Francisco, Oakland, California, USA
- Reshma Kodimerla
  - Department of Pediatrics, University of California San Francisco, Oakland, California, USA
- Kenan Jijakli
  - Department of Medicine, Emory University School of Medicine, Atlanta, Georgia, USA
- Timothy D. Read
  - Department of Medicine, Emory University School of Medicine, Atlanta, Georgia, USA
- Deborah Dean
  - Department of Pediatrics, University of California San Francisco, Oakland, California, USA
  - Department of Medicine, University of California San Francisco, San Francisco, California, USA
  - Department of Bioengineering, Joint Graduate Program, University of California San Francisco and University of California Berkeley, San Francisco, California, USA
  - Bixby Center for Global Reproductive Health, University of California San Francisco, San Francisco, California, USA
  - University of California San Francisco, Benioff Center for Microbiome Medicine, San Francisco, California, USA
3
Hu YJ, Satten GA. Compositional analysis of microbiome data using the linear decomposition model (LDM). Bioinformatics 2023; 39:btad668. [PMID: 37930883 PMCID: PMC10639033 DOI: 10.1093/bioinformatics/btad668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 09/13/2023] [Accepted: 10/31/2023] [Indexed: 11/08/2023] Open
Abstract
SUMMARY There are compelling reasons to test compositional hypotheses about microbiome data. We present here linear decomposition model-centered log ratio (LDM-clr), an extension of our LDM approach to allow fitting linear models to centered-log-ratio-transformed taxa count data. As LDM-clr is implemented within the existing LDM program, this extension enjoys all the features supported by LDM, including a compositional analysis of differential abundance at both the taxon and community levels, while allowing for a wide range of covariates and study designs for either association or mediation analysis. AVAILABILITY AND IMPLEMENTATION LDM-clr has been added to the R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM.
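For readers unfamiliar with the centered log-ratio (clr) transform mentioned in this abstract, the sketch below shows the transform for a single sample. It is an illustration only and does not call the LDM package: the function name `clr` and the `pseudocount` argument are hypothetical, and the abstract does not say how LDM-clr handles zero counts, so adding a pseudocount here is purely an assumption.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxa counts.

    Each value is the log count minus the mean log count across taxa.
    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]
```

The transformed values sum to zero within a sample, and with `pseudocount=0.0` they are unchanged when all counts are multiplied by a common factor, which is why the transform suits compositional data.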
Affiliation(s)
- Yi-Juan Hu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, United States
- Glen A Satten
  - Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, GA 30322, United States
4
Hu YJ, Satten GA. Compositional analysis of microbiome data using the linear decomposition model (LDM). bioRxiv: the preprint server for biology 2023:2023.05.26.542540. [PMID: 37398068 PMCID: PMC10312423 DOI: 10.1101/2023.05.26.542540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
SUMMARY There are compelling reasons to test compositional hypotheses about microbiome data. We present here LDM-clr, an extension of our linear decomposition model (LDM) approach to allow fitting linear models to centered-log-ratio-transformed taxa count data. As LDM-clr is implemented within the existing LDM program, it enjoys all the features supported by LDM, including a compositional analysis of differential abundance at both the taxon and community levels, while allowing for a wide range of covariates and study designs for either association or mediation analysis. AVAILABILITY AND IMPLEMENTATION LDM-clr has been added to the R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM. CONTACT yijuan.hu@emory.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
5
Van Doren VE, Smith SA, Hu YJ, Tharp G, Bosinger S, Ackerley CG, Murray PM, Amara RR, Amancha PK, Arthur RA, Johnston HR, Kelley CF. HIV, asymptomatic STI, and the rectal mucosal immune environment among young men who have sex with men. PLoS Pathog 2023; 19:e1011219. [PMID: 37253061 PMCID: PMC10256205 DOI: 10.1371/journal.ppat.1011219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 06/09/2023] [Accepted: 05/10/2023] [Indexed: 06/01/2023] Open
Abstract
Young men who have sex with men (YMSM) are disproportionately affected by HIV and bacterial sexually transmitted infections (STI) including gonorrhea, chlamydia, and syphilis; yet research into the immunologic effects of these infections is typically pursued in siloes. Here, we employed a syndemic approach to understand potential interactions of these infections on the rectal mucosal immune environment among YMSM. We enrolled YMSM aged 18-29 years with and without HIV and/or asymptomatic bacterial STI and collected blood, rectal secretions, and rectal tissue biopsies. YMSM with HIV were on suppressive antiretroviral therapy (ART) with preserved blood CD4 cell counts. We defined 7 innate and 19 adaptive immune cell subsets by flow cytometry, the rectal mucosal transcriptome by RNAseq, and the rectal mucosal microbiome by 16S rRNA sequencing and examined the effects of HIV and STI and their interactions. We measured tissue HIV RNA viral loads among YMSM with HIV and HIV replication in rectal explant challenge experiments among YMSM without HIV. HIV, but not asymptomatic STI, was associated with profound alterations in the cellular composition of the rectal mucosa. We did not detect a difference in the microbiome composition associated with HIV, but asymptomatic bacterial STI was associated with a higher probability of presence of potentially pathogenic taxa. When examining the rectal mucosal transcriptome, there was evidence of statistical interaction; asymptomatic bacterial STI was associated with upregulation of numerous inflammatory genes and enrichment for immune response pathways among YMSM with HIV, but not YMSM without HIV. Asymptomatic bacterial STI was not associated with differences in tissue HIV RNA viral loads or in HIV replication in explant challenge experiments. 
Our results suggest that asymptomatic bacterial STI may contribute to inflammation particularly among YMSM with HIV, and that future research should examine potential harms and interventions to reduce the health impact of these syndemic infections.
Affiliation(s)
- Vanessa E. Van Doren
  - The Hope Clinic of the Emory Vaccine Center, Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia, United States of America
- S. Abigail Smith
  - The Hope Clinic of the Emory Vaccine Center, Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Yi-Juan Hu
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, United States of America
- Gregory Tharp
  - Emory National Primate Research Center, Emory University, Atlanta, Georgia, United States of America
- Steven Bosinger
  - Emory National Primate Research Center, Emory University, Atlanta, Georgia, United States of America
  - Department of Microbiology and Immunology, Emory University, Atlanta, Georgia, United States of America
- Cassie G. Ackerley
  - The Hope Clinic of the Emory Vaccine Center, Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Phillip M. Murray
  - The Hope Clinic of the Emory Vaccine Center, Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Rama R. Amara
  - Emory National Primate Research Center, Emory University, Atlanta, Georgia, United States of America
  - Department of Microbiology and Immunology, Emory University, Atlanta, Georgia, United States of America
- Praveen K. Amancha
  - The Hope Clinic of the Emory Vaccine Center, Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Robert A. Arthur
  - Emory Integrated Computational Core, Emory University, Atlanta, Georgia, United States of America
- H. Richard Johnston
  - Emory Integrated Computational Core, Emory University, Atlanta, Georgia, United States of America
- Colleen F. Kelley
  - The Hope Clinic of the Emory Vaccine Center, Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia, United States of America
  - Grady Health System, Atlanta, Georgia, United States of America
6
Jiang R, Zhan X, Wang T. A Flexible Zero-Inflated Poisson-Gamma Model with Application to Microbiome Sequence Count Data. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2151447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Affiliation(s)
- Roulan Jiang
  - Center for Statistical Science and Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
- Xiang Zhan
  - Department of Biostatistics, School of Public Health, Beijing International Center for Mathematical Research and Center for Statistical Science, Peking University, Beijing 100871, China
- Tianying Wang
  - Center for Statistical Science and Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
7
Ackerley CG, Smith SA, Murray PM, Amancha PK, Arthur RA, Zhu Z, Chahroudi A, Amara RR, Hu YJ, Kelley CF. The rectal mucosal immune environment and HIV susceptibility among young men who have sex with men. Front Immunol 2022; 13:972170. [PMID: 36341414 PMCID: PMC9631201 DOI: 10.3389/fimmu.2022.972170] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/17/2022] [Accepted: 09/07/2022] [Indexed: 12/03/2022] Open
Abstract
Young men who have sex with men (YMSM) represent a particularly high-risk group for HIV acquisition in the US, despite similarly reported rates of sexual activity as older, adult MSM (AMSM). Increased rates of HIV infection among YMSM compared to AMSM could be partially attributable to differences within the rectal mucosal (RM) immune environment associated with earlier sexual debut and less lifetime exposure to receptive anal intercourse. Using an ex vivo explant HIV challenge model, we found that rectal tissues from YMSM supported higher levels of p24 at peak viral replication timepoints compared to AMSM. Among YMSM, the RM was characterized by increased CD4+ T cell proliferation, as well as lower frequencies of tissue resident CD8+ T cells and pro-inflammatory cytokine producing CD4+ and CD8+ T cells. In addition, the microbiome composition of YMSM was enriched for anaerobic taxa that have previously been associated with HIV acquisition risk, including Prevotella, Peptostreptococcus, and Peptoniphilus. These distinct immunologic and microbiome characteristics were found to be associated with higher HIV replication following ex vivo challenge of rectal explants, suggesting the RM microenvironment of YMSM may be uniquely conducive to HIV infection.
Affiliation(s)
- Cassie G. Ackerley
  - The Hope Clinic of the Emory Vaccine Research Center, Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Decatur, GA, United States
  - Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, United States
  - *Correspondence: Cassie G. Ackerley
- S. Abigail Smith
  - The Hope Clinic of the Emory Vaccine Research Center, Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Decatur, GA, United States
- Phillip M. Murray
  - The Hope Clinic of the Emory Vaccine Research Center, Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Decatur, GA, United States
- Praveen K. Amancha
  - The Hope Clinic of the Emory Vaccine Research Center, Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Decatur, GA, United States
- Robert A. Arthur
  - Emory Integrated Computational Core, Emory University, Atlanta, GA, United States
- Zhengyi Zhu
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, United States
- Ann Chahroudi
  - Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, United States
  - Emory National Primate Research Center, Emory University, Atlanta, GA, United States
  - Center for Childhood Infections and Vaccines of Children’s Healthcare of Atlanta, Emory University, Atlanta, GA, United States
- Rama R. Amara
  - Emory National Primate Research Center, Emory University, Atlanta, GA, United States
  - Department of Microbiology and Immunology, Emory University, Atlanta, GA, United States
- Yi-Juan Hu
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, United States
- Colleen F. Kelley
  - The Hope Clinic of the Emory Vaccine Research Center, Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Decatur, GA, United States
8
Hu Y, Li Y, Satten GA, Hu YJ. Testing microbiome associations with survival times at both the community and individual taxon levels. PLoS Comput Biol 2022; 18:e1010509. [PMID: 36103548 PMCID: PMC9512219 DOI: 10.1371/journal.pcbi.1010509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 09/26/2022] [Accepted: 08/23/2022] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Finding microbiome associations with possibly censored survival times is an important problem, especially as specific taxa could serve as biomarkers for disease prognosis or as targets for therapeutic interventions. The two existing methods for survival outcomes, MiRKAT-S and OMiSA, are restricted to testing associations at the community level and do not provide results at the individual taxon level. An ad hoc approach testing each taxon with a survival outcome using the Cox proportional hazard model may not perform well in the microbiome setting with sparse count data and small sample sizes. METHODS We have previously developed the linear decomposition model (LDM) for testing continuous or discrete outcomes that unifies community-level and taxon-level tests into one framework. Here we extend the LDM to test survival outcomes. We propose to use the Martingale residuals or the deviance residuals obtained from the Cox model as continuous covariates in the LDM. We further construct tests that combine the results of analyzing each set of residuals separately. Finally, we extend PERMANOVA, the most commonly used distance-based method for testing community-level hypotheses, to handle survival outcomes in a similar manner. RESULTS Using simulated data, we showed that the LDM-based tests preserved the false discovery rate for testing individual taxa and had good sensitivity. The LDM-based community-level tests and PERMANOVA-based tests had comparable or better power than MiRKAT-S and OMiSA. An analysis of data on the association of the gut microbiome and the time to acute graft-versus-host disease revealed several dozen associated taxa that would not have been achievable by any community-level test, as well as improved community-level tests by the LDM and PERMANOVA over those obtained using MiRKAT-S and OMiSA. 
CONCLUSIONS Unlike existing methods, our new methods are capable of discovering individual taxa that are associated with survival times, which could be of important use in clinical settings.
Affiliation(s)
- Yingtian Hu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
- Yunxiao Li
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
- Glen A. Satten
  - Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Yi-Juan Hu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
9
Hu YJ, Satten GA. A rarefaction-without-resampling extension of PERMANOVA for testing presence-absence associations in the microbiome. Bioinformatics 2022; 38:3689-3697. [PMID: 35723568 PMCID: PMC9991891 DOI: 10.1093/bioinformatics/btac399] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/19/2022] [Revised: 06/09/2022] [Accepted: 06/16/2022] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION PERMANOVA is currently the most commonly used method for testing community-level hypotheses about microbiome associations with covariates of interest. PERMANOVA can test for associations that result from changes in which taxa are present or absent by using the Jaccard or unweighted UniFrac distance. However, such presence-absence analyses face a unique challenge: confounding by library size (total sample read count), which occurs when library size is associated with covariates in the analysis. It is known that rarefaction (subsampling to a common library size) controls this bias but at the potential costs of information loss and the introduction of a stochastic component into the analysis. RESULTS Here, we develop a non-stochastic approach to PERMANOVA presence-absence analyses that aggregates information over all potential rarefaction replicates without actual resampling, when the Jaccard or unweighted UniFrac distance is used. We compare this new approach to three possible ways of aggregating PERMANOVA over multiple rarefactions obtained from resampling: averaging the distance matrix, averaging the (element-wise) squared distance matrix and averaging the F-statistic. Our simulations indicate that our non-stochastic approach is robust to confounding by library size and outperforms each of the stochastic resampling approaches. We also show that, when overdispersion is low, averaging the (element-wise) squared distance outperforms averaging the unsquared distance, currently implemented in the R package vegan. We illustrate our methods using an analysis of data on inflammatory bowel disease in which samples from case participants have systematically smaller library sizes than samples from control participants. 
AVAILABILITY AND IMPLEMENTATION We have implemented all the approaches described above, including the function for calculating the analytical average of the squared or unsquared distance matrix, in our R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
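To make the idea of summarising rarefaction without resampling concrete, the sketch below computes the probability that a taxon is still observed after subsampling a sample to a common depth. It uses the standard hypergeometric expression for drawing reads without replacement. This is an assumption chosen for illustration and is not claimed to be the formula used in the paper or in the LDM package; the function name `presence_probability` is hypothetical.

```python
from math import comb


def presence_probability(count, library_size, depth):
    """P(taxon seen at least once) after rarefying to `depth` reads.

    `depth` reads are drawn without replacement from a sample with
    `library_size` reads in total, `count` of which belong to the taxon.
    """
    if depth > library_size:
        raise ValueError("rarefaction depth exceeds library size")
    # Probability that none of the taxon's reads are drawn.
    p_absent = comb(library_size - count, depth) / comb(library_size, depth)
    return 1.0 - p_absent
```

For example, a taxon with a single read in a 10-read sample has probability 0.5 of surviving rarefaction to 5 reads. Because the value is an exact expectation over all possible subsamples, it carries no Monte Carlo noise.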
Affiliation(s)
- Yi-Juan Hu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Glen A Satten
  - Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, GA 30322, USA
10
LOCOM: A logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control. Proc Natl Acad Sci U S A 2022; 119:e2122788119. [PMID: 35867822 PMCID: PMC9335309 DOI: 10.1073/pnas.2122788119] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 01/24/2023] Open
Abstract
Compositional analysis is based on the premise that a relatively small proportion of taxa are differentially abundant, while the ratios of the relative abundances of the remaining taxa remain unchanged. Most existing methods use log-transformed data, but log-transformation of data with pervasive zero counts is problematic, and these methods cannot always control the false discovery rate (FDR). Further, high-throughput microbiome data such as 16S amplicon or metagenomic sequencing are subject to experimental biases that are introduced in every step of the experimental workflow. McLaren et al. [eLife 8, e46923 (2019)] have recently proposed a model for how these biases affect relative abundance data. Motivated by this model, we show that the odds ratios in a logistic regression comparing counts in two taxa are invariant to experimental biases. With this motivation, we propose logistic compositional analysis (LOCOM), a robust logistic regression approach to compositional analysis, that does not require pseudocounts. Inference is based on permutation to account for overdispersion and small sample sizes. Traits can be either binary or continuous, and adjustment for confounders is supported. Our simulations indicate that LOCOM always preserved FDR and had much improved sensitivity over existing methods. In contrast, analysis of composition of microbiomes (ANCOM) and ANCOM with bias correction (ANCOM-BC)/ANOVA-Like Differential Expression tool (ALDEx2) had inflated FDR when the effect sizes were small and large, respectively. Only LOCOM was robust to experimental biases in every situation. The flexibility of our method for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies. Our R package LOCOM is publicly available.
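The invariance claim in this abstract can be checked with simple arithmetic. The sketch below assumes that experimental bias acts as a taxon-specific multiplicative factor on abundance, in the spirit of the model cited in the abstract. The abundances, efficiency values, and function names are all made up for illustration, and the code shows only the ratio argument, not the LOCOM regression itself.

```python
def odds_ratio(group_a, group_b, taxon, reference):
    """Ratio of (taxon : reference) odds between two groups."""
    odds_a = group_a[taxon] / group_a[reference]
    odds_b = group_b[taxon] / group_b[reference]
    return odds_a / odds_b


def apply_bias(abundances, efficiency):
    """Scale each taxon by its own (hypothetical) measurement efficiency."""
    return {t: v * efficiency[t] for t, v in abundances.items()}


# Hypothetical true abundances in two groups and per-taxon bias factors.
true_a = {"taxon1": 40.0, "taxon2": 10.0}
true_b = {"taxon1": 20.0, "taxon2": 20.0}
efficiency = {"taxon1": 3.0, "taxon2": 0.5}

or_true = odds_ratio(true_a, true_b, "taxon1", "taxon2")
or_observed = odds_ratio(apply_bias(true_a, efficiency),
                         apply_bias(true_b, efficiency),
                         "taxon1", "taxon2")
```

The efficiency factors appear in both the numerator and the denominator of the odds ratio and cancel, so `or_observed` equals `or_true` even though the biased abundances themselves differ from the true ones.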
11
Yue Y, Hu YJ. A new approach to testing mediation of the microbiome at both the community and individual taxon levels. Bioinformatics 2022; 38:3173-3180. [PMID: 35512399 PMCID: PMC9191207 DOI: 10.1093/bioinformatics/btac310] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/19/2022] [Revised: 03/28/2022] [Accepted: 05/02/2022] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION Understanding whether and which microbes played a mediating role between an exposure and a disease outcome is essential for researchers to develop clinical interventions to treat the disease by modulating the microbes. Existing methods for mediation analysis of the microbiome are often limited to a global test of community-level mediation or selection of mediating microbes without control of the false discovery rate (FDR). Further, while the null hypothesis of no mediation at each microbe is a composite null that consists of three types of null, most existing methods treat the microbes as if they were all under the same type of null, leading to excessive false positive results. RESULTS We propose a new approach based on inverse regression that regresses the microbiome data at each taxon on the exposure and the exposure-adjusted outcome. Then, the P-values for testing the coefficients are used to test mediation at both the community and individual taxon levels. This approach fits nicely into our Linear Decomposition Model (LDM) framework, so our new method LDM-med, implemented in the LDM framework, enjoys all the features of the LDM, e.g. allowing an arbitrary number of taxa to be tested simultaneously, supporting continuous, discrete, or multivariate exposures and outcomes (including survival outcomes), and so on. Using extensive simulations, we showed that LDM-med always preserved the FDR of testing individual taxa and had adequate sensitivity; LDM-med always controlled the type I error of the global test and had compelling power over existing methods. The flexibility of LDM-med for a variety of mediation analyses is illustrated by an application to a murine microbiome dataset, which identified several plausible mediating taxa. AVAILABILITY AND IMPLEMENTATION Our new method has been added to our R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ye Yue
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Yi-Juan Hu
  - To whom correspondence should be addressed.
12
Liñares-Blanco J, Fernandez-Lozano C, Seoane JA, López-Campos G. Machine Learning Based Microbiome Signature to Predict Inflammatory Bowel Disease Subtypes. Front Microbiol 2022; 13:872671. [PMID: 35663898 PMCID: PMC9157387 DOI: 10.3389/fmicb.2022.872671] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/09/2022] [Accepted: 04/26/2022] [Indexed: 11/13/2022] Open
Abstract
Inflammatory bowel disease (IBD) is a chronic disease with unknown pathophysiological mechanisms. There is evidence that microorganisms play a role in the development of this disease. Thanks to open access to multiple omics data, it is possible to develop predictive models that are able to prognosticate the course and development of the disease. The interpretability of these models, and the study of the variables used, allows the identification of biological aspects of great importance in the development of the disease. In this work we generated a metagenomic signature with predictive capacity to identify IBD from fecal samples. Different Machine Learning models were trained, obtaining high performance measures. The predictive capacity of the identified signature was validated in two external cohorts: one containing samples from patients with Ulcerative Colitis and another from patients with Crohn's Disease, the two major subtypes of IBD. The results obtained in this validation (AUC = 0.74 and AUC = 0.76, respectively) show that our signature generalizes to both subtypes. The study of the variables within the model, and a correlation study based on text mining, identified different genera that play an important and common role in the development of these two subtypes.
Affiliation(s)
- Jose Liñares-Blanco
- Department of Computer Science and Information Technologies, Faculty of Computer Science, CITIC, University of A Coruña, A Coruña, Spain
- GENYO, Centre for Genomics and Oncological Research, Pfizer/University of Granada/Andalusian Regional Government PTS Granada, Granada, Spain
- Department of Statistics and Operational Research, University of Granada, Granada, Spain
- Carlos Fernandez-Lozano
- Department of Computer Science and Information Technologies, Faculty of Computer Science, CITIC, University of A Coruña, A Coruña, Spain
- Jose A Seoane
- Vall d'Hebron Institute of Oncology, Barcelona, Spain
- Guillermo López-Campos
- Wellcome-Wolfson Institute for Experimental Medicine, Queen's University Belfast, Belfast, United Kingdom
|
13
|
Zhu Z, Satten GA, Hu YJ. Integrative analysis of relative abundance data and presence-absence data of the microbiome using the LDM. Bioinformatics 2022; 38:2915-2917. [PMID: 35561163 PMCID: PMC9113255 DOI: 10.1093/bioinformatics/btac181] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/16/2022] [Revised: 03/11/2022] [Accepted: 03/22/2022] [Indexed: 02/03/2023]
Abstract
SUMMARY We previously developed the LDM for testing hypotheses about the microbiome; it performs tests at both the community level and the individual taxon level. The LDM can be applied to relative abundance data and presence-absence data separately; these analyses work well when the associated taxa are abundant and rare, respectively. Here, we propose LDM-omni3, which combines LDM analyses at the relative abundance and presence-absence data scales, thereby offering optimal power across scenarios with different association mechanisms. The new LDM-omni3 test is available for the wide range of data types and analyses that are supported by the LDM. AVAILABILITY AND IMPLEMENTATION The LDM-omni3 test has been added to the R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhengyi Zhu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Glen A Satten
- Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Yi-Juan Hu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
|
14
|
Chaverri P, Chaverri G. Fungal communities in feces of the frugivorous bat Ectophylla alba and its highly specialized Ficus colubrinae diet. Anim Microbiome 2022; 4:24. [PMID: 35303964 PMCID: PMC8932179 DOI: 10.1186/s42523-022-00169-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2021] [Accepted: 02/16/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Bats are important long-distance dispersers of many tropical plants, yet, by consuming fruits, they may disperse not only the plant's seeds, but also the mycobiota within those fruits. We characterized the culture-dependent and culture-independent fungal communities in fruits of Ficus colubrinae and feces of Ectophylla alba to determine if passage through the digestive tract of bats affected the total mycobiota. RESULTS Using presence/absence and normalized abundance data from fruits and feces, we demonstrate that the fungal communities were significantly different, even though there was an overlap of ca. 38% of Amplicon Sequence Variants (ASVs). We show that some of the fungi from fruits were also present in, and grew from, fecal samples. Fecal fungal communities were dominated by Agaricomycetes, followed by Dothideomycetes, Sordariomycetes, Eurotiomycetes, and Malasseziomycetes, while fruit samples were dominated by Dothideomycetes, followed by Sordariomycetes, Agaricomycetes, Eurotiomycetes, and Laboulbeniomycetes. Linear discriminant analyses (LDA) show that, for bat feces, the indicator taxa include Basidiomycota (i.e., Agaricomycetes: Polyporales and Agaricales), and the ascomycetous class Eurotiomycetes (i.e., Eurotiales, Aspergillaceae). For fruits, indicator taxa are in the Ascomycota (i.e., Dothideomycetes: Botryosphaeriales; Laboulbeniomycetes: Pyxidiophorales; and Sordariomycetes: Glomerellales). In our study, the differences in fungal species composition between the two communities (fruits vs. feces) were reflected in changes in functional diversity. For example, the core community in bat feces consists of saprobes and animal commensals, while that of fruits is composed mostly of phytopathogens and arthropod-associated fungi. CONCLUSIONS Our study provides the groundwork to continue disentangling the direct and indirect symbiotic relationships in an ecological network that has not received enough attention: fungi-plants-bats. Findings also suggest that the role of frugivores in plant-animal mutualistic networks may extend beyond seed dispersal: they may also promote the dispersal of potentially beneficial microbial symbionts while, for example, hindering those that can cause plant disease.
Affiliation(s)
- Priscila Chaverri
- Escuela de Biología and Centro de Investigaciones en Productos Naturales (CIPRONA), Universidad de Costa Rica, San Pedro, Costa Rica
- Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, USA
- Gloriana Chaverri
- Sede del Sur, Universidad de Costa Rica, Golfito, 60701, Costa Rica
- Smithsonian Tropical Research Institute, Balboa, Ancón, Panamá
|
15
|
Dunlop AL, Satten GA, Hu YJ, Knight AK, Hill CC, Wright ML, Smith AK, Read TD, Pearce BD, Corwin EJ. Vaginal Microbiome Composition in Early Pregnancy and Risk of Spontaneous Preterm and Early Term Birth Among African American Women. Front Cell Infect Microbiol 2021; 11:641005. [PMID: 33996627 PMCID: PMC8117784 DOI: 10.3389/fcimb.2021.641005] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 12/13/2020] [Accepted: 04/06/2021] [Indexed: 12/15/2022]
Abstract
Objective To evaluate the association between the early pregnancy vaginal microbiome and spontaneous preterm birth (sPTB) and early term birth (sETB) among African American women. Methods Vaginal samples collected in early pregnancy (8-14 weeks' gestation) from 436 women enrolled in the Emory University African American Vaginal, Oral, and Gut Microbiome in Pregnancy Study underwent 16S rRNA gene sequencing of the V3-V4 region, taxonomic classification, and community state type (CST) assignment. We compared vaginal CST and abundance of taxa for women whose pregnancy ended in sPTB (N = 44) or sETB (N = 84) to those who delivered full term (N = 231). Results Nearly half of the women had a vaginal microbiome classified as CST IV (Diverse CST), while one-third had CST III (L. iners dominated) and just 16% had CST I, II, or V (non-iners Lactobacillus dominated). Compared to vaginal CST I, II, or V (non-iners Lactobacillus dominated), both CST III (L. iners dominated) and CST IV (Diverse) were associated with sPTB with an adjusted odds ratio (95% confidence interval) of 4.1 (1.1, infinity) and 7.7 (2.2, infinity), respectively, in multivariate logistic regression. In contrast, no vaginal CST was associated with sETB. The linear decomposition model (LDM) based on amplicon sequence variant (ASV) relative abundance found a significant overall effect of the vaginal microbiome on sPTB (p=0.034) but not sETB (p=0.320), whereas the LDM based on presence/absence of ASV found no overall effect on sPTB (p=0.328) but a significant effect on sETB (p=0.030). In testing for ASV-specific effects, the LDM found that no ASV was significantly associated with sPTB considering either relative abundance or presence/absence data after controlling for multiple comparisons (FDR 10%), although in marginal analysis the relative abundance of Gardnerella vaginalis (p=0.011), non-iners Lactobacillus (p=0.016), and Mobiluncus curtisii (p=0.035) and the presence of Atopobium vaginae (p=0.049), BVAB2 (p=0.024), Dialister microaerophilis (p=0.011), and Prevotella amnii (p=0.044) were associated with sPTB. The LDM identified the higher abundance of 7 ASVs and the presence of 13 ASVs, all common residents of the gut, as associated with sETB at FDR < 10%. Conclusions In this cohort of African American women, an early pregnancy vaginal CST III or IV was associated with an increased risk of sPTB but not sETB. The relative abundance and presence of distinct taxa within the early pregnancy vaginal microbiome were associated with either sPTB or sETB.
Affiliation(s)
- Anne L. Dunlop
- Emory University Nell Hodgson Woodruff School of Nursing, Atlanta, GA, United States
- Department of Family & Preventive Medicine, Emory University School of Medicine, Atlanta, GA, United States
- Glen A. Satten
- Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, GA, United States
- Department of Biostatistics and Bioinformatics, Emory University Rollins School of Public Health, Atlanta, GA, United States
- Yi-Juan Hu
- Department of Biostatistics and Bioinformatics, Emory University Rollins School of Public Health, Atlanta, GA, United States
- Anna K. Knight
- Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, GA, United States
- Cherie C. Hill
- Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, GA, United States
- Michelle L. Wright
- School of Nursing, University of Texas at Austin, Austin, TX, United States
- Alicia K. Smith
- Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, GA, United States
- Timothy D. Read
- Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Atlanta, GA, United States
- Bradley D. Pearce
- Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, United States
|