101
Grantham NS, Guan Y, Reich BJ, Borer ET, Gross K. MIMIX: A Bayesian Mixed-Effects Model for Microbiome Data From Designed Experiments. J Am Stat Assoc 2019. [DOI: 10.1080/01621459.2019.1626242] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 10/26/2022]
Affiliation(s)
- Neal S. Grantham
- Department of Statistics, North Carolina State University, Raleigh, NC
- Yawen Guan
- Department of Statistics, North Carolina State University, Raleigh, NC
- Brian J. Reich
- Department of Statistics, North Carolina State University, Raleigh, NC
- Elizabeth T. Borer
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN
- Kevin Gross
- Department of Statistics, North Carolina State University, Raleigh, NC
102
Abstract
Microbiomes are complex microbial communities whose structure and function are heavily influenced by microbe-microbe and microbe-host interactions mediated by a range of mechanisms, all of which have been implicated in the modulation of disease progression and clinical outcome. Therefore, understanding the microbiome as a whole, including both the complex interplay among microbial taxa and their interactions with their hosts, is essential for understanding the spectrum of roles played by microbiomes in host health, development, dysbiosis, and polymicrobial infections. Network theory, in the form of systems-oriented, graph-theoretical approaches, is a promising holistic methodology that can facilitate microbiome analysis and improve our understanding of the complex ecological and evolutionary processes involved. Using network theory, one can model and analyze a microbiome and all of its complex interactions within a single network. Here, we describe in detail, step by step, the process of building, analyzing, and visualizing microbiome networks from operational taxonomic unit (OTU) tables in R and RStudio, using several different approaches and extensively commented code snippets.
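One common way to turn an OTU table into a co-occurrence network is to correlate every pair of OTU abundance profiles and keep the strongly correlated pairs as edges. The sketch below is a minimal Python illustration of that idea, not the authors' R workflow: it assumes the OTU table is a dict mapping OTU names to abundances over the same ordered samples, uses Spearman rank correlation, and applies an arbitrary |rho| threshold of 0.6. Real analyses usually also address compositionality and multiple testing, which this sketch ignores.

```python
import math
from itertools import combinations


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as no edge
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def cooccurrence_edges(otu_table, threshold=0.6):
    """Return (otu_a, otu_b, rho) for every OTU pair with |rho| >= threshold."""
    edges = []
    for a, b in combinations(sorted(otu_table), 2):
        rho = spearman(otu_table[a], otu_table[b])
        if abs(rho) >= threshold:
            edges.append((a, b, round(rho, 3)))
    return edges
```

For example, with OTU1 = [10, 20, 30, 40], OTU2 = [1, 2, 3, 4] and OTU3 = [5, 3, 9, 1], only the OTU1/OTU2 pair (rho = 1.0) becomes an edge; both OTU3 pairs have rho = -0.4 and fall below the threshold.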
Affiliation(s)
- Mehdi Layeghifard
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
- David M Hwang
- Department of Pathology, University Health Network, Toronto, ON, Canada; Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- David S Guttman
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada; Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, Canada
103
Koh H, Li Y, Zhan X, Chen J, Zhao N. A Distance-Based Kernel Association Test Based on the Generalized Linear Mixed Model for Correlated Microbiome Studies. Front Genet 2019; 10:458. [PMID: 31156711 PMCID: PMC6532659 DOI: 10.3389/fgene.2019.00458] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 02/08/2019] [Accepted: 04/30/2019] [Indexed: 12/12/2022]
Abstract
Researchers have increasingly employed family-based or longitudinal study designs to investigate the roles of the human microbiota in diverse host traits of interest (e.g., health/disease status, medical intervention, behavioral/environmental factors). Such designs are useful for properly controlling for potential confounders or for tracking sensitive changes in microbial composition and host traits. However, downstream data analysis is challenging because measurements within clusters (e.g., families, or subjects with repeated measures) tend to be correlated, so statistical methods that assume independence cannot be used. For correlated microbiome studies, a distance-based kernel association test based on the linear mixed model, the correlated sequence kernel association test (cSKAT), has recently been introduced. cSKAT models the microbial community using an ecological distance (e.g., Jaccard/Bray-Curtis dissimilarity, unique fraction distance) and then tests its association with a host trait. As with prior distance-based kernel association tests (e.g., the microbiome regression-based kernel association test), the use of ecological distances gives cSKAT high power. However, cSKAT is limited to Gaussian traits [e.g., body mass index (BMI)] and to a single chosen distance measure at a time. The power of cSKAT varies substantially depending on which distance measure is used, yet choosing an optimal distance measure is challenging because the nature of the true association is unknown. Here, we introduce a distance-based kernel association test based on the generalized linear mixed model (GLMM), GLMM-MiRKAT, to handle diverse types of traits, such as Gaussian (e.g., BMI), Binomial (e.g., disease status, treatment/placebo) or Poisson (e.g., number of tumors/treatments) traits. We further propose a data-driven adaptive version of GLMM-MiRKAT, aGLMM-MiRKAT, to avoid the need to choose the optimal distance measure. 
Our extensive simulations demonstrate that aGLMM-MiRKAT is robustly powerful while correctly controlling type I error rates. We apply aGLMM-MiRKAT to real familial and longitudinal microbiome data, where we discover significant disparity in microbial community composition by BMI status and the frequency of antibiotic use. In summary, aGLMM-MiRKAT is a useful analytical tool with its broad applicability to diverse types of traits, robust power and valid statistical inference.
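The step shared by MiRKAT-style tests, turning an ecological distance into a kernel, can be sketched as below. In the original MiRKAT formulation a distance matrix D becomes K = -1/2 J D2 J, where J = I - 11'/n and D2 holds the element-wise squared distances (Gower centring). This Python sketch assumes that construction, uses Bray-Curtis dissimilarity as the example distance, and omits the positive-semidefinite correction and the GLMM score test itself, so it is not the GLMM-MiRKAT implementation.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def distance_matrix(samples, dist=bray_curtis):
    """Pairwise distances between samples (each a list of taxon abundances)."""
    return [[dist(x, y) for y in samples] for x in samples]


def gower_kernel(d):
    """K = -1/2 * J (D*D) J with J = I - 11'/n, computed as double centring."""
    n = len(d)
    sq = [[v * v for v in row] for row in d]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand)
             for j in range(n)] for i in range(n)]
```

Double centring is what makes every row and column of K sum to zero; in a kernel test, K then encodes how similar each pair of microbial communities is.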
Affiliation(s)
- Hyunwook Koh
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Yutong Li
- School of Physics, Peking University, Beijing, China
- Xiang Zhan
- Department of Public Health Sciences, Pennsylvania State University, Hershey, PA, United States
- Jun Chen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Ni Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
104
Ilan Y. Why targeting the microbiome is not so successful: can randomness overcome the adaptation that occurs following gut manipulation? Clin Exp Gastroenterol 2019; 12:209-217. [PMID: 31190948 PMCID: PMC6514118 DOI: 10.2147/ceg.s203823] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 02/01/2019] [Accepted: 03/19/2019] [Indexed: 12/12/2022]
Abstract
The microbiome is being explored as a potential therapeutic target for bowel and systemic diseases. Fecal microbiota transplantation (FMT) has demonstrated efficacy in Clostridium difficile infection. However, clinical results for other diseases have been modest, despite abundant research on the microbiome over the last decade. Both the high variability of the microbiome and its adaptation to gut manipulations may underlie the lack of lasting effects of FMT, probiotics, prebiotics, synbiotics, and antibiotics, all of which aim to restore a healthier microbiome. The present review discusses the inherent variability of the microbiome and the multiple factors that affect its diversity as possible causes of the gut microbiome's adaptation to chronic manipulation. The use of randomness is proposed as a means of overcoming this adaptation and restoring some of the inherent variability, with the goal of improving the long-term efficacy of these therapies.
Affiliation(s)
- Yaron Ilan
- Department of Medicine, Hadassah-Hebrew University Medical Center, Jerusalem, Israel
105
Liu L, Shih YCT, Strawderman RL, Zhang D, Johnson BA, Chai H. Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review. Stat Sci 2019. [DOI: 10.1214/18-sts681] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 11/19/2022]
106
Ai D, Pan H, Li X, Gao Y, Liu G, Xia LC. Identifying Gut Microbiota Associated With Colorectal Cancer Using a Zero-Inflated Lognormal Model. Front Microbiol 2019; 10:826. [PMID: 31068913 PMCID: PMC6491826 DOI: 10.3389/fmicb.2019.00826] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Received: 01/15/2019] [Accepted: 04/01/2019] [Indexed: 12/26/2022]
Abstract
Colorectal cancer (CRC) is the third most common cancer worldwide. Its incidence is still increasing and its mortality rate is high, so new therapeutic and prognostic strategies are urgently needed. It has become increasingly recognized that gut microbiota composition differs significantly between healthy people and CRC patients. Identifying these differences is therefore fundamental to understanding the functional roles these microbes play in the development of CRC. We studied the microbial community structure of a CRC metagenomic dataset of 156 patients and healthy controls, and analyzed diversity, differentially abundant bacteria, and co-occurrence networks. We applied a modified zero-inflated lognormal (ZIL) model to estimate relative abundance. We found that the abundance of the genera Anaerostipes, Bilophila, Catenibacterium, Coprococcus, Desulfovibrio, Flavonifractor, Porphyromonas, Pseudoflavonifractor, and Weissella differed significantly between the healthy and CRC groups. We also found that bacteria such as Streptococcus, Parvimonas, Collinsella, and Citrobacter co-occurred uniquely within the CRC patients. In addition, the microbial diversity of healthy controls was significantly higher than that of CRC patients, which indicated a significant negative correlation between gut microbiota diversity and the stage of CRC. Collectively, our results strengthen the view that individual microbes, as well as the overall structure of the gut microbiota, co-evolve with CRC.
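The abstract does not spell out the authors' modification, so the sketch below only assumes the basic zero-inflated lognormal form: a value is zero with probability p_zero, and otherwise log(value) is Normal(mu, sigma). Because the likelihood then factorises into a zero part and a positive part, the maximum-likelihood estimates have closed forms. The function names are illustrative, not the paper's code.

```python
import math


def zil_loglik(values, p_zero, mu, sigma):
    """Log-likelihood of a basic zero-inflated lognormal model.

    A value is 0 with probability p_zero; otherwise it is drawn from
    LogNormal(mu, sigma), i.e. log(value) ~ Normal(mu, sigma).
    """
    if not 0 < p_zero < 1 or sigma <= 0:
        raise ValueError("require 0 < p_zero < 1 and sigma > 0")
    ll = 0.0
    for y in values:
        if y == 0:
            ll += math.log(p_zero)
        else:
            z = (math.log(y) - mu) / sigma
            ll += (math.log(1 - p_zero)
                   - math.log(y * sigma * math.sqrt(2 * math.pi))
                   - 0.5 * z * z)
    return ll


def zil_mle(values):
    """Closed-form MLE; assumes at least one zero and one positive value.

    p_zero is the share of zeros; mu and sigma are the mean and (biased,
    divide-by-n) standard deviation of the logged positive values.
    """
    logs = [math.log(y) for y in values if y > 0]
    p_zero = (len(values) - len(logs)) / len(values)
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return p_zero, mu, sigma
```

Modelling the zeros separately keeps them from being forced into the lognormal part, which has no mass at zero.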
Affiliation(s)
- Dongmei Ai
- Basic Experimental of Natural Science, University of Science and Technology Beijing, Beijing, China
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, China
- Hongfei Pan
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, China
- Xiaoxin Li
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, China
- Yingxin Gao
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, China
- Gang Liu
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, China
- Li C Xia
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, United States
107
Ho NT, Li F, Wang S, Kuhn L. metamicrobiomeR: an R package for analysis of microbiome relative abundance data using zero-inflated beta GAMLSS and meta-analysis across studies using random effects models. BMC Bioinformatics 2019; 20:188. [PMID: 30991942 PMCID: PMC6469060 DOI: 10.1186/s12859-019-2744-2] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 08/29/2018] [Accepted: 03/18/2019] [Indexed: 01/12/2023]
Abstract
Background The rapid growth of high-throughput sequencing-based microbiome profiling has yielded tremendous insights into human health and physiology. Data generated from high-throughput sequencing of 16S rRNA gene amplicons are often preprocessed into composition or relative abundance. However, reproducibility has been lacking because of the myriad different experimental and computational approaches taken in these studies. Microbiome studies may report varying results on the same topic; therefore, meta-analyses that examine different microbiome studies to provide consistent and robust results are important. To date, implemented methods are still lacking for properly examining differential relative abundances of microbial taxa and for performing meta-analyses of the heterogeneity and overall effects across microbiome studies. Results We developed an R package, ‘metamicrobiomeR’, that applies Generalized Additive Models for Location, Scale and Shape (GAMLSS) with a zero-inflated beta (BEZI) family (GAMLSS-BEZI) to the analysis of microbiome relative abundance datasets. Both simulation studies and application to real microbiome data demonstrate that GAMLSS-BEZI performs well in testing differential relative abundances of microbial taxa. Importantly, the estimates from GAMLSS-BEZI are log(odds ratios) of relative abundances between comparison groups and are thus comparable across microbiome studies. We therefore also apply random effects meta-analysis models to pool estimates and their standard errors across microbiome studies. We demonstrate the meta-analysis examples and highlight the utility of our package on four studies comparing gut microbiomes between male and female infants in the first six months of life. Conclusions GAMLSS-BEZI allows proper examination of microbiome relative abundance data. 
Random effects meta-analysis models can be directly applied to pool comparable estimates and their standard errors to evaluate the overall effects and heterogeneity across microbiome studies. The examples and workflow using our ‘metamicrobiomeR’ package are reproducible and applicable for the analyses and meta-analyses of other microbiome studies. Electronic supplementary material The online version of this article (10.1186/s12859-019-2744-2) contains supplementary material, which is available to authorized users.
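The zero-inflated beta density behind GAMLSS-BEZI, and the reason its estimates read as log(odds ratios), can be sketched as follows. This Python illustration assumes the mean-precision parameterisation commonly used for zero-inflated beta regression: nu is the probability of a zero, and nonzero relative abundances follow Beta(mu * phi, (1 - mu) * phi) with mean mu. It also assumes a logit link on mu, so a two-group coefficient equals a difference of logits. Whether this matches metamicrobiomeR's internals exactly should be checked against the gamlss.dist BEZI documentation.

```python
import math


def bezi_logpdf(y, mu, phi, nu):
    """Zero-inflated beta log-density (assumed mean-precision form).

    y == 0 with probability nu; otherwise y ~ Beta(mu * phi, (1 - mu) * phi),
    whose mean is mu and whose spread shrinks as phi grows.
    """
    if y == 0:
        return math.log(nu)
    if not 0 < y < 1:
        raise ValueError("relative abundance must be 0 or lie in (0, 1)")
    a, b = mu * phi, (1 - mu) * phi
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (math.log(1 - nu) + (a - 1) * math.log(y)
            + (b - 1) * math.log(1 - y) - log_beta_fn)


def logit(m):
    return math.log(m / (1 - m))


def log_odds_ratio(mu_group1, mu_group0):
    """Under a logit link for mu, a two-group coefficient equals this value."""
    return logit(mu_group1) - logit(mu_group0)
```

For example, mean relative abundances of 0.75 versus 0.5 give a log(odds ratio) of log(3); because this quantity does not depend on each study's own scale, it can be pooled across studies.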
Affiliation(s)
- Nhan Thi Ho
- Gertrude H. Sergievsky Center, Columbia University, New York City, NY, USA; Institute of Applied Sciences and Regenerative Medicine, Vinmec Healthcare System, 458 Minh Khai, Hai Ba Trung, Ha Noi, Vietnam
- Fan Li
- Department of Pediatrics, University of California, Los Angeles, CA, USA
- Shuang Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York City, NY, USA
- Louise Kuhn
- Gertrude H. Sergievsky Center, Columbia University, New York City, NY, USA
108
Mougeot JLC, Stevens CB, Almon KG, Paster BJ, Lalla RV, Brennan MT, Mougeot FB. Caries-associated oral microbiome in head and neck cancer radiation patients: a longitudinal study. J Oral Microbiol 2019; 11:1586421. [PMID: 30891159 PMCID: PMC6419625 DOI: 10.1080/20002297.2019.1586421] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 08/28/2018] [Revised: 02/08/2019] [Accepted: 02/13/2019] [Indexed: 01/04/2023]
Abstract
Head and neck cancer (HNC) therapy often leads to caries development. Our goal was to characterize the oral microbiome of HNC patients who underwent radiation therapy (RT) at baseline (T0), and 6 (T6) and 18 (T18) months post-RT, and to determine whether there was a relationship with increased caries. HOMINGS was used to determine the relative abundance (RA) of >600 bacterial species in oral samples of 31 HNC patients. The DMFS score was used to define patient groups with tooth decay increase (DMFS[+]) or no increase (DMFS[-]). A change in microbiome beta-diversity was observed at T6 and T18. The Streptococcus mutans RA increased at T6 in both the DMFS[+] and DMFS[-] groups. The RA of Prevotella melaninogenica, a species often associated with caries in young children, decreased at T6 in the DMFS[-] group. The RA of the health-associated species Abiotrophia defectiva decreased in the DMFS[+] group. The oral microbiome underwent significant changes in radiation-treated HNC patients, whether or not they developed caries. Caries rates were not associated with a difference in salivary flow reduction between the DMFS[+] and DMFS[-] groups. Patients who develop caries might be more susceptible to certain species associated with oral disease or have fewer potentially protective oral species.
Affiliation(s)
- Craig B Stevens
- Carolinas Medical Center - Atrium Health, Charlotte, NC, USA
- Kathryn G Almon
- Carolinas Medical Center - Atrium Health, Charlotte, NC, USA
109
Watson RL, de Koff EM, Bogaert D. Characterising the respiratory microbiome. Eur Respir J 2019; 53:13993003.01711-2018. [PMID: 30487204 DOI: 10.1183/13993003.01711-2018] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 09/06/2018] [Accepted: 11/05/2018] [Indexed: 12/27/2022]
Affiliation(s)
- Rebecca L Watson
- Center for Inflammation Research, Queens Medical Research Institute, University of Edinburgh, Edinburgh, UK (both authors contributed equally)
- Emma M de Koff
- Dept of Pediatrics, Wilhelmina Children's Hospital, University Medical Center Utrecht, Utrecht, The Netherlands; Spaarne Academy, Spaarne Gasthuis, Hoofddorp, The Netherlands (both authors contributed equally)
- Debby Bogaert
- Center for Inflammation Research, Queens Medical Research Institute, University of Edinburgh, Edinburgh, UK; Dept of Pediatrics, Wilhelmina Children's Hospital, University Medical Center Utrecht, Utrecht, The Netherlands
110
D'Agata AL, Wu J, Welandawe MKV, Dutra SVO, Kane B, Groer MW. Effects of early life NICU stress on the developing gut microbiome. Dev Psychobiol 2019; 61:650-660. [PMID: 30697700 DOI: 10.1002/dev.21826] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 08/06/2018] [Revised: 12/11/2018] [Accepted: 12/12/2018] [Indexed: 02/06/2023]
Abstract
The succession of gut microbial community structure in newborns is highly influenced by early life factors. Many preterm infants cared for in the NICU are exposed to parent-infant separation, stress, and pain from medical care procedures. The purpose of the study was to investigate the impact of early life stress on the trajectory of gut microbial structure. Stool samples from very preterm infants were collected weekly for 6 weeks, and NICU stress exposure data were collected daily over the same period. The V4 region of the 16S rRNA gene was amplified by PCR and sequenced. A zero-inflated beta regression model with random effects was used to assess the impact of stress on gut microbiome trajectories. Week of sampling was significant for Escherichia, Staphylococcus, Enterococcus, Bifidobacterium, Proteus, Streptococcus, Clostridium butyricum, and Clostridium perfringens. Antibiotic usage was significant for Proteus, Citrobacter, and C. perfringens. Gender was significant for Proteus. Stress exposure occurring 1 and 2 weeks prior to sampling had a significant effect on Proteus and Veillonella. NICU stress exposure had a significant effect on Proteus and Veillonella. An overall dominance of Gammaproteobacteria was found. The findings suggest that early life NICU stress may significantly influence the developing gut microbiome, which is important for NICU practice and future microbiome research.
Affiliation(s)
- Amy L D'Agata
- College of Nursing, University of Rhode Island, Kingston, Rhode Island; College of Nursing, University of South Florida, Tampa, Florida
- Jing Wu
- Computer Science and Statistics, University of Rhode Island, Kingston, Rhode Island
- Samia V O Dutra
- College of Nursing, University of South Florida, Tampa, Florida
- Bradley Kane
- College of Nursing, University of South Florida, Tampa, Florida
- Maureen W Groer
- College of Nursing, University of South Florida, Tampa, Florida
111
Zhai J, Knox K, Twigg HL, Zhou H, Zhou JJ. Exact variance component tests for longitudinal microbiome studies. Genet Epidemiol 2019; 43:250-262. [PMID: 30623484 DOI: 10.1002/gepi.22185] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/24/2018] [Revised: 10/28/2018] [Accepted: 11/26/2018] [Indexed: 01/12/2023]
Abstract
In metagenomic studies, testing the association between microbiome composition and clinical outcomes translates to testing the nullity of variance components. Motivated by a lung human immunodeficiency virus (HIV) microbiome project, we study longitudinal microbiome data using variance component models with more than two variance components. Current testing strategies apply only to models with exactly two variance components and large sample sizes; therefore, they are not applicable to longitudinal microbiome studies. In this paper, we propose exact tests (score test, likelihood ratio test, and restricted likelihood ratio test) to (a) test the association of the overall microbiome composition in a longitudinal design and (b) detect the association of one specific microbiome cluster while adjusting for the effects of related clusters. Our approach combines exact tests for a null hypothesis with a single variance component with a strategy for reducing multiple variance components to a single one. Simulation studies demonstrate that our method has a correct type I error rate and superior power compared with existing methods at small sample sizes and weak signals. Finally, we apply our method to a longitudinal pulmonary microbiome study of HIV-infected patients and identify two genera, Prevotella and Veillonella, associated with forced vital capacity. Our findings shed light on the impact of the lung microbiome on HIV complexities. The method is implemented in the open-source, high-performance computing language Julia and is freely available at https://github.com/JingZhai63/VCmicrobiome.
Affiliation(s)
- Jing Zhai
- Department of Epidemiology and Biostatistics, University of Arizona, Tucson, Arizona
- Kenneth Knox
- Division of Pulmonary, Allergy, Critical Care, Sleep Medicine, Department of Medicine, University of Arizona, Tucson, Arizona
- Homer L Twigg
- Division of Pulmonary, Critical Care, Sleep, and Occupational Medicine, Indiana University Medical Center, Indianapolis, Indiana
- Hua Zhou
- Department of Biostatistics, University of California, Los Angeles, California
- Jin J Zhou
- Department of Epidemiology and Biostatistics, University of Arizona, Tucson, Arizona
112
Abe K, Hirayama M, Ohno K, Shimamura T. A latent allocation model for the analysis of microbial composition and disease. BMC Bioinformatics 2018; 19:519. [PMID: 30598099 PMCID: PMC6311924 DOI: 10.1186/s12859-018-2530-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
Background Establishing the relationship between microbiota and specific diseases is important but requires appropriate statistical methodology. A distinctive feature of microbiome count data is the presence of a large number of zeros, which makes the data difficult to analyze in case-control studies. Most existing approaches either add a small number called a pseudo-count or use probability models such as the multinomial and Dirichlet-multinomial distributions to explain the excess zero counts, which may produce unnecessary biases and impose a correlation structure that is unsuitable for microbiome data. Results The purpose of this article is to develop a new probabilistic model, called BERnoulli and MUltinomial Distribution-based latent Allocation (BERMUDA), to address these problems. BERMUDA enables us to describe differences in bacterial composition associated with a given disease across samples. We also provide a simple and efficient learning procedure for the proposed model using an annealing EM algorithm. Conclusion We illustrate the performance of the proposed method through both simulation and real data analysis. BERMUDA is implemented in R and is available from GitHub (https://github.com/abikoushi/Bermuda).
Affiliation(s)
- Ko Abe
- Division of Systems Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, 4668550, Japan
- Masaaki Hirayama
- School of Health Sciences, Nagoya University Graduate School of Medicine, 1-1-20 Daiko-Minami, Higashi-ku, Nagoya, 61-8873, Japan
- Kinji Ohno
- Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, 4668550, Japan
- Teppei Shimamura
- Division of Systems Biology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, 4668550, Japan
113
Jonsson V, Österlund T, Nerman O, Kristiansson E. Modelling of zero-inflation improves inference of metagenomic gene count data. Stat Methods Med Res 2018; 28:3712-3728. [PMID: 30474490 DOI: 10.1177/0962280218811354] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022]
Abstract
Metagenomics enables the study of gene abundances in complex mixtures of microorganisms and has become a standard methodology for the analysis of the human microbiome. However, gene abundance data is inherently noisy and contains high levels of biological and technical variability as well as an excess of zeros due to non-detected genes. This makes the statistical analysis challenging. In this study, we present a new hierarchical Bayesian model for inference of metagenomic gene abundance data. The model uses a zero-inflated overdispersed Poisson distribution which is able to simultaneously capture the high gene-specific variability as well as zero observations in the data. By analysis of three comprehensive datasets, we show that zero-inflation is common in metagenomic data from the human gut and, if not correctly modelled, it can lead to substantial reductions in statistical power. We also show, by using resampled metagenomic data, that our model has, compared to other methods, a higher and more stable performance for detecting differentially abundant genes. We conclude that proper modelling of the gene-specific variability, including the excess of zeros, is necessary to accurately describe gene abundances in metagenomic data. The proposed model will thus pave the way for new biological insights into the structure of microbial communities.
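One standard way to capture both over-dispersion and excess zeros is a zero-inflated negative binomial, where the negative binomial arises as a gamma-mixed (over-dispersed) Poisson. The abstract does not give the paper's hierarchical Bayesian model in enough detail to reproduce, so this Python sketch assumes only this generic form, with variance mean + mean**2 / size and an extra point mass at zero with probability p_zero.

```python
import math


def nb_logpmf(k, mean, size):
    """Negative binomial (gamma-mixed Poisson) log-pmf.

    The mean is `mean` and the variance is mean + mean**2 / size, so a
    smaller `size` means stronger over-dispersion relative to a Poisson.
    """
    p = size / (size + mean)
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(p) + k * math.log(1 - p))


def zinb_logpmf(k, mean, size, p_zero):
    """Zero-inflated NB: zeros come from the point mass or from the NB part."""
    if k == 0:
        return math.log(p_zero + (1 - p_zero) * math.exp(nb_logpmf(0, mean, size)))
    return math.log(1 - p_zero) + nb_logpmf(k, mean, size)
```

Zeros receive probability from both the structural-zero component and the count component, which is why an ordinary negative binomial fitted to zero-inflated data tends to misstate the gene-level variability.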
Affiliation(s)
- Viktor Jonsson
- Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden; Computational Systems Biology, Chalmers University of Technology, Gothenburg, Sweden
- Tobias Österlund
- Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden
- Olle Nerman
- Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden
- Erik Kristiansson
- Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden
114
Chen J, King E, Deek R, Wei Z, Yu Y, Grill D, Ballman K, Stegle O. An omnibus test for differential distribution analysis of microbiome sequencing data. Bioinformatics 2018; 34:643-651. [PMID: 29040451 DOI: 10.1093/bioinformatics/btx650] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 02/28/2017] [Accepted: 10/11/2017] [Indexed: 11/12/2022]
Abstract
Motivation One objective of human microbiome studies is to identify differentially abundant microbes across biological conditions. Previous statistical methods focus on detecting shifts in the abundance and/or prevalence of microbes and treat the dispersion (spread of the data) as a nuisance. These methods also assume that the dispersion is the same across conditions, an assumption that may not hold in the presence of sample heterogeneity. Moreover, widespread outliers in microbiome sequencing data make existing parametric models insufficiently robust. Therefore, a robust and powerful method that allows covariate-dependent dispersion and addresses outliers is still needed for differential abundance analysis. Results We introduce a novel test for differential distribution analysis of microbiome sequencing data that jointly tests abundance, prevalence and dispersion. The test is built on a zero-inflated negative binomial regression model and winsorized count data to account for zero-inflation and outliers. Using simulated data and real microbiome sequencing datasets, we show that our test is robust across various biological conditions and overall more powerful than previous methods. Availability and implementation The R package is available at https://github.com/jchen1981/MicrobiomeDDA. Contact chen.jun2@mayo.edu or zhiwei@njit.edu. Supplementary information Supplementary data are available at Bioinformatics online.
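Winsorization limits the influence of outlying counts before a model is fitted. The sketch below is a hypothetical Python illustration rather than the MicrobiomeDDA code: it caps a single taxon's counts at an upper quantile (0.97 by default, an assumed value) computed with the same linear-interpolation rule as R's default quantile type. The published method may winsorize library-size-normalized counts and choose the cutoff differently.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile (the same rule as R's default, type 7)."""
    s = sorted(values)
    h = (len(s) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def winsorize_counts(counts, upper_q=0.97):
    """Cap one taxon's counts at its upper_q quantile (hypothetical cutoff).

    The cap is rounded up so that winsorized values remain integer counts.
    """
    cap = math.ceil(quantile(counts, upper_q))
    return [min(c, cap) for c in counts]
```

For example, winsorizing [0, 1, ..., 8, 1000] at the 0.9 quantile replaces the outlying 1000 with 108 and leaves the other counts unchanged, so a single extreme sample no longer dominates the dispersion estimate.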
Affiliation(s)
- Jun Chen
- Division of Biomedical Statistics and Informatics; Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Emily King
- Division of Biomedical Statistics and Informatics; Department of Statistics, Iowa State University, Ames, IA 50011, USA
- Rebecca Deek
- Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Yue Yu
- Division of Biomedical Statistics and Informatics; College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Diane Grill
- Division of Biomedical Statistics and Informatics
- Karla Ballman
- Department of Healthcare Policy and Research, Weill Cornell Medical College, New York, NY 10065, USA
115
Zhan X, Xue L, Zheng H, Plantinga A, Wu MC, Schaid DJ, Zhao N, Chen J. A small-sample kernel association test for correlated data with application to microbiome association studies. Genet Epidemiol 2018; 42:772-782. [DOI: 10.1002/gepi.22160] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 04/30/2018] [Revised: 06/27/2018] [Accepted: 07/15/2018] [Indexed: 01/11/2023]
Affiliation(s)
- Xiang Zhan
- Department of Public Health Sciences, Pennsylvania State University, Hershey, Pennsylvania
- Lingzhou Xue
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania
- Haotian Zheng
- Department of Mathematical Sciences, Tsinghua University, Beijing, China
- Anna Plantinga
- Department of Biostatistics, University of Washington, Seattle, Washington
- Michael C. Wu
- Department of Biostatistics, University of Washington, Seattle, Washington
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Daniel J. Schaid
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland
- Jun Chen
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota
116
Zhang X, Pei YF, Zhang L, Guo B, Pendegraft AH, Zhuang W, Yi N. Negative Binomial Mixed Models for Analyzing Longitudinal Microbiome Data. Front Microbiol 2018; 9:1683. [PMID: 30093893 PMCID: PMC6070621 DOI: 10.3389/fmicb.2018.01683] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 10/27/2017] [Accepted: 07/06/2018] [Indexed: 01/09/2023]
Abstract
Metagenomic sequencing data provide valuable resources for investigating the associations between the microbiome and host environmental/clinical factors, and the dynamic changes of microbial abundance over time. The distinct properties of microbiome measurements include varied total sequence reads across samples, over-dispersion and zero-inflation. Additionally, microbiome studies usually collect samples longitudinally, which introduces time-dependent and correlation structures among the samples and thus further complicates the analysis and interpretation of microbiome count data. In this article, we propose negative binomial mixed models (NBMMs) for longitudinal microbiome studies. The proposed NBMMs can efficiently handle over-dispersion and varying total reads, and can account for the dynamic trend and correlation among longitudinal samples. We develop an efficient and stable algorithm to fit the NBMMs. We evaluate and demonstrate the NBMM method via extensive simulation studies and an application to a longitudinal microbiome dataset. The results show that the proposed method has desirable properties and outperforms previously used methods in its flexibility for modeling correlation structures and detecting dynamic effects. We have developed an R package, NBZIMM, to implement the proposed method; it is freely available from the public GitHub repository http://github.com/nyiuab/NBZIMM and provides a useful tool for analyzing longitudinal microbiome data.
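To make the mean and variance structure concrete, here is a minimal Python sketch of a standard negative binomial (NB2) parameterization with a log link and the log of each sample's total reads as an offset, which is one common way to handle over-dispersion and varying library sizes. It is an illustration under those assumptions, not code from NBZIMM (an R package); the function names are hypothetical, and the random effects and correlation structures of the NBMMs are not modeled.

```python
import math


def nb_mean(total_reads, eta):
    """Expected count under a log link with log(total_reads) as an offset:
    log(mu) = log(T) + eta, where eta would be x'beta plus any random effects."""
    return total_reads * math.exp(eta)


def nb_variance(mu, theta):
    """NB2 variance mu + mu^2 / theta; smaller theta means stronger over-dispersion,
    and the Poisson variance mu is recovered as theta grows without bound."""
    return mu + mu * mu / theta


def nb_pmf(y, mu, theta):
    """Negative binomial probability of count y with mean mu and size theta."""
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)


# Two samples with the same linear predictor but different sequencing depths:
# the expected count scales with total reads, which is what the offset absorbs.
for depth in (10_000, 40_000):
    mu = nb_mean(depth, math.log(0.001))
    print(depth, round(mu, 6), nb_variance(mu, theta=2.0))
```

With theta = 2 and a mean of 10, the variance is 60 rather than the Poisson value of 10, which is the over-dispersion the abstract refers to. `nb_pmf` uses the same mean/size parameterization as R's `dnbinom(y, size = theta, mu = mu)`.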
Affiliation(s)
- Xinyan Zhang
- Department of Biostatistics, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA, United States
- Yu-Fang Pei
- Department of Epidemiology and Health Statistics, School of Public Health, Medical College of Soochow University, Suzhou, China
- Lei Zhang
- Department of Epidemiology and Health Statistics, School of Public Health, Medical College of Soochow University, Suzhou, China
- Boyi Guo
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, United States
- Amanda H Pendegraft
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, United States
- Wenzhuo Zhuang
- Department of Cell Biology, School of Biology & Basic Medical Science, Soochow University, Suzhou, China
- Nengjun Yi
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, United States
117
Hu J, Koh H, He L, Liu M, Blaser MJ, Li H. A two-stage microbial association mapping framework with advanced FDR control. MICROBIOME 2018; 6:131. [PMID: 30045760 PMCID: PMC6060480 DOI: 10.1186/s40168-018-0517-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 02/23/2018] [Accepted: 07/11/2018] [Indexed: 05/31/2023]
Abstract
BACKGROUND In microbiome studies, it is important to detect taxa which are associated with pathological outcomes at the lowest definable taxonomic rank, such as genus or species. Traditionally, taxa at the target rank are tested for individual association, followed by the Benjamini-Hochberg (BH) procedure to control the false discovery rate (FDR). However, this approach neglects the dependence structure among taxa and may lead to conservative results. The taxonomic tree of microbiome data represents alignment from phylum to species rank and characterizes evolutionary relationships across microbial taxa. Taxa that are closer on the tree usually have similar responses to the exposure (environment). The statistical power of microbial association tests can be enhanced by efficiently employing the prior evolutionary information encoded in the taxonomic tree. METHODS We propose a two-stage microbial association mapping framework (massMap) which uses grouping information from the taxonomic tree to strengthen statistical power in association tests at the target rank. massMap first screens the association of taxonomic groups at a pre-selected higher taxonomic rank using a powerful microbial group test, OMiAT. The method then proceeds to test the association for each candidate taxon at the target rank within the significant taxonomic groups identified in the first stage. Hierarchical BH (HBH) and selected subset testing (SST) procedures are evaluated to control the FDR for the two-stage structured tests. RESULTS Our simulations show that massMap incorporating OMiAT and the advanced FDR controlling methodologies largely alleviates the multiplicity issue. It is statistically more powerful than traditional association mapping directly at the target rank while controlling the FDR at desired levels under most scenarios.
In our real data analyses, massMap detects the same number of or more associated species, with smaller adjusted p values, compared to the traditional method, which further illustrates the efficiency of the proposed framework. The massMap R package is publicly available at https://sites.google.com/site/huilinli09/software and https://github.com/JiyuanHu/ . CONCLUSIONS massMap is a novel microbial association mapping framework that achieves additional efficiency by utilizing the intrinsic taxonomic structure of microbiome data.
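The single-stage baseline that massMap is compared against applies the Benjamini-Hochberg step-up procedure to the taxon-level p-values. A minimal Python sketch of that baseline follows; massMap's own hierarchical BH (HBH) and selected subset testing (SST) procedures are not reproduced here, and the function names are illustrative.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected by the BH step-up procedure at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k whose sorted p-value satisfies p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Step-up: reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


def bh_adjusted(pvalues):
    """BH-adjusted p-values: min over ranks j >= k of p_(j) * m / j, capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each value takes the running minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


taxon_p = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(taxon_p), bh_adjusted(taxon_p))
```

A hypothesis is rejected at level alpha exactly when its adjusted p-value is at most alpha; the adjusted values follow the same definition as R's `p.adjust(p, method = "BH")`.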
Affiliation(s)
- Jiyuan Hu
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016 USA
- Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, 200433 China
- Hyunwook Koh
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016 USA
- Linchen He
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016 USA
- Menghan Liu
- Department of Medicine, New York University School of Medicine, New York, NY 10016 USA
- Martin J. Blaser
- Department of Medicine, New York University School of Medicine, New York, NY 10016 USA
- Huilin Li
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016 USA
118
Li Z, Lee K, Karagas MR, Madan JC, Hoen AG, O'Malley AJ, Li H. Conditional Regression Based on a Multivariate Zero-Inflated Logistic-Normal Model for Microbiome Relative Abundance Data. STATISTICS IN BIOSCIENCES 2018; 10:587-608. [PMID: 30923584 DOI: 10.1007/s12561-018-9219-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 02/08/2023]
Abstract
The human microbiome plays critical roles in human health and has been linked to many diseases. While advanced sequencing technologies can characterize the composition of the microbiome in unprecedented detail, it remains challenging to disentangle the complex interplay between the human microbiome and disease risk factors due to the complicated nature of microbiome data. Excessive numbers of zero values, high dimensionality, the hierarchical phylogenetic tree and the compositional structure compound one another and consequently make existing methods inadequate for appropriately addressing these issues. We propose a multivariate two-part zero-inflated logistic-normal (MZILN) model to analyze the association of disease risk factors with individual microbial taxa and overall microbial community composition. This approach can naturally handle excessive numbers of zeros and the compositional data structure with the discrete part and the logistic-normal part of the model. For parameter estimation, an estimating equations approach is employed that enables us to address the complex inter-taxa correlation structure induced by the hierarchical phylogenetic tree structure and the compositional data structure. This model is able to incorporate standard regularization approaches to deal with high dimensionality. Simulations show that our model outperforms existing methods. Our approach is also compared to others using the analysis of real microbiome data.
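A logistic-normal model treats log-ratios of a composition as multivariate normal. The Python sketch below shows the additive log-ratio (ALR) transform and its inverse, assuming strictly positive components and the last taxon as the reference; in MZILN the zeros are handled by the separate discrete part, which this sketch does not attempt, and the reference choice and function names are assumptions for illustration.

```python
import math


def alr(composition, ref=-1):
    """Additive log-ratio transform: log(x_j / x_ref) for every j != ref.
    Assumes all components are strictly positive (zeros need separate handling)."""
    if any(x <= 0 for x in composition):
        raise ValueError("ALR needs strictly positive components")
    ref = ref % len(composition)
    return [math.log(x / composition[ref])
            for j, x in enumerate(composition) if j != ref]


def alr_inverse(z):
    """Map ALR coordinates back to the simplex, placing the reference part last."""
    expz = [math.exp(v) for v in z] + [1.0]
    total = sum(expz)
    return [v / total for v in expz]


# ALR coordinates depend only on ratios, so raw counts and proportions agree.
print(alr([2, 3, 5]), alr([0.2, 0.3, 0.5]))
print(alr_inverse(alr([0.2, 0.3, 0.5])))
```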
Affiliation(s)
- Zhigang Li
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, 1 Medical Center Drive, Lebanon, NH 03756, USA; Children's Environmental Health and Disease Prevention Research Center at Dartmouth, Hanover, New Hampshire; Department of Epidemiology, Geisel School of Medicine at Dartmouth, 1 Medical Center Drive, Lebanon, NH 03756, USA; Department of Biostatistics, University of Florida, Gainesville, FL 32611, USA
- Margaret R Karagas
- Children's Environmental Health and Disease Prevention Research Center at Dartmouth, Hanover, New Hampshire; Department of Epidemiology, Geisel School of Medicine at Dartmouth, 1 Medical Center Drive, Lebanon, NH 03756, USA
- Juliette C Madan
- Children's Environmental Health and Disease Prevention Research Center at Dartmouth, Hanover, New Hampshire; Department of Epidemiology, Geisel School of Medicine at Dartmouth, 1 Medical Center Drive, Lebanon, NH 03756, USA; Division of Neonatology, Department of Pediatrics, Children's Hospital at Dartmouth, Lebanon, New Hampshire
- Anne G Hoen
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, 1 Medical Center Drive, Lebanon, NH 03756, USA; Children's Environmental Health and Disease Prevention Research Center at Dartmouth, Hanover, New Hampshire; Department of Epidemiology, Geisel School of Medicine at Dartmouth, 1 Medical Center Drive, Lebanon, NH 03756, USA
- A James O'Malley
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, 1 Medical Center Drive, Lebanon, NH 03756, USA; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth, 1 Medical Center Drive, Lebanon, NH 03756, USA
- Hongzhe Li
- Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
119
Chai H, Jiang H, Lin L, Liu L. A marginalized two-part Beta regression model for microbiome compositional data. PLoS Comput Biol 2018; 14:e1006329. [PMID: 30036363 PMCID: PMC6072097 DOI: 10.1371/journal.pcbi.1006329] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/13/2017] [Revised: 08/02/2018] [Accepted: 06/26/2018] [Indexed: 12/21/2022]
Abstract
In microbiome studies, an important goal is to detect differential abundance of microbes across clinical conditions and treatment options. However, microbiome compositional data (quantified by relative abundance) are highly skewed, bounded in [0, 1), and often have many zeros. A two-part model is commonly used to separate zeros and positive values explicitly by two submodels: a logistic model for the probability of a species being present in Part I, and a Beta regression model for the relative abundance conditional on the presence of the species in Part II. However, the regression coefficients in Part II cannot provide a marginal (unconditional) interpretation of covariate effects on microbial abundance, which is of great interest in many applications. In this paper, we propose a marginalized two-part Beta regression model that captures the zero-inflation and skewness of microbiome data and also allows investigators to examine covariate effects on the marginal (unconditional) mean. We demonstrate its practical performance using simulation studies and apply the model to a real metagenomic dataset on mouse skin microbiota. We find that under the proposed marginalized model, without loss in power, the likelihood ratio test controls the type I error better than tests under conventional methods.
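Why the Part II coefficients lack a marginal interpretation follows from the identity E[Y] = P(Y > 0) * E[Y | Y > 0] = (1 - pi) * mu, where pi is the probability of a zero and mu is the Beta-part mean. The Python sketch below (hypothetical helper names) computes that marginal mean and, in the other direction, the conditional mean implied by a given marginal mean; it illustrates the identity only and does not fit the authors' marginalized model.

```python
def marginal_mean(pi_zero, mu_positive):
    """E[Y] = P(Y > 0) * E[Y | Y > 0] = (1 - pi) * mu for a two-part model."""
    return (1.0 - pi_zero) * mu_positive


def conditional_mean_from_marginal(pi_zero, nu_marginal):
    """Beta-part mean implied by a marginal mean nu: mu = nu / (1 - pi).
    Only valid when nu < 1 - pi, so that mu stays inside (0, 1)."""
    mu = nu_marginal / (1.0 - pi_zero)
    if not 0.0 < mu < 1.0:
        raise ValueError("marginal mean is incompatible with this zero probability")
    return mu


# Same conditional (Beta-part) mean, different probabilities of a zero.
for pi in (0.5, 0.2):
    print(pi, marginal_mean(pi, 0.1))
```

Two groups with the same conditional mean of 0.1 but zero probabilities of 0.5 and 0.2 have marginal means of 0.05 and 0.08, so a covariate that changes only the presence probability still shifts the marginal mean.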
Affiliation(s)
- Haitao Chai
- Institute for Financial Studies, Shandong University, Jinan, Shandong, China
- Department of Preventive Medicine, Northwestern University, Chicago, Illinois, United States of America
- Hongmei Jiang
- Department of Statistics, Northwestern University, Evanston, Illinois, United States of America
- Lu Lin
- Institute for Financial Studies, Shandong University, Jinan, Shandong, China
- Lei Liu
- Department of Preventive Medicine, Northwestern University, Chicago, Illinois, United States of America
- Division of Biostatistics, Washington University in St. Louis, St. Louis, Missouri, United States of America
120
Sitarik A, Havstad S, Levin A, Lynch SV, Fujimura K, Ownby D, Johnson C, Wegienka G. Dog introduction alters the home dust microbiota. INDOOR AIR 2018; 28:539-547. [PMID: 29468742 PMCID: PMC6003855 DOI: 10.1111/ina.12456] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 07/05/2017] [Accepted: 02/13/2018] [Indexed: 05/13/2023]
Abstract
Research has largely reported that dog exposure is associated with reduced allergic disease risk, but the responsible mechanism(s) are not understood. The goal was to investigate whether introducing a dog into the home changes the home dust microbiota. Families without dogs or cats were recruited, both those planning to adopt a dog and those not planning to. Dust samples were collected from the homes at recruitment and 12 months later. Microbiota composition and taxa (V4 region of the 16S rRNA gene) were compared between homes that did and did not adopt a dog. A total of 91 dust samples from 54 families (27 each, dog and no dog; 17 dog and 20 no dog homes with paired samples) were analyzed. A significant dog effect was seen across time in both unweighted UniFrac and Canberra metrics (both P = .008), indicating that dog introduction may result in rapid establishment of rarer and phylogenetically related taxa. A significant dog-time interaction was seen in both weighted UniFrac (P < .001) and Bray-Curtis (P = .002) metrics, suggesting that while there may not initially be large relative abundance shifts following dog introduction, differences can be seen within a year. Therefore, dog introduction into the home has both immediate effects and effects that emerge over time.
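Two of the four metrics named above need only abundance vectors. A minimal Python sketch of Bray-Curtis dissimilarity and the Canberra distance between two samples' taxon counts follows; the UniFrac metrics also require a phylogenetic tree and are not shown. Skipping taxa absent from both samples in the Canberra sum is an assumption here (common implementations do the same), and the function names are illustrative.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity sum|u_i - v_i| / sum(u_i + v_i).
    0 means identical profiles; 1 means the samples share no taxa."""
    if len(u) != len(v):
        raise ValueError("vectors must cover the same taxa")
    denom = sum(a + b for a, b in zip(u, v))
    if denom == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


def canberra(u, v):
    """Canberra distance sum |u_i - v_i| / (u_i + v_i) over nonnegative counts,
    skipping taxa that are zero in both samples."""
    return sum(abs(a - b) / (a + b) for a, b in zip(u, v) if a + b > 0)


home_before = [10, 0, 5]
home_after = [4, 6, 5]
print(bray_curtis(home_before, home_after), canberra(home_before, home_after))
```

Because each Canberra term is scaled by that taxon's own total, a newly appearing rare taxon contributes as much as a large shift in an abundant one, whereas Bray-Curtis is dominated by the abundant taxa.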
Affiliation(s)
- Alexandra Sitarik
- Department of Public Health Sciences, Henry Ford Health System, Detroit, MI
- The in-FLAME Global Network, an affiliate of the World Universities Network (WUN), West New York, NJ 07093 USA
- Suzanne Havstad
- Department of Public Health Sciences, Henry Ford Health System, Detroit, MI
- The in-FLAME Global Network, an affiliate of the World Universities Network (WUN), West New York, NJ 07093 USA
- Albert Levin
- Department of Public Health Sciences, Henry Ford Health System, Detroit, MI
- Susan V. Lynch
- Division of Gastroenterology, University of California, San Francisco, California
- Kei Fujimura
- Division of Gastroenterology, University of California, San Francisco, California
- Dennis Ownby
- Department of Pediatrics, Medical College of Georgia at Augusta University, Augusta, Georgia
- Christine Johnson
- Department of Public Health Sciences, Henry Ford Health System, Detroit, MI
- The in-FLAME Global Network, an affiliate of the World Universities Network (WUN), West New York, NJ 07093 USA
- Ganesa Wegienka
- Department of Public Health Sciences, Henry Ford Health System, Detroit, MI
- The in-FLAME Global Network, an affiliate of the World Universities Network (WUN), West New York, NJ 07093 USA
121
Tipton L, Cuenco KT, Huang L, Greenblatt RM, Kleerup E, Sciurba F, Duncan SR, Donahoe MP, Morris A, Ghedin E. Measuring associations between the microbiota and repeated measures of continuous clinical variables using a lasso-penalized generalized linear mixed model. BioData Min 2018; 11:12. [PMID: 29983746 PMCID: PMC6003033 DOI: 10.1186/s13040-018-0173-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/26/2017] [Accepted: 05/27/2018] [Indexed: 01/28/2023]
Abstract
BACKGROUND Human microbiome studies in clinical settings generally focus on distinguishing the microbiota in health from that in disease at a specific point in time. However, microbiome samples may be associated with disease severity or continuous clinical health indicators that are often assessed at multiple time points. While the temporal data from clinical and microbiome samples may be informative, analysis of this type of data can be problematic for standard statistical methods. RESULTS To identify associations between microbiota and continuous clinical variables measured repeatedly in two studies of the respiratory tract, we adapted a statistical method, the lasso-penalized generalized linear mixed model (LassoGLMM). LassoGLMM can screen for associated clinical variables, incorporate repeated measures of individuals, and address the large number of species found in the microbiome. As is common in microbiome studies, when the number of variables is an order of magnitude larger than the number of samples, LassoGLMM can be imperfect in its variable selection. We overcome this limitation by adding a pre-screening step to reduce the number of variables evaluated in the model. We assessed the use of this adapted two-stage LassoGLMM for its ability to determine which microbes are associated with continuous repeated clinical measures. We found associations (retaining a non-zero coefficient in the LassoGLMM) between 10 laboratory measurements and 43 bacterial genera in the oral microbiota, and between 2 cytokines and 3 bacterial genera in the lung. We compared our associations with those identified by the Wilcoxon test after dichotomizing our outcomes and identified a non-significant trend towards differential abundance between high and low outcomes. Our two-step LassoGLMM explained more of the variance seen in the outcome of interest than other variants of the LassoGLMM method.
CONCLUSIONS We demonstrated a method that can account for the large number of genera detected in microbiome studies and repeated measures of clinical or longitudinal studies, allowing for the detection of strong associations between microbes and clinical measures. By incorporating the design strengths of repeated measurements and a prescreening step to aid variable selection, our two-step LassoGLMM will be a useful analytic method for investigating relationships between microbes and repeatedly measured continuous outcomes.
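The two-step idea (screen, then fit a lasso) can be sketched in Python as below. This is a simplified illustration under stated assumptions: the screening rule (keep the k columns with the largest |X_j'y|) is hypothetical, since the abstract does not specify one, and the fit is a plain linear lasso by coordinate descent with no intercept and no random effects, so it omits the mixed-model part of LassoGLMM.

```python
def soft_threshold(z, lam):
    """Lasso shrinkage operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def prescreen(X, y, k):
    """Hypothetical screening rule: keep the k columns with the largest |X_j' y|."""
    p = len(X[0])
    scores = [abs(sum(row[j] * yi for row, yi in zip(X, y))) for j in range(p)]
    return sorted(sorted(range(p), key=lambda j: -scores[j])[:k])


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1
    (no intercept, no random effects)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = [float(v) for v in y]  # y - X b with b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm = sum(c * c for c in col) / n
            if norm == 0:
                continue
            # Inner product of column j with the partial residual that excludes b_j.
            z = sum(c * r for c, r in zip(col, resid)) / n + norm * b[j]
            new_bj = soft_threshold(z, lam) / norm
            delta = new_bj - b[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                b[j] = new_bj
    return b


X = [[1, 1, 0], [1, -1, 0], [-1, 1, 1], [-1, -1, -1]]
y = [3, 1, -1, -3]
keep = prescreen(X, y, 2)  # step 1: reduce the candidate set
X_kept = [[row[j] for j in keep] for row in X]
print(keep, lasso_cd(X_kept, y, lam=0.5))  # step 2: penalized fit
```

Larger values of `lam` shrink more coefficients exactly to zero, which is how a lasso performs variable selection; screening first keeps the candidate set small relative to the number of samples.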
Affiliation(s)
- Laura Tipton
- Department of Computational & Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261 USA
- Department of Biology, Center for Genomics & Systems Biology, New York University, New York, NY 10003 USA
- Karen T. Cuenco
- Genentech, 1 DNA Way, MS-231C, South San Francisco, CA 94080 USA
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261 USA
- Laurence Huang
- Department of Medicine, School of Medicine, University of California, San Francisco, CA 94143 USA
- Ruth M. Greenblatt
- Department of Medicine, School of Medicine, University of California, San Francisco, CA 94143 USA
- Departments of Clinical Pharmacy, Epidemiology and Biostatistics, Schools of Pharmacy and Medicine, University of California, San Francisco, CA 94143 USA
- Eric Kleerup
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA 90095 USA
- Frank Sciurba
- Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213 USA
- Steven R. Duncan
- Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213 USA
- Michael P. Donahoe
- Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213 USA
- Alison Morris
- Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213 USA
- Elodie Ghedin
- Department of Computational & Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261 USA
- Department of Biology, Center for Genomics & Systems Biology, New York University, New York, NY 10003 USA
- College of Global Public Health, New York University, New York, NY 10003 USA
122
Abstract
The human microbiome is associated with complex disorders such as diabetes, cancer, obesity and cardiovascular disorders. Recent technological developments have allowed researchers to fully quantify the composition of the microbiome using culture-independent approaches, resulting in a large amount of microbiome data, which provide invaluable opportunities to assess the important contributions of the microbiome to human health and disease. In this chapter, we discuss and evaluate multiple statistical approaches for processing, summarizing, and analyzing microbiome data. Specifically, we provide programming scripts for processing microbiome data using QIIME and calculating alpha and beta diversities, assessing the association between diversities and outcomes of interest using R programs, as well as interpretation of results. We illustrate the methods in the context of analyzing the foregut microbiome in esophageal adenocarcinoma.
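The chapter computes diversities with QIIME and R; the underlying formulas are short. Below is a minimal Python sketch of three common alpha diversity summaries for a single sample's taxon counts: observed richness, the Shannon index with natural logarithms, and the Gini-Simpson index. It is not the chapter's script, and the log base and the Simpson variant are assumptions, since tools differ on both; the function names are illustrative.

```python
import math


def richness(counts):
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H = -sum p_i ln p_i over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index 1 - sum p_i^2: the probability that two reads drawn
    at random (with replacement) come from different taxa."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


sample = [120, 45, 30, 5, 0]  # hypothetical taxon counts for one sample
print(richness(sample), shannon(sample), simpson(sample))
```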
123
Wagner BD, Grunwald GK, Zerbe GO, Mikulich-Gilbertson SK, Robertson CE, Zemanick ET, Harris JK. On the Use of Diversity Measures in Longitudinal Sequencing Studies of Microbial Communities. Front Microbiol 2018; 9:1037. [PMID: 29872428 PMCID: PMC5972327 DOI: 10.3389/fmicb.2018.01037] [Citation(s) in RCA: 117] [Impact Index Per Article: 16.7] [Received: 01/23/2018] [Accepted: 05/01/2018] [Indexed: 01/09/2023]
Abstract
Identification of the majority of organisms present in human-associated microbial communities is feasible with the advent of high-throughput sequencing technology. As substantial variability in microbiota communities is seen across subjects, the use of longitudinal study designs is important to better understand variation of the microbiome within individual subjects. Complex study designs with longitudinal sample collection require analytic approaches to account for this additional source of variability. A common approach to assessing community changes is to evaluate the change in alpha diversity (the variety and abundance of organisms in a community) over time. However, there are several commonly used alpha diversity measures, and the use of different measures can result in different estimates of the magnitude of change and different inferences. It has recently been proposed that diversity profile curves are useful for clarifying these differences, and may provide a more complete picture of the community structure. However, it is unclear how to utilize these curves when interest is in evaluating changes in community structure over time. We propose the use of a bi-exponential function in a longitudinal model that accounts for repeated measures on each subject to compare diversity profiles over time. Furthermore, it is possible that no change in alpha diversity (single community/sample) may be observed despite the presence of a highly divergent community composition. Thus, it is also important to use a beta diversity measure (similarity between multiple communities/samples) that captures changes in community composition. Ecological methods developed to evaluate temporal turnover have so far been applied only to changes in a single community over time. We illustrate the extension of this approach to multiple communities of interest (i.e., subjects) by modeling the beta diversity measure over time.
With this approach, a rate of change in community composition is estimated. There is a need for the extension and development of analytic methods for longitudinal microbiota studies. In this paper, we discuss different approaches to model alpha and beta diversity indices in longitudinal microbiota studies and provide both a review of current approaches and a proposal for new methods.
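Diversity profile curves are commonly built from Hill numbers, which express diversity as an effective number of taxa as a function of an order q; we assume that construction here, since the abstract does not name one. The Python sketch below evaluates such a profile for one sample; the bi-exponential longitudinal model proposed in the paper is not reproduced, and the function names are illustrative.

```python
import math


def hill_number(counts, q):
    """Hill number (effective number of taxa) of order q:
    (sum p_i^q)^(1 / (1 - q)), with the q -> 1 limit exp(Shannon).
    q = 0 gives richness and q = 2 gives inverse Simpson."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


def diversity_profile(counts, qs):
    """Evaluate Hill numbers across a grid of orders q (one profile curve)."""
    return [hill_number(counts, q) for q in qs]


orders = [0, 0.5, 1, 2, 4]
print(diversity_profile([5, 5, 5, 5], orders))  # perfectly even: flat profile
print(diversity_profile([6, 2, 1, 1], orders))  # uneven: profile declines with q
```

A flat profile indicates an even community, while a steeply declining one indicates dominance by a few taxa; comparing whole curves avoids committing to a single index, which is the motivation the abstract gives for diversity profiles.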
Affiliation(s)
- Brandie D. Wagner
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado, Anschutz Medical Campus, Aurora, CO, United States
- Department of Pediatrics, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO, United States
- Gary K. Grunwald
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado, Anschutz Medical Campus, Aurora, CO, United States
- Gary O. Zerbe
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado, Anschutz Medical Campus, Aurora, CO, United States
- Susan K. Mikulich-Gilbertson
- Department of Psychiatry, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO, United States
- Charles E. Robertson
- Department of Molecular, Cellular and Developmental Biology, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO, United States
- Edith T. Zemanick
- Department of Pediatrics, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO, United States
- J. Kirk Harris
- Department of Pediatrics, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO, United States
124
Shields-Cutler RR, Al-Ghalith GA, Yassour M, Knights D. SplinectomeR Enables Group Comparisons in Longitudinal Microbiome Studies. Front Microbiol 2018; 9:785. [PMID: 29740416 PMCID: PMC5924793 DOI: 10.3389/fmicb.2018.00785] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 11/15/2017] [Accepted: 04/06/2018] [Indexed: 12/17/2022]
Abstract
Longitudinal, prospective studies often rely on multi-omics approaches, wherein various specimens are analyzed for genomic, metabolomic, and/or transcriptomic profiles. In practice, longitudinal studies in humans and other animals routinely suffer from subject dropout, irregular sampling, and biological variation that may not be normally distributed. As a result, testing hypotheses about observations over time can be statistically challenging without performing transformations and dramatic simplifications to the dataset, causing a loss of longitudinal power in the process. Here, we introduce splinectomeR, an R package that uses smoothing splines to summarize data for straightforward hypothesis testing in longitudinal studies. The package is open-source, and can be used interactively within R or run from the command line as a standalone tool. We present a novel in-depth analysis of a published large-scale microbiome study as an example of its utility in straightforward testing of key hypotheses. We expect that splinectomeR will be a useful tool for hypothesis testing in longitudinal microbiome studies.
Affiliation(s)
- Robin R Shields-Cutler
- BioTechnology Institute, College of Biological Sciences, University of Minnesota, Minneapolis, MN, United States
- Gabe A Al-Ghalith
- Bioinformatics and Computational Biology, University of Minnesota, Minneapolis, MN, United States
- Moran Yassour
- Broad Institute of Massachusetts Institute of Technology, Harvard University, Cambridge, MA, United States; Center for Computational and Integrative Biology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States
- Dan Knights
- BioTechnology Institute, College of Biological Sciences, University of Minnesota, Minneapolis, MN, United States; Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, United States
125
Chen L, Garmaeva S, Zhernakova A, Fu J, Wijmenga C. A system biology perspective on environment–host–microbe interactions. Hum Mol Genet 2018; 27:R187-R194. [DOI: 10.1093/hmg/ddy137] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 04/04/2018] [Accepted: 04/11/2018] [Indexed: 02/07/2023]
Affiliation(s)
- Lianmin Chen
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Department of Pediatrics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Sanzhima Garmaeva
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Alexandra Zhernakova
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Jingyuan Fu
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Department of Pediatrics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Cisca Wijmenga
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, RB, The Netherlands
- Department of Immunology, K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, Oslo, Norway
126
Zhai J, Kim J, Knox KS, Twigg HL, Zhou H, Zhou JJ. Variance Component Selection With Applications to Microbiome Taxonomic Data. Front Microbiol 2018; 9:509. [PMID: 29643839 PMCID: PMC5883493 DOI: 10.3389/fmicb.2018.00509] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/15/2017] [Accepted: 03/06/2018] [Indexed: 12/21/2022]
Abstract
High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Microbiome data are summarized as counts or compositions of the bacterial taxa at different taxonomic levels. An important problem is to identify the bacterial taxa that are associated with a response. One method is to test the association of a specific taxon with phenotypes in a linear mixed effect model, which incorporates phylogenetic information among bacterial communities. Another type of approach considers all taxa in a joint model and achieves selection via penalization, which ignores phylogenetic information. In this paper, we consider regression analysis by treating bacterial taxa at different levels as multiple random effects. For each taxon, a kernel matrix is calculated based on distance measures in the phylogenetic tree and acts as one variance component in the joint model. Then taxonomic selection is achieved by the lasso (least absolute shrinkage and selection operator) penalty on variance components. Our method integrates biological information into the variable selection problem and greatly improves selection accuracies. Simulation studies demonstrate the superiority of our methods versus existing methods, for example, group-lasso. Finally, we apply our method to a longitudinal microbiome study of Human Immunodeficiency Virus (HIV) infected patients. We implement our method using the high-performance computing language Julia. Software and detailed documentation are freely available at https://github.com/JingZhai63/VCselection.
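One common way to turn a distance matrix D into a kernel matrix, used for example in distance-based kernel association tests, is Gower centering: K = -1/2 J D^2 J with J = I - 11'/n, where D^2 squares each entry. The paper's exact kernel construction may differ, and its software is written in Julia; the Python sketch below is an illustration under that assumption, with a hypothetical function name. If D is not Euclidean, K can have negative eigenvalues and would need a correction before use as a variance component.

```python
def distance_to_kernel(dist):
    """Gower-centred kernel K = -1/2 * J D2 J, where D2 holds squared distances
    and J = I - 11'/n; element-wise, K_ij = -1/2 (d2_ij - rowmean_i - colmean_j + grand)."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in sq]
    col_means = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[-0.5 * (sq[i][j] - row_means[i] - col_means[j] + grand_mean)
             for j in range(n)] for i in range(n)]


# Three samples whose pairwise distances come from points at 0, 1 and 3 on a line;
# the kernel then equals the Gram matrix of the centred coordinates.
D = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
for row in distance_to_kernel(D):
    print([round(v, 4) for v in row])
```

Each row of K sums to zero because of the centering, and for Euclidean distances K is positive semidefinite, which is what lets it serve as a covariance (variance component) matrix.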
Affiliation(s)
- Jing Zhai
- Department of Epidemiology and Biostatistics, University of Arizona, Tucson, AZ, United States
- Juhyun Kim
- Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA, United States
- Kenneth S Knox
- Division of Pulmonary, Allergy, Critical Care, and Sleep Medicine, Department of Medicine, University of Arizona, Tucson, AZ, United States
- Homer L Twigg
- Division of Pulmonary, Critical Care, Sleep, and Occupational Medicine, Indiana University Medical Center, Indianapolis, IN, United States
- Hua Zhou
- Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA, United States
- Jin J Zhou
- Department of Epidemiology and Biostatistics, University of Arizona, Tucson, AZ, United States
127
Lee J, Sison-Mangus M. A Bayesian Semiparametric Regression Model for Joint Analysis of Microbiome Data. Front Microbiol 2018; 9:522. [PMID: 29632519 PMCID: PMC5879107 DOI: 10.3389/fmicb.2018.00522] [Received: 11/15/2018] [Accepted: 03/08/2018]
Abstract
The successional dynamics of microbial communities are influenced by the synergistic interactions of physical and biological factors. In our motivating data, ocean microbiome samples were collected from the Santa Cruz Municipal Wharf, Monterey Bay, at multiple time points and then subjected to 16S ribosomal RNA (rRNA) sequencing. We develop a Bayesian semiparametric regression model to investigate how microbial abundance and succession change with covarying physical and biological factors, including algal bloom and domoic acid concentration level, using 16S rRNA sequencing data. A generalized linear regression model is built using the Laplace prior, a sparsity-inducing prior, to improve estimation of covariate effects on mean abundances of microbial species represented by operational taxonomic units (OTUs). A nonparametric prior model is used to facilitate borrowing strength across OTUs, across samples and across time points. It flexibly estimates baseline mean abundances of OTUs and provides the basis for improved quantification of covariate effects. The proposed method does not require prior normalization of OTU counts to adjust for differences in sample total counts. Instead, the normalization and estimation of covariate effects on OTU abundance are carried out simultaneously for joint analysis of all OTUs. Using simulation studies and a real data analysis, we demonstrate improved inference compared to an existing method.
Affiliation(s)
- Juhee Lee
- Department of Applied Mathematics and Statistics, University of California, Santa Cruz, Santa Cruz, CA, United States
- Marilou Sison-Mangus
- Department of Ocean Sciences, University of California, Santa Cruz, Santa Cruz, CA, United States
128
129
Xia Y, Sun J, Chen DG. Introductory Overview of Statistical Analysis of Microbiome Data. In: Statistical Analysis of Microbiome Data with R. 2018. [DOI: 10.1007/978-981-13-1534-3_3]
130
Mallick H, Ma S, Franzosa EA, Vatanen T, Morgan XC, Huttenhower C. Experimental design and quantitative analysis of microbial community multiomics. Genome Biol 2017; 18:228. [PMID: 29187204 PMCID: PMC5708111 DOI: 10.1186/s13059-017-1359-z]
Abstract
Studies of the microbiome have become increasingly sophisticated, and multiple sequence-based, molecular methods as well as culture-based methods exist for population-scale microbiome profiles. To link the resulting host and microbial data types to human health, several experimental design considerations, data analysis challenges, and statistical epidemiological approaches must be addressed. Here, we survey current best practices for experimental design in microbiome molecular epidemiology, including technologies for generating, analyzing, and integrating microbiome multiomics data. We highlight studies that have identified molecular bioactives that influence human health, and we suggest steps for scaling translational microbiome research to high-throughput target discovery across large populations.
Affiliation(s)
- Himel Mallick
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Siyuan Ma
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Eric A Franzosa
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Tommi Vatanen
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Xochitl C Morgan
- Department of Microbiology and Immunology, The University of Otago, Dunedin, New Zealand
- Curtis Huttenhower
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
131
Kaul A, Mandal S, Davidov O, Peddada SD. Analysis of Microbiome Data in the Presence of Excess Zeros. Front Microbiol 2017; 8:2114. [PMID: 29163406 PMCID: PMC5682008 DOI: 10.3389/fmicb.2017.02114] [Received: 05/25/2017] [Accepted: 10/17/2017]
Abstract
Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy to handle these excess zeros is to add a small number, called a pseudo-count (e.g., 1). Other strategies use various probability models to model the excess zero counts. Although adding a pseudo-count is simple and widely used, as demonstrated in this paper, it is not ideal. On the other hand, methods that model excess zeros using a probability model often make an implicit assumption that all zeros can be explained by a common probability model. As described in this article, this is not always appropriate, as there are potentially three types/sources of zeros in microbiome data. The purpose of this paper is to develop a simple methodology to identify and accommodate three different types of zeros and to test hypotheses regarding the relative abundance of taxa in two or more experimental groups. Another major contribution of this paper is to perform constrained (directional or ordered) inference when there are more than two ordered experimental groups (e.g., subjects ordered by diet or age groups or environmental exposure groups). As far as we know, this is the first paper that addresses such problems in the analysis of microbiome data. Results: Using extensive simulation studies, we demonstrate that the proposed methodology not only controls the false discovery rate at a desired level of significance but also competes well in terms of power with DESeq2, a popular procedure derived from the RNA-Seq literature. As expected, the method using pseudo-counts tends to be very conservative, and the classical t-test that ignores the underlying simplex structure in the data has an inflated FDR.
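The sensitivity to the pseudo-count discussed in this abstract can be seen in a few lines. The sketch below is illustrative only and is not the authors' method: it assumes a centered log-ratio (CLR) transform as the downstream step (our choice; the abstract does not name one), uses hypothetical counts for a single sample, and the function name `clr_with_pseudocount` is hypothetical. It shows that the log-scale value assigned to an unobserved taxon is driven largely by the arbitrary pseudo-count.

```python
import math


def clr_with_pseudocount(counts, pseudo=1.0):
    """Centered log-ratio (CLR) transform of one sample after adding a pseudo-count.

    Illustrative assumption: CLR is used here only to show how the choice of
    pseudo-count propagates into log-scale values; it is not the method of
    Kaul et al.
    """
    # Add the pseudo-count to every taxon so that log(0) is avoided.
    shifted = [c + pseudo for c in counts]
    logs = [math.log(x) for x in shifted]
    # Subtract the mean log (log of the geometric mean) so the values sum to zero.
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical OTU counts for one sample; the first taxon was not observed.
sample = [0, 3, 120, 877]

for pseudo in (1.0, 0.5, 0.001):
    clr = clr_with_pseudocount(sample, pseudo)
    print(f"pseudo-count={pseudo:<6} CLR of the zero count: {clr[0]:.2f}")
```

With these hypothetical counts, the CLR value of the zero moves from about -3.24 with a pseudo-count of 1 to about -8.35 with a pseudo-count of 0.001, even though nothing about the observed data has changed.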
Affiliation(s)
- Abhishek Kaul
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences (NIH), Durham, NC, United States
- Ori Davidov
- Department of Statistics, University of Haifa, Haifa, Israel
- Shyamal D. Peddada
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences (NIH), Durham, NC, United States
132
Zhang Y, Han SW, Cox LM, Li H. A multivariate distance-based analytic framework for microbial interdependence association test in longitudinal study. Genet Epidemiol 2017; 41:769-778. [PMID: 28872698 DOI: 10.1002/gepi.22065] [Received: 12/12/2016] [Revised: 05/30/2017] [Accepted: 07/10/2017]
Abstract
The human microbiome is the collection of microbes living in and on the various parts of our body. These microbes do not live alone. They act as an integrated microbial community, with extensive competition and cooperation, and contribute to human health in important ways. Most current analyses focus on examining microbial differences at a single time point, which does not adequately capture the dynamic nature of microbiome data. With the advent of high-throughput sequencing and analytical tools, we are able to probe the interdependent relationships among microbial species through longitudinal studies. Here, we propose a multivariate distance-based test to evaluate the association between key phenotypic variables and microbial interdependence utilizing repeatedly measured microbiome data. Extensive simulations were performed to evaluate the validity and efficiency of the proposed method. We also demonstrate the utility of the proposed test using a well-designed longitudinal murine experiment and a longitudinal human study. The proposed methodology has been implemented in a freely distributed open-source R package and Python code.
Affiliation(s)
- Yilong Zhang
- Merck Research Laboratories, Rahway, New Jersey, United States of America
- Sung Won Han
- Fusion Data Analytics Lab, School of Industrial Management Engineering, Korea University, Seoul, South Korea
- Laura M Cox
- Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Huilin Li
- Department of Population Health (Biostatistics), NYU Langone Medical Center, New York, NY, United States of America
- Department of Environmental Medicine, NYU Langone Medical Center, New York, NY, United States of America
133
Patel CJ, Kerr J, Thomas DC, Mukherjee B, Ritz B, Chatterjee N, Jankowska M, Madan J, Karagas MR, McAllister KA, Mechanic LE, Fallin MD, Ladd-Acosta C, Blair IA, Teitelbaum SL, Amos CI. Opportunities and Challenges for Environmental Exposure Assessment in Population-Based Studies. Cancer Epidemiol Biomarkers Prev 2017; 26:1370-1380. [PMID: 28710076 PMCID: PMC5581729 DOI: 10.1158/1055-9965.epi-17-0459] [Received: 06/02/2017] [Revised: 06/14/2017] [Accepted: 06/22/2017]
Abstract
A growing number and increasing diversity of factors are available for epidemiological studies. These measures provide new avenues for discovery and prevention, yet they also raise many challenges for adoption in epidemiological investigations. Here, we evaluate 1) designs to investigate diseases that consider heterogeneous and multidimensional indicators of exposure and behavior, 2) the implementation of numerous methods to capture indicators of exposure, and 3) the analytical methods required for discovery and validation. We find that case-control studies have provided insights into genetic susceptibility but are insufficient for characterizing complex effects of environmental factors on disease development. Prospective and two-phase designs are required but must balance extended data collection with follow-up of study participants. We discuss innovations in assessments including the microbiome; mass spectrometry and metabolomics; behavioral assessment; dietary, physical activity, and occupational exposure assessment; air pollution monitoring; and global positioning and individual sensors. We claim that the availability of extensive correlated data raises new challenges in disentangling the specific exposures that influence cancer risk from among extensive and often correlated exposures. In conclusion, new high-dimensional exposure assessments offer many new opportunities for environmental assessment in cancer development. Cancer Epidemiol Biomarkers Prev; 26(9); 1370-80. ©2017 AACR.
Affiliation(s)
- Chirag J Patel
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Jacqueline Kerr
- Department of Family Medicine and Public Health, University of California San Diego, La Jolla, California
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, California
- Bhramar Mukherjee
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan
- Beate Ritz
- Department of Epidemiology, Fielding School of Public Health, University of California Los Angeles, Los Angeles, California
- Nilanjan Chatterjee
- Department of Biostatistics and Department of Oncology, Johns Hopkins University, Baltimore, Maryland
- Marta Jankowska
- Department of Family Medicine and Public Health, University of California San Diego, La Jolla, California
- Juliette Madan
- Division of Neonatology, Department of Pediatrics, Dartmouth-Hitchcock Medical Center, Lebanon, New Hampshire
- Margaret R Karagas
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Lebanon, New Hampshire
- Kimberly A McAllister
- Susceptibility and Population Health Branch, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, North Carolina
- Leah E Mechanic
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, NIH, Bethesda, Maryland
- M Daniele Fallin
- Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland
- Ian A Blair
- Center of Excellence in Environmental Toxicology and Penn SRP Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Susan L Teitelbaum
- Department of Preventive Medicine, Icahn School of Medicine at Mount Sinai, New York, New York
- Christopher I Amos
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Lebanon, New Hampshire
134
Gorshein E, Wei C, Ambrosy S, Budney S, Vivas J, Shenkerman A, Manago J, McGrath MK, Tyno A, Lin Y, Patel V, Gharibo M, Schaar D, Jenq RR, Khiabanian H, Strair R. Lactobacillus rhamnosus GG probiotic enteric regimen does not appreciably alter the gut microbiome or provide protection against GVHD after allogeneic hematopoietic stem cell transplantation. Clin Transplant 2017; 31. [PMID: 28256022 DOI: 10.1111/ctr.12947] [Accepted: 02/26/2017]
Abstract
Graft-versus-host disease (GVHD) is a major adverse effect associated with allogeneic stem cell transplant. Previous studies in mice indicated that administration of the probiotic Lactobacillus rhamnosus GG can reduce the incidence of GVHD after hematopoietic stem cell transplant. Here we report results from the first randomized probiotic enteric regimen trial in which allogeneic hematopoietic stem cell transplant patients were supplemented with Lactobacillus rhamnosus GG. Gut microbiome analysis confirmed a previously reported gut microbiome association with GVHD. However, the clinical trial was terminated when interim analysis did not detect an appreciable probiotic-related change in the gut microbiome or incidence of GVHD. Additional studies are necessary to determine whether probiotics can alter the incidence of GVHD after allogeneic stem cell transplant.
Affiliation(s)
- Elan Gorshein
- Department of Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ, USA
- Catherine Wei
- Department of Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ, USA
- Susan Ambrosy
- Department of Pathology and Laboratory Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ, USA
- Shanna Budney
- Department of Pathology and Laboratory Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ, USA
- Juliana Vivas
- Department of Pathology and Laboratory Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ, USA
- Angelika Shenkerman
- Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Jacqueline Manago
- Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Mary Kate McGrath
- Department of Pathology and Laboratory Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ, USA
- Anne Tyno
- Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Yong Lin
- Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Vimal Patel
- Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Mecide Gharibo
- Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Dale Schaar
- Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Robert R Jenq
- Division of Cancer Medicine, Departments of Genomic Medicine and Stem Cell Transplantation Cellular Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Hossein Khiabanian
- Department of Pathology and Laboratory Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ, USA
- Roger Strair
- Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
135
Bradley PH, Pollard KS. Proteobacteria explain significant functional variability in the human gut microbiome. Microbiome 2017; 5:36. [PMID: 28330508 PMCID: PMC5363007 DOI: 10.1186/s40168-017-0244-z] [Received: 11/02/2016] [Accepted: 02/13/2017]
Abstract
BACKGROUND While human gut microbiomes vary significantly in taxonomic composition, biological pathway abundance is surprisingly invariable across hosts. We hypothesized that healthy microbiomes appear functionally redundant due to factors that obscure differences in gene abundance between individuals. RESULTS To account for these biases, we developed a powerful test of gene variability called CCoDA, which is applicable to shotgun metagenomes from any environment and can integrate data from multiple studies. Our analysis of healthy human fecal metagenomes from three separate cohorts revealed thousands of genes whose abundance differs significantly and consistently between people, including glycolytic enzymes, lipopolysaccharide biosynthetic genes, and secretion systems. Even housekeeping pathways contain a mix of variable and invariable genes, though most highly conserved genes are significantly invariable. Variable genes tend to be associated with Proteobacteria, as opposed to taxa used to define enterotypes or the dominant phyla Bacteroidetes and Firmicutes. CONCLUSIONS These results establish limits on functional redundancy and predict specific genes and taxa that may explain physiological differences between gut microbiomes.
Affiliation(s)
- Katherine S. Pollard
- Gladstone Institutes, San Francisco, CA USA
- Division of Biostatistics, Institute for Human Genetics, and Institute for Computational Health Sciences, University of California, San Francisco, CA USA
136
Layeghifard M, Hwang DM, Guttman DS. Disentangling Interactions in the Microbiome: A Network Perspective. Trends Microbiol 2017; 25:217-228. [PMID: 27916383 PMCID: PMC7172547 DOI: 10.1016/j.tim.2016.11.008] [Received: 09/14/2016] [Revised: 10/31/2016] [Accepted: 11/08/2016]
Abstract
Microbiota are now widely recognized as being central players in the health of all organisms and ecosystems, and subsequently have been the subject of intense study. However, analyzing and converting microbiome data into meaningful biological insights remain very challenging. In this review, we highlight recent advances in network theory and their applicability to microbiome research. We discuss emerging graph theoretical concepts and approaches used in other research disciplines and demonstrate how they are well suited for enhancing our understanding of the higher-order interactions that occur within microbiomes. Network-based analytical approaches have the potential to help disentangle complex polymicrobial and microbe-host interactions, and thereby further the applicability of microbiome research to personalized medicine, public health, environmental and industrial applications, and agriculture.
Affiliation(s)
- Mehdi Layeghifard
- Department of Cell & Systems Biology, University of Toronto, Toronto, Ontario, Canada
- David M Hwang
- Department of Pathology, University Health Network, Toronto, Ontario, Canada
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
- David S Guttman
- Department of Cell & Systems Biology, University of Toronto, Toronto, Ontario, Canada
- Centre for the Analysis of Genome Evolution & Function, University of Toronto, Toronto, Ontario, Canada
137
Gibbons SM, Kearney SM, Smillie CS, Alm EJ. Two dynamic regimes in the human gut microbiome. PLoS Comput Biol 2017; 13:e1005364. [PMID: 28222117 PMCID: PMC5340412 DOI: 10.1371/journal.pcbi.1005364] [Received: 08/16/2016] [Revised: 03/07/2017] [Accepted: 01/16/2017]
Abstract
The gut microbiome is a dynamic system that changes with host development, health, behavior, diet, and microbe-microbe interactions. Prior work on gut microbial time series has largely focused on autoregressive models (e.g. Lotka-Volterra). However, we show that most of the variance in microbial time series is non-autoregressive. In addition, we show how community state-clustering is flawed when it comes to characterizing within-host dynamics and that more continuous methods are required. Most organisms exhibited stable, mean-reverting behavior suggestive of fixed carrying capacities, and abundant taxa were largely shared across individuals. This mean-reverting behavior allowed us to apply sparse vector autoregression (sVAR), a multivariate method developed for econometrics, to model the autoregressive component of gut community dynamics. We find a strong phylogenetic signal in the non-autoregressive covariance from our sVAR model residuals, which suggests niche filtering. We show how changes in diet are also non-autoregressive and that Operational Taxonomic Units strongly correlated with dietary variables have much less of an autoregressive component to their variance, which suggests that diet is a major driver of microbial dynamics. Autoregressive variance appears to be driven by multi-day recovery from frequent facultative anaerobe blooms, which may be driven by fluctuations in luminal redox. Overall, we identify two dynamic regimes within the human gut microbiota: one likely driven by external environmental fluctuations, and the other by internal processes.
Dynamics reveal crucial information about how a system functions. In this study, we develop an approach for disentangling two types of dynamics within the human gut microbiome. We find that autoregressive dynamics involve recovery from large deviations in community structure. These recovery processes appear to involve the blooming of facultative anaerobes and aerotolerant taxa, likely due to transient shifts in redox potential, followed by re-establishment of obligate anaerobes. Non-autoregressive dynamics carry a strong phylogenetic signal, wherein highly related taxa fluctuate coherently. These non-autoregressive dynamics appear to be driven by external, non-autoregressive variables like diet. We find that most of the community variance is driven by day-to-day fluctuations in the environment, with occasional autoregressive dynamics as the system recovers from larger shocks. Despite frequently observed disruptions to the gut ecosystem, there exists a returning force that continually pushes the gut microbiome back towards its steady-state configuration.
Affiliation(s)
- Sean M. Gibbons
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- The Broad Institute, Cambridge, MA, United States of America
- The Center for Microbiome Informatics and Therapeutics, Cambridge, MA, United States of America
- Sean M. Kearney
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- The Broad Institute, Cambridge, MA, United States of America
- The Center for Microbiome Informatics and Therapeutics, Cambridge, MA, United States of America
- Chris S. Smillie
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- The Broad Institute, Cambridge, MA, United States of America
- The Center for Microbiome Informatics and Therapeutics, Cambridge, MA, United States of America
- Eric J. Alm
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- The Broad Institute, Cambridge, MA, United States of America
- The Center for Microbiome Informatics and Therapeutics, Cambridge, MA, United States of America
138
Zheng X, Qin G, Tu D. A generalized partially linear mean-covariance regression model for longitudinal proportional data, with applications to the analysis of quality of life data from cancer clinical trials. Stat Med 2017; 36:1884-1894. [PMID: 28215045 DOI: 10.1002/sim.7240] [Received: 10/12/2016] [Revised: 12/21/2016] [Accepted: 01/17/2017]
Abstract
Motivated by the analysis of quality of life data from a clinical trial on early breast cancer, we propose in this paper a generalized partially linear mean-covariance regression model for longitudinal proportional data, which are bounded in a closed interval. Cholesky decomposition of the covariance matrix for within-subject responses and generalized estimating equations are used to estimate unknown parameters and the nonlinear function in the model. Simulation studies are performed to evaluate the performance of the proposed estimation procedures. Our new model is also applied to analyze the data from the cancer clinical trial that motivated this research. In comparison with available models in the literature, the proposed model does not require specific parametric assumptions on the density function of the longitudinal responses and the probability function of the boundary values, and it can capture the dynamic effects of time or other variables of interest on both the mean and the covariance of the correlated proportional responses. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- Xueying Zheng
- Department of Biostatistics, School of Public Health, Key Laboratory of Public Health Safety and Collaborative Innovation Center of Social Risks Governance in Health, Fudan University, Shanghai, 200032, China
- Guoyou Qin
- Department of Biostatistics, School of Public Health, Key Laboratory of Public Health Safety and Collaborative Innovation Center of Social Risks Governance in Health, Fudan University, Shanghai, 200032, China
- Dongsheng Tu
- Canadian Cancer Trials Group, Queen's University, Kingston, Ontario, Canada
- Department of Public Health Sciences, Queen's University, Kingston, Ontario, Canada
139
Zhang X, Mallick H, Tang Z, Zhang L, Cui X, Benson AK, Yi N. Negative binomial mixed models for analyzing microbiome count data. BMC Bioinformatics 2017; 18:4. [PMID: 28049409 PMCID: PMC5209949 DOI: 10.1186/s12859-016-1441-7] [Received: 08/26/2016] [Accepted: 12/21/2016]
Abstract
Background Recent advances in next-generation sequencing (NGS) technology enable researchers to collect a large volume of metagenomic sequencing data. These data provide valuable resources for investigating interactions between the microbiome and host environmental/clinical factors. In addition to the well-known properties of microbiome count measurements, for example, varied total sequence reads across samples, over-dispersion and zero-inflation, microbiome studies usually collect samples with hierarchical structures, which introduce correlation among the samples and thus further complicate the analysis and interpretation of microbiome count data. Results In this article, we propose negative binomial mixed models (NBMMs) for detecting the association between the microbiome and host environmental/clinical factors for correlated microbiome count data. Although they do not address zero-inflation, the proposed mixed-effects models account for correlation among the samples by incorporating random effects into the commonly used fixed-effects negative binomial model, and can efficiently handle over-dispersion and varying total reads. We have developed a flexible and efficient IWLS (Iterative Weighted Least Squares) algorithm to fit the proposed NBMMs by taking advantage of the standard procedure for fitting linear mixed models. Conclusions We evaluate and demonstrate the proposed method via extensive simulation studies and an application to mouse gut microbiome data. The results show that the proposed method has desirable properties and outperforms the previously used methods in terms of both empirical power and Type I error. The method has been incorporated into the freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/ and http://github.com/abbyyan3/BhGLM), providing a useful tool for analyzing microbiome data.
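The building block that such negative binomial mixed models extend can be sketched briefly: a log-linear mean combining a library-size offset, a fixed covariate effect and a subject-level random intercept, plus a negative binomial variance that exceeds the mean. This is a minimal illustration under stated assumptions, not the NBMM or BhGLM implementation: the NB2 mean/size parameterization, the log(total reads) offset and all parameter values are our choices, the function names `nb_logpmf`, `nbmm_mean` and `nb_moments` are hypothetical, and nothing here fits a model or reproduces the IWLS algorithm.

```python
import math


def nb_logpmf(y, mu, theta):
    """Log-probability of count y under a negative binomial with mean mu and size theta.

    Assumed NB2 parameterization: Var(Y) = mu + mu**2 / theta, so smaller theta
    means stronger over-dispersion relative to a Poisson (Var = mu).
    """
    return (
        math.lgamma(y + theta)
        - math.lgamma(theta)
        - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def nbmm_mean(total_reads, beta0, beta1, x, subject_effect):
    """Hypothetical mean for one sample in a random-intercept NB mixed model.

    log(mu) = log(total_reads) + beta0 + beta1 * x + subject_effect
    The log(total_reads) offset adjusts for varying sequencing depth; a shared
    subject_effect induces correlation among samples from the same subject.
    """
    return math.exp(math.log(total_reads) + beta0 + beta1 * x + subject_effect)


def nb_moments(mu, theta, y_max=5000):
    """Mean and variance of the NB distribution by direct summation over 0..y_max."""
    probs = [math.exp(nb_logpmf(y, mu, theta)) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


# Hypothetical values: 10,000 reads, baseline log relative abundance -6,
# covariate effect 0.5 for an exposed sample (x = 1), subject effect 0.2.
mu = nbmm_mean(total_reads=10_000, beta0=-6.0, beta1=0.5, x=1, subject_effect=0.2)
theta = 2.0
mean, var = nb_moments(mu, theta)
print(f"mu = {mu:.2f}, numerical mean = {mean:.2f}")
print(f"variance = {var:.2f} vs. mu + mu^2/theta = {mu + mu**2 / theta:.2f}")
```

With these hypothetical inputs, mu is about 49.9 and the summed variance matches mu + mu²/θ at about 1295.7, roughly 26 times the mean; a Poisson model would force the two to be equal, which is the over-dispersion the abstract refers to.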
Affiliation(s)
- Xinyan Zhang
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, 35294-0022, USA
- Himel Mallick
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Program in Medical and Population Genetics, the Broad Institute, Cambridge, MA, 02142, USA
- Zaixiang Tang
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China
- Lei Zhang
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China
- Xiangqin Cui
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, 35294-0022, USA
- Andrew K Benson
- Department of Food Science and Technology and Core for Applied Genomics and Ecology, University of Nebraska, Lincoln, NE, 68583, USA
- Nengjun Yi
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, 35294-0022, USA
140
Gregory KE, Samuel BS, Houghteling P, Shan G, Ausubel FM, Sadreyev RI, Walker WA. Influence of maternal breast milk ingestion on acquisition of the intestinal microbiome in preterm infants. Microbiome 2016; 4:68. [PMID: 28034306 PMCID: PMC5200970 DOI: 10.1186/s40168-016-0214-x] [Citation(s) in RCA: 155] [Impact Index Per Article: 17.2] [Received: 04/12/2016] [Accepted: 11/29/2016] [Indexed: 05/14/2023]
Abstract
BACKGROUND The initial acquisition and early development of the intestinal microbiome during infancy are important to human health across the lifespan. Mode of birth, antibiotic administration, environment of care, and nutrition have all been shown to play a role in the assembly of the intestinal microbiome during early life. For preterm infants, who are disproportionately at risk of inflammatory intestinal disease (such as necrotizing enterocolitis), a unique set of clinical factors influences the establishment of the microbiome. The purpose of this study was to establish the influence of nutritional exposures on the intestinal microbiome in a cohort of preterm infants early in life. RESULTS Principal component analysis of 199 samples from 30 preterm infants (<32 weeks) over the first 60 days following birth showed that the intestinal microbiome was influenced by postnatal time (p < 0.001, R² = 0.13), birth weight (p < 0.001, R² = 0.08), and nutrition (p < 0.001, R² = 0.21). Infants who were fed breast milk had greater initial bacterial diversity and a more gradual acquisition of diversity compared with infants who were fed infant formula. The microbiomes of infants fed breast milk were more similar to one another regardless of birth weight (p = 0.049), in contrast to the microbiomes of infants fed infant formula, which clustered differently based on birth weight (p < 0.001). After adjusting for differences in gut maturity, an ordered succession of microbial phylotypes was observed in breast milk-fed infants, which appeared to be disrupted in those fed infant formula. Supplementation with pasteurized donor human milk was partially successful in promoting a microbiome more similar to that of breast milk-fed infants and in moderating rapid increases in bacterial diversity. CONCLUSIONS The preterm infant intestinal microbiome is influenced by postnatal time, birth weight, gestational age, and nutrition.
Feeding with breast milk appears to mask the influence of birth weight, suggesting a protective effect against gut immaturity in the preterm infant. These findings suggest not only a microbial mechanism underpinning the body of evidence showing that breast milk promotes intestinal health in the preterm infant but also a dynamic interplay of host and dietary factors that facilitate the colonization of and enrichment for specific microbes during establishment of the preterm infant microbiota.
Affiliation(s)
- Katherine E. Gregory
- Department of Pediatric Newborn Medicine, Brigham and Women’s Hospital, 75 Francis Street, Boston, MA 02115 USA
- Buck S. Samuel
- Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX 77030 USA
- Pearl Houghteling
- Department of Pediatrics, Yale School of Medicine, New Haven, CT USA
- Guru Shan
- Cooper Medical School, Camden, NJ USA
- Frederick M. Ausubel
- Department of Molecular Biology, Massachusetts General Hospital; Department of Genetics, Harvard Medical School, Boston, MA USA
- Ruslan I. Sadreyev
- Department of Molecular Biology, Massachusetts General Hospital; Department of Pathology, Harvard Medical School, Boston, MA USA
- W. Allan Walker
- Department of Pediatrics, Mucosal Immunology and Biology Research Center, Massachusetts General Hospital for Children, Harvard Medical School, Boston, MA USA