1
Leske M, Bottacini F, Afli H, Andrade BGN. BiGAMi: Bi-Objective Genetic Algorithm Fitness Function for Feature Selection on Microbiome Datasets. Methods Protoc 2022; 5:42. [PMID: 35645350 PMCID: PMC9149982 DOI: 10.3390/mps5030042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/28/2022] [Revised: 05/16/2022] [Accepted: 05/18/2022] [Indexed: 11/23/2022] Open
Abstract
The relationship between the host and the microbiome, the assemblage of microorganisms (including bacteria, archaea, fungi, and viruses), has proven crucial for host health and disease development. The high dimensionality of microbiome datasets is often cited as a major difficulty for data analysis, including the use of machine-learning (ML) and deep-learning (DL) models. Here, we present BiGAMi, a bi-objective genetic algorithm fitness function for feature selection in microbial datasets to train high-performing phenotype classifiers. The proposed fitness function allowed us to build classifiers that outperformed the baseline performance estimated by the original studies while using as few as 0.04% to 2.32% of the features of the original dataset. In 35 out of 42 performance comparisons between BiGAMi and the other feature selection methods evaluated here (sequential forward selection, SelectKBest, and GARS), BiGAMi achieved its results by selecting 6-93% fewer features. This study showed that applying a bi-objective GA fitness function to microbiome datasets succeeded in selecting small subsets of bacteria whose contribution to understood diseases and the host state was already experimentally proven. Applying this feature selection approach to novel diseases is expected to quickly reveal the microbes most relevant to a specific condition.
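To make the idea of a bi-objective fitness function concrete, the sketch below scores a candidate feature subset on two objectives at once: classifier performance (to maximize) and subset size (to minimize). It is an illustrative assumption, not the BiGAMi implementation; `bi_objective_fitness`, `dominates`, and the stand-in scorer `toy_score` are hypothetical names, and scores are assumed to lie in [0, 1].

```python
from typing import Callable, List, Sequence, Tuple

Fitness = Tuple[float, int]  # (score to maximize, feature count to minimize)


def bi_objective_fitness(
    mask: Sequence[int],
    score_fn: Callable[[List[int]], float],
) -> Fitness:
    """Evaluate a 0/1 feature mask on two objectives at once."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        # No classifier can be trained on zero features: return the worst
        # value on both objectives (assuming scores lie in [0, 1]).
        return 0.0, len(mask)
    return score_fn(selected), len(selected)


def dominates(a: Fitness, b: Fitness) -> bool:
    """True if a Pareto-dominates b: no worse on both objectives, better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def toy_score(selected: List[int]) -> float:
    """Hypothetical stand-in for cross-validated accuracy.

    Features 0 and 3 are treated as the only informative ones.
    """
    informative = {0, 3}
    return 0.5 + 0.25 * len(informative.intersection(selected))


small = bi_objective_fitness([1, 0, 0, 1, 0], toy_score)
full = bi_objective_fitness([1, 1, 1, 1, 1], toy_score)
print(small, full, dominates(small, full))
```

In a genetic algorithm, a multi-objective selection scheme would keep the non-dominated masks, so a two-feature subset that matches the score of the full feature set is preferred over it.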
Affiliation(s)
- Mike Leske
  - Department of Computer Sciences, Munster Technological University, MTU/ADAPT, T12 P928 Cork, Ireland
- Francesca Bottacini
  - Department of Biological Sciences, Munster Technological University, MTU, T12 P928 Cork, Ireland
- Haithem Afli
  - Department of Computer Sciences, Munster Technological University, MTU/ADAPT, T12 P928 Cork, Ireland
- Bruno G. N. Andrade
  - Department of Computer Sciences, Munster Technological University, MTU/ADAPT, T12 P928 Cork, Ireland
2
Valeris-Chacin R, Pieters M, Hwang H, Johnson TJ, Singer RS. Association of Broiler Litter Microbiome Composition and Campylobacter Isolation. Front Vet Sci 2021; 8:654927. [PMID: 34109233 PMCID: PMC8180553 DOI: 10.3389/fvets.2021.654927] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/08/2021] [Accepted: 04/19/2021] [Indexed: 12/31/2022] Open
Abstract
Infection with Campylobacter species is one of the leading causes of bacterial diarrhea in humans in the US. Chickens, which become colonized on the farm, are important reservoirs of this bacterium. Campylobacter can establish itself in the broiler house via a variety of sources, can survive in the litter of the house, and possibly persist over successive flock cycles. However, the role of the broiler litter microbiome on Campylobacter persistence is not clear. A matched case-control study was conducted to determine whether the broiler litter microbiome composition was associated with Campylobacter isolation within the broiler house. Flocks were classified as cases when either Campylobacter jejuni or Campylobacter coli was isolated in boot sock samples, or as controls otherwise. Case and control flocks were matched at the broiler house level. Composite broiler litter samples were collected and used for DNA extraction and 16S rRNA gene V4 region sequencing. Reads were processed using the DADA2 pipeline to obtain a table of amplicon sequence variants. Alpha diversity and differential bacterial relative abundance were used as predictors of Campylobacter isolation status in conditional logistic regression models adjusting for flock age and sampling season. Beta diversity distances were used as regressors in stratified PERMANOVA with Campylobacter isolation status as predictor, and broiler house as stratum. When Campylobacter was isolated in boot socks, broiler litter microbiome richness and evenness were lower and higher, respectively, without reaching statistical significance. Campylobacter isolation status significantly explained a small proportion of the beta diversity (genus-level Aitchison dissimilarity distance). Clostridium and Anaerostipes were positively associated with Campylobacter isolation status, whereas Bifidobacterium, Anaerosporobacter, and Stenotrophomonas were negatively associated. 
Our results suggest the presence of bacterial interactions between Campylobacter and the broiler litter microbiome. The negative association of Campylobacter with Bifidobacterium, Anaerosporobacter, and Stenotrophomonas in litter could be potentially exploited as a pre-harvest control strategy.
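The diversity quantities named in this abstract can be illustrated with a short sketch. The abstract does not say which richness or evenness indices were used, so the choices here (observed richness, Pielou evenness, and an Aitchison distance computed as the Euclidean distance between centered log-ratio vectors with a pseudocount for zeros) are assumptions, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def richness(counts: Sequence[float]) -> int:
    """Observed richness: number of taxa with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def pielou_evenness(counts: Sequence[float]) -> float:
    """Shannon entropy divided by its maximum, log(richness)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if len(props) < 2:
        return 0.0  # evenness is not meaningful for fewer than two taxa
    shannon = -sum(p * math.log(p) for p in props)
    return shannon / math.log(len(props))


def clr(counts: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Centered log-ratio transform; the pseudocount handles zero counts."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(
    a: Sequence[float], b: Sequence[float], pseudocount: float = 1.0
) -> float:
    """Euclidean distance between the CLR-transformed samples."""
    return math.dist(clr(a, pseudocount), clr(b, pseudocount))


sample_a = [10, 0, 5, 5]
sample_b = [2, 8, 0, 10]
print(richness(sample_a), pielou_evenness(sample_a))
print(aitchison_distance(sample_a, sample_b))
```

With a pseudocount of zero and no zero counts, the distance is unchanged when a sample is multiplied by a constant, which is why it suits compositional sequencing data.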
Affiliation(s)
- Robert Valeris-Chacin
  - Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, Saint Paul, MN, United States
- Maria Pieters
  - Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, Saint Paul, MN, United States
  - Veterinary Diagnostic Laboratory, College of Veterinary Medicine, University of Minnesota, Saint Paul, MN, United States
- Haejin Hwang
  - Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, Saint Paul, MN, United States
- Timothy J Johnson
  - Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, Saint Paul, MN, United States
- Randall S Singer
  - Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, Saint Paul, MN, United States
3
Song J, Zhang J, Su Y, Zhang X, Li J, Tu L, Yu J, Zheng Y, Wang M. Monascus vinegar-mediated alternation of gut microbiota and its correlation with lipid metabolism and inflammation in hyperlipidemic rats. J Funct Foods 2020. [DOI: 10.1016/j.jff.2020.104152] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023] Open
4
Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Progress in Molecular Biology and Translational Science 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that concepts of correlation and association analyses have been shifted by introducing network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach.
Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and future direction of association analysis in microbiome and multiomics studies.
Affiliation(s)
- Yinglin Xia
  - Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States
5
Tang Z, Shen Y, Li Y, Zhang X, Wen J, Qian C, Zhuang W, Shi X, Yi N. Group spike-and-slab lasso generalized linear models for disease prediction and associated genes detection by incorporating pathway information. Bioinformatics 2018; 34:901-910. [PMID: 29077795 PMCID: PMC5860634 DOI: 10.1093/bioinformatics/btx684] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 06/15/2017] [Revised: 10/05/2017] [Accepted: 10/24/2017] [Indexed: 01/10/2023] Open
Abstract
Motivation: Large-scale molecular data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, standard approaches for omics data analysis ignore the group structure among genes encoded in functional relationships or pathway information.
Results: We propose new Bayesian hierarchical generalized linear models, called group spike-and-slab lasso GLMs, for predicting disease outcomes and detecting associated genes by incorporating large-scale molecular data and group structures. The proposed model employs a mixture double-exponential prior for coefficients that induces a self-adaptive amount of shrinkage on different coefficients. The group information is incorporated into the model by setting group-specific parameters. We have developed a fast and stable deterministic algorithm to fit the proposed hierarchical GLMs, which can perform variable selection within groups. We assess the performance of the proposed method on several simulated scenarios, by varying the overlap among groups, group size, number of non-null groups, and the correlation within groups. Compared with existing methods, the proposed method provides not only more accurate estimates of the parameters but also better prediction. We further demonstrate the application of the proposed procedure on three cancer datasets by utilizing pathway structures of genes. Our results show that the proposed method generates powerful models for predicting disease outcomes and detecting associated genes.
Availability and implementation: The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
Contact: nyi@uab.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zaixiang Tang
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, China
  - Center for Genetic Epidemiology and Genomics, Medical College of Soochow University, Suzhou, China
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Yueping Shen
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, China
- Yan Li
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Xinyan Zhang
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Jia Wen
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
- Chen’ao Qian
  - Department of Bioinformatics, School of Biology & Basic Medical Science, Soochow University, Suzhou, China
- Wenzhuo Zhuang
  - Department of Cell Biology, School of Biology & Basic Medical Science, Soochow University, Suzhou, China
- Xinghua Shi
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
- Nengjun Yi
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
6
Shankar J. Insights into study design and statistical analyses in translational microbiome studies. Annals of Translational Medicine 2017; 5:249. [PMID: 28706917 DOI: 10.21037/atm.2017.01.13] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 12/21/2022]
Abstract
Research questions in translational microbiome studies are substantially more complex than their counterparts in basic science. Robust study designs with appropriate statistical analysis frameworks are pivotal to the success of these translational studies. This review considers how study designs can account for heterogeneous phenotypes by adopting representative sampling schemes for recruiting the study population and making careful choices about the control population. Advantages and limitations of 16S profiling and whole-genome sequencing, the two primary techniques for measuring the microbiome, are discussed followed by an overview of bioinformatic processing of high-throughput sequencing data from these measurements. Practical insights into the downstream statistical analyses including data processing and integration, variable transformations, and data exploration are provided. The merits of regularization and ensemble modeling for analyzing microbiome data are discussed along with a recommendation for selecting modeling approaches based on data-driven simulations and objective evaluation. The review builds on several recent discussions of study design issues in microbiome research but with a stronger emphasis on the downstream and often-ignored aspects of statistical analyses that are crucial for bridging the gap between basic science and translation.
7
Tang Z, Shen Y, Zhang X, Yi N. The Spike-and-Slab Lasso Generalized Linear Models for Prediction and Associated Genes Detection. Genetics 2017; 205:77-88. [PMID: 27799277 PMCID: PMC5223525 DOI: 10.1534/genetics.116.192195] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 06/01/2016] [Accepted: 10/27/2016] [Indexed: 11/18/2022] Open
Abstract
Large-scale "omics" data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, there are considerable challenges in analyzing high-dimensional molecular data, including the large number of potential molecular predictors, limited number of samples, and small effect of each predictor. We propose new Bayesian hierarchical generalized linear models, called spike-and-slab lasso GLMs, for prognostic prediction and detection of associated genes using large-scale molecular data. The proposed model employs a spike-and-slab mixture double-exponential prior for coefficients that can induce weak shrinkage on large coefficients, and strong shrinkage on irrelevant coefficients. We have developed a fast and stable algorithm to fit large-scale hierarchical GLMs by incorporating expectation-maximization (EM) steps into the fast cyclic coordinate descent algorithm. The proposed approach integrates nice features of two popular methods, i.e., penalized lasso and Bayesian spike-and-slab variable selection. The performance of the proposed method is assessed via extensive simulation studies. The results show that the proposed approach can provide not only more accurate estimates of the parameters, but also better prediction. We demonstrate the proposed procedure on two cancer data sets: a well-known breast cancer data set consisting of 295 tumors, and expression data of 4919 genes; and the ovarian cancer data set from TCGA with 362 tumors, and expression data of 5336 genes. Our analyses show that the proposed procedure can generate powerful models for predicting outcomes and detecting associated genes. The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
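The adaptive shrinkage described in this abstract can be illustrated with a two-component double-exponential (Laplace) mixture. The sketch below is an assumption-laden illustration, not the BhGLM code: the function names, the mixing weight `theta`, and the scales `s0` (spike) and `s1` (slab) are hypothetical choices, and the "effective penalty" is the expected inverse scale under the posterior slab probability, as would appear in an EM-style update.

```python
import math


def double_exponential_pdf(beta: float, scale: float) -> float:
    """Density of a zero-centered double-exponential (Laplace) distribution."""
    return math.exp(-abs(beta) / scale) / (2.0 * scale)


def slab_probability(beta: float, theta: float, s0: float, s1: float) -> float:
    """Posterior probability that beta belongs to the wide slab component.

    theta is the prior slab weight; s0 < s1 are the spike and slab scales.
    """
    slab = theta * double_exponential_pdf(beta, s1)
    spike = (1.0 - theta) * double_exponential_pdf(beta, s0)
    return slab / (slab + spike)


def effective_penalty(beta: float, theta: float, s0: float, s1: float) -> float:
    """Expected inverse scale: the lasso-type penalty weight applied to beta."""
    p = slab_probability(beta, theta, s0, s1)
    return p / s1 + (1.0 - p) / s0


# Hypothetical settings: a narrow spike and a wide slab with equal prior weight.
THETA, S0, S1 = 0.5, 0.04, 1.0
for beta in (0.0, 0.1, 2.0):
    print(beta, round(effective_penalty(beta, THETA, S0, S1), 3))
```

A coefficient near zero is attributed to the spike and receives a penalty close to 1/s0, while a large coefficient is attributed to the slab and receives a penalty close to 1/s1, which is the weak-versus-strong shrinkage behavior the abstract describes.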
Affiliation(s)
- Zaixiang Tang
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases and Center for Genetic Epidemiology and Genomics, Medical College of Soochow University, Suzhou 215123, China
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Alabama 35294
- Yueping Shen
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases and Center for Genetic Epidemiology and Genomics, Medical College of Soochow University, Suzhou 215123, China
- Xinyan Zhang
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Alabama 35294
- Nengjun Yi
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Alabama 35294
8
Huang BFF, Boutros PC. The parameter sensitivity of random forests. BMC Bioinformatics 2016; 17:331. [PMID: 27586051 PMCID: PMC5009551 DOI: 10.1186/s12859-016-1228-x] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 12/15/2015] [Accepted: 08/26/2016] [Indexed: 02/07/2023] Open
Abstract
Background: The Random Forest (RF) algorithm for supervised machine learning is an ensemble learning method widely used in science and many other fields. Its popularity has been increasing, but relatively few studies address the parameter selection process: a critical step in model fitting. Due to numerous assertions regarding the performance reliability of the default parameters, many RF models are fit using these values. However, there has not yet been a thorough examination of the parameter-sensitivity of RFs in computational genomic studies. We address this gap here.
Results: We examined the effects of parameter selection on classification performance using the RF machine learning algorithm on two biological datasets with distinct p/n ratios: sequencing summary statistics (low p/n) and microarray-derived data (high p/n). Here, p refers to the number of variables and n to the number of samples. Our findings demonstrate that parameterization is highly correlated with prediction accuracy and variable importance measures (VIMs). Further, we demonstrate that different parameters are critical in tuning different datasets, and that parameter optimization significantly improves upon the default parameters.
Conclusions: Parameter performance demonstrated wide variability on both low and high p/n data. Therefore, there is significant benefit to be gained by tuning RFs away from their default parameter settings.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1228-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Barbara F F Huang
  - Informatics and Bio-computing Program, Ontario Institute for Cancer Research, Toronto, Canada
- Paul C Boutros
  - Informatics and Bio-computing Program, Ontario Institute for Cancer Research, Toronto, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
  - Department of Pharmacology and Toxicology, University of Toronto, Toronto, Canada
  - MaRS Centre, 661 University Avenue, Suite 510, Toronto, Ontario, M5G 0A3, Canada
9
Mayr A, Hofner B, Schmid M. Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection. BMC Bioinformatics 2016; 17:288. [PMID: 27444890 PMCID: PMC4957316 DOI: 10.1186/s12859-016-1149-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 11/19/2015] [Accepted: 07/13/2016] [Indexed: 12/15/2022] Open
Abstract
Background: When constructing new biomarker or gene signature scores for time-to-event outcomes, the underlying aims are to develop a discrimination model that helps to predict whether patients have a poor or good prognosis and to identify the most influential variables for this task. In practice, this is often done by fitting Cox models. Those are, however, not necessarily optimal with respect to the resulting discriminatory power and are based on restrictive assumptions. We present a combined approach to automatically select and fit sparse discrimination models for potentially high-dimensional survival data based on boosting a smooth version of the concordance index (C-index). Due to this objective function, the resulting prediction models are optimal with respect to their ability to discriminate between patients with longer and shorter survival times. The gradient boosting algorithm is combined with the stability selection approach to enhance and control its variable selection properties.
Results: The resulting algorithm fits prediction models based on the rankings of the survival times and automatically selects only the most stable predictors. The performance of the approach, which works best for small numbers of informative predictors, is demonstrated in a large-scale simulation study: C-index boosting in combination with stability selection is able to identify a small subset of informative predictors from a much larger set of non-informative ones while controlling the per-family error rate. In an application to discover biomarkers for breast cancer patients based on gene expression data, stability selection yielded sparser models and the resulting discriminatory power was higher than with lasso penalized Cox regression models.
Conclusion: The combination of stability selection and C-index boosting can be used to select small numbers of informative biomarkers and to derive new prediction rules that are optimal with respect to their discriminatory power. Stability selection controls the per-family error rate, which makes the new approach also appealing from an inferential point of view, as it provides an alternative to classical hypothesis tests for single predictor effects. Due to the shrinkage and variable selection properties of statistical boosting algorithms, the latter tests are typically infeasible for prediction models fitted by boosting.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1149-8) contains supplementary material, which is available to authorized users.
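For readers unfamiliar with the concordance index, the sketch below computes a plain Harrell-type C-index on right-censored data. This is a simplification chosen for illustration: the abstract refers to boosting a smooth version of the C-index, which this code does not implement. The function name and the convention that a higher score means higher predicted risk (shorter survival) are assumptions.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell-type C-index; higher risk score should mean shorter survival.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) and a strictly shorter time than subject j.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; the C-index is undefined")
    return concordant / comparable


# Toy data (hypothetical): the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [4, 3, 2, 1]))
print(concordance_index(times, events, [1, 2, 3, 4]))
```

A value of 1 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0 to a perfectly reversed ranking.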
Affiliation(s)
- Andreas Mayr
  - Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Waldstr. 6, Erlangen, 91054, Germany
  - Institut für Medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Sigmund-Freud-Str. 25, Bonn, 53105, Germany
- Benjamin Hofner
  - Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Waldstr. 6, Erlangen, 91054, Germany
- Matthias Schmid
  - Institut für Medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Sigmund-Freud-Str. 25, Bonn, 53105, Germany
10
Shankar J, Nguyen MH, Crespo MM, Kwak EJ, Lucas SK, McHugh KJ, Mounaud S, Alcorn JF, Pilewski JM, Shigemura N, Kolls JK, Nierman WC, Clancy CJ. Looking Beyond Respiratory Cultures: Microbiome-Cytokine Signatures of Bacterial Pneumonia and Tracheobronchitis in Lung Transplant Recipients. Am J Transplant 2016; 16:1766-78. [PMID: 26693965 DOI: 10.1111/ajt.13676] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 07/28/2015] [Revised: 11/10/2015] [Accepted: 12/06/2015] [Indexed: 01/25/2023]
Abstract
Bacterial pneumonia and tracheobronchitis are diagnosed frequently following lung transplantation. The diseases share clinical signs of inflammation and are often difficult to differentiate based on culture results. Microbiome and host immune-response signatures that distinguish between pneumonia and tracheobronchitis are undefined. Using a retrospective study design, we selected 49 bronchoalveolar lavage fluid samples from 16 lung transplant recipients associated with pneumonia (n = 8), tracheobronchitis (n = 12) or colonization without respiratory infection (n = 29). We ensured an even distribution of Pseudomonas aeruginosa or Staphylococcus aureus culture-positive samples across the groups. Bayesian regression analysis identified non-culture-based signatures comprising 16S ribosomal RNA microbiome profiles, cytokine levels and clinical variables that characterized the three diagnoses. Relative to samples associated with colonization, those from pneumonia had significantly lower microbial diversity, decreased levels of several bacterial genera and prominent multifunctional cytokine responses. In contrast, tracheobronchitis was characterized by high microbial diversity and multifunctional cytokine responses that differed from those of pneumonia-colonization comparisons. The dissimilar microbiomes and cytokine responses underlying bacterial pneumonia and tracheobronchitis following lung transplantation suggest that the diseases result from different pathogenic processes. Microbiomes and cytokine responses had complementary features, suggesting that they are closely interconnected in the pathogenesis of both diseases.
Affiliation(s)
- J Shankar
  - J. Craig Venter Institute, Rockville, MD
- M H Nguyen
  - Division of Infectious Diseases, Department of Medicine, University of Pittsburgh, Pittsburgh, PA
- M M Crespo
  - Division of Pulmonary Allergy and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, PA
- E J Kwak
  - Division of Infectious Diseases, Department of Medicine, University of Pittsburgh, Pittsburgh, PA
- S K Lucas
  - J. Craig Venter Institute, Rockville, MD
- K J McHugh
  - Department of Pediatrics, Children's Hospital of Pittsburgh of the University of Pittsburgh Medical Center, Pittsburgh, PA
- S Mounaud
  - J. Craig Venter Institute, Rockville, MD
- J F Alcorn
  - Department of Pediatrics, Children's Hospital of Pittsburgh of the University of Pittsburgh Medical Center, Pittsburgh, PA
- J M Pilewski
  - Division of Pulmonary Allergy and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, PA
- N Shigemura
  - Department of Cardiothoracic Surgery, University of Pittsburgh, Pittsburgh, PA
- J K Kolls
  - Richard King Mellon Foundation Institute for Pediatric Research, Children's Hospital of Pittsburgh of the University of Pittsburgh Medical Center, Pittsburgh, PA
- C J Clancy
  - Division of Infectious Diseases, Department of Medicine, University of Pittsburgh, Pittsburgh, PA
  - VA Pittsburgh Healthcare System, Division of Infectious Diseases, Pittsburgh, PA
11
Shankar J, Solis NV, Mounaud S, Szpakowski S, Liu H, Losada L, Nierman WC, Filler SG. Using Bayesian modelling to investigate factors governing antibiotic-induced Candida albicans colonization of the GI tract. Sci Rep 2015; 5:8131. [PMID: 25644850 PMCID: PMC4314636 DOI: 10.1038/srep08131] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 11/07/2014] [Accepted: 01/07/2015] [Indexed: 12/29/2022] Open
Abstract
Receipt of broad-spectrum antibiotics enhances Candida albicans colonization of the GI tract, a risk factor for haematogenously-disseminated candidiasis. To understand how antibiotics influence C. albicans colonization, we treated mice orally with vancomycin or a combination of penicillin, streptomycin, and gentamicin (PSG) and then inoculated them with C. albicans by gavage. Only PSG treatment resulted in sustained, high-level GI colonization with C. albicans. Furthermore, PSG reduced bacterial diversity in the colon much more than vancomycin. Both antibiotic regimens significantly reduced IL-17A, IL-21, IL-22 and IFN-γ mRNA levels in the terminal ileum but had limited effect on the GI fungal microbiome. Through a series of models that employed Bayesian model averaging, we investigated the associations between antibiotic treatment, GI microbiota, and host immune response and their collective impact on C. albicans colonization. Our analysis revealed that bacterial genera were typically associated with either C. albicans colonization or altered cytokine expression but not with both. The only exception was Veillonella, which was associated with both increased C. albicans colonization and reduced IL-21 expression. Overall, antibiotic-induced changes in the bacterial microbiome were much more consistent determinants of C. albicans colonization than either the GI fungal microbiota or the GI immune response.
Affiliation(s)
- Norma V. Solis
  - Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- Hong Liu
  - Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- Scott G. Filler
  - Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
  - David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
12
Shankar J, Szpakowski S, Solis NV, Mounaud S, Liu H, Losada L, Nierman WC, Filler SG. A systematic evaluation of high-dimensional, ensemble-based regression for exploring large model spaces in microbiome analyses. BMC Bioinformatics 2015; 16:31. [PMID: 25638274 PMCID: PMC4339743 DOI: 10.1186/s12859-015-0467-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/08/2014] [Accepted: 01/15/2015] [Indexed: 01/09/2023] Open
Abstract
Background: Microbiome studies incorporate next-generation sequencing to obtain profiles of microbial communities. Data generated from these experiments are high-dimensional with a rich correlation structure but modest sample sizes. A statistical model that utilizes these microbiome profiles to explain a clinical or biological endpoint needs to tackle high-dimensionality resulting from the very large space of variable configurations. Ensemble models are a class of approaches that can address high-dimensionality by aggregating information across large model spaces. Although such models are popular in fields as diverse as economics and genetics, their performance on microbiome data has been largely unexplored.
Results: We developed a simulation framework that accurately captures the constraints of experimental microbiome data. Using this setup, we systematically evaluated a selection of both frequentist and Bayesian regression modeling ensembles. These are represented by variants of stability selection in conjunction with elastic net and spike-and-slab Bayesian model averaging (BMA), respectively. BMA ensembles that explore a larger space of models relative to stability selection variants performed better and had lower variability across simulations. However, stability selection ensembles were able to match the performance of BMA in scenarios of low sparsity where several variables had large regression coefficients.
Conclusions: Given a microbiome dataset of interest, we present a methodology to generate simulated data that closely mimics its characteristics in a manner that enables meaningful evaluation of analytical strategies. Our evaluation demonstrates that the largest ensembles yield the strongest performance on microbiome data with modest sample sizes and high-dimensional measurements. We also demonstrate the ability of these ensembles to identify microbiome signatures that are associated with opportunistic Candida albicans colonization during antibiotic exposure. As the focus of microbiome research evolves from pilot to translational studies, we anticipate that our strategy will aid investigators in making evaluation-based decisions for selecting appropriate analytical methods.
Affiliation(s)
- Jyoti Shankar
  - J. Craig Venter Institute, 9704, Medical Center Drive, Rockville, Maryland, 20850, US.
- Sebastian Szpakowski
  - J. Craig Venter Institute, 9704, Medical Center Drive, Rockville, Maryland, 20850, US.
- Norma V Solis
  - Los Angeles Biomedical Research Institute at Harbor, UCLA Medical Center, 1124 West Carson Street, Torrance, California, 90509, US.
- Stephanie Mounaud
  - J. Craig Venter Institute, 9704, Medical Center Drive, Rockville, Maryland, 20850, US.
- Hong Liu
  - Los Angeles Biomedical Research Institute at Harbor, UCLA Medical Center, 1124 West Carson Street, Torrance, California, 90509, US.
- Liliana Losada
  - J. Craig Venter Institute, 9704, Medical Center Drive, Rockville, Maryland, 20850, US.
- William C Nierman
  - J. Craig Venter Institute, 9704, Medical Center Drive, Rockville, Maryland, 20850, US.
- Scott G Filler
  - Los Angeles Biomedical Research Institute at Harbor, UCLA Medical Center, 1124 West Carson Street, Torrance, California, 90509, US.
  - David Geffen School of Medicine, University of California at Los Angeles, California, 90095, US.