Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Shankar J, Szpakowski S, Solis NV, Mounaud S, Liu H, Losada L, Nierman WC, Filler SG. A systematic evaluation of high-dimensional, ensemble-based regression for exploring large model spaces in microbiome analyses. BMC Bioinformatics 2015;16:31. [PMID: 25638274 PMCID: PMC4339743 DOI: 10.1186/s12859-015-0467-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/08/2014] [Accepted: 01/15/2015] [Indexed: 01/09/2023] Open

Cited by Other Article(s)

Leske M, Bottacini F, Afli H, Andrade BGN. BiGAMi: Bi-Objective Genetic Algorithm Fitness Function for Feature Selection on Microbiome Datasets. Methods Protoc 2022;5:42. [PMID: 35645350 PMCID: PMC9149982 DOI: 10.3390/mps5030042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/28/2022] [Revised: 05/16/2022] [Accepted: 05/18/2022] [Indexed: 11/23/2022] Open

Valeris-Chacin R, Pieters M, Hwang H, Johnson TJ, Singer RS. Association of Broiler Litter Microbiome Composition and Campylobacter Isolation. Front Vet Sci 2021;8:654927. [PMID: 34109233 PMCID: PMC8180553 DOI: 10.3389/fvets.2021.654927] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/08/2021] [Accepted: 04/19/2021] [Indexed: 12/31/2022] Open

Abstract

Infection with Campylobacter species is one of the leading causes of bacterial diarrhea in humans in the US. Chickens, which become colonized on the farm, are important reservoirs of this bacterium. Campylobacter can establish itself in the broiler house via a variety of sources, can survive in the litter of the house, and possibly persist over successive flock cycles. However, the role of the broiler litter microbiome on Campylobacter persistence is not clear. A matched case-control study was conducted to determine whether the broiler litter microbiome composition was associated with Campylobacter isolation within the broiler house. Flocks were classified as cases when either Campylobacter jejuni or Campylobacter coli was isolated in boot sock samples, or as controls otherwise. Case and control flocks were matched at the broiler house level. Composite broiler litter samples were collected and used for DNA extraction and 16S rRNA gene V4 region sequencing. Reads were processed using the DADA2 pipeline to obtain a table of amplicon sequence variants. Alpha diversity and differential bacterial relative abundance were used as predictors of Campylobacter isolation status in conditional logistic regression models adjusting for flock age and sampling season. Beta diversity distances were used as regressors in stratified PERMANOVA with Campylobacter isolation status as predictor, and broiler house as stratum. When Campylobacter was isolated in boot socks, broiler litter microbiome richness and evenness were lower and higher, respectively, without reaching statistical significance. Campylobacter isolation status significantly explained a small proportion of the beta diversity (genus-level Aitchison dissimilarity distance). Clostridium and Anaerostipes were positively associated with Campylobacter isolation status, whereas Bifidobacterium, Anaerosporobacter, and Stenotrophomonas were negatively associated. Our results suggest the presence of bacterial interactions between Campylobacter and the broiler litter microbiome. The negative association of Campylobacter with Bifidobacterium, Anaerosporobacter, and Stenotrophomonas in litter could be potentially exploited as a pre-harvest control strategy.
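
The abstract above names the Aitchison distance as its beta diversity measure without spelling it out. As a rough illustration only, the sketch below computes it as the Euclidean distance between centered log-ratio (CLR) transformed samples. The function names and the pseudocount used to handle zero counts are assumptions made for this example; the abstract does not say how zeros were treated or which software was used.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's per-taxon counts.

    The pseudocount (an assumed value, not taken from the article)
    avoids log(0) for taxa that were not observed in the sample.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(sample_a, sample_b, pseudocount=0.5):
    """Euclidean distance between the CLR transforms of two samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same taxa")
    a = clr(sample_a, pseudocount)
    b = clr(sample_b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Hypothetical genus-level counts for two litter samples.
    litter_1 = [120, 30, 0, 5]
    litter_2 = [80, 45, 10, 2]
    print(aitchison_distance(litter_1, litter_2))
```

With strictly positive counts and a pseudocount of zero, the distance depends only on the ratios between taxa, so rescaling a sample (for example, sequencing it twice as deeply) leaves the distance unchanged. That property is the usual reason for preferring it on compositional data.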

Song J, Zhang J, Su Y, Zhang X, Li J, Tu L, Yu J, Zheng Y, Wang M. Monascus vinegar-mediated alternation of gut microbiota and its correlation with lipid metabolism and inflammation in hyperlipidemic rats. J Funct Foods 2020. [DOI: 10.1016/j.jff.2020.104152] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023] Open

Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Prog Mol Biol Transl Sci 2020;171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 02/07/2023]

Abstract

Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that concepts of correlation and association analyses have been shifted by introducing network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach. Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and future direction of association analysis in microbiome and multiomics studies.

Tang Z, Shen Y, Li Y, Zhang X, Wen J, Qian C, Zhuang W, Shi X, Yi N. Group spike-and-slab lasso generalized linear models for disease prediction and associated genes detection by incorporating pathway information. Bioinformatics 2018;34:901-910. [PMID: 29077795 PMCID: PMC5860634 DOI: 10.1093/bioinformatics/btx684] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 06/15/2017] [Revised: 10/05/2017] [Accepted: 10/24/2017] [Indexed: 01/10/2023] Open

Shankar J. Insights into study design and statistical analyses in translational microbiome studies. Ann Transl Med 2017;5:249. [PMID: 28706917 DOI: 10.21037/atm.2017.01.13] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 12/21/2022]

Tang Z, Shen Y, Zhang X, Yi N. The Spike-and-Slab Lasso Generalized Linear Models for Prediction and Associated Genes Detection. Genetics 2017;205:77-88. [PMID: 27799277 PMCID: PMC5223525 DOI: 10.1534/genetics.116.192195] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 06/01/2016] [Accepted: 10/27/2016] [Indexed: 11/18/2022] Open

Huang BFF, Boutros PC. The parameter sensitivity of random forests. BMC Bioinformatics 2016;17:331. [PMID: 27586051 PMCID: PMC5009551 DOI: 10.1186/s12859-016-1228-x] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 12/15/2015] [Accepted: 08/26/2016] [Indexed: 02/07/2023] Open

Mayr A, Hofner B, Schmid M. Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection. BMC Bioinformatics 2016;17:288. [PMID: 27444890 PMCID: PMC4957316 DOI: 10.1186/s12859-016-1149-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 11/19/2015] [Accepted: 07/13/2016] [Indexed: 12/15/2022] Open

Abstract

Background

When constructing new biomarker or gene signature scores for time-to-event outcomes, the underlying aims are to develop a discrimination model that helps to predict whether patients have a poor or good prognosis and to identify the most influential variables for this task. In practice, this is often done by fitting Cox models. These are, however, not necessarily optimal with respect to the resulting discriminatory power and are based on restrictive assumptions. We present a combined approach to automatically select and fit sparse discrimination models for potentially high-dimensional survival data based on boosting a smooth version of the concordance index (C-index). Due to this objective function, the resulting prediction models are optimal with respect to their ability to discriminate between patients with longer and shorter survival times. The gradient boosting algorithm is combined with the stability selection approach to enhance and control its variable selection properties.

Results

The resulting algorithm fits prediction models based on the rankings of the survival times and automatically selects only the most stable predictors. The performance of the approach, which works best for small numbers of informative predictors, is demonstrated in a large scale simulation study: C-index boosting in combination with stability selection is able to identify a small subset of informative predictors from a much larger set of non-informative ones while controlling the per-family error rate. In an application to discover biomarkers for breast cancer patients based on gene expression data, stability selection yielded sparser models and the resulting discriminatory power was higher than with lasso penalized Cox regression models.

Conclusion

The combination of stability selection and C-index boosting can be used to select small numbers of informative biomarkers and to derive new prediction rules that are optimal with respect to their discriminatory power. Stability selection controls the per-family error rate which makes the new approach also appealing from an inferential point of view, as it provides an alternative to classical hypothesis tests for single predictor effects. Due to the shrinkage and variable selection properties of statistical boosting algorithms, the latter tests are typically unfeasible for prediction models fitted by boosting.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1149-8) contains supplementary material, which is available to authorized users.
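
For readers unfamiliar with the concordance index that this abstract optimizes, the sketch below computes the plain pairwise estimator usually attributed to Harrell. It is not the smoothed objective or the boosting procedure the abstract refers to; the function name, argument layout, and tie handling are assumptions made for illustration.

```python
def concordance_index(times, events, scores):
    """Plain pairwise C-index for right-censored survival data.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the record is censored
    scores : predicted risk; a higher score should mean shorter survival

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's observed time. Tied scores count as
    half-concordant (an assumed convention for this sketch).
    """
    if not (len(times) == len(events) == len(scores)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical data: four patients, the third one censored.
    print(concordance_index([2, 5, 6, 9], [1, 1, 0, 1], [0.9, 0.4, 0.5, 0.1]))
```

A value of 1 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0 means the ranking is fully reversed. Because the estimator is a step function of the scores, it cannot be optimized by gradient methods directly, which is why the abstract speaks of a smooth version.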

Shankar J, Nguyen MH, Crespo MM, Kwak EJ, Lucas SK, McHugh KJ, Mounaud S, Alcorn JF, Pilewski JM, Shigemura N, Kolls JK, Nierman WC, Clancy CJ. Looking Beyond Respiratory Cultures: Microbiome-Cytokine Signatures of Bacterial Pneumonia and Tracheobronchitis in Lung Transplant Recipients. Am J Transplant 2016;16:1766-78. [PMID: 26693965 DOI: 10.1111/ajt.13676] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 07/28/2015] [Revised: 11/10/2015] [Accepted: 12/06/2015] [Indexed: 01/25/2023]

Shankar J, Solis NV, Mounaud S, Szpakowski S, Liu H, Losada L, Nierman WC, Filler SG. Using Bayesian modelling to investigate factors governing antibiotic-induced Candida albicans colonization of the GI tract. Sci Rep 2015;5:8131. [PMID: 25644850 PMCID: PMC4314636 DOI: 10.1038/srep08131] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 11/07/2014] [Accepted: 01/07/2015] [Indexed: 12/29/2022] Open

Shankar J, Szpakowski S, Solis NV, Mounaud S, Liu H, Losada L, Nierman WC, Filler SG. A systematic evaluation of high-dimensional, ensemble-based regression for exploring large model spaces in microbiome analyses. BMC Bioinformatics 2015;16:31. [PMID: 25638274 PMCID: PMC4339743 DOI: 10.1186/s12859-015-0467-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/08/2014] [Accepted: 01/15/2015] [Indexed: 01/09/2023] Open

Abstract

BACKGROUND

Microbiome studies incorporate next-generation sequencing to obtain profiles of microbial communities. Data generated from these experiments are high-dimensional with a rich correlation structure but modest sample sizes. A statistical model that utilizes these microbiome profiles to explain a clinical or biological endpoint needs to tackle high-dimensionality resulting from the very large space of variable configurations. Ensemble models are a class of approaches that can address high-dimensionality by aggregating information across large model spaces. Although such models are popular in fields as diverse as economics and genetics, their performance on microbiome data has been largely unexplored.

RESULTS

We developed a simulation framework that accurately captures the constraints of experimental microbiome data. Using this setup, we systematically evaluated a selection of both frequentist and Bayesian regression modeling ensembles. These are represented by variants of stability selection in conjunction with elastic net and spike-and-slab Bayesian model averaging (BMA), respectively. BMA ensembles that explore a larger space of models relative to stability selection variants performed better and had lower variability across simulations. However, stability selection ensembles were able to match the performance of BMA in scenarios of low sparsity where several variables had large regression coefficients.

CONCLUSIONS

Given a microbiome dataset of interest, we present a methodology to generate simulated data that closely mimics its characteristics in a manner that enables meaningful evaluation of analytical strategies. Our evaluation demonstrates that the largest ensembles yield the strongest performance on microbiome data with modest sample sizes and high-dimensional measurements. We also demonstrate the ability of these ensembles to identify microbiome signatures that are associated with opportunistic Candida albicans colonization during antibiotic exposure. As the focus of microbiome research evolves from pilot to translational studies, we anticipate that our strategy will aid investigators in making evaluation-based decisions for selecting appropriate analytical methods.
