1. Abegaz F, Abedini D, White F, Guerrieri A, Zancarini A, Dong L, Westerhuis JA, van Eeuwijk F, Bouwmeester H, Smilde AK. A strategy for differential abundance analysis of sparse microbiome data with group-wise structured zeros. Sci Rep 2024; 14:12433. [PMID: 38816496 PMCID: PMC11139916 DOI: 10.1038/s41598-024-62437-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 05/16/2024] [Indexed: 06/01/2024] Open
Abstract
Comparing the abundance of microbial communities between different groups or obtained under different experimental conditions using count sequence data is a challenging task due to various issues such as inflated zero counts, overdispersion, and non-normality. Several methods and procedures based on counts, their transformation and compositionality have been proposed in the literature to detect differentially abundant species in datasets containing hundreds to thousands of microbial species. Despite efforts to address the large numbers of zeros present in microbiome datasets, even after careful data preprocessing, the performance of existing methods is impaired by the presence of inflated zero counts and group-wise structured zeros (i.e. all zero counts in a group). We propose and validate using extensive simulations an approach combining two differential abundance testing methods, namely DESeq2-ZINBWaVE and DESeq2, to address the issues of zero-inflation and group-wise structured zeros, respectively. This combined approach was subsequently successfully applied to two plant microbiome datasets that revealed a number of taxa as interesting candidates for further experimental validation.
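The routing idea in the abstract above (one method for taxa with group-wise structured zeros, another for the remaining zero-inflated taxa) can be sketched as a simple pre-processing step. This is a minimal illustration, not the authors' pipeline: the function name, the dictionary layout of the count table, and the decision to leave taxa that are zero everywhere in the second list are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def split_taxa_by_structured_zeros(
    counts: Dict[str, List[int]], groups: List[str]
) -> Tuple[List[str], List[str]]:
    """Split taxa into those with group-wise structured zeros and the rest.

    counts: taxon name -> per-sample counts (hypothetical layout).
    groups: group label of each sample, in the same order as the counts.
    """
    group_names = sorted(set(groups))
    structured, other = [], []
    for taxon, row in counts.items():
        # Sum counts per group; counts are non-negative, so a zero sum
        # means every sample in that group has a zero count.
        totals = {g: 0 for g in group_names}
        for value, g in zip(row, groups):
            totals[g] += value
        zero_groups = [g for g in group_names if totals[g] == 0]
        # Structured zero: absent in at least one group, present in another.
        if zero_groups and len(zero_groups) < len(group_names):
            structured.append(taxon)
        else:
            # Includes taxa that are zero everywhere, which would normally
            # be filtered out before any testing.
            other.append(taxon)
    return structured, other


example_counts = {
    "taxA": [0, 0, 0, 5, 3, 8],
    "taxB": [1, 0, 2, 0, 4, 1],
    "taxC": [0, 0, 0, 0, 0, 0],
}
example_groups = ["ctrl", "ctrl", "ctrl", "trt", "trt", "trt"]
structured_taxa, other_taxa = split_taxa_by_structured_zeros(
    example_counts, example_groups
)
```

In this toy table only `taxA` is all-zero in one group and non-zero in the other, so it is the only taxon that would be routed to the structured-zero branch.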
Affiliation(s)
- Fentaw Abegaz
  - Swammerdam Institute for Life Sciences, University of Amsterdam, 1098 XH, Amsterdam, The Netherlands
  - Biometris, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands
- Davar Abedini
  - Swammerdam Institute for Life Sciences, University of Amsterdam, 1098 XH, Amsterdam, The Netherlands
- Fred White
  - Swammerdam Institute for Life Sciences, University of Amsterdam, 1098 XH, Amsterdam, The Netherlands
- Alessandra Guerrieri
  - Swammerdam Institute for Life Sciences, University of Amsterdam, 1098 XH, Amsterdam, The Netherlands
- Anouk Zancarini
  - IGEPP, INRAE, Institut Agro, Univ Rennes, 35653, Le Rheu, France
- Lemeng Dong
  - Swammerdam Institute for Life Sciences, University of Amsterdam, 1098 XH, Amsterdam, The Netherlands
- Johan A Westerhuis
  - Swammerdam Institute for Life Sciences, University of Amsterdam, 1098 XH, Amsterdam, The Netherlands
- Fred van Eeuwijk
  - Biometris, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands
- Harro Bouwmeester
  - Swammerdam Institute for Life Sciences, University of Amsterdam, 1098 XH, Amsterdam, The Netherlands
- Age K Smilde
  - Swammerdam Institute for Life Sciences, University of Amsterdam, 1098 XH, Amsterdam, The Netherlands
2. Yonatan Y, Kahn S, Bashan A. Interactions-based classification of a single microbial sample. Cell Reports Methods 2024; 4:100775. [PMID: 38744286 PMCID: PMC11133833 DOI: 10.1016/j.crmeth.2024.100775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 02/11/2024] [Accepted: 04/19/2024] [Indexed: 05/16/2024]
Abstract
To address the limitation of overlooking crucial ecological interactions due to relying on single time point samples, we developed a computational approach that analyzes individual samples based on the interspecific microbial relationships. We verify, using both numerical simulations as well as real and shuffled microbial profiles from the human oral cavity, that the method can classify single samples based on their interspecific interactions. By analyzing the gut microbiome of people with autistic spectrum disorder, we found that our interaction-based method can improve the classification of individual subjects based on a single microbial sample. These results demonstrate that the underlying ecological interactions can be practically utilized to facilitate microbiome-based diagnosis and precision medicine.
Affiliation(s)
- Yogev Yonatan
  - Physics Department, Bar-Ilan University, Ramat-Gan, Israel
- Shaya Kahn
  - Physics Department, Bar-Ilan University, Ramat-Gan, Israel
- Amir Bashan
  - Physics Department, Bar-Ilan University, Ramat-Gan, Israel
3. Wang M, Fontaine S, Jiang H, Li G. ADAPT: Analysis of Microbiome Differential Abundance by Pooling Tobit Models. bioRxiv 2024:2024.05.14.594186. [PMID: 38798558 PMCID: PMC11118451 DOI: 10.1101/2024.05.14.594186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Microbiome differential abundance analysis remains a challenging problem despite multiple methods proposed in the literature. The excessive zeros and compositionality of metagenomics data are two main challenges for differential abundance analysis. We propose a novel method called "analysis of differential abundance by pooling Tobit models" (ADAPT) to overcome these two challenges. ADAPT uniquely treats zero counts as left-censored observations to facilitate computation and enhance interpretation. ADAPT also encompasses a theoretically justified way of selecting non-differentially abundant microbiome taxa as a reference for hypothesis testing. We generate synthetic data using independent simulation frameworks to show that ADAPT has more consistent false discovery rate control and higher statistical power than competitors. We use ADAPT to analyze 16S rRNA sequencing of saliva samples and shotgun metagenomics sequencing of plaque samples collected from infants in the COHRA2 study. The results provide novel insights into the association between the oral microbiome and early childhood dental caries.
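Treating zeros as left-censored observations, as the abstract above describes, changes the likelihood contribution of each zero from a density to a cumulative probability. The sketch below shows that idea for a single normal mean only. It is not ADAPT itself: the pooling of taxa, the reference-taxon selection, and the actual response scale are not reproduced, and the function names and the notion of a per-sample detection limit are assumptions made for illustration.

```python
import math


def normal_logpdf(z: float) -> float:
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(values, censored, mu: float, sigma: float) -> float:
    """Log-likelihood of a left-censored (Tobit-type) normal model.

    values[i] is the observed value, or the censoring limit (hypothetical
    detection limit) when censored[i] is True.
    """
    total = 0.0
    for y, is_censored in zip(values, censored):
        z = (y - mu) / sigma
        if is_censored:
            # Censored point: only know the latent value is below the limit.
            # (For extremely negative z the CDF underflows to 0; a production
            # implementation would use a log-CDF routine instead.)
            total += math.log(normal_cdf(z))
        else:
            # Fully observed point: ordinary normal log density.
            total += normal_logpdf(z) - math.log(sigma)
    return total


observed_only = tobit_loglik([0.0], [False], mu=0.0, sigma=1.0)
censored_only = tobit_loglik([0.0], [True], mu=0.0, sigma=1.0)
```

With the limit placed exactly at the mean, the censored contribution is log(0.5), since half of the latent distribution lies below the limit.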
Affiliation(s)
- Mukai Wang
  - Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, USA
- Simon Fontaine
  - Department of Statistics, University of Michigan, Ann Arbor, 48109, MI, USA
- Hui Jiang
  - Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, USA
- Gen Li
  - Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, USA
4. Kean IRL, Clark JA, Zhang Z, Daubney E, White D, Ferrando-Vivas P, Milla G, Cuthbertson B, Pappachan J, Klein N, Mouncey P, Rowan K, Myburgh J, Gouliouris T, Baker S, Parkhill J, Pathan N, Arctic Research Team. Short-duration selective decontamination of the digestive tract infection control does not contribute to increased antimicrobial resistance burden in a pilot cluster randomised trial (the ARCTIC Study). Gut 2024; 73:910-921. [PMID: 38253478 PMCID: PMC11103307 DOI: 10.1136/gutjnl-2023-330851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 01/11/2024] [Indexed: 01/24/2024]
Abstract
OBJECTIVE Selective decontamination of the digestive tract (SDD) is a well-studied but hotly contested medical intervention of enhanced infection control. Here, we aim to characterise the changes to the microbiome and antimicrobial resistance (AMR) gene profiles in critically ill children treated with SDD-enhanced infection control compared with conventional infection control. DESIGN We conducted shotgun metagenomic microbiome and resistome analysis on serial oropharyngeal and faecal samples collected from critically ill, mechanically ventilated patients in a pilot multicentre cluster randomised trial of SDD. The microbiome and AMR profiles were compared for longitudinal and intergroup changes. Of consented patients, faecal microbiome baseline samples were obtained in 89 critically ill children. Additionally, samples during and after critical illness were obtained from 17 children treated with SDD-enhanced infection control and 19 children who received standard care. RESULTS SDD affected the alpha and beta diversity of critically ill children to a greater degree than standard care. At cessation of treatment, the microbiome of SDD patients was dominated by Actinomycetota, specifically Bifidobacterium, at the end of mechanical ventilation. Altered gut microbiota was evident in a subset of SDD-treated children who returned late longitudinal samples compared with children receiving standard care. Clinically relevant AMR gene burden was unaffected by the administration of SDD-enhanced infection control compared with standard care. SDD did not affect the composition of the oral microbiome compared with standard treatment. CONCLUSION Short interventions of SDD caused a shift in the microbiome but not of the AMR gene pool in critically ill children at the end of mechanical ventilation, compared with standard antimicrobial therapy.
Affiliation(s)
- John A Clark
  - Department of Paediatrics, University of Cambridge, Cambridge, UK
- Zhenguang Zhang
  - Department of Paediatrics, University of Cambridge, Cambridge, UK
- Esther Daubney
  - Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Deborah White
  - Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- John Myburgh
  - The George Institute for Global Health, Newtown, New South Wales, Australia
- Stephen Baker
  - Department of Medicine, University of Cambridge, Cambridge, UK
- Julian Parkhill
  - Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Nazima Pathan
  - Department of Paediatrics, University of Cambridge, Cambridge, UK
5. Chen See JR, Leister J, Wright JR, Kruse PI, Khedekar MV, Besch CE, Kumamoto CA, Madden GR, Stewart DB, Lamendella R. Clostridioides difficile infection is associated with differences in transcriptionally active microbial communities. Front Microbiol 2024; 15:1398018. [PMID: 38680911 PMCID: PMC11045941 DOI: 10.3389/fmicb.2024.1398018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Accepted: 04/02/2024] [Indexed: 05/01/2024] Open
Abstract
Clostridioides difficile infection (CDI) is responsible for around 300,000 hospitalizations yearly in the United States, with the associated monetary cost being billions of dollars. Gut microbiome dysbiosis is known to be important to CDI. To the best of our knowledge, metatranscriptomics (MT) has only been used to characterize gut microbiome composition and function in one prior study involving CDI patients. Therefore, we utilized MT to investigate differences in active community diversity and composition between CDI+ (n = 20) and CDI- (n = 19) samples with respect to microbial taxa and expressed genes. No significant (Kruskal-Wallis, p > 0.05) differences were detected for richness or evenness based on CDI status. However, clustering based on CDI status was significant for both active microbial taxa and expressed genes datasets (PERMANOVA, p ≤ 0.05). Furthermore, differential feature analysis revealed greater expression of the opportunistic pathogens Enterocloster bolteae and Ruminococcus gnavus in CDI+ compared to CDI- samples. When only fungal sequences were considered, the family Saccharomycetaceae expressed more genes in CDI-, while 31 other fungal taxa were identified as significantly (Kruskal-Wallis p ≤ 0.05, log(LDA) ≥ 2) associated with CDI+. We also detected a variety of genes and pathways that differed significantly (Kruskal-Wallis p ≤ 0.05, log(LDA) ≥ 2) based on CDI status. Notably, differential genes associated with biofilm formation were expressed by C. difficile. This provides evidence of another possible contributor to C. difficile's resistance to antibiotics and frequent recurrence in vivo. Furthermore, the greater number of CDI+ associated fungal taxa constitute additional evidence that the mycobiome is important to CDI pathogenesis. Future work will focus on establishing if C. difficile is actively producing biofilms during infection and if any specific fungal taxa are particularly influential in CDI.
Affiliation(s)
- Justin R. Wright
  - Juniata College, Huntingdon, PA, United States
  - Wright Labs LLC, Huntingdon, PA, United States
- Carol A. Kumamoto
  - Molecular Biology and Microbiology, Tufts University, Boston, MA, United States
- Gregory R. Madden
  - University of Virginia School of Medicine, Charlottesville, VA, United States
- David B. Stewart
  - Department of Surgery, Southern Illinois University School of Medicine, Springfield, IL, United States
6. Ahn S, Datta S. Differential network connectivity analysis for microbiome data adjusted for clinical covariates using jackknife pseudo-values. BMC Bioinformatics 2024; 25:117. [PMID: 38500042 PMCID: PMC10946111 DOI: 10.1186/s12859-024-05689-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Accepted: 02/02/2024] [Indexed: 03/20/2024] Open
Abstract
BACKGROUND A recent breakthrough in differential network (DN) analysis of microbiome data has been realized with the advent of next-generation sequencing technologies. The DN analysis disentangles the microbial co-abundance among taxa by comparing the network properties between two or more graphs under different biological conditions. However, the existing methods to the DN analysis for microbiome data do not adjust for other clinical differences between subjects. RESULTS We propose a Statistical Approach via Pseudo-value Information and Estimation for Differential Network Analysis (SOHPIE-DNA) that incorporates additional covariates such as continuous age and categorical BMI. SOHPIE-DNA is a regression technique adopting jackknife pseudo-values that can be implemented readily for the analysis. We demonstrate through simulations that SOHPIE-DNA consistently reaches higher recall and F1-score, while maintaining similar precision and accuracy to existing methods (NetCoMi and MDiNE). Lastly, we apply SOHPIE-DNA on two real datasets from the American Gut Project and the Diet Exchange Study to showcase the utility. The analysis of the Diet Exchange Study is to showcase that SOHPIE-DNA can also be used to incorporate the temporal change of connectivity of taxa with the inclusion of additional covariates. As a result, our method has found taxa that are related to the prevention of intestinal inflammation and severity of fatigue in advanced metastatic cancer patients. CONCLUSION SOHPIE-DNA is the first attempt of introducing the regression framework for the DN analysis in microbiome data. This enables the prediction of characteristics of a connectivity of a network with the presence of additional covariate information in the regression. The R package with a vignette of our methodology is available through the CRAN repository ( https://CRAN.R-project.org/package=SOHPIE ), named SOHPIE (pronounced as Sofie). 
The source code and user manual can be found at https://github.com/sjahnn/SOHPIE-DNA .
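The jackknife pseudo-values named in the abstract above follow a standard construction: for a statistic computed on n samples, the i-th pseudo-value is n times the full-sample estimate minus (n - 1) times the estimate with sample i left out. The sketch below shows only that construction on a generic statistic. In SOHPIE-DNA the statistic would be a network connectivity measure and the pseudo-values would then be regressed on covariates; neither step is reproduced here, and the function name is an assumption.

```python
from typing import Callable, List, Sequence


def jackknife_pseudo_values(
    samples: Sequence[float], statistic: Callable[[Sequence[float]], float]
) -> List[float]:
    """Return one jackknife pseudo-value per sample.

    pseudo_i = n * theta_hat - (n - 1) * theta_hat_without_i
    """
    n = len(samples)
    full_estimate = statistic(samples)
    pseudo = []
    for i in range(n):
        # Leave-one-out data set with sample i removed.
        leave_one_out = list(samples[:i]) + list(samples[i + 1:])
        pseudo.append(n * full_estimate - (n - 1) * statistic(leave_one_out))
    return pseudo


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


pseudo_for_mean = jackknife_pseudo_values([2.0, 4.0, 9.0], mean)
```

For the sample mean the pseudo-values reduce to the original observations, which is a convenient sanity check before plugging in a more complex statistic.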
Affiliation(s)
- Seungjun Ahn
  - Department of Biostatistics, University of Florida, Gainesville, FL, USA
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Somnath Datta
  - Department of Biostatistics, University of Florida, Gainesville, FL, USA
7. Austin GI, Kav AB, Park H, Biermann J, Uhlemann AC, Korem T. Processing-bias correction with DEBIAS-M improves cross-study generalization of microbiome-based prediction models. bioRxiv 2024:2024.02.09.579716. [PMID: 38405914 PMCID: PMC10888995 DOI: 10.1101/2024.02.09.579716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Every step in common microbiome profiling protocols has variable efficiency for each microbe. For example, different DNA extraction kits may have different efficiency for Gram-positive and -negative bacteria. These variable efficiencies, combined with technical variation, create strong processing biases, which impede the identification of signals that are reproducible across studies and the development of generalizable and biologically interpretable prediction models. "Batch-correction" methods have been used to alleviate these issues computationally with some success. However, many make strong parametric assumptions which do not necessarily apply to microbiome data or processing biases, or require the use of an outcome variable, which risks overfitting. Lastly and importantly, existing transformations used to correct microbiome data are largely non-interpretable, and could, for example, introduce values to features that were initially mostly zeros. Altogether, processing bias currently compromises our ability to glean robust and generalizable biological insights from microbiome data. Here, we present DEBIAS-M (Domain adaptation with phenotype Estimation and Batch Integration Across Studies of the Microbiome), an interpretable framework for inference and correction of processing bias, which facilitates domain adaptation in microbiome studies. DEBIAS-M learns bias-correction factors for each microbe in each batch that simultaneously minimize batch effects and maximize cross-study associations with phenotypes. Using benchmarks of HIV and colorectal cancer classification from gut microbiome data, and cervical neoplasia prediction from cervical microbiome data, we demonstrate that DEBIAS-M outperforms batch-correction methods commonly used in the field. Notably, we show that the inferred bias-correction factors are stable, interpretable, and strongly associated with specific experimental protocols. 
Overall, we show that DEBIAS-M allows for better modeling of microbiome data and identification of interpretable signals that are reproducible across studies.
Affiliation(s)
- George I. Austin
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
  - Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Aya Brown Kav
  - Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Heekuk Park
  - Division of Infectious Diseases, Columbia University Irving Medical Center, New York, NY, USA
- Jana Biermann
  - Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - Department of Medicine, Division of Hematology/Oncology, Columbia University Irving Medical Center, New York, NY, USA
  - Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
- Anne-Catrin Uhlemann
  - Division of Infectious Diseases, Columbia University Irving Medical Center, New York, NY, USA
- Tal Korem
  - Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA
8. Lin Q, Li L, De Vrieze J, Li C, Fang X, Li X. Functional conservation of microbial communities determines composition predictability in anaerobic digestion. The ISME Journal 2023; 17:1920-1930. [PMID: 37666974 PMCID: PMC10579369 DOI: 10.1038/s41396-023-01505-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 08/24/2023] [Accepted: 08/29/2023] [Indexed: 09/06/2023]
Abstract
A major challenge in managing and engineering microbial communities is determining whether and how microbial community responses to environmental alterations can be predicted and explained, especially in microorganism-driven systems. We addressed this challenge by monitoring microbial community responses to the periodic addition of the same feedstock throughout anaerobic digestion, a typical microorganism-driven system where microorganisms degrade and transform the feedstock. The immediate and delayed response consortia were assemblages of microorganisms whose abundances significantly increased on the first or third day after feedstock addition. The immediate response consortia were more predictable than the delayed response consortia and showed a reproducible and predictable order-level composition across multiple feedstock additions. These results held for both present (16S rRNA gene) and potentially active (16S rRNA) microbial communities and for different feedstocks with different biodegradability, and were validated by simulation modeling. Despite substantial species variability, the immediate response consortia aligned well with the reproducible CH4 production, which was attributed to the conservation of expressed functions by the response consortia throughout anaerobic digestion, based on metatranscriptomic data analyses. The high species variability might be attributed to intraspecific competition and contribute to biodiversity maintenance and functional redundancy. Our results demonstrate reproducible and predictable microbial community responses and their importance in stabilizing system functions.
Affiliation(s)
- Qiang Lin
  - Key Laboratory of Environmental and Applied Microbiology, CAS; Environmental Microbiology Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
- Lingjuan Li
  - Department of Biology, University of Antwerp, 2610, Wilrijk, Belgium
- Jo De Vrieze
  - Center for Microbial Ecology and Technology (CMET), Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Chaonan Li
  - Key Laboratory of Environmental and Applied Microbiology, CAS; Environmental Microbiology Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
- Xiaoyu Fang
  - Key Laboratory of Environmental and Applied Microbiology, CAS; Environmental Microbiology Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
- Xiangzhen Li
  - Key Laboratory of Environmental and Applied Microbiology, CAS; Environmental Microbiology Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
9. Mishra AK, Mahmud I, Lorenzi PL, Jenq RR, Wargo JA, Ajami NJ, Peterson CB. TARO: tree-aggregated factor regression for microbiome data integration. bioRxiv 2023:2023.10.17.562792. [PMID: 37904958 PMCID: PMC10614880 DOI: 10.1101/2023.10.17.562792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2023]
Abstract
Motivation Although the human microbiome plays a key role in health and disease, the biological mechanisms underlying the interaction between the microbiome and its host are incompletely understood. Integration with other molecular profiling data offers an opportunity to characterize the role of the microbiome and elucidate therapeutic targets. However, this remains challenging due to the high dimensionality, compositionality, and rare features found in microbiome profiling data. These challenges necessitate the use of methods that can achieve structured sparsity in learning cross-platform association patterns. Results We propose Tree-Aggregated factor RegressiOn (TARO) for the integration of microbiome and metabolomic data. We leverage information on the phylogenetic tree structure to flexibly aggregate rare features. We demonstrate through simulation studies that TARO accurately recovers a low-rank coefficient matrix and identifies relevant features. We applied TARO to microbiome and metabolomic profiles gathered from subjects being screened for colorectal cancer to understand how gut microorganisms shape intestinal metabolite abundances. Availability and implementation The R package TARO implementing the proposed methods is available online at https://github.com/amishra-stats/taro-package .
10. Castillo DF, Denson LA, Haslam DB, Hommel KA, Ollberding NJ, Sahay R, Santucci NR. The microbiome in adolescents with irritable bowel syndrome and changes with percutaneous electrical nerve field stimulation. Neurogastroenterol Motil 2023; 35:e14573. [PMID: 37092330 PMCID: PMC10729794 DOI: 10.1111/nmo.14573] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/26/2022] [Revised: 02/19/2023] [Accepted: 03/14/2023] [Indexed: 04/25/2023]
Abstract
BACKGROUND Irritable bowel syndrome (IBS), a disorder of the gut-brain axis, is affected by the microbiome. Microbial studies in pediatric IBS, especially for centrally mediated treatments, are lacking. We compared the microbiome between pediatric IBS patients and healthy controls (HC), in relation to symptom severity, and with percutaneous electrical nerve field stimulation (PENFS), a non-invasive treatment targeting central pain pathways. METHODS We collected a stool sample, questionnaires and a 1-2 week stool and pain diary from patients aged 11 to 18 years with IBS. A patient subset completed 4 weeks of PENFS and repeated data collection immediately after and/or 3 months after treatment. Stool samples were collected from HC. Samples underwent metagenomic sequencing to evaluate diversity, composition, and abundance of species and MetaCyc pathways. KEY RESULTS We included 27 cases (15.4 ± 2.5 years) and 34 HC (14.2 ± 2.9 years). Twelve species including Firmicutes spp., and carbohydrate degradation/long-chain fatty acid (LCFA) synthesis pathways, were increased in IBS but not statistically significantly associated with symptom severity. Seventeen participants (female) who completed PENFS showed improvements in pain (p = 0.012), disability (p = 0.007), and catastrophizing (p = 0.003). Carbohydrate degradation and LCFA synthesis pathways decreased post-treatment and at follow-up (FDR p-value <0.1). CONCLUSIONS AND INFERENCES Firmicutes, including Clostridiaceae spp., and LCFA synthesis pathways were increased in IBS patients suggesting pain-potentiating effects. PENFS led to marked improvements in abdominal pain, functioning, and catastrophizing, while Clostridial species and LCFA microbial pathways decreased with treatment, suggesting these as potential targets for IBS centrally mediated treatments.
Affiliation(s)
- Daniel F. Castillo
  - Division of Gastroenterology, Hepatology and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Lee A. Denson
  - Division of Gastroenterology, Hepatology and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- David B. Haslam
  - Division of Infectious Disease, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Kevin A. Hommel
  - Division of Behavioral Medicine and Clinical Psychology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Nicholas J. Ollberding
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
  - Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Rashmi Sahay
  - Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Neha R. Santucci
  - Division of Gastroenterology, Hepatology and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
11. Viljanen M, Boshuizen H. llperm: a permutation of regressor residuals test for microbiome data. BMC Bioinformatics 2022; 23:540. [PMID: 36510128 PMCID: PMC9743778 DOI: 10.1186/s12859-022-05088-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 11/29/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Differential abundance testing is an important aspect of microbiome data analysis, where each taxon is fitted with a statistical test or a regression model. However, many models do not provide a good fit to real microbiome data. This has been shown to result in high false positive rates. Permutation tests are a good alternative, but a regression approach is desired for small data sets with many covariates, where stratification is not an option. RESULTS We implement an R package 'llperm' where the Permutation of Regressor Residuals (PRR) test can be applied to any likelihood based model, not only generalized linear models. This enables distributions with zero-inflation and overdispersion, making the test suitable for count regression models popular in microbiome data analysis. Simulations based on a real data set show that the PRR-test approach is able to maintain the correct nominal false positive rate expected from the null hypothesis, while having equal or greater power to detect the true positives as models based on likelihood at a given false positive rate. CONCLUSIONS Standard count regression models can have a shockingly high false positive rate in microbiome data sets. As they may lead to false conclusions, the guaranteed nominal false positive rate gained from the PRR-test can be viewed as a major benefit.
Affiliation(s)
- Markus Viljanen
  - National Institute for Public Health and the Environment - RIVM, PO Box 1, 3720 BA Bilthoven, The Netherlands
- Hendriek Boshuizen
  - National Institute for Public Health and the Environment - RIVM, PO Box 1, 3720 BA Bilthoven, The Netherlands
12. Cappellato M, Baruzzo G, Di Camillo B. Investigating differential abundance methods in microbiome data: A benchmark study. PLoS Comput Biol 2022; 18:e1010467. [PMID: 36074761 PMCID: PMC9488820 DOI: 10.1371/journal.pcbi.1010467] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 12/23/2021] [Revised: 09/20/2022] [Accepted: 08/03/2022] [Indexed: 11/19/2022] Open
Abstract
The development of increasingly efficient and cost-effective high-throughput DNA sequencing techniques has enhanced the possibility of studying complex microbial systems. Recently, researchers have shown great interest in studying the microorganisms that characterise different ecological niches. Differential abundance analysis aims to find differences in the abundance of each taxon between two classes of subjects or samples, assigning a significance value to each comparison. Several bioinformatic methods have been specifically developed, taking into account the challenges of microbiome data, such as sparsity, differences in sequencing depth between samples, and compositionality. Differential abundance analysis has led to important conclusions in different fields, from health to the environment. However, the lack of a known biological truth makes it difficult to validate the results obtained. In this work we exploit metaSPARSim, a microbial sequencing count data simulator, to simulate data with differential abundance features between experimental groups. We perform a complete comparison of recently developed and established methods on a common benchmark, with particular attention to the reliability of both the simulated scenarios and the evaluation metrics. The performance overview covers numerous scenarios, studying the effect on the methods' results of the main covariates, such as sample size, percentage of differentially abundant features, sequencing depth, feature variability, normalisation approach and ecological niche. Mainly, we find that methods show good control of the type I error and, generally, also of the false discovery rate at high sample size, while recall seems to depend on the dataset and sample size.
The microbiota is the set of microorganisms that characterize an ecological environment or niche. Several studies have shown that the microbiota is involved in various biological mechanisms that affect the health or balance of the host organism or the ecosystem. New discoveries and insights have been possible thanks to increasingly efficient sequencing technologies together with the development of bioinformatic computational methods. One of the most interesting analyses in this landscape is the identification of microorganisms that show significantly different abundances when two groups of subjects are analysed. Although many computational methods have been developed, it is still unclear which one has the best performance. Therefore, we exploited a simulator of microbiome data to build a simulation framework that allowed us to carry out an extensive benchmarking of the known tools of differential abundance analysis. Our work is not only a starting point to guide analysts in the choice of tools, but also a first step towards a robust, reliable and fair simulation framework.
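To make the evaluation metrics mentioned in this abstract concrete, the following is a minimal sketch of how a false discovery proportion and a recall could be computed on one simulated dataset where the truly differential taxa are known. The function name and the taxon labels are hypothetical and are not taken from the benchmark itself; the false discovery rate is the expectation of this proportion over repeated simulations.

```python
def fdp_and_recall(called, truly_da):
    """Return (false discovery proportion, recall) for one method on one dataset.

    called   : features a method reports as differentially abundant
    truly_da : features simulated as differentially abundant (the known truth)
    """
    called, truly_da = set(called), set(truly_da)
    true_pos = len(called & truly_da)
    # Share of reported features that are not truly differential (0 if none reported).
    fdp = (len(called) - true_pos) / len(called) if called else 0.0
    # Share of truly differential features that were reported.
    recall = true_pos / len(truly_da) if truly_da else 0.0
    return fdp, recall


# Hypothetical example: 4 taxa called, 3 of them among the 5 truly differential taxa.
fdp, recall = fdp_and_recall(["t1", "t2", "t3", "t9"], ["t1", "t2", "t3", "t4", "t5"])
print(fdp, recall)  # 0.25 0.6
```

Averaging the first value over many simulated datasets gives an empirical estimate of the false discovery rate for that scenario.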
Affiliation(s)
- Marco Cappellato
  - Department of Information Engineering, University of Padova, Padova, Italy
- Giacomo Baruzzo
  - Department of Information Engineering, University of Padova, Padova, Italy
- Barbara Di Camillo
  - Department of Information Engineering, University of Padova, Padova, Italy
  - Department of Comparative Biomedicine and Food Science, University of Padova, Padova, Italy
|
13
|
Chen Q, Lin S, Song C. An Adaptive and Robust Test for Microbial Community Analysis. Front Genet 2022; 13:846258. [PMID: 35664318 PMCID: PMC9162041 DOI: 10.3389/fgene.2022.846258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2021] [Accepted: 03/28/2022] [Indexed: 11/21/2022]
Abstract
In microbiome studies, researchers measure the abundance of each operational taxonomic unit (OTU) and are often interested in testing the association between the microbiota and a clinical outcome while conditioning on certain covariates. Two types of approaches exist for this purpose: OTU-level tests, which assess the association between each OTU and the outcome, and community-level tests, which examine the microbial community as a whole. It is of considerable interest to develop methods that enjoy both the flexibility of OTU-level tests and the biological relevance of community-level tests. We proposed MiAF, a method that adaptively combines p-values from the OTU-level tests to construct a community-level test. By borrowing the flexibility of OTU-level tests, the proposed method has great potential to generate a series of community-level tests that suit a range of different microbiome profiles, while achieving the desirable high statistical power of community-level testing methods. Using a simulation study and real data applications in a smoker throat microbiome study and an HIV patient stool microbiome study, we demonstrated that MiAF has comparable or better power than methods that are specifically designed for community-level tests. The proposed method also provides a natural heuristic taxa selection.
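The general idea of building a community-level test by combining OTU-level p-values can be illustrated with Fisher's classical combination method. This is a swapped-in textbook technique, not the adaptive scheme of MiAF, whose details the abstract does not give; it also assumes independent p-values that are strictly greater than zero, which OTU-level tests on real data will generally not satisfy. The function name is hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values (each > 0) with Fisher's method."""
    k = len(p_values)
    # Test statistic: -2 * sum(log p), chi-square with 2k degrees of freedom under the null.
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # Closed-form chi-square survival function for even degrees of freedom 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical OTU-level p-values for one community.
community_p = fisher_combine([0.04, 0.20, 0.01, 0.65])
print(community_p)
```

With a single input the function returns that p-value unchanged, which is a quick sanity check on the closed-form tail probability.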
Affiliation(s)
- Qingyu Chen
  - Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, United States
- Shili Lin
  - Department of Statistics, College of Arts and Sciences, The Ohio State University, Columbus, OH, United States
  - *Correspondence: Shili Lin; Chi Song
- Chi Song
  - Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, United States
  - *Correspondence: Shili Lin; Chi Song
|
14
|
Sun H, Huang X, Huo B, Tan Y, He T, Jiang X. Detecting sparse microbial association signals adaptively from longitudinal microbiome data based on generalized estimating equations. Brief Bioinform 2022; 23:6585623. [PMID: 35561307 DOI: 10.1093/bib/bbac149] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/15/2022] [Revised: 03/11/2022] [Accepted: 04/02/2022] [Indexed: 12/18/2022]
Abstract
The association between the compositions of microbial communities and various host phenotypes is an important research topic. Microbiome association research addresses multiple domains, such as human disease and diet. Statistical methods for testing microbiome-phenotype associations have been studied recently to determine their ability to assess longitudinal microbiome data. However, existing methods fail to detect sparse association signals in longitudinal microbiome data. In this paper, we developed a novel method, namely aGEEMiHC, which is a data-driven adaptive microbiome higher criticism analysis based on generalized estimating equations to detect sparse microbial association signals from longitudinal microbiome data. aGEEMiHC adopts a generalized estimating equations framework that fully considers the correlation among different observations from the same subject in longitudinal data. To be robust to diverse correlation structures for longitudinal data, aGEEMiHC integrates multiple microbiome higher criticism analyses based on generalized estimating equations with different working correlation structures. Extensive simulation experiments demonstrate that aGEEMiHC can control the type I error correctly and achieve superior performance according to a statistical power comparison. We also applied it to longitudinal microbiome data with various types of host phenotypes to demonstrate the stability of our method. aGEEMiHC is also utilized for real longitudinal microbiome data, and we found a significant association between the gut microbiome and Crohn's disease. In addition, our method ranks the significant factors associated with the host phenotype to provide potential biomarkers.
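For readers unfamiliar with higher criticism, the sketch below computes the classical higher criticism statistic from a vector of p-values, which is designed to be sensitive when only a few of many tests carry signal. It is the generic textbook form, not the adaptive, GEE-based variant used by aGEEMiHC; the function name, the `alpha0` scan fraction, and the example p-values are assumptions made for illustration.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Classical higher criticism statistic over the smallest alpha0 fraction of p-values."""
    p_sorted = sorted(p_values)
    n = len(p_sorted)
    best = -math.inf
    for i in range(1, max(1, int(alpha0 * n)) + 1):
        p = p_sorted[i - 1]
        # Skip degenerate p-values, for which the standardisation is undefined.
        if 0.0 < p < 1.0:
            # Standardised gap between the empirical fraction i/n and its null expectation p.
            z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
            best = max(best, z)
    return best


# Hypothetical comparison: evenly spread p-values versus the same set with three strong signals.
null_like = [(i - 0.5) / 100 for i in range(1, 101)]
with_signal = [1e-6, 2e-6, 3e-6] + null_like[3:]
print(higher_criticism(null_like), higher_criticism(with_signal))
```

The statistic grows sharply when a handful of p-values are far smaller than expected under the null, even if the remaining tests look unremarkable.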
Affiliation(s)
- Han Sun
  - School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- Xiaoyun Huang
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - Collaborative & Innovative Center for Educational Technology, Central China Normal University, Wuhan 430079, China
- Ban Huo
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer, Central China Normal University, Wuhan 430079, China
- Yuting Tan
  - School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- Tingting He
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer, Central China Normal University, Wuhan 430079, China
  - National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
- Xingpeng Jiang
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer, Central China Normal University, Wuhan 430079, China
  - National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
|
15
|
Dahan E, Martin VM, Yassour M. EasyMap - An Interactive Web Tool for Evaluating and Comparing Associations of Clinical Variables and Microbiome Composition. Front Cell Infect Microbiol 2022; 12:854164. [PMID: 35646745 PMCID: PMC9136407 DOI: 10.3389/fcimb.2022.854164] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/13/2022] [Accepted: 04/05/2022] [Indexed: 12/30/2022]
Abstract
One of the most common tasks in microbiome studies is comparing microbial profiles across various groups of people (e.g., sick vs. healthy). Routinely, researchers address this task with multivariate linear regression models, using tools such as linear regression packages, MaAsLin2, and LEfSe. In many cases, it is unclear which metadata variables should be included in the linear model, as many human-associated variables are correlated with one another. Thus, multiple models are often tested, each including a different set of variables; however, the challenge of selecting the metadata variables for the final model remains. Here, we present EasyMap, an interactive online tool allowing for (1) running multiple multivariate linear regression models on the same features and metadata; (2) visualizing the associations between microbial features and clinical metadata found in each model; and (3) comparing across the various models to identify the critical metadata variables and select the optimal model. EasyMap provides a side-by-side visualization of association results across the various models, each with additional metadata variables, enabling us to evaluate the impact of each metadata variable on the associated feature. EasyMap’s interface enables filtering associations by significance, focusing on specific microbes and finding the robust associations that are found across multiple models. While EasyMap was designed to analyze microbiome data, it can handle any other tabular data with numeric features and metadata variables. EasyMap takes the common task of multivariate linear regression to the next level, with an intuitive and simple user interface, allowing for wide comparisons of multiple models to identify the robust microbial feature associations. EasyMap is available at http://yassour.rcs.huji.ac.il/easymap.
Affiliation(s)
- Ehud Dahan
  - Microbiology and Molecular Genetics, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Victoria M. Martin
  - Department of Pediatrics, Massachusetts General Hospital, Boston, MA, United States
- Moran Yassour
  - Microbiology and Molecular Genetics, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
  - School of Computer Science & Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
  - *Correspondence: Moran Yassour
|
16
|
Nguyen QP, Hoen AG, Frost HR. CBEA: Competitive balances for taxonomic enrichment analysis. PLoS Comput Biol 2022; 18:e1010091. [PMID: 35584140 PMCID: PMC9154102 DOI: 10.1371/journal.pcbi.1010091] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/09/2021] [Revised: 05/31/2022] [Accepted: 04/08/2022] [Indexed: 12/15/2022]
Abstract
Research in human-associated microbiomes often involves the analysis of taxonomic count tables generated via high-throughput sequencing. It is difficult to apply statistical tools as the data is high-dimensional, sparse, and compositional. An approachable way to alleviate high-dimensionality and sparsity is to aggregate variables into pre-defined sets. Set-based analysis is ubiquitous in the genomics literature and has demonstrable impact on improving interpretability and power of downstream analysis. Unfortunately, there is a lack of sophisticated set-based analysis methods specific to microbiome taxonomic data, where current practice often employs abundance summation as a technique for aggregation. This approach prevents comparison across sets of different sizes, does not preserve inter-sample distances, and amplifies protocol bias. Here, we attempt to fill this gap with a new single-sample taxon enrichment method that uses a novel log-ratio formulation based on the competitive null hypothesis commonly used in the enrichment analysis literature. Our approach, titled competitive balances for taxonomic enrichment analysis (CBEA), generates sample-specific enrichment scores as the scaled log-ratio of the subcomposition defined by taxa within a set and the subcomposition defined by its complement. We provide sample-level significance testing by estimating an empirical null distribution of our test statistic with valid p-values. Herein, we demonstrate, using both real data applications and simulations, that CBEA controls for type I error, even under high sparsity and high inter-taxa correlation scenarios. Additionally, CBEA provides informative scores that can be inputs to downstream analyses such as prediction tasks.
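The enrichment score described in this abstract, a scaled log-ratio of the taxa inside a set against the taxa in its complement, can be sketched for a single sample as follows. The scaling constant used here is the standard isometric log-ratio balance factor, and the pseudocount for zero counts is a simple placeholder; both are assumptions for illustration rather than details confirmed by the abstract, and the function name is hypothetical.

```python
import math


def balance_score(counts, in_set, pseudocount=1.0):
    """Scaled log-ratio of geometric means: taxa inside a set versus its complement.

    counts      : per-taxon counts for one sample
    in_set      : booleans, True where the taxon belongs to the set of interest
    pseudocount : added to every count so that zeros have a finite logarithm (assumption)
    """
    inside = [math.log(c + pseudocount) for c, s in zip(counts, in_set) if s]
    outside = [math.log(c + pseudocount) for c, s in zip(counts, in_set) if not s]
    if not inside or not outside:
        raise ValueError("both the set and its complement must contain at least one taxon")
    n_in, n_out = len(inside), len(outside)
    # Balance scaling factor from the isometric log-ratio transform (assumed here).
    scale = math.sqrt(n_in * n_out / (n_in + n_out))
    # Difference of mean logs equals the log of the ratio of geometric means.
    return scale * (sum(inside) / n_in - sum(outside) / n_out)


# Hypothetical sample with four taxa; the first two form the set of interest.
score = balance_score([9, 9, 0, 0], [True, True, False, False])
print(score)
```

Because the score is a ratio within one sample, it is unaffected by multiplying all pseudocount-adjusted counts by a constant, which is the property that makes log-ratio scores suitable for compositional data.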
Affiliation(s)
- Quang P. Nguyen
  - Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, United States of America
  - Department of Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, United States of America
- Anne G. Hoen
  - Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, United States of America
  - Department of Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, United States of America
- H. Robert Frost
  - Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, United States of America
|
17
|
Zeng Y, Li J, Wei C, Zhao H, Wang T. mbDenoise: microbiome data denoising using zero-inflated probabilistic principal components analysis. Genome Biol 2022; 23:94. [PMID: 35422001 PMCID: PMC9011970 DOI: 10.1186/s13059-022-02657-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/08/2021] [Accepted: 03/21/2022] [Indexed: 12/13/2022]
Abstract
The analysis of microbiome data has several technical challenges. In particular, count matrices contain a large proportion of zeros, some of which are biological, whereas others are technical. Furthermore, the measurements suffer from unequal sequencing depth, overdispersion, and data redundancy. These nuisance factors introduce substantial noise. We propose an accurate and robust method, mbDenoise, for denoising microbiome data. Assuming a zero-inflated probabilistic PCA (ZIPPCA) model, mbDenoise uses variational approximation to learn the latent structure and recovers the true abundance levels using the posterior, borrowing information across samples and taxa. mbDenoise outperforms state-of-the-art methods to extract the signal for downstream analyses.
Affiliation(s)
- Yanyan Zeng
  - Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Jing Li
  - Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Chaochun Wei
  - Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Hongyu Zhao
  - Department of Biostatistics, Yale University, New Haven, CT, USA
  - SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, China
- Tao Wang
  - Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China
  - SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, China
  - Department of Statistics, School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China
  - Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai, China
|
18
|
Nagpal S, Singh R, Taneja B, Mande SS. MarkerML – Marker feature identification in metagenomic datasets using interpretable machine learning. J Mol Biol 2022; 434:167589. [DOI: 10.1016/j.jmb.2022.167589] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Revised: 04/08/2022] [Accepted: 04/12/2022] [Indexed: 12/29/2022]
|
19
|
Xu Y, Nash K, Acharjee A, Gkoutos GV. CACONET: a novel classification framework for microbial correlation networks. Bioinformatics 2022; 38:1639-1647. [PMID: 34983063 PMCID: PMC8896646 DOI: 10.1093/bioinformatics/btab879] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/25/2021] [Revised: 12/15/2021] [Accepted: 12/30/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Existing microbiome-based disease prediction relies on the ability of machine learning methods to differentiate disease from healthy subjects based on the observed taxa abundance across samples. Although numerous microbes have been implicated as potential biomarkers, challenges remain due to not only the statistical nature of microbiome data but also the lack of understanding of microbial interactions, which can be indicative of the disease. RESULTS: We propose CACONET (classification of Compositional-Aware COrrelation NETworks), a computational framework that learns to classify microbial correlation networks and extracts potential signature interactions, taking as input taxa relative abundance across samples and their health status. By using Bayesian compositional-aware correlation inference, a collection of posterior correlation networks can be drawn and used for graph-level classification, thus incorporating uncertainty in the estimates. CACONET then employs a deep learning approach for graph classification, achieving excellent performance metrics by exploiting the correlation structure. We test the framework on both simulated data and a large real-world dataset pertaining to microbiome samples of colorectal cancer (CRC) and healthy subjects, and identify potential network substructure characteristic of CRC microbiota. CACONET is customizable and can be adapted to further improve its utility. AVAILABILITY AND IMPLEMENTATION: CACONET is available at https://github.com/yuanwxu/corr-net-classify. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuanwei Xu
  - To whom correspondence should be addressed.
- Katrina Nash
  - Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, UK
- Animesh Acharjee
  - Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, UK
  - NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham B15 2TT, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham B15 2TT, UK
  - MRC Health Data Research UK (HDR), Midlands Site B15 2TT, UK
- Georgios V Gkoutos
  - Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, UK
  - NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham B15 2TT, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham B15 2TT, UK
  - MRC Health Data Research UK (HDR), Midlands Site B15 2TT, UK
|
20
|
Mallick H, Rahnavard A, McIver LJ, Ma S, Zhang Y, Nguyen LH, Tickle TL, Weingart G, Ren B, Schwager EH, Chatterjee S, Thompson KN, Wilkinson JE, Subramanian A, Lu Y, Waldron L, Paulson JN, Franzosa EA, Bravo HC, Huttenhower C. Multivariable association discovery in population-scale meta-omics studies. PLoS Comput Biol 2021; 17:e1009442. [PMID: 34784344 PMCID: PMC8714082 DOI: 10.1371/journal.pcbi.1009442] [Citation(s) in RCA: 601] [Impact Index Per Article: 200.3] [Received: 08/05/2021] [Revised: 12/28/2021] [Accepted: 09/09/2021] [Indexed: 12/13/2022]
Abstract
It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2's linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
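Controlling false discovery across many feature-level tests, as this abstract mentions, is commonly done with the Benjamini-Hochberg procedure. The sketch below computes Benjamini-Hochberg adjusted p-values in plain Python; whether this matches the exact correction applied in a given MaAsLin 2 analysis is an assumption here, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone in rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-feature p-values from four association tests.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [q <= 0.1 for q in q_values]
print(q_values, significant)
```

Features whose adjusted value falls below the chosen threshold are reported, which bounds the expected share of false positives among the reported associations under the procedure's assumptions.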
Affiliation(s)
- Himel Mallick
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - The Broad Institute, Cambridge, Massachusetts, United States of America
- Ali Rahnavard
  - Computational Biology Institute, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington DC, United States of America
- Lauren J. McIver
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - The Broad Institute, Cambridge, Massachusetts, United States of America
- Siyuan Ma
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - The Broad Institute, Cambridge, Massachusetts, United States of America
- Yancong Zhang
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - The Broad Institute, Cambridge, Massachusetts, United States of America
- Long H. Nguyen
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
  - Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Timothy L. Tickle
  - The Broad Institute, Cambridge, Massachusetts, United States of America
- George Weingart
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - The Broad Institute, Cambridge, Massachusetts, United States of America
- Boyu Ren
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - The Broad Institute, Cambridge, Massachusetts, United States of America
- Emma H. Schwager
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - The Broad Institute, Cambridge, Massachusetts, United States of America
- Suvo Chatterjee
  - Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
- Kelsey N. Thompson
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Jeremy E. Wilkinson
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Ayshwarya Subramanian
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - The Broad Institute, Cambridge, Massachusetts, United States of America
- Yiren Lu
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Levi Waldron
  - Department of Epidemiology and Biostatistics, CUNY School of Public Health, New York City, New York, United States of America
- Joseph N. Paulson
  - Department of Biostatistics, Product Development, Genentech, Inc., South San Francisco, California, United States of America
- Eric A. Franzosa
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - The Broad Institute, Cambridge, Massachusetts, United States of America
- Hector Corrada Bravo
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
- Curtis Huttenhower
  - Biostatistics Department, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - The Broad Institute, Cambridge, Massachusetts, United States of America
|
21
|
Mallick H, Rahnavard A, McIver LJ, Ma S, Zhang Y, Nguyen LH, Tickle TL, Weingart G, Ren B, Schwager EH, Chatterjee S, Thompson KN, Wilkinson JE, Subramanian A, Lu Y, Waldron L, Paulson JN, Franzosa EA, Bravo HC, Huttenhower C. Multivariable association discovery in population-scale meta-omics studies. PLoS Comput Biol 2021. [PMID: 34784344 DOI: 10.1101/2021.01.20.427420v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/07/2023]
|