1
Creus-Martí I, Moya A, Santonja FJ. Methodology for microbiome data analysis: An overview. Comput Biol Med 2025; 192:110157. [PMID: 40279974 DOI: 10.1016/j.compbiomed.2025.110157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 03/07/2025] [Accepted: 04/04/2025] [Indexed: 04/29/2025]
Abstract
It is known that the microbiome and health are related; in addition, recent research has found that the microbiome has potential clinical uses. These facts highlight the importance of the microbiome in current science. However, microbiome data have characteristics that make their statistical study challenging. In recent years, longitudinal and non-longitudinal methods have been designed to analyze the microbiota and to learn more about bacterial behavior. In this review, we summarize the characteristics of microbiome data and the most widespread statistical methods used to analyze them. We take into account whether the strategies are longitudinal or not. We also classify the methods by their specific analytical objectives and by their mathematical characteristics. The methods are structured according to their biological goals and mathematical features, ensuring that the insights provided are both relevant and accessible to professionals in biology and statistics. We present this review as a reference for the most widely used methods in microbiome data analysis and as a foundation for identifying potential areas for future research. This review may be particularly useful for highlighting the importance of methodology designed to study longitudinal microbiome datasets.
Affiliation(s)
- Irene Creus-Martí
- Department of Applied Mathematics, Universitat Politècnica de València, Valencia, Spain.
- Andrés Moya
- Institute for Integrative Systems Biology (I2Sysbio), Universitat de València and CSIC, València, Spain; The Foundation for the Promotion of Health and Biomedical Research of Valencia Region (FISABIO), Valencia, Spain; CIBER in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Francisco J Santonja
- Department of Statistics and Operation Research, Universitat de València, Valencia, Spain
2
Mei H, Wang Z, Yang H, Li X, Xu Y. Network analysis of multivariate time series data in biological systems: methods and applications. Brief Bioinform 2025; 26:bbaf223. [PMID: 40401349 PMCID: PMC12096012 DOI: 10.1093/bib/bbaf223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2025] [Revised: 04/17/2025] [Accepted: 04/30/2025] [Indexed: 05/23/2025] Open
Abstract
Network analysis has become an essential tool in biological and biomedical research, providing insights into complex biological mechanisms. Since biological systems are inherently time-dependent, incorporating time-varying methods is crucial for capturing temporal changes, adaptive interactions, and evolving dependencies within networks. Our study explores key time-varying methodologies for network structure estimation and network inference based on observed structures. We begin by discussing approaches for estimating network structures from data, focusing on the time-varying Gaussian graphical model, dynamic Bayesian network, and vector autoregression-based causal analysis. Next, we examine analytical techniques that leverage pre-specified or observed networks, including other autoregression-based methods and latent variable models. Furthermore, we explore practical applications and computational tools designed for these methods. By synthesizing these approaches, our study provides a comprehensive evaluation of their strengths and limitations in the context of biological data analysis.
Affiliation(s)
- Hao Mei
- Center for Applied Statistics, School of Statistics, Institute of Health Data Science, Renmin University of China, 59 Zhongguancun Street, 100872 Beijing, China
- Zhiyuan Wang
- Center for Applied Statistics, School of Statistics, Institute of Health Data Science, Renmin University of China, 59 Zhongguancun Street, 100872 Beijing, China
- Hang Yang
- Center for Applied Statistics, School of Statistics, Institute of Health Data Science, Renmin University of China, 59 Zhongguancun Street, 100872 Beijing, China
- Xiaoke Li
- Center for Applied Statistics, School of Statistics, Institute of Health Data Science, Renmin University of China, 59 Zhongguancun Street, 100872 Beijing, China
- Yaqing Xu
- Department of Epidemiology and Biostatistics, School of Public Health, Shanghai Jiao Tong University School of Medicine, 227 South Chongqing Road, 200025 Shanghai, China
3
VP B, Narayanan M. Demixer: a probabilistic generative model to delineate different strains of a microbial species in a mixed infection sample. Bioinformatics 2025; 41:btaf139. [PMID: 40178927 PMCID: PMC12011361 DOI: 10.1093/bioinformatics/btaf139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 02/20/2025] [Accepted: 04/02/2025] [Indexed: 04/05/2025] Open
Abstract
MOTIVATION Multi-drug resistant or hetero-resistant tuberculosis (TB) hinders the successful treatment of TB. Hetero-resistant TB occurs when multiple strains of the TB-causing bacterium with varying degrees of drug susceptibility are present in an individual. Existing studies predicting the proportion and identity of strains in a mixed infection sample rely on a reference database of known strains. A main challenge then is to identify de novo strains not present in the reference database, while quantifying the proportion of known strains. RESULTS We present Demixer, a probabilistic generative model that uses a combination of reference-based and reference-free techniques to delineate mixed infection strains in whole genome sequencing (WGS) data. Demixer extends a topic model widely used in text mining to represent known mutations and discover novel ones. Parallelization and other heuristics enabled Demixer to process large datasets like CRyPTIC (Comprehensive Resistance Prediction for Tuberculosis: an International Consortium). In both synthetic and experimental benchmark datasets, our proposed method precisely detected the identity (e.g. 91.67% accuracy on the experimental in vitro dataset) as well as the proportions of the mixed strains. In real-world applications, Demixer revealed novel high confidence mixed infections (101 out of 1963 Malawi samples analysed), and new insights into the global frequency of mixed infection (2% at the most stringent threshold in the CRyPTIC dataset) and its significant association to drug resistance. Our approach is generalizable and hence applicable to any bacterial and viral WGS data. AVAILABILITY AND IMPLEMENTATION All code relevant to Demixer is available at https://github.com/BIRDSgroup/Demixer.
Affiliation(s)
- Brintha VP
- Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Madras, Chennai, 600036, India
- Center for Integrative Biology and Systems Medicine (IBSE), IIT Madras, Chennai, 600036, India
- Wadhwani School of Data Science and AI, IIT Madras, Chennai, 600036, India
- Manikandan Narayanan
- Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Madras, Chennai, 600036, India
- Center for Integrative Biology and Systems Medicine (IBSE), IIT Madras, Chennai, 600036, India
- Wadhwani School of Data Science and AI, IIT Madras, Chennai, 600036, India
4
Xiong J, Ma YJ, Liao XS, Li LQ, Bao L. Gut microbiota in infants with food protein enterocolitis. Pediatr Res 2025; 97:763-773. [PMID: 39033251 DOI: 10.1038/s41390-024-03424-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Accepted: 07/07/2024] [Indexed: 07/23/2024]
Abstract
BACKGROUND We explored the effects of two formulas, extensively hydrolyzed formula (EHF) and amino acid-based formula (AAF), on the gut microbiota and short-chain fatty acids (SCFAs) in infants with food protein-induced enterocolitis syndrome (FPIES). METHODS Fecal samples of thirty infants with bloody diarrhea receiving EHF or AAF feeding were collected at enrollment, diagnosis of FPIES, and four weeks after diagnosis. The gut microbiota and SCFAs were analyzed using 16S rRNA gene sequencing and gas chromatography-mass spectrometry, respectively. RESULTS Microbial diversity of FPIES infants was significantly different from that of the controls. FPIES infants had a significantly lower abundance of Bifidobacterium and a higher level of hexanoic acid compared with controls. In EHF-fed FPIES infants, microbial richness decreased significantly over time, while the microbial diversity and richness in AAF-fed FPIES infants exhibited no differences at the three time points. By four weeks after diagnosis, EHF-fed FPIES infants contained a decreased abundance of Acinetobacter, whereas AAF-fed FPIES infants contained an increased abundance of Escherichia-Shigella. EHF-fed infants experienced significantly decreased levels of butyric acid and hexanoic acid at four weeks after diagnosis. CONCLUSIONS Infants with FPIES had intestinal dysbiosis and different formulas differentially affected gut microbiota and SCFAs in FPIES infants. IMPACT We report for the first time the impacts of two different nutritional milk formulas on the gut microbial composition and SCFA levels in infants with FPIES. We show that infants with FPIES have obvious intestinal dysbiosis and different formulas differentially affect gut microbiota and SCFAs in FPIES infants. Understanding the effects of different types of formulas on gut microbial colonization and composition, as well as the related metabolites in infants with FPIES could help provide valuable insights for making choices about feeding practices.
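As an illustration of the richness and diversity measures that the abstract above compares across time points, the sketch below computes observed richness and the Shannon index from one sample's taxon counts. The abstract does not say which indices were used, so both choices, the function names, and the example counts are assumptions made for illustration only.

```python
import math


def observed_richness(counts):
    """Number of taxa with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


if __name__ == "__main__":
    # Hypothetical read counts for four taxa in a single fecal sample.
    sample = [120, 30, 0, 50]
    print("richness:", observed_richness(sample))
    print("Shannon:", round(shannon_index(sample), 3))
```

Richness only counts which taxa are present, whereas the Shannon index also reflects how evenly reads are spread among them, which is why a study can see one change over time and not the other.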
Affiliation(s)
- Jing Xiong
- Department of Neonatology, Children's Hospital of Chongqing Medical University, Chongqing, China
- Ministry of Education Key Laboratory of Child Development and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, China
- National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, China
- China International Science and Technology Cooperation base of Child Development and Critical Disorders, Children's Hospital of Chongqing Medical University, Chongqing, China
- Chongqing Key Laboratory of Pediatric Metabolism and Inflammatory Diseases, Children's Hospital of Chongqing Medical University, Chongqing, China
- Yu-Jue Ma
- Department of Neonatology, Children's Hospital of Chongqing Medical University, Chongqing, China
- Ministry of Education Key Laboratory of Child Development and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, China
- National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, China
- China International Science and Technology Cooperation base of Child Development and Critical Disorders, Children's Hospital of Chongqing Medical University, Chongqing, China
- Chongqing Key Laboratory of Pediatric Metabolism and Inflammatory Diseases, Children's Hospital of Chongqing Medical University, Chongqing, China
- Xing-Sheng Liao
- Department of Neonatology, The first People's Hospital of Jiulongpo District, Chongqing, China
- Lu-Quan Li
- Department of Neonatology, Children's Hospital of Chongqing Medical University, Chongqing, China.
- Ministry of Education Key Laboratory of Child Development and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, China.
- National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, China.
- China International Science and Technology Cooperation base of Child Development and Critical Disorders, Children's Hospital of Chongqing Medical University, Chongqing, China.
- Chongqing Key Laboratory of Pediatric Metabolism and Inflammatory Diseases, Children's Hospital of Chongqing Medical University, Chongqing, China.
- Lei Bao
- Department of Neonatology, Children's Hospital of Chongqing Medical University, Chongqing, China.
- Ministry of Education Key Laboratory of Child Development and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, China.
- National Clinical Research Center for Child Health and Disorders, Children's Hospital of Chongqing Medical University, Chongqing, China.
- China International Science and Technology Cooperation base of Child Development and Critical Disorders, Children's Hospital of Chongqing Medical University, Chongqing, China.
- Chongqing Key Laboratory of Pediatric Metabolism and Inflammatory Diseases, Children's Hospital of Chongqing Medical University, Chongqing, China.
5
Mishra A, McNichol J, Fuhrman J, Blei D, Müller CL. Variational inference for microbiome survey data with application to global ocean data. ISME Communications 2025; 5:ycaf062. [PMID: 40352106 PMCID: PMC12064564 DOI: 10.1093/ismeco/ycaf062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 01/21/2025] [Accepted: 04/08/2025] [Indexed: 05/14/2025]
Abstract
Linking sequence-derived microbial taxa abundances to host (patho-)physiology or habitat characteristics in a reproducible and interpretable manner has remained a formidable challenge for the analysis of microbiome survey data. Here, we introduce a flexible probabilistic modeling framework, VI-MIDAS (variational inference for microbiome survey data analysis), that enables joint estimation of context-dependent drivers and broad patterns of associations of microbial taxon abundances from microbiome survey data. VI-MIDAS comprises mechanisms for direct coupling of taxon abundances with covariates and taxa-specific latent coupling, which can incorporate spatio-temporal information and taxon-taxon interactions. We leverage mean-field variational inference for posterior VI-MIDAS model parameter estimation and illustrate model building and analysis using Tara Ocean Expedition survey data. Using VI-MIDAS' latent embedding model and tools from network analysis, we show that marine microbial communities can be broadly categorized into five modules, including SAR11-, Nitrosopumilus-, and Alteromonadales-dominated communities, each associated with specific environmental and spatiotemporal signatures. VI-MIDAS also finds evidence for largely positive taxon-taxon associations in SAR11 or Rhodospirillales clades, and negative associations with Alteromonadales and Flavobacteriales classes. Our results indicate that VI-MIDAS provides a powerful integrative statistical analysis framework for discovering broad patterns of associations between microbial taxa and context-specific covariate data from microbiome survey data.
Affiliation(s)
- Aditya Mishra
- Department of Statistics, University of Georgia, Athens, GA, 30606, United States
- Jesse McNichol
- Department of Biology, St. Francis Xavier University, Antigonish, NS, B2G 2W5, Canada
- Department of Biological Sciences, University of Southern California, LA, 90007, United States
- Jed Fuhrman
- Department of Biological Sciences, University of Southern California, LA, 90007, United States
- David Blei
- Center for Computational Mathematics, Flatiron Institute, New York, NY, 10010, United States
- Department of Statistics and Computer Science, Columbia University, New York, NY, 10027, United States
- Christian L Müller
- Center for Computational Mathematics, Flatiron Institute, New York, NY, 10010, United States
- Computational Health Center, Helmholtz Zentrum München, Munich, 85764, Germany
- Department of Statistics, LMU München, Munich, 80539, Germany
6
Sankaran K, Kodikara S, Li JJ, Cao KAL. Semisynthetic simulation for microbiome data analysis. Brief Bioinform 2024; 26:bbaf051. [PMID: 39927858 PMCID: PMC11808806 DOI: 10.1093/bib/bbaf051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Revised: 12/19/2024] [Accepted: 01/23/2025] [Indexed: 02/11/2025] Open
Abstract
High-throughput sequencing data lie at the heart of modern microbiome research. Effective analysis of these data requires careful preprocessing, modeling, and interpretation to detect subtle signals and avoid spurious associations. In this review, we discuss how simulation can serve as a sandbox to test candidate approaches, creating a setting that mimics real data while providing ground truth. This is particularly valuable for power analysis, methods benchmarking, and reliability analysis. We explain the probability, multivariate analysis, and regression concepts behind modern simulators and how different implementations make trade-offs between generality, faithfulness, and controllability. Recognizing that all simulators only approximate reality, we review methods to evaluate how accurately they reflect key properties. We also present case studies demonstrating the value of simulation in differential abundance testing, dimensionality reduction, network analysis, and data integration. Code for these examples is available in an online tutorial (https://go.wisc.edu/8994yz) that can be easily adapted to new problem settings.
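To make the "simulation as a sandbox with ground truth" idea in the abstract above concrete, here is a minimal sketch of a simulation-based power analysis for a single taxon. It is fully synthetic and far simpler than the semisynthetic simulators the review discusses: log-abundances are assumed Gaussian with unit variance, the test is a Welch-type statistic compared against a normal critical value of 1.96, and all function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_log_abundances(n_per_group, effect, rng):
    """Draw log-abundances for one taxon; `effect` is the true group shift."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(effect, 1.0) for _ in range(n_per_group)]
    return control, treated


def welch_statistic(a, b):
    """Difference in means divided by its unequal-variance standard error."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(b) - statistics.mean(a)) / se


def estimate_power(n_per_group, effect, n_sims=500, critical=1.96, seed=0):
    """Fraction of simulated datasets in which the test rejects the null.

    Because the true effect is set by us, effect=0 estimates the false
    positive rate and effect>0 estimates power. The normal critical value
    is an approximation that is reasonable for moderate sample sizes.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        control, treated = simulate_log_abundances(n_per_group, effect, rng)
        if abs(welch_statistic(control, treated)) > critical:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("false positive rate:", estimate_power(30, 0.0))
    print("power at effect 1.0:", estimate_power(30, 1.0))
```

Varying `n_per_group` and `effect` gives a rough picture of how many samples a study would need; a semisynthetic approach would replace the Gaussian draws with a generator fitted to real data.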
Affiliation(s)
- Kris Sankaran
- Department of Statistics, University of Wisconsin-Madison, 1300 University Ave, Madison, WI 53703, United States
- Saritha Kodikara
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Building 184/30 Royal Parade, Melbourne, VIC 3052, Australia
- Jingyi Jessica Li
- Department of Statistics and Data Science, University of California, Los Angeles, 520 Portola Plaza, Los Angeles, CA 90095, United States
- Department of Human Genetics, University of California, Los Angeles, 695 Charles E Young Dr S, Los Angeles, CA 90095, United States
- Department of Biostatistics, University of California, Los Angeles, 650 Charles E. Young Dr S, Los Angeles, CA 90095, United States
- Kim-Anh Lê Cao
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Building 184/30 Royal Parade, Melbourne, VIC 3052, Australia
7
Mohr AE, Ortega-Santos CP, Whisner CM, Klein-Seetharaman J, Jasbi P. Navigating Challenges and Opportunities in Multi-Omics Integration for Personalized Healthcare. Biomedicines 2024; 12:1496. [PMID: 39062068 PMCID: PMC11274472 DOI: 10.3390/biomedicines12071496] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Received: 04/15/2024] [Revised: 06/25/2024] [Accepted: 06/28/2024] [Indexed: 07/28/2024] Open
Abstract
The field of multi-omics has witnessed unprecedented growth, converging multiple scientific disciplines and technological advances. This surge is evidenced by a more than doubling in multi-omics scientific publications within just two years (2022-2023) since its first referenced mention in 2002, as indexed by the National Library of Medicine. This emerging field has demonstrated its capability to provide comprehensive insights into complex biological systems, representing a transformative force in health diagnostics and therapeutic strategies. However, several challenges are evident when merging varied omics data sets and methodologies, interpreting vast data dimensions, streamlining longitudinal sampling and analysis, and addressing the ethical implications of managing sensitive health information. This review evaluates these challenges while spotlighting pivotal milestones: the development of targeted sampling methods, the use of artificial intelligence in formulating health indices, the integration of sophisticated n-of-1 statistical models such as digital twins, and the incorporation of blockchain technology for heightened data security. For multi-omics to truly revolutionize healthcare, it demands rigorous validation, tangible real-world applications, and smooth integration into existing healthcare infrastructures. It is imperative to address ethical dilemmas, paving the way for the realization of a future steered by omics-informed personalized medicine.
Affiliation(s)
- Alex E. Mohr
- Systems Precision Engineering and Advanced Research (SPEAR), Theriome Inc., Phoenix, AZ 85004, USA; (A.E.M.); (C.P.O.-S.); (C.M.W.); (J.K.-S.)
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
- Biodesign Institute Center for Health Through Microbiomes, Arizona State University, Tempe, AZ 85281, USA
- Carmen P. Ortega-Santos
- Systems Precision Engineering and Advanced Research (SPEAR), Theriome Inc., Phoenix, AZ 85004, USA; (A.E.M.); (C.P.O.-S.); (C.M.W.); (J.K.-S.)
- Department of Exercise and Nutrition Sciences, Milken Institute School of Public Health, George Washington University, Washington, DC 20052, USA
- Corrie M. Whisner
- Systems Precision Engineering and Advanced Research (SPEAR), Theriome Inc., Phoenix, AZ 85004, USA; (A.E.M.); (C.P.O.-S.); (C.M.W.); (J.K.-S.)
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
- Biodesign Institute Center for Health Through Microbiomes, Arizona State University, Tempe, AZ 85281, USA
- Judith Klein-Seetharaman
- Systems Precision Engineering and Advanced Research (SPEAR), Theriome Inc., Phoenix, AZ 85004, USA; (A.E.M.); (C.P.O.-S.); (C.M.W.); (J.K.-S.)
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
- School of Molecular Sciences, Arizona State University, Tempe, AZ 85281, USA
- Paniz Jasbi
- Systems Precision Engineering and Advanced Research (SPEAR), Theriome Inc., Phoenix, AZ 85004, USA; (A.E.M.); (C.P.O.-S.); (C.M.W.); (J.K.-S.)
8
Rashidi A, Ebadi M, Rehman TU, Elhusseini H, Kazadi D, Halaweish H, Khan MH, Hoeschen A, Cao Q, Luo X, Kabage AJ, Lopez S, Ramamoorthy S, Holtan SG, Weisdorf DJ, Khoruts A, Staley C. Multi-omics Analysis of a Fecal Microbiota Transplantation Trial Identifies Novel Aspects of Acute GVHD Pathogenesis. Cancer Research Communications 2024; 4:1454-1466. [PMID: 38767452 PMCID: PMC11164016 DOI: 10.1158/2767-9764.crc-24-0138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 04/23/2024] [Accepted: 05/15/2024] [Indexed: 05/22/2024]
Abstract
Acute GVHD (aGVHD) is a major complication of allogeneic hematopoietic cell transplantation (alloHCT) associated with gut microbiota disruptions. However, whether therapeutic microbiota modulation prevents aGVHD is unknown. We conducted a randomized, placebo-controlled trial of third-party fecal microbiota transplantation (FMT) administered at the peak of microbiota injury in 100 patients with acute myeloid leukemia receiving induction chemotherapy and alloHCT recipients. Despite improvements in microbiome diversity, expansion of commensals, and shrinkage of potential pathogens, aGVHD occurred more frequently after FMT than placebo. Although this unexpected finding could be explained by clinical differences between the two arms, we asked whether a microbiota explanation might be also present. To this end, we performed multi-omics analysis of preintervention and postintervention gut microbiome and serum metabolome. We found that postintervention expansion of Faecalibacterium, a commensal genus with gut-protective and anti-inflammatory properties under homeostatic conditions, predicted a higher risk for aGVHD. Faecalibacterium expansion occurred predominantly after FMT and was due to engraftment of unique donor taxa, suggesting that donor Faecalibacterium-derived antigens might have stimulated allogeneic immune cells. Faecalibacterium and ursodeoxycholic acid (an anti-inflammatory secondary bile acid) were negatively correlated, offering an alternative mechanistic explanation. In conclusion, we demonstrate context dependence of microbiota effects where a normally beneficial bacteria may become detrimental in disease. While FMT is a broad, community-level intervention, it may need precision engineering in ecologically complex settings where multiple perturbations (e.g., antibiotics, intestinal damage, alloimmunity) are concurrently in effect. 
SIGNIFICANCE Post-FMT expansion of Faecalibacterium, associated with donor microbiota engraftment, predicted a higher risk for aGVHD in alloHCT recipients. Although Faecalibacterium is a commensal genus with gut-protective and anti-inflammatory properties under homeostatic conditions, our findings suggest that it may become pathogenic in the setting of FMT after alloHCT. Our results support a future trial with precision engineering of the FMT product used as GVHD prophylaxis after alloHCT.
Affiliation(s)
- Armin Rashidi
- Clinical Research Division, Fred Hutchinson Cancer Center; and Division of Oncology, University of Washington, Seattle, Washington
- Division of Hematology, Oncology, and Transplantation, Department of Medicine, University of Minnesota, Minneapolis, Minnesota
- Maryam Ebadi
- Department of Radiation Oncology, University of Washington and Fred Hutchinson Cancer Center, Seattle, Washington
- Tauseef U. Rehman
- Division of Hematology, Oncology, and Transplantation, Department of Medicine, University of Minnesota, Minneapolis, Minnesota
- Heba Elhusseini
- Division of Hematology, Oncology, and Transplantation, Department of Medicine, University of Minnesota, Minneapolis, Minnesota
- David Kazadi
- Department of Medicine, University of Minnesota, Minneapolis, Minnesota
- Hossam Halaweish
- Department of Surgery, University of Minnesota, Minneapolis, Minnesota
- Mohammad H. Khan
- Department of Surgery, University of Minnesota, Minneapolis, Minnesota
- Andrea Hoeschen
- Division of Hematology, Oncology, and Transplantation, Department of Medicine, University of Minnesota, Minneapolis, Minnesota
- Qing Cao
- Biostatistics Core, Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota
- Xianghua Luo
- Biostatistics Core, Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, Minnesota
- Amanda J. Kabage
- Division of Gastroenterology, Hepatology, and Nutrition, Department of Medicine, University of Minnesota, Minneapolis, Minnesota
- Sharon Lopez
- Division of Gastroenterology, Hepatology, and Nutrition, Department of Medicine, University of Minnesota, Minneapolis, Minnesota
- Shernan G. Holtan
- Division of Hematology, Oncology, and Transplantation, Department of Medicine, University of Minnesota, Minneapolis, Minnesota
- Daniel J. Weisdorf
- Division of Hematology, Oncology, and Transplantation, Department of Medicine, University of Minnesota, Minneapolis, Minnesota
- Alexander Khoruts
- Division of Gastroenterology, Hepatology, and Nutrition, Department of Medicine, University of Minnesota, Minneapolis, Minnesota
- Biotechnology Institute, University of Minnesota, St. Paul, Minnesota
- Center for Immunology, University of Minnesota, Minneapolis, Minnesota
9
Ozminkowski S, Solís‐Lemus C. Identifying microbial drivers in biological phenotypes with a Bayesian network regression model. Ecol Evol 2024; 14:e11039. [PMID: 38774136 PMCID: PMC11106058 DOI: 10.1002/ece3.11039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 01/29/2024] [Accepted: 02/03/2024] [Indexed: 05/24/2024] Open
Abstract
In Bayesian Network Regression models, networks are considered the predictors of continuous responses. These models have been successfully used in brain research to identify regions in the brain that are associated with specific human traits, yet their potential to elucidate microbial drivers in biological phenotypes for microbiome research remains unknown. In particular, microbial networks are challenging due to their high dimension and high sparsity compared to brain networks. Furthermore, unlike in brain connectome research, in microbiome research, it is usually expected that the presence of microbes has an effect on the response (main effects), not just the interactions. Here, we develop the first thorough investigation of whether Bayesian Network Regression models are suitable for microbial datasets on a variety of synthetic and real data under diverse biological scenarios. We test whether the Bayesian Network Regression model that accounts only for interaction effects (edges in the network) is able to identify key drivers (microbes) in phenotypic variability. We show that this model is indeed able to identify influential nodes and edges in the microbial networks that drive changes in the phenotype for most biological settings, but we also identify scenarios where this method performs poorly, which allows us to provide practical advice for domain scientists aiming to apply these tools to their datasets. BNR models provide a framework for microbiome researchers to identify connections between microbes and measured phenotypes. We enable the use of this statistical model by providing an easy-to-use implementation as a publicly available Julia package at https://github.com/solislemuslab/BayesianNetworkRegression.jl.
Affiliation(s)
- Samuel Ozminkowski
- Department of Statistics and Wisconsin Institute for Discovery, University of Wisconsin‐Madison, Madison, Wisconsin, USA
- Claudia Solís‐Lemus
- Department of Plant Pathology and Wisconsin Institute for Discovery, University of Wisconsin‐Madison, Madison, Wisconsin, USA
10
Pappalardo VY, Azarang L, Zaura E, Brandt BW, de Menezes RX. A new approach to describe the taxonomic structure of microbiome and its application to assess the relationship between microbial niches. BMC Bioinformatics 2024; 25:58. [PMID: 38317062 PMCID: PMC10840258 DOI: 10.1186/s12859-023-05575-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 11/20/2023] [Indexed: 02/07/2024] Open
Abstract
BACKGROUND Data from microbiomes from multiple niches is often collected, but methods to analyse these often ignore associations between niches. One interesting case is that of the oral microbiome. Its composition is receiving increasing attention due to reports on its associations with general health. While the oral cavity includes different niches, multi-niche microbiome data analysis is conducted using a single niche at a time and, therefore, ignores other niches that could act as confounding variables. Understanding the interaction between niches would assist interpretation of the results, and help improve our understanding of multi-niche microbiomes.
METHODS In this study, we used a machine learning technique called latent Dirichlet allocation (LDA) on two microbiome datasets consisting of several niches. LDA was used on both individual niches and all niches simultaneously. On individual niches, LDA was used to decompose each niche into bacterial sub-communities unveiling their taxonomic structure. These sub-communities were then used to assess the relationship between microbial niches using the global test. On all niches simultaneously, LDA allowed us to extract meaningful microbial patterns. Sets of co-occurring operational taxonomic units (OTUs) comprising those patterns were then used to predict the original location of each sample.
RESULTS Our approach showed that the per-niche sub-communities displayed a strong association between supragingival plaque and saliva, as well as between the anterior and posterior tongue. In addition, the LDA-derived microbial signatures were able to predict the original sample niche illustrating the meaningfulness of our sub-communities. For the multi-niche oral microbiome dataset we had an overall accuracy of 76%, and per-niche sensitivity of up to 83%. Finally, for a second multi-niche microbiome dataset from the entire body, microbial niches from the oral cavity displayed stronger associations to each other than with those from other parts of the body, such as niches within the vagina and the skin.
CONCLUSION Our LDA-based approach produces sets of co-occurring taxa that can describe niche composition. LDA-derived microbial signatures can also be instrumental in summarizing microbiome data, for both descriptions as well as prediction.
Affiliation(s)
- Vincent Y Pappalardo
- Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam, University of Amsterdam and Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.
- Biostatistics Centre, Department of Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, The Netherlands.
| | - Leyla Azarang
- Biostatistics Centre, Department of Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Egija Zaura
- Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam, University of Amsterdam and Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Bernd W Brandt
- Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam, University of Amsterdam and Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Renée X de Menezes
- Biostatistics Centre, Department of Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, The Netherlands
| |
|
11
|
Mohr AE, Ahern MM, Sears DD, Bruening M, Whisner CM. Gut microbiome diversity, variability, and latent community types compared with shifts in body weight during the freshman year of college in dormitory-housed adolescents. Gut Microbes 2023; 15:2250482. [PMID: 37642346 PMCID: PMC10467528 DOI: 10.1080/19490976.2023.2250482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 06/26/2023] [Accepted: 08/17/2023] [Indexed: 08/31/2023]
Abstract
Significant human gut microbiome changes during adolescence suggest that microbial community evolution occurs throughout important developmental periods including the transition to college, a typical life phase of weight gain. In this observational longitudinal study of 139 college freshmen living in on-campus dormitories, we tracked changes in the gut microbiome via 16S amplicon sequencing and body weight across a single academic year. Participants were grouped by weight change categories of gain (WG), loss (WL), and maintenance (WM). Upon assessment of the community structure, unweighted and weighted UniFrac metrics revealed significant shifts with substantial variation explained by individual effects within weight change categories. Genera that positively contributed to these associations with weight change included Bacteroides, Blautia, and Bifidobacterium in WG participants and Prevotella and Faecalibacterium in WL and WM participants. Moreover, the Prevotella/Bacteroides ratio was significantly different by weight change category, with WL participants displaying an increased ratio. Importantly, these genera did not display co-dominance nor ease of transition between Prevotella- and Bacteroides-dominated states. We further assessed the overall taxonomic variation, noting the increased stability of the WL compared to the WG microbiome. Finally, we found 30 latent community structures within the microbiome with significant associations with waist circumference, sleep, and dietary factors, with alcohol consumption chief among them. Our findings highlight the high level of individual variation and the importance of initial gut microbiome community structure in college students during a period of major lifestyle changes. Further work is needed to confirm these findings and explore mechanistic relationships between gut microbes and weight change in free-living individuals.
Affiliation(s)
- Alex E. Mohr
- College of Health Solutions, Arizona State University, Phoenix, AZ, USA
- Center for Health Through Microbiomes, Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Mary M. Ahern
- College of Health Solutions, Arizona State University, Phoenix, AZ, USA
| | - Dorothy D. Sears
- College of Health Solutions, Arizona State University, Phoenix, AZ, USA
| | - Meg Bruening
- College of Health Solutions, Arizona State University, Phoenix, AZ, USA
- Department of Nutritional Sciences, College of Health and Human Development, Pennsylvania State University, University Park, PA, USA
| | - Corrie M. Whisner
- College of Health Solutions, Arizona State University, Phoenix, AZ, USA
- Center for Health Through Microbiomes, Biodesign Institute, Arizona State University, Tempe, AZ, USA
| |
|
12
|
Symul L, Jeganathan P, Costello EK, France M, Bloom SM, Kwon DS, Ravel J, Relman DA, Holmes S. Sub-communities of the vaginal microbiota in pregnant and non-pregnant women. Proc Biol Sci 2023; 290:20231461. [PMID: 38018105 PMCID: PMC10685114 DOI: 10.1098/rspb.2023.1461] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/28/2023] [Accepted: 10/30/2023] [Indexed: 11/30/2023]
Abstract
Diverse and non-Lactobacillus-dominated vaginal microbial communities are associated with adverse health outcomes such as preterm birth and the acquisition of sexually transmitted infections. Despite the importance of recognizing and understanding the key risk-associated features of these communities, their heterogeneous structure and properties remain ill-defined. Clustering approaches are commonly used to characterize vaginal communities, but they lack sensitivity and robustness in resolving substructures and revealing transitions between potential sub-communities. Here, we address this need with an approach based on mixed membership topic models. Using longitudinal data from cohorts of pregnant and non-pregnant study participants, we show that topic models more accurately describe sample composition, longitudinal changes, and better predict the loss of Lactobacillus dominance. We identify several non-Lactobacillus-dominated sub-communities common to both cohorts and independent of reproductive status. In non-pregnant individuals, we find that the menstrual cycle modulates transitions between and within sub-communities, as well as the concentrations of half of the cytokines and 18% of metabolites. Overall, our analyses based on mixed membership models reveal substructures of vaginal ecosystems which may have important clinical and biological associations.
Affiliation(s)
- Laura Symul
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, CA 94305, USA
| | - Pratheepa Jeganathan
- Department of Mathematics and Statistics, McMaster University, 1280 Main Street, West Hamilton, Ontario, Canada L8S 4K1
| | - Elizabeth K. Costello
- Department of Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA
| | - Michael France
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W. Baltimore Street, Baltimore, MD 21201, USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, 685 West Baltimore Street, HSF-I Suite 380, Baltimore, MD 21201, USA
| | - Seth M. Bloom
- Division of Infectious Diseases, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA
- Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA
- Ragon Institute of MGH, MIT, and Harvard, 400 Technology Square, Cambridge, MA 02139, USA
| | - Douglas S. Kwon
- Division of Infectious Diseases, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA
- Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA
- Ragon Institute of MGH, MIT, and Harvard, 400 Technology Square, Cambridge, MA 02139, USA
| | - Jacques Ravel
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W. Baltimore Street, Baltimore, MD 21201, USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, 685 West Baltimore Street, HSF-I Suite 380, Baltimore, MD 21201, USA
| | - David A. Relman
- Department of Medicine, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, USA
- Department of Microbiology & Immunology, Stanford University School of Medicine, 299 Campus Drive, Stanford, CA 94305, USA
- Infectious Diseases Section, Veterans Affairs Palo Alto Health Care System, 3801 Miranda Avenue, Palo Alto, CA 94304, USA
| | - Susan Holmes
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, CA 94305, USA
| |
|
13
|
Fukuyama J, Sankaran K, Symul L. Multiscale analysis of count data through topic alignment. Biostatistics 2023; 24:1045-1065. [PMID: 35657012 DOI: 10.1093/biostatistics/kxac018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/12/2021] [Revised: 03/10/2022] [Accepted: 03/21/2022] [Indexed: 10/19/2023]
Abstract
Topic modeling is a popular method used to describe biological count data. With topic models, the user must specify the number of topics $K$. Since there is no definitive way to choose $K$ and since a true value might not exist, we develop a method, which we call topic alignment, to study the relationships across models with different $K$. In addition, we present three diagnostics based on the alignment. These techniques can show how many topics are consistently present across different models, if a topic is only transiently present, or if a topic splits into more topics when $K$ increases. This strategy gives more insight into the process of generating the data than choosing a single value of $K$ would. We design a visual representation of these cross-model relationships, show the effectiveness of these tools for interpreting the topics on simulated and real data, and release an accompanying R package, alto.
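The idea of relating topics across models fitted with different $K$ can be illustrated with a small sketch. This is not the alto package's implementation: it assumes a simple overlap weight, the sum over samples of the product of topic memberships, and the function name `alignment_weights` and the toy membership matrices are hypothetical.

```python
def alignment_weights(theta_a, theta_b):
    """Overlap between topics of two fitted models.

    theta_a, theta_b: per-sample topic memberships (rows sum to 1) from
    two models fitted to the same samples with different numbers of topics.
    Returns a matrix w where w[k][m] sums, over samples, the product of
    membership in topic k of model A and topic m of model B.
    """
    n_a, n_b = len(theta_a[0]), len(theta_b[0])
    weights = [[0.0] * n_b for _ in range(n_a)]
    for row_a, row_b in zip(theta_a, theta_b):
        for k, p in enumerate(row_a):
            for m, q in enumerate(row_b):
                weights[k][m] += p * q
    return weights


# Toy memberships for four samples (illustrative values only).
theta_k2 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
theta_k3 = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

weights = alignment_weights(theta_k2, theta_k3)
for k, row in enumerate(weights):
    print(f"K=2 topic {k}:", row)
```

In this toy case the first topic of the $K=2$ model maps entirely onto one topic of the $K=3$ model, while the second spreads its weight evenly over two topics, which is the kind of split the abstract describes.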
Affiliation(s)
- Julia Fukuyama
- Department of Statistics, Indiana University Bloomington, 919 E 10th Street, Bloomington, IN 47408, USA
| | - Kris Sankaran
- Department of Statistics, University of Wisconsin - Madison, 1300 University Ave, Madison, WI 53706, USA
| | - Laura Symul
- Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, CA 94305, USA
| |
|
14
|
LeBlanc P, Ma L. Microbiome subcommunity learning with logistic-tree normal latent Dirichlet allocation. Biometrics 2023; 79:2321-2332. [PMID: 36222326 PMCID: PMC10090221 DOI: 10.1111/biom.13772] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/18/2021] [Accepted: 09/26/2022] [Indexed: 11/28/2022]
Abstract
Mixed-membership (MM) models such as latent Dirichlet allocation (LDA) have been applied to microbiome compositional data to identify latent subcommunities of microbial species. These subcommunities are informative for understanding the biological interplay of microbes and for predicting health outcomes. However, microbiome compositions typically display substantial cross-sample heterogeneities in subcommunity compositions (that is, the variability in the proportions of microbes in shared subcommunities across samples), which is not accounted for in prior analyses. As a result, LDA can produce inference that is highly sensitive to the specification of the number of subcommunities and often divides a single subcommunity into multiple artificial ones. To address this limitation, we incorporate the logistic-tree normal (LTN) model into LDA to form a new MM model. This model allows cross-sample variation in the composition of each subcommunity around some "centroid" composition that defines the subcommunity. Incorporation of auxiliary Pólya-Gamma variables enables a computationally efficient collapsed blocked Gibbs sampler to carry out Bayesian inference under this model. By accounting for such heterogeneity, our new model restores the robustness of the inference in the specification of the number of subcommunities and allows meaningful subcommunities to be identified.
Affiliation(s)
- Patrick LeBlanc
- Department of Statistical Sciences, Duke University, Durham, North Carolina, USA
| | - Li Ma
- Department of Statistical Sciences, Duke University, Durham, North Carolina, USA
- Department of Biostatistics and Bioinformatics, Duke University Medical School, Durham, North Carolina, USA
| |
|
15
|
Li Y, Cheng M, Zha Y, Yang K, Tong Y, Wang S, Lu Q, Ning K. Gut microbiota and inflammation patterns for specialized athletes: a multi-cohort study across different types of sports. mSystems 2023; 8:e0025923. [PMID: 37498086 PMCID: PMC10470055 DOI: 10.1128/msystems.00259-23] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/21/2023] [Accepted: 06/09/2023] [Indexed: 07/28/2023]
Abstract
Regular high-intensity exercise can cause changes in athletes' gut microbiota, and the extent and nature of these changes may be affected by the athletes' exercise patterns. However, it is still unclear to what extent different types of athletes have distinct gut microbiome profiles and whether we can effectively monitor an athlete's inflammatory risk based on their microbiota. To address these questions, we conducted a multi-cohort study of 543 fecal samples from athletes in three different sports: aerobics (n = 316), wrestling (n = 53), and rowing (n = 174). We sought to investigate how athletes' gut microbiota was specialized for different types of sports, and its associations with inflammation, diet, anthropometrics, and anaerobic measurements. We established a microbiota catalog of multi-cohort athletes and found that athletes have specialized gut microbiota specific to the type of sport they engaged in. Using latent Dirichlet allocation, we identified 10 microbial subgroups of athletes' gut microbiota, each of which had specific correlations with inflammation, diet, and anaerobic performance in different types of athletes. Notably, most inflammation indicators were associated with Prevotella-driven subgroup 7. Finally, we found that the effects of sport types and exercise intensity on the gut microbiota were sex-dependent. These findings shed light on the complex associations between physical factors, gut microbiota, and inflammation in athletes of different sports types and could have significant implications for monitoring potential inflammation risk and developing personalized exercise programs.
IMPORTANCE This study is the first multi-cohort investigation of athletes across a range of sports, including aerobics, wrestling, and rowing, with the goal of establishing a multi-sport microbiota catalog. Our findings highlight that athletes' gut microbiota is sport-specific, indicating that exercise patterns may play a significant role in shaping the microbiome. Additionally, we observed distinct associations between gut microbiota and markers of inflammation, diet, and anaerobic performance in athletes of different sports. Moreover, we expanded our analysis to include a non-athlete cohort and found that exercise intensity had varying effects on the gut microbiota of participants, depending on sex.
Affiliation(s)
- Yuxue Li
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center of Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Mingyue Cheng
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center of Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Yuguo Zha
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center of Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Kun Yang
- Exercise Immunology Center, Wuhan Sports University, Wuhan, China
| | - Yigang Tong
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - Song Wang
- Exercise Immunology Center, Wuhan Sports University, Wuhan, China
| | - Qunwei Lu
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center of Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Kang Ning
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center of Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| |
|
16
|
Peng X, Lee J, Adamow M, Maher C, Postow MA, Callahan MK, Panageas KS, Shen R. A topic modeling approach reveals the dynamic T cell composition of peripheral blood during cancer immunotherapy. Cell Rep Methods 2023; 3:100546. [PMID: 37671017 PMCID: PMC10475788 DOI: 10.1016/j.crmeth.2023.100546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 02/15/2023] [Accepted: 07/10/2023] [Indexed: 09/07/2023]
Abstract
We present TopicFlow, a computational framework for flow cytometry data analysis of patient blood samples for the identification of functional and dynamic topics in circulating T cell population. This framework applies a Latent Dirichlet Allocation (LDA) model, adapting the concept of topic modeling in text mining to flow cytometry. To demonstrate the utility of our method, we conducted an analysis of ∼17 million T cells collected from 138 peripheral blood samples in 51 patients with melanoma undergoing treatment with immune checkpoint inhibitors (ICIs). Our study highlights three latent dynamic topics identified by LDA: a T cell exhaustion topic that independently recapitulates the previously identified LAG-3+ immunotype associated with ICI resistance, a naive topic and its association with immune-related toxicity, and a T cell activation topic that emerges upon ICI treatment. Our approach can be broadly applied to mine high-parameter flow cytometry data for insights into mechanisms of treatment response and toxicity.
Affiliation(s)
- Xiyu Peng
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Jasme Lee
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Matthew Adamow
- Immune Monitoring Facility, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Parker Institute for Cancer Immunotherapy, San Francisco, CA 94129, USA
| | - Colleen Maher
- Parker Institute for Cancer Immunotherapy, San Francisco, CA 94129, USA
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Michael A. Postow
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Weill Cornell Medical College, New York, NY 10065, USA
| | - Margaret K. Callahan
- Parker Institute for Cancer Immunotherapy, San Francisco, CA 94129, USA
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Weill Cornell Medical College, New York, NY 10065, USA
| | - Katherine S. Panageas
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Ronglai Shen
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| |
|
17
|
Tataru C, Peras M, Rutherford E, Dunlap K, Yin X, Chrisman BS, DeSantis TZ, Wall DP, Iwai S, David MM. Topic modeling for multi-omic integration in the human gut microbiome and implications for Autism. Sci Rep 2023; 13:11353. [PMID: 37443184 PMCID: PMC10345091 DOI: 10.1038/s41598-023-38228-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Accepted: 07/05/2023] [Indexed: 07/15/2023]
Abstract
While healthy gut microbiomes are critical to human health, pertinent microbial processes remain largely undefined, partially due to differential bias among profiling techniques. By simultaneously integrating multiple profiling methods, multi-omic analysis can define generalizable microbial processes, and is especially useful in understanding complex conditions such as Autism. Challenges with integrating heterogeneous data produced by multiple profiling methods can be overcome using Latent Dirichlet Allocation (LDA), a promising natural language processing technique that identifies topics in heterogeneous documents. In this study, we apply LDA to multi-omic microbial data (16S rRNA amplicon, shotgun metagenomic, shotgun metatranscriptomic, and untargeted metabolomic profiling) from the stool of 81 children with and without Autism. We identify topics, or microbial processes, that summarize complex phenomena occurring within gut microbial communities. We then subset stool samples by topic distribution, and identify metabolites, specifically neurotransmitter precursors and fatty acid derivatives, that differ significantly between children with and without Autism. We identify clusters of topics, deemed "cross-omic topics", which we hypothesize are representative of generalizable microbial processes observable regardless of profiling method. Interpreting topics, we find each represents a particular diet, and we heuristically label each cross-omic topic as: healthy/general function, age-associated function, transcriptional regulation, and opportunistic pathogenesis.
Affiliation(s)
- Christine Tataru
- Department of Microbiology, Oregon State University, SW Campus Way, Corvallis, USA.
| | - Marie Peras
- Second Genome Inc, 1000 Marina Blvd, Suite 500, Brisbane, CA, 94005, USA
| | - Erica Rutherford
- Second Genome Inc, 1000 Marina Blvd, Suite 500, Brisbane, CA, 94005, USA
| | - Kaiti Dunlap
- Department of Bioengineering, Serra Mall, Stanford, USA
| | - Xiaochen Yin
- Second Genome Inc, 1000 Marina Blvd, Suite 500, Brisbane, CA, 94005, USA
| | | | - Todd Z DeSantis
- Second Genome Inc, 1000 Marina Blvd, Suite 500, Brisbane, CA, 94005, USA
| | - Dennis P Wall
- Department of Biomedical Data Science, Serra Mall, Stanford, USA
- Department of Pediatrics (Systems Medicine), Stanford, 1265 Welch Road, Stanford, USA
| | - Shoko Iwai
- Second Genome Inc, 1000 Marina Blvd, Suite 500, Brisbane, CA, 94005, USA
| | - Maude M David
- Department of Microbiology, Oregon State University, SW Campus Way, Corvallis, USA.
- School of Pharmacy, Oregon State University, SW Campus Way, Corvallis, USA.
| |
|
18
|
Frioux C, Ansorge R, Özkurt E, Ghassemi Nedjad C, Fritscher J, Quince C, Waszak SM, Hildebrand F. Enterosignatures define common bacterial guilds in the human gut microbiome. Cell Host Microbe 2023; 31:1111-1125.e6. [PMID: 37339626 DOI: 10.1016/j.chom.2023.05.024] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 01/26/2023] [Revised: 04/03/2023] [Accepted: 05/23/2023] [Indexed: 06/22/2023]
Abstract
The human gut microbiome composition is generally in a stable dynamic equilibrium, but it can deteriorate into dysbiotic states detrimental to host health. To disentangle the inherent complexity and capture the ecological spectrum of microbiome variability, we used 5,230 gut metagenomes to characterize signatures of bacteria commonly co-occurring, termed enterosignatures (ESs). We find five generalizable ESs dominated by either Bacteroides, Firmicutes, Prevotella, Bifidobacterium, or Escherichia. This model confirms key ecological characteristics known from previous enterotype concepts, while enabling the detection of gradual shifts in community structures. Temporal analysis implies that the Bacteroides-associated ES is "core" in the resilience of westernized gut microbiomes, while combinations with other ESs often complement the functional spectrum. The model reliably detects atypical gut microbiomes correlated with adverse host health conditions and/or the presence of pathobionts. ESs provide an interpretable and generic model that enables an intuitive characterization of gut microbiome composition in health and disease.
Affiliation(s)
- Clémence Frioux
- Food, Microbiome, and Health Institute Strategic Programme, Quadram Institute Bioscience, Norwich Research Park, NR4 7UQ Norwich, Norfolk, UK; Digital Biology, Earlham Institute NR4 7UZ Norwich, Norfolk, UK; Inria, University of Bordeaux, INRAE, 33400 Talence, France.
| | - Rebecca Ansorge
- Food, Microbiome, and Health Institute Strategic Programme, Quadram Institute Bioscience, Norwich Research Park, NR4 7UQ Norwich, Norfolk, UK; Digital Biology, Earlham Institute NR4 7UZ Norwich, Norfolk, UK
| | - Ezgi Özkurt
- Food, Microbiome, and Health Institute Strategic Programme, Quadram Institute Bioscience, Norwich Research Park, NR4 7UQ Norwich, Norfolk, UK; Digital Biology, Earlham Institute NR4 7UZ Norwich, Norfolk, UK
| | | | - Joachim Fritscher
- Food, Microbiome, and Health Institute Strategic Programme, Quadram Institute Bioscience, Norwich Research Park, NR4 7UQ Norwich, Norfolk, UK; Digital Biology, Earlham Institute NR4 7UZ Norwich, Norfolk, UK
| | - Christopher Quince
- Food, Microbiome, and Health Institute Strategic Programme, Quadram Institute Bioscience, Norwich Research Park, NR4 7UQ Norwich, Norfolk, UK; Digital Biology, Earlham Institute NR4 7UZ Norwich, Norfolk, UK
| | - Sebastian M Waszak
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo and Oslo University Hospital, Oslo 0318, Norway; Department of Neurology, University of California, San Francisco, San Francisco, CA 94148, USA; Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
| | - Falk Hildebrand
- Food, Microbiome, and Health Institute Strategic Programme, Quadram Institute Bioscience, Norwich Research Park, NR4 7UQ Norwich, Norfolk, UK; Digital Biology, Earlham Institute NR4 7UZ Norwich, Norfolk, UK.
| |
|
19
|
Kim A, Sevanto S, Moore ER, Lubbers N. Latent Dirichlet Allocation modeling of environmental microbiomes. PLoS Comput Biol 2023; 19:e1011075. [PMID: 37289841 PMCID: PMC10249879 DOI: 10.1371/journal.pcbi.1011075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 04/05/2023] [Indexed: 06/10/2023]
Abstract
Interactions between stressed organisms and their microbiome environments may provide new routes for understanding and controlling biological systems. However, microbiomes are a form of high-dimensional data, with thousands of taxa present in any given sample, which makes untangling the interaction between an organism and its microbial environment a challenge. Here we apply Latent Dirichlet Allocation (LDA), a technique for language modeling, which decomposes the microbial communities into a set of topics (non-mutually-exclusive sub-communities) that compactly represent the distribution of full communities. LDA provides a lens into the microbiome at broad and fine-grained taxonomic levels, which we show on two datasets. In the first dataset, from the literature, we show how LDA topics succinctly recapitulate many results from a previous study on diseased coral species. We then apply LDA to a new dataset of maize soil microbiomes under drought, and find a large number of significant associations between the microbiome topics and plant traits as well as associations between the microbiome and the experimental factors, e.g. watering level. This yields new information on the plant-microbial interactions in maize and shows that LDA technique is useful for studying the coupling between microbiomes and stressed organisms.
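The decomposition of communities into non-mutually-exclusive topics can be sketched on a toy count table. The block below is a minimal collapsed Gibbs sampler for LDA written with the Python standard library only; it is not the authors' code, and the function name `fit_lda_gibbs`, the hyperparameters `alpha` and `beta`, and the count matrix are all assumptions chosen for illustration.

```python
import random


def fit_lda_gibbs(counts, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to a samples-by-taxa count table with collapsed Gibbs sampling.

    Returns (theta, phi): per-sample topic proportions and per-topic
    taxon distributions. Hyperparameters are illustrative defaults.
    """
    rng = random.Random(seed)
    n_samples, n_taxa = len(counts), len(counts[0])
    # Expand counts into one token per read: (sample index, taxon index).
    tokens = [(d, w) for d, row in enumerate(counts)
              for w, c in enumerate(row) for _ in range(c)]
    z = [rng.randrange(n_topics) for _ in tokens]  # random initial topics
    n_dk = [[0] * n_topics for _ in range(n_samples)]  # sample-topic counts
    n_kw = [[0] * n_taxa for _ in range(n_topics)]     # topic-taxon counts
    n_k = [0] * n_topics                               # reads per topic
    for (d, w), k in zip(tokens, z):
        n_dk[d][k] += 1
        n_kw[k][w] += 1
        n_k[k] += 1
    for _ in range(n_iter):
        for i, (d, w) in enumerate(tokens):
            # Remove the current assignment, then resample it.
            k = z[i]
            n_dk[d][k] -= 1
            n_kw[k][w] -= 1
            n_k[k] -= 1
            weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                       / (n_k[t] + n_taxa * beta) for t in range(n_topics)]
            k = rng.choices(range(n_topics), weights=weights)[0]
            z[i] = k
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    theta = [[(n_dk[d][k] + alpha) / (sum(n_dk[d]) + n_topics * alpha)
              for k in range(n_topics)] for d in range(n_samples)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + n_taxa * beta)
            for w in range(n_taxa)] for k in range(n_topics)]
    return theta, phi


# Toy table: 4 samples x 6 taxa; the last sample mixes both blocks of taxa.
counts = [
    [20, 15, 10, 0, 0, 0],
    [18, 12, 14, 1, 0, 0],
    [0, 0, 1, 16, 20, 12],
    [9, 8, 6, 10, 9, 7],
]
theta, phi = fit_lda_gibbs(counts, n_topics=2)
for d, row in enumerate(theta):
    print(f"sample {d}:", [round(p, 2) for p in row])
```

Each row of `theta` gives a sample's membership across topics, so a mixed sample can belong partly to several sub-communities rather than being forced into one cluster. For real data a maintained library implementation would normally be preferable to this sketch.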
Affiliation(s)
- Anastasiia Kim
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Sanna Sevanto
- Earth and Environmental Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Eric R. Moore
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| |
|
20
|
Shalon D, Culver RN, Grembi JA, Folz J, Treit PV, Shi H, Rosenberger FA, Dethlefsen L, Meng X, Yaffe E, Aranda-Díaz A, Geyer PE, Mueller-Reif JB, Spencer S, Patterson AD, Triadafilopoulos G, Holmes SP, Mann M, Fiehn O, Relman DA, Huang KC. Profiling the human intestinal environment under physiological conditions. Nature 2023; 617:581-591. [PMID: 37165188 PMCID: PMC10191855 DOI: 10.1038/s41586-023-05989-7] [Citation(s) in RCA: 203] [Impact Index Per Article: 101.5] [Received: 01/20/2022] [Accepted: 03/21/2023] [Indexed: 05/12/2023]
Abstract
The spatiotemporal structure of the human microbiome [1,2], proteome [3] and metabolome [4,5] reflects and determines regional intestinal physiology and may have implications for disease [6]. Yet, little is known about the distribution of microorganisms, their environment and their biochemical activity in the gut because of reliance on stool samples and limited access to only some regions of the gut using endoscopy in fasting or sedated individuals [7]. To address these deficiencies, we developed an ingestible device that collects samples from multiple regions of the human intestinal tract during normal digestion. Collection of 240 intestinal samples from 15 healthy individuals using the device and subsequent multi-omics analyses identified significant differences between bacteria, phages, host proteins and metabolites in the intestines versus stool. Certain microbial taxa were differentially enriched and prophage induction was more prevalent in the intestines than in stool. The host proteome and bile acid profiles varied along the intestines and were highly distinct from those of stool. Correlations between gradients in bile acid concentrations and microbial abundance predicted species that altered the bile acid pool through deconjugation. Furthermore, microbially conjugated bile acid concentrations exhibited amino acid-dependent trends that were not apparent in stool. Overall, non-invasive, longitudinal profiling of microorganisms, proteins and bile acids along the intestinal tract under physiological conditions can help elucidate the roles of the gut microbiome and metabolome in human physiology and disease.
Affiliation(s)
- Rebecca Neal Culver
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Jessica A Grembi
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Jacob Folz
  - West Coast Metabolomics Center, University of California, Davis, Davis, CA, USA
- Peter V Treit
  - Department of Proteomics and Signal Transduction, Max-Planck Institute of Biochemistry, Martinsried, Germany
- Handuo Shi
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Florian A Rosenberger
  - Department of Proteomics and Signal Transduction, Max-Planck Institute of Biochemistry, Martinsried, Germany
- Les Dethlefsen
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Eitan Yaffe
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Philipp E Geyer
  - Department of Proteomics and Signal Transduction, Max-Planck Institute of Biochemistry, Martinsried, Germany
- Johannes B Mueller-Reif
  - Department of Proteomics and Signal Transduction, Max-Planck Institute of Biochemistry, Martinsried, Germany
- Sean Spencer
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA
  - Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Stanford, CA, USA
- Andrew D Patterson
  - Department of Veterinary and Biomedical Sciences, Pennsylvania State University, University Park, PA, USA
- George Triadafilopoulos
  - Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Stanford, CA, USA
  - Silicon Valley Neurogastroenterology and Motility Center, Mountain View, CA, USA
- Susan P Holmes
  - Department of Statistics, Stanford University, Stanford, CA, USA
- Matthias Mann
  - Department of Proteomics and Signal Transduction, Max-Planck Institute of Biochemistry, Martinsried, Germany
- Oliver Fiehn
  - West Coast Metabolomics Center, University of California, Davis, Davis, CA, USA
  - Department of Food Science and Technology, University of California, Davis, Davis, CA, USA
- David A Relman
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
  - Infectious Diseases Section, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, USA
- Kerwyn Casey Huang
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
|
21
|
Peng X, Lee J, Adamow M, Maher C, Postow MA, Callahan MK, Panageas KS, Shen R. Uncovering the hidden structure of dynamic T cell composition in peripheral blood during cancer immunotherapy: a topic modeling approach. bioRxiv 2023:2023.04.24.538095. [PMID: 37162890 PMCID: PMC10168231 DOI: 10.1101/2023.04.24.538095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Immune checkpoint inhibitors (ICIs), now mainstays of cancer treatment, show great potential but only benefit a subset of patients. A more complete understanding of the immunological mechanisms and pharmacodynamics of ICIs in cancer patients will help identify the patients most likely to benefit and will generate knowledge for the development of next-generation ICI regimens. We set out to interrogate the early temporal evolution of T cell populations from longitudinal single-cell flow cytometry data. We developed an innovative statistical and computational approach using a Latent Dirichlet Allocation (LDA) model that extends the concept of topic modeling used in text mining. This powerful unsupervised learning tool allows us to discover compositional topics within immune cell populations that have distinct functional and differentiation states and are biologically and clinically relevant. To illustrate the model's utility, we analyzed ∼17 million T cells obtained from 138 pre- and on-treatment peripheral blood samples from a cohort of melanoma patients treated with ICIs. We identified three latent dynamic topics: a T-cell exhaustion topic that recapitulates a LAG3+ predominant patient subgroup with poor clinical outcome; a naïve topic that shows association with immune-related toxicity; and an immune activation topic that emerges upon ICI treatment. We identified that a patient subgroup with a high baseline of the naïve topic has a higher toxicity grade. While the current application is demonstrated using flow cytometry data, our approach has broader utility and creates a new direction for translating single-cell data into biological and clinical insights.
Affiliation(s)
- Xiyu Peng
  - Department of Epidemiology and Biostatistics, San Francisco, CA
- Jasme Lee
  - Department of Epidemiology and Biostatistics, San Francisco, CA
- Matthew Adamow
  - Immune Monitoring Facility, San Francisco, CA
  - Parker Institute for Cancer Immunotherapy, San Francisco, CA
- Colleen Maher
  - Parker Institute for Cancer Immunotherapy, San Francisco, CA
  - Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Michael A Postow
  - Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
  - Weill Cornell Medical College, New York, NY
- Margaret K Callahan
  - Parker Institute for Cancer Immunotherapy, San Francisco, CA
  - Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
  - Weill Cornell Medical College, New York, NY
- Ronglai Shen
  - Department of Epidemiology and Biostatistics, San Francisco, CA
|
22
|
McCabe SD, Nobel AB, Love MI. ACTOR: a latent Dirichlet model to compare expressed isoform proportions to a reference panel. Biostatistics 2023; 24:388-405. [PMID: 33948626 PMCID: PMC10102900 DOI: 10.1093/biostatistics/kxab013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2020] [Revised: 03/19/2021] [Accepted: 03/23/2021] [Indexed: 11/13/2022] Open
Abstract
The relative proportion of RNA isoforms expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examination of relative isoform proportions can help determine biological mechanisms, but such analyses often require a per-gene investigation of splicing patterns. Leveraging large public data sets produced by genomic consortia as a reference, one can compare splicing patterns in a data set of interest with those of a reference panel in which samples are divided into distinct groups, such as tissue of origin, or disease status. We propose A latent Dirichlet model to Compare expressed isoform proportions TO a Reference panel (ACTOR), a latent Dirichlet model with Dirichlet Multinomial observations to compare expressed isoform proportions in a data set to an independent reference panel. We use a variational Bayes procedure to estimate posterior distributions for the group membership of one or more samples. Using the Genotype-Tissue Expression project as a reference data set, we evaluate ACTOR on simulated and real RNA-seq data sets to determine tissue-type classifications of genes. ACTOR is publicly available as an R package at https://github.com/mccabes292/actor.
Affiliation(s)
- Sean D McCabe
  - Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599-7400, USA
- Andrew B Nobel
  - Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, 318 Hanes Hall, Chapel Hill, NC 27599-3260, USA
  - Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599-7400, USA
- Michael I Love
  - Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599-7400, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Rd, Chapel Hill, NC 27514, USA
|
23
|
A New Algorithm for Convex Biclustering and Its Extension to the Compositional Data. Statistics in Biosciences 2022. [DOI: 10.1007/s12561-022-09356-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
24
|
Ruuskanen MO, Vats D, Potbhare R, RaviKumar A, Munukka E, Ashma R, Lahti L. Towards standardized and reproducible research in skin microbiomes. Environ Microbiol 2022; 24:3840-3860. [PMID: 35229437 PMCID: PMC9790573 DOI: 10.1111/1462-2920.15945] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/31/2021] [Revised: 02/15/2022] [Accepted: 02/16/2022] [Indexed: 12/30/2022]
Abstract
Skin is a complex organ serving a critical role as a barrier and mediator of interactions between the human body and its environment. Recent studies have uncovered how resident microbial communities play a significant role in maintaining the normal healthy function of the skin and the immune system. In turn, numerous host-associated and environmental factors influence these communities' composition and diversity across the cutaneous surface. In addition, specific compositional changes in skin microbiota have also been connected to the development of several chronic diseases. The current era of microbiome research is characterized by its reliance on large data sets of nucleotide sequences produced with high-throughput sequencing of sample-extracted DNA. These approaches have yielded new insights into many previously uncharacterized microbial communities. Application of standardized practices in the study of skin microbial communities could help us understand their complex structures, functional capacities, and health associations and increase the reproducibility of the research. Here, we overview the current research in human skin microbiomes and outline challenges specific to their study. Furthermore, we provide perspectives on recent advances in methods, analytical tools and applications of skin microbiomes in medicine and forensics.
Affiliation(s)
- Matti O. Ruuskanen
  - Department of Computing, Faculty of Technology, University of Turku, Turku, Finland
- Deepti Vats
  - Department of Zoology, Centre of Advanced Study, Savitribai Phule Pune University, Pune, India
- Renuka Potbhare
  - Department of Zoology, Centre of Advanced Study, Savitribai Phule Pune University, Pune, India
- Ameeta RaviKumar
  - Institute of Bioinformatics and Biotechnology, Savitribai Phule Pune University, Pune, India
- Eveliina Munukka
  - Microbiome Biobank, Institute of Biomedicine, University of Turku, Turku, Finland
- Richa Ashma
  - Department of Zoology, Centre of Advanced Study, Savitribai Phule Pune University, Pune, India
- Leo Lahti
  - Department of Computing, Faculty of Technology, University of Turku, Turku, Finland
|
25
|
Tataru C, Eaton A, David MM. GMEmbeddings: An R Package to Apply Embedding Techniques to Microbiome Data. Frontiers in Bioinformatics 2022; 2:828703. [PMID: 36304322 PMCID: PMC9580954 DOI: 10.3389/fbinf.2022.828703] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/03/2021] [Accepted: 02/11/2022] [Indexed: 11/25/2022] Open
Abstract
Large-scale microbiome studies investigating disease-inducing microbial roles base their findings on differences between microbial count data in contrasting environments (e.g., stool samples between cases and controls). These microbiome survey studies are often impeded by small sample sizes and database bias. Combining data from multiple survey studies often results in obvious batch effects, even when DNA preparation and sequencing methods are identical. Relatedly, predictive models trained on one microbial DNA dataset often do not generalize to outside datasets. In this study, we address these limitations by applying word embedding algorithms (GloVe) and PCA transformation to ASV data from the American Gut Project and generating translation matrices that can be applied to any 16S rRNA V4 region gut microbiome sequencing study. Because these approaches contextualize microbial occurrences in a larger dataset while reducing dimensionality of the feature space, they can improve generalization of predictive models that predict host phenotype from stool associated gut microbiota. The GMEmbeddings R package contains GloVe and PCA embedding transformation matrices at 50, 100 and 250 dimensions, each learned using ∼15,000 samples from the American Gut Project. It currently supports the alignment, matching, and matrix multiplication to allow users to transform their V4 16S rRNA data into these embedding spaces. We show how to correlate the properties in the new embedding space to KEGG functional pathways for biological interpretation of results. Lastly, we provide benchmarking on six gut microbiome datasets describing three phenotypes to demonstrate the ability of embedding-based microbiome classifiers to generalize to independent datasets. Future iterations of GMEmbeddings will include embedding transformation matrices for other biological systems. Available at: https://github.com/MaudeDavidLab/GMEmbeddings.
Affiliation(s)
- Christine Tataru
  - Department of Microbiology, College of Science, Oregon State University, Corvallis, OR, United States
- Austin Eaton
  - Department of Microbiology, College of Science, Oregon State University, Corvallis, OR, United States
- Maude M. David
  - Department of Microbiology, College of Science, Oregon State University, Corvallis, OR, United States
  - Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, United States
|
26
|
Mishra AK, Müller CL. Negative binomial factor regression with application to microbiome data analysis. Stat Med 2022; 41:2786-2803. [PMID: 35466418 PMCID: PMC9325477 DOI: 10.1002/sim.9384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2021] [Revised: 02/28/2022] [Accepted: 03/07/2022] [Indexed: 11/17/2022]
Abstract
The human microbiome provides essential physiological functions and helps maintain host homeostasis via the formation of intricate ecological host‐microbiome relationships. While it is well established that the lifestyle of the host, dietary preferences, demographic background, and health status can influence microbial community composition and dynamics, robust generalizable associations between specific host‐associated factors and specific microbial taxa have remained largely elusive. Here, we propose factor regression models that allow the estimation of structured parsimonious associations between host‐related features and amplicon‐derived microbial taxa. To account for the overdispersed nature of the amplicon sequencing count data, we propose negative binomial reduced rank regression (NB‐RRR) and negative binomial co‐sparse factor regression (NB‐FAR). While NB‐RRR encodes the underlying dependency among the microbial abundances as outcomes and the host‐associated features as predictors through a rank‐constrained coefficient matrix, NB‐FAR uses a sparse singular value decomposition of the coefficient matrix. The latter approach avoids the notoriously difficult joint parameter estimation by extracting sparse unit‐rank components of the coefficient matrix sequentially, effectively delivering interpretable bi‐clusters of taxa and host‐associated factors. To solve the nonconvex optimization problems associated with these factor regression models, we present a novel iterative block‐wise majorization procedure. Extensive simulation studies and an application to the microbial abundance data from the American Gut Project (AGP) demonstrate the efficacy of the proposed procedure. In the AGP data, we identify several factors that strongly link dietary habits and host life style to specific microbial families.
Affiliation(s)
- Aditya K. Mishra
  - Center for Computational Mathematics, Flatiron Institute, Simons Foundation, New York, New York, USA
- Christian L. Müller
  - Center for Computational Mathematics, Flatiron Institute, Simons Foundation, New York, New York, USA
  - Department of Statistics, LMU München, Munich, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany
|
27
|
An Overview of Modern Applications of Negative Binomial Modelling in Ecology and Biodiversity. Diversity 2022. [DOI: 10.3390/d14050320] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
Negative binomial modelling is one of the most commonly used statistical tools for analysing count data in ecology and biodiversity research. This is not surprising given the prevalence of overdispersion (i.e., evidence that the variance is greater than the mean) in many biological and ecological studies. Indeed, overdispersion is often indicative of some form of biological aggregation process (e.g., when species or communities cluster in groups). If overdispersion is ignored, the precision of model parameters can be severely overestimated and can result in misleading statistical inference. In this article, we offer some insight as to why the negative binomial distribution is becoming, and arguably should become, the default starting distribution (as opposed to assuming Poisson counts) for analysing count data in ecology and biodiversity research. We begin with an overview of traditional uses of negative binomial modelling, before examining several modern applications and opportunities in modern ecology/biodiversity where negative binomial modelling is playing a critical role, from generalisations based on exploiting its Poisson-gamma mixture formulation in species distribution models and occurrence data analysis, to estimating animal abundance in negative binomial N-mixture models, and biodiversity measures via rank abundance distributions. Comparisons to other common models for handling overdispersion on real data are provided. We also address the important issue of software, and conclude with a discussion of future directions for analysing ecological and biological data with negative binomial models. In summary, we hope this overview will stimulate the use of negative binomial modelling as a starting point for the analysis of count data in ecology and biodiversity studies.
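Editorial illustration (not part of the cited abstract): the overdispersion and Poisson-gamma mixture formulation mentioned above can be sketched numerically. The following is a minimal sketch, assuming a mean-dispersion parameterization in which the variance is mean + mean^2 / k; the function names and default values are hypothetical and chosen only for illustration.

```python
import math
import random


def nb_variance(mean, dispersion):
    """Variance of a negative binomial with the given mean and dispersion (size) k."""
    return mean + mean ** 2 / dispersion


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_gamma_poisson(mean, dispersion, rng):
    """Draw a count whose Poisson rate is itself gamma distributed (the mixture form)."""
    # A gamma with shape k and scale mean / k has expectation `mean`.
    rate = rng.gammavariate(dispersion, mean / dispersion)
    return sample_poisson(rate, rng)


def simulate(mean=5.0, dispersion=2.0, n=20000, seed=0):
    """Return the sample mean and sample variance of n mixture draws."""
    rng = random.Random(seed)
    draws = [sample_gamma_poisson(mean, dispersion, rng) for _ in range(n)]
    sample_mean = sum(draws) / n
    sample_var = sum((x - sample_mean) ** 2 for x in draws) / (n - 1)
    return sample_mean, sample_var


if __name__ == "__main__":
    m, v = simulate()
    print("sample mean:", m)
    print("sample variance:", v)
    print("theoretical variance:", nb_variance(5.0, 2.0))
```

A Poisson model would force the variance to equal the mean; in this sketch the simulated variance is well above the mean, which is the pattern the abstract describes as overdispersion.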
|
28
|
David MM, Tataru C, Pope Q, Baker LJ, English MK, Epstein HE, Hammer A, Kent M, Sieler MJ, Mueller RS, Sharpton TJ, Tomas F, Vega Thurber R, Fern XZ. Revealing General Patterns of Microbiomes That Transcend Systems: Potential and Challenges of Deep Transfer Learning. mSystems 2022; 7:e0105821. [PMID: 35040699 PMCID: PMC8765061 DOI: 10.1128/msystems.01058-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/11/2023] Open
Abstract
A growing body of research has established that the microbiome can mediate the dynamics and functional capacities of diverse biological systems. Yet, we understand little about what governs the response of these microbial communities to host or environmental changes. Most efforts to model microbiomes focus on defining the relationships between the microbiome, host, and environmental features within a specified study system and therefore fail to capture those that may be evident across multiple systems. In parallel with these developments in microbiome research, computer scientists have developed a variety of machine learning tools that can identify subtle, but informative, patterns from complex data. Here, we recommend using deep transfer learning to resolve microbiome patterns that transcend study systems. By leveraging diverse public data sets in an unsupervised way, such models can learn contextual relationships between features and build on those patterns to perform subsequent tasks (e.g., classification) within specific biological contexts.
Affiliation(s)
- Maude M. David
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
  - Department of Pharmaceutical Sciences, Oregon State University, Corvallis, Oregon, USA
- Christine Tataru
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Quintin Pope
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, USA
- Lydia J. Baker
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Mary K. English
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Hannah E. Epstein
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Austin Hammer
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Michael Kent
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Michael J. Sieler
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Ryan S. Mueller
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
- Thomas J. Sharpton
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
  - Department of Statistics, Oregon State University, Corvallis, Oregon, USA
- Fiona Tomas
  - Instituto Mediterráneo de Estudios Avanzados, IMEDEA, Esporles, Balearic Islands, Spain
- Xiaoli Z. Fern
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, USA
|
29
|
Stratification of the Gut Microbiota Composition Landscape across the Alzheimer's Disease Continuum in a Turkish Cohort. mSystems 2022; 7:e0000422. [PMID: 35133187 PMCID: PMC8823292 DOI: 10.1128/msystems.00004-22] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 01/04/2023] Open
Abstract
Alzheimer's disease (AD) is a heterogeneous disorder that spans a continuum with multiple phases, including preclinical, mild cognitive impairment, and dementia. Unlike for most other chronic diseases, human studies reporting on AD gut microbiota in the literature are very limited. With the scarcity of approved drugs for AD therapies, the rational and precise modulation of gut microbiota composition using diet and other tools is a promising approach to the management of AD. Such an approach could be personalized if an AD continuum can first be deconstructed into multiple strata based on specific microbiota features by using single or multiomics techniques. However, stratification of AD gut microbiota has not been systematically investigated before, leaving an important research gap for gut microbiota-based therapeutic approaches. Here, we analyze 16S rRNA amplicon sequencing of stool samples from 27 patients with mild cognitive impairment, 47 patients with AD, and 51 nondemented control subjects by using tools compatible with the compositional nature of microbiota. To stratify the AD gut microbiota community, we applied four machine learning techniques, including partitioning around the medoid clustering and fitting a probabilistic Dirichlet mixture model, the latent Dirichlet allocation model, and we performed topological data analysis for population-scale microbiome stratification based on the Mapper algorithm. These four distinct techniques all converge on Prevotella and Bacteroides stratification of the gut microbiota across the AD continuum, while some methods provided fine-scale resolution in stratifying the community landscape. Finally, we demonstrate that the signature taxa and neuropsychometric parameters together robustly classify the groups. Our results provide a framework for precision nutrition approaches aiming to modulate the AD gut microbiota. IMPORTANCE The prevalence of AD worldwide is estimated to reach 131 million by 2050. 
Most disease-modifying treatments and drug trials have failed, due partly to the heterogeneous and complex nature of the disease. Recent studies demonstrated that gut dysbiosis can influence normal brain function through the so-called "gut-brain axis." Modulation of the gut microbiota, therefore, has drawn strong interest in the clinic in the management of the disease. However, there is an unmet need for microbiota-informed stratification of AD clinical cohorts for intervention studies aiming to modulate the gut microbiota. Our study fills this gap and draws attention to the need for microbiota stratification as the first step for microbiota-based therapy. We demonstrate that while Prevotella and Bacteroides clusters are the consensus partitions, the newly developed probabilistic methods can provide fine-scale resolution in partitioning the AD gut microbiome landscape.
|
30
|
Vaginal microbiome topic modeling of laboring Ugandan women with and without fever. NPJ Biofilms Microbiomes 2021; 7:75. [PMID: 34508087 PMCID: PMC8433417 DOI: 10.1038/s41522-021-00244-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2021] [Accepted: 08/13/2021] [Indexed: 12/12/2022] Open
Abstract
The composition of the maternal vaginal microbiome influences the duration of pregnancy, onset of labor, and even neonatal outcomes. Maternal microbiome research in sub-Saharan Africa has focused on non-pregnant and postpartum composition of the vaginal microbiome. Here we aimed to illustrate the relationship between the vaginal microbiome of 99 laboring Ugandan women and intrapartum fever using routine microbiology and 16S ribosomal RNA gene sequencing from two hypervariable regions (V1–V2 and V3–V4). To describe the vaginal microbes associated with vaginal microbial communities, we pursued two approaches: hierarchical clustering methods and a novel Grades of Membership (GoM) modeling approach for vaginal microbiome characterization. Leveraging GoM models, we created a basis composed of a preassigned number of microbial topics whose linear combination optimally represents each patient, yielding more comprehensive associations and characterization between maternal clinical features and the microbial communities. Using a random forest model, we showed that by including microbial topic models we improved upon clinical variables to predict maternal fever. Overall, we found a higher prevalence of Granulicatella, Streptococcus, Fusobacterium, Anaerococcus, Sneathia, Clostridium, Gemella, Mobiluncus, and Veillonella genera in febrile mothers, and higher prevalence of Lactobacillus genera (in particular L. crispatus and L. jensenii), Acinetobacter, Aerococcus, and Prevotella species in afebrile mothers. By including clinical variables with microbial topics in this model, we observed that young maternal age, fever reported earlier in the pregnancy, longer labor duration, and microbial communities with reduced Lactobacillus diversity were associated with intrapartum fever. These results better defined relationships between the presence or absence of intrapartum fever, demographics, peripartum course, and vaginal microbial topics, and expanded our understanding of the impact of the microbiome on maternal and potentially neonatal outcome risk.
|
31
|
Statistical Modeling of High Dimensional Counts. Methods in Molecular Biology (Clifton, N.J.) 2021; 2284:97-134. [PMID: 33835440 DOI: 10.1007/978-1-0716-1307-8_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Statistical modeling of count data from RNA sequencing (RNA-seq) experiments is important for proper interpretation of results. Here I will describe how count data can be modeled using count distributions, or alternatively analyzed using nonparametric methods. I will focus on basic routines for performing data input, scaling/normalization, visualization, and statistical testing to determine sets of features where the counts reflect differences in gene expression across samples. Finally, I discuss limitations and possible extensions to the models presented here.
|
32
|
Shuler K, Verbanic S, Chen IA, Lee J. A Bayesian nonparametric analysis for zero‐inflated multivariate count data with application to microbiome study. J R Stat Soc Ser C Appl Stat 2021. [DOI: 10.1111/rssc.12493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Kurtis Shuler
  - Sandia National Laboratories, Albuquerque, NM, USA
- Samuel Verbanic
  - Department of Chemical and Biomolecular Engineering, University of California Los Angeles, Los Angeles, CA, USA
- Irene A. Chen
  - Department of Chemical and Biomolecular Engineering, University of California Los Angeles, Los Angeles, CA, USA
- Juhee Lee
  - Department of Statistics, University of California Santa Cruz, Santa Cruz, CA, USA
|
33
|
Jeganathan P, Holmes SP. A Statistical Perspective on the Challenges in Molecular Microbial Biology. Journal of Agricultural, Biological, and Environmental Statistics 2021; 26:131-160. [PMID: 36398283 PMCID: PMC9667415 DOI: 10.1007/s13253-021-00447-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/16/2020] [Revised: 02/15/2021] [Accepted: 02/24/2021] [Indexed: 12/13/2022]
Abstract
High throughput sequencing (HTS)-based technology enables identifying and quantifying non-culturable microbial organisms in all environments. Microbial sequences have enhanced our understanding of the human microbiome, the soil and plant environment, and the marine environment. All molecular microbial data pose statistical challenges due to contamination sequences from reagents, batch effects, unequal sampling, and undetected taxa. Technical biases and heteroscedasticity have the strongest effects, but different strains across subjects and environments also make direct differential abundance testing unwieldy. We provide an introduction to a few statistical tools that can overcome some of these difficulties and demonstrate those tools on an example. We show how standard statistical methods, such as simple hierarchical mixture and topic models, can facilitate inferences on latent microbial communities. We also review some nonparametric Bayesian approaches that combine visualization and uncertainty quantification. The intersection of molecular microbial biology and statistics is an exciting new venue. Finally, we list some of the important open problems that would benefit from more careful statistical method development.
Affiliation(s)
- Pratheepa Jeganathan
- Department of Statistics, Stanford University, Sequoia Hall, 390 Jane Stanford Way, Stanford, CA 94305, USA
| | - Susan P Holmes
- Department of Statistics, Stanford University, Sequoia Hall, 390 Jane Stanford Way, Stanford, CA 94305, USA
34
Clouse KM, Wagner MR. Plant Genetics as a Tool for Manipulating Crop Microbiomes: Opportunities and Challenges. Front Bioeng Biotechnol 2021; 9:567548. [PMID: 34136470 PMCID: PMC8201784 DOI: 10.3389/fbioe.2021.567548] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/29/2020] [Accepted: 05/05/2021] [Indexed: 11/22/2022]
Abstract
Growing human population size and the ongoing climate crisis create an urgent need for new tools for sustainable agriculture. Because microbiomes have profound effects on host health, interest in methods of manipulating agricultural microbiomes is growing rapidly. Currently, the most common method of microbiome manipulation is inoculation of beneficial organisms or engineered communities; however, these methods have been met with limited success due to the difficulty of establishment in complex farm environments. Here we propose genetic manipulation of the host plant as another avenue through which microbiomes could be manipulated. We discuss how domestication and modern breeding have shaped crop microbiomes, as well as the potential for improving plant-microbiome interactions through conventional breeding or genetic engineering. We summarize the current state of knowledge on host genetic control of plant microbiomes, as well as the key challenges that remain.
Affiliation(s)
- Kayla M. Clouse
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, United States
| | - Maggie R. Wagner
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, United States
- Kansas Biological Survey, University of Kansas, Lawrence, KS, United States
35
Kong Y, Kozik A, Nakatsu CH, Jones-Hall YL, Chun H. A zero-inflated non-negative matrix factorization for the deconvolution of mixed signals of biological data. Int J Biostat 2021; 18:203-218. [PMID: 33783171 DOI: 10.1515/ijb-2020-0039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/30/2020] [Accepted: 02/23/2021] [Indexed: 12/18/2022]
Abstract
A latent factor model for count data is widely used to deconvolute mixed signals in biological data, as exemplified by sequencing data from transcriptome or microbiome studies. Because pure samples such as single-cell transcriptome data are available, the accuracy of the estimates can be much improved. However, this advantage quickly disappears in the presence of excessive zeros. To correctly account for this phenomenon in both mixed and pure samples, we propose a zero-inflated non-negative matrix factorization and derive an effective multiplicative parameter updating rule. In simulation studies, our method yielded the smallest bias. We applied our approach to brain gene expression as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R-package, iNMF.
Affiliation(s)
- Yixin Kong
- Department of Mathematics and Statistics, Boston University, Boston, MA 02215, USA
| | - Ariangela Kozik
- Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI 48104, USA
| | - Cindy H Nakatsu
- Department of Agronomy, Purdue University, West Lafayette, IN 47905, USA
| | - Yava L Jones-Hall
- College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas 77843, USA
| | - Hyonho Chun
- Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea
36
Breuninger TA, Wawro N, Breuninger J, Reitmeier S, Clavel T, Six-Merker J, Pestoni G, Rohrmann S, Rathmann W, Peters A, Grallert H, Meisinger C, Haller D, Linseisen J. Associations between habitual diet, metabolic disease, and the gut microbiota using latent Dirichlet allocation. Microbiome 2021; 9:61. [PMID: 33726846 PMCID: PMC7967986 DOI: 10.1186/s40168-020-00969-9] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 10/05/2020] [Accepted: 12/06/2020] [Indexed: 06/10/2023]
Abstract
BACKGROUND The gut microbiome impacts human health through various mechanisms and is involved in the development of a range of non-communicable diseases. Diet is a well-known factor influencing microbe-host interaction in health and disease. However, very few findings are based on large-scale analysis using population-based studies. Our aim was to investigate the cross-sectional relationship between habitual dietary intake and gut microbiota structure in the Cooperative Health Research in the Region of Augsburg (KORA) FF4 study. RESULTS Fecal microbiota was analyzed using 16S rRNA gene amplicon sequencing. Latent Dirichlet allocation (LDA) was applied to samples from 1992 participants to identify 20 microbial subgroups within the study population. Each participant's gut microbiota was subsequently described by a unique composition of these 20 subgroups. Associations between habitual dietary intake, assessed via repeated 24-h food lists and a Food Frequency Questionnaire, and the 20 subgroups, as well as between prevalence of metabolic diseases/risk factors and the subgroups, were assessed with multivariate-adjusted Dirichlet regression models. After adjustment for multiple testing, eight of 20 microbial subgroups were significantly associated with habitual diet, while nine of 20 microbial subgroups were associated with the prevalence of one or more metabolic diseases/risk factors. Subgroups 5 (Faecalibacterium, Lachnospiracea incertae sedis, Gemmiger, Roseburia) and 14 (Coprococcus, Bacteroides, Faecalibacterium, Ruminococcus) were particularly strongly associated with diet. For example, participants with a high probability for subgroup 5 were characterized by a higher Alternate Healthy Eating Index and Mediterranean Diet Score and a higher intake of food items such as fruits, vegetables, legumes, and whole grains, while participants with prevalent type 2 diabetes mellitus were characterized by a lower probability for subgroup 5. 
CONCLUSIONS The associations between habitual diet, metabolic diseases, and microbial subgroups identified in this analysis not only expand upon current knowledge of diet-microbiota-disease relationships, but also indicate the possibility of certain microbial groups to be modulated by dietary intervention, with the potential of impacting human health. Additionally, LDA appears to be a powerful tool for interpreting latent structures of the human gut microbiota. However, the subgroups and associations observed in this analysis need to be replicated in further studies.
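The LDA idea described in this abstract (each participant's microbiota as a composition over subgroups, each subgroup being a distribution over taxa) can be made concrete with a toy calculation. The following is a minimal sketch under stated assumptions, not the study's pipeline: the two subgroups, the four genera, all probabilities, and the helper name `expected_taxon_profile` are illustrative and do not come from the KORA data.

```python
# Toy illustration of the LDA mixture idea: a sample's expected taxon
# profile is a weighted average of subgroup-specific taxon distributions.
# All names and numbers below are illustrative assumptions, not study data.

GENERA = ["Faecalibacterium", "Roseburia", "Bacteroides", "Coprococcus"]

# Each subgroup ("topic") is a probability distribution over the genera.
SUBGROUP_TAXA = {
    "subgroup_A": [0.50, 0.30, 0.10, 0.10],
    "subgroup_B": [0.10, 0.10, 0.60, 0.20],
}


def expected_taxon_profile(weights):
    """Mix subgroup distributions using one participant's subgroup weights."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("subgroup weights must sum to 1")
    profile = [0.0] * len(GENERA)
    for name, weight in weights.items():
        for i, p in enumerate(SUBGROUP_TAXA[name]):
            profile[i] += weight * p
    return dict(zip(GENERA, profile))


# A hypothetical participant described as 75% subgroup A, 25% subgroup B.
participant = {"subgroup_A": 0.75, "subgroup_B": 0.25}
profile = expected_taxon_profile(participant)
for genus, share in profile.items():
    print(f"{genus}: {share:.3f}")
```

Fitting an LDA model runs this relationship in reverse: given only observed counts, it estimates both the subgroup distributions and each participant's weights.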
Affiliation(s)
- Taylor A. Breuninger
- Independent Research Unit Clinical Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Ludwig-Maximilians-Universität München, UNIKA-T Augsburg, Neusässer Str. 47, 86156 Augsburg, Germany
| | - Nina Wawro
- Independent Research Unit Clinical Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Ludwig-Maximilians-Universität München, UNIKA-T Augsburg, Neusässer Str. 47, 86156 Augsburg, Germany
| | | | - Sandra Reitmeier
- Technische Universität München, Gregor-Mendel-Str. 2, 85354 Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Thomas Clavel
- ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Functional Microbiome Research Group, Institute of Medical Microbiology, RWTH University Hospital, Pauwelsstrasse 30, 52074 Aachen, Germany
| | - Julia Six-Merker
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
| | - Giulia Pestoni
- Division of Chronic Disease Epidemiology, Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Hirschengraben 84, CH-8001 Zurich, Switzerland
| | - Sabine Rohrmann
- Division of Chronic Disease Epidemiology, Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Hirschengraben 84, CH-8001 Zurich, Switzerland
| | - Wolfgang Rathmann
- Institute for Biometrics and Epidemiology, Deutsches Diabetes-Zentrum (DDZ), Auf’m Hennekamp 65, 40225 Düsseldorf, Germany
| | - Annette Peters
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
| | - Harald Grallert
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
| | - Christa Meisinger
- Independent Research Unit Clinical Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Ludwig-Maximilians-Universität München, UNIKA-T Augsburg, Neusässer Str. 47, 86156 Augsburg, Germany
| | - Dirk Haller
- Technische Universität München, Gregor-Mendel-Str. 2, 85354 Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Jakob Linseisen
- Independent Research Unit Clinical Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Ludwig-Maximilians-Universität München, UNIKA-T Augsburg, Neusässer Str. 47, 86156 Augsburg, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
37
Moreno-Indias I, Lahti L, Nedyalkova M, Elbere I, Roshchupkin G, Adilovic M, Aydemir O, Bakir-Gungor B, Santa Pau ECD, D’Elia D, Desai MS, Falquet L, Gundogdu A, Hron K, Klammsteiner T, Lopes MB, Marcos-Zambrano LJ, Marques C, Mason M, May P, Pašić L, Pio G, Pongor S, Promponas VJ, Przymus P, Saez-Rodriguez J, Sampri A, Shigdel R, Stres B, Suharoschi R, Truu J, Truică CO, Vilne B, Vlachakis D, Yilmaz E, Zeller G, Zomer AL, Gómez-Cabrero D, Claesson MJ. Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions. Front Microbiol 2021; 12:635781. [PMID: 33692771 PMCID: PMC7937616 DOI: 10.3389/fmicb.2021.635781] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 11/30/2020] [Accepted: 01/28/2021] [Indexed: 12/23/2022]
Abstract
The human microbiome has emerged as a central research topic in human biology and biomedicine. Current microbiome studies generate high-throughput omics data across different body sites, populations, and life stages. While many of the challenges in microbiome research are similar to those of other high-throughput studies, the quantitative analyses need to address the heterogeneity of data, specific statistical properties, and the remarkable variation in microbiome composition across individuals and body sites. This has led to a broad spectrum of statistical and machine learning challenges that range from study design, data processing, and standardization to analysis, modeling, cross-study comparison, prediction, data science ecosystems, and reproducible reporting. Nevertheless, although many statistics and machine learning approaches and tools have been developed, new techniques are needed to deal with emerging applications and the vast heterogeneity of microbiome data. We review and discuss emerging applications of statistical and machine learning techniques in human microbiome studies and introduce the COST Action CA18131 "ML4Microbiome" that brings together microbiome researchers and machine learning experts to address current challenges such as standardization of analysis pipelines for reproducibility of data analysis results, benchmarking, improvement, or development of existing and new tools and ontologies.
Affiliation(s)
- Isabel Moreno-Indias
- Instituto de Investigación Biomédica de Málaga (IBIMA), Unidad de Gestión Clínica de Endocrinología y Nutrición, Hospital Clínico Universitario Virgen de la Victoria, Universidad de Málaga, Málaga, Spain
- Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y la Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
| | - Leo Lahti
- Department of Computing, University of Turku, Turku, Finland
| | - Miroslava Nedyalkova
- Human Genetics and Disease Mechanisms, Latvian Biomedical Research and Study Centre, Riga, Latvia
| | - Ilze Elbere
- Latvian Biomedical Research and Study Centre, Riga, Latvia
| | | | - Muhamed Adilovic
- Department of Genetics and Bioengineering, International University of Sarajevo, Sarajevo, Bosnia and Herzegovina
| | - Onder Aydemir
- Department of Electrical and Electronics Engineering, Karadeniz Technical University, Trabzon, Turkey
| | - Burcu Bakir-Gungor
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
| | | | - Domenica D’Elia
- Department for Biomedical Sciences, Institute for Biomedical Technologies, National Research Council, Bari, Italy
| | - Mahesh S. Desai
- Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette, Luxembourg
- Odense Research Center for Anaphylaxis, Department of Dermatology and Allergy Center, Odense University Hospital, University of Southern Denmark, Odense, Denmark
| | - Laurent Falquet
- Department of Biology, University of Fribourg, Fribourg, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Aycan Gundogdu
- Department of Microbiology and Clinical Microbiology, Faculty of Medicine, Erciyes University, Kayseri, Turkey
- Metagenomics Laboratory, Genome and Stem Cell Center (GenKök), Erciyes University, Kayseri, Turkey
| | - Karel Hron
- Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czechia
| | | | - Marta B. Lopes
- NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, Caparica, Portugal
- Centro de Matemática e Aplicações (CMA), FCT, UNL, Caparica, Portugal
| | - Laura Judith Marcos-Zambrano
- Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
| | - Cláudia Marques
- CINTESIS, NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal
| | - Michael Mason
- Computational Oncology, Sage Bionetworks, Seattle, WA, United States
| | - Patrick May
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Lejla Pašić
- Sarajevo Medical School, University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina
| | - Gianvito Pio
- Department of Computer Science, University of Bari Aldo Moro, Bari, Italy
| | - Sándor Pongor
- Faculty of Information Technology and Bionics, Pázmány University, Budapest, Hungary
| | - Vasilis J. Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Piotr Przymus
- Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
| | - Julio Saez-Rodriguez
- Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Heidelberg, Germany
| | - Alexia Sampri
- Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
| | - Rajesh Shigdel
- Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Blaz Stres
- Jozef Stefan Institute, Ljubljana, Slovenia
- Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Faculty of Civil and Geodetic Engineering, University of Ljubljana, Ljubljana, Slovenia
| | - Ramona Suharoschi
- Molecular Nutrition and Proteomics Lab, Faculty of the Food Science and Technology, Institute of Life Sciences, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Cluj-Napoca, Romania
| | - Jaak Truu
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
| | - Ciprian-Octavian Truică
- Department of Computer Science and Engineering, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Bucharest, Romania
| | - Baiba Vilne
- Bioinformatics Research Unit, Riga Stradins University, Riga, Latvia
| | - Dimitrios Vlachakis
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
| | - Ercument Yilmaz
- Department of Computer Technologies, Karadeniz Technical University, Trabzon, Turkey
| | - Georg Zeller
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
| | - Aldert L. Zomer
- Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, Netherlands
| | - David Gómez-Cabrero
- Navarrabiomed, Complejo Hospitalario de Navarra (CHN), IdiSNA, Universidad Pública de Navarra (UPNA), Pamplona, Spain
| | - Marcus J. Claesson
- School of Microbiology and APC Microbiome Ireland, University College Cork, Cork, Ireland
38
Zhou X, Leite MFA, Zhang Z, Tian L, Chang J, Ma L, Li X, van Veen JA, Tian C, Kuramae EE. Facilitation in the soil microbiome does not necessarily lead to niche expansion. Environmental Microbiome 2021; 16:4. [PMID: 33902741 PMCID: PMC8067652 DOI: 10.1186/s40793-021-00373-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/23/2020] [Accepted: 01/21/2021] [Indexed: 05/24/2023]
Abstract
BACKGROUND The soil microbiome drives soil ecosystem function, and soil microbial functionality is directly linked to interactions between microbes and the soil environment. However, the context-dependent interactions in the soil microbiome remain largely unknown. RESULTS Using latent variable models (LVMs), we disentangle the biotic and abiotic interactions of soil bacteria, fungi and environmental factors using the Qinghai-Tibetan Plateau soil ecosystem as a model. Our results show that soil bacteria and fungi not only interact with each other but also shift from competition to facilitation or vice versa depending on environmental variation; that is, the nature of their interactions is context-dependent. CONCLUSIONS Overall, elevation is the environmental gradient that most promotes facilitative interactions among microbes but is not a major driver of soil microbial community composition, as evidenced by variance partitioning. The larger the tolerance of a microbe to a specific environmental gradient, the less likely it is to interact with other soil microbes, which suggests that facilitation does not necessarily lead to niche expansion.
Affiliation(s)
- Xue Zhou
- College of Resources and Environment, Jilin Agricultural University, Changchun, China
- Key Laboratory of Mollisols Agroecology, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
| | - Márcio F A Leite
- Department of Microbial Ecology, Netherlands Institute of Ecology NIOO-KNAW, Wageningen, the Netherlands
| | - Zhenqing Zhang
- Key Laboratory of Wetland Ecology and Environment, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
| | - Lei Tian
- Key Laboratory of Mollisols Agroecology, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
| | - Jingjing Chang
- Key Laboratory of Mollisols Agroecology, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
- University of the Chinese Academy of Sciences, Beijing, China
| | - Lina Ma
- Key Laboratory of Mollisols Agroecology, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
- University of the Chinese Academy of Sciences, Beijing, China
| | - Xiujun Li
- Key Laboratory of Mollisols Agroecology, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
| | - Johannes A van Veen
- Key Laboratory of Mollisols Agroecology, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
- Department of Microbial Ecology, Netherlands Institute of Ecology NIOO-KNAW, Wageningen, the Netherlands
| | - Chunjie Tian
- Key Laboratory of Mollisols Agroecology, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China.
| | - Eiko E Kuramae
- Department of Microbial Ecology, Netherlands Institute of Ecology NIOO-KNAW, Wageningen, the Netherlands.
- Ecology and biodiversity, Institute of Environmental Biology, Utrecht University, Utrecht, The Netherlands.
39
Deek RA, Li H. A Zero-Inflated Latent Dirichlet Allocation Model for Microbiome Studies. Front Genet 2021; 11:602594. [PMID: 33552122 PMCID: PMC7862749 DOI: 10.3389/fgene.2020.602594] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/03/2020] [Accepted: 12/29/2020] [Indexed: 11/13/2022]
Abstract
The human microbiome consists of a community of microbes in varying abundances and is shown to be associated with many diseases. An important first step in many microbiome studies is to identify possible distinct microbial communities in a given data set and to identify the important bacterial taxa that characterize these communities. The data from typical microbiome studies are high dimensional count data with excessive zeros due to both absence of species (structural zeros) and low sequencing depth or dropout. Although methods have been developed for identifying the microbial communities based on mixture models of counts, these methods do not account for excessive zeros observed in the data and do not differentiate structural from sampling zeros. In this paper, we introduce a zero-inflated Latent Dirichlet Allocation model (zinLDA) for sparse count data observed in microbiome studies. zinLDA builds on the flexible Latent Dirichlet Allocation model and allows for zero inflation in observed counts. We develop an efficient Markov chain Monte Carlo (MCMC) sampling procedure to fit the model. Results from our simulations show zinLDA provides better fits to the data and is able to separate structural zeros from sampling zeros. We apply zinLDA to the data set from the American Gut Project and identify microbial communities characterized by different bacterial genera.
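The distinction between structural and sampling zeros can be illustrated with a much simpler model than zinLDA. The sketch below assumes a zero-inflated Poisson for a single taxon, which is a swapped-in simplification and not the authors' model; the parameters `pi` (probability that the taxon is truly absent) and `lam` (expected count when it is present, which grows with sequencing depth) and the function name are illustrative assumptions.

```python
import math


def prob_structural_zero(pi, lam):
    """Posterior probability that an observed zero is structural.

    Under a zero-inflated Poisson, a zero arises either because the taxon
    is absent (probability pi) or because it is present but unsampled
    (probability (1 - pi) * exp(-lam)). Bayes' rule gives the share of
    observed zeros that are structural.
    """
    if not 0.0 <= pi <= 1.0 or lam < 0.0:
        raise ValueError("need 0 <= pi <= 1 and lam >= 0")
    p_zero = pi + (1.0 - pi) * math.exp(-lam)
    return pi / p_zero


# With a small expected count, a zero says little about true absence;
# with a large expected count, a zero is strong evidence of absence.
shallow = prob_structural_zero(pi=0.3, lam=0.5)
deep = prob_structural_zero(pi=0.3, lam=10.0)
print(f"shallow depth: {shallow:.3f}, deep: {deep:.3f}")
```

The same reasoning motivates modeling zero inflation explicitly: a model without it has to treat every zero as a sampling zero, regardless of how deeply the sample was sequenced.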
Affiliation(s)
- Rebecca A Deek
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
| | - Hongzhe Li
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
40
Hosoda S, Nishijima S, Fukunaga T, Hattori M, Hamada M. Revealing the microbial assemblage structure in the human gut microbiome using latent Dirichlet allocation. Microbiome 2020; 8:95. [PMID: 32576288 PMCID: PMC7313204 DOI: 10.1186/s40168-020-00864-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/08/2018] [Accepted: 05/13/2020] [Indexed: 06/10/2023]
Abstract
BACKGROUND The human gut microbiome has been suggested to affect human health and thus has received considerable attention. To clarify the structure of the human gut microbiome, clustering methods are frequently applied to human gut taxonomic profiles. Enterotypes, i.e., clusters of individuals with similar microbiome composition, are well-studied and characterized. However, only a few detailed studies on assemblages, i.e., clusters of co-occurring bacterial taxa, have been conducted. Particularly, the relationship between the enterotype and assemblage is not well-understood. RESULTS In this study, we detected gut microbiome assemblages using a latent Dirichlet allocation (LDA) method. We applied LDA to a large-scale human gut metagenome dataset and found that a 4-assemblage LDA model could represent relationships between enterotypes and assemblages with high interpretability. This model indicated that each individual tends to have several assemblages, three of which corresponded to the three classically recognized enterotypes. Conversely, the fourth assemblage corresponded to no enterotypes and emerged in all enterotypes. Interestingly, the dominant genera of this assemblage (Clostridium, Eubacterium, Faecalibacterium, Roseburia, Coprococcus, and Butyrivibrio) included butyrate-producing species such as Faecalibacterium prausnitzii. Indeed, the fourth assemblage significantly positively correlated with three butyrate-producing functions. CONCLUSIONS We conducted an assemblage analysis on a large-scale human gut metagenome dataset using LDA. The present study revealed that there is an enterotype-independent assemblage.
Affiliation(s)
- Shion Hosoda
- Graduate School of Advanced Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo, 169–8555 Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
| | - Suguru Nishijima
- Graduate School of Advanced Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo, 169–8555 Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Tsukasa Fukunaga
- Graduate School of Advanced Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo, 169–8555 Japan
- Department of Computer Science, Graduate School of Information Science and Engineering, The University of Tokyo, Tokyo, Japan
| | - Masahira Hattori
- Graduate School of Advanced Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo, 169–8555 Japan
- Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
| | - Michiaki Hamada
- Graduate School of Advanced Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo, 169–8555 Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Graduate School of Medicine, Nippon Medical School, Tokyo, Japan
- Center for Data Science, Waseda University, Tokyo, Japan
41
Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Progress in Molecular Biology and Translational Science 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that concepts of correlation and association analyses have been shifted by introducing network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach. Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and future direction of association analysis in microbiome and multiomics studies.
Affiliation(s)
- Yinglin Xia
- Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States.
42
Santacroce L, Mavaddati S, Hamedi J, Zeinali B, Ballini A, Bilancia M. Expressive Analysis of Gut Microbiota in Pre- and Post- Solid Organ Transplantation Using Bayesian Topic Models. Computational Science and Its Applications – ICCSA 2020, 2020. [DOI: 10.1007/978-3-030-58811-3_11] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 02/07/2023]
43
Morton JT, Aksenov AA, Nothias LF, Foulds JR, Quinn RA, Badri MH, Swenson TL, Van Goethem MW, Northen TR, Vazquez-Baeza Y, Wang M, Bokulich NA, Watters A, Song SJ, Bonneau R, Dorrestein PC, Knight R. Learning representations of microbe-metabolite interactions. Nat Methods 2019; 16:1306-1314. [PMID: 31686038 PMCID: PMC6884698 DOI: 10.1038/s41592-019-0616-3] [Citation(s) in RCA: 157] [Impact Index Per Article: 26.2] [Received: 04/04/2019] [Accepted: 09/19/2019] [Indexed: 12/26/2022]
Abstract
Integrating multiomics datasets is critical for microbiome research; however, inferring interactions across omics datasets has multiple statistical challenges. We solve this problem by using neural networks (https://github.com/biocore/mmvec) to estimate the conditional probability that each molecule is present given the presence of a specific microorganism. We show with known environmental (desert soil biocrust wetting) and clinical (cystic fibrosis lung) examples, our ability to recover microbe-metabolite relationships, and demonstrate how the method can discover relationships between microbially produced metabolites and inflammatory bowel disease.
Affiliation(s)
- James T Morton
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
- Alexander A Aksenov
- Collaborative Mass Spectrometry Innovation Center, University of California, San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Louis Felix Nothias
- Collaborative Mass Spectrometry Innovation Center, University of California, San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- James R Foulds
- Department of Information Systems, University of Maryland Baltimore County, Baltimore, MD, USA
- Robert A Quinn
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
- Tami L Swenson
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Marc W Van Goethem
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Trent R Northen
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- DOE Joint Genome Institute, Walnut Creek, CA, USA
- Yoshiki Vazquez-Baeza
- Jacobs School of Engineering, University of California, San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA
- Mingxun Wang
- Collaborative Mass Spectrometry Innovation Center, University of California, San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Nicholas A Bokulich
- The Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA
- Aaron Watters
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Se Jin Song
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA
- Richard Bonneau
- Department of Biology, New York University, New York, NY, USA
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Computer Science Department, Courant Institute, New York, NY, USA
- Center For Data Science, New York University, New York, NY, USA
- Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, University of California, San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Rob Knight
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA.
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA.
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
|
44
|
Holmes S. Successful strategies for human microbiome data generation, storage and analyses. J Biosci 2019; 44:111. [PMID: 31719220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Current interest in the potential for clinical use of new tools for improving human health is now focused on techniques for the study of the human microbiome and its interaction with environmental and clinical covariates. This review outlines the use of statistical strategies that have been developed in past studies and can inform successful design and analyses of controlled perturbation experiments performed in the human microbiome. We carefully outline what the data are, their imperfections and how we need to transform, decontaminate and denoise them. We show how to identify the important unknown parameters and how we can leverage the variability we see to produce efficient models for prediction and uncertainty quantification. We encourage a reproducible strategy that builds on best practice principles that can be adapted for effective experimental design and reproducible workflows. Nonparametric, data-driven denoising strategies already provide the best strain identification and decontamination methods. Data-driven models can be combined with uncertainty quantification to provide reproducible aids to decision making in the clinical context, as long as careful, separate, registered confirmatory testing is undertaken. Here we provide guidelines for effective longitudinal studies and their analyses. Lessons learned along the way are that visualizations at every step can pinpoint problems and outliers, and that normalization and filtering improve power in downstream testing. We recommend collecting and binding the metadata and covariates to sample descriptors and recording complete computer scripts into an R markdown supplement that can reduce opportunities for human error and enable collaborators and readers to replicate all the steps of the study. Finally, we note that optimizing the bioinformatic and statistical workflow involves adopting a wait-and-see approach that is particularly effective in cases where the features such as 'mass spectrometry peaks' and metagenomic tables can only be partially annotated.
Affiliation(s)
- Susan Holmes
- Statistics Department, Sequoia Hall, Stanford, CA 94305, USA
|
45
|
Shetty SA, Lahti L. Microbiome data science. J Biosci 2019; 44:115. [PMID: 31719224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Best practices from open data science are spreading across research fields, providing new opportunities for research and education. Open data science emphasizes the view that digitalization is enabling new forms of resource sharing, collaboration and outreach. This has the potential to improve the overall transparency and efficiency of research. Microbiome bioinformatics is a rapidly developing area that can greatly benefit from this progress. The concept of microbiome data science refers to the application of best practices from open data science to microbiome bioinformatics. The increasing availability of open data and new opportunities to collaborate online are greatly facilitating the development of this field. A microbiome data science ecosystem combines experimental research data with open data processing and analysis and reproducible tutorials that can also serve as an educational resource. Here, we provide an overview of the current status of microbiome data science from a community developer perspective and propose directions for future development of the field.
Affiliation(s)
- Sudarshan A Shetty
- Laboratory of Microbiology, Wageningen University and Research, Wageningen, The Netherlands
|
46
|
|
47
|
Shi TT, Hua L, Wang H, Xin Z. The Potential Link between Gut Microbiota and Serum TRAb in Chinese Patients with Severe and Active Graves' Orbitopathy. Int J Endocrinol 2019; 2019:9736968. [PMID: 31933641 PMCID: PMC6942819 DOI: 10.1155/2019/9736968] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/04/2019] [Revised: 11/07/2019] [Accepted: 11/11/2019] [Indexed: 12/24/2022]
Abstract
BACKGROUND AND OBJECTIVE: A previous study reported alterations in the intestinal microbiota in patients with Graves' orbitopathy (GO). Thyrotropin receptor autoantibody (TRAb) stimulates orbital and periorbital tissues and plays a pivotal role in the development of GO. However, the association between gut microbiota and TRAb in GO patients remains elusive. In this study, we explored the relationships between gut microbiota and GO-related traits, in which we applied a metabolic-network-driven analysis to identify GO trait-related modules and extracted significant operational taxonomic units (OTUs). METHODS: In the present study, we profiled gut microbiota using 16S rRNA gene sequencing in 31 GO patients. We performed metabolic-network-driven analysis to investigate the association between gut microbiota and GO-related traits (e.g., TRAb, TGAb, and TPOAb) in the combination of microbial effects. RESULTS: Applying microbiome network analysis of co-occurrence patterns and analysis of topological properties, we found that s_Prevotella_copri and f_Prevotellaceae showed a significant correlation with TRAb. In particular, we applied the latent class model to explore the association between gut microbiota and GO-related traits in the combination of microbial effects. It was revealed that the subjects involved in the latent class model with the higher abundance of s_Prevotella_copri and g_Bacteroides had a higher TRAb level. CONCLUSIONS: Our results revealed the potential relationships between gut microbiota and GO-related traits in the combination of microbial effects. This study may provide a new insight into the interaction between the intestinal microbiota and TRAb-associated immune responses in GO patients.
Affiliation(s)
- Ting-Ting Shi
- Department of Endocrinology, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Lin Hua
- Department of Mathematics, School of Biomedical Engineering, Capital Medical University, Beijing, China
- Hua Wang
- Department of Emergency, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Zhong Xin
- Department of Endocrinology, Beijing Tongren Hospital, Capital Medical University, Beijing, China
|