1
Creus-Martí I, Moya A, Santonja FJ. Methodology for microbiome data analysis: An overview. Comput Biol Med 2025; 192:110157. [PMID: 40279974 DOI: 10.1016/j.compbiomed.2025.110157] [Received: 06/30/2024] [Revised: 03/07/2025] [Accepted: 04/04/2025] [Indexed: 04/29/2025]
Abstract
It is known that the microbiome and health are related; in addition, recent research has found that the microbiome has potential clinical uses. These facts highlight the importance of the microbiome in current science. However, microbiome data have characteristics that make their statistical analysis challenging. In recent years, both longitudinal and non-longitudinal methods have been designed to analyze the microbiota and learn more about bacterial behavior. In this review, we summarize the characteristics of microbiome data and the most widespread statistical methods for analyzing them, taking into account whether each strategy is longitudinal or not. We classify the methods according to their specific analytical (biological) objectives and their mathematical characteristics, so that the insights provided are relevant and accessible to professionals in both biology and statistics. We present this review as a reference for the most widely used methods in microbiome data analysis and as a foundation for identifying potential areas for future research. In particular, this review highlights the importance of methodology designed to study longitudinal microbiome datasets.
Affiliation(s)
- Irene Creus-Martí
  - Department of Applied Mathematics, Universitat Politècnica de València, Valencia, Spain
- Andrés Moya
  - Institute for Integrative Systems Biology (I2Sysbio), Universitat de València and CSIC, València, Spain; The Foundation for the Promotion of Health and Biomedical Research of Valencia Region (FISABIO), Valencia, Spain; CIBER in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Francisco J Santonja
  - Department of Statistics and Operation Research, Universitat de València, Valencia, Spain

2
Chen YC, Su YY, Chu TY, Wu MF, Huang CC, Lin CC. PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts. NPJ Biofilms Microbiomes 2025; 11:3. [PMID: 39753565 PMCID: PMC11698977 DOI: 10.1038/s41522-024-00598-2] [Received: 01/11/2024] [Accepted: 10/29/2024] [Indexed: 01/06/2025]
Abstract
The high dimensionality and sparsity of microbiota sequencing data make it challenging to identify informative and reproducible microbial features for both research and clinical applications. To address this, we introduce PreLect, a feature selection framework that leverages microbial prevalence to achieve consistent selection in sparse microbiota data. In rigorous benchmarks against established feature selection methods across 42 microbiome datasets, PreLect showed better classification performance than statistical methods and outperformed machine learning-based methods by selecting features with greater prevalence and abundance. A key strength of PreLect is its ability to reliably identify reproducible microbial features across different cohorts. Applied to colorectal cancer, PreLect identifies key microbes and highlights pathways crucial in cancer progression, such as lipopolysaccharide and glycerophospholipid biosynthesis. This case study illustrates PreLect's utility in discerning clinically relevant microbial signatures. In summary, PreLect's accuracy and robustness make it a significant advancement in the analysis of complex microbiota data.
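The abstract does not spell out PreLect's objective, so the sketch below only illustrates the ingredient it names: prevalence, taken here as the fraction of samples in which a taxon has a nonzero count. The `prevalence_weighted_penalty` function is a hypothetical L1-style penalty that divides each coefficient's magnitude by its taxon's prevalence so that rare taxa are penalized more; it is an assumption for illustration, not PreLect's published formulation.

```python
from typing import List, Sequence


def prevalence(counts: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of samples in which each taxon has a nonzero count.

    `counts` is a samples x taxa table (rows = samples, columns = taxa).
    """
    n_samples = len(counts)
    n_taxa = len(counts[0])
    return [
        sum(1 for row in counts if row[j] > 0) / n_samples
        for j in range(n_taxa)
    ]


def prevalence_weighted_penalty(
    coefs: Sequence[float], prev: Sequence[float], lam: float, eps: float = 1e-6
) -> float:
    """Hypothetical penalty: lam * sum_j |beta_j| / prevalence_j.

    Rarer taxa get larger weights, so they need a larger effect to be kept.
    `eps` guards against division by zero for taxa absent from every sample.
    Illustrative only; not claimed to be PreLect's objective.
    """
    return lam * sum(abs(b) / max(p, eps) for b, p in zip(coefs, prev))


# Toy table: 4 samples x 3 taxa (made-up counts for illustration).
counts = [
    [0, 5, 1],
    [3, 0, 2],
    [0, 7, 0],
    [1, 2, 4],
]
prev = prevalence(counts)  # [0.5, 0.75, 0.75]
penalty = prevalence_weighted_penalty([1.0, 0.0, -0.75], prev, lam=0.1)
print(prev, penalty)
```

Weighting by inverse prevalence is one simple way to make a sparse selector favor features observed in many samples; an actual selector would minimize a loss plus such a penalty rather than merely evaluate it.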
Grants
- NSTC 112-2221-E-A49-106-MY3, Ministry of Science and Technology, Taiwan
- NSTC 109-2221-E-010-014-MY3, Ministry of Science and Technology, Taiwan
- MOHW112-TDU-B-222-124013, Ministry of Health and Welfare, Taiwan
- MOHW111-TDU-B-221-114007, Ministry of Health and Welfare, Taiwan
Affiliation(s)
- Yin-Cheng Chen
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Yin-Yuan Su
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Tzu-Yu Chu
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Ming-Fong Wu
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Chieh-Chun Huang
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Chen-Ching Lin
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan

3
Sun H, Wang Y, Xiao Z, Huang X, Wang H, He T, Jiang X. multiMiAT: an optimal microbiome-based association test for multicategory phenotypes. Brief Bioinform 2023; 24:7005163. [PMID: 36702753 DOI: 10.1093/bib/bbad012] [Received: 09/05/2022] [Revised: 12/31/2022] [Accepted: 01/03/2023] [Indexed: 01/28/2023]
Abstract
Microbes can continuously affect the metabolism and immunity of the human body, and dysbiosis of the human microbiome drives not only the onset but also the progression of disease (i.e. multiple disease statuses). Recently, many microbiome-based association tests have been developed to detect associations between the microbiome and host phenotypes. However, existing methods have not achieved satisfactory performance when testing the association between the microbiome and ordinal or nominal multicategory phenotypes (e.g. disease severity and tumor subtype). In this paper, we propose multiMiAT, an optimal microbiome-based association test for multicategory phenotypes. Specifically, under the multinomial logit model framework, we first introduce a microbiome regression-based kernel association test for multicategory phenotypes (multiMiRKAT). As a data-driven optimal test, multiMiAT then integrates multiMiRKAT, the score test and MiRKAT-MC to maintain strong performance across diverse association patterns. Extensive simulation experiments demonstrate the effectiveness of our method. multiMiAT is also applied to real microbiome data to detect the association between the gut microbiome and the clinical statuses of colorectal cancer, as well as diverse statuses of Clostridium difficile infection.
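multiMiRKAT builds on the MiRKAT family of kernel association tests. As a hedged sketch of that general idea only (not of multiMiRKAT, which works with multinomial logit residuals, nor of multiMiAT's step that combines several tests), the code below computes a MiRKAT-style statistic Q = r^T K r for a single binary or continuous outcome. Here K is a Gower-centered kernel built from Bray-Curtis distances, K = -1/2 J D2 J with D2 the matrix of squared distances and J = I - 11^T/n, and r is the residual vector from an intercept-only null model. Significance is assessed by simple permutation. All function names and the toy data are hypothetical.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den > 0 else 0.0


def gower_kernel(samples: Sequence[Sequence[float]]) -> Matrix:
    """K = -1/2 * J D2 J, J = I - 11^T/n, D2 = squared Bray-Curtis distances.

    Double centering is done element-wise: d2_ij - row_i - col_j + grand.
    Bray-Curtis is not Euclidean, so K may have negative eigenvalues;
    no positive-semidefinite correction is applied in this sketch.
    """
    n = len(samples)
    d2 = [[bray_curtis(samples[i], samples[j]) ** 2 for j in range(n)] for i in range(n)]
    row = [sum(d2[i]) / n for i in range(n)]
    col = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[-0.5 * (d2[i][j] - row[i] - col[j] + grand) for j in range(n)] for i in range(n)]


def score_statistic(y: Sequence[float], K: Matrix) -> float:
    """Q = r^T K r, with residuals r = y - mean(y) from an intercept-only null model."""
    ybar = sum(y) / len(y)
    r = [v - ybar for v in y]
    n = len(r)
    return sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))


def permutation_pvalue(y: Sequence[float], K: Matrix, n_perm: int = 999, seed: int = 0) -> float:
    """Share of shuffled outcomes whose Q is at least the observed Q (with +1 correction)."""
    rng = random.Random(seed)
    q_obs = score_statistic(y, K)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if score_statistic(y_perm, K) >= q_obs - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 6 samples x 3 taxa, outcome coded 0/1 (made up for illustration).
samples = [
    [10, 1, 0], [9, 2, 1], [11, 0, 1],
    [1, 8, 6], [0, 9, 7], [2, 7, 8],
]
y = [0, 0, 0, 1, 1, 1]
K = gower_kernel(samples)
print(score_statistic(y, K), permutation_pvalue(y, K, n_perm=199))
```

Because the residuals are centered, Q reduces to -1/2 r^T D2 r, so it grows when samples with different outcomes are far apart in Bray-Curtis distance and samples with the same outcome are close.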
Affiliation(s)
- Han Sun
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
  - School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
- Yue Wang
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
- Zhen Xiao
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
  - School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
- Xiaoyun Huang
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
  - Collaborative & Innovative Center for Educational Technology, Central China Normal University, Wuhan 430079, China
- Haodong Wang
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
- Tingting He
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
  - National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
- Xingpeng Jiang
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
  - National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China

4
Identification of microbial features in multivariate regression under false discovery rate control. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107621] [Indexed: 11/23/2022]