Reference Citation Analysis

For: Ditzler G, Morrison JC, Lan Y, Rosen GL. Fizzy: feature subset selection for metagenomics. BMC Bioinformatics 2015;16:358. [PMID: 26538306 DOI: 10.1186/s12859-015-0793-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 05/14/2015] [Accepted: 10/23/2015] [Indexed: 01/22/2023]

Cited by Other Article(s)

Jiang Y, Aton M, Zhu Q, Lu YY. Modeling microbiome-trait associations with taxonomy-adaptive neural networks. Microbiome 2025;13:87. [PMID: 40158141 PMCID: PMC11954268 DOI: 10.1186/s40168-025-02080-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Accepted: 03/04/2025] [Indexed: 04/01/2025]

Song Y, Atza E, Sánchez-Gil JJ, Akkermans D, de Jonge R, de Rooij PGH, Kakembo D, Bakker PAHM, Pieterse CMJ, Budko NV, Berendsen RL. Seed tuber microbiome can predict growth potential of potato varieties. Nat Microbiol 2025;10:28-40. [PMID: 39730984 DOI: 10.1038/s41564-024-01872-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Accepted: 10/31/2024] [Indexed: 12/29/2024]

Affiliation(s)

Yang Song Plant-Microbe Interactions, Institute of Environmental Biology, Department of Biology, Science4Life, Utrecht University, Utrecht, the Netherlands
Elisa Atza Numerical Analysis, Delft Institute of Applied Mathematics, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, the Netherlands
Juan J Sánchez-Gil Plant-Microbe Interactions, Institute of Environmental Biology, Department of Biology, Science4Life, Utrecht University, Utrecht, the Netherlands
Doretta Akkermans HZPC Research B.V., Department of Plant Pathology, Metslawier, the Netherlands
Ronnie de Jonge Plant-Microbe Interactions, Institute of Environmental Biology, Department of Biology, Science4Life, Utrecht University, Utrecht, the Netherlands; AI Technology for Life, Department of Information and Computing Sciences, Science4Life, Utrecht University, Utrecht, the Netherlands
Peter G H de Rooij Plant-Microbe Interactions, Institute of Environmental Biology, Department of Biology, Science4Life, Utrecht University, Utrecht, the Netherlands
David Kakembo Plant-Microbe Interactions, Institute of Environmental Biology, Department of Biology, Science4Life, Utrecht University, Utrecht, the Netherlands
Peter A H M Bakker Plant-Microbe Interactions, Institute of Environmental Biology, Department of Biology, Science4Life, Utrecht University, Utrecht, the Netherlands
Corné M J Pieterse Plant-Microbe Interactions, Institute of Environmental Biology, Department of Biology, Science4Life, Utrecht University, Utrecht, the Netherlands
Neil V Budko Numerical Analysis, Delft Institute of Applied Mathematics, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, the Netherlands
Roeland L Berendsen Plant-Microbe Interactions, Institute of Environmental Biology, Department of Biology, Science4Life, Utrecht University, Utrecht, the Netherlands

Hosseiniyan Khatibi SM, Dimaano NG, Veliz E, Sundaresan V, Ali J. Exploring and exploiting the rice phytobiome to tackle climate change challenges. Plant Communications 2024;5:101078. [PMID: 39233440 PMCID: PMC11671768 DOI: 10.1016/j.xplc.2024.101078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 08/07/2024] [Accepted: 09/02/2024] [Indexed: 09/06/2024]

Abstract

The future of agriculture is uncertain under the current climate change scenario. Climate change directly and indirectly affects the biotic and abiotic elements that control agroecosystems, jeopardizing the safety of the world's food supply. A new area that focuses on characterizing the phytobiome is emerging. The phytobiome comprises plants and their immediate surroundings, involving numerous interdependent microscopic and macroscopic organisms that affect the health and productivity of plants. Phytobiome studies primarily focus on the microbial communities associated with plants, which are referred to as the plant microbiome. The development of high-throughput sequencing technologies over the past 10 years has dramatically advanced our understanding of the structure, functionality, and dynamics of the phytobiome; however, comprehensive methods for using this knowledge are lacking, particularly for major crops such as rice. Considering the impact of rice production on world food security, gaining fresh perspectives on the interdependent and interrelated components of the rice phytobiome could enhance rice production and crop health, sustain rice ecosystem function, and combat the effects of climate change. Our review re-conceptualizes the complex dynamics of the microscopic and macroscopic components in the rice phytobiome as influenced by human interventions and changing environmental conditions driven by climate change. We also discuss interdisciplinary and systematic approaches to decipher and reprogram the sophisticated interactions in the rice phytobiome using novel strategies and cutting-edge technology. Merging the gigantic datasets and complex information on the rice phytobiome and their application in the context of regenerative agriculture could lead to sustainable rice farming practices that are resilient to the impacts of climate change.

Zhao H, Wang Y, Sun Y, Wang Y, Shi B, Liu J, Zhang S. Hematological indicator-based machine learning models for preoperative prediction of lymph node metastasis in cervical cancer. Front Oncol 2024;14:1400109. [PMID: 39193382 PMCID: PMC11347340 DOI: 10.3389/fonc.2024.1400109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 07/29/2024] [Indexed: 08/29/2024]

Abstract

Background

Lymph node metastasis (LNM) is an important prognostic factor for cervical cancer (CC) and determines the treatment strategy. Hematological indicators have been reported as being useful biomarkers for the prognosis of a variety of cancers. This study aimed to evaluate the feasibility of machine learning models characterized by preoperative hematological indicators to predict the LNM status of CC patients before surgery.

Methods

The clinical data of 236 patients with pathologically confirmed CC were retrospectively analyzed at the Gynecology Oncology Department of the First Affiliated Hospital of Bengbu Medical University from November 2020 to August 2022. The least absolute shrinkage and selection operator (LASSO) was used to select 21 features from 35 hematological indicators and for the construction of 6 machine learning predictive models, including Adaptive Boosting (AdaBoost), Gaussian Naive Bayes (GNB), and Logistic Regression (LR), as well as Random Forest (RF), Support Vector Machines (SVM), and Extreme Gradient Boosting (XGBoost). Evaluation metrics of predictive models included the area under the receiver operating characteristic curve (AUC), accuracy, specificity, sensitivity, and F1-score.

Results

RF had the best overall predictive performance under ten-fold cross-validation in the training set. The specific performance indicators of RF were AUC (0.910, 95% confidence interval [CI]: 0.820-1.000), accuracy (0.831, 95% CI: 0.702-0.960), specificity (0.835, 95% CI: 0.708-0.962), sensitivity (0.831, 95% CI: 0.702-0.960), and F1-score (0.829, 95% CI: 0.696-0.962). RF also had the highest AUC in the testing set (AUC = 0.854).

Conclusion

RF based on preoperative hematological indicators that are easily available in clinical practice showed superior performance in the preoperative prediction of CC LNM. However, investigations on larger external cohorts of patients are required for further validation of our findings.

Peralta-Marzal LN, Rojas-Velazquez D, Rigters D, Prince N, Garssen J, Kraneveld AD, Perez-Pardo P, Lopez-Rincon A. A robust microbiome signature for autism spectrum disorder across different studies using machine learning. Sci Rep 2024;14:814. [PMID: 38191575 PMCID: PMC10774349 DOI: 10.1038/s41598-023-50601-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 12/21/2023] [Indexed: 01/10/2024]

Abstract

Autism spectrum disorder (ASD) is a highly complex neurodevelopmental disorder characterized by deficits in sociability and repetitive behaviour; however, there is great heterogeneity in the comorbidities that accompany ASD. Recently, the gut microbiome has been pointed out as a plausible contributing factor for ASD development, as individuals diagnosed with ASD often suffer from intestinal problems and show a differentiated intestinal microbial composition. Nevertheless, gut microbiome studies in ASD rarely agree on the specific bacterial taxa involved in this disorder. Regarding the potential role of the gut microbiome in ASD pathophysiology, our aim is to investigate whether there is a set of bacterial taxa relevant for ASD classification by using a sibling-controlled dataset. Additionally, we aim to validate these results across two independent cohorts, as several confounding factors, such as lifestyle, influence both ASD and gut microbiome studies. A machine learning approach, recursive ensemble feature selection (REFS), was applied to 16S rRNA gene sequencing data from 117 subjects (60 ASD cases and 57 siblings), identifying 26 bacterial taxa that discriminate ASD cases from controls. The average area under the curve (AUC) of this specific set of bacteria in the sibling-controlled dataset was 81.6%. Moreover, we applied the selected bacterial taxa in a tenfold cross-validation scheme using two independent cohorts (a total of 223 samples: 125 ASD cases and 98 controls). We obtained average AUCs of 74.8% and 74%, respectively. Analysis of the gut microbiome using REFS identified a set of bacterial taxa that can be used to predict the ASD status of children in three distinct cohorts with AUC over 80% for the best-performing classifiers. Our results indicate that the gut microbiome has a strong association with ASD and should not be disregarded as a potential target for therapeutic interventions. Furthermore, our work can contribute to the use of the proposed approach for identifying microbiome signatures across other 16S rRNA gene sequencing datasets.

Alshawaqfeh M, Rababah S, Hayajneh A, Gharaibeh A, Serpedin E. MetaAnalyst: a user-friendly tool for metagenomic biomarker detection and phenotype classification. BMC Med Res Methodol 2022;22:336. [PMID: 36577938 PMCID: PMC9795700 DOI: 10.1186/s12874-022-01812-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Accepted: 11/28/2022] [Indexed: 12/29/2022]

Abstract

BACKGROUND

Many metagenomic studies have linked the imbalance in microbial abundance profiles to a wide range of diseases. These studies suggest utilizing the microbial abundance profiles as potential markers for metagenomic-associated conditions. Due to the importance of biomarkers in understanding disease progression and developing possible therapies, various computational tools have been proposed for metagenomic biomarker detection. However, most existing tools require prior scripting knowledge and lack user-friendly interfaces, so installing, configuring, and running them takes considerable time and effort. Besides, there is no available all-in-one solution for running and comparing various metagenomic biomarker detection tools simultaneously. In addition, most of these tools just present the suggested biomarkers without any statistical evaluation of their quality.

RESULTS

To overcome these limitations, this work presents MetaAnalyst, a software package with a simple graphical user interface (GUI) that (i) automates the installation and configuration of 28 state-of-the-art tools, (ii) supports flexible study design to enable studying the dataset under different scenarios smoothly, (iii) runs and evaluates several algorithms simultaneously, (iv) supports different input formats and provides the user with several preprocessing capabilities, (v) provides a variety of metrics to evaluate the quality of the suggested markers, and (vi) presents the outcomes in the form of publication-quality plots with various formatting capabilities as well as Excel sheets.

CONCLUSIONS

The utility of this tool has been verified by studying a metagenomic dataset under four scenarios. The executable file for MetaAnalyst, along with its user manual, is available at https://github.com/mshawaqfeh/MetaAnalyst.

Correa-Garcia S, Constant P, Yergeau E. The forecasting power of the microbiome. Trends Microbiol 2022;31:444-452. [PMID: 36549949 DOI: 10.1016/j.tim.2022.11.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/12/2022] [Revised: 11/25/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022]

Loganathan T, Priya Doss C G. The influence of machine learning technologies in gut microbiome research and cancer studies - A review. Life Sci 2022;311:121118. [DOI: 10.1016/j.lfs.2022.121118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/19/2022] [Accepted: 10/19/2022] [Indexed: 11/18/2022]

Li P, Luo H, Ji B, Nielsen J. Machine learning for data integration in human gut microbiome. Microb Cell Fact 2022;21:241. [PMID: 36419034 PMCID: PMC9685977 DOI: 10.1186/s12934-022-01973-4] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 01/05/2022] [Accepted: 11/15/2022] [Indexed: 11/25/2022]

Hernández Medina R, Kutuzova S, Nielsen KN, Johansen J, Hansen LH, Nielsen M, Rasmussen S. Machine learning and deep learning applications in microbiome research. ISME Communications 2022;2:98. [PMID: 37938690 PMCID: PMC9723725 DOI: 10.1038/s43705-022-00182-9] [Citation(s) in RCA: 105] [Impact Index Per Article: 35.0] [Received: 03/24/2022] [Revised: 09/12/2022] [Accepted: 09/16/2022] [Indexed: 05/27/2023]

Bakir-Gungor B, Hacılar H, Jabeer A, Nalbantoglu OU, Aran O, Yousef M. Inflammatory bowel disease biomarkers of human gut microbiota selected via different feature selection methods. PeerJ 2022;10:e13205. [PMID: 35497193 PMCID: PMC9048649 DOI: 10.7717/peerj.13205] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 02/03/2021] [Accepted: 03/10/2022] [Indexed: 01/12/2023]

Abstract

The tremendous boost in next-generation sequencing and in the "omics" technologies makes it possible to characterize the human gut microbiome, the collective genomes of the microbial community that resides in our gastrointestinal tract. Although some of these microorganisms are considered to be essential regulators of our immune system, the alteration of the complexity and eubiotic state of microbiota might promote autoimmune and inflammatory disorders such as diabetes, rheumatoid arthritis, inflammatory bowel diseases (IBD), obesity, and carcinogenesis. IBD, comprising Crohn's disease and ulcerative colitis, is a gut-related, multifactorial disease with an unknown etiology. IBD presents defects in the detection and control of the gut microbiota, associated with unbalanced immune reactions, genetic mutations that confer susceptibility to the disease, and complex environmental conditions such as a westernized lifestyle. Although some existing studies attempt to unveil the composition and functional capacity of the gut microbiome in relation to IBD, a comprehensive picture of the gut microbiome in IBD patients is far from complete. Due to the complexity of metagenomic studies, state-of-the-art machine learning techniques have become popular for addressing a wide range of questions in the field of metagenomic data analysis. In this regard, using an IBD-associated metagenomics dataset, this study utilizes both supervised and unsupervised machine learning algorithms (i) to generate a classification model that aids IBD diagnosis, (ii) to discover IBD-associated biomarkers, and (iii) to discover subgroups of IBD patients using k-means and hierarchical clustering approaches. To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), min redundancy max relevance (mRMR), Select K Best (SKB), Information Gain (IG) and Extreme Gradient Boosting (XGBoost). In our experiments with 100-fold Monte Carlo cross-validation (MCCV), XGBoost, IG, and SKB methods showed a considerable effect in terms of minimizing the microbiota used for the diagnosis of IBD and thus reducing the cost and time. We observed that compared to Decision Tree, Support Vector Machine, Logitboost, Adaboost, and stacking ensemble classifiers, our Random Forest classifier resulted in better performance measures for the classification of IBD. Our findings revealed potential microbiome-mediated mechanisms of IBD and these findings might be useful for the development of microbiome-based diagnostics.

Host phenotype classification from human microbiome data is mainly driven by the presence of microbial taxa. PLoS Comput Biol 2022;18:e1010066. [PMID: 35446845 PMCID: PMC9064115 DOI: 10.1371/journal.pcbi.1010066] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/13/2021] [Revised: 05/03/2022] [Accepted: 03/29/2022] [Indexed: 12/14/2022]

Abstract

Machine learning-based classification approaches are widely used to predict host phenotypes from microbiome data. Classifiers are typically employed by considering operational taxonomic units or relative abundance profiles as input features. Such types of data are intrinsically sparse, which opens the opportunity to make predictions from the presence/absence rather than the relative abundance of microbial taxa. This also raises the question of whether it is the presence rather than the abundance of particular taxa that is relevant for discrimination purposes, an aspect that has so far been overlooked in the literature. In this paper, we aim to fill this gap by performing a meta-analysis on 4,128 publicly available metagenomes associated with multiple case-control studies. At species-level taxonomic resolution, we show that it is the presence rather than the relative abundance of specific microbial taxa that is important when building classification models. Such findings are robust to the choice of the classifier and confirmed by statistical tests applied to identifying differentially abundant/present taxa. Results are further confirmed at coarser taxonomic resolutions and validated on 4,026 additional 16S rRNA samples coming from 30 public case-control studies.

The composition of the human microbiome has been linked to a large number of different diseases. In this context, classification methodologies based on machine learning approaches have represented a promising tool for diagnostic purposes from metagenomics data. The link between microbial population composition and host phenotypes has usually been studied by considering taxonomic profiles represented by relative abundances of microbial species. In this study, we show that it is the presence rather than the relative abundance of microbial taxa that is more relevant for maximizing classification accuracy. This is accomplished by conducting a meta-analysis on more than 4,000 shotgun metagenomes coming from 25 case-control studies, in which original relative abundance data are degraded to presence/absence profiles. Findings are also extended to 16S rRNA data and advance the research field in building prediction models directly from human microbiome data.

Chen X, Liu L, Zhang W, Yang J, Wong KC. Human host status inference from temporal microbiome changes via recurrent neural networks. Brief Bioinform 2021;22:6307015. [PMID: 34151933 DOI: 10.1093/bib/bbab223] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/24/2021] [Revised: 04/21/2021] [Accepted: 04/21/2021] [Indexed: 01/04/2023]

Jasner Y, Belogolovski A, Ben-Itzhak M, Koren O, Louzoun Y. Microbiome Preprocessing Machine Learning Pipeline. Front Immunol 2021;12:677870. [PMID: 34220823 PMCID: PMC8250139 DOI: 10.3389/fimmu.2021.677870] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/08/2021] [Accepted: 05/07/2021] [Indexed: 11/13/2022]

Anyaso-Samuel S, Sachdeva A, Guha S, Datta S. Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier. Front Genet 2021;12:642282. [PMID: 33959149 PMCID: PMC8093763 DOI: 10.3389/fgene.2021.642282] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/15/2020] [Accepted: 03/18/2021] [Indexed: 11/13/2022]

Zeng T, Yu X, Chen Z. Applying artificial intelligence in the microbiome for gastrointestinal diseases: A review. J Gastroenterol Hepatol 2021;36:832-840. [PMID: 33880762 DOI: 10.1111/jgh.15503] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 01/09/2021] [Revised: 03/18/2021] [Accepted: 03/18/2021] [Indexed: 12/20/2022]

Marcos-Zambrano LJ, Karaduzovic-Hadziabdic K, Loncar Turukalo T, Przymus P, Trajkovik V, Aasmets O, Berland M, Gruca A, Hasic J, Hron K, Klammsteiner T, Kolev M, Lahti L, Lopes MB, Moreno V, Naskinova I, Org E, Paciência I, Papoutsoglou G, Shigdel R, Stres B, Vilne B, Yousef M, Zdravevski E, Tsamardinos I, Carrillo de Santa Pau E, Claesson MJ, Moreno-Indias I, Truu J. Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment. Front Microbiol 2021;12:634511. [PMID: 33737920 PMCID: PMC7962872 DOI: 10.3389/fmicb.2021.634511] [Citation(s) in RCA: 159] [Impact Index Per Article: 39.8] [Received: 11/27/2020] [Accepted: 02/01/2021] [Indexed: 12/19/2022]

Abstract

The number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all information from these biological datasets, taking into account the peculiarities of microbiome data, i.e., the compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host phenotypes based on taxonomy-informed feature selection to establish an association between microbiome and predict disease states is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used to predict outputs, such as classification and prediction in microbiology, infer host phenotypes to predict diseases and use microbial communities to stratify patients by their characterization of state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here is more related to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology. The manual identification of data sources has been complemented with: (1) an automated publication search through digital libraries of the three major publishers using a natural language processing (NLP) toolkit, and (2) an automated identification of relevant software repositories on GitHub and ranking of the related research papers relying on a learning-to-rank approach.

Affiliation(s)

Laura Judith Marcos-Zambrano Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
Kanita Karaduzovic-Hadziabdic Faculty of Engineering and Natural Sciences, International University of Sarajevo, Sarajevo, Bosnia and Herzegovina
Tatjana Loncar Turukalo Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
Piotr Przymus Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
Vladimir Trajkovik Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
Oliver Aasmets Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia; Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
Magali Berland Université Paris-Saclay, INRAE, MGP, Jouy-en-Josas, France
Aleksandra Gruca Department of Computer Networks and Systems, Silesian University of Technology, Gliwice, Poland
Jasminka Hasic University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina
Karel Hron Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czechia
Thomas Klammsteiner Department of Microbiology, University of Innsbruck, Innsbruck, Austria
Mikhail Kolev South West University “Neofit Rilski”, Blagoevgrad, Bulgaria
Leo Lahti Department of Computing, University of Turku, Turku, Finland
Marta B. Lopes NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, Caparica, Portugal; Centro de Matemática e Aplicações (CMA), FCT, UNL, Caparica, Portugal
Victor Moreno Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), Barcelona, Spain; Colorectal Cancer Group, Institut de Recerca Biomedica de Bellvitge (IDIBELL), Barcelona, Spain; Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Barcelona, Spain; Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
Irina Naskinova South West University “Neofit Rilski”, Blagoevgrad, Bulgaria
Elin Org Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
Inês Paciência EPIUnit – Instituto de Saúde Pública da Universidade do Porto, Porto, Portugal
Georgios Papoutsoglou Department of Computer Science, University of Crete, Heraklion, Greece
Rajesh Shigdel Department of Clinical Science, University of Bergen, Bergen, Norway
Blaz Stres Group for Microbiology and Microbial Biotechnology, Department of Animal Science, University of Ljubljana, Ljubljana, Slovenia
Baiba Vilne Bioinformatics Research Unit, Riga Stradins University, Riga, Latvia
Malik Yousef Department of Information Systems, Zefat Academic College, Zefat, Israel; Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
Eftim Zdravevski Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
Ioannis Tsamardinos Department of Computer Science, University of Crete, Heraklion, Greece
Enrique Carrillo de Santa Pau Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
Marcus J. Claesson School of Microbiology & APC Microbiome Ireland, University College Cork, Cork, Ireland
Isabel Moreno-Indias Unidad de Gestión Clínica de Endocrinología y Nutrición, Instituto de Investigación Biomédica de Málaga (IBIMA), Hospital Clínico Universitario Virgen de la Victoria, Universidad de Málaga, Málaga, Spain; Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y la Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
Jaak Truu Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia

Iadanza E, Fabbri R, Bašić-Čičak D, Amedei A, Telalovic JH. Gut microbiota and artificial intelligence approaches: A scoping review. Health and Technology 2020;10:1343-1358. [DOI: 10.1007/s12553-020-00486-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/25/2020] [Accepted: 10/01/2020] [Indexed: 12/19/2022]

Beyene SS, Ling T, Ristevski B, Chen M. A novel riboswitch classification based on imbalanced sequences achieved by machine learning. PLoS Comput Biol 2020;16:e1007760. [PMID: 32687488 PMCID: PMC7392346 DOI: 10.1371/journal.pcbi.1007760] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/26/2020] [Revised: 07/30/2020] [Accepted: 05/13/2020] [Indexed: 11/24/2022]

Combination of Ensembles of Regularized Regression Models with Resampling-Based Lasso Feature Selection in High Dimensional Data. Mathematics 2020. [DOI: 10.3390/math8010110] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/29/2023]

Abbas M, Matta J, Le T, Bensmail H, Obafemi-Ajayi T, Honavar V, EL-Manzalawy Y. Biomarker discovery in inflammatory bowel diseases using network-based feature selection. PLoS One 2019;14:e0225382. [PMID: 31756219 PMCID: PMC6874333 DOI: 10.1371/journal.pone.0225382] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2019] [Accepted: 11/04/2019] [Indexed: 12/20/2022] Open

LaPierre N, Ju CJT, Zhou G, Wang W. MetaPheno: A critical evaluation of deep learning and machine learning in metagenome-based disease prediction. Methods 2019;166:74-82. [PMID: 30885720 PMCID: PMC6708502 DOI: 10.1016/j.ymeth.2019.03.003] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2018] [Revised: 02/14/2019] [Accepted: 03/04/2019] [Indexed: 01/21/2023] Open

Zhou YH, Gallins P. A Review and Tutorial of Machine Learning Methods for Microbiome Host Trait Prediction. Front Genet 2019;10:579. [PMID: 31293616 PMCID: PMC6603228 DOI: 10.3389/fgene.2019.00579] [Citation(s) in RCA: 105] [Impact Index Per Article: 17.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2019] [Accepted: 06/04/2019] [Indexed: 12/19/2022] Open

Two-Stage Classification with SIS Using a New Filter Ranking Method in High Throughput Data. MATHEMATICS 2019. [DOI: 10.3390/math7060493] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]

Maltez Thomas A, Prata Lima F, Maria Silva Moura L, Maria da Silva A, Dias-Neto E, Setubal JC. Comparative Metagenomics. Methods Mol Biol 2018;1704:243-260. [PMID: 29277868 DOI: 10.1007/978-1-4939-7463-4_8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

Oudah M, Henschel A. Taxonomy-aware feature engineering for microbiome classification. BMC Bioinformatics 2018;19:227. [PMID: 29907097 PMCID: PMC6003080 DOI: 10.1186/s12859-018-2205-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2017] [Accepted: 05/15/2018] [Indexed: 12/17/2022] Open

Abstract

Background

What is a healthy microbiome? The pursuit of this and many related questions, especially in light of the recently recognized microbial component in a wide range of diseases, has sparked a surge in metagenomic studies. These diseases are often not simply attributable to a single pathogen but rather are the result of complex ecological processes. Relatedly, the increasing DNA sequencing depth and number of samples in metagenomic case-control studies have enabled the application of powerful statistical methods, e.g. Machine Learning approaches. For the latter, the feature space is typically shaped by the relative abundances of operational taxonomic units, as determined by cost-effective phylogenetic marker gene profiles. While a substantial body of microbiome/microbiota research involves unsupervised and supervised Machine Learning, very little attention has been paid to feature selection and engineering.

Results

We here propose the first algorithm to exploit phylogenetic hierarchy (i.e. an all-encompassing taxonomy) in feature engineering for microbiota classification. The rationale is to exploit the often mono- or oligophyletic distribution of relevant (but hidden) traits by virtue of taxonomic abstraction. The algorithm is embedded in a comprehensive microbiota classification pipeline, which we applied to a diverse range of datasets, distinguishing healthy from diseased microbiota samples.

Conclusion

We demonstrate substantial improvements over the state-of-the-art microbiota classification tools in terms of classification accuracy, regardless of the actual Machine Learning technique while using drastically reduced feature spaces. Moreover, generalized features bear great explanatory value: they provide a concise description of conditions and thus help to provide pathophysiological insights. Indeed, the automatically and reproducibly derived features are consistent with previously published domain expert analyses.

Electronic supplementary material

The online version of this article (10.1186/s12859-018-2205-3) contains supplementary material, which is available to authorized users.

Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 2018;15:20170387. [PMID: 29618526 PMCID: PMC5938574 DOI: 10.1098/rsif.2017.0387] [Citation(s) in RCA: 905] [Impact Index Per Article: 129.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2017] [Accepted: 03/07/2018] [Indexed: 11/12/2022] Open

Affiliation(s)

Travers Ching Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
Daniel S Himmelstein Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Brett K Beaulieu-Jones Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Alexandr A Kalinin Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
Brian T Do Harvard Medical School, Boston, MA, USA
Gregory P Way Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Enrico Ferrero Computational Biology and Stats, Target Sciences, GlaxoSmithKline, Stevenage, UK
Paul-Michael Agapow Data Science Institute, Imperial College London, London, UK
Michael Zietz Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Michael M Hoffman Princess Margaret Cancer Centre, Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
Wei Xie Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA
Gail L Rosen Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
Benjamin J Lengerich Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Johnny Israeli Biophysics Program, Stanford University, Stanford, CA, USA
Jack Lanchantin Department of Computer Science, University of Virginia, Charlottesville, VA, USA
Stephen Woloszynek Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
Anne E Carpenter Imaging Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
Avanti Shrikumar Department of Computer Science, Stanford University, Stanford, CA, USA
Jinbo Xu Toyota Technological Institute at Chicago, Chicago, IL, USA
Evan M Cofer Department of Computer Science, Trinity University, San Antonio, TX, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
Christopher A Lavender Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
Srinivas C Turaga Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA
Amr M Alexandari Department of Computer Science, Stanford University, Stanford, CA, USA
Zhiyong Lu National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
David J Harris Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
Dave DeCaprio ClosedLoop.ai, Austin, TX, USA
Yanjun Qi Department of Computer Science, University of Virginia, Charlottesville, VA, USA
Anshul Kundaje Department of Computer Science, Stanford University, Stanford, CA, USA; Department of Genetics, Stanford University, Stanford, CA, USA
Yifan Peng National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Laura K Wiley Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA
Marwin H S Segler Institute of Organic Chemistry, Westfälische Wilhelms-Universität Münster, Münster, Germany
Simina M Boca Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
S Joshua Swamidass Department of Pathology and Immunology, Washington University in Saint Louis, St Louis, MO, USA
Austin Huang Department of Medicine, Brown University, Providence, RI, USA
Anthony Gitter Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA; Morgridge Institute for Research, Madison, WI, USA
Casey S Greene Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA

Huang BFF, Boutros PC. The parameter sensitivity of random forests. BMC Bioinformatics 2016;17:331. [PMID: 27586051 PMCID: PMC5009551 DOI: 10.1186/s12859-016-1228-x] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2015] [Accepted: 08/26/2016] [Indexed: 02/07/2023] Open

Pasolli E, Truong DT, Malik F, Waldron L, Segata N. Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights. PLoS Comput Biol 2016;12:e1004977. [PMID: 27400279 PMCID: PMC4939962 DOI: 10.1371/journal.pcbi.1004977] [Citation(s) in RCA: 345] [Impact Index Per Article: 38.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2015] [Accepted: 05/11/2016] [Indexed: 12/12/2022] Open

Abstract

Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the “healthy” microbiome can be considered a first step toward defining general microbial dysbiosis. 
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.

The human microbiome, the entire set of microbial organisms associated with the human host, interacts closely with host immune and metabolic functions and is crucial for human health. Significant advances in the characterization of the microbiome associated with healthy and diseased individuals have been obtained through next-generation DNA sequencing technologies, which permit accurate estimation of microbial communities directly from uncultured human-associated samples (e.g., stool). In particular, shotgun metagenomics provides data at unprecedented species- and strain-level resolution. Several large-scale metagenomic disease-associated datasets are also becoming available, and disease-predictive models built on metagenomic signatures have been proposed. However, the generalization of resulting prediction models across different cohorts and diseases has not been validated. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of microbiome-phenotype associations. We consider 2424 samples from eight studies and six different diseases to assess the independent prediction accuracy of models built on shotgun metagenomic data and to compare strategies for practical use of the microbiome as a prediction tool.
